Phi-4 Research Assistant Training

Control Panel

Training Configuration

Model

  • Model: unsloth/phi-4-unsloth-bnb-4bit
  • Learning Rate: 2e-05
  • Per-Device Batch Size: 16
  • Gradient Accumulation: 3
  • Total Effective Batch Size: 16 (per device) × 4 (GPUs) × 3 (gradient accumulation) = 192 (see the configuration sketch after this list)
  • Epochs: 3
  • Precision: BF16
  • Max Sequence Length: 2048
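
The hyperparameters above map onto a standard Hugging Face `TrainingArguments` object. The sketch below is an assumption about how the Space's training script wires them together (the actual script may use Unsloth or TRL wrappers instead); the 2048-token max sequence length is applied by the tokenizer/trainer, not by `TrainingArguments`.

```python
from transformers import TrainingArguments

# Minimal sketch of the configuration listed above; model loading,
# tokenization (max length 2048), and trainer wiring are assumed to
# live elsewhere in the Space's training script.
training_args = TrainingArguments(
    output_dir="./results",            # checkpoint directory (see Notes)
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=3,     # 16 x 4 GPUs x 3 = 192 effective
    num_train_epochs=3,
    bf16=True,                         # BF16 mixed precision
    gradient_checkpointing=True,       # memory optimization from the Hardware list
    logging_steps=10,                  # illustrative; not specified on this page
    save_strategy="epoch",             # illustrative; not specified on this page
)
```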

Hardware

  • GPU: 4× NVIDIA L4 (24 GB VRAM per GPU, 96 GB total; see the sanity-check sketch after this list)
  • Multi-GPU Strategy: DDP (DistributedDataParallel)
  • Memory Optimizations: Gradient Checkpointing
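
A quick sanity check before launching, confirming that all four L4s are visible and that BF16 is supported, can save a failed run. This snippet is an illustrative addition, not part of the original script; with DDP the job is typically started with `torchrun --nproc_per_node=4` or `accelerate launch`, but the exact launcher used by this Space is not stated here.

```python
import torch

# Sanity check: the DDP run expects 4 visible GPUs with BF16 support.
assert torch.cuda.is_available(), "CUDA is not available"
n_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {n_gpus}")                              # expected: 4
for i in range(n_gpus):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
print(f"BF16 supported: {torch.cuda.is_bf16_supported()}")    # expected: True on L4
```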

Dataset

  • Dataset: George-API/phi4-cognitive-dataset
  • Dataset Split: train (see the loading sketch after this list)
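
The dataset lives on the Hugging Face Hub and can be pulled with the `datasets` library. A minimal loading sketch; the dataset's column names are not documented on this page, so inspect an example before wiring up the trainer.

```python
from datasets import load_dataset

# Pull the training split named above from the Hugging Face Hub.
dataset = load_dataset("George-API/phi4-cognitive-dataset", split="train")

print(dataset)        # column names and row count
print(dataset[0])     # inspect one example before training
```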

Training Information

Hardware:

  • 4× NVIDIA L4 GPUs (24 GB VRAM per GPU, 96 GB total)
  • Training with BF16 precision
  • Using DistributedDataParallel (DDP) across the four GPUs
  • Effective batch size: 16 (per device) × 4 (GPUs) × 3 (gradient accumulation) = 192

Notes:

  • Training may take several hours depending on dataset size
  • Check the Space logs for real-time progress
  • Model checkpoints will be saved to the ./results directory (a loading sketch follows these notes)
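
Assuming the Trainer's default `checkpoint-<step>` directory naming under ./results, the most recent checkpoint can be picked up as below. This is a hedged sketch: if the run saves LoRA/PEFT adapters rather than full weights, load the base model first and attach the adapter with `peft` instead.

```python
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the newest checkpoint-<step> directory under ./results
# (assumes the default Hugging Face Trainer naming convention).
checkpoints = sorted(
    Path("./results").glob("checkpoint-*"),
    key=lambda p: int(p.name.split("-")[-1]),
)
latest = checkpoints[-1]
print(f"Loading {latest}")

tokenizer = AutoTokenizer.from_pretrained(latest)
model = AutoModelForCausalLM.from_pretrained(latest, device_map="auto")
```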