Phi-4 Research Assistant Training

Control Panel

Training Configuration

Model

  • Model: unsloth/phi-4-unsloth-bnb-4bit
  • Learning Rate: 2e-05
  • Per-Device Batch Size: 16
  • Gradient Accumulation: 3
  • Total Effective Batch Size: 16 (per device) × 4 (GPUs) × 3 (gradient accumulation) = 192 (see the configuration sketch after this list)
  • Epochs: 3
  • Precision: BF16
  • Max Sequence Length: 2048
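
The hyperparameters above map onto a standard Hugging Face `TrainingArguments` object. The sketch below is an assumption about how the Space's training script wires them together (the actual script may use Unsloth or TRL wrappers instead); the 2048-token max sequence length is applied by the tokenizer/trainer, not by `TrainingArguments`.

```python
from transformers import TrainingArguments

# Minimal sketch of the configuration listed above; model loading,
# tokenization (max length 2048), and trainer wiring are assumed to
# live elsewhere in the Space's training script.
training_args = TrainingArguments(
    output_dir="./results",            # checkpoint directory (see Notes)
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=3,     # 16 x 4 GPUs x 3 = 192 effective
    num_train_epochs=3,
    bf16=True,                         # BF16 mixed precision
    gradient_checkpointing=True,       # memory optimization from the Hardware list
    logging_steps=10,                  # illustrative; not specified on this page
    save_strategy="epoch",             # illustrative; not specified on this page
)
```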

Hardware

  • GPU: 4× NVIDIA L4 (24 GB VRAM per GPU, 96 GB total; see the sanity-check sketch after this list)
  • Multi-GPU Strategy: DDP (DistributedDataParallel)
  • Memory Optimizations: Gradient Checkpointing
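
A quick sanity check before launching, confirming that all four L4s are visible and that BF16 is supported, can save a failed run. This snippet is an illustrative addition, not part of the original script; with DDP the job is typically started with `torchrun --nproc_per_node=4` or `accelerate launch`, but the exact launcher used by this Space is not stated here.

```python
import torch

# Sanity check: the DDP run expects 4 visible GPUs with BF16 support.
assert torch.cuda.is_available(), "CUDA is not available"
n_gpus = torch.cuda.device_count()
print(f"Visible GPUs: {n_gpus}")                              # expected: 4
for i in range(n_gpus):
    props = torch.cuda.get_device_properties(i)
    print(f"  GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
print(f"BF16 supported: {torch.cuda.is_bf16_supported()}")    # expected: True on L4
```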

Dataset

  • Dataset: George-API/phi4-cognitive-dataset
  • Dataset Split: train (see the loading sketch after this list)
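
The dataset lives on the Hugging Face Hub and can be pulled with the `datasets` library. A minimal loading sketch; the dataset's column names are not documented on this page, so inspect an example before wiring up the trainer.

```python
from datasets import load_dataset

# Pull the training split named above from the Hugging Face Hub.
dataset = load_dataset("George-API/phi4-cognitive-dataset", split="train")

print(dataset)        # column names and row count
print(dataset[0])     # inspect one example before training
```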

Training Information

Hardware:

  • 4× NVIDIA L4 GPUs (24 GB VRAM per GPU, 96 GB total)
  • Training with BF16 precision
  • Using DistributedDataParallel (DDP) across the four GPUs
  • Effective batch size: 16 (per device) × 4 (GPUs) × 3 (gradient accumulation) = 192

Notes:

  • Training may take several hours depending on dataset size
  • Check the Space logs for real-time progress
  • Model checkpoints will be saved to the ./results directory (a loading sketch follows these notes)
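
Assuming the Trainer's default `checkpoint-<step>` directory naming under ./results, the most recent checkpoint can be picked up as below. This is a hedged sketch: if the run saves LoRA/PEFT adapters rather than full weights, load the base model first and attach the adapter with `peft` instead.

```python
from pathlib import Path
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pick the newest checkpoint-<step> directory under ./results
# (assumes the default Hugging Face Trainer naming convention).
checkpoints = sorted(
    Path("./results").glob("checkpoint-*"),
    key=lambda p: int(p.name.split("-")[-1]),
)
latest = checkpoints[-1]
print(f"Loading {latest}")

tokenizer = AutoTokenizer.from_pretrained(latest)
model = AutoModelForCausalLM.from_pretrained(latest, device_map="auto")
```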