SmolVLA#
SmolVLA is a lightweight Vision-Language-Action (VLA) model from Hugging Face’s LeRobot framework. It provides a faster and more efficient alternative to larger VLA models like OpenVLA, making it ideal for rapid experimentation and resource-constrained environments.
Overview#
SmolVLA uses the same LeRobot data format as the pi0 models, enabling data reuse between these implementations. This guide covers the complete pipeline from data conversion to model evaluation.
Pipeline Overview#
The workflow consists of three main stages:
Demo Collection → LeRobot Conversion → SmolVLA Training → Evaluation
1. Collect Demonstration Trajectories#
Generate robotic demonstration data using RoboVerse simulation:
python scripts/advanced/collect_demo.py \
  --sim=mujoco \
  --task=pick_butter \
  --headless \
  --run_all
Output example:
roboverse_demo/demo_mujoco/pick_butter-/robot-franka/demo_0001
2. Data Format Conversion (to LeRobot)#
Convert collected demos to LeRobot format using the shared conversion script:
python roboverse_learn/vla/pi0/convert_roboverse_to_lerobot.py \
  --input-root roboverse_demo/demo_mujoco \
  --repo-id <your_hf_name>/roboverse-pick_butter \
  --overwrite
Parameters:
- --input-root: Directory containing demonstration data
- --repo-id: Hugging Face repository ID for the dataset
- --overwrite: Overwrite existing dataset if present
Output: Dataset saved to ~/.cache/huggingface/lerobot/<repo-id>
3. Install SmolVLA and LeRobot#
Install LeRobot with SmolVLA support:
cd third_party
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
4. Configure Training#
Edit roboverse_learn/vla/SmolVLA/finetune_smolvla.sh to set your dataset:
DATASET_REPO_ID="<your_hf_name>/roboverse-pick_butter"
Key hyperparameters:
- BATCH_SIZE: Batch size for training (default: 8)
- LEARNING_RATE: Initial learning rate (default: 1e-4)
- TRAINING_STEPS: Total training steps (default: 100000)
- WARMUP_STEPS: Learning rate warmup steps (default: 1000)
- SAVE_STEPS: Checkpoint save frequency (default: 50)
5. Train SmolVLA#
Run the training script:
cd roboverse_learn/vla/SmolVLA
bash finetune_smolvla.sh
Training output: Checkpoints saved to ./smolvla_runs/smolvla_roboverse_<timestamp>/checkpoints/
6. Evaluate Model#
Evaluate the trained model on RoboVerse tasks:
python roboverse_learn/vla/SmolVLA/smolvla_eval.py \
  --model_path ./smolvla_runs/smolvla_roboverse_<timestamp>/checkpoints/001000 \
  --task pick_butter \
  --robot franka \
  --sim mujoco \
  --num_episodes 10
Parameters:
- --model_path: Path to trained checkpoint
- --task: Task name (e.g., pick_butter)
- --robot: Robot model (e.g., franka)
- --sim: Simulator backend (mujoco, pybullet, sapien, etc.)
- --num_episodes: Number of evaluation episodes
- --max_steps: Maximum steps per episode (default: 250)
Output:
- Videos: - ./smolvla_eval_output/episode_*.mp4
- Results: - ./smolvla_eval_output/smolvla_eval_<task>_<timestamp>.json
Data Pipeline Diagram#
Raw Demos ──collect_demo.py──▶ demo_mujoco/
      │
      ▼
LeRobot Conversion ──convert_roboverse_to_lerobot.py──▶ ~/.cache/huggingface/lerobot/
      │
      ▼
SmolVLA Training ──finetune_smolvla.sh──▶ Trained Model
      │
      ▼
Evaluation ──smolvla_eval.py──▶ Performance Metrics & Videos
Model Comparison#
| Feature | OpenVLA | SmolVLA | pi0 | 
|---|---|---|---|
| Model Size | ~7B parameters | <1B parameters | Various sizes | 
| Data Format | RLDS | LeRobot | LeRobot | 
| Training Speed | Slow | Fast | Medium | 
| Inference Speed | Slow | Fast | Medium | 
| Resource Requirements | High | Low | Medium | 
| Best Use Case | Research accuracy | Fast experimentation | Production deployment | 
Key Advantages#
- Data Sharing: SmolVLA uses the same LeRobot format as pi0, so you only need to convert data once 
- Lightweight: Smaller model size enables faster training and inference 
- Lower Resources: Can run on GPUs with less memory 
- Quick Iteration: Faster training cycles for experimentation 
Troubleshooting#
CUDA Out of Memory#
Reduce batch size in finetune_smolvla.sh:
BATCH_SIZE=4
Dataset Not Found#
Check that the dataset was converted successfully:
ls ~/.cache/huggingface/lerobot/
Import Errors#
Ensure LeRobot is installed with SmolVLA support:
pip install -e ".[smolvla]"
Model Loading Fails#
Verify the checkpoint path contains the pretrained_model/ subdirectory:
ls <checkpoint_path>/pretrained_model/
Advanced Usage#
Multi-task Training#
Convert multiple tasks to the same dataset repository:
python roboverse_learn/vla/pi0/convert_roboverse_to_lerobot.py \
  --input-root roboverse_demo/demo_mujoco/pick_butter- \
  --repo-id myname/roboverse-multitask
python roboverse_learn/vla/pi0/convert_roboverse_to_lerobot.py \
  --input-root roboverse_demo/demo_mujoco/stack_blocks- \
  --repo-id myname/roboverse-multitask
Custom Hyperparameters#
Modify finetune_smolvla.sh to adjust training parameters:
- Learning rate schedule: Modify - DECAY_LRfor final learning rate
- Gradient clipping: Change - --optimizer.grad_clip_norm
- Logging frequency: Adjust - --log_freq
WandB Logging#
Enable Weights & Biases logging in finetune_smolvla.sh:
USE_WANDB=true
WANDB_PROJECT="smolvla_roboverse"
WANDB_ENTITY="<your_wandb_username>"
File Structure#
roboverse_learn/vla/SmolVLA/
├── README.md                    # Detailed usage documentation
├── finetune_smolvla.sh         # Training script
├── smolvla_eval.py             # Evaluation script
└── roboverse_smolvla_policy.py # Data format adapter (optional)