OpenVLA#

OpenVLA is the first open-source 7B Vision-Language-Action (VLA) model, built on the Prismatic VLM. RLDS is the mainstream data format for VLA training, so to fine-tune OpenVLA on RoboVerse data you first need to convert the RoboVerse demonstrations to RLDS. The following steps walk through the full fine-tuning process.

RoboVerse VLA Training Pipeline#

This README describes how to train Vision-Language-Action (VLA) models on RoboVerse robotic manipulation data using the OpenVLA framework.

The pipeline consists of four stages:

Demo Collection → RLDS Conversion → VLA Fine-tuning → Evaluation

1. Collect Demonstration Trajectories#

Generate robotic demonstration data using RoboVerse simulation. Example:

python scripts/advanced/collect_demo.py \
  --sim=mujoco \
  --task=pick_butter \
  --headless \
  --run_all

Parameters:

  • --sim: Simulation backend (mujoco, isaac, …)

  • --task: Specific manipulation task (e.g., pick_butter)

  • --headless: Run without GUI for efficiency

  • --run_all: Run all available episodes

Output example:

roboverse_demo/demo_mujoco/pick_butter-/robot-franka/demo_0001

2. Data Format Conversion (to RLDS)#

Collected demos must be converted into the RLDS (Reinforcement Learning Datasets) format.
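
RLDS stores each demonstration as an episode made up of per-timestep steps (observation, action, language instruction, episode-boundary flags). A minimal sketch of that layout, with field names taken from the common rlds_dataset_builder template rather than the exact RoboVerse schema:

import numpy as np

# Illustrative layout of one RLDS episode. Field names follow the common
# rlds_dataset_builder template and are assumptions, not the exact RoboVerse schema.
step = {
    "observation": {"image": np.zeros((256, 256, 3), dtype=np.uint8)},  # camera frame
    "action": np.zeros(7, dtype=np.float32),  # e.g. end-effector delta + gripper
    "language_instruction": "pick up the butter",
    "is_first": True,
    "is_last": False,
    "is_terminal": False,
}
episode = {
    "steps": [step],  # one entry per timestep
    "episode_metadata": {"file_path": "demo_0001"},
}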

Step 2.1 Navigate to RLDS utilities#

cd roboverse_learn/vla/rlds_utils/roboverse

Step 2.2 Set up conversion environment#

conda env create -f ../environment_ubuntu.yml
conda activate rlds_env

Key packages: tensorflow, tensorflow_datasets, tensorflow_hub, apache_beam, matplotlib, plotly, wandb.

Step 2.3 Run conversion#

tfds build --overwrite

Output dataset:

~/tensorflow_datasets/roboverse_dataset/
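
To sanity-check the conversion, you can read the dataset back with tensorflow_datasets. A minimal sketch, assuming the builder registers under the name roboverse_dataset (matching the output directory) and produces a train split:

import tensorflow_datasets as tfds

# Sanity-check the conversion by reading the dataset back.
# Assumes the builder name is "roboverse_dataset" and a "train" split exists.
ds = tfds.load("roboverse_dataset", split="train")
episode = next(iter(ds))
first_step = next(iter(episode["steps"]))
print("step keys:", list(first_step.keys()))
print("steps in first episode:", sum(1 for _ in episode["steps"]))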

3. Fine-tune OpenVLA#

Step 3.1 Environment Setup#

# Create and activate conda environment
conda create -n openvla python=3.10 -y
conda activate openvla

# Install PyTorch (adjust version for your CUDA setup)
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y
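
Before installing the remaining dependencies, it is worth confirming that the PyTorch build can actually see your GPU; a quick check:

import torch

# Confirm the installed PyTorch build and that CUDA is visible.
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))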

Step 3.2 Install OpenVLA#

git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .

# Install Flash Attention 2
pip install packaging ninja
ninja --version   # should print a version number
pip install "flash-attn==2.5.5" --no-build-isolation

Troubleshooting Flash Attention Installation:

If you encounter the following error with CUDA 12+ and PyTorch 2.6:

undefined symbol: _ZN3c105ErrorC2ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Solution: Download the appropriate wheel file for your CUDA and PyTorch versions from the official Flash Attention releases: Dao-AILab/flash-attention

For a stable CUDA 12 environment, we recommend using flash-attn 2.7.4.
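
Whichever route you take, you can confirm that the extension links against your PyTorch build by importing it (the undefined-symbol error above surfaces at import time):

# Verify flash-attn imports against the installed PyTorch build; a version
# mismatch would raise the undefined-symbol error here.
import flash_attn
print("flash-attn version:", flash_attn.__version__)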

Step 3.3 Fine-tuning Execution#

If you haven’t downloaded the OpenVLA checkpoints yet, set your Hugging Face token as an environment variable so the training script can fetch them:

cd roboverse_learn/vla/
export HF_token=your_hf_token

Launch training with the RoboVerse wrapper script:

cd roboverse_learn/vla/
bash finetune_roboverse.sh

⚠️ Make sure dataset paths and model checkpoint paths in finetune_roboverse.sh are correct.

4. Evaluation#

After training, evaluate the VLA model on RoboVerse tasks:

python roboverse_learn/vla/vla_eval.py \
  --model_path openvla_runs/path/to/checkpoint \
  --task pick_butter
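
For a quick manual check outside vla_eval.py, the fine-tuned checkpoint can also be loaded through Hugging Face transformers, following the usage pattern from the OpenVLA README. A minimal sketch; the checkpoint path, image file, and unnorm_key (which should match the fine-tuning dataset name) are placeholders:

import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Placeholder paths/keys -- point them at your fine-tuned checkpoint and the
# dataset name used during fine-tuning.
ckpt = "openvla_runs/path/to/checkpoint"
processor = AutoProcessor.from_pretrained(ckpt, trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    ckpt,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("frame.png")  # current camera observation
prompt = "In: What action should the robot take to pick up the butter?\nOut:"
inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
action = vla.predict_action(**inputs, unnorm_key="roboverse_dataset", do_sample=False)
print(action)  # predicted action vector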

5. Data Pipeline Overview#

Raw Demos ──collect_demo.py──▶ demo_mujoco/
      │
      ▼
 RLDS Conversion ──tfds──▶ ~/tensorflow_datasets/roboverse_dataset/
      │
      ▼
 OpenVLA Fine-tuning ──finetune_roboverse.sh──▶ Trained VLA
      │
      ▼
 Evaluation ──vla_eval.py──▶ Task Performance Metrics