OpenVLA#

OpenVLA is the first open-source 7B Vision-Language-Action (VLA) model, built on the Prismatic VLM. RLDS is the mainstream data format for VLA training, so to finetune OpenVLA on RoboVerse data you first need to convert the RoboVerse data to RLDS format. The following steps walk you through the whole finetuning process.

RLDS Conversion#

In RoboVerse, we provide rlds_utils to convert RoboVerse data to RLDS format. The conversion scripts are located at roboverse_learn/rlds_utils. First, create a conda environment from the provided environment file (environment_ubuntu.yml or environment_macos.yml, depending on your operating system):

conda env create -f environment_ubuntu.yml

Then activate the environment using:

conda activate rlds_env

If you want to create the environment manually, the key packages to install are tensorflow, tensorflow_datasets, tensorflow_hub, apache_beam, matplotlib, plotly, and wandb.
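For example, a manual setup might look like the following (a minimal sketch; the Python version is an assumption, and you may want to pin package versions to match the provided environment files):

conda create -n rlds_env python=3.10 -y   # Python version is an assumption; check the environment file
conda activate rlds_env
pip install tensorflow tensorflow_datasets tensorflow_hub apache_beam matplotlib plotly wandb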

Then, go to the roboverse folder. After verifying your installation of the conversion environment, all you need to do is create a soft link to your demo folder inside the roboverse folder and run tfds build --overwrite there. The script automatically converts all episodes into RLDS format, and the converted RLDS dataset is stored in ~/tensorflow_datasets/roboverse_dataset.
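Concretely, the steps look roughly like this (the exact location of the roboverse folder under roboverse_learn/rlds_utils and the demo path are assumptions; adjust them to your checkout):

cd roboverse_learn/rlds_utils/roboverse   # path is an assumption; use your roboverse folder
ln -s <path_to_your_demo_folder> demo     # soft link the RoboVerse demos into the roboverse folder
tfds build --overwrite                    # converts all episodes into RLDS format
ls ~/tensorflow_datasets/roboverse_dataset   # the converted dataset lands here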

Finetune OpenVLA#

Clone the RoboVerse version of OpenVLA and set up the environment as follows:

# Create and activate conda environment
conda create -n openvla python=3.10 -y
conda activate openvla

# Install PyTorch. Below is a sample command to do this, but you should check the following link
# to find installation instructions that are specific to your compute platform:
# https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia -y  # UPDATE ME!

# Clone and install the openvla repo
git clone https://github.com/openvla/openvla.git
cd openvla
pip install -e .

# Install Flash Attention 2 for training (https://github.com/Dao-AILab/flash-attention)
#   =>> If you run into difficulty, try `pip cache remove flash_attn` first
pip install packaging ninja
ninja --version; echo $?  # Verify Ninja --> should return exit code "0"
pip install "flash-attn==2.5.5" --no-build-isolation

Then, create a symbolic link to the converted RLDS data in your workspace.

cd openvla
ln -s <path_to_rlds_data> data
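For example, if the conversion step above wrote the dataset to ~/tensorflow_datasets, the link would look like this (a sketch; adjust to where your converted data actually lives):

ln -s ~/tensorflow_datasets data
ls data/   # should list the converted dataset(s), e.g. roboverse_dataset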

Launch finetuning with the following commands:

cd openvla
conda activate openvla
pip install -e .
bash launch_finetune.sh

The contents of launch_finetune.sh are shown below. You can modify it to change the finetuning settings.

#!/bin/bash

export HF_TOKEN=YOUR_HF_TOKEN
export WANDB_API_KEY=YOUR_WANDB_API_KEY
export HF_HOME=cache

torchrun --standalone --nnodes=1 --nproc-per-node=8 vla-scripts/finetune.py \
    --vla_path=openvla/openvla-7b \
    --data_root_dir=data/ \
    --dataset_name=pickcube \
    --run_root_dir=runs \
    --lora_rank=32 \
    --batch_size=16 \
    --grad_accumulation_steps=1 \
    --learning_rate=5e-4 \
    --image_aug=True \
    --wandb_project=openvla-finetune-lora32-pickcube \
    --wandb_entity=YOUR_WANDB_ENTITY \
    --save_steps=5000 \
    --save_latest_checkpoint_only=True # [Optional] If True, saves only one checkpoint per run, continually overwriting the latest; if False, saves all checkpoints
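With --nproc-per-node=8, batch_size=16, and grad_accumulation_steps=1, the effective global batch size is 16 × 8 × 1 = 128 per optimizer step. Also make sure --dataset_name matches the name of the converted RLDS dataset directory under data/ (here, pickcube).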

Evaluation on RoboVerse#

After finetuning, you can evaluate the model on RoboVerse. The evaluation script is located at roboverse_learn/eval_vla.py. You can run the script with the following command:

python roboverse_learn/eval_vla.py --task TASK_NAME --algo openvla --ckpt_path PATH_TO_CHECKPOINT
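For example (hypothetical values; substitute the task you finetuned on and the checkpoint directory that finetuning wrote under runs/):

python roboverse_learn/eval_vla.py --task pickcube --algo openvla --ckpt_path runs/<your_finetuned_checkpoint>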

For Developers#

  • Codebase on Berkeley EM: /home/ghr/wangbangjun/RoboVerse-OpenVLA.

  • Launch Script: stackcube_l0_finetune.sh.

  • Dataset transform: RoboVerse-OpenVLA/prismatic/vla/datasets/rlds/oxe/transforms.py (V2 uses roboverse_v2_dataset_transform).

  • Data path: /home/ghr/tensorflow_datasets/stackcube_level0/