# Diffusion Policy

## Installation

```bash
cd roboverse_learn/il/utils/diffusion_policy
pip install -e .
cd ../../../../
pip install pandas wandb
```

Register for a Weights & Biases (wandb) account to obtain an API key.

## Workflow

### Step 1: Collect and pre-process data

```bash
./roboverse_learn/il/collect_demo.sh
```

`collect_demo.sh` collects demonstrations (metadata) using `~/RoboVerse/scripts/advanced/collect_demo.py` and converts the metadata into Zarr format for efficient dataloading. The script supports both joint-position and end-effector action and observation spaces.

Outputs: demonstration metadata is stored in `metadata_dir`; the converted dataset is stored in `~/RoboVerse/data_policy`.
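The exact Zarr schema the converter writes is not documented here. The sketch below mimics, with plain NumPy arrays, the episode-concatenated layout used by the upstream `diffusion_policy` replay buffer (flat `data/*` arrays plus a `meta/episode_ends` index); inspect your generated `.zarr` to confirm the real keys and shapes.

```python
import numpy as np

# Hypothetical mimic of the converted dataset layout: all episodes are
# concatenated along the time axis, and meta/episode_ends records where
# each episode stops. The actual keys RoboVerse writes may differ.
dataset = {
    "data/state": np.random.rand(25, 9),     # 25 total steps, 9-dim state
    "data/action": np.random.rand(25, 9),    # matching joint_pos actions
    "meta/episode_ends": np.array([10, 25])  # episode 0: steps 0-9, episode 1: 10-24
}

def get_episode(ds, i):
    """Slice one episode out of the concatenated data arrays."""
    ends = ds["meta/episode_ends"]
    start = 0 if i == 0 else ends[i - 1]
    return {k: v[start:ends[i]] for k, v in ds.items() if k.startswith("data/")}

ep0 = get_episode(dataset, 0)  # 10 steps
ep1 = get_episode(dataset, 1)  # 15 steps
```

This episode-ends indexing is what lets the dataloader sample fixed-length windows without padding across episode boundaries.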

#### Parameters

| Argument | Description | Example |
| --- | --- | --- |
| `task_name_set` | Name of the task | `close_box` |
| `sim_set` | Name of the selected simulator | `isaacsim` |
| `max_demo_idx` | Maximum index of demos to collect | `100` |
| `expert_data_num` | Number of expert demonstrations to process | `100` |
| `metadata_dir` | Path to the directory containing demonstration metadata saved by `collect_demo` | `~/RoboVerse/roboverse_demo/demo_isaacsim/close_box-/robot-franka` |
| `action_space` | Type of action space to use (`joint_pos` or `ee`) | `joint_pos` |
| `observation_space` | Type of observation space to use (`joint_pos` or `ee`) | `joint_pos` |
| `delta_ee` | (optional) Delta control (0: absolute, 1: delta; default 0) | `0` |
| `cust_name` | User-defined name | `noDR` |

### Step 2: Training and evaluation

```bash
./roboverse_learn/il/dp/dp_run.sh
```

`dp_run.sh` uses `roboverse_learn/il/dp/main.py` and the generated Zarr data, which is stored in the `data_policy/` directory, to train and evaluate the diffusion policy (DP) model.

Outputs: training results are stored in `~/RoboVerse/info/outputs/DP`; evaluation results are stored in `~/RoboVerse/tmp`.

#### Parameters

| Argument | Description | Example |
| --- | --- | --- |
| `task_name` | Name of the task | `close_box` |
| `sim_set` | Name of the selected simulator | `isaacsim` |
| `gpu_id` | ID of the GPU to use | `0` |
| `train_enable` | Enable training | `True` |
| `eval_enable` | Enable evaluation | `True` |
| `algo_choose` | Training/inference algorithm (0: DDPM, 1: DDIM, 2: flow matching, 3: score-based) | `0` |
| `eval_path` | Path to the trained DP model; must be set if `train_enable` is `False` | `/path/to/your/checkpoint.ckpt` |
| `zarr_path` | Path to the Zarr dataset | `data_policy/${task_name}FrankaL${level}_${extra}_${expert_data_num}.zarr` |
| `seed` | Random seed for reproducibility | `42` |
| `num_epochs` | Number of training epochs | `200` |
| `obs_type` | Observation type (`joint_pos` or `ee`) | `joint_pos` |
| `action_type` | Action type (`joint_pos` or `ee`) | `joint_pos` |
| `delta` | Delta control mode (0: absolute, 1: delta) | `0` |
| `device` | GPU device to use | `"cuda:7"` |

**Important parameter overrides:**

- `horizon`, `n_obs_steps`, and `n_action_steps` are set directly in `dp_runner.sh` and override the YAML configurations.
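These three values implement the receding-horizon scheme from the Diffusion Policy paper: condition on the last `n_obs_steps` observations, predict `horizon` actions, execute only the first `n_action_steps`, then replan. A toy sketch with a dummy policy and hypothetical values:

```python
from collections import deque

# Hypothetical values; a common Diffusion Policy setting. The dummy
# policy stands in for the diffusion model's sampled action sequence.
horizon, n_obs_steps, n_action_steps = 16, 2, 8

def dummy_policy(obs_window):
    """Stand-in for the model: returns `horizon` future actions."""
    return [obs_window[-1] + k for k in range(horizon)]

obs_history = deque(maxlen=n_obs_steps)  # rolling observation window
obs = 0.0
executed = []
for cycle in range(3):                   # three planning cycles
    obs_history.append(obs)
    plan = dummy_policy(list(obs_history))
    for a in plan[:n_action_steps]:      # execute only a prefix of the plan
        executed.append(a)
        obs = a                          # pretend the env tracks the action
```

Larger `n_action_steps` means fewer (expensive) diffusion sampling calls per episode; smaller values replan more often and react faster to disturbances.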

**Switching between joint-position and end-effector control:**

- Joint-position control: set both `obs_space` and `act_space` to `joint_pos`.
- End-effector control: set both `obs_space` and `act_space` to `ee`. You may use `delta_ee=1` for delta mode or `delta_ee=0` for absolute positioning.
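As a quick illustration of what `delta_ee` toggles, here is a sketch with made-up end-effector positions (rotation is ignored; the real pipeline also handles orientation):

```python
import numpy as np

# delta_ee=0: actions are absolute target poses.
# delta_ee=1: actions are per-step changes in pose.
ee_traj = np.array([[0.3, 0.0, 0.5],
                    [0.3, 0.1, 0.5],
                    [0.4, 0.1, 0.4]])    # hypothetical absolute xyz targets

delta_actions = np.diff(ee_traj, axis=0)  # the delta_ee=1 encoding

def replay_deltas(start, deltas):
    """Recover absolute targets by integrating deltas from the start pose."""
    return start + np.cumsum(deltas, axis=0)

recovered = replay_deltas(ee_traj[0], delta_actions)
```

Delta actions are centered near zero regardless of where the task happens in the workspace, which often makes them easier to normalize and learn; absolute actions avoid drift from accumulated integration error.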

Adjust relevant configuration parameters in:

- `roboverse_learn/il/dp/config/.yaml`