# Diffusion Policy
## Installation

```bash
cd roboverse_learn/algorithms/diffusion_policy
pip install -e .
cd ../../../
pip install pandas wandb
```
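To confirm the editable install succeeded, you can run a quick import check from the repository root (this assumes the package installs under the name `diffusion_policy`; adjust if your setup differs):

```bash
# Sanity check: should print the package location without raising ImportError
# (assumes the editable install registers the package as `diffusion_policy`).
python -c "import diffusion_policy; print(diffusion_policy.__file__)"
```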
## Option 1: Two Steps, Pre-processing and Training

### Data Preparation

`data2zarr_dp.py` converts the metadata stored by the `collect_demo` script into Zarr format for efficient dataloading. This script can handle both joint-position and end-effector action and observation spaces.

Command:
```bash
python roboverse_learn/algorithms/data2zarr_dp.py \
  --task_name <task_name> \
  --expert_data_num <expert_data_num> \
  --metadata_dir <metadata_dir> \
  --action_space <action_space> \
  --observation_space <observation_space>
```
| Argument | Description | Example |
|---|---|---|
| `task_name` | Name of the task | `CloseBoxFrankaL0` |
| `expert_data_num` | Number of expert demonstrations to process | `100` |
| `metadata_dir` | Path to the directory containing demonstration metadata saved by `collect_demo` | `roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka` |
| `action_space` | Type of action space to use (options: `joint_pos` or `ee`) | `joint_pos` |
| `observation_space` | Type of observation space to use (options: `joint_pos` or `ee`) | `joint_pos` |
| `delta_ee` | (optional) Delta control (0: absolute, 1: delta; default 0) | `0` |
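For example, converting the CloseBox demonstrations used later in this guide (the metadata path, task name, and demonstration count are taken from the `train_dp.sh` example below):

```bash
# Convert 100 CloseBox demonstrations to Zarr using joint-position obs/actions.
python roboverse_learn/algorithms/data2zarr_dp.py \
  --task_name CloseBoxFrankaL0 \
  --expert_data_num 100 \
  --metadata_dir roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka \
  --action_space joint_pos \
  --observation_space joint_pos
```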
### Training

`diffusion_policy/train.py` uses the generated Zarr data, which is stored in the `data_policy/` directory, to train the diffusion policy model. Note that the `policy_runner` arguments should match the arguments used in `data2zarr_dp.py`; they are also used in downstream evaluations.

Command:
```bash
python roboverse_learn/algorithms/diffusion_policy/train.py \
  --config-name=robot_dp.yaml \
  task.name=<task_name> \
  task.dataset.zarr_path=<zarr_path> \
  training.seed=<seed> \
  horizon=<horizon> \
  n_obs_steps=<n_obs_steps> \
  n_action_steps=<n_action_steps> \
  training.num_epochs=<num_epochs> \
  policy_runner.obs.obs_type=<obs_type> \
  policy_runner.action.action_type=<action_type> \
  policy_runner.action.delta=<delta> \
  training.device=<device>
```
| Argument | Description | Example |
|---|---|---|
| `task.name` | Name of the task | `CloseBoxFrankaL0` |
| `task.dataset.zarr_path` | Path to the Zarr dataset created in Step 1. This will be `{task_name}_{expert_data_num}.zarr` | `data_policy/CloseBoxFrankaL0_100.zarr` |
| `training.seed` | Random seed for reproducibility | `42` |
| `horizon` | Time horizon for the policy | |
| `n_obs_steps` | Number of observation steps | |
| `n_action_steps` | Number of action steps | |
| `training.num_epochs` | Number of training epochs | `200` |
| `policy_runner.obs.obs_type` | Observation type (`joint_pos` or `ee`) | `joint_pos` |
| `policy_runner.action.action_type` | Action type (`joint_pos` or `ee`) | `joint_pos` |
| `policy_runner.action.delta` | Delta control mode (0 for absolute, 1 for delta) | `0` |
| `training.device` | GPU device to use | `cuda:0` |
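For reference, a filled-in invocation for the CloseBox example used below might look like the following. The seed, horizon, and step counts here are illustrative placeholders rather than values prescribed by the repository; the Zarr path follows the `{task_name}_{expert_data_num}.zarr` convention.

```bash
# Illustrative values only: seed, horizon, n_obs_steps, and n_action_steps are placeholders.
python roboverse_learn/algorithms/diffusion_policy/train.py \
  --config-name=robot_dp.yaml \
  task.name=CloseBoxFrankaL0 \
  task.dataset.zarr_path=data_policy/CloseBoxFrankaL0_100.zarr \
  training.seed=42 \
  horizon=16 \
  n_obs_steps=2 \
  n_action_steps=8 \
  training.num_epochs=200 \
  policy_runner.obs.obs_type=joint_pos \
  policy_runner.action.action_type=joint_pos \
  policy_runner.action.delta=0 \
  training.device=cuda:0
```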
## Option 2: Run with a Single Command: train_dp.sh

We further wrap the data preparation and training into a single command: `train_dp.sh`. This ensures consistency between the parameters of data preparation and training, especially the action space, observation space, and data directory.

```bash
bash roboverse_learn/algorithms/diffusion_policy/train_dp.sh <metadata_dir> <task_name> <expert_data_num> <gpu_id> <num_epochs> <obs_space> <act_space> [<delta_ee>]
```
| Argument | Description |
|---|---|
| `metadata_dir` | Path to the directory containing demonstration metadata saved by `collect_demo` |
| `task_name` | Name of the task |
| `expert_data_num` | Number of expert demonstrations to use |
| `gpu_id` | ID of the GPU to use |
| `num_epochs` | Number of training epochs |
| `obs_space` | Observation space (`joint_pos` or `ee`) |
| `act_space` | Action space (`joint_pos` or `ee`) |
| `delta_ee` | Optional: Delta control (0: absolute, 1: delta; default 0) |
Example:

```bash
bash roboverse_learn/algorithms/diffusion_policy/train_dp.sh roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka CloseBoxFrankaL0 100 0 200 joint_pos joint_pos
```
**Important Parameter Overrides:**

- `horizon`, `n_obs_steps`, and `n_action_steps` are set directly in `train_dp.sh` and override the YAML configurations.
- All other parameters (e.g., batch size, number of epochs) can be manually adjusted in the YAML file: `roboverse_learn/algorithms/diffusion_policy/diffusion_policy/config/robot_dp.yaml`
- If you alter the observation and action spaces, verify the corresponding shapes in `roboverse_learn/algorithms/diffusion_policy/diffusion_policy/config/task/default_task.yaml`. Both end-effector control and the Franka joint space have dimension 9, but keep this in mind if using a different robot. A quick way to inspect the converted dataset's shapes is shown below.
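As a sanity check before editing the shape configuration, you can print the structure of the converted Zarr dataset and confirm the observation and action dimensions. The path below is illustrative and follows the `{task_name}_{expert_data_num}.zarr` naming convention; adjust it to your own dataset:

```bash
# Print the array hierarchy and shapes of the converted dataset
# (path is illustrative; substitute your own Zarr dataset).
python -c "import zarr; root = zarr.open('data_policy/CloseBoxFrankaL0_100.zarr', mode='r'); print(root.tree())"
```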
## Switching between Joint Position and End Effector Control

- **Joint Position Control**: Set both `obs_space` and `act_space` to `joint_pos`.
- **End Effector Control**: Set both `obs_space` and `act_space` to `ee`. You may use `delta_ee=1` for delta mode or `delta_ee=0` for absolute positioning.

Adjust the relevant configuration parameters in `roboverse_learn/algorithms/diffusion_policy/diffusion_policy/config/robot_dp.yaml`. An end-effector example command is shown below.
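For instance, to train with end-effector observations and actions using the same CloseBox data as the earlier example (the task name, demonstration count, and epoch count are reused from that example; the choice of delta mode here is illustrative):

```bash
# Same CloseBox example as above, but with end-effector obs/actions and delta control enabled.
bash roboverse_learn/algorithms/diffusion_policy/train_dp.sh roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka CloseBoxFrankaL0 100 0 200 ee ee 1
```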
## Evaluation

To deploy and evaluate the trained policy:

```bash
python roboverse_learn/eval.py --task CloseBox --algo diffusion_policy --num_envs <num_envs> --checkpoint_path <checkpoint_path>
```

Up to roughly 50 environments works on an RTX GPU. Ensure that `<checkpoint_path>` points to the trained model checkpoint file, i.e. `info/outputs/DP/...`