0. CloseBox#

Imitation Learning is a powerful tool for training agents to perform tasks by leveraging expert demonstrations, especially when reward signals are sparse or hard to specify.

In this example, we will collect demonstration trajectories of a robotic arm performing a task in simulation, then train an agent using Behavioral Cloning to mimic the expert’s motions and generalize to unseen targets.
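Conceptually, Behavioral Cloning reduces policy learning to supervised regression on (observation, action) pairs drawn from the demonstrations. Below is a minimal PyTorch sketch of one training step; the network, dimensions, and MSE loss are illustrative assumptions, not the diffusion-policy implementation used later in this example.

import torch
import torch.nn as nn

obs_dim, act_dim = 9, 9  # illustrative sizes, e.g. joint positions in and out

# A small MLP standing in for the learned policy.
policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_update(obs_batch, expert_actions):
    """One Behavioral Cloning step: regress the expert's action from the observation."""
    loss = nn.functional.mse_loss(policy(obs_batch), expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()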

Collecting Demonstrations#

To train the policy, we first need to collect demonstrations for it to learn from. This documentation uses CloseBox as an example task for collecting demonstration trajectories.

Task: Close Box#

python collect_demo.py \
  --task CloseBox \
  --robot franka \
  --sim isaaclab \
  --num_envs 1 \
  --headless True \
  --random.level 2 \
  --run_unfinished
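Once collection finishes, the trajectories are written under roboverse_demo/, in the directory consumed by the training command below. A quick Python check of the output layout (the path follows the pattern demo_<sim>/<task>-Level<level>/robot-<robot>, inferred from the training command; adjust it if you used different options):

from pathlib import Path

# Path pattern assumed from the training command in the next section.
demo_root = Path("roboverse_demo/demo_isaaclab/CloseBox-Level2/robot-franka")
for traj in sorted(demo_root.iterdir()):
    print(traj.name)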

Please check Collecting Demonstrations for more details on how to collect demonstrations and the available options for the command.

Converting the dataset format and training the policy#

After collecting the demo data, we convert it into Zarr folders so it can be read more easily when training the policy. In this demo, we use diffusion policy as an example; you can also use other algorithms under roboverse_learn/algorithms.

task_name=CloseBox
level=2
expert_data_num=100
gpu_id=0
num_epochs=200

bash roboverse_learn/algorithms/diffusion_policy/train_dp.sh roboverse_demo/demo_isaaclab/"${task_name}"-Level"${level}"/robot-franka "${task_name}"FrankaL"${level}" "${expert_data_num}" "${gpu_id}" "${num_epochs}" joint_pos joint_pos 0 1 1
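To sanity-check the converted dataset, you can open the Zarr folder directly with the zarr library. A sketch, assuming an output path named after the experiment; the group and array names shown in the comment follow the layout commonly used by diffusion-policy replay buffers and are assumptions, so inspect the printed tree to confirm:

import zarr

# Hypothetical path; point this at the Zarr folder produced by train_dp.sh.
root = zarr.open("path/to/CloseBoxFrankaL2.zarr", mode="r")
print(root.tree())  # print the actual group/array hierarchy
# A common diffusion-policy layout (an assumption; verify against the tree above):
# data/state, data/action, meta/episode_ends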

Evaluating the policy#

After training the policy, we can evaluate it in the environment. The following command runs the trained policy and saves the video output to tmp/${task_name}/${policy_name}/${robot_name}/. On a single RTX GPU, up to roughly 50 parallel environments typically work.

python roboverse_learn/eval.py --task CloseBox --algo diffusion_policy --num_envs <num_envs> --checkpoint_path <checkpoint_path>
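After the run finishes, you can list the saved videos from Python. The directory follows the tmp/${task_name}/${policy_name}/${robot_name}/ pattern noted above; the exact file names produced by eval.py are not assumed here:

from pathlib import Path

# Output directory per the pattern above, filled in for this example.
video_dir = Path("tmp/CloseBox/diffusion_policy/franka")
for video in sorted(video_dir.glob("*")):
    print(video)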

Example demos#

Demo video: CloseBox