# ACT

ACT (Action Chunking with Transformers) is a transformer-based conditional VAE (CVAE) policy that predicts a chunk of ~100 future actions at each step; overlapping chunks are combined by temporal ensembling into a single action per timestep. The algorithm was introduced in the ALOHA paper, and this integration uses the same implementation.
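
As an illustration, here is a minimal NumPy sketch of that temporal ensembling step (the function name and buffer layout are illustrative, not the repo's actual API). Following the ACT paper, every prediction previously made for the current timestep is kept, and the executed action is their exponentially weighted average, with older predictions weighted more heavily:

```python
import numpy as np

def temporal_ensemble(chunk_buffer, t, m=0.01):
    """Average all predictions made for timestep t (sketch after the ACT paper).

    chunk_buffer: list of (start_step, chunk) pairs, oldest first, where
    `chunk` has shape (chunk_size, action_dim) and was predicted at start_step.
    Weights w_i = exp(-m * i), with i = 0 the oldest prediction, so earlier
    predictions contribute more.
    """
    preds = [chunk[t - start]                    # this chunk's action for step t
             for start, chunk in chunk_buffer
             if start <= t < start + len(chunk)]
    weights = np.exp(-m * np.arange(len(preds)))
    weights /= weights.sum()
    return np.average(np.stack(preds), axis=0, weights=weights)
```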

## Installation

```bash
cd roboverse_learn/algorithms/act/detr
pip install -e .
cd ../../../../  # return to the repository root

pip install pandas wandb
```

## Option 1: Two Steps, Pre-processing and Training

### Data Preparation

`data2zarr_dp.py` converts the metadata saved by the `collect_demo` script into Zarr format for efficient dataloading. The script supports both joint-position (`joint_pos`) and end-effector (`ee`) action and observation spaces.

Command:

```bash
python roboverse_learn/algorithms/data2zarr_dp.py \
  --task_name <task_name> \
  --expert_data_num <expert_data_num> \
  --metadata_dir <metadata_dir> \
  --action_space <action_space> \
  --observation_space <observation_space> \
  --delta_ee <delta_ee>
```

| Argument | Description | Example |
|---|---|---|
| `task_name` | Name of the task | `CloseBoxFrankaL0_obs:joint_pos_act:joint_pos` |
| `expert_data_num` | Number of expert demonstrations to process | `100` |
| `metadata_dir` | Path to the directory containing demonstration metadata saved by `collect_demo` | `roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka` |
| `action_space` | Type of action space to use (`joint_pos` or `ee`) | `joint_pos` |
| `observation_space` | Type of observation space to use (`joint_pos` or `ee`) | `joint_pos` |
| `delta_ee` | (optional) Delta control (0: absolute, 1: delta; default 0) | `0` |
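
Example, using the values from the table above:

```bash
python roboverse_learn/algorithms/data2zarr_dp.py \
  --task_name CloseBoxFrankaL0_obs:joint_pos_act:joint_pos \
  --expert_data_num 100 \
  --metadata_dir roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka \
  --action_space joint_pos \
  --observation_space joint_pos \
  --delta_ee 0
```

To sanity-check the conversion, you can open the resulting store with the `zarr` library. A minimal sketch (the exact array names inside the store are whatever `data2zarr_dp.py` writes, so inspect the tree rather than assuming a layout):

```python
import zarr

# Open the converted dataset read-only and print its hierarchy
# (groups, arrays, shapes, and dtypes).
root = zarr.open("data_policy/CloseBoxFrankaL0_obs:joint_pos_act:joint_pos_100.zarr", mode="r")
print(root.tree())
```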

### Training

`roboverse_learn/algorithms/act/train.py` trains the ACT model on the generated Zarr data, which is stored in the `data_policy/` directory.

Command:

```bash
python -m roboverse_learn.algorithms.act.train \
  --task_name <task_name> \
  --num_episodes <num_episodes> \
  --dataset_dir <dataset_dir> \
  --policy_class <policy_class> \
  --kl_weight <kl_weight> \
  --chunk_size <chunk_size> \
  --hidden_dim <hidden_dim> \
  --batch_size <batch_size> \
  --dim_feedforward <dim_feedforward> \
  --num_epochs <num_epochs> \
  --lr <lr> \
  --state_dim <state_dim> \
  --seed <seed>
```

| Argument | Description | Example |
|---|---|---|
| `task_name` | Name of the task | `CloseBoxFrankaL0_obs:joint_pos_act:joint_pos` |
| `num_episodes` | Number of episodes in the dataset | `100` |
| `dataset_dir` | Path to the Zarr dataset created in the Data Preparation step | `data_policy/CloseBoxFrankaL0_obs:joint_pos_act:joint_pos_100.zarr` |
| `policy_class` | Policy class to use | `ACT` |
| `kl_weight` | Weight of the KL divergence loss | `10` |
| `chunk_size` | Number of actions per predicted chunk | `100` |
| `hidden_dim` | Hidden dimension of the transformer | `512` |
| `batch_size` | Batch size for training | `8` |
| `dim_feedforward` | Feedforward dimension of the transformer | `3200` |
| `num_epochs` | Number of training epochs | `2000` |
| `lr` | Learning rate | `1e-5` |
| `state_dim` | State dimension (action-space dimension) | `9` |
| `seed` | Random seed for reproducibility | `42` |
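
Example, using the values from the table above:

```bash
python -m roboverse_learn.algorithms.act.train \
  --task_name CloseBoxFrankaL0_obs:joint_pos_act:joint_pos \
  --num_episodes 100 \
  --dataset_dir data_policy/CloseBoxFrankaL0_obs:joint_pos_act:joint_pos_100.zarr \
  --policy_class ACT \
  --kl_weight 10 \
  --chunk_size 100 \
  --hidden_dim 512 \
  --batch_size 8 \
  --dim_feedforward 3200 \
  --num_epochs 2000 \
  --lr 1e-5 \
  --state_dim 9 \
  --seed 42
```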

## Option 2: Run with a Single Command (train_act.sh)

We further wrap data preparation and training into a single command, `train_act.sh`. This keeps the data-preparation and training parameters consistent, in particular the action space, observation space, and data directory.

```bash
bash roboverse_learn/algorithms/act/train_act.sh <metadata_dir> <task_name> <expert_data_num> <gpu_id> <num_epochs> <obs_space> <act_space> [<delta_ee>]
```

| Argument | Description | Example |
|---|---|---|
| `metadata_dir` | Path to the directory containing demonstration metadata saved by `collect_demo` | `roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka` |
| `task_name` | Name of the task | `CloseBoxFrankaL0` |
| `expert_data_num` | Number of expert demonstrations to use | `100` |
| `gpu_id` | ID of the GPU to use | `0` |
| `num_epochs` | Number of training epochs | `2000` |
| `obs_space` | Observation space (`joint_pos` or `ee`) | `joint_pos` |
| `act_space` | Action space (`joint_pos` or `ee`) | `joint_pos` |
| `delta_ee` | Optional: delta control (0: absolute, 1: delta; default 0) | `0` |

Example:

```bash
bash roboverse_learn/algorithms/act/train_act.sh roboverse_demo/demo_isaaclab/CloseBox-Level0/robot-franka CloseBoxFrankaL0 100 0 2000 joint_pos joint_pos
```

Important Parameter Overrides:

- Key hyperparameters, including `kl_weight` (set to 10), `chunk_size` (set to 100), `hidden_dim` (set to 512), `batch_size` (set to 8), `dim_feedforward` (set to 3200), and `lr` (set to 1e-5), are set directly in `train_act.sh`; see the loss sketch after this list for how `kl_weight` enters the objective.
- `state_dim` is set to 9 by default, which works for both the Franka joint space and the end-effector space.
- Notably, `chunk_size` is the most important parameter; it defaults to 100 actions per prediction.
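
For reference, here is a minimal sketch of how `kl_weight`, `chunk_size`, and `state_dim` enter the training objective. It follows the CVAE formulation in the ACT paper (L1 reconstruction of the action chunk plus a weighted KL regularizer on the latent); the function below is illustrative, not the repo's exact code:

```python
import torch
import torch.nn.functional as F

def act_loss(pred_chunk, target_chunk, mu, logvar, kl_weight=10.0):
    """ACT-style objective (sketch after the ACT paper, not the repo's exact code).

    pred_chunk, target_chunk: (batch, chunk_size, state_dim) action chunks.
    mu, logvar: CVAE latent mean and log-variance from the encoder.
    """
    # L1 reconstruction loss over the whole predicted action chunk.
    l1 = F.l1_loss(pred_chunk, target_chunk)
    # KL divergence between the latent posterior and a standard normal prior,
    # summed over latent dimensions and averaged over the batch.
    kl = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1)).mean()
    return l1 + kl_weight * kl
```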

## Switching between Joint Position and End Effector Control

- Joint Position Control: set both `obs_space` and `act_space` to `joint_pos`.
- End Effector Control: set both `obs_space` and `act_space` to `ee`. You may use `delta_ee=1` for delta mode or `delta_ee=0` for absolute positioning.
- Note that the original ACT paper uses a 14-dimensional action space; we modify the code so that the action dimensionality is passed to the training script as the `state_dim` parameter, which we default to 9 for the Franka (in joint space, this corresponds to 7 arm joints plus 2 gripper joints) in both joint and end-effector modes.

## Evaluation

To deploy and evaluate the trained policy:

```bash
python roboverse_learn/eval.py --task CloseBox --algo ACT --num_envs <num_envs> --checkpoint_path <save_directory>
```

On an RTX-class GPU, up to roughly 50 parallel environments work well. Ensure that `<save_directory>` points to the directory containing your trained model checkpoint, which is saved under `info/outputs/ACT/...`