# Multi-Agent (Bimanual) Datasets

RoboVerse's trajectory format is multi-agent native. A dataset file stores one
entry **per agent, keyed by robot name** — the same on-disk layout single-agent
datasets already use. A single-agent file is therefore just the one-key special
case, so existing datasets keep working unchanged.

This is what makes bimanual workflows expressible without inventing a parallel
format: two independent arms acting simultaneously, as in tasks modelled on
ManiSkill's `TwoRobotStackCube-v1`.

## On-disk format

A `*_v2.pkl` file is a dict keyed by robot name. Each agent maps to a list of
demos; each demo carries `init_state`, `actions`, and optional `states`. In the
two-agent example below, a `metadata` entry also records the agent count and
names:

```python
{
    "franka_left":  [{"init_state": {...}, "actions": [...], "states": None}, ...],
    "franka_right": [{"init_state": {...}, "actions": [...], "states": None}, ...],
    "metadata": {"num_agents": 2, "agents": ["franka_left", "franka_right"]},
}
```

Each agent's `init_state` lists that agent's robot entry plus any **shared
objects** (the cube both arms coordinate around). Per-agent actions are
namespaced as `{"dof_pos_target": {...}}`.
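The layout above can be checked mechanically. The following is a minimal sketch, not part of RoboVerse: `check_layout` and `toy_demo` are hypothetical helpers, and the joint name, the `cube` object, and the empty state payloads are placeholders for contents the format description leaves elided.

```python
import pickle
import tempfile
from pathlib import Path

REQUIRED_DEMO_KEYS = {"init_state", "actions"}  # "states" is optional


def check_layout(data: dict) -> list[str]:
    """Validate the name-keyed layout and return the agent names found."""
    agents = [key for key in data if key != "metadata"]
    meta = data.get("metadata")
    if meta is not None and sorted(meta["agents"]) != sorted(agents):
        raise ValueError(f"metadata lists {meta['agents']}, file holds {agents}")
    for name in agents:
        for i, demo in enumerate(data[name]):
            missing = REQUIRED_DEMO_KEYS - demo.keys()
            if missing:
                raise ValueError(f"{name} demo {i} is missing {sorted(missing)}")
    return agents


def toy_demo(robot_name: str) -> dict:
    # Hypothetical payload: real init_state and action contents are richer.
    return {
        "init_state": {robot_name: {}, "cube": {}},  # own robot + shared object
        "actions": [{"dof_pos_target": {"joint1": 0.0}}],
        "states": None,  # action-only demo
    }


names = ["franka_left", "franka_right"]
dataset = {name: [toy_demo(name)] for name in names}
dataset["metadata"] = {"num_agents": len(names), "agents": names}

# Round-trip through pickle, as a *_v2.pkl file would be stored.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_v2.pkl"
    path.write_bytes(pickle.dumps(dataset))
    loaded = pickle.loads(path.read_bytes())

print(check_layout(loaded))
```

A single-agent file passes the same check with one robot key, which is the one-key special case described at the top of this page.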

## Loading with `get_traj`

The canonical loader `metasim.utils.demo_util.get_traj` takes either a single
robot (single-agent, unchanged) or a **list of robots** (multi-agent):

```python
from metasim.utils.demo_util import get_traj

# `franka` is assumed to be an already-constructed robot config; make one renamed copy per arm.
robots = [franka.replace(name=n) for n in ["franka_left", "franka_right"]]
init_states, all_actions, all_states = get_traj("bimanual_handover_v2.pkl", robots)
```

Passing the list returns the **same three-tuple shape** as the single-agent
path, with every agent merged into each per-step dict:

- `init_states[d]["robots"]` holds every arm; `init_states[d]["objects"]` holds
  the shared objects once.
- `all_actions[d][t]` is `{robot_name: {"dof_pos_target": ...}}` for **all**
  agents at step `t` — exactly what `handler.set_dof_targets([...])` consumes.
- `all_states[d][t]` unions each agent's `robots`/`objects` (or is `None` for
  action-only demos).
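The merge described in the list above can be pictured as zipping per-agent action lists into one dict per step. This is an illustrative sketch only, not the body of `get_traj`: `merge_actions` is a hypothetical helper, the joint name `joint1` is a placeholder, and the equal-length check is an assumption about how mismatched trajectories would be handled.

```python
def merge_actions(per_agent_actions: dict[str, list[dict]]) -> list[dict]:
    """Zip per-agent action lists into one {robot_name: action} dict per step."""
    lengths = {len(actions) for actions in per_agent_actions.values()}
    if len(lengths) != 1:
        # Assumption: every agent must cover the same number of steps.
        raise ValueError(f"agents disagree on trajectory length: {sorted(lengths)}")
    (num_steps,) = lengths
    return [
        {name: actions[t] for name, actions in per_agent_actions.items()}
        for t in range(num_steps)
    ]


# Two steps of toy actions per arm, in the namespaced per-agent form.
per_agent = {
    "franka_left": [
        {"dof_pos_target": {"joint1": 0.0}},
        {"dof_pos_target": {"joint1": 0.1}},
    ],
    "franka_right": [
        {"dof_pos_target": {"joint1": 0.0}},
        {"dof_pos_target": {"joint1": -0.1}},
    ],
}

merged = merge_actions(per_agent)
print(merged[1])
```

Each element of `merged` has the `{robot_name: {"dof_pos_target": ...}}` shape that the second bullet describes for `all_actions[d][t]`.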

Because the shape is identical, the same replay and collection code paths drive
one arm or many. Multi-agent loading requires the v3 namespaced format
(`v2_as_v3=True`, the default). Passing `v2_as_v3=False` together with a robot
list raises an error, since namespacing is what keeps each agent's actions
indexed by name.

## Runnable examples

`get_started/8_multiagent_dataset.py` builds a coordinated two-Franka handover
trajectory, saves it as a real `*_v2.pkl`, loads it back through `get_traj`, and
replays both arms simultaneously to video:

```bash
MUJOCO_GL=egl python get_started/8_multiagent_dataset.py --sim mujoco
```

The same trajectory is also exposed as a registered task, so it replays through
the **canonical pipeline** (`scripts/advanced/replay_demo.py`), which passes the
full robot list to `get_traj` whenever a task declares more than one robot:

```bash
MUJOCO_GL=egl python scripts/advanced/replay_demo.py \
    --task bimanual.franka_handover --sim mujoco --headless
```

`get_started/9_maniskill_two_robot_stack_cube.py` does the same round trip with
**real ManiSkill data**: it fetches the official `TwoRobotStackCube-v1`
demonstrations, converts one episode into the name-keyed `*_v2` format, loads
both Panda arms through `get_traj`, and replays the recorded states on MuJoCo:

```bash
MUJOCO_GL=egl python get_started/9_maniskill_two_robot_stack_cube.py --sim mujoco
```

The ManiSkill `.h5` stores one articulation per agent
(`panda_wristcam-agent-0` / `-agent-1`) plus the shared cubes; converting it is
just a regrouping into one keyed entry per agent. Replay uses the recorded
**states** (kinematic playback) rather than open-loop action targets. The demos
were collected under SAPIEN's `pd_joint_delta_pos` controller, and closed-loop
contact dynamics do not transfer across simulators, so replaying the same
actions elsewhere would not reproduce the recorded motion. State replay is
therefore the faithful cross-sim view of the dataset.
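The argument for state replay can be illustrated with a deliberately simple toy model. This is not SAPIEN or MuJoCo code: `rollout_delta` is a hypothetical one-joint integrator, and the `tracking` factor is an assumed stand-in for any dynamics mismatch between simulators.

```python
def rollout_delta(deltas, q0=0.0, tracking=1.0):
    """Integrate delta-position commands; tracking < 1 models imperfect following."""
    q, traj = q0, [q0]
    for delta in deltas:
        q = q + tracking * delta
        traj.append(q)
    return traj


deltas = [0.1] * 10

recorded = rollout_delta(deltas)                 # source simulator: states as logged
open_loop = rollout_delta(deltas, tracking=0.9)  # same actions, different dynamics
state_replay = list(recorded)                    # kinematic playback of logged states

open_loop_error = abs(open_loop[-1] - recorded[-1])
state_replay_error = abs(state_replay[-1] - recorded[-1])
print(open_loop_error, state_replay_error)
```

Because delta commands accumulate, even a small per-step mismatch grows over the episode, while playback of the logged states matches the recording by construction.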

## Single-embodiment bimanual vs. two agents

Two distinct cases share this format:

- **Single-embodiment bimanual** (one URDF with two arms, e.g. ALOHA / RoboTwin
  AgileX) — one robot entry whose action dict spans all joints. See the
  [RoboTwin Integration](../integrations/robotwin.md).
- **Two independent agents** (two separate robot entities) — the case above,
  one keyed entry per agent.
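The difference between the two cases shows up in the shape of a single per-step action dict. The sketch below is illustrative: the robot name `aloha` and all joint names are hypothetical placeholders, and `joints_per_robot` is a helper written only for this comparison.

```python
# Single-embodiment bimanual: one robot entry whose action spans both arms' joints.
single_embodiment_step = {
    "aloha": {
        "dof_pos_target": {"left_joint1": 0.0, "right_joint1": 0.0},
    },
}

# Two independent agents: one keyed entry per robot, each with its own joints.
two_agent_step = {
    "franka_left": {"dof_pos_target": {"joint1": 0.0}},
    "franka_right": {"dof_pos_target": {"joint1": 0.0}},
}


def joints_per_robot(step: dict) -> dict[str, list[str]]:
    """List the joint names commanded for each robot entry in one step."""
    return {name: sorted(action["dof_pos_target"]) for name, action in step.items()}


print(joints_per_robot(single_embodiment_step))
print(joints_per_robot(two_agent_step))
```

In the two-agent case both arms can reuse the same joint names, because the robot name keeps each agent's targets separate.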

The single-embodiment bimanual case is demonstrated by
`get_started/10_robotwin_aloha_replay.py`, but note:

```{warning}
`get_started/10_robotwin_aloha_replay.py` is **experimental** and does **not
run out of the box** (unlike examples 8 and 9, which run from a clean MuJoCo
install). Before it can replay anything, a bridge pickle must be collected,
which needs a local RoboTwin clone, its ~3.74 GB asset pack, a separate
`robotwin` conda env, and a curobo build for the local GPU arch. The manipulated
object is rendered as a primitive-cube proxy (not the real mesh), and only joint
*motion*, not task *success*, has been confirmed. Treat it as a data-bridge
demo, not a benchmark.
```