MetaSim for RL#

Layers#

graph TD
    A[0 Simulator]
    B[1 Handler]
    C[2 Gym.Env]
    D[3 RL Framework or Benchmark]

    A --> B
    B --> C
    B --> D
    C --> D

Layer 0: Simulator#

DON’T USE! Higher layers must never call the simulator directly; going through Layer 1 is what keeps MetaSim cross-simulator.

Layer 1: Handler#

Corresponds to env.handler in MetaSim.

Interface:

  • get_observation()

  • get_reward()

  • get_success()

  • get_time_out()

  • get_termination()
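A minimal sketch of this interface as an abstract base class. This is a sketch only: the method names come from the list above, while the docstrings and per-env tensor shapes are assumptions based on the planned num_envs/num_obs/num_actions handler properties in the TODOs, not the actual MetaSim API.

from abc import ABC, abstractmethod

class BaseHandler(ABC):
    # Layer 1 sketch: only talks about WHAT the handler exposes,
    # not how any particular simulator implements it.

    @abstractmethod
    def get_observation(self):
        """Per-env observations, e.g. shape (num_envs, num_obs) (assumed)."""

    @abstractmethod
    def get_reward(self):
        """Per-env rewards, e.g. shape (num_envs,) (assumed)."""

    @abstractmethod
    def get_success(self):
        """Per-env success flags, e.g. shape (num_envs,) (assumed)."""

    @abstractmethod
    def get_time_out(self):
        """Per-env time-out flags, e.g. shape (num_envs,) (assumed)."""

    @abstractmethod
    def get_termination(self):
        """Per-env termination flags, e.g. shape (num_envs,) (assumed)."""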

Current implementation:

  • v1 [Feishi]

  • v2 [Boshi]

Layer 2: Gym.Env#

Corresponds to env in MetaSim.

Layer 2 is a lightweight wrapper around Layer 1.

Interface:

  • reset()

  • step()

  • render()

  • close()
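A minimal sketch of such a wrapper against the Gym 0.26 API (the target of 2.2 below). Only the get_*() methods are the documented Layer 1 interface; the handler calls reset(), set_action(), simulate(), and close() are assumed names for illustration.

import gym

class MetaSimEnv(gym.Env):
    # Layer 2 sketch: a thin wrapper that only talks to the Layer 1 handler,
    # never to the Layer 0 simulator.

    def __init__(self, handler):
        self.handler = handler

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.handler.reset()  # assumed reset entry point
        return self.handler.get_observation(), {}

    def step(self, action):
        self.handler.set_action(action)  # assumed stepping calls
        self.handler.simulate()
        obs = self.handler.get_observation()
        reward = self.handler.get_reward()
        terminated = self.handler.get_termination()
        truncated = self.handler.get_time_out()
        return obs, reward, terminated, truncated, {}

    def render(self):
        pass  # delegate to the handler as needed

    def close(self):
        self.handler.close()  # assumed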

Current implementation:

  • 2.1 Env: corresponds to env but does not support Gym.Env! [Deprecated?]

  • 2.2 Gym.Env 0.26: specialized for HumanoidBench [Haozhe, Yutong]

  • 2.3 Chaoyi’s Gym.Env if necessary [Chaoyi]

  • 2.4 Gym.Env 1.0: merge all of the above implementations (final goal) [TODO]

Layer 3: RL Framework#

  • 3.1 RSL_RL’s VecEnv + RSL_RL [Chaoyi]

  • 3.2 StableBaseline3 integration [Yutong]
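For 3.2, once a Layer 2 wrapper conforms to the Gym API (including observation_space and action_space, omitted from the sketch above), the StableBaseline3 side can be as simple as the following. MetaSimEnv and handler are the assumed names from the earlier sketch; the algorithm and step count are placeholders.

from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = MetaSimEnv(handler)  # Layer 2 wrapper from the sketch above (assumed)
check_env(env)             # verify the wrapper satisfies the Gym API
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)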

Layer 4: RL Tasks on existing benchmarks#

Interface: depends on the specific RL framework or benchmark

Current implementation:

  • 4.1 HumanoidBench [Haozhe, Yutong]

TODOs#

  • Layer 1 Handler

    • [ ] v2: Boshi implements MetaSim’s handler using get_states() and set_states() as the core functions (see the sketch after this list).

    • [ ] v1:

      • [ ] Feishi and Charlie ensure the interface is aligned across IsaacSim and MuJoCo.

      • [ ] Feishi supports the new handler properties (num_envs, num_obs, num_actions; anything else?)

  • Layer 2: Env

    • [ ] This layer serves Layer 3; at the current stage, everyone can implement their own Gym.Env.

    • [ ] (Optional) Eventually, merge all of the above implementations into one that supports Gym.Env 1.0.

  • Layer 3: RL Framework or Benchmark

    • This layer builds on Layer 1 or Layer 2 (except 2.1, which is deprecated!), never on Layer 0; this is what guarantees cross-simulator compatibility.

    • [ ] Yutong and Haozhe implement HumanoidBench

    • [ ] Chaoyi implement RSL_RL

      • [ ] Chaoyi will start from PickCube + IsaacLab Handler
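For the v2 handler item above, a sketch of the intended get_states()/set_states() round trip. The state layout is documented in the next section; "cube" is a hypothetical object name, and the write-back semantics of set_states() are assumed.

states = env.handler.get_states()            # one state dict per environment
states[0]["cube"]["pos"] = [0.1, 0.2, 0.3]   # "cube" is hypothetical; see the state structure below
env.handler.set_states(states)               # assumed to write the modified state back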

Get reward from states (TODO: needs update?)#

states = env.handler.get_states()

states is a list containing the state of each environment. For MuJoCo, the list has a single element.

The structure of a single state is as follows:

{
    "{object_name}": {
        "pos": [x, y, z],
        "rot": [w, x, y, z],
        "vel": [x, y, z],
        "ang_vel": [x, y, z],
        // below are optional fields for articulated objects
        "dof_pos": {
            "{joint_name}": float,
            ...
        },
        "dof_vel": {
            "{joint_name}": float,
            ...
        },
        // below are optional fields for articulated objects that have actuators
        "dof_pos_target": {
            "{joint_name}": float,
            ...
        },
        "dof_vel_target": {
            "{joint_name}": float,
            ...
        },
        "dof_torque": {
            "{joint_name}": float,
            ...
        }
    },
    // bodies are the parts of articulated objects, linked by joints
    "metasim_body_{body_name}": {  // the prefix "metasim_body_" is for compatibility with the old code
        "pos": [x, y, z],
        "rot": [w, x, y, z],
        "vel": [x, y, z],
        "ang_vel": [x, y, z],
        "com": [x, y, z],
        "com_vel": [x, y, z],
    },
    // sites are defined in the task metacfg by a base (either an object root or a body root) and a relative pose
    "metasim_site_{site_name}": {  // the prefix "metasim_site_" is for compatibility with the old code
        "pos": [x, y, z],
        "rot": [w, x, y, z],
        "vel": [x, y, z], # Optional, only valid if sensor data is present
        "ang_vel": [x, y, z], # Optional, only valid if sensor data is present
    }
}
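As an example, a reaching-style reward computed from this structure. This is a sketch only: the object name "cube" and the site name "metasim_site_ee" are hypothetical, and whether the reward lives here or inside get_reward() is up to the task.

import numpy as np

def reach_reward(state, obj_name="cube", site_name="metasim_site_ee"):
    # Negative distance between a (hypothetical) end-effector site and an object,
    # both read from the per-env state dict documented above.
    obj_pos = np.asarray(state[obj_name]["pos"])
    ee_pos = np.asarray(state[site_name]["pos"])
    return -np.linalg.norm(obj_pos - ee_pos)

states = env.handler.get_states()            # one state dict per environment
rewards = [reach_reward(s) for s in states]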