MetaSim for RL#
Layers#
graph TD
    A[0 Simulator]
    B[1 Handler]
    C[2 Gym.Env]
    D[3 RL Framework or Benchmark]
    A --> B
    B --> C
    B --> D
    C --> D
Layer 0: Simulator#
Do not use this layer directly; build on Layer 1 or above instead.
Layer 1: Handler#
Corresponds to env.handler in MetaSim.
Interface:
get_observation()
get_reward()
get_success()
get_time_out()
get_termination()
…
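As a rough illustration of the interface above, the method names can be collected into a structural type that any handler implementation is checked against. This is only a sketch: the class names `HandlerLike` and `DummyHandler` are hypothetical, the return types are assumptions (the source does not specify them), and the toy logic inside `DummyHandler` is invented purely to make the example runnable.

```python
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class HandlerLike(Protocol):
    """Hypothetical structural type listing the Layer 1 methods named above.

    Return types are left as Any because the source does not specify them.
    """

    def get_observation(self) -> Any: ...
    def get_reward(self) -> Any: ...
    def get_success(self) -> Any: ...
    def get_time_out(self) -> Any: ...
    def get_termination(self) -> Any: ...


class DummyHandler:
    """Toy stand-in used only to exercise the protocol; not a real simulator."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.step_count = 0

    def get_observation(self) -> List[float]:
        # The observation is just the current step counter.
        return [float(self.step_count)]

    def get_reward(self) -> float:
        # Sparse reward: 1.0 once the toy success condition holds.
        return 1.0 if self.get_success() else 0.0

    def get_success(self) -> bool:
        return self.step_count >= 2

    def get_time_out(self) -> bool:
        return self.step_count >= self.max_steps

    def get_termination(self) -> bool:
        # Assumption for this toy: the episode terminates on success.
        return self.get_success()
```

Because the protocol is `runtime_checkable`, `isinstance(DummyHandler(), HandlerLike)` only verifies that the methods exist; it does not check their signatures or return values.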
Current implementation:
v1 [Feishi]
v2 [Boshi]
Layer 2: Gym.Env#
Corresponds to env in MetaSim.
Layer 2 is a lightweight wrapper around Layer 1.
Interface:
reset()
step()
render()
close()
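One way to picture the "lightweight wrapper" idea is an environment whose four methods do nothing except forward to the handler. The sketch below assumes the five-element `step()` return used by Gym 0.26 (observation, reward, terminated, truncated, info). `ToyHandler` and `ThinEnv` are hypothetical names, and the handler's `reset()` and `apply_action()` methods are assumptions: the source does not say how a handler is reset or advanced.

```python
from typing import Any, Dict, List, Tuple


class ToyHandler:
    """Toy stand-in for a Layer 1 handler; not a real MetaSim class."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.position = 0.0
        self.step_count = 0

    # Hypothetical helpers: these two names are assumptions, not MetaSim API.
    def reset(self) -> None:
        self.position = 0.0
        self.step_count = 0

    def apply_action(self, action: float) -> None:
        self.position += action
        self.step_count += 1

    # Methods named in the Layer 1 interface list.
    def get_observation(self) -> List[float]:
        return [self.position]

    def get_reward(self) -> float:
        # Negative distance to a fixed target at position 1.0.
        return -abs(1.0 - self.position)

    def get_success(self) -> bool:
        return abs(1.0 - self.position) < 1e-6

    def get_time_out(self) -> bool:
        return self.step_count >= self.max_steps

    def get_termination(self) -> bool:
        return self.get_success()


class ThinEnv:
    """Gym-style wrapper that only forwards calls to the handler."""

    def __init__(self, handler: ToyHandler) -> None:
        self.handler = handler

    def reset(self) -> Tuple[List[float], Dict[str, Any]]:
        self.handler.reset()
        return self.handler.get_observation(), {}

    def step(self, action: float) -> Tuple[List[float], float, bool, bool, Dict[str, Any]]:
        self.handler.apply_action(action)
        obs = self.handler.get_observation()
        reward = self.handler.get_reward()
        terminated = self.handler.get_termination()
        truncated = self.handler.get_time_out()
        info = {"success": self.handler.get_success()}
        return obs, reward, terminated, truncated, info

    def render(self) -> None:
        print(f"position={self.handler.position:.2f}")

    def close(self) -> None:
        # Nothing to release in this toy example.
        pass
```

Keeping all task logic in the handler means the wrapper contains no simulator-specific code, so the same wrapper could in principle sit on top of any handler that offers these methods.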
Current implementation:
2.1 Env: corresponds to env; does not support Gym.Env! [Deprecated?]
2.2 Gym.Env 0.26: specialized for HumanoidBench [Haozhe, Yutong]
2.3 Chaoyi’s Gym.Env if necessary [Chaoyi]
2.4 Gym.Env 1.0: merge all above implementations, final goal [TODO]
Layer 3: RL Framework#
3.1 RSL_RL’s VecEnv + RSL_RL [Chaoyi]
3.2 Stable-Baselines3 integration [Yutong]
Layer 4: RL Tasks on existing benchmarks#
Interface: depends on the specific RL framework or benchmark
Current implementation:
4.1 HumanoidBench [Haozhe, Yutong]
TODOs#
Layer 1 Handler
[ ] v2: Boshi implements MetaSim's handler using get_states() and set_states() as core functions.
[ ] v1:
    [ ] Feishi and Charlie ensure the interface is aligned across IsaacSim and MuJoCo.
    [ ] Feishi supports new handler properties (num_envs, num_obs, num_actions; any others?).
Layer 2: Env
[ ] To serve Layer 3, everyone can implement their own Gym.Env at the current stage.
[ ] (Optional) We will eventually merge all of the above implementations into one that supports Gym.Env 1.0.
Layer 3: RL Framework or Benchmark
This layer builds on Layer 1 or Layer 2 (except 2.1, which is deprecated), never on Layer 0. This guarantees that it works across simulators.
[ ] Yutong and Haozhe implement HumanoidBench
[ ] Chaoyi implements RSL_RL
[ ] Chaoyi will start from PickCube + IsaacLab Handler
Get reward from states (TODO: need update?)#
states = env.handler.get_states()
states is a list containing one state per environment. For MuJoCo, it has a single element.
The structure of a single state is as follows:
{
    "{object_name}": {
        "pos": [x, y, z],
        "rot": [w, x, y, z],
        "vel": [x, y, z],
        "ang_vel": [x, y, z],
        // below are optional fields for articulated objects
        "dof_pos": {
            "{joint_name}": float,
            ...
        },
        "dof_vel": {
            "{joint_name}": float,
            ...
        },
        // below are optional fields for articulated objects that have actuators
        "dof_pos_target": {
            "{joint_name}": float,
            ...
        },
        "dof_vel_target": {
            "{joint_name}": float,
            ...
        },
        "dof_torque": {
            "{joint_name}": float,
            ...
        }
    },
    // bodies are parts of articulated objects, linked by joints
    "metasim_body_{body_name}": {  // the prefix "metasim_body_" is for compatibility with the old code
        "pos": [x, y, z],
        "rot": [w, x, y, z],
        "vel": [x, y, z],
        "ang_vel": [x, y, z],
        "com": [x, y, z],
        "com_vel": [x, y, z]
    },
    // sites are defined in the task metacfg by a base (either an object root or a body root) and a relative pose
    "metasim_site_{site_name}": {  // the prefix "metasim_site_" is for compatibility with the old code
        "pos": [x, y, z],
        "rot": [w, x, y, z],
        "vel": [x, y, z],  // optional, only valid if sensor data is present
        "ang_vel": [x, y, z]  // optional, only valid if sensor data is present
    }
}
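Given the state layout above, a reward can be computed from the states alone. The sketch below is an assumed example, not MetaSim's own reward code: it uses the negative Euclidean distance between an object's position and a site's position, and the object name "cube", the site name "goal", and the helper names `distance_reward` and `rewards_from_states` are all hypothetical. The hand-written `states` list stands in for the result of env.handler.get_states().

```python
import math
from typing import Any, Dict, List

# One state maps an object/body/site key to its fields, as described above.
State = Dict[str, Dict[str, Any]]


def distance_reward(state: State, obj: str, site: str) -> float:
    """Negative Euclidean distance between an object and a site."""
    obj_pos = state[obj]["pos"]
    # Sites are stored under the "metasim_site_" prefix.
    site_pos = state[f"metasim_site_{site}"]["pos"]
    return -math.dist(obj_pos, site_pos)


def rewards_from_states(states: List[State], obj: str, site: str) -> List[float]:
    """Compute one reward per environment from the list of states."""
    return [distance_reward(state, obj, site) for state in states]


# Hand-written stand-in for env.handler.get_states() with a single environment.
states = [
    {
        "cube": {
            "pos": [0.0, 0.0, 0.0],
            "rot": [1.0, 0.0, 0.0, 0.0],
            "vel": [0.0, 0.0, 0.0],
            "ang_vel": [0.0, 0.0, 0.0],
        },
        "metasim_site_goal": {
            "pos": [3.0, 4.0, 0.0],
            "rot": [1.0, 0.0, 0.0, 0.0],
        },
    }
]

rewards = rewards_from_states(states, "cube", "goal")
```

Since the function reads only the "pos" fields, it does not depend on which simulator produced the states, which fits the goal of keeping reward logic above Layer 0.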