State#

MetaSim uses a state to describe the state of a simulation environment at a given time. A unified state representation is the key to aligning different simulators.

state Structure#

The state is a dictionary that contains the following keys:

  • objects: a dictionary that maps each object name to its state object_state.

  • robots: a dictionary that maps each robot name to its state robot_state.

  • cameras: a dictionary that maps each camera name to its state camera_state.
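
For example, the nested dictionaries of a single state can be indexed directly. The sketch below assumes state is one entry of the list returned by handler.get_states(); the names cube, franka and camera0 match the example state shown later on this page.

# Reading fields from one state dictionary (illustrative sketch).
cube_pos = state["objects"]["cube"]["pos"]            # tensor([x, y, z])
franka_qpos = state["robots"]["franka"]["dof_pos"]    # {'joint_name': qpos, ...}
rgb = state["cameras"]["camera0"]["rgb"]              # tensor of shape [H, W, 3]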

object_state Structure#

The object_state is a dictionary that contains the following keys:

  • pos: the position of the object, as a tensor([x, y, z]).

  • rot: the quaternion of the object, as a tensor([w, x, y, z]).

  • vel: the linear velocity of the object, as a tensor([vx, vy, vz]).

  • ang_vel: the angular velocity of the object, as a tensor([wx, wy, wz]).

The following keys are optional and only used for articulation objects:

  • dof_pos: the joint positions, as a dict {'joint1': qpos1, 'joint2': qpos2, ...}.

  • dof_vel: the joint velocities, as a dict {'joint1': qvel1, 'joint2': qvel2, ...}.

  • body: a dictionary that maps body link name to its state body_state.

The body_state is a dictionary with pos, rot, vel and ang_vel keys, defined as above but for the corresponding body link.
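
As a sketch (not a MetaSim API), the pos and rot entries of an object_state can be combined into a 4x4 homogeneous transform. The helper below assumes a unit quaternion stored in [w, x, y, z] order, as defined above.

import torch

def object_pose_to_matrix(object_state: dict) -> torch.Tensor:
    """Illustrative helper: build a 4x4 world-frame transform from an
    object_state, assuming rot is a unit quaternion tensor([w, x, y, z])."""
    w, x, y, z = object_state["rot"].tolist()
    # Standard unit-quaternion to rotation-matrix conversion.
    rotation = torch.tensor([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    transform = torch.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = object_state["pos"]
    return transform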

robot_state Structure#

The robot_state contains all the keys of an articulation object described above. In addition, it contains the following keys:

  • dof_pos_target: the target joint positions, as a dict {'joint1': qpos1, 'joint2': qpos2, ...}.

  • dof_vel_target: the target joint velocities, as a dict {'joint1': qvel1, 'joint2': qvel2, ...}.
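
As a sketch, a new dof_pos_target can be built by copying the current joint positions and overriding a subset of joints. The helper below is illustrative (not a MetaSim API), and the joint names in the usage comment match the Franka example later on this page.

def make_dof_pos_target(robot_state: dict, overrides: dict) -> dict:
    """Illustrative helper: start from the current joint positions and
    override the commanded joints (e.g. to close the gripper)."""
    target = dict(robot_state["dof_pos"])  # copy the current joint positions
    target.update(overrides)               # override the selected joints
    return target

# Usage sketch, assuming `state` is one entry of handler.get_states():
# new_target = make_dof_pos_target(
#     state["robots"]["franka"],
#     {"panda_finger_joint1": 0.0, "panda_finger_joint2": 0.0},
# )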

camera_state Structure#

The camera_state is a dictionary that contains the following keys:

  • rgb: the RGB image, as a tensor of shape [H, W, 3].

  • depth: the depth image, as a tensor of shape [H, W].

  • pos: the position of the camera, as a tensor([x, y, z]). (not supported yet)

  • look_at: the look-at point of the camera, as a tensor([x, y, z]). (not supported yet)

  • intrinsic: the intrinsic matrix of the camera, as a tensor of shape [3, 3]. (not supported yet)

  • extrinsic: the extrinsic matrix of the camera, as a tensor of shape [4, 4]. (not supported yet)
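
As a sketch, the rgb and depth tensors can be converted into displayable images. The value ranges assumed below (rgb as a float tensor in [0, 1], arbitrary depth units) are illustrative assumptions, not guarantees made by MetaSim.

import numpy as np
import torch

def camera_state_to_images(camera_state: dict):
    """Illustrative conversion of a camera_state into displayable images.
    Assumes rgb is a float tensor in [0, 1]; skip the scaling if the
    simulator already returns uint8 values."""
    rgb = (camera_state["rgb"].clamp(0.0, 1.0) * 255).to(torch.uint8).cpu().numpy()
    depth = camera_state["depth"].cpu().numpy()
    # Normalize depth to [0, 1] for visualization, guarding against a constant image.
    span = depth.max() - depth.min()
    depth_vis = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)
    return rgb, depth_vis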

State Example#

Here is a feasible example of a state:

import torch
from torch import tensor

H, W = 480, 640  # illustrative image resolution (H, W are the camera image height and width)

state = {
    "objects": {
        "cube": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
        },
        "box": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": { "box_joint": 0.0 },
            "dof_vel": { "box_joint": 0.0 },
            "body": {
                "box_lid": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
                "box_body": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
            }
        },
    },
    "robots": {
        "franka": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
            "dof_pos_target": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel_target": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
        }
    },
    "cameras": {
        "camera0": {
            "rgb": torch.zeros((H, W, 3)),
            "depth": torch.zeros((H, W)),
        }
    },
}

Using state with Functions#

MetaSim APIs always handle states as a list of state, with one entry per environment; the length of the list is the number of environments. The observations returned by env.reset() and env.step() are also unified to this list-of-states format, as shown in the sketch after the list below.

  • handler.get_states() -> list[State]

  • handler.set_states(states: list[State]) -> None

  • env.reset(init_states: list[State]) -> tuple[list[State], Extra]

  • env.step(actions: list[Action]) -> tuple[list[State], list[Reward], list[Success], list[TimeOut], Extra]
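
Putting these together, a typical rollout operates on per-environment lists. The sketch below assumes an already-constructed env and handler, initial states init_states, and a hypothetical policy that maps one state to one action; constructing these objects is outside the scope of this page.

# Rollout sketch, assuming `env`, `handler`, `init_states` and `policy` exist.
states, extra = env.reset(init_states)                  # one state per environment

for _ in range(100):
    actions = [policy(state) for state in states]       # hypothetical per-env policy
    states, rewards, successes, timeouts, extra = env.step(actions)

# States can also be read and written directly through the handler.
states = handler.get_states()   # list[State]
handler.set_states(states)      # edit and restore states if needed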