# State

MetaSim uses `state` to describe the state of a simulation environment at a given time. A unified state representation is the key to aligning different simulators.

## `state` Structure

The state is a dictionary that contains the following keys:
- `objects`: a dictionary that maps each object name to its state, `object_state`.
- `robots`: a dictionary that maps each robot name to its state, `robot_state`.
- `cameras`: a dictionary that maps each camera name to its state, `camera_state`.

### `object_state` Structure

The `object_state` is a dictionary that contains the following keys:
- `pos`: the position of the object, as a `tensor([x, y, z])`.
- `rot`: the orientation of the object as a quaternion, as a `tensor([w, x, y, z])`.
- `vel`: the linear velocity of the object, as a `tensor([vx, vy, vz])`.
- `ang_vel`: the angular velocity of the object, as a `tensor([wx, wy, wz])`.

The following keys are optional and only used for articulated objects:
- `dof_pos`: the joint positions, as a dict `{'joint1': qpos1, 'joint2': qpos2, ...}`.
- `dof_vel`: the joint velocities, as a dict `{'joint1': qvel1, 'joint2': qvel2, ...}`.
- `body`: a dictionary that maps each body link name to its state, `body_state`. The `body_state` is a dictionary that contains the `pos`, `rot`, `vel` and `ang_vel` keys. They are defined as above, but describe the body link rather than the whole object.

### `robot_state` Structure

The `robot_state` contains all of the above keys of an articulated object. In addition, it contains the following keys:
- `dof_pos_target`: the target joint positions, as a dict `{'joint1': qpos1, 'joint2': qpos2, ...}`.
- `dof_vel_target`: the target joint velocities, as a dict `{'joint1': qvel1, 'joint2': qvel2, ...}`.

### `camera_state` Structure

The `camera_state` is a dictionary that contains the following keys:
- `rgb`: the RGB image, as a tensor of shape `[H, W, 3]`.
- `depth`: the depth image, as a tensor of shape `[H, W]`.
- `pos`: the position of the camera, as a `tensor([x, y, z])`. (not supported yet)
- `look_at`: the look-at point of the camera, as a `tensor([x, y, z])`. (not supported yet)
- `intrinsic`: the intrinsic matrix of the camera, as a tensor of shape `[3, 3]`. (not supported yet)
- `extrinsic`: the extrinsic matrix of the camera, as a tensor of shape `[4, 4]`. (not supported yet)

### State Example

Here is a feasible example of a state:

```python
import torch
from torch import tensor

H, W = 480, 640  # illustrative image size; use your camera's actual resolution

state = {
    "objects": {
        # A rigid object only needs pose and velocity.
        "cube": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
        },
        # An articulated object also carries joint and body-link states.
        "box": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": {
                "box_joint": 0.0
            },
            "dof_vel": {
                "box_joint": 0.0
            },
            "body": {
                "box_lid": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
                "box_body": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
            }
        },
    },
    "robots": {
        "franka": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
            "dof_pos_target": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel_target": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
        }
    },
    "cameras": {
        "camera0": {
            "rgb": torch.zeros((H, W, 3)),
            "depth": torch.zeros((H, W)),
        }
    },
}
```

## `state` with Functions

MetaSim APIs always deal with `states`, a list of `state` dictionaries. The length of the list is the number of environments. The observations returned by `env.reset()` and `env.step()` follow the same `states` format.

- `handler.get_states() -> list[State]`
- `handler.set_states(states: list[State]) -> None`
- `env.reset(init_states: list[State]) -> tuple[list[State], Extra]`
- `env.step(actions: list[Action]) -> tuple[list[State], list[Reward], list[Success], list[TimeOut], Extra]`
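
To make the list-of-states convention concrete, the following is a minimal sketch of how a `states` list could be built and sanity-checked before being passed to something like `handler.set_states`. The helpers `check_pose`, `check_state` and `make_states` are hypothetical names introduced here for illustration and are not part of MetaSim. The sketch also assumes that all three top-level keys are always present, and it uses plain Python lists in place of tensors so that it runs without `torch`.

```python
import copy

# Expected vector length for the pose/velocity keys described above.
POSE_KEYS = {"pos": 3, "rot": 4, "vel": 3, "ang_vel": 3}


def check_pose(name, entry):
    """Raise ValueError if a pose/velocity key is missing or has the wrong length."""
    for key, size in POSE_KEYS.items():
        if key not in entry:
            raise ValueError(f"{name}: missing '{key}'")
        if len(entry[key]) != size:
            raise ValueError(f"{name}: '{key}' should have {size} elements")


def check_state(state):
    """Check the top-level layout of a single state (hypothetical helper)."""
    for group in ("objects", "robots", "cameras"):
        if group not in state:
            raise ValueError(f"missing top-level key '{group}'")
    for group in ("objects", "robots"):
        for name, entry in state[group].items():
            check_pose(name, entry)
            # Body links, when present, carry the same four keys.
            for link, link_state in entry.get("body", {}).items():
                check_pose(f"{name}/{link}", link_state)


def make_states(template, num_envs):
    """Build a `states` list with one independent copy of `template` per environment."""
    return [copy.deepcopy(template) for _ in range(num_envs)]


# Plain lists stand in for tensors so the sketch runs without torch.
template = {
    "objects": {
        "cube": {
            "pos": [0.0, 0.0, 0.0],
            "rot": [1.0, 0.0, 0.0, 0.0],
            "vel": [0.0, 0.0, 0.0],
            "ang_vel": [0.0, 0.0, 0.0],
        }
    },
    "robots": {},
    "cameras": {},
}

states = make_states(template, num_envs=4)
states[1]["objects"]["cube"]["pos"][2] = 0.5  # lift the cube in environment 1 only
for state in states:
    check_state(state)
```

Deep-copying the template keeps each environment's state independent, so editing one entry of `states` does not silently change the others.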