State#
MetaSim uses state to describe a simulation environment at a given time.
A unified state is the key to aligning different simulators.
state Structure#
The state is a dictionary that contains the following keys:
objects: a dictionary that maps each object name to its object_state.
robots: a dictionary that maps each robot name to its robot_state.
cameras: a dictionary that maps each camera name to its camera_state.
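As a rough illustration of this top-level layout, the sketch below builds an empty state and lists the entities stored in it. The helpers `make_empty_state` and `list_entities` are hypothetical names introduced here; they are not part of MetaSim.

```python
# Hypothetical helpers for illustration; they are not part of the MetaSim API.

TOP_LEVEL_KEYS = ("objects", "robots", "cameras")


def make_empty_state() -> dict:
    """Return a state with the three top-level keys and no entities."""
    return {key: {} for key in TOP_LEVEL_KEYS}


def list_entities(state: dict) -> dict:
    """Map each top-level key to the sorted names stored under it."""
    return {key: sorted(state[key]) for key in TOP_LEVEL_KEYS}


state = make_empty_state()
state["objects"]["cube"] = {}     # would hold an object_state
state["robots"]["franka"] = {}    # would hold a robot_state
state["cameras"]["camera0"] = {}  # would hold a camera_state

print(list_entities(state))
```

The per-entity dictionaries are left empty here on purpose; only the outer structure is being shown.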
object_state Structure#
The object_state is a dictionary that contains the following keys:
pos: the position of the object, as a tensor([x, y, z]).
rot: the quaternion of the object, as a tensor([w, x, y, z]).
vel: the linear velocity of the object, as a tensor([vx, vy, vz]).
ang_vel: the angular velocity of the object, as a tensor([wx, wy, wz]).
The following keys are optional and are only used for articulated objects:
dof_pos: the joint positions, as a dict {'joint1': qpos1, 'joint2': qpos2, ...}.
dof_vel: the joint velocities, as a dict {'joint1': qvel1, 'joint2': qvel2, ...}.
body: a dictionary that maps each body link name to its body_state.
The body_state is a dictionary that contains the pos, rot, vel and ang_vel keys. The definitions are the same as above, but apply to the body link.
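The key layout above can be checked mechanically. The following is a minimal sketch, assuming the values are PyTorch tensors as in the descriptions; `check_pose_keys` and `check_object_state` are hypothetical helpers, not MetaSim functions.

```python
import torch

# Expected tensor lengths for the required keys described above.
REQUIRED_SHAPES = {"pos": 3, "rot": 4, "vel": 3, "ang_vel": 3}


def check_pose_keys(entry: dict) -> list[str]:
    """Return a list of problems with pos/rot/vel/ang_vel (empty if none)."""
    problems = []
    for key, length in REQUIRED_SHAPES.items():
        if key not in entry:
            problems.append(f"missing key: {key}")
        elif tuple(entry[key].shape) != (length,):
            problems.append(f"{key} should have shape ({length},)")
    return problems


def check_object_state(obj: dict) -> list[str]:
    """Check an object_state, including the optional per-link body states."""
    problems = check_pose_keys(obj)
    for link_name, body_state in obj.get("body", {}).items():
        problems += [f"body[{link_name}]: {p}" for p in check_pose_keys(body_state)]
    return problems


cube = {
    "pos": torch.tensor([0.0, 0.0, 0.0]),
    "rot": torch.tensor([1.0, 0.0, 0.0, 0.0]),  # identity quaternion, [w, x, y, z]
    "vel": torch.zeros(3),
    "ang_vel": torch.zeros(3),
}
print(check_object_state(cube))  # a well-formed entry yields an empty list
```

The check only covers key presence and tensor shape; it does not verify that `rot` is a unit quaternion or that `dof_pos` and `dof_vel` name the same joints.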
robot_state Structure#
The robot_state contains all of the keys of an articulated object described above, plus the following keys:
dof_pos_target: the target joint positions, as a dict {'joint1': qpos1, 'joint2': qpos2, ...}.
dof_vel_target: the target joint velocities, as a dict {'joint1': qvel1, 'joint2': qvel2, ...}.
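Because current and target joint values share the same joint names, they can be compared key by key. The sketch below computes a per-joint position error; `dof_tracking_error` is a hypothetical helper, and the robot_state shown omits the pose and velocity keys for brevity.

```python
def dof_tracking_error(robot_state: dict) -> dict:
    """Return target minus current position for every joint that has a target."""
    current = robot_state["dof_pos"]
    target = robot_state["dof_pos_target"]
    return {joint: target[joint] - current[joint] for joint in target}


# Partial robot_state: only the two keys this helper reads are included.
robot_state = {
    "dof_pos": {"panda_joint1": 0.0, "panda_joint2": -0.5},
    "dof_pos_target": {"panda_joint1": 0.25, "panda_joint2": -0.75},
}
errors = dof_tracking_error(robot_state)
print(errors)
```

The same pattern applies to `dof_vel` and `dof_vel_target`.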
camera_state Structure#
The camera_state is a dictionary that contains the following keys:
rgb: the RGB image, as a tensor of shape [H, W, 3].
depth: the depth image, as a tensor of shape [H, W].
pos: the position of the camera, as a tensor([x, y, z]). (not supported yet)
look_at: the look-at point of the camera, as a tensor([x, y, z]). (not supported yet)
intrinsic: the intrinsic matrix of the camera, as a tensor of shape [3, 3]. (not supported yet)
extrinsic: the extrinsic matrix of the camera, as a tensor of shape [4, 4]. (not supported yet)
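To make the image shapes concrete, here is a small sketch that builds a camera_state with only the two supported keys and reads the resolution back. It assumes PyTorch tensors and that `rgb` and `depth` share the same H and W; both helper names are hypothetical.

```python
import torch


def make_blank_camera_state(height: int, width: int) -> dict:
    """Build a camera_state holding all-zero rgb and depth images."""
    return {
        "rgb": torch.zeros((height, width, 3)),
        "depth": torch.zeros((height, width)),
    }


def image_size(camera_state: dict) -> tuple[int, int]:
    """Read (H, W) from rgb and confirm that depth has the same resolution."""
    height, width, channels = camera_state["rgb"].shape
    if channels != 3:
        raise ValueError("rgb must have shape [H, W, 3]")
    if tuple(camera_state["depth"].shape) != (height, width):
        raise ValueError("depth must have shape [H, W]")
    return height, width


camera_state = make_blank_camera_state(height=4, width=6)
print(image_size(camera_state))
```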
State Example#
Here is a valid example of a state:
{
    "objects": {
        "cube": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
        },
        "box": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": { "box_joint": 0.0 },
            "dof_vel": { "box_joint": 0.0 },
            "body": {
                "box_lid": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
                "box_body": {
                    "pos": tensor([0.0, 0.0, 0.0]),
                    "rot": tensor([1.0, 0.0, 0.0, 0.0]),
                    "vel": tensor([0.0, 0.0, 0.0]),
                    "ang_vel": tensor([0.0, 0.0, 0.0]),
                },
            }
        },
    },
    "robots": {
        "franka": {
            "pos": tensor([0.0, 0.0, 0.0]),
            "rot": tensor([1.0, 0.0, 0.0, 0.0]),
            "vel": tensor([0.0, 0.0, 0.0]),
            "ang_vel": tensor([0.0, 0.0, 0.0]),
            "dof_pos": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
            "dof_pos_target": {
                "panda_joint1": 0.0,
                "panda_joint2": -0.785398,
                "panda_joint3": 0.0,
                "panda_joint4": -2.356194,
                "panda_joint5": 0.0,
                "panda_joint6": 1.570796,
                "panda_joint7": 0.785398,
                "panda_finger_joint1": 0.04,
                "panda_finger_joint2": 0.04,
            },
            "dof_vel_target": {
                "panda_joint1": 0.0,
                "panda_joint2": 0.0,
                "panda_joint3": 0.0,
                "panda_joint4": 0.0,
                "panda_joint5": 0.0,
                "panda_joint6": 0.0,
                "panda_joint7": 0.0,
                "panda_finger_joint1": 0.0,
                "panda_finger_joint2": 0.0,
            },
        }
    },
    "cameras": {
        "camera0": {
            "rgb": torch.zeros((H, W, 3)),
            "depth": torch.zeros((H, W)),
        }
    },
}
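A nested state like the one above can be awkward to inspect or log. One possible approach, sketched here, is to flatten it into slash-separated paths. `flatten_state` is a hypothetical helper, and plain lists stand in for tensors so the snippet has no dependencies.

```python
def flatten_state(node: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into {"a/b/c": leaf}; non-dict values are leaves."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_state(value, path))
        else:
            flat[path] = value
    return flat


# A much smaller state than the full example; lists replace tensors here.
state = {
    "objects": {
        "box": {
            "pos": [0.0, 0.0, 0.0],
            "dof_pos": {"box_joint": 0.0},
        },
    },
    "robots": {},
    "cameras": {},
}
flat = flatten_state(state)
for path, value in flat.items():
    print(path, value)
```

Empty dictionaries such as `robots` and `cameras` in this snippet contribute no entries, because only leaf values are recorded.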
state with Functions#
MetaSim APIs always handle states as a list of state dictionaries, one per environment, so the length of the list equals the number of environments. The observations returned by env.reset() and env.step() also use this list-of-states format.
handler.get_states() -> list[State]
handler.set_states(states: list[State]) -> None
env.reset(init_states: list[State]) -> tuple[list[State], Extra]
env.step(actions: list[Action]) -> tuple[list[State], list[Reward], list[Success], list[TimeOut], Extra]
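The list-of-states convention can be illustrated with a toy stand-in for a handler. `InMemoryHandler` below is hypothetical and only mimics the `get_states` / `set_states` signatures listed above; returning deep copies is an assumption of this sketch, not documented MetaSim behavior.

```python
import copy


class InMemoryHandler:
    """Toy stand-in for a simulator handler; stores one state per environment."""

    def __init__(self, num_envs: int):
        self._states = [
            {"objects": {}, "robots": {}, "cameras": {}} for _ in range(num_envs)
        ]

    def get_states(self) -> list:
        # Assumption: hand out copies so callers cannot mutate internal storage.
        return copy.deepcopy(self._states)

    def set_states(self, states: list) -> None:
        if len(states) != len(self._states):
            raise ValueError("expected one state per environment")
        self._states = copy.deepcopy(states)


handler = InMemoryHandler(num_envs=2)
states = handler.get_states()       # list of length num_envs
states[1]["objects"]["cube"] = {"pos": [0.0, 0.0, 0.5]}  # edit environment 1 only
handler.set_states(states)
print(len(handler.get_states()))
```

Index `i` in the list always refers to environment `i`, so per-environment edits are made by indexing into the list before passing it back.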