# SimplerEnv → RoboVerse 1:1 integration
SimplerEnv (Evaluating Real-World Robot Manipulation Policies in Simulation, Li et al. 2024) is the standard real-to-sim evaluation suite for generalist manipulation policies (RT-1, RT-1-X, RT-2-X, Octo, OpenVLA …). It ships 25 tasks built on ManiSkill2-real2sim + SAPIEN across two embodiments — Google Robot (21 tasks: pick coke can ×4, pick object, move near ×3, open/close drawer ×8, place-in-drawer ×5) and WidowX / Bridge (4 tasks: spoon-on-towel, carrot-on-plate, stack cube, eggplant-in-basket) — with a real-image greenscreen overlay so the rendered observation matches the real-robot eval distribution.
RoboVerse integrates SimplerEnv on two tracks:

- **MetaSim-native (primary).** Every one of the 25 tasks is rebuilt entirely through the MetaSim API. Each asset (robot, articulated cabinet, mesh / convex objects, ground, mounted cameras) is declared in a `ScenarioCfg`, stepped through the SAPIEN 2 handler, wrapped in `BaseTaskEnv`, and registered with `@register_task`. The SimplerEnv control / grasp / overlay logic is vendored under `_native/` with zero imports of the upstream `simpler_env` / `mani_skill2_real2sim` packages. A meta-path block and a zero-import grep test verify this, so the upstream clone can be deleted.
- **Passthrough (optional).** A transparent `gymnasium.make` forward to `simpler_env.make` when the clone is installed; bitwise 1:1 by construction.
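The meta-path block can be pictured with the standard `importlib.abc.MetaPathFinder` hook. The sketch below is an assumption about how such a guard might look, not the project's actual test code; `UpstreamImportBlocker` and `install_blocker` are hypothetical names. It raises `ImportError` for `simpler_env`, `mani_skill2_real2sim` and any of their submodules, so a stray transitive import fails loudly instead of silently pulling in the upstream clone.

```python
import importlib.abc
import sys

# Top-level upstream packages the native track must never import.
BLOCKED_PACKAGES = ("simpler_env", "mani_skill2_real2sim")


class UpstreamImportBlocker(importlib.abc.MetaPathFinder):
    """Meta-path finder that refuses the blocked packages and all of their submodules."""

    def __init__(self, blocked=BLOCKED_PACKAGES):
        self.blocked = frozenset(blocked)

    def find_spec(self, fullname, path=None, target=None):
        if fullname.split(".", 1)[0] in self.blocked:
            raise ImportError(f"upstream package {fullname!r} is blocked")
        return None  # not a blocked name: let the regular finders resolve it


def install_blocker(blocked=BLOCKED_PACKAGES):
    """Insert the blocker first on sys.meta_path and return it (remove it to undo)."""
    blocker = UpstreamImportBlocker(blocked)
    # Modules already cached in sys.modules never reach sys.meta_path, so evict them.
    for name in list(sys.modules):
        if name.split(".", 1)[0] in blocker.blocked:
            del sys.modules[name]
    sys.meta_path.insert(0, blocker)
    return blocker
```

In a test, one would call `install_blocker()` before importing the native package, exercise make / reset / step, and then undo it with `sys.meta_path.remove(blocker)`.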
## Status

| Capability | Result | Where |
|---|---|---|
| MetaSim-native tasks | 25 / 25 built via | |
| Obs matches upstream | initial render vs | |
| Real-policy success (RT-1/Octo) | 13/25 solved on MetaSim-native envs (RT-1 13/21 · Octo 0/4) | |
| Zero-upstream / deletable | meta-path block + grep test green; runs with the clone absent | |
| Passthrough | bitwise 1:1 by construction (forwards | |
| Registration | 25 gym ids ( | |

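The grep half of the zero-upstream row can be approximated with a static scan. This sketch swaps a literal grep for Python's `ast` module, so commented-out imports and string mentions are not flagged; `upstream_imports` and `scan_tree` are hypothetical helpers. A static scan cannot see dynamic imports such as `importlib.import_module("simpler_env")`, which is why it pairs well with a runtime meta-path block.

```python
import ast
from pathlib import Path

BLOCKED_PACKAGES = ("simpler_env", "mani_skill2_real2sim")


def upstream_imports(source, blocked=BLOCKED_PACKAGES):
    """Return (lineno, module) for every absolute import of a blocked package in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            modules = [node.module]  # relative imports (level > 0) stay inside the package
        else:
            continue
        hits.extend((node.lineno, m) for m in modules if m.split(".", 1)[0] in blocked)
    return hits


def scan_tree(root, blocked=BLOCKED_PACKAGES):
    """Map each .py file under `root` to its upstream imports; clean files are omitted."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = upstream_imports(path.read_text(encoding="utf-8"), blocked)
        if hits:
            report[str(path)] = hits
    return report
```

A zero-import test would then simply assert that `scan_tree(...)` over the vendored `_native/` directory returns an empty dict.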
MetaSim core changes: four backward-compatible extensions to the SAPIEN 2 handler, all opt-in: mesh `RigidObjCfg` loading, mounted-camera intrinsics, PhysX `SceneConfig` overrides, and primitive `fix_base_link` / `collision_enabled`. Existing scenarios are untouched, because the new code paths only activate when the new optional fields are set.
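The opt-in pattern can be illustrated with a config whose new fields default to `None`, so the legacy build path is taken unless a scenario sets them. Everything here (`PrimitiveCfgSketch`, `build_steps`, the step names) is hypothetical and only mirrors the idea; it is not MetaSim's real `RigidObjCfg` or handler code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PrimitiveCfgSketch:
    """Hypothetical primitive-object config; the last two fields are the new opt-in ones."""

    name: str
    half_size: Tuple[float, float, float] = (0.025, 0.025, 0.025)
    fix_base_link: Optional[bool] = None      # None = keep the legacy behaviour
    collision_enabled: Optional[bool] = None  # None = keep the legacy behaviour


def build_steps(cfg: PrimitiveCfgSketch) -> List[str]:
    """List the (hypothetical) builder steps a handler would run for `cfg`."""
    steps = [f"add_box:{cfg.name}", "build_dynamic"]  # legacy path, unchanged
    if cfg.fix_base_link:
        steps[-1] = "build_kinematic"  # new path, only when explicitly requested
    if cfg.collision_enabled is False:
        steps.append("disable_collision")  # new path, only when explicitly disabled
    return steps
```

The point of the design is that an old scenario, which never mentions the new fields, produces exactly the legacy step list.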
Fidelity is measured against the upstream `simpler_env` observation (initial render, same task + seed + station). Coke / pick / move-near are bitwise identical; drawer, place and widowx differ by 0.01, 1.94 and 0.0 mean absolute difference on the [0, 255] pixel scale. The small residuals come from SAPIEN contact-solver nondeterminism (~1.8e-6) plus GPU edge anti-aliasing; place's 1.94 also includes the settling object and upstream's random `urdf_version` recolor variant. This upstream-observation check supersedes the earlier native-vs-reference parity, which could not detect station / overlay / cabinet-recolor deviations because both sides shared them.
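Assuming "mean-abs" means the mean absolute difference over every pixel and RGB channel of the two uint8 renders, the metric is mean(|x - y|) over all H·W·3 values. A minimal sketch (`mean_abs_diff` is a hypothetical name) that widens to a signed type first, since subtracting uint8 arrays directly wraps around:

```python
import numpy as np


def mean_abs_diff(native: np.ndarray, upstream: np.ndarray) -> float:
    """Mean absolute per-value difference between two uint8 renders, on the [0, 255] scale."""
    if native.shape != upstream.shape:
        raise ValueError(f"shape mismatch: {native.shape} vs {upstream.shape}")
    # Widen before subtracting: in uint8, 0 - 3 would wrap to 253 instead of -3.
    diff = np.abs(native.astype(np.int16) - upstream.astype(np.int16))
    return float(diff.mean())
```

A value of 0.0 here means bitwise-identical renders; the 1.94 quoted for place is well under one gray level per value on average.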
## Environment setup

SimplerEnv pins `SAPIEN==2.2.2`, `numpy==1.24.4` and `mani_skill2_real2sim==0.5.3`, which conflict with the default RoboVerse env, so install it in a dedicated conda env. The MetaSim-native track needs only SAPIEN 2 plus the migrated `roboverse_data` assets (no upstream package); the passthrough track additionally needs the upstream clone.
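```bash
conda create -n simpler python=3.10 -y && conda activate simpler
pip install sapien==2.2.2 numpy==1.24.4
# native track: + the SimplerEnv assets under roboverse_data/assets/simpler_env/
#   (downloaded from HuggingFace on first use if no local checkout is found)
# passthrough track (optional); ManiSkill2_real2sim is a git submodule of SimplerEnv:
git clone --recurse-submodules https://github.com/simpler-env/SimplerEnv.git
pip install -e SimplerEnv/ManiSkill2_real2sim
pip install -e SimplerEnv
```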
Verified on an RTX 5090 (sm_120) with the NVIDIA Vulkan ICD: SAPIEN 2 rendering works, with no sm_120 compatibility blocker.
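A quick way to confirm the dedicated env actually holds these pins is `importlib.metadata` from the standard library. The helper below is a hypothetical sketch; it assumes the installed distribution names match the package names quoted above. For the native track, pass only the `sapien` and `numpy` pins, since `mani_skill2_real2sim` is needed only for passthrough.

```python
from importlib import metadata

# Pins quoted above for the dedicated SimplerEnv conda env.
PINS = {"sapien": "2.2.2", "numpy": "1.24.4", "mani_skill2_real2sim": "0.5.3"}


def pin_mismatches(pins=PINS, lookup=metadata.version):
    """Return {dist: (wanted, found)} for every pin that is missing or at another version.

    `found` is None when the distribution is not installed at all.
    """
    bad = {}
    for dist, wanted in pins.items():
        try:
            found = lookup(dist)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            bad[dist] = (wanted, found)
    return bad
```

An empty result means the environment matches; `lookup` is injectable only so the check can be exercised without the real packages.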
## Usage

```python
import gymnasium as gym

import roboverse_pack.tasks.simpler_env  # noqa: F401  auto-registers SimplerEnv/<task> + simpler.<task>

# SAPIEN allows only one live env per process: run each variant below in its own process.

# (1) MetaSim-native via gym
env = gym.make("SimplerEnv/google_robot_pick_coke_can")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()

# (2) MetaSim-native via the MetaSim task registry
from metasim.task.registry import get_task_class

task = get_task_class("simpler.widowx_stack_cube")()

# (3) optional upstream passthrough (requires the SimplerEnv clone)
from roboverse_pack.tasks.simpler_env import register_simpler_env_passthrough

register_simpler_env_passthrough(prefix="SimplerEnvPassthrough/")
env = gym.make("SimplerEnvPassthrough/google_robot_pick_coke_can")
```
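Both naming schemes point at the same task class. The sketch below illustrates that idea with a toy decorator; `register_simpler_task`, `TASK_REGISTRY` and `GYM_IDS` are hypothetical stand-ins, not MetaSim's `@register_task` or gymnasium's registry.

```python
# Hypothetical stand-ins, NOT MetaSim's @register_task or gymnasium's registry.
TASK_REGISTRY = {}  # "simpler.<task>"    -> task class
GYM_IDS = {}        # "SimplerEnv/<task>" -> task class


def register_simpler_task(task_name):
    """Decorator that registers one task class under both naming schemes."""
    entries = ((TASK_REGISTRY, f"simpler.{task_name}"), (GYM_IDS, f"SimplerEnv/{task_name}"))

    def decorator(cls):
        # Check both tables first so a clash leaves neither table half-updated.
        for table, key in entries:
            if key in table:
                raise KeyError(f"{key!r} is already registered")
        for table, key in entries:
            table[key] = cls
        return cls

    return decorator


@register_simpler_task("widowx_stack_cube")
class WidowXStackCube:
    """Placeholder for a real task class."""
```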
## Reproduce: run commands

All commands assume the dedicated `simpler` env and CPU-only JAX (`JAX_PLATFORMS=cpu`), which keeps the GPU free for SAPIEN rendering.
```bash
# --- native registration + 25/25 make/reset/step with the upstream clone DELETED ---
JAX_PLATFORMS=cpu python scripts/verify_native_registration.py

# --- exhaustive MetaSim-native vs upstream-equivalent parity over all 25 tasks ---
# (per-task subprocess-isolated; writes /tmp/metasim_full_parity.json)
JAX_PLATFORMS=cpu python scripts/spike_metasim_full_parity.py

# --- 25 side-by-side 1:1 galleries [native | reference | diff x30] ---
JAX_PLATFORMS=cpu python scripts/render_metasim_1to1_gallery.py

# --- tests ---
python -m pytest tests/test_simpler_env_native.py -v       # registry(25) + zero-import + smoke
python -m pytest tests/test_simpler_env_passthrough.py -v  # upstream forward (needs the clone)
```
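The per-task subprocess isolation used by the parity script can be sketched with `subprocess` alone, which sidesteps SAPIEN's process-global engine by giving every task a fresh interpreter. `run_isolated`, `EXAMPLE_WORKER` and the worker protocol (task id in `sys.argv[1]`, one JSON object on the last stdout line) are assumptions for illustration, not the interface of `scripts/spike_metasim_full_parity.py`.

```python
import json
import subprocess
import sys

# Hypothetical worker for the real env (not executed here): task id in, one JSON line out.
EXAMPLE_WORKER = """
import json, sys
import gymnasium as gym
import roboverse_pack.tasks.simpler_env  # registers SimplerEnv/<task>
env = gym.make("SimplerEnv/" + sys.argv[1])
env.reset(seed=0)
env.step(env.action_space.sample())
env.close()
print(json.dumps({"ok": True, "task": sys.argv[1]}))
"""


def run_isolated(task_ids, worker_src, timeout=600.0, out_path=None):
    """Run `worker_src` once per task, each in a fresh interpreter, and collect the results.

    The worker gets the task id as sys.argv[1] and must print one JSON object on its last
    stdout line. A hung worker raises subprocess.TimeoutExpired after `timeout` seconds.
    """
    results = {}
    for task_id in task_ids:
        proc = subprocess.run(
            [sys.executable, "-c", worker_src, task_id],
            capture_output=True, text=True, timeout=timeout,
        )
        lines = proc.stdout.strip().splitlines()
        if proc.returncode == 0 and lines:
            results[task_id] = json.loads(lines[-1])
        else:
            results[task_id] = {"ok": False, "returncode": proc.returncode,
                                "stderr_tail": proc.stderr[-2000:]}
    if out_path is not None:
        with open(out_path, "w", encoding="utf-8") as fh:
            json.dump(results, fh, indent=2)
    return results
```

A crash in one task (for example a SAPIEN segfault) is recorded as that task's failure instead of taking down the whole sweep.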
## Real-policy rollouts (RT-1 / Octo)

These rollouts use real pretrained policies driving our MetaSim-native envs (`SimplerEnv/<task>`) and record the first episode that the task's own success checker marks as solved: an actual policy manipulating the objects, not a scripted motion. Google Robot tasks use RT-1 (`rt_1_tf_trained_for_000400120`); WidowX / Bridge tasks use Octo-base (`policy_setup=widowx_bridge`), matching what SimplerEnv evaluates per embodiment.

Solved 13/25 on the MetaSim-native envs (RT-1 13/21 on Google Robot, Octo 0/4 on WidowX). Each entry lists the policy and, for solved tasks, the step count; ✗ marks tasks the policy did not solve in this run. This reflects the policy, not the integration: the long-horizon place-in-drawer tasks and the Bridge tasks are the known-hard cases.
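The "first solved episode" protocol can be written as a plain rollout loop over the gymnasium step API. This is a sketch under assumptions: `policy` is any callable mapping an observation to an action, success is read from `info[success_key]` (the key name is a guess), and episodes are seeded with their index. None of this is taken from the actual rollout script.

```python
def first_solved_episode(env, policy, max_episodes=10, max_steps=200, success_key="success"):
    """Return (episode_index, step_count) for the first episode whose success flag fires, else None.

    Assumptions: gymnasium-style reset/step, `policy(obs) -> action`, success reported as
    info[success_key]; each episode is seeded with its index so runs are repeatable.
    """
    for episode in range(max_episodes):
        obs, info = env.reset(seed=episode)
        for step in range(1, max_steps + 1):
            obs, _reward, terminated, truncated, info = env.step(policy(obs))
            if info.get(success_key, False):
                return episode, step
            if terminated or truncated:
                break  # episode over without success: try the next seed
    return None
```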
### Google Robot: pick coke can (4)

- `google_robot_pick_coke_can` · RT-1 ✓ 10 steps
- `google_robot_pick_horizontal_coke_can` · RT-1 ✓ 12 steps
- `google_robot_pick_vertical_coke_can` · RT-1 ✓ 15 steps
- `google_robot_pick_standing_coke_can` · RT-1 ✓ 10 steps
### Google Robot: pick object & move near (4)

- `google_robot_pick_object` · RT-1 ✓ 20 steps
- `google_robot_move_near` · RT-1 ✓ 13 steps
- `google_robot_move_near_v0` · RT-1 ✓ 13 steps
- `google_robot_move_near_v1` · RT-1 ✓ 13 steps
### Google Robot: open drawer (4)

- `google_robot_open_drawer` · RT-1 ✓ 43 steps
- `google_robot_open_top_drawer` · RT-1 ✗ (not solved this run)
- `google_robot_open_middle_drawer` · RT-1 ✓ 38 steps
- `google_robot_open_bottom_drawer` · RT-1 ✗ (not solved this run)
### Google Robot: close drawer (4)

- `google_robot_close_drawer` · RT-1 ✓ 27 steps
- `google_robot_close_top_drawer` · RT-1 ✓ 27 steps
- `google_robot_close_middle_drawer` · RT-1 ✓ 25 steps
- `google_robot_close_bottom_drawer` · RT-1 ✗ (not solved this run)
### Google Robot: place in closed drawer (5)

- `google_robot_place_in_closed_drawer` · RT-1 ✗ (not solved this run)
- `google_robot_place_in_closed_top_drawer` · RT-1 ✗ (not solved this run)
- `google_robot_place_in_closed_middle_drawer` · RT-1 ✗ (not solved this run)
- `google_robot_place_in_closed_bottom_drawer` · RT-1 ✗ (not solved this run)
- `google_robot_place_apple_in_closed_top_drawer` · RT-1 ✗ (not solved this run)
### WidowX / Bridge: put-on (4)

- `widowx_spoon_on_towel` · Octo ✗ (not solved this run)
- `widowx_carrot_on_plate` · Octo ✗ (not solved this run)
- `widowx_stack_cube` · Octo ✗ (not solved this run)
- `widowx_put_eggplant_in_basket` · Octo ✗ (not solved this run)
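The headline numbers can be re-derived from the lists above. The tuples below are transcribed from this page (steps are `None` for ✗), and `tally` is a hypothetical helper:

```python
from collections import Counter

# (task, policy, steps to success, or None for ✗), transcribed from the lists above.
RESULTS = [
    ("google_robot_pick_coke_can", "RT-1", 10),
    ("google_robot_pick_horizontal_coke_can", "RT-1", 12),
    ("google_robot_pick_vertical_coke_can", "RT-1", 15),
    ("google_robot_pick_standing_coke_can", "RT-1", 10),
    ("google_robot_pick_object", "RT-1", 20),
    ("google_robot_move_near", "RT-1", 13),
    ("google_robot_move_near_v0", "RT-1", 13),
    ("google_robot_move_near_v1", "RT-1", 13),
    ("google_robot_open_drawer", "RT-1", 43),
    ("google_robot_open_top_drawer", "RT-1", None),
    ("google_robot_open_middle_drawer", "RT-1", 38),
    ("google_robot_open_bottom_drawer", "RT-1", None),
    ("google_robot_close_drawer", "RT-1", 27),
    ("google_robot_close_top_drawer", "RT-1", 27),
    ("google_robot_close_middle_drawer", "RT-1", 25),
    ("google_robot_close_bottom_drawer", "RT-1", None),
    ("google_robot_place_in_closed_drawer", "RT-1", None),
    ("google_robot_place_in_closed_top_drawer", "RT-1", None),
    ("google_robot_place_in_closed_middle_drawer", "RT-1", None),
    ("google_robot_place_in_closed_bottom_drawer", "RT-1", None),
    ("google_robot_place_apple_in_closed_top_drawer", "RT-1", None),
    ("widowx_spoon_on_towel", "Octo", None),
    ("widowx_carrot_on_plate", "Octo", None),
    ("widowx_stack_cube", "Octo", None),
    ("widowx_put_eggplant_in_basket", "Octo", None),
]


def tally(results):
    """Return {policy: (solved, attempted)}."""
    solved, attempted = Counter(), Counter()
    for _task, policy, steps in results:
        attempted[policy] += 1
        if steps is not None:
            solved[policy] += 1
    return {policy: (solved[policy], attempted[policy]) for policy in attempted}


print(tally(RESULTS))  # {'RT-1': (13, 21), 'Octo': (0, 4)}
```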
## Implementation fidelity (obs matches upstream)

Separately from policy capability, the MetaSim-native env is verified to reproduce the upstream `simpler_env` observation. For the same task + seed + station, the initial render matches upstream to within about 2 mean absolute difference on the [0, 255] scale across all six families (coke / pick / move-near bitwise; drawer 0.01, place 1.94, widowx 0.0). This is a stronger check than the earlier native-vs-reference parity, because it catches station / overlay / asset-recolor deviations that an internal self-comparison cannot. Regenerate the side-by-side [ MetaSim-native | reference | abs-diff x30 ] clips with `scripts/render_metasim_1to1_gallery.py`; the per-task obs-vs-upstream check is in `scripts/render_policy_gallery.py`.
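The [ native | reference | abs-diff x30 ] layout can be composed with NumPy. `side_by_side` is a hypothetical helper that assumes both renders are same-shape uint8 H×W×3 arrays; the gain of 30 makes a 1/255 residual visible as 30/255, with larger differences clipped to 255.

```python
import numpy as np


def side_by_side(native, reference, gain=30):
    """Concatenate [native | reference | |native - reference| * gain] along the width axis."""
    if native.shape != reference.shape:
        raise ValueError("renders must have the same shape")
    # int16 holds 255 * 30 = 7650 without overflow; uint8 subtraction would wrap.
    diff = np.abs(native.astype(np.int16) - reference.astype(np.int16)) * gain
    diff = np.clip(diff, 0, 255).astype(np.uint8)
    return np.concatenate([native, reference, diff], axis=1)
```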
## Design notes & honest caveats

**Per-episode random objects.** Several families (pick-object, move-near, place-in-drawer) sample their objects per episode. To keep the active physics solve bitwise, the full candidate set is declared once in `ScenarioCfg`; each episode activates its subset and parks the rest at `HIDDEN_POS=(0,0,-100)` with motion locked and collision groups disabled, so inactive actors cannot perturb the solve.

**One env per process.** SAPIEN keeps a process-global renderer/engine, so (exactly as upstream) only one env may be alive per process. The parity harness isolates each task in its own subprocess; this is a property of the underlying simulator, not of the integration.
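The parking scheme can be sketched as a per-episode layout over a fixed candidate set. `episode_layout` and its return format are hypothetical; only `HIDDEN_POS=(0,0,-100)` and the lock/disable semantics come from the note above. Every declared candidate gets an entry each episode, so the scene composition never changes, only which actors are live.

```python
import numpy as np

HIDDEN_POS = np.array([0.0, 0.0, -100.0])  # parking spot quoted in the design note above


def episode_layout(candidates, active, spawn_positions):
    """Hypothetical per-episode layout: active objects at their spawn poses, the rest parked.

    Returns {name: (position, motion_locked, collisions_enabled)} for every declared candidate.
    """
    unknown = set(active) - set(candidates)
    if unknown:
        raise KeyError(f"active objects not declared in the scenario: {sorted(unknown)}")
    layout = {}
    for name in candidates:  # iterate the fixed declaration order, not the random subset
        if name in active:
            layout[name] = (np.asarray(spawn_positions[name], dtype=float), False, True)
        else:
            layout[name] = (HIDDEN_POS.copy(), True, False)
    return layout
```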
**Open-loop forwarding, not policy success.** The integration claim rests on verifying the rendering / state / reward / success contract open-loop, with scripted or seeded actions. Closed-loop policy success, including the RT-1 / Octo rollouts above, is a property of the policy and is out of scope for that claim.
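A seeded open-loop check can be sketched as replaying one action sequence twice on the same env instance (so it respects the one-env-per-process rule) and comparing observations bitwise. This assumes `obs` is an array, such as the camera image; dict observations would need a per-key comparison. `replay` and `is_reproducible` are hypothetical names.

```python
import numpy as np


def replay(env, seed, actions):
    """Reset with `seed`, apply a fixed (open-loop) action sequence, return all observations."""
    obs, _info = env.reset(seed=seed)
    trajectory = [np.asarray(obs)]
    for action in actions:
        obs, _reward, terminated, truncated, _info = env.step(action)
        trajectory.append(np.asarray(obs))
        if terminated or truncated:
            break
    return trajectory


def is_reproducible(env, seed, actions):
    """True if two replays of the same seeded action sequence match bitwise."""
    first = replay(env, seed, actions)
    second = replay(env, seed, actions)
    return len(first) == len(second) and all(np.array_equal(a, b) for a, b in zip(first, second))
```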
**Assets.** The native track reads from `roboverse_data/assets/simpler_env/` and `roboverse_data/robots/{google_robot,widowx}/` (URDFs, the `mk_station` cabinet, scene GLBs, object meshes, model DB, real-image overlays). These are mirrored on HuggingFace at `RoboVerseOrg/roboverse_data` and download automatically on first use (`_native/_assets.py` → `snapshot_download`) when no local `roboverse_data` checkout (or `$ROBOVERSE_DATA`) is found, so a fresh install needs no manual asset fetch. The SAPIEN `*.convex.stl` collision caches are not stored (repo `.gitignore` convention) and are not required; a cold-start run with no caches present (empty dir → HF download → task builds and renders) confirms this.
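The lookup described above might be sketched as follows. The precedence (`$ROBOVERSE_DATA` first, then a local checkout), the `repo_type="dataset"` argument, and the helper name `resolve_asset_root` are assumptions, not the contents of `_native/_assets.py`; `huggingface_hub.snapshot_download` is the real Hub API and returns the local snapshot path. The `download` parameter exists only so the fallback can be stubbed in tests.

```python
import os
from pathlib import Path

HF_REPO_ID = "RoboVerseOrg/roboverse_data"  # mirror named in the note above


def resolve_asset_root(local_candidates=("roboverse_data",), download=None):
    """Return a directory holding the assets, downloading from the Hub only as a last resort."""
    env_root = os.environ.get("ROBOVERSE_DATA")
    if env_root and Path(env_root).is_dir():
        return Path(env_root)
    for candidate in local_candidates:
        if Path(candidate).is_dir():
            return Path(candidate)
    if download is not None:
        return Path(download())
    from huggingface_hub import snapshot_download  # lazy: only needed on a cold start

    # repo_type="dataset" is an assumption about how the mirror is hosted.
    return Path(snapshot_download(repo_id=HF_REPO_ID, repo_type="dataset"))
```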