RL Infrastructure Overview
⚠️ CONSTRUCTION WARNING ⚠️
This RL infrastructure is currently under active construction. Features, APIs, and interfaces may change significantly. Use at your own discretion and expect potential breaking changes.
Available Scripts
All scripts are located in get_started/rl/:
1. PPO Training (0_ppo.py)
Trains a Stable-Baselines3 PPO agent directly on RLTaskEnv.
Usage:
# Basic usage with default settings
python get_started/rl/0_ppo.py
# Custom task and robot
python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym
# Adjust environment count and headless mode
python get_started/rl/0_ppo.py --num-envs 256 --headless
# Different simulators
python get_started/rl/0_ppo.py --sim mujoco
python get_started/rl/0_ppo.py --sim genesis
python get_started/rl/0_ppo.py --sim isaaclab
Arguments:
--task: Task name (default: reach_origin)
--robot: Robot type (default: franka)
--num-envs: Number of parallel environments (default: 128)
--sim: Simulator backend (isaacgym, isaaclab, mujoco, genesis, mjx)
--headless: Run without GUI (flag)
Outputs:
The trained model is saved to get_started/output/rl/0_ppo_reaching_{sim}, where {sim} is the selected simulator backend.
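As a rough sketch of the command-line surface documented above, the flags and defaults can be mirrored with the standard library's argparse. This is an assumption for illustration only: the real 0_ppo.py may define its arguments differently (tyro is listed among the dependencies), its default --sim value is not documented on this page, and the helper names build_parser and output_dir are hypothetical. Only the flag names, defaults, and output path pattern come from this page.

```python
import argparse

# Backend names listed for 0_ppo.py on this page.
SIM_CHOICES = ["isaacgym", "isaaclab", "mujoco", "genesis", "mjx"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse mirror of the documented 0_ppo.py flags."""
    parser = argparse.ArgumentParser(description="PPO training (sketch)")
    parser.add_argument("--task", default="reach_origin", help="Task name")
    parser.add_argument("--robot", default="franka", help="Robot type")
    parser.add_argument("--num-envs", type=int, default=128,
                        help="Number of parallel environments")
    # The real default backend is not documented here; None is a placeholder.
    parser.add_argument("--sim", choices=SIM_CHOICES, default=None,
                        help="Simulator backend")
    parser.add_argument("--headless", action="store_true",
                        help="Run without GUI")
    return parser


def output_dir(sim: str) -> str:
    """Model output path, following the documented pattern."""
    return f"get_started/output/rl/0_ppo_reaching_{sim}"


# Equivalent of: python get_started/rl/0_ppo.py --num-envs 256 --headless --sim mujoco
args = build_parser().parse_args(["--num-envs", "256", "--headless", "--sim", "mujoco"])
print(args.num_envs, args.headless, output_dir(args.sim))
# prints: 256 True get_started/output/rl/0_ppo_reaching_mujoco
```

Passing an explicit list to parse_args keeps the example independent of the real command line; in a script you would call parse_args() with no arguments.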
2. PPO Training with Gym Interface (0_ppo_gym.py)
Uses a Gymnasium-compatible environment interface, which integrates more cleanly with standard RL libraries.
Usage:
# Basic training
python get_started/rl/0_ppo_gym.py
# With different backend
python get_started/rl/0_ppo_gym.py --sim mjx --device cuda
# Custom configuration
python get_started/rl/0_ppo_gym.py --task reach_origin --robot franka --num-envs 64
Arguments:
--task: Task name (default: reach_origin)
--robot: Robot type (default: franka)
--num-envs: Number of environments (default: 128)
--sim: Simulator (isaaclab, isaacgym, mujoco, genesis, mjx)
--headless: Headless mode (flag)
--device: Device (cuda, cpu)
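The Gymnasium interface comes down to two methods: reset(seed=...) returns (observation, info), and step(action) returns (observation, reward, terminated, truncated, info). The pure-Python toy below follows that contract without importing gymnasium, so it runs anywhere. The point-mass dynamics, the negative-distance reward, the 0.05 success threshold, and the class name ToyReachOriginEnv are all assumptions for illustration; this is not the actual reach_origin task used by 0_ppo_gym.py.

```python
import math
import random


class ToyReachOriginEnv:
    """Hypothetical point-mass 'reach the origin' task with a Gymnasium-style API."""

    def __init__(self, dim: int = 3, max_steps: int = 100, step_size: float = 0.05):
        self.dim = dim
        self.max_steps = max_steps
        self.step_size = step_size
        self._rng = random.Random()
        self._pos = [0.0] * dim
        self._t = 0

    def reset(self, seed=None, options=None):
        # Gymnasium convention: reset returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._pos = [self._rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
        self._t = 0
        return list(self._pos), {}

    def step(self, action):
        # Clip each action component to [-1, 1], then move the point.
        for i, a in enumerate(action):
            a = max(-1.0, min(1.0, a))
            self._pos[i] += self.step_size * a
        self._t += 1
        dist = math.sqrt(sum(p * p for p in self._pos))
        reward = -dist                         # closer to the origin = higher reward
        terminated = dist < 0.05               # assumed success threshold
        truncated = self._t >= self.max_steps  # time limit reached
        info = {"distance": dist}
        # Gymnasium convention: step returns a 5-tuple.
        return list(self._pos), reward, terminated, truncated, info


# Roll out one episode with a proportional controller toward the origin.
env = ToyReachOriginEnv()
obs, info = env.reset(seed=0)
total_reward = 0.0
steps = 0
while True:
    action = [-10.0 * x for x in obs]  # clipped to [-1, 1] inside step()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    steps += 1
    if terminated or truncated:
        break
print(f"steps={steps} terminated={terminated} distance={info['distance']:.3f}")
```

Separating terminated (task ended, e.g. success) from truncated (time limit) matters for RL: value bootstrapping should stop on termination but generally not on truncation.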
3. Fast TD3 Training (1_fttd3.py)
Advanced TD3 implementation with distributional critics and various optimizations.
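For readers new to TD3, its core is how the critic target is formed: target policy smoothing (clipped Gaussian noise added to the target actor's action) and clipped double-Q learning (taking the minimum of two target critics). The scalar sketch below computes that standard TD3 target for a single transition. The placeholder actor and critics are lambdas chosen for illustration, and the sketch deliberately omits the distributional critics, batching, delayed actor updates, target-network updates, and other optimizations used by 1_fttd3.py.

```python
import random


def td3_target(reward, done, next_obs, target_actor, target_q1, target_q2,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Standard TD3 critic target for one transition (scalar action for brevity)."""
    # Target policy smoothing: clipped Gaussian noise on the target action.
    noise = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
    next_action = target_actor(next_obs) + noise
    next_action = max(action_low, min(action_high, next_action))
    # Clipped double-Q: use the smaller of the two target critic estimates.
    next_q = min(target_q1(next_obs, next_action), target_q2(next_obs, next_action))
    # Bootstrap from next_q only if the episode did not terminate.
    return reward + gamma * (1.0 - float(done)) * next_q


# Placeholder target networks, for illustration only.
y = td3_target(reward=1.0, done=False, next_obs=0.4,
               target_actor=lambda s: 0.5 * s,
               target_q1=lambda s, a: s + a,
               target_q2=lambda s, a: s + a - 0.1,
               rng=random.Random(0))
print(y)
```

Taking the minimum of two critics counteracts the overestimation bias of single-critic Q-learning, which is the main motivation for TD3 over DDPG.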
Usage:
# Basic training
python get_started/rl/1_fttd3.py
# The script uses a CONFIG dictionary for configuration
# Key parameters can be modified in the CONFIG section
Key Configuration Options:
CONFIG = {
"sim": "mjx", # Simulator backend
"robots": ["h1"], # List of robot names
"task": "humanoid.run", # Task name
"num_envs": 1024, # Number of environments
"total_timesteps": 1500, # Training timesteps
"batch_size": 32768, # Batch size
"learning_rate": 0.0003, # Learning rate
"use_wandb": False, # Weights & Biases logging
}
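Rather than editing CONFIG in place, one option is to derive a modified copy and reject unknown keys, so that a typo such as "num_env" fails loudly instead of being silently ignored. The with_overrides helper below is hypothetical and not part of 1_fttd3.py. The CONFIG values are copied from the block above, and the backend check uses the simulator names listed on this page, which may not all be supported by this particular script.

```python
import copy

CONFIG = {
    "sim": "mjx",
    "robots": ["h1"],
    "task": "humanoid.run",
    "num_envs": 1024,
    "total_timesteps": 1500,
    "batch_size": 32768,
    "learning_rate": 0.0003,
    "use_wandb": False,
}

# Backend names listed on this page (assumed valid values for "sim").
KNOWN_SIMS = {"isaacgym", "isaaclab", "mujoco", "genesis", "mjx"}


def with_overrides(base: dict, **overrides) -> dict:
    """Return a deep copy of `base` with `overrides` applied (hypothetical helper)."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = copy.deepcopy(base)  # avoid sharing nested lists such as "robots"
    cfg.update(overrides)
    if cfg["sim"] not in KNOWN_SIMS:
        raise ValueError(f"unsupported sim backend: {cfg['sim']!r}")
    return cfg


cfg = with_overrides(CONFIG, num_envs=256, use_wandb=True)
print(cfg["num_envs"], cfg["use_wandb"], CONFIG["num_envs"])  # prints: 256 True 1024
```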
Environment Integration
Supported Tasks
reach_origin: Basic reaching task
humanoid.run: Humanoid locomotion
Custom tasks via the task registry
Supported Robots
franka: Franka Panda arm
h1: H1 humanoid robot
Additional robots are available in the robot configurations
Simulator Backends
Isaac Gym: NVIDIA’s GPU-accelerated physics simulator
Isaac Lab: NVIDIA’s next-generation robot learning framework, built on Isaac Sim
MuJoCo: Fast physics engine for contact-rich simulation
Genesis: Multi-physics simulation platform
MJX: JAX-based reimplementation of MuJoCo (MuJoCo XLA)
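Not every backend is needed for every script, so it can help to check which simulator packages are importable before launching a run. The sketch below uses importlib.util.find_spec, which locates a module without executing it (for a dotted name such as mujoco.mjx, the parent package is imported first). The backend-to-module mapping is an assumption (for example, older Isaac Lab releases used a different package path), and a module being locatable does not guarantee the backend is fully configured, e.g. GPU drivers or an Isaac Sim installation.

```python
import importlib.util

# Assumed backend -> Python module mapping; adjust to your installation.
BACKEND_MODULES = {
    "isaacgym": "isaacgym",
    "isaaclab": "isaaclab",
    "mujoco": "mujoco",
    "genesis": "genesis",
    "mjx": "mujoco.mjx",
}


def backend_available(name: str) -> bool:
    """Return True if the module assumed for backend `name` can be located."""
    module = BACKEND_MODULES[name]  # raises KeyError for unknown backend names
    try:
        return importlib.util.find_spec(module) is not None
    except (ImportError, ValueError):
        # For dotted names, a missing parent package (e.g. mujoco when
        # checking mujoco.mjx) raises ModuleNotFoundError.
        return False


availability = {name: backend_available(name) for name in BACKEND_MODULES}
print(availability)
```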
Dependencies
Core Requirements
# Core metasim framework
pip install -e .
# RL libraries
pip install stable-baselines3
pip install torch torchvision
pip install tensordict
pip install loguru
pip install tyro
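After installing, one quick way to confirm that the RL libraries above are present is to query their installed versions with importlib.metadata from the standard library. The distribution names are taken from the pip commands above; the check_versions helper is only an illustration and does not verify that the versions are mutually compatible.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as used in the pip commands above.
REQUIRED = ["stable-baselines3", "torch", "torchvision", "tensordict", "loguru", "tyro"]


def check_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = None
    return found


for pkg, ver in check_versions(REQUIRED).items():
    print(f"{pkg:20s} {ver or 'MISSING'}")
```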