RL Infrastructure Overview#

⚠️ CONSTRUCTION WARNING ⚠️
This RL infrastructure is currently under active construction. Features, APIs, and interfaces may change significantly. Use at your own discretion and expect potential breaking changes.

Available Scripts#

All scripts are located in get_started/rl/:

1. PPO Training (`0_ppo.py`)#

Trains Stable Baselines3 PPO directly on an RLTaskEnv.

Usage:

# Basic usage with default settings
python get_started/rl/0_ppo.py

# Custom task and robot
python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym

# Adjust environment count and headless mode
python get_started/rl/0_ppo.py --num-envs 256 --headless

# Different simulators
python get_started/rl/0_ppo.py --sim mujoco
python get_started/rl/0_ppo.py --sim genesis
python get_started/rl/0_ppo.py --sim isaaclab

Arguments:

--task: Task name (default: reach_origin)
--robot: Robot type (default: franka)
--num-envs: Number of parallel environments (default: 128)
--sim: Simulator backend (isaacgym, isaaclab, mujoco, genesis, mjx)
--headless: Run without GUI (flag)

Outputs:

Model saved to: get_started/output/rl/0_ppo_reaching_{sim}
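The flags and output location above can be sketched with the standard library. This is an illustrative parser only: the real script may define its arguments differently (for example with tyro, which is listed under the dependencies), and `build_parser` and `model_output_path` are hypothetical names. The list above states no default for --sim, so the sketch leaves it unset.

```python
import argparse
from pathlib import Path

# Backend names taken from the documented --sim option.
SIMULATORS = ("isaacgym", "isaaclab", "mujoco", "genesis", "mjx")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags of 0_ppo.py."""
    parser = argparse.ArgumentParser(description="PPO training (illustrative sketch)")
    parser.add_argument("--task", default="reach_origin", help="Task name")
    parser.add_argument("--robot", default="franka", help="Robot type")
    parser.add_argument("--num-envs", type=int, default=128,
                        help="Number of parallel environments")
    # Assumption: the documentation gives no default for --sim, so none is set here.
    parser.add_argument("--sim", choices=SIMULATORS, default=None,
                        help="Simulator backend")
    parser.add_argument("--headless", action="store_true", help="Run without GUI")
    return parser


def model_output_path(sim: str) -> Path:
    """Build the documented save location, 0_ppo_reaching_{sim}."""
    return Path("get_started/output/rl") / f"0_ppo_reaching_{sim}"


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["--sim", "mujoco", "--num-envs", "256", "--headless"])
print(args.task, args.robot, args.num_envs, args.headless)
print(model_output_path(args.sim).as_posix())
```

Restricting --sim with `choices` makes a mistyped backend name fail at parse time rather than after the simulator has started loading.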

2. PPO Training with Gym Interface (`0_ppo_gym.py`)#

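Trains PPO through a Gymnasium-compatible environment interface, which integrates more cleanly with standard RL tooling than the direct RLTaskEnv approach in 0_ppo.py.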

Usage:

# Basic training
python get_started/rl/0_ppo_gym.py

# With different backend
python get_started/rl/0_ppo_gym.py --sim mjx --device cuda

# Custom configuration
python get_started/rl/0_ppo_gym.py --task reach_origin --robot franka --num-envs 64

Arguments:

--task: Task name (default: reach_origin)
--robot: Robot type (default: franka)
--num-envs: Number of environments (default: 128)
--sim: Simulator (isaaclab, isaacgym, mujoco, genesis, mjx)
--headless: Headless mode (flag)
--device: Device (cuda, cpu)
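For readers unfamiliar with what "Gymnasium-compatible" implies, the sketch below shows the calling convention: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`. `ToyReachOriginEnv` is a hypothetical pure-Python stand-in written for this illustration. It is not the environment that 0_ppo_gym.py uses, and it deliberately avoids importing gymnasium so it runs anywhere.

```python
import math
import random


class ToyReachOriginEnv:
    """Toy 2-D point task following Gymnasium's reset/step conventions.

    Hypothetical example for illustration; not part of the repository.
    """

    def __init__(self, max_steps: int = 50, tolerance: float = 0.05):
        self.max_steps = max_steps
        self.tolerance = tolerance
        self._rng = random.Random()
        self._pos = [0.0, 0.0]
        self._steps = 0

    def reset(self, *, seed=None, options=None):
        # Gymnasium convention: reset returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._pos = [self._rng.uniform(-1.0, 1.0) for _ in range(2)]
        self._steps = 0
        return list(self._pos), {}

    def step(self, action):
        # Clip each action component to [-0.1, 0.1] and move the point.
        self._pos = [p + max(-0.1, min(0.1, a)) for p, a in zip(self._pos, action)]
        self._steps += 1
        distance = math.hypot(*self._pos)
        reward = -distance                           # closer to the origin is better
        terminated = distance < self.tolerance       # task solved
        truncated = self._steps >= self.max_steps    # time limit reached
        # Gymnasium convention: (observation, reward, terminated, truncated, info).
        return list(self._pos), reward, terminated, truncated, {"distance": distance}


def rollout(env, seed=0):
    """Run one episode with a simple controller that steers toward the origin."""
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    while True:
        action = [-x for x in obs]
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            return total_reward, terminated, info


total_reward, solved, info = rollout(ToyReachOriginEnv(), seed=1)
print(f"solved={solved} final_distance={info['distance']:.4f}")
```

Keeping `terminated` (task outcome) separate from `truncated` (time limit) is the part of the convention that matters most for training code, because the two cases are usually treated differently when bootstrapping value estimates.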

3. Fast TD3 Training (`1_fttd3.py`)#

Advanced TD3 implementation with distributional critics and various optimizations.

Usage:

# Basic training
python get_started/rl/1_fttd3.py

# Configuration lives in the script's CONFIG dictionary;
# edit the values there to change key parameters

Key Configuration Options:

CONFIG = {
    "sim": "mjx",              # Simulator backend
    "robots": ["h1"],          # Robots to train (list)
    "task": "humanoid.run",    # Task name
    "num_envs": 1024,          # Number of parallel environments
    "total_timesteps": 1500,   # Total training timesteps
    "batch_size": 32768,       # Batch size
    "learning_rate": 0.0003,   # Learning rate
    "use_wandb": False,        # Enable Weights & Biases logging
}
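Because configuration is edited in place, a small helper can make experiments less error-prone by rejecting misspelled keys and mismatched types. The sketch below copies the values shown above; `with_overrides` is a hypothetical helper and not something the script is known to provide.

```python
from copy import deepcopy

# Values copied from the documented CONFIG excerpt.
CONFIG = {
    "sim": "mjx",
    "robots": ["h1"],
    "task": "humanoid.run",
    "num_envs": 1024,
    "total_timesteps": 1500,
    "batch_size": 32768,
    "learning_rate": 0.0003,
    "use_wandb": False,
}


def with_overrides(base: dict, **overrides) -> dict:
    """Return a copy of `base` with overrides applied.

    Unknown keys raise KeyError and type mismatches raise TypeError,
    so a typo such as `num_env=64` fails immediately.
    """
    cfg = deepcopy(base)
    for key, value in overrides.items():
        if key not in cfg:
            raise KeyError(f"Unknown config key: {key!r}")
        if type(value) is not type(cfg[key]):
            raise TypeError(
                f"{key!r} expects {type(cfg[key]).__name__}, got {type(value).__name__}"
            )
        cfg[key] = value
    return cfg


# A smaller configuration for a quick smoke test; the original stays untouched.
debug_config = with_overrides(CONFIG, num_envs=64, batch_size=2048)
print(debug_config["num_envs"], CONFIG["num_envs"])
```

The type check is intentionally strict (an integer is not accepted where the default is a float), which is a design choice of this sketch rather than a requirement of the script.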

Environment Integration#

Supported Tasks#

reach_origin: Basic reaching task
humanoid.run: Humanoid locomotion
Custom tasks can be added through the task registry

Supported Robots#

franka: Franka Panda arm
h1: H1 humanoid robot
Additional robots available in robot configurations

Simulator Backends#

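Isaac Gym (--sim isaacgym): NVIDIA’s GPU-accelerated physics simulator
Isaac Lab (--sim isaaclab): NVIDIA’s successor to Isaac Gym, built on Isaac Sim
MuJoCo (--sim mujoco): Fast physics simulator
Genesis (--sim genesis): Multi-physics simulation platform
MJX (--sim mjx): JAX-based implementation of MuJoCo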
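To compare the backends listed above on the same task, it can help to generate the training commands programmatically. The sketch below only builds and prints command lines using the documented flags of 0_ppo.py; it does not launch anything. `ppo_command` is a hypothetical helper, and whether a given backend is installed on your machine is not checked.

```python
import shlex

# Display name -> value accepted by the documented --sim flag.
SIM_FLAGS = {
    "Isaac Gym": "isaacgym",
    "Isaac Lab": "isaaclab",
    "MuJoCo": "mujoco",
    "Genesis": "genesis",
    "MJX": "mjx",
}


def ppo_command(sim: str, num_envs: int = 128, headless: bool = True,
                script: str = "get_started/rl/0_ppo.py") -> list:
    """Build an argument list for one training run on the given backend."""
    if sim not in SIM_FLAGS.values():
        raise ValueError(f"Unknown simulator: {sim!r}")
    cmd = ["python", script, "--sim", sim, "--num-envs", str(num_envs)]
    if headless:
        cmd.append("--headless")
    return cmd


# Print one ready-to-paste command per backend.
for name, flag in SIM_FLAGS.items():
    print(f"{name}: {shlex.join(ppo_command(flag))}")
```

Returning a list rather than a string means the same value can be passed to `subprocess.run` without any shell quoting concerns.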

Dependencies#

Core Requirements#

# Core metasim framework
pip install -e .

# RL libraries
pip install stable-baselines3
pip install torch torchvision
pip install tensordict
pip install loguru
pip install tyro
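After installing, a quick check can confirm that the RL libraries are importable before starting a long training run. This is a minimal sketch: the module names are the standard import names for the packages listed above, the core framework installed with `pip install -e .` is left out because its import name is not stated here, and `missing_modules` is a hypothetical helper.

```python
from importlib.util import find_spec

# Import names for the packages installed above (stable-baselines3 imports
# as stable_baselines3; the others match their package names).
REQUIRED_MODULES = [
    "stable_baselines3",
    "torch",
    "torchvision",
    "tensordict",
    "loguru",
    "tyro",
]


def missing_modules(modules):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All RL dependencies are importable.")
```

`find_spec` only locates each module without importing it, so the check stays fast even for heavy packages such as torch.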
