RL Infrastructure Overview

⚠️ CONSTRUCTION WARNING ⚠️
This RL infrastructure is currently under active construction. Features, APIs, and interfaces may change significantly. Use at your own discretion and expect potential breaking changes.

Available Scripts

All scripts are located in get_started/rl/:

1. PPO Training (0_ppo.py)

Direct implementation using RLTaskEnv with Stable Baselines3 PPO.

Usage:

# Basic usage with default settings
python get_started/rl/0_ppo.py

# Custom task and robot
python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym

# Adjust environment count and headless mode
python get_started/rl/0_ppo.py --num-envs 256 --headless

# Different simulators
python get_started/rl/0_ppo.py --sim mujoco
python get_started/rl/0_ppo.py --sim genesis
python get_started/rl/0_ppo.py --sim isaaclab

Arguments:

  • --task: Task name (default: reach_origin)

  • --robot: Robot type (default: franka)

  • --num-envs: Number of parallel environments (default: 128)

  • --sim: Simulator backend (isaacgym, isaaclab, mujoco, genesis, mjx)

  • --headless: Run without GUI (flag)
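The flags above can be pictured as a small command-line parser. The following is only an illustrative sketch using the standard-library argparse, not the script's actual parsing code (the dependency list suggests the real script may use tyro). The function name build_parser is hypothetical, and because no default for --sim is documented, None is used as a placeholder.

```python
import argparse

# Values listed for --sim in the argument reference above.
SIM_CHOICES = ["isaacgym", "isaaclab", "mujoco", "genesis", "mjx"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags of 0_ppo.py."""
    parser = argparse.ArgumentParser(description="PPO training (illustrative sketch)")
    parser.add_argument("--task", default="reach_origin", help="Task name")
    parser.add_argument("--robot", default="franka", help="Robot type")
    parser.add_argument("--num-envs", type=int, default=128,
                        help="Number of parallel environments")
    # Assumption: the documentation gives no default for --sim,
    # so None stands in as a placeholder here.
    parser.add_argument("--sim", choices=SIM_CHOICES, default=None,
                        help="Simulator backend")
    parser.add_argument("--headless", action="store_true", help="Run without GUI")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is easy to try.
args = build_parser().parse_args(["--task", "reach_origin", "--robot", "franka", "--sim", "isaacgym"])
print(args)
```

Note that argparse exposes --num-envs as args.num_envs, and that an unsupported --sim value is rejected before any simulator is started.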

Outputs:

  • Model saved to: get_started/output/rl/0_ppo_reaching_{sim}
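The {sim} placeholder in the output path is filled with the value passed to --sim. A minimal sketch of that substitution, assuming a hypothetical helper named model_output_path (the real script may build the path differently):

```python
from pathlib import Path


def model_output_path(sim: str, root: str = "get_started/output/rl") -> Path:
    """Hypothetical helper: build the documented save location for a given simulator."""
    return Path(root) / f"0_ppo_reaching_{sim}"


print(model_output_path("mujoco").as_posix())
```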

2. PPO Training with Gym Interface (0_ppo_gym.py)

Uses a Gymnasium-compatible interface for cleaner integration with standard RL libraries.

Usage:

# Basic training
python get_started/rl/0_ppo_gym.py

# With different backend
python get_started/rl/0_ppo_gym.py --sim mjx --device cuda

# Custom configuration
python get_started/rl/0_ppo_gym.py --task reach_origin --robot franka --num-envs 64

Arguments:

  • --task: Task name (default: reach_origin)

  • --robot: Robot type (default: franka)

  • --num-envs: Number of parallel environments (default: 128)

  • --sim: Simulator (isaaclab, isaacgym, mujoco, genesis, mjx)

  • --headless: Run without GUI (flag)

  • --device: Device (cuda, cpu)
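"Gymnasium-compatible" refers to the reset/step contract: reset returns (observation, info), and step returns (observation, reward, terminated, truncated, info). The toy environment below is a hypothetical one-dimensional stand-in written in plain Python to show that contract. It is not the environment used by 0_ppo_gym.py, and the class name, reward, and tolerance are all assumptions made for illustration.

```python
import random


class ToyReachOriginEnv:
    """Hypothetical 1-D stand-in that follows the Gymnasium reset/step contract."""

    def __init__(self, max_steps: int = 50, tolerance: float = 0.05):
        self.max_steps = max_steps
        self.tolerance = tolerance
        self._rng = random.Random()
        self._pos = 0.0
        self._steps = 0

    def reset(self, *, seed=None, options=None):
        # Gymnasium-style reset: returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._pos = self._rng.uniform(-1.0, 1.0)
        self._steps = 0
        return self._pos, {}

    def step(self, action: float):
        # Gymnasium-style step: returns (obs, reward, terminated, truncated, info).
        action = max(-0.1, min(0.1, action))  # clip the action to a small step
        self._pos += action
        self._steps += 1
        reward = -abs(self._pos)                        # closer to the origin is better
        terminated = abs(self._pos) < self.tolerance    # goal reached
        truncated = self._steps >= self.max_steps       # time limit hit
        return self._pos, reward, terminated, truncated, {}


env = ToyReachOriginEnv()
obs, info = env.reset(seed=0)
done = False
while not done:
    action = -obs  # simple proportional controller toward the origin
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
```

Keeping terminated (task finished) separate from truncated (time limit) is the part of the contract that most often trips up custom environments.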

3. Fast TD3 Training (1_fttd3.py)

Advanced TD3 implementation with distributional critics and various optimizations.

Usage:

# Basic training
python get_started/rl/1_fttd3.py

# The script uses a CONFIG dictionary for configuration
# Key parameters can be modified in the CONFIG section

Key Configuration Options:

CONFIG = {
    "sim": "mjx",                    # Simulator backend
    "robots": ["h1"],               # Robot type
    "task": "humanoid.run",         # Task name
    "num_envs": 1024,              # Number of environments
    "total_timesteps": 1500,       # Training timesteps
    "batch_size": 32768,           # Batch size
    "learning_rate": 0.0003,       # Learning rate
    "use_wandb": False,            # Weights & Biases logging
}
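Since this script is configured by editing a dictionary rather than by passing flags, one way to experiment without touching the defaults is to derive a modified copy. The sketch below assumes a hypothetical helper named with_overrides. It is not part of 1_fttd3.py, and BASE_CONFIG simply repeats the values shown above.

```python
import copy

# Values copied from the configuration excerpt above.
BASE_CONFIG = {
    "sim": "mjx",
    "robots": ["h1"],
    "task": "humanoid.run",
    "num_envs": 1024,
    "total_timesteps": 1500,
    "batch_size": 32768,
    "learning_rate": 0.0003,
    "use_wandb": False,
}


def with_overrides(base: dict, **overrides) -> dict:
    """Hypothetical helper: return a copy of base with selected keys replaced."""
    unknown = set(overrides) - set(base)
    if unknown:
        # Fail early on typos such as "num_env" instead of silently adding a key.
        raise KeyError(f"Unknown config keys: {sorted(unknown)}")
    cfg = copy.deepcopy(base)  # deep copy so list values like "robots" are not shared
    cfg.update(overrides)
    return cfg


# A smaller run for quick debugging; the original dictionary stays unchanged.
debug_config = with_overrides(BASE_CONFIG, num_envs=64)
print(debug_config["num_envs"], BASE_CONFIG["num_envs"])
```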

Environment Integration

Supported Tasks

  • reach_origin: Basic reaching task

  • humanoid.run: Humanoid locomotion

  • Custom tasks via task registry

Supported Robots

  • franka: Franka Panda arm

  • h1: H1 humanoid robot

  • Additional robots available in robot configurations

Simulator Backends

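  1. Isaac Gym (--sim isaacgym): NVIDIA’s GPU-accelerated physics simulator

  2. Isaac Lab (--sim isaaclab): NVIDIA’s successor to Isaac Gym, built on Isaac Sim

  3. MuJoCo (--sim mujoco): Fast rigid-body physics simulator

  4. Genesis (--sim genesis): Multi-physics simulator

  5. MJX (--sim mjx): JAX-based implementation of MuJoCo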

Dependencies

Core Requirements

# Core metasim framework
pip install -e .

# RL libraries
pip install stable-baselines3
pip install torch torchvision
pip install tensordict
pip install loguru
pip install tyro
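After installing, a quick way to confirm that the RL libraries are importable is to look each one up without actually importing it. This is a small convenience sketch, not part of the repository. The helper name missing_modules is hypothetical, and the module names are the import names of the packages listed above (note that stable-baselines3 is imported as stable_baselines3).

```python
from importlib.util import find_spec

# Import names corresponding to the pip packages listed above.
REQUIRED_MODULES = [
    "stable_baselines3",
    "torch",
    "torchvision",
    "tensordict",
    "loguru",
    "tyro",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in the current environment."""
    return [name for name in modules if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All RL dependencies are importable.")
```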