RL Infrastructure Overview
⚠️ CONSTRUCTION WARNING ⚠️
This RL infrastructure is currently under active construction. Features, APIs, and interfaces may change significantly. Use at your own discretion and expect potential breaking changes.
Available Scripts
All scripts are located in get_started/rl/:
1. PPO Training (0_ppo.py)
Trains a policy with Stable-Baselines3 PPO directly on RLTaskEnv.
Usage:
```bash
# Basic usage with default settings
python get_started/rl/0_ppo.py

# Custom task and robot
python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym

# Adjust environment count and headless mode
python get_started/rl/0_ppo.py --num-envs 256 --headless

# Different simulators
python get_started/rl/0_ppo.py --sim mujoco
python get_started/rl/0_ppo.py --sim genesis
python get_started/rl/0_ppo.py --sim isaaclab
```
Arguments:
- --task: Task name (default: reach_origin)
- --robot: Robot type (default: franka)
- --num-envs: Number of parallel environments (default: 128)
- --sim: Simulator backend (isaacgym, isaaclab, mujoco, genesis, or mjx)
- --headless: Run without GUI (flag)
Outputs:
- Model saved to: get_started/output/rl/0_ppo_reaching_{sim}
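The flags, defaults, and output location above can be summarized as an argument parser. This is a hypothetical reconstruction using the standard-library `argparse` module, based only on what is documented here. The real `0_ppo.py` may parse its arguments differently. `build_parser` and `output_path` are illustrative names, and no default for `--sim` is documented, so the sketch leaves it unset.

```python
import argparse
from typing import List, Optional

# Hypothetical sketch of the 0_ppo.py CLI, reconstructed from the documented flags.
SIMULATORS = ["isaacgym", "isaaclab", "mujoco", "genesis", "mjx"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="PPO reaching example (sketch)")
    parser.add_argument("--task", default="reach_origin", help="Task name")
    parser.add_argument("--robot", default="franka", help="Robot type")
    parser.add_argument("--num-envs", type=int, default=128,
                        help="Number of parallel environments")
    # No default is documented for --sim; the real script chooses its own.
    parser.add_argument("--sim", choices=SIMULATORS, default=None,
                        help="Simulator backend")
    parser.add_argument("--headless", action="store_true", help="Run without a GUI")
    return parser


def output_path(sim: str) -> str:
    # Mirrors the documented location: get_started/output/rl/0_ppo_reaching_{sim}
    return f"get_started/output/rl/0_ppo_reaching_{sim}"


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["--num-envs", "256", "--headless", "--sim", "mujoco"])
    print(args.num_envs, args.headless, output_path(args.sim))
    # prints: 256 True get_started/output/rl/0_ppo_reaching_mujoco
```

Note that argparse turns `--num-envs` into the attribute `num_envs`, so hyphenated flags read as underscored names in code.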
2. PPO Training with Gym Interface (0_ppo_gym.py)
Runs PPO through a Gymnasium-compatible environment interface for cleaner integration with Gymnasium-based tooling.
Usage:
```bash
# Basic training
python get_started/rl/0_ppo_gym.py

# With different backend
python get_started/rl/0_ppo_gym.py --sim mjx --device cuda

# Custom configuration
python get_started/rl/0_ppo_gym.py --task reach_origin --robot franka --num-envs 64
```
Arguments:
- --task: Task name (default: reach_origin)
- --robot: Robot type (default: franka)
- --num-envs: Number of environments (default: 128)
- --sim: Simulator (isaaclab, isaacgym, mujoco, genesis, or mjx)
- --headless: Headless mode (flag)
- --device: Device (cuda or cpu)
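"Gymnasium-compatible" means the environment follows Gymnasium's calling convention. `reset(seed=..., options=...)` returns `(obs, info)`, and `step(action)` returns `(obs, reward, terminated, truncated, info)`. The toy environment below illustrates that contract in plain Python without importing gymnasium. `ToyReachOrigin` is hypothetical: a 2-D point that must move to the origin. It is not the metasim `reach_origin` task, and its step size and tolerance are arbitrary.

```python
import math
import random
from typing import Any, Dict, List, Optional, Tuple


class ToyReachOrigin:
    """Hypothetical 2-D reaching task following the Gymnasium reset/step contract."""

    def __init__(self, max_steps: int = 50, tol: float = 0.05, max_move: float = 0.1):
        self.max_steps = max_steps  # episode is truncated after this many steps
        self.tol = tol              # success radius around the origin
        self.max_move = max_move    # per-component action clip
        self.pos = [0.0, 0.0]
        self.t = 0

    def reset(self, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[List[float], Dict[str, Any]]:
        rng = random.Random(seed)
        self.pos = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        self.t = 0
        return list(self.pos), {}

    def step(self, action: List[float]):
        # Clip each action component, then move the point.
        m = self.max_move
        self.pos = [p + max(-m, min(m, a)) for p, a in zip(self.pos, action)]
        self.t += 1
        dist = math.hypot(*self.pos)
        reward = -dist                     # dense reward: closer is better
        terminated = dist < self.tol       # task solved
        truncated = self.t >= self.max_steps and not terminated  # time limit hit
        return list(self.pos), reward, terminated, truncated, {"distance": dist}


if __name__ == "__main__":
    env = ToyReachOrigin()
    obs, info = env.reset(seed=0)
    done = False
    while not done:
        # Greedy controller: push straight toward the origin.
        obs, reward, terminated, truncated, info = env.step([-x for x in obs])
        done = terminated or truncated
    print(env.t, terminated)
```

Separating `terminated` from `truncated` lets PPO bootstrap values on time-limit cutoffs but not on true episode ends.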
3. Fast TD3 Training (1_fttd3.py)
An advanced TD3 implementation with distributional critics and several performance optimizations.
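A distributional critic predicts a probability distribution over returns on a fixed grid of "atoms" instead of a single Q-value. A common way to build its training target is the categorical (C51-style) projection. The Bellman update `r + gamma * (1 - done) * z` shifts every atom, and each shifted atom's probability mass is split between its two nearest grid points. The sketch shows this for one transition in plain Python. Whether `1_fttd3.py` uses exactly this formulation is an assumption, and `v_min`, `v_max`, and the atom count are illustrative parameters.

```python
import math
from typing import List


def project_distribution(probs: List[float], reward: float, gamma: float, done: bool,
                         v_min: float, v_max: float) -> List[float]:
    """Project r + gamma * (1 - done) * Z onto the fixed atom support [v_min, v_max]."""
    n_atoms = len(probs)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    atoms = [v_min + i * delta_z for i in range(n_atoms)]
    projected = [0.0] * n_atoms
    for z, p in zip(atoms, probs):
        # Bellman-shift the atom, then clip it to the support.
        tz = reward + gamma * (0.0 if done else 1.0) * z
        tz = min(max(tz, v_min), v_max)
        # Fractional grid index, clamped to guard against floating-point overshoot.
        b = min(max((tz - v_min) / delta_z, 0.0), n_atoms - 1)
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            # Shifted atom lands exactly on a grid point.
            projected[lower] += p
        else:
            # Split mass linearly between the two neighbouring atoms.
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected
```

The critic is then trained with cross-entropy between its predicted distribution and this projected target. In TD3 the target distribution comes from the target critic evaluated at the smoothed target action.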
Usage:
```bash
# Basic training
python get_started/rl/1_fttd3.py

# The script uses a CONFIG dictionary for configuration
# Key parameters can be modified in the CONFIG section
```
Key Configuration Options:

```python
CONFIG = {
    "sim": "mjx",                # Simulator backend
    "robots": ["h1"],            # Robot type
    "task": "humanoid.run",      # Task name
    "num_envs": 1024,            # Number of environments
    "total_timesteps": 1500,     # Training timesteps
    "batch_size": 32768,         # Batch size
    "learning_rate": 0.0003,     # Learning rate
    "use_wandb": False,          # Weights & Biases logging
}
```
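Because `1_fttd3.py` is configured by editing a dictionary rather than through CLI flags, a small guard against typos can help when trying variants. The helper below is hypothetical and not part of the script. It copies the key options listed above, applies overrides, and rejects keys that are not already present, so that `num_env` fails loudly instead of being silently ignored.

```python
import copy
from typing import Any, Dict

# Copy of the key options listed above (not necessarily the script's full CONFIG).
CONFIG: Dict[str, Any] = {
    "sim": "mjx",
    "robots": ["h1"],
    "task": "humanoid.run",
    "num_envs": 1024,
    "total_timesteps": 1500,
    "batch_size": 32768,
    "learning_rate": 0.0003,
    "use_wandb": False,
}


def with_overrides(base: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a deep copy of ``base`` with known keys replaced."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"Unknown CONFIG keys: {sorted(unknown)}")
    cfg = copy.deepcopy(base)  # deep copy so nested lists like "robots" are not shared
    cfg.update(overrides)
    return cfg


# Example: a smaller run on another backend with W&B logging enabled.
small_run = with_overrides(CONFIG, sim="mujoco", num_envs=128, use_wandb=True)
```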
Environment Integration
Supported Tasks
- reach_origin: Basic reaching task
- humanoid.run: Humanoid locomotion
- Custom tasks via the task registry
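A task registry maps the string passed to `--task` onto a task definition. The sketch below shows the general decorator-based registry pattern. It is hypothetical throughout: `TASK_REGISTRY`, `register_task`, `get_task`, and `MyCustomReachTask` are illustrative names, and metasim's actual registry API may differ.

```python
from typing import Callable, Dict

# Hypothetical registry: task name -> task class.
TASK_REGISTRY: Dict[str, type] = {}


def register_task(name: str) -> Callable[[type], type]:
    """Class decorator that records a task class under ``name``."""
    def decorator(cls: type) -> type:
        if name in TASK_REGISTRY:
            raise ValueError(f"Task '{name}' is already registered")
        TASK_REGISTRY[name] = cls
        return cls
    return decorator


def get_task(name: str) -> type:
    """Look up a task class by the name given on the command line."""
    try:
        return TASK_REGISTRY[name]
    except KeyError:
        raise KeyError(f"Unknown task '{name}'. Available: {sorted(TASK_REGISTRY)}") from None


@register_task("my_custom_reach")
class MyCustomReachTask:
    """Placeholder custom task; a real one would define robots, rewards, and termination."""
    episode_length = 200
```

Registering at import time keeps the training scripts unchanged: a new task only needs to be imported before `get_task` is called.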
Supported Robots
- franka: Franka Panda arm
- h1: H1 humanoid robot
- Additional robots are available in the robot configurations
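Simulator Backends
- Isaac Gym (isaacgym): NVIDIA's GPU-accelerated physics simulator
- Isaac Lab (isaaclab): NVIDIA's next-generation robot learning framework, built on Isaac Sim
- MuJoCo (mujoco): Fast rigid-body physics simulator
- Genesis (genesis): Multi-physics simulator
- MJX (mjx): JAX-based reimplementation of MuJoCo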
Dependencies
Core Requirements
```bash
# Core metasim framework
pip install -e .

# RL libraries
pip install stable-baselines3
pip install torch torchvision
pip install tensordict
pip install loguru
pip install tyro
```
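After installing, a quick check that each package is importable can save a failed training launch. The sketch uses the standard-library `importlib.util.find_spec`, which locates a module without importing it. The import names for the PyPI packages are standard (`stable-baselines3` imports as `stable_baselines3`). Using `metasim` as the import name for the editable core install is an assumption; adjust it to match the repository's package.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the packages installed above; "metasim" is an assumed name.
REQUIRED_MODULES = [
    "metasim",
    "stable_baselines3",
    "torch",
    "torchvision",
    "tensordict",
    "loguru",
    "tyro",
]


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Return {module_name: found} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED_MODULES).items():
        print(f"{name:20s} {'ok' if found else 'MISSING'}")
```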