RL Infrastructure Overview#
⚠️ CONSTRUCTION WARNING ⚠️
This RL infrastructure is under active construction. Features, APIs, and interfaces may change significantly; use at your own discretion and expect breaking changes.
Available Scripts#
All scripts are located in get_started/rl/:
1. PPO Training (0_ppo.py)#
A direct implementation that trains Stable Baselines3 PPO on RLTaskEnv.
Usage:
# Basic usage with default settings
python get_started/rl/0_ppo.py
# Custom task and robot
python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym
# Adjust environment count and headless mode
python get_started/rl/0_ppo.py --num-envs 256 --headless
# Different simulators
python get_started/rl/0_ppo.py --sim mujoco
python get_started/rl/0_ppo.py --sim genesis
python get_started/rl/0_ppo.py --sim isaaclab
Arguments:
--task: Task name (default: reach_origin)
--robot: Robot type (default: franka)
--num-envs: Number of parallel environments (default: 128)
--sim: Simulator backend (isaacgym, isaaclab, mujoco, genesis, mjx)
--headless: Run without GUI (flag)
Outputs:
Model saved to:
get_started/output/rl/0_ppo_reaching_{sim}
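Once training finishes, the saved policy can be reloaded with Stable Baselines3. The snippet below is a minimal sketch; the concrete path (which simulator suffix, whether a .zip extension is present) depends on your run and is only illustrative here.
from stable_baselines3 import PPO
# Hypothetical path: replace the simulator suffix with the one you trained with.
model_path = "get_started/output/rl/0_ppo_reaching_mujoco"
model = PPO.load(model_path)
# Given an observation from your environment, query the trained policy:
# action, _states = model.predict(obs, deterministic=True)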
2. PPO Training with Gym Interface (0_ppo_gym.py)#
Uses a Gymnasium-compatible interface for cleaner integration with standard RL tooling.
Usage:
# Basic training
python get_started/rl/0_ppo_gym.py
# With different backend
python get_started/rl/0_ppo_gym.py --sim mjx --device cuda
# Custom configuration
python get_started/rl/0_ppo_gym.py --task reach_origin --robot franka --num-envs 64
Arguments:
--task: Task name (default: reach_origin)
--robot: Robot type (default: franka)
--num-envs: Number of environments (default: 128)
--sim: Simulator (isaaclab, isaacgym, mujoco, genesis, mjx)
--headless: Headless mode (flag)
--device: Device (cuda, cpu)
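For reference, training Stable Baselines3 PPO on any Gymnasium-compatible environment follows the same general pattern as this script. The sketch below is generic: the environment id is illustrative and is not the metasim API.
import gymnasium as gym
from stable_baselines3 import PPO
# Illustrative only: any Gymnasium-compatible environment works the same way.
env = gym.make("Pendulum-v1")
model = PPO("MlpPolicy", env, verbose=1, device="cuda")
model.learn(total_timesteps=100_000)
model.save("ppo_gym_example")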
3. Fast TD3 Training (1_fttd3.py)#
An advanced TD3 implementation with distributional critics and several training optimizations; a sketch of a distributional critic head follows the configuration example below.
Usage:
# Basic training
python get_started/rl/1_fttd3.py
# The script uses a CONFIG dictionary for configuration
# Key parameters can be modified in the CONFIG section
Key Configuration Options:
CONFIG = {
"sim": "mjx", # Simulator backend
"robots": ["h1"], # Robot type
"task": "humanoid.run", # Task name
"num_envs": 1024, # Number of environments
"total_timesteps": 1500, # Training timesteps
"batch_size": 32768, # Batch size
"learning_rate": 0.0003, # Learning rate
"use_wandb": False, # Weights & Biases logging
}
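The "distributional critics" mentioned above predict a distribution over returns rather than a single Q-value. The PyTorch sketch below shows a categorical (C51-style) critic head; it is illustrative only, and its layer sizes and atom range are assumptions rather than the exact architecture used by 1_fttd3.py.
import torch
import torch.nn as nn

class CategoricalQCritic(nn.Module):
    """Categorical distributional critic: outputs a probability
    distribution over a fixed set of return atoms (C51-style)."""

    def __init__(self, obs_dim, act_dim, num_atoms=101, v_min=-10.0, v_max=10.0):
        super().__init__()
        self.register_buffer("atoms", torch.linspace(v_min, v_max, num_atoms))
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, num_atoms),
        )

    def forward(self, obs, act):
        logits = self.net(torch.cat([obs, act], dim=-1))
        probs = torch.softmax(logits, dim=-1)       # distribution over return atoms
        q_value = (probs * self.atoms).sum(dim=-1)  # expected return, for convenience
        return probs, q_value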
Environment Integration#
Supported Tasks#
reach_origin: Basic reaching task
humanoid.run: Humanoid locomotion
Custom tasks via task registry
Supported Robots#
franka: Franka Panda arm
h1: H1 humanoid robot
Additional robots available in robot configurations
Simulator Backends#
Isaac Gym: NVIDIA’s physics simulation
Isaac Lab: Next-generation Isaac simulation
MuJoCo: Fast physics simulation
Genesis: Multi-physics simulation
MJX: JAX-based MuJoCo implementation
Dependencies#
Core Requirements#
# Core metasim framework
pip install -e .
# RL libraries
pip install stable-baselines3
pip install torch torchvision
pip install tensordict
pip install loguru
pip install tyro
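After installing, a quick sanity check that the core RL dependencies import correctly (these are the standard import names for the libraries listed above):
# Quick sanity check for the RL dependencies listed above.
import stable_baselines3
import torch
import tensordict
import loguru
import tyro

print("stable-baselines3:", stable_baselines3.__version__)
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())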