# RL Infrastructure Overview > ⚠️ **CONSTRUCTION WARNING** ⚠️ > This RL infrastructure is currently under active construction. Features, APIs, and interfaces may change significantly. Use at your own discretion and expect potential breaking changes. ## Available Scripts All scripts are located in `get_started/rl/`: ### 1. PPO Training (`0_ppo.py`) Direct implementation using RLTaskEnv with Stable Baselines3 PPO. **Usage:** ```bash # Basic usage with default settings python get_started/rl/0_ppo.py # Custom task and robot python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym # Adjust environment count and headless mode python get_started/rl/0_ppo.py --num-envs 256 --headless # Different simulators python get_started/rl/0_ppo.py --sim mujoco python get_started/rl/0_ppo.py --sim genesis python get_started/rl/0_ppo.py --sim isaaclab ``` **Arguments:** - `--task`: Task name (default: `reach_origin`) - `--robot`: Robot type (default: `franka`) - `--num-envs`: Number of parallel environments (default: `128`) - `--sim`: Simulator backend (`isaacgym`, `isaaclab`, `mujoco`, `genesis`, `mjx`) - `--headless`: Run without GUI (flag) **Outputs:** - Model saved to: `get_started/output/rl/0_ppo_reaching_{sim}` ### 2. PPO Training with Gym Interface (`0_ppo_gym.py`) Uses Gymnasium-compatible interface with cleaner integration. **Usage:** ```bash # Basic training python get_started/rl/0_ppo_gym.py # With different backend python get_started/rl/0_ppo_gym.py --sim mjx --device cuda # Custom configuration python get_started/rl/0_ppo_gym.py --task reach_origin --robot franka --num-envs 64 ``` **Arguments:** - `--task`: Task name (default: `reach_origin`) - `--robot`: Robot type (default: `franka`) - `--num-envs`: Number of environments (default: `128`) - `--sim`: Simulator (`isaaclab`, `isaacgym`, `mujoco`, `genesis`, `mjx`) - `--headless`: Headless mode (flag) - `--device`: Device (`cuda`, `cpu`) ### 3. Fast TD3 Training (`1_fttd3.py`) Advanced TD3 implementation with distributional critics and various optimizations. **Usage:** ```bash # Basic training python get_started/rl/1_fttd3.py # The script uses a CONFIG dictionary for configuration # Key parameters can be modified in the CONFIG section ``` **Key Configuration Options:** ```python CONFIG = { "sim": "mjx", # Simulator backend "robots": ["h1"], # Robot type "task": "humanoid.run", # Task name "num_envs": 1024, # Number of environments "total_timesteps": 1500, # Training timesteps "batch_size": 32768, # Batch size "learning_rate": 0.0003, # Learning rate "use_wandb": False, # Weights & Biases logging } ``` ## Environment Integration ### Supported Tasks - `reach_origin`: Basic reaching task - `humanoid.run`: Humanoid locomotion - Custom tasks via task registry ### Supported Robots - `franka`: Franka Panda arm - `h1`: H1 humanoid robot - Additional robots available in robot configurations ### Simulator Backends 1. **Isaac Gym**: NVIDIA's physics simulation 2. **Isaac Lab**: Next-generation Isaac simulation 3. **MuJoCo**: Fast physics simulation 4. **Genesis**: Multi-physics simulation 5. **MJX**: JAX-based MuJoCo implementation ## Dependencies ### Core Requirements ```bash # Core metasim framework pip install -e . # RL libraries pip install stable-baselines3 pip install torch torchvision pip install tensordict pip install loguru pip install tyro ```