PPO

RoboVerse provides two PPO implementations with different features and use cases:

1. Stable-Baselines3 PPO (Recommended for Beginners)

Built on Stable-Baselines3, this implementation offers a higher-level, user-friendly interface with comprehensive configuration options.

Usage

# Basic PPO training with Franka robot
python get_started/rl/0_ppo.py --task reach_origin --robot franka --sim isaacgym

# PPO with Gym interface
python get_started/rl/0_ppo_gym_style.py --sim mjx --num-envs 256

Configuration

The file header of get_started/rl/0_ppo.py lists the available configuration options, including:

Task selection (--task)
Robot type (--robot)
Simulator backend (--sim)
Environment settings
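
To compare simulator backends, the documented command can be assembled programmatically. The sketch below is only an illustration: `build_ppo_command` is a hypothetical helper, not part of RoboVerse, and it assumes that `get_started/rl/0_ppo.py` accepts each listed `--sim` value and that the corresponding backend is installed.

```python
import shlex


def build_ppo_command(task, robot, sim, script="get_started/rl/0_ppo.py"):
    """Return the argv list for one PPO training run (hypothetical helper)."""
    return ["python", script, "--task", task, "--robot", robot, "--sim", sim]


# The backends are the two that appear in this page's examples. Whether
# 0_ppo.py supports both of them is an assumption.
commands = [
    build_ppo_command("reach_origin", "franka", sim)
    for sim in ("isaacgym", "mjx")
]

for cmd in commands:
    # shlex.join renders the argv list as a copy-pasteable shell command.
    print(shlex.join(cmd))
```

Each argv list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.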

2. CleanRL PPO

Based on CleanRL, this implementation takes a minimal, educational approach: the PPO algorithm is implemented directly rather than wrapped behind a library interface.
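
For readers new to the algorithm, the core of a direct PPO implementation is the clipped surrogate policy loss. The following is a minimal pure-Python sketch of that loss, not the code in `roboverse_learn/rl/clean_rl/ppo.py`. The function name `ppo_clip_loss` and the default `clip_coef=0.2` are assumptions chosen for illustration.

```python
import math


def ppo_clip_loss(new_logps, old_logps, advantages, clip_coef=0.2):
    """Mean clipped surrogate policy loss, to be minimized.

    new_logps / old_logps: log-probabilities of the taken actions under
    the current policy and the policy that collected the data.
    """
    losses = []
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_coef, 1 + clip_coef].
        clipped = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
        # PPO maximizes min(ratio * adv, clipped * adv); negate for a loss.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # The ratio is 2 here, so with a positive advantage it is clipped at 1.2.
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

Clipping removes the incentive to move the policy far from the one that collected the data in a single update. A full PPO loss also includes a value-function term and usually an entropy bonus, which this sketch omits.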

Usage

# CleanRL PPO with RoboVerse environment
python roboverse_learn/rl/clean_rl/ppo.py --task reach_origin --robot franka --sim mjx --num_envs 2048

Configuration

The file header of roboverse_learn/rl/clean_rl/ppo.py lists the available configuration options, including:

Task selection (--task)
Robot type (--robot)
Simulator backend (--sim)
Training hyperparameters (--num_envs, --learning_rate, etc.)
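
As a rough picture of how flags like these map to typed settings, here is a hypothetical `argparse` sketch. It is not the parser used by `roboverse_learn/rl/clean_rl/ppo.py`, which may define its options differently. The defaults for `--task`, `--robot`, `--sim`, and `--num_envs` are copied from the usage example above, and the `--learning_rate` default is an assumed placeholder.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags listed above."""
    parser = argparse.ArgumentParser(description="Illustrative PPO training flags")
    parser.add_argument("--task", default="reach_origin", help="task to train on")
    parser.add_argument("--robot", default="franka", help="robot type")
    parser.add_argument("--sim", default="mjx", help="simulator backend")
    parser.add_argument("--num_envs", type=int, default=2048,
                        help="number of parallel environments")
    # Assumed placeholder value, not taken from the RoboVerse script.
    parser.add_argument("--learning_rate", type=float, default=3e-4,
                        help="optimizer step size")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--sim", "isaacgym", "--num_envs", "256"])
print(args.task, args.robot, args.sim, args.num_envs, args.learning_rate)
```

Flags that are not passed keep their defaults, so a short command line is enough for a first run.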

Quick Start Examples

For detailed tutorials and infrastructure setup:

Infrastructure Overview: See RL Infrastructure for the complete setup
Quick Examples: See Quick Start Examples for ready-to-run commands
