rLLM

Train your AI agents with RL. Any framework. Minimal code changes.

rLLM is an open-source framework for training AI agents with reinforcement learning. Swap in a tracked client, define a reward function, and let RL handle the rest — no matter what agent framework you use.

Core Features

  • Works with any agent framework — LangGraph, SmolAgent, Strands, OpenAI Agents SDK, Google ADK, or plain openai.OpenAI. Just swap the client. 🔌
  • Near-zero code changes — Add @rllm.rollout to wrap your agent code, and rLLM traces every LLM call automatically. 🪄
  • CLI-first workflow — Eval and train from the command line with 50+ built-in benchmarks. rllm eval gsm8k just works. ⚡
  • Battle-tested results — rLLM-trained agents beat models 50x their size (4B → outperforms 235B on finance, 1.5B → surpasses O1-Preview on math). 📈
  • Multiple RL algorithms — GRPO, REINFORCE, RLOO, rejection sampling, and more. 🧠
  • Two training backends — verl for distributed multi-GPU training, tinker for single-machine / CPU setups. Same API either way. 🔧

Read more on our documentation site.

Installation

rLLM requires Python >= 3.10 (3.11 if using tinker). You can install it directly via pip or build it from source.

uv pip install "rllm @ git+https://github.com/rllm-org/rllm.git"

This installs the dependencies for the rllm CLI, which uses Tinker as the training backend.

To use verl as the training backend (GPU machine required), install via

# For distributed GPU training (verl + vLLM/SGLang)
uv pip install "rllm[verl] @ git+https://github.com/rllm-org/rllm.git"

For building from source or Docker, see the installation guide.

Quickstart

Option A: CLI (no code needed)

# 1. Configure your model provider
rllm model setup

# 2. Evaluate on a benchmark
rllm eval gsm8k

# 3. Train with RL
rllm train gsm8k

Option B: Python API

Define a rollout (your agent) and an evaluator (your reward function), then hand them to the trainer:

# my_flow.py
from openai import OpenAI
import rllm
from rllm.experimental.eval.types import AgentConfig, Task
from rllm.types import Episode, Trajectory

@rllm.rollout
def solve(task: Task, config: AgentConfig) -> Episode:
    client = OpenAI(base_url=config.base_url, api_key="EMPTY")
    response = client.chat.completions.create(
        model=config.model,
        messages=[{"role": "user", "content": task.data["question"]}],
    )
    answer = response.choices[0].message.content or ""
    return Episode(
        trajectories=[Trajectory(name="solver", steps=[])],
        artifacts={"answer": answer},
    )
# my_evaluator.py
import rllm
from rllm.experimental.eval.types import EvalOutput, Signal, _extract_agent_answer
from rllm.types import Episode

@rllm.evaluator
def score(task: dict, episode: Episode) -> EvalOutput:
    answer = _extract_agent_answer(episode)
    is_correct = answer.strip() == task["ground_truth"].strip()
    reward = 1.0 if is_correct else 0.0
    return EvalOutput(reward=reward, is_correct=is_correct,
                      signals=[Signal(name="accuracy", value=reward)])
# train.py
from rllm.experimental.unified_trainer import AgentTrainer
from my_flow import solve
from my_evaluator import score

# `config` and `dataset` come from your own setup (see the cookbooks below)
trainer = AgentTrainer(
    backend="tinker",
    agent_flow=solve,
    evaluator=score,
    config=config,
    train_dataset=dataset,
)
trainer.train()

During training, config.base_url points to a gateway that transparently captures token IDs and logprobs — your agent code stays the same for eval and training.

See the cookbooks for complete working examples (single-turn VLM solver, multi-agent solver-judge, and more).

Architecture

rLLM follows a simple pipeline: run your agent → collect traces → compute rewards → update the model.

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Your Agent  │───▶│    Traces    │───▶│   Rewards    │───▶│  RL Update   │
│  (any code)  │    │ (auto-logged)│    │ (your logic) │    │  (GRPO etc.) │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘

Your agent runs as-is — rLLM's SDK intercepts LLM calls and structures them into Episodes (one task) containing Trajectories (one agent run) made of Steps (one LLM call). A reward function scores the result, and the RL algorithm updates the model weights. The same agent code works for both eval and training.
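The Episode → Trajectory → Step nesting can be pictured with plain dataclasses. These are simplified stand-ins for illustration only — the real types live in rllm.types and carry more fields than shown here (e.g. the token IDs and logprobs captured during training):

```python
from dataclasses import dataclass, field

# Simplified stand-ins for rllm.types — illustrative only.
@dataclass
class Step:            # one LLM call
    prompt: str
    completion: str

@dataclass
class Trajectory:      # one agent run
    name: str
    steps: list[Step] = field(default_factory=list)

@dataclass
class Episode:         # one task
    trajectories: list[Trajectory] = field(default_factory=list)
    artifacts: dict = field(default_factory=dict)

# One task, one solver run, one LLM call:
episode = Episode(
    trajectories=[Trajectory(name="solver", steps=[Step("2+2?", "4")])],
    artifacts={"answer": "4"},
)
```

A multi-agent flow would simply contribute several Trajectories to the same Episode, one per agent run.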

Under the hood:

  • Workflow Engine runs N parallel agent instances to collect rollouts
  • LiteLLM Proxy routes requests and captures token IDs + logprobs
  • Transform Pipeline groups trajectories for advantage computation
  • Training Backend (verl or tinker) handles the policy update
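The advantage computation in the transform pipeline can be sketched in plain Python. This is an illustrative GRPO-style calculation (mean-centered, std-normalized rewards within a group of rollouts for the same task), not rLLM's internal code:

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its group.

    All rewards come from rollouts of the same task, so the group mean
    serves as a baseline (no learned value function needed) and the
    standard deviation rescales the learning signal.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four rollouts of one task: two correct (reward 1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct rollouts get positive advantages, incorrect ones negative.
```

Grouping rollouts by task is why the transform pipeline sits between reward computation and the policy update: advantages only make sense relative to other attempts at the same task.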

Community Projects

Articles & Blog Posts

Acknowledgements

Our work is done as part of the Berkeley Sky Computing Lab. The rLLM team is generously supported by grants from Laude Institute, AWS, Hyperbolic, Fireworks AI, and Modal. Special thanks to Together AI for the research partnership and compute support.

Citation

@misc{rllm2025,
  title={rLLM: A Framework for Post-Training Language Agents},
  author={Sijun Tan and Michael Luo and Colin Cai and Tarun Venkat and Kyle Montgomery and Aaron Hao and Tianhao Wu and Arnav Balyan and Manan Roongta and Chenguang Wang and Li Erran Li and Raluca Ada Popa and Ion Stoica},
  year={2025},
  howpublished={\url{https://pretty-radio-b75.notion.site/rLLM-A-Framework-for-Post-Training-Language-Agents-21b81902c146819db63cd98a54ba5f31}},
  note={Notion Blog},
}

You may also cite our prior work DeepScaleR, DeepCoder, and DeepSWE.