ART (Agent Reinforcement Trainer) Cheatsheet

ART (Agent Reinforcement Trainer) by OpenPipe is an open-source framework for training LLM-based agents with reinforcement learning, primarily GRPO. Its defining idea is a split architecture: a lightweight client runs your agent’s rollouts in your own code through an OpenAI-compatible endpoint, while a server/backend handles inference (vLLM) and training (Unsloth-powered GRPO), optionally on a separate GPU machine. This makes it well suited to multi-turn, tool-using agents that need “on-the-job” training against a reward.

Reinforcement learning can be unstable and compute-hungry. Start small, log everything, and validate a reward function on a handful of rollouts before scaling.
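One way to act on the "validate first" advice is to score a few hand-labelled outputs before any GPU time is spent. The sketch below is plain Python and does not touch ART. `sanity_check_reward` and `contains_answer` are hypothetical names invented for illustration, and the example strings are made up.

```python
from statistics import mean
from typing import Callable, Sequence


def sanity_check_reward(
    reward_fn: Callable[[str], float],
    good_outputs: Sequence[str],
    bad_outputs: Sequence[str],
    low: float = 0.0,
    high: float = 1.0,
) -> dict:
    """Score a few hand-labelled outputs and report basic health checks."""
    good = [reward_fn(o) for o in good_outputs]
    bad = [reward_fn(o) for o in bad_outputs]
    return {
        # Every score stays inside the range you intend to use.
        "in_bounds": all(low <= r <= high for r in good + bad),
        # Every known-good output beats every known-bad output.
        "separates": min(good) > max(bad),
        "good_mean": mean(good),
        "bad_mean": mean(bad),
    }


# Hypothetical reward: 1.0 if the output contains the expected answer "42".
def contains_answer(output: str) -> float:
    return 1.0 if "42" in output else 0.0


report = sanity_check_reward(
    contains_answer,
    good_outputs=["The answer is 42.", "42"],
    bad_outputs=["I don't know.", "The answer is 41."],
)
print(report)
```

If `separates` comes back false on examples you labelled yourself, the reward is unlikely to steer training in the direction you want.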

Installation

Step	Command
Install	`pip install openpipe-art`
With extras (backend)	`pip install "openpipe-art[backend]"`
uv	`uv add openpipe-art`
Requirements	A CUDA GPU for the backend (training/inference)

Core Concepts

Term	Meaning
Model	A trainable model handle (`art.TrainableModel`) registered with a backend
Backend	Where inference + training run (local GPU or remote)
Rollout	One agent episode that produces a trajectory and a reward
Trajectory	The messages/tool-calls/choices ART scores and learns from
GRPO	Group Relative Policy Optimization — the default RL algorithm
Reward	A scalar your code assigns to a trajectory (higher = better)
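To make "group relative" concrete, here is a minimal sketch of how GRPO-style advantages are commonly computed: each rollout's reward is compared with the mean of its own group and scaled by the group's standard deviation. This illustrates the general idea only. ART's exact normalization may differ, and `group_relative_advantages` is a hypothetical helper name.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Center rewards on the group mean and scale by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


mixed = group_relative_advantages([0.0, 0.0, 1.0, 1.0])
uniform = group_relative_advantages([1.0, 1.0, 1.0])
print(mixed)    # below-average rollouts are negative, above-average positive
print(uniform)  # all zeros: a group with identical rewards carries no signal
```

This is why a group needs some spread in its rewards: if every rollout for a scenario scores the same, there is nothing to prefer and the update for that group is effectively empty.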

Defining a Model

import art

model = art.TrainableModel(
    name="agent-001",
    project="my-agent",
    base_model="Qwen/Qwen2.5-7B-Instruct",
)

backend = art.LocalBackend()          # or a remote backend
await model.register(backend)         # needs an async context (notebook cell or inside `async def`)

Object	Purpose
`art.TrainableModel(...)`	The policy you are training
`art.LocalBackend()`	Run inference + training on the local GPU
`model.register(backend)`	Bind a model to a backend
`model.openai_client()`	OpenAI-compatible client for rollouts

Writing a Rollout

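import art
import weave

@weave.op
async def rollout(model: art.Model, scenario) -> art.Trajectory:
    client = model.openai_client()

    # Seed the trajectory with the prompt so ART sees the full conversation,
    # not only the model's reply.
    messages = [{"role": "user", "content": scenario.prompt}]
    traj = art.Trajectory(messages_and_choices=list(messages), reward=0.0)

    completion = await client.chat.completions.create(
        model=model.name, messages=messages,
    )
    choice = completion.choices[0]
    traj.messages_and_choices.append(choice)  # keep the Choice object, not just its text

    # content can be None (e.g. a tool-call-only reply), so fall back to "".
    traj.reward = score(choice.message.content or "", scenario)  # your reward fn
    return traj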
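The rollout leaves `score` to you. A minimal sketch of one possible implementation follows, assuming each scenario carries a reference answer. The `answer` field and the `Scenario` dataclass are assumptions made for this example (the rollout itself only uses `scenario.prompt`), and the 0.5 partial-credit value is arbitrary.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scenario:
    prompt: str
    answer: str  # hypothetical field: the reference answer for this task


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting does not affect the score."""
    return re.sub(r"\s+", " ", text).strip().lower()


def score(output: Optional[str], scenario: Scenario) -> float:
    """Return a reward in [0, 1]: exact match, partial credit, or nothing."""
    if not output:
        return 0.0  # empty or missing content earns no reward
    out, ans = _normalize(output), _normalize(scenario.answer)
    if out == ans:
        return 1.0
    if ans in out:
        return 0.5  # correct answer buried in extra text
    return 0.0


s = Scenario(prompt="What is the capital of France?", answer="paris")
print(score("  Paris ", s), score("The capital is Paris.", s), score(None, s))
```

Substring matching is easy to game (a model can list many candidate answers), so treat the partial-credit branch as a starting point and tighten it once you see real rollouts.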

Training Loop

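NUM_STEPS = 50     # illustrative values; tune for your task
GROUP_SIZE = 8

# Starting from model.get_step() lets an interrupted run resume where it left off.
for step in range(await model.get_step(), NUM_STEPS):
    groups = await art.gather_trajectory_groups(
        (art.TrajectoryGroup(rollout(model, s) for _ in range(GROUP_SIZE))
         for s in scenarios)  # scenarios: your iterable of training tasks
    )
    await model.train(groups, config=art.TrainConfig(learning_rate=1e-5))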

Call	Description
`art.TrajectoryGroup(...)`	A group of rollouts compared against each other (GRPO)
`art.gather_trajectory_groups(...)`	Run rollouts concurrently and collect groups
`model.train(groups, config=...)`	One GRPO update from the gathered groups
`art.TrainConfig(...)`	Learning rate and training hyperparameters
`model.get_step()`	Current training step (for checkpointing/resume)
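The shape of the gathering step (several rollouts per scenario, all run concurrently, results kept grouped by scenario) can be sketched with plain `asyncio`. This is a conceptual stand-in, not ART's implementation: `fake_rollout` and `gather_groups` are hypothetical names, and the rewards are random placeholders.

```python
import asyncio
import random


async def fake_rollout(scenario: str, seed: int) -> dict:
    """Stand-in for an agent episode; a real rollout would call the model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"scenario": scenario, "reward": random.Random(seed).random()}


async def gather_groups(scenarios: list, group_size: int) -> list:
    """Run group_size rollouts per scenario concurrently, grouped by scenario."""

    async def one_group(idx: int, scenario: str) -> list:
        return await asyncio.gather(
            *(fake_rollout(scenario, idx * group_size + k) for k in range(group_size))
        )

    return await asyncio.gather(*(one_group(i, s) for i, s in enumerate(scenarios)))


groups = asyncio.run(gather_groups(["a", "b", "c"], group_size=4))
print(len(groups), [len(g) for g in groups])
```

Keeping the grouping intact matters because rewards are only compared among rollouts of the same scenario.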

Reward Design Tips

Guideline	Why
Keep rewards bounded	Stabilizes GRPO advantage estimates
Reward the outcome, not the wording	Avoids reward hacking on phrasing
Add small shaping for tool success	Helps multi-step credit assignment
Use RULER for relative scoring	ART’s LLM-as-judge helper that ranks the trajectories within a group when no clean metric exists
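The first three guidelines can be combined in one small function. The sketch below is one possible shape, not something ART prescribes: the outcome dominates, tool success adds a small bonus, and the result is rescaled into [0, 1]. `shaped_reward` and its parameters are hypothetical names, and the default weight of 0.1 is an arbitrary starting value.

```python
def shaped_reward(
    task_solved: bool,
    tool_calls_ok: int,
    tool_calls_total: int,
    shaping_weight: float = 0.1,
) -> float:
    """Outcome reward plus a small tool-success bonus, rescaled into [0, 1]."""
    outcome = 1.0 if task_solved else 0.0
    # Fraction of tool calls that succeeded; 0.0 if no tools were called.
    tool_rate = tool_calls_ok / tool_calls_total if tool_calls_total else 0.0
    # Dividing by the maximum possible raw value keeps the reward bounded.
    return (outcome + shaping_weight * tool_rate) / (1.0 + shaping_weight)


solved_clean = shaped_reward(True, 3, 3)
solved_messy = shaped_reward(True, 0, 3)
failed_clean = shaped_reward(False, 3, 3)
print(solved_clean, solved_messy, failed_clean)
```

With a small weight, a failed episode with perfect tool use still scores well below a solved episode with failed tool calls, so the shaping term cannot outweigh the outcome.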

Observability

Tool	Integration
Weights & Biases	Native logging of reward/loss curves
Weave	Decorate rollouts with `@weave.op` for trace capture
Langfuse	Tracing supported for trajectory inspection
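Whichever tracker you use, it helps to reduce each step's rewards to a few scalars before logging. A minimal sketch, assuming you have the rewards as one list per group: `step_metrics` and the metric names are hypothetical, and the resulting dict is plain data that you would pass to your tracker's own logging call.

```python
from statistics import mean, pstdev


def step_metrics(group_rewards: list) -> dict:
    """Summarize one training step's rewards as loggable scalars."""
    flat = [r for group in group_rewards for r in group]
    # Groups where every rollout scored the same give GRPO nothing to compare.
    zero_var = sum(1 for group in group_rewards if max(group) == min(group))
    return {
        "reward/mean": mean(flat),
        "reward/std": pstdev(flat),
        "reward/max": max(flat),
        "groups/zero_variance_frac": zero_var / len(group_rewards),
    }


metrics = step_metrics([[0.0, 1.0, 1.0], [0.5, 0.5, 0.5]])
print(metrics)
```

A rising zero-variance fraction is worth watching: it usually means scenarios have become too easy or too hard for the current policy.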

ART vs Other RL Trainers

Aspect	ART	OpenRLHF	verl
Focus	Multi-step agents	Scalable RLHF	High-throughput RL
Architecture	Split client/server	Ray + vLLM	Ray + vLLM
Backend	vLLM + Unsloth	vLLM	vLLM
Best for	Agents trained in your own code	Large-scale RLHF pipelines	Research throughput
