Outlines is a Python library for structured text generation. Unlike prompt-based approaches, it modifies the token sampling process itself using Finite State Machines (FSMs) to guarantee outputs that conform to a schema, regex pattern, or grammar — making invalid outputs structurally impossible.
GitHub: https://github.com/dottxt-ai/outlines
Docs: https://dottxt-ai.github.io/outlines/
Paper: “Efficient Guided Generation for LLMs” (Willard & Louf, 2023)
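The FSM idea can be pictured with a toy example. The sketch below is a hand-written illustration, not Outlines' internals: it hard-codes a character-level automaton for the pattern `\d+(\.\d+)?` and uses it to decide which tokens of a made-up vocabulary may be sampled next. The names `step`, `walk`, `allowed_tokens`, and `VOCAB` are hypothetical.

```python
# Toy FSM-guided token masking (illustrative only, not Outlines internals).
# The automaton below recognises the regex \d+(\.\d+)? one character at a time.
DIGITS = "0123456789"
ACCEPTING = {1, 3}  # states in which the text so far is a complete match


def step(state, ch):
    """Return the next state, or None if `ch` is not allowed in `state`."""
    if state == 0:  # start: a digit is required
        return 1 if ch in DIGITS else None
    if state == 1:  # integer part: another digit or a decimal point
        if ch in DIGITS:
            return 1
        return 2 if ch == "." else None
    if state == 2:  # just after ".": a digit is required
        return 3 if ch in DIGITS else None
    if state == 3:  # fractional part: digits only
        return 3 if ch in DIGITS else None
    return None


def walk(state, token):
    """Feed a multi-character token through the automaton."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def allowed_tokens(state, vocab):
    """Map each token that keeps the automaton alive to the state it leads to."""
    return {tok: nxt for tok in vocab if (nxt := walk(state, tok)) is not None}


# Hypothetical vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["1", "42", ".", ".5", "3.1", "abc", "7a"]
```

At each decoding step, a guided sampler would keep only the tokens returned by `allowed_tokens` for the current state and mask the rest, so text such as `abc` can never be produced.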
Installation
# Core install
pip install outlines
# With specific model backends
pip install "outlines[transformers]" # HuggingFace Transformers
pip install "outlines[llamacpp]" # llama.cpp Python bindings
pip install "outlines[mlxlm]" # Apple Silicon (MLX)
pip install "outlines[vllm]" # vLLM high-throughput serving
pip install "outlines[openai]" # OpenAI API (regex not supported)
pip install "outlines[anthropic]" # Anthropic API
# For grammar-based generation (EBNF)
pip install lark # Required for grammar support
# GPU support (CUDA)
pip install torch --index-url https://download.pytorch.org/whl/cu121
Configuration
Model Loading
import outlines
# Transformers (local, most control)
model = outlines.models.transformers(
    "mistralai/Mistral-7B-Instruct-v0.3",
    device="cuda",  # "cpu", "cuda", "mps" (Apple)
    model_kwargs={
        "torch_dtype": "auto",
        "load_in_4bit": True,  # 4-bit quantization
    },
)
# llama.cpp (GGUF quantized models, low memory)
model = outlines.models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,  # -1 = all layers on GPU
)
# MLX (Apple Silicon — fast, native)
model = outlines.models.mlxlm("mlx-community/Mistral-7B-Instruct-v0.3-4bit")
# vLLM (high-throughput serving)
model = outlines.models.vllm("mistralai/Mistral-7B-Instruct-v0.3")
# OpenAI (regex not supported — only JSON/choice)
model = outlines.models.openai("gpt-4o-mini")
Sampler Configuration
from outlines.samplers import greedy, multinomial, beam_search
sampler = greedy() # Deterministic (temp=0)
sampler = multinomial(samples=1, temperature=0.7, top_p=0.9, top_k=50)
sampler = beam_search(beams=5) # Beam search (high quality, slow)
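To show what these sampler parameters control, here is a minimal pure-Python sketch of greedy selection, temperature scaling, and top-k/top-p filtering over a small list of logits. It is an illustration of the general techniques under simple assumptions, not the code Outlines runs; the function names are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the index with the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Keep the top_k most likely indices, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalise what is left."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    return {i: probs[i] / total for i in order}
```

A multinomial sampler would then draw from the filtered distribution, for example with `random.choices(list(d), weights=list(d.values()))`, while greedy decoding skips the draw entirely.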
Core API
Generator Types
| Generator | Function | Use Case |
|---|---|---|
| Text | outlines.generate.text(model) | Unconstrained generation |
| Regex | outlines.generate.regex(model, pattern) | Pattern-constrained output |
| Choice | outlines.generate.choice(model, choices) | Discrete classification |
| JSON | outlines.generate.json(model, schema) | Pydantic model or JSON schema |
| Grammar | outlines.generate.cfg(model, grammar) | EBNF grammar constraints |
| Format | outlines.generate.format(model, type) | Python type (int, float, bool) |
| FSM | outlines.generate.fsm(model, fsm) | Custom interegular FSM |
Generator Call Signatures
| Argument | Type | Description |
|---|---|---|
| prompts | str or list[str] | Input prompt(s) |
| max_tokens | int | Maximum tokens to generate |
| stop_at | str or list[str] | Stop sequences |
| sampler | Sampler | Sampling strategy (default: multinomial); typically supplied when building the generator, e.g. outlines.generate.text(model, sampler=sampler) |
| kv_cache | varies | KV cache object for session reuse |
Advanced Usage
JSON Generation from Pydantic Models
import outlines
from pydantic import BaseModel, Field
from enum import Enum
class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    sentiment: Sentiment
    score: float = Field(ge=0.0, le=10.0)
    summary: str = Field(max_length=100)
    pros: list[str]
    cons: list[str]

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
generator = outlines.generate.json(model, Review)

# Generate: the output is constrained to the Review schema and returned as a Review instance
# (a max_tokens cap that is too low can still cut the JSON short)
review = generator(
    "Review: Great battery life but the camera is disappointing.",
    max_tokens=256,
)
print(review.sentiment)  # Sentiment.POSITIVE or similar
print(review.score)  # Always a float 0.0–10.0
print(type(review))  # <class '__main__.Review'>
JSON from Raw Schema
import json
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}
generator = outlines.generate.json(model, json.dumps(schema))
result = generator("Extract info: Alice is a 28-year-old Python and Rust developer.")
# result is a dict parsed from JSON that matches the schema
print(result)  # e.g. {'name': 'Alice', 'age': 28, 'skills': ['Python', 'Rust']}
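Even with constrained generation, a cheap post-check on the returned dict can be useful, for instance to confirm numeric bounds. The helper below is a hypothetical, standard-library-only sketch that covers just the keywords used in the schema above (`required`, `type`, `minimum`, `maximum`); it is not a full JSON Schema validator.

```python
# Minimal post-check for a generated dict (hypothetical helper, stdlib only).
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}


def check_against_schema(data, schema):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for key in schema.get("required", []):
        if key not in data:
            problems.append(f"missing required field: {key}")
    for key, spec in schema.get("properties", {}).items():
        if key not in data:
            continue
        value = data[key]
        expected = TYPE_MAP.get(spec.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric fields
        bool_as_number = isinstance(value, bool) and spec.get("type") in ("integer", "number")
        if expected is not None and (bool_as_number or not isinstance(value, expected)):
            problems.append(f"{key}: expected {spec['type']}")
            continue
        if "minimum" in spec and value < spec["minimum"]:
            problems.append(f"{key}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            problems.append(f"{key}: above maximum {spec['maximum']}")
    return problems
```

Calling `check_against_schema(result, SCHEMA)` on a generated dict returns an empty list when nothing is wrong, which makes it easy to log or retry on the rare bad case.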
Regex-Guided Generation
import outlines
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
# Phone number extraction
phone_gen = outlines.generate.regex(model, r"\+?[1-9]\d{1,14}")
phone = phone_gen("Contact number for the office:")
print(phone) # "+14155552671" — always matches pattern
# Date extraction (YYYY-MM-DD)
date_gen = outlines.generate.regex(model, r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
date = date_gen("The event is scheduled for:")
print(date) # "2025-06-15" — always a valid date format
# IP address
ip_gen = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)",
)
ip = ip_gen("Server IP address:")
print(ip) # Always a syntactically valid IP
# Structured log line
log_gen = outlines.generate.regex(
    model,
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (INFO|WARN|ERROR) \w+: .{1,80}",
)
log = log_gen("Generate a sample log entry:")
print(log) # "2025-01-15 14:23:01 ERROR database: Connection timeout exceeded"
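Before handing a pattern to a generator, it can help to try it against a few strings with Python's `re.fullmatch`, which mirrors the whole-output matching described in this document. The sketch below reuses the date and IP patterns from above; the helper names `matches` and `is_real_date` are hypothetical. It also shows that a regex only guarantees the format: `2025-02-31` matches the date pattern even though no such day exists.

```python
import re
from datetime import date

DATE_PATTERN = r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])"
IP_PATTERN = r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"


def matches(pattern, text):
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def is_real_date(text):
    """True if `text` is an ISO date that exists on the calendar."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


# Format check passes, calendar check fails: the regex cannot know February has no 31st.
print(matches(DATE_PATTERN, "2025-02-31"), is_real_date("2025-02-31"))
```

A semantic check such as `is_real_date` is a reasonable second step whenever the pattern is looser than the domain it describes.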
Choice (Classification)
import outlines
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
# Three-way sentiment classification: always returns exactly one of the choices
sentiment_gen = outlines.generate.choice(model, ["positive", "negative", "neutral"])
label = sentiment_gen("Classify: 'This product is absolutely fantastic!'")
print(label) # "positive"
# Multi-class routing
route_gen = outlines.generate.choice(
    model,
    ["billing", "technical_support", "sales", "returns", "general"],
)
intent = route_gen("I need to update my credit card on file.")
print(intent) # "billing"
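Conceptually, a choice constraint behaves like a regex alternation over the escaped labels. The sketch below builds such a pattern with the standard library; it illustrates the idea under that assumption and does not claim to be how Outlines implements `choice`. The helper name `choices_to_regex` is hypothetical.

```python
import re


def choices_to_regex(choices):
    """Build an alternation that matches exactly one of the given labels."""
    if not choices:
        raise ValueError("at least one choice is required")
    # re.escape keeps characters such as "+" or "." from acting as regex operators
    return "(" + "|".join(re.escape(c) for c in choices) + ")"


pattern = choices_to_regex(["billing", "technical_support", "c++ help"])
print(re.fullmatch(pattern, "billing") is not None)
```

Labels that share a prefix (for example `sales` and `sales_eu`) remain distinguishable because the whole output must match one alternative.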
Grammar-Based Generation (EBNF/CFG)
import outlines
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
# Define an arithmetic expression grammar (Lark's EBNF-style syntax)
arithmetic_grammar = r"""
start: expr
expr: term (("+"|"-") term)*
term: factor (("*"|"/") factor)*
factor: NUMBER | "(" expr ")"
NUMBER: /\d+(\.\d+)?/
%ignore " "
"""
gen = outlines.generate.cfg(model, arithmetic_grammar)
expr = gen("Write an arithmetic expression for the area of a circle with r=5:")
print(expr) # "3.14 * 5 * 5" — always valid per grammar
# Simple CSV grammar (fields must be non-empty: Lark's lexer rejects
# terminals that can match the empty string)
csv_grammar = r"""
start: row ("\n" row)*
row: field ("," field)*
field: /[^,\n]+/
"""
csv_gen = outlines.generate.cfg(model, csv_grammar)
csv_data = csv_gen("Generate 3 rows of name, age, city data:")
print(csv_data)
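To get a feel for which strings the arithmetic grammar above admits, the following standard-library sketch hand-codes a recursive-descent recognizer for the same three rules (`expr`, `term`, `factor`). It is a stand-in written for illustration, not the Lark parser Outlines relies on, and the names `tokenize` and `accepts` are hypothetical.

```python
import re

# One token: a number or a single operator/parenthesis, with leading spaces skipped
TOKEN = re.compile(r" *(\d+(?:\.\d+)?|[-+*/()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip(" ")
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def accepts(text):
    """True if `text` is derivable from: expr -> term ((+|-) term)*, etc."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():  # factor: NUMBER | "(" expr ")"
        nonlocal pos
        tok = peek()
        if tok is None:
            return False
        if tok == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        if tok[0].isdigit():
            pos += 1
            return True
        return False

    def term():  # term: factor (("*"|"/") factor)*
        nonlocal pos
        if not factor():
            return False
        while peek() in ("*", "/"):
            pos += 1
            if not factor():
                return False
        return True

    def expr():  # expr: term (("+"|"-") term)*
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):
            pos += 1
            if not term():
                return False
        return True

    return expr() and pos == len(tokens)
```

Note that the grammar has no unary minus, so `accepts("-5")` is false; gaps like this are easier to spot with a quick recognizer than after compiling a generator.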
Python Type Constraints
import outlines
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
# Integer output
int_gen = outlines.generate.format(model, int)
count = int_gen("How many planets are in the solar system?")
print(count, type(count)) # 8 <class 'int'>
# Float output
float_gen = outlines.generate.format(model, float)
prob = float_gen("Probability of rain tomorrow (0.0 to 1.0):")
print(prob, type(prob)) # 0.65 <class 'float'>
# Boolean output
bool_gen = outlines.generate.format(model, bool)
is_spam = bool_gen("Is this spam? 'Win a free iPhone now!'")
print(is_spam) # True
Batch Generation
import outlines
from pydantic import BaseModel, Field

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

class Classification(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)

gen = outlines.generate.json(model, Classification)

# Process batch of inputs
prompts = [
    "Classify: Great product, highly recommend!",
    "Classify: Terrible, broke after one day.",
    "Classify: It arrived on time.",
]
results = gen(prompts, max_tokens=64)  # Returns list of Classification objects
for prompt, result in zip(prompts, results):
    print(f"{result.label} ({result.confidence:.2f}): {prompt[:40]}")
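When the prompt list is larger than memory allows in a single call, one option is to split it into fixed-size chunks and call the generator once per chunk. The sketch below shows that pattern with hypothetical helpers (`chunked`, `run_in_batches`) and a stand-in `fake_generator` in place of a real Outlines generator, so it runs without a model.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(generator, prompts, batch_size, **kwargs):
    """Call `generator` on each chunk and return the results in prompt order."""
    results = []
    for batch in chunked(prompts, batch_size):
        results.extend(generator(batch, **kwargs))
    return results


def fake_generator(prompts, max_tokens=None):
    """Stand-in for a real generator: returns one result per prompt."""
    return [p.upper() for p in prompts]


print(run_in_batches(fake_generator, ["a", "b", "c"], batch_size=2, max_tokens=8))
```

The right `batch_size` depends on model size, sequence length, and available memory, so it is best found by measurement.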
Common Workflows
Invoice Data Extraction
import outlines
from pydantic import BaseModel
from typing import Optional
class InvoiceData(BaseModel):
    vendor_name: str
    invoice_number: str
    total_amount: float
    currency: str
    due_date: Optional[str] = None
    line_items: list[str] = []

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
extractor = outlines.generate.json(model, InvoiceData)

def extract_invoice(raw_text: str) -> InvoiceData:
    prompt = f"""Extract invoice data from the following text:
{raw_text}
Respond with structured data."""
    return extractor(prompt, max_tokens=512)
Constrained Agent Actions
from typing import Literal
from pydantic import BaseModel

class AgentAction(BaseModel):
    # Literal restricts the field to these four values in the schema itself
    action: Literal["search", "calculate", "respond", "escalate"]
    target: str
    reasoning: str

# `model` is a model loaded as in the earlier examples
gen = outlines.generate.json(model, AgentAction)

def decide_action(user_message: str) -> AgentAction:
    prompt = f"""Given this user message, decide what action to take:
User: {user_message}
Choose action from: search, calculate, respond, escalate"""
    return gen(prompt, max_tokens=128)
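Once an action object comes back, it still has to be routed to code that does the work. A small dispatch table is one way to do this; the sketch below uses a dataclass named `Action` as a stand-in for the model output, and the handler functions are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Stand-in for the generated action object (same three fields)."""
    action: str
    target: str
    reasoning: str


# Hypothetical handlers; real ones would call a search API, a calculator, etc.
def do_search(target):
    return f"searching for {target}"


def do_calculate(target):
    return f"calculating {target}"


def do_respond(target):
    return f"responding about {target}"


def do_escalate(target):
    return f"escalating {target} to a human"


HANDLERS = {
    "search": do_search,
    "calculate": do_calculate,
    "respond": do_respond,
    "escalate": do_escalate,
}


def dispatch(action):
    """Route the action to its handler; fail loudly on anything unexpected."""
    handler = HANDLERS.get(action.action)
    if handler is None:
        raise ValueError(f"unknown action: {action.action!r}")
    return handler(action.target)


print(dispatch(Action("search", "refund policy", "user asked about refunds")))
```

Keeping the explicit `ValueError` is a deliberate choice: if the schema and the handler table ever drift apart, the mismatch surfaces immediately instead of being silently ignored.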
Tips and Best Practices
| Topic | Recommendation |
|---|---|---|
| FSM compilation | First call compiles the FSM; subsequent calls on same generator are fast — reuse generators |
| Pydantic v2 | Outlines uses Pydantic v2 schemas; ensure models use v2 syntax |
| Optional fields | Use Optional[T] = None for fields the model might not have data for |
| Max tokens | Always set max_tokens as a safety cap; without it, generation ends only at EOS or a stop sequence. A cap that is too low can truncate the output before the structure is complete |
| Grammar complexity | Very complex grammars (large EBNF) can be slow to compile; cache the generator object |
| Regex anchoring | Outlines regexes are implicitly anchored to the full output — no need for ^...$ |
| Quantization | 4-bit/8-bit models work with FSM sampling; quality may drop slightly |
| OpenAI limits | OpenAI backend only supports json and choice generators, not regex/grammar |
| Batching | Pass a list of prompts for batched generation; this is usually faster than issuing the same prompts one call at a time |
| Model choice | Instruction-tuned models produce better structured outputs than base models |
| Sampling | Use greedy() for deterministic/reproducible outputs; multinomial for variety |
| vLLM integration | Use vLLM backend with guided_decoding_backend="outlines" for production serving |