Outlines

Outlines is a Python library for structured text generation. Unlike prompt-based approaches, it modifies the token sampling process itself using Finite State Machines (FSMs) to guarantee outputs that conform to a schema, regex pattern, or grammar — making invalid outputs structurally impossible.

GitHub: https://github.com/dottxt-ai/outlines
Docs: https://dottxt-ai.github.io/outlines/
Paper: “Efficient Guided Generation for Large Language Models” (Willard & Louf, 2023)
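
To make the idea of FSM-constrained sampling concrete, here is a minimal pure-Python sketch. It is not Outlines' implementation: the toy vocabulary, the hand-written state machine for the pattern `[0-9]{2}-[0-9]{2}`, and the random scores standing in for model logits are all illustrative assumptions.

```python
import random

# Toy vocabulary (an assumption); "x" is a token the pattern never allows.
VOCAB = ["0", "1", "2", "7", "-", "x", "<eos>"]
DIGITS = {"0", "1", "2", "7"}


def allowed_tokens(state: int) -> set:
    """Hand-written FSM for the pattern [0-9]{2}-[0-9]{2}.

    The state is simply the number of characters emitted so far.
    """
    if state in (0, 1, 3, 4):
        return DIGITS
    if state == 2:
        return {"-"}
    return {"<eos>"}  # state 5: pattern complete, only end-of-sequence is legal


def constrained_generate(seed: int) -> str:
    rng = random.Random(seed)
    state, pieces = 0, []
    while True:
        # Stand-in for model logits: one random score per vocabulary token.
        scores = {tok: rng.random() for tok in VOCAB}
        # Mask: only tokens the FSM allows in this state remain candidates.
        candidates = [tok for tok in VOCAB if tok in allowed_tokens(state)]
        token = max(candidates, key=scores.get)
        if token == "<eos>":
            return "".join(pieces)
        pieces.append(token)
        state += 1


print(constrained_generate(seed=0))
```

Whatever the scores are, a disallowed token such as `x` is never a candidate, so every output has the required shape.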

Installation

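# Core install (the examples below use the pre-1.0 `outlines.generate` API;
# pin a version, e.g. "outlines<1.0", if a newer release no longer provides it)
pip install outlines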

# With specific model backends
pip install "outlines[transformers]"   # HuggingFace Transformers
pip install "outlines[llamacpp]"       # llama.cpp Python bindings
pip install "outlines[mlxlm]"          # Apple Silicon (MLX)
pip install "outlines[vllm]"           # vLLM high-throughput serving
pip install "outlines[openai]"         # OpenAI API (regex not supported)
pip install "outlines[anthropic]"      # Anthropic API

# For grammar-based generation (EBNF)
pip install lark                       # Required for grammar support

# GPU support (CUDA)
pip install torch --index-url https://download.pytorch.org/whl/cu121

Configuration

Model Loading

import outlines

# Transformers (local, most control)
model = outlines.models.transformers(
    "mistralai/Mistral-7B-Instruct-v0.3",
    device="cuda",                     # "cpu", "cuda", "mps" (Apple)
    model_kwargs={
        "torch_dtype": "auto",
        "load_in_4bit": True,          # 4-bit quantization (needs bitsandbytes)
    },
)

# llama.cpp (GGUF quantized models, low memory)
model = outlines.models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,                   # -1 = all layers on GPU
)

# MLX (Apple Silicon — fast, native)
model = outlines.models.mlxlm("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# vLLM (high-throughput serving)
model = outlines.models.vllm("mistralai/Mistral-7B-Instruct-v0.3")

# OpenAI (regex not supported — only JSON/choice)
model = outlines.models.openai("gpt-4o-mini")

Sampler Configuration

from outlines.samplers import greedy, multinomial, beam_search

sampler = greedy()                      # Deterministic (temp=0)
sampler = multinomial(samples=1, temperature=0.7, top_p=0.9, top_k=50)
sampler = beam_search(beams=5)          # Beam search (high quality, slow)
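
As a rough illustration of what `temperature`, `top_k`, and `top_p` do to a next-token distribution, here is a stdlib-only sketch. The function name `filter_distribution` and the order in which the filters are applied are assumptions made for this example; Outlines' own samplers may differ in detail.

```python
import math


def filter_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token: probability} after temperature, top-k, then top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use argmax for greedy")
    # Softmax with temperature (subtract the max for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    if top_k is not None:
        ranked = ranked[:top_k]  # keep the k most likely tokens
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:  # keep the smallest prefix reaching top_p
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept
    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}
print(filter_distribution(toy_logits, temperature=0.7, top_k=3, top_p=0.9))
```

Greedy decoding corresponds to always taking the single most likely token; multinomial sampling draws from the filtered distribution.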

Core API

Generator Types

Generator	Function	Use Case
Text	`outlines.generate.text(model)`	Unconstrained generation
Regex	`outlines.generate.regex(model, pattern)`	Pattern-constrained output
Choice	`outlines.generate.choice(model, choices)`	Discrete classification
JSON	`outlines.generate.json(model, schema)`	Pydantic model or JSON schema
Grammar	`outlines.generate.cfg(model, grammar)`	EBNF grammar constraints
Format	`outlines.generate.format(model, type)`	Python type (int, float, bool)
FSM	`outlines.generate.fsm(model, fsm)`	Custom interegular FSM

Generator Call Signatures

Argument	Type	Description
`prompts`	`str` or `list[str]`	Input prompt(s)
`max_tokens`	`int`	Maximum tokens to generate
`stop_at`	`str` or `list[str]`	Stop sequences
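`sampler`	`Sampler`	Sampling strategy (default: multinomial); set when building the generator, e.g. `outlines.generate.text(model, sampler=greedy())`, rather than per call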
`kv_cache`	varies	KV cache object for session reuse

Advanced Usage

JSON Generation from Pydantic Models

import outlines
from pydantic import BaseModel, Field
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    sentiment: Sentiment
    score: float = Field(ge=0.0, le=10.0)
    summary: str = Field(max_length=100)
    pros: list[str]
    cons: list[str]

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

generator = outlines.generate.json(model, Review)

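# Generate: the decoder only emits tokens that keep the output a well-formed
# Review JSON document. Numeric bounds (ge/le) may be checked only when Pydantic
# parses the result, and a too-small max_tokens can truncate the JSON.
review = generator(
    "Review: Great battery life but the camera is disappointing.",
    max_tokens=256,
)
print(review.sentiment)   # A Sentiment member, e.g. Sentiment.NEUTRAL
print(review.score)       # A float; the 0.0 to 10.0 bounds are validated by Pydantic
print(type(review))       # <class '__main__.Review'>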

JSON from Raw Schema

import json

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}

generator = outlines.generate.json(model, json.dumps(schema))
result = generator("Extract info: Alice is a 28-year-old Python and Rust developer.")
# result is a dict parsed from JSON that follows the schema's structure
print(result)  # e.g. {'name': 'Alice', 'age': 28, 'skills': ['Python', 'Rust']}
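
Whether numeric keywords such as `minimum` and `maximum` are enforced during decoding can depend on the Outlines version, so a cheap check after generation is a reasonable safety net. The sketch below uses only the standard library; `check_person` is a hypothetical helper written for the schema above, not part of Outlines.

```python
def check_person(result: dict) -> list[str]:
    """Return a list of problems found in a generated person record."""
    problems = []
    for key in ("name", "age"):  # the schema's required fields
        if key not in result:
            problems.append(f"missing required field: {key}")
    age = result.get("age")
    if age is not None:
        is_int = isinstance(age, int) and not isinstance(age, bool)
        if not (is_int and 0 <= age <= 150):
            problems.append(f"age out of range or not an integer: {age!r}")
    skills = result.get("skills", [])
    if not (isinstance(skills, list) and all(isinstance(s, str) for s in skills)):
        problems.append("skills must be a list of strings")
    return problems


sample = {"name": "Alice", "age": 28, "skills": ["Python", "Rust"]}
print(check_person(sample))
```

An empty list means the record passed every check; anything else can be logged or used to trigger a retry.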

Regex-Guided Generation

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Phone number extraction
phone_gen = outlines.generate.regex(model, r"\+?[1-9]\d{1,14}")
phone = phone_gen("Contact number for the office:")
print(phone)  # "+14155552671" — always matches pattern

# Date extraction (YYYY-MM-DD)
date_gen = outlines.generate.regex(model, r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
date = date_gen("The event is scheduled for:")
print(date)  # e.g. "2025-06-15"; always YYYY-MM-DD shaped, but not necessarily a real calendar date

# IP address
ip_gen = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)
ip = ip_gen("Server IP address:")
print(ip)  # Always a syntactically valid IP

# Structured log line
log_gen = outlines.generate.regex(
    model,
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (INFO|WARN|ERROR) \w+: .{1,80}"
)
log = log_gen("Generate a sample log entry:")
print(log)  # "2025-01-15 14:23:01 ERROR database: Connection timeout exceeded"
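
Before building a generator, it can help to try a pattern against known strings with Python's `re.fullmatch`, which mirrors the whole-output matching described in the tips. This is only a sketch: Python's `re` and the regex engine used by Outlines do not necessarily support the same syntax, and the helper names are made up for this example. It also shows that a regex constrains shape, not meaning.

```python
import re
from datetime import datetime

DATE_PATTERN = r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])"


def matches_fully(pattern: str, text: str) -> bool:
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


def is_real_date(text: str) -> bool:
    """True if the string is an actual calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


for candidate in ["2025-06-15", "2025-02-31", "on 2025-06-15"]:
    print(candidate, matches_fully(DATE_PATTERN, candidate), is_real_date(candidate))
```

The second candidate matches the pattern yet is not a real date, so semantic checks still belong after generation.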

Choice (Classification)

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Three-way sentiment classification: always returns exactly one of the choices
sentiment_gen = outlines.generate.choice(model, ["positive", "negative", "neutral"])
label = sentiment_gen("Classify: 'This product is absolutely fantastic!'")
print(label)  # "positive"

# Multi-class routing
route_gen = outlines.generate.choice(
    model,
    ["billing", "technical_support", "sales", "returns", "general"]
)
intent = route_gen("I need to update my credit card on file.")
print(intent)  # "billing"
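
One way to think about a choice constraint is as a regex alternation over the escaped labels. The sketch below illustrates that view with the standard library only; whether Outlines builds the constraint this way internally is an assumption, and `choices_to_regex` and `accepts` are hypothetical helpers.

```python
import re


def choices_to_regex(choices):
    """Build an alternation that matches exactly one of the given labels."""
    return "(" + "|".join(re.escape(choice) for choice in choices) + ")"


def accepts(choices, text):
    """True if `text` is exactly one of the labels."""
    return re.fullmatch(choices_to_regex(choices), text) is not None


labels = ["billing", "technical_support", "sales", "returns", "general"]
print(choices_to_regex(labels))
print(accepts(labels, "billing"), accepts(labels, "bill"))
```

Escaping matters for labels containing regex metacharacters, such as `c++`.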

Grammar-Based Generation (EBNF/CFG)

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Define an arithmetic expression grammar (Lark's EBNF-style syntax)
arithmetic_grammar = r"""
    start: expr
    expr: term (("+"|"-") term)*
    term: factor (("*"|"/") factor)*
    factor: NUMBER | "(" expr ")"
    NUMBER: /\d+(\.\d+)?/
    %ignore " "
"""

gen = outlines.generate.cfg(model, arithmetic_grammar)
expr = gen("Write an arithmetic expression for the area of a circle with r=5:")
print(expr)  # "3.14 * 5 * 5" — always valid per grammar

# Simple CSV grammar (fields must be non-empty: Lark's lexer rejects
# terminals that can match the empty string)
csv_grammar = r"""
    start: row ("\n" row)*
    row: field ("," field)*
    field: /[^,\n]+/
"""

csv_gen = outlines.generate.cfg(model, csv_grammar)
csv_data = csv_gen("Generate 3 rows of name, age, city data:")
print(csv_data)
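
Output constrained to the arithmetic grammar above can be evaluated downstream without `eval`. The following sketch walks Python's own AST and accepts only numbers and the four operators from the grammar; `evaluate` is a hypothetical helper, not an Outlines function.

```python
import ast
import operator

# Only the operators that appear in the arithmetic grammar.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr: str) -> float:
    """Evaluate +, -, *, / over numeric literals and parentheses only."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    # Note: integers with leading zeros (e.g. "007") are valid per the grammar
    # but rejected by Python's parser with a SyntaxError.
    return walk(ast.parse(expr.strip(), mode="eval"))


print(evaluate("3.14 * 5 * 5"))
```

Anything outside the whitelist, such as a function call, raises `ValueError` instead of being executed.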

Python Type Constraints

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Integer output
int_gen = outlines.generate.format(model, int)
count = int_gen("How many planets are in the solar system?")
print(count, type(count))  # 8 <class 'int'>

# Float output
float_gen = outlines.generate.format(model, float)
prob = float_gen("Probability of rain tomorrow (0.0 to 1.0):")
print(prob, type(prob))   # e.g. 0.65 <class 'float'>; the 0.0 to 1.0 range is only suggested by the prompt

# Boolean output
bool_gen = outlines.generate.format(model, bool)
is_spam = bool_gen("Is this spam? 'Win a free iPhone now!'")
print(is_spam)  # True

Batch Generation

import outlines
from pydantic import BaseModel, Field

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

class Classification(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)

gen = outlines.generate.json(model, Classification)

# Process batch of inputs
prompts = [
    "Classify: Great product, highly recommend!",
    "Classify: Terrible, broke after one day.",
    "Classify: It arrived on time.",
]
results = gen(prompts, max_tokens=64)   # Returns list of Classification objects
for prompt, result in zip(prompts, results):
    print(f"{result.label} ({result.confidence:.2f}): {prompt[:40]}")
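
For long prompt lists, passing everything in a single call may exceed available memory. A minimal sketch of fixed-size batching follows; `generate_in_batches` is a hypothetical helper, and `gen` is assumed to be any callable that takes a list of prompts and returns a list of results in the same order.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def generate_in_batches(
    gen: Callable[..., list],
    prompts: Sequence[str],
    batch_size: int = 8,
    **kwargs,
) -> List:
    """Call `gen` on fixed-size batches and concatenate results in order."""
    results: List = []
    for batch in chunked(prompts, batch_size):
        results.extend(gen(list(batch), **kwargs))
    return results


# Stand-in generator for demonstration; a real one would come from outlines.generate.
def echo(batch, **kwargs):
    return [prompt.upper() for prompt in batch]


print(generate_in_batches(echo, ["a", "b", "c"], batch_size=2))
```

The batch size is a tuning knob: larger batches improve throughput until memory becomes the limit.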

Common Workflows

Structured Data Extraction Pipeline

import outlines
from pydantic import BaseModel
from typing import Optional

class InvoiceData(BaseModel):
    vendor_name: str
    invoice_number: str
    total_amount: float
    currency: str
    due_date: Optional[str] = None
    line_items: list[str] = []

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
extractor = outlines.generate.json(model, InvoiceData)

def extract_invoice(raw_text: str) -> InvoiceData:
    prompt = f"""Extract invoice data from the following text:

{raw_text}

Respond with structured data."""
    return extractor(prompt, max_tokens=512)
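
If `max_tokens` cuts generation short, parsing the truncated JSON can fail. Both `json.JSONDecodeError` and Pydantic v2's `ValidationError` subclass `ValueError`, so a small retry wrapper can catch either. This is a sketch; `call_with_retries` is a hypothetical helper, and retrying only helps when sampling is stochastic (greedy decoding would reproduce the same output).

```python
def call_with_retries(fn, *args, attempts: int = 3, **kwargs):
    """Call `fn`, retrying when it raises ValueError, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return fn(*args, **kwargs)
        except ValueError as err:  # covers JSON decode and validation errors
            last_error = err
    raise RuntimeError(f"failed after {attempts} attempts") from last_error


# Usage with the pipeline above (not executed here):
#   invoice = call_with_retries(extract_invoice, raw_text, attempts=3)
```

Raising a distinct error after the last attempt keeps the original exception available as `__cause__` for logging.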

Constrained Agent Actions

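from typing import Literal

import outlines
from pydantic import BaseModel

class AgentAction(BaseModel):
    # Literal restricts the field to these four values in the generated schema
    action: Literal["search", "calculate", "respond", "escalate"]
    target: str
    reasoning: str

# Reuses `model` from the previous example
gen = outlines.generate.json(model, AgentAction)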

def decide_action(user_message: str) -> AgentAction:
    prompt = f"""Given this user message, decide what action to take:

User: {user_message}

Choose action from: search, calculate, respond, escalate"""
    return gen(prompt, max_tokens=128)
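
The action names listed in the prompt map naturally onto a dispatch table. The sketch below is one possible shape for that step; the handler functions and their return strings are placeholders invented for illustration. An unknown action name raises an error, which guards against any value outside the expected set.

```python
def handle_search(target: str) -> str:
    return f"searching for: {target}"


def handle_calculate(target: str) -> str:
    return f"calculating: {target}"


def handle_respond(target: str) -> str:
    return f"responding about: {target}"


def handle_escalate(target: str) -> str:
    return f"escalating: {target}"


# One handler per action name mentioned in the prompt.
HANDLERS = {
    "search": handle_search,
    "calculate": handle_calculate,
    "respond": handle_respond,
    "escalate": handle_escalate,
}


def dispatch(action: str, target: str) -> str:
    """Route an action name to its handler; reject unknown names."""
    try:
        handler = HANDLERS[action]
    except KeyError:
        raise ValueError(f"unknown action: {action!r}") from None
    return handler(target)


print(dispatch("search", "refund policy"))
```

With a generated `AgentAction`, the call would be `dispatch(result.action, result.target)`.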

Tips and Best Practices

Topic	Recommendation
FSM compilation	Building a generator compiles the FSM index, which can be slow for large schemas; calls on an existing generator are fast, so build once and reuse
Pydantic v2	Outlines uses Pydantic v2 schemas; ensure models use v2 syntax
Optional fields	Use `Optional[T] = None` for fields the model might not have data for
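Max tokens	Set `max_tokens` as a safety bound with headroom: generation normally stops at EOS once the FSM reaches a final state, and output cut off by `max_tokens` can be incomplete (e.g. unterminated JSON)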
Grammar complexity	Very complex grammars (large EBNF) can be slow to compile; cache the generator object
Regex anchoring	Outlines regexes are implicitly anchored to the full output — no need for `^...$`
Quantization	4-bit/8-bit models work with FSM sampling; quality may drop slightly
OpenAI limits	OpenAI backend does not support the regex and grammar generators; use `text`, `json`, or `choice`
Batching	Pass a list of prompts to generate in one batch; on GPU backends this is typically much faster than calling the generator once per prompt
Model choice	Instruction-tuned models produce better structured outputs than base models
Sampling	Use `greedy()` for deterministic/reproducible outputs; `multinomial` for variety
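vLLM integration	When serving with vLLM directly, `guided_decoding_backend="outlines"` is a vLLM engine option (not an Outlines call) that selects Outlines for guided decoding; availability depends on the vLLM version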
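
The advice to reuse generators can be sketched with `functools.lru_cache`: build each generator once per distinct pattern or schema, then hand back the cached object. `cached_factory` is a hypothetical helper, and the stand-in builder below replaces a real call such as `outlines.generate.regex(model, pattern)`.

```python
from functools import lru_cache
from typing import Callable, Hashable, TypeVar

G = TypeVar("G")


def cached_factory(build: Callable[[Hashable], G]) -> Callable[[Hashable], G]:
    """Wrap `build` so each distinct key is built once and then reused."""

    @lru_cache(maxsize=None)
    def get(key: Hashable) -> G:
        return build(key)

    return get


builds = []  # records every pattern that actually triggered a build


def fake_build(pattern):
    # Stand-in for an expensive call like outlines.generate.regex(model, pattern).
    builds.append(pattern)
    return lambda prompt: f"<{pattern}> {prompt}"


get_generator = cached_factory(fake_build)
first = get_generator(r"\d{4}")
second = get_generator(r"\d{4}")  # served from the cache, no rebuild
print(first is second, len(builds))
```

Keys must be hashable, which holds for regex strings and for Pydantic model classes.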