Outlines

Outlines is a Python library for structured text generation. Unlike prompt-based approaches, it modifies the token sampling process itself: a Finite State Machine (FSM) built from the constraint masks out any token that would break it, so every completed output conforms to the given schema, regex pattern, or grammar. Invalid outputs become structurally impossible rather than merely unlikely.
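The masking idea can be illustrated without any model at all. The sketch below is a toy, not Outlines' implementation: the vocabulary, the hand-written DFA for the regex -?[0-9]+, and the names walk, allowed_tokens, and constrained_sample are all assumptions made up for this example, and uniform random choice stands in for model logits.

```python
import random
import re

# Toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["0", "1", "7", "42", "-", "a", "cat", " "]

# Hand-written DFA for the regex -?[0-9]+ (state 2 is the only accepting state).
DIGITS = "0123456789"
DFA = {
    0: {"-": 1, **{d: 2 for d in DIGITS}},
    1: {d: 2 for d in DIGITS},
    2: {d: 2 for d in DIGITS},
}
ACCEPTING = {2}
PATTERN = re.compile(r"-?[0-9]+")  # used only to double-check results


def walk(state, token):
    """Feed a token's characters through the DFA; None means the token is rejected."""
    for ch in token:
        state = DFA[state].get(ch)
        if state is None:
            return None
    return state


def allowed_tokens(state):
    """Token-level mask: map each permitted token to the state it leads to."""
    allowed = {}
    for token in VOCAB:
        nxt = walk(state, token)
        if nxt is not None:
            allowed[token] = nxt
    return allowed


def constrained_sample(max_tokens=5, seed=0):
    """Sample uniformly among permitted tokens (a real model would supply logits)."""
    rng = random.Random(seed)
    state, pieces = 0, []
    for _ in range(max_tokens):
        options = allowed_tokens(state)
        token = rng.choice(sorted(options))
        pieces.append(token)
        state = options[token]
        # Once the text is complete, stop with probability 0.5 (stands in for EOS).
        if state in ACCEPTING and rng.random() < 0.5:
            break
    return "".join(pieces)
```

Tokens such as "cat" or " " never reach the sampler because no DFA transition accepts them. With a budget of a single token the toy could stop at "-", which is the same truncation risk a real max_tokens limit carries.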

GitHub: https://github.com/dottxt-ai/outlines
Docs: https://dottxt-ai.github.io/outlines/
Paper: “Efficient Guided Generation for LLMs” (Willard & Louf, 2023)

Installation

# Core install
pip install outlines

# With specific model backends
pip install "outlines[transformers]"   # HuggingFace Transformers
pip install "outlines[llamacpp]"       # llama.cpp Python bindings
pip install "outlines[mlxlm]"          # Apple Silicon (MLX)
pip install "outlines[vllm]"           # vLLM high-throughput serving
pip install "outlines[openai]"         # OpenAI API (regex not supported)
pip install "outlines[anthropic]"      # Anthropic API

# For grammar-based generation (EBNF)
pip install lark                       # Required for grammar support

# GPU support (CUDA)
pip install torch --index-url https://download.pytorch.org/whl/cu121

Configuration

Model Loading

import outlines

# Transformers (local, most control)
model = outlines.models.transformers(
    "mistralai/Mistral-7B-Instruct-v0.3",
    device="cuda",                     # "cpu", "cuda", "mps" (Apple)
    model_kwargs={
        "torch_dtype": "auto",
        "load_in_4bit": True,          # 4-bit quantization (requires bitsandbytes)
    },
)

# llama.cpp (GGUF quantized models, low memory)
model = outlines.models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_ctx=4096,
    n_gpu_layers=-1,                   # -1 = all layers on GPU
)

# MLX (Apple Silicon — fast, native)
model = outlines.models.mlxlm("mlx-community/Mistral-7B-Instruct-v0.3-4bit")

# vLLM (high-throughput serving)
model = outlines.models.vllm("mistralai/Mistral-7B-Instruct-v0.3")

# OpenAI (regex not supported — only JSON/choice)
model = outlines.models.openai("gpt-4o-mini")

Sampler Configuration

from outlines.samplers import greedy, multinomial, beam_search

sampler = greedy()                      # Deterministic (temp=0)
sampler = multinomial(samples=1, temperature=0.7, top_p=0.9, top_k=50)
sampler = beam_search(beams=5)          # Beam search (high quality, slow)
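
What the greedy and multinomial options do to a logit vector can be sketched in plain Python. This is an illustration of the general techniques (temperature scaling, top-k, and top-p filtering), not Outlines' code; softmax, greedy_pick, and multinomial_pick are hypothetical names, and beam search is left out.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always return the index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def multinomial_pick(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample an index after optional top-k and top-p (nucleus) filtering."""
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most likely tokens
    if top_p is not None:
        kept, mass = [], 0.0
        for i in ranked:  # keep the smallest prefix whose mass reaches top_p
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        ranked = kept
    weights = [probs[i] for i in ranked]  # random.choices renormalizes weights
    return rng.choices(ranked, weights=weights, k=1)[0]
```

With top_k=1 the multinomial path collapses to the greedy choice, which is one way to see why greedy decoding is described as the deterministic option.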

Core API

Generator Types

| Generator | Function | Use Case |
| --- | --- | --- |
| Text | outlines.generate.text(model) | Unconstrained generation |
| Regex | outlines.generate.regex(model, pattern) | Pattern-constrained output |
| Choice | outlines.generate.choice(model, choices) | Discrete classification |
| JSON | outlines.generate.json(model, schema) | Pydantic model or JSON schema |
| Grammar | outlines.generate.cfg(model, grammar) | EBNF grammar constraints |
| Format | outlines.generate.format(model, type) | Python type (int, float, bool) |
| FSM | outlines.generate.fsm(model, fsm) | Custom interegular FSM |

Generator Call Signatures

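| Argument | Type | Description |
| --- | --- | --- |
| prompts | str or list[str] | Input prompt(s) |
| max_tokens | int | Maximum tokens to generate |
| stop_at | str or list[str] | Stop sequences |
| sampler | Sampler | Sampling strategy (default: multinomial); supplied when the generator is built, e.g. outlines.generate.text(model, sampler=greedy()) |
| kv_cache | varies | KV cache object for session reuse |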

Advanced Usage

JSON Generation from Pydantic Models

import outlines
from pydantic import BaseModel, Field
from enum import Enum

class Sentiment(str, Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

class Review(BaseModel):
    sentiment: Sentiment
    score: float = Field(ge=0.0, le=10.0)
    summary: str = Field(max_length=100)
    pros: list[str]
    cons: list[str]

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

generator = outlines.generate.json(model, Review)

# Generate — output is GUARANTEED to be a valid Review instance
review = generator(
    "Review: Great battery life but the camera is disappointing.",
    max_tokens=256,
)
print(review.sentiment)   # A Sentiment member, e.g. Sentiment.POSITIVE
print(review.score)       # Always a float 0.0–10.0
print(type(review))       # <class '__main__.Review'>

JSON from Raw Schema

import json

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0, "maximum": 150},
        "skills": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "age"],
}

# Reuses `model` from the previous example; a schema string yields a plain dict
generator = outlines.generate.json(model, json.dumps(schema))
result = generator("Extract info: Alice is a 28-year-old Python and Rust developer.")
# result is a dict parsed from JSON that matches the schema
print(result)  # e.g. {'name': 'Alice', 'age': 28, 'skills': ['Python', 'Rust']}

Regex-Guided Generation

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Phone number extraction
phone_gen = outlines.generate.regex(model, r"\+?[1-9]\d{1,14}")
phone = phone_gen("Contact number for the office:")
print(phone)  # "+14155552671" — always matches pattern

# Date extraction (YYYY-MM-DD)
date_gen = outlines.generate.regex(model, r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
date = date_gen("The event is scheduled for:")
print(date)  # "2025-06-15" — always a valid date format

# IP address
ip_gen = outlines.generate.regex(
    model,
    r"((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)"
)
ip = ip_gen("Server IP address:")
print(ip)  # Always a syntactically valid IP

# Structured log line
log_gen = outlines.generate.regex(
    model,
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (INFO|WARN|ERROR) \w+: .{1,80}"
)
log = log_gen("Generate a sample log entry:")
print(log)  # "2025-01-15 14:23:01 ERROR database: Connection timeout exceeded"
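
The patterns above constrain shape, not meaning, and they apply to the whole output. A quick standard-library check makes both points concrete; it reuses the date and log patterns from this section, and is_real_date is a hypothetical helper name.

```python
import re
from datetime import date

DATE_RE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
LOG_RE = re.compile(
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (INFO|WARN|ERROR) \w+: .{1,80}"
)


def is_real_date(text):
    """True only if text matches the pattern AND names a real calendar day."""
    if DATE_RE.fullmatch(text) is None:
        return False
    try:
        date.fromisoformat(text)  # rejects impossible days such as February 31
    except ValueError:
        return False
    return True


# fullmatch mirrors whole-output anchoring; search would accept extra text.
shape_ok = DATE_RE.fullmatch("2025-02-31") is not None  # the regex accepts it
calendar_ok = is_real_date("2025-02-31")                # the calendar does not
```

If downstream code needs semantically valid values, a post-check like this is still worth running on constrained output.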

Choice (Classification)

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Three-way sentiment classification: always returns exactly one of the choices
sentiment_gen = outlines.generate.choice(model, ["positive", "negative", "neutral"])
label = sentiment_gen("Classify: 'This product is absolutely fantastic!'")
print(label)  # "positive"

# Multi-class routing
route_gen = outlines.generate.choice(
    model,
    ["billing", "technical_support", "sales", "returns", "general"]
)
intent = route_gen("I need to update my credit card on file.")
print(intent)  # "billing"
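
One way to reason about a choice constraint is as a regex alternation over the escaped labels. The sketch below illustrates that view with the standard library only; whether a given Outlines version builds it this way internally is an assumption, and choices_to_regex and is_valid_route are hypothetical names.

```python
import re

ROUTES = ["billing", "technical_support", "sales", "returns", "general"]


def choices_to_regex(choices):
    """Build an alternation; re.escape keeps characters like '.' literal."""
    return "(" + "|".join(re.escape(c) for c in choices) + ")"


ROUTE_RE = re.compile(choices_to_regex(ROUTES))


def is_valid_route(label):
    """Whole-string match, so trailing spaces or extra words are rejected."""
    return ROUTE_RE.fullmatch(label) is not None
```

A check like is_valid_route is also handy as a guard when labels come from an unconstrained backend.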

Grammar-Based Generation (EBNF/CFG)

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Arithmetic expression grammar, written in Lark's EBNF dialect
arithmetic_grammar = r"""
    start: expr
    expr: term (("+"|"-") term)*
    term: factor (("*"|"/") factor)*
    factor: NUMBER | "(" expr ")"
    NUMBER: /\d+(\.\d+)?/
    %ignore " "
"""

gen = outlines.generate.cfg(model, arithmetic_grammar)
expr = gen("Write an arithmetic expression for the area of a circle with r=5:")
print(expr)  # "3.14 * 5 * 5" — always valid per grammar

# Simple CSV grammar; fields use + because Lark's lexer rejects zero-width terminals
csv_grammar = r"""
    start: row ("\n" row)*
    row: field ("," field)*
    field: /[^,\n]+/
"""

csv_gen = outlines.generate.cfg(model, csv_grammar)
csv_data = csv_gen("Generate 3 rows of name, age, city data:")
print(csv_data)
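
Grammar-constrained text is syntactically valid by construction, so it can be consumed by a small parser without defensive cleanup. As a sketch, here is a hand-written recursive-descent evaluator that mirrors the arithmetic grammar above rule by rule; it is independent of Outlines and Lark, and tokenize and evaluate are hypothetical names.

```python
import re

# NUMBER or a single operator/parenthesis, skipping spaces like `%ignore " "`.
TOKEN_RE = re.compile(r" *(\d+(?:\.\d+)?|[-+*/()])")
OPERATORS = {"+", "-", "*", "/", ")"}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip(" ")
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    """Parse and evaluate text according to expr / term / factor."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of input")
        pos += 1
        return token

    def factor():  # factor: NUMBER | "(" expr ")"
        token = take()
        if token == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in OPERATORS:
            raise ValueError(f"unexpected token {token!r}")
        return float(token)

    def term():  # term: factor (("*"|"/") factor)*
        value = factor()
        while peek() in ("*", "/"):
            op, rhs = take(), factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():  # expr: term (("+"|"-") term)*
        value = term()
        while peek() in ("+", "-"):
            op, rhs = take(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise ValueError(f"trailing token {peek()!r}")
    return result
```

Division by zero still raises ZeroDivisionError: the grammar guarantees syntax, not that the expression is meaningful.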

Python Type Constraints

import outlines

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

# Integer output
int_gen = outlines.generate.format(model, int)
count = int_gen("How many planets are in the solar system?")
print(count, type(count))  # 8 <class 'int'>

# Float output
float_gen = outlines.generate.format(model, float)
prob = float_gen("Probability of rain tomorrow (0.0 to 1.0):")
print(prob, type(prob))   # 0.65 <class 'float'>

# Boolean output
bool_gen = outlines.generate.format(model, bool)
is_spam = bool_gen("Is this spam? 'Win a free iPhone now!'")
print(is_spam)  # True
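
Type constraints can be pictured as a regex per type followed by a cast. The patterns below are assumptions chosen for illustration and are not the ones Outlines ships; parse_as is a hypothetical helper.

```python
import re

# Illustrative patterns only; a real library defines its own.
TYPE_PATTERNS = {
    int: r"[+-]?(0|[1-9][0-9]*)",
    float: r"[+-]?(0|[1-9][0-9]*)\.[0-9]+",
    bool: r"(True|False)",
}


def parse_as(kind, text):
    """Validate text against the pattern for `kind`, then convert it."""
    if re.fullmatch(TYPE_PATTERNS[kind], text) is None:
        raise ValueError(f"{text!r} is not a valid {kind.__name__}")
    if kind is bool:
        # bool("False") is True in Python, so compare the text instead.
        return text == "True"
    return kind(text)
```

The bool branch shows why the cast step matters: a naive bool(text) would turn every non-empty string into True.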

Batch Generation

import outlines
from pydantic import BaseModel, Field

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")

class Classification(BaseModel):
    label: str
    confidence: float = Field(ge=0.0, le=1.0)

gen = outlines.generate.json(model, Classification)

# Process batch of inputs
prompts = [
    "Classify: Great product, highly recommend!",
    "Classify: Terrible, broke after one day.",
    "Classify: It arrived on time.",
]
results = gen(prompts, max_tokens=64)   # Returns list of Classification objects
for prompt, result in zip(prompts, results):
    print(f"{result.label} ({result.confidence:.2f}): {prompt[:40]}")

Common Workflows

Structured Data Extraction Pipeline

import outlines
from pydantic import BaseModel
from typing import Optional

class InvoiceData(BaseModel):
    vendor_name: str
    invoice_number: str
    total_amount: float
    currency: str
    due_date: Optional[str] = None
    line_items: list[str] = []

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
extractor = outlines.generate.json(model, InvoiceData)

def extract_invoice(raw_text: str) -> InvoiceData:
    prompt = f"""Extract invoice data from the following text:

{raw_text}

Respond with structured data."""
    return extractor(prompt, max_tokens=512)
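
If max_tokens is too small, generation can stop before the JSON object is closed. One defensive pattern is to retry with a larger budget. The sketch below uses a fake generator so it runs without a model: fake_generate, extract_with_retry, and the budget values are hypothetical, and how a real Outlines generator reports truncation may differ by version.

```python
import json

SAMPLE = json.dumps({
    "vendor_name": "Acme",
    "invoice_number": "INV-1",
    "total_amount": 12.5,
    "currency": "USD",
})


def fake_generate(prompt, max_tokens):
    """Stand-in for a generator call: cuts the output after max_tokens characters."""
    return SAMPLE[:max_tokens]


def extract_with_retry(prompt, budgets=(32, 64, 256), generate=fake_generate):
    """Try increasing budgets until the output parses as JSON."""
    last_error = None
    for budget in budgets:
        raw = generate(prompt, budget)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = exc  # truncated output; try a larger budget
    raise RuntimeError("output never parsed; raise the token budget") from last_error
```

In a real pipeline the generate argument would wrap the extractor call, and the except clause would catch whatever error the parsing layer raises.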

Constrained Agent Actions

from typing import Literal

import outlines
from pydantic import BaseModel

class AgentAction(BaseModel):
    # Literal restricts the field to these four values at generation time
    action: Literal["search", "calculate", "respond", "escalate"]
    target: str
    reasoning: str

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.3")
gen = outlines.generate.json(model, AgentAction)

def decide_action(user_message: str) -> AgentAction:
    prompt = f"""Given this user message, decide what action to take:

User: {user_message}

Choose action from: search, calculate, respond, escalate"""
    return gen(prompt, max_tokens=128)
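
Once an action object comes back, the calling code usually maps its action field to a handler and rejects anything unexpected. A minimal dispatch sketch follows; the Action dataclass stands in for AgentAction so the example runs without Pydantic or a model, and the handler functions are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Stand-in with the same three fields as AgentAction."""
    action: str
    target: str
    reasoning: str


def do_search(target):
    return f"searching for {target}"


def do_calculate(target):
    return f"calculating {target}"


def do_respond(target):
    return f"responding about {target}"


def do_escalate(target):
    return f"escalating {target}"


HANDLERS = {
    "search": do_search,
    "calculate": do_calculate,
    "respond": do_respond,
    "escalate": do_escalate,
}


def dispatch(decision):
    """Route a decision to its handler; unknown actions fail loudly."""
    handler = HANDLERS.get(decision.action)
    if handler is None:
        raise ValueError(f"unknown action: {decision.action!r}")
    return handler(decision.target)
```

Keeping the guard in dispatch means the routing stays safe even if the schema later gains an action that has no handler yet.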

Tips and Best Practices

| Topic | Recommendation |
| --- | --- |
| FSM compilation | The first call compiles the FSM; later calls on the same generator are fast, so reuse generators |
| Pydantic v2 | Outlines uses Pydantic v2 schemas; ensure models use v2 syntax |
| Optional fields | Use Optional[T] = None for fields the source text may not contain |
| Max tokens | Always set max_tokens to bound generation length, and make it large enough for the whole structure; output cut off at the limit can be incomplete |
| Grammar complexity | Very complex grammars (large EBNF) can be slow to compile; cache the generator object |
| Regex anchoring | Outlines regexes are implicitly anchored to the full output, so ^...$ is unnecessary |
| Quantization | 4-bit/8-bit models work with FSM sampling; quality may drop slightly |
| OpenAI limits | OpenAI backend only supports json and choice generators, not regex/grammar |
| Batching | Pass a list of prompts to generate them as one batch; this is usually faster than calling the generator once per prompt |
| Model choice | Instruction-tuned models produce better structured outputs than base models |
| Sampling | Use greedy() for deterministic, reproducible outputs; multinomial for variety |
| vLLM integration | Use vLLM backend with guided_decoding_backend="outlines" for production serving |