Nougat Cheat Sheet

Overview

Nougat (Neural Optical Understanding for Academic Documents) is a transformer-based model developed by Meta AI that converts academic PDF documents into structured Markdown text. Unlike traditional OCR, Nougat uses a visual transformer encoder-decoder architecture trained on scientific papers to accurately extract mathematical equations (as LaTeX), tables, section structures, and references from PDF documents without requiring an underlying text layer.

The model is built for the complex layouts found in academic papers: multi-column formats, inline and display math, chemical formulas, figures with captions, and bibliographic references. Nougat processes PDF pages as images and outputs Markdown with embedded LaTeX (the CLI writes one `.mmd` file per PDF), which makes it a good fit for building searchable academic knowledge bases and RAG systems over scientific literature.

Installation

pip install nougat-ocr

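# GPU support: install a CUDA-enabled PyTorch build before installing nougat-ocr
# Optional extras for the API server or dataset generation
pip install "nougat-ocr[api]"
pip install "nougat-ocr[dataset]"

# From source
git clone https://github.com/facebookresearch/nougat.git
cd nougat
pip install -e .

# Verify the CLI is installed (model weights are downloaded automatically on the first conversion)
nougat --help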

Core Usage

Command Line

# Convert single PDF
nougat path/to/paper.pdf -o output_dir/

# Convert with specific model
nougat paper.pdf -o output/ -m 0.1.0-small

# Convert multiple PDFs
nougat paper1.pdf paper2.pdf paper3.pdf -o output/

# Process an entire directory, recomputing PDFs that already have output
nougat /path/to/pdfs/ -o output/ --recompute

# Specify pages (1-based; ranges and single pages, e.g. 1-4,7; single PDF only)
nougat paper.pdf -o output/ --pages 1-5

# Disable the failure-detection heuristic (existing outputs are skipped by default unless --recompute is given)
nougat paper.pdf -o output/ --no-skipping

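# Use CPU (slower): hide CUDA devices so Nougat falls back to CPU; --full-precision switches to float32
CUDA_VISIBLE_DEVICES="" nougat paper.pdf -o output/ --full-precision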

# Batch processing with specific batch size
nougat paper.pdf -o output/ --batchsize 4
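
To make the `--pages` syntax concrete, here is a minimal sketch of how a page spec can be turned into zero-based page indices. It assumes page numbers are 1-based, as the CLI help example `1-4,7` suggests; `parse_page_spec` is a hypothetical helper written for illustration, not a function exported by Nougat.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Convert a 1-based page spec such as "1-4,7" into 0-based page indices."""
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "1-4" is inclusive on both ends
            start, end = (int(x) for x in part.split("-", 1))
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            indices.extend(range(start - 1, end))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"page numbers start at 1: {part!r}")
            indices.append(page - 1)
    return indices


print(parse_page_spec("1-4,7"))  # [0, 1, 2, 3, 6]
```

Under this assumption a spec such as `0-5` is rejected, since there is no page 0.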

Python API

from nougat import NougatModel
from nougat.utils.checkpoint import get_checkpoint
from nougat.utils.device import move_to_device
from nougat.dataset.rasterize import rasterize_paper
from nougat.postprocessing import markdown_compatible
from PIL import Image

# Load model (get_checkpoint downloads the weights for the given tag if needed
# and returns the local checkpoint directory)
checkpoint = get_checkpoint(model_tag="0.1.0-base")
model = NougatModel.from_pretrained(checkpoint)
model = move_to_device(model)
model.eval()

# Convert PDF to images; with return_pil=True each page comes back as an
# in-memory image buffer (io.BytesIO), so it still has to be opened with PIL
page_buffers = rasterize_paper(pdf="paper.pdf", return_pil=True)

# Process each page
for i, buffer in enumerate(page_buffers):
    # Prepare input
    image = Image.open(buffer)
    sample = model.encoder.prepare_input(image)
    sample = sample.unsqueeze(0).to(model.device)

    # Generate markdown
    output = model.inference(image_tensors=sample)
    generated = output["predictions"][0]

    # Post-process
    markdown = markdown_compatible(generated)
    print(f"--- Page {i+1} ---")
    print(markdown)

Output Format

# Paper Title

## Abstract

This paper presents a novel approach to...

## 1 Introduction

We introduce a method that $\alpha + \beta = \gamma$ demonstrates...

### 1.1 Background

The equation governing the process is:

$$\mathcal{L} = \sum_{i=1}^{N} \log p(x_i | \theta)$$

## 2 Methodology

| Method | Accuracy | F1 Score |
|--------|----------|----------|
| Baseline | 0.85 | 0.83 |
| Ours | **0.92** | **0.91** |

## References

* [1] Author et al. "Title of Paper." Conference 2024.
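
Because equations arrive as LaTeX inside the Markdown, they can be pulled out with plain string processing. The following is a rough sketch, not part of Nougat: `extract_math` is a hypothetical helper that assumes math is delimited either with `$...$` / `$$...$$` as in the sample above or with `\(...\)` / `\[...\]`. It makes no attempt to handle literal dollar signs such as prices.

```python
import re

# Display math: $$ ... $$ or \[ ... \] (may span several lines)
DISPLAY_MATH = re.compile(r"\$\$(.+?)\$\$|\\\[(.+?)\\\]", re.DOTALL)
# Inline math: $ ... $ (single dollars only) or \( ... \), on one line
INLINE_MATH = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)|\\\((.+?)\\\)")


def extract_math(markdown: str) -> dict:
    """Return the display and inline LaTeX snippets found in a Markdown string."""
    display = [a or b for a, b in DISPLAY_MATH.findall(markdown)]
    # Blank out display blocks first so their contents are not matched again as inline math
    remainder = DISPLAY_MATH.sub(" ", markdown)
    inline = [a or b for a, b in INLINE_MATH.findall(remainder)]
    return {
        "display": [s.strip() for s in display],
        "inline": [s.strip() for s in inline],
    }


sample = r"""
We introduce a method that $\alpha + \beta = \gamma$ demonstrates...

$$\mathcal{L} = \sum_{i=1}^{N} \log p(x_i | \theta)$$
"""
found = extract_math(sample)
print(found["inline"])
print(found["display"])
```

The extracted snippets can then be indexed separately from the prose, for example to support equation search.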

Models

Model	Size	Performance	Speed
`nougat-base`	350M params	Best quality	Slower
`nougat-small`	250M params	Good quality	Faster

# Download specific model (weights are fetched by release tag)
python -c "from nougat.utils.checkpoint import get_checkpoint; get_checkpoint(model_tag='0.1.0-base')"
python -c "from nougat.utils.checkpoint import get_checkpoint; get_checkpoint(model_tag='0.1.0-small')"

Configuration

Processing Options

Parameter	Description	Default
`--model` / `-m`	Model tag (0.1.0-base, 0.1.0-small)	0.1.0-small
`--batchsize` / `-b`	Batch size for processing	Auto (based on available GPU memory)
`--pages` / `-p`	Pages to process, 1-based (e.g., 1-4,7); single PDF only	All
`--out` / `-o`	Output directory	None (prints to console)
`--recompute`	Reprocess PDFs that already have output	False
`--full-precision`	Use float32 instead of bfloat16 (can help on CPU)	False
`--no-skipping`	Disable the failure-detection heuristic that skips bad pages	False
`--markdown` / `--no-markdown`	Post-process output for Markdown compatibility	True

GPU Memory Requirements

Model	Batch Size 1	Batch Size 4
nougat-base	~6 GB VRAM	~16 GB VRAM
nougat-small	~4 GB VRAM	~12 GB VRAM

Advanced Usage

Batch Processing Pipeline

from pathlib import Path

from PIL import Image
from nougat import NougatModel
from nougat.utils.checkpoint import get_checkpoint
from nougat.utils.device import move_to_device
from nougat.dataset.rasterize import rasterize_paper
from nougat.postprocessing import markdown_compatible

# Load the model once and reuse it for every PDF
model = NougatModel.from_pretrained(get_checkpoint(model_tag="0.1.0-base"))
model = move_to_device(model)
model.eval()

pdf_dir = Path("./papers")
output_dir = Path("./markdown")
output_dir.mkdir(parents=True, exist_ok=True)

for pdf_path in sorted(pdf_dir.glob("*.pdf")):
    output_path = output_dir / f"{pdf_path.stem}.md"

    # Skip PDFs that were already converted
    if output_path.exists():
        continue

    print(f"Processing: {pdf_path}")
    # Each page comes back as an in-memory image buffer
    page_buffers = rasterize_paper(pdf_path, return_pil=True)

    pages = []
    for buffer in page_buffers:
        image = Image.open(buffer)
        sample = model.encoder.prepare_input(image).unsqueeze(0).to(model.device)
        output = model.inference(image_tensors=sample)
        page_md = markdown_compatible(output["predictions"][0])
        pages.append(page_md)

    # Join pages with a blank line and write UTF-8 so LaTeX and symbols survive
    full_md = "\n\n".join(pages)
    output_path.write_text(full_md, encoding="utf-8")

    print(f"  -> {output_path}")

API Server

import os
import tempfile

from fastapi import FastAPI, UploadFile  # file uploads also require python-multipart
from PIL import Image
from nougat import NougatModel
from nougat.utils.checkpoint import get_checkpoint
from nougat.utils.device import move_to_device
from nougat.dataset.rasterize import rasterize_paper
from nougat.postprocessing import markdown_compatible

app = FastAPI()
# Load the model once at startup
model = NougatModel.from_pretrained(get_checkpoint(model_tag="0.1.0-base"))
model = move_to_device(model)
model.eval()

@app.post("/convert")
async def convert_pdf(file: UploadFile):
    # Save the upload to a temporary file so it can be rasterized from disk
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        pages = []
        for buffer in rasterize_paper(tmp_path, return_pil=True):
            image = Image.open(buffer)  # pages arrive as in-memory image buffers
            sample = model.encoder.prepare_input(image).unsqueeze(0).to(model.device)
            output = model.inference(image_tensors=sample)
            pages.append(markdown_compatible(output["predictions"][0]))
    finally:
        os.remove(tmp_path)  # clean up the temporary PDF

    return {"markdown": "\n\n".join(pages), "pages": len(pages)}

Integration with RAG

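from pathlib import Path

# Older LangChain releases expose this class as langchain.text_splitter.MarkdownHeaderTextSplitter
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Load Markdown produced by Nougat (the CLI writes one .mmd file per PDF)
markdown_text = Path("output/paper.mmd").read_text(encoding="utf-8")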

# Split by headers for RAG chunking
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = splitter.split_text(markdown_text)

# Each chunk has metadata with section headers
for chunk in chunks:
    print(f"Section: {chunk.metadata}")
    print(f"Content: {chunk.page_content[:100]}...")
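
To show what header-based chunking does, here is a dependency-free sketch of the same idea. It is not LangChain's implementation: `split_by_headers` is a hypothetical helper that handles only `#`, `##`, and `###` headers and would misread `#` comment lines inside code blocks as headers.

```python
import re
from typing import Dict, List

HEADER = re.compile(r"^(#{1,3})\s+(.*)$")


def split_by_headers(markdown: str) -> List[Dict]:
    """Split Markdown into chunks, tagging each with the headers it sits under."""
    chunks: List[Dict] = []
    path: Dict[int, str] = {}  # current header per level, e.g. {1: "Paper Title", 2: "Abstract"}
    lines: List[str] = []

    def flush() -> None:
        # Emit the text collected so far under the current header path
        text = "\n".join(lines).strip()
        if text:
            metadata = {f"Header {k}": v for k, v in sorted(path.items())}
            chunks.append({"metadata": metadata, "content": text})
        lines.clear()

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new header replaces its own level and clears any deeper levels
            for k in [k for k in path if k >= level]:
                del path[k]
            path[level] = match.group(2).strip()
        else:
            lines.append(line)
    flush()
    return chunks


doc = "# Paper Title\n\n## Abstract\n\nThis paper presents...\n\n## 1 Introduction\n\nWe introduce...\n\n### 1.1 Background\n\nThe equation..."
for chunk in split_by_headers(doc):
    print(chunk["metadata"], "->", chunk["content"])
```

Keeping the header path as metadata lets a retriever report which section of the paper a chunk came from.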

Troubleshooting

Issue	Solution
CUDA out of memory	Reduce batch size to 1, use nougat-small model
Poor LaTeX output	Use nougat-base model, check PDF is not a scan
Garbled text on scanned PDFs	Nougat works best on born-digital PDFs
Slow processing	Use GPU, increase batch size if VRAM allows
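Missing pages in output	Check the `--pages` range (page numbers are 1-based); try `--no-skipping` to disable failure detection
Model download fails	Download the checkpoint manually from the project's GitHub releases and pass its directory via `--checkpoint`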
Repetitive output	Known issue with some pages; post-process to detect loops
Tables misaligned	Use nougat-base for better table extraction
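
For the repetitive-output issue listed above, one simple post-processing approach is to cut a page where the same line starts repeating. This is a minimal sketch under stated assumptions, separate from Nougat's built-in failure detection: `truncate_repetition` and its `max_repeats` threshold are hypothetical, and the check only catches repetition of whole lines, so tables with many identical rows could trigger it falsely.

```python
from typing import List, Tuple


def truncate_repetition(markdown: str, max_repeats: int = 3) -> Tuple[str, bool]:
    """Cut the text where one non-empty line repeats more than `max_repeats` times in a row.

    Returns (text, was_truncated). On truncation, one copy of the repeated line is kept.
    """
    kept: List[str] = []
    previous = None
    run = 0
    run_start = 0  # index in `kept` of the first copy of the current line
    for line in markdown.splitlines():
        key = line.strip()
        if key:  # blank lines neither extend nor break a run
            if key == previous:
                run += 1
            else:
                previous, run, run_start = key, 1, len(kept)
            if run > max_repeats:
                return "\n".join(kept[: run_start + 1]), True
        kept.append(line)
    return "\n".join(kept), False


page = "Intro\nloop\nloop\nloop\nloop\nloop\nEnd"
text, was_cut = truncate_repetition(page)
print(was_cut)
print(text)
```

Pages flagged this way can be logged and retried, for example with the larger model.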

# Test installation
python -c "import nougat; print('Nougat installed')"

# Check CUDA availability
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# Verify model download
python -c "from nougat import NougatModel; from nougat.utils.checkpoint import get_checkpoint; m = NougatModel.from_pretrained(get_checkpoint(model_tag='0.1.0-base')); print('Model loaded')"