Nougat Cheat Sheet

Overview

Nougat (Neural Optical Understanding for Academic Documents) is a transformer-based model developed by Meta AI that converts academic PDF documents into structured Markdown text. Unlike traditional OCR, Nougat uses a visual transformer encoder-decoder architecture trained on scientific papers to accurately extract mathematical equations (as LaTeX), tables, section structures, and references from PDF documents without requiring an underlying text layer.

The model excels at handling the complex layouts found in academic papers: multi-column formats, inline and display math, chemical formulas, figures with captions, and bibliographic references. Nougat processes PDF pages as images and outputs clean Markdown with embedded LaTeX, making it ideal for building searchable academic knowledge bases and RAG systems over scientific literature.

Installation

pip install nougat-ocr

# With GPU support
pip install nougat-ocr[gpu]

# From source
git clone https://github.com/facebookresearch/nougat.git
cd nougat
pip install -e ".[gpu]"

# Verify the CLI is installed (model weights are downloaded automatically on the first conversion)
nougat --help

Core Usage

Command Line

# Convert single PDF
nougat path/to/paper.pdf -o output_dir/

# Convert with specific model
nougat paper.pdf -o output/ -m 0.1.0-small

# Convert multiple PDFs
nougat paper1.pdf paper2.pdf paper3.pdf -o output/

# Process an entire directory (--recompute also reprocesses PDFs that already have output)
nougat /path/to/pdfs/ -o output/ --recompute

# Process selected pages only (page numbers are 1-based in upstream Nougat; single-PDF input only)
nougat paper.pdf -o output/ --pages 1-5

# Disable the failure-detection heuristic (do not skip pages flagged as errors)
nougat paper.pdf -o output/ --no-skipping

# Use CPU (slower)
nougat paper.pdf -o output/ --no-cuda

# Batch processing with specific batch size
nougat paper.pdf -o output/ --batchsize 4
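
When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of Nougat itself: `build_nougat_command` is a hypothetical helper, and it only uses flags that appear in the examples above (`-o`, `-m`, `--batchsize`, `--recompute`).

```python
import shlex
from typing import List, Optional, Sequence


def build_nougat_command(
    pdfs: Sequence[str],
    out_dir: str,
    model_tag: Optional[str] = None,
    batch_size: Optional[int] = None,
    recompute: bool = False,
) -> List[str]:
    """Assemble an argument list for the nougat CLI (hypothetical helper)."""
    if not pdfs:
        raise ValueError("at least one PDF path is required")
    cmd = ["nougat", *pdfs, "-o", out_dir]
    if model_tag is not None:
        cmd += ["-m", model_tag]
    if batch_size is not None:
        if batch_size < 1:
            raise ValueError("batch_size must be a positive integer")
        cmd += ["--batchsize", str(batch_size)]
    if recompute:
        cmd.append("--recompute")
    return cmd


if __name__ == "__main__":
    cmd = build_nougat_command(
        ["paper.pdf"], "output/", model_tag="0.1.0-small", batch_size=4
    )
    # Show the shell-quoted command line that would be executed
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting problems with paths that contain spaces.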

Python API

from PIL import Image
from nougat import NougatModel
from nougat.dataset.rasterize import rasterize_paper
from nougat.postprocessing import markdown_compatible
from nougat.utils.checkpoint import get_checkpoint
from nougat.utils.device import move_to_device

# Load model. get_checkpoint() downloads the weights for a release tag on first use;
# the tag is the same value the CLI accepts for --model.
checkpoint = get_checkpoint(model_tag="0.1.0-base")
model = NougatModel.from_pretrained(checkpoint)
model = move_to_device(model)
model.eval()

# Convert PDF to images, one per page
pages = rasterize_paper("paper.pdf", return_pil=True)

# Process each page
for i, page in enumerate(pages):
    # rasterize_paper may hand back in-memory image buffers rather than PIL images
    image = page if isinstance(page, Image.Image) else Image.open(page)

    # Prepare input: resize and pad the page, then add a batch dimension
    sample = model.encoder.prepare_input(image)
    sample = sample.unsqueeze(0).to(model.device)

    # Generate markdown
    output = model.inference(image_tensors=sample)
    generated = output["predictions"][0]

    # Post-process
    markdown = markdown_compatible(generated)
    print(f"--- Page {i+1} ---")
    print(markdown)

Output Format

# Paper Title

## Abstract

This paper presents a novel approach to...

## 1 Introduction

We introduce a method that $\alpha + \beta = \gamma$ demonstrates...

### 1.1 Background

The equation governing the process is:

$$\mathcal{L} = \sum_{i=1}^{N} \log p(x_i | \theta)$$

## 2 Methodology

| Method | Accuracy | F1 Score |
|--------|----------|----------|
| Baseline | 0.85 | 0.83 |
| Ours | **0.92** | **0.91** |

## References

* [1] Author et al. "Title of Paper." Conference 2024.
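
Because equations arrive as LaTeX inside the Markdown, they can be pulled out for indexing or validation. The sketch below is an illustration only: `extract_math` is a hypothetical helper, and it assumes math is delimited either with `$...$` / `$$...$$` as in the sample above or with `\(...\)` / `\[...\]`. A bare `$` used as a currency sign would be misread, so treat the result as a heuristic.

```python
import re
from typing import List, Tuple

# Display math: $$...$$ or \[...\] (may span several lines)
DISPLAY = re.compile(r"\$\$(.+?)\$\$|\\\[(.+?)\\\]", re.DOTALL)
# Inline math: $...$ on a single line, or \(...\)
INLINE = re.compile(r"\$([^$\n]+?)\$|\\\((.+?)\\\)")


def extract_math(markdown: str) -> Tuple[List[str], List[str]]:
    """Return (display_equations, inline_equations) found in the text."""
    display = [a or b for a, b in DISPLAY.findall(markdown)]
    # Remove display math first so its dollar signs are not matched as inline math
    remainder = DISPLAY.sub(" ", markdown)
    inline = [a or b for a, b in INLINE.findall(remainder)]
    return [d.strip() for d in display], [i.strip() for i in inline]


if __name__ == "__main__":
    sample = (
        r"We introduce a method that $\alpha + \beta = \gamma$ demonstrates..."
        "\n\n"
        r"$$\mathcal{L} = \sum_{i=1}^{N} \log p(x_i | \theta)$$"
    )
    display, inline = extract_math(sample)
    print("display:", display)
    print("inline:", inline)
```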

Models

| Model | Size | Performance | Speed |
|-------|------|-------------|-------|
| nougat-base | 350M params | Best quality | Slower |
| nougat-small | 250M params | Good quality | Faster |

# Download a specific model (weights are cached locally after the first call)
python -c "from nougat.utils.checkpoint import get_checkpoint; get_checkpoint(model_tag='0.1.0-base')"
python -c "from nougat.utils.checkpoint import get_checkpoint; get_checkpoint(model_tag='0.1.0-small')"

Configuration

Processing Options

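| Parameter | Description | Default |
|-----------|-------------|---------|
| --model / -m | Model tag (0.1.0-base, 0.1.0-small) | 0.1.0-small |
| --batchsize / -b | Batch size for processing | 1 |
| --pages | Page range to process (e.g., 1-5) | All |
| --out / -o | Output directory | None (output is printed to the console) |
| --recompute | Reprocess existing outputs | False |
| --no-cuda | Force CPU processing | False |
| --no-skipping | Disable the failure-detection heuristic (don’t skip pages flagged as errors) | False |
| --markdown | Post-process to clean Markdown | True |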

GPU Memory Requirements

| Model | Batch Size 1 | Batch Size 4 |
|-------|--------------|--------------|
| nougat-base | ~6 GB VRAM | ~16 GB VRAM |
| nougat-small | ~4 GB VRAM | ~12 GB VRAM |
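
The table above can be turned into a small lookup for choosing a model and batch size. This is a sketch under stated assumptions: `pick_config` is a hypothetical helper, the VRAM figures are the approximate values from the table, and the two preference orders (quality first or throughput first) are illustrative policy choices rather than anything Nougat prescribes.

```python
from typing import Optional, Tuple

# (model, batch size, approximate VRAM in GB), copied from the table above
VRAM_TABLE = [
    ("nougat-base", 1, 6.0),
    ("nougat-base", 4, 16.0),
    ("nougat-small", 1, 4.0),
    ("nougat-small", 4, 12.0),
]


def pick_config(
    free_vram_gb: float, prefer_speed: bool = False
) -> Optional[Tuple[str, int]]:
    """Return the first (model, batch_size) that fits, or None if nothing fits."""
    if prefer_speed:
        # Larger batches first; for equal batch size, try the base model first
        order = sorted(VRAM_TABLE, key=lambda r: (-r[1], r[0] != "nougat-base"))
    else:
        # Base model first; within a model, try the larger batch first
        order = sorted(VRAM_TABLE, key=lambda r: (r[0] != "nougat-base", -r[1]))
    for model, batch_size, needed_gb in order:
        if free_vram_gb >= needed_gb:
            return model, batch_size
    return None  # not enough VRAM for any listed configuration


if __name__ == "__main__":
    for vram in (3, 5, 12, 24):
        print(vram, "GB ->", pick_config(vram), "| speed:", pick_config(vram, True))
```

On a CUDA machine the free memory can be read with `torch.cuda.mem_get_info()`, which returns free and total bytes; divide by `1024**3` before passing the value in.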

Advanced Usage

Batch Processing Pipeline

from pathlib import Path

from PIL import Image
from nougat import NougatModel
from nougat.dataset.rasterize import rasterize_paper
from nougat.postprocessing import markdown_compatible
from nougat.utils.checkpoint import get_checkpoint
from nougat.utils.device import move_to_device

# Load the model once and reuse it for every PDF
model = NougatModel.from_pretrained(get_checkpoint(model_tag="0.1.0-base"))
model = move_to_device(model)
model.eval()

pdf_dir = Path("./papers")
output_dir = Path("./markdown")
output_dir.mkdir(parents=True, exist_ok=True)

for pdf_path in sorted(pdf_dir.glob("*.pdf")):
    output_path = output_dir / f"{pdf_path.stem}.md"

    # Skip PDFs that were already converted
    if output_path.exists():
        continue

    print(f"Processing: {pdf_path}")
    page_images = rasterize_paper(pdf_path, return_pil=True)

    pages = []
    for page in page_images:
        # rasterize_paper may hand back in-memory image buffers rather than PIL images
        image = page if isinstance(page, Image.Image) else Image.open(page)
        sample = model.encoder.prepare_input(image).unsqueeze(0).to(model.device)
        output = model.inference(image_tensors=sample)
        page_md = markdown_compatible(output["predictions"][0])
        pages.append(page_md)

    # Join pages with a blank line and write one Markdown file per PDF
    full_md = "\n\n".join(pages)
    output_path.write_text(full_md, encoding="utf-8")

    print(f"  -> {output_path}")

API Server

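import os
import tempfile

from fastapi import FastAPI, UploadFile
from PIL import Image
from nougat import NougatModel
from nougat.dataset.rasterize import rasterize_paper
from nougat.postprocessing import markdown_compatible
from nougat.utils.checkpoint import get_checkpoint
from nougat.utils.device import move_to_device

app = FastAPI()

# Load the model once at startup so every request reuses it
model = NougatModel.from_pretrained(get_checkpoint(model_tag="0.1.0-base"))
model = move_to_device(model)
model.eval()

@app.post("/convert")
async def convert_pdf(file: UploadFile):
    # Write the upload to a temporary file so it can be rasterized from a path
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name

    try:
        page_images = rasterize_paper(tmp_path, return_pil=True)
        pages = []
        # Note: inference is synchronous and blocks the event loop while it runs
        for page in page_images:
            # rasterize_paper may hand back image buffers rather than PIL images
            image = page if isinstance(page, Image.Image) else Image.open(page)
            sample = model.encoder.prepare_input(image).unsqueeze(0).to(model.device)
            output = model.inference(image_tensors=sample)
            pages.append(markdown_compatible(output["predictions"][0]))
    finally:
        # delete=False above means the temporary file must be removed explicitly
        os.unlink(tmp_path)

    return {"markdown": "\n\n".join(pages), "pages": len(pages)}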

Integration with RAG

from pathlib import Path

# Newer LangChain releases ship this class in the langchain_text_splitters package
from langchain.text_splitter import MarkdownHeaderTextSplitter

# Load Markdown that Nougat already produced, e.g. ./markdown/paper.md
# written by the batch processing pipeline
markdown_text = Path("./markdown/paper.md").read_text(encoding="utf-8")

# Split by headers for RAG chunking
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
chunks = splitter.split_text(markdown_text)

# Each chunk has metadata with section headers
for chunk in chunks:
    print(f"Section: {chunk.metadata}")
    print(f"Content: {chunk.page_content[:100]}...")
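
To make the header-based chunking step concrete without the LangChain dependency, here is a minimal standard-library sketch of the same idea. It is not LangChain's implementation: `split_by_headers` is a hypothetical function that tracks the current `#`, `##`, and `###` headings and attaches them as metadata to the text underneath. It does not special-case code fences, so a `#` comment inside a fenced block would be treated as a heading.

```python
import re
from typing import Dict, List, Tuple

# Matches level 1-3 ATX headings such as "## 1 Introduction"
HEADER = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


def split_by_headers(markdown: str) -> List[Tuple[Dict[str, str], str]]:
    """Split Markdown into (metadata, text) chunks, one per headed section."""
    chunks: List[Tuple[Dict[str, str], str]] = []
    path: Dict[int, str] = {}  # heading level -> current heading title
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            meta = {f"Header {lvl}": path[lvl] for lvl in sorted(path)}
            chunks.append((meta, text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # A new heading closes any open heading at the same or a deeper level
            for lvl in [l for l in path if l >= level]:
                del path[lvl]
            path[level] = match.group(2)
        else:
            body.append(line)
    flush()
    return chunks


if __name__ == "__main__":
    doc = "# Paper Title\n\n## Abstract\n\nThis paper presents...\n\n## 1 Introduction\n\nWe introduce...\n"
    for meta, text in split_by_headers(doc):
        print(meta, "->", text[:40])
```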

Troubleshooting

| Issue | Solution |
|-------|----------|
| CUDA out of memory | Reduce batch size to 1, use nougat-small model |
| Poor LaTeX output | Use nougat-base model, check PDF is not a scan |
| Garbled text on scanned PDFs | Nougat works best on born-digital PDFs |
| Slow processing | Use GPU, increase batch size if VRAM allows |
| Missing pages in output | Check --pages range, use --no-skipping |
| Model download fails | Download manually from HuggingFace hub |
| Repetitive output | Known issue with some pages; post-process to detect loops |
| Tables misaligned | Use nougat-base for better table extraction |

# Test installation
python -c "import nougat; print('Nougat installed')"

# Check CUDA availability
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"

# Verify model download
python -c "from nougat import NougatModel; from nougat.utils.checkpoint import get_checkpoint; m = NougatModel.from_pretrained(get_checkpoint(model_tag='0.1.0-base')); print('Model loaded')"
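
For the "Repetitive output" row in the troubleshooting table, one simple post-processing approach is to cap how many times the same line may repeat in a row. The sketch below is a heuristic, not Nougat's own failure detection: `collapse_repeats` and its `max_repeats` threshold are hypothetical, and it only catches loops that repeat whole lines, not loops inside a single long line.

```python
from typing import List, Optional, Tuple


def collapse_repeats(text: str, max_repeats: int = 3) -> Tuple[str, bool]:
    """Drop consecutive duplicate lines beyond max_repeats.

    Returns the cleaned text and a flag telling whether a loop was detected.
    Blank lines do not interrupt a run of duplicates.
    """
    kept: List[str] = []
    last: Optional[str] = None  # last non-empty line seen
    run = 0                     # length of the current run of identical lines
    looped = False

    for line in text.splitlines():
        key = line.strip()
        if not key:
            # Keep blank lines unless we are in the middle of dropping a loop
            if run <= max_repeats:
                kept.append(line)
            continue
        if key == last:
            run += 1
        else:
            last, run = key, 1
        if run > max_repeats:
            looped = True
            continue
        kept.append(line)

    return "\n".join(kept), looped


if __name__ == "__main__":
    page = "Intro\n" + "x = 1\n" * 6 + "End"
    cleaned, looped = collapse_repeats(page)
    print(cleaned)
    print("loop detected:", looped)
```

Pages flagged this way are good candidates for re-running with the other model or for manual review.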