FAISS Cheat Sheet

Overview

FAISS (Facebook AI Similarity Search) is a library developed by Meta AI Research for fast similarity search and clustering of dense vectors. It is written in C++ with official Python bindings; a C API is also available, and bindings for other languages are community-maintained. FAISS scales from thousands to billions of vectors and provides both exact (brute-force) and approximate nearest-neighbor (ANN) algorithms.

FAISS is a library, not a database — it has no persistence layer, server mode, or metadata storage built in. You manage loading/saving indexes to disk, and any metadata (document IDs, payloads) must be maintained in a parallel data structure. This low-level nature makes FAISS extremely fast and memory-efficient but requires more application code than managed vector databases.

The library supports CPU and GPU (CUDA) execution, SIMD-optimized distance computations, multiple compression schemes (scalar quantization, product quantization), graph-based indexes (HNSW), and composite index types that chain transforms and quantizers. FAISS is used inside several vector database engines (Milvus, for example, builds on it) and is a common baseline in ANN benchmarks.

Installation

Python

# CPU-only (conda recommended for FAISS)
conda install -c pytorch faiss-cpu

# GPU (requires CUDA)
conda install -c pytorch faiss-gpu

# pip (community-maintained wheels, may lag conda releases)
pip install faiss-cpu
pip install faiss-gpu    # GPU wheel; check PyPI for a build that matches your CUDA version

# Verify installation
python -c "import faiss; print(faiss.__version__)"
python -c "import faiss; print(faiss.get_num_gpus())"   # GPU count

From Source (for custom CUDA builds)

git clone https://github.com/facebookresearch/faiss.git
cd faiss

cmake -B build \
  -DFAISS_ENABLE_GPU=ON \
  -DFAISS_ENABLE_PYTHON=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_ARCHITECTURES="80;86"   # Ampere GPUs

cmake --build build -j8
cd build/faiss/python && pip install -e .

C++ Library

# Debian/Ubuntu
sudo apt install -y libfaiss-dev

# macOS
brew install faiss

Configuration

Index Selection Guide

Exact Search (small datasets, <100K vectors):
  faiss.IndexFlatL2          — Exact L2, no training
  faiss.IndexFlatIP          — Exact inner product, no training

Approximate Search (large datasets):
  faiss.IndexIVFFlat         — IVF + flat storage, good baseline ANN
  faiss.IndexHNSWFlat        — HNSW graph, excellent recall, high RAM
  faiss.IndexIVFPQ           — IVF + Product Quantization, low memory
  faiss.IndexIVFScalarQuantizer — IVF + SQ, balanced speed/memory

Compression:
  faiss.IndexPQ              — Pure Product Quantization
  faiss.IndexScalarQuantizer — Scalar quantization (4-bit, 8-bit)

GPU Acceleration:
  faiss.index_cpu_to_gpu()   — Move a supported CPU index (Flat, IVFFlat, IVFPQ, IVFScalarQuantizer) to GPU
  faiss.index_cpu_to_all_gpus() — Multi-GPU index

Distance Metrics

import faiss

faiss.METRIC_L2            # Euclidean distance (default)
faiss.METRIC_INNER_PRODUCT # Dot product (use with normalized vectors for cosine)
faiss.METRIC_L1            # Manhattan distance
faiss.METRIC_Linf          # Chebyshev distance
faiss.METRIC_Canberra      # Canberra distance
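
The note on METRIC_INNER_PRODUCT relies on an identity for unit-length vectors: their dot product equals the cosine similarity, and their squared L2 distance equals 2 - 2 * cosine. A minimal pure-Python sketch of that identity follows; it does not need FAISS, and the helper names are illustrative only, not part of the FAISS API.

```python
import math

def l2_normalize(v):
    """Return a unit-length copy of v (faiss.normalize_L2 does this in place)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])   # (0.6, 0.8)
b = l2_normalize([4.0, 3.0])   # (0.8, 0.6)

cosine = dot(a, b)                 # inner product of unit vectors
print(round(cosine, 4))            # 0.96
print(round(squared_l2(a, b), 4))  # 0.08, which is 2 - 2 * cosine
```

Because FAISS L2 indexes report squared distances, ranking normalized vectors by L2 gives the same order as ranking them by cosine similarity.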

Core Commands/API

Method	Description
`faiss.IndexFlatL2(d)`	Exact L2 index, dimension d
`faiss.IndexFlatIP(d)`	Exact inner product index
`faiss.IndexHNSWFlat(d, M)`	HNSW index, M connections per node
`faiss.IndexIVFFlat(quantizer, d, nlist)`	IVF with flat storage
`faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)`	IVF with Product Quantization
`faiss.IndexIVFScalarQuantizer(quantizer, d, nlist, qt)`	IVF + scalar quantization
`faiss.IndexPQ(d, M, nbits)`	Pure Product Quantization
`faiss.IndexScalarQuantizer(d, qt)`	Scalar quantization index
`faiss.IndexIDMap(index)`	Wrap index to support custom int64 IDs
`faiss.IndexIDMap2(index)`	IDMap with reverse lookup capability
`index.train(vectors)`	Train index on representative sample
`index.add(vectors)`	Add vectors (sequential IDs)
`index.add_with_ids(vectors, ids)`	Add vectors with custom IDs (IDMap wrappers and IVF indexes)
`index.search(query, k)`	Search k nearest neighbors
`index.range_search(query, radius)`	Find all vectors within radius
`index.remove_ids(id_selector)`	Remove vectors by ID
`index.reconstruct(i)`	Reconstruct vector at index position i
`index.ntotal`	Number of vectors in index
`index.is_trained`	True once the index is trained (always True for indexes that need no training)
`faiss.write_index(index, path)`	Serialize index to file
`faiss.read_index(path)`	Load index from file
`faiss.index_cpu_to_gpu(res, dev, index)`	Move index to GPU
`faiss.index_gpu_to_cpu(index)`	Move GPU index back to CPU
`faiss.normalize_L2(vectors)`	In-place L2 normalization

Advanced Usage

Flat (Exact) Index

import faiss
import numpy as np

d = 1536    # Vector dimension
n = 10_000  # Number of vectors

# Generate random vectors (replace with real embeddings)
np.random.seed(42)
vectors = np.random.rand(n, d).astype(np.float32)

# Build exact index
index = faiss.IndexFlatL2(d)
index.add(vectors)
print(f"Vectors in index: {index.ntotal}")

# Search — returns (distances, indices)
query = np.random.rand(5, d).astype(np.float32)  # 5 query vectors
distances, indices = index.search(query, k=10)    # Top-10 for each query

print(f"Shape: distances={distances.shape}, indices={indices.shape}")
for i in range(len(query)):
    print(f"Query {i}: nearest={indices[i][0]}, distance={distances[i][0]:.4f}")

# Cosine similarity: normalize first (normalize_L2 modifies the arrays in place)
faiss.normalize_L2(vectors)
faiss.normalize_L2(query)
index_ip = faiss.IndexFlatIP(d)
index_ip.add(vectors)
scores, ids = index_ip.search(query, k=5)   # scores = cosine similarity

HNSW Index

# HNSW — excellent recall, high RAM, no training
# M: connections per node (4-64); higher = better recall, more memory
index_hnsw = faiss.IndexHNSWFlat(d, 32)   # M=32; pass positionally, the constructor takes no keyword arguments

# Tuning parameters
index_hnsw.hnsw.efConstruction = 200  # Build quality (40-800)
index_hnsw.hnsw.efSearch = 64         # Query-time accuracy (set before search)

index_hnsw.add(vectors)

# efSearch persists once set; raise it when you need higher recall
index_hnsw.hnsw.efSearch = 128
distances, indices = index_hnsw.search(query, k=10)

# HNSW does not support remove_ids — rebuild index to remove vectors
# HNSW does not support GPU transfer

IVF Index (Trained ANN)

# IVF: partition space into nlist cells, search nprobe cells per query
# Rule of thumb: nlist = 4 * sqrt(n)
nlist = 100
quantizer = faiss.IndexFlatL2(d)    # Coarse quantizer

index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)

# Training required — use representative sample (at least 39 * nlist vectors)
train_data = vectors[:8_000]
assert not index_ivf.is_trained
index_ivf.train(train_data)
assert index_ivf.is_trained

index_ivf.add(vectors)

# Set nprobe before search (higher = better recall, slower)
index_ivf.nprobe = 10   # Search 10 of 100 cells
distances, indices = index_ivf.search(query, k=5)
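
The two sizing rules in the comments above (nlist of about 4 * sqrt(n), and at least 39 training vectors per cell) can be combined into a quick planning helper. This is a sketch with hypothetical function names, not a FAISS API; treat the output as a starting point to validate with a recall measurement.

```python
import math

MIN_POINTS_PER_CENTROID = 39  # threshold quoted in the comments above

def suggest_nlist(n: int) -> int:
    """Rule of thumb from the text: nlist is about 4 * sqrt(n)."""
    return max(1, round(4 * math.sqrt(n)))

def min_training_vectors(nlist: int) -> int:
    """Smallest training set that gives every cell 39 vectors."""
    return MIN_POINTS_PER_CENTROID * nlist

for n in (10_000, 1_000_000):
    nlist = suggest_nlist(n)
    print(n, nlist, min_training_vectors(nlist))
# 10000 400 15600
# 1000000 4000 156000
```

For 10,000 vectors the rule asks for more training vectors than the dataset contains, so a smaller nlist such as the 100 used above is a reasonable choice at that scale.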

IVF + Product Quantization (Low Memory)

# PQ compresses vectors: M subvectors of nbits each
# Memory per vector = M * nbits / 8 bytes (vs 4*d bytes for float32)
# Example: d=1536, M=96, nbits=8 → 96 bytes vs 6144 bytes (64x compression)

nlist = 256
M     = 96       # Number of subvectors (d must be divisible by M)
nbits = 8        # Bits per subvector (8 → 256 centroids)

quantizer  = faiss.IndexFlatL2(d)
index_ivfpq = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)

index_ivfpq.train(vectors)   # FAISS warns below 39 * nlist training vectors and subsamples above 256 * nlist
index_ivfpq.add(vectors)

index_ivfpq.nprobe = 16
distances, indices = index_ivfpq.search(query, k=10)

# Memory estimate
bytes_per_vec = M * nbits // 8
total_mb = (n * bytes_per_vec) / (1024**2)
print(f"Index size estimate: {total_mb:.1f} MB")
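
The compression arithmetic in the comments above can be checked with a small calculator. This sketch uses hypothetical helper names and assumes each IVF list entry also stores an 8-byte int64 ID next to its PQ code; it ignores the codebooks and the coarse quantizer, which are small by comparison.

```python
def pq_code_bytes(M: int, nbits: int) -> int:
    """Bytes for one PQ code: M sub-quantizer indices of nbits each, rounded up."""
    return (M * nbits + 7) // 8

def compression_ratio(d: int, M: int, nbits: int) -> float:
    """float32 vector size divided by PQ code size."""
    return (4 * d) / pq_code_bytes(M, nbits)

d, M, nbits, n = 1536, 96, 8, 10_000
code = pq_code_bytes(M, nbits)
print(code, compression_ratio(d, M, nbits))    # 96 64.0

# Assumption: 8 extra bytes per vector for the int64 ID kept in the inverted lists
print(f"{n * (code + 8) / 1024**2:.2f} MB")    # 0.99 MB
```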

Custom IDs with IDMap

import numpy as np

# Map sequential FAISS indices to your own int64 IDs
base_index = faiss.IndexFlatL2(d)
index      = faiss.IndexIDMap(base_index)

custom_ids = np.array([1001, 2002, 3003, 4004, 5005], dtype=np.int64)
vecs = np.random.rand(5, d).astype(np.float32)

index.add_with_ids(vecs, custom_ids)

distances, ids = index.search(np.random.rand(1, d).astype(np.float32), k=3)
print(f"Nearest IDs: {ids[0]}")   # Returns your custom IDs

# Remove by ID
selector = faiss.IDSelectorBatch(np.array([1001, 2002], dtype=np.int64))
index.remove_ids(selector)

GPU Acceleration

import faiss

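# Single GPU
res = faiss.StandardGpuResources()   # Reserves temp memory per GPU (about 1.5 GiB by default on large GPUs)
cpu_flat  = faiss.IndexFlatL2(d)     # Start from a supported CPU index (Flat, IVFFlat, IVFPQ, IVFSQ)
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_flat)   # GPU 0; copies any vectors already in cpu_flat
gpu_index.add(vectors)
distances, indices = gpu_index.search(query, k=10)

# Convert back to CPU for saving
cpu_index = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(cpu_index, "index.faiss")

# Multi-GPU: replicates the index on every visible GPU by default
# (pass a GpuMultipleClonerOptions with shard=True to split the vectors instead)
multi_gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
distances, indices = multi_gpu_index.search(query, k=10)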

# GPU resources with custom temp memory
res = faiss.StandardGpuResources()
res.setTempMemory(4 * 1024**3)   # 4 GB temp memory
res.setDefaultNullStreamAllDevices()

Serialization

# Save and load index
faiss.write_index(index, "my_index.faiss")
loaded = faiss.read_index("my_index.faiss")

# Save to bytes (in-memory)
buffer = faiss.serialize_index(index)   # Returns numpy uint8 array
# Restore
index = faiss.deserialize_index(buffer)

# Mmap for very large indexes (read-only; what is memory-mapped depends on the index type and FAISS version)
loaded_mmap = faiss.read_index("my_index.faiss", faiss.IO_FLAG_MMAP)

Common Workflows

Building a Production Index Pipeline

import faiss
import numpy as np
import pickle
import os

class FAISSIndex:
    def __init__(self, dim: int, nlist: int = 100, use_gpu: bool = False):
        self.dim = dim
        self.id_to_meta = {}  # Map int64 ID → metadata dict

        quantizer     = faiss.IndexFlatL2(dim)
        base_index    = faiss.IndexIVFPQ(quantizer, dim, nlist, dim // 16, 8)
        self.index    = faiss.IndexIDMap(base_index)
        self.use_gpu  = use_gpu

    def train(self, vectors: np.ndarray):
        self.index.index.train(vectors)

    def add(self, vectors: np.ndarray, ids: np.ndarray, metadata: list[dict]):
        self.index.add_with_ids(vectors, ids)
        for i, meta in zip(ids, metadata):
            self.id_to_meta[int(i)] = meta

    def search(self, query: np.ndarray, k: int = 10, nprobe: int = 10):
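        # self.index.index is typed as a generic Index; downcast it to reach IVF fields such as nprobe
        faiss.downcast_index(self.index.index).nprobe = nprobe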
        distances, ids = self.index.search(query, k)
        results = []
        for dist_row, id_row in zip(distances, ids):
            row = []
            for d, i in zip(dist_row, id_row):
                if i != -1:  # -1 means no result (padded)
                    row.append({"id": int(i), "distance": float(d),
                                "meta": self.id_to_meta.get(int(i), {})})
            results.append(row)
        return results

    def save(self, path: str):
        faiss.write_index(self.index, path + ".faiss")
        with open(path + ".meta", "wb") as f:
            pickle.dump(self.id_to_meta, f)

    @classmethod
    def load(cls, path: str, dim: int):
        obj = cls.__new__(cls)
        obj.dim   = dim
        obj.use_gpu = False  # Not persisted; loaded indexes start on CPU
        obj.index = faiss.read_index(path + ".faiss")
        with open(path + ".meta", "rb") as f:
            obj.id_to_meta = pickle.load(f)
        return obj

Recall Evaluation

def compute_recall(index_approx, index_exact, queries, k):
    _, gt_ids  = index_exact.search(queries, k)
    _, ann_ids = index_approx.search(queries, k)

    recall = 0.0
    for gt_row, ann_row in zip(gt_ids, ann_ids):
        recall += len(set(gt_row) & set(ann_row)) / k
    return recall / len(queries)

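# Exact baseline that provides the ground-truth neighbors
index_flat = faiss.IndexFlatL2(d)
index_flat.add(vectors)

# Test recall at different nprobe settings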
for nprobe in [1, 5, 10, 20, 50]:
    index_ivf.nprobe = nprobe
    r = compute_recall(index_ivf, index_flat, query, k=10)
    print(f"nprobe={nprobe:3d}  recall@10={r:.3f}")
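
The recall computation itself is independent of FAISS, so it can be checked on hand-written ID lists. The sketch below is a variant of the function above (the name recall_at_k is illustrative) that also discards the -1 padding IVF indexes return when they find fewer than k neighbors.

```python
def recall_at_k(gt_ids, ann_ids, k):
    """Mean fraction of the true top-k neighbors that the ANN result recovered."""
    total = 0.0
    for gt_row, ann_row in zip(gt_ids, ann_ids):
        truth = set(gt_row[:k]) - {-1}   # drop padding before comparing
        found = set(ann_row[:k]) - {-1}
        total += len(truth & found) / k
    return total / len(gt_ids)

gt  = [[1, 2, 3, 4], [5, 6, 7, 8]]      # exact neighbors for two queries
ann = [[1, 2, 9, 4], [5, -1, -1, -1]]   # approximate results, second one padded
print(recall_at_k(gt, ann, k=4))        # 0.5
```

The first query recovers 3 of 4 true neighbors and the second recovers 1 of 4, giving a mean recall of 0.5.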

Tips and Best Practices

Tip	Details
Always use `float32`	FAISS requires `np.float32`; add `.astype(np.float32)` to all vector arrays
`-1` in results means missing	IVF indexes pad results with -1 when fewer than k results exist in searched cells
Train on representative data	IVF/PQ training quality depends on diversity; aim for roughly 39-256 training vectors per centroid (39 * `nlist` to 256 * `nlist` for the coarse quantizer)
Use IDMap for custom IDs	FAISS internal IDs are sequential int64; wrap with `IndexIDMap` for arbitrary IDs
HNSW has no `remove_ids`	HNSW graphs do not support removals; rebuild the index, or use an IVF index (which supports `add_with_ids` and `remove_ids`) when deletions are needed
Save metadata separately	FAISS stores only vectors; use pickle/SQLite for ID-to-metadata mapping
Tune nprobe vs recall trade-off	Plot a recall@k vs nprobe curve; the nprobe needed for 95%+ recall is dataset-dependent, and 10-20% of nlist is only a rough starting point
Normalize for cosine similarity	Call `faiss.normalize_L2(vecs)` (in place) on database vectors before adding them and on query vectors before searching
Use GPU for batch queries	GPU shines with query batch sizes >= 32; single-query GPU latency is often higher than CPU
Index factory strings for complex indexes	`faiss.index_factory(d, "IVF256,PQ32")` is cleaner than manual composition for complex indexes
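
Factory strings like the one in the last tip are plain text, so they can be generated from the same parameters used for manual composition. This is a small sketch with a hypothetical helper name; it only builds the string and enforces the rule stated earlier that d must be divisible by M.

```python
def ivfpq_factory_string(d: int, nlist: int, M: int) -> str:
    """Build a factory string such as 'IVF256,PQ32' (hypothetical helper)."""
    if d % M != 0:
        raise ValueError(f"d={d} must be divisible by M={M}")
    return f"IVF{nlist},PQ{M}"

spec = ivfpq_factory_string(1536, 256, 32)
print(spec)   # IVF256,PQ32
# With FAISS installed, the string would be passed on as:
#   index = faiss.index_factory(1536, spec)
```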