
FAISS Cheat Sheet

Overview

FAISS (Facebook AI Similarity Search) is a library developed by Meta AI Research for efficient similarity search and clustering of dense vectors. Written in C++ with complete Python bindings (and a C API), FAISS is optimized for searching large collections, from thousands to billions of vectors, and provides both exact (brute-force) and approximate nearest-neighbor (ANN) algorithms.

FAISS is a library, not a database — it has no persistence layer, server mode, or metadata storage built in. You manage loading/saving indexes to disk, and any metadata (document IDs, payloads) must be maintained in a parallel data structure. This low-level nature makes FAISS extremely fast and memory-efficient but requires more application code than managed vector databases.

The library supports CPU and GPU (CUDA) execution, SIMD-optimized distance computations, several vector compression schemes (scalar quantization, product quantization), graph-based indexes (HNSW), and composite index types that chain transforms and quantizers. FAISS is embedded in vector database engines (for example, the Knowhere engine inside Milvus) and is a widely used baseline in ANN benchmarks.

Installation

Python

# CPU-only (conda recommended for FAISS)
conda install -c pytorch faiss-cpu

# GPU (requires CUDA)
conda install -c pytorch faiss-gpu

# pip (community-maintained CPU wheels, may lag conda releases)
pip install faiss-cpu
# pip GPU wheels are also unofficial and may not match your CUDA version; prefer conda for GPU
pip install faiss-gpu

# Verify installation
python -c "import faiss; print(faiss.__version__)"
python -c "import faiss; print(faiss.get_num_gpus())"   # GPU count

From Source (for custom CUDA builds)

git clone https://github.com/facebookresearch/faiss.git
cd faiss

cmake -B build \
  -DFAISS_ENABLE_GPU=ON \
  -DFAISS_ENABLE_PYTHON=ON \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_CUDA_ARCHITECTURES="80;86"   # Ampere GPUs

cmake --build build -j8
cd build/faiss/python && pip install -e .

C++ Library

# Debian/Ubuntu
sudo apt install -y libfaiss-dev

# macOS
brew install faiss

Configuration

Index Selection Guide

Exact Search (small datasets, <100K vectors):
  faiss.IndexFlatL2          — Exact L2, no training
  faiss.IndexFlatIP          — Exact inner product, no training

Approximate Search (large datasets):
  faiss.IndexIVFFlat         — IVF + flat storage, good baseline ANN
  faiss.IndexHNSWFlat        — HNSW graph, excellent recall, high RAM
  faiss.IndexIVFPQ           — IVF + Product Quantization, low memory
  faiss.IndexIVFScalarQuantizer — IVF + SQ, balanced speed/memory

Compression:
  faiss.IndexPQ              — Pure Product Quantization
  faiss.IndexScalarQuantizer — Scalar quantization (4-bit, 8-bit)

GPU Acceleration:
  faiss.index_cpu_to_gpu()   — Move a supported CPU index (Flat, IVF variants) to GPU
  faiss.index_cpu_to_all_gpus() — Replicate (or shard) an index across all GPUs

Distance Metrics

import faiss

faiss.METRIC_L2            # Euclidean distance (default)
faiss.METRIC_INNER_PRODUCT # Dot product (use with normalized vectors for cosine)
faiss.METRIC_L1            # Manhattan distance
faiss.METRIC_Linf          # Chebyshev distance
faiss.METRIC_Canberra      # Canberra distance

Core Commands/API

Method                                                  Description
faiss.IndexFlatL2(d)                                    Exact L2 index, dimension d
faiss.IndexFlatIP(d)                                    Exact inner-product index
faiss.IndexHNSWFlat(d, M)                               HNSW index, M connections per node
faiss.IndexIVFFlat(quantizer, d, nlist)                 IVF with flat storage
faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)         IVF with Product Quantization
faiss.IndexIVFScalarQuantizer(quantizer, d, nlist, qt)  IVF + scalar quantization
faiss.IndexPQ(d, M, nbits)                              Pure Product Quantization
faiss.IndexScalarQuantizer(d, qt)                       Scalar quantization index
faiss.IndexIDMap(index)                                 Wrap an index to support custom int64 IDs
faiss.IndexIDMap2(index)                                IDMap that also supports reconstruct() by custom ID
index.train(vectors)                                    Train the index on a representative sample
index.add(vectors)                                      Add vectors (sequential IDs)
index.add_with_ids(vectors, ids)                        Add vectors with custom int64 IDs (IDMap or IVF indexes)
index.search(query, k)                                  Search k nearest neighbors; returns (distances, ids)
index.range_search(query, radius)                       Find all vectors within radius; returns (lims, distances, ids)
index.remove_ids(id_selector)                           Remove vectors by ID; returns the number removed
index.reconstruct(i)                                    Reconstruct the stored vector with ID i
index.ntotal                                            Number of vectors in the index
index.is_trained                                        True if the index is trained (or needs no training)
faiss.write_index(index, path)                          Serialize index to file
faiss.read_index(path)                                  Load index from file
faiss.index_cpu_to_gpu(res, dev, index)                 Move index to GPU device dev
faiss.index_gpu_to_cpu(index)                           Move GPU index back to CPU
faiss.normalize_L2(vectors)                             In-place L2 normalization (float32 arrays)

Advanced Usage

Flat (Exact) Index

import faiss
import numpy as np

d = 1536    # Vector dimension
n = 10_000  # Number of vectors

# Generate random vectors (replace with real embeddings)
np.random.seed(42)
vectors = np.random.rand(n, d).astype(np.float32)

# Build exact index
index = faiss.IndexFlatL2(d)
index.add(vectors)
print(f"Vectors in index: {index.ntotal}")

# Search — returns (distances, indices)
query = np.random.rand(5, d).astype(np.float32)  # 5 query vectors
distances, indices = index.search(query, k=10)    # Top-10 per query; distances are squared L2

print(f"Shape: distances={distances.shape}, indices={indices.shape}")
for i in range(len(query)):
    print(f"Query {i}: nearest={indices[i][0]}, distance={distances[i][0]:.4f}")

# Cosine similarity: normalize first (normalize_L2 works in place, so use copies)
vectors_n = vectors.copy()
query_n = query.copy()
faiss.normalize_L2(vectors_n)
faiss.normalize_L2(query_n)
index_ip = faiss.IndexFlatIP(d)
index_ip.add(vectors_n)
scores, ids = index_ip.search(query_n, k=5)   # scores = cosine similarity

HNSW Index

# HNSW — excellent recall, high RAM, no training
# M: connections per node (4-64); higher = better recall, more memory
index_hnsw = faiss.IndexHNSWFlat(d, 32)   # M = 32, passed positionally

# Tuning parameters
index_hnsw.hnsw.efConstruction = 200  # Build quality (40-800); must be set before add()
index_hnsw.hnsw.efSearch = 64         # Query-time accuracy vs speed

index_hnsw.add(vectors)

# efSearch is stored on the index and can be changed between searches
index_hnsw.hnsw.efSearch = 128
distances, indices = index_hnsw.search(query, k=10)

# HNSW does not support remove_ids — rebuild index to remove vectors
# HNSW does not support GPU transfer

IVF Index (Trained ANN)

# IVF: partition space into nlist cells, search nprobe cells per query
# Rule of thumb for large n: nlist between 4*sqrt(n) and 16*sqrt(n);
# kept at 100 here so the 10K demo vectors are enough to train it
nlist = 100
quantizer = faiss.IndexFlatL2(d)    # Coarse quantizer

index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)

# Training required — use representative sample (at least 39 * nlist vectors)
train_data = vectors[:8_000]
assert not index_ivf.is_trained
index_ivf.train(train_data)
assert index_ivf.is_trained

index_ivf.add(vectors)

# Set nprobe before search (higher = better recall, slower)
index_ivf.nprobe = 10   # Search 10 of 100 cells
distances, indices = index_ivf.search(query, k=5)

IVF + Product Quantization (Low Memory)

# PQ compresses vectors: M subvectors of nbits each
# Memory per vector = M * nbits / 8 bytes (vs 4*d bytes for float32)
# Example: d=1536, M=96, nbits=8 → 96 bytes vs 6144 bytes (64x compression)

nlist = 256
M     = 96       # Number of subvectors (d must be divisible by M)
nbits = 8        # Bits per subvector (8 → 256 centroids)

quantizer  = faiss.IndexFlatL2(d)
index_ivfpq = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)

index_ivfpq.train(vectors)   # Coarse quantizer wants >= 39 * nlist vectors; PQ codebooks >= 39 * 2**nbits
index_ivfpq.add(vectors)

index_ivfpq.nprobe = 16
distances, indices = index_ivfpq.search(query, k=10)

# Memory estimate: PQ code plus the 8-byte int64 ID stored in each inverted list
bytes_per_vec = M * nbits // 8 + 8
total_mb = (n * bytes_per_vec) / (1024**2)
print(f"Index size estimate: {total_mb:.1f} MB (excluding centroids and codebooks)")

Custom IDs with IDMap

import numpy as np

# Map sequential FAISS indices to your own int64 IDs
base_index = faiss.IndexFlatL2(d)
index      = faiss.IndexIDMap(base_index)

custom_ids = np.array([1001, 2002, 3003, 4004, 5005], dtype=np.int64)
vecs = np.random.rand(5, d).astype(np.float32)

index.add_with_ids(vecs, custom_ids)

distances, ids = index.search(np.random.rand(1, d).astype(np.float32), k=3)
print(f"Nearest IDs: {ids[0]}")   # Returns your custom IDs

# Remove by ID (the Python wrapper turns an int64 array into an IDSelectorBatch)
n_removed = index.remove_ids(np.array([1001, 2002], dtype=np.int64))
print(f"Removed {n_removed}, remaining {index.ntotal}")

GPU Acceleration

import faiss

# Single GPU (requires a GPU build of FAISS)
res = faiss.StandardGpuResources()   # default temp memory is sized from the GPU's total memory
cpu_flat = faiss.IndexFlatL2(d)
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_flat)   # GPU 0
gpu_index.add(vectors)
distances, indices = gpu_index.search(query, k=10)

# Convert back to CPU for saving
cpu_index = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(cpu_index, "index.faiss")

# Multi-GPU: by default the index is replicated on every visible GPU;
# pass a faiss.GpuMultipleClonerOptions with shard = True to split it instead
multi_gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
distances, indices = multi_gpu_index.search(query, k=10)

# GPU resources with custom temp memory (configure before creating GPU indexes)
res = faiss.StandardGpuResources()
res.setTempMemory(4 * 1024**3)   # 4 GB temp memory
res.setDefaultNullStreamAllDevices()

Serialization

# Save and load index
faiss.write_index(index, "my_index.faiss")
loaded = faiss.read_index("my_index.faiss")

# Save to bytes (in-memory)
buffer = faiss.serialize_index(index)   # Returns numpy uint8 array
# Restore
index = faiss.deserialize_index(buffer)

# Memory-map instead of reading into RAM (mainly applies to IVF inverted lists)
loaded_mmap = faiss.read_index("my_index.faiss", faiss.IO_FLAG_MMAP)

Common Workflows

Building a Production Index Pipeline

import faiss
import numpy as np
import pickle

class FAISSIndex:
    def __init__(self, dim: int, nlist: int = 100):
        self.dim = dim
        self.id_to_meta = {}  # Map int64 ID → metadata dict

        quantizer  = faiss.IndexFlatL2(dim)
        # PQ with dim // 16 sub-vectors of 8 bits each (dim must be divisible by 16)
        base_index = faiss.IndexIVFPQ(quantizer, dim, nlist, dim // 16, 8)
        self.index = faiss.IndexIDMap(base_index)

    def train(self, vectors: np.ndarray):
        # IndexIDMap.train() trains the wrapped IVFPQ index and updates its own is_trained flag
        self.index.train(vectors)

    def add(self, vectors: np.ndarray, ids: np.ndarray, metadata: list[dict]):
        self.index.add_with_ids(vectors, ids)
        for i, meta in zip(ids, metadata):
            self.id_to_meta[int(i)] = meta

    def search(self, query: np.ndarray, k: int = 10, nprobe: int = 10):
        # extract_index_ivf finds the IndexIVFPQ inside the IndexIDMap wrapper
        faiss.extract_index_ivf(self.index).nprobe = nprobe
        distances, ids = self.index.search(query, k)
        results = []
        for dist_row, id_row in zip(distances, ids):
            row = []
            for d, i in zip(dist_row, id_row):
                if i != -1:  # -1 means no result (padded)
                    row.append({"id": int(i), "distance": float(d),
                                "meta": self.id_to_meta.get(int(i), {})})
            results.append(row)
        return results

    def save(self, path: str):
        faiss.write_index(self.index, path + ".faiss")
        with open(path + ".meta", "wb") as f:
            pickle.dump(self.id_to_meta, f)

    @classmethod
    def load(cls, path: str, dim: int):
        obj = cls.__new__(cls)
        obj.dim   = dim
        obj.index = faiss.read_index(path + ".faiss")
        with open(path + ".meta", "rb") as f:
            obj.id_to_meta = pickle.load(f)
        return obj

Recall Evaluation

def compute_recall(index_approx, index_exact, queries, k):
    _, gt_ids  = index_exact.search(queries, k)
    _, ann_ids = index_approx.search(queries, k)

    recall = 0.0
    for gt_row, ann_row in zip(gt_ids, ann_ids):
        recall += len(set(gt_row) & set(ann_row)) / k
    return recall / len(queries)

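# Exact ground truth over the same vectors as index_ivf
index_flat = faiss.IndexFlatL2(d)
index_flat.add(vectors)

# Test recall at different nprobe settings
for nprobe in [1, 5, 10, 20, 50]:
    index_ivf.nprobe = nprobe
    r = compute_recall(index_ivf, index_flat, query, k=10)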
    print(f"nprobe={nprobe:3d}  recall@10={r:.3f}")

Tips and Best Practices

Tip                                        Details
Always use float32                         FAISS expects np.float32 arrays; add .astype(np.float32) to all vector arrays
-1 in results means missing                Searches pad results with -1 when fewer than k vectors are found (e.g. IVF with a small nprobe); filter these out
Train on representative data               IVF/PQ training quality depends on covering the data distribution; use roughly 39x to 256x nlist training vectors (FAISS warns below 39x)
Use IDMap for custom IDs                   Without a wrapper, IDs are sequential positions 0..ntotal-1; wrap with IndexIDMap (or call add_with_ids on IVF indexes) for arbitrary int64 IDs
HNSW does not support removals             IndexHNSW has no remove_ids; rebuild the index, or use IVF (optionally with IDMap) when you need deletions
Save metadata separately                   FAISS stores only vectors and int64 IDs; keep the ID-to-metadata mapping in pickle, SQLite, or another store
Tune nprobe vs recall trade-off            Plot recall@k against nprobe on held-out queries; the nprobe needed for a target recall depends on the data and nlist, so measure it
Normalize for cosine similarity            Call faiss.normalize_L2(vecs) (in place) on database vectors before adding and on queries before each search, then use an inner-product index
Use GPU for batch queries                  GPU throughput benefits most from large query batches; single-query GPU latency can be higher than on CPU
Index factory strings for complex indexes  faiss.index_factory(d, "IVF256,PQ32") is cleaner than manual composition for complex indexes