FAISS Cheat Sheet
Overview
FAISS (Facebook AI Similarity Search) is a library developed by Meta AI Research for fast similarity search and clustering of dense vectors. Written in C++ with Python bindings, FAISS is optimized for searching large collections, from thousands to billions of vectors, and provides both exact (brute-force) and approximate nearest-neighbor (ANN) algorithms.
FAISS is a library, not a database — it has no persistence layer, server mode, or metadata storage built in. You manage loading/saving indexes to disk, and any metadata (document IDs, payloads) must be maintained in a parallel data structure. This low-level nature makes FAISS extremely fast and memory-efficient but requires more application code than managed vector databases.
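One way to keep that parallel metadata is a small sidecar file stored next to the index. The sketch below is an illustration, not a FAISS feature: `save_metadata` and `load_metadata` are hypothetical helper names, and JSON is only one possible format. Keys are converted back to `int` on load because JSON object keys are always strings, while FAISS returns integer IDs.

```python
import json
import os
import tempfile

def save_metadata(id_to_meta: dict, path: str) -> None:
    """Write the ID -> payload mapping as JSON (keys become strings)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({str(k): v for k, v in id_to_meta.items()}, f)

def load_metadata(path: str) -> dict:
    """Read the mapping back and restore integer keys."""
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    return {int(k): v for k, v in raw.items()}

# Hypothetical payloads keyed by the same int64 IDs stored in the index
id_to_meta = {1001: {"title": "intro"}, 2002: {"title": "setup"}}
meta_path = os.path.join(tempfile.mkdtemp(), "my_index.meta.json")
save_metadata(id_to_meta, meta_path)
restored = load_metadata(meta_path)
print(restored[1001]["title"])
```

Writing the index file and the sidecar together (and versioning them together) helps keep IDs and payloads in sync.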
The library supports CPU and GPU (CUDA) execution, SIMD-optimized distance computations, several vector compression schemes (Scalar Quantizer, Product Quantizer), graph-based indexes (HNSW), and composite index types that chain transforms and quantizers. FAISS is used inside several vector database engines (Milvus, for example) and is a common baseline in ANN benchmarks.
Installation
Python
# CPU-only (conda recommended for FAISS)
conda install -c pytorch faiss-cpu
# GPU (requires CUDA)
conda install -c pytorch faiss-gpu
# pip (community-maintained wheels, may lag conda releases)
pip install faiss-cpu
pip install faiss-gpu-cu12 # CUDA 12 build; faiss-gpu-cu11 for CUDA 11 (the older faiss-gpu package is no longer updated)
# Verify installation
python -c "import faiss; print(faiss.__version__)"
python -c "import faiss; print(faiss.get_num_gpus())" # GPU count
From Source (for custom CUDA builds)
git clone https://github.com/facebookresearch/faiss.git
cd faiss
cmake -B build . \
-DFAISS_ENABLE_GPU=ON \
-DFAISS_ENABLE_PYTHON=ON \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CUDA_ARCHITECTURES="80;86" # Ampere GPUs
cmake --build build -j8
cd build/faiss/python && pip install -e .
C++ Library
# Debian/Ubuntu
sudo apt install -y libfaiss-dev
# macOS
brew install faiss
Configuration
Index Selection Guide
Exact Search (small datasets, <100K vectors):
faiss.IndexFlatL2 — Exact L2, no training
faiss.IndexFlatIP — Exact inner product, no training
Approximate Search (large datasets):
faiss.IndexIVFFlat — IVF + flat storage, good baseline ANN
faiss.IndexHNSWFlat — HNSW graph, excellent recall, high RAM
faiss.IndexIVFPQ — IVF + Product Quantization, low memory
faiss.IndexIVFScalarQuantizer — IVF + SQ, balanced speed/memory
Compression:
faiss.IndexPQ — Pure Product Quantization
faiss.IndexScalarQuantizer — Scalar quantization (4-bit, 8-bit)
GPU Acceleration:
faiss.index_cpu_to_gpu() — Copy a supported CPU index (Flat, IVFFlat, IVFPQ, IVFScalarQuantizer) to one GPU
faiss.index_cpu_to_all_gpus() — Copy an index to all visible GPUs (replicated by default)
Distance Metrics
import faiss
faiss.METRIC_L2 # Euclidean distance (default)
faiss.METRIC_INNER_PRODUCT # Dot product (use with normalized vectors for cosine)
faiss.METRIC_L1 # Manhattan distance
faiss.METRIC_Linf # Chebyshev distance
faiss.METRIC_Canberra # Canberra distance
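The note about using inner product with normalized vectors rests on a simple identity: for unit vectors, the dot product equals the cosine similarity, and the squared L2 distance equals 2 - 2 * cosine. The pure-Python sketch below checks this on two small vectors; the helper names are made up for illustration and nothing here calls FAISS.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

cosine = dot(a, b)          # inner product of unit vectors = cosine similarity
dist_sq = squared_l2(a, b)  # squared Euclidean distance between the same vectors

# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(round(cosine, 6), round(dist_sq, 6), round(2 - 2 * cosine, 6))
```

A consequence is that, on normalized vectors, ranking by smallest L2 distance and ranking by largest inner product give the same neighbor order.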
Core Commands/API
| Method | Description |
|---|---|
| faiss.IndexFlatL2(d) | Exact L2 index, dimension d |
| faiss.IndexFlatIP(d) | Exact inner product index |
| faiss.IndexHNSWFlat(d, M) | HNSW index, M connections per node |
| faiss.IndexIVFFlat(quantizer, d, nlist) | IVF with flat storage |
| faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits) | IVF with Product Quantization |
| faiss.IndexIVFScalarQuantizer(quantizer, d, nlist, qt) | IVF + scalar quantization |
| faiss.IndexPQ(d, M, nbits) | Pure Product Quantization |
| faiss.IndexScalarQuantizer(d, qt) | Scalar quantization index |
| faiss.IndexIDMap(index) | Wrap index to support custom int64 IDs |
| faiss.IndexIDMap2(index) | IDMap with reverse lookup capability |
| index.train(vectors) | Train index on representative sample |
| index.add(vectors) | Add vectors (sequential IDs) |
| index.add_with_ids(vectors, ids) | Add vectors with custom IDs (IDMap or IVF indexes) |
| index.search(query, k) | Search k nearest neighbors |
| index.range_search(query, radius) | Find all vectors within radius |
| index.remove_ids(id_selector) | Remove vectors by ID |
| index.reconstruct(i) | Reconstruct vector at index position i |
| index.ntotal | Number of vectors in index |
| index.is_trained | True once the index is trained (always True for indexes that need no training) |
| faiss.write_index(index, path) | Serialize index to file |
| faiss.read_index(path) | Load index from file |
| faiss.index_cpu_to_gpu(res, dev, index) | Move index to GPU |
| faiss.index_gpu_to_cpu(index) | Move GPU index back to CPU |
| faiss.normalize_L2(vectors) | In-place L2 normalization |
Advanced Usage
Flat (Exact) Index
import faiss
import numpy as np
d = 1536 # Vector dimension
n = 10_000 # Number of vectors
# Generate random vectors (replace with real embeddings)
np.random.seed(42)
vectors = np.random.rand(n, d).astype(np.float32)
# Build exact index
index = faiss.IndexFlatL2(d)
index.add(vectors)
print(f"Vectors in index: {index.ntotal}")
# Search returns (distances, indices); IndexFlatL2 reports squared L2 distances
query = np.random.rand(5, d).astype(np.float32) # 5 query vectors
distances, indices = index.search(query, k=10) # Top-10 for each query
print(f"Shape: distances={distances.shape}, indices={indices.shape}")
for i in range(len(query)):
    print(f"Query {i}: nearest={indices[i][0]}, distance={distances[i][0]:.4f}")
# Cosine similarity: normalize first (normalize_L2 modifies the arrays in place)
faiss.normalize_L2(vectors)
faiss.normalize_L2(query)
index_ip = faiss.IndexFlatIP(d)
index_ip.add(vectors)
scores, ids = index_ip.search(query, k=5) # scores = cosine similarity
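A small NumPy reference can help sanity-check what an exact index returns. The sketch below is not part of FAISS: `brute_force_knn` is a hypothetical helper that computes exact nearest neighbors by squared L2 distance, the same quantity IndexFlatL2 reports. It is only practical for small arrays because it materializes every pairwise difference.

```python
import numpy as np

def brute_force_knn(database: np.ndarray, queries: np.ndarray, k: int):
    """Exact k-NN by squared L2 distance, computed with plain NumPy."""
    # Pairwise squared distances, shape (num_queries, num_database)
    diff = queries[:, None, :] - database[None, :, :]
    dist_sq = (diff ** 2).sum(axis=2)
    # Indices of the k smallest distances per query, in ascending order
    order = np.argsort(dist_sq, axis=1)[:, :k]
    return np.take_along_axis(dist_sq, order, axis=1), order

rng = np.random.default_rng(0)
db = rng.random((200, 8), dtype=np.float32)
qs = db[:3].copy()  # query with vectors that are already in the database

ref_dist, ref_ids = brute_force_knn(db, qs, k=5)
print(ref_ids[:, 0])   # nearest neighbor of each query
print(ref_dist[:, 0])  # its squared distance
```

When comparing against FAISS output, compare IDs directly and distances with a small floating-point tolerance, since the two computations may round differently.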
HNSW Index
# HNSW — excellent recall, high RAM, no training
# M: connections per node (4-64); higher = better recall, more memory
index_hnsw = faiss.IndexHNSWFlat(d, 32) # M = 32, passed positionally (the SWIG constructors do not take keyword arguments)
# Tuning parameters
index_hnsw.hnsw.efConstruction = 200 # Build quality (40-800)
index_hnsw.hnsw.efSearch = 64 # Query-time accuracy (set before search)
index_hnsw.add(vectors)
# efSearch stays in effect until changed; raise it for higher recall at the cost of latency
index_hnsw.hnsw.efSearch = 128
distances, indices = index_hnsw.search(query, k=10)
# HNSW does not support remove_ids — rebuild index to remove vectors
# HNSW does not support GPU transfer
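To put a number on "high RAM", the FAISS index-selection guidelines give an approximate footprint of (d * 4 + M * 2 * 4) bytes per vector for HNSW with flat storage. The sketch below simply evaluates that approximation; `hnsw_flat_bytes` is a hypothetical helper, and real usage will be somewhat higher because of upper graph levels and allocator overhead.

```python
def hnsw_flat_bytes(n: int, d: int, m: int) -> int:
    """Approximate IndexHNSWFlat size: float32 vectors plus about 2*M int32 links each."""
    per_vector = d * 4 + m * 2 * 4
    return n * per_vector

size = hnsw_flat_bytes(n=10_000, d=1536, m=32)
print(f"{size / 1024**2:.1f} MiB")
```

At this dimension the graph links are a small fraction of the total; the float32 vectors dominate, which is why HNSW saves no memory compared with a flat index.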
IVF Index (Trained ANN)
# IVF: partition space into nlist cells, search nprobe cells per query
# Rule of thumb: nlist between 4 * sqrt(n) and 16 * sqrt(n); a smaller value is used here so the demo has enough training data
nlist = 100
quantizer = faiss.IndexFlatL2(d) # Coarse quantizer
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_L2)
# Training required — use representative sample (at least 39 * nlist vectors)
train_data = vectors[:8_000]
assert not index_ivf.is_trained
index_ivf.train(train_data)
assert index_ivf.is_trained
index_ivf.add(vectors)
# Set nprobe before search (higher = better recall, slower)
index_ivf.nprobe = 10 # Search 10 of 100 cells
distances, indices = index_ivf.search(query, k=5)
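The sizing rules above can be turned into a quick calculator. This is a sketch under stated assumptions: the 4 to 16 times sqrt(n) range is a rule of thumb rather than a hard limit, the 39 and 256 points-per-centroid figures are the default FAISS k-means bounds, and `ivf_sizing` is a hypothetical helper name.

```python
import math

MIN_POINTS_PER_CENTROID = 39   # below this, FAISS k-means prints a warning
MAX_POINTS_PER_CENTROID = 256  # above this, FAISS k-means subsamples the training set

def ivf_sizing(n: int) -> dict:
    """Suggest an nlist range and training-set sizes for n vectors."""
    root = math.isqrt(n)
    nlist_low, nlist_high = 4 * root, 16 * root
    return {
        "nlist_low": nlist_low,
        "nlist_high": nlist_high,
        # Training sizes for the low end of the nlist range
        "train_min": MIN_POINTS_PER_CENTROID * nlist_low,
        "train_max_useful": MAX_POINTS_PER_CENTROID * nlist_low,
    }

plan = ivf_sizing(1_000_000)
for name, value in plan.items():
    print(f"{name}: {value:,}")
```

For small collections the suggested nlist can demand more training vectors than exist, in which case a smaller nlist (or a flat index) is the practical choice.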
IVF + Product Quantization (Low Memory)
# PQ compresses vectors: M subvectors of nbits each
# Memory per vector = M * nbits / 8 bytes (vs 4*d bytes for float32)
# Example: d=1536, M=96, nbits=8 → 96 bytes vs 6144 bytes (64x compression)
nlist = 256
M = 96 # Number of subvectors (d must be divisible by M)
nbits = 8 # Bits per subvector (8 → 256 centroids)
quantizer = faiss.IndexFlatL2(d)
index_ivfpq = faiss.IndexIVFPQ(quantizer, d, nlist, M, nbits)
index_ivfpq.train(vectors) # k-means warns below 39 points per centroid: 39 * nlist (coarse) and 39 * 2**nbits (PQ codebooks)
index_ivfpq.add(vectors)
index_ivfpq.nprobe = 16
distances, indices = index_ivfpq.search(query, k=10)
# Memory estimate (PQ codes only; IVF also stores an 8-byte ID per vector)
bytes_per_vec = M * nbits // 8
total_mb = (n * bytes_per_vec) / (1024**2)
print(f"Index size estimate: {total_mb:.1f} MB")
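Because d must be divisible by M, it can help to list the valid choices before picking one. The sketch below uses two hypothetical helpers, `valid_pq_m` and `pq_code_bytes`, to enumerate candidate M values for a given dimension and show the resulting code size; the cap of 128 on M is an arbitrary choice for the example.

```python
def valid_pq_m(d: int, max_m: int = 128) -> list[int]:
    """Subvector counts M that divide the dimension d evenly."""
    return [m for m in range(1, max_m + 1) if d % m == 0]

def pq_code_bytes(m: int, nbits: int) -> int:
    """Bytes per PQ code, rounded up to whole bytes."""
    return (m * nbits + 7) // 8

d = 1536
print(valid_pq_m(d)[-5:])
for m in (48, 96):
    code = pq_code_bytes(m, nbits=8)
    print(f"M={m}: {code} bytes per vector, {4 * d // code}x smaller than float32")
```

Smaller M compresses more but loses accuracy, so the choice is usually validated with a recall measurement.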
Custom IDs with IDMap
import numpy as np
# Map sequential FAISS indices to your own int64 IDs
base_index = faiss.IndexFlatL2(d)
index = faiss.IndexIDMap(base_index)
custom_ids = np.array([1001, 2002, 3003, 4004, 5005], dtype=np.int64)
vecs = np.random.rand(5, d).astype(np.float32)
index.add_with_ids(vecs, custom_ids)
distances, ids = index.search(np.random.rand(1, d).astype(np.float32), k=3)
print(f"Nearest IDs: {ids[0]}") # Returns your custom IDs
# Remove by ID
selector = faiss.IDSelectorBatch(np.array([1001, 2002], dtype=np.int64))
index.remove_ids(selector)
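FAISS IDs must be int64, so string keys such as document names need an integer mapping maintained by the application. The sketch below shows one simple approach, assigning sequential integers; `IdRegistry` is a hypothetical class, not a FAISS API, and it would need to be persisted alongside the index.

```python
class IdRegistry:
    """Assigns a stable integer ID to each external string key."""

    def __init__(self):
        self._key_to_id = {}
        self._id_to_key = []

    def get_or_create(self, key: str) -> int:
        # Reuse the existing ID if the key was seen before
        if key not in self._key_to_id:
            self._key_to_id[key] = len(self._id_to_key)
            self._id_to_key.append(key)
        return self._key_to_id[key]

    def key_for(self, int_id: int) -> str:
        return self._id_to_key[int_id]

registry = IdRegistry()
ids = [registry.get_or_create(k) for k in ["doc-a", "doc-b", "doc-a"]]
print(ids)
print(registry.key_for(ids[1]))
```

Before calling add_with_ids, the integers would be converted with np.array(ids, dtype=np.int64).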
GPU Acceleration
import faiss
# Single GPU
res = faiss.StandardGpuResources() # Reserves scratch (temp) memory; the default size depends on FAISS version and GPU
cpu_flat = faiss.IndexFlatL2(d) # Empty CPU index to copy onto the GPU
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_flat) # GPU 0
gpu_index.add(vectors)
distances, indices = gpu_index.search(query, k=10)
# Convert back to CPU for saving
cpu_index = faiss.index_gpu_to_cpu(gpu_index)
faiss.write_index(cpu_index, "index.faiss")
# Multi-GPU: replicates the index on every visible GPU by default
# (pass a faiss.GpuMultipleClonerOptions with shard = True to split it instead)
multi_gpu_index = faiss.index_cpu_to_all_gpus(cpu_index)
distances, indices = multi_gpu_index.search(query, k=10)
# GPU resources with custom temp memory (configure before creating indexes on res)
res = faiss.StandardGpuResources()
res.setTempMemory(4 * 1024**3) # 4 GB temp memory
res.setDefaultNullStreamAllDevices()
Serialization
# Save and load index
faiss.write_index(index, "my_index.faiss")
loaded = faiss.read_index("my_index.faiss")
# Save to bytes (in-memory)
buffer = faiss.serialize_index(index) # Returns numpy uint8 array
# Restore
index = faiss.deserialize_index(buffer)
# Mmap for very large indexes (read-only, no copy to RAM)
loaded_mmap = faiss.read_index("my_index.faiss", faiss.IO_FLAG_MMAP)
Common Workflows
Building a Production Index Pipeline
import faiss
import numpy as np
import pickle
class FAISSIndex:
    def __init__(self, dim: int, nlist: int = 100, use_gpu: bool = False):
        self.dim = dim
        self.id_to_meta = {} # Map int64 ID → metadata dict
        quantizer = faiss.IndexFlatL2(dim)
        # PQ with dim // 16 subvectors of 8 bits each (dim must be divisible by 16)
        base_index = faiss.IndexIVFPQ(quantizer, dim, nlist, dim // 16, 8)
        self.index = faiss.IndexIDMap(base_index)
        self.use_gpu = use_gpu # Stored only; GPU transfer is not implemented in this class
    def train(self, vectors: np.ndarray):
        self.index.train(vectors) # IndexIDMap forwards training to the wrapped index
    def add(self, vectors: np.ndarray, ids: np.ndarray, metadata: list[dict]):
        self.index.add_with_ids(vectors, ids)
        for i, meta in zip(ids, metadata):
            self.id_to_meta[int(i)] = meta
    def search(self, query: np.ndarray, k: int = 10, nprobe: int = 10):
        # self.index.index is typed as a generic Index; extract the IVF index to set nprobe
        faiss.extract_index_ivf(self.index).nprobe = nprobe
        distances, ids = self.index.search(query, k)
        results = []
        for dist_row, id_row in zip(distances, ids):
            row = []
            for dist, i in zip(dist_row, id_row):
                if i != -1: # -1 means no result (padded)
                    row.append({"id": int(i), "distance": float(dist),
                                "meta": self.id_to_meta.get(int(i), {})})
            results.append(row)
        return results
    def save(self, path: str):
        faiss.write_index(self.index, path + ".faiss")
        with open(path + ".meta", "wb") as f:
            pickle.dump(self.id_to_meta, f)
    @classmethod
    def load(cls, path: str, dim: int):
        obj = cls.__new__(cls)
        obj.dim = dim
        obj.use_gpu = False
        obj.index = faiss.read_index(path + ".faiss")
        with open(path + ".meta", "rb") as f:
            obj.id_to_meta = pickle.load(f) # Only unpickle files you created yourself
        return obj
Recall Evaluation
def compute_recall(index_approx, index_exact, queries, k):
    _, gt_ids = index_exact.search(queries, k)
    _, ann_ids = index_approx.search(queries, k)
    recall = 0.0
    for gt_row, ann_row in zip(gt_ids, ann_ids):
        recall += len(set(gt_row) & set(ann_row)) / k
    return recall / len(queries)
# Exact index over the same vectors serves as ground truth
index_flat = faiss.IndexFlatL2(d)
index_flat.add(vectors)
# Test recall at different nprobe settings
for nprobe in [1, 5, 10, 20, 50]:
    index_ivf.nprobe = nprobe
    r = compute_recall(index_ivf, index_flat, query, k=10)
    print(f"nprobe={nprobe:3d} recall@10={r:.3f}")
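The recall arithmetic is easier to follow on a tiny hand-made case. The sketch below works on plain lists of IDs rather than FAISS indexes; `recall_at_k` is a hypothetical variant that also skips the -1 padding an IVF search can return, so padded slots never count as matches.

```python
def recall_at_k(gt_rows, ann_rows, k: int) -> float:
    """Mean fraction of true top-k neighbors found, ignoring -1 padding."""
    total = 0.0
    for gt_row, ann_row in zip(gt_rows, ann_rows):
        truth = {i for i in gt_row[:k] if i != -1}
        found = {i for i in ann_row[:k] if i != -1}
        total += len(truth & found) / k
    return total / len(gt_rows)

# Two queries: the first finds 2 of 3 true neighbors, the second finds 1 of 3
gt = [[1, 2, 3], [4, 5, 6]]
ann = [[1, 3, 9], [4, -1, -1]]
print(recall_at_k(gt, ann, k=3))
```

The order of IDs within a row does not matter here; only set membership in the top k is counted.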
Tips and Best Practices
| Tip | Details |
|---|---|
| Always use float32 | FAISS requires np.float32; add .astype(np.float32) to all vector arrays |
| -1 in results means missing | IVF indexes pad results with -1 when fewer than k results exist in searched cells |
| Train on representative data | IVF/PQ training quality depends on diversity; aim for 39 to 256 training vectors per centroid (FAISS warns below 39 * nlist) |
| Use IDMap for custom IDs | Flat and HNSW indexes assign sequential int64 IDs; wrap them with IndexIDMap for arbitrary IDs (IVF indexes accept add_with_ids directly) |
| HNSW cannot delete vectors | IndexHNSWFlat does not support remove_ids; rebuild the index, or use an IVF index when deletions are needed |
| Save metadata separately | FAISS stores only vectors; use pickle/SQLite for ID-to-metadata mapping |
| Tune nprobe vs recall trade-off | Plot recall@k against nprobe on your own data; the nprobe needed for 95%+ recall varies by dataset, often around 10-20% of nlist |
| Normalize for cosine similarity | Call faiss.normalize_L2(vecs) on database vectors before adding and on query vectors before searching; it modifies the array in place |
| Use GPU for batch queries | GPU shines with query batch sizes >= 32; single-query GPU latency is often higher than CPU |
| Index factory strings for complex indexes | faiss.index_factory(d, "IVF256,PQ32") is more concise than composing the index by hand |