pgvector Cheat Sheet
Overview
pgvector extends PostgreSQL with a native vector data type and similarity search operators, letting you build vector search capabilities directly inside an existing Postgres database. This eliminates the need for a separate vector store, simplifies operational complexity, and enables powerful hybrid queries that combine SQL filtering with nearest-neighbor search in a single query planner pass.
pgvector supports three distance metrics — L2 (Euclidean), inner product (dot product), and cosine distance — plus Hamming and Jaccard distances for binary vectors. Indexing is available via IVFFlat (inverted file with flat quantization, good for recall/speed trade-offs) and HNSW (hierarchical navigable small world, excellent recall with fast build times in newer versions). Without an index, pgvector performs exact brute-force search, which is accurate but slow at scale.
pgvector works seamlessly with SQLAlchemy, psycopg2/psycopg3, Prisma, and LangChain’s PGVector integration. It is available in managed Postgres services including Amazon RDS, Azure Database for PostgreSQL, Supabase, and Neon.
Installation
PostgreSQL Extension
# Ubuntu/Debian — build from source
sudo apt install -y postgresql-server-dev-all build-essential git
git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
make && sudo make install
# Enable in PostgreSQL
psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS vector;"
# Homebrew (macOS)
brew install pgvector
# Alpine / Docker
apk add --no-cache pgvector
# Debian package (PostgreSQL 16)
sudo apt install -y postgresql-16-pgvector
Docker with pgvector Pre-installed
# Official pgvector Docker image
docker run -d \
-e POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
-v pg_data:/var/lib/postgresql/data \
--name pgvector \
pgvector/pgvector:pg16
# Docker Compose
cat > docker-compose.yml << 'EOF'
version: "3.9"
services:
postgres:
image: pgvector/pgvector:pg16
environment:
POSTGRES_DB: vectordb
POSTGRES_USER: postgres
POSTGRES_PASSWORD: secret
ports:
- "5432:5432"
volumes:
- pg_data:/var/lib/postgresql/data
volumes:
pg_data:
EOF
docker compose up -d
Python Clients
pip install pgvector psycopg2-binary # psycopg2
pip install pgvector psycopg[binary] # psycopg3
pip install pgvector sqlalchemy psycopg2-binary # SQLAlchemy ORM
pip install langchain-postgres # LangChain integration
Configuration
Enable Extension and Tune PostgreSQL
-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;
-- Check installed version
SELECT extversion FROM pg_extension WHERE extname = 'vector';
-- PostgreSQL settings for vector workloads
-- Add to postgresql.conf
-- shared_buffers = 4GB -- 25% of RAM
-- work_mem = 256MB -- Per sort/hash operation
-- maintenance_work_mem = 2GB -- Index builds
-- effective_cache_size = 12GB -- Query planner hint
-- max_parallel_workers = 8 -- Parallel query workers
-- Apply at session level without restart
SET work_mem = '256MB';
SET maintenance_work_mem = '2GB';
SET max_parallel_workers_per_gather = 4;
Python Connection Setup
import psycopg2
from pgvector.psycopg2 import register_vector
conn = psycopg2.connect(
host="localhost",
port=5432,
dbname="vectordb",
user="postgres",
password="secret"
)
register_vector(conn) # Register numpy array <-> vector type adapter
# psycopg3
import psycopg
from pgvector.psycopg import register_vector_async
conn = psycopg.connect("postgresql://postgres:secret@localhost/vectordb")
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)
Core Commands/API
| SQL Command / Operator | Description |
|---|---|
CREATE EXTENSION vector | Enable pgvector in current database |
VECTOR(n) | Column type for n-dimensional vectors |
HALFVEC(n) | Half-precision float vectors (save 50% storage) |
BIT(n) | Binary vectors for Hamming/Jaccard distance |
<-> | L2 (Euclidean) distance operator |
<#> | Negative inner product (use for dot product similarity) |
<=> | Cosine distance operator |
<+> | L1 (Manhattan) distance operator |
l2_distance(a, b) | L2 distance function |
inner_product(a, b) | Inner product function |
cosine_distance(a, b) | Cosine distance function |
vector_dims(v) | Return number of dimensions |
vector_norm(v) | Return L2 norm |
l2_normalize(v) | Return L2-normalized vector |
CREATE INDEX ... USING hnsw | Build HNSW approximate index |
CREATE INDEX ... USING ivfflat | Build IVFFlat approximate index |
SET hnsw.ef_search = 100 | Set HNSW query-time accuracy (session) |
SET ivfflat.probes = 10 | Set IVFFlat probe count (session) |
Advanced Usage
Schema Design and CRUD
-- Create table with vector column
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding VECTOR(1536), -- OpenAI text-embedding-3-small
source TEXT,
category TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
metadata JSONB
);
-- Create partial index on subset of data
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WHERE category = 'technical';
-- Insert a vector (Python list or numpy array)
INSERT INTO documents (content, embedding, source, category)
VALUES (
'PostgreSQL is an advanced relational database.',
'[0.1, 0.2, 0.3, ...]'::vector,
'wikipedia',
'technical'
);
-- Bulk insert from Python
import numpy as np, psycopg2
from pgvector.psycopg2 import register_vector
conn = psycopg2.connect("...")
register_vector(conn)
cur = conn.cursor()
vectors = np.random.rand(1000, 1536).astype(np.float32)
texts = [f"Document {i}" for i in range(1000)]
cur.executemany(
"INSERT INTO documents (content, embedding) VALUES (%s, %s)",
[(t, v) for t, v in zip(texts, vectors)]
)
conn.commit()
Indexing — HNSW vs IVFFlat
-- HNSW index (recommended for most workloads)
-- Best recall, fast queries, no training needed
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (
m = 16, -- Max connections per layer (8-64, higher = better recall)
ef_construction = 64 -- Build accuracy (32-200, higher = better quality)
);
-- Distance ops: vector_l2_ops | vector_ip_ops | vector_cosine_ops
-- HNSW for half-precision vectors
CREATE INDEX ON documents USING hnsw (embedding halfvec_cosine_ops);
-- IVFFlat index (less RAM, requires training data)
-- Rule of thumb: lists = rows / 1000 (min 100)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Index build progress (PostgreSQL 12+)
SELECT phase, tuples_done, tuples_total
FROM pg_stat_progress_create_index;
-- Tune ef_search at query time (higher = better recall, slower)
SET hnsw.ef_search = 100; -- default 40
SET ivfflat.probes = 10; -- default 1 (scan more lists for better recall)
Similarity Queries
-- K-nearest neighbors (cosine similarity)
SELECT id, content, 1 - (embedding <=> '[0.1,0.2,...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1,0.2,...]'::vector
LIMIT 10;
-- Filter before vector search (uses index)
SELECT id, content,
embedding <-> '[0.1,0.2,...]'::vector AS distance
FROM documents
WHERE category = 'technical'
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <-> '[0.1,0.2,...]'::vector
LIMIT 5;
-- Exact search (bypasses index — use for accuracy testing)
SET enable_indexscan = off;
SELECT id, content
FROM documents
ORDER BY embedding <-> '[0.1,0.2,...]'::vector
LIMIT 10;
-- Find near-duplicates (self-join)
SELECT a.id, b.id, a.embedding <=> b.embedding AS cosine_dist
FROM documents a
JOIN documents b ON a.id < b.id
WHERE a.embedding <=> b.embedding < 0.05
ORDER BY cosine_dist;
Hybrid Search (Vector + Full-Text)
-- Add full-text search column
ALTER TABLE documents ADD COLUMN fts_vector TSVECTOR
GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX ON documents USING GIN (fts_vector);
-- Hybrid query: combine FTS rank with vector distance (RRF)
WITH semantic AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1,0.2,...]'::vector) AS rank
FROM documents
LIMIT 50
),
keyword AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(fts_vector, query) DESC) AS rank
FROM documents,
plainto_tsquery('english', 'postgresql vector search') query
WHERE fts_vector @@ query
LIMIT 50
)
SELECT
COALESCE(s.id, k.id) AS id,
d.content,
1.0 / (60 + COALESCE(s.rank, 100)) +
1.0 / (60 + COALESCE(k.rank, 100)) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
JOIN documents d ON d.id = COALESCE(s.id, k.id)
ORDER BY rrf_score DESC
LIMIT 10;
SQLAlchemy ORM Integration
from sqlalchemy import create_engine, Column, Integer, String, Text
from sqlalchemy.orm import DeclarativeBase, Session
from pgvector.sqlalchemy import Vector
engine = create_engine("postgresql+psycopg2://postgres:secret@localhost/vectordb")
class Base(DeclarativeBase):
pass
class Document(Base):
__tablename__ = "documents"
id = Column(Integer, primary_key=True)
content = Column(Text, nullable=False)
embedding = Column(Vector(1536))
source = Column(String(100))
Base.metadata.create_all(engine)
# Insert
with Session(engine) as session:
doc = Document(content="Hello world", embedding=[0.1] * 1536, source="test")
session.add(doc)
session.commit()
# Query — cosine similarity
import numpy as np
query_vec = np.random.rand(1536).tolist()
with Session(engine) as session:
results = (
session.query(Document)
.order_by(Document.embedding.cosine_distance(query_vec))
.limit(5)
.all()
)
for doc in results:
print(doc.content)
Common Workflows
Full RAG Pipeline with psycopg3
import psycopg
from pgvector.psycopg import register_vector
import openai
conn = psycopg.connect("postgresql://postgres:secret@localhost/vectordb")
register_vector(conn)
openai_client = openai.OpenAI()
def embed(text: str) -> list[float]:
resp = openai_client.embeddings.create(
input=text, model="text-embedding-3-small"
)
return resp.data[0].embedding
def ingest(documents: list[dict]):
"""documents: [{"content": ..., "source": ...}]"""
texts = [d["content"] for d in documents]
embeddings = [
r.embedding for r in openai_client.embeddings.create(
input=texts, model="text-embedding-3-small"
).data
]
with conn.cursor() as cur:
cur.executemany(
"INSERT INTO documents (content, embedding, source) VALUES (%s, %s, %s)",
[(d["content"], e, d.get("source")) for d, e in zip(documents, embeddings)]
)
conn.commit()
def retrieve(query: str, k: int = 5, category: str = None) -> list[str]:
q_vec = embed(query)
filter_sql = "AND category = %s" if category else ""
params = [q_vec, k] if not category else [q_vec, category, k]
with conn.cursor() as cur:
cur.execute(f"""
SELECT content
FROM documents
WHERE TRUE {filter_sql}
ORDER BY embedding <=> %s
LIMIT %s
""", ([q_vec] if not category else [q_vec, category]) + [k])
return [row[0] for row in cur.fetchall()]
def answer(question: str) -> str:
context = "\n---\n".join(retrieve(question))
resp = openai_client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": f"Context:\n{context}"},
{"role": "user", "content": question}
]
)
return resp.choices[0].message.content
LangChain PGVector Integration
from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
CONNECTION_STRING = "postgresql+psycopg://postgres:secret@localhost/vectordb"
vectorstore = PGVector(
embeddings=embeddings,
collection_name="langchain_docs",
connection=CONNECTION_STRING,
use_jsonb=True
)
# Add documents
from langchain_core.documents import Document
vectorstore.add_documents([
Document(page_content="pgvector extends PostgreSQL.", metadata={"source": "docs"}),
Document(page_content="HNSW provides approximate nearest neighbor search.", metadata={"source": "paper"})
])
# Search
results = vectorstore.similarity_search("vector indexing", k=3)
results_with_score = vectorstore.similarity_search_with_score("SQL vector", k=3)
# As retriever
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 5, "filter": {"source": "docs"}}
)
Tips and Best Practices
| Tip | Details |
|---|---|
| Use HNSW over IVFFlat | HNSW requires no training, builds incrementally, and provides better recall; prefer it for new projects |
Set maintenance_work_mem high for builds | SET maintenance_work_mem = '4GB' before index creation significantly speeds up HNSW builds |
| Normalize before inner product | For cosine similarity with <#>, normalize vectors first: SELECT l2_normalize(embedding) |
| Partial indexes for filtered queries | CREATE INDEX ... WHERE category = 'tech' gives exact search in filtered subsets |
Use halfvec to cut storage | HALFVEC(1536) uses half the storage of VECTOR(1536) with minimal accuracy loss |
Run VACUUM ANALYZE after bulk inserts | Updates planner statistics so the query optimizer correctly uses vector indexes |
Tune hnsw.ef_search per query | Higher values improve recall at the cost of latency; set per transaction for latency-sensitive paths |
Add pgstattuple for index stats | SELECT * FROM pgstattuple('documents_embedding_idx') to inspect index efficiency |
| Use connection pooling | PgBouncer or pgpool-II reduces connection overhead for high-concurrency RAG APIs |
Monitor with pg_stat_user_indexes | Check idx_scan to verify indexes are being used by your queries |