콘텐츠로 이동

pgvector Cheat Sheet

Overview

pgvector extends PostgreSQL with a native vector data type and similarity search operators, letting you build vector search capabilities directly inside an existing Postgres database. This eliminates the need for a separate vector store, simplifies operational complexity, and enables powerful hybrid queries that combine SQL filtering with nearest-neighbor search in a single query planner pass.

pgvector supports three distance metrics — L2 (Euclidean), inner product (dot product), and cosine distance — plus Hamming and Jaccard distances for binary vectors. Indexing is available via IVFFlat (inverted file with flat quantization, good for recall/speed trade-offs) and HNSW (hierarchical navigable small world, excellent recall with fast build times in newer versions). Without an index, pgvector performs exact brute-force search, which is accurate but slow at scale.

pgvector works seamlessly with SQLAlchemy, psycopg2/psycopg3, Prisma, and LangChain’s PGVector integration. It is available in managed Postgres services including Amazon RDS, Azure Database for PostgreSQL, Supabase, and Neon.

Installation

PostgreSQL Extension

# Ubuntu/Debian — build from source
sudo apt install -y postgresql-server-dev-all build-essential git

git clone --branch v0.8.0 https://github.com/pgvector/pgvector.git
cd pgvector
make && sudo make install

# Enable in PostgreSQL
psql -U postgres -c "CREATE EXTENSION IF NOT EXISTS vector;"

# Homebrew (macOS)
brew install pgvector

# Alpine / Docker
apk add --no-cache pgvector

# Debian package (PostgreSQL 16)
sudo apt install -y postgresql-16-pgvector

Docker with pgvector Pre-installed

# Official pgvector Docker image
docker run -d \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -p 5432:5432 \
  -v pg_data:/var/lib/postgresql/data \
  --name pgvector \
  pgvector/pgvector:pg16

# Docker Compose
cat > docker-compose.yml << 'EOF'
version: "3.9"
services:
  postgres:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secret
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data
volumes:
  pg_data:
EOF
docker compose up -d

Python Clients

pip install pgvector psycopg2-binary          # psycopg2
pip install pgvector psycopg[binary]          # psycopg3
pip install pgvector sqlalchemy psycopg2-binary  # SQLAlchemy ORM
pip install langchain-postgres                 # LangChain integration

Configuration

Enable Extension and Tune PostgreSQL

-- Enable pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Check installed version
SELECT extversion FROM pg_extension WHERE extname = 'vector';

-- PostgreSQL settings for vector workloads
-- Add to postgresql.conf
-- shared_buffers = 4GB            -- 25% of RAM
-- work_mem = 256MB                -- Per sort/hash operation
-- maintenance_work_mem = 2GB      -- Index builds
-- effective_cache_size = 12GB     -- Query planner hint
-- max_parallel_workers = 8        -- Parallel query workers

-- Apply at session level without restart
SET work_mem = '256MB';
SET maintenance_work_mem = '2GB';
SET max_parallel_workers_per_gather = 4;

Python Connection Setup

import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect(
    host="localhost",
    port=5432,
    dbname="vectordb",
    user="postgres",
    password="secret"
)
register_vector(conn)    # Register numpy array <-> vector type adapter

# psycopg3
import psycopg
from pgvector.psycopg import register_vector_async

conn = psycopg.connect("postgresql://postgres:secret@localhost/vectordb")
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)

Core Commands/API

SQL Command / OperatorDescription
CREATE EXTENSION vectorEnable pgvector in current database
VECTOR(n)Column type for n-dimensional vectors
HALFVEC(n)Half-precision float vectors (save 50% storage)
BIT(n)Binary vectors for Hamming/Jaccard distance
<->L2 (Euclidean) distance operator
<#>Negative inner product (use for dot product similarity)
<=>Cosine distance operator
<+>L1 (Manhattan) distance operator
l2_distance(a, b)L2 distance function
inner_product(a, b)Inner product function
cosine_distance(a, b)Cosine distance function
vector_dims(v)Return number of dimensions
vector_norm(v)Return L2 norm
l2_normalize(v)Return L2-normalized vector
CREATE INDEX ... USING hnswBuild HNSW approximate index
CREATE INDEX ... USING ivfflatBuild IVFFlat approximate index
SET hnsw.ef_search = 100Set HNSW query-time accuracy (session)
SET ivfflat.probes = 10Set IVFFlat probe count (session)

Advanced Usage

Schema Design and CRUD

-- Create table with vector column
CREATE TABLE documents (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT NOT NULL,
    embedding   VECTOR(1536),        -- OpenAI text-embedding-3-small
    source      TEXT,
    category    TEXT,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    metadata    JSONB
);

-- Create partial index on subset of data
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WHERE category = 'technical';

-- Insert a vector (Python list or numpy array)
INSERT INTO documents (content, embedding, source, category)
VALUES (
    'PostgreSQL is an advanced relational database.',
    '[0.1, 0.2, 0.3, ...]'::vector,
    'wikipedia',
    'technical'
);

-- Bulk insert from Python
import numpy as np, psycopg2
from pgvector.psycopg2 import register_vector
conn = psycopg2.connect("...")
register_vector(conn)
cur = conn.cursor()

vectors = np.random.rand(1000, 1536).astype(np.float32)
texts   = [f"Document {i}" for i in range(1000)]

cur.executemany(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    [(t, v) for t, v in zip(texts, vectors)]
)
conn.commit()

Indexing — HNSW vs IVFFlat

-- HNSW index (recommended for most workloads)
-- Best recall, fast queries, no training needed
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (
    m = 16,              -- Max connections per layer (8-64, higher = better recall)
    ef_construction = 64 -- Build accuracy (32-200, higher = better quality)
);

-- Distance ops: vector_l2_ops | vector_ip_ops | vector_cosine_ops

-- HNSW for half-precision vectors
CREATE INDEX ON documents USING hnsw (embedding halfvec_cosine_ops);

-- IVFFlat index (less RAM, requires training data)
-- Rule of thumb: lists = rows / 1000 (min 100)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Index build progress (PostgreSQL 12+)
SELECT phase, tuples_done, tuples_total
FROM pg_stat_progress_create_index;

-- Tune ef_search at query time (higher = better recall, slower)
SET hnsw.ef_search = 100;   -- default 40
SET ivfflat.probes = 10;    -- default 1 (scan more lists for better recall)

Similarity Queries

-- K-nearest neighbors (cosine similarity)
SELECT id, content, 1 - (embedding <=> '[0.1,0.2,...]'::vector) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1,0.2,...]'::vector
LIMIT 10;

-- Filter before vector search (uses index)
SELECT id, content,
       embedding <-> '[0.1,0.2,...]'::vector AS distance
FROM documents
WHERE category = 'technical'
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <-> '[0.1,0.2,...]'::vector
LIMIT 5;

-- Exact search (bypasses index — use for accuracy testing)
SET enable_indexscan = off;
SELECT id, content
FROM documents
ORDER BY embedding <-> '[0.1,0.2,...]'::vector
LIMIT 10;

-- Find near-duplicates (self-join)
SELECT a.id, b.id, a.embedding <=> b.embedding AS cosine_dist
FROM documents a
JOIN documents b ON a.id < b.id
WHERE a.embedding <=> b.embedding < 0.05
ORDER BY cosine_dist;

Hybrid Search (Vector + Full-Text)

-- Add full-text search column
ALTER TABLE documents ADD COLUMN fts_vector TSVECTOR
    GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;

CREATE INDEX ON documents USING GIN (fts_vector);

-- Hybrid query: combine FTS rank with vector distance (RRF)
WITH semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1,0.2,...]'::vector) AS rank
    FROM documents
    LIMIT 50
),
keyword AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(fts_vector, query) DESC) AS rank
    FROM documents,
         plainto_tsquery('english', 'postgresql vector search') query
    WHERE fts_vector @@ query
    LIMIT 50
)
SELECT
    COALESCE(s.id, k.id) AS id,
    d.content,
    1.0 / (60 + COALESCE(s.rank, 100)) +
    1.0 / (60 + COALESCE(k.rank, 100)) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
JOIN documents d ON d.id = COALESCE(s.id, k.id)
ORDER BY rrf_score DESC
LIMIT 10;

SQLAlchemy ORM Integration

from sqlalchemy import create_engine, Column, Integer, String, Text
from sqlalchemy.orm import DeclarativeBase, Session
from pgvector.sqlalchemy import Vector

engine = create_engine("postgresql+psycopg2://postgres:secret@localhost/vectordb")

class Base(DeclarativeBase):
    pass

class Document(Base):
    __tablename__ = "documents"
    id        = Column(Integer, primary_key=True)
    content   = Column(Text, nullable=False)
    embedding = Column(Vector(1536))
    source    = Column(String(100))

Base.metadata.create_all(engine)

# Insert
with Session(engine) as session:
    doc = Document(content="Hello world", embedding=[0.1] * 1536, source="test")
    session.add(doc)
    session.commit()

# Query — cosine similarity
import numpy as np
query_vec = np.random.rand(1536).tolist()

with Session(engine) as session:
    results = (
        session.query(Document)
        .order_by(Document.embedding.cosine_distance(query_vec))
        .limit(5)
        .all()
    )
    for doc in results:
        print(doc.content)

Common Workflows

Full RAG Pipeline with psycopg3

import psycopg
from pgvector.psycopg import register_vector
import openai

conn = psycopg.connect("postgresql://postgres:secret@localhost/vectordb")
register_vector(conn)

openai_client = openai.OpenAI()

def embed(text: str) -> list[float]:
    resp = openai_client.embeddings.create(
        input=text, model="text-embedding-3-small"
    )
    return resp.data[0].embedding

def ingest(documents: list[dict]):
    """documents: [{"content": ..., "source": ...}]"""
    texts = [d["content"] for d in documents]
    embeddings = [
        r.embedding for r in openai_client.embeddings.create(
            input=texts, model="text-embedding-3-small"
        ).data
    ]
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO documents (content, embedding, source) VALUES (%s, %s, %s)",
            [(d["content"], e, d.get("source")) for d, e in zip(documents, embeddings)]
        )
    conn.commit()

def retrieve(query: str, k: int = 5, category: str = None) -> list[str]:
    q_vec = embed(query)
    filter_sql = "AND category = %s" if category else ""
    params = [q_vec, k] if not category else [q_vec, category, k]
    with conn.cursor() as cur:
        cur.execute(f"""
            SELECT content
            FROM documents
            WHERE TRUE {filter_sql}
            ORDER BY embedding <=> %s
            LIMIT %s
        """, ([q_vec] if not category else [q_vec, category]) + [k])
        return [row[0] for row in cur.fetchall()]

def answer(question: str) -> str:
    context = "\n---\n".join(retrieve(question))
    resp = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user",   "content": question}
        ]
    )
    return resp.choices[0].message.content

LangChain PGVector Integration

from langchain_postgres import PGVector
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
CONNECTION_STRING = "postgresql+psycopg://postgres:secret@localhost/vectordb"

vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="langchain_docs",
    connection=CONNECTION_STRING,
    use_jsonb=True
)

# Add documents
from langchain_core.documents import Document
vectorstore.add_documents([
    Document(page_content="pgvector extends PostgreSQL.", metadata={"source": "docs"}),
    Document(page_content="HNSW provides approximate nearest neighbor search.", metadata={"source": "paper"})
])

# Search
results = vectorstore.similarity_search("vector indexing", k=3)
results_with_score = vectorstore.similarity_search_with_score("SQL vector", k=3)

# As retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5, "filter": {"source": "docs"}}
)

Tips and Best Practices

TipDetails
Use HNSW over IVFFlatHNSW requires no training, builds incrementally, and provides better recall; prefer it for new projects
Set maintenance_work_mem high for buildsSET maintenance_work_mem = '4GB' before index creation significantly speeds up HNSW builds
Normalize before inner productFor cosine similarity with <#>, normalize vectors first: SELECT l2_normalize(embedding)
Partial indexes for filtered queriesCREATE INDEX ... WHERE category = 'tech' gives exact search in filtered subsets
Use halfvec to cut storageHALFVEC(1536) uses half the storage of VECTOR(1536) with minimal accuracy loss
Run VACUUM ANALYZE after bulk insertsUpdates planner statistics so the query optimizer correctly uses vector indexes
Tune hnsw.ef_search per queryHigher values improve recall at the cost of latency; set per transaction for latency-sensitive paths
Add pgstattuple for index statsSELECT * FROM pgstattuple('documents_embedding_idx') to inspect index efficiency
Use connection poolingPgBouncer or pgpool-II reduces connection overhead for high-concurrency RAG APIs
Monitor with pg_stat_user_indexesCheck idx_scan to verify indexes are being used by your queries