Qdrant Cheat Sheet

Overview

Qdrant is a production-grade vector search engine written in Rust, purpose-built for high-throughput similarity search at scale. It stores points — records consisting of a vector, an optional payload (JSON metadata), and a unique ID — organized into collections. Qdrant’s Rust core delivers low latency and high throughput while its Python, JavaScript, Go, and Rust clients provide convenient access patterns for AI/ML workloads.

Key differentiators include rich payload filtering that executes alongside vector search (not as a post-filter), sparse vector support for BM25/SPLADE hybrid retrieval, named vectors per point for multi-modal storage, scalar and product quantization for memory reduction, and built-in multi-tenancy via payload-based tenant isolation. Qdrant supports both REST and gRPC APIs, making it suitable for latency-sensitive production deployments.
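
To make the difference between filtering during search and post-filtering concrete, here is a small sketch in plain Python. It does not use Qdrant: the toy `points` list and the helpers `post_filter_search` and `filtered_search` are made up for illustration, and brute-force dot-product scoring stands in for the real index.

```python
# Toy data (hypothetical): id, 2-d vector, payload
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"source": "docs"}},
    {"id": 2, "vector": [0.9, 0.1], "payload": {"source": "docs"}},
    {"id": 3, "vector": [0.8, 0.2], "payload": {"source": "blog"}},
    {"id": 4, "vector": [0.1, 0.9], "payload": {"source": "blog"}},
]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def post_filter_search(query, predicate, limit):
    """Take the top-`limit` hits first, then drop those failing the filter."""
    ranked = sorted(points, key=lambda p: dot(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:limit] if predicate(p["payload"])]

def filtered_search(query, predicate, limit):
    """Apply the filter while selecting candidates, then take the top-`limit`."""
    candidates = [p for p in points if predicate(p["payload"])]
    ranked = sorted(candidates, key=lambda p: dot(query, p["vector"]), reverse=True)
    return [p["id"] for p in ranked[:limit]]

def is_blog(payload):
    return payload["source"] == "blog"

query = [1.0, 0.0]
post = post_filter_search(query, is_blog, limit=2)
pre = filtered_search(query, is_blog, limit=2)
print("post-filter:", post)
print("filtered search:", pre)
```

In this toy setup the post-filter variant comes back empty because both of the nearest points fail the filter, while the filtered variant still returns two matching points.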

Qdrant can be deployed as a Docker container, a managed cloud service (Qdrant Cloud), or a distributed cluster. Its Web UI (served on port 6333 at /dashboard) provides collection browsing, search testing, and cluster monitoring out of the box.

Installation

# Single node — development
docker pull qdrant/qdrant
docker run -d \
  -p 6333:6333 \
  -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  --name qdrant \
  qdrant/qdrant

# With custom config
docker run -d \
  -p 6333:6333 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  -v $(pwd)/config.yaml:/qdrant/config/production.yaml \
  qdrant/qdrant \
  ./qdrant --config-path /qdrant/config/production.yaml

# Docker Compose — with Web UI accessible at http://localhost:6333/dashboard
cat > docker-compose.yml << 'EOF'
version: "3.9"
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"   # REST + Web UI
      - "6334:6334"   # gRPC
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      - QDRANT__SERVICE__API_KEY=your-api-key
volumes:
  qdrant_data:
EOF
docker compose up -d

Python Client

pip install qdrant-client
pip install "qdrant-client[fastembed]"   # Include FastEmbed local embeddings (quotes keep shells like zsh from expanding the brackets)

Binary Installation (Linux)

# Download latest release
curl -L https://github.com/qdrant/qdrant/releases/latest/download/qdrant-x86_64-unknown-linux-musl.tar.gz \
  | tar xz
./qdrant   # starts on port 6333

Configuration

Client Setup

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# In-memory (testing only)
client = QdrantClient(":memory:")

# Local persistent storage (no server required)
client = QdrantClient(path="./qdrant_local")

# Remote server
client = QdrantClient(url="http://localhost:6333")

# Remote with API key
client = QdrantClient(
    url="https://xyz.qdrant.io:6333",
    api_key="your-api-key"
)

# gRPC (lower latency for high throughput)
client = QdrantClient(
    host="localhost",
    grpc_port=6334,
    prefer_grpc=True
)

# Async client
from qdrant_client import AsyncQdrantClient
async_client = AsyncQdrantClient(url="http://localhost:6333")

Collection Configuration

from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff,
    OptimizersConfigDiff, ScalarQuantization,
    ScalarQuantizationConfig, ScalarType
)

client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(
        size=1536,              # Embedding dimension
        distance=Distance.COSINE  # COSINE | EUCLID | DOT | MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                  # Graph connectivity (higher = better recall, more RAM)
        ef_construct=100,      # Build-time candidate list size (higher = better index, slower build)
        full_scan_threshold=10_000  # Segment size (KB) below which a full scan is preferred
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20_000,  # Build the HNSW index once a segment exceeds this size (KB)
        memmap_threshold=50_000     # Store segments above this size (KB) as memory-mapped files
    ),
    # The scalar settings must be wrapped in ScalarQuantization(scalar=...)
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,
            always_ram=True   # Keep quantized vectors in RAM for speed
        )
    ),
    on_disk_payload=True      # Save payload to disk (large metadata)
)
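
As a rough check on what INT8 scalar quantization saves, the arithmetic below compares raw vector storage at 4 bytes per component (float32) with 1 byte per component (int8). The collection size of one million vectors and the helper `vector_memory_bytes` are assumptions for illustration. The estimate covers vector data only and ignores the HNSW graph, payload, and any per-vector quantization overhead.

```python
def vector_memory_bytes(num_vectors: int, dim: int, bytes_per_component: int) -> int:
    """Raw storage for the vector components only (no index, no payload)."""
    return num_vectors * dim * bytes_per_component

num_vectors = 1_000_000   # hypothetical collection size
dim = 1536                # matches the collection configured above

float32_bytes = vector_memory_bytes(num_vectors, dim, 4)  # original vectors
int8_bytes = vector_memory_bytes(num_vectors, dim, 1)     # quantized copies
ratio = float32_bytes / int8_bytes

print(f"float32: {float32_bytes / 2**30:.2f} GiB")
print(f"int8:    {int8_bytes / 2**30:.2f} GiB")
print(f"ratio:   {ratio:.1f}x")
```

Quantized vectors are stored in addition to the originals, so the RAM saving depends on keeping the originals on disk or memory-mapped while the quantized copies stay in RAM.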

Core Commands/API

| Method | Description |
| --- | --- |
| client.create_collection(name, vectors_config) | Create a new collection |
| client.delete_collection(name) | Delete a collection |
| client.get_collection(name) | Get collection info and stats |
| client.get_collections() | List all collections |
| client.collection_exists(name) | Check if collection exists |
| client.recreate_collection(name, ...) | Drop and recreate collection (deprecated; use collection_exists with create_collection) |
| client.upsert(collection_name, points) | Insert or update points |
| client.upload_points(collection_name, points) | Batch upload with retries |
| client.upload_collection(collection_name, vectors, payload, ids) | Stream upload large datasets |
| client.retrieve(collection_name, ids) | Fetch points by ID |
| client.search(collection_name, query_vector, limit) | Vector similarity search (deprecated in recent clients in favor of query_points) |
| client.query_points(collection_name, query, limit) | Unified query API (v1.10+) |
| client.scroll(collection_name, limit) | Iterate all points in collection |
| client.delete(collection_name, points_selector) | Delete points by ID or filter |
| client.update_payload(collection_name, payload, points_selector) | Update point payload |
| client.delete_payload(collection_name, keys, points_selector) | Remove payload fields |
| client.count(collection_name, count_filter) | Count matching points |
| client.recommend(collection_name, positive, negative) | Recommendation by examples |
| client.discover(collection_name, target, context) | Contextual discovery search |
| client.create_payload_index(collection_name, field_name, field_schema) | Index payload field for fast filtering |
| client.create_snapshot(collection_name) | Create collection snapshot |
| client.update_collection(name, optimizers_config) | Update collection settings |

Advanced Usage

Upserting Points with Payload

from qdrant_client.models import PointStruct
import uuid

# Upsert individual points
client.upsert(
    collection_name="articles",
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=[0.1, 0.2, ...],     # 1536-dim embedding
            payload={
                "title": "Introduction to RAG",
                "source": "blog",
                "year": 2024,
                "tags": ["rag", "llm"]
            }
        )
    ]
)

# Batch upload with numpy arrays (memory efficient)
import numpy as np
vectors = np.random.rand(10_000, 1536).astype(np.float32)
payloads = [{"doc_id": i, "category": "tech"} for i in range(10_000)]
ids = list(range(10_000))

client.upload_collection(
    collection_name="articles",
    vectors=vectors,
    payload=payloads,
    ids=ids,
    batch_size=256,
    parallel=4           # Parallel upload worker processes
)
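
To show what `batch_size=256` implies for the 10,000 points above, here is a minimal batching sketch in plain Python. It is not the client's implementation; `iter_batches` is a hypothetical helper that only illustrates how a stream of items is cut into fixed-size requests without materializing the whole dataset.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` items from `items`."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:          # iterator exhausted
            return
        yield batch

ids = range(10_000)
batches = list(iter_batches(ids, 256))
print(len(batches), "batches; last batch has", len(batches[-1]), "items")
```

Each batch would correspond to one upload request, so a larger batch size means fewer round trips but larger request bodies.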

Filtering and Searching

from qdrant_client.models import Filter, FieldCondition, MatchValue, Range, SearchRequest

# Search with payload filter
results = client.search(
    collection_name="articles",
    query_vector=[0.1, 0.2, ...],
    query_filter=Filter(
        must=[
            FieldCondition(key="source", match=MatchValue(value="blog")),
            FieldCondition(key="year",   range=Range(gte=2023, lte=2025))
        ],
        must_not=[
            FieldCondition(key="tags", match=MatchValue(value="deprecated"))
        ]
    ),
    limit=10,
    with_payload=True,
    with_vectors=False,
    score_threshold=0.7   # Minimum similarity score
)

for hit in results:
    print(f"[{hit.score:.4f}] {hit.id}: {hit.payload['title']}")

# Batch search (multiple queries in one request)
search_requests = [
    SearchRequest(vector=[0.1, 0.2, ...], limit=5),
    SearchRequest(vector=[0.3, 0.4, ...], limit=5,
                  filter=Filter(must=[FieldCondition(key="source", match=MatchValue(value="docs"))]))
]
batch_results = client.search_batch(
    collection_name="articles",
    requests=search_requests
)

Hybrid Search (Dense + Sparse Vectors)

from qdrant_client.models import (
    SparseVectorParams, SparseIndexParams,
    SparseVector, NamedSparseVector
)

# Create collection with both dense and sparse vectors
client.create_collection(
    collection_name="hybrid",
    vectors_config={
        "dense": VectorParams(size=384, distance=Distance.COSINE)
    },
    sparse_vectors_config={
        "sparse": SparseVectorParams(
            index=SparseIndexParams(on_disk=False)
        )
    }
)

# Upsert point with both vector types
from qdrant_client.models import PointStruct
client.upsert(
    collection_name="hybrid",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": [0.1, 0.2, ...],          # 384-dim dense
                "sparse": SparseVector(
                    indices=[10, 42, 137],           # Non-zero token indices
                    values=[0.8, 0.3, 0.5]          # SPLADE/BM25 weights
                )
            },
            payload={"text": "example document"}
        )
    ]
)

# Hybrid search — query both, re-rank with RRF
from qdrant_client.models import Prefetch, FusionQuery, Fusion
results = client.query_points(
    collection_name="hybrid",
    prefetch=[
        Prefetch(query=[0.1, 0.2, ...], using="dense", limit=20),
        Prefetch(query=SparseVector(indices=[10, 42], values=[0.8, 0.3]),
                 using="sparse", limit=20)
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=5
)
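
Reciprocal Rank Fusion (RRF) combines several ranked lists by summing 1 / (k + rank) for every list in which a point appears. The sketch below shows the idea in plain Python. The function `rrf_fuse`, the constant `k=60` (a commonly used default), the 1-based ranks, and the sample ID lists are all assumptions for illustration; the engine's own constant and tie-breaking may differ.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

def rrf_fuse(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,
    limit: int = 5,
) -> List[Tuple[Hashable, float]]:
    """Fuse ranked ID lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[point_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[:limit]

dense_hits = [1, 7, 3, 9]   # hypothetical IDs ranked by the dense prefetch
sparse_hits = [3, 1, 5]     # hypothetical IDs ranked by the sparse prefetch

fused = rrf_fuse([dense_hits, sparse_hits], limit=3)
for point_id, score in fused:
    print(point_id, round(score, 5))
```

Points that appear near the top of both lists accumulate the highest fused score, which is why RRF needs no calibration between dense and sparse score scales.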

Scroll and Multi-Tenancy

from qdrant_client.models import ScrollRequest, Filter, FieldCondition, MatchValue

# Scroll through all points (pagination)
offset = None
all_points = []
while True:
    results, offset = client.scroll(
        collection_name="articles",
        scroll_filter=Filter(
            must=[FieldCondition(key="category", match=MatchValue(value="tech"))]
        ),
        limit=100,
        offset=offset,
        with_payload=True
    )
    all_points.extend(results)
    if offset is None:
        break

# Multi-tenancy: isolate tenants by payload field
# Index the tenant field for performance
client.create_payload_index(
    collection_name="articles",
    field_name="tenant_id",
    field_schema="keyword"
)

# Tenant-scoped search
def search_for_tenant(tenant_id: str, query_vector, limit: int = 10):
    return client.search(
        collection_name="articles",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="tenant_id", match=MatchValue(value=tenant_id))]
        ),
        limit=limit
    )
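
The scroll loop and the tenant filter above follow the same pattern: every request carries a filter, and pagination continues until the returned offset is None. The sketch below imitates that protocol against an in-memory list so the control flow can be run without a server. `fake_scroll`, `scroll_all`, and the `store` data are hypothetical stand-ins, and treating the offset as the ID of the next point is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Dict[str, object]

# Hypothetical data: ten points split across two tenants
store: List[Point] = [
    {"id": i, "tenant_id": "acme" if i % 2 == 0 else "globex"} for i in range(10)
]

def fake_scroll(
    predicate: Callable[[Point], bool],
    limit: int,
    offset: Optional[int] = None,
) -> Tuple[List[Point], Optional[int]]:
    """Return up to `limit` matching points with id >= offset, plus the next offset."""
    matching = sorted((p for p in store if predicate(p)), key=lambda p: p["id"])
    if offset is not None:
        matching = [p for p in matching if p["id"] >= offset]
    page = matching[:limit]
    next_offset = matching[limit]["id"] if len(matching) > limit else None
    return page, next_offset

def scroll_all(tenant_id: str, page_size: int = 2) -> List[Point]:
    """Collect every point of one tenant, one page at a time."""
    collected: List[Point] = []
    offset: Optional[int] = None
    while True:
        page, offset = fake_scroll(lambda p: p["tenant_id"] == tenant_id, page_size, offset)
        collected.extend(page)
        if offset is None:     # no further pages
            return collected

acme_ids = [p["id"] for p in scroll_all("acme")]
print(acme_ids)
```

Keeping the tenant condition inside a single helper, as `search_for_tenant` does above, makes it harder to issue an unscoped query by accident.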

Snapshots and Backup

# Create snapshot via REST API
curl -X POST http://localhost:6333/collections/articles/snapshots

# List snapshots
curl http://localhost:6333/collections/articles/snapshots

# Download snapshot
curl -O http://localhost:6333/collections/articles/snapshots/articles-123.snapshot

# Restore from snapshot
curl -X POST "http://localhost:6333/collections/articles/snapshots/upload?priority=snapshot" \
  -H "Content-Type: multipart/form-data" \
  -F "snapshot=@articles-123.snapshot"

Common Workflows

Full RAG Pipeline with FastEmbed

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

# FastEmbed runs locally — no API key needed
client = QdrantClient(url="http://localhost:6333")
client.set_model("BAAI/bge-small-en-v1.5")  # 384-dim, fast

client.create_collection(
    collection_name="docs",
    vectors_config=client.get_fastembed_vector_params(on_disk=False)
)

# Ingest documents (FastEmbed handles embedding)
documents = ["Document one text", "Document two text", "Document three text"]
metadata  = [{"source": "wiki"}, {"source": "pdf"}, {"source": "web"}]

client.add(
    collection_name="docs",
    documents=documents,
    metadata=metadata
)

# Query — text is embedded automatically
hits = client.query(
    collection_name="docs",
    query_text="search query here",
    limit=5
)

for hit in hits:
    print(hit.document, hit.score)

Payload Index for Fast Filtering

# Create indexes on frequently filtered fields
client.create_payload_index("articles", "source",   "keyword")
client.create_payload_index("articles", "year",     "integer")
client.create_payload_index("articles", "published","bool")
client.create_payload_index("articles", "score",    "float")
# Full-text index
client.create_payload_index("articles", "content",  "text")

Tips and Best Practices

| Tip | Details |
| --- | --- |
| Index payload fields | Always create payload indexes on filtered fields; unindexed filters scan all vectors |
| Use scalar quantization | INT8 quantization cuts vector memory about 4x (float32 to int8), typically with only a small recall loss (measure on your own data); enable always_ram=True |
| Set score_threshold | Filter out low-quality results at the engine level rather than post-processing |
| Prefer gRPC for production | gRPC (port 6334) has lower overhead than REST for high-QPS workloads |
| Upload in batches of 256 | upload_collection with batch_size=256 and parallel=4 is a reasonable starting point; tune both for your payload size and hardware |
| Use upload_collection for large datasets | It streams data and handles retries automatically |
| Named vectors for multi-modal | Store image and text embeddings as separate named vectors on the same point |
| Monitor via Web UI | Access http://localhost:6333/dashboard for real-time collection stats |
| Choose DOT for normalized vectors | DOT and COSINE are equivalent for L2-normalized embeddings, so DOT avoids redundant normalization; EUCLID produces the same ranking |
| Tune hnsw_config.m carefully | Higher m improves recall but increases index size and build time |
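
The statement that DOT and COSINE are equivalent for L2-normalized embeddings can be checked numerically. The sketch below uses only the standard library; the vectors and helper names are made up for illustration. It also checks the identity d² = 2 − 2·dot, which is why Euclidean distance orders unit vectors the same way.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = l2_normalize([3.0, 4.0])
docs = {
    "a": l2_normalize([1.0, 0.0]),
    "b": l2_normalize([1.0, 1.0]),
    "c": l2_normalize([0.0, 2.0]),
}

# Similarities sort descending, distances sort ascending
by_dot = sorted(docs, key=lambda name: dot(query, docs[name]), reverse=True)
by_cosine = sorted(docs, key=lambda name: cosine(query, docs[name]), reverse=True)
by_euclid = sorted(docs, key=lambda name: euclid(query, docs[name]))

print(by_dot, by_cosine, by_euclid)
```

All three orderings agree for unit-length vectors, so the choice between these metrics affects compute cost and score interpretation rather than which neighbors are returned.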