Qdrant Cheat Sheet
Overview
Qdrant is a production-grade vector search engine written in Rust, purpose-built for high-throughput similarity search at scale. It stores points — records consisting of a vector, an optional payload (JSON metadata), and a unique ID — organized into collections. Qdrant’s Rust core delivers low latency and high throughput while its Python, JavaScript, Go, and Rust clients provide convenient access patterns for AI/ML workloads.
Key differentiators include rich payload filtering that executes alongside vector search (not as a post-filter), sparse vector support for BM25/SPLADE hybrid retrieval, named vectors per point for multi-modal storage, scalar and product quantization for memory reduction, and built-in multi-tenancy via payload-based tenant isolation. Qdrant supports both REST and gRPC APIs, making it suitable for latency-sensitive production deployments.
Qdrant can be deployed as a Docker container, a managed cloud service (Qdrant Cloud), or a distributed cluster. Its Web UI (port 6333) provides collection browsing, search testing, and cluster monitoring out of the box.
Installation
Docker (Recommended)
# Single node — development
docker pull qdrant/qdrant
docker run -d \
-p 6333:6333 \
-p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
--name qdrant \
qdrant/qdrant
# With custom config
docker run -d \
-p 6333:6333 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
-v $(pwd)/config.yaml:/qdrant/config/production.yaml \
qdrant/qdrant \
./qdrant --config-path /qdrant/config/production.yaml
# Docker Compose — with Web UI accessible at http://localhost:6333/dashboard
cat > docker-compose.yml << 'EOF'
version: "3.9"
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333" # REST + Web UI
      - "6334:6334" # gRPC
    volumes:
      - qdrant_data:/qdrant/storage
    environment:
      - QDRANT__SERVICE__API_KEY=your-api-key
volumes:
  qdrant_data:
EOF
docker compose up -d
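The QDRANT__SERVICE__API_KEY variable above follows Qdrant's environment-override convention: a QDRANT prefix, with double underscores separating the nesting levels of the YAML config (here service.api_key). The helper below is a small sketch of that mapping; the function name env_to_config_path is hypothetical and is not part of Qdrant or its client.

```python
def env_to_config_path(name: str) -> str:
    """Translate a QDRANT__SECTION__KEY variable name into a dotted config path.

    Illustrative helper only; Qdrant performs this mapping internally.
    """
    prefix = "QDRANT__"
    if not name.startswith(prefix):
        raise ValueError(f"not a Qdrant override: {name!r}")
    # Double underscores separate nesting levels; keys are lower-case in YAML.
    parts = name[len(prefix):].split("__")
    return ".".join(part.lower() for part in parts)


print(env_to_config_path("QDRANT__SERVICE__API_KEY"))  # service.api_key
```

Reading a variable name this way shows which key in config.yaml it overrides, which helps when the same setting could come from either place.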
Python Client
pip install qdrant-client
pip install "qdrant-client[fastembed]" # Include FastEmbed local embeddings (quotes keep zsh from expanding the brackets)
Binary Installation (Linux)
# Download latest release
curl -L https://github.com/qdrant/qdrant/releases/latest/download/qdrant-x86_64-unknown-linux-musl.tar.gz \
| tar xz
./qdrant # starts on port 6333
Configuration
Client Setup
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
# In-memory (testing only)
client = QdrantClient(":memory:")
# Local persistent storage (no server required)
client = QdrantClient(path="./qdrant_local")
# Remote server
client = QdrantClient(url="http://localhost:6333")
# Remote with API key
client = QdrantClient(
url="https://xyz.qdrant.io:6333",
api_key="your-api-key"
)
# gRPC (lower latency for high throughput)
client = QdrantClient(
host="localhost",
grpc_port=6334,
prefer_grpc=True
)
# Async client
from qdrant_client import AsyncQdrantClient
async_client = AsyncQdrantClient(url="http://localhost:6333")
Collection Configuration
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff,
    OptimizersConfigDiff, ScalarQuantization,
    ScalarQuantizationConfig, ScalarType
)
client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(
        size=1536, # Embedding dimension
        distance=Distance.COSINE # COSINE | EUCLID | DOT | MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16, # Graph connectivity (higher = better recall, more RAM)
        ef_construct=100, # Build-time search width (higher = better index, slower build)
        full_scan_threshold=10_000 # Size in KB below which a full scan is preferred over HNSW
    ),
    optimizers_config=OptimizersConfigDiff(
        indexing_threshold=20_000, # Segment size in KB above which the vector index is built
        memmap_threshold=50_000 # Segment size in KB above which vectors are memory-mapped
    ),
    # The scalar settings must be wrapped in ScalarQuantization(scalar=...)
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,
            quantile=0.99,
            always_ram=True # Keep quantized vectors in RAM for speed
        )
    ),
    on_disk_payload=True # Store payload on disk (useful for large metadata)
)
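To see what INT8 scalar quantization buys for a collection like the one configured above, the arithmetic can be sketched directly. This counts raw vector storage only (4 bytes per float32 component, 1 byte per int8 component); the function name and the overhead parameter are assumptions for illustration, and real usage also includes the HNSW graph, payload and allocator overhead.

```python
def estimate_vector_bytes(num_vectors: int, dim: int,
                          bytes_per_component: int = 4,
                          overhead: float = 1.0) -> int:
    """Rough size of the raw vector data.

    overhead is a hypothetical multiplier for index and bookkeeping costs;
    the default of 1.0 counts only the vector components themselves.
    """
    return int(num_vectors * dim * bytes_per_component * overhead)


full = estimate_vector_bytes(1_000_000, 1536)                          # float32
quant = estimate_vector_bytes(1_000_000, 1536, bytes_per_component=1)  # int8

print(f"float32: {full / 1e9:.1f} GB")    # float32: 6.1 GB
print(f"int8:    {quant / 1e9:.1f} GB")   # int8:    1.5 GB
print(f"ratio:   {full / quant:.1f}x")    # ratio:   4.0x
```

With always_ram=True the quantized copy is what stays resident in RAM, so the int8 figure is the more relevant one for sizing memory in that setup.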
Core Commands/API
| Method | Description |
|---|---|
| client.create_collection(name, vectors_config) | Create a new collection |
| client.delete_collection(name) | Delete a collection |
| client.get_collection(name) | Get collection info and stats |
| client.get_collections() | List all collections |
| client.collection_exists(name) | Check if collection exists |
| client.recreate_collection(name, ...) | Drop and recreate collection (deprecated in newer clients; use collection_exists + create_collection) |
| client.upsert(collection_name, points) | Insert or update points |
| client.upload_points(collection_name, points) | Batch upload with retries |
| client.upload_collection(collection_name, vectors, payload, ids) | Stream upload large datasets |
| client.retrieve(collection_name, ids) | Fetch points by ID |
| client.search(collection_name, query_vector, limit) | Vector similarity search (superseded by query_points in newer clients) |
| client.query_points(collection_name, query, limit) | Unified query API (v1.10+) |
| client.scroll(collection_name, limit) | Iterate all points in collection |
| client.delete(collection_name, points_selector) | Delete points by ID or filter |
| client.update_payload(collection_name, payload, points_selector) | Update point payload |
| client.delete_payload(collection_name, keys, points_selector) | Remove payload fields |
| client.count(collection_name, count_filter) | Count matching points |
| client.recommend(collection_name, positive, negative) | Recommendation by examples |
| client.discover(collection_name, target, context) | Contextual discovery search |
| client.create_payload_index(collection_name, field_name, field_schema) | Index payload field for fast filtering |
| client.create_snapshot(collection_name) | Create collection snapshot |
| client.update_collection(name, optimizers_config) | Update collection settings |
Advanced Usage
Upserting Points with Payload
from qdrant_client.models import PointStruct
import uuid
# Upsert individual points
client.upsert(
collection_name="articles",
points=[
PointStruct(
id=str(uuid.uuid4()),
vector=[0.1, 0.2, ...], # 1536-dim embedding
payload={
"title": "Introduction to RAG",
"source": "blog",
"year": 2024,
"tags": ["rag", "llm"]
}
)
]
)
# Batch upload with numpy arrays (memory efficient)
import numpy as np
vectors = np.random.rand(10_000, 1536).astype(np.float32)
payloads = [{"doc_id": i, "category": "tech"} for i in range(10_000)]
ids = list(range(10_000))
client.upload_collection(
collection_name="articles",
vectors=vectors,
payload=payloads,
ids=ids,
batch_size=256,
parallel=4 # Number of parallel upload workers
)
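Because upsert overwrites a point that has the same ID, the choice of ID decides whether re-running an ingest job updates points or duplicates them. A random uuid4 produces a new ID on every run; a deterministic UUID derived from a stable document key does not. The sketch below uses only the standard library; NAMESPACE, point_id_for and batched are hypothetical names, and the namespace URL is a placeholder.

```python
import uuid
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Hypothetical application namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/articles")


def point_id_for(doc_key: str) -> str:
    """Derive a stable UUID string from a document key (same key, same ID)."""
    return str(uuid.uuid5(NAMESPACE, doc_key))


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


keys = [f"doc-{i}" for i in range(600)]
ids = [point_id_for(k) for k in keys]
print([len(b) for b in batched(ids, 256)])  # [256, 256, 88]
```

Each batch could then be passed to client.upsert as a list of PointStruct objects, so a second run over the same documents overwrites the existing points.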
Filtering and Searching
from qdrant_client.models import Filter, FieldCondition, MatchValue, Range, SearchRequest
# Search with payload filter
results = client.search(
collection_name="articles",
query_vector=[0.1, 0.2, ...],
query_filter=Filter(
must=[
FieldCondition(key="source", match=MatchValue(value="blog")),
FieldCondition(key="year", range=Range(gte=2023, lte=2025))
],
must_not=[
FieldCondition(key="tags", match=MatchValue(value="deprecated"))
]
),
limit=10,
with_payload=True,
with_vectors=False,
score_threshold=0.7 # Minimum similarity score
)
for hit in results:
    print(f"[{hit.score:.4f}] {hit.id}: {hit.payload['title']}")
# Batch search (multiple queries in one request)
search_requests = [
SearchRequest(vector=[0.1, 0.2, ...], limit=5),
SearchRequest(vector=[0.3, 0.4, ...], limit=5,
filter=Filter(must=[FieldCondition(key="source", match=MatchValue(value="docs"))]))
]
batch_results = client.search_batch(
collection_name="articles",
requests=search_requests
)
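The must / must_not clauses in the search example combine as "all of these hold" and "none of these hold". The plain-Python model below mirrors that example's filter to make the semantics concrete. It is an illustration only, not how Qdrant evaluates filters internally, and the helper names are hypothetical. It assumes a match on a list-valued field succeeds when any element equals the value, and a range condition fails when the field is missing.

```python
from typing import Any, Dict


def field_matches(payload: Dict[str, Any], key: str, value: Any) -> bool:
    """True if payload[key] equals value, or contains it when the field is a list."""
    stored = payload.get(key)
    if isinstance(stored, list):
        return value in stored
    return stored == value


def in_range(payload: Dict[str, Any], key: str, gte: float, lte: float) -> bool:
    """True if payload[key] is a number within [gte, lte]; missing fields fail."""
    stored = payload.get(key)
    return isinstance(stored, (int, float)) and gte <= stored <= lte


def passes(payload: Dict[str, Any]) -> bool:
    """Model of: must=[source == blog, 2023 <= year <= 2025], must_not=[tags has deprecated]."""
    must = field_matches(payload, "source", "blog") and in_range(payload, "year", 2023, 2025)
    must_not = field_matches(payload, "tags", "deprecated")
    return must and not must_not


print(passes({"source": "blog", "year": 2024, "tags": ["rag", "llm"]}))  # True
print(passes({"source": "blog", "year": 2024, "tags": ["deprecated"]}))  # False
print(passes({"source": "docs", "year": 2024}))                          # False
```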
Sparse Vectors (Hybrid Search)
from qdrant_client.models import (
SparseVectorParams, SparseIndexParams,
SparseVector, NamedSparseVector
)
# Create collection with both dense and sparse vectors
client.create_collection(
collection_name="hybrid",
vectors_config={
"dense": VectorParams(size=384, distance=Distance.COSINE)
},
sparse_vectors_config={
"sparse": SparseVectorParams(
index=SparseIndexParams(on_disk=False)
)
}
)
# Upsert point with both vector types
from qdrant_client.models import PointStruct
client.upsert(
collection_name="hybrid",
points=[
PointStruct(
id=1,
vector={
"dense": [0.1, 0.2, ...], # 384-dim dense
"sparse": SparseVector(
indices=[10, 42, 137], # Non-zero token indices
values=[0.8, 0.3, 0.5] # SPLADE/BM25 weights
)
},
payload={"text": "example document"}
)
]
)
# Hybrid search — query both, re-rank with RRF
from qdrant_client.models import Prefetch, FusionQuery, Fusion
results = client.query_points(
collection_name="hybrid",
prefetch=[
Prefetch(query=[0.1, 0.2, ...], using="dense", limit=20),
Prefetch(query=SparseVector(indices=[10, 42], values=[0.8, 0.3]),
using="sparse", limit=20)
],
query=FusionQuery(fusion=Fusion.RRF),
limit=5
)
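Fusion.RRF merges the dense and sparse prefetch lists by rank position rather than by raw score, which is why two result lists on incompatible score scales can be combined. The sketch below implements the textbook reciprocal rank fusion formula, score(d) = sum over lists of 1 / (k + rank). The constant k=60 and the 1-based ranks are assumptions taken from the common formulation; the values used inside Qdrant may differ.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def rrf(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Tuple[Hashable, float]]:
    """Fuse ranked ID lists with reciprocal rank fusion, best first."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, point_id in enumerate(ranking, start=1):
            # Each list contributes more for IDs it ranks higher.
            scores[point_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


dense_hits = [1, 2, 3]    # point IDs from the dense prefetch, best first
sparse_hits = [3, 1, 4]   # point IDs from the sparse prefetch, best first

fused = rrf([dense_hits, sparse_hits])
print([pid for pid, _ in fused])  # [1, 3, 2, 4]
```

Points 1 and 3 appear in both lists and therefore outrank points 2 and 4, which each appear in only one.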
Scroll and Multi-Tenancy
from qdrant_client.models import ScrollRequest, Filter, FieldCondition, MatchValue
# Scroll through all points (pagination)
offset = None
all_points = []
while True:
    results, offset = client.scroll(
        collection_name="articles",
        scroll_filter=Filter(
            must=[FieldCondition(key="category", match=MatchValue(value="tech"))]
        ),
        limit=100,
        offset=offset,
        with_payload=True
    )
    all_points.extend(results)
    if offset is None:
        break
# Multi-tenancy: isolate tenants by payload field
# Index the tenant field for performance
client.create_payload_index(
collection_name="articles",
field_name="tenant_id",
field_schema="keyword"
)
# Tenant-scoped search
def search_for_tenant(tenant_id: str, query_vector, limit: int = 10):
    return client.search(
        collection_name="articles",
        query_vector=query_vector,
        query_filter=Filter(
            must=[FieldCondition(key="tenant_id", match=MatchValue(value=tenant_id))]
        ),
        limit=limit
    )
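The scroll loop shown earlier in this section collects every point into a list, which can be costly for large collections. One alternative is to wrap the same offset protocol in a generator that yields points page by page. The sketch below is written against a callable rather than the client itself so it runs without a server; iter_points and fake_scroll are hypothetical names, and fake_scroll only imitates the (points, next_offset) return shape of client.scroll.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

# A page fetcher takes the current offset and returns (points, next_offset).
FetchPage = Callable[[Optional[Any]], Tuple[List[Any], Optional[Any]]]


def iter_points(fetch_page: FetchPage) -> Iterator[Any]:
    """Yield points lazily until the fetcher reports no further offset."""
    offset = None
    while True:
        points, offset = fetch_page(offset)
        yield from points
        if offset is None:
            return


# Stand-in for client.scroll so the sketch is runnable on its own.
DATA = list(range(250))


def fake_scroll(offset: Optional[int], limit: int = 100) -> Tuple[List[int], Optional[int]]:
    start = offset or 0
    page = DATA[start:start + limit]
    next_offset = start + limit if start + limit < len(DATA) else None
    return page, next_offset


print(sum(1 for _ in iter_points(fake_scroll)))  # 250

# With a real client, the fetcher could be:
# lambda off: client.scroll(collection_name="articles", limit=100, offset=off, with_payload=True)
```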
Snapshots and Backup
# Create snapshot via REST API
curl -X POST http://localhost:6333/collections/articles/snapshots
# List snapshots
curl http://localhost:6333/collections/articles/snapshots
# Download snapshot
curl -O http://localhost:6333/collections/articles/snapshots/articles-123.snapshot
# Restore from snapshot
curl -X POST "http://localhost:6333/collections/articles/snapshots/upload?priority=snapshot" \
-H "Content-Type: multipart/form-data" \
-F "snapshot=@articles-123.snapshot"
Common Workflows
Full RAG Pipeline with FastEmbed
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams
# FastEmbed runs locally — no API key needed
client = QdrantClient(url="http://localhost:6333")
client.set_model("BAAI/bge-small-en-v1.5") # 384-dim, fast
client.create_collection(
collection_name="docs",
vectors_config=client.get_fastembed_vector_params(on_disk=False)
)
# Ingest documents (FastEmbed handles embedding)
documents = ["Document one text", "Document two text", "Document three text"]
metadata = [{"source": "wiki"}, {"source": "pdf"}, {"source": "web"}]
client.add(
collection_name="docs",
documents=documents,
metadata=metadata
)
# Query — text is embedded automatically
hits = client.query(
collection_name="docs",
query_text="search query here",
limit=5
)
for hit in hits:
    print(hit.document, hit.score)
Payload Index for Fast Filtering
# Create indexes on frequently filtered fields
client.create_payload_index("articles", "source", "keyword")
client.create_payload_index("articles", "year", "integer")
client.create_payload_index("articles", "published", "bool")
client.create_payload_index("articles", "score", "float")
# Full-text index
client.create_payload_index("articles", "content", "text")
Tips and Best Practices
| Tip | Details |
|---|---|
| Index payload fields | Create payload indexes on the fields you filter by; without an index, a filter has to check each candidate point's payload |
| Use scalar quantization | INT8 quantization stores 1 byte per dimension instead of 4 (about 4x less vector memory), typically with <5% recall loss; enable always_ram=True |
| Set score_threshold | Filter out low-quality results at the engine level rather than post-processing |
| Prefer gRPC for production | gRPC (port 6334) has lower overhead than REST for high-QPS workloads |
| Upload in batches of 256 | upload_collection with batch_size=256 and parallel=4 is a good starting point; tune both for your payload size and hardware |
| Use upload_collection for large datasets | It streams data and handles retries automatically |
| Named vectors for multi-modal | Store image and text embeddings as separate named vectors on the same point |
| Monitor via Web UI | Access http://localhost:6333/dashboard for real-time collection stats |
| Choose DOT for normalized vectors | DOT and COSINE rank results identically for L2-normalized embeddings; DOT avoids the normalization step that COSINE applies |
| Tune m in hnsw_config carefully | Higher m improves recall but increases index size and build time |
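The tip about normalized vectors rests on a short identity: for unit-length vectors, cosine similarity equals the dot product, and squared Euclidean distance equals 2 - 2 * dot, so all three metrics order neighbours the same way. The check below uses only the standard library; the helper names are illustrative and unrelated to the Qdrant API.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def squared_euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])
b = l2_normalize([1.0, 2.0])

# For unit vectors: cosine == dot, and ||a - b||^2 == 2 - 2 * dot.
print(math.isclose(dot(a, b), cosine(a, b), abs_tol=1e-12))                  # True
print(math.isclose(squared_euclid(a, b), 2 - 2 * dot(a, b), abs_tol=1e-12))  # True
```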