Overview
Kotaemon is an open-source, customizable RAG-based document Q&A application built by Cinnamon AI. It provides a clean web interface for uploading documents, asking questions, and receiving answers with inline citations and source highlights. Kotaemon supports multiple LLM providers (OpenAI, Anthropic, Azure, Ollama), various embedding models, and both local and cloud-based vector stores.
The application is designed for both end users who want a simple document chat interface and developers who want to customize the RAG pipeline. It supports PDF, DOCX, TXT, HTML, and Markdown files, offers multi-user support with file management, and exposes several retrieval strategies, including hybrid search and reranking.
Installation
pip Install
pip install "kotaemon[all]"
# Start the application
kotaemon
# Opens at http://localhost:7860
Docker
docker run -d \
  --name kotaemon \
  -p 7860:7860 \
  -v kotaemon_data:/app/ktem_app_data \
  -e OPENAI_API_KEY=sk-... \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  ghcr.io/cinnamon/kotaemon:latest
Docker Compose
version: '3.8'
services:
  kotaemon:
    image: ghcr.io/cinnamon/kotaemon:latest
    ports:
      - "7860:7860"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - GRADIO_SERVER_NAME=0.0.0.0
    volumes:
      - kotaemon_data:/app/ktem_app_data
volumes:
  kotaemon_data:
From Source
git clone https://github.com/Cinnamon/kotaemon.git
cd kotaemon
# Install dependencies
pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"
# Start application
python app.py
Configuration
Environment Variables
# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your.openai.azure.com
# Local Models
OLLAMA_BASE_URL=http://localhost:11434
# Application Settings
GRADIO_SERVER_NAME=0.0.0.0
GRADIO_SERVER_PORT=7860
KH_APP_DATA_DIR=./ktem_app_data
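Before starting the app, it can help to confirm which providers the current environment actually configures. The following is a minimal sketch, not part of Kotaemon: the variable names come from the list above, while the `PROVIDER_VARS` grouping and the `configured_providers` helper are hypothetical.

```python
import os

# Environment variables from the list above, grouped by provider.
# The grouping itself is an assumption made for this sketch.
PROVIDER_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "ollama": ["OLLAMA_BASE_URL"],
}


def configured_providers(env=None):
    """Return providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, required in PROVIDER_VARS.items()
        if all(env.get(var, "").strip() for var in required)
    ]


if __name__ == "__main__":
    print("Configured providers:", configured_providers())
```

An empty list suggests that no provider is fully configured, which is one likely cause of the "Model not responding" symptom.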
flowsettings.toml
# ktem_app_data/flowsettings.toml
[KH_LLMS]
[KH_LLMS.openai_gpt4o]
spec = "kotaemon.llms.ChatOpenAI"
model = "gpt-4o"
temperature = 0.1
max_tokens = 2048
[KH_LLMS.ollama_llama]
spec = "kotaemon.llms.ChatOllama"
model = "llama3.1"
base_url = "http://localhost:11434"
[KH_LLMS.azure_gpt4]
spec = "kotaemon.llms.AzureChatOpenAI"
azure_deployment = "gpt-4o"
azure_endpoint = "https://your.openai.azure.com"
[KH_EMBEDDINGS]
[KH_EMBEDDINGS.openai]
spec = "kotaemon.embeddings.OpenAIEmbeddings"
model = "text-embedding-3-small"
[KH_EMBEDDINGS.local]
spec = "kotaemon.embeddings.FastEmbedEmbeddings"
model_name = "BAAI/bge-small-en-v1.5"
[KH_VECTORSTORE]
[KH_VECTORSTORE.default]
spec = "kotaemon.storages.ChromaVectorStore"
path = "./ktem_app_data/chroma"
[KH_DOCSTORE]
[KH_DOCSTORE.default]
spec = "kotaemon.storages.SimpleFileDocumentStore"
path = "./ktem_app_data/docstore"
Core Features
Document Management
| Feature | Description |
|---|---|
| Upload | PDF, DOCX, TXT, HTML, MD files |
| Multi-file | Upload multiple documents at once |
| Collections | Organize documents into groups |
| Indexing | Automatic chunking and embedding |
| Delete | Remove documents and their indexes |
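To make the "automatic chunking" step concrete, here is a minimal sliding-window chunker. It is a sketch under stated assumptions: it counts characters, whereas the unit Kotaemon's splitter uses (characters or tokens) is not specified in this document, and the `chunk_text` helper is hypothetical. Its defaults mirror the Chunk Size and Chunk Overlap defaults listed under Retrieval Settings.

```python
def chunk_text(text, chunk_size=1024, overlap=256):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap keeps more context across chunk boundaries, but produces more chunks to embed and store.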
Retrieval Settings
| Setting | Options | Default |
|---|---|---|
| Retrieval Method | Vector, Keyword, Hybrid | Hybrid |
| Top K | 1-20 | 5 |
| Chunk Size | 256-2048 | 1024 |
| Chunk Overlap | 0-512 | 256 |
| Reranking | Cohere, Cross-encoder, None | None |
| Citation Mode | Inline, Footnote | Inline |
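Hybrid retrieval has to merge a vector ranking and a keyword ranking into one list. This document does not say which fusion method Kotaemon uses; the sketch below shows reciprocal rank fusion (RRF), one common choice, purely as an illustration. The function name, the constant `k=60`, and the document IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Fuse ranked lists of document IDs; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_k]


vector_hits = ["doc3", "doc1", "doc7"]   # hypothetical vector-search order
keyword_hits = ["doc1", "doc9", "doc3"]  # hypothetical keyword-search order
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7']
```

Documents that appear in both rankings (`doc1`, `doc3`) rise above documents found by only one method, and `top_k` plays the role of the Top K setting in the table.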
Chat Interface Features
- Streaming responses
- Inline source citations with page numbers
- Source document preview with highlights
- Conversation history
- Regenerate answers
- Copy response to clipboard
- Switch between LLM models mid-conversation
- Upload documents directly in chat
Customization
Custom RAG Pipeline
from kotaemon.base import BaseComponent
from kotaemon.llms import ChatOpenAI
from kotaemon.embeddings import OpenAIEmbeddings


class CustomRAGPipeline(BaseComponent):
    llm: ChatOpenAI = ChatOpenAI(model="gpt-4o")
    embeddings: OpenAIEmbeddings = OpenAIEmbeddings()

    def retrieve(self, query: str, documents: list) -> list:
        # Placeholder: supply your own retrieval logic here. It should return
        # the chunks (objects with a .text attribute) relevant to the query.
        raise NotImplementedError

    def run(self, query: str, documents: list) -> str:
        # Custom retrieval logic
        chunks = self.retrieve(query, documents)
        # Custom generation: join the retrieved chunk texts into one context
        context = "\n".join(c.text for c in chunks)
        prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
        response = self.llm(prompt)
        return response
Adding Custom Document Loaders
import pdfplumber
from kotaemon.loaders import BaseLoader


class CustomPDFLoader(BaseLoader):
    def load(self, file_path: str):
        # Custom PDF parsing logic: one entry per page that contains text
        pages = []
        with pdfplumber.open(file_path) as pdf:
            for page in pdf.pages:
                text = page.extract_text()
                if text:
                    pages.append({
                        "text": text,
                        "metadata": {
                            "page": page.page_number,
                            "source": file_path,
                        },
                    })
        return pages
Custom Embedding Model
from functools import lru_cache

from kotaemon.embeddings import BaseEmbeddings
from sentence_transformers import SentenceTransformer


@lru_cache(maxsize=1)
def _load_model() -> SentenceTransformer:
    # Load the model once and reuse it; reloading on every call is slow.
    return SentenceTransformer("BAAI/bge-small-en-v1.5")


class CustomEmbeddings(BaseEmbeddings):
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Your embedding logic
        embeddings = _load_model().encode(texts)
        return embeddings.tolist()

    def embed_query(self, text: str) -> list[float]:
        return self.embed_documents([text])[0]
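The vectors returned by `embed_query` and `embed_documents` are typically compared with cosine similarity to decide which chunks are closest to a question. The sketch below uses small hand-written vectors so it runs without any model; `cosine_similarity` and `rank_by_cosine` are hypothetical helpers, not Kotaemon functions, and a real vector store would do this comparison internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    sims = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)


query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
print(rank_by_cosine(query, docs))  # [1, 2, 0]
```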
Advanced Usage
Multi-User Setup
# flowsettings.toml
[KH_AUTH]
enabled = true
admin_username = "admin"
admin_password = "changeme"
[KH_USERS]
[KH_USERS.user1]
password = "user1pass"
role = "user"
[KH_USERS.user2]
password = "user2pass"
role = "admin"
External Vector Store
# Use Qdrant
[KH_VECTORSTORE.qdrant]
spec = "kotaemon.storages.QdrantVectorStore"
url = "http://localhost:6333"
collection_name = "kotaemon_docs"
# Use Milvus
[KH_VECTORSTORE.milvus]
spec = "kotaemon.storages.MilvusVectorStore"
uri = "http://localhost:19530"
collection_name = "kotaemon_docs"
Reranking Configuration
[KH_RERANKERS]
[KH_RERANKERS.cohere]
spec = "kotaemon.rerankers.CohereReranker"
model = "rerank-english-v3.0"
top_n = 5
[KH_RERANKERS.cross_encoder]
spec = "kotaemon.rerankers.CrossEncoderReranker"
model_name = "cross-encoder/ms-marco-MiniLM-L-12-v2"
top_n = 5
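Both entries above set `top_n`, the number of passages kept after reranking. The following sketch shows that step in isolation. It is an illustration only: `rerank` is a hypothetical helper, and `overlap_score` is a toy word-overlap scorer standing in for a real reranker model such as Cohere or a cross-encoder.

```python
def rerank(query, passages, score_fn, top_n=5):
    """Score every (query, passage) pair and keep the top_n highest-scoring passages."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    # list.sort is stable, so passages with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_n]]


def overlap_score(query, passage):
    """Toy stand-in for a reranker: fraction of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / max(len(query_words), 1)


candidates = [
    "how to bake bread",
    "vector store setup guide",
    "choosing a vector database",
]
print(rerank("vector store setup", candidates, overlap_score, top_n=2))
# ['vector store setup guide', 'choosing a vector database']
```

In a typical pipeline the retriever returns more candidates than needed, and the reranker trims them to `top_n` before they reach the LLM.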
GraphRAG Integration
# Enable GraphRAG mode for multi-hop reasoning
[KH_REASONING]
enable_graph_rag = true
graph_store = "neo4j"
neo4j_url = "bolt://localhost:7687"
neo4j_user = "neo4j"
neo4j_password = "password"
Troubleshooting
| Issue | Solution |
|---|---|
| Port 7860 in use | Set GRADIO_SERVER_PORT=7861 |
| Model not responding | Verify API keys in environment variables |
| Document upload fails | Check file format support, increase max upload size |
| Slow indexing | Use local embeddings model, reduce chunk size |
| Citations missing | Enable citation mode in retrieval settings |
| Memory errors | Reduce chunk overlap, process fewer documents at once |
| Ollama not connecting | Verify OLLAMA_BASE_URL, check Ollama is running |
| Empty responses | Check vector store has indexed documents |
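For the "Port 7860 in use" row, a short script can confirm whether the port is taken and suggest a free one. This is a minimal sketch using only the standard library; `port_is_free` and `first_free_port` are hypothetical helpers, and the check only detects listeners reachable on the loopback address.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when a connection succeeds, i.e. the port is in use.
        return sock.connect_ex((host, port)) != 0


def first_free_port(start=7860, attempts=10):
    """Return the first free port in [start, start + attempts), or None."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    return None


if __name__ == "__main__":
    print("Suggested GRADIO_SERVER_PORT:", first_free_port())
```

The suggested value can then be exported as `GRADIO_SERVER_PORT` before starting the application.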
# View logs
python app.py 2>&1 | tee kotaemon.log
# Reset application data (deletes the vector store, docstore, and flowsettings.toml under ktem_app_data/)
rm -rf ktem_app_data/
python app.py
# Check health
curl http://localhost:7860/api/health
# Verify configuration (requires the third-party toml package: pip install toml)
python -c "
import toml
config = toml.load('ktem_app_data/flowsettings.toml')
print(config.keys())
"