Kotaemon Cheat Sheet

Overview

Kotaemon is an open-source, customizable document Q&A application built by Cinnamon AI on retrieval-augmented generation (RAG). It provides a clean web interface for uploading documents, asking questions, and receiving answers with inline citations and source highlights. Kotaemon supports multiple LLM providers (OpenAI, Anthropic, Azure, Ollama), various embedding models, and both local and cloud-based vector stores.

The application is designed for both end users who want a simple document chat interface and developers who want to customize the RAG pipeline. It supports PDF, DOCX, TXT, HTML, and Markdown files, includes multi-user support with file management, and offers several retrieval strategies, including hybrid search and reranking.

Installation

pip Install

pip install "kotaemon[all]"

# Start the application
kotaemon
# Opens at http://localhost:7860

Docker

docker run -d \
  --name kotaemon \
  -p 7860:7860 \
  -v kotaemon_data:/app/ktem_app_data \
  -e OPENAI_API_KEY=sk-... \
  -e GRADIO_SERVER_NAME=0.0.0.0 \
  ghcr.io/cinnamon/kotaemon:latest

Docker Compose

version: '3.8'
services:
  kotaemon:
    image: ghcr.io/cinnamon/kotaemon:latest
    ports:
      - "7860:7860"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - GRADIO_SERVER_NAME=0.0.0.0
    volumes:
      - kotaemon_data:/app/ktem_app_data

volumes:
  kotaemon_data:

From Source

git clone https://github.com/Cinnamon/kotaemon.git
cd kotaemon

# Install dependencies
pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"

# Start application
python app.py

Configuration

Environment Variables

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your.openai.azure.com

# Local Models
OLLAMA_BASE_URL=http://localhost:11434

# Application Settings
GRADIO_SERVER_NAME=0.0.0.0
GRADIO_SERVER_PORT=7860
KH_APP_DATA_DIR=./ktem_app_data
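
The variables above are read from the process environment. As a rough sketch, a launcher script could resolve them like this, falling back to the values shown in this section. The helper name `load_app_settings` is hypothetical and not part of Kotaemon; the application's own startup code may behave differently.

```python
import os


def load_app_settings(env=None):
    """Resolve the settings listed above, falling back to the values shown in this section.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    return {
        "server_name": env.get("GRADIO_SERVER_NAME", "0.0.0.0"),
        "server_port": int(env.get("GRADIO_SERVER_PORT", "7860")),
        "data_dir": env.get("KH_APP_DATA_DIR", "./ktem_app_data"),
        "ollama_base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        # Report only whether each key is set; never print secret values.
        "has_openai_key": bool(env.get("OPENAI_API_KEY")),
        "has_anthropic_key": bool(env.get("ANTHROPIC_API_KEY")),
    }


if __name__ == "__main__":
    for name, value in load_app_settings().items():
        print(f"{name}: {value}")
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.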

flowsettings.toml

# ktem_app_data/flowsettings.toml

[KH_LLMS]
  [KH_LLMS.openai_gpt4o]
  spec = "kotaemon.llms.ChatOpenAI"
  model = "gpt-4o"
  temperature = 0.1
  max_tokens = 2048

  [KH_LLMS.ollama_llama]
  spec = "kotaemon.llms.ChatOllama"
  model = "llama3.1"
  base_url = "http://localhost:11434"

  [KH_LLMS.azure_gpt4]
  spec = "kotaemon.llms.AzureChatOpenAI"
  azure_deployment = "gpt-4o"
  azure_endpoint = "https://your.openai.azure.com"

[KH_EMBEDDINGS]
  [KH_EMBEDDINGS.openai]
  spec = "kotaemon.embeddings.OpenAIEmbeddings"
  model = "text-embedding-3-small"

  [KH_EMBEDDINGS.local]
  spec = "kotaemon.embeddings.FastEmbedEmbeddings"
  model_name = "BAAI/bge-small-en-v1.5"

[KH_VECTORSTORE]
  [KH_VECTORSTORE.default]
  spec = "kotaemon.storages.ChromaVectorStore"
  path = "./ktem_app_data/chroma"

[KH_DOCSTORE]
  [KH_DOCSTORE.default]
  spec = "kotaemon.storages.SimpleFileDocumentStore"
  path = "./ktem_app_data/docstore"

Core Features

Document Management

Feature	Description
Upload	PDF, DOCX, TXT, HTML, MD files
Multi-file	Upload multiple documents at once
Collections	Organize documents into groups
Indexing	Automatic chunking and embedding
Delete	Remove documents and their indexes
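
Indexing splits each document into overlapping chunks before embedding them. The sketch below illustrates the idea with a sliding character window, using the Chunk Size and Chunk Overlap defaults from Retrieval Settings. Measuring chunks in characters is an assumption made for simplicity; Kotaemon's splitter may count tokens or respect sentence boundaries.

```python
def chunk_text(text, chunk_size=1024, chunk_overlap=256):
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

With the defaults of 1024 and 256, the window advances 768 characters at a time, so a 2,500-character document yields three chunks starting at offsets 0, 768, and 1536.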

Retrieval Settings

Setting	Options	Default
Retrieval Method	Vector, Keyword, Hybrid	Hybrid
Top K	1-20	5
Chunk Size	256-2048	1024
Chunk Overlap	0-512	256
Reranking	Cohere, Cross-encoder, None	None
Citation Mode	Inline, Footnote	Inline
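
Hybrid retrieval combines the vector and keyword result lists into one ranking. This document does not say which fusion method Kotaemon uses; the sketch below shows reciprocal rank fusion (RRF), one common way to merge ranked lists, purely as an illustration. The function name and document IDs are made up for the example.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the constant commonly used with RRF.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, highest first; break ties by ID for deterministic output.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_k]


vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits], top_k=3)
print(fused)
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents that appear in only one.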

Chat Interface Features

- Streaming responses
- Inline source citations with page numbers
- Source document preview with highlights
- Conversation history
- Regenerate answers
- Copy response to clipboard
- Switch between LLM models mid-conversation
- Upload documents directly in chat

Customization

Custom RAG Pipeline

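from kotaemon.base import BaseComponent
from kotaemon.llms import ChatOpenAI
from kotaemon.embeddings import OpenAIEmbeddings

class CustomRAGPipeline(BaseComponent):
    llm: ChatOpenAI = ChatOpenAI(model="gpt-4o")
    embeddings: OpenAIEmbeddings = OpenAIEmbeddings()

    def retrieve(self, query: str, documents: list, top_k: int = 5) -> list:
        # Placeholder: keep the first top_k chunks.
        # Replace with your own retrieval logic, e.g. similarity search using self.embeddings.
        return documents[:top_k]

    def run(self, query: str, documents: list) -> str:
        # Custom retrieval logic
        chunks = self.retrieve(query, documents)

        # Custom generation: join the chunk texts into a single context string
        context = "\n".join(c.text for c in chunks)
        prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"
        response = self.llm(prompt)

        # Assumption: the LLM output exposes its text as `.text`; fall back to str() otherwise
        return getattr(response, "text", str(response))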
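
The pipeline above follows a retrieve-then-generate flow. To experiment with that flow without an API key, the following framework-free sketch may help. `Chunk`, `keyword_retrieve`, and `echo_llm` are hypothetical stand-ins, not Kotaemon classes; only the prompt layout is taken from the pipeline above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str


def keyword_retrieve(query, documents, top_k=2):
    """Toy retriever: rank chunks by how many query words they contain."""
    words = set(query.lower().split())
    scored = [(len(words & set(doc.text.lower().split())), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop chunks with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(query, chunks):
    """Assemble the Context/Question/Answer prompt used in the pipeline above."""
    context = "\n".join(chunk.text for chunk in chunks)
    return f"Context: {context}\nQuestion: {query}\nAnswer:"


def echo_llm(prompt):
    """Stand-in for a real model call: returns the first context line."""
    return prompt.splitlines()[0].removeprefix("Context: ")


def answer(query, documents, llm=echo_llm):
    chunks = keyword_retrieve(query, documents)
    return llm(build_prompt(query, chunks))


docs = [
    Chunk("Kotaemon runs on port 7860"),
    Chunk("Ollama listens on port 11434"),
    Chunk("Unrelated text"),
]
print(answer("Which port does Kotaemon use", docs))
```

Swapping `echo_llm` for a function that calls a real model leaves the rest of the flow unchanged, which keeps retrieval and prompt assembly testable in isolation.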

Adding Custom Document Loaders

from kotaemon.loaders import BaseLoader

class CustomPDFLoader(BaseLoader):
    def load(self, file_path: str):
        # Custom PDF parsing logic
        import pdfplumber
        with pdfplumber.open(file_path) as pdf:
            pages = []
            for page in pdf.pages:
                text = page.extract_text()
                if text:
                    pages.append({
                        "text": text,
                        "metadata": {
                            "page": page.page_number,
                            "source": file_path
                        }
                    })
        return pages
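
The loader above returns one record per page, each with a `text` field and a `metadata` dictionary. The sketch below produces the same record shape for plain-text files so the structure can be tried without `pdfplumber`. Treating the form-feed character as a page break is an assumption made for this example, and `load_text_pages` is a hypothetical function rather than a Kotaemon API.

```python
import tempfile
from pathlib import Path


def load_text_pages(file_path):
    """Return one {"text", "metadata"} record per non-empty page of a text file.

    Pages are separated by the form-feed character (an assumption for this example).
    """
    raw = Path(file_path).read_text(encoding="utf-8")
    pages = []
    for number, page_text in enumerate(raw.split("\f"), start=1):
        page_text = page_text.strip()
        if page_text:  # skip empty pages, as the PDF loader above does
            pages.append({
                "text": page_text,
                "metadata": {"page": number, "source": str(file_path)},
            })
    return pages


# Write a three-page sample whose second page is empty, then load it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_text("First page\f\fThird page", encoding="utf-8")
    records = load_text_pages(sample)

print([record["metadata"]["page"] for record in records])
```

Keeping the original page number in the metadata, even when empty pages are skipped, is what allows citations to point back to the right page.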

Custom Embedding Model

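from functools import lru_cache

from kotaemon.embeddings import BaseEmbeddings


@lru_cache(maxsize=1)
def _load_model(name: str = "BAAI/bge-small-en-v1.5"):
    # Load the model once and reuse it; reloading on every call is slow
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(name)


class CustomEmbeddings(BaseEmbeddings):
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Your embedding logic
        embeddings = _load_model().encode(texts)  # NumPy array, one row per text
        return embeddings.tolist()

    def embed_query(self, text: str) -> list[float]:
        # Reuse the batch method for a single query string
        return self.embed_documents([text])[0]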
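
The two methods above serve different sides of retrieval: `embed_documents` runs at indexing time and `embed_query` at question time, and the resulting vectors are then compared. The sketch below shows that comparison with cosine similarity. `LetterCountEmbeddings` is a deliberately crude toy embedder (letter frequencies) that only mimics the two-method interface so the example runs without downloading a model; it does not subclass any Kotaemon class.

```python
import math


class LetterCountEmbeddings:
    """Toy embedder: a 26-dimensional vector of letter counts (a-z)."""

    def embed_documents(self, texts):
        vectors = []
        for text in texts:
            vector = [0.0] * 26
            for char in text.lower():
                if "a" <= char <= "z":
                    vector[ord(char) - ord("a")] += 1.0
            vectors.append(vector)
        return vectors

    def embed_query(self, text):
        return self.embed_documents([text])[0]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(embedder, query, texts):
    """Return document indices ordered from most to least similar to the query."""
    query_vector = embedder.embed_query(query)
    doc_vectors = embedder.embed_documents(texts)
    scores = [cosine_similarity(query_vector, vector) for vector in doc_vectors]
    return sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)


order = rank_documents(LetterCountEmbeddings(), "docker", ["api key", "docker image"])
print(order)
```

Any embedder that offers the same two methods can be passed to `rank_documents` in place of the toy class.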

Advanced Usage

Multi-User Setup

# flowsettings.toml
[KH_AUTH]
enabled = true
admin_username = "admin"
admin_password = "changeme"  # placeholder; replace before deploying

[KH_USERS]
  [KH_USERS.user1]
  password = "user1pass"
  role = "user"

  [KH_USERS.user2]
  password = "user2pass"
  role = "admin"

External Vector Store

# Use Qdrant
[KH_VECTORSTORE.qdrant]
spec = "kotaemon.storages.QdrantVectorStore"
url = "http://localhost:6333"
collection_name = "kotaemon_docs"

# Use Milvus
[KH_VECTORSTORE.milvus]
spec = "kotaemon.storages.MilvusVectorStore"
uri = "http://localhost:19530"
collection_name = "kotaemon_docs"

Reranking Configuration

[KH_RERANKERS]
  [KH_RERANKERS.cohere]
  spec = "kotaemon.rerankers.CohereReranker"
  model = "rerank-english-v3.0"
  top_n = 5

  [KH_RERANKERS.cross_encoder]
  spec = "kotaemon.rerankers.CrossEncoderReranker"
  model_name = "cross-encoder/ms-marco-MiniLM-L-12-v2"
  top_n = 5
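
A reranker rescores the retrieved candidates against the query and keeps only the best `top_n`. The sketch below shows that step in isolation. `overlap_score` is a toy word-overlap function standing in for a real model such as the Cohere or cross-encoder rerankers configured above; both function names are hypothetical.

```python
def rerank(query, candidates, score_fn, top_n=5):
    """Rescore candidates against the query and keep the top_n best."""
    scored = [(score_fn(query, text), index) for index, text in enumerate(candidates)]
    # Higher score first; the original position breaks ties so the output is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidates[index] for _, index in scored[:top_n]]


def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(text.lower().split())) / len(query_words)


candidates = [
    "reset the application data",
    "change the server port",
    "server port already in use",
]
print(rerank("port in use", candidates, overlap_score, top_n=2))
```

Setting `top_n` lower than the retrieval Top K trims the context passed to the LLM to the highest-scoring passages.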

GraphRAG Integration

# Enable GraphRAG mode for multi-hop reasoning
[KH_REASONING]
enable_graph_rag = true
graph_store = "neo4j"
neo4j_url = "bolt://localhost:7687"
neo4j_user = "neo4j"
neo4j_password = "password"

Troubleshooting

Issue	Solution
Port 7860 in use	Set `GRADIO_SERVER_PORT=7861`
Model not responding	Verify API keys in environment variables
Document upload fails	Check file format support, increase max upload size
Slow indexing	Use a local embedding model, reduce chunk size
Citations missing	Enable citation mode in retrieval settings
Memory errors	Reduce chunk overlap, process fewer documents at once
Ollama not connecting	Verify `OLLAMA_BASE_URL`, check Ollama is running
Empty responses	Check vector store has indexed documents

# View logs
python app.py 2>&1 | tee kotaemon.log

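# Reset application data
# Warning: this permanently deletes everything stored in ktem_app_data/ (uploaded documents, indexes, settings)
rm -rf ktem_app_data/
python app.py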

# Check health
curl http://localhost:7860/api/health
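
When the application or Ollama does not respond, a first step is to check whether anything is listening on the expected port at all. The sketch below does a plain TCP connection test using only the standard library. It does not verify that the service behind the port is healthy, and `is_reachable` is a hypothetical helper, not a Kotaemon command.

```python
import socket
from urllib.parse import urlparse


def is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    default_port = 443 if parsed.scheme == "https" else 80
    port = parsed.port or default_port
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not found
        return False


if __name__ == "__main__":
    services = {
        "Kotaemon": "http://localhost:7860",
        "Ollama": "http://localhost:11434",
    }
    for name, url in services.items():
        status = "up" if is_reachable(url) else "down"
        print(f"{name}: {status} ({url})")
```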

# Verify configuration (requires the third-party toml package: pip install toml)
python -c "
import toml
config = toml.load('ktem_app_data/flowsettings.toml')
print(config.keys())
"