Kotaemon Cheat Sheet

Overview

Kotaemon is an open-source, customizable document Q&A application based on retrieval-augmented generation (RAG), built by Cinnamon AI. It provides a clean web interface for uploading documents, asking questions, and receiving answers with inline citations and source highlights. Kotaemon supports multiple LLM providers (OpenAI, Anthropic, Azure, Ollama), various embedding models, and both local and cloud-based vector stores.

The application is designed both for end users who want a simple document chat interface and for developers who want to customize the RAG pipeline. It supports PDF, DOCX, TXT, HTML, and Markdown files, offers multi-user access with file management, and lets you choose among retrieval strategies, including hybrid search and reranking.

Installation

pip Install

pip install "kotaemon[all]"

# Start the application
kotaemon
# Opens at http://localhost:7860
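
Since the application serves on port 7860 by default, it can help to confirm that the port is free before starting it. The following is a minimal sketch using only the Python standard library; the helper names `port_in_use` and `first_free_port` are hypothetical and not part of Kotaemon.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds
        return sock.connect_ex((host, port)) == 0


def first_free_port(start=7860, attempts=20):
    """Return the first port in [start, start + attempts) with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    # Print a value that could be exported before launching the app
    print(f"GRADIO_SERVER_PORT={first_free_port()}")
```

If the printed port differs from 7860, setting `GRADIO_SERVER_PORT` to that value before launch should avoid the conflict.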

Docker

docker run -d \
  --name kotaemon \
  -p 7860:7860 \
  -v kotaemon_data:/app/ktem_app_data \
  -e OPENAI_API_KEY=sk-... \
  ghcr.io/cinnamon/kotaemon:latest

Docker Compose

version: '3.8'
services:
  kotaemon:
    image: ghcr.io/cinnamon/kotaemon:latest
    ports:
      - "7860:7860"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - GRADIO_SERVER_NAME=0.0.0.0
    volumes:
      - kotaemon_data:/app/ktem_app_data

volumes:
  kotaemon_data:

From Source

git clone https://github.com/Cinnamon/kotaemon.git
cd kotaemon

# Install dependencies
pip install -e "libs/kotaemon[all]"
pip install -e "libs/ktem"

# Start application
python app.py

Configuration

Environment Variables

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your.openai.azure.com

# Local Models
OLLAMA_BASE_URL=http://localhost:11434

# Application Settings
GRADIO_SERVER_NAME=0.0.0.0
GRADIO_SERVER_PORT=7860
KH_APP_DATA_DIR=./ktem_app_data
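
A quick way to see which providers are usable is to check which of the variables above are set. This is a minimal sketch, not part of Kotaemon: the variable names come from the list above, while the grouping in `PROVIDER_VARS` and the function name `configured_providers` are illustrative assumptions.

```python
import os

# Variable names are taken from the list above; the grouping is illustrative.
PROVIDER_VARS = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "ollama": ["OLLAMA_BASE_URL"],
}


def configured_providers(env=None):
    """Return the providers whose variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, variables in PROVIDER_VARS.items()
        if all(env.get(var, "").strip() for var in variables)
    ]


if __name__ == "__main__":
    print("Configured providers:", configured_providers() or "none")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching the real environment.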

flowsettings.toml

# ktem_app_data/flowsettings.toml

[KH_LLMS]
  [KH_LLMS.openai_gpt4o]
  spec = "kotaemon.llms.ChatOpenAI"
  model = "gpt-4o"
  temperature = 0.1
  max_tokens = 2048

  [KH_LLMS.ollama_llama]
  spec = "kotaemon.llms.ChatOllama"
  model = "llama3.1"
  base_url = "http://localhost:11434"

  [KH_LLMS.azure_gpt4]
  spec = "kotaemon.llms.AzureChatOpenAI"
  azure_deployment = "gpt-4o"
  azure_endpoint = "https://your.openai.azure.com"

[KH_EMBEDDINGS]
  [KH_EMBEDDINGS.openai]
  spec = "kotaemon.embeddings.OpenAIEmbeddings"
  model = "text-embedding-3-small"

  [KH_EMBEDDINGS.local]
  spec = "kotaemon.embeddings.FastEmbedEmbeddings"
  model_name = "BAAI/bge-small-en-v1.5"

[KH_VECTORSTORE]
  [KH_VECTORSTORE.default]
  spec = "kotaemon.storages.ChromaVectorStore"
  path = "./ktem_app_data/chroma"

[KH_DOCSTORE]
  [KH_DOCSTORE.default]
  spec = "kotaemon.storages.SimpleFileDocumentStore"
  path = "./ktem_app_data/docstore"

Core Features

Document Management

| Feature | Description |
|---|---|
| Upload | PDF, DOCX, TXT, HTML, MD files |
| Multi-file | Upload multiple documents at once |
| Collections | Organize documents into groups |
| Indexing | Automatic chunking and embedding |
| Delete | Remove documents and their indexes |
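
To make the Indexing step more concrete, here is a minimal sketch of splitting text into overlapping chunks before embedding. It is not Kotaemon's splitter: the function name `chunk_text` is hypothetical, sizes are assumed to be measured in characters (a real pipeline may count tokens instead), and the values 1024 and 256 are used only as illustrative defaults.

```python
def chunk_text(text, chunk_size=1024, overlap=256):
    """Split text into character windows that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1024, 1024, 964]
```

The overlap means the tail of each chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.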

Retrieval Settings

| Setting | Options | Default |
|---|---|---|
| Retrieval Method | Vector, Keyword, Hybrid | Hybrid |
| Top K | 1-20 | 5 |
| Chunk Size | 256-2048 | 1024 |
| Chunk Overlap | 0-512 | 256 |
| Reranking | Cohere, Cross-encoder, None | None |
| Citation Mode | Inline, Footnote | Inline |
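
Hybrid retrieval combines the results of vector and keyword search into a single ranking. This document does not say which fusion method Kotaemon uses, so the sketch below swaps in reciprocal rank fusion (RRF), one common choice, purely as an illustration. The function name, the constant `k=60`, and the document IDs are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60, top_k=5):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) per list it appears in (rank starts at 1).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties break on document ID for stable output.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_k]


vector_hits = ["doc3", "doc1", "doc7"]   # hypothetical vector-search results
keyword_hits = ["doc1", "doc9", "doc3"]  # hypothetical keyword-search results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits], top_k=3)
print(fused)  # ['doc1', 'doc3', 'doc9']
```

Documents found by both searches rise to the top, and `top_k` plays the same role as the Top K setting in the table.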

Chat Interface Features

- Streaming responses
- Inline source citations with page numbers
- Source document preview with highlights
- Conversation history
- Regenerate answers
- Copy response to clipboard
- Switch between LLM models mid-conversation
- Upload documents directly in chat

Customization

Custom RAG Pipeline

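from kotaemon.base import BaseComponent
from kotaemon.llms import ChatOpenAI
from kotaemon.embeddings import OpenAIEmbeddings

class CustomRAGPipeline(BaseComponent):
    llm: ChatOpenAI = ChatOpenAI(model="gpt-4o")
    embeddings: OpenAIEmbeddings = OpenAIEmbeddings()

    def retrieve(self, query: str, documents: list, top_k: int = 5) -> list:
        # Placeholder retrieval: rank documents by words shared with the query.
        # Replace with vector or hybrid search for real use.
        terms = set(query.lower().split())
        ranked = sorted(
            documents,
            key=lambda d: len(terms & set(d.text.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

    def run(self, query: str, documents: list):
        # Custom retrieval logic
        chunks = self.retrieve(query, documents)

        # Custom generation: build the prompt from the retrieved chunk text
        context = "\n".join(c.text for c in chunks)
        prompt = f"Context: {context}\nQuestion: {query}\nAnswer:"

        # Return the model's response as produced by the LLM component
        return self.llm(prompt)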

Adding Custom Document Loaders

import pdfplumber
from kotaemon.loaders import BaseLoader

class CustomPDFLoader(BaseLoader):
    def load(self, file_path: str):
        # Custom PDF parsing logic: one entry per page that contains text
        pages = []
        with pdfplumber.open(file_path) as pdf:
            for page in pdf.pages:
                text = page.extract_text()
                if text:  # skip pages with no extractable text
                    pages.append({
                        "text": text,
                        "metadata": {
                            "page": page.page_number,  # 1-based page number
                            "source": file_path
                        }
                    })
        return pages

Custom Embedding Model

from functools import lru_cache

from kotaemon.embeddings import BaseEmbeddings
from sentence_transformers import SentenceTransformer


@lru_cache(maxsize=1)
def _load_model() -> SentenceTransformer:
    # Load the model once and reuse it instead of reloading on every call
    return SentenceTransformer("BAAI/bge-small-en-v1.5")


class CustomEmbeddings(BaseEmbeddings):
    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # encode() returns a NumPy array; convert it to plain Python lists
        return _load_model().encode(texts).tolist()

    def embed_query(self, text: str) -> list[float]:
        return self.embed_documents([text])[0]
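
The vectors returned by `embed_query` and `embed_documents` are typically compared with cosine similarity to find the closest chunks. The sketch below shows that comparison in plain Python. The helper names are hypothetical, and hand-written 3-dimensional vectors stand in for real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar to the query."""
    sims = [cosine_similarity(query_vec, vec) for vec in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)


# Hand-written vectors stand in for real embedding output.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]
order = rank_by_similarity(query_vec, doc_vecs)
print(order)  # [1, 2, 0]
```

In practice a vector store performs this search, but the ranking principle is the same.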

Advanced Usage

Multi-User Setup

# flowsettings.toml
[KH_AUTH]
enabled = true
admin_username = "admin"
admin_password = "changeme"

[KH_USERS]
  [KH_USERS.user1]
  password = "user1pass"
  role = "user"

  [KH_USERS.user2]
  password = "user2pass"
  role = "admin"

External Vector Store

# Use Qdrant
[KH_VECTORSTORE.qdrant]
spec = "kotaemon.storages.QdrantVectorStore"
url = "http://localhost:6333"
collection_name = "kotaemon_docs"

# Use Milvus
[KH_VECTORSTORE.milvus]
spec = "kotaemon.storages.MilvusVectorStore"
uri = "http://localhost:19530"
collection_name = "kotaemon_docs"

Reranking Configuration

[KH_RERANKERS]
  [KH_RERANKERS.cohere]
  spec = "kotaemon.rerankers.CohereReranker"
  model = "rerank-english-v3.0"
  top_n = 5

  [KH_RERANKERS.cross_encoder]
  spec = "kotaemon.rerankers.CrossEncoderReranker"
  model_name = "cross-encoder/ms-marco-MiniLM-L-12-v2"
  top_n = 5
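
In both entries, `top_n` caps how many passages survive reranking. The sketch below illustrates that step with a pluggable scoring function. It is an assumption-laden toy: `rerank` and `word_overlap` are hypothetical names, and the word-overlap score stands in for a real Cohere or cross-encoder model.

```python
def rerank(query, passages, score_fn, top_n=5):
    """Return the top_n passages ordered by score_fn(query, passage), best first."""
    scored = [(score_fn(query, passage), passage) for passage in passages]
    # The sort is stable, so passages with equal scores keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_n]]


def word_overlap(query, passage):
    """Toy relevance score: number of query words that occur in the passage."""
    query_words = set(query.lower().split())
    return len(query_words & set(passage.lower().split()))


passages = [
    "Chroma is configured as the vector store",
    "Reranking reorders retrieved passages by relevance",
    "Gradio serves the web interface",
]
best = rerank("how does reranking order passages", passages, word_overlap, top_n=2)
print(best)
```

Swapping `word_overlap` for a model-backed scorer leaves the surrounding logic unchanged, which is why rerankers are usually configured as a separate stage.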

GraphRAG Integration

# Enable GraphRAG mode for multi-hop reasoning
[KH_REASONING]
enable_graph_rag = true
graph_store = "neo4j"
neo4j_url = "bolt://localhost:7687"
neo4j_user = "neo4j"
neo4j_password = "password"

Troubleshooting

| Issue | Solution |
|---|---|
| Port 7860 in use | Set GRADIO_SERVER_PORT=7861 |
| Model not responding | Verify API keys in environment variables |
| Document upload fails | Check file format support, increase max upload size |
| Slow indexing | Use local embeddings model, reduce chunk size |
| Citations missing | Enable citation mode in retrieval settings |
| Memory errors | Reduce chunk overlap, process fewer documents at once |
| Ollama not connecting | Verify OLLAMA_BASE_URL, check Ollama is running |
| Empty responses | Check vector store has indexed documents |

# View logs
python app.py 2>&1 | tee kotaemon.log

# Reset application data (permanently deletes uploaded documents, indexes, and settings)
rm -rf ktem_app_data/
python app.py

# Check health
curl http://localhost:7860/api/health

# Verify configuration (requires the third-party toml package: pip install toml)
python -c "
import toml
config = toml.load('ktem_app_data/flowsettings.toml')
print(config.keys())
"