LlamaIndex Framework Cheat Sheet

Overview

LlamaIndex is a powerful data framework designed to connect large language models (LLMs) with external data sources, enabling the creation of retrieval-augmented generation (RAG) applications. Built to address the challenge of LLMs' limited knowledge cutoffs, LlamaIndex provides a complete toolkit for ingesting, structuring, and accessing private or domain-specific data that would otherwise be unavailable to foundation models.

What sets LlamaIndex apart is its focus on data connectivity and knowledge management. The framework is built around transforming raw data from diverse sources into structured, queryable knowledge that LLMs can use effectively. With its modular architecture, LlamaIndex gives developers the flexibility to customize every component of the RAG pipeline while providing sensible defaults for rapid implementation.

LlamaIndex has emerged as a go-to solution for building knowledge-intensive applications, from question-answering systems and chatbots to document summarization tools and semantic search engines. Its rich ecosystem of integrations with vector databases, embedding models, and LLM providers makes it adaptable to a wide range of use cases and deployment environments.

Installation and Setup

Basic Installation

# Install core LlamaIndex
pip install llama-index

# Install with specific integrations
pip install llama-index-embeddings-openai  # OpenAI embeddings
pip install llama-index-llms-openai        # OpenAI LLMs
pip install llama-index-readers-file       # File readers
pip install llama-index-vector-stores-chroma  # Chroma vector store

# Install all core packages
pip install llama-index-core[all]

# Install development version
pip install git+https://github.com/jerryjliu/llama_index.git

Environment Setup

import os
import logging
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Set up API keys
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()

# Global settings configuration
Settings.llm = OpenAI(model="gpt-4", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
Settings.chunk_size = 1024
Settings.chunk_overlap = 20

Project Structure

llamaindex_project/
├── data/
│   ├── documents/
│   └── processed/
├── indices/
│   ├── vector_index/
│   └── summary_index/
├── readers/
│   ├── __init__.py
│   └── custom_readers.py
├── retrievers/
│   ├── __init__.py
│   └── custom_retrievers.py
├── query_engines/
│   ├── __init__.py
│   └── custom_engines.py
├── prompts/
│   ├── __init__.py
│   └── custom_prompts.py
├── config/
│   ├── __init__.py
│   └── settings.py
└── main.py
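
The config/settings.py module in the layout above is a natural place to centralize global configuration. A minimal sketch, assuming a hypothetical configure_settings() helper and an optional LLM_MODEL environment variable (both illustrative, not part of LlamaIndex):

# config/settings.py (hypothetical module matching the layout above)
import os

from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI


def configure_settings() -> None:
    """Apply global LlamaIndex settings once at application startup."""
    Settings.llm = OpenAI(
        model=os.environ.get("LLM_MODEL", "gpt-4"),  # illustrative env var
        temperature=0.1,
    )
    Settings.embed_model = OpenAIEmbedding(model="text-embedding-ada-002")
    Settings.chunk_size = 1024
    Settings.chunk_overlap = 20

main.py can then call configure_settings() once before loading documents or building indices.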

Data Loading and Processing

Document Loading

from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PDFReader, DocxReader
from llama_index.readers.web import SimpleWebPageReader

# Load documents from a directory
documents = SimpleDirectoryReader("./data/documents").load_data()

# Load specific file types
pdf_reader = PDFReader()
pdf_documents = pdf_reader.load_data("./data/documents/report.pdf")

docx_reader = DocxReader()
docx_documents = docx_reader.load_data("./data/documents/memo.docx")

# Load web pages
web_documents = SimpleWebPageReader().load_data(
    ["https://example.com/page1", "https://example.com/page2"]
)

# Combine documents from multiple sources
all_documents = pdf_documents + docx_documents + web_documents

Custom Document Loading

from llama_index.core import Document
from typing import List
import json

def load_custom_json_data(file_path: str) -> List[Document]:
    """Load custom JSON data into LlamaIndex documents."""
    with open(file_path, "r") as f:
        data = json.load(f)

    documents = []
    for item in data:
        # Create document with text content and metadata
        doc = Document(
            text=item["content"],
            metadata={
                "title": item.get("title", ""),
                "author": item.get("author", ""),
                "date": item.get("date", ""),
                "category": item.get("category", ""),
                "source": file_path
            }
        )
        documents.append(doc)

    return documents

# Use custom loader
custom_documents = load_custom_json_data("./data/documents/custom_data.json")

Text Splitting

from llama_index.core.node_parser import SentenceSplitter, TokenTextSplitter

# Sentence-based splitter
sentence_splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=20,
    paragraph_separator="\n\n",
    secondary_chunking_regex="[^,.;。]+[,.;。]?",
)
sentence_nodes = sentence_splitter.get_nodes_from_documents(documents)

# Token-based splitter
token_splitter = TokenTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    separator=" ",
)
token_nodes = token_splitter.get_nodes_from_documents(documents)

# Custom splitting logic
from llama_index.core.node_parser import SimpleNodeParser

class CustomNodeParser(SimpleNodeParser):
    def get_nodes_from_documents(self, documents, **kwargs):
        nodes = super().get_nodes_from_documents(documents, **kwargs)
        # Add custom processing for nodes
        for node in nodes:
            # Add custom metadata or transform node text
            node.metadata["processed"] = True
            node.text = node.text.replace("old_term", "new_term")
        return nodes

custom_parser = CustomNodeParser(
    chunk_size=1024,
    chunk_overlap=20,
)
custom_nodes = custom_parser.get_nodes_from_documents(documents)

Text Transformation

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.schema import TransformComponent
from llama_index.embeddings.openai import OpenAIEmbedding

# Define custom text transformer as a TransformComponent so it can run in a pipeline
class TextCleaner(TransformComponent):
    def __call__(self, nodes, **kwargs):
        for node in nodes:
            # Clean text: remove extra whitespace, normalize curly quotes, etc.
            node.text = " ".join(node.text.split())
            node.text = node.text.replace("\u201c", '"').replace("\u201d", '"')
        return nodes

# Create ingestion pipeline
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=20),
        TextCleaner(),
        OpenAIEmbedding(model="text-embedding-ada-002"),
    ]
)

# Process documents through pipeline
nodes = pipeline.run(documents=documents)

Index Creation and Management

Vector Index

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

# Create a basic in-memory vector index
documents = SimpleDirectoryReader("./data/documents").load_data()
vector_index = VectorStoreIndex.from_documents(documents)

# Create persistent vector index with Chroma
chroma_client = chromadb.PersistentClient("./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
persistent_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

# Save and load index
vector_index.storage_context.persist("./storage")

from llama_index.core import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
loaded_index = load_index_from_storage(storage_context)

Summary Index

from llama_index.core import SummaryIndex

# Create summary index
summary_index = SummaryIndex.from_documents(documents)

# Query with summary index
query_engine = summary_index.as_query_engine()
response = query_engine.query("Summarize the key points in these documents.")

Knowledge Graph Index

from llama_index.core import KnowledgeGraphIndex, StorageContext
from llama_index.graph_stores.neo4j import Neo4jGraphStore

# Create knowledge graph index
kg_index = KnowledgeGraphIndex.from_documents(documents)

# Create with Neo4j backend
neo4j_graph_store = Neo4jGraphStore(
    username="neo4j",
    password="password",
    url="bolt://localhost:7687",
    database="neo4j"
)
neo4j_kg_index = KnowledgeGraphIndex.from_documents(
    documents,
    storage_context=StorageContext.from_defaults(graph_store=neo4j_graph_store)
)

# Query knowledge graph
kg_query_engine = kg_index.as_query_engine()
response = kg_query_engine.query(
    "What is the relationship between entity A and entity B?"
)

Hybrid Index

from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

# Create multiple indices
vector_index = VectorStoreIndex.from_documents(documents)
summary_index = SummaryIndex.from_documents(documents)

# Create query engines
vector_query_engine = vector_index.as_query_engine()
summary_query_engine = summary_index.as_query_engine()

# Wrap each query engine as a tool with a description the selector can route on
from llama_index.core.tools import QueryEngineTool

vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for answering specific questions about the documents."
)
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization and overview requests."
)

# Create router query engine
router_query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[vector_tool, summary_tool]
)

# Query with automatic routing
response = router_query_engine.query("Give me a summary of the documents.")

Querying and Retrieval

Basic Queries

from llama_index.core import VectorStoreIndex

# Create index and query engine
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# Simple query
response = query_engine.query("What is the main topic discussed in the documents?")
print(response)

# Access source nodes and metadata
for source_node in response.source_nodes:
    print(f"Source text: {source_node.node.text[:100]}...")
    print(f"Metadata: {source_node.node.metadata}")
    print(f"Score: {source_node.score}")
    print("---")

Advanced Query Configuration

from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever

# Create index
index = VectorStoreIndex.from_documents(documents)

# Configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=5,  # Number of results to retrieve
    vector_store_query_mode="default"  # default, sparse, or hybrid (if the vector store supports it)
)

# Configure query engine
query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_mode="compact",  # compact, refine, tree_summarize, or simple_summarize
    node_postprocessors=[],  # Optional postprocessors for retrieved nodes
    llm=None  # Optional custom LLM
)

# Execute query
response = query_engine.query("What are the key challenges mentioned in the documents?")

Streaming Responses

from llama_index.core import VectorStoreIndex

# Create index
index = VectorStoreIndex.from_documents(documents)

# Create streaming query engine
query_engine = index.as_query_engine(streaming=True)

# Stream response
streaming_response = query_engine.query("Explain the concept of RAG in detail.")

# Process streaming response
for token in streaming_response.response_gen:
    print(token, end="", flush=True)

Filtering and Metadata Queries

from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage
from llama_index.core.vector_stores import MetadataFilters, FilterOperator, MetadataFilter

# Load index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)

# Create metadata filters
filters = MetadataFilters(
    filters=[
        MetadataFilter(key="category", value="technical", operator=FilterOperator.EQ),
        MetadataFilter(key="date", value="2023-01-01", operator=FilterOperator.GTE)
    ]
)

# Create retriever with filters
retriever = index.as_retriever(
    similarity_top_k=5,
    filters=filters
)

# Create query engine with the filtered retriever
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(retriever=retriever)

# Execute filtered query
response = query_engine.query("What technical advancements were made after January 2023?")

Multi-Modal Querying

from llama_index.core import SimpleDirectoryReader
from llama_index.core.indices import MultiModalVectorStoreIndex
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Load text and image documents
text_documents = SimpleDirectoryReader("./data/text").load_data()
image_paths = ["./data/images/image1.jpg", "./data/images/image2.png"]
image_documents = [ImageDocument(image_path=path) for path in image_paths]

# Combine documents
all_documents = text_documents + image_documents

# Create multi-modal index (a CLIP model embeds the images by default)
multi_modal_llm = OpenAIMultiModal(model="gpt-4-vision-preview")
index = MultiModalVectorStoreIndex.from_documents(all_documents)

# Query with multi-modal context
query_engine = index.as_query_engine(llm=multi_modal_llm)
response = query_engine.query("Describe what's in the images and how it relates to the text.")

Advanced Features

Custom Retrievers

from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from typing import List

class CustomRetriever(BaseRetriever):
    """Custom retriever implementation."""

    def __init__(self, index, top_k=5):
        """Initialize with index."""
        self.index = index
        self.top_k = top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Custom retrieval logic."""
        # Implement custom retrieval strategy
        vector_retriever = self.index.as_retriever(similarity_top_k=self.top_k)
        vector_nodes = vector_retriever.retrieve(query_bundle)

        # Add custom logic, e.g., re-ranking
        reranked_nodes = self._rerank_nodes(vector_nodes, query_bundle)

        return reranked_nodes

    def _rerank_nodes(self, nodes: List[NodeWithScore], query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Rerank nodes based on custom criteria."""
        # Example: Boost nodes with certain metadata
        for node in nodes:
            if "important" in node.node.metadata.get("tags", []):
                node.score += 0.2

        # Sort by updated scores
        return sorted(nodes, key=lambda x: x.score, reverse=True)

# Use custom retriever
from llama_index.core.query_engine import RetrieverQueryEngine

custom_retriever = CustomRetriever(index, top_k=10)
query_engine = RetrieverQueryEngine.from_args(retriever=custom_retriever)
response = query_engine.query("What are the most important concepts?")

Custom Response Synthesis

from typing import Any, Sequence

from llama_index.core.response_synthesizers import BaseSynthesizer


class CustomResponseSynthesizer(BaseSynthesizer):
    """Custom response synthesizer that builds a simple prompt from the retrieved chunks."""

    def _get_prompts(self) -> dict:
        """No externally configurable prompts in this example."""
        return {}

    def _update_prompts(self, prompts: dict) -> None:
        """Nothing to update."""

    def get_response(
        self,
        query_str: str,
        text_chunks: Sequence[str],
        **response_kwargs: Any,
    ) -> str:
        """Synthesize an answer from the retrieved text chunks."""
        context_str = "\n\n".join(text_chunks)

        # Create custom prompt
        prompt = (
            "Based on the following context information, answer the query.\n"
            f"Context:\n{context_str}\n\n"
            f"Query: {query_str}\n\n"
            "Answer: "
        )

        # Generate response using the LLM (defaults to Settings.llm)
        return self._llm.complete(prompt).text

    async def aget_response(
        self,
        query_str: str,
        text_chunks: Sequence[str],
        **response_kwargs: Any,
    ) -> str:
        """Async variant; delegates to the synchronous implementation."""
        return self.get_response(query_str, text_chunks, **response_kwargs)

# Use custom response synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

custom_synthesizer = CustomResponseSynthesizer()
query_engine = RetrieverQueryEngine.from_args(
    retriever=index.as_retriever(),
    response_synthesizer=custom_synthesizer
)

response = query_engine.query("Explain the key concepts in the documents.")

Custom Prompts

from llama_index.core import PromptTemplate
from llama_index.core.prompts import PromptType

# Define custom text QA prompt
text_qa_template = PromptTemplate(
    """You are an expert assistant. Answer the question based on the provided context.

    Context:
    {context_str}

    Question:
    {query_str}

    Answer the question with a detailed explanation. If the answer cannot be found in the context,
    state "I don't have enough information to answer this question." and suggest what additional
    information would be needed.

    Answer:""",
    prompt_type=PromptType.QUESTION_ANSWER
)

# Define custom refine prompt
refine_template = PromptTemplate(
    """You are an expert assistant. Refine the existing answer based on new context.

    Existing Answer:
    {existing_answer}

    New Context:
    {context_msg}

    Question:
    {query_str}

    Refine the existing answer to improve it. If the new context doesn't provide relevant information,
    keep the existing answer as is.

    Refined Answer:""",
    prompt_type=PromptType.REFINE
)

# Use custom prompts with query engine
from llama_index.core import VectorStoreIndex

query_engine = index.as_query_engine(
    text_qa_template=text_qa_template,
    refine_template=refine_template,
    response_mode="refine"
)

response = query_engine.query("What are the main applications of LlamaIndex?")

Evaluation

from llama_index.core.evaluation import (
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
    BatchEvalRunner
)

# Create evaluators
faithfulness_evaluator = FaithfulnessEvaluator()
relevancy_evaluator = RelevancyEvaluator()
correctness_evaluator = CorrectnessEvaluator()

# Create query engine
query_engine = index.as_query_engine()

# Define evaluation questions and ground truth
eval_questions = [
    "What is retrieval-augmented generation?",
    "How does LlamaIndex handle document ingestion?",
    "What are the main components of a RAG pipeline?"
]

ground_truths = [
    "Retrieval-augmented generation (RAG) is a technique that enhances LLMs by retrieving external knowledge.",
    "LlamaIndex handles document ingestion through document loaders, text splitters, and embedding generation.",
    "The main components of a RAG pipeline include data ingestion, indexing, retrieval, and response generation."
]

# Run batch evaluation
batch_runner = BatchEvalRunner(
    evaluators={
        "faithfulness": faithfulness_evaluator,
        "relevancy": relevancy_evaluator,
        "correctness": correctness_evaluator,
    },
    workers=2
)

eval_results = batch_runner.evaluate_queries(
    query_engine=query_engine,
    queries=eval_questions,
    reference=ground_truths  # used by the correctness evaluator
)

# Analyze results (keyed by evaluator name, one result per query)
for i, question in enumerate(eval_questions):
    print(f"Question: {question}")
    print(f"Faithfulness: {eval_results['faithfulness'][i].score}")
    print(f"Relevancy: {eval_results['relevancy'][i].score}")
    print(f"Correctness: {eval_results['correctness'][i].score}")
    print("---")

Integrations

Vector Store Integrations

# Chroma
from llama_index.vector_stores.chroma import ChromaVectorStore
import chromadb

chroma_client = chromadb.PersistentClient("./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
chroma_vector_store = ChromaVectorStore(chroma_collection=chroma_collection)

# Pinecone
from llama_index.vector_stores.pinecone import PineconeVectorStore
from pinecone import Pinecone

pinecone_client = Pinecone(api_key="your-api-key")
pinecone_index = pinecone_client.Index("your-index-name")
pinecone_vector_store = PineconeVectorStore(pinecone_index=pinecone_index)

# Weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore
import weaviate

weaviate_client = weaviate.Client("http://localhost:8080")
weaviate_vector_store = WeaviateVectorStore(
    weaviate_client=weaviate_client,
    index_name="LlamaIndex"
)

# Create index with vector store
from llama_index.core import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults(
    vector_store=chroma_vector_store  # Or any other vector store
)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context
)

LLM Integrations

# OpenAI
from llama_index.llms.openai import OpenAI

openai_llm = OpenAI(model="gpt-4", temperature=0.1)

# Anthropic
from llama_index.llms.anthropic import Anthropic

anthropic_llm = Anthropic(model="claude-3-sonnet-20240229", temperature=0.2)

# Hugging Face
from llama_index.llms.huggingface import HuggingFaceLLM

huggingface_llm = HuggingFaceLLM(
    model_name="mistralai/Mistral-7B-Instruct-v0.2",
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.2",
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "do_sample": True}
)

# Set as default LLM
from llama_index.core import Settings

Settings.llm = openai_llm

Embedding Integrations

# OpenAI Embeddings
from llama_index.embeddings.openai import OpenAIEmbedding

openai_embed_model = OpenAIEmbedding(
    model="text-embedding-ada-002",
    embed_batch_size=100
)

# Hugging Face Embeddings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

huggingface_embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-large-en-v1.5"
)

# Set as default embedding model
from llama_index.core import Settings

Settings.embed_model = openai_embed_model

Chat Engine Integration

from llama_index.core import VectorStoreIndex
from llama_index.core.chat_engine import ContextChatEngine, CondenseQuestionChatEngine

# Create index
index = VectorStoreIndex.from_documents(documents)

# Simple context chat engine
context_chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt="You are a helpful assistant that answers questions based on the provided context."
)

# Condense question chat engine (for better handling of chat history)
condense_chat_engine = index.as_chat_engine(
    chat_mode="condense_question",
    system_prompt="You are a helpful assistant that answers questions based on the provided context."
)

# Chat with history
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(role="user", content="What is LlamaIndex?"),
    ChatMessage(role="assistant", content="LlamaIndex is a data framework for LLM applications."),
    ChatMessage(role="user", content="What are its main features?")
]

response = condense_chat_engine.chat(
    message="What are its main features?",
    chat_history=messages[:-1]  # Exclude the last message
)

Production Deployment

Caching and Optimization

from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
import diskcache

# Set up caching
cache = diskcache.Cache("./cache")

def memoize_embed(fn):
    """Memoize an embedding batch function using the disk cache."""
    def wrapper(texts):
        # Note: hash() is not stable across Python processes; a persistent cache
        # should use a stable digest (e.g. hashlib.sha256) instead.
        results = [None] * len(texts)
        uncached_texts = []
        uncached_indices = []

        # Check cache for each text
        for i, text in enumerate(texts):
            cache_key = f"embed_{hash(text)}"
            if cache_key in cache:
                results[i] = cache[cache_key]
            else:
                uncached_texts.append(text)
                uncached_indices.append(i)

        # Compute embeddings for uncached texts
        if uncached_texts:
            uncached_embeddings = fn(uncached_texts)

            # Store in cache and keep results aligned with the input order
            for i, embedding in zip(uncached_indices, uncached_embeddings):
                cache_key = f"embed_{hash(texts[i])}"
                cache[cache_key] = embedding
                results[i] = embedding

        return results

    return wrapper

# Apply caching to embedding model
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding()
embed_model.get_text_embedding_batch = memoize_embed(embed_model.get_text_embedding_batch)

# Set up debug handler
debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])

# Configure settings with optimizations
Settings.embed_model = embed_model
Settings.callback_manager = callback_manager
Settings.chunk_size = 512  # Smaller chunks for efficiency
Settings.chunk_overlap = 50

API Implementation with FastAPI

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
from llama_index.core import VectorStoreIndex, StorageContext, load_index_from_storage

app = FastAPI()

# Load index at startup
@app.on_event("startup")
async def startup_event():
    global index, query_engine
    try:
        storage_context = StorageContext.from_defaults(persist_dir="./storage")
        index = load_index_from_storage(storage_context)
        query_engine = index.as_query_engine()
    except Exception as e:
        print(f"Error loading index: {e}")
        raise

# Define request/response models
class QueryRequest(BaseModel):
    query: str
    filters: Optional[dict] = None
    top_k: Optional[int] = 5

class SourceNode(BaseModel):
    text: str
    metadata: dict
    score: float

class QueryResponse(BaseModel):
    answer: str
    sources: List[SourceNode]

# Query endpoint
@app.post("/query", response_model=QueryResponse)
async def query(request: QueryRequest):
    try:
        # Apply filters if provided
        if request.filters:
            from llama_index.core.vector_stores import MetadataFilters, MetadataFilter

            filters = []
            for key, value in request.filters.items():
                filters.append(MetadataFilter(key=key, value=value))

            metadata_filters = MetadataFilters(filters=filters)
            custom_query_engine = index.as_query_engine(
                similarity_top_k=request.top_k,
                filters=metadata_filters
            )
            response = custom_query_engine.query(request.query)
        else:
            response = query_engine.query(request.query)

        # Format source nodes
        sources = []
        for node in response.source_nodes:
            sources.append(
                SourceNode(
                    text=node.node.text,
                    metadata=node.node.metadata,
                    score=node.score
                )
            )

        return QueryResponse(
            answer=str(response),
            sources=sources
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error processing query: {str(e)}")

# Chat endpoint
class ChatRequest(BaseModel):
    message: str
    chat_history: Optional[List[dict]] = None

@app.post("/chat")
async def chat(request: ChatRequest):
    try:
        from llama_index.core.llms import ChatMessage

        # Convert chat history to ChatMessage objects
        chat_history = []
        if request.chat_history:
            for msg in request.chat_history:
                chat_history.append(
                    ChatMessage(role=msg["role"], content=msg["content"])
                )

        # Create chat engine if not exists
        if not hasattr(app.state, "chat_engine"):
            app.state.chat_engine = index.as_chat_engine(
                chat_mode="condense_question"
            )

        # Get response
        response = app.state.chat_engine.chat(
            message=request.message,
            chat_history=chat_history
        )

        return {
            "response": response.response,
            "sources": [
                {
                    "text": node.node.text,
                    "metadata": node.node.metadata,
                    "score": node.score
                }
                for node in response.source_nodes
            ] if hasattr(response, "source_nodes") else []
        }
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Error processing chat: {str(e)}")

Docker Deployment

# Dockerfile for LlamaIndex application
FROM python:3.10-slim

WORKDIR /app

# Install system dependencies (curl for the health check) and Python requirements
COPY requirements.txt .
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --no-cache-dir -r requirements.txt

# Copy application code
COPY . .

# Create directories for data and storage
RUN mkdir -p ./data/documents ./storage ./cache

# Set environment variables (OPENAI_API_KEY is supplied at runtime, e.g. via docker-compose)
ENV PYTHONPATH=/app

# Expose port
EXPOSE 8000

# Health check
HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1

# Start application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
# docker-compose.yml
version: '3.8'

services:
  llamaindex-app:
    build: .
    ports:
      - "8000:8000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./data:/app/data
      - ./storage:/app/storage
      - ./cache:/app/cache
    restart: unless-stopped

  chroma-db:
    image: ghcr.io/chroma-core/chroma:latest
    ports:
      - "8001:8000"
    volumes:
      - ./chroma_data:/chroma/chroma
    environment:
      - CHROMA_DB_IMPL=duckdb+parquet
      - CHROMA_PERSIST_DIRECTORY=/chroma/chroma
    restart: unless-stopped

Best Practices and Patterns

Document Processing

  • **Chunk Size Optimization**: Tune chunk size based on content type and query patterns
  • **Metadata Enrichment**: Add rich metadata to documents for better filtering and retrieval
  • **Preprocessing**: Clean and normalize text before indexing
  • **Hierarchical Chunking**: Use parent-child relationships to preserve context (see the sketch after this list)
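
A minimal sketch of hierarchical chunking with the built-in HierarchicalNodeParser (the chunk sizes below are illustrative):

from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes

# Parse documents into a hierarchy of coarse-to-fine chunks
hierarchical_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]  # parent, intermediate, and leaf chunk sizes
)
all_nodes = hierarchical_parser.get_nodes_from_documents(documents)

# Index the leaf nodes; parent-child links are kept in the node relationships
leaf_nodes = get_leaf_nodes(all_nodes)

Leaf nodes are typically paired with an auto-merging retriever so that retrieval can merge results back up to their parent chunks when enough siblings match.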

Retrieval Strategies

  • **Hybrid Search**: Combine vector and keyword search for better results
  • **Reranking**: Apply post-retrieval reranking to improve relevance (see the sketch after this list)
  • **Metadata Filtering**: Use metadata to narrow the search space
  • **Multi-Index Retrieval**: Query multiple indices for comprehensive results
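
One way to apply post-retrieval filtering and reranking, sketched with the built-in SimilarityPostprocessor (the similarity cutoff is an illustrative value):

from llama_index.core.postprocessor import SimilarityPostprocessor

# Retrieve a generous candidate set, then drop low-similarity nodes before synthesis
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.75)],
)
response = query_engine.query("What are the key challenges mentioned in the documents?")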

Response Generation

  • **Response Modes**: Choose appropriate response modes (compact, refine, tree_summarize); see the sketch after this list
  • **Custom Prompts**: Tailor prompts to specific use cases
  • **Source Attribution**: Include source information in responses
  • **Streaming**: Use streaming for a better user experience with long responses
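
For example, switching the response mode is a one-line change on the query engine, and source attribution comes from the source_nodes attached to every response (the query below is illustrative):

# tree_summarize builds a bottom-up summary over the retrieved nodes,
# which tends to work well for broad, overview-style questions
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query("Give a high-level overview of the documents.")

# Source attribution: cite the documents the answer was grounded on
for source_node in response.source_nodes:
    print(source_node.node.metadata.get("file_name"), source_node.score)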

Performance Optimization

  • **Caching**: Implement caching for embeddings and LLM responses
  • **Batch Processing**: Process documents in batches
  • **Async Operations**: Use the async APIs for non-blocking operations (see the sketch after this list)
  • **Index Pruning**: Regularly clean and optimize indices
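
A minimal sketch of the async query API for non-blocking operation (the questions are illustrative):

import asyncio

async def answer_all(questions):
    """Run several queries concurrently using the async query API."""
    query_engine = index.as_query_engine()
    tasks = [query_engine.aquery(q) for q in questions]
    return await asyncio.gather(*tasks)

responses = asyncio.run(answer_all([
    "What is retrieval-augmented generation?",
    "How are documents ingested?",
]))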

Monitoring and Evaluation

  • **Logging**: Implement comprehensive logging (see the token-counting sketch after this list)
  • **Evaluation Metrics**: Track relevancy, faithfulness, and correctness
  • **User Feedback**: Collect and incorporate user feedback
  • **A/B Testing**: Compare different configurations
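
A sketch of lightweight usage monitoring with the built-in TokenCountingHandler (assumes tiktoken is installed and an OpenAI-style model):

import tiktoken
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, TokenCountingHandler

# Count prompt, completion, and embedding tokens for every call made through Settings
token_counter = TokenCountingHandler(
    tokenizer=tiktoken.encoding_for_model("gpt-4").encode
)
Settings.callback_manager = CallbackManager([token_counter])

# ... run queries, then inspect the counters
print("LLM tokens:", token_counter.total_llm_token_count)
print("Embedding tokens:", token_counter.total_embedding_token_count)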

Troubleshooting

Common Issues

Poor Retrieval Quality

  • Cause: Inappropriate chunk size, weak embeddings, or insufficient context
  • Solution: Adjust chunk size, try different embedding models, or implement reranking

High Latency

  • Cause: Large indices, complex queries, or inefficient retrieval
  • Solution: Implement caching, optimize chunk size, or use more efficient vector stores

Memory Issues

  • Cause: Loading too many documents or embeddings into memory
  • Solution: Use disk-based vector stores, process documents in batches, or implement streaming

Hallucinations

  • Cause: Insufficient context, poor retrieval, or LLM limitations
  • Solution: Improve retrieval quality, adjust prompts, or add a fact-checking pass (see the sketch below)
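
A sketch of a simple fact-checking pass with the FaithfulnessEvaluator from the Evaluation section (the query is illustrative):

from llama_index.core.evaluation import FaithfulnessEvaluator

faithfulness_evaluator = FaithfulnessEvaluator()  # uses Settings.llm by default

query = "What is retrieval-augmented generation?"
response = query_engine.query(query)

# Check whether the answer is actually supported by the retrieved context
eval_result = faithfulness_evaluator.evaluate_response(query=query, response=response)
if not eval_result.passing:
    print("Potential hallucination detected; consider re-retrieving or reprompting.")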

---

*This comprehensive LlamaIndex cheat sheet provides everything you need to build sophisticated RAG applications. From basic setup to advanced production deployment patterns, use these examples and best practices to create powerful AI applications with LlamaIndex's flexible framework.*