# AutoGen Multi-Agent Framework Cheat Sheet

## Overview
AutoGen is an open-source framework developed by Microsoft Research that transforms the development of Large Language Model (LLM) applications by enabling sophisticated multi-agent conversations. Unlike traditional single-agent systems, AutoGen lets developers build complex applications by composing multiple specialized AI agents that converse with one another, collaborate on tasks, and involve humans seamlessly.
What makes AutoGen particularly powerful is its focus on conversation as the primary mechanism for agent interaction. This approach enables natural, flexible, and dynamic collaboration between agents, mirroring how human teams work together to solve complex problems. AutoGen provides a rich set of tools for defining agent roles, capabilities, and communication protocols, making it possible to build highly adaptable, intelligent systems that handle a wide range of tasks, from code generation and data analysis to creative writing and strategic planning.
The framework is designed to be both simple and extensible, offering high-level abstractions for common multi-agent patterns while providing deep customization options for advanced use cases. With its event-driven architecture and support for a variety of LLMs and tools, AutoGen lets developers build AI applications that are more capable, robust, and human-aligned.
## Installation and Setup

### Basic Installation
```bash
# Install AutoGen
pip install pyautogen

# Install with specific integrations (e.g., OpenAI)
pip install "pyautogen[openai]"

# Install the development version
pip install git+https://github.com/microsoft/autogen.git

# Install with all optional dependencies
pip install "pyautogen[all]"
```
### Environment Configuration
```python
import os
import autogen

# Configure LLM provider (OpenAI example)
config_list_openai = [
    {
        "model": "gpt-4",
        "api_key": os.environ.get("OPENAI_API_KEY")
    },
    {
        "model": "gpt-3.5-turbo",
        "api_key": os.environ.get("OPENAI_API_KEY")
    }
]

# Configure for other LLMs (e.g., Azure OpenAI, local models)
# See the AutoGen documentation for specific configurations

# Enable response caching for reproducibility
# (legacy API; recent pyautogen releases use "cache_seed" in llm_config instead)
autogen.ChatCompletion.set_cache(seed=42)
```
### Project Structure

```
autogen_project/
├── agents/
│ ├── __init__.py
│ ├── researcher_agent.py
│ └── coder_agent.py
├── workflows/
│ ├── __init__.py
│ ├── coding_workflow.py
│ └── research_workflow.py
├── tools/
│ ├── __init__.py
│ └── custom_tools.py
├── skills/
│ ├── __init__.py
│ └── code_execution_skill.py
├── config/
│ ├── __init__.py
│ └── llm_config.py
└── main.py
```
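As a sketch of how these pieces might fit together (the file names come from the tree above; the contents are illustrative assumptions, not prescribed by AutoGen — here `config/llm_config.py` is assumed to export an `llm_config` dict like the one built in Environment Configuration):

```python
# main.py: wire agents to the shared LLM config kept in config/llm_config.py
import autogen
from config.llm_config import llm_config  # assumed: {"config_list": [...], ...}

assistant = autogen.AssistantAgent(name="Assistant", llm_config=llm_config)
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="TERMINATE",
    code_execution_config={"work_dir": "coding_output", "use_docker": False},
)

if __name__ == "__main__":
    user_proxy.initiate_chat(assistant, message="Hello, AutoGen!")
```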
## Core Concepts

### Agents
Agents are the fundamental building blocks in AutoGen. They are conversational entities that can send and receive messages, execute code, call functions, and interact with humans.
#### ConversableAgent
This is the base class for most agents in AutoGen, providing core conversational capabilities.
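As a minimal sketch (assuming pyautogen 0.2-style APIs and the `config_list_openai` defined above), two `ConversableAgent` instances can talk to each other directly, without any human involvement:

```python
import autogen

# One LLM-backed agent and one that replies with canned text
writer = autogen.ConversableAgent(
    name="Writer",
    llm_config={"config_list": config_list_openai},
    system_message="You write one-line haiku summaries.",
)
echo = autogen.ConversableAgent(
    name="Echo",
    llm_config=False,                  # no LLM: uses the default auto-reply
    human_input_mode="NEVER",
    default_auto_reply="Thanks, noted.",
    max_consecutive_auto_reply=1,      # stop after one automatic reply
)

# Echo starts the exchange; Writer answers with the LLM
echo.initiate_chat(writer, message="Summarize AutoGen in one haiku.", max_turns=2)
```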
#### UserProxyAgent
A specialized agent that acts as a proxy for human users, allowing them to participate in conversations, provide input, and execute code.
#### AssistantAgent
An agent designed to act as an AI assistant, typically powered by an LLM, capable of writing code, answering questions, and performing tasks.
#### GroupChat
AutoGen supports multi-agent conversations through GroupChat and GroupChatManager, enabling complex interactions between multiple agents.
## Agent Configuration

### Basic Agent Creation

```python
import autogen

# Assistant Agent (LLM-powered)
assistant = autogen.AssistantAgent(
    name="Assistant",
    llm_config={
        "config_list": config_list_openai,
        "temperature": 0.7,
        "timeout": 600
    },
    system_message="You are a helpful AI assistant. Provide concise and accurate answers."
)

# User Proxy Agent (human in the loop)
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="TERMINATE",  # Options: ALWAYS, TERMINATE, NEVER
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding_output",
        "use_docker": False  # Set to True to use Docker for code execution
    },
    system_message="A human user. Reply TERMINATE when the task is done or if you want to stop."
)
```
### Advanced Agent Customization

```python
import autogen

# Agent with a custom reply function, attached via register_reply.
# A registered reply function receives (recipient, messages, sender, config)
# and returns (final, reply).
def custom_reply_func(recipient, messages=None, sender=None, config=None):
    last_message = messages[-1]["content"]
    if "hello" in last_message.lower():
        return True, "Hello there! How can I help you today?"
    return True, "I received your message."

custom_agent = autogen.ConversableAgent(
    name="CustomAgent",
    llm_config=False,  # No LLM for this agent
)
custom_agent.register_reply([autogen.Agent, None], custom_reply_func)

# Agent with a specific skill (function calling).
# The schema advertised to the LLM uses the OpenAI function-calling format.
get_stock_price_schema = {
    "name": "get_stock_price",
    "description": "Get the current stock price for a given symbol.",
    "parameters": {
        "type": "object",
        "properties": {
            "symbol": {"type": "string", "description": "Stock symbol"}
        },
        "required": ["symbol"],
    },
}

def get_stock_price(symbol: str) -> str:
    # Implement stock price retrieval logic here
    return f"The price of {symbol} is $150."

stock_analyst_agent = autogen.AssistantAgent(
    name="StockAnalyst",
    llm_config={
        "config_list": config_list_openai,
        "functions": [get_stock_price_schema],
    },
    # Map the advertised function name to the Python callable that executes it
    function_map={"get_stock_price": get_stock_price},
)
```
### Specialized Agent Types

```python
import autogen
from autogen.agentchat.contrib.teachable_agent import TeachableAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# TeachableAgent for learning from feedback
teachable_agent = TeachableAgent(
    name="TeachableAnalyst",
    llm_config={"config_list": config_list_openai},
    teach_config={
        "verbosity": 0,  # 0 = basic info; higher values print more memory detail
        "reset_db": False,  # Set to True to clear previous learnings
        "path_to_db_dir": "./teachable_agent_db"
    }
)

# RetrieveUserProxyAgent for RAG (Retrieval Augmented Generation)
rag_agent = RetrieveUserProxyAgent(
    name="RAGAgent",
    human_input_mode="TERMINATE",
    retrieve_config={
        "task": "qa",
        "docs_path": "./documents_for_rag",
        "chunk_token_size": 2000,
        "model": config_list_openai[0]["model"],
        "collection_name": "rag_collection",
        "get_or_create": True
    }
)
```
## Agent Conversations

### Two-Agent Chat

```python
import autogen

# Initiate a chat between user_proxy and assistant
user_proxy.initiate_chat(
    assistant,
    message="What is the capital of France?",
    summary_method="reflection_with_llm",  # Summarize the conversation with an LLM
    max_turns=5
)

# Example with code execution
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script to print numbers from 1 to 5 and run it."
)
```
### Group Chat with Multiple Agents

```python
import autogen

# Define agents for the group chat
planner = autogen.AssistantAgent(
    name="Planner",
    llm_config={"config_list": config_list_openai},
    system_message="You are a project planner. Create detailed plans for tasks."
)
engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config={"config_list": config_list_openai},
    system_message="You are a software engineer. Implement the plans provided."
)
reviewer = autogen.AssistantAgent(
    name="Reviewer",
    llm_config={"config_list": config_list_openai},
    system_message="You are a code reviewer. Review the implemented code for quality."
)

# Create the group chat and its manager
group_chat = autogen.GroupChat(
    agents=[user_proxy, planner, engineer, reviewer],
    messages=[],
    max_round=12,
    speaker_selection_method="auto"  # auto, round_robin, random, manual
)
manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config={"config_list": config_list_openai}
)

# Initiate the group chat
user_proxy.initiate_chat(
    manager,
    message="Develop a Python script to calculate Fibonacci numbers up to n."
)
```
### Advanced Conversation Control

```python
import autogen

# Custom speaker selection: pass a callable as speaker_selection_method
def custom_speaker_selector(last_speaker, groupchat):
    if last_speaker is user_proxy:
        return planner
    elif last_speaker is planner:
        return engineer
    elif last_speaker is engineer:
        return reviewer
    else:
        return user_proxy

custom_group_chat = autogen.GroupChat(
    agents=[user_proxy, planner, engineer, reviewer],
    messages=[],
    speaker_selection_method=custom_speaker_selector
)

# Nested chats: an agent can spin off a side conversation mid-task
def initiate_nested_chat(recipient, message):
    user_proxy.initiate_chat(recipient, message=message, clear_history=False)

# Illustrative sketch of an agent that delegates a sub-task to a specialist
# (specialist_agent is assumed to be defined elsewhere)
class MainAgent(autogen.AssistantAgent):
    def generate_reply(self, messages=None, sender=None, **kwargs):
        needs_specialized_help = True  # placeholder for your own routing logic
        if needs_specialized_help:
            initiate_nested_chat(specialist_agent, "Need help with this sub-task.")
            # ... process specialist_agent's response ...
        return "Main task processed."
```
## Tool and Function Integration

### Using Built-in Tools

AutoGen doesn't ship a large catalog of pre-built tools like some other frameworks. Instead, it focuses on letting agents execute code (Python scripts, shell commands), which can in turn call any library or tool available in the execution environment.
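As a small illustration of that philosophy (a sketch, assuming the `user_proxy` defined earlier with its `code_execution_config`), a fenced code block arriving in a message is executed locally and its output becomes the reply:

```python
# A UserProxyAgent with code_execution_config executes code blocks found in
# incoming messages. Here we hand it a message directly; in a real workflow
# the block would come from an LLM-backed agent.
reply = user_proxy.generate_reply(
    messages=[{
        "role": "user",
        "content": "```python\nimport platform\nprint(platform.python_version())\n```",
    }]
)
print(reply)  # includes the exit code and captured stdout of the executed block
```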
### Custom Function Calling (Skills)

```python
import autogen

# Function schema advertised to the LLM (OpenAI function-calling format)
get_weather_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"],
    },
}

# Define the function (skill) itself
def get_weather(location: str) -> str:
    """Get the current weather for a given location."""
    # Replace with an actual API call
    if location == "London":
        return "Weather in London is 15°C and cloudy."
    elif location == "Paris":
        return "Weather in Paris is 18°C and sunny."
    else:
        return f"Weather data not available for {location}."

# Agent that can request the function
weather_assistant = autogen.AssistantAgent(
    name="WeatherAssistant",
    llm_config={
        "config_list": config_list_openai,
        "functions": [get_weather_schema],
    },
    # Map the function name to the callable that executes it
    function_map={"get_weather": get_weather},
)

# User proxy to trigger the function call
user_proxy.initiate_chat(
    weather_assistant,
    message="What is the weather in London?"
)
```
### Code Execution

```python
import autogen

# The UserProxyAgent handles code execution when `code_execution_config`
# is set (see Basic Agent Creation above).

# Example: the assistant writes code and the UserProxyAgent executes it
coder_agent = autogen.AssistantAgent(
    name="Coder",
    llm_config={"config_list": config_list_openai}
)
user_proxy.initiate_chat(
    coder_agent,
    message="Write a Python script that creates a file named 'test.txt' with content 'Hello AutoGen!' and then execute it."
)
# With human_input_mode="ALWAYS", the UserProxyAgent asks the human before
# each reply; with "TERMINATE" or "NEVER" it executes code automatically,
# up to max_consecutive_auto_reply.
```
## Human-in-the-Loop (HIL)

### Configuring Human Input

```python
import autogen

llm_config = {"config_list": config_list_openai}

# UserProxyAgent configured for human input
hil_user_proxy = autogen.UserProxyAgent(
    name="HumanReviewer",
    human_input_mode="ALWAYS",  # ALWAYS: human input required for every message
                                # TERMINATE: human input required if no auto-reply, or to terminate
                                # NEVER: no human input (fully autonomous)
    is_termination_msg=lambda x: x.get("content", "").rstrip() == "APPROVE"
)

# Example workflow with human review
planner = autogen.AssistantAgent(name="Planner", llm_config=llm_config)
executor = autogen.AssistantAgent(name="Executor", llm_config=llm_config)

groupchat_with_review = autogen.GroupChat(
    agents=[hil_user_proxy, planner, executor],
    messages=[],
    max_round=10
)
manager_with_review = autogen.GroupChatManager(
    groupchat=groupchat_with_review, llm_config=llm_config
)

hil_user_proxy.initiate_chat(
    manager_with_review,
    message="Plan and execute a task to summarize a long document. I will review the plan and the final summary."
)
```
### Asynchronous Human Input

AutoGen primarily handles HIL synchronously within the conversation flow. For more complex asynchronous HIL, you would typically integrate with an external task-management or UI system.
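One way to wire that up is to override the async input hook `a_get_human_input`, which `ConversableAgent` exposes for exactly this kind of customization. The sketch below (everything beyond that hook and `a_initiate_chat` is an assumption: the queue, the class name, and the simulated external reply) pulls human replies from an `asyncio.Queue` that an external system such as a web UI could feed:

```python
import asyncio
import autogen

class AsyncHumanProxy(autogen.UserProxyAgent):
    """UserProxyAgent whose human replies come from an external queue."""

    def __init__(self, reply_queue: asyncio.Queue, **kwargs):
        super().__init__(**kwargs)
        self._reply_queue = reply_queue

    async def a_get_human_input(self, prompt: str) -> str:
        # Instead of blocking on the console, await a reply pushed by an
        # external system (web UI, ticketing tool, etc.)
        print(f"[waiting for external reply] {prompt}")
        return await self._reply_queue.get()

async def main():
    replies = asyncio.Queue()
    proxy = AsyncHumanProxy(
        replies,
        name="AsyncHuman",
        human_input_mode="ALWAYS",
        code_execution_config=False,
    )
    assistant = autogen.AssistantAgent(
        name="Assistant",
        llm_config={"config_list": config_list_openai},
    )
    # Simulate the external system answering ("exit" ends the chat)
    asyncio.get_running_loop().call_later(1.0, replies.put_nowait, "exit")
    await proxy.a_initiate_chat(assistant, message="Draft a short project plan.")

asyncio.run(main())
```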
## Advanced Features

### Teachable Agents

```python
import autogen
from autogen.agentchat.contrib.teachable_agent import TeachableAgent

# Set up the TeachableAgent
teachable_coder = TeachableAgent(
    name="TeachableCoder",
    llm_config={"config_list": config_list_openai},
    teach_config={
        "verbosity": 1,
        "reset_db": False,
        "path_to_db_dir": "./teachable_coder_db",
        "recall_threshold": 1.5,  # Higher value means less recall
    }
)

# The user teaches the agent
user_proxy.initiate_chat(
    teachable_coder,
    message="When I ask for a quick sort algorithm, always implement it in Python using a recursive approach."
)

# Later, the agent uses the learned information
user_proxy.initiate_chat(
    teachable_coder,
    message="Implement a quick sort algorithm."
)

# To clear learnings, set teach_config["reset_db"] = True and re-initialize the agent
```
### Retrieval Augmented Generation (RAG)

```python
import autogen
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# Ensure you have a directory with documents (e.g., ./my_documents)
# Supported formats include: .txt, .md, .pdf, .html, .htm, .json, .jsonl, .csv,
# .tsv, .xls, .xlsx, .doc, .docx, .ppt, .pptx, .odt, .rtf, .epub

# Create a RetrieveAssistantAgent (combines an LLM with retrieval)
retrieval_assistant = RetrieveAssistantAgent(
    name="RetrievalAssistant",
    system_message="You are a helpful assistant that answers questions based on provided documents.",
    llm_config={"config_list": config_list_openai}
)

# Create a RetrieveUserProxyAgent to handle document processing and querying
rag_user_proxy = RetrieveUserProxyAgent(
    name="RAGUserProxy",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=5,
    retrieve_config={
        "task": "qa",  # Can be "qa", "code", or "default"
        "docs_path": "./my_documents",  # Path to your documents
        "chunk_token_size": 2000,
        "model": config_list_openai[0]["model"],
        "collection_name": "my_rag_collection",
        "get_or_create": True,  # Create the collection if it doesn't exist
        "embedding_model": "all-mpnet-base-v2"  # Example sentence-transformers model
    },
    code_execution_config=False
)

# Initiate the RAG chat: the RAGUserProxy retrieves relevant document chunks
# and passes them to the RetrievalAssistant as context.
rag_user_proxy.initiate_chat(
    retrieval_assistant,
    problem="What are the main features of AutoGen according to the documents?"
)

# To add or update documents, re-index or manage the underlying collection.
# Some RAG setups also support "update_context": True in retrieve_config.
```
### Multi-Modal Conversations

AutoGen supports multi-modal inputs (e.g., images) when the underlying LLM supports them (such as GPT-4V).
```python
import os
import base64
import autogen

# Ensure your config list points to a multimodal LLM (e.g., gpt-4-vision-preview)
multimodal_config_list = [
    {
        "model": "gpt-4-vision-preview",
        "api_key": os.environ.get("OPENAI_API_KEY")
    }
]

multimodal_agent = autogen.AssistantAgent(
    name="MultimodalAgent",
    llm_config={"config_list": multimodal_config_list}
)

# Example message with an image URL
user_proxy.initiate_chat(
    multimodal_agent,
    message=[
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
    ]
)

# Example with a local image. For local images with OpenAI-style APIs,
# you typically base64-encode the file into a data URL.
def image_to_base64(image_path):
    with open(image_path, "rb") as img_file:
        return base64.b64encode(img_file.read()).decode("utf-8")

local_image_path = "./path_to_your_image.png"
base64_image = image_to_base64(local_image_path)

user_proxy.initiate_chat(
    multimodal_agent,
    message=[
        {"type": "text", "text": "Describe this local image:"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
    ]
)
```
## Agent Workflow Patterns

### Reflection and Self-Correction

```python
import autogen

# Agent that reflects on its own output
self_reflecting_agent = autogen.AssistantAgent(
    name="Reflector",
    llm_config={"config_list": config_list_openai},
    system_message="You are an AI that writes code. After writing code, reflect on its quality and correctness. If you find issues, try to correct them."
)

# User proxy to facilitate reflection
reflection_user_proxy = autogen.UserProxyAgent(
    name="ReflectionProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3  # Allow a few turns for reflection
)

# This pattern is often implemented as a sequence of chats or a GroupChat
# where one agent produces work, another critiques it, and the first revises.
# Simplified single-agent example:
user_proxy.initiate_chat(
    self_reflecting_agent,
    message="Write a Python function to calculate factorial. Then, review your code for potential bugs or improvements and provide a revised version if necessary."
)
```
### Hierarchical Agent Teams

This pattern is typically achieved with a GroupChatManager in which one agent (e.g., a manager or planner) coordinates other specialized agents.

```python
import autogen

llm_config = {"config_list": config_list_openai}

# Manager agent
manager_agent = autogen.AssistantAgent(
    name="Manager",
    llm_config=llm_config,
    system_message="You are a project manager. Delegate tasks to your team (Engineer, Researcher) and synthesize their results."
)

# Specialist agents
engineer_agent = autogen.AssistantAgent(name="Engineer", llm_config=llm_config)
researcher_agent = autogen.AssistantAgent(name="Researcher", llm_config=llm_config)

# Group chat for the team
team_groupchat = autogen.GroupChat(
    agents=[user_proxy, manager_agent, engineer_agent, researcher_agent],
    messages=[],
    max_round=15,
    # Simplified example: route every other turn through the manager
    speaker_selection_method=lambda last_speaker, groupchat: (
        manager_agent if last_speaker is not manager_agent else user_proxy
    )
)
team_manager = autogen.GroupChatManager(
    groupchat=team_groupchat, llm_config=llm_config
)

user_proxy.initiate_chat(
    team_manager,
    message="Develop a new feature for our app that requires research on user needs and then engineering implementation."
)
```
## Best Practices

### Agent Design
- Clear Roles: Define specific, unambiguous roles and responsibilities for each agent.
- System Messages: Use detailed system messages to guide agent behavior and persona.
- Tool Access: Provide agents only with the tools they need for their role.
- LLM Configuration: Tailor the LLM temperature, model, and other settings per agent for optimal performance, as in the sketch below.
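For instance (a sketch; the model names and settings are illustrative choices, not recommendations), a creative brainstormer and a deterministic code fixer can run on different models and temperatures:

```python
import os
import autogen

# Creative agent: larger model, higher temperature for diverse output
brainstormer = autogen.AssistantAgent(
    name="Brainstormer",
    llm_config={
        "config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}],
        "temperature": 1.0,
    },
    system_message="Generate diverse, unconventional ideas.",
)

# Deterministic agent: cheaper model, temperature 0 for repeatable output
fixer = autogen.AssistantAgent(
    name="Fixer",
    llm_config={
        "config_list": [{"model": "gpt-3.5-turbo", "api_key": os.environ["OPENAI_API_KEY"]}],
        "temperature": 0.0,
    },
    system_message="Apply precise, minimal fixes to code.",
)
```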
### Conversation Management

- Termination Conditions: Clearly define when a conversation or task is complete.
- Max Turns/Rounds: Set limits to prevent infinite loops or excessive costs.
- Speaker Selection: Choose an appropriate speaker selection method for group chats (auto, round_robin, custom).
- Summarization: Use conversation summarization for long-running chats to manage the context window; the sketch below combines these controls.
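A compact sketch combining a turn limit with LLM-based summarization (assuming the `user_proxy` and `assistant` defined earlier; recent pyautogen releases return a `ChatResult` whose `summary` attribute holds the summary):

```python
# Bounded chat whose history is summarized by the LLM at the end
result = user_proxy.initiate_chat(
    assistant,
    message="Refactor utils.py and reply TERMINATE when done.",
    max_turns=6,                            # hard cap on the exchange
    summary_method="reflection_with_llm",   # LLM-written summary of the chat
)
print(result.summary)
```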
### Code Execution Security

- Sandboxing: Use Docker (`use_docker=True` in `code_execution_config`) for safer code execution, especially with untrusted code; see the sketch below.
- Human Review: Require human review (`human_input_mode="ALWAYS"` or `"TERMINATE"`) before executing potentially risky code.
- Restricted Environments: If Docker is not used, make sure the execution environment has limited permissions.
### Cost Management

- Model Selection: Use cheaper models (e.g., GPT-3.5-turbo) for simpler tasks or agents.
- Max Tokens/Turns: Limit conversation length and LLM output size.
- Caching: Use `autogen.ChatCompletion.set_cache()` to cache LLM responses and avoid redundant calls, as sketched below.
- Monitoring: Track token usage and API costs closely.
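The legacy `autogen.ChatCompletion.set_cache()` call appears in Environment Configuration above; newer pyautogen releases express the same idea as a `cache_seed` entry in `llm_config`, which is the variant assumed in this sketch:

```python
import os
import autogen

# Identical prompts with the same cache seed hit the on-disk cache
# instead of triggering a new (billed) API call.
cheap_assistant = autogen.AssistantAgent(
    name="CheapAssistant",
    llm_config={
        "config_list": [{"model": "gpt-3.5-turbo", "api_key": os.environ["OPENAI_API_KEY"]}],
        "cache_seed": 42,  # same seed -> cached responses are reused
    },
)
```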
### Debugging

- Detailed Logging: AutoGen provides logging; increase verbosity when debugging.
- Step-by-Step Execution: For complex group chats, consider manual speaker selection or breakpoints to understand the flow.
- Agent Isolation: Test agents individually before integrating them into larger groups.
## Troubleshooting

### Common Issues

#### Agents Stuck in Loops

- Cause: Vague termination conditions, conflicting agent goals, or overly complex interactions.
- Solution: Refine the `is_termination_msg` lambda, simplify agent instructions, and set `max_consecutive_auto_reply` or `max_round` limits; a combined example follows this list.
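For example (a sketch using only parameters shown earlier in this cheat sheet, and the `planner`/`engineer` agents from the group-chat section):

```python
import autogen

# Tightened loop guards: explicit terminator plus hard reply/round limits
guarded_proxy = autogen.UserProxyAgent(
    name="GuardedProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,  # stop runaway back-and-forth
    is_termination_msg=lambda m: m.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config=False,
)
guarded_chat = autogen.GroupChat(
    agents=[guarded_proxy, planner, engineer],
    messages=[],
    max_round=8,  # upper bound on total group-chat rounds
)
```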
#### Unexpected Agent Behavior

- Cause: Ambiguous system messages, LLM misinterpretations, or incorrect LLM configurations.
- Solution: Make system messages more specific, experiment with different LLM temperatures, and double-check function/tool descriptions.
#### Code Execution Failures

- Cause: Missing dependencies in the execution environment, incorrectly generated code from the LLM, or permission problems.
- Solution: Ensure all required packages are installed (or use Docker), improve the prompts used for code generation, and check file/network permissions.
#### Function Calling Issues

- Cause: Incorrect function descriptions provided to the LLM, bugs in the custom function code, or the LLM failing to produce valid JSON arguments.
- Solution: Make sure function descriptions are clear and match the parameters, test custom functions thoroughly, and refine prompts to guide the LLM toward correct JSON output.

This AutoGen cheat sheet provides a comprehensive guide to building sophisticated multi-agent AI applications. By leveraging AutoGen's conversational framework, developers can create highly capable, collaborative AI systems. Remember to consult the official AutoGen documentation for the latest features and detailed API references.