
# AutoGen Multi-Agent Framework Cheat Sheet

## Overview

AutoGen is a groundbreaking open-source framework from Microsoft Research that transforms large language model (LLM) application development by letting multiple specialized AI agents converse with one another, collaborate on tasks, and seamlessly include humans in the loop.

Unlike traditional single-agent systems, AutoGen emphasizes conversation as the primary mechanism for agent interaction, enabling natural, flexible, and dynamic collaboration. The framework provides a rich set of tools for defining agent roles, capabilities, and communication protocols, letting you build highly adaptable, intelligent systems that handle tasks as varied as code generation, data analysis, creative writing, and strategic planning.

The framework is designed for simplicity and extensibility, offering high-level abstractions for common multi-agent patterns alongside deep customization options for advanced use cases. With its event-driven architecture and support for a wide range of LLMs and tools, AutoGen empowers developers to build a new generation of AI applications that are more capable, robust, and human-centric than ever before.


## Installation

```bash
# Install AutoGen
pip install pyautogen

# Install with specific integrations (e.g., OpenAI)
pip install "pyautogen[openai]"

# Install development version
pip install git+https://github.com/microsoft/autogen.git

# Install with all optional dependencies
pip install "pyautogen[all]"
```


### Environment Configuration
```python
import os
import autogen

# Configure LLM provider (OpenAI example)
config_list_openai = [
    {
        "model": "gpt-4",
        "api_key": os.environ.get("OPENAI_API_KEY")
    },
    {
        "model": "gpt-3.5-turbo",
        "api_key": os.environ.get("OPENAI_API_KEY")
    }
]

# Configure for other LLMs (e.g., Azure OpenAI, local models)
# See AutoGen documentation for specific configurations

# Enable caching of LLM responses (the seed also makes runs reproducible)
autogen.ChatCompletion.set_cache(seed=42)

```

### Project Structure

```

autogen_project/
├── agents/
│   ├── __init__.py
│   ├── researcher_agent.py
│   └── coder_agent.py
├── workflows/
│   ├── __init__.py
│   ├── coding_workflow.py
│   └── research_workflow.py
├── tools/
│   ├── __init__.py
│   └── custom_tools.py
├── skills/
│   ├── __init__.py
│   └── code_execution_skill.py
├── config/
│   ├── __init__.py
│   └── llm_config.py
└── main.py

```

## Core Concepts

### Agents

Agents are the fundamental building blocks in AutoGen. They are conversational entities that can send and receive messages, execute code, call functions, and interact with humans.

#### ConversableAgent

This is the base class for most agents in AutoGen, providing core conversational capabilities.
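
As a quick illustration, a ConversableAgent can be instantiated directly and asked for a one-off reply. This is a minimal sketch assuming the `config_list_openai` defined under Environment Configuration:

```python
import autogen

# Minimal sketch: a bare ConversableAgent backed by an LLM
# (assumes config_list_openai from the Environment Configuration section)
generic_agent = autogen.ConversableAgent(
    name="GenericAgent",
    llm_config={"config_list": config_list_openai},
    system_message="You answer questions briefly.",
    human_input_mode="NEVER",
)

# generate_reply produces a single response without starting a full chat
print(generic_agent.generate_reply(
    messages=[{"role": "user", "content": "Summarize AutoGen in one sentence."}]
))
```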

#### UserProxyAgent

A specialized agent that acts as a proxy for human users, allowing them to participate in conversations, provide input, and execute code.

#### AssistantAgent

An agent designed to act as an AI assistant, typically powered by an LLM, capable of writing code, answering questions, and performing tasks.

#### GroupChat

AutoGen supports multi-agent conversations through GroupChat and GroupChatManager, enabling complex interactions between multiple agents.

## Agent Configuration

### Basic Agent Creation

```python
import autogen

# Assistant Agent (LLM-powered)
assistant = autogen.AssistantAgent(
    name="Assistant",
    llm_config={
        "config_list": config_list_openai,
        "temperature": 0.7,
        "timeout": 600
    },
    system_message="You are a helpful AI assistant. Provide concise and accurate answers."
)

# User Proxy Agent (human in the loop)
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="TERMINATE",  # Options: ALWAYS, TERMINATE, NEVER
    max_consecutive_auto_reply=10,
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding_output",
        "use_docker": False  # Set to True to use Docker for code execution
    },
    system_message="A human user. Reply TERMINATE when the task is done or if you want to stop."
)
```

### Advanced Agent Customization

```python
import autogen

# Agent with a custom reply function, registered via register_reply.
# The function returns (final, reply): final=True stops further reply generation.
def custom_reply_func(recipient, messages=None, sender=None, config=None):
    last_message = messages[-1]["content"]
    if "hello" in last_message.lower():
        return True, "Hello there! How can I help you today?"
    return True, "I received your message."

custom_agent = autogen.ConversableAgent(
    name="CustomAgent",
    llm_config=False  # No LLM for this agent
)
custom_agent.register_reply([autogen.Agent, None], custom_reply_func)

# Agent with a specific skill (function calling)
def get_stock_price(symbol: str) -> str:
    # Implement stock price retrieval logic
    return f"The price of {symbol} is $150."

# The JSON schema in llm_config tells the LLM when and how to request the call;
# function_map binds the name to the callable on the agent that executes it.
stock_analyst_agent = autogen.AssistantAgent(
    name="StockAnalyst",
    llm_config={
        "config_list": config_list_openai,
        "functions": [
            {
                "name": "get_stock_price",
                "description": "Get the current stock price for a given symbol.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "symbol": {"type": "string", "description": "Stock symbol"}
                    },
                    "required": ["symbol"]
                }
            }
        ]
    },
    function_map={"get_stock_price": get_stock_price}
)
```

### Specialized Agent Types

```python
import autogen
from autogen.agentchat.contrib.teachable_agent import TeachableAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# TeachableAgent for learning from feedback
teachable_agent = TeachableAgent(
    name="TeachableAnalyst",
    llm_config={"config_list": config_list_openai},
    teach_config={
        "verbosity": 0,  # 0 for no teaching output, 1 for normal, 2 for detailed
        "reset_db": False,  # Set to True to clear previous learnings
        "path_to_db_dir": "./teachable_agent_db"
    }
)

# RetrieveUserProxyAgent for RAG (Retrieval Augmented Generation)
rag_agent = RetrieveUserProxyAgent(
    name="RAGAgent",
    human_input_mode="TERMINATE",
    retrieve_config={
        "task": "qa",
        "docs_path": "./documents_for_rag",
        "chunk_token_size": 2000,
        "model": config_list_openai[0]["model"],
        "collection_name": "rag_collection",
        "get_or_create": True
    }
)
```

## Agent Conversations

### Two-Agent Chat

```python
import autogen

# Initiate chat between user_proxy and assistant
user_proxy.initiate_chat(
    assistant,
    message="What is the capital of France?",
    summary_method="reflection_with_llm",  # For summarizing the conversation history
    max_turns=5
)

# Example with code execution
user_proxy.initiate_chat(
    assistant,
    message="Write a Python script to print numbers from 1 to 5 and run it."
)
```

### Group Chat with Multiple Agents

```python
import autogen

# Define agents for group chat
planner = autogen.AssistantAgent(
    name="Planner",
    llm_config={"config_list": config_list_openai},
    system_message="You are a project planner. Create detailed plans for tasks."
)

engineer = autogen.AssistantAgent(
    name="Engineer",
    llm_config={"config_list": config_list_openai},
    system_message="You are a software engineer. Implement the plans provided."
)

reviewer = autogen.AssistantAgent(
    name="Reviewer",
    llm_config={"config_list": config_list_openai},
    system_message="You are a code reviewer. Review the implemented code for quality."
)

# Create group chat and manager
group_chat = autogen.GroupChat(
    agents=[user_proxy, planner, engineer, reviewer],
    messages=[],
    max_round=12,
    speaker_selection_method="auto"  # auto, round_robin, random, manual
)

manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config={"config_list": config_list_openai}
)

# Initiate group chat
user_proxy.initiate_chat(
    manager,
    message="Develop a Python script to calculate Fibonacci numbers up to n."
)
```

### Advanced Conversation Control

```python
import autogen

# Custom speaker selection: the callable receives the last speaker and the
# GroupChat and returns the next agent to speak.
def custom_speaker_selector(last_speaker, groupchat):
    if last_speaker is user_proxy:
        return planner
    elif last_speaker is planner:
        return engineer
    elif last_speaker is engineer:
        return reviewer
    else:
        return user_proxy

custom_group_chat = autogen.GroupChat(
    agents=[user_proxy, planner, engineer, reviewer],
    messages=[],
    speaker_selection_method=custom_speaker_selector
)

# Nested chats: start a side conversation without clearing history
def initiate_nested_chat(recipient, message):
    user_proxy.initiate_chat(recipient, message=message, clear_history=False)

# Sketch of an agent delegating to a nested chat.
# `needs_specialized_help` and `specialist_agent` are placeholders for your own logic.
class MainAgent(autogen.AssistantAgent):
    def generate_reply(self, messages=None, sender=None, **kwargs):
        # ... logic ...
        if needs_specialized_help:
            initiate_nested_chat(specialist_agent, "Need help with this sub-task.")
            # ... process specialist_agent response ...
        return "Main task processed."
```

## Tool and Function Integration

### Using Built-in Tools

Unlike some other frameworks, AutoGen does not ship a large catalog of pre-built tools. Instead, it focuses on enabling agents to execute code (Python scripts, shell commands), which can then interact with any library or tool available in the execution environment.
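
For example, an agent configured for code execution can run a (language, code) block directly. A minimal sketch, assuming the `user_proxy` defined under Basic Agent Creation:

```python
# Minimal sketch (assumes the user_proxy defined in Basic Agent Creation):
# run a code block directly through the agent's code-execution machinery.
exitcode, logs = user_proxy.execute_code_blocks(
    [("python", "import platform\nprint(platform.python_version())")]
)
print(f"exit code: {exitcode}")  # 0 on success
print(logs)                      # captured stdout/stderr
```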

### Custom Function Calling (Skills)

```python
import autogen

# Define a function (skill)
def get_weather(location: str) -> str:
    """Get the current weather for a given location."""
    # Replace with an actual API call
    if location == "London":
        return "Weather in London is 15°C and cloudy."
    elif location == "Paris":
        return "Weather in Paris is 18°C and sunny."
    else:
        return f"Weather data not available for {location}."

# Agent that can call the function: the JSON schema in llm_config tells the
# LLM when and how to request the call.
weather_assistant = autogen.AssistantAgent(
    name="WeatherAssistant",
    llm_config={
        "config_list": config_list_openai,
        "functions": [
            {
                "name": "get_weather",
                "description": "Get the current weather for a given location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"}
                    },
                    "required": ["location"]
                }
            }
        ]
    }
)

# The agent that receives the function-call message (here, the user proxy)
# must hold the mapping from name to callable so it can execute the call.
user_proxy.register_function(function_map={"get_weather": get_weather})

# User proxy to trigger the function call
user_proxy.initiate_chat(
    weather_assistant,
    message="What is the weather in London?"
)
```

### Code Execution

```python
import autogen

# UserProxyAgent executes code blocks it receives when
# `code_execution_config` is set appropriately (see Basic Agent Creation).

# Example: the assistant writes code, the user proxy runs it
coder_agent = autogen.AssistantAgent(
    name="Coder",
    llm_config={"config_list": config_list_openai}
)

user_proxy.initiate_chat(
    coder_agent,
    message="Write a Python script that creates a file named 'test.txt' with content 'Hello AutoGen!' and then execute it."
)
# With human_input_mode="TERMINATE" or "ALWAYS", the UserProxyAgent will
# prompt for confirmation before executing the code.
```

## Human-in-the-Loop (HIL)

### Configuring Human Input

```python
import autogen

# UserProxyAgent configured for human input
hil_user_proxy = autogen.UserProxyAgent(
    name="HumanReviewer",
    human_input_mode="ALWAYS",  # ALWAYS: human input required for every message
                                # TERMINATE: human input requested when no auto-reply applies, or to terminate
                                # NEVER: no human input (fully autonomous)
    is_termination_msg=lambda x: x.get("content", "").rstrip() == "APPROVE"
)

# Example workflow with human review
llm_config = {"config_list": config_list_openai}
planner = autogen.AssistantAgent(name="Planner", llm_config=llm_config)
executor = autogen.AssistantAgent(name="Executor", llm_config=llm_config)

groupchat_with_review = autogen.GroupChat(
    agents=[hil_user_proxy, planner, executor],
    messages=[],
    max_round=10
)
manager_with_review = autogen.GroupChatManager(
    groupchat=groupchat_with_review, llm_config=llm_config
)

hil_user_proxy.initiate_chat(
    manager_with_review,
    message="Plan and execute a task to summarize a long document. I will review the plan and the final summary."
)
```

### Asynchronous Human Input

AutoGen primarily handles HIL synchronously within the conversation flow. For more complex asynchronous HIL, you would typically integrate with external task management or UI systems.
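
One pragmatic integration point is overriding `get_human_input` (or its async counterpart `a_get_human_input`) on a UserProxyAgent so replies come from an external channel rather than the console. A minimal sketch, where `fetch_reply_from_ui` is a hypothetical helper that blocks until your UI or task queue supplies a response:

```python
import autogen

def fetch_reply_from_ui(prompt: str) -> str:
    # Hypothetical helper: block until an external UI or task queue
    # returns the human's reply.
    raise NotImplementedError

class ExternalInputUserProxy(autogen.UserProxyAgent):
    def get_human_input(self, prompt: str) -> str:
        # Route the input request to the external system instead of stdin
        return fetch_reply_from_ui(prompt)

external_proxy = ExternalInputUserProxy(
    name="ExternalReviewer",
    human_input_mode="ALWAYS",
)
```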

## Advanced Features

### Teachable Agents

```python
import autogen
from autogen.agentchat.contrib.teachable_agent import TeachableAgent

# Set up TeachableAgent
teachable_coder = TeachableAgent(
    name="TeachableCoder",
    llm_config={"config_list": config_list_openai},
    teach_config={
        "verbosity": 1,
        "reset_db": False,
        "path_to_db_dir": "./teachable_coder_db",
        "recall_threshold": 1.5,  # Higher value means less recall
    }
)

# User teaches the agent
user_proxy.initiate_chat(
    teachable_coder,
    message="When I ask for a quick sort algorithm, always implement it in Python using a recursive approach."
)

# Later, the agent uses the learned information
user_proxy.initiate_chat(
    teachable_coder,
    message="Implement a quick sort algorithm."
)

# To clear learnings, set teach_config["reset_db"] = True and re-initialize the agent.
```

### Retrieval Augmented Generation (RAG)

```python
import autogen
from autogen.agentchat.contrib.retrieve_assistant_agent import RetrieveAssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

# Ensure you have a directory with documents (e.g., ./my_documents)
# Supported formats include: .txt, .md, .pdf, .html, .htm, .json, .jsonl, .csv, .tsv, .xls, .xlsx, .doc, .docx, .ppt, .pptx, .odt, .rtf, .epub

# Create a RetrieveAssistantAgent (combines LLM with retrieval)
retrieval_assistant = RetrieveAssistantAgent(
    name="RetrievalAssistant",
    system_message="You are a helpful assistant that answers questions based on provided documents.",
    llm_config={"config_list": config_list_openai}
)

# Create a RetrieveUserProxyAgent to handle document processing and querying
rag_user_proxy = RetrieveUserProxyAgent(
    name="RAGUserProxy",
    human_input_mode="TERMINATE",
    max_consecutive_auto_reply=5,
    retrieve_config={
        "task": "qa",  # Can be "qa", "code", "chat"
        "docs_path": "./my_documents",  # Path to your documents
        "chunk_token_size": 2000,
        "model": config_list_openai[0]["model"],
        "collection_name": "my_rag_collection",
        "get_or_create": True,  # Creates the collection if it doesn't exist
        "embedding_model": "all-mpnet-base-v2"  # Example sentence-transformers model
    },
    code_execution_config=False
)

# Initiate RAG chat
# The RAGUserProxy retrieves relevant document chunks and passes them to the RetrievalAssistant.
rag_user_proxy.initiate_chat(
    retrieval_assistant,
    problem="What are the main features of AutoGen according to the documents?"
)

# To update or add new documents, re-index or manage the collection, e.g.
# rag_user_proxy.retrieve_config["update_context"] = True (for some RAG setups)
```

### Multi-Modal Conversations

AutoGen supports multi-modal inputs (e.g., images) if the underlying LLM supports it (like GPT-4V).

```python
import os
import autogen

# Ensure your config_list points to a multimodal LLM (e.g., gpt-4-vision-preview)
multimodal_config_list = [
    {
        "model": "gpt-4-vision-preview",
        "api_key": os.environ.get("OPENAI_API_KEY")
    }
]

multimodal_agent = autogen.AssistantAgent(
    name="MultimodalAgent",
    llm_config={"config_list": multimodal_config_list}
)

# Example message with an image URL
user_proxy.initiate_chat(
    multimodal_agent,
    message=[
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
    ]
)

# Example with a local image. For local images with OpenAI-style APIs,
# you typically base64-encode them into a data URL.
import base64

def image_to_base64(image_path):
    with open(image_path, "rb") as img_file:
        return base64.b64encode(img_file.read()).decode("utf-8")

local_image_path = "./path_to_your_image.png"
base64_image = image_to_base64(local_image_path)

user_proxy.initiate_chat(
    multimodal_agent,
    message=[
        {"type": "text", "text": "Describe this local image:"},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}}
    ]
)
```

## Agent Workflow Patterns

### Reflection and Self-Correction

```python
import autogen

# Agent that reflects on its own output
self_reflecting_agent = autogen.AssistantAgent(
    name="Reflector",
    llm_config={"config_list": config_list_openai},
    system_message="You are an AI that writes code. After writing code, reflect on its quality and correctness. If you find issues, try to correct them."
)

# User proxy to facilitate reflection
reflection_user_proxy = autogen.UserProxyAgent(
    name="ReflectionProxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=3,  # Allow a few turns for reflection
    # A fuller setup would have this proxy or another agent critique the
    # output and feed the critique back for revision.
)

# This pattern is often implemented as a sequence of chats or a GroupChat
# where one agent produces work, another critiques it, and the first revises.

# Simplified example:
user_proxy.initiate_chat(
    self_reflecting_agent,
    message="Write a Python function to calculate factorial. Then, review your code for potential bugs or improvements and provide a revised version if necessary."
)
```

### Hierarchical Agent Teams

This is typically achieved with a GroupChatManager, where one agent (e.g., a manager or planner) coordinates the other specialized agents.

```python
import autogen

llm_config = {"config_list": config_list_openai}

# Manager Agent
manager_agent = autogen.AssistantAgent(
    name="Manager",
    llm_config=llm_config,
    system_message="You are a project manager. Delegate tasks to your team (Engineer, Researcher) and synthesize their results."
)

# Specialist Agents
engineer_agent = autogen.AssistantAgent(name="Engineer", llm_config=llm_config)
researcher_agent = autogen.AssistantAgent(name="Researcher", llm_config=llm_config)

# Group Chat for the team
team_groupchat = autogen.GroupChat(
    agents=[user_proxy, manager_agent, engineer_agent, researcher_agent],
    messages=[],
    max_round=15,
    # Simplified example: route every other turn through the manager
    speaker_selection_method=lambda last_speaker, groupchat: manager_agent if last_speaker != manager_agent else user_proxy
)

team_manager = autogen.GroupChatManager(
    groupchat=team_groupchat, llm_config=llm_config
)

user_proxy.initiate_chat(
    team_manager,
    message="Develop a new feature for our app that requires research on user needs and then engineering implementation."
)
```

## Best Practices

### Agent Design

  • Clear Roles: Define specific, unambiguous roles and responsibilities for each agent.
  • System Messages: Use detailed system messages to guide agent behavior and persona.
  • Tool Access: Provide agents only with the tools they need for their role.
  • LLM Configuration: Tailor LLM temperature, model, and other settings per agent for optimal performance.

### Conversation Management

  • Termination Conditions: Clearly define when a conversation or task is complete.
  • Max Turns/Rounds: Set limits to prevent infinite loops or excessive costs.
  • Speaker Selection: Choose appropriate speaker selection methods for group chats (auto, round_robin, custom).
  • Summarization: Use conversation summarization for long-running chats to manage the context window.

### Code Execution Security

  • Sandboxing: Use Docker (use_docker=True in code_execution_config) for safer code execution, especially with untrusted code; see the sketch after this list.
  • Human Review: Implement human review (human_input_mode="ALWAYS" or "TERMINATE") before executing potentially dangerous code.
  • Restricted Environments: If not using Docker, make sure the execution environment has limited permissions.
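
A minimal sketch of the Docker option, assuming a local Docker daemon and the docker Python package are available (the image name here is just an example):

```python
import autogen

# Route all code execution through a Docker container
sandboxed_proxy = autogen.UserProxyAgent(
    name="SandboxedExecutor",
    human_input_mode="TERMINATE",
    code_execution_config={
        "work_dir": "sandboxed_output",    # shared with the container
        "use_docker": "python:3.11-slim",  # image name; True uses the default image
    },
)
```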

### Cost Management

  • Model Selection: Use cheaper models (e.g., GPT-3.5-turbo) for simpler tasks or agents; see the sketch after this list.
  • Max Tokens/Turns: Limit the length of conversations and LLM outputs.
  • Caching: Use autogen.ChatCompletion.set_cache() to cache LLM responses and reduce duplicate calls.
  • Monitoring: Track token usage and API costs closely.
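
For illustration, a per-agent model split plus caching might look like the sketch below (model names and limits are examples, not recommendations):

```python
import os
import autogen

# Cheaper, shorter model for routine work; stronger model only where needed
cheap_config = {
    "config_list": [{"model": "gpt-3.5-turbo", "api_key": os.environ.get("OPENAI_API_KEY")}],
    "max_tokens": 512,
}
strong_config = {
    "config_list": [{"model": "gpt-4", "api_key": os.environ.get("OPENAI_API_KEY")}],
}

summarizer = autogen.AssistantAgent(name="Summarizer", llm_config=cheap_config)
architect = autogen.AssistantAgent(name="Architect", llm_config=strong_config)

# Cache LLM responses to avoid paying for duplicate calls
autogen.ChatCompletion.set_cache(seed=42)
```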

### Debugging

  • Verbose Logging: AutoGen provides logging; increase the verbosity when debugging. See the sketch after this list.
  • Step-by-Step Execution: For complex group chats, consider manual speaker selection or breakpoints to understand the flow.
  • Agent Isolation: Test agents individually before integrating them into a larger group.
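
For example, you can raise the standard-library log level for the autogen logger during development (a minimal sketch; exact logger names may vary by version):

```python
import logging

# Surface AutoGen's internal log messages while debugging
logging.basicConfig(level=logging.INFO)
logging.getLogger("autogen").setLevel(logging.DEBUG)
```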

## Troubleshooting

### Common Issues

#### Agent Stuck in Loops

  • Cause: Ambiguous termination conditions, conflicting agent goals, or overly complex interactions.
  • Solution: Refine the is_termination_msg lambda, simplify agent instructions, and set max_consecutive_auto_reply or max_round limits.

#### Unexpected Agent Behavior

  • Cause: Vague system messages, LLM misinterpretation, or unsuitable LLM configuration.
  • Solution: Make system messages more specific, experiment with different LLM temperatures, and verify that function/tool descriptions are correct.

#### Code Execution Failures

  • Cause: Missing dependencies in the execution environment, incorrect LLM-generated code, or permission issues.
  • Solution: Ensure all required packages are installed (or use Docker), improve prompts for code generation, and check file/network permissions.

#### Function Calling Issues

  • Cause: Inaccurate function descriptions provided to the LLM, bugs in the custom function code, or the LLM failing to produce valid JSON arguments.
  • Solution: Make sure function descriptions are clear and match the parameters, test custom functions thoroughly, and refine prompts so the LLM produces correctly formatted JSON.

This AutoGen cheat sheet provides a comprehensive guide to building sophisticated multi-agent AI applications. By leveraging AutoGen's conversational framework, developers can create highly capable, collaborative AI systems. Refer to the official AutoGen documentation for the latest features and detailed API references.