Tutorials · AutoGen · Microsoft · Tracing

AutoGen Tracing: Complete Setup Guide for Microsoft AutoGen


Kevin Zhang

Platform Engineer

January 3, 2025 · 8 min read


Microsoft AutoGen has become a leading framework for building conversational AI systems with multiple agents. Its flexible architecture supports everything from simple two-agent chats to complex multi-agent discussions. But debugging these conversations requires specialized tooling.

AutoGen Architecture Overview

Before diving into tracing, let's understand how AutoGen structures conversations. AutoGen uses a message-passing model where agents communicate through conversations. An AssistantAgent might talk to a UserProxyAgent, or multiple agents might participate in a group chat. Each agent can be backed by different LLMs, have different system prompts, and possess different capabilities.

This flexibility is powerful but creates debugging challenges. When a conversation goes wrong, you need to understand the full message history, each agent's decision-making, and how the conversation flowed.

Setting Up OverseeX Tracing

Install the AutoGen integration:

pip install overseex-autogen

Then wrap your agents with tracing:

from autogen import AssistantAgent, UserProxyAgent
from overseex_autogen import trace_autogen_chat

# Define your agents

assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4"}
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER"
)

# Start a traced conversation

with trace_autogen_chat(
    api_key="your_api_key",
    conversation_name="support-chat"
):
    user_proxy.initiate_chat(
        assistant,
        message="Help me write a Python function"
    )

Every message, LLM call, and code execution within the context manager is automatically captured.

Understanding AutoGen Traces

AutoGen traces show the conversation as it actually happened, including all participant messages with full content, LLM reasoning for each response, code execution results (for UserProxyAgents with code execution enabled), and timestamps and latencies for each step.

Conversation Flow

The trace timeline shows messages in order, making it easy to follow the conversation. You can see how agents responded to each other and where conversations went off track.

LLM Calls

For each agent response that involves an LLM call, you see the system prompt for that agent, the full message history provided, the model's response including reasoning, and token usage and latency.

Code Execution

When agents execute code, the trace captures the code that was generated, execution output including stdout and stderr, and any errors or exceptions that occurred.
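For reference, whether an agent executes code at all is controlled by the UserProxyAgent's code_execution_config. A typical setup looks like the following; the specific values here are illustrative, not required:

    # Illustrative code_execution_config for a UserProxyAgent.
    # "work_dir" and "use_docker" are standard AutoGen options; the values are examples.
    code_execution_config = {
        "work_dir": "coding",   # directory where generated scripts are written and run
        "use_docker": False,    # set True to sandbox execution in a container
    }

Passing this dict to UserProxyAgent(..., code_execution_config=code_execution_config) is what makes the execution results above appear in your traces.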

Tracing Group Chats

AutoGen's GroupChat feature enables multi-agent discussions. Tracing these is especially valuable:

from autogen import GroupChat, GroupChatManager

researcher = AssistantAgent(name="researcher", ...)
analyst = AssistantAgent(name="analyst", ...)
writer = AssistantAgent(name="writer", ...)

group_chat = GroupChat(
    agents=[researcher, analyst, writer],
    messages=[],
    max_round=10
)

manager = GroupChatManager(groupchat=group_chat)

with trace_autogen_chat(
    api_key="your_api_key",
    conversation_name="research-team"
):
    researcher.initiate_chat(
        manager,
        message="Let's analyze market trends"
    )

The trace shows which agent spoke when, how the manager selected speakers, and the full group dynamic.

Common AutoGen Issues

Tracing helps identify several common problems in AutoGen applications.

Infinite loops: Agents can get stuck in unproductive back-and-forth. The trace shows repetitive patterns and helps identify the cause.
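You can also catch such loops programmatically with a simple repetition check over the message history. This helper is a standalone sketch, not part of the overseex-autogen API:

    def detect_repetition(messages, window=2, min_repeats=3):
        """Return True if the last `window` messages repeat `min_repeats` times in a row.

        `messages` is a list of message-content strings, oldest first.
        """
        span = window * min_repeats
        if len(messages) < span:
            return False
        tail = messages[-span:]
        pattern = tail[:window]
        # Every consecutive window in the tail must match the first one
        return all(
            tail[i:i + window] == pattern
            for i in range(0, span, window)
        )

    # Example: two agents thanking each other forever
    history = ["Thanks!", "You're welcome!"] * 3
    print(detect_repetition(history))  # True

Running a check like this on each turn lets you terminate a runaway conversation early instead of discovering the loop in the trace afterward.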

Context overflow: Long conversations can exceed model context limits. Traces show message counts and help you implement proper summarization.
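A common mitigation is to keep the system message plus only the most recent turns. A minimal truncation sketch (standalone, not an overseex-autogen or AutoGen API):

    def truncate_history(messages, max_messages=20):
        """Keep the system message (if any) plus the most recent messages.

        `messages` is a list of {"role": ..., "content": ...} dicts, oldest first.
        """
        system = [m for m in messages if m["role"] == "system"]
        rest = [m for m in messages if m["role"] != "system"]
        keep = max_messages - len(system)
        return system + rest[-keep:] if keep > 0 else system

    # A 51-message conversation trimmed to the system prompt + last 19 messages
    history = [{"role": "system", "content": "You are helpful."}]
    history += [{"role": "user", "content": f"msg {i}"} for i in range(50)]
    trimmed = truncate_history(history, max_messages=20)
    print(len(trimmed))  # 20

Real summarization is usually smarter than dropping messages, but even this keeps you under the context limit while the traces tell you how often it fires.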

Code execution failures: When generated code doesn't work, the trace shows what was generated and what errors occurred.

Speaker selection issues: In group chats, the wrong agent might get selected. Traces show the selection process and help tune the manager's behavior.
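One tuning lever is GroupChat's speaker_selection_method parameter, which accepts "auto" (LLM-driven), "round_robin", "random", and "manual". If "auto" keeps picking the wrong agent, "round_robin" is deterministic; conceptually it reduces to this sketch (not AutoGen's actual implementation):

    def round_robin_speaker(agents, last_speaker):
        """Pick the next agent in fixed order, wrapping around."""
        idx = agents.index(last_speaker)
        return agents[(idx + 1) % len(agents)]

    team = ["researcher", "analyst", "writer"]
    print(round_robin_speaker(team, "writer"))  # researcher

You trade adaptive routing for predictability, which is often the right call while you debug the "auto" selection prompts in your traces.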

Advanced Configuration

Selective Tracing

For high-volume applications, you might not want to trace every conversation:

from overseex_autogen import configure_tracing

configure_tracing(
    api_key="your_api_key",
    sample_rate=0.1,           # Trace 10% of conversations
    trace_errors_always=True   # But always trace errors
)

Custom Metadata

Add context to your traces:

with trace_autogen_chat(
    api_key="your_api_key",
    metadata={
        "user_id": user_id,
        "session_id": session_id,
        "feature": "code-review"
    }
):
    # conversation code

Sensitive Data Handling

Redact sensitive information before logging:

with trace_autogen_chat(
    api_key="your_api_key",
    redact_patterns=[
        r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',  # emails
        r'\b\d{3}-\d{2}-\d{4}\b'  # SSNs
    ]
):
    # conversation code
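Conceptually, redaction amounts to running re.sub with each pattern over message content before it leaves the process. A standalone sketch using the same two patterns:

    import re

    PATTERNS = [
        r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b',  # emails
        r'\b\d{3}-\d{2}-\d{4}\b',                               # SSNs
    ]

    def redact(text, patterns=PATTERNS, placeholder="[REDACTED]"):
        """Replace every match of every pattern with a placeholder."""
        for pattern in patterns:
            text = re.sub(pattern, placeholder, text)
        return text

    print(redact("Contact alice@example.com, SSN 123-45-6789"))
    # Contact [REDACTED], SSN [REDACTED]

Testing your patterns like this before deploying is worthwhile: an overly broad regex will redact legitimate content, and an overly narrow one will leak PII into your traces.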

Best Practices

Name your agents meaningfully: Agent names appear throughout traces. "code-reviewer" is better than "agent2".

Set a reasonable max_round: Unbounded conversations can run forever. Set limits and alert when they are reached.

Monitor code execution carefully: Auto-executing generated code is powerful but risky. Watch for security issues in traces.

Use human_input_mode wisely: For production, you usually want "NEVER" or careful "TERMINATE" conditions.

Conclusion

AutoGen enables sophisticated conversational AI systems, but debugging them requires specialized tracing. With OverseeX, you get complete visibility into AutoGen conversations—every message, every LLM call, every code execution. This visibility is essential for building reliable AutoGen applications.

Start tracing your AutoGen conversations today and understand exactly how your agents interact.

