Build with agents, conversations, and responses in Foundry Agent Service

Microsoft Foundry Agent Service uses three core runtime components—agents, conversations, and responses—to power stateful, multi-turn interactions. An agent uses a model from the Foundry model catalog, along with instructions and tools. A conversation persists history across turns. A response is the output the agent produces when it processes input. This article walks through each component and shows how to use them together in code. You’ll learn how to create an agent, start a conversation, generate responses (with or without an agent), add follow-up messages, and stream results—with examples in Python, C#, JavaScript, Java, and REST API.

How runtime components work together

When you work with an agent, you follow a consistent pattern:

Create an agent: Define an agent to start sending messages and receiving responses.
Create a conversation (optional): Use a conversation to maintain history across turns. If you don’t use a conversation, carry forward context by using the output from a previous response.
Generate a response: The agent’s Foundry model processes input items in the conversation and any instructions provided in the request. The agent might append items to the conversation.
Check response status: Monitor the response until it finishes (especially in streaming or background mode).
Retrieve the response: Display the generated response to the user.

The following diagram illustrates how these components interact in a typical agent loop.

[Diagram: the agent runtime loop. An agent definition and optional conversation history feed response generation, which can call tools, append items back into the conversation, and produce output items you display to the user.]

You provide user input (and optionally conversation history), the service generates a response (including tool calls when configured), and the resulting items can be reused as context for the next turn.

Prerequisites

To run the samples in this article, you need:

An Azure subscription. Create one for free.
A Microsoft Foundry project.
The Foundry Agent Service SDK for your language:

Python:

    pip install "azure-ai-projects>=2.0.0"
    pip install azure-identity

C#:

    dotnet add package Azure.AI.Projects
    dotnet add package Azure.AI.Projects.Agents
    dotnet add package Azure.AI.Extensions.OpenAI
    dotnet add package Azure.Identity

JavaScript:

    npm install @azure/ai-projects@2.0.0
    npm install @azure/identity

Java:

    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-ai-agents</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>com.azure</groupId>
        <artifactId>azure-identity</artifactId>
        <version>1.15.4</version>
    </dependency>

REST API: No SDK installation required. Use Azure CLI to obtain an access token:

    az login
Create an agent

An agent is a persisted orchestration definition that combines AI models, instructions, code, tools, parameters, and optional safety or governance controls. Store agents as named, versioned assets in Microsoft Foundry. During response generation, the agent definition works with interaction history (conversation or previous response) to process and respond to user input. The following example creates a prompt agent with a name, model, and instructions. Use the project client for agent creation and versioning.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import PromptAgentDefinition

    # Format: "https://resource_name.services.ai.azure.com/api/projects/project_name"
    PROJECT_ENDPOINT = "your_project_endpoint"

    # Create project client to call Foundry API
    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )

    # Create a prompt agent
    agent = project.agents.create_version(
        agent_name="my-agent",
        definition=PromptAgentDefinition(
            model="gpt-5-mini",
            instructions="You are a helpful assistant.",
        ),
    )
    print(f"Agent: {agent.name}, Version: {agent.version}")

Agents are now identified using the agent name and agent version. They don’t have a GUID called AgentID anymore.

For additional agent types (workflow, hosted), see Agent development lifecycle.
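The samples in this article hard-code PROJECT_ENDPOINT as a placeholder. As a minimal sketch, you could instead read the endpoint from an environment variable and check that it matches the format shown in the comment above before you create a client. The variable name FOUNDRY_PROJECT_ENDPOINT and both function names are hypothetical, chosen only for this illustration. The check covers the URL scheme and path only, not the host name.

```python
import os
from urllib.parse import urlparse


def check_project_endpoint(endpoint: str) -> None:
    """Raise ValueError unless the URL looks like
    https://<host>/api/projects/<project_name>."""
    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("Endpoint must be an https URL.")
    # Split the path into its non-empty segments.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 3 or parts[:2] != ["api", "projects"]:
        raise ValueError("Endpoint path must look like /api/projects/<project_name>.")


def load_project_endpoint(env_var: str = "FOUNDRY_PROJECT_ENDPOINT") -> str:
    """Read the endpoint from the environment (hypothetical variable name)."""
    endpoint = os.environ.get(env_var, "").strip()
    if not endpoint:
        raise ValueError(f"Set {env_var} to your project endpoint.")
    check_project_endpoint(endpoint)
    return endpoint
```

You could then pass the returned value as the endpoint argument of AIProjectClient in place of the placeholder string.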

Create an agent with tools

Tools extend what an agent can do beyond generating text. When you attach tools to an agent, the agent can call external services, run code, search files, and access data sources during response generation—using tools such as web search or function calling. You can attach one or more tools when you create an agent. During response generation, the agent decides whether to call a tool based on the user input and its instructions. The following example creates an agent with a web search tool attached.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import PromptAgentDefinition, WebSearchTool

    PROJECT_ENDPOINT = "your_project_endpoint"

    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )

    # Create an agent with a web search tool
    agent = project.agents.create_version(
        agent_name="my-tool-agent",
        definition=PromptAgentDefinition(
            model="gpt-5-mini",
            instructions="You are a helpful assistant that can search the web.",
            tools=[WebSearchTool()],
        ),
    )
    print(f"Agent: {agent.name}, Version: {agent.version}")

For the full list of available tools, see the tools overview. For best practices, see Best practices for using tools.

Generate responses

Response generation invokes the agent. The agent uses its configuration and any provided history (conversation or previous response) to perform tasks by calling models and tools. As part of response generation, the agent appends items to the conversation. You can also generate a response without defining an agent. In this case, you provide all configurations directly in the request and use them only for that response. This approach is useful for simple scenarios with minimal tools. Additionally, because a follow-up request can continue from any earlier response, you can fork the interaction at an earlier point, for example at the first response ID or at the second response ID.

Generate a response with an agent

The following example generates a response using an agent reference, then sends a follow-up question using the previous response as context.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    # Format: "https://resource_name.services.ai.azure.com/api/projects/project_name"
    PROJECT_ENDPOINT = "your_project_endpoint"
    AGENT_NAME = "your_agent_name"

    # Create clients to call Foundry API
    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )
    openai = project.get_openai_client()

    # Generate a response using the agent
    response = openai.responses.create(
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input="What is the largest city in France?",
    )
    print(response.output_text)

    # Ask a follow-up question using the previous response
    follow_up = openai.responses.create(
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        previous_response_id=response.id,
        input="What is the population of that city?",
    )
    print(follow_up.output_text)
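Each request in the preceding sample repeats the same agent_reference dictionary. One way to reduce that repetition is a small helper that builds the extra_body payload. The helper name agent_reference is hypothetical and not part of the SDK. The dictionary it returns has the same shape as the one in the sample.

```python
def agent_reference(name: str) -> dict:
    """Build the extra_body payload that points a request at a named agent."""
    if not name:
        raise ValueError("Agent name must be a non-empty string.")
    return {"agent_reference": {"name": name, "type": "agent_reference"}}


# Intended use with the client from the sample above (not executed here):
#   response = openai.responses.create(
#       extra_body=agent_reference(AGENT_NAME),
#       input="What is the largest city in France?",
#   )
```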

Print tool calls from a response

When an agent uses tools during response generation, the response output contains tool call items alongside the final message. You can iterate over response.output to inspect each item and display tool calls—such as web searches, function calls, or file searches—before printing the text response.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    PROJECT_ENDPOINT = "your_project_endpoint"
    AGENT_NAME = "your_agent_name"

    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )
    openai = project.get_openai_client()

    response = openai.responses.create(
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input="What happened in the news today?",
    )

    # Print each output item, including tool calls
    for item in response.output:
        if item.type == "web_search_call":
            print(f"[Tool] Web search: status={item.status}")
        elif item.type == "function_call":
            print(f"[Tool] Function call: {item.name}({item.arguments})")
        elif item.type == "file_search_call":
            print(f"[Tool] File search: status={item.status}")
        elif item.type == "message":
            print(f"[Assistant] {item.content[0].text}")
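The sample above reads item.content[0].text, which assumes that the first content part of a message carries text. A more defensive sketch joins every part that has a text attribute and skips the rest. The helper name message_text is hypothetical, and the stand-in objects below only imitate the attribute layout used in the sample. They are not real SDK types.

```python
from types import SimpleNamespace


def message_text(item) -> str:
    """Join the text of every content part; parts without text are skipped."""
    parts = getattr(item, "content", None) or []
    return "".join(getattr(part, "text", "") or "" for part in parts)


# Stand-in for a message item with a non-text part between two text parts.
example = SimpleNamespace(
    type="message",
    content=[
        SimpleNamespace(type="output_text", text="Paris is "),
        SimpleNamespace(type="refusal", refusal="(no text attribute)"),
        SimpleNamespace(type="output_text", text="the largest city."),
    ],
)
print(message_text(example))
```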

Generate a response without storing

By default, the service stores response history server-side, so you can reference previous_response_id for multi-turn context. If you set store to false, the service doesn’t persist the response. You must carry forward the conversation context yourself by passing previous output items as input to the next request. This approach is useful when you need full control over conversation state, want to minimize stored data, or work in a zero-data-retention environment.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    PROJECT_ENDPOINT = "your_project_endpoint"
    AGENT_NAME = "your_agent_name"

    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )
    openai = project.get_openai_client()

    # Generate a response without storing
    response = openai.responses.create(
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input="What is the largest city in France?",
        store=False,
    )
    print(response.output_text)

    # Carry forward context client-side by passing previous output as input
    follow_up = openai.responses.create(
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input=[
            {"role": "user", "content": "What is the largest city in France?"},
            {"role": "assistant", "content": response.output_text},
            {"role": "user", "content": "What is the population of that city?"},
        ],
        store=False,
    )
    print(follow_up.output_text)
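With store set to false, your application owns the history. As a minimal sketch under that assumption, the following class keeps the running list of role and content entries in the same shape the sample passes as input. The class and method names are hypothetical. The sketch tracks plain text turns only, so tool calls and tool outputs would need extra handling.

```python
class ClientSideHistory:
    """Keep user and assistant turns in the order they happened."""

    def __init__(self):
        self._items = []

    def user_turn(self, text: str) -> list:
        """Record a user message and return the full input for the next request."""
        self._items.append({"role": "user", "content": text})
        return list(self._items)  # copy, so callers cannot alter the history

    def record_reply(self, text: str) -> None:
        """Record the assistant's reply, for example response.output_text."""
        self._items.append({"role": "assistant", "content": text})


history = ClientSideHistory()
first_input = history.user_turn("What is the largest city in France?")
history.record_reply("Paris.")  # stands in for response.output_text
second_input = history.user_turn("What is the population of that city?")
```

Here second_input holds three entries (user, assistant, user) and can be passed as the input argument of the next request.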

Conversations and conversation items

Conversations are durable objects with unique identifiers. After creation, you can reuse them across sessions. Conversations store items, which can include messages, tool calls, tool outputs, and other data.

Create a conversation

The following example creates a conversation with an initial user message. Use the OpenAI client (obtained from the project client) for conversations and responses.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    # Format: "https://resource_name.services.ai.azure.com/api/projects/project_name"
    PROJECT_ENDPOINT = "your_project_endpoint"

    # Create clients to call Foundry API
    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )
    openai = project.get_openai_client()

    # Create a conversation with an initial user message
    conversation = openai.conversations.create(
        items=[
            {
                "type": "message",
                "role": "user",
                "content": "What is the largest city in France?",
            }
        ],
    )
    print(f"Conversation ID: {conversation.id}")

When to use a conversation

Use a conversation when you want:

Multi-turn continuity: Keep a stable history across turns without rebuilding context yourself.
Cross-session continuity: Reuse the same conversation for a user who returns later.
Easier debugging: Inspect what happened over time (for example, tool calls and outputs).

When a conversation is used to generate a response (with or without an agent), the full conversation is provided as input to the model. The generated response is then appended to the same conversation.

If the conversation exceeds the model’s supported context size, the model will automatically truncate the input context. The conversation itself is not truncated, but only a subset of it is used to generate the response.

If you don’t create a conversation, you can still build multi-turn flows by using the output from a previous response as the starting point for the next request. This approach gives you more flexibility than the older thread-based pattern, where state was tightly coupled to thread objects. For migration guidance, see Migrate to the Agents SDK.

Conversation item types

Conversations store items rather than only chat messages. Items capture what happened during response generation so the next turn can reuse that context. Common item types include:

Message items: User or assistant messages.
Tool call items: Records of tool invocations the agent attempted.
Tool output items: Outputs returned by tools (for example, retrieval results).
Output items: The response content you display back to the user.
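When you inspect a conversation or a response for debugging, a quick tally of item types can show what happened in a turn. The following sketch counts items by their type field and accepts either dictionaries or objects with a type attribute. The function names are hypothetical. The type strings in the example are the ones used in the tool call sample earlier in this article.

```python
from collections import Counter
from types import SimpleNamespace


def item_type(item) -> str:
    """Return the item's type, whether it is a dict or an object."""
    if isinstance(item, dict):
        return item.get("type", "unknown")
    return getattr(item, "type", "unknown")


def summarize_items(items) -> Counter:
    """Count how many items of each type appear."""
    return Counter(item_type(item) for item in items)


# Stand-in items; a real list would come from a response or a conversation.
items = [
    {"type": "message", "role": "user", "content": "What happened in the news today?"},
    SimpleNamespace(type="web_search_call", status="completed"),
    SimpleNamespace(type="message"),
]
summary = summarize_items(items)
```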

Add items to a conversation

After you create a conversation, use conversations.items.create() to add subsequent user messages or other items.

    # Add a follow-up message to an existing conversation
    openai.conversations.items.create(
        conversation_id=conversation.id,
        items=[
            {
                "type": "message",
                "role": "user",
                "content": "What about Germany?",
            }
        ],
    )

Use a conversation with an agent

Combine a conversation with an agent reference to maintain history across multiple turns. The agent processes all items in the conversation and appends its output automatically.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    PROJECT_ENDPOINT = "your_project_endpoint"
    AGENT_NAME = "your_agent_name"

    # Create clients to call Foundry API
    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )
    openai = project.get_openai_client()

    # Create a conversation for multi-turn chat
    conversation = openai.conversations.create()

    # First turn
    response = openai.responses.create(
        conversation=conversation.id,
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input="What is the largest city in France?",
    )
    print(response.output_text)

    # Follow-up turn in the same conversation
    follow_up = openai.responses.create(
        conversation=conversation.id,
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input="What is the population of that city?",
    )
    print(follow_up.output_text)

For examples that show how conversations and responses work together in code, see Create and use memory in Foundry Agent Service.

Streaming and background responses

For long-running operations, you can return results incrementally by using streaming, or run the request fully asynchronously by using background mode. In these cases, you typically monitor the response until it finishes and then consume the final output items.

Stream a response

Streaming returns partial results as they’re generated. This approach is useful for showing output to users in real time.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    # Format: "https://resource_name.services.ai.azure.com/api/projects/project_name"
    PROJECT_ENDPOINT = "your_project_endpoint"
    AGENT_NAME = "your_agent_name"

    # Create clients to call Foundry API
    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )
    openai = project.get_openai_client()

    # Stream a response using the agent
    stream = openai.responses.create(
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input="Explain how agents work in one paragraph.",
        stream=True,
    )
    for event in stream:
        if hasattr(event, "delta") and event.delta:
            print(event.delta, end="", flush=True)

For details about response modes and how to consume outputs, see Responses API.
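The streaming sample prints every event that has a delta attribute. If you also need the complete text after the stream ends, one option is to collect only the text deltas and join them. This sketch assumes that text chunks arrive as events of type response.output_text.delta, following the OpenAI Responses streaming event names. Verify that assumption against the events your deployment emits. The function name collect_text is hypothetical, and the events below are stand-ins.

```python
from types import SimpleNamespace

# Assumed event type for text chunks (OpenAI Responses streaming naming).
TEXT_DELTA = "response.output_text.delta"


def collect_text(events) -> str:
    """Join the text deltas from a stream and ignore every other event."""
    chunks = []
    for event in events:
        if getattr(event, "type", None) == TEXT_DELTA:
            chunks.append(event.delta)
    return "".join(chunks)


# Stand-in events; a real stream comes from responses.create(stream=True).
fake_stream = [
    SimpleNamespace(type="response.created"),
    SimpleNamespace(type=TEXT_DELTA, delta="Agents combine "),
    SimpleNamespace(type="response.function_call_arguments.delta", delta='{"q"'),
    SimpleNamespace(type=TEXT_DELTA, delta="models and tools."),
]
full_text = collect_text(fake_stream)
```

Filtering on the event type keeps tool-argument deltas out of the displayed text, which a check for the delta attribute alone would not do.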

Run an agent in background mode

Background mode runs the agent asynchronously, which is useful for long-running tasks such as complex reasoning or image generation. Set background to true and then poll for the response status until it completes.

    from time import sleep
    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient

    PROJECT_ENDPOINT = "your_project_endpoint"
    AGENT_NAME = "your_agent_name"

    # Create clients to call Foundry API
    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )
    openai = project.get_openai_client()

    # Start a background response using the agent
    response = openai.responses.create(
        extra_body={
            "agent_reference": {
                "name": AGENT_NAME,
                "type": "agent_reference",
            }
        },
        input="Write a detailed analysis of renewable energy trends.",
        background=True,
    )

    # Poll until the response completes
    while response.status in ("queued", "in_progress"):
        sleep(2)
        response = openai.responses.retrieve(response.id)

    print(response.output_text)
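The polling loop above waits indefinitely at a fixed interval. As a sketch of one alternative, the following helper adds an overall timeout and a growing delay between polls. The function name and its parameters are hypothetical. The retrieve argument stands in for openai.responses.retrieve, and the sleep and clock parameters exist only so that the helper can be exercised without real waiting.

```python
import time

PENDING = ("queued", "in_progress")  # statuses polled in the sample above


def wait_for_response(retrieve, response, timeout_s=300.0,
                      first_delay_s=1.0, max_delay_s=10.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the response leaves the pending statuses or time runs out."""
    deadline = clock() + timeout_s
    delay = first_delay_s
    while response.status in PENDING:
        if clock() >= deadline:
            raise TimeoutError(
                f"Response {response.id} is still {response.status} after {timeout_s}s"
            )
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # back off, up to a ceiling
        response = retrieve(response.id)
    return response


# Intended use with the client from the sample above (not executed here):
#   response = wait_for_response(openai.responses.retrieve, response)
```

The helper returns as soon as the status is no longer pending, so consider checking that response.status is the status you expect before you read response.output_text.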

Attach memory to an agent (preview)

Memory gives agents the ability to retain information across sessions, so they can personalize responses and recall user preferences over time. Without memory, each conversation starts from scratch. Foundry Agent Service provides a managed memory solution (preview) that you configure through memory stores. A memory store defines which types of information the agent should retain. Attach a memory store to your agent, and the agent uses stored memories as additional context during response generation. The following example creates a memory store. For the steps that attach the store to an agent, see the implementation guidance linked after the example.

    from azure.identity import DefaultAzureCredential
    from azure.ai.projects import AIProjectClient
    from azure.ai.projects.models import (
        MemoryStoreDefaultDefinition,
        MemoryStoreDefaultOptions,
    )

    PROJECT_ENDPOINT = "your_project_endpoint"

    project = AIProjectClient(
        endpoint=PROJECT_ENDPOINT,
        credential=DefaultAzureCredential(),
    )

    # Create a memory store
    options = MemoryStoreDefaultOptions(
        chat_summary_enabled=True,
        user_profile_enabled=True,
    )
    definition = MemoryStoreDefaultDefinition(
        chat_model="gpt-5.2",
        embedding_model="text-embedding-3-small",
        options=options,
    )
    memory_store = project.beta.memory_stores.create(
        name="my_memory_store",
        definition=definition,
        description="Memory store for my agent",
    )
    print(f"Memory store: {memory_store.name}")

For conceptual details, see Memory in Foundry Agent Service. For full implementation guidance, see Create and use memory.

Security and data handling

Because conversations and responses can persist user-provided content and tool outputs, treat runtime data like application data:

Avoid storing secrets in prompts or conversation history. Use connections and managed secret stores instead (for example, Set up a Key Vault connection).
Use least privilege for tool access. When a tool accesses external systems, the agent can potentially read or send data through that tool.
Be careful with non-Microsoft services. If your agent calls tools backed by non-Microsoft services, some data might flow to those services. For related considerations, see Discover tools in the Foundry Tools.

Limits and constraints

Limits can depend on the model, region, and the tools you attach (for example, streaming availability and tool support). For current availability and constraints for responses, see Responses API.
