Deploy your containerized agent code to Foundry Agent Service using the Python SDK or REST API.
This article shows you how to deploy a containerized agent to Foundry Agent Service using the Python SDK or REST API. Use these approaches when you want to manage agent deployments directly from your own applications or services.
If you’re deploying for the first time or want the fastest path, use the Quickstart: Create and deploy a Hosted agent instead. The quickstart uses the Azure Developer CLI (azd) or VS Code extension, which handle building, pushing, versioning, and RBAC configuration automatically.
Every Hosted agent deployment follows this sequence:
1. Build and push: Package your agent code into a container image and push it to Azure Container Registry.
2. Create an agent version: Register the image with Foundry Agent Service. The platform provisions infrastructure and creates a dedicated Entra agent identity.
3. Poll for status: Wait for the version status to reach active.
4. Invoke: Send requests to the agent’s dedicated endpoint.
You need Foundry Project Manager at project scope to create and deploy Hosted agents. This role includes both the data plane permissions to create agents and the ability to assign the Foundry User role to the platform-created agent identity. The agent identity needs Foundry User on the project to access models and artifacts at runtime.
The Foundry RBAC roles were recently renamed. Foundry User, Foundry Owner, Foundry Account Owner, and Foundry Project Manager were previously named Azure AI User, Azure AI Owner, Azure AI Account Owner, and Azure AI Project Manager. You might still see the previous names in some places while the rename rolls out. The role IDs and core permissions are unchanged by the rename.
If you use azd or the VS Code extension, the tooling handles most RBAC assignments automatically, including:
Container Registry Repository Reader for the project managed identity (image pulls)
Foundry User for the platform-created agent identity (runtime model and tool access)
The platform creates a dedicated Entra agent identity for each Hosted agent at deploy time. This identity is a service principal that your running container uses to call models and tools. You don’t need to configure managed identities manually. However, the user who creates the agent must have permission to assign Foundry User to that identity—which is why Foundry Project Manager is recommended over Foundry User alone.
While azd and VS Code extensions handle basic RBAC assignments automatically, complex scenarios may require additional manual configuration. For comprehensive details about all permissions and role assignments involved, see Hosted agent permissions reference.
The Azure Container Registry that holds your Hosted agent’s container image must currently be reachable over its public endpoint. Placing the registry behind a private network (private endpoint with public network access disabled) isn’t currently supported for Hosted agents — the platform can’t pull the image. For the full list of network constraints, see Limitations.
Your container image must meet the following requirements to run on the Hosted agent platform.
The hosting platform requires x86_64 (linux/amd64) container images. If you build on Apple Silicon or other ARM-based machines, use docker build --platform linux/amd64 . to avoid producing an incompatible ARM image.
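As a minimal sketch of the build step, the following Python helper assembles the docker build command with the platform pinned to linux/amd64 and reports whether the build host would otherwise produce a non-x86_64 image. The function names and the image reference myregistry.azurecr.io/my-agent:v1 are hypothetical and not part of any Foundry tooling; only the --platform linux/amd64 flag comes from the guidance above.

```python
import platform
from typing import List, Optional

TARGET_PLATFORM = "linux/amd64"  # architecture required by the hosting platform


def docker_build_command(image_ref: str, context_dir: str = ".") -> List[str]:
    """Return a docker build argv pinned to linux/amd64.

    image_ref is a hypothetical registry/repository:tag reference.
    """
    return ["docker", "build", "--platform", TARGET_PLATFORM, "-t", image_ref, context_dir]


def host_needs_cross_build(machine: Optional[str] = None) -> bool:
    """Return True when the build host is not x86_64 (for example, Apple Silicon)."""
    machine = (machine or platform.machine()).lower()
    return machine not in {"x86_64", "amd64"}
```

The returned list can be passed to subprocess.run(..., check=True). Pinning the platform unconditionally keeps the build reproducible regardless of which machine runs it.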
WebSocket protocol use cases: bidirectional streaming, such as real-time voice agents and interactive media.
The WebSocket protocol uses the identifier invocations_ws and ships in the same azure-ai-agentserver-invocations package as the HTTP /invocations route, so one container can serve both. Use it when you need persistent, full-duplex streaming—for example, sending microphone PCM to the agent and receiving synthesized audio back. For voice scenarios, see Build a voice agent with hosted agents.
The invocations_ws WebSocket protocol is in preview and is currently available only in North Central US.
A single container can expose multiple protocols simultaneously by declaring them when you create the agent — in the agent.yaml file, SDK call, or REST API request — and importing the required libraries. Use the protocol libraries within your existing framework, whether that’s Microsoft Agent Framework, LangChain, or custom code.
The Python and .NET libraries for the Responses protocol implement the Azure AI Responses API. Import the package and implement the IResponseHandler interface. The library handles routing, streaming with server-sent events (SSE), background execution, cancellation, caching, and response lifecycle management.
IResponseHandler is the core abstraction you implement. The library calls CreateAsync for each incoming request and delivers the returned IAsyncEnumerable<ResponseStreamEvent> to clients through SSE:
    public class EchoHandler : ResponseHandler
    {
        public override IAsyncEnumerable<ResponseStreamEvent> CreateAsync(
            CreateResponse request,
            ResponseContext context,
            CancellationToken cancellationToken)
        {
            return new TextResponse(context, request,
                createText: async ct =>
                {
                    var input = await context.GetInputTextAsync(cancellationToken: ct);
                    return $"Echo: {input}";
                });
        }
    }
ResponseEventStream manages sequenceNumber, outputIndex, contentIndex, itemId, and the full Response lifecycle automatically. Each yield return maps one-to-one to an SSE event, so you don’t need to track this state yourself.
Streaming mode (default): SSE events are delivered in real time to the connected client.
Background mode: The handler runs to completion without a connected SSE client. Events are buffered and available for replay through GET /responses/{id}.
The library orchestrates the complete response lifecycle: created → in_progress → completed (or failed or cancelled). The library also manages cancellation, error handling, and terminal event guarantees automatically.
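To make the lifecycle concrete, here is an illustrative Python model of the status transitions described above. It is a sketch for reasoning about event sequences (for example, in tests of your own client code), not the library's implementation. The transition table contains only the transitions this article names, so the real library might permit others.

```python
from typing import Dict, FrozenSet, Iterable

# Transitions taken from the lifecycle described above; illustrative only.
ALLOWED_TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "created": frozenset({"in_progress"}),
    "in_progress": frozenset({"completed", "failed", "cancelled"}),
    "completed": frozenset(),
    "failed": frozenset(),
    "cancelled": frozenset(),
}


def is_terminal(status: str) -> bool:
    """A known status is terminal when no further transition is allowed."""
    return status in ALLOWED_TRANSITIONS and not ALLOWED_TRANSITIONS[status]


def is_valid_lifecycle(statuses: Iterable[str]) -> bool:
    """Check that a sequence starts at created, follows allowed
    transitions, and ends in a terminal status."""
    seq = list(statuses)
    if not seq or seq[0] != "created":
        return False
    for current, nxt in zip(seq, seq[1:]):
        if nxt not in ALLOWED_TRANSITIONS.get(current, frozenset()):
            return False
    return is_terminal(seq[-1])
```

A sequence such as created, in_progress, completed passes the check, while one that skips in_progress or continues after a terminal status does not.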
All service instances registered through AddResponsesServer() are thread-safe. Handler instances are scoped per-request.
For detailed handler implementation guidance, see the handler implementation guide. For runnable examples, see the Responses protocol samples.
Containers serve traffic on port 8088 locally. In production, the Foundry gateway handles routing — your container doesn’t need to expose a public port.
The Hosted agent platform automatically injects environment variables into your container at runtime. Your code can read these without declaring them in agent.yaml or environment_variables. The FOUNDRY_* prefix is reserved for platform use.
| Variable | Purpose |
| --- | --- |
| FOUNDRY_PROJECT_ENDPOINT | Foundry project endpoint URL |
| FOUNDRY_PROJECT_ARM_ID | Foundry project ARM resource ID |
| FOUNDRY_AGENT_NAME | Name of the running agent |
| FOUNDRY_AGENT_VERSION | Version of the running agent |
| FOUNDRY_AGENT_SESSION_ID | Session ID for the current request (hosted containers only) |
| APPLICATIONINSIGHTS_CONNECTION_STRING | Application Insights connection string for telemetry |
Don’t redeclare platform-injected variables in agent.yaml; they’re set automatically.
Variables that you declare yourself, such as MODEL_DEPLOYMENT_NAME or toolbox MCP endpoints, go in the environment_variables section of agent.yaml or the SDK create_version call.
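As a sketch of how agent code might consume the platform-injected variables, the following helper reads the variables listed in this section and reports which ones are absent, which is useful when the same code also runs outside the platform. The helper names are hypothetical. The optional env parameter exists only so the function can be exercised without touching the real environment.

```python
import os
from typing import Dict, List, Mapping, Optional

# Variable names as listed in this article.
PLATFORM_VARIABLES = (
    "FOUNDRY_PROJECT_ENDPOINT",
    "FOUNDRY_PROJECT_ARM_ID",
    "FOUNDRY_AGENT_NAME",
    "FOUNDRY_AGENT_VERSION",
    "FOUNDRY_AGENT_SESSION_ID",
    "APPLICATIONINSIGHTS_CONNECTION_STRING",
)


def read_platform_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Return each platform variable, or None when it is not set."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in PLATFORM_VARIABLES}


def missing_platform_variables(
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """List platform variables that are unset or empty."""
    return [name for name, value in read_platform_settings(env).items() if not value]
```

When running locally, most of these variables are likely to be missing, so code that depends on them should fall back to explicit configuration rather than assume they exist.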
Reference project connections in environment variables
Instead of hard-coding secrets (API keys, tokens, endpoints) into agent.yaml or your image, pull them from a Foundry project connection at sandbox start. Any value in environment_variables can be a placeholder expression that the platform resolves before your container starts.
A placeholder has the form ${{connections.<name>.<path>}}, where <name> is the connection’s resource name (visible in the portal under Project details > Connected resources) and <path> is one of:
| Path | Resolves to |
| --- | --- |
| credentials.<field> | A secret field on the connection |
| target | The connection’s target property (for example, an endpoint URL) |
| metadata.<field> | A field under the connection’s metadata |
The field name to use depends on the connection category:
| Connection category | Field name in placeholder |
| --- | --- |
| ApiKey, AppInsights | Always key (for example, credentials.key) |
| CustomKeys | The key name you supplied when creating the connection (for example, credentials.github_token) |
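The placeholder syntax is easy to mistype, so a small helper that builds and validates the expression can be useful in deployment scripts. The sketch below follows the ${{connections.<name>.<path>}} form and the three path kinds described above. The function name, the connection name github-conn, and the set of characters allowed in names and fields are assumptions for illustration; the platform might accept a different character set.

```python
import re

# Assumption: names and field names use letters, digits, "_" and "-".
_NAME_RE = re.compile(r"^[A-Za-z0-9_-]+$")
_PATH_RE = re.compile(
    r"^(target|credentials\.[A-Za-z0-9_-]+|metadata\.[A-Za-z0-9_-]+)$"
)


def connection_placeholder(name: str, path: str) -> str:
    """Build a ${{connections.<name>.<path>}} expression.

    path must be "target", "credentials.<field>", or "metadata.<field>".
    """
    if not _NAME_RE.match(name):
        raise ValueError(f"Unexpected connection name: {name!r}")
    if not _PATH_RE.match(path):
        raise ValueError(f"Unsupported placeholder path: {path!r}")
    # Plain concatenation avoids brace escaping inside f-strings.
    return "${{connections." + name + "." + path + "}}"
```

For example, connection_placeholder("github-conn", "credentials.github_token") produces a value that could be assigned to GITHUB_TOKEN in environment_variables.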
At sandbox start, Foundry resolves the placeholder and injects the resolved value as a plain environment variable. Your code reads it like any other env var:
    import os

    token = os.environ["GITHUB_TOKEN"]
A GET on the agent version returns the literal ${{...}} text—the resolved secret is never echoed back through the management API.
Create the connection before you deploy the version. If the connection or the referenced field is missing at sandbox start, the placeholder doesn’t resolve and the variable is empty.
Secrets are write-only. GET on a connection returns credentials: null. Verify resolution by reading the env var from inside your running container, not by inspecting the connection.
Record CustomKeys field names yourself. The management API never echoes them back after creation. Keep them next to your agent source (for example, in IaC templates or alongside agent.yaml) so you can construct placeholders later without guessing.
Foundry manages the backing secret name. When you create the connection, Foundry stores the value in Key Vault under a name it chooses — you can’t reference a preexisting Key Vault secret by name. To attach your own Key Vault as the backing store, see Set up a Key Vault connection.
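Because an unresolved placeholder leaves the variable empty rather than failing the deployment, it can help to check at container startup. The following sketch raises a clear error when a required variable is missing or empty, without ever logging the secret value. The function name require_env is hypothetical, and GITHUB_TOKEN is reused from the example above.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a required environment variable or fail fast.

    An empty value is treated as an error because, per the notes above,
    an unresolved connection placeholder yields an empty variable.
    """
    env = os.environ if env is None else env
    value = env.get(name, "")
    if not value.strip():
        # Report only the variable name, never its value.
        raise RuntimeError(
            f"Environment variable {name} is missing or empty. "
            "Check that the referenced connection and field exist."
        )
    return value
```

Calling require_env("GITHUB_TOKEN") once during startup surfaces a misconfigured connection immediately instead of at the first outbound request.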
Before deploying to Foundry, validate your agent works locally using the protocol library. The container serves the same endpoints locally as it does in production.
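One way to smoke-test a locally running container is to send it a single request with the Python standard library. The sketch below assumes a Responses-protocol agent listening on port 8088. The /responses path and the {"input": ...} request body are assumptions inferred from the Responses protocol and are not confirmed by this article, so adjust them to match your agent. The function names are hypothetical.

```python
import json
import urllib.request

LOCAL_PORT = 8088  # local port stated in this article


def build_local_responses_request(
    input_text: str, host: str = "localhost"
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the local container.

    Assumption: the /responses path and the {"input": ...} body shape.
    """
    body = json.dumps({"input": input_text}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{LOCAL_PORT}/responses",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```

With the container running, send(build_local_responses_request("Hello")) returns the raw response body for inspection.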
The Azure Developer CLI (azd) and VS Code extension automate the full deployment lifecycle. For a step-by-step walkthrough, see the Quickstart: Create and deploy a Hosted agent.
Creating a version triggers the platform to provision the agent automatically. There’s no separate start step — the platform builds a container snapshot and makes the agent ready to serve requests.
After creating a version, poll until the status is active before invoking the agent. Provisioning typically takes less than one minute depending on image size.
    import time

    # Poll until the agent version is active
    while True:
        version_info = project.agents.get_version(
            agent_name="my-agent", agent_version=agent.version
        )
        status = version_info["status"]
        print(f"Status: {status}")
        if status == "active":
            print("Agent is ready!")
            break
        elif status == "failed":
            print(f"Provisioning failed: {version_info['error']}")
            break
        time.sleep(5)
Version status values:
| Status | Description |
| --- | --- |
| creating | Infrastructure provisioning in progress |
| active | Agent is ready to serve requests |
| failed | Provisioning failed; check the error field for details |
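A polling loop without an upper bound can hang indefinitely if a version never leaves creating. As a hedged variation on the loop above, the following sketch adds a timeout and raises on failure instead of printing. The helper wait_until_active is hypothetical and is not part of the SDK. It takes any zero-argument callable that returns the version info, so it makes no assumptions about the client beyond the status and error fields shown in this article. The sleep and clock parameters exist only to make the function testable.

```python
import time
from typing import Any, Callable, Mapping


def wait_until_active(
    get_version: Callable[[], Mapping[str, Any]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll get_version() until the status is active.

    Raises RuntimeError when provisioning fails and TimeoutError when
    the deadline passes while the version is still provisioning.
    """
    deadline = clock() + timeout_s
    while True:
        info = get_version()
        status = info["status"]
        if status == "active":
            return info
        if status == "failed":
            raise RuntimeError(f"Provisioning failed: {info.get('error')}")
        if clock() >= deadline:
            raise TimeoutError(f"Version still {status!r} after {timeout_s} seconds")
        sleep(interval_s)
```

With the client from the example above, the call would look like wait_until_active(lambda: project.agents.get_version(agent_name="my-agent", agent_version=agent.version)). The 300-second default is an arbitrary choice, not a platform limit.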
After the version reaches active status, use get_openai_client to create an OpenAI client bound to the agent’s endpoint.
For the Responses protocol:
    # Create an OpenAI client bound to the agent endpoint
    openai_client = project.get_openai_client(agent_name="my-agent")
    response = openai_client.responses.create(
        input="Hello! What can you do?",
    )
    print(response.output_text)
For the Invocations protocol, call the invocations endpoint directly:
To prevent charges, clean up resources when finished. Agent compute is deprovisioned after 15 minutes of inactivity, so there’s no cost when an agent isn’t serving requests.