langchain_azure_ai.agents.hosting package to expose a compiled
LangGraph graph through the protocols for Microsoft Foundry
hosted agents. The hosting
package lets you keep your LangChain and LangGraph agent logic in code while
Foundry manages the hosted runtime, sessions, scale, identity, and protocol
endpoints.
In this article, you create a minimal LangGraph agent, expose it through either
the Responses or Invocations protocol, test it through HTTP, and deploy it to
Foundry with the Azure Developer CLI or the Foundry Toolkit Visual Studio Code
extension.
Prerequisites
- An Azure subscription. Create one for free.
- A Foundry project.
- A deployed chat model, such as
gpt-4.1orgpt-5-mini. - Python 3.10 or later.
- Azure CLI signed in (
az login) soDefaultAzureCredentialcan authenticate.
Install the package
Installlangchain-azure-ai 1.2.4 or later with the hosting extra:
hosting extra installs the Foundry protocol libraries used by the host
servers:
azure-ai-agentserver-responsesfor the OpenAI-compatible/responsesendpoint.azure-ai-agentserver-invocationsfor the generic/invocationsendpoint.
Choose a hosting protocol
Hosted agents can expose one or more protocols. Start with Responses for most conversational agents.| Protocol | Host class | Endpoint | Use when |
|---|---|---|---|
| Responses | ResponsesHostServer | /responses | You want OpenAI-compatible chat, streaming, response history, and conversation threading. |
| Invocations | InvocationsHostServer | /invocations | You want a custom JSON shape, a webhook-style endpoint, or non-conversational processing. |
Configure environment variables
Set the project endpoint and model deployment name for local development:FOUNDRY_PROJECT_ENDPOINT. If you use azd ai agent init with a sample
manifest, the generated project also uses AZURE_AI_MODEL_DEPLOYMENT_NAME for
the selected model deployment.
Responses protocol
Use the Responses protocol when you want an OpenAI-compatible chat endpoint with streaming, response history, and conversation threading.Create a Responses host
Create a file namedmain.py with a minimal LangGraph agent that uses a
Foundry model. This pattern matches the basic Responses sample in the
langchain-azure-ai source repository.
create_agent, connects it to the Foundry project’s OpenAI-compatible model
endpoint, and passes the compiled graph to ResponsesHostServer. The host
starts an HTTP server and exposes the graph through POST /responses. By
default, the server binds to port 8088, or to the value of the PORT
environment variable when one is set.
Run the app locally:
Test the Responses endpoint
Send a non-streaming Responses request to the local server. Bash:stream to true. The host emits Responses API
server-sent events, such as response.created, response.output_text.delta,
and response.completed.
Conversations
ResponsesHostServer supports two conversation-state patterns. The pattern it
uses depends on whether your compiled graph has a LangGraph checkpointer.
| Graph configuration | Conversation source | What the host sends to the graph on later turns |
|---|---|---|
| Graph without a checkpointer | Responses history from the protocol runtime | Prior response history plus the current request input |
| Graph compiled with a checkpointer | LangGraph checkpoint state keyed by the conversation or response thread | Current request input only |
previous_response_id or
a conversation ID. For local testing, chain the previous response ID in the
next request:
agent_session_id or use a conversation ID. For details, see
Manage Hosted agent sessions.
Human-in-the-loop
If your graph uses LangGraphinterrupt() calls, ResponsesHostServer surfaces
pending interrupts through standard Responses API output items:
- A
function_callitem named__hosted_agent_adapter_interrupt__. - An
mcp_approval_requestitem withserver_labelset tolanggraph.
function_call_output item
whose call_id matches the interrupt ID or an mcp_approval_response item
whose approval_request_id matches the interrupt ID. Use
function_call_output when you need to send a rich LangGraph Command payload
with resume, update, or goto fields. Use mcp_approval_response for a
simple approve or reject flow.
Invocations protocol
UseInvocationsHostServer when your callers can’t use the Responses API
request shape or when your scenario isn’t a chat conversation. The default
Invocations host accepts a message string and an optional stream flag.
Create an Invocations host
Use the same model-building function from the Responses example, but startInvocationsHostServer instead of ResponsesHostServer.
POST /invocations. The MemorySaver checkpointer gives local multi-turn continuity
for a given session ID. For production, use a durable checkpointer so state
survives container restarts.
Test the Invocations endpoint
Send a non-streaming request:x-agent-session-id response header as
the agent_session_id query parameter on the next request:
text/event-stream events with token payloads:
done event:
Customize the request schema
To customize the request body, subclassInvocationsHostServer and override
parse_request. You can also override build_input to map the parsed data to a
custom graph state.
build_input instead of flattening the request to text.
Deploy
You can deploy with the Azure Developer CLI or the Foundry Toolkit Visual Studio Code extension. The Azure Developer CLI flow uses sample manifests and Docker; the extension flow provides a guided deployment experience in Visual Studio Code. Hosted agent deployment requires the Foundry Project Manager role on the project. For details, see Deploy a Hosted agent.Deploy with Azure Developer CLI
Thelangchain-azure-ai source repository includes Hosted agent samples that
can be run and deployed with the Azure Developer CLI. The flow uses each
sample’s agent.manifest.yaml, agent.yaml, Dockerfile, and main.py.
Install the AI agent extension and sign in before you initialize a sample:
azd ai agent run builds the container
image declared in the sample’s Dockerfile. For command details, see the
Azure Developer CLI reference.
Initialize from a sample manifest
Create a new folder and initialize it from a sample manifest. Replace the manifest URL with the sample you want to use.azd ai agent init. If you don’t already have a
Foundry project and model deployment, the initialization flow can guide you
through creating them.
Run the container locally
Run the agent host locally throughazd:
http://127.0.0.1:8088. In another terminal, invoke the
local protocol endpoint directly:
azd:
Deploy to Foundry
If the initialized project uses a new Foundry project and model deployment, provision the Azure resources first:FOUNDRY_PROJECT_ENDPOINT: The endpoint URL for the Foundry project where the agent is deployed.AZURE_AI_MODEL_DEPLOYMENT_NAME: The model deployment name selected duringazd ai agent init.APPLICATIONINSIGHTS_CONNECTION_STRING: The connection string for the project’s Application Insights instance.
Deploy with Foundry Toolkit Visual Studio Code extension
For extension-based deployment, see Quickstart: Deploy your first hosted agent.Troubleshooting
Use this checklist to diagnose common issues while developing Hosted agents withlangchain_azure_ai.agents.hosting.
Graph schema validation fails
The default hosts expect a compiled LangGraph graph whose state has amessages field, such as MessagesState. If your graph uses a custom state
schema, subclass the host and override build_input. For Responses, override
handle_create when you need full control over request parsing, graph
execution, and emitted Responses events.
Conversation state doesn’t continue
For the Responses protocol, passprevious_response_id or a conversation ID
on later turns. If your graph uses a checkpointer, make sure the checkpointer is
configured and durable for the environment where the agent runs.
For the Invocations protocol, the platform doesn’t store conversation history.
Use an agent_session_id query parameter to route later calls to the same
hosted sandbox and use your own state store or LangGraph checkpointer for
conversation state.
The model can’t be reached in the hosted container
Confirm that the Hosted agent version includesAZURE_AI_MODEL_DEPLOYMENT_NAME,
and that the agent identity has permission to call the Foundry project. The
platform sets FOUNDRY_PROJECT_ENDPOINT; your code should read that variable
when running in Foundry.