Foundry Agent Service limits, quotas, and regional support

Foundry Agent Service enforces quotas and limits on agent artifacts, file uploads, messages, and tool registrations. Understanding these limits helps you design applications that scale without hitting service boundaries. This article lists default limits, supported regions, compatible models, and guidance for handling limit errors.

Foundry Agent Service is generally available (GA). Some sub-features, such as hosted agents, are in public preview and might have different constraints.

Prerequisites

An Azure subscription.
A Microsoft Foundry project.
A deployed model compatible with Agent Service. Model and region availability can vary.

Supported regions

Foundry Agent Service is available in the following Azure regions:

Australia East
Brazil South
Canada East
East US
East US 2
France Central
Germany West Central
Italy North
Japan East
Norway East
South Africa North
South Central US
South India
Sweden Central
Switzerland North
UK South
West Europe
West US
West US 3

Not all tools are available in every region. For example, file search isn’t available in Italy North and Brazil South. For the full tool-by-region matrix, see Tool support by region and model.

Azure OpenAI model support

Foundry Agent Service is compatible with current Azure OpenAI models. For a complete list of supported models and their availability by region, see Foundry Models sold directly by Azure.

Other model collections

In addition to Azure OpenAI models, Agent Service supports models from the Foundry model catalog. These models are deployed and managed through Foundry and follow separate quotas. The following models, sold directly by Azure, are available for your agents to use:

MAI-DS-R1: Deterministic, precision-focused reasoning.
grok-4: Frontier-scale reasoning for complex, multiple-step problem solving.
grok-4-fast-reasoning: Accelerated agentic reasoning optimized for workflow automation.
grok-4-fast-non-reasoning: High-throughput, low-latency generation and system routing.
grok-3: Strong reasoning for complex, system-level workflows.
grok-3-mini: Lightweight model optimized for interactive, high-volume use cases.
Llama-3.3-70B-Instruct: Versatile model for enterprise Q&A, decision support, and system orchestration.
Llama-4-Maverick-17B-128E-Instruct-FP8: FP8-optimized model that delivers fast, cost-efficient inference.
DeepSeek-V3-0324: Multimodal understanding across text and images.
DeepSeek-V3.1: Enhanced multimodal reasoning and grounded retrieval.
DeepSeek-R1-0528: Advanced long-form and multiple-step reasoning.
gpt-oss-120b: Open-ecosystem model that supports transparency and reproducibility.

Model availability can change over time. To verify what you can deploy for your project and region, use the Foundry portal model experience.

Troubleshooting

A model or version isn’t available in your region

In the model and region support table, confirm that you selected the tab that matches your deployment type (global standard or provisioned).
Try a different region that supports the model and version.
If you’re using gpt-5 models, registration is required. Access is granted according to Microsoft’s eligibility criteria.

A tool isn’t available in your region

Not all tools are supported in every region. For example, file search isn’t available in Italy North and Brazil South, and code interpreter isn’t available in all regions.
Check the tool support by region and model table to confirm availability before you deploy.
If a tool isn’t available, choose a supported region or use a different tool.

Provisioned throughput deployment fails

Confirm you have enough PTUs available in the region.
Review Provisioned throughput and Spillover traffic management.

Agent receives rate-limit (429) errors

Implement exponential backoff with jitter in your application retry logic.
For sustained high-throughput workloads, consider provisioned throughput deployments.
Review Azure OpenAI quotas and limits for your deployment’s tokens-per-minute and requests-per-minute caps.
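
The retry guidance above can be sketched as follows. This is a minimal illustration, not an official client pattern: `RateLimitError` is a hypothetical placeholder for whatever exception your SDK or HTTP client raises on a 429 response, and the `max_attempts`, `base_delay`, and `max_delay` defaults are assumptions to tune for your workload.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the exception your client raises on HTTP 429."""


def call_with_backoff(operation, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call operation(), retrying on RateLimitError with full-jitter exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Exponential ceiling: base_delay * 2^attempt, bounded by max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random fraction of the ceiling so that
            # concurrent clients don't all retry at the same moment.
            sleep(ceiling * rng())
```

Wrap only the call that reaches the model deployment, for example `call_with_backoff(lambda: run_agent(prompt))`, where `run_agent` is your own function. The `sleep` and `rng` parameters exist so that the delay logic can be tested without waiting.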

Quotas and limits

Foundry Agent Service enforces limits in two places:

Agent Service limits. Limits for agent and thread artifacts, such as file uploads, vector store attachments, message counts, and tool registration.
Model limits. Quotas and rate limits for the model deployments your agents call.

If you’re using threads and messages, see Threads, runs, and messages in Foundry Agent Service. If you’re using file search, see Vector stores for file search.

Default quotas and limits for the service

The following table lists default limits enforced by the Agent Service. These limits apply to all Foundry projects regardless of subscription type or region.

Limit name	Limit value
Maximum number of files per agent/thread	10,000
Maximum file size for agents	512 MB
Maximum size for all uploaded files for agents	300 GB
Maximum file size in tokens for attaching to a vector store	2,000,000 tokens
Maximum number of messages per thread	100,000
Maximum size of `text` content per message	1,500,000 characters
Maximum number of tools registered per agent	128

The Agent Service limits in this table are fixed and apply uniformly across all subscription types. Agent Service doesn’t impose separate rate limits on API calls. Rate limiting is applied at the model deployment level. See Azure OpenAI quotas and limits for model-specific rate limits.
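
Because these limits are fixed, you can check some of them locally before you call the service. The following sketch covers three of the limits from the table. The function and constant names are hypothetical, the 512 MB value is assumed to mean binary megabytes, and Python's `len` counts Unicode code points, which might not match how the service counts characters.

```python
# Values copied from the limits table; names are illustrative only.
MAX_FILE_BYTES = 512 * 1024 * 1024   # 512 MB (assumes 1 MB = 1024 * 1024 bytes)
MAX_MESSAGE_CHARS = 1_500_000        # text content per message
MAX_TOOLS_PER_AGENT = 128            # tools registered per agent


def check_limits(file_size_bytes=0, message_text="", tool_count=0):
    """Return a list of problems; an empty list means none of the checked limits is exceeded."""
    problems = []
    if file_size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 512 MB")
    if len(message_text) > MAX_MESSAGE_CHARS:
        problems.append("message text exceeds 1,500,000 characters")
    if tool_count > MAX_TOOLS_PER_AGENT:
        problems.append("agent registers more than 128 tools")
    return problems
```

A local check like this doesn't replace server-side error handling. It only lets your application fail early with a clearer message.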

Limit error reference

When you exceed a limit, the Agent Service returns an error. Handle these errors gracefully in your application.

Error scenario	HTTP status	Error code	Recommended action
File too large	400	`file_size_exceeded`	Split content into smaller files
Vector store token limit	400	`token_limit_exceeded`	Reduce file content or split files
Thread message cap	400	`message_limit_exceeded`	Create a new thread
Message content too large	400	`content_size_exceeded`	Use file search for large content
Too many tools	400	`tool_limit_exceeded`	Remove unused tools
Rate limit exceeded	429	`rate_limit_exceeded`	Implement exponential backoff

The following notes describe each scenario in more detail:

File exceeds the maximum size. Uploading the file fails. Split the content into smaller files or reduce file size before you upload.
Vector store token limit. Attaching a file to a vector store fails if the file exceeds the token limit. Reduce the file content or split it into multiple files.
Thread message cap. Adding messages can fail after a thread reaches the message limit. Create a new thread for a new conversation session, or archive and rotate threads as part of your application design.
Message content size. Creating a message can fail if the text content is too large. Send smaller messages, or move large content into files and use file search.
Tool registration cap. Creating or updating an agent can fail if you register too many tools. Register only the tools you need, and prefer fewer, reusable tools.
Rate limit exceeded. API calls to the model deployment are throttled. Implement exponential backoff with jitter.
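
One way to apply the error reference in code is a lookup from error code to recommended action. The sketch below uses the error codes and actions from the table. The function name, the returned dictionary shape, and the fallback text for unrecognized codes are assumptions made for illustration. Only `rate_limit_exceeded` is treated as retryable, because the other errors need a change to the request before it can succeed.

```python
# Error codes and actions taken from the limit error reference table.
RECOMMENDED_ACTIONS = {
    "file_size_exceeded": "Split content into smaller files",
    "token_limit_exceeded": "Reduce file content or split files",
    "message_limit_exceeded": "Create a new thread",
    "content_size_exceeded": "Use file search for large content",
    "tool_limit_exceeded": "Remove unused tools",
    "rate_limit_exceeded": "Implement exponential backoff",
}

# Only throttling is worth retrying unchanged; the rest need a different request.
RETRYABLE_CODES = {"rate_limit_exceeded"}


def plan_for_error(error_code):
    """Return whether to retry the same request and the recommended action."""
    action = RECOMMENDED_ACTIONS.get(
        error_code,
        "Unrecognized code; inspect the full error response",  # illustrative fallback
    )
    return {"retry": error_code in RETRYABLE_CODES, "action": action}
```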

For file search scenarios, see Vector stores for file search for guidance on managing vector store growth.

Best practices to stay within limits

Use the following practices to reduce limit-related failures:

Keep files small and focused. Prefer multiple smaller documents over a single large document.
Avoid very large messages. Put long content in uploaded files and query it by using file search.
Plan for long conversations. Treat threads as session state and rotate to new threads when conversations become very long.
Register only required tools. Remove unused tools from agent definitions.
Monitor usage trends. Track agent activity by using Foundry Agent Service metrics to identify growth before you hit limits.
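
As an illustration of the thread rotation practice, the following sketch counts messages locally and switches to a new thread once a threshold is reached. The class name, the `create_thread` callable, and the default threshold of 1,000 messages are all hypothetical. Pick a threshold well below the 100,000-message cap that suits your conversation design.

```python
class ThreadRotator:
    """Tracks a local message count and rotates to a new thread at a threshold."""

    def __init__(self, create_thread, max_messages=1000):
        # create_thread: a callable you supply that creates a thread and returns its ID.
        self._create_thread = create_thread
        self._max_messages = max_messages
        self._thread_id = None
        self._count = 0

    def thread_for_next_message(self):
        """Return the thread ID to use for the next message, rotating if needed."""
        if self._thread_id is None or self._count >= self._max_messages:
            self._thread_id = self._create_thread()
            self._count = 0
        self._count += 1
        return self._thread_id
```

The count here is kept in memory only. If your application restarts or several workers share a thread, you would need to persist or recompute the count.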

Quotas and limits for models

Agents follow the quotas and rate limits for the model deployments they use. For current model quotas and limits, see Azure OpenAI quotas and limits.

To view or request more model quota, see Manage and increase quotas for resources with Microsoft Foundry (Foundry projects).

Request a limit increase

The limits in this article are default values for Foundry Agent Service. If your workload requires higher limits:

Model quotas. You can request increases for model deployment quotas. See Manage and increase quotas for resources with Microsoft Foundry.
Agent Service limits. The file, message, and tool limits listed in this article are fixed service limits and can’t be increased. Design your application to work within these constraints by using the best practices described earlier.
