Deployment types for Microsoft Foundry Models

When you deploy a model in Microsoft Foundry, you choose a deployment type that determines:

Where your data is processed (global, data zone, or single region)
How you pay (pay-per-token or reserved capacity)
Performance characteristics (latency variance, throughput limits)

The service offers two main categories: standard (pay-per-token) and provisioned (reserved capacity). Within each category, you can choose global, data zone, or regional processing based on your compliance requirements.

Screenshot of the Foundry portal deployment dialog showing the deployment type selection box with Global Standard selected.

Data residency for all deployment types: Data stored at rest remains in the designated Azure geography. However, inferencing data is processed as follows:

Global types: May be processed in any Azure region
DataZone types: Processed only within the Microsoft-specified data zone (US or EU)
Standard/Regional types: Processed in the deployment region

Learn more about data residency.

Deployment type comparison

Deployment type	SKU code	Data processing	Billing	Best for
Global Standard	`GlobalStandard`	Any Azure region	Pay-per-token	General workloads, highest quota
Global Provisioned	`GlobalProvisionedManaged`	Any Azure region	Reserved PTU	Predictable high-throughput
Global Batch	`GlobalBatch`	Any Azure region	50% discount, 24-hr	Large async jobs
Data Zone Standard	`DataZoneStandard`	Within data zone	Pay-per-token	EU/US data zone compliance
Data Zone Provisioned	`DataZoneProvisionedManaged`	Within data zone	Reserved PTU	Data zone + predictable throughput
Data Zone Batch	`DataZoneBatch`	Within data zone	50% discount	Large async jobs with data zone
Standard	`Standard`	Single region	Pay-per-token	Regional compliance, low volume
Regional Provisioned	`ProvisionedManaged`	Single region	Reserved PTU	Regional compliance + throughput
Developer	`DeveloperTier`	Any Azure region	Pay-per-token	Fine-tuned model evaluation only

Not all models support all deployment types. Check Foundry Models sold directly by Azure for model availability by deployment type and region.

SLA guarantees vary by deployment type. Provisioned types provide guaranteed throughput and lower latency variance. Standard types offer best-effort service. Developer deployments don’t include an SLA. For details, see the Azure SLA for Azure OpenAI Service.

For detailed pricing, see Azure OpenAI Service pricing.

Choose the right deployment type

Use the following criteria to select a deployment type:

By data residency requirement

No restrictions: Use Global Standard or Global Provisioned
EU data zone: Use DataZone Standard or DataZone Provisioned in an EU region
US data zone: Use DataZone Standard or DataZone Provisioned in a US region
Single region only: Use Standard or Regional Provisioned

By workload pattern

Variable, bursty traffic: Use Standard or Global Standard (pay-per-token)
Consistent high volume: Use Provisioned types (reserved capacity)
Large batch jobs (not time-sensitive): Use Global Batch or DataZone Batch (50% cost savings)
Fine-tuned model evaluation: Use Developer (no SLA, lowest cost)

By latency requirement

Low latency variance required: Use Provisioned types
Latency variance acceptable: Use Standard types

Data processing locations

For standard deployments, there are three options: global, data zone, and Azure geography. For provisioned deployments, there are two options: global and Azure geography. Global Standard is a common starting point for most workloads.

Global deployments

Global deployments use Azure’s global infrastructure to dynamically route traffic to available datacenters. Global deployments offer the highest initial throughput limits and broadest model availability. For high-volume workloads, you might experience increased latency variation. If you require lower latency variance at scale, use provisioned deployment types. Global deployments receive new models and features first.

Data Zone deployments

For Global deployment types, prompts and responses might be processed in any geography where the model is deployed. For DataZone deployment types, prompts and responses are processed only within the specified data zone:

United States: Data processed anywhere within the US
European Union: Data processed within any EU member nation

Learn more in the “Model region availability by deployment type” section of Foundry Models sold directly by Azure.

With Global Standard and Data Zone Standard deployment types, if the primary region experiences an interruption in service, all traffic initially routed to this region is affected. To learn more, see the business continuity and disaster recovery guide.

Global Standard

SKU name in code: GlobalStandard

Global Standard deployments use Azure’s global infrastructure to dynamically route traffic to available datacenters. This deployment type provides the highest default quota and eliminates the need to load balance across multiple resources. Customers with high consistent volume might experience greater latency variability. The threshold is set per model. To learn more, see the Quotas page. For applications that require lower latency variance at large workload usage, consider provisioned throughput. Global Standard supports priority processing (preview) for faster response times on a pay-as-you-go basis. To learn more, see Priority processing for Foundry models (preview).

Global Provisioned

SKU name in code: GlobalProvisionedManaged

Global Provisioned deployments use Azure’s global infrastructure to dynamically route traffic to available datacenters. This deployment type provides reserved model processing capacity for predictable throughput, combining global routing with guaranteed capacity. With provisioned throughput, you purchase a fixed number of provisioned throughput units (PTUs) that guarantee a specific level of processing capacity. This deployment type provides lower and more consistent latency than Global Standard. To learn more, see Provisioned throughput concepts.

Global Batch

SKU name in code: GlobalBatch

Global Batch handles large-scale and high-volume processing tasks. You can process asynchronous groups of requests with separate quota and a 24-hour target turnaround, at 50% less cost than Global Standard. With batch processing, rather than sending one request at a time, you send a large number of requests in a single file. Global Batch requests have a separate enqueued token quota, which avoids any disruption of your online workloads. Common use cases:

Large-scale data processing: Analyze datasets in parallel.
Content generation: Create large volumes of text, such as product descriptions or articles.
Document review and summarization: Process and summarize lengthy documents.
Customer support automation: Handle numerous queries simultaneously.
Data extraction and analysis: Extract and analyze information from large amounts of unstructured data.
Natural language processing (NLP) tasks: Perform sentiment analysis or translation on large datasets.

Batch deployments trade real-time responsiveness for cost savings. Batch requests don’t have a real-time SLA — they target completion within 24 hours but might take longer.

Data Zone Standard

SKU name in code: DataZoneStandard

Data Zone Standard deployments dynamically route traffic to datacenters within the Microsoft-defined data zone (US or EU). This deployment type provides higher default quotas than geography-based deployment types while keeping data within the specified zone. Customers with high consistent volume might experience greater latency variability. The threshold is set per model. To learn more, see the quotas and limits page. For workloads that require low latency variance at large volume, consider provisioned deployment types. Data Zone Standard supports priority processing (preview) for faster response times on a pay-as-you-go basis. To learn more, see Priority processing for Foundry models (preview).

Data Zone Provisioned

SKU name in code: DataZoneProvisionedManaged

Data Zone Provisioned deployments dynamically route traffic within the Microsoft-specified data zone (US or EU) while providing reserved model processing capacity. This deployment type combines data zone compliance with high and predictable throughput.

Data Zone Batch

SKU name in code: DataZoneBatch

Data Zone Batch deployments provide the same functionality as Global Batch, including 50% cost savings and 24-hour turnaround. Traffic is routed only to datacenters within the Microsoft-defined data zone (US or EU).

Standard

SKU name in code: Standard

Standard deployments use pay-per-token billing. You pay only for what you consume. Models available in each region and throughput might be limited. Standard deployments are suited for low-to-medium volume workloads with high burstiness. Customers with high consistent volume might experience greater latency variability.

Regional Provisioned

SKU name in code: ProvisionedManaged

Regional Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it’s ready for you. Throughput is defined in terms of provisioned throughput units (PTUs), which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTUs to deploy, and provides different amounts of throughput per PTU. Minimum PTU requirements vary by model. For current minimums and available capacity, see Provisioned throughput concepts.

Developer (for fine-tuned models)

SKU name in code: DeveloperTier

The Developer deployment type is designed for fine-tuned model evaluation only. It provides cost-efficient testing of custom models but doesn’t include data residency guarantees or an SLA. Developer deployments have a fixed 24-hour lifetime and are automatically deleted after expiration. To learn more about using the Developer deployment type, see the fine-tuning guide.

Troubleshooting deployment issues

Common issues when creating or using deployments:

Issue	Cause	Resolution
Deployment type unavailable	Model doesn’t support the selected type	Check model availability by deployment type
Quota exceeded	Subscription limit reached for tokens per minute	Request quota increase in Azure portal or use a different region
Region unavailable	Model not deployed in selected region	Select a region from the model’s availability list
Provisioned capacity unavailable	No PTU capacity in region	Try a different region or use Global Provisioned for broader availability

For quota limits by deployment type, see Foundry Models quotas and limits.

Restrict deployment types with Azure Policy

Azure Policy helps enforce organizational standards and assess compliance at scale. Through its compliance dashboard, you can evaluate the overall state of the environment and drill down to per-resource, per-policy granularity. Azure Policy also supports bulk remediation for existing resources and automatic remediation for new resources. Learn more about Azure Policy and specific built-in controls for Foundry Tools. Use the following policy to disable access to a specific Foundry deployment type. Replace GlobalStandard with the SKU name for the deployment type you want to restrict.

{
    "mode": "All",
    "policyRule": {
        "if": {
            "allOf": [
                {
                    "field": "type",
                    "equals": "Microsoft.CognitiveServices/accounts/deployments"
                },
                {
                    "field": "Microsoft.CognitiveServices/accounts/deployments/sku.name",
                    "equals": "GlobalStandard"
                }
            ]
        }
    }
}

What is Microsoft Foundry (new)?

Get started

Agent development

Agent tools & integration

Model capabilities

Fine-tuning

Manage agents, models, & tools

Observability, evaluation, & tracing

Developer experience

API & SDK

Responsible AI

Best practices

Setup & configure

Security & governance

Operate & support

Understanding deployment types in Microsoft Foundry Models

Deployment types for Microsoft Foundry Models

Deployment type comparison

Choose the right deployment type

By data residency requirement

By workload pattern

By latency requirement

Data processing locations

Global deployments

Data Zone deployments

Global Standard

Global Provisioned

Global Batch

Data Zone Standard

Data Zone Provisioned

Data Zone Batch

Standard

Regional Provisioned

Developer (for fine-tuned models)

Troubleshooting deployment issues

Restrict deployment types with Azure Policy

What is Microsoft Foundry (new)?

Get started

Agent development

Agent tools & integration

Model capabilities

Fine-tuning

Manage agents, models, & tools

Observability, evaluation, & tracing

Developer experience

API & SDK

Responsible AI

Best practices

Setup & configure

Security & governance

Operate & support

​Deployment types for Microsoft Foundry Models

​Deployment type comparison

​Choose the right deployment type

​By data residency requirement

​By workload pattern

​By latency requirement

​Data processing locations

​Global deployments

​Data Zone deployments

​Global Standard

​Global Provisioned

​Global Batch

​Data Zone Standard

​Data Zone Provisioned

​Data Zone Batch

​Standard

​Regional Provisioned

​Developer (for fine-tuned models)

​Troubleshooting deployment issues

​Restrict deployment types with Azure Policy

​Related content

Deployment types for Microsoft Foundry Models

Deployment type comparison

Choose the right deployment type

By data residency requirement

By workload pattern

By latency requirement

Data processing locations

Global deployments

Data Zone deployments

Global Standard

Global Provisioned

Global Batch

Data Zone Standard

Data Zone Provisioned

Data Zone Batch

Standard

Regional Provisioned

Developer (for fine-tuned models)

Troubleshooting deployment issues

Restrict deployment types with Azure Policy

Related content