Skip to main content

Deployment types for Microsoft Foundry Models

When you deploy a model in Microsoft Foundry, you choose a deployment type that determines:
  • Where your data is processed (global, data zone, or single region)
  • How you pay (pay-per-token or reserved capacity)
  • Performance characteristics (latency variance, throughput limits)
The service offers two main categories: standard (pay-per-token) and provisioned (reserved capacity). Within each category, you can choose global, data zone, or regional processing based on your compliance requirements.
Screenshot of the Foundry portal deployment dialog showing the deployment type selection box with Global Standard selected.
Data residency for all deployment types: Data stored at rest remains in the designated Azure geography. However, inferencing data is processed as follows:
  • Global types: May be processed in any Azure region
  • DataZone types: Processed only within the Microsoft-specified data zone (US or EU)
  • Standard/Regional types: Processed in the deployment region
Learn more about data residency.

Deployment type comparison

Deployment typeSKU codeData processingBillingBest for
Global StandardGlobalStandardAny Azure regionPay-per-tokenGeneral workloads, highest quota
Global ProvisionedGlobalProvisionedManagedAny Azure regionReserved PTUPredictable high-throughput
Global BatchGlobalBatchAny Azure region50% discount, 24-hrLarge async jobs
Data Zone StandardDataZoneStandardWithin data zonePay-per-tokenEU/US data zone compliance
Data Zone ProvisionedDataZoneProvisionedManagedWithin data zoneReserved PTUData zone + predictable throughput
Data Zone BatchDataZoneBatchWithin data zone50% discountLarge async jobs with data zone
StandardStandardSingle regionPay-per-tokenRegional compliance, low volume
Regional ProvisionedProvisionedManagedSingle regionReserved PTURegional compliance + throughput
DeveloperDeveloperTierAny Azure regionPay-per-tokenFine-tuned model evaluation only
Not all models support all deployment types. Check Foundry Models sold directly by Azure for model availability by deployment type and region.
SLA guarantees vary by deployment type. Provisioned types provide guaranteed throughput and lower latency variance. Standard types offer best-effort service. Developer deployments don’t include an SLA. For details, see the Azure SLA for Azure OpenAI Service.
For detailed pricing, see Azure OpenAI Service pricing.

Choose the right deployment type

Use the following criteria to select a deployment type:

By data residency requirement

  • No restrictions: Use Global Standard or Global Provisioned
  • EU data zone: Use DataZone Standard or DataZone Provisioned in an EU region
  • US data zone: Use DataZone Standard or DataZone Provisioned in a US region
  • Single region only: Use Standard or Regional Provisioned

By workload pattern

  • Variable, bursty traffic: Use Standard or Global Standard (pay-per-token)
  • Consistent high volume: Use Provisioned types (reserved capacity)
  • Large batch jobs (not time-sensitive): Use Global Batch or DataZone Batch (50% cost savings)
  • Fine-tuned model evaluation: Use Developer (no SLA, lowest cost)

By latency requirement

  • Low latency variance required: Use Provisioned types
  • Latency variance acceptable: Use Standard types

Data processing locations

For standard deployments, there are three options: global, data zone, and Azure geography. For provisioned deployments, there are two options: global and Azure geography. Global Standard is a common starting point for most workloads.

Global deployments

Global deployments use Azure’s global infrastructure to dynamically route traffic to available datacenters. Global deployments offer the highest initial throughput limits and broadest model availability. For high-volume workloads, you might experience increased latency variation. If you require lower latency variance at scale, use provisioned deployment types. Global deployments receive new models and features first.

Data Zone deployments

For Global deployment types, prompts and responses might be processed in any geography where the model is deployed. For DataZone deployment types, prompts and responses are processed only within the specified data zone:
  • United States: Data processed anywhere within the US
  • European Union: Data processed within any EU member nation
Learn more in the “Model region availability by deployment type” section of Foundry Models sold directly by Azure.
With Global Standard and Data Zone Standard deployment types, if the primary region experiences an interruption in service, all traffic initially routed to this region is affected. To learn more, see the business continuity and disaster recovery guide.

Global Standard

  • SKU name in code: GlobalStandard
Global Standard deployments use Azure’s global infrastructure to dynamically route traffic to available datacenters. This deployment type provides the highest default quota and eliminates the need to load balance across multiple resources. Customers with high consistent volume might experience greater latency variability. The threshold is set per model. To learn more, see the Quotas page. For applications that require lower latency variance at large workload usage, consider provisioned throughput. Global Standard supports priority processing (preview) for faster response times on a pay-as-you-go basis. To learn more, see Priority processing for Foundry models (preview).

Global Provisioned

  • SKU name in code: GlobalProvisionedManaged
Global Provisioned deployments use Azure’s global infrastructure to dynamically route traffic to available datacenters. This deployment type provides reserved model processing capacity for predictable throughput, combining global routing with guaranteed capacity. With provisioned throughput, you purchase a fixed number of provisioned throughput units (PTUs) that guarantee a specific level of processing capacity. This deployment type provides lower and more consistent latency than Global Standard. To learn more, see Provisioned throughput concepts.

Global Batch

  • SKU name in code: GlobalBatch
Global Batch handles large-scale and high-volume processing tasks. You can process asynchronous groups of requests with separate quota and a 24-hour target turnaround, at 50% less cost than Global Standard. With batch processing, rather than sending one request at a time, you send a large number of requests in a single file. Global Batch requests have a separate enqueued token quota, which avoids any disruption of your online workloads. Common use cases:
  • Large-scale data processing: Analyze datasets in parallel.
  • Content generation: Create large volumes of text, such as product descriptions or articles.
  • Document review and summarization: Process and summarize lengthy documents.
  • Customer support automation: Handle numerous queries simultaneously.
  • Data extraction and analysis: Extract and analyze information from large amounts of unstructured data.
  • Natural language processing (NLP) tasks: Perform sentiment analysis or translation on large datasets.
Batch deployments trade real-time responsiveness for cost savings. Batch requests don’t have a real-time SLA — they target completion within 24 hours but might take longer.

Data Zone Standard

  • SKU name in code: DataZoneStandard
Data Zone Standard deployments dynamically route traffic to datacenters within the Microsoft-defined data zone (US or EU). This deployment type provides higher default quotas than geography-based deployment types while keeping data within the specified zone. Customers with high consistent volume might experience greater latency variability. The threshold is set per model. To learn more, see the quotas and limits page. For workloads that require low latency variance at large volume, consider provisioned deployment types. Data Zone Standard supports priority processing (preview) for faster response times on a pay-as-you-go basis. To learn more, see Priority processing for Foundry models (preview).

Data Zone Provisioned

  • SKU name in code: DataZoneProvisionedManaged
Data Zone Provisioned deployments dynamically route traffic within the Microsoft-specified data zone (US or EU) while providing reserved model processing capacity. This deployment type combines data zone compliance with high and predictable throughput.

Data Zone Batch

  • SKU name in code: DataZoneBatch
Data Zone Batch deployments provide the same functionality as Global Batch, including 50% cost savings and 24-hour turnaround. Traffic is routed only to datacenters within the Microsoft-defined data zone (US or EU).

Standard

  • SKU name in code: Standard
Standard deployments use pay-per-token billing. You pay only for what you consume. Models available in each region and throughput might be limited. Standard deployments are suited for low-to-medium volume workloads with high burstiness. Customers with high consistent volume might experience greater latency variability.

Regional Provisioned

  • SKU name in code: ProvisionedManaged
Regional Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it’s ready for you. Throughput is defined in terms of provisioned throughput units (PTUs), which is a normalized way of representing the throughput for your deployment. Each model-version pair requires different amounts of PTUs to deploy, and provides different amounts of throughput per PTU. Minimum PTU requirements vary by model. For current minimums and available capacity, see Provisioned throughput concepts.

Developer (for fine-tuned models)

  • SKU name in code: DeveloperTier
The Developer deployment type is designed for fine-tuned model evaluation only. It provides cost-efficient testing of custom models but doesn’t include data residency guarantees or an SLA. Developer deployments have a fixed 24-hour lifetime and are automatically deleted after expiration. To learn more about using the Developer deployment type, see the fine-tuning guide.

Troubleshooting deployment issues

Common issues when creating or using deployments:
IssueCauseResolution
Deployment type unavailableModel doesn’t support the selected typeCheck model availability by deployment type
Quota exceededSubscription limit reached for tokens per minuteRequest quota increase in Azure portal or use a different region
Region unavailableModel not deployed in selected regionSelect a region from the model’s availability list
Provisioned capacity unavailableNo PTU capacity in regionTry a different region or use Global Provisioned for broader availability
For quota limits by deployment type, see Foundry Models quotas and limits.

Restrict deployment types with Azure Policy

Azure Policy helps enforce organizational standards and assess compliance at scale. Through its compliance dashboard, you can evaluate the overall state of the environment and drill down to per-resource, per-policy granularity. Azure Policy also supports bulk remediation for existing resources and automatic remediation for new resources. Learn more about Azure Policy and specific built-in controls for Foundry Tools. Use the following policy to disable access to a specific Foundry deployment type. Replace GlobalStandard with the SKU name for the deployment type you want to restrict.
{
    "mode": "All",
    "policyRule": {
        "if": {
            "allOf": [
                {
                    "field": "type",
                    "equals": "Microsoft.CognitiveServices/accounts/deployments"
                },
                {
                    "field": "Microsoft.CognitiveServices/accounts/deployments/sku.name",
                    "equals": "GlobalStandard"
                }
            ]
        }
    }
}