When you deploy a model in Microsoft Foundry in Azure Government, you choose a deployment type that determines:
  • Where your data is processed (data zone or single region)
  • How you pay (pay-per-token or reserved capacity)
  • Performance characteristics (latency variance, throughput limits)
The service offers two main categories: standard (pay-per-token) and provisioned (reserved capacity). Within each category, you can choose data zone or single-region processing based on your requirements.
Screenshot of the Foundry portal deployment dialog showing the deployment type selection box with Global Standard selected.
Data residency for all deployment types: Data stored at rest remains in the designated Azure region. However, inferencing data is processed as follows:
  • USGov DataZone types: Processed only within the Azure Government cloud USGov data zone
  • Standard/Regional types: Processed in the deployment region

Deployment type comparison

| Deployment type | SKU code | Data processing | Billing | Best for |
| --- | --- | --- | --- | --- |
| Data Zone Standard | DataZoneStandard | Within data zone | Pay-per-token | USGov data zone compliance |
| Data Zone Provisioned | DataZoneProvisionedManaged | Within data zone | Reserved PTU | USGov data zone + predictable throughput |
| Standard | Standard | Single region | Pay-per-token | Regional compliance, low volume |
| Regional Provisioned | ProvisionedManaged | Single region | Reserved PTU | Regional compliance + throughput |
Not all models support all deployment types. Check Foundry Models sold by Azure for model availability by deployment type and region.
SLA guarantees vary by deployment type. Provisioned types provide guaranteed throughput and lower latency variance. Standard types offer best-effort service. For details, see the Azure SLA for Azure OpenAI Service.
For detailed pricing, see Azure OpenAI Service pricing.

Choose the right deployment type

Use the following criteria to select a deployment type:

By data residency requirement

  • USGov data zone: Use Data Zone Standard or Data Zone Provisioned in an Azure Government region
  • Single region only: Use Standard or Regional Provisioned

By workload pattern

  • Variable, bursty traffic: Use Standard or Data Zone Standard (pay-per-token)
  • Consistent high volume: Use Provisioned types (reserved capacity)

By latency requirement

  • Low latency variance required: Use Provisioned types
  • Latency variance acceptable: Use Standard types
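
The criteria above can be condensed into a small lookup. The following Python sketch is illustrative only: the function name `choose_sku` and its two boolean parameters are hypothetical, and the only values taken from this article are the four SKU codes. It treats "consistent high volume" and "low latency variance required" as the same signal (reserved capacity), which is a simplification.

```python
def choose_sku(data_zone: bool, reserved_capacity: bool) -> str:
    """Return the SKU code that matches the selection criteria.

    data_zone: True if data may be processed anywhere in the USGov data
        zone, False if processing must stay in a single region.
    reserved_capacity: True for consistent high volume or low latency
        variance (provisioned), False for variable, bursty traffic
        (pay-per-token).
    """
    # Keys are (data_zone, reserved_capacity); values are the SKU codes
    # listed in the deployment type comparison.
    sku_by_requirement = {
        (True, False): "DataZoneStandard",
        (True, True): "DataZoneProvisionedManaged",
        (False, False): "Standard",
        (False, True): "ProvisionedManaged",
    }
    return sku_by_requirement[(data_zone, reserved_capacity)]


# Bursty traffic that may use the whole USGov data zone.
print(choose_sku(data_zone=True, reserved_capacity=False))
# Steady high volume that must stay in one region.
print(choose_sku(data_zone=False, reserved_capacity=True))
```

A helper like this only narrows the choice. Model availability for the resulting type and region still needs to be checked separately.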

Data Zone deployments

For Data Zone deployment types, prompts and responses are processed only within the specified data zone:
  • USGov: Data is processed within the two Azure Government regions (USGovArizona and USGovVirginia)
Learn more in the “Model region availability by deployment type” section of Foundry Models sold by Azure.
With Data Zone Standard deployment types, if the primary region experiences an interruption in service, all traffic initially routed to this region is affected. To learn more, see the high availability and disaster recovery guide.

Data Zone Standard

  • SKU name in code: DataZoneStandard
Data Zone Standard deployments dynamically route traffic to datacenters within the Microsoft-defined data zone (USGov). This deployment type provides higher default quotas than geography-based deployment types while keeping data within the specified zone. Customers with high consistent volume might experience greater latency variability. The volume threshold at which this happens is set per model. To learn more about Azure OpenAI quotas in Azure Government, see Quotas and limits in Azure OpenAI. For workloads that require low latency variance at large volume, consider provisioned deployment types.

Data Zone Provisioned

  • SKU name in code: DataZoneProvisionedManaged
Data Zone Provisioned deployments dynamically route traffic within the Microsoft-specified data zone (USGov) while providing reserved model processing capacity. This deployment type combines data zone compliance with high and predictable throughput.

Standard

  • SKU name in code: Standard
Standard deployments use pay-per-token billing, so you pay only for what you consume. Model availability varies by region, and throughput might be limited. Standard deployments are suited for low-to-medium volume workloads with high burstiness. Customers with high consistent volume might experience greater latency variability.

Regional Provisioned

  • SKU name in code: ProvisionedManaged
Regional Provisioned deployments allow you to specify the amount of throughput you require in a deployment. The service then allocates the necessary model processing capacity and ensures it’s ready for you. Throughput is defined in terms of provisioned throughput units (PTUs), which are a normalized way of representing the throughput for your deployment. Each model-version pair requires a different number of PTUs to deploy and provides a different amount of throughput per PTU. Minimum PTU requirements vary by model. For current minimums and available capacity, see Provisioned throughput concepts.
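
As a rough illustration of how the per-PTU throughput and the per-model minimum interact, the following Python sketch estimates a PTU count. The function name `ptus_to_request`, the unit (tokens per minute), and every number in the usage lines are placeholder assumptions, not published figures. Any purchase-increment rule a model might have is not modeled here, so take real values from Provisioned throughput concepts.

```python
import math


def ptus_to_request(required_throughput: float,
                    throughput_per_ptu: float,
                    minimum_ptus: int) -> int:
    """Estimate how many PTUs a deployment needs.

    required_throughput and throughput_per_ptu must use the same unit
    (assumed here to be tokens per minute). minimum_ptus is the
    model-specific minimum deployment size.
    """
    if required_throughput < 0:
        raise ValueError("required_throughput must not be negative")
    if throughput_per_ptu <= 0 or minimum_ptus <= 0:
        raise ValueError("throughput_per_ptu and minimum_ptus must be positive")
    # Round up so the deployment never falls short of the target.
    needed = math.ceil(required_throughput / throughput_per_ptu)
    # Never go below the model's minimum deployment size.
    return max(needed, minimum_ptus)


# Placeholder numbers: 2,500 tokens per minute per PTU, minimum of 15 PTUs.
print(ptus_to_request(120_000, 2_500, 15))  # 48
print(ptus_to_request(10_000, 2_500, 15))   # 15 (the minimum applies)
```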

Troubleshooting deployment issues

Common issues when creating or using deployments:
| Issue | Cause | Resolution |
| --- | --- | --- |
| Deployment type unavailable | Model doesn’t support the selected type | Check model availability by deployment type |
| Quota exceeded | Subscription limit reached for tokens per minute | Request quota increase at Azure Government AOAI Quota or use a different region |
| Region unavailable | Model not deployed in selected region | Select a region from the model’s availability list |
| Provisioned capacity unavailable | No PTU capacity in region | Try a different region or use Data Zone Provisioned for broader availability |
For Azure OpenAI quota limits by deployment type in Azure Government, see Quotas and limits in Azure OpenAI.
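
The resolution for "Provisioned capacity unavailable" can be expressed as an ordered list of alternatives to try. The Python sketch below is a hypothetical planning helper, not part of any Azure SDK: the name `provisioned_fallbacks` and the ordering (other regions first, Data Zone Provisioned last) are assumptions. Switching to Data Zone Provisioned changes where data is processed, so it belongs in the list only if data zone processing is acceptable for the workload.

```python
from typing import Iterator, Sequence, Tuple


def provisioned_fallbacks(region: str,
                          other_regions: Sequence[str],
                          ) -> Iterator[Tuple[str, str]]:
    """Yield (sku_code, region) pairs to try, in order of preference."""
    # First choice: Regional Provisioned in the preferred region.
    yield ("ProvisionedManaged", region)
    # Next: the same deployment type in each alternative region.
    for alternative in other_regions:
        if alternative != region:
            yield ("ProvisionedManaged", alternative)
    # Last: Data Zone Provisioned, which widens data processing
    # from a single region to the USGov data zone.
    yield ("DataZoneProvisionedManaged", region)


for sku_code, candidate in provisioned_fallbacks("USGovVirginia", ["USGovArizona"]):
    print(sku_code, candidate)
```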

Abuse Monitoring in Azure Government

Not all features of Abuse Monitoring are enabled for Azure OpenAI deployments in Azure Government. You are responsible for implementing reasonable technical and operational measures to detect and mitigate any use of the service in violation of the Product Terms. Automated Content Classification and Filtering remains enabled by default for Azure Government. If modified content filters are required, apply at Azure Government Modified Filter Application.