This article contains brief example templates to help you get started programmatically creating Azure OpenAI deployments that use quota to set tokens-per-minute (TPM) rate limits. With the introduction of quota, you must use API version 2023-05-01 for resource management activities. This API version applies only to managing your resources and doesn’t affect the API version you use for inference calls such as completions, chat completions, embeddings, and image generation.

Prerequisites

Before you create deployments programmatically, complete the prerequisites for the tool you plan to use. Each tab in this article lists any tool-specific prerequisites, such as the required Azure CLI or Az PowerShell module version.

Create a deployment and query usage

Select the tab for the tool or template language you want to use. Each tab includes a deployment example that sets a TPM-based capacity, followed by a usage query that returns your remaining quota in the specified region.

Deployment

PUT https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.CognitiveServices/accounts/{accountName}/deployments/{deploymentName}?api-version=2023-05-01
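The request URL is the template above with its path parameters filled in. The following is a minimal bash sketch of that substitution. The helper name deployment_url and the ARM_API_VERSION variable are hypothetical, not part of any Azure tooling.

```bash
# Management API version required for quota-based deployments.
ARM_API_VERSION="2023-05-01"

# Hypothetical helper: build the deployment PUT URL from its path parameters.
# Arguments (in path order): subscriptionId resourceGroupName accountName deploymentName
deployment_url() {
  if [ "$#" -ne 4 ]; then
    echo "usage: deployment_url SUBSCRIPTION_ID RESOURCE_GROUP ACCOUNT DEPLOYMENT" >&2
    return 1
  fi
  printf 'https://management.azure.com/subscriptions/%s/resourceGroups/%s/providers/Microsoft.CognitiveServices/accounts/%s/deployments/%s?api-version=%s\n' \
    "$1" "$2" "$3" "$4" "$ARM_API_VERSION"
}

# Reproduces the URL used in the example request in this article.
deployment_url 00000000-0000-0000-0000-000000000000 resource-group-temp \
  docs-openai-test-001 gpt-4o-test-deployment
```

The helper does no URL encoding. It assumes names made of letters, digits, and hyphens, like the ones in the example.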
Path parameters

| Parameter | Type | Required? | Description |
|---|---|---|---|
| accountName | string | Required | The name of your Azure OpenAI resource. |
| deploymentName | string | Required | The deployment name you chose when you deployed an existing model, or the name you would like a new model deployment to have. |
| resourceGroupName | string | Required | The name of the resource group associated with this model deployment. |
| subscriptionId | string | Required | The ID of the associated subscription. |
| api-version | string | Required | The API version to use for this operation, in YYYY-MM-DD format. See Supported versions. |

Request body

This is only a subset of the available request body parameters. For the full list, refer to the REST API reference documentation.
| Parameter | Type | Description |
|---|---|---|
| sku | Sku | The resource model definition representing the SKU. |
| capacity | integer | The amount of quota you’re assigning to this deployment, set inside the sku object as in the example request. A value of 1 equals 1,000 tokens per minute (TPM); a value of 10 equals 10,000 TPM. |
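Because one unit of capacity is 1,000 TPM, converting between the two takes a single multiplication. A minimal request body needs only the SKU, the capacity, and the model fields. The following bash sketch is illustrative. deployment_body and capacity_to_tpm are hypothetical helper names, and the body mirrors the fields in the example request rather than the full request schema.

```bash
# Hypothetical helper: print a minimal Standard deployment body.
# Arguments: model name, model version, capacity (1 unit = 1,000 TPM).
# Values are not JSON-escaped, so plain names such as gpt-4o are assumed.
deployment_body() {
  printf '{"sku":{"name":"Standard","capacity":%d},"properties":{"model":{"format":"OpenAI","name":"%s","version":"%s"}}}\n' \
    "$3" "$1" "$2"
}

# Hypothetical helper: convert an sku.capacity value to tokens per minute.
capacity_to_tpm() {
  echo $(( $1 * 1000 ))
}

deployment_body gpt-4o 2024-11-20 10
capacity_to_tpm 10   # prints 10000
```

The first call prints the same body as the example request, without the extra spaces.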

Example request

curl -X PUT "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/resource-group-temp/providers/Microsoft.CognitiveServices/accounts/docs-openai-test-001/deployments/gpt-4o-test-deployment?api-version=2023-05-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN" \
  -d '{"sku":{"name":"Standard","capacity":10},"properties": {"model": {"format": "OpenAI","name": "gpt-4o","version": "2024-11-20"}}}'
There are multiple ways to generate an authorization token. The easiest method for initial testing is to launch Cloud Shell from the Azure portal and run az account get-access-token. Use the accessToken value from the command’s output as a temporary authorization token for API testing.
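To avoid pasting the token by hand, you can capture it from the Azure CLI in the same shell session. The following minimal bash sketch assumes you're signed in with az login or working in Cloud Shell. arm_auth_header is a hypothetical helper name. DEPLOYMENT_URL and DEPLOYMENT_BODY in the commented example are placeholders you set yourself.

```bash
# Hypothetical helper: print an Authorization header for Resource Manager calls.
# With an argument, use that token as-is; otherwise ask the signed-in Azure CLI
# for one (az account get-access-token targets Resource Manager by default).
arm_auth_header() {
  local token="${1:-$(az account get-access-token --query accessToken --output tsv)}"
  if [ -z "$token" ]; then
    echo "could not obtain an access token; run az login first" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s\n' "$token"
}

# Example use (placeholders DEPLOYMENT_URL and DEPLOYMENT_BODY are yours to set):
# curl -X PUT "$DEPLOYMENT_URL" \
#   -H "Content-Type: application/json" \
#   -H "$(arm_auth_header)" \
#   -d "$DEPLOYMENT_BODY"
```

These tokens expire, so fetching a fresh one per session is simpler than storing one.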
For more information, see the REST API reference documentation for usages and deployment.

Usage

To query your quota usage in a given region for a specific subscription, send the following request:
GET https://management.azure.com/subscriptions/{subscriptionId}/providers/Microsoft.CognitiveServices/locations/{location}/usages?api-version=2023-05-01
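As with the deployment URL, the usage URL needs only its two path parameters filled in. The following is a minimal bash sketch. usages_url and ARM_API_VERSION are hypothetical names.

```bash
# Management API version used for the usage query.
ARM_API_VERSION="2023-05-01"

# Hypothetical helper: build the regional usage GET URL.
# Arguments: subscriptionId location (for example, eastus)
usages_url() {
  if [ "$#" -ne 2 ]; then
    echo "usage: usages_url SUBSCRIPTION_ID LOCATION" >&2
    return 1
  fi
  printf 'https://management.azure.com/subscriptions/%s/providers/Microsoft.CognitiveServices/locations/%s/usages?api-version=%s\n' \
    "$1" "$2" "$ARM_API_VERSION"
}

usages_url 00000000-0000-0000-0000-000000000000 eastus
```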
Path parameters

| Parameter | Type | Required? | Description |
|---|---|---|---|
| subscriptionId | string | Required | The ID of the associated subscription. |
| location | string | Required | The location to view usage for, for example, eastus. |
| api-version | string | Required | The API version to use for this operation, in YYYY-MM-DD format. See Supported versions. |

Example request

curl -X GET "https://management.azure.com/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.CognitiveServices/locations/eastus/usages?api-version=2023-05-01" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_AUTH_TOKEN"
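Assuming the response lists each quota entry with its current value and its limit, the quota you have left for an entry is the limit minus the current value. The following minimal bash sketch shows that arithmetic. remaining_quota is a hypothetical helper, and the numbers are illustrative, not real limits. The commented jq pipeline assumes jq is installed and that each entry in the response’s value array has name.value, currentValue, and limit fields. Check these fields against your own response.

```bash
# Hypothetical helper: remaining quota for one entry = limit - current value.
# Arguments: limit currentValue
remaining_quota() {
  if [ "$#" -ne 2 ]; then
    echo "usage: remaining_quota LIMIT CURRENT_VALUE" >&2
    return 1
  fi
  echo $(( $1 - $2 ))
}

remaining_quota 100 30   # prints 70 (illustrative numbers)

# Listing every entry from a live response (assumed field names, requires jq):
# curl -s "$USAGES_URL" -H "Authorization: Bearer $TOKEN" \
#   | jq -r '.value[] | "\(.name.value)\t\(.limit - .currentValue)"'
```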