For details on the data you can collect for Azure OpenAI in Microsoft Foundry Models and how to use it, see Monitor Azure OpenAI.

Supported metrics for Microsoft.CognitiveServices/accounts

The following list covers the most important metrics to monitor for Azure OpenAI. A longer list of all available metrics for this namespace appears later in this article and gives more detail on each metric in this shorter list. Treat the following list as the most up-to-date information, because the Azure team is still refreshing the tables in the later sections.
Don’t confuse the metrics in this section with the legacy Latency metric listed under Cognitive Services - HTTP Requests later in this article. The legacy Latency metric isn’t designed for Azure OpenAI workloads and produces misleading results when used to diagnose Azure OpenAI latency. For Azure OpenAI latency monitoring, use Time to Response (AzureOpenAITimeToResponse), Time to Last Byte (AzureOpenAITTLTInMS), Time Between Tokens (AzureOpenAINormalizedTBTInMS), or Normalized Time to First Byte (AzureOpenAINormalizedTTFTInMS). For guidance on interpreting these metrics, see Performance and latency.
  • Azure OpenAI Requests
  • Active Tokens
  • Generated Completion Tokens
  • Processed FineTuned Training Hours
  • Processed Inference Tokens
  • Processed Prompt Tokens
  • Provisioned-managed Utilization V2
  • Prompt Token Cache Match Rate
  • Time to Response
  • Time Between Tokens
  • Time to Last Byte
  • Normalized Time to First Byte
  • Tokens per Second
You can also monitor Content Safety metrics that other related services use.
  • Blocked Volume
  • Harmful Volume Detected
  • Potential Abusive User Count
  • Safety System Event
  • Total Volume Sent for Safety Check
The Provisioned-managed Utilization metric is deprecated and no longer recommended. It is replaced by the Provisioned-managed Utilization V2 metric.
Tokens per Second, Time to Response, and Time Between Tokens aren’t currently available for Standard deployments.

Quick reference: Key metrics by use case

Use this table to find the right metric for a specific monitoring goal. For end-to-end guidance on interpreting these metrics, see Performance and latency.
| I want to monitor… | Use this metric | REST API name |
| --- | --- | --- |
| Overall response time | Time to Last Byte | AzureOpenAITTLTInMS |
| First-token responsiveness (streaming) | Time to Response | AzureOpenAITimeToResponse |
| Token generation speed | Time Between Tokens | AzureOpenAINormalizedTBTInMS |
| First-token efficiency normalized by prompt size | Normalized Time to First Byte | AzureOpenAINormalizedTTFTInMS |
| Output token volume per request | Generated Completion Tokens | GeneratedTokens |
| Input token volume per request | Processed Prompt Tokens | ProcessedPromptTokens |
| PTU capacity utilization | Provisioned-managed Utilization V2 | AzureOpenAIProvisionedManagedUtilizationV2 |
| Request volume and errors | Azure OpenAI Requests | AzureOpenAIRequests |
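As a rough illustration of how the REST API names in the table above might be used, the following Python sketch maps each monitoring goal to its metric name and builds a request URL for the Azure Monitor metrics REST endpoint. The goal keys, the `build_metrics_url` helper, the resource ID, and the default interval and aggregation are assumptions made for this example; they don't come from this article. The sketch only builds the URL. Authentication and sending the request are left out.

```python
from urllib.parse import urlencode

# REST API names copied from the quick-reference table.
# The dictionary keys are hypothetical labels chosen for this sketch.
METRIC_BY_GOAL = {
    "overall_response_time": "AzureOpenAITTLTInMS",
    "first_token_responsiveness": "AzureOpenAITimeToResponse",
    "token_generation_speed": "AzureOpenAINormalizedTBTInMS",
    "normalized_first_token": "AzureOpenAINormalizedTTFTInMS",
    "output_tokens": "GeneratedTokens",
    "input_tokens": "ProcessedPromptTokens",
    "ptu_utilization": "AzureOpenAIProvisionedManagedUtilizationV2",
    "requests": "AzureOpenAIRequests",
}


def build_metrics_url(resource_id, goals, timespan,
                      interval="PT5M", aggregation="Average",
                      api_version="2018-01-01"):
    """Build an Azure Monitor metrics URL for the given monitoring goals.

    resource_id is the full ARM ID of the resource, starting with
    "/subscriptions/". timespan is an ISO 8601 "start/end" string.
    The interval and aggregation defaults are assumptions; choose the
    values that suit each metric.
    """
    names = ",".join(METRIC_BY_GOAL[goal] for goal in goals)
    query = urlencode({
        "api-version": api_version,
        "metricnames": names,
        "timespan": timespan,
        "interval": interval,
        "aggregation": aggregation,
    })
    return (
        "https://management.azure.com"
        f"{resource_id}/providers/Microsoft.Insights/metrics?{query}"
    )


# Hypothetical resource ID, for illustration only.
example_resource = (
    "/subscriptions/00000000-0000-0000-0000-000000000000"
    "/resourceGroups/my-rg/providers/Microsoft.CognitiveServices"
    "/accounts/my-openai"
)
example_url = build_metrics_url(
    example_resource,
    ["overall_response_time", "output_tokens"],
    "2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
)
```

Requesting a latency metric and a token metric in the same call keeps both series on the same time grain, which makes them easier to compare.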
Always pair a latency metric with a token count metric. A latency increase without a corresponding token increase might indicate a real issue. A latency increase with a proportional token increase is expected behavior.
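The pairing rule above can be sketched as a small comparison of relative changes. In the following Python example, the function name, the three labels, and the 20 percent tolerance are assumptions made for illustration; they aren't thresholds defined by Azure. The inputs could be, for example, average Time to Last Byte and average Generated Completion Tokens over a baseline window and a current window.

```python
def classify_latency_change(latency_before, latency_after,
                            tokens_before, tokens_after,
                            tolerance=0.2):
    """Compare latency growth with token growth between two windows.

    Returns one of three hypothetical labels:
      "stable"      - latency grew by no more than the tolerance
      "expected"    - latency grew, but roughly in line with token volume
      "investigate" - latency grew noticeably more than token volume
    """
    if latency_before <= 0 or tokens_before <= 0:
        raise ValueError("baseline values must be positive")

    # Relative growth, for example 0.5 means a 50 percent increase.
    latency_growth = latency_after / latency_before - 1
    token_growth = tokens_after / tokens_before - 1

    if latency_growth <= tolerance:
        return "stable"
    if latency_growth <= token_growth + tolerance:
        return "expected"
    return "investigate"


# Latency doubled and tokens doubled: proportional, so likely expected.
proportional = classify_latency_change(1000, 2000, 500, 1000)

# Latency doubled while tokens barely moved: worth a closer look.
disproportionate = classify_latency_change(1000, 2000, 500, 510)
```

A simple ratio like this is only a first-pass signal. It assumes latency scales roughly linearly with token count, which might not hold for every model or deployment type.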
The metrics under Cognitive Services - HTTP Requests later in this article are legacy Cognitive Services metrics and aren’t designed for Azure OpenAI workloads. In particular, the Latency metric in that category isn’t the same as the Azure OpenAI latency metrics (Time to Response, Time to Last Byte, Time Between Tokens, Normalized Time to First Byte). Using the legacy Latency metric for Azure OpenAI troubleshooting produces misleading results. Use the Azure OpenAI metrics listed in this section instead.
The following table lists the metrics available for the Microsoft.CognitiveServices/accounts resource type.
  • ApiName
  • FeatureName
  • ModelDeploymentName
  • ModelName
  • ModelVersion
  • OperationName
  • Region
  • StatusCode
  • StreamType
  • UsageChannel
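Assuming the names in the list above are metric dimensions, they can be used to filter or split a metric when you query it. The following Python sketch builds a `$filter` expression in the `Name eq 'value'` form that the Azure Monitor metrics REST API accepts, where a value of `*` requests a separate time series for each value of that dimension. The `build_dimension_filter` helper and the deployment name `gpt-prod` are hypothetical, and not every dimension necessarily applies to every metric.

```python
# Dimension names copied from the list above.
KNOWN_DIMENSIONS = {
    "ApiName", "FeatureName", "ModelDeploymentName", "ModelName",
    "ModelVersion", "OperationName", "Region", "StatusCode",
    "StreamType", "UsageChannel",
}


def build_dimension_filter(**dimensions):
    """Build a $filter string such as "A eq 'x' and B eq '*'".

    Raises ValueError for names that aren't in the list above and for
    values that contain a single quote, which this sketch doesn't escape.
    """
    clauses = []
    for name, value in dimensions.items():
        if name not in KNOWN_DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")
        text = str(value)
        if "'" in text:
            raise ValueError(f"unsupported character in value: {text}")
        clauses.append(f"{name} eq '{text}'")
    return " and ".join(clauses)


# One deployment, split by every status code it returned.
example_filter = build_dimension_filter(
    ModelDeploymentName="gpt-prod",
    StatusCode="*",
)
```

The resulting string would be passed as the `$filter` query parameter of a metrics request.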

Supported resource logs for Microsoft.CognitiveServices/accounts

Azure OpenAI microsoft.cognitiveservices/accounts