Supported metrics for Microsoft.CognitiveServices/accounts
Monitor the most important metrics for Azure OpenAI. A longer list of all available metrics for this namespace appears later in this article, with more details on the metrics in this shorter list.
Please see the following list for the most up-to-date information. The Azure team is working on refreshing the tables in the following sections.
Don’t confuse the metrics in this section with the legacy Latency metric listed under Cognitive Services - HTTP Requests later in this article. The legacy Latency metric isn’t designed for Azure OpenAI workloads and produces misleading results when used to diagnose Azure OpenAI latency. For Azure OpenAI latency monitoring, use Time to Response (AzureOpenAITimeToResponse), Time to Last Byte (AzureOpenAITTLTInMS), Time Between Tokens (AzureOpenAINormalizedTBTInMS), or Normalized Time to First Byte (AzureOpenAINormalizedTTFTInMS). For guidance on interpreting these metrics, see Performance and latency.
- Azure OpenAI Requests
- Active Tokens
- Generated Completion Tokens
- Processed FineTuned Training Hours
- Processed Inference Tokens
- Processed Prompt Tokens
- Provisioned-managed Utilization V2
- Prompt Token Cache Match Rate
- Time to Response
- Time Between Tokens
- Time to Last Byte
- Normalized Time to First Byte
- Tokens per Second
- Blocked Volume
- Harmful Volume Detected
- Potential Abusive User Count
- Safety System Event
- Total Volume Sent for Safety Check
The Provisioned-managed Utilization metric is deprecated and no longer recommended. Use the Provisioned-managed Utilization V2 metric instead.
Tokens per Second, Time to Response, and Time Between Tokens aren’t currently available for Standard deployments.
Quick reference: Key metrics by use case
Use this table to find the right metric for a specific monitoring goal. For end-to-end guidance on interpreting these metrics, see Performance and latency.

| I want to monitor… | Use this metric | REST API name |
|---|---|---|
| Overall response time | Time to Last Byte | AzureOpenAITTLTInMS |
| First-token responsiveness (streaming) | Time to Response | AzureOpenAITimeToResponse |
| Token generation speed | Time Between Tokens | AzureOpenAINormalizedTBTInMS |
| First-token efficiency normalized by prompt size | Normalized Time to First Byte | AzureOpenAINormalizedTTFTInMS |
| Output token volume per request | Generated Completion Tokens | GeneratedTokens |
| Input token volume per request | Processed Prompt Tokens | ProcessedPromptTokens |
| PTU capacity utilization | Provisioned-managed Utilization V2 | AzureOpenAIProvisionedManagedUtilizationV2 |
| Request volume and errors | Azure OpenAI Requests | AzureOpenAIRequests |
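The REST API names in the table above can be passed to a metrics query. The following is a minimal sketch, not an official client: it only builds a request URL, assuming the Azure Monitor metrics endpoint under `management.azure.com` and the `2018-01-01` API version. The goal keys, the `build_metrics_url` helper, and the default interval and aggregation are hypothetical choices made for illustration; sending the request also requires an authenticated call, which is omitted here.

```python
from urllib.parse import urlencode

# REST API names copied from the quick reference table.
# The dictionary keys are illustrative labels, not official identifiers.
METRIC_FOR_GOAL = {
    "overall_response_time": "AzureOpenAITTLTInMS",
    "first_token_responsiveness": "AzureOpenAITimeToResponse",
    "token_generation_speed": "AzureOpenAINormalizedTBTInMS",
    "normalized_first_token": "AzureOpenAINormalizedTTFTInMS",
    "output_tokens": "GeneratedTokens",
    "input_tokens": "ProcessedPromptTokens",
    "ptu_utilization": "AzureOpenAIProvisionedManagedUtilizationV2",
    "requests": "AzureOpenAIRequests",
}


def build_metrics_url(resource_id, goals, timespan,
                      interval="PT5M", aggregation="Average",
                      api_version="2018-01-01"):
    """Build a metrics query URL for the given monitoring goals.

    resource_id: full ARM resource ID, starting with "/subscriptions/...".
    goals: keys of METRIC_FOR_GOAL.
    timespan: ISO 8601 "start/end" string.
    interval, aggregation, api_version: assumed defaults; adjust as needed.
    """
    metric_names = ",".join(METRIC_FOR_GOAL[goal] for goal in goals)
    query = urlencode({
        "api-version": api_version,
        "metricnames": metric_names,
        "timespan": timespan,
        "interval": interval,
        "aggregation": aggregation,
    })
    return (
        f"https://management.azure.com{resource_id}"
        f"/providers/Microsoft.Insights/metrics?{query}"
    )


if __name__ == "__main__":
    # Placeholder resource ID; substitute a real one.
    demo_id = ("/subscriptions/<subscription-id>/resourceGroups/<group>"
               "/providers/Microsoft.CognitiveServices/accounts/<account>")
    print(build_metrics_url(
        demo_id,
        ["overall_response_time", "requests"],
        "2024-01-01T00:00:00Z/2024-01-01T01:00:00Z",
    ))
```

Keeping the goal-to-metric mapping in one dictionary makes it harder to reach for the legacy Latency metric by accident, since only the Azure OpenAI names from the table are available.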
Metric dimensions
- ApiName
- FeatureName
- ModelDeploymentName
- ModelName
- ModelVersion
- OperationName
- Region
- StatusCode
- StreamType
- UsageChannel
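The names listed above can be used to split or filter metric results, for example by deployment or status code. The following is a minimal sketch, assuming the Azure Monitor `$filter` syntax of `Name eq 'value'` clauses joined with `and`. The `build_dimension_filter` helper is hypothetical, and it rejects values containing a single quote rather than guessing at escaping rules.

```python
# Dimension names copied from the list above.
KNOWN_DIMENSIONS = {
    "ApiName", "FeatureName", "ModelDeploymentName", "ModelName",
    "ModelVersion", "OperationName", "Region", "StatusCode",
    "StreamType", "UsageChannel",
}


def build_dimension_filter(**dimensions):
    """Return a filter expression such as "StatusCode eq '429'".

    Raises ValueError for names that are not in KNOWN_DIMENSIONS or for
    values containing a single quote (escaping is not handled here).
    """
    unknown = set(dimensions) - KNOWN_DIMENSIONS
    if unknown:
        raise ValueError(f"Unknown dimension(s): {sorted(unknown)}")
    clauses = []
    # Sort by name so the output is deterministic.
    for name, value in sorted(dimensions.items()):
        text = str(value)
        if "'" in text:
            raise ValueError(f"Unsupported character in value for {name}")
        clauses.append(f"{name} eq '{text}'")
    return " and ".join(clauses)


if __name__ == "__main__":
    # "my-deployment" is a placeholder deployment name.
    print(build_dimension_filter(ModelDeploymentName="my-deployment",
                                 StatusCode="429"))
```

The resulting string would be supplied as the `$filter` query parameter of a metrics request. Validating names against the list catches typos such as `DeploymentName` before the request is sent.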
Supported resource logs for Microsoft.CognitiveServices/accounts
Azure OpenAI microsoft.cognitiveservices/accounts
Related content
- For a description of monitoring Azure OpenAI, see Monitor Azure OpenAI.
- For details on monitoring Azure resources, see Monitor Azure resources with Azure Monitor.