Built-in evaluators reference

This article refers to the Microsoft Foundry (new) portal.
Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don’t recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Microsoft Foundry provides a comprehensive set of built-in evaluators to assess the quality, safety, and reliability of AI responses throughout the development lifecycle. This reference details all available evaluators, their purposes, required inputs, and guidance on selecting the right evaluator for your use case. You can also create custom evaluators tailored to your specific evaluation criteria.
The Microsoft Foundry SDK for evaluation and the Foundry portal are in public preview, but the evaluation APIs are generally available for model and dataset evaluation (agent evaluation remains in public preview). Evaluators marked (preview) in this article are in public preview regardless of which of these surfaces you use.

General purpose evaluators

| Evaluator | Purpose |
|---|---|
| Coherence | Measures logical consistency and flow of responses. |
| Fluency | Measures natural language quality and readability. |

To learn more, see General purpose evaluators.
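As an illustration, the AI-assisted quality evaluators in this group can be run locally on a single query and response pair. The sketch below assumes the azure-ai-evaluation Python package and an Azure OpenAI judge deployment; the endpoint, key, and deployment names are placeholders, and the exact result keys can vary by SDK version.

```python
import os
from azure.ai.evaluation import CoherenceEvaluator, FluencyEvaluator

# Judge model configuration used by AI-assisted evaluators (placeholder values).
model_config = {
    "azure_endpoint": os.environ["AZURE_OPENAI_ENDPOINT"],
    "api_key": os.environ["AZURE_OPENAI_API_KEY"],
    "azure_deployment": "gpt-4o",
}

coherence = CoherenceEvaluator(model_config)
fluency = FluencyEvaluator(model_config)

query = "What is the capital of France?"
response = "Paris is the capital of France."

# Each call returns a dict containing the score and, in recent versions, a reason.
print(coherence(query=query, response=response))
print(fluency(response=response))
```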

Textual similarity evaluators

| Evaluator | Purpose |
|---|---|
| Similarity | AI-assisted textual similarity measurement. |
| F1 Score | Harmonic mean of precision and recall in token overlaps between response and ground truth. |
| BLEU | Bilingual Evaluation Understudy score for translation quality; measures overlaps in n-grams between response and ground truth. |
| GLEU | Google-BLEU variant for sentence-level assessment; measures overlaps in n-grams between response and ground truth. |
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation; measures overlaps in n-grams between response and ground truth. |
| METEOR | Metric for Evaluation of Translation with Explicit Ordering; measures overlaps in n-grams between response and ground truth. |

To learn more, see Textual similarity evaluators.
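Apart from the AI-assisted Similarity evaluator, these metrics compare token or n-gram overlap against a ground truth and don't need a judge model. A minimal sketch, again assuming the azure-ai-evaluation package:

```python
from azure.ai.evaluation import BleuScoreEvaluator, F1ScoreEvaluator

response = "The capital of France is Paris."
ground_truth = "Paris is the capital of France."

# Overlap-based metrics: no LLM judge or Azure resources required.
bleu = BleuScoreEvaluator()
f1 = F1ScoreEvaluator()

print(bleu(response=response, ground_truth=ground_truth))
print(f1(response=response, ground_truth=ground_truth))
```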

RAG evaluators

| Evaluator | Purpose |
|---|---|
| Retrieval | Measures how effectively the system retrieves relevant information. |
| Document Retrieval | Measures accuracy in retrieval results given ground truth. |
| Groundedness | Measures how consistent the response is with respect to the retrieved context. |
| Groundedness Pro (preview) | Measures whether the response is consistent with respect to the retrieved context. |
| Relevance | Measures how relevant the response is with respect to the query. |
| Response Completeness | Measures to what extent the response is complete (not missing critical information) with respect to the ground truth. |

To learn more, see Retrieval-augmented Generation (RAG) evaluators.
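For RAG scenarios, evaluators such as Groundedness additionally take the retrieved context as input. A hedged sketch, reusing the model_config from the earlier example (query, context, and response values are placeholders):

```python
from azure.ai.evaluation import GroundednessEvaluator, RelevanceEvaluator

groundedness = GroundednessEvaluator(model_config)  # model_config as defined earlier
relevance = RelevanceEvaluator(model_config)

query = "When is the store open on weekends?"
context = "Store hours are 10 AM to 6 PM on Saturdays and Sundays."
response = "The store opens at 10 AM on weekends."

# Groundedness checks the response against the retrieved context;
# Relevance checks the response against the query.
print(groundedness(query=query, context=context, response=response))
print(relevance(query=query, response=response))
```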

Risk and safety evaluators

| Evaluator | Purpose |
|---|---|
| Hate and Unfairness | Identifies biased, discriminatory, or hateful content. |
| Sexual | Identifies inappropriate sexual content. |
| Violence | Detects violent content or incitement. |
| Self-Harm | Detects content promoting or describing self-harm. |
| Content Safety | Comprehensive assessment of various safety concerns. |
| Protected Materials | Detects unauthorized use of copyrighted or protected content. |
| Code Vulnerability | Identifies security issues in generated code. |
| Ungrounded Attributes | Detects fabricated or hallucinated information inferred from user interactions. |

To learn more, see Risk and safety evaluators.
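Risk and safety evaluators are backed by the project's safety evaluation service rather than a judge model you deploy, so they take project details and a credential instead of a model configuration. A sketch under that assumption (the project values are placeholders, and newer SDK versions may accept a project endpoint URL instead of this dictionary):

```python
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation import ViolenceEvaluator, ContentSafetyEvaluator

# Placeholder project details; safety evaluators call the project's safety service.
azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}
credential = DefaultAzureCredential()

violence = ViolenceEvaluator(credential=credential, azure_ai_project=azure_ai_project)
content_safety = ContentSafetyEvaluator(credential=credential, azure_ai_project=azure_ai_project)

query = "Summarize the meeting notes."
response = "The team agreed to ship the release next Tuesday."

print(violence(query=query, response=response))
print(content_safety(query=query, response=response))
```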

Agent evaluators

| Evaluator | Purpose |
|---|---|
| Task Adherence (preview) | Measures whether the agent follows through on identified tasks according to system instructions. |
| Task Completion (preview) | Measures whether the agent successfully completed the requested task end-to-end. |
| Intent Resolution (preview) | Measures how accurately the agent identifies and addresses user intentions. |
| Task Navigation Efficiency (preview) | Determines whether the agent’s sequence of steps matches an optimal or expected path to measure efficiency. |
| Tool Call Accuracy (preview) | Measures the overall quality of tool calls including selection, parameter correctness, and efficiency. |
| Tool Selection (preview) | Measures whether the agent selected the most appropriate and efficient tools for a task. |
| Tool Input Accuracy (preview) | Validates that all tool call parameters are correct with strict criteria including grounding, type, format, completeness, and appropriateness. |
| Tool Output Utilization (preview) | Measures whether the agent correctly interprets and uses tool outputs contextually in responses and subsequent calls. |
| Tool Call Success (preview) | Evaluates whether all tool calls executed successfully without technical failures. |

To learn more, see Agent evaluators.
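Agent evaluators are in preview, so class names and call signatures may change between SDK versions. As a rough sketch, assuming the preview IntentResolutionEvaluator from the azure-ai-evaluation package and reusing the earlier model_config:

```python
from azure.ai.evaluation import IntentResolutionEvaluator  # preview; subject to change

intent_resolution = IntentResolutionEvaluator(model_config)

result = intent_resolution(
    query="What are the opening hours of the Eiffel Tower?",
    response="The Eiffel Tower is open daily from 9:00 AM to 11:00 PM.",
)
print(result)
```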

Azure OpenAI graders

| Evaluator | Purpose |
|---|---|
| Model Labeler | Classifies content using custom guidelines and labels. |
| String Checker | Performs flexible text validations and pattern matching. |
| Text Similarity | Evaluates the quality of text or determines semantic closeness. |
| Model Scorer | Generates numerical scores (customized range) for content based on custom guidelines. |

To learn more, see Azure OpenAI Graders.

Custom evaluators

In addition to built-in evaluators, you can create custom evaluators tailored to your specific evaluation criteria. Custom evaluators allow you to define unique scoring logic, validation rules, and quality metrics that align with your business requirements and application-specific needs. To learn more, see Custom evaluators.
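One pattern the azure-ai-evaluation SDK accepts is a callable that receives named fields from each data row and returns a dictionary of metrics, which can then be mixed with built-in evaluators. The class below is a hypothetical, code-based illustration of that pattern; prompt-based custom evaluators are also possible.

```python
class ResponseLengthEvaluator:
    """Hypothetical code-based evaluator: flags responses that exceed a word budget."""

    def __init__(self, max_words: int = 100):
        self.max_words = max_words

    def __call__(self, *, response: str, **kwargs):
        # Return a dict of metric name -> value, like the built-in evaluators do.
        word_count = len(response.split())
        return {
            "response_length": word_count,
            "within_budget": word_count <= self.max_words,
        }


length_eval = ResponseLengthEvaluator(max_words=50)
print(length_eval(response="Paris is the capital of France."))
```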

Combining evaluators

For comprehensive quality assessment, combine multiple evaluators (a batch-run sketch follows this list):
  • RAG applications: Retrieval + Groundedness + Relevance + Content Safety
  • Agent applications: Tool Call Accuracy + Task Adherence + Intent Resolution + Content Safety
  • Translation applications: BLEU + METEOR + Fluency + Coherence
  • All applications: Add risk and safety evaluators (Hate and Unfairness, Sexual, Violence, Self-Harm) for responsible AI practices
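For example, a RAG-focused combination can be batch-run over a dataset with the azure-ai-evaluation evaluate() helper. The sketch below assumes a JSONL file whose rows contain query, context, and response fields (the file and column names are placeholders) and reuses the earlier model_config; safety evaluators can be added to the same dictionary once they're constructed with project details and a credential.

```python
from azure.ai.evaluation import (
    evaluate,
    GroundednessEvaluator,
    RelevanceEvaluator,
    RetrievalEvaluator,
)

# Batch-run a RAG-oriented evaluator combination over a JSONL dataset
# whose rows contain query, context, and response fields (placeholder file name).
result = evaluate(
    data="rag_eval_data.jsonl",
    evaluators={
        "retrieval": RetrievalEvaluator(model_config),
        "groundedness": GroundednessEvaluator(model_config),
        "relevance": RelevanceEvaluator(model_config),
    },
)

# Aggregate metrics across the dataset; per-row results are also returned.
print(result["metrics"])
```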