General purpose evaluators

This article refers to the Microsoft Foundry (new) portal.

The Microsoft Foundry SDK for evaluation and Foundry portal are in public preview, but the APIs are generally available for model and dataset evaluation (agent evaluation remains in public preview). Evaluators marked (preview) in this article are currently in public preview everywhere.

AI systems might generate text responses that are incoherent or that lack general writing quality beyond minimal grammatical correctness. To address these issues, Microsoft Foundry supports evaluating coherence and fluency.

Coherence

The coherence evaluator measures the logical and orderly presentation of ideas in a response, which allows the reader to easily follow and understand the writer’s train of thought. A coherent response directly addresses the question with clear connections between sentences and paragraphs, using appropriate transitions and a logical sequence of ideas. Higher scores mean better coherence.

Fluency

The fluency evaluator measures the effectiveness and clarity of written communication. This measure focuses on grammatical accuracy, vocabulary range, sentence complexity, coherence, and overall readability. It assesses how smoothly ideas are conveyed and how easily the reader can understand the text.

Using general-purpose evaluators

General-purpose evaluators assess the quality of AI-generated text independent of specific use cases. Examples:

Evaluator	What it measures	Required inputs	Required parameters
`builtin.coherence`	Logical flow and organization of ideas	`query`, `response`	`deployment_name`
`builtin.fluency`	Grammatical accuracy and readability	`response`	`deployment_name`

Example input

Your test dataset should contain the fields referenced in your data mappings:

{"query": "What are the benefits of renewable energy?", "response": "Renewable energy reduces carbon emissions, lowers long-term costs, and provides energy independence."}
{"query": "How does photosynthesis work?", "response": "Plants convert sunlight, water, and carbon dioxide into glucose and oxygen through chlorophyll in their leaves."}
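Before submitting a dataset, it can help to confirm that every row carries the inputs the chosen evaluator needs. The following is a minimal local sketch, not part of the Foundry SDK: the `REQUIRED_INPUTS` dictionary restates the required inputs listed in the table in this article, and the helper name `missing_fields` is hypothetical.

```python
import json

# Required input fields per evaluator, restated from the table in this article.
REQUIRED_INPUTS = {
    "builtin.coherence": {"query", "response"},
    "builtin.fluency": {"response"},
}


def missing_fields(jsonl_text, evaluator_name):
    """Return (line_number, missing_field_names) for each row lacking a required input.

    Hypothetical helper for local checks; it is not a Foundry API.
    """
    required = REQUIRED_INPUTS[evaluator_name]
    problems = []
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        # A field counts as present only if it holds a non-empty string.
        present = {k for k, v in row.items() if isinstance(v, str) and v.strip()}
        missing = sorted(required - present)
        if missing:
            problems.append((line_number, missing))
    return problems


sample = "\n".join([
    json.dumps({"query": "What are the benefits of renewable energy?",
                "response": "Renewable energy reduces carbon emissions."}),
    json.dumps({"response": "Plants convert sunlight into glucose."}),
])

print(missing_fields(sample, "builtin.coherence"))  # [(2, ['query'])]
print(missing_fields(sample, "builtin.fluency"))    # []
```

The second row has no `query`, so it is flagged for coherence but is acceptable for fluency, which only needs `response`.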

Configuration example

Data mapping syntax:

{{item.field_name}} references fields from your test dataset (for example, {{item.query}}).
{{sample.output_text}} references response text generated or retrieved during evaluation. Use this when evaluating with a model target or agent target.

# model_deployment is assumed to be defined earlier; it holds the name of the
# model deployment passed to each evaluator as deployment_name.
testing_criteria = [
    # Coherence requires both the query and the response.
    {
        "type": "azure_ai_evaluator",
        "name": "coherence",
        "evaluator_name": "builtin.coherence",
        "initialization_parameters": {"deployment_name": model_deployment},
        "data_mapping": {"query": "{{item.query}}", "response": "{{item.response}}"},
    },
    # Fluency requires only the response.
    {
        "type": "azure_ai_evaluator",
        "name": "fluency",
        "evaluator_name": "builtin.fluency",
        "initialization_parameters": {"deployment_name": model_deployment},
        "data_mapping": {"response": "{{item.response}}"},
    },
]
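To make the data mapping syntax concrete, the sketch below shows one way the `{{item.field_name}}` and `{{sample.output_text}}` placeholders could be resolved against a dataset row and a generated output. It is an illustration only: the service performs this substitution itself, and the function `resolve_mapping` and its exact-match placeholder pattern are assumptions made for this example.

```python
import re

# Matches placeholders such as {{item.query}} or {{sample.output_text}}.
PLACEHOLDER = re.compile(r"\{\{(item|sample)\.(\w+)\}\}")


def resolve_mapping(data_mapping, item, sample=None):
    """Substitute placeholders with values from a dataset row (item)
    and, when present, from generated output (sample).

    Hypothetical helper; it only illustrates the mapping syntax.
    """
    sources = {"item": item, "sample": sample or {}}

    def substitute(match):
        source, field = match.group(1), match.group(2)
        if field not in sources[source]:
            raise KeyError(f"{source}.{field} is not available")
        return str(sources[source][field])

    return {
        name: PLACEHOLDER.sub(substitute, template)
        for name, template in data_mapping.items()
    }


row = {
    "query": "How does photosynthesis work?",
    "response": "Plants convert sunlight, water, and carbon dioxide into glucose and oxygen.",
}
coherence_mapping = {"query": "{{item.query}}", "response": "{{item.response}}"}
resolved = resolve_mapping(coherence_mapping, row)
print(resolved)

# With a model or agent target, the response comes from the generated output.
target_mapping = {"query": "{{item.query}}", "response": "{{sample.output_text}}"}
print(resolve_mapping(target_mapping, {"query": "Hi"}, {"output_text": "Hello!"}))
```

A mapping that references a field missing from the row raises `KeyError` here, which mirrors the requirement that the test dataset contain every field referenced in the data mappings.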

See Run evaluations in the cloud for details on running evaluations and configuring data sources.

Example output

These evaluators return scores on a 1-5 Likert scale (1 = very poor, 5 = excellent). The default pass threshold is 3. Scores at or above the threshold are considered passing. Key output fields:

{
    "type": "azure_ai_evaluator",
    "name": "Coherence",
    "metric": "coherence",
    "score": 4,
    "label": "pass",
    "reason": "The response directly addresses the question with clear, logical connections between ideas.",
    "threshold": 3,
    "passed": true
}
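The pass rule described above (a score at or above the threshold passes) can be sketched in a few lines. This is a local illustration, not the service's implementation: the helper names `label_for` and `pass_rate` are hypothetical, and the `"fail"` label is an assumption, since this article shows only a passing result.

```python
DEFAULT_THRESHOLD = 3  # default pass threshold stated in this article


def label_for(score, threshold=DEFAULT_THRESHOLD):
    """Return 'pass' when the score meets or exceeds the threshold.

    The 'fail' label is assumed; the article shows only 'pass'.
    """
    if not 1 <= score <= 5:
        raise ValueError("score must be on the 1-5 scale")
    return "pass" if score >= threshold else "fail"


def pass_rate(results):
    """Fraction of results whose 'passed' field is true."""
    return sum(1 for r in results if r["passed"]) / len(results)


# Illustrative results using the output fields shown in this article.
results = [
    {"metric": "coherence", "score": 4, "threshold": 3, "passed": True},
    {"metric": "coherence", "score": 2, "threshold": 3, "passed": False},
    {"metric": "coherence", "score": 3, "threshold": 3, "passed": True},
]

for r in results:
    print(r["score"], label_for(r["score"], r["threshold"]))
print(round(pass_rate(results), 2))
```

A score equal to the threshold counts as passing, so the third result passes with a score of 3.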

Related content

How to run cloud evaluation

Built-in Evaluators Reference
