See evaluation results in the Microsoft Foundry portal

This article refers to the Microsoft Foundry (new) portal.

In this article, you learn to:

Locate and open evaluation runs.
View aggregate and sample-level metrics.
Compare results across runs.
Interpret metric categories and calculations.
Troubleshoot missing or partial metrics.

Prerequisites

An evaluation run.
- To learn how to run evaluations in the portal, see Evaluate generative AI models and applications.
- To learn how to run evaluations from the SDK, see Run evaluations from the SDK or Evaluate your AI agents.

See your evaluation results

After submitting an evaluation, you can track its progress on the Evaluation details page. When the evaluation completes, the page displays key information such as:

- The evaluation creator
- Evaluation token usage
- Scores for each evaluator, broken down by run

Screenshot of the evaluation details page showing evaluation runs.

Select a specific run to drill into row‑level results. Select Learn more about metrics for definitions and formulas.

Evaluation run details

To view the row-level data for an individual run, select the name of the run. The run view shows evaluation results for each query against each evaluator used, including the query, response, ground truth, and the evaluator score and explanation.
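The relationship between row-level results and the per-evaluator scores shown for a run can be sketched in a few lines of Python. This is only an illustration: the `RowResult` type, its field names, the evaluator labels, and the use of a simple mean as the aggregate are all assumptions made for the example, not the portal's documented data model or calculation.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class RowResult:
    """One evaluator's verdict on one query (hypothetical shape)."""
    query: str
    response: str
    ground_truth: str
    evaluator: str
    score: float
    explanation: str


def aggregate_scores(rows):
    """Group row-level scores by evaluator and average them.

    Assumption: the aggregate is a plain mean of the row scores.
    """
    by_evaluator = defaultdict(list)
    for row in rows:
        by_evaluator[row.evaluator].append(row.score)
    return {name: mean(scores) for name, scores in by_evaluator.items()}


# Illustrative rows only; these are not real evaluation results.
rows = [
    RowResult("What is the capital of France?", "Paris", "Paris",
              "relevance", 5.0, "Directly answers the query."),
    RowResult("Who wrote Hamlet?", "Charles Dickens", "William Shakespeare",
              "relevance", 1.0, "Response contradicts the ground truth."),
    RowResult("What is the capital of France?", "Paris", "Paris",
              "coherence", 4.0, "Short but well formed."),
]

aggregates = aggregate_scores(rows)
for evaluator, value in sorted(aggregates.items()):
    print(f"{evaluator}: {value:.2f}")
```

Reading the rows alongside the aggregate makes it easier to see which individual queries pull an evaluator's score down.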

Compare the evaluation results

To compare two or more runs:

Select two or more runs on the evaluation details page.
Select Compare.

The portal generates a side-by-side comparison view for all selected runs. The comparison is based on statistical t-testing, which gives you more sensitive and reliable results for making decisions. The comparison view offers the following features:

Baseline comparison: Set a baseline run as the reference point against which the other runs are compared. You can then see how each run deviates from your chosen standard.
Statistical t-testing assessment: Each cell shows the statistical significance result with a color code. Hover over a cell to see the sample size and p-value.

Legend	Definition
ImprovedStrong	Highly statistically significant (p<=0.001) and moved in the desired direction
ImprovedWeak	Statistically significant (0.001<p<=0.05) and moved in the desired direction
DegradedStrong	Highly statistically significant (p<=0.001) and moved in the wrong direction
DegradedWeak	Statistically significant (0.001<p<=0.05) and moved in the wrong direction
ChangedStrong	Highly statistically significant (p<=0.001) and desired direction is neutral
ChangedWeak	Statistically significant (0.001<p<=0.05) and desired direction is neutral
Inconclusive	Too few examples, or p>0.05
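The legend can be read as a small decision rule over the p-value, the direction of change relative to the baseline, and the sample size. The following Python sketch shows one way to express that rule. The function name, its parameters, and the `min_samples` cutoff are hypothetical, since the article doesn't say how many examples count as "too few", and the boundary p = 0.05 is treated here as weakly significant. It isn't the portal's implementation.

```python
from typing import Optional


def classify_change(delta: float, p_value: float, sample_size: int,
                    higher_is_better: Optional[bool],
                    min_samples: int = 2) -> str:
    """Map a run-versus-baseline comparison onto the legend labels.

    delta: run score minus baseline score.
    higher_is_better: True or False for a directional metric,
        None when the desired direction is neutral.
    min_samples: assumed cutoff for "too few examples" (hypothetical).
    """
    if sample_size < min_samples or p_value > 0.05:
        return "Inconclusive"

    strength = "Strong" if p_value <= 0.001 else "Weak"

    if higher_is_better is None:
        return "Changed" + strength

    improved = delta > 0 if higher_is_better else delta < 0
    return ("Improved" if improved else "Degraded") + strength


# Illustrative inputs only; these are not real evaluation results.
print(classify_change(0.4, 0.0005, 50, higher_is_better=True))
print(classify_change(-0.4, 0.02, 50, higher_is_better=True))
print(classify_change(0.1, 0.3, 50, higher_is_better=True))
```

Passing `higher_is_better=False` covers metrics where a lower value is the desired outcome, so a negative delta counts as an improvement.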

The comparison view isn't saved. If you leave the page, reselect the runs and select Compare to regenerate the view.

Understand the built-in evaluation metrics

The built-in metrics measure the performance and effectiveness of your AI application. Knowing how each metric is defined and calculated helps you interpret the results, make informed decisions, and fine-tune your application. To learn more, see Built in evaluators.

Troubleshooting

Symptom	Possible cause	Action
Run stays pending	High service load or queued jobs	Refresh, verify quota, and resubmit if prolonged
Metrics missing	Not selected at creation	Rerun and select required metrics
All safety metrics zero	Category disabled or unsupported model	Confirm model and metric support matrix
Groundedness unexpectedly low	Retrieval/context incomplete	Verify context construction / retrieval latency

Related content

Improve low metrics with prompt iteration or fine-tuning.
Run evaluations in the cloud with the Microsoft Foundry SDK.

Learn how to evaluate your generative AI applications:

Evaluate Generative AI Models and Apps with Microsoft Foundry

Evaluation Cluster Analysis
