Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don’t recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
Overview
End user feedback logging uses OTel semantic conventions to emit structured feedback events that are associated with a specific trace or span. This means feedback data flows through the same telemetry pipeline as your traces and metrics, giving you a unified view of application behavior and user sentiment.
Key capabilities:
- Thumbs up or thumbs down: Record binary user approval signals that tie to a specific agent response.
- Rating scale: Capture numeric scores, such as 1–5 stars, to quantify user satisfaction.
- Trace correlation: Each feedback event links back to the originating trace and span, so you can drill down from aggregate satisfaction metrics to individual interactions.
- Standard OTel transport: Feedback events use the OpenTelemetry Events API, so they’re exported through your existing OTel pipeline to Application Insights or any compatible backend.
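Trace correlation depends on the feedback event carrying the trace and span IDs of the response being rated. Feedback often arrives in a later request than the response itself. The sketch below shows one way to handle this. It assumes your application assigns each response an ID; `response_id` and all helper names here are hypothetical. The pattern is to record the span's IDs while the span is active, then look them up when feedback arrives. With the OpenTelemetry Python API, the integer IDs come from `trace.get_current_span().get_span_context()` (`.trace_id` and `.span_id`).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TraceLink:
    """Trace and span IDs in W3C Trace Context hex form."""
    trace_id: str  # 32 lowercase hex characters
    span_id: str   # 16 lowercase hex characters


def to_trace_link(trace_id: int, span_id: int) -> TraceLink:
    """Format OTel integer IDs (as found on a span context) as W3C hex strings."""
    # All-zero IDs are invalid in W3C Trace Context (for example, no span was active).
    if not (0 < trace_id < 1 << 128 and 0 < span_id < 1 << 64):
        raise ValueError("trace_id must be a non-zero 128-bit value and span_id a non-zero 64-bit value")
    return TraceLink(format(trace_id, "032x"), format(span_id, "016x"))


# Hypothetical in-memory store; a real application would persist these links
# alongside each response (for example, in its own database).
_links_by_response: Dict[str, TraceLink] = {}


def remember_response(response_id: str, trace_id: int, span_id: int) -> None:
    """Call while the agent response span is active, before returning the response."""
    _links_by_response[response_id] = to_trace_link(trace_id, span_id)


def link_for_feedback(response_id: str) -> TraceLink:
    """Call when feedback arrives, possibly in a later request, to find the span to rate."""
    return _links_by_response[response_id]
```

Hex-formatted IDs are easy to match against trace data in your tracing backend. The in-memory dictionary stands in for whatever storage your application already uses.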
Prerequisites
- A Foundry project with an Application Insights resource connected. See Set up tracing.
- OpenTelemetry instrumentation configured in your application. See Set up tracing in Microsoft Foundry for setup instructions.
- Python 3.9 or later, or a supported language with OTel SDK support.
- The `azure-ai-projects` and `azure-monitor-opentelemetry` packages installed, for example with `pip install azure-ai-projects azure-monitor-opentelemetry`.
Evaluation types
Human feedback is captured as a `gen_ai.evaluation.result` OpenTelemetry event. The system supports two evaluation types:
| Type | Description | UI rendering | Score range |
|---|---|---|---|
| Binary | A pass/fail evaluation. | Thumbs up or thumbs down | 0.0 (fail) or 1.0 (pass) |
| Likert 5-point | An ordinal evaluation on a 5-point scale. | 5-star rating or Likert scale (Strongly Disagree → Strongly Agree) | 1.0 to 5.0 |
Human evaluations can come from two kinds of evaluators:
- Builder: A human evaluating with a Microsoft observability solution, such as in Foundry or Azure Monitor portal.
- End user: A human evaluating through an application interface that a Microsoft observability solution is monitoring.
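The UI rendering and score range columns in the evaluation types table imply a conversion from raw user input to a score. The sketch below assumes thumbs arrive from the UI as a boolean and star ratings as a number. The function names are hypothetical.

```python
from typing import Union


def thumbs_to_score(thumbs_up: bool) -> float:
    """Binary type: thumbs up maps to 1.0 (pass), thumbs down to 0.0 (fail)."""
    return 1.0 if thumbs_up else 0.0


def stars_to_score(stars: Union[int, float]) -> float:
    """Likert 5-point type: a 1-5 star rating maps to a score in [1.0, 5.0]."""
    score = float(stars)
    # Reject out-of-range input (NaN also fails this comparison).
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"Likert score must be between 1.0 and 5.0, got {score}")
    return score
```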
Event attributes
Each human evaluation is emitted as a `gen_ai.evaluation.result` event. The following sections describe the required attributes for each evaluation type. For a complete reference implementation, see `sample_human_evaluations.py`.
Emit a binary evaluation (thumbs up or down)
Binary evaluations capture pass or fail feedback. The score must be `0.0` (fail) or `1.0` (pass).
Rules:
- `gen_ai.evaluation.score.value`: `0.0` or `1.0`
- `gen_ai.evaluation.score.label`: `"fail"` or `"pass"`
- `internal_properties.gen_ai.evaluation.type`: `"boolean"`
- `internal_properties.gen_ai.evaluation.min_value`: `0.0`
- `internal_properties.gen_ai.evaluation.max_value`: `1.0`
- `internal_properties.gen_ai.evaluation.threshold`: `1.0`
- `internal_properties.gen_ai.evaluation.desirable_direction`: `"increase"`
Example: End user gives thumbs up
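The sketch below builds the attributes that the binary rules list for a thumbs-up event. It assumes the `internal_properties.*` entries are sent as flat, prefixed attribute keys, exactly as the rules name them. Check `sample_human_evaluations.py` for the authoritative encoding. The sketch covers only the attributes. Emitting the event with the trace and span IDs of the rated response is left to your OpenTelemetry setup.

```python
from typing import Dict, Union

AttributeValue = Union[str, float]


def binary_evaluation_attributes(score: float) -> Dict[str, AttributeValue]:
    """Attributes for a binary (thumbs up/down) gen_ai.evaluation.result event."""
    if score not in (0.0, 1.0):
        raise ValueError(f"Binary score must be 0.0 or 1.0, got {score}")
    return {
        "gen_ai.evaluation.score.value": float(score),
        "gen_ai.evaluation.score.label": "pass" if score == 1.0 else "fail",
        # Assumption: internal_properties are encoded as flat, prefixed keys.
        "internal_properties.gen_ai.evaluation.type": "boolean",
        "internal_properties.gen_ai.evaluation.min_value": 0.0,
        "internal_properties.gen_ai.evaluation.max_value": 1.0,
        "internal_properties.gen_ai.evaluation.threshold": 1.0,
        "internal_properties.gen_ai.evaluation.desirable_direction": "increase",
    }


# End user gives thumbs up.
thumbs_up = binary_evaluation_attributes(1.0)
```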
For a runnable example, see `sample_human_evaluations.py`.
Emit a Likert 5-point evaluation (rating scale)
Likert 5-point evaluations capture ordinal feedback on a 1–5 scale. A score at or above the threshold (default `3.0`) is considered a pass.
Rules:
- `gen_ai.evaluation.score.value`: a value between `1.0` and `5.0`
- `gen_ai.evaluation.score.label`: `"pass"` if score ≥ threshold, otherwise `"fail"`
- `internal_properties.gen_ai.evaluation.type`: `"ordinal"`
- `internal_properties.gen_ai.evaluation.min_value`: `1.0`
- `internal_properties.gen_ai.evaluation.max_value`: `5.0`
- `internal_properties.gen_ai.evaluation.threshold`: `3.0`
- `internal_properties.gen_ai.evaluation.desirable_direction`: `"increase"`
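The Likert rules can be sketched as a function that derives the label from the threshold. The function name is hypothetical. The flat `internal_properties.*` keys mirror the rule names and are an assumption about the encoding. The reference implementation is `sample_human_evaluations.py`.

```python
from typing import Dict, Union

AttributeValue = Union[str, float]


def likert_evaluation_attributes(
    score: float, threshold: float = 3.0
) -> Dict[str, AttributeValue]:
    """Attributes for a Likert 5-point gen_ai.evaluation.result event."""
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"Likert score must be between 1.0 and 5.0, got {score}")
    if not 1.0 <= threshold <= 5.0:
        raise ValueError(f"Threshold must be between 1.0 and 5.0, got {threshold}")
    return {
        "gen_ai.evaluation.score.value": float(score),
        # A score at or above the threshold counts as a pass.
        "gen_ai.evaluation.score.label": "pass" if score >= threshold else "fail",
        # Assumption: internal_properties are encoded as flat, prefixed keys.
        "internal_properties.gen_ai.evaluation.type": "ordinal",
        "internal_properties.gen_ai.evaluation.min_value": 1.0,
        "internal_properties.gen_ai.evaluation.max_value": 5.0,
        "internal_properties.gen_ai.evaluation.threshold": float(threshold),
        "internal_properties.gen_ai.evaluation.desirable_direction": "increase",
    }
```

For example, with the default threshold a 2-star rating produces the label `"fail"` and a 3-star rating produces `"pass"`.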