Agent Optimizer is currently in limited preview and only available through a sign-up process. To access the service, complete the intake form. This preview is provided without a service-level agreement, and we don’t recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
| Scenario | What the optimizer does |
|---|---|
| Improve overall response quality | Instruction tuning |
| Reduce incorrect information | Instruction tuning |
| Improve repeatable behaviors (escalation, debugging patterns) | Skill improvement |
| Agent has structured procedures that need refinement | Skill improvement |
| Find the best quality/cost model trade-off | Model selection |
| First optimization, not sure what to expect | All applicable targets run automatically |
Prerequisites
- A Foundry project with a deployed hosted agent
- The azure.ai.agents CLI extension installed (see Quickstart: Optimize a hosted agent)
- A model deployed for evaluation (for example, gpt-4.1-mini) and an optimization model from the supported list (for example, gpt-5.1)
- Your agent is optimizer-ready (calls load_config())
Which agent gets optimized
The optimizer needs to know which deployed hosted agent to target. It resolves the agent name using the following priority order:
| Priority | Source | Example |
|---|---|---|
| 1 (highest) | --agent CLI flag | azd ai agent optimize --agent my-support-agent |
| 2 | agent.name field in eval.yaml | agent:\n name: my-support-agent |
| 3 (default) | name field in agent.yaml | name: my-support-agent |
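The priority order in the table can be sketched as a small lookup. This is a minimal illustration, not the CLI's actual implementation: the function name resolve_agent_name and the assumption that eval.yaml and agent.yaml have already been parsed into dictionaries are hypothetical.

```python
from typing import Optional


def resolve_agent_name(
    cli_agent: Optional[str],
    eval_yaml: dict,
    agent_yaml: dict,
) -> str:
    """Return the agent name using the documented priority order.

    Hypothetical helper: the inputs stand in for the --agent flag value
    and the already-parsed contents of eval.yaml and agent.yaml.
    """
    # Priority 1 (highest): the --agent CLI flag.
    if cli_agent:
        return cli_agent

    # Priority 2: the agent.name field in eval.yaml.
    eval_name = (eval_yaml.get("agent") or {}).get("name")
    if eval_name:
        return eval_name

    # Priority 3 (default): the top-level name field in agent.yaml.
    default_name = agent_yaml.get("name")
    if default_name:
        return default_name

    raise ValueError("No agent name found in the CLI flag, eval.yaml, or agent.yaml")
```

For example, resolve_agent_name(None, {}, {"name": "my-support-agent"}) falls through to the agent.yaml default, while passing a non-empty first argument always wins.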
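By default, the optimizer uses the name field in your agent.yaml file. Use the --agent flag when you have multiple agents in your project or want to override the default, for example azd ai agent optimize --agent my-support-agent.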
The agent name must match a deployed hosted agent in your Foundry project. Run azd ai agent invoke "test" to verify your agent responds before starting optimization.
Optimize instructions
The optimizer rewrites and refines your agent’s system prompt to improve performance on your evaluation dataset. Instruction tuning activates automatically when your baseline config includes an instructions.md file.
How it works
- Baseline evaluation. Your agent runs with its current instructions against every task in the dataset. The evaluator scores each response against the task’s criteria.
- Instruction generation. The optimizer analyzes the baseline scores and generates alternative system prompts designed to improve weak areas while maintaining strong areas.
- Candidate evaluation. The optimizer injects each candidate instruction set into your agent through the OPTIMIZATION_CANDIDATE_ID environment variable and evaluates it against the same dataset.
- Ranking. The optimizer ranks candidates by composite score and marks the best candidate with ★.
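The ranking step can be illustrated with a short sketch. It assumes each candidate already has a single composite score; how the service computes that score is not described here, and the Candidate class, field names, and output layout are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    # Hypothetical record: an identifier plus an already-computed composite score.
    candidate_id: str
    composite_score: float


def rank_candidates(candidates: List[Candidate]) -> List[str]:
    """Sort candidates by composite score (highest first) and star the best one."""
    ranked = sorted(candidates, key=lambda c: c.composite_score, reverse=True)
    lines = []
    for position, candidate in enumerate(ranked):
        marker = "★" if position == 0 else " "
        lines.append(f"{marker} {candidate.candidate_id}  {candidate.composite_score:.2f}")
    return lines


if __name__ == "__main__":
    results = [
        Candidate("baseline", 0.62),
        Candidate("cand-1", 0.71),
        Candidate("cand-2", 0.68),
    ]
    print("\n".join(rank_candidates(results)))
```

Because Python's sort is stable, candidates with equal scores keep their input order, so a tie for first place stars whichever candidate was listed first.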
Run instruction optimization
What gets changed
The optimizer rewrites the system prompt. Your code stays the same because load_config() returns the new instructions automatically. Common improvements include:
- Adding explicit constraints that the original prompt implied but didn’t state
- Restructuring instructions for clarity
- Adding output format specifications
- Strengthening safety and scope boundaries
Example: Before and after
Before (your default instructions):
Max iterations
The max_iterations option controls how many candidate instruction sets are generated. Each iteration produces one candidate.
| Max iterations | Candidates | Time | Best for |
|---|---|---|---|
| 4 (default) | 4 | 5 to 10 min | Quick experiments |
| 5 | 5 | 10 to 15 min | Good balance |
| 10 | 10 | 20 to 30 min | Thorough exploration |
Times are approximate for a dataset of 3 to 10 tasks. Larger datasets or slower eval models increase run duration.
Eval model
The eval model scores agent responses against criteria. Any chat-completion model deployed in your Foundry project works.
If the eval model isn’t deployed, all scores are zero with no error message. Always verify your eval model exists in the project.
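Because a missing eval model shows up as zeros rather than as an error, a simple check on the scores can flag the problem early. The following is a minimal sketch, assuming you have collected the per-task scores from a run into a list; the function name and input shape are hypothetical and not part of the CLI.

```python
from typing import Sequence


def looks_like_missing_eval_model(scores: Sequence[float]) -> bool:
    """Return True when every score is exactly zero.

    An all-zero result is the documented symptom of an eval model that
    isn't deployed. An empty list returns False because there is nothing
    to judge.
    """
    return len(scores) > 0 and all(score == 0.0 for score in scores)


if __name__ == "__main__":
    if looks_like_missing_eval_model([0.0, 0.0, 0.0]):
        print("All scores are zero: verify that the eval model is deployed in the project.")
```

A genuinely poor agent can also score zero on every task, so treat a positive result as a prompt to check the deployment, not as proof.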
Optimization model (reflection)
The optimization model (also called “reflection model”) generates candidate configurations: improved instructions, skills, and tool descriptions. It analyzes baseline results and produces improved variants. It must be deployed in your Foundry project. Supported models: gpt-5, gpt-5.1, gpt-5.3.
Specify the optimization model in your config file or via CLI:
The optimization_model field is required. If you don’t specify it and don’t pass --optimize-model, the optimization API returns an error.
Optimize skills
The optimizer improves existing skills your agent uses. It refines skill descriptions, implementations, and activation criteria. Skill optimization activates automatically when your baseline config includes a skills/ directory.
How it works
- Baseline evaluation. Same as instruction tuning. The optimizer evaluates your agent against the dataset.
- Skill improvement. The optimizer analyzes weak areas and refines skill definitions. A skill is a named capability with:
  - Name: For example, "step_by_step_reasoning"
  - Description: What the skill does and when to use it
  - Body: Implementation details or procedure
- Injection. The agent loads improved skills through load_config(), which makes them available to your agent’s instruction set.
- Evaluation. The optimizer evaluates the agent with improved skills against the dataset.
Run skill optimization
Ensure your baseline config has a skills/ directory with at least one skill, then run:
Skill file downloads
For candidates that include skill files, the config loader can download them through the resolver API. Skills use the open Agent Skills format and are stored in a local directory.
Optimize tools
The optimizer improves tool descriptions and parameters in your tools.json file to help the model call tools more accurately. Tool optimization activates automatically when your baseline config includes a tools.json file.
How it works
- Baseline evaluation. The optimizer evaluates your agent against the dataset, including any tool calls it makes.
- Tool analysis. The optimizer identifies tool calls that fail or produce suboptimal results and analyzes the root cause—unclear descriptions, missing parameters, or ambiguous naming.
- Description refinement. The optimizer generates improved tool definitions with clearer descriptions, better parameter documentation, and more precise function names.
- Evaluation. The optimizer evaluates the agent with improved tool definitions against the dataset.
Run tool optimization
Ensure your baseline config has a tools.json file, then run:
What gets changed
The optimizer refines your tools.json definitions. Common improvements include:
- Clearer function descriptions that help the model know when to call a tool
- More specific parameter descriptions that reduce inaccurate arguments
- Added constraints (enums, required fields) that prevent invalid inputs
Optimize model selection
The optimizer evaluates your agent across multiple model deployments to find the best quality-to-cost trade-off. Each model runs against the same dataset, so you can compare results directly. Model optimization activates when you specify model candidates in optimization_config.
Configure model candidates
Specify the models to evaluate in your eval.yaml:
Each model listed in optimization_config.model must be deployed in your Foundry project.
Run model optimization
Combine with other targets
When your baseline includes instructions, skills, and model candidates, the optimizer runs all targets together in a single run:
Interpret results
After optimization completes, review the results table. For detailed scoring guidance, see Understand optimization results. Key thresholds:
| Improvement | Interpretation |
|---|---|
| Less than 0.03 | Noise. Not meaningful. |
| 0.03 to 0.10 | Moderate. Worth deploying. |
| 0.10 to 0.20 | Significant improvement. |
| Greater than 0.20 | Major improvement. |
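The thresholds above can be encoded as a small helper for scripted checks. This is a sketch under stated assumptions: the function name is hypothetical, and because the table leaves the boundary values ambiguous, this version places exactly 0.10 in the "Significant" band and exactly 0.20 in the "Significant" band as well (only values greater than 0.20 count as "Major").

```python
def interpret_improvement(delta: float) -> str:
    """Map a score improvement to the interpretation from the thresholds table.

    Boundary handling is an assumption: 0.03 counts as Moderate,
    0.10 and 0.20 both count as Significant.
    """
    if delta < 0.03:
        # Includes negative deltas, which the table does not address separately.
        return "Noise. Not meaningful."
    if delta < 0.10:
        return "Moderate. Worth deploying."
    if delta <= 0.20:
        return "Significant improvement."
    return "Major improvement."


if __name__ == "__main__":
    baseline, candidate = 0.62, 0.71
    print(interpret_improvement(candidate - baseline))
```

Pass the difference between the candidate score and the baseline score, not the raw candidate score.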
Deploy the winner
The recommended workflow is to apply the optimized config locally, then deploy:
The applied candidate’s config is stored under .agent_configs/<candidate_id>/ in your project. On the next deploy, your agent uses the improved instructions and tool descriptions.
Alternatively, you can deploy directly via the API (useful for quick A/B testing):
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| All scores are 0.00 | Eval model not deployed | Deploy the eval model in your Foundry project, or use --eval-model to specify one that exists |
| optimize returns 403 | Subscription not on allow list | Contact your Microsoft representative to request access |
| "agent.yaml does not declare any protocols" | Invalid agent.yaml format | Use flat format: kind: hosted at top level with protocols: list |
| Job stuck at “running” | Service issue | Cancel with azd ai agent optimize cancel <id> and retry |
| No candidate IDs in output | Job still running | Wait for completion or use --watch |