- User input — The prompt sent to a model or agent.
- Tool call (Preview) — The action and data the agent proposes to send to a tool. Agents only.
- Tool response (Preview) — The content returned from a tool to the agent. Agents only.
- Output — The final completion returned to the user.
Guardrails leverage classification models from Azure AI Content Safety to detect harmful content across supported risk categories.
The guardrail system applies to all Foundry Models sold by Azure, except for prompts and completions processed by audio models such as Whisper. For more information, see Audio models. The guardrail system currently applies only to agents developed in the Foundry Agent Service, not to other agents registered in the Foundry Control Plane.
Prerequisites
- An Azure subscription. Create one for free.
- A Microsoft Foundry project.
- At least one model deployment in your project.
- Foundry Account Owner role.
The Foundry RBAC roles were recently renamed. Foundry User, Foundry Owner, Foundry Account Owner, and Foundry Project Manager were previously named Azure AI User, Azure AI Owner, Azure AI Account Owner, and Azure AI Project Manager. You might still see the previous names in some places while the rename rolls out. The role IDs and core permissions are unchanged by the rename.
- Access to a role that allows you to create a Foundry resource, such as Foundry Account Owner or Foundry Owner on the subscription or resource group. For more information about permissions, see Role-based access control for Microsoft Foundry.
Guardrails for agents vs models
An individual Foundry guardrail can be applied to one or many models and one or many agents in a project. Some controls within a guardrail might not be relevant to models because the risk, intervention point, or action is specific to agentic behavior or tool calls. Those controls aren't run on models that use that guardrail. Some risks in Preview aren't yet supported for agents. When controls involving those risks are added to a guardrail and the guardrail is applied to an agent, those controls don't take effect for that agent. They still apply to models that use the same guardrail.
Risk applicability
The following table summarizes which risks are applicable to models and agents:
| Risk | Applicable to Models | Applicable to Agents (Preview) |
|---|---|---|
| Hate | ✅ | ✅ |
| Sexual | ✅ | ✅ |
| Self-harm | ✅ | ✅ |
| Violence | ✅ | ✅ |
| User prompt attacks | ✅ | ✅ |
| Indirect attacks | ✅ | ✅ |
| Spotlighting (Preview) | ✅ | ❌ |
| Protected material for code | ✅ | ✅ |
| Protected material for text | ✅ | ✅ |
| Groundedness (Preview) | ✅ | ❌ |
| Personally identifiable information (Preview) | ✅ | ✅ |
| Task Adherence (Preview) | ✅ | ✅ |
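To make the table concrete, the following minimal sketch splits a guardrail's controls into those that take effect for an agent and those that are skipped because their risk isn't yet supported for agents. The `Control` type and `split_controls_for_agent` function are hypothetical illustrations of the table, not part of any Foundry SDK.

```python
from dataclasses import dataclass


# Hypothetical model of a guardrail control; not a Foundry SDK type.
@dataclass(frozen=True)
class Control:
    risk: str
    intervention_point: str


# Preview risks from the table that don't yet take effect for agents.
AGENT_UNSUPPORTED_RISKS = {"Spotlighting", "Groundedness"}


def split_controls_for_agent(controls):
    """Return (applied, skipped) controls when a guardrail is assigned to an agent."""
    applied = [c for c in controls if c.risk not in AGENT_UNSUPPORTED_RISKS]
    skipped = [c for c in controls if c.risk in AGENT_UNSUPPORTED_RISKS]
    return applied, skipped


guardrail = [
    Control("Violence", "User input"),
    Control("Groundedness", "Output"),
    Control("Task Adherence", "Tool call"),
]
applied, skipped = split_controls_for_agent(guardrail)
print([c.risk for c in applied])  # ['Violence', 'Task Adherence']
print([c.risk for c in skipped])  # ['Groundedness']
```

The skipped controls still apply to any model that uses the same guardrail.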
Severity levels
For content risks (Hate, Sexual, Self-harm, Violence), each control uses a severity level threshold that determines which content is flagged:
| Severity level | Behavior |
|---|---|
| Off | Detection is disabled for this risk. Only available to approved customers. For more information, see content filters. |
| Low | Flags content at low severity and above. Most restrictive. |
| Medium | Flags content at medium severity and above. |
| High | Flags only the most severe content. Least restrictive. |
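The threshold logic can be sketched as a comparison over ordered severity levels. This sketch assumes detected content is labeled `safe`, `low`, `medium`, or `high`; it illustrates the table, not the Azure AI Content Safety classifier itself. The function name `is_flagged` is hypothetical.

```python
# Ordered severity levels; a higher index means more severe content.
# "safe" (below low) is an assumption for content with no detected harm.
SEVERITY_ORDER = ["safe", "low", "medium", "high"]


def is_flagged(detected: str, threshold: str) -> bool:
    """Return True if content at `detected` severity is flagged under `threshold`.

    `threshold` is one of "off", "low", "medium", or "high".
    """
    if threshold == "off":
        return False  # detection is disabled for this risk
    return SEVERITY_ORDER.index(detected) >= SEVERITY_ORDER.index(threshold)


print(is_flagged("medium", "low"))   # True: a Low threshold catches more content
print(is_flagged("medium", "high"))  # False: a High threshold flags only high severity
```

Because a Low threshold flags everything from low severity up, it catches the most content, which is why it's the most restrictive setting.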
Intervention point applicability
The following table summarizes which intervention points are applicable to models and agents:
| Intervention Point | Applicable to Models | Applicable to Agents (Preview) |
|---|---|---|
| User input | ✅ | ✅ |
| Tool call | ❌ | ✅ (Preview) |
| Tool response | ❌ | ✅ (Preview) |
| Output | ✅ | ✅ |
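A minimal sketch of the same table as a lookup, showing which intervention points a guardrail evaluates depending on whether it's attached to a model or an agent. The dictionary and the `points_evaluated` function are hypothetical and simply restate the table.

```python
# Hand-written copy of the intervention point table above.
INTERVENTION_POINTS = {
    "User input": {"model": True, "agent": True},
    "Tool call": {"model": False, "agent": True},
    "Tool response": {"model": False, "agent": True},
    "Output": {"model": True, "agent": True},
}


def points_evaluated(target: str) -> list[str]:
    """Return the intervention points evaluated for a 'model' or an 'agent'."""
    if target not in ("model", "agent"):
        raise ValueError(f"unknown target: {target!r}")
    return [point for point, applies in INTERVENTION_POINTS.items() if applies[target]]


print(points_evaluated("model"))  # ['User input', 'Output']
print(points_evaluated("agent"))  # ['User input', 'Tool call', 'Tool response', 'Output']
```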
Action applicability
When a control detects a risk, it can take one of two actions. The following table summarizes which actions are applicable to models and agents:
| Action | Applicable to Models | Applicable to Agents (Preview) |
|---|---|---|
| Annotate | ✅ | ❌ |
| Annotate and block | ✅ | ✅ |
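The following sketch models the two actions under a simple assumption: both record the detection, and only Annotate and block also stops the content. The `SUPPORTED_ACTIONS` table and `apply_action` function are hypothetical. This sketch rejects Annotate for agents outright; how Foundry itself handles that combination isn't described here.

```python
# Actions each target type supports, copied from the table above.
SUPPORTED_ACTIONS = {
    "model": {"annotate", "annotate_and_block"},
    "agent": {"annotate_and_block"},
}


def apply_action(target: str, action: str, flagged: bool) -> dict:
    """Return the outcome for content after a control runs.

    Both actions record whether the content was flagged; only
    "annotate_and_block" also blocks flagged content.
    """
    if action not in SUPPORTED_ACTIONS[target]:
        raise ValueError(f"{action!r} isn't supported for {target}s")
    return {"flagged": flagged, "blocked": flagged and action == "annotate_and_block"}


print(apply_action("model", "annotate", flagged=True))
# {'flagged': True, 'blocked': False}
print(apply_action("agent", "annotate_and_block", flagged=True))
# {'flagged': True, 'blocked': True}
```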
Guardrail inheritance and override
Risks are detected in an agent based on the guardrail it's assigned, not the guardrail of its underlying model. The agent's guardrail fully overrides the model's guardrail.
Example: Guardrail override behavior
Consider this scenario:
- A model deployment has a control with Violence detection set to High for user input and output.
- An agent using that model has a control with Violence detection set to Low for user input and output. The agent has no Violence detection controls for tool calls or tool responses.
In this scenario:
- User queries to the agent are scanned for Violence at the Low threshold.
- Tool calls that the agent's underlying model generates internally, including the content sent to the tool when the call executes, aren't scanned for Violence.
- The response returned from the tool isn't scanned for Violence.
- The final output returned to the user in response to their original query is scanned for Violence at the Low threshold.
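The override example can be reproduced with a small lookup. This sketch assumes a hypothetical encoding of a guardrail as a dictionary keyed by `(risk, intervention_point)`; `AGENT_GUARDRAIL` and `scan_plan` are illustrative names, not Foundry SDK objects. Only the agent's guardrail is consulted, so the model's High setting never appears in the result.

```python
# Hypothetical encoding: {(risk, intervention_point): severity threshold}.
# The model deployment's guardrail (Violence set to "high" on input and
# output) is deliberately absent: the agent's own guardrail overrides it.
AGENT_GUARDRAIL = {
    ("Violence", "User input"): "low",
    ("Violence", "Output"): "low",
}

AGENT_POINTS = ["User input", "Tool call", "Tool response", "Output"]


def scan_plan(guardrail, risk):
    """Map each agent intervention point to its threshold, or None if unscanned."""
    return {point: guardrail.get((risk, point)) for point in AGENT_POINTS}


plan = scan_plan(AGENT_GUARDRAIL, "Violence")
print(plan)
# {'User input': 'low', 'Tool call': None, 'Tool response': None, 'Output': 'low'}
```

A `None` entry corresponds to an intervention point that isn't scanned for that risk, matching the tool call and tool response bullets above.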
Default guardrails
By default, models are assigned the Microsoft.DefaultV2 guardrail. For more information about what controls are included, see Content filtering. Default guardrail assignment for agents follows these rules:
- If you assign a custom guardrail to an agent, that guardrail is used.
- If no custom guardrail is assigned, the agent inherits the guardrail of its underlying model deployment.
- An agent only uses the Microsoft.DefaultV2 guardrail if its model deployment uses that guardrail, or if you explicitly assign it.
For example, if no custom guardrails are specified for an agent and that agent uses a GPT-4o mini deployment with a guardrail named “MyCustomGuardrails,” the agent also uses “MyCustomGuardrails” until you assign a different guardrail.
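The assignment rules above can be expressed as a short resolution function. This is a sketch that works with guardrail names only; `resolve_model_guardrail`, `resolve_agent_guardrail`, and the name `AgentGuardrail` are hypothetical.

```python
from typing import Optional

DEFAULT_MODEL_GUARDRAIL = "Microsoft.DefaultV2"


def resolve_model_guardrail(assigned: Optional[str]) -> str:
    """Models use Microsoft.DefaultV2 unless a custom guardrail is assigned."""
    return assigned if assigned is not None else DEFAULT_MODEL_GUARDRAIL


def resolve_agent_guardrail(agent_assigned: Optional[str], model_assigned: Optional[str]) -> str:
    """An agent's own guardrail wins; otherwise it inherits its deployment's guardrail."""
    if agent_assigned is not None:
        return agent_assigned
    return resolve_model_guardrail(model_assigned)


print(resolve_agent_guardrail(None, "MyCustomGuardrails"))              # MyCustomGuardrails
print(resolve_agent_guardrail(None, None))                              # Microsoft.DefaultV2
print(resolve_agent_guardrail("AgentGuardrail", "MyCustomGuardrails"))  # AgentGuardrail
```

The second call shows the only path by which an agent without its own guardrail ends up on Microsoft.DefaultV2: its model deployment uses that guardrail.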
Troubleshooting
Guardrail not applying to agent
Symptom: Agent behavior doesn't match the assigned guardrail configuration.
Causes:
- The guardrail contains controls for preview risks that aren't yet supported for agents (Spotlighting, Groundedness).
- The agent is using its model deployment's guardrail instead of the guardrail you intended to assign.
Resolution:
- Verify the assigned guardrail by using the Foundry portal or SDK.
- Check that the guardrail's controls don't rely on risks that agents don't support.
- Explicitly assign the guardrail to the agent to override the model's guardrail.
Content flagged unexpectedly
Symptom: Legitimate content is blocked by the guardrail.
Causes:
- The severity level threshold is set too restrictively (for example, Low, which flags content at low severity and above).
- The classification model detected an edge-case pattern.
Resolution:
- Review the severity level settings for the affected risk category.
- Test with different severity levels to find an appropriate threshold.
- For persistent false positives, contact Azure Support to review the classification.
Tool calls not being scanned
Symptom: Harmful content passes through tool calls or tool responses.
Causes:
- Tool call and tool response intervention points aren't configured in the guardrail.
- Preview features that the guardrail relies on aren't fully enabled.
Resolution:
- Verify that the guardrail includes controls for the tool call and tool response intervention points.
- Ensure that Foundry Agent Service preview features are enabled for your project.
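To check the first resolution step systematically, a small helper can list which risks lack controls at the tool intervention points. This sketch assumes a hypothetical representation of a guardrail as a set of `(risk, intervention_point)` pairs; `missing_tool_controls` is illustrative and isn't a Foundry SDK call.

```python
# Intervention points that apply only to agents.
TOOL_POINTS = ("Tool call", "Tool response")


def missing_tool_controls(controls: set, risks: list) -> list:
    """Return (risk, point) pairs that have no control at a tool intervention point."""
    return [
        (risk, point)
        for risk in risks
        for point in TOOL_POINTS
        if (risk, point) not in controls
    ]


guardrail_controls = {
    ("Violence", "User input"), ("Violence", "Tool call"),
    ("Violence", "Tool response"), ("Violence", "Output"),
    ("Hate", "User input"), ("Hate", "Output"),
}
print(missing_tool_controls(guardrail_controls, ["Violence", "Hate"]))
# [('Hate', 'Tool call'), ('Hate', 'Tool response')]
```

An empty result means every listed risk has a control at both tool intervention points.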