Azure OpenAI in Microsoft Foundry Models applies default safety policies to all models except Whisper. These configurations give you a responsible experience by default and include content filtering models, blocklists, prompt transformation, content credentials, and other features.

Guardrails and controls help keep AI-generated output aligned with ethical guidelines and safety standards by identifying and mitigating harmful or inappropriate content. The default policies target risks in categories such as hate and fairness, sexual, violence, self-harm, protected material content, and user prompt injection attacks. To learn more, see categories and severity levels.

All safety policies are configurable. To learn more about configurability, see configuring Guardrails.

When detected content meets or exceeds the severity threshold for a risk category, the API request is blocked and returns an error response indicating which category triggered the filter. This applies to both user prompts (input) and model completions (output).

Prerequisites

  • An Azure subscription with access to Azure OpenAI Service
  • Deployed Azure OpenAI models (excluding Whisper, which uses different safety configurations)

Text models

Text models in Azure OpenAI can take in and generate both text and code. These models use Azure’s text content filters to detect and prevent harmful content in both prompts and completions.
| Risk category | Prompt or completion | Severity threshold |
| --- | --- | --- |
| Hate and fairness | Prompts and completions | Medium |
| Violence | Prompts and completions | Medium |
| Sexual | Prompts and completions | Medium |
| Self-harm | Prompts and completions | Medium |
| User prompt injection attack (jailbreak) | Prompts | N/A |
| Protected material – text | Completions | N/A |
| Protected material – code | Completions | N/A |
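
To make the "Medium" threshold concrete, the following minimal sketch compares a detected severity against a threshold. It assumes a four-level scale (safe, low, medium, high) and that a threshold is inclusive, so content rated at the threshold level or higher is filtered. The names `SEVERITY_ORDER` and `is_filtered` are hypothetical helpers for illustration and are not part of any Azure SDK.

```python
# Assumed severity scale, ordered from least to most severe.
SEVERITY_ORDER = ("safe", "low", "medium", "high")


def is_filtered(severity: str, threshold: str = "medium") -> bool:
    """Return True if `severity` is at or above `threshold` (assumed inclusive)."""
    return SEVERITY_ORDER.index(severity.lower()) >= SEVERITY_ORDER.index(
        threshold.lower()
    )


# With the default Medium threshold, low-severity content passes through.
for level in SEVERITY_ORDER:
    print(level, is_filtered(level))
```

Under these assumptions, lowering the threshold to `"low"` would also filter low-severity content, which is the kind of change a custom configuration makes.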

Vision models

Vision-enabled chat models

| Risk category | Prompt or completion | Severity threshold |
| --- | --- | --- |
| Hate and fairness | Prompts and completions | Medium |
| Violence | Prompts and completions | Medium |
| Sexual | Prompts and completions | Medium |
| Self-harm | Prompts and completions | Medium |
| Identification of individuals and inference of sensitive attributes | Prompts | N/A |
| User prompt injection attack (jailbreak) | Prompts | N/A |

Image generation models

| Risk category | Prompt or completion | Severity threshold |
| --- | --- | --- |
| Hate and fairness | Prompts and completions | Medium |
| Violence | Prompts and completions | Medium |
| Sexual | Prompts and completions | Medium |
| Self-harm | Prompts and completions | Medium |
| Content credentials | Completions | N/A |
| Deceptive generation of political candidates | Prompts | N/A |
| Depictions of public figures | Prompts | N/A |
| User prompt injection attack (jailbreak) | Prompts | N/A |
| Protected material – art and studio characters | Prompts | N/A |
| Profanity | Prompts | N/A |

Audio models

| Risk category | Prompt or completion | Severity threshold |
| --- | --- | --- |
| Hate and fairness | Prompts and completions | Medium |
| Violence | Prompts and completions | Medium |
| Sexual | Prompts and completions | Medium |
| Self-harm | Prompts and completions | Medium |
| User prompt injection attack (jailbreak) | Prompts | N/A |
| Protected material – text | Completions | N/A |
| Protected material – code | Completions | N/A |

Severity levels

The text content filtering models for the hate, sexual, violence, and self-harm categories are specifically trained and tested on the following languages: English, German, Japanese, Spanish, French, Italian, Portuguese, and Chinese. The service can work in many other languages, but the quality might vary. In all cases, do your own testing to ensure that it works for your application.

Text content

The Severity definitions tab in this document contains examples of harmful content that may be disturbing to some readers.

Image content

The Severity definitions tab in this document contains examples of harmful content that may be disturbing to some readers.

Testing safety policies

To verify that default safety policies are active, send a test prompt that should trigger content filtering. The following example uses the Azure OpenAI Python SDK with key-based authentication:
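import os
from openai import AzureOpenAI, BadRequestError

# Endpoint, key, and deployment name are read from environment variables.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

try:
    response = client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_DEPLOYMENT"],  # your deployment name
        messages=[{"role": "user", "content": "[test prompt]"}],
    )
except BadRequestError as err:
    # A filtered prompt is rejected with HTTP 400; err.code carries the error code.
    print(f"Request blocked: {err.code}")
else:
    # A filtered completion is reported through the finish reason instead.
    print(response.choices[0].finish_reason)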
Replace [test prompt] with content that exceeds one of the configured severity thresholds. If safety policies are active and the content is filtered, the request returns an HTTP 400 error with a content_filter code, or the response’s finish_reason is content_filter with details indicating which category was triggered.
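
To find out which category triggered the filter, you can inspect the error body. The sketch below assumes the body nests per-category results under `error.innererror.content_filter_result`, with a `filtered` flag and a `severity` label for each category. That shape, the `SAMPLE_ERROR` payload, and the `triggered_categories` helper are illustrative assumptions rather than captured service output, so check them against a real response from your deployment.

```python
from typing import Any

# Hypothetical error body for illustration; field names are an assumption.
SAMPLE_ERROR: dict[str, Any] = {
    "error": {
        "code": "content_filter",
        "status": 400,
        "innererror": {
            "content_filter_result": {
                "hate": {"filtered": False, "severity": "safe"},
                "violence": {"filtered": True, "severity": "medium"},
            },
        },
    }
}


def triggered_categories(body: dict[str, Any]) -> list[str]:
    """Return the sorted names of categories marked as filtered in `body`."""
    error = body.get("error", {})
    results = error.get("innererror", {}).get("content_filter_result", {})
    return sorted(
        name
        for name, detail in results.items()
        if isinstance(detail, dict) and detail.get("filtered")
    )


print(triggered_categories(SAMPLE_ERROR))
```

The helper returns an empty list when the expected fields are missing, so it degrades quietly if the real payload differs from the assumed shape.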

Next steps