Foundry Models sold directly by Azure
This article refers to the Microsoft Foundry (new) portal.
Foundry Models sold directly by Azure also include select models from top model providers, such as:
- Black Forest Labs: FLUX.2-pro, FLUX.1-Kontext-pro, FLUX-1.1-pro
- Cohere: Cohere-command-a, embed-v-4-0, Cohere-rerank-v4.0-pro, Cohere-rerank-v4.0-fast
- DeepSeek: DeepSeek-V3.2, DeepSeek-V3.2-Speciale, DeepSeek-V3.1, DeepSeek-V3-0324, DeepSeek-R1-0528, DeepSeek-R1
- Moonshot AI: Kimi-K2.5, Kimi-K2-Thinking
- Meta: Llama-4-Maverick-17B-128E-Instruct-FP8, Llama-3.3-70B-Instruct
- Microsoft: MAI-DS-R1, model-router
- Mistral: mistral-document-ai-2505, Mistral-Large-3
- xAI: grok-code-fast-1, grok-3, grok-3-mini, grok-4-fast-reasoning, grok-4-fast-non-reasoning, grok-4
Azure OpenAI in Microsoft Foundry models
Azure OpenAI is powered by a diverse set of models with different capabilities and price points. Model availability varies by region and cloud. For Azure Government model availability, refer to Azure OpenAI in Azure Government.
| Models | Description |
|---|---|
| GPT-5.2 series | NEW gpt-5.2-codex, gpt-5.2, gpt-5.2-chat (Preview) |
| GPT-5.1 series | NEW gpt-5.1, gpt-5.1-chat, gpt-5.1-codex, gpt-5.1-codex-mini |
| Sora | NEW sora-2 |
| GPT-5 series | gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat |
| gpt-oss | open-weight reasoning models |
| codex-mini | Fine-tuned version of o4-mini. |
| GPT-4.1 series | gpt-4.1, gpt-4.1-mini, gpt-4.1-nano |
| computer-use-preview | An experimental model trained for use with the Responses API computer use tool. |
| o-series models | Reasoning models with advanced problem solving and increased focus and capability. |
| GPT-4o, GPT-4o mini, and GPT-4 Turbo | Capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
| Embeddings | A set of models that can convert text into numerical vector form to facilitate text similarity. |
| Image generation | A series of models that can generate original images from natural language. |
| Video generation | A model that can generate original video scenes from text instructions. |
| Audio | A series of models for speech to text, translation, and text to speech. GPT-4o audio models support either low latency speech in, speech out conversational interactions or audio generation. |
GPT-5.2
Region availability
| Model | Region |
|---|---|
| gpt-5.2 | See the models table. |
| gpt-5.2-chat | See the models table. |
| gpt-5.2-codex | East US 2 & Sweden Central (Global Standard) |
Capabilities
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
|---|---|---|---|---|
| gpt-5.2-codex (2026-01-14) | - Reasoning - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. - Full summary of capabilities. - Optimized for Codex CLI & Codex VS Code extension | 400,000 Input: 272,000 Output: 128,000 | 128,000 | - |
| gpt-5.2 (2025-12-11) | - Reasoning - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. - Full summary of capabilities. | 400,000 Input: 272,000 Output: 128,000 | 128,000 | August 2025 |
| gpt-5.2-chat (2025-12-11) Preview | - Chat Completions API. - Responses API. - Structured outputs - Functions, tools, and parallel tool calling. | 128,000 Input: 111,616 Output: 16,384 | 16,384 | August 2025 |
We don’t recommend using preview models in production. We’ll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don’t follow the standard Azure OpenAI model lifecycle.
GPT-5.1
Region availability
| Model | Region |
|---|---|
| gpt-5.1 | See the models table. |
| gpt-5.1-chat | See the models table. |
| gpt-5.1-codex | See the models table. |
| gpt-5.1-codex-mini | See the models table. |
| gpt-5.1-codex-max | See the models table. |
Capabilities
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
|---|---|---|---|---|
| gpt-5.1 (2025-11-13) | - Reasoning - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. - Full summary of capabilities. | 400,000 Input: 272,000 Output: 128,000 | 128,000 | September 30, 2024 |
| gpt-5.1-chat (2025-11-13) Preview | - Reasoning - Chat Completions API. - Responses API. - Structured outputs - Functions, tools, and parallel tool calling. | 128,000 Input: 111,616 Output: 16,384 | 16,384 | September 30, 2024 |
| gpt-5.1-codex (2025-11-13) | - Responses API only. - Text and image processing - Structured outputs. - Functions, tools, and parallel tool calling. - Full summary of capabilities - Optimized for Codex CLI & Codex VS Code extension | 400,000 Input: 272,000 Output: 128,000 | 128,000 | September 30, 2024 |
| gpt-5.1-codex-mini (2025-11-13) | - Responses API only. - Text and image processing - Structured outputs. - Functions, tools, and parallel tool calling. - Full summary of capabilities - Optimized for Codex CLI & Codex VS Code extension | 400,000 Input: 272,000 Output: 128,000 | 128,000 | September 30, 2024 |
| gpt-5.1-codex-max (2025-12-04) | - Responses API only. - Text and image processing - Structured outputs. - Functions, tools, and parallel tool calling. - Full summary of capabilities - Optimized for Codex CLI & Codex VS Code extension | 400,000 Input: 272,000 Output: 128,000 | 128,000 | September 30, 2024 |
We don’t recommend using preview models in production. We’ll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don’t follow the standard Azure OpenAI model lifecycle.
- gpt-5.1: `reasoning_effort` defaults to `none`. When upgrading from previous reasoning models to gpt-5.1, keep in mind that you might need to update your code to explicitly pass a `reasoning_effort` level if you want reasoning to occur.
- gpt-5.1-chat adds built-in reasoning capabilities. Like other reasoning models, it doesn't support parameters like `temperature`. If you upgrade from gpt-5-chat (which isn't a reasoning model) to gpt-5.1-chat, make sure you remove any custom parameters, such as `temperature`, that aren't supported by reasoning models.
- gpt-5.1-codex-max adds support for setting `reasoning_effort` to `xhigh`. Reasoning effort `none` isn't supported with gpt-5.1-codex-max.
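The migration notes above can be sketched as a small helper. This is a hypothetical illustration (not part of any SDK) that adjusts an existing request dictionary for the gpt-5.1 reasoning models:

```python
def migrate_request_params(params: dict) -> dict:
    """Adjust a gpt-5-era request dict for gpt-5.1 reasoning models.

    - Drops `temperature`, which reasoning models don't support.
    - Sets `reasoning_effort` explicitly, since gpt-5.1 defaults it to
      "none" and no reasoning occurs unless you pass a level.
    """
    adjusted = {k: v for k, v in params.items() if k != "temperature"}
    adjusted.setdefault("reasoning_effort", "medium")
    return adjusted

old_request = {"model": "gpt-5-chat", "temperature": 0.7}
new_request = migrate_request_params(old_request)
new_request["model"] = "gpt-5.1-chat"  # your deployment name (placeholder)
print(new_request)
```

The helper preserves an explicitly chosen `reasoning_effort` and only fills in a default when none is set.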
GPT-5
Region availability
| Model | Region |
|---|---|
| gpt-5 (2025-08-07) | See the models table. |
| gpt-5-mini (2025-08-07) | See the models table. |
| gpt-5-nano (2025-08-07) | See the models table. |
| gpt-5-chat (2025-08-07) | See the models table. |
| gpt-5-chat (2025-10-03) | See the models table. |
| gpt-5-codex (2025-09-11) | See the models table. |
| gpt-5-pro (2025-10-06) | See the models table. |
- Registration is required for access to the gpt-5-pro, gpt-5, & gpt-5-codex models.
- gpt-5-mini, gpt-5-nano, and gpt-5-chat don't require registration.
- Customers who were previously granted access to o3 don't need to reapply, as their approved subscriptions will automatically be granted access upon model release.
Capabilities
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
|---|---|---|---|---|
| gpt-5 (2025-08-07) | - Reasoning - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. - Full summary of capabilities. | 400,000 Input: 272,000 Output: 128,000 | 128,000 | September 30, 2024 |
| gpt-5-mini (2025-08-07) | - Reasoning - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. - Full summary of capabilities. | 400,000 Input: 272,000 Output: 128,000 | 128,000 | May 31, 2024 |
| gpt-5-nano (2025-08-07) | - Reasoning - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. - Full summary of capabilities. | 400,000 Input: 272,000 Output: 128,000 | 128,000 | May 31, 2024 |
| gpt-5-chat (2025-08-07) Preview | - Chat Completions API. - Responses API. - Input: Text/Image - Output: Text only | 128,000 | 16,384 | September 30, 2024 |
| gpt-5-chat (2025-10-03) Preview¹ | - Chat Completions API. - Responses API. - Input: Text/Image - Output: Text only | 128,000 | 16,384 | September 30, 2024 |
| gpt-5-codex (2025-09-11) | - Responses API only. - Input: Text/Image - Output: Text only - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. - Full summary of capabilities - Optimized for Codex CLI & Codex VS Code extension | 400,000 Input: 272,000 Output: 128,000 | 128,000 | - |
| gpt-5-pro (2025-10-06) | - Reasoning - Responses API. - Structured outputs. - Text and image processing. - Functions and tools - Full summary of capabilities. | 400,000 Input: 272,000 Output: 128,000 | 128,000 | September 30, 2024 |
¹ gpt-5-chat version 2025-10-03 introduces a significant enhancement focused on emotional intelligence and mental health capabilities. This upgrade integrates specialized datasets and refined response strategies to improve the model's ability to:
- Understand and interpret emotional context more accurately, enabling nuanced and empathetic interactions.
- Provide supportive, responsible responses in conversations related to mental health, ensuring sensitivity and adherence to best practices.
We don’t recommend using preview models in production. We’ll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don’t follow the standard Azure OpenAI model lifecycle.
gpt-oss
Region availability
| Model | Region |
|---|---|
gpt-oss-120b | All Azure OpenAI regions |
Capabilities
| Model ID | Description | Context Window | Max Output Tokens | Training Data (up to) |
|---|---|---|---|---|
| gpt-oss-120b (Preview) | - Text in/text out only - Chat Completions API - Streaming - Function calling - Structured outputs - Reasoning - Available for deployment¹ and via managed compute | 131,072 | 131,072 | May 31, 2024 |
| gpt-oss-20b (Preview) | - Text in/text out only - Chat Completions API - Streaming - Function calling - Structured outputs - Reasoning - Available via managed compute and Foundry Local | 131,072 | 131,072 | May 31, 2024 |
¹ gpt-oss-120b requires a Foundry project to deploy the model.
Deploy with code
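A minimal sketch of calling a deployed gpt-oss-120b through the Chat Completions API. It assumes the `openai` Python SDK (`pip install openai`), an existing deployment named `gpt-oss-120b`, and endpoint/key environment variables; the `api_version` value is an assumption, so use one your resource supports:

```python
import os

def build_request(prompt: str) -> dict:
    # gpt-oss models are text in/text out and use the Chat Completions API.
    return {
        "model": "gpt-oss-120b",  # your deployment name (placeholder)
        "messages": [{"role": "user", "content": prompt}],
    }

def run(prompt: str) -> str:
    # Performs the actual call; requires credentials, so it isn't invoked here.
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",  # assumption; check your supported versions
    )
    resp = client.chat.completions.create(**build_request(prompt))
    return resp.choices[0].message.content
```

`build_request` is separated from the network call so the request shape can be inspected and reused independently of credentials.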
GPT-4.1 series
Region availability
| Model | Region |
|---|---|
| gpt-4.1 (2025-04-14) | See the models table. |
| gpt-4.1-nano (2025-04-14) | See the models table. |
| gpt-4.1-mini (2025-04-14) | See the models table. |
Capabilities
A known issue affects all GPT-4.1 series models: large tool or function call definitions that exceed 300,000 tokens result in failures, even though the models' 1 million token context limit wasn't reached. The errors can vary based on the API call and underlying payload characteristics. Here are the error messages for the Chat Completions API:

- Error code: 400 - {'error': {'message': "This model's maximum context length is 300000 tokens. However, your messages resulted in 350564 tokens (100 in the messages, 350464 in the functions). Please reduce the length of the messages or functions.", 'type': 'invalid_request_error', 'param': 'messages', 'code': 'context_length_exceeded'}}
- Error code: 400 - {'error': {'message': "Invalid 'tools[0].function.description': string too long. Expected a string with maximum length 1048576, but got a string with length 2778531 instead.", 'type': 'invalid_request_error', 'param': 'tools[0].function.description', 'code': 'string_above_max_length'}}
- Error code: 500 - {'error': {'message': 'The server had an error processing your request. Sorry about that! You can retry your request, or contact us through an Azure support request at: https://go.microsoft.com/fwlink/?linkid=2213926 if you keep seeing this error. (Please include the request ID d2008353-291d-428f-adc1-defb5d9fb109 in your email.)', 'type': 'server_error', 'param': None, 'code': None}}
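As a rough pre-flight guard against this issue, you could estimate the token footprint of your tool definitions before sending a request. The sketch below uses a crude 4-characters-per-token heuristic, which is an assumption, not an official counter; a tokenizer such as tiktoken gives exact counts:

```python
import json

# Observed failure threshold from the errors above.
TOOL_TOKEN_LIMIT = 300_000

def approx_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    # For exact counts, use a real tokenizer (e.g., tiktoken's o200k_base).
    return max(1, len(text) // 4)

def tools_within_limit(tools: list) -> bool:
    # Serialize the tool definitions the way they'd appear in the payload
    # and compare the estimate against the known failure threshold.
    return approx_tokens(json.dumps(tools)) < TOOL_TOKEN_LIMIT

small_tools = [{"type": "function",
                "function": {"name": "lookup",
                             "description": "Look up a record by ID."}}]
print(tools_within_limit(small_tools))
```

Because the heuristic underestimates some payloads, treat a near-limit result as a signal to measure precisely with a tokenizer before sending.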
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
|---|---|---|---|---|
| gpt-4.1 (2025-04-14) | - Text and image input - Text output - Chat completions API - Responses API - Streaming - Function calling - Structured outputs (chat completions) | - 1,047,576 - 128,000 (standard & provisioned managed deployments) - 300,000 (batch deployments) | 32,768 | May 31, 2024 |
| gpt-4.1-nano (2025-04-14) | - Text and image input - Text output - Chat completions API - Responses API - Streaming - Function calling - Structured outputs (chat completions) | - 1,047,576 - 128,000 (standard & provisioned managed deployments) - 300,000 (batch deployments) | 32,768 | May 31, 2024 |
| gpt-4.1-mini (2025-04-14) | - Text and image input - Text output - Chat completions API - Responses API - Streaming - Function calling - Structured outputs (chat completions) | - 1,047,576 - 128,000 (standard & provisioned managed deployments) - 300,000 (batch deployments) | 32,768 | May 31, 2024 |
computer-use-preview
An experimental model trained for use with the Responses API computer use tool. It can be used with third-party libraries to allow the model to control mouse and keyboard input, while getting context from screenshots of the current environment.
We don't recommend using preview models in production. We'll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.
Registration is required for access to computer-use-preview. Access is granted based on Microsoft's eligibility criteria. Customers who have access to other limited access models still need to request access for this model.
To request access, go to computer-use-preview limited access model application. When access is granted, you need to create a deployment for the model.
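A hedged sketch of what a Responses API request with the computer use tool might look like. The tool field names (`display_width`, `display_height`, `environment`) and `truncation="auto"` follow the OpenAI computer-use tool shape and are assumptions to verify against the current API reference:

```python
def build_computer_use_request(task: str) -> dict:
    # Request body for the Responses API computer use tool. Field names
    # here follow the OpenAI computer-use tool schema; confirm them
    # against the current API reference before relying on this shape.
    return {
        "model": "computer-use-preview",  # your deployment name (placeholder)
        "input": task,
        "truncation": "auto",  # computer-use sessions require auto truncation
        "tools": [{
            "type": "computer_use_preview",
            "display_width": 1280,
            "display_height": 800,
            "environment": "browser",  # e.g. "browser", "windows", "ubuntu"
        }],
    }

print(build_computer_use_request("Open the documentation page."))
```

In a real loop, a third-party automation library would execute the model's suggested mouse/keyboard actions and send back fresh screenshots as the next input.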
Region availability
| Model | Region |
|---|---|
computer-use-preview | See the models table. |
Capabilities
| Model ID | Description | Context window | Max output tokens | Training data (up to) |
|---|---|---|---|---|
| computer-use-preview (2025-03-11) | Specialized model for use with the Responses API computer use tool - Tools - Streaming - Text (input/output) - Image (input) | 8,192 | 1,024 | October 2023 |
o-series models
The Azure OpenAI o-series models are designed to tackle reasoning and problem-solving tasks with increased focus and capability. These models spend more time processing and understanding the user's request, making them exceptionally strong in areas like science, coding, and math, compared to previous iterations.
| Model ID | Description | Max request (tokens) | Training data (up to) |
|---|---|---|---|
| codex-mini (2025-05-16) | Fine-tuned version of o4-mini. - Responses API. - Structured outputs. - Text and image processing. - Functions and tools. Full summary of capabilities. | Input: 200,000 Output: 100,000 | May 31, 2024 |
| o3-pro (2025-06-10) | - Responses API. - Structured outputs. - Text and image processing. - Functions and tools. Full summary of capabilities. | Input: 200,000 Output: 100,000 | May 31, 2024 |
| o4-mini (2025-04-16) | - New reasoning model, offering enhanced reasoning abilities. - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions and tools. Full summary of capabilities. | Input: 200,000 Output: 100,000 | May 31, 2024 |
| o3 (2025-04-16) | - New reasoning model, offering enhanced reasoning abilities. - Chat Completions API. - Responses API. - Structured outputs. - Text and image processing. - Functions, tools, and parallel tool calling. Full summary of capabilities. | Input: 200,000 Output: 100,000 | May 31, 2024 |
| o3-mini (2025-01-31) | - Enhanced reasoning abilities. - Structured outputs. - Text-only processing. - Functions and tools. | Input: 200,000 Output: 100,000 | October 2023 |
| o1 (2024-12-17) | - Enhanced reasoning abilities. - Structured outputs. - Text and image processing. - Functions and tools. | Input: 200,000 Output: 100,000 | October 2023 |
| o1-preview (2024-09-12) | Older preview version. | Input: 128,000 Output: 32,768 | October 2023 |
| o1-mini (2024-09-12) | A faster and more cost-efficient option in the o1 series, ideal for coding tasks that require speed and lower resource consumption. - Global Standard deployment available by default. - Standard (regional) deployments are currently only available for select customers who received access as part of the o1-preview limited access release. | Input: 128,000 Output: 65,536 | October 2023 |
Region availability
| Model | Region |
|---|---|
| codex-mini | East US 2 & Sweden Central (Global Standard). |
| o3-pro | East US 2 & Sweden Central (Global Standard). |
| o4-mini | See the models table. |
| o3 | See the models table. |
| o3-mini | See the models table. |
| o1 | See the models table. |
| o1-preview | See the models table. This model is available only for customers who were granted access as part of the original limited access. |
| o1-mini | See the models table. |
GPT-4o and GPT-4 Turbo
GPT-4o integrates text and images in a single model, which enables it to handle multiple data types simultaneously. This multimodal approach enhances accuracy and responsiveness in human-computer interactions. GPT-4o matches GPT-4 Turbo in English text and coding tasks while offering superior performance in non-English language tasks and vision tasks, setting new benchmarks for AI capabilities.
GPT-4 and GPT-4 Turbo models
These models can be used only with the Chat Completions API. See Model versions to learn about how Azure OpenAI handles model version upgrades. See Working with models to learn how to view and configure the model version settings of your GPT-4 deployments.
| Model ID | Description | Max request (tokens) | Training data (up to) |
|---|---|---|---|
| gpt-4o (2024-11-20) GPT-4o (Omni) | - Structured outputs. - Text and image processing. - JSON Mode. - Parallel function calling. - Enhanced accuracy and responsiveness. - Parity with English text and coding tasks compared to GPT-4 Turbo with Vision. - Superior performance in non-English languages and in vision tasks. - Enhanced creative writing ability. | Input: 128,000 Output: 16,384 | October 2023 |
| gpt-4o (2024-08-06) GPT-4o (Omni) | - Structured outputs. - Text and image processing. - JSON Mode. - Parallel function calling. - Enhanced accuracy and responsiveness. - Parity with English text and coding tasks compared to GPT-4 Turbo with Vision. - Superior performance in non-English languages and in vision tasks. | Input: 128,000 Output: 16,384 | October 2023 |
| gpt-4o-mini (2024-07-18) GPT-4o mini | - Fast, inexpensive, capable model ideal for replacing GPT-3.5 Turbo series models. - Text and image processing. - JSON Mode. - Parallel function calling. | Input: 128,000 Output: 16,384 | October 2023 |
| gpt-4o (2024-05-13) GPT-4o (Omni) | - Text and image processing. - JSON Mode. - Parallel function calling. - Enhanced accuracy and responsiveness. - Parity with English text and coding tasks compared to GPT-4 Turbo with Vision. - Superior performance in non-English languages and in vision tasks. | Input: 128,000 Output: 4,096 | October 2023 |
| gpt-4 (turbo-2024-04-09) GPT-4 Turbo with Vision | New generally available model. - Replacement for all previous GPT-4 preview models (vision-preview, 1106-Preview, 0125-Preview). - Feature availability is currently different, depending on the method of input and the deployment type. | Input: 128,000 Output: 4,096 | December 2023 |
We don’t recommend that you use preview models in production. We’ll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don’t follow the standard Azure OpenAI model lifecycle.
Embeddings
text-embedding-3-large is the latest and most capable embedding model. You can’t upgrade between embeddings models. To move from using text-embedding-ada-002 to text-embedding-3-large, you need to generate new embeddings.
The available embeddings models are text-embedding-3-large, text-embedding-3-small, and text-embedding-ada-002.
| Evaluation benchmark | text-embedding-ada-002 | text-embedding-3-small | text-embedding-3-large |
|---|---|---|---|
| MIRACL average | 31.4 | 44.0 | 54.9 |
| MTEB average | 61.0 | 62.3 | 64.6 |
The text-embedding-3-large and text-embedding-3-small models support shortening embeddings through the `dimensions` parameter. Typically, larger embeddings are more expensive from a compute, memory, and storage perspective. When you can adjust the number of dimensions, you gain more control over overall cost and performance. The `dimensions` parameter isn't supported in all versions of the OpenAI 1.x Python library. To take advantage of this parameter, we recommend that you upgrade to the latest version: `pip install openai --upgrade`.
OpenAI's MTEB benchmark testing found that even when the third-generation model's dimensions are reduced to less than the 1,536 dimensions of text-embedding-ada-002, performance remains slightly better.
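A sketch of requesting reduced-dimension embeddings. It assumes an Azure OpenAI deployment named `text-embedding-3-large`, endpoint/key environment variables, and a recent `openai` package; the `api_version` value is an assumption:

```python
import os

def build_embedding_request(text: str, dims: int = 1024) -> dict:
    # text-embedding-3-large returns 3,072 dimensions by default; the
    # `dimensions` parameter requests a shorter vector to reduce storage
    # and compute cost.
    return {
        "model": "text-embedding-3-large",  # your deployment name (placeholder)
        "input": text,
        "dimensions": dims,
    }

def embed(text: str, dims: int = 1024) -> list:
    # Performs the actual call; requires credentials, so it isn't invoked here.
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",  # assumption; check your supported versions
    )
    resp = client.embeddings.create(**build_embedding_request(text, dims))
    return resp.data[0].embedding
```

Remember that vectors of different dimension counts (or from different models) aren't comparable, so pick one setting and re-embed your corpus if you change it.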
Image generation models
The image generation models generate images from text prompts that the user provides. The gpt-image-1 series models are in limited access preview. DALL-E 3 is generally available for use with the REST APIs. DALL-E 2 and DALL-E 3 with client SDKs are in preview.
Registration is required to access gpt-image-1, gpt-image-1-mini, or gpt-image-1.5. Access is granted based on Microsoft's eligibility criteria. Customers who have access to other limited access models still need to request access for this model.
To request access, fill out an application form: Apply for GPT-image-1 access; Apply for GPT-image-1.5 access. When access is granted, you need to create a deployment for the model.
Region availability
| Model | Region |
|---|---|
| dall-e-3 | East US, Australia East, Sweden Central |
| gpt-image-1 | West US 3, East US 2, UAE North, Poland Central, Sweden Central (Global Standard) |
| gpt-image-1-mini | West US 3, East US 2, UAE North, Poland Central, Sweden Central (Global Standard) |
| gpt-image-1.5 | West US 3, East US 2, UAE North, Poland Central, Sweden Central (Global Standard) |
Video generation models
Sora is an AI model from OpenAI that can create realistic and imaginative video scenes from text instructions. Sora is in preview.
Region availability
| Model | Region |
|---|---|
| sora | East US 2 (Global Standard), Sweden Central (Global Standard) |
| sora-2 | East US 2 (Global Standard), Sweden Central (Global Standard) |
Audio models
Audio models in Azure OpenAI are available via the realtime, completions, and audio APIs.
GPT-4o audio models
The GPT-4o audio models are part of the GPT-4o model family and support either low-latency, speech in, speech out conversational interactions or audio generation.
We don't recommend using preview models in production. We'll upgrade all deployments of preview models to either future preview versions or to the latest stable, generally available version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.
| Model ID | Description | Max request (tokens) | Training data (up to) |
|---|---|---|---|
| gpt-4o-mini-audio-preview (2024-12-17) GPT-4o audio | Audio model for audio and text generation. | Input: 128,000 Output: 16,384 | September 2023 |
| gpt-4o-audio-preview (2024-12-17) GPT-4o audio | Audio model for audio and text generation. | Input: 128,000 Output: 16,384 | September 2023 |
| gpt-4o-realtime-preview (2025-06-03) GPT-4o audio | Audio model for real-time audio processing. | Input: 128,000 Output: 4,096 | October 2023 |
| gpt-4o-realtime-preview (2024-12-17) GPT-4o audio | Audio model for real-time audio processing. | Input: 128,000 Output: 4,096 | October 2023 |
| gpt-4o-mini-realtime-preview (2024-12-17) GPT-4o audio | Audio model for real-time audio processing. | Input: 128,000 Output: 4,096 | October 2023 |
| gpt-realtime (2025-08-28) (GA), gpt-realtime-mini (2025-10-06), gpt-realtime-mini (2025-12-15), gpt-audio (2025-08-28), gpt-audio-mini (2025-10-06) | Audio model for real-time audio processing. | Input: 28,672 Output: 4,096 | October 2023 |
Audio API
The audio models via the /audio API can be used for speech to text, translation, and text to speech.
Speech-to-text models
| Model ID | Description | Max request (audio file size) |
|---|---|---|
| whisper | General-purpose speech recognition model. | 25 MB |
| gpt-4o-transcribe | Speech-to-text model powered by GPT-4o. | 25 MB |
| gpt-4o-mini-transcribe | Speech-to-text model powered by GPT-4o mini. | 25 MB |
| gpt-4o-transcribe-diarize | Speech-to-text model with automatic speech recognition. | 25 MB |
| gpt-4o-mini-transcribe (2025-12-15) | Speech-to-text model with automatic speech recognition. Improved transcription accuracy and robustness. | 25 MB |
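A sketch of transcribing a file with a `whisper` deployment via the audio API, including a pre-flight check against the 25 MB request limit from the table above. Endpoint, key, deployment name, and the `api_version` value are placeholders/assumptions:

```python
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # 25 MB limit from the table above

def within_upload_limit(path: str) -> bool:
    # Check the audio file size before sending it to the service.
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES

def transcribe(path: str) -> str:
    # Performs the actual call; requires `pip install openai` plus
    # endpoint/key environment variables, so it isn't invoked here.
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",  # assumption; check your supported versions
    )
    if not within_upload_limit(path):
        raise ValueError("Audio file exceeds the 25 MB request limit.")
    with open(path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper", file=f)
    return result.text
```

For files above the limit, the usual approach is to split the audio into chunks (for example, by silence boundaries) and transcribe each chunk separately.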
Speech translation models
| Model ID | Description | Max request (audio file size) |
|---|---|---|
| whisper | General-purpose speech recognition model. | 25 MB |
Text-to-speech models (preview)
| Model ID | Description |
|---|---|
| tts | Text-to-speech model optimized for speed. |
| tts-hd | Text-to-speech model optimized for quality. |
| gpt-4o-mini-tts | Text-to-speech model powered by GPT-4o mini. You can guide the voice to speak in a specific style or tone. |
| gpt-4o-mini-tts (2025-12-15) | Text-to-speech model powered by GPT-4o mini. You can guide the voice to speak in a specific style or tone. |
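A sketch of generating speech with a `gpt-4o-mini-tts` deployment. The `voice` value and the style-steering `instructions` field are assumptions based on the "guide the voice" note above; verify them against the current API reference. Endpoint, key, and `api_version` are placeholders:

```python
import os

def build_tts_request(text: str) -> dict:
    # `voice` and `instructions` are assumptions to verify against the
    # current API reference for your api-version.
    return {
        "model": "gpt-4o-mini-tts",  # your deployment name (placeholder)
        "voice": "alloy",
        "input": text,
        "instructions": "Speak in a calm, friendly tone.",
    }

def synthesize(text: str, out_path: str = "speech.mp3") -> None:
    # Performs the actual call; requires `pip install openai` plus
    # endpoint/key environment variables, so it isn't invoked here.
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-10-21",  # assumption; check your supported versions
    )
    resp = client.audio.speech.create(**build_tts_request(text))
    resp.write_to_file(out_path)
```

The request body is built separately from the network call so the shape can be inspected or logged without credentials.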
Model summary table and region availability
Models by deployment type
Azure OpenAI provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment:
- Standard: Has a global deployment option, routing traffic globally to provide higher throughput.
- Provisioned: Also has a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.
- Global Standard
- Global Provisioned managed
- Global Batch
- Data Zone Standard
- Data Zone Provisioned managed
- Data Zone Batch
- Standard
- Provisioned managed
Global Standard model availability
| Region | gpt-5.2-codex, 2026-01-14 | gpt-5.2, 2025-12-11 | gpt-5.2-chat, 2025-12-11 | gpt-5.1-codex-max, 2025-12-04 | gpt-5.1, 2025-11-13 | gpt-5.1-chat, 2025-11-13 | gpt-5.1-codex, 2025-11-13 | gpt-5.1-codex-mini, 2025-11-13 | gpt-5-pro, 2025-10-06 | gpt-5-codex, 2025-09-15 | gpt-5, 2025-08-07 | gpt-5-mini, 2025-08-07 | gpt-5-nano, 2025-08-07 | gpt-5-chat, 2025-08-07 | gpt-5-chat, 2025-10-03 | o3-pro, 2025-06-10 | codex-mini, 2025-05-16 | sora, 2025-05-02 | model-router, 2025-08-07 | model-router, 2025-05-19 | model-router, 2025-11-18 | o3, 2025-04-16 | o4-mini, 2025-04-16 | gpt-image-1, 2025-04-15 | gpt-4.1, 2025-04-14 | gpt-4.1-nano, 2025-04-14 | gpt-4.1-mini, 2025-04-14 | computer-use-preview, 2025-03-11 | o3-mini, 2025-01-31 | o1, 2024-12-17 | gpt-4o, 2024-05-13 | gpt-4o, 2024-08-06 | gpt-4o, 2024-11-20 | gpt-4o-mini, 2024-07-18 | text-embedding-3-small, 1 | text-embedding-3-large, 1 | text-embedding-ada-002, 2 | gpt-4o-realtime-preview, 2024-12-17 | gpt-4o-audio-preview, 2024-12-17 | gpt-4o-mini-realtime-preview, 2024-12-17 | gpt-4o-mini-audio-preview, 2024-12-17 | gpt-4o-transcribe, 2025-03-20 | gpt-4o-mini-tts, 2025-12-15 | gpt-4o-mini-tts, 2025-03-20 | gpt-4o-mini-transcribe, 2025-12-15 | gpt-4o-mini-transcribe, 2025-03-20 | gpt-image-1-mini, 2025-10-06 | gpt-audio-mini, 2025-10-06 | gpt-audio-mini, 2025-12-15 | gpt-image-1.5, 2025-12-16 | sora-2, 2025-10-06 | gpt-realtime-mini, 2025-10-06 | gpt-realtime-mini, 2025-12-15 | o3-deep-research, 2025-06-26 | gpt-realtime, 2025-08-28 | gpt-audio, 2025-08-28 | gpt-4o-transcribe-diarize, 2025-10-15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| australiaeast | - | - | - | - | ✅ | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| brazilsouth | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| canadacentral | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | ✅ | - | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| canadaeast | - | - | - | - | ✅ | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| centralus | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | - | - | ✅ | ✅ | - | ✅ | ✅ | - |
| eastus | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| eastus2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ |
| francecentral | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| germanywestcentral | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| italynorth | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| japaneast | - | - | - | - | ✅ | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| koreacentral | - | - | - | - | ✅ | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| northcentralus | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| norwayeast | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | ✅ | - | - | - |
| polandcentral | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | ✅ | - | - | ✅ | - | - | - | - | - | - | - |
| southafricanorth | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| southcentralus | - | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| southeastasia | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | - | - | - | ✅ | - | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| southindia | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| spaincentral | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| swedencentral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | - | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ |
| switzerlandnorth | - | - | - | - | ✅ | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| switzerlandwest | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | ✅ | - | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| uaenorth | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | ✅ | - | - | ✅ | - | - | - | - | - | - | - |
| uksouth | - | - | - | - | ✅ | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| westeurope | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| westus | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | - | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | ✅ | - | - | - |
| westus3 | - | - | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | - | - | - | - | - | - | - | - | - | ✅ | - | - | ✅ | - | - | - | - | - | - | - |
o3-deep-research is currently only available with Foundry Agent Service. To learn more, see the Deep Research tool guidance.

Embeddings models
These models can be used only with Embedding API requests. text-embedding-3-large is the latest and most capable embedding model. You can’t upgrade between embedding models. To migrate from text-embedding-ada-002 to text-embedding-3-large, you need to generate new embeddings.

| Model ID | Max request (tokens) | Output dimensions | Training data (up to) |
|---|---|---|---|
| text-embedding-ada-002 (version 2) | 8,192 | 1,536 | Sep 2021 |
| text-embedding-ada-002 (version 1) | 2,046 | 1,536 | Sep 2021 |
| text-embedding-3-large | 8,192 | 3,072 | Sep 2021 |
| text-embedding-3-small | 8,192 | 1,536 | Sep 2021 |
When you send an array of inputs for embedding, the maximum number of input items in the array per call to the embedding endpoint is 2,048.
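The 2,048-item array limit can be enforced client-side before calling the endpoint. A minimal sketch (a pure helper of our own; the surrounding embeddings API call is omitted):

```python
def batch_inputs(texts, max_items=2048):
    """Split a list of input strings into chunks that respect the
    embeddings endpoint's 2,048-items-per-call array limit."""
    return [texts[i:i + max_items] for i in range(0, len(texts), max_items)]

# Each resulting batch can then be sent as one call to the embeddings endpoint.
```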
Image generation models
| Model ID | Max request (characters) |
|---|---|
| gpt-image-1 | 4,000 |
| gpt-image-1-mini | 4,000 |
| gpt-image-1.5 | 4,000 |
| dall-e-3 | 4,000 |
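Because every image model listed above shares the 4,000-character prompt cap, a client-side guard can fail fast before a request is sent. A hedged sketch (the helper name is illustrative, not part of any SDK):

```python
MAX_PROMPT_CHARS = 4000  # limit shared by the image generation models above

def validate_image_prompt(prompt: str) -> str:
    """Raise if a prompt exceeds the documented 4,000-character limit."""
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompt is {len(prompt)} characters; the maximum is {MAX_PROMPT_CHARS}."
        )
    return prompt
```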
Video generation models
| Model ID | Max request (characters) |
|---|---|
| sora | 4,000 |
Fine-tuning models
The supported regions for fine-tuning might vary if you use Azure OpenAI models in a Microsoft Foundry project versus outside a project.
| Model ID | Standard training regions | Global training | Developer tier | Max request (tokens) | Training data (up to) | Modality |
|---|---|---|---|---|---|---|
| gpt-4o-mini (2024-07-18) | North Central US<br>Sweden Central | ✅ | ✅ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | Oct 2023 | Text to text |
| gpt-4o (2024-08-06) | East US2<br>North Central US<br>Sweden Central | ✅ | ✅ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | Oct 2023 | Text and vision to text |
| gpt-4.1 (2025-04-14) | North Central US<br>Sweden Central | ✅ | ✅ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | May 2024 | Text and vision to text |
| gpt-4.1-mini (2025-04-14) | North Central US<br>Sweden Central | ✅ | ✅ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | May 2024 | Text to text |
| gpt-4.1-nano (2025-04-14) | North Central US<br>Sweden Central | ✅ | ✅ | Input: 128,000<br>Output: 16,384<br>Training example context length: 32,768 | May 2024 | Text to text |
| o4-mini (2025-04-16) | East US2<br>Sweden Central | ✅ | ❌ | Input: 128,000<br>Output: 16,384<br>Training example context length: 65,536 | May 2024 | Text to text |
| Ministral-3B (preview) (2411) | Not supported | ✅ | ❌ | Input: 128,000<br>Output: Unknown<br>Training example context length: Unknown | Unknown | Text to text |
| Qwen-32B (preview) | Not supported | ✅ | ❌ | Input: 8,000<br>Output: 32,000<br>Training example context length: 8,192 | July 2024 | Text to text |
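Training data for supervised fine-tuning of the chat models above is supplied as a JSONL file, one chat example per line. A minimal sketch of serializing one example (the exact accepted fields are defined by the fine-tuning API; treat this shape as illustrative):

```python
import json

def chat_training_line(system, user, assistant):
    """Serialize one supervised fine-tuning example as a JSONL line
    in the chat-messages format used by chat-completion models."""
    example = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }
    return json.dumps(example)

# Writing many such lines to a .jsonl file produces the training file;
# keep each example within the model's training example context length.
```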
Global training provides more affordable training per token, but doesn’t offer data residency. It’s currently available to Foundry resources in the following regions:
- Australia East
- Brazil South
- Canada Central
- Canada East
- East US
- East US2
- France Central
- Germany West Central
- Italy North
- Japan East (no vision support)
- Korea Central
- North Central US
- Norway East
- Poland Central (no 4.1-nano support)
- Southeast Asia
- South Africa North
- South Central US
- South India
- Spain Central
- Sweden Central
- Switzerland West
- Switzerland North
- UK South
- West Europe
- West US
- West US3
Assistants (preview)
For Assistants, you need a combination of a supported model and a supported region. Certain tools and capabilities require the latest models. The following models are available in the Assistants API, SDK, and Foundry. The table below covers standard deployment; for provisioned throughput unit availability, see Provisioned throughput. The listed models and regions can be used with both Assistants v1 and v2. You can use Global Standard models if they’re supported in the following regions.

| Region | gpt-4o, 2024-05-13 | gpt-4o, 2024-08-06 | gpt-4o-mini, 2024-07-18 | gpt-4, 0613 | gpt-4, 1106-Preview | gpt-4, 0125-Preview | gpt-4, turbo-2024-04-09 | gpt-4-32k, 0613 | gpt-35-turbo, 0613 | gpt-35-turbo, 1106 | gpt-35-turbo, 0125 | gpt-35-turbo-16k, 0613 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| australiaeast | - | - | - | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | ✅ | ✅ |
| eastus | ✅ | ✅ | ✅ | - | - | ✅ | ✅ | - | ✅ | - | ✅ | ✅ |
| eastus2 | ✅ | ✅ | ✅ | - | ✅ | - | ✅ | - | ✅ | - | ✅ | ✅ |
| francecentral | - | - | - | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | - | ✅ |
| japaneast | - | - | - | - | - | - | - | - | ✅ | - | ✅ | ✅ |
| norwayeast | - | - | - | - | ✅ | - | - | - | - | - | - | - |
| southindia | - | - | - | - | ✅ | - | - | - | - | ✅ | ✅ | - |
| swedencentral | ✅ | ✅ | ✅ | ✅ | ✅ | - | ✅ | ✅ | ✅ | ✅ | - | ✅ |
| uksouth | - | - | - | - | ✅ | ✅ | - | - | ✅ | ✅ | ✅ | ✅ |
| westus | ✅ | ✅ | ✅ | - | ✅ | - | ✅ | - | - | ✅ | ✅ | - |
| westus3 | ✅ | ✅ | ✅ | - | ✅ | - | ✅ | - | - | - | ✅ | - |
Model retirement
For the latest information on model retirements, refer to the model retirement guide.

Related content
- Foundry Models from partners and community
- Model retirement and deprecation
- Learn more about working with Azure OpenAI models
- Learn more about Azure OpenAI
- Learn more about fine-tuning Azure OpenAI models
Foundry Models sold directly by Azure also include all Azure OpenAI models. To learn about these models, switch to the Azure OpenAI models collection at the top of this article.
Black Forest Labs models sold directly by Azure
The Black Forest Labs (BFL) collection of image generation models includes FLUX.2 [pro] for image generation and editing through both text and image prompts, FLUX.1 Kontext [pro] for in-context generation and editing, and FLUX1.1 [pro] for text-to-image generation. You can run these models through the BFL service provider API and through the images/generations and images/edits endpoints.

See the GitHub sample for image generation with FLUX models in Microsoft Foundry and its associated notebook, which shows how to create high-quality images from text prompts.
| Model | Type & API endpoint | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| FLUX.2-pro | Image generation - BFL service provider API: <resource-name>/providers/blackforestlabs/v1/flux-2-pro | - Input: text and image (32,000 tokens and up to 8 images) - Output: One image - Tool calling: No - Response formats: Image (PNG and JPG) - Key features: Multi-reference support for up to 8 images; more grounded in real-world knowledge; greater output flexibility; enhanced performance - Additional parameters: (In provider-specific API only) Supports all parameters. | - Global standard (all regions) |
| FLUX.1-Kontext-pro | Image generation - Image API: https://<resource-name>/openai/deployments/{deployment-id}/images/generations and https://<resource-name>/openai/deployments/{deployment-id}/images/edits - BFL service provider API: <resource-name>/providers/blackforestlabs/v1/flux-kontext-pro?api-version=preview | - Input: text and image (5,000 tokens and 1 image) - Output: One image - Tool calling: No - Response formats: Image (PNG and JPG) - Key features: Character consistency, advanced editing - Additional parameters: (In provider-specific API only) seed, aspect ratio, input_image, prompt_unsampling, safety_tolerance, output_format | - Global standard (all regions) |
| FLUX-1.1-pro | Image generation - Image API: https://<resource-name>/openai/deployments/{deployment-id}/images/generations - BFL service provider API: <resource-name>/providers/blackforestlabs/v1/flux-pro-1.1?api-version=preview | - Input: text (5,000 tokens and 1 image) - Output: One image - Tool calling: No - Response formats: Image (PNG and JPG) - Key features: Fast inference speed, strong prompt adherence, competitive pricing, scalable generation - Additional parameters: (In provider-specific API only) width, height, prompt_unsampling, seed, safety_tolerance, output_format | - Global standard (all regions) |
Code samples for FLUX.2 [pro]

Image generation
- Input: Text
- Output: One image

Image editing
- Input: Up to eight Base64-encoded images
- Output: One image
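The eight-image ceiling for FLUX.2 [pro] editing can be checked before a request is built. A sketch, assuming images arrive as raw bytes and are sent Base64-encoded (the function name and payload shape are illustrative, not the BFL API):

```python
import base64

MAX_REFERENCE_IMAGES = 8  # FLUX.2-pro accepts up to 8 input images

def encode_reference_images(image_bytes_list):
    """Base64-encode reference images for a multi-image request,
    enforcing the documented 8-image ceiling."""
    if len(image_bytes_list) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"At most {MAX_REFERENCE_IMAGES} reference images are supported."
        )
    return [base64.b64encode(b).decode("ascii") for b in image_bytes_list]
```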
Cohere models sold directly by Azure
The Cohere family includes models optimized for a range of use cases, including chat completion, rerank/text classification, and embeddings, with strengths in reasoning, summarization, and question answering.

| Model | Type | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| Cohere-rerank-v4.0-pro | text classification (rerank) | - Input: text - Output: text - Languages: en, fr, es, it, de, pt-br, ja, zh-cn, ar, vi, hi, ru, id, and nl - Tool calling: No - Response formats: JSON | - Global standard (all regions) - Managed compute |
| Cohere-rerank-v4.0-fast | text classification (rerank) | - Input: text - Output: text - Languages: en, fr, es, it, de, pt-br, ja, zh-cn, ar, vi, hi, ru, id, and nl - Tool calling: No - Response formats: JSON | - Global standard (all regions) - Managed compute |
| Cohere-command-a | chat-completion | - Input: text (131,072 tokens) - Output: text (8,192 tokens) - Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar - Tool calling: Yes - Response formats: Text, JSON | - Global standard (all regions) |
| embed-v-4-0 | embeddings | - Input: text (512 tokens) and images (up to 2 million pixels) - Output: Vector (256, 512, 1024, 1536 dim.) - Languages: en, fr, es, it, de, pt-br, ja, ko, zh-cn, and ar | - Global standard (all regions) |
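Vectors from an embedding model such as embed-v-4-0 are typically compared with cosine similarity. A dependency-free sketch:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```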
DeepSeek models sold directly by Azure
The DeepSeek family includes several reasoning models that excel at step-by-step problem solving across language, scientific reasoning, and coding tasks.

| Model | Type | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| DeepSeek-V3.2-Speciale | chat-completion (with reasoning content) | - Input: text (128,000 tokens) - Output: text (128,000 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text, JSON | - Global standard (all regions) |
| DeepSeek-V3.2 | chat-completion (with reasoning content) | - Input: text (128,000 tokens) - Output: text (128,000 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text, JSON | - Global standard (all regions) |
| DeepSeek-V3.1 | chat-completion (with reasoning content) | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en and zh - Tool calling: Yes - Response formats: Text, JSON | - Global standard (all regions) |
| DeepSeek-R1-0528 | chat-completion (with reasoning content) | - Input: text (163,840 tokens) - Output: text (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text | - Global standard (all regions) - Global provisioned (all regions) |
| DeepSeek-V3-0324 | chat-completion | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en and zh - Tool calling: Yes - Response formats: Text, JSON | - Global standard (all regions) - Global provisioned (all regions) |
| DeepSeek-R1 | chat-completion (with reasoning content) | - Input: text (163,840 tokens) - Output: text (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text | - Global standard (all regions) - Global provisioned (all regions) |
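Some DeepSeek reasoning deployments return the chain of thought inline, wrapped in `<think>` tags, rather than in a separate response field; whether this applies depends on the API surface you call. A sketch of separating such output, assuming that tag convention:

```python
import re

def split_reasoning(text):
    """Separate <think>...</think> reasoning from the final answer.
    Assumes the inline-tag convention, which is not guaranteed for
    every endpoint or model version."""
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer
```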
Meta models sold directly by Azure
Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta models range in scale to include:

- Small language models (SLMs) like 1B and 3B Base and Instruct models for on-device and edge inferencing
- Mid-size large language models (LLMs) like 7B, 8B, and 70B Base and Instruct models
- High-performance models like Meta Llama 3.1-405B Instruct for synthetic data generation and distillation use cases.
| Model | Type | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| Llama-4-Maverick-17B-128E-Instruct-FP8 | chat-completion | - Input: text and images (1M tokens) - Output: text (1M tokens) - Languages: ar, en, fr, de, hi, id, it, pt, es, tl, th, and vi - Tool calling: No - Response formats: Text | - Global standard (all regions) |
| Llama-3.3-70B-Instruct | chat-completion | - Input: text (128,000 tokens) - Output: text (8,192 tokens) - Languages: en, de, fr, it, pt, hi, es, and th - Tool calling: No - Response formats: Text | - Global standard (all regions) |
Microsoft models sold directly by Azure
Microsoft models include various model groups such as Model Router, MAI models, Phi models, healthcare AI models, and more. See the Microsoft model collection in the Foundry portal. You can also find several Microsoft models available from partners and community.

| Model | Type | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| model-router1 | chat-completion | More details in Model router overview. - Input: text, image - Output: text (max output tokens varies2) Context window: 200,0003 - Languages: en | - Global standard (East US 2, Sweden Central) - Data Zone standard4 (East US 2, Sweden Central) |
| MAI-DS-R1 | chat-completion (with reasoning content) | - Input: text (163,840 tokens) - Output: text (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text | - Global standard (all regions) |
1 The current version of model-router is 2025-11-18. Earlier versions (2025-08-07 and 2025-05-19) are also available.
2 Max output tokens varies for underlying models in the model router. For example, 32,768 (GPT-4.1 series), 100,000 (o4-mini), 128,000 (gpt-5 reasoning models), and 16,384 (gpt-5-chat).
3 Only some of the underlying models support the larger context window. An API call with a larger context succeeds only if the prompt is routed to one of those models; otherwise, the call fails.
4 Billing for Data Zone Standard model router deployments begins no earlier than November 1, 2025.
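Footnote 2 implies callers should size `max_output_tokens` to whichever model the router selects. A minimal lookup sketch (the keys and fallback policy here are illustrative groupings based on the footnote, not the router's actual model identifiers):

```python
# Illustrative limits from the model-router footnote above; the router's
# real underlying model set and limits can change between versions.
MAX_OUTPUT_TOKENS = {
    "gpt-4.1": 32768,        # GPT-4.1 series
    "o4-mini": 100000,
    "gpt-5-reasoning": 128000,
    "gpt-5-chat": 16384,
}

def output_budget(routed_model: str) -> int:
    """Return a conservative max output token budget for the routed model,
    falling back to the smallest known limit for unrecognized models."""
    return MAX_OUTPUT_TOKENS.get(routed_model, min(MAX_OUTPUT_TOKENS.values()))
```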
Mistral models sold directly by Azure
| Model | Type | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| Mistral-Large-3 | chat-completion | - Input: text, image - Output: text - Languages: en, fr, de, es, it, pt, nl, zh, ja, ko, and ar - Tool calling: Yes - Response formats: Text, JSON | - Global standard (West US 3) |
| mistral-document-ai-2505 | Image-to-Text | - Input: image or PDF pages (30 pages, max 30MB PDF file) - Output: text - Languages: en - Tool calling: No - Response formats: Text, JSON, Markdown | - Global standard (all regions) - Data zone standard (US and EU) |
Moonshot AI models sold directly by Azure
Moonshot AI models include Kimi K2.5 and Kimi K2 Thinking. Kimi K2.5 is a multimodal reasoning model that accepts text and image input, while Kimi K2 Thinking is the latest and most capable version of Moonshot AI's open-source thinking model.

| Model | Type | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| Kimi-K2.5 | chat-completion (with reasoning content) | - Input: text and image (262,144 tokens) - Output: text (262,144 tokens) - Languages: en and zh - Tool calling: Yes - Response formats: Text | - Global standard (all regions) |
| Kimi-K2-Thinking | chat-completion (with reasoning content) | - Input: text (262,144 tokens) - Output: text (262,144 tokens) - Languages: en and zh - Tool calling: Yes - Response formats: Text | - Global standard (all regions) |
xAI models sold directly by Azure
xAI’s Grok models in Foundry Models include a diverse set of reasoning and non-reasoning models designed for enterprise use cases such as data extraction, coding, text summarization, and agentic applications. Registration is required for access to grok-code-fast-1 and grok-4.

| Model | Type | Capabilities | Deployment type (region availability) |
|---|---|---|---|
| grok-4 | chat-completion | - Input: text (262,000 tokens) - Output: text (8,192 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | - Global standard (all regions) |
| grok-4-fast-reasoning | chat-completion | - Input: text, image (128,000 tokens) - Output: text (128,000 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | - Global standard (all regions) - Data zone standard (US) |
| grok-4-fast-non-reasoning | chat-completion | - Input: text, image (128,000 tokens) - Output: text (128,000 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | - Global standard (all regions) - Data zone standard (US) |
| grok-code-fast-1 | chat-completion | - Input: text (256,000 tokens) - Output: text (8,192 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | - Global standard (all regions) |
| grok-3 | chat-completion | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | - Global standard (all regions) - Data zone standard (US) |
| grok-3-mini | chat-completion | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en - Tool calling: Yes - Response formats: Text | - Global standard (all regions) - Data zone standard (US) |
Model region availability by deployment type
Foundry Models gives you choices for the hosting structure that fits your business and usage patterns. The service offers two main types of deployment:

- Standard: Has a global deployment option, routing traffic globally to provide higher throughput.
- Provisioned: Also has a global deployment option, allowing you to purchase and deploy provisioned throughput units across Azure global infrastructure.
Global Standard model availability
| Region | DeepSeek-R1-0528 | DeepSeek-R1 | DeepSeek-V3-0324 | DeepSeek-V3.1 | FLUX.1-Kontext-pro | FLUX-1.1-pro | grok-4 | grok-4-fast-reasoning | grok-4-fast-non-reasoning | grok-code-fast-1 | grok-3 | grok-3-mini | Llama-4-Maverick-17B-128E-Instruct-FP8 | Llama-3.3-70B-Instruct | MAI-DS-R1 | mistral-document-ai-2505 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| australiaeast | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| brazilsouth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| canadaeast | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| eastus | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| eastus2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| francecentral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| germanywestcentral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| italynorth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| japaneast | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| koreacentral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| northcentralus | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| norwayeast | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| polandcentral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| southafricanorth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| southcentralus | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| southindia | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| spaincentral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| swedencentral | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| switzerlandnorth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| switzerlandwest | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| uaenorth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| uksouth | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| westeurope | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| westus | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| westus3 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |