Endpoints for Microsoft Foundry Models

This article refers to the Microsoft Foundry (new) portal.
Microsoft Foundry Models enables you to access the most powerful models from leading model providers through a single endpoint and set of credentials. This capability lets you switch between models and use them in your application without changing any code. This article explains how the Foundry services organize models and how to use the inference endpoint to access them.
If you’re currently using the Azure AI Inference beta SDK with Microsoft Foundry Models or Azure OpenAI service, we strongly recommend that you transition to the generally available OpenAI/v1 API, which uses the stable OpenAI SDK. For more information on how to migrate to the OpenAI/v1 API by using an SDK in your programming language of choice, see Migrate from Azure AI Inference SDK to OpenAI SDK.

Deployments

Foundry uses deployments to make models available. A deployment gives a model a name and sets a specific configuration. You access a model by using its deployment name in your requests. A deployment includes:
  • A model name
  • A model version
  • A provisioning or capacity type¹
  • A content filtering configuration¹
  • A rate limiting configuration¹
¹ These configurations can change depending on the selected model.
A Foundry resource can have many model deployments. You only pay for inference performed on model deployments. Deployments are Azure resources, so they’re subject to Azure policies. For more information about creating deployments, see Add and configure model deployments.
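Because deployments are Azure resources, you can also enumerate them programmatically. The following is a minimal sketch, assuming the azure-mgmt-cognitiveservices management package (an assumption of this example, not something the article requires; install it with pip install azure-mgmt-cognitiveservices), that lists a resource's deployments together with the model and capacity configuration described above. The subscription ID, resource group, and resource name are placeholders:

from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# Placeholders: replace with your subscription ID, resource group, and resource name.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Each deployment pairs a model name and version with its own SKU (capacity type).
for deployment in client.deployments.list("<resource-group>", "<resource>"):
    model = deployment.properties.model
    sku = deployment.sku.name if deployment.sku else "n/a"
    print(deployment.name, model.name, model.version, sku)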

Azure OpenAI inference endpoint

The Azure OpenAI API exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference. You can also access non-OpenAI models through this route. Azure OpenAI endpoints, usually of the form https://<resource-name>.openai.azure.com, work at the deployment level, and each deployment has its own associated URL. However, you can use the same authentication mechanism to consume the deployments. For more information, see the reference page for the Azure OpenAI API.
An illustration showing how Azure OpenAI deployments contain a single URL for each deployment.
Each deployment has a URL that’s formed by concatenating the Azure OpenAI base URL and the route /deployments/<model-deployment-name>.
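To make the URL structure concrete, the following is a minimal sketch that calls a deployment's chat completions route directly over REST by using the requests package. The resource name, deployment name, and API version are placeholders; adjust them to match your deployment:

import os
import requests

# Placeholders: your resource name and your model deployment name.
url = (
    "https://<resource-name>.openai.azure.com"
    "/openai/deployments/<model-deployment-name>/chat/completions"
)

response = requests.post(
    url,
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_INFERENCE_CREDENTIAL"]},
    json={"messages": [{"role": "user", "content": "Hello"}]},
)
print(response.json())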
Install the package openai using your package manager, like pip:
pip install openai --upgrade
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.services.ai.azure.com",
    api_key=os.getenv("AZURE_INFERENCE_CREDENTIAL"),
    api_version="2024-10-21",
)
response = client.chat.completions.create(
    model="deepseek-v3-0324", # Replace with your model deployment name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
    ]
)

print(response.model_dump_json(indent=2))
For more information about how to use the Azure OpenAI endpoint, see Azure OpenAI in Foundry Models documentation.

Keyless authentication

Models deployed to Foundry Models in Foundry Tools support keyless authorization by using Microsoft Entra ID. Keyless authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development, which makes it a strong choice for organizations adopting secure and scalable identity management solutions. To use keyless authentication, configure your resource and grant users access to perform inference. After you configure the resource and grant access, authenticate as follows:
Install the OpenAI SDK using a package manager like pip:
pip install openai
For Microsoft Entra ID authentication, also install:
pip install azure-identity
Use the package to consume the model. The following example shows how to create a client that uses Microsoft Entra ID for chat completions and makes a test call to the chat completions endpoint with your model deployment. Replace <resource> with your Foundry resource name; you can find it in the Azure portal or by running az cognitiveservices account list. Replace DeepSeek-V3.1 with your actual deployment name.
from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), 
    "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(
    base_url="https://<resource>.openai.azure.com/openai/v1/",
    api_key=token_provider,
)

completion = client.chat.completions.create(
    model="DeepSeek-V3.1",  # Required: your deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Azure AI?"}
    ]
)

print(completion.choices[0].message.content)
Expected output
Azure AI is a comprehensive suite of artificial intelligence services and tools from Microsoft that enables developers to build intelligent applications. It includes services for natural language processing, computer vision, speech recognition, and machine learning capabilities.
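Because requests are routed by deployment name, switching models doesn't require code changes beyond the name itself. The following is a minimal sketch that reuses the client from the previous example; the deployment names in the list are hypothetical placeholders for deployments you've created in your resource:

# Reuses `client` from the previous example. The deployment names below are
# placeholders; substitute deployments that exist in your resource.
for deployment_name in ["DeepSeek-V3.1", "gpt-4o"]:
    completion = client.chat.completions.create(
        model=deployment_name,  # Only the deployment name changes.
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(deployment_name, "->", completion.choices[0].message.content)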
Reference: OpenAI Python SDK and DefaultAzureCredential class.