Endpoints for Microsoft Foundry Models

This article refers to the Microsoft Foundry (new) portal.
Microsoft Foundry Models enables you to access the most powerful models from leading model providers through a single endpoint and set of credentials. This capability lets you switch between models and use them in your application without changing any code. This article explains how the Foundry services organize models and how to use the inference endpoint to access them.
If you’re currently using the Azure AI Inference beta SDK with Microsoft Foundry Models or Azure OpenAI service, we strongly recommend that you transition to the generally available OpenAI/v1 API, which uses the stable OpenAI SDK. For more information on how to migrate to the OpenAI/v1 API by using an SDK in your programming language of choice, see Migrate from Azure AI Inference SDK to OpenAI SDK.

Deployments

Foundry uses deployments to make models available. A deployment gives a model a name and sets a specific configuration. You access a model by using its deployment name in your requests. A deployment includes:
  • A model name
  • A model version
  • A provisioning or capacity type¹
  • A content filtering configuration¹
  • A rate limiting configuration¹
¹ These configurations can change depending on the selected model.
A Foundry resource can have many model deployments. You only pay for inference performed on model deployments. Deployments are Azure resources, so they’re subject to Azure policies. For more information about creating deployments, see Add and configure model deployments.
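Because deployments are Azure resources, you can also enumerate them programmatically. The following is a minimal sketch, assuming the azure-mgmt-cognitiveservices management package (an assumption of this example, not something the article requires; install it with pip install azure-mgmt-cognitiveservices), that lists a resource's deployments together with the model and capacity configuration described above. The subscription ID, resource group, and resource name are placeholders:

from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# Placeholders: replace with your subscription ID, resource group, and resource name.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Each deployment pairs a model name and version with its own SKU (capacity type).
for deployment in client.deployments.list("<resource-group>", "<resource>"):
    model = deployment.properties.model
    sku = deployment.sku.name if deployment.sku else "n/a"
    print(deployment.name, model.name, model.version, sku)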

Azure OpenAI inference endpoint

The Azure OpenAI API exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference. You can also access non-OpenAI models through this route. Azure OpenAI endpoints, usually of the form https://<resource-name>.openai.azure.com, work at the deployment level, and each deployment has its own associated URL. However, you can use the same authentication mechanism to consume the deployments. For more information, see the reference page for the Azure OpenAI API.
An illustration showing how Azure OpenAI deployments contain a single URL for each deployment.
Each deployment has a URL that’s formed by concatenating the Azure OpenAI base URL and the route /deployments/<model-deployment-name>.
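To make the URL structure concrete, the following is a minimal sketch that calls a deployment's chat completions route directly over REST by using the requests package. The resource name, deployment name, and API version are placeholders; adjust them to match your deployment:

import os
import requests

# Placeholders: your resource name and your model deployment name.
url = (
    "https://<resource-name>.openai.azure.com"
    "/openai/deployments/<model-deployment-name>/chat/completions"
)

response = requests.post(
    url,
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_INFERENCE_CREDENTIAL"]},
    json={"messages": [{"role": "user", "content": "Hello"}]},
)
print(response.json())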
Install the package openai using your package manager, like pip:
pip install openai --upgrade
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.services.ai.azure.com",
    api_key=os.getenv("AZURE_INFERENCE_CREDENTIAL"),
    api_version="2024-10-21",
)
response = client.chat.completions.create(
    model="deepseek-v3-0324", # Replace with your model deployment name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
    ]
)

print(response.model_dump_json(indent=2))
For more information about how to use the Azure OpenAI endpoint, see Azure OpenAI in Foundry Models documentation.

Keyless authentication

Models deployed to Foundry Models in Foundry Tools support keyless authorization by using Microsoft Entra ID. Keyless authorization enhances security, simplifies the user experience, reduces operational complexity, and provides robust compliance support for modern development, which makes it a strong choice for organizations adopting secure and scalable identity management solutions. To use keyless authentication, configure your resource and grant users access to perform inference. After you configure the resource and grant access, authenticate as follows:
Install the OpenAI SDK using a package manager like pip:
pip install openai
For Microsoft Entra ID authentication, also install:
pip install azure-identity
Use the package to consume the model. The following example shows how to create a client that uses Microsoft Entra ID for chat completions and makes a test call to the chat completions endpoint with your model deployment. Replace <resource> with your Foundry resource name; you can find it in the Azure portal or by running az cognitiveservices account list. Replace DeepSeek-V3.1 with your actual deployment name.
from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), 
    "https://cognitiveservices.azure.com/.default"
)

client = OpenAI(
    base_url="https://<resource>.openai.azure.com/openai/v1/",
    api_key=token_provider,
)

completion = client.chat.completions.create(
    model="DeepSeek-V3.1",  # Required: your deployment name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Azure AI?"}
    ]
)

print(completion.choices[0].message.content)
Expected output
Azure AI is a comprehensive suite of artificial intelligence services and tools from Microsoft that enables developers to build intelligent applications. It includes services for natural language processing, computer vision, speech recognition, and machine learning capabilities.
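Because requests are routed by deployment name, switching models doesn't require code changes beyond the name itself. The following is a minimal sketch that reuses the client from the previous example; the deployment names in the list are hypothetical placeholders for deployments you've created in your resource:

# Reuses `client` from the previous example. The deployment names below are
# placeholders; substitute deployments that exist in your resource.
for deployment_name in ["DeepSeek-V3.1", "gpt-4o"]:
    completion = client.chat.completions.create(
        model=deployment_name,  # Only the deployment name changes.
        messages=[{"role": "user", "content": "Say hello in one sentence."}],
    )
    print(deployment_name, "->", completion.choices[0].message.content)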
Reference: OpenAI Python SDK and DefaultAzureCredential class.