Tutorial: Get started with DeepSeek-R1 in Foundry Models

In this tutorial, you learn how to deploy and use a DeepSeek reasoning model in Microsoft Foundry. The tutorial uses DeepSeek-R1 for illustration, but the content also applies to the newer DeepSeek-R1-0528 reasoning model. You deploy the model, send inference requests from code, and parse the reasoning output to understand how the model arrives at its answers. The steps you perform in this tutorial are:

Create and configure the Azure resources to use DeepSeek-R1 in Foundry Models.
Configure the model deployment.
Use DeepSeek-R1 with the next generation v1 Azure OpenAI APIs to consume the model in code.

Prerequisites

To complete this tutorial, you need:

An Azure subscription with a valid payment method. If you don’t have an Azure subscription, create a paid Azure account to begin. If you’re using GitHub Models, you can upgrade from GitHub Models to Microsoft Foundry Models and create an Azure subscription in the process.
Access to Microsoft Foundry with appropriate permissions to create and manage resources. This typically requires the Contributor or Owner role on the resource group for creating resources and deploying models.
The Cognitive Services User role (or higher) assigned to your Azure account on the Foundry resource. This role is required to make inference calls with Microsoft Entra ID. Assign it in the Azure portal under Access Control (IAM) on the Foundry resource.
The OpenAI SDK and the Azure Identity library for your programming language:
- Python: pip install openai azure-identity
- .NET: dotnet add package OpenAI and dotnet add package Azure.Identity
- JavaScript: npm install openai @azure/identity
- Java: Add the com.openai:openai-java and com.azure:azure-identity packages

DeepSeek-R1 is a reasoning model that generates explanations alongside answers. It supports text-based chat completions but doesn’t support tool calling or structured output formats. See About reasoning models for details.

Create the resources

To create a Foundry project that supports deployment for DeepSeek-R1, follow these steps. You can also create the resources using Azure CLI or infrastructure as code, with Bicep.

The project you’re working on appears in the upper-left corner.
To create a new project, select the project name, then Create new project.
Give your project a name and select Create project.

Deploy the model

Add a model to your project. Select Build in the middle of the page, then Model.
Select Deploy base model to open the model catalog.
Find and select the DeepSeek-R1 model tile to open its model card and select Deploy. You can select Quick deploy to use the defaults, or select Customize deployment to see and change the deployment settings.

When the deployment finishes, you land on its playground, where you can start to interact with the deployment. Confirm your deployment is ready by verifying the deployment status shows Succeeded. Note the deployment name and endpoint URI from the deployment details—you need both for the code section. If you prefer to explore the model interactively first, skip to Use the model in the playground.

Use the model in code

Use the Foundry Models endpoint and credentials to connect to the model.

In the playground, select the Details pane at the top of the page to see the deployment’s details. Here, you can find the deployment’s URI and API key.
Get your resource name from the deployment’s URI. You need it to call the model from code.
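
As a rough illustration of the step above, the resource name is the first label of the host name in the deployment’s URI. The following sketch is not part of any SDK: resource_name_from_uri and v1_base_url are hypothetical helper names, the sample URI uses a made-up resource, and the code assumes the URI has the resource name as its first host label, as in the endpoint format used in this tutorial.

```python
from urllib.parse import urlparse


def resource_name_from_uri(deployment_uri: str) -> str:
    """Return the first label of the host name, assumed to be the resource name."""
    host = urlparse(deployment_uri).hostname
    if not host:
        raise ValueError(f"Not a valid URI: {deployment_uri!r}")
    return host.split(".")[0]


def v1_base_url(resource_name: str) -> str:
    """Build the v1 base URL in the format this tutorial uses."""
    return f"https://{resource_name}.openai.azure.com/openai/v1/"


# Made-up URI for illustration only; copy the real one from the Details pane.
name = resource_name_from_uri(
    "https://my-foundry-resource.openai.azure.com/openai/v1/chat/completions"
)
print(v1_base_url(name))
```

Pass the resulting URL as the base URL when you create the client.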

Use the next generation v1 Azure OpenAI APIs to consume the model in your code. These code examples use a secure, keyless authentication approach, Microsoft Entra ID, via the Azure Identity library. The following code examples demonstrate how to:

Authenticate with Microsoft Entra ID using DefaultAzureCredential, which automatically attempts multiple authentication methods (environment variables, managed identity, Azure CLI, and others). The exact order depends on the Azure Identity SDK version you’re using.
Create a chat completion client connected to your model deployment.
Send a basic prompt to the DeepSeek-R1 model.
Receive and display the response.

For local development, ensure you’re authenticated with Azure CLI by running az login. For production deployments in Azure, configure managed identity for your application.

Expected output: A JSON response containing the model’s answer, reasoning process (within <think> tags), token usage statistics (prompt tokens, completion tokens, total tokens), and model information.

The examples that follow are given for Python, JavaScript, C#, Java, and REST, in that order.

Install the packages openai and azure-identity using your package manager, like pip:

pip install --upgrade openai azure-identity

The following example shows how to create a client to consume chat completions and then generate and print out the response:

from openai import OpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://ai.azure.com/.default"
)

client = OpenAI(  
  base_url = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",  
  api_key=token_provider,
)
response = client.chat.completions.create(
  model="DeepSeek-R1", # Replace with your model deployment name.
  messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "How many languages are in the world?"}
  ]
)

#print(response.choices[0].message)
print(response.model_dump_json(indent=2))

Install the openai and @azure/identity packages using npm. You need the Azure Identity client library before you can use DefaultAzureCredential:

npm install openai @azure/identity

To authenticate the OpenAI client, use the getBearerTokenProvider function from the @azure/identity package. This function creates a token provider that OpenAI uses internally to obtain tokens for each request.

The following code creates the token provider, creates a client to consume chat completions, and generates the response:

import { DefaultAzureCredential, getBearerTokenProvider } from "@azure/identity";
import { OpenAI } from "openai";

const tokenProvider = getBearerTokenProvider(
    new DefaultAzureCredential(),
    'https://ai.azure.com/.default');

const client = new OpenAI({
    baseURL: "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/",
    apiKey: tokenProvider
});

const messages = [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'How many languages are in the world?' }
];

// Make the API request with top-level await
const result = await client.chat.completions.create({ 
    messages, 
    model: 'DeepSeek-R1', // Your model deployment name
    max_tokens: 100 // Caps generated tokens, including reasoning; raise it if answers are cut off
});

// Print the full response
console.log('Full response:', result);

// Print just the message content from the response
console.log('Response content:', result.choices[0].message.content);

First install the Azure Identity library before you can use DefaultAzureCredential:

dotnet add package Azure.Identity

Use the credential type you want from the library, for example DefaultAzureCredential. Then create the token policy, create a client to consume chat completions, and generate the response.

using Azure.Identity;
using OpenAI;
using OpenAI.Chat;
using System.ClientModel.Primitives;

#pragma warning disable OPENAI001

BearerTokenPolicy tokenPolicy = new(
    new DefaultAzureCredential(),
    "https://ai.azure.com/.default");

ChatClient client = new(
    model: "DeepSeek-R1", // Replace with your model deployment name.
    authenticationPolicy: tokenPolicy,
    options: new OpenAIClientOptions() { 
        Endpoint = new Uri("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1")
   }
);

ChatCompletion completion = client.CompleteChat("How many languages are in the world?");

Console.WriteLine($"[ASSISTANT]: {completion.Content[0].Text}");

Authentication with Microsoft Entra ID requires some initial setup. Add the Azure Identity package:

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-identity</artifactId>
    <version>1.18.0</version>
</dependency>

After setup, choose which type of credential from azure.identity to use. For example, DefaultAzureCredential finds the best credential to use in its running environment, and you can use it to authenticate the client:

Credential tokenCredential = BearerTokenCredential.create(
        AuthenticationUtil.getBearerTokenSupplier(
                new DefaultAzureCredentialBuilder().build(),
                "https://ai.azure.com/.default"));
OpenAIClient client = OpenAIOkHttpClient.builder()
        .baseUrl("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/")
        .credential(tokenCredential)
        .build();

For more information about Azure OpenAI keyless authentication, see Use Azure OpenAI without keys.

Chat completion:

package com.example;

import com.azure.identity.AuthenticationUtil;
import com.azure.identity.DefaultAzureCredentialBuilder;
import com.openai.client.OpenAIClient;
import com.openai.client.okhttp.OpenAIOkHttpClient;
import com.openai.credential.BearerTokenCredential;
import com.openai.models.chat.completions.ChatCompletion;
import com.openai.models.chat.completions.ChatCompletionCreateParams;

public class OpenAITest {
    public static void main(String[] args) {
        // Base URL of the v1 endpoint. Replace YOUR-RESOURCE-NAME with your resource name.
        String endpoint = "https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1";
        String modelDeploymentName = "DeepSeek-R1"; // Replace with your model deployment name.

        OpenAIClient client = OpenAIOkHttpClient.builder()
                .baseUrl(endpoint)
                // Authenticate with Microsoft Entra ID.
                .credential(BearerTokenCredential.create(AuthenticationUtil.getBearerTokenSupplier(
                        new DefaultAzureCredentialBuilder().build(), "https://ai.azure.com/.default")))
                .build();

        ChatCompletionCreateParams params = ChatCompletionCreateParams.builder()
                .addUserMessage("How many languages are in the world?")
                .model(modelDeploymentName)
                .build();
        ChatCompletion chatCompletion = client.chat().completions().create(params);

        // Print the text of each returned choice.
        chatCompletion.choices().stream()
                .flatMap(choice -> choice.message().content().stream())
                .forEach(System.out::println);
    }
}

curl -X POST https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN" \
  -d '{
      "model": "DeepSeek-R1",
      "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "How many languages are in the world?"
      }
    ]
  }'

To retrieve a response:

curl -X GET https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/chat/completions/{response_id} \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $AZURE_OPENAI_AUTH_TOKEN"

After running the code, you should see a JSON response that includes choices[0].message.content with the model’s answer. If the model generates reasoning, the response contains content wrapped in <think>...</think> tags followed by the final answer.


Reasoning might generate longer responses and consume a larger number of tokens. DeepSeek-R1 supports up to 5,000 requests per minute and 5,000,000 tokens per minute. See the rate limits that apply to DeepSeek-R1 models. Consider having a retry strategy to handle rate limits. You can also request increases to the default limits.
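
One possible retry strategy is exponential backoff with jitter. The following is a minimal sketch rather than a definitive implementation: RateLimited and call_with_backoff are hypothetical names, and the delays are illustrative defaults. In real code you would likely catch the SDK’s own rate-limit exception (for the openai Python package, openai.RateLimitError) instead of the stand-in class. The sleep function is injectable so the logic can be exercised without waiting.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for the SDK's rate-limit error (HTTP 429)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(); on RateLimited, wait with exponential backoff plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a fake request that is rate limited twice and then succeeds.
attempts = {"count": 0}


def fake_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


waits = []
result = call_with_backoff(fake_request, sleep=waits.append)
print(result, len(waits))
```

The openai Python client also accepts a max_retries option, which might be enough for simple cases.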

About reasoning models

Reasoning models can reach higher levels of performance in domains like math, coding, science, strategy, and logistics. These models produce outputs by explicitly using chain of thought to explore possible paths before generating an answer. They verify their answers as they produce them, which helps them arrive at more accurate conclusions. As a result, reasoning models might require fewer context prompts to produce effective results. Reasoning models produce two types of content as outputs:

Reasoning completions
Output completions

Both of these completions count towards content generated from the model. Therefore, they contribute to the token limits and costs associated with the model. Some models, like DeepSeek-R1, might respond with the reasoning content. Others, like o1, output only the completions.
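
To make the accounting concrete, the following sketch estimates the cost of a single request. It assumes that the completion token count reported in the usage statistics already includes the reasoning tokens. The estimate_cost function is a hypothetical helper, and the per-1,000-token prices are made-up placeholders, not actual DeepSeek-R1 pricing. The token counts are the ones from the sample response in this tutorial.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the cost of one request from its usage statistics."""
    input_cost = (prompt_tokens / 1000) * input_price_per_1k
    # Reasoning and answer are both billed as completion (output) tokens.
    output_cost = (completion_tokens / 1000) * output_price_per_1k
    return input_cost + output_cost


prompt_tokens, completion_tokens = 11, 886
total_tokens = prompt_tokens + completion_tokens

# Placeholder prices for illustration only.
cost = estimate_cost(prompt_tokens, completion_tokens,
                     input_price_per_1k=0.001, output_price_per_1k=0.005)
print(total_tokens, round(cost, 6))
```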

Reasoning content

Some reasoning models, like DeepSeek-R1, generate completions and include the reasoning behind them. The reasoning associated with the completion is included in the response’s content within the tags <think> and </think>. The model decides for which scenarios to generate reasoning content. The following example shows how to extract the reasoning content from the response, using Python:

import re

match = re.match(r"<think>(.*?)</think>(.*)", response.choices[0].message.content, re.DOTALL)

print("Response:")
if match:
    print("\tThinking:", match.group(1))
    print("\tAnswer:", match.group(2))
else:
    print("\tAnswer:", response.choices[0].message.content)
print("Model:", response.model)
print("Usage:")
print("\tPrompt tokens:", response.usage.prompt_tokens)
print("\tTotal tokens:", response.usage.total_tokens)
print("\tCompletion tokens:", response.usage.completion_tokens)

Thinking: Okay, the user is asking how many languages exist in the world. I need to provide a clear and accurate answer. Let's start by recalling the general consensus from linguistic sources. I remember that the number often cited is around 7,000, but maybe I should check some reputable organizations.\n\nEthnologue is a well-known resource for language data, and I think they list about 7,000 languages. But wait, do they update their numbers? It might be around 7,100 or so. Also, the exact count can vary because some sources might categorize dialects differently or have more recent data. \n\nAnother thing to consider is language endangerment. Many languages are endangered, with some having only a few speakers left. Organizations like UNESCO track endangered languages, so mentioning that adds context. Also, the distribution isn't even. Some countries or regions have hundreds of languages, like Papua New Guinea with over 800, while others have just a few. \n\nA user might also wonder why the exact number is hard to pin down. It's because the distinction between a language and a dialect can be political or cultural. For example, Mandarin and Cantonese are considered dialects of Chinese by some, but they're mutually unintelligible, so others classify them as separate languages. Also, some regions are under-researched, making it hard to document all languages. \n\nI should also touch on language families. The 7,000 languages are grouped into families like Indo-European, Sino-Tibetan, Niger-Congo, etc. Maybe mention a few of the largest families. But wait, the question is just about the count, not the families. Still, it's good to provide a bit more context. \n\nI need to make sure the information is up-to-date. Let me think – recent estimates still hover around 7,000. However, languages are dying out rapidly, so the number decreases over time. Including that note about endangerment and language extinction rates could be helpful. 
For instance, it's often stated that a language dies every few weeks. \n\nAnother point is sign languages. Does the count include them? Ethnologue includes some, but not all sources might. If the user is including sign languages, that adds more to the count, but I think the 7,000 figure typically refers to spoken languages. For thoroughness, maybe mention that there are also over 300 sign languages. \n\nSummarizing, the answer should state around 7,000, mention Ethnologue's figure, explain why the exact number varies, touch on endangerment, and possibly note sign languages as a separate category. Also, a brief mention of Papua New Guinea as the most linguistically diverse country/region. \n\nWait, let me verify Ethnologue's current number. As of their latest edition (25th, 2022), they list 7,168 living languages. But I should check if that's the case. Some sources might round to 7,000. Also, SIL International publishes Ethnologue, so citing them as reference makes sense. \n\nOther sources, like Glottolog, might have a different count because they use different criteria. Glottolog might list around 7,000 as well, but exact numbers vary. It's important to highlight that the count isn't exact because of differing definitions and ongoing research. \n\nIn conclusion, the approximate number is 7,000, with Ethnologue being a key source, considerations of endangerment, and the challenges in counting due to dialect vs. language distinctions. I should make sure the answer is clear, acknowledges the variability, and provides key points succinctly.

Answer: The exact number of languages in the world is challenging to determine due to differences in definitions (e.g., distinguishing languages from dialects) and ongoing documentation efforts. However, widely cited estimates suggest there are approximately **7,000 languages** globally.
Model: DeepSeek-R1
Usage: 
  Prompt tokens: 11
  Total tokens: 897
  Completion tokens: 886
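
The response content might not always match the exact pattern above. For example, it might start with whitespace, or the closing tag might be missing if the response is cut off by max_tokens. The following sketch shows one way to handle those cases. The split_reasoning function is a hypothetical helper, and the sample strings are invented for illustration.

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) from content that may contain a <think> block."""
    open_tag, close_tag = "<think>", "</think>"
    text = content.lstrip()
    if not text.startswith(open_tag):
        # No reasoning block: the whole content is the answer.
        return "", content.strip()
    body = text[len(open_tag):]
    reasoning, found, answer = body.partition(close_tag)
    if not found:
        # Closing tag missing, for example when the response was truncated.
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()


reasoning, answer = split_reasoning("\n<think>Recall common estimates.</think>\n\nAbout 7,000.")
print("Thinking:", reasoning)
print("Answer:", answer)
```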


Prompt reasoning models

When building prompts for reasoning models, take the following into consideration:

Use simple instructions and avoid using chain-of-thought techniques.
Built-in reasoning capabilities make simple zero-shot prompts as effective as more complex methods.
When providing additional context or documents, like in RAG scenarios, including only the most relevant information might help prevent the model from over-complicating its response.
Reasoning models may support the use of system messages. However, they might not follow them as strictly as non-reasoning models.
When creating multi-turn applications, consider appending only the final answer from the model, without its reasoning content, as explained in the Reasoning content section.

Note that reasoning models can take longer to generate responses. They use long reasoning chains of thought that enable deeper and more structured problem-solving. They also perform self-verification to cross-check their answers and correct their mistakes, thereby showcasing emergent self-reflective behaviors.
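
The multi-turn guidance above can be sketched as follows. This is an illustration under stated assumptions: append_assistant_turn is a hypothetical helper, the sample model output is invented, and the code assumes the reasoning is wrapped in a closed <think>...</think> block.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def append_assistant_turn(history: list, raw_content: str) -> list:
    """Append the assistant's final answer to the history, dropping reasoning content."""
    answer = THINK_BLOCK.sub("", raw_content).strip()
    history.append({"role": "assistant", "content": answer})
    return history


history = [{"role": "user", "content": "How many languages are in the world?"}]
append_assistant_turn(
    history,
    "<think>Ethnologue lists about 7,000.</think>\n\nApproximately 7,000.",
)
history.append({"role": "user", "content": "Which country has the most?"})
print(history)
```

The resulting history can be passed as the messages argument of the next request, which keeps the prompt shorter than resending the full reasoning.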

Parameters

Reasoning models support a subset of the standard chat completion parameters to maintain the integrity of their reasoning process. Supported parameters:

max_tokens - Maximum number of tokens to generate in the response
stop - Sequences where the API stops generating tokens
stream - Enable streaming responses
n - Number of completions to generate

Unsupported parameters (reasoning models don’t support these):

temperature - Fixed to optimize reasoning quality
top_p - Not configurable for reasoning models
presence_penalty - Not available
repetition_penalty - Not available for reasoning models

Example using max_tokens:

response = client.chat.completions.create(
    model="DeepSeek-R1",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    max_tokens=1000  # Limit response length
)

For the complete list of supported parameters, see the Chat completions API reference.
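
If you reuse request settings written for non-reasoning models, one option is to drop the unsupported parameters before calling the API. The following sketch assumes the unsupported list given in this section. UNSUPPORTED and drop_unsupported are hypothetical names, and the sample settings are invented.

```python
# Parameters this section lists as unsupported for reasoning models.
UNSUPPORTED = {"temperature", "top_p", "presence_penalty", "repetition_penalty"}


def drop_unsupported(params: dict) -> tuple[dict, list]:
    """Return (kept parameters, sorted names of dropped parameters)."""
    kept = {key: value for key, value in params.items() if key not in UNSUPPORTED}
    dropped = sorted(key for key in params if key in UNSUPPORTED)
    return kept, dropped


settings = {"max_tokens": 1000, "temperature": 0.2, "top_p": 0.9, "stop": ["\n\n"]}
kept, dropped = drop_unsupported(settings)
print("Kept:", kept)
print("Dropped:", dropped)
```

The kept dictionary can then be unpacked into the request, for example client.chat.completions.create(model=..., messages=..., **kept).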

Use the model in the playground

Use the model in the playground to get an idea of the model’s capabilities. As soon as the deployment completes, you land on the model’s playground, where you can start to interact with the deployment. For example, you can enter your prompts, such as “How many languages are in the world?” in the playground.

Troubleshooting

If you encounter issues while following this tutorial, use the following guidance to resolve common problems.

Authentication errors (401/403)

Ensure you’re signed in to Azure CLI. For local development, run az login before executing your code. DefaultAzureCredential uses your Azure CLI credentials as a fallback when no other credentials are available.
Verify role assignments. Your Azure account needs the Cognitive Services User role (or higher) on the Foundry resource to make inference calls with Microsoft Entra ID. If you haven’t assigned this role yet, see the Prerequisites section.
Check the endpoint format. The endpoint URL must follow the format https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/. Verify the resource name matches your Foundry resource.
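
The endpoint check can be automated with a small validation step before you create the client. This is a sketch only: check_endpoint is a hypothetical helper, and the pattern for allowed resource-name characters (letters, digits, and hyphens) is an assumption rather than a documented rule. The trailing slash is treated as optional because the examples in this tutorial use both forms.

```python
import re

# Assumed shape of the v1 endpoint used in this tutorial.
ENDPOINT_PATTERN = re.compile(
    r"^https://[a-z0-9][a-z0-9-]*\.openai\.azure\.com/openai/v1/?$",
    re.IGNORECASE,
)


def check_endpoint(url: str) -> list:
    """Return a list of problems found in the endpoint URL (empty if none)."""
    problems = []
    if "YOUR-RESOURCE-NAME" in url:
        problems.append("placeholder resource name not replaced")
    if not ENDPOINT_PATTERN.match(url):
        problems.append("does not match https://<resource>.openai.azure.com/openai/v1/")
    return problems


print(check_endpoint("https://my-resource.openai.azure.com/openai/v1/"))
print(check_endpoint("https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/"))
print(check_endpoint("https://my-resource.openai.azure.com/"))
```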

Deployment issues

Deployment name vs. model name. The model parameter in API calls refers to your deployment name, not the model name. If you customized the deployment name during creation, use that name instead of DeepSeek-R1.
Deployment not ready. If you receive a 404 error, verify that the deployment status shows Succeeded in the Foundry portal before making API calls.

Rate limiting (429 errors)

Implement retry logic. Reasoning models generate longer responses that consume more tokens. Use exponential backoff to handle 429 (Too Many Requests) errors.
Monitor token usage. DeepSeek-R1 reasoning content (within <think> tags) counts toward your token limit. See quotas and limits for the current rate limits.
Request quota increases. If you consistently hit rate limits, request increases to the default limits.

Package installation issues

Python. Install both required packages: pip install openai azure-identity. The azure-identity package is required for DefaultAzureCredential.
JavaScript. Install both required packages: npm install openai @azure/identity.
.NET. Install both required packages: dotnet add package OpenAI and dotnet add package Azure.Identity.

What you learned

In this tutorial, you accomplished the following:

Created Foundry resources for hosting AI models
Deployed the DeepSeek-R1 reasoning model
Made authenticated API calls using Microsoft Entra ID
Sent inference requests and received reasoning outputs
Parsed reasoning content from model responses to understand the model’s thought process
