- Create and configure the Azure resources to use DeepSeek-R1 in Foundry Models.
- Configure the model deployment.
- Use DeepSeek-R1 with the next generation v1 Azure OpenAI APIs to consume the model in code.
Prerequisites
To complete this article, you need:- An Azure subscription with a valid payment method. If you don’t have an Azure subscription, create a paid Azure account to begin. If you’re using GitHub Models, you can upgrade from GitHub Models to Microsoft Foundry Models and create an Azure subscription in the process.
- Access to Microsoft Foundry with appropriate permissions to create and manage resources. Typically requires Contributor or Owner role on the resource group for creating resources and deploying models.
- The Cognitive Services User role (or higher) assigned to your Azure account on the Foundry resource. This role is required to make inference calls with Microsoft Entra ID. Assign it in the Azure portal under Access Control (IAM) on the Foundry resource.
-
Install the Azure OpenAI SDK for your programming language:
- Python:
pip install openai azure-identity - .NET:
dotnet add package OpenAIanddotnet add package Azure.Identity - JavaScript:
npm install openai @azure/identity - Java: Add the
com.openai:openai-javaandcom.azure:azure-identitypackages
- Python:
Create the resources
To create a Foundry project that supports deployment for DeepSeek-R1, follow these steps. You can also create the resources using Azure CLI or infrastructure as code, with Bicep.- Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on. These steps refer to Foundry (new).

- The project you’re working on appears in the upper-left corner.
- To create a new project, select the project name, then Create new project.
- Give your project a name and select Create project.
Deploy the model
- Add a model to your project. Select Build in the middle of the page, then Model.
- Select Deploy base model to open the model catalog.
- Find and select the DeepSeek-R1 model tile to open its model card and select Deploy. You can select Quick deploy to use the defaults, or select Customize deployment to see and change the deployment settings.
Use the model in code
Use the Foundry Models endpoint and credentials to connect to the model.- Select the Details pane from the upper pane of the Playgrounds to see the deployment’s details. Here, you can find the deployment’s URI and API key.
- Get your resource name from the deployment’s URI to use for inferencing the model via code.
- Authenticate with Microsoft Entra ID using
DefaultAzureCredential, which automatically attempts multiple authentication methods (environment variables, managed identity, Azure CLI, and others). The exact order depends on the Azure Identity SDK version you’re using.
- Create a chat completion client connected to your model deployment
- Send a basic prompt to the DeepSeek-R1 model
- Receive and display the response
<think> tags), token usage statistics (prompt tokens, completion tokens, total tokens), and model information.
- Python
- JavaScript
- C#
- Java
- REST
Install the packages The following example shows how to create a client to consume chat completions and then generate and print out the response:
openai and azure-identity using your package manager, like pip:- OpenAI Python client
- OpenAI JavaScript client
- OpenAI .NET client
- DefaultAzureCredential class
- Chat completions API reference
- Azure Identity library overview
About reasoning models
Reasoning models can reach higher levels of performance in domains like math, coding, science, strategy, and logistics. The way these models produce outputs is by explicitly using chain of thought to explore all possible paths before generating an answer. They verify their answers as they produce them, which helps to arrive at more accurate conclusions. As a result, reasoning models might require less context prompts in order to produce effective results. Reasoning models produce two types of content as outputs:- Reasoning completions
- Output completions
DeepSeek-R1, might respond with the reasoning content. Others, like o1, output only the completions.
Reasoning content
Some reasoning models, like DeepSeek-R1, generate completions and include the reasoning behind them. The reasoning associated with the completion is included in the response’s content within the tags<think> and </think>. The model can select the scenarios for which to generate reasoning content. The following example shows how to generate the reasoning content, using Python:
Prompt reasoning models
When building prompts for reasoning models, take the following into consideration:- Use simple instructions and avoid using chain-of-thought techniques.
- Built-in reasoning capabilities make simple zero-shot prompts as effective as more complex methods.
- When providing additional context or documents, like in RAG scenarios, including only the most relevant information might help prevent the model from over-complicating its response.
- Reasoning models may support the use of system messages. However, they might not follow them as strictly as other non-reasoning models.
- When creating multi-turn applications, consider appending only the final answer from the model, without it’s reasoning content, as explained in the Reasoning content section. Notice that reasoning models can take longer times to generate responses. They use long reasoning chains of thought that enable deeper and more structured problem-solving. They also perform self-verification to cross-check their answers and correct their mistakes, thereby showcasing emergent self-reflective behaviors.
Parameters
Reasoning models support a subset of the standard chat completion parameters to maintain the integrity of their reasoning process. Supported parameters:max_tokens- Maximum number of tokens to generate in the responsestop- Sequences where the API stops generating tokensstream- Enable streaming responsesn- Number of completions to generate
temperature- Fixed to optimize reasoning qualitytop_p- Not configurable for reasoning modelspresence_penalty- Not availablerepetition_penalty- Not available for reasoning models
max_tokens:
Use the model in the playground
Use the model in the playground to get an idea of the model’s capabilities. As soon as the deployment completes, you land on the model’s playground, where you can start to interact with the deployment. For example, you can enter your prompts, such as “How many languages are in the world?” in the playground.Troubleshooting
If you encounter issues while following this tutorial, use the following guidance to resolve common problems.Authentication errors (401/403)
- Ensure you’re signed in to Azure CLI. For local development, run
az loginbefore executing your code.DefaultAzureCredentialuses your Azure CLI credentials as a fallback when no other credentials are available. - Verify role assignments. Your Azure account needs the Cognitive Services User role (or higher) on the Foundry resource to make inference calls with Microsoft Entra ID. If you haven’t assigned this role yet, see the Prerequisites section.
- Check the endpoint format. The endpoint URL must follow the format
https://YOUR-RESOURCE-NAME.openai.azure.com/openai/v1/. Verify the resource name matches your Foundry resource.
Deployment issues
- Deployment name vs. model name. The
modelparameter in API calls refers to your deployment name, not the model name. If you customized the deployment name during creation, use that name instead ofDeepSeek-R1. - Deployment not ready. If you receive a 404 error, verify that the deployment status shows Succeeded in the Foundry portal before making API calls.
Rate limiting (429 errors)
- Implement retry logic. Reasoning models generate longer responses that consume more tokens. Use exponential backoff to handle 429 (Too Many Requests) errors.
- Monitor token usage. DeepSeek-R1 reasoning content (within
<think>tags) counts toward your token limit. See quotas and limits for the current rate limits. - Request quota increases. If you consistently hit rate limits, request increases to the default limits.
Package installation issues
- Python. Install both required packages:
pip install openai azure-identity. Theazure-identitypackage is required forDefaultAzureCredential. - JavaScript. Install both required packages:
npm install openai @azure/identity. - .NET. Install the Azure Identity package:
dotnet add package Azure.Identity.
What you learned
In this tutorial, you accomplished the following:- Created Foundry resources for hosting AI models
- Deployed the DeepSeek-R1 reasoning model
- Made authenticated API calls using Microsoft Entra ID
- Sent inference requests and received reasoning outputs
- Parsed reasoning content from model responses to understand the model’s thought process