Deploy Microsoft Foundry Models in the Foundry portal
This article refers to the Microsoft Foundry (new) portal.
This article explains how to deploy Foundry Models from the model catalog, using `Llama-3.2-90B-Vision-Instruct` for illustration. Models from partners and community require that you subscribe to Azure Marketplace before deployment. Foundry Models sold directly by Azure, such as Azure OpenAI in Foundry Models, don't have this requirement. For more information about Foundry Models, including the regions where they're available for deployment, see Foundry Models sold directly by Azure and Foundry Models from partners and community.
Prerequisites
To complete this article, you need:

- An Azure subscription with a valid payment method. If you don't have an Azure subscription, create a paid Azure account to begin. If you're using GitHub Models, you can upgrade to Foundry Models and create an Azure subscription in the process.
- The Cognitive Services Contributor role or equivalent permissions on the Foundry resource to create and manage deployments. For more information, see Azure RBAC roles.
- A Microsoft Foundry project. This kind of project is managed under a Foundry resource.
- Foundry Models from partners and community require access to Azure Marketplace to create subscriptions. Ensure you have the permissions required to subscribe to model offerings. Foundry Models sold directly by Azure don’t have this requirement.
Deploy a model
Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on. These steps refer to Foundry (new).
- Go to the Model catalog section in the Foundry portal.
- Select a model and review its details in the model card. This article uses `Llama-3.2-90B-Vision-Instruct` for illustration.
- Select Use this model.
- For Foundry Models from partners and community, you need to subscribe to Azure Marketplace. This requirement applies to `Llama-3.2-90B-Vision-Instruct`, for example. Read the terms of use and select Agree and Proceed to accept the terms. For Foundry Models sold directly by Azure, such as the Azure OpenAI model `gpt-4o-mini`, you don't subscribe to Azure Marketplace.
- Configure the deployment settings:
  - By default, the deployment uses the model name. You can modify this name before deploying.
  - During inference, the deployment name is used in the `model` parameter to route requests to this particular deployment.
  - The Foundry portal automatically selects the Foundry resource associated with your project as the Connected AI resource. Select Customize to change the connection if needed. If you're deploying under the Serverless API deployment type, the project and resource must be in one of the supported regions of deployment for the model.

- Select Deploy. The model's deployment details page opens while the deployment is being created.
- When the deployment completes, the model is ready for use. You can also use the Foundry Playgrounds to interactively test the model.
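The role of the deployment name in step two above can be sketched as follows. This is a minimal illustration of the standard chat-completions request body; the deployment name is an assumption based on the default naming (it matches the model name unless you change it).

```python
import json

# Hypothetical deployment name; by default it matches the model name.
deployment_name = "Llama-3.2-90B-Vision-Instruct"

# In a chat-completions request, the "model" field carries the
# deployment name, which routes the request to that deployment.
payload = {
    "model": deployment_name,
    "messages": [{"role": "user", "content": "Summarize this article."}],
}
print(json.dumps(payload, indent=2))
```

If you rename the deployment when you create it, the `model` field must carry that custom name, not the underlying model name.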

- From the Foundry portal homepage, select Discover in the upper-right navigation, then Models in the left pane.
- Select a model and review its details in the model card. This article uses `Llama-3.2-90B-Vision-Instruct` for illustration.
- Select Deploy > Custom settings to customize your deployment. Alternatively, you can use the default deployment settings by selecting Deploy > Default settings.
- For Foundry Models from partners and community, you need to subscribe to Azure Marketplace. This requirement applies to `Llama-3.2-90B-Vision-Instruct`, for example. Read the terms of use and select Agree and Proceed to accept the terms. For Foundry Models sold directly by Azure, such as the Azure OpenAI model `gpt-4o-mini`, you don't subscribe to Azure Marketplace.
- Configure the deployment settings:
  - By default, the deployment uses the model name. You can modify this name before deploying.
  - During inference, the deployment name is used in the `model` parameter to route requests to this particular deployment.
- When the deployment completes, you land on the Foundry Playgrounds, where you can interactively test the model. Your project and resource must be in one of the supported regions of deployment for the model. Verify that the deployment status shows Succeeded in your deployment list.
Manage models
You can manage the existing model deployments in the resource by using the Foundry portal.

- Select Build in the upper-right navigation.
- Select Models in the left pane to see the list of deployments in the resource.
Test the deployment in the playground
You can interact with the new model in the Foundry portal by using the playground. The playground is a web-based interface that lets you interact with the model in real time. Use the playground to test the model with different prompts and see the model's responses.

- From the list of deployments, select the `Llama-3.2-90B-Vision-Instruct` deployment to open the playground page.
- Type your prompt and see the outputs.
- Select the Code tab to see details about how to access the model deployment programmatically.
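As a sketch of the kind of call the Code tab describes, the request below is composed with the Python standard library but not sent. The endpoint host, URL path, and `api-version` value are illustrative assumptions; copy the real values from the Code tab for your deployment.

```python
import json
import urllib.request

# Placeholder values; replace with the endpoint, key, and deployment
# name shown on the Code tab for your deployment.
endpoint = "https://YOUR-RESOURCE.services.ai.azure.com"
deployment = "Llama-3.2-90B-Vision-Instruct"
api_key = "YOUR-API-KEY"

# Assumed chat-completions path and api-version; verify against the
# Code tab before use.
url = f"{endpoint}/models/chat/completions?api-version=2024-05-01-preview"
body = json.dumps({
    "model": deployment,
    "messages": [{"role": "user", "content": "Hello!"}],
}).encode("utf-8")

req = urllib.request.Request(
    url,
    data=body,
    headers={"Content-Type": "application/json", "api-key": api_key},
)
# urllib.request.urlopen(req) would send the request; it's omitted
# here because it needs a live endpoint and a valid key.
print(req.get_method())  # POST, since the request carries a body
```

Authentication details also vary: some deployments use an `api-key` header as sketched here, while others use Microsoft Entra ID bearer tokens.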
Use the model with code
To run inference on the deployed model, see the following examples:

- To use the Responses API with Foundry Models sold directly by Azure, such as Microsoft AI, DeepSeek, and Grok models, see How to generate text responses with Microsoft Foundry Models.
- To use the Responses API with OpenAI models, see Getting started with the responses API.
- To use the Chat completions API with models sold by partners, such as the Llama model deployed in this article, see Model support for chat completions.
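The two APIs listed above take differently shaped request bodies. The sketch below shows minimal bodies for each, following the public OpenAI-style schemas; treat the exact fields accepted by a given deployment as something to confirm in the linked articles.

```python
# Responses API: a single "input" field carries the prompt.
responses_body = {
    "model": "gpt-4o-mini",  # deployment of a model sold directly by Azure
    "input": "Write a haiku about deployment quotas.",
}

# Chat completions API: a "messages" list of role/content pairs.
chat_body = {
    "model": "Llama-3.2-90B-Vision-Instruct",  # partner-model deployment
    "messages": [
        {"role": "user", "content": "Write a haiku about deployment quotas."}
    ],
}

print(sorted(responses_body))  # ['input', 'model']
print(sorted(chat_body))       # ['messages', 'model']
```

In both cases the `model` field holds the deployment name, as described earlier in this article.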
Regional availability and quota limits of a model
For Foundry Models, the default quota varies by model and region. Certain models might only be available in some regions. For more information on availability and quota limits, see Azure OpenAI in Microsoft Foundry Models quotas and limits and Microsoft Foundry Models quotas and limits.

Quota for deploying and running inference on a model
For Foundry Models, deploying and running inference consume quota that Azure assigns to your subscription on a per-region, per-model basis in units of tokens per minute (TPM). When you sign up for Foundry, you receive default quota for most of the available models. You then assign TPM to each deployment as you create it, which reduces the available quota for that model. You can continue to create deployments and assign them TPM until you reach your quota limit. When you reach your quota limit, you can only create new deployments of that model if you:

- Request more quota by submitting a quota increase form.
- Adjust the allocated quota on other model deployments in the Foundry portal to free up tokens for new deployments.
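The quota arithmetic described above can be sketched as a toy accounting exercise. The numbers are invented for illustration; real default quotas vary by model and region.

```python
# Hypothetical regional TPM quota for one model, and the TPM already
# allocated to existing deployments of that model.
regional_quota_tpm = 100_000
existing = {"prod": 60_000, "staging": 30_000}

def remaining_tpm(quota: int, allocations: dict) -> int:
    """TPM still available for new deployments of this model."""
    return quota - sum(allocations.values())

print(remaining_tpm(regional_quota_tpm, existing))  # 10000
# A new deployment requesting 20,000 TPM would exceed the quota, so
# you'd need a quota increase or a reallocation first.
print(remaining_tpm(regional_quota_tpm, existing) >= 20_000)  # False
```

Reducing the TPM assigned to `prod` or `staging` in the portal returns that amount to the pool, which is the reallocation option described above.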
Troubleshooting
| Issue | Resolution |
|---|---|
| Quota exceeded | Request more quota or reallocate TPM from existing deployments. |
| Region not supported | Check regional availability and deploy in a supported region. |
| Marketplace subscription error | Verify you have the required permissions to subscribe to Azure Marketplace offerings. |
| Deployment status shows Failed | Confirm that the model is available in your selected region and that you have sufficient quota. |