
Deploy Microsoft Foundry Models in the Foundry portal

This article refers to the Microsoft Foundry (new) portal.
In this article, you learn how to use the Foundry portal to deploy a Foundry Model in a Foundry resource for inference. Foundry Models include models such as Azure OpenAI models, Meta Llama models, and more. After you deploy a Foundry Model, you can interact with it in the Foundry Playground and use it from code.

This article uses Llama-3.2-90B-Vision-Instruct, a Foundry Model from partners and community, for illustration. Models from partners and community require that you subscribe to Azure Marketplace before deployment. Foundry Models sold directly by Azure, such as Azure OpenAI in Foundry Models, don't have this requirement. For more information about Foundry Models, including the regions where they're available for deployment, see Foundry Models sold directly by Azure and Foundry Models from partners and community.

Prerequisites

To complete this article, you need:
  • An Azure subscription.
  • A Foundry project with an associated Foundry resource.
  • For Foundry Models from partners and community, permissions to subscribe to Azure Marketplace offerings.

Deploy a model

Sign in to Microsoft Foundry. Make sure the New Foundry toggle is on. These steps refer to Foundry (new).
  1. Go to the Model catalog section in the Foundry portal.
  2. Select a model and review its details in the model card. This article uses Llama-3.2-90B-Vision-Instruct for illustration.
  3. Select Use this model.
  4. For Foundry Models from partners and community, including Llama-3.2-90B-Vision-Instruct, subscribe to Azure Marketplace: read the terms of use and select Agree and Proceed to accept them.
For Foundry Models sold directly by Azure, such as the Azure OpenAI model gpt-4o-mini, you don't subscribe to Azure Marketplace.
  5. Configure the deployment settings:
    • By default, the deployment uses the model name. You can modify this name before deploying.
    • During inference, the deployment name is used in the model parameter to route requests to this particular deployment (see the example in Use the model with code).
Each model supports different deployment types, providing different data residency or throughput guarantees. See deployment types for more details. In this example, the model supports the Global Standard deployment type.
  6. The Foundry portal automatically selects the Foundry resource associated with your project as the Connected AI resource. Select Customize to change the connection if needed. If you're deploying under the Serverless API deployment type, the project and resource must be in one of the supported regions of deployment for the model.
  7. Select Deploy. The model's deployment details page opens while the deployment is being created.
  8. When the deployment completes, the model is ready for use. You can also use the Foundry Playgrounds to interactively test the model.
Alternatively, you can deploy a model starting from the Foundry portal homepage:
  1. From the Foundry portal homepage, select Discover in the upper-right navigation, then select Models in the left pane.
  2. Select a model and review its details in the model card. This article uses Llama-3.2-90B-Vision-Instruct for illustration.
  3. Select Deploy > Custom settings to customize your deployment. Alternatively, use the default deployment settings by selecting Deploy > Default settings.
  4. For Foundry Models from partners and community, including Llama-3.2-90B-Vision-Instruct, subscribe to Azure Marketplace: read the terms of use and select Agree and Proceed to accept them.
For Foundry Models sold directly by Azure, such as the Azure OpenAI model gpt-4o-mini, you don't subscribe to Azure Marketplace.
  5. Configure the deployment settings:
    • By default, the deployment uses the model name. You can modify this name before deploying.
    • During inference, the deployment name is used in the model parameter to route requests to this particular deployment.
Each model supports different deployment types, providing different data residency or throughput guarantees. See deployment types for more details. In this example, the model supports the Global Standard deployment type.
  6. Select Deploy to create your deployment.
  7. When the deployment completes, you land on the Foundry Playgrounds, where you can interactively test the model. Your project and resource must be in one of the supported regions of deployment for the model. Verify that the deployment status shows Succeeded in your deployment list.

Manage models

You can manage existing model deployments in the resource by using the Foundry portal.
  1. Select Build in the upper-right navigation.
  2. Select Models in the left pane to see the list of deployments in the resource.
From a deployment’s detail page, you can view endpoint details and keys, adjust deployment settings, or delete a deployment that you no longer need.
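If you prefer to manage deployments from code, here's a minimal sketch using the azure-identity and azure-mgmt-cognitiveservices Python packages to list the deployments in a resource. The subscription ID, resource group, and resource name are illustrative placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# Placeholders: substitute the values for your own subscription and resource.
subscription_id = "<your-subscription-id>"
resource_group = "<your-resource-group>"
account_name = "<your-foundry-resource>"

client = CognitiveServicesManagementClient(DefaultAzureCredential(), subscription_id)

# List every model deployment in the resource, mirroring the portal's Models page.
for deployment in client.deployments.list(resource_group, account_name):
    print(deployment.name, deployment.properties.provisioning_state)
```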

Test the deployment in the playground

You can interact with the new model in the Foundry portal by using the playground, a web-based interface for working with the model in real time. Use the playground to test the model with different prompts and see its responses.
  1. From the list of deployments, select the Llama-3.2-90B-Vision-Instruct deployment to open the playground page.
  2. Type your prompt and see the outputs.
  3. Select the Code tab to see details about how to access the model deployment programmatically.

Use the model with code

To run inference on the deployed model, see the following examples:
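For instance, here's a minimal Python sketch using the azure-ai-inference package. The environment variable names are illustrative placeholders; you can find your resource's actual endpoint and API key on the deployment's details page.

```python
import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder environment variables; set them to your resource's endpoint and key.
client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_AI_ENDPOINT"],
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

# The model parameter carries the deployment name, which routes the request
# to the deployment created earlier in this article.
response = client.complete(
    model="Llama-3.2-90B-Vision-Instruct",
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize what a model deployment is in one sentence."),
    ],
)
print(response.choices[0].message.content)
```

Because the deployment name is what routes the request, the same client can target any deployment in the resource by changing the model value.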

Regional availability and quota limits of a model

For Foundry Models, the default quota varies by model and region. Certain models might only be available in some regions. For more information on availability and quota limits, see Azure OpenAI in Microsoft Foundry Models quotas and limits and Microsoft Foundry Models quotas and limits.
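If you want to check availability from code rather than the documentation pages, here's a minimal sketch, assuming the azure-mgmt-cognitiveservices package, that lists the models and current quota usage for a region. The region name and subscription ID are illustrative placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

client = CognitiveServicesManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

# Models available for deployment in a given region.
for model in client.models.list("eastus"):
    print(model.model.name, model.model.version)

# Current quota usage and limits for the same region.
for usage in client.usages.list("eastus"):
    print(usage.name.value, usage.current_value, usage.limit)
```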

Quota for deploying and running inference on a model

For Foundry Models, deploying and running inference consume quota that Azure assigns to your subscription on a per-region, per-model basis in units of tokens per minute (TPM). When you sign up for Foundry, you receive default quota for most of the available models. You then assign TPM to each deployment as you create it, which reduces the available quota for that model. You can continue to create deployments and assign them TPM until you reach your quota limit. When you reach your quota limit, you can only create new deployments of that model if you:
  • Request more quota by submitting a quota increase form.
  • Adjust the allocated quota on other model deployments in the Foundry portal, to free up tokens for new deployments.
For more information about quota, see Microsoft Foundry Models quotas and limits and Manage Azure OpenAI quota.
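To see how TPM allocation works in practice, here's a minimal sketch, assuming the azure-mgmt-cognitiveservices package, that creates a deployment with an explicit SKU capacity. The model format, version, SKU name, and capacity value are illustrative placeholders; check the model's details page for the values and capacity units that apply to your model.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

# The sku capacity is the TPM you assign to this deployment; it draws down
# your per-region, per-model quota. Format, version, and capacity are placeholders.
poller = client.deployments.begin_create_or_update(
    resource_group_name="<your-resource-group>",
    account_name="<your-foundry-resource>",
    deployment_name="Llama-3.2-90B-Vision-Instruct",
    deployment=Deployment(
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="<model-format>",
                name="Llama-3.2-90B-Vision-Instruct",
                version="<model-version>",
            ),
        ),
        sku=Sku(name="GlobalStandard", capacity=1),
    ),
)
print(poller.result().properties.provisioning_state)
```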

Troubleshooting

  • Quota exceeded: Request more quota or reallocate TPM from existing deployments.
  • Region not supported: Check regional availability and deploy in a supported region.
  • Marketplace subscription error: Verify that you have the required permissions to subscribe to Azure Marketplace offerings.
  • Deployment status shows Failed: Confirm that the model is available in your selected region and that you have sufficient quota.