Deploy models using Azure CLI and Bicep

This article refers to the Microsoft Foundry (new) portal.

If you’re currently using an Azure AI Inference beta SDK with Microsoft Foundry Models or Azure OpenAI service, we strongly recommend that you transition to the generally available OpenAI/v1 API, which uses an OpenAI stable SDK. For more information on how to migrate to the OpenAI/v1 API by using an SDK in your programming language of choice, see Migrate from Azure AI Inference SDK to OpenAI SDK.

In this article, you learn how to add a new model deployment to a Foundry Models endpoint. The deployment is available for inference in your Foundry resource when you specify the deployment name in your requests.

Prerequisites

To complete this article, you need the following:

An Azure subscription. If you’re using GitHub Models, you can upgrade your experience and create an Azure subscription in the process. For more information, see Upgrade from GitHub Models to Foundry Models.
A Foundry project. This project type is managed under a Foundry resource (formerly known as Azure AI Services resource). If you don’t have a Foundry project, see Create a project for Microsoft Foundry.
Azure role-based access control (RBAC) permissions to create and manage deployments. You need the Cognitive Services Contributor role or equivalent permissions for the Foundry resource.
Foundry Models from partners and community require access to Azure Marketplace. Ensure you have the permissions required to subscribe to model offerings. Foundry Models sold directly by Azure don’t have this requirement.
Install the Azure CLI (version 2.60 or later) and the cognitiveservices extension.
```
az extension add -n cognitiveservices
```
Some commands in this tutorial use the jq tool, which might not be installed on your system. For installation instructions, see Download jq.
Identify the following information:
- Your Azure subscription ID
- Your Foundry resource name
- The resource group where you deployed the Foundry resource
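
The Azure CLI version requirement above can be checked before you start. The following is a minimal sketch, assuming a shell with `sort -V` (GNU coreutils or a recent BSD/macOS). The helper names `version_ge` and `check_az_version` are hypothetical and not part of the Azure CLI.

```shell
# Returns success (0) when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available on this system.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: compares the installed Azure CLI version with the
# minimum version named in the prerequisites (2.60).
check_az_version() {
    local required="2.60.0" installed
    # `az version` prints JSON; --query extracts the "azure-cli" entry as text.
    installed="$(az version --query '"azure-cli"' -o tsv)" || return 1
    if version_ge "$installed" "$required"; then
        echo "Azure CLI $installed meets the minimum version $required."
    else
        echo "Azure CLI $installed is older than $required. Run 'az upgrade'." >&2
        return 1
    fi
}
```

Call `check_az_version` once at the start of a session. It returns a non-zero status when the CLI is too old, so it can also gate a script.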

Add models

To add a model, first identify the model that you want to deploy. Query the available models as follows:

Sign in to your Azure subscription.
```
az login
```
If you have more than one subscription, select the subscription where your resource is located.
```
subscriptionId="<subscription-id>"
az account set --subscription "$subscriptionId"
```
Set the following shell variables to the name of the Foundry resource you plan to use, its resource group, and the region for the resource.
```
accountName="<ai-services-resource-name>"
resourceGroupName="<resource-group>"
location="eastus2"
```

If you haven’t created a Foundry resource yet, create one.

az cognitiveservices account create -n $accountName -g $resourceGroupName --custom-domain $accountName --location $location --kind AIServices --sku S0

Reference: az cognitiveservices account

Check which models are available to you and under which SKU. SKUs, also known as deployment types, define how Azure infrastructure processes requests. Models might offer different deployment types. The following command lists all the model definitions available:
```
az cognitiveservices account list-models \
    -n $accountName \
    -g $resourceGroupName \
| jq '.[] | { name: .name, format: .format, version: .version, sku: .skus[0].name, capacity: .skus[0].capacity.default }'
```
The output includes available models with their properties:
```
{
  "name": "Phi-4-mini-instruct",
  "format": "Microsoft",
  "version": "1",
  "sku": "GlobalStandard",
  "capacity": 1
}
```
Reference: az cognitiveservices account list-models
Identify the model you want to deploy. You need the properties name, format, version, and sku. The property format indicates the provider offering the model. Depending on the type of deployment, you might also need capacity.

Add the model deployment to the resource. The following example adds Phi-4-mini-instruct:

az cognitiveservices account deployment create \
    -n $accountName \
    -g $resourceGroupName \
    --deployment-name Phi-4-mini-instruct \
    --model-name Phi-4-mini-instruct \
    --model-version 1 \
    --model-format Microsoft \
    --sku-capacity 1 \
    --sku-name GlobalStandard

Reference: az cognitiveservices account deployment

Verify the deployment completed successfully:
```
az cognitiveservices account deployment show \
    --deployment-name Phi-4-mini-instruct \
    -n $accountName \
    -g $resourceGroupName \
| jq '.properties.provisioningState'
```
The output should display "Succeeded". The model is ready to use after provisioning completes.
Reference: az cognitiveservices account deployment show
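
Provisioning can take a while, so a single check might report an intermediate state. The following sketch polls the same `deployment show` command until the state settles. It assumes `accountName` and `resourceGroupName` are set as shown earlier, uses the Azure CLI global `--query` and `-o tsv` arguments instead of jq, and treats `Failed` and `Canceled` as terminal states, which is an assumption. The helper name `wait_for_deployment` is hypothetical.

```shell
# Hypothetical helper, not an Azure CLI command.
# Usage: wait_for_deployment <deployment-name> [max-attempts] [delay-seconds]
wait_for_deployment() {
    local name="$1" attempts="${2:-30}" delay="${3:-10}" state i=1
    while [ "$i" -le "$attempts" ]; do
        # Read only the provisioning state as plain text.
        state="$(az cognitiveservices account deployment show \
            --deployment-name "$name" \
            -n "$accountName" \
            -g "$resourceGroupName" \
            --query properties.provisioningState -o tsv)"
        case "$state" in
            Succeeded) echo "Deployment $name succeeded."; return 0 ;;
            # Assumption: these states mean the deployment will not recover.
            Failed|Canceled) echo "Deployment $name ended in state $state." >&2; return 1 ;;
        esac
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Timed out waiting for deployment $name (last state: $state)." >&2
    return 1
}
```

For example, `wait_for_deployment Phi-4-mini-instruct` checks every 10 seconds for up to 30 attempts.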

You can deploy the same model multiple times, as long as each deployment has a different name. This capability is useful if you want to test different configurations for a given model, including content filters.
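
As an illustration of deploying one model under several names, the sketch below repeats the `deployment create` command from this article once per deployment name. The helper name `deploy_variants` and the deployment names in the usage note are hypothetical. The model properties are the ones used in the example above.

```shell
# Hypothetical helper: creates one deployment of Phi-4-mini-instruct per
# deployment name passed as an argument, stopping at the first failure.
deploy_variants() {
    local deploymentName
    for deploymentName in "$@"; do
        az cognitiveservices account deployment create \
            -n "$accountName" \
            -g "$resourceGroupName" \
            --deployment-name "$deploymentName" \
            --model-name Phi-4-mini-instruct \
            --model-version 1 \
            --model-format Microsoft \
            --sku-capacity 1 \
            --sku-name GlobalStandard || return 1
    done
}
```

For example, `deploy_variants phi-4-mini-a phi-4-mini-b` creates two deployments of the same model. Requests then select one of them by passing its deployment name.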

Use the model

This section is identical for both the CLI and Bicep approaches.

You can consume deployed models using the Endpoints for Foundry Models for the resource. When you construct your request, set the model parameter to the name of the deployment you created. You can programmatically get the URI for the inference endpoint by using the following code:

Inference endpoint

az cognitiveservices account show  -n $accountName -g $resourceGroupName | jq '.properties.endpoints["Azure AI Model Inference API"]'

To make requests to the Foundry Models endpoint, append the route models. For example: https://<resource>.services.ai.azure.com/models. See the Azure AI Model Inference API reference for all supported operations.

Inference keys

az cognitiveservices account keys list  -n $accountName -g $resourceGroupName
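
Appending the models route to the endpoint URI can be sketched as a small shell helper. The name `models_url` is hypothetical. The helper only assumes that the endpoint URI might or might not end with a trailing slash.

```shell
# Hypothetical helper: strips one trailing slash (if present) from the
# endpoint URI and appends the "models" route.
models_url() {
    printf '%s/models\n' "${1%/}"
}

# Possible usage with the command from this section (jq -r prints the raw
# string without surrounding quotes):
#   endpoint="$(az cognitiveservices account show -n "$accountName" -g "$resourceGroupName" \
#       | jq -r '.properties.endpoints["Azure AI Model Inference API"]')"
#   models_url "$endpoint"
```

Normalizing the trailing slash avoids producing a URL with a double slash before the route.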

Manage deployments

Run the following command to see all the deployments in the resource:
```
az cognitiveservices account deployment list -n $accountName -g $resourceGroupName
```
Reference: az cognitiveservices account deployment list

You can see the details of a given deployment:

az cognitiveservices account deployment show \
    --deployment-name "Phi-4-mini-instruct" \
    -n $accountName \
    -g $resourceGroupName

Reference: az cognitiveservices account deployment show

You can delete a given deployment as follows:

az cognitiveservices account deployment delete \
    --deployment-name "Phi-4-mini-instruct" \
    -n $accountName \
    -g $resourceGroupName

Reference: az cognitiveservices account deployment delete
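
In cleanup scripts, it can help to confirm that a deployment exists before deleting it. The sketch below combines the `show` and `delete` commands from this section. It assumes `deployment show` exits with a non-zero status when the deployment is missing. The same status is also returned for other failures, such as authorization errors, so treat the "not found" message as a hint rather than a diagnosis. The helper name `delete_if_exists` is hypothetical.

```shell
# Hypothetical helper, not an Azure CLI command.
# Usage: delete_if_exists <deployment-name>
delete_if_exists() {
    local name="$1"
    # Assumption: a non-zero exit status from `show` means the deployment is absent.
    if az cognitiveservices account deployment show \
            --deployment-name "$name" -n "$accountName" -g "$resourceGroupName" \
            >/dev/null 2>&1; then
        az cognitiveservices account deployment delete \
            --deployment-name "$name" -n "$accountName" -g "$resourceGroupName"
    else
        echo "Deployment $name not found; nothing to delete."
    fi
}
```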

To deploy the model with Bicep instead of the Azure CLI commands above, complete the following prerequisites:

Install the Azure CLI.
Identify the following information:
- Your Azure subscription ID
- Your Foundry resource (formerly known as Azure AI Services resource) name
- The resource group where the Foundry resource is deployed
- The model name, provider, version, and SKU you want to deploy. You can use the Foundry portal or the Azure CLI to find this information. In this example, you deploy the following model:
  - Model name: Phi-4-mini-instruct
  - Provider: Microsoft
  - Version: 1
  - Deployment type: Global standard

Set up the environment

The example in this article is based on code samples contained in the Azure-Samples/azureai-model-inference-bicep repository. To run the commands locally without having to copy or paste file content, clone the repository:

git clone https://github.com/Azure-Samples/azureai-model-inference-bicep

The files for this example are in:

cd azureai-model-inference-bicep/infra

Permissions required to subscribe to Models from Partners and Community

Foundry Models from partners and community available for deployment (for example, Cohere models) require Azure Marketplace. Model providers define the license terms and set the price for use of their models using Azure Marketplace. When deploying third-party models, ensure you have the following permissions in your account:

On the Azure subscription:
- Microsoft.MarketplaceOrdering/agreements/offers/plans/read
- Microsoft.MarketplaceOrdering/agreements/offers/plans/sign/action
- Microsoft.MarketplaceOrdering/offerTypes/publishers/offers/plans/agreements/read
- Microsoft.Marketplace/offerTypes/publishers/offers/plans/agreements/read
- Microsoft.SaaS/register/action

On the resource group, to create and use the SaaS resource:
- Microsoft.SaaS/resources/read
- Microsoft.SaaS/resources/write
Add the model

Use the template ai-services-deployment-template.bicep to describe model deployments:
```
// Source: ai-services-deployment-template.bicep (not available)
```

Run the deployment:

RESOURCE_GROUP="<resource-group-name>"
ACCOUNT_NAME="<azure-ai-model-inference-name>" 
MODEL_NAME="Phi-4-mini-instruct"
PROVIDER="Microsoft"
VERSION=1

az deployment group create \
    --resource-group $RESOURCE_GROUP \
    --template-file ai-services-deployment-template.bicep \
    --parameters accountName=$ACCOUNT_NAME modelName=$MODEL_NAME modelVersion=$VERSION modelPublisherFormat=$PROVIDER
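
Before running the deployment, you can preview the changes the template would make. The sketch below wraps `az deployment group what-if` with the same template file and parameters as the command above. It assumes the variables from this section are set and that you run it from the folder that contains the template. The wrapper name `preview_deployment` is hypothetical.

```shell
# Hypothetical wrapper around `az deployment group what-if`, which reports
# the changes a template deployment would make without applying them.
preview_deployment() {
    az deployment group what-if \
        --resource-group "$RESOURCE_GROUP" \
        --template-file ai-services-deployment-template.bicep \
        --parameters accountName="$ACCOUNT_NAME" modelName="$MODEL_NAME" \
                     modelVersion="$VERSION" modelPublisherFormat="$PROVIDER"
}
```

Run `preview_deployment` and review the reported changes, then run `az deployment group create` as shown above.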

Verify the deployment completed successfully:

az cognitiveservices account deployment show \
    --deployment-name $MODEL_NAME \
    -n $ACCOUNT_NAME \
    -g $RESOURCE_GROUP \
| jq '.properties.provisioningState'

The output should display "Succeeded".

Use the model

This section is identical for both the CLI and Bicep approaches. The following commands use the variables set for the Bicep deployment.

Inference endpoint

az cognitiveservices account show -n $ACCOUNT_NAME -g $RESOURCE_GROUP | jq '.properties.endpoints["Azure AI Model Inference API"]'

Inference keys

az cognitiveservices account keys list -n $ACCOUNT_NAME -g $RESOURCE_GROUP

Troubleshooting

| Error | Cause | Resolution |
| --- | --- | --- |
| Quota exceeded | Your subscription reached the deployment quota for the selected SKU or region. | Check your quota in the Foundry portal or request an increase through Azure support. |
| Authorization failed | The identity used doesn’t have the required RBAC role. | Assign the Cognitive Services Contributor role on the Foundry resource. |
| Model not available | The model isn’t available in your region or subscription. | Run `az cognitiveservices account list-models` to check available models and regions. |
| Extension not found | The `cognitiveservices` CLI extension isn’t installed. | Run `az extension add -n cognitiveservices` to install the extension. |
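
Some failures come from missing tooling rather than from Azure itself. The following sketch is a preflight check that the commands used in this article are on your PATH. The helper name `require_tools` is hypothetical.

```shell
# Hypothetical preflight helper: reports every command that is missing from
# PATH and returns non-zero if at least one is absent.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_tools az jq` checks for both tools used in this article. It doesn't check CLI extensions, so install the `cognitiveservices` extension as shown in the table if that error appears.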
