Deploy your fine-tuned model
- Portal
- Python
- REST
- CLI
To deploy models, you need to be assigned the
Foundry Owner role or any role with the Microsoft.CognitiveServices/accounts/deployments/write action.The Foundry RBAC roles were recently renamed. Foundry User, Foundry Owner, Foundry Account Owner, and Foundry Project Manager were previously named Azure AI User, Azure AI Owner, Azure AI Account Owner, and Azure AI Project Manager. You might still see the previous names in some places while the rename rolls out. The role IDs and core permissions are unchanged by the rename.

After you deploy a customized model, if at any time the deployment remains inactive for more than 15 days, the deployment is deleted. The deployment of a customized model is inactive if the model was deployed more than 15 days ago and no chat completions or response API calls were made to it during a continuous 15-day period.The deletion of an inactive deployment doesn’t delete or affect the underlying customized model. The customized model can be redeployed at any time.As described in Azure OpenAI in Microsoft Foundry Models pricing, each customized (fine-tuned) model that’s deployed incurs an hourly hosting cost regardless of whether chat completions or response API calls are made to the model. To learn more about planning and managing costs with Azure OpenAI, see Plan and manage costs for Azure OpenAI.
Use your deployed fine-tuned model
- Portal
- Python
- REST
- CLI
After your custom model deploys, you can use it like any other deployed model. You can use the Playgrounds in the Foundry portal to experiment with your new deployment. You can continue to use the same parameters with your custom model, such as 
temperature and max_tokens, as you can with other deployed models.
Prompt caching
Azure OpenAI fine-tuning supports prompt caching with select models. Prompt caching allows you to reduce overall request latency and cost for longer prompts that have identical content at the beginning of the prompt. To learn more about prompt caching, see getting started with prompt caching.Deployment Types
Azure OpenAI fine-tuning supports the following deployment types.Standard
Standard deployments provide a pay-per-token billing model with data residency confined to the deployed region.| Models | East US2 | North Central US | Sweden Central |
|---|---|---|---|
| o4-mini | ✅ | ✅ | |
| GPT-4.1 | ✅ | ✅ | |
| GPT-4.1-mini | ✅ | ✅ | |
| GPT-4.1-nano | ✅ | ✅ | |
| GPT-4o | ✅ | ✅ | |
| GPT-4o-mini | ✅ | ✅ |
Global Standard
Global standard fine-tuned deployments offer cost savings, but custom model weights may temporarily be stored outside the geography of your Azure OpenAI resource. Global standard deployments are available from all Azure OpenAI regions for the following models:- o4-mini
- GPT-4.1
- GPT-4.1-mini
- GPT-4.1-nano
- GPT-4o
- GPT-4o-mini

Developer Tier
Developer fine-tuned deployments offer a similar experience as Global Standard without an hourly hosting fee, but do not offer an availability SLA. Developer deployments are designed for model candidate evaluation and not for production use. Developer deployments are available from all Azure OpenAI regions for the following models:| Models | Availability |
|---|---|
| o4-mini | All regions |
| GPT-4.1 | All regions |
| GPT-4.1-mini | All regions |
| GPT-4.1-nano | All regions |
Provisioned Throughput
| Models | North Central US | Sweden Central |
|---|---|---|
| GPT-4.1 | ✅ | |
| GPT-4o | ✅ | ✅ |
| GPT-4o-mini | ✅ | ✅ |
Clean up your deployment
To delete a deployment, use the Deployments - Delete REST API and send an HTTP DELETE to the deployment resource. Like with creating deployments, you must include the following parameters:- Azure subscription ID
- Azure resource group name
- Azure OpenAI resource name
- Name of the deployment to delete