Skip to main content

Fireworks models on Microsoft Foundry

Through integration with Fireworks AI, Microsoft Foundry customers can: All of these capabilities are available directly within your Foundry project, with Azure governance, access controls, and project management built in.

Prerequisites

The Foundry RBAC roles were recently renamed. Foundry User, Foundry Owner, Foundry Account Owner, and Foundry Project Manager were previously named Azure AI User, Azure AI Owner, Azure AI Account Owner, and Azure AI Project Manager. You might still see the previous names in some places while the rename rolls out. The role IDs and core permissions are unchanged by the rename.

Region availability

Data Zone Standard deployments of models via Fireworks on Foundry are available in the following Azure regions:
  • East US (eastus)
  • East US 2 (eastus2)
  • Central US (centralus)
  • North Central US (northcentralus)
  • West US (westus)
  • West US 3 (westus3)
Global provisioned throughput deployments of base and custom models are available in all global Azure regions except for Azure Government cloud environments.

Deploy Fireworks models from the Foundry portal

Deploy Fireworks models from the Foundry model catalog. Complete these steps to get a live endpoint for chat completions. Browse available models in the Available catalog models section, or import your own custom model.
  1. From the portal homepage, select Discover in the upper-right navigation.
  2. In the left pane, select Models to open the Model catalog.
  3. Select your desired Fireworks model to view its details on the model page:
Screenshot of Foundry models homepage showing available Fireworks models.
  1. On the model page, select Deploy. For more information on deployment options, see Deploy Foundry Models in the portal.
  2. In the deployment window, configure the following settings:
    • Deployment name: Keep the default name or enter a custom name to identify the deployment.
    • Deployment type: Select Data Zone Standard or Global provisioned throughput. For more information, see Deployment types.
    • Model version settings: Select the model version for the deployment.
    • Tokens per Minute Rate Limit: Set a custom tokens-per-minute limit to manage costs and control usage. The default value is based on the model’s typical performance and cost profile.
    • Guardrails: Select DefaultV2 or Default guardrail configuration. Models use the Microsoft.DefaultV2 guardrail unless a different one is specified. For more information, see Use guardrails to set boundaries on model outputs.
  3. Select Deploy. The deployment process can take up to 30 minutes.
  4. After deployment completes, use the provided endpoint and key to send inference requests to the model. To quickly test the deployment, use the Playground in your Foundry project.
To verify the deployment, navigate to your project’s Deployments page and confirm the deployment Status shows Succeeded.

Available catalog models

The following Fireworks models are available in the Foundry model catalog:
Model providerModel nameModel IDTypeSupported offersDescription
DeepSeekDeepSeek v3.1FW-DeepSeek-v3.1Chat completionsPTUGeneral-purpose open-weight model for chat and reasoning tasks.
DeepSeekDeepSeek V4 ProFW-DeepSeek-V4-ProChat completionsPer-Token and PTUFlagship MoE model for frontier reasoning, coding, and long-context tasks.
GoogleGemma 4 26B A4B ITFW-Gemma-4-26B-A4B-ITChat completionsPTUInstruction-tuned multimodal sparse model for efficient vision and language tasks.
GoogleGemma 4 31B ITFW-Gemma-4-31B-ITChat completionsPTUInstruction-tuned multimodal dense model for vision, chat, and reasoning tasks.
MetaLlama 3.1 8B InstructFW-Llama-3.1-8B-InstructChat completionsPTUCompact instruction-tuned model for cost-efficient chat workloads.
Mistral AIMinistral 3 3B Instruct 2512FW-Ministral-3-3B-Instruct-2512Chat completionsPTUSmall efficient model for lightweight chat and instruction-following tasks.
Moonshot AIKimi K2 Instruct 0905FW-Kimi-K2-Instruct-0905Chat completionsPTUInstruction-tuned model for chat workloads.
Moonshot AIKimi K2 ThinkingFW-Kimi-K2-ThinkingChat completionsPTUReasoning-focused model for multi-step problem solving.
Moonshot AIKimi K2.6FW-Kimi-K2.6Chat completionsPer-Token and PTUNative multimodal agentic model for long-horizon coding and task orchestration.
QwenQwen 3.5 9BFW-Qwen3.5-9BChat completionsPTUCompact model for efficient chat and reasoning.
QwenQwen 3.5 35B A3BFW-Qwen3.5-35B-A3BChat completionsPTUSparse mixture-of-experts model for efficient general-purpose tasks.
QwenQwen 3.5 112B A10BFW-Qwen3.5-112B-A10BChat completionsPTULarge sparse model for complex reasoning and generation tasks.
QwenQwen 3.5 397BFW-Qwen3.5-397BChat completionsPTULarge-scale model for advanced reasoning and generation.
Zhipu AIGLM-4.7FW-GLM-4.7Chat completionsPTUBilingual model for chat and reasoning tasks.
Zhipu AIGLM-5.1FW-GLM-5.1Chat completionsPTUAdvanced bilingual model for chat, reasoning, and code.
All catalog models support the OpenAI/v1 API for Chat Completions API and the Foundry SDK and endpoint for accessing the Responses API.
Fireworks models on Standard (Per-Token) inference offerings are subject to a 15-day notice period prior to model retirement. Plan your deployments accordingly and monitor notifications for upcoming retirement dates.

Custom models (bring your own model)

In addition to the catalog models, Fireworks on Foundry supports importing and deploying your own custom model weights. This BYOM capability lets you run proprietary or fine-tuned open-weight models within the Foundry ecosystem, with inference provided by the optimized Fireworks cloud.

Supported model architectures

Custom models must be based on one of the following supported architectures:
  • Kimi (K2, K2.5, K2.6)
  • GLM (4.7, 4.8)
  • OpenAI (gpt-oss-120b)
  • Qwen (qwen3.5-9B, qwen3.5-35B-A3B, qwen3.5-112B-A10B, qwen3.5-397B)

Limitations

  • Full-weight models only. LoRA and adapter-based models aren’t supported.
  • CLI-first workflow. The import process uses the Azure Developer CLI (azd). The Foundry portal supports registering, viewing, and deploying models after upload.
  • Fireworks Agents and Agent Builder workflows aren’t currently supported.
For step-by-step instructions, see Import custom models into Foundry.

Data privacy

When you use Fireworks on Foundry, data is shared between Microsoft and Fireworks AI, and different compliance and data handling rules will apply. See below for details. Customers are responsible for evaluating whether data sharing between Microsoft and Fireworks is appropriate for their organizations compliance requirements.
  • Fireworks on Foundry is currently excluded from EU Data Boundary commitments.
  • FedRAMP isn’t achieved for Fireworks on Foundry. If your organization requires FedRAMP, before use, consult with your Authorization Official to determine if use of Fireworks on Foundry is allowed.
  • Payment Card Industry (PCI) Data Security Standard (DSS) isn’t applicable to Fireworks on Foundry. You shouldn’t use Fireworks on Foundry to store, process, or transmit payment and cardholder data.

Transparency note

Fireworks on Foundry allows customers to deploy and operate third-party and open-weight AI models using Microsoft Foundry platform services.
  • Microsoft doesn’t develop, train, fine-tune, or evaluate the safety, security, or Responsible AI characteristics of models deployed through Fireworks on Foundry.
  • Microsoft makes no representations regarding the behavior, performance, or risk profile of these models.
  • Customers are solely responsible for assessing the suitability of any model for their intended use, including performing any required safety, compliance, and Responsible AI evaluations, before deploying models in production or customer-facing applications.
Foundry provides the tools and best practices for performing your own risk and safety evaluations of models.

Frequently asked questions

Is Fireworks on Foundry available in Azure for US Government?

No, currently the Fireworks on Foundry service isn’t available for Azure Government cloud users.

How can I get quota for Fireworks model deployments?

Use the quota request form to request added quota for Fireworks on Foundry.

I have a Fireworks AI account. Can I use my existing Fireworks deployments?

No, you need to create new deployments in Foundry. If you’d like to shift consumption to Azure, contact your Fireworks account team to assist.

Can I deploy LoRA or adapter-based models?

No, Fireworks on Foundry supports full-weight custom models only. LoRA and adapter-based models aren’t supported at this time.

How do I import and deploy a custom model?

Custom model import uses a CLI-first workflow with the Azure Developer CLI. For step-by-step instructions, see Import custom models into Foundry.

How is Fireworks on Foundry billed?

Fireworks models deployed through Foundry support both pay-per-token and provisioned throughput offers.

How do I disable Fireworks in my Foundry project?

To stop using Fireworks models, delete the Fireworks model deployments from your Foundry project.

How do I use the Responses API?

The Responses API is supported via the Foundry Projects API and SDK. Make sure to point your client to your project’s API endpoint or use the Foundry SDK.

Troubleshoot Fireworks on Foundry

Use the following guidance to resolve common issues with Fireworks on Foundry.
IssueResolution
Fireworks models don’t appear in the model catalogVerify you’re working in a supported region. Check that the model catalog filters are set to show Fireworks models.
Deployment fails with a quota errorUse the quota request form to request added capacity for Fireworks on Foundry.
”Forbidden” or access denied during deploymentVerify that your identity has the Azure AI Developer role or higher on the Foundry project. Subscription-level roles alone aren’t sufficient for deployment.
Model endpoint returns errors after deploymentConfirm the deployment status shows Succeeded on the project’s Deployments page. Verify you’re using the correct Target URI and Key from the deployment details.
For other queries, see the frequently asked questions section.