Import custom models with Fireworks

Import and deploy your own model weights on Foundry using the Fireworks inference runtime. In this article, you learn how to import, register, and deploy your own custom model weights in Microsoft Foundry. Custom model import (also known as bring your own weights) lets you run your proprietary or fine-tuned open-weight models within the Foundry ecosystem. You can also import LoRA adapters (preview) and draft models for speculative decoding (preview) when your base model and deployment type support those features. LoRA adapters are lightweight fine-tuning artifacts that modify a base model without uploading a full copy of the model weights. Speculative decoding uses a smaller draft model to propose tokens that a target model verifies, which can reduce generation latency for supported workloads.

This custom model import guide uses the Fireworks on Foundry integration. For an overview of available catalog models, supported architectures, data privacy, and limitations, see Use Fireworks models on Foundry.

Items marked (preview) in this article are currently in preview. This preview is provided without a service-level agreement, and we don’t recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

The import workflow has five steps:

Prepare your model, adapter, or draft model files.
Register the custom model asset in the Foundry portal.
Upload the files using the generated Azure Developer CLI command.
Deploy the model, LoRA adapter (preview), or speculative decoding configuration (preview) to Fireworks inference infrastructure.
Test the deployment in Foundry.

Prerequisites

Before you begin, make sure your Azure environment is set up and that you have the required tools installed. To complete the steps in this article, you need the following resources and permissions:

An Azure subscription. If you don’t have one, create a free account.
A Foundry resource with a Foundry project.
The Cognitive Services Contributor role or equivalent permissions on the Foundry resource to create and manage deployments. For more information, see Azure role based access control.
Azure Developer CLI (azd) installed locally. The import workflow uses azd to upload model weights. To verify your installation, run azd version and azd ai models create --help.
Azure Developer CLI authentication. Before running the generated upload command, sign in with azd auth login.

Region availability

Support for deploying custom models is available in all global Azure regions except for Azure Government cloud environments.

Model requirements

Custom models must match a supported Fireworks model and include the files required for the model type you import. Review the supported models, file requirements, and preview constraints before starting the import process.

Supported architectures

Custom models must be based on one of the following model architectures:

Model Architecture	Versions
DeepSeek	v3.1, V4 Pro
Gemma	4 26B A4B IT, 4 31B IT
GLM	4.7, 5.1
Kimi	K2 Instruct 0905, K2 Thinking, K2.6
Llama	3.1 8B Instruct
Ministral	3 3B Instruct 2512
Qwen	3.5 9B, 3.5 35B A3B, 3.5 112B A10B, 3.5 397B

Required model files

Your model directory must include the files required for the model weight type you’re importing.

File	Full-weight model	LoRA adapter (preview)	Draft model for speculative decoding (preview)
`config.json`	Required	Not required; inherited from the base model	Required
`.safetensors` or `.bin` full weight files	Required	Not applicable	Required
`tokenizer.model`, `tokenizer.json`, or `tokenizer_config.json`	Required	Not required; inherited from the base model	Required
`adapter_config.json`	Not applicable	Required	Not applicable
`adapter_model.bin` or `adapter_model.safetensors`	Not applicable	Required	Not applicable

Import a custom model

The import process starts in the Foundry portal, where you register your model, and then uses the Azure Developer CLI to upload the model weights from your local machine.

Sign in to the Foundry portal.
From the Foundry portal homepage, select Build in the upper-right navigation, then select Models in the left pane.
Select the Models tab.
Select Generate upload command instead.

Configure the following settings:

Setting	Description
Base model	Select the Fireworks base model or architecture that matches your model files. For LoRA adapters, select the base model that the adapter targets. For draft models, select the model family that matches the target model you plan to pair with the draft model.
Model details	Enter the custom model name and version details.
Weight type	Select Full weight model, LoRA adapter (preview), or Draft model (preview). The portal uses this selection to generate the appropriate CLI command and flags.
LoRA settings	For LoRA adapters (preview), configure rank and alpha. Target modules and dropout are optional settings.
Model path	Enter the local path to the folder that contains your model files.

The portal generates an azd command. Copy the command and paste it into a local terminal. Update the --source parameter to point to the directory that contains your model weight files.

Make sure the directory you specify contains all the required model files. Missing files cause the import to fail.

Wait for the upload to complete. Upload time depends on the model size and your network bandwidth. Large models (tens of gigabytes) can take a significant amount of time over standard connections.

Verify model registration

After the upload finishes, confirm that Foundry successfully registered the model before proceeding to deployment.

Return to the Foundry portal and refresh the Custom Models page.
Confirm that your imported custom model appears in the list with a Registered status.
Select your model to review its details, including the architecture and file manifest.

Deploy the imported model

With the model registered, you can deploy it to Fireworks’ cloud for inference.

From the Models list, select your custom model.
Select Deploy.
Configure the deployment:
- Deployment name: provide a deployment name. During inference, this name is used in the model parameter to route requests to this deployment.
- Provisioned throughput units: allocate the number of provisioned throughput units (PTUs) for the deployment. For more information, see Provisioned throughput concepts.
Review and acknowledge the pricing terms.
Select Deploy.

When the deployment completes, the status shows Succeeded in your deployment list.

You can only have one active deployment of the same imported custom model at a time in a given project.

Deploy with speculative decoding (preview)

Speculative decoding pairs a target model with a same-family, architecture-compatible draft model to reduce token generation latency during decoding. It does not improve context processing latency in the prefill phase, so workloads with long inputs and short outputs may see limited benefit. To deploy a draft model, select the registered draft model, and then select Deploy. Configure the deployment name, select a Target model from the same model family as the draft model, set the draft token count, choose the deployment type, configure PTU capacity, review the pricing terms, and deploy.

Deployment examples

Use the following examples to automate parts of the deployment workflow after the custom model is registered. Each example deploys the custom model with 80 units of Global Provisioned throughput. Be sure to replace any placeholders with your details.

    PUT https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group}/providers/Microsoft.CognitiveServices/accounts/{foundry-account}/deployments/{deployment-name}?api-version=2025-06-01
    Authorization: Bearer <access-token>
    Content-Type: application/json

Test your deployment

After the deployment succeeds, verify it works by sending a test request:

Open the Foundry Playground.
Select your custom model deployment from the model list.
Send a test prompt and confirm the model returns a valid response.

Troubleshooting

If you encounter issues during import or deployment, use the following table to identify common problems and resolutions.

Issue	Resolution
Import fails with missing files	Verify your model directory contains all required model files, including `config.json`, weight files, an index file, and tokenizer files.
Architecture mismatch	Confirm the architecture you selected matches your model. See supported architectures.
Upload times out or stalls	Check your network connection and retry. For large models, use a stable high-bandwidth connection.
Deployment fails	Confirm you have sufficient quota and that Fireworks on Foundry is available in your supported region.
Speculative decoding draft model isn’t available	Confirm that the draft model is registered in the same project and that the draft model and target model are in the same model family and architecture-compatible.
Quota exceeded	Request more quota or reallocate provisioned throughput units from existing deployments.

For more troubleshooting guidance, see Troubleshoot Fireworks on Foundry. Explore the following resources to learn more about Fireworks models, deployment options, and authentication on Foundry.

​Import custom models with Fireworks

​Prerequisites

​Region availability

​Model requirements

​Supported architectures

​Required model files

​Import a custom model

​Verify model registration

​Deploy the imported model

​Deploy with speculative decoding (preview)

​Deployment examples

​Test your deployment

​Troubleshooting

​Related content

Import custom models with Fireworks

Prerequisites

Region availability

Model requirements

Supported architectures

Required model files

Import a custom model

Verify model registration

Deploy the imported model

Deploy with speculative decoding (preview)

Deployment examples

Test your deployment

Troubleshooting

Related content