Deploy and use Claude models in Microsoft Foundry (preview)

This article refers to the Microsoft Foundry (new) portal.
Anthropic’s Claude models bring advanced conversational AI capabilities to Microsoft Foundry, enabling you to build intelligent applications with state-of-the-art language understanding and generation. Claude models excel at complex reasoning, code generation, and multimodal tasks including image analysis. In this article, you learn how to:
  • Deploy Claude models in Microsoft Foundry
  • Authenticate by using Microsoft Entra ID or API keys
  • Call the Claude Messages API from Python, JavaScript, or REST
  • Choose the right Claude model for your use case
Claude models in Foundry include:
Model family  | Models
Claude Opus   | claude-opus-4-6 (preview), claude-opus-4-5 (preview), claude-opus-4-1 (preview)
Claude Sonnet | claude-sonnet-4-5 (preview)
Claude Haiku  | claude-haiku-4-5 (preview)
To learn more about the individual models, see Available Claude models.
To use Claude models in Microsoft Foundry, you need a paid Azure subscription with a billing account in a country or region where Anthropic offers the models for purchase. The following paid subscription types are currently restricted: Cloud Solution Providers (CSP), sponsored accounts with Azure credits, enterprise accounts in Singapore and South Korea, and Microsoft accounts. For a list of common subscription-related errors, see Common error messages and solutions.

Prerequisites

Deploy Claude models

Claude models in Foundry are available for global standard deployment. To deploy a Claude model, follow the instructions in Deploy Microsoft Foundry Models in the Foundry portal. After deployment, use the Foundry playground to interactively test the model.

Call the Claude Messages API

After you deploy a Claude model, interact with it to generate text responses:
  • Use the Anthropic SDKs and the following Claude APIs:
    • Messages API: Send a structured list of input messages with text or image content. The model generates the next message in the conversation.
    • Token Count API: Count the number of tokens in a message (a short sketch follows this list).
    • Files API: Upload and manage files for use with the Claude API without re-uploading content with each request.
    • Skills API: Create custom skills for Claude AI.
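For example, the Token Count API lets you estimate the size of a request before sending it. The following is a minimal sketch, assuming an API key and the same placeholder base URL and deployment name used in the authentication examples later in this article:

    from anthropic import AnthropicFoundry

    # Placeholder values; see the authentication sections below for details.
    client = AnthropicFoundry(
        api_key="YOUR_API_KEY",
        base_url="https://<resource-name>.services.ai.azure.com/anthropic",
    )

    # Count input tokens without generating a response.
    count = client.messages.count_tokens(
        model="claude-sonnet-4-5",  # your deployment name
        messages=[{"role": "user", "content": "What is the capital of France?"}],
    )
    print(count.input_tokens)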

Send messages with authentication

The following examples show how to send requests to Claude Sonnet 4.5 using Microsoft Entra ID or API key authentication. To work with your deployed model, you need:
  • Your base URL, which is of the form https://<resource name>.services.ai.azure.com/anthropic.
  • Your target URI from your deployment details, which is of the form https://<resource name>.services.ai.azure.com/anthropic/v1/messages.
  • Microsoft Entra ID for keyless authentication or your deployment’s API key for API authentication.
  • The deployment name you chose during deployment creation. This name can be different from the model ID.

Use Microsoft Entra ID authentication

For Messages API endpoints, use your base URL with Microsoft Entra ID authentication.
  1. Install the Azure Identity client library: This library provides DefaultAzureCredential, which simplifies authorization by finding the best available credential in its running environment.
    pip install azure-identity
    
    Set the values of the client ID, tenant ID, and client secret of the Microsoft Entra ID application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.
    export AZURE_CLIENT_ID="<AZURE_CLIENT_ID>"
    export AZURE_TENANT_ID="<AZURE_TENANT_ID>"
    export AZURE_CLIENT_SECRET="<AZURE_CLIENT_SECRET>"
    
  2. Install dependencies: Install the Anthropic SDK by using pip (requires Python 3.8 or later).
    pip install -U "anthropic"
    
  3. Run a basic code sample to complete the following tasks:
    1. Create a client with the Anthropic SDK, using Microsoft Entra ID authentication.
    2. Make a basic call to the Messages API. The call is synchronous.
    from anthropic import AnthropicFoundry
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    
    baseURL = "https://<resource-name>.services.ai.azure.com/anthropic" # Your base URL. Replace <resource-name> with your resource name
    deploymentName = "claude-sonnet-4-5" # Replace with your deployment name
    
    # Create token provider for Entra ID authentication
    tokenProvider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    
    # Create client with Entra ID authentication
    client = AnthropicFoundry(
        azure_ad_token_provider=tokenProvider,
        base_url=baseURL
    )
    
    # Send request
    message = client.messages.create(
        model=deploymentName,
        messages=[
            {"role": "user", "content": "What is the capital/major city of France?"}
        ],
        max_tokens=1024,
    )
    
    print(message.content)
    
    Expected output: A JSON response with the model’s text completion in message.content, such as "The capital/major city of France is Paris." Reference: Anthropic Client SDK, DefaultAzureCredential

Use API key authentication

For Messages API endpoints, use your base URL and API key to authenticate against the service.
  1. Install dependencies: Install the Anthropic SDK by using pip (requires Python 3.8 or later):
    pip install -U "anthropic"
    
  2. Run a basic code sample to complete the following tasks:
    1. Create a client with the Anthropic SDK by passing your API key to the SDK’s configuration. This authentication method lets you interact seamlessly with the service.
    2. Make a basic call to the Messages API. The call is synchronous.
    from anthropic import AnthropicFoundry
    
    baseURL = "https://<resource-name>.services.ai.azure.com/anthropic" # Your base URL. Replace <resource-name> with your resource name
    deploymentName = "claude-sonnet-4-5" # Replace with your deployment name
    apiKey = "YOUR_API_KEY" # Replace YOUR_API_KEY with your API key
    
    # Create client with API key authentication
    client = AnthropicFoundry(
        api_key=apiKey,
        base_url=baseURL
    )
    
    # Send request
    message = client.messages.create(
        model=deploymentName,
        messages=[
            {"role": "user", "content": "What is the capital/major city of France?"}
        ],
        max_tokens=1024,
    )
    
    print(message.content)
    
    Expected output: A JSON response with the model’s text completion in message.content, such as "The capital/major city of France is Paris." Reference: Anthropic Client SDK
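You can also stream tokens as they're generated instead of waiting for the complete response. The following is a minimal sketch, assuming the AnthropicFoundry client exposes the same messages.stream helper as the standard Anthropic SDK; client and deploymentName are the variables defined in the sample above:

    # Streaming sketch: print text as it arrives.
    with client.messages.stream(
        model=deploymentName,
        messages=[{"role": "user", "content": "Write a haiku about Paris."}],
        max_tokens=1024,
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)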

Available Claude models

Foundry supports Claude Opus 4.6, Claude Opus 4.5, Claude Opus 4.1, Claude Sonnet 4.5, and Claude Haiku 4.5 models through global standard deployment. These models have key capabilities:
  • Extended thinking: Enhanced reasoning for complex tasks (a short sketch follows below).
  • Image and text input: Strong vision for analyzing charts, graphs, technical diagrams, reports, and other visual assets.
  • Code generation: Advanced code generation, analysis, and debugging.
For more details about the model capabilities, see capabilities of Claude models.
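For example, extended thinking is enabled per request by reserving a token budget for the model's internal reasoning. The following is a minimal sketch, assuming your deployment accepts the same thinking parameter as the Anthropic Messages API and reusing the client and deploymentName variables from the earlier samples; the budget value is illustrative only:

    # Extended thinking sketch: the response interleaves thinking blocks and text blocks.
    message = client.messages.create(
        model=deploymentName,
        max_tokens=16000,  # must be larger than the thinking budget
        thinking={"type": "enabled", "budget_tokens": 8000},  # illustrative budget
        messages=[{"role": "user", "content": "How many primes are there below 1,000?"}],
    )

    for block in message.content:
        if block.type == "thinking":
            print("[thinking]", block.thinking[:200])
        elif block.type == "text":
            print(block.text)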

Claude Opus 4.6 (preview)

Claude Opus 4.6 is the latest version of Anthropic’s most intelligent model, and the world’s best model for coding, enterprise agents, and professional work. With a 1M token context window (beta) and 128K max output, Opus 4.6 is ideal for production code, sophisticated agents, office tasks, financial analysis, cybersecurity, and computer use.

Claude Opus 4.5 (preview)

Claude Opus 4.5 is an industry leader in coding, agents, computer use, and enterprise workflows. With a 200K token context window and 64K max output, Opus 4.5 is ideal for production code, sophisticated agents, office tasks, financial analysis, cybersecurity, and computer use tasks.

Claude Opus 4.1 (preview)

Claude Opus 4.1 is an industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve.

Claude Sonnet 4.5 (preview)

Claude Sonnet 4.5 is a highly capable model designed for building real-world agents and handling complex, long-horizon tasks. It offers a strong balance of speed and cost for high-volume use cases. Sonnet 4.5 also provides advanced accuracy for computer use, enabling developers to direct Claude to use computers the way people do.

Claude Haiku 4.5 (preview)

Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases. It stands out as one of the best coding and agent models, with the right speed and cost to power free products and scaled subagents.

Advanced features and capabilities of Claude models

Claude in Foundry Models supports advanced features and capabilities. Core capabilities enhance Claude’s fundamental abilities for processing, analyzing, and generating content across various formats and use cases. Tools enable Claude to interact with external systems, execute code, and perform automated tasks through various tool interfaces. Some of the Core capabilities that Foundry supports are:
  • Large context window: An extended context window that processes larger documents and longer conversations.
  • Agent skills: Extend Claude’s capabilities with skills.
  • Citations: Ground Claude’s responses in source documents.
  • Context editing: Automatically manage conversation context with configurable strategies.
  • Extended thinking: Enhanced reasoning capabilities for complex tasks.
  • PDF support: Process and analyze text and visual content from PDF documents.
  • Prompt caching: Provide Claude with more background knowledge and example outputs to reduce costs and latency (see the sketch after these lists).
Some of the Tools that Foundry supports are:
  • MCP connector: Connect to remote MCP servers directly from the Messages API without a separate MCP client.
  • Memory: Store and retrieve information across conversations. Build knowledge bases over time, maintain project context, and learn from past interactions.
  • Web fetch: Retrieve full content from specified web pages and PDF documents for in-depth analysis.
For a full list of supported capabilities and tools, see Claude’s features overview.
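As an illustration of prompt caching, you typically mark a large, reused block of context with a cache breakpoint so repeated requests don't pay the full input-token cost for that prefix. The following is a minimal sketch, assuming your deployment accepts the same cache_control field as the Anthropic Messages API; long_reference_document is a placeholder for your own reused context, and client and deploymentName come from the earlier samples:

    # Prompt caching sketch: cache a large system prompt shared across requests.
    long_reference_document = "..."  # placeholder: a large document reused across requests

    message = client.messages.create(
        model=deploymentName,
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": long_reference_document,
                "cache_control": {"type": "ephemeral"},  # mark this prefix as cacheable
            }
        ],
        messages=[{"role": "user", "content": "Summarize the key points of the document."}],
    )
    print(message.content)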

Agent support

API quotas and limits

Currently, only Enterprise and MCA-E subscriptions are eligible for Claude model usage in Foundry.
Claude models in Foundry have the following rate limits, measured in Tokens Per Minute (TPM) and Requests Per Minute (RPM):
Model             | Deployment type | Default RPM | Default TPM | Enterprise and MCA-E RPM | Enterprise and MCA-E TPM
claude-opus-4-6   | Global Standard | 0           | 0           | 2,000                    | 2,000,000
claude-opus-4-5   | Global Standard | 0           | 0           | 2,000                    | 2,000,000
claude-opus-4-1   | Global Standard | 0           | 0           | 2,000                    | 2,000,000
claude-sonnet-4-5 | Global Standard | 0           | 0           | 4,000                    | 2,000,000
claude-haiku-4-5  | Global Standard | 0           | 0           | 4,000                    | 4,000,000
To increase your quota beyond the default limits, submit a request through the quota increase request form.

Rate-limit best practices

To optimize your usage and avoid rate limiting:
  • Implement retry logic: Handle 429 responses with exponential backoff (see the sketch after this list).
  • Batch requests: Combine multiple prompts when possible.
  • Monitor usage: Track your token consumption and request patterns.
  • Use appropriate models: Choose the right Claude model for your use case.
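For the retry guidance above, the following is a minimal sketch of exponential backoff with jitter around a Messages API call. It uses the RateLimitError exception from the Anthropic SDK and reuses the client and deploymentName variables from the earlier samples:

    import random
    import time

    import anthropic

    def create_with_retry(client, max_attempts=5, **kwargs):
        # Retry on 429 responses, doubling the delay each attempt and adding jitter.
        for attempt in range(max_attempts):
            try:
                return client.messages.create(**kwargs)
            except anthropic.RateLimitError:
                if attempt == max_attempts - 1:
                    raise
                time.sleep((2 ** attempt) + random.random())

    message = create_with_retry(
        client,
        model=deploymentName,
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        max_tokens=1024,
    )

The SDK also performs a small number of automatic retries by default; you can tune this with the client's max_retries setting if needed.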

Responsible AI considerations

When using Claude models in Foundry, consider these responsible AI practices:

Best practices

Follow these best practices when working with Claude models in Foundry:

Model selection

Choose the appropriate Claude model based on your specific requirements:
  • Claude Opus 4.6: Most intelligent model for building agents, coding, and enterprise workflows.
  • Claude Opus 4.5: Best performance across coding, agents, computer use, and enterprise workflows.
  • Claude Opus 4.1: Complex reasoning and enterprise applications.
  • Claude Sonnet 4.5: Balanced performance and capabilities, production workflows.
  • Claude Haiku 4.5: Speed and cost optimization, high-volume processing.

Prompt engineering

  • Clear instructions: Provide specific and detailed prompts.
  • Context management: Use the available context window effectively.
  • Role definitions: Use the system prompt to define the assistant’s role and behavior (see the sketch after this list).
  • Structured prompts: Use consistent formatting for better results.
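For example, role definitions are passed through the top-level system parameter of the Messages API rather than as a message. A minimal sketch, reusing the client and deploymentName variables from the earlier samples:

    # System prompt sketch: define the assistant's role and output constraints once.
    message = client.messages.create(
        model=deploymentName,
        max_tokens=1024,
        system="You are a concise technical assistant. Answer in at most three sentences.",
        messages=[{"role": "user", "content": "Explain what a context window is."}],
    )
    print(message.content)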

Cost optimization

  • Token management: Monitor and optimize token usage (see the sketch after this list).
  • Model selection: Use the most cost-effective model for your use case.
  • Caching: Implement explicit prompt caching where appropriate.
  • Request batching: Combine multiple requests when possible.
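For token management, every Messages API response includes a usage object you can log and aggregate. A minimal sketch, reusing the client and deploymentName variables from the earlier samples:

    # Token monitoring sketch: read per-request usage from the response.
    message = client.messages.create(
        model=deploymentName,
        messages=[{"role": "user", "content": "Summarize the benefits of prompt caching."}],
        max_tokens=512,
    )
    print(f"input tokens: {message.usage.input_tokens}, output tokens: {message.usage.output_tokens}")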

Troubleshooting

The following table lists common errors when you work with Claude models in Foundry and their solutions:
Error                          | Cause                                                           | Solution
401 Unauthorized               | Invalid or expired API key, or incorrect Entra ID token scope. | Verify your API key is correct. For Entra ID, confirm you use scope https://cognitiveservices.azure.com/.default.
403 Forbidden                  | Insufficient permissions on the resource or subscription.      | Verify you have Contributor or Owner role on the resource group. For Entra ID, ensure the Cognitive Services User role is assigned.
404 Not Found                  | Incorrect endpoint URL or deployment name.                      | Confirm your base URL follows the pattern https://<resource-name>.services.ai.azure.com/anthropic and the deployment name matches your configuration.
429 Too Many Requests          | Rate limit exceeded for your subscription tier.                 | Implement exponential backoff with retry logic. Consider reducing request frequency or requesting a quota increase.
Subscription eligibility error | Non-Enterprise or non-MCA-E subscription.                       | Claude models require an Enterprise or MCA-E subscription. See API quotas and limits for details.
Region not available           | Deployment attempted in an unsupported region.                  | Deploy to East US2 or Sweden Central, the supported regions for Claude models.
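Many of these errors surface as typed exceptions in the Anthropic SDK, so you can branch on them directly. A minimal sketch that maps the common cases in the table above, reusing the client and deploymentName variables from the earlier samples:

    import anthropic

    try:
        message = client.messages.create(
            model=deploymentName,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
            max_tokens=1024,
        )
    except anthropic.AuthenticationError:
        print("401: check your API key or Entra ID token scope.")
    except anthropic.PermissionDeniedError:
        print("403: check role assignments on the resource.")
    except anthropic.NotFoundError:
        print("404: check the base URL and deployment name.")
    except anthropic.RateLimitError:
        print("429: slow down or request a quota increase.")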