Use the computer use tool for agents - Microsoft Foundry Docs

Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don’t recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

The computer use tool comes with significant security and privacy risks, including prompt injection attacks. For more information about intended uses, capabilities, limitations, risks, and considerations when choosing a use case, see the Azure OpenAI transparency note.

Create agents that interpret screenshots and automate UI interactions like clicking, typing, and scrolling. The computer use tool uses the computer-use-preview Foundry model to propose actions based on visual content, enabling agents to interact with desktop and browser applications through their user interfaces. This guide shows how to integrate the computer use tool into an application loop (screenshot → action → screenshot) by using the latest SDKs.

Usage support

The following table shows SDK and setup support.

Microsoft Foundry support	Python SDK	C# SDK	JavaScript SDK	Java SDK	REST API	Basic agent setup	Standard agent setup
✔️	✔️	✔️	✔️	✔️	✔️	✔️	✔️

Prerequisites

An Azure subscription. Create one for free.
A basic or standard agent environment.
The latest SDK package:
- Python: azure-ai-projects
- C#/.NET: Azure.AI.Extensions.OpenAI
- TypeScript: @azure/ai-projects
- Java: azure-ai-agents
Access to the computer-use-preview model. See Request access below.
A virtual machine or sandboxed environment for safe testing. Don’t run on machines with access to sensitive data.

Run the maintained SDK samples (recommended)

The code snippets in this article focus on the agent and Responses API integration. For an end-to-end runnable sample that includes helper code and sample screenshots, use the SDK samples on GitHub.

The SDK samples include helper utilities for screenshot capture, action execution, and image encoding. Clone the repository or copy these files to your project before running the samples.

Request access

To access the computer-use-preview model, you need to register. Microsoft grants access based on eligibility criteria. If you have access to other limited access models, you still need to request access for this model. To request access, see the application form. After Microsoft grants access, you need to create a deployment for the model.

Code samples

Use the computer use tool on virtual machines with no access to sensitive data or critical resources. For more information about the intended uses, capabilities, limitations, risks, and considerations when choosing a use case, see the Azure OpenAI transparency note.

You need the latest SDK package. The .NET SDK is currently in preview.

What you can do with the computer use tool

After you integrate the request-and-response loop (screenshot -> action -> screenshot), the computer use tool can help an agent:

Propose UI actions such as clicking, typing, scrolling, and requesting a new screenshot.
Adapt to UI changes by re-evaluating the latest screenshot after each action.
Work across browser and desktop UI, depending on how you host your sandboxed environment.

The tool doesn’t directly control a device. Your application executes each requested action and returns an updated screenshot.

Differences between browser automation and computer use

The following table lists some of the differences between the computer use tool and browser automation tool.

Feature	Browser Automation	Computer use tool
Model support	All GPT models	`Computer-use-preview` model only
Can I visualize what’s happening?	No	Yes
How it understands the screen	Parses the HTML or XML pages into DOM documents	Raw pixel data from screenshots
How it acts	A list of actions provided by the model	Virtual keyboard and mouse
Is it multistep?	Yes	Yes
Interfaces	Browser	Computer and browser
Do I need to bring my own resource?	Your own Playwright resource with the keys stored as a connection.	No additional resource required but we highly recommend running this tool in a sandboxed environment.

When to use each tool

Choose computer use when you need to:

Interact with desktop applications beyond the browser
Visualize what the agent sees through screenshots
Work in environments where DOM parsing isn’t available

Choose browser automation when you need to:

Perform web-only interactions without limited access requirements
Use any GPT model (not limited to computer-use-preview)
Avoid managing screenshot capture and action execution loops

Regional support

To use the computer use tool, you need a computer use model deployment. The computer use model is available in the following regions:

Region	Status
`eastus2`	Available
`swedencentral`	Available
`southindia`	Available

Understanding the computer use integration

When working with the computer use tool, integrate it into your application by performing the following steps:

Send a request to the model that includes a call to the computer use tool, the display size, and the environment. You can also include a screenshot of the initial state of the environment in the first API request.
Receive a response from the model. If the response has action items, those items contain suggested actions to make progress toward the specified goal. For example, an action might be screenshot so the model can assess the current state with an updated screenshot, or click with X/Y coordinates indicating where the mouse should be moved.
Execute the action by using your application code on your computer or browser environment.
After executing the action, capture the updated state of the environment as a screenshot.
Send a new request with the updated state as a tool_call_output, and repeat this loop until the model stops requesting actions or you decide to stop.

Before using the tool, set up an environment that can capture screenshots and execute the recommended actions by the agent. For safety reasons, use a sandboxed environment, such as Playwright.

Manage conversation history

Use the previous_response_id parameter to link the current request to the previous response. Use this parameter when you don’t want to send the full conversation history with each call. If you don’t use this parameter, make sure to include all the items returned in the response output of the previous request in your inputs array. This requirement includes reasoning items if present.

Safety checks and security considerations

Computer use carries substantial security and privacy risks and user responsibility. Both errors in judgment by the AI and the presence of malicious or confusing instructions on web pages, desktops, or other operating environments that the AI encounters might cause it to execute commands you or others don’t intend. These risks could compromise the security of your or other users’ browsers, computers, and any accounts to which AI has access, including personal, financial, or enterprise systems.Use the computer use tool on virtual machines with no access to sensitive data or critical resources. For more information about the intended uses, capabilities, limitations, risks, and considerations when choosing a use case, see the Azure OpenAI transparency note.

The API has safety checks to help protect against prompt injection and model mistakes. These checks include: Malicious instruction detection: The system evaluates the screenshot image and checks if it contains adversarial content that might change the model’s behavior. Irrelevant domain detection: The system evaluates the current_url parameter (if provided) and checks if the current domain is relevant given the conversation history. Sensitive domain detection: The system checks the current_url parameter (if provided) and raises a warning when it detects the user is on a sensitive domain. If one or more of the preceding checks are triggered, the model raises a safety check when it returns the next computer_call by using the pending_safety_checks parameter.

"output": [ 
    { 
        "type": "reasoning", 
        "id": "rs_67cb...", 
        "summary": [ 
            { 
                "type": "summary_text", 
                "text": "Exploring 'File' menu option." 
            } 
        ] 
    }, 
    { 
        "type": "computer_call", 
        "id": "cu_67cb...", 
        "call_id": "call_nEJ...", 
        "action": { 
            "type": "click", 
            "button": "left", 
            "x": 135, 
            "y": 193 
        }, 
        "pending_safety_checks": [ 
            { 
                "id": "cu_sc_67cb...", 
                "code": "malicious_instructions", 
                "message": "We've detected instructions that may cause your application to perform malicious or unauthorized actions. Please acknowledge this warning if you'd like to proceed." 
            } 
        ], 
        "status": "completed" 
    } 
]

You need to pass the safety checks back as acknowledged_safety_checks in the next request to proceed.

"input":[ 
        { 
            "type": "computer_call_output", 
            "call_id": "<call_id>", 
            "acknowledged_safety_checks": [ 
                { 
                    "id": "<safety_check_id>", 
                    "code": "malicious_instructions", 
                    "message": "We've detected instructions that may cause your application to perform malicious or unauthorized actions. Please acknowledge this warning if you'd like to proceed." 
                } 
            ], 
            "output": { 
                "type": "computer_screenshot", 
                "image_url": "<image_url>" 
            } 
        } 
    ]

Safety check handling

In all cases where pending_safety_checks are returned, hand over actions to the end user to confirm proper model behavior and accuracy. malicious_instructions and irrelevant_domain: End users should review model actions and confirm that the model behaves as intended. sensitive_domain: Ensure an end user actively monitors the model actions on these sites. The exact implementation of this “watch mode” can vary by application, but a potential example could be collecting user impression data on the site to make sure there’s active end user engagement with the application.

Troubleshooting

Issue	Cause	Resolution
You don’t see a `computer_call` in the response.	The agent isn’t configured with the computer use tool, the deployment isn’t a computer use model, or the prompt doesn’t require UI interaction.	Confirm the agent has a `computer_use_preview` tool, your deployment is the `computer-use-preview` model, and your prompt requires a UI action (type, click, or screenshot).
The sample code fails with missing helper files or screenshots.	The snippets reference helper utilities and sample images that aren’t part of this documentation repo.	Run the maintained SDK samples in the “Run the maintained SDK samples” section, or copy the helper file and sample images from the SDK repo into your project.
The loop stops at the iteration limit.	The task needs more turns, or the app isn’t applying the actions the model requests.	Increase the iteration limit, and verify that your code executes the requested action and sends a new screenshot after each turn.
You receive `pending_safety_checks`.	The service detected a potential security risk (for example, prompt injection or a sensitive domain).	Pause automation, require an end user to review the request, and only continue after you send `acknowledged_safety_checks` with the next `computer_call_output`.
The model repeats “take a screenshot” without making progress.	The screenshot isn’t updating, is low quality, or doesn’t show the relevant UI state.	Send a fresh screenshot after each action and use a higher-detail image when needed. Ensure the screenshot includes the relevant UI.
Access denied when requesting `computer-use-preview` model.	You haven’t registered for access or access hasn’t been granted.	Submit the application form and wait for approval. Check your email for confirmation.
Screenshot encoding errors.	Image format not supported or base64 encoding issue.	Use PNG or JPEG format. Ensure proper base64 encoding without corruption. Check image dimensions match `display_width` and `display_height`.
Actions execute on wrong coordinates.	Screen resolution mismatch between screenshot and actual display.	Ensure `display_width` and `display_height` in `ComputerUsePreviewTool` match your actual screen resolution.
Model hallucinates UI elements.	Screenshot quality too low or UI changed between turns.	Use higher resolution screenshots. Send fresh screenshots immediately after each action. Reduce delay between action and screenshot.

​Usage support

​Prerequisites

​Run the maintained SDK samples (recommended)

​Request access

​Code samples

​What you can do with the computer use tool

​Differences between browser automation and computer use

​When to use each tool

​Regional support

​Understanding the computer use integration

​Manage conversation history

​Safety checks and security considerations

​Safety check handling

​Troubleshooting

​Related content

Usage support

Prerequisites

Run the maintained SDK samples (recommended)

Request access

Code samples

What you can do with the computer use tool

Differences between browser automation and computer use

When to use each tool

Regional support

Understanding the computer use integration

Manage conversation history

Safety checks and security considerations

Safety check handling

Troubleshooting

Related content