Use the computer use tool for agents (Preview)
Items marked (preview) in this article are currently in public preview. This preview is provided without a service-level agreement, and we don’t recommend it for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.
The computer use tool uses the `computer-use-preview` model to propose actions based on visual content, enabling agents to interact with desktop and browser applications through their user interfaces.
This guide shows how to integrate the computer use tool into an application loop (screenshot → action → screenshot) by using the latest SDKs.
Usage support
| Microsoft Foundry support | Python SDK | C# SDK | JavaScript SDK | Java SDK | REST API | Basic agent setup | Standard agent setup |
|---|---|---|---|---|---|---|---|
| ✔️ | ✔️ | ✔️ | ✔️ | - | - | ✔️ | ✔️ |
Prerequisites
- An Azure subscription. Create one for free.
- A basic or standard agent environment.
- The latest prerelease SDK package:
  - Python: `azure-ai-projects>=2.0.0b1`, `azure-identity`, `python-dotenv`
  - C#/.NET: `Azure.AI.Agents.Persistent` (prerelease)
  - TypeScript: `@azure/ai-projects` v2-beta, `@azure/identity`
- Access to the `computer-use-preview` model. See Request access below.
- A virtual machine or sandboxed environment for safe testing. Don’t run on machines with access to sensitive data.
Environment variables
Set these environment variables before running the samples:

| Variable | Description |
|---|---|
| `FOUNDRY_PROJECT_ENDPOINT` | Your Foundry project endpoint URL. |
| `FOUNDRY_MODEL_DEPLOYMENT_NAME` | Your `computer-use-preview` model deployment name. |
Quick verification
Verify your authentication and project connection before running the full samples:
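The following sketch checks both pieces independently: acquiring a token confirms your identity, and constructing the client confirms the endpoint. The token scope and the client constructor reflect the `azure-ai-projects` prerelease and `azure-identity` packages; treat them as assumptions to verify against your installed versions.

```python
import os

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv()  # picks up FOUNDRY_PROJECT_ENDPOINT and FOUNDRY_MODEL_DEPLOYMENT_NAME

endpoint = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
credential = DefaultAzureCredential()

# Acquiring a token confirms that your identity can authenticate
# against the Foundry scope (assumed scope for Foundry projects).
token = credential.get_token("https://ai.azure.com/.default")
print("Authenticated; token expires at", token.expires_on)

# Constructing the client confirms that the endpoint is well formed.
project_client = AIProjectClient(endpoint=endpoint, credential=credential)
print("Project client ready for", endpoint)
```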
Run the maintained SDK samples (recommended)

The code snippets in this article focus on the agent and Responses API integration. For an end-to-end runnable sample that includes helper code and sample screenshots, use the SDK samples on GitHub.

- Python: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-projects/samples/agents/tools
- TypeScript: https://github.com/Azure/azure-sdk-for-js/blob/main/sdk/ai/ai-projects/samples/v2-beta/javascript/agents/tools/agentComputerUse.js
- .NET (computer use tool sample): https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/ai/Azure.AI.Agents.Persistent/samples/Sample33_Computer_Use.md
Request access
To access the `computer-use-preview` model, you need to register. Microsoft grants access based on eligibility criteria. Even if you have access to other limited access models, you still need to request access for this model.
To request access, see the application form.
After Microsoft grants access, you need to create a deployment for the model.
Code samples
To run this code, you need the latest prerelease package. See the quickstart for details.

Screenshot initialization for computer use tool execution
The following code sample demonstrates how to create an agent version with the computer use tool, send an initial request with a screenshot, and perform multiple iterations to complete a task.

Create an agent version with the tool
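A sketch of this step follows. The tool definition mirrors the Responses API `computer_use_preview` tool shape; the `create_version` call and its parameter names are assumptions for the prerelease SDK, so check the maintained samples for the exact signatures.

```python
# Define the computer use tool: the display size and environment tell the
# model what it's controlling. This dict follows the Responses API tool shape.
computer_use_tool = {
    "type": "computer_use_preview",
    "display_width": 1026,
    "display_height": 769,
    "environment": "browser",  # or "windows", "mac", "linux"
}

# Assumed method and parameter names; see the maintained SDK samples.
agent = project_client.agents.create_version(
    agent_name="computer-use-agent",
    definition={
        "model": os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
        "instructions": "Use the computer use tool to complete UI tasks safely.",
        "tools": [computer_use_tool],
    },
)
print("Created agent version:", agent.name)
```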
One iteration for the tool to process the screenshot and take the next step
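The following sketch sends the task with an initial screenshot and reads back any suggested `computer_call`. It’s modeled on the similar Azure OpenAI Responses integration that this article links to; the `get_openai_client()` accessor is an assumption for the prerelease SDK.

```python
import base64

def screenshot_b64(path: str = "screenshot.png") -> str:
    # Stub: load a saved screenshot. Replace with real capture code for your
    # environment (see the Playwright sketch later in this article).
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

openai_client = project_client.get_openai_client()  # assumed accessor name

response = openai_client.responses.create(
    model=os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
    tools=[computer_use_tool],
    input=[
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "Search the web for today's weather."},
                {"type": "input_image", "image_url": f"data:image/png;base64,{screenshot_b64()}"},
            ],
        }
    ],
    truncation="auto",  # the similar Azure OpenAI integration requires this
)

# Any computer_call items contain the model's suggested next action.
computer_calls = [item for item in response.output if item.type == "computer_call"]
if computer_calls:
    print("Suggested action:", computer_calls[0].action)
```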
Perform multiple iterations

Make sure you review each iteration and action. The following code sample shows a basic API request. After you send the initial API request, perform a loop where your application code carries out the specified action. Send a screenshot with each turn so the model can evaluate the updated state of the environment. For an example integration with a similar API, see the Azure OpenAI documentation.
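A sketch of that loop follows, continuing from the previous snippet. `execute_action` is a hypothetical helper that carries out the suggested action (the Playwright sketch later in this article is one way to implement it); the `computer_call_output` shape follows the similar Azure OpenAI integration.

```python
MAX_ITERATIONS = 10

for _ in range(MAX_ITERATIONS):
    calls = [item for item in response.output if item.type == "computer_call"]
    if not calls:
        break  # no more suggested actions: the task is done or needs user input

    call = calls[0]
    execute_action(call.action)  # hypothetical helper: click, type, scroll, etc.

    # Return the updated state as a screenshot, linked to the prior response.
    response = openai_client.responses.create(
        model=os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
        tools=[computer_use_tool],
        previous_response_id=response.id,
        input=[
            {
                "type": "computer_call_output",
                "call_id": call.call_id,
                "output": {
                    "type": "input_image",
                    "image_url": f"data:image/png;base64,{screenshot_b64()}",
                },
            }
        ],
        truncation="auto",
    )
```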
Clean up
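When you finish, remove the agent so it doesn’t linger in your project. The deletion method name below is an assumption for the prerelease SDK; check the maintained samples for the exact call.

```python
# Assumed deletion method name for the prerelease SDK.
project_client.agents.delete(agent_name="computer-use-agent")
print("Deleted agent.")
```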
Expected output
The following example shows the expected output when running the previous code sample:

Sample for use of an Agent with Computer Use tool
The following C# code sample demonstrates how to create an agent version with the computer use tool, send an initial request with a screenshot, and perform multiple iterations to complete a task. To enable your agent to use the computer use tool, use `ResponseTool.CreateComputerTool()` when configuring the agent’s tools. This example uses synchronous code. For asynchronous usage, see the sample code example in the Azure SDK for .NET repository on GitHub.
Expected output
The following example shows the expected output when running the previous code sample:

Sample for use of an Agent with Computer Use tool
The following TypeScript code sample demonstrates how to create an agent version with the computer use tool, send an initial request with a screenshot, and perform multiple iterations to complete a task. For a JavaScript example, see the sample code in the Azure SDK for JavaScript repository on GitHub.

Expected output
The following example shows the expected output when running the previous code sample:

What you can do with the computer use tool
After you integrate the request-and-response loop (screenshot → action → screenshot), the computer use tool can help an agent:

- Propose UI actions such as clicking, typing, scrolling, and requesting a new screenshot.
- Adapt to UI changes by re-evaluating the latest screenshot after each action.
- Work across browser and desktop UI, depending on how you host your sandboxed environment.
Differences between browser automation and computer use
The following table lists some of the differences between the computer use tool and the browser automation tool.

| Feature | Browser automation tool | Computer use tool |
|---|---|---|
| Model support | All GPT models | `computer-use-preview` model only |
| Can I visualize what’s happening? | No | Yes |
| How it understands the screen | Parses the HTML or XML pages into DOM documents | Raw pixel data from screenshots |
| How it acts | A list of actions provided by the model | Virtual keyboard and mouse |
| Is it multistep? | Yes | Yes |
| Interfaces | Browser | Computer and browser |
| Do I need to bring my own resource? | Your own Playwright resource, with the keys stored as a connection. | No additional resource is required, but we highly recommend running this tool in a sandboxed environment. |
When to use each tool
Choose computer use when you need to:

- Interact with desktop applications beyond the browser
- Visualize what the agent sees through screenshots
- Work in environments where DOM parsing isn’t available

Choose browser automation when you need to:

- Perform web-only interactions without limited access requirements
- Use any GPT model (not limited to `computer-use-preview`)
- Avoid managing screenshot capture and action execution loops
Regional support
To use the computer use tool, you need a computer use model deployment. The computer use model is available in the following regions:

| Region | Status |
|---|---|
| `eastus2` | Available |
| `swedencentral` | Available |
| `southindia` | Available |
Understanding the computer use integration
When you work with the computer use tool, integrate it into your application by performing the following steps:

- Send a request to the model that includes a call to the computer use tool, the display size, and the environment. You can also include a screenshot of the initial state of the environment in the first API request.
- Receive a response from the model. If the response has action items, those items contain suggested actions to make progress toward the specified goal. For example, an action might be `screenshot` so the model can assess the current state with an updated screenshot, or `click` with X and Y coordinates indicating where the mouse should be moved.
- Execute the action by using your application code on your computer or browser environment.
- After executing the action, capture the updated state of the environment as a screenshot.
- Send a new request with the updated state as a `tool_call_output`, and repeat this loop until the model stops requesting actions or you decide to stop.
Before using the tool, set up an environment that can capture screenshots and execute the actions that the agent recommends. For safety reasons, use a sandboxed environment, such as a browser controlled through Playwright.
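The following sketch shows one way to host that sandbox with Playwright’s Python API and map suggested actions onto it. The action field names (`x`, `y`, `text`, `scroll_x`, `scroll_y`, `keys`) follow the Responses API `computer_call` shape and are assumptions to verify against your SDK version; `execute_action` and `screenshot_b64` are the hypothetical helpers used in the earlier loop sketch.

```python
# Minimal Playwright sandbox sketch: pip install playwright, then run
# `playwright install chromium` once to download the browser.
import base64

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page(viewport={"width": 1026, "height": 769})
    page.goto("https://www.bing.com")

    def execute_action(action) -> None:
        # Map the model's suggested action onto Playwright calls.
        if action.type == "click":
            page.mouse.click(action.x, action.y)
        elif action.type == "type":
            page.keyboard.type(action.text)
        elif action.type == "scroll":
            page.mouse.wheel(action.scroll_x, action.scroll_y)
        elif action.type == "keypress":
            for key in action.keys:
                page.keyboard.press(key)
        # A "screenshot" action needs no execution; the loop captures
        # fresh state after each turn via screenshot_b64().

    def screenshot_b64() -> str:
        # Capture the sandboxed page and return it base64-encoded.
        return base64.b64encode(page.screenshot()).decode("utf-8")

    # Run the request/response loop from the earlier sections here,
    # inside this `with` block, so the browser stays open.
```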
Manage conversation history
Use the `previous_response_id` parameter to link the current request to the previous response. Use this parameter when you don’t want to send the full conversation history with each call.
If you don’t use this parameter, make sure to include all the items returned in the response output of the previous request in your inputs array. This requirement includes reasoning items if present.
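As a sketch of that second approach, assuming `response` holds the previous turn and an OpenAI-style client whose output items serialize with `model_dump()` (adjust the serialization call for your SDK):

```python
# Resend the full history instead of using previous_response_id.
# Include every output item from the previous response, reasoning items too.
history = [item.model_dump() for item in response.output]
history.append(
    {"role": "user", "content": [{"type": "input_text", "text": "Continue the task."}]}
)

response = openai_client.responses.create(
    model=os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
    tools=[computer_use_tool],
    input=history,
    truncation="auto",
)
```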
Safety checks and security considerations
The API has safety checks to help protect against prompt injection and model mistakes. These checks include:

- Malicious instruction detection: The system evaluates the screenshot image and checks if it contains adversarial content that might change the model’s behavior.
- Irrelevant domain detection: The system evaluates the `current_url` parameter (if provided) and checks if the current domain is relevant given the conversation history.
- Sensitive domain detection: The system checks the `current_url` parameter (if provided) and raises a warning when it detects that the user is on a sensitive domain.
If one or more of the preceding checks are triggered, the model raises a safety check when it returns the next `computer_call`, using the `pending_safety_checks` parameter. Pass the checks back as `acknowledged_safety_checks` in the next request to proceed.
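The following sketch shows one way to handle these checks, continuing the loop from earlier. The `pending_safety_checks` and `acknowledged_safety_checks` shapes follow the similar Azure OpenAI computer use integration; verify the field names against your SDK.

```python
# Pause for human review when the model returns pending safety checks.
call = next(item for item in response.output if item.type == "computer_call")
pending = list(getattr(call, "pending_safety_checks", []) or [])

if pending:
    for check in pending:
        print(f"Safety check [{check.code}]: {check.message}")
    input("Review the checks above, then press Enter to acknowledge...")

# Acknowledge the checks in the next computer_call_output to proceed.
response = openai_client.responses.create(
    model=os.environ["FOUNDRY_MODEL_DEPLOYMENT_NAME"],
    tools=[computer_use_tool],
    previous_response_id=response.id,
    input=[
        {
            "type": "computer_call_output",
            "call_id": call.call_id,
            "acknowledged_safety_checks": [
                {"id": c.id, "code": c.code, "message": c.message} for c in pending
            ],
            "output": {
                "type": "input_image",
                "image_url": f"data:image/png;base64,{screenshot_b64()}",
            },
        }
    ],
    truncation="auto",
)
```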
Safety check handling
In all cases where `pending_safety_checks` are returned, hand actions over to the end user to confirm proper model behavior and accuracy.

- `malicious_instructions` and `irrelevant_domain`: End users should review model actions and confirm that the model behaves as intended.
- `sensitive_domain`: Ensure an end user actively monitors the model’s actions on these sites. The exact implementation of this “watch mode” can vary by application; one potential approach is collecting user impression data on the site to make sure there’s active end-user engagement with the application.
Troubleshooting
| Issue | Cause | Resolution |
|---|---|---|
| You don’t see a `computer_call` in the response. | The agent isn’t configured with the computer use tool, the deployment isn’t a computer use model, or the prompt doesn’t require UI interaction. | Confirm the agent has a `computer_use_preview` tool, your deployment is the `computer-use-preview` model, and your prompt requires a UI action (type, click, or screenshot). |
| The sample code fails with missing helper files or screenshots. | The snippets reference helper utilities and sample images that aren’t part of this documentation repo. | Run the maintained SDK samples in the “Run the maintained SDK samples” section, or copy the helper file and sample images from the SDK repo into your project. |
| The loop stops at the iteration limit. | The task needs more turns, or the app isn’t applying the actions the model requests. | Increase the iteration limit, and verify that your code executes the requested action and sends a new screenshot after each turn. |
| You receive `pending_safety_checks`. | The service detected a potential security risk (for example, prompt injection or a sensitive domain). | Pause automation, require an end user to review the request, and only continue after you send `acknowledged_safety_checks` with the next `computer_call_output`. |
| The model repeats “take a screenshot” without making progress. | The screenshot isn’t updating, is low quality, or doesn’t show the relevant UI state. | Send a fresh screenshot after each action and use a higher-detail image when needed. Ensure the screenshot includes the relevant UI. |
| Access is denied when you request the `computer-use-preview` model. | You haven’t registered for access, or access hasn’t been granted. | Submit the application form and wait for approval. Check your email for confirmation. |
| Screenshot encoding errors. | The image format isn’t supported, or the base64 encoding is corrupted. | Use PNG or JPEG format. Ensure proper base64 encoding without corruption. Check that the image dimensions match `display_width` and `display_height`. |
| Actions execute on wrong coordinates. | Screen resolution mismatch between the screenshot and the actual display. | Ensure `display_width` and `display_height` in `ComputerUsePreviewTool` match your actual screen resolution. |
| Model hallucinates UI elements. | Screenshot quality too low or UI changed between turns. | Use higher resolution screenshots. Send fresh screenshots immediately after each action. Reduce delay between action and screenshot. |