Google has introduced a new AI model, Gemini 2.5 Computer Use, which lets agents operate web interfaces directly through the browser. Developers can try the model in public preview via the Gemini API in Google AI Studio and Vertex AI, as well as in a hosted demo on Browserbase. The model analyzes the user's request, a screenshot of the current screen, and the history of previous actions, then responds with one of thirteen supported actions, such as typing text, clicking, scrolling, dragging elements, or navigating to a URL.
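In practice this runs as a loop: the client sends the goal and a screenshot, the model replies with a function call describing the next UI action, the client executes it and sends back a fresh screenshot. The sketch below shows one iteration using the google-genai Python SDK; the preview model name, the ComputerUse tool, and the ENVIRONMENT_BROWSER value follow Google's launch documentation, but the exact identifiers should be treated as assumptions that may change while the model is in preview.

```python
# A minimal sketch of one step of the Computer Use agent loop,
# assuming the google-genai SDK and identifiers from the launch docs.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

config = types.GenerateContentConfig(
    tools=[types.Tool(
        computer_use=types.ComputerUse(
            environment=types.Environment.ENVIRONMENT_BROWSER
        )
    )]
)

def next_action(goal: str, screenshot_png: bytes):
    """Ask the model for the next UI action given the current screen."""
    response = client.models.generate_content(
        model="gemini-2.5-computer-use-preview-10-2025",  # preview name, may change
        contents=[
            types.Content(role="user", parts=[
                types.Part(text=goal),
                types.Part.from_bytes(data=screenshot_png, mime_type="image/png"),
            ])
        ],
        config=config,
    )
    # The proposed action arrives as a function call, e.g. click_at or
    # type_text_at, with arguments describing where and what to act on.
    return response.candidates[0].content.parts[0].function_call
```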
Gemini 2.5 Computer Use is optimized for web browsers but also performs well on mobile interfaces; it is not yet intended for desktop control at the operating-system level. The model combines visual understanding with reasoning, allowing it to complete tasks such as filling out forms, organizing notes in online services, or adding items to a shopping cart from a list of ingredients. Executing the returned actions against a real browser is the client's job, as sketched below.
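Here is a hedged sketch of such a client-side executor using Playwright. The action names (click_at, type_text_at, navigate, scroll_document) and the normalized 0-999 coordinate grid follow the launch documentation, but both details are assumptions tied to the preview release.

```python
# A sketch of a client-side executor that maps model actions onto
# Playwright calls. Action names and the 0-999 coordinate grid are
# assumptions based on the launch documentation.
from playwright.sync_api import sync_playwright

def denorm(value: int, size: int) -> float:
    # The model is assumed to report coordinates on a 0-999 grid,
    # independent of the real viewport, so rescale to pixels.
    return value / 999 * size

def execute(page, name: str, args: dict):
    w, h = page.viewport_size["width"], page.viewport_size["height"]
    if name == "click_at":
        page.mouse.click(denorm(args["x"], w), denorm(args["y"], h))
    elif name == "type_text_at":
        page.mouse.click(denorm(args["x"], w), denorm(args["y"], h))
        page.keyboard.type(args["text"])
    elif name == "navigate":
        page.goto(args["url"])
    elif name == "scroll_document":
        page.mouse.wheel(0, 600 if args.get("direction") == "down" else -600)
    else:
        raise NotImplementedError(f"action not handled in this sketch: {name}")

# Usage: run one action against a live browser page, then the agent
# loop would screenshot the page and ask the model for the next step.
with sync_playwright() as p:
    page = p.chromium.launch(headless=True).new_page()
    page.goto("https://example.com")
    execute(page, "click_at", {"x": 500, "y": 300})
```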
Google claims that Gemini 2.5 Computer Use outperforms alternative solutions in both accuracy and latency on several benchmarks, including Online-Mind2Web and AndroidWorld. The model already powers automated interface testing in the company's internal projects, such as Project Mariner and AI Mode in Search, and has drawn positive feedback from early users building personal assistants and workflow-automation tools.
For safety, Google runs a check on each proposed action before it is executed, and developers can impose additional restrictions, such as requiring user confirmation for certain steps or blocking risky behavior outright, like attempts to bypass CAPTCHAs or interact with medical devices. According to Google, the model will help automate routine tasks without the need for purpose-built APIs, opening new opportunities for teams working on interface testing and digital automation.
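On the client side, the confirmation requirement can be enforced by gating execution on the safety signal the model attaches to an action. The sketch below assumes a safety_decision field on the returned arguments and a require_confirmation value, loosely following the launch documentation; the field name, values, and the denylist contents are illustrative assumptions, and execute is the executor sketched above.

```python
# A hedged sketch of client-side guardrails: require human confirmation
# when the model flags an action, and refuse a denylist of actions
# outright. Field names and the denylist are illustrative assumptions.
BLOCKED_ACTIONS = {"drag_and_drop"}  # hypothetical per-app denylist

def guarded_execute(page, function_call, execute):
    name = function_call.name
    args = dict(function_call.args)
    safety = args.pop("safety_decision", None)  # assumed field name

    if name in BLOCKED_ACTIONS:
        raise PermissionError(f"action blocked by policy: {name}")

    # If the model asks for confirmation, put a human in the loop
    # before the action reaches the browser.
    if safety and safety.get("decision") == "require_confirmation":
        answer = input(f"Model wants to {name}: {safety.get('explanation')}. Allow? [y/N] ")
        if answer.strip().lower() != "y":
            raise PermissionError("user declined the action")

    execute(page, name, args)
```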