Google has announced the launch of its Gemini 2.5 Computer Use model, designed to enable AI systems to control and navigate graphical user interfaces (GUIs). Unlike traditional AI models that work with structured data or APIs, this version can perform human-like actions such as clicking, typing, scrolling, and submitting forms directly on websites and mobile apps.
According to Google, this new model outperforms its competitors on several web and mobile automation benchmarks, offering lower latency and making it faster and more efficient for real-world use.
What makes it different from existing Gemini models
While earlier Gemini models focused mainly on text, vision, and reasoning, the new Computer Use model adds a new layer: the ability to interact directly with live user interfaces.
It’s powered by Gemini 2.5 Pro’s advanced visual reasoning, enabling it to understand screenshots, identify clickable buttons, and execute commands in real time.
The model operates in a loop: after every action it takes a fresh screenshot, interprets the current state of the screen, and decides the next move, repeating until the task is complete.
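That observe-decide-act loop can be sketched in a few lines. The helpers below (`capture_screenshot`, `ask_model`, `apply_action`) are hypothetical stand-ins, not real Gemini API calls; they are mocked here so the loop itself runs.

```python
# Minimal sketch of the agent loop described above. The three callables
# are illustrative assumptions, not Google's actual API surface.

def run_agent_loop(goal, capture_screenshot, ask_model, apply_action, max_steps=10):
    """Repeat: screenshot -> model decides -> apply action, until done."""
    history = []
    for _ in range(max_steps):
        screenshot = capture_screenshot()              # observe the screen
        action = ask_model(goal, screenshot, history)  # model picks next move
        if action["type"] == "done":                   # task complete
            return history
        apply_action(action)                           # click, type, scroll...
        history.append(action)
    return history

# Mock environment: the "model" issues two actions, then reports done.
script = iter([
    {"type": "click", "x": 10, "y": 20},
    {"type": "type_text", "text": "hello"},
    {"type": "done"},
])
actions_taken = run_agent_loop(
    goal="fill the form",
    capture_screenshot=lambda: b"fake-png-bytes",
    ask_model=lambda goal, shot, hist: next(script),
    apply_action=lambda a: None,
)
print(len(actions_taken))  # actions performed before the model signalled done
```

The key design point is that the model never sees the page's DOM or an API, only pixels and its own action history, which is what lets the same loop drive any GUI.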
Benefits to users and developers
The new Gemini model can be a game-changer for automation. Developers can now use it for:
- UI testing: Automatically exercising app interfaces to surface errors.
- Workflow automation: Performing repetitive online tasks like filling forms or sorting data.
- Personal assistants: Building smarter agents capable of handling digital tasks independently.
Early testers, including companies like Autotab and Poke.com, reported that Gemini 2.5 was up to 50 per cent faster and more accurate than competing models. Google’s internal teams are already using it to improve software testing and payment systems.
How to use the Gemini 2.5 Computer Use Model
Developers can access the new model through the Gemini API on Google AI Studio and Vertex AI. Here’s how to get started:
- Access the public preview via Google AI Studio or Vertex AI.
- Set up your project using Playwright or Browserbase for browser-based automation.
- Use the computer_use tool, which takes inputs like screenshots, user requests, and action history.
- Run the agent loop: the model performs actions, receives updated screenshots, and continues until the task is done.
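To make the last two steps concrete, here is a hedged sketch of the client-side dispatcher that turns model-proposed actions into browser calls. The action names and the page interface are assumptions for illustration (in a real project they would map onto Playwright or Browserbase calls); a recording stub stands in for the browser so the sketch is self-contained.

```python
# Illustrative dispatcher: one model action -> one browser call.
# Action names here are hypothetical; consult the computer_use tool's
# actual action schema when wiring this up for real.

def execute_action(page, action):
    """Apply a single model-proposed action to a browser page object."""
    kind = action["type"]
    if kind == "click_at":
        page.click_at(action["x"], action["y"])
    elif kind == "type_text":
        page.type_text(action["text"])
    elif kind == "scroll":
        page.scroll(action["dy"])
    else:
        raise ValueError(f"unsupported action: {kind}")

class StubPage:
    """Recording stand-in for a real Playwright/Browserbase page."""
    def __init__(self):
        self.log = []
    def click_at(self, x, y):
        self.log.append(("click_at", x, y))
    def type_text(self, text):
        self.log.append(("type_text", text))
    def scroll(self, dy):
        self.log.append(("scroll", dy))

page = StubPage()
for act in [{"type": "click_at", "x": 5, "y": 7},
            {"type": "type_text", "text": "query"}]:
    execute_action(page, act)
print(page.log)
```

Keeping the dispatcher separate from the loop makes it easy to swap the stub for a real browser driver without touching the model-facing code.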
Users can also try a live demo through Browserbase to see Gemini’s automation in action.
Safety comes first
Google emphasised that safety and responsible use are at the heart of the Gemini 2.5 Computer Use model. It includes built-in safety checks, per-step verification, and developer controls that prevent harmful or high-risk actions, such as bypassing CAPTCHAs or making unauthorised transactions.
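In practice, per-step verification means every proposed action is vetted before it is executed. The gate below is an assumption about how a developer might enforce such controls, not Google's actual implementation, and the high-risk action names are hypothetical.

```python
# Illustrative per-step safety gate (not Google's real safeguard logic).
# Actions on the high-risk list are held for human confirmation
# instead of being executed automatically.

HIGH_RISK = frozenset({"solve_captcha", "submit_payment"})

def check_action(action, require_confirmation=HIGH_RISK):
    """Return 'confirm' for high-risk actions, 'allow' otherwise."""
    if action["type"] in require_confirmation:
        return "confirm"  # pause the loop and ask a human first
    return "allow"

print(check_action({"type": "click_at", "x": 1, "y": 2}))  # allow
print(check_action({"type": "submit_payment"}))            # confirm
```

Running this check between the model's decision and the browser call is what keeps a fully automated loop from silently crossing lines the developer never approved.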
The future of AI-powered automation
With this release, Google is taking a major step toward fully autonomous digital agents. The Gemini 2.5 Computer Use model not only enhances productivity for developers but also signals the start of a future where AI can safely and efficiently perform human-like digital tasks.