The Hugging Face team has introduced the Open Computer Agent, an open, cloud-hosted agent that can perform tasks on a virtual computer running Linux. The agent is accessed through a web interface, where users can assign it simple actions such as finding locations on a map or opening websites in the Firefox browser, which comes preinstalled on the virtual machine.
The Open Computer Agent can handle basic requests, but it struggles with more complex tasks, such as searching for airline tickets. It also cannot pass the CAPTCHA tests that many websites present. Using the Open Computer Agent requires waiting in a virtual queue; the waiting time depends on the service load and can range from a few seconds to several minutes.
A distinctive feature of the agent is its support for computer vision models, including Qwen-VL, which can identify the coordinates of objects in images. This allows the agent to locate and interact with elements of the virtual interface and to handle more involved automation scenarios, which can be useful for users looking to delegate routine tasks.
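To illustrate the idea of turning detected object coordinates into an interface action, here is a minimal sketch in Python. It is not the Open Computer Agent's actual code: the `BoundingBox` type, the `click_point` function, and the pixel-based box format are all hypothetical and assume that a vision model has already returned a rectangle around the target element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical box in screenshot pixels: (left, top) to (right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int


def click_point(box: BoundingBox, screen_width: int, screen_height: int) -> tuple[int, int]:
    """Return the centre of the box, clamped so it stays on the visible screen."""
    if box.right < box.left or box.bottom < box.top:
        raise ValueError("box corners are in the wrong order")
    x = (box.left + box.right) // 2
    y = (box.top + box.bottom) // 2
    # Clamp in case the model reports a box that extends past the screenshot.
    x = min(max(x, 0), screen_width - 1)
    y = min(max(y, 0), screen_height - 1)
    return x, y


# Assumed example: a button reported at (100, 200)-(180, 240) on a 1024x768 screenshot.
button = BoundingBox(left=100, top=200, right=180, bottom=240)
print(click_point(button, 1024, 768))  # prints (140, 220)
```

In a sketch like this, the resulting point would then be passed to whatever mouse-control layer the virtual machine exposes; clicking the centre of the box is a simple default rather than a documented choice of the agent.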
The developers emphasize that the agent is not positioned as the best in its class, but rather serves as a demonstration of the growing capabilities of open AI models. According to research, about two-thirds of companies are already testing similar solutions to improve efficiency, and the AI agent market is expected to grow in the coming years.