AI-generated image of computer use, showing a computer on a white desk, with a robot arm pressing a button on the screen. In the background is an art nouveau skyline
Direct computer use will greatly simplify generative AI's ability to control software.

Generative AI can make realistic images, write books, and code websites, but there is a major barrier to using it for business: integration. Businesses use a range of common and bespoke software to do their work – Excel, Salesforce, custom tools, etc. But while generative AI systems exist in the world of software and code, they haven’t been able to work directly in the software the same way a person does.

So far, businesses have bridged this gap in two ways. One is fully manual: a person writes a prompt, generates a result, then copies and pastes it into another screen. This keeps a human in the loop, offering a quality-assurance benefit, but it also creates a bottleneck for both speed and value. The other approach is custom integration – a vendor or consultant writes code that links the AI directly to your business software through application programming interfaces (APIs). However, this is expensive, brittle, and requires your software to support such integration.

Recently, though, two major AI firms announced another way to interface with generative AI: directly through the same software interface a human would use. No code hacking necessary – just read, point, and click, directed by the AI.

And this opens up a whole new range of possibilities for accelerating business with generative AI.

Claude Can Now Use Your Computer

In October, AI firm Anthropic – a major competitor to OpenAI and Google – announced a public beta for functionality called “computer use”. In their announcement post, they note that, through their software, “developers can direct [their AI] Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text.” The AI understands the different elements of the software, their intended function, and how they fit together, then provides direction on what to do next to get the desired result.
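As a concrete illustration, a request to Anthropic's computer-use beta can be sketched as below. This is a minimal sketch, not a definitive implementation: the model name, the `computer_20241022` tool type, the pixel-dimension fields, and the beta flag follow Anthropic's October 2024 beta announcement, but treat the exact identifiers as assumptions that may change as the beta evolves.

```python
# Sketch of a computer-use request payload for Anthropic's public beta
# (identifiers follow the October 2024 beta docs; verify before relying on them).
# Actually sending it would go through the official SDK, e.g.:
#   client = anthropic.Anthropic()
#   client.beta.messages.create(..., betas=["computer-use-2024-10-22"])

def build_computer_use_request(task: str, width: int = 1280, height: int = 800) -> dict:
    """Assemble the model, tool definition, and user task for one request."""
    return {
        "model": "claude-3-5-sonnet-20241022",   # beta-era model name (assumption)
        "max_tokens": 1024,
        "tools": [
            {
                "type": "computer_20241022",     # beta tool type for screen control
                "name": "computer",
                "display_width_px": width,       # resolution of the screen Claude sees
                "display_height_px": height,
            }
        ],
        "messages": [{"role": "user", "content": task}],
    }

request = build_computer_use_request("Open the spreadsheet and total column B.")
```

The key design point is that the developer supplies only the screen dimensions and the task; the model itself decides which clicks and keystrokes to issue next.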

With this approach, a company could just set up a standard company computer (either physically or virtually), point Claude to it with some directions, and Claude could control the computer directly, just like an employee. No more licensing issues, no systems breaking with software updates, and easy expansion of capabilities as needed. Just a computer on a desk, no keyboard necessary, getting the work done 24/7.

Anthropic is the first to note that Claude’s computer use functionality is still experimental. However, they also highlight that “Asana, Canva, Cognition, DoorDash, Replit, and The Browser Company have already begun to explore these possibilities, carrying out tasks that require dozens, and sometimes even hundreds, of steps to complete.” With the speed of AI advancement nowadays, this capability will likely improve and become common very quickly.

Other Options For Computer Use

Google is also exploring a similar path with the latest version of its Gemini series of models. Specifically, Project Mariner is an experimental system that uses Gemini's vision capabilities to understand what's in your browser, then uses a dedicated Chrome extension to control the web page automatically. Although this requires that all of the software your business needs be available from the browser, it could potentially offer functionality akin to Claude's computer use.

There is also a strong trend in AI development today toward multimodal systems that understand multiple different types of input natively. This means that they understand not just text, but also images, audio, and video with no conversion necessary. From this, it’s a fairly direct step toward both understanding software and controlling the computer. Send the system a screenshot, and it can tell you the next step in the process – or potentially just do it itself.
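That screenshot-in, action-out pattern can be sketched in a few lines. Everything here is a stand-in: the screenshot capture and the model call are stubbed so the loop is self-contained, but the control flow (screenshot → model proposes an action → execute → repeat until done) is the general shape these systems follow.

```python
# Minimal sketch of a screenshot-driven agent loop. The model is stubbed
# with a canned script of actions; a real system would send each screenshot
# to a multimodal model and parse the action it returns.

def capture_screenshot() -> bytes:
    """Stand-in for grabbing the current screen contents."""
    return b"<png bytes>"

def stub_model(screenshot: bytes, script: list) -> dict:
    """Pretend model: pops the next action from a canned script."""
    return script.pop(0) if script else {"action": "done"}

def run_agent(script: list, max_steps: int = 10) -> list:
    """Loop: screenshot -> ask model for next action -> 'execute' -> repeat."""
    executed = []
    for _ in range(max_steps):
        shot = capture_screenshot()
        step = stub_model(shot, script)
        if step["action"] == "done":
            break
        executed.append(step)   # a real agent would click or type here
    return executed

steps = run_agent([
    {"action": "click", "target": "File menu"},
    {"action": "type", "text": "report.xlsx"},
])
```

The `max_steps` cap is the kind of contingency discussed below: an experimental agent should always have a hard limit on how long it can act unsupervised.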

How To Think About Integration Going Forward

Again, as both Anthropic and Google note, this functionality is still largely experimental. If you want to implement a system based on AI computer use, you should be ready for potential bugs or mistakes, and ensure that you have appropriate contingencies in place.

It’s also important to note that this approach – taking humans out of the loop – involves critical trade-offs. On one hand, you give up real-time quality assurance (while still keeping after-the-fact QA as an option). In return, you gain the potential for faster execution from an AI that can operate 24/7 instead of just 8/5. Compared to today’s integration options, computer use is also potentially more robust to changes in the software, and more flexible across many different types of software systems – if any of your software products change, the integration may not require an expensive and time-consuming update.

For now, detailed (and expensive) integration or human-in-the-loop strategies are still your best options if you’re deploying in the near term. However, if you’re considering adding AI in the next few years, you can look forward to that integration becoming much, much easier as generative AIs begin to use computers themselves.


Become your company’s AI expert in under 30 minutes a month by signing up for the Executive Summary newsletter by AI For Business.
