Introducing Blink: The First Agent to Score 100% on Online Mind2Web

Welcome to the Blink blog. This is where we’ll share what we’re building, why browser use agents matter, and everything we learn along the way. If you’re interested in autonomous agents that can actually operate on the web, this blog is for you.

Our first post is a big one. We’re excited to share that Blink is the first agent to score 100% on every easy, medium, and hard task on the Online Mind2Web benchmark.

The Results

The Online Mind2Web benchmark includes 300 real tasks drawn from actual user journeys across shopping, travel, search, job hunting, and more. The instructions are constraint-heavy: get the date range, the color, or the location wrong, and the task fails.

Blink completed every single task.

100% on easy.

100% on medium.

100% on hard.

We beat the best published results from Google, OpenAI, and Anthropic. The full standings are on the Online Mind2Web leaderboard.

Why browser use agents matter

A browser use agent is an agent that can navigate the web and complete hard tasks the way a human does. It can read a page, understand its structure and intent, move through multi-step flows, create and log in to accounts, and place orders. Building one is among the hardest problems in AI.

For enterprises, a reliable browser use agent can automate procurement, supplier compliance, MAP (minimum advertised price) enforcement, claims processing, and the thousands of other repetitive workflows teams still handle manually. A fleet of capable browser use agents can outperform brittle RPA tools, which break whenever a page changes, and adapt to changing sites automatically.

For consumers, imagine an agent that completes complex travel bookings end-to-end while you sleep. Or a personal shopper that finds you the best deal, remembers all your preferences, and checks out with its own debit card. Or a personal assistant that automatically handles all the life admin that drains your Saturday morning.

Current state of browser use agents

Today's browser use agents are unreliable. They misread layouts, miss buttons, break on filters, and struggle with authentication. And when you try to use them in the wild, the failure rate skyrockets.

Benchmarks help quantify this, and one of the most demanding is Online Mind2Web. Until now, no agent had hit 100% even on the “easy” tasks, and on the hard tasks the best scores plateaued around 71%.

Why this problem is so hard

The web is a mess. Every site is a different mix of HTML, CSS, JavaScript, iframes, shadow DOMs, forms, pop-ups, and anti-bot traps. And everything is changing constantly. Stability is the exception, not the rule.

What’s next (and how to get early access to Blink)

Conquering Online Mind2Web is only the beginning.

Blink is a foundational building block for agents. We hope many of you reading this will want to build something incredible on top of it. If you want early access and to follow our progress, sign up for the waitlist below.

- Team Blink