Online Mind2Web is one of the most important benchmarks for evaluating browser-use agent performance. It focuses on tasks that take place on actual websites, with real interfaces and real sources of ambiguity. To complete the benchmark, the agent needs to understand and navigate websites like Airbnb, Booking.com, Target, and hundreds of others, and complete multi-step workflows the same way a human would.
What makes the benchmark especially valuable is the difficulty of its harder tasks. Many of them require precise navigation, contextual understanding, form handling, filtering, and interaction with dynamic content. These are the same challenges that appear in real enterprise and consumer workflows. When an agent completes the full set of tasks in Online Mind2Web, it demonstrates that it’s capable of more than beating a benchmark. It’s proving that it can handle the kind of complex, unpredictable work that millions of people do in the browser every day.
Blink is the first agent in the world to score 100% on all easy, medium, and hard tasks on Online Mind2Web.
Examples of current Online Mind2Web tasks that Blink completed successfully and that agents from Google, Anthropic, and OpenAI failed:
Task #1

Task #2

Task #3

Task #4

Task #5
