BLINK

Name: BLINK Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

BLINK Leaderboard

13 models

Rank	Model	Organization	Parameters	Context	Cost
1	Seed 2.1 Pro	ByteDance	-	-	-
2	Seed 2.1 Turbo	ByteDance	-	-	-
3	Qwen3 VL 235B A22B Instruct	Alibaba Cloud / Qwen Team	236B	-	-
4	Qwen3 VL 8B Instruct	Alibaba Cloud / Qwen Team	9B	-	-
5	Qwen3 VL 8B Thinking	Alibaba Cloud / Qwen Team	9B	-	-
6	Qwen3 VL 32B Thinking	Alibaba Cloud / Qwen Team	33B	-	-
7	Qwen3 VL 30B A3B Instruct	Alibaba Cloud / Qwen Team	31B	-	-
8	Qwen3 VL 32B Instruct	Alibaba Cloud / Qwen Team	33B	-	-
9	Qwen3 VL 235B A22B Thinking	Alibaba Cloud / Qwen Team	236B	-	-
10	Qwen3 VL 4B Instruct	Alibaba Cloud / Qwen Team	4B	262K	$0.10 / $0.60
11	Qwen3 VL 30B A3B Thinking	Alibaba Cloud / Qwen Team	31B	-	-
12	Qwen3 VL 4B Thinking	Alibaba Cloud / Qwen Team	4B	262K	$0.10 / $1.00
13	Phi-4-multimodal-instruct	Microsoft	6B	-	-

About this benchmark

What is BLINK?

BLINK was introduced in the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". It is a benchmark for multimodal language models that focuses on core visual perception abilities. It reformats 14 classic computer vision tasks into 3,807 multiple-choice questions, each paired with single or multiple images and visual prompting. Tasks include relative depth estimation, visual correspondence, forensics detection, multi-view reasoning, counting, object localization, and spatial reasoning, most of which humans can solve "within a blink".
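Because every BLINK item is a multiple-choice question, a score reduces to plain accuracy, and the chance level depends on how many options each question offers. The following is a minimal sketch of that idea only. The `Question` record, its field names, the task labels, and the toy data are all hypothetical; this is not BLINK's data format or its official evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """Hypothetical record for one multiple-choice item."""
    task: str      # illustrative task label, not an official identifier
    choices: list  # option letters offered for this question
    answer: str    # the correct option letter


def accuracy(questions, predictions):
    """Fraction of questions whose predicted letter matches the answer."""
    if len(questions) != len(predictions):
        raise ValueError("need exactly one prediction per question")
    correct = sum(q.answer == p for q, p in zip(questions, predictions))
    return correct / len(questions)


def random_baseline(questions):
    """Expected accuracy of uniform random guessing over each question's options."""
    return sum(1 / len(q.choices) for q in questions) / len(questions)


# Toy data, invented for illustration only.
toy = [
    Question("relative_depth", ["A", "B"], "A"),
    Question("counting", ["A", "B", "C", "D"], "C"),
    Question("visual_correspondence", ["A", "B", "C", "D"], "B"),
    Question("multi_view_reasoning", ["A", "B"], "B"),
]
preds = ["A", "C", "D", "B"]

acc = accuracy(toy, preds)    # 3 of 4 correct
base = random_baseline(toy)   # mean of 1/2, 1/4, 1/4, 1/2
print(f"accuracy={acc:.3f}, random baseline={base:.3f}")
```

Comparing accuracy against the random baseline, rather than against zero, is what makes scores interpretable when questions have only two to four options.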

BLINK is a multimodal benchmark that LLM Stats lists under the multimodal, reasoning, spatial reasoning, 3D, and vision categories. LLM Stats tracks 13 models on this benchmark, scored on a 0–1 scale. The current average is 0.689, with the leader at 0.814.

Current leaders

Seed 2.1 Pro from ByteDance currently leads the BLINK leaderboard with a score of 0.814 across 13 evaluated AI models.

Seed 2.1 Pro (ByteDance): 81.4%

Seed 2.1 Turbo (ByteDance): 79.4%

Qwen3 VL 235B A22B Instruct (Alibaba Cloud / Qwen Team): 70.7%
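The page reports the same results both as percentages (81.4%) and on a 0–1 scale (0.814). A small sketch of the conversion, and of each model's gap to the leader in percentage points, is shown here. The variable and function names are illustrative; the three scores are the ones listed above.

```python
# Top-three scores as listed above, in percent.
top3_percent = {
    "Seed 2.1 Pro": 81.4,
    "Seed 2.1 Turbo": 79.4,
    "Qwen3 VL 235B A22B Instruct": 70.7,
}


def to_unit_scale(percent):
    """Convert a percentage score to the 0-1 scale, rounded to 3 decimals."""
    return round(percent / 100, 3)


unit_scores = {name: to_unit_scale(p) for name, p in top3_percent.items()}

# Gap to the leader in percentage points, rounded to avoid float noise.
leader = max(top3_percent, key=top3_percent.get)
gaps = {
    name: round(top3_percent[leader] - p, 1)
    for name, p in top3_percent.items()
}

for name in top3_percent:
    print(f"{name}: {unit_scores[name]:.3f} (gap to leader: {gaps[name]:.1f} points)")
```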

Source paper

Title: BLINK: Multimodal Large Language Models Can See but Not Perceive
Authors: Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, and 6 others
Published: April 18, 2024
arXiv: 2404.12390

Abstract

We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations. Most of the Blink tasks can be solved by humans "within a blink" (e.g., relative depth estimation, visual correspondence, forensics detection, and multi-view reasoning). However, we find these perception-demanding tasks cast significant challenges for current multimodal LLMs because they resist mediation through natural language. Blink reformats 14 classic computer vision tasks into 3,807 multiple-choice questions, paired with single or multiple images and visual prompting. While humans get 95.70% accuracy on average, Blink is surprisingly challenging for existing multimodal LLMs: even the best-performing GPT-4V and Gemini achieve accuracies of 51.26% and 45.72%, only 13.17% and 7.63% higher than random guessing, indicating that such perception abilities have not "emerged" yet in recent multimodal LLMs. Our analysis also highlights that specialist CV models could solve these problems much better, suggesting potential pathways for future improvements. We believe Blink will stimulate the community to help multimodal LLMs catch up with human-level visual perception.
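The abstract gives each model's accuracy and its margin over random guessing, so the random-guess baseline can be recovered by subtraction. The sketch below does that arithmetic using only the figures quoted in the abstract; the variable names are illustrative, and the baseline is a derived value, not one stated on this page.

```python
# Figures quoted in the abstract (percent accuracy).
gpt4v_acc = 51.26
gemini_acc = 45.72
human_acc = 95.70

# Margins over random guessing, as quoted in the abstract (percentage points).
gpt4v_margin = 13.17
gemini_margin = 7.63

# Implied random-guess accuracy; both models should give the same value.
baseline_from_gpt4v = round(gpt4v_acc - gpt4v_margin, 2)
baseline_from_gemini = round(gemini_acc - gemini_margin, 2)

# Distance between the best model in the abstract and human accuracy.
gap_to_human = round(human_acc - gpt4v_acc, 2)

print(f"implied random baseline: {baseline_from_gpt4v:.2f}% / {baseline_from_gemini:.2f}%")
print(f"GPT-4V gap to human accuracy: {gap_to_human:.2f} points")
```

Both subtractions give 38.09%, so the two margins in the abstract are consistent with a single chance level, and GPT-4V trails human accuracy by 44.44 points.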

FAQ

Common questions about the BLINK benchmark and leaderboard.

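What is the BLINK benchmark?

BLINK is a benchmark for multimodal language models that focuses on core visual perception abilities. It reformats 14 classic computer vision tasks into 3,807 multiple-choice questions paired with single or multiple images and visual prompting.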

What is the BLINK leaderboard?

The BLINK leaderboard ranks 13 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.814. The average score across all models is 0.689.

What is the highest BLINK score?

The highest BLINK score is 0.814, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on BLINK?

13 models have been evaluated on the BLINK benchmark. All 13 results are self-reported; none is verified.

Where can I find the BLINK paper?

The BLINK paper is available at https://arxiv.org/abs/2404.12390. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does BLINK cover?

BLINK is categorized under multimodal, reasoning, spatial reasoning, 3D, and vision. The benchmark evaluates multimodal models.

What is the best open-source model on BLINK?

Qwen3 VL 235B A22B Instruct by Alibaba Cloud / Qwen Team is the top-ranked open-source model on BLINK, with a score of 0.707 (rank #3).

How recent are the BLINK leaderboard results?

The BLINK leaderboard was last updated in July 2026 and currently includes 13 evaluated models.