VIBE
VIBE Leaderboard
| Rank | Organization | Parameters | Context | Cost | License |
|---|---|---|---|---|---|
| 1 | MiniMax | 230B | 1.0M | $0.30 / $1.20 | |
Sub-benchmarks
VIBE-Pro
VIBE-Pro is an advanced version of the VIBE (Visual & Interactive Benchmark for Execution) benchmark that evaluates LLMs on professional-grade full-stack application development tasks. It measures model performance across complex real-world development scenarios including web, mobile, and backend applications with higher difficulty than the standard VIBE benchmark.
VIBE-V2
VIBE-V2 is an internal benchmark covering pure front-end and full-stack Web, Android, and iOS projects with build-from-scratch tasks. It uses an Agent-as-a-Verifier paradigm to automatically verify program interaction logic and visual output, scoring models through a unified pipeline that includes a requirement set, containerized deployment, and a dynamic interaction environment.
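As a rough illustration of the requirement-set idea, the sketch below scores one deployed app against a list of checks. Everything in it is an assumption made for illustration: the `Requirement` and `verify` names, the shape of the `observation` dictionary, and the pass-fraction scoring rule are hypothetical and are not taken from the VIBE-V2 pipeline. Containerized deployment and the agent's interaction with the app are not modeled; the observation stands in for whatever the verifier agent recorded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical: a snapshot of what a verifier agent observed
# after interacting with the deployed application.
Observation = Dict[str, object]


@dataclass
class Requirement:
    """One entry of a requirement set (hypothetical structure)."""
    description: str
    check: Callable[[Observation], bool]


def verify(requirements: List[Requirement], observation: Observation) -> float:
    """Return the fraction of requirements satisfied.

    The pass-fraction rule is an assumption; the source does not say
    how VIBE-V2 aggregates individual checks into a score.
    """
    if not requirements:
        raise ValueError("requirement set is empty")
    passed = sum(1 for r in requirements if r.check(observation))
    return passed / len(requirements)


# Illustrative requirement set covering one visual and one interaction check.
requirements = [
    Requirement(
        "login button is visible",
        lambda o: o.get("login_button_visible") is True,
    ),
    Requirement(
        "submitting the form shows a confirmation",
        lambda o: o.get("confirmation_text") == "Saved",
    ),
]

# Made-up observation: the first check passes, the second fails.
observation = {"login_button_visible": True, "confirmation_text": "Error"}
score = verify(requirements, observation)
print(score)
```

Keeping each requirement as a description plus a check function keeps the visual and interaction checks uniform, so a single loop can score any project type.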
What is VIBE?
Visual & Interactive Benchmark for Execution: a benchmark for UI and app generation
VIBE is a text benchmark that evaluates models on code tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. With a single entry, the average and the leading score are the same: 0.886 (0.9 when rounded).
Compare leaders on the best AI for code leaderboards.
Current leaders
MiniMax M2.1 from MiniMax currently leads the VIBE leaderboard with a score of 0.886. It is the only model evaluated so far.
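The summary figures can be reproduced with a few lines of arithmetic. The sketch below is a minimal illustration, assuming the leaderboard is held as a plain mapping from model name to score; the `scores` variable is hypothetical, and the only data point is the 0.886 reported for MiniMax M2.1.

```python
from statistics import mean

# Hypothetical in-memory leaderboard: model name -> VIBE score (0-1 scale).
scores = {"MiniMax M2.1": 0.886}

# With one entry, the mean is that entry's score.
average = mean(list(scores.values()))

# The leader is the entry with the highest score.
leader_name, leader_score = max(scores.items(), key=lambda kv: kv[1])

# Both values round to 0.9 at one decimal place.
print(leader_name, round(average, 1), round(leader_score, 1))
```

With one tracked model the average and the leading score are necessarily identical, so the two figures only become informative once more models are added.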
FAQ
Common questions about the VIBE benchmark and leaderboard.