ODinW

Name: ODinW Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

ODinW Leaderboard

16 models

Rank	Model	Organization	Parameters	Context	Cost per 1M tokens (input / output)
1	Qwen3.6 Plus	Alibaba Cloud / Qwen Team	—	1.0M	$0.50 / $3.00
2	Qwen3.7-Plus	Alibaba Cloud / Qwen Team	—	1.0M	$0.32 / $1.28
3	Qwen3.6-35B-A3B	Alibaba Cloud / Qwen Team	35B	—	—
4	Qwen3 VL 235B A22B Instruct	Alibaba Cloud / Qwen Team	236B	—	—
5	Qwen3 VL 4B Instruct	Alibaba Cloud / Qwen Team	4B	262K	$0.10 / $0.60
6	Qwen3 VL 30B A3B Instruct	Alibaba Cloud / Qwen Team	31B	—	—
7	Qwen3 VL 32B Instruct	Alibaba Cloud / Qwen Team	33B	—	—
8	Qwen3 VL 8B Instruct	Alibaba Cloud / Qwen Team	9B	—	—
9	Qwen3.5-122B-A10B	Alibaba Cloud / Qwen Team	122B	—	—
10	Qwen3 VL 235B A22B Thinking	Alibaba Cloud / Qwen Team	236B	—	—
11	Qwen3.5-35B-A3B	Alibaba Cloud / Qwen Team	35B	—	—
12	Qwen2.5-Omni-7B	Alibaba Cloud / Qwen Team	7B	—	—
13	Qwen3 VL 30B A3B Thinking	Alibaba Cloud / Qwen Team	31B	—	—
14	Qwen3.5-27B	Alibaba Cloud / Qwen Team	27B	262K	$0.30 / $2.40
15	Qwen3 VL 8B Thinking	Alibaba Cloud / Qwen Team	9B	—	—
16	Qwen3 VL 4B Thinking	Alibaba Cloud / Qwen Team	4B	262K	$0.10 / $1.00
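
For readers who want to work with the table programmatically, the Context and Cost cells can be turned into numbers with a small helper. The following is a minimal sketch, not part of the LLM Stats site: the function names `parse_context` and `parse_cost` are hypothetical, "K" and "M" are assumed to be decimal multipliers (1,000 and 1,000,000), and the two cost values are assumed to be input price followed by output price in USD.

```python
from typing import Optional, Tuple

MISSING = "\u2014"  # the dash character the table uses for unavailable values


def parse_context(cell: str) -> Optional[int]:
    """Convert a context cell such as '262K' or '1.0M' to a token count."""
    cell = cell.strip()
    if cell == MISSING:
        return None
    # Assumption: K and M are decimal multipliers, not powers of two.
    multipliers = {"K": 1_000, "M": 1_000_000}
    return int(round(float(cell[:-1]) * multipliers[cell[-1]]))


def parse_cost(cell: str) -> Optional[Tuple[float, float]]:
    """Convert a cost cell such as '$0.50 / $3.00' to (input, output) in USD."""
    cell = cell.strip()
    if cell == MISSING:
        return None
    # Assumption: the first value is the input price, the second the output price.
    first, second = (part.strip().lstrip("$") for part in cell.split("/"))
    return float(first), float(second)


print(parse_context("1.0M"))        # 1000000
print(parse_cost("$0.50 / $3.00"))  # (0.5, 3.0)
```

Returning `None` for the dash keeps "unlisted" distinct from a genuine zero, which matters when sorting or filtering by price.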

About this benchmark

What is ODinW?

Object Detection in the Wild (ODinW) is a benchmark for evaluating the task-level transfer ability of object detection models across diverse real-world datasets, measured in terms of prediction accuracy and adaptation efficiency.

ODinW is an image benchmark that evaluates models on vision tasks. LLM Stats tracks 16 models on this benchmark, scored on a 0–1 scale. The current average is 0.453, with the leader at 0.518.

Current leaders

Qwen3.6 Plus from Alibaba Cloud / Qwen Team currently leads the ODinW leaderboard with a score of 0.518 across 16 evaluated AI models.

Qwen3.6 Plus (Alibaba Cloud / Qwen Team): 51.8%

Qwen3.7-Plus (Alibaba Cloud / Qwen Team): 51.1%

Qwen3.6-35B-A3B (Alibaba Cloud / Qwen Team): 50.8%

Source paper

Title: Grounded Language-Image Pre-training
Authors: Liunian Harold Li, Pengchuan Zhang, Haotian Zhang, Jianwei Yang, and 8 others
Published: December 7, 2021
arXiv: 2112.03857

Abstract

This paper presents a grounded language-image pre-training (GLIP) model for learning object-level, language-aware, and semantic-rich visual representations. GLIP unifies object detection and phrase grounding for pre-training. The unification brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representation semantic-rich. In our experiments, we pre-train GLIP on 27M grounding data, including 3M human-annotated and 24M web-crawled image-text pairs. The learned representations demonstrate strong zero-shot and few-shot transferability to various object-level recognition tasks. 1) When directly evaluated on COCO and LVIS (without seeing any images in COCO during pre-training), GLIP achieves 49.8 AP and 26.9 AP, respectively, surpassing many supervised baselines. 2) After fine-tuned on COCO, GLIP achieves 60.8 AP on val and 61.5 AP on test-dev, surpassing prior SoTA. 3) When transferred to 13 downstream object detection tasks, a 1-shot GLIP rivals with a fully-supervised Dynamic Head. Code is released at https://github.com/microsoft/GLIP.

FAQ

Common questions about the ODinW benchmark and leaderboard.

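What is the ODinW benchmark?

ODinW (Object Detection in the Wild) is a benchmark for evaluating the task-level transfer ability of object detection models across diverse real-world datasets, measured in terms of prediction accuracy and adaptation efficiency.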

What is the ODinW leaderboard?

The ODinW leaderboard ranks 16 AI models based on their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.518. The average score across all models is 0.453.

What is the highest ODinW score?

The highest ODinW score is 0.518, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.

How many models are evaluated on ODinW?

16 models have been evaluated on the ODinW benchmark, with 0 verified results and 16 self-reported results.

Where can I find the ODinW paper?

The source paper for ODinW, "Grounded Language-Image Pre-training", is available at https://arxiv.org/abs/2112.03857. The paper details the methodology, dataset construction, and evaluation criteria.

What categories does ODinW cover?

ODinW is categorized under vision. The benchmark evaluates image models.

What is the best open-source model on ODinW?

Qwen3.6-35B-A3B by Alibaba Cloud / Qwen Team is the top-ranked open-source model on ODinW, with a score of 0.508 (rank #3).

Which model offers the best value on ODinW?

Among models scoring within 10% of the leader, Qwen3 VL 4B Instruct from Alibaba Cloud / Qwen Team is the cheapest, at $0.10 per million input tokens with a score of 0.482.
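
The selection rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the site's actual method: "within 10% of the leader" is read as a relative threshold (score at least 90% of the top score), models without a listed price are excluded, and the `Entry` type and `best_value` function are hypothetical names. Only models whose scores and prices appear on this page are included.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    score: float                 # ODinW score on a 0-1 scale
    input_cost: Optional[float]  # USD per 1M input tokens; None if unlisted


def best_value(entries: List[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Return the cheapest priced model scoring within `tolerance` of the leader."""
    leader_score = max(e.score for e in entries)
    # Assumption: "within 10%" means a relative gap, i.e. score >= 0.9 * leader.
    threshold = leader_score * (1 - tolerance)
    candidates = [
        e for e in entries
        if e.input_cost is not None and e.score >= threshold
    ]
    # Cheapest input price wins; ties are broken in favor of the higher score.
    return min(candidates, key=lambda e: (e.input_cost, -e.score), default=None)


# Scores and input prices as stated on this page.
entries = [
    Entry("Qwen3.6 Plus", 0.518, 0.50),
    Entry("Qwen3.7-Plus", 0.511, 0.32),
    Entry("Qwen3.6-35B-A3B", 0.508, None),  # no price listed
    Entry("Qwen3 VL 4B Instruct", 0.482, 0.10),
]

print(best_value(entries).model)  # Qwen3 VL 4B Instruct
```

With a leader score of 0.518 the threshold is 0.4662, so Qwen3 VL 4B Instruct (0.482) qualifies and has the lowest listed input price among the priced candidates.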

How recent are the ODinW leaderboard results?

The ODinW leaderboard was last updated in July 2026 and currently includes 16 evaluated models.