ODinW

Object Detection in the Wild (ODinW) is a benchmark that evaluates the task-level transfer ability of object detection models across diverse real-world datasets, measured by prediction accuracy and adaptation efficiency.

Progress Over Time

[Interactive timeline: model performance on ODinW over time. Legend: state-of-the-art frontier, open models, proprietary models.]

ODinW Leaderboard

15 models
Rank  Organization               Size  Context  Cost           License
1     Alibaba Cloud / Qwen Team  -     -        -              -
2     Alibaba Cloud / Qwen Team  35B   -        -              -
3     Alibaba Cloud / Qwen Team  236B  262K     $0.30 / $1.50  -
4     Alibaba Cloud / Qwen Team  4B    262K     $0.10 / $0.60  -
5     Alibaba Cloud / Qwen Team  31B   262K     $0.20 / $0.70  -
6     Alibaba Cloud / Qwen Team  33B   -        -              -
7     Alibaba Cloud / Qwen Team  9B    262K     $0.08 / $0.50  -
8     Alibaba Cloud / Qwen Team  122B  262K     $0.40 / $3.20  -
9     Alibaba Cloud / Qwen Team  236B  262K     $0.45 / $3.49  -
10    Alibaba Cloud / Qwen Team  35B   262K     $0.25 / $2.00  -
11    Alibaba Cloud / Qwen Team  7B    -        -              -
12    Alibaba Cloud / Qwen Team  31B   262K     $0.20 / $1.00  -
13    Alibaba Cloud / Qwen Team  27B   262K     $0.30 / $2.40  -
14    Alibaba Cloud / Qwen Team  9B    262K     $0.18 / $2.09  -
15    Alibaba Cloud / Qwen Team  4B    262K     $0.10 / $1.00  -
FAQ

Common questions about ODinW

The ODinW paper is available at https://arxiv.org/abs/2112.03857. This paper provides detailed information about the benchmark methodology, dataset creation, and evaluation criteria.
The ODinW leaderboard ranks 15 AI models based on their performance on this benchmark. Currently, Qwen3.6 Plus by Alibaba Cloud / Qwen Team leads with a score of 0.518. The average score across all models is 0.449.
The highest ODinW score is 0.518, achieved by Qwen3.6 Plus from Alibaba Cloud / Qwen Team.
15 models have been evaluated on the ODinW benchmark, with 0 verified results and 15 self-reported results.
ODinW is categorized under vision. The benchmark evaluates image models.