CoWorkBench

Name: CoWorkBench Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

CoWorkBench Leaderboard

2 models

Rank	Model	Organization		Context	Cost	License
1	Qwen3.7 Max	Alibaba Cloud / Qwen Team	—	1.0M	$1.25 / $3.75
2	Qwen3.7-Plus	Alibaba Cloud / Qwen Team	—	1.0M	$0.32 / $1.28

About this benchmark

What is CoWorkBench?

CoWorkBench is Qwen's internal cowork benchmark for evaluating long-horizon office and productivity agent tasks across domains such as computer science, finance, law, and medicine.

CoWorkBench is a text benchmark that evaluates models on productivity, reasoning, and agent tasks. LLM Stats tracks 2 models on this benchmark, scored on a 0–1 scale. The current average is 0.661, with the leader at 0.672.

Current leaders

Qwen3.7 Max from Alibaba Cloud / Qwen Team currently leads the CoWorkBench leaderboard, which covers 2 evaluated AI models, with a score of 0.672.

Qwen3.7 Max, Alibaba Cloud / Qwen Team: 67.2%

Qwen3.7-Plus, Alibaba Cloud / Qwen Team: 65.1%

FAQ

Common questions about the CoWorkBench benchmark and leaderboard.

What is the CoWorkBench leaderboard?

The CoWorkBench leaderboard ranks 2 AI models based on their performance on this benchmark. Currently, Qwen3.7 Max by Alibaba Cloud / Qwen Team leads with a score of 0.672. The average score across all models is 0.661.
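
As a quick sanity check on these figures, here is a minimal sketch that recomputes the leader and the mean from the two scores listed on this page (0.672 and 0.651). The variable names are illustrative and not part of any LLM Stats API. The result may differ from the reported 0.661 in the last digit, presumably because the displayed scores are rounded.

```python
# Scores as listed on the leaderboard (0-1 scale).
scores = {
    "Qwen3.7 Max": 0.672,
    "Qwen3.7-Plus": 0.651,
}

# Leader: the model with the highest score.
leader = max(scores, key=scores.get)

# Arithmetic mean across all evaluated models.
average = sum(scores.values()) / len(scores)

print(f"leader: {leader} ({scores[leader]:.3f})")
print(f"average: {average:.4f}")
```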

What is the highest CoWorkBench score?

The highest CoWorkBench score is 0.672, achieved by Qwen3.7 Max from Alibaba Cloud / Qwen Team.

How many models are evaluated on CoWorkBench?

Two models have been evaluated on the CoWorkBench benchmark. Both results are self-reported; none has been verified.

What categories does CoWorkBench cover?

CoWorkBench is categorized under productivity, reasoning, and agents. The benchmark evaluates text models.

Which model offers the best value on CoWorkBench?

Among models scoring within 10% of the leader, Qwen3.7-Plus from Alibaba Cloud / Qwen Team is the cheapest, at $0.32 per million input tokens with a score of 0.651.
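
The selection rule described above can be sketched as follows. This is an illustration under stated assumptions, not LLM Stats' actual method: the page does not say whether "within 10%" is relative or absolute (relative is assumed here), and the $1.25 input price for Qwen3.7 Max assumes the first figure in the Cost column is the input price, as it is for Qwen3.7-Plus. The `Entry` class and `best_value` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    score: float        # benchmark score on a 0-1 scale
    input_price: float  # USD per million input tokens


entries = [
    Entry("Qwen3.7 Max", 0.672, 1.25),   # input price assumed from the Cost column
    Entry("Qwen3.7-Plus", 0.651, 0.32),
]


def best_value(entries, tolerance=0.10):
    """Return the cheapest entry whose score is within `tolerance` of the leader."""
    top = max(e.score for e in entries)
    # Assumption: "within 10%" is measured relative to the leader's score.
    eligible = [e for e in entries if e.score >= top * (1 - tolerance)]
    return min(eligible, key=lambda e: e.input_price)


winner = best_value(entries)
print(winner.name, winner.input_price, winner.score)
```

With only two models, both of which clear the threshold, the rule reduces to picking the lower input price.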

How recent are the CoWorkBench leaderboard results?

The CoWorkBench leaderboard was last updated in July 2026 and currently includes 2 evaluated models.