OfficeQA Pro

Name: OfficeQA Pro Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service

Progress Over Time

[Interactive timeline of model performance on OfficeQA Pro over time. Legend: state-of-the-art frontier, open models, proprietary models.]

OfficeQA Pro Leaderboard

7 models

Rank	Model	Organization	Parameters	Context	Cost (input / output, USD per 1M tokens)
1	Seed 2.1 Pro	ByteDance	—	—	—
2	Seed 2.1 Turbo	ByteDance	—	—	—
3	Claude Opus 4.8	Anthropic	—	1.0M	$5.00 / $25.00
4	Kimi K3	Moonshot AI	2.8T	1.0M	$3.00 / $15.00
5	Claude Sonnet 5	Anthropic	—	1.0M	$3.00 / $15.00
6	GPT-5.5	OpenAI	—	1.1M	$5.00 / $30.00
7	MiniMax M3	MiniMax	—	1.0M	$0.30 / $1.20

About this benchmark

What is OfficeQA Pro?

OfficeQA Pro evaluates AI models on professional knowledge-work questions and tasks drawn from real office workflows, including document analysis, spreadsheet reasoning, and information synthesis across business domains.

OfficeQA Pro is a text benchmark in the reasoning, general, and agents categories. LLM Stats tracks 7 models on this benchmark, scored on a 0–1 scale. The current average is 0.616, with the leader at 0.722.

Current leaders

Seed 2.1 Pro from ByteDance currently leads the OfficeQA Pro leaderboard with a score of 0.722 across 7 evaluated AI models.

Seed 2.1 Pro (ByteDance): 72.2%
Seed 2.1 Turbo (ByteDance): 71.1%
Claude Opus 4.8 (Anthropic): 66.2%
Kimi K3 (open-weight, rank #4): 63.3%

FAQ

Common questions about the OfficeQA Pro benchmark and leaderboard.

What is the OfficeQA Pro benchmark?

What is the OfficeQA Pro leaderboard?

The OfficeQA Pro leaderboard ranks 7 AI models based on their performance on this benchmark. Currently, Seed 2.1 Pro by ByteDance leads with a score of 0.722. The average score across all models is 0.616.

What is the highest OfficeQA Pro score?

The highest OfficeQA Pro score is 0.722, achieved by Seed 2.1 Pro from ByteDance.

How many models are evaluated on OfficeQA Pro?

Seven models have been evaluated on the OfficeQA Pro benchmark. All 7 results are self-reported; none are verified.

What categories does OfficeQA Pro cover?

OfficeQA Pro is categorized under reasoning, general, and agents. The benchmark evaluates text models.

What is the best open-source model on OfficeQA Pro?

Kimi K3 by Moonshot AI is the top-ranked open-source model on OfficeQA Pro, with a score of 0.633 (rank #4).

Which model offers the best value on OfficeQA Pro?

Among models that score within 10% of the leader and have listed pricing, Claude Opus 4.8 from Anthropic is the cheapest, at $5.00 per million input tokens with a score of 0.662.
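
The selection above can be reproduced with a short sketch. It rests on assumptions that the page does not state: that "within 10% of the leader" means a score of at least 90% of the leader's score, and that models without a listed price are skipped. The `Entry` type and `best_value` function are hypothetical names used only for illustration, and only the four models with published scores are included.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    model: str
    score: float                   # benchmark score on the 0-1 scale
    input_price: Optional[float]   # USD per million input tokens; None if not listed


# Scores and prices as published on the leaderboard (top four models only).
ENTRIES = [
    Entry("Seed 2.1 Pro", 0.722, None),
    Entry("Seed 2.1 Turbo", 0.711, None),
    Entry("Claude Opus 4.8", 0.662, 5.00),
    Entry("Kimi K3", 0.633, 3.00),
]


def best_value(entries, tolerance=0.10):
    """Return the cheapest priced entry scoring within `tolerance` of the leader.

    Assumption: "within 10%" is read as relative, i.e. score >= leader * (1 - tolerance).
    Entries with no listed price are excluded from the comparison.
    """
    leader_score = max(e.score for e in entries)
    cutoff = leader_score * (1 - tolerance)
    candidates = [
        e for e in entries
        if e.score >= cutoff and e.input_price is not None
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.input_price)


pick = best_value(ENTRIES)
print(pick.model if pick else "no priced model within the cutoff")
```

Under this reading the cutoff is about 0.650, so Kimi K3 (0.633) falls just short despite its lower price, and the two Seed models are skipped because no price is listed for them. A looser tolerance, or an absolute margin of 0.10, would change the result.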

How recent are the OfficeQA Pro leaderboard results?

The OfficeQA Pro leaderboard was last updated in July 2026 and currently includes 7 evaluated models.