Global PIQA

Progress Over Time

[Interactive timeline: model performance on Global PIQA over time, showing the state-of-the-art frontier for open and proprietary models.]

Global PIQA Leaderboard

13 models

Rank | Model | Organization | Parameters | Context | Cost per 1M tokens (input / output)
1 | Gemini 3 Pro | Google | - | - | -
2 | Gemini 3 Flash | Google | - | 1.0M | $0.50 / $3.00
3 | Qwen3.7 Max | Alibaba Cloud / Qwen Team | - | 1.0M | $1.25 / $3.75
4 | - | Alibaba Cloud / Qwen Team | - | 1.0M | $0.32 / $1.28
5 | - | Alibaba Cloud / Qwen Team | - | 1.0M | $0.50 / $3.00
5 | Qwen3.5-397B-A17B | Alibaba Cloud / Qwen Team | 397B | - | -
7 | - | Alibaba Cloud / Qwen Team | 122B | - | -
8 | Qwen3.5-27B | Alibaba Cloud / Qwen Team | 27B | 262K | $0.30 / $2.40
9 | - | Alibaba Cloud / Qwen Team | 35B | - | -
10 | - | Alibaba Cloud / Qwen Team | 9B | - | -
11 | - | Alibaba Cloud / Qwen Team | 4B | - | -
12 | - | Alibaba Cloud / Qwen Team | 2B | - | -
13 | - | Alibaba Cloud / Qwen Team | 800M | - | -

A dash marks a value that is not listed on this page.
About this benchmark

What is Global PIQA?

Global PIQA is a multilingual commonsense reasoning benchmark that evaluates physical interaction knowledge across 100 languages and cultures. It uses multiple-choice questions about everyday situations to test how well AI systems understand the physical world in diverse cultural contexts.

Global PIQA is a text benchmark evaluating models on physics, reasoning, and general tasks. LLM Stats tracks 13 models on this benchmark, scored on a 0–1 scale. The current average is 0.847, with the leader at 0.934.
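As a rough illustration of how a multiple-choice benchmark can yield a score on a 0–1 scale, the sketch below computes plain accuracy: the fraction of questions where the chosen option matches the answer key. That Global PIQA scores are computed exactly this way is an assumption, and the function name `accuracy` and the toy data are hypothetical.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Fraction of items where the predicted option index matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one item to score")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


# Hypothetical toy data (not from the benchmark): chosen option per question.
toy_predictions = [0, 1, 1, 0, 1]
toy_answers = [0, 1, 0, 0, 1]

score = accuracy(toy_predictions, toy_answers)
# Show the same score in both the 0-1 form and the percentage form.
print(f"{score:.3f} on the 0-1 scale = {score:.1%}")
```

Under this reading, a leaderboard score of 0.934 and a displayed value of 93.4% are the same number in two notations.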

Compare leaders on the related best AI for physics, best AI for reasoning, and best AI for general tasks leaderboards.

Current leaders

Gemini 3 Pro from Google currently leads the Global PIQA leaderboard with a score of 0.934, the highest among the 13 evaluated AI models.

1. Gemini 3 Pro (Google): 93.4%
2. Gemini 3 Flash (Google): 92.8%
3. Qwen3.7 Max (Alibaba Cloud / Qwen Team): 91.4%
Top open-weight model: Qwen3.5-397B-A17B (rank #5): 89.8%

FAQ

Common questions about the Global PIQA benchmark and leaderboard.

What is the Global PIQA benchmark?

Global PIQA is a multilingual commonsense reasoning benchmark that evaluates physical interaction knowledge across 100 languages and cultures. It uses multiple-choice questions about everyday situations to test how well AI systems understand the physical world in diverse cultural contexts.

What is the Global PIQA leaderboard?

The Global PIQA leaderboard ranks 13 AI models based on their performance on this benchmark. Currently, Gemini 3 Pro by Google leads with a score of 0.934. The average score across all models is 0.847.

What is the highest Global PIQA score?

The highest Global PIQA score is 0.934, achieved by Gemini 3 Pro from Google.

How many models are evaluated on Global PIQA?

Thirteen models have been evaluated on the Global PIQA benchmark. All 13 results are self-reported; none are verified.

What categories does Global PIQA cover?

Global PIQA is categorized under physics, reasoning, and general. The benchmark evaluates text models with multilingual support.

What's the difference between Global PIQA and PIQA?

Global PIQA is a variant of PIQA. See the PIQA leaderboard for the broader benchmark and per-model comparison.

What is the best open-source model on Global PIQA?

Qwen3.5-397B-A17B by Alibaba Cloud / Qwen Team is the top-ranked open-source model on Global PIQA, with a score of 0.898 (rank #5).

Which model offers the best value on Global PIQA?

Among models scoring within 10% of the leader, Qwen3.5-27B from Alibaba Cloud / Qwen Team is the cheapest, at $0.30 per million input tokens with a score of 0.875.
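The selection rule in this answer can be sketched in a few lines. The example uses only scores and input prices that appear on this page, with prices matched to models by leaderboard rank (an assumption). It also reads "within 10% of the leader" as a score of at least 90% of the leader's score, which is one possible interpretation. The names `Entry` and `best_value` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entry:
    model: str
    score: float                  # 0-1 scale
    input_price: Optional[float]  # USD per million input tokens; None if not listed


# Values taken from this page; models with no listed input price use None.
ENTRIES = [
    Entry("Gemini 3 Pro", 0.934, None),
    Entry("Gemini 3 Flash", 0.928, 0.50),
    Entry("Qwen3.7 Max", 0.914, 1.25),
    Entry("Qwen3.5-397B-A17B", 0.898, None),
    Entry("Qwen3.5-27B", 0.875, 0.30),
]


def best_value(entries: Sequence[Entry], tolerance: float = 0.10) -> Optional[Entry]:
    """Cheapest priced entry whose score is within `tolerance` of the leader.

    Assumption: "within 10%" is relative, i.e. score >= leader * (1 - tolerance).
    """
    if not entries:
        return None
    leader_score = max(e.score for e in entries)
    cutoff = leader_score * (1 - tolerance)
    candidates = [
        e for e in entries
        if e.input_price is not None and e.score >= cutoff
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda e: e.input_price)


winner = best_value(ENTRIES)
print(winner.model if winner else "no priced model within tolerance")
```

Entries without a listed price are skipped rather than treated as free, so a missing price can never win the comparison.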

How recent are the Global PIQA leaderboard results?

The Global PIQA leaderboard was last updated in June 2026 and currently includes 13 evaluated models.