ResearchClawBench

Name: ResearchClawBench Leaderboard — AI Model Scores
Creator: LLM Stats
License: https://llm-stats.com/legal/terms-of-service



ResearchClawBench Leaderboard

1 model

Rank	Model	Organization	Parameters	Context	Cost (input / output, per 1M tokens)	License
1	MiMo-V2.5	Xiaomi	311B	1.0M	$0.17 / $0.34	
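The Cost column appears to give US-dollar prices per million input and output tokens. The sketch below estimates what a single run would cost at those prices. The function name `run_cost_usd` and the token counts are hypothetical and chosen only for illustration; they are not measurements from ResearchClawBench.

```python
def run_cost_usd(input_tokens: int, output_tokens: int,
                 input_price_per_m: float = 0.17,
                 output_price_per_m: float = 0.34) -> float:
    """Estimate the USD cost of one run from token counts.

    Defaults are the per-million-token prices listed for MiMo-V2.5;
    the helper itself is a hypothetical illustration.
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical token counts for one agentic research task.
cost = run_cost_usd(input_tokens=2_000_000, output_tokens=500_000)
print(f"${cost:.2f}")  # prints $0.51
```

Agentic tasks that repeatedly read files and tool outputs tend to consume far more input than output tokens, so the input price usually dominates the estimate.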


About this benchmark

What is ResearchClawBench?

ResearchClawBench evaluates research agents on realistic, tool-using research tasks that require code execution and filesystem workspace interaction.

ResearchClawBench is a text benchmark that evaluates models on research, agentic, and tool-calling tasks. LLM Stats tracks 1 model on this benchmark, scored on a 0–1 scale. The current average is 0.169, which is also the leader's score, since only one model has been evaluated.
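The leader and average figures are simple aggregates over per-model scores on the 0–1 scale. The following is a minimal sketch assuming scores are held in a plain name-to-score mapping; the `summarize` helper is hypothetical and not part of any LLM Stats API. It also shows how a 0–1 score such as 0.169 is displayed as a percentage (16.9%).

```python
from statistics import mean


def summarize(scores: dict[str, float]) -> tuple[str, float, float]:
    """Return (leader name, leader score, average score) for 0-1 scores."""
    leader = max(scores, key=scores.get)
    return leader, scores[leader], mean(scores.values())


# Hypothetical mapping; currently the leaderboard has a single entry.
scores = {"MiMo-V2.5": 0.169}
leader, top, avg = summarize(scores)
print(f"leader={leader} top={top:.3f} avg={avg:.3f} ({top:.1%})")
# prints leader=MiMo-V2.5 top=0.169 avg=0.169 (16.9%)
```

With a single evaluated model, the leader's score and the average necessarily coincide.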

Compare leaders on the related leaderboards for the best AI for research, agents, and tool calling.

Current leaders

MiMo-V2.5 from Xiaomi currently leads the ResearchClawBench leaderboard with a score of 0.169; it is the only model evaluated so far.

MiMo-V2.5 (Xiaomi): 16.9%

FAQ

Common questions about the ResearchClawBench benchmark and leaderboard.

What is the ResearchClawBench benchmark?

ResearchClawBench evaluates research agents on realistic, tool-using research tasks that require code execution and filesystem workspace interaction.

What is the ResearchClawBench leaderboard?

The ResearchClawBench leaderboard ranks AI models by their performance on this benchmark and currently includes 1 model. MiMo-V2.5 by Xiaomi leads with a score of 0.169, which is also the average across all evaluated models.

What is the highest ResearchClawBench score?

The highest ResearchClawBench score is 0.169, achieved by MiMo-V2.5 from Xiaomi.

How many models are evaluated on ResearchClawBench?

1 model has been evaluated on the ResearchClawBench benchmark. Its result is self-reported; there are 0 verified results.

Where can I find the ResearchClawBench dataset?

The ResearchClawBench dataset is available on HuggingFace at https://huggingface.co/datasets/InternScience/ResearchClawBench.

What categories does ResearchClawBench cover?

ResearchClawBench is categorized under research, agents, and tool calling. The benchmark evaluates text models.

What is the best open-source model on ResearchClawBench?

MiMo-V2.5 by Xiaomi is the top-ranked open-source model on ResearchClawBench, with a score of 0.169 (rank #1).

Which model offers the best value on ResearchClawBench?

Among models scoring within 10% of the leader, MiMo-V2.5 from Xiaomi is the cheapest, at $0.17 per million input tokens, with a score of 0.169. As the only evaluated model, it qualifies by default.
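The best-value rule can be expressed as a filter followed by a minimum. The page does not say whether "within 10%" is relative or absolute; this sketch assumes a relative threshold (score of at least 90% of the leader's) and ranks by input price only. The `Entry` class and `best_value` function are hypothetical illustrations, not the site's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    score: float              # benchmark score on the 0-1 scale
    input_price_per_m: float  # USD per million input tokens


def best_value(entries: list[Entry], tolerance: float = 0.10) -> Entry:
    """Return the cheapest entry scoring within `tolerance` of the leader.

    Assumption: "within 10%" means score >= (1 - tolerance) * leader score.
    """
    top = max(e.score for e in entries)
    contenders = [e for e in entries if e.score >= (1 - tolerance) * top]
    # The leader always passes the filter, so `contenders` is never empty.
    return min(contenders, key=lambda e: e.input_price_per_m)


entries = [Entry("MiMo-V2.5", 0.169, 0.17)]
print(best_value(entries).name)  # prints MiMo-V2.5
```

Ranking on input price alone mirrors the FAQ wording; a blended input/output price would be an equally reasonable design if output-heavy workloads matter more.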

How recent are the ResearchClawBench leaderboard results?

The ResearchClawBench leaderboard was last updated in July 2026 and currently includes 1 evaluated model.