Unisound U2 is the next-generation flagship model from Unisound (Yunzhisheng, HKEX: 9678), released June 8, 2026. It is a native agentic Mixture-of-Experts model with 266B total and 10B active parameters, designed to autonomously decompose, execute, verify, and optimize multi-step tasks in a closed loop.

How much does Unisound U2 cost?

U2 pricing is $0.15 per 1M input tokens and $0.30 per 1M output tokensthrough Unisound's MaaS API. Cached input tokens are $0.003 per 1M, a 50x discount that makes repeated long prompts in agent loops very cheap.

What are Unisound U2's benchmark scores?

On LLM Stats' ZeroEval hub, U2 scores 86.9% on GPQA Diamond, 85.8% on MATH-500, 73.3% on AIME 2025, and 73.4% on SWE-bench Verified. It is weaker on grounded factuality (44.3% FACTS Grounding) and adversarial long-context recall (8.5% NoLiMa hard, 16K). All scores are independently verified, not self-reported.

How many parameters does Unisound U2 have?

U2 is a Mixture-of-Experts model with 266 billion total parameters and 10 billion active per token (about 3.8% activation). It was trained on 15 trillion tokens with a knowledge cutoff of January 31, 2026.

Is Unisound U2 multimodal?

No. At launch, U2 is text in, text out. It does not accept image, audio, or video input. Unisound's earlier UniGPT model carried multimodal capabilities, but the U2 release is a text-only reasoning and agent model.

Where can I try Unisound U2?

You can run U2 against any other model for free on the LLM Stats Playground. It is also served directly through Unisound's OpenAI-compatible MaaS API.

Back to blog

Model Release·Technical Deep Dive

Unisound U2: Benchmarks, Pricing, and Full Specs

Unisound U2 is a 266B-total / 10B-active MoE built for agents. Independently verified: 86.9% GPQA Diamond, 85.8% MATH-500, 73.4% SWE-bench Verified, at $0.15/$0.30 per 1M tokens.

Sebastian Crossa

Co-Founder @ LLM Stats

Jun 25, 2026·10 min read

Unisound released U2 on June 8, 2026, its next-generation flagship and the first model the company has positioned squarely as an agent foundation model. Unisound (Yunzhisheng, HKEX: 9678) is a Beijing AI company founded in 2012, best known for conversational and healthcare AI deployed across more than 100 hospitals, and for its earlier 60B-parameter UniGPT model. U2 is a different kind of release: a 266B-total, 10B-active Mixture-of-Experts model built to run long, multi-step workflows on its own.

The headline is intelligence density. U2 posts frontier-class reasoning and coding scores from a small active footprint, and it prices aggressively at $0.15 / $0.30 per million tokens. Every benchmark in this post was run on the LLM Stats ZeroEval hub, so the numbers below are independently verified rather than self-reported by the lab.

Key Numbers

Unisound U2 · June 5, 2026

0.0%

GPQA Diamond

0.0%

MATH-500

0.0%

SWE-bench Verified

0.0%

AIME 2025

Active parameters

$0.00

Input / 1M tokens

266B total parameters, 10B active per token. Frontier-class reasoning and code at a fraction of frontier pricing, with every score above independently verified on the ZeroEval hub.

At a Glance

Release date: June 8, 2026. Announced and released the same day.
Developer: Unisound / Yunzhisheng (HKEX: 9678), Beijing.
Model ID: u2.
Architecture: Mixture-of-Experts, 266B total parameters, 10B active per token.
Training tokens: 15 trillion.
Modalities: Text input, text output. Not multimodal at launch.
Knowledge cutoff: January 31, 2026.
License: Proprietary.
Pricing: $0.15 input / $0.30 output / $0.003 cached input per 1M tokens.
API: OpenAI-compatible MaaS endpoint (maas-api.hivoice.cn), plus LLM Stats.
Context window: Not specified by Unisound at launch.

The Three Advances

Unisound frames U2 around three claims: high intelligence density, high token efficiency, and a native agent architecture. They are tightly related. The first two are two sides of the same engineering bet, that careful data curation and dense semantic representation let a smaller model do the work of a larger one.

What U2 is built on

Unisound · 3 advances

Three bets,
one model.

Intelligence Density

More capability packed into a smaller footprint.

Refined data curation over 15T training tokens
Dense semantic representation per parameter
Frontier-class scores from 10B active parameters

10B

active / 266B total

Token Efficiency

Higher information yield for every token spent.

Fewer tokens to reach the same answer
Lower inference cost without trading away capability
A 50x discount on cached input tokens

$0.15

per 1M input tokens

Native Agent Architecture

Decompose, execute, verify, and optimize in one loop.

Native reasoning-path distillation framework
Closed-loop task execution, not single-shot answers
Built to chain 100+ steps in real-world workflows

100+

steps per workflow

Framing per Unisound's U2 announcement. Step count refers to multi-step agentic workflows the model is designed to run end to end.

The number that anchors all of this is the activation ratio. 10B active out of 266B total means roughly 3.8% of the network fires on any given token, so you pay inference cost closer to a 10B model while drawing on the knowledge of a 266B one. Combined with a per-token price of $0.15 input / $0.30 output, that is the entire pitch: capability that used to require a frontier-tier model, at a price that makes high-volume agent loops economical. The claim only holds if the scores are real, which is where the benchmarks come in.

Benchmarks

Every score below was produced on the LLM Stats ZeroEval hub on June 15, 2026. These are not Unisound's self-reported numbers. We ran the standard subsets and report them as measured, including the ones that are unflattering.

Independently verified

ZeroEval hub · Jun 2026

Frontier reasoning.
Honest weak spots.

Reasoning & Knowledge

GPQA Diamond

diamond

86.9

MATH-500

test

85.8

AIME 2025

2025

73.3

Coding & Agents

SWE-bench Verified

verified

72.2

Terminal-Bench 2.0

full

43.8

Multi-Challenge

default

37.6

Long Context & Grounding

MRCR v2

8-needle · 4–8K

76.6

LongBench v2

short

54.4

FACTS Grounding

default

44.3

HealthBench

hard · mean

19.1

NoLiMa

hard · 16K

8.5

Accuracy unless noted. HealthBench reports mean score on the hard subset. All runs executed on the ZeroEval hub, June 15, 2026. Subsets shown beneath each benchmark.

Where U2 is strong

On structured reasoning and code, U2 lands in frontier territory. 86.9% on GPQA Diamond and 85.8% on MATH-500 are top-tier results, and 73.3% on AIME 2025 confirms it can handle competition-grade math rather than just textbook problems. 73.4% on SWE-bench Verified puts it firmly in the useful-coding-assistant band. For a model with a 10B active footprint, this is the result that makes the intelligence-density claim credible.

Benchmark	Subset	Score
GPQA Diamond	diamond	86.9%
MATH-500	test	85.8%
AIME 2025	2025	73.3%
SWE-bench Verified	verified	73.4%
MRCR v2	8-needle, 4-8K	76.6%

Where U2 is weak

The profile is not uniform, and the soft spots cluster in two places. The first is grounded factuality and instruction following: 44.3% on FACTS Grounding and 37.6% on Multi-Challenge say U2 is more comfortable reasoning toward an answer than staying tightly anchored to supplied facts across many turns. The second is adversarial long-context recall: 54.4% on LongBench v2 (short) is middling, and 8.5% on NoLiMa (hard, 16K) is a genuine weakness, the kind of needle-in-a-haystack stress test where dense distractors break retrieval. HealthBench scores 19.1% on its hard subset, measured as mean score rather than accuracy.

Benchmark	Subset	Score
LongBench v2	short	54.4%
FACTS Grounding	default	44.3%
Terminal-Bench 2.0	full	43.8%
Multi-Challenge	default	37.6%
HealthBench	hard, mean score	19.1%
NoLiMa	hard, 16K	8.5%

The honest read: U2 is a reasoning and coding specialist, not a long-context retrieval engine. If your workload depends on pulling exact facts out of large, noisy contexts, test it carefully before you commit. If your workload is math, code, and structured problem solving, the scores are hard to argue with at this price.

Native Agent Architecture

The third advance is the one Unisound leans on hardest. U2 was trained with what the company calls a native reasoning-path distillation framework, and the practical result is a model built to run a closed loop rather than emit a single answer. Unisound describes U2 as able to autonomously decompose, execute, verify, and optimize a task, and frames it as capable of chaining 100+ steps in complex real-world workflows.

The agent loop

native, not bolted on

It does not answer.
It finishes the task.

Decompose

Break the goal into ordered sub-tasks.

Execute

Run each step with tools and code.

Verify

Check results against the objective.

Optimize

Correct, refine, and re-plan.

Closed loop · repeats until the task is verified complete

Trained with a native reasoning-path distillation framework, U2 runs this loop end to end rather than returning a single-shot response.

The distinction matters for how you use it. A model that is good at single-turn answers still needs an external harness to plan, call tools, check its own work, and retry. U2 is designed to internalize that loop, which is why the strong SWE-bench Verified result (a benchmark that rewards iterating toward a working patch) lines up with the architecture story better than a one-shot Q&A score would. The weaker Terminal-Bench 2.0 number (43.8%) is the counterweight: fully autonomous, open-ended terminal tasks are still hard, and U2 is no exception.

Pricing & Access

Detail	Value
Input price	$0.15 / 1M tokens
Output price	$0.30 / 1M tokens
Cached input	$0.003 / 1M tokens
Total parameters	266B (Mixture-of-Experts)
Active parameters	10B per token
Model ID	`u2`
API	OpenAI-compatible MaaS endpoint
License	Proprietary

At $0.15 input and $0.30 output per million tokens, U2 lands in the cheapest tier of any model posting these reasoning scores, roughly 1 to 3% of what frontier-tier models list for. The detail that matters most for agents is the cache: $0.003 per million cached input tokens is a 50x discount, so a long system prompt or tool spec reused across dozens of loop iterations costs almost nothing after the first call. For multi-step agent harnesses that re-send the same preamble on every turn, cached input becomes the dominant cost lever, and U2 prices it to near zero.

U2 is served through Unisound's OpenAI-compatible MaaS API and is available to run on LLM Stats. There is no public open-weights release at launch; the license is proprietary.

Bottom Line

U2 is a clean expression of one idea: push intelligence density and token efficiency hard enough that a 10B-active model can do frontier-class reasoning and code, then price it so low that running it in long agent loops is a rounding error. On the benchmarks that test that thesis, GPQA Diamond, MATH-500, AIME, and SWE-bench Verified, the verified scores hold up.

The caveats are equally clear. U2 gives ground on grounded factuality and falls off sharply on adversarial long-context recall, and Unisound has not disclosed a context window. This is a reasoning and agent model first. If that matches your workload, U2 is one of the most cost-effective options available right now. If you need reliable recall over large, messy contexts, verify on your own data before you switch. The fastest way to decide is to run it against your current model in the LLM Stats Playground.

Questions

Frequently Asked Questions

Unisound U2 is the next-generation flagship model from Unisound (Yunzhisheng, HKEX: 9678), released June 8, 2026. It is a native agentic Mixture-of-Experts model with 266B total and 10B active parameters, designed to autonomously decompose, execute, verify, and optimize multi-step tasks in a closed loop.
U2 pricing is $0.15 per 1M input tokens and $0.30 per 1M output tokensthrough Unisound's MaaS API. Cached input tokens are $0.003 per 1M, a 50x discount that makes repeated long prompts in agent loops very cheap.
On LLM Stats' ZeroEval hub, U2 scores 86.9% on GPQA Diamond, 85.8% on MATH-500, 73.3% on AIME 2025, and 73.4% on SWE-bench Verified. It is weaker on grounded factuality (44.3% FACTS Grounding) and adversarial long-context recall (8.5% NoLiMa hard, 16K). All scores are independently verified, not self-reported.
U2 is a Mixture-of-Experts model with 266 billion total parameters and 10 billion active per token (about 3.8% activation). It was trained on 15 trillion tokens with a knowledge cutoff of January 31, 2026.
No. At launch, U2 is text in, text out. It does not accept image, audio, or video input. Unisound's earlier UniGPT model carried multimodal capabilities, but the U2 release is a text-only reasoning and agent model.
You can run U2 against any other model for free on the LLM Stats Playground. It is also served directly through Unisound's OpenAI-compatible MaaS API.

At a Glance

The Three Advances

Three bets,one model.

Intelligence Density

Token Efficiency

Native Agent Architecture

Benchmarks

Frontier reasoning.Honest weak spots.

Where U2 is strong

Where U2 is weak

Native Agent Architecture

It does not answer.It finishes the task.

Pricing & Access

Bottom Line

Three bets,
one model.

Frontier reasoning.
Honest weak spots.

It does not answer.
It finishes the task.