MiMo-V2-Flash
Overview
MiMo-V2-Flash is a powerful, efficient, and ultra-fast foundation language model that excels in reasoning, coding, and agentic scenarios. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, featuring a hybrid attention architecture that interleaves sliding-window and full attention at a 5:1 ratio with a 128-token window. It delivers around 150 tokens/sec of inference throughput and supports a 256K-token context window.
MiMo-V2-Flash was released on December 16, 2025. API access is available through Xiaomi.
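To make the hybrid attention layout above more concrete, here is a minimal mask-construction sketch. The exact interleaving order of layers is not stated on this page, so the schedule below (five sliding-window layers followed by one full-attention layer) and all function names are assumptions for illustration only.

```python
import numpy as np

SLIDING_WINDOW = 128      # tokens each sliding-window layer can look back over
SWA_TO_FULL_RATIO = 5     # five sliding-window layers per full-attention layer

def is_full_attention_layer(layer_idx: int) -> bool:
    """Assumed schedule: every sixth layer uses full attention, the rest use SWA."""
    return layer_idx % (SWA_TO_FULL_RATIO + 1) == SWA_TO_FULL_RATIO

def attention_mask(seq_len: int, layer_idx: int) -> np.ndarray:
    """Boolean mask where mask[q, k] is True if query position q may attend to key k."""
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    causal = k <= q                              # standard causal masking
    if is_full_attention_layer(layer_idx):
        return causal                            # full attention over the whole context
    return causal & (q - k < SLIDING_WINDOW)     # restrict to the last 128 tokens

# Example: layer 0 is a sliding-window layer, layer 5 a full-attention layer.
print(attention_mask(8, 0).astype(int))
print(attention_mask(8, 5).astype(int))
```

The sliding-window layers keep per-token attention cost constant regardless of context length, while the occasional full-attention layers preserve long-range information flow across the 256K window.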
Benchmarks
Chart: MiMo-V2-Flash performance across datasets. Scores are sourced from the model's scorecard, paper, or official blog posts.
Pricing
Pricing, performance, and capabilities for MiMo-V2-Flash across different providers:
| Provider | Input ($/M) | Output ($/M) | Max Input | Max Output | Latency (s) | Throughput (tokens/s) | Quantization | Input Modalities | Output Modalities |
|---|---|---|---|---|---|---|---|---|---|
| Xiaomi | $0.10 | $0.30 | 256.0K | 16.4K | — | — | — | Text, Image, Audio, Video | Text, Image, Audio, Video |
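As a quick illustration of the list prices in the table above ($0.10 per million input tokens, $0.30 per million output tokens), the sketch below estimates the cost of a single request; the token counts in the example are arbitrary.

```python
INPUT_PRICE_PER_M = 0.10   # USD per 1M input tokens (Xiaomi list price)
OUTPUT_PRICE_PER_M = 0.30  # USD per 1M output tokens (Xiaomi list price)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in USD at the list prices above."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# e.g. a 200K-token prompt with a 4K-token completion:
print(f"${estimate_cost(200_000, 4_000):.4f}")  # -> $0.0212
```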
API Access
API Access Coming Soon
API access for MiMo-V2-Flash will be available soon through our gateway.
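Once access opens, usage will likely resemble the sketch below, which assumes an OpenAI-compatible chat completions endpoint. The base URL and model identifier are placeholders, not confirmed values; substitute the ones published by your provider.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-gateway.com/v1",  # placeholder gateway URL (assumption)
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mimo-v2-flash",  # placeholder model identifier (assumption)
    messages=[
        {"role": "user", "content": "Summarize the benefits of sliding-window attention."}
    ],
)
print(response.choices[0].message.content)
```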
FAQ
Common questions about MiMo-V2-Flash