Step3-VL-10B
Overview
STEP3-VL-10B is a lightweight open-source multimodal foundation model that aims to deliver frontier-level multimodal intelligence at a compact size. It is trained with a unified, fully unfrozen pre-training strategy on 1.2T multimodal tokens and pairs a language-aligned Perception Encoder with a Qwen3-8B decoder. It also features Parallel Coordinated Reasoning (PaCoRe), which scales test-time compute for complex perceptual reasoning.
Step3-VL-10B was released on January 15, 2026.
Benchmarks
Step3-VL-10B Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for Step3-VL-10B across different providers:
API Access
API access for Step3-VL-10B will be available soon through our gateway.