
Step3-VL-10B

Overview

Step3-VL-10B is a lightweight open-source foundation model designed to redefine the trade-off between compact efficiency and frontier-level multimodal intelligence. It is built with a unified, fully unfrozen pre-training strategy (no component is kept frozen) on 1.2T multimodal tokens, integrating a language-aligned Perception Encoder with a Qwen3-8B decoder. It also features Parallel Coordinated Reasoning (PaCoRe), which scales test-time compute for complex perceptual reasoning.
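To make "scaling test-time compute" concrete, here is a minimal Python sketch of the general pattern: run several independent reasoning attempts concurrently, then combine their answers. It is not PaCoRe. This page does not describe PaCoRe's coordination step, so plain majority voting stands in for it, and `parallel_then_aggregate`, `toy_solver`, and the solver signature are hypothetical illustrations, not a Step3-VL-10B API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical signature for one independent reasoning attempt:
# (question, path_index) -> final answer string. Not the Step3-VL-10B API.
Solver = Callable[[str, int], str]


def parallel_then_aggregate(question: str, solver: Solver, n_paths: int = 8) -> str:
    """Spend more test-time compute by running n_paths attempts concurrently,
    then collapse them into one answer. Majority voting is a simple stand-in
    for a coordination step; it is NOT PaCoRe's mechanism."""
    if n_paths < 1:
        raise ValueError("n_paths must be at least 1")
    with ThreadPoolExecutor(max_workers=n_paths) as pool:
        # pool.map preserves input order, so answers[i] comes from path i.
        answers = list(pool.map(lambda i: solver(question, i), range(n_paths)))
    # The most frequent answer wins.
    return Counter(answers).most_common(1)[0][0]


# Toy deterministic solver for illustration only:
# paths 0 and 3 answer "B"; paths 1, 2, 4, 5 answer "A".
def toy_solver(question: str, i: int) -> str:
    return "B" if i % 3 == 0 else "A"


print(parallel_then_aggregate("toy question", toy_solver, n_paths=6))  # prints A
```

In this sketch `n_paths` is the test-time compute knob: more attempts cost more inference but give the aggregation step more evidence to work with.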

Step3-VL-10B was released on January 15, 2026.


Timeline

Released: January 15, 2026
Knowledge Cutoff: Unknown

Specifications

Parameters: 10.0B
License: Apache 2.0
Training Data: Unknown
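The parameter count gives a rough lower bound on the memory needed just to hold the weights. A minimal Python sketch, assuming the nominal 10.0B figure and standard bytes-per-parameter for each format; activations, KV cache, framework overhead, and quantization metadata are ignored, and `weight_memory_gib` is a hypothetical helper.

```python
# Bytes per parameter for common weight formats (weights only).
# int4 ignores the extra scale/zero-point storage real quantizers add.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Approximate memory for model weights alone, in GiB (2**30 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


N_PARAMS = 10.0e9  # nominal count from the spec sheet; the exact figure may differ

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(N_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the loop prints about 37.3 GiB for fp32, 18.6 GiB for bf16, 9.3 GiB for int8, and 4.7 GiB for int4; actual inference needs headroom beyond these figures.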

Benchmarks

[Chart not reproduced: Step3-VL-10B performance across datasets. Scores are sourced from the model's scorecard, paper, or official blog posts. Source: llm-stats.com, Mon Jan 19 2026.]

Pricing

Pricing, performance, and capabilities for Step3-VL-10B across different providers:

No pricing information available for this model.

API Access


API access for Step3-VL-10B will be available soon through our gateway.


FAQ

Common questions about Step3-VL-10B

When was Step3-VL-10B released?
Step3-VL-10B was released on January 15, 2026.

How many parameters does Step3-VL-10B have?
Step3-VL-10B has 10.0 billion parameters.