GLM-4.7-Flash
Overview
GLM-4.7-Flash is a high-speed, cost-efficient variant of GLM-4.7 optimized for fast inference and lower latency. It retains the coding-centric capabilities of GLM-4.7, including thinking before acting, reasoning preserved across turns, and per-request thinking control for trading speed against accuracy. It is well suited to applications that need quick responses while maintaining strong performance on coding, agentic workflows, and general reasoning tasks.
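Per-request thinking control can be sketched as a flag in the request payload. This is a minimal illustration only: the `glm-4.7-flash` model identifier and the `thinking` field shape are assumptions modeled on ZAI's published GLM chat API conventions, so verify against the official API reference before use.

```python
# Sketch of per-request thinking control for a ZAI-style chat API.
# The model name and the "thinking" field are assumptions, not confirmed API.

def build_request(prompt: str, enable_thinking: bool) -> dict:
    """Build a chat-completion payload that toggles thinking per request."""
    return {
        "model": "glm-4.7-flash",  # hypothetical model identifier
        "messages": [{"role": "user", "content": prompt}],
        # Disabling thinking trades some accuracy for lower latency.
        "thinking": {"type": "enabled" if enable_thinking else "disabled"},
    }

fast = build_request("Summarize this diff.", enable_thinking=False)
careful = build_request("Refactor this module for clarity.", enable_thinking=True)
print(fast["thinking"]["type"], careful["thinking"]["type"])  # → disabled enabled
```

A latency-sensitive caller (e.g. inline code completion) would send the `disabled` variant, while an agentic multi-step task would keep thinking enabled.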
GLM-4.7-Flash was released on January 19, 2026. API access is available through ZAI.
Performance
Timeline
Specifications
Benchmarks
GLM-4.7-Flash Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for GLM-4.7-Flash across different providers:
| Provider | Input ($/M) | Output ($/M) | Max Input | Max Output | Latency (s) | Throughput | Quantization | Input Modalities | Output Modalities |
|---|---|---|---|---|---|---|---|---|---|
| ZAI | $0.07 | $0.40 | 128.0K | 16.4K | 2.0 | 50.0 c/s | — | Text, Image, Audio, Video | Text, Image, Audio, Video |
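The listed ZAI rates translate into per-request cost as follows; this is a back-of-the-envelope sketch using only the $0.07/M input and $0.40/M output figures from the table above.

```python
# Rough cost estimator for the ZAI rates listed above.
INPUT_PRICE_PER_M = 0.07   # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.40  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# A 100K-token prompt with a 20K-token reply costs about $0.015.
print(round(request_cost(100_000, 20_000), 4))  # → 0.015
```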
API Access
API Access Coming Soon
API access for GLM-4.7-Flash will be available soon through our gateway.
FAQ
Common questions about GLM-4.7-Flash
