LongCat-Flash-Chat
Overview
LongCat-Flash-Chat is Meituan's first open-source foundation model: a 560B-parameter Mixture-of-Experts (MoE) model that dynamically activates 18.6B–31.3B parameters (~27B on average) per token based on contextual demands. It features Zero-Computation Experts for efficient routing and supports a 128K-token context window. Optimized for conversational and agentic tasks, it delivers competitive performance across reasoning, coding, instruction-following, and domain benchmarks, with particular strengths in tool use and complex multi-step interactions, and achieves inference speeds of over 100 tokens per second on H800 GPUs.
LongCat-Flash-Chat was released on August 29, 2025. API access is available through Meituan.
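The dynamic activation described above can be illustrated with a toy routing sketch. This is a hypothetical illustration of the general "zero-computation expert" idea, not the actual LongCat-Flash implementation: the router scores both real FFN experts and identity (zero-computation) experts, so a token routed to an identity expert adds no parameters or FLOPs, and the activated parameter count varies per token.

```python
import math
import random

random.seed(0)

D = 8          # toy hidden size (illustrative only)
N_REAL = 4     # real FFN experts
N_ZERO = 2     # zero-computation (identity) experts
TOP_K = 2      # experts selected per token

# Each toy "expert" is just a D x D weight matrix (real experts only).
experts = [[[random.gauss(0, 1 / math.sqrt(D)) for _ in range(D)]
            for _ in range(D)] for _ in range(N_REAL)]
router_w = [[random.gauss(0, 1) for _ in range(N_REAL + N_ZERO)]
            for _ in range(D)]

def matvec(w, x):
    return [sum(w[i][j] * x[j] for j in range(len(x))) for i in range(len(w))]

def moe_forward(x):
    """Route one token through top-k experts; identity experts cost nothing."""
    logits = [sum(x[i] * router_w[i][e] for i in range(D))
              for e in range(N_REAL + N_ZERO)]
    top = sorted(range(len(logits)), key=logits.__getitem__)[-TOP_K:]
    z = [math.exp(logits[e]) for e in top]
    gates = [v / sum(z) for v in z]          # softmax over the selected experts
    out = [0.0] * D
    active_params = 0
    for g, e in zip(gates, top):
        h = matvec(experts[e], x) if e < N_REAL else x  # identity = zero compute
        if e < N_REAL:
            active_params += D * D
        out = [o + g * v for o, v in zip(out, h)]
    return out, active_params

y, n = moe_forward([random.gauss(0, 1) for _ in range(D)])
print(len(y), n)   # activated parameters range from 0 to TOP_K * D*D per token
```

Because some of the top-k slots can land on identity experts, the per-token compute is not fixed; averaged over many tokens this is what yields an activation count between the minimum and maximum bounds, as in the 18.6B–31.3B range quoted above.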
Benchmarks
LongCat-Flash-Chat Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for LongCat-Flash-Chat across different providers:
| Provider | Input ($/M) | Output ($/M) | Max Input | Max Output | Latency (s) | Throughput | Quantization | Input Modalities | Output Modalities |
|---|---|---|---|---|---|---|---|---|---|
| Meituan | $0.30 | $1.20 | 128.0K | 128.0K | 3.0 | 100 tok/s | — | Text, Image, Audio, Video | Text, Image, Audio, Video |
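Given the per-million-token rates in the table above, the cost of a request can be estimated directly. A minimal sketch (the helper name and example token counts are illustrative, not part of any official SDK):

```python
# Rates from the pricing table above (USD per million tokens).
INPUT_RATE = 0.30
OUTPUT_RATE = 1.20

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one LongCat-Flash-Chat request at Meituan's rates."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Example: a 12,000-token prompt producing a 1,500-token reply.
cost = request_cost(12_000, 1_500)
print(f"${cost:.6f}")  # → $0.005400
```

Note that output tokens cost 4x as much as input tokens, so long generations dominate the bill even with large prompts.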