LongCat-Flash-Lite
Overview
LongCat-Flash-Lite is a lightweight Mixture-of-Experts (MoE) model from Meituan with 68.5B total parameters, of which only 2.9B to 4.5B are activated per token. It explores N-gram embedding expansion as a new scaling direction and supports a 256K-token context length via YaRN. The model is optimized for agentic tool use and programming tasks. It reaches inference speeds of 500 to 700 tokens per second while maintaining strong performance on coding, math, and agentic benchmarks.
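To put the MoE sparsity in perspective, the short Python sketch below computes what share of the total parameters is activated per token. It uses only the parameter counts stated above. The constant and function names are illustrative and are not part of any Meituan API.

```python
# Parameter counts from the overview, in billions.
TOTAL_PARAMS_B = 68.5
ACTIVE_MIN_B = 2.9
ACTIVE_MAX_B = 4.5


def activated_fraction(active_b: float, total_b: float = TOTAL_PARAMS_B) -> float:
    """Return the share of total parameters used for a single token."""
    return active_b / total_b


low = activated_fraction(ACTIVE_MIN_B)
high = activated_fraction(ACTIVE_MAX_B)
print(f"{low:.1%} to {high:.1%} of parameters active per token")
```

Run as-is, this prints that roughly 4.2% to 6.6% of the 68.5B parameters participate in each token's forward pass.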
LongCat-Flash-Lite was released on February 5, 2026. API access is available through Meituan.
Benchmarks
LongCat-Flash-Lite Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for LongCat-Flash-Lite across different providers:
| Provider | Input price ($/M tokens) | Output price ($/M tokens) | Max input (tokens) | Max output (tokens) | Latency (s) | Throughput | Quantization | Input modalities | Output modalities |
|---|---|---|---|---|---|---|---|---|---|
| Meituan | $0.10 | $0.40 | 256.0K | 128.0K | 1.5 | 500.0 c/s | — | Text, Image, Audio, Video | Text, Image, Audio, Video |
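As a worked example of the per-million-token pricing, the Python sketch below estimates the cost of a single request at the listed Meituan rates. The request size (a 100K-token prompt with a 4K-token completion) is a hypothetical figure chosen for illustration. The helper name is also illustrative, and actual billing may include details not reflected in the table.

```python
# Listed Meituan prices in USD per million tokens (from the pricing table).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# Hypothetical request: 100K-token prompt, 4K-token completion.
cost = request_cost_usd(100_000, 4_000)
print(f"${cost:.4f}")
```

For this hypothetical request, the input side contributes $0.0100 and the output side $0.0016, for a total of about $0.0116.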