Gemma 3 12B
Overview
Gemma 3 12B is a 12-billion-parameter vision-language model from Google that accepts text and image input and generates text output. It offers a 128K-token context window, multilingual support, and open weights, and is well suited to question answering, summarization, reasoning, and image-understanding tasks.
Gemma 3 12B was released on March 12, 2025. API access is available through DeepInfra.
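Since DeepInfra exposes an OpenAI-compatible chat endpoint, a request for this model can be built as a standard chat-completion payload. A minimal sketch follows; the model identifier `google/gemma-3-12b-it` and the request shape are assumptions based on the common OpenAI-style convention, so check the provider's documentation for exact values.

```python
import json

# Sketch of an OpenAI-style chat request body for Gemma 3 12B.
# The model identifier below is an assumption; verify it against
# the provider's model list before use.
payload = {
    "model": "google/gemma-3-12b-it",
    "messages": [
        {"role": "user", "content": "Summarize the Gemma 3 release in one sentence."}
    ],
    "max_tokens": 256,
}

# This JSON string would be POSTed to the provider's chat completions endpoint
# with an Authorization header carrying your API key.
body = json.dumps(payload)
```

No network call is made here; the snippet only shows the request structure that an OpenAI-compatible gateway would accept.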
Performance
Timeline
Released: March 12, 2025
Knowledge Cutoff: Unknown
Specifications
Parameters: 12.0B
License: Gemma
Training Data: Unknown
Tags: tuning:instruct
Benchmarks
Gemma 3 12B Performance Across Datasets
Scores sourced from the model's scorecard, paper, or official blog posts
Pricing
Pricing, performance, and capabilities for Gemma 3 12B across different providers:
| Provider | Input ($/M) | Output ($/M) | Max Input | Max Output | Latency (s) | Throughput | Quantization | Input Modalities | Output Modalities |
|---|---|---|---|---|---|---|---|---|---|
| DeepInfra | $0.05 | $0.10 | 131.1K | 131.1K | 0.2 | 33.0 c/s | — | Text, Image | Text |
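At the listed DeepInfra rates ($0.05 per million input tokens, $0.10 per million output tokens), the cost of a request is simple arithmetic. A minimal sketch, using only the rates from the table above:

```python
# Per-token rates derived from the pricing table (USD per million tokens).
INPUT_RATE = 0.05 / 1_000_000   # $ per input token
OUTPUT_RATE = 0.10 / 1_000_000  # $ per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at the listed DeepInfra rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 10,000-token prompt with a 1,000-token completion costs
# 10_000 * 0.05/1e6 + 1_000 * 0.10/1e6 = $0.0005 + $0.0001 = $0.0006
```

Note that actual billed token counts depend on the provider's tokenizer, so treat this as an estimate.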
API Access
API Access Coming Soon
API access for Gemma 3 12B will be available soon through our gateway.
FAQ
Common questions about Gemma 3 12B
When was Gemma 3 12B released?
Gemma 3 12B was released on March 12, 2025 by Google.

Who created Gemma 3 12B?
Gemma 3 12B was created by Google.

How many parameters does Gemma 3 12B have?
Gemma 3 12B has 12.0 billion parameters.

What license is Gemma 3 12B released under?
Gemma 3 12B is released under the Gemma license.

Is Gemma 3 12B multimodal?
Yes, Gemma 3 12B is a multimodal model that can process both text and images as input.
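Since the model accepts mixed text and image input, a chat message can carry both parts in one turn. The sketch below mirrors the common OpenAI-style `image_url` content convention; the exact schema varies by provider, and the URL is a placeholder.

```python
# Hedged sketch: an OpenAI-style multimodal user message combining text
# and an image reference. The "image_url" content type is an assumption
# based on the common convention; the URL below is a placeholder, not a
# real resource.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.png"}},
    ],
}

# The content list preserves part order: text first, then the image.
kinds = [part["type"] for part in message["content"]]
```

A message like this would be placed in the `messages` array of a chat request; the model's reply is text only, matching the output modality above.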