
🟩 Nemotron 3 Nano Launch: what you need to know



Nemotron 3 Nano is an open ~30B model built for agent workflows. It interleaves Mamba‑2 and Transformer layers with a Mixture‑of‑Experts router that activates only 6 of 128 experts per token, so roughly 3.6B of its 31.6B parameters do work on any given step. The result is stronger reasoning and longer context without paying for the full model on every token.
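
Here is a minimal sketch of what that sparse top‑k routing looks like. The 6‑of‑128 numbers come from the post; the hidden size, layer shapes, and router internals are illustrative assumptions, not the model's real architecture.

```python
# Toy top-k Mixture-of-Experts routing: each token is sent to only
# k of the available experts, so only a fraction of the total
# parameters runs per step. Sizes are illustrative, not the model's.
import torch
import torch.nn as nn

NUM_EXPERTS = 128   # total experts in the layer (per the post)
TOP_K = 6           # experts activated per token (per the post)
D_MODEL = 512       # hidden size, chosen arbitrarily for the demo

experts = nn.ModuleList([nn.Linear(D_MODEL, D_MODEL) for _ in range(NUM_EXPERTS)])
router = nn.Linear(D_MODEL, NUM_EXPERTS)  # scores every expert for each token

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """x: (tokens, d_model) -> weighted mix of the top-k experts' outputs."""
    scores = router(x)                          # (tokens, num_experts)
    weights, idx = scores.topk(TOP_K, dim=-1)   # pick 6 of 128 per token
    weights = weights.softmax(dim=-1)           # normalize only the chosen few
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):                 # plain loops, kept simple for clarity
        for slot in range(TOP_K):
            expert = experts[idx[t, slot].item()]
            out[t] += weights[t, slot] * expert(x[t])
    return out

print(moe_forward(torch.randn(4, D_MODEL)).shape)  # torch.Size([4, 512])
```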


Why it matters

  • It’s fast: up to 4x higher token throughput than Nemotron 2 Nano, and strong throughput versus similar open models in like‑for‑like tests.

  • It handles long input: up to 1M tokens. Most people will use smaller windows due to VRAM, but training included a 512k stage to keep short‑task accuracy steady.

  • It’s controllable: you can toggle reasoning on/off and set a thinking budget to cap “reasoning” tokens, which helps keep multi‑agent costs predictable (a rough sketch of what that call might look like follows this list).

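The exact controls depend on the serving stack, but here is a rough sketch of toggling reasoning and capping thinking tokens through an OpenAI‑compatible endpoint. The model id, endpoint URL, and the parameter names inside extra_body are assumptions, not the documented API; check the model card for the real flags.

```python
# Hypothetical sketch: calling a locally served model through an
# OpenAI-compatible endpoint and capping its reasoning tokens.
# The extra_body keys are assumptions, not documented parameters.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="nvidia/nemotron-3-nano",  # placeholder model id
    messages=[{"role": "user", "content": "Plan the next three agent steps."}],
    extra_body={
        "reasoning": True,            # assumed toggle for thinking mode
        "max_thinking_tokens": 1024,  # assumed cap on "reasoning" tokens
    },
)
print(resp.choices[0].message.content)
```
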

Quality and training

  • Good results on math and coding benchmarks (GSM8K, MATH, HumanEval, MBPP), plus solid long‑context performance on the RULER benchmark, shown below with Nemotron 3 Nano’s scores highlighted on the right.

  • Post‑training uses supervised fine‑tuning, reinforcement learning across multiple environments, and RLHF. NeMo Gym and NeMo RL are open, so you can train and evaluate in similar setups.


[Figure: RULER long‑context benchmark scores, with Nemotron 3 Nano highlighted on the right]


Open and deployable

  • Open weights, recipes, and large portions of the training datasets are released. Licensed for commercial use under NVIDIA’s Open Model License, with derivatives allowed and no claim on outputs. (Hugging Face)

  • Run it on H100/A100 with vLLM or SGLang, or use llama.cpp/LM Studio for local tests. It’s available via Hugging Face and major inference providers (see the sketch after this list).

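As a quick local smoke test, here is a minimal offline‑generation sketch with vLLM. The model id is a placeholder; substitute the actual Hugging Face repo name, and expect to need a GPU with enough VRAM for the weights and your chosen context window.

```python
# Minimal vLLM offline-inference sketch. The model id below is a
# placeholder; swap in the real Hugging Face repo name.
from vllm import LLM, SamplingParams

llm = LLM(model="nvidia/nemotron-3-nano")      # placeholder HF repo id
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Summarize the tradeoffs of MoE models."], params)
print(outputs[0].outputs[0].text)
```
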

TL;DR

  • If you’re building agents that need speed, long context, and cost controls, Nemotron 3 Nano is a good starting point. Route harder tasks to a frontier model when needed, and keep routine work on Nano (see the sketch below).

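One way to read that recommendation in code: a toy router that sends long or hard‑looking tasks to a frontier model and keeps everything else on Nano. The model names and the difficulty heuristic are placeholders, not a recommended policy.

```python
# Toy task router: cheap/routine work stays on the small model,
# anything flagged as hard goes to a larger frontier model.
# Model names and the heuristic are illustrative placeholders.
NANO = "nemotron-3-nano"
FRONTIER = "some-frontier-model"

def pick_model(task: str, needs_deep_reasoning: bool = False) -> str:
    hard = needs_deep_reasoning or len(task) > 20_000  # crude difficulty proxy
    return FRONTIER if hard else NANO

print(pick_model("Rename these files and update the imports."))      # nemotron-3-nano
print(pick_model("Prove this scheduling problem is NP-hard.", True))  # some-frontier-model
```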
