Nemotron Nano 9B v2
NVIDIA's efficient 9B hybrid-architecture model with Mamba-2 and attention layers
Memory Requirement: 12GB RAM
Precision: NVFP4
Size: 6GB
Jetson Inference - Supported Inference Engines
Container: This model is not supported on this platform.
Model Details
NVIDIA Nemotron Nano 9B v2 is a large language model trained from scratch by NVIDIA and quantized for efficient inference. It is designed as a unified model for both reasoning and non-reasoning tasks: it generates a reasoning trace before concluding with a final response, and the reasoning behavior is configurable via the system prompt.
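A minimal sketch of the system-prompt reasoning toggle described above. The control tokens (`/think`, `/no_think`) and the `<think>...</think>` wrapping of the reasoning trace are assumptions here; check the model card of your deployment for the exact conventions.

```python
import re

def build_messages(user_prompt: str, reasoning: bool = True) -> list:
    """Build an OpenAI-style chat payload, toggling the reasoning trace
    via the system prompt (assumed tokens: /think and /no_think)."""
    system = "/think" if reasoning else "/no_think"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

def final_answer(completion: str) -> str:
    """Strip the reasoning trace (assumed to be wrapped in <think>...</think>)
    so only the final response is shown."""
    return re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()

if __name__ == "__main__":
    msgs = build_messages("What is 2 + 2?", reasoning=True)
    print(msgs[0]["content"])
    print(final_answer("<think>2 plus 2 is 4.</think>The answer is 4."))
```

Sending `build_messages(...)` to any OpenAI-compatible endpoint serving the model would then yield either a traced or an untraced completion, depending on the flag.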
Architecture
The model uses a hybrid architecture:
- 56 layers total: 27 Mamba-2 layers, 25 MLP layers, and 4 attention layers
- Mamba and MLP layers quantized to NVFP4
- Attention layers and Conv1d components kept in BF16 for accuracy
- Quantization-Aware Distillation (QAD) applied to recover accuracy
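A back-of-envelope estimate of why this quantization scheme brings a 9B-parameter model near the listed 6GB. The parameter split and per-weight byte costs below are illustrative assumptions, not exact layer sizes, and scale-factor and embedding overheads are ignored.

```python
# Rough weight-memory estimate for a mostly-NVFP4 9B model.
TOTAL_PARAMS = 9e9
BF16_FRACTION = 0.08   # assumed share kept in BF16 (attention + Conv1d)
NVFP4_BYTES = 0.5      # 4-bit weights ~= 0.5 bytes each (scales ignored)
BF16_BYTES = 2.0       # 16-bit weights = 2 bytes each

nvfp4_params = TOTAL_PARAMS * (1 - BF16_FRACTION)
bf16_params = TOTAL_PARAMS * BF16_FRACTION

gib = (nvfp4_params * NVFP4_BYTES + bf16_params * BF16_BYTES) / 2**30
print(f"~{gib:.1f} GiB of weights")  # same ballpark as the listed 6GB
```

In BF16 throughout, the same 9B parameters would need roughly 17 GiB, which is why the quantized model fits the stated 12GB RAM requirement with room for activations and KV/state caches.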
Inputs and Outputs
Input: Text
Output: Text
Intended Use Cases
- AI Agent Systems: Autonomous agents with reasoning capabilities
- Chatbots: General purpose conversational AI
- RAG Systems: Retrieval-augmented generation applications
- Instruction Following: General instruction-following tasks
- Code Generation: Programming assistance in multiple languages
Supported Languages
English, German, Spanish, French, Italian, Japanese, and common programming languages.
This model is ready for commercial use.