Nemotron Nano 9B v2

NVIDIA's efficient 9B hybrid architecture model with Mamba-2 and attention layers

Memory Requirement 12GB RAM
Precision NVFP4
Size 6GB

Jetson Inference - Supported Inference Engines

  • Container

Model Details

NVIDIA Nemotron Nano 9B v2 is a quantized large language model trained from scratch by NVIDIA, designed as a unified model for both reasoning and non-reasoning tasks. It generates a reasoning trace before its final response, and reasoning can be enabled or disabled via the system prompt.
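A minimal sketch of how the reasoning toggle might be wired into a chat request. Assumptions not stated on this page: the `/think` and `/no_think` system-prompt convention used by NVIDIA's Nemotron chat templates, an OpenAI-compatible serving endpoint, and the model id string, which is hypothetical here.

```python
# Sketch: toggling Nemotron's reasoning trace via the system prompt.
# ASSUMPTIONS: "/think" / "/no_think" control strings and the model id
# are illustrative, taken from NVIDIA's Nemotron conventions, not this page.

def build_request(user_msg: str, reasoning: bool) -> dict:
    """Build an OpenAI-style chat-completions payload with reasoning on/off."""
    system = "/think" if reasoning else "/no_think"
    return {
        "model": "nvidia/NVIDIA-Nemotron-Nano-9B-v2",  # hypothetical id
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
        ],
    }

payload = build_request("Why is the sky blue?", reasoning=True)
```

The payload would then be POSTed to whatever server hosts the model; only the system message changes between the two modes.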

Architecture

The model uses a hybrid architecture:

  • 56 layers total: 27 Mamba-2 layers, 25 MLP layers, 4 attention layers
  • Mamba and MLP layers quantized to NVFP4
  • Attention layers and Conv1d components kept in BF16 for accuracy
  • Quantization-Aware Distillation (QAD) applied to recover accuracy
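A back-of-envelope check that the mixed-precision scheme above is consistent with the listed ~6 GB checkpoint size. The 90% split between quantized and BF16 weights is an illustrative assumption, not a figure from this card; NVFP4 is counted as roughly 4 bits per weight plus per-block scale overhead.

```python
# Rough size estimate for a 9B model with most weights in NVFP4.
# ASSUMPTION: ~90% of parameters sit in the quantized Mamba/MLP layers,
# the remainder in BF16 attention/Conv1d (split is illustrative).
PARAMS = 9e9
QUANT_FRAC = 0.90        # share of weights stored in NVFP4 (assumed)
NVFP4_BYTES = 4.5 / 8    # ~4 bits/weight plus per-block scale overhead
BF16_BYTES = 2.0         # 16 bits/weight

size_gb = (PARAMS * QUANT_FRAC * NVFP4_BYTES
           + PARAMS * (1 - QUANT_FRAC) * BF16_BYTES) / 1e9
print(f"~{size_gb:.1f} GB")  # lands in the ballpark of the listed 6 GB
```

This also shows why keeping only a few layers in BF16 is cheap: the BF16 slice contributes under 2 GB even at a generous 10% of the weights.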

Inputs and Outputs

Input: Text

Output: Text

Intended Use Cases

  • AI Agent Systems: Autonomous agents with reasoning capabilities
  • Chatbots: General purpose conversational AI
  • RAG Systems: Retrieval-augmented generation applications
  • Instruction Following: General instruction-following tasks
  • Code Generation: Programming assistance in multiple languages

Supported Languages

English, German, Spanish, French, Italian, Japanese, and coding languages.

This model is ready for commercial use.