Text
Nemotron3 Nano 4B
NVIDIA's compact 4B Nano model with day-0 llama.cpp support on Jetson Orin and Thor
Memory Requirement 4GB RAM
Precision Q4_K_M GGUF
Size 2.5GB
Jetson Inference - Supported Inference Engines
🚀
Container # Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host -v $HOME/.cache/huggingface:/root/.cache/huggingface ghcr.io/nvidia-ai-iot/llama_cpp:latest-jetson-orin llama-server --hf-repo nvidia/NVIDIA-Nemotron-3-Nano-4B-GGUF --hf-file NVIDIA-Nemotron3-Nano-4B-Q4_K_M.gguf --ctx-size 8196 --alias my_model --n-gpu-layers 999 Model Details
Nemotron3 Nano 4B is a compact NVIDIA language model that can be served locally on Jetson with llama.cpp, giving Jetson Orin and Jetson Thor day-0 support through a simple OpenAI-compatible llama-server workflow.
Inputs and Outputs
Input: Text
Output: Text
Supported Platforms
- Jetson Orin
- Jetson Thor
Inference Engine
This model is currently configured for llama.cpp using the GGUF checkpoint NVIDIA-Nemotron3-Nano-4B-Q4_K_M.gguf.
Notes
- The provided command uses
--alias my_model; you can change that alias to match your application if needed. --n-gpu-layers 999keeps the full model on GPU when memory allows for best performance.