
Llama 3.2 3B

Meta's compact 3-billion-parameter model, ideal for resource-constrained Jetson deployments

Memory Requirement: 4GB RAM
Precision: W4A16
Size: 2.0GB
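The quoted 2.0GB download follows from the W4A16 scheme (4-bit weights, 16-bit activations). A rough back-of-envelope check, assuming an approximate 3B parameter count:

```python
# Back-of-envelope size estimate for a W4A16 quantized model.
# The parameter count is approximate; quantization scales and any
# tensors kept in higher precision (e.g. embeddings) account for
# the gap between this estimate and the quoted 2.0GB package.
params = 3.0e9
weight_bytes = params * 4 / 8  # 4 bits per weight

print(f"~{weight_bytes / 1e9:.1f} GB for the 4-bit weights alone")
```

This is why the model fits comfortably in the 4GB RAM requirement with room left for the KV cache and runtime overhead.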

Jetson Inference - Supported Inference Engines

🚀 Container
# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve espressor/meta-llama.Llama-3.2-3B-Instruct_W4A16
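Once the container is up, `vllm serve` exposes an OpenAI-compatible HTTP API (by default on port 8000). A minimal stdlib-only client sketch, assuming the server is reachable on localhost; the prompt and `max_tokens` value are illustrative:

```python
import json
import urllib.request

# Assumed local endpoint exposed by `vllm serve` (default port 8000).
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "espressor/meta-llama.Llama-3.2-3B-Instruct_W4A16"


def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str) -> str:
    """POST a chat request to the running vLLM server and return the reply."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Why are Jetson boards a good fit for edge AI?"))
```

The same request works from any OpenAI-compatible client library by pointing its base URL at the Jetson's address.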

Performance on Jetson Orin with vLLM

52.58 output tokens/sec (single request, concurrency=1)
240.68 output tokens/sec (8 parallel requests, concurrency=8)

Model Details

Meta’s Llama 3.2 3B is a compact yet capable language model optimized for edge deployment. With just 3 billion parameters, it offers an excellent balance between performance and resource efficiency.

It is a strong fit for the Jetson Orin Nano and other memory-constrained deployments, while still delivering solid instruction-following capabilities.