Llama 3.2 3B
Meta's compact 3-billion-parameter model, ideal for resource-constrained Jetson deployments
Serve the model
Start server
Choose a module, then an engine and any optional parameters on the left, then copy the serve command by clicking the button on the right.
Call the model over Web API
Copy a client command below and paste it into your terminal to make a Web API request to the model you just served.
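All of the snippets below reference ${JETSON_HOST}. A minimal setup step, assuming you know the Jetson's address (the IP below is a placeholder, not from the model page):

```shell
# Point JETSON_HOST at the Jetson running the server; the IP here is an
# example placeholder -- substitute your device's address or hostname.
export JETSON_HOST=192.168.1.42
```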
curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "espressor/meta-llama.Llama-3.2-3B-Instruct_W4A16",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
With ollama serve running on the Jetson, you can call it from another host: set ${JETSON_HOST} to the device's address, and match the model name to what you pulled on the device.
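The raw reply is JSON; to print only the assistant's text, the call can be wrapped in a small helper. This is a sketch, not from the model page: `chat` is a hypothetical function name, and it assumes python3 is available and an OpenAI-compatible server is listening on the given port.

```shell
# Hypothetical helper: send one user message and print only the reply text.
# Assumes an OpenAI-compatible /v1/chat/completions endpoint (vLLM, ollama, ...).
chat() {
  curl -s "http://${JETSON_HOST:-localhost}:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$1\", \"messages\": [{\"role\": \"user\", \"content\": \"$2\"}]}" \
    | python3 -c "import sys, json; print(json.load(sys.stdin)['choices'][0]['message']['content'])"
}
# usage: chat espressor/meta-llama.Llama-3.2-3B-Instruct_W4A16 "Hello!"
```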
curl -s http://${JETSON_HOST}:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama.Llama-3.2-3B-Instruct_W4A16",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'
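The model name in the request must match one the server actually has. As a sketch (assuming the server exposes the OpenAI-compatible /v1/models listing, which ollama does; `list_models` is a hypothetical helper name), the available IDs can be enumerated:

```shell
# Hypothetical helper: print the model IDs the server reports.
# Assumes the OpenAI-compatible /v1/models endpoint on the ollama port.
list_models() {
  curl -s "http://${JETSON_HOST:-localhost}:11434/v1/models" \
    | python3 -c "import sys, json; [print(m['id']) for m in json.load(sys.stdin)['data']]"
}
# usage: list_models
```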
curl -s http://${JETSON_HOST}:11434/api/generate -d '{
  "model": "meta-llama.Llama-3.2-3B-Instruct_W4A16",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
One-shot inference
Choose a Jetson module, adjust optional parameters, then copy the command to run a single inference on the device.
Model Details
Meta’s Llama 3.2 3B is a compact yet capable language model optimized for edge deployment. With just 3 billion parameters, it offers an excellent balance between performance and resource efficiency.
Perfect for Jetson Orin Nano and other memory-constrained deployments while still delivering strong instruction-following capabilities.