Llama 3.1 8B

Meta's efficient 8-billion-parameter instruction-tuned language model, optimized for Jetson.

Parameters: 8B
Modalities: Text
Context length: 128K
License: Llama 3.1 Community License
Precision: W4A16

Serve the model

Start server

Choose a Jetson module, then an engine and any optional parameters on the left, then copy the serve command by clicking the button on the right.

Command


Call the model over Web API

Copy a client command below and paste it into your terminal to make a Web API request to the model you just served.

curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
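The server replies with JSON in the OpenAI chat completions format. As a minimal parsing sketch (the response body below is a made-up illustration, not output from a real run), the assistant's reply can be extracted with Python's standard library, so no jq install is required:

```shell
# Illustrative chat completions response body; in a script you would
# capture it instead with: RESPONSE=$(curl -s ... )
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"Hello! How can I help?"}}]}'

# Pull out the assistant message using only Python's stdlib json module.
echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```

The same extraction works for any server that follows the chat completions schema, including Ollama's OpenAI-compatible endpoint.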

With ollama serve running on the Jetson, call it from another host (set ${JETSON_HOST} or use the host field above). Match the model name to the one you pulled on the device.

curl -s http://${JETSON_HOST}:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Meta-Llama-3.1-8B-Instruct-quantized.w4a16",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'

Alternatively, call Ollama's native generate endpoint. With "stream" set to false, the server returns a single JSON response instead of streamed chunks.

curl -s http://${JETSON_HOST}:11434/api/generate -d '{
  "model": "Meta-Llama-3.1-8B-Instruct-quantized.w4a16",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
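When "stream" is false, Ollama returns one JSON object with the generated text in its "response" field. A minimal parsing sketch (the response body below is illustrative, not captured from a real run):

```shell
# Illustrative non-streaming /api/generate response; a real one includes
# additional fields such as timings and token counts.
RESPONSE='{"model":"Meta-Llama-3.1-8B-Instruct-quantized.w4a16","response":"Rayleigh scattering.","done":true}'

# Extract the generated text from the "response" field.
echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["response"])'
```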

One-shot inference

Choose a Jetson module, adjust optional parameters, then copy the command to run a single inference on the device.

Command


Meta's Llama 3.1 8B Instruct is a powerful instruction-tuned language model with 8 billion parameters. This quantized version (W4A16: 4-bit weights, 16-bit activations) delivers strong throughput while staying memory-efficient enough for edge deployment on Jetson devices.

The model excels at following instructions, answering questions, and generating coherent text across a wide range of tasks.
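Because W4A16 stores weights in 4-bit precision, the weight footprint is roughly a quarter of an FP16 checkpoint. A back-of-the-envelope estimate (weights only; KV cache, activations, and runtime overhead are not included):

```shell
# 8e9 parameters * 4 bits / 8 bits-per-byte = ~4 GB of weights,
# versus 8e9 * 2 bytes = ~16 GB at FP16.
python3 - <<'PY'
params = 8e9
w4_gb   = params * 4 / 8 / 1e9   # 4-bit weights
fp16_gb = params * 2 / 1e9       # 16-bit weights
print(f"W4 weights:   ~{w4_gb:.0f} GB")    # ~4 GB
print(f"FP16 weights: ~{fp16_gb:.0f} GB")  # ~16 GB
PY
```

This is why the quantized variant fits comfortably in the unified memory of smaller Jetson modules where an FP16 copy would not.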