Text
Llama 3.1 8B
Meta's efficient 8 billion parameter instruction-tuned language model optimized for Jetson
Memory Requirement 8GB RAM
Precision W4A16
Size 4.5GB
Jetson Inference - Supported Inference Engines
π
Container # Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16 Performance on Jetson Orin with vLLM
28.14
output tokens/sec
Single request (concurrency=1)
112.33
output tokens/sec
8 parallel requests (concurrency=8)
Model Details
Metaβs Llama 3.1 8B Instruct is a powerful instruction-tuned language model with 8 billion parameters. This quantized version (W4A16) provides excellent performance while being memory efficient for edge deployment on Jetson devices.
The model excels at following instructions, answering questions, and generating coherent text across a wide range of tasks.