Llama 3.1 70B
Meta's flagship 70 billion parameter model delivering state-of-the-art performance on Jetson Thor
Memory Requirement 48GB RAM
Precision W4A16
Size 40GB
Jetson Inference - Supported Inference Engines
Container Run Command

sudo docker run -it --rm --pull always --runtime=nvidia --network host ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16

Performance on Jetson Orin with vLLM
2.93 output tokens/sec for a single request (concurrency=1)
7.38 output tokens/sec for 8 parallel requests (concurrency=8)
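Once the container is running, `vllm serve` exposes an OpenAI-compatible HTTP API (port 8000 by default; adjust the URL if your container maps it differently). A minimal client sketch using only the Python standard library, assuming the server is reachable at localhost:

```python
import json
from urllib import request

# Assumptions: the vLLM container above is running and listening on the
# default port 8000; the model name matches the one passed to `vllm serve`.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w4a16"

def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST the prompt to the running vLLM server and return the reply text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = request.Request(
        BASE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]

# Usage (with the server up):
#   chat("Summarize the Llama 3.1 family in one sentence.")
```

Because the endpoint follows the OpenAI schema, any OpenAI-compatible client library can be pointed at the same base URL instead.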
Model Details
Meta's Llama 3.1 70B Instruct is the flagship model in the Llama 3.1 family, featuring 70 billion parameters for state-of-the-art performance. This quantized version (W4A16) enables deployment on Jetson Thor.
Ideal for complex reasoning tasks, detailed content generation, and applications requiring the highest quality outputs.
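The listed size and memory figures follow from the quantization scheme: W4A16 stores weights in 4 bits while computing activations in 16-bit. A rough back-of-envelope check (approximate, assuming all 70 billion parameters are weight-quantized):

```python
# Approximate weight footprint of a 70B model under 4-bit quantization.
params = 70e9            # parameter count
bits_per_weight = 4      # W4A16: 4-bit weights, 16-bit activations
weight_bytes = params * bits_per_weight / 8
weight_gb = weight_bytes / 1e9
print(f"{weight_gb:.0f} GB")  # ≈ 35 GB of raw quantized weights
```

Quantization scales/zero-points plus any layers kept at higher precision push the on-disk size toward the listed 40GB, and KV cache and runtime overhead account for the 48GB RAM requirement.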