Qwen3 4B
Alibaba's efficient 4-billion-parameter instruction-tuned language model
Memory Requirement: 4GB RAM
Precision: W4A16
Size: 2.5GB
Jetson Inference - Supported Inference Engines
Run Command (vLLM container):

sudo docker run -it --rm --pull always --runtime=nvidia --network host ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve RedHatAI/Qwen3-4B-quantized.w4a16

Performance on Jetson Orin with vLLM
- 42.15 output tokens/sec with a single request (concurrency=1)
- 193.83 output tokens/sec with 8 parallel requests (concurrency=8)
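The concurrency=8 figure is aggregate throughput across all parallel requests, which is why it scales well beyond the single-request number. A minimal sketch of how such a measurement can be reproduced against the server started above, assuming the vLLM default port 8000 and the standard `usage.completion_tokens` field in OpenAI-style responses (the prompt and token budget here are illustrative):

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/chat/completions"  # vLLM's default serve port
MODEL = "RedHatAI/Qwen3-4B-quantized.w4a16"

def one_request(prompt: str, max_tokens: int = 128) -> int:
    """Send one chat completion; return its completion-token count."""
    body = json.dumps({
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode()
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["usage"]["completion_tokens"]

def aggregate_tokens_per_sec(token_counts, elapsed_s):
    """Aggregate output throughput across parallel requests."""
    return sum(token_counts) / elapsed_s

def benchmark(concurrency: int, prompt: str = "Tell me a short story."):
    """Fire `concurrency` identical requests in parallel and time them."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        counts = list(pool.map(one_request, [prompt] * concurrency))
    return aggregate_tokens_per_sec(counts, time.monotonic() - start)

# With the container running:
# benchmark(1)  # single-request throughput
# benchmark(8)  # aggregate throughput at concurrency=8
```

Dividing total output tokens by wall-clock time, rather than averaging per-request rates, matches how aggregate tokens/sec is usually reported.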
Model Details
Qwen3 is Alibaba Cloud’s latest generation of large language models, offering state-of-the-art performance across a wide range of tasks. The Qwen3 4B model provides an excellent balance of capability and efficiency for edge deployment.
Inputs and Outputs
Input: Text
Output: Text
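Since `vllm serve` exposes an OpenAI-compatible API (by default on port 8000), sending text in and getting text out can be sketched as follows; the helper names here are illustrative, not part of vLLM:

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"  # assumed default port
MODEL = "RedHatAI/Qwen3-4B-quantized.w4a16"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """OpenAI-style chat-completion body understood by vLLM."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """POST the request to the local server and return the reply text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# With the container running:
# print(ask("Explain W4A16 quantization in one sentence."))
```

Any OpenAI-compatible client library can be pointed at the same endpoint by overriding its base URL.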
Intended Use Cases
- Reasoning: Advanced logical and analytical reasoning tasks
- Function Calling: Native support for tool use and function calling
- Subject Matter Experts: Fine-tuning for domain-specific expertise
- Multilingual Instruction Following: Following instructions across 100+ languages
- Translation: High-quality translation between supported languages
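For the function-calling use case, tools are declared in the standard OpenAI `tools` schema that the OpenAI-compatible endpoint accepts. A hedged sketch of the request payload, where `get_weather` and its schema are purely illustrative:

```python
def make_tool(name: str, description: str, parameters: dict) -> dict:
    """Wrap a JSON-schema parameter spec in the OpenAI tool format."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

# Hypothetical tool definition for illustration only.
weather_tool = make_tool(
    "get_weather",
    "Look up current weather for a city.",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)

request_body = {
    "model": "RedHatAI/Qwen3-4B-quantized.w4a16",
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [weather_tool],
}
```

When the model decides to call a tool, the response carries a `tool_calls` entry with the function name and JSON arguments instead of plain text; note that vLLM may need tool-calling enabled via its server flags for this to be parsed automatically.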