Qwen3 32B
Alibaba's flagship 32 billion parameter language model for advanced reasoning
- Memory Requirement: 24 GB RAM
- Precision: W4A16
- Size: 18 GB
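The 18 GB size follows from the W4A16 scheme (4-bit weights, 16-bit activations). A back-of-the-envelope sketch, noting that real checkpoints add embedding tables, quantization scales, and metadata on top of the raw weights:

```python
# Rough size estimate for a 32B-parameter model quantized to 4-bit weights.
params = 32e9          # 32 billion parameters
bits_per_weight = 4    # W4A16: 4-bit weights, 16-bit activations
weight_bytes = params * bits_per_weight / 8
print(f"{weight_bytes / 1e9:.0f} GB")  # 16 GB of raw 4-bit weights
```

The remaining ~2 GB of the published 18 GB checkpoint is overhead from scales, non-quantized layers, and file metadata.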
Supported Inference Engines
Container Run Command

```shell
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve RedHatAI/Qwen3-32B-quantized.w4a16
```

Performance on Jetson Orin with vLLM
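Once the container is up, `vllm serve` exposes an OpenAI-compatible HTTP API (port 8000 by default). A minimal request sketch, assuming the server is reachable on localhost; the prompt and `max_tokens` value are illustrative:

```python
import json
import urllib.request

# OpenAI-style chat completion request for the served model.
# Assumes vLLM's default port 8000; adjust host/port for your setup.
payload = {
    "model": "RedHatAI/Qwen3-32B-quantized.w4a16",
    "messages": [
        {"role": "user", "content": "Explain W4A16 quantization in one sentence."}
    ],
    "max_tokens": 128,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# With the server running, uncomment to send the request:
# response = urllib.request.urlopen(req)
# print(json.load(response)["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```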
- Single request (concurrency=1): 6.22 output tokens/sec
- 8 parallel requests (concurrency=8): 16.84 output tokens/sec
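The two figures above imply how batching trades per-request latency for aggregate throughput; a quick worked check:

```python
# Throughput figures from the measurements above.
single = 6.22     # output tokens/sec at concurrency=1
parallel = 16.84  # aggregate output tokens/sec at concurrency=8

print(f"aggregate speedup: {parallel / single:.2f}x")        # ~2.7x
print(f"per-request rate at c=8: {parallel / 8:.2f} tok/s")  # ~2.1 tok/s
```

Batching 8 requests roughly triples total throughput, while each individual request streams about 3x slower than it would alone.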
Model Details
Qwen3 32B is the flagship dense model in Alibaba Cloud’s Qwen3 family. With 32 billion parameters, it delivers exceptional performance across complex reasoning, coding, and language understanding tasks.
Inputs and Outputs
Input: Text
Output: Text
Intended Use Cases
- Reasoning: Advanced logical and analytical reasoning tasks
- Function Calling: Native support for tool use and function calling
- Subject Matter Experts: Fine-tuning for domain-specific expertise
- Multilingual Instruction Following: Following instructions across 100+ languages
- Translation: High-quality translation between supported languages
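The function-calling use case is typically driven through the OpenAI-style `tools` field of the same chat API. A hedged sketch of a tool definition; the `get_weather` function, its schema, and the prompt are hypothetical, not part of the model card:

```python
import json

# Hypothetical tool schema for an OpenAI-style function-calling request.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

request_body = {
    "model": "RedHatAI/Qwen3-32B-quantized.w4a16",
    "messages": [{"role": "user", "content": "What's the weather in Hanoi?"}],
    "tools": tools,
}
print(json.dumps(request_body, indent=2))
```

When the model decides a tool is needed, the response contains a `tool_calls` entry with the function name and JSON arguments instead of plain text.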