Qwen3 32B

Alibaba's flagship 32-billion-parameter language model for advanced reasoning

Memory Requirement: 24GB RAM
Precision: W4A16
Size: 18GB

Jetson Inference - Supported Inference Engines

Container
# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve RedHatAI/Qwen3-32B-quantized.w4a16
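Once the container is up, vLLM exposes an OpenAI-compatible HTTP API (by default on port 8000). A minimal sketch of querying the served model from Python using only the standard library; the host, port, and prompt are assumptions, and the model name must match the one passed to vllm serve above:

```python
import json
import urllib.request

# Assumption: the vLLM server from the run command above is reachable
# locally on its default port 8000.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "RedHatAI/Qwen3-32B-quantized.w4a16"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload for the served model."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt: str) -> str:
    """Send a prompt to the server and return the assistant's reply text."""
    req = urllib.request.Request(
        BASE_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Explain quantization to W4A16 in one sentence."))
```

Any OpenAI-compatible client (for example the official `openai` Python package pointed at `http://localhost:8000/v1`) works the same way.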

Performance on Jetson Orin with vLLM

Single request (concurrency=1): 6.22 output tokens/sec
8 parallel requests (concurrency=8): 16.84 output tokens/sec

Model Details

Qwen3 32B is the flagship dense model in Alibaba Cloud’s Qwen3 family. With 32 billion parameters, it delivers exceptional performance across complex reasoning, coding, and language understanding tasks.

Inputs and Outputs

Input: Text

Output: Text

Intended Use Cases

  • Reasoning: Advanced logical and analytical reasoning tasks
  • Function Calling: Native support for tool use and function calling
  • Subject Matter Experts: Fine-tuning for domain-specific expertise
  • Multilingual Instruction Following: Follows instructions across 100+ languages
  • Translation: High-quality translation between supported languages
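Since the model advertises native function calling, a hedged sketch of what a tool definition could look like when sent through the OpenAI-compatible API that vLLM serves. The `get_weather` name and its schema are illustrative, not part of the model card, and server-side tool parsing may require extra vLLM launch flags:

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format;
# the get_weather name and parameter schema are made up for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}]

payload = {
    "model": "RedHatAI/Qwen3-32B-quantized.w4a16",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
    "tools": tools,
    "tool_choice": "auto",
}

# When the model decides to invoke the tool, the response carries a
# tool_calls entry whose arguments field is a JSON string to parse
# with json.loads before dispatching to your own function.
print(json.dumps(payload, indent=2))
```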