Qwen3 30B-A3B (MoE)

Alibaba's Mixture-of-Experts model with 30B total / 3B active parameters

Memory Requirement: 16GB RAM
Precision: W4A16
Size: 16GB

Jetson Inference - Supported Inference Engines

Container
# Run Command
sudo docker run -it --rm --pull always --runtime=nvidia --network host ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin vllm serve RedHatAI/Qwen3-30B-A3B-quantized.w4a16
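Once the container is up, `vllm serve` exposes an OpenAI-compatible HTTP API (port 8000 by default). A minimal stdlib-only Python sketch for querying the served model, assuming default vLLM settings (the endpoint URL and `max_tokens` value here are illustrative):

```python
import json
from urllib import request

# vLLM's default OpenAI-compatible chat endpoint; adjust host/port if changed.
VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the served model."""
    return {
        "model": "RedHatAI/Qwen3-30B-A3B-quantized.w4a16",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,  # illustrative cap on generated tokens
    }

def query(prompt: str) -> str:
    """POST the prompt to the local vLLM server and return the reply text."""
    req = request.Request(
        VLLM_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Usage: `query("Explain MoE routing in two sentences.")` returns the model's reply string, provided the container from the run command above is serving.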

Performance on Jetson Orin with vLLM

31.43 output tokens/sec with a single request (concurrency=1)
76.69 output tokens/sec with 8 parallel requests (concurrency=8)

Model Details

Qwen3 30B-A3B is a Mixture-of-Experts (MoE) model from Alibaba Cloud’s Qwen3 family. It features 30 billion total parameters with only 3 billion active during inference, providing excellent performance with improved efficiency.

Inputs and Outputs

Input: Text

Output: Text

Intended Use Cases

  • Reasoning: Advanced logical and analytical reasoning tasks
  • Function Calling: Native support for tool use and function calling
  • Subject Matter Experts: Fine-tuning for domain-specific expertise
  • Multilingual Instruction Following: Instruction following across 100+ languages
  • Translation: High-quality translation between supported languages
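The function-calling use case above can be exercised through the `tools` parameter of the OpenAI-compatible API that vLLM serves. A minimal sketch of a tool-enabled request payload; the `get_weather` tool and its schema are hypothetical, purely for illustration:

```python
# Hypothetical tool schema in OpenAI function-calling format.
# The tool name, description, and parameters are illustrative only.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_tool_request(prompt: str) -> dict:
    """Build a chat-completion payload that offers the model one callable tool."""
    return {
        "model": "RedHatAI/Qwen3-30B-A3B-quantized.w4a16",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [weather_tool],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }
```

When the model elects to use the tool, the response carries a `tool_calls` entry with the function name and JSON arguments instead of plain text; the caller executes the function and sends the result back in a follow-up message.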