
Qwen3 30B-A3B (MoE)

Alibaba's Mixture-of-Experts model with 30B total / 3B active parameters

Parameters: 30B total / 3.3B activated
Modalities: Text
Context Length: 128K
License: Apache 2.0
Precision: W4A16
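As a rough check on what W4A16 precision means for memory, the weight footprint can be estimated from the parameter count alone. This is a minimal sketch that assumes all 30B parameters are stored at the given bit width; it ignores quantization scales, any layers left in 16-bit, the KV cache, and runtime overhead, so the real footprint will be somewhat higher. The helper name `weight_gb` is hypothetical.

```shell
# Rough weight-memory estimate for Qwen3-30B-A3B.
# Assumption: every one of the 30B parameters is stored at the given bit width;
# quantization scales, unquantized layers, KV cache and runtime overhead are ignored.
PARAMS=30000000000

weight_gb() {
  # $1 = bits per weight; prints decimal gigabytes (1 GB = 1e9 bytes)
  awk -v p="$PARAMS" -v b="$1" 'BEGIN { printf "%.1f\n", p * b / 8 / 1e9 }'
}

echo "W4A16 (4-bit weights): $(weight_gb 4) GB"
echo "BF16 (16-bit weights): $(weight_gb 16) GB"
```

Under these assumptions the 4-bit weights need about 15 GB versus about 60 GB at 16-bit, a 4x reduction.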

Serve the model

Start server
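A minimal sketch of a launch command, assuming vLLM is already installed in the environment (for example, inside a vLLM container on the Jetson). The model ID matches the one in the client request in the next section, and port 8000 matches the port that request targets. The `--max-model-len` value is an illustrative choice to cap KV-cache usage, not a setting taken from this page.

```shell
# Assumes vLLM is installed; model ID and port match the client request in the next section.
# --max-model-len 8192 is an illustrative cap on context length, not a tuned value.
vllm serve RedHatAI/Qwen3-30B-A3B-quantized.w4a16 \
  --host 0.0.0.0 \
  --port 8000 \
  --max-model-len 8192
```

vLLM should pick up the quantization scheme from the checkpoint's own configuration, which is why no explicit quantization flag is shown.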


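Call the model over the Web API

Run the command below in a terminal to send a chat-completion request to the OpenAI-compatible endpoint of the model you just served. Set JETSON_HOST to the hostname or IP address of the Jetson running the server.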

curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "RedHatAI/Qwen3-30B-A3B-quantized.w4a16",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Benchmark

Qwen3-30B-A3B · vLLM · NVFP4 / W4A16 · input length (ISL) 2048 tokens / output length (OSL) 128 tokens


C = number of concurrent requests. Results vary with the software image, clock settings, and workload.
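The benchmark shape (ISL 2048, OSL 128) together with the concurrency C determines how much KV cache the server must hold at once. The sketch below estimates it, assuming model dimensions of 48 layers, 4 key/value heads, and a head dimension of 128 (from the published Qwen3-30B-A3B configuration; verify against the checkpoint's config.json) and a 16-bit KV cache. The helper name `kv_cache_mib` and the concurrency levels in the loop are hypothetical examples.

```shell
# Hypothetical helper: KV-cache memory for C in-flight requests of the benchmark
# shape (ISL 2048 + OSL 128 tokens each).
# Assumed model dims (check config.json): 48 layers, 4 KV heads, head_dim 128,
# KV cache stored in 16-bit (2 bytes per value).
LAYERS=48
KV_HEADS=4
HEAD_DIM=128
KV_BYTES=2
ISL=2048
OSL=128

kv_cache_mib() {
  # $1 = concurrency C; the factor 2 accounts for both the K and the V tensor
  awk -v c="$1" -v l="$LAYERS" -v h="$KV_HEADS" -v d="$HEAD_DIM" \
      -v b="$KV_BYTES" -v t="$((ISL + OSL))" \
      'BEGIN { printf "%.1f\n", c * t * 2 * l * h * d * b / 1048576 }'
}

# Example concurrency levels (illustrative, not the page's test matrix)
for C in 1 4 8 16; do
  echo "C=$C: $(kv_cache_mib "$C") MiB of KV cache"
done
```

Under these assumptions each request needs about 204 MiB, so the cache grows linearly with C. vLLM pre-allocates its KV-cache pool at startup, so this is the share of that pool the workload occupies rather than additional memory allocated during the run.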

Model Details

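Qwen3 30B-A3B is a Mixture-of-Experts (MoE) model from Alibaba Cloud's Qwen3 family. Of its 30 billion total parameters, only about 3 billion are activated for each token, so per-token compute is close to that of a 3B dense model. The trade-off is memory: all 30 billion parameters must still be loaded, because different tokens are routed to different experts.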
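The efficiency claim can be made concrete with a back-of-the-envelope calculation. This sketch uses the common approximation of about 2 FLOPs (one multiply and one add) per parameter a token passes through, with the 30B total and 3.3B activated figures from the spec list; it ignores attention cost, which grows with context length, and router overhead. The helper names are hypothetical.

```shell
# Rough per-token forward-pass compute for the MoE model vs. a dense model of the same size.
# Approximation: ~2 FLOPs per parameter the token passes through (attention and routing ignored).
TOTAL_PARAMS=30000000000   # 30B total (spec list)
ACTIVE_PARAMS=3300000000   # 3.3B activated per token (spec list)

gflops_per_token() {
  # $1 = number of parameters a token passes through
  awk -v p="$1" 'BEGIN { printf "%.1f\n", 2 * p / 1e9 }'
}

active_percent() {
  awk -v a="$ACTIVE_PARAMS" -v t="$TOTAL_PARAMS" 'BEGIN { printf "%.0f\n", 100 * a / t }'
}

echo "Parameters active per token: $(active_percent)%"
echo "MoE forward pass:   ~$(gflops_per_token "$ACTIVE_PARAMS") GFLOPs/token"
echo "Dense 30B baseline: ~$(gflops_per_token "$TOTAL_PARAMS") GFLOPs/token"
```

By this estimate each token touches about 11% of the weights and needs roughly 6.6 GFLOPs, compared with about 60 GFLOPs if all 30B parameters were used.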

Inputs and Outputs

Input: Text

Output: Text

Intended Use Cases

  • Reasoning: Advanced logical and analytical reasoning tasks
  • Function Calling: Native support for tool use and function calling
  • Subject-Matter Expertise: Fine-tuning for domain-specific tasks
  • Multilingual Instruction Following: Following instructions across 100+ languages
  • Translation: High-quality translation between supported languages