Qwen3.6 35B-A3B (MoE)

Alibaba's latest Mixture-of-Experts model with 35B total / 3B active parameters, featuring native tool calling and MTP speculative decoding

Parameters: 24GB
Modalities: Text
Precision: NVFP4, AWQ-4bit

Serve the model

Start the server with the vllm serve command shown under Running with vLLM below. By default the server listens on port 8000, which is the port used by the request in the next section.

Call the model over Web API

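Copy the client command below and paste it into your terminal to send a Web API request to the model you just served. Set JETSON_HOST to the hostname or IP address of your Jetson first. The "model" field must match the name the server was started with; for the vLLM command under Running with vLLM, that is cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit unless a different served model name is configured.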

curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3.6-35B-A3B",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
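The request above can also be wrapped in small shell helpers so that the prompt and model name are not hand-edited inside the JSON each time. This is a minimal sketch, not part of the original page: the function names json_escape, chat_payload, and chat are hypothetical, the MODEL variable and the localhost fallback are assumptions, and the escaping only covers backslashes and double quotes.

```shell
# Escape backslashes and double quotes so text can sit inside a JSON string.
# Sketch only: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload MODEL PROMPT -> request body with a single user message.
chat_payload() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}

# chat PROMPT -> POST the prompt to the server and print the raw JSON response.
# JETSON_HOST and MODEL are read from the environment; the defaults are assumptions.
chat() {
  chat_payload "${MODEL:-Qwen/Qwen3.6-35B-A3B}" "$1" |
    curl -s "http://${JETSON_HOST:-localhost}:8000/v1/chat/completions" \
      -H "Content-Type: application/json" -d @-
}
```

Once the server is running, chat "Hello!" prints the JSON response; in the OpenAI-style response format the reply text sits under choices[0].message.content.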

Model Details

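Qwen3.6 35B-A3B is a Mixture-of-Experts (MoE) model from Alibaba Cloud’s Qwen3.6 family. It has 35 billion total parameters, of which only about 3 billion are active for each token during inference. Per-token compute is therefore roughly that of a 3B dense model, while all 35B parameters must still fit in memory, a trade-off that suits memory-rich but compute-limited edge devices.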
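The parameter counts above translate into a quick back-of-the-envelope estimate. The sketch below assumes weights are stored at a uniform bit width and ignores quantization scales, the KV cache, and runtime overhead, so the result is a rough lower bound on weight storage rather than a measured footprint. The helper names weight_gb and active_pct are hypothetical.

```shell
# weight_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Approximate weight storage in GB (10^9 bytes), assuming a uniform bit width.
weight_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# active_pct ACTIVE_BILLIONS TOTAL_BILLIONS
# Share of parameters used per token, in percent.
active_pct() {
  awk -v a="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

weight_gb 35 4     # 4-bit weights: prints 17.5
weight_gb 35 16    # 16-bit weights, for comparison: prints 70.0
active_pct 3 35    # prints 8.6
```

Under these assumptions the 4-bit weights need about 17.5 GB, and each token touches under a tenth of the parameters.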

Inputs and Outputs

Input: Text

Output: Text

Intended Use Cases

  • Reasoning: Advanced logical and analytical reasoning with chain-of-thought
  • Function Calling: Native support for tool use and function calling
  • Multilingual Instruction Following: Handling instructions written in 100+ languages
  • Code Generation: Programming assistance in multiple languages
  • Translation: High-quality translation between supported languages

Running with vLLM

sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
    --gpu-memory-utilization 0.8 --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice --tool-call-parser qwen3_coder \
    --max-model-len 4096
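With --enable-auto-tool-choice and --tool-call-parser set as in the command above, a chat request can carry tool definitions in the OpenAI-style "tools" field. The following is a sketch of such a request body. The get_weather tool and the helper name tool_request_body are hypothetical and exist only for illustration, and the model name assumes the server was started with the command above.

```shell
# Print a chat request body that offers the model one tool.
# get_weather is a hypothetical function used only for illustration.
tool_request_body() {
  cat <<'EOF'
{
  "model": "cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "tool_choice": "auto"
}
EOF
}

# Send it once the server is up (left commented so nothing runs on paste):
# tool_request_body | curl -s "http://${JETSON_HOST}:8000/v1/chat/completions" \
#   -H "Content-Type: application/json" -d @-
```

If the model decides to call the tool, the response message should contain a tool_calls entry with the function name and JSON-encoded arguments instead of plain text. The client is expected to run the function and send the result back in a follow-up message with role "tool".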

Speculative Decoding with MTP

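This model supports Multi-Token Prediction (MTP) speculative decoding: several tokens are drafted per step and then verified by the model, so every accepted draft token saves a full decoding step. The throughput gain depends on how often the drafts are accepted. To enable it, add the following flag to your vllm serve command; num_speculative_tokens sets how many tokens are drafted per step: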

--speculative-config '{"method": "mtp", "num_speculative_tokens": 4}'
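For reference, here is how the flag combines with the serve command from Running with vLLM. This sketch changes nothing except appending --speculative-config. The SPEC_CONFIG variable and the serve_with_mtp wrapper are hypothetical names, used only to keep the JSON quoting in one place.

```shell
# Speculative-decoding settings, copied from the flag above.
SPEC_CONFIG='{"method": "mtp", "num_speculative_tokens": 4}'

# Nothing runs until serve_with_mtp is called.
serve_with_mtp() {
  sudo docker run -it --rm --pull always --runtime=nvidia --network host \
    ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
    vllm serve cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
      --gpu-memory-utilization 0.8 --enable-prefix-caching \
      --reasoning-parser qwen3 \
      --enable-auto-tool-choice --tool-call-parser qwen3_coder \
      --max-model-len 4096 \
      --speculative-config "$SPEC_CONFIG"
}
```

Call serve_with_mtp to start the server. Lowering num_speculative_tokens in SPEC_CONFIG is a reasonable first step if few drafted tokens are being accepted.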

Qwen3.6 Family

Model             Parameters   Active Params   Type    Best For
Qwen3.6 35B-A3B   35B          3B              MoE     Efficient high-performance inference
Qwen3.6 27B       27B          27B             Dense   Maximum accuracy on demanding tasks
