Qwen3.5 27B

Alibaba's dense 27-billion-parameter language model with native tool calling and MTP speculative decoding

Memory Requirement: 18GB RAM
Precision: NVFP4 / W4A16
Size: 15GB

Jetson Inference - Supported Inference Engines

Performance on Jetson Orin with vLLM

  • Single request (concurrency=1): 9 output tokens/sec
  • 8 parallel requests (concurrency=8): 41 output tokens/sec

Model Details

Qwen3.5 27B is a dense language model from Alibaba Cloud’s Qwen3.5 family. With 27 billion parameters, it delivers strong performance across complex reasoning, coding, and language understanding tasks.

Inputs and Outputs

Input: Text

Output: Text

Intended Use Cases

  • Reasoning: Advanced logical and analytical reasoning with chain-of-thought
  • Function Calling: Native support for tool use and function calling
  • Multilingual Instruction Following: Following instructions across 100+ languages
  • Code Generation: Programming assistance in multiple languages
  • Translation: High-quality translation between supported languages
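As a sketch of the function-calling use case, the request below sends a tool definition through vLLM's OpenAI-compatible chat endpoint. It assumes the server is already running as shown in the section below, on vLLM's default port 8000; the `get_weather` tool is purely illustrative and not part of the model.

```shell
# Hypothetical tool-calling request; assumes a vLLM server on localhost:8000
# and an illustrative get_weather function (not defined by the model card).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kbenkhaled/Qwen3.5-27B-quantized.w4a16",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

With `--enable-auto-tool-choice` set, the server decides when to emit a tool call; the response then contains a `tool_calls` entry instead of plain text.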

Running with vLLM

sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve Kbenkhaled/Qwen3.5-27B-quantized.w4a16 \
    --gpu-memory-utilization 0.8 --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice --tool-call-parser qwen3_coder
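Once the container is serving, you can test it with a plain chat request against the OpenAI-compatible API. This is a minimal sketch assuming the default vLLM port 8000; adjust the host/port if you pass `--port` to `vllm serve`.

```shell
# Minimal smoke-test request; assumes the server above is up on port 8000.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Kbenkhaled/Qwen3.5-27B-quantized.w4a16",
    "messages": [{"role": "user", "content": "Write a haiku about edge AI."}]
  }'
```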

Speculative Decoding with MTP

This model supports Multi-Token Prediction (MTP) speculative decoding, which can significantly improve generation throughput. To enable it, add the following flag to your vllm serve command:

--speculative-config '{"method": "mtp", "num_speculative_tokens": 4}'
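For example, appending the flag to the serve command from the previous section gives the following (a sketch; all other flags are unchanged):

```shell
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve Kbenkhaled/Qwen3.5-27B-quantized.w4a16 \
    --gpu-memory-utilization 0.8 --enable-prefix-caching \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice --tool-call-parser qwen3_coder \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 4}'
```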

Qwen3.5 Family

| Model | Parameters | Active Params | Type | Best For |
|---|---|---|---|---|
| Qwen3.5 35B-A3B | 35B | 3B | MoE | Efficient high-performance inference |
| Qwen3.5 27B | 27B | 27B | Dense | Maximum accuracy on demanding tasks |
