GPT OSS 20B

OpenAI's open-source 20 billion parameter language model

Memory Requirement: 16GB RAM
Precision: W4A16
Size: 12GB


Model Details

GPT OSS 20B is OpenAI's open-source 20-billion-parameter language model. Its tokenizer is distributed as tiktoken encoding files, which must be downloaded before the model can be served.

Running with vLLM

Step 1: Download Tiktoken Encodings

mkdir -p $HOME/.cache/tiktoken
wget -q https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken \
  -O $HOME/.cache/tiktoken/cl100k_base.tiktoken
wget -q https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken \
  -O $HOME/.cache/tiktoken/o200k_base.tiktoken
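Before moving on, it can be worth confirming that both encoding files actually landed in the cache. A quick check, using the same paths as the commands above:

```shell
# Sanity-check that both encoding files from Step 1 exist and are non-empty.
# The paths match the wget targets above.
for f in "$HOME/.cache/tiktoken/cl100k_base.tiktoken" \
         "$HOME/.cache/tiktoken/o200k_base.tiktoken"; do
  if [ -s "$f" ]; then
    echo "OK: $f ($(wc -c < "$f") bytes)"
  else
    echo "MISSING or empty: $f"
  fi
done
```

If either file is reported missing, re-run the wget commands before starting the container, since the server cannot load the tokenizer without them.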

Step 2: Serve

sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -v $HOME/.cache/tiktoken:/etc/encodings \
  -e TIKTOKEN_ENCODINGS_BASE=/etc/encodings \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
  vllm serve openai/gpt-oss-20b --gpu-memory-utilization 0.8
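Once the container is up, vLLM exposes an OpenAI-compatible HTTP API. A minimal request sketch, assuming vLLM's default port 8000 (adjust the URL if you pass --port to vllm serve):

```shell
# Build the request body, validate it locally, then POST it to the server.
# localhost:8000 is vLLM's default port; the model name matches the serve command above.
PAYLOAD='{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello from Jetson!"}], "max_tokens": 64}'
echo "$PAYLOAD" | python3 -m json.tool          # fails fast on malformed JSON
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  || echo "Request failed - is the container from Step 2 running?"
```

The response follows the OpenAI chat completions schema, so existing OpenAI client libraries can also be pointed at http://localhost:8000/v1 as the base URL.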

GPT OSS Family

Model         Parameters  Memory    Minimum Jetson
GPT OSS 20B   20B         16GB RAM  AGX Orin
GPT OSS 120B  120B        64GB RAM  Thor