
GPT OSS 120B

OpenAI's open-source 120-billion-parameter language model for Jetson Thor

Memory Requirement: 64GB RAM
Precision: NVFP4
Size: 60GB


Model Details

GPT OSS 120B is OpenAI's open-source 120-billion-parameter language model. Due to its size, it is supported only on Jetson AGX Thor. The tiktoken encodings must be downloaded before the model can be served.

Running with vLLM (Thor Only)

Step 1: Download Tiktoken Encodings

mkdir -p $HOME/.cache/tiktoken
wget -q https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken \
  -O $HOME/.cache/tiktoken/cl100k_base.tiktoken
wget -q https://openaipublic.blob.core.windows.net/encodings/o200k_base.tiktoken \
  -O $HOME/.cache/tiktoken/o200k_base.tiktoken
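
You can confirm that both encoding files are in place before starting the container:

ls -lh $HOME/.cache/tiktoken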

Step 2: Serve

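# Mounts expose the Hugging Face and tiktoken caches to the container;
# TIKTOKEN_ENCODINGS_BASE tells vLLM to load the encodings from /etc/encodings.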
sudo docker run -it --rm --pull always --runtime=nvidia --network host \
  -v $HOME/.cache/huggingface:/root/.cache/huggingface \
  -v $HOME/.cache/tiktoken:/etc/encodings \
  -e TIKTOKEN_ENCODINGS_BASE=/etc/encodings \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-thor \
  vllm serve openai/gpt-oss-120b --gpu-memory-utilization 0.8
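
Once the server is up, you can send a test request to the OpenAI-compatible API it exposes. This is a minimal sketch, assuming vLLM's default port of 8000 and the chat completions endpoint:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-oss-120b",
        "messages": [{"role": "user", "content": "Describe Jetson AGX Thor in one sentence."}],
        "max_tokens": 64
      }'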

GPT OSS Family

Model          Parameters   Memory     Minimum Jetson
GPT OSS 20B    20B          16GB RAM   AGX Orin
GPT OSS 120B   120B         64GB RAM   AGX Thor