Gemma 3 1B

Google's efficient 1-billion-parameter model, balancing capability against resource usage

Parameters: 1B

Modalities: Text

Context Length: 32K

License: Gemma Terms of Service

Precision: FP16
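As a rough sanity check on the figures above, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes "1B" means exactly 1,000,000,000 parameters and that each FP16 value takes 2 bytes; the function name is illustrative. The estimate covers weights only and leaves out the KV cache, activations, and runtime overhead, so actual usage on a Jetson will be higher.

```python
def estimate_weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Rough weight footprint: parameter count times bytes per parameter (FP16 = 2)."""
    return num_params * bytes_per_param


# Assumption: "1B" is treated as exactly 1_000_000_000 parameters.
weight_bytes = estimate_weight_bytes(1_000_000_000)
print(f"~{weight_bytes / 1e9:.1f} GB of weights at FP16")
```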

Serve the model

Start server

Choose a module, then an engine and any optional parameters on the left, then copy the serve command using the button on the right.

Call the model over Web API

Copy a client command below and paste it into your terminal to make a Web API request to the model you just served.

curl -s http://${JETSON_HOST}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

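import os

from openai import OpenAI

# Python does not expand ${JETSON_HOST} inside strings, so read it from the
# environment. The "localhost" fallback is an assumption for local testing.
host = os.environ.get("JETSON_HOST", "localhost")

client = OpenAI(
    base_url=f"http://{host}:8000/v1",
    api_key="not-needed",  # vLLM / llama.cpp typically do not enforce a key
)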

completion = client.chat.completions.create(
    model="google/gemma-3-1b-it",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
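If installing the openai package on the calling host is not an option, the same request can be sketched with the standard library alone. This is a minimal illustration, not part of the catalog's commands: the helper names (build_chat_request, extract_reply, chat) are hypothetical, port 8000 is taken from the examples above, and the response shape is assumed to match what the OpenAI client example reads (choices[0].message.content).

```python
import json
import urllib.request


def build_chat_request(host: str, model: str, prompt: str, port: int = 8000) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of a chat-completions response body."""
    return body["choices"][0]["message"]["content"]


def chat(host: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the reply text (performs a network call)."""
    req = build_chat_request(host, model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

Calling chat(os.environ["JETSON_HOST"], "google/gemma-3-1b-it", "Hello!") from a script would then print the same text as the client example above, provided the server is reachable.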

With ollama serve running on the Jetson, call it from another host through Ollama's OpenAI-compatible endpoint (set JETSON_HOST to the Jetson's address). Match the model name to the one you pulled on the device.

curl -s http://${JETSON_HOST}:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Why is the sky blue?"}]
  }'

Alternatively, call Ollama's native /api/generate endpoint instead of the OpenAI-compatible route. The same host and model-name notes apply.

curl -s http://${JETSON_HOST}:11434/api/generate -d '{
  "model": "gemma-3-1b-it",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

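import os

from openai import OpenAI

# Python does not expand ${JETSON_HOST} inside strings, so read it from the
# environment. The "localhost" fallback is an assumption for local testing.
host = os.environ.get("JETSON_HOST", "localhost")

client = OpenAI(
    base_url=f"http://{host}:11434/v1",
    api_key="ollama",  # required by the client; Ollama ignores it
)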

completion = client.chat.completions.create(
    model="gemma-3-1b-it",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(completion.choices[0].message.content)

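import json
import os
import urllib.request

# Python does not expand ${JETSON_HOST} inside strings, so read it from the
# environment. The "localhost" fallback is an assumption for local testing.
host = os.environ.get("JETSON_HOST", "localhost")
url = f"http://{host}:11434/api/generate"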
payload = json.dumps(
    {
        "model": "gemma-3-1b-it",
        "prompt": "Why is the sky blue?",
        "stream": False,
    }
).encode("utf-8")
req = urllib.request.Request(
    url,
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    print(body.get("response", body))
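The example above sets "stream" to false and reads a single JSON body. With streaming enabled, Ollama's native endpoint is understood to return newline-delimited JSON objects, each carrying a "response" fragment and a "done" flag on the final one. The sketch below joins such lines into one string; the helper name join_stream is hypothetical, and the chunk shape is an assumption worth checking against the Ollama version on your device.

```python
import json
from typing import Iterable


def join_stream(lines: Iterable[bytes]) -> str:
    """Concatenate "response" fragments from newline-delimited JSON chunks."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break  # assumed: the final chunk is marked with "done": true
    return "".join(parts)
```

Because the object returned by urllib.request.urlopen can be iterated line by line, it can be passed to join_stream directly inside the with block.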

One-shot inference

Choose a Jetson module, adjust optional parameters, then copy the command to run a single inference on the device.

Model Details

Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 1B (gemma-3-1b-it) is listed in this catalog as a text-only model: it does not accept image input. Larger Gemma 3 checkpoints may offer multimodal capabilities separately. Gemma 3 offers a large context window (32K tokens for the 1B size), multilingual support in over 140 languages, and more sizes than previous versions. The 1B size is well suited to text tasks such as question answering, summarization, and reasoning on resource-constrained Jetson devices.

Inputs and outputs

Input:

Text string, such as a question, a prompt, or a document to be summarized
Total input context of 32K tokens for the 1B size

Output:

Generated text in response to the input, such as an answer to a question or a summary of a document
Total output context of 8192 tokens
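The limits above can be applied on the client side before sending a request. The sketch below is illustrative only: it assumes "32K" means 32,768 tokens, treats the input and output limits as independent checks, and expects the caller to supply a prompt token count from whatever tokenizer they use. The names clamp_max_tokens, MAX_INPUT_TOKENS, and MAX_OUTPUT_TOKENS are hypothetical.

```python
MAX_INPUT_TOKENS = 32 * 1024  # assumption: "32K" means 32,768 tokens
MAX_OUTPUT_TOKENS = 8192      # output limit stated above


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Reject over-long prompts and cap the requested output length."""
    if prompt_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"prompt has {prompt_tokens} tokens; limit is {MAX_INPUT_TOKENS}"
        )
    return max(0, min(requested, MAX_OUTPUT_TOKENS))
```

The returned value could be passed as max_tokens in a chat-completions request, so that an over-large request is capped locally instead of being sent as-is.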