Gemma 4 E4B
Google's Gemma 4 E4B variant, served on Jetson through llama.cpp using a Q4_K_M GGUF quantization
Memory Requirement: 8GB RAM
Precision: Q4_K_M GGUF
Size: 5.3GB
Run Command (Container)
sudo docker run -it --rm --pull always --runtime=nvidia --network host -v $HOME/.cache/huggingface:/root/.cache/huggingface ghcr.io/nvidia-ai-iot/llama_cpp:gemma4-jetson-orin llama-server -hf ggml-org/gemma-4-E4B-it-GGUF:Q4_K_M
Model Details
Gemma 4 E4B is a lightweight Gemma 4 model that can be served locally on Jetson with llama.cpp. In Google's launch material, E4B is framed as the stronger edge-focused sibling to E2B, combining on-device efficiency with materially better coding, reasoning, and multimodal performance. Typical use cases include:
- Local coding assistants on Orin Nano, Orin NX, or AGX Orin
- Multimodal document and screen-understanding with optional voice input
- Tool-using assistants that need better reasoning than E2B
- A balanced default for edge AI demos or products that need better quality without moving to the larger models
Inputs and Outputs
Input: Text, image, and audio
Output: Text
Supported Platforms
- Jetson Orin
- Jetson Thor
Inference Engine
This model is configured to run on Jetson with llama.cpp.
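Once the container above is running, llama-server exposes an OpenAI-compatible HTTP API, so the model can be queried like any chat endpoint. A minimal sketch, assuming llama-server's default host and port (localhost:8080) and its `/v1/chat/completions` route; the `ask` helper is illustrative, not part of llama.cpp:

```python
import json
import urllib.request

# Assumption: llama-server is reachable at its default address and exposes
# the OpenAI-compatible chat completions endpoint.
SERVER = "http://localhost:8080/v1/chat/completions"

def build_payload(prompt, max_tokens=256):
    """Build an OpenAI-style chat completion request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask(prompt):
    """POST the prompt to the running llama-server and return the reply text."""
    req = urllib.request.Request(
        SERVER,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

Because the endpoint follows the OpenAI schema, existing OpenAI client libraries can also be pointed at the server by overriding the base URL.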
Official Highlights
- Google’s model card describes E4B as a dense multimodal model with 4.5B effective parameters and 8B parameters including embeddings.
- It supports 128K context, text/image/audio input, function calling, and configurable thinking mode.
- In Google’s published benchmark table, E4B lands well above E2B on reasoning, coding, and vision tasks, making it the better general-purpose edge choice when memory allows.
- Like E2B, E4B includes official support for automatic speech recognition and speech translation on short audio clips.
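Function calling with llama-server follows the OpenAI "tools" request schema. A hedged sketch of what such a request body looks like; the `get_temperature` tool, its parameters, and the helper function are purely illustrative assumptions, not part of the model card:

```python
import json

def tool_call_request(prompt):
    """Build a chat request that offers the model one callable tool,
    using the OpenAI-style "tools" schema that llama-server accepts."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_temperature",  # hypothetical example tool
                    "description": "Read the current temperature of a sensor.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "sensor_id": {"type": "string"},
                        },
                        "required": ["sensor_id"],
                    },
                },
            }
        ],
    }

# Inspect the request body that would be POSTed to /v1/chat/completions.
print(json.dumps(tool_call_request("How hot is sensor A3?"), indent=2))
```

If the model decides to call the tool, the response contains a `tool_calls` entry with the function name and JSON arguments, which the client executes and feeds back as a `tool` role message.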