Gemma 4: The Open Model Revolution You Can Hold in Your Hand! 🚀📱

Hey tech enthusiasts! Ever dreamed of having cutting-edge AI power right in your pocket, running offline, totally customized for your needs? Well, that future just got a whole lot closer! Google DeepMind recently unveiled Gemma 4, their most capable family of open models yet, and it’s set to redefine what’s possible with on-device AI.

Unveiled just a week before the conference, Gemma 4 builds on a legacy that began with Gemma 3 a year earlier. Back then, Gemma 3 models (ranging from 1 billion to 27 billion parameters) were the most capable open models designed to fit on a single consumer GPU. Now, Gemma 4 pushes those boundaries even further!

What Exactly Are “Open Models,” Anyway? 🤔

Before we dive into Gemma 4’s superpowers, let’s clarify what “open models” actually are. These are AI models you can:

  • Download directly.
  • Run on your own infrastructure or devices.
  • Fine-tune for your specific use cases.

It’s all about giving developers and innovators the freedom and flexibility to build incredible things without being tied to external APIs.
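To make that freedom concrete, here’s a minimal “download and run” sketch using the Hugging Face transformers library. The model id is a placeholder for whichever Gemma checkpoint you’ve downloaded, not a real name.

```python
# Minimal local-inference sketch: once the weights are on disk,
# generation involves no external API. The model id is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/<your-gemma-checkpoint>"  # hypothetical placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Write a haiku about open models.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Fine-tuning is the same story: point a standard training loop at the downloaded weights and your own data, no permission required.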

Gemma 4: A New Era of On-Device AI ✨

Gemma 4 represents Google’s most capable family of open models ever released. This impressive lineup spans from a compact 2 billion parameters all the way up to a powerful 32 billion parameters, offering a spectrum of capabilities for diverse applications.

Under the Hood: The “E” Factor & Architecture Magic 🛠️

You might have noticed models like “E2B” or “E4B” in the Gemma 4 family. The “E” here stands for effective. For instance, Gemma E2B actually has around 4 billion parameters, but thanks to a novel architecture called per-layer embeddings, it only loads about 2 billion of them into GPU memory. The rest behaves more like a lookup table, residing in slower storage such as CPU RAM or even disk. This architectural decision is a game-changer, highly optimized for on-device mobile use cases, making these models incredibly fast and efficient on phones and other edge devices.

Want to leverage this? Tools like llama.cpp let you move these per-layer embeddings to the CPU or disk with a simple flag override, so the optimization works out of the box.
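For intuition, here’s a conceptual PyTorch sketch of the idea. The class and variable names are mine, and this is a deliberate simplification rather than the actual Gemma architecture: the per-layer table stays in CPU RAM, and only the few rows a batch needs are copied to the accelerator.

```python
import torch
import torch.nn as nn

class PerLayerEmbedding(nn.Module):
    """Illustrative sketch of a CPU-resident per-layer lookup table."""

    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        # The full table lives on the CPU (it could even be memory-mapped
        # from disk) and never occupies GPU memory.
        self.table = nn.Embedding(vocab_size, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Look up rows on the CPU, then ship only that small activation
        # slice to whichever device the rest of the model runs on.
        rows = self.table(token_ids.to("cpu"))
        return rows.to(token_ids.device)
```

The GPU pays only for the handful of rows it actually uses, which is why the model behaves like a 2-billion-parameter model at inference time.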

Multimodal & Multilingual Powerhouse 🌐

Gemma 4 isn’t just about text; it’s a true multimodal marvel:

  • Multimodal Understanding: Even the smallest models can process images, video, and audio! Think speech recognition, or translating spoken Spanish into transcribed French text, all on your device. The larger models offer extremely capable multimodal understanding, discerning fine-grained details in videos and images, such as pointing out a specific object (say, a llama in a picture) or detecting multiple objects at once (see the sketch after this list).
  • Multilingual Prowess: Trained on over 140 languages and using a tokenizer based on Gemini’s, Gemma 4 is inherently multilingual. It excels even with languages that have few digital resources, such as indigenous languages in Peru or official languages of India, making it an incredible tool for global accessibility and innovation.
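Here’s roughly what that image understanding could look like locally, as a hedged sketch using the transformers image-text-to-text pipeline; the model id and image path are placeholders.

```python
# Local multimodal inference sketch; model id and image are placeholders.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="google/<your-gemma-checkpoint>",  # hypothetical placeholder
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "llama.jpg"},  # local path or URL
        {"type": "text", "text": "What animal is in this picture?"},
    ],
}]

result = pipe(text=messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])
```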

Unleashing Creativity: Agentic & Coding Capabilities 🤖

Gemma 4 showcases astonishing on-device capabilities, proving that powerful AI agents can live right on your phone:

  • On-Device Agents: Imagine an AI agent on your Android phone picking a skill to play the piano and then performing it. Or Gemma coding directly on your device, in airplane mode, with no API calls.
  • Parallel Processing: On a laptop, 10 instances of Gemma can run in parallel, each generating a different SVG image in seconds, all powered by llama.cpp at around 100 tokens per second (sketched in code after this list).
  • Coding & Android Development: Gemma is a strong coding model, capable of agentic tasks and even Android app development, all offline. Google specifically included Android-related datasets and benchmarks during training to ensure its proficiency.

Community & Ecosystem: The True Force Multiplier 🚀

The impact of Gemma 4 is already staggering. In just a week, Gemma 4 base models hit 10 million downloads, and there are already over 1,000 community-built models based on Gemma 4, including quantizations and fine-tunes. The entire Gemma family has surpassed 500 million downloads and boasts over 100,000 models in total!

Open Source Collaboration 🤝

A major win for the community is the shift to an Apache 2.0 license for Gemma 4, offering true open-source flexibility. Google DeepMind actively collaborates with the open-source ecosystem, ensuring seamless integration with popular tools like Unsloth, MLX, llama.cpp, Hugging Face, vLLM, and C Lang. This commitment means developers can leverage Gemma’s capabilities within their existing workflows rather than being forced into new frameworks.

Real-World Impact & Specialized Variants 💡

Beyond the base models, Google has released official Gemma variants:

  • ShieldGemma: A family of models designed for real-world production use cases, helping enforce policies against toxic content.
  • MedGemma: A multimodal, Gemma 3-based model tailored for diverse medical tasks like radiology and chest X-ray understanding.

The community is also pushing boundaries:

  • AI Singapore trains models for Southeast Asian languages.
  • Sarvam in India is building national models for various official languages.
  • DeepMind researchers even used a Gemma 3-based model to propose cancer therapy pathways that were later validated in a lab, showcasing AI’s potential for critical scientific advancements.

The Future is Local: Your AI in Your Pocket 📱

While API-based models still offer the most raw intelligence for certain complex tasks, the rapid progress of open models is undeniable. Imagine:

  • AI assistants in Android Studio working offline.
  • Chrome extensions powered by Gemma that understand your screen.
  • AI for legal review, finance, or any scenario where data privacy and offline access are paramount.

And people are already getting creative: someone even got llama.cpp running on a Nintendo Switch to try Gemma!

The speaker enthusiastically recommends spending an hour exploring the latest open models to grasp their incredible capabilities. We’re hurtling towards a future where extremely capable open models will run directly on our devices, customized with our own data for our specific use cases.

The revolution is here, and it’s personal. So, what are you waiting for? Try the models, build something amazing, and share your creations with the world!
