If you’ve ever built a RAG pipeline, you know the friction: separate models for text, images, audio, and video. You preprocess everything into a common format before you can even start searching. It’s messy, brittle, and expensive to maintain.
Google just shipped something that changes that equation. On March 10, 2026, they released Gemini Embedding 2 — their first natively multimodal embedding model — into public preview. And it’s a meaningful architectural step forward.
What Are Embeddings (And Why Should You Care)?
Embeddings are how AI “understands” content. They convert any piece of data — a sentence, an image, a clip of audio — into a vector of numbers that captures its meaning. When two pieces of content are semantically similar, their vectors sit close together in that mathematical space. That’s the engine behind semantic search, recommendation engines, RAG pipelines, and classification systems.
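To make "close together in that mathematical space" concrete, here is a toy cosine-similarity check in plain Python. The three-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions, but the math is identical:

```python
import math

def cosine_similarity(a, b):
    # Measures how closely two embedding vectors point in the same direction:
    # 1.0 = same meaning, near 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three pieces of content.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(cat, kitten))   # high: semantically close
print(cosine_similarity(cat, invoice))  # low: unrelated
```

Semantic search is just this comparison run at scale: embed the query, embed the corpus, and rank by similarity.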
Most embedding models are unimodal. They’re great at one thing — text, or images — but handling multiple content types requires running different models and hoping the outputs are compatible. That creates significant overhead in real-world applications.
What Gemini Embedding 2 Does Differently
Gemini Embedding 2 maps text, images, video, audio, and documents into a single, unified embedding space. One model. One vector space. No preprocessing required.
Here’s what it supports out of the box:
- Text — up to 8,192 input tokens
- Images — up to 6 images per request (PNG and JPEG)
- Video — up to 120 seconds (MP4 and MOV)
- Audio — natively embedded without transcription — the model understands audio directly
- Documents/PDFs — up to 6 pages per request
- Interleaved inputs — mix multiple modalities in a single request (e.g., image + text together)
- 100+ languages
That last point about audio is worth pausing on. Previous pipelines required you to transcribe audio to text before embedding it — losing nuance and adding latency. Gemini Embedding 2 ingests audio natively, meaning it can capture tone, pacing, and non-verbal signals that transcription strips out.
Matryoshka Representation Learning: Flexible by Design
Gemini Embedding 2 uses Matryoshka Representation Learning (MRL) — a technique that “nests” information so embeddings can be scaled down without retraining. Think of it like Russian nesting dolls.
The model outputs vectors at 3,072 dimensions by default, but you can compress them to 1,536 or 768 dimensions. Smaller vectors mean lower storage costs and faster retrieval — with a controllable tradeoff on accuracy. Google recommends sticking with 3,072 for highest quality, but for high-volume applications where cost matters, 768 is a viable option.
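In practice, scaling an MRL embedding down is just truncation plus re-normalization; no second model call is needed. A minimal sketch with NumPy, using a random stand-in for the 3,072-dimension model output:

```python
import numpy as np

def truncate_embedding(vec, dim):
    # With Matryoshka-trained embeddings, the first `dim` components form a
    # usable lower-dimensional embedding on their own. Re-normalize so that
    # cosine similarity still behaves as expected downstream.
    truncated = np.asarray(vec, dtype=np.float32)[:dim]
    return truncated / np.linalg.norm(truncated)

full = np.random.default_rng(0).normal(size=3072)  # stand-in for a model output
small = truncate_embedding(full, 768)
print(small.shape)  # (768,)
```

At 768 dimensions you store a quarter of the floats per item, which compounds quickly across millions of vectors in an index.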
Seeing It in Action
Google built a lightweight multimodal semantic search demo to show the model’s capabilities. It’s worth a look to understand what cross-modal retrieval actually feels like in practice.
For developers, here’s how you’d embed text, an image, and audio in a single API call:
from google import genai
from google.genai import types

client = genai.Client()

with open("example.png", "rb") as f:
    image_bytes = f.read()
with open("sample.mp3", "rb") as f:
    audio_bytes = f.read()

result = client.models.embed_content(
    model="gemini-embedding-2-preview",
    contents=[
        "What is the meaning of life?",
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        types.Part.from_bytes(data=audio_bytes, mime_type="audio/mpeg"),
    ],
)

print(result.embeddings)
Three modalities, one call, one unified embedding. That’s the shift.
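Once everything lives in one vector space, cross-modal retrieval reduces to nearest-neighbor search over those vectors. Here is a sketch using NumPy with random stand-in vectors; in a real pipeline, each row of the corpus would come from an embed call over a text passage, image, or audio clip:

```python
import numpy as np

def top_k(query_vec, corpus_vecs, k=3):
    # Cosine-similarity search: normalize, take dot products, sort descending.
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    scores = c @ q
    idx = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in idx]

# Stand-in 3,072-dim vectors for a 10-item mixed-media corpus.
rng = np.random.default_rng(42)
corpus = rng.normal(size=(10, 3072))
query = corpus[4] + 0.01 * rng.normal(size=3072)  # a query near item 4
print(top_k(query, corpus))  # item 4 should rank first
```

For production volumes you would hand this search off to a vector store, but the ranking logic is the same.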
Who Should Be Paying Attention
AI/ML developers: This simplifies multimodal RAG dramatically. One model handles retrieval across all your content types — documents, images, audio recordings, video clips — without separate pipelines or compatibility headaches.
Marketing technologists: Imagine a knowledge base that can semantically search across your PDF decks, recorded webinars, product images, and written content simultaneously. That’s now possible with a single embedding layer.
SaaS and product builders: Multimodal search, recommendation systems, and classification across mixed media — without the architecture overhead of stitching together multiple models.
SEO and content teams: As AI-powered search evolves, how your content gets represented in embedding space matters. Understanding the models that power retrieval is becoming a core skill.
Where It Fits in the Ecosystem
Gemini Embedding 2 is available through both the Gemini API and Vertex AI, and it integrates with the major frameworks and vector stores you're probably already using.
Interactive notebooks for both the Gemini API and Vertex AI are available on GitHub if you want to spin up a test project today.
The Bigger Picture
Single-model multimodal embeddings are where the field has been heading for a while. Google shipping them in a production-ready form, with ecosystem integrations already in place, is a meaningful milestone.
The companies that win at AI-powered search and retrieval over the next few years won’t just be the ones with the most data. They’ll be the ones who can represent that data in the richest, most semantically accurate embedding space. Gemini Embedding 2 lowers the barrier to doing that across every content type you work with.
It’s in public preview with free options available. Worth getting your hands on it now.
Start building.