<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>#EdgeAI on Home</title>
    <link>https://yakinin.com/en/tags/%23edgeai/</link>
    <description>Recent content in #EdgeAI on Home</description>
    <generator>Hugo -- 0.148.2</generator>
    <language>en</language>
    <lastBuildDate>Fri, 05 Sep 2025 19:54:00 +0000</lastBuildDate>
    <atom:link href="https://yakinin.com/en/tags/%23edgeai/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Google&#39;s EmbeddingGemma: A New Contender for On-Device RAG</title>
      <link>https://yakinin.com/en/posts/20250905-google-embeddinggemma-on-device-rag/</link>
      <pubDate>Fri, 05 Sep 2025 19:54:00 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250905-google-embeddinggemma-on-device-rag/</guid>
      <description>&lt;p&gt;I usually default to OpenAI for embeddings, but Google&amp;rsquo;s new EmbeddingGemma model is a noteworthy development. It&amp;rsquo;s not just another model; it&amp;rsquo;s a strategic move that shows real promise for improving Retrieval-Augmented Generation (RAG) pipelines, especially in on-device and edge applications.&lt;/p&gt;
&lt;h2 id=&#34;what-is-embeddinggemma&#34;&gt;What is EmbeddingGemma?&lt;/h2&gt;
&lt;p&gt;Google has released EmbeddingGemma as a lightweight, efficient, and multilingual embedding model. At just 308M parameters, it&amp;rsquo;s designed for high performance in resource-constrained environments. This isn&amp;rsquo;t just about making a smaller model; it&amp;rsquo;s about making a &lt;em&gt;capable&lt;/em&gt; small model.&lt;/p&gt;
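&lt;p&gt;To make that concrete, here is a minimal retrieval sketch: embed a query and a few documents, then rank them by cosine similarity. It assumes the &lt;code&gt;sentence-transformers&lt;/code&gt; API and the &lt;code&gt;google/embeddinggemma-300m&lt;/code&gt; checkpoint id, both of which are my assumptions rather than details confirmed in this post.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Minimal sketch: rank documents against a query by cosine similarity.
# The model id &#34;google/embeddinggemma-300m&#34; is assumed for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(&#34;google/embeddinggemma-300m&#34;)

query = &#34;Which embedding models run well on-device?&#34;
docs = [
    &#34;EmbeddingGemma is a 308M-parameter multilingual embedding model.&#34;,
    &#34;RSS is an XML-based format for publishing web feeds.&#34;,
]

# encode() returns one vector for a string, a 2-D array for a list.
query_emb = model.encode(query)
doc_embs = model.encode(docs)

# Cosine similarity: a higher score means a closer match to the query.
scores = doc_embs @ query_emb / (
    np.linalg.norm(doc_embs, axis=1) * np.linalg.norm(query_emb)
)
print(scores)  # the on-device embedding document should rank first
&lt;/code&gt;&lt;/pre&gt;</description>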
    </item>
    <item>
      <title>DeepSeek-V3: A Quiet Release with Impressive Local Performance</title>
      <link>https://yakinin.com/en/posts/20250801-deepseek-v3-local-performance/</link>
      <pubDate>Thu, 27 Mar 2025 11:22:11 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250801-deepseek-v3-local-performance/</guid>
      <description>&lt;div style=&#34;display: flex; justify-content: center; gap: 1em; flex-wrap: wrap;&#34;&gt;
  &lt;img src=&#34;https://yakinin.com/img/20250801-deepseek-v3-local-performance-0.jpg&#34; style=&#34;max-width: 350px; width: 100%;&#34; /&gt;
&lt;/div&gt;
&lt;p&gt;DeepSeek has once again followed its &amp;ldquo;quiet release&amp;rdquo; strategy, making its new DeepSeek-V3-0324 model available on Hugging Face without any major announcements. Instead of marketing hype, they&amp;rsquo;ve simply delivered the model for the community to evaluate.&lt;/p&gt;
&lt;p&gt;I tested the model locally on a Mac Studio equipped with an M3 Ultra chip and saw impressive performance, generating over 20 tokens per second. This is a significant step for running capable models on local hardware, making local inference a viable option for developers.&lt;/p&gt;
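&lt;p&gt;For anyone who wants to reproduce a similar setup, below is a minimal generation sketch using the MLX stack on Apple silicon. The post does not say which runtime was used, so the &lt;code&gt;mlx-lm&lt;/code&gt; library and the quantized checkpoint id are assumptions for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Minimal sketch: local text generation with mlx-lm on Apple silicon.
# The checkpoint id below is assumed; any MLX conversion of
# DeepSeek-V3-0324 would follow the same load/generate pattern.
from mlx_lm import load, generate

model, tokenizer = load(&#34;mlx-community/DeepSeek-V3-0324-4bit&#34;)

prompt = &#34;Summarize the trade-offs of running large language models locally.&#34;

# verbose=True streams tokens and reports a tokens-per-second figure.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256, verbose=True)
print(text)
&lt;/code&gt;&lt;/pre&gt;</description>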
    </item>
  </channel>
</rss>
