Google's EmbeddingGemma: A New Contender for On-Device RAG

I usually default to OpenAI for embeddings, but Google's new EmbeddingGemma model is a noteworthy development. It's not just another model; it's a strategic move with real promise for Retrieval-Augmented Generation (RAG) pipelines, especially in on-device and edge applications.

What is EmbeddingGemma? Google has released it as a lightweight, efficient, multilingual embedding model. At just 308M parameters, it's designed for high performance in resource-constrained environments. This isn't just about making a smaller model; it's about making a capable small model. ...
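The retrieval step such an embedding model powers can be sketched with plain cosine similarity. A minimal illustration, using small dummy vectors as stand-ins for real model outputs (the function name and the toy vectors are my own, not from the post):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    top = np.argsort(scores)[::-1][:k]
    return top, scores[top]

# Dummy 4-dimensional vectors standing in for embedding-model outputs;
# a real model would produce much higher-dimensional vectors.
query = np.array([1.0, 0.0, 0.5, 0.0])
docs = np.array([
    [0.9, 0.1, 0.4, 0.0],   # close to the query
    [0.0, 1.0, 0.0, 0.8],   # unrelated
    [0.5, 0.0, 0.9, 0.1],   # somewhat related
])
idx, scores = cosine_top_k(query, docs, k=2)
print(idx)  # indices of the most similar documents, best first
```

In an on-device RAG pipeline, the document vectors would be precomputed and cached, and only the query would be embedded at request time.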

5 September, 2025 · 2 min · 375 words · Yury Akinin

DeepSeek-V3: A Quiet Release with Impressive Local Performance

DeepSeek has once again followed its "quiet release" strategy, making its new DeepSeek-V3-0324 model available on Hugging Face without any major announcement. Instead of marketing hype, they simply delivered the model for the community to evaluate. I tested it locally on a Mac Studio equipped with an M3 Ultra chip and saw impressive performance, generating over 20 tokens per second. This marks a significant acceleration for running capable models on local hardware, making it a viable option for developers. ...

27 March, 2025 · 1 min · 113 words · Yury Akinin