Practice

Google's EmbeddingGemma: A New Contender for On-Device RAG

I usually default to OpenAI for embeddings, but Google’s new EmbeddingGemma model is a noteworthy development. It’s not just another model; it’s a strategic move that shows real promise for improving Retrieval-Augmented Generation (RAG) pipelines, especially in on-device and edge applications.
What is EmbeddingGemma?
Google has released EmbeddingGemma as a lightweight, efficient, and multilingual embedding model. At just 308M parameters, it’s designed for high performance in resource-constrained environments. This isn’t just about making a smaller model; it’s about making a capable small model. ...

Vector Search Is Reaching Its Limits. Here’s What Comes Next.

Vector databases have become a core component in modern AI, particularly for powering retrieval-augmented generation (RAG) through similarity search. However, as we build more sophisticated applications, the limitations of relying solely on vector representations are becoming clear. From my perspective, the core issue is that advanced AI systems need to understand more than just semantic similarity. They require a richer grasp of data that includes structured attributes, textual precision, and the relationships within and across different modalities like text, images, and video. Relying on basic vector search alone creates significant blind spots. ...

My Take on GPT-5, OpenAI's Strategy, and the Dawn of 'AI Time'

A recent Forbes article by John Sviokla put a name to something many of us in the AI space have been feeling: the shift to AI Time. It’s the idea that the tempo of innovation and organizational operations is no longer dictated by human speed, but by the near-instantaneous cycle of silicon intelligence. OpenAI’s GPT-5 launch is a masterclass in this new reality. It wasn’t a simple model update; it was a multi-front strategic deployment that reshapes the competitive landscape. I see it as a “quadruple play” that establishes a new baseline for the industry. ...

Claude Sonnet 4's 1M Token Window: A Practical Take for Builders

Anthropic just announced a 5x context window increase for Claude Sonnet 4, pushing it to 1 million tokens. While big numbers in AI are common, this move has tangible, practical implications for those of us building complex systems. From my perspective, this isn’t just a quantitative leap; it’s a qualitative one that unlocks a new class of problems we can solve.
Moving from File Analysis to System-Level Understanding
The ability to load an entire codebase (over 75,000 lines with source files, tests, and docs) into a single prompt is a significant shift. Previously, AI code analysis was often limited to individual files or small modules. We could check for errors or refactor a specific function, but the AI lacked a holistic view. ...

MCP: Common Pitfalls and Why It's the Future of AI Integration

While the Model Context Protocol (MCP) is a powerful disruption, its implementation comes with challenges. Avoiding common mistakes is critical to harnessing its full potential.
Common Mistakes to Avoid
1. Poorly Defined Context
The most frequent error is a poorly defined context. The effectiveness of any AI model using MCP depends entirely on the quality, clarity, and relevance of the context it receives.
Static vs. Dynamic Context: A common mistake is hardcoding static values. To be effective, context must be dynamic and reflect real-time system state.
Data Overload or Underload: Sending too much, too little, or irrelevant data leads to degraded performance and unpredictable outputs. Focus on quality over quantity.
2. Neglecting Security
Failure to secure sensitive context information opens the door to significant privacy and compliance risks. It is crucial to enforce strong access controls and data protection from the start, not as an afterthought. ...

Why Docker Calls MCP a 'Security Nightmare'—And How to Fix It

The Model Context Protocol (MCP) was introduced as a universal standard, the “USB-C for AI applications”, to allow AI agents to seamlessly interact with external tools, APIs, and data. Major players like Microsoft, Google, and OpenAI quickly adopted it, and thousands of MCP server tools emerged. The promise was simple: write an integration once, and any AI agent can use it. ...

How I Hire People for My Team

For me, the key is the person, not the resume. The first things I look at are motivation and energy. If someone is indifferent, it’s an immediate “no,” even if they have the right skills. I need to understand what drives them, why they want to be on the team, and what work means to them.
Soft Skills Come First
I prioritize understanding how a candidate thinks, communicates, and reacts to change. I look for initiative, a systematic approach, and the ability to take ownership. If a person just waits to be assigned tasks, they are not the right fit for my team. ...

AVELIN is Live: A Three-Year Journey to a New AI

Today, we are officially launching AVELIN—the Artificial Intelligence my team and I have been building for the last three years. Our journey began with humble pilots, experimenting with the first GPT models and running foundational tests. We quickly evolved from simple, single-model chatbots to developing our own proprietary training system, complete with knowledge ingestion, document storage, and our first implementations of Retrieval-Augmented Generation (RAG). ...

A Founder's Diary: 10 Days to the AVELIN Launch

I wanted to share what the final stretch before launching an AI product feels like. With just 10 days until AVELIN goes public, the pressure is immense, and the reality is a mix of controlled chaos and sharp focus. Our team is feeling the strain. As tasks pile up and deadlines loom, fatigue is a real factor. We’re constantly fighting the dilemma of adding ‘just one more thing’ versus locking in the release date and moving forward. ...

AI Startup Diary #2: The Invisible Work is What Matters Most

Over the past few days, our team has pushed through a massive amount of work on A.V.E.L.I.N. This is a crucial stage where the product changes very little on the surface, but internally, we’re implementing dozens of architectural decisions, refining core logic, and running extensive tests. A.V.E.L.I.N is learning to understand not just words, but intent. It can already select the most effective model for a given context and analyze queries from voice and video, not only text. We are intensely focused on making the interaction feel fluid and organic. ...