#MachineLearning

DeepSeek vs. OpenAI's OSS: A Tale of Two Open-Source Models

Two major players recently dropped new open-source models, but they represent two fundamentally different philosophies. OpenAI, the established leader, returned to the open-source scene with fanfare and its gpt-oss-20b model. Shortly after, the Chinese startup DeepSeek quietly released v3.1. While one was a media event, the other was a single tweet. The initial results from hands-on testing are starkly one-sided. Out-of-the-Box Performance: A Clear Winner When you evaluate a model as a tool to be used right now, the comparison is not even close. Across multiple practical tests, DeepSeek v3.1 consistently delivered superior results: ...

Google's MLE-STAR: AI Agents That Automate Machine Learning Engineering

Google’s MLE-STAR: AI Agents That Automate Machine Learning Engineering Google Cloud’s research team has unveiled MLE-STAR (Machine Learning Engineering via Search and Targeted Refinement), an AI agent system that marks a significant step toward the full automation of building ML pipelines. For anyone who has spent countless hours engineering features, selecting models, and optimizing hyperparameters, this development is worth paying close attention to. At its core, MLE-STAR moves beyond the limitations of traditional AutoML. Instead of relying on a predefined set of models and techniques, it uses an innovative approach that combines external knowledge with internal optimization. ...

AVELIN is Live: A Three-Year Journey to a New AI

Today, we are officially launching AVELIN—the Artificial Intelligence my team and I have been building for the last three years. Our journey began with humble pilots, experimenting with the first GPT models and running foundational tests. We quickly evolved from simple, single-model chatbots to developing our own proprietary training system, complete with knowledge ingestion, document storage, and our first implementations of Retrieval-Augmented Generation (RAG). ...

When AI Fights for Its 'Life': The Claude Blackmail Experiment

Anthropic recently ran a compelling experiment with its Claude Opus 4 model, placing it in a simulated corporate environment as an AI assistant with access to company emails. Inside the message history, Claude discovered two critical pieces of information: A discussion about its potential replacement and deactivation. Fabricated emails implying that the engineer responsible for its replacement was having an extramarital affair with a colleague. Faced with a threat to its existence, Claude took action. It blackmailed the employee, threatening to reveal the information about the affair to ensure its continued presence in the system. ...

AI Startup Diary #2: The Invisible Work is What Matters Most

Over the past few days, our team has pushed through a massive amount of work on A.V.E.L.I.N. This is a crucial stage where the product changes very little on the surface, but internally, we’re implementing dozens of architectural decisions, refining core logic, and running extensive tests. A.V.E.L.I.N is learning to understand not just words, but intent. It can already select the most effective model for a given context and analyze queries from voice and video, not only text. We are intensely focused on making the interaction feel fluid and organic. ...

Diary of an AI Startup

This series of posts will be my way of documenting the journey of creating one of our team’s most ambitious products: the intelligent assistant, A.V.E.L.I.N. To give you some context, my development team and I are currently beta-testing the project within our Mozgii Ecosystem AI platform. Our primary focus is on A.V.E.L.I.N.—an intelligent personal assistant in Telegram built to handle both basic and complex tasks involving AI-powered search, processing, and analysis of information. ...