<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>#LLM on Home</title>
    <link>https://yakinin.com/en/tags/%23llm/</link>
    <description>Recent content in #LLM on Home</description>
    <generator>Hugo -- 0.148.2</generator>
    <language>en</language>
    <lastBuildDate>Wed, 27 Aug 2025 08:45:15 +0000</lastBuildDate>
    <atom:link href="https://yakinin.com/en/tags/%23llm/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>DeepSeek vs. OpenAI&#39;s OSS: A Tale of Two Open-Source Models</title>
      <link>https://yakinin.com/en/posts/20250827-deepseek-vs-openai-open-source-models/</link>
      <pubDate>Wed, 27 Aug 2025 08:45:15 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250827-deepseek-vs-openai-open-source-models/</guid>
      <description>&lt;p&gt;Two major players recently dropped new open-source models, but they represent two fundamentally different philosophies. OpenAI, the established leader, returned to the open-source scene with fanfare, releasing its &lt;code&gt;gpt-oss-20b&lt;/code&gt; model. Shortly after, the Chinese startup DeepSeek quietly released &lt;code&gt;v3.1&lt;/code&gt;. While one was a media event, the other was a single tweet.&lt;/p&gt;
&lt;p&gt;The initial results from hands-on testing are starkly one-sided.&lt;/p&gt;
&lt;h2 id=&#34;out-of-the-box-performance-a-clear-winner&#34;&gt;Out-of-the-Box Performance: A Clear Winner&lt;/h2&gt;
&lt;p&gt;When you evaluate a model as a tool to be used right now, the comparison is not even close. Across multiple practical tests, DeepSeek v3.1 consistently delivered superior results.&lt;/p&gt;</description>
    </item>
    <item>
      <title>OpenAI&#39;s GPT-OSS: A Major Step Back Towards &#39;Open&#39;</title>
      <link>https://yakinin.com/en/posts/20250813-openai-gpt-oss-northflank/</link>
      <pubDate>Wed, 13 Aug 2025 15:55:16 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250813-openai-gpt-oss-northflank/</guid>
      <description>&lt;p&gt;OpenAI just made a significant move by releasing GPT-OSS, its first truly open-source large language model family since GPT-2. With a permissive Apache 2.0 license, this isn&amp;rsquo;t just a minor release; it&amp;rsquo;s a fundamental shift that puts real power back into the hands of developers.&lt;/p&gt;
&lt;p&gt;The family includes two Mixture-of-Experts (MoE) models, gpt-oss-20b and gpt-oss-120b, designed for high-performance inference with strong reasoning capabilities.&lt;/p&gt;
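&lt;p&gt;As a quick illustration of what local use could look like, here is a minimal sketch with the Hugging Face &lt;code&gt;transformers&lt;/code&gt; pipeline. The model id &lt;code&gt;openai/gpt-oss-20b&lt;/code&gt; and the memory assumptions are mine, so verify them against the actual Hugging Face listing:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Minimal local-inference sketch. Assumes the weights are published on
# Hugging Face as openai/gpt-oss-20b and that enough GPU or unified memory
# is available for the 20B MoE checkpoint.
from transformers import pipeline

generate = pipeline(
    &#39;text-generation&#39;,
    model=&#39;openai/gpt-oss-20b&#39;,
    device_map=&#39;auto&#39;,  # spread the weights across available devices
)

out = generate(&#39;Explain Mixture-of-Experts routing in one sentence.&#39;,
               max_new_tokens=64)
print(out[0][&#39;generated_text&#39;])
&lt;/code&gt;&lt;/pre&gt;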
&lt;h2 id=&#34;why-this-is-a-game-changer&#34;&gt;Why This Is a Game-Changer&lt;/h2&gt;
&lt;p&gt;For years, the most powerful models from OpenAI have been locked behind APIs. This meant dealing with rate limits, opaque pricing, and sending potentially sensitive data to a third party. GPT-OSS changes that equation entirely.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Claude Sonnet 4&#39;s 1M Token Window: A Practical Take for Builders</title>
      <link>https://yakinin.com/en/posts/20250813-claude-sonnet-4-1m-context/</link>
      <pubDate>Wed, 13 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250813-claude-sonnet-4-1m-context/</guid>
      <description>&lt;p&gt;Anthropic just announced a 5x context window increase for Claude Sonnet 4, pushing it to 1 million tokens. While big numbers in AI are common, this move has tangible, practical implications for those of us building complex systems.&lt;/p&gt;
&lt;p&gt;From my perspective, this isn&amp;rsquo;t just a quantitative leap; it&amp;rsquo;s a qualitative one that unlocks a new class of problems we can solve.&lt;/p&gt;
&lt;h3 id=&#34;moving-from-file-analysis-to-system-level-understanding&#34;&gt;Moving from File Analysis to System-Level Understanding&lt;/h3&gt;
&lt;p&gt;The ability to load an entire codebase—over 75,000 lines with source files, tests, and docs—into a single prompt is a significant shift. Previously, AI code analysis was often limited to individual files or small modules. We could check for errors or refactor a specific function, but the AI lacked a holistic view.&lt;/p&gt;
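&lt;p&gt;For concreteness, here is a minimal sketch of pushing a whole repository through the API with the &lt;code&gt;anthropic&lt;/code&gt; Python SDK. The model id and the beta header for the 1M window are assumptions based on the announcement, so verify both against Anthropic&#39;s current docs:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Sketch: send an entire codebase to Claude Sonnet 4 in one request.
# Assumed: model id &#39;claude-sonnet-4-20250514&#39; and beta flag
# &#39;context-1m-2025-08-07&#39;. A real version should also skip binaries,
# respect .gitignore, and count tokens before sending.
from pathlib import Path
import anthropic

parts = []
for path in sorted(Path(&#39;my-project&#39;).rglob(&#39;*.py&#39;)):
    parts.append(&#39;### &#39; + str(path) + &#39;\n&#39; + path.read_text(errors=&#39;ignore&#39;))
corpus = &#39;\n\n&#39;.join(parts)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
reply = client.messages.create(
    model=&#39;claude-sonnet-4-20250514&#39;,
    max_tokens=2048,
    extra_headers={&#39;anthropic-beta&#39;: &#39;context-1m-2025-08-07&#39;},
    messages=[{
        &#39;role&#39;: &#39;user&#39;,
        &#39;content&#39;: &#39;Map the dependencies between modules in this codebase:\n\n&#39; + corpus,
    }],
)
print(reply.content[0].text)
&lt;/code&gt;&lt;/pre&gt;</description>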
    </item>
    <item>
      <title>Claude Opus 4.1: A Focused Upgrade on Coding and a Measured Stance on Autonomy</title>
      <link>https://yakinin.com/en/posts/20250806-claude-opus-4-1-update/</link>
      <pubDate>Wed, 06 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250806-claude-opus-4-1-update/</guid>
      <description>&lt;p&gt;Anthropic has released Claude Opus 4.1, an incremental but important update that sharpens its flagship model&amp;rsquo;s capabilities in specific, high-value areas: agentic tasks, real-world coding, and reasoning. This isn&amp;rsquo;t a complete overhaul, but a focused enhancement for professional and development use cases.&lt;/p&gt;
&lt;h2 id=&#34;enhanced-coding-and-reasoning&#34;&gt;Enhanced Coding and Reasoning&lt;/h2&gt;
&lt;p&gt;The primary upgrade is in coding performance. Opus 4.1 achieves a 74.5% score on the SWE-bench Verified benchmark. Digging into the technical details, it solved an average of 18.4 problems on the hard subset, up from 16.6 for Claude Opus 4.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep Research: From Information Hunter to Strategic Co-Pilot</title>
      <link>https://yakinin.com/en/posts/20250414-deep-research-science-impact/</link>
      <pubDate>Mon, 14 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250414-deep-research-science-impact/</guid>
      <description>&lt;h2 id=&#34;your-thought-process-packaged&#34;&gt;Your Thought Process, Packaged&lt;/h2&gt;
&lt;p&gt;Deep Research isn&amp;rsquo;t just another AI feature; it&amp;rsquo;s a fundamental shift toward an agent-based architecture. In this model, the LLM stops being a simple chatbot and becomes a co-author—an agent that independently searches, filters, validates, and structures information.&lt;/p&gt;
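&lt;p&gt;The control flow behind such an agent fits in a few lines. The sketch below is purely illustrative: every helper is a stub standing in for an LLM call or a search API, not any vendor&#39;s actual interface.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Illustrative agent loop for a Deep-Research-style feature. Each helper is
# a stub; in a real system it would wrap an LLM call or a retrieval API.
def plan_queries(question, findings):
    return [] if findings else [question]       # stub: stop after one round

def search(query):
    return [&#39;source discussing: &#39; + query]      # stub retrieval

def validate(sources):
    return sources                              # stub cross-checking of claims

def structure_report(question, findings):
    return {&#39;question&#39;: question, &#39;sources&#39;: findings}

def deep_research(question, max_rounds=5):
    findings = []
    for _ in range(max_rounds):
        queries = plan_queries(question, findings)  # agent decides what to look up
        if not queries:
            break                                   # ...and when it has enough
        for q in queries:
            findings.extend(validate(search(q)))    # search, filter, validate
    return structure_report(question, findings)     # structured, cited output

print(deep_research(&#39;Which pricing models fit a research product?&#39;))
&lt;/code&gt;&lt;/pre&gt;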
&lt;p&gt;What does this change? If you&amp;rsquo;re designing a business, a startup, or a product, you don&amp;rsquo;t have time to personally read 200 sources. Now, an AI agent does it for you. This frees you up to do the high-value work: &lt;strong&gt;to think, not just to search.&lt;/strong&gt;&lt;/p&gt;</description>
    </item>
    <item>
      <title>DeepSeek-V3: A Quiet Release with Impressive Local Performance</title>
      <link>https://yakinin.com/en/posts/20250801-deepseek-v3-local-performance/</link>
      <pubDate>Thu, 27 Mar 2025 11:22:11 +0000</pubDate>
      <guid>https://yakinin.com/en/posts/20250801-deepseek-v3-local-performance/</guid>
      <description>&lt;div style=&#34;display: flex; justify-content: center; gap: 1em; flex-wrap: wrap;&#34;&gt;
  &lt;img src=&#34;https://yakinin.com/img/20250801-deepseek-v3-local-performance-0.jpg&#34; style=&#34;max-width: 350px; width: 100%;&#34; /&gt;
&lt;/div&gt;
&lt;p&gt;DeepSeek has once again followed its &amp;ldquo;quiet release&amp;rdquo; strategy, making its new DeepSeek-V3-0324 model available on Hugging Face without any major announcements. Instead of marketing hype, they&amp;rsquo;ve simply delivered a solution for the community to evaluate.&lt;/p&gt;
&lt;p&gt;I tested the model locally on a Mac Studio equipped with an M3 Ultra chip and saw impressive performance, generating over 20 tokens per second. This marks a significant acceleration for running capable models on local hardware, making it a viable option for developers.&lt;/p&gt;
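&lt;p&gt;For anyone who wants to try a comparable setup, here is a minimal sketch with &lt;code&gt;mlx-lm&lt;/code&gt; on Apple silicon. The 4-bit community conversion named below is an assumption, and the exact quantization behind the tokens-per-second figure was not recorded, so treat this as one plausible configuration:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# One plausible local setup on Apple silicon using mlx-lm. The repo name
# assumes a community 4-bit MLX conversion exists; it is not necessarily
# the configuration behind the numbers above.
from mlx_lm import load, generate

model, tokenizer = load(&#39;mlx-community/DeepSeek-V3-0324-4bit&#39;)
text = generate(
    model,
    tokenizer,
    prompt=&#39;Summarize the idea of a quiet release in two sentences.&#39;,
    max_tokens=128,
    verbose=True,  # prints generation speed in tokens per second
)
&lt;/code&gt;&lt;/pre&gt;</description>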
    </item>
  </channel>
</rss>
