<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep Learning on AI Brief | AI-101.tech</title><link>https://AI-101.tech/tags/deep-learning/</link><description>Recent content in Deep Learning on AI Brief | AI-101.tech</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 01 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://AI-101.tech/tags/deep-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Architecture Deep Dive: From Transformer to MoE Evolution</title><link>https://AI-101.tech/research/2026-04-01-llm-architecture-deep-dive/</link><pubDate>Wed, 01 Apr 2026 00:00:00 +0000</pubDate><guid>https://AI-101.tech/research/2026-04-01-llm-architecture-deep-dive/</guid><description>&lt;h2 id="1-transformer-architecture-the-big-bang-of-modern-ai">1. Transformer Architecture: The Big Bang of Modern AI&lt;/h2>
&lt;p>Before the 2017 publication of &amp;ldquo;Attention Is All You Need,&amp;rdquo; natural language processing (NLP) relied mainly on Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs). But an RNN&amp;rsquo;s strictly sequential processing carried two fatal flaws: it struggles to capture long-range semantic dependencies, and it cannot exploit the massive parallelism of modern GPUs. The Transformer changed everything.&lt;/p>
&lt;h3 id="11-the-mathematical-essence-of-attention">1.1 The Mathematical Essence of Attention&lt;/h3>
&lt;p>The soul of the Transformer is &lt;strong>Self-Attention&lt;/strong>. Its core idea: each token in a sequence builds its representation as a weighted combination of all the tokens in its context, with the weights reflecting how relevant each of those tokens is to it.&lt;/p></description></item></channel></rss>