The Most Underrated Layer Inside Every AI Model

Jia-Bin Huang
Published on 4.05.2026

Why does every AI model use normalization? 🤔

Normalization layers are hidden inside every transformer model.
In this video, we explore:
- Why does AI training become unstable without normalization?
- What do normalization layers actually do?
- How can simple alternatives match normalization layers?
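As context for the questions above, here is a minimal sketch of what a standard normalization layer (LayerNorm, per the reference below) computes: each token's feature vector is rescaled to zero mean and unit variance, then a learnable per-feature scale and shift is applied. The function and variable names are illustrative, not from the video.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each row (token) to zero mean and unit variance
    # over its features, then apply learnable scale/shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Two tokens with 4 features each; gamma/beta start at identity.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 0.0, -10.0, 5.0]])
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
```

Regardless of how wildly the raw activations vary between tokens, each output row comes out with roughly zero mean and unit variance, which is what keeps gradient updates well-scaled during training.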

Chapters:
00:00 Introduction
00:35 Gradient updates without normalization
02:45 Types of normalization layers
03:28 Visualization of normalization layers
05:10 S-shaped input-output mapping
06:46 Dynamic Tanh (DyT) layer
07:44 Comparison: DyT versus LayerNorm
09:41 Why does DyT work?
10:21 Implementing DyT layer
11:06 Searching for a stronger function
12:07 Comparison: Derf versus DyT and LayerNorm
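For readers skimming before watching: the Dynamic Tanh (DyT) layer discussed in the chapters above replaces normalization with an elementwise squashing function, y = γ · tanh(α·x) + β, where α is a learnable scalar and γ, β are per-feature parameters (see the DyT reference below). A rough numpy sketch, with illustrative names:

```python
import numpy as np

def dyt(x, alpha, gamma, beta):
    # Dynamic Tanh: no per-token statistics are computed.
    # A single learnable scalar alpha controls the steepness of the
    # S-shaped curve; gamma/beta play the same role as LayerNorm's
    # affine scale and shift.
    return gamma * np.tanh(alpha * x) + beta

# With gamma=1, beta=0, outputs are squashed into (-1, 1) even for
# extreme activations, mimicking the S-shaped input-output mapping
# that trained normalization layers exhibit.
x = np.array([[-100.0, -1.0, 0.0, 1.0, 100.0]])
y = dyt(x, alpha=0.5, gamma=np.ones(5), beta=np.zeros(5))
```

The key contrast with LayerNorm is that DyT is purely pointwise: it needs no mean or variance reduction across features, which is why it is cheap and simple to implement.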

References:
[Layer Norm] https://arxiv.org/abs/1607.06450
[RMS Norm] https://proceedings.neurips.cc/paper/2019/hash/1e8a19426224ca89e83cef47f1e7f53b-Abstract.html
[Batch Norm] https://arxiv.org/abs/1502.03167
[Instance Norm] https://arxiv.org/abs/1607.08022
[DyT] https://arxiv.org/abs/2503.10622
[Derf] https://arxiv.org/abs/2512.10938

Video made with Manim: https://www.manim.community/

Runtime 00:13:24
