Chris McCormick    Live Walkthroughs    Support My Work    Archive

Watch, Code, Master: ML tutorials that actually work → Start learning today!

Patterns and Messages - Part 2 - Token Communication

In the previous post, we looked at how our tendency to think of Attention in terms of large matrix multiplications obscures some key insights, and we rolled back the GPU optimizations in order to reveal them (recapped in the next section).