
Attention in Transformers: Concepts and Code in PyTorch

The attention mechanism was a breakthrough that led to transformers, the architecture powering large language models like ChatGPT. Introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al., transformers took off because of their highly scalable design.

In this course, you'll learn how the attention mechanism, a key element of transformer-based LLMs, works, and you'll implement it in PyTorch. You'll develop deep intuition for building reliable, functional, and scalable AI applications.

What you will do

  • Understand the evolution of the attention mechanism, a key breakthrough that led to transformers.
  • Learn the relationships between word embeddings, positional embeddings, and attention.
  • Learn about the Query, Key, and Value matrices, and how to produce and use them in attention.
  • Walk through the math required to calculate self-attention and masked self-attention, and learn why and how they work (the core formula is shown after this list).
  • Understand the difference between self-attention and masked self-attention: self-attention is used in the encoder to build context-aware embeddings, while masked self-attention is used in the decoder to generate output.
  • Learn the details of the encoder-decoder architecture, cross-attention, and multi-head attention, and how they all come together in a transformer.
  • Use PyTorch to code a class that implements self-attention, masked self-attention, and multi-head attention (minimal sketches follow this list).
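
For orientation, the scaled dot-product attention that the course derives is the formula from "Attention Is All You Need", where Q, K, and V are the Query, Key, and Value matrices and $d_k$ is the dimension of the key vectors:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$$

Dividing by $\sqrt{d_k}$ keeps the dot products in a range where the softmax produces useful gradients.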
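
To make the list above concrete, here is a minimal single-head sketch of self-attention and masked self-attention in PyTorch. It is an illustration under simplifying assumptions, not the course's own code; the class name SelfAttention and the causal flag are choices made for this example.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    # Minimal single-head self-attention with an optional causal mask.
    # Illustrative sketch only; the course's own class may differ.
    def __init__(self, d_model):
        super().__init__()
        # Learned projections that produce the Query, Key, and Value matrices.
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.d_model = d_model

    def forward(self, x, causal=False):
        # x: (batch, seq_len, d_model) token embeddings (plus positional encodings)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        # Scaled dot-product scores between every query and every key.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_model)
        if causal:
            # Masked self-attention (decoder style): block attention to
            # future positions so generation cannot look ahead.
            seq_len = x.size(1)
            future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                           device=x.device), diagonal=1)
            scores = scores.masked_fill(future, float("-inf"))
        weights = F.softmax(scores, dim=-1)  # each row sums to 1
        return weights @ v                   # context-aware embeddings

# Usage: 2 sequences of 5 tokens, 16-dimensional embeddings.
attn = SelfAttention(d_model=16)
x = torch.randn(2, 5, 16)
enc = attn(x)               # self-attention, as in an encoder
dec = attn(x, causal=True)  # masked self-attention, as in a decoder
```

Passing causal=True gives the decoder-style behavior: each position can attend only to itself and to earlier positions.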
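
Multi-head attention runs several such attention computations in parallel, each on its own slice of the embedding, and recombines the results with an output projection. Again a minimal sketch with illustrative names (MultiHeadAttention, W_o), assuming num_heads divides d_model:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    # Minimal multi-head self-attention sketch; num_heads must divide d_model.
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)
        self.W_o = nn.Linear(d_model, d_model, bias=False)  # recombines heads

    def forward(self, x):
        batch, seq_len, d_model = x.shape

        def split_heads(t):
            # (batch, seq, d_model) -> (batch, heads, seq, d_head)
            return t.view(batch, seq_len, self.num_heads, self.d_head).transpose(1, 2)

        q = split_heads(self.W_q(x))
        k = split_heads(self.W_k(x))
        v = split_heads(self.W_v(x))
        # Each head attends independently over its slice of the embedding.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        weights = F.softmax(scores, dim=-1)
        out = weights @ v                                  # (batch, heads, seq, d_head)
        out = out.transpose(1, 2).reshape(batch, seq_len, d_model)
        return self.W_o(out)

mha = MultiHeadAttention(d_model=16, num_heads=4)
y = mha(torch.randn(2, 5, 16))  # same shape in and out: (2, 5, 16)
```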
