Attention in Transformers: Concepts and Code in PyTorch
The attention mechanism was a breakthrough that led to transformers, the architecture powering large language models like ChatGPT. Transformers, introduced in the 2017 paper "Attention Is All You Need" by Vaswani et al., took off because of their highly scalable design.
In this course, you’ll learn how the attention mechanism, a key element of transformer-based LLMs, works and implement it in PyTorch. You'll develop deep intuition about building reliable, functional, and scalable AI applications.
What you will do
- Understand the evolution of the attention mechanism, a key breakthrough that led to transformers.
- Learn the relationships between word embeddings, positional embeddings, and attention.
- Learn about the Query, Key, and Value matrices, and how to produce and use them in attention.
- Walk through the math required to calculate self-attention and masked self-attention to learn why and how they work (the core formula is sketched after this list).
- Understand the difference between self-attention and masked self-attention: the encoder uses the former to build context-aware embeddings, while the decoder uses the latter for generative outputs.
- Learn the details of the encoder-decoder architecture, cross-attention, and multi-head attention, and how they are all incorporated into a transformer.
- Use PyTorch to code a class that implements self-attention, masked self-attention, and multi-head attention (a minimal code sketch follows this list).
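To make the math concrete, the core operation the items above build on is scaled dot-product attention. In standard notation (Q, K, and V are the Query, Key, and Value matrices and d_k is the key dimension; this is the textbook form, not course-specific notation):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

Masked self-attention uses the same formula but sets the scores for future positions to negative infinity before the softmax, so each token can only attend to itself and earlier tokens.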
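As a preview of the kind of class you'll code, here is a minimal PyTorch sketch of single-head self-attention with an optional causal mask for the masked variant. The class name, dimensions, and layout are illustrative assumptions, not the course's actual implementation:

```python
# Minimal sketch of single-head self-attention; names and dimensions
# are illustrative, not the course's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Linear projections that produce the Query, Key, and Value
        # matrices from the input token embeddings.
        self.W_q = nn.Linear(d_model, d_model, bias=False)
        self.W_k = nn.Linear(d_model, d_model, bias=False)
        self.W_v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor, causal_mask: bool = False) -> torch.Tensor:
        # x has shape (batch, seq_len, d_model).
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        # Pairwise similarity scores between tokens, scaled by sqrt(d_k).
        scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
        if causal_mask:
            # Masked self-attention: block each position from attending
            # to later tokens (the decoder's generative setting).
            seq_len = x.size(-2)
            future = torch.triu(
                torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
                diagonal=1,
            )
            scores = scores.masked_fill(future, float("-inf"))
        # Each row of weights sums to 1; the output is a weighted
        # combination of the Value vectors.
        weights = F.softmax(scores, dim=-1)
        return weights @ v

# Quick check: a batch of 1 sentence, 4 tokens, embedding size 8.
x = torch.randn(1, 4, 8)
attn = SelfAttention(d_model=8)
print(attn(x, causal_mask=True).shape)  # torch.Size([1, 4, 8])
```

Multi-head attention, which the course also covers, runs several smaller attention operations like this in parallel and concatenates their outputs.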