Table of contents

Preface

Notation

3. Linear Neural Networks

3.1. Linear Regression

3.2. Linear Regression Implementation from Scratch

3.3. Concise Implementation of Linear Regression

3.4. Softmax Regression

3.5. The Image Classification Dataset

3.6. Implementation of Softmax Regression from Scratch

3.7. Concise Implementation of Softmax Regression

4. Multilayer Perceptrons

4.1. Multilayer Perceptrons

4.2. Implementation of Multilayer Perceptrons from Scratch

4.3. Concise Implementation of Multilayer Perceptrons

4.4. Model Selection, Underfitting, and Overfitting

4.5. Weight Decay

4.6. Dropout

4.7. Forward Propagation, Backward Propagation, and Computational Graphs

4.8. Numerical Stability and Initialization

4.9. Environment and Distribution Shift

4.10. Predicting House Prices on Kaggle

5. Deep Learning Computation

5.1. Layers and Blocks

5.2. Parameter Management

5.3. Deferred Initialization

5.4. Custom Layers

5.5. File I/O

5.6. GPUs

6. Convolutional Neural Networks

6.1. From Fully-Connected Layers to Convolutions

6.2. Convolutions for Images

6.3. Padding and Stride

6.4. Multiple Input and Multiple Output Channels

6.5. Pooling

6.6. Convolutional Neural Networks (LeNet)

7. Modern Convolutional Neural Networks

7.1. Deep Convolutional Neural Networks (AlexNet)

7.2. Networks Using Blocks (VGG)

7.3. Network in Network (NiN)

7.4. Networks with Parallel Concatenations (GoogLeNet)

7.5. Batch Normalization

7.6. Residual Networks (ResNet)

7.7. Densely Connected Networks (DenseNet)

8. Recurrent Neural Networks

8.1. Sequence Models

8.2. Text Preprocessing

8.3. Language Models and the Dataset

8.4. Recurrent Neural Networks

8.5. Implementation of Recurrent Neural Networks from Scratch

8.6. Concise Implementation of Recurrent Neural Networks

8.7. Backpropagation Through Time

9. Modern Recurrent Neural Networks

9.1. Gated Recurrent Units (GRU)

9.2. Long Short-Term Memory (LSTM)

9.3. Deep Recurrent Neural Networks

9.4. Bidirectional Recurrent Neural Networks

9.5. Machine Translation and the Dataset

9.6. Encoder-Decoder Architecture

9.7. Sequence to Sequence Learning

9.8. Beam Search

10. Attention Mechanisms

10.1. Attention Cues

10.2. Attention Pooling: Nadaraya-Watson Kernel Regression

10.3. Attention Scoring Functions

10.4. Bahdanau Attention

10.5. Multi-Head Attention

10.6. Self-Attention and Positional Encoding

10.7. Transformer

11. Optimization Algorithms

11.1. Optimization and Deep Learning

11.2. Convexity

11.3. Gradient Descent

11.4. Stochastic Gradient Descent

11.5. Minibatch Stochastic Gradient Descent

11.6. Momentum

11.7. Adagrad

11.8. RMSProp

11.9. Adadelta

11.10. Adam

11.11. Learning Rate Scheduling

12. Computational Performance

12.1. Compilers and Interpreters

12.2. Asynchronous Computation

12.3. Automatic Parallelism

12.4. Hardware

12.5. Training on Multiple GPUs

12.6. Concise Implementation for Multiple GPUs

12.7. Parameter Servers

16. Recommender Systems

16.1. Overview of Recommender Systems

16.2. The MovieLens Dataset

16.3. Matrix Factorization

16.4. AutoRec: Rating Prediction with Autoencoders

16.5. Personalized Ranking for Recommender Systems

16.6. Neural Collaborative Filtering for Personalized Ranking

16.7. Sequence-Aware Recommender Systems

16.8. Feature-Rich Recommender Systems

16.9. Factorization Machines

16.10. Deep Factorization Machines

17. Generative Adversarial Networks

17.1. Generative Adversarial Networks

17.2. Deep Convolutional Generative Adversarial Networks

Learning To See

The Math of Intelligence

Machine Learning with Python

AI Conferences

Artificial Intelligence Search Methods For Problem Solving