1. Network’s Building Blocks

A deep neural network is composed of multiple layers that perform both linear and nonlinear operations. Let’s print out AlexNet’s layers to see what makes up its architecture:

import torchvision
torchvision.models.alexnet(weights=None)
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
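
As a quick sanity check, we can pass a random tensor through the model and confirm it produces one score per class. This is a minimal sketch, assuming the standard 3×224×224 input size AlexNet was designed for and the default 1000 output classes:

import torch
import torchvision

# Create an untrained AlexNet and a batch with one random 3-channel, 224x224 "image"
model = torchvision.models.alexnet(weights=None)
x = torch.randn(1, 3, 224, 224)

# The forward pass runs the features, avgpool, and classifier blocks in turn
logits = model(x)
print(logits.shape)  # torch.Size([1, 1000]) -- one score per class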

We can observe that AlexNet’s architecture is made up of six different types of layers, summarised here and combined into a small example after the list:

  • Conv2d: A convolutional layer that extracts features from the input image.

  • ReLU: An activation function that introduces non-linearity into the model.

  • MaxPool2d: A pooling layer that reduces the spatial dimensions of the feature maps.

  • AdaptiveAvgPool2d: A pooling layer that averages over spatial regions to produce feature maps of a fixed output size, regardless of the input size.

  • Dropout: A regularisation layer that helps prevent overfitting by randomly setting some activations to zero.

  • Linear: A fully connected layer that maps features to the output classes.
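
To see how these pieces fit together, here is a minimal sketch of a toy classifier (not part of AlexNet, with made-up channel counts and 10 hypothetical output classes) that combines all six layer types in an nn.Sequential; an nn.Flatten is added so the feature maps can feed into the fully connected layer:

import torch
from torch import nn

# A toy network that uses each of the six layer types seen in AlexNet
tiny_net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1),  # feature extraction
    nn.ReLU(inplace=True),                                 # non-linearity
    nn.MaxPool2d(kernel_size=2, stride=2),                 # spatial downsampling
    nn.AdaptiveAvgPool2d(output_size=(4, 4)),              # fixed-size feature maps
    nn.Flatten(),                                          # flatten maps for the linear layer
    nn.Dropout(p=0.5),                                     # regularisation
    nn.Linear(8 * 4 * 4, 10),                              # map features to 10 hypothetical classes
)

# Two random 3-channel, 32x32 inputs produce two 10-dimensional score vectors
print(tiny_net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])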

The goal of this chapter is to familiarise you with these basic operations that are common to all deep neural networks and show you how to use them to design your own architectures. In the following notebooks, we will explore these operations in detail: