1.6. Dropout#
Dropout is a regularisation technique that randomly sets a fraction of input units to zero during training. Temporarily removing neurons and their connections makes the network less dependent on specific features, which improves its ability to generalise to unseen data.
It is a simple yet powerful way to reduce overfitting: dropout prevents the co-adaptation of neurons, forcing them to learn more robust and independent representations. The method was introduced by Hinton et al. (2012) in their paper Improving neural networks by preventing co-adaptation of feature detectors.
Key points:

- Dropout is enabled only during training (model.train()).
- During evaluation (model.eval()), dropout is disabled.
- When active, dropout rescales the remaining neurons to maintain the same expected output sum.
- The dropout probability p controls how many neurons are randomly turned off (e.g., p = 0.5 drops half of them on average); a quick empirical check follows this list.
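To make p concrete, here is a quick self-contained sketch (variable names are ours) that measures the fraction of entries a Dropout layer zeroes out; with p = 0.5, roughly half are dropped on each pass:

import torch
import torch.nn as nn
drop = nn.Dropout(p=0.5)
drop.train()                           # dropout is only active in train mode
big = torch.ones(100_000)              # many entries so the fraction is stable
fraction_dropped = (drop(big) == 0).float().mean()
print(fraction_dropped)                # ≈ 0.5, matching p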
0. Preparation#
import torch
import torch.nn as nn
# Create a simple tensor (batch of 1 sample, 6 features)
x = torch.tensor([[1., 2., 3., 4., 5., 6.]])
print("Input tensor x:")
print(x)
Input tensor x:
tensor([[1., 2., 3., 4., 5., 6.]])
# Define dropout layer
dropout_prob = 0.5 # fraction of neurons to drop
drop_nodes = nn.Dropout(p=dropout_prob)
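Note that the mask is sampled anew on every forward pass, so the exact zeros vary between runs. For reproducible masks you can seed PyTorch's RNG first (a sketch; the seed value is arbitrary and will not necessarily reproduce the outputs shown below):

torch.manual_seed(0)  # arbitrary seed; fixes the dropout mask across runs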
1. Train mode#
# Enable training mode to activate dropout
drop_nodes.train()
y_train = drop_nodes(x)
print("Output after Dropout (training mode):")
print(y_train)
Output after Dropout (training mode):
tensor([[ 0., 4., 6., 8., 10., 0.]])
Notice that some values are set to 0, and the remaining values are scaled by \(\frac{1}{1-p}\) to maintain the expected sum. Here p = 0.5, so the non-dropped values are multiplied by 2.
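For intuition, the same behaviour can be reproduced by hand. The sketch below (reusing the x and dropout_prob defined above) samples a keep-mask with probability \(1-p\), zeroes the dropped entries, and rescales the survivors by \(\frac{1}{1-p}\), the "inverted dropout" scheme that nn.Dropout implements:

keep_prob = 1 - dropout_prob
mask = (torch.rand_like(x) < keep_prob).float()   # 1 = keep, 0 = drop
y_manual = x * mask / keep_prob                   # rescale survivors by 1/(1-p)
print(y_manual)   # zeros where dropped; surviving values doubled when p = 0.5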
2. Eval mode#
# Evaluate mode (Dropout disabled)
drop_nodes.eval()
y_eval = drop_nodes(x)
print("Output in evaluation mode (no dropout applied):")
print(y_eval)
Output in evaluation mode (no dropout applied):
tensor([[1., 2., 3., 4., 5., 6.]])
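As a quick sanity check of the "same expected output" claim, averaging many training-mode passes should approximately recover the original input, since the \(\frac{1}{1-p}\) rescaling keeps each unit's expectation unchanged (a sketch):

drop_nodes.train()   # back to training mode so dropout is active
avg = torch.stack([drop_nodes(x) for _ in range(10_000)]).mean(dim=0)
print(avg)   # each entry lands close to the corresponding value in x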
Summary#
- Training mode: randomly drops units with probability p and rescales the remaining ones by \(\frac{1}{1-p}\). Every run drops different neurons, introducing variability and regularisation.
- Evaluation mode: no units are dropped; the tensor passes through unchanged.
Dropout is commonly applied after fully connected layers or after activations in a neural network.
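As a concrete illustration of that placement, here is a minimal sketch (the layer sizes are arbitrary) with dropout inserted after the activation of a fully connected hidden layer:

model = nn.Sequential(
    nn.Linear(6, 16),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # dropout after the activation
    nn.Linear(16, 1),
)
# model.train() activates dropout; model.eval() disables it for inference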
Takeaway: Dropout prevents the network from relying too much on specific neurons, which helps reduce overfitting.