Optimisation and Learning

The objective of this assignment is to familiarise yourself with loss functions and optimiser algorithms while exploring a real deep learning project. We will use the PyTorch framework throughout.

Prerequisite

In class, we worked through the optimisation and learning notebook. This assignment continues from that notebook.

Submission

Please answer the following questions and submit your work by Tuesday, May 16 at 12:00. The submission consists of one or more notebook (.ipynb) files that can include code, comments, written text, and plots.

0. Create a function to train/test on one dataset

In the optimisation and learning notebook, the cell that trains and tests on a single dataset is written as a script. Turn it into a function so that you can reuse it for multiple experiments without copy-pasting the cell for each one. This keeps your submission tidy and helps you avoid bugs.
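A minimal sketch of such a function is given below. It assumes the dataset is held in tensors of shape (N, 2) with float 0/1 labels, a two-layer network whose layer sizes match the shapes quoted in Question 2, a ReLU activation, `BCEWithLogitsLoss` and full-batch SGD. The names `TinyNet` and `train_test` are hypothetical; adapt the architecture and loss to whatever the class notebook actually used.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical two-layer net: linear1 (2 -> 5) and linear2 (5 -> 1)."""

    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(2, 5)
        self.linear2 = nn.Linear(5, 1)

    def forward(self, x):
        # ReLU is an assumption; use the activation from the class notebook
        return self.linear2(torch.relu(self.linear1(x)))


def train_test(x_train, y_train, x_test, y_test, epochs=100, lr=0.1, seed=0):
    """Train a fresh TinyNet on one dataset and return (model, history).

    history stores the training loss and the test accuracy for every epoch.
    """
    torch.manual_seed(seed)  # same initial weights for every call with this seed
    model = TinyNet()
    loss_fn = nn.BCEWithLogitsLoss()  # takes raw logits and float 0/1 targets
    optimiser = torch.optim.SGD(model.parameters(), lr=lr)
    history = {"train_loss": [], "test_acc": []}
    for _ in range(epochs):
        model.train()
        optimiser.zero_grad()
        loss = loss_fn(model(x_train).squeeze(1), y_train)  # full-batch step
        loss.backward()
        optimiser.step()
        model.eval()
        with torch.no_grad():
            preds = (model(x_test).squeeze(1) > 0).float()  # logit > 0 means class 1
            acc = (preds == y_test).float().mean().item()
        history["train_loss"].append(loss.item())
        history["test_acc"].append(acc)
    return model, history


# Usage on a hypothetical toy dataset
torch.manual_seed(1)
x = torch.randn(200, 2)
y = (x[:, 0] > x[:, 1]).float()
model, history = train_test(x[:150], y[:150], x[150:], y[150:], epochs=50)
print(history["test_acc"][-1])
```

Returning the history rather than plotting inside the function keeps it reusable: each later question can call it with different settings and plot the results side by side.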

1. Evolution of the predictions

In the optimisation and learning notebook we visualised the data points with their corresponding labels (ground-truth).

Use similar plotting functions to visualise the predictions of a network, colour-coding the points so that it is clear where the network is correct and where it is incorrect:

Plot this for every epoch.
Put all these plots into a single gif to see how the predictions evolve.
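One possible sketch uses matplotlib's `FuncAnimation` with `PillowWriter` to turn per-epoch scatter plots into a gif. It assumes the predictions were recorded after every epoch as 0/1 NumPy arrays in a list called `preds_per_epoch` (a hypothetical name); colour encodes correct (green) versus incorrect (red), and marker shape encodes the ground-truth class. The random predictions in the usage lines are only a stand-in for real recorded outputs.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter


def plot_predictions(ax, x, y, preds, epoch):
    """Green where the prediction matches the label, red where it does not."""
    ax.clear()
    correct = preds == y
    for label, marker in ((0, "o"), (1, "^")):  # marker shape = true class
        for ok, colour in ((True, "tab:green"), (False, "tab:red")):
            mask = (y == label) & (correct == ok)
            ax.scatter(x[mask, 0], x[mask, 1], c=colour, marker=marker, s=15)
    ax.set_title(f"epoch {epoch}: accuracy {correct.mean():.2f}")


def save_prediction_gif(x, y, preds_per_epoch, path="predictions.gif", fps=5):
    """Write one gif frame per epoch of recorded predictions."""
    fig, ax = plt.subplots(figsize=(4, 4))
    anim = FuncAnimation(
        fig,
        lambda i: plot_predictions(ax, x, y, preds_per_epoch[i], i),
        frames=len(preds_per_epoch),
    )
    anim.save(path, writer=PillowWriter(fps=fps))
    plt.close(fig)
    return path


# Usage with stand-in data; during training, append (logits > 0) after each epoch
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = (x[:, 0] > 0).astype(int)
preds_per_epoch = [rng.integers(0, 2, size=100) for _ in range(3)]
save_prediction_gif(x, y, preds_per_epoch, path="predictions.gif")
```

Recording predictions during training and rendering afterwards keeps plotting code out of the training loop.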

2. Evolution of the weights

Our network in the optimisation and learning notebook contains very few parameters: the weight of linear1 has shape torch.Size([5, 2]) and the weight of linear2 has shape torch.Size([1, 5]). Analyse how the weights change as a function of the epoch number:

Visualise the change in the weights (with respect to the previous epoch) in the format you find most intuitive (e.g., using matshow with a fixed grey colour map).
As in Question 1, make a gif of this visualisation. Comment on how the speed of change varies with the epoch number.
Compare these analyses with a scenario where the learning rate is smaller.
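The sketch below shows one way to record the weights after every epoch, take the difference with the previous epoch, and draw each layer's update with `matshow`. It assumes stand-alone `linear1`/`linear2` layers with a ReLU between them, a hypothetical toy dataset and full-batch SGD; the helpers `snapshot`, `weight_deltas` and `plot_deltas` are illustrative names. Using one symmetric `vmin`/`vmax` for every frame keeps the grey scale comparable across epochs, and the per-epoch norms give a single number for the speed of change that can be compared between learning rates.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import nn

# Same weight shapes as in the notebook; the ReLU activation is an assumption
linear1, linear2 = nn.Linear(2, 5), nn.Linear(5, 1)
print(linear1.weight.shape, linear2.weight.shape)  # torch.Size([5, 2]) torch.Size([1, 5])


def snapshot(*layers):
    """Copy the weight matrices so later optimiser steps do not overwrite them."""
    return [layer.weight.detach().clone().numpy() for layer in layers]


def weight_deltas(snapshots):
    """For each epoch, the change of every weight matrix relative to the previous epoch."""
    return [[cur - prev for cur, prev in zip(snapshots[i], snapshots[i - 1])]
            for i in range(1, len(snapshots))]


def plot_deltas(epoch_deltas, epoch, vmax):
    """Draw one epoch's updates for both layers on a shared grey scale."""
    fig, axes = plt.subplots(1, len(epoch_deltas), figsize=(6, 3))
    for ax, d, name in zip(axes, epoch_deltas, ("linear1", "linear2")):
        im = ax.matshow(d, cmap="gray", vmin=-vmax, vmax=vmax)
        ax.set_title(f"{name}, epoch {epoch}")
    fig.colorbar(im, ax=axes.tolist())
    return fig


# Hypothetical toy training run; repeat with a smaller lr to compare
torch.manual_seed(0)
x = torch.randn(100, 2)
y = (x[:, 0] > x[:, 1]).float()
opt = torch.optim.SGD(list(linear1.parameters()) + list(linear2.parameters()), lr=0.1)
snaps = [snapshot(linear1, linear2)]
for epoch in range(20):
    opt.zero_grad()
    logits = linear2(torch.relu(linear1(x))).squeeze(1)
    nn.functional.binary_cross_entropy_with_logits(logits, y).backward()
    opt.step()
    snaps.append(snapshot(linear1, linear2))

deltas = weight_deltas(snaps)
norms = [sum(np.linalg.norm(d) for d in ds) for ds in deltas]  # update size per epoch
vmax = max(np.abs(d).max() for ds in deltas for d in ds)
fig = plot_deltas(deltas[0], epoch=1, vmax=vmax)
fig.savefig("weight_delta_epoch1.png")
plt.close(fig)
```

Each epoch's figure can serve as one gif frame, in the same way as the prediction plots in Question 1.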

3. Outliers in datasets

Add a few outliers (e.g., 1% or 5% of data points) to one of the structured datasets we created in the optimisation and learning notebook.
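There is more than one reasonable definition of an outlier here. The sketch below assumes one of them: a random fraction of points is pushed `scale` times further from the origin (which assumes the data are roughly centred there) and their 0/1 labels are flipped. The function name `add_outliers` and its parameters are hypothetical. It returns a boolean mask so that the outliers can be highlighted in plots or excluded when evaluating.

```python
import torch


def add_outliers(x, y, fraction=0.05, scale=5.0, seed=0):
    """Return copies of (x, y) in which `fraction` of the points become outliers.

    Assumed outlier model: chosen points are scaled away from the origin and
    their 0/1 labels are flipped. The originals are left untouched.
    """
    gen = torch.Generator().manual_seed(seed)
    n_out = max(1, round(fraction * len(x)))
    idx = torch.randperm(len(x), generator=gen)[:n_out]
    x_out, y_out = x.clone(), y.clone()
    x_out[idx] = x_out[idx] * scale  # move the points far from the bulk of the data
    y_out[idx] = 1 - y_out[idx]      # flip float 0/1 labels
    is_outlier = torch.zeros(len(x), dtype=torch.bool)
    is_outlier[idx] = True
    return x_out, y_out, is_outlier


# Usage on a hypothetical toy dataset: 5% of 200 points -> 10 outliers
torch.manual_seed(0)
x = torch.randn(200, 2)
y = (x[:, 0] > 0).float()
x_noisy, y_noisy, mask = add_outliers(x, y, fraction=0.05)
print(int(mask.sum()))
```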

Train a network with different loss functions.
Compare the efficacy of each loss function on datasets with and without outliers.
Comment on your observations. Is there a loss that is particularly robust or particularly sensitive to outliers?
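A quick way to build intuition before running full experiments is to check how much of the gradient a single large residual contributes under different losses. The illustrative sketch below compares PyTorch's `MSELoss`, `L1Loss` and `HuberLoss` on four well-fitted points and one outlier with residual 10. The MSE gradient grows linearly with the residual, so the outlier dominates the update, whereas L1 and Huber (beyond `delta`) cap each point's gradient. These are regression-style losses applied to raw outputs; the classification setting of the notebook may behave differently.

```python
import torch
from torch import nn

losses = {
    "MSE": nn.MSELoss(),
    "L1": nn.L1Loss(),
    "Huber": nn.HuberLoss(delta=1.0),  # quadratic for |r| <= 1, linear beyond
}

target = torch.zeros(5)
# Four well-fitted predictions and one outlier with residual 10
pred = torch.tensor([0.1, -0.1, 0.2, -0.2, 10.0], requires_grad=True)

results = {}
for name, fn in losses.items():
    pred.grad = None  # reset the gradient between losses
    loss = fn(pred, target)
    loss.backward()
    # Fraction of the total gradient magnitude that comes from the outlier
    share = pred.grad[-1].abs() / pred.grad.abs().sum()
    results[name] = (loss.item(), share.item())
    print(f"{name}: loss={loss.item():.3f}, outlier share of gradient={share.item():.2f}")
```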

4. Different optimiser algorithms

In class, we only looked at the SGD optimiser. Experiment with a few of the other optimiser algorithms that PyTorch supports:

Change the function you created in Question 0 to receive the name of the optimiser as a string argument and choose the optimiser accordingly.
For our toy example, is there a performance difference depending on which optimiser we use?
Do different optimisers need different learning rates?
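A minimal sketch for both parts follows: a lookup table from a case-insensitive name to a `torch.optim` class, and a small sweep over optimisers and learning rates on a hypothetical XOR-like toy dataset. `OPTIMISERS`, `make_optimiser` and `final_loss` are illustrative names, not part of the class notebook. `getattr(torch.optim, name)` would also work, but an explicit table gives a clearer error for typos. The sweep reports the loss at the final epoch only, so averaging over a few seeds is advisable before drawing conclusions.

```python
import itertools

import torch
from torch import nn

# Hypothetical name-to-class table; extend it with other classes from torch.optim
OPTIMISERS = {
    "sgd": torch.optim.SGD,
    "adam": torch.optim.Adam,
    "adamw": torch.optim.AdamW,
    "rmsprop": torch.optim.RMSprop,
    "adagrad": torch.optim.Adagrad,
}


def make_optimiser(name, params, lr, **kwargs):
    """Build a torch.optim optimiser from a case-insensitive name such as "Adam"."""
    key = name.lower()
    if key not in OPTIMISERS:
        raise ValueError(f"unknown optimiser {name!r}; choose from {sorted(OPTIMISERS)}")
    return OPTIMISERS[key](params, lr=lr, **kwargs)


def final_loss(opt_name, lr, epochs=100, seed=0):
    """Train a small net on a fixed toy dataset and return the loss at the last epoch."""
    torch.manual_seed(seed)  # identical data and initial weights for every run
    x = torch.randn(200, 2)
    y = (x[:, 0] * x[:, 1] > 0).float()  # hypothetical XOR-like labels
    model = nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1))
    opt = make_optimiser(opt_name, model.parameters(), lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x).squeeze(1), y)
        loss.backward()
        opt.step()
    return loss.item()


names = ["sgd", "adam", "rmsprop", "adagrad"]
lrs = [1e-3, 1e-2, 1e-1, 1.0]
results = {(n, lr): final_loss(n, lr) for n, lr in itertools.product(names, lrs)}
for n in names:  # one row per optimiser, one column per learning rate
    print(f"{n:8s}", "  ".join(f"{results[(n, lr)]:.3f}" for lr in lrs))
```

Reading the table row by row shows, for each optimiser, which learning rates train well on this toy problem and which diverge or barely move.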