7.1. Python Scripting#

Jupyter Notebook provides an interactive programming environment. This is very useful in several scenarios such as:

  • prototyping ideas

  • exploring data

  • plotting results

  • demo code

  • etc.

However, training real-world deep networks often involves a much larger codebase, which is difficult to manage in notebooks. To this end, we should create Python modules and scripts:

  • Python script: an executable file that is run from the terminal, e.g., python <SCRIPT_PATH>.py.

  • Python module: a file containing function definitions, similar to a third-party library or a package (see the minimal sketch after this list).
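
As a minimal sketch of this distinction, consider a module and a script side by side (the file and function names here are hypothetical and not part of deepcsf):

# greetings.py – a module: it defines functions and is imported, not executed
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"

# main.py – a script: it is executed directly, e.g., python main.py
from greetings import greet

if __name__ == "__main__":
    print(greet("world"))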

In this tutorial, we will create a minimal Python package from the same code we studied in our last session (Notebook: Probing with Linear Classifiers). This example measures the contrast sensitivity function (CSF) of deep neural networks; therefore, we have named the package deepcsf.

Python Package#

The deepcsf Python package consists of the following structure:

src/
├── deepcsf/                # Python package
│   ├── __init__.py         # __init__.py is required to import the directory as a package
│   ├── csf_main.py         # training/testing routines
│   ├── dataloader.py       # dataset-related code
│   ├── models.py           # the architecture of the network
│   └── utils.py            # common utility functions
└── main.py                 # executable script

Essentially, we have split the code from our notebook into several Python modules, each containing a particular piece of functionality.

Nested packages

This tutorial contains a single Python package and a single script, but a more complex project often contains several packages and scripts. Creating them is an easy process: split the functionality you want into separate folders and include an empty __init__.py file in each.
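
For instance, a larger project might be organised as follows (a hypothetical layout, not from this tutorial's code):

src/
├── deepcsf/                # first package
│   ├── __init__.py
│   └── ...
├── analysis/               # second, hypothetical package
│   ├── __init__.py
│   └── plotting.py
├── main.py                 # training/testing script
└── analyse.py              # hypothetical analysis script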

Execution#

To execute this code, first, we have to activate our virtual environment containing necessary packages like PyTorch (check the environment setup tutorial).
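
For example, depending on how the environment was created (<ENV_NAME> and <ENV_PATH> are placeholders):

# with conda
conda activate <ENV_NAME>
# or with a plain virtual environment
source <ENV_PATH>/bin/activate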

In your terminal, navigate to the src directory where the deepcsf package is located. To train a network:

python main.py

And to test the trained network:

python main.py --test_net <CHECKPOINT_PATH>

CHECKPOINT_PATH is the path to the checkpoint saved by the training script; by default, it is saved at csf_out/checkpoint.pth.tar.
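
Under the hood, main.py is a thin wrapper around the deepcsf package. A simplified sketch of this entry-point logic is shown below (the helper names parse_args, train, and test are illustrative; the actual names in csf_main.py may differ):

# main.py – simplified sketch; function names are hypothetical
from deepcsf import csf_main

if __name__ == "__main__":
    args = csf_main.parse_args()   # hypothetical argument-parsing helper
    if args.test_net is not None:
        csf_main.test(args)        # hypothetical testing routine
    else:
        csf_main.train(args)       # hypothetical training routine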

Arguments#

The argparse module makes it easy to write user-friendly command-line interfaces. Our main.py script accepts several arguments. We can see the list of arguments by calling:

python main.py --help

which outputs:

usage: main.py [-h] [--epochs EPOCHS] [--initial_epoch INITIAL_EPOCH] [--batch_size BATCH_SIZE]
               [--train_samples TRAIN_SAMPLES] [--num_workers NUM_WORKERS] [--lr LR] 
               [--momentum MOMENTUM] [--weight_decay WEIGHT_DECAY] [--out_dir OUT_DIR] 
               [--test_net TEST_NET] [--resume RESUME]

options:
  -h, --help            show this help message and exit
  --epochs EPOCHS       number of epochs of training
  --initial_epoch INITIAL_EPOCH
                        the starting epoch
  --batch_size BATCH_SIZE
                        size of the batches
  --train_samples TRAIN_SAMPLES
                        Number of train samples at each epoch
  --num_workers NUM_WORKERS
                        Number of CPU workers
  --lr LR               SGD: learning rate
  --momentum MOMENTUM   SGD: momentum
  --weight_decay WEIGHT_DECAY
                        SGD: weight decay
  --out_dir OUT_DIR     the output directory
  --test_net TEST_NET   the path to test network
  --resume RESUME       the path to training checkpoint

To pass an argument to our script, we specify the argument name followed by its value. To pass several arguments, we separate them with spaces, for example:

python main.py --batch_size 32 --epochs 10

This specifies a batch_size of 32 and trains for 10 epochs.

Adding arguments to your script is very easy, for instance:

import argparse

# make an instance of ArgumentParser
parser = argparse.ArgumentParser()
# the add_argument() method attaches individual argument specifications to the parser
parser.add_argument("--epochs", type=int, default=5, help="number of epochs of training")
parser.add_argument("--batch_size", type=int, default=32, help="size of the batches")
# the parse_args() method runs the parser and places the extracted data in an argparse.Namespace object
args = parser.parse_args()
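
The parsed values are then available as attributes of the returned Namespace object, for example:

# access the parsed values as plain attributes
print(args.epochs)       # 5 unless --epochs was passed
print(args.batch_size)   # 32 unless --batch_size was passed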

Make use of the full potential of argparse

The argparse module offers several useful features (see the short illustration after this list), including:

  • Type of argument (e.g., string, float, boolean, etc.)

  • Whether an argument is optional or required

  • Limiting the list of values to predefined choices

  • etc.
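
For illustration, such specifications might look like this (hypothetical arguments, not from deepcsf):

parser.add_argument("--gamma", type=float, default=0.1)                # a typed argument
parser.add_argument("--data_dir", type=str, required=True)             # a required argument
parser.add_argument("--optimiser", type=str, choices=["sgd", "adam"])  # limited to predefined choices
parser.add_argument("--pretrained", action="store_true")               # a boolean flag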

A full explanation of these features goes beyond the scope of this tutorial; please check the official argparse documentation.

Logging#

The core functionality of our deepcsf package is identical to its corresponding Jupyter notebook. However, we have added a few functionalities in csf_main.py to save/load models and log the progress, which we go through in this section.

Dumping arguments#

We store the values of all arguments in the argparse.Namespace in a JSON file. This is handy in several scenarios, for instance, when running multiple experiments using the same code with different parameters.

import json
import os

def save_arguments(args):
    """Dump all arguments into a JSON file."""
    json_file_name = os.path.join(args.out_dir, 'args.json')
    with open(json_file_name, 'w') as fp:
        json.dump(dict(args._get_kwargs()), fp, sort_keys=True, indent=4)
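
The stored file can later be reloaded, for instance, to check which parameters a past experiment used (a minimal sketch):

# reload the dumped arguments as a plain dictionary
with open(os.path.join(args.out_dir, 'args.json')) as fp:
    stored_args = json.load(fp)
print(stored_args['epochs'])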

Saving Checkpoints#

We should frequently save the network's weights and the optimiser's parameters (e.g., at the end of each epoch):

  1. To resume training,

  2. To test the network with new stimuli.

Often, a dict containing all the variables required to load the network/optimiser again is stored. In our example:

utils.save_checkpoint(
    {
        # to know to which epoch this checkpoint belongs
        'epoch': epoch,
        # related variables to create the network and load its weights
        'network': {
            'arch': arch,
            'layer': layer,
            # state_dict() contains the network's weights
            'state_dict': network.state_dict()
        },
        # to normalise input signal correctly
        'preprocessing': {'mean': args.mean, 'std': args.std},
        # parameters of optimiser are required to resume training
        'optimizer': optimizer.state_dict(),
        # to feed the network an input of the correct size
        'target_size': args.target_size,
    },
    # the directory where the checkpoint is saved
    args.out_dir
)
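
The utils.save_checkpoint helper can be as simple as a call to torch.save. Below is a minimal sketch of what such a helper might look like (the actual implementation in utils.py may differ):

import os
import torch

def save_checkpoint(state, out_dir, filename='checkpoint.pth.tar'):
    """Serialise the checkpoint dict to <out_dir>/<filename>."""
    os.makedirs(out_dir, exist_ok=True)
    torch.save(state, os.path.join(out_dir, filename))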

Checkpoints should be complete!

Double-check that you are saving all the necessary parameters/variables before starting a long training process. It would be very painful not to be able to use a network trained for several days!

Resuming training#

Resuming training from a given checkpoint is a desirable feature (e.g., when the training process was interrupted, or when you want to train further for better performance). When you resume training from a checkpoint, you should load all the necessary variables from it. For example, in our code:

# if resuming a previous training process
if args.resume is not None:
    # opening the checkpoint file
    checkpoint = torch.load(args.resume, map_location='cpu')
    # loading the network with the weights from the checkpoint
    network.load_state_dict(checkpoint['network']['state_dict'])
    # setting the epoch to the checkpoint epoch
    args.initial_epoch = checkpoint['epoch'] + 1
    # loading the optimiser parameters from the checkpoint
    optimizer.load_state_dict(checkpoint['optimizer'])

Testing a network#

To test a network, we only need to load its weights; other stored variables, such as the optimiser state, are irrelevant. From our example:

# opening the checkpoint and loading the network's weights
checkpoint = torch.load(args.test_net, map_location='cpu')
network.load_state_dict(checkpoint['network']['state_dict'])
# moving the network to the target device (e.g., CPU or GPU)
network = network.to(args.device)
# switching to evaluation mode (affects layers such as dropout and batch normalisation)
network.eval()
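
At test time, it is also common to disable gradient computation, which saves memory and speeds up inference (test_input below is a placeholder for your test stimuli):

# gradients are not needed for evaluation
with torch.no_grad():
    output = network(test_input)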