8.2. TensorBoard#

TensorBoard provides the visualisation and tooling needed for machine learning experimentation:

  • Tracking and visualising metrics such as loss and accuracy

  • Visualising the model graph (ops and layers)

  • Viewing histograms of weights, biases, or other tensors as they change over time

  • Projecting embeddings to a lower dimensional space

  • Displaying images, text, and audio data.

Logging#

In csf_main.py we use TensorBoard to:

  • log accuracy and loss values

  • show batch images

The SummaryWriter class is your main entry point for logging data to be consumed and visualised by TensorBoard. So, we import it:

from torch.utils.tensorboard import SummaryWriter

At the start, we initialise two instances of SummaryWriter, one for training and one for testing, each logging to its own directory:

args.tb_writers = dict()
# one writer per mode, each logging to its own subdirectory of args.out_dir
for mode in ['train', 'test']:
    args.tb_writers[mode] = SummaryWriter(os.path.join(args.out_dir, mode))

We log new accuracy/loss values by calling the add_scalar function and new images by calling the add_image function.

SummaryWriter provides several add_<SOMETHING> functions (https://pytorch.org/docs/stable/tensorboard.html), most of which share a similar set of arguments:

  • tag (data identifier)

  • value (e.g., a floating-point number for a scalar and a tensor for an image)

  • step (allowing you to browse the same tag at different time steps)
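
Below is a minimal, self-contained sketch of these calls; the tag names, dummy values, and log directory are illustrative and not taken from csf_main.py:

import torch
from torch.utils.tensorboard import SummaryWriter

# illustrative log directory, not the project's default
writer = SummaryWriter('example_logs')

for epoch in range(3):
    loss_value = 1.0 / (epoch + 1)    # dummy scalar
    accuracy = 0.5 + 0.1 * epoch      # dummy scalar
    img = torch.rand(3, 64, 64)       # dummy image tensor in CHW format

    writer.add_scalar('loss', loss_value, epoch)    # tag, value, step
    writer.add_scalar('accuracy', accuracy, epoch)  # tag, value, step
    writer.add_image('batch_images', img, epoch)    # tag, value, step

writer.close()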

At the end of the programme, it is recommended to close each SummaryWriter by calling its close() function.
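
For the two writers created above, this could look like the following (a small sketch; the actual code in csf_main.py may differ):

for writer in args.tb_writers.values():
    writer.close()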

Monitoring#

We can launch TensorBoard by running

tensorboard --logdir <LOG_DIR> --port <PORT_NUMBER>

In our project, by default, the TensorBoard files are saved in the csf_out/train/ and csf_out/test/ folders. If we specify <LOG_DIR> as the parent directory (csf_out/), the TensorBoard logs in all subdirectories will also be visualised:

  • This is very useful for comparing different conditions (e.g., train/test, different experiments) at the same time.

  • If there are too many nested log directories, TensorBoard might become slow.

The value for <PORT_NUMBER> is a free network port, e.g., 6006:

  • If the port number is already occupied by another process, use another number.

  • You can have several TensorBoard instances open on different ports.
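
For example, with this project's default output directory and a typical port (the exact port is illustrative):

tensorboard --logdir csf_out/ --port 6006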

Finally, we can view TensorBoard in our browser under this URL:

http://localhost:<PORT_NUMBER>/