{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "[View Source Code](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/07_pytorch_experiment_tracking.ipynb) | [View Slides](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/slides/07_pytorch_experiment_tracking.pdf) " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# 07. PyTorch Experiment Tracking\n", "\n", "> **Note:** This notebook uses `torchvision`'s new [multi-weight support API (available in `torchvision` v0.13+)](https://pytorch.org/blog/introducing-torchvision-new-multi-weight-support-api/). \n", "\n", "We've trained a fair few models now on the journey to making FoodVision Mini (an image classification model to classify images of pizza, steak or sushi).\n", "\n", "And so far we've kept track of them via Python dictionaries.\n", "\n", "Or by just comparing their metric printouts during training.\n", "\n", "What if you wanted to run a dozen (or more) different models at once?\n", "\n", "Surely there's a better way...\n", "\n", "There is.\n", "\n", "**Experiment tracking.**\n", "\n", "And since experiment tracking is so important and integral to machine learning, you can consider this notebook your first milestone project.\n", "\n", "So welcome to Milestone Project 1: FoodVision Mini Experiment Tracking.\n", "\n", "We're going to answer the question: **how do I track my machine learning experiments?**" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## What is experiment tracking?\n", "\n", "Machine learning and deep learning are very experimental.\n", "\n", "You have to put on your artist's beret/chef's hat to cook up lots of different models.\n", "\n", "And you have to put on your scientist's coat to track the results of various combinations of data, model architectures and training regimes.\n", "\n", "That's where **experiment tracking** comes in.\n", "\n", "If 
you're running lots of different experiments, **experiment tracking helps you figure out what works and what doesn't**." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Why track experiments?\n", "\n", "If you're only running a handful of models (like we've done so far), it might be okay just to track their results in printouts and a few dictionaries.\n", "\n", "However, as the number of experiments you run starts to increase, this naive way of tracking could get out of hand.\n", "\n", "So if you're following the machine learning practitioner's motto of *experiment, experiment, experiment!*, you'll want a way to track them.\n", "\n", "*After building a few models and tracking their results, you'll start to notice how quickly it can get out of hand.*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Different ways to track machine learning experiments \n", "\n", "There are as many different ways to track machine learning experiments as there are experiments to run.\n", "\n", "This table covers a few.\n", "\n", "| **Method** | **Setup** | **Pros** | **Cons** | **Cost** |\n", "| ----- | ----- | ----- | ----- | ----- |\n", "| Python dictionaries, CSV files, printouts | None | Easy to set up, runs in pure Python | Hard to keep track of large numbers of experiments | Free |\n", "| [TensorBoard](https://www.tensorflow.org/tensorboard/get_started) | Minimal, install [`tensorboard`](https://pypi.org/project/tensorboard/) | Extensions built into PyTorch, widely recognized and used, easily scales. | User experience not as nice as other options. | Free |\n", "| [Weights & Biases Experiment Tracking](https://wandb.ai/site/experiment-tracking) | Minimal, install [`wandb`](https://docs.wandb.ai/quickstart), make an account | Incredible user experience, make experiments public, tracks almost anything. | Requires external resource outside of PyTorch. 
| Free for personal use | \n", "| [MLflow](https://mlflow.org/) | Minimal, install `mlflow` and start tracking | Fully open-source MLOps lifecycle management, many integrations. | A little harder to set up a remote tracking server than other services. | Free | \n", "\n", "*Various places and techniques you can use to track your machine learning experiments. **Note:** There are various other options similar to Weights & Biases and open-source options similar to MLflow but I've left them out for brevity. You can find more by searching \"machine learning experiment tracking\".*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## What we're going to cover\n", "\n", "We're going to be running several different modelling experiments with various levels of data, model size and training time to try and improve on FoodVision Mini.\n", "\n", "And due to its tight integration with PyTorch and widespread use, this notebook focuses on using TensorBoard to track our experiments.\n", "\n", "However, the principles we're going to cover are similar across all of the other tools for experiment tracking.\n", "\n", "| **Topic** | **Contents** |\n", "| ----- | ----- |\n", "| **0. Getting setup** | We've written a fair bit of useful code over the past few sections, let's download it and make sure we can use it again. |\n", "| **1. Get data** | Let's get the pizza, steak and sushi image classification dataset we've been using to try and improve our FoodVision Mini model's results. |\n", "| **2. Create Datasets and DataLoaders** | We'll use the `data_setup.py` script we wrote in chapter 05. PyTorch Going Modular to set up our DataLoaders. |\n", "| **3. Get and customise a pretrained model** | Just like the last section, 06. PyTorch Transfer Learning, we'll download a pretrained model from `torchvision.models` and customise it to our own problem. | \n", "| **4. 
Train model and track results** | Let's see what it's like to train and track the training results of a single model using TensorBoard. |\n", "| **5. View our model's results in TensorBoard** | Previously we visualized our model's loss curves with a helper function, now let's see what they look like in TensorBoard. |\n", "| **6. Creating a helper function to track experiments** | If we're going to be adhering to the machine learning practitioner's motto of *experiment, experiment, experiment!*, we'd best create a function that will help us save our modelling experiment results. |\n", "| **7. Setting up a series of modelling experiments** | Instead of running experiments one by one, how about we write some code to run several experiments at once, with different models, different amounts of data and different training times. | \n", "| **8. View modelling experiments in TensorBoard** | By this stage we'll have run eight modelling experiments in one go, a fair bit to keep track of, let's see what their results look like in TensorBoard. | \n", "| **9. Load in the best model and make predictions with it** | The point of experiment tracking is to figure out which model performs the best, let's load in the best performing model and make some predictions with it to *visualize, visualize, visualize!*. |" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Where can you get help?\n", "\n", "All of the materials for this course [are available on GitHub](https://github.com/mrdbourke/pytorch-deep-learning).\n", "\n", "If you run into trouble, you can ask a question on the course [GitHub Discussions page](https://github.com/mrdbourke/pytorch-deep-learning/discussions).\n", "\n", "And of course, there's the [PyTorch documentation](https://pytorch.org/docs/stable/index.html) and [PyTorch developer forums](https://discuss.pytorch.org/), a very helpful place for all things PyTorch. 
" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 0. Getting setup \n", "\n", "Let's start by downloading all of the modules we'll need for this section.\n", "\n", "To save us writing extra code, we're going to be leveraging some of the Python scripts (such as `data_setup.py` and `engine.py`) we created in section, [05. PyTorch Going Modular](https://www.learnpytorch.io/05_pytorch_going_modular/).\n", "\n", "Specifically, we're going to download the [`going_modular`](https://github.com/mrdbourke/pytorch-deep-learning/tree/main/going_modular) directory from the `pytorch-deep-learning` repository (if we don't already have it).\n", "\n", "We'll also get the [`torchinfo`](https://github.com/TylerYep/torchinfo) package if it's not available. \n", "\n", "`torchinfo` will help later on to give us visual summaries of our model(s).\n", "\n", "And since we're using a newer version of the `torchvision` package (v0.13 as of June 2022), we'll make sure we've got the latest versions." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch version: 1.13.0.dev20220620+cu113\n", "torchvision version: 0.14.0.dev20220620+cu113\n" ] } ], "source": [ "# For this notebook to run with updated APIs, we need torch 1.12+ and torchvision 0.13+\n", "try:\n", "    import torch\n", "    import torchvision\n", "    # Compare (major, minor) tuples so that newer major versions (e.g. torch 2.0) also pass the check\n", "    torch_version = tuple(int(v) for v in torch.__version__.split(\".\")[:2])\n", "    torchvision_version = tuple(int(v) for v in torchvision.__version__.split(\".\")[:2])\n", "    assert torch_version >= (1, 12), \"torch version should be 1.12+\"\n", "    assert torchvision_version >= (0, 13), \"torchvision version should be 0.13+\"\n", "    print(f\"torch version: {torch.__version__}\")\n", "    print(f\"torchvision version: {torchvision.__version__}\")\n", "except (ImportError, AssertionError):\n", "    print(\"[INFO] torch/torchvision versions not as required, installing nightly versions.\")\n", "    !pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113\n", "    import torch\n", "    import torchvision\n", "    print(f\"torch version: {torch.__version__}\")\n", "    print(f\"torchvision version: {torchvision.__version__}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "> **Note:** If you're using Google Colab, you may have to restart your runtime after running the above cell. After restarting, you can run the cell again and verify you've got the right versions of `torch` (1.12+) and `torchvision` (0.13+)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Continue with regular imports\n", "import matplotlib.pyplot as plt\n", "import torch\n", "import torchvision\n", "\n", "from torch import nn\n", "from torchvision import transforms\n", "\n", "# Try to get torchinfo, install it if it doesn't work\n", "try:\n", " from torchinfo import summary\n", "except:\n", " print(\"[INFO] Couldn't find torchinfo... 
installing it.\")\n", " !pip install -q torchinfo\n", " from torchinfo import summary\n", "\n", "# Try to import the going_modular directory, download it from GitHub if it doesn't work\n", "try:\n", " from going_modular.going_modular import data_setup, engine\n", "except:\n", " # Get the going_modular scripts\n", " print(\"[INFO] Couldn't find going_modular scripts... downloading them from GitHub.\")\n", " !git clone https://github.com/mrdbourke/pytorch-deep-learning\n", " !mv pytorch-deep-learning/going_modular .\n", " !rm -rf pytorch-deep-learning\n", " from going_modular.going_modular import data_setup, engine" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now let's setup device agnostic code.\n", "\n", "> **Note:** If you're using Google Colab, and you don't have a GPU turned on yet, it's now time to turn one on via `Runtime -> Change runtime type -> Hardware accelerator -> GPU`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'cuda'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "device" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Create a helper function to set seeds\n", "\n", "Since we've been setting random seeds a whole bunch throughout previous sections, how about we functionize it?\n", "\n", "Let's create a function to \"set the seeds\" called `set_seeds()`.\n", "\n", "> **Note:** Recall a [random seed](https://en.wikipedia.org/wiki/Random_seed) is a way of flavouring the randomness generated by a computer. They aren't necessary to always set when running machine learning code, however, they help ensure there's an element of reproducibility (the numbers I get with my code are similar to the numbers you get with your code). Outside of an education or experimental setting, random seeds generally aren't required." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Set seeds\n", "def set_seeds(seed: int = 42):\n", "    \"\"\"Sets random seeds for torch operations.\n", "\n", "    Args:\n", "        seed (int, optional): Random seed to set. Defaults to 42.\n", "    \"\"\"\n", "    # Set the seed for general torch operations\n", "    torch.manual_seed(seed)\n", "    # Set the seed for CUDA torch operations (ones that happen on the GPU)\n", "    torch.cuda.manual_seed(seed)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Get data\n", "\n", "As always, before we can run machine learning experiments, we'll need a dataset.\n", "\n", "We're going to continue trying to improve upon the results we've been getting on FoodVision Mini.\n", "\n", "In the previous section, [06. PyTorch Transfer Learning](https://www.learnpytorch.io/06_pytorch_transfer_learning/), we saw how powerful using a pretrained model and transfer learning could be when classifying images of pizza, steak and sushi.\n", "\n", "So how about we run some experiments and try to further improve our results?\n", "\n", "To do so, we'll use similar code to the previous section to download the [`pizza_steak_sushi.zip`](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/data/pizza_steak_sushi.zip) (if the data doesn't already exist) except this time it's been functionised.\n", "\n", "This will allow us to use it again later. " ] },
{ "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[INFO] data/pizza_steak_sushi directory exists, skipping download.\n" ] }, { "data": { "text/plain": [ "PosixPath('data/pizza_steak_sushi')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import os\n", "import zipfile\n", "\n", "from pathlib import Path\n", "\n", "import requests\n", "\n", "def download_data(source: str, \n", "                  destination: str,\n", "                  remove_source: bool = True) -> Path:\n", "    \"\"\"Downloads a zipped dataset from source and unzips to destination.\n", "\n", "    Args:\n", "        source (str): A link to a zipped file containing data.\n", "        destination (str): A target directory to unzip data to.\n", "        remove_source (bool): Whether to remove the source after downloading and extracting.\n", "\n", "    Returns:\n", "        pathlib.Path to downloaded data.\n", "\n", "    Example usage:\n", "        download_data(source=\"https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip\",\n", "                      destination=\"pizza_steak_sushi\")\n", "    \"\"\"\n", "    # Setup path to data folder\n", "    data_path = Path(\"data/\")\n", "    image_path = data_path / destination\n", "\n", "    # If the image folder doesn't exist, download it and prepare it... \n", "    if image_path.is_dir():\n", "        print(f\"[INFO] {image_path} directory exists, skipping download.\")\n", "    else:\n", "        print(f\"[INFO] Did not find {image_path} directory, creating one...\")\n", "        image_path.mkdir(parents=True, exist_ok=True)\n", "\n", "        # Download pizza, steak, sushi data\n", "        target_file = Path(source).name\n", "        with open(data_path / target_file, \"wb\") as f:\n", "            request = requests.get(source)\n", "            print(f\"[INFO] Downloading {target_file} from {source}...\")\n", "            f.write(request.content)\n", "\n", "        # Unzip pizza, steak, sushi data\n", "        with zipfile.ZipFile(data_path / target_file, \"r\") as zip_ref:\n", "            print(f\"[INFO] Unzipping {target_file} data...\") \n", "            zip_ref.extractall(image_path)\n", "\n", "        # Remove .zip file\n", "        if remove_source:\n", "            os.remove(data_path / target_file)\n", "\n", "    return image_path\n", "\n", "image_path = download_data(source=\"https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip\",\n", "                           destination=\"pizza_steak_sushi\")\n", "image_path" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Excellent! Looks like we've got our pizza, steak and sushi images in standard image classification format ready to go." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Create Datasets and DataLoaders\n", "\n", "Now we've got some data, let's turn it into PyTorch DataLoaders.\n", "\n", "We can do so using the `create_dataloaders()` function we created in [05. PyTorch Going Modular part 2](https://www.learnpytorch.io/05_pytorch_going_modular/#2-create-datasets-and-dataloaders-data_setuppy).\n", "\n", "And since we'll be using transfer learning and specifically pretrained models from [`torchvision.models`](https://pytorch.org/vision/stable/models.html), we'll create a transform to prepare our images correctly.\n", "\n", "To transform our images into tensors, we can use:\n", "1. 
Manually created transforms using `torchvision.transforms`.\n", "2. Automatically created transforms using `torchvision.models.MODEL_NAME.MODEL_WEIGHTS.DEFAULT.transforms()`.\n", " * Where `MODEL_NAME` is a specific `torchvision.models` architecture, `MODEL_WEIGHTS` is a specific set of pretrained weights and `DEFAULT` means the \"best available weights\".\n", " \n", "We saw an example of each of these in [06. PyTorch Transfer Learning section 2](https://www.learnpytorch.io/06_pytorch_transfer_learning/#2-create-datasets-and-dataloaders).\n", "\n", "Let's see first an example of manually creating a `torchvision.transforms` pipeline (creating a transforms pipeline this way gives the most customization but can potentially result in performance degradation if the transforms don't match the pretrained model).\n", "\n", "The main manual transformation we need to be sure of is that all of our images are normalized in ImageNet format (this is because pretrained `torchvision.models` are all pretrained on [ImageNet](https://www.image-net.org/)).\n", "\n", "We can do this with:\n", "\n", "```python\n", "normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", " std=[0.229, 0.224, 0.225])\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.1 Create DataLoaders using manually created transforms" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Manually created transforms: Compose(\n", " Resize(size=(224, 224), interpolation=bilinear, max_size=None, antialias=None)\n", " ToTensor()\n", " Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])\n", ")\n" ] }, { "data": { "text/plain": [ "(,\n", " ,\n", " ['pizza', 'steak', 'sushi'])" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Setup directories\n", "train_dir = image_path / \"train\"\n", "test_dir = image_path / \"test\"\n", "\n", "# 
Setup ImageNet normalization levels (turns all images into similar distribution as ImageNet)\n", "normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", " std=[0.229, 0.224, 0.225])\n", "\n", "# Create transform pipeline manually\n", "manual_transforms = transforms.Compose([\n", " transforms.Resize((224, 224)),\n", " transforms.ToTensor(),\n", " normalize\n", "]) \n", "print(f\"Manually created transforms: {manual_transforms}\")\n", "\n", "# Create data loaders\n", "train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(\n", " train_dir=train_dir,\n", " test_dir=test_dir,\n", " transform=manual_transforms, # use manually created transforms\n", " batch_size=32\n", ")\n", "\n", "train_dataloader, test_dataloader, class_names" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 2.2 Create DataLoaders using automatically created transforms\n", "\n", "Data transformed and DataLoaders created!\n", "\n", "Let's now see what the same transformation pipeline looks like but this time by using automatic transforms.\n", "\n", "We can do this by first instantiating a set of pretrained weights (for example `weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT`) we'd like to use and calling the `transforms()` method on it." 
] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Automatically created transforms: ImageClassification(\n", " crop_size=[224]\n", " resize_size=[256]\n", " mean=[0.485, 0.456, 0.406]\n", " std=[0.229, 0.224, 0.225]\n", " interpolation=InterpolationMode.BICUBIC\n", ")\n" ] }, { "data": { "text/plain": [ "(,\n", " ,\n", " ['pizza', 'steak', 'sushi'])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Setup dirs\n", "train_dir = image_path / \"train\"\n", "test_dir = image_path / \"test\"\n", "\n", "# Setup pretrained weights (plenty of these available in torchvision.models)\n", "weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT\n", "\n", "# Get transforms from weights (these are the transforms that were used to obtain the weights)\n", "automatic_transforms = weights.transforms() \n", "print(f\"Automatically created transforms: {automatic_transforms}\")\n", "\n", "# Create data loaders\n", "train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(\n", " train_dir=train_dir,\n", " test_dir=test_dir,\n", " transform=automatic_transforms, # use automatic created transforms\n", " batch_size=32\n", ")\n", "\n", "train_dataloader, test_dataloader, class_names" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Getting a pretrained model, freezing the base layers and changing the classifier head\n", "\n", "Before we run and track multiple modelling experiments, let's see what it's like to run and track a single one.\n", "\n", "And since our data is ready, the next thing we'll need is a model.\n", "\n", "Let's download the pretrained weights for a `torchvision.models.efficientnet_b0()` model and prepare it for use with our own data." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Note: This is how a pretrained model would be created prior to torchvision 0.13, the `pretrained` argument is deprecated from 0.13 onwards.\n", "# model = torchvision.models.efficientnet_b0(pretrained=True).to(device) # OLD \n", "\n", "# Download the pretrained weights for EfficientNet_B0\n", "weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT # NEW in torchvision 0.13, \"DEFAULT\" means \"best weights available\"\n", "\n", "# Setup the model with the pretrained weights and send it to the target device\n", "model = torchvision.models.efficientnet_b0(weights=weights).to(device)\n", "\n", "# View the output of the model\n", "# model" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Wonderful!\n", "\n", "Now we've got a pretrained model, let's turn it into a feature extractor model.\n", "\n", "In essence, we'll freeze the base layers of the model (we'll use these to extract features from our input images) and we'll change the classifier head (output layer) to suit the number of classes we're working with (we've got 3 classes: pizza, steak, sushi).\n", "\n", "> **Note:** The idea of creating a feature extractor model (what we're doing here) was covered in more depth in [06. PyTorch Transfer Learning section 3.2: Setting up a pretrained model](https://www.learnpytorch.io/06_pytorch_transfer_learning/#32-setting-up-a-pretrained-model)." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Freeze all base layers by setting requires_grad attribute to False\n", "for param in model.features.parameters():\n", "    param.requires_grad = False\n", "\n", "# Since we're creating a new layer with random weights (torch.nn.Linear), \n", "# let's set the seeds\n", "set_seeds() \n", "\n", "# Update the classifier head to suit our problem\n", "model.classifier = torch.nn.Sequential(\n", "    nn.Dropout(p=0.2, inplace=True),\n", "    nn.Linear(in_features=1280, \n", "              out_features=len(class_names),\n", "              bias=True).to(device))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Base layers frozen, classifier head changed, let's get a summary of our model with `torchinfo.summary()`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "from torchinfo import summary\n", "\n", "# # Get a summary of the model (uncomment for full output)\n", "# summary(model, \n", "#         input_size=(32, 3, 224, 224), # make sure this is \"input_size\", not \"input_shape\" (batch_size, color_channels, height, width)\n", "#         verbose=0,\n", "#         col_names=[\"input_size\", \"output_size\", \"num_params\", \"trainable\"],\n", "#         col_width=20,\n", "#         row_settings=[\"var_names\"]\n", "# )" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "*Output of `torchinfo.summary()` with our feature extractor EffNetB0 model, notice how the base layers are frozen (not trainable) and the output layers are customized to our own problem.*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 4. 
Train model and track results\n", "\n", "Model ready to go!\n", "\n", "Let's get ready to train it by creating a loss function and an optimizer.\n", "\n", "Since we're working with multiple classes, we'll use [`torch.nn.CrossEntropyLoss()`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) as the loss function.\n", "\n", "And we'll stick with [`torch.optim.Adam()`](https://pytorch.org/docs/stable/optim.html) with learning rate of `0.001` for the optimizer. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Define loss and optimizer\n", "loss_fn = nn.CrossEntropyLoss()\n", "optimizer = torch.optim.Adam(model.parameters(), lr=0.001)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Adjust `train()` function to track results with `SummaryWriter()`\n", "\n", "Beautiful!\n", "\n", "All of the pieces of our training code are starting to come together.\n", "\n", "Let's now add the final piece to track our experiments.\n", "\n", "Previously, we've tracked our modelling experiments using multiple Python dictionaries (one for each model).\n", "\n", "But you can imagine this could get out of hand if we were running anything more than a few experiments.\n", "\n", "Not to worry, there's a better option!\n", "\n", "We can use PyTorch's [`torch.utils.tensorboard.SummaryWriter()`](https://pytorch.org/docs/stable/tensorboard.html) class to save various parts of our model's training progress to file.\n", "\n", "By default, the `SummaryWriter()` class saves various information about our model to a file set by the `log_dir` parameter. 
\n", "\n", "The default location for `log_dir` is under `runs/CURRENT_DATETIME_HOSTNAME`, where the `HOSTNAME` is the name of your computer.\n", "\n", "But of course, you can change where your experiments are tracked (the filename is as customisable as you'd like).\n", "\n", "The outputs of the `SummaryWriter()` are saved in [TensorBoard format](https://www.tensorflow.org/tensorboard/).\n", "\n", "TensorBoard is a part of the TensorFlow deep learning library and is an excellent way to visualize different parts of your model.\n", "\n", "To start tracking our modelling experiments, let's create a default `SummaryWriter()` instance." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "from torch.utils.tensorboard import SummaryWriter\n", "\n", "# Create a writer with all default settings\n", "writer = SummaryWriter()" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Now to use the writer, we could write a new training loop or we could adjust the existing `train()` function we created in [05. PyTorch Going Modular section 4](https://www.learnpytorch.io/05_pytorch_going_modular/#4-creating-train_step-and-test_step-functions-and-train-to-combine-them).\n", "\n", "Let's take the latter option.\n", "\n", "We'll get the `train()` function from [`engine.py`](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/engine.py) and adjust it to use `writer`.\n", "\n", "Specifically, we'll add the ability for our `train()` function to log our model's training and test loss and accuracy values.\n", "\n", "We can do this with [`writer.add_scalars(main_tag, tag_scalar_dict)`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_scalars), where:\n", "* `main_tag` (string) - the name for the scalars being tracked (e.g. \"Accuracy\")\n", "* `tag_scalar_dict` (dict) - a dictionary of the values being tracked (e.g. 
`{\"train_loss\": 0.3454}`)\n", " * > **Note:** The method is called `add_scalars()` because our loss and accuracy values are generally scalars (single values).\n", "\n", "Once we've finished tracking values, we'll call `writer.close()` to tell the `writer` to stop looking for values to track.\n", "\n", "To start modifying `train()` we'll also import `train_step()` and `test_step()` from [`engine.py`](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/engine.py).\n", "\n", "> **Note:** You can track information about your model almost anywhere in your code. But quite often experiments will be tracked *while* a model is training (inside a training/testing loop).\n", ">\n", "> The `torch.utils.tensorboard.SummaryWriter()` class also has many different methods to track different things about your model/data, such as [`add_graph()`](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_graph) which tracks the computation graph of your model. For more options, [check the `SummaryWriter()` documentation](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter)." 
] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "from typing import Dict, List\n", "from tqdm.auto import tqdm\n", "\n", "from going_modular.going_modular.engine import train_step, test_step\n", "\n", "# Import train() function from: \n", "# https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/engine.py\n", "def train(model: torch.nn.Module, \n", "          train_dataloader: torch.utils.data.DataLoader, \n", "          test_dataloader: torch.utils.data.DataLoader, \n", "          optimizer: torch.optim.Optimizer,\n", "          loss_fn: torch.nn.Module,\n", "          epochs: int,\n", "          device: torch.device) -> Dict[str, List]:\n", "    \"\"\"Trains and tests a PyTorch model.\n", "\n", "    Passes a target PyTorch model through train_step() and test_step()\n", "    functions for a number of epochs, training and testing the model\n", "    in the same epoch loop.\n", "\n", "    Calculates, prints and stores evaluation metrics throughout.\n", "\n", "    Args:\n", "        model: A PyTorch model to be trained and tested.\n", "        train_dataloader: A DataLoader instance for the model to be trained on.\n", "        test_dataloader: A DataLoader instance for the model to be tested on.\n", "        optimizer: A PyTorch optimizer to help minimize the loss function.\n", "        loss_fn: A PyTorch loss function to calculate loss on both datasets.\n", "        epochs: An integer indicating how many epochs to train for.\n", "        device: A target device to compute on (e.g. \"cuda\" or \"cpu\").\n", "\n", "    Returns:\n", "        A dictionary of training and testing loss as well as training and\n", "        testing accuracy metrics. Each metric has a value in a list for \n", "        each epoch.\n", "        In the form: {train_loss: [...],\n", "                      train_acc: [...],\n", "                      test_loss: [...],\n", "                      test_acc: [...]} \n", "        For example if training for epochs=2: \n", "                     {train_loss: [2.0616, 1.0537],\n", "                      train_acc: [0.3945, 0.3945],\n", "                      test_loss: [1.2641, 1.5706],\n", "                      test_acc: [0.3400, 0.2973]} \n", "    \"\"\"\n", "    # Create empty results dictionary\n", "    results = {\"train_loss\": [],\n", "               \"train_acc\": [],\n", "               \"test_loss\": [],\n", "               \"test_acc\": []\n", "    }\n", "\n", "    # Loop through training and testing steps for a number of epochs\n", "    for epoch in tqdm(range(epochs)):\n", "        train_loss, train_acc = train_step(model=model,\n", "                                           dataloader=train_dataloader,\n", "                                           loss_fn=loss_fn,\n", "                                           optimizer=optimizer,\n", "                                           device=device)\n", "        test_loss, test_acc = test_step(model=model,\n", "                                        dataloader=test_dataloader,\n", "                                        loss_fn=loss_fn,\n", "                                        device=device)\n", "\n", "        # Print out what's happening\n", "        print(\n", "            f\"Epoch: {epoch+1} | \"\n", "            f\"train_loss: {train_loss:.4f} | \"\n", "            f\"train_acc: {train_acc:.4f} | \"\n", "            f\"test_loss: {test_loss:.4f} | \"\n", "            f\"test_acc: {test_acc:.4f}\"\n", "        )\n", "\n", "        # Update results dictionary\n", "        results[\"train_loss\"].append(train_loss)\n", "        results[\"train_acc\"].append(train_acc)\n", "        results[\"test_loss\"].append(test_loss)\n", "        results[\"test_acc\"].append(test_acc)\n", "\n", "        ### New: Experiment tracking ###\n", "        # Add loss results to SummaryWriter (uses the `writer` instance created above)\n", "        writer.add_scalars(main_tag=\"Loss\", \n", "                           tag_scalar_dict={\"train_loss\": train_loss,\n", "                                            \"test_loss\": test_loss},\n", "                           global_step=epoch)\n", "\n", "        # Add accuracy results to SummaryWriter\n", "        writer.add_scalars(main_tag=\"Accuracy\", \n", "                           tag_scalar_dict={\"train_acc\": train_acc,\n", "                                            \"test_acc\": test_acc}, \n", "                           global_step=epoch)\n", "\n", "        # Track the PyTorch model architecture\n", "        writer.add_graph(model=model, \n", "                         # Pass in an example input\n", "                         input_to_model=torch.randn(32, 3, 224, 224).to(device))\n", "\n", "    # Close the writer\n", "    writer.close()\n", "\n", "    ### End new ###\n", "\n", "    # Return the filled results at the end of the epochs\n", "    return results" ] },
{ "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Woohoo!\n", "\n", "Our `train()` function is now updated to use a `SummaryWriter()` instance to track our model's results.\n", "\n", "How about we try it out for 5 epochs?" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "bf70c256625142c283475bdf9af948a1", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/5 [00:00 **Note:** You might notice the results here are slightly different to what our model got in 06. PyTorch Transfer Learning. The difference comes from using `engine.train()` there and our modified `train()` function here. Can you guess why? The [PyTorch documentation on randomness](https://pytorch.org/docs/stable/notes/randomness.html) may help more.\n", "\n", "Running the cell above, we get similar outputs to those we got in [06. PyTorch Transfer Learning section 4: Train model](https://www.learnpytorch.io/06_pytorch_transfer_learning/#4-train-model) but the difference is that, behind the scenes, our `writer` instance has created a `runs/` directory storing our model's results.\n", "\n", "For example, the save location might look like:\n", "\n", "```\n", "runs/Jun21_00-46-03_daniels_macbook_pro\n", "```\n", "\n", "Where the [default format](https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter) is `runs/CURRENT_DATETIME_HOSTNAME`. \n", "\n", "We'll check these out in a second but just as a reminder, we were previously tracking our model's results in a dictionary."
] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'train_loss': [1.0923754647374153,\n", " 0.8974628075957298,\n", " 0.803724929690361,\n", " 0.6769256368279457,\n", " 0.7064960040152073],\n", " 'train_acc': [0.3984375, 0.65625, 0.74609375, 0.8515625, 0.71875],\n", " 'test_loss': [0.9132757981618246,\n", " 0.7837507526079813,\n", " 0.6722926497459412,\n", " 0.6698453426361084,\n", " 0.6746167540550232],\n", " 'test_acc': [0.5397727272727273,\n", " 0.8560606060606061,\n", " 0.8863636363636364,\n", " 0.8049242424242425,\n", " 0.7736742424242425]}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check out the model results\n", "results" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Hmmm, we could format this to be a nice plot but could you imagine keeping track of a bunch of these dictionaries?\n", "\n", "There has to be a better way..." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 5. View our model's results in TensorBoard\n", "\n", "The `SummaryWriter()` class stores our model's results in a directory called `runs/` in TensorBoard format by default.\n", "\n", "TensorBoard is a visualization program created by the TensorFlow team to view and inspect information about models and data.\n", "\n", "You know what that means?\n", "\n", "It's time to follow the data visualizer's motto and *visualize, visualize, visualize!* \n", "\n", "You can view TensorBoard in a number of ways:\n", "\n", "| Code environment | How to view TensorBoard | Resource |\n", "| ----- | ----- | ----- |\n", "| VS Code (notebooks or Python scripts) | Press `SHIFT + CMD + P` to open the Command Palette and search for the command \"Python: Launch TensorBoard\". 
| [VS Code Guide on TensorBoard and PyTorch](https://code.visualstudio.com/docs/datascience/pytorch-support#_tensorboard-integration) |\n", "| Jupyter and Colab Notebooks | Make sure [TensorBoard is installed](https://pypi.org/project/tensorboard/), load it with `%load_ext tensorboard` and then view your results with `%tensorboard --logdir DIR_WITH_LOGS`. | [`torch.utils.tensorboard`](https://pytorch.org/docs/stable/tensorboard.html) and [Get started with TensorBoard](https://www.tensorflow.org/tensorboard/get_started) |\n", "\n", "You can also upload your experiments to [tensorboard.dev](https://tensorboard.dev/) to share them publicly with others.\n", "\n", "Running the following code in a Google Colab or Jupyter Notebook will start an interactive TensorBoard session to view TensorBoard files in the `runs/` directory.\n", "\n", "```python\n", "# Line magic to load TensorBoard (keep comments on their own line, magics treat the rest of the line as arguments)\n", "%load_ext tensorboard\n", "# Run a TensorBoard session with the \"runs/\" directory\n", "%tensorboard --logdir runs\n", "```" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "# Example code to run in Jupyter or Google Colab Notebook (uncomment to try it out)\n", "# %load_ext tensorboard\n", "# %tensorboard --logdir runs" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "If all went correctly, you should see something like the following:\n", "\n", "*Viewing a single modelling experiment's results for accuracy and loss in TensorBoard.*\n", "\n", "> **Note:** For more information on running TensorBoard in notebooks or in other locations, see the following:\n", "> * [Using TensorBoard in Notebooks guide by TensorFlow](https://www.tensorflow.org/tensorboard/tensorboard_in_notebooks)\n", "> * [Get started with TensorBoard.dev](https://tensorboard.dev/#get-started) (helpful for uploading your TensorBoard logs to a shareable link)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ 
"## 6. Create a helper function to build `SummaryWriter()` instances\n", "\n", "The `SummaryWriter()` class logs various information to a directory specified by the `log_dir` parameter.\n", "\n", "How about we make a helper function to create a custom directory per experiment?\n", "\n", "In essence, each experiment gets its own logs directory.\n", "\n", "For example, say we'd like to track things like:\n", "* **Experiment date/timestamp** - when did the experiment take place?\n", "* **Experiment name** - is there something we'd like to call the experiment?\n", "* **Model name** - what model was used?\n", "* **Extra** - should anything else be tracked?\n", "\n", "You could track almost anything here and be as creative as you want but these should be enough to start.\n", "\n", "Let's create a helper function called `create_writer()` that produces a `SummaryWriter()` instance tracking to a custom `log_dir`.\n", "\n", "Ideally, we'd like the `log_dir` to be something like: \n", "\n", "`runs/YYYY-MM-DD/experiment_name/model_name/extra` \n", "\n", "Where `YYYY-MM-DD` is the date the experiment was run (you could add the time if you wanted to as well)." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "# Note: the return annotation is the SummaryWriter class itself (no parentheses),\n", "# calling SummaryWriter() in the annotation would create an extra writer when the function is defined\n", "def create_writer(experiment_name: str, \n", " model_name: str, \n", " extra: str = None) -> torch.utils.tensorboard.writer.SummaryWriter:\n", " \"\"\"Creates a torch.utils.tensorboard.writer.SummaryWriter() instance saving to a specific log_dir.\n", "\n", " log_dir is a combination of runs/timestamp/experiment_name/model_name/extra.\n", "\n", " Where timestamp is the current date in YYYY-MM-DD format.\n", "\n", " Args:\n", " experiment_name (str): Name of experiment.\n", " model_name (str): Name of model.\n", " extra (str, optional): Anything extra to add to the directory.
Defaults to None.\n", "\n", " Returns:\n", " torch.utils.tensorboard.writer.SummaryWriter(): Instance of a writer saving to log_dir.\n", "\n", " Example usage:\n", " # Create a writer saving to \"runs/2022-06-04/data_10_percent/effnetb2/5_epochs/\"\n", " writer = create_writer(experiment_name=\"data_10_percent\",\n", " model_name=\"effnetb2\",\n", " extra=\"5_epochs\")\n", " # The above is the same as:\n", " writer = SummaryWriter(log_dir=\"runs/2022-06-04/data_10_percent/effnetb2/5_epochs/\")\n", " \"\"\"\n", " from datetime import datetime\n", " import os\n", "\n", " # Get timestamp of current date (all experiments on certain day live in same folder)\n", " timestamp = datetime.now().strftime(\"%Y-%m-%d\") # returns current date in YYYY-MM-DD format\n", "\n", " if extra:\n", " # Create log directory path\n", " log_dir = os.path.join(\"runs\", timestamp, experiment_name, model_name, extra)\n", " else:\n", " log_dir = os.path.join(\"runs\", timestamp, experiment_name, model_name)\n", " \n", " print(f\"[INFO] Created SummaryWriter, saving to: {log_dir}...\")\n", " return SummaryWriter(log_dir=log_dir)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Beautiful!\n", "\n", "Now we've got a `create_writer()` function, let's try it out." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[INFO] Created SummaryWriter, saving to: runs/2022-06-23/data_10_percent/effnetb0/5_epochs...\n" ] } ], "source": [ "# Create an example writer\n", "example_writer = create_writer(experiment_name=\"data_10_percent\",\n", " model_name=\"effnetb0\",\n", " extra=\"5_epochs\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Looking good, now we've got a way to log and trace back our various experiments." 
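, "\n", "\n", "To make the directory layout concrete, here's a minimal sketch that only builds the path string (no `SummaryWriter()` is created). The `build_log_dir()` helper and its `day` parameter are hypothetical names used for illustration, they aren't part of `create_writer()`.\n", "\n", "```python\n", "import os\n", "from datetime import date\n", "\n", "# Hypothetical helper: mirrors the log_dir path logic only, it doesn't create a writer\n", "def build_log_dir(experiment_name, model_name, extra=None, day=None):\n", "    day = day or date.today()  # fall back to today's date\n", "    parts = ['runs', day.strftime('%Y-%m-%d'), experiment_name, model_name]\n", "    if extra:\n", "        parts.append(extra)  # optional trailing directory\n", "    return os.path.join(*parts)\n", "\n", "example_dir = build_log_dir('data_10_percent', 'effnetb2', extra='5_epochs', day=date(2022, 6, 4))\n", "print(example_dir)\n", "```\n", "\n", "Passing a fixed `day` keeps the example reproducible, leaving it out falls back to the current date."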
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 6.1 Update the `train()` function to include a `writer` parameter\n", "\n", "Our `create_writer()` function works fantastic.\n", "\n", "How about we give our `train()` function the ability to take in a `writer` parameter so we actively update the `SummaryWriter()` instance we're using each time we call `train()`.\n", "\n", "For example, say we're running a series of experiments, calling `train()` multiple times for multiple different models, it would be good if each experiment used a different `writer`.\n", "\n", "One `writer` per experiment = one logs directory per experiment.\n", "\n", "To adjust the `train()` function we'll add a `writer` parameter to the function and then we'll add some code to see if there's a `writer` and if so, we'll track our information there." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "from typing import Dict, List\n", "from tqdm.auto import tqdm\n", "\n", "# Add writer parameter to train()\n", "def train(model: torch.nn.Module, \n", " train_dataloader: torch.utils.data.DataLoader, \n", " test_dataloader: torch.utils.data.DataLoader, \n", " optimizer: torch.optim.Optimizer,\n", " loss_fn: torch.nn.Module,\n", " epochs: int,\n", " device: torch.device, \n", " writer: torch.utils.tensorboard.writer.SummaryWriter # new parameter to take in a writer\n", " ) -> Dict[str, List]:\n", " \"\"\"Trains and tests a PyTorch model.\n", "\n", " Passes a target PyTorch models through train_step() and test_step()\n", " functions for a number of epochs, training and testing the model\n", " in the same epoch loop.\n", "\n", " Calculates, prints and stores evaluation metrics throughout.\n", "\n", " Stores metrics to specified writer log_dir if present.\n", "\n", " Args:\n", " model: A PyTorch model to be trained and tested.\n", " train_dataloader: A DataLoader instance for the model to be trained on.\n", " test_dataloader: A 
DataLoader instance for the model to be tested on.\n", " optimizer: A PyTorch optimizer to help minimize the loss function.\n", " loss_fn: A PyTorch loss function to calculate loss on both datasets.\n", " epochs: An integer indicating how many epochs to train for.\n", " device: A target device to compute on (e.g. \"cuda\" or \"cpu\").\n", " writer: A SummaryWriter() instance to log model results to.\n", "\n", " Returns:\n", " A dictionary of training and testing loss as well as training and\n", " testing accuracy metrics. Each metric has a value in a list for \n", " each epoch.\n", " In the form: {train_loss: [...],\n", " train_acc: [...],\n", " test_loss: [...],\n", " test_acc: [...]} \n", " For example if training for epochs=2: \n", " {train_loss: [2.0616, 1.0537],\n", " train_acc: [0.3945, 0.3945],\n", " test_loss: [1.2641, 1.5706],\n", " test_acc: [0.3400, 0.2973]} \n", " \"\"\"\n", " # Create empty results dictionary\n", " results = {\"train_loss\": [],\n", " \"train_acc\": [],\n", " \"test_loss\": [],\n", " \"test_acc\": []\n", " }\n", "\n", " # Loop through training and testing steps for a number of epochs\n", " for epoch in tqdm(range(epochs)):\n", " train_loss, train_acc = train_step(model=model,\n", " dataloader=train_dataloader,\n", " loss_fn=loss_fn,\n", " optimizer=optimizer,\n", " device=device)\n", " test_loss, test_acc = test_step(model=model,\n", " dataloader=test_dataloader,\n", " loss_fn=loss_fn,\n", " device=device)\n", "\n", " # Print out what's happening\n", " print(\n", " f\"Epoch: {epoch+1} | \"\n", " f\"train_loss: {train_loss:.4f} | \"\n", " f\"train_acc: {train_acc:.4f} | \"\n", " f\"test_loss: {test_loss:.4f} | \"\n", " f\"test_acc: {test_acc:.4f}\"\n", " )\n", "\n", " # Update results dictionary\n", " results[\"train_loss\"].append(train_loss)\n", " results[\"train_acc\"].append(train_acc)\n", " results[\"test_loss\"].append(test_loss)\n", " results[\"test_acc\"].append(test_acc)\n", "\n", "\n", " ### New: Use the writer parameter to 
track experiments ###\n", " # See if there's a writer, if so, log to it\n", " if writer:\n", " # Add results to SummaryWriter\n", " writer.add_scalars(main_tag=\"Loss\", \n", " tag_scalar_dict={\"train_loss\": train_loss,\n", " \"test_loss\": test_loss},\n", " global_step=epoch)\n", " writer.add_scalars(main_tag=\"Accuracy\", \n", " tag_scalar_dict={\"train_acc\": train_acc,\n", " \"test_acc\": test_acc}, \n", " global_step=epoch)\n", "\n", " # Close the writer\n", " writer.close()\n", " else:\n", " pass\n", " ### End new ###\n", "\n", " # Return the filled results at the end of the epochs\n", " return results" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Setting up a series of modelling experiments\n", "\n", "It's time to step things up a notch.\n", "\n", "Previously we've been running various experiments and inspecting the results one by one.\n", "\n", "But what if we could run multiple experiments and then inspect the results all together?\n", "\n", "You in?\n", "\n", "C'mon, let's go." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 7.1 What kind of experiments should you run?\n", "\n", "That's the million dollar question in machine learning.\n", "\n", "Because there's really no limit to the experiments you can run.\n", "\n", "Such freedom is why machine learning is so exciting and terrifying at the same time.\n", "\n", "This is where you'll have to put on your scientist coat and remember the machine learning practitioner's motto: *experiment, experiment, experiment!*\n", "\n", "Every hyperparameter stands as a starting point for a different experiment: \n", "* Change the number of **epochs**.\n", "* Change the number of **layers/hidden units**.\n", "* Change the amount of **data**.\n", "* Change the **learning rate**.\n", "* Try different kinds of **data augmentation**.\n", "* Choose a different **model architecture**.\n", "\n", "With practice and running many different experiments, you'll start to build an intuition of what *might* help your model.\n", "\n", "I say *might* on purpose because there are no guarantees.\n", "\n", "But generally, in light of [*The Bitter Lesson*](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) (I've mentioned this twice now because it's an important essay in the world of AI), the bigger your model (more learnable parameters) and the more data you have (more opportunities to learn), the better the performance.\n", "\n", "However, when you're first approaching a machine learning problem: start small and if something works, scale it up.\n", "\n", "Your first batch of experiments should take no longer than a few seconds to a few minutes to run.\n", "\n", "The quicker you can experiment, the faster you can work out what *doesn't* work and, in turn, the faster you can work out what *does* work.\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### 7.2 What experiments are we going to run?\n", "\n", "Our goal is to improve the model powering FoodVision Mini without it getting too big.\n", "\n", "In essence, our ideal model achieves a high level of test set accuracy (90%+) but doesn't take too long to train/perform inference (make predictions).\n", "\n", "We've got plenty of options but how about we keep things simple?\n", "\n", "Let's try a combination of:\n", "1. A different amount of data (10% of Pizza, Steak, Sushi vs. 20%)\n", "2. A different model ([`torchvision.models.efficientnet_b0`](https://pytorch.org/vision/stable/generated/torchvision.models.efficientnet_b0.html#torchvision.models.efficientnet_b0) vs. [`torchvision.models.efficientnet_b2`](https://pytorch.org/vision/stable/generated/torchvision.models.efficientnet_b2.html#torchvision.models.efficientnet_b2))\n", "3. A different training time (5 epochs vs.
10 epochs)\n", "\n", "Breaking these down we get: \n", "\n", "| Experiment number | Training Dataset | Model (pretrained on ImageNet) | Number of epochs |\n", "| ----- | ----- | ----- | ----- |\n", "| 1 | Pizza, Steak, Sushi 10% | EfficientNetB0 | 5 |\n", "| 2 | Pizza, Steak, Sushi 10% | EfficientNetB2 | 5 |\n", "| 3 | Pizza, Steak, Sushi 10% | EfficientNetB0 | 10 |\n", "| 4 | Pizza, Steak, Sushi 10% | EfficientNetB2 | 10 |\n", "| 5 | Pizza, Steak, Sushi 20% | EfficientNetB0 | 5 |\n", "| 6 | Pizza, Steak, Sushi 20% | EfficientNetB2 | 5 |\n", "| 7 | Pizza, Steak, Sushi 20% | EfficientNetB0 | 10 |\n", "| 8 | Pizza, Steak, Sushi 20% | EfficientNetB2 | 10 |\n", "\n", "Notice how we're slowly scaling things up. \n", "\n", "Across the experiments we slowly increase the amount of data, the model size and the length of training.\n", "\n", "By the end, experiment 8 will be using double the data, roughly double the model size and double the length of training compared to experiment 1.\n", "\n", "> **Note:** I want to be clear that there truly is no limit to the number of experiments you can run. What we've designed here is only a very small subset of options. However, you can't test *everything*, so it's best to try a few things to begin with and then follow the ones which work the best.\n", ">\n", "> And as a reminder, the datasets we're using are a subset of the [Food101 dataset](https://pytorch.org/vision/stable/generated/torchvision.datasets.Food101.html#torchvision.datasets.Food101) (3 classes, pizza, steak, sushi, instead of 101) and 10% and 20% of the images rather than 100%. If our experiments work, we could start to run more on more data (though this will take longer to compute). You can see how the datasets were created via the [`04_custom_data_creation.ipynb` notebook](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/04_custom_data_creation.ipynb).
\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 7.3 Download different datasets\n", "\n", "Before we start running our series of experiments, we need to make sure our datasets are ready.\n", "\n", "We'll need two forms of a training set:\n", "1. A training set with **10% of the data** of Food101 pizza, steak, sushi images (we've already created this above but we'll do it again for completeness).\n", "2. A training set with **20% of the data** of Food101 pizza, steak, sushi images.\n", "\n", "For consistency, all experiments will use the same testing dataset (the one from the 10% data split).\n", "\n", "We'll start by downloading the various datasets we need using the `download_data()` function we created earlier.\n", "\n", "Both datasets are available from the course GitHub:\n", "1. [Pizza, steak, sushi 10% training data](https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip).\n", "2. [Pizza, steak, sushi 20% training data](https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip). 
" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[INFO] data/pizza_steak_sushi directory exists, skipping download.\n", "[INFO] data/pizza_steak_sushi_20_percent directory exists, skipping download.\n" ] } ], "source": [ "# Download 10 percent and 20 percent training data (if necessary)\n", "data_10_percent_path = download_data(source=\"https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi.zip\",\n", " destination=\"pizza_steak_sushi\")\n", "\n", "data_20_percent_path = download_data(source=\"https://github.com/mrdbourke/pytorch-deep-learning/raw/main/data/pizza_steak_sushi_20_percent.zip\",\n", " destination=\"pizza_steak_sushi_20_percent\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Data downloaded!\n", "\n", "Now let's setup the filepaths to data we'll be using for the different experiments.\n", "\n", "We'll create different training directory paths but we'll only need one testing directory path since all experiments will be using the same test dataset (the test dataset from pizza, steak, sushi 10%)." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training directory 10%: data/pizza_steak_sushi/train\n", "Training directory 20%: data/pizza_steak_sushi_20_percent/train\n", "Testing directory: data/pizza_steak_sushi/test\n" ] } ], "source": [ "# Setup training directory paths\n", "train_dir_10_percent = data_10_percent_path / \"train\"\n", "train_dir_20_percent = data_20_percent_path / \"train\"\n", "\n", "# Setup testing directory paths (note: use the same test dataset for both to compare the results)\n", "test_dir = data_10_percent_path / \"test\"\n", "\n", "# Check the directories\n", "print(f\"Training directory 10%: {train_dir_10_percent}\")\n", "print(f\"Training directory 20%: {train_dir_20_percent}\")\n", "print(f\"Testing directory: {test_dir}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 7.4 Transform Datasets and create DataLoaders\n", "\n", "Next we'll create a series of transforms to prepare our images for our model(s).\n", "\n", "To keep things consistent, we'll manually create a transform (just like we did above) and use the same transform across all of the datasets.\n", "\n", "The transform will: \n", "1. Resize all the images (we'll start with 224, 224 but this could be changed).\n", "2. Turn them into tensors with values between 0 & 1. \n", "3. Normalize them so their distributions are in line with the ImageNet dataset (we do this because our models from [`torchvision.models`](https://pytorch.org/vision/stable/models.html) have been pretrained on ImageNet)."
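, "\n", "\n", "As a rough worked example of step 3, normalization subtracts a per-channel mean and divides by a per-channel standard deviation. The sketch below does this in plain Python for a single made-up pixel (the pixel values are an assumption for illustration, the mean and standard deviation values are the ImageNet ones used in the transform in this section).\n", "\n", "```python\n", "# ImageNet statistics per colour channel [red, green, blue]\n", "imagenet_mean = [0.485, 0.456, 0.406]\n", "imagenet_std = [0.229, 0.224, 0.225]\n", "\n", "# Made-up pixel whose channels have already been scaled to the 0-1 range\n", "pixel = [0.5, 0.5, 0.5]\n", "\n", "# normalized = (value - mean) / std, applied channel by channel\n", "normalized_pixel = [(value - mean) / std\n", "                    for value, mean, std in zip(pixel, imagenet_mean, imagenet_std)]\n", "print(normalized_pixel)\n", "```\n", "\n", "The same mid-grey input ends up with a different value in each channel because each channel has its own mean and standard deviation."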
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "from torchvision import transforms\n", "\n", "# Create a transform to normalize data distribution to be inline with ImageNet\n", "normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], # values per colour channel [red, green, blue]\n", " std=[0.229, 0.224, 0.225]) # values per colour channel [red, green, blue]\n", "\n", "# Compose transforms into a pipeline\n", "simple_transform = transforms.Compose([\n", " transforms.Resize((224, 224)), # 1. Resize the images\n", " transforms.ToTensor(), # 2. Turn the images into tensors with values between 0 & 1\n", " normalize # 3. Normalize the images so their distributions match the ImageNet dataset \n", "])" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Transform ready!\n", "\n", "Now let's create our DataLoaders using the `create_dataloaders()` function from `data_setup.py` we created in [05. PyTorch Going Modular section 2](https://www.learnpytorch.io/05_pytorch_going_modular/#2-create-datasets-and-dataloaders-data_setuppy). \n", "\n", "We'll create the DataLoaders with a batch size of 32.\n", "\n", "For all of our experiments we'll be using the same `test_dataloader` (to keep comparisons consistent)." 
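, "\n", "\n", "As a rough check on how many batches to expect, the length of a DataLoader is the number of samples divided by the batch size, rounded up. This assumes the default `drop_last=False`, which is an assumption about how `create_dataloaders()` builds its DataLoaders, and the dataset size below is made up for illustration.\n", "\n", "```python\n", "import math\n", "\n", "BATCH_SIZE = 32\n", "num_samples = 225  # hypothetical dataset size for illustration\n", "\n", "# Round up because the final batch can be smaller than BATCH_SIZE\n", "num_batches = math.ceil(num_samples / BATCH_SIZE)\n", "last_batch_size = num_samples - (num_batches - 1) * BATCH_SIZE\n", "print(num_batches, last_batch_size)\n", "```\n", "\n", "A leftover batch with very few samples is normal when the dataset size isn't a multiple of the batch size."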
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of batches of size 32 in 10 percent training data: 8\n", "Number of batches of size 32 in 20 percent training data: 15\n", "Number of batches of size 32 in testing data: 3 (all experiments will use the same test set)\n", "Number of classes: 3, class names: ['pizza', 'steak', 'sushi']\n" ] } ], "source": [ "BATCH_SIZE = 32\n", "\n", "# Create 10% training and test DataLoaders\n", "train_dataloader_10_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_10_percent,\n", " test_dir=test_dir, \n", " transform=simple_transform,\n", " batch_size=BATCH_SIZE\n", ")\n", "\n", "# Create 20% training and test data DataLoaders\n", "train_dataloader_20_percent, test_dataloader, class_names = data_setup.create_dataloaders(train_dir=train_dir_20_percent,\n", " test_dir=test_dir,\n", " transform=simple_transform,\n", " batch_size=BATCH_SIZE\n", ")\n", "\n", "# Find the number of samples/batches per dataloader (using the same test_dataloader for both experiments)\n", "print(f\"Number of batches of size {BATCH_SIZE} in 10 percent training data: {len(train_dataloader_10_percent)}\")\n", "print(f\"Number of batches of size {BATCH_SIZE} in 20 percent training data: {len(train_dataloader_20_percent)}\")\n", "# Use the test DataLoader here (not the 10% training DataLoader) so the count matches the label\n", "print(f\"Number of batches of size {BATCH_SIZE} in testing data: {len(test_dataloader)} (all experiments will use the same test set)\")\n", "print(f\"Number of classes: {len(class_names)}, class names: {class_names}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 7.5 Create feature extractor models\n", "\n", "Time to start building our models.\n", "\n", "We're going to create two feature extractor models: \n", "\n", "1.
[`torchvision.models.efficientnet_b0()`](https://pytorch.org/vision/main/models/generated/torchvision.models.efficientnet_b0.html) pretrained backbone + custom classifier head (EffNetB0 for short).\n", "2. [`torchvision.models.efficientnet_b2()`](https://pytorch.org/vision/main/models/generated/torchvision.models.efficientnet_b2.html) pretrained backbone + custom classifier head (EffNetB2 for short).\n", "\n", "To do this, we'll freeze the base layers (the feature layers) and update the model's classifier heads (output layers) to suit our problem just like we did in [06. PyTorch Transfer Learning section 3.4](https://www.learnpytorch.io/06_pytorch_transfer_learning/#34-freezing-the-base-model-and-changing-the-output-layer-to-suit-our-needs).\n", "\n", "We saw in the previous chapter the `in_features` parameter to the classifier head of EffNetB0 is `1280` (the backbone turns the input image into a feature vector of size `1280`).\n", "\n", "Since EffNetB2 has a different number of layers and parameters, we'll need to adapt it accordingly.\n", "\n", "> **Note:** Whenever you use a different model, one of the first things you should inspect is the input and output shapes. That way you'll know how you'll have to prepare your input data/update the model to have the correct output shape.\n", "\n", "We can find the input and output shapes of EffNetB2 using [`torchinfo.summary()`](https://github.com/TylerYep/torchinfo) and passing in the `input_size=(32, 3, 224, 224)` parameter (`(32, 3, 224, 224)` is equivalent to `(batch_size, color_channels, height, width)`, i.e we pass in an example of what a single batch of data would be to our model).\n", "\n", "> **Note:** Many modern models can handle input images of varying sizes thanks to [`torch.nn.AdaptiveAvgPool2d()`](https://pytorch.org/docs/stable/generated/torch.nn.AdaptiveAvgPool2d.html) layer, this layer adaptively adjusts the `output_size` of a given input as required. 
You can try this out by passing different size input images to `torchinfo.summary()` or to your own models using the layer.\n", "\n", "To find the required input shape to the final layer of EffNetB2, let's:\n", "1. Create an instance of `torchvision.models.efficientnet_b2()` with pretrained weights (`weights=torchvision.models.EfficientNet_B2_Weights.DEFAULT`).\n", "2. See the various input and output shapes by running `torchinfo.summary()`.\n", "3. Print out the number of `in_features` by inspecting `state_dict()` of the classifier portion of EffNetB2 and printing the length of the first row of the weight matrix.\n", " * **Note:** You could also just inspect the output of `effnetb2.classifier`.\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of in_features to final layer of EfficientNetB2: 1408\n" ] } ], "source": [ "import torchvision\n", "from torchinfo import summary\n", "\n", "# 1. Create an instance of EffNetB2 with pretrained weights\n", "effnetb2_weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT # \"DEFAULT\" means best available weights\n", "effnetb2 = torchvision.models.efficientnet_b2(weights=effnetb2_weights)\n", "\n", "# # 2. Get a summary of standard EffNetB2 from torchvision.models (uncomment for full output)\n", "# summary(model=effnetb2, \n", "# input_size=(32, 3, 224, 224), # make sure this is \"input_size\", not \"input_shape\"\n", "# # col_names=[\"input_size\"], # uncomment for smaller output\n", "# col_names=[\"input_size\", \"output_size\", \"num_params\", \"trainable\"],\n", "# col_width=20,\n", "# row_settings=[\"var_names\"]\n", "# ) \n", "\n", "# 3.
Get the number of in_features of the EfficientNetB2 classifier layer\n", "print(f\"Number of in_features to final layer of EfficientNetB2: {len(effnetb2.classifier.state_dict()['1.weight'][0])}\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "*Model summary of EffNetB2 feature extractor model with all layers unfrozen (trainable) and default classifier head from ImageNet pretraining.*\n", "\n", "Now that we know the required number of `in_features` for the EffNetB2 model, let's create a couple of helper functions to set up our EffNetB0 and EffNetB2 feature extractor models.\n", "\n", "We want these functions to:\n", "1. Get the base model from `torchvision.models`\n", "2. Freeze the base layers in the model (set `requires_grad=False`)\n", "3. Set the random seeds (we don't *need* to do this but since we're running a series of experiments and initializing a new layer with random weights, we want the randomness to be similar for each experiment)\n", "4. Change the classifier head (to suit our problem)\n", "5. Give the model a name (e.g. \"effnetb0\" for EffNetB0)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "import torchvision\n", "from torch import nn\n", "\n", "# Get num out features (one for each class pizza, steak, sushi)\n", "OUT_FEATURES = len(class_names)\n", "\n", "# Create an EffNetB0 feature extractor\n", "def create_effnetb0():\n", " # 1. Get the base model with pretrained weights and send to target device\n", " weights = torchvision.models.EfficientNet_B0_Weights.DEFAULT\n", " model = torchvision.models.efficientnet_b0(weights=weights).to(device)\n", "\n", " # 2. Freeze the base model layers\n", " for param in model.features.parameters():\n", " param.requires_grad = False\n", "\n", " # 3. Set the seeds\n", " set_seeds()\n", "\n", " # 4.
Change the classifier head\n", " model.classifier = nn.Sequential(\n", " nn.Dropout(p=0.2),\n", " nn.Linear(in_features=1280, out_features=OUT_FEATURES)\n", " ).to(device)\n", "\n", " # 5. Give the model a name\n", " model.name = \"effnetb0\"\n", " print(f\"[INFO] Created new {model.name} model.\")\n", " return model\n", "\n", "# Create an EffNetB2 feature extractor\n", "def create_effnetb2():\n", " # 1. Get the base model with pretrained weights and send to target device\n", " weights = torchvision.models.EfficientNet_B2_Weights.DEFAULT\n", " model = torchvision.models.efficientnet_b2(weights=weights).to(device)\n", "\n", " # 2. Freeze the base model layers\n", " for param in model.features.parameters():\n", " param.requires_grad = False\n", "\n", " # 3. Set the seeds\n", " set_seeds()\n", "\n", " # 4. Change the classifier head\n", " model.classifier = nn.Sequential(\n", " nn.Dropout(p=0.3),\n", " nn.Linear(in_features=1408, out_features=OUT_FEATURES)\n", " ).to(device)\n", "\n", " # 5. Give the model a name\n", " model.name = \"effnetb2\"\n", " print(f\"[INFO] Created new {model.name} model.\")\n", " return model" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Those are some nice looking functions!\n", "\n", "Let's test them out by creating an instance of EffNetB0 and EffNetB2 and checking out their `summary()`." 
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[INFO] Created new effnetb0 model.\n" ] } ], "source": [ "effnetb0 = create_effnetb0() \n", "\n", "# Get an output summary of the layers in our EffNetB0 feature extractor model (uncomment to view full output)\n", "# summary(model=effnetb0, \n", "# input_size=(32, 3, 224, 224), # make sure this is \"input_size\", not \"input_shape\"\n", "# # col_names=[\"input_size\"], # uncomment for smaller output\n", "# col_names=[\"input_size\", \"output_size\", \"num_params\", \"trainable\"],\n", "# col_width=20,\n", "# row_settings=[\"var_names\"]\n", "# ) " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\"output\n", "\n", "*Model summary of EffNetB0 model with base layers frozen (untrainable) and updated classifier head (suited for pizza, steak, sushi image classification).*" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[INFO] Created new effnetb2 model.\n" ] } ], "source": [ "effnetb2 = create_effnetb2()\n", "\n", "# Get an output summary of the layers in our EffNetB2 feature extractor model (uncomment to view full output)\n", "# summary(model=effnetb2, \n", "# input_size=(32, 3, 224, 224), # make sure this is \"input_size\", not \"input_shape\"\n", "# # col_names=[\"input_size\"], # uncomment for smaller output\n", "# col_names=[\"input_size\", \"output_size\", \"num_params\", \"trainable\"],\n", "# col_width=20,\n", "# row_settings=[\"var_names\"]\n", "# ) " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "\"output\n", "\n", "*Model summary of EffNetB2 model with base layers frozen (untrainable) and updated classifier head (suited for pizza, steak, sushi image classification).*" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the outputs of the 
summaries, it seems the EffNetB2 backbone has nearly double the number of parameters of EffNetB0.\n", "\n", "| Model | Total parameters (before freezing/changing head) | Total parameters (after freezing/changing head) | Total trainable parameters (after freezing/changing head) |\n", "| ----- | ----- | ----- | ----- |\n", "| EfficientNetB0 | 5,288,548 | 4,011,391 | 3,843 | \n", "| EfficientNetB2 | 9,109,994 | 7,705,221 | 4,227 |\n", "\n", "This gives the backbone of the EffNetB2 model more opportunities to form a representation of our pizza, steak and sushi data.\n", "\n", "However, the trainable parameters for each model (the classifier heads) aren't very different.\n", "\n", "Will these extra parameters lead to better results?\n", "\n", "We'll have to wait and see... \n", "\n", "> **Note:** In the spirit of experimenting, you really could try almost any model from `torchvision.models` in a similar fashion to what we're doing here. I've only chosen EffNetB0 and EffNetB2 as examples. Perhaps you might want to throw something like `torchvision.models.convnext_tiny()` or `torchvision.models.convnext_small()` into the mix." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 7.6 Create experiments and set up training code\n", "\n", "We've prepared our data and our models, so the time has come to set up some experiments!\n", "\n", "We'll start by creating two lists and a dictionary:\n", "1. A list of the number of epochs we'd like to test (`[5, 10]`)\n", "2. A list of the models we'd like to test (`[\"effnetb0\", \"effnetb2\"]`)\n", "3. A dictionary of the different training DataLoaders" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# 1. Create epochs list\n", "num_epochs = [5, 10]\n", "\n", "# 2. Create models list (need to create a new model for each experiment)\n", "models = [\"effnetb0\", \"effnetb2\"]\n", "\n", "# 3. 
Create dataloaders dictionary for various dataloaders\n", "train_dataloaders = {\"data_10_percent\": train_dataloader_10_percent,\n", "                     \"data_20_percent\": train_dataloader_20_percent}" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Lists and dictionary created!\n", "\n", "Now we can write code to iterate through each of the different options and try out each of the different combinations.\n", "\n", "We'll also save the model at the end of each experiment so later on we can load back in the best model and use it for making predictions.\n", "\n", "Specifically, let's go through the following steps: \n", "1. Set the random seeds (so our experiment results are reproducible; in practice, you might run the same experiment across ~3 different seeds and average the results).\n", "2. Keep track of different experiment numbers (this is mostly for pretty print outs).\n", "3. Loop through the `train_dataloaders` dictionary items for each of the different training DataLoaders.\n", "4. Loop through the list of epoch numbers.\n", "5. Loop through the list of different model names.\n", "6. Create information print outs for the current running experiment (so we know what's happening).\n", "7. Check which model is the target model and create a new EffNetB0 or EffNetB2 instance (we create a new model instance each experiment so all models start from the same standpoint).\n", "8. Create a new loss function (`torch.nn.CrossEntropyLoss()`) and optimizer (`torch.optim.Adam(params=model.parameters(), lr=0.001)`) for each new experiment.\n", "9. Train the model with the modified `train()` function, passing the appropriate details to the `writer` parameter.\n", "10. Save the trained model to file with an appropriate file name using `save_model()` from [`utils.py`](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/utils.py). 
\n", "\n", "We can also use the `%%time` magic to see how long all of our experiments take together in a single Jupyter/Google Colab cell.\n", "\n", "Let's do it!" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[INFO] Experiment number: 1\n", "[INFO] Model: effnetb0\n", "[INFO] DataLoader: data_10_percent\n", "[INFO] Number of epochs: 5\n", "[INFO] Created new effnetb0 model.\n", "[INFO] Created SummaryWriter, saving to: runs/2022-06-23/data_10_percent/effnetb0/5_epochs...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7f724e8d22604328b6f2c69ab0b3948f", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/5 [00:00 **Note:** Depending on the random seeds you used/hardware you used there's a chance your numbers aren't exactly the same as what's here. This is okay. It's due to the inherent randomness of deep learning. What matters most is the trend. Where your numbers are heading. If they're off by a large amount, perhaps there's something wrong and best to go back and check the code. But if they're off by a small amount (say a couple of decimal places or so), that's okay. \n", "\n", "\"various\n", "\n", "*Visualizing the test loss values for the different modelling experiments in TensorBoard, you can see that the EffNetB0 model trained for 10 epochs and with 20% of the data achieves the lowest loss. 
This fits the overall trend of the experiments: more data, a larger model and longer training time are generally better.*\n", "\n", "You can also upload your TensorBoard experiment results to [tensorboard.dev](https://tensorboard.dev) to host them publicly for free.\n", "\n", "For example, by running code similar to the following: " ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "# # Upload the results to TensorBoard.dev (uncomment to try it out)\n", "# !tensorboard dev upload --logdir runs \\\n", "# --name \"07. PyTorch Experiment Tracking: FoodVision Mini model results\" \\\n", "# --description \"Comparing results of different model size, training data amount and training time.\"" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Running the cell above results in the experiments from this notebook being publicly viewable at: https://tensorboard.dev/experiment/VySxUYY7Rje0xREYvCvZXA/\n", "\n", "> **Note:** Beware that anything you upload to tensorboard.dev is publicly available for anyone to see. So if you do upload your experiments, be careful they don't contain sensitive information. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## 9. 
Load in the best model and make predictions with it\n", "\n", "Looking at the TensorBoard logs for our eight experiments, it seems experiment number eight achieved the best overall results (highest test accuracy, second lowest test loss).\n", "\n", "This is the experiment that used:\n", "* EffNetB2 (double the parameters of EffNetB0)\n", "* 20% pizza, steak, sushi training data (double the original training data)\n", "* 10 epochs (double the original training time)\n", "\n", "In essence, our biggest model achieved the best results.\n", "\n", "Though it wasn't as if these results were far better than the other models.\n", "\n", "The same model on the same data achieved similar results in half the training time (experiment number 6).\n", "\n", "This suggests that potentially the most influential parts of our experiments were the number of parameters and the amount of data.\n", "\n", "Inspecting the results further it seems that generally a model with more parameters (EffNetB2) and more data (20% pizza, steak, sushi training data) performs better (lower test loss and higher test accuracy).\n", "\n", "More experiments could be done to further test this but for now, let's import our best performing model from experiment eight (saved to: `models/07_effnetb2_data_20_percent_10_epochs.pth`, you can [download this model from the course GitHub](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/models/07_effnetb2_data_20_percent_10_epochs.pth)) and perform some qualitative evaluations.\n", "\n", "In other words, let's *visualize, visualize, visualize!*\n", "\n", "We can import the best saved model by creating a new instance of EffNetB2 using the `create_effnetb2()` function and then load in the saved `state_dict()` with `torch.load()`." 
] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[INFO] Created new effnetb2 model.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Setup the best model filepath\n", "best_model_path = \"models/07_effnetb2_data_20_percent_10_epochs.pth\"\n", "\n", "# Instantiate a new instance of EffNetB2 (to load the saved state_dict() to)\n", "best_model = create_effnetb2()\n", "\n", "# Load the saved best model state_dict()\n", "best_model.load_state_dict(torch.load(best_model_path))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Best model loaded!\n", "\n", "While we're here, let's check its filesize.\n", "\n", "This is an important consideration later on when deploying the model (incorporating it in an app).\n", "\n", "If the model is too large, it can be hard to deploy." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "EfficientNetB2 feature extractor model size: 29 MB\n" ] } ], "source": [ "# Check the model file size\n", "from pathlib import Path\n", "\n", "# Get the model size in bytes then convert to megabytes\n", "effnetb2_model_size = Path(best_model_path).stat().st_size // (1024*1024)\n", "print(f\"EfficientNetB2 feature extractor model size: {effnetb2_model_size} MB\")" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Looks like our best model so far is 29 MB in size. We'll keep this in mind if we want to deploy it later on.\n", "\n", "Time to make and visualize some predictions.\n", "\n", "In [06. 
PyTorch Transfer Learning section 6](https://www.learnpytorch.io/06_pytorch_transfer_learning/#6-make-predictions-on-images-from-the-test-set), we created a `pred_and_plot_image()` function that uses a trained model to make a prediction on an image.\n", "\n", "And we can reuse this function by importing it from [`going_modular.going_modular.predictions.py`](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/going_modular/going_modular/predictions.py) (I put the `pred_and_plot_image()` function in a script so we could reuse it).\n", "\n", "So to make predictions on various images the model hasn't seen before, we'll first get a list of all the image filepaths from the 20% pizza, steak, sushi testing dataset and then we'll randomly select a subset of these filepaths to pass to our `pred_and_plot_image()` function." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Import function to make predictions on images and plot them \n", "# See the function previously created in section: https://www.learnpytorch.io/06_pytorch_transfer_learning/#6-make-predictions-on-images-from-the-test-set\n", "from going_modular.going_modular.predictions import pred_and_plot_image\n", "\n", "# Get a random list of 3 images from 20% test set\n", "import random\n", "num_images_to_plot = 3\n", "test_image_path_list = list(Path(data_20_percent_path / \"test\").glob(\"*/*.jpg\")) # get all test image paths from 20% dataset\n", "test_image_path_sample = random.sample(population=test_image_path_list,\n", " k=num_images_to_plot) # randomly select k number of images\n", "\n", "# Iterate through random test image paths, make predictions on them and plot them\n", "for image_path in test_image_path_sample:\n", " pred_and_plot_image(model=best_model,\n", " image_path=image_path,\n", " class_names=class_names,\n", " image_size=(224, 224))" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Nice!\n", "\n", "Running the cell above a few times we can see our model performs quite well and often has higher prediction probabilities than previous models we've built.\n", "\n", "This suggests the model is more confident in the decisions it's making. " ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### 9.1 Predict on a custom image with the best model\n", "\n", "Making predictions on the test dataset is cool but the real magic of machine learning is making predictions on custom images of your own.\n", "\n", "So let's import the trusty [pizza dad image](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/images/04-pizza-dad.jpeg) (a photo of my dad in front of a pizza) we've been using for the past couple of sections and see how our model performs on it." 
] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data/04-pizza-dad.jpeg already exists, skipping download.\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Download custom image\n", "import requests\n", "\n", "# Setup custom image path\n", "custom_image_path = Path(\"data/04-pizza-dad.jpeg\")\n", "\n", "# Download the image if it doesn't already exist\n", "if not custom_image_path.is_file():\n", " with open(custom_image_path, \"wb\") as f:\n", " # When downloading from GitHub, need to use the \"raw\" file link\n", " request = requests.get(\"https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/04-pizza-dad.jpeg\")\n", " print(f\"Downloading {custom_image_path}...\")\n", " f.write(request.content)\n", "else:\n", " print(f\"{custom_image_path} already exists, skipping download.\")\n", "\n", "# Predict on custom image\n", "pred_and_plot_image(model=model,\n", " image_path=custom_image_path,\n", " class_names=class_names)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "Woah!\n", "\n", "Two thumbs again!\n", "\n", "Our best model predicts \"pizza\" correctly and this time with an even higher prediction probability (0.978) than the first feature extraction model we trained and used in [06. PyTorch Transfer Learning section 6.1](https://www.learnpytorch.io/06_pytorch_transfer_learning/#61-making-predictions-on-a-custom-image).\n", "\n", "This again suggests our current best model (EffNetB2 feature extractor trained on 20% of the pizza, steak, sushi training data and for 10 epochs) has learned patterns to make it more confident of its decision to predict pizza.\n", "\n", "I wonder what could improve our model's performance even further? \n", "\n", "I'll leave that as a challenge for you to investigate." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Main takeaways\n", "\n", "We've now gone full circle on the PyTorch workflow introduced in [01. 
PyTorch Workflow Fundamentals](https://www.learnpytorch.io/01_pytorch_workflow/): we've gotten data ready, we've built and picked a pretrained model, we've used our various helper functions to train and evaluate the model, and in this notebook we've improved our FoodVision Mini model by running and tracking a series of experiments.\n", "\n", "\"a\n", "\n", "You should be proud of yourself, this is no small feat!\n", "\n", "The main ideas you should take away from this Milestone Project 1 are:\n", "\n", "* The machine learning practitioner's motto: *experiment, experiment, experiment!* (though we've been doing plenty of this already).\n", "* In the beginning, keep your experiments small so you can work fast; your first few experiments shouldn't take more than a few seconds to a few minutes to run.\n", "* The more experiments you do, the quicker you can figure out what *doesn't* work.\n", "* Scale up when you find something that works. For example, since we've found a pretty good performing model with EffNetB2 as a feature extractor, perhaps you'd now like to see what happens when you scale it up to the whole [Food101 dataset](https://pytorch.org/vision/main/generated/torchvision.datasets.Food101.html) from `torchvision.datasets`.\n", "* Programmatically tracking your experiments takes a few steps to set up but it's worth it in the long run so you can figure out what works and what doesn't.\n", " * There are many different machine learning experiment trackers out there so explore a few and try them out."
] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises\n", "\n", "> **Note:** These exercises expect the use of `torchvision` v0.13+ (released July 2022), previous versions may work but will likely have errors.\n", "\n", "All of the exercises are focused on practicing the code above.\n", "\n", "You should be able to complete them by referencing each section or by following the resource(s) linked.\n", "\n", "All exercises should be completed using [device-agnostic code](https://pytorch.org/docs/stable/notes/cuda.html#device-agnostic-code).\n", "\n", "**Resources:**\n", "* [Exercise template notebook for 07](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/07_pytorch_experiment_tracking_exercise_template.ipynb)\n", "* [Example solutions notebook for 07](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/solutions/07_pytorch_experiment_tracking_exercise_solutions.ipynb) (try the exercises *before* looking at this)\n", " * See a live [video walkthrough of the solutions on YouTube](https://youtu.be/cO_r2FYcAjU) (errors and all)\n", "\n", "\n", "1. Pick a larger model from [`torchvision.models`](https://pytorch.org/vision/main/models.html) to add to the list of experiments (for example, EffNetB3 or higher). \n", " * How does it perform compared to our existing models?\n", "2. Introduce data augmentation to the list of experiments using the 20% pizza, steak, sushi training and test datasets, does this change anything?\n", " * For example, you could have one training DataLoader that uses data augmentation (e.g. 
`train_dataloader_20_percent_aug` and `train_dataloader_20_percent_no_aug`) and then compare the results of two of the same model types training on these two DataLoaders.\n", " * **Note:** You may need to alter the `create_dataloaders()` function to be able to take a transform for the training data and the testing data (because you don't need to perform data augmentation on the test data). See [04. PyTorch Custom Datasets section 6](https://www.learnpytorch.io/04_pytorch_custom_datasets/#6-other-forms-of-transforms-data-augmentation) for examples of using data augmentation or the script below for an example:\n", "\n", "```python\n", "import matplotlib.pyplot as plt\n", "from torchvision import transforms\n", "\n", "# Note: Data augmentation transform like this should only be performed on training data\n", "# (assumes `normalize` is a `transforms.Normalize` instance created earlier in the notebook)\n", "train_transform_data_aug = transforms.Compose([\n", "    transforms.Resize((224, 224)),\n", "    transforms.TrivialAugmentWide(),\n", "    transforms.ToTensor(),\n", "    normalize\n", "])\n", "\n", "# Helper function to view images in a DataLoader (works with data augmentation transforms or not) \n", "def view_dataloader_images(dataloader, n=10):\n", "    if n > 10:\n", "        print(\"Having n higher than 10 will create messy plots, lowering to 10.\")\n", "        n = 10\n", "    imgs, labels = next(iter(dataloader))\n", "    plt.figure(figsize=(16, 8))\n", "    for i in range(n):\n", "        # Min max scale the image for display purposes\n", "        targ_image = imgs[i]\n", "        sample_min, sample_max = targ_image.min(), targ_image.max()\n", "        sample_scaled = (targ_image - sample_min)/(sample_max - sample_min)\n", "\n", "        # Plot images with appropriate axes information\n", "        plt.subplot(1, 10, i+1)\n", "        plt.imshow(sample_scaled.permute(1, 2, 0)) # rearrange to [height, width, color_channels] for Matplotlib\n", "        plt.title(class_names[labels[i]])\n", "        plt.axis(False)\n", "\n", "# Have to update `create_dataloaders()` to handle different augmentations\n", "import os\n", "from torch.utils.data import DataLoader\n", "from torchvision import datasets\n", "\n", "NUM_WORKERS = os.cpu_count() # use maximum number of CPUs 
for workers to load data \n", "\n", "# Note: this is an updated version of data_setup.create_dataloaders to handle\n", "# different train and test transforms.\n", "def create_dataloaders(\n", "    train_dir, \n", "    test_dir, \n", "    train_transform, # add parameter for train transform (transforms on train dataset)\n", "    test_transform, # add parameter for test transform (transforms on test dataset)\n", "    batch_size=32, num_workers=NUM_WORKERS\n", "):\n", "    # Use ImageFolder to create dataset(s)\n", "    train_data = datasets.ImageFolder(train_dir, transform=train_transform)\n", "    test_data = datasets.ImageFolder(test_dir, transform=test_transform)\n", "\n", "    # Get class names\n", "    class_names = train_data.classes\n", "\n", "    # Turn images into data loaders\n", "    train_dataloader = DataLoader(\n", "        train_data,\n", "        batch_size=batch_size,\n", "        shuffle=True,\n", "        num_workers=num_workers,\n", "        pin_memory=True,\n", "    )\n", "    test_dataloader = DataLoader(\n", "        test_data,\n", "        batch_size=batch_size,\n", "        shuffle=False, # no need to shuffle the test data\n", "        num_workers=num_workers,\n", "        pin_memory=True,\n", "    )\n", "\n", "    return train_dataloader, test_dataloader, class_names\n", "```\n", "\n", "3. Scale up the dataset to turn FoodVision Mini into FoodVision Big using the entire [Food101 dataset from `torchvision.datasets`](https://pytorch.org/vision/stable/generated/torchvision.datasets.Food101.html#torchvision.datasets.Food101)\n", " * You could take the best performing model from your various experiments or even the EffNetB2 feature extractor we created in this notebook and see how it goes fitting for 5 epochs on all of Food101.\n", " * If you try more than one model, it would be good to have the model's results tracked.\n", " * If you load the Food101 dataset from `torchvision.datasets`, you'll have to create PyTorch DataLoaders to use it in training.\n", " * **Note:** Due to the larger amount of data in Food101 compared to our pizza, steak, sushi dataset, this model will take longer to train."
] }, { "attachments": {}, "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "## Extra-curriculum\n", "\n", "* Read [The Bitter Lesson](http://www.incompleteideas.net/IncIdeas/BitterLesson.html) blog post by Richard Sutton to get an idea of how many of the latest advancements in AI have come from increased scale (bigger datasets and bigger models) and more general (less meticulously crafted) methods.\n", "* Go through the [PyTorch YouTube/code tutorial](https://pytorch.org/tutorials/beginner/introyt/tensorboardyt_tutorial.html) for TensorBoard for 20 minutes and see how it compares to the code we've written in this notebook.\n", "* Perhaps you may want to view and rearrange your model's TensorBoard logs with a DataFrame (so you can sort the results by lowest loss or highest accuracy), there's a guide for this [in the TensorBoard documentation](https://www.tensorflow.org/tensorboard/dataframe_api). \n", "* If you like to use VSCode for development using scripts or notebooks (VSCode can now use Jupyter Notebooks natively), you can set up TensorBoard right within VSCode using the [PyTorch Development in VSCode guide](https://code.visualstudio.com/docs/datascience/pytorch-support).\n", "* To go further with experiment tracking and see how your PyTorch model is performing from a speed perspective (are there any bottlenecks that could be improved to speed up training?), see the [PyTorch blog post introducing the PyTorch profiler](https://pytorch.org/blog/introducing-pytorch-profiler-the-new-and-improved-performance-tool/).\n", "* Made With ML is an outstanding resource for all things machine learning by Goku Mohandas and their [guide on experiment tracking](https://madewithml.com/courses/mlops/experiment-tracking/) contains a fantastic introduction to tracking machine learning experiments with MLflow."
] } ], "metadata": { "interpreter": { "hash": "3fbe1355223f7b2ffc113ba3ade6a2b520cadace5d5ec3e828c83ce02eb221bf" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 4 }