{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "id": "view-in-github" }, "source": [ "[View Source Code](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/02_pytorch_classification.ipynb) | [View Slides](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/slides/02_pytorch_classification.pdf) | [Watch Video Walkthrough](https://youtu.be/Z_ikDlimN6A?t=30691) " ] }, { "cell_type": "markdown", "metadata": { "id": "r8C1WSzsHC7x" }, "source": [ "# 02. PyTorch Neural Network Classification\n", "\n", "## What is a classification problem?\n", "\n", "A [classification problem](https://en.wikipedia.org/wiki/Statistical_classification) involves predicting whether something is one thing or another.\n", "\n", "For example, you might want to:\n", "\n", "| Problem type | What is it? | Example |\n", "| ----- | ----- | ----- |\n", "| **Binary classification** | Target can be one of two options, e.g. yes or no | Predict whether or not someone has heart disease based on their health parameters. |\n", "| **Multi-class classification** | Target can be one of more than two options | Decide whether a photo is of food, a person or a dog. |\n", "| **Multi-label classification** | Target can be assigned more than one option | Predict what categories should be assigned to a Wikipedia article (e.g. mathematics, science & philosophy). |\n", "\n", "
\n", "
\n", " \n", "Classification, along with regression (predicting a number, covered in [notebook 01](https://www.learnpytorch.io/01_pytorch_workflow/)) is one of the most common types of machine learning problems.\n", "\n", "In this notebook, we're going to work through a couple of different classification problems with PyTorch. \n", "\n", "In other words, we'll take a set of inputs and predict which class those inputs belong to.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "9wTlTaDKH7Oj" }, "source": [ "## What we're going to cover\n", "\n", "In this notebook we're going to revisit the PyTorch workflow we covered in [01. PyTorch Workflow](https://www.learnpytorch.io/01_pytorch_workflow/).\n", "\n", "Except instead of trying to predict a straight line (predicting a number, also called a regression problem), we'll be working on a **classification problem**.\n", "\n", "Specifically, we're going to cover:\n", "\n", "| **Topic** | **Contents** |\n", "| ----- | ----- |\n", "| **0. Architecture of a classification neural network** | Neural networks can come in almost any shape or size, but they typically follow a similar floor plan. |\n", "| **1. Getting binary classification data ready** | Data can be almost anything but to get started we're going to create a simple binary classification dataset. |\n", "| **2. Building a PyTorch classification model** | Here we'll create a model to learn patterns in the data, we'll also choose a **loss function**, **optimizer** and build a **training loop** specific to classification. | \n", "| **3. Fitting the model to data (training)** | We've got data and a model, now let's let the model (try to) find patterns in the (**training**) data. |\n", "| **4. Making predictions and evaluating a model (inference)** | Our model's found patterns in the data, let's compare its findings to the actual (**testing**) data. |\n", "| **5. 
Improving a model (from a model perspective)** | We've trained and evaluated a model but it's not working, let's try a few things to improve it. |\n", "| **6. Non-linearity** | So far our model has only had the ability to model straight lines, what about non-linear (non-straight) lines? |\n", "| **7. Replicating non-linear functions** | We used **non-linear functions** to help model non-linear data, but what do these look like? |\n", "| **8. Putting it all together with multi-class classification** | Let's put everything we've done so far for binary classification together with a multi-class classification problem. |\n" ] }, { "cell_type": "markdown", "metadata": { "id": "uxdUc9OfHtgU" }, "source": [ "## Where can you get help?\n", "\n", "All of the materials for this course [live on GitHub](https://github.com/mrdbourke/pytorch-deep-learning).\n", "\n", "And if you run into trouble, you can ask a question on the [Discussions page](https://github.com/mrdbourke/pytorch-deep-learning/discussions) there too.\n", "\n", "There's also the [PyTorch developer forums](https://discuss.pytorch.org/), a very helpful place for all things PyTorch. \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "MSLHJiQxH4jU" }, "source": [ "## 0. Architecture of a classification neural network\n", "\n", "Before we get into writing code, let's look at the general architecture of a classification neural network.\n", "\n", "| **Hyperparameter** | **Binary Classification** | **Multiclass classification** |\n", "| --- | --- | --- |\n", "| **Input layer shape** (`in_features`) | Same as number of features (e.g. 
5 for age, sex, height, weight, smoking status in heart disease prediction) | Same as binary classification |\n", "| **Hidden layer(s)** | Problem specific, minimum = 1, maximum = unlimited | Same as binary classification |\n", "| **Neurons per hidden layer** | Problem specific, generally 10 to 512 | Same as binary classification |\n", "| **Output layer shape** (`out_features`) | 1 (one class or the other) | 1 per class (e.g. 3 for food, person or dog photo) |\n", "| **Hidden layer activation** | Usually [ReLU](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU) (rectified linear unit) but [can be many others](https://en.wikipedia.org/wiki/Activation_function#Table_of_activation_functions) | Same as binary classification |\n", "| **Output activation** | [Sigmoid](https://en.wikipedia.org/wiki/Sigmoid_function) ([`torch.sigmoid`](https://pytorch.org/docs/stable/generated/torch.sigmoid.html) in PyTorch)| [Softmax](https://en.wikipedia.org/wiki/Softmax_function) ([`torch.softmax`](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) in PyTorch) |\n", "| **Loss function** | [Binary crossentropy](https://en.wikipedia.org/wiki/Cross_entropy#Cross-entropy_loss_function_and_logistic_regression) ([`torch.nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) in PyTorch) | Cross entropy ([`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) in PyTorch) |\n", "| **Optimizer** | [SGD](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) (stochastic gradient descent), [Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) (see [`torch.optim`](https://pytorch.org/docs/stable/optim.html) for more options) | Same as binary classification |\n", "\n", "Of course, this ingredient list of classification neural network components will vary depending on the problem you're working on.\n", "\n", "But it's more than enough to get started.\n", "\n", "We're 
going to get hands-on with this setup throughout this notebook." ] }, { "cell_type": "markdown", "metadata": { "id": "VwvxFEjKHC71" }, "source": [ "## 1. Make classification data and get it ready\n", "\n", "Let's begin by making some data.\n", "\n", "We'll use the [`make_circles()`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_circles.html) function from Scikit-Learn to generate two circles with different coloured dots. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "RGeZvHsyHC72" }, "outputs": [], "source": [ "from sklearn.datasets import make_circles\n", "\n", "\n", "# Make 1000 samples \n", "n_samples = 1000\n", "\n", "# Create circles\n", "X, y = make_circles(n_samples,\n", "                    noise=0.03, # a little bit of noise to the dots\n", "                    random_state=42) # keep random state so we get the same values" ] }, { "cell_type": "markdown", "metadata": { "id": "1FwwzJnQV2jv" }, "source": [ "Alright, now let's view the first 5 `X` and `y` values." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "oAb8vcIhWEO8", "outputId": "b7316d88-7733-4981-9b4a-0a98c7cdd829" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "First 5 X features:\n", "[[ 0.75424625 0.23148074]\n", " [-0.75615888 0.15325888]\n", " [-0.81539193 0.17328203]\n", " [-0.39373073 0.69288277]\n", " [ 0.44220765 -0.89672343]]\n", "\n", "First 5 y labels:\n", "[1 1 1 1 0]\n" ] } ], "source": [ "print(f\"First 5 X features:\\n{X[:5]}\")\n", "print(f\"\\nFirst 5 y labels:\\n{y[:5]}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "ATakj2bVWBou" }, "source": [ "Looks like there are two `X` values per one `y` value. \n", "\n", "Let's keep following the data explorer's motto of *visualize, visualize, visualize* and put them into a pandas DataFrame."
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 363 }, "id": "XAAqx_8sHC73", "outputId": "cd6ef4fe-cda3-48db-f2a5-9820660eab14" }, "outputs": [ { "data": { "text/plain": [ " X1 X2 label\n", "0 0.754246 0.231481 1\n", "1 -0.756159 0.153259 1\n", "2 -0.815392 0.173282 1\n", "3 -0.393731 0.692883 1\n", "4 0.442208 -0.896723 0\n", "5 -0.479646 0.676435 1\n", "6 -0.013648 0.803349 1\n", "7 0.771513 0.147760 1\n", "8 -0.169322 -0.793456 1\n", "9 -0.121486 1.021509 0" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make DataFrame of circle data\n", "import pandas as pd\n", "circles = pd.DataFrame({\"X1\": X[:, 0],\n", "    \"X2\": X[:, 1],\n", "    \"label\": y\n", "})\n", "circles.head(10)" ] }, { "cell_type": "markdown", "metadata": { "id": "FK2T7GpYW2BE" }, "source": [ "It looks like each pair of `X` features (`X1` and `X2`) has a label (`y`) value of either 0 or 1.\n", "\n", "This tells us that our problem is **binary classification** since there are only two options (0 or 1).\n", "\n", "How many values of each class are there?" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cV8sVR9tHC74", "outputId": "dee5739f-1695-4e71-bb0b-de270f08b621" }, "outputs": [ { "data": { "text/plain": [ "1 500\n", "0 500\n", "Name: label, dtype: int64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check different labels\n", "circles.label.value_counts()" ] }, { "cell_type": "markdown", "metadata": { "id": "ytQ5rm9eXa65" }, "source": [ "500 each, nice and balanced.\n", "\n", "Let's plot them." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 265 }, "id": "ANkSmESCHC75", "outputId": "89b2f9ac-728c-481d-f4ba-6192a8334758" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Visualize with a plot\n", "import matplotlib.pyplot as plt\n", "plt.scatter(x=X[:, 0], \n", "            y=X[:, 1], \n", "            c=y, \n", "            cmap=plt.cm.RdYlBu);" ] }, { "cell_type": "markdown", "metadata": { "id": "h0e1oR27HC76" }, "source": [ "Alrighty, looks like we've got a problem to solve.\n", "\n", "Let's find out how we could build a PyTorch neural network to classify dots into red (0) or blue (1).\n", "\n", "> **Note:** This dataset is what's often considered a **toy problem** (a problem that's used to try and test things out on) in machine learning. \n", "> \n", "> But it captures the key idea of classification: you have some kind of data represented as numerical values and you'd like to build a model that's able to classify it, in our case, separate it into red or blue dots." ] }, { "cell_type": "markdown", "metadata": { "id": "Ny6_J7F4HC76" }, "source": [ "### 1.1 Input and output shapes\n", "\n", "One of the most common errors in deep learning is shape errors.\n", "\n", "Mismatching the shapes of tensors and tensor operations will result in errors in your models.\n", "\n", "We're going to see plenty of these throughout the course.\n", "\n", "And there's no surefire way to make sure they won't happen. They will.\n", "\n", "What you can do instead is continually familiarize yourself with the shape of the data you're working with.\n", "\n", "I like referring to it as input and output shapes.\n", "\n", "Ask yourself:\n", "\n", "\"What shapes are my inputs and what shapes are my outputs?\"\n", "\n", "Let's find out."
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "r0h--vKdHC77", "outputId": "16e65bb5-83e1-49c7-af5e-90ecede4eeae" }, "outputs": [ { "data": { "text/plain": [ "((1000, 2), (1000,))" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the shapes of our features and labels\n", "X.shape, y.shape" ] }, { "cell_type": "markdown", "metadata": { "id": "PW7zolShbfDg" }, "source": [ "Looks like we've got a match on the first dimension of each.\n", "\n", "There are 1000 `X` samples and 1000 `y` labels. \n", "\n", "But what's the second dimension on `X`?\n", "\n", "It often helps to view the values and shapes of a single sample (features and labels).\n", "\n", "Doing so will help you understand what input and output shapes you'd be expecting from your model." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "k-L5zobVHC78", "outputId": "b3a96ca8-45f1-47d1-a98b-c2a7be79c0d5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Values for one sample of X: [0.75424625 0.23148074] and the same for y: 1\n", "Shapes for one sample of X: (2,) and the same for y: ()\n" ] } ], "source": [ "# View the first example of features and labels\n", "X_sample = X[0]\n", "y_sample = y[0]\n", "print(f\"Values for one sample of X: {X_sample} and the same for y: {y_sample}\")\n", "print(f\"Shapes for one sample of X: {X_sample.shape} and the same for y: {y_sample.shape}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "Vn692XaRaifl" }, "source": [ "This tells us the second dimension of `X` means each sample has two features (a vector), whereas each `y` label is a single value (a scalar).\n", "\n", "We have two inputs for one output."
] }, { "cell_type": "markdown", "metadata": { "id": "eLyDPN6ZR_ho" }, "source": [ "### 1.2 Turn data into tensors and create train and test splits\n", "\n", "We've investigated the input and output shapes of our data, now let's prepare it for being used with PyTorch and for modelling.\n", "\n", "Specifically, we'll need to:\n", "1. Turn our data into tensors (right now our data is in NumPy arrays and PyTorch prefers to work with PyTorch tensors).\n", "2. Split our data into training and test sets (we'll train a model on the training set to learn the patterns between `X` and `y` and then evaluate those learned patterns on the test dataset)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Z2gcR31aHC78", "outputId": "cd4e10c1-c358-4b74-81e0-bab7610197cc" }, "outputs": [ { "data": { "text/plain": [ "(tensor([[ 0.7542, 0.2315],\n", " [-0.7562, 0.1533],\n", " [-0.8154, 0.1733],\n", " [-0.3937, 0.6929],\n", " [ 0.4422, -0.8967]]),\n", " tensor([1., 1., 1., 1., 0.]))" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Turn data into tensors\n", "# Otherwise this causes issues with computations later on\n", "import torch\n", "X = torch.from_numpy(X).type(torch.float)\n", "y = torch.from_numpy(y).type(torch.float)\n", "\n", "# View the first five samples\n", "X[:5], y[:5]" ] }, { "cell_type": "markdown", "metadata": { "id": "r9XNJv8lfmRG" }, "source": [ "Now our data is in tensor format, let's split it into training and test sets.\n", "\n", "To do so, let's use the helpful function [`train_test_split()`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) from Scikit-Learn.\n", "\n", "We'll use `test_size=0.2` (80% training, 20% testing) and because the split happens randomly across the data, let's use `random_state=42` so the split is reproducible." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "DW6PU82BHC79", "outputId": "a2d3f3df-701e-44ef-a506-fd31d5443e90" }, "outputs": [ { "data": { "text/plain": [ "(800, 200, 800, 200)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Split data into train and test sets\n", "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, \n", " y, \n", " test_size=0.2, # 20% test, 80% train\n", " random_state=42) # make the random split reproducible\n", "\n", "len(X_train), len(X_test), len(y_train), len(y_test)" ] }, { "cell_type": "markdown", "metadata": { "id": "t-wYDWy9gSRc" }, "source": [ "Nice! Looks like we've now got 800 training samples and 200 testing samples." ] }, { "cell_type": "markdown", "metadata": { "id": "iCyQ93VTHC79" }, "source": [ "## 2. Building a model\n", "\n", "We've got some data ready, now it's time to build a model.\n", "\n", "We'll break it down into a few parts.\n", "\n", "1. Setting up device agnostic code (so our model can run on CPU or GPU if it's available).\n", "2. Constructing a model by subclassing `nn.Module`.\n", "3. Defining a loss function and optimizer.\n", "4. Creating a training loop (this'll be in the next section).\n", "\n", "The good news is we've been through all of the above steps before in notebook 01.\n", "\n", "Except now we'll be adjusting them so they work with a classification dataset.\n", "\n", "Let's start by importing PyTorch and `torch.nn` as well as setting up device agnostic code." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "id": "2itMTezPRnVR", "outputId": "507bf4a3-b5ae-4943-aa21-4eb181b0d741" }, "outputs": [ { "data": { "text/plain": [ "'cuda'" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Standard PyTorch imports\n", "import torch\n", "from torch import nn\n", "\n", "# Make device agnostic code\n", "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "device" ] }, { "cell_type": "markdown", "metadata": { "id": "p1Fil1v04nXr" }, "source": [ "Excellent, now `device` is set up, we can use it for any data or models we create and PyTorch will handle it on the CPU (default) or GPU if it's available.\n", "\n", "How about we create a model?\n", "\n", "We'll want a model capable of handling our `X` data as inputs and producing something in the shape of our `y` data as outputs.\n", "\n", "In other words, given `X` (features) we want our model to predict `y` (label).\n", "\n", "This setup where you have features and labels is referred to as **supervised learning**, because your data tells your model what the outputs should be for a given input.\n", "\n", "To create such a model it'll need to handle the input and output shapes of `X` and `y`.\n", "\n", "Remember how I said input and output shapes are important? Here we'll see why.\n", "\n", "Let's create a model class that:\n", "1. Subclasses `nn.Module` (almost all PyTorch models are subclasses of `nn.Module`).\n", "2. Creates 2 `nn.Linear` layers in the constructor capable of handling the input and output shapes of `X` and `y`.\n", "3. Defines a `forward()` method containing the forward pass computation of the model.\n", "4. Instantiates the model class and sends it to the target `device`. 
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "C_EsAE3VHC7-", "outputId": "d9c976ea-e23f-4993-c847-a15b0bf7b0b5" }, "outputs": [ { "data": { "text/plain": [ "CircleModelV0(\n", " (layer_1): Linear(in_features=2, out_features=5, bias=True)\n", " (layer_2): Linear(in_features=5, out_features=1, bias=True)\n", ")" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# 1. Construct a model class that subclasses nn.Module\n", "class CircleModelV0(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " # 2. Create 2 nn.Linear layers capable of handling X and y input and output shapes\n", " self.layer_1 = nn.Linear(in_features=2, out_features=5) # takes in 2 features (X), produces 5 features\n", " self.layer_2 = nn.Linear(in_features=5, out_features=1) # takes in 5 features, produces 1 feature (y)\n", " \n", " # 3. Define a forward method containing the forward pass computation\n", " def forward(self, x):\n", " # Return the output of layer_2, a single feature, the same shape as y\n", " return self.layer_2(self.layer_1(x)) # computation goes through layer_1 first then the output of layer_1 goes through layer_2\n", "\n", "# 4. 
Create an instance of the model and send it to target device\n", "model_0 = CircleModelV0().to(device)\n", "model_0" ] }, { "cell_type": "markdown", "metadata": { "id": "TtdkMniZ7KyK" }, "source": [ "What's going on here?\n", "\n", "We've seen a few of these steps before.\n", "\n", "The only major change is what's happening between `self.layer_1` and `self.layer_2`.\n", "\n", "`self.layer_1` takes 2 input features `in_features=2` and produces 5 output features `out_features=5`.\n", "\n", "This is known as having 5 **hidden units** or **neurons**.\n", "\n", "This layer turns the input data from having 2 features to 5 features.\n", "\n", "Why do this?\n", "\n", "This allows the model to learn patterns from 5 numbers rather than just 2 numbers, *potentially* leading to better outputs.\n", "\n", "I say potentially because sometimes it doesn't work.\n", "\n", "The number of hidden units you can use in neural network layers is a **hyperparameter** (a value you can set yourself) and there's no set in stone value you have to use.\n", "\n", "Generally more is better but there's also such a thing as too much. The amount you choose will depend on your model type and dataset you're working with. 
\n", "\n", "Since our dataset is small and simple, we'll keep it small.\n", "\n", "The only rule with hidden units is that the next layer, in our case, `self.layer_2`, has to take the same `in_features` as the previous layer's `out_features`.\n", "\n", "That's why `self.layer_2` has `in_features=5`, it takes the `out_features=5` from `self.layer_1` and performs a linear computation on them, turning them into `out_features=1` (the same shape as `y`).\n", "\n", "![A visual example of what a classification neural network with linear activation looks like on the tensorflow playground](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/02-tensorflow-playground-linear-activation.png)\n", "*A visual example of what a similar classification neural network to the one we've just built looks like. Try creating one of your own on the [TensorFlow Playground website](https://playground.tensorflow.org/).*\n", "\n", "You can also do the same as above using [`nn.Sequential`](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html).\n", "\n", "`nn.Sequential` performs a forward pass computation of the input data through the layers in the order they appear."
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "H7I8GyLjHC7_", "outputId": "def65504-863a-4361-816e-909dbfa4d624" }, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", " (0): Linear(in_features=2, out_features=5, bias=True)\n", " (1): Linear(in_features=5, out_features=1, bias=True)\n", ")" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Replicate CircleModelV0 with nn.Sequential\n", "model_0 = nn.Sequential(\n", "    nn.Linear(in_features=2, out_features=5),\n", "    nn.Linear(in_features=5, out_features=1)\n", ").to(device)\n", "\n", "model_0" ] }, { "cell_type": "markdown", "metadata": { "id": "MiRHq9mxFIf-" }, "source": [ "Woah, that looks much simpler than subclassing `nn.Module`, why not just always use `nn.Sequential`?\n", "\n", "`nn.Sequential` is fantastic for straightforward computations, however, as the name says, it *always* runs in sequential order.\n", "\n", "So if you'd like something else to happen (rather than just straightforward sequential computation) you'll want to define your own custom `nn.Module` subclass.\n", "\n", "Now we've got a model, let's see what happens when we pass some data through it."
] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Tt339-_CC8sz", "outputId": "a4014167-4181-434e-ab75-905094229b3a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Length of predictions: 200, Shape: torch.Size([200, 1])\n", "Length of test samples: 200, Shape: torch.Size([200])\n", "\n", "First 10 predictions:\n", "tensor([[-0.4279],\n", " [-0.3417],\n", " [-0.5975],\n", " [-0.3801],\n", " [-0.5078],\n", " [-0.4559],\n", " [-0.2842],\n", " [-0.3107],\n", " [-0.6010],\n", " [-0.3350]], device='cuda:0', grad_fn=)\n", "\n", "First 10 test labels:\n", "tensor([1., 0., 1., 0., 1., 1., 0., 0., 1., 0.])\n" ] } ], "source": [ "# Make predictions with the model\n", "untrained_preds = model_0(X_test.to(device))\n", "print(f\"Length of predictions: {len(untrained_preds)}, Shape: {untrained_preds.shape}\")\n", "print(f\"Length of test samples: {len(y_test)}, Shape: {y_test.shape}\")\n", "print(f\"\\nFirst 10 predictions:\\n{untrained_preds[:10]}\")\n", "print(f\"\\nFirst 10 test labels:\\n{y_test[:10]}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "q7v8TVnqGMZh" }, "source": [ "Hmm, it seems there's the same number of predictions as there are test labels but the predictions don't look like they're in the same form or shape as the test labels.\n", "\n", "We've got a couple of steps we can do to fix this, we'll see these later on." ] }, { "cell_type": "markdown", "metadata": { "id": "8aoQn39pHC7_" }, "source": [ "### 2.1 Setup loss function and optimizer\n", "\n", "We've set up a loss (also called a criterion or cost function) and optimizer before in [notebook 01](https://www.learnpytorch.io/01_pytorch_workflow/#creating-a-loss-function-and-optimizer-in-pytorch).\n", "\n", "But different problem types require different loss functions. 
\n", "\n", "For example, for a regression problem (predicting a number) you might use mean absolute error (MAE) loss.\n", "\n", "And for a binary classification problem (like ours), you'll often use [binary cross entropy](https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a) as the loss function.\n", "\n", "However, the same optimizer function can often be used across different problem spaces.\n", "\n", "For example, the stochastic gradient descent optimizer (SGD, `torch.optim.SGD()`) can be used for a range of problems, and the same applies to the Adam optimizer (`torch.optim.Adam()`). \n", "\n", "| Loss function/Optimizer | Problem type | PyTorch Code |\n", "| ----- | ----- | ----- |\n", "| Stochastic Gradient Descent (SGD) optimizer | Classification, regression, many others. | [`torch.optim.SGD()`](https://pytorch.org/docs/stable/generated/torch.optim.SGD.html) |\n", "| Adam Optimizer | Classification, regression, many others. | [`torch.optim.Adam()`](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html) |\n", "| Binary cross entropy loss | Binary classification | [`torch.nn.BCEWithLogitsLoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) or [`torch.nn.BCELoss`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) |\n", "| Cross entropy loss | Multi-class classification | [`torch.nn.CrossEntropyLoss`](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html) |\n", "| Mean absolute error (MAE) or L1 Loss | Regression | [`torch.nn.L1Loss`](https://pytorch.org/docs/stable/generated/torch.nn.L1Loss.html) | \n", "| Mean squared error (MSE) or L2 Loss | Regression | [`torch.nn.MSELoss`](https://pytorch.org/docs/stable/generated/torch.nn.MSELoss.html#torch.nn.MSELoss) | \n", "\n", "*Table of various loss functions and optimizers, there are more but these are some common ones you'll see.*\n", "\n", "Since we're working with a binary classification 
problem, let's use a binary cross entropy loss function.\n", "\n", "> **Note:** Recall a **loss function** is what measures how *wrong* your model predictions are: the higher the loss, the worse your model.\n", ">\n", "> Also, PyTorch documentation often refers to loss functions as \"loss criterion\" or \"criterion\", these are all different ways of describing the same thing.\n", "\n", "PyTorch has two binary cross entropy implementations:\n", "1. [`torch.nn.BCELoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html) - Creates a loss function that measures the binary cross entropy between the target (label) and input (the model's predicted probabilities).\n", "2. [`torch.nn.BCEWithLogitsLoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) - This is the same as above except it has a sigmoid layer ([`nn.Sigmoid`](https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html)) built-in (we'll see what this means soon).\n", "\n", "Which one should you use? \n", "\n", "The [documentation for `torch.nn.BCEWithLogitsLoss()`](https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html) states that it's more numerically stable than using `torch.nn.BCELoss()` after a `nn.Sigmoid` layer. \n", "\n", "So generally, implementation 2 is a better option. However, for advanced usage, you may want to keep `nn.Sigmoid` and `torch.nn.BCELoss()` as separate steps, but that is beyond the scope of this notebook.\n", "\n", "Knowing this, let's create a loss function and an optimizer. \n", "\n", "For the optimizer we'll use `torch.optim.SGD()` to optimize the model parameters with learning rate 0.1.\n", "\n", "> **Note:** There's a [discussion on the PyTorch forums about the use of `nn.BCELoss` vs. `nn.BCEWithLogitsLoss`](https://discuss.pytorch.org/t/bceloss-vs-bcewithlogitsloss/33586/4). It can be confusing at first but as with many things, it becomes easier with practice."
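, "\n", "\n", "To get a feel for why folding the sigmoid into the loss tends to be more numerically stable, here's a rough plain-Python sketch (no PyTorch required). The helper names `sigmoid`, `bce_naive` and `bce_stable` are made up for this illustration, and the rearranged formula is a common textbook identity for sigmoid followed by binary cross entropy, so treat it as an assumption about the general idea rather than a description of PyTorch's internals.\n", "\n", "
```python
import math

def sigmoid(x):
    # Squash a raw logit into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def bce_naive(logit, target):
    # Sigmoid first, then binary cross entropy on the resulting probability
    p = sigmoid(logit)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def bce_stable(logit, target):
    # Same quantity, rearranged so log(0) is never evaluated
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

# For a moderate logit both versions should agree
print(bce_naive(0.3, 1.0), bce_stable(0.3, 1.0))

# For a large logit the sigmoid rounds to exactly 1.0, so the naive version hits log(0)
try:
    bce_naive(50.0, 0.0)
except ValueError as err:
    print('naive version failed:', err)

# The rearranged version still returns a finite loss
print(bce_stable(50.0, 0.0))
```
", "\n", "If the sketch holds, the two versions agree for moderate logits, while for a large logit only the rearranged version returns a finite loss. That's the kind of edge case a built-in sigmoid is meant to smooth over.\n"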
] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "DjpQsdOZHC7_" }, "outputs": [], "source": [ "# Create a loss function\n", "# loss_fn = nn.BCELoss() # BCELoss = no sigmoid built-in\n", "loss_fn = nn.BCEWithLogitsLoss() # BCEWithLogitsLoss = sigmoid built-in\n", "\n", "# Create an optimizer\n", "optimizer = torch.optim.SGD(params=model_0.parameters(), \n", "                            lr=0.1)" ] }, { "cell_type": "markdown", "metadata": { "id": "RKmi3fp9wYnV" }, "source": [ "Now let's also create an **evaluation metric**.\n", "\n", "An evaluation metric can be used to offer another perspective on how your model is going.\n", "\n", "If a loss function measures how *wrong* your model is, I like to think of evaluation metrics as measuring how *right* it is.\n", "\n", "Of course, you could argue both of these are doing the same thing but evaluation metrics offer a different perspective.\n", "\n", "After all, when evaluating your models it's good to look at things from multiple points of view.\n", "\n", "There are several evaluation metrics that can be used for classification problems but let's start out with **accuracy**.\n", "\n", "Accuracy can be measured by dividing the total number of correct predictions by the total number of predictions.\n", "\n", "For example, a model that makes 99 correct predictions out of 100 will have an accuracy of 99%.\n", "\n", "Let's write a function to do so.\n", "\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "RcaZQR0mHC7_" }, "outputs": [], "source": [ "# Calculate accuracy (a classification metric)\n", "def accuracy_fn(y_true, y_pred):\n", "    correct = torch.eq(y_true, y_pred).sum().item() # torch.eq() calculates where two tensors are equal\n", "    acc = (correct / len(y_pred)) * 100 \n", "    return acc" ] }, { "cell_type": "markdown", "metadata": { "id": "OuplloDPxviL" }, "source": [ "Excellent! We can now use this function whilst training our model to measure its performance alongside the loss."
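, "\n", "\n", "To make the arithmetic concrete, here's a small plain-Python sketch of the same calculation on ordinary lists (no PyTorch). The name `accuracy_plain` and the example labels are made up purely for illustration, they aren't part of the course code.\n", "\n", "
```python
def accuracy_plain(y_true, y_pred):
    # Count the positions where the prediction matches the true label
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Divide by the total number of predictions and scale to a percentage
    return (correct / len(y_pred)) * 100

# Made-up labels: 3 of the 4 predictions match the truth
example_true = [1, 0, 1, 1]
example_pred = [1, 0, 0, 1]
print(accuracy_plain(example_true, example_pred))  # 75.0
```
"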
] }, { "cell_type": "markdown", "metadata": { "id": "4UpJKZ8PHC8A" }, "source": [ "## 3. Train model\n", "\n", "Okay, now we've got a loss function and optimizer ready to go, let's train a model.\n", "\n", "Do you remember the steps in a PyTorch training loop?\n", "\n", "If not, here's a reminder.\n", "\n", "Steps in training:\n", "\n", "
\n", "PyTorch training loop steps:\n", "\n", "1. **Forward pass** - The model goes through all of the training data once, performing its `forward()` function calculations (`model(x_train)`).\n", "2. **Calculate the loss** - The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are (`loss = loss_fn(y_pred, y_train)`).\n", "3. **Zero gradients** - The optimizer's gradients are set to zero (they are accumulated by default) so they can be recalculated for the specific training step (`optimizer.zero_grad()`).\n", "4. **Perform backpropagation on the loss** - Computes the gradient of the loss with respect to every model parameter to be updated (each parameter with `requires_grad=True`). This is known as **backpropagation**, hence \"backwards\" (`loss.backward()`).\n", "5. **Step the optimizer (gradient descent)** - Update the parameters with `requires_grad=True` with respect to the loss gradients in order to improve them (`optimizer.step()`).\n", "
\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "NMeDqPpsnst6" }, "source": [ "### 3.1 Going from raw model outputs to predicted labels (logits -> prediction probabilities -> prediction labels)\n", "\n", "Before the training loop steps, let's see what comes out of our model during the forward pass (the forward pass is defined by the `forward()` method).\n", "\n", "To do so, let's pass the model some data." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "e5stsbDl0zKl", "outputId": "829a0842-0a37-455a-b058-3e547200f836" }, "outputs": [ { "data": { "text/plain": [ "tensor([[-0.4279],\n", " [-0.3417],\n", " [-0.5975],\n", " [-0.3801],\n", " [-0.5078]], device='cuda:0', grad_fn=)" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View the first 5 outputs of the forward pass on the test data\n", "y_logits = model_0(X_test.to(device))[:5]\n", "y_logits" ] }, { "cell_type": "markdown", "metadata": { "id": "b1sNl8D8fWXm" }, "source": [ "Since our model hasn't been trained, these outputs are basically random.\n", "\n", "But *what* are they?\n", "\n", "They're the output of our `forward()` method.\n", "\n", "Our `forward()` method passes the data through two `nn.Linear()` layers, each of which internally applies the following equation:\n", "\n", "$$\n", "\\mathbf{y} = x \\cdot \\mathbf{Weights}^T + \\mathbf{bias}\n", "$$\n", "\n", "The *raw outputs* (unmodified) of this equation ($\\mathbf{y}$) and in turn, the raw outputs of our model are often referred to as [**logits**](https://datascience.stackexchange.com/a/31045).\n", "\n", "That's what our model is outputting above when it takes in the input data ($x$ in the equation or `X_test` in the code): logits.\n", "\n", "However, these numbers are hard to interpret.\n", "\n", "We'd like some numbers that are comparable to our truth labels.\n", "\n", "To get our model's raw outputs (logits) into such a form, we can use the [sigmoid 
activation function](https://pytorch.org/docs/stable/generated/torch.sigmoid.html).\n", "\n", "Let's try it out.\n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QGC7UmBi0s6E", "outputId": "3564ee3b-e34f-40a3-a40a-b9c9b948da04" }, "outputs": [ { "data": { "text/plain": [ "tensor([[0.3946],\n", " [0.4154],\n", " [0.3549],\n", " [0.4061],\n", " [0.3757]], device='cuda:0', grad_fn=)" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use sigmoid on model logits\n", "y_pred_probs = torch.sigmoid(y_logits)\n", "y_pred_probs" ] }, { "cell_type": "markdown", "metadata": { "id": "HxmWz_GVjZoB" }, "source": [ "Okay, it seems like the outputs now have some kind of consistency (even though they're still random).\n", "\n", "They're now in the form of **prediction probabilities** (I usually refer to these as `y_pred_probs`), in other words, each value is now how strongly the model thinks the data point belongs to class 1.\n", "\n", "In our case, since we're dealing with binary classification, our ideal outputs are 0 or 1.\n", "\n", "So we can apply a decision boundary of 0.5 to these values.\n", "\n", "The closer to 0, the more the model thinks the sample belongs to class 0, the closer to 1, the more the model thinks the sample belongs to class 1.\n", "\n", "More specifically:\n", "* If `y_pred_probs` >= 0.5, `y=1` (class 1)\n", "* If `y_pred_probs` < 0.5, `y=0` (class 0)\n", "\n", "To turn our prediction probabilities into prediction labels, we can round the outputs of the sigmoid activation function."
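, "\n", "\n", "As a worked check of both steps, using the first logit printed above and the standard definition of the sigmoid function:\n", "\n", "$$\n", "\\sigma(z) = \\frac{1}{1 + e^{-z}}, \\qquad \\sigma(-0.4279) = \\frac{1}{1 + e^{0.4279}} \\approx \\frac{1}{2.534} \\approx 0.3946\n", "$$\n", "\n", "Since $0.3946 < 0.5$, this sample would be rounded to a prediction label of 0."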
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "naZxlTTwlMwX", "outputId": "6134e47d-bbb2-46c3-e2ec-b4f42d517e39" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([True, True, True, True, True], device='cuda:0')\n" ] }, { "data": { "text/plain": [ "tensor([0., 0., 0., 0., 0.], device='cuda:0', grad_fn=)" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Find the predicted labels (round the prediction probabilities)\n", "y_preds = torch.round(y_pred_probs)\n", "\n", "# In full\n", "y_pred_labels = torch.round(torch.sigmoid(model_0(X_test.to(device))[:5]))\n", "\n", "# Check for equality\n", "print(torch.eq(y_preds.squeeze(), y_pred_labels.squeeze()))\n", "\n", "# Get rid of extra dimension\n", "y_preds.squeeze()" ] }, { "cell_type": "markdown", "metadata": { "id": "5cMsgFWWmPLU" }, "source": [ "Excellent! Now it looks like our model's predictions are in the same form as our truth labels (`y_test`)." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "MaQ0CN4ZmU1W", "outputId": "6b1cd7b4-f1d0-49b5-a8c3-338cf75c6cb0" }, "outputs": [ { "data": { "text/plain": [ "tensor([1., 0., 1., 0., 1.])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "y_test[:5]" ] }, { "cell_type": "markdown", "metadata": { "id": "NXqUulG3maPH" }, "source": [ "This means we'll be able to compare our model's predictions to the test labels to see how well it's going. \n", "\n", "To recap, we converted our model's raw outputs (logits) to prediction probabilities using a sigmoid activation function.\n", "\n", "And then converted the prediction probabilities to prediction labels by rounding them.\n", "\n", "> **Note:** The use of the sigmoid activation function is often only for binary classification logits. 
For multi-class classification, we'll be looking at using the [softmax activation function](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html) (this will come later on).\n", ">\n", "> The sigmoid activation function is also not required when passing our model's raw outputs to `nn.BCEWithLogitsLoss` (the \"logits\" in the name is because it works on the model's raw logits output), since it has a sigmoid function built-in." ] }, { "cell_type": "markdown", "metadata": { "id": "Va7gg8yxn6Sg" }, "source": [ "### 3.2 Building a training and testing loop\n", "\n", "Alright, we've discussed how to take our raw model outputs and convert them to prediction labels, now let's build a training loop.\n", "\n", "Let's start by training for 100 epochs and outputting the model's progress every 10 epochs. " ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "DFABVGo2HC8A", "outputId": "e0341074-b603-41d5-a389-c401b4934d73" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0 | Loss: 0.72090, Accuracy: 50.00% | Test loss: 0.72196, Test acc: 50.00%\n", "Epoch: 10 | Loss: 0.70291, Accuracy: 50.00% | Test loss: 0.70542, Test acc: 50.00%\n", "Epoch: 20 | Loss: 0.69659, Accuracy: 50.00% | Test loss: 0.69942, Test acc: 50.00%\n", "Epoch: 30 | Loss: 0.69432, Accuracy: 43.25% | Test loss: 0.69714, Test acc: 41.00%\n", "Epoch: 40 | Loss: 0.69349, Accuracy: 47.00% | Test loss: 0.69623, Test acc: 46.50%\n", "Epoch: 50 | Loss: 0.69319, Accuracy: 49.00% | Test loss: 0.69583, Test acc: 46.00%\n", "Epoch: 60 | Loss: 0.69308, Accuracy: 50.12% | Test loss: 0.69563, Test acc: 46.50%\n", "Epoch: 70 | Loss: 0.69303, Accuracy: 50.38% | Test loss: 0.69551, Test acc: 46.00%\n", "Epoch: 80 | Loss: 0.69302, Accuracy: 51.00% | Test loss: 0.69543, Test acc: 46.00%\n", "Epoch: 90 | Loss: 0.69301, Accuracy: 51.00% | Test loss: 0.69537, Test acc: 46.00%\n" ] } ], "source": [ "torch.manual_seed(42)\n", "\n", "# Set the number of epochs\n", "epochs = 100\n", "\n", "# Put data to target device\n", "X_train, y_train = X_train.to(device), y_train.to(device)\n", "X_test, y_test = X_test.to(device), y_test.to(device)\n", "\n", "# Build training and evaluation loop\n", "for epoch in range(epochs):\n", "    ### Training\n", "    model_0.train()\n", "\n", "    # 1. Forward pass (model outputs raw logits)\n", "    y_logits = model_0(X_train).squeeze() # squeeze to remove extra `1` dimensions, this won't work unless model and data are on same device \n", "    y_pred = torch.round(torch.sigmoid(y_logits)) # turn logits -> pred probs -> pred labels\n", "\n", "    # 2. Calculate loss/accuracy\n", "    # loss = loss_fn(torch.sigmoid(y_logits), # Using nn.BCELoss you need torch.sigmoid()\n", "    #                y_train) \n", "    loss = loss_fn(y_logits, # Using nn.BCEWithLogitsLoss works with raw logits\n", "                   y_train) \n", "    acc = accuracy_fn(y_true=y_train, \n", "                      y_pred=y_pred) \n", "\n", "    # 3. Optimizer zero grad\n", "    optimizer.zero_grad()\n", "\n", "    # 4. Loss backwards\n", "    loss.backward()\n", "\n", "    # 5. Optimizer step\n", "    optimizer.step()\n", "\n", "    ### Testing\n", "    model_0.eval()\n", "    with torch.inference_mode():\n", "        # 1. Forward pass\n", "        test_logits = model_0(X_test).squeeze() \n", "        test_pred = torch.round(torch.sigmoid(test_logits))\n", "        # 2. Calculate loss/accuracy\n", "        test_loss = loss_fn(test_logits,\n", "                            y_test)\n", "        test_acc = accuracy_fn(y_true=y_test,\n", "                               y_pred=test_pred)\n", "\n", "    # Print out what's happening every 10 epochs\n", "    if epoch % 10 == 0:\n", "        print(f\"Epoch: {epoch} | Loss: {loss:.5f}, Accuracy: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%\")" ] }, { "cell_type": "markdown", "metadata": { "id": "It1Xy_f5perA" }, "source": [ "Hmm, what do you notice about the performance of our model?\n", "\n", "It looks like it went through the training and testing steps fine but the results don't seem to have moved too much.\n", "\n", "The accuracy barely moves above 50% on each data split.\n", "\n", "And because we're working with a balanced binary classification problem, it means our model is performing about as well as random guessing (with 500 samples each of class 0 and class 1, a model predicting class 1 every single time would achieve 50% accuracy)." ] }, { "cell_type": "markdown", "metadata": { "id": "WCeyddo-HC8A" }, "source": [ "## 4. 
Make predictions and evaluate the model\n", "\n", "From the metrics it looks like our model is random guessing.\n", "\n", "How could we investigate this further?\n", "\n", "I've got an idea.\n", "\n", "The data explorer's motto!\n", "\n", "\"Visualize, visualize, visualize!\"\n", "\n", "Let's make a plot of our model's predictions, the data it's trying to predict on and the decision boundary it's creating for whether something is class 0 or class 1.\n", "\n", "To do so, we'll write some code to download and import the [`helper_functions.py` script](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/helper_functions.py) from the [Learn PyTorch for Deep Learning repo](https://github.com/mrdbourke/pytorch-deep-learning).\n", "\n", "It contains a helpful function called `plot_decision_boundary()` which creates a NumPy meshgrid to visually plot the different points where our model is predicting certain classes.\n", "\n", "We'll also import `plot_predictions()` which we wrote in notebook 01 to use later." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "QakJVGI8t2gB", "outputId": "4f63a8a3-5eae-49fb-e17e-df6c5721ab5a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "helper_functions.py already exists, skipping download\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/daniel/.local/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /home/daniel/.local/lib/python3.8/site-packages/torchvision/image.so: undefined symbol: _ZN3c106detail19maybe_wrap_dim_slowIlEET_S2_S2_b\n", " warn(f\"Failed to load image Python extension: {e}\")\n" ] } ], "source": [ "import requests\n", "from pathlib import Path \n", "\n", "# Download helper functions from Learn PyTorch repo (if not already downloaded)\n", "if Path(\"helper_functions.py\").is_file():\n", "    print(\"helper_functions.py already exists, skipping download\")\n", "else:\n", "    print(\"Downloading helper_functions.py\")\n", "    request = requests.get(\"https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py\")\n", "    with open(\"helper_functions.py\", \"wb\") as f:\n", "        f.write(request.content)\n", "\n", "from helper_functions import plot_predictions, plot_decision_boundary" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 390 }, "id": "bEbDUTKjHC8B", "outputId": "aa344fe8-0207-43df-f0e9-61b9302459a5" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot decision boundaries for training and test sets\n", "plt.figure(figsize=(12, 6))\n", "plt.subplot(1, 2, 1)\n", "plt.title(\"Train\")\n", "plot_decision_boundary(model_0, X_train, y_train)\n", "plt.subplot(1, 2, 2)\n", "plt.title(\"Test\")\n", "plot_decision_boundary(model_0, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": { "id": "aeHx9MbTvrJH" }, "source": [ "Oh wow, it seems like we've found the cause of our model's performance issue.\n", "\n", "It's currently trying to split the red and blue dots using a straight line...\n", "\n", "That explains the 50% accuracy. Since our data is circular, drawing a straight line can at best cut it down the middle.\n", "\n", "In machine learning terms, our model is **underfitting**, meaning it's not learning predictive patterns from the data.\n", "\n", "How could we improve this?" ] }, { "cell_type": "markdown", "metadata": { "id": "6VivsLTmHC8B" }, "source": [ "## 5. Improving a model (from a model perspective) \n", "\n", "Let's try to fix our model's underfitting problem.\n", "\n", "Focusing specifically on the model (not the data), there are a few ways we could do this.\n", "\n", "| Model improvement technique* | What does it do? |\n", "| ----- | ----- |\n", "| **Add more layers** | Each layer *potentially* increases the learning capabilities of the model with each layer being able to learn some kind of new pattern in the data, more layers is often referred to as making your neural network *deeper*. |\n", "| **Add more hidden units** | Similar to the above, more hidden units per layer means a *potential* increase in learning capabilities of the model, more hidden units is often referred to as making your neural network *wider*. |\n", "| **Fitting for longer (more epochs)** | Your model might learn more if it had more opportunities to look at the data. 
|\n", "| **Changing the activation functions** | Some data just can't be fit with only straight lines (like what we've seen), using non-linear activation functions can help with this (hint, hint). |\n", "| **Change the learning rate** | Less model specific, but still related, the learning rate of the optimizer decides how much a model should change its parameters each step, too much and the model overcorrects, too little and it doesn't learn enough. |\n", "| **Change the loss function** | Again, less model specific but still important, different problems require different loss functions. For example, a binary cross entropy loss function won't work with a multi-class classification problem. |\n", "| **Use transfer learning** | Take a pretrained model from a problem domain similar to yours and adjust it to your own problem. We cover transfer learning in [notebook 06](https://www.learnpytorch.io/06_pytorch_transfer_learning/). |\n", "\n", "> **Note:** *because you can adjust all of these by hand, they're referred to as **hyperparameters**. \n", ">\n", "> And this is also where machine learning's half art half science comes in, there's no real way to know here what the best combination of values is for your project, best to follow the data scientist's motto of \"experiment, experiment, experiment\".\n", "\n", "Let's see what happens if we add an extra layer to our model, fit for longer (`epochs=1000` instead of `epochs=100`) and increase the number of hidden units from `5` to `10`.\n", "\n", "We'll follow the same steps we did above but with a few changed hyperparameters." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "j-GRUN9QHC8B", "outputId": "9174e74b-862a-4bca-9f17-c7176d22f81e" }, "outputs": [ { "data": { "text/plain": [ "CircleModelV1(\n", " (layer_1): Linear(in_features=2, out_features=10, bias=True)\n", " (layer_2): Linear(in_features=10, out_features=10, bias=True)\n", " (layer_3): Linear(in_features=10, out_features=1, bias=True)\n", ")" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "class CircleModelV1(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.layer_1 = nn.Linear(in_features=2, out_features=10)\n", " self.layer_2 = nn.Linear(in_features=10, out_features=10) # extra layer\n", " self.layer_3 = nn.Linear(in_features=10, out_features=1)\n", " \n", " def forward(self, x): # note: always make sure forward is spelt correctly!\n", " # Creating a model like this is the same as below, though below\n", " # generally benefits from speedups where possible.\n", " # z = self.layer_1(x)\n", " # z = self.layer_2(z)\n", " # z = self.layer_3(z)\n", " # return z\n", " return self.layer_3(self.layer_2(self.layer_1(x)))\n", "\n", "model_1 = CircleModelV1().to(device)\n", "model_1" ] }, { "cell_type": "markdown", "metadata": { "id": "ACkcim2k2G5R" }, "source": [ "Now we've got a model, we'll recreate a loss function and optimizer instance, using the same settings as before." 
] }, { "cell_type": "code", "execution_count": 24, "metadata": { "id": "AXYwYPpSHC8B" }, "outputs": [], "source": [ "# loss_fn = nn.BCELoss() # Requires sigmoid on input\n", "loss_fn = nn.BCEWithLogitsLoss() # Does not require sigmoid on input\n", "optimizer = torch.optim.SGD(model_1.parameters(), lr=0.1)" ] }, { "cell_type": "markdown", "metadata": { "id": "drHt2W7x1JEW" }, "source": [ "Beautiful, model, optimizer and loss function ready, let's make a training loop.\n", "\n", "This time we'll train for longer (`epochs=1000` vs `epochs=100`) and see if it improves our model. " ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "aX0QGBozHC8C", "outputId": "b00e48e9-1075-4f6e-c0c3-e511451d3fe9" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0 | Loss: 0.69396, Accuracy: 50.88% | Test loss: 0.69261, Test acc: 51.00%\n", "Epoch: 100 | Loss: 0.69305, Accuracy: 50.38% | Test loss: 0.69379, Test acc: 48.00%\n", "Epoch: 200 | Loss: 0.69299, Accuracy: 51.12% | Test loss: 0.69437, Test acc: 46.00%\n", "Epoch: 300 | Loss: 0.69298, Accuracy: 51.62% | Test loss: 0.69458, Test acc: 45.00%\n", "Epoch: 400 | Loss: 0.69298, Accuracy: 51.12% | Test loss: 0.69465, Test acc: 46.00%\n", "Epoch: 500 | Loss: 0.69298, Accuracy: 51.00% | Test loss: 0.69467, Test acc: 46.00%\n", "Epoch: 600 | Loss: 0.69298, Accuracy: 51.00% | Test loss: 0.69468, Test acc: 46.00%\n", "Epoch: 700 | Loss: 0.69298, Accuracy: 51.00% | Test loss: 0.69468, Test acc: 46.00%\n", "Epoch: 800 | Loss: 0.69298, Accuracy: 51.00% | Test loss: 0.69468, Test acc: 46.00%\n", "Epoch: 900 | Loss: 0.69298, Accuracy: 51.00% | Test loss: 0.69468, Test acc: 46.00%\n" ] } ], "source": [ "torch.manual_seed(42)\n", "\n", "epochs = 1000 # Train for longer\n", "\n", "# Put data to target device\n", "X_train, y_train = X_train.to(device), y_train.to(device)\n", "X_test, y_test = X_test.to(device), y_test.to(device)\n", "\n", "for epoch 
in range(epochs):\n", "    ### Training\n", "    model_1.train() # switch back to training mode (model_1.eval() is called in the testing step below)\n", "\n", "    # 1. Forward pass\n", "    y_logits = model_1(X_train).squeeze()\n", "    y_pred = torch.round(torch.sigmoid(y_logits)) # logits -> prediction probabilities -> prediction labels\n", "\n", "    # 2. Calculate loss/accuracy\n", "    loss = loss_fn(y_logits, y_train)\n", "    acc = accuracy_fn(y_true=y_train, \n", "                      y_pred=y_pred)\n", "\n", "    # 3. Optimizer zero grad\n", "    optimizer.zero_grad()\n", "\n", "    # 4. Loss backwards\n", "    loss.backward()\n", "\n", "    # 5. Optimizer step\n", "    optimizer.step()\n", "\n", "    ### Testing\n", "    model_1.eval()\n", "    with torch.inference_mode():\n", "        # 1. Forward pass\n", "        test_logits = model_1(X_test).squeeze() \n", "        test_pred = torch.round(torch.sigmoid(test_logits))\n", "        # 2. Calculate loss/accuracy\n", "        test_loss = loss_fn(test_logits,\n", "                            y_test)\n", "        test_acc = accuracy_fn(y_true=y_test,\n", "                               y_pred=test_pred)\n", "\n", "    # Print out what's happening every 100 epochs\n", "    if epoch % 100 == 0:\n", "        print(f\"Epoch: {epoch} | Loss: {loss:.5f}, Accuracy: {acc:.2f}% | Test loss: {test_loss:.5f}, Test acc: {test_acc:.2f}%\")\n" ] }, { "cell_type": "markdown", "metadata": { "id": "o0ca3sIV1WrZ" }, "source": [ "What? Our model trained for longer and with an extra layer but it still looks like it didn't learn any patterns better than random guessing.\n", "\n", "Let's visualize." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 390 }, "id": "GUjJqRw7HC8C", "outputId": "12d163f9-b602-459e-f34b-54c58abf8d7d" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot decision boundaries for training and test sets\n", "plt.figure(figsize=(12, 6))\n", "plt.subplot(1, 2, 1)\n", "plt.title(\"Train\")\n", "plot_decision_boundary(model_1, X_train, y_train)\n", "plt.subplot(1, 2, 2)\n", "plt.title(\"Test\")\n", "plot_decision_boundary(model_1, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": { "id": "8UqtuZYwHC8C" }, "source": [ "Hmmm.\n", "\n", "Our model is still drawing a straight line between the red and blue dots.\n", "\n", "If our model is drawing a straight line, could it model linear data? Like we did in [notebook 01](https://www.learnpytorch.io/01_pytorch_workflow/)?" ] }, { "cell_type": "markdown", "metadata": { "id": "Nam5esXj2Mj_" }, "source": [ "### 5.1 Preparing data to see if our model can model a straight line\n", "Let's create some linear data to see if our model's able to model it and we're not just using a model that can't learn anything." 
] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "xLoEBQ6fHC8C", "outputId": "176a4674-3142-420c-bace-97cdbfbf473e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100\n" ] }, { "data": { "text/plain": [ "(tensor([[0.0000],\n", " [0.0100],\n", " [0.0200],\n", " [0.0300],\n", " [0.0400]]),\n", " tensor([[0.3000],\n", " [0.3070],\n", " [0.3140],\n", " [0.3210],\n", " [0.3280]]))" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create some data (same as notebook 01)\n", "weight = 0.7\n", "bias = 0.3\n", "start = 0\n", "end = 1\n", "step = 0.01\n", "\n", "# Create data\n", "X_regression = torch.arange(start, end, step).unsqueeze(dim=1)\n", "y_regression = weight * X_regression + bias # linear regression formula\n", "\n", "# Check the data\n", "print(len(X_regression))\n", "X_regression[:5], y_regression[:5]" ] }, { "cell_type": "markdown", "metadata": { "id": "wquTX_wX275-" }, "source": [ "Wonderful, now let's split our data into training and test sets." 
] }, { "cell_type": "code", "execution_count": 28, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "UIoO2k8yHC8D", "outputId": "2d9144d9-18d9-427a-a898-7a5c920cd9aa" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "80 80 20 20\n" ] } ], "source": [ "# Create train and test splits\n", "train_split = int(0.8 * len(X_regression)) # 80% of data used for training set\n", "X_train_regression, y_train_regression = X_regression[:train_split], y_regression[:train_split]\n", "X_test_regression, y_test_regression = X_regression[train_split:], y_regression[train_split:]\n", "\n", "# Check the lengths of each split\n", "print(len(X_train_regression), \n", " len(y_train_regression), \n", " len(X_test_regression), \n", " len(y_test_regression))" ] }, { "cell_type": "markdown", "metadata": { "id": "sQtonoMn3s90" }, "source": [ "Beautiful, let's see how the data looks.\n", "\n", "To do so, we'll use the `plot_predictions()` function we created in notebook 01. \n", "\n", "It's contained within the [`helper_functions.py` script](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/helper_functions.py) on the Learn PyTorch for Deep Learning repo which we downloaded above." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 428 }, "id": "pcg5OvncHC8D", "outputId": "77b50411-c589-4d5f-e7ca-d03b1b9a68cc" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_predictions(train_data=X_train_regression,\n", " train_labels=y_train_regression,\n", " test_data=X_test_regression,\n", " test_labels=y_test_regression\n", ");" ] }, { "cell_type": "markdown", "metadata": { "id": "kZRflODP66kG" }, "source": [ "### 5.2 Adjusting `model_1` to fit a straight line\n", "\n", "Now we've got some data, let's recreate `model_1` but with a loss function suited to our regression data." ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "rc13zblAHC8D", "outputId": "7bd16b1f-bd2e-486b-b963-9733aaa757be" }, "outputs": [ { "data": { "text/plain": [ "Sequential(\n", " (0): Linear(in_features=1, out_features=10, bias=True)\n", " (1): Linear(in_features=10, out_features=10, bias=True)\n", " (2): Linear(in_features=10, out_features=1, bias=True)\n", ")" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Same architecture as model_1 (but using nn.Sequential)\n", "model_2 = nn.Sequential(\n", " nn.Linear(in_features=1, out_features=10),\n", " nn.Linear(in_features=10, out_features=10),\n", " nn.Linear(in_features=10, out_features=1)\n", ").to(device)\n", "\n", "model_2" ] }, { "cell_type": "markdown", "metadata": { "id": "FOtBAv1E7OqX" }, "source": [ "We'll setup the loss function to be `nn.L1Loss()` (the same as mean absolute error) and the optimizer to be `torch.optim.SGD()`. 
" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "id": "f06fraapHC8E" }, "outputs": [], "source": [ "# Loss and optimizer\n", "loss_fn = nn.L1Loss()\n", "optimizer = torch.optim.SGD(model_2.parameters(), lr=0.1)" ] }, { "cell_type": "markdown", "metadata": { "id": "lU7GfFLm7a21" }, "source": [ "Now let's train the model using the regular training loop steps for `epochs=1000` (just like `model_1`).\n", "\n", "> **Note:** We've been writing similar training loop code over and over again. I've made it that way on purpose though, to keep practicing. However, do you have ideas how we could functionize this? That would save a fair bit of coding in the future. Potentially there could be a function for training and a function for testing. " ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YTitEWgSHC8E", "outputId": "16da5efa-3c3b-494b-ef4e-e5244f4cf097" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0 | Train loss: 0.75986, Test loss: 0.54143\n", "Epoch: 100 | Train loss: 0.09309, Test loss: 0.02901\n", "Epoch: 200 | Train loss: 0.07376, Test loss: 0.02850\n", "Epoch: 300 | Train loss: 0.06745, Test loss: 0.00615\n", "Epoch: 400 | Train loss: 0.06107, Test loss: 0.02004\n", "Epoch: 500 | Train loss: 0.05698, Test loss: 0.01061\n", "Epoch: 600 | Train loss: 0.04857, Test loss: 0.01326\n", "Epoch: 700 | Train loss: 0.06109, Test loss: 0.02127\n", "Epoch: 800 | Train loss: 0.05599, Test loss: 0.01426\n", "Epoch: 900 | Train loss: 0.05571, Test loss: 0.00603\n" ] } ], "source": [ "# Train the model\n", "torch.manual_seed(42)\n", "\n", "# Set the number of epochs\n", "epochs = 1000\n", "\n", "# Put data to target device\n", "X_train_regression, y_train_regression = X_train_regression.to(device), y_train_regression.to(device)\n", "X_test_regression, y_test_regression = X_test_regression.to(device), y_test_regression.to(device)\n", "\n", "for epoch in 
range(epochs):\n", "    ### Training \n", "    model_2.train() # switch back to training mode (model_2.eval() is called in the testing step below)\n", "\n", "    # 1. Forward pass\n", "    y_pred = model_2(X_train_regression)\n", "\n", "    # 2. Calculate loss (no accuracy since it's a regression problem, not classification)\n", "    loss = loss_fn(y_pred, y_train_regression)\n", "\n", "    # 3. Optimizer zero grad\n", "    optimizer.zero_grad()\n", "\n", "    # 4. Loss backwards\n", "    loss.backward()\n", "\n", "    # 5. Optimizer step\n", "    optimizer.step()\n", "\n", "    ### Testing\n", "    model_2.eval()\n", "    with torch.inference_mode():\n", "        # 1. Forward pass\n", "        test_pred = model_2(X_test_regression)\n", "        # 2. Calculate the loss \n", "        test_loss = loss_fn(test_pred, y_test_regression)\n", "\n", "    # Print out what's happening\n", "    if epoch % 100 == 0: \n", "        print(f\"Epoch: {epoch} | Train loss: {loss:.5f}, Test loss: {test_loss:.5f}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "IoyLsZW78m_6" }, "source": [ "Okay, unlike `model_1` on the classification data, it looks like `model_2`'s loss is actually going down.\n", "\n", "Let's plot its predictions to see if that's so.\n", "\n", "And remember, our model and data are on the target `device`, which may be a GPU. However, our plotting function uses matplotlib, and matplotlib can't handle data on the GPU.\n", "\n", "To handle that, we'll send all of our data to the CPU using [`.cpu()`](https://pytorch.org/docs/stable/generated/torch.Tensor.cpu.html) when we pass it to `plot_predictions()`." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 428 }, "id": "AvltMCW_HC8E", "outputId": "5887db7d-128b-46f3-c978-c12af112d470" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Turn on evaluation mode\n", "model_2.eval()\n", "\n", "# Make predictions (inference)\n", "with torch.inference_mode():\n", "    y_preds = model_2(X_test_regression)\n", "\n", "# Plot data and predictions with data on the CPU (matplotlib can't handle data on the GPU)\n", "# (try removing .cpu() from one of the below and see what happens)\n", "plot_predictions(train_data=X_train_regression.cpu(),\n", "                 train_labels=y_train_regression.cpu(),\n", "                 test_data=X_test_regression.cpu(),\n", "                 test_labels=y_test_regression.cpu(),\n", "                 predictions=y_preds.cpu());" ] }, { "cell_type": "markdown", "metadata": { "id": "cZFiXR8B9wYx" }, "source": [ "Alright, it looks like our model is able to do far better than random guessing on straight lines.\n", "\n", "This is a good thing.\n", "\n", "It means our model at least has *some* capacity to learn.\n", "\n", "> **Note:** A helpful troubleshooting step when building deep learning models is to start as small as possible to see if the model works before scaling it up. \n", ">\n", "> This could mean starting with a simple neural network (not many layers, not many hidden neurons) and a small dataset (like the one we've made) and then **overfitting** (making the model fit the training data too closely) on that small example before increasing the amount of data or the model size/design to *reduce* overfitting.\n", "\n", "So if our model can learn, what's stopping it from fitting the circular data?\n", "\n", "Let's find out." ] }, { "cell_type": "markdown", "metadata": { "id": "j82n3OyWHC8E" }, "source": [ "## 6. 
The missing piece: non-linearity \n", "\n", "We've seen our model can draw straight (linear) lines, thanks to its linear layers.\n", "\n", "But how about we give it the capacity to draw non-straight (non-linear) lines?\n", "\n", "How?\n", "\n", "Let's find out.\n" ] }, { "cell_type": "markdown", "metadata": { "id": "cmfOV8v6__17" }, "source": [ "### 6.1 Recreating non-linear data (red and blue circles)\n", "\n", "First, let's recreate the data to start off fresh. We'll use the same setup as before. " ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 265 }, "id": "owqilKGBHC8F", "outputId": "b8c1d692-6f3e-43ca-9323-98bd40e39a89" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Make and plot data\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import make_circles\n", "\n", "n_samples = 1000\n", "\n", "X, y = make_circles(n_samples=1000,\n", " noise=0.03,\n", " random_state=42,\n", ")\n", "\n", "plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.RdBu);" ] }, { "cell_type": "markdown", "metadata": { "id": "D63mlmxR_pxt" }, "source": [ "Nice! Now let's split it into training and test sets using 80% of the data for training and 20% for testing." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "0UkH4cKLHC8F", "outputId": "5619ad2f-ad5d-476a-dcfd-9f8d48036899" }, "outputs": [ { "data": { "text/plain": [ "(tensor([[ 0.6579, -0.4651],\n", " [ 0.6319, -0.7347],\n", " [-1.0086, -0.1240],\n", " [-0.9666, -0.2256],\n", " [-0.1666, 0.7994]]),\n", " tensor([1., 0., 0., 0., 1.]))" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Convert to tensors and split into train and test sets\n", "import torch\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Turn data into tensors\n", "X = torch.from_numpy(X).type(torch.float)\n", "y = torch.from_numpy(y).type(torch.float)\n", "\n", "# Split into train and test sets\n", "X_train, X_test, y_train, y_test = train_test_split(X, \n", " y, \n", " test_size=0.2,\n", " random_state=42\n", ")\n", "\n", "X_train[:5], y_train[:5]" ] }, { "cell_type": "markdown", "metadata": { "id": "wNm_OBgD_4tk" }, "source": [ "### 6.2 Building a model with non-linearity \n", "\n", "Now here comes the fun part.\n", "\n", "What kind of pattern do you think you could draw with unlimited straight (linear) and non-straight (non-linear) lines?\n", "\n", "I bet you could get pretty creative.\n", "\n", "So far our neural networks have only been using linear (straight) line functions.\n", "\n", 
"But the data we've been working with is non-linear (circles).\n", "\n", "What do you think will happen when we introduce the capability for our model to use **non-linear actviation functions**?\n", "\n", "Well let's see.\n", "\n", "PyTorch has a bunch of [ready-made non-linear activation functions](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) that do similiar but different things. \n", "\n", "One of the most common and best performing is [ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks) (rectified linear-unit, [`torch.nn.ReLU()`](https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html)).\n", "\n", "Rather than talk about it, let's put it in our neural network between the hidden layers in the forward pass and see what happens." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "i_yG7AFHHC8F", "outputId": "00a69e38-369f-47fb-b729-fd4c7b5b47de" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CircleModelV2(\n", " (layer_1): Linear(in_features=2, out_features=10, bias=True)\n", " (layer_2): Linear(in_features=10, out_features=10, bias=True)\n", " (layer_3): Linear(in_features=10, out_features=1, bias=True)\n", " (relu): ReLU()\n", ")\n" ] } ], "source": [ "# Build model with non-linear activation function\n", "from torch import nn\n", "class CircleModelV2(nn.Module):\n", " def __init__(self):\n", " super().__init__()\n", " self.layer_1 = nn.Linear(in_features=2, out_features=10)\n", " self.layer_2 = nn.Linear(in_features=10, out_features=10)\n", " self.layer_3 = nn.Linear(in_features=10, out_features=1)\n", " self.relu = nn.ReLU() # <- add in ReLU activation function\n", " # Can also put sigmoid in the model \n", " # This would mean you don't need to use it on the predictions\n", " # self.sigmoid = nn.Sigmoid()\n", "\n", " def forward(self, x):\n", " # Intersperse the ReLU activation function between 
layers\n", "        return self.layer_3(self.relu(self.layer_2(self.relu(self.layer_1(x)))))\n", "\n", "model_3 = CircleModelV2().to(device)\n", "print(model_3)" ] }, { "cell_type": "markdown", "metadata": { "id": "1UASf5SWEPNJ" }, "source": [ "![a classification neural network on TensorFlow playground with ReLU activation](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/02-tensorflow-playground-relu-activation.png)\n", "*A visual example of what a similar classification neural network to the one we've just built (using ReLU activation) looks like. Try creating one of your own on the [TensorFlow Playground website](https://playground.tensorflow.org/).*\n", "\n", "> **Question:** *Where should I put the non-linear activation functions when constructing a neural network?*\n", ">\n", "> A rule of thumb is to put them in between hidden layers and just after the output layer. However, there is no set-in-stone option. As you learn more about neural networks and deep learning you'll find a bunch of different ways of putting things together. In the meantime, best to experiment, experiment, experiment.\n", "\n", "Now we've got a model ready to go, let's create a binary classification loss function as well as an optimizer." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "id": "uWNlx4lTHC8F" }, "outputs": [], "source": [ "# Setup loss and optimizer \n", "loss_fn = nn.BCEWithLogitsLoss()\n", "optimizer = torch.optim.SGD(model_3.parameters(), lr=0.1)" ] }, { "cell_type": "markdown", "metadata": { "id": "NQL9GF5yFTGD" }, "source": [ "Wonderful! \n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "INnCmr2RMk8L" }, "source": [ "### 6.3 Training a model with non-linearity\n", "\n", "You know the drill: the model, loss function and optimizer are ready to go, so let's create a training and testing loop." 
] }, { "cell_type": "code", "execution_count": 38, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "gTS8yTN_HC8F", "outputId": "df71bd5d-cd0d-4e39-f3e0-7fc90bfaaeb0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0 | Loss: 0.69295, Accuracy: 50.00% | Test Loss: 0.69319, Test Accuracy: 50.00%\n", "Epoch: 100 | Loss: 0.69115, Accuracy: 52.88% | Test Loss: 0.69102, Test Accuracy: 52.50%\n", "Epoch: 200 | Loss: 0.68977, Accuracy: 53.37% | Test Loss: 0.68940, Test Accuracy: 55.00%\n", "Epoch: 300 | Loss: 0.68795, Accuracy: 53.00% | Test Loss: 0.68723, Test Accuracy: 56.00%\n", "Epoch: 400 | Loss: 0.68517, Accuracy: 52.75% | Test Loss: 0.68411, Test Accuracy: 56.50%\n", "Epoch: 500 | Loss: 0.68102, Accuracy: 52.75% | Test Loss: 0.67941, Test Accuracy: 56.50%\n", "Epoch: 600 | Loss: 0.67515, Accuracy: 54.50% | Test Loss: 0.67285, Test Accuracy: 56.00%\n", "Epoch: 700 | Loss: 0.66659, Accuracy: 58.38% | Test Loss: 0.66322, Test Accuracy: 59.00%\n", "Epoch: 800 | Loss: 0.65160, Accuracy: 64.00% | Test Loss: 0.64757, Test Accuracy: 67.50%\n", "Epoch: 900 | Loss: 0.62362, Accuracy: 74.00% | Test Loss: 0.62145, Test Accuracy: 79.00%\n" ] } ], "source": [ "# Fit the model\n", "torch.manual_seed(42)\n", "epochs = 1000\n", "\n", "# Put all data on target device\n", "X_train, y_train = X_train.to(device), y_train.to(device)\n", "X_test, y_test = X_test.to(device), y_test.to(device)\n", "\n", "for epoch in range(epochs):\n", "    # 1. Forward pass\n", "    y_logits = model_3(X_train).squeeze()\n", "    y_pred = torch.round(torch.sigmoid(y_logits)) # logits -> prediction probabilities -> prediction labels\n", "    \n", "    # 2. Calculate loss and accuracy\n", "    loss = loss_fn(y_logits, y_train) # BCEWithLogitsLoss calculates loss using logits\n", "    acc = accuracy_fn(y_true=y_train, \n", "                      y_pred=y_pred)\n", "    \n", "    # 3. Optimizer zero grad\n", "    optimizer.zero_grad()\n", "\n", "    # 4. Loss backward\n", "    loss.backward()\n", "\n", "    # 5. 
Optimizer step\n", "    optimizer.step()\n", "\n", "    ### Testing\n", "    model_3.eval()\n", "    with torch.inference_mode():\n", "        # 1. Forward pass\n", "        test_logits = model_3(X_test).squeeze()\n", "        test_pred = torch.round(torch.sigmoid(test_logits)) # logits -> prediction probabilities -> prediction labels\n", "        # 2. Calculate loss and accuracy\n", "        test_loss = loss_fn(test_logits, y_test)\n", "        test_acc = accuracy_fn(y_true=y_test,\n", "                               y_pred=test_pred)\n", "\n", "    # Print out what's happening\n", "    if epoch % 100 == 0:\n", "        print(f\"Epoch: {epoch} | Loss: {loss:.5f}, Accuracy: {acc:.2f}% | Test Loss: {test_loss:.5f}, Test Accuracy: {test_acc:.2f}%\")" ] }, { "cell_type": "markdown", "metadata": { "id": "x89XvV-EMqvB" }, "source": [ "Ho ho! That's looking far better!" ] }, { "cell_type": "markdown", "metadata": { "id": "tfViHC1aM15t" }, "source": [ "### 6.4 Evaluating a model trained with non-linear activation functions\n", "\n", "Remember how our circle data is non-linear? Well, let's see how our model's predictions look now that the model's been trained with non-linear activation functions." 
] }, { "cell_type": "code", "execution_count": 39, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "wGHQiWT2HC8G", "outputId": "e95a412b-bef1-4d3d-b705-1eb75a55449e" }, "outputs": [ { "data": { "text/plain": [ "(tensor([1., 0., 1., 0., 0., 1., 0., 0., 1., 0.], device='cuda:0'),\n", " tensor([1., 1., 1., 1., 0., 1., 1., 1., 1., 0.]))" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make predictions\n", "model_3.eval()\n", "with torch.inference_mode():\n", "    y_preds = torch.round(torch.sigmoid(model_3(X_test))).squeeze()\n", "y_preds[:10], y_test[:10] # compare test predictions with the matching test labels (same format)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 390 }, "id": "gEeMinjyHC8G", "outputId": "3566e274-ef7a-4eb8-ef34-57b97e7781bc" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot decision boundaries for training and test sets\n", "plt.figure(figsize=(12, 6))\n", "plt.subplot(1, 2, 1)\n", "plt.title(\"Train\")\n", "plot_decision_boundary(model_1, X_train, y_train) # model_1 = no non-linearity\n", "plt.subplot(1, 2, 2)\n", "plt.title(\"Test\")\n", "plot_decision_boundary(model_3, X_test, y_test) # model_3 = has non-linearity" ] }, { "cell_type": "markdown", "metadata": { "id": "cAfU5dXrNGuC" }, "source": [ "Nice! Not perfect but still far better than before.\n", "\n", "Potentially you could try a few tricks to improve the test accuracy of the model? (hint: head back to section 5 for tips on improving the model)" ] }, { "cell_type": "markdown", "metadata": { "id": "rarkXnX-Nhj2" }, "source": [ "## 7. Replicating non-linear activation functions\n", "\n", "We saw before how adding non-linear activation functions to our model can help it to model non-linear data.\n", "\n", "> **Note:** Much of the data you'll encounter in the wild is non-linear (or a combination of linear and non-linear). Right now we've been working with dots on a 2D plot. But imagine if you had images of plants you'd like to classify, there's a lot of different plant shapes. Or text from Wikipedia you'd like to summarize, there's lots of different ways words can be put together (linear and non-linear patterns). \n", "\n", "But what does a non-linear activation *look* like?\n", "\n", "How about we replicate some and what they do?\n", "\n", "Let's start by creating a small amount of data." 
] }, { "cell_type": "code", "execution_count": 41, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "uvpqD28OscTm", "outputId": "befa3534-f5b1-48f2-88c5-63836759d834" }, "outputs": [ { "data": { "text/plain": [ "tensor([-10., -9., -8., -7., -6., -5., -4., -3., -2., -1., 0., 1.,\n", " 2., 3., 4., 5., 6., 7., 8., 9.])" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a toy tensor (similar to the data going into our model(s))\n", "A = torch.arange(-10, 10, 1, dtype=torch.float32)\n", "A" ] }, { "cell_type": "markdown", "metadata": { "id": "vvSZ4M3-ssZn" }, "source": [ "Wonderful, now let's plot it." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 265 }, "id": "FTCID1lRsrte", "outputId": "477fdd3d-ae28-4caf-b9ed-b2a9ec369bee" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Visualize the toy tensor\n", "plt.plot(A);" ] }, { "cell_type": "markdown", "metadata": { "id": "ovyXy-Qxsx9x" }, "source": [ "A straight line, nice.\n", "\n", "Now let's see how the ReLU activation function influences it.\n", "\n", "And instead of using PyTorch's ReLU (`torch.nn.ReLU`), we'll recreate it ourselves.\n", "\n", "The ReLU function turns all negatives to 0 and leaves the positive values as they are." ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "XhdCQKbcsxi1", "outputId": "47c7fa54-6d03-47b7-a745-0bb3599c82fd" }, "outputs": [ { "data": { "text/plain": [ "tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 2., 3., 4., 5., 6., 7.,\n", " 8., 9.])" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create ReLU function by hand \n", "def relu(x):\n", " return torch.maximum(torch.tensor(0), x) # inputs must be tensors\n", "\n", "# Pass toy tensor through ReLU function\n", "relu(A)" ] }, { "cell_type": "markdown", "metadata": { "id": "dqw_P9GiuiB9" }, "source": [ "It looks like our ReLU function worked, all of the negative values are zeros.\n", "\n", "Let's plot them." ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 265 }, "id": "fICVVmAgsxal", "outputId": "4ada6523-81e8-450e-89b5-0be3da23f04c" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot ReLU activated toy tensor\n", "plt.plot(relu(A));" ] }, { "cell_type": "markdown", "metadata": { "id": "AlmI4CHzsxEt" }, "source": [ "Nice! That looks exactly like the shape of the ReLU function on the [Wikipedia page for ReLU](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)).\n", "\n", "How about we try the [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function) we've been using?\n", "\n", "The sigmoid function formula goes like so:\n", "\n", "$$ out_i = \\frac{1}{1+e^{-input_i}} $$ \n", "\n", "Or using $x$ as input:\n", "\n", "$$ S(x) = \\frac{1}{1+e^{-x_i}} $$\n", "\n", "Where $S$ stands for sigmoid, $e$ stands for [exponential](https://en.wikipedia.org/wiki/Exponential_function) ([`torch.exp()`](https://pytorch.org/docs/stable/generated/torch.exp.html)) and $i$ stands for a particular element in a tensor.\n", "\n", "Let's build a function to replicate the sigmoid function with PyTorch." ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "hcDQDy8bvDcR", "outputId": "899e7595-5b1c-4182-c1ca-94aadaa097e1" }, "outputs": [ { "data": { "text/plain": [ "tensor([4.5398e-05, 1.2339e-04, 3.3535e-04, 9.1105e-04, 2.4726e-03, 6.6929e-03,\n", " 1.7986e-02, 4.7426e-02, 1.1920e-01, 2.6894e-01, 5.0000e-01, 7.3106e-01,\n", " 8.8080e-01, 9.5257e-01, 9.8201e-01, 9.9331e-01, 9.9753e-01, 9.9909e-01,\n", " 9.9966e-01, 9.9988e-01])" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create a custom sigmoid function\n", "def sigmoid(x):\n", " return 1 / (1 + torch.exp(-x))\n", "\n", "# Test custom sigmoid on toy tensor\n", "sigmoid(A)" ] }, { "cell_type": "markdown", "metadata": { "id": "qiwvlDWmxPUt" }, "source": [ "Woah, those values look a lot like prediction probabilities we've seen earlier, let's see what they look like visualized." 
] }, { "cell_type": "code", "execution_count": 46, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 265 }, "id": "dxihhxGBxOWf", "outputId": "c6a6d3de-e9fb-445d-8d63-3964753a4559" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Plot sigmoid activated toy tensor\n", "plt.plot(sigmoid(A));" ] }, { "cell_type": "markdown", "metadata": { "id": "IpOqVYpdxgWl" }, "source": [ "Looking good! We've gone from a straight line to a curved line.\n", "\n", "Now there's plenty more [non-linear activation functions](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity) that exist in PyTorch that we haven't tried.\n", "\n", "But these two are two of the most common.\n", "\n", "And the point remains, what patterns could you draw using an unlimited amount of linear (straight) and non-linear (not straight) lines?\n", "\n", "Almost anything right?\n", "\n", "That's exactly what our model is doing when we combine linear and non-linear functions.\n", "\n", "Instead of telling our model what to do, we give it tools to figure out how to best discover patterns in the data.\n", "\n", "And those tools are linear and non-linear functions." ] }, { "cell_type": "markdown", "metadata": { "id": "_1OeaW0FHC8G" }, "source": [ "## 8. Putting things together by building a multi-class PyTorch model\n", "\n", "We've covered a fair bit.\n", "\n", "But now let's put it all together using a multi-class classification problem.\n", "\n", "Recall a **binary classification** problem deals with classifying something as one of two options (e.g. a photo as a cat photo or a dog photo) where as a **multi-class classification** problem deals with classifying something from a list of *more than* two options (e.g. classifying a photo as a cat a dog or a chicken).\n", "\n", "![binary vs multi-class classification image with the example of dog vs cat for binary classification and dog vs cat vs chicken for multi-class classification](https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/images/02-binary-vs-multi-class-classification.png)\n", "*Example of binary vs. multi-class classification. 
Binary deals with two classes (one thing or another), whereas multi-class classification can deal with any number of classes over two. For example, the popular [ImageNet-1k dataset](https://www.image-net.org/) is used as a computer vision benchmark and has 1000 classes.*\n" ] }, { "cell_type": "markdown", "metadata": { "id": "f5Ephtx6f1jB" }, "source": [ "### 8.1 Creating multi-class classification data\n", "\n", "To begin a multi-class classification problem, let's create some multi-class data.\n", "\n", "To do so, we can leverage Scikit-Learn's [`make_blobs()`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html) function.\n", "\n", "This function will create however many classes (using the `centers` parameter) we want.\n", "\n", "Specifically, let's do the following:\n", "\n", "1. Create some multi-class data with `make_blobs()`.\n", "2. Turn the data into tensors (the default of `make_blobs()` is to use NumPy arrays).\n", "3. Split the data into training and test sets using `train_test_split()`.\n", "4. Visualize the data." ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 515 }, "id": "x3_UwcwHHC8G", "outputId": "7cd92d66-41e3-4aa0-ab94-f9f9ef40fcce" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[-8.4134, 6.9352],\n", " [-5.7665, -6.4312],\n", " [-6.0421, -6.7661],\n", " [ 3.9508, 0.6984],\n", " [ 4.2505, -0.2815]]) tensor([3, 2, 2, 1, 1])\n" ] }, { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Import dependencies\n", "import torch\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import make_blobs\n", "from sklearn.model_selection import train_test_split\n", "\n", "# Set the hyperparameters for data creation\n", "NUM_CLASSES = 4\n", "NUM_FEATURES = 2\n", "RANDOM_SEED = 42\n", "\n", "# 1. Create multi-class data\n", "X_blob, y_blob = make_blobs(n_samples=1000,\n", " n_features=NUM_FEATURES, # X features\n", " centers=NUM_CLASSES, # y labels \n", " cluster_std=1.5, # give the clusters a little shake up (try changing this to 1.0, the default)\n", " random_state=RANDOM_SEED\n", ")\n", "\n", "# 2. Turn data into tensors\n", "X_blob = torch.from_numpy(X_blob).type(torch.float)\n", "y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)\n", "print(X_blob[:5], y_blob[:5])\n", "\n", "# 3. Split into train and test sets\n", "X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(X_blob,\n", " y_blob,\n", " test_size=0.2,\n", " random_state=RANDOM_SEED\n", ")\n", "\n", "# 4. Plot data\n", "plt.figure(figsize=(10, 7))\n", "plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu);" ] }, { "cell_type": "markdown", "metadata": { "id": "xBCnUs0oHC8G" }, "source": [ "Nice! Looks like we've got some multi-class data ready to go.\n", "\n", "Let's build a model to separate the coloured blobs. \n", "\n", "> **Question:** Does this dataset need non-linearity? Or could you draw a succession of straight lines to separate it?" 
] }, { "cell_type": "markdown", "metadata": { "id": "_dINSA0Chiof" }, "source": [ "### 8.2 Building a multi-class classification model in PyTorch\n", "\n", "We've created a few models in PyTorch so far.\n", "\n", "You might also be starting to get an idea of how flexible neural networks are.\n", "\n", "How about we build one similar to `model_3` but this time capable of handling multi-class data?\n", "\n", "To do so, let's create a subclass of `nn.Module` that takes in three hyperparameters:\n", "* `input_features` - the number of `X` features coming into the model.\n", "* `output_features` - the number of output features we'd like (this will be equivalent to `NUM_CLASSES` or the number of classes in your multi-class classification problem).\n", "* `hidden_units` - the number of hidden neurons we'd like each hidden layer to use.\n", "\n", "Since we're putting things together, let's set up some device-agnostic code (we don't have to do this again in the same notebook, it's only a reminder).\n", "\n", "Then we'll create the model class using the hyperparameters above." 
] }, { "cell_type": "code", "execution_count": 48, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 36 }, "id": "g9OPjDfQk1AU", "outputId": "64d275af-144c-4b5f-99b7-a2d2ebf235f8" }, "outputs": [ { "data": { "text/plain": [ "'cuda'" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create device agnostic code\n", "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "device" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "TGoZCzOrHC8H", "outputId": "0f8d1d8e-e578-4bf6-eea6-1cf27bda4764" }, "outputs": [ { "data": { "text/plain": [ "BlobModel(\n", " (linear_layer_stack): Sequential(\n", " (0): Linear(in_features=2, out_features=8, bias=True)\n", " (1): Linear(in_features=8, out_features=8, bias=True)\n", " (2): Linear(in_features=8, out_features=4, bias=True)\n", " )\n", ")" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from torch import nn\n", "\n", "# Build model\n", "class BlobModel(nn.Module):\n", "    def __init__(self, input_features, output_features, hidden_units=8):\n", "        \"\"\"Initializes all required hyperparameters for a multi-class classification model.\n", "\n", "        Args:\n", "            input_features (int): Number of input features to the model.\n", "            output_features (int): Number of output features of the model\n", "                (how many classes there are).\n", "            hidden_units (int): Number of hidden units between layers, default 8.\n", "        \"\"\"\n", "        super().__init__()\n", "        self.linear_layer_stack = nn.Sequential(\n", "            nn.Linear(in_features=input_features, out_features=hidden_units),\n", "            # nn.ReLU(), # <- does our dataset require non-linear layers? (try uncommenting and see if the results change)\n", "            nn.Linear(in_features=hidden_units, out_features=hidden_units),\n", "            # nn.ReLU(), # <- does our dataset require non-linear layers? (try uncommenting and see if the results change)\n", "            nn.Linear(in_features=hidden_units, out_features=output_features), # how many classes are there?\n", "        )\n", "\n", "    def forward(self, x):\n", "        return self.linear_layer_stack(x)\n", "\n", "# Create an instance of BlobModel and send it to the target device\n", "model_4 = BlobModel(input_features=NUM_FEATURES, \n", "                    output_features=NUM_CLASSES, \n", "                    hidden_units=8).to(device)\n", "model_4" ] }, { "cell_type": "markdown", "metadata": { "id": "_eOASXWAHC8H" }, "source": [ "Excellent! Our multi-class model is ready to go, let's create a loss function and optimizer for it.\n", "\n", "### 8.3 Creating a loss function and optimizer for a multi-class PyTorch model\n", "\n", "Since we're working on a multi-class classification problem, we'll use `nn.CrossEntropyLoss()` as our loss function.\n", "\n", "And we'll stick with using SGD with a learning rate of 0.1 for optimizing our `model_4` parameters.\n" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "id": "3ngHLo--HC8H" }, "outputs": [], "source": [ "# Create loss and optimizer\n", "loss_fn = nn.CrossEntropyLoss()\n", "optimizer = torch.optim.SGD(model_4.parameters(), \n", "                            lr=0.1) # exercise: try changing the learning rate here and seeing what happens to the model's performance" ] }, { "cell_type": "markdown", "metadata": { "id": "orcVVmLzo3gX" }, "source": [ "### 8.4 Getting prediction probabilities for a multi-class PyTorch model\n", "\n", "Alright, we've got a loss function and optimizer ready, and we're ready to train our model, but before we do, let's do a single forward pass with our model to see if it works." 
] }, { "cell_type": "code", "execution_count": 51, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "--d6YmldpZh_", "outputId": "2b1fb56f-cf42-49a3-f1ec-7978cfd68b56" }, "outputs": [ { "data": { "text/plain": [ "tensor([[-1.2711, -0.6494, -1.4740, -0.7044],\n", " [ 0.2210, -1.5439, 0.0420, 1.1531],\n", " [ 2.8698, 0.9143, 3.3169, 1.4027],\n", " [ 1.9576, 0.3125, 2.2244, 1.1324],\n", " [ 0.5458, -1.2381, 0.4441, 1.1804]], device='cuda:0',\n", " grad_fn=)" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Perform a single forward pass on the data (the data needs to be on the target device for this to work)\n", "model_4(X_blob_train.to(device))[:5]" ] }, { "cell_type": "markdown", "metadata": { "id": "0fk1K9VlpoPI" }, "source": [ "What's coming out here?\n", "\n", "It looks like we get four values for each sample, which matches our number of classes.\n", "\n", "Let's check the shape to confirm." ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3W4jnvmWp0OH", "outputId": "f515c93d-2c57-43fe-f5ce-d6b6aa20cd0f" }, "outputs": [ { "data": { "text/plain": [ "(torch.Size([4]), 4)" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How many elements in a single prediction sample?\n", "model_4(X_blob_train.to(device))[0].shape, NUM_CLASSES " ] }, { "cell_type": "markdown", "metadata": { "id": "NyQSNaSVqBBN" }, "source": [ "Wonderful, our model is predicting one value for each class that we have.\n", "\n", "Do you remember what the raw outputs of our model are called?\n", "\n", "Hint: it rhymes with \"frog splits\" (no animals were harmed in the creation of these materials).\n", "\n", "If you guessed *logits*, you'd be correct.\n", "\n", "So right now our model is outputting logits, but what if we wanted to figure out exactly which label it was giving the sample?\n", "\n", "As in, how do we go from `logits -> prediction 
probabilities -> prediction labels` just like we did with the binary classification problem?\n", "\n", "That's where the [softmax activation function](https://en.wikipedia.org/wiki/Softmax_function) comes into play.\n", "\n", "The softmax function calculates the probability of each prediction class being the actual predicted class compared to all other possible classes.\n", "\n", "If this doesn't make sense, let's see in code." ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6hU_Wxudrbiq", "outputId": "c12e6a9f-c80f-466c-aa5c-27e30cfe9963" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[-1.2549, -0.8112, -1.4795, -0.5696],\n", " [ 1.7168, -1.2270, 1.7367, 2.1010],\n", " [ 2.2400, 0.7714, 2.6020, 1.0107],\n", " [-0.7993, -0.3723, -0.9138, -0.5388],\n", " [-0.4332, -1.6117, -0.6891, 0.6852]], device='cuda:0',\n", " grad_fn=)\n", "tensor([[0.1872, 0.2918, 0.1495, 0.3715],\n", " [0.2824, 0.0149, 0.2881, 0.4147],\n", " [0.3380, 0.0778, 0.4854, 0.0989],\n", " [0.2118, 0.3246, 0.1889, 0.2748],\n", " [0.1945, 0.0598, 0.1506, 0.5951]], device='cuda:0',\n", " grad_fn=)\n" ] } ], "source": [ "# Make prediction logits with model\n", "y_logits = model_4(X_blob_test.to(device))\n", "\n", "# Perform softmax calculation on logits across dimension 1 to get prediction probabilities\n", "y_pred_probs = torch.softmax(y_logits, dim=1) \n", "print(y_logits[:5])\n", "print(y_pred_probs[:5])" ] }, { "cell_type": "markdown", "metadata": { "id": "A_pbSytSrzHF" }, "source": [ "Hmm, what's happened here?\n", "\n", "It may still look like the outputs of the softmax function are jumbled numbers (and they are, since our model hasn't been trained and is predicting using random patterns) but there's a very specific thing different about each sample.\n", "\n", "After passing the logits through the softmax function, each individual sample now adds to 1 (or very close to).\n", "\n", "Let's 
check." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5fC5No7IsSiB", "outputId": "db517fa9-04eb-4efe-a75b-970bdf2a3163" }, "outputs": [ { "data": { "text/plain": [ "tensor(1., device='cuda:0', grad_fn=)" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sum the first sample output of the softmax activation function \n", "torch.sum(y_pred_probs[0])" ] }, { "cell_type": "markdown", "metadata": { "id": "yhwu9ln1sbl7" }, "source": [ "These prediction probabilities are essentially saying how much the model *thinks* the target `X` sample (the input) maps to each class.\n", "\n", "Since there's one value for each class in `y_pred_probs`, the index of the *highest* value is the class the model thinks the specific data sample *most* belongs to.\n", "\n", "We can check which index has the highest value using `torch.argmax()`." ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6X3Yf5gCsbME", "outputId": "a7e4db7e-08fd-426c-8b54-dfd7d3943d79" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([0.1872, 0.2918, 0.1495, 0.3715], device='cuda:0',\n", " grad_fn=)\n", "tensor(3, device='cuda:0')\n" ] } ], "source": [ "# Which class does the model think is *most* likely at the index 0 sample?\n", "print(y_pred_probs[0])\n", "print(torch.argmax(y_pred_probs[0]))" ] }, { "cell_type": "markdown", "metadata": { "id": "9veE071JtSZJ" }, "source": [ "You can see the output of `torch.argmax()` returns 3, so for the features (`X`) of the sample at index 0, the model is predicting that the most likely class value (`y`) is 3.\n", "\n", "Of course, right now this is just random guessing so it's got a 25% chance of being right (since there are four classes). 
But we can improve those chances by training the model.\n", "\n", "> **Note:** To summarize the above, a model's raw output is referred to as **logits**.\n", "> \n", "> For a multi-class classification problem, to turn the logits into **prediction probabilities**, you use the softmax activation function (`torch.softmax`).\n", ">\n", "> The index of the value with the highest **prediction probability** is the class number the model thinks is *most* likely given the input features for that sample (although this is a prediction, it doesn't mean it will be correct)." ] }, { "cell_type": "markdown", "metadata": { "id": "hlqJ3_xTupU3" }, "source": [ "### 8.5 Creating a training and testing loop for a multi-class PyTorch model\n", "\n", "Alright, now we've got all of the preparation steps out of the way, let's write a training and testing loop to improve and evaluate our model.\n", "\n", "We've done many of these steps before so much of this will be practice.\n", "\n", "The only difference is that we'll be adjusting the steps to turn the model outputs (logits) to prediction probabilities (using the softmax activation function) and then to prediction labels (by taking the argmax of the output of the softmax activation function).\n", "\n", "Let's train the model for `epochs=100` and evaluate it every 10 epochs." 
] }, { "cell_type": "code", "execution_count": 56, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "55s1Pis9HC8H", "outputId": "4a7b0fd7-8a08-40a4-8694-435178b70832" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch: 0 | Loss: 1.04324, Acc: 65.50% | Test Loss: 0.57861, Test Acc: 95.50%\n", "Epoch: 10 | Loss: 0.14398, Acc: 99.12% | Test Loss: 0.13037, Test Acc: 99.00%\n", "Epoch: 20 | Loss: 0.08062, Acc: 99.12% | Test Loss: 0.07216, Test Acc: 99.50%\n", "Epoch: 30 | Loss: 0.05924, Acc: 99.12% | Test Loss: 0.05133, Test Acc: 99.50%\n", "Epoch: 40 | Loss: 0.04892, Acc: 99.00% | Test Loss: 0.04098, Test Acc: 99.50%\n", "Epoch: 50 | Loss: 0.04295, Acc: 99.00% | Test Loss: 0.03486, Test Acc: 99.50%\n", "Epoch: 60 | Loss: 0.03910, Acc: 99.00% | Test Loss: 0.03083, Test Acc: 99.50%\n", "Epoch: 70 | Loss: 0.03643, Acc: 99.00% | Test Loss: 0.02799, Test Acc: 99.50%\n", "Epoch: 80 | Loss: 0.03448, Acc: 99.00% | Test Loss: 0.02587, Test Acc: 99.50%\n", "Epoch: 90 | Loss: 0.03300, Acc: 99.12% | Test Loss: 0.02423, Test Acc: 99.50%\n" ] } ], "source": [ "# Fit the model\n", "torch.manual_seed(42)\n", "\n", "# Set number of epochs\n", "epochs = 100\n", "\n", "# Put data to target device\n", "X_blob_train, y_blob_train = X_blob_train.to(device), y_blob_train.to(device)\n", "X_blob_test, y_blob_test = X_blob_test.to(device), y_blob_test.to(device)\n", "\n", "for epoch in range(epochs):\n", "    ### Training\n", "    model_4.train()\n", "\n", "    # 1. Forward pass\n", "    y_logits = model_4(X_blob_train) # model outputs raw logits \n", "    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1) # go from logits -> prediction probabilities -> prediction labels\n", "    # print(y_logits)\n", "    # 2. Calculate loss and accuracy\n", "    loss = loss_fn(y_logits, y_blob_train) \n", "    acc = accuracy_fn(y_true=y_blob_train,\n", "                      y_pred=y_pred)\n", "\n", "    # 3. Optimizer zero grad\n", "    optimizer.zero_grad()\n", "\n", "    # 4. Loss backwards\n", "    loss.backward()\n", "\n", "    # 5. Optimizer step\n", "    optimizer.step()\n", "\n", "    ### Testing\n", "    model_4.eval()\n", "    with torch.inference_mode():\n", "        # 1. Forward pass\n", "        test_logits = model_4(X_blob_test)\n", "        test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)\n", "        # 2. Calculate test loss and accuracy\n", "        test_loss = loss_fn(test_logits, y_blob_test)\n", "        test_acc = accuracy_fn(y_true=y_blob_test,\n", "                               y_pred=test_pred)\n", "\n", "    # Print out what's happening\n", "    if epoch % 10 == 0:\n", "        print(f\"Epoch: {epoch} | Loss: {loss:.5f}, Acc: {acc:.2f}% | Test Loss: {test_loss:.5f}, Test Acc: {test_acc:.2f}%\") " ] }, { "cell_type": "markdown", "metadata": { "id": "m_JNlpd4L6dL" }, "source": [ "### 8.6 Making and evaluating predictions with a PyTorch multi-class model\n", "\n", "It looks like our trained model is performing pretty well.\n", "\n", "But to make sure of this, let's make some predictions and visualize them." ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "NjCKKhsGHC8H", "outputId": "583e10fb-fa1c-463a-fbd0-4d31d76b82ab" }, "outputs": [ { "data": { "text/plain": [ "tensor([[ 4.3377, 10.3539, -14.8948, -9.7642],\n", " [ 5.0142, -12.0371, 3.3860, 10.6699],\n", " [ -5.5885, -13.3448, 20.9894, 12.7711],\n", " [ 1.8400, 7.5599, -8.6016, -6.9942],\n", " [ 8.0726, 3.2906, -14.5998, -3.6186],\n", " [ 5.5844, -14.9521, 5.0168, 13.2890],\n", " [ -5.9739, -10.1913, 18.8655, 9.9179],\n", " [ 7.0755, -0.7601, -9.5531, 0.1736],\n", " [ -5.5918, -18.5990, 25.5309, 17.5799],\n", " [ 7.3142, 0.7197, -11.2017, -1.2011]], device='cuda:0')" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make predictions\n", "model_4.eval()\n", "with torch.inference_mode():\n", "    y_logits = model_4(X_blob_test)\n", "\n", "# View the first 10 predictions\n", "y_logits[:10]" ] }, { "cell_type": "markdown", "metadata": { 
"id": "lpAdeJRMNHjG" }, "source": [ "Alright, looks like our model's predictions are still in logit form.\n", "\n", "Though to evaluate them, they'll have to be in the same form as our labels (`y_blob_test`) which are in integer form.\n", "\n", "Let's convert our model's prediction logits to prediction probabilities (using `torch.softmax()`) then to prediction labels (by taking the `argmax()` of each sample).\n", "\n", "> **Note:** It's possible to skip the `torch.softmax()` function and go straight from `predicted logits -> predicted labels` by calling `torch.argmax()` directly on the logits.\n", ">\n", "> For example, `y_preds = torch.argmax(y_logits, dim=1)`. This saves a computation step (no `torch.softmax()`) but results in no prediction probabilities being available to use. " ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "faDQ4oLpHC8H", "outputId": "ca32986f-8dc0-419d-84df-1cc3ba8577e5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Predictions: tensor([1, 3, 2, 1, 0, 3, 2, 0, 2, 0], device='cuda:0')\n", "Labels: tensor([1, 3, 2, 1, 0, 3, 2, 0, 2, 0], device='cuda:0')\n", "Test accuracy: 99.5%\n" ] } ], "source": [ "# Turn predicted logits into prediction probabilities\n", "y_pred_probs = torch.softmax(y_logits, dim=1)\n", "\n", "# Turn prediction probabilities into prediction labels\n", "y_preds = y_pred_probs.argmax(dim=1)\n", "\n", "# Compare first 10 model preds and test labels\n", "print(f\"Predictions: {y_preds[:10]}\\nLabels: {y_blob_test[:10]}\")\n", "print(f\"Test accuracy: {accuracy_fn(y_true=y_blob_test, y_pred=y_preds)}%\")" ] }, { "cell_type": "markdown", "metadata": { "id": "AMA5SSixOSru" }, "source": [ "Nice! 
Our model predictions are now in the same form as our test labels.\n", "\n", "Let's visualize them with `plot_decision_boundary()`. Remember, because our data is on the GPU, it has to be moved to the CPU for use with matplotlib (`plot_decision_boundary()` does this automatically for us)." ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 390 }, "id": "kLOzBFdRHC8I", "outputId": "71dcaf7f-a30e-457d-8949-916cf9c5cc79" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(12, 6))\n", "plt.subplot(1, 2, 1)\n", "plt.title(\"Train\")\n", "plot_decision_boundary(model_4, X_blob_train, y_blob_train)\n", "plt.subplot(1, 2, 2)\n", "plt.title(\"Test\")\n", "plot_decision_boundary(model_4, X_blob_test, y_blob_test)" ] }, { "cell_type": "markdown", "metadata": { "id": "p0vFchUQx7mc" }, "source": [ "## 9. More classification evaluation metrics\n", "\n", "So far we've only covered a few ways of evaluating a classification model (accuracy, loss and visualizing predictions).\n", "\n", "These are some of the most common methods you'll come across and are a good starting point.\n", "\n", "However, you may want to evaluate your classification model using more metrics such as the following:\n", "\n", "| **Metric name/Evaluation method** | **Definition** | **Code** |\n", "| --- | --- | --- |\n", "| Accuracy | Out of 100 predictions, how many does your model get correct? E.g. 95% accuracy means it gets 95/100 predictions correct. | [`torchmetrics.Accuracy()`](https://torchmetrics.readthedocs.io/en/stable/classification/accuracy.html#id3) or [`sklearn.metrics.accuracy_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) |\n", "| Precision | Proportion of true positives over total number of true positives and false positives (model predicts 1 when it should've been 0). Higher precision leads to fewer false positives. | [`torchmetrics.Precision()`](https://torchmetrics.readthedocs.io/en/stable/classification/precision.html#id4) or [`sklearn.metrics.precision_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html) |\n", "| Recall | Proportion of true positives over total number of true positives and false negatives (model predicts 0 when it should've been 1). Higher recall leads to fewer false negatives. 
| [`torchmetrics.Recall()`](https://torchmetrics.readthedocs.io/en/stable/classification/recall.html#id5) or [`sklearn.metrics.recall_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html) |\n", "| F1-score | Combines precision and recall into one metric. 1 is best, 0 is worst. | [`torchmetrics.F1Score()`](https://torchmetrics.readthedocs.io/en/stable/classification/f1_score.html#f1score) or [`sklearn.metrics.f1_score()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html) |\n", "| [Confusion matrix](https://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/) | Compares the predicted values with the true values in a tabular way. If 100% correct, all values in the matrix will sit on the diagonal line from top left to bottom right. | [`torchmetrics.ConfusionMatrix`](https://torchmetrics.readthedocs.io/en/stable/classification/confusion_matrix.html#confusionmatrix) or [`sklearn.metrics.ConfusionMatrixDisplay.from_predictions()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html#sklearn.metrics.ConfusionMatrixDisplay.from_predictions) |\n", "| Classification report | Collection of some of the main classification metrics such as precision, recall and f1-score. | [`sklearn.metrics.classification_report()`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html) |\n", "\n", "Scikit-Learn (a popular and world-class machine learning library) has many implementations of the above metrics, and if you're looking for a PyTorch-like version, check out [TorchMetrics](https://torchmetrics.readthedocs.io/en/latest/), especially the [TorchMetrics classification section](https://torchmetrics.readthedocs.io/en/stable/pages/classification.html). \n", "\n", "Let's try the `torchmetrics.Accuracy` metric out.\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6_HrLXfutFFX", "outputId": "ca9af26f-3d97-4019-fd7d-105f1dc2e68c" }, "outputs": [ { "data": { "text/plain": [ "tensor(0.9950, device='cuda:0')" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "try:\n", "    from torchmetrics import Accuracy\n", "except ImportError:\n", "    !pip install torchmetrics==0.9.3 # this is the version we're using in this notebook (later versions exist here: https://torchmetrics.readthedocs.io/en/stable/generated/CHANGELOG.html#changelog)\n", "    from torchmetrics import Accuracy\n", "\n", "# Setup metric and make sure it's on the target device\n", "torchmetrics_accuracy = Accuracy(task='multiclass', num_classes=4).to(device)\n", "\n", "# Calculate accuracy\n", "torchmetrics_accuracy(y_preds, y_blob_test)" ] }, { "cell_type": "markdown", "metadata": { "id": "v4d-S9_-HC8I" }, "source": [ "## Exercises\n", "\n", "All of the exercises are focused on practicing the code in the sections above.\n", "\n", "You should be able to complete them by referencing each section or by following the resource(s) linked.\n", "\n", "All exercises should be completed using [device-agnostic code](https://pytorch.org/docs/stable/notes/cuda.html#device-agnostic-code).\n", "\n", "Resources:\n", "* [Exercise template notebook for 02](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/exercises/02_pytorch_classification_exercises.ipynb)\n", "* [Example solutions notebook for 02](https://github.com/mrdbourke/pytorch-deep-learning/blob/main/extras/solutions/02_pytorch_classification_exercise_solutions.ipynb) (try the exercises *before* looking at this)\n", "\n", "1. 
Make a binary classification dataset with Scikit-Learn's [`make_moons()`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_moons.html) function.\n", " * For consistency, the dataset should have 1000 samples and a `random_state=42`.\n", " * Turn the data into PyTorch tensors. Split the data into training and test sets using `train_test_split` with 80% training and 20% testing.\n", "2. Build a model by subclassing `nn.Module` that incorporates non-linear activation functions and is capable of fitting the data you created in 1.\n", " * Feel free to use any combination of PyTorch layers (linear and non-linear) you want.\n", "3. Set up a binary classification compatible loss function and optimizer to use when training the model.\n", "4. Create a training and testing loop to fit the model you created in 2 to the data you created in 1.\n", " * To measure model accuracy, you can create your own accuracy function or use the accuracy function in [TorchMetrics](https://torchmetrics.readthedocs.io/en/latest/).\n", " * Train the model for long enough for it to reach over 96% accuracy.\n", " * The training loop should output progress every 10 epochs of the model's training and test set loss and accuracy.\n", "5. Make predictions with your trained model and plot them using the `plot_decision_boundary()` function created in this notebook.\n", "6. Replicate the Tanh (hyperbolic tangent) activation function in pure PyTorch.\n", " * Feel free to reference the [ML cheatsheet website](https://ml-cheatsheet.readthedocs.io/en/latest/activation_functions.html#tanh) for the formula.\n", "7. 
Create a multi-class dataset using the [spirals data creation function from CS231n](https://cs231n.github.io/neural-networks-case-study/) (see below for the code).\n", " * Construct a model capable of fitting the data (you may need a combination of linear and non-linear layers).\n", " * Build a loss function and optimizer capable of handling multi-class data (optional extension: use the Adam optimizer instead of SGD, you may have to experiment with different values of the learning rate to get it working).\n", " * Make a training and testing loop for the multi-class data and train a model on it to reach over 95% testing accuracy (you can use any accuracy measuring function here that you like).\n", " * Plot the decision boundaries on the spirals dataset from your model predictions, the `plot_decision_boundary()` function should work for this dataset too.\n", "\n", "```python\n", "# Code for creating a spiral dataset from CS231n\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "N = 100 # number of points per class\n", "D = 2 # dimensionality\n", "K = 3 # number of classes\n", "X = np.zeros((N*K,D)) # data matrix (each row = single example)\n", "y = np.zeros(N*K, dtype='uint8') # class labels\n", "for j in range(K):\n", "    ix = range(N*j,N*(j+1))\n", "    r = np.linspace(0.0,1,N) # radius\n", "    t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta\n", "    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]\n", "    y[ix] = j\n", "# let's visualize the data\n", "plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)\n", "plt.show()\n", "```\n", "\n", "## Extra-curriculum\n", "\n", "* Write down 3 problems where you think machine classification could be useful (these can be anything, get as creative as you like, for example, classifying credit card transactions as fraud or not fraud based on the purchase amount and purchase location features). \n", "* Research the concept of \"momentum\" in gradient-based optimizers (like SGD or Adam), what does it mean?\n", "* Spend 10 minutes reading the [Wikipedia page for different activation functions](https://en.wikipedia.org/wiki/Activation_function#Table_of_activation_functions), how many of these can you line up with [PyTorch's activation functions](https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity)?\n", "* Research when accuracy might be a poor metric to use (hint: read [\"Beyond Accuracy\" by Will Koehrsen](https://willkoehrsen.github.io/statistics/learning/beyond-accuracy-precision-and-recall/) for ideas).\n", "* **Watch:** For an idea of what's happening within our neural networks and what they're doing to learn, watch [MIT's Introduction to Deep Learning video](https://youtu.be/7sB052Pz0sQ)." ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "02_pytorch_classification.ipynb", "provenance": [] }, "interpreter": { "hash": "3fbe1355223f7b2ffc113ba3ade6a2b520cadace5d5ec3e828c83ce02eb221bf" }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" } }, "nbformat": 4, "nbformat_minor": 4 }