{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at how to implement a neural network in Python. \n", "\n", "### Implementing the Feedforward Part of a Neural Network\n", "\n", "As a small programming exercise and to improve our understanding of neural networks, let's implement the feedforward part of a neural network from scratch. We will have to calculate the output of the network for some given weights and biases, as well as some inputs. Let's start by importing the necessary libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:33:44.258370Z", "iopub.status.busy": "2025-06-28T20:33:44.258043Z", "iopub.status.idle": "2025-06-28T20:33:44.420173Z", "shell.execute_reply": "2025-06-28T20:33:44.419617Z" } }, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we define the activation function for which we use the sigmoid function" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:33:44.423211Z", "iopub.status.busy": "2025-06-28T20:33:44.422980Z", "iopub.status.idle": "2025-06-28T20:33:44.425635Z", "shell.execute_reply": "2025-06-28T20:33:44.425244Z" } }, "outputs": [], "source": [ "def activation_function(x):\n", " return 1/(1+np.exp(-x)) # sigmoid function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we define the feedforward function which calculates the output of the neural network given some inputs, weights, and biases. 
The function takes the inputs, weights, and biases as arguments and returns the output of the network" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:33:44.427587Z", "iopub.status.busy": "2025-06-28T20:33:44.427413Z", "iopub.status.idle": "2025-06-28T20:33:44.430285Z", "shell.execute_reply": "2025-06-28T20:33:44.429872Z" } }, "outputs": [], "source": [ "def feedforward(inputs, w1, w2, b1, b2):\n", "\n", " # Compute the pre-activation values for the first layer\n", " z = b1 + np.matmul(w1, inputs)\n", "\n", " # Compute the post-activation values for the first layer\n", " a = activation_function(z)\n", "\n", " # Combine the post-activation values of the first layer to an output\n", " g = b2 + np.matmul(w2, a)\n", "\n", " return g" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mathematically, the function computes the following\n", "\n", "$$z = b^{1} + w^1 x$$\n", "\n", "$$a = \\phi(z)$$\n", "\n", "$$g = b^2 + w^2 a$$\n", "\n", "and returns $g$ at the end. We have written this using matrix notation to make it more compact. 
Remember that node $j$ in the hidden layer is given by\n", "\n", "$$z_j = b_{j}^{1} + \\sum_{i=1}^N w_{ji}^{1} x_i$$\n", "\n", "$$a_j = \\phi(z_j)$$\n", "\n", "and the output of the network is given by\n", "\n", "$$g(x ; w) = b^{2}+\\sum_{j=1}^{M} w_{j}^{2} a_j.$$\n", "\n", "\n", "Let's test the function with some example inputs, weights, and biases" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:33:44.432323Z", "iopub.status.busy": "2025-06-28T20:33:44.432140Z", "iopub.status.idle": "2025-06-28T20:33:44.438919Z", "shell.execute_reply": "2025-06-28T20:33:44.438187Z" } }, "outputs": [ { "data": { "text/plain": [ "np.float64(1.0943291429384328)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Define the weights and biases\n", "w1 = np.array([[0.1, 0.2], [0.3, 0.4]]) # 2x2 matrix\n", "w2 = np.array([0.5, 0.6]) # 1-d vector\n", "b1 = np.array([0.1, 0.2]) # 1-d vector\n", "b2 = 0.3\n", "\n", "# Define the inputs\n", "inputs = np.array([1, 2]) # 1-d vector\n", "\n", "# Compute the output of the network\n", "feedforward(inputs, w1, w2, b1, b2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To train the network in practice, we would also need to define a loss function and an optimization algorithm to update the weights and biases. However, this is beyond the scope of this course.\n", "\n", "\n", "### Using Neural Networks in scikit-learn\n", "\n", "scikit-learn provides a simple interface to use neural networks. However, it is not as flexible as the more commonly used PyTorch or TensorFlow. We can reuse the **dataset of credit card transactions** from [Kaggle.com](https://www.kaggle.com/datasets/dhanushnarayananr/credit-card-fraud/data) to demonstrate how to use neural networks in scikit-learn." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:33:44.473746Z", "iopub.status.busy": "2025-06-28T20:33:44.473506Z", "iopub.status.idle": "2025-06-28T20:33:49.047398Z", "shell.execute_reply": "2025-06-28T20:33:49.046742Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset already downloaded!\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.preprocessing import StandardScaler, MinMaxScaler\n", "from sklearn.neural_network import MLPClassifier\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import confusion_matrix, accuracy_score, roc_auc_score, recall_score, precision_score, roc_curve\n", "pd.set_option('display.max_columns', 50) # Display up to 50 columns\n", "from io import BytesIO\n", "from urllib.request import urlopen\n", "from zipfile import ZipFile\n", "import os.path\n", "\n", "# Check if the file exists\n", "if not os.path.isfile('data/card_transdata.csv'):\n", "\n", "    print('Downloading dataset...')\n", "\n", "    # Define the dataset to be downloaded\n", "    zipurl = 'https://www.kaggle.com/api/v1/datasets/download/dhanushnarayananr/credit-card-fraud'\n", "\n", "    # Download and unzip the dataset in the data folder\n", "    with urlopen(zipurl) as zipresp:\n", "        with ZipFile(BytesIO(zipresp.read())) as zfile:\n", "            zfile.extractall('data')\n", "\n", "    print('DONE!')\n", "\n", "else:\n", "\n", "    print('Dataset already downloaded!')\n", "\n", "# Load the data\n", "df = pd.read_csv('data/card_transdata.csv')\n", "\n", "# Split the data into training and test sets\n", "X = df.drop('fraud', axis=1) # All variables except `fraud`\n", "y = df['fraud'] # Only the `fraud` variable\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size = 0.3, random_state = 42)\n", "\n", "# Scale the features\n", "def 
scale_features(scaler, df, col_names, only_transform=False):\n", "\n", "    # Extract the features we want to scale\n", "    features = df[col_names]\n", "\n", "    # Fit the scaler to the features and transform them\n", "    if only_transform:\n", "        features = scaler.transform(features.values)\n", "    else:\n", "        features = scaler.fit_transform(features.values)\n", "\n", "    # Replace the original features with the scaled features\n", "    df[col_names] = features\n", "\n", "col_names = ['distance_from_home', 'distance_from_last_transaction', 'ratio_to_median_purchase_price']\n", "scaler = StandardScaler()\n", "scale_features(scaler, X_train, col_names)\n", "scale_features(scaler, X_test, col_names, only_transform=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Recall that the target variable $y$ is `fraud`, which indicates whether the transaction is fraudulent or not. The other variables are the features $x$ of the transactions.\n", "\n", "To use a neural network for a classification task, we can use the `MLPClassifier` class from scikit-learn. The following code snippet shows how to use a neural network with one hidden layer with 16 nodes" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:33:49.050383Z", "iopub.status.busy": "2025-06-28T20:33:49.050095Z", "iopub.status.idle": "2025-06-28T20:34:58.669043Z", "shell.execute_reply": "2025-06-28T20:34:58.668402Z" } }, "outputs": [], "source": [ "clf = MLPClassifier(hidden_layer_sizes=(16,), random_state=42, verbose=False).fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you would like to use a neural network with multiple hidden layers, you can specify the number of nodes per hidden layer using the `hidden_layer_sizes` parameter. 
For example, the following code snippet shows how to use a neural network with two hidden layers, one with 5 nodes and the other with 4 nodes" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:34:58.671583Z", "iopub.status.busy": "2025-06-28T20:34:58.671370Z", "iopub.status.idle": "2025-06-28T20:35:35.196974Z", "shell.execute_reply": "2025-06-28T20:35:35.196395Z" } }, "outputs": [], "source": [ "#| eval: false\n", "clf = MLPClassifier(alpha=1e-5, hidden_layer_sizes=(5,4), activation='logistic', random_state=42).fit(X_train, y_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the `alpha` parameter specifies the regularization strength, the `activation` parameter specifies the activation function (by default it uses `relu`) and the `random_state` parameter specifies the seed for the random number generator (useful for reproducible results).\n", "\n", "We can check the loss curve to see how the neural network loss declined during training" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:35.201050Z", "iopub.status.busy": "2025-06-28T20:35:35.200807Z", "iopub.status.idle": "2025-06-28T20:35:35.365323Z", "shell.execute_reply": "2025-06-28T20:35:35.364814Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.plot(clf.loss_curve_)\n", "plt.title(\"Loss Curve\", fontsize=14)\n", "plt.xlabel('Iterations')\n", "plt.ylabel('Cost')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then use the same way to evaluate the neural network performance as we did for the other ML models" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:35.367723Z", "iopub.status.busy": "2025-06-28T20:35:35.367492Z", "iopub.status.idle": "2025-06-28T20:35:35.568067Z", "shell.execute_reply": "2025-06-28T20:35:35.567239Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.9955266666666667\n", "Precision: 0.971747127308582\n", "Recall: 0.9772319896266352\n", "ROC AUC: 0.9996638991577014\n" ] } ], "source": [ "y_pred = clf.predict(X_test)\n", "y_proba = clf.predict_proba(X_test)\n", "\n", "print(f\"Accuracy: {accuracy_score(y_test, y_pred)}\")\n", "print(f\"Precision: {precision_score(y_test, y_pred)}\")\n", "print(f\"Recall: {recall_score(y_test, y_pred)}\")\n", "print(f\"ROC AUC: {roc_auc_score(y_test, y_proba[:, 1])}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The neural network performs substantially better than the logistic regression. As in the case of the tree-based methods, the ROC AUC score is much closer to the maximum value of 1 and we have an almost perfect classifier" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:35.570987Z", "iopub.status.busy": "2025-06-28T20:35:35.570519Z", "iopub.status.idle": "2025-06-28T20:35:35.702769Z", "shell.execute_reply": "2025-06-28T20:35:35.702129Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Compute the ROC curve\n", "fpr, tpr, thresholds = roc_curve(y_test, y_proba[:, 1])\n", "\n", "# Plot the ROC curve\n", "plt.plot(fpr, tpr)\n", "plt.plot([0, 1], [0, 1], linestyle='--', color='grey')\n", "plt.xlabel('False Positive Rate (FPR)')\n", "plt.ylabel('True Positive Rate (TPR)')\n", "plt.title('ROC Curve')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also check the confusion matrix to see where we still make mistakes" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:35.705945Z", "iopub.status.busy": "2025-06-28T20:35:35.705688Z", "iopub.status.idle": "2025-06-28T20:35:35.949913Z", "shell.execute_reply": "2025-06-28T20:35:35.949015Z" } }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "conf_mat = confusion_matrix(y_test, y_pred, labels=[1, 0]).transpose() # Transpose the sklearn confusion matrix to match the convention in the lecture\n", "sns.heatmap(conf_mat, annot=True, cmap='Blues', fmt='g', xticklabels=['Fraud', 'No Fraud'], yticklabels=['Fraud', 'No Fraud'])\n", "plt.xlabel(\"Actual\")\n", "plt.ylabel(\"Predicted\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are around 270 false negatives, i.e., a fraudulent transaction that we did not detect. There are also around 980 false positives, i.e., \"false alarms\", where non-fraudulent transactions were classified as fraudulent.\n", "\n", "\n", "### Using Neural Networks in PyTorch\n", "\n", "While it is possible to use neural networks in scikit-learn, it is more common to use PyTorch or TensorFlow for neural networks. PyTorch is a popular deep-learning library that is widely used in academia and industry. In this section, we will show how to use PyTorch to build a simple neural network for the same credit card fraud detection task.\n", "\n", "::: {.callout-warning}\n", "### Feel Free to Skip This Section\n", "\n", "This section might be a bit more challenging than what we have looked at previously. If you think that you are not ready for this, feel free to skip this section. This is mainly meant to be a starting point for those who are interested in learning more about neural networks.\n", "\n", "For a more in-depth introduction to PyTorch, I recommend that you check out the [official PyTorch tutorials](https://pytorch.org/tutorials/). 
This section, in particular, builds on the [Learning PyTorch with Examples](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html) tutorial.\n", "\n", ":::\n", "\n", "\n", "Let's start by importing the necessary libraries" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:35.952828Z", "iopub.status.busy": "2025-06-28T20:35:35.952626Z", "iopub.status.idle": "2025-06-28T20:35:38.316751Z", "shell.execute_reply": "2025-06-28T20:35:38.316262Z" } }, "outputs": [], "source": [ "import torch\n", "from torch.utils.data import DataLoader, TensorDataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, let's prepare the data for PyTorch. We need to convert the data in our DataFrame to PyTorch tensors" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:38.319321Z", "iopub.status.busy": "2025-06-28T20:35:38.319072Z", "iopub.status.idle": "2025-06-28T20:35:38.353325Z", "shell.execute_reply": "2025-06-28T20:35:38.352597Z" } }, "outputs": [], "source": [ "X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)\n", "y_train_tensor = torch.tensor(y_train.values, dtype=torch.long)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we also converted the input values to `float32` for improved training speed and the target values to `long` which is a type of integer (remember our target `y` can only take values zero or one). 
Next, we need to create a `DataLoader` object to load the data in mini-batches during the training process" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:38.356217Z", "iopub.status.busy": "2025-06-28T20:35:38.355931Z", "iopub.status.idle": "2025-06-28T20:35:38.359843Z", "shell.execute_reply": "2025-06-28T20:35:38.359265Z" } }, "outputs": [], "source": [ "dataset = TensorDataset(X_train_tensor, y_train_tensor)\n", "dataloader = DataLoader(dataset, batch_size=200, shuffle=True)\n", "dataset_size = len(dataloader.dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we define the neural network model using the `nn` module from PyTorch" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:38.362347Z", "iopub.status.busy": "2025-06-28T20:35:38.362078Z", "iopub.status.idle": "2025-06-28T20:35:38.369820Z", "shell.execute_reply": "2025-06-28T20:35:38.369181Z" } }, "outputs": [], "source": [ "model = torch.nn.Sequential(\n", " torch.nn.Linear(7, 16), # 7 input features, 16 nodes in the hidden layer\n", " torch.nn.ReLU(), # ReLU activation function\n", " torch.nn.Linear(16, 2) # 16 nodes in the hidden layer, 2 output nodes (fraud or no fraud)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need to define the loss function and the optimizer. 
We will use the cross-entropy loss function and the Adam optimizer" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:38.372486Z", "iopub.status.busy": "2025-06-28T20:35:38.372269Z", "iopub.status.idle": "2025-06-28T20:35:39.521619Z", "shell.execute_reply": "2025-06-28T20:35:39.521133Z" } }, "outputs": [], "source": [ "loss_fn = torch.nn.CrossEntropyLoss()\n", "optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5) # Adam optimizer with learning rate of 0.001 and L2 regularization (analogous to alpha in scikit-learn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now train the neural network using the following code snippet" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:35:39.523956Z", "iopub.status.busy": "2025-06-28T20:35:39.523668Z", "iopub.status.idle": "2025-06-28T20:44:21.029710Z", "shell.execute_reply": "2025-06-28T20:44:21.029036Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 9 loss: 0.024499\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 19 loss: 0.008713\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 29 loss: 0.016122\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 39 loss: 0.007585\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 49 loss: 0.005461\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 59 loss: 0.020942\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 69 loss: 0.016723\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 79 loss: 0.007653\n" ] } ], "source": [ "for epoch in range(80):\n", "\n", "    # Loop over batches in an epoch using DataLoader\n", "    for id_batch, (X_batch, y_batch) in enumerate(dataloader):\n", "\n", "        # Compute the predicted y using the neural network model with the current weights\n", "        y_batch_pred = model(X_batch)\n", "\n", "        # Compute the loss\n", "        loss = loss_fn(y_batch_pred, y_batch)\n", "\n", "        # Reset the gradients of the model parameters to zero\n", "        optimizer.zero_grad()\n", "\n", "        # Compute the gradient of the loss with respect to model parameters\n", "        loss.backward()\n", "\n", "        # Update the weights by taking a \"step\" in the direction that reduces the loss\n", "        optimizer.step()\n", "\n", "    # Print the loss of the last mini-batch every 10 epochs\n", "    if epoch % 10 == 9:\n", "        print(f\"Epoch {epoch} loss: {loss.item():>7f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that here we update the model weights for each mini-batch in the dataset and go over the whole dataset 80 times (epochs). We print the loss of the last mini-batch every 10 epochs to monitor how the loss evolves during training. Because this is the loss of a single mini-batch, it can fluctuate from one printout to the next.\n", "\n", "The following snippet shows how to use full-batch gradient descent instead of mini-batch gradient descent" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:44:21.032470Z", "iopub.status.busy": "2025-06-28T20:44:21.032215Z", "iopub.status.idle": "2025-06-28T20:47:30.939881Z", "shell.execute_reply": "2025-06-28T20:47:30.939004Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 99 loss: 0.009982\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 199 loss: 0.009945\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 299 loss: 0.009928\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 399 loss: 0.009920\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 499 loss: 0.009914\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 599 loss: 0.009910\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 699 loss: 0.009907\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 799 loss: 0.009904\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 899 
loss: 0.009901\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 999 loss: 0.009899\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1099 loss: 0.009897\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1199 loss: 0.009895\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1299 loss: 0.009893\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1399 loss: 0.009891\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1499 loss: 0.009890\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1599 loss: 0.009888\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1699 loss: 0.009886\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1799 loss: 0.009885\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1899 loss: 0.009883\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1999 loss: 0.009881\n" ] } ], "source": [ "#| eval: false\n", "for epoch in range(2000):\n", "\n", "    # Compute the predicted y using the neural network model with the current weights\n", "    y_epoch_pred = model(X_train_tensor)\n", "\n", "    # Compute the loss\n", "    loss = loss_fn(y_epoch_pred, y_train_tensor)\n", "\n", "    # Reset the gradients of the model parameters to zero\n", "    optimizer.zero_grad()\n", "\n", "    # Compute the gradient of the loss with respect to model parameters\n", "    loss.backward()\n", "\n", "    # Update the weights by taking a \"step\" in the direction that reduces the loss\n", "    optimizer.step()\n", "\n", "    # Print the loss every 100 epochs\n", "    if epoch % 100 == 99:\n", "        print(f\"Epoch {epoch} loss: {loss.item():>7f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that in this version we are updating the model weights 2000 times (epochs) and printing the loss every 100 epochs. 
We can now evaluate the model on the test set" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "execution": { "iopub.execute_input": "2025-06-28T20:47:30.943432Z", "iopub.status.busy": "2025-06-28T20:47:30.943096Z", "iopub.status.idle": "2025-06-28T20:47:31.064948Z", "shell.execute_reply": "2025-06-28T20:47:31.064126Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Accuracy: 0.9965833333333334\n", "Precision: 0.9775587566338135\n", "Recall: 0.9834865184394188\n" ] } ], "source": [ "X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)\n", "y_pred = torch.argmax(model(X_test_tensor), dim=1).numpy()\n", "\n", "print(f\"Accuracy: {accuracy_score(y_test, y_pred)}\")\n", "print(f\"Precision: {precision_score(y_test, y_pred)}\")\n", "print(f\"Recall: {recall_score(y_test, y_pred)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that for simplicity we are reusing the scikit-learn metrics to evaluate the model.\n", "\n", "However, our neural network trained in PyTorch does not perform exactly the same as the neural network trained in scikit-learn. This is likely because of different hyperparameters or different initializations of the weights. In practice, it is common to experiment with different hyperparameters to find the best model or to use grid search and cross-validation to try many values and find the best-performing ones.\n", "\n", "\n", "### Conclusions\n", "\n", "In this chapter, we have learned about neural networks, which are the foundation of deep learning. We have seen how to implement parts of a simple neural network from scratch and how to use neural networks in scikit-learn and PyTorch." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.4" } }, "nbformat": 4, "nbformat_minor": 4 }