Data Science

Diploma in Banking Supervision (CEMFI)

Author: Joël Marbet

Affiliation: Banco de España

Published: July 01, 2025

About this Course

This course serves as an introduction to machine learning techniques used in data science. While we will cover some of the underlying theory to get a better understanding of the methods we are going to use, the emphasis will be on practical implementation. Throughout the course, we will be using the programming language Python, which is the dominant programming language in this field.

The course is divided into two parts. In the first part, we will get a brief overview of the field, cover some basic concepts of machine learning and have a look at some of the most commonly used methods. In the second part, we will apply these methods to real-world problems, which hopefully will give you a starting point for your own projects. The course outline is as follows:

Part I: Overview and Methods

  1. Introduction to Machine Learning
  2. Basic Concepts
  3. Decision Trees
  4. Neural Networks
  5. Additional Methods

Part II: Applications

  1. Loan Default Prediction
  2. House Price Prediction

The course is designed to be self-contained, meaning that you do not need any prior knowledge of machine learning to follow along.

Useful Resources

The course does not follow a particular textbook but has drawn material from several sources such as

  • Hastie, Tibshirani, and Friedman (2009), “The Elements of Statistical Learning”
  • Murphy (2012), “Machine Learning: A Probabilistic Perspective”
  • Murphy (2022), “Probabilistic Machine Learning: An Introduction”
  • Murphy (2023), “Probabilistic Machine Learning: Advanced Topics”
  • Goodfellow, Bengio, and Courville (2016), “Deep Learning”
  • Bishop (2006), “Pattern Recognition And Machine Learning”
  • Nielsen (2019), “Neural Networks and Deep Learning”
  • Sutton and Barto (2018), “Reinforcement Learning: An Introduction”

Note that all of these books are officially available for free in the form of PDFs or online versions (see the links in the references). However, you are not required to read them and, as a word of warning, the books go much deeper into the mathematical theory behind the machine learning techniques than we will in this course. Nevertheless, you may find them useful if you want to learn more about the subject.

Regarding programming in Python, McKinney (2022) “Python for Data Analysis” might serve as a good reference book. The book is available for free online and covers a lot of the material we will be using in this course. You can find it here: Python for Data Analysis.

Software Installation Notes

We will be using Python for this course. For simplicity, we will be using the Anaconda distribution, which is a popular distribution of Python (and R) that aims to simplify the management of packages. We will also be using Visual Studio Code (VS Code) as our code editor.

Anaconda Installation

The first step is to install the Anaconda distribution:

  1. Download the Anaconda distribution from anaconda.com. Note: If you are using an M1 Mac (or newer), you have to choose the 64-Bit (M1) Graphical Installer. With an older Intel Mac, you can choose the 64-Bit Graphical Installer. With Windows, you can choose the 64-Bit Graphical Installer (i.e., the only Windows option).

  2. Open the installer that you have downloaded in the previous step and follow the on-screen instructions.

  3. If it asks you to update Anaconda Navigator at the end, you can click Yes (to agree to the update), Yes (to quit Anaconda Navigator) and then Update Now (to actually start the update).

To confirm that the installation was successful, you can open a terminal window on macOS/Linux or an Anaconda Prompt if you are on Windows and run the following command:

conda --version

This should display the version of Conda that you have installed. If you see an error message, the installation was likely not successful and you should ask for advice from your peers or send me an email.

Terminal Output after Anaconda Installation
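
If you prefer to run this check from Python rather than typing the command yourself, the following is a minimal sketch. It assumes only that a successful installation puts conda on your PATH; the function name conda_version is made up for this illustration.

```python
import shutil
import subprocess


def conda_version():
    """Return the text printed by `conda --version`, or None if conda is not found."""
    # shutil.which searches PATH for the executable, much like the shell does.
    conda_path = shutil.which("conda")
    if conda_path is None:
        return None
    result = subprocess.run(
        [conda_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Fall back to stderr in case nothing was written to stdout.
    output = (result.stdout or result.stderr).strip()
    return output or None


version = conda_version()
if version is None:
    print("conda was not found on PATH; the installation may not have succeeded.")
else:
    print("Found:", version)
```

On Windows, run this from an Anaconda Prompt, since a regular command prompt may not have conda on its PATH.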

Creating a Conda Environment

Next, we want to create a new environment for this course that contains the correct Python version and all the Python packages we need. We can do this by creating a new Conda environment from the environment.yml file provided on Moodle.

  1. Open a terminal window on macOS/Linux or an Anaconda Prompt if you are on Windows.

  2. There are two ways to create the Conda environment:

    Option A: Run the following command from the terminal or Anaconda Prompt:

    conda env create -f https://datascience.joelmarbet.com/environment.yml

    This downloads the environment.yml file automatically and creates the environment.

    Option B: Download the environment.yml file manually:

    1. Navigate to the folder where you have downloaded the environment.yml file. On macOS/Linux, you can do this by running the following command in the terminal:

      cd ~/Downloads

      which will navigate to the Downloads folder in your home directory.

      On Windows, you can do this by running the following command in the Anaconda Prompt:

      cd "%userprofile%/Downloads"

      which will navigate to the Downloads folder in your user profile.

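      Note that if you use a different path that contains spaces, you need to put the path in quotes, e.g., cd "$HOME/My Downloads" on macOS/Linux. Use $HOME rather than ~ in this case, because the shell does not expand ~ inside quotes.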

    2. Create a new Conda environment from the environment.yml file by running the following command in the terminal or Anaconda Prompt:

      conda env create -f environment.yml

    Either option will create a new Conda environment called datascience_course_cemfi with the correct Python version and all the Python packages we need for this course. Note that the installation might take a few minutes.

  3. Activate the new Conda environment by running the following command in the terminal or Anaconda Prompt:

    conda activate datascience_course_cemfi

To confirm that the environment was created successfully, you can run the following command in the terminal or Anaconda Prompt:

python --version

This should display Python version 3.8.8. If you see another Python version, you might have forgotten to activate the environment, or the environment was not created successfully.
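
The same check can be done from inside Python, for example in a script or a notebook cell. The sketch below compares the running interpreter against the version stated above. The names EXPECTED_VERSION and version_matches are illustrative, and the expected version should be adjusted if the environment.yml file changes.

```python
import sys

# Version stated in the course notes; adjust if environment.yml changes.
EXPECTED_VERSION = (3, 8, 8)


def version_matches(expected=EXPECTED_VERSION, actual=None):
    """Return True if `actual` (default: the running interpreter) equals `expected`."""
    if actual is None:
        actual = sys.version_info[:3]
    return tuple(actual) == tuple(expected)


current = ".".join(str(part) for part in sys.version_info[:3])
expected = ".".join(str(part) for part in EXPECTED_VERSION)
if version_matches():
    print(f"Python {current} matches the course environment.")
else:
    print(f"Python {current} found, expected {expected}. Is the environment activated?")
```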

Terminal Output From Environment Creation

Resetting or Updating a Conda Environment

If you accidentally make changes to the environment and want to reset it to the original state, you can do this by navigating to the folder where you have downloaded environment.yml and then running the following command in the terminal or Anaconda Prompt:

conda env update --file environment.yml --prune

Alternatively, you can also update the environment by running the following command in the terminal or Anaconda Prompt, which downloads the environment.yml file automatically from the course website:

conda env update --file https://datascience.joelmarbet.com/environment.yml --prune

This can also be used to update the environment if we add new packages to the environment.yml file.
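
After creating, resetting, or updating the environment, you may want to confirm that Conda actually lists it. The following sketch reads the output of conda env list --json and looks for the environment name. It assumes that conda is on your PATH and that the environment's folder name equals its name (the usual layout); the helper names are made up for this illustration.

```python
import json
import shutil
import subprocess
from pathlib import Path

ENV_NAME = "datascience_course_cemfi"  # environment name from the course notes


def env_names_from_json(json_text):
    """Extract environment names from the JSON printed by `conda env list --json`."""
    data = json.loads(json_text)
    # Each entry in "envs" is the full path of an environment; we take the
    # last path component as its name (assumption: default folder layout).
    return [Path(env_path).name for env_path in data.get("envs", [])]


def course_env_exists(env_name=ENV_NAME):
    """Return True or False, or None if conda could not be run."""
    conda_path = shutil.which("conda")
    if conda_path is None:
        return None
    result = subprocess.run(
        [conda_path, "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return env_name in env_names_from_json(result.stdout)


print("Course environment found:", course_env_exists())
```

A result of None means that conda itself could not be run, which points to an installation problem rather than a missing environment.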

Installing VS Code

The last step is to install the Visual Studio Code (VS Code) editor:

  1. Download the Visual Studio Code editor from code.visualstudio.com.
  2. Open the installer that you have downloaded in the previous step and follow the on-screen instructions.

We also need to install some VS Code extensions that will help us with Python programming and Jupyter notebooks:

  1. Open VS Code.

  2. Click on the Extensions icon on the left sidebar (or press Cmd+Shift+X on macOS or Ctrl+Shift+X on Windows).

    Installing Extensions in VS Code

  3. Search for Python and click on the Install button for the extension that is provided by Microsoft.

  4. Search for Jupyter and click on the Install button for the extension that is provided by Microsoft.

Testing the Installation

To test the installation, you can download a Jupyter notebook from Moodle and open it in VS Code:

  1. Open the Jupyter notebook in VS Code.

  2. Click on Select Kernel in the top right corner of the notebook and choose the datascience_course_cemfi kernel.

    VS Code Jupyter Kernel Selection

  3. Run the first cell of the notebook by clicking on the Execute Cell button next to the cell on the left.

If you see the output of the cell (or a green check mark below the cell), the installation was successful.
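
As an additional check, you could paste something like the following into a new notebook cell to see whether the packages you expect can be imported. Note that the package list below is a hypothetical example, since the contents of environment.yml are not listed here; replace it with the packages the course actually uses.

```python
import importlib.util

# Hypothetical package list (assumption): replace it with the packages
# that are actually listed in environment.yml.
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the names in `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are available.")
```

The check uses importlib.util.find_spec, which looks a package up without importing it, so it is quick and does not stop at the first missing package.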

Running Jupyter Notebooks in the Browser

If you have issues running Jupyter notebooks in VS Code, you can also run them in the browser. To do this, open a terminal window on macOS/Linux or an Anaconda Prompt on Windows, activate the course environment (conda activate datascience_course_cemfi), and run the following command:

jupyter notebook

This will open a new tab in your default browser with the Jupyter notebook interface. You can then navigate to the folder where you have downloaded the course materials and open the notebooks from there.
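
If you are unsure whether a notebook opened in the browser is really using the course environment, a cell along the following lines can help. This is a sketch that assumes the environment lives in a folder named after the environment (the default for Conda); the function name running_in_env is illustrative.

```python
import os
import sys

ENV_NAME = "datascience_course_cemfi"  # environment name from the course notes


def running_in_env(env_name=ENV_NAME, prefix=None):
    """Return True if the interpreter's install prefix ends with `env_name`."""
    if prefix is None:
        prefix = sys.prefix
    # Assumption: the environment folder is named after the environment.
    return os.path.basename(os.path.normpath(prefix)) == env_name


print("Interpreter:", sys.executable)
print("In course environment:", running_in_env())
```

If the second line prints False, Jupyter was probably started without activating the environment first.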