ML Series: The Basic Structure of a Neural Network: A step-by-step Tutorial in Python

Hanif

Introduction

Neural networks are algorithms created explicitly to simulate biological neural networks. The idea behind neural networks was to create an artificial system that functions like the human brain.

At the most basic level, the Multilayer Perceptron (MLP) is the simplest form of a neural network.

MLPs were initially inspired by the Perceptron, a supervised machine learning algorithm for binary classification. The Perceptron is only capable of handling linearly separable data, so the multilayer perceptron was introduced to overcome this limitation.

An MLP is an artificial neural network that consists of an input layer, one or more hidden layers, an output layer, activation functions, and a set of weights and biases.

In this article, we will implement a basic neural network from scratch, using mainly the NumPy library.

The goal is to implement a multilayer perceptron with an input layer, two hidden layers, and an output layer. We will use the softmax activation function for the output layer and the ReLU activation function for the hidden layers. The steps we are going to take are:

1. Data Preparation

2. One-hot encode the labels and split the data

3. Define activation functions

4. Initialize weights and biases

5. Forward Propagation

6. Backward Propagation

7. Train the Neural Network

8. Evaluate the accuracy on test data

9. Make Prediction

1. Data Preparation

We are going to use the Iris dataset from sklearn. It is a simple dataset with four features (sepal length, sepal width, petal length, petal width) and three flower classes (Iris setosa, Iris versicolor, Iris virginica).

python
import numpy as np
import copy
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target.reshape(-1,1)
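
Before moving on, we can optionally take a quick look at what we just loaded:

python
# Optional: inspect the shapes and the class/feature names provided by sklearn
print(X.shape, y.shape)      # (150, 4) (150, 1)
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']
print(iris.feature_names)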

2. One-hot encode the labels and split the data

The next step is to one-hot encode the y values. This converts the categorical labels into numerical vectors, making them compatible with our neural network. We then split the data into training and test sets.

python
def one_hot_encode(y):
    maps = {0: [1., 0., 0.], 1: [0., 1., 0.], 2: [0., 0., 1.]}
    new_y = []
    for i in y:
        new_y.append(maps[i[0]])
    return np.array(new_y)

y_onehot = one_hot_encode(y)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y_onehot, test_size=0.2, random_state=42)
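
StandardScaler was imported in the data-preparation step; although the network below works on the raw features, the data can optionally be standardized. A minimal sketch, fitting the scaler on the training set only (note that any later predictions would then need the same transformation):

python
# Optional: standardize the features.
# The scaler is fit on the training set only, then applied to the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)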

3. Define activation functions

Activation functions take the weighted sum of a layer's inputs and map it to an output, such as a real number (regression) or a class probability (classification). They are used to introduce non-linearity into the network, which allows it to learn complex patterns.

For our neural network, we will use the ReLU activation function for the hidden layers and the softmax activation function for the output layer.

python
def relu(x):
    return np.maximum(0, x)

def softmax(x):
    # Subtract the row-wise max for numerical stability before exponentiating
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / np.sum(e_x, axis=1, keepdims=True)

# Define derivative of activation function (ReLU derivative)
def relu_derivative(x):
    return np.where(x > 0, 1, 0)
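
As a quick, optional sanity check, we can apply these functions to a small toy array:

python
# Toy batch of two samples with three raw scores each
scores = np.array([[1.0, -2.0, 0.5],
                   [0.2, 0.2, 0.2]])
print(relu(scores))     # negative values are clipped to zero
print(softmax(scores))  # each row sums to 1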

4. Initialize weights and biases

Here we initialize the weights and biases of each layer before training the neural network. The random module from NumPy takes care of that.

python
# Step 4: Initialize weights and biases
def initialize_parameters(hidden_size, hidden_size2):
    # Define the network architecture
    input_size = X_train.shape[1]   # 4 features
    output_size = y_train.shape[1]  # 3 unique labels

    # Initialize weights and biases
    np.random.seed(42)
    weights = {
        'h1': np.random.uniform(-0.1, 0.1, (input_size, hidden_size)),
        'h2': np.random.uniform(-0.1, 0.1, (hidden_size, hidden_size2)),
        'out': np.random.uniform(-0.1, 0.1, (hidden_size2, output_size))
    }
    biases = {
        'h1': np.random.uniform(-0.1, 0.1, hidden_size),
        'h2': np.random.uniform(-0.1, 0.1, hidden_size2),
        'out': np.random.uniform(-0.1, 0.1, output_size)
    }
    return weights, biases
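
As a quick check (using the hidden-layer sizes we will train with later, 90 and 55), the initialized parameters have the following shapes:

python
# Quick check of the parameter shapes (assumes hidden layers of 90 and 55 units)
weights_demo, biases_demo = initialize_parameters(90, 55)
print({name: w.shape for name, w in weights_demo.items()})
# {'h1': (4, 90), 'h2': (90, 55), 'out': (55, 3)}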

5. Forward Propagation

The forward propagation step takes the weights and biases, computes each layer's linear output, and passes it through an activation function, repeating this for every layer until the output layer produces a value for each class.

python
def forward(X, weights, biases):
    z1 = np.dot(X, weights['h1']) + biases['h1'] # W1.X + b1
    a1 = relu(z1)
    z2 = np.dot(a1, weights['h2']) + biases['h2'] # W2.A1 + b2
    a2 = relu(z2)
    z3 = np.dot(a2, weights['out']) + biases['out'] # Wout.A2 + bout
    a3 = softmax(z3)
    
    return a1, a2, a3
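
As an optional check, we can run a single forward pass with freshly initialized parameters and confirm that the softmax output is a valid probability distribution for each sample:

python
# Optional check: one forward pass with freshly initialized parameters (hidden sizes 90 and 55)
weights_tmp, biases_tmp = initialize_parameters(90, 55)
a1_tmp, a2_tmp, a3_tmp = forward(X_train, weights_tmp, biases_tmp)
print(a3_tmp.shape)            # (120, 3): one probability per class for each training sample
print(a3_tmp.sum(axis=1)[:5])  # each row of the softmax output sums to 1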

6. Backward Propagation

The backward propagation step calculates the error at the output layer, propagates that error backward through the hidden layers, and computes the gradients using the chain rule. Because the output layer combines softmax with a cross-entropy loss, the output-layer error simplifies to a3 - y. The weights and biases are then updated with the computed gradients.

python
def backward_propagation(X, y, a1, a2, a3, learning_rate, weights, biases):
    m = y.shape[0]
    
    dz3 = a3 - y
    dw3 = np.dot(a2.T, dz3) / m
    db3 = np.sum(dz3, axis=0) / m

    dz2 = np.dot(dz3, weights['out'].T) * relu_derivative(a2)
    dw2 = np.dot(a1.T, dz2) / m
    db2 = np.sum(dz2, axis=0) / m

    dz1 = np.dot(dz2, weights['h2'].T) * relu_derivative(a1)
    dw1 = np.dot(X.T, dz1) / m
    db1 = np.sum(dz1, axis=0) / m

    # Update weights and biases
    weights['h1'] -= learning_rate * dw1
    weights['h2'] -= learning_rate * dw2
    weights['out'] -= learning_rate * dw3
    biases['h1'] -= learning_rate * db1
    biases['h2'] -= learning_rate * db2
    biases['out'] -= learning_rate * db3
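
The update rule above implicitly minimizes the categorical cross-entropy loss. It is not needed for training itself, but if you want to monitor the loss, a minimal sketch of such a helper (not part of the original code) could look like this:

python
# Optional helper (not used elsewhere in this article): categorical cross-entropy loss
def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # Clip predictions to avoid taking the log of zero
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred)) / y_true.shape[0]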

7. Train the Neural Network

We then move on to training the model using the training data. The training process repeatedly performs forward propagation and backward propagation to update the weights and biases, until the specified number of epochs has been reached.

python
def train_mlp(hidden_layer_size, hidden_layer_size2, epochs, learning_rate):
    scores_so_far = {}
    weights, biases = initialize_parameters(hidden_layer_size, hidden_layer_size2)

    # Training the network
    for epoch in range(epochs):
        a1, a2, a3 = forward(X_train, weights, biases)
        backward_propagation(X_train, y_train, a1, a2, a3, learning_rate, weights, biases)

        predictions = np.argmax(a3, axis=1)
        true_labels = np.argmax(y_train, axis=1)

        curr_accuracy = accuracy_score(true_labels, predictions)

        print(f'Accuracy: {curr_accuracy}, epoch: {epoch}')

8. Evaluate the accuracy on test data

The code below continues inside the train_mlp function, after the training loop. The model's predictions are compared against the true labels for both the training and the test set, and the accuracy of each is displayed.

python
    # (Continuing inside the train_mlp function, after the training loop)
    # Evaluate the model: keep a snapshot of the parameters and outputs, keyed by training accuracy
    scores_so_far[curr_accuracy] = [copy.deepcopy(weights), copy.deepcopy(biases), copy.deepcopy(a3)]
    weights_new = scores_so_far[max(scores_so_far.keys())][0]
    biases_new = scores_so_far[max(scores_so_far.keys())][1]

    _, _, out_test = forward(X_test, weights_new, biases_new)

    y_pred_train = np.argmax(scores_so_far[max(scores_so_far.keys())][2], axis=1)
    train_labels = np.argmax(y_train, axis=1)

    accuracy = accuracy_score(train_labels, y_pred_train)

    print(f'Model Train Accuracy: {accuracy * 100:.2f}')

    y_pred = np.argmax(out_test, axis=1)
    test_labels = np.argmax(y_test, axis=1)

    accuracy = accuracy_score(test_labels, y_pred)

    print(f'Model Test Accuracy: {accuracy * 100:.2f}')

    return weights, biases

# Call the train_mlp function
weights, biases = train_mlp(hidden_layer_size=90, hidden_layer_size2=55, epochs=200, learning_rate=0.1)

9. Make Prediction

We can then use the weights and biases obtained from training to predict the classes of new samples. Voila, we have a fully functional deep learning model!

python
def predict(X):
    # Run a forward pass and pick the class with the highest softmax probability
    _, _, prediction = forward(X, weights, biases)
    return np.argmax(prediction, axis=1)

predictions = predict(X_test)
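
For example, we can pass in the measurements of a single (hypothetical) flower:

python
# Hypothetical example: predict the class of one new flower
# (sepal length, sepal width, petal length, petal width)
sample = np.array([[5.1, 3.5, 1.4, 0.2]])
print(predict(sample))  # predicted class index: 0 = setosa, 1 = versicolor, 2 = virginica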

Conclusion

We have successfully developed a multilayer perceptron model from the ground up, using mainly the NumPy library. The model we created is a much simpler version of the deep learning models we interact with every day, such as ChatGPT, Gemini, Google Assistant, and Midjourney.

The article Building a Neural Network from Scratch using Numpy and Math Libraries: A Step-by-Step Tutorial In Python was a major source of inspiration for my article.

Thank you for reading!

The full code to the project is highlighted below 👇