Basics of Neural Networks

 

·       Simple intuition behind neural networks

If you have been a developer, or have watched one work, you know what it is like to hunt for bugs in code. You run various test cases, varying the inputs or conditions, and watch the output. The change in output gives you a hint about where to look for the bug: which module to check, which lines to read. Once you find it, you make the change, and the exercise continues until the code or application behaves correctly.

Neural networks work in a very similar manner. A network takes several inputs, processes them through multiple neurons in one or more hidden layers, and returns the result through an output layer. This result estimation process is technically known as “Forward Propagation”.

Next, we compare the result with the actual output. The task is to make the output of the neural network as close as possible to the actual (desired) output. Each neuron contributes some error to the final output. How do you reduce the error?

We adjust the weights of the neurons that contribute more to the error. This happens while traveling back through the neurons of the network and finding where the error lies. This process is known as “Backward Propagation”.

To reduce the number of iterations needed to minimize the error, neural networks use a common algorithm known as “Gradient Descent”, which adjusts each weight in the direction that lowers the error.
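To make the idea of gradient descent concrete, here is a minimal sketch on a toy problem rather than a neural network. It minimizes a one-variable error function by repeatedly stepping against its derivative. The function f, the starting point, and the learning rate are assumptions chosen only for illustration.

```python
def f(w):
    """Toy error function with its minimum at w = 3 (illustrative assumption)."""
    return (w - 3.0) ** 2


def grad_f(w):
    """Derivative of f with respect to w."""
    return 2.0 * (w - 3.0)


w = 0.0    # arbitrary starting guess
lr = 0.1   # learning rate (step size)

for step in range(100):
    w -= lr * grad_f(w)   # move a small step against the gradient

print(w, f(w))
```

Each step shrinks the distance to the minimum by a constant factor here, so w settles very close to 3. In a real network the same update is applied to every weight and bias at once.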

That’s it: this is how a neural network works! I know this is a very simple representation, but it should help you understand things in a simple manner.

·    Multi-Layer Perceptron and its basics

Just as atoms are the basic building blocks of any material on earth, the basic building block of a neural network is a perceptron. So, what is a perceptron?

 A perceptron can be understood as anything that takes multiple inputs and produces one output. For example, look at the image below.



Figure: Perceptron

 

The above structure takes three inputs and produces one output. The next logical question is: what is the relationship between input and output? Let us start with the basic ways and build up to more complex ones.

Below, I discuss three ways of creating input-output relationships:

1. By directly combining the inputs and computing the output based on a threshold value. For example, take x1=0, x2=1, x3=1 and set the threshold to 0. If x1+x2+x3>0, the output is 1; otherwise it is 0. In this case, the perceptron calculates the output as 1.

2. Next, let us add weights to the inputs. Weights give importance to an input. For example, you assign w1=2, w2=3 and w3=4 to x1, x2 and x3 respectively.

To compute the output, we multiply each input by its weight and compare the sum with the threshold value: w1*x1 + w2*x2 + w3*x3 > threshold. These weights assign more importance to x3 than to x1 and x2.

3. Next, let us add bias. Each perceptron also has a bias, which can be thought of as how flexible the perceptron is. It is similar to the constant b of a linear function y = ax + b: it allows us to move the line up and down to fit the prediction to the data better. Without b, the line always goes through the origin (0, 0), and you may get a poorer fit.

For example, a perceptron with three inputs requires four weights: one for each input and one for the bias. The linear representation of the input now looks like w1*x1 + w2*x2 + w3*x3 + 1*b.
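The three ways above can be sketched as a single small function. This is only an illustration: the function name perceptron and the bias value of -8 are my own assumptions, while the inputs, weights, and threshold come from the examples above.

```python
import numpy as np


def perceptron(x, w=None, b=0.0, threshold=0.0):
    """Return 1 if the weighted input sum plus bias exceeds the threshold, else 0."""
    x = np.asarray(x, dtype=float)
    if w is None:
        w = np.ones_like(x)           # way 1: plain, unweighted sum
    total = float(np.dot(w, x) + b)   # ways 2 and 3: weights, then bias
    return int(total > threshold)


x = [0, 1, 1]                                      # x1, x2, x3 from the example
out_plain = perceptron(x)                          # 0 + 1 + 1 = 2 > 0
out_weighted = perceptron(x, w=[2, 3, 4])          # 2*0 + 3*1 + 4*1 = 7 > 0
out_biased = perceptron(x, w=[2, 3, 4], b=-8.0)    # 7 - 8 = -1, not > 0 (bias is hypothetical)

print(out_plain, out_weighted, out_biased)   # prints: 1 1 0
```

The last call shows what the bias buys you: with the same inputs and weights, shifting the sum by b changes which side of the threshold the perceptron lands on.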

But all of this is still linear, which is what perceptrons used to be. That was not as much fun, so people evolved the perceptron into what is now called an artificial neuron. A neuron applies a non-linear transformation (an activation function) to the weighted inputs and bias.

 

·       What is an activation function?

An activation function takes the weighted sum of the inputs (w1*x1 + w2*x2 + w3*x3 + 1*b) as its argument and returns the output of the neuron.





output = f(w0*x0 + w1*x1 + w2*x2 + w3*x3)

In the above equation, f is the activation function, and we have represented 1 as x0 and b as w0.

The activation function is mostly used to apply a non-linear transformation, which allows us to fit non-linear hypotheses or estimate complex functions. There are multiple activation functions, such as “Sigmoid”, “Tanh”, “ReLU”, and many others.
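As a rough sketch, the three activation functions named above can be written in a few lines of NumPy and applied to the same weighted sum. The input and weight values are hypothetical, chosen so that the sum is easy to check by hand; x0 = 1 and w0 = b follow the convention described above.

```python
import numpy as np


def sigmoid(z):
    """Squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def tanh(z):
    """Squashes any real number into the range (-1, 1)."""
    return np.tanh(z)


def relu(z):
    """Passes positive values through and clips negative values to 0."""
    return np.maximum(0.0, z)


x = np.array([1.0, 0.0, 1.0, 1.0])     # x0 = 1 (bias input), then x1, x2, x3
w = np.array([-8.0, 2.0, 3.0, 4.0])    # w0 = b = -8 (hypothetical), then w1, w2, w3

z = np.dot(w, x)   # -8*1 + 2*0 + 3*1 + 4*1 = -1

print(z, sigmoid(z), tanh(z), relu(z))
```

The same input sum gives three different neuron outputs, which is the point: the choice of activation function shapes how a neuron responds, not the weighted sum itself.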

·      Forward Propagation, Back Propagation and Epochs

Till now, we have computed the output, and this process is known as “Forward Propagation”. But what if the estimated output is far away from the actual output (high error)? In a neural network, we update the biases and weights based on the error. This weight and bias updating process is known as “Back Propagation”.

Back-propagation (BP) algorithms work by determining the loss (or error) at the output and then propagating it back into the network. The weights are updated to minimize the error contributed by each neuron. The first step in minimizing the error is to determine the gradient (derivative) of the error with respect to each weight and bias.

One round of forward and back propagation over the training data is known as one training iteration, also called an “Epoch”.
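To illustrate the kind of gradient that back propagation computes, the sketch below takes a single sigmoid neuron and compares the chain-rule gradient with a numerical estimate. The squared-error loss, the input, the target, and the starting weights are all assumptions made for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss(w, b, x, y):
    """Squared error of one sigmoid neuron on one example (assumed loss)."""
    out = sigmoid(np.dot(w, x) + b)
    return 0.5 * (y - out) ** 2


x = np.array([0.0, 1.0, 1.0])     # one input example
y = 1.0                           # desired output
w = np.array([0.2, -0.4, 0.1])    # hypothetical starting weights
b = 0.05                          # hypothetical starting bias

# Chain rule: dL/dw = (out - y) * out * (1 - out) * x
out = sigmoid(np.dot(w, x) + b)
delta = (out - y) * out * (1.0 - out)
grad_w = delta * x
grad_b = delta

# Numerical check: nudge each weight slightly and measure the change in loss
eps = 1e-6
num_grad_w = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = eps
    num_grad_w[i] = (loss(w + step, b, x, y) - loss(w - step, b, x, y)) / (2 * eps)

print(grad_w)
print(num_grad_w)
```

The two printed vectors should agree to several decimal places. The factor out * (1 - out) is the derivative of the sigmoid written in terms of its own output, which is why a sigmoid derivative can be computed directly from the activations.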

 

·       Implementation using Python

 

# importing the library
import numpy as np

# creating the input array: 3 samples with 4 features each
X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]])
print('\n Input:')
print(X)

# creating the output array: one target value per sample
y = np.array([[1], [1], [0]])
print('\n Actual Output:')
print(y)

# defining the sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# derivative of the sigmoid function
# note: x is expected to be a sigmoid output s, so the derivative is s * (1 - s)
def derivatives_sigmoid(x):
    return x * (1 - x)

# initializing the variables
epoch = 5000  # number of training iterations
lr = 0.1  # learning rate
inputlayer_neurons = X.shape[1]  # number of features in data set
hiddenlayer_neurons = 3  # number of hidden layer neurons
output_neurons = 1  # number of neurons at output layer

# initializing weights and biases with random values in [0, 1)
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

# training the model
for i in range(epoch):

    # Forward Propagation
    hidden_layer_input1 = np.dot(X, wh)
    hidden_layer_input = hidden_layer_input1 + bh
    hiddenlayer_activations = sigmoid(hidden_layer_input)
    output_layer_input1 = np.dot(hiddenlayer_activations, wout)
    output_layer_input = output_layer_input1 + bout
    output = sigmoid(output_layer_input)

    # Backpropagation
    E = y - output  # error at the output layer
    slope_output_layer = derivatives_sigmoid(output)
    slope_hidden_layer = derivatives_sigmoid(hiddenlayer_activations)
    d_output = E * slope_output_layer
    Error_at_hidden_layer = d_output.dot(wout.T)  # error passed back to the hidden layer
    d_hiddenlayer = Error_at_hidden_layer * slope_hidden_layer

    # updating weights and biases
    wout += hiddenlayer_activations.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print('\n Output from the model:')
print(output)

·       Output

Input:
[[1 0 1 0]
 [1 0 1 1]
 [0 1 0 1]]

Actual Output:
[[1]
 [1]
 [0]]

Output from the model:
[[0.97803237]
 [0.97104099]
 [0.03975799]]
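One way to check that training is actually reducing the error is to record the mean squared error at every epoch. The sketch below is a compact variant of the same network, not the code above verbatim: the fixed random seed, the use of np.random.default_rng, and the losses list are my own additions for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


X = np.array([[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]], dtype=float)
y = np.array([[1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)    # fixed seed (assumption) so runs are repeatable
wh = rng.uniform(size=(4, 3))     # input -> hidden weights
bh = rng.uniform(size=(1, 3))     # hidden biases
wout = rng.uniform(size=(3, 1))   # hidden -> output weights
bout = rng.uniform(size=(1, 1))   # output bias
lr = 0.1

losses = []
for epoch in range(5000):
    # forward propagation
    hidden = sigmoid(X @ wh + bh)
    output = sigmoid(hidden @ wout + bout)

    # record the mean squared error before updating
    error = y - output
    losses.append(float(np.mean(error ** 2)))

    # back propagation
    d_output = error * output * (1 - output)
    d_hidden = (d_output @ wout.T) * hidden * (1 - hidden)
    wout += lr * (hidden.T @ d_output)
    bout += lr * d_output.sum(axis=0, keepdims=True)
    wh += lr * (X.T @ d_hidden)
    bh += lr * d_hidden.sum(axis=0, keepdims=True)

print('first epoch loss:', losses[0])
print('last epoch loss:', losses[-1])
```

If the last loss is not clearly smaller than the first, that is a hint that the learning rate or the number of epochs needs adjusting. Without a fixed seed, the exact numbers will differ from run to run because the weights start from random values.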

 

 
