
Stochastic gradient descent with multiple variables

The idea is to understand how you can build a small sample of stochastic gradient descent

using Python, NumPy and some basic maths.


So what is gradient descent?

You might be bored with the term, and it is always less boring with a visualization,

so here I will run through a sample in Python and plot the loss with matplotlib for a visual.

matplotlib is another library you can install on the go; it gives you the simple x,y graphs we used

in our school days. Don't worry, this is pretty simple.


1. Install Python - https://www.python.org/downloads/

If you are using Windows it will be an exe; run that.

2. Once installed, open a command line and install the two libraries we need:

pip install numpy matplotlib

Since the function is

y = w1*x1 + w2*x2 + b

we will use the well-known

mean squared error loss function

mse = (y - y_hat)**2
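As a quick sketch of those two lines in Python (the numbers here are made up purely for illustration):

# prediction and squared-error loss for one sample (made-up numbers)
w1, w2, b = 0.5, -0.2, 0.1        # current parameters
x1, x2, y_hat = 2.0, 3.0, 7.0     # one sample: features and target

y = w1 * x1 + w2 * x2 + b         # model output: 0.5
loss = (y - y_hat) ** 2           # squared error: (0.5 - 7.0)**2 = 42.25
print(loss)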

The next step is to calculate the partial derivatives of y with respect to w1, w2 and b:

dy/dw1 = x1

dy/dw2 = x2 

dy/db = 1
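For example, if a sample has x1 = 2 and x2 = 3, then y = w1*2 + w2*3 + b, so dy/dw1 = 2, dy/dw2 = 3 and dy/db = 1.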


Write the loss as h:

h = (y - y_hat)**2

h = u**2, where u = y - y_hat

dh/du = 2u = 2*(y - y_hat)

du/dy = 1

By the chain rule, dh/dy = dh/du * du/dy

dh/dy = 2*(y - y_hat) * 1
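If you want to sanity-check that derivative numerically, a central-difference estimate (again with made-up numbers) should land right on 2*(y - y_hat):

# numeric check of dh/dy = 2*(y - y_hat), with made-up numbers
y, y_hat = 0.5, 7.0
eps = 1e-6
h = lambda y: (y - y_hat) ** 2
numeric = (h(y + eps) - h(y - eps)) / (2 * eps)   # central difference
analytic = 2 * (y - y_hat)
print(numeric, analytic)                          # both approximately -13.0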

 

But what we need are the partial derivatives of h (the loss) with respect to w1, w2 and b,

so we apply the chain rule again.

If you don't know the chain rule, refer to the partial derivative chain rule page.

So dh/dw1 = dh/dy * dy/dw1

a) dh/dw1 = 2*(y - y_hat) * x1

dh/dw2 = dh/dy * dy/dw2

b) dh/dw2 = 2*(y - y_hat) * x2

dh/db = dh/dy * dy/db

c) dh/db = 2*(y - y_hat) * 1

 

OK, now that we have the partial derivatives we can do the backprop, i.e. the gradient descent updates.
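Each update moves a parameter a small step against its gradient, scaled by the learning rate alpha. A minimal sketch of one update, continuing the made-up sample from above:

# one stochastic gradient descent update for a single sample
alpha = 0.001                      # learning rate
w1, w2, b = 0.5, -0.2, 0.1
x1, x2, y_hat = 2.0, 3.0, 7.0
y = w1 * x1 + w2 * x2 + b          # prediction: 0.5

w1 = w1 - alpha * 2 * (y - y_hat) * x1
w2 = w2 - alpha * 2 * (y - y_hat) * x2
b  = b  - alpha * 2 * (y - y_hat)
print(w1, w2, b)                   # parameters nudged toward the target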

 

 

 

The entire code:

 

import numpy as np
import matplotlib.pyplot as plt


#matrix multiplication for linear regression

#each sample row is [x1, x2, y]


#loss function - squared error, defined below
#learning rate - alpha
#w1 - weight for feature 1 (x1)
#w2 - weight for feature 2 (x2)
#b - bias
#m - total no of samples

#initial weights - any small starting value works; 0.0 assumed here
w1 = 0.0
w2 = 0.0

b = 0
m = 6
alpha = 0.001
epochs = 3

#X matrix of all features / one sample at a time for this example
#example rows [x1, x2, y]; assumed data generated from y = 2*x1 + 3*x2 + 1

m1 = np.array([[1, 1, 6]])
m2 = np.array([[2, 1, 8]])
m3 = np.array([[1, 2, 9]])
m4 = np.array([[3, 2, 13]])
m5 = np.array([[2, 3, 14]])
m6 = np.array([[4, 1, 12]])

X = np.concatenate((m1, m2, m3, m4, m5, m6))
#print(X)
#print(X.shape)
#print(X.ndim)

    
#loss_function
def loss_function(y_predict, y):
    #squared error for a single sample
    Loss = (y_predict - y)**2
    return Loss
        

def ypredict(x1, x2):
    _x = np.array([x1, x2])
    #rebuild the weight vector from the current w1, w2 so later updates take effect
    W = np.array([w1, w2])
    #matrix multiplication
    y_predict_ = W.dot(_x) + b
    return y_predict_
    

def derivative_naveez_w1(x1, y_predict, y):
    #gradient step for w1: dh/dw1 = 2*x1*(y_predict - y)
    _w1 = w1 - alpha*(2*x1*(y_predict - y))
    return _w1

def derivative_naveez_w2(x2, y_predict, y):
    #gradient step for w2: dh/dw2 = 2*x2*(y_predict - y)
    _w2 = w2 - alpha*(2*x2*(y_predict - y))
    return _w2

def derivative_naveez_b(y_predict, y):
    #gradient step for b: dh/db = 2*(y_predict - y)
    _b = b - alpha*(2*(y_predict - y))
    return _b


#training: loop over the epochs, then over the samples one at a time (stochastic updates)
y_axis = np.array([])   #loss for every update, used for plotting
for epoch in range(epochs):
    for i in range(m):
        y_predict_ = ypredict(X[i][0], X[i][1])
        loss = loss_function(y_predict_, X[i][2])
        y_axis = np.append(y_axis, loss)
        #update each parameter with its own gradient step
        _w1 = derivative_naveez_w1(X[i][0], y_predict_, X[i][2])
        w1 = _w1
        _w2 = derivative_naveez_w2(X[i][1], y_predict_, X[i][2])
        w2 = _w2
        _b = derivative_naveez_b(y_predict_, X[i][2])
        b = _b

            
                

#one loss value per update, m updates per epoch
update_x_axis = np.arange(m*epochs)
plt.plot(update_x_axis, y_axis)
plt.xlabel('update step')
plt.ylabel('Loss')

plt.show()
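To confirm the training actually moved the parameters, you can print them after the loop; with the example data assumed above they should drift toward w1 = 2, w2 = 3, b = 1 if you raise the number of epochs:

print("w1:", w1, "w2:", w2, "b:", b)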

        




 

