
Stochastic gradient descent with multiple variables

The idea is to understand how you can build a small sample of stochastic gradient descent

using Python, NumPy and some basic maths.


So what is gradient descent?

You might be bored with the term, and it is always less boring with a visualization,

so here I will run through a sample in Python and plot the loss with matplotlib for a visual.

matplotlib is another library you can install on the go; it gives you the simple x,y graphs we used

in our school days. Don't worry, this is pretty simple.


1. Install Python - https://www.python.org/downloads/

If you are using Windows it will be an exe; run that.

2. Once installed, open a command line and install the two libraries we need:

pip install numpy matplotlib

Since the function is

y = w1*x1 + w2*x2 + b

we will use the well-known

mean squared error loss function

mse = (y - y_hat)**2
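As a quick sketch of those two lines in Python (the numbers here are made up purely for illustration):

# prediction and squared-error loss for one sample (made-up numbers)
w1, w2, b = 0.5, -0.2, 0.1        # current parameters
x1, x2, y_hat = 2.0, 3.0, 7.0     # one sample: features and target

y = w1 * x1 + w2 * x2 + b         # model output: 0.5
loss = (y - y_hat) ** 2           # squared error: (0.5 - 7.0)**2 = 42.25
print(loss)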

The next step is to calculate the partial derivatives of y with respect to w1, w2 and b:

dy/dw1 = x1

dy/dw2 = x2 

dy/db = 1
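For example, if a sample has x1 = 2 and x2 = 3, then y = w1*2 + w2*3 + b, so dy/dw1 = 2, dy/dw2 = 3 and dy/db = 1.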


Write the loss as h:

h = (y - y_hat)**2

h = u**2, where u = y - y_hat

dh/du = 2u = 2*(y - y_hat)

du/dy = 1

By the chain rule, dh/dy = dh/du * du/dy

dh/dy = 2*(y - y_hat) * 1
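If you want to sanity-check that derivative numerically, a central-difference estimate (again with made-up numbers) should land right on 2*(y - y_hat):

# numeric check of dh/dy = 2*(y - y_hat), with made-up numbers
y, y_hat = 0.5, 7.0
eps = 1e-6
h = lambda y: (y - y_hat) ** 2
numeric = (h(y + eps) - h(y - eps)) / (2 * eps)   # central difference
analytic = 2 * (y - y_hat)
print(numeric, analytic)                          # both approximately -13.0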

 

But what we need are the partial derivatives of h (the loss) with respect to w1, w2 and b,

so we apply the chain rule again.

If you don't know the chain rule, refer to the partial derivative chain rule page.

So dh/dw1 = dh/dy * dy/dw1

a) dh/dw1 = 2*(y - y_hat) * x1

dh/dw2 = dh/dy * dy/dw2

b) dh/dw2 = 2*(y - y_hat) * x2

dh/db = dh/dy * dy/db

c) dh/db = 2*(y - y_hat) * 1

 

OK, now that we have the partial derivatives we can do the backprop, i.e. the gradient descent updates.
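Each update moves a parameter a small step against its gradient, scaled by the learning rate alpha. A minimal sketch of one update, continuing the made-up sample from above:

# one stochastic gradient descent update for a single sample
alpha = 0.001                      # learning rate
w1, w2, b = 0.5, -0.2, 0.1
x1, x2, y_hat = 2.0, 3.0, 7.0
y = w1 * x1 + w2 * x2 + b          # prediction: 0.5

w1 = w1 - alpha * 2 * (y - y_hat) * x1
w2 = w2 - alpha * 2 * (y - y_hat) * x2
b  = b  - alpha * 2 * (y - y_hat)
print(w1, w2, b)                   # parameters nudged toward the target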

 

 

 

The entire code:

 

import numpy as np
import matplotlib.pyplot as plt


#matrix multiplication for linear regression

#each sample row is [x1, x2, y]


#loss function - squared error, defined below
#learning rate - alpha
#w1 - weight for feature 1 (x1)
#w2 - weight for feature 2 (x2)
#b - bias
#m - total no of samples

#initial weights - any small starting value works; 0.0 assumed here
w1 = 0.0
w2 = 0.0

b = 0
m = 6
alpha = 0.001
epochs = 3

#X matrix of all features / one sample at a time for this example
#example rows [x1, x2, y]; assumed data generated from y = 2*x1 + 3*x2 + 1

m1 = np.array([[1, 1, 6]])
m2 = np.array([[2, 1, 8]])
m3 = np.array([[1, 2, 9]])
m4 = np.array([[3, 2, 13]])
m5 = np.array([[2, 3, 14]])
m6 = np.array([[4, 1, 12]])

X = np.concatenate((m1, m2, m3, m4, m5, m6))
#print(X)
#print(X.shape)
#print(X.ndim)

    
#loss_function
def loss_function(y_predict, y):
    #squared error for a single sample
    Loss = (y_predict - y)**2
    return Loss
        

def ypredict(x1, x2):
    _x = np.array([x1, x2])
    #rebuild the weight vector from the current w1, w2 so later updates take effect
    W = np.array([w1, w2])
    #matrix multiplication
    y_predict_ = W.dot(_x) + b
    return y_predict_
    

def derivative_naveez_w1(x1, y_predict, y):
    #gradient step for w1: dh/dw1 = 2*x1*(y_predict - y)
    _w1 = w1 - alpha*(2*x1*(y_predict - y))
    return _w1

def derivative_naveez_w2(x2, y_predict, y):
    #gradient step for w2: dh/dw2 = 2*x2*(y_predict - y)
    _w2 = w2 - alpha*(2*x2*(y_predict - y))
    return _w2

def derivative_naveez_b(y_predict, y):
    #gradient step for b: dh/db = 2*(y_predict - y)
    _b = b - alpha*(2*(y_predict - y))
    return _b


#training: loop over the epochs, then over the samples one at a time (stochastic updates)
y_axis = np.array([])   #loss for every update, used for plotting
for epoch in range(epochs):
    for i in range(m):
        y_predict_ = ypredict(X[i][0], X[i][1])
        loss = loss_function(y_predict_, X[i][2])
        y_axis = np.append(y_axis, loss)
        #update each parameter with its own gradient step
        _w1 = derivative_naveez_w1(X[i][0], y_predict_, X[i][2])
        w1 = _w1
        _w2 = derivative_naveez_w2(X[i][1], y_predict_, X[i][2])
        w2 = _w2
        _b = derivative_naveez_b(y_predict_, X[i][2])
        b = _b

            
                

#one loss value per update, m updates per epoch
update_x_axis = np.arange(m*epochs)
plt.plot(update_x_axis, y_axis)
plt.xlabel('update step')
plt.ylabel('Loss')

plt.show()
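To confirm the training actually moved the parameters, you can print them after the loop; with the example data assumed above they should drift toward w1 = 2, w2 = 3, b = 1 if you raise the number of epochs:

print("w1:", w1, "w2:", w2, "b:", b)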

        




 

