The idea is to understand how you can build a small working example of stochastic gradient descent using Python, NumPy and some basic maths.
So what is gradient descent?
You might be bored with the term, and it tends to stay boring without a visualization,
so here I will run through a sample using Python and plot the result with matplotlib.
matplotlib is another library you can install on the go; it draws the simple x,y graphs we used
in our school days. Don't worry, this is pretty simple.
1. install python - https://www.python.org/downloads/
(if you are using windows it is an exe, just run it)
2. once installed, install the libraries from the command line:
pip install numpy matplotlib
Since the function is
y = w1*x1 + w2*x2 + b
we will use the well-known mean squared error loss function (here just the squared error for a single sample):
mse = (y - y_hat)**2
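For example, with a target y_hat = 3 and a current prediction y = 5 (numbers made up just to show the arithmetic), the loss is (5 - 3)**2 = 4; squaring keeps the loss positive and punishes bigger misses more.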
The next step is to calculate the partial derivatives of y with respect to w1, w2 and b:
dy/dw1 = x1
dy/dw2 = x2
dy/db = 1
h = (y - y_hat) ** 2
h = u**2, where u = y - y_hat
dh/du = 2u = 2 (y - y_hat)
du/dy = 1
by chain rule dh/dy = dh/du * du/dy
dh/dy = 2 (y - y_hat) * 1
But what we actually need are the partial derivatives of the loss h with respect to w1, w2 and b,
so we apply the chain rule again
(if you don't know the chain rule, refer to a page on the chain rule for partial derivatives).
so dh/dw1 = dh/dy * dy/dw1
a) dh/dw1 = 2 (y - y_hat) * 1 * x1
dh/dw2 = dh/dy * dy/dw2
b) dh/dw2 = 2 (y - y_hat) * 1 * x2
dh/db = dh/dy * dy/db
c) dh/db = 2 (y - y_hat) * 1 * 1
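A quick way to convince yourself of formulas a), b) and c) is a numerical check: nudge a parameter by a tiny amount and see how the loss moves. This is just a sanity-check sketch with made-up numbers (x1, x2, w1, w2, b and the target y_hat are arbitrary), not part of the training code:
x1, x2, y_hat = 2.0, 3.0, 10.0
w1, w2, b = 0.5, -0.2, 0.1
eps = 1e-6
def loss(w1, w2, b):
    y = w1*x1 + w2*x2 + b              #model output
    return (y - y_hat)**2              #squared error
y = w1*x1 + w2*x2 + b
analytic_dw1 = 2*(y - y_hat)*x1        #formula a)
numeric_dw1 = (loss(w1 + eps, w2, b) - loss(w1 - eps, w2, b)) / (2*eps)
print(analytic_dw1, numeric_dw1)       #the two numbers should match closely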
Ok, now that we have the partial derivatives we can do the update step, i.e. the backprop.
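The update step itself is just "move each parameter a small step against its gradient". With the naming used above (y is the prediction, y_hat is the target) it looks like this; in the full code below the prediction is called y_predict and the target y, but the update is the same:
w1 = w1 - alpha * 2*(y - y_hat)*x1
w2 = w2 - alpha * 2*(y - y_hat)*x2
b  = b  - alpha * 2*(y - y_hat)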
entire code -
import numpy as np
import matplotlib.pyplot as plt
#matrix multiplication for linear regression
#each row of X is one sample: [x1, x2, y]
#loss function: squared error
#learning rate: alpha
#w1 - weight for feature 1 (x1)
#w2 - weight for feature 2 (x2)
#b - bias
#m - total no of samples
w1 = 0.0   #initial weight values are arbitrary; starting at zero is fine here
w2 = 0.0
b = 0
m = 6
alpha = 0.001
epochs = 3
#X matrix of all samples / one sample fed at a time in this example
#the sample values below are made-up example data (they follow y = 2*x1 + 3*x2 + 1)
m1 = np.array([[1, 1, 6]])
m2 = np.array([[2, 1, 8]])
m3 = np.array([[1, 2, 9]])
m4 = np.array([[3, 2, 13]])
m5 = np.array([[2, 3, 14]])
m6 = np.array([[3, 3, 16]])
X = np.concatenate((m1, m2, m3, m4, m5, m6))
#print(X)
#print(X.shape)
#print(X.ndim)
W = np.array([w1,w2])
#print(W)
#print(W.shape)
#print(W.ndim)
#loss_function
def loss_function(y_predict, y):
    Loss = (y_predict - y)**2
    #print("Loss", Loss)
    return Loss
def ypredict(x1, x2):
    _x = np.array([x1, x2])
    #print(_x)
    #matrix multiplication of the weight vector with the feature vector
    y_predict_ = W.dot(_x) + b
    #print("y_predict_", y_predict_)
    return y_predict_
def derivative_naveez_w1(x1, y_predict, y):
    #gradient step for w1: dh/dw1 = 2*x1*(y_predict - y)
    _w1 = w1 - alpha*(2*x1*(y_predict - y))
    #print("_w1>>", _w1)
    return _w1
def derivative_naveez_w2(x2, y_predict, y):
    #gradient step for w2: dh/dw2 = 2*x2*(y_predict - y)
    _w2 = w2 - alpha*(2*x2*(y_predict - y))
    #print("_w2>>", _w2)
    return _w2
def derivative_naveez_b(y_predict, y):
    #gradient step for b: dh/db = 2*(y_predict - y)
    _b = b - alpha*(2*(y_predict - y))
    #print("_b>>", _b)
    return _b
#epoch loop
#sample size loop
y_axis = np.array([])
for epoch in range(epochs):
    for i in range(m):
        #print(X[i][0])
        #print(X[i][1])
        y_predict_ = ypredict(X[i][0], X[i][1])
        loss = loss_function(y_predict_, X[i][2])
        #print("loss>>>>>", loss)
        y_axis = np.append(y_axis, loss)
        _w1 = derivative_naveez_w1(X[i][0], y_predict_, X[i][2])
        #print("w1>>>", w1)
        w1 = _w1
        _w2 = derivative_naveez_w2(X[i][1], y_predict_, X[i][2])
        #print("w2>>>", w2)
        w2 = _w2
        _b = derivative_naveez_b(y_predict_, X[i][2])
        #print("b>>>", b)
        b = _b
        #rebuild W so the next prediction uses the updated weights
        W = np.array([w1, w2])
#one loss value was recorded per sample per epoch, so there are m*epochs points to plot
epoch_x_axis = np.arange(m*epochs)
#print(y_axis)
plt.plot(epoch_x_axis, y_axis)
plt.xlabel('updates (one per sample, per epoch)')
plt.ylabel('Loss')
plt.show()
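Once the loop finishes you can also print the learned parameters to see where SGD ended up, for example:
print("learned parameters:", w1, w2, b)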