So what is the learning rate, and why was it introduced?
It is sometimes called the step size. Why step size?
Because it controls the size of the step taken to get from point a to point b. Alright, you might be confused,
so let's go through an example.
y = m * x (line equation)
x1 = x0 + 𝛿x
m1 = m0 + 𝛿m
OK, so what are these equations?
We will look at them one at a time.
If y = m * x and y = 3 and x = 2, the value of m would be
m = y / x = 3 / 2 = 1.5
which is the slope of the line.
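In Python this is just a couple of lines (a sketch with the numbers above; the variable names are mine):

x, y = 2.0, 3.0
m = y / x            # slope: 3 / 2 = 1.5
print(m)             # prints 1.5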
This could be a real-world example: think of x as the input and y as the output.
x can be the year the gold was purchased,
y can be the gold rate,
and the gold rate increases as the year increases.
The numbers are just examples, don't think too much about them for now.
In this case I know x and y, and I am going to keep 'm' as the parameter
to be found by iteration, or by techniques from machine learning. OK, let's go.
we will take this example
m1 = m0 + 𝛿m
y = m * x , x = 2 , y = 3
The idea here is to find the optimized value, or a value close to
m = y / x = 3 / 2 = 1.5
So why this exercise? In a real-world problem, the inputs and outputs will be our data.
The reason we work on m here is that m is the parameter we optimize:
we cannot change x or y; all we can do is manipulate what we compute from the existing data.
let us assume m = 0.5 here
so m0 = 0.5
y = m * x
y_predict_value = 0.5 * 2 = 1
loss = y - y_predict_value
l = 3 - 1 = 2 , so the loss is 2
here the loss function is l = y - y_predict_value
from now on I will call y_predict_value 'y_hat'
l = y - y_hat
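Here is a minimal Python sketch of the setup so far (the variable names are my own, just for illustration):

x, y = 2.0, 3.0
m = 0.5              # initial guess m0
y_hat = m * x        # prediction: 0.5 * 2 = 1
l = y - y_hat        # loss: 3 - 1 = 2
print(y_hat, l)      # prints 1.0 2.0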
So what can I do to minimise the loss? All I can do is change/modify the value of m.
In computer/machine-learning terms this is called optimizing the value of 'm'
to minimize the loss function 'l'.
So how do we minimise 'l'?
y_hat = m * x
What is the change in 'l' for the smallest change I can make in 'm'? That is 𝛿l/𝛿m.
So what is this 𝛿l/𝛿m? It is the partial derivative of the loss function 'l' with respect to 'm',
and not 𝛿y_hat/𝛿m, the partial derivative of the prediction y_hat with respect to 'm'. Alright:
y_hat = m * x
𝛿y_hat / 𝛿m = x
l = y - y_hat
𝛿l/𝛿y_hat = -1
𝛿l/𝛿m = [𝛿y_hat / 𝛿m] * [ 𝛿l/𝛿y_hat]
= x * -1 = -x
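The same chain rule as a quick Python sketch (plain numbers, nothing more):

x = 2.0
dyhat_dm = x                  # 𝛿y_hat/𝛿m = x
dl_dyhat = -1.0               # 𝛿l/𝛿y_hat = -1
dl_dm = dyhat_dm * dl_dyhat   # chain rule: x * -1 = -x
print(dl_dm)                  # prints -2.0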
m1 = m0 + 𝛿l/𝛿m
= 0.5 + (-x) = 0.5 - 2 = -1.5
y_hat = -1.5 * 2 = - 3
l = y - y_hat = 3 - (-3) = 6
so the loss is now 6 and not zero; it went up instead of down
ok
m2 = m1 + 𝛿l/𝛿m
= -1.5 + ( - x ) = -1.5 - 2 = -3.5
y_hat = -3.5 * 2 = - 7
l = y - y_hat = 3 - (-7) = 10
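Here are those two updates as a Python sketch (note the plus sign in the update, which is exactly what we did above):

x, y = 2.0, 3.0
m = 0.5                       # m0
for step in (1, 2):
    dl_dm = -x                # 𝛿l/𝛿m = -x
    m = m + dl_dm             # m_new = m_old + 𝛿l/𝛿m (the plus is the problem)
    y_hat = m * x
    print(step, m, y - y_hat) # step 1: loss 6 , step 2: loss 10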
The loss seems to be increasing rather than decreasing.
Since the update was m1 = m0 + 𝛿m,
I am going to change it to m1 = m0 - 𝛿m.
So why this negative? -𝛿m is the descent direction, the direction that minimises the loss.
so now m1 = 0.5 - (-2) = 2.5
m1 = 2.5. OK, but the expected value of m should be around '1.5' (refer to the slope above).
Since the step we took from m0 to m1 was too large, we have
crossed the value '1.5' and landed at '2.5'.
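A quick Python check of that overshoot (this is the update with no learning rate yet):

x = 2.0
m0 = 0.5
dl_dm = -x            # 𝛿l/𝛿m = -2
m1 = m0 - dl_dm       # 0.5 - (-2) = 2.5 , past the target 1.5
print(m1)             # prints 2.5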
this is where we use the learning rate ' 𝝰 '
m1 = m0 - 𝝰 * 𝛿l/𝛿m
the value of '𝝰' can be anything; let us assume '𝝰 = 0.5' for now.
earlier, the value of '𝝰' was effectively '1'. How?
m1 = m0 - 1 * (𝛿l/𝛿m) , that's it
now we will do the iteration again with the new value of ' 𝝰 = 0.5'
m1 = 0.5 - 0.5 * (-2) = 0.5 + 1 = 1.5
y_hat = 1.5 * 2 = 3
l = y - y_hat = 3 - 3 = 0
So now we have arrived at the value of 'm' in a single update.
Earlier the step was too large, and by taking repeated steps we kept missing
the value.
By reducing the step size, we have arrived at the optimal value.
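Putting it all together as a small Python sketch of the final update (again, the names are mine, for illustration):

x, y = 2.0, 3.0
m0 = 0.5
alpha = 0.5               # learning rate 𝝰
dl_dm = -x                # 𝛿l/𝛿m = -x = -2
m1 = m0 - alpha * dl_dm   # 0.5 - 0.5 * (-2) = 1.5
y_hat = m1 * x            # 1.5 * 2 = 3
l = y - y_hat             # 3 - 3 = 0
print(m1, y_hat, l)       # prints 1.5 3.0 0.0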