So what is the learning rate, and why was it introduced?
It is sometimes called the step size. Why step size?
Because it controls the size of the step taken to get from point a to point b. Alright, you might be confused,
so let's go through an example.
y = m * x (line equation)
x1 = x0 + 𝛿x
m1 = m0 + 𝛿m
OK, so what are these equations?
We will look at them one at a time.
If y = m * x and y = 3 and x = 2, the value of m would be
m = y / x = 3 / 2 = 1.5
which is the slope of the line.
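In Python this is just a couple of lines (a sketch with the numbers above; the variable names are mine):

x, y = 2.0, 3.0
m = y / x            # slope: 3 / 2 = 1.5
print(m)             # prints 1.5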
This could be a real-world example: think of x as the input and y as the output.
x can be the year the gold was purchased,
y can be the gold rate,
and the gold rate increases as the year increases.
The numbers are just examples, don't think too much about them for now.
In this case I know x and y, and I am going to keep 'm' as the parameter
to be found by iteration, or by techniques from machine learning. OK, let's go.
we will take this example
m1 = m0 + 𝛿m
y = m * x , x = 2 , y = 3
The idea here is to find the optimized value, or a value close to
m = y / x = 3 / 2 = 1.5
So why this exercise? In a real-world problem, the inputs and outputs will be our data.
The reason we work on m here is that m is the parameter we optimize:
we cannot change x or y; all we can do is manipulate what we compute from the existing data.
let us assume m = 0.5 here
so m0 = 0.5
y = m * x
y_predict_value = 0.5 * 2 = 1
loss = y - y_predict_value
l = 3 - 1 = 2 , so the loss is 2
here the loss function is l = y - y_predict_value
from now on I will call y_predict_value 'y_hat'
l = y - y_hat
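Here is a minimal Python sketch of the setup so far (the variable names are my own, just for illustration):

x, y = 2.0, 3.0
m = 0.5              # initial guess m0
y_hat = m * x        # prediction: 0.5 * 2 = 1
l = y - y_hat        # loss: 3 - 1 = 2
print(y_hat, l)      # prints 1.0 2.0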
So what can I do to minimise the loss? All I can do is change/modify the value of m.
In computer/machine-learning terms this is called optimizing the value of 'm'
to minimize the loss function 'l'.
So how do we minimise 'l'?
y_hat = m * x
What is the change in 'l' for the smallest change I can make in 'm'? That is 𝛿l/𝛿m.
So what is this 𝛿l/𝛿m? It is the partial derivative of the loss function 'l' with respect to 'm',
and not 𝛿y_hat/𝛿m, the partial derivative of the prediction y_hat with respect to 'm'. Alright:
y_hat = m * x
𝛿y_hat / 𝛿m = x
l = y - y_hat
𝛿l/𝛿y_hat = -1
𝛿l/𝛿m = [𝛿y_hat / 𝛿m] * [ 𝛿l/𝛿y_hat]
= x * -1 = -x
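The same chain rule as a quick Python sketch (plain numbers, nothing more):

x = 2.0
dyhat_dm = x                  # 𝛿y_hat/𝛿m = x
dl_dyhat = -1.0               # 𝛿l/𝛿y_hat = -1
dl_dm = dyhat_dm * dl_dyhat   # chain rule: x * -1 = -x
print(dl_dm)                  # prints -2.0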
m1 = m0 + 𝛿l/𝛿m
= 0.5 + (-x) = 0.5 - 2 = -1.5
y_hat = -1.5 * 2 = - 3
l = y - y_hat = 3 - (-3) = 6
so the loss is now 6 and not zero; it went up instead of down
ok
m2 = m1 + 𝛿l/𝛿m
= -1.5 + ( - x ) = -1.5 - 2 = -3.5
y_hat = -3.5 * 2 = - 7
l = y - y_hat = 3 - (-7) = 10
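Here are those two updates as a Python sketch (note the plus sign in the update, which is exactly what we did above):

x, y = 2.0, 3.0
m = 0.5                       # m0
for step in (1, 2):
    dl_dm = -x                # 𝛿l/𝛿m = -x
    m = m + dl_dm             # m_new = m_old + 𝛿l/𝛿m (the plus is the problem)
    y_hat = m * x
    print(step, m, y - y_hat) # step 1: loss 6 , step 2: loss 10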
The loss seems to be increasing rather than decreasing.
Since the update was m1 = m0 + 𝛿m,
I am going to change it to m1 = m0 - 𝛿m.
So why this negative? -𝛿m is the descent direction, the direction that minimises the loss.
so now m1 = 0.5 - (-2) = 2.5
m1 = 2.5. OK, but the expected value of m should be around '1.5' (refer to the slope above).
Since the step we took from m0 to m1 was too large, we have
crossed the value '1.5' and landed at '2.5'.
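A quick Python check of that overshoot (this is the update with no learning rate yet):

x = 2.0
m0 = 0.5
dl_dm = -x            # 𝛿l/𝛿m = -2
m1 = m0 - dl_dm       # 0.5 - (-2) = 2.5 , past the target 1.5
print(m1)             # prints 2.5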
this is where we use the learning rate ' 𝝰 '
m1 = m0 - 𝝰 * 𝛿l/𝛿m
the value of '𝝰' can be anything; let us assume '𝝰 = 0.5' for now.
earlier, the value of '𝝰' was effectively '1'. How?
m1 = m0 - 1 * (𝛿l/𝛿m) , that's it
now we will do the iteration again with the new value of ' 𝝰 = 0.5'
m1 = 0.5 - 0.5 * (-2) = 0.5 + 1 = 1.5
y_hat = 1.5 * 2 = 3
l = y - y_hat = 3 - 3 = 0
So now we have arrived at the value of 'm' in a single update.
Earlier the step was too large, and by taking repeated steps we kept missing
the value.
By reducing the step size, we have arrived at the optimal value.
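Putting it all together as a small Python sketch of the final update (again, the names are mine, for illustration):

x, y = 2.0, 3.0
m0 = 0.5
alpha = 0.5               # learning rate 𝝰
dl_dm = -x                # 𝛿l/𝛿m = -x = -2
m1 = m0 - alpha * dl_dm   # 0.5 - 0.5 * (-2) = 1.5
y_hat = m1 * x            # 1.5 * 2 = 3
l = y - y_hat             # 3 - 3 = 0
print(m1, y_hat, l)       # prints 1.5 3.0 0.0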