
Learning rate

So what is the learning rate, and why was it introduced?

It is sometimes called the step size. Why? Because it controls the size of the

step taken to get from point A to point B. Alright, that might sound confusing,

so let's go through an example.

 

y = m * x   (the line equation)

x1 = x0 + 𝛿x

m1 = m0 + 𝛿m

OK, so what do these equations mean?

We will look at them one at a time.

If y = m * x, and y = 3 and x = 2, the value of m would be

m = y / x = 3 / 2 = 1.5

which is the slope of the line.
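As a quick check, here is a minimal Python sketch of the same calculation (the names x, y and m simply mirror the post):

```python
# Solve for the slope m directly from the known input/output pair.
x, y = 2, 3
m = y / x
print(m)  # 1.5
```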

This maps to a real-world setting: think of x as the input and y as the output.

x could be the year in which the gold was purchased,

y could be the gold rate for that year;

the gold rate increases as the years go by.

The numbers in this example are arbitrary, so don't read too much into them for now.

In this case I know x and y, and I am going to keep 'm' as the parameter

to be found by iteration, the way it is done in machine learning. OK, let's go.

 

We will take this example:

m1 = m0 +𝛿m 

 y = m * x  , x =2 , y = 3 

The idea here is to find the optimal value of m, or a value close to

m = y /x = 3 /2 = 1.5

So why this trial-and-error? In a real-world example, the input and output are given data;

we cannot change x or y, so all we can do is adjust the parameter m that relates them.

That is why m is the value we optimize in the equation.


Let us assume m = 0.5 here,

so m0 = 0.5.

y = m * x 

y_predict_value = 0.5 * 2 = 1

loss = y - y_predict_value

l = 3 - 1 = 2, so the loss is 2.
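Here is a small sketch of this first guess, using the same numbers (m0, y_hat and loss are named after the quantities above):

```python
# Guess an initial slope m0 and measure how far off the prediction is.
x, y = 2, 3          # known input and output
m0 = 0.5             # initial guess for the slope
y_hat = m0 * x       # predicted output: 0.5 * 2 = 1
loss = y - y_hat     # loss as defined in the post: 3 - 1 = 2
print(y_hat, loss)   # 1.0 2.0
```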

 

Here the loss function is l = y - y_predict_value.

From now on I will refer to y_predict_value as y_hat:

l = y - y_hat

So what can I do to minimise the loss? All I can change or modify is the value of m.

In machine-learning terms this is called optimizing the parameter 'm'

to minimize the loss function 'l'.

So how do we minimise 'l'?

y_hat = m * x

What we need is how much the loss 'l' changes for a small change in 'm', that is 𝛿l/𝛿m.

This is the partial derivative of the loss function 'l' with respect to m,

and not 𝛿y_hat/𝛿m, the partial derivative of the prediction y_hat with respect to m. Alright.


y_hat = m * x, with m currently at 0.5

𝛿y_hat / 𝛿m =  x

l = y - y_hat

𝛿l/𝛿y_hat  = -1

 

𝛿l/𝛿m = [𝛿y_hat/𝛿m] * [𝛿l/𝛿y_hat]   (chain rule)

= x * (-1) = -x = -2
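The same chain-rule product, written out as a sketch (dyhat_dm, dl_dyhat and dl_dm are just illustrative names for the derivatives above):

```python
# Chain rule for the post's loss l = y - y_hat, with y_hat = m * x:
#   dl/dm = (dy_hat/dm) * (dl/dy_hat) = x * (-1) = -x
x = 2
dyhat_dm = x                 # derivative of y_hat = m * x with respect to m
dl_dyhat = -1                # derivative of l = y - y_hat with respect to y_hat
dl_dm = dyhat_dm * dl_dyhat
print(dl_dm)                 # -2
```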

m1 = m0 + 𝛿l/𝛿m

 = 0.5 + (-x) = 0.5  - 2 = -1.5  


y_hat  = -1.5 * 2 = - 3 

l = y -  y_hat = 3 - (-3) = 6 

So the loss is now 6, not zero; it has actually increased.

OK, let's take another step:

m2  = m1 + 𝛿l/𝛿m

 = -1.5 + ( - x ) = -1.5 - 2 = -3.5 

 y_hat  = -3.5 * 2 = - 7

 

l = y -  y_hat = 3 - (-7) = 10
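Here is a short sketch that reproduces these two "plus gradient" updates and shows the loss growing instead of shrinking:

```python
# Repeat the update m = m + dl/dm and watch the loss grow.
x, y = 2, 3
dl_dm = -x                # gradient derived above; constant for this loss
m = 0.5                   # starting guess m0
for step in (1, 2):
    m = m + dl_dm         # m0 -> -1.5 -> -3.5
    loss = y - m * x
    print(step, m, loss)  # 1 -1.5 6.0, then 2 -3.5 10.0
```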

 

The loss keeps increasing instead of decreasing.

Since the update was m1 = m0 + 𝛿l/𝛿m,

I am going to change it to m1 = m0 - 𝛿l/𝛿m.

Why the negative sign? Because -𝛿l/𝛿m is the descent direction, the direction that minimises the loss.

So now m1 = 0.5 - (-2) = 2.5.

m1 = 2.5, but the expected value of m should be '1.5' (see the slope computed above).

Since the step we took from m0 to m1 was too large, we

overshot the value '1.5' and landed at '2.5'.

This is where the learning rate '𝝰' comes in:

m1 = m0 - 𝝰 * 𝛿l/𝛿m 
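As a tiny sketch, the rule can be written as a one-line function (the name step is illustrative, not from the post):

```python
# One gradient-descent update: move m against the gradient, scaled by alpha.
def step(m, dl_dm, alpha):
    return m - alpha * dl_dm

# With alpha = 1 this reproduces the overshoot above: 0.5 - 1 * (-2) = 2.5
print(step(0.5, -2, alpha=1))  # 2.5
```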


The value of '𝝰' can be anything; for now let us assume '𝝰 = 0.5'.

Earlier the value of '𝝰' was effectively '1'. How?

m1 = m0 - 1 * (𝛿l/𝛿m), that's it.

Now we will do the iteration again with the new value '𝝰 = 0.5':


m1 = 0.5 - 0.5 (-2) = 1.5 

y_hat =  1.5 * 2 = 3

l = y - y_hat = 3 - 3 = 0

So now we have arrived at the correct value of 'm'.

Earlier the step was too large, so the update kept overshooting the target value;

by reducing the step size, we arrive at the optimal value.
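To put the whole walkthrough together, here is a minimal sketch, assuming the post's numbers, that compares one update taken with 𝝰 = 1 against one taken with 𝝰 = 0.5:

```python
# Compare a single update of m with a large and a small learning rate.
x, y = 2, 3
m0 = 0.5
dl_dm = -x                    # gradient of l = y - m * x with respect to m

for alpha in (1.0, 0.5):
    m1 = m0 - alpha * dl_dm   # m1 = m0 - alpha * dl/dm
    loss = y - m1 * x
    print(alpha, m1, loss)
# alpha = 1.0 -> m1 = 2.5, loss = -2.0 (overshot the target slope 1.5)
# alpha = 0.5 -> m1 = 1.5, loss =  0.0 (landed on the target slope)
```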
