
Learning rate

So what is the learning rate, and why was it introduced?

It is sometimes called the step size. Why is it called the step size?

Because it controls the size of the step taken to move from point a to point b. You might be confused, so

let's go through an example.

 

y = m * x  (line equation)

x1 = x0 + 𝛿x

m1 = m0 + 𝛿m

So what are these equations?

We will look at them one at a time.

If y = m * x, and y = 3 and x = 2, the value of m would be

m = y / x = 3 / 2 = 1.5

which is the slope of the line.

These might map to a real-world example: think of x as the input and y as the output.

x can be the year gold was purchased,

y can be the gold rate,

and the gold rate increases as the year increases.

The examples are just numbers; don't think too much about them for now.

In this case I know x and y, and I am going to keep 'm' as the parameter

to be found by iteration, or by techniques in machine learning. OK, let's go.

 

We will take this example:

m1 = m0 + 𝛿m

y = m * x, with x = 2 and y = 3

The idea here is to find the optimized value of m, or a value close to

m = y / x = 3 / 2 = 1.5

So why this trial? Because in a real-world problem, x and y would be the input and output data.

The reason we work on 'm' here is that 'm' is what we optimize in the equation:

we cannot change x or y; all we can do is manipulate the parameter on the existing data.


Let us assume m = 0.5 here,

so m0 = 0.5

y = m * x

y_predict_value = 0.5 * 2 = 1

loss = y - y_predict_value

l = 3 - 1 = 2, so the loss is 2
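The numbers above can be checked with a short Python sketch (the variable names are mine, chosen to match the text):

```python
# One data point: x = 2, y = 3, with an initial guess m0 = 0.5.
x, y = 2, 3
m0 = 0.5

y_hat = m0 * x       # prediction: 0.5 * 2 = 1
loss = y - y_hat     # the loss as defined in the text: y - y_hat

print(y_hat, loss)   # 1.0 2.0
```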

 

Here the loss function is l = y - y_predict_value.

From now on I will call y_predict_value 'y_hat':

l = y - y_hat

So what can I do to minimise the loss? All I can do is change/modify the value of m.

In computer/machine-learning terms this is called optimizing the value of 'm'

to minimize the loss function 'l'.

So how do we minimise 'l'?

y_predict = m * x

The quantity I need is how much the loss changes for a small change in 'm', which is 𝛿l/𝛿m.

Note that this is the partial derivative of the loss function 'l' with respect to 'm',

and not the partial derivative 𝛿y_hat/𝛿m of the prediction y_hat; we reach 𝛿l/𝛿m through that term via the chain rule. Alright.


y_hat = m * x  (with m currently 0.5)

𝛿y_hat / 𝛿m = x

l = y - y_hat

𝛿l / 𝛿y_hat = -1

By the chain rule:

𝛿l/𝛿m = [𝛿y_hat / 𝛿m] * [𝛿l / 𝛿y_hat] = x * (-1) = -x
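The chain-rule computation can be written out in plain Python as well (a sketch, same numbers as above):

```python
x = 2

dyhat_dm = x    # y_hat = m * x, so d(y_hat)/dm = x
dl_dyhat = -1   # l = y - y_hat, so dl/d(y_hat) = -1

# chain rule: dl/dm = d(y_hat)/dm * dl/d(y_hat) = x * (-1) = -x
dl_dm = dyhat_dm * dl_dyhat
print(dl_dm)    # -2
```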

m1 = m0 + 𝛿l/𝛿m

= 0.5 + (-x) = 0.5 - 2 = -1.5

y_hat = -1.5 * 2 = -3

l = y - y_hat = 3 - (-3) = 6

So the loss is now 6, not zero; it has actually grown.

OK, let's take one more step:

m2 = m1 + 𝛿l/𝛿m

= -1.5 + (-x) = -1.5 - 2 = -3.5

y_hat = -3.5 * 2 = -7

l = y - y_hat = 3 - (-7) = 10
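Running this (wrong-signed) update m_new = m_old + 𝛿l/𝛿m in a loop reproduces the growing losses above (a sketch with the same numbers):

```python
x, y = 2, 3
m = 0.5
grad = -x                 # dl/dm = -x, constant for this loss

for step in (1, 2):
    m = m + grad          # adding the gradient: the wrong direction
    y_hat = m * x
    loss = y - y_hat
    print(step, m, loss)  # step 1: m = -1.5, loss = 6
                          # step 2: m = -3.5, loss = 10
```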

 

The loss seems to be increasing rather than decreasing.

Since the update was m1 = m0 + 𝛿l/𝛿m,

I am going to change it to m1 = m0 - 𝛿l/𝛿m.

So why the negative? Because -𝛿l/𝛿m is the descent direction, the direction that minimises the loss.

So now m1 = 0.5 - (-2) = 2.5.

m1 = 2.5, but the expected value of m should be around 1.5 (refer to the slope above).

Since the step we took from m0 to m1 was too large, we have

overshot the value 1.5 and landed at 2.5.

This is where we use the learning rate '𝝰':

m1 = m0 - 𝝰 * 𝛿l/𝛿m

The value of '𝝰' can be anything; let us assume for now that '𝝰 = 0.5'.

Earlier, the value of '𝝰' was effectively '1'. How?

m1 = m0 - 1 * (𝛿l/𝛿m). That's it.

Now we will do the iteration again with the new value '𝝰 = 0.5':

m1 = 0.5 - 0.5 * (-2) = 1.5

y_hat = 1.5 * 2 = 3

l = y - y_hat = 3 - 3 = 0

So now we have arrived at the correct value of 'm'.

Earlier the step was too large, and with repeated large steps we kept overshooting

the value;

by reducing the step size, we have arrived at the optimal value.
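The corrected update with the learning rate, m_new = m_old - 𝝰 * 𝛿l/𝛿m, can be sketched end to end (same numbers, 𝝰 = 0.5):

```python
x, y = 2, 3
m = 0.5
alpha = 0.5              # learning rate
grad = -x                # dl/dm = -x

m = m - alpha * grad     # 0.5 - 0.5 * (-2) = 1.5
y_hat = m * x            # 1.5 * 2 = 3
loss = y - y_hat         # 3 - 3 = 0

print(m, y_hat, loss)    # 1.5 3.0 0.0
```

A smaller 𝝰 would take more iterations to reach 1.5; the step size trades speed against overshooting.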
