A very clear application of ANOVA: it's an extension of the t-test. What are we trying to prove? Whether there is a difference in mean or no difference in mean. Okay. Now, if we are trying to test whether there is a difference in mean or no difference in mean, then why is the test named ANOVA? What is the full form of ANOVA? Analysis of variance. By analyzing variance we are trying to prove whether the means are the same or not, so why is it called analysis of variance? It should be called analysis of mean.

At the end of the day, what you are trying to prove is whether there is a significant difference in the means of the groups or no difference in the means of the groups, so you should call it ANOM, analysis of mean. Why analysis of variance? The reason I can try to explain pictorially, so that you understand how this makes sense. Okay. I can have some data which is distributed like this; these are the data points, and the mean is coming somewhere here. I can have another distribution like this; the mean is still coming here, but are these two the same? No. Are their variances different? Yes. And I can have something like this: the mean is again overlapping, but with very little variance. So basically I am not just trying to prove the mean is the same; I am also trying to check whether the data is more scattered or less scattered, because to me this segment, this segment and this segment are three different segments, right?
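The three pictures can be mimicked with a few numbers. This is only a minimal sketch: the sample values and the names `wide`, `medium` and `narrow` are made up for illustration and are not data from the lecture.

```python
from statistics import mean, stdev

# Hypothetical samples: all three are centered on 50 but spread differently.
wide = [30, 40, 50, 60, 70]
medium = [44, 47, 50, 53, 56]
narrow = [49, 49.5, 50, 50.5, 51]

# Print the mean and the sample standard deviation of each segment.
for name, data in [("wide", wide), ("medium", medium), ("narrow", narrow)]:
    print(f"{name:>6}: mean={mean(data):.1f}  std={stdev(data):.2f}")
```

The means come out identical while the standard deviations differ, which is the sense in which the three segments are different.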

But if I look only at the mean, they are all the same. So the mean is not the only thing that you are trying to look at; you are trying to see the mean together with the standard deviation. Whenever you do this, you should always recollect the bullet-shot example: the average of the bullets is at the center, so by the mean we hit the target, but by the standard deviation we did not hit the target. Okay, that's why it is an analysis of variance, right?

So with that understanding, let us try to understand what the ANOVA test is. The ANOVA test is calculated based upon the F test statistic. What is the F test? The F statistic is basically the explained variance divided by the unexplained variance. Assume this is the data, the kind of picture you would have seen in K nearest neighbor also. If I use this entire data and compute its variance, will it be high? Yes, it will be some number, and the variance will be high.

Now I convert this data into two groups, separating it like this: this is my group 1, this is group 2. Is the variation within group 1 plus the variation within group 2, each measured as squared deviations from the group's own mean, less than the total variation that I had initially? Yes, it should be less. But by grouping them into group 1 and group 2, has my variance become zero? No. So the amount by which I have been able to reduce the variance is what I have been able to explain by grouping them. However, within this group they still have variance, and within this group they still have variance, which we have still not been able to explain; that is the unexplained variance. So the explained variance is the variability between the groups, that is, the reduction I get by grouping them, and the variability which exists within the groups is what I am not able to explain.
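The split into an explained and an unexplained part can be checked numerically with sums of squared deviations. This is a sketch with made-up numbers; `group1`, `group2` and the helper `sum_sq` are hypothetical names introduced only for this illustration.

```python
from statistics import mean

def sum_sq(values, center):
    """Sum of squared deviations of values from center."""
    return sum((v - center) ** 2 for v in values)

# Hypothetical data: two clearly separated groups.
group1 = [2.0, 3.0, 4.0]
group2 = [10.0, 11.0, 12.0, 13.0]
everything = group1 + group2
grand_mean = mean(everything)

# Total variation, ignoring the grouping.
ss_total = sum_sq(everything, grand_mean)
# Unexplained part: variation of each point around its own group mean.
ss_within = sum_sq(group1, mean(group1)) + sum_sq(group2, mean(group2))
# Explained part: variation of the group means around the overall mean,
# weighted by the number of observations in each group.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in (group1, group2))

print(f"total={ss_total:.3f} within={ss_within:.3f} between={ss_between:.3f}")
```

The within-group part is smaller than the total but not zero, and the between-group part is exactly the reduction obtained by grouping.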

So the unexplained part is the variability within the groups, and the variability which you have been able to explain by grouping them apart is the between-group variability. Okay. With those two elements, the F statistic is the ratio of between-group variability divided by within-group variability. If the value of F is higher, there is high variability between the groups, which means the segments are different, which means there is a significant difference between the means. Okay, that's what we are trying to prove as part of the F test, and that's what we are trying to prove as part of ANOVA.
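To see how the ratio behaves, here is a small sketch that computes F for two made-up data sets. It assumes the standard one-way ANOVA form, in which each variability is divided by its degrees of freedom; the function name `f_statistic` and both data sets are hypothetical.

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)                       # number of groups
    n_total = sum(len(g) for g in groups) # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variability, weighted by group size, over K - 1.
    ms_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    # Within-group variability over N - K.
    ms_within = sum((y - mean(g)) ** 2 for g in groups for y in g) / (n_total - k)
    return ms_between / ms_within

# Same nine numbers, grouped two different ways.
separated = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]]
overlapping = [[1.0, 12.0, 23.0], [2.0, 11.0, 22.0], [3.0, 13.0, 21.0]]

f_sep = f_statistic(separated)
f_over = f_statistic(overlapping)
print(f"separated groups:   F = {f_sep:.2f}")
print(f"overlapping groups: F = {f_over:.4f}")
```

When the grouping separates the data well, F is large; when the groups overlap and their means nearly coincide, F is close to zero. Whether a given F counts as significant still depends on comparing it with the F distribution for the relevant degrees of freedom, which this sketch does not do.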

To simplify the explanation I took two groups, but ANOVA is for when there are more. The first element is the sum over the groups of $n_i(\bar{Y}_i - \bar{Y})^2$, divided by $K - 1$; this is the between-group variability. What is $\bar{Y}_i$? It denotes the sample mean of the i-th group. Okay, what that means is: there is a mean of this group, let me represent it by an X; there is a mean of this group, let me represent it by a black X. Then $\bar{Y}_i - \bar{Y}$: what is $\bar{Y}$? It denotes the overall mean, which, since there's no other color, let me represent by a double X. This is the overall mean, okay? So what you are now doing with $\bar{Y}_i - \bar{Y}$ is taking this distance for group one and taking this distance for group two; $(\bar{Y}_i - \bar{Y})^2$, summed up, is the between-group variability. Are you getting it? And $n_i$ is the number of observations in that group, because when you take $\bar{Y}_i - \bar{Y}$ you have to give weightage to the number of observations in that group, right?

That's why $n_i$ is in the numerator, and that sum divided by $K - 1$, where $K$ is the number of groups, is the first element. Here is the unexplained variability: $Y_{ij} - \bar{Y}_i$. What is $Y_{ij}$? These are the data points, okay, and $\bar{Y}_i$ is the mean of their group. This difference, squared, is the within-group variability. You compute it separately for each group, and what the summation says is that you sum all of it up across the groups; that is the unexplained component, divided by $N - K$. $N$ is the total number of observations and $K$ is the number of groups. Now, the degrees of freedom: if there are five groups, the degrees of freedom will be four. Why four? Let's understand that. I know the overall mean; this is the point where my overall mean is. Okay, I know the position of group 1.

Let's have one more group here; this is its mean. Okay, I know the mean of group 1, the second mean is known, the third mean is known. Do I really need to know the fourth mean? If I have this information, I don't; it is not required, because the overall mean, together with the group sizes, pins it down. That's why the degrees of freedom here is $K - 1$. Okay, and it's the same reason that for variance we take $(X - \bar{X})^2$, sum it, and divide by $n - 1$. $n - 1$ is what we do there, and that's why we take $K - 1$ here, because here we are doing the calculation on the group centers, four or five of them, and not on the individual points. Okay.
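The point that the last group mean is not free can be checked with a few lines of arithmetic. This is a sketch with made-up group sizes and means; none of these numbers come from the lecture.

```python
# Hypothetical setup: four groups, sizes known, overall mean known,
# and the means of the first three groups known.
sizes = [4, 6, 5, 5]
known_means = [10.0, 12.0, 15.0]
overall_mean = 13.0

# The overall mean fixes the grand total of all observations.
total_sum = overall_mean * sum(sizes)
# The first three group means fix the totals of those three groups.
known_sum = sum(n * m for n, m in zip(sizes, known_means))
# Whatever is left must be the total of the fourth group.
fourth_mean = (total_sum - known_sum) / sizes[-1]

print(f"fourth group mean is forced to be {fourth_mean:.2f}")
```

Only three of the four group means can vary freely once the overall mean is fixed, which is the $K - 1$ in the between-group term.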

That's the formula. At the bottom it takes $N - K$. The means are already accounted for in the calculation above; in the calculation below, for each group you have to do $n_i - 1$. If there are four groups, it is $n_1 - 1$ for the first group, $n_2 - 1$ for the second group, and so on, so in total $N$ minus $K$ is what you take. So this is the formula. Okay, small $n_i$ is the number of observations in the i-th group, which I have not documented here; capital $N$ is the total number of observations; $K$ is the number of groups. Source: Wikipedia.
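Putting the spoken pieces together, the statistic can be written out as below. This is a reconstruction from the definitions given in the lecture (group means, overall mean, group sizes, total size, number of groups), so the exact layout on the original slide is an assumption.

```latex
F = \frac{\text{between-group variability}}{\text{within-group variability}}
  = \frac{\displaystyle \sum_{i=1}^{K} n_i \,(\bar{Y}_i - \bar{Y})^2 \,/\, (K - 1)}
         {\displaystyle \sum_{i=1}^{K} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_i)^2 \,/\, (N - K)}
```

Here $\bar{Y}_i$ is the mean of the i-th group, $\bar{Y}$ is the overall mean, $n_i$ is the number of observations in the i-th group, $N$ is the total number of observations, and $K$ is the number of groups.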