We'll look at that in an example as well. So how do you transform nonlinear data into linear data? Exactly: transforming from nonlinear to linear requires moving into a higher-dimensional space, which is what we are trying to avoid. We are trying to reduce dimensions, but this route would force us to go up to a higher dimension and then come back down to where we started, which may not be very helpful. We can still try it, and sometimes it helps, but it forces us into a higher dimension when we are already starting from a high-dimensional space.
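As a small illustration of that idea, here is a minimal sketch (my own toy example, not data from the lecture) showing how lifting nonlinear data into one extra dimension can make it linearly separable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonlinear data: two concentric rings (not linearly separable in 2-D).
theta = rng.uniform(0, 2 * np.pi, 200)
inner = np.c_[1.0 * np.cos(theta), 1.0 * np.sin(theta)]
outer = np.c_[3.0 * np.cos(theta), 3.0 * np.sin(theta)]

# Lift each 2-D point into 3-D by adding the feature x^2 + y^2.
def lift(points):
    return np.c_[points, (points ** 2).sum(axis=1)]

# In the lifted space the rings sit at two constant heights,
# so a single linear threshold on the new axis separates them.
z_inner = lift(inner)[:, 2]   # all exactly 1.0 (radius 1)
z_outer = lift(outer)[:, 2]   # all exactly 9.0 (radius 3)
print(z_inner.max() < 5.0 < z_outer.min())  # True
```

Note the cost: we had to add a dimension to get linear structure, which is exactly the opposite of what PCA is trying to do.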

We are trying to avoid going even higher in dimension. Is everybody okay with what PCA does? The good thing is that you don't need to rotate the line manually; that was just for understanding what happens. There is beautiful math to do it all, so we are going to check out the math as well. These are the two things we want to calculate: the eigenvectors, which are the unit vectors representing the PCs (a line or a plane in two-dimensional space), and the eigenvalues. Very cool-sounding names. We'll see how to calculate these using math, because once we have them we are done: we have our PCs and our SSDs, that is, the variance captured by each. So how many eigenvalues and eigenvectors will we have? The same as the number of features we begin with.
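To make those two quantities concrete, here is a minimal numpy sketch (with made-up data) that computes the eigenvalues and unit eigenvectors of the covariance matrix, which is exactly what PCA does under the hood:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy 2-feature data with strong correlation between f1 and f2.
f1 = rng.normal(size=300)
f2 = 2.0 * f1 + rng.normal(scale=0.3, size=300)
X = np.c_[f1, f2]

# Center the data, then eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)            # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # returns ascending eigenvalues

# Sort descending: the largest eigenvalue's eigenvector is PC1.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)                        # variance captured by PC1, PC2
print(eigvecs[:, 0])                  # direction of PC1
print(np.linalg.norm(eigvecs[:, 0]))  # 1.0: eigenvectors are unit vectors
```

Two features in, two eigenvalue/eigenvector pairs out, matching the point above.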

So it will be n: two features give two PCs, a hundred features give a hundred PCs. Can there be more PCs than features? I don't think more are possible. Thinking purely mathematically, once you have a perpendicular direction for every feature you have already captured 100% of the variance, so any further perpendicular direction would be useless. We will see in the math whether more components than features are possible; probably not. If you want more features, that is a different tool: PCA is for reducing dimension. If you need more, you can add logarithmic and polynomial features and whatnot, or just feed the data to a neural network and it will build its own features.

Building more features is not a problem; it is using fewer that is hard for us. So we will have the same number of components as features in the original data. Now, this is the slide I was talking about, which covers the basics: you will have the eigenvalues, and from them you can calculate the percentage of variance captured by each component. For example, if I want to capture 90% here, I can just keep PC1 and PC2 and throw away PC3 and PC4. Now comes the slightly harder part; I hope it's not too difficult. I will dive into the math, but before that, let's spend a couple of minutes making sure we understand why the new features are independent.
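The 90% bookkeeping can be sketched like this, using hypothetical eigenvalues (the numbers are invented purely for illustration):

```python
import numpy as np

# Hypothetical eigenvalues for four principal components.
eigvals = np.array([6.0, 3.2, 0.5, 0.3])

# Fraction of total variance each PC captures, and the running total.
ratio = eigvals / eigvals.sum()   # 0.60, 0.32, 0.05, 0.03
cumulative = np.cumsum(ratio)     # 0.60, 0.92, 0.97, 1.00

# Keep the smallest number of PCs whose cumulative variance reaches 90%.
k = int(np.searchsorted(cumulative, 0.90) + 1)
print(k)  # 2 -- keep PC1 and PC2, drop PC3 and PC4
```

With these particular numbers, PC1 and PC2 together already capture 92% of the variance, so the last two components can be thrown away.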

How are we getting our new features? By projecting data points onto the PCs, PC1 and PC2. If you look at this picture, the red X's are the PC1 points: for each data point, this is its new value in the first feature, and the second feature is given by these blue values. Now, if I rotate PC1 to make it the x-axis and forget about f1 and f2, because we are not going to work with f1 and f2 anymore, only with PC1 and PC2, this is how the picture looks: the old axes are gone, and we are left with just the PC1 and PC2 values of our examples. And if you look here, each feature sits on its own axis, completely independent of the other feature. That is what independence means; we can see it visually because each feature is on its own axis. So make sure you are comfortable with why we say the PCs are independent features. Are the PCs related in some way? Yes, in a way: each PC is perpendicular to the other PCs.
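Here is a quick numeric check of that independence claim, again on made-up data: the original features are strongly correlated, but their projections onto the PCs are not:

```python
import numpy as np

rng = np.random.default_rng(7)

# Correlated original features f1, f2.
f1 = rng.normal(size=500)
f2 = 1.5 * f1 + rng.normal(scale=0.5, size=500)
X = np.c_[f1, f2]
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance matrix give the PC directions.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))

# Project: each column of the result is the data's values along one PC.
scores = Xc @ eigvecs

# Original features are strongly correlated; the PC scores are not.
print(np.corrcoef(Xc.T)[0, 1])       # large, roughly 0.95
print(np.corrcoef(scores.T)[0, 1])   # essentially 0: PCs are uncorrelated
```

The perpendicularity of the eigenvectors is exactly what makes the off-diagonal correlation vanish.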

Yes, we derive the later PCs from the earlier ones. And the 90 degrees means no correlation, that is, independence; that's what we mean mathematically. Why is PCA unsupervised learning? Because we are not giving it any labels. We just show it the data and ask: can you give us data in a lower dimension that still captures the information in the original data? We do not hand it any label, we do not define any column that says "this is what the answer means," and still it can capture a lot of the information in a smaller-dimensional space. Unsupervised essentially means we do not give it any labels telling it what the right answer is.

Of course, the algorithm itself we still have to supply: even a clustering algorithm is devised by humans, and the machine still has to follow the procedure we prescribe. So that part comes from us, but as far as the data is concerned it is not supervised; we just throw data at it and get back a smaller amount of data that still captures most of the information in the original. That's the idea: there is no label at all here. Sometimes the objection comes up: "I am providing everything to the machine, the whole PCA or clustering algorithm is written by us, so how can it be unsupervised?" Supervision relates to the treatment of the data, not to the algorithm itself. That said, there are efforts where a machine is asked to find an optimal algorithm for a given problem; that research area is called neural architecture search, or NAS.

It is still very rudimentary work, but that is the direction people are exploring, especially Google and others who have tons of money to throw at these problems without worrying about the return. For now, though, the algorithm part is still done by humans. And maybe that is one place where our algorithms are not as good as the brain. With all of machine learning and deep learning we are essentially trying to build a brain, but the brain is far better than any algorithm out there. It may not even be the machine's problem; it could be our problem, because we are the ones building these algorithms.

So let's look at the math of PCA. But first: why not try PCA on nonlinear data anyway? Yes, there is no harm in it; we can try it out. The only catch is that on nonlinear data it may not do a good job. On linearly separable data, yes, why not, and especially for structured data I would say go for it. The reason it will not give you good results on nonlinear data is that at its heart PCA is linear math, so you will not be sure whether the results are good or not.
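To see that linear limitation concretely, here is a small sketch (toy data, not from the lecture) where PCA finds no dominant direction on ring-shaped data, so dropping a component would throw away half the variance:

```python
import numpy as np

rng = np.random.default_rng(1)

# A ring of points: the structure is nonlinear, with no dominant
# straight-line direction for PC1 to latch onto.
theta = rng.uniform(0, 2 * np.pi, 400)
ring = np.c_[np.cos(theta), np.sin(theta)]

# Plain linear PCA via eigendecomposition of the covariance matrix.
Xc = ring - ring.mean(axis=0)
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# On data with a strong linear trend PC1 would dominate; here the
# variance splits roughly 50/50 between the two components.
print(eigvals / eigvals.sum())  # roughly [0.5, 0.5]
```

PCA still returns an answer, as noted above, but the eigenvalues tell you that no low-dimensional linear summary is doing a good job here.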

It might still give you an answer, PC1, PC2, PC3, but you will not be sure whether they are really good. Okay, so what we are going to do now is learn how to do all of this in math: the rotation part, the SSDs, and the eigenvectors. We will again do simpler math, not linear algebra in full multi-dimensional generality, because that would require multiple classes; we will stick to simple quadratic equations. So we will start with a table of data with two dimensions, f1 and f2, and of course we will use matrices.
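As a preview of that quadratic-equation approach, here is a sketch with a made-up 2x2 covariance matrix: the eigenvalues fall out of the characteristic quadratic, and we cross-check against numpy's eigensolver.

```python
import numpy as np

# A hypothetical 2x2 covariance matrix for features f1, f2.
a, b, c, d = 2.0, 1.0, 1.0, 2.0   # cov = [[a, b], [c, d]], symmetric so b == c

# Eigenvalues solve det([[a - lam, b], [c, d - lam]]) = 0,
# i.e. the quadratic  lam^2 - (a + d)*lam + (a*d - b*c) = 0.
trace, det = a + d, a * d - b * c
disc = np.sqrt(trace ** 2 - 4 * det)
lam1, lam2 = (trace + disc) / 2, (trace - disc) / 2
print(lam1, lam2)  # 3.0 1.0

# Cross-check against numpy's eigensolver (ascending, so reverse).
print(np.linalg.eigvalsh([[a, b], [c, d]])[::-1])  # [3. 1.]
```

For two features the characteristic polynomial is just a quadratic, which is why the two-dimensional case can be worked entirely by hand.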