The singular value decomposition of data matrix
Principal Component Analysis (PCA) is a technique used in many fields: data science, signal processing, mechanics, and more. As a student of machine learning, I should take some time to at least review this technique, and maybe ICA too in some future posts.
With a data science mindset, the key idea of PCA is to reduce the dimensionality of the data while retaining as much variation as possible. Personally, I think of PCA as projecting the “cloud” of data points onto a “flat” surface. More technically, PCA is particularly useful when we have a large number of variables. In such situations, we might want to look at the data from a point of view where one direction captures the most variance (the data spread out the most). A picture from setosa illustrates this idea:
Let \(\mathbf{x}\) be a vector containing \(p\) random variables. We define the \(i\)-th principal component of \(\mathbf{x}\) as \(z_i = \alpha_i^\top \mathbf{x}\), where the \(\alpha_i\) are chosen so that \(z_1\) has maximal variance, and each subsequent \(z_i\) has maximal variance subject to being uncorrelated with the previous components.
Let \(\Sigma\) be the covariance matrix of \(\mathbf{x}\); then \(\alpha_1, \alpha_2, \dots, \alpha_k\) are respectively the eigenvectors of \(\Sigma\) corresponding to the eigenvalues \(\lambda_1, \dots, \lambda_k\) (with \(\lambda_1 \geq \dots \geq \lambda_k\)), and \(V(z_i) = V(\alpha_i^\top \mathbf{x}) = \lambda_i\).
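The definition above can be sketched directly in NumPy: build the covariance matrix of some (hypothetical, synthetic) data, take its eigendecomposition, and check that the variance of each component \(z_i\) is the corresponding eigenvalue \(\lambda_i\).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples of p = 3 correlated variables
# (a stand-in for a real dataset).
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.3, 0.0],
                                          [0.0, 1.0, 0.1],
                                          [0.0, 0.0, 0.5]])

# Covariance matrix Sigma of the centered data.
Xc = X - X.mean(axis=0)
Sigma = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order, so reverse the order
# to get lambda_1 >= lambda_2 >= ... as in the definition.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lambdas = eigvals[order]       # variances of the components
alphas = eigvecs[:, order]     # column alpha_i is the i-th PC direction

# z_i = alpha_i^T x for every sample: the principal component scores.
Z = Xc @ alphas

# V(z_i) = lambda_i, as claimed.
print(np.allclose(Z.var(axis=0, ddof=1), lambdas))  # True
```

Note that `ddof=1` matches the sample-covariance convention used by `np.cov`, so the component variances agree with the eigenvalues exactly (up to floating-point error).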
PCA is widely used in data analysis, as it gives a general view of a dataset. It can be computed as the singular value decomposition of the (centered) data matrix \(\mathbf{X}\), or equivalently via the eigenvalue decomposition of \(\mathbf{X}^\top\mathbf{X}\) (main theorem).
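To illustrate that equivalence, here is a small sketch (on synthetic data) showing that the singular values of the centered data matrix and the eigenvalues of \(\mathbf{X}^\top\mathbf{X}\) describe the same component variances, related by \(\lambda_i = s_i^2 / (n-1)\):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)    # center the data matrix
n = Xc.shape[0]

# Route 1: eigenvalue decomposition of X^T X (scaled by 1/(n-1)
# so it equals the sample covariance matrix).
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (n - 1))
eigvals = eigvals[::-1]    # descending order

# Route 2: singular value decomposition of the data matrix itself.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Both routes give the same component variances: lambda_i = s_i^2 / (n - 1).
print(np.allclose(s**2 / (n - 1), eigvals))  # True
```

In practice the SVD route is usually preferred: it avoids forming \(\mathbf{X}^\top\mathbf{X}\) explicitly, which squares the condition number.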
Take a simple approach to image recognition as an example. If we treat each pixel of a given image as a random variable, we can compute the covariance matrix across a set of images. By keeping the \(k\) largest eigenvalues and their corresponding eigenvectors, we can represent each image in a new space. Interestingly, we can choose \(k\) as small as we want, resulting in a compact representation of the images. In this new space, every image is represented as a \(k\)-dimensional vector, and these vectors can be used for similarity comparison. Such an approach to image recognition is naïve and not effective in many cases, but it gives a baseline and an example of the PCA technique.
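A minimal sketch of this idea, using random arrays as hypothetical stand-ins for real images: project each flattened image onto the top-\(k\) eigenvectors, then compare images by distance in the compact space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend "images": 50 samples of 8x8 = 64 pixels, flattened to rows
# (synthetic data standing in for a real image collection).
images = rng.normal(size=(50, 64))
mean_image = images.mean(axis=0)
centered = images - mean_image

# Top-k eigenvectors of the pixel covariance, obtained via the SVD
# of the centered data matrix.
k = 10
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
components = Vt[:k]                  # shape (k, 64)

# Each image becomes a compact k-dimensional vector in the new space.
codes = centered @ components.T      # shape (50, k)

def most_similar(query_code, codes):
    """Index of the stored image nearest to the query (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(codes - query_code, axis=1)))

# A query identical to stored image 7 should match itself.
print(most_similar(codes[7], codes))  # 7
```

This is essentially the classic "eigenfaces" recipe; with \(k = 10\) each 64-pixel image is compressed to a 10-number code, at the cost of discarding the variance in the remaining directions.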