Principal Component Analysis (PCA) is a technique used in many fields: data science, signal processing, mechanics, etc. As a student of machine learning, I should take some time to at least review this technique; maybe ICA too, in some future posts.

## Reducing dimensionality

With a data science mindset, the key idea of PCA is to reduce the dimensionality of the data while retaining as much variation as possible in the result. Personally, I think of PCA as projecting the “cloud” of data points onto a “flat” surface. More technically, PCA is particularly useful when we have a large number of variables. In such a situation, we might want to look at the data from a point of view where one direction captures the most variance (the direction along which the data spread out the most). A picture from setosa illustrates this idea:

Under the transformation, our data have large variance on pc1 and small variance on pc2. The data can then be represented on pc1 alone without much information loss.
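
To make this concrete, here is a minimal NumPy sketch with a synthetic 2-D “cloud” I made up for illustration: it centers the data, finds the principal directions from the covariance matrix, and shows that almost all of the variance ends up on pc1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D "cloud": strongly correlated, so most variance lies
# along a single direction.
t = rng.normal(size=500)
data = np.column_stack([t, 0.3 * t + 0.1 * rng.normal(size=500)])

# Center the data and compute the covariance matrix.
centered = data - data.mean(axis=0)
cov = np.cov(centered, rowvar=False)

# Eigenvectors of the covariance matrix are the principal directions.
eigvals, eigvecs = np.linalg.eigh(cov)          # returned in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto pc1 and pc2 and compare variances.
scores = centered @ eigvecs
print("variance on pc1:", scores[:, 0].var())   # large
print("variance on pc2:", scores[:, 1].var())   # small
```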

Let $\mathbf{x}$ be a vector of $p$ random variables. We define the principal components of $\mathbf{x}$ as follows:

1. Find $\alpha_1 \in R^p$ such that $z_1 = \alpha_1^\top \mathbf{x}$ has maximum variance, subject to $\alpha_1^\top \alpha_1 = 1$.

2. Next, find $\alpha_2 \in R^p$ such that $z_2 = \alpha_2^\top \mathbf{x}$ has maximum variance, subject to $\alpha_2^\top \alpha_2 = 1$ and $z_2$ being uncorrelated with $z_1$ (equivalently, $\alpha_2^\top \alpha_1 = 0$).

3. Continuing in this way, we can define $\alpha_k$ as the unit vector maximizing the variance of $z_k = \alpha_k^\top \mathbf{x}$ while keeping $z_k$ uncorrelated with $z_1, \dots, z_{k-1}$. A small numerical sanity check of the first step is sketched after this list.
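
As a rough sanity check of this definition (synthetic data, pure NumPy), the sketch below compares the best direction found among many random unit vectors against the top eigenvector of the sample covariance matrix; no random direction should beat the eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data in R^3 (p = 3); any dataset works for this check.
data = rng.normal(size=(1000, 3)) @ np.diag([3.0, 1.0, 0.3])
x = data - data.mean(axis=0)
sigma = np.cov(x, rowvar=False)

def var_along(alpha):
    """Variance of the projection alpha^T x, i.e. alpha^T Sigma alpha."""
    return float(alpha @ sigma @ alpha)

# Best direction found among many random unit vectors.
candidates = rng.normal(size=(20000, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)
best = candidates[np.argmax([var_along(a) for a in candidates])]

# Compare against the top eigenvector of Sigma (the true alpha_1).
eigvals, eigvecs = np.linalg.eigh(sigma)
alpha_1 = eigvecs[:, -1]
print(var_along(best), "<=", var_along(alpha_1))
```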

## Main theorem

Let $\Sigma$ be the covariance matrix of $\mathbf{x}$. Then $\alpha_1, \alpha_2, \dots, \alpha_k$ are respectively the eigenvectors of $\Sigma$ corresponding to the eigenvalues $\lambda_1, \dots, \lambda_k$ (with $\lambda_1 \geq \dots \geq \lambda_k$), and $V(z_i) = V(\alpha_i^\top \mathbf{x}) = \lambda_i$.
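
A quick numerical check of the theorem, again with synthetic data: the sample variance of each score $z_i = \alpha_i^\top \mathbf{x}$ should match the corresponding eigenvalue of $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data with p = 4 correlated variables.
data = rng.normal(size=(5000, 4)) @ rng.normal(size=(4, 4))
x = data - data.mean(axis=0)
sigma = np.cov(x, rowvar=False)

# Eigendecomposition of the covariance matrix, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(sigma)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# z_i = alpha_i^T x: the sample variance of each score should equal lambda_i.
z = x @ eigvecs
print("eigenvalues :", eigvals)
print("V(z_i)      :", z.var(axis=0, ddof=1))
```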

## Applications

PCA is widely used in data analysis as it gives a general view of a dataset. In practice it is computed via the singular value decomposition of the (centered) data matrix $\mathbf{X}$, or equivalently the eigenvalue decomposition of $\mathbf{X}^\top\mathbf{X}$ (main theorem).
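
The sketch below illustrates this equivalence on a small random matrix: the squared singular values of the centered $\mathbf{X}$ match the eigenvalues of $\mathbf{X}^\top\mathbf{X}$, and the right singular vectors match the eigenvectors up to sign.

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=(200, 5))
x = data - data.mean(axis=0)          # center the data matrix X

# Route 1: eigenvalue decomposition of X^T X.
eigvals, eigvecs = np.linalg.eigh(x.T @ x)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Route 2: singular value decomposition of X.
u, s, vt = np.linalg.svd(x, full_matrices=False)

# Squared singular values equal the eigenvalues of X^T X, and the right
# singular vectors are the same principal directions (up to sign).
print(np.allclose(s**2, eigvals))
print(np.allclose(np.abs(vt), np.abs(eigvecs.T)))
```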

Take a simple approach to image recognition as an example. If we consider each pixel of a given image as a random variable, then we can compute the covariance matrix over a collection of images. By choosing the $k$ largest eigenvalues and their corresponding eigenvectors, we can represent each image in a new space. Interestingly, we can choose $k$ as small as we want, resulting in a compact representation of images. In the newly defined space, every image is represented as a vector, and these vectors can be used for similarity comparison. Such an approach to image recognition is naïve and not effective in many cases. However, it gives a baseline and an example of the PCA technique.
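
A rough sketch of this idea, with random arrays standing in for real images (the image shape, the value of $k$, and the `most_similar` helper are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for a dataset of flattened grayscale images
# (n images, each 32x32 = 1024 pixels); real images would replace this.
n, pixels, k = 300, 1024, 20
images = rng.normal(size=(n, pixels))

# Center, then take the k directions with the largest variance.
mean = images.mean(axis=0)
x = images - mean
# SVD of X is cheaper here than forming the full 1024x1024 covariance matrix.
_, _, vt = np.linalg.svd(x, full_matrices=False)
components = vt[:k]                       # shape (k, pixels)

# Every image becomes a k-dimensional vector in the new space.
codes = x @ components.T                  # shape (n, k)

def most_similar(query, codes):
    """Index of the stored image whose code is closest (cosine similarity)."""
    q = (query - mean) @ components.T
    sims = codes @ q / (np.linalg.norm(codes, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

print(most_similar(images[0], codes))     # should return 0
```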