Principal Component Analysis

November 28, 2016 - 2 minute read -

Principal Component Analysis (PCA) is a technique used in many fields: data science, signal processing, mechanics, etc. As a student of machine learning, I should take some time to at least review this technique; and maybe ICA too, in some future posts.

Reducing dimensionality

With a data science mindset, the key idea of PCA is to reduce the dimensionality of the data while retaining as much variation as possible. Personally, I think of PCA as projecting the “cloud” of data points onto a “flat” surface. More technically, PCA is particularly useful when we have a large number of variables. In such situations, we might want to look at the data from a point of view where one direction captures the most variance (the data spread out the most along it). A picture from setosa illustrates this idea:

PCA 2D (figure)

Under the transformation, our data now have large variance along pc1 and small variance along pc2. The data can then be represented on pc1 alone without much loss of information.
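That projection can be sketched in a few lines of numpy. This is a minimal example on made-up correlated 2D data (the data matrix and its shape are assumptions for illustration): we rotate the cloud into the (pc1, pc2) frame and check that almost all the variance lands on pc1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up example: a "cloud" of 500 correlated 2D points
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.4]])
X -= X.mean(axis=0)  # PCA works on centered data

# Eigenvectors of the covariance matrix give the pc1/pc2 directions
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # re-sort so pc1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = X @ eigvecs  # coordinates of the data in the (pc1, pc2) frame
print(Z.var(axis=0, ddof=1))  # large variance on pc1, small on pc2
```

Keeping only the first column of `Z` is exactly the “represent the data on pc1 alone” step.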

Let $\mathbf{x}$ be a vector of $p$ random variables. We define the principal components of $\mathbf{x}$ as follows:

  1. Find $\alpha_1 \in \mathbb{R}^p$ such that:

     $$\alpha_1 = \arg\max_{\|\alpha\| = 1} \mathrm{Var}(\alpha^\top \mathbf{x})$$

  2. Next, find $\alpha_2$ such that:

     $$\alpha_2 = \arg\max_{\|\alpha\| = 1,\ \mathrm{Cov}(\alpha^\top \mathbf{x},\, \alpha_1^\top \mathbf{x}) = 0} \mathrm{Var}(\alpha^\top \mathbf{x})$$

  3. Continuing in this way, we can define $\alpha_3, \dots, \alpha_p$ to satisfy the conditions above: each $\alpha_k$ is a unit vector maximizing $\mathrm{Var}(\alpha_k^\top \mathbf{x})$ while being uncorrelated with all previous components.

Main theorem

Let $\Sigma$ be the covariance matrix of $\mathbf{x}$. Then $\alpha_1, \dots, \alpha_p$ are respectively eigenvectors of $\Sigma$ corresponding with eigenvalues $\lambda_1, \dots, \lambda_p$ (s.t. $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$) and $\mathrm{Var}(\alpha_k^\top \mathbf{x}) = \lambda_k$.
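The theorem is easy to verify empirically. A sketch on synthetic data (numpy; the data is an assumption for illustration): the sample variances of the projected components match the eigenvalues, and distinct components come out uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4)) @ rng.normal(size=(4, 4))  # made-up data
X -= X.mean(axis=0)
S = np.cov(X, rowvar=False)

eigvals, A = np.linalg.eigh(S)  # columns of A are the alpha_k
Z = X @ A                       # the principal components of the sample

# Var(alpha_k^T x) = lambda_k, and components are pairwise uncorrelated
print(np.allclose(Z.var(axis=0, ddof=1), eigvals))             # True
print(np.allclose(np.cov(Z, rowvar=False), np.diag(eigvals)))  # True
```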


PCA is widely used when it comes to data analysis, as it gives a general view of the dataset. It can be computed as the singular value decomposition of the data matrix $X$, or as the eigenvalue decomposition of $X^\top X$ (main theorem).
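The equivalence between the two routes can be sketched as follows (numpy, a centered synthetic data matrix; shapes are arbitrary): the squared singular values of $X$, scaled by $1/(n-1)$, match the eigenvalues of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # made-up data
X -= X.mean(axis=0)  # both routes assume centered data
n = X.shape[0]

# Route 1: eigenvalue decomposition of the covariance X^T X / (n - 1)
eigvals = np.linalg.eigvalsh(X.T @ X / (n - 1))[::-1]  # descending

# Route 2: singular value decomposition of X itself
svals = np.linalg.svd(X, compute_uv=False)  # already descending

print(np.allclose(svals**2 / (n - 1), eigvals))  # True
```

In practice the SVD route is usually preferred for numerical stability, since it never forms $X^\top X$ explicitly.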

Take a simple approach to image recognition as an example. If we consider each pixel of a given image as a random variable, then we can compute the covariance matrix over a set of images. By choosing the k largest eigenvalues and their corresponding eigenvectors, we can represent each image in a new space. Interestingly, we can choose k as small as we want, resulting in a compact representation of the images. In the newly defined space, every image is represented as a vector, and these vectors can be used for similarity comparison. Such an approach to image recognition is naïve and not effective in many cases. However, it gives a baseline and a concrete example of the PCA technique.
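That naive pipeline can be sketched as below. Everything here is an assumption for illustration: random arrays stand in for flattened grayscale images, the toy sizes are arbitrary, and Euclidean distance is just one possible similarity measure.

```python
import numpy as np

rng = np.random.default_rng(4)
n_images, n_pixels, k = 50, 64 * 64, 10        # toy sizes (assumptions)
images = rng.normal(size=(n_images, n_pixels))  # stand-ins for real images

mean = images.mean(axis=0)
Xc = images - mean

# Top-k eigenvectors via SVD (avoids forming the huge pixel covariance)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:k]  # k "eigenimages"

def embed(img):
    """Represent one image as a compact k-dimensional vector."""
    return components @ (img - mean)

# Similarity comparison: nearest neighbour by Euclidean distance
codes = Xc @ components.T          # embed the whole collection
query = images[0]
dists = np.linalg.norm(codes - embed(query), axis=1)
print(int(np.argmin(dists)))  # the query matches itself first -> 0
```

This is essentially the eigenfaces idea in miniature: recognition happens in the k-dimensional space rather than in pixel space.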