
# PCA by example with exercises and solutions

The topic is PCA - principal component analysis. We'll look at an example and have a few exercises + solutions at the end of the post.

Intuition

If we want to reduce the dimension of data (e.g. from 2D to 1D) due to space constraints, easier calculations or visualization, we want to lose as little information about our data as possible.

Take a look at the charts below. We will project (the cyan lines) our data (the cyan points) onto the blue line, i.e. we will reduce the 2D coordinate system to the 1-dimensional line. Now which projection explains our data better? After the projection, on chart A we retain the information that the two pairs are far away from each other. On chart B we do not. Another way of reasoning about this is to say: The variance of the distances (black arrows) is higher on chart A. So the higher the variance, the better our projection explains our data. Let's define a few things.

• A principal component (PC) is an axis. In our example: The blue line is our axis.

• The blue line in chart A is PC1, the one in chart B is PC2. PC1 is the best principal component in terms of data explanation.

• PCA maximizes the sum of squared distances of the projected points to the origin to find the PC with highest variance.

• That sum is the eigenvalue of the PC. Variance of a PC: Eigenvalue / (n-1). Thus the PC with the highest eigenvalue is the best PC.

• The normalized vector on the PC line is the eigenvector, representing the direction our PC is going.

So in conclusion: The eigenvector with the largest eigenvalue is the direction along which our data has maximum variance, i.e. the data is maximally explained.

If we have two eigenvalues 6 and 2, then the PC for eigenvalue 6 explains 6/(6+2) = 75% of the variance (because eigenvalues are proportional to variance).
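That arithmetic is small enough to sanity-check in two lines of Python:

```python
# Share of the total variance explained by each PC, given eigenvalues 6 and 2.
eigenvalues = [6, 2]
total = sum(eigenvalues)
ratios = [v / total for v in eigenvalues]
print(ratios)  # [0.75, 0.25]
```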

In our example: PC2 is perpendicular to PC1. In a 3D example: PC3 would be perpendicular to the plane of PC1 and PC2.

Steps of PCA

1. Mean normalization
2. Compute Covariance matrix
3. Compute eigenvectors/values with said matrix
4. Select top k eigenvalues and their eigenvectors
5. Create orthogonal base with the eigenvectors
6. Transform data by multiplying with said base
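The six steps can be sketched in a few lines of numpy. This is a minimal sketch, not a production implementation: it assumes the divisor-n covariance used in the recipe below, and the function name `pca` is mine.

```python
import numpy as np

def pca(A, k):
    """Minimal sketch of the six steps above via eigen-decomposition."""
    a = A - A.mean(axis=0)            # 1. mean normalization
    M = a.T @ a / len(A)              # 2. covariance matrix (divisor n)
    vals, vecs = np.linalg.eigh(M)    # 3. eigenvalues/eigenvectors (ascending)
    order = np.argsort(vals)[::-1]    # 4. top k eigenvalues first
    base = vecs[:, order[:k]]         # 5. orthogonal base of top-k eigenvectors
    return a @ base                   # 6. transform the (centered) data

# The example data from below, reduced to 1D (the sign may flip, which is fine):
D = np.array([[1, 0], [2, 0], [3, 0], [5, 6], [6, 6], [7, 6]], dtype=float)
print(pca(D, 1).ravel().round(2))
```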

Recipe

1. Calculate covariance.
Formula:
a = A - 11'A/n   (11' is the n×n all-ones matrix, so this subtracts the column means)
Cov(A) = a'a / n

2. Solve det(M-Iλ) = 0.

3. For both λ₁ and λ₂ solve:
M * [ x   = λ * [ x
      y ]         y ]
4. Set x = 1, solve for y, then scale to a unit vector so that (x**2 + y**2)**.5 = 1.

5. The orthogonal base consists of both eigenvectors side by side:
[ x₁ x₂
  y₁ y₂ ]

6. Take the first eigenvector and apply it to a (from step 1):
a * [ x₁
      y₁ ]
This is the transformed data.

Example

Data D:
[ 1 0
2 0
3 0
5 6
6 6
7 6 ]

1. Cov
a =
[ 1 0   [ 4 3   [ -3 -3
2 0     4 3     -2 -3
3 0  -  4 3  =  -1 -3
5 6     4 3      1  3
6 6     4 3      2  3
7 6 ]   4 3 ]    3  3 ]

Cov(D) = M = a'a / n =
[ 4.67 6
  6    9 ]
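A quick numpy check of the centering and covariance. Note that np.cov divides by n-1 by default, so we compute a'a/n by hand to match the recipe:

```python
import numpy as np

D = np.array([[1, 0], [2, 0], [3, 0], [5, 6], [6, 6], [7, 6]], dtype=float)
a = D - D.mean(axis=0)   # subtract the column means (4, 3)
M = a.T @ a / len(D)     # divisor n, as in the recipe
print(M.round(2))        # diagonal 4.67 (= 28/6) and 9, off-diagonal 6
```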

2. Solve det(M-Iλ) = 0
(4.67-λ) * (9-λ) - 6*6 = 0
λ₁ = 13.213
λ₂ = 0.454
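numpy agrees with the hand calculation (a sanity check, using the unrounded 28/6 entry):

```python
import numpy as np

M = np.array([[28/6, 6], [6, 9]])   # covariance matrix from step 1
vals = np.linalg.eigvalsh(M)        # eigenvalues of a symmetric matrix, ascending
print(vals.round(3))                # [ 0.454 13.213]
```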

3.
[ 4.67 6   * [ x   = 13.213 * [ x
  6    9 ]     y ]              y ]

4.67x + 6y = 13.213x
6x + 9y = 13.213y
=> y = 1.4244x

4.
x = 1
y = 1.4244
Unit vector:
x₁ = 1 / sqrt(1 + 1.4244**2) = 0.57
y₁ = 1.4244 / sqrt(1 + 1.4244**2) = 0.82

Repeat steps 3 and 4 for λ₂ = 0.454. Solution:
x₂ = 0.82
y₂ = -0.57
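Both eigenvectors can be cross-checked with numpy. eigh returns unit eigenvectors as columns; their sign is arbitrary, so we fix it to match the values above:

```python
import numpy as np

M = np.array([[28/6, 6], [6, 9]])
vals, vecs = np.linalg.eigh(M)           # columns are unit eigenvectors, ascending eigenvalues
v1 = vecs[:, 1] * np.sign(vecs[0, 1])    # vector of the largest eigenvalue, x made positive
v2 = vecs[:, 0] * np.sign(vecs[0, 0])
print(v1.round(2), v2.round(2))          # roughly (0.57, 0.82) and (0.82, -0.57)
```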

5.
[ 0.57  0.82
  0.82 -0.57 ]

6.
[ -3 -3                [ -4.18
  -2 -3                  -3.6
  -1 -3   * [ 0.57   =   -3.03
   1  3      0.82 ]       3.03
   2  3                   3.6
   3  3 ]                 4.18 ]
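The same multiplication in numpy, using a couple more decimals of the eigenvector (which is why the first value comes out as -4.18 rather than -4.17):

```python
import numpy as np

a = np.array([[-3, -3], [-2, -3], [-1, -3], [1, 3], [2, 3], [3, 3]], dtype=float)
v1 = np.array([0.5746, 0.8185])   # first eigenvector, to 4 decimals
print((a @ v1).round(2))          # [-4.18 -3.6  -3.03  3.03  3.6   4.18]
```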

Gratz!

Exercises

1. a) Suppose we conduct a PCA on a two-dimensional data set and get the eigenvalues 6 and 2. Draw a distribution of sample points that may give rise to this result. Also, draw the two eigenvectors.

b) Consider 3 data points in a 2-dimensional space R**2: (-1, 1), (0, 0), (1, 1). What's the first principal component of the given dataset?

If we project the original data points onto the 1-dimensional subspace spanned by the principal component you choose, what are their coordinates in this subspace? What is the variance of the projected data?

For the projected data you just obtained above, now if we represent them in the original 2-dimensional space and consider them as the reconstruction of the original data points, what is the reconstruction error (squared)? Compute the reconstruction of the points.

2. a) Name four steps for performing a PCA.

b) Suppose we perform PCA on a two-dimensional dataset and it yields two eigenvalues which are equal. What does it mean regarding the importance of the dimension? Would pursuing a dimensionality reduction be a good choice? Please explain. Sketch a dataset where its two eigenvalues would have the same size.

c) Given the data points below, would PCA be capable of identifying the two lines? Sketch the principal axes. (please view with monospace font) i)

x           x
x       x
x   x
x
x   x
x       x
x           x

ii)

x
x
x
x
x                   x
x
x
x
x

Solutions

1. a) Points: [-1, .5**.5], [-1, 0], [-1, -(.5**.5)], [1, 0], [1, .5**.5], [1, -(.5**.5)]

>>> from sklearn.decomposition import PCA
>>> import numpy as np
>>> pca = PCA(n_components=2)
>>> pca.fit(np.array([[-1, .5**.5], [-1, 0], [-1, -(.5**.5)], [1, 0], [1, .5**.5], [1, -(.5**.5)]]))
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
>>> pca.singular_values_
array([2.44948974, 1.41421356])

How? I divided the eigenvalue 6 over 6 points (because 4 points would've been awkward to calculate) and set their distance from the proposed first principal component line to 1, i.e. they sit at x = -1 and x = 1. This way their sum of squares (SS) is 1+1+1+1+1+1 = 6. Now to get the eigenvalue 2, we need to set the y values to sqrt(.5). Why? Because two points lie ON the first PC line (y = 0) and contribute nothing, only four points matter. So to get eigenvalue 2: sqrt(.5)**2 = 0.5, and 0.5 * 4 = 2. (sklearn's singular_values_ are the square roots of these sums: sqrt(6) ≈ 2.449 and sqrt(2) ≈ 1.414.)
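To double-check the construction: the sums of squared coordinates along the two axes are exactly the requested eigenvalues, and their square roots are what sklearn reports as singular values:

```python
import numpy as np

P = np.array([[-1, .5**.5], [-1, 0], [-1, -(.5**.5)],
              [1, 0], [1, .5**.5], [1, -(.5**.5)]])
ss = (P**2).sum(axis=0)   # sums of squares along x and y (data is already centered)
print(ss, np.sqrt(ss))    # [6. 2.] and their roots 2.449..., 1.414...
```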

b) The first principal component is the horizontal direction, eigenvector [1, 0]: the covariance between x and y is 0, and the variance along x is larger than along y. Coordinates in the 1-D subspace: -1, 0, 1. Variance: 1. Reconstructions in the original space: [-1, 0], [0, 0], [1, 0]. Reconstruction error (squared): 1 + 0 + 1 = 2.
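A numpy version of this solution, projecting onto the subspace spanned by [1, 0] through the origin (which is what the reconstruction error of 2 refers to):

```python
import numpy as np

X = np.array([[-1, 1], [0, 0], [1, 1]], dtype=float)
v = np.array([1.0, 0.0])                 # first principal component direction
coords = X @ v                           # coordinates in the 1-D subspace
recon = np.outer(coords, v)              # back in 2D: [-1, 0], [0, 0], [1, 0]
err = ((X - recon) ** 2).sum()           # squared reconstruction error
print(coords, coords.var(ddof=1), err)   # [-1.  0.  1.] 1.0 2.0
```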

2. a) Normalize the input (subtract the mean); calculate the covariance matrix; calculate the eigenvectors/eigenvalues of said matrix; the eigenvector with the greatest eigenvalue is our first principal component.

b) Both dimensions are equally important. Pursuing dimensionality reduction would be a bad choice: whichever component we drop, we lose half of the explained variance. Dataset: the corners of a square, e.g. [0, 0], [1, 0], [0, 1], [1, 1]
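Checking the square dataset: both eigenvalues of its covariance matrix indeed come out equal:

```python
import numpy as np

S = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
a = S - S.mean(axis=0)          # center at (0.5, 0.5)
M = a.T @ a / len(S)            # covariance (divisor n, as in the recipe)
print(np.linalg.eigvalsh(M))    # [0.25 0.25] -- equal, no preferred direction
```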

c) If we fit the PCs to the apparent 'lines', one line contributes ~0 variance along that PC. If we instead place the PCs as horizontal/vertical axes, both lines contribute, maximizing the total variance. Since PCA maximizes variance, it will not find the apparent 'lines'. Also, both principal components will have roughly the same importance.
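To make this concrete with made-up coordinates for an X shape like sketch i) (these points are my own, not the exact sketch): for two crossing diagonals the covariance matrix is a multiple of the identity, so the eigenvalues are equal and no single direction wins.

```python
import numpy as np

# Hypothetical points on the two crossing lines y = x and y = -x.
ts = [-3, -2, -1, 1, 2, 3]
X = np.array([[t, t] for t in ts] + [[t, -t] for t in ts], dtype=float)
a = X - X.mean(axis=0)
M = a.T @ a / len(X)
vals = np.linalg.eigvalsh(M)
print(M, vals)   # M is (14/3) * identity, so both eigenvalues equal 14/3
```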
