Why Does SVD Give Us the Principal Components

March 16, 2025

A bit of an odd post today, but this has been on my mind lately. We've been using the SVD heavily in my machine learning class to find principal components of matrices, but I couldn't seem to wrap my head around why the SVD gave us the principal components until now. It feels like one of those things I will forget unless I write about it, so here we go. Posting about it here will make me come back to it more often.

Singular Value Decomposition (SVD) is a fundamental tool in linear algebra and data science. One of its most important applications is in Principal Component Analysis (PCA), where it helps us find the principal components of a dataset. But why does SVD naturally give us the principal components? Let’s break it down.

1. The Covariance Matrix and Eigenvectors

Given a mean-centered data matrix \( X \) (each feature has had its mean subtracted), we typically compute the covariance matrix to understand the variance in the data. The covariance matrix is defined as:

\[ \text{Cov}(X) = \frac{1}{m} X^T X \]

where each row of \( X \) is one of \( m \) data points and each column is one of \( n \) features. The principal components are the eigenvectors of this covariance matrix. Scaling by \( \frac{1}{m} \) rescales the eigenvalues but leaves the eigenvectors unchanged, so we can work with \( X^T X \) directly and use eigendecomposition to find:

\[ X^T X = P D P^T \]

where the columns of \( P \) are the eigenvectors and \( D \) is a diagonal matrix of eigenvalues. The key insight is that SVD provides this decomposition directly.
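To make the setup concrete, here is a minimal NumPy sketch of this eigendecomposition route. The random toy data and variable names are mine, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 5                       # 200 data points, 5 features
X = rng.normal(size=(m, n))
X = X - X.mean(axis=0)              # mean-center each feature

cov = (X.T @ X) / m                 # covariance matrix, shape (n, n)

# eigh is for symmetric matrices; it returns eigenvalues in
# ascending order, so flip both outputs to descending.
eigvals, P = np.linalg.eigh(cov)
eigvals, P = eigvals[::-1], P[:, ::-1]

print(P.shape)   # (5, 5) -- each column is a principal component
```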

2. Singular Value Decomposition (SVD)

The SVD of \( X \) is given by:

\[ X = U \Sigma V^T \]

Here:

  • The columns of \( U \) (left singular vectors) are eigenvectors of \( X X^T \).
  • The columns of \( V \) (right singular vectors) are eigenvectors of \( X^T X \).
  • \( \Sigma \) is a diagonal matrix of singular values \( \sigma_i \); as we'll see below, \( \sigma_i^2 \) are the eigenvalues of \( X^T X \).
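These claims are easy to check numerically. A quick sketch, continuing with the toy `X` from above (eigenvector signs are only determined up to a flip, hence the absolute values):

```python
# Thin SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Columns of V (rows of Vt) should be eigenvectors of X^T X ...
lam_v, eV = np.linalg.eigh(X.T @ X)
lam_v, eV = lam_v[::-1], eV[:, ::-1]
print(np.allclose(np.abs(Vt.T), np.abs(eV)))       # True (signs may flip)

# ... columns of U should be eigenvectors of X X^T ...
lam_u, eU = np.linalg.eigh(X @ X.T)
lam_u, eU = lam_u[::-1], eU[:, ::-1]
print(np.allclose(np.abs(U), np.abs(eU[:, :n])))   # True (signs may flip)

# ... and the squared singular values should be the eigenvalues.
print(np.allclose(s**2, lam_v))                    # True
```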

3. Relationship Between SVD and PCA

When performing PCA, we typically compute the eigenvectors of \( X^T X \). But from SVD, we know:

\[ X^T X = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T U^T U \Sigma V^T = V \Sigma^2 V^T \]

using \( U^T U = I \) (the columns of \( U \) are orthonormal) and writing \( \Sigma^2 \) for the diagonal matrix \( \Sigma^T \Sigma \) of squared singular values. Comparing this with the eigendecomposition from earlier:

\[ P D P^T = V \Sigma^2 V^T \]

This shows that the columns of \( V \) are the eigenvectors of \( X^T X \), which means they are the principal components, and that the eigenvalues in \( D \) are the squared singular values. Therefore, **SVD directly provides the principal components in \( V \)**.
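One practical payoff: the PCA scores (the data expressed in the principal-component basis) are \( XV \), which by the decomposition is just \( U \Sigma \), so we never need to form \( X^T X \) at all. A small sketch, again continuing with the toy data:

```python
# PCA scores: projecting X onto the principal components gives
# X V, which by X = U Sigma V^T equals U Sigma.
k = 2                                   # keep the top two components
scores_v = X @ Vt.T[:, :k]              # project onto columns of V
scores_u = U[:, :k] * s[:k]             # U Sigma, truncated to k columns
print(np.allclose(scores_v, scores_u))  # True
```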

4. Row-Stacked vs. Column-Stacked Data

If the data is:

  • Row-stacked (each row is a data point), then the principal components are the columns of **\( V \)**.
  • Column-stacked (each column is a data point), then \( X^T X \) and \( X X^T \) swap roles, and the principal components are the columns of **\( U \)**.
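This is easy to mix up in practice, so here is a quick check with the same toy data that transposing the matrix just swaps which factor holds the components:

```python
# Same toy data under both stacking conventions.
X_rows = X      # shape (m, n): rows are data points    -> components in V
X_cols = X.T    # shape (n, m): columns are data points -> components in U

_, _, Vt_r = np.linalg.svd(X_rows, full_matrices=False)
U_c, _, _ = np.linalg.svd(X_cols, full_matrices=False)

# Both conventions recover the same principal components (up to sign).
print(np.allclose(np.abs(Vt_r.T), np.abs(U_c)))    # True
```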

Conclusion

The SVD naturally gives us the principal components because it decomposes the data into a form where the right singular vectors correspond to the eigenvectors of the covariance matrix. This is why SVD is widely used in PCA and other dimensionality reduction techniques!