When working in data analysis it is almost impossible to avoid using linear algebra, even if it is in the background, e.g. in simple linear regression. In this post I want to discuss one of the most important theorems of finite-dimensional vector spaces: the spectral theorem. The objective is not to give a complete and rigorous treatment of the subject, but rather to show the main ingredients, some examples and applications.
- We are going to restrict to real vector spaces for simplicity. The complex case has an analogous treatment.
- Because all \(n\)-dimensional vector spaces are isomorphic, we will work on \(V = \mathbb{R}^n\).
- We consider \(\mathbb{R}^n\) as an inner product space with respect to the standard Euclidean inner product \(\langle \cdot, \cdot \rangle\).
Eigenvalues and Eigenvectors
Let \(A\in M_n(\mathbb{R})\) be an \(n\times n\) matrix with real entries. A scalar \(\lambda\in\mathbb{C}\) is an eigenvalue of \(A\) if there exists a non-zero vector \(v\in \mathbb{C}^n\) such that \(Av = \lambda v\). The vector \(v\) is said to be an eigenvector of \(A\) associated to \(\lambda\). We denote by \(E(\lambda)\) the subspace generated by all the eigenvectors associated to \(\lambda\). The set of eigenvalues of \(A\), denoted by \(\text{spec}(A)\), is called the spectrum of \(A\).
We can rewrite the eigenvalue equation as \((A - \lambda I)v = 0\), where \(I\in M_n(\mathbb{R})\) denotes the identity matrix. Hence, computing eigenvectors is equivalent to finding elements in the kernel of \(A - \lambda I\). A necessary and sufficient condition for a non-trivial kernel is \(\det (A - \lambda I)=0\). Thus, in order to find eigenvalues we need to compute the roots of the characteristic polynomial \(\det (A - \lambda I)\).
Remark: By the Fundamental Theorem of Algebra, eigenvalues always exist, but they may be complex numbers.
Remark: Note that \(A\) is invertible if and only if \(0 \notin \text{spec}(A)\).
Example 1 (Part I)
Consider the matrix
\[ A = \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \]
- Eigenvalues
Let us compute and factorize the characteristic polynomial to find the eigenvalues:
\[ \det(A -\lambda I) = (1 - \lambda)^2 - 2^2 = (1 - \lambda + 2) (1 - \lambda - 2) = - (3 - \lambda)(1 + \lambda) \]
Hence, we have two different eigenvalues \(\lambda_1 = 3\) and \(\lambda_2 = -1\).
- Eigenvectors
For \(\lambda_1 = 3\) we have
\[ A-3I = \left( \begin{array}{cc} -2 & 2\\ 2 & - 2 \end{array} \right) \]
Hence,
\[ E(\lambda_1 = 3) = \text{span} \left\{ \left( \begin{array}{c} 1\\ 1 \end{array} \right) \right \} \]
Similarly, for \(\lambda_2 = -1\) we have
\[ A + I = \left( \begin{array}{cc} 2 & 2\\ 2 & 2 \end{array} \right) \]
so we easily see that
\[ E(\lambda_2 = -1) = \text{span} \left\{ \left( \begin{array}{c} 1\\ -1 \end{array} \right) \right \} \]
We can find eigenvalues and eigenvectors in R as follows:
# Define matrix A.
A <- matrix(c(1,2,2,1), nrow = 2, byrow = TRUE)
A
## [,1] [,2]
## [1,] 1 2
## [2,] 2 1
sp_decomp <- eigen(x = A)
- Eigenvalues
sp_decomp$values
## [1] 3 -1
- Eigenvectors
sp_decomp$vectors
## [,1] [,2]
## [1,] 0.7071068 -0.7071068
## [2,] 0.7071068 0.7071068
Let us verify the eigenvalue equations:
lambda_1 <- sp_decomp$values[1]
lambda_2 <- sp_decomp$values[2]
v_1 <- sp_decomp$vectors[,1]
v_2 <- sp_decomp$vectors[,2]
(A %*% v_1 == lambda_1*v_1) & (A %*% v_2 == lambda_2*v_2)
## [,1]
## [1,] TRUE
## [2,] TRUE
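Since these are floating-point computations, an exact comparison with == can fail in general; a safer check compares the vectors up to a numerical tolerance. A minimal sketch:
# Compare up to numerical tolerance instead of exact equality.
isTRUE(all.equal(as.numeric(A %*% v_1), lambda_1 * v_1)) &&
  isTRUE(all.equal(as.numeric(A %*% v_2), lambda_2 * v_2))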
Symmetric Matrices
We want to restrict now to a certain subspace of matrices, namely symmetric matrices. Recall that a matrix \(A\) is symmetric if \(A^T = A\), i.e. it is equal to its transpose.
An important property of symmetric matrices is that their spectrum consists of real eigenvalues. To see this, let \(A\in M_n(\mathbb{R}) \subset M_n(\mathbb{C})\) be a symmetric matrix with eigenvalue \(\lambda\) and corresponding eigenvector \(v\). Assume \(\|v\| = 1\); then, computing with the Hermitian inner product on \(\mathbb{C}^n\),
\[ \lambda = \lambda \langle v, v \rangle = \langle \lambda v, v \rangle = \langle Av, v \rangle = \langle v, A^T v \rangle = \langle v, Av \rangle = \langle v, \lambda v \rangle = \bar{\lambda} \langle v, v \rangle = \bar{\lambda} \] That is, \(\lambda\) is equal to its complex conjugate and hence real. In particular, the characteristic polynomial of a symmetric matrix splits into a product of degree-one polynomials with real coefficients. This property is exactly what the following theorem requires.
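As a quick numerical illustration of this fact, here is a sketch with a randomly generated symmetric matrix (the seed and the size are arbitrary choices):
set.seed(2020)                # arbitrary seed, for reproducibility
M <- matrix(rnorm(9), nrow = 3)
S <- (M + t(M)) / 2           # symmetrize M
eigen(S)$values               # the eigenvalues are all real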
Theorem (Schur): Let \(A\in M_n(\mathbb{R})\) be a matrix such that its characteristic polynomial splits (as above), then there exists an orthonormal basis of \(\mathbb{R}^n\) such that \(A\) is upper-triangular.
Proof: One can use induction on the dimension \(n\). We omit the (non-trivial) details.
Remark: When we say that there exists an orthonormal basis of \(\mathbb{R}^n\) such that \(A\) is upper-triangular, we regard \(A:\mathbb{R}^n\longrightarrow \mathbb{R}^n\) as a linear transformation, whose matrix with respect to that basis is upper-triangular.
The following theorem is a straightforward consequence of Schur’s theorem.
Theorem: A matrix \(A\in M_n(\mathbb{R})\) is symmetric if and only if there exists an orthonormal basis of \(\mathbb{R}^n\) consisting of eigenvectors of \(A\).
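As a quick numerical check with the symmetric matrix \(A\) of Example 1 (a sketch): the eigenvectors returned by eigen are orthonormal, i.e. stacking them as columns of a matrix \(Q\) gives \(Q^TQ = I\).
# The eigenvectors of the symmetric matrix A form an orthonormal basis:
# crossprod returns Q^T Q, which should be (numerically) the identity matrix.
crossprod(sp_decomp$vectors)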
Example 2
Let us see a concrete example where the statement of the theorem above does not hold. Consider the matrix
\[ B = \left( \begin{array}{cc} 1 & 2\\ 0 & 1 \end{array} \right) \]
- Eigenvalues
\[ \det(B -\lambda I) = (1 - \lambda)^2 \] Hence, the spectrum of \(B\) consists of the single value \(\lambda = 1\).
- Eigenvectors
The kernel of the matrix
\[ B - I = \left( \begin{array}{cc} 0 & 2\\ 0 & 0 \end{array} \right) \]
is the subspace
\[ E(\lambda = 1) = \text{span} \left\{ \left( \begin{array}{c} 1\\ 0 \end{array} \right) \right \} \] In particular, we see that the eigenspace spanned by all the eigenvectors of \(B\) has dimension one, so we cannot find a basis of eigenvectors for \(\mathbb{R}^2\).
Let us see how to compute this in R:
B <- matrix(c(1,2,0,1), nrow = 2, byrow = TRUE)
B
## [,1] [,2]
## [1,] 1 2
## [2,] 0 1
sp_decomp <- eigen(x = B)
sp_decomp$values
## [1] 1 1
sp_decomp$vectors
## [,1] [,2]
## [1,] 1 -1.000000e+00
## [2,] 0 1.110223e-16
Observe that these two columns are linearly dependent.
The following is another important result for symmetric matrices.
Proposition: If \(\lambda_1\) and \(\lambda_2\) are two distinct eigenvalues of a symmetric matrix \(A\) with corresponding eigenvectors \(v_1\) and \(v_2\) then \(v_1\) and \(v_2\) are orthogonal.
Proof: We argue as before:
\[ \lambda_1\langle v_1, v_2 \rangle = \langle \lambda_1 v_1, v_2 \rangle = \langle A v_1, v_2 \rangle = \langle v_1, A v_2 \rangle = \langle v_1, \lambda_2 v_2 \rangle = \bar{\lambda}_2 \langle v_1, v_2 \rangle = \lambda_2 \langle v_1, v_2 \rangle \] Since \(\lambda_1 \neq \lambda_2\), this proves that \(\langle v_1, v_2 \rangle\) must be zero.
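We can check this numerically for the matrix \(A\) of Example 1, whose eigenvectors v_1 and v_2 were stored above; a minimal sketch:
# Inner product of eigenvectors associated to the distinct eigenvalues 3 and -1.
sum(v_1 * v_2)   # should be (numerically) zero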
Orthogonal Projections
Let \(W \leq \mathbb{R}^n\) be a subspace. We define its orthogonal complement as \[ W^{\perp} := \{ v \in \mathbb{R}^n \:|\: \langle v, w \rangle = 0 \:\forall \: w \in W \} \]
A matrix \(P\in M_n(\mathbb{R})\) is said to be an orthogonal projection if
- \(P^2 = P\)
- \(\text{ran}(P)^\perp = \ker(P)\)
Here
- \(\ker(P)=\{v \in \mathbb{R}^n \:|\: Pv = 0\}\) denotes the kernel of \(P\).
- \(\text{ran}(P) = \{ Pv \: | \: v \in \mathbb{R}^n\}\) is the range (or image) of \(P\).
Example 3
Let us consider a non-zero vector \(u\in\mathbb{R}^n\). We can use the inner product to construct the orthogonal projection onto the span of \(u\) as follows:
\[ P_{u}:=\frac{1}{\|u\|^2}\langle u, \cdot \rangle u : \mathbb{R}^n \longrightarrow \{\alpha u\: | \: \alpha\in\mathbb{R}\} \] Note that:
\[ P^2_u(v) = \frac{1}{\|u\|^4}\langle u, \langle u , v \rangle u \rangle u = \frac{1}{\|u\|^2}\langle u, v \rangle u = P_u(v) \]
The condition \(\text{ran}(P_u)^\perp = \ker(P_u)\) follows directly from the definition, since \(\ker(P_u) = \{v \:|\: \langle u, v\rangle = 0\} = \text{span}\{u\}^\perp\). Hence, \(P_u\) is an orthogonal projection.
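As a sketch, this construction can be written in R for a general non-zero vector \(u\) (the helper name proj_u and the example vectors are just an illustration):
# Orthogonal projection of v onto span(u), for any non-zero vector u.
proj_u <- function(u, v) {
  as.numeric(crossprod(u, v)) / sum(u^2) * u
}
# For example, projecting (2, 0) onto the span of (1, 1) gives (1, 1).
proj_u(u = c(1, 1), v = c(2, 0))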
Example 1 (Part II)
Let us compute the orthogonal projections onto the eigenspaces of the matrix
\[ A = \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \]
For \(\lambda_1 = 3\), recall that
\[ E(\lambda_1 = 3) = \text{span} \left\{ \left( \begin{array}{c} 1\\ 1 \end{array} \right) \right \} \]
From Example 3 above we see that
\[ P(\lambda_1 = 3) = \frac{1}{2}\left\langle \left( \begin{array}{c} 1 \\ 1 \end{array} \right) , \cdot \right\rangle \left( \begin{array}{c} 1 \\ 1 \end{array} \right) : \mathbb{R}^2\longrightarrow E(\lambda_1 = 3) \]
In matrix form (with respect to the canonical basis of \(\mathbb{R}^2\)) this is given by
\[ P(\lambda_1 = 3) = \frac{1}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) \]
A similar argument shows that
\[ P(\lambda_2 = -1) = \frac{1}{2} \left( \begin{array}{cc} 1 & - 1 \\ -1 & 1 \end{array} \right) \]
Observe that
\[ P(\lambda_1 = 3)P(\lambda_2 = -1) = \frac{1}{4} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) \left( \begin{array}{cc} 1 & - 1 \\ -1 & 1 \end{array} \right) = \left( \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right) \]
as expected, since the corresponding eigenvectors are orthogonal (see the Proposition above).
Let us see how to compute the orthogonal projections in R.
sp_decomp <- eigen(x = A)
# Construct projection function onto span(v).
# Note: v is assumed to have unit norm, as is the case for eigen's output.
proj <- function(v, x) {
  inner_product <- as.numeric(v %*% x)
  inner_product * v
}
# Apply to the canonical basis.
proj_1 <- cbind(matrix(data = proj(v = sp_decomp$vectors[, 1], x = c(1, 0)), ncol = 1),
matrix(data = proj(v = sp_decomp$vectors[, 1], x = c(0, 1)), ncol = 1))
proj_2 <- cbind(matrix(data = proj(v = sp_decomp$vectors[, 2], x = c(1, 0)), ncol = 1),
matrix(data = proj(v = sp_decomp$vectors[, 2], x = c(0, 1)), ncol = 1))
proj_1
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 0.5 0.5
proj_2
## [,1] [,2]
## [1,] 0.5 -0.5
## [2,] -0.5 0.5
Note, for example, that
proj_1 %*% proj_1
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 0.5 0.5
and
proj_1 %*% proj_2
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
The Spectral Theorem
Now we are ready to understand the statement of the spectral theorem.
Theorem (Spectral Theorem for Matrices): Let \(A\in M_n(\mathbb{R})\) be a symmetric matrix, with distinct eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_k\). Let \(E(\lambda_i)\) be the eigenspace of \(A\) corresponding to the eigenvalue \(\lambda_i\), and let \(P(\lambda_i):\mathbb{R}^n\longrightarrow E(\lambda_i)\) be the corresponding orthogonal projection of \(\mathbb{R}^n\) onto \(E(\lambda_i)\). Then the following statements are true:
a. There is a direct sum decomposition \(\mathbb{R}^n = \bigoplus_{i=1}^{k} E(\lambda_i)\).
b. If \(B(\lambda_i) := \bigoplus_{j\neq i} E(\lambda_j)\), then \(E(\lambda_i)^{\perp} = B(\lambda_i)\).
c. The orthogonal projections satisfy \(P(\lambda_i)P(\lambda_j)=\delta_{ij}P(\lambda_i)\) for \(1 \leq i, j \leq k\). Here \(\delta_{ij}\) denotes the Kronecker delta.
d. The identity matrix can be decomposed as \(I = \sum_{i=1}^{k} P(\lambda_i)\).
e. The matrix \(A\) can be written as \(A = \sum_{i=1}^{k} \lambda_i P(\lambda_i)\).
A Sketch of the Proof
a. This follows easily from the discussion on symmetric matrices above.
b. This follows from the Proposition above and the dimension theorem (to prove the two inclusions).
c. This also follows from the Proposition above.
d. This follows from a. and b.
e. For \(v\in\mathbb{R}^n\), let us decompose it as
\[ v = \sum_{i=1}^{k} v_i \]
where \(v_i\in E(\lambda_i)\). Then,
\[ Av = A\left(\sum_{i=1}^{k} v_i\right) = \sum_{i=1}^{k} A v_i = \sum_{i=1}^{k} \lambda_iv_i = \left( \sum_{i=1}^{k} \lambda_i P(\lambda_i)\right)v \]
Some Applications
Functional Calculus
For many applications (e.g. computing the heat kernel of the graph Laplacian) one is interested in computing the exponential of a symmetric matrix \(A\), defined by the (convergent) series
\[ e^A:= \sum_{k=0}^{\infty}\frac{A^k}{k!} \]
In practice, to compute the exponential we can use the relation \(A = Q D Q^{-1}\), where \(Q\) is orthogonal and \(D\) is diagonal (see the Comments section below):
\[ e^A= \sum_{k=0}^{\infty}\frac{(Q D Q^{-1})^k}{k!} = Q\left(\sum_{k=0}^{\infty}\frac{D^k}{k!}\right)Q^{-1} = Qe^{D}Q^{-1} \]
and since \(D\) is diagonal, \(e^{D}\) is again a diagonal matrix with diagonal entries \(e^{\lambda_i}\).
Example 1 (Part IV)
We compute \(e^A\). First let us calculate \(e^D\) using the expm package.
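The matrices \(Q\) and \(D\) used below come from the eigendecomposition of \(A\); a minimal sketch of their construction from the eigen output computed earlier:
# Q: orthonormal eigenvectors as columns; D: eigenvalues on the diagonal.
Q <- sp_decomp$vectors
D <- diag(sp_decomp$values)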
expm::expm(D)
## [,1] [,2]
## [1,] 20.08554 0.0000000
## [2,] 0.00000 0.3678794
Note that
exp(3)
## [1] 20.08554
exp(-1)
## [1] 0.3678794
Now we compute the product
Q %*% expm::expm(D) %*% t(Q)
## [,1] [,2]
## [1,] 10.226708 9.858829
## [2,] 9.858829 10.226708
This coincides with the result obtained by applying expm to \(A\) directly.
expm::expm(A)
## [,1] [,2]
## [1,] 10.226708 9.858829
## [2,] 9.858829 10.226708
In a similar manner, one can easily show that for any polynomial \(p(x)\) one has
\[ p(A) = \sum_{i=1}^{k}p(\lambda_i)P(\lambda_i) \]
Moreover, one can extend this relation to the space of continuous functions \(f:\text{spec}(A)\subset\mathbb{R}\longrightarrow \mathbb{C}\); this extension is known as the continuous functional calculus, and the corresponding identity \(\text{spec}(f(A)) = f(\text{spec}(A))\) is the spectral mapping theorem.
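For instance, for \(p(x) = x^2 + 1\) we can compare both sides of this identity in R; a minimal sketch, reusing proj_1 and proj_2 from above:
# Left-hand side: p(A) = A^2 + I.
A %*% A + diag(2)
# Right-hand side: p(3) * P(lambda_1) + p(-1) * P(lambda_2).
(3^2 + 1) * proj_1 + ((-1)^2 + 1) * proj_2
# Both give the same 2 x 2 matrix.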
Remark: The Cayley–Hamilton theorem says that every square matrix (over a commutative ring) satisfies its own characteristic polynomial.
Example 1 (Part V)
Let us verify this for the matrix \(A\). Its characteristic polynomial is \((1 - \lambda)^2 - 4\), so Cayley–Hamilton gives \((I - A)^2 - 4I = 0\):
# Evaluate the characteristic polynomial at A: (I - A)^2 - 4 I.
(diag(2) - A) %*% (diag(2) - A) - 4*diag(2)
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
Principal Component Analysis
Given a (column-centered) observation matrix \(X\in M_{n\times p}(\mathbb{R})\), the (unnormalized) covariance matrix \(A:= X^T X \in M_p(\mathbb{R})\) is clearly symmetric and therefore orthogonally diagonalizable. In this context, principal component analysis amounts to reducing the dimensionality of the data by projecting onto the subspace generated by the eigenvectors of \(A\) with the largest eigenvalues.
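As a sketch of this connection, we can compare the spectral decomposition of the sample covariance matrix with the output of prcomp on a built-in dataset (the choice of iris is just an illustration):
# Centered data matrix and its sample covariance matrix.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
C <- t(X) %*% X / (nrow(X) - 1)
# The eigenvalues of C are the variances of the principal components.
eigen(C)$values
prcomp(iris[, 1:4])$sdev^2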
Spectrum of the Graph Laplacian
In various applications, like the spectral embedding non-linear dimensionality reduction algorithm or spectral clustering, the spectral decomposition of the graph Laplacian is of much interest (see for example PyData Berlin 2018: On Laplacian Eigenmaps for Dimensionality Reduction).
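As a minimal sketch, for a small undirected graph one can build the (unnormalized) Laplacian, i.e. the degree matrix minus the adjacency matrix, and compute its spectrum; the path graph on four nodes used here is just an illustration:
# Adjacency matrix of a path graph on 4 nodes.
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
L <- diag(rowSums(W)) - W   # graph Laplacian: symmetric, positive semi-definite
eigen(L)$values             # real and non-negative; the smallest eigenvalue is 0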
Many More …
Spectral Theorem for Bounded and Unbounded Operators
This is just the beginning! There is a beautiful and rich theory on the spectral analysis of bounded and unbounded self-adjoint operators on Hilbert spaces, with many applications (e.g. Quantum Mechanics, Fourier Decomposition, Signal Processing, …).
Comments
As a consequence of this theorem we see that there exists an orthogonal matrix \(Q\in SO(n)\) (i.e. \(QQ^T=Q^TQ=I\) and \(\det(Q)=1\)) such that
\[ A = QDQ^{-1} \]
where \(D\) is a diagonal matrix containing the eigenvalues of \(A\) (with multiplicity). The matrix \(Q\) is constructed by stacking the normalized orthogonal eigenvectors of \(A\) as column vectors. Since \(Q\) is orthogonal, \(Q^{-1} = Q^T\).
Example 1 (Part III)
Let us continue working with the matrix
\[ A = \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \]
We have already verified the first three statements (a., b. and c.) of the spectral theorem in Part I and Part II. For d., let us simply compute \(P(\lambda_1 = 3) + P(\lambda_2 = -1)\):
\[ \frac{1}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) + \frac{1}{2} \left( \begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array} \right) = \left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) = I \]
Finally, for e. we calculate
\[ \frac{3}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) - \frac{1}{2} \left( \begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array} \right) = \left( \begin{array}{cc} 1 & 2 \\ 2 & 1 \end{array} \right) = A \]
In R we can easily get the same results; the following is a minimal sketch reusing proj_1 and proj_2 from above:
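# Statement d.: the projections sum to the identity matrix.
proj_1 + proj_2
# Statement e.: the eigenvalue-weighted sum of projections recovers A.
3 * proj_1 + (-1) * proj_2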
The matrix \(Q\) is simply given by
\[ Q = \frac{1}{\sqrt{2}} \left( \begin{array}{cc} 1 & -1 \\ 1 & 1 \end{array} \right) \] Observe that
\[ Q^T A Q = \frac{1}{\sqrt{2}} \left( \begin{array}{cc} 1 & 1 \\ -1 & 1 \end{array} \right) \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \frac{1}{\sqrt{2}} \left( \begin{array}{cc} 1 & -1 \\ 1 & 1 \end{array} \right) = \left( \begin{array}{cc} 3 & 0\\ 0 & -1 \end{array} \right) = D \] In R this is an immediate computation; a minimal sketch using \(Q\) and \(D\) as constructed from the eigendecomposition above:
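# Q and D were constructed above from sp_decomp <- eigen(x = A).
t(Q) %*% A %*% Q   # should (numerically) recover the diagonal matrix D
Q %*% D %*% t(Q)   # recovers A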
This completes the verification of the spectral theorem in this simple example.