When working in data analysis it is almost impossible to avoid using linear algebra, even if it is in the background, e.g. in simple linear regression. In this post I want to discuss one of the most important theorems of finite-dimensional vector spaces: the spectral theorem. The objective is not to give a complete and rigorous treatment of the subject, but rather to show the main ingredients, some examples and applications.
- We are going to restrict to real vector spaces for simplicity. The complex case has an analogous treatment.
- Because all \(n\)-dimensional vector spaces are isomorphic, we will work on \(V = \mathbb{R}^n\).
- We consider \(\mathbb{R}^n\) as an inner product space with respect to the standard Euclidean inner product \(\langle \cdot, \cdot \rangle\).
Eigenvalues and Eigenvectors
Let \(A\in M_n(\mathbb{R})\) be an \(n\times n\) matrix with real entries. A scalar \(\lambda\in\mathbb{C}\) is an eigenvalue of \(A\) if there exists a non-zero vector \(v\in \mathbb{C}^n\) such that \(Av = \lambda v\). The vector \(v\) is said to be an eigenvector of \(A\) associated to \(\lambda\). We denote by \(E(\lambda)\) the subspace generated by all the eigenvectors associated to \(\lambda\). The set of eigenvalues of \(A\), denoted by \(\text{spec}(A)\), is called the spectrum of \(A\).
We can rewrite the eigenvalue equation as \((A - \lambda I)v = 0\), where \(I\in M_n(\mathbb{R})\) denotes the identity matrix. Hence, computing eigenvectors is equivalent to finding elements in the kernel of \(A - \lambda I\). A necessary and sufficient condition for a non-trivial kernel is \(\det (A - \lambda I)=0\). Thus, in order to find eigenvalues we need to compute the roots of the characteristic polynomial \(\det (A - \lambda I)\).
Remark: By the Fundamental Theorem of Algebra, eigenvalues always exist, but they may be complex numbers.
Remark: Note that \(A\) is invertible if and only if \(0 \notin \text{spec}(A)\).
Example 1 (Part I)
Consider the matrix
\[ A = \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \]
- Eigenvalues
Let us compute and factorize the characteristic polynomial to find the eigenvalues:
\[ \det(A -\lambda I) = (1 - \lambda)^2 - 2^2 = (1 - \lambda + 2) (1 - \lambda - 2) = - (3 - \lambda)(1 + \lambda) \]
Hence, we have two different eigenvalues \(\lambda_1 = 3\) and \(\lambda_2 = -1\).
- Eigenvectors
For \(\lambda_1 = 3\) we have
\[ A-3I = \left( \begin{array}{cc} -2 & 2\\ 2 & - 2 \end{array} \right) \]
Hence,
\[ E(\lambda_1 = 3) = \text{span} \left\{ \left( \begin{array}{c} 1\\ 1 \end{array} \right) \right \} \]
Similarly, for \(\lambda_2 = -1\) we have
\[ A + I = \left( \begin{array}{cc} 2 & 2\\ 2 & 2 \end{array} \right) \]
so we easily see that
\[ E(\lambda_2 = -1) = \text{span} \left\{ \left( \begin{array}{c} 1\\ -1 \end{array} \right) \right \} \]
We can find eigenvalues and eigenvectors in R as follows:
# Define matrix A.
A <- matrix(c(1,2,2,1), nrow = 2, byrow = TRUE)
A
## [,1] [,2]
## [1,] 1 2
## [2,] 2 1
sp_decomp <- eigen(x = A)
- Eigenvalues
sp_decomp$values
## [1] 3 -1
- Eigenvectors
sp_decomp$vectors
## [,1] [,2]
## [1,] 0.7071068 -0.7071068
## [2,] 0.7071068 0.7071068
Let us verify the eigenvalue equations:
lambda_1 <- sp_decomp$values[1]
lambda_2 <- sp_decomp$values[2]
v_1 <- sp_decomp$vectors[,1]
v_2 <- sp_decomp$vectors[,2]
(A %*% v_1 == lambda_1*v_1) & (A %*% v_2 == lambda_2*v_2)
## [,1]
## [1,] TRUE
## [2,] TRUE
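Since these are floating-point computations, an exact comparison with == can fail in general; a safer check compares the vectors up to a numerical tolerance. A minimal sketch:
# Compare up to numerical tolerance instead of exact equality.
isTRUE(all.equal(as.numeric(A %*% v_1), lambda_1 * v_1)) &&
  isTRUE(all.equal(as.numeric(A %*% v_2), lambda_2 * v_2))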
Symmetric Matrices
We want to restrict now to a certain subspace of matrices, namely symmetric matrices. Recall that a matrix \(A\) is symmetric if \(A^T = A\), i.e. it is equal to its transpose.
An important property of symmetric matrices is that their spectrum consists of real eigenvalues. To see this, let \(A\in M_n(\mathbb{R}) \subset M_n(\mathbb{C})\) be a symmetric matrix with eigenvalue \(\lambda\) and corresponding eigenvector \(v\). Assume \(\|v\| = 1\); then, computing with the Hermitian inner product on \(\mathbb{C}^n\),
\[ \lambda = \lambda \langle v, v \rangle = \langle \lambda v, v \rangle = \langle Av, v \rangle = \langle v, A^T v \rangle = \langle v, Av \rangle = \langle v, \lambda v \rangle = \bar{\lambda} \langle v, v \rangle = \bar{\lambda} \] That is, \(\lambda\) is equal to its complex conjugate and hence real. In particular, the characteristic polynomial of a symmetric matrix splits into a product of degree-one polynomials with real coefficients. This property is exactly what the following theorem requires.
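As a quick numerical illustration of this fact, here is a sketch with a randomly generated symmetric matrix (the seed and the size are arbitrary choices):
set.seed(2020)                # arbitrary seed, for reproducibility
M <- matrix(rnorm(9), nrow = 3)
S <- (M + t(M)) / 2           # symmetrize M
eigen(S)$values               # the eigenvalues are all real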
Theorem (Schur): Let \(A\in M_n(\mathbb{R})\) be a matrix such that its characteristic polynomial splits (as above), then there exists an orthonormal basis of \(\mathbb{R}^n\) such that \(A\) is upper-triangular.
Proof: One can use induction on the dimension \(n\). We omit the (non-trivial) details.
Remark: When we say that there exists an orthonormal basis of \(\mathbb{R}^n\) such that \(A\) is upper-triangular, we regard \(A:\mathbb{R}^n\longrightarrow \mathbb{R}^n\) as a linear transformation, whose matrix with respect to that basis is upper-triangular.
The following theorem is a straightforward consequence of Schur’s theorem.
Theorem: A matrix \(A\in M_n(\mathbb{R})\) is symmetric if and only if there exists an orthonormal basis of \(\mathbb{R}^n\) consisting of eigenvectors of \(A\).
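As a quick numerical check with the symmetric matrix \(A\) of Example 1 (a sketch): the eigenvectors returned by eigen are orthonormal, i.e. stacking them as columns of a matrix \(Q\) gives \(Q^TQ = I\).
# The eigenvectors of the symmetric matrix A form an orthonormal basis:
# crossprod returns Q^T Q, which should be (numerically) the identity matrix.
crossprod(sp_decomp$vectors)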
Example 2
Let us see a concrete example where the statement of the theorem above does not hold. Consider the matrix
\[ B = \left( \begin{array}{cc} 1 & 2\\ 0 & 1 \end{array} \right) \]
- Eigenvalues
\[ \det(B -\lambda I) = (1 - \lambda)^2 \] Hence, the spectrum of \(B\) consists of the single value \(\lambda = 1\).
- Eigenvectors
The kernel of the matrix
\[ B - I = \left( \begin{array}{cc} 0 & 2\\ 0 & 0 \end{array} \right) \]
is the subspace
\[ E(\lambda = 1) = \text{span} \left\{ \left( \begin{array}{c} 1\\ 0 \end{array} \right) \right \} \] In particular, we see that the eigenspace spanned by all the eigenvectors of \(B\) has dimension one, so we cannot find a basis of eigenvectors for \(\mathbb{R}^2\).
Let us see how to compute this in R:
B <- matrix(c(1,2,0,1), nrow = 2, byrow = TRUE)
B
## [,1] [,2]
## [1,] 1 2
## [2,] 0 1
sp_decomp <- eigen(x = B)
sp_decomp$values
## [1] 1 1
sp_decomp$vectors
## [,1] [,2]
## [1,] 1 -1.000000e+00
## [2,] 0 1.110223e-16
Observe that these two columns are linearly dependent.
The following is another important result for symmetric matrices.
Proposition: If \(\lambda_1\) and \(\lambda_2\) are two distinct eigenvalues of a symmetric matrix \(A\) with corresponding eigenvectors \(v_1\) and \(v_2\) then \(v_1\) and \(v_2\) are orthogonal.
Proof: We argue as before:
\[ \lambda_1\langle v_1, v_2 \rangle = \langle \lambda_1 v_1, v_2 \rangle = \langle A v_1, v_2 \rangle = \langle v_1, A v_2 \rangle = \langle v_1, \lambda_2 v_2 \rangle = \bar{\lambda}_2 \langle v_1, v_2 \rangle = \lambda_2 \langle v_1, v_2 \rangle \] Since \(\lambda_1 \neq \lambda_2\), this proves that \(\langle v_1, v_2 \rangle\) must be zero.
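We can check this numerically for the matrix \(A\) of Example 1, whose eigenvectors v_1 and v_2 were stored above; a minimal sketch:
# Inner product of eigenvectors associated to the distinct eigenvalues 3 and -1.
sum(v_1 * v_2)   # should be (numerically) zero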
Orthogonal Projections
Let \(W \leq \mathbb{R}^n\) be a subspace. We define its orthogonal complement as \[ W^{\perp} := \{ v \in \mathbb{R}^n \:|\: \langle v, w \rangle = 0 \:\forall \: w \in W \} \]
A matrix \(P\in M_n(\mathbb{R})\) is said to be an orthogonal projection if
- \(P^2 = P\)
- \(\text{ran}(P)^\perp = \ker(P)\)
Here
- \(\ker(P)=\{v \in \mathbb{R}^n \:|\: Pv = 0\}\) denotes the kernel of \(P\).
- \(\text{ran}(P) = \{ Pv \: | \: v \in \mathbb{R}^n\}\) is the range (or image) of \(P\).
Example 3
Let us consider a non-zero vector \(u\in\mathbb{R}^n\). We can use the inner product to construct the orthogonal projection onto the span of \(u\) as follows:
\[ P_{u}:=\frac{1}{\|u\|^2}\langle u, \cdot \rangle u : \mathbb{R}^n \longrightarrow \{\alpha u\: | \: \alpha\in\mathbb{R}\} \] Note that:
\[ P^2_u(v) = \frac{1}{\|u\|^4}\langle u, \langle u , v \rangle u \rangle u = \frac{1}{\|u\|^2}\langle u, v \rangle u = P_u(v) \]
The condition \(\text{ran}(P_u)^\perp = \ker(P_u)\) follows directly from the definition, since \(\ker(P_u) = \{v \:|\: \langle u, v\rangle = 0\} = \text{span}\{u\}^\perp\). Hence, \(P_u\) is an orthogonal projection.
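As a sketch, this construction can be written in R for a general non-zero vector \(u\) (the helper name proj_u and the example vectors are just an illustration):
# Orthogonal projection of v onto span(u), for any non-zero vector u.
proj_u <- function(u, v) {
  as.numeric(crossprod(u, v)) / sum(u^2) * u
}
# For example, projecting (2, 0) onto the span of (1, 1) gives (1, 1).
proj_u(u = c(1, 1), v = c(2, 0))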
Example 1 (Part II)
Let us compute the orthogonal projections onto the eigenspaces of the matrix
\[ A = \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \]
For \(\lambda_1 = 3\), recall that
\[ E(\lambda_1 = 3) = \text{span} \left\{ \left( \begin{array}{c} 1\\ 1 \end{array} \right) \right \} \]
From Example 3 above we see that
\[ P(\lambda_1 = 3) = \frac{1}{2}\left\langle \left( \begin{array}{c} 1 \\ 1 \end{array} \right) , \cdot \right\rangle \left( \begin{array}{c} 1 \\ 1 \end{array} \right) : \mathbb{R}^2\longrightarrow E(\lambda_1 = 3) \]
In matrix form (with respect to the canonical basis of \(\mathbb{R}^2\)) this is given by
\[ P(\lambda_1 = 3) = \frac{1}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) \]
A similar argument shows that
\[ P(\lambda_2 = -1) = \frac{1}{2} \left( \begin{array}{cc} 1 & - 1 \\ -1 & 1 \end{array} \right) \]
Observe that
\[ P(\lambda_1 = 3)P(\lambda_2 = -1) = \frac{1}{4} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) \left( \begin{array}{cc} 1 & - 1 \\ -1 & 1 \end{array} \right) = \left( \begin{array}{cc} 0 & 0 \\ 0 & 0 \end{array} \right) \]
as expected, since the corresponding eigenvectors are orthogonal (see the Proposition above).
Let us see how to compute the orthogonal projections in R.
sp_decomp <- eigen(x = A)
# Construct projection function onto span(v).
# Note: v is assumed to have unit norm, as is the case for eigen's output.
proj <- function(v, x) {
  inner_product <- as.numeric(v %*% x)
  inner_product * v
}
# Apply to the canonical basis.
proj_1 <- cbind(matrix(data = proj(v = sp_decomp$vectors[, 1], x = c(1, 0)), ncol = 1),
matrix(data = proj(v = sp_decomp$vectors[, 1], x = c(0, 1)), ncol = 1))
proj_2 <- cbind(matrix(data = proj(v = sp_decomp$vectors[, 2], x = c(1, 0)), ncol = 1),
matrix(data = proj(v = sp_decomp$vectors[, 2], x = c(0, 1)), ncol = 1))
proj_1
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 0.5 0.5
proj_2
## [,1] [,2]
## [1,] 0.5 -0.5
## [2,] -0.5 0.5
Note, for example, that
proj_1 %*% proj_1
## [,1] [,2]
## [1,] 0.5 0.5
## [2,] 0.5 0.5
and
proj_1 %*% proj_2
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
The Spectral Theorem
Now we are ready to understand the statement of the spectral theorem.
Theorem (Spectral Theorem for Matrices): Let \(A\in M_n(\mathbb{R})\) be a symmetric matrix, with distinct eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_k\). Let \(E(\lambda_i)\) be the eigenspace of \(A\) corresponding to the eigenvalue \(\lambda_i\), and let \(P(\lambda_i):\mathbb{R}^n\longrightarrow E(\lambda_i)\) be the corresponding orthogonal projection of \(\mathbb{R}^n\) onto \(E(\lambda_i)\). Then the following statements are true:
a. There is a direct sum decomposition \(\mathbb{R}^n = \bigoplus_{i=1}^{k} E(\lambda_i)\).
b. If \(B(\lambda_i) := \bigoplus_{j\neq i} E(\lambda_j)\), then \(E(\lambda_i)^{\perp} = B(\lambda_i)\).
c. The orthogonal projections satisfy \(P(\lambda_i)P(\lambda_j)=\delta_{ij}P(\lambda_i)\) for \(1 \leq i, j \leq k\). Here \(\delta_{ij}\) denotes the Kronecker delta.
d. The identity matrix can be decomposed as \(I = \sum_{i=1}^{k} P(\lambda_i)\).
e. The matrix \(A\) can be written as \(A = \sum_{i=1}^{k} \lambda_i P(\lambda_i)\).
A Sketch of the Proof
a. This follows easily from the discussion on symmetric matrices above.
b. This follows from the Proposition above and the dimension theorem (to prove the two inclusions).
c. This also follows from the Proposition above.
d. This follows from a. and b.
e. For \(v\in\mathbb{R}^n\), let us decompose it as
\[ v = \sum_{i=1}^{k} v_i \]
where \(v_i\in E(\lambda_i)\). Then,
\[ Av = A\left(\sum_{i=1}^{k} v_i\right) = \sum_{i=1}^{k} A v_i = \sum_{i=1}^{k} \lambda_iv_i = \left( \sum_{i=1}^{k} \lambda_i P(\lambda_i)\right)v \]
Some Applications
Functional Calculus
For many applications (e.g. computing the heat kernel of the graph Laplacian) one is interested in computing the exponential of a symmetric matrix \(A\), defined by the (convergent) series
\[ e^A:= \sum_{k=0}^{\infty}\frac{A^k}{k!} \]
In practice, to compute the exponential we can use the relation \(A = Q D Q^{-1}\), where \(Q\) is orthogonal and \(D\) is diagonal (see the Comments section below):
\[ e^A= \sum_{k=0}^{\infty}\frac{(Q D Q^{-1})^k}{k!} = Q\left(\sum_{k=0}^{\infty}\frac{D^k}{k!}\right)Q^{-1} = Qe^{D}Q^{-1} \]
and since \(D\) is diagonal, \(e^{D}\) is again a diagonal matrix with diagonal entries \(e^{\lambda_i}\).
Example 1 (Part IV)
We compute \(e^A\). First let us calculate \(e^D\) using the expm package.
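The matrices \(Q\) and \(D\) used below come from the eigendecomposition of \(A\); a minimal sketch of their construction from the eigen output computed earlier:
# Q: orthonormal eigenvectors as columns; D: eigenvalues on the diagonal.
Q <- sp_decomp$vectors
D <- diag(sp_decomp$values)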
expm::expm(D)
## [,1] [,2]
## [1,] 20.08554 0.0000000
## [2,] 0.00000 0.3678794
Note that
exp(3)
## [1] 20.08554
exp(-1)
## [1] 0.3678794
Now we compute the product
Q %*% expm::expm(D) %*% t(Q)
## [,1] [,2]
## [1,] 10.226708 9.858829
## [2,] 9.858829 10.226708
This coincides with the result obtained by applying expm to \(A\) directly.
expm::expm(A)
## [,1] [,2]
## [1,] 10.226708 9.858829
## [2,] 9.858829 10.226708
In a similar manner, one can easily show that for any polynomial \(p(x)\) one has
\[ p(A) = \sum_{i=1}^{k}p(\lambda_i)P(\lambda_i) \]
Moreover, one can extend this relation to the space of continuous functions \(f:\text{spec}(A)\subset\mathbb{R}\longrightarrow \mathbb{C}\); this extension is known as the continuous functional calculus, and the corresponding identity \(\text{spec}(f(A)) = f(\text{spec}(A))\) is the spectral mapping theorem.
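For instance, for \(p(x) = x^2 + 1\) we can compare both sides of this identity in R; a minimal sketch, reusing proj_1 and proj_2 from above:
# Left-hand side: p(A) = A^2 + I.
A %*% A + diag(2)
# Right-hand side: p(3) * P(lambda_1) + p(-1) * P(lambda_2).
(3^2 + 1) * proj_1 + ((-1)^2 + 1) * proj_2
# Both give the same 2 x 2 matrix.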
Remark: The Cayley–Hamilton theorem says that every square matrix (over a commutative ring) satisfies its own characteristic polynomial.
Example 1 (Part V)
Let us verify this for the matrix \(A\). Its characteristic polynomial is \((1 - \lambda)^2 - 4\), so Cayley–Hamilton gives \((I - A)^2 - 4I = 0\):
# Evaluate the characteristic polynomial at A: (I - A)^2 - 4 I.
(diag(2) - A) %*% (diag(2) - A) - 4*diag(2)
## [,1] [,2]
## [1,] 0 0
## [2,] 0 0
Principal Component Analysis
Given a (column-centered) observation matrix \(X\in M_{n\times p}(\mathbb{R})\), the (unnormalized) covariance matrix \(A:= X^T X \in M_p(\mathbb{R})\) is clearly symmetric and therefore orthogonally diagonalizable. In this context, principal component analysis amounts to reducing the dimensionality of the data by projecting onto the subspace generated by the eigenvectors of \(A\) with the largest eigenvalues.
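As a sketch of this connection, we can compare the spectral decomposition of the sample covariance matrix with the output of prcomp on a built-in dataset (the choice of iris is just an illustration):
# Centered data matrix and its sample covariance matrix.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
C <- t(X) %*% X / (nrow(X) - 1)
# The eigenvalues of C are the variances of the principal components.
eigen(C)$values
prcomp(iris[, 1:4])$sdev^2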
Spectrum of the Graph Laplacian
In various applications, like the spectral embedding non-linear dimensionality reduction algorithm or spectral clustering, the spectral decomposition of the graph Laplacian is of much interest (see for example PyData Berlin 2018: On Laplacian Eigenmaps for Dimensionality Reduction).
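As a minimal sketch, for a small undirected graph one can build the (unnormalized) Laplacian, i.e. the degree matrix minus the adjacency matrix, and compute its spectrum; the path graph on four nodes used here is just an illustration:
# Adjacency matrix of a path graph on 4 nodes.
W <- matrix(c(0, 1, 0, 0,
              1, 0, 1, 0,
              0, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)
L <- diag(rowSums(W)) - W   # graph Laplacian: symmetric, positive semi-definite
eigen(L)$values             # real and non-negative; the smallest eigenvalue is 0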
Many More …
Spectral Theorem for Bounded and Unbounded Operators
This is just the beginning! There is a beautiful and rich theory on the spectral analysis of bounded and unbounded self-adjoint operators on Hilbert spaces, with many applications (e.g. Quantum Mechanics, Fourier Decomposition, Signal Processing, …).
Comments
As a consequence of this theorem we see that there exists an orthogonal matrix \(Q\in SO(n)\) (i.e. \(QQ^T=Q^TQ=I\) and \(\det(Q)=1\)) such that
\[ A = QDQ^{-1} \]
where \(D\) is a diagonal matrix containing the eigenvalues of \(A\) (with multiplicity). The matrix \(Q\) is constructed by stacking the normalized orthogonal eigenvectors of \(A\) as column vectors. Since \(Q\) is orthogonal, \(Q^{-1} = Q^T\).
Example 1 (Part III)
Let us continue working with the matrix
\[ A = \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \]
We have already verified the first three statements (a., b. and c.) of the spectral theorem in Part I and Part II. For d., let us simply compute \(P(\lambda_1 = 3) + P(\lambda_2 = -1)\):
\[ \frac{1}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) + \frac{1}{2} \left( \begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array} \right) = \left( \begin{array}{cc} 1 & 0 \\ 0 & 1 \end{array} \right) = I \]
Finally, for e. we calculate
\[ \frac{3}{2} \left( \begin{array}{cc} 1 & 1 \\ 1 & 1 \end{array} \right) - \frac{1}{2} \left( \begin{array}{cc} 1 & -1 \\ -1 & 1 \end{array} \right) = \left( \begin{array}{cc} 1 & 2 \\ 2 & 1 \end{array} \right) = A \]
In R we can easily get the same results; the following is a minimal sketch reusing proj_1 and proj_2 from above:
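# Statement d.: the projections sum to the identity matrix.
proj_1 + proj_2
# Statement e.: the eigenvalue-weighted sum of projections recovers A.
3 * proj_1 + (-1) * proj_2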
The matrix \(Q\) is simply given by
\[ Q = \frac{1}{\sqrt{2}} \left( \begin{array}{cc} 1 & -1 \\ 1 & 1 \end{array} \right) \] Observe that
\[ Q^T A Q = \frac{1}{\sqrt{2}} \left( \begin{array}{cc} 1 & 1 \\ -1 & 1 \end{array} \right) \left( \begin{array}{cc} 1 & 2\\ 2 & 1 \end{array} \right) \frac{1}{\sqrt{2}} \left( \begin{array}{cc} 1 & -1 \\ 1 & 1 \end{array} \right) = \left( \begin{array}{cc} 3 & 0\\ 0 & -1 \end{array} \right) = D \] In R this is an immediate computation; a minimal sketch using \(Q\) and \(D\) as constructed from the eigendecomposition above:
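# Q and D were constructed above from sp_decomp <- eigen(x = A).
t(Q) %*% A %*% Q   # should (numerically) recover the diagonal matrix D
Q %*% D %*% t(Q)   # recovers A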
This completes the verification of the spectral theorem in this simple example.