
The Spectral Theorem & Symmetric Matrices

The Spectral Theorem is one of the deepest and most useful results in linear algebra: every real symmetric matrix admits an orthonormal eigenbasis. This transforms symmetric matrices from abstract linear maps into concrete geometric objects — ellipsoids, quadratic forms, and projection operators — with a complete and stable numerical theory. PCA, kernel methods, Gaussian processes, spectral graph theory, and the analysis of optimization landscapes all rest on this foundation.

Concepts

[Interactive figure: level set xᵀAx = 1 of the quadratic form for A = [2, b; b, 2], with the off-diagonal entry b adjustable. At b = 0 the eigenvalues are λ₁ = λ₂ = 2, the level set is a circle, and κ(A) = 1: "Diagonal, equal eigenvalues — level set is a circle. Identity up to scale."]

Dashed axes are the eigenvectors of A, with length 1/√λ — the semi-axes of the ellipse. Large eigenvalue = short axis (tight constraint). Small eigenvalue = long axis (loose constraint).

The Spectral Theorem says A = QΛQᵀ — the ellipse axes always align with eigenvectors, regardless of off-diagonal entries.

Every symmetric matrix is, at its core, a collection of stretches along orthogonal axes. The Spectral Theorem makes this geometric picture precise: for any real symmetric matrix, there always exists a rotation that transforms it into a pure diagonal scaling — an ellipse in 2D, an ellipsoid in $n$ dimensions. This is the foundational result behind PCA, kernel methods, and all second-order analysis of loss surfaces.

Statement of the Spectral Theorem

Theorem (Real Spectral Theorem). Let $A \in \mathbb{R}^{n \times n}$ be symmetric ($A = A^T$). Then:

  1. All eigenvalues of $A$ are real.
  2. There exists an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$ ($Q^TQ = QQ^T = I$) and a real diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ such that

$$A = Q\Lambda Q^T.$$

The orthogonality of $Q$ is not optional — it is a consequence of symmetry. For a non-symmetric matrix, eigenvectors need not be orthogonal and the eigenbasis can be numerically ill-conditioned. Symmetry forces the eigenvectors into perpendicular directions, giving a basis that is both numerically stable and geometrically interpretable.

  3. The columns $\mathbf{q}_1, \ldots, \mathbf{q}_n$ of $Q$ form an orthonormal eigenbasis of $\mathbb{R}^n$, with $A\mathbf{q}_i = \lambda_i\mathbf{q}_i$.

Proof sketch (induction on $n$). The key lemma is that every eigenspace of $A$ is $A$-invariant and, for symmetric $A$, its orthogonal complement is also $A$-invariant. One proceeds as follows:

  • Base case $n=1$: trivial.
  • Inductive step: By the Fundamental Theorem of Algebra, the characteristic polynomial of $A$ has at least one complex root. By reality of eigenvalues (proved in Lesson 3), $A$ has a real eigenvalue $\lambda_1$ with a real unit eigenvector $\mathbf{q}_1$.
  • Let $W = \operatorname{span}\{\mathbf{q}_1\}^\perp$ (the $(n-1)$-dimensional orthogonal complement). For any $\mathbf{w} \in W$: $\langle A\mathbf{w}, \mathbf{q}_1 \rangle = \langle \mathbf{w}, A\mathbf{q}_1 \rangle = \lambda_1 \langle \mathbf{w}, \mathbf{q}_1 \rangle = 0$. So $A$ maps $W$ into $W$ — $W$ is $A$-invariant.
  • The restriction $A|_W : W \to W$ is symmetric. By induction, it has an orthonormal eigenbasis $\{\mathbf{q}_2, \ldots, \mathbf{q}_n\}$ for $W$.
  • Together, $\{\mathbf{q}_1, \ldots, \mathbf{q}_n\}$ is an orthonormal eigenbasis for $\mathbb{R}^n$. $\square$
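The theorem is easy to check numerically. A minimal sketch (the matrix and seed below are arbitrary, chosen only for illustration) using NumPy's symmetric eigensolver:

```python
import numpy as np

# Build an arbitrary symmetric matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2

# eigh is the symmetric eigensolver: eigenvalues come back real and ascending,
# and the columns of Q are orthonormal eigenvectors
lam, Q = np.linalg.eigh(A)

assert np.isrealobj(lam)                       # real eigenvalues
assert np.allclose(Q.T @ Q, np.eye(4))         # Q is orthogonal
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)  # A = Q Lambda Q^T
```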

Outer-Product (Spectral) Decomposition

Since $A = Q\Lambda Q^T$ and $Q$ is orthogonal:

$$A = \sum_{i=1}^n \lambda_i \mathbf{q}_i \mathbf{q}_i^T.$$

Each term $\mathbf{q}_i \mathbf{q}_i^T$ is a rank-1 orthogonal projection onto the $i$-th eigendirection. The entire matrix $A$ is a weighted sum of projectors, with eigenvalues as weights.

This decomposition is practically useful for:

  • Low-rank approximation: truncate to the top $k$ terms (ordered by $|\lambda_i|$): $A_k = \sum_{i=1}^k \lambda_i \mathbf{q}_i\mathbf{q}_i^T$. By the Eckart–Young theorem, $A_k$ is the best rank-$k$ approximation to $A$ in both the spectral and Frobenius norms.
  • Matrix functions: $f(A) = \sum_{i=1}^n f(\lambda_i) \mathbf{q}_i\mathbf{q}_i^T$ for any function $f$ defined on the spectrum (e.g., $f = \exp$ gives the matrix exponential; $f = \sqrt{\cdot}$ gives the matrix square root for PSD $A$).
  • Pseudoinverse: $A^+ = \sum_{\lambda_i \neq 0} \frac{1}{\lambda_i} \mathbf{q}_i\mathbf{q}_i^T$ (invert only the nonzero eigenvalues).
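All three uses can be sketched in a few lines (the matrices below are illustrative choices, not canonical examples):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0]])
lam, Q = np.linalg.eigh(A)                 # lam = [1, 5], ascending

# Matrix function: f(A) = sum_i f(lam_i) q_i q_i^T, here f = sqrt
sqrtA = Q @ np.diag(np.sqrt(lam)) @ Q.T
assert np.allclose(sqrtA @ sqrtA, A)       # (A^{1/2})^2 = A

# Rank-1 truncation keeps the lam = 5 term; by Eckart-Young the
# spectral-norm error equals the largest dropped eigenvalue
A1 = lam[-1] * np.outer(Q[:, -1], Q[:, -1])
assert np.isclose(np.linalg.norm(A - A1, 2), lam[0])

# Pseudoinverse: invert only the nonzero eigenvalues
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])                 # rank 1: eigenvalues {0, 2}
mu, U = np.linalg.eigh(B)
nz = np.abs(mu) > 1e-12
B_pinv = (U[:, nz] / mu[nz]) @ U[:, nz].T  # sum over nonzero-eigenvalue terms
assert np.allclose(B_pinv, np.linalg.pinv(B))
```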

Quadratic Forms

The quadratic form associated with a symmetric $A \in \mathbb{R}^{n \times n}$ is the scalar function

$$Q_A(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} = \sum_{i,j} a_{ij} x_i x_j.$$

Using the spectral decomposition $A = Q\Lambda Q^T$ and substituting $\mathbf{y} = Q^T\mathbf{x}$ (a rotation):

$$\mathbf{x}^T A \mathbf{x} = \mathbf{y}^T \Lambda \mathbf{y} = \sum_{i=1}^n \lambda_i y_i^2.$$

In the eigenbasis, the quadratic form is purely diagonal — no cross terms. This is the algebraic statement of the Principal Axis Theorem: there always exists a rotation of coordinates that eliminates cross terms in a quadratic form.

Level sets $\{\mathbf{x} : \mathbf{x}^T A \mathbf{x} = c\}$ for $c > 0$ and $A \succ 0$ are ellipsoids. In the eigenbasis, the ellipsoid has a semi-axis of length $\sqrt{c/\lambda_i}$ along the $i$-th eigenvector. Large eigenvalue = short axis (tight constraint). Small eigenvalue = long axis (loose constraint).
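A quick numerical check of the no-cross-terms identity, evaluated at random points (the matrix is an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [2.0, 5.0]])
lam, Q = np.linalg.eigh(A)

# Evaluate the quadratic form at random points in both coordinate systems
x = np.random.default_rng(1).standard_normal((2, 1000))
y = Q.T @ x                                # rotate into the eigenbasis

qf_x = np.einsum('ij,ik,kj->j', x, A, x)   # x^T A x, one value per column
qf_y = lam @ y**2                          # sum_i lam_i y_i^2
assert np.allclose(qf_x, qf_y)             # identical: no cross terms survive
```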

Classification of quadratic forms:

| Condition | Name | Level sets ($c > 0$) |
| --- | --- | --- |
| All $\lambda_i > 0$ | Positive definite | Ellipsoids (bounded, closed) |
| All $\lambda_i \geq 0$ | Positive semidefinite | Degenerate ellipsoids (unbounded in null directions) |
| All $\lambda_i \leq 0$ | Negative semidefinite | Empty (the form is $\leq 0$ everywhere) |
| All $\lambda_i < 0$ | Negative definite | Empty for $c > 0$; ellipsoids for $c < 0$ |
| Mixed signs | Indefinite | Hyperboloids (unbounded) |

Schur Decomposition

The Spectral Theorem requires $A$ to be symmetric. For general square matrices, the best one can do is:

Theorem (Schur Decomposition). For any $A \in \mathbb{C}^{n \times n}$, there exists a unitary $U$ ($U^*U = I$) such that

$$A = UTU^*$$

where $T$ is upper triangular with the eigenvalues of $A$ on the diagonal.

If $A \in \mathbb{R}^{n \times n}$ has only real eigenvalues, then $U$ can be chosen real orthogonal (real Schur form). The Schur decomposition is the foundation of the QR algorithm for eigenvalue computation.
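A sketch with SciPy's `scipy.linalg.schur`; the test matrix is an arbitrary non-symmetric one with real eigenvalues 5 and 2:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])     # non-symmetric; eigenvalues 5 and 2

T, Z = schur(A)                # real Schur form: A = Z T Z^T, Z orthogonal

assert np.allclose(Z @ T @ Z.T, A)
assert np.allclose(Z.T @ Z, np.eye(2))
assert np.allclose(np.tril(T, -1), 0)            # T is upper triangular
assert np.allclose(sorted(np.diag(T)), [2, 5])   # eigenvalues on the diagonal
```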

Normal Matrices

A matrix $A$ (real or complex) is normal if $AA^* = A^*A$. Normal matrices are precisely those that are unitarily diagonalizable:

$$A \text{ normal} \iff A = U\Lambda U^*, \quad U \text{ unitary}, \ \Lambda \text{ diagonal}.$$

Examples of normal matrices:

| Type | Condition | Eigenvalues |
| --- | --- | --- |
| Symmetric (real) | $A = A^T$ | Real |
| Hermitian (complex) | $A = A^*$ | Real |
| Skew-symmetric | $A = -A^T$ | Purely imaginary |
| Orthogonal | $A^TA = I$ | On unit circle in $\mathbb{C}$ |
| Unitary | $A^*A = I$ | On unit circle in $\mathbb{C}$ |

Non-normal matrices (e.g., an upper triangular matrix with nonzero entries above the diagonal) need not have orthogonal eigenvectors — their eigenbasis can be poorly conditioned, leading to transient growth even when $\rho(A) < 1$.
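The transient-growth phenomenon is easy to see numerically. A sketch with a Jordan-like matrix of spectral radius 0.99 (the off-diagonal value 5.0 is an arbitrary choice):

```python
import numpy as np

W = np.array([[0.99, 5.0],
              [0.0, 0.99]])    # non-normal, rho(W) = 0.99 < 1

# Spectral norm of W^t over time
norms = [np.linalg.norm(np.linalg.matrix_power(W, t), 2) for t in range(400)]

assert max(norms) > 100        # large transient growth despite rho < 1
assert norms[-1] < max(norms)  # ... followed by the eventual decay
```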

The Frobenius Norm and Eigenvalues

For symmetric $A$ with eigenvalues $\lambda_1, \ldots, \lambda_n$:

$$\|A\|_F^2 = \operatorname{tr}(A^TA) = \operatorname{tr}(A^2) = \sum_{i=1}^n \lambda_i^2.$$

More generally, $\operatorname{tr}(A^k) = \sum_{i=1}^n \lambda_i^k$ for any positive integer $k$. The traces of matrix powers thus provide a sequence of invariants (Newton's power sums) that determine the eigenvalues via the characteristic polynomial.
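A numerical check of the power-sum identity (random symmetric matrix, seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
lam = np.linalg.eigvalsh(A)

# tr(A^k) = sum_i lam_i^k for k = 1, ..., 5
for k in range(1, 6):
    assert np.isclose(np.trace(np.linalg.matrix_power(A, k)), np.sum(lam**k))

# The k = 2 case is the Frobenius-norm identity
assert np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(lam**2))
```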

Simultaneous Diagonalizability

Theorem. Two symmetric matrices $A, B$ are simultaneously orthogonally diagonalizable (i.e., there exists an orthogonal $Q$ such that both $Q^TAQ$ and $Q^TBQ$ are diagonal) if and only if $AB = BA$.

This is important in statistics: a sample covariance matrix and a population covariance matrix are simultaneously diagonalizable if and only if they commute — a condition that arises in hypothesis testing (the Fisher discriminant, Hotelling's $T^2$).
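A sketch of the "commuting implies shared eigenbasis" direction. The two matrices below are illustrative; note the shortcut (diagonalize $A$, reuse its basis for $B$) relies on $A$ having distinct eigenvalues — with repeated eigenvalues the shared basis must be chosen inside each eigenspace.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B = np.array([[3.0, -1.0],
              [-1.0, 3.0]])
assert np.allclose(A @ B, B @ A)             # they commute

# A's eigenbasis (eigenvalues 1, 3 are distinct) also diagonalizes B
_, Q = np.linalg.eigh(A)
D = Q.T @ B @ Q
assert np.allclose(D, np.diag(np.diag(D)))   # off-diagonal entries vanish
```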

Worked Example

Example 1: Spectral Decomposition and Low-Rank Approximation

Let $A = \begin{pmatrix}3 & 2 \\ 2 & 3\end{pmatrix}$. From Lesson 3, $\lambda_1 = 5$, $\lambda_2 = 1$, $\mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}$, $\mathbf{q}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}$.

Outer-product form:

$$A = 5 \cdot \frac{1}{2}\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1&1\end{pmatrix} + 1 \cdot \frac{1}{2}\begin{pmatrix}1\\-1\end{pmatrix}\begin{pmatrix}1&-1\end{pmatrix} = \frac{5}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix} + \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}.$$

Verify: $\frac{5}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix} + \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix} = \begin{pmatrix}3&2\\2&3\end{pmatrix}$.

Rank-1 approximation (keep only the top term): $A_1 = \frac{5}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix}$. This captures $\frac{5}{5+1} \approx 83.3\%$ of the trace and $\frac{25}{26} \approx 96.2\%$ of the squared Frobenius norm ($\|A\|_F^2 = 25 + 1 = 26$, of which the rank-1 term contributes $25$).

Matrix square root: $A^{1/2} = Q\Lambda^{1/2}Q^T = Q\operatorname{diag}(\sqrt{5}, 1)Q^T = \frac{1}{2}\begin{pmatrix}\sqrt{5}+1 & \sqrt{5}-1 \\ \sqrt{5}-1 & \sqrt{5}+1\end{pmatrix}$.
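The worked example can be verified mechanically:

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0]])
lam, Q = np.linalg.eigh(A)     # ascending: lam = [1, 5]

# Outer-product form: A = sum_i lam_i q_i q_i^T
recon = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))
assert np.allclose(recon, A)

# Rank-1 truncation keeps the lam = 5 term: (5/2) * ones
A1 = lam[-1] * np.outer(Q[:, -1], Q[:, -1])
assert np.allclose(A1, 2.5 * np.ones((2, 2)))

# Matrix square root matches the closed form above
sqrtA = Q @ np.diag(np.sqrt(lam)) @ Q.T
expected = 0.5 * np.array([[np.sqrt(5) + 1, np.sqrt(5) - 1],
                           [np.sqrt(5) - 1, np.sqrt(5) + 1]])
assert np.allclose(sqrtA, expected)
```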

Example 2: Diagonalizing a Quadratic Form

Consider $f(x_1, x_2) = 2x_1^2 + 4x_1 x_2 + 5x_2^2$. Write it in matrix form:

$$f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}, \qquad A = \begin{pmatrix}2 & 2 \\ 2 & 5\end{pmatrix}.$$

Characteristic polynomial: $(2-\lambda)(5-\lambda) - 4 = \lambda^2 - 7\lambda + 6 = (\lambda-6)(\lambda-1) = 0$.

Eigenvalues: $\lambda_1 = 6$, $\lambda_2 = 1$. Semi-axes of the level set $f(\mathbf{x}) = 1$: $1/\sqrt{6} \approx 0.408$ and $1/\sqrt{1} = 1$.

Change of coordinates: in the eigenbasis, with $\mathbf{y} = Q^T\mathbf{x}$:

$$f(\mathbf{x}) = 6y_1^2 + y_2^2.$$

No cross term. The level set $f(\mathbf{x}) = 1$ is an ellipse with semi-axes $1/\sqrt{6}$ (along $\mathbf{q}_1$) and $1$ (along $\mathbf{q}_2$).

Example 3: Kernel Matrix Decomposition

A symmetric positive semidefinite matrix $K \in \mathbb{R}^{n \times n}$ with $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ (a kernel matrix / Gram matrix) has spectral decomposition $K = Q\Lambda Q^T$ with all $\lambda_i \geq 0$.

Truncating to the top $m \ll n$ eigenpairs — the ideal that Nyström-type approximations aim to reach cheaply — approximates the full kernel matrix:

$$K \approx Q_m \Lambda_m Q_m^T,$$

reducing storage from $O(n^2)$ to $O(nm)$ and matrix-vector products from $O(n^2)$ to $O(nm)$.

Mercer's Theorem guarantees that any continuous, symmetric, positive semidefinite kernel $k(\cdot,\cdot)$ on a compact space admits a spectral expansion $k(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^\infty \lambda_i \phi_i(\mathbf{x})\phi_i(\mathbf{y})$ — the infinite-dimensional analogue of the finite spectral theorem.

Connections

Where Your Intuition Breaks

Misconception: any matrix can be diagonalized by an orthogonal transformation. In fact, only symmetric (and, more generally, normal) matrices have orthogonal eigenbases. For a general non-symmetric matrix, the Jordan normal form is the best available decomposition — and Jordan blocks produce polynomial growth in matrix powers even when the spectral radius is below 1. A recurrent weight matrix with $\rho(W) = 0.99$ is stable in theory, but if it has (near-)Jordan structure, the powers $W^t$ can grow like $t^k$ before eventually decaying — precisely the transient growth that makes training certain RNN architectures numerically treacherous. The Spectral Theorem's guarantee of orthogonal eigenvectors is not universal; it is a special property of symmetric matrices.

Why Symmetry Is Not Just a Convenience

Symmetric matrices appear naturally wherever an inner product appears:

  • Covariance matrices $\Sigma = \mathbb{E}[(\mathbf{x}-\mu)(\mathbf{x}-\mu)^T]$ — symmetric PSD by construction
  • Gram matrices $K_{ij} = \langle \phi_i, \phi_j \rangle$ — symmetric by symmetry of the inner product
  • Graph Laplacians $L = D - W$, where $W$ is a symmetric adjacency matrix — symmetric PSD
  • Hessians $H_{ij} = \partial^2 f / \partial x_i \partial x_j$ — symmetric by Clairaut's theorem (for smooth $f$)
  • Fisher information $\mathcal{I}(\theta)_{ij} = \mathbb{E}[\partial_i \log p \cdot \partial_j \log p]$ — symmetric PSD

In each case, the Spectral Theorem guarantees real eigenvalues, orthogonal eigenvectors, and a stable eigendecomposition — properties that do not hold for general matrices.

💡Intuition

Quadratic forms are the geometry of second-order information. The Hessian $H = \nabla^2 f(\mathbf{x}^*)$ at a critical point is symmetric. Its quadratic form $\mathbf{d}^T H \mathbf{d}$ measures the curvature of $f$ in direction $\mathbf{d}$. Eigenvectors of $H$ are the directions of pure curvature; eigenvalues measure the curvature magnitude. This is why the eigenspectrum of the Hessian controls gradient descent convergence — and why ill-conditioned Hessians (large $\kappa(H)$) make optimization slow.

💡Intuition

Outer-product decomposition = interpretable features. In PCA, the covariance matrix decomposes as $C = \sum_i \lambda_i \mathbf{q}_i\mathbf{q}_i^T$. Each term $\lambda_i\mathbf{q}_i\mathbf{q}_i^T$ is a rank-1 "feature" — the $i$-th principal component captures a $\lambda_i / \operatorname{tr}(C)$ fraction of the total variance. Plotting the $\mathbf{q}_i$ (e.g., eigenfaces in face recognition, eigenmodes in physics) reveals the dominant patterns in the data. The outer-product form makes this decomposition transparent.
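A minimal PCA sketch on synthetic data with one dominant direction (the per-axis scales 3, 1, 0.1 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.1])

C = np.cov(X, rowvar=False)        # symmetric PSD covariance matrix
lam, Q = np.linalg.eigh(C)         # ascending eigenvalues

# Variance fractions lam_i / tr(C) sum to 1; the top component dominates
frac = lam / np.trace(C)
assert np.isclose(frac.sum(), 1.0)
assert frac[-1] > 0.8              # top principal component dominates here
```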

⚠️Warning

Numerical computation: use eigh, not eig. For symmetric/Hermitian matrices, use numpy.linalg.eigh (or scipy.linalg.eigh) rather than the general eig. The eigh routine exploits symmetry (roughly halving the work), guarantees real output, returns eigenvalues in ascending order, and uses more numerically stable algorithms (divide-and-conquer or MRRR). Using eig on a symmetric matrix wastes computation, returns eigenvalues unsorted, and — if the matrix is symmetric only up to roundoff — may return slightly complex eigenvalues.
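A quick comparison of the two routines on the same symmetric matrix (seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2

w_h, V_h = np.linalg.eigh(A)   # symmetric solver: real dtype, ascending order
w_g, V_g = np.linalg.eig(A)    # general solver: unsorted, symmetry ignored

assert np.isrealobj(w_h)
assert np.all(np.diff(w_h) >= 0)            # eigh output is sorted
assert np.allclose(np.sort(w_g.real), w_h)  # both find the same spectrum
```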

Stability and Perturbation: Weyl's Theorem

Theorem (Weyl). Let $A, E \in \mathbb{R}^{n \times n}$ be symmetric, and let $\lambda_1 \leq \cdots \leq \lambda_n$ and $\mu_1 \leq \cdots \leq \mu_n$ be the eigenvalues of $A$ and $A + E$ respectively. Then:

$$|\mu_k - \lambda_k| \leq \|E\|_2 \quad \text{for all } k.$$

Eigenvalues of symmetric matrices are Lipschitz stable under perturbation — a perturbation of spectral norm $\delta$ moves each eigenvalue by at most $\delta$. This is in stark contrast to non-symmetric matrices, whose eigenvalues can be extremely sensitive to perturbations.

This stability is why numerical eigenvalue algorithms for symmetric matrices (like the symmetric QR algorithm) achieve backward stability — the computed eigenvalues are the exact eigenvalues of a nearby matrix.
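Weyl's bound can be spot-checked numerically (random symmetric $A$ and small symmetric perturbation $E$; the `1e-12` slack absorbs roundoff):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
N = 0.01 * rng.standard_normal((5, 5))
E = (N + N.T) / 2

lam = np.linalg.eigvalsh(A)        # ascending
mu = np.linalg.eigvalsh(A + E)     # ascending

# Weyl: the k-th eigenvalue moves by at most ||E||_2
assert np.all(np.abs(mu - lam) <= np.linalg.norm(E, 2) + 1e-12)
```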

Comparing Spectral Decomposition to Other Decompositions

| Decomposition | Form | Requires | Gives |
| --- | --- | --- | --- |
| Spectral ($A$ symmetric) | $Q\Lambda Q^T$ | $A = A^T$ | Orthogonal eigenbasis, real $\lambda$ |
| Eigendecomposition (general) | $PDP^{-1}$ | $A$ diagonalizable | Eigenbasis (may not be orthogonal) |
| Schur | $UTU^*$ | Any $A$ | Orthonormal basis, triangular $T$ |
| SVD | $U\Sigma V^T$ | Any $A$ (even non-square) | Two orthonormal bases, non-negative $\sigma$ |
| QR | $QR$ | Any $A$ | Orthogonal $Q$, upper triangular $R$ |

For symmetric PSD matrices, the spectral decomposition and the SVD coincide: the singular values equal the eigenvalues, and the left and right singular vectors both equal the eigenvectors.
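For a PSD matrix this coincidence is directly checkable ($BB^T$ is an arbitrary way to manufacture a PSD test matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = B @ B.T                               # symmetric PSD

lam = np.linalg.eigvalsh(A)[::-1]         # eigenvalues, descending
s = np.linalg.svd(A, compute_uv=False)    # singular values, descending
assert np.allclose(lam, s)                # they coincide for PSD A
```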
