
The Spectral Theorem & Symmetric Matrices

The Spectral Theorem is one of the deepest and most useful results in linear algebra: every real symmetric matrix admits an orthonormal eigenbasis. This transforms symmetric matrices from abstract linear maps into concrete geometric objects — ellipsoids, quadratic forms, and projection operators — with a complete and stable numerical theory. PCA, kernel methods, Gaussian processes, spectral graph theory, and the analysis of optimization landscapes all rest on this foundation.

Concepts

[Interactive figure: level set xᵀAx = 1 of the quadratic form for A = [2, b; b, 2], with the off-diagonal entry b adjustable. At b = 0 the eigenvalues are λ₁ = λ₂ = 2, the level set is a circle, and κ(A) = 1: "Diagonal, equal eigenvalues — level set is a circle. Identity up to scale."]

Dashed axes are the eigenvectors of A, with length 1/√λ — the semi-axes of the ellipse. Large eigenvalue = short axis (tight constraint). Small eigenvalue = long axis (loose constraint).

The Spectral Theorem says A = QΛQᵀ — the ellipse axes always align with eigenvectors, regardless of off-diagonal entries.

Every symmetric matrix is, at its core, a collection of stretches along orthogonal axes. The Spectral Theorem makes this geometric picture precise: for any real symmetric matrix, there always exists a rotation that transforms it into a pure diagonal scaling — an ellipse in 2D, an ellipsoid in $n$ dimensions. This is the foundational result behind PCA, kernel methods, and all second-order analysis of loss surfaces.

Statement of the Spectral Theorem

Theorem (Real Spectral Theorem). Let $A \in \mathbb{R}^{n \times n}$ be symmetric ($A = A^T$). Then:

  1. All eigenvalues of $A$ are real.
  2. There exists an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$ ($Q^TQ = QQ^T = I$) and a real diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ such that

$$A = Q\Lambda Q^T.$$

The orthogonality of $Q$ is not optional — it is a consequence of symmetry. For a non-symmetric matrix, eigenvectors need not be orthogonal and the eigenbasis can be numerically ill-conditioned. Symmetry forces the eigenvectors into perpendicular directions, giving a basis that is both numerically stable and geometrically interpretable.

  3. The columns $\mathbf{q}_1, \ldots, \mathbf{q}_n$ of $Q$ form an orthonormal eigenbasis of $\mathbb{R}^n$, with $A\mathbf{q}_i = \lambda_i\mathbf{q}_i$.

Proof sketch (induction on $n$). The key lemma is that every eigenspace of $A$ is $A$-invariant and, for symmetric $A$, its orthogonal complement is also $A$-invariant. One proceeds as follows:

  • Base case $n=1$: trivial.
  • Inductive step: By the Fundamental Theorem of Algebra, the characteristic polynomial of $A$ has at least one complex root. By reality of eigenvalues (proved in Lesson 3), $A$ has a real eigenvalue $\lambda_1$ with a real unit eigenvector $\mathbf{q}_1$.
  • Let $W = \operatorname{span}\{\mathbf{q}_1\}^\perp$ (the $(n-1)$-dimensional orthogonal complement). For any $\mathbf{w} \in W$: $\langle A\mathbf{w}, \mathbf{q}_1 \rangle = \langle \mathbf{w}, A\mathbf{q}_1 \rangle = \lambda_1 \langle \mathbf{w}, \mathbf{q}_1 \rangle = 0$. So $A$ maps $W$ into $W$ — $W$ is $A$-invariant.
  • The restriction $A|_W : W \to W$ is symmetric. By induction, it has an orthonormal eigenbasis $\{\mathbf{q}_2, \ldots, \mathbf{q}_n\}$ for $W$.
  • Together, $\{\mathbf{q}_1, \ldots, \mathbf{q}_n\}$ is an orthonormal eigenbasis for $\mathbb{R}^n$. $\square$
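The theorem is easy to check numerically. A minimal sketch (the matrix and seed below are arbitrary, chosen only for illustration) using NumPy's symmetric eigensolver:

```python
import numpy as np

# Build an arbitrary symmetric matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2

# eigh is the symmetric eigensolver: eigenvalues come back real and ascending,
# and the columns of Q are orthonormal eigenvectors
lam, Q = np.linalg.eigh(A)

assert np.isrealobj(lam)                       # real eigenvalues
assert np.allclose(Q.T @ Q, np.eye(4))         # Q is orthogonal
assert np.allclose(Q @ np.diag(lam) @ Q.T, A)  # A = Q Lambda Q^T
```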

Outer-Product (Spectral) Decomposition

Since $A = Q\Lambda Q^T$ and $Q$ is orthogonal:

$$A = \sum_{i=1}^n \lambda_i \mathbf{q}_i \mathbf{q}_i^T.$$

Each term $\mathbf{q}_i \mathbf{q}_i^T$ is a rank-1 orthogonal projection onto the $i$-th eigendirection. The entire matrix $A$ is a weighted sum of projectors, with eigenvalues as weights.

This decomposition is practically useful for:

  • Low-rank approximation: truncate to the top $k$ terms (ordered by $|\lambda_i|$): $A_k = \sum_{i=1}^k \lambda_i \mathbf{q}_i\mathbf{q}_i^T$. By the Eckart–Young theorem, $A_k$ is the best rank-$k$ approximation to $A$ in both the spectral and Frobenius norms.
  • Matrix functions: $f(A) = \sum_{i=1}^n f(\lambda_i) \mathbf{q}_i\mathbf{q}_i^T$ for any function $f$ defined on the spectrum (e.g., $f = \exp$ gives the matrix exponential; $f = \sqrt{\cdot}$ gives the matrix square root for PSD $A$).
  • Pseudoinverse: $A^+ = \sum_{\lambda_i \neq 0} \frac{1}{\lambda_i} \mathbf{q}_i\mathbf{q}_i^T$ (invert only the nonzero eigenvalues).
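All three uses can be sketched in a few lines (the matrices below are illustrative choices, not canonical examples):

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0]])
lam, Q = np.linalg.eigh(A)                 # lam = [1, 5], ascending

# Matrix function: f(A) = sum_i f(lam_i) q_i q_i^T, here f = sqrt
sqrtA = Q @ np.diag(np.sqrt(lam)) @ Q.T
assert np.allclose(sqrtA @ sqrtA, A)       # (A^{1/2})^2 = A

# Rank-1 truncation keeps the lam = 5 term; by Eckart-Young the
# spectral-norm error equals the largest dropped eigenvalue
A1 = lam[-1] * np.outer(Q[:, -1], Q[:, -1])
assert np.isclose(np.linalg.norm(A - A1, 2), lam[0])

# Pseudoinverse: invert only the nonzero eigenvalues
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])                 # rank 1: eigenvalues {0, 2}
mu, U = np.linalg.eigh(B)
nz = np.abs(mu) > 1e-12
B_pinv = (U[:, nz] / mu[nz]) @ U[:, nz].T  # sum over nonzero-eigenvalue terms
assert np.allclose(B_pinv, np.linalg.pinv(B))
```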

Quadratic Forms

The quadratic form associated with a symmetric $A \in \mathbb{R}^{n \times n}$ is the scalar function

$$Q_A(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} = \sum_{i,j} a_{ij} x_i x_j.$$

Using the spectral decomposition $A = Q\Lambda Q^T$ and substituting $\mathbf{y} = Q^T\mathbf{x}$ (a rotation):

$$\mathbf{x}^T A \mathbf{x} = \mathbf{y}^T \Lambda \mathbf{y} = \sum_{i=1}^n \lambda_i y_i^2.$$

In the eigenbasis, the quadratic form is purely diagonal — no cross terms. This is the algebraic statement of the Principal Axis Theorem: there always exists a rotation of coordinates that eliminates cross terms in a quadratic form.

Level sets $\{\mathbf{x} : \mathbf{x}^T A \mathbf{x} = c\}$ for $c > 0$ and $A \succ 0$ are ellipsoids. In the eigenbasis, the ellipsoid has a semi-axis of length $\sqrt{c/\lambda_i}$ along the $i$-th eigenvector. Large eigenvalue = short axis (tight constraint). Small eigenvalue = long axis (loose constraint).
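A quick numerical check of the no-cross-terms identity, evaluated at random points (the matrix is an arbitrary symmetric example):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [2.0, 5.0]])
lam, Q = np.linalg.eigh(A)

# Evaluate the quadratic form at random points in both coordinate systems
x = np.random.default_rng(1).standard_normal((2, 1000))
y = Q.T @ x                                # rotate into the eigenbasis

qf_x = np.einsum('ij,ik,kj->j', x, A, x)   # x^T A x, one value per column
qf_y = lam @ y**2                          # sum_i lam_i y_i^2
assert np.allclose(qf_x, qf_y)             # identical: no cross terms survive
```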

Classification of quadratic forms:

| Condition | Name | Level sets ($c > 0$) |
| --- | --- | --- |
| All $\lambda_i > 0$ | Positive definite | Ellipsoids (bounded, closed) |
| All $\lambda_i \geq 0$ | Positive semidefinite | Degenerate ellipsoids (unbounded in null directions) |
| All $\lambda_i \leq 0$ | Negative semidefinite | Empty (the form is $\leq 0$ everywhere) |
| All $\lambda_i < 0$ | Negative definite | Empty for $c > 0$; ellipsoids for $c < 0$ |
| Mixed signs | Indefinite | Hyperboloids (unbounded) |

Schur Decomposition

The Spectral Theorem requires $A$ to be symmetric. For general square matrices, the best one can do is:

Theorem (Schur Decomposition). For any $A \in \mathbb{C}^{n \times n}$, there exists a unitary $U$ ($U^*U = I$) such that

$$A = UTU^*$$

where $T$ is upper triangular with the eigenvalues of $A$ on the diagonal.

If $A \in \mathbb{R}^{n \times n}$ has only real eigenvalues, then $U$ can be chosen real orthogonal (real Schur form). The Schur decomposition is the foundation of the QR algorithm for eigenvalue computation.
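A sketch with SciPy's `scipy.linalg.schur`; the test matrix is an arbitrary non-symmetric one with real eigenvalues 5 and 2:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])     # non-symmetric; eigenvalues 5 and 2

T, Z = schur(A)                # real Schur form: A = Z T Z^T, Z orthogonal

assert np.allclose(Z @ T @ Z.T, A)
assert np.allclose(Z.T @ Z, np.eye(2))
assert np.allclose(np.tril(T, -1), 0)            # T is upper triangular
assert np.allclose(sorted(np.diag(T)), [2, 5])   # eigenvalues on the diagonal
```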

Normal Matrices

A matrix $A$ (real or complex) is normal if $AA^* = A^*A$. Normal matrices are precisely those that are unitarily diagonalizable:

$$A \text{ normal} \iff A = U\Lambda U^*, \quad U \text{ unitary}, \ \Lambda \text{ diagonal}.$$

Examples of normal matrices:

| Type | Condition | Eigenvalues |
| --- | --- | --- |
| Symmetric (real) | $A = A^T$ | Real |
| Hermitian (complex) | $A = A^*$ | Real |
| Skew-symmetric | $A = -A^T$ | Purely imaginary |
| Orthogonal | $A^TA = I$ | On unit circle in $\mathbb{C}$ |
| Unitary | $A^*A = I$ | On unit circle in $\mathbb{C}$ |

Non-normal matrices (e.g., an upper triangular matrix with nonzero entries above the diagonal) need not have orthogonal eigenvectors — their eigenbasis can be poorly conditioned, leading to transient growth even when $\rho(A) < 1$.
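The transient-growth phenomenon is easy to see numerically. A sketch with a Jordan-like matrix of spectral radius 0.99 (the off-diagonal value 5.0 is an arbitrary choice):

```python
import numpy as np

W = np.array([[0.99, 5.0],
              [0.0, 0.99]])    # non-normal, rho(W) = 0.99 < 1

# Spectral norm of W^t over time
norms = [np.linalg.norm(np.linalg.matrix_power(W, t), 2) for t in range(400)]

assert max(norms) > 100        # large transient growth despite rho < 1
assert norms[-1] < max(norms)  # ... followed by the eventual decay
```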

The Frobenius Norm and Eigenvalues

For symmetric $A$ with eigenvalues $\lambda_1, \ldots, \lambda_n$:

$$\|A\|_F^2 = \operatorname{tr}(A^TA) = \operatorname{tr}(A^2) = \sum_{i=1}^n \lambda_i^2.$$

More generally, $\operatorname{tr}(A^k) = \sum_{i=1}^n \lambda_i^k$ for any positive integer $k$. The traces of matrix powers thus provide a sequence of invariants (Newton's power sums) that determine the eigenvalues via the characteristic polynomial.
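A numerical check of the power-sum identity (random symmetric matrix, seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
lam = np.linalg.eigvalsh(A)

# tr(A^k) = sum_i lam_i^k for k = 1, ..., 5
for k in range(1, 6):
    assert np.isclose(np.trace(np.linalg.matrix_power(A, k)), np.sum(lam**k))

# The k = 2 case is the Frobenius-norm identity
assert np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(lam**2))
```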

Simultaneous Diagonalizability

Theorem. Two symmetric matrices $A, B$ are simultaneously orthogonally diagonalizable (i.e., there exists an orthogonal $Q$ such that both $Q^TAQ$ and $Q^TBQ$ are diagonal) if and only if $AB = BA$.

This is important in statistics: a sample covariance matrix and a population covariance matrix are simultaneously diagonalizable if and only if they commute — a condition that arises in hypothesis testing (the Fisher discriminant, Hotelling's $T^2$).
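A sketch of the "commuting implies shared eigenbasis" direction. The two matrices below are illustrative; note the shortcut (diagonalize $A$, reuse its basis for $B$) relies on $A$ having distinct eigenvalues — with repeated eigenvalues the shared basis must be chosen inside each eigenspace.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
B = np.array([[3.0, -1.0],
              [-1.0, 3.0]])
assert np.allclose(A @ B, B @ A)             # they commute

# A's eigenbasis (eigenvalues 1, 3 are distinct) also diagonalizes B
_, Q = np.linalg.eigh(A)
D = Q.T @ B @ Q
assert np.allclose(D, np.diag(np.diag(D)))   # off-diagonal entries vanish
```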

Worked Example

Example 1: Spectral Decomposition and Low-Rank Approximation

Let $A = \begin{pmatrix}3 & 2 \\ 2 & 3\end{pmatrix}$. From Lesson 3, $\lambda_1 = 5$, $\lambda_2 = 1$, $\mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}$, $\mathbf{q}_2 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\-1\end{pmatrix}$.

Outer-product form:

$$A = 5 \cdot \frac{1}{2}\begin{pmatrix}1\\1\end{pmatrix}\begin{pmatrix}1&1\end{pmatrix} + 1 \cdot \frac{1}{2}\begin{pmatrix}1\\-1\end{pmatrix}\begin{pmatrix}1&-1\end{pmatrix} = \frac{5}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix} + \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix}.$$

Verify: $\frac{5}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix} + \frac{1}{2}\begin{pmatrix}1&-1\\-1&1\end{pmatrix} = \begin{pmatrix}3&2\\2&3\end{pmatrix}$.

Rank-1 approximation (keep only the top term): $A_1 = \frac{5}{2}\begin{pmatrix}1&1\\1&1\end{pmatrix}$. This captures $\frac{5}{5+1} \approx 83.3\%$ of the trace and $\frac{25}{26} \approx 96.2\%$ of the squared Frobenius norm ($\|A\|_F^2 = 25 + 1 = 26$, of which the rank-1 term contributes $25$).

Matrix square root: $A^{1/2} = Q\Lambda^{1/2}Q^T = Q\operatorname{diag}(\sqrt{5}, 1)Q^T = \frac{1}{2}\begin{pmatrix}\sqrt{5}+1 & \sqrt{5}-1 \\ \sqrt{5}-1 & \sqrt{5}+1\end{pmatrix}$.
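The worked example can be verified mechanically:

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 3.0]])
lam, Q = np.linalg.eigh(A)     # ascending: lam = [1, 5]

# Outer-product form: A = sum_i lam_i q_i q_i^T
recon = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))
assert np.allclose(recon, A)

# Rank-1 truncation keeps the lam = 5 term: (5/2) * ones
A1 = lam[-1] * np.outer(Q[:, -1], Q[:, -1])
assert np.allclose(A1, 2.5 * np.ones((2, 2)))

# Matrix square root matches the closed form above
sqrtA = Q @ np.diag(np.sqrt(lam)) @ Q.T
expected = 0.5 * np.array([[np.sqrt(5) + 1, np.sqrt(5) - 1],
                           [np.sqrt(5) - 1, np.sqrt(5) + 1]])
assert np.allclose(sqrtA, expected)
```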

Example 2: Diagonalizing a Quadratic Form

Consider $f(x_1, x_2) = 2x_1^2 + 4x_1 x_2 + 5x_2^2$. Write it in matrix form:

$$f(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}, \qquad A = \begin{pmatrix}2 & 2 \\ 2 & 5\end{pmatrix}.$$

Characteristic polynomial: $(2-\lambda)(5-\lambda) - 4 = \lambda^2 - 7\lambda + 6 = (\lambda-6)(\lambda-1) = 0$.

Eigenvalues: $\lambda_1 = 6$, $\lambda_2 = 1$. Semi-axes of the level set $f(\mathbf{x}) = 1$: $1/\sqrt{6} \approx 0.408$ and $1/\sqrt{1} = 1$.

Change of coordinates: in the eigenbasis, with $\mathbf{y} = Q^T\mathbf{x}$:

$$f(\mathbf{x}) = 6y_1^2 + y_2^2.$$

No cross term. The level set $f(\mathbf{x}) = 1$ is an ellipse with semi-axes $1/\sqrt{6}$ (along $\mathbf{q}_1$) and $1$ (along $\mathbf{q}_2$).

Example 3: Kernel Matrix Decomposition

A symmetric positive semidefinite matrix $K \in \mathbb{R}^{n \times n}$ with $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$ (a kernel matrix / Gram matrix) has spectral decomposition $K = Q\Lambda Q^T$ with all $\lambda_i \geq 0$.

Truncating to the top $m \ll n$ eigenpairs — the ideal that Nyström-type approximations aim to reach cheaply — approximates the full kernel matrix:

$$K \approx Q_m \Lambda_m Q_m^T,$$

reducing storage from $O(n^2)$ to $O(nm)$ and matrix-vector products from $O(n^2)$ to $O(nm)$.

Mercer's Theorem guarantees that any continuous, symmetric, positive semidefinite kernel $k(\cdot,\cdot)$ on a compact space admits a spectral expansion $k(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^\infty \lambda_i \phi_i(\mathbf{x})\phi_i(\mathbf{y})$ — the infinite-dimensional analogue of the finite spectral theorem.

Connections

Where Your Intuition Breaks

Misconception: any matrix can be diagonalized by an orthogonal transformation. In fact, only symmetric (and, more generally, normal) matrices have orthogonal eigenbases. For a general non-symmetric matrix, the Jordan normal form is the best available decomposition — and Jordan blocks produce polynomial growth in matrix powers even when the spectral radius is below 1. A recurrent weight matrix with $\rho(W) = 0.99$ is stable in theory, but if it has (near-)Jordan structure, the powers $W^t$ can grow like $t^k$ before eventually decaying — precisely the transient growth that makes training certain RNN architectures numerically treacherous. The Spectral Theorem's guarantee of orthogonal eigenvectors is not universal; it is a special property of symmetric matrices.

Why Symmetry Is Not Just a Convenience

Symmetric matrices appear naturally wherever an inner product appears:

  • Covariance matrices $\Sigma = \mathbb{E}[(\mathbf{x}-\mu)(\mathbf{x}-\mu)^T]$ — symmetric PSD by construction
  • Gram matrices $K_{ij} = \langle \phi_i, \phi_j \rangle$ — symmetric by symmetry of the inner product
  • Graph Laplacians $L = D - W$, where $W$ is a symmetric adjacency matrix — symmetric PSD
  • Hessians $H_{ij} = \partial^2 f / \partial x_i \partial x_j$ — symmetric by Clairaut's theorem (for smooth $f$)
  • Fisher information $\mathcal{I}(\theta)_{ij} = \mathbb{E}[\partial_i \log p \cdot \partial_j \log p]$ — symmetric PSD

In each case, the Spectral Theorem guarantees real eigenvalues, orthogonal eigenvectors, and a stable eigendecomposition — properties that do not hold for general matrices.

💡Intuition

Quadratic forms are the geometry of second-order information. The Hessian $H = \nabla^2 f(\mathbf{x}^*)$ at a critical point is symmetric. Its quadratic form $\mathbf{d}^T H \mathbf{d}$ measures the curvature of $f$ in direction $\mathbf{d}$. Eigenvectors of $H$ are the directions of pure curvature; eigenvalues measure the curvature magnitude. This is why the eigenspectrum of the Hessian controls gradient descent convergence — and why ill-conditioned Hessians (large $\kappa(H)$) make optimization slow.

💡Intuition

Outer-product decomposition = interpretable features. In PCA, the covariance matrix decomposes as $C = \sum_i \lambda_i \mathbf{q}_i\mathbf{q}_i^T$. Each term $\lambda_i\mathbf{q}_i\mathbf{q}_i^T$ is a rank-1 "feature" — the $i$-th principal component captures a $\lambda_i / \operatorname{tr}(C)$ fraction of the total variance. Plotting the $\mathbf{q}_i$ (e.g., eigenfaces in face recognition, eigenmodes in physics) reveals the dominant patterns in the data. The outer-product form makes this decomposition transparent.
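A minimal PCA sketch on synthetic data with one dominant direction (the per-axis scales 3, 1, 0.1 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 3)) * np.array([3.0, 1.0, 0.1])

C = np.cov(X, rowvar=False)        # symmetric PSD covariance matrix
lam, Q = np.linalg.eigh(C)         # ascending eigenvalues

# Variance fractions lam_i / tr(C) sum to 1; the top component dominates
frac = lam / np.trace(C)
assert np.isclose(frac.sum(), 1.0)
assert frac[-1] > 0.8              # top principal component dominates here
```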

⚠️Warning

Numerical computation: use eigh, not eig. For symmetric/Hermitian matrices, use numpy.linalg.eigh (or scipy.linalg.eigh) rather than the general eig. The eigh routine exploits symmetry (roughly halving the work), guarantees real output, returns eigenvalues in ascending order, and uses more numerically stable algorithms (divide-and-conquer or MRRR). Using eig on a symmetric matrix wastes computation, returns eigenvalues unsorted, and — if the matrix is symmetric only up to roundoff — may return slightly complex eigenvalues.
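A quick comparison of the two routines on the same symmetric matrix (seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((6, 6))
A = (M + M.T) / 2

w_h, V_h = np.linalg.eigh(A)   # symmetric solver: real dtype, ascending order
w_g, V_g = np.linalg.eig(A)    # general solver: unsorted, symmetry ignored

assert np.isrealobj(w_h)
assert np.all(np.diff(w_h) >= 0)            # eigh output is sorted
assert np.allclose(np.sort(w_g.real), w_h)  # both find the same spectrum
```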

Stability and Perturbation: Weyl's Theorem

Theorem (Weyl). Let $A, E \in \mathbb{R}^{n \times n}$ be symmetric, and let $\lambda_1 \leq \cdots \leq \lambda_n$ and $\mu_1 \leq \cdots \leq \mu_n$ be the eigenvalues of $A$ and $A + E$ respectively. Then:

$$|\mu_k - \lambda_k| \leq \|E\|_2 \quad \text{for all } k.$$

Eigenvalues of symmetric matrices are Lipschitz stable under perturbation — a perturbation of spectral norm $\delta$ moves each eigenvalue by at most $\delta$. This is in stark contrast to non-symmetric matrices, whose eigenvalues can be extremely sensitive to perturbations.

This stability is why numerical eigenvalue algorithms for symmetric matrices (like the symmetric QR algorithm) achieve backward stability — the computed eigenvalues are the exact eigenvalues of a nearby matrix.
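Weyl's bound can be spot-checked numerically (random symmetric $A$ and small symmetric perturbation $E$; the `1e-12` slack absorbs roundoff):

```python
import numpy as np

rng = np.random.default_rng(6)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2
N = 0.01 * rng.standard_normal((5, 5))
E = (N + N.T) / 2

lam = np.linalg.eigvalsh(A)        # ascending
mu = np.linalg.eigvalsh(A + E)     # ascending

# Weyl: the k-th eigenvalue moves by at most ||E||_2
assert np.all(np.abs(mu - lam) <= np.linalg.norm(E, 2) + 1e-12)
```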

Comparing Spectral Decomposition to Other Decompositions

| Decomposition | Form | Requires | Gives |
| --- | --- | --- | --- |
| Spectral ($A$ symmetric) | $Q\Lambda Q^T$ | $A = A^T$ | Orthogonal eigenbasis, real $\lambda$ |
| Eigendecomposition (general) | $PDP^{-1}$ | $A$ diagonalizable | Eigenbasis (may not be orthogonal) |
| Schur | $UTU^*$ | Any $A$ | Orthonormal basis, triangular $T$ |
| SVD | $U\Sigma V^T$ | Any $A$ (even non-square) | Two orthonormal bases, non-negative $\sigma$ |
| QR | $QR$ | Any $A$ | Orthogonal $Q$, upper triangular $R$ |

For symmetric PSD matrices, the spectral decomposition and the SVD coincide: the singular values equal the eigenvalues, and the left and right singular vectors both equal the eigenvectors.
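For a PSD matrix this coincidence is directly checkable ($BB^T$ is an arbitrary way to manufacture a PSD test matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = B @ B.T                               # symmetric PSD

lam = np.linalg.eigvalsh(A)[::-1]         # eigenvalues, descending
s = np.linalg.svd(A, compute_uv=False)    # singular values, descending
assert np.allclose(lam, s)                # they coincide for PSD A
```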
