
Spectral Graph Theory: Laplacians, Eigenvalues & the Cheeger Inequality

The graph Laplacian encodes connectivity through its eigenvalues and eigenvectors; the Cheeger inequality connects the second smallest eigenvalue to how well-connected the graph is. These spectral properties power clustering, random walk analysis, and the theoretical foundations of graph neural networks.

Concepts

Graph Laplacian $L = D - A$. The Fiedler value λ₂ (second smallest eigenvalue) measures algebraic connectivity: λ₂ = 0 iff the graph is disconnected.

[Figure: interactive 5-node graph with vertex degrees d = (2, 3, 2, 2, 3); Laplacian eigenvalues λ = (0.00, 0.73, 2.00, 3.27, 4.00); λ₂ = 0.73 → well connected]
L is always positive semi-definite (all λ ≥ 0) with λ₁ = 0. Number of zero eigenvalues = number of connected components.

The adjacency matrix $A$ tells you which vertices are connected. The Laplacian $L = D - A$ tells you how information flows through the graph — and its eigenvalues tell you how fast it flows and where the bottlenecks are. Every spectral graph method, from clustering to GCN to random walk analysis, is reading properties of this single matrix.

The Graph Laplacian

For an undirected graph $G = (V, E)$ with adjacency matrix $A$ and degree matrix $D$, the combinatorial Laplacian is:

$$L = D - A$$

$L$ is symmetric and positive semidefinite. For any vector $x \in \mathbb{R}^n$:

$$x^T L x = \sum_{\{i,j\} \in E} (x_i - x_j)^2 \geq 0$$

This quadratic form measures how much $x$ varies across edges. The constant vector $\mathbf{1}$ satisfies $L\mathbf{1} = \mathbf{0}$ (since the row sums of $L$ are zero), so $\lambda_1 = 0$ is always an eigenvalue with eigenvector $\mathbf{1}/\sqrt{n}$.

The Laplacian had to take this form. If you want an operator on graph signals that is zero on constant functions (no variation across a constant signal) and measures variation by summing over edges (so each edge contributes locally), then $x^T L x = \sum_{\{i,j\}}(x_i - x_j)^2$ is the unique symmetric bilinear form with these properties — and it factors as $x^T(D - A)x$.
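These two facts (the edge-sum identity and the all-ones kernel vector) are easy to verify numerically. A minimal numpy sketch, using an illustrative 5-node path graph and an arbitrary test vector:

```python
import numpy as np

# Build L = D - A for a small undirected graph and check the identity
# x^T L x = sum over edges of (x_i - x_j)^2, plus L @ 1 = 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # 5-node path graph
n = 5

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0])   # arbitrary test signal
quad = x @ L @ x
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in edges)

assert np.isclose(quad, edge_sum)           # the quadratic-form identity
assert np.allclose(L @ np.ones(n), 0.0)     # constant vector is in the kernel
```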

The full spectrum $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ encodes graph structure. The normalized Laplacian $L_{\text{sym}} = D^{-1/2} L D^{-1/2}$ is preferable when degrees vary widely; its eigenvalues lie in $[0, 2]$ and it removes degree bias.

Algebraic Connectivity and the Fiedler Value

The second eigenvalue $\lambda_2$ of $L$ is the algebraic connectivity or Fiedler value. By the min-max (Courant-Fischer) theorem:

$$\lambda_2 = \min_{x \perp \mathbf{1},\, x \neq 0} \frac{x^T L x}{\|x\|^2} = \min_{x \perp \mathbf{1},\, x \neq 0} \frac{\sum_{\{i,j\} \in E}(x_i - x_j)^2}{\sum_i x_i^2}$$

Two key facts follow: (1) $\lambda_2 > 0$ if and only if $G$ is connected (a disconnected graph has the indicator vector $\mathbf{1}_S$ as a zero eigenvector for each component $S$); (2) $\lambda_2$ is large when the graph has no bottleneck — many edges crossing any partition.

The eigenvector corresponding to $\lambda_2$, the Fiedler vector, provides a natural vertex ordering: vertices with similar Fiedler-vector values tend to be well-connected to each other. Cutting the graph at the median of the Fiedler vector is a principled bipartition.
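A sketch of this median cut in Python, assuming numpy and a hypothetical graph of two triangles joined by a bridge edge (`np.linalg.eigh` returns eigenvalues in ascending order, so column 1 holds the Fiedler vector):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# eigh sorts eigenvalues ascending; column 1 is the Fiedler vector.
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]

# Median cut: vertices below the median Fiedler value form one side.
side = fiedler < np.median(fiedler)

# The cut should separate the two triangles (up to eigenvector sign).
assert side[0] == side[1] == side[2]
assert side[3] == side[4] == side[5]
assert side[0] != side[3]
```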

The Cheeger Inequality

The conductance (or Cheeger constant) of a graph measures the minimum bottleneck ratio over all cuts. For $S \subseteq V$:

$$h(G) = \min_{S \,:\, 0 < \text{vol}(S) \leq \text{vol}(V)/2} \frac{|E(S, \bar{S})|}{\text{vol}(S)}$$

where $\text{vol}(S) = \sum_{v \in S} \deg(v)$ and $E(S, \bar{S})$ counts edges crossing the cut. The Cheeger inequality connects $h(G)$ to the second eigenvalue $\lambda_2$ of $L_{\text{sym}}$:

$$\frac{\lambda_2}{2} \leq h(G) \leq \sqrt{2\lambda_2}$$

The lower bound is tight for expander graphs; the upper bound is constructive — a sweep cut over the Fiedler vector yields a set achieving $h(G) \leq \sqrt{2\lambda_2}$. This is the discrete analogue of the Cheeger inequality in Riemannian geometry, which bounds the spectral gap of the Laplace-Beltrami operator from below in terms of the manifold's Cheeger constant.
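The sweep cut can be sketched as follows: order vertices by the degree-normalized Fiedler vector of $L_{\text{sym}}$, try every prefix as the candidate set $S$, and check both sides of the Cheeger inequality. The graph and helper names here are illustrative:

```python
import numpy as np

# 5-node path graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
D_isqrt = np.diag(1.0 / np.sqrt(deg))
L_sym = np.eye(n) - D_isqrt @ A @ D_isqrt

vals, vecs = np.linalg.eigh(L_sym)
lam2 = vals[1]

def conductance(S):
    """Cut edges divided by the smaller side's volume (equivalent to
    restricting vol(S) <= vol(V)/2 in the definition)."""
    S = set(S)
    cut = sum(1 for i, j in edges if (i in S) != (j in S))
    vol = sum(deg[v] for v in S)
    return cut / min(vol, deg.sum() - vol)

# Sweep: order vertices by the degree-normalized Fiedler vector and
# take the best prefix cut.
order = np.argsort(D_isqrt @ vecs[:, 1])
h_sweep = min(conductance(order[:k]) for k in range(1, n))

# Both sides of the Cheeger inequality hold for the sweep cut.
assert lam2 / 2 <= h_sweep + 1e-9
assert h_sweep <= np.sqrt(2 * lam2) + 1e-9
```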

Spectral Clustering

Spectral clustering embeds vertices into $\mathbb{R}^k$ using the first $k$ eigenvectors of $L$ (or $L_{\text{sym}}$), then applies $k$-means in this embedding space. The algorithm is:

  1. Compute the eigenvectors $u_1, \ldots, u_k$ of $L_{\text{sym}}$ corresponding to the $k$ smallest eigenvalues.
  2. Form the matrix $U \in \mathbb{R}^{n \times k}$ with these as columns; row-normalize if using $L_{\text{sym}}$.
  3. Cluster the rows of $U$ with $k$-means.

This works because eigenvectors of $L$ are smooth with respect to graph structure — the Fiedler vector changes slowly across connected subgraphs. The embedding maps the combinatorial clustering problem into a geometric one where Euclidean $k$-means is effective.
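The three steps above can be sketched with numpy alone; the tiny k-means here (farthest-point initialization plus Lloyd iterations) is an illustrative stand-in for a library implementation:

```python
import numpy as np

def spectral_clustering(A, k, iters=100):
    # Step 1: eigenvectors of L_sym for the k smallest eigenvalues.
    deg = A.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(deg))
    L_sym = np.eye(len(A)) - D_isqrt @ A @ D_isqrt
    _, vecs = np.linalg.eigh(L_sym)                # ascending order
    # Step 2: embedding matrix, row-normalized.
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Step 3: k-means on the rows (deterministic farthest-point init).
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Toy graph: two triangles joined by a bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
labels = spectral_clustering(A, k=2)

assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```

On this toy graph the two triangles come out as the two clusters, matching the Fiedler-vector intuition.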

Random Walks and Mixing Time

The lazy random walk considered here moves from a vertex to each of its neighbors with probability $1/(2d_{\max})$ and stays put with the remaining probability (at least $1/2$). Its transition matrix is:

$$P = I - \frac{L}{2d_{\max}}$$

If $\lambda_i$ are the eigenvalues of $L$, the eigenvalues of $P$ are $1 - \lambda_i/(2d_{\max}) \in [0, 1]$. Since $P$ is symmetric and doubly stochastic, the distribution after $t$ steps converges to the uniform stationary distribution at a rate proportional to $(1 - \lambda_2/(2d_{\max}))^t$. The mixing time — the number of steps until the walk is close to stationary — is therefore $\Theta(d_{\max}/\lambda_2)$ up to logarithmic factors. Large $\lambda_2$ (high conductance) means fast mixing.
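A small simulation of this convergence on a 5-node path, using $P = I - L/(2d_{\max})$; note this $P$ is symmetric and doubly stochastic, so its stationary distribution is uniform:

```python
import numpy as np

# Lazy walk on the 5-node path: P = I - L/(2*d_max).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
d_max = A.sum(axis=1).max()
P = np.eye(n) - L / (2 * d_max)

pi = np.ones(n) / n       # stationary distribution (P is doubly stochastic)
p = np.zeros(n)
p[0] = 1.0                # start at an endpoint

tv = []                   # total variation distance at each step
for _ in range(60):
    p = p @ P
    tv.append(0.5 * np.abs(p - pi).sum())

lam2 = np.linalg.eigvalsh(L)[1]
rate = 1 - lam2 / (2 * d_max)          # predicted per-step decay factor

assert tv[-1] < tv[0]                       # the walk converges
assert abs(tv[59] / tv[58] - rate) < 0.05   # asymptotic ratio matches rate
```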

Worked Example

Computing $L$ and finding the Fiedler vector. Take a 5-node path graph: $V = \{1,2,3,4,5\}$, edges $\{1,2\}, \{2,3\}, \{3,4\}, \{4,5\}$. The degree matrix is $D = \text{diag}(1,2,2,2,1)$ and the adjacency matrix $A$ has 1s at positions $(i, i \pm 1)$. The Laplacian $L = D - A$ has diagonal $(1,2,2,2,1)$ and off-diagonal $-1$s at adjacent pairs.

The eigenvalues of the path Laplacian are $\lambda_{k+1} = 2 - 2\cos(k\pi/n)$ for $k = 0, \ldots, n-1$. For $n = 5$: $\lambda_1 = 0$, $\lambda_2 = 2 - 2\cos(\pi/5) \approx 0.382$. Since $\lambda_2 > 0$, the graph is connected. The Fiedler vector has components proportional to $\cos\big((2j-1)\pi/(2n)\big)$ for $j = 1, \ldots, 5$, giving values approximately $(0.95, 0.59, 0, -0.59, -0.95)$ up to sign — monotone along the path, so the sign change at the middle gives the expected bipartition, cutting between vertices 2 and 3 or between 3 and 4.
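The closed-form spectrum can be checked numerically; a quick sketch assuming numpy:

```python
import numpy as np

# Path-graph Laplacian: eigenvalues should match 2 - 2*cos(k*pi/n),
# and the Fiedler vector should be monotone along the path.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)
expected = 2 - 2 * np.cos(np.arange(n) * np.pi / n)   # already ascending
assert np.allclose(vals, expected)
assert np.isclose(vals[1], 2 - 2 * np.cos(np.pi / 5))  # lambda_2 ~ 0.382

fiedler = vecs[:, 1]
diffs = np.diff(fiedler)
assert np.all(diffs > 0) or np.all(diffs < 0)   # monotone along the path
```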

Random walk convergence. On this path graph with $d_{\max} = 2$, the second-largest eigenvalue of $P$ is $1 - \lambda_2/(2 \cdot 2) \approx 1 - 0.0955 = 0.905$. After $t$ steps, the total variation distance to stationarity is bounded by a constant times $(0.905)^t$. To reach distance $\varepsilon = 0.01$: $t \geq \log(0.01)/\log(0.905) \approx 46$ steps. The path graph mixes slowly because it has low conductance (removing one middle edge splits the volume roughly in half), consistent with small $\lambda_2$.
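The arithmetic above, spelled out as a sanity check (numpy only):

```python
import numpy as np

# lambda_2 of the 5-node path, decay rate of P, and the step count
# needed to push the (0.905)^t bound below epsilon = 0.01.
lam2 = 2 - 2 * np.cos(np.pi / 5)      # ~0.382
rate = 1 - lam2 / (2 * 2)             # d_max = 2  ->  ~0.905
t = np.log(0.01) / np.log(rate)       # ~46 steps

assert 0.90 < rate < 0.91
assert 45 < t < 47
```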

Connections

Where Your Intuition Breaks

The Fiedler vector gives the best 2-partition in the sense of minimizing conductance (the Cheeger constant) — but conductance and minimum edge cut are different objectives. A graph with one high-degree hub connected to two dense subgraphs will be bisected by the Fiedler vector to separate the hub from one subgraph (because including the hub in any partition inflates the denominator $\text{vol}(S)$), not to cut between the two dense subgraphs (the intuitively "correct" clustering). Spectral partitioning is sensitive to vertex degree in exactly this way: the combinatorial Laplacian $L$ treats all edges equally, so high-degree vertices dominate. The normalized Laplacian $L_{\text{sym}} = D^{-1/2}LD^{-1/2}$ corrects for this by weighting by degree — always use $L_{\text{sym}}$ for clustering on graphs with heterogeneous degree distributions, not $L$.

💡Intuition

The Cheeger inequality is the bridge between geometry and combinatorics. In Riemannian geometry, Cheeger's theorem bounds the first nonzero eigenvalue of the Laplace-Beltrami operator below by the square of the Cheeger constant of the manifold. The discrete graph version is structurally identical: $\lambda_2/2 \leq h(G) \leq \sqrt{2\lambda_2}$. This is not a coincidence — graphs are discretizations of manifolds, and the Laplacian is the natural differential operator on both. This connection inspired the design of graph convolutional networks, which perform graph Fourier analysis using eigenvectors of $L$ just as classical Fourier analysis uses eigenfunctions of the continuous Laplacian.

💡Intuition

Spectral clustering is the origin point for graph neural network design. The graph Fourier transform represents a signal $x \in \mathbb{R}^n$ (one value per vertex) in the eigenbasis of $L$: $\hat{x} = U^T x$, where $U$ contains the eigenvectors as columns. Spectral graph convolution filters this signal: $\hat{x}_{\text{filtered}} = g(\Lambda)\hat{x}$, where $g$ is a filter function applied to the eigenvalues. ChebConv approximates $g$ with Chebyshev polynomials; GCN uses a first-order approximation. Every GNN layer is therefore performing localized spectral filtering on the graph signal — the algebraic connectivity $\lambda_2$ controls the spatial scale of the filter.
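A sketch of the graph Fourier transform and one spectral filter; the low-pass choice $g(\lambda) = e^{-\lambda}$ is illustrative, not taken from the text:

```python
import numpy as np

# Graph Fourier transform on a 5-node path: x_hat = U^T x, filter in the
# spectral domain with g(lambda) = exp(-lambda), then transform back.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

x = np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # a "rough" (high-frequency) signal
x_hat = U.T @ x                              # graph Fourier transform
x_filt = U @ (np.exp(-lam) * x_hat)          # low-pass filter, then inverse

def smoothness(v):
    return v @ L @ v   # sum of squared differences across edges

# Low-pass filtering damps high-frequency components, reducing variation.
assert smoothness(x_filt) < smoothness(x)
```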

⚠️Warning

Full spectral methods require an eigendecomposition of $L$, which costs $O(n^3)$ and is completely impractical for million-node graphs (citation networks, social networks, protein interaction networks). For large graphs, two approximations are standard: Lanczos iteration computes the $k$ smallest eigenvectors in roughly $O(km)$ time for $k \ll n$ (with restarts in practice), where $m$ is the number of edges; Chebyshev polynomial approximation of spectral filters avoids eigendecomposition entirely by computing matrix-vector products $L^j x$ directly in $O(jm)$ time. These scalable approximations are exactly what make GCN and related architectures tractable on large graphs.
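A sketch of eigendecomposition-free filtering via repeated products $L^j x$; for simplicity this uses the monomial basis as a stand-in for the numerically preferred Chebyshev basis, and the coefficients are illustrative:

```python
import numpy as np

def poly_filter(L, x, coeffs):
    """Apply g(L) @ x for a polynomial g, using only mat-vec products
    (O(j*m) total for degree j on a sparse graph with m edges)."""
    out = np.zeros_like(x)
    v = x.copy()
    for c in coeffs:      # out = sum_j coeffs[j] * L^j @ x
        out += c * v
        v = L @ v
    return out

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

x = np.arange(n, dtype=float)
coeffs = [1.0, -0.5, 0.1]          # g(lam) = 1 - 0.5*lam + 0.1*lam^2
direct = poly_filter(L, x, coeffs)

# Cross-check against the explicit spectral form U g(Lambda) U^T x.
lam, U = np.linalg.eigh(L)
g = coeffs[0] + coeffs[1] * lam + coeffs[2] * lam ** 2
assert np.allclose(direct, U @ (g * (U.T @ x)))
```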
