
Spectral Graph Theory: Laplacians, Eigenvalues & the Cheeger Inequality

The graph Laplacian encodes connectivity through its eigenvalues and eigenvectors; the Cheeger inequality connects the second smallest eigenvalue to how well-connected the graph is. These spectral properties power clustering, random walk analysis, and the theoretical foundations of graph neural networks.

Concepts

Graph Laplacian $L = D - A$. The Fiedler value λ₂ (second smallest eigenvalue) measures algebraic connectivity: λ₂ = 0 iff the graph is disconnected.

[Figure: interactive 5-node graph with vertex degrees d = (2, 3, 2, 2, 3); Laplacian eigenvalues λ = (0.00, 0.73, 2.00, 3.27, 4.00); λ₂ = 0.73 → well connected]
L is always positive semi-definite (all λ ≥ 0) with λ₁ = 0. Number of zero eigenvalues = number of connected components.

The adjacency matrix $A$ tells you which vertices are connected. The Laplacian $L = D - A$ tells you how information flows through the graph — and its eigenvalues tell you how fast it flows and where the bottlenecks are. Every spectral graph method, from clustering to GCN to random walk analysis, is reading properties of this single matrix.

The Graph Laplacian

For an undirected graph $G = (V, E)$ with adjacency matrix $A$ and degree matrix $D$, the combinatorial Laplacian is:

$$L = D - A$$

$L$ is symmetric and positive semidefinite. For any vector $x \in \mathbb{R}^n$:

$$x^T L x = \sum_{\{i,j\} \in E} (x_i - x_j)^2 \geq 0$$

This quadratic form measures how much $x$ varies across edges. The constant vector $\mathbf{1}$ satisfies $L\mathbf{1} = \mathbf{0}$ (since the row sums of $L$ are zero), so $\lambda_1 = 0$ is always an eigenvalue with eigenvector $\mathbf{1}/\sqrt{n}$.

The Laplacian had to take this form. If you want an operator on graph signals that is zero on constant functions (no variation across a constant signal) and measures variation by summing over edges (so each edge contributes locally), then $x^T L x = \sum_{\{i,j\}}(x_i - x_j)^2$ is the unique symmetric bilinear form with these properties — and it factors as $x^T(D - A)x$.
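These two facts (the edge-sum identity and the all-ones kernel vector) are easy to verify numerically. A minimal numpy sketch, using an illustrative 5-node path graph and an arbitrary test vector:

```python
import numpy as np

# Build L = D - A for a small undirected graph and check the identity
# x^T L x = sum over edges of (x_i - x_j)^2, plus L @ 1 = 0.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # 5-node path graph
n = 5

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))
L = D - A

x = np.array([3.0, -1.0, 4.0, 1.0, -5.0])   # arbitrary test signal
quad = x @ L @ x
edge_sum = sum((x[i] - x[j]) ** 2 for i, j in edges)

assert np.isclose(quad, edge_sum)           # the quadratic-form identity
assert np.allclose(L @ np.ones(n), 0.0)     # constant vector is in the kernel
```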

The full spectrum $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ encodes graph structure. The normalized Laplacian $L_{\text{sym}} = D^{-1/2} L D^{-1/2}$ is preferable when degrees vary widely; its eigenvalues lie in $[0, 2]$ and it removes degree bias.

Algebraic Connectivity and the Fiedler Value

The second eigenvalue $\lambda_2$ of $L$ is the algebraic connectivity or Fiedler value. By the min-max (Courant-Fischer) theorem:

$$\lambda_2 = \min_{x \perp \mathbf{1},\, x \neq 0} \frac{x^T L x}{\|x\|^2} = \min_{x \perp \mathbf{1},\, x \neq 0} \frac{\sum_{\{i,j\} \in E}(x_i - x_j)^2}{\sum_i x_i^2}$$

Two key facts follow: (1) $\lambda_2 > 0$ if and only if $G$ is connected (a disconnected graph has the indicator vector $\mathbf{1}_S$ as a zero eigenvector for each component $S$); (2) $\lambda_2$ is large when the graph has no bottleneck — many edges crossing any partition.

The eigenvector corresponding to $\lambda_2$, the Fiedler vector, provides a natural vertex ordering: vertices with similar Fiedler-vector values tend to be well-connected to each other. Cutting the graph at the median of the Fiedler vector is a principled bipartition.
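A sketch of this median cut in Python, assuming numpy and a hypothetical graph of two triangles joined by a bridge edge (`np.linalg.eigh` returns eigenvalues in ascending order, so column 1 holds the Fiedler vector):

```python
import numpy as np

# Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# eigh sorts eigenvalues ascending; column 1 is the Fiedler vector.
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]

# Median cut: vertices below the median Fiedler value form one side.
side = fiedler < np.median(fiedler)

# The cut should separate the two triangles (up to eigenvector sign).
assert side[0] == side[1] == side[2]
assert side[3] == side[4] == side[5]
assert side[0] != side[3]
```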

The Cheeger Inequality

The conductance (or Cheeger constant) of a graph measures the minimum bottleneck ratio over all cuts. For $S \subseteq V$:

$$h(G) = \min_{S \,:\, 0 < \text{vol}(S) \leq \text{vol}(V)/2} \frac{|E(S, \bar{S})|}{\text{vol}(S)}$$

where $\text{vol}(S) = \sum_{v \in S} \deg(v)$ and $E(S, \bar{S})$ counts edges crossing the cut. The Cheeger inequality connects $h(G)$ to the second eigenvalue $\lambda_2$ of $L_{\text{sym}}$:

$$\frac{\lambda_2}{2} \leq h(G) \leq \sqrt{2\lambda_2}$$

The lower bound is tight for expander graphs; the upper bound is constructive — a sweep cut over the Fiedler vector yields a set achieving $h(G) \leq \sqrt{2\lambda_2}$. This is the discrete analogue of the Cheeger inequality in Riemannian geometry, which bounds the spectral gap of the Laplace-Beltrami operator from below in terms of the manifold's Cheeger constant.
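The sweep cut can be sketched as follows: order vertices by the degree-normalized Fiedler vector of $L_{\text{sym}}$, try every prefix as the candidate set $S$, and check both sides of the Cheeger inequality. The graph and helper names here are illustrative:

```python
import numpy as np

# 5-node path graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
D_isqrt = np.diag(1.0 / np.sqrt(deg))
L_sym = np.eye(n) - D_isqrt @ A @ D_isqrt

vals, vecs = np.linalg.eigh(L_sym)
lam2 = vals[1]

def conductance(S):
    """Cut edges divided by the smaller side's volume (equivalent to
    restricting vol(S) <= vol(V)/2 in the definition)."""
    S = set(S)
    cut = sum(1 for i, j in edges if (i in S) != (j in S))
    vol = sum(deg[v] for v in S)
    return cut / min(vol, deg.sum() - vol)

# Sweep: order vertices by the degree-normalized Fiedler vector and
# take the best prefix cut.
order = np.argsort(D_isqrt @ vecs[:, 1])
h_sweep = min(conductance(order[:k]) for k in range(1, n))

# Both sides of the Cheeger inequality hold for the sweep cut.
assert lam2 / 2 <= h_sweep + 1e-9
assert h_sweep <= np.sqrt(2 * lam2) + 1e-9
```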

Spectral Clustering

Spectral clustering embeds vertices into $\mathbb{R}^k$ using the first $k$ eigenvectors of $L$ (or $L_{\text{sym}}$), then applies $k$-means in this embedding space. The algorithm is:

  1. Compute the eigenvectors $u_1, \ldots, u_k$ of $L_{\text{sym}}$ corresponding to the $k$ smallest eigenvalues.
  2. Form the matrix $U \in \mathbb{R}^{n \times k}$ with these as columns; row-normalize if using $L_{\text{sym}}$.
  3. Cluster the rows of $U$ with $k$-means.

This works because eigenvectors of $L$ are smooth with respect to graph structure — the Fiedler vector changes slowly across connected subgraphs. The embedding maps the combinatorial clustering problem into a geometric one where Euclidean $k$-means is effective.
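The three steps above can be sketched with numpy alone; the tiny k-means here (farthest-point initialization plus Lloyd iterations) is an illustrative stand-in for a library implementation:

```python
import numpy as np

def spectral_clustering(A, k, iters=100):
    # Step 1: eigenvectors of L_sym for the k smallest eigenvalues.
    deg = A.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(deg))
    L_sym = np.eye(len(A)) - D_isqrt @ A @ D_isqrt
    _, vecs = np.linalg.eigh(L_sym)                # ascending order
    # Step 2: embedding matrix, row-normalized.
    U = vecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Step 3: k-means on the rows (deterministic farthest-point init).
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = U[labels == c].mean(axis=0)
    return labels

# Toy graph: two triangles joined by a bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
labels = spectral_clustering(A, k=2)

assert labels[0] == labels[1] == labels[2]
assert labels[3] == labels[4] == labels[5]
assert labels[0] != labels[3]
```

On this toy graph the two triangles come out as the two clusters, matching the Fiedler-vector intuition.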

Random Walks and Mixing Time

The lazy random walk considered here moves from a vertex to each of its neighbors with probability $1/(2d_{\max})$ and stays put with the remaining probability (at least $1/2$). Its transition matrix is:

$$P = I - \frac{L}{2d_{\max}}$$

If $\lambda_i$ are the eigenvalues of $L$, the eigenvalues of $P$ are $1 - \lambda_i/(2d_{\max}) \in [0, 1]$. Since $P$ is symmetric and doubly stochastic, the distribution after $t$ steps converges to the uniform stationary distribution at a rate proportional to $(1 - \lambda_2/(2d_{\max}))^t$. The mixing time — the number of steps until the walk is close to stationary — is therefore $\Theta(d_{\max}/\lambda_2)$ up to logarithmic factors. Large $\lambda_2$ (high conductance) means fast mixing.
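A small simulation of this convergence on a 5-node path, using $P = I - L/(2d_{\max})$; note this $P$ is symmetric and doubly stochastic, so its stationary distribution is uniform:

```python
import numpy as np

# Lazy walk on the 5-node path: P = I - L/(2*d_max).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
d_max = A.sum(axis=1).max()
P = np.eye(n) - L / (2 * d_max)

pi = np.ones(n) / n       # stationary distribution (P is doubly stochastic)
p = np.zeros(n)
p[0] = 1.0                # start at an endpoint

tv = []                   # total variation distance at each step
for _ in range(60):
    p = p @ P
    tv.append(0.5 * np.abs(p - pi).sum())

lam2 = np.linalg.eigvalsh(L)[1]
rate = 1 - lam2 / (2 * d_max)          # predicted per-step decay factor

assert tv[-1] < tv[0]                       # the walk converges
assert abs(tv[59] / tv[58] - rate) < 0.05   # asymptotic ratio matches rate
```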

Worked Example

Computing $L$ and finding the Fiedler vector. Take a 5-node path graph: $V = \{1,2,3,4,5\}$, edges $\{1,2\}, \{2,3\}, \{3,4\}, \{4,5\}$. The degree matrix is $D = \text{diag}(1,2,2,2,1)$ and the adjacency matrix $A$ has 1s at positions $(i, i \pm 1)$. The Laplacian $L = D - A$ has diagonal $(1,2,2,2,1)$ and off-diagonal $-1$s at adjacent pairs.

The eigenvalues of the path Laplacian are $\lambda_{k+1} = 2 - 2\cos(k\pi/n)$ for $k = 0, \ldots, n-1$. For $n = 5$: $\lambda_1 = 0$, $\lambda_2 = 2 - 2\cos(\pi/5) \approx 0.382$. Since $\lambda_2 > 0$, the graph is connected. The Fiedler vector has components proportional to $\cos\big((2j-1)\pi/(2n)\big)$ for $j = 1, \ldots, 5$, giving values approximately $(0.95, 0.59, 0, -0.59, -0.95)$ up to sign — monotone along the path, so the sign change at the middle gives the expected bipartition, cutting between vertices 2 and 3 or between 3 and 4.
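The closed-form spectrum can be checked numerically; a quick sketch assuming numpy:

```python
import numpy as np

# Path-graph Laplacian: eigenvalues should match 2 - 2*cos(k*pi/n),
# and the Fiedler vector should be monotone along the path.
n = 5
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)
expected = 2 - 2 * np.cos(np.arange(n) * np.pi / n)   # already ascending
assert np.allclose(vals, expected)
assert np.isclose(vals[1], 2 - 2 * np.cos(np.pi / 5))  # lambda_2 ~ 0.382

fiedler = vecs[:, 1]
diffs = np.diff(fiedler)
assert np.all(diffs > 0) or np.all(diffs < 0)   # monotone along the path
```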

Random walk convergence. On this path graph with $d_{\max} = 2$, the second-largest eigenvalue of $P$ is $1 - \lambda_2/(2 \cdot 2) \approx 1 - 0.0955 = 0.905$. After $t$ steps, the total variation distance to stationarity is bounded by a constant times $(0.905)^t$. To reach distance $\varepsilon = 0.01$: $t \geq \log(0.01)/\log(0.905) \approx 46$ steps. The path graph mixes slowly because it has low conductance (removing one middle edge splits the volume roughly in half), consistent with small $\lambda_2$.
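The arithmetic above, spelled out as a sanity check (numpy only):

```python
import numpy as np

# lambda_2 of the 5-node path, decay rate of P, and the step count
# needed to push the (0.905)^t bound below epsilon = 0.01.
lam2 = 2 - 2 * np.cos(np.pi / 5)      # ~0.382
rate = 1 - lam2 / (2 * 2)             # d_max = 2  ->  ~0.905
t = np.log(0.01) / np.log(rate)       # ~46 steps

assert 0.90 < rate < 0.91
assert 45 < t < 47
```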

Connections

Where Your Intuition Breaks

The Fiedler vector gives the best 2-partition in the sense of minimizing conductance (the Cheeger constant) — but conductance and minimum edge cut are different objectives. A graph with one high-degree hub connected to two dense subgraphs will be bisected by the Fiedler vector to separate the hub from one subgraph (because including the hub in any partition inflates the denominator $\text{vol}(S)$), not to cut between the two dense subgraphs (the intuitively "correct" clustering). Spectral partitioning is sensitive to vertex degree in exactly this way: the combinatorial Laplacian $L$ treats all edges equally, so high-degree vertices dominate. The normalized Laplacian $L_{\text{sym}} = D^{-1/2}LD^{-1/2}$ corrects for this by weighting by degree — always use $L_{\text{sym}}$ for clustering on graphs with heterogeneous degree distributions, not $L$.

💡Intuition

The Cheeger inequality is the bridge between geometry and combinatorics. In Riemannian geometry, Cheeger's theorem bounds the first nonzero eigenvalue of the Laplace-Beltrami operator below by the square of the Cheeger constant of the manifold. The discrete graph version is structurally identical: $\lambda_2/2 \leq h(G) \leq \sqrt{2\lambda_2}$. This is not a coincidence — graphs are discretizations of manifolds, and the Laplacian is the natural differential operator on both. This connection inspired the design of graph convolutional networks, which perform graph Fourier analysis using eigenvectors of $L$ just as classical Fourier analysis uses eigenfunctions of the continuous Laplacian.

💡Intuition

Spectral clustering is the origin point for graph neural network design. The graph Fourier transform represents a signal $x \in \mathbb{R}^n$ (one value per vertex) in the eigenbasis of $L$: $\hat{x} = U^T x$, where $U$ contains the eigenvectors as columns. Spectral graph convolution filters this signal: $\hat{x}_{\text{filtered}} = g(\Lambda)\hat{x}$, where $g$ is a filter function applied to the eigenvalues. ChebConv approximates $g$ with Chebyshev polynomials; GCN uses a first-order approximation. Every GNN layer is therefore performing localized spectral filtering on the graph signal — the algebraic connectivity $\lambda_2$ controls the spatial scale of the filter.
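A sketch of the graph Fourier transform and one spectral filter; the low-pass choice $g(\lambda) = e^{-\lambda}$ is illustrative, not taken from the text:

```python
import numpy as np

# Graph Fourier transform on a 5-node path: x_hat = U^T x, filter in the
# spectral domain with g(lambda) = exp(-lambda), then transform back.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A
lam, U = np.linalg.eigh(L)

x = np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # a "rough" (high-frequency) signal
x_hat = U.T @ x                              # graph Fourier transform
x_filt = U @ (np.exp(-lam) * x_hat)          # low-pass filter, then inverse

def smoothness(v):
    return v @ L @ v   # sum of squared differences across edges

# Low-pass filtering damps high-frequency components, reducing variation.
assert smoothness(x_filt) < smoothness(x)
```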

⚠️Warning

Full spectral methods require an eigendecomposition of $L$, which costs $O(n^3)$ and is completely impractical for million-node graphs (citation networks, social networks, protein interaction networks). For large graphs, two approximations are standard: Lanczos iteration computes the $k$ smallest eigenvectors in roughly $O(km)$ time for $k \ll n$ (with restarts in practice), where $m$ is the number of edges; Chebyshev polynomial approximation of spectral filters avoids eigendecomposition entirely by computing matrix-vector products $L^j x$ directly in $O(jm)$ time. These scalable approximations are exactly what make GCN and related architectures tractable on large graphs.
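A sketch of eigendecomposition-free filtering via repeated products $L^j x$; for simplicity this uses the monomial basis as a stand-in for the numerically preferred Chebyshev basis, and the coefficients are illustrative:

```python
import numpy as np

def poly_filter(L, x, coeffs):
    """Apply g(L) @ x for a polynomial g, using only mat-vec products
    (O(j*m) total for degree j on a sparse graph with m edges)."""
    out = np.zeros_like(x)
    v = x.copy()
    for c in coeffs:      # out = sum_j coeffs[j] * L^j @ x
        out += c * v
        v = L @ v
    return out

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

x = np.arange(n, dtype=float)
coeffs = [1.0, -0.5, 0.1]          # g(lam) = 1 - 0.5*lam + 0.1*lam^2
direct = poly_filter(L, x, coeffs)

# Cross-check against the explicit spectral form U g(Lambda) U^T x.
lam, U = np.linalg.eigh(L)
g = coeffs[0] + coeffs[1] * lam + coeffs[2] * lam ** 2
assert np.allclose(direct, U @ (g * (U.T @ x)))
```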
