Random Graphs: Erdős–Rényi, Phase Transitions & Small-World Networks

Random graph models like Erdős–Rényi reveal phase transitions — sharp thresholds where properties like connectivity appear suddenly — and provide baselines for real-world network analysis. The mathematics of these thresholds, through first and second moment methods, is the same toolkit used in statistical physics and probabilistic combinatorics.

Concepts

Erdős–Rényi random graph $G(n,p)$ with $n=20$ nodes. Each edge included independently with probability $p$. Phase transition at $p_c = 1/(n-1) \approx 0.053$.

[Interactive demo: a $G(n,p)$ sample at $p = 0.08$, above $p_c = 0.053$; $|E| = 15$ of 190 possible edges, 8 components, largest component 8 nodes (40%). Near $p_c$, in the phase transition region, a dominant component is forming but may not yet span $\Theta(n)$ vertices.]

Below a critical edge density, a random graph consists of small isolated fragments; above it, a single giant component emerges connecting most vertices. This phase transition — appearing suddenly at an exact threshold — is the foundational fact of random graph theory, and the same mathematical structure governs percolation in physics, satisfiability thresholds in logic, and epidemic spread in epidemiology.

The Erdős–Rényi Model

The Erdős–Rényi model $G(n,p)$ places $n$ vertices and includes each of the $\binom{n}{2}$ possible edges independently with probability $p$. The expected number of edges is $\binom{n}{2}p \approx n^2 p/2$. The degree of any vertex follows $\text{Bin}(n-1, p)$; for $p = c/n$, this converges to a $\text{Poisson}(c)$ distribution as $n \to \infty$.

The closely related model $G(n,m)$ fixes exactly $m$ edges chosen uniformly at random. For $p = m/\binom{n}{2}$, properties of $G(n,m)$ and $G(n,p)$ are asymptotically equivalent for most purposes, so the two models are often used interchangeably.
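The definition of $G(n,p)$ translates directly into code. A minimal Python sketch (the values $n = 200$, $p = 0.05$ are arbitrary illustrations, and the quadratic loop is not optimized for large $n$):

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng=random):
    """Sample G(n, p): include each of the C(n, 2) possible edges
    independently with probability p. Returns an adjacency dict."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj

random.seed(0)
n, p = 200, 0.05
g = sample_gnp(n, p)
m = sum(len(nbrs) for nbrs in g.values()) // 2
print("edges:", m, "expected:", n * (n - 1) // 2 * p)  # expectation C(n,2)p = 995
```

Each vertex's degree is $\text{Bin}(n-1, p)$; comparing the empirical degree histogram against $\text{Poisson}(np)$ is a good exercise.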

Phase Transitions and the Giant Component

The most striking feature of $G(n,p)$ is a sharp phase transition at $p = 1/n$ (equivalently $c = 1$ when $p = c/n$):

  • For $c < 1$: all connected components have size $O(\log n)$ with high probability (whp).
  • At $c = 1$: the largest component has size $\Theta(n^{2/3})$.
  • For $c > 1$: a unique giant component of size $\Theta(n)$ exists whp, containing a fraction $\beta > 0$ of all vertices, where $\beta$ satisfies $\beta = 1 - e^{-c\beta}$.

This phase transition mirrors the percolation transition in statistical physics. The fraction $\beta$ of vertices in the giant component satisfies the self-consistency equation above: a vertex $v$ is in the giant component iff at least one of its neighbors is, and a branching process analysis turns this into the fixed-point equation.

The self-consistency equation $\beta = 1 - e^{-c\beta}$ is not just a calculation — it is the survival probability of a Poisson($c$) branching process. Each vertex in the giant component recruits its $\text{Poisson}(c)$ neighbors, and $\beta$ is the probability this process survives to infinity. Branching processes are the right model because $G(n, c/n)$ looks locally like a tree: in any bounded neighborhood, two edge-disjoint paths connecting the same pair of vertices occur with probability $O(1/n)$, so cycles are negligible at the local level.
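The fixed-point equation can be solved numerically by simple iteration. A sketch (starting the iteration from $\beta = 1$, which converges to the positive root when $c > 1$ and to 0 when $c \leq 1$):

```python
import math

def giant_fraction(c, iters=200):
    """Solve beta = 1 - exp(-c * beta) by fixed-point iteration.

    Starting from beta = 1: for c <= 1 the iteration collapses to 0
    (no giant component); for c > 1 it converges to the positive root,
    the asymptotic fraction of vertices in the giant component."""
    beta = 1.0
    for _ in range(iters):
        beta = 1.0 - math.exp(-c * beta)
    return beta

for c in (0.5, 1.5, 2.0, 4.0):
    print(c, round(giant_fraction(c), 4))
```

For example, at $c = 2$ the giant component asymptotically contains about 79.7% of all vertices.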

Threshold Functions and Sharp Thresholds

For a monotone graph property $\mathcal{P}$ (one preserved under edge addition), there exists a threshold function $p^*(n)$ such that:

$$\Pr[G(n,p) \in \mathcal{P}] \to \begin{cases} 0 & \text{if } p/p^* \to 0 \\ 1 & \text{if } p/p^* \to \infty \end{cases}$$

Key thresholds:

  • Connectivity: $p^* = \log(n)/n$. At this threshold, the last isolated vertex joins the graph, causing the jump to full connectivity.
  • Triangle-freeness: $p^* = n^{-1}$. Below this, the expected triangle count is $o(1)$; above it, triangles proliferate.
  • Hamiltonicity: $p^* = \log(n)/n$, same as connectivity (the graph becomes Hamiltonian as soon as it has minimum degree $\geq 2$).

The Bollobás–Thomason theorem guarantees that every monotone property has a threshold function $p^*(n)$. Many natural properties, such as connectivity, additionally have a sharp threshold: the transition window has width $o(p^*)$, meaning the jump from probability near 0 to near 1 happens over a negligible range of $p$. Not every monotone property is sharp, though — triangle containment has a coarse threshold, since at $p = c/n$ the probability tends to the smoothly varying limit $1 - e^{-c^3/6}$.
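The connectivity threshold is easy to observe empirically. A rough simulation sketch (the choices $n = 300$ and 30 trials per density are arbitrary; it samples $G(n,p)$ at half and at twice $\log(n)/n$ and checks connectivity by depth-first search):

```python
import math
import random
from itertools import combinations

def is_connected(n, p, rng):
    """Sample G(n, p) and test connectivity via DFS from vertex 0."""
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n

rng = random.Random(1)
n = 300
pc = math.log(n) / n                       # connectivity threshold log(n)/n
results = {}
for factor in (0.5, 2.0):
    hits = sum(is_connected(n, factor * pc, rng) for _ in range(30))
    results[factor] = hits / 30
    print(factor, results[factor])
```

Below the threshold essentially no sample is connected (leftover isolated vertices); above it essentially every sample is.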

First and Second Moment Methods

First moment (Markov's inequality): if $X \geq 0$ counts objects, then $\Pr[X \geq 1] \leq \mathbb{E}[X]$. So if $\mathbb{E}[X] \to 0$, whp no object exists — this is how one shows there are no isolated vertices once $p \gg \log n / n$. The converse fails: $\mathbb{E}[X] \to \infty$ does not by itself imply $\Pr[X \geq 1] \to 1$, because $X$ could be large only on a rare event. Showing an object exists whp requires more.

Second moment (Paley–Zygmund): to show $\Pr[X > 0] \to 1$, the Cauchy–Schwarz inequality gives:

$$\Pr[X > 0] \geq \frac{(\mathbb{E}[X])^2}{\mathbb{E}[X^2]}$$

The ratio $\mathbb{E}[X^2]/(\mathbb{E}[X])^2$ is close to 1 when different objects are approximately independent, and then the bound forces $\Pr[X > 0] \to 1$. For isolated vertices at exactly $p = \log n / n$: $\mathbb{E}[X] \to 1$ and $X$ converges to a $\text{Poisson}(1)$ limit, so $\Pr[X > 0] \to 1 - 1/e$ — strictly between 0 and 1, which is exactly what happens inside the critical window of a sharp threshold.
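At exactly $p = \log(n)/n$ the number of isolated vertices has a $\text{Poisson}(1)$ limit, so the probability that at least one exists tends to $1 - 1/e \approx 0.63$. A quick simulation sketch ($n = 300$ and 200 trials are arbitrary choices):

```python
import math
import random

def isolated_count(n, p, rng):
    """Number of isolated (degree-0) vertices in one sample of G(n, p)."""
    deg = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                deg[u] += 1
                deg[v] += 1
    return sum(d == 0 for d in deg)

rng = random.Random(2)
n, trials = 300, 200
p = math.log(n) / n                  # the critical density for connectivity
frac = sum(isolated_count(n, p, rng) > 0 for _ in range(trials)) / trials
print(frac)                          # hovers near 1 - 1/e ~ 0.63
```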

Stochastic Block Model

The stochastic block model (SBM) extends $G(n,p)$ with community structure: $n$ vertices are partitioned into $k$ communities of equal size $n/k$. Edges appear with probability $p_{\text{in}}$ within communities and $p_{\text{out}}$ across communities, with $p_{\text{in}} > p_{\text{out}}$.

The Kesten–Stigum threshold determines when community detection is possible; for 2 communities it coincides with the information-theoretic limit. For 2 equal communities with $p_{\text{in}} = a/n$ and $p_{\text{out}} = b/n$, detection is possible when:

$$(a - b)^2 > 2(a + b)$$

Below this threshold, no algorithm (even computationally unbounded) can recover the two communities better than random guessing. Above it, belief propagation achieves optimal detection. The SBM is the canonical model for evaluating community detection algorithms, including spectral methods — though in this sparse regime it is the spectrum of the non-backtracking matrix, rather than the plain adjacency matrix, that achieves detection down to the Kesten–Stigum threshold.
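A minimal SBM sampler makes the model concrete (a sketch; the parameters and the index-based community assignment are illustrative choices):

```python
import random

def sample_sbm(n, k, p_in, p_out, rng=random):
    """Symmetric SBM: n vertices in k equal communities; an edge appears
    with prob p_in inside a community and p_out across communities."""
    assert n % k == 0
    label = [v * k // n for v in range(n)]     # community of each vertex
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < (p_in if label[u] == label[v] else p_out):
                edges.append((u, v))
    return label, edges

random.seed(3)
label, edges = sample_sbm(200, 2, 0.10, 0.01)
within = sum(label[u] == label[v] for u, v in edges)
print("within:", within, "across:", len(edges) - within)
```

With $p_{\text{in}} = 0.10$ and $p_{\text{out}} = 0.01$, within-community edges dominate (roughly $2\binom{100}{2} \cdot 0.10 \approx 990$ within vs. $100^2 \cdot 0.01 = 100$ across in expectation), which is the signal detection algorithms exploit.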

Worked Example

First moment method: no isolated vertices above the log-threshold. Let $X$ = number of isolated vertices in $G(n,p)$ with $p = c\log(n)/n$, $c > 1$. Each vertex $v$ is isolated with probability $(1-p)^{n-1} \approx e^{-p(n-1)} \approx e^{-c\log n} = n^{-c}$. By linearity:

$$\mathbb{E}[X] = n \cdot n^{-c} = n^{1-c} \to 0 \quad (c > 1)$$

By Markov's inequality, $\Pr[X \geq 1] \leq \mathbb{E}[X] \to 0$. So whp there are no isolated vertices, which implies (with more work) full connectivity.

Second moment: triangles in $G(n, 1/2)$. Let $X$ = number of triangles. Each triple $\{i,j,k\}$ forms a triangle with probability $(1/2)^3 = 1/8$, giving:

$$\mathbb{E}[X] = \binom{n}{3} \cdot \frac{1}{8} \approx \frac{n^3}{48}$$

For concentration, compute $\text{Var}(X) = \mathbb{E}[X^2] - (\mathbb{E}[X])^2$. Pairs of triangles sharing no edge contribute $(1/8)^2$ each; pairs sharing one edge involve only 5 distinct edge variables, contributing $(1/8)(1/4) = 1/32$; pairs sharing two edges are impossible for distinct triangles. The dominant covariance term gives $\text{Var}(X) = \Theta(n^4)$ while $(\mathbb{E}[X])^2 = \Theta(n^6)$. By Chebyshev, $\Pr[|X - \mathbb{E}[X]| > t] \leq \text{Var}(X)/t^2$; setting $t = \varepsilon \mathbb{E}[X]$ gives $\Pr[|X - \mathbb{E}[X]| > \varepsilon \mathbb{E}[X]] \leq \Theta(n^4)/(\varepsilon n^3)^2 = \Theta(n^{-2}) \to 0$. So $X$ concentrates around its mean.
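The concentration of the triangle count is visible even at modest sizes. A simulation sketch ($n = 60$ and 10 samples are arbitrary choices; each sample's count should land close to $\binom{n}{3}/8$):

```python
import random
from itertools import combinations

def triangle_count(n, rng):
    """Count triangles in one sample of G(n, 1/2)."""
    adj = [[False] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < 0.5:
            adj[u][v] = adj[v][u] = True
    return sum(adj[i][j] and adj[j][k] and adj[i][k]
               for i, j, k in combinations(range(n), 3))

rng = random.Random(4)
n = 60
expected = n * (n - 1) * (n - 2) / 6 / 8     # C(n,3) / 8 = 4277.5
counts = [triangle_count(n, rng) for _ in range(10)]
print(expected, counts)
```

The relative fluctuations shrink like $\sqrt{\text{Var}(X)}/\mathbb{E}[X] = \Theta(n^{-1})$, matching the Chebyshev bound above.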

Connections

Where Your Intuition Breaks

The sharp threshold at $p = 1/n$ makes giant component emergence look like a sudden event — and in the $n \to \infty$ limit it is. For finite $n$, the transition is smooth; there is no single value of $p$ at which a giant component suddenly appears. The threshold is a statement about asymptotic probability, not a finite-$n$ fact. More importantly, the threshold for a giant component ($p = 1/n$) and the threshold for full connectivity ($p = \log n / n$) differ by a factor of $\log n$. Between these thresholds, the graph has a giant component containing most vertices while also having isolated vertices outside it. "Almost all vertices connected" and "the graph is connected" are different events, and the gap between their thresholds is practically significant for any finite network: you can have a well-connected core and isolated outliers simultaneously.

💡Intuition

The sharp threshold phenomenon is a manifestation of the local-to-global interaction structure of random graphs. When $p$ crosses $1/n$, local tree-like neighborhoods suddenly merge into a giant network-wide component. This same threshold structure appears in statistical physics (Ising model), satisfiability (3-SAT threshold near clause-to-variable ratio 4.27), and error-correcting codes (channel capacity threshold). A unifying explanation comes from the product structure of independent edge probabilities together with concentration inequalities (Talagrand) for Lipschitz functions on product spaces, which force monotone properties to transition over narrow windows.

💡Intuition

The Erdős–Rényi model is the null model for real networks — and real networks consistently violate its predictions in instructive ways. ER graphs have Poisson degree distributions; observed social and biological networks have heavy-tailed (power-law) degree distributions $\Pr[\deg(v) = k] \propto k^{-\gamma}$ with $\gamma \in (2,3)$. The Barabási–Albert preferential attachment model generates power-law degrees by adding new nodes that connect preferentially to high-degree nodes. ER graphs also have low clustering ($\Pr[\text{triangle} \mid \text{edge}] \approx p$); real networks have high clustering. These departures motivate the SBM, configuration model, and geometric random graph models.
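A compact sketch of preferential attachment (assumed setup: start from an $m$-clique, each new node attaches $m$ edges; keeping a list in which each vertex appears once per incident edge makes uniform sampling from it degree-proportional):

```python
import random

def barabasi_albert(n, m, rng=random):
    """Preferential attachment sketch: each new node attaches m edges
    to existing nodes chosen with probability proportional to degree
    (via the repeated-endpoint list trick). Starts from an m-clique."""
    targets = []                      # vertex v appears deg(v) times here
    edges = []
    for u in range(m):                # initial clique on m vertices
        for v in range(u + 1, m):
            edges.append((u, v))
            targets += [u, v]
    for new in range(m, n):
        chosen = set()
        while len(chosen) < m:        # m distinct degree-weighted picks
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((new, t))
            targets += [new, t]
    return edges

random.seed(5)
edges = barabasi_albert(2000, 3)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1
degs = sorted(deg.values())
print("median degree:", degs[len(degs) // 2], "max degree:", degs[-1])
```

Unlike an ER graph with the same edge count, the maximum degree here is far above the median — the hub structure characteristic of heavy-tailed networks.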

⚠️Warning

Real networks violate all three main assumptions of the ER model simultaneously: edges are not independent (triadic closure), degree distributions are not Poisson (scale-free structure), and community memberships are not random (homophily). Applying ER-based null hypothesis tests directly to real network data gives systematically wrong baselines — a network's observed clustering coefficient may be 100x the ER prediction, but 10x the appropriate configuration model prediction that matches the degree sequence. Always match the null model to the structural features that are not under study.
