Random Graphs: Erdős–Rényi, Phase Transitions & Small-World Networks
Random graph models like Erdős–Rényi reveal phase transitions — sharp thresholds where properties like connectivity appear suddenly — and provide baselines for real-world network analysis. The mathematics of these thresholds, through first and second moment methods, is the same toolkit used in statistical physics and probabilistic combinatorics.
Concepts
Erdős–Rényi random graph $G(n,p)$ with $n=20$ nodes. Each edge included independently with probability $p$. Phase transition at $p_c = 1/(n-1) \approx 0.053$.
Below a critical edge density, a random graph consists of small isolated fragments; above it, a single giant component emerges connecting most vertices. This phase transition — appearing suddenly at an exact threshold — is the foundational fact of random graph theory, and the same mathematical structure governs percolation in physics, satisfiability thresholds in logic, and epidemic spread in epidemiology.
The Erdős–Rényi Model
The Erdős–Rényi model $G(n,p)$ places $n$ vertices and includes each of the $\binom{n}{2}$ possible edges independently with probability $p$. The expected number of edges is $\binom{n}{2}p$. The degree of any vertex follows $\text{Binomial}(n-1, p)$; for $p = \lambda/n$, this converges to a $\text{Poisson}(\lambda)$ distribution as $n \to \infty$.
The closely related model $G(n,m)$ fixes exactly $m$ edges chosen uniformly at random. For $m = \binom{n}{2}p$, properties of $G(n,p)$ and $G(n,m)$ are asymptotically equivalent for most purposes, so the two models are often used interchangeably.
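A minimal sketch in pure Python (the function name is illustrative, not from any library) samples $G(n,p)$ and checks the expected edge count $\binom{n}{2}p$ and mean degree $np$ empirically:

```python
import random
from itertools import combinations

def sample_gnp(n, p, rng):
    """Sample an Erdos-Renyi graph G(n,p) as an adjacency-set list."""
    adj = [set() for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:  # each edge included independently
            adj[u].add(v)
            adj[v].add(u)
    return adj

rng = random.Random(0)
n, lam = 2000, 3.0
adj = sample_gnp(n, lam / n, rng)

# Expected number of edges is C(n,2) * p = (n-1) * lam / 2.
m = sum(len(a) for a in adj) // 2
expected_m = (n - 1) * lam / 2

# Mean degree ~ lam, since Binomial(n-1, lam/n) approaches Poisson(lam).
mean_deg = 2 * m / n
print(f"edges: {m} (expected ~{expected_m:.0f}), mean degree: {mean_deg:.2f}")
```

With $\lambda = 3$ the empirical mean degree should sit close to 3, illustrating the Poisson limit of the degree distribution.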
Phase Transitions and the Giant Component
The most striking feature of $G(n,p)$ is a sharp phase transition at $p = 1/n$ (equivalently when the mean degree $c = np$ crosses $1$):
- For $c < 1$: all connected components have size $O(\log n)$ with high probability (whp).
- At $c = 1$: the largest component has size $\Theta(n^{2/3})$.
- For $c > 1$: a unique giant component of size $\Theta(n)$ exists whp, containing a fraction $\beta$ of all vertices, where $\beta$ satisfies $\beta = 1 - e^{-c\beta}$.
This phase transition mirrors the percolation transition in statistical physics. The fraction $\beta$ of vertices in the giant component satisfies the self-consistency equation above: a vertex is in the giant component iff it has at least one neighbor also in the giant component, giving the fixed-point equation $\beta = 1 - e^{-c\beta}$ via branching process analysis.
The self-consistency equation is not just a calculation — it is the survival probability of a Poisson($c$) branching process. Each vertex in the giant component recruits its neighbors, and $\beta$ is the probability this process survives to infinity. Branching processes are the right model because $G(n, c/n)$ looks locally like a tree: in any bounded neighborhood, two edge-disjoint paths connecting the same pair of vertices occur with probability $O(1/n)$, so cycles are negligible at the local level.
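The fixed-point prediction and a simulated giant component can be compared directly. This sketch (pure Python; function names are illustrative) solves $\beta = 1 - e^{-c\beta}$ by iteration and measures the largest component of a sampled $G(n, c/n)$ with BFS:

```python
import math
import random
from collections import deque
from itertools import combinations

def beta_fixed_point(c, iters=200):
    """Solve beta = 1 - exp(-c * beta) by fixed-point iteration (c > 1)."""
    beta = 0.5
    for _ in range(iters):
        beta = 1 - math.exp(-c * beta)
    return beta

def largest_component_fraction(n, p, rng):
    """Sample G(n,p) and return |largest component| / n via BFS."""
    adj = [[] for _ in range(n)]
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)
    seen = [False] * n
    best = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        q, size = deque([s]), 1
        while q:
            u = q.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    size += 1
                    q.append(w)
        best = max(best, size)
    return best / n

c = 2.0
beta = beta_fixed_point(c)  # ~0.797 for c = 2
frac = largest_component_fraction(3000, c / 3000, random.Random(1))
print(f"predicted beta: {beta:.3f}, simulated giant fraction: {frac:.3f}")
```

For $c = 2$ the branching-process prediction $\beta \approx 0.797$ matches the simulated fraction to within sampling noise.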
Threshold Functions and Sharp Thresholds
For a monotone graph property $P$ (one preserved under edge addition), there exists a threshold function $p^*(n)$ such that:

$$\Pr[G(n,p) \text{ has } P] \to \begin{cases} 0 & \text{if } p/p^* \to 0 \\ 1 & \text{if } p/p^* \to \infty \end{cases}$$
Key thresholds:
- Connectivity: $p^* = \ln n/n$. At this threshold, the last isolated vertex joins the graph, causing the jump to full connectivity.
- Triangle-freeness: $p^* = 1/n$. Below this, the expected triangle count $\binom{n}{3}p^3$ is $o(1)$; above it, triangles proliferate.
- Hamiltonicity: $p^* = \ln n/n$, the same order as connectivity (the graph becomes Hamiltonian essentially as soon as its minimum degree reaches $2$).
The Bollobás–Thomason theorem guarantees that every monotone property has a threshold function; for symmetric properties, the Friedgut–Kalai theorem makes the threshold sharp, with a transition window of width $o(p^*)$ — the jump from probability near 0 to near 1 happens over a negligible range of $p$.
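The connectivity threshold is easy to probe empirically. This sketch (illustrative parameters, chosen for speed rather than rigor) tests connectivity with union-find at $p = 0.5\ln n/n$ and $p = 1.5\ln n/n$:

```python
import math
import random
from itertools import combinations

def is_connected(n, p, rng):
    """Sample G(n,p) and test connectivity with union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    comps = n
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
    return comps == 1

n, trials = 500, 20
rng = random.Random(2)
below = sum(is_connected(n, 0.5 * math.log(n) / n, rng) for _ in range(trials)) / trials
above = sum(is_connected(n, 1.5 * math.log(n) / n, rng) for _ in range(trials)) / trials
print(f"P(connected) at 0.5*ln(n)/n: {below:.2f}, at 1.5*ln(n)/n: {above:.2f}")
```

Even at $n = 500$ the two sides of the threshold already behave very differently: below it, leftover isolated vertices keep the graph disconnected almost every trial; above it, connectivity holds in nearly every trial.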
First and Second Moment Methods
First moment (Markov's inequality): If $\mathbb{E}[X] \to 0$ then $\Pr[X \ge 1] \le \mathbb{E}[X] \to 0$ (no isolated vertices whp once $p > \ln n/n$). If $X$ counts objects, then $X = 0$ whp if $\mathbb{E}[X] \to 0$ — but the converse requires more.
Second moment (Paley–Zygmund): To show $X > 0$ whp, the Cauchy–Schwarz inequality gives:

$$\Pr[X > 0] \ge \frac{(\mathbb{E}[X])^2}{\mathbb{E}[X^2]}$$
The ratio $\mathbb{E}[X^2]/(\mathbb{E}[X])^2$ is close to 1 when different objects are approximately independent. For isolated vertices at $p = c\ln n/n$ with $c < 1$: $\mathbb{E}[X] \approx n^{1-c} \to \infty$ and the ratio approaches 1, so $\Pr[X > 0] \to 1$ — revealing the sharp threshold structure.
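For isolated vertices both moments have closed forms: $\mathbb{E}[X] = nq^{n-1}$ and $\mathbb{E}[X^2] = \mathbb{E}[X] + n(n-1)q^{2n-3}$ with $q = 1-p$ (two distinct isolated vertices require $2n-3$ edges to be absent). This sketch (illustrative helper name) evaluates the Paley–Zygmund bound below the connectivity threshold:

```python
import math

def isolated_moments(n, p):
    """First and second moments of X = #isolated vertices in G(n,p):
    E[X] = n q^(n-1),  E[X^2] = E[X] + n(n-1) q^(2n-3), with q = 1-p."""
    q = 1.0 - p
    ex = n * q ** (n - 1)
    ex2 = ex + n * (n - 1) * q ** (2 * n - 3)
    return ex, ex2

n = 10_000
p = 0.5 * math.log(n) / n  # below the connectivity threshold (c = 0.5)
ex, ex2 = isolated_moments(n, p)
lower = ex ** 2 / ex2  # Paley-Zygmund lower bound on P(X > 0)
print(f"E[X] = {ex:.1f}, bound P(X > 0) >= {lower:.4f}")
```

Here $\mathbb{E}[X] \approx n^{0.5} = 100$ and the bound is nearly 1: isolated vertices exist whp, so the graph is disconnected whp below the threshold.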
Stochastic Block Model
The stochastic block model (SBM) extends $G(n,p)$ with community structure: $n$ vertices are partitioned into $k$ communities of equal size $n/k$. Edges appear with probability $p_{\text{in}}$ within communities and $p_{\text{out}}$ across communities, with $p_{\text{in}} > p_{\text{out}}$.
The Kesten–Stigum threshold determines when community recovery is information-theoretically possible. For 2 equal communities with $p_{\text{in}} = a/n$ and $p_{\text{out}} = b/n$:

$$(a - b)^2 > 2(a + b)$$
Below this threshold, no algorithm (even computationally unbounded) can recover communities better than random guessing. Above it, belief propagation achieves optimal recovery. The SBM is the canonical model for evaluating community detection algorithms, including spectral methods (non-backtracking spectral clustering achieves detection down to the Kesten–Stigum threshold).
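A sketch of the two-community SBM and the Kesten–Stigum condition $(a-b)^2 > 2(a+b)$ (helper names are illustrative, not from any library):

```python
import random
from itertools import combinations

def ks_detectable(a, b):
    """Kesten-Stigum condition for 2 equal communities with
    p_in = a/n, p_out = b/n: detection possible iff (a-b)^2 > 2(a+b)."""
    return (a - b) ** 2 > 2 * (a + b)

def sample_sbm(n, a, b, rng):
    """Two equal communities (alternating labels); returns (labels, edges)."""
    labels = [i % 2 for i in range(n)]
    p_in, p_out = a / n, b / n
    edges = []
    for u, v in combinations(range(n), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            edges.append((u, v))
    return labels, edges

labels, edges = sample_sbm(1000, 8.0, 2.0, random.Random(3))
within = sum(labels[u] == labels[v] for u, v in edges)
print(f"edges: {len(edges)}, within-community: {within}")
print("detectable (a=8, b=2):", ks_detectable(8, 2))  # (8-2)^2 = 36 > 20
print("detectable (a=5, b=4):", ks_detectable(5, 4))  # (5-4)^2 = 1 < 18
```

Note that $(a, b) = (5, 4)$ still has more within- than across-community edges on average, yet falls below the detectability threshold: assortativity alone does not imply recoverability.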
Worked Example
First moment method: no isolated vertices above the log-threshold. Let $X$ = number of isolated vertices in $G(n,p)$ with $p = c\ln n/n$, $c > 1$. Each vertex is isolated with probability $(1-p)^{n-1} \approx e^{-np} = n^{-c}$. By linearity:

$$\mathbb{E}[X] = n(1-p)^{n-1} \approx n \cdot n^{-c} = n^{1-c} \to 0$$
By Markov's inequality, $\Pr[X \ge 1] \le \mathbb{E}[X] \to 0$. So whp there are no isolated vertices, which implies (with more work) full connectivity.
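A numeric check of this first moment calculation, in closed form and by simulation (a sketch with illustrative names; $c = 2$ throughout):

```python
import math
import random

def expected_isolated(n, c):
    """E[#isolated vertices] in G(n,p) at p = c*ln(n)/n: n(1-p)^(n-1) ~ n^(1-c)."""
    p = c * math.log(n) / n
    return n * (1.0 - p) ** (n - 1)

# Above the threshold (c = 2), the expectation vanishes as n grows.
vals = [expected_isolated(n, 2.0) for n in (100, 1000, 10_000)]
print([f"{v:.5f}" for v in vals])

# Simulated check at n = 2000, c = 2: count isolated vertices directly.
n, c = 2000, 2.0
p = c * math.log(n) / n
rng = random.Random(4)
deg = [0] * n
for u in range(n):
    for v in range(u + 1, n):
        if rng.random() < p:
            deg[u] += 1
            deg[v] += 1
isolated = sum(d == 0 for d in deg)
print("isolated vertices:", isolated)
```

The expectation decays roughly like $n^{-1}$ here, and the sampled graph has essentially no isolated vertices, consistent with the Markov bound.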
Second moment: triangles in $G(n,p)$. Let $X$ = number of triangles. Each triple of vertices forms a triangle with probability $p^3$, giving:

$$\mathbb{E}[X] = \binom{n}{3}p^3 \approx \frac{n^3p^3}{6}$$
For concentration, compute $\mathbb{E}[X^2] = \sum_{T,T'}\Pr[T \text{ and } T' \text{ both present}]$. Pairs of triangles sharing no edge contribute $p^6$ each; pairs sharing exactly one edge have five distinct edges, contributing $p^5$; pairs sharing two edges are impossible for distinct triangles. The dominant correction gives $\mathrm{Var}(X) = O(n^4p^5 + n^3p^3)$ while $(\mathbb{E}[X])^2 = \Theta(n^6p^6)$. By Chebyshev: $\Pr[|X - \mathbb{E}[X]| \ge \epsilon\mathbb{E}[X]] \le \mathrm{Var}(X)/(\epsilon^2(\mathbb{E}[X])^2) = O(1/(n^2p) + 1/(np)^3)$. Setting $p = \omega(1/n)$ makes this bound vanish. So $X$ concentrates around its mean.
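The concentration claim can be checked by sampling. This sketch counts triangles via common neighbors of each edge and compares against $\binom{n}{3}p^3$ (parameters are illustrative):

```python
import math
import random
from itertools import combinations

def triangle_count(n, p, rng):
    """Sample G(n,p) and count triangles via common neighbors of each edge."""
    adj = [set() for _ in range(n)]
    edges = []
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
            edges.append((u, v))
    # Each triangle is seen once per edge, i.e. three times in total.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3

n, p = 300, 0.1
expected = math.comb(n, 3) * p ** 3  # E[X] = C(n,3) p^3, about 4455 here
counts = [triangle_count(n, p, random.Random(s)) for s in range(5)]
rel_err = max(abs(x - expected) / expected for x in counts)
print(f"expected {expected:.0f}, samples {counts}, max relative error {rel_err:.3f}")
```

All samples land within a few percent of the mean, as the Chebyshev bound predicts once $np$ is large.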
Where Your Intuition Breaks
The sharp threshold at $p = 1/n$ makes giant component emergence look like a sudden event — and in the $n \to \infty$ limit it is. For finite $n$, the transition is smooth; there is no single $p$ at which a giant component suddenly appears. The threshold is a statement about asymptotic probability, not a finite-$n$ fact. More importantly, the threshold for a giant component ($p = 1/n$) and the threshold for full connectivity ($p = \ln n/n$) differ by a factor of $\ln n$. Between these thresholds, the graph has a giant component containing most vertices while also having isolated vertices outside it. "Almost all vertices connected" and "the graph is connected" are different events, and the gap between their thresholds is practically significant for any finite network: you can have a well-connected core and isolated outliers simultaneously.
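The gap between the two thresholds is easy to exhibit: at $p = 3/n$ (well above $1/n$, well below $\ln n/n$), a sampled graph has a giant component alongside many isolated vertices. A sketch with illustrative parameters:

```python
import random
from collections import deque

n, c = 2000, 3.0  # 1/n << p = c/n << ln(n)/n
p = c / n
rng = random.Random(5)
adj = [[] for _ in range(n)]
for u in range(n):
    for v in range(u + 1, n):
        if rng.random() < p:
            adj[u].append(v)
            adj[v].append(u)

# BFS for the largest component; also count isolated vertices.
seen = [False] * n
giant = 0
for s in range(n):
    if seen[s]:
        continue
    seen[s] = True
    q, size = deque([s]), 1
    while q:
        u = q.popleft()
        for w in adj[u]:
            if not seen[w]:
                seen[w] = True
                size += 1
                q.append(w)
    giant = max(giant, size)
isolated = sum(len(a) == 0 for a in adj)
print(f"giant component: {giant}/{n} vertices, isolated vertices: {isolated}")
```

Roughly 94% of vertices sit in one component ($\beta \approx 0.94$ for $c = 3$) while about $ne^{-3} \approx 100$ vertices are isolated — a well-connected core and outliers at the same time.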
The sharp threshold phenomenon is a manifestation of the local-to-global interaction structure of random graphs. When $p$ crosses $1/n$, local tree-like neighborhoods suddenly merge into a giant network-wide component. This same threshold structure appears in statistical physics (Ising model), satisfiability (3-SAT threshold near clause-to-variable ratio 4.27), and error-correcting codes (channel capacity threshold). The Bollobás–Thomason and Friedgut–Kalai theorems give a unifying explanation: monotone properties have (sharp) thresholds because of the product structure of independent edge probabilities and concentration inequalities, such as Talagrand's, for functions on product spaces.
The Erdős–Rényi model is the null model for real networks — and real networks consistently violate its predictions in instructive ways. ER graphs have Poisson degree distributions; observed social and biological networks have heavy-tailed (power-law) degree distributions $P(k) \sim k^{-\gamma}$, typically with $2 < \gamma < 3$. The Barabási–Albert preferential attachment model generates power-law degrees by adding new nodes that connect preferentially to high-degree nodes. ER graphs also have vanishing clustering ($C = p \to 0$ in the sparse regime); real networks have high clustering. These departures motivate the SBM, configuration model, and geometric random graph models.
Real networks violate all three main assumptions of the ER model simultaneously: edges are not independent (triadic closure), degree distributions are not Poisson (scale-free structure), and community memberships are not random (homophily). Applying ER-based null hypothesis tests directly to real network data gives systematically wrong baselines — a network's observed clustering coefficient may be 100x the ER prediction, but 10x the appropriate configuration model prediction that matches the degree sequence. Always match the null model to the structural features that are not under study.
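The clustering baseline is checkable: in an ER graph the global clustering coefficient (transitivity) lands near $p$ itself, far below typical real-network values. A sketch (the helper name is illustrative):

```python
import random
from itertools import combinations

def global_clustering(adj):
    """Transitivity: fraction of connected triples that are closed."""
    closed = triples = 0
    for v in range(len(adj)):
        nbrs = list(adj[v])
        triples += len(nbrs) * (len(nbrs) - 1) // 2
        # A triple centered at v is closed when its two endpoints are adjacent.
        closed += sum(w in adj[u] for u, w in combinations(nbrs, 2))
    return closed / triples if triples else 0.0

n, p = 500, 0.02
rng = random.Random(6)
adj = [set() for _ in range(n)]
for u, v in combinations(range(n), 2):
    if rng.random() < p:
        adj[u].add(v)
        adj[v].add(u)

C = global_clustering(adj)
print(f"ER clustering: {C:.4f} vs p = {p}")  # lands near p
```

A social network with the same mean degree routinely shows clustering of 0.1–0.5, an order of magnitude or more above this ER baseline — which is why the null model must be matched to the degree sequence before clustering is interpreted.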