Neural-Path/Notes
45 min

Generating Functions, Combinatorial Enumeration & the Probabilistic Method

Generating functions encode combinatorial sequences as power series, making enumeration problems algebraic; the probabilistic method proves existence of combinatorial objects by showing a random construction succeeds with positive probability. Together they form two of the most powerful nonconstructive tools in discrete mathematics.

Concepts

Erdős–Rényi random graph $G(n,p)$ with $n=20$ nodes. Each edge included independently with probability $p$. Phase transition at $p_c = 1/(n-1) \approx 0.053$.

p = 0.08↑ above p_c
p_c = 0.053 (giant component threshold)
|E| = 15 / 190 max
components = 8
largest = 8 nodes (40%)
Near p_c: the phase transition region. A dominant component is forming but may not yet span Θ(n) vertices.

Combinatorics asks two questions: "how many?" and "does one exist?" The answers use completely different techniques. Generating functions convert counting sequences into algebraic equations that can be solved symbolically; the probabilistic method bypasses counting entirely by showing a random construction succeeds with positive probability, guaranteeing existence without constructing the object.

Ordinary and Exponential Generating Functions

The ordinary generating function (OGF) of a sequence (an)(a_n) is the formal power series:

A(x)=n=0anxnA(x) = \sum_{n=0}^{\infty} a_n x^n

Addition of sequences corresponds to addition of OGFs; convolution of sequences (cn=k=0nakbnk)(c_n = \sum_{k=0}^n a_k b_{n-k}) corresponds to multiplication: C(x)=A(x)B(x)C(x) = A(x) \cdot B(x). This algebraic structure makes GFs ideal for counting objects built from independent parts.

The convolution rule C(x)=A(x)B(x)C(x) = A(x) \cdot B(x) is why power series are the natural representation for counting: most combinatorial structures are built by splitting into a left component of size kk and a right component of size nkn - k. Multiplying OGFs algebraically encodes this splitting — the coefficient of xnx^n in the product is the sum over all splits, which is exactly the count by inclusion over partition points. The EGF uses xn/n!x^n/n! to handle labeling: labeled structures of size nn split in (nk)\binom{n}{k} ways (choosing which labels go left), and the EGF product absorbs this binomial coefficient automatically via the product rule for EGFs.

For labeled structures where order matters, the exponential generating function (EGF) is:

A(x)=n=0anxnn!A(x) = \sum_{n=0}^{\infty} a_n \frac{x^n}{n!}

The EGF of permutations is n0n!xn/n!=1/(1x)\sum_{n\geq 0} n! \cdot x^n/n! = 1/(1-x). The EGF of derangements (permutations with no fixed points) is ex/(1x)e^{-x}/(1-x), since the derangement count is Dn=n!k=0n(1)k/k!D_n = n!\sum_{k=0}^n (-1)^k/k!. The EGF of labeled rooted trees is the solution to T(x)=xeT(x)T(x) = x e^{T(x)} (by Lagrange inversion, T(x)=n1nn1xn/n!T(x) = \sum_{n \geq 1} n^{n-1} x^n/n!, recovering Cayley's formula).

Fibonacci and Catalan Numbers via Generating Functions

Fibonacci: The recurrence Fn=Fn1+Fn2F_n = F_{n-1} + F_{n-2} with F0=0F_0 = 0, F1=1F_1 = 1 gives F(x)=x/(1xx2)F(x) = x/(1-x-x^2). Partial fractions with roots ϕ=(1+5)/2\phi = (1+\sqrt{5})/2, ψ=(15)/2\psi = (1-\sqrt{5})/2 yield Binet's formula Fn=(ϕnψn)/5F_n = (\phi^n - \psi^n)/\sqrt{5}.

Catalan numbers: CnC_n counts binary trees with nn internal nodes, triangulations of an (n+2)(n+2)-gon, and Dyck paths of length 2n2n. The recurrence Cn=k=0n1CkCn1kC_n = \sum_{k=0}^{n-1} C_k C_{n-1-k} says "split at the root." Taking OGF:

C(x)=1+xC(x)2C(x) = 1 + x \cdot C(x)^2

Solving: C(x)=(114x)/(2x)C(x) = (1 - \sqrt{1-4x})/(2x). Expanding via the generalized binomial theorem and using the saddle point method gives Cn4n/(n3/2π)C_n \sim 4^n / (n^{3/2}\sqrt{\pi}).

The Probabilistic Method

The probabilistic method (Erdős, 1947) shows existence of a combinatorial object by constructing a probability space over all candidate objects and proving Pr[object has desired property]>0\Pr[\text{object has desired property}] > 0. Since the probability is positive, at least one such object exists — without constructing it.

Basic form: To show a graph with nn vertices and mm edges has chromatic number χk\chi \geq k, exhibit a probability distribution over colorings and show all have a monochromatic edge with positive probability.

Alteration method: First apply a random construction, then fix the small number of violations. Example: take a random subset of vertices by including each independently with probability pp; vertices violating independence are removed. Optimizing pp gives the independence number lower bound α(G)n/(2d)\alpha(G) \geq n/(2d) for dd-regular graphs.

Lovász Local Lemma (LLL): Let A1,,AmA_1, \ldots, A_m be "bad" events, each with probability at most pp, and each depending on (sharing probability space structure with) at most dd other events. If:

ep(d+1)1ep(d+1) \leq 1

then Pr[iAi]>0\Pr[\bigcap_i \overline{A_i}] > 0 — all bad events can be avoided simultaneously. Here ee is Euler's number.

Ramsey Numbers and Graph Coloring Bounds

The Ramsey number R(k,k)R(k,k) is the minimum nn such that every 2-coloring of the edges of KnK_n contains a monochromatic kk-clique. Ramsey (1930) showed R(k,k)R(k,k) is finite; the probabilistic method gives the lower bound:

R(k,k)>2k/2R(k,k) > 2^{k/2}

Proof: Color each edge of KnK_n red or blue independently with probability 1/21/2. For a fixed set of kk vertices, the probability all (k2)\binom{k}{2} edges are the same color is 2(1/2)(k2)=21(k2)2 \cdot (1/2)^{\binom{k}{2}} = 2^{1-\binom{k}{2}}. By union bound over (nk)\binom{n}{k} sets:

Pr[monochromatic k-clique exists](nk)21(k2)<1\Pr[\text{monochromatic } k\text{-clique exists}] \leq \binom{n}{k} \cdot 2^{1-\binom{k}{2}} < 1

when n<2k/2n < 2^{k/2}. For such nn, there exists a 2-coloring with no monochromatic kk-clique, so R(k,k)>nR(k,k) > n.

For kk-SAT: a CNF formula where each clause has exactly kk literals and each variable appears in at most 2k22^{k-2} clauses is always satisfiable. By LLL with p=(1/2)kp = (1/2)^k (probability a random assignment falsifies one clause) and dk2k2d \leq k \cdot 2^{k-2} dependencies: ep(d+1)e2k(k2k2+1)ek/4<1ep(d+1) \leq e \cdot 2^{-k} \cdot (k \cdot 2^{k-2} + 1) \leq ek/4 < 1 for k4k \geq 4.

Worked Example

Catalan numbers via OGF. Start from C(x)=1+xC(x)2C(x) = 1 + x C(x)^2. Rearranging: xC2C+1=0xC^2 - C + 1 = 0. Quadratic formula gives C(x)=(1±14x)/(2x)C(x) = (1 \pm \sqrt{1-4x})/(2x). The minus sign is chosen so C(0)=1C(0) = 1 (use L'Hôpital: as x0x \to 0, (114x)/(2x)1(1-\sqrt{1-4x})/(2x) \to 1). Expanding 14x\sqrt{1-4x} via the generalized binomial theorem:

14x=n=0(1/2n)(4x)n\sqrt{1-4x} = \sum_{n=0}^{\infty} \binom{1/2}{n}(-4x)^n

Extracting the coefficient of xnx^n in C(x)C(x) yields Cn=1n+1(2nn)C_n = \frac{1}{n+1}\binom{2n}{n}. For n=4n = 4: C4=15(84)=70/5=14C_4 = \frac{1}{5}\binom{8}{4} = 70/5 = 14. The saddle-point asymptotics follow from the dominant singularity at x=1/4x = 1/4: Cn4n/(n3/2π)C_n \sim 4^n/(n^{3/2}\sqrt{\pi}).

Ramsey lower bound. For k=4k = 4: the bound gives R(4,4)>22=4R(4,4) > 2^2 = 4. In fact R(4,4)=18R(4,4) = 18; the first moment calculation only gives n<4n < 4, which is weak — the probabilistic bound is loose for small kk but becomes asymptotically tight in the logarithmic sense. For k=10k = 10: R(10,10)>25=32R(10,10) > 2^5 = 32, while the true value is unknown (between 798 and 23556). The lower bound is exponential in k/2k/2; the upper bound from the Erdős-Szekeres theorem is (2k2k1)4k/k\binom{2k-2}{k-1} \sim 4^k/\sqrt{k} — the gap between 2k/22^{k/2} and 4k4^k remains a major open problem.

Applying LLL to 3-SAT. A 3-CNF formula has each clause with k=3k=3 literals. Each clause is falsified by exactly one of 23=82^3 = 8 assignments; a random assignment falsifies it with probability p=1/8p = 1/8. Each clause shares variables with at most 3(l1)3 \cdot (l-1) other clauses where l2k2=2l \leq 2^{k-2} = 2 clauses per variable: d31=3d \leq 3 \cdot 1 = 3. Check: ep(d+1)=e(1/8)4=e/21.36>1ep(d+1) = e \cdot (1/8) \cdot 4 = e/2 \approx 1.36 > 1. The symmetric LLL fails here; the asymmetric LLL with variable-specific bounds is needed, confirming that 3-SAT satisfiability is more delicate than 4-SAT.

Connections

Where Your Intuition Breaks

The probabilistic lower bound gives R(k,k)>2k/2R(k,k) > 2^{k/2}, and the recursion upper bound gives R(k,k)4kR(k,k) \leq 4^k. These bounds are separated by an exponential, and for k5k \geq 5 the true value of R(k,k)R(k,k) is unknown. The gap has resisted decades of effort. The key misconception is that the probabilistic argument can be "made constructive" by analyzing its proof — it cannot. The proof shows a random coloring avoids a monochromatic clique with positive probability, but provides no way to identify which coloring is good. Derandomizing this argument (finding an explicit good coloring) is an open problem that would require fundamentally new combinatorial insights, not just tighter analysis of the probabilistic proof. The probabilistic method proves good objects are common without letting you find one.

💡Intuition

The probabilistic method is one of the most conceptually striking proof techniques in mathematics: you prove an object exists without finding it. Erdős used it to show Ramsey numbers grow at least exponentially, a bound that had no constructive proof for decades and still has no explicit construction matching it. The power comes from the union bound and linearity of expectation — tools so elementary that the conclusions seem disproportionately strong. The method works because most random objects have the desired property; the "bad" cases are rare and avoidable.

💡Intuition

Generating functions are the bridge between discrete combinatorics and complex analysis. The EGF of a combinatorial class determines its analytic properties: singularities control asymptotics (the radius of convergence of the EGF determines exponential growth rates), and the saddle-point method extracts precise asymptotic coefficients from the behavior near the dominant singularity. The EGF of permutations is 1/(1x)1/(1-x) (pole at 1, giving n!n! growth); the EGF of set partitions (Bell numbers) is eex1e^{e^x - 1} (essential singularity, giving super-exponential growth). This analytic structure, studied in analytic combinatorics, predicts asymptotic behavior without computing terms directly.

⚠️Warning

The probabilistic method proves existence but gives no algorithm. This gap between existence and constructibility is significant: for Ramsey colorings, we know good colorings exist for n<2k/2n < 2^{k/2} but cannot find them efficiently. The Moser–Tardos algorithmic LLL closes this gap: when the LLL condition holds, a simple randomized algorithm (resample any violated bad event) terminates in polynomial expected time and produces an outcome with no bad events. This constructive version has applications in combinatorial optimization, coding theory, and constraint satisfaction that the existential version alone cannot provide.

Enjoying these notes?

Get new lessons delivered to your inbox. No spam.