Neural-Path/Notes
45 min

Combinatorial Optimization: Matching, Approximation Algorithms & Complexity

Combinatorial optimization asks for the best discrete structure — a tour, a subset, a schedule — from an exponentially large but finite set. Most natural problems are NP-hard, making exact algorithms impractical at scale. Approximation algorithms offer a compelling alternative: polynomial-time procedures with provable worst-case guarantees on solution quality.

Concepts

10/1510/1010/120/50/410/1010/1010/14sABCDt

Green edges carry flow; red edges are saturated (form the min-cut). Max-flow = min-cut = 20.

The question "find the shortest tour through all cities" sounds geometric, but its difficulty is combinatorial: there are (n1)!/2(n-1)!/2 possible tours, and no structure allows us to discard most of them without examination. Combinatorial optimization formalizes this: a finite but exponentially large search space, an objective to minimize, and the question of whether we can do better than exhaustive search. For most natural problems, we cannot — but approximation algorithms offer polynomial-time procedures with provable guarantees on how far the solution can be from optimal.

NP-Hardness and Complexity

A problem is NP-hard if every problem in NP reduces to it in polynomial time. An optimization problem is NP-hard if its decision version is NP-complete. NP-hardness rules out polynomial-time exact algorithms unless P = NP.

Key reductions chain as follows: 3-SATpIndependent SetpVertex CoverpSet CoverpTSP (decision)\text{3-SAT} \leq_p \text{Independent Set} \leq_p \text{Vertex Cover} \leq_p \text{Set Cover} \leq_p \text{TSP (decision)}. Each reduction takes a "yes" instance to a "yes" instance in polynomial time, so a polynomial solver for the target would solve the source.

An approximation algorithm with ratio ρ\rho produces a solution of cost at most ρOPT\rho \cdot \text{OPT} (for minimization) in polynomial time. The ratio ρ1\rho \geq 1 measures the worst-case multiplicative deviation from optimum.

The reduction chain is not just a theoretical classification — it is a constructive argument. Each reduction is an explicit polynomial-time function that maps "yes" instances to "yes" instances, so a polynomial algorithm for the target would immediately give a polynomial algorithm for every problem in the chain. This is why NP-hardness proofs close the door on exact polynomial algorithms: they show the problem encodes something as hard as Boolean satisfiability itself.

Traveling Salesman Problem

Given a complete graph KnK_n with edge weights d:ER0d : E \to \mathbb{R}_{\geq 0}, find the minimum-weight Hamiltonian cycle.

ILP formulation (Dantzig-Fulkerson-Johnson):

min  eEdexes.t.eδ(v)xe=2  v,  eδ(S)xe2  SV,  xe{0,1}\min \; \sum_{e \in E} d_e x_e \quad \text{s.t.} \quad \sum_{e \in \delta(v)} x_e = 2 \; \forall v, \; \sum_{e \in \delta(S)} x_e \geq 2 \; \forall \emptyset \subsetneq S \subsetneq V, \; x_e \in \{0,1\}

where δ(v)\delta(v) is the set of edges incident to vv and δ(S)\delta(S) is the cut set for SS. The subtour elimination constraints eδ(S)xe2\sum_{e \in \delta(S)} x_e \geq 2 prevent disconnected cycles — there are exponentially many of them, handled by cutting-plane separation.

Christofides' algorithm (for metric TSP, where triangle inequality holds):

  1. Compute a minimum spanning tree TT.
  2. Let OO be the set of odd-degree vertices in TT. Find a minimum-weight perfect matching MM on OO.
  3. Combine TMT \cup M to get an Eulerian multigraph. Find an Euler tour, then shortcut repeated vertices.

Approximation ratio: TOPT|T| \leq \text{OPT} (removing any edge from OPT gives a spanning tree). M12OPT|M| \leq \frac{1}{2}\text{OPT} (the optimal TSP tour restricted to OO gives a perfect matching on OO of cost OPT\leq \text{OPT}; since O|O| is even, take every other edge). Shortcuts only decrease cost by the triangle inequality. Total: 32OPT\frac{3}{2}\text{OPT}.

0/1 Knapsack and FPTAS

The 0/1 knapsack problem:

max  i=1nvixis.t.i=1nwixiW,  xi{0,1}\max \; \sum_{i=1}^n v_i x_i \quad \text{s.t.} \quad \sum_{i=1}^n w_i x_i \leq W, \; x_i \in \{0,1\}

Dynamic programming in O(nW)O(nW) time. Define dp[i][w]dp[i][w] = maximum value using items 1,,i1,\ldots,i with capacity exactly ww:

dp[i][w]=max(dp[i1][w],  dp[i1][wwi]+vi)dp[i][w] = \max(dp[i-1][w], \; dp[i-1][w - w_i] + v_i)

with dp[0][w]=0dp[0][w] = 0 for all ww. The answer is maxwdp[n][w]\max_w dp[n][w]. This is pseudo-polynomial: polynomial in nn and WW, but WW can be exponential in the input bit-length.

FPTAS (fully polynomial-time approximation scheme). Scale and round the values: v~i=vi/K\tilde{v}_i = \lfloor v_i / K \rfloor where K=εVmax/nK = \varepsilon V_{\max} / n. Solve the DP on scaled values in O(n2/ε)O(n^2 / \varepsilon) time. The optimal value of the rounded instance differs by at most εOPT\varepsilon \text{OPT}, giving a (1ε)(1-\varepsilon)-approximation for any ε>0\varepsilon > 0 in polynomial time.

Scheduling: Makespan Minimization

Given nn jobs with processing times p1,,pnp_1, \ldots, p_n and mm machines, assign jobs to minimize makespan Cmax=maxkjSkpjC_{\max} = \max_k \sum_{j \in S_k} p_j.

List scheduling (Graham, 1966): Assign each job greedily to the currently least-loaded machine.

Approximation ratio 2: When the last job jj^* finishes on machine kk at time CmaxC_{\max}, machine kk was the least loaded when jj^* was assigned, so:

Cmaxpj1mjpjOPT,pjOPTC_{\max} - p_{j^*} \leq \frac{1}{m}\sum_j p_j \leq \text{OPT}, \quad p_{j^*} \leq \text{OPT}

Thus Cmax2OPTC_{\max} \leq 2 \cdot \text{OPT}. Sorting jobs in decreasing order before assignment (LPT rule) improves this to 43OPT\frac{4}{3}\text{OPT}, and a PTAS achieves (1+ε)OPT(1+\varepsilon)\text{OPT} for any fixed ε>0\varepsilon > 0.

LP Relaxations and Integrality Gaps

The LP relaxation lower-bounds OPT and its integrality gap OPT/zLP\text{OPT}/z_{LP} (or zLP/OPTz_{LP}/\text{OPT} for maximization) limits how good an LP-based approximation can be.

For metric TSP, the Held-Karp LP relaxation (the natural LP without integrality) is conjectured to have integrality gap exactly 4/34/3 — matching Christofides' ratio. This is the content of the 4/34/3 conjecture, one of the most famous open problems in combinatorial optimization.

For vertex cover, the LP relaxation (set xv[0,1]x_v \in [0,1], minimize xv\sum x_v s.t. xu+xv1x_u + x_v \geq 1 for edges) has integrality gap 2\leq 2, leading to the standard 2-approximation by rounding all xv1/2x_v \geq 1/2 to 1.

Worked Example

0/1 Knapsack DP

Four items: (vi,wi){(3,2),(4,3),(5,4),(6,5)}(v_i, w_i) \in \{(3,2), (4,3), (5,4), (6,5)\}, capacity W=8W = 8.

DP table dp[i][w]dp[i][w] (rows = items added, cols = weight 0,,80,\ldots,8):

ii \ ww012345678
0000000000
1 (v=3,w=2)003333333
2 (v=4,w=3)003447777
3 (v=5,w=4)003457899
4 (v=6,w=5)00345791010

Optimal value = 10 at dp[4][7]dp[4][7] and dp[4][8]dp[4][8]. Backtrack: item 4 included (w=5w=5), remaining capacity 3; item 2 included (w=3w=3), value 4+6=104+6=10. Solution: items 2 and 4.

Christofides 3/2 Ratio Argument

For 4-city metric TSP with symmetric distances, suppose OPT=10\text{OPT} = 10.

  • MST cost T10|T| \leq 10 (delete any OPT edge to get a spanning tree).
  • Odd-degree vertices in TT: say 2 vertices. Minimum matching M|M| on these 2 vertices uses one OPT edge, so M5|M| \leq 5.
  • Euler tour of TMT \cup M: total weight 15\leq 15. Shortcut (triangle inequality): still 15\leq 15.
  • Ratio: 15/10=3/215/10 = 3/2. \checkmark

The matching bound is tight because the tour on the even-indexed OPT edges and the tour on odd-indexed OPT edges are both perfect matchings of OO, together covering OPT. The cheaper one costs OPT/2\leq \text{OPT}/2.

Connections

Where Your Intuition Breaks

A 3/23/2-approximation for TSP sounds like it leaves a 50% gap from optimal — but on real-world instances, heuristics like LKH (Lin-Kernighan-Helsgott) routinely find tours within 1–2% of OPT. The gap between the worst-case approximation ratio and typical performance is enormous. The reason: approximation ratios bound the adversarial case, not the typical case. On random Euclidean instances or structured graphs, the LP relaxation is nearly tight and the matching step adds little. The ratio 3/23/2 is tight on carefully constructed adversarial examples, not on the delivery routes, circuit board problems, or genome assembly instances that motivate TSP in practice. Always interpret approximation ratios as theoretical certificates, not predictions of average performance.

💡Intuition

The LP relaxation quality and approximation ratio are intimately linked. For vertex cover: the LP has integrality gap 2\leq 2 and the LP-rounding gives a 2-approximation. For TSP: Christofides' ratio 3/23/2 matches the conjectured LP gap. The Held-Karp LP gives a lower bound on OPT; if you can round its fractional solution to an integer tour losing a factor of at most ρ\rho, you immediately have a ρ\rho-approximation. Designing good LP relaxations is as important as designing the rounding scheme.

ProblemBest known approx. ratioLP gapNP-hard?
Metric TSP3/23/2 (Christofides)4/34/3 (conjectured)Yes
Vertex Cover22Yes
Set Coverlnn+1\ln n + 1lnn\ln nYes (inapprox.)
KnapsackFPTAS (1ε)(1-\varepsilon)1Yes
Makespan (PCmaxP \| C_{\max})PTASStrongly NP-hard
⚠️Warning

Not every NP-hard problem admits a good approximation. The PCP theorem (Arora et al., 1998) establishes that for problems like MAX-3-SAT, approximating within certain constants is NP-hard itself. For Set Cover, achieving ratio better than (1ε)lnn(1-\varepsilon)\ln n for any ε>0\varepsilon > 0 is NP-hard (Feige, 1998). Inapproximability results are as important as upper bounds: they tell you when to stop looking for better algorithms and start asking whether your problem has special structure.

Practical Takeaways

When facing a combinatorial optimization problem: (1) identify if it reduces to a polynomially-solvable special case (network flow, matching, shortest path); (2) if NP-hard, determine the best LP relaxation and its gap; (3) choose between exact ILP (branch-and-cut), a certified approximation, or problem-specific heuristics based on required optimality guarantees and instance size.

Enjoying these notes?

Get new lessons delivered to your inbox. No spam.