Combinatorial Optimization: Matching, Approximation Algorithms & Complexity

Combinatorial optimization asks for the best discrete structure — a tour, a subset, a schedule — from an exponentially large but finite set. Most natural problems are NP-hard, making exact algorithms impractical at scale. Approximation algorithms offer a compelling alternative: polynomial-time procedures with provable worst-case guarantees on solution quality.

Concepts

[Figure: flow network. Green edges carry flow; red edges are saturated and form the min-cut. Max-flow = min-cut = 20.]

The question "find the shortest tour through all cities" sounds geometric, but its difficulty is combinatorial: there are $(n-1)!/2$ distinct tours on $n$ cities, and no known method is guaranteed to rule out all but polynomially many of them. Combinatorial optimization formalizes this: a finite but exponentially large search space, an objective to minimize, and the question of whether we can do better than exhaustive search. For most natural problems no polynomial-time exact algorithm is known, and none exists unless P = NP. Approximation algorithms instead offer polynomial-time procedures with provable guarantees on how far the solution can be from optimal.
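To make the $(n-1)!/2$ count concrete, here is a minimal sketch of exhaustive search. The function names `distinct_tours` and `brute_force_tsp` are illustrative, not from any library; the sketch assumes a symmetric distance matrix and $n \geq 3$ .

```python
from itertools import permutations

def distinct_tours(n):
    """Enumerate undirected Hamiltonian cycles on cities 0..n-1, each exactly once."""
    tours = []
    for perm in permutations(range(1, n)):   # fix city 0 as the starting point
        if perm[0] < perm[-1]:               # keep only one of the two directions
            tours.append((0,) + perm)
    return tours

def brute_force_tsp(dist):
    """Return (cost, tour) of a cheapest tour by trying every distinct tour."""
    n = len(dist)
    best = None
    for tour in distinct_tours(n):
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best is None or cost < best[0]:
            best = (cost, tour)
    return best

print(len(distinct_tours(6)))  # 60, which is 5!/2
```

Already at $n = 15$ the loop would visit more than $4 \times 10^{10}$ tours, which is why the rest of these notes look for alternatives.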

NP-Hardness and Complexity

A problem is NP-hard if every problem in NP reduces to it in polynomial time. An optimization problem is NP-hard if its decision version is NP-complete. NP-hardness rules out polynomial-time exact algorithms unless P = NP.

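Key reductions chain as follows: $\text{3-SAT} \leq_p \text{Independent Set} \leq_p \text{Vertex Cover} \leq_p \text{Set Cover} \leq_p \text{TSP (decision)}$ . (The last link holds because both problems are NP-complete; textbooks usually reach TSP through Hamiltonian Cycle instead.) Each reduction maps "yes" instances to "yes" instances and "no" instances to "no" instances in polynomial time, so a polynomial-time solver for the target would solve the source.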

An approximation algorithm with ratio $\rho$ produces a solution of cost at most $\rho \cdot \text{OPT}$ (for minimization) in polynomial time. The ratio $\rho \geq 1$ measures the worst-case multiplicative deviation from optimum.

The reduction chain is not just a theoretical classification; it is a constructive argument. Each reduction is an explicit polynomial-time function that maps "yes" instances to "yes" instances and "no" instances to "no" instances, so a polynomial-time algorithm for the target would immediately give one for every problem earlier in the chain. This is why NP-hardness proofs close the door on exact polynomial-time algorithms unless P = NP: they show the problem encodes something as hard as Boolean satisfiability itself.
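One link of the chain is simple enough to write out. The sketch below illustrates $\text{Independent Set} \leq_p \text{Vertex Cover}$ using the standard complement argument: a set $S$ is independent exactly when $V \setminus S$ covers every edge, so $(G, k)$ maps to $(G, n-k)$ . The helper names are hypothetical, and the brute-force deciders are there only to check the mapping on a tiny graph.

```python
from itertools import combinations

def is_independent_set(edges, subset):
    s = set(subset)
    return all(not (u in s and v in s) for u, v in edges)

def is_vertex_cover(edges, subset):
    s = set(subset)
    return all(u in s or v in s for u, v in edges)

def reduce_is_to_vc(n, edges, k):
    """Map the Independent Set instance (G, k) to the Vertex Cover instance (G, n - k)."""
    return n, edges, n - k

def has_independent_set(n, edges, k):
    # An independent set of size >= k exists iff one of size exactly k exists.
    return any(is_independent_set(edges, c) for c in combinations(range(n), k))

def has_vertex_cover(n, edges, k):
    # A vertex cover of size <= k exists iff one of size exactly k exists (k <= n).
    return any(is_vertex_cover(edges, c) for c in combinations(range(n), k))

# On a 5-cycle the answers agree for every k, "yes" and "no" alike.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
for k in range(6):
    assert has_independent_set(5, cycle, k) == has_vertex_cover(*reduce_is_to_vc(5, cycle, k))
```

The reduction itself (`reduce_is_to_vc`) runs in constant time; only the checking code is exponential.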

Traveling Salesman Problem

Given a complete graph $K_n$ with edge weights $d : E \to \mathbb{R}_{\geq 0}$ , find the minimum-weight Hamiltonian cycle.

ILP formulation (Dantzig-Fulkerson-Johnson):

$$\min \; \sum_{e \in E} d_e x_e \quad \text{s.t.} \quad \sum_{e \in \delta(v)} x_e = 2 \; \forall v, \; \sum_{e \in \delta(S)} x_e \geq 2 \; \forall \emptyset \subsetneq S \subsetneq V, \; x_e \in \{0,1\}$$

where $\delta(v)$ is the set of edges incident to $v$ and $\delta(S)$ is the set of edges with exactly one endpoint in $S$ . The subtour elimination constraints $\sum_{e \in \delta(S)} x_e \geq 2$ rule out solutions made of several disjoint cycles. There are exponentially many of these constraints, so in practice they are added on demand by cutting-plane separation.
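For an integer solution, finding a violated subtour constraint reduces to a connectivity check. The sketch below assumes the chosen edges already satisfy the degree constraints; `find_violated_subtour` is a hypothetical helper name. Separating fractional LP solutions needs a minimum-cut computation instead, which is not shown here.

```python
def find_violated_subtour(n, chosen_edges):
    """Given 0/1 edge choices in which every vertex has degree 2, return a vertex
    set S with no chosen edge leaving it (so the constraint for S is violated),
    or None if the chosen edges form a single Hamiltonian cycle."""
    adj = {v: [] for v in range(n)}
    for u, v in chosen_edges:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first search from vertex 0 collects its connected component.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return None if len(seen) == n else seen

two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
print(find_violated_subtour(6, two_triangles))  # {0, 1, 2}
```

A branch-and-cut loop would add the constraint for the returned set $S$ and re-solve, repeating until no violated set remains.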

Christofides' algorithm (for metric TSP, where triangle inequality holds):

Compute a minimum spanning tree $T$ .
Let $O$ be the set of odd-degree vertices in $T$ . Find a minimum-weight perfect matching $M$ on $O$ .
Combine $T \cup M$ to get an Eulerian multigraph. Find an Euler tour, then shortcut repeated vertices.

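Approximation ratio: $|T| \leq \text{OPT}$ (removing any edge from the optimal tour gives a spanning tree). $|M| \leq \frac{1}{2}\text{OPT}$ : shortcutting the optimal tour so that it visits only the vertices of $O$ gives a cycle on $O$ of cost $\leq \text{OPT}$ by the triangle inequality; since $|O|$ is even, the alternate edges of this cycle form two perfect matchings on $O$ , and the cheaper one costs at most half the cycle. Shortcutting the Euler tour never increases cost, again by the triangle inequality. Total: at most $\frac{3}{2}\text{OPT}$ .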
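The three steps can be sketched in plain Python for small instances. This is an illustration under stated assumptions, not a production implementation: the input is a symmetric distance matrix obeying the triangle inequality, all function names are made up for this example, and the matching step uses exhaustive search as a stand-in for a polynomial-time matching algorithm such as Edmonds' blossom algorithm.

```python
def mst_edges(dist):
    """Prim's algorithm on a complete graph given as a distance matrix."""
    n = len(dist)
    in_tree = [False] * n
    in_tree[0] = True
    best = [(dist[0][v], 0) for v in range(n)]   # (cheapest link cost, tree endpoint)
    edges = []
    for _ in range(n - 1):
        v = min((u for u in range(n) if not in_tree[u]), key=lambda u: best[u][0])
        edges.append((best[v][1], v))
        in_tree[v] = True
        for u in range(n):
            if not in_tree[u] and dist[v][u] < best[u][0]:
                best[u] = (dist[v][u], v)
    return edges

def min_perfect_matching(dist, nodes):
    """Exhaustive minimum-weight perfect matching; only viable for a few nodes."""
    if not nodes:
        return 0, []
    first, rest = nodes[0], nodes[1:]
    best = None
    for i, partner in enumerate(rest):
        cost, pairs = min_perfect_matching(dist, rest[:i] + rest[i + 1:])
        cost += dist[first][partner]
        if best is None or cost < best[0]:
            best = (cost, [(first, partner)] + pairs)
    return best

def euler_tour(n, edges):
    """Hierholzer's algorithm on a connected multigraph with all degrees even."""
    adj = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    used = [False] * len(edges)
    stack, tour = [0], []
    while stack:
        u = stack[-1]
        while adj[u] and used[adj[u][-1][1]]:    # drop edges already traversed
            adj[u].pop()
        if adj[u]:
            v, idx = adj[u].pop()
            used[idx] = True
            stack.append(v)
        else:
            tour.append(stack.pop())
    return tour

def christofides(dist):
    n = len(dist)
    tree = mst_edges(dist)                               # step 1: MST
    degree = [0] * n
    for u, v in tree:
        degree[u] += 1
        degree[v] += 1
    odd = [v for v in range(n) if degree[v] % 2 == 1]
    _, matching = min_perfect_matching(dist, odd)        # step 2: matching on O
    walk = euler_tour(n, tree + matching)                # step 3: Euler tour
    seen, tour = set(), []
    for v in walk:                                       # shortcut repeated vertices
        if v not in seen:
            seen.add(v)
            tour.append(v)
    cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return cost, tour
```

On metric inputs the returned cost is at most $\frac{3}{2}\text{OPT}$ by the argument above; on non-metric inputs the shortcut step can increase cost and no guarantee holds.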

0/1 Knapsack and FPTAS

The 0/1 knapsack problem:

$$\max \; \sum_{i=1}^n v_i x_i \quad \text{s.t.} \quad \sum_{i=1}^n w_i x_i \leq W, \; x_i \in \{0,1\}$$

Dynamic programming in $O(nW)$ time. Define $dp[i][w]$ = maximum value using items $1,\ldots,i$ with total weight at most $w$ :

$$dp[i][w] = \max(dp[i-1][w], \; dp[i-1][w - w_i] + v_i)$$

with $dp[0][w] = 0$ for all $w$ , where the second term applies only when $w \geq w_i$ . The answer is $dp[n][W]$ . This is pseudo-polynomial: polynomial in $n$ and the numeric value of $W$ , but $W$ can be exponential in the input bit-length.
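A direct transcription of the recurrence, as a minimal sketch. It assumes integer weights and reads $dp[i][w]$ as the best value with total weight at most $w$ , which is what the initialization $dp[0][w] = 0$ gives; the function name `knapsack_table` is illustrative.

```python
def knapsack_table(values, weights, capacity):
    """Fill dp[i][w] = best value using the first i items with total weight <= w."""
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v_i, w_i = values[i - 1], weights[i - 1]
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]                       # skip item i
            if w >= w_i:                                  # take item i if it fits
                dp[i][w] = max(dp[i][w], dp[i - 1][w - w_i] + v_i)
    return dp

dp = knapsack_table([3, 4, 5, 6], [2, 3, 4, 5], 8)
print(dp[4][8])  # 10
```

The table has $(n+1)(W+1)$ entries, each filled in constant time, which is where the $O(nW)$ bound comes from.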

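FPTAS (fully polynomial-time approximation scheme). Scale and round the values: $\tilde{v}_i = \lfloor v_i / K \rfloor$ where $K = \varepsilon V_{\max} / n$ and $V_{\max} = \max_i v_i$ (after discarding items with $w_i > W$ ). Then run a DP indexed by value rather than weight: for each achievable total scaled value, store the minimum weight that attains it. Each scaled value is at most $n/\varepsilon$ , so totals are at most $n^2/\varepsilon$ and the DP takes $O(n^3/\varepsilon)$ time. Rounding loses less than $K$ per item, so at most $nK = \varepsilon V_{\max} \leq \varepsilon \text{OPT}$ in total, giving a $(1-\varepsilon)$ -approximation for any $\varepsilon > 0$ in time polynomial in $n$ and $1/\varepsilon$ .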
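A sketch of the scaling scheme, assuming positive values and positive integer weights. The DP here is indexed by scaled value and stores the least weight reaching each total, which is the form the scaling argument needs. The name `knapsack_fptas` is illustrative; storing an item list per table entry is a shortcut for brevity, and floating-point division can shift a scaled value by one unit in rare cases.

```python
import math

def knapsack_fptas(values, weights, capacity, eps):
    """Return (value, items) with value >= (1 - eps) * OPT, up to float rounding."""
    items = [i for i in range(len(values)) if weights[i] <= capacity]
    if not items:
        return 0, []
    n = len(items)
    K = eps * max(values[i] for i in items) / n            # scaling factor
    scaled = {i: math.floor(values[i] / K) for i in items}
    total = sum(scaled.values())                           # at most about n^2 / eps
    INF = float("inf")
    # min_w[v] = (least weight achieving scaled value exactly v, items used)
    min_w = [(INF, [])] * (total + 1)
    min_w[0] = (0, [])
    for i in items:
        for v in range(total, scaled[i] - 1, -1):          # descending: item used once
            cand = min_w[v - scaled[i]][0] + weights[i]
            if cand < min_w[v][0]:
                min_w[v] = (cand, min_w[v - scaled[i]][1] + [i])
    best_v = max(v for v in range(total + 1) if min_w[v][0] <= capacity)
    chosen = min_w[best_v][1]
    return sum(values[i] for i in chosen), chosen

value, chosen = knapsack_fptas([3, 4, 5, 6], [2, 3, 4, 5], 8, eps=0.1)
```

The returned value is computed from the original values of the chosen items, so it is a true feasible objective value rather than a scaled estimate.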

Scheduling: Makespan Minimization

Given $n$ jobs with processing times $p_1, \ldots, p_n$ and $m$ identical machines, assign jobs to minimize makespan $C_{\max} = \max_k \sum_{j \in S_k} p_j$ , where $S_k$ is the set of jobs assigned to machine $k$ .

List scheduling (Graham, 1966): Assign each job greedily to the currently least-loaded machine.

Approximation ratio 2: Let $j^*$ be the job that finishes last, on machine $k$ at time $C_{\max}$ . Machine $k$ was the least loaded when $j^*$ was assigned, so its load at that moment was at most the average load:

$$C_{\max} - p_{j^*} \leq \frac{1}{m}\sum_j p_j \leq \text{OPT}, \quad p_{j^*} \leq \text{OPT}$$

Thus $C_{\max} \leq 2 \cdot \text{OPT}$ . Sorting jobs in decreasing order before assignment (LPT rule) improves this to $\frac{4}{3}\text{OPT}$ , and a PTAS achieves $(1+\varepsilon)\text{OPT}$ for any fixed $\varepsilon > 0$ .
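Both rules fit in a few lines with a min-heap of machine loads. This is a minimal sketch assuming $m \geq 1$ ; the names `list_schedule` and `lpt_schedule` are illustrative.

```python
import heapq

def list_schedule(times, m):
    """Assign jobs in the given order to the currently least-loaded machine.
    Returns (makespan, assignment), where assignment[k] lists the jobs on machine k."""
    heap = [(0, k) for k in range(m)]        # (current load, machine index)
    assignment = [[] for _ in range(m)]
    for j, p in enumerate(times):
        load, k = heapq.heappop(heap)        # least-loaded machine
        assignment[k].append(j)
        heapq.heappush(heap, (load + p, k))
    return max(load for load, _ in heap), assignment

def lpt_schedule(times, m):
    """Longest Processing Time first: list scheduling on jobs sorted by decreasing time."""
    order = sorted(range(len(times)), key=lambda j: -times[j])
    makespan, assignment = list_schedule([times[j] for j in order], m)
    # Translate positions in the sorted order back to original job indices.
    return makespan, [[order[j] for j in jobs] for jobs in assignment]

print(list_schedule([1, 1, 2], 2)[0])  # 3
print(lpt_schedule([1, 1, 2], 2)[0])   # 2
```

On this instance the optimum is 2 (the long job alone on one machine), so plain list scheduling is off by a factor of $3/2$ when the long job arrives last, while LPT is optimal.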

LP Relaxations and Integrality Gaps

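For a minimization problem the LP relaxation lower-bounds OPT: $z_{LP} \leq \text{OPT}$ . Its integrality gap, the worst case over instances of $\text{OPT}/z_{LP}$ (or $z_{LP}/\text{OPT}$ for maximization), limits how good an approximation guarantee can be proved by comparing against the LP bound.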

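For metric TSP, the integrality gap of the Held-Karp LP relaxation (the subtour-elimination LP without integrality) is known to be at least $4/3$ and at most $3/2$ , and is conjectured to be exactly $4/3$ . The upper bound $3/2$ comes from analyzing Christofides' algorithm against the LP value, so proving the conjecture would require going beyond that analysis. This is the content of the $4/3$ conjecture, one of the most famous open problems in combinatorial optimization.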

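For vertex cover, the LP relaxation (set $x_v \in [0,1]$ , minimize $\sum x_v$ s.t. $x_u + x_v \geq 1$ for every edge $uv$ ) has integrality gap $\leq 2$ . Every edge constraint forces at least one endpoint to have $x_v \geq 1/2$ , so rounding all $x_v \geq 1/2$ to 1 yields a cover of cost at most $2 z_{LP}$ : the standard 2-approximation.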
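The complete graph $K_n$ shows why the factor 2 cannot be improved for this LP. The sketch below does not call an LP solver; it takes the fractional point $x_v = 1/2$ , which is feasible and therefore upper-bounds $z_{LP}$ by $n/2$ , applies threshold rounding, and compares against the true optimum found by exhaustive search. All helper names are illustrative.

```python
from itertools import combinations

def is_feasible_lp(edges, x):
    """Check the edge constraints x_u + x_v >= 1 of the vertex cover LP."""
    return all(x[u] + x[v] >= 1 for u, v in edges)

def round_half(x):
    """Threshold rounding: put v in the cover when x_v >= 1/2."""
    return {v for v, val in x.items() if val >= 0.5}

def min_vertex_cover_size(n, edges):
    """Exhaustive search for the smallest cover; only for tiny graphs."""
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            s = set(subset)
            if all(u in s or v in s for u, v in edges):
                return k

n = 5
edges = list(combinations(range(n), 2))     # complete graph K_5
x = {v: 0.5 for v in range(n)}              # feasible fractional point, value n/2
cover = round_half(x)

print(is_feasible_lp(edges, x))             # True
print(len(cover), sum(x.values()))          # 5 2.5
print(min_vertex_cover_size(n, edges))      # 4
```

Here $\text{OPT} = n - 1 = 4$ while $z_{LP} \leq n/2 = 2.5$ , so the ratio is at least $1.6$ ; for general $n$ it is at least $2 - 2/n$ , which approaches 2.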

Worked Example

0/1 Knapsack DP

Four items: $(v_i, w_i) \in \{(3,2), (4,3), (5,4), (6,5)\}$ , capacity $W = 8$ .

DP table $dp[i][w]$ (rows = items added, cols = weight $2,\ldots,8$ ; the columns $w = 0, 1$ are all zero and omitted):

$i$ \ $w$	2	3	4	5	6	7	8
0	0	0	0	0	0	0	0
1 (v=3,w=2)	3	3	3	3	3	3	3
2 (v=4,w=3)	3	4	4	7	7	7	7
3 (v=5,w=4)	3	4	5	7	8	9	9
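4 (v=6,w=5)	3	4	5	7	8	9	10

Optimal value = 10 at $dp[4][8]$ . Backtrack: $dp[4][8] \neq dp[3][8]$ , so item 4 is included ( $w=5$ ), leaving capacity 3; $dp[3][3] = dp[2][3]$ , so item 3 is skipped; $dp[2][3] \neq dp[1][3]$ , so item 2 is included ( $w=3$ ), leaving capacity 0. Value $4+6=10$ . Solution: items 2 and 4.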
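The backtracking step can be checked mechanically. This sketch rebuilds the table for the four items above and walks it from $dp[n][W]$ back to row 0; `knapsack_items` is a hypothetical name, and item indices are 1-based to match the text.

```python
def knapsack_items(values, weights, capacity):
    """Return (best value, sorted 1-based indices of the chosen items)."""
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]
            if w >= weights[i - 1]:
                dp[i][w] = max(dp[i][w], dp[i - 1][w - weights[i - 1]] + values[i - 1])
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i - 1][w]:      # value changed, so item i was taken
            chosen.append(i)
            w -= weights[i - 1]
    return dp[n][capacity], sorted(chosen)

print(knapsack_items([3, 4, 5, 6], [2, 3, 4, 5], 8))  # (10, [2, 4])
```

When several optimal subsets exist, this walk returns the one that prefers skipping later items.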

Christofides 3/2 Ratio Argument

For 4-city metric TSP with symmetric distances, suppose $\text{OPT} = 10$ .

MST cost $|T| \leq 10$ (delete any OPT edge to get a spanning tree).
Odd-degree vertices in $T$ : say 2 vertices, $u$ and $v$ . The matching $M$ is the single edge $uv$ . The optimal tour splits into two $u$ - $v$ paths, each of length at least $d(u,v)$ by the triangle inequality, so $|M| = d(u,v) \leq 5$ .
Euler tour of $T \cup M$ : total weight $\leq 15$ . Shortcut (triangle inequality): still $\leq 15$ .
Ratio: at most $15/10 = 3/2$ . $\checkmark$

The matching bound holds in general because shortcutting the optimal tour to the vertices of $O$ gives a cycle of cost $\leq \text{OPT}$ , and its even-indexed and odd-indexed edges each form a perfect matching of $O$ , together covering the cycle. The cheaper one costs $\leq \text{OPT}/2$ .

Connections

Where Your Intuition Breaks

A $3/2$ -approximation for TSP sounds like it leaves a 50% gap from optimal, but on real-world instances, heuristics like LKH (Lin-Kernighan-Helsgaun) routinely find tours within 1–2% of OPT. The gap between the worst-case approximation ratio and typical performance is enormous. The reason: approximation ratios bound the adversarial case, not the typical case. On random Euclidean instances or structured graphs, the LP relaxation is nearly tight and the matching step adds little. The ratio $3/2$ is tight on carefully constructed adversarial examples, not on the delivery routes, circuit board problems, or genome assembly instances that motivate TSP in practice. Always interpret approximation ratios as theoretical certificates, not predictions of average performance.

💡Intuition

The LP relaxation quality and approximation ratio are intimately linked. For vertex cover: the LP has integrality gap $\leq 2$ and the LP-rounding gives a 2-approximation. For TSP: Christofides' tour costs at most $3/2$ times the Held-Karp LP value, which bounds the LP gap by $3/2$ , while the conjectured gap is $4/3$ . The Held-Karp LP gives a lower bound on OPT; if you can round its fractional solution to an integer tour losing a factor of at most $\rho$ , you immediately have a $\rho$ -approximation. Designing good LP relaxations is as important as designing the rounding scheme.

Problem	Best known approx. ratio	LP gap	NP-hard?
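Metric TSP	$3/2$ (Christofides); $3/2 - \varepsilon$ for a tiny constant $\varepsilon$ (Karlin, Klein, Oveis Gharan, 2021)	between $4/3$ and $3/2$ ( $4/3$ conjectured)	Yes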
Vertex Cover	2	2	Yes
Set Cover	$\ln n + 1$	$\ln n$	Yes (inapprox.)
Knapsack	FPTAS $(1-\varepsilon)$	2 (natural LP)	Yes
Makespan ( $P \| C_{\max}$ )	PTAS	n/a	Strongly NP-hard

⚠️Warning

Not every NP-hard problem admits a good approximation. The PCP theorem (Arora et al., 1998) implies that for problems like MAX-3-SAT, approximating within certain constants is itself NP-hard. For Set Cover, achieving ratio better than $(1-\varepsilon)\ln n$ for any $\varepsilon > 0$ is NP-hard (shown by Feige, 1998, under a slightly stronger complexity assumption, and by Dinur and Steurer, 2014, assuming only P $\neq$ NP). Inapproximability results are as important as upper bounds: they tell you when to stop looking for better algorithms and start asking whether your problem has special structure.

Practical Takeaways

When facing a combinatorial optimization problem: (1) identify if it reduces to a polynomially-solvable special case (network flow, matching, shortest path); (2) if NP-hard, determine the best LP relaxation and its gap; (3) choose between exact ILP (branch-and-cut), a certified approximation, or problem-specific heuristics based on required optimality guarantees and instance size.
