
Zero-Sum Games & the Minimax Theorem

Zero-sum games capture pure competition: whatever one player gains, the other loses exactly. This structure is restrictive enough to admit a clean fundamental theorem — von Neumann's minimax theorem — yet rich enough to model everything from chess endgames to GAN training. The theorem's proof through linear programming duality makes it one of the most elegant connections between game theory and convex optimization.

Concepts

Figure: minimax tree traversal with MAX nodes (purple; at the root and depth 2) and MIN nodes (amber; at depth 1). MAX nodes maximize, MIN nodes minimize.

In chess, when you choose a move, your opponent will respond with the move that hurts you most. Then you respond with the move that hurts them most given their choice. Zero-sum games formalize this alternating reasoning: your gain is their loss exactly, so there is no room for mutually beneficial deals. Von Neumann's minimax theorem says this adversarial recursion always has a definite answer — a unique value that both players can guarantee simultaneously, no matter how deep the reasoning goes.

Zero-Sum Games and Matrix Representation

A two-player zero-sum game satisfies $u_1(a) + u_2(a) = 0$ for all action profiles. Since payoffs sum to zero, the game is fully described by the row player's payoff matrix $A \in \mathbb{R}^{m \times n}$, where $m = |A_1|$ and $n = |A_2|$. The row player maximizes; the column player minimizes.

Under mixed strategies $x \in \Delta^m$ (row) and $y \in \Delta^n$ (column), the expected payoff to the row player is $x^\top A y$. The game reduces to:

$$\max_{x \in \Delta^m}\ \min_{y \in \Delta^n}\ x^\top A y \quad \text{vs.} \quad \min_{y \in \Delta^n}\ \max_{x \in \Delta^m}\ x^\top A y.$$
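For a fixed mixed strategy $x$, the inner minimization is always attained at a pure column, since $x^\top A y$ is linear in $y$. A quick numeric sanity check (a sketch assuming numpy; the payoff matrix is the 2×3 example solved later in these notes):

```python
import numpy as np

# Illustrative payoff matrix (the 2x3 game solved later in these notes).
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 4.0]])
x = np.array([0.5, 0.5])   # a fixed mixed strategy for the row player

# The column player's best response to x is a pure column:
pure_worst = (x @ A).min()

# No mixed y does better (for the column player) than that pure column.
rng = np.random.default_rng(0)
for _ in range(1000):
    y = rng.dirichlet(np.ones(3))          # random point in the simplex
    assert x @ A @ y >= pure_worst - 1e-12
```

This is why the primal LP in the proof section needs only one constraint per pure column $j$.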

The Minimax Theorem (von Neumann, 1928)

For any finite matrix game $A$:

$$\max_{x \in \Delta^m}\ \min_{y \in \Delta^n}\ x^\top A y = \min_{y \in \Delta^n}\ \max_{x \in \Delta^m}\ x^\top A y = v,$$

where $v$ is called the value of the game. The minimax equals the maximin — the order of optimization does not matter. Both players have optimal mixed strategies that achieve $v$ simultaneously.

The equality of maximin and minimax is not obvious — in general, $\max \min \leq \min \max$ (weak duality), and the gap can be strict. The zero-sum condition is what forces the gap to close: the row player's maximin LP and the column player's minimax LP are LP duals of each other, and LP strong duality says their optima coincide. Without zero-sum, the two players' optimization problems are not dual, and the game value need not exist.

Proof via LP Duality

The row player's optimization can be written as a linear program. Let $v$ be the guaranteed payoff:

Primal (row player):

$$\begin{aligned} \max_{x,v}\quad & v \\ \text{s.t.}\quad & \sum_i x_i A_{ij} \geq v \quad \forall j \\ & \sum_i x_i = 1,\quad x \geq 0 \end{aligned}$$

Dual (column player):

$$\begin{aligned} \min_{y,w}\quad & w \\ \text{s.t.}\quad & \sum_j A_{ij} y_j \leq w \quad \forall i \\ & \sum_j y_j = 1,\quad y \geq 0 \end{aligned}$$

Strong LP duality (which follows from the separating hyperplane theorem for convex sets) gives primal optimum $=$ dual optimum, i.e., maximin $=$ minimax $= v$. The dual variables $(y^*, w^*)$ are exactly the column player's optimal mixed strategy and the game value. LP duality is thus precisely the minimax theorem.
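This duality can be checked numerically. A minimal sketch, assuming scipy is available, on matching pennies (a standard zero-sum game with value $0$; the matrix is my illustrative example, not one from the notes):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # matching pennies (value 0)
m, n = A.shape

# Primal: variables (x_1..x_m, v); maximize v, i.e. minimize -v,
# s.t. v - sum_i x_i A_ij <= 0 for all j, sum_i x_i = 1, x >= 0, v free.
primal = linprog(
    c=np.r_[np.zeros(m), -1.0],
    A_ub=np.c_[-A.T, np.ones(n)],
    b_ub=np.zeros(n),
    A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1),
    b_eq=[1.0],
    bounds=[(0, None)] * m + [(None, None)],
)

# Dual: variables (y_1..y_n, w); minimize w,
# s.t. sum_j A_ij y_j - w <= 0 for all i, sum_j y_j = 1, y >= 0, w free.
dual = linprog(
    c=np.r_[np.zeros(n), 1.0],
    A_ub=np.c_[A, -np.ones(m)],
    b_ub=np.zeros(m),
    A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1),
    b_eq=[1.0],
    bounds=[(0, None)] * n + [(None, None)],
)

maximin, minimax = -primal.fun, dual.fun
assert abs(maximin - minimax) < 1e-8   # strong duality: both equal v = 0
```

The two optima agree, and `primal.x[:m]` recovers the optimal mix $(1/2, 1/2)$.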

Saddle Points

A pair $(x^*, y^*)$ is a saddle point if

$$x^\top A y^* \leq x^{*\top} A y^* \leq x^{*\top} A y \quad \forall x \in \Delta^m,\ y \in \Delta^n.$$

Saddle points in zero-sum games correspond exactly to Nash equilibria. The minimax theorem guarantees at least one saddle point exists in mixed strategies. In pure strategies, saddle points exist only when $\max_i \min_j A_{ij} = \min_j \max_i A_{ij}$ — otherwise mixing is required.
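The pure-saddle condition is mechanical to check. A small sketch (numpy assumed; both matrices are my own illustrative examples):

```python
import numpy as np

def pure_saddle_value(A):
    """Return the pure-strategy value if maximin == minimax, else None."""
    maximin = A.min(axis=1).max()   # row player's best guaranteed payoff
    minimax = A.max(axis=0).min()   # column player's best cap
    return maximin if maximin == minimax else None

A_saddle = np.array([[4, 2],
                     [3, 1]])       # entry (row 1, col 2) is a pure saddle
A_mix = np.array([[1, -1],
                  [-1, 1]])         # matching pennies: mixing required

assert pure_saddle_value(A_saddle) == 2
assert pure_saddle_value(A_mix) is None
```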

Alpha-Beta Pruning

In two-player zero-sum perfect-information games (chess, Go), the minimax value can be computed by full tree search, but the tree is exponential in depth. Alpha-beta pruning maintains bounds $[\alpha, \beta]$ — the best values guaranteed for MAX and MIN respectively — and prunes subtrees that cannot affect the root value:

  • At a MAX node: if any child value $\geq \beta$, prune the remaining children (MIN would never allow this).
  • At a MIN node: if any child value $\leq \alpha$, prune the remaining children (MAX would never choose this path).

With optimal move ordering, alpha-beta reduces the effective branching factor from $b$ to $\sqrt{b}$, enabling search to twice the depth for the same computation.

Worked Example

Rock-Paper-Scissors

The payoff matrix (row player's perspective, encoding win=1, loss=-1, draw=0):

$$A = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & -1 \\ -1 & 1 & 0 \end{pmatrix}$$

Claim: $x^* = y^* = (1/3, 1/3, 1/3)$ is the unique Nash equilibrium, with value $v = 0$.

Verification. Under $y^* = (1/3, 1/3, 1/3)$, the row player's expected payoff from Rock is $(1/3)(0) + (1/3)(-1) + (1/3)(1) = 0$, and by symmetry Paper and Scissors also give $0$. So the row player is indifferent — any mixed strategy is a best response, including $x^* = (1/3, 1/3, 1/3)$. The symmetric argument applies for the column player. Value $v = 0$: the game is fair.
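The indifference argument is easy to verify numerically (a sketch assuming numpy):

```python
import numpy as np

# Rock-Paper-Scissors payoff matrix from above (rows/cols: R, P, S).
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]])
y_star = np.ones(3) / 3

# Against the uniform mix, every pure row earns exactly 0 ...
assert np.allclose(A @ y_star, 0)

# ... so any mixed x is a best response, and the game value is 0.
x_star = np.ones(3) / 3
assert abs(x_star @ A @ y_star) < 1e-12
```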

Solving a 2×3 Matrix Game via LP

Consider:

$$A = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 2 & 4 \end{pmatrix}$$

Row player primal (let $x_1 = p$, $x_2 = 1-p$):

$$\begin{aligned} &3p \geq v \\ &p + 2(1-p) \geq v \implies 2 - p \geq v \\ &4(1-p) \geq v \end{aligned}$$

The binding constraints determine $p$. Setting $3p = 2 - p$ gives $p = 1/2$, $v = 3/2$. Check: $4(1 - 1/2) = 2 > 3/2$ — column 3 is not in the support. The row player's optimal mix is $(1/2, 1/2)$ with value $v = 3/2$. The column player mixes only over columns 1 and 2; solving $3q_1 + q_2 = 2q_2$ (both rows are in the row player's support, so each must earn the same payoff against the column mix) together with $q_1 + q_2 = 1$ yields $q_1 = 1/4$, $q_2 = 3/4$.
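The hand solution can be confirmed numerically (a sketch assuming numpy):

```python
import numpy as np

# Row player: guaranteed payoff as a function of p = x_1.
p = np.linspace(0.0, 1.0, 100001)
guaranteed = np.minimum.reduce([3 * p, 2 - p, 4 * (1 - p)])
best_p = p[np.argmax(guaranteed)]

assert abs(best_p - 0.5) < 1e-4            # matches p = 1/2
assert abs(guaranteed.max() - 1.5) < 1e-4  # matches v = 3/2

# Column player: against y = (1/4, 3/4, 0), both rows earn exactly 3/2.
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 4.0]])
y = np.array([0.25, 0.75, 0.0])
assert np.allclose(A @ y, 1.5)
```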

Alpha-Beta on a Depth-3 Tree

Consider a game tree where the root is a MAX node at depth 0, two MIN nodes at depth 1, four MAX nodes at depth 2, and eight leaves. Leaves (left to right): 3, 5, 2, 9, 1, 7, 4, 6.

  • Depth-2 MAX nodes compute: $\max(3,5) = 5$, $\max(2,9) = 9$, $\max(1,7) = 7$, $\max(4,6) = 6$.
  • Left MIN node: evaluates its first child (value 5) and sets $\beta = 5$. It then evaluates the second child (value 9); since $9 > 5$, MIN will never choose it, but both children still had to be searched. Left MIN value $= \min(5, 9) = 5$.
  • Root MAX: after the left child returns 5, sets $\alpha = 5$.
  • Right MIN node: evaluates its first child (value 7). Since $7 > \alpha = 5$, there is no cutoff; MIN's value here will be $\leq 7$, so $\beta = 7$. It then evaluates the second child (value 6). Right MIN $= \min(7, 6) = 6$. Root MAX $= \max(5, 6) = 6$.

In this example no pruning occurred because no cutoff condition was ever triggered. With a different ordering, the right MIN node could have cut off its second depth-2 subtree entirely: had its first child evaluated to a value $\leq \alpha = 5$ (i.e., had both of that child's leaves been $\leq 5$), the remaining subtree could not have affected the root value.
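The trace can be replayed in code. A minimal alpha-beta sketch (nested lists as the tree representation, an assumption of this example) confirms the root value and that all eight leaves are visited:

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, visited):
    if depth == 0:                  # leaf: the node *is* its static value
        visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if value >= beta:       # MIN above would never allow this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, visited))
        beta = min(beta, value)
        if value <= alpha:          # MAX above would never choose this path
            break
    return value

# Leaves 3,5 | 2,9 under the left MIN; 1,7 | 4,6 under the right MIN.
tree = [[[3, 5], [2, 9]], [[1, 7], [4, 6]]]
visited = []
root = alphabeta(tree, 3, -math.inf, math.inf, True, visited)

assert root == 6            # matches the hand trace
assert len(visited) == 8    # no pruning with this leaf ordering
```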

Connections

Where Your Intuition Breaks

The minimax theorem guarantees that optimal mixed strategies exist — but it says nothing about convergence. In GAN training, $G$ and $D$ run simultaneous gradient descent on $\min_G \max_D V(D, G)$. The saddle point exists (von Neumann guarantees it), but simultaneous gradient descent need not converge to it: even in the convex-concave bilinear case, the iterates can cycle or spiral outward instead of settling. In Rock-Paper-Scissors, gradient dynamics cycle perpetually around the $(1/3, 1/3, 1/3)$ equilibrium without ever reaching it. GAN training instability is exactly this: the minimax theorem is a theorem about existence, not dynamics. The gap between "a Nash equilibrium exists" and "gradient descent finds it" is where all the difficulty lives.
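This failure mode fits in a few lines. A sketch on the 1-D bilinear objective $f(x, y) = xy$ (a stand-in for the GAN objective, not actual GAN training): the saddle point $(0, 0)$ exists, yet simultaneous gradient steps spiral away from it.

```python
# Simultaneous gradient descent (for x) / ascent (for y) on f(x, y) = x*y.
x, y, eta = 1.0, 0.0, 0.1
norms = []
for _ in range(100):
    gx, gy = y, x                        # df/dx = y, df/dy = x
    x, y = x - eta * gx, y + eta * gy    # both players step at once
    norms.append((x * x + y * y) ** 0.5)

# Each step multiplies the distance to the saddle by sqrt(1 + eta^2),
# so the iterates spiral outward instead of converging.
assert all(b > a for a, b in zip(norms, norms[1:]))
```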

💡Intuition

The minimax theorem is the first fundamental theorem of game theory (von Neumann, 1928). Its proof via LP duality connects it to the separating hyperplane theorem in convex analysis: two disjoint convex sets can be separated by a hyperplane, and the optimal mixed strategy is exactly the separating hyperplane in payoff space. This places the minimax theorem firmly inside the convex analysis module's lineage.

| Property | Pure strategies | Mixed strategies |
| --- | --- | --- |
| Saddle point existence | Not guaranteed | Always (minimax theorem) |
| Value computation | Max-min scan, $O(mn)$ | LP in $O((m+n)^{1.5})$ |
| Uniqueness of strategies | When saddle point is pure | Not in general |
| Strategic substitutes | Not required | Sufficient for existence |
💡Intuition

GANs are zero-sum games. The generator $G$ and discriminator $D$ play $\min_G \max_D V(D, G)$, which has the minimax form. Von Neumann's theorem guarantees the saddle point exists (at $p_G = p_{\text{data}}$, $D^* = 1/2$ everywhere). The difficulty is that simultaneous gradient descent — the algorithm used in practice — does not in general converge to Nash equilibria in zero-sum games, even when the saddle point is known to exist. GAN training instability is a direct consequence of this gap between existence and convergence.

⚠️Warning

The minimax theorem requires the zero-sum condition $u_1 + u_2 = 0$. In non-zero-sum games, $\max_x \min_y u_1(x, y) \neq \min_y \max_x u_1(x, y)$ in general — the minimax and maximin values diverge, and the concept of a "game value" breaks down. Generalizing minimax to non-zero-sum settings requires moving to Nash equilibrium, which lacks the clean LP duality structure and becomes PPAD-hard to compute.
