
Extensive Form Games, Backward Induction & Subgame Perfection

Normal form games capture simultaneous decisions, but many strategic interactions are sequential: one player moves, the other observes and responds, and so on. Extensive form games model this temporal structure explicitly with a game tree, and the right solution concept — subgame perfect equilibrium — eliminates Nash equilibria that rely on threats no rational player would ever carry out, giving sharper and more behaviorally meaningful predictions.

Concepts

Minimax tree traversal with MAX nodes (root and depth 2) and MIN nodes (depth 1). Step through the algorithm.

(Figure: minimax game tree with leaf values 3, 5, 2, 9, 1, 7, 4, 6.)
Game tree structure — MAX nodes (purple) maximize, MIN nodes (amber) minimize.
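The traversal in the figure can be reproduced in a few lines. A minimal sketch, with the tree shape and leaf values (3, 5, 2, 9, 1, 7, 4, 6) read off the figure:

```python
# Minimax on the figure's tree: MAX at the root and depth 2, MIN at depth 1.
def minimax(node, maximizing):
    if isinstance(node, int):                      # leaf: return its payoff
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Nested lists mirror the tree: two MIN subtrees, each over two MAX nodes.
tree = [[[3, 5], [2, 9]], [[1, 7], [4, 6]]]
print(minimax(tree, True))   # depth-2 MAX: 5, 9, 7, 6; MIN: 5, 6; root MAX: 6
```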

A chess position is not just a state — it encodes the entire history of moves, and your opponent will respond to your next move with whatever is best for them. Normal form games collapse this temporal structure into a single simultaneous choice, losing the sequential reasoning that drives real strategic interaction. Extensive form games restore it: a game tree records who moves when, what they know, and what they can do. The key question becomes not just "what is the equilibrium?" but "which player has the advantage from moving first, and which threats are credible enough to matter?"

Extensive Form: Trees, Strategies, and Information Sets

An extensive form game is a tuple (N, T, P, \mathcal{I}, A, u) where:

  • T is a rooted tree whose nodes are either decision nodes or terminal nodes (leaves).
  • P : T \setminus \text{leaves} \to N \cup \{0\} assigns each decision node to a player (0 = Nature).
  • \mathcal{I}_i is a partition of player i's nodes into information sets — nodes the player cannot distinguish between when it is their turn.
  • A(h) is the set of actions available at information set h (the same for every node in h).
  • u_i : \text{leaves} \to \mathbb{R} are terminal payoffs.

A pure strategy for player i is a complete contingent plan: a function mapping every information set of i to an action. Crucially, strategies must specify behavior even at information sets that are never reached on the equilibrium path — this is what makes subgame perfection non-trivial.

Perfect information: every information set is a singleton — the acting player always knows the full history. Imperfect information: some information sets contain multiple nodes, capturing strategic uncertainty about opponents' past moves (e.g., simultaneous decisions within a sequential game).

Backward Induction and Zermelo's Theorem

In a finite perfect-information game, backward induction solves the game from the leaves upward. At each terminal node, payoffs are known. Moving one level up, each player at a decision node selects the action leading to their highest-payoff subtree (already solved). The algorithm runs in O(|T|) time.

Zermelo's Theorem. Every finite two-player perfect-information zero-sum game has a value, and each player has a pure strategy that guarantees at least that value. More generally, every finite perfect-information game (not necessarily zero-sum) has a pure strategy Nash equilibrium computable by backward induction.
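A sketch of the procedure on a tiny non-zero-sum tree (the entry-deterrence game used here is a standard illustration, not from these notes):

```python
# Backward induction on a finite perfect-information game. A leaf is a
# payoff tuple (one entry per player); a decision node is a list
# [player, children]. Each mover picks the child whose already-solved
# payoff vector is best for them.
def backward_induction(node):
    if isinstance(node, tuple):              # leaf: payoffs are known
        return node
    player, children = node
    solved = [backward_induction(c) for c in children]
    return max(solved, key=lambda payoffs: payoffs[player])

# Player 0 chooses Out, payoffs (1, 3), or In; after In, player 1 chooses
# Fight (0, 0) or Accommodate (2, 1).
game = [0, [(1, 3), [1, [(0, 0), (2, 1)]]]]
print(backward_induction(game))   # (2, 1): player 0 enters, player 1 accommodates
```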

Backward induction must process the entire tree — every decision node, including off-path nodes — because a strategy is defined at every information set, not just those reached in equilibrium. This is not redundancy: off-path behavior is what makes a threat credible or not. Subgame perfection depends entirely on what players would do at nodes they are "never supposed to reach," which only backward induction can determine.

In games without payoff ties (generic games), backward induction yields a unique outcome path, though strategy profiles are not unique because they must specify off-path behavior.

Subgame Perfect Equilibrium

A subgame is a subtree rooted at a single node v that: (i) includes all successors of v, and (ii) does not split any information set — every information set is either entirely inside or entirely outside the subgame.

A subgame perfect equilibrium (SPE) (Selten, 1965) is a strategy profile that induces a Nash equilibrium in every subgame — including those reached by deviations from the equilibrium path.

SPE eliminates Nash equilibria sustained by non-credible threats: a threat to take a costly action after a deviation is credible only if carrying it out would actually be optimal at the point where the threat must be executed, given equilibrium play in every continuation.

One-Shot Deviation Principle. A strategy profile is a SPE if and only if no player can profitably deviate at a single information set while conforming to the prescribed strategy everywhere else. This gives an efficient way to verify SPE without checking all possible deviations: it suffices to check single-step deviations at each information set, moving backward through the tree.
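A minimal sketch of the one-shot deviation test on a standard entry-deterrence game (the game, node names, and payoffs are illustrative assumptions, not from the notes): the Nash profile "stay Out / Fight entry" is sustained by a threat to fight, but the test flags a profitable single deviation at the off-path node, so the profile is not subgame perfect.

```python
# One-shot deviation test. A profile maps each decision node to a child
# index; we check, node by node, whether the mover gains by deviating
# there once while conforming everywhere else.
LEAVES = {"out": (1, 3), "fight": (0, 0), "acc": (2, 1)}
NODES = {"root": (0, ["out", "fight?"]),    # player 0: Out or In
         "fight?": (1, ["fight", "acc"])}   # player 1: Fight or Accommodate

def outcome(node, profile):
    """Payoffs reached from `node` when everyone follows `profile`."""
    if node in LEAVES:
        return LEAVES[node]
    children = NODES[node][1]
    return outcome(children[profile[node]], profile)

def one_shot_deviations(profile):
    """Nodes where the mover profits from a single deviation."""
    profitable = []
    for node, (player, children) in NODES.items():
        here = outcome(node, profile)[player]
        if any(outcome(node, {**profile, node: alt})[player] > here
               for alt in range(len(children))):
            profitable.append(node)
    return profitable

# Nash profile sustained by the non-credible threat "Fight if you enter":
print(one_shot_deviations({"root": 0, "fight?": 0}))   # ['fight?']
# The SPE profile (enter, accommodate) passes the test:
print(one_shot_deviations({"root": 1, "fight?": 1}))   # []
```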

Imperfect Information and Bayesian Games

When information sets are non-singletons, backward induction cannot be applied naively because a player at a multi-node information set does not know which node they occupy — so they cannot compute a well-defined continuation value.

Perfect Bayesian Equilibrium (PBE) extends SPE: (i) players hold beliefs (probability distributions) over nodes within each information set, (ii) strategies are best responses given those beliefs, and (iii) beliefs are updated via Bayes' rule wherever possible on the equilibrium path. Off-equilibrium-path beliefs are only weakly constrained.
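Requirement (iii) is ordinary Bayesian updating over the nodes of an information set. A minimal numeric sketch (the prior, type names, and strategy probabilities are assumed for illustration):

```python
# Bayes' rule at an information set: two sender types can play message m;
# the receiver's belief over the nodes reached by m is the posterior over types.
prior = {"strong": 0.3, "weak": 0.7}
p_msg = {"strong": 0.9, "weak": 0.2}    # Pr(send m | type), a given strategy

reach = sum(prior[t] * p_msg[t] for t in prior)          # Pr(m) = 0.41
belief = {t: prior[t] * p_msg[t] / reach for t in prior}
print(belief)   # strong: 0.27/0.41 ~ 0.659, weak: 0.14/0.41 ~ 0.341
```

When Pr(m) = 0 under the profile (an off-path information set), this formula is undefined, which is exactly why off-path beliefs are only weakly constrained.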

Bayesian games (Harsanyi, 1967) model incomplete information — players have private types drawn from a common prior. Each player's strategy is a mapping from types to actions. A Bayesian Nash equilibrium requires each type to play a best response in expectation over opponents' types. Sequential Bayesian games combine temporal structure with type uncertainty, leading to signaling games and cheap-talk equilibria.

Stackelberg Games: First-Mover Advantage

In a Stackelberg game, one player (the leader) commits to a strategy first, and the other (the follower) best-responds after observing the leader's choice. The game is solved by backward induction: compute the follower's best-response function, then optimize the leader's strategy over that function.

The leader generally earns at least as much as in the simultaneous-move Nash equilibrium — first-mover advantage — because the leader could always commit to its Nash strategy; when the follower's best response is unique, commitment can only help. Stackelberg games model leadership in oligopoly, platform pricing, and (crucially for ML) adversarial robustness.
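A sketch in a linear-demand duopoly (the demand intercept and zero costs are assumed numbers): the leader optimizes over the follower's best-response function, exactly the backward-induction recipe above.

```python
# Stackelberg leadership in a linear duopoly:
# inverse demand P = a - (q1 + q2), zero marginal costs.
a = 12.0

def follower_br(q1):
    # Follower maximizes q2 * (a - q1 - q2), giving q2 = (a - q1) / 2.
    return (a - q1) / 2

def leader_profit(q1):
    # Leader anticipates the follower's best response (backward induction).
    return q1 * (a - q1 - follower_br(q1))

# Leader commits to the quantity maximizing profit along the reaction curve.
grid = [i / 1000 for i in range(int(a * 1000) + 1)]
q1_star = max(grid, key=leader_profit)
q2_star = follower_br(q1_star)
print(q1_star, q2_star, leader_profit(q1_star))   # 6.0 3.0 18.0

# Cournot benchmark: q1 = q2 = a/3 = 4, profit 16 each; commitment earns
# the leader strictly more (18 > 16): first-mover advantage.
```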

Worked Example

Three-Period Alternating-Offer Bargaining

Two players split a pie of size 1 over three periods. Player 1 proposes a split (s, 1-s) in period 1. Player 2 accepts or rejects. If rejected, Player 2 proposes in period 2; Player 1 accepts or rejects. If rejected, Player 1 proposes in period 3, which Player 2 must accept or both get zero.

Both players discount by \delta \in (0,1) per period: a payoff of x in period t is worth \delta^{t-1} x in period-1 terms.

Period 3 (last node). Player 1 proposes; Player 2 must accept (rejection gives both players zero). Player 1 offers s_3 = 1, keeping everything. Player 2 gets 0.

Period 2. Player 2 proposes; Player 1 accepts iff the offer gives at least \delta \cdot 1 = \delta (the discounted value of getting everything in period 3). Player 2 therefore offers Player 1 exactly \delta and keeps 1 - \delta.

Player 2's period-2 payoff (in period-1 terms) is \delta(1 - \delta).

Period 1. Player 1 proposes. Player 2 accepts iff their share is at least \delta \cdot (1 - \delta) (the discounted value of the period-2 outcome). So Player 1 offers Player 2 exactly \delta(1 - \delta) and keeps:

s_1^* = 1 - \delta(1 - \delta) = 1 - \delta + \delta^2.

Unique SPE: Player 1 proposes s_1^* = 1 - \delta + \delta^2; Player 2 accepts; bargaining ends in period 1.
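The period-by-period argument collapses into a one-line recursion: the current proposer keeps 1 minus the discounted value of what the responder could secure by proposing next period. A sketch:

```python
# Backward induction for T-period alternating-offer bargaining: in the
# last period the proposer keeps everything; one period earlier, the
# responder must be offered the discounted value of proposing next period.
def proposer_share(delta, periods=3):
    share = 1.0                          # final period: proposer takes all
    for _ in range(periods - 1):
        share = 1 - delta * share        # concede delta * (next proposer's share)
    return share

print(proposer_share(0.9))   # ~0.91, i.e. 1 - delta + delta**2 at delta = 0.9
```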

First-mover advantage. As \delta \to 1 (patient players), s_1^* \to 1 - 1 + 1 = 1: with a finite horizon ending in Player 1's proposal, patience hands Player 1 everything. In the infinite-horizon Rubinstein (1982) game, by contrast, s_1^* = 1/(1+\delta), a first-mover advantage that shrinks toward an even split as \delta \to 1. With three periods, the asymmetry is stark: Player 1 always gets 1 - \delta + \delta^2 > 1/2 (since 1 - \delta + \delta^2 - 1/2 = (\delta - 1/2)^2 + 1/4 > 0).

Centipede Game and the Backward Induction Paradox

Two players alternate deciding to Take (end the game) or Pass (continue). At round t, taking gives the active player roughly 2^t and the other 2^{t-1}; passing grows both payoffs. With 6 rounds, a terminal pass ends the game with the final mover receiving 32 and the opponent 128.

Backward induction: at round 6, the active player takes (64 > 32). Knowing this, the round-5 player takes. Unraveling all the way back gives the unique SPE: take immediately at round 1, payoffs (2, 1).

The backward induction paradox: rational logic delivers (2, 1) when sustained passing would leave both players far better off. Experimental evidence consistently shows that humans pass for several rounds. The paradox reflects the extreme common-knowledge-of-rationality assumption: each player must believe the other will unravel, even when unraveling is individually costly and the other has historically cooperated. This motivates models of bounded rationality and quantal response equilibrium.
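The unraveling can be checked mechanically. A sketch, assuming a concrete doubling convention (taking at round t pays the mover 2^t and the other 2^{t-1}; a terminal pass pays the final mover 2^{n-1} and the opponent 2^{n+1}); with exact powers of two several comparisons are ties, broken here toward Take:

```python
# Backward induction in an n-round centipede game under the assumed
# doubling payoff convention described above.
def solve_centipede(n=6):
    # cont = payoffs (to the round-t mover, to the other) if the mover passes
    cont = (2 ** (n - 1), 2 ** (n + 1))          # terminal pass at round n
    plan = {}
    for t in range(n, 0, -1):
        take = (2 ** t, 2 ** (t - 1))
        # Ties (exact powers of two) are broken toward Take.
        plan[t], value = ("Take", take) if take[0] >= cont[0] else ("Pass", cont)
        cont = (value[1], value[0])              # roles swap for round t-1
    return value, plan

payoffs, plan = solve_centipede()
print(payoffs, plan[1])   # (2, 1) Take: the unique SPE takes immediately
```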

Connections

Where Your Intuition Breaks

The centipede game shows the sharpest failure of backward induction as a behavioral prediction: the unique SPE says "take immediately," yet outcomes that leave both players far better off are reachable by sustained passing. The paradox is not that the theorem is wrong — it is a valid mathematical result. The paradox is that backward induction requires common knowledge of rationality at every node, including off-path nodes that would only be reached if someone had already behaved irrationally. Once you believe the other player might not be perfectly rational, the unraveling stops. Real players cooperate in centipede games because they do not actually know whether their opponent will defect, and the expected gain from continued cooperation outweighs the risk. SPE is the right solution concept given its assumptions; the lesson is that those assumptions are strong.

💡Intuition

SPE as a credibility filter. Nash equilibria in extensive-form games can rely on threats that rational players would never carry out: "if you deviate I will punish you, even at cost to myself." SPE prunes these — a strategy is subgame perfect only if it is a best response in every subgame, including those reached by deviations. The one-shot deviation principle makes verification tractable: check only single-step deviations moving backward through the tree.

| Solution concept | Sequential rationality | Handles imperfect info | Computational tractability |
| --- | --- | --- | --- |
| Nash equilibrium | No | Yes | PPAD-hard (general) |
| Subgame perfect eq. | Yes (perfect info) | Partial | Polynomial (backward induction) |
| Perfect Bayesian eq. | Yes | Yes | Depends on belief system |
| Sequential equilibrium | Yes (stronger consistency) | Yes | More constrained beliefs |
⚠️Warning

Backward induction fails with imperfect information. When information sets contain multiple nodes, the player at that set does not know which node they occupy, so they cannot compute a well-defined continuation value. The correct tool is Perfect Bayesian Equilibrium or Sequential Equilibrium (Kreps and Wilson, 1982), which require explicit belief updating via Bayes' rule. Mechanically applying backward induction to an imperfect-information game yields an ill-defined result.

💡Intuition

Stackelberg structure appears throughout ML. Adversarial training is a Stackelberg game: the defender commits to a training procedure, then the attacker best-responds by finding adversarial examples. Hyperparameter optimization is Stackelberg: the outer loop (meta-learner) commits to hyperparameters, the inner loop (base learner) best-responds. Recognizing the Stackelberg structure clarifies what first-mover advantage each party holds and which player's optimization is treated as the inner problem.
