
Martingales: Optional Stopping & Doob's Inequalities

A martingale formalizes the concept of a "fair game" — a stochastic process whose expected future value, given the present, equals its current value. Martingale theory provides the sharpest tools for controlling random processes: optional stopping determines when you can stop a process without changing its expected value, and Doob's inequalities bound the maximum of the process over time.

Concepts

A fair coin-flip game: you start with $100, win $1 on heads, lose $1 on tails. After any number of flips, your expected future wealth equals your current wealth — the game has no memory advantage, and no clever strategy changes your expectation. This is a martingale. The optional stopping theorem makes this rigorous: no betting rule, however elaborate, can turn a fair game into a profitable one by choosing when to stop.
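
A minimal Monte Carlo sketch of this claim (the target of +10 and the 1,000-flip cap are arbitrary illustration values): because the stopping rule below is bounded, expected final wealth stays at $100 no matter how the rule is tuned.

```python
import random

# Sketch: a bounded stopping rule on a fair +/-1 game.
# Target and flip cap are arbitrary illustration values.
def play_until_stop(start=100, target=110, max_flips=1000):
    wealth = start
    for _ in range(max_flips):
        if wealth >= target:          # stopping rule: quit while ahead
            break
        wealth += 1 if random.random() < 0.5 else -1
    return wealth

random.seed(0)
trials = 20_000
mean_wealth = sum(play_until_stop() for _ in range(trials)) / trials
print(mean_wealth)                    # ~100.0: stopping early doesn't help
```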

Filtrations and Adapted Processes

A filtration $(\mathcal{F}_t)_{t \geq 0}$ is an increasing sequence of sigma-algebras representing the information available at time $t$: $\mathcal{F}_s \subseteq \mathcal{F}_t$ for $s \leq t$. A process $(X_t)$ is adapted to $(\mathcal{F}_t)$ if $X_t$ is $\mathcal{F}_t$-measurable for each $t$.

The natural filtration of $X$ is $\mathcal{F}_t^X = \sigma(X_s : s \leq t)$ — the information generated by the process itself.

Martingales: Definition and Examples

An adapted integrable process $(M_t)$ is a martingale with respect to $(\mathcal{F}_t)$ if:

$$E[M_t \mid \mathcal{F}_s] = M_s \quad \text{for all } s \leq t.$$

The filtration $(\mathcal{F}_t)$ in this definition is not a technicality — it is the precise specification of what "knowing the present" means. The same process can be a martingale with respect to one filtration and not another: $B_t^2 - t$ is a martingale with respect to the natural filtration of Brownian motion, but enlarging the filtration to include the future destroys this property. Stating which filtration makes a process fair is not optional; it is the entire content of the definition.

A supermartingale satisfies $E[M_t \mid \mathcal{F}_s] \leq M_s$ (expected to decrease); a submartingale satisfies $E[M_t \mid \mathcal{F}_s] \geq M_s$ (expected to increase).

Canonical examples:

| Process | Type |
| --- | --- |
| $M_n = \sum_{i=1}^n X_i$ with $E[X_i] = 0$ and $X_i \perp \mathcal{F}_{i-1}$ | Martingale |
| Brownian motion $B_t$ | Martingale |
| $B_t^2 - t$ | Martingale |
| $e^{\theta B_t - \theta^2 t/2}$ (exponential martingale) | Martingale |
| $-M_n$ for a martingale $(M_n)$ | Martingale |
| $f(M_n)$ for convex $f$ (Jensen) | Submartingale |

Martingale difference sequences: $(D_n)$ where $D_n = M_n - M_{n-1}$ satisfies $E[D_n \mid \mathcal{F}_{n-1}] = 0$. With $M_0$ constant, the variance of $M_n = M_0 + \sum_{k=1}^n D_k$ satisfies $\text{Var}[M_n] = \sum_{k=1}^n E[D_k^2]$ (orthogonality of increments).
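
A quick numerical check of the orthogonality identity, as a sketch: the bet size below depends on the past (a hypothetical strategy chosen only to break independence), yet $\text{Var}[M_n]$ still matches $\sum_k E[D_k^2]$.

```python
import numpy as np

# Sketch: M_0 = 0, bet size depends on the past (hypothetical strategy
# chosen only to create dependence); increments still satisfy
# E[D_k | F_{k-1}] = 0, so Var[M_n] = sum_k E[D_k^2].
rng = np.random.default_rng(0)
n, trials = 50, 200_000
M = np.zeros(trials)
sum_ED2 = 0.0
for k in range(n):
    bet = np.where(M >= 0, 1.0, 2.0)            # F_{k-1}-measurable bet size
    eps = rng.choice([-1.0, 1.0], size=trials)  # fair +/-1 sign
    D = bet * eps                               # E[D_k | past] = 0
    sum_ED2 += np.mean(D**2)
    M += D
print(np.var(M), sum_ED2)                       # the two numbers nearly agree
```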

Optional Stopping Theorem

A stopping time $\tau$ is a random variable with $\{\tau \leq t\} \in \mathcal{F}_t$ for all $t$ — the decision to stop can only depend on current and past information, not the future.

Optional Stopping Theorem (OST): let $(M_n)$ be a martingale and $\tau$ a stopping time. If any of the following conditions hold:

  1. $\tau$ is bounded: $\tau \leq N$ a.s. for some fixed $N$
  2. $|M_{n \wedge \tau}| \leq C$ uniformly
  3. $E[\tau] < \infty$ and the increments are bounded: $|M_n - M_{n-1}| \leq C$ a.s.

then $E[M_\tau] = E[M_0]$.

Consequence: in any fair game, no stopping strategy can create an expected profit. The Gambler's Ruin is a direct application: a gambler starting at $x$ betting $\pm 1$ on a fair coin, stopping at $0$ or $N$, has $E[M_\tau] = x$, so the ruin probability from $x$ is $(N-x)/N$ (solving $E[M_\tau] = x = (1 - p_{\text{ruin}}) \cdot N + p_{\text{ruin}} \cdot 0$).
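
A simulation sketch of the fair-coin case ($x = 3$ and $N = 10$ are arbitrary illustration values):

```python
import random

# Sketch: fair-coin gambler's ruin; x = 3, N = 10 are illustration values.
def ruined(x=3, N=10):
    while 0 < x < N:
        x += 1 if random.random() < 0.5 else -1
    return x == 0

random.seed(1)
trials = 100_000
p_hat = sum(ruined() for _ in range(trials)) / trials
print(p_hat, (10 - 3) / 10)           # simulated vs exact (N - x)/N
```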

Doob's Inequalities

Doob's maximal inequality: for a non-negative submartingale $(M_n)_{n=0}^N$:

$$P\!\left(\max_{0 \leq k \leq N} M_k \geq \lambda\right) \leq \frac{E[M_N]}{\lambda}.$$

For a martingale: $P(\max_{k \leq N} |M_k| \geq \lambda) \leq E[M_N^2] / \lambda^2$ (by Jensen, $M_k^2$ is a submartingale).

Doob's $L^p$ inequality: for $p > 1$:

$$E\!\left[\left(\max_{k \leq N} |M_k|\right)^p\right] \leq \left(\frac{p}{p-1}\right)^p E[|M_N|^p].$$

The constant $(p/(p-1))^p$ is sharp (Doob, 1953). For $p=2$: $E[\max_k M_k^2] \leq 4E[M_N^2]$.
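
Both bounds are easy to probe numerically. A sketch on a simple random walk ($N = 100$ and $\lambda = 15$ are arbitrary choices):

```python
import numpy as np

# Sketch: both Doob bounds on a simple random walk M_n
# (N = 100 and lambda = 15 are arbitrary).
rng = np.random.default_rng(0)
N, trials, lam = 100, 100_000, 15.0
M = np.cumsum(rng.choice([-1.0, 1.0], size=(trials, N)), axis=1)
max_abs = np.abs(M).max(axis=1)
E_MN2 = np.mean(M[:, -1] ** 2)        # E[M_N^2] = N for this walk

# maximal inequality (martingale form): P(max |M_k| >= lam) <= E[M_N^2]/lam^2
print(np.mean(max_abs >= lam), E_MN2 / lam**2)

# Doob L^2 inequality: E[(max |M_k|)^2] <= 4 E[M_N^2]
print(np.mean(max_abs**2), 4 * E_MN2)
```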

Martingale Convergence Theorems

Upcrossings: an upcrossing of $[a,b]$ by $(M_n)$ is a passage from below $a$ to above $b$. Let $U_N(a,b)$ be the number of upcrossings by time $N$.

Doob's upcrossing inequality: $E[U_N(a,b)] \leq \frac{E[(M_N - a)^+]}{b-a}$.
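
A sketch that counts upcrossings of a simple random walk and compares against the bound (the interval $[-2, 2]$ and horizon $N = 200$ are arbitrary choices):

```python
import numpy as np

# Sketch: count upcrossings of [a, b] by a simple random walk and compare
# with E[U_N(a,b)] <= E[(M_N - a)^+] / (b - a). Parameters are arbitrary.
def count_upcrossings(path, a, b):
    count, armed = 0, False
    for x in path:
        if not armed and x <= a:
            armed = True              # path has dipped to or below a
        elif armed and x >= b:
            count += 1                # ...and then risen to or above b
            armed = False
    return count

rng = np.random.default_rng(0)
a, b, N, trials = -2.0, 2.0, 200, 20_000
paths = np.cumsum(rng.choice([-1.0, 1.0], size=(trials, N)), axis=1)
U = np.array([count_upcrossings(p, a, b) for p in paths])
bound = np.mean(np.maximum(paths[:, -1] - a, 0.0)) / (b - a)
print(U.mean(), bound)                # empirical mean sits below the bound
```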

Martingale convergence theorem: if $(M_n)$ is a supermartingale with $\sup_n E[M_n^-] < \infty$, then $M_\infty = \lim_{n\to\infty} M_n$ exists and is finite a.s.

$L^2$ convergence: if $\sup_n E[M_n^2] < \infty$, then $M_n \to M_\infty$ both a.s. and in $L^2$.

Azuma-Hoeffding inequality: let $(M_n)$ be a martingale with bounded differences $|M_k - M_{k-1}| \leq c_k$ a.s. Then:

$$P(M_n - M_0 \geq t) \leq \exp\!\left(-\frac{t^2}{2\sum_{k=1}^n c_k^2}\right).$$

This is the fundamental concentration inequality for dependent variables and is the basis for proving generalization bounds in learning theory.
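
A numerical sketch on the simplest case, a $\pm 1$ random walk with $c_k = 1$ (the horizon $n = 100$ and the thresholds are arbitrary illustration values):

```python
import numpy as np

# Sketch: Azuma-Hoeffding for a +/-1 walk (c_k = 1), n = 100.
rng = np.random.default_rng(0)
n, trials = 100, 200_000
M_n = rng.choice([-1.0, 1.0], size=(trials, n)).sum(axis=1)
for t in (10, 20, 30):
    empirical = np.mean(M_n >= t)
    bound = np.exp(-t**2 / (2 * n))   # sum of c_k^2 is n here
    print(t, empirical, bound)        # empirical tail stays below the bound
```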

Worked Example

Example 1: Azuma-Hoeffding for Learned Models

Let $f(x_1, \ldots, x_n)$ be the empirical risk of a learned classifier. Define $M_k = E[f \mid x_1, \ldots, x_k]$. Then $(M_k)$ is a martingale (by the tower property), and if replacing one training point changes $f$ by at most $c/n$ (bounded sensitivity), then $|M_k - M_{k-1}| \leq c/n$.

Azuma-Hoeffding gives:

$$P(f - E[f] \geq t) \leq \exp\!\left(-\frac{t^2}{2n(c/n)^2}\right) = \exp\!\left(-\frac{nt^2}{2c^2}\right).$$

This is the McDiarmid inequality for functions of independent variables. It implies that the empirical risk concentrates around its mean at rate $O(1/\sqrt{n})$, matching VC theory but with sharper constants.
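
As a sketch, take the simplest bounded-difference statistic: the mean of $n$ losses in $[0, 1]$, so $c = 1$ in the bound above (uniform losses and $n = 200$ are stand-in illustration choices, not a real learned model):

```python
import numpy as np

# Sketch: f = mean of n losses in [0, 1]; changing one sample moves f by
# at most 1/n, so c = 1 in the notes' bound. Uniform losses are a stand-in.
rng = np.random.default_rng(0)
n, trials, c = 200, 200_000, 1.0
f = rng.uniform(0.0, 1.0, size=(trials, n)).mean(axis=1)
for t in (0.05, 0.10):
    print(t, np.mean(f - 0.5 >= t),   # E[f] = 0.5 for uniform losses
          np.exp(-n * t**2 / (2 * c**2)))
```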

Example 2: Gambler's Ruin with Biased Coin

A gambler starts with fortune $x$ and bets $\pm 1$ repeatedly, with win probability $p \neq 1/2$ and $q = 1 - p$. Define $M_n = \left(\frac{q}{p}\right)^{S_n}$ where $S_n$ is the current fortune. Then $M_n$ is a martingale: $E[M_{n+1} \mid S_n = s] = p(q/p)^{s+1} + q(q/p)^{s-1} = (q/p)^s = M_n$.

Apply OST with $\tau = \min\{n : S_n = 0 \text{ or } S_n = N\}$: $E[M_\tau] = E[M_0] = (q/p)^x$. Since $M_\tau = (q/p)^0 = 1$ at ruin or $(q/p)^N$ at success:

$$P(\text{ruin}) \cdot 1 + (1 - P(\text{ruin})) \cdot (q/p)^N = (q/p)^x.$$

Solving: $P(\text{ruin}) = \frac{(q/p)^N - (q/p)^x}{(q/p)^N - 1}$. For $p > 1/2$ (favorable game), $q/p < 1$, so $(q/p)^N \to 0$ as $N \to \infty$ and $P(\text{ruin}) \to (q/p)^x$: a favorable game still has positive ruin probability from finite capital.
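
A simulation sketch of the formula and its $N \to \infty$ limit ($p = 0.55$, $x = 3$, $N = 10$ are arbitrary illustration values):

```python
import random

# Sketch: biased-coin ruin probability vs the OST formula.
# p = 0.55, x = 3, N = 10 are arbitrary illustration values.
def ruined(x, N, p):
    while 0 < x < N:
        x += 1 if random.random() < p else -1
    return x == 0

random.seed(2)
p, x, N, trials = 0.55, 3, 10, 100_000
r = (1 - p) / p                       # r = q/p < 1 for a favorable game
exact = (r**N - r**x) / (r**N - 1)
p_hat = sum(ruined(x, N, p) for _ in range(trials)) / trials
print(p_hat, exact, r**x)             # simulated vs exact vs N -> inf limit
```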

Example 3: Martingales in SGD Analysis

Define the loss $L(\theta_t)$ at step $t$ of SGD. The stochastic gradient $g_t$ has $E[g_t \mid \theta_t] = \nabla L(\theta_t)$ and noise $\xi_t = g_t - \nabla L(\theta_t)$ satisfying $E[\xi_t \mid \theta_t] = 0$.

The accumulated noise term $M_t = \sum_{k=0}^{t-1} \eta_k \xi_k$ is a martingale. Azuma-Hoeffding controls its fluctuations when the gradient noise is bounded. The convergence rate of SGD — the $O(1/\sqrt{T})$ rate for convex problems — follows from balancing the deterministic progress term $-\eta_t \|\nabla L\|^2$ against the martingale noise term $\eta_t \xi_t$.
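
A toy sketch of this decomposition, assuming a 1-d quadratic loss $L(\theta) = \theta^2/2$ with bounded additive gradient noise and constant step size (this setup is an assumption for illustration, not a general SGD analysis):

```python
import numpy as np

# Sketch: SGD on L(theta) = theta^2 / 2 with bounded additive noise.
# The accumulated noise M_T = sum_k eta * xi_k is a martingale with
# |increments| <= eta, so Azuma-Hoeffding bounds its fluctuations.
rng = np.random.default_rng(0)
T, eta, trials = 1000, 0.01, 20_000
theta = np.full(trials, 5.0)
M = np.zeros(trials)                  # accumulated noise term
for t in range(T):
    xi = rng.uniform(-1.0, 1.0, size=trials)   # E[xi_t | theta_t] = 0
    g = theta + xi                    # stochastic gradient of theta^2 / 2
    theta -= eta * g
    M += eta * xi
s = 2 * eta * np.sqrt(T)              # threshold, a few noise std devs
# two-sided Azuma: P(|M_T| >= s) <= 2 exp(-s^2 / (2 T eta^2))
print(np.mean(np.abs(M) >= s), 2 * np.exp(-s**2 / (2 * T * eta**2)))
```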

Connections

Where Your Intuition Breaks

Optional stopping says you cannot profit in a fair game by choosing a clever stopping time — but the theorem requires the stopping time to be bounded, or the stopped process to be uniformly integrable. The gambler's doubling strategy (double the bet after each loss, stop after the first win) appears to always profit: the stopping time $\tau$ is almost surely finite. The flaw is not in the math but in the conditions: the maximum loss before stopping is unbounded, the process is not uniformly integrable, and the OST does not apply. More subtly, simple random walk on $\mathbb{Z}$ stopped when it first hits $1$ has bounded increments and an almost-surely-finite stopping time — yet $E[\tau] = \infty$, so no OST condition holds, and indeed $E[M_\tau] = 1 \neq 0 = E[M_0]$. The conditions of optional stopping are not formalities; they are exactly the hypotheses that prevent infinite-horizon exploitation.
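
A sketch of the doubling strategy under a finite bankroll (the 1023-unit bankroll is an arbitrary cap): once losses are bounded, the stopping time is bounded too, OST applies, and the "sure profit" evaporates.

```python
import random

# Sketch: doubling strategy with a finite bankroll (1023 units, arbitrary).
# The stopping time becomes bounded, OST applies, and E[profit] = 0.
def doubling(bankroll=1023):
    bet, lost = 1, 0
    while lost + bet <= bankroll:     # can still cover the next bet
        if random.random() < 0.5:
            return 1                  # first win recoups all losses, +1
        lost += bet
        bet *= 2
    return -lost                      # wiped out before the first win

random.seed(3)
trials = 1_000_000
print(sum(doubling() for _ in range(trials)) / trials)   # ~0.0, not +1
```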

💡Intuition

The optional stopping theorem formulates "no free lunch" precisely. Any stopping rule — doubling bets, stopping after wins, timing the market — cannot change the expected value of a fair game. This is not just an informal principle but a theorem with precise conditions. The conditions matter: if you allow unbounded stopping times and the process can grow, OST may fail (the St. Petersburg paradox involves an unbounded martingale). The theorem works exactly when you cannot "gain information from the future."

💡Intuition

Martingale differences are the right generalization of independence. Many concentration inequalities (Chernoff, Hoeffding) assume independence. Azuma-Hoeffding replaces independence with the martingale difference structure $E[D_k \mid \mathcal{F}_{k-1}] = 0$, which allows arbitrarily complex dependence between $D_k$ and past variables. This is why martingale methods appear throughout machine learning theory: training examples are not always independent (active learning, online learning, Markov chain-sampled minibatches), but the error increments can still form a martingale difference sequence.

⚠️Warning

Doob's $L^p$ inequality has a sharp constant that blows up at $p=1$. The constant $(p/(p-1))^p \to \infty$ as $p \to 1$, so no analog exists for $p=1$: there is no uniform bound on $E[\max_k |M_k|]$ in terms of $E[|M_N|]$ — the maximum can be much larger than the terminal value. This failure at $p=1$ is not a gap in the theorem; it reflects a genuine phenomenon: there exist martingales $(M_n)$ with $E[|M_n|] = 1$ for all $n$ but $E[\sup_n |M_n|] = \infty$. Processes with heavy tails require different tools (e.g., Burkholder-Davis-Gundy inequalities using the quadratic variation).
