
Continuous-Time Markov Chains & Poisson Processes

Continuous-time Markov chains model systems that jump between states at random times — from radioactive decay to queuing systems to neural spike trains. The Poisson process is the canonical example, and its properties underlie survival analysis, event modeling, and the connection between discrete probabilistic models and differential equations.

Concepts

A hospital emergency room, a packet router, a radioactive nucleus — all are systems that jump between states at random times, with no memory of how long they have been waiting. The exponential distribution's memorylessness is not a modeling convenience but a mathematical necessity: it is the unique continuous distribution where $P(\text{wait} > s+t \mid \text{wait} > s) = P(\text{wait} > t)$. Continuous-time Markov chains are the precise framework for such systems, with the generator matrix $Q$ encoding the rate of every possible transition.

Continuous-Time Markov Chains: Generator and Kolmogorov Equations

A continuous-time Markov chain (CTMC) on state space $\mathcal{S}$ satisfies the Markov property in continuous time: for all $s < t$, $P(X_t = j \mid X_u,\, u \leq s) = P(X_t = j \mid X_s)$.

The chain is characterized by the generator matrix (or $Q$-matrix): $Q_{ij} \geq 0$ for $i \neq j$ (transition rate from $i$ to $j$), and $Q_{ii} = -\sum_{j \neq i} Q_{ij}$ (so each row sums to zero). The holding time in state $i$ is exponential with rate $q_i = -Q_{ii} = \sum_{j \neq i} Q_{ij}$.

The transition semigroup $P(t) = e^{tQ}$ (matrix exponential) satisfies $P(s+t) = P(s)P(t)$ (Chapman-Kolmogorov). The probability vector $\mu(t) = \mu(0) e^{tQ}$ evolves as:

$$\frac{d\mu}{dt} = \mu Q \quad \text{(Kolmogorov forward equation)}.$$

Equivalently, $\frac{d}{dt}P(t) = P(t)Q = QP(t)$.

The matrix exponential $P(t) = e^{tQ}$ is forced by two requirements working together: the Chapman-Kolmogorov equation $P(s+t) = P(s)P(t)$ (from the Markov property) and differentiability at $t = 0$ (from the exponential holding-time distribution). Any solution satisfying both is a matrix exponential; the generator $Q = P'(0)$ is the unique finite object encoding instantaneous transition rates. Without $Q$, you would need to specify $P(t)$ for every $t$ separately — with $Q$, one matrix determines all finite-time probabilities.
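For a two-state chain the semigroup has a closed form, which makes the Chapman-Kolmogorov property easy to check numerically. A minimal pure-Python sketch (the rates `a`, `b` and all names are illustrative, not from the notes):

```python
import math

def two_state_P(a, b, t):
    """P(t) = e^{tQ} for Q = [[-a, a], [b, -b]], via the closed form
    P(t) = Pi + e^{-(a+b)t} (I - Pi), where Pi has the stationary rows."""
    s = a + b
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

a, b = 2.0, 1.0
P1, P2, P3 = two_state_P(a, b, 1.0), two_state_P(a, b, 2.0), two_state_P(a, b, 3.0)
CK = matmul(P1, P2)  # Chapman-Kolmogorov: P(1)P(2) should equal P(3)
gap = max(abs(CK[i][j] - P3[i][j]) for i in range(2) for j in range(2))
print(gap)  # ~0, up to floating-point error
```

Each row of every $P(t)$ sums to one, and as $t \to \infty$ both rows converge to the stationary distribution $(b/(a+b),\, a/(a+b))$.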

Stationary distribution: $\pi Q = 0$, i.e., $\sum_i \pi_i Q_{ij} = 0$ for all $j$. Detailed balance: $\pi_i Q_{ij} = \pi_j Q_{ji}$ (it holds for reversible chains and implies stationarity, but stationarity does not require it).

Relationship to discrete-time chain: the embedded chain has transitions $P_{ij} = Q_{ij}/q_i$ for $i \neq j$. The CTMC visits states in the order of the embedded chain but spends exponential holding times at each.
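The embedded-chain view gives a direct simulation recipe: draw an exponential holding time at the current state's rate, then jump according to the embedded transition probabilities. A sketch for an assumed two-state chain, checking the time-average occupancy against the solution of $\pi Q = 0$:

```python
import random

random.seed(0)

# Assumed toy generator: rate a for 0 -> 1, rate b for 1 -> 0.
a, b = 2.0, 1.0
rates = [a, b]  # holding-time rate q_i in state i

state = 0
time_in = [0.0, 0.0]
for _ in range(200_000):
    hold = random.expovariate(rates[state])  # Exp(q_i) holding time
    time_in[state] += hold
    state = 1 - state  # embedded chain: the only possible jump

pi_hat = time_in[0] / sum(time_in)
pi_exact = b / (a + b)  # solves pi Q = 0 for this generator
print(round(pi_hat, 3), round(pi_exact, 3))
```

The time-average fraction of time spent in a state estimates its stationary probability, which is the holding-time-weighted version of the embedded chain's occupancy.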

The Poisson Process

The Poisson process $N(t)$ with rate $\lambda$ is the canonical CTMC with states $\mathcal{S} = \{0, 1, 2, \ldots\}$ and generator $Q_{i,i+1} = \lambda$, $Q_{ii} = -\lambda$ (only upward jumps).

Definition via inter-arrivals: inter-arrival times $T_1, T_2, \ldots$ are iid $\text{Exp}(\lambda)$; $N(t) = \max\{n : T_1 + \cdots + T_n \leq t\}$.

Marginal distribution: $N(t) \sim \text{Poisson}(\lambda t)$, i.e., $P(N(t) = k) = e^{-\lambda t}(\lambda t)^k / k!$.

Independent increments: for $s < t$, $N(t) - N(s) \sim \text{Poisson}(\lambda(t-s))$ and is independent of $\mathcal{F}_s$.

Superposition: the merge of $k$ independent Poisson processes with rates $\lambda_1, \ldots, \lambda_k$ is Poisson with rate $\sum_i \lambda_i$.

Thinning: independently keep each arrival with probability $p$; the result is Poisson with rate $\lambda p$.

Conditioning: given $N(t) = n$, the $n$ arrival times are distributed as the order statistics of $n$ iid $\text{Uniform}[0,t]$ random variables.
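Several of these properties can be checked by simulating the process from its inter-arrival definition. A sketch (the rate, horizon, and thinning probability are arbitrary illustrative choices):

```python
import random

random.seed(1)
lam, t_end, p = 3.0, 1.0, 0.4  # assumed rate, horizon, thinning probability

def poisson_arrivals(lam, t_end):
    """Arrival times in [0, t_end] built from iid Exp(lam) gaps."""
    times, t = [], random.expovariate(lam)
    while t <= t_end:
        times.append(t)
        t += random.expovariate(lam)
    return times

counts, thinned = [], []
for _ in range(20_000):
    arr = poisson_arrivals(lam, t_end)
    counts.append(len(arr))
    thinned.append(sum(1 for _ in arr if random.random() < p))

print(sum(counts) / len(counts))    # ~ lam * t_end = 3.0
print(sum(thinned) / len(thinned))  # ~ lam * p * t_end = 1.2 (thinning)
```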

Birth-Death Chains

A birth-death chain has states $\{0, 1, 2, \ldots\}$ with transitions only between adjacent states: birth rate $\lambda_i$ from state $i$ to $i+1$, death rate $\mu_i$ from $i$ to $i-1$, with $\mu_0 = 0$.

The detailed balance equations reduce to a product formula for the stationary distribution:

$$\pi_i = \pi_0 \prod_{k=0}^{i-1} \frac{\lambda_k}{\mu_{k+1}}.$$

M/M/1 queue: arrivals at rate $\lambda$, service at rate $\mu$, $\rho = \lambda/\mu$. Stationary distribution: $\pi_i = (1-\rho)\rho^i$ for $\rho < 1$. Mean queue length: $E[N] = \rho/(1-\rho)$; mean time in system (via Little's law): $E[W] = 1/(\mu - \lambda)$.
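Specializing the birth-death product formula to constant rates $\lambda_i = \lambda$, $\mu_i = \mu$ recovers the geometric stationary distribution, which can be checked numerically by truncating the infinite sum (rates taken from the M/M/1 discussion):

```python
lam, mu = 90.0, 100.0
rho = lam / mu

# Product formula with constant rates: pi_i = pi_0 * rho^i, pi_0 = 1 - rho.
N = 2000  # truncation point, far into the geometric tail
pi = [(1 - rho) * rho**i for i in range(N)]
mean_len = sum(i * p for i, p in enumerate(pi))

print(round(sum(pi), 6))   # ~1.0: the distribution normalizes
print(round(mean_len, 3))  # ~ rho / (1 - rho) = 9.0
```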

Renewal Theory

A renewal process generalizes Poisson by allowing non-exponential inter-arrival times $T_i \sim F$ with mean $\mu_T = E[T_i]$.

Elementary renewal theorem: $m(t) = E[N(t)]$ satisfies $m(t)/t \to 1/\mu_T$ as $t \to \infty$.

Inspection paradox: the renewal interval containing a fixed time tt has length distribution stochastically greater than FF. This is why buses always seem late — you are more likely to arrive during a long gap.
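The size bias is easy to see in simulation: probe a renewal stream at a fixed time and record the length of the interval you land in. A sketch with $\text{Uniform}(0, 2)$ inter-arrivals (mean 1, $E[T^2]/E[T] = 4/3$); the probe time and sample count are arbitrary:

```python
import random

random.seed(2)

def interval_at(t_probe, draw):
    """Length of the renewal interval containing t_probe."""
    t = 0.0
    while True:
        gap = draw()
        if t + gap > t_probe:
            return gap
        t += gap

draw = lambda: random.uniform(0.0, 2.0)  # non-exponential gaps, mean 1
samples = [interval_at(50.0, draw) for _ in range(20_000)]
size_biased = sum(samples) / len(samples)

print(round(size_biased, 3))  # ~ E[T^2]/E[T] = 4/3, not the mean gap of 1
```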

Worked Examples

Example 1: Waiting Times for Rare Events

In a network, packet errors occur as a Poisson process with rate $\lambda = 0.01$ errors/second.

Mean time to first error: $E[T_1] = 1/\lambda = 100$ seconds.

Probability of an error-free period of length $t = 60$ s: $P(N(60) = 0) = e^{-0.6} \approx 0.55$.

Expected errors in 1000 seconds: $E[N(1000)] = 10$, with $\text{Var}[N(1000)] = 10$ (Poisson: mean = variance).

Overdispersion check: if observed variance $\gg$ mean, the Poisson assumption fails (e.g., bursty errors form a clustered process). A variance-to-mean ratio of $1$ is the Poisson diagnostic.
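The variance-to-mean diagnostic can be demonstrated by simulating the error process itself (the replication count is an arbitrary choice):

```python
import random

random.seed(3)
lam, T = 0.01, 1000.0  # rate and window length from the example

def count_events(lam, T):
    """N(T): number of Exp(lam) inter-arrivals fitting in [0, T]."""
    n, t = 0, random.expovariate(lam)
    while t <= T:
        n += 1
        t += random.expovariate(lam)
    return n

counts = [count_events(lam, T) for _ in range(5_000)]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(round(mean, 2), round(var / mean, 2))  # mean ~10, VMR ~1 for Poisson
```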

Example 2: M/M/1 Queue Near Saturation

A web server handles requests: $\lambda = 90$/sec, $\mu = 100$/sec, $\rho = 0.9$.

Mean queue length: $E[N] = 0.9/0.1 = 9$ requests. Mean response time: $E[W] = 1/(100-90) = 0.1$ seconds.

At $\rho = 0.99$: $E[N] = 99$, $E[W] = 1$ second — 10× worse for a 10% load increase. Near $\rho = 1$, the system enters congestion collapse. Real CDN capacity planning provisions $\rho < 0.7$ to maintain sub-100ms latency.
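The blow-up near saturation follows directly from $E[N] = \rho/(1-\rho)$ and $E[W] = 1/(\mu-\lambda)$. A short table (the 70/sec row is an extra illustrative load, not from the example):

```python
mu = 100.0  # service rate, requests/sec
for lam in (70.0, 90.0, 99.0):  # offered loads; 90 and 99 match the example
    rho = lam / mu
    EN = rho / (1 - rho)   # mean number in system
    EW = 1.0 / (mu - lam)  # mean response time via Little's law
    print(f"rho={rho:.2f}  E[N]={EN:6.2f}  E[W]={EW * 1000:6.1f} ms")
```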

Example 3: Gillespie Algorithm for Chemical Kinetics

A system has two species $A$ and $B$ with reactions $A \xrightarrow{k_1} B$ and $B \xrightarrow{k_2} A$. At state $(n_A, n_B)$, the total rate is $r = k_1 n_A + k_2 n_B$.

The Gillespie algorithm: time to next event $\Delta t \sim \text{Exp}(r)$; event type $A \to B$ with probability $k_1 n_A / r$. The stationary distribution is binomial: $\pi_{n_A} \propto \binom{N}{n_A}(k_2/k_1)^{n_A}$, i.e., each of the $N$ molecules is independently in state $A$ with probability $k_2/(k_1+k_2)$. This is exact stochastic kinetics with no mean-field approximation.
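A direct implementation of the two-reaction Gillespie loop, tracking the time-weighted average of $n_A$ (the rate constants and molecule count are arbitrary illustrative choices):

```python
import random

random.seed(4)
k1, k2, N = 1.0, 2.0, 50  # assumed rate constants and total molecule count

n_A = N
t_weighted, t_total = 0.0, 0.0
for _ in range(200_000):
    r = k1 * n_A + k2 * (N - n_A)  # total event rate at state (n_A, N - n_A)
    dt = random.expovariate(r)     # Exp(r) time to the next reaction
    t_weighted += n_A * dt         # accumulate the time average of n_A
    t_total += dt
    if random.random() < k1 * n_A / r:
        n_A -= 1                   # A -> B fired
    else:
        n_A += 1                   # B -> A fired

avg_nA = t_weighted / t_total
print(round(avg_nA, 1))  # ~ N * k2 / (k1 + k2) = 33.3
```

The time average converges to the binomial stationary mean $N k_2/(k_1+k_2)$, with no ODE or mean-field step anywhere in the loop.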

Connections

Where Your Intuition Breaks

The Poisson process is memoryless in the sense that the waiting time for the next arrival is independent of how long you have been waiting. But this does not mean the process is memoryless about the total count: $N(t)$ grows steadily with time, and the process is not stationary. More subtly, the inspection paradox undermines the intuition that "a random time lands in an interval of average length": if you sample a uniformly random time in $[0, T]$, you are more likely to land inside a long inter-arrival interval than a short one. The average length of the interval containing your random time is $E[T_1^2]/E[T_1]$ — strictly greater than $E[T_1]$ unless all intervals are equal. The memorylessness of the exponential keeps the residual wait $\text{Exp}(\lambda)$ no matter when you arrive, but it does not eliminate the paradox: for the Poisson process the interval straddling your arrival averages $2/\lambda$, twice a typical inter-arrival time, and the effect persists for any renewal process.

💡Intuition

The generator $Q$ is the continuous-time analog of $P - I$. The Kolmogorov forward equation $\dot\mu = \mu Q$ is the probability analog of a first-order linear ODE. When $Q$ has eigenvalues $-\gamma_k$ with $\gamma_k > 0$ (besides the eigenvalue $0$ carrying the stationary distribution), the transient terms decay as $e^{-\gamma_k t}$, and the system relaxes to equilibrium at the rate of the smallest such $\gamma_1$ — the spectral gap of the generator, analogous to the discrete case. This makes eigendecomposition of $Q$ the primary tool for analyzing relaxation times.
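For the two-state generator $Q = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$ the eigenvalues are $0$ and $-(a+b)$, so the spectral gap is $a+b$ and convergence is a single exponential. A sketch with assumed rates, confirming the decay rate:

```python
import math

a, b = 2.0, 1.0
gamma_1 = a + b    # spectral gap: the nonzero eigenvalue of Q is -(a+b)
pi0 = b / (a + b)  # stationary probability of state 0

def mu0(t):
    """P(X_t = 0) starting from state 0 (closed form for the 2-state chain)."""
    return pi0 + (1.0 - pi0) * math.exp(-gamma_1 * t)

# Distance to equilibrium shrinks by e^{-gamma_1 * dt} over any step dt.
d_half = abs(mu0(0.5) - pi0)
d_one = abs(mu0(1.0) - pi0)
print(round(d_one / d_half, 4))  # = e^{-gamma_1 * 0.5} ~ 0.2231
```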

💡Intuition

Poisson processes are maximally unstructured arrival streams. The Poisson process is the unique renewal process with both independent and stationary increments simultaneously. It is the maximum entropy point process for a given mean rate — knowing only the mean intensity fully specifies the distribution. This is why Poisson is the default model for counts of independent events: it uses the least information beyond the mean. Departures from Poisson (overdispersion, clustering, periodic structure) are meaningful signals about structure in the data-generating process.

⚠️Warning

Little's law is exact but requires stationarity. $L = \lambda W$ (average number in system = arrival rate × average time in system) holds for any stable queueing system under very general assumptions — not just M/M/1. But it requires the system to be in steady state. During transients (e.g., a server starting up or experiencing a traffic spike), $L = \lambda W$ can fail dramatically. In production observability, computing latency from throughput and queue depth via Little's law is valid only if the system is stationary; measurement during traffic spikes gives misleading estimates.
