Itô Calculus & Stochastic Differential Equations

Itô calculus extends differentiation and integration to stochastic processes driven by Brownian motion. The key departure from classical calculus is that Brownian paths have nonzero quadratic variation — $(dB)^2 = dt$ — which generates an extra second-derivative term in the chain rule and makes stochastic differential equations a distinct object from their deterministic counterparts.

Concepts

Classical calculus was built for smooth curves: functions you can zoom in on until they look like straight lines. Brownian motion breaks this: zoom in on any Brownian path at any scale and it looks just as jagged. The total length of a Brownian path over any interval is almost surely infinite, so the pathwise Riemann-Stieltjes integral against it is not defined. Itô calculus rebuilds integration from scratch for these paths, and the price of doing so is a correction term in the chain rule that has no classical analogue: not a small perturbation, but a term of the same order as the drift.

The Itô Integral

The classical Riemann-Stieltjes integral $\int_0^T f(t)\,dB_t$ fails because Brownian motion has infinite total variation. The Itô integral is defined as an $L^2$ limit of left-Riemann sums over adapted integrands:

$\int_0^T H_t\,dB_t = \lim_{\|\Pi\| \to 0} \sum_{k} H_{t_k}(B_{t_{k+1}} - B_{t_k}),$

where $H$ is adapted to $(\mathcal{F}_t)$ and $E[\int_0^T H_t^2\,dt] < \infty$. Evaluating $H$ at the left endpoint $t_k$ (rather than the midpoint or right endpoint) is essential: it gives the integral its martingale property.

The left endpoint is the only evaluation point that makes $\int_0^t H_s\,dB_s$ a martingale, the mathematical expression of "no foresight": the integrand is fixed before the increment it multiplies is revealed. Using the midpoint (Stratonovich) yields a different integral that obeys the classical chain rule but loses the martingale property. The choice is therefore a modeling decision rather than a matter of notation: Itô preserves the probabilistic structure needed to analyze stochastic systems, while Stratonovich is preferred in physics, where the chain rule must hold for smooth approximations of physical noise.

Itô isometry: $E\!\left[\left(\int_0^T H_t\,dB_t\right)^2\right] = E\!\left[\int_0^T H_t^2\,dt\right].$

The Itô integral $M_t = \int_0^t H_s\,dB_s$ is a martingale (not just a local martingale) when $E[\int_0^T H_s^2\,ds] < \infty$.
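The isometry and the zero-mean (martingale) property can be checked by simulation. The sketch below is a minimal Monte Carlo check, assuming the integrand $H_t = B_t$ on $[0, 1]$ and a fixed grid; the grid size, path count, and seed are illustrative choices, not part of the theory.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 50, 50_000
dt = T / n_steps

# Brownian increments and paths; B[:, k] is the value at the grid point t_k.
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)
H = B[:, :-1]                                  # adapted integrand H_t = B_t at left endpoints

ito_integral = np.sum(H * dB, axis=1)          # left-endpoint sums, one per path
lhs = np.mean(ito_integral**2)                 # estimate of E[(int H dB)^2]
rhs = np.mean(np.sum(H**2, axis=1) * dt)       # estimate of E[int H^2 dt]
print(lhs, rhs, np.mean(ito_integral))         # lhs and rhs near T**2 / 2, mean near 0
```

For this integrand both sides equal $T^2/2$ in the continuous limit; the discrete sums sit slightly below that because the grid is coarse.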

Contrast with Stratonovich integral: $\int_0^T H_t \circ dB_t = \lim \sum H_{(t_k+t_{k+1})/2}(B_{t_{k+1}}-B_{t_k})$ uses midpoints. Stratonovich satisfies the classical chain rule but loses the martingale property. Itô is used in finance (self-financing portfolios); Stratonovich appears in physics (Wong-Zakai theorem for physical noise).
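The difference between the two integrals is visible on a single simulated path. The sketch below computes both sums for $\int_0^T B_t\,dB_t$. One assumption to flag: the Stratonovich sum uses the average of the two endpoint values, $\frac{1}{2}(B_{t_k} + B_{t_{k+1}})$, instead of the value at the midpoint time, since that avoids simulating a finer grid and has the same limit for this integrand.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n = 1.0, 100_000
dB = rng.normal(0.0, np.sqrt(T / n), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])        # B[k] is the path at t_k

ito_sum = np.sum(B[:-1] * dB)                     # left endpoint
strat_sum = np.sum(0.5 * (B[:-1] + B[1:]) * dB)   # endpoint average

B_T = B[-1]
print(ito_sum, (B_T**2 - T) / 2)    # Ito limit:          (B_T^2 - T) / 2
print(strat_sum, B_T**2 / 2)        # Stratonovich limit:  B_T^2 / 2
```

The two sums differ by roughly $T/2$, which is half the quadratic variation of the path.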

Itô's Lemma (Stochastic Chain Rule)

Let $X_t$ be an Itô process: $dX_t = \mu_t\,dt + \sigma_t\,dB_t$. For $f \in C^2$:

$df(X_t) = f'(X_t)\,dX_t + \frac{1}{2}f''(X_t)\,d[X,X]_t = f'(X_t)(\mu_t\,dt + \sigma_t\,dB_t) + \frac{1}{2}f''(X_t)\sigma_t^2\,dt.$

The Itô correction term $\frac{1}{2}f''(X_t)\sigma_t^2\,dt$ arises from the quadratic variation $d[X,X]_t = \sigma_t^2\,dt$. In the formal Itô multiplication table: $(dt)^2 = 0$, $dt \cdot dB_t = 0$, $(dB_t)^2 = dt$.
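Itô's lemma can be checked along one simulated path. The sketch below assumes the simplest case $X_t = B_t$ (so $\mu_t = 0$, $\sigma_t = 1$) and the test function $f(x) = \sin x$, both chosen only for illustration, and compares $f(B_T) - f(B_0)$ with the discretized right-hand side, with and without the correction term.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 200_000
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), size=n)
B = np.concatenate([[0.0], np.cumsum(dB)])
X = B[:-1]                                     # left endpoints of each step

# f(x) = sin(x), so f'(x) = cos(x) and f''(x) = -sin(x).
stochastic_part = np.sum(np.cos(X) * dB)       # approximates int f'(B) dB
correction = np.sum(0.5 * (-np.sin(X)) * dt)   # approximates int (1/2) f''(B) dt

actual = np.sin(B[-1]) - np.sin(B[0])
print(actual, stochastic_part + correction, stochastic_part)
```

On a typical path the sum with the correction matches the actual change closely, while the stochastic part alone is off by the size of the correction integral.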

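Multidimensional Itô lemma: for $\mathbf{X}_t \in \mathbb{R}^d$ with $d\mathbf{X}_t = \boldsymbol{\mu}_t\,dt + \boldsymbol{\sigma}_t\,d\mathbf{B}_t$ ($\mathbf{B}_t \in \mathbb{R}^m$, $\boldsymbol{\sigma}_t \in \mathbb{R}^{d \times m}$) and $f$ once continuously differentiable in $t$ and twice in $\mathbf{x}$: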

$df(t, \mathbf{X}_t) = \frac{\partial f}{\partial t}\,dt + \nabla f \cdot d\mathbf{X}_t + \frac{1}{2}\text{tr}(\boldsymbol{\sigma}_t \boldsymbol{\sigma}_t^T \nabla^2 f)\,dt.$

The Hessian term $\frac{1}{2}\text{tr}(\Sigma_t \nabla^2 f)$ involves the instantaneous covariance matrix $\Sigma_t = \boldsymbol{\sigma}_t \boldsymbol{\sigma}_t^T$.

Stochastic Differential Equations

A stochastic differential equation (SDE) in Itô form is:

$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dB_t, \quad X_0 = x_0.$

The drift $\mu$ and diffusion coefficient $\sigma$ can depend on the current state and time. The SDE is a shorthand for the integral equation $X_t = x_0 + \int_0^t \mu(X_s, s)\,ds + \int_0^t \sigma(X_s, s)\,dB_s$.

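Existence and uniqueness (strong solutions): if $\mu$ and $\sigma$ are globally Lipschitz in $x$, $|\mu(x,t) - \mu(y,t)| + |\sigma(x,t) - \sigma(y,t)| \le K|x - y|$, and satisfy the linear growth bound $|\mu(x,t)| + |\sigma(x,t)| \le K(1 + |x|)$ for all $t \in [0, T]$, then a unique strong solution exists on $[0, T]$.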
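The integral form of the SDE suggests a direct step-by-step approximation: freeze the coefficients at the start of each small interval and add a Gaussian increment. The sketch below is a minimal Euler-Maruyama simulator under that assumption. The function name `euler_maruyama`, its signature, and the example coefficients are hypothetical choices for illustration.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, n_paths, rng):
    """Approximate dX = mu(X, t) dt + sigma(X, t) dB on [0, T]; returns X_T per path."""
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        dB = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + mu(x, t) * dt + sigma(x, t) * dB   # coefficients at the left endpoint
    return x

rng = np.random.default_rng(3)
# Illustrative linear coefficients: dX = 0.1 X dt + 0.3 X dB, X_0 = 1.
x_T = euler_maruyama(lambda x, t: 0.1 * x, lambda x, t: 0.3 * x,
                     x0=1.0, T=1.0, n_steps=200, n_paths=50_000, rng=rng)
print(x_T.mean(), np.exp(0.1))   # sample mean vs exact mean X_0 * exp(0.1 * T)
```

Evaluating the coefficients at the left endpoint mirrors the construction of the Itô integral, so this scheme targets the Itô interpretation of the equation.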

Fokker-Planck equation: the probability density $p(x, t)$ of the solution satisfies:

$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}[\mu(x,t)p] + \frac{1}{2}\frac{\partial^2}{\partial x^2}[\sigma(x,t)^2 p].$

This is the forward equation — it evolves the density forward in time given the SDE coefficients.
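A case with a known density makes the forward equation concrete. The sketch below assumes constant coefficients (values chosen for illustration), for which $X_t = \mu t + \sigma B_t$ started at $0$ has density $\mathcal{N}(\mu t, \sigma^2 t)$, and checks by central finite differences that this density satisfies the equation.

```python
import numpy as np

mu, sigma = 0.5, 0.8          # constant drift and diffusion (illustrative values)

def p(x, t):
    """Density of X_t = mu*t + sigma*B_t with X_0 = 0, i.e. N(mu*t, sigma^2 * t)."""
    var = sigma**2 * t
    return np.exp(-(x - mu * t)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = np.linspace(-2.0, 3.0, 101)
t, h = 1.0, 1e-3

# Central finite differences for each term of the forward equation.
dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
dp_dx = (p(x + h, t) - p(x - h, t)) / (2 * h)
d2p_dx2 = (p(x + h, t) - 2 * p(x, t) + p(x - h, t)) / h**2

residual = dp_dt - (-mu * dp_dx + 0.5 * sigma**2 * d2p_dx2)
print(np.max(np.abs(residual)))   # small, limited by finite-difference error
```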

Important SDEs:

SDE	Name	Application
$dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$	Geometric Brownian Motion	Stock prices (Black-Scholes)
$dX_t = \theta(\mu - X_t)\,dt + \sigma\,dB_t$	Ornstein-Uhlenbeck	Mean-reverting process
$dX_t = -\nabla U(X_t)\,dt + \sqrt{2T}\,dB_t$	Langevin SDE	MCMC sampling
$dX_t = f(X_t)\,dt + \sigma\,dB_t$	General Itô diffusion	Neural SDEs, physics simulation

The Feynman-Kac Formula

For the SDE $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t$, a terminal payoff $g$, and a discount rate $r$, the function:

$u(x, t) = E\!\left[g(X_T) e^{-\int_t^T r(X_s)\,ds} \,\middle|\, X_t = x\right]$

satisfies the Feynman-Kac PDE (the backward Kolmogorov equation with an added discount term $-ru$):

$\frac{\partial u}{\partial t} + \mu(x)\frac{\partial u}{\partial x} + \frac{1}{2}\sigma(x)^2 \frac{\partial^2 u}{\partial x^2} - r(x)u = 0, \quad u(x,T) = g(x).$

This converts the PDE into a stochastic expectation — the basis of Monte Carlo PDE solvers and the Black-Scholes option pricing formula (GBM with $r =$ risk-free rate, $g = \max(S_T - K, 0)$ ).
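As a quick sanity check of the formula, take the simplest case $\mu = 0$, $\sigma = 1$, $r = 0$ and $g(x) = x^2$ (choices made here for illustration only). Then $X$ is a Brownian motion started from $x$ at time $t$, and

$u(x, t) = E[(x + B_T - B_t)^2] = x^2 + (T - t).$

Differentiating gives $\partial_t u = -1$ and $\partial_{xx} u = 2$, so

$\frac{\partial u}{\partial t} + \frac{1}{2}\frac{\partial^2 u}{\partial x^2} = -1 + 1 = 0, \quad u(x, T) = x^2 = g(x),$

which is the PDE above with the drift and discount terms switched off.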

Worked Examples

Example 1: Geometric Brownian Motion and Log-Normal Prices

Apply Itô's lemma to $f(X_t) = \log X_t$ where $dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$:

$d(\log X_t) = \frac{1}{X_t}dX_t - \frac{1}{2}\frac{1}{X_t^2}(dX_t)^2 = \mu\,dt + \sigma\,dB_t - \frac{1}{2}\sigma^2\,dt = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dB_t.$

Integrating: $\log X_t = \log X_0 + (\mu - \sigma^2/2)t + \sigma B_t$, so $X_t = X_0 \exp((\mu - \sigma^2/2)t + \sigma B_t)$.

$X_t$ is log-normal: $E[X_t] = X_0 e^{\mu t}$, $\text{Var}[X_t] = X_0^2 e^{2\mu t}(e^{\sigma^2 t} - 1)$.

The Itô correction $-\sigma^2/2$ in the drift of $\log X_t$ explains why $E[\log X_t] = \log X_0 + (\mu - \sigma^2/2)t < \log E[X_t]$ — Jensen's inequality for the concave log function. A naive "apply log to the SDE" without Itô's lemma would miss this term.
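The gap between $E[\log X_t]$ and $\log E[X_t]$ is easy to see numerically by sampling the exact solution. In the sketch below the parameter values are illustrative and chosen so that $\mu = \sigma^2/2$: the log-price then has zero drift even though the mean price grows.

```python
import numpy as np

rng = np.random.default_rng(4)
x0, mu, sigma, t = 1.0, 0.08, 0.4, 2.0      # illustrative values with mu = sigma^2 / 2
n_paths = 400_000

B_t = rng.normal(0.0, np.sqrt(t), size=n_paths)              # B_t ~ N(0, t)
X_t = x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B_t)   # exact GBM solution

mean_X = X_t.mean()
mean_log_X = np.log(X_t).mean()
print(mean_X, x0 * np.exp(mu * t))                           # E[X_t] = X_0 e^{mu t}
print(mean_log_X, np.log(x0) + (mu - 0.5 * sigma**2) * t)    # drift of log is mu - sigma^2/2
```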

Example 2: Ornstein-Uhlenbeck Process

$dX_t = \theta(\mu - X_t)\,dt + \sigma\,dB_t$, $X_0 = x_0$, with mean-reversion rate $\theta > 0$. This SDE has a linear drift and additive noise. The exact solution:

$X_t = \mu + (x_0 - \mu)e^{-\theta t} + \sigma \int_0^t e^{-\theta(t-s)}\,dB_s.$

Stationary distribution (for $\theta > 0$): $X_\infty \sim \mathcal{N}(\mu, \sigma^2/(2\theta))$, a mean-reverting Gaussian. Stationary autocovariance: $\text{Cov}(X_t, X_{t+h}) = \frac{\sigma^2}{2\theta}e^{-\theta h}$ for $h \ge 0$, so the autocorrelation decays exponentially as $e^{-\theta h}$. The OU process is the continuous-time analog of an AR(1) model.
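The AR(1) analogy can be made exact: sampling the solution formula on a grid of spacing $\Delta t$ gives $X_{k+1} = \mu + a(X_k - \mu) + s Z_k$ with $a = e^{-\theta \Delta t}$ and $s^2 = \frac{\sigma^2}{2\theta}(1 - a^2)$. The sketch below simulates this recursion and compares the sample variance and lagged autocovariance with the stationary formulas. Parameter values, the burn-in length, and the lag are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
theta, mu, sigma = 1.5, 2.0, 0.6     # illustrative parameter values
dt, n_steps, lag = 0.05, 400_000, 10

# Exact one-step transition of the OU process: an AR(1) recursion.
a = np.exp(-theta * dt)
s = sigma * np.sqrt((1.0 - a**2) / (2.0 * theta))

z = rng.normal(size=n_steps)
x = np.empty(n_steps + 1)
x[0] = mu                            # start at the long-run mean
for k in range(n_steps):
    x[k + 1] = mu + a * (x[k] - mu) + s * z[k]

x = x[1000:]                         # discard a short burn-in
stat_var = sigma**2 / (2.0 * theta)
d = x - x.mean()
print(x.var(), stat_var)
print(np.mean(d[:-lag] * d[lag:]), stat_var * np.exp(-theta * lag * dt))
```

Because the transition is exact, the grid spacing introduces no discretization bias here; only sampling error remains.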

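ML connection: stochastic gradient descent with weight decay is often modeled in continuous time by the SDE $d\theta_t = -\lambda\theta_t\,dt - \eta\nabla L(\theta_t)\,dt + \sqrt{2\eta T}\,dB_t$, where $\theta_t$ now denotes the model parameters (not the mean-reversion rate above) and $T$ is a temperature (not a time horizon). This is an OU process when $L$ is quadratic, for example near a local minimum. Under mild conditions its stationary distribution is the Gibbs distribution of the regularized loss, $\propto \exp\!\left(-\left(L(\theta) + \frac{\lambda}{2\eta}\|\theta\|^2\right)/T\right)$, which reduces to $e^{-L(\theta)/T}$ as $\lambda \to 0$.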

Example 3: Black-Scholes via Feynman-Kac

European call option: payoff $g(S_T) = (S_T - K)^+$ at expiry $T$, stock $dS_t = rS_t\,dt + \sigma S_t\,dB_t$ under the risk-neutral measure.

By Feynman-Kac, the option price $C(S, t) = e^{-r(T-t)}E[(S_T - K)^+ \mid S_t = S]$ satisfies:

$\frac{\partial C}{\partial t} + rS\frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} - rC = 0.$

The closed-form solution: $C = S\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)$, where $\Phi$ is the standard normal CDF and $d_{1,2} = \frac{\log(S/K) + (r \pm \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$.

The $\pm\sigma^2/2$ in $d_1$ vs $d_2$ is the Itô correction term appearing again. Without Itô's lemma, Black-Scholes cannot be derived correctly.
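The two sides of Feynman-Kac can be compared directly: the closed form solves the PDE, and a Monte Carlo average estimates the discounted expectation. The sketch below does both, assuming illustrative inputs; the helper names `black_scholes_call` and `monte_carlo_call` are hypothetical, and `tau` stands for the time to expiry $T - t$.

```python
import math
import numpy as np

def black_scholes_call(S, K, r, sigma, tau):
    """Closed-form call price; tau = T - t is the time to expiry."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF
    return S * Phi(d1) - K * math.exp(-r * tau) * Phi(d2)

def monte_carlo_call(S, K, r, sigma, tau, n_paths, rng):
    """Discounted mean payoff under the risk-neutral GBM, sampled from its exact solution."""
    Z = rng.normal(size=n_paths)
    S_T = S * np.exp((r - 0.5 * sigma**2) * tau + sigma * math.sqrt(tau) * Z)
    return math.exp(-r * tau) * np.mean(np.maximum(S_T - K, 0.0))

rng = np.random.default_rng(6)
args = dict(S=100.0, K=100.0, r=0.05, sigma=0.2, tau=1.0)   # illustrative inputs
closed = black_scholes_call(**args)
mc = monte_carlo_call(**args, n_paths=400_000, rng=rng)
print(closed, mc)
```

The Monte Carlo estimate agrees with the closed form up to sampling error, which shrinks like one over the square root of the number of paths.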

Connections

Where Your Intuition Breaks

The Itô correction term $\frac{1}{2}f''(X_t)\sigma_t^2\,dt$ looks like a small perturbation, but for $f(x) = x^2$ and $X_t = B_t$ it gives $d(B_t^2) = 2B_t\,dB_t + dt$. Integrating both sides yields $B_T^2 = 2\int_0^T B_t\,dB_t + T$: the correction contributes $T$, exactly the quadratic variation, not a negligible error. Without it, taking expectations would give $E[B_T^2] = 0$ (the Itô integral has mean zero), but $B_T \sim \mathcal{N}(0, T)$ so $E[B_T^2] = T$. The correction term is not a refinement of classical calculus; it is a genuinely new term that does not vanish as the partition is refined.

💡Intuition

$(dB)^2 = dt$ is the fundamental identity of stochastic calculus. In classical calculus, second-order terms $(dx)^2 \to 0$ as $dx \to 0$ . But Brownian motion's quadratic variation is $[B,B]_t = t$ — finite, deterministic, growing. This makes $(dB_t)^2 = dt$ a finite correction, not a negligible term. Every surprising result in stochastic calculus — Itô's lemma correction, the Black-Scholes $\sigma^2/2$ terms, the Fokker-Planck diffusion term — traces back to this single identity. Internalizing $(dB)^2 = dt$ is the key to reading SDEs fluently.

💡Intuition

The Fokker-Planck equation connects microscopic SDEs to macroscopic densities. A single SDE trajectory is a random path; the Fokker-Planck equation evolves the probability density of the ensemble. The drift term $-\partial_x(\mu p)$ is convection; the diffusion term $\frac{1}{2}\partial_{xx}(\sigma^2 p)$ is spreading. For Langevin dynamics $dX = -\nabla U\,dt + \sqrt{2T}\,dB$ , the stationary Fokker-Planck solution is $p^* \propto e^{-U(x)/T}$ — the Boltzmann distribution. This is why Langevin dynamics samples from the correct posterior: the SDE's stationary distribution is exactly the target.
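The Boltzmann claim can be tested on a target that is not Gaussian. The sketch below runs many independent Euler-Maruyama chains of the Langevin SDE for the illustrative potential $U(x) = x^4/4$ and compares the sample second moment with the value implied by $p^* \propto e^{-U(x)/T}$. The step size, chain count, and run length are assumptions, and the discretization leaves a small bias of the order of the step size.

```python
import math
import numpy as np

rng = np.random.default_rng(7)
temperature = 1.0                 # the T in sqrt(2T); renamed to avoid confusion with time
h, n_steps, n_chains = 0.01, 2000, 20_000

grad_U = lambda x: x**3           # U(x) = x^4 / 4, an illustrative non-Gaussian target

x = rng.normal(size=n_chains)     # arbitrary initial distribution
for _ in range(n_steps):
    noise = rng.normal(size=n_chains)
    x = x - grad_U(x) * h + math.sqrt(2.0 * temperature * h) * noise

# Under p* proportional to exp(-x^4 / (4 T)): E[x^2] = sqrt(4 T) * Gamma(3/4) / Gamma(1/4).
target_second_moment = math.sqrt(4.0 * temperature) * math.gamma(0.75) / math.gamma(0.25)
print(np.mean(x**2), target_second_moment)
```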

⚠️Warning

The Itô and Stratonovich interpretations call for different numerical discretizations. The Euler-Maruyama scheme $X_{t+\Delta t} \approx X_t + \mu(X_t)\Delta t + \sigma(X_t)\Delta B_t$ converges to the Itô solution. Heun-type (predictor-corrector) schemes converge to the Stratonovich solution, and higher-order schemes such as Milstein add a correction term whose form depends on the interpretation. For an SDE of the form $dX = f(X)\circ dB$ (Stratonovich), converting to Itô form adds a drift correction: $dX = \frac{1}{2}f'(X)f(X)\,dt + f(X)\,dB$. Numerical SDE libraries (torchsde, diffeqpy) distinguish Itô from Stratonovich, and using the wrong convention silently produces wrong results.
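The effect of the convention shows up clearly on the equation with $f(X) = X$ and no drift. Read as Itô, $dX = X\,dB$ has solution $X_0 e^{B_t - t/2}$; read as Stratonovich, the drift correction $\frac{1}{2}X\,dt$ gives $X_0 e^{B_t}$. The sketch below drives Euler-Maruyama and a Heun predictor-corrector step with the same Brownian increments. Using Heun as the Stratonovich scheme is a choice made here for illustration, and the grid size is an assumption.

```python
import numpy as np

rng = np.random.default_rng(8)
T, n = 1.0, 20_000
dB = rng.normal(0.0, np.sqrt(T / n), size=n)

f = lambda x: x                   # diffusion coefficient f(X) = X, no drift term

x_em, x_heun = 1.0, 1.0
for db in dB:
    x_em = x_em + f(x_em) * db                            # Euler-Maruyama step
    x_pred = x_heun + f(x_heun) * db                      # Heun predictor
    x_heun = x_heun + 0.5 * (f(x_heun) + f(x_pred)) * db  # Heun corrector

B_T = dB.sum()
print(x_em, np.exp(B_T - 0.5 * T))    # Ito solution of dX = X dB
print(x_heun, np.exp(B_T))            # Stratonovich solution of dX = X o dB
```

Same noise, same coefficient, and the two schemes end up apart by a factor close to $e^{T/2}$.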

