Itô Calculus & Stochastic Differential Equations

Itô calculus extends differentiation and integration to stochastic processes driven by Brownian motion. The key departure from classical calculus is that Brownian paths have nonzero quadratic variation, $(dB)^2 = dt$, which generates an extra second-derivative term in the chain rule and makes stochastic differential equations a distinct object from their deterministic counterparts.

Concepts

Classical calculus was built for smooth curves: functions you can zoom in on until they look like straight lines. Brownian motion breaks this: zoom in on a Brownian path at any scale and it looks just as jagged. The total length (total variation) of a Brownian path over any interval is infinite, so the ordinary Riemann-Stieltjes integral against such a path is not defined. Itô calculus rebuilds integration from scratch for these paths, and the price of doing so is a correction term in the chain rule that has no classical analogue. It is not a small perturbation but a term of the same order as the drift.
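The contrast between infinite length and finite quadratic variation can be seen numerically. The following is a minimal sketch, assuming NumPy and a simulated path on a uniform grid over $[0, 1]$; the function name `variations` and the seed are illustrative choices, not part of the notes.

```python
import numpy as np

def variations(n_steps, T=1.0, seed=0):
    """Return (total variation, quadratic variation) of one simulated Brownian path."""
    rng = np.random.default_rng(seed)
    # Brownian increments over a uniform grid: independent N(0, dt) draws
    dB = rng.normal(0.0, np.sqrt(T / n_steps), size=n_steps)
    total_variation = np.abs(dB).sum()
    quadratic_variation = (dB ** 2).sum()
    return total_variation, quadratic_variation

# Refining the grid: total variation keeps growing, quadratic variation settles near T
for n in (10**2, 10**4, 10**6):
    tv, qv = variations(n)
    print(f"n={n:>8d}  total variation={tv:9.2f}  quadratic variation={qv:.4f}")
```

As the grid is refined, the sum of absolute increments grows roughly like $\sqrt{n}$ while the sum of squared increments stays close to $T$.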

The Itô Integral

The classical Riemann-Stieltjes integral $\int_0^T f(t)\,dB_t$ fails because Brownian motion has infinite total variation. The Itô integral is defined as an $L^2$ limit of left-Riemann sums over adapted integrands:

$$\int_0^T H_t\,dB_t = \lim_{\|\Pi\| \to 0} \sum_{k} H_{t_k}(B_{t_{k+1}} - B_{t_k}),$$

where $H$ is adapted to $(\mathcal{F}_t)$ and $E[\int_0^T H_t^2\,dt] < \infty$. The left endpoint (vs midpoint or right endpoint) is essential: it gives the integral its martingale property.

Left-endpoint evaluation is the choice that makes $\int_0^t H_s\,dB_s$ a martingale, the mathematical expression of "no foresight": the integrand is fixed before the increment it multiplies is revealed. Using the midpoint (Stratonovich) yields a different integral that obeys the classical chain rule but loses the martingale property. The choice is more than notation, because the two integrals generally take different values. The Itô form preserves the probabilistic structure needed to analyze stochastic systems, while the Stratonovich form is preferred in physics, where the chain rule must hold for smooth approximations of physical noise.

Itô isometry: $E\!\left[\left(\int_0^T H_t\,dB_t\right)^2\right] = E\!\left[\int_0^T H_t^2\,dt\right].$

The Itô integral $M_t = \int_0^t H_s\,dB_s$ is a martingale (not just a local martingale) when $E[\int_0^T H_s^2\,ds] < \infty$.
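A quick Monte Carlo check of the mean-zero property and the isometry, as a sketch under stated assumptions: the integrand is taken to be $H_t = B_t$ on $[0, 1]$, so the isometry predicts a second moment of $\int_0^1 E[B_t^2]\,dt = 1/2$. The function name, path count, and step count are illustrative.

```python
import numpy as np

def ito_integral_of_B(n_paths=20_000, n_steps=200, T=1.0, seed=0):
    """Left-endpoint sums approximating int_0^T B_t dB_t, one value per path."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    B_right = np.cumsum(dB, axis=1)  # B at the right endpoint of each step
    B_left = B_right - dB            # B at the left endpoint of each step
    return np.sum(B_left * dB, axis=1)

I = ito_integral_of_B()
print("sample mean of the integral:", I.mean())          # should be close to 0
print("sample second moment       :", (I ** 2).mean())   # should be close to T^2 / 2
```

Both printed values are sample estimates, so they match the theoretical values only up to Monte Carlo and discretization error.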

Contrast with the Stratonovich integral: $\int_0^T H_t \circ dB_t = \lim \sum_k H_{(t_k+t_{k+1})/2}(B_{t_{k+1}}-B_{t_k})$ uses midpoints. Stratonovich satisfies the classical chain rule but loses the martingale property. Itô is used in finance (self-financing portfolios); Stratonovich appears in physics (Wong-Zakai theorem for physical noise).
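The difference between the two conventions shows up already for $\int_0^T B_t\,dB_t$. The sketch below, assuming NumPy and a single simulated path, evaluates the left-endpoint and midpoint sums on the same noise; the expected limits are $(B_T^2 - T)/2$ for Itô and $B_T^2/2$ for Stratonovich. The function name and grid construction are illustrative.

```python
import numpy as np

def ito_and_stratonovich_sums(n_intervals=100_000, T=1.0, seed=0):
    """Left-endpoint and midpoint Riemann sums for the integral of B against dB."""
    rng = np.random.default_rng(seed)
    # Fine grid with 2 * n_intervals steps: even indices are partition points,
    # odd indices are the time midpoints of the partition intervals
    dB = rng.normal(0.0, np.sqrt(T / (2 * n_intervals)), size=2 * n_intervals)
    B = np.concatenate(([0.0], np.cumsum(dB)))
    left, mid, right = B[0:-1:2], B[1::2], B[2::2]
    increments = right - left
    ito = np.sum(left * increments)    # integrand at the left endpoint
    strat = np.sum(mid * increments)   # integrand at the midpoint
    return ito, strat, B[-1]

ito, strat, B_T = ito_and_stratonovich_sums()
print("Ito sum         :", ito, " vs (B_T^2 - T)/2 =", (B_T**2 - 1.0) / 2)
print("Stratonovich sum:", strat, " vs B_T^2/2       =", B_T**2 / 2)
```

The two sums differ by about $T/2$, which is half the quadratic variation accumulated over the interval.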

Itô's Lemma (Stochastic Chain Rule)

Let $X_t$ be an Itô process: $dX_t = \mu_t\,dt + \sigma_t\,dB_t$. For $f \in C^2$:

$$df(X_t) = f'(X_t)\,dX_t + \frac{1}{2}f''(X_t)\,d[X,X]_t = f'(X_t)(\mu_t\,dt + \sigma_t\,dB_t) + \frac{1}{2}f''(X_t)\sigma_t^2\,dt.$$

The Itô correction term $\frac{1}{2}f''(X_t)\sigma_t^2\,dt$ arises from the quadratic variation $d[X,X]_t = \sigma_t^2\,dt$. In the formal Itô multiplication table: $(dt)^2 = 0$, $dt \cdot dB_t = 0$, $(dB_t)^2 = dt$.

Multidimensional Itô lemma: for $\mathbf{X}_t \in \mathbb{R}^d$ with $d\mathbf{X}_t = \boldsymbol{\mu}_t\,dt + \boldsymbol{\sigma}_t\,d\mathbf{B}_t$ ($\mathbf{B}_t \in \mathbb{R}^m$):

$$df(t, \mathbf{X}_t) = \frac{\partial f}{\partial t}\,dt + \nabla f \cdot d\mathbf{X}_t + \frac{1}{2}\text{tr}(\boldsymbol{\sigma}_t \boldsymbol{\sigma}_t^T \nabla^2 f)\,dt.$$

The Hessian term $\frac{1}{2}\text{tr}(\Sigma_t \nabla^2 f)$ involves the instantaneous covariance matrix $\Sigma_t = \boldsymbol{\sigma}_t \boldsymbol{\sigma}_t^T$.
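As a short worked instance of the multidimensional formula (an illustration added here, with the component notation $X^1_t, X^2_t$ assumed), take $d = 2$ and $f(x_1, x_2) = x_1 x_2$. Then $\nabla f = (x_2, x_1)$ and the Hessian has zeros on the diagonal and ones off the diagonal, so $\text{tr}(\Sigma_t \nabla^2 f) = 2(\Sigma_t)_{12}$ and

$$d(X^1_t X^2_t) = X^2_t\,dX^1_t + X^1_t\,dX^2_t + (\Sigma_t)_{12}\,dt.$$

This is the stochastic product rule: the classical product rule plus the covariation term $(\Sigma_t)_{12}\,dt = d[X^1, X^2]_t$, which vanishes when the two components are driven by independent noise.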

Stochastic Differential Equations

A stochastic differential equation (SDE) in Itô form is:

$$dX_t = \mu(X_t, t)\,dt + \sigma(X_t, t)\,dB_t, \quad X_0 = x_0.$$

The drift $\mu$ and diffusion coefficient $\sigma$ can depend on the current state and time. The SDE is a shorthand for the integral equation $X_t = x_0 + \int_0^t \mu(X_s, s)\,ds + \int_0^t \sigma(X_s, s)\,dB_s$.
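Discretizing the integral equation with left-endpoint evaluation gives the Euler-Maruyama scheme. The following is a minimal sketch, assuming scalar state and user-supplied callables `mu(x, t)` and `sigma(x, t)`; the function name and the example coefficients are illustrative, not a reference implementation.

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, n_steps, rng):
    """Simulate one path of dX = mu(X, t) dt + sigma(X, t) dB on a uniform grid."""
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dB = rng.normal(0.0, np.sqrt(dt))  # Brownian increment over [t_k, t_{k+1}]
        # Coefficients are evaluated at the left endpoint, matching the Ito integral
        x[k + 1] = x[k] + mu(x[k], t[k]) * dt + sigma(x[k], t[k]) * dB
    return t, x

# Example: a mean-reverting path with drift 2 * (1 - x) and constant noise 0.3
rng = np.random.default_rng(0)
t, x = euler_maruyama(lambda x, t: 2.0 * (1.0 - x), lambda x, t: 0.3, 0.0, 5.0, 5_000, rng)
print("final value:", x[-1])
```

Setting `sigma` to zero recovers the forward Euler method for the corresponding ordinary differential equation.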

Existence and uniqueness (strong solutions): if $\mu$ and $\sigma$ satisfy a Lipschitz condition in $x$ and a linear growth bound, then a unique strong solution exists.

Fokker-Planck equation: the probability density $p(x, t)$ of the solution satisfies:

$$\frac{\partial p}{\partial t} = -\frac{\partial}{\partial x}[\mu(x,t)p] + \frac{1}{2}\frac{\partial^2}{\partial x^2}[\sigma(x,t)^2 p].$$

This is the forward equation — it evolves the density forward in time given the SDE coefficients.
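For constant coefficients and $X_0 = 0$, the solution is Brownian motion with drift and its density is $\mathcal{N}(\mu t, \sigma^2 t)$. The sketch below checks by finite differences that this density satisfies the forward equation; the parameter values, step size `h`, and function names are assumptions made for the illustration.

```python
import numpy as np

def gaussian_density(x, t, mu=0.5, sigma=0.8):
    """Density of X_t for dX = mu dt + sigma dB started at X_0 = 0."""
    var = sigma**2 * t
    return np.exp(-(x - mu * t) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def fokker_planck_residual(x, t, mu=0.5, sigma=0.8, h=1e-3):
    """Finite-difference residual of dp/dt + mu dp/dx - (sigma^2 / 2) d2p/dx2."""
    p = lambda xx, tt: gaussian_density(xx, tt, mu, sigma)
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
    dp_dx = (p(x + h, t) - p(x - h, t)) / (2 * h)
    d2p_dx2 = (p(x + h, t) - 2 * p(x, t) + p(x - h, t)) / h**2
    return dp_dt - (-mu * dp_dx + 0.5 * sigma**2 * d2p_dx2)

x_grid = np.linspace(-2.0, 3.0, 11)
residual = fokker_planck_residual(x_grid, t=1.0)
print("max |residual|:", np.max(np.abs(residual)))
```

The residual is not exactly zero because of finite-difference truncation error, but it should be small compared with the size of the individual terms.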

Important SDEs:

| SDE | Name | Application |
| --- | --- | --- |
| $dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$ | Geometric Brownian Motion | Stock prices (Black-Scholes) |
| $dX_t = \theta(\mu - X_t)\,dt + \sigma\,dB_t$ | Ornstein-Uhlenbeck | Mean-reverting process |
| $dX_t = -\nabla U(X_t)\,dt + \sqrt{2T}\,dB_t$ | Langevin SDE | MCMC sampling |
| $dX_t = f(X_t)\,dt + \sigma\,dB_t$ | General Itô diffusion | Neural SDEs, physics simulation |

The Feynman-Kac Formula

For the SDE $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dB_t$, a terminal payoff $g$, and a discount rate $r$, the function

$$u(x, t) = E_x\!\left[g(X_T)\, e^{-\int_t^T r(X_s)\,ds}\right],$$

where $E_x$ denotes expectation conditional on $X_t = x$, satisfies the backward Kolmogorov PDE with discounting (Feynman-Kac):

$$\frac{\partial u}{\partial t} + \mu(x)\frac{\partial u}{\partial x} + \frac{1}{2}\sigma(x)^2 \frac{\partial^2 u}{\partial x^2} - r(x)u = 0, \quad u(x,T) = g(x).$$

This expresses the PDE solution as a stochastic expectation, the basis of Monte Carlo PDE solvers and of the Black-Scholes option pricing formula (GBM with $r$ the risk-free rate and $g(S_T) = \max(S_T - K, 0)$).
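A small Monte Carlo illustration of the Feynman-Kac representation, as a sketch under simplifying assumptions: $\mu = 0$, $\sigma = 1$ (so $X$ is Brownian motion), $g(x) = x^2$, and a constant rate $r$. In that case the PDE has the closed-form solution $u(x,t) = e^{-r(T-t)}\left(x^2 + (T-t)\right)$, which the simulation should reproduce. Function names and parameter values are illustrative.

```python
import math
import numpy as np

def feynman_kac_mc(x, t, T=1.0, r=0.05, n_paths=200_000, seed=0):
    """Monte Carlo estimate of E[g(X_T) exp(-r (T - t)) | X_t = x] for dX = dB, g(x) = x^2."""
    rng = np.random.default_rng(seed)
    # For Brownian motion, X_T given X_t = x is N(x, T - t), so it can be sampled exactly
    X_T = x + math.sqrt(T - t) * rng.standard_normal(n_paths)
    return math.exp(-r * (T - t)) * np.mean(X_T ** 2)

def feynman_kac_exact(x, t, T=1.0, r=0.05):
    """Closed-form solution of u_t + (1/2) u_xx - r u = 0 with u(x, T) = x^2."""
    return math.exp(-r * (T - t)) * (x ** 2 + (T - t))

estimate = feynman_kac_mc(1.0, 0.0)
exact = feynman_kac_exact(1.0, 0.0)
print("Monte Carlo:", estimate, " closed form:", exact)
```

For general coefficients the terminal value would be sampled by a path simulation scheme instead of the exact Gaussian draw used here.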

Worked Example

Example 1: Geometric Brownian Motion and Log-Normal Prices

Apply Itô's lemma to $f(X_t) = \log X_t$ where $dX_t = \mu X_t\,dt + \sigma X_t\,dB_t$:

$$d(\log X_t) = \frac{1}{X_t}dX_t - \frac{1}{2}\frac{1}{X_t^2}(dX_t)^2 = \mu\,dt + \sigma\,dB_t - \frac{1}{2}\sigma^2\,dt = \left(\mu - \frac{\sigma^2}{2}\right)dt + \sigma\,dB_t.$$

Integrating: $\log X_t = \log X_0 + (\mu - \sigma^2/2)t + \sigma B_t$, so $X_t = X_0 \exp((\mu - \sigma^2/2)t + \sigma B_t)$.

$X_t$ is log-normal: $E[X_t] = X_0 e^{\mu t}$, $\text{Var}[X_t] = X_0^2 e^{2\mu t}(e^{\sigma^2 t} - 1)$.

The Itô correction $-\sigma^2/2$ in the drift of $\log X_t$ explains why $E[\log X_t] = \log X_0 + (\mu - \sigma^2/2)t < \log E[X_t]$: this is Jensen's inequality for the concave log function. A naive "apply log to the SDE" without Itô's lemma would miss this term.
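The moments above can be checked by sampling the exact solution directly. This is a minimal sketch assuming NumPy; the parameter values ($\mu = 0.1$, $\sigma = 0.3$, $t = 1$, $X_0 = 1$) and the function name are arbitrary illustrative choices.

```python
import numpy as np

def sample_gbm(x0, mu, sigma, t, n_paths, seed=0):
    """Draw X_t = x0 * exp((mu - sigma^2 / 2) t + sigma B_t) using B_t ~ N(0, t)."""
    rng = np.random.default_rng(seed)
    B_t = np.sqrt(t) * rng.standard_normal(n_paths)
    return x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * B_t)

x0, mu, sigma, t = 1.0, 0.1, 0.3, 1.0
X_t = sample_gbm(x0, mu, sigma, t, n_paths=400_000)

print("E[X_t]     sample:", X_t.mean(), " theory:", x0 * np.exp(mu * t))
print("E[log X_t] sample:", np.log(X_t).mean(), " theory:", np.log(x0) + (mu - 0.5 * sigma**2) * t)
```

The gap between $\log E[X_t]$ and $E[\log X_t]$ is $\sigma^2 t / 2$, which is the Itô correction accumulated over $[0, t]$.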

Example 2: Ornstein-Uhlenbeck Process

$dX_t = \theta(\mu - X_t)\,dt + \sigma\,dB_t$, $X_0 = x_0$. This SDE has a linear drift and additive noise. The exact solution:

$$X_t = \mu + (x_0 - \mu)e^{-\theta t} + \sigma \int_0^t e^{-\theta(t-s)}\,dB_s.$$

Stationary distribution: $X_\infty \sim \mathcal{N}(\mu, \sigma^2/(2\theta))$ (mean-reverting Gaussian). Autocovariance in the stationary regime: $\text{Cov}(X_t, X_{t+h}) = \frac{\sigma^2}{2\theta}e^{-\theta h}$ for $h \ge 0$ (exponential decay). The OU process is the continuous-time analog of an AR(1) model.
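The AR(1) connection can be made concrete: over a step of length $h$, the exact solution gives $X_{t+h} = \mu + (X_t - \mu)e^{-\theta h} + \varepsilon$ with $\varepsilon \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{2\theta}(1 - e^{-2\theta h})\right)$. The sketch below simulates this recursion and compares sample statistics with the stationary formulas; the parameter values and function name are assumptions for the illustration.

```python
import math
import numpy as np

def simulate_ou(theta, mu, sigma, x0, h, n_steps, seed=0):
    """Exact-in-distribution OU simulation on a grid of spacing h (an AR(1) recursion)."""
    rng = np.random.default_rng(seed)
    decay = math.exp(-theta * h)  # AR(1) coefficient
    noise_std = sigma * math.sqrt((1.0 - decay**2) / (2.0 * theta))
    z = rng.standard_normal(n_steps)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        x[k + 1] = mu + (x[k] - mu) * decay + noise_std * z[k]
    return x

theta, mu, sigma, h = 1.0, 0.0, 1.0, 0.1
x = simulate_ou(theta, mu, sigma, x0=mu, h=h, n_steps=200_000)

lag = 10  # lag of 10 steps corresponds to a time shift of lag * h = 1.0
sample_var = x.var()
sample_cov = np.mean((x[:-lag] - x.mean()) * (x[lag:] - x.mean()))
print("variance   sample:", sample_var, " theory:", sigma**2 / (2 * theta))
print("covariance sample:", sample_cov, " theory:", sigma**2 / (2 * theta) * math.exp(-theta * lag * h))
```

Unlike Euler-Maruyama, this recursion has no time-discretization bias, because it uses the exact Gaussian transition of the OU process.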

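ML connection: a common continuous-time model of stochastic gradient descent with weight decay is the Langevin-type SDE $d\theta_t = -\lambda\theta_t\,dt - \eta\nabla L(\theta_t)\,dt + \sqrt{2\eta T}\,dB_t$, which is exactly an OU process when the loss $L$ is quadratic. Under mild conditions the stationary distribution of this SDE is the Gibbs distribution $\propto e^{-\tilde{L}(\theta)/T}$ of the regularized loss $\tilde{L}(\theta) = L(\theta) + \frac{\lambda}{2\eta}\|\theta\|^2$, which reduces to $e^{-L(\theta)/T}$ when $\lambda = 0$.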

Example 3: Black-Scholes via Feynman-Kac

European call option: payoff $g(S_T) = (S_T - K)^+$ at expiry $T$, stock $dS_t = rS_t\,dt + \sigma S_t\,dB_t$ under the risk-neutral measure.

By Feynman-Kac, the option price $C(S, t) = e^{-r(T-t)}E[(S_T - K)^+ \mid S_t = S]$ satisfies:

$$\frac{\partial C}{\partial t} + rS\frac{\partial C}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 C}{\partial S^2} - rC = 0.$$

The closed-form solution: $C = S\Phi(d_1) - Ke^{-r(T-t)}\Phi(d_2)$ where $d_{1,2} = \frac{\log(S/K) + (r \pm \sigma^2/2)(T-t)}{\sigma\sqrt{T-t}}$.

The $\pm\sigma^2/2$ in $d_1$ vs $d_2$ is the Itô correction term appearing again. Without Itô's lemma, Black-Scholes cannot be derived correctly.
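The closed form and the Feynman-Kac expectation can be compared directly. The following is a sketch assuming NumPy and the standard library only; the function names and the example parameters ($S = K = 100$, $r = 0.05$, $\sigma = 0.2$, one year to expiry) are illustrative choices.

```python
import math
import numpy as np

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, r, sigma, tau):
    """Black-Scholes call price with time to expiry tau = T - t."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

def mc_call(S, K, r, sigma, tau, n_paths=400_000, seed=0):
    """Discounted risk-neutral expectation of the payoff, sampling S_T from the exact GBM solution."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    S_T = S * np.exp((r - 0.5 * sigma**2) * tau + sigma * math.sqrt(tau) * Z)
    return math.exp(-r * tau) * np.maximum(S_T - K, 0.0).mean()

closed_form = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)
monte_carlo = mc_call(100.0, 100.0, 0.05, 0.2, 1.0)
print("closed form:", closed_form, " Monte Carlo:", monte_carlo)
```

The Monte Carlo value agrees with the closed form only up to sampling error, which shrinks like $1/\sqrt{n}$ in the number of paths.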

Connections

Where Your Intuition Breaks

The Itô correction term $\frac{1}{2}f''(X_t)\sigma_t^2\,dt$ looks like a small perturbation, but for $f(x) = x^2$ and $X_t = B_t$ it gives $d(B_t^2) = 2B_t\,dB_t + dt$. Integrating both sides yields $B_T^2 = 2\int_0^T B_t\,dB_t + T$: the correction contributes $T$, exactly the quadratic variation, not a negligible error. Without it, you would compute $E[B_T^2] = 0$, but $B_T \sim \mathcal{N}(0, T)$ so $E[B_T^2] = T$. The correction term is not a refinement of classical calculus; it is a genuinely new term that cannot be recovered by any limiting argument from ordinary integration.

💡Intuition

$(dB)^2 = dt$ is the fundamental identity of stochastic calculus. In classical calculus, second-order terms $(dx)^2 \to 0$ as $dx \to 0$. But Brownian motion's quadratic variation is $[B,B]_t = t$: finite, deterministic, growing. This makes $(dB_t)^2 = dt$ a finite correction, not a negligible term. Every surprising result in stochastic calculus (Itô's lemma correction, the Black-Scholes $\sigma^2/2$ terms, the Fokker-Planck diffusion term) traces back to this single identity. Internalizing $(dB)^2 = dt$ is the key to reading SDEs fluently.

💡Intuition

The Fokker-Planck equation connects microscopic SDEs to macroscopic densities. A single SDE trajectory is a random path; the Fokker-Planck equation evolves the probability density of the ensemble. The drift term $-\partial_x(\mu p)$ is convection; the diffusion term $\frac{1}{2}\partial_{xx}(\sigma^2 p)$ is spreading. For Langevin dynamics $dX = -\nabla U\,dt + \sqrt{2T}\,dB$, the stationary Fokker-Planck solution is $p^* \propto e^{-U(x)/T}$, the Boltzmann distribution. This is why Langevin dynamics samples from the correct posterior: the SDE's stationary distribution is exactly the target.
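The sampling claim can be illustrated with an Euler-Maruyama discretization of the Langevin SDE (often called the unadjusted Langevin algorithm). This is a sketch under simple assumptions: a one-dimensional quadratic potential $U(x) = x^2/2$ at temperature $T = 0.5$, for which the target $e^{-U/T}$ is $\mathcal{N}(0, 0.5)$. The function name, step size, and chain counts are illustrative.

```python
import numpy as np

def langevin_samples(grad_U, temperature, n_chains=20_000, n_steps=2_000, dt=0.01, seed=0):
    """Run independent discretized Langevin chains and return their final states."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        noise = rng.standard_normal(n_chains)
        # Euler-Maruyama step for dX = -grad U(X) dt + sqrt(2 T) dB
        x = x - grad_U(x) * dt + np.sqrt(2.0 * temperature * dt) * noise
    return x

samples = langevin_samples(grad_U=lambda x: x, temperature=0.5)
print("sample mean    :", samples.mean())  # target mean is 0
print("sample variance:", samples.var())   # target variance is the temperature, 0.5
```

The discretized chain has a small step-size bias, so its stationary variance is slightly above the target; shrinking `dt` or adding a Metropolis correction reduces it.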

⚠️Warning

The Itô and Stratonovich interpretations give different numerical discretizations. The Euler-Maruyama scheme $X_{t+\Delta t} \approx X_t + \mu(X_t)\Delta t + \sigma(X_t)\Delta B_t$ implements Itô. Higher-order schemes such as Milstein, and Heun or Runge-Kutta-style methods that target Stratonovich SDEs, require additional correction terms. For an SDE of the form $dX = f(X)\circ dB$ (Stratonovich), converting to Itô form adds a drift correction: $dX = \frac{1}{2}f'(X)f(X)\,dt + f(X)\,dB$. Numerical SDE libraries (torchsde, diffeqpy) let you specify Itô vs Stratonovich, and using the wrong convention silently produces wrong results.
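The effect of the drift correction can be seen on the simplest case $f(X) = X$. The Stratonovich SDE $dX = X \circ dB$ has solution $X_t = X_0 e^{B_t}$ by the classical chain rule, and its Itô form is $dX = \frac{1}{2}X\,dt + X\,dB$. The sketch below, assuming NumPy and a single fixed noise path, runs Euler-Maruyama with and without the correction; the helper name is illustrative.

```python
import numpy as np

def euler_maruyama_endpoint(a, dB, dt, x0=1.0):
    """Euler-Maruyama endpoint for the Ito SDE dX = a X dt + X dB on a given noise path."""
    # Each step multiplies the state by (1 + a dt + dB_k), so the endpoint is a product
    return x0 * np.prod(1.0 + a * dt + dB)

T, n_steps = 1.0, 20_000
dt = T / n_steps
rng = np.random.default_rng(0)
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
B_T = dB.sum()

with_correction = euler_maruyama_endpoint(0.5, dB, dt)  # Ito form of dX = X o dB
naive = euler_maruyama_endpoint(0.0, dB, dt)            # drift correction dropped

print("with correction:", with_correction, " vs exp(B_T)       =", np.exp(B_T))
print("naive          :", naive, " vs exp(B_T - T/2) =", np.exp(B_T - T / 2))
```

Dropping the correction does not raise an error; the scheme simply converges to the solution of a different SDE, off by a factor of about $e^{-T/2}$ here.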
