Integration in Rⁿ: Fubini, Change of Variables & Surface Integrals
Integration in multiple dimensions underpins the computation of expectations, partition functions, marginal distributions, and the change-of-variables formula that makes normalizing flows tractable. Fubini's theorem tells us when iterated integrals can be exchanged; the change-of-variables formula with its Jacobian determinant is the mathematical foundation for density transformations in generative modeling.
Concepts
Change of Variables — visualizing the Jacobian determinant det J = r
Left: a uniform grid in polar parameter space (θ, r). Right: the same cells after the map (r, θ) ↦ (r cos θ, r sin θ). Cells near the center (small r) shrink — the Jacobian determinant det J = r corrects for this in the change-of-variables formula. Darker = larger det J = more area stretching.
Every probability density you work with in ML — Gaussian, softmax output, normalizing flow — must integrate to 1 over its domain. When you transform variables (reparameterize a VAE, apply a normalizing flow, change to polar coordinates to compute a Gaussian normalization constant), the integral doesn't just follow the map — it must be corrected by how much the map stretches or squishes volume. That correction factor is the Jacobian determinant. Integration in $\mathbb{R}^n$ makes this precise.
The Lebesgue Integral in Rⁿ
The Lebesgue integral $\int_E f \, d\mu$ of a measurable function $f$ over a measurable set $E \subseteq \mathbb{R}^n$ is defined via measure theory. For practical purposes, it agrees with the Riemann integral whenever the Riemann integral exists, but handles pathological functions and limit exchanges more gracefully.
Key properties:
- Linearity: $\int (af + bg) \, d\mu = a \int f \, d\mu + b \int g \, d\mu$
- Monotonicity: $f \le g$ a.e. implies $\int f \, d\mu \le \int g \, d\mu$
- Triangle inequality: $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$
Fubini-Tonelli Theorem
Theorem (Fubini). Let $f: \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ be integrable (i.e., $\int |f| \, d(x, y) < \infty$). Then:

$$\int_{\mathbb{R}^{m+n}} f \, d(x, y) = \int_{\mathbb{R}^m} \left( \int_{\mathbb{R}^n} f(x, y) \, dy \right) dx = \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^m} f(x, y) \, dx \right) dy$$
Tonelli's extension: if $f \ge 0$, the iterated integrals are equal (possibly $+\infty$) even without the integrability assumption.
Why it matters for ML. Computing joint distributions and marginalizing: $p(x) = \int p(x, y) \, dy$. Fubini guarantees you can do this in any order. The ELBO (Evidence Lower BOund) in variational inference involves $\mathbb{E}_{q(z \mid x)}[\log p(x, z) - \log q(z \mid x)]$ — an integral over latent variables, computable as an iterated integral under Fubini.
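As a sanity check, the two orders of iteration can be compared numerically — a minimal sketch using scipy's `quad` on a toy joint density (the function and interval are illustrative, not from the text):

```python
import numpy as np
from scipy.integrate import quad

# Toy joint density p(x, y) ∝ exp(-(x² + y²)) on [0, 1]².
# Fubini: integrating out y first and then x must agree with the reverse order.
f = lambda x, y: np.exp(-(x**2 + y**2))

# Order 1: inner integral over y, outer over x
I_xy, _ = quad(lambda x: quad(lambda y: f(x, y), 0, 1)[0], 0, 1)
# Order 2: inner integral over x, outer over y
I_yx, _ = quad(lambda y: quad(lambda x: f(x, y), 0, 1)[0], 0, 1)

print(I_xy, I_yx)  # both ≈ 0.557746 — the orders agree
```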
Counterexample (why we need integrability). The function $f(x, y) = \dfrac{x^2 - y^2}{(x^2 + y^2)^2}$ on $[0, 1]^2$ has $\int_0^1 \int_0^1 f \, dy \, dx = \frac{\pi}{4}$ but $\int_0^1 \int_0^1 f \, dx \, dy = -\frac{\pi}{4}$ — the iterated integrals are unequal because $\int |f| = \infty$.
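The counterexample can be checked numerically; the sketch below uses the closed-form inner integral (antiderivative $y/(x^2 + y^2)$, valid for $x > 0$) to sidestep the singularity at the origin:

```python
import numpy as np
from scipy.integrate import quad

# f(x, y) = (x² − y²) / (x² + y²)² on [0, 1]².
# For x > 0 the inner integral has a closed form (antiderivative y/(x²+y²)):
#   ∫₀¹ f(x, y) dy = 1 / (1 + x²)
I_xy, _ = quad(lambda x: 1.0 / (1.0 + x**2), 0, 1)  # ≈ +π/4

# f is antisymmetric under swapping x and y, so the other order flips sign:
I_yx = -I_xy                                         # ≈ −π/4

print(I_xy, I_yx)  # ≈ 0.785398, −0.785398 — unequal, because ∫|f| = ∞
```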
Change of Variables Formula
Theorem. Let $T: U \to V$ be a diffeomorphism between open sets in $\mathbb{R}^n$, and $f: V \to \mathbb{R}$ integrable. Then:

$$\int_V f(y) \, dy = \int_U f(T(x)) \, |\det DT(x)| \, dx$$
The Jacobian determinant $|\det DT(x)|$ is the local volume scaling factor — how much $T$ expands or contracts area/volume near $x$. Without this correction, probability mass would not be conserved under the transformation: stretched regions would appear to carry more probability and compressed regions less, and the total would no longer be 1. The determinant is precisely the factor needed to make the integral invariant under smooth reparameterization.
Polar coordinates ($n = 2$): $x = r\cos\theta$, $y = r\sin\theta$, $\det J = r$. Thus:

$$\iint f(x, y) \, dx \, dy = \iint f(r\cos\theta, r\sin\theta) \, r \, dr \, d\theta$$
The extra factor $r$ prevents cells near the origin (which map to tiny wedges) from being over-counted.
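One way to see $\det J = r$ concretely is to differentiate the polar map numerically; the finite-difference helper below is an illustrative sketch, not a standard library routine:

```python
import numpy as np

def polar_map(p):
    """T(r, θ) = (r cos θ, r sin θ)."""
    r, t = p
    return np.array([r * np.cos(t), r * np.sin(t)])

def numerical_jacobian(f, p, h=1e-6):
    """Central-difference Jacobian of f at point p."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (f(p + e) - f(p - e)) / (2 * h)
    return J

r, theta = 0.7, 1.2
J = numerical_jacobian(polar_map, [r, theta])
print(np.linalg.det(J))  # ≈ 0.7 — det J = r, independent of θ
```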
Spherical coordinates ($n = 3$): $x = \rho\sin\phi\cos\theta$, $y = \rho\sin\phi\sin\theta$, $z = \rho\cos\phi$; $\det J = \rho^2\sin\phi$. The volume element is $dV = \rho^2\sin\phi \, d\rho \, d\phi \, d\theta$.
Important Integrals via Change of Variables
Gaussian integral. $\displaystyle\int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi}$.
Proof: $\left( \int_{-\infty}^{\infty} e^{-x^2} \, dx \right)^2 = \iint_{\mathbb{R}^2} e^{-(x^2 + y^2)} \, dx \, dy = \int_0^{2\pi} \int_0^\infty e^{-r^2} \, r \, dr \, d\theta = \pi$. Change to polar; the Jacobian determinant $r$ is essential — it is what makes the radial integral elementary.
Multivariate Gaussian normalization. $\displaystyle\int_{\mathbb{R}^n} \exp\left( -\tfrac{1}{2} x^\top \Sigma^{-1} x \right) dx = (2\pi)^{n/2} (\det \Sigma)^{1/2}$.
Proof: change variables $x = \Sigma^{1/2} u$ with Jacobian $\det \Sigma^{1/2} = (\det \Sigma)^{1/2}$, reducing the integral to a product of $n$ standard Gaussians.
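Both normalization results are easy to verify with numerical quadrature — a sketch using scipy (the integration box $[-12, 12]^2$ is an assumption chosen so the truncated tails are negligible):

```python
import numpy as np
from scipy.integrate import quad, dblquad

# 1-D Gaussian integral: ∫ e^{-x²} dx = √π
I1, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(I1, np.sqrt(np.pi))  # both ≈ 1.7724539

# Multivariate normalization for n = 2: ∫ exp(-½ xᵀ Σ⁻¹ x) dx = (2π)^{n/2} (det Σ)^{1/2}
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sinv = np.linalg.inv(Sigma)
integrand = lambda y, x: np.exp(-0.5 * np.array([x, y]) @ Sinv @ np.array([x, y]))
I2, _ = dblquad(integrand, -12, 12, lambda x: -12, lambda x: 12)
expected = (2 * np.pi) * np.sqrt(np.linalg.det(Sigma))
print(I2, expected)  # both ≈ 8.3119
```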
Surface Integrals and Differential Forms
A surface integral generalizes integration over curves in $\mathbb{R}^3$ to integration over surfaces. For a surface $S$ parameterized by $\mathbf{r}(u, v)$ with $(u, v) \in D$:

$$\int_S f \, dS = \iint_D f(\mathbf{r}(u, v)) \, \| \mathbf{r}_u \times \mathbf{r}_v \| \, du \, dv$$
The cross-product norm $\| \mathbf{r}_u \times \mathbf{r}_v \|$ is the 2D Jacobian-determinant analogue for surfaces in 3D.
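As a concrete check of the formula, the sketch below recovers the area of a sphere of radius $R$ from $\| \mathbf{r}_\theta \times \mathbf{r}_\phi \|$ (the parameterization and radius are illustrative choices):

```python
import numpy as np
from scipy.integrate import dblquad

R = 2.0  # sphere radius

def area_element(phi, theta):
    """‖r_θ × r_φ‖ for r(θ, φ) = R(sinφ cosθ, sinφ sinθ, cosφ)."""
    r_theta = R * np.array([-np.sin(phi) * np.sin(theta),
                            np.sin(phi) * np.cos(theta),
                            0.0])
    r_phi = R * np.array([np.cos(phi) * np.cos(theta),
                          np.cos(phi) * np.sin(theta),
                          -np.sin(phi)])
    return np.linalg.norm(np.cross(r_theta, r_phi))  # analytically R² sin φ

# Integrate the area element: φ over [0, π], θ over [0, 2π]
area, _ = dblquad(area_element, 0, 2 * np.pi, lambda t: 0, lambda t: np.pi)
print(area, 4 * np.pi * R**2)  # both ≈ 50.2655
```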
Differential forms provide the coordinate-free version: a $k$-form integrates over $k$-dimensional oriented manifolds without choosing coordinates. The key fact: $\int_M d\omega = \int_{\partial M} \omega$ (Stokes' theorem), which generalizes the Fundamental Theorem of Calculus, Green's theorem, and the Divergence theorem.
The Laplacian and Harmonic Functions
The Laplacian $\Delta f = \nabla \cdot \nabla f = \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}$ is the divergence of the gradient — it measures how much the average value of $f$ near a point exceeds its value at the point.
Harmonic functions: $\Delta f = 0$. By the mean value property, $f(x) = \frac{1}{|B_r(x)|} \int_{B_r(x)} f(y) \, dy$ — the value equals the average over any ball. Harmonic functions are the equilibrium solutions of diffusion processes.
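The mean value property is easy to test numerically for a concrete harmonic function such as $u(x, y) = x^2 - y^2$ (the center point and radius below are arbitrary choices):

```python
import numpy as np

# u(x, y) = x² − y² is harmonic: Δu = 2 − 2 = 0.
u = lambda x, y: x**2 - y**2

# Average of u over a circle of radius r around (a, b) — should equal u(a, b).
a, b, r = 1.3, -0.4, 0.75
theta = np.linspace(0, 2 * np.pi, 10000, endpoint=False)
circle_avg = u(a + r * np.cos(theta), b + r * np.sin(theta)).mean()

print(circle_avg, u(a, b))  # both ≈ 1.53 — the value equals the circle average
```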
Laplacian in ML:
- Graph Laplacian (discrete analogue): eigenvectors give spectral clustering embeddings
- Laplacian regularization: minimize $f^\top L f = \sum_{(i,j) \in E} w_{ij} (f_i - f_j)^2$ — smooths predictions across graph edges
- Score function: $\nabla_x \log p(x)$ appears in score-based generative models (DDPM, score matching)
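The first two bullets can be illustrated on a tiny weighted graph — a sketch verifying that the Laplacian quadratic form equals the edge-wise smoothness penalty (the graph weights are made up for illustration):

```python
import numpy as np

# Weighted adjacency of a small 4-node graph (symmetric, zero diagonal)
W = np.array([[0.0, 1.0, 0.5, 0.0],
              [1.0, 0.0, 0.0, 2.0],
              [0.5, 0.0, 0.0, 1.0],
              [0.0, 2.0, 1.0, 0.0]])
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # (unnormalized) graph Laplacian

f = np.array([0.3, -1.0, 2.0, 0.5])  # a signal on the nodes

# Laplacian quadratic form = sum over edges of w_ij (f_i − f_j)²
quad_form = f @ L @ f
edge_sum = 0.5 * np.sum(W * (f[:, None] - f[None, :])**2)  # ½ corrects double-counting
print(quad_form, edge_sum)  # equal — this is the smoothness penalty on graph edges
```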
Worked Example
Example 1: Gaussian KL Divergence
For $p = \mathcal{N}(\mu_1, \Sigma_1)$, $q = \mathcal{N}(\mu_2, \Sigma_2)$ on $\mathbb{R}^n$:

$$\mathrm{KL}(p \,\|\, q) = \frac{1}{2} \left[ \log \frac{\det \Sigma_2}{\det \Sigma_1} - n + \operatorname{tr}(\Sigma_2^{-1} \Sigma_1) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) \right]$$
This closed-form integral uses change of variables (to diagonalize the covariances) and the Gaussian normalization formula. It appears in the ELBO for VAEs, Gaussian process inference, and information-theoretic analysis of representation learning.
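A sketch of the closed form, cross-checked against a seeded Monte Carlo estimate of $\mathbb{E}_p[\log p - \log q]$ (the helper name `gaussian_kl` and the example distributions are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_kl(mu1, S1, mu2, S2):
    """Closed-form KL( N(mu1, S1) || N(mu2, S2) )."""
    n = len(mu1)
    S2inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.log(np.linalg.det(S2) / np.linalg.det(S1)) - n
                  + np.trace(S2inv @ S1) + diff @ S2inv @ diff)

mu1, S1 = np.zeros(2), np.eye(2)
mu2, S2 = np.array([1.0, 0.0]), np.diag([2.0, 0.5])

kl = gaussian_kl(mu1, S1, mu2, S2)

# Monte Carlo cross-check: KL = E_p[log p(x) − log q(x)]
rng = np.random.default_rng(0)
xs = rng.multivariate_normal(mu1, S1, size=200_000)
mc = np.mean(multivariate_normal(mu1, S1).logpdf(xs)
             - multivariate_normal(mu2, S2).logpdf(xs))

print(kl, mc)  # closed form is exactly 0.5 here; MC estimate agrees
```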
Example 2: Expectation by Change of Variables
If $\epsilon \sim \mathcal{N}(0, I)$ and $x = \mu + \sigma \epsilon$ (with $\sigma > 0$), then $\mathbb{E}[f(x)] = \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}[f(\mu + \sigma \epsilon)]$.
This is the reparameterization trick in VAEs: the gradient flows through $\mu$ and $\sigma$ because $\epsilon$ carries the noise, making the sampling step differentiable. The change-of-variables formula justifies the density transformation.
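A minimal numerical sketch of the trick, using $f(x) = x^2$ where the exact gradient $\partial_\mu \mathbb{E}[f(x)] = 2\mu$ is known (the values of $\mu$ and $\sigma$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
eps = rng.standard_normal(100_000)

# f(x) = x², so E[f(x)] = μ² + σ² and ∂/∂μ E[f(x)] = 2μ analytically.
# Reparameterize x = μ + σ·ε: the gradient passes through the deterministic map,
#   ∂/∂μ f(μ + σ·ε) = 2(μ + σ·ε)
grad_mu_mc = np.mean(2 * (mu + sigma * eps))
print(grad_mu_mc, 2 * mu)  # MC estimate ≈ 3.0, analytic value 3.0
```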
Example 3: Normalizing Flow Log-Likelihood
For an invertible map $f: \mathbb{R}^n \to \mathbb{R}^n$ with $x = f(z)$, $z \sim p_Z$:

$$p_X(x) = p_Z(f^{-1}(x)) \left| \det \frac{\partial f^{-1}}{\partial x} \right|$$

Log-likelihood: $\log p_X(x) = \log p_Z(f^{-1}(x)) + \log \left| \det \dfrac{\partial f^{-1}}{\partial x} \right|$.
Training maximizes this over the training data. The Jacobian log-determinant is the correction for how $f$ stretches or squishes volume — exactly what the change-of-variables diagram illustrates.
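A one-dimensional sketch: the flow $x = \exp(z)$ pushes a standard normal to a log-normal, so the change-of-variables log-density can be checked against scipy's `lognorm` (the choice of flow is illustrative):

```python
import numpy as np
from scipy.stats import norm, lognorm

# Flow x = f(z) = exp(z), z ~ N(0, 1); f⁻¹(x) = log x, |det ∂f⁻¹/∂x| = 1/x.
def flow_logpdf(x):
    return norm.logpdf(np.log(x)) + np.log(1.0 / x)

xs = np.array([0.5, 1.0, 2.5])
print(flow_logpdf(xs))
print(lognorm(s=1.0).logpdf(xs))  # matches: exp(z) pushes N(0,1) to a log-normal
```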
Where Your Intuition Breaks
The most seductive shortcut: treating expectation and differentiation as freely interchangeable. In many ML derivations you want to write $\nabla_\theta \mathbb{E}_{x \sim p_\theta}[f(x)] = \mathbb{E}_{x \sim p_\theta}[\nabla_\theta f(x)]$ or differentiate under the integral sign. This works when the integrand and its $\theta$-derivative are both integrable and a dominating-function condition holds (Leibniz integral rule / dominated convergence theorem) — but not in general, and the naive swap also drops the term coming from the $\theta$-dependence of $p_\theta$ itself. The REINFORCE estimator is one rigorous way to handle the gradient of an expectation when the distribution depends on $\theta$; the reparameterization trick is another. Using naive differentiation under the integral without checking these conditions leads to incorrect gradient estimates that can be silently wrong in sparse-reward or heavy-tailed settings.
Intractable integrals and VI. Most interesting probability integrals in ML are intractable: $p(x) = \int p(x \mid z) \, p(z) \, dz$ requires integrating over all latent configurations. Variational inference replaces this with an optimization: $\max_{q \in \mathcal{Q}} \mathbb{E}_{q(z)}[\log p(x, z) - \log q(z)]$ where $\mathcal{Q}$ is a tractable family. Monte Carlo integration replaces it with sampling. Both approaches sidestep the intractable integral but introduce approximation error — the gap between these methods is a central theme of probabilistic ML.
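A minimal Monte Carlo sketch: the integrand below has no elementary antiderivative, but a symmetry argument pins the exact value at ½, giving a ground truth to compare against (the function choice is illustrative):

```python
import numpy as np

# An expectation written as an integral:
#   E_{z~N(0,1)}[sigmoid(z)] = ∫ sigmoid(z) φ(z) dz
# No elementary antiderivative — but sigmoid(z) + sigmoid(−z) = 1 and φ is
# symmetric, so the exact value is ½. Monte Carlo uses a sample average instead.
rng = np.random.default_rng(0)
z = rng.standard_normal(200_000)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
estimate = sigmoid(z).mean()
print(estimate)  # ≈ 0.5, with O(1/√N) Monte Carlo error
```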
Improper integrals and unnormalized densities. Many score-based and energy-based models define $p(x) \propto e^{-E(x)}$ without computing the normalizing constant $Z = \int e^{-E(x)} \, dx$. The integrability condition must be checked to ensure $Z < \infty$: distributions that don't normalize properly lead to nonsensical samples. This is why score matching trains on $\nabla_x \log p(x)$ (the score function) without needing $Z$ — since $Z$ does not depend on $x$, $\nabla_x \log Z = 0$ and the score is simply $-\nabla_x E(x)$.