
Hilbert Spaces: Inner Products, Orthonormal Bases & Riesz Representation

Hilbert spaces add an inner product to the Banach structure, giving geometry — angles, orthogonality, and projection — that is absent from general Banach spaces. The Riesz representation theorem identifies every continuous linear functional with a vector via the inner product, making Hilbert spaces "self-dual" and enabling a geometric theory of approximation that underlies kernel methods and Gaussian processes.

Concepts

[Interactive figure: orthogonal projection. Drag the tip of $v$ to see the projection $P(v)$ onto a subspace $W$, the residual $v - P(v)$, the norms $\|v\|$, $\|P(v)\|$, $\|v - P(v)\|$, and the angle $\theta$ between $v$ and $W$; a slider rotates $W$.]
The residual v − P(v) is always perpendicular to W (right-angle box). This is the Best Approximation Theorem: P(v) is the closest point in W to v.

In ordinary $\mathbb{R}^3$, you can measure angles between vectors, project one vector onto another, and decompose any vector into a component lying in a subspace $M$ and a component perpendicular to it. Hilbert spaces generalize this geometric structure to function spaces and infinite dimensions. The core fact is the projection theorem: for any closed subspace $M$ and any point $x$ outside it, there is a unique closest point in $M$, and the error vector is perpendicular to $M$. This single geometric fact is the foundation of Fourier series, least-squares regression, and the kernel trick.

Inner Product Spaces

An inner product space is a vector space $H$ equipped with a map $\langle\cdot,\cdot\rangle: H \times H \to \mathbb{R}$ satisfying:

  1. Positive definiteness: $\langle x, x \rangle \geq 0$, with equality iff $x = 0$
  2. Linearity in the first argument: $\langle \alpha x + \beta y, z \rangle = \alpha\langle x,z\rangle + \beta\langle y,z\rangle$
  3. Conjugate symmetry: $\langle x, y \rangle = \overline{\langle y, x \rangle}$ (which reduces to $\langle x,y\rangle = \langle y,x\rangle$ over $\mathbb{R}$)

The inner product induces a norm $\|x\| = \sqrt{\langle x,x\rangle}$, so every inner product space is a normed space. The parallelogram law $\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2 + \|y\|^2)$ characterizes norms arising from inner products: it holds in every inner product space $H$ but fails in $\ell^1$ and $L^1$.
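As a quick numerical sanity check (a NumPy sketch, not part of the original notes; the test vectors are arbitrary), the parallelogram law holds for the $\ell^2$ norm but fails for the $\ell^1$ norm:

```python
import numpy as np

def parallelogram_gap(x, y, norm):
    # ||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero iff the law holds
    return norm(x + y)**2 + norm(x - y)**2 - 2 * (norm(x)**2 + norm(y)**2)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

l2 = lambda v: np.linalg.norm(v, 2)
l1 = lambda v: np.linalg.norm(v, 1)

print(parallelogram_gap(x, y, l2))  # 0.0: the l2 norm comes from an inner product
print(parallelogram_gap(x, y, l1))  # 4.0: the l1 norm does not
```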

Hilbert Spaces

A Hilbert space is a complete inner product space. Examples:

  • $\mathbb{R}^n$ with the dot product: $\langle x, y \rangle = \sum_i x_i y_i$
  • $\ell^2$: square-summable sequences with $\langle x, y \rangle = \sum_i x_i y_i$
  • $L^2([a,b])$: square-integrable functions with $\langle f, g \rangle = \int_a^b f(x)g(x)\,dx$

Every Hilbert space is a Banach space, but not conversely: $L^1$ is Banach but not Hilbert, since its norm fails the parallelogram law.

Orthogonality and the Projection Theorem

Two elements $x, y \in H$ are orthogonal, written $x \perp y$, if $\langle x, y \rangle = 0$. For a subset $M \subset H$, the orthogonal complement is $M^\perp = \{y \in H : \langle y, x \rangle = 0 \text{ for all } x \in M\}$.

Projection theorem. Let $M$ be a closed subspace of a Hilbert space $H$ and let $x \in H$. Then there exists a unique $m \in M$ such that

$$\|x - m\| = \inf_{y \in M} \|x - y\|,$$

and $x - m \in M^\perp$. The element $m = P_M x$ is the orthogonal projection of $x$ onto $M$, and we have the orthogonal decomposition $x = P_M x + (x - P_M x)$ with $P_M x \in M$ and $x - P_M x \in M^\perp$.

The perpendicularity of the error is not a design choice: it is a logical consequence of optimality. If the error $x - m$ had any component inside $M$, you could move $m$ in that direction and get a closer point, contradicting minimality. The inner product is the language that makes "perpendicular" meaningful in infinite dimensions; without it, Banach spaces have no notion of angle, and the projection theorem fails for general norms.
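In finite dimensions both claims are easy to verify directly; a minimal NumPy sketch (the matrix `A` and vector `x` are arbitrary assumptions, with the column span of `A` playing the role of the closed subspace $M$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))    # columns span a 2-dim subspace M of R^5
x = rng.standard_normal(5)

# Orthogonal projection P_M x: least-squares fit of x by the columns of A
coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
p = A @ coeffs                     # P_M x, the closest point in M to x
r = x - p                          # residual x - P_M x

# The residual is perpendicular to M (i.e. to both columns of A)
print(np.allclose(A.T @ r, 0))     # True

# Best-approximation property: p is closer to x than any other element of M
q = A @ rng.standard_normal(2)     # an arbitrary element of M
print(np.linalg.norm(x - p) <= np.linalg.norm(x - q))  # True
```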

Riesz Representation Theorem

Theorem. Let $H$ be a Hilbert space and $f: H \to \mathbb{R}$ a bounded linear functional. Then there exists a unique $y \in H$ such that

$$f(x) = \langle x, y \rangle \quad \text{for all } x \in H,$$

and $\|f\|_{H^*} = \|y\|_H$.

This establishes an isometric isomorphism $H \cong H^*$: every bounded functional on $H$ is "represented" by an element of $H$ itself via the inner product. Hilbert spaces are therefore reflexive ($H^{**} \cong H$).
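In $\mathbb{R}^n$ the representer can be constructed explicitly as $y_i = f(e_i)$; a small NumPy sketch (the functional `f` below is an assumed example, not from the notes):

```python
import numpy as np

# A bounded linear functional on R^3 (assumed example): f(x) = 2x_0 - x_1 + 3x_2
f = lambda x: 2 * x[0] - x[1] + 3 * x[2]

# Riesz representer: y_i = f(e_i), so that f(x) = <x, y> for all x
y = np.array([f(e) for e in np.eye(3)])

x = np.array([0.5, -1.0, 2.0])
print(np.isclose(f(x), x @ y))  # True: f is represented by y

# Isometry: the sup of |f(x)| over the unit sphere is attained at y/||y||
# and equals ||y||, matching ||f|| = ||y||
print(np.isclose(abs(f(y / np.linalg.norm(y))), np.linalg.norm(y)))  # True
```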

Orthonormal Bases and Fourier Series

A sequence $\{e_k\}$ in $H$ is orthonormal if $\langle e_j, e_k \rangle = \delta_{jk}$. It is a complete orthonormal basis (ONB) if the only element orthogonal to all $e_k$ is $0$, equivalently if $\overline{\mathrm{span}\{e_k\}} = H$.

Gram-Schmidt process. Given a linearly independent sequence $\{v_k\}$, produce an ONB by $u_1 = v_1/\|v_1\|$ and

$$\begin{aligned} \tilde{u}_{k+1} &= v_{k+1} - \sum_{j=1}^k \langle v_{k+1}, u_j \rangle u_j \\ u_{k+1} &= \tilde{u}_{k+1}/\|\tilde{u}_{k+1}\| \end{aligned}$$
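The recursion translates directly into code; a sketch with NumPy for vectors in $\mathbb{R}^n$ (the input vectors are an arbitrary example):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    basis = []
    for v in vectors:
        # subtract the projections onto the already-built u_j
        u = v - sum((v @ q) * q for q in basis)
        basis.append(u / np.linalg.norm(u))  # normalize
    return np.array(basis)

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(V)
print(np.allclose(Q @ Q.T, np.eye(3)))  # True: the rows are orthonormal
```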

Bessel's inequality. For any orthonormal sequence $\{e_k\}$ (complete or not) and any $x \in H$: $\sum_k |\langle x, e_k \rangle|^2 \leq \|x\|^2$.

Parseval's identity. If $\{e_k\}$ is a complete ONB: $\sum_k |\langle x, e_k \rangle|^2 = \|x\|^2$ and $x = \sum_k \langle x, e_k \rangle e_k$.

Fourier series are the canonical ONB expansion in $L^2([-\pi, \pi])$: the functions $\{1/\sqrt{2\pi},\, \cos(nx)/\sqrt{\pi},\, \sin(nx)/\sqrt{\pi}\}_{n \geq 1}$ form a complete ONB.

Worked Example

Gram-Schmidt on $\{1, x, x^2\}$ in $L^2([-1,1])$

The inner product is $\langle f, g \rangle = \int_{-1}^1 f(x)g(x)\,dx$.

Step 1. $v_1 = 1$, $\|v_1\|^2 = 2$, so the normalized first element is $P_0(x) = 1/\sqrt{2}$.

Step 2. $v_2 = x$. Since $\langle x, 1 \rangle = \int_{-1}^1 x\,dx = 0$ by symmetry, $\tilde{u}_2 = x$ and $\|x\|^2 = 2/3$, so the normalized second element is $P_1(x) = x\sqrt{3/2}$. The unnormalized polynomial is $x$.

Step 3. $v_3 = x^2$. Project out the $P_0$ component: $\langle x^2, 1 \rangle = \int_{-1}^1 x^2\,dx = 2/3$, so the projection onto constants is $\frac{\langle x^2, 1\rangle}{\|1\|^2} = \frac{2/3}{2} = \frac{1}{3}$. The $P_1$ component vanishes by symmetry. So $\tilde{u}_3 = x^2 - 1/3$, giving (up to normalization) the degree-2 Legendre polynomial $P_2(x) \propto x^2 - 1/3$. The standard normalization is $P_2(x) = (3x^2 - 1)/2$.
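The three steps above can be checked numerically; a sketch using `numpy.polynomial`, computing the $L^2([-1,1])$ inner products exactly via antiderivatives (the helper names are mine, not from the notes):

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def inner(f, g):
    # Exact L^2([-1,1]) inner product of polynomials via the antiderivative
    h = (f * g).integ()
    return h(1.0) - h(-1.0)

v = [P([1.0]), P([0.0, 1.0]), P([0.0, 0.0, 1.0])]  # 1, x, x^2
basis = []
for p in v:
    for q in basis:
        p = p - inner(p, q) * q        # subtract projection onto q
    basis.append(p / np.sqrt(inner(p, p)))  # normalize

# The third element should be proportional to x^2 - 1/3
u3 = basis[2]
print(np.allclose((u3 / u3.coef[2]).coef, [-1/3, 0.0, 1.0]))  # True
```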

Best $L^2$ Approximation of $f(x) = |x|$ by Degree-1 Polynomials

Let $M = \mathrm{span}\{1, x\}$ in $L^2([-1,1])$. The best approximation is $p^*(x) = \langle |x|, 1/\sqrt{2}\rangle\, (1/\sqrt{2}) + \langle |x|, x\sqrt{3/2}\rangle\, x\sqrt{3/2}$.

$$\langle |x|, 1 \rangle = \int_{-1}^1 |x|\,dx = 1, \qquad \langle |x|, x \rangle = \int_{-1}^1 |x|\cdot x\,dx = 0$$

So $p^*(x) = \frac{\langle |x|, 1\rangle}{\|1\|^2} \cdot 1 = \frac{1}{2}$. The best degree-1 approximation to $|x|$ on $[-1,1]$ in $L^2$ turns out to be the constant $1/2$.
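A quick grid-based check (a sketch; the grid resolution and the candidate list are arbitrary) that $a = 1/2$, $b = 0$ beats nearby candidates $a + bx$:

```python
import numpy as np

# L^2([-1,1]) squared error of approximating |x| by a + b*x, on a fine grid
xs = np.linspace(-1.0, 1.0, 200_001)

def sq_error(a, b):
    err = (np.abs(xs) - (a + b * xs))**2
    # trapezoid rule for the integral of err over [-1, 1]
    return np.sum((err[1:] + err[:-1]) / 2) * (xs[1] - xs[0])

best = sq_error(0.5, 0.0)   # the projection-theorem answer, error = 1/6
for a, b in [(0.4, 0.0), (0.6, 0.0), (0.5, 0.2), (0.5, -0.2)]:
    print(best < sq_error(a, b))  # True each time
```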

Parseval's Identity for a Simple Fourier Series

Consider $f(x) = x$ in $L^2([-\pi, \pi])$. The Fourier sine coefficients are $b_n = (2/\pi)\int_0^\pi x\sin(nx)\,dx = 2(-1)^{n+1}/n$. Parseval's identity gives:

$$\|f\|^2 = \int_{-\pi}^\pi x^2\,dx = \frac{2\pi^3}{3} = \pi \sum_{n=1}^\infty b_n^2 = \pi \sum_{n=1}^\infty \frac{4}{n^2}$$

This recovers the identity $\sum_{n=1}^\infty 1/n^2 = \pi^2/6$, showing that Parseval's identity is a non-trivial result even when evaluated on a single function.
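Numerically, the partial sums of $\pi \sum_n b_n^2$ do approach $2\pi^3/3$; a short NumPy check (the truncation point is arbitrary):

```python
import numpy as np

n = np.arange(1, 200_001)
b = 2.0 * (-1.0)**(n + 1) / n      # Fourier sine coefficients of f(x) = x
partial = np.pi * np.sum(b**2)     # pi * sum b_n^2, should approach 2*pi^3/3

# The tail of sum 1/n^2 beyond N is O(1/N), so the gap here is tiny
print(abs(partial - 2 * np.pi**3 / 3) < 1e-3)  # True
```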

Connections

Where Your Intuition Breaks

In finite dimensions, any subspace of $\mathbb{R}^n$ is closed, so the projection theorem always applies. In infinite-dimensional Hilbert spaces, subspaces can fail to be closed, and the projection theorem fails for non-closed subspaces. The polynomial subspace of $L^2([a,b])$ is dense but not closed: every function in $L^2$ can be approximated by polynomials (Weierstrass), but most $L^2$ functions are not polynomials. The projection onto the space of all polynomials is therefore not a well-defined map. Closedness is not a technicality; it is the condition that makes "best approximation" achievable, not merely approachable. In practice, kernel methods and regularization work precisely because they optimize over closed sets (such as RKHS balls).

💡Intuition

Hilbert spaces are special among Banach spaces precisely because the inner product gives a canonical identification of $H$ with $H^*$ via the Riesz theorem. In a general Banach space, $X$ and $X^*$ are different objects related by duality; in a Hilbert space they coincide isometrically. This "self-duality" enables geometric intuition: optimizing a bounded linear functional over a Hilbert space is the same as finding the closest element to the representing vector, turning optimization into projection.

💡Intuition

The kernel trick in machine learning is exactly the projection theorem applied implicitly. A kernel $k(x, y) = \langle \phi(x), \phi(y) \rangle_H$ defines a Hilbert space $H$ of functions. Training an SVM or fitting a Gaussian process finds the element of a closed subspace (the hypothesis class) closest to the "signal" in $H$; the algorithm never forms $\phi(x)$ explicitly but computes inner products via $k$, making infinite-dimensional optimization computationally tractable.

⚠️Warning

In infinite-dimensional Hilbert spaces, the closed unit ball $\{x : \|x\| \leq 1\}$ is not compact in the norm topology. Sequences on the unit sphere can have no convergent subsequence: for example, in $\ell^2$ the standard basis vectors $e_n$ satisfy $\|e_n - e_m\|^2 = 2$ for $n \neq m$, so no subsequence is Cauchy. This breaks finite-dimensional optimization intuition: existence of minimizers over the unit ball cannot be argued by sequential compactness in the norm topology. Instead, one uses weak compactness (the unit ball of a Hilbert space is weakly compact, by reflexivity).
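The $\|e_n - e_m\| = \sqrt{2}$ claim is easy to confirm; a tiny NumPy sketch truncating $\ell^2$ to $\mathbb{R}^{10}$ (the dimension is arbitrary, since $e_n$ and $e_m$ live in any sufficiently large truncation):

```python
import numpy as np

# Standard basis vectors: all unit norm, pairwise distance sqrt(2)
E = np.eye(10)
dists = [np.linalg.norm(E[i] - E[j]) for i in range(10) for j in range(i + 1, 10)]
print(np.allclose(dists, np.sqrt(2)))  # True: no subsequence can be Cauchy
```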

| Property      | $\mathbb{R}^n$ | $L^2([a,b])$ | $L^1([a,b])$ | $C([a,b])$ |
|---------------|----------------|--------------|--------------|------------|
| Inner product | Yes            | Yes          | No           | No         |
| Hilbert space | Yes            | Yes          | No           | No         |
| Banach space  | Yes            | Yes          | Yes          | Yes        |
| Reflexive     | Yes            | Yes          | No           | No         |
| Separable     | Yes            | Yes          | Yes          | Yes        |
