
Hilbert Spaces: Inner Products, Orthonormal Bases & Riesz Representation

Hilbert spaces add an inner product to the Banach structure, giving geometry — angles, orthogonality, and projection — that is absent from general Banach spaces. The Riesz representation theorem identifies every continuous linear functional with a vector via the inner product, making Hilbert spaces "self-dual" and enabling a geometric theory of approximation that underlies kernel methods and Gaussian processes.

Concepts

[Interactive figure: orthogonal projection. Drag the tip of $v$ to see the projection $P(v)$ onto a subspace $W$, the residual $v - P(v)$, the norms $\|v\|$, $\|P(v)\|$, $\|v - P(v)\|$, and the angle $\theta$ between $v$ and $W$; a slider rotates $W$.]
The residual v − P(v) is always perpendicular to W (right-angle box). This is the Best Approximation Theorem: P(v) is the closest point in W to v.

In ordinary $\mathbb{R}^3$, you can measure angles between vectors, project one vector onto another, and decompose any vector into a component lying in a subspace $M$ and a component perpendicular to it. Hilbert spaces generalize this geometric structure to function spaces and infinite dimensions. The core fact is the projection theorem: for any closed subspace $M$ and any point $x$ outside it, there is a unique closest point in $M$, and the error vector is perpendicular to $M$. This single geometric fact is the foundation of Fourier series, least-squares regression, and the kernel trick.

Inner Product Spaces

An inner product space is a vector space $H$ equipped with a map $\langle\cdot,\cdot\rangle: H \times H \to \mathbb{R}$ satisfying:

  1. Positive definiteness: $\langle x, x \rangle \geq 0$, with equality iff $x = 0$
  2. Linearity in the first argument: $\langle \alpha x + \beta y, z \rangle = \alpha\langle x,z\rangle + \beta\langle y,z\rangle$
  3. Conjugate symmetry: $\langle x, y \rangle = \overline{\langle y, x \rangle}$ (which reduces to $\langle x,y\rangle = \langle y,x\rangle$ over $\mathbb{R}$)

The inner product induces a norm $\|x\| = \sqrt{\langle x,x\rangle}$, so every inner product space is a normed space. The parallelogram law $\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2 + \|y\|^2)$ characterizes norms arising from inner products: it holds in every inner product space $H$ but fails in $\ell^1$ and $L^1$.
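As a quick numerical sanity check (a NumPy sketch, not part of the original notes; the test vectors are arbitrary), the parallelogram law holds for the $\ell^2$ norm but fails for the $\ell^1$ norm:

```python
import numpy as np

def parallelogram_gap(x, y, norm):
    # ||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2); zero iff the law holds
    return norm(x + y)**2 + norm(x - y)**2 - 2 * (norm(x)**2 + norm(y)**2)

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

l2 = lambda v: np.linalg.norm(v, 2)
l1 = lambda v: np.linalg.norm(v, 1)

print(parallelogram_gap(x, y, l2))  # 0.0: the l2 norm comes from an inner product
print(parallelogram_gap(x, y, l1))  # 4.0: the l1 norm does not
```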

Hilbert Spaces

A Hilbert space is a complete inner product space. Examples:

  • $\mathbb{R}^n$ with the dot product: $\langle x, y \rangle = \sum_i x_i y_i$
  • $\ell^2$: square-summable sequences with $\langle x, y \rangle = \sum_i x_i y_i$
  • $L^2([a,b])$: square-integrable functions with $\langle f, g \rangle = \int_a^b f(x)g(x)\,dx$

Every Hilbert space is a Banach space, but not conversely: $L^1$ is Banach but not Hilbert, since its norm fails the parallelogram law.

Orthogonality and the Projection Theorem

Two elements $x, y \in H$ are orthogonal, written $x \perp y$, if $\langle x, y \rangle = 0$. For a subset $M \subset H$, the orthogonal complement is $M^\perp = \{y \in H : \langle y, x \rangle = 0 \text{ for all } x \in M\}$.

Projection theorem. Let $M$ be a closed subspace of a Hilbert space $H$ and let $x \in H$. Then there exists a unique $m \in M$ such that

$$\|x - m\| = \inf_{y \in M} \|x - y\|,$$

and $x - m \in M^\perp$. The element $m = P_M x$ is the orthogonal projection of $x$ onto $M$, and we have the orthogonal decomposition $x = P_M x + (x - P_M x)$ with $P_M x \in M$ and $x - P_M x \in M^\perp$.

The perpendicularity of the error is not a design choice: it is a logical consequence of optimality. If the error $x - m$ had any component inside $M$, you could move $m$ in that direction and get a closer point, contradicting minimality. The inner product is the language that makes "perpendicular" meaningful in infinite dimensions; without it, Banach spaces have no notion of angle, and the projection theorem fails for general norms.
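In finite dimensions both claims are easy to verify directly; a minimal NumPy sketch (the matrix `A` and vector `x` are arbitrary assumptions, with the column span of `A` playing the role of the closed subspace $M$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))    # columns span a 2-dim subspace M of R^5
x = rng.standard_normal(5)

# Orthogonal projection P_M x: least-squares fit of x by the columns of A
coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
p = A @ coeffs                     # P_M x, the closest point in M to x
r = x - p                          # residual x - P_M x

# The residual is perpendicular to M (i.e. to both columns of A)
print(np.allclose(A.T @ r, 0))     # True

# Best-approximation property: p is closer to x than any other element of M
q = A @ rng.standard_normal(2)     # an arbitrary element of M
print(np.linalg.norm(x - p) <= np.linalg.norm(x - q))  # True
```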

Riesz Representation Theorem

Theorem. Let $H$ be a Hilbert space and $f: H \to \mathbb{R}$ a bounded linear functional. Then there exists a unique $y \in H$ such that

$$f(x) = \langle x, y \rangle \quad \text{for all } x \in H,$$

and $\|f\|_{H^*} = \|y\|_H$.

This establishes an isometric isomorphism $H \cong H^*$: every bounded functional on $H$ is "represented" by an element of $H$ itself via the inner product. Hilbert spaces are therefore reflexive ($H^{**} \cong H$).
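In $\mathbb{R}^n$ the representer can be constructed explicitly as $y_i = f(e_i)$; a small NumPy sketch (the functional `f` below is an assumed example, not from the notes):

```python
import numpy as np

# A bounded linear functional on R^3 (assumed example): f(x) = 2x_0 - x_1 + 3x_2
f = lambda x: 2 * x[0] - x[1] + 3 * x[2]

# Riesz representer: y_i = f(e_i), so that f(x) = <x, y> for all x
y = np.array([f(e) for e in np.eye(3)])

x = np.array([0.5, -1.0, 2.0])
print(np.isclose(f(x), x @ y))  # True: f is represented by y

# Isometry: the sup of |f(x)| over the unit sphere is attained at y/||y||
# and equals ||y||, matching ||f|| = ||y||
print(np.isclose(abs(f(y / np.linalg.norm(y))), np.linalg.norm(y)))  # True
```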

Orthonormal Bases and Fourier Series

A sequence $\{e_k\}$ in $H$ is orthonormal if $\langle e_j, e_k \rangle = \delta_{jk}$. It is a complete orthonormal basis (ONB) if the only element orthogonal to all $e_k$ is $0$, equivalently if $\overline{\mathrm{span}\{e_k\}} = H$.

Gram-Schmidt process. Given a linearly independent sequence $\{v_k\}$, produce an ONB by $u_1 = v_1/\|v_1\|$ and

$$\begin{aligned} \tilde{u}_{k+1} &= v_{k+1} - \sum_{j=1}^k \langle v_{k+1}, u_j \rangle u_j \\ u_{k+1} &= \tilde{u}_{k+1}/\|\tilde{u}_{k+1}\| \end{aligned}$$
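The recursion translates directly into code; a sketch with NumPy for vectors in $\mathbb{R}^n$ (the input vectors are an arbitrary example):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors."""
    basis = []
    for v in vectors:
        # subtract the projections onto the already-built u_j
        u = v - sum((v @ q) * q for q in basis)
        basis.append(u / np.linalg.norm(u))  # normalize
    return np.array(basis)

V = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = gram_schmidt(V)
print(np.allclose(Q @ Q.T, np.eye(3)))  # True: the rows are orthonormal
```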

Bessel's inequality. For any orthonormal sequence $\{e_k\}$ (complete or not) and any $x \in H$: $\sum_k |\langle x, e_k \rangle|^2 \leq \|x\|^2$.

Parseval's identity. If $\{e_k\}$ is a complete ONB: $\sum_k |\langle x, e_k \rangle|^2 = \|x\|^2$ and $x = \sum_k \langle x, e_k \rangle e_k$.

Fourier series are the canonical ONB expansion in $L^2([-\pi, \pi])$: the functions $\{1/\sqrt{2\pi},\, \cos(nx)/\sqrt{\pi},\, \sin(nx)/\sqrt{\pi}\}_{n \geq 1}$ form a complete ONB.

Worked Example

Gram-Schmidt on $\{1, x, x^2\}$ in $L^2([-1,1])$

The inner product is $\langle f, g \rangle = \int_{-1}^1 f(x)g(x)\,dx$.

Step 1. $v_1 = 1$, $\|v_1\|^2 = 2$, so the normalized first element is $P_0(x) = 1/\sqrt{2}$.

Step 2. $v_2 = x$. Since $\langle x, 1 \rangle = \int_{-1}^1 x\,dx = 0$ by symmetry, $\tilde{u}_2 = x$ and $\|x\|^2 = 2/3$, so the normalized second element is $P_1(x) = x\sqrt{3/2}$. The unnormalized polynomial is $x$.

Step 3. $v_3 = x^2$. Project out the $P_0$ component: $\langle x^2, 1 \rangle = \int_{-1}^1 x^2\,dx = 2/3$, so the projection onto constants is $\frac{\langle x^2, 1\rangle}{\|1\|^2} = \frac{2/3}{2} = \frac{1}{3}$. The $P_1$ component vanishes by symmetry. So $\tilde{u}_3 = x^2 - 1/3$, giving (up to normalization) the degree-2 Legendre polynomial $P_2(x) \propto x^2 - 1/3$. The standard normalization is $P_2(x) = (3x^2 - 1)/2$.
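The three steps above can be checked numerically; a sketch using `numpy.polynomial`, computing the $L^2([-1,1])$ inner products exactly via antiderivatives (the helper names are mine, not from the notes):

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def inner(f, g):
    # Exact L^2([-1,1]) inner product of polynomials via the antiderivative
    h = (f * g).integ()
    return h(1.0) - h(-1.0)

v = [P([1.0]), P([0.0, 1.0]), P([0.0, 0.0, 1.0])]  # 1, x, x^2
basis = []
for p in v:
    for q in basis:
        p = p - inner(p, q) * q        # subtract projection onto q
    basis.append(p / np.sqrt(inner(p, p)))  # normalize

# The third element should be proportional to x^2 - 1/3
u3 = basis[2]
print(np.allclose((u3 / u3.coef[2]).coef, [-1/3, 0.0, 1.0]))  # True
```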

Best $L^2$ Approximation of $f(x) = |x|$ by Degree-1 Polynomials

Let $M = \mathrm{span}\{1, x\}$ in $L^2([-1,1])$. The best approximation is $p^*(x) = \langle |x|, 1/\sqrt{2}\rangle\, (1/\sqrt{2}) + \langle |x|, x\sqrt{3/2}\rangle\, x\sqrt{3/2}$.

$$\langle |x|, 1 \rangle = \int_{-1}^1 |x|\,dx = 1, \qquad \langle |x|, x \rangle = \int_{-1}^1 |x|\cdot x\,dx = 0$$

So $p^*(x) = \frac{\langle |x|, 1\rangle}{\|1\|^2} \cdot 1 = \frac{1}{2}$. The best degree-1 approximation to $|x|$ on $[-1,1]$ in $L^2$ turns out to be the constant $1/2$.
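A quick grid-based check (a sketch; the grid resolution and the candidate list are arbitrary) that $a = 1/2$, $b = 0$ beats nearby candidates $a + bx$:

```python
import numpy as np

# L^2([-1,1]) squared error of approximating |x| by a + b*x, on a fine grid
xs = np.linspace(-1.0, 1.0, 200_001)

def sq_error(a, b):
    err = (np.abs(xs) - (a + b * xs))**2
    # trapezoid rule for the integral of err over [-1, 1]
    return np.sum((err[1:] + err[:-1]) / 2) * (xs[1] - xs[0])

best = sq_error(0.5, 0.0)   # the projection-theorem answer, error = 1/6
for a, b in [(0.4, 0.0), (0.6, 0.0), (0.5, 0.2), (0.5, -0.2)]:
    print(best < sq_error(a, b))  # True each time
```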

Parseval's Identity for a Simple Fourier Series

Consider $f(x) = x$ in $L^2([-\pi, \pi])$. The Fourier sine coefficients are $b_n = (2/\pi)\int_0^\pi x\sin(nx)\,dx = 2(-1)^{n+1}/n$. Parseval's identity gives:

$$\|f\|^2 = \int_{-\pi}^\pi x^2\,dx = \frac{2\pi^3}{3} = \pi \sum_{n=1}^\infty b_n^2 = \pi \sum_{n=1}^\infty \frac{4}{n^2}$$

This recovers the identity $\sum_{n=1}^\infty 1/n^2 = \pi^2/6$, showing that Parseval's identity is a non-trivial result even when evaluated on a single function.
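Numerically, the partial sums of $\pi \sum_n b_n^2$ do approach $2\pi^3/3$; a short NumPy check (the truncation point is arbitrary):

```python
import numpy as np

n = np.arange(1, 200_001)
b = 2.0 * (-1.0)**(n + 1) / n      # Fourier sine coefficients of f(x) = x
partial = np.pi * np.sum(b**2)     # pi * sum b_n^2, should approach 2*pi^3/3

# The tail of sum 1/n^2 beyond N is O(1/N), so the gap here is tiny
print(abs(partial - 2 * np.pi**3 / 3) < 1e-3)  # True
```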

Connections

Where Your Intuition Breaks

In finite dimensions, any subspace of $\mathbb{R}^n$ is closed, so the projection theorem always applies. In infinite-dimensional Hilbert spaces, subspaces can fail to be closed, and the projection theorem fails for non-closed subspaces. The polynomial subspace of $L^2([a,b])$ is dense but not closed: every function in $L^2$ can be approximated by polynomials (Weierstrass), but most $L^2$ functions are not polynomials. The projection onto the space of all polynomials is therefore not a well-defined map. Closedness is not a technicality; it is the condition that makes "best approximation" achievable, not merely approachable. In practice, kernel methods and regularization work precisely because they optimize over closed sets (such as RKHS balls).

💡Intuition

Hilbert spaces are special among Banach spaces precisely because the inner product gives a canonical identification of $H$ with $H^*$ via the Riesz theorem. In a general Banach space, $X$ and $X^*$ are different objects related by duality; in a Hilbert space they coincide isometrically. This "self-duality" enables geometric intuition: optimizing a bounded linear functional over a Hilbert space is the same as finding the closest element to the representing vector, turning optimization into projection.

💡Intuition

The kernel trick in machine learning is exactly the projection theorem applied implicitly. A kernel $k(x, y) = \langle \phi(x), \phi(y) \rangle_H$ defines a Hilbert space $H$ of functions. Training an SVM or fitting a Gaussian process finds the element of a closed subspace (the hypothesis class) closest to the "signal" in $H$; the algorithm never forms $\phi(x)$ explicitly but computes inner products via $k$, making infinite-dimensional optimization computationally tractable.

⚠️Warning

In infinite-dimensional Hilbert spaces, the closed unit ball $\{x : \|x\| \leq 1\}$ is not compact in the norm topology. Sequences on the unit sphere can have no convergent subsequence: for example, in $\ell^2$ the standard basis vectors $e_n$ satisfy $\|e_n - e_m\|^2 = 2$ for $n \neq m$, so no subsequence is Cauchy. This breaks finite-dimensional optimization intuition: existence of minimizers over the unit ball cannot be argued by sequential compactness in the norm topology. Instead, one uses weak compactness (the unit ball of a Hilbert space is weakly compact, by reflexivity).
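The $\|e_n - e_m\| = \sqrt{2}$ claim is easy to confirm; a tiny NumPy sketch truncating $\ell^2$ to $\mathbb{R}^{10}$ (the dimension is arbitrary, since $e_n$ and $e_m$ live in any sufficiently large truncation):

```python
import numpy as np

# Standard basis vectors: all unit norm, pairwise distance sqrt(2)
E = np.eye(10)
dists = [np.linalg.norm(E[i] - E[j]) for i in range(10) for j in range(i + 1, 10)]
print(np.allclose(dists, np.sqrt(2)))  # True: no subsequence can be Cauchy
```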

| Property      | $\mathbb{R}^n$ | $L^2([a,b])$ | $L^1([a,b])$ | $C([a,b])$ |
|---------------|----------------|--------------|--------------|------------|
| Inner product | Yes            | Yes          | No           | No         |
| Hilbert space | Yes            | Yes          | No           | No         |
| Banach space  | Yes            | Yes          | Yes          | Yes        |
| Reflexive     | Yes            | Yes          | No           | No         |
| Separable     | Yes            | Yes          | Yes          | Yes        |
