Vector Spaces & Linear Maps
Every machine learning model operates on vectors. When we write a gradient update $w \leftarrow w - \eta\,\nabla L(w)$, we are implicitly using the axioms of a vector space — the rules that make addition and scalar multiplication behave predictably. The update $w - \eta\,\nabla L(w)$ stays in the same space as $w$ because that space is closed under exactly these operations. Understanding the axiomatic foundation makes every downstream theorem in linear algebra inevitable rather than arbitrary: spectral decompositions, least-squares solutions, and backpropagation all follow from the same eight rules you will see in this lesson.
Concepts
*Isometry — grid rotates, lengths and angles preserved.* (interactive figure)
When you multiply a NumPy array by a scalar or add two arrays elementwise, you're already using vector space operations — the rules that make these behave predictably. The eight axioms below are not arbitrary: they are the minimal conditions that make span, basis, and dimension well-defined. Any set of objects that respects these rules — polynomials, matrices, functions — is a vector space and inherits the full machinery of linear algebra.
Fields
A field $\mathbb{F}$ is a set equipped with addition and multiplication that satisfies commutativity, associativity, distributivity, and the existence of additive and multiplicative identities and inverses (except division by zero). The two fields relevant throughout this module are the real numbers $\mathbb{R}$ and the complex numbers $\mathbb{C}$. Unless otherwise stated, assume $\mathbb{F} = \mathbb{R}$.
Vector Space Axioms
A vector space $V$ over a field $\mathbb{F}$ is a set equipped with two operations:
- Addition: $+ : V \times V \to V$, written $u + v$
- Scalar multiplication: $\cdot : \mathbb{F} \times V \to V$, written $\alpha v$
satisfying all eight axioms:
| # | Axiom | Statement |
|---|---|---|
| 1 | Commutativity | $u + v = v + u$ |
| 2 | Associativity (addition) | $(u + v) + w = u + (v + w)$ |
| 3 | Additive identity | $\exists\, 0 \in V$ such that $v + 0 = v$ |
| 4 | Additive inverse | $\forall v \in V$, $\exists\, {-v} \in V$ such that $v + (-v) = 0$ |
| 5 | Multiplicative identity | $1v = v$ |
| 6 | Associativity (scalar) | $\alpha(\beta v) = (\alpha\beta)v$ |
| 7 | Distributivity I | $\alpha(u + v) = \alpha u + \alpha v$ |
| 8 | Distributivity II | $(\alpha + \beta)v = \alpha v + \beta v$ |
Elements of $V$ are called vectors; elements of $\mathbb{F}$ are scalars.
The eight axioms are the minimum needed to guarantee that span, basis, and dimension are well-defined. Without distributivity (Axioms 7–8) you could not decompose a vector into basis components; without the additive inverse (Axiom 4) you could not subtract. Each axiom closes exactly one loophole that would otherwise break one of the geometric concepts linear algebra is built on.
The word "vector" does not mean "arrow in 2D or 3D space." It means any element of any set that satisfies these eight axioms. Polynomials, matrices, and functions are all vectors in the appropriate spaces.
Canonical examples:
- $\mathbb{R}^n$: tuples of real numbers with coordinatewise addition and scalar multiplication — the workhorse of ML
- $\mathbb{R}^{m \times n}$: real $m \times n$ matrices with elementwise addition and scalar multiplication; $\dim \mathbb{R}^{m \times n} = mn$
- $\mathcal{P}_n$: polynomials of degree at most $n$ with coefficients in $\mathbb{R}$; $\dim \mathcal{P}_n = n + 1$
- The zero space $\{0\}$: the trivial vector space with a single element; $\dim \{0\} = 0$
Non-examples: $\mathbb{R}^2$ with the scalar multiplication $\alpha \cdot (x, y) = (\alpha x, 0)$ fails Axiom 5 (since $1 \cdot (x, y) = (x, 0) \neq (x, y)$). The unit sphere $S^{n-1}$ is not a vector space because it is not closed under addition.
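These operations are exactly what NumPy performs elementwise, so the axioms are easy to spot-check numerically. A minimal sketch (all arrays and scalars are arbitrary choices) verifying both distributivity axioms on vectors and on matrices:

```python
import numpy as np

# Spot-check (not a proof): distributivity axioms on concrete arrays.
u = np.array([1.0, -2.0, 3.0])
v = np.array([0.5, 4.0, -1.0])
alpha, beta = 2.0, -3.0

# Axiom 7: alpha * (u + v) == alpha * u + alpha * v
assert np.allclose(alpha * (u + v), alpha * u + alpha * v)

# Axiom 8: (alpha + beta) * v == alpha * v + beta * v
assert np.allclose((alpha + beta) * v, alpha * v + beta * v)

# The same operations work on matrices: R^{m x n} is also a vector space.
A = np.arange(6.0).reshape(2, 3)
B = np.ones((2, 3))
assert np.allclose(alpha * (A + B), alpha * A + alpha * B)
```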
Subspaces
A nonempty subset $U \subseteq V$ is a subspace of $V$ if it is itself a vector space under the inherited operations. The subspace test reduces this to three conditions:

1. $0 \in U$
2. Closure under addition: $u, v \in U \implies u + v \in U$
3. Closure under scalar multiplication: $u \in U,\ \alpha \in \mathbb{F} \implies \alpha u \in U$

Conditions 2 and 3 together mean $U$ is closed under linear combinations. They imply Condition 1 for any nonempty $U$ (take $\alpha = 0$ in Condition 3).
Key examples that appear throughout ML, for $A \in \mathbb{R}^{m \times n}$:

- The null space $N(A) = \{x \in \mathbb{R}^n : Ax = 0\}$
- The column space $C(A) = \{Ax : x \in \mathbb{R}^n\}$

Both satisfy the subspace test (verify: if $Ax = 0$ and $Ay = 0$, then $A(x + y) = Ax + Ay = 0$, which is null space closure under addition).
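A quick numerical illustration of null-space closure, using a small rank-deficient matrix of our own choosing:

```python
import numpy as np

# Illustrative check: linear combinations of null-space vectors stay in
# the null space. This particular A (rank 1) is an arbitrary choice.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # row 2 = 2 * row 1

# Two vectors in N(A): A @ x == 0 and A @ y == 0.
x = np.array([-2.0, 1.0, 0.0])   # -2*1 + 1*2 + 0*3 = 0
y = np.array([-3.0, 0.0, 1.0])   # -3*1 + 0*2 + 1*3 = 0
assert np.allclose(A @ x, 0) and np.allclose(A @ y, 0)

# Closure under addition and scaling.
assert np.allclose(A @ (x + y), 0)
assert np.allclose(A @ (5.0 * x - 2.0 * y), 0)
```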
Span, Linear Independence, and Bases
Span: The span of a set $S \subseteq V$ is the set of all finite linear combinations:

$$\operatorname{span}(S) = \{\alpha_1 v_1 + \cdots + \alpha_k v_k : v_i \in S,\ \alpha_i \in \mathbb{F},\ k \in \mathbb{N}\}$$

$\operatorname{span}(S)$ is always a subspace of $V$ — the smallest subspace containing $S$.
Linear independence: Vectors $v_1, \dots, v_k$ are linearly independent if the only solution to $\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$ is $\alpha_1 = \cdots = \alpha_k = 0$. Equivalently, no $v_i$ lies in the span of the others.
Basis: A set $\{v_1, \dots, v_n\}$ is a basis for $V$ if it is linearly independent and $\operatorname{span}\{v_1, \dots, v_n\} = V$. Equivalently, every $v \in V$ has a unique representation $v = \alpha_1 v_1 + \cdots + \alpha_n v_n$.
Theorem (consequence of the Steinitz Exchange Lemma): Any two bases of a finite-dimensional vector space $V$ have the same cardinality. This common cardinality is the dimension of $V$, written $\dim V$.
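Linear independence can be checked numerically by stacking the candidate vectors as columns and comparing the matrix rank to the number of vectors; a sketch with hand-picked vectors:

```python
import numpy as np

# Independence test: rank equals the number of vectors iff they are
# linearly independent. The vectors below are illustrative choices.
independent = np.column_stack([[1, 0, 0], [1, 1, 0], [1, 1, 1]])
dependent   = np.column_stack([[1, 0, 0], [0, 1, 0], [1, 1, 0]])

assert np.linalg.matrix_rank(independent) == 3  # a basis for R^3
assert np.linalg.matrix_rank(dependent) == 2    # third = first + second
```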
Linear Maps
A function $T : V \to W$ between vector spaces over the same field is linear (a linear map or linear transformation) if for all $u, v \in V$ and $\alpha \in \mathbb{F}$:

$$T(u + v) = T(u) + T(v), \qquad T(\alpha v) = \alpha T(v)$$

Linearity has two immediate consequences: $T(0) = 0$, and $T$ is completely determined by its values on any basis.
Canonical examples:
- $T(x) = Ax$ for $A \in \mathbb{R}^{m \times n}$: matrix multiplication is the canonical linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$
- Differentiation $D : \mathcal{P}_n \to \mathcal{P}_{n-1}$, $D(p) = p'$: linear because $(p + q)' = p' + q'$ and $(\alpha p)' = \alpha p'$
- Orthogonal projection onto a subspace $U \subseteq V$: linear by the linearity of the projection formula
Note on neural networks: A neural network layer $f(x) = Wx + b$ with $b \neq 0$ is affine, not linear (it fails $f(0) = 0$). The distinction matters when analyzing expressivity and composability.
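A small sketch of the distinction, with a randomly initialized toy layer (shapes and weights are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))
b = rng.normal(size=4)

def layer(x):
    """A toy affine layer f(x) = Wx + b (weights chosen at random)."""
    return W @ x + b

x, y = rng.normal(size=3), rng.normal(size=3)

# Linearity fails: f(x + y) differs from f(x) + f(y) by exactly b,
# because b is counted twice on the right-hand side.
assert not np.allclose(layer(x + y), layer(x) + layer(y))
# And f(0) = b, not 0:
assert np.allclose(layer(np.zeros(3)), b)
```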
Kernel and image: For a linear map $T : V \to W$:

$$\ker T = \{v \in V : T(v) = 0\} \subseteq V, \qquad \operatorname{im} T = \{T(v) : v \in V\} \subseteq W$$
Both are subspaces (verify by the subspace test). The kernel measures how much information collapses; the image measures what outputs can produce.
Imagine the transformation that takes any 2D vector and collapses it onto the x-axis: $T(x, y) = (x, 0)$. The kernel of this map is the entire y-axis — all vectors $(0, y)$ map to zero. The image is just the x-axis. This is a rank-1 linear map: it destroys one dimension of information irreversibly.
The Rank-Nullity Theorem
Theorem (Rank-Nullity / Dimension Theorem): For any linear map $T : V \to W$ with $\dim V = n$:

$$\dim \ker T + \dim \operatorname{im} T = n$$
Proof. Let $k = \dim \ker T$ and choose a basis $\{u_1, \dots, u_k\}$ for $\ker T$. By the basis extension theorem, extend to a basis $\{u_1, \dots, u_k, w_1, \dots, w_r\}$ for $V$, so $k + r = n$.

We claim $\{T(w_1), \dots, T(w_r)\}$ is a basis for $\operatorname{im} T$.

Spanning: For any $v \in V$, write $v = \sum_i \alpha_i u_i + \sum_j \beta_j w_j$ for some $\alpha_i, \beta_j \in \mathbb{F}$. Then $T(v) = \sum_j \beta_j T(w_j)$, since $T(u_i) = 0$.

Independence: Suppose $\sum_j \beta_j T(w_j) = 0$. Then $T\big(\sum_j \beta_j w_j\big) = 0$, so $\sum_j \beta_j w_j \in \ker T$. Writing $\sum_j \beta_j w_j = \sum_i \alpha_i u_i$ and using independence of the full basis gives all $\beta_j = 0$.

Therefore $\dim \operatorname{im} T = r$, and $\dim \ker T + \dim \operatorname{im} T = k + r = n$. ∎
Rank-nullity is a dimension budget. A linear map cannot create new dimensions: every dimension lost to the kernel is a dimension denied to the image. If $W \in \mathbb{R}^{10 \times 512}$ is a classification layer, rank-nullity says at least $512 - 10 = 502$ dimensions of input are discarded.
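The dimension budget is easy to observe numerically. A sketch with a random wide matrix (the shapes are illustrative choices):

```python
import numpy as np

# Rank-nullity in numbers: rank(W) + dim N(W) = n for W in R^{m x n}.
rng = np.random.default_rng(1)
m, n = 10, 512
W = rng.normal(size=(m, n))          # a random 10 x 512 "layer"

rank = np.linalg.matrix_rank(W)      # full row rank almost surely
nullity = n - rank                   # dimensions erased by W
assert rank == 10
assert nullity == 502                # 502 input directions discarded
```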
Matrix Representations and Change of Basis
Given ordered bases $\mathcal{B} = (v_1, \dots, v_n)$ for $V$ and $\mathcal{C} = (w_1, \dots, w_m)$ for $W$, the matrix of $T$ with respect to $(\mathcal{B}, \mathcal{C})$ is the $m \times n$ matrix $A$ whose $j$-th column is the coordinate vector of $T(v_j)$ in basis $\mathcal{C}$:

$$T(v_j) = \sum_{i=1}^{m} A_{ij}\, w_i$$
Change of basis. If $P$ is the invertible change-of-basis matrix from $\mathcal{B}$ to the standard basis (columns are the basis vectors expressed in standard coordinates), then the matrix of $T$ in $\mathcal{B}$ is related to the standard-basis matrix $A$ by:

$$A_{\mathcal{B}} = P^{-1} A P$$
This is matrix similarity. Two matrices $A$ and $B$ are similar ($A \sim B$) if $B = P^{-1} A P$ for some invertible $P$. Similar matrices represent the same linear map in different bases and share: eigenvalues, determinant, trace, and rank.
For a symmetric matrix $A$, the spectral theorem (Lesson 4) guarantees the existence of an orthonormal eigenbasis. In that basis, $A$ is diagonal — the simplest possible matrix representation. Change of basis is the mechanism by which this simplification happens.
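A numerical spot-check of the shared invariants of similar matrices, with $A$ and $P$ drawn at random (both are arbitrary choices):

```python
import numpy as np

# Similar matrices B = P^{-1} A P share eigenvalues, trace, det, rank.
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
P = rng.normal(size=(4, 4))          # invertible with probability 1
B = np.linalg.inv(P) @ A @ P

# Compare eigenvalue multisets (rounded so sorting is stable).
eigA = np.sort_complex(np.round(np.linalg.eigvals(A), 8))
eigB = np.sort_complex(np.round(np.linalg.eigvals(B), 8))
assert np.allclose(eigA, eigB)

assert np.isclose(np.trace(A), np.trace(B))
assert np.isclose(np.linalg.det(A), np.linalg.det(B))
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(B)
```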
Worked Example
Example 1: Verifying the Null Space
Let $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 8 \end{pmatrix}$. Find $N(A)$ and apply the subspace test.

Row reduction:

$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 5 & 8 \end{pmatrix} \xrightarrow{R_2 \to R_2 - 2R_1} \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \end{pmatrix} \xrightarrow{R_1 \to R_1 - 2R_2} \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix}$$

Free variable: $x_3 = t$. Back-substitution: $x_1 = t$, $x_2 = -2t$. So:

$$N(A) = \operatorname{span}\left\{ \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} \right\}$$

Subspace test:
- $0 \in N(A)$ since $A0 = 0$ ✓
- If $x, y \in N(A)$ then $A(x + y) = Ax + Ay = 0$ ✓
- If $x \in N(A)$ then $A(\alpha x) = \alpha Ax = 0$ ✓
Example 2: Rank-Nullity in Action
For the same $A$: two pivot columns give $\operatorname{rank}(A) = 2$; one free variable gives $\dim N(A) = 1$. Check: $2 + 1 = 3 = n$.

Since $\operatorname{rank}(A) = 2 = m$, the column space is all of $\mathbb{R}^2$: the map is surjective, and $Ax = b$ has at least one solution for every $b \in \mathbb{R}^2$. Since the null space is nontrivial, the solution is not unique.
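Both examples can be verified numerically; the sketch below uses a sample $2 \times 3$ matrix of rank 2 matching the setup above (this particular $A$ is an illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 8.0]])

# Rank-nullity: 2 pivots, 1 free variable, 2 + 1 = 3.
rank = np.linalg.matrix_rank(A)
assert rank == 2 and (A.shape[1] - rank) == 1

# The null space is spanned by (1, -2, 1): A @ v = 0.
v = np.array([1.0, -2.0, 1.0])
assert np.allclose(A @ v, 0)

# Full row rank => A x = b is solvable for every b (here via least squares,
# which returns an exact solution when the residual is zero).
b = np.array([4.0, -7.0])
x, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(A @ x, b)
```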
Example 3: Change of Basis for a 2D Rotation
Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be rotation by $\theta = \pi/2$. In the standard basis:

$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

Choose the orthonormal basis $\mathcal{B} = \left\{ \tfrac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix},\ \tfrac{1}{\sqrt{2}}\begin{pmatrix} -1 \\ 1 \end{pmatrix} \right\}$.

The change-of-basis matrix and its inverse:

$$P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad P^{-1} = P^{\top} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}$$

(Since $P$ is orthogonal, $P^{-1} = P^{\top}$.) Computing $A_{\mathcal{B}} = P^{-1} A P$:

$$A_{\mathcal{B}} = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

The matrix is unchanged: a rotation looks the same in any rotated orthonormal basis, exactly because similar matrices represent the same map.

This previews the spectral theorem: for symmetric matrices, one can always choose $P$ to be an eigenbasis, making $A_{\mathcal{B}}$ diagonal.
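The change-of-basis computation can be replayed in NumPy (assuming a rotation angle of $\pi/2$ and a 45°-rotated orthonormal basis, both illustrative choices):

```python
import numpy as np

# Compute P^{-1} A P for a 2D rotation in a rotated orthonormal basis.
theta = np.pi / 2
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# Columns of P are the new basis vectors in standard coordinates.
P = (1 / np.sqrt(2)) * np.array([[1.0, -1.0],
                                 [1.0,  1.0]])

B = P.T @ A @ P  # P is orthogonal, so P^{-1} = P.T
# A 2D rotation looks the same in any rotated orthonormal basis:
assert np.allclose(B, A)
```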
Connections
Where Your Intuition Breaks
*A higher-dimensional vector space always has more expressive power.* Not quite: the ambient dimension and the effective rank of a linear map are entirely different things. A weight matrix $W \in \mathbb{R}^{512 \times 512}$ acts on 512-dimensional inputs, but if $\operatorname{rank}(W) = 10$, the map erases 502 input directions — inputs differing only in those directions produce identical outputs. LoRA exploits this directly: the low-rank update $\Delta W = BA$ with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times d}$, and $r \ll d$ adds at most $r$ new expressive directions regardless of how large $d$ is. Rank is the true measure of capacity; dimension is just the ambient container.
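A sketch of the rank bound behind LoRA (the dimensions and the rank $r$ below are illustrative choices):

```python
import numpy as np

# rank(B @ A) <= r no matter how large the ambient dimension d is.
rng = np.random.default_rng(3)
d, r = 512, 8
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, d))

delta_W = B @ A            # d x d update, but only r expressive directions
assert delta_W.shape == (512, 512)
assert np.linalg.matrix_rank(delta_W) == 8
```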
Invertibility and the Four Fundamental Subspaces
For $A \in \mathbb{R}^{m \times n}$ with rank $r$, the four fundamental subspaces partition the domain and codomain:

| Subspace | Lives in | Dimension | Role |
|---|---|---|---|
| Column space $C(A)$ | $\mathbb{R}^m$ | $r$ | Reachable outputs |
| Left null space $N(A^{\top})$ | $\mathbb{R}^m$ | $m - r$ | Unreachable directions |
| Row space $C(A^{\top})$ | $\mathbb{R}^n$ | $r$ | Inputs that matter |
| Null space $N(A)$ | $\mathbb{R}^n$ | $n - r$ | Inputs that are erased |
The fundamental theorem of linear algebra states: $C(A^{\top}) \perp N(A)$ with $C(A^{\top}) \oplus N(A) = \mathbb{R}^n$, and $C(A) \perp N(A^{\top})$ with $C(A) \oplus N(A^{\top}) = \mathbb{R}^m$.
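A numerical check of the orthogonality $C(A^{\top}) \perp N(A)$, with the null-space basis taken from the SVD (the matrix is an arbitrary random choice):

```python
import numpy as np

# Every row of A is orthogonal to every null-space vector.
rng = np.random.default_rng(4)
A = rng.normal(size=(3, 6))          # 3 x 6, full row rank almost surely

rank = np.linalg.matrix_rank(A)
_, _, Vt = np.linalg.svd(A)
N = Vt[rank:].T                      # columns: orthonormal basis of N(A)

assert N.shape == (6, 6 - rank)      # dim N(A) = n - r
assert np.allclose(A @ N, 0)         # rows of A ⊥ N(A), i.e. C(A^T) ⊥ N(A)
```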
Invertibility conditions:
| Condition | Geometric meaning | Consequence |
|---|---|---|
| $r = m = n$ (square, full rank) | Bijective — no information lost or missed | $A^{-1}$ exists; $Ax = b$ has a unique solution |
| $r = m < n$ (wide, full row rank) | Surjective — all outputs reachable | $Ax = b$ has solutions; infinitely many |
| $r = n < m$ (tall, full col rank) | Injective — no kernel | $Ax = b$ has at most one solution; may have none |
| $r < \min(m, n)$ (rank deficient) | Neither injective nor surjective | Kernel is nontrivial; image is a proper subspace |
Common Pitfalls
Confusing linear with affine. Every neural network layer $f(x) = Wx + b$ is affine when $b \neq 0$. Affine maps are not linear maps ($f(u + v) \neq f(u) + f(v)$ whenever $b \neq 0$), although the composition of two affine maps is still affine. This distinction matters for analyzing equivariance, scaling laws, and composability proofs.
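A sketch showing why composing affine layers without nonlinearities collapses to a single affine layer (all weights below are random illustrative choices):

```python
import numpy as np

# Two stacked affine layers equal one affine layer with
# W = W2 @ W1 and b = W2 @ b1 + b2.
rng = np.random.default_rng(5)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)
assert np.allclose(two_layers, one_layer)
```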
Assuming all subspaces are hyperplanes. A subspace of $\mathbb{R}^n$ can have any dimension from $0$ to $n$. In particular, $\mathbb{R}^n$ is a subspace of itself.
Rank deficiency as a failure mode. If a weight matrix $W$ becomes rank deficient during training, the model can no longer distinguish inputs that differ only by an element of $N(W)$. This is one motivation for rank regularization and for LoRA fine-tuning: the low-rank update explicitly constrains $\operatorname{rank}(\Delta W) \le r$.
Rank-Nullity as Algorithm Design
The rank-nullity theorem is a design constraint, not just a theorem. When building a compression layer $W \in \mathbb{R}^{k \times d}$ with $k < d$, you are choosing to discard at least $d - k$ dimensions. The question is which dimensions. PCA (Lesson 5) answers this optimally in a least-squares sense; attention (Module 13, Lesson 7) answers it adaptively based on context.
Every result in this module is an answer to the same question: given a linear map , what is the most useful basis to express it in? The spectral theorem (Lesson 4) answers this for symmetric matrices. SVD (Lesson 5) answers it for arbitrary matrices. Matrix calculus (Lesson 8) asks how the answer changes when we perturb the map.