Positive Semidefinite Matrices & Quadratic Forms
Positive semidefinite (PSD) matrices are the matrix analogue of non-negative numbers: they define valid inner products, encode covariances, kernels, and curvatures, and form a convex cone under matrix addition and positive scaling. The PSD condition is the central constraint in semidefinite programming, Gaussian process models, and the analysis of loss surfaces in optimization. Understanding the multiple equivalent characterizations of PSD-ness — via eigenvalues, Cholesky, principal minors, and Gram matrices — is essential for both theory and numerical practice.
Concepts
*Interactive figure: cross-section of the PSD cone over trace-2 symmetric matrices $A = \begin{pmatrix} 1+\Delta & b \\ b & 1-\Delta \end{pmatrix}$, with live readouts of the eigenvalues, determinant, trace, and Cholesky factor. The green disk $b^2 + \Delta^2 < 1$ is the PD cone cross-section (trace fixed = 2). Drag inside for PD, on the boundary for PSD, outside for indefinite.*
A covariance matrix must not assign negative variance to any direction — that would be geometrically impossible. Positive semidefinite (PSD) matrices are the formal expression of this constraint: they are the matrix analogue of non-negative numbers, forming a convex cone. Every covariance matrix, kernel matrix, Gram matrix, and positive-curvature Hessian is PSD, and the theory of these objects hinges on this single constraint.
Definitions and Notation
A symmetric matrix $A \in \mathbb{R}^{n \times n}$ is:

| Name | Symbol | Condition |
|---|---|---|
| Positive definite | $A \succ 0$ | $x^\top A x > 0$ for all $x \neq 0$ |
| Positive semidefinite | $A \succeq 0$ | $x^\top A x \geq 0$ for all $x$ |
| Negative definite | $A \prec 0$ | $x^\top A x < 0$ for all $x \neq 0$ |
| Negative semidefinite | $A \preceq 0$ | $x^\top A x \leq 0$ for all $x$ |
| Indefinite | — | $x^\top A x$ takes both positive and negative values |
The condition $x^\top A x \geq 0$ for all $x$ is exactly what ensures $\lVert x \rVert_A^2 = x^\top A x$ defines a valid squared length. A matrix violating this would assign negative "squared distance" to some direction — geometrically impossible, and numerically catastrophic (Cholesky factorization would fail, log-likelihoods would be undefined).

The ordering $A \succeq B$ (the Loewner order) means $A - B \succeq 0$. This is a partial order on symmetric matrices that is compatible with scalar ordering: $A \succeq B$ implies $cA \succeq cB$ for $c \geq 0$, and $A + C \succeq B + C$ for any symmetric $C$.
Equivalent Characterizations
Theorem. For a symmetric $A \in \mathbb{R}^{n \times n}$, the following are equivalent:
- $x^\top A x \geq 0$ for all $x$ (definition)
- All eigenvalues $\lambda_i \geq 0$
- $A = B^\top B$ for some $B$ (Gram matrix form)
- All principal minors are $\geq 0$ (for semidefiniteness the *leading* minors alone do not suffice: $\mathrm{diag}(0, -1)$ has leading minors $0, 0$ but is not PSD)
- The Cholesky factorization $A = LL^\top$ exists (with possible zero diagonal entries)
For positive definite $A$, all the inequalities become strict:
- All $\lambda_i > 0$
- All leading principal minors $\det A_{1:k, 1:k} > 0$ for $k = 1, \dots, n$ (Sylvester's criterion)
- Cholesky $A = LL^\top$ with $L$ lower triangular and positive diagonal exists uniquely
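These characterizations are easy to verify numerically. A minimal NumPy sketch (the specific matrices are chosen for illustration): a Gram-form matrix passes both the eigenvalue and the Cholesky test, while an indefinite matrix fails both.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B.T @ B  # Gram form: PSD by construction (PD with probability 1)

# Characterization: all eigenvalues non-negative
eigs = np.linalg.eigvalsh(A)
assert np.all(eigs >= -1e-12)

# Characterization: Cholesky factorization exists
L = np.linalg.cholesky(A)
assert np.allclose(L @ L.T, A)

# An indefinite matrix (eigenvalues 3 and -1) fails both tests
M = np.array([[1.0, 2.0], [2.0, 1.0]])
assert np.linalg.eigvalsh(M).min() < 0
try:
    np.linalg.cholesky(M)
    chol_failed = False
except np.linalg.LinAlgError:
    chol_failed = True
```

In floating point, "eigenvalue $\geq 0$" should always be tested against a small negative tolerance, since round-off can push a true zero eigenvalue slightly negative.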
The PSD Cone
The set of PSD matrices, denoted $\mathcal{S}^n_+$, is a convex cone:
- $\alpha A + \beta B \succeq 0$ for $A, B \succeq 0$ and $\alpha, \beta \geq 0$
- The set is closed and convex

For $n = 2$, a trace-2 symmetric matrix has the form $A = \begin{pmatrix} 1+\Delta & b \\ b & 1-\Delta \end{pmatrix}$. It is PD iff $\det A = 1 - \Delta^2 - b^2 > 0$, i.e., $b^2 + \Delta^2 < 1$ — the PD cone cross-section (at fixed trace) is a disk (illustrated in the diagram at the top of this lesson).
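The disk condition can be checked directly: the eigenvalues of this parameterized matrix are $1 \pm \sqrt{\Delta^2 + b^2}$, so the sign of the smallest eigenvalue flips exactly on the unit circle. A short sketch with three sample points:

```python
import numpy as np

def trace2_matrix(delta, b):
    """Symmetric matrix with trace fixed at 2, parameterized by (delta, b)."""
    return np.array([[1 + delta, b], [b, 1 - delta]])

# Eigenvalues are 1 ± sqrt(delta^2 + b^2): PD inside the unit disk,
# singular on the boundary, indefinite outside.
inside = np.linalg.eigvalsh(trace2_matrix(0.3, 0.4))    # radius 0.5
boundary = np.linalg.eigvalsh(trace2_matrix(0.6, 0.8))  # radius 1.0
outside = np.linalg.eigvalsh(trace2_matrix(1.2, 0.9))   # radius 1.5

assert inside.min() > 0            # positive definite
assert abs(boundary.min()) < 1e-9  # PSD, rank deficient
assert outside.min() < 0           # indefinite
```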
Gram Matrices and Kernel Matrices
Gram matrix. Given vectors $v_1, \dots, v_n \in \mathbb{R}^d$, the Gram matrix $G$ with $G_{ij} = \langle v_i, v_j \rangle$ is PSD. Proof: for any $c \in \mathbb{R}^n$:

$$c^\top G c = \sum_{i,j} c_i c_j \langle v_i, v_j \rangle = \Big\lVert \sum_i c_i v_i \Big\rVert^2 \geq 0$$

Kernel matrix. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a positive definite kernel if for all $n$ and all $x_1, \dots, x_n \in \mathcal{X}$, the kernel (Gram) matrix $K_{ij} = k(x_i, x_j)$ is PSD. This is Mercer's condition. Examples:
| Kernel | Formula | Notes |
|---|---|---|
| Linear | $k(x, y) = x^\top y$ | Gram matrix of data |
| Polynomial | $k(x, y) = (x^\top y + c)^d$, $c \geq 0$ | |
| Gaussian RBF | $k(x, y) = \exp(-\lVert x - y \rVert^2 / 2\ell^2)$ | $\ell$ = length scale |
| Laplace | $k(x, y) = \exp(-\lVert x - y \rVert / \ell)$ | Rougher than RBF |
| Matérn | (various) | Parameterized smoothness |
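Mercer's condition can be spot-checked numerically: build the RBF kernel matrix for a random point set and confirm its smallest eigenvalue is non-negative (up to round-off). The helper below is a minimal sketch, not a library API:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 l^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # squared distances
    return np.exp(-np.maximum(d2, 0) / (2 * lengthscale**2))

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
K = rbf_kernel(X, lengthscale=0.7)

# Mercer's condition: the kernel matrix is PSD for ANY set of points
min_eig = np.linalg.eigvalsh(K).min()
assert min_eig >= -1e-10
```

The `np.maximum(d2, 0)` guards against tiny negative squared distances produced by cancellation in the expanded distance formula.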
The Schur Complement
For a block symmetric matrix $M = \begin{pmatrix} A & B \\ B^\top & C \end{pmatrix}$ with $A \succ 0$:

Theorem. $M \succeq 0 \iff C - B^\top A^{-1} B \succeq 0$.

The matrix $S = C - B^\top A^{-1} B$ is the Schur complement of $A$ in $M$.

Why it matters: The Schur complement arises in Gaussian conditioning. If $\begin{pmatrix} x \\ y \end{pmatrix} \sim \mathcal{N}\left( \mu, \begin{pmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{pmatrix} \right)$, then the conditional covariance of $y \mid x$ is exactly the Schur complement $\Sigma_{yy} - \Sigma_{yx} \Sigma_{xx}^{-1} \Sigma_{xy}$ — the PSD Schur complement theorem guarantees this is a valid (non-negative) covariance.
Completing the square. The block factorization:

$$M = \begin{pmatrix} I & 0 \\ B^\top A^{-1} & I \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix} \begin{pmatrix} I & A^{-1} B \\ 0 & I \end{pmatrix}$$

shows that $\det M = \det A \cdot \det S$ — the Schur complement determinant formula.
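Both the determinant formula and the PSD-ness of the Schur complement are easy to verify on a random PD block matrix (a sketch; the 2 + 3 partition is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
G = rng.standard_normal((5, 5))
M = G @ G.T + 0.1 * np.eye(5)   # PD matrix, partitioned into 2 + 3 blocks

A, B, C = M[:2, :2], M[:2, 2:], M[2:, 2:]
S = C - B.T @ np.linalg.solve(A, B)   # Schur complement of A in M

# det(M) = det(A) * det(S)
assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(S))

# M ≻ 0 with A ≻ 0 implies the Schur complement is PD
assert np.linalg.eigvalsh(S).min() > 0
```

Note the use of `np.linalg.solve(A, B)` rather than forming $A^{-1}$ explicitly — numerically preferable whenever an inverse only appears multiplied by another matrix.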
Matrix Square Root and Powers
For $A \succeq 0$ with spectral decomposition $A = Q \Lambda Q^\top$, define:

$$A^{1/2} = Q \Lambda^{1/2} Q^\top$$

Properties: $A^{1/2} \succeq 0$, $(A^{1/2})^2 = A$, and $A^{1/2}$ is unique among PSD matrices.

More generally, $f(A) = Q f(\Lambda) Q^\top$ for any function $f$ applied entrywise to the eigenvalues. For a PD matrix: $A^{-1/2} = Q \Lambda^{-1/2} Q^\top$ is the inverse square root, used in whitening transformations.
Whitening. Given covariance $\Sigma \succ 0$ and data vector $x$ with $\operatorname{Cov}(x) = \Sigma$, the whitened vector $z = \Sigma^{-1/2}(x - \mathbb{E}[x])$ has $\operatorname{Cov}(z) = I$. Whitening is the preprocessing step that removes all linear correlations, mapping arbitrary Gaussians to standard Gaussians.
Operations Preserving PSD-ness
The following operations map PSD matrices to PSD matrices:

| Operation | Result | Proof |
|---|---|---|
| $\alpha A + \beta B$ for $\alpha, \beta \geq 0$ | PSD | Convex cone |
| $X^\top A X$ for any $X$ | PSD | Gram form: $v^\top X^\top A X v = (Xv)^\top A (Xv) \geq 0$ |
| $A \circ B$ (Hadamard product) | PSD | Schur product theorem |
| $A^k$, all $k \geq 1$ | PSD | Eigenvalues $\lambda_i^k \geq 0$ |
Schur product theorem. If $A, B \succeq 0$, then the elementwise (Hadamard) product $A \circ B$ (with $(A \circ B)_{ij} = A_{ij} B_{ij}$) is also PSD. Corollary: the pointwise product of two positive definite kernels is a positive definite kernel.
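The theorem is non-obvious (the ordinary matrix product $AB$ of two PSD matrices is generally not even symmetric), but easy to check numerically:

```python
import numpy as np

rng = np.random.default_rng(4)
GA = rng.standard_normal((6, 6))
GB = rng.standard_normal((6, 6))
A, B = GA @ GA.T, GB @ GB.T   # two PSD matrices in Gram form

had = A * B   # elementwise (Hadamard / Schur) product — NOT A @ B
min_eig = np.linalg.eigvalsh(had).min()
assert min_eig >= -1e-9   # Schur product theorem: A ∘ B is PSD
```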
Worked Example
Example 1: Sylvester's Criterion
Test $A = \begin{pmatrix} 4 & 12 & -16 \\ 12 & 37 & -43 \\ -16 & -43 & 98 \end{pmatrix}$ for positive definiteness.

Leading principal minors:
- $1 \times 1$: $\det(4) = 4 > 0$ ✓
- $2 \times 2$: $4 \cdot 37 - 12 \cdot 12 = 4 > 0$ ✓
- $3 \times 3$: $\det A = 36 > 0$ ✓

All minors positive, so $A \succ 0$ by Sylvester's criterion. The Cholesky factor is:

$$L = \begin{pmatrix} 2 & 0 & 0 \\ 6 & 1 & 0 \\ -8 & 5 & 3 \end{pmatrix}$$
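Sylvester's criterion and the Cholesky factor can be checked numerically; the $3 \times 3$ PD matrix below is a classic illustration (note $\det A = (\det L)^2 = 6^2 = 36$):

```python
import numpy as np

A = np.array([[  4.0,  12.0, -16.0],
              [ 12.0,  37.0, -43.0],
              [-16.0, -43.0,  98.0]])

# Leading principal minors (Sylvester: all must be > 0 for PD)
minors = [np.linalg.det(A[:k, :k]) for k in (1, 2, 3)]
assert all(m > 0 for m in minors)

L = np.linalg.cholesky(A)   # lower triangular with positive diagonal
assert np.allclose(L, [[2.0, 0.0, 0.0],
                       [6.0, 1.0, 0.0],
                       [-8.0, 5.0, 3.0]])
assert np.allclose(L @ L.T, A)
```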
Example 2: Covariance Matrix and Its Inverse
If the covariance matrix is $\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$, then $\Sigma \succ 0$ (eigenvalues $3$ and $1$).

The precision matrix $\Sigma^{-1} = \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}$ is also PD (eigenvalues $= 1/\lambda_i$, here $1$ and $1/3$). The precision matrix encodes partial correlations: $(\Sigma^{-1})_{ij} \neq 0$ means $x_i$ and $x_j$ are conditionally dependent (their partial correlation is nonzero).

The Cholesky of the precision is used in sparse Gaussian graphical models (glasso), where sparsity of $\Sigma^{-1}$ encodes conditional independence structure.
Example 3: Gaussian Conditioning via Schur Complement
Let $\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \sigma_X^2 & \rho \sigma_X \sigma_Y \\ \rho \sigma_X \sigma_Y & \sigma_Y^2 \end{pmatrix} \right)$.

Given $X = x$, the conditional distribution is:

$$Y \mid X = x \;\sim\; \mathcal{N}\left( \mu_Y + \rho \frac{\sigma_Y}{\sigma_X}(x - \mu_X), \;\; \sigma_Y^2 (1 - \rho^2) \right)$$

The conditional variance is the Schur complement of $\sigma_X^2$ in $\Sigma$:

$$\sigma_Y^2 - \frac{(\rho \sigma_X \sigma_Y)^2}{\sigma_X^2} = \sigma_Y^2 (1 - \rho^2)$$

The PSD-ness of the Schur complement is what guarantees the conditional variance is non-negative (and equals zero only when $\rho = \pm 1$, i.e., perfect linear relationship).
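A quick numerical check of the Schur complement identity, with illustrative values $\sigma_X = 1.5$, $\sigma_Y = 2$, $\rho = 0.8$ (chosen arbitrarily for this sketch):

```python
import numpy as np

sx, sy, rho = 1.5, 2.0, 0.8
Sigma = np.array([[sx**2,           rho * sx * sy],
                  [rho * sx * sy,   sy**2        ]])

# Schur complement of sigma_X^2 in Sigma = conditional variance of Y | X
schur = Sigma[1, 1] - Sigma[1, 0] * Sigma[0, 1] / Sigma[0, 0]

assert np.isclose(schur, sy**2 * (1 - rho**2))   # = 4 * 0.36 = 1.44
assert schur >= 0   # guaranteed because Sigma is PSD
```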
Connections
Where Your Intuition Breaks
"A matrix with all positive entries is positive definite." These are entirely different properties. The matrix $\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ has all positive entries, but its eigenvalues are $3$ and $-1$, so it is indefinite — there exist directions where $x^\top A x < 0$ (e.g. $x = (1, -1)$ gives $x^\top A x = -2$). Conversely, the matrix $\begin{pmatrix} 1 & -0.9 \\ -0.9 & 1 \end{pmatrix}$ has negative off-diagonal entries but is positive definite (eigenvalues 1.9 and 0.1). Positive definiteness is a property of the quadratic form the matrix defines, not a property of the individual entries. Always check eigenvalues or Sylvester's criterion, not entry signs.
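The two counterexamples in one short check:

```python
import numpy as np

# All-positive entries, yet indefinite
A = np.array([[1.0, 2.0], [2.0, 1.0]])
assert np.linalg.eigvalsh(A).min() < 0        # eigenvalues are -1 and 3
x = np.array([1.0, -1.0])
assert np.isclose(x @ A @ x, -2.0)            # a direction of negative curvature

# Negative off-diagonals, yet positive definite
B = np.array([[1.0, -0.9], [-0.9, 1.0]])
assert np.linalg.eigvalsh(B).min() > 0        # eigenvalues are 0.1 and 1.9
```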
PSD in Optimization: Second-Order Conditions
Theorem (Second-Order Optimality). For a twice-differentiable $f : \mathbb{R}^n \to \mathbb{R}$:
- Necessary condition for local min: $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succeq 0$
- Sufficient condition for strict local min: $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*) \succ 0$
- Convex function: $f$ is convex $\iff$ $\nabla^2 f(x) \succeq 0$ for all $x$
The eigenvalues of the Hessian at a critical point determine its nature: all positive → minimum, all negative → maximum, mixed signs → saddle point.
Saddle points dominate in high dimensions. For a random function of $n$ variables, a critical point has probability roughly $2^{-n}$ of being a local minimum (all $n$ Hessian eigenvalues must be positive). In high-dimensional loss landscapes (neural networks with millions of parameters), virtually all critical points are saddle points. This is one reason why gradient descent in deep learning doesn't get stuck — it escapes saddles efficiently, often faster than theory predicts.
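The eigenvalue test for classifying critical points can be packaged as a small helper (a sketch; the tolerance handling for near-zero eigenvalues is a practical necessity, not part of the theorem):

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of its Hessian H."""
    eigs = np.linalg.eigvalsh(H)
    if np.all(eigs > tol):
        return "local minimum"       # H ≻ 0
    if np.all(eigs < -tol):
        return "local maximum"       # H ≺ 0
    if eigs.min() < -tol and eigs.max() > tol:
        return "saddle point"        # indefinite H
    return "degenerate"              # some eigenvalue ≈ 0: test inconclusive

assert classify_critical_point(np.diag([1.0, 2.0])) == "local minimum"
assert classify_critical_point(np.diag([-1.0, -2.0])) == "local maximum"
assert classify_critical_point(np.diag([1.0, -2.0])) == "saddle point"
```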
PSD matrices as valid metrics. A PSD matrix $A$ defines a (possibly degenerate) inner product $\langle x, y \rangle_A = x^\top A y$. When $A \succ 0$, this is a genuine inner product (positive definite). Mahalanobis distance uses the precision matrix as the metric: $d_M(x, y) = \sqrt{(x - y)^\top \Sigma^{-1} (x - y)}$. The condition $\Sigma^{-1} \succ 0$ is what guarantees $d_M$ is a true metric.
Near-singular covariance matrices. In practice, sample covariance matrices from high-dimensional data ($p > n$) are always singular (rank at most $n$) — not even PSD with full rank. Regularization via $\hat{\Sigma} + \lambda I$ (diagonal loading, a.k.a. Ledoit-Wolf shrinkage) restores positive definiteness and enables stable Cholesky factorization. The choice of $\lambda$ trades off between the sample covariance and the identity — a bias-variance tradeoff in covariance estimation.
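A minimal sketch of the failure and the fix (dimensions and $\lambda$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 10, 50                        # far fewer samples than dimensions
X = rng.standard_normal((n, p))
S = np.cov(X.T)                      # p x p sample covariance, rank <= n - 1

assert np.linalg.matrix_rank(S) < p  # singular: Cholesky would fail here

lam = 0.1
S_reg = S + lam * np.eye(p)          # diagonal loading: min eigenvalue >= lam
L = np.linalg.cholesky(S_reg)        # now succeeds
assert np.allclose(L @ L.T, S_reg)
```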
Semidefinite Programming (SDP)
A semidefinite program optimizes a linear objective over the PSD cone:

$$\min_{X \succeq 0} \;\; \langle C, X \rangle \quad \text{subject to} \quad \langle A_i, X \rangle = b_i, \; i = 1, \dots, m$$

where $\langle C, X \rangle = \operatorname{tr}(C^\top X)$.
SDPs generalize both linear programs (LP: variables are non-negative scalars, a 1D PSD cone) and second-order cone programs (SOCP). They are solvable in polynomial time via interior-point methods and arise in:
- Relaxations of combinatorial problems (MAX-CUT via Goemans-Williamson SDP relaxation)
- Sum-of-squares (SOS) proofs of polynomial nonnegativity
- Controller design in control theory (linear matrix inequalities)
- Metric learning (learning a PSD Mahalanobis metric from pairwise constraints)
- Low-rank matrix completion (Netflix prize problem)
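A common primitive behind several of these applications (e.g. projected-gradient methods for metric learning) is Euclidean projection onto the PSD cone, which reduces to clipping negative eigenvalues to zero. A minimal sketch:

```python
import numpy as np

def project_psd(M):
    """Frobenius-norm projection of a symmetric matrix onto the PSD cone:
    keep the eigenvectors, clip negative eigenvalues to zero."""
    w, Q = np.linalg.eigh(M)
    return Q @ np.diag(np.clip(w, 0, None)) @ Q.T

M = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite: eigenvalues -1 and 3
P = project_psd(M)

assert np.linalg.eigvalsh(P).min() >= -1e-12   # result is PSD
assert np.isclose(np.linalg.eigvalsh(P).max(), 3.0)  # positive part untouched
```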