The foundations of ML, documented while learning.
From linear algebra and probability theory to production AI — the math that actually underlies the models, the code that ships, and the gaps that textbooks skip. Free to read.
164 notes. 26 modules. 5 parts.
From linear algebra and probability theory through production AI — organized as a proper curriculum.
Data Engineering
20 notes
Data Infrastructure & Engineering
12 notes
- Data Systems Landscape
- The Modern Data Warehouse
- Data Lakes & Lakehouses
- Data Ingestion & CDC
- Batch Pipelines & ELT
- Data Modeling
- Stream Processing & Real-Time Data
- Pipeline Orchestration
- Data Quality & Testing
- Lineage, Governance & Contracts
- Data Observability
- Data Mesh & Platform Thinking
ML/AI Data Engineering
8 notes
- Feature Stores
- Training Data at Scale
- Dataset Versioning & Reproducibility
- ML Metadata & Lineage
- LLM Dataset Construction
- Vector Databases
- ML Data Pipelines
- Data Flywheels & Feedback Loops
Mathematics
68 notes
Linear Algebra & Matrix Analysis
8 notes
- Vector Spaces & Linear Maps
- Inner Products, Norms & Orthogonality
- Eigenvalues, Eigenvectors & Diagonalization
- The Spectral Theorem & Symmetric Matrices
- SVD, QR & LU Decompositions
- Positive Semidefinite Matrices & Quadratic Forms
- Linear Systems & Least Squares
- Matrix Calculus & Differentiation
Multivariate Calculus & Differential Geometry
6 notes
- Topology Primer: Metric Spaces, Continuity & Compactness
- Differentiation in Rⁿ: Jacobians, Hessians & the Chain Rule
- Integration in Rⁿ: Fubini, Change of Variables & Surface Integrals
- Smooth Manifolds & Tangent Spaces
- Riemannian Geometry: Metrics, Geodesics & Curvature
- Bridge: Loss Landscapes, Natural Gradient & Equivariant Networks
Convex Analysis & Optimization Theory
6 notes
- Convex Sets & Functions: Definitions, Examples & Closure Properties
- Duality: Lagrangians, KKT Conditions & Strong Duality
- Gradient Methods: Convergence Rates & Information-Theoretic Lower Bounds
- Proximal Methods, ADMM & Operator Splitting
- Non-Convex Landscapes: Saddle Points, Spurious Minima & Escape
- Bridge: Adam, Learning Rate Theory & Neural Loss Landscape Analysis
Probability Theory
6 notes
- Measure Theory Primer: σ-Algebras, Measures & Lebesgue Integration
- Probability Spaces, Random Variables & Distributions
- Expectation, Moments, Characteristic Functions & Generating Functions
- Modes of Convergence: Almost Sure, In Probability, Lp & In Distribution
- Limit Theorems: LLN, CLT & Berry-Esseen
- Bridge: Concentration Inequalities (Hoeffding, Bernstein, Sub-Gaussian) & Generalization Bounds
Statistical Inference & Learning Theory
6 notes
- Estimation Theory: MLE, Sufficiency, Fisher Information & Cramér-Rao
- Hypothesis Testing: Neyman-Pearson, Likelihood Ratio Tests & Multiple Testing
- Bayesian Inference: Priors, Posteriors & Conjugacy
- PAC Learning, VC Dimension & Rademacher Complexity
- High-Dimensional Statistics: Sparsity, RIP & Compressed Sensing
- Bridge: Double Descent, Implicit Regularization & Modern Generalization Theory
Information Theory
5 notes
- Entropy, Mutual Information & the Information Hierarchy
- KL Divergence, f-Divergences & Total Variation Distance
- Data Processing Inequality & Sufficient Statistics
- Channel Capacity & Shannon's Coding Theorems
- Bridge: ELBO & VAEs, Contrastive Learning & Rate-Distortion as Compression
Stochastic Processes
6 notes
- Discrete-Time Markov Chains: Stationarity, Ergodicity & Mixing Times
- Continuous-Time Markov Chains & Poisson Processes
- Martingales: Optional Stopping & Doob's Inequalities
- Brownian Motion & Gaussian Processes
- Itô Calculus & Stochastic Differential Equations
- Bridge: MCMC, Langevin Dynamics & Diffusion Models as SDEs
Numerical Methods & Scientific Computing
5 notes
- Floating Point Arithmetic, Numerical Stability & Condition Numbers
- Direct Linear Solvers: LU, Cholesky & QR Factorizations
- Iterative Solvers: Conjugate Gradient & Krylov Methods
- Automatic Differentiation: Forward Mode, Reverse Mode & Computation Graphs
- Bridge: Autodiff in PyTorch/JAX, Mixed Precision & Numerical Stability in Training
Operations Research
5 notes
- Linear Programming: Simplex Method, Geometry & LP Duality
- Integer Programming: Branch & Bound & Cutting Planes
- Network Flows: Max-Flow, Min-Cost & Transportation Problems
- Combinatorial Optimization: Matching, Approximation Algorithms & Complexity
- Bridge: Dynamic Programming, Bellman Equations & Neural Architecture Search
Game Theory & Mechanism Design
5 notes
- Normal Form Games, Nash Equilibria & Mixed Strategies
- Extensive Form Games, Backward Induction & Subgame Perfection
- Zero-Sum Games & the Minimax Theorem
- Mechanism Design: Revelation Principle, VCG & Auction Theory
- Bridge: GANs as Zero-Sum Games, Multi-Agent RL & RLHF as Mechanism Design
Functional Analysis & Operator Theory
5 notes
- Normed & Banach Spaces: Completeness, Compactness & the Hahn-Banach Theorem
- Hilbert Spaces: Inner Products, Orthonormal Bases & Riesz Representation
- Bounded Linear Operators: Adjoints, Spectrum & Compact Operators
- Reproducing Kernel Hilbert Spaces: Mercer's Theorem & Feature Maps
- Bridge: Kernel Methods, Attention as Operators & Neural Operators (FNO)
Graph Theory & Combinatorics
5 notes
- Graph Fundamentals: Connectivity, Trees, Planarity & Colorings
- Spectral Graph Theory: Laplacians, Eigenvalues & the Cheeger Inequality
- Random Graphs: Erdős–Rényi, Phase Transitions & Small-World Networks
- Generating Functions, Combinatorial Enumeration & the Probabilistic Method
- Bridge: Spectral Clustering, GNN Expressivity & the Weisfeiler-Leman Hierarchy
Machine Learning
22 notes
Classical ML Foundations
7 notes
- Linear & Logistic Regression
- Decision Trees & Ensembles
- Feature Engineering
- Train / Validation / Test Splits
- Bias-Variance Tradeoff
- Gradient Descent
- Gradient Boosting & Tabular ML
NLP Essentials
5 notes
- Tokenization & BPE
- Text Preprocessing
- N-gram Language Models
- Classical Text Classification
- Sequence Labeling
Deep Learning
10 notes
- Neural Networks & Backprop
- Regularization Techniques
- Training Dynamics
- Convolutional Neural Networks
- RNNs & LSTMs
- Embeddings
- Attention & Transformers
- BERT & Encoder Models
- Seq2Seq & T5
- Mixture of Experts
Modern AI
36 notes
Alignment & Safety
7 notes
- SFT: Supervised Fine-Tuning
- RLHF & PPO
- DPO: Direct Preference Optimization
- GRPO: Group Relative Policy Optimization
- Constitutional AI
- Red-Teaming & Evaluation
- The Alignment Tax
LLMs in Production
11 notes
- Claude API & SDK
- Prompt Engineering
- Structured Outputs
- RAG Systems
- Advanced RAG
- Agents & Tool Use
- Fine-Tuning in Practice
- Evals Framework
- LLM-as-Judge & Regression Evals
- Latency & Cost Optimization
- Deployment & Serving
Agent Engineering
6 notes
- Agent Patterns
- Tool Use & MCP
- Multi-Agent Systems
- Planning & Reasoning
- Agent Evals & Reliability
- Agents in Production
Vision & Multimodal
8 notes
- Vision Transformers (ViT)
- CLIP & Contrastive Learning
- Vision-Language Models
- Object Detection & Segmentation
- Diffusion Models
- Latent Diffusion & Guided Generation
- Video Understanding
- Video Generation
Specialized Topics
4 notes
- Recommender Systems
- Graph Neural Networks
- Time Series & Forecasting
- Reinforcement Learning Fundamentals
Measurement
18 notes
Experimentation
4 notes
- Hypothesis Testing
- Experimental Design
- Validity Threats
- Adaptive Experiments & Bandits
Causal Inference
4 notes
- Potential Outcomes & DAGs
- Quasi-Experimental Methods
- Propensity Scores & Modern Methods
- Heterogeneous Treatment Effects
Advanced Experiment Design
5 notes
- Geo Testing & Market Holdouts
- Switchback Experiments
- Network Experiments & Interference
- Always-Valid Sequential Testing
- Long-run Measurement & Holdout Groups
Modern Causal Methods
5 notes
- Synthetic Control
- Staggered DiD & Modern Estimators
- Double ML & Debiased Estimation
- Causal Forests & CATE Estimation
- Sensitivity Analysis & Robustness
Written while learning. Shared openly.
Rachel Z.
ML Practitioner
These notes started as margin annotations working through ML — the proofs I had to re-derive, the intuitions that finally clicked, the production patterns nobody writes down. They grew into something more systematic: 5 parts covering the math, the models, and the production layer, shared in case they're useful to anyone else.
"I write these for the version of me that hit a wall trying to understand why the math actually matters."
164 notes. 26 modules. Free, always.

model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)