Neural-Path

NOTES

165 notes on ML & LLMs.

Written while working through the material — the math, the code, and the production context that usually gets left out.

Where to start

Start with data: Data engineer moving into ML (~18 hrs). Data infrastructure, pipelines, and feature stores, the data layer that makes ML work.

Start at the beginning: Engineer new to ML (~40 hrs). Linear models to neural nets; build the intuition from scratch.

Start the math: Want the deep math (~60 hrs). Linear algebra, calculus, probability, optimization, and beyond; the full mathematical spine.

Jump to transformers: Know ML, learning LLMs (~20 hrs). Skip to transformers, then alignment, RAG, and agents.

Jump to production: Building with AI now (~12 hrs). APIs, prompt engineering, RAG, and evals; practical LLM engineering.

Jump to alignment: Research track (~15 hrs). Alignment math, RLHF, DPO, and safety; the theory behind the models.

Getting Started

MODULE 00

Getting Started

Set up your local environment on any OS to run all code in this course.

Local Environment Setup

Part I

Data Engineering (2 modules)

MODULE 01

Data Infrastructure & Engineering

From raw business events to analytical systems: warehouses, lakehouses, pipelines, streaming, orchestration, quality, and governance.

Data Systems Landscape

The Modern Data Warehouse

Data Lakes & Lakehouses

Data Ingestion & CDC

Batch Pipelines & ELT

MODULE 02

ML/AI Data Engineering

Feature stores, training data at scale, dataset versioning, LLM data pipelines, vector databases, and production feedback loops.

Training Data at Scale

Dataset Versioning & Reproducibility

ML Metadata & Lineage

LLM Dataset Construction

Part II

Mathematics (12 modules)

MODULE 03

Linear Algebra & Matrix Analysis

Vector spaces, spectral theory, matrix decompositions, and the geometric machinery behind every ML model — developed from first principles.

Vector Spaces & Linear Maps

Inner Products, Norms & Orthogonality

Eigenvalues, Eigenvectors & Diagonalization

The Spectral Theorem & Symmetric Matrices

SVD, QR & LU Decompositions

MODULE 04

Multivariate Calculus & Differential Geometry

Gradients, Hessians, manifolds, and Riemannian geometry — the geometric foundation of learning.

Topology Primer: Metric Spaces, Continuity & Compactness

Differentiation in Rⁿ: Jacobians, Hessians & the Chain Rule

Integration in Rⁿ: Fubini, Change of Variables & Surface Integrals

Smooth Manifolds & Tangent Spaces

Riemannian Geometry: Metrics, Geodesics & Curvature

MODULE 05

Convex Analysis & Optimization Theory

Duality, KKT conditions, proximal methods, and the geometry of non-convex loss landscapes.

Convex Sets & Functions: Definitions, Examples & Closure Properties

Duality: Lagrangians, KKT Conditions & Strong Duality

Gradient Methods: Convergence Rates & Information-Theoretic Lower Bounds

Proximal Methods, ADMM & Operator Splitting

Non-Convex Landscapes: Saddle Points, Spurious Minima & Escape

MODULE 06

Probability Theory

Measure-theoretic probability, distributions, convergence modes, limit theorems, and concentration inequalities.

Measure Theory Primer: σ-Algebras, Measures & Lebesgue Integration

Probability Spaces, Random Variables & Distributions

Expectation, Moments, Characteristic Functions & Generating Functions

Modes of Convergence: Almost Sure, In Probability, Lp & In Distribution

Limit Theorems: LLN, CLT & Berry-Esseen

MODULE 07

Statistical Inference & Learning Theory

MLE, Bayesian inference, PAC learning, VC dimension, and the theory behind why models generalize.

Estimation Theory: MLE, Sufficiency, Fisher Information & Cramér-Rao

Hypothesis Testing: Neyman-Pearson, Likelihood Ratio Tests & Multiple Testing

Bayesian Inference: Priors, Posteriors & Conjugacy

PAC Learning, VC Dimension & Rademacher Complexity

High-Dimensional Statistics: Sparsity, RIP & Compressed Sensing

MODULE 08

Information Theory

Entropy, mutual information, KL divergence, channel capacity, and the deep connection between compression and intelligence.

Entropy, Mutual Information & the Information Hierarchy

KL Divergence, f-Divergences & Total Variation Distance

Data Processing Inequality & Sufficient Statistics

Channel Capacity & Shannon's Coding Theorems

Bridge: ELBO & VAEs, Contrastive Learning & Rate-Distortion as Compression

MODULE 09

Stochastic Processes

Markov chains, martingales, Brownian motion, Itô calculus, and stochastic differential equations.

Discrete-Time Markov Chains: Stationarity, Ergodicity & Mixing Times

Continuous-Time Markov Chains & Poisson Processes

Martingales: Optional Stopping & Doob's Inequalities

Brownian Motion & Gaussian Processes

Itô Calculus & Stochastic Differential Equations

MODULE 10

Numerical Methods & Scientific Computing

Floating point, direct and iterative solvers, automatic differentiation, and numerical stability in practice.

Floating Point Arithmetic, Numerical Stability & Condition Numbers

Direct Linear Solvers: LU, Cholesky & QR Factorizations

Iterative Solvers: Conjugate Gradient & Krylov Methods

Automatic Differentiation: Forward Mode, Reverse Mode & Computation Graphs

Bridge: Autodiff in PyTorch/JAX, Mixed Precision & Numerical Stability in Training

MODULE 11

Operations Research

Linear programming, integer programming, network flows, combinatorial optimization, and dynamic programming.

Linear Programming: Simplex Method, Geometry & LP Duality

Integer Programming: Branch & Bound & Cutting Planes

Network Flows: Max-Flow, Min-Cost & Transportation Problems

Combinatorial Optimization: Matching, Approximation Algorithms & Complexity

Bridge: Dynamic Programming, Bellman Equations & Neural Architecture Search

MODULE 12

Game Theory & Mechanism Design

Nash equilibria, minimax, mechanism design, and the multi-agent foundations of GANs and RLHF.

Normal Form Games, Nash Equilibria & Mixed Strategies

Extensive Form Games, Backward Induction & Subgame Perfection

Zero-Sum Games & the Minimax Theorem

Mechanism Design: Revelation Principle, VCG & Auction Theory

Bridge: GANs as Zero-Sum Games, Multi-Agent RL & RLHF as Mechanism Design

MODULE 13

Functional Analysis & Operator Theory

Hilbert and Banach spaces, bounded operators, RKHS, and the operator-theoretic view of attention and kernels.

Normed & Banach Spaces: Completeness, Compactness & the Hahn-Banach Theorem

Hilbert Spaces: Inner Products, Orthonormal Bases & Riesz Representation

Bounded Linear Operators: Adjoints, Spectrum & Compact Operators

Reproducing Kernel Hilbert Spaces: Mercer's Theorem & Feature Maps

Bridge: Kernel Methods, Attention as Operators & Neural Operators (FNO)

MODULE 14

Graph Theory & Combinatorics

Spectral graph theory, random graphs, the probabilistic method, and the combinatorial foundations of GNNs.

Graph Fundamentals: Connectivity, Trees, Planarity & Colorings

Spectral Graph Theory: Laplacians, Eigenvalues & the Cheeger Inequality

Random Graphs: Erdős–Rényi, Phase Transitions & Small-World Networks

Generating Functions, Combinatorial Enumeration & the Probabilistic Method

Bridge: Spectral Clustering, GNN Expressivity & the Weisfeiler-Leman Hierarchy

Part III

Machine Learning (3 modules)

MODULE 15

Classical ML Foundations

Linear models, trees, and the core concepts every ML engineer must know.

Linear & Logistic Regression

Decision Trees & Ensembles

Feature Engineering

Train / Validation / Test Splits

Bias-Variance Tradeoff

MODULE 16

NLP Essentials

Classical NLP from tokenization through n-gram language models, text classification, and sequence labeling — the pre-neural foundations every practitioner needs.

Tokenization & BPE

Text Preprocessing

N-gram Language Models

Classical Text Classification

Sequence Labeling

MODULE 17

Deep Learning

Neural networks from backprop to Transformers.

Neural Networks & Backprop

Regularization Techniques

Training Dynamics

Convolutional Neural Networks

Part IV

Modern AI (5 modules)

MODULE 18

Alignment & Safety

RLHF, Constitutional AI, red-teaming, and the alignment tax.

SFT: Supervised Fine-Tuning

DPO: Direct Preference Optimization

GRPO: Group Relative Policy Optimization

Constitutional AI

MODULE 19

LLMs in Production

Claude API, RAG, agents, evals, and deployment at scale.

Claude API & SDK

Prompt Engineering

Structured Outputs

MODULE 20

Agent Engineering

ReAct, tool use, multi-agent systems, planning, and production reliability for LLM agents.

Multi-Agent Systems

Planning & Reasoning

Agent Evals & Reliability

MODULE 21

Vision & Multimodal

From pixels to perception to generation: ViT, CLIP, VLMs, diffusion models, and video architectures.

Vision Transformers (ViT)

CLIP & Contrastive Learning

Vision-Language Models

Object Detection & Segmentation

Diffusion Models

MODULE 22

Specialized Topics

Recommender systems, graph neural networks, time series, and reinforcement learning — specialized architectures powering production systems.

Recommender Systems

Graph Neural Networks

Time Series & Forecasting

Reinforcement Learning Fundamentals

Part V

Measurement (4 modules)

MODULE 23

Experimentation

The statistics and design of A/B tests: hypothesis testing, variance reduction, and validity threats.

Hypothesis Testing

Experimental Design

Validity Threats

Adaptive Experiments & Bandits

MODULE 24

Causal Inference

When experiments fail: potential outcomes, DAGs, DiD, RDD, IV, and modern observational methods.

Potential Outcomes & DAGs

Quasi-Experimental Methods

Propensity Scores & Modern Methods

Heterogeneous Treatment Effects

MODULE 25

Advanced Experiment Design

Geo holdouts, switchback designs, network experiments, always-valid inference, and long-run measurement — for when standard A/B testing breaks down.

Geo Testing & Market Holdouts

Switchback Experiments

Network Experiments & Interference

Always-Valid Sequential Testing

Long-run Measurement & Holdout Groups

MODULE 26

Modern Causal Methods

Synthetic control, staggered DiD, Double ML, causal forests, and sensitivity analysis — the state-of-the-art toolkit for rigorous causal estimation at scale.

Synthetic Control

Staggered DiD & Modern Estimators

Double ML & Debiased Estimation

Causal Forests & CATE Estimation

Sensitivity Analysis & Robustness