Local Environment Setup
Overview
Every code example in this course runs on a standard Python environment. By the end of this setup you will have:
- Python 3.11 in an isolated virtual environment
- Core ML stack: NumPy, Pandas, Scikit-learn, Matplotlib
- Deep learning: PyTorch with hardware acceleration via CUDA (Compute Unified Device Architecture) on Linux/Windows or MPS on Apple Silicon
- LLM tooling: Transformers, Datasets, Optuna, LangChain
- Verification script that confirms your environment matches the course
Why conda (Miniforge) over pip alone
For ML work, conda is the recommended environment manager because it handles non-Python binary dependencies — CUDA libraries, Math Kernel Library (MKL), OpenBLAS — that pip cannot install. Miniforge is the open-source, community-maintained conda distribution that defaults to conda-forge (the largest, fastest-updated package channel) and ships native arm64 binaries for Apple Silicon.
| Tool | Python pkgs | Binary deps (CUDA, MKL) | ARM64 native | Recommended for |
|---|---|---|---|---|
| pip + venv | ✓ | ✗ | ✓ | Simple projects |
| Miniforge / conda | ✓ | ✓ | ✓ | ML / this course |
| Anaconda | ✓ | ✓ | ✓ (slow) | Enterprise |
| uv | ✓ | ✗ | ✓ | Fast pure-Python |
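Once NumPy is installed (later in this setup), you can confirm which binary BLAS/LAPACK backend it was linked against — a quick sketch, assuming NumPy is present:

```python
import numpy as np

# Prints build information, including which BLAS/LAPACK backend
# (MKL, OpenBLAS, or Accelerate) conda or pip linked in.
np.show_config()
```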
Do not use Anaconda on Apple Silicon. The default Anaconda distribution ships x86_64 binaries that run under Rosetta 2 emulation. Use Miniforge instead — it installs native arm64 conda and runs 2–4× faster on M-series chips.
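A quick way to confirm you are running a native arm64 Python rather than an x86_64 build under Rosetta:

```python
import platform

# "arm64" means a native Apple Silicon build; "x86_64" on an
# M-series Mac means the interpreter is running under Rosetta 2.
print(platform.machine())
```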
Hardware acceleration overview
| Platform | PyTorch backend | What accelerates |
|---|---|---|
| macOS Apple Silicon (M1–M4) | MPS (Metal Performance Shaders) | Matrix ops, convolutions, attention |
| Linux + NVIDIA GPU | CUDA | Everything — full acceleration |
| Windows + NVIDIA GPU | CUDA | Everything — full acceleration |
| macOS Intel / CPU-only | CPU | No GPU acceleration |
For the classical ML lessons (Modules 01–02), CPU is sufficient. For deep learning (Module 02) you will feel the difference above batch sizes of ~128 — a Graphics Processing Unit (GPU) or MPS speeds up training 5–20×.
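One common pattern — a sketch, assuming PyTorch is installed; the helper name `pick_device` is illustrative, not from the course code — is to pick the best available backend at runtime so the same script runs on all three platforms:

```python
import torch

def pick_device() -> torch.device:
    """Return the fastest available PyTorch device."""
    if torch.cuda.is_available():
        return torch.device("cuda")      # NVIDIA GPU on Linux/Windows
    if torch.backends.mps.is_available():
        return torch.device("mps")       # Apple Silicon
    return torch.device("cpu")           # fallback

device = pick_device()
print(device)
```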
Platform Setup
macOS — Apple Silicon (M1, M2, M3, M4)
Step 1: Install Homebrew
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

After installation, follow the printed instructions to add Homebrew to your PATH (the two echo and eval lines it shows you).
Step 2: Install Miniforge
```bash
brew install miniforge
conda init zsh   # or: conda init bash (check with: echo $SHELL)
```

Close and reopen your terminal.
Step 3: Create the course environment
```bash
conda create -n neuralpath python=3.11 -y
conda activate neuralpath
```

Step 4: Install packages
```bash
# Core scientific stack
pip install numpy pandas scikit-learn matplotlib seaborn jupyterlab

# PyTorch — MPS backend ships in the standard wheel on Apple Silicon
pip install torch torchvision torchaudio

# LLM / NLP tooling
pip install transformers datasets tokenizers accelerate

# Experiment infrastructure (quotes needed: zsh treats [] as a glob)
pip install optuna "optuna-integration[sklearn]" mlflow

# Utilities
pip install gray-matter pydantic httpx fastapi uvicorn python-dotenv
```

Step 5: Verify MPS
```python
import torch

print(torch.__version__)                  # e.g. 2.3.0
print(torch.backends.mps.is_available())  # True
print(torch.backends.mps.is_built())      # True

x = torch.ones(3, 3, device="mps")
print(x @ x)  # matrix multiply on the GPU
```

mps.is_available() returns True on M1+ with macOS 12.3+. If it returns False, update macOS via System Settings → General → Software Update.
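To check your macOS version from Python (MPS needs macOS 12.3 or newer) — a stdlib-only convenience:

```python
import platform

# First element is the macOS release, e.g. "14.4.1";
# it is an empty string on non-macOS platforms.
release, _, _ = platform.mac_ver()
print(release or "not macOS")
```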
macOS — Intel
Step 1: Install Homebrew (same as above)
Step 2: Install Miniforge
```bash
brew install miniforge
conda init zsh
```

Close and reopen your terminal.
Step 3: Create the course environment
```bash
conda create -n neuralpath python=3.11 -y
conda activate neuralpath
```

Step 4: Install packages
```bash
pip install numpy pandas scikit-learn matplotlib seaborn jupyterlab
pip install torch torchvision torchaudio  # CPU-only on Intel Mac
pip install transformers datasets tokenizers accelerate
pip install optuna "optuna-integration[sklearn]" mlflow
pip install gray-matter pydantic httpx fastapi uvicorn python-dotenv
```

Step 5: Verify
```python
import torch

print(torch.__version__)
print(torch.backends.mps.is_available())  # False — Intel Mac has no MPS
print(torch.cuda.is_available())          # False — no CUDA on Mac

# CPU training only
device = "cpu"
```

Intel Macs have no hardware acceleration for PyTorch. Deep learning experiments in Module 02 will be slow. Consider using Google Colab (free T4 GPU) for the training-heavy notebooks: open a notebook, Runtime → Change runtime type → T4 GPU.
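If you do move notebooks between your Mac and Colab, this stdlib check (a small convenience, not part of the course code) tells you which environment you are in:

```python
import importlib.util

# google.colab is importable only inside a Colab runtime
IN_COLAB = importlib.util.find_spec("google.colab") is not None
print("Colab" if IN_COLAB else "local")
```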
Windows
Step 1: Install NVIDIA drivers (GPU users only)
Open PowerShell and check your GPU:
```powershell
nvidia-smi
```

If nvidia-smi is not found, download and install the latest NVIDIA Game Ready or Studio driver from nvidia.com/drivers. After installation, nvidia-smi should show your GPU and CUDA version (e.g. CUDA Version: 12.4).
Step 2: Install Miniconda
Download the Windows installer from docs.conda.io/en/latest/miniconda.html. Run the .exe, accept defaults, and check "Add Miniconda3 to my PATH" (or use the Anaconda Prompt).
Step 3: Create the environment
Open Anaconda Prompt (start menu search):
```bash
conda create -n neuralpath python=3.11 -y
conda activate neuralpath
```

Step 4: Install PyTorch with CUDA
Check your CUDA version from nvidia-smi output (top-right corner). Install the matching PyTorch wheel:
```bat
:: CUDA 12.1 (most common for recent drivers)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

:: CUDA 11.8 (older drivers)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

:: CPU only (no NVIDIA GPU)
pip install torch torchvision torchaudio
```

Step 5: Install remaining packages
```bat
pip install numpy pandas scikit-learn matplotlib seaborn jupyterlab
pip install transformers datasets tokenizers accelerate
pip install optuna optuna-integration[sklearn] mlflow
pip install gray-matter pydantic httpx fastapi uvicorn python-dotenv
```

Step 6: Verify CUDA
```python
import torch

print(torch.__version__)              # e.g. 2.3.0+cu121
print(torch.cuda.is_available())      # True
print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 4080
print(torch.cuda.memory_allocated())  # 0 (no allocations yet)
```

Windows Defender / antivirus sometimes quarantines conda environment files during creation. If conda create hangs or produces permission errors, temporarily disable real-time protection for the Miniconda directory (C:\Users\<you>\miniconda3).
Linux (Ubuntu 20.04 / 22.04 / 24.04)
Step 1: Check for NVIDIA GPU
```bash
nvidia-smi
```

If not found, install drivers:
```bash
# Ubuntu 22.04+
sudo apt update
sudo apt install -y nvidia-driver-535  # or latest: sudo ubuntu-drivers autoinstall
sudo reboot
nvidia-smi  # verify after reboot
```

Step 2: Install Miniforge
```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh -b -p "$HOME/miniforge3"
~/miniforge3/bin/conda init bash
source ~/.bashrc
```

Step 3: Create the environment
```bash
conda create -n neuralpath python=3.11 -y
conda activate neuralpath
```

Step 4: Install packages
```bash
# PyTorch — the default Linux wheels bundle CUDA support
pip install torch torchvision torchaudio

# Alternatively, pin a CUDA version:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

pip install numpy pandas scikit-learn matplotlib seaborn jupyterlab
pip install transformers datasets tokenizers accelerate
pip install optuna "optuna-integration[sklearn]" mlflow
pip install gray-matter pydantic httpx fastapi uvicorn python-dotenv
```

Step 5: Verify CUDA
```python
import torch

print(torch.__version__)
print(torch.cuda.is_available())      # True
print(torch.cuda.get_device_name(0))
```

On headless Linux servers (cloud VMs, SSH sessions), use tmux or screen to keep long training runs alive after disconnecting: tmux new -s train, then run your script, then detach with Ctrl-B D. Reattach later with tmux attach -t train.
Verification
Paste this into a new file verify_env.py and run python verify_env.py from your activated neuralpath environment:
```python
import sys
import importlib

REQUIRED = {
    "numpy": "1.24",
    "pandas": "2.0",
    "sklearn": "1.3",
    "matplotlib": "3.7",
    "torch": "2.0",
    "transformers": "4.35",
    "optuna": "3.4",
    "mlflow": "2.8",
    "fastapi": "0.100",
}

def version_tuple(v: str) -> tuple:
    """Parse '2.3.0+cu121' -> (2, 3) for a numeric comparison."""
    parts = []
    for p in v.split(".")[:2]:
        digits = "".join(ch for ch in p if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

print(f"Python {sys.version}\n")
all_ok = True
for pkg, min_version in REQUIRED.items():
    try:
        mod = importlib.import_module(pkg)
        version = getattr(mod, "__version__", "unknown")
        # Compare numerically, not as strings ("1.10" sorts before "1.9" lexically)
        ok = version == "unknown" or version_tuple(version) >= version_tuple(min_version)
        status = "✓" if ok else "⚠ (upgrade recommended)"
        print(f"  {pkg:<18} {version:<12} {status}")
        if not ok:
            all_ok = False
    except ImportError:
        print(f"  {pkg:<18} NOT INSTALLED ✗")
        all_ok = False

# Hardware check
import torch
print("\nHardware acceleration:")
if torch.cuda.is_available():
    print(f"  CUDA ✓ {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_available():
    print("  MPS ✓ Apple Silicon")
else:
    print("  CPU only — GPU acceleration not available")

print(f"\n{'All checks passed ✓' if all_ok else 'Some packages need attention — see above'}")
```

Expected output on a healthy environment:
```
Python 3.11.8

  numpy              1.26.4       ✓
  pandas             2.2.1        ✓
  sklearn            1.4.1        ✓
  matplotlib         3.8.3        ✓
  torch              2.3.0        ✓
  transformers       4.40.0       ✓
  optuna             3.6.1        ✓
  mlflow             2.12.1       ✓
  fastapi            0.110.0      ✓

Hardware acceleration:
  MPS ✓ Apple Silicon

All checks passed ✓
```

The hardware line reads CUDA or CPU instead of MPS depending on your machine.
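Once everything verifies, it is worth snapshotting your package versions so the environment can be reproduced later. From the shell, conda env export or pip freeze does this; the stdlib-only sketch below produces the same pip-freeze-style pins from inside Python:

```python
from importlib import metadata

# Collect "name==version" pins for every installed distribution
pins = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
)
print("\n".join(pins))
```

Redirect the output to a requirements.txt to keep alongside your notebooks.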
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| conda: command not found | Shell not initialized | Run conda init zsh (or bash), restart terminal |
| ModuleNotFoundError after install | Wrong environment active | conda activate neuralpath |
| mps.is_available() is False on M1 | macOS < 12.3 | Update macOS |
| CUDA not available on Linux | Driver mismatch | pip install torch --index-url .../cu121 matching your nvidia-smi CUDA version |
| pip installs to base env | Forgot to activate | Always conda activate neuralpath first |
| Slow conda install | Default channels | Miniforge uses conda-forge by default — much faster than Anaconda |
Running JupyterLab
```bash
conda activate neuralpath
jupyter lab
```

This opens a browser tab at http://localhost:8888. All code examples in the course can be run in a notebook. For scripts, python script.py in a terminal with the environment active works identically.