
Synthetic Control

A country bans cigarette sales to minors. A city passes a minimum wage law. A company launches in one region. How do you estimate what would have happened if the policy never occurred? You cannot randomize countries or cities. You have one treated unit and a handful of potential comparisons. Synthetic control is designed exactly for this situation: constructing a data-driven counterfactual from a weighted combination of untreated donor units.

Theory

[Figure: treated vs. synthetic outcome paths over weeks relative to the intervention (T = 0); the post-treatment gap between the two paths is the estimated treatment effect.]

Synthetic control (Abadie, Diamond, Hainmueller 2010) constructs a counterfactual for a single treated unit by finding a convex combination of donor units that best matches the treated unit in the pre-treatment period.

The optimization problem. Let $Y_{1t}$ be the treated unit's outcome at time $t$ and $Y_{jt}$ be donor $j$'s outcome. Find weights $\mathbf{w} = (w_1, \ldots, w_{J_0})$ with $w_j \geq 0$ and $\sum_j w_j = 1$ minimizing:

$$\min_{\mathbf{w}} \sum_{t \leq T_0} \left(Y_{1t} - \sum_{j=1}^{J_0} w_j Y_{jt}\right)^2$$

The treatment effect at each post-period time is:

$$\hat{\tau}_t = Y_{1t} - \sum_{j=1}^{J_0} \hat{w}_j Y_{jt}$$

Why it had to be this way. DiD requires parallel trends — treated and control evolve identically without treatment. With a single treated unit and heterogeneous pre-period trends, this is often implausible. Synthetic control makes the counterfactual explicit and testable: if pre-period MSPE is low, the counterfactual is credible. The convexity constraint ($w_j \geq 0$, $\sum_j w_j = 1$) prevents extrapolation outside the support of the donor pool.
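The no-extrapolation property is easy to see on a toy example (numbers invented for illustration): if the treated unit lies outside the donors' convex hull, the best convex fit pins to the hull boundary, and the resulting pre-period misfit exposes the problem.

```python
import numpy as np

# Two flat donor paths at 10 and 20; a flat treated path at 30 (invented).
donors = np.tile([10.0, 20.0], (5, 1))    # (T_pre=5, J0=2)
treated = np.full(5, 30.0)

# Brute-force search over the one-dimensional simplex w = (a, 1 - a).
grid = np.linspace(0.0, 1.0, 1001)
losses = [float(np.sum((treated - donors @ np.array([a, 1.0 - a])) ** 2))
          for a in grid]
a_best = float(grid[int(np.argmin(losses))])

# All weight goes to the highest donor, yet the synthetic path tops out
# at 20: no convex combination can reach 30. (An unconstrained regression
# would simply extrapolate, e.g. with weights (-1, 2).)
synth = donors @ np.array([a_best, 1.0 - a_best])
print(a_best, float(synth[0]))   # 0.0 20.0
```

The large residual (10 per period) is exactly the kind of pre-period misfit the MSPE diagnostics below are meant to catch.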

Inference via permutation. Standard asymptotic inference doesn't apply — you have one treated unit. Instead, run the same optimization for each donor unit as a placebo "treated" unit. The p-value is:

$$p = \frac{1}{J_0} \sum_{j=1}^{J_0} \mathbf{1}\!\left[\,\bigl|\text{post-period gap for donor } j\bigr| \geq \bigl|\text{post-period gap for treated}\bigr|\,\right]$$
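One practical consequence of this construction: the p-value is discrete in steps of $1/J_0$, so small donor pools give coarse inference. With the 12 donors of the walkthrough scenario:

```python
J0 = 12                                   # donor pool size in the walkthrough
attainable = [k / J0 for k in range(J0 + 1)]
step = attainable[1]                      # finest nonzero p-value: 1/12
print(round(step, 4))                     # 0.0833
```

A single placebo gap as large as the treated gap already pushes the p-value above the conventional 0.05 threshold, so with a dozen donors significance is effectively all-or-nothing.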

Augmented Synthetic Control (ASCM). When pre-period fit is imperfect (common with few donors), Ben-Michael et al. (2021) add a bias correction term estimated from outcome model residuals. ASCM is more robust to poor pre-period fit and often preferred in practice.

Synthetic DiD. Arkhangelsky et al. (2021) combine SC-style unit weights with time weights (from a pre-period balancing regression), giving a doubly-robust estimator that inherits properties of both DiD and SC. It tolerates imperfect pre-period fit better than pure SC.
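A minimal sketch of the Synthetic DiD idea, omitting the intercept and regularization terms of the original estimator and reusing the same simplex optimizer as the walkthrough (the `simplex_weights` and `sdid_sketch` names, and the toy panel, are invented here for illustration):

```python
import numpy as np
from scipy.optimize import minimize

def simplex_weights(target: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Least-squares weights on the simplex: target ~ basis @ w."""
    k = basis.shape[1]
    res = minimize(
        lambda w: float(np.sum((target - basis @ w) ** 2)),
        x0=np.ones(k) / k, method='SLSQP',
        bounds=[(0, 1)] * k,
        constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1}],
    )
    return res.x

def sdid_sketch(pre_treat, post_treat, pre_donor, post_donor) -> float:
    """Simplified Synthetic DiD point estimate (no intercept/regularization)."""
    # Unit weights: match the treated unit's pre-period path (as in SC).
    w = simplex_weights(pre_treat, pre_donor)                    # (J0,)
    # Time weights: re-weight pre-periods so that, across donors, the
    # weighted pre-period outcome resembles the post-period average.
    lam = simplex_weights(post_donor.mean(axis=0), pre_donor.T)  # (T_pre,)
    # Weighted DiD contrast: treated change minus weighted donor change.
    treated_diff = post_treat.mean() - pre_treat @ lam
    donor_diff = w @ (post_donor.mean(axis=0) - pre_donor.T @ lam)
    return float(treated_diff - donor_diff)

# Toy panel (invented numbers): donors share a linear trend, the treated
# unit is their average, and the true post-treatment effect is 3.0.
t = np.arange(30)
donor = np.stack([j + 0.5 * t for j in range(5)], axis=1)   # (30, 5)
pre_donor, post_donor = donor[:20], donor[20:]
pre_treat = pre_donor.mean(axis=1)
post_treat = post_donor.mean(axis=1) + 3.0

est = sdid_sketch(pre_treat, post_treat, pre_donor, post_donor)
print(round(est, 2))   # close to 3.0, the true effect
```

The contrast structure is why SDiD tolerates imperfect pre-period fit: any level gap between the treated unit and its synthetic control that is stable over time differences out, as in DiD.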

Walkthrough

Scenario: A tech company launches a new product in Germany. 12 other European countries serve as potential donors. We want to estimate the causal effect on monthly active users (MAU).

Step 1: Compute SC weights.

python
import numpy as np
from scipy.optimize import minimize
 
def fit_sc_weights(pre_treat: np.ndarray, pre_donor: np.ndarray) -> np.ndarray:
    """Minimize pre-period MSPE subject to convex weight constraints."""
    J0 = pre_donor.shape[1]
    def loss(w): return float(np.sum((pre_treat - pre_donor @ w) ** 2))
    result = minimize(
        loss, x0=np.ones(J0) / J0, method='SLSQP',
        bounds=[(0, 1)] * J0,
        constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1}],
        options={'ftol': 1e-12, 'maxiter': 2000},
    )
    if not result.success:
        raise RuntimeError(f"SC failed: {result.message}")
    return result.x

Step 2: Estimate treatment effects.

python
def sc_treatment_effects(
    post_treat: np.ndarray,   # (T_post,)
    post_donor: np.ndarray,   # (T_post, J0)
    weights: np.ndarray,      # (J0,) from fit_sc_weights
) -> dict:
    synth_post = post_donor @ weights
    gaps = post_treat - synth_post
    return {
        'ate': float(gaps.mean()),
        'per_period': gaps.tolist(),
        'cumulative': float(gaps.sum()),
    }
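A quick end-to-end check of Steps 1 and 2 on simulated data (the donor paths, the mixture weights, and the effect size of 5.0 are all invented for illustration; `fit_sc_weights` is repeated from Step 1 so the snippet runs on its own):

```python
import numpy as np
from scipy.optimize import minimize

# fit_sc_weights as defined in Step 1.
def fit_sc_weights(pre_treat: np.ndarray, pre_donor: np.ndarray) -> np.ndarray:
    J0 = pre_donor.shape[1]
    result = minimize(
        lambda w: float(np.sum((pre_treat - pre_donor @ w) ** 2)),
        x0=np.ones(J0) / J0, method='SLSQP',
        bounds=[(0, 1)] * J0,
        constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1}],
        options={'ftol': 1e-12, 'maxiter': 2000},
    )
    return result.x

# Simulated panel: 6 donors, 24 pre / 12 post periods. The treated unit
# is a known mixture of three donors plus an effect of 5.0 after launch.
rng = np.random.default_rng(42)
periods, J0, T_pre = 36, 6, 24
donors = (rng.normal(100, 10, size=J0)
          + 0.5 * np.arange(periods)[:, None]
          + rng.normal(0, 1.0, size=(periods, J0)))
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0, 0.0])
treated = donors @ true_w
treated[T_pre:] += 5.0

w = fit_sc_weights(treated[:T_pre], donors[:T_pre])
gaps = treated[T_pre:] - donors[T_pre:] @ w   # Step 2's per-period effects
print(float(gaps.mean()))                     # close to the true effect, 5.0
```

Because the treated unit here is an exact convex combination of donors, the pre-period MSPE is essentially zero and the estimated ATE lands near the true 5.0; real panels will not fit this cleanly, which is what Step 3's diagnostics are for.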

Step 3: Permutation p-value.

python
def permutation_pvalue(
    pre_treat: np.ndarray, post_treat: np.ndarray,
    pre_donor: np.ndarray, post_donor: np.ndarray,
) -> dict:
    w = fit_sc_weights(pre_treat, pre_donor)
    actual_ate = float((post_treat - post_donor @ w).mean())
    actual_pre_mspe = float(np.mean((pre_treat - pre_donor @ w) ** 2))
 
    placebo_ates, placebo_mspes = [], []
    J0 = pre_donor.shape[1]
    for j in range(J0):
        other = [k for k in range(J0) if k != j]
        if len(other) < 2: continue
        try:
            w_j = fit_sc_weights(pre_donor[:, j], pre_donor[:, other])
            pre_mspe_j = float(np.mean((pre_donor[:, j] - pre_donor[:, other] @ w_j) ** 2))
            ate_j = float((post_donor[:, j] - post_donor[:, other] @ w_j).mean())
            placebo_ates.append(ate_j)
            placebo_mspes.append(pre_mspe_j)
        except RuntimeError:
            continue
 
    pvalue = float(np.mean(np.abs(placebo_ates) >= abs(actual_ate)))
    mspe_ratio = actual_pre_mspe / (float(np.median(placebo_mspes)) + 1e-12)
    return {
        'ate': round(actual_ate, 4),
        'pvalue': round(pvalue, 4),
        'mspe_ratio': round(mspe_ratio, 2),
        'reliable': mspe_ratio < 2.0,
    }

Analysis & Evaluation

Where your intuition breaks. Synthetic control looks like a regression: you're fitting a weighted combination of donors to match the treated unit. The crucial difference is the convexity constraint. In a regression, you can extrapolate — put negative weight on donors that diverge from the treated unit. In SC, all weights are non-negative and sum to 1, so the counterfactual is always a convex combination of actual donor paths. This prevents extrapolation but means SC can fail when the treated unit is unusual: if no convex combination of donors can match the treated unit's pre-period trend, the SC counterfactual is unreliable.

| Method | Assumption | When it breaks |
| --- | --- | --- |
| DiD | Parallel trends | Treated unit diverges pre-period |
| Synthetic control | Good pre-period fit | Treated unit is extreme, few donors |
| Augmented SC | Flexible bias correction | Very poor pre-period fit |
| Synthetic DiD | Both SC and DiD conditions | Both assumptions violated simultaneously |

When SC fails. If the MSPE ratio is $> 2$ (treated MSPE is more than twice the median placebo MSPE), the SC fit is poor. Signs to watch for: many donors have near-zero weight (SC is just using 1–2 donors), or the pre-period fit wiggles around the treated unit rather than tracking it smoothly.

⚠️ Warning

Report the MSPE ratio, not just the post-period gap. A visually impressive post-period divergence is only credible if the pre-period fit was tight. The MSPE ratio is the most important diagnostic for SC validity.

Production-Ready Code

python
"""
Synthetic control production pipeline.
SC weights, ASCM bias correction, and automated
permutation inference with diagnostics.
"""
 
from __future__ import annotations
from dataclasses import dataclass
import numpy as np
from scipy.optimize import minimize
 
 
@dataclass
class SyntheticControlResult:
    weights: np.ndarray
    pre_mspe: float
    mspe_ratio: float
    post_gaps: np.ndarray     # (T_post,) per-period treatment effects
    ate: float
    pvalue: float
    reliable: bool            # mspe_ratio < 2
 
 
def fit_sc_weights(pre_treat: np.ndarray, pre_donor: np.ndarray) -> np.ndarray:
    J0 = pre_donor.shape[1]
    result = minimize(
        lambda w: float(np.sum((pre_treat - pre_donor @ w) ** 2)),
        x0=np.ones(J0) / J0, method='SLSQP',
        bounds=[(0, 1)] * J0,
        constraints=[{'type': 'eq', 'fun': lambda w: w.sum() - 1}],
        options={'ftol': 1e-12, 'maxiter': 3000},
    )
    if not result.success:
        raise RuntimeError(f"SC optimization failed: {result.message}")
    return result.x
 
 
def synthetic_control(
    pre_treat: np.ndarray,
    post_treat: np.ndarray,
    pre_donor: np.ndarray,
    post_donor: np.ndarray,
    donor_names: list[str] | None = None,
) -> SyntheticControlResult:
    """Full synthetic control pipeline with permutation inference."""
    w = fit_sc_weights(pre_treat, pre_donor)
    pre_mspe = float(np.mean((pre_treat - pre_donor @ w) ** 2))
    post_gaps = post_treat - post_donor @ w
    ate = float(post_gaps.mean())
 
    placebo_ates, placebo_mspes = [], []
    J0 = pre_donor.shape[1]
    for j in range(J0):
        other = [k for k in range(J0) if k != j]
        if len(other) < 2:
            continue
        try:
            wj = fit_sc_weights(pre_donor[:, j], pre_donor[:, other])
            placebo_mspes.append(float(np.mean((pre_donor[:, j] - pre_donor[:, other] @ wj) ** 2)))
            placebo_ates.append(float((post_donor[:, j] - post_donor[:, other] @ wj).mean()))
        except RuntimeError:
            continue
 
    median_mspe = float(np.median(placebo_mspes)) if placebo_mspes else float('nan')
    mspe_ratio = pre_mspe / (median_mspe + 1e-12)
    pvalue = float(np.mean(np.abs(placebo_ates) >= abs(ate))) if placebo_ates else float('nan')
 
    return SyntheticControlResult(
        weights=w,
        pre_mspe=pre_mspe,
        mspe_ratio=mspe_ratio,
        post_gaps=post_gaps,
        ate=ate,
        pvalue=pvalue,
        reliable=mspe_ratio < 2.0,
    )
 
 
def ascm_bias_correction(
    pre_treat: np.ndarray,
    post_treat: np.ndarray,
    pre_donor: np.ndarray,
    post_donor: np.ndarray,
    sc_weights: np.ndarray,
) -> float:
    """Ridge-augmented SC bias correction (Ben-Michael et al. 2021).

    Fits an outcome model across donors (average post-period outcome
    regressed on the pre-period path), then corrects the SC estimate
    for the residual pre-period imbalance between the treated unit
    and its synthetic control. Returns the bias-corrected ATE.
    """
    from sklearn.linear_model import Ridge  # optional dependency

    # Outcome model across donors: rows are donors, columns are pre-periods.
    model = Ridge(alpha=1.0).fit(pre_donor.T, post_donor.mean(axis=0))

    # Correction for the imbalance the SC weights failed to remove.
    # Intercepts cancel because the weights sum to 1.
    correction = float(
        model.predict(pre_treat[None, :])[0]
        - model.predict(pre_donor.T) @ sc_weights
    )
    ate_raw = float((post_treat - post_donor @ sc_weights).mean())
    return ate_raw - correction
 
 
def sc_summary(result: SyntheticControlResult, alpha: float = 0.05) -> dict:
    return {
        'ate': round(result.ate, 4),
        'pvalue': round(result.pvalue, 4),
        'significant': result.pvalue < alpha,
        'mspe_ratio': round(result.mspe_ratio, 2),
        'reliable': result.reliable,
        'top_donors': sorted(
            enumerate(result.weights), key=lambda x: -x[1]
        )[:3],
        'warning': None if result.reliable else (
            f"MSPE ratio {result.mspe_ratio:.1f} > 2.0 — consider ASCM or Synthetic DiD"
        ),
    }
