Neural-Path/Notes

Sensitivity Analysis & Robustness

Every causal estimate from observational data rests on an untestable assumption: that you have measured and controlled for all relevant confounders. You can never prove this. What you can do is ask how sensitive your conclusion is to violations of the assumption — and report that sensitivity alongside your estimate. Sensitivity analysis is not a limitation to apologize for; it is the honest accounting that makes causal claims credible.

Theory

Cinelli-Hazlett sensitivity — how strong must confounding be to explain away the effect?

[Figure: Cinelli-Hazlett sensitivity contours. Axes: partial R² of confounder with treatment (x) vs. partial R² with outcome (y), each 0.0–0.3. Contour lines at t = 1.96 (α = 0.05), t = 2.58 (α = 0.01), and t = 3.0 bound the "effect survives" region; benchmark covariates Age, Income, Education are plotted for comparison.]

The fundamental problem. Unconfoundedness — $Y(d) \perp D \mid X$ — is the identifying assumption for observational causal inference. It is untestable from data: any dataset consistent with unconfoundedness is also consistent with some unmeasured confounder $U$ that explains the association. What sensitivity analysis provides is a characterization of how large $U$ would need to be to overturn your conclusion.

E-values (VanderWeele and Ding, 2017). The E-value is the minimum relative risk (RR) that an unmeasured confounder $U$ would need to have with both $D$ (treatment) and $Y$ (outcome) — after conditioning on measured confounders — to fully explain away the observed association. Formally:

$$E\text{-value} = RR + \sqrt{RR \cdot (RR - 1)}$$

where $RR$ is the observed risk ratio. A large E-value means confounding would need to be very strong to explain the result. An E-value near 1 means even weak confounding could explain it away.
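As a quick arithmetic check (numbers chosen for illustration), an observed risk ratio of 2 yields:

```python
import math

rr = 2.0
# E-value = RR + sqrt(RR * (RR - 1))
evalue = rr + math.sqrt(rr * (rr - 1))  # 2 + sqrt(2) ≈ 3.41
```

So a confounder would need a relative risk of roughly 3.4 with both treatment and outcome to fully explain away an observed RR of 2.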

Why it had to be this way. The E-value formula comes from bounding the bias formula for unmeasured confounding. If $U$ has relative risk $\text{RR}_{UD}$ with treatment and $\text{RR}_{UY}$ with outcome, the maximum bias factor from $U$ is $\text{RR}_{UD} \cdot \text{RR}_{UY} / (\text{RR}_{UD} + \text{RR}_{UY} - 1)$. The E-value is the minimum value of $\text{RR}_{UD} = \text{RR}_{UY}$ that achieves bias factor $= RR$ — i.e., fully explains away the effect. It provides a single interpretable threshold.

Rosenbaum bounds ($\Gamma$ sensitivity). For matched observational studies, Rosenbaum (2002) asks: how large does $\Gamma$ (the odds ratio of treatment for two matched units that differ only in an unmeasured covariate) need to be before the conclusion changes? A study is $\Gamma = 2$ robust if, even with hidden bias that doubles the odds of treatment, the conclusion holds.
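Concretely, $\Gamma$ confines the worst-case probability that the treated unit of a matched pair has the higher outcome under the sharp null; a minimal sketch:

```python
gamma = 2.0
# Under hidden bias of at most Gamma, the sign probability for a matched pair
# lies in [1/(1+Gamma), Gamma/(1+Gamma)] under the null of no effect.
p_lo, p_hi = 1 / (1 + gamma), gamma / (1 + gamma)
# At Gamma = 2 this interval is [1/3, 2/3]: the sign test must be evaluated
# against p = 2/3 rather than p = 1/2 to obtain a worst-case p-value.
```

At $\Gamma = 1$ the interval collapses to the usual fair-coin sign test.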

Cinelli-Hazlett partial $R^2$ framework. For linear regression with continuous outcomes, the partial $R^2$ framework (Cinelli and Hazlett, 2020) expresses sensitivity in terms of:

  • $R^2_{Y \sim Z \mid X, D}$: partial $R^2$ of unmeasured confounder $Z$ with the outcome (after controlling for $X$ and $D$)
  • $R^2_{D \sim Z \mid X}$: partial $R^2$ of $Z$ with the treatment (after controlling for $X$)

The contour lines in the sensitivity plot show combinations of these two quantities that would exactly zero out the estimate or reduce it to a specific magnitude. You can benchmark against observed covariates: if no observed covariate has both high $R^2_{Y}$ and $R^2_{D}$, a confounder would need to be stronger than anything you've measured.
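One closed form worth knowing: writing $f$ for the treatment's partial Cohen's f ($f = t/\sqrt{\text{dof}}$), the robustness value that drives the point estimate exactly to zero satisfies $f^2 = RV^2/(1 - RV)$ (Cinelli and Hazlett, 2020). A sketch with made-up numbers:

```python
import math

# Hypothetical regression: t = 4.0 with 200 residual degrees of freedom.
t, dof = 4.0, 200
f = abs(t) / math.sqrt(dof)                     # partial Cohen's f of treatment
rv = 0.5 * (math.sqrt(f**4 + 4 * f**2) - f**2)  # robustness value RV_{q=1}
```

Here `rv` is the minimum partial $R^2$, shared equally between treatment and outcome, that an unmeasured confounder would need to zero out the estimate.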

Placebo tests. A complementary approach: use pre-treatment outcomes as pseudo-outcomes and test whether the "treatment effect" on them is near zero. If the estimator finds spurious effects on outcomes that precede treatment, the identifying assumptions are likely violated.

Walkthrough

Scenario: A propensity-score matched study finds a 20% relative risk reduction from a new drug. We want to characterize sensitivity to unmeasured confounding.

Step 1: Compute E-value.

python
import numpy as np
 
def evalue_rr(rr: float, rr_lower_ci: float | None = None) -> dict:
    """Compute E-value for an observed risk ratio.
 
    Also computes E-value for the confidence interval bound (more conservative).
    """
    def ev(r):
        r = r if r >= 1 else 1.0 / r  # E-values are reported on the >= 1 scale
        return r + np.sqrt(r * (r - 1))
    e_point = float(ev(rr))
    result = {'rr': rr, 'evalue_point': round(e_point, 3)}
    if rr_lower_ci is not None:
        e_ci = float(ev(rr_lower_ci))
        result['evalue_ci'] = round(e_ci, 3)
        result['interpretation'] = (
            f"To explain away the point estimate, an unmeasured confounder would need "
            f"RR >= {e_point:.2f} with both treatment and outcome. "
            f"To move the CI to include 1.0, RR >= {e_ci:.2f}."
        )
    return result
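For the scenario's RR = 0.8 (a protective effect), the E-value is computed on the ≥ 1 scale by first inverting the risk ratio:

```python
import math

# 20% relative risk reduction: RR = 0.8, so RR* = 1 / 0.8 = 1.25
rr_star = 1 / 0.8
evalue = rr_star + math.sqrt(rr_star * (rr_star - 1))  # ≈ 1.81
```

A confounder with RR ≈ 1.81 for both treatment and outcome could fully explain the 20% risk reduction — a fairly modest bar, which is why the CI-based E-value matters too.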

Step 2: Partial $R^2$ robustness value (Cinelli-Hazlett).

python
def robustness_value(
    t_stat: float,
    n: int,
    k: int,    # number of covariates in model
    alpha: float = 0.05,
) -> dict:
    """Compute Cinelli-Hazlett robustness value RV_alpha.
 
    RV_alpha: minimum partial R^2 (equal for treatment and outcome) that
    reduces the t-stat to the critical value at level alpha.
    """
    import numpy as np
    from scipy.stats import t as t_dist
    dof = n - k - 1
    t_crit = float(t_dist.ppf(1 - alpha / 2, df=dof - 1))
    # Robustness value RV_{q=1,alpha} via partial Cohen's f (Cinelli-Hazlett 2020):
    # subtract the critical f, then solve f_q^2 = RV^2 / (1 - RV) for RV.
    f = abs(t_stat) / np.sqrt(dof)
    f_crit = t_crit / np.sqrt(dof - 1)
    fq = f - f_crit
    rv = 0.0 if fq <= 0 else 0.5 * float(np.sqrt(fq**4 + 4 * fq**2) - fq**2)
    return {
        't_stat': round(t_stat, 3),
        't_crit': round(t_crit, 3),
        'rv_alpha': round(rv, 4),
        'interpretation': (
            f"An unmeasured confounder with partial R^2 >= {rv:.2%} with both "
            f"treatment and outcome would reduce significance below alpha={alpha}."
        ),
    }

Step 3: Placebo test.

python
import pandas as pd
 
def placebo_test(
    df: pd.DataFrame,
    outcome_col: str,
    treatment_col: str,
    pre_outcome_col: str,   # outcome measured before treatment
    covariate_cols: list[str],
) -> dict:
    """Test identifying assumptions by using pre-treatment outcome as pseudo-outcome.
 
    A large 'treatment effect' on the pre-treatment outcome indicates confounding.
    """
    import statsmodels.formula.api as smf
    covs = ' + '.join(covariate_cols)
    model_main = smf.ols(f'{outcome_col} ~ {treatment_col} + {covs}', data=df).fit()
    model_placebo = smf.ols(f'{pre_outcome_col} ~ {treatment_col} + {covs}', data=df).fit()
    main_coef = float(model_main.params[treatment_col])
    placebo_coef = float(model_placebo.params[treatment_col])
    placebo_pval = float(model_placebo.pvalues[treatment_col])
    return {
        'main_effect': round(main_coef, 6),
        'placebo_effect': round(placebo_coef, 6),
        'placebo_pvalue': round(placebo_pval, 4),
        'ratio': round(abs(placebo_coef / (main_coef + 1e-9)), 3),
        'warning': (
            'Placebo effect is large relative to main effect — confounding likely'
            if abs(placebo_coef) > 0.3 * abs(main_coef) else None
        ),
    }
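To see the mechanism, here is a self-contained toy (plain NumPy least squares instead of statsmodels, with a hypothetical confounder baked in):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
u = rng.normal(size=n)                            # unmeasured confounder
d = (u + rng.normal(size=n) > 0).astype(float)    # treatment driven by u
y_pre = u + rng.normal(size=n)                    # pre-treatment outcome: u matters, d cannot
X = np.column_stack([np.ones(n), d])
placebo_effect = np.linalg.lstsq(X, y_pre, rcond=None)[0][1]
# placebo_effect is far from zero even though d cannot cause y_pre:
# the placebo test correctly flags confounding.
```

Because `d` cannot causally affect an outcome measured before treatment, any sizable coefficient here is pure confounding signal.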

Analysis & Evaluation

Where your intuition breaks. A large E-value feels reassuring — "confounding would need to be very strong." But large E-values are common when the effect size is large, not necessarily when the study design is good. A study with strong confounding controls, careful matching, and an E-value of 2 is far more trustworthy than a study with no confounding controls and an E-value of 5. The E-value characterizes robustness to unmeasured confounding, conditional on the assumption that measured confounders are controlled. If the measured confounders are poorly controlled, the E-value overstates robustness.

| Method | What it measures | When to use |
| --- | --- | --- |
| E-value | RR strength needed to explain away effect | Binary outcomes, RR scale |
| Rosenbaum bound | Odds ratio of hidden bias | Matched studies |
| Partial R² (Cinelli-Hazlett) | Variance explained by confounder | Linear regression, ATE scale |
| Placebo test | Pre-treatment outcome falsification | Panel data with baseline |
| Manski bounds | Worst-case bounds without assumptions | When assumptions are highly uncertain |

How to report. Do not bury sensitivity analysis in an appendix. Include the E-value or robustness value in the main result table: "Effect = 0.25 (SE = 0.07); Robustness value = 0.12 — an unmeasured confounder explaining 12% of variance in both treatment and outcome would eliminate significance." This allows readers to calibrate against known confounders.

🚀 Production

Benchmark sensitivity against observed covariates. After computing the Cinelli-Hazlett contour, plot where your measured covariates fall in the partial R² space. If the "danger zone" (where confounding would overturn the result) requires a confounder stronger than any measured covariate, the result is more robust. If common covariates fall in the danger zone, the unmeasured confounder concern is concrete and specific.

Production-Ready Code

python
"""
Sensitivity analysis production toolkit.
E-values, Rosenbaum bounds, Cinelli-Hazlett partial R^2,
placebo tests, and automated sensitivity reports.
"""
 
from __future__ import annotations
from dataclasses import dataclass
import numpy as np
import pandas as pd
from scipy.stats import norm, t as t_dist
 
 
@dataclass
class SensitivityReport:
    estimate: float
    se: float
    t_stat: float
    evalue: float | None
    robustness_value: float | None
    placebo_effect: float | None
    placebo_pvalue: float | None
    interpretation: str
 
 
def evalue_rr(rr: float, rr_ci_bound: float | None = None) -> dict:
    """E-value for observed risk ratio (VanderWeele-Ding 2017)."""
    def ev(r: float) -> float:
        r = r if r >= 1 else 1.0 / r  # E-values are reported on the >= 1 scale
        return r + np.sqrt(r * (r - 1))
    result = {
        'rr': rr,
        'evalue_point': round(float(ev(rr)), 3),
    }
    if rr_ci_bound is not None:
        result['evalue_ci'] = round(float(ev(rr_ci_bound)), 3)
    return result
 
 
def cinelli_hazlett_bounds(
    t_stat: float,
    n: int,
    k: int,
    benchmark_r2: dict[str, tuple[float, float]] | None = None,
    alpha: float = 0.05,
) -> dict:
    """Cinelli-Hazlett partial R^2 sensitivity bounds.
 
    Returns robustness value and optional benchmark comparisons.
    benchmark_r2: {'covariate_name': (r2_treatment, r2_outcome), ...}
    """
    dof = n - k - 1
    t_crit = float(t_dist.ppf(1 - alpha / 2, df=dof - 1))
    # Robustness value RV_{q=1,alpha} via partial Cohen's f (Cinelli-Hazlett 2020)
    f = abs(t_stat) / np.sqrt(dof)
    f_crit = t_crit / np.sqrt(dof - 1)
    fq = f - f_crit
    rv = 0.0 if fq <= 0 else 0.5 * float(np.sqrt(fq**4 + 4 * fq**2) - fq**2)
 
    result = {
        't_stat': round(t_stat, 3),
        't_crit': round(t_crit, 3),
        'robustness_value': round(rv, 4),
        'benchmarks': {},
    }
 
    if benchmark_r2:
        for name, (r2d, r2y) in benchmark_r2.items():
            # First-order screen: bias-adjusted |t| ignoring the SE adjustment
            # (a rough benchmark, not the exact Cinelli-Hazlett adjusted estimate)
            bias_t = abs(t_stat) * np.sqrt(r2d * r2y) / (np.sqrt((1 - r2d) * (1 - r2y)) + 1e-9)
            t_adjusted = abs(t_stat) - bias_t
            result['benchmarks'][name] = {
                'r2_treatment': r2d,
                'r2_outcome': r2y,
                't_adjusted': round(float(t_adjusted), 3),
                'still_significant': t_adjusted > t_crit,
            }
 
    return result
 
 
def rosenbaum_gamma_bound(
    matched_pairs: pd.DataFrame,   # columns: Y_treated, Y_control
    gamma: float = 2.0,
    alpha: float = 0.05,
) -> dict:
    """Rosenbaum sensitivity test for matched studies at given Gamma value.
 
    Returns whether conclusion holds at the specified Gamma level.
    """
    diffs = (matched_pairs['Y_treated'] - matched_pairs['Y_control']).values
    diffs = diffs[diffs != 0]          # sign test discards tied pairs
    n = len(diffs)
    pos = int(np.sum(diffs > 0))
 
    p_max = gamma / (1 + gamma)
    from scipy.stats import binom
    pvalue_upper = float(binom.sf(pos - 1, n, p_max))
 
    return {
        'gamma': gamma,
        'n_pairs': n,
        'n_positive': int(pos),
        'pvalue_upper': round(pvalue_upper, 4),
        'conclusion_holds': pvalue_upper < alpha,
        'interpretation': (
            f"At Gamma={gamma} (hidden bias could {gamma}x the odds of treatment), "
            f"p-value upper bound = {pvalue_upper:.4f}. "
            f"{'Conclusion holds.' if pvalue_upper < alpha else 'Conclusion may not hold.'}"
        ),
    }
 
 
def run_sensitivity_report(
    estimate: float,
    se: float,
    n: int,
    k: int,
    placebo_result: dict | None = None,
    benchmark_r2: dict | None = None,
    alpha: float = 0.05,
) -> SensitivityReport:
    """Generate a unified sensitivity report."""
    t_stat = estimate / se
    rv_result = cinelli_hazlett_bounds(t_stat, n, k, benchmark_r2, alpha)
    rv = rv_result['robustness_value']
    t_crit = rv_result['t_crit']
    interp = (
        f"Effect = {estimate:.4f} (SE = {se:.4f}, t = {t_stat:.2f}). "
        f"Robustness value = {rv:.2%}: an unmeasured confounder explaining "
        f"{rv:.0%} of variance in both treatment and outcome would eliminate significance. "
        f"{'Conclusion is robust.' if rv > 0.10 else 'Conclusion is fragile — sensitivity analysis required.'}"
    )
    return SensitivityReport(
        estimate=round(estimate, 6),
        se=round(se, 6),
        t_stat=round(t_stat, 3),
        evalue=None,  # set separately for RR-scale estimates
        robustness_value=round(rv, 4),
        placebo_effect=placebo_result.get('placebo_effect') if placebo_result else None,
        placebo_pvalue=placebo_result.get('placebo_pvalue') if placebo_result else None,
        interpretation=interp,
    )
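As a worked check of the Γ bound's arithmetic (counts are hypothetical): with 70 of 100 pairs favoring treatment, the worst-case sign-test p-value at Γ = 2 is evaluated against p = 2/3 instead of 1/2:

```python
import math

n, pos, gamma = 100, 70, 2.0          # hypothetical matched-pair counts
p_max = gamma / (1 + gamma)           # worst-case sign probability = 2/3
# Upper bound on the one-sided p-value: P(X >= pos) for X ~ Binomial(n, p_max)
p_upper = sum(math.comb(n, k) * p_max**k * (1 - p_max)**(n - k)
              for k in range(pos, n + 1))
```

Here `p_upper` lands well above 0.05, so this 70/100 split would not survive Γ = 2 hidden bias, even though at Γ = 1 (p = 1/2) the same split is highly significant.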
