Sensitivity Analysis & Robustness
Every causal estimate from observational data rests on an untestable assumption: that you have measured and controlled for all relevant confounders. You can never prove this. What you can do is ask how sensitive your conclusion is to violations of the assumption — and report that sensitivity alongside your estimate. Sensitivity analysis is not a limitation to apologize for; it is the honest accounting that makes causal claims credible.
Theory
Cinelli-Hazlett sensitivity — how strong must confounding be to explain away the effect?
The fundamental problem. Unconfoundedness — $(Y(1), Y(0)) \perp T \mid X$ — is the identifying assumption for observational causal inference. It is untestable from data: any dataset consistent with unconfoundedness is also consistent with some unmeasured confounder $U$ that explains the association. What sensitivity analysis provides is a characterization of how strong $U$ would need to be to overturn your conclusion.
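To make the untestability concrete, here is a minimal simulation (purely illustrative, not from the study discussed below): the true treatment effect is zero, but an unmeasured confounder drives both treatment and outcome, so the naive difference in means is far from zero — and nothing in the observed data alone reveals the problem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                           # unmeasured confounder
t = (u + rng.normal(size=n) > 0).astype(float)   # U raises the odds of treatment
y = 0.0 * t + u + rng.normal(size=n)             # true treatment effect is zero
naive = y[t == 1].mean() - y[t == 0].mean()      # ~1.1: pure confounding bias
```

Conditioning on `u` would recover the zero effect — but by construction, `u` is not in the dataset.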
E-values (VanderWeele and Ding, 2017). The E-value is the minimum relative risk (RR) that an unmeasured confounder would need to have with both $T$ (treatment) and $Y$ (outcome) — after conditioning on measured confounders — to fully explain away the observed association. Formally:

$$\text{E-value} = \mathrm{RR} + \sqrt{\mathrm{RR} \times (\mathrm{RR} - 1)}$$

where $\mathrm{RR}$ is the observed risk ratio (for $\mathrm{RR} < 1$, apply the formula to $1/\mathrm{RR}$). A large E-value means confounding would need to be very strong to explain the result. An E-value near 1 means even weak confounding could explain it away.
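A quick numeric illustration of the formula (the risk ratios here are arbitrary examples): an observed RR of 2 yields an E-value of about 3.4, while a protective RR of 0.8 — inverted to 1.25 first — yields only about 1.8.

```python
import numpy as np

def evalue(rr: float) -> float:
    """E-value for an observed risk ratio (VanderWeele-Ding formula)."""
    rr = 1.0 / rr if rr < 1 else rr   # protective effects: use 1/RR
    return rr + np.sqrt(rr * (rr - 1))

evalue(2.0)   # 2 + sqrt(2 * 1) ≈ 3.41
evalue(0.8)   # 1.25 + sqrt(1.25 * 0.25) ≈ 1.81
```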
Why it had to be this way. The E-value formula comes from bounding the bias formula for unmeasured confounding. If $U$ has relative risk $\mathrm{RR}_{TU}$ with treatment and $\mathrm{RR}_{UY}$ with outcome, the maximum bias factor from $U$ is $B = \dfrac{\mathrm{RR}_{TU} \times \mathrm{RR}_{UY}}{\mathrm{RR}_{TU} + \mathrm{RR}_{UY} - 1}$. The E-value is the minimum common value $\mathrm{RR}_{TU} = \mathrm{RR}_{UY}$ that achieves bias factor $B = \mathrm{RR}$ — i.e., fully explains away the effect. It provides a single interpretable threshold.
Rosenbaum bounds ($\Gamma$ sensitivity). For matched observational studies, Rosenbaum (2002) asks: how large does $\Gamma$ (the odds ratio of treatment for two matched units that differ only in an unmeasured covariate) need to be before the conclusion changes? A study is robust at $\Gamma = 2$ if, even with hidden bias that doubles the odds of treatment, the conclusion holds.
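The mechanics can be sketched with the sign test, a simplified version of Rosenbaum's approach (the counts below are hypothetical): with no hidden bias, each matched pair's sign is a fair coin flip; at $\Gamma = 2$ the worst-case probability of a positive difference rises to $\Gamma/(1+\Gamma) = 2/3$, which inflates the p-value bound.

```python
from scipy.stats import binom

gamma = 2.0
p_max = gamma / (1 + gamma)          # worst-case P(positive diff) under Gamma

# Hypothetical study: 70 of 100 matched pairs favor treatment.
p_naive = binom.sf(69, 100, 0.5)     # p-value assuming no hidden bias (tiny)
p_bound = binom.sf(69, 100, p_max)   # upper bound allowing Gamma = 2 bias
# p_bound is well above 0.05 — this finding is sensitive at Gamma = 2.
```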
Cinelli-Hazlett partial $R^2$ framework. For linear regression with continuous outcomes, the partial $R^2$ framework (Cinelli and Hazlett, 2020) expresses sensitivity in terms of:
- $R^2_{Y \sim U \mid T, X}$: partial $R^2$ of the unmeasured confounder $U$ with the outcome (after controlling for $T$ and $X$)
- $R^2_{T \sim U \mid X}$: partial $R^2$ of $U$ with the treatment (after controlling for $X$)

The contour lines in the sensitivity plot show combinations of these two quantities that would exactly zero out the estimate or reduce it to a specific magnitude. You can benchmark against observed covariates: if no observed covariate has both a high partial $R^2$ with the treatment and a high partial $R^2$ with the outcome, a confounder would need to be stronger than anything you've measured.
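The contours come from a closed-form bias bound. A minimal sketch of that bound (following Cinelli and Hazlett's formula; the estimate, standard error, and $R^2$ values below are hypothetical): a confounder with partial $R^2$ of `r2d` with treatment and `r2y` with outcome can shift the estimate by at most $se \cdot \sqrt{r2y \cdot r2d / (1 - r2d)} \cdot \sqrt{df}$.

```python
import numpy as np

def max_bias(se: float, df: int, r2d: float, r2y: float) -> float:
    """Largest shift in the point estimate that a confounder with the
    given partial R^2 values could induce (Cinelli-Hazlett bias bound)."""
    return se * np.sqrt(r2y * r2d / (1 - r2d)) * np.sqrt(df)

# Hypothetical: se = 0.07, df = 500; a confounder explaining 5% of
# treatment variance and 10% of outcome variance shifts the estimate by:
shift = max_bias(0.07, 500, 0.05, 0.10)   # ≈ 0.11
```

Tracing the set of (`r2d`, `r2y`) pairs where this shift equals the point estimate gives exactly the "zero out the estimate" contour.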
Placebo tests. A complementary approach: use pre-treatment outcomes as pseudo-outcomes and test whether the "treatment effect" on them is near zero. If the estimator finds spurious effects on outcomes that precede treatment, the identifying assumptions are likely violated.
Walkthrough
Scenario: A propensity-score matched study finds a 20% relative risk reduction (RR = 0.8) from a new drug. We want to characterize sensitivity to unmeasured confounding.
Step 1: Compute E-value.
import numpy as np
def evalue_rr(rr: float, rr_ci_bound: float | None = None) -> dict:
    """Compute E-value for an observed risk ratio.

    Also computes the E-value for the CI bound closer to the null
    (more conservative): pass the lower bound for RR > 1, the upper
    bound for RR < 1.
    """
    def ev(r: float) -> float:
        if r >= 1:
            return r + np.sqrt(r * (r - 1))
        # Protective effects: apply the formula to 1/RR (E-value is always >= 1).
        r_inv = 1.0 / r
        return r_inv + np.sqrt(r_inv * (r_inv - 1))

    e_point = float(ev(rr))
    result = {'rr': rr, 'evalue_point': round(e_point, 3)}
    if rr_ci_bound is not None:
        e_ci = float(ev(rr_ci_bound))
        result['evalue_ci'] = round(e_ci, 3)
        result['interpretation'] = (
            f"To explain away the point estimate, an unmeasured confounder would need "
            f"RR >= {e_point:.2f} with both treatment and outcome. "
            f"To move the CI to include 1.0, RR >= {e_ci:.2f}."
        )
    return result

Step 2: Partial R² robustness value (Cinelli-Hazlett).
def robustness_value(
t_stat: float,
n: int,
k: int, # number of covariates in model
alpha: float = 0.05,
) -> dict:
"""Compute Cinelli-Hazlett robustness value RV_alpha.
RV_alpha: minimum partial R^2 (equal for treatment and outcome) that
reduces the t-stat to the critical value at level alpha.
"""
from scipy.stats import t as t_dist
    df = n - k - 1
    # Exact RV_{q=1, alpha} from Cinelli & Hazlett (2020); the critical
    # value uses df - 1 because adjusting for the confounder costs one df.
    t_crit = float(t_dist.ppf(1 - alpha / 2, df=df - 1))
    f = abs(t_stat) / np.sqrt(df)
    fq = f - t_crit / np.sqrt(df - 1)
    rv = float(0.5 * (np.sqrt(fq**4 + 4 * fq**2) - fq**2)) if fq > 0 else 0.0
return {
't_stat': round(t_stat, 3),
't_crit': round(t_crit, 3),
'rv_alpha': round(rv, 4),
'interpretation': (
f"An unmeasured confounder with partial R^2 >= {rv:.2%} with both "
f"treatment and outcome would reduce significance below alpha={alpha}."
),
    }

Step 3: Placebo test.
import pandas as pd
def placebo_test(
df: pd.DataFrame,
outcome_col: str,
treatment_col: str,
pre_outcome_col: str, # outcome measured before treatment
covariate_cols: list[str],
) -> dict:
"""Test identifying assumptions by using pre-treatment outcome as pseudo-outcome.
A large 'treatment effect' on the pre-treatment outcome indicates confounding.
"""
import statsmodels.formula.api as smf
covs = ' + '.join(covariate_cols)
model_main = smf.ols(f'{outcome_col} ~ {treatment_col} + {covs}', data=df).fit()
model_placebo = smf.ols(f'{pre_outcome_col} ~ {treatment_col} + {covs}', data=df).fit()
main_coef = float(model_main.params[treatment_col])
placebo_coef = float(model_placebo.params[treatment_col])
placebo_pval = float(model_placebo.pvalues[treatment_col])
return {
'main_effect': round(main_coef, 6),
'placebo_effect': round(placebo_coef, 6),
'placebo_pvalue': round(placebo_pval, 4),
'ratio': round(abs(placebo_coef / (main_coef + 1e-9)), 3),
'warning': (
'Placebo effect is large relative to main effect — confounding likely'
if abs(placebo_coef) > 0.3 * abs(main_coef) else None
),
    }

Analysis & Evaluation
Where your intuition breaks. A large E-value feels reassuring — "confounding would need to be very strong." But large E-values are common when the effect size is large, not necessarily when the study design is good. A study with strong confounding controls, careful matching, and an E-value of 2 is far more trustworthy than a study with no confounding controls and an E-value of 5. The E-value characterizes robustness to unmeasured confounding, conditional on the assumption that measured confounders are controlled. If the measured confounders are poorly controlled, the E-value overstates robustness.
| Method | What it measures | When to use |
|---|---|---|
| E-value | RR strength needed to explain away effect | Binary outcomes, RR scale |
| Rosenbaum bound | Odds ratio of hidden bias | Matched studies |
| Partial R² (Cinelli-Hazlett) | Variance explained by confounder | Linear regression, ATE scale |
| Placebo test | Pre-treatment outcome falsification | Panel data with baseline |
| Manski bounds | Worst-case bounds without assumptions | When assumptions are highly uncertain |
How to report. Do not bury sensitivity analysis in an appendix. Include the E-value or robustness value in the main result table: "Effect = 0.25 (SE = 0.07); Robustness value = 0.12 — an unmeasured confounder explaining 12% of variance in both treatment and outcome would eliminate significance." This allows readers to calibrate against known confounders.
Benchmark sensitivity against observed covariates. After computing the Cinelli-Hazlett contour, plot where your measured covariates fall in the partial R² space. If the "danger zone" (where confounding would overturn the result) requires a confounder stronger than any measured covariate, the result is more robust. If common covariates fall in the danger zone, the unmeasured confounder concern is concrete and specific.
Production-Ready Code
"""
Sensitivity analysis production toolkit.
E-values, Rosenbaum bounds, Cinelli-Hazlett partial R^2,
placebo tests, and automated sensitivity reports.
"""
"""
from __future__ import annotations
from dataclasses import dataclass
import numpy as np
import pandas as pd
from scipy.stats import norm, t as t_dist
@dataclass
class SensitivityReport:
estimate: float
se: float
t_stat: float
evalue: float | None
robustness_value: float | None
placebo_effect: float | None
placebo_pvalue: float | None
interpretation: str
def evalue_rr(rr: float, rr_ci_bound: float | None = None) -> dict:
"""E-value for observed risk ratio (VanderWeele-Ding 2017)."""
    def ev(r: float) -> float:
        if r >= 1:
            return r + np.sqrt(r * (r - 1))
        # Protective effects: apply the formula to 1/RR (E-value is always >= 1).
        r_inv = 1.0 / r
        return r_inv + np.sqrt(r_inv * (r_inv - 1))
result = {
'rr': rr,
'evalue_point': round(float(ev(rr)), 3),
}
if rr_ci_bound is not None:
result['evalue_ci'] = round(float(ev(rr_ci_bound)), 3)
return result
def cinelli_hazlett_bounds(
t_stat: float,
n: int,
k: int,
benchmark_r2: dict[str, tuple[float, float]] | None = None,
alpha: float = 0.05,
) -> dict:
"""Cinelli-Hazlett partial R^2 sensitivity bounds.
Returns robustness value and optional benchmark comparisons.
benchmark_r2: {'covariate_name': (r2_treatment, r2_outcome), ...}
"""
df = n - k - 1
    # Exact RV_{q=1, alpha} from Cinelli & Hazlett (2020); the critical
    # value uses df - 1 because adjusting for the confounder costs one df.
    t_crit = float(t_dist.ppf(1 - alpha / 2, df=df - 1))
    f = abs(t_stat) / np.sqrt(df)
    fq = f - t_crit / np.sqrt(df - 1)
    rv = float(0.5 * (np.sqrt(fq**4 + 4 * fq**2) - fq**2)) if fq > 0 else 0.0
result = {
't_stat': round(t_stat, 3),
't_crit': round(t_crit, 3),
'robustness_value': round(rv, 4),
'benchmarks': {},
}
if benchmark_r2:
        for name, (r2d, r2y) in benchmark_r2.items():
            # Cinelli-Hazlett bias bound in t units:
            # |bias| / se = sqrt(df * r2d * r2y / (1 - r2d))
            bias_t = np.sqrt(df * r2d * r2y / (1 - r2d))
            # The adjusted se scales by sqrt((1-r2y)/(1-r2d)) and loses one df.
            se_factor = np.sqrt((1 - r2y) / (1 - r2d)) * np.sqrt(df / (df - 1))
            t_adjusted = (abs(t_stat) - bias_t) / se_factor
result['benchmarks'][name] = {
'r2_treatment': r2d,
'r2_outcome': r2y,
't_adjusted': round(float(t_adjusted), 3),
'still_significant': t_adjusted > t_crit,
}
return result
def rosenbaum_gamma_bound(
matched_pairs: pd.DataFrame, # columns: Y_treated, Y_control
gamma: float = 2.0,
alpha: float = 0.05,
) -> dict:
"""Rosenbaum sensitivity test for matched studies at given Gamma value.
Returns whether conclusion holds at the specified Gamma level.
"""
    diffs = (matched_pairs['Y_treated'] - matched_pairs['Y_control']).values
    diffs = diffs[diffs != 0]  # the sign test discards tied pairs
    n = len(diffs)
    pos = int(np.sum(diffs > 0))
p_max = gamma / (1 + gamma)
from scipy.stats import binom
pvalue_upper = float(binom.sf(pos - 1, n, p_max))
return {
'gamma': gamma,
'n_pairs': n,
'n_positive': int(pos),
'pvalue_upper': round(pvalue_upper, 4),
'conclusion_holds': pvalue_upper < alpha,
'interpretation': (
f"At Gamma={gamma} (hidden bias could {gamma}x the odds of treatment), "
f"p-value upper bound = {pvalue_upper:.4f}. "
f"{'Conclusion holds.' if pvalue_upper < alpha else 'Conclusion may not hold.'}"
),
}
def run_sensitivity_report(
estimate: float,
se: float,
n: int,
k: int,
placebo_result: dict | None = None,
benchmark_r2: dict | None = None,
alpha: float = 0.05,
) -> SensitivityReport:
"""Generate a unified sensitivity report."""
t_stat = estimate / se
rv_result = cinelli_hazlett_bounds(t_stat, n, k, benchmark_r2, alpha)
rv = rv_result['robustness_value']
t_crit = rv_result['t_crit']
interp = (
f"Effect = {estimate:.4f} (SE = {se:.4f}, t = {t_stat:.2f}). "
f"Robustness value = {rv:.2%}: an unmeasured confounder explaining "
f"{rv:.0%} of variance in both treatment and outcome would eliminate significance. "
f"{'Conclusion is robust.' if rv > 0.10 else 'Conclusion is fragile — sensitivity analysis required.'}"
)
return SensitivityReport(
estimate=round(estimate, 6),
se=round(se, 6),
t_stat=round(t_stat, 3),
evalue=None, # set separately for RR-scale estimates
robustness_value=round(rv, 4),
placebo_effect=placebo_result.get('placebo_effect') if placebo_result else None,
placebo_pvalue=placebo_result.get('placebo_pvalue') if placebo_result else None,
interpretation=interp,
    )
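As a sanity check on the robustness value, here is a self-contained numerical verification (assuming the Cinelli-Hazlett formulas sketched above; the inputs are arbitrary): plugging a confounder with partial R² equal to RV on both treatment and outcome back into the bias-adjusted t-statistic should land exactly on the critical value.

```python
import numpy as np
from scipy.stats import t as t_dist

t_stat, n, k, alpha = 4.0, 100, 5, 0.05
df = n - k - 1
t_crit = t_dist.ppf(1 - alpha / 2, df=df - 1)

# Robustness value in partial-f units.
f = abs(t_stat) / np.sqrt(df)
fq = f - t_crit / np.sqrt(df - 1)
rv = 0.5 * (np.sqrt(fq**4 + 4 * fq**2) - fq**2)

# A confounder with partial R^2 = rv on both sides biases the estimate
# by rv / sqrt(1 - rv) in f units; the adjusted t then hits t_crit exactly.
t_adj = (f - rv / np.sqrt(1 - rv)) * np.sqrt(df - 1)
assert abs(t_adj - t_crit) < 1e-8
```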