Network Experiments & Interference
When you give a user a new feature, they tell their friends. When you change a seller's pricing, it ripples through the buyers they serve. When you treat a node in a social network, the untreated nodes in its neighborhood are no longer a clean control. Network interference — the violation of SUTVA through social or marketplace connections — is one of the most common and most underappreciated sources of bias in tech industry experiments.
Theory
Figure: a bipartite marketplace, where seller treatment spills over to shared buyers.
Network interference occurs when unit $i$'s outcome depends not just on its own assignment but on the assignments of its neighbors. This invalidates the standard potential outcomes framework, which assumes $Y_i(\mathbf{z}) = Y_i(z_i)$ — that unit $i$'s outcome depends only on its own assignment $z_i$. To reason carefully about interference, we need the exposure mapping framework.
Exposure mappings. An exposure mapping $d_i(\mathbf{z})$ compresses the full assignment vector $\mathbf{z} \in \{0,1\}^n$ into a summary of the relevant exposure for unit $i$. For example:
- In a social network: $d_i(\mathbf{z}) = \left(z_i,\ \frac{1}{|N(i)|}\sum_{j \in N(i)} z_j\right)$ — my assignment and the fraction of neighbors who are treated
- In a bipartite marketplace: $d_b(\mathbf{z}) = \frac{1}{|S(b)|}\sum_{s \in S(b)} z_s$ — the fraction of buyer $b$'s sellers who are treated
The key insight is that we replace $Y_i(\mathbf{z})$ with $Y_i(d_i(\mathbf{z}))$, which is a manageable object to reason about and estimate.
Why it had to be this way. Without an exposure mapping, we can only estimate ITT (intent-to-treat) effects — the average over all the messy ways interference contaminates the control group. With an exposure mapping, we can define and estimate: direct effects (unit treated, neighbors not), indirect effects (unit not treated, neighbors treated), and total effects. These decompositions are essential for understanding mechanism and for correcting bias.
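The social-network exposure mapping above can be sketched in a few lines. A minimal illustration (the function name and the plain-dict graph representation are mine, not from any particular library):

```python
def social_exposure(
    adjacency: dict[int, list[int]],
    assignment: dict[int, int],
) -> dict[int, tuple[int, float]]:
    """Map each node to (own assignment, fraction of treated neighbors)."""
    exposure = {}
    for node, neighbors in adjacency.items():
        treated_frac = (
            sum(assignment[j] for j in neighbors) / len(neighbors)
            if neighbors else 0.0  # isolated nodes see no neighbor exposure
        )
        exposure[node] = (assignment[node], treated_frac)
    return exposure
```

With this in hand, units can be bucketed by exposure type exactly as in the bipartite walkthrough below.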
Randomization strategies for networked data:
- Cluster randomization (ego-network): Randomly assign entire friendship clusters — ego and all alters — to treatment or control. Eliminates within-cluster interference but ignores between-cluster spillovers.
- Graph clustering: Partition the social graph into dense communities (Louvain, spectral clustering) and randomize at the community level. Reduces interference because few edges cross community boundaries.
- Bipartite randomization: In two-sided markets, randomize on one side (sellers) and measure the other (buyers). Buyers in the same exposure zone (all sellers treated, all untreated, or mixed) are analyzed separately.
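As a concrete sketch of the first strategy, here is one greedy way to form ego-clusters and flip a single coin per cluster. The greedy claiming rule is a simplification I am using for illustration; production systems typically balance cluster sizes and handle overlap more carefully:

```python
import numpy as np

def ego_cluster_assignment(
    adjacency: dict[int, list[int]], seed: int = 0
) -> dict[int, int]:
    """Greedy ego-clustering: each still-unassigned node becomes an ego and
    claims its still-unassigned neighbors; the whole cluster then shares
    one treatment coin flip, so ego and alters land in the same arm."""
    rng = np.random.default_rng(seed)
    arm: dict[int, int] = {}
    for ego in sorted(adjacency):       # deterministic scan order
        if ego in arm:
            continue
        flip = int(rng.integers(0, 2))  # one draw per cluster
        arm[ego] = flip
        for alter in adjacency[ego]:
            if alter not in arm:
                arm[alter] = flip
    return arm
```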
Estimands under interference:
- Direct effect (DE): $\tau_{DE} = E[Y_i(z_i = 1, d_i = 0)] - E[Y_i(z_i = 0, d_i = 0)]$ — treated vs. control with untreated neighbors
- Indirect/spillover effect (IE): $\tau_{IE} = E[Y_i(z_i = 0, d_i = 1)] - E[Y_i(z_i = 0, d_i = 0)]$ — neighbor treatment on an untreated unit
- Total effect (TE): $\tau_{TE} = E[Y_i(z_i = 1, d_i = 1)] - E[Y_i(z_i = 0, d_i = 0)]$ — from full control to full treatment
Walkthrough
Scenario: A marketplace has 10,000 sellers and 100,000 buyers. We test a new seller pricing tool. Sellers are randomized to treatment/control; we measure buyer outcomes.
Step 1: Compute buyer exposure mapping.
import pandas as pd
import numpy as np

def compute_buyer_exposure(
    buyer_seller_edges: pd.DataFrame,  # columns: buyer_id, seller_id
    seller_assignments: pd.Series,     # index: seller_id, values: 0/1
) -> pd.DataFrame:
    """Classify each buyer's exposure based on their sellers' assignments."""
    merged = buyer_seller_edges.merge(
        seller_assignments.rename('treatment'),
        left_on='seller_id', right_index=True,
    )
    stats = merged.groupby('buyer_id').agg(
        n_sellers=('treatment', 'count'),
        n_treated=('treatment', 'sum'),
    ).reset_index()
    stats['treat_frac'] = stats['n_treated'] / stats['n_sellers']

    def exposure_type(row):
        if row['treat_frac'] == 1.0:
            return 'direct'
        elif row['treat_frac'] == 0.0:
            return 'control'
        return 'indirect'

    stats['exposure_type'] = stats.apply(exposure_type, axis=1)
    return stats

Step 2: Estimate direct and spillover effects.
def estimate_network_effects(
    buyer_outcomes: pd.DataFrame,  # columns: buyer_id, outcome
    buyer_exposure: pd.DataFrame,
) -> dict:
    """Estimate direct, indirect, and total effects."""
    df = buyer_outcomes.merge(
        buyer_exposure[['buyer_id', 'exposure_type']], on='buyer_id'
    )
    means = df.groupby('exposure_type')['outcome'].agg(['mean', 'std', 'count'])
    direct_effect = means.loc['direct', 'mean'] - means.loc['control', 'mean']
    indirect_effect = means.loc['indirect', 'mean'] - means.loc['control', 'mean']
    se_direct = np.sqrt(
        means.loc['direct', 'std']**2 / means.loc['direct', 'count'] +
        means.loc['control', 'std']**2 / means.loc['control', 'count']
    )
    return {
        'direct_effect': round(direct_effect, 6),
        'indirect_effect': round(indirect_effect, 6),
        'total_effect': round(direct_effect + indirect_effect, 6),
        'se_direct': round(se_direct, 6),
    }

Step 3: Cluster-robust standard errors. Because buyers sharing sellers are correlated, cluster SEs at the seller level.
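One simple way to implement Step 3, shown under the simplifying assumption that each buyer is attributed to a single seller via a hypothetical `primary_seller_id` column (buyers connected to many sellers need multiway clustering or randomization inference instead):

```python
import numpy as np
import pandas as pd

def seller_clustered_se(df: pd.DataFrame) -> float:
    """Cluster-robust SE of the mean buyer outcome, clustering on a
    (hypothetical) primary_seller_id column: collapse to seller-level
    means first, then take the SEM across sellers."""
    seller_means = df.groupby('primary_seller_id')['outcome'].mean()
    return float(seller_means.std(ddof=1) / np.sqrt(len(seller_means)))
```

Collapsing to seller-level means before computing the SEM is the two-stage shortcut; it is conservative and easy to audit compared with sandwich-style cluster-robust variance formulas.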
Analysis & Evaluation
Where your intuition breaks. The natural reaction to interference is to build bigger clusters — assign entire friend groups together. Bigger clusters do reduce interference, but they also reduce the number of independent observations: a design with 10 large clusters has the statistical power of roughly 10 observations, not 10,000. The optimal cluster size balances interference reduction against effective-sample-size reduction, and the optimum often lies at surprisingly small clusters (5–20 users), accepting modest interference in exchange for far greater power.
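This tradeoff can be quantified with the standard design-effect formula, $\text{deff} = 1 + (m - 1)\rho$, where $m$ is the cluster size and $\rho$ the intraclass correlation. A rough sketch:

```python
def effective_sample_size(n_units: int, cluster_size: int, icc: float) -> float:
    """Effective n under cluster randomization with equal-size clusters:
    n / (1 + (m - 1) * rho), the standard design-effect correction."""
    deff = 1.0 + (cluster_size - 1) * icc
    return n_units / deff
```

With 10,000 users, clusters of 20, and an ICC of 0.1, deff = 2.9 and the effective sample is about 3,450 users, which is why modest cluster sizes often win.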
| Design | Interference | Effective $n$ | Use when |
|---|---|---|---|
| Individual randomization | High | # users | No network, SUTVA valid |
| Ego-cluster | Low | # clusters | Dense friendship graphs |
| Graph clustering (Louvain) | Very low | # communities | Sparse graphs, few cross-edges |
| Bipartite (seller-side) | Structured | # sellers | Two-sided marketplaces |
| Full holdout | Zero | # markets | Only option for full-network effects |
When to give up and use a holdout. If the network is so dense that no reasonable clustering achieves cross-cluster edge fraction below 5%, a bipartite experiment or full geo holdout is more honest than reporting an "experiment result" with heavy interference contamination.
Interference typically biases toward zero. When treatment spills into the control group and lifts control outcomes, the treated-vs-control gap compresses, so the experiment underestimates the true effect; in that common case network experiments are conservative, not anti-conservative. The sign is not guaranteed, though: when treated units compete with control units for a fixed pool of demand (marketplace cannibalization), control outcomes are pushed down and the apparent effect is inflated.
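A back-of-envelope check of the attenuation logic, with baselines normalized to zero and a hypothetical positive spillover reaching part of the control group:

```python
def naive_effect_with_spillover(
    tau: float, spill: float, contaminated_frac: float
) -> float:
    """Naive treated-minus-control difference in means when a fraction of
    the control group picks up a positive spillover `spill`.
    Baseline outcomes are normalized to zero for both groups."""
    treated_mean = tau                         # direct effect only
    control_mean = contaminated_frac * spill   # spillover lifts controls
    return treated_mean - control_mean
```

With a true effect of 1.0, a spillover of 0.4 reaching half the control group shrinks the naive estimate to 0.8; full contamination shrinks it further.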
Production-Ready Code
"""
Network experiment pipeline.
Graph clustering, exposure mapping, spillover-adjusted
estimation, and cluster-robust SEs.
"""
from __future__ import annotations
from dataclasses import dataclass
import warnings
import numpy as np
import pandas as pd
import networkx as nx
from scipy.stats import norm
@dataclass
class NetworkExperimentConfig:
n_clusters_target: int = 500
max_cross_edge_frac: float = 0.10
randomization_seed: int = 42
def cluster_graph_for_experiment(
edges: list[tuple[int, int]],
config: NetworkExperimentConfig,
) -> dict[int, int]:
"""Partition social graph into clusters for randomization.
Returns node -> cluster_id mapping. Warns if cross-cluster
edge fraction exceeds threshold.
"""
G = nx.Graph()
G.add_edges_from(edges)
communities = nx.community.greedy_modularity_communities(G)
node_to_cluster: dict[int, int] = {}
for cid, community in enumerate(communities):
for node in community:
node_to_cluster[node] = cid
cross = sum(
1 for u, v in G.edges()
if node_to_cluster.get(u) != node_to_cluster.get(v)
)
cross_frac = cross / max(G.number_of_edges(), 1)
if cross_frac > config.max_cross_edge_frac:
warnings.warn(
f"Cross-cluster edge fraction {cross_frac:.1%} > "
f"{config.max_cross_edge_frac:.0%}. "
"Interference contamination may be substantial."
)
return node_to_cluster
def assign_clusters(
cluster_ids: list[int],
config: NetworkExperimentConfig,
) -> dict[int, int]:
"""Randomly assign clusters to treatment (1) or control (0)."""
rng = np.random.default_rng(config.randomization_seed)
unique_clusters = sorted(set(cluster_ids))
n = len(unique_clusters)
perm = rng.permutation(n)
return {cid: int(perm[i] < n // 2) for i, cid in enumerate(unique_clusters)}
def spillover_adjusted_estimator(
outcomes: pd.DataFrame, # columns: unit_id, outcome, exposure_type, cluster_id
cluster_col: str = 'cluster_id',
) -> dict:
"""Estimate direct and spillover effects with cluster-robust SEs."""
def cluster_mean_se(mask: pd.Series) -> tuple[float, float]:
sub = outcomes[mask]
cluster_means = sub.groupby(cluster_col)['outcome'].mean()
return float(cluster_means.mean()), float(cluster_means.sem())
direct_m, direct_se = cluster_mean_se(outcomes['exposure_type'] == 'direct')
control_m, control_se = cluster_mean_se(outcomes['exposure_type'] == 'control')
indirect_m, indirect_se = cluster_mean_se(outcomes['exposure_type'] == 'indirect')
de = direct_m - control_m
de_se = np.sqrt(direct_se**2 + control_se**2)
ie = indirect_m - control_m
ie_se = np.sqrt(indirect_se**2 + control_se**2)
def z_pval(effect: float, se: float) -> float:
return float(2 * norm.sf(abs(effect / se))) if se > 0 else float('nan')
return {
'direct_effect': round(de, 6),
'direct_se': round(de_se, 6),
'direct_pvalue': round(z_pval(de, de_se), 4),
'indirect_effect': round(ie, 6),
'indirect_se': round(ie_se, 6),
'indirect_pvalue': round(z_pval(ie, ie_se), 4),
'total_effect': round(de + ie, 6),
    }