Robustness without theater
Robustness checks are often performed as theater:
- add controls until the coefficient “looks stable”
- try a few specs until one is significant
- report a handful of alternatives with no interpretation
That is not robustness. That is specification drift.
Robustness is a disciplined attempt to answer:
“How easily could my conclusion be wrong, and why?”
This post lays out a simple framework for robustness checks that actually increase trust.
Step 1: Name the threats
Start by listing the top threats to validity.
Common ones:
- Confounding: unmeasured variables affect both treatment and outcome
- Selection: who is treated is not comparable to who is not treated
- Measurement: outcomes or treatment are mismeasured (possibly differently across groups)
- Time: trends, shocks, anticipation, or delayed effects
- Spillovers: one unit’s treatment affects another unit’s outcome
- Functional form: the model is imposing structure that is not warranted
Pick the threats that are plausible in your context. If you can’t name plausible threats, you’re not done thinking.
Step 2: Match each threat to a diagnostic
Diagnostics are not “extra models.” They are checks that target a specific failure mode.
Examples:
Confounding
- pre-treatment balance checks (where meaningful)
- negative control outcomes (if available)
- covariate stability (do key covariates behave like they should?)
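For example, a pre-treatment balance check can be a few lines of code. Here is a minimal sketch, assuming a pandas DataFrame `df` with a binary `treated` column and a list of pre-treatment covariates; the column names are placeholders.

```python
import numpy as np
import pandas as pd

def standardized_mean_differences(df: pd.DataFrame, treat_col: str, covariates: list) -> pd.Series:
    """Standardized mean difference for each pre-treatment covariate.

    Absolute values above ~0.1 are a common (rough) flag for imbalance
    between treated and control groups.
    """
    treated = df[df[treat_col] == 1]
    control = df[df[treat_col] == 0]
    smd = {}
    for cov in covariates:
        mean_diff = treated[cov].mean() - control[cov].mean()
        pooled_sd = np.sqrt((treated[cov].var() + control[cov].var()) / 2)
        smd[cov] = mean_diff / pooled_sd
    return pd.Series(smd, name="smd")

# Hypothetical usage:
# smd = standardized_mean_differences(df, "treated", ["age", "baseline_y", "income"])
# print(smd[smd.abs() > 0.1])
```

The 0.1 threshold is a convention, not a test; the point is to see where imbalance concentrates, not to produce a pass/fail verdict.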
Time threats (common in DiD/event studies)
- pre-trends / placebo leads
- sensitivity to time window
- robustness to alternative trend controls
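For DiD and event-study settings, the placebo-leads check is the workhorse. A minimal sketch, assuming a long panel with hypothetical columns `unit`, `period`, `y`, and `event_time` (periods relative to treatment, negative values being leads):

```python
import pandas as pd
import statsmodels.formula.api as smf

def pretrend_check(df: pd.DataFrame, reference_period: int = -1):
    """Event-study OLS with unit and calendar-period fixed effects.

    Coefficients on negative event times (placebo leads) should be close to
    zero if parallel trends is plausible. Assumes staggered adoption; with a
    single adoption date, event-time dummies and period effects are collinear,
    so drop C(period). Assumes no rows are dropped for missing values.
    """
    formula = (
        f"y ~ C(event_time, Treatment(reference={reference_period}))"
        " + C(unit) + C(period)"
    )
    fit = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["unit"]}
    )
    # Pull out the lead coefficients (event times below the reference period).
    leads = [p for p in fit.params.index if "event_time" in p and "[T.-" in p]
    return fit.params[leads], fit.bse[leads]
```

Plotting the lead coefficients with confidence intervals is usually more convincing than a single joint test.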
Spillovers
- check outcomes in “nearby” units
- redefine treatment intensity / exposure
- test for displacement patterns
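One cheap spillover check is to redefine exposure as the share of treated units among a control unit's neighbors and see whether control outcomes move with it. A rough sketch, assuming hypothetical columns `group` (e.g., market or region), `treated`, and `y`:

```python
import pandas as pd
import statsmodels.formula.api as smf

def spillover_check(df: pd.DataFrame):
    """Do control units' outcomes move with the treated share among neighbors?

    "Neighbors" here means other units in the same group; a clear relationship
    suggests interference that the main design assumes away.
    """
    df = df.copy()
    df["group_size"] = df.groupby("group")["treated"].transform("size")
    df["n_treated"] = df.groupby("group")["treated"].transform("sum")
    # Share of *other* units in the group that are treated.
    df["exposure"] = (df["n_treated"] - df["treated"]) / (df["group_size"] - 1)
    controls = df[(df["treated"] == 0) & (df["group_size"] > 1)]
    return smf.ols("y ~ exposure", data=controls).fit()
```

A clear slope on `exposure` does not by itself tell you the direction of the bias, but it does tell you the no-interference assumption is doing real work.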
The point: each check should have a reason.
Step 3: Do sensitivity, not just variation
Many robustness checks vary the model but never translate results into a meaningful statement.
Sensitivity asks:
- How large would bias need to be to change the sign or the decision?
- How much selection on unobservables would be required to overturn the estimate?
- How sensitive is the result to a reasonable redefinition of treatment/outcome?
Even a simple sensitivity narrative is powerful:
“To eliminate the observed effect, an unmeasured confounder would need to be at least as predictive as [X], which is unlikely given [context].”
Or:
“The estimate is stable to alternative windows and controls, but it is sensitive to excluding early adopters; therefore conclusions apply primarily to later adopters.”
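To make the confounding version of this quantitative, one option is a robustness value in the spirit of Cinelli and Hazlett (2020): the partial R² an unmeasured confounder would need with both treatment and outcome to explain away the estimate. A minimal sketch, computed from the treatment's t-statistic and residual degrees of freedom (the numbers below are hypothetical):

```python
import numpy as np

def robustness_value(t_stat: float, dof: int, q: float = 1.0) -> float:
    """Partial R^2 an unobserved confounder would need with both treatment and
    outcome to reduce the estimate by 100*q% (robustness value, Cinelli & Hazlett 2020)."""
    f_q = q * abs(t_stat) / np.sqrt(dof)
    return 0.5 * (np.sqrt(f_q**4 + 4 * f_q**2) - f_q**2)

# Hypothetical example: t = 4.2 on 1,000 residual degrees of freedom.
# robustness_value(4.2, 1000)  # ~0.12: a confounder explaining ~12% of the
# residual variance of both treatment and outcome would erase the estimate.
```

Benchmarking that number against the partial R² of your strongest observed covariates is what turns it into the narrative above.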
Step 4: Don’t report 20 specs — report structure
If you run many specs, summarize them in a structured way:
- show the estimate distribution across specs
- highlight the “reasonable” specification set
- explicitly state what varies and what doesn’t
A reader should be able to see:
- what assumptions drive the result
- where uncertainty increases
- what conclusions remain stable
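A minimal sketch of that structure: run a small grid of defensible specifications and report the distribution of estimates rather than one table per spec. The column names (`y`, `d`, the controls, `adopted_early`) are placeholders.

```python
import itertools
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder specification space: candidate control sets and sample restrictions.
CONTROL_SETS = [[], ["age"], ["age", "income"], ["age", "income", "region"]]
SAMPLES = {
    "full": lambda df: df,
    "exclude_early_adopters": lambda df: df[df["adopted_early"] == 0],
}

def run_spec_grid(df: pd.DataFrame) -> pd.DataFrame:
    """Estimate the same treatment coefficient across a small grid of specs."""
    rows = []
    for controls, (sample_name, subset) in itertools.product(CONTROL_SETS, SAMPLES.items()):
        formula = "y ~ d" + "".join(f" + {c}" for c in controls)
        fit = smf.ols(formula, data=subset(df)).fit()
        rows.append({
            "sample": sample_name,
            "controls": " + ".join(controls) or "none",
            "estimate": fit.params["d"],
            "se": fit.bse["d"],
        })
    # Sorting by estimate makes the distribution across specs easy to eyeball.
    return pd.DataFrame(rows).sort_values("estimate")
```

The output is one small table that shows exactly what varies, what does not, and how far the estimate moves across the reasonable set.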
A robustness template you can reuse
When you report robustness, aim for four sentences:
- Main estimate and interpretation.
- Primary threats you considered.
- Key checks you ran and what they indicate.
- Sensitivity statement: what would overturn the conclusion.
Example:
“We estimate that X increases Y by about Z. The main threats are time-varying confounding and differential measurement. Placebo leads show no pre-trends and results are stable across windows and trend controls. However, the estimate is sensitive to excluding early adopters, so we interpret it as applying primarily to later adoption contexts.”
That reads like science, not like an output dump.
The goal: decision-relevant trust
Robustness is not about proving you are right. It’s about telling the truth about what your design can support.
Good robustness increases trust even when it weakens the headline result—because it clarifies the boundary of the claim.