Robustness without theater
Robustness checks are often performed as theater:
- add controls until the coefficient “looks stable”
- try a few specs until one is significant
- report a handful of alternatives with no interpretation
That is not robustness. That is specification drift.
Robustness is a disciplined attempt to answer:
“How easily could my conclusion be wrong, and why?”
This post lays out a simple framework for robustness checks that actually increase trust.
Step 1: Name the threats
Start by listing the top threats to validity.
Common ones:
- Confounding: unmeasured variables affect both treatment and outcome
- Selection: who is treated is not comparable to who is not treated
- Measurement: outcomes or treatment are mismeasured (possibly differently across groups)
- Time: trends, shocks, anticipation, or delayed effects
- Spillovers: one unit’s treatment affects another unit’s outcome
- Functional form: the model is imposing structure that is not warranted
Pick the threats that are plausible in your context. If you can’t name plausible threats, you’re not done thinking.
Step 2: Match each threat to a diagnostic
Diagnostics are not “extra models.” They are checks that target a specific failure mode.
Examples:
Confounding
- pre-treatment balance checks (where meaningful)
- negative control outcomes (if available)
- covariate stability (do key covariates behave like they should?)
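For example, a pre-treatment balance check can be a few lines of code. Here is a minimal sketch, assuming a pandas DataFrame `df` with a binary `treated` column and a list of pre-treatment covariates; the column names are placeholders.

```python
import numpy as np
import pandas as pd

def standardized_mean_differences(df: pd.DataFrame, treat_col: str, covariates: list) -> pd.Series:
    """Standardized mean difference for each pre-treatment covariate.

    Absolute values above ~0.1 are a common (rough) flag for imbalance
    between treated and control groups.
    """
    treated = df[df[treat_col] == 1]
    control = df[df[treat_col] == 0]
    smd = {}
    for cov in covariates:
        mean_diff = treated[cov].mean() - control[cov].mean()
        pooled_sd = np.sqrt((treated[cov].var() + control[cov].var()) / 2)
        smd[cov] = mean_diff / pooled_sd
    return pd.Series(smd, name="smd")

# Hypothetical usage:
# smd = standardized_mean_differences(df, "treated", ["age", "baseline_y", "income"])
# print(smd[smd.abs() > 0.1])
```

The 0.1 threshold is a convention, not a test; the point is to see where imbalance concentrates, not to produce a pass/fail verdict.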
Time threats (common in DiD/event studies)
- pre-trends / placebo leads
- sensitivity to time window
- robustness to alternative trend controls
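For DiD and event-study settings, the placebo-leads check is the workhorse. A minimal sketch, assuming a long panel with hypothetical columns `unit`, `period`, `y`, and `event_time` (periods relative to treatment, negative values being leads):

```python
import pandas as pd
import statsmodels.formula.api as smf

def pretrend_check(df: pd.DataFrame, reference_period: int = -1):
    """Event-study OLS with unit and calendar-period fixed effects.

    Coefficients on negative event times (placebo leads) should be close to
    zero if parallel trends is plausible. Assumes staggered adoption; with a
    single adoption date, event-time dummies and period effects are collinear,
    so drop C(period). Assumes no rows are dropped for missing values.
    """
    formula = (
        f"y ~ C(event_time, Treatment(reference={reference_period}))"
        " + C(unit) + C(period)"
    )
    fit = smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["unit"]}
    )
    # Pull out the lead coefficients (event times below the reference period).
    leads = [p for p in fit.params.index if "event_time" in p and "[T.-" in p]
    return fit.params[leads], fit.bse[leads]
```

Plotting the lead coefficients with confidence intervals is usually more convincing than a single joint test.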
Spillovers
- check outcomes in “nearby” units
- redefine treatment intensity / exposure
- test for displacement patterns
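One cheap spillover check is to redefine exposure as the share of treated units among a control unit's neighbors and see whether control outcomes move with it. A rough sketch, assuming hypothetical columns `group` (e.g., market or region), `treated`, and `y`:

```python
import pandas as pd
import statsmodels.formula.api as smf

def spillover_check(df: pd.DataFrame):
    """Do control units' outcomes move with the treated share among neighbors?

    "Neighbors" here means other units in the same group; a clear relationship
    suggests interference that the main design assumes away.
    """
    df = df.copy()
    df["group_size"] = df.groupby("group")["treated"].transform("size")
    df["n_treated"] = df.groupby("group")["treated"].transform("sum")
    # Share of *other* units in the group that are treated.
    df["exposure"] = (df["n_treated"] - df["treated"]) / (df["group_size"] - 1)
    controls = df[(df["treated"] == 0) & (df["group_size"] > 1)]
    return smf.ols("y ~ exposure", data=controls).fit()
```

A clear slope on `exposure` does not by itself tell you the direction of the bias, but it does tell you the no-interference assumption is doing real work.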
The point: each check should have a reason.
Step 3: Do sensitivity, not just variation
Many robustness checks vary the model but never translate results into a meaningful statement.
Sensitivity asks:
- How large would bias need to be to change the sign or the decision?
- How much selection on unobservables would be required to overturn the estimate?
- How sensitive is the result to a reasonable redefinition of treatment/outcome?
Even a simple sensitivity narrative is powerful:
“To eliminate the observed effect, an unmeasured confounder would need to be at least as predictive as [X], which is unlikely given [context].”
Or:
“The estimate is stable to alternative windows and controls, but it is sensitive to excluding early adopters; therefore conclusions apply primarily to later adopters.”
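To make the confounding version of this quantitative, one option is a robustness value in the spirit of Cinelli and Hazlett (2020): the partial R² an unmeasured confounder would need with both treatment and outcome to explain away the estimate. A minimal sketch, computed from the treatment's t-statistic and residual degrees of freedom (the numbers below are hypothetical):

```python
import numpy as np

def robustness_value(t_stat: float, dof: int, q: float = 1.0) -> float:
    """Partial R^2 an unobserved confounder would need with both treatment and
    outcome to reduce the estimate by 100*q% (robustness value, Cinelli & Hazlett 2020)."""
    f_q = q * abs(t_stat) / np.sqrt(dof)
    return 0.5 * (np.sqrt(f_q**4 + 4 * f_q**2) - f_q**2)

# Hypothetical example: t = 4.2 on 1,000 residual degrees of freedom.
# robustness_value(4.2, 1000)  # ~0.12: a confounder explaining ~12% of the
# residual variance of both treatment and outcome would erase the estimate.
```

Benchmarking that number against the partial R² of your strongest observed covariates is what turns it into the narrative above.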
Step 4: Don’t report 20 specs — report structure
If you run many specs, summarize them in a structured way:
- show the estimate distribution across specs
- highlight the “reasonable” specification set
- explicitly state what varies and what doesn’t
A reader should be able to see:
- what assumptions drive the result
- where uncertainty increases
- what conclusions remain stable
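A minimal sketch of that structure: run a small grid of defensible specifications and report the distribution of estimates rather than one table per spec. The column names (`y`, `d`, the controls, `adopted_early`) are placeholders.

```python
import itertools
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder specification space: candidate control sets and sample restrictions.
CONTROL_SETS = [[], ["age"], ["age", "income"], ["age", "income", "region"]]
SAMPLES = {
    "full": lambda df: df,
    "exclude_early_adopters": lambda df: df[df["adopted_early"] == 0],
}

def run_spec_grid(df: pd.DataFrame) -> pd.DataFrame:
    """Estimate the same treatment coefficient across a small grid of specs."""
    rows = []
    for controls, (sample_name, subset) in itertools.product(CONTROL_SETS, SAMPLES.items()):
        formula = "y ~ d" + "".join(f" + {c}" for c in controls)
        fit = smf.ols(formula, data=subset(df)).fit()
        rows.append({
            "sample": sample_name,
            "controls": " + ".join(controls) or "none",
            "estimate": fit.params["d"],
            "se": fit.bse["d"],
        })
    # Sorting by estimate makes the distribution across specs easy to eyeball.
    return pd.DataFrame(rows).sort_values("estimate")
```

The output is one small table that shows exactly what varies, what does not, and how far the estimate moves across the reasonable set.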
A robustness template you can reuse
When you report robustness, aim for four sentences:
- Main estimate and interpretation.
- Primary threats you considered.
- Key checks you ran and what they indicate.
- Sensitivity statement: what would overturn the conclusion.
Example:
“We estimate that X increases Y by about Z. The main threats are time-varying confounding and differential measurement. Placebo leads show no pre-trends and results are stable across windows and trend controls. However, the estimate is sensitive to excluding early adopters, so we interpret it as applying primarily to later adoption contexts.”
That reads like science, not like an output dump.
The goal: decision-relevant trust
Robustness is not about proving you are right. It’s about telling the truth about what your design can support.
Good robustness increases trust even when it weakens the headline result—because it clarifies the boundary of the claim.