A causal simulation of infant sleep risk, and what back-to-sleep advice overlooks
We just had our second baby a few weeks ago, and somewhere in the late-night feeds I started picking at the advice every new parent gets drilled on: always put the baby down on its back. It's good advice and I follow it. Two things made me want to understand the stomach-sleeping part in particular. My first son, once he was old enough to roll over by himself, strongly preferred his stomach, slept better that way, and fought being put on his back. And my own generation slept face-down as infants; that was the standard advice at the time. So the quant in me wanted something narrower than reassurance: does sleeping prone carry risk by itself, or does it only look dangerous because it travels with other things like soft bedding, smoking, and a baby too young to roll over? You can't run that experiment, for obvious reasons. So I built a simulation.
A simulation can't prove a real-world cause. It can falsify. If you program prone to be causal, the sim "finds" causation; if you program it as a harmless marker, it finds that. So the engine asks one question instead: can I build a world where prone has zero independent effect and still reproduce every real historical number? If no such world survives the calibration targets, "it's just a marker" is ruled out. If one does, then history alone can't separate cause from correlation, and the honest output is a bound, not a verdict -- something like "to match the data, prone must carry at least X% of its risk directly, and a hidden confounder would need strength E to explain the rest away."
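The E-value in that bound has a closed form. Here is a minimal sketch, assuming the VanderWeele and Ding definition for a risk ratio and treating an odds ratio as a risk ratio because SIDS is rare. That is an assumption about how the engine handles it, and `e_value` is a hypothetical helper name, not the repo's code:

```python
import math


def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (VanderWeele and Ding, 2017).

    The minimum strength of association, on the risk-ratio scale, that an
    unmeasured confounder would need with BOTH the exposure and the outcome
    to fully explain away the observed ratio. For a rare outcome like SIDS,
    an odds ratio approximates a risk ratio, so an OR can be passed in.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1))


# An odds ratio of 3 could only be explained away by a confounder tied to
# both prone placement and death at a risk ratio of about 5.45 each.
print(round(e_value(3.0), 2))  # 5.45
```

The formula only says how strong a hidden confounder would have to be; whether any real confounder is that strong is the judgment the calibration targets are there to settle.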
Three competing worlds, judged on the same evidence. I run prone as a direct cause, prone as mostly a marker, and a triple-risk interaction where position only matters given an already-vulnerable infant in a critical window. Each gets the same treatment -- a do-operator on the causal graph, back-door adjustment, IPW, and E-value sensitivity bounds -- and then has to reproduce the published odds ratios or it's out.
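The do-operator, back-door adjustment and IPW are easiest to see in a world small enough to solve by hand. Here is a minimal sketch, assuming a single binary confounder Z (social adversity) and made-up probabilities. Every name and number in it is hypothetical and not taken from the repo:

```python
# Toy world: Z = social adversity (0/1), X = prone (0/1), Y = death.
# All probabilities are invented for illustration.
P_Z1 = 0.25                          # P(Z=1)
P_X1_GIVEN_Z = {0: 0.10, 1: 0.50}    # prone placement skews toward adversity
RISK = {                             # P(Y=1 | X=x, Z=z); prone doubles risk
    (0, 0): 0.001, (1, 0): 0.002,
    (0, 1): 0.003, (1, 1): 0.006,
}


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def naive_risk(x):
    """P(Y=1 | X=x): plain conditioning, so it inherits X's mix of Z."""
    joint = sum(p_z(z) * p_x_given_z(x, z) * RISK[(x, z)] for z in (0, 1))
    return joint / sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))


def backdoor_risk(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z), the back-door formula."""
    return sum(RISK[(x, z)] * p_z(z) for z in (0, 1))


def ipw_risk(x):
    """Same causal quantity via inverse-probability weighting.

    Each (x, z) cell's mass P(x, z) is reweighted by 1 / P(x | z), which
    recreates a population in which X no longer depends on Z.
    """
    return sum(
        p_z(z) * p_x_given_z(x, z) * RISK[(x, z)] / p_x_given_z(x, z)
        for z in (0, 1)
    )


naive_rr = naive_risk(1) / naive_risk(0)
adjusted_rr = backdoor_risk(1) / backdoor_risk(0)
print(f"naive RR {naive_rr:.2f}, adjusted RR {adjusted_rr:.2f}")
# naive RR 3.43, adjusted RR 2.00
```

The naive ratio is inflated because prone babies in this toy world over-represent the adversity stratum. Adjustment recovers the 2.0 built into the risk table, and IPW lands on the same number because, with the true P(x | z) known, it is just the back-door formula rewritten. With individual-level records the weights would come from a fitted propensity model instead.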
Calibrating a world is its own fight. Each one has to match the same published numbers -- prone prevalence before and after Back-to-Sleep, the SIDS rate, the adjusted odds ratio -- and the loss is just the weighted squared error against those targets:
```python
from math import log

from scipy.optimize import minimize

# Each world must reproduce the published numbers. The loss is a
# weighted squared error, in log space, against every target.
# finalize, simulate, TARGETS, world, x0 and bounds are the
# simulation's own helpers and inputs, not shown here.
def loss(world, x):
    params = finalize(world, x)  # solve the pinned knobs first
    sim = simulate(world, params)
    return sum(
        spec.weight * (log(sim[name]) - log(spec.target)) ** 2
        for name, spec in TARGETS.items()
    )

# Don't hand all 10 parameters to the optimizer -- the naive version
# diverged (death rates 15-20x too high). Pin the ones that map
# cleanly to a single target, optimize only the coupled few.
# (SciPy accepts bounds for Nelder-Mead from version 1.7 on.)
result = minimize(lambda x: loss(world, x), x0,
                  method="Nelder-Mead", bounds=bounds)
```
The discipline matters more than the optimizer. Some parameters map to exactly one target -- a prevalence, a baseline death rate -- so I solve those by hand and lock them, and the optimizer only has to deal with the few that are actually coupled. Hand it all ten at once and you get a world that fits none of the targets.
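Pinning by hand can be as simple as inverting one target. Here is a minimal sketch, assuming a toy two-group model where the population death rate is a prevalence-weighted mix of a supine baseline and a prone rate `rr` times higher. `pin_baseline` and `population_rate` are hypothetical helpers, not the repo's `finalize`:

```python
def pin_baseline(target_rate: float, prone_prevalence: float, rr: float) -> float:
    """Solve for the supine baseline death rate that reproduces a target
    population rate, given prone prevalence and prone's risk ratio.

    Toy model: rate = b * (1 - p) + b * rr * p  =>  b = rate / (1 - p + p * rr)
    """
    if not 0 <= prone_prevalence <= 1:
        raise ValueError("prevalence must be in [0, 1]")
    return target_rate / (1 - prone_prevalence + prone_prevalence * rr)


def population_rate(baseline: float, prone_prevalence: float, rr: float) -> float:
    """Forward model, used to confirm the pinned value hits the target."""
    return baseline * (1 - prone_prevalence) + baseline * rr * prone_prevalence


# 1.2 per 1,000 is the pre-campaign rate cited in this post; the 50% prone
# prevalence and rr = 3 are illustrative assumptions, not calibration targets.
b = pin_baseline(1.2e-3, 0.5, 3.0)
print(b)  # 0.0006
print(population_rate(b, 0.5, 3.0))  # back to the 1.2-per-1,000 target
```

A one-to-one knob like this never needs a search: it is solved exactly before each loss evaluation, which leaves the optimizer only the parameters that genuinely trade off against each other.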
The marker world couldn't survive. No calibration with prone set to zero effect reproduced the historical odds ratios; faking them would take a hidden confounder stronger than any real one. The clearest way to see why is a PCA on the risk-factor matrix. Before the campaign, when nearly every baby slept prone, prone loads on its own axis, almost orthogonal to the social-adversity bundle of smoking, low SES, and soft bedding. When everyone does a thing, that thing can't be a proxy for poverty, so the prone-death link of that era can't be explained by confounding. It has to be causal. After the campaign, prone migrates onto the adversity axis: the families still placing babies prone now skew toward smoking and low income, which is exactly why the modern odds ratio is partly inflated by selection rather than by extra danger.
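The "when everyone does a thing" step has a precise form. For two yes/no factors, the largest correlation (the phi coefficient) they can possibly have is capped by their prevalences. Here is a minimal sketch; the prevalences (30% smoking, prone at 95% before the campaign and 20% after) are illustrative assumptions, not the calibration targets:

```python
import math


def max_phi(p: float, q: float) -> float:
    """Upper bound on the phi correlation between two binary factors
    with prevalences p and q (positive association).

    The bound is reached when every case of the rarer factor also has the
    commoner one, i.e. P(both) = min(p, q).
    """
    lo, hi = min(p, q), max(p, q)
    if lo <= 0 or hi >= 1:
        return 0.0  # a factor with no variance can't correlate with anything
    return math.sqrt(lo * (1 - hi) / (hi * (1 - lo)))


smoking = 0.30
print(round(max_phi(0.95, smoking), 3))  # pre-campaign prone: 0.15
print(round(max_phi(0.20, smoking), 3))  # post-campaign prone: 0.764
```

A near-universal behavior can barely co-vary with any adversity factor, so a PCA has nothing to put on the adversity axis. Once prone becomes a minority practice the cap loosens, and the families who still place babies prone are free to cluster with smoking.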
It's causal, but it isn't the biggest factor. With prone settled as a real cause, you can ask a different question: of everything that separates a baby who dies from one who doesn't, how much of the variation does sleep position actually account for? Decompose the death risk by share of variance and prone lands mid-pack.
| Factor | Share of variance in death |
|---|---|
| Latent vulnerability | 49.8% |
| Heavy smoking | 22.5% |
| Smoking (any) | 9.5% |
| Prone | 7.5% |
| SES | 5.7% |
| Soft bedding | 5.0% |
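The post doesn't spell out how those shares are computed. One common way, sketched here under loudly stated assumptions: treat log-risk as additive in independent binary factors, so each factor contributes beta² · p · (1 − p) to the variance, then normalize. The coefficients and prevalences below are placeholders, not the calibrated values, so the output will not match the table:

```python
import math


def variance_shares(factors: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Share of variance in log-risk attributable to each factor.

    ASSUMES log-risk = sum(beta_i * X_i) with X_i independent Bernoulli(p_i),
    so Var(log-risk) = sum(beta_i**2 * p_i * (1 - p_i)). Correlated factors
    would need covariance terms that this sketch ignores.
    """
    contrib = {
        name: beta ** 2 * p * (1 - p) for name, (beta, p) in factors.items()
    }
    total = sum(contrib.values())
    return {name: c / total for name, c in contrib.items()}


# Placeholder (beta = log relative risk, prevalence) pairs, not calibrated.
example = {
    "prone": (math.log(3.0), 0.10),
    "smoking": (math.log(2.5), 0.25),
    "soft_bedding": (math.log(2.0), 0.15),
}
shares = variance_shares(example)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:>12}: {share:.1%}")
```

The p · (1 − p) term is why a large relative risk can still be a modest share: a factor few babies are exposed to contributes little spread, however dangerous it is per exposure.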
The single biggest axis is a latent vulnerability you can't see or change -- some infants are born with a brainstem that handles a breathing challenge poorly. Among the things you can measure, smoking dominates. Prone is real and removable, but it became the headline win for a different reason: it's the biggest modifiable lever, not the biggest slice of variance. You can't change a baby's brainstem. You can change its position.
So how much of prone's risk can you engineer away? Knowing prone is causal doesn't tell a parent what to do with a crib at 9pm. The decision-relevant question is how much of the effect runs through channels you can close. The model splits the calibrated prone effect into mechanisms:
| Channel | Share of prone's effect | Engineerable at home |
|---|---|---|
| Rebreathing exhaled CO2 | 55% | Yes -- firm, breathable surface |
| Airway obstruction | 18% | No -- intrinsic to the position |
| Endogenous autonomic events | 15% | No -- vulnerable infants only |
| Thermal / overheating | 12% | Yes -- cool room |
| Free-standing arousal hazard | 0% | -- |
That last row -- the zero -- is the one doing the most work. A free-standing arousal hazard would mean prone kills a healthy baby on its own, in any crib, just by blunting the instinct to wake up. If that term were large, none of this would be engineerable. But the arousal data put it near zero: prone raises a baby's wake-up threshold without deranging a healthy baby at rest. Prone doesn't kill by itself. It kills when a real breathing challenge -- rebreathed CO2, overheating, a blocked airway -- meets a baby that fails to rouse from it.
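The difference between a gated hazard and a free-standing one fits in a few lines. Here is a minimal sketch of a toy model, not the simulation's actual mechanism code. It assumes both positions face the same rate of breathing challenges and that prone only raises the chance of failing to rouse. The function name and every number are hypothetical:

```python
def prone_excess(challenges_per_night: float, fail_supine: float,
                 fail_prone: float, free_standing: float = 0.0) -> float:
    """Excess per-night death risk, prone minus supine, in a toy gated model.

    Gated path: a breathing challenge occurs AND the baby fails to rouse.
    Here prone raises only the failure-to-rouse probability, and both
    positions see the same challenge rate (a simplification). free_standing
    is a prone hazard that needs no challenge at all.
    """
    gated = challenges_per_night * (fail_prone - fail_supine)
    return gated + free_standing


# Hypothetical per-night numbers, chosen only to show the shape.
base = prone_excess(0.02, 0.001, 0.003)
engineered = prone_excess(0.005, 0.001, 0.003)          # fewer challenges
with_floor = prone_excess(0.0, 0.001, 0.003, 1e-5)      # ungated term survives
print(engineered / base, with_floor)
```

With the free-standing term at zero, the excess scales straight down with the challenge rate, so closing channels closes risk. Any nonzero free-standing term is a floor that no crib setup can touch, which is why the adversarial rows in the residual table matter.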
That's why most of the risk comes off. The two biggest channels are the two a parent can shut off: rebreathing, with a firm breathable mattress and nothing soft near the face, and overheating, with a cool room. What's left is the obstruction risk intrinsic to the position and the autonomic events that only fire in the small group of already-vulnerable infants nobody can pick out in advance. That's the best-supported split; the residual table below also runs the adversarial case where the arousal term is large instead.
Toggle the engineered environment on and read off the residual:
| Mechanism assumption | Historical crib | Engineered crib | Removable |
|---|---|---|---|
| Gated (best supported) | 0.191 / 1000 | 0.041 / 1000 | 78% |
| Endogenous-heavy | 0.191 / 1000 | 0.069 / 1000 | 64% |
| Adversarial bound | 0.191 / 1000 | 0.093 / 1000 | 51% |
Those are excess absolute risks, prone versus supine, for a low-risk infant. Running it three ways bounds the answer: even under the adversarial assumption you remove half the excess, and under the best-supported one nearly four-fifths. The engineered residual lands around 0.04 to 0.09 per thousand -- on the order of 1 in 11,000 to 25,000. That turns a multiplicative odds ratio of three into an absolute number a parent can hold: small, but against an outcome with no second chance, not zero.
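The step from the table to the 1-in-N figures is plain arithmetic. A quick check using the table's own rounded rates, so the percentages can differ from the table's by a point of rounding (the constant names are just for this sketch):

```python
HISTORICAL = 0.191  # excess deaths per 1,000, prone vs supine, historical crib
ENGINEERED = {      # residual excess per 1,000 in the engineered crib
    "Gated (best supported)": 0.041,
    "Endogenous-heavy": 0.069,
    "Adversarial bound": 0.093,
}


def removable(before: float, after: float) -> float:
    """Fraction of the excess risk removed by the engineered environment."""
    return 1 - after / before


def one_in_n(per_thousand: float) -> float:
    """Convert a rate per 1,000 into the N in '1 in N'."""
    return 1000 / per_thousand


for name, residual in ENGINEERED.items():
    print(f"{name:<24} removes {removable(HISTORICAL, residual):.1%}, "
          f"residual ~1 in {one_in_n(residual):,.0f}")
```

Working in absolute excess rather than odds ratios is what makes the comparison across assumptions meaningful: every row shares the same 0.191 starting point, so only the residual differs.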
This reminded me of a story from Ed Catmull, the co-founder of Pixar. Their big meetings ran at a long, narrow conference table. The people stuck at the far ends couldn't catch anyone's eye without craning, so they felt like second-class citizens and went quiet. The table was strangling the conversation, and it took leadership more than a decade to notice, because for the people in the middle, it worked fine. A fix for one problem (a table big enough to seat everyone) had quietly broken a more important one (the flat, open flow of ideas the company runs on). They swapped in a square table that put everyone on equal footing. Catmull's takeaway: every solution has a hidden cost, so you can never assume a fix is done -- you have to go hunting for what it just broke.
So what did back-to-sleep break? It cut SIDS deaths from about 1.2 per 1,000 births to 0.4. By Catmull's rule, a fix that size broke something downstream. My candidate: back-sleeping makes babies sleep lighter, lighter sleep wears parents down, and worn-down parents improvise at 3am. The real comparison was never back versus stomach. It's what a wrecked parent does when the baby won't settle on its back and the other rule is never bed-share. They feed lying down and drift off, or sink into the couch with the baby on their chest. The couch is the killer: sofa and armchair co-sleeping runs 18 to 50 times the risk of a baby asleep on its back. A low-risk baby placed prone in a proper crib -- firm mattress, bare, cool room -- runs about 1.4 times baseline, and near baseline with a parent awake and watching. Flipping the baby prone in a good crib is 10 to 30 times safer than that same parent collapsing on the sofa. The messaging hands parents a list of don'ts with no sanctioned do, and pushes them toward the worst option exactly when they can least resist it.
There's a quant version of the same idea. I spent years building trading strategies, and there's always a gap between the backtest and the live market. A strategy can be optimal on paper, under perfect execution, and then slippage and real human behavior eat the edge the moment you deploy it. "Always back, never bed-share" is the backtest-optimal policy: it assumes perfect compliance. The exhausted parent at 3am is the part of the system that doesn't comply. A good quant optimizes the realized outcome, net of how imperfect agents actually execute it, not the version that looks best on paper. Public-health advice should be built the same way. Harm reduction beats abstinence here for the same reason it does everywhere: people in distress will act, and the job is to make the thing they reach for the least dangerous version, not to pretend they won't reach.
The cheapest fix is to attack the exhaustion that drives the whole spiral. A responsive bassinet that keeps the baby supine and soothes it back to sleep -- the SNOO is the best-known -- beats every other home setup in the model, not by making any single night safer in the crib but by removing the desperation that leads to the couch. The catch is that it costs about $1,700, which puts the one tool that defuses the worst failure mode behind a paywall for exactly the families most likely to hit it. Finland has handed expectant mothers a "baby box" that doubles as a safe sleep space since 1938 (every expectant mother since 1949); Scotland copied it in 2017 and has shipped over 350,000 baby boxes. My take ended up somewhere I didn't expect: governments should give every new family a responsive bassinet.
It would ease the sleep deprivation of the newborn phase, and as a byproduct it might even nudge the birth rate back above replacement: for some, a less grueling first few months may make the next kid an easier yes.
Two caveats, because I don't want to oversell this. No home device has been proven to cut SIDS deaths in a trial, and the fertility-and-bassinet link I find tempting is a plausible mechanism, not a measured effect. I'm not an epidemiologist and none of this is medical advice. It's a reasoning tool, and the value of a reasoning tool is people trying to break it, which is why it's public on GitHub. Follow your pediatrician. But if you'd rather understand the risk than just follow the rule, the code is there to check.
Where the numbers come from. The simulation is only as honest as the targets it has to reproduce, so every calibration number is a published one rather than a guess. The sources:
- Prone prevalence and how it shifted across Back-to-Sleep: risk-factor changes after the campaign, the historical review in Gilbert et al. 2005, Int. J. Epidemiology, and the original New Zealand Cot Death Study (Mitchell et al. 1991).
- The vulnerability axis, the biggest slice of variance in the table above: the brainstem serotonin triple-risk hypothesis (Kinney).
- Smoking's odds ratio and its rising attributable risk as prone fell away.
- The mechanism split rests on Horne et al. 2001 (J Pediatr): prone raises a baby's arousal threshold but doesn't derange a healthy one at baseline, which is what pins the free-standing arousal channel near zero.
- The couch number: Carpenter et al. 2013 (BMJ Open) puts sofa and armchair co-sleeping near an odds ratio of 18, and Blair et al. 2014 shows bed-sharing risk concentrates in the hazardous cases.
- The soother case leans on the breastfeeding-and-SIDS meta-analyses (Hauck 2011, Thompson 2017) and the SNOO supine-device research, which shows it keeps babies on their backs but has never been tested against SIDS deaths directly.
Current guidance throughout is the AAP 2022 safe-sleep recommendations. The full target list and tolerances live in the repo's calibration files.