Covid-19: Epidemiology is useful

I pulled this data from Google Trends for searches on "coronavirus".  I had to set a range that excluded everything from mid-December 2019 to the present, because the recent searches dwarfed all the earlier ones and compressed the scale.

As you can see, in previous years there's a natural die-off of coronavirus-related searches by mid-March, which is good news for those concerned with the present crisis.  I've heard a lot of speculation about the current coronavirus, Covid-19, and I want to talk about a few ideas from epidemiology to get at more general principles.  First, the basic reproduction number (R0) of Covid-19 looks like it's about 2.2, meaning each infected person is likely to infect an average of 2.2 additional people before they clear the virus or die and are no longer infectious.  That means it's likely to spread quickly - or, more accurately, exponentially.
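
To put rough numbers on "exponentially", here is a minimal Python sketch.  It assumes every case infects exactly R0 others in each generation and that nobody is immune or isolated, which no real outbreak satisfies.  The function name and the ten-generation horizon are my own illustrative choices, not something taken from the published estimates.

```python
def new_infections_by_generation(r0, generations, seed_cases=1):
    """Expected new infections in each generation of spread.

    Simplifying assumptions: every case infects exactly r0 others,
    and nobody is immune, isolated, or counted twice.
    """
    return [seed_cases * r0 ** g for g in range(generations + 1)]


# R0 of about 2.2, followed for ten generations (an arbitrary horizon).
growth = new_infections_by_generation(2.2, 10)
for generation, count in enumerate(growth):
    print(f"generation {generation:2d}: ~{count:,.0f} new infections")
```

Even with a single seed case, the tenth generation alone runs into the thousands, which is why early estimates of R0 matter so much.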

In fact, given an R0 of that magnitude, it's unlikely we'll be able to stop it from spreading, no matter how good our quarantine is.  There are just too many people infected already.  So why focus on stopping it, or even slowing it down?  So we can learn new things about it - not out of curiosity, but because using that information is our best method of saving lives.

Epidemiology: Science or natural philosophy?

I recently wrote about non-empirical 'science', noting the difference between things that are amenable to study via the scientific method and things that aren't.  I want to revisit the topic from another angle, and talk about a branch of learning - epidemiology - for which we can't run controlled experiments.  Specifically, I want to talk about how that branch of learning is useful, even if it's not rigorous in the way most branches of science are.

I'll refer to a recent editorial in the New England Journal of Medicine about Covid-19, the most recent coronavirus to go viral.  The editorial also links to the source articles it's based on, but for simplicity I'll focus on the editorial for now.

The article starts out by mentioning that Covid-19 is genetically similar to other coronaviruses - SARS and MERS - which was some of the first information we found out about it.  If all you know is that there's a new virus making the rounds, and that it's genetically similar to diseases with fatality rates of 9% (SARS) and 36% (MERS), you too might be tempted to shut down all of China for a few weeks.  However, as the article points out, the true mortality rate for Covid-19 now looks to be below 1%, and probably similar to the seasonal flu.  But notice how they state their conclusion about the actual fatality rate of Covid-19: "If one assumes...".

All of these reports are retrospective analyses, which rely on cases of people who got sick and came to the hospital.  We know some people got infected but were either asymptomatic, or didn't go to the hospital.  Those people were infected, but they aren't counted in the mortality rate measured by a retrospective analysis, so the researchers have to make assumptions, adding those people back in, to get a more accurate estimate of how likely people are to die if they contract Covid-19.  That involves using educated guesses.

If researchers wanted to confirm their hypothesis through the scientific method, they'd need to randomly assign one group of people to get infected with Covid-19, and compare them to an uninfected control group.  Depending on the precision they wanted from this experiment, they'd need anywhere from a couple hundred to a couple thousand volunteers.  Obviously this would be highly unethical, so nobody is proposing we do something like that.  As with so many other questions in epidemiology, it's infeasible to investigate this experimentally.  But there's no provision within the scientific method that says you get a pass because something is hard, or would result in people accurately associating your methods with Stalin.

Science isn't an imperative to do something just because you're curious, and it's not wisdom that's owed to humanity, graded on a curve in case the experiment can't be done right.  It's an algorithm, of sorts, that allows you to gain confidence in the predictive nature of your hypothesis based on how you collect your supporting evidence.  Most of epidemiology doesn't follow that algorithm.  It's a series of observations that fall under what Aristotle called 'natural philosophy'.

And yet it's still useful in the face of a potential pandemic like this one.  Some of the benefit to delaying the spread of Covid-19 comes from the potential to create a vaccine against it, but it's unlikely we'll be able to manufacture enough doses to cover everyone before it comes around again.  That means we'll need to rationally distribute vaccine to the people who need it most.  Naturally, we'll want to start with health care workers, so hospitals and clinics can continue to act as places of healing and not networks to spread disease.

The data reported in the NEJM suggests that elderly people (and not young children) are the most susceptible to severe complications.  If those observations hold up, we'll want to focus vaccination on the elderly and people who associate with them the most.  We might be able to free up some vaccines from schools, and even keep school in session, as we focus resources where they're needed elsewhere.

Because we didn't do controlled experiments, we can't be confident in our estimates.  The results could be biased based on nationality, environmental factors unique to the Wuhan area, the nature of Covid-19's mutation state at that time, etc.  But the estimates we have now are much better than having no information at all.

If the choice were between experimentally validated results and serial observations, we would expect the experiments to give the more accurate answer.  But there are some things we'll never know because we can't do the experiments to figure them out.  For example, the beginning of an outbreak is when you have the most potential to contain infection spread.  With something like Covid-19, after a certain point it will get away from you, so the early phase is when you have to make the decision to shut everything down.  At that point, you can't know whether something will end up being a significant problem or whether it will come to nothing.  So if you're a bureaucrat trying to decide whether to do something about an emerging disease, you have to pick one or the other as an educated guess.  If you choose 'potential pandemic' and you're wrong, you just wasted a bunch of resources and everyone will get mad at you for lacking good judgement.  If you pick 'potential pandemic' and you're right, nobody will know that you stopped a pandemic through drastic means.  Instead, they'll all get mad at you and accuse you of lacking good judgement.

Since the scientific revolution changed things, we often talk about making data-driven, scientifically-validated decisions.  But many decisions can't be validated scientifically.  They're based on subjective evidence, or are not data-driven; but that isn't a failing of the individual making the decision so much as the uncertainty inherent in the world we live in.  We often don't get clear data, or the chance to test something experimentally, so we have to make do.  In the case of epidemiology, slowing down the spread of the disease may well save lives.  So far it has given us time to learn about the transmission rate of the disease, to potentially develop a vaccine, and to target future interventions at those who need them most.

We'll never know the counterfactual - how many people would die if we didn't do those interventions - because the only way to find that out is to randomly assign people to the 'no treatment' group and watch them die.  Instead we can intervene next November largely through faith that our information is accurate enough to be useful.  We'll hope our interventions save lives.  Which lives, or how many, we'll never know.


  1. The whole business of the unseen disaster that was prevented is fascinating to me. And I have a tendency to err on the side of imagining that such prevention is more important than we think.

    As far as the death rate for COVID-19 goes, estimates seem to be all over the place. I'm convinced it's higher than 1%, but I agree the 3-4% of diagnosed cases (source: Wikipedia) is probably an overstatement because of people who are never diagnosed.

    1. This question becomes more interesting the more you do it. Right now there's a bit of a debate in the medical community about when and how much to engage in preventative medicine - and how well we really do at preventing disease. Some good examples of poor preventative medicine can be found in the book Ending Medical Reversal. (To be fair, the ongoing debate usually centers around specific preventative measures, as opposed to the concept of preventing disease altogether. However in EMR the authors suggest that the strong focus on surrogate endpoints causes us to spend a lot of money 'preventing' biomarkers from exceeding a certain range while leaving the underlying disease causes unaffected.)

      This isn't to say that preventative medicine never works. Just that it can be difficult to tell when an intervention is effective writ large - especially in the midst of changing disease incidence. This has been hotly debated in oncology, where we see increasing cancer incidence and better survival rates. How much of this is due to better detection rates driving up incidence statistics, and how much is due to better treatments of early stage disease preventing more severe malignancy? There is some data to suggest both are happening, but we can't do the 'definitive experiment' to figure out which elements in the cornucopia of interventions are effective. We almost never get a precision estimate with decimals attached. Usually we're lucky if we can get the direction right. So we keep doing all of the interventions, and every once in a while the balance of the evidence suggests we drop something (like mammograms for younger women) or add something else back in. When people are dying it's hard to argue "let's not do something that makes logical sense because we don't yet have evidence to confirm it's effective."

      It's fascinating to watch the contentions right now in metabolic health brought up by Taubes and his followers, where he basically argues that the intervention CAUSED the increased incidence. How do you prove that? Mostly by theory and by inference from population-level statistics and data from lab animals. It's much harder - and takes longer - to do the kind of controlled trials required by traditional science to support your hypothesis. Understanding public health crises is vital. But the tools we have for investigating them are limited, especially when the crisis actively changes the population you're studying.

      It remains to be seen whether COVID-19 will be seasonal like previous coronaviruses. I'm sure we'll keep revising mortality rates as more data comes in. Perhaps it will vary with ethnicity, SES, and a few other factors we haven't considered yet. More data should help us mitigate the effects of the epidemic, and more time should help in developing a vaccine, but uncertainty will remain a central feature of the evolving crisis.

    2. Fascinating stuff, and definitely well into your area of distinguishing science from things which claim that mantle, but really aren't science.

      What's your take on all the recent cancellations? Overreaction? Appropriate? Appropriate but maybe slightly premature? Appropriate and overdue?

    3. My take is mostly based on the idea that uncertainty is high right now - mainly due to the nature of knowledge-gathering in this kind of situation. Some things we think we know:

      The strategy is to slow the spread, not to stop it. The hearsay I'm getting from Italy is they weren't prepared and that's creating big problems for them. If we can slow the spread enough, we can get prepared and won't overwhelm our supply chains/health care capacity so we can treat vulnerable populations as they get sick.

      Incubation estimates vary, but it's probably longer than 2 weeks and potentially as long as 3-4 weeks. People are infectious during this period, so you can't say "I don't have it" because you might. All these cancellations and closures might be to ensure YOU aren't spreading the virus, not to keep you from catching it.

      If I were advising government officials right now, I'd ask, "What would you do if current trends in your state were extended out on an exponential curve 3-4 weeks from now?" Say your state has four cases. The trend extended out 3-4 weeks says you could be looking at major levels of infection. Sure, most people who get it don't get hospitalized, but enough do that you don't have capacity in your facilities. The answer to the question above is probably, "I'd shut down public events, close schools, and try to isolate people as much as possible." Okay, well you probably have a bunch of asymptomatic people spreading the virus right now. You'll only detect them in 3-4 weeks. Since that outcome is already set, maybe do the thing you were planning to do 'if things get bad enough', and do it yesterday.
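
      To make that question concrete, here is a rough sketch of the extrapolation. The doubling times are hypothetical placeholders, not measured values, and the function name is made up for illustration; swap in whatever your own state's recent case counts suggest.

```python
def projected_cases(current_cases, days_ahead, doubling_time_days):
    """Extrapolate a case count along a pure exponential curve.

    Assumes growth continues unchanged, with no interventions,
    no testing limits, and no saturation of the population.
    """
    return current_cases * 2 ** (days_ahead / doubling_time_days)


# Four known cases today, as in the example above.
# The doubling times (in days) are hypothetical placeholders.
for doubling_time in (3, 5, 7):
    for weeks in (3, 4):
        cases = projected_cases(4, weeks * 7, doubling_time)
        print(f"doubling every {doubling_time} days, "
              f"{weeks} weeks out: ~{cases:,.0f} cases")
```

      The spread between the fast and slow scenarios is the point: a small change in the assumed doubling time changes the 3-4 week picture by an order of magnitude, and you won't know which scenario you're in until it's already happened.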

      Finally, I'm hearing that some risk factors for mortality in the face of COVID-19 have a much higher prevalence in the US population than in the Chinese population. So the death rates reported there may not mirror what we can expect here. For example, cardiovascular risk, obesity, and diabetes may increase risk of death. We have a much heavier burden in the US for that kind of thing, so maybe we're wrong about the death rate. Say we're hearing it's 1-2%, but that's not counting all the people who don't report in, so maybe the real rate is 0.4-0.8%. Except we have lots of risk factors in the US, so bump it up again to 0.8-1.6%. That's still a lot of people. It's roughly ten times as bad as the seasonal flu, or worse.
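
      Here is that back-of-the-envelope arithmetic written out as a short sketch. The two adjustment factors are simply the ones implied by the ranges above (0.4 for undiagnosed infections, 2 for US risk factors), and the 0.1% seasonal flu figure is an assumed reference point; none of these are measured quantities, and the names are made up for illustration.

```python
def scale_range(rate_range, factor):
    """Multiply both ends of a (low, high) range by the same factor."""
    low, high = rate_range
    return (low * factor, high * factor)


reported_rate = (0.01, 0.02)  # 1-2% of diagnosed cases
diagnosed_share = 0.4         # guess implied by 1-2% -> 0.4-0.8%
us_risk_multiplier = 2.0      # guess implied by 0.4-0.8% -> 0.8-1.6%
seasonal_flu_rate = 0.001     # assumed reference point of about 0.1%

per_infection = scale_range(reported_rate, diagnosed_share)
us_estimate = scale_range(per_infection, us_risk_multiplier)
flu_multiple = scale_range(us_estimate, 1 / seasonal_flu_rate)

print(f"per infection: {per_infection[0]:.1%} to {per_infection[1]:.1%}")
print(f"US estimate:   {us_estimate[0]:.1%} to {us_estimate[1]:.1%}")
print(f"versus flu:    {flu_multiple[0]:.0f}x to {flu_multiple[1]:.0f}x")
```

      Under those guesses the final range works out to roughly 8 to 16 times the flu figure, and every step in the chain is a factor somebody had to estimate without an experiment.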

      Of course, there's always the possibility that COVID-19 acts like many other coronaviruses and we start to see it fall off in the next few weeks. If it follows the charts above (which are search mentions, not actual infections, so it's at least one step away from good data) we should see an inflection point in new cases before the end of the month. But again, we're operating under a shroud of uncertainty. We like precision in science, because we don't want to add bad information into the long-term store of knowledge. But when we're making short-run situational decisions it can be difficult to inform our guesses with precise evidence. We're stuck making significant choices based on sub-optimal information.

      So even if COVID-19 ends up being seasonal I'm not going to complain that they took the measures they did. Because they didn't know for certain that's what would happen. And the price for being wrong is unknown as well - but potentially very high. We can second-guess them after it's all over, and even now I'm not convinced it's going to be as bad as lots of people say. But I suspect if I were a state governor right now I'd probably be rushing to close off public gatherings, just in case.

    4. Has your assessment of it gotten better or worse in the last few days? I'd have to say mine has gotten worse, but that also means that I think we're going to need to keep current measures in place for quite a while, and I'm not sure how tenable it is to shut down all gatherings over 50 people for several months.

      I guess part of my pessimism comes from this article, where someone actually puts numbers to the ubiquitous flatten the curve chart:

    5. That's an interesting article. I wonder why he has such high confidence that the virus can be contained. The only argument for containment would be if the virus is seasonal - something that we still can't confirm, even if it is a coronavirus and (as per the chart at the top of this post) that's often the case.

      I still think the defining feature of this or any new pandemic is uncertainty. People who want to believe the death rate is low can find evidence to cite supporting their hypothesis and dismiss the alarmists; people who want to believe the opposite can find evidence to support their hypothesis and dismiss the low-ballers. I'm not convinced the ultimate death rate will be anywhere near what this guy is estimating, but I'm more certain that both our estimates are illusory. I could be wrong; he could be wrong; we're probably all wrong about multiple aspects of the disease and the unfolding crisis. It hasn't been two weeks since I posted this, and already just about every statistic from the NEJM article has been adjusted, second-guessed, refuted, confirmed again, refuted again, dismissed, and/or relied upon to make important decisions.

      The hard thing for most people is that we want to KNOW the true answers. What is the death rate? (Probably differs based on risk factors, health care system, time of year contracted, strain/sub-strain of the virus, etc.) Is the response overblown? (Maybe the US looks like Italy without any countermeasures, maybe not.) Are the things we're doing right now necessary, or even helpful in saving lives? Are we wasting everyone's time trying to delay without ever realizing any benefit from that delay?

      We will never know. As much as we want to know right now so we do the 'right thing', that's not realistic. It's certainly folly to publish an article on Medium with high confidence that control won't work, but containment will. It's foolish to publish anything with high confidence.

      How much confidence do I have that the coronavirus will end up being seasonal in the US? I'd have expected cases to start waning by now if it is, though it's still possible that we'll start seeing it. Maybe it's somewhat temperature sensitive and we'll have to wait longer for the drop off in new cases. Maybe it's not at all temperature sensitive, and we'll see this whole thing play out throughout the summer. I don't know.

      I'm not an epidemiologist, but I do have a terminal degree in, and work in, the biomedical sciences. Listen to the experts. But remember, when you hear advice that conflicts with what they said a week ago, that they're not working from perfect data and a solid plan. They're still making good educated guesses based on the best available evidence, and I support their efforts. But demanding solid answers from anyone is to demand something that is not currently known or even knowable.

