Are COVID-19 cases already on the decline?

Caveat: I am not an epidemiologist.  My fields are oncology, immunology, and to a lesser extent microbiology.  This whole situation is changing rapidly anyway, so uncertainty reigns during times like these.  I'm more interested in data analysis than I am in diagnosing conditions on the ground.

We should be careful - suspicious even - when we see data and think it's talking to us.  Especially in times like these, I often hear people say they're not doing anything fancy, they "just look at what the data say."  That's false.  Data don't say anything.  People say things about data.  Sometimes data can look very convincing, but then as you ask more questions and look at the data in a different way you can come away with different interpretations.  That's not what people mean when they say they're just looking at what the data say.

By all means look at the data, but remember that if you're hearing something it's not because the data are talking to you.  It's because you're reading something into the data (unless you're actually hearing voices, but that's still not the data talking).  It's perfectly valid for someone else to come along and read something else from the same set of data.  Indeed, sometimes the difference between an expert and a layperson is that the expert has learned not to be fooled by a surface reading of the data.

A scant 16 days ago, I posted a simple Google Trends graph showing that searches for coronavirus are seasonal, and that possibly the current virus would be, too.  Since then, I've been asked if my opinion on that has changed.  There are multiple very good reasons to believe that this won't end up being seasonal, that SARS-CoV-2 is already different from many other coronaviruses.  For our purposes today let's just focus on the official US data.  Here is the CDC's data for cases of people who tested positive for COVID-19 over time:

Ignore those last few days.  It looks like the numbers are dropping off, but that's just because new cases haven't been reported yet.  I fully expect them to be higher than the cases already reported through March 10th, as they've been doing for a while now.

If that's so, this graph looks like steadily increasing numbers of cases of coronavirus.  This is what it feels like to watch exponential growth happen before your very eyes, isn't it?  Clearly the data are telling us something, and we'd be fools not to listen!

Maybe.  But remember to be suspicious when you think the data are talking.  You might be out of your mind.  Let's stop to ask some more questions about the data.  For example, we know you can only test positive if you've actually been tested.  We also know that testing hasn't exactly been steady.  So does the CDC publish numbers about how many tests were done?  Indeed they do!

Again, those last few columns should continue to increase over time.  We're waiting for the data to come in, not experiencing a sudden drop-off in testing.  The absolute number of confirmed cases doesn't tell us much if we know the number of tests per day has been steadily changing.  And it has been steadily increasing, along with the number of positive tests, although not at a 1:1 rate.  At a very interesting rate:

I added all tests together so we can compare cases per day and tests per day.  I'm not sure if this means tests initiated, or test results received, but I think it means tests initiated.  I'm also pretty sure the cases are dated back to when the test was run, so I think the dates should match, but I could be wrong.

Don't get fooled by the axes here.  The left axis tracks tests per day, and the right axis tracks cases.  Otherwise the tests would dwarf the cases per day.  Obviously you have to test more than one person per positive COVID-19 case confirmed.  This next chart tracks cases per test, the inverse of the number of tests you need to run to get one positive:

Amazingly, back on February 24-25 every test performed had a greater than 50% probability of coming out positive! I have to admit that when I first saw this chart I got excited.  It really felt like it spoke to me.  It said, "The rate of cases per day peaked at the end of February and has been going down ever since!  This means COVID-19 cases have been on the decline since before the containment measures went into effect."

It was at this point that I began to get suspicious.  Data don't talk, so if I look at a chart and I hear something I know it's not coming from the data but my own interpretation.  In this case, one interpretation is to say that if we're testing more and more people we should expect to get more and more positive cases.  Therefore, if we're seeing the cases/test decrease it must be due to falling rates of infection.  After all, we have to test more and more people to find anyone who has it, right?

But this assumes the people being tested are a random sample, with COVID-19 cases spread evenly among them.  I've talked before about how distribution matters, and one thing we know about how testing has been done to date is that it's not being done randomly.  It's being prioritized based on suspected sickness (i.e. you're symptomatic and you test negative for flu), and in many cases on how sick you are.  Testing also hasn't been rolled out to every jurisdiction uniformly.  Places with higher probability of spread got tests first.  These are all reasons to suspect that the more we test the more we'll see this ratio of cases/test go down - it will take more tests to identify each new case - even if the disease is stable or spreading.
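To make that last point concrete, here is a minimal sketch of prioritized testing.  Everything in it is made up for illustration: the three risk groups, their sizes, and their infection probabilities are hypothetical and don't come from the CDC numbers.  The only thing it assumes is that the people most likely to be infected get tested first, while the amount of disease in the population never changes.

```python
def expected_positivity(infection_probs, n_tests):
    """Expected share of positive results when the n_tests
    highest-risk people are tested first (prioritized testing)."""
    ranked = sorted(infection_probs, reverse=True)
    tested = ranked[:n_tests]
    return sum(tested) / len(tested)


# Hypothetical population (illustrative numbers only, not real data):
#   100 very sick, flu-negative patients, each 50% likely to be infected
#   900 mildly symptomatic people, each 10% likely
#   9000 people without symptoms, each 1% likely
population = [0.50] * 100 + [0.10] * 900 + [0.01] * 9000

for n_tests in (100, 1000, 10000):
    rate = expected_positivity(population, n_tests)
    expected_cases = rate * n_tests
    print(f"{n_tests:>6} tests: {round(expected_cases)} expected cases, "
          f"{rate:.1%} positive")
```

In this toy setup the expected case count keeps climbing as testing expands while cases/test keeps falling, even though the underlying level of infection is fixed.  A falling ratio on its own can't tell you the disease is in decline.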

So we have two models of how to interpret the data.  In the first, we assume the testing previously wasn't capturing many of the cases that were out there.  Under this model, increased testing means catching those cases we were previously missing - because we didn't have the tests to find them.  There are good reasons to believe this explains at least part of what we're seeing in the data.

The other model assumes the testing is capturing many more negative cases than we were previously seeing.  The more people we test, the more we should expect to be expanding into less exposed populations, which would explain at least part of what we're seeing in the data.

Most likely both are happening to some extent.  This kind of real-world data tends to be messy.  It's hard to tell, just from looking at it, which interpretation is driving the overall trends we see.  If only the chart could speak to us, we might be able to understand better what we're looking at.  But alas, the data don't speak for themselves.


Eight days later, more data have come in. As I've maintained for a while, a developing situation like this one is marked by uncertainty, not because people are willfully ignorant, but because the things we want to know aren't knowable yet. Here are the updated numbers since I made this post initially:

If you compare to the graph above you'll notice a few things:

  1. Some of the numbers earlier in the graph have changed, not just the ones from the 5-7 days at the end there. I don't know why they added three additional cases on Feb 2, or why they made any of the other changes. All I know is that every date since Valentine's Day got an update of some kind, yet overall the left-hand side of the graph looks similar to the one above.
  2. There's also a slight increase in the number of tests reported as having been done, but that increase isn't nearly as large as the increased number of cases - by which I mean in actual nominal terms, not just as a percentage that'll show up on a graph.  For example, since the last post the number of cases for Feb 27 went up from 43 to 71, but the number of tests only went up from 186 to 196. Those extra 28 cases didn't all come from the extra 10 tests reported, so there must be some lag between reporting a test and reporting a case. And this lag is greater than just a couple of days, because it was already more than two weeks since the test date when I posted the numbers in my last analysis.
  3. Early dates change a little, but later dates change a lot more, so the distortion increases the more recent the data are. For example, for testing performed on March 10th, the data available on March 19th showed 140 positive tests out of a total of 3019, for a positive percentage of under 5%. Today's numbers (27MAR2020) are revised up to 767 positive tests out of a total of 3352, for a positive percentage of just under 23%. Obviously most of those additional positive cases are coming from tests that had already been reported by the 19th (they're from that initial 3019).

The end result is to throw off cases/test, because the case report data isn't strongly correlated with the testing data. It would be ideal to get a total number of negative results to compare against. Either way, the number of cases/test doesn't appear to be going up, as we'd expect it to if disease spread were getting worse than it was in February. My confidence in this is low, because I didn't collect the data myself and the methodology is opaque. I do think that, given how much people are relying on number of cases to tell them something about the developing situation (and our exponentially increasing testing directly impacts that metric), a better assessment of the above data should be done. Preferably by someone with access to the source data.
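For anyone who wants to check my arithmetic on those revisions, here is a small sketch using only the four pairs of numbers quoted above.  The function name and the "earlier"/"later" labels are mine, and the last figure it computes is just a lower bound under the assumption that every newly reported test came back positive.  It isn't anything the CDC publishes.

```python
# (positive cases, total tests) for a given test date, as quoted in this
# post: first as reported at the time of the original post, then as
# reported in the update.
revisions = {
    "Feb 27": {"earlier": (43, 186), "later": (71, 196)},
    "Mar 10": {"earlier": (140, 3019), "later": (767, 3352)},
}


def revision_summary(earlier, later):
    """Compare two reports of (cases, tests) for the same test date."""
    cases_1, tests_1 = earlier
    cases_2, tests_2 = later
    added_cases = cases_2 - cases_1
    added_tests = tests_2 - tests_1
    return {
        "positivity_earlier": cases_1 / tests_1,
        "positivity_later": cases_2 / tests_2,
        "added_cases": added_cases,
        "added_tests": added_tests,
        # Even if every newly reported test were positive, at least this
        # many new cases must belong to tests that were already counted.
        "min_cases_from_old_tests": max(0, added_cases - added_tests),
    }


for test_date, reports in revisions.items():
    s = revision_summary(reports["earlier"], reports["later"])
    print(f"{test_date}: {s['positivity_earlier']:.1%} -> "
          f"{s['positivity_later']:.1%} positive, "
          f"+{s['added_cases']} cases vs +{s['added_tests']} tests, "
          f"at least {s['min_cases_from_old_tests']} cases from earlier tests")
```

The lower bound is the point of the exercise: cases keep getting attached to tests that were counted long before, so cases/test for any recent date is a moving target.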

