On the Correlation Between COVID Cases and Deaths

So, content warning: This is about COVID deaths.

Someone tweeted that case counts, correlated with Republican/Democratic voting patterns by county, showed that the GOP was literally killing its constituents. Someone else rebutted that cases aren’t correlated with deaths, because only about one in seven people who test positive die. But a proper check would compare cases per million with deaths per million across discrete areas. If these two numbers are strongly correlated, then that “one in seven” ratio would hold fairly consistently from one area to the next.

When I looked at that, though, I hit another snag. Below is a portion of Michigan’s county data, according to the New York Times. The second column of numbers is cases per 100K, the fourth is deaths per 100K, and the last is the full vaccination rate. The table is sorted by cases per 100K (highest to lowest), so if there were a strong correlation, the fourth column would descend fairly consistently as well. It doesn’t.

Source: The New York Times (screen capture from 8/28/21)
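
Rather than eyeballing whether a column descends, the comparison can be given a number. Here is a minimal sketch, assuming the two columns have been typed into plain Python lists. The values below are made-up placeholders, not the Times figures, and the helper names (pearson, ranks, spearman) are mine. Spearman's rank correlation fits the "is the column consistently descending?" question because it only looks at ordering.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical placeholder values, NOT the real county data.
cases_per_100k = [12000, 11500, 11000, 10500, 10000, 9500]
deaths_per_100k = [180, 250, 150, 240, 130, 200]

rho = spearman(cases_per_100k, deaths_per_100k)
print(f"Spearman rank correlation: {rho:.2f}")
```

A result near 1 would mean the deaths column falls almost in lockstep with the cases column; a result near 0 would mean the ordering of one says little about the ordering of the other.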

Eyeballing the table, there is clearly only a weak correlation between cases and deaths, but… the vaccination rate complicates things. Vaccines don’t prevent COVID outright; they reduce the risk of both getting infected and having serious symptoms. So in a highly vaccinated area, the ratio of “deaths” to “cases” should be lower than in a largely unvaccinated area. Complicating matters further, these are “all time” numbers, while the vaccination rate is “as of now”: for a good part of the time represented, the vaccination rate was 0%.

Meanwhile, how noisy human data looks depends on the population sizes involved: Shiawassee County’s population is far smaller than Oakland County’s, so its ratios are going to be less stable. (Unrelated example: Right now, Detroit Tiger Casey Mize, who has played in 31 MLB games, has a stunning on-base percentage of 0.500, far above average [Derek Jeter retired with an OBP of 0.377]. But that’s because Mize has had only two plate appearances and reached base once. How to lie with statistics.)
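
To make the small-sample point concrete, here is a rough sketch that puts an approximate 95% Wilson score interval around a proportion. Mize's one-for-two comes from the example above; the 1,885-for-5,000 line is a hypothetical long career at a 0.377 rate, invented for contrast and not anyone's actual totals. The function name wilson_interval is my own.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    )
    return center - half_width, center + half_width

# Mize: reached base once in two plate appearances.
small_sample = wilson_interval(1, 2)

# Hypothetical long career at a 0.377 rate (invented numbers for contrast).
large_sample = wilson_interval(1885, 5000)

for label, (low, high) in [("2 PA", small_sample), ("5000 PA", large_sample)]:
    print(f"{label}: plausible range {low:.3f} to {high:.3f}")
```

The two-appearance interval covers most of the range between 0 and 1, so a 0.500 says almost nothing, while the large-sample interval is only a few hundredths wide. The same logic applies to deaths per 100K in a small county versus a large one.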

I don’t have a solution to this; it was just an interesting reminder of how messy real-world mathematics can get.

Addendum: Looking naively at just the Oakland vs Wayne numbers, it certainly looks like vaccinations are effective: The counties have basically identical case counts (per 100K), but Oakland’s deaths are two-thirds of Wayne’s, while its vaccination rate is higher. However… Oakland also has better healthcare and a higher economic status, and racial disparities are a major factor too. So, yeah, complicated.

