COVID-19 US Data - Inverse Correlation of Masks vs Cases

Cumulative, population-adjusted case data versus public mask-wearing percentage

Houston Haynes


Contrast and Compare

Wearing masks more often means fewer COVID-19 cases. It’s that simple, and it couldn’t be more clear in the data.

I started looking into US data with the latest cumulative counts of probable and confirmed COVID-19 cases and deaths compiled by the New York Times. Beyond the split between “probable” and “confirmed” cases (the latter backed by a positive laboratory test), I also wanted to look at the survey data collected on public mask wearing.

I had suspected that there would be an inverse correlation, but I didn’t expect it to show up this quickly or to be such a clear, stark result. I still have quite a bit of refinement work to do on presenting the report. I also want to start looking at time series data so that I can use a weighted average that matches the period when the mask-wearing surveys were conducted. Right now I’m simply taking the full cumulative count, but when mask guidelines change and compliance shifts, I want to be sure I also evaluate changes in case rate.
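
One way to put a number on the inverse correlation, rather than reading it off the maps alone, is a rank correlation between the share of respondents who always wear masks and cumulative cases per 100,000 residents. The sketch below is only an illustration: `toy_counties` and its values are invented, and Spearman's rho is an assumed choice of statistic, not something the reports currently compute.

```r
# Toy data for illustration only; these values are invented, not NYT figures.
toy_counties <- data.frame(
  fips       = c("00001", "00002", "00003", "00004", "00005"),
  always_pct = c(25, 40, 55, 70, 85),          # % answering "always"
  ncpht      = c(3100, 2900, 3000, 1500, 900)  # cumulative cases per 100,000
)

# Spearman's rank correlation compares the orderings of the two columns,
# so it does not assume the relationship is linear.
rho <- cor(toy_counties$always_pct, toy_counties$ncpht, method = "spearman")

# Negative values mean counties with more mask wearing tend to rank lower on cases.
print(rho)
```

With the real tables, the same call would run on the joined county data after dropping rows with a missing population.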

Animated GIF

I was so struck by the clear inverse correlation between masks and cases that I made an animated GIF with the contrasting charts cycling back and forth.

The Charts

After some edits and surfacing state labels for tooltips (which pop up when you hover over any of the counties in the chart), I re-rendered the reports here for more detailed review. Like the other reports that pull data from OWID and JHU, these will continue to shift as new cumulative case numbers are added. Likewise, when the mask-wearing surveys are updated, there will be new data there as well. Once I have a better view of the full COVID-19 time series data set, I’ll shift to a weighted average over the past few weeks or month, or match the case counts to the time range of the survey period. But for now this provides a valuable, if slow-moving, snapshot.
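
Matching case counts to the survey period could look roughly like the following. This is a sketch under stated assumptions: `toy_series` holds invented cumulative counts for two hypothetical counties, and `window_start` and `window_end` are placeholder dates that would need to be replaced with the actual survey dates.

```r
library(dplyr)

# Invented cumulative case counts for two hypothetical counties.
toy_series <- data.frame(
  fips       = rep(c("00001", "00002"), each = 4),
  date       = rep(as.Date(c("2020-07-01", "2020-07-05",
                             "2020-07-10", "2020-07-15")), times = 2),
  cases      = c(100, 130, 170, 220, 50, 55, 58, 60),
  population = rep(c(50000, 20000), each = 4)
)

# Placeholder window; replace with the real survey period.
window_start <- as.Date("2020-07-01")
window_end   <- as.Date("2020-07-15")

window_rates <- toy_series %>%
  filter(date >= window_start, date <= window_end) %>%
  arrange(fips, date) %>%
  group_by(fips) %>%
  summarise(
    # Counts are cumulative, so new cases = last value minus first value.
    new_cases  = last(cases) - first(cases),
    population = first(population),
    .groups    = "drop"
  ) %>%
  mutate(new_ncpht = new_cases / population * 100000)

print(window_rates)
```

The resulting `new_ncpht` column plays the same role as `ncpht` in the reports, but covers only the cases added inside the window.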

Exceptions Worth Noting

It should be noted that the NYT found a marginal gap between the actual level of mask wearing and the answers reported in the survey.

Researchers who hand-counted Wisconsin grocery shoppers in May and June found about 40 percent of shoppers wore masks, a level that is lower than the 45 percent who said they always wore masks in the recent Dynata sample (another 24 percent said they frequently wore masks).

They noted that it didn’t affect the result in aggregate. But it bears further scrutiny: as public officials place more emphasis on masks, the “virtue signal” of answering surveys a certain way may widen the gap with actual use.

Faces of COVID-19
The pandemic-related posts on this site are about more than data. Behind every number is a person, a family and a community. As reports are refreshed, new selections will also be chosen at random. To see how this is done, see this sidebar.

The code behind the reports

# begin setup code chunk
library(tidyverse)
library(highcharter)
library(widgetframe)
library(lubridate)

counties <- read.csv("../../data/us_county.csv", header = TRUE)

counties <-
  counties %>%
  mutate(fips = str_pad(fips, 5, side = 'left', pad = '0'))

county_pop <- select(counties, fips, state, population)

mask_URL <- 
  "https://raw.githubusercontent.com/nytimes/covid-19-data/master/mask-use/mask-use-by-county.csv"

mask_data <- read.csv(mask_URL)

mask_data <- 
  mask_data %>%
  mutate(COUNTYFP = str_pad(COUNTYFP, 5, side = 'left', pad = '0'))  %>%
  select(COUNTYFP, NEVER, RARELY, SOMETIMES, FREQUENTLY, ALWAYS) %>%
  mutate(always_pct = (ALWAYS * 100)) %>%
  rename(fips = COUNTYFP) %>%
  left_join(county_pop, by = "fips") %>%
  rename(location = state)

last_updated <- paste("Source: New York Times  -  Report Last Updated:"
                        ,  format(Sys.time(), "%a %b %d %Y %X"))

# end setup code chunk

widget <- hcmap("countries/us/us-all-all",
      data = mask_data,
      value = "always_pct",
      name = "Percentage that always wears masks in public",
      joinBy = "fips") %>%
  hc_colorAxis(minColor = "white", maxColor = "#32644F") %>% 
  hc_title(text = "Percentage of US Respondents Claiming to Always Wear Masks in Public"
            , align = "center") %>%
    hc_chart(
      borderColor = 'rgba(160, 160, 160, 0.3)',
      borderRadius = 8,
      borderWidth = 2) %>%
  # Tooltip shows the state on one line and the county value on the next
  hc_tooltip(pointFormat = "State: {point.location}<br>{point.name} County: {point.value}%") %>%
  hc_legend(layout = "vertical", verticalAlign = "top", align = "right", valueDecimals = 0) %>%
  hc_credits(enabled = TRUE, text = last_updated, position = list(align = "left", x = 10, y = -5))

frameWidget(widget, height = "100%", width = "40rem")

# begin setup code chunk
library(tidyverse)
library(highcharter)
library(widgetframe)
library(lubridate)

counties <- read.csv("../../data/us_county.csv", header = TRUE)

counties <-
  counties %>%
  mutate(fips = str_pad(fips, 5, side = 'left', pad = '0'))

county_pop <- select(counties, fips, state, population)

US_COVID_data <- 
  "https://raw.githubusercontent.com/nytimes/covid-19-data/master/live/us-counties.csv"

us_data <- read.csv(US_COVID_data)

us_data_map <- 
  us_data %>%
  filter(fips != "") %>%
  mutate(date = ymd(date)) %>%
  mutate(fips = str_pad(fips, 5, side = 'left', pad = '0')) %>%
  select(date, fips, state, county, cases, deaths)  %>%
  # Join on fips only; the state column already comes from the case data
  left_join(select(county_pop, fips, population), by = "fips") %>%
  arrange(fips, date) %>%
  mutate(ncpht = ((cases / population) * 100000)) %>%
  mutate(ndpht = ((deaths / population) * 100000)) %>%
  ungroup() %>%
  rename(location = state)

last_updated <- paste("Source: New York Times  -  Report Last Updated:"
                        ,  format(Sys.time(), "%a %b %d %Y %X"))

# end setup code chunk

# Case map: cumulative cases per 100,000 residents (ncpht) by county.
# The series name, title, and color scale here are assumptions.
widget <- hcmap("countries/us/us-all-all",
      data = us_data_map,
      value = "ncpht",
      name = "Cumulative cases per 100,000 residents",
      joinBy = "fips") %>%
  hc_colorAxis(minColor = "white", maxColor = "#32644F") %>% 
  hc_title(text = "Cumulative COVID-19 Cases per 100,000 Residents by US County"
            , align = "center") %>%
    hc_chart(
      borderColor = 'rgba(160, 160, 160, 0.3)',
      borderRadius = 8,
      borderWidth = 2) %>%
  # Tooltip shows the state on one line and the county value on the next
  hc_tooltip(pointFormat = "State: {point.location}<br>{point.name} County: {point.value:.0f} cases per 100,000") %>%
  hc_legend(layout = "vertical", verticalAlign = "top", align = "right", valueDecimals = 0) %>%
  hc_credits(enabled = TRUE, text = last_updated, position = list(align = "left", x = 10, y = -5))

frameWidget(widget, height = "100%", width = "40rem")
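
Both chunks depend on the county FIPS codes lining up between tables, so it can help to check for counties that fail to match before mapping. The following is a minimal sketch using invented stand-in tables (`toy_masks`, `toy_pop`); the codes and populations are made up for illustration and are not taken from the real files.

```r
library(dplyr)

# Invented stand-ins for the survey table and the county population file.
toy_masks <- data.frame(
  COUNTYFP = c(1L, 2L, 99999L),   # codes as they might arrive: unpadded integers
  ALWAYS   = c(0.44, 0.61, 0.50)
)
toy_pop <- data.frame(
  fips       = c("00001", "00002"),
  population = c(55000, 220000)
)

# Zero-pad to five characters so the keys match the population table.
toy_masks <- toy_masks %>%
  mutate(fips = sprintf("%05d", COUNTYFP))

joined <- left_join(toy_masks, toy_pop, by = "fips")

# Survey rows with no population match; these would show up as NA on the map.
unmatched <- anti_join(toy_masks, toy_pop, by = "fips")
print(unmatched)
```

An empty `unmatched` table would mean every survey county found a population row.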

Further analysis

As I mentioned above, the focus will eventually shift to the time series data, as opposed to the latest cumulative data files. There will be more detailed reporting as well. Right now the data looks bleak, but there is a silver lining here: following simple guidelines and hygiene protocols can limit the spread of infection and save lives.
