COVID in India - A Perfect Storm?

Confounding Congregate Events and Strains of the Virus Straining Systems to the Breaking Point

Houston Haynes

12 minute read

My Cassandra Syndrome

From the early days of lockdown last year I was just as worried about a possible COVID-20 and COVID-21 as about the current COVID-19. When I reconnected with friends and former co-workers, many of whom work in public health and the study of infectious diseases, the collective sentiment was largely similar. A close friend and I puzzled over India, as he’s from there and was convinced that it would see an outsized infection rate due to its population density. But we were still puzzled why India’s big wave emerged so recently and so asymmetrically. Part of it is our shared suspicion that cases were under-reported, but we both thought there was more to the story.

Congregate Events

Taking the cue from my entries about contagion spreading after end-of-year congregate events in Canada and Israel, my friend supposed that the same would happen in India with the advent of the national Kumbh Festival. Even though many of those public events are held outdoors, my friend explained that some of the cultural norms involve a great deal of close contact over extended periods. So I decided to dive back into the “Our World In Data” repository and line up India’s infection data with the dates of the festival.

I fully expected each date to be followed by a “bump” in case rates similar to what occurred in Canada and Israel. Instead, the curve stays flat after most of them, and while under-reporting can influence that, it’s hard to believe it would flatten that many events. So what could be the reason for such a huge wave of cases? It’s entirely possible that there’s no correlation whatsoever, or that something tangential about the later events made them minor contributors. There are too many unknown unknowns for this to be an aha! moment. But with news of more virulent strains migrating to and gaining a footing in India I thought I’d take a look at genomics records to see if anything else emerges. So I dug into Nextstrain's data. Even if there weren’t any answers I was hoping it would lead to better questions.

This visualization is structured as a percentage chart where each variant is represented as a percent of total observations for a given month. The tooltip sort order re-shuffles the results to show the most prevalent variants for the month displayed. It’s been interesting to see how sample levels and back-dated sequence reporting have continued to shape the percentages for a given period. April is still “shifting” as May data continue to come in. As usual, correlation does not mean causation - and in systems like this even correlation can be elusive, but it’s certainly enough to justify spending a bit more time with the data.

Higher Level View

Looking at clade data from the same observation set shows a more succinct pattern. While it looks like May has some drop off in the partial month data, April shows a clear take-over of 20I/501Y.V1 (primarily due to B.1.1.7) and 21A (due to B.1.617.2) which are the two variants that are currently causing the most worry around the globe.

Many experts around the globe are watching this very closely, and like the variants and clades themselves, the information surrounding them will continue to develop.

News From the World

One of the main concerns surrounding a potential future “COVID-20” and “COVID-21” is the possibility of those variants reducing the effectiveness of vaccines or available therapies. With that, the contagion would continue to spread, and may eventually overtake once-thought-safe populations. And given the information from those who specialize in this area, certain strains of the virus not only seem more contagious, they also are showing some resistance to certain vaccines and demonstrate potential for accelerated spread.

During my time at TherapyEdge a significant portion of the company’s effort was on determining which strains of HIV were modulating within a given patient, which would impact the cocktail of ARV and related medications that made up an individual’s treatment plan. It was like playing whack-a-mole, and if SARS-CoV-2 exhibits similar traits it will be a long time before the healthcare and scientific communities can get in front of it.

Taking Stock

This isn’t the only report I had planned for this data, but given the recent press around Modi's COVID response, and the dramatic effect that had on COVID incidence rates, I thought some focused time with the data was warranted. Seeing population-leveled incidence rates is heart-wrenching enough, but in absolute terms the situation is truly horrific. This isn’t just a problem for India. It’s a problem for all of us, and while the scientific community has some clarity on this, I’m hopeful that governments and policy-makers will see beyond their borders and their own personal fortunes to do what’s right for the future of our shared world.

US Health Care Workers Lost to COVID-19
The pandemic-related posts on this site are about more than data. Behind every number is a person, a family and a community. As reports are refreshed, new selections will also be chosen at random. To see how this is done, see this sidebar.

The Code Behind the Charts

These charts are similar to many others on this site, with a few wrinkles in display and tooltip styling. I use highcharter, the R wrapper for the HighCharts JavaScript library.

Data for the first visual again uses the Our World In Data record set. Here it’s pared down to this year’s data and the date field is converted from text to a proper date type.

library(highcharter)
library(tidyverse)
library(lubridate)
library(widgetframe)
library(xts)
library(jsonlite)

URL <- "https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/owid-covid-data.csv"

coviddata <- read.csv(URL)

data <- select(coviddata, date, location, new_cases_smoothed_per_million) %>%
  filter(date >= "2021-01-01") %>%
  rename(value = new_cases_smoothed_per_million) %>%
  mutate(value = ifelse(is.na(value), 0, value)) %>%
  mutate(date = ymd(date)) 

data.wide <-  pivot_wider(data, names_from = location, values_from = value)

data_xts <- xts(data.wide[,-1], order.by = data.wide$date)
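
To make the reshaping step easier to follow, here is a minimal sketch of the same long-to-wide hand-off on a tiny, made-up sample. The numbers are invented for illustration and are not OWID values, and the `toy` names are hypothetical.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(xts)

# Toy stand-in for the OWID extract; values are invented for illustration only
toy <- tibble(
  date     = rep(c("2021-01-01", "2021-01-02"), times = 2),
  location = rep(c("India", "Canada"), each = 2),
  new_cases_smoothed_per_million = c(14.2, NA, 180.5, 175.1)
)

toy_long <- toy %>%
  rename(value = new_cases_smoothed_per_million) %>%
  mutate(value = ifelse(is.na(value), 0, value),  # same NA-to-zero rule as above
         date  = ymd(date))

# One row per date, one column per location
toy_wide <- pivot_wider(toy_long, names_from = location, values_from = value)

# xts wants a numeric matrix plus a time index
toy_xts <- xts(as.matrix(toy_wide[, -1]), order.by = toy_wide$date)

print(toy_xts)
```

One thing the toy data makes visible: replacing `NA` with 0 means a missing report is drawn as a zero rate, which may or may not be what you want for a given chart.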

Because this visual was about the general shape of the plotted area juxtaposed against the dates, I didn’t use the “Highstock” visual as I did with Canada and Israel. The largest block of code is the layout of plotted dates, showcasing how the HighCharts wrapper uses a “list of lists” approach to structuring the hand-off to JavaScript behind the scenes.

last_updated <- paste("Source: https://ourworldindata.org/  -  Report Last Updated:",  format(Sys.time(), "%a %b %d %Y %X"))

thm <- hc_theme_merge(
  hc_theme_ffx(),
  hc_theme(title = list(style = list(fontFamily = "Ubuntu")),
           subtitle = list(style = list(fontFamily = "Fira Code")),
           legend = list(itemStyle = list(fontFamily ='Ubuntu'))))

widget <- highchart(type = "chart") %>%
  hc_add_theme(thm) %>%
  hc_title(text = "India's Kumbh Festival Dates Compared To COVID Infection Rates (per Million)", align = "center") %>%
    hc_chart(
      borderColor = 'rgba(160, 160, 160, 0.3)',
      borderRadius = 8,
      borderWidth = 2,
      marginBottom = '80',
      marginTop = '60',
      marginLeft = '60',
      marginRight = '60') %>%
  hc_plotOptions(series = list(marker = list(enabled = FALSE))) %>%
  hc_credits(enabled = TRUE, 
             text = last_updated, 
             position = list(align = "left", x = 10, y = -5)) %>%
  hc_add_series(data_xts$India, 
                type = "area", 
                name = "India", 
                color = "#FF9933", 
                showInLegend = FALSE) %>%
  hc_tooltip(valueDecimals = 2) %>%
  hc_xAxis(type = "datetime",
    plotLines = list(list(
      label = list(text = "First Bath: Makar Sankranti"),
      color = "#138808",
      width = 3,
      value = datetime_to_timestamp(as.Date('2021-01-14', tz = 'UTC'))
    ),
    list(
      label = list(text = "Second Bath: Mauni Amavasya"),
      color = "#138808",
      width = 3,
      value = datetime_to_timestamp(as.Date('2021-02-11', tz = 'UTC'))
    ),
    list(
      label = list(text = "Third Bath: Vasant Panchami"),
      color = "#138808",
      width = 3,
      value = datetime_to_timestamp(as.Date('2021-02-16', tz = 'UTC'))
    ),
    list(
      label = list(text = "Fourth Bath: Ram Navami"),
      color = "#138808",
      width = 3,
      value = datetime_to_timestamp(as.Date('2021-04-21', tz = 'UTC'))
    ),
    list(
      label = list(text = "First Shahi Bath: Maha Shivratri"),
      color = "#138808",
      width = 3,
      value = datetime_to_timestamp(as.Date('2021-03-11', tz = 'UTC'))
    ),
    list(
      label = list(text = "Second Shahi Bath: Somvati Amavasya"),
      color = "#138808",
      width = 3,
      value = datetime_to_timestamp(as.Date('2021-04-12', tz = 'UTC'))
    )
    )) 

frameWidget(widget, width="100%", height="26rem")
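
Since the six plot lines differ only in label and date, the same list of lists could be generated from a small table rather than written out by hand. The sketch below is one possible way to do that, not the code used for the chart above. `festival_dates` and `make_plot_line` are hypothetical names, and the timestamp is computed in base R as milliseconds since the Unix epoch, which is the unit a Highcharts datetime axis works in.

```r
# Hypothetical lookup table of the dates plotted above
festival_dates <- data.frame(
  label = c("First Bath: Makar Sankranti",
            "Second Bath: Mauni Amavasya",
            "Third Bath: Vasant Panchami",
            "First Shahi Bath: Maha Shivratri",
            "Second Shahi Bath: Somvati Amavasya",
            "Fourth Bath: Ram Navami"),
  date = as.Date(c("2021-01-14", "2021-02-11", "2021-02-16",
                   "2021-03-11", "2021-04-12", "2021-04-21")),
  stringsAsFactors = FALSE
)

# Build one plotLine entry. A Date is stored as days since 1970-01-01,
# so days * 86400 * 1000 gives milliseconds since the epoch (midnight UTC).
make_plot_line <- function(label, date, color = "#138808", width = 3) {
  list(
    label = list(text = label),
    color = color,
    width = width,
    value = as.numeric(date) * 86400 * 1000
  )
}

# An unnamed list of lists, so it serialises as a JSON array of objects
plot_lines <- lapply(seq_len(nrow(festival_dates)), function(i) {
  make_plot_line(festival_dates$label[i], festival_dates$date[i])
})

str(plot_lines[[1]])
```

The result could then be passed along as `hc_xAxis(type = "datetime", plotLines = plot_lines)`, which keeps the chart definition short and the dates in one place.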

The visual built from the code below uses a Nextstrain data set, and while that can be generated from their Python/Anaconda project there’s a dependency on running the process in a Linux container (as I’m on Windows 10). I’ve done that kind of work for local dev instances that are eventually deployed in the Azure stack, but automating a headless Python/Conda process was eating too much time. I’m not sure if I’ll go through the process of making it work or if I’ll simply re-generate the file a few times per week. Time will tell…

Here you’ll see that I looked at countries that border India as well as India itself. I did some data exploration to see the “fan out” suggested by other columns that trace back to bordering countries. But I’ll need to do a more complete job of figuring out how to tie in relevant data, which seemed to be a bridge too far in this exercise. Note that I added a column with a default value of “1” so that roll-ups in the monthly aggregate could then be used by HighCharts to show a late-bound percentage value (see {point.percentage:,.2f} % in the bottom sample).

library(highcharter)
library(tidyverse)
library(lubridate)
library(widgetframe)
library(jsonlite)

asia <- "/repo/nextstrain-ncov/data/asia.csv"

asiadata <- read.csv(asia)

data <- select(asiadata, Collection.Data, Country, PANGO.Lineage, Clade) %>%
  filter(Country == "India" 
#         | Country == "Malaysia" 
#         | Country == "Bangladesh" 
#         | Country == "Nepal"
#         | Country == "Sri Lanka"
#         | Country == "Afghanistan"
#         | Country == "Pakistan"
#         | Country == "Maldives"
         ) %>%
  add_column(value = 1) %>%
  rename(Date = Collection.Data) %>%
  rename(Variant = PANGO.Lineage) %>%
  mutate(Date = mdy(Date)) %>%
  filter(Date >= "2021-01-01") %>% 
  select(Date, Variant, value) %>%
  group_by(Variant, Date = floor_date(Date, "month")) %>%
  summarise(value = sum(value))
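
As a small illustration of what the constant “1” column buys, the sketch below runs the same roll-up on a handful of invented records and compares it with `dplyr::count()`, which should give the same monthly totals. The sample rows are made up for the example and are not Nextstrain output.

```r
library(dplyr)
library(tibble)
library(lubridate)

# Invented sample records; lineages and dates are illustrative only
samples <- tibble(
  Date    = mdy(c("3/2/2021", "3/15/2021", "3/20/2021", "4/1/2021", "4/9/2021")),
  Variant = c("B.1.1.7", "B.1.1.7", "B.1.617.2", "B.1.617.2", "B.1.617.2")
)

# Approach used above: tag each sequence with 1, then sum within variant and month
monthly_sum <- samples %>%
  add_column(value = 1) %>%
  group_by(Variant, Date = floor_date(Date, "month")) %>%
  summarise(value = sum(value), .groups = "drop")

# Equivalent shortcut: count rows per variant and month
monthly_count <- samples %>%
  count(Variant, Date = floor_date(Date, "month"), name = "value")

print(monthly_sum)
```

Either form leaves one row per variant per month, which is the shape the stacked percentage chart needs.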

While the area spline and percentage charting make for a “busy” visual, the key really is the tooltip. Note how the HTML for building that mini-table is used to construct a “shared” tooltip, where mousing over any value will show the full range of values for every aggregate for that month. The late-binding of percentage-of-total has a definite “bucket brigade” vibe to it at first glance. But like most of these hybrid/polyglot code exercises it becomes the “new normal” fairly quickly. There’s quite a bit of lift that’s being carried by the HighCharts “smarts” and that’s part of the beauty in the beast.

thm <- hc_theme_merge(
  hc_theme_ffx(),
  hc_theme(title = list(style = list(fontFamily = "Ubuntu")),
           subtitle = list(style = list(fontFamily = "Fira Code")),
           legend = list(itemStyle = list(fontFamily ='Ubuntu'))))

last_updated <- paste("Source: https://nextstrain.org/sars-cov-2  -  Report Last Updated:",  format(Sys.time(), "%a %b %d %Y %X"))

widget <- hchart(data, "areaspline", hcaes(x = Date, y = value, group = Variant, labels = FALSE)) %>%
  hc_plotOptions(series = list(stacking = 'percent', marker = list(enabled = FALSE))) %>%
  hc_legend(enabled = FALSE) %>%
  hc_add_theme(thm) %>%
  hc_title(text = "India COVID Variants Percentage by Month (PANGO Lineage)", align = "center") %>%
  hc_yAxis(title = list(enabled = FALSE)) %>%
  hc_xAxis(title = list(enabled = FALSE)) %>%
  hc_chart(borderColor = 'rgba(160, 160, 160, 0.3)',
            borderRadius = 8,
            borderWidth = 2,
            marginBottom = '80',
            marginTop = '60',
            marginLeft = '60',
            marginRight = '60') %>%
  hc_legend(align = "right",
            verticalAlign = "top",
            layout = "vertical") %>%
  hc_tooltip(shared = TRUE, sort = TRUE, useHTML= TRUE,
              headerFormat = "<b>{point.key}</b><table>",
              pointFormat = "<tr><td style='color: {series.color}'>{series.name}: </td><td style='text-align: right'> {point.percentage:,.2f} %</td></tr>",
              footerFormat = "</table>") %>%
  hc_credits(enabled = TRUE, 
              text = last_updated, 
              position = list(align = "left", x = 10, y = -5))

frameWidget(widget, width = "100%", height = "26rem")
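
With percent stacking, HighCharts works out `point.percentage` in the browser as each series' share of the stacked total at a given x value. As a rough cross-check, the same shares can be computed in R. This is only a sketch on invented monthly counts, ordered from most to least prevalent as the tooltip is described above.

```r
library(dplyr)

# Invented monthly counts in the same shape as the aggregated data above
monthly <- tibble(
  Date    = as.Date(c("2021-04-01", "2021-04-01", "2021-04-01")),
  Variant = c("B.1.1.7", "B.1.617.2", "B.1"),
  value   = c(30, 50, 20)
)

# Share of each variant within its month, most prevalent first
shares <- monthly %>%
  group_by(Date) %>%
  mutate(percentage = 100 * value / sum(value)) %>%
  ungroup() %>%
  arrange(Date, desc(percentage))

print(shares)
```

Comparing a month of these figures against the tooltip is a quick way to confirm that the counts, rather than the chart options, are driving what is displayed.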

I’m seriously considering the lift in “breaking out” the Linux-focused Python tooling for Nextstrain. It’s both a way to keep my Python chops relatively sharp as well as lay some groundwork for more exploration of Nextstrain data in F#. Stay tuned!

Key Value
BuildDateTime 2021-07-03 10:22:03 -0700
LastGitUpdate 2021-06-23 23:36:17 -0700
GitHash 1512b77
CommitComment uipdated global NextStrain data