Model Description
To help our clients, and frankly all of us, navigate these uncertain times and better understand what the upcoming months are likely to bring, we have applied our data science expertise to estimate a structural model of the spread of the coronavirus. The aim of the model is to determine specifically how mobility and weather impact local transmission rates while controlling for population density, the population immunity rate and the fact that people are taking more precautions. In the weeks and days ahead, these two factors (mobility and weather) are likely to fluctuate substantially.
We have accessed a variety of publicly available data in order to estimate how people’s behavior and weather trends across the US impact the local transmission rates of COVID19. In particular, we are using:
- County-level daily time series data on cases and deaths from COVID19 [1]
- County-level daily time series data on people’s mobility immediately before and during the pandemic [2]
- County-level population and population density figures [3]
- State-level daily time series proxy data for the extent to which people are taking individual precautions [4], and
- County-level daily time series data on the maximum temperatures [5]
In this article, we take a moment to explain the main takeaways from that research and discuss some details of the estimated model. In addition to this blog, we have developed a web app that lets visitors see the national transmission rate and how their specific county is doing. The app can be found here. Note that the app is continually being updated and improved.
Four Main Takeaways
Population Density Matters
Population density varies greatly from one county to the next. On the high end, there are places like New York City (including the boroughs), with 37,000 residents per square mile. On the low end, you have places like Catron, New Mexico with 0.5 people per square mile. Places with lower population densities generally have lower transmission rates; the suspected reason is that people in less densely populated areas interact with fewer other people during a given week.
The chart below shows the directly estimated transmission rates [6] on March 7th in the US for the counties that had cases at that time. There are relatively few points on this chart because, at that time, there were not many counties in the US with cases. We chose March 7th for this chart because we want to show what the transmission rates were ‘in the wild’ before schools were closed and stay-at-home orders were issued. This chart demonstrates why we would expect a positive correlation between population density and local transmission rates: we can see something like a positive relationship before counties started responding.
That said, even in sparsely populated areas, people interact with other people. Even places as sparsely populated as Catron, New Mexico (with its 1 person per 2 square miles) have had a few cases of COVID19 and some local transmission.
The next chart shows our estimate of the local background transmission rates [7] for the 1,988 counties for which we have complete data, assuming no containment measures (no shelter in place) and no awareness of the disease (no masks, no hand washing). Notice that even in very sparsely populated counties, if cases arrive, we should expect the transmission rates to be above 1 unless actions are collectively taken to reduce the spread.
Note that the chart above shows model-estimated transmission rates, not observed transmission rates. The variation in the chart above is due to another factor in our model: temperature.
Warmer Temperatures Reduce the Transmission Rates
There’s been a lot of debate about whether (and, if so, to what extent) COVID19 will be impacted by weather [8] [9]. Our findings indicate that warmer temperatures have a small negative impact on the spread of COVID19. In particular, for every 1% increase in the temperature (measured in Kelvin), there’s a 2.7% decrease in the transmission rates. Since these temperatures are in Kelvin, a 1% increase is ~2.8°C or 5°F. So, as temperatures go from 45°F to 90°F, for example in southern United States climates, we estimate a decrease in the transmission rate [10] of 24%.
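The arithmetic behind the 45°F to 90°F example can be sketched as follows. This is a minimal illustration, not the code behind the model: the function names are hypothetical, and the 2.7 figure is simply the one quoted in the paragraph above (the Estimated Model section reports k = -2.38, which can be passed in instead). The sketch compares the "percent per 1%" rule of thumb with the compounding form a log-linear model would imply.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert degrees Fahrenheit to Kelvin."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


def rule_of_thumb_effect(low_f: float, high_f: float, pct_per_pct: float) -> float:
    """Percent change in transmission: (percent change in Kelvin) x (effect per 1%)."""
    k_low = fahrenheit_to_kelvin(low_f)
    k_high = fahrenheit_to_kelvin(high_f)
    pct_temp_change = (k_high - k_low) / k_low * 100.0
    return pct_per_pct * pct_temp_change


def log_linear_effect(low_f: float, high_f: float, elasticity: float) -> float:
    """Percent change in transmission if the effect compounds as (K_high / K_low) ** elasticity."""
    ratio = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)
    return (ratio ** elasticity - 1.0) * 100.0


QUOTED_EFFECT = -2.7  # figure quoted in the text; an input, not a re-estimate

print(round(rule_of_thumb_effect(45, 90, QUOTED_EFFECT), 1))  # roughly -24
print(round(log_linear_effect(45, 90, QUOTED_EFFECT), 1))     # somewhat smaller in magnitude
```

The rule of thumb reproduces the roughly 24% figure; the compounding form gives a smaller reduction for the same coefficient, which is worth keeping in mind when extrapolating across a 45°F range.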
Rejecting Alternative Hypotheses about Weather
There are two alternative hypotheses that some may argue account for the perceived correlation between temperature and transmission rates. First, that warmer counties in the US have other characteristics that are reducing the spread. And second, that Americans in every county have become more diligent at reducing the spread over time (as temperatures get warmer). Neither alternative [11] is supported by the data.
If we control for dates (by subtracting the mean transmission rate for that day from our observations) and consider only the deviation from the mean transmission rate on that day, we observe that warmer places generally have lower transmission rates, and this cannot be accounted for by changes in behavior over time. The chart above shows the slight (but significant) decrease in transmission rates we observe with warmer temperatures, after controlling for date.
If we control for counties in a similar fashion (by subtracting from our observations the mean transmission rate in each county) and consider only the deviation from the mean transmission rate in that county, we still observe that warmer temperatures generally go with lower transmission rates, and this cannot be accounted for by differences in characteristics from county to county.
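The two controls described above amount to subtracting a group mean (by date, or by county) from each observation. A minimal sketch of that step, using made-up toy observations and hypothetical field names rather than the actual data set:

```python
from collections import defaultdict
from statistics import mean


def demean_by(observations, key):
    """Return copies of the observations with 'rate_dev' = rate minus the mean rate of its group."""
    groups = defaultdict(list)
    for obs in observations:
        groups[obs[key]].append(obs["rate"])
    group_means = {group: mean(rates) for group, rates in groups.items()}
    return [{**obs, "rate_dev": obs["rate"] - group_means[obs[key]]} for obs in observations]


# Toy data (invented for illustration): county B is warmer than county A on both dates.
observations = [
    {"county": "A", "date": "2020-04-01", "temp_k": 285.0, "rate": 1.6},
    {"county": "B", "date": "2020-04-01", "temp_k": 295.0, "rate": 1.4},
    {"county": "A", "date": "2020-04-08", "temp_k": 288.0, "rate": 1.2},
    {"county": "B", "date": "2020-04-08", "temp_k": 298.0, "rate": 1.0},
]

by_date = demean_by(observations, "date")      # removes what is common to each day
by_county = demean_by(observations, "county")  # removes what is common to each county

for row in by_date:
    print(row["county"], row["date"], round(row["rate_dev"], 2))
```

In the toy data, the warmer county sits below the daily mean on both dates, and within each county the warmer week sits below that county's mean. Those are the two patterns the paragraphs above describe.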
While higher temperatures are helpful in reducing the spread, it is worth noting that, unless the population density is exceedingly low (less than 1 person per square mile), some preventative measures still need to be taken (if there are cases present) in order to stop exponential spread.
And unfortunately, the converse is also true: cooler temperatures are likely to increase the rate of spread. Our model anticipates that colder temperatures (in the fall, for instance) are likely to increase transmission rates, and as we go from 90°F to 45°F, the 24% reduction will be reversed. We should anticipate that more community effort will be needed to reduce the spread in the fall.
Staying at Home (or Sheltering in Place) Reduces the Transmission Rates
All of our staying at home, conducting remote meetings, home-schooling our kids and reducing our trips to retail outlets have had a meaningful beneficial impact on the transmission rates of COVID19.
First, let’s look at how much more we have been staying home. The chart below shows that for every measured county, people began staying home at least 10% more for at least some days. In some counties, like San Francisco County (in orange) and New York County (in brown), people stayed home around 30% more for five weeks.
Comparing directly measured transmission rates to the rates of residential mobility (which we use as a proxy for ‘staying home’) gives us the chart below. We observe a negative correlation between residential mobility and directly measured transmission rates. The green line represents the (population-weighted [12]) univariate correlation between residential mobility and the directly measured transmission rates.
We estimate that for each 1% increase in residential mobility there is a 2.7% decrease in the transmission rate. For example, in Houston (where Hahn Stats is headquartered), the population’s peak ‘residential mobility’ was around 1.22 (or 22% above a baseline of 1). We calculate that the collective reduction in mobility by the people of Harris County (the county Houston is in) reduced the transmission rate by about 57%.
The practical implication of this is that, as we reduce our restrictions, we should expect the transmission rate to increase substantially. So, for Harris County, if mobility returned to normal we should expect to undo the 57% decrease we have observed thus far, more than doubling the rate of transmission. Fortunately, however, even as restrictions have been relaxed, people are only gradually increasing mobility.
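The Harris County figures can be checked with a few lines of arithmetic. This is a sketch with hypothetical function names; the inputs (a peak residential mobility of 1.22, a 57% reduction, and per-point effects of 2.6 and 2.7) are the numbers quoted in this article, not new estimates.

```python
def rule_of_thumb_reduction(pct_more_at_home: float, pct_per_point: float) -> float:
    """Percent reduction in transmission under the 'X% per 1% of staying home' rule."""
    return pct_more_at_home * pct_per_point


def rebound_multiplier(reduction_pct: float) -> float:
    """Factor by which transmission rises if a percentage reduction is fully undone."""
    return 1.0 / (1.0 - reduction_pct / 100.0)


peak_residential = 1.22
pct_more_at_home = (peak_residential - 1.0) * 100.0  # about 22% above baseline

# The article mentions both 2.6 and 2.7 as the per-point effect.
for pct_per_point in (2.6, 2.7):
    print(pct_per_point, round(rule_of_thumb_reduction(pct_more_at_home, pct_per_point), 1))

# Undoing a 57% reduction means dividing by 0.43.
print(round(rebound_multiplier(57.0), 2))
```

A 57% reduction leaves 43% of the original rate, so returning to baseline multiplies the rate by roughly 2.3, which is where "more than doubling" comes from.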
The Public’s Response to Government Is Gradual and Incomplete
Considering the chart above (‘Staying Home during a Pandemic’), our populations do not adhere indefinitely to government shelter-in-place orders, nor do we respond immediately to the removal of restrictions. Notice that even though almost all counties were under restrictions to stay home in mid-April, people began venturing out more anyway. And, even though many places lifted restrictions in early May, we do not see a sudden decrease in people staying home, but a continuation of the gradual decline. The implication here is that local government mandates are not light switches that trigger an immediate public response, but one reason (among other reasons) that people weigh when determining their actions.
If the observation above holds more generally, the fact that government mandates are a ‘reason among reasons’ has two implications. First, the economy cannot be ‘turned on’ by a simple change in policy. The process will be gradual. We have learned that not only does it take time for businesses to respond to the new consumer situation, but consumers themselves seem to respond only gradually and partially to signals from the government. For your business, anticipating a specific ‘all-clear’ signal from the government, after which things will ‘return to normal’, is a mistake. Instead, adjustments to the ‘new normal’ should be made, with the anticipation of gradual improvement in the consumer economy.
Second, in the event that the situation worsens substantially in your local community, we should expect renewed shelter-in-place orders to be followed just as sluggishly. There is a danger associated with a sluggish response. In situations where growth is exponential, a week’s difference in response time can be the difference between a manageable situation and one that overwhelms local hospitals. From a practical standpoint, businesses should be prepared for both extremes. It’s possible there’s no resurgence in your county, and it’s possible your community’s health care system will be overwhelmed. If the public’s response could be flipped like a light switch, we could discount the latter possibility, but here we cannot.
Estimated Model
For the sake of being precise (and for those of you interested in estimating this for yourself) below is the equation for the model we are estimating.
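Written out from the variable descriptions in this section, the specification would look roughly like the following. This is a reconstruction rather than the authors' exact notation: the intercept \alpha, the error term \varepsilon_{c,t}, and the precise form of the weekly change in D are assumptions.

```latex
\ln\!\left(\frac{D_{c,t}}{D_{c,t-7}}\right)
  = \alpha
  + r \,\ln\bigl(\mathrm{residential}_{c,\,t-17}\bigr)
  + a \,\ln\bigl(\mathrm{adjDens}_{c,\,t-17}\bigr)
  + m \,\ln\bigl(\mathrm{Masks}_{s,\,t-17}\bigr)
  + k \,\ln\bigl(\mathrm{kMax}_{c,\,t-17}\bigr)
  + \varepsilon_{c,t}
```

Here c indexes counties, s indexes states, and t indexes days. Because both sides are in logs, each coefficient reads as an elasticity: the percent change in the growth of deaths for a 1% change in the predictor.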
Note the following. First, the dependent variable is the log weekly change in the death (D) rate at some point in time (t) and for some county (c). The independent variables are all lagged by 17 days, which accounts (roughly) for the length of time until death from COVID19. We are assuming that this week’s growth in death rates is due to factors impacting the transmission rates 17 days ago.
Here, ‘residential’ refers to the weekly average residential mobility for the 7-day period ending at t-17. We estimate r = -2.68 ± 0.16, which means a 1% increase in people staying home in the week ending 17 days ago leads to a 2.68% decrease in the growth of the death rate today.
‘adjDens’ is the immunity-adjusted population density. It’s the density of the community times the square [13] of the proportion of still-vulnerable people. We assume that for each observed COVID19 death, there are 100 people who are contemporaneously immune [14]. We estimate a = 0.08 ± 0.0054, meaning for each 1% increase in vulnerability-adjusted population density, there is a 0.08% increase in the transmission rate.
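A minimal sketch of the ‘adjDens’ calculation as described above. The function name, the example numbers, and the cap that keeps the immune count from exceeding the population are assumptions added for illustration; the factor of 100 immune people per death and the squared vulnerable share come from the text.

```python
def adjusted_density(
    density_per_sq_mile: float,
    population: float,
    cumulative_deaths: float,
    immune_per_death: float = 100.0,  # assumption stated in the text
) -> float:
    """Density times the square of the share of people still vulnerable."""
    # Cap at the population so the vulnerable share never goes negative (an added safeguard).
    immune = min(population, cumulative_deaths * immune_per_death)
    vulnerable_share = 1.0 - immune / population
    return density_per_sq_mile * vulnerable_share ** 2


# Hypothetical county: 1,000 people per square mile, 1,000,000 residents, 1,000 deaths.
# 1,000 deaths imply 100,000 immune, so 90% remain vulnerable and the density is scaled by 0.81.
print(adjusted_density(1_000.0, 1_000_000.0, 1_000.0))
```

With no deaths the adjusted density equals the raw density, and it shrinks toward zero as the implied immune share approaches the whole population.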
‘Masks’ is the cumulative [15] Google search interest (scaled above a pre-COVID baseline) observed on a state-by-state basis. It’s a proxy variable for the vigilance and other measures people are taking to reduce the spread, such as wearing masks, washing hands, not touching their faces or shaking other people’s hands, disinfecting shopping carts, and installing shields at retail locations. We use state-level Google Trends data, hence the s subscript. We estimate m = -0.73 ± 0.017, meaning that for each 1% increase in our indicator of other personal measures, we see a decrease in the transmission rate of 0.73%.
‘kMax’ is the 7-day rolling average of the maximum daily temperature in the county, measured in Kelvin. We estimate k = -2.38 ± 0.25, meaning that for each 1% increase in outdoor temperature (in Kelvin) we observe a decrease in the transmission rate of 2.38%.
Known Model Limitations
Our model only accounts for 75% of the variation in the number of deaths from COVID19 from one county to the next. There are a variety of possible reasons why there is more variation than we can account for. We enumerate the ones we have thought of here.
First, there are known factors our model does not account for, like super-spreaders [16], prison populations or other high-density locales within a county, humidity, school closure timetables, in-home population density, or access to health care. Each of these likely influences the transmission and death rates. The net effect of these missing variables is that our estimates for the included factors are probably estimates for an unknown combination of missing and included factors. If so, we should expect the actual impact of staying home (for instance) to be slightly less than the estimated effect. If we had to guess, we suspect that for each 1% increase in staying home, the true impact is a 2% [17] decrease in transmission, not 2.6%.
Second, our model (and, indeed, our data) does not distinguish between COVID19 strains, which may prove important in ways that are hard to hypothesize about.
Third, our model does not distinguish between locally transmitted cases and imported cases. Nor do we control for travel between counties and its reduction. Our model is blind to the impact of travel restrictions and/or air travel as a possible vector of transmission. However, in defense of the model, it is unclear whether it is reasonable to include travel in a model estimating local transmission rates.
And finally, our model is log-linear in all of its predictors. It’s entirely possible that temperature or awareness, for example, have a non-monotonic impact on transmission rates. If so, it is possible that we have underestimated the impact of temperature in temperature ranges of practical importance. And we may be overestimating the importance of continued growth in awareness.
In addition to missing variables that are likely important to explaining the spread, it is also worth noting that there is uncertainty about the actual transmission rate [18] at any given point in the past. We estimate the transmission rate by calculating week-over-week changes in the death rates (and shifting dates to when it is likely the transmission occurred). This itself is an estimate of the transmission rate, and one that can vary substantially from the ‘true’ transmission rate, especially in the early stages of an outbreak (when imported cases and missed cases can easily under-count the infected and skew transmission rates).
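The week-over-week estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the production calculation: the function names are hypothetical, the input is a plain list of daily death counts for one county, and each ratio is assigned to the day 17 days before the later week ends (the 17-day lag is the one used in the Estimated Model section).

```python
def rolling_mean(values, window=7):
    """Trailing mean over `window` days; None until enough history exists."""
    smoothed = []
    for i in range(len(values)):
        if i + 1 < window:
            smoothed.append(None)
        else:
            smoothed.append(sum(values[i + 1 - window : i + 1]) / window)
    return smoothed


def direct_transmission_rates(daily_deaths, lag_days=17, window=7):
    """Map the day index of presumed transmission to the week-over-week ratio of smoothed deaths."""
    smoothed = rolling_mean(daily_deaths, window)
    rates = {}
    for t in range(len(smoothed)):
        previous = t - 7
        transmission_day = t - lag_days
        if previous < 0 or transmission_day < 0:
            continue  # not enough history, or the presumed transmission predates the series
        if smoothed[previous] is None or smoothed[previous] == 0:
            continue  # ratio undefined without deaths in the earlier week
        rates[transmission_day] = smoothed[t] / smoothed[previous]
    return rates


# Toy series (invented): daily deaths double every week for four weeks.
toy_deaths = [2 ** (day // 7) for day in range(28)]
toy_rates = direct_transmission_rates(toy_deaths)
print(toy_rates)
```

For the toy series every ratio comes out as 2, matching the weekly doubling built into the data. With real counts, weeks with very few deaths make the ratio noisy or undefined, which is one source of the uncertainty noted above.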
What’s Next
This epidemic is likely to stay with us until we have vaccines and effective treatments. We will continue to update our models weekly and provide analysis. There are a variety of topics that we anticipate covering in the next few weeks that we were unable to put into this blog. For instance,
- What does our model say about the impact of micro-behavior adjustments [19] on transmission rates?
- How bad can transmission rates get this summer?
- As an individual, what is the personal risk of venturing out?
- How should we plan our events?
In the meantime, feel free to use our web app to get our current estimates on transmission rates.
[1] New York Times, “Coronavirus County Data US”; https://github.com/nytimes/covid-19-data Accessed: May 25, 2020
[2] Google LLC, “Google COVID-19 Community Mobility Reports”; https://www.google.com/covid19/mobility/ Accessed: May 25, 2020
[3] U.S. Census Bureau 2019. Population Estimates. Retrieved from Census API.
[4] Google LLC, “Google Trends”; https://trends.google.com/trends/explore?q=mask&geo=US
[5] National Oceanic and Atmospheric Administration, “Climate Data Online”; https://www.ncdc.noaa.gov/cdo-web/webservices/v2 Accessed: May 20, 2020
[6] The directly estimated transmission rate divides the rolling average number of deaths in a week by the rolling average number of deaths from the previous week. To account for the lag, the rolling average deaths recorded on April 2nd and April 9th determine the estimated transmission rate on the 15th of March, for instance.
[7] In this context, a local background transmission rate is the number of people a randomly selected infected person would infect over the course of a week if people did not social distance and were generally unaware of the virus. We do not call it R0 because, generally, R0 does not change with things like population immunity or the weather.
[8] “The Modest Impact of Weather and Air Pollution on COVID-19 Transmission”; https://projects.iq.harvard.edu/files/covid19/files/weather_and_covid-19_preprint.pdf
[9] “Effects of temperature and humidity on the spread of COVID-19: A systematic review.”; https://www.medrxiv.org/content/10.1101/2020.04.14.20064923v1
[10] With regard to temperature, our model has some clear drawbacks. In particular, we assume the impact is monotonic and multiplicative. There are good biological and sociological reasons to suspect that at certain temperatures, further increases or further reductions are immaterial. Alternatively, there could be an ‘ideal’ temperature for the spread. While we have considered models incorporating those changes, we have not yet found an alternative specification that justifies the additional parameters. Furthermore, we do not have and do not include humidity.
[11] Nor is the joint hypothesis (that it is a combination of warmer counties and changing behavior that accounts for the correlation) supported by the data.
[12] In general, we use log-population weighted observations, because counties with larger populations have both more cases and more reliable information on transmission rates. This was particularly true a month ago, when we started collecting our data and building our model. Now, the change in weights has little practical impact on the regression coefficients.
[13] We use the squared effect because immune people reduce transmission through two linear methods. First, they cannot get the virus, which reduces the average transmissibility of each interaction linearly. Second, they cannot spread the virus, which also independently reduces the average transmissibility linearly.
[14] We’re assuming a mortality rate of 1%. This probably is not accurate in places where hospital systems fail. The other parameter estimates are not particularly sensitive to this assumption.
[15] We use a cumulative figure, assuming that once people are converted to safer behaviors, they do not convert back quickly. Unfortunately for the model, the cumulative measure can only go up over time. We have investigated various rolling windows, only to find that, to date at least, the cumulative figure makes the most sense. It’s likely that there is impulse decay or a rolling window; however, the effective window is probably longer than the three months we have been dealing with the epidemic.
[16] “Modeling COVID-19 on a network: super-spreaders, testing and containment”
[17] Why 2% and not 2.6%? The impact on transmission of staying home should be squared: when people stay home there are fewer people out and about who can get sick and fewer sick people out of their homes transmitting the virus. The combined effect is squared, which, in a log-linear model, means that a 1% change in behavior leads to a 2% change in outcome. The other 0.6% could be due to kids staying home from school, for instance. ‘Staying home from school’ definitely rhymes with ‘sheltering in place’, but many young kids don’t have phones. So, their mobility goes unmeasured.
[18] In order to calculate the actual transmission rate, we would need enough testing and contact tracing to satisfy ourselves that virtually no cases were missed. Until we achieve something like that level of testing, the best we can do is estimate the transmission rate.
[19] Masks, handwashing, reduced face-touching, staying home when we are feeling sick