Sunday, July 19, 2020

Balancing Harms

Why our teenagers won’t be on their high school campus this fall.

Since early March, I’ve been watching the relentless spread of SARS-COV-2 throughout the United States with morbid fascination and a grim determination not to expose myself or my family to this novel virus that looked plenty dangerous even without its full effects being understood. Now that we are learning more about the long-term suffering and damage it can do with the COVID-19 disease it causes–in younger people as well as those my age and older–this determination has not wavered.1 There is nothing more important in my life right now than protecting my family and my own body from this virus.

We’ve been holed up on our rural property with only occasional trips to get curbside pickup or have open-air driveway visits with friends. During these long months of quarantine, I’ve combined my technical background in signal processing and programming with a long-standing interest in math and data modeling to get myself a uniquely clear view into the situation with the COVID-19 pandemic.2 Yes, that’s a bold claim, but there is a lot of work and I think some pretty informative results to back it up.

Modeling the Spread of COVID-19

This work was to develop a nonlinear mathematical model for the number of reported cases of the disease that fits remarkably closely to what we’ve seen for the past eleven weeks in the U.S. as the curve flattened and then started heading upward again.3 For my own personal interests, I’ve applied the model to reported case data from my own Washington State and the most populous county out here in Eastern Washington where I live. For some context, I’ve also used it to consider how badly things are going in neighboring Idaho, and to stare at the dumpster fires raging in states like Arizona, California, Florida, and Texas.

This plot shows what’s happened during May, June, and now half of July in the U.S. overall, and how well the model fits with the cases reported for each of the past 79 days. The data is generously provided to the public by the New York Times, based on reports from state and local health agencies.4 I hope you will understand from these plots why I dared to see myself as having having a uniquely clear view of our pandemic-documentary version of American Horror Story.

U.S. Reported COVID-19 cases vs days elapsed since 1/22

The upper subplot shows the number of reported cases for the entire U.S. since the first of May. There are two curves plotting cumulative case numbers on a logarithmic scale vs the number of days that have elapsed since the first case was reported in this country back on January 22.

The blue curve shows what was actually reported, according to the New York Times data. The red one shows what my model expects was reported for each day in the past, looking backwards from the most recent date (to which it is fixed as an anchor point) with its ten model parameters fitted to the data. The largest error between what the data and the model’s fit to it is just 2.2%, way back on May 11 when there was just over a third as many cases as now.5

Under the Hood

You can skip this section if you don’t love math. Your loss.

The skinny subplot below the big one shows the error between the model’s expectations and reality.6 What you want to see in such a residual plot is a relatively even distribution of modeling error vs the amount being modeled. This one looks about as good as you could ask for, especially when you consider the significance of normality p of 0.38

That means that the “leftovers” from modeling the past data are not much different from what you would get from normally-distributed random noise.7 Because it’s impossible to model noise, you can have some confidence that the model is accounting for most everything but randomness when it is nearly as probable as not that your residuals would look this random if it were indeed just noise causing the error.

It may be more instructive to consider the opposite case, if there were low p value for the non-normality statistic. Say, 0.02 instead of the actual 0.38. That low p value would indicate that re-running 50 experiments (obviously not possible with a natural experiment like the one we are running with worst pandemic in a century) would get you residuals that distinct from normal random error only once on average. That would be a pretty good indication that your model isn’t accounting for some noteworthy phenomenon.8 But that’s definitely not what’s happening here with the fit of this model to reported cases of COVID-19 in the United States, at least not as of July 16.

So my “logistic growth with multiple growth regimes model” is accounting for what we see in the data. It is a naive curve-fitting model that does not assume anything beyond the following:

  1. The number of reported cases of COVID-19 in the U.S. is following a logistic growth model with L (the ultimate upper limit) fixed at 1/4 of the U.S. population,9

  2. but with three separate growth regimes (3 parameters) having smooth transitions between them (4 parameters),

  3. and with a sinusoidal component that imposes a weekly variation (2 parameters) on the current growth rate for each day,

  4. plus, finally, with a fixed number of new cases per day (1 parameter), to allow the model to only account for reported cases on or after May 1.

The best-fit curve has an artificially high initial growth rate r1 of 4.8966 (nearly 500% per day!), which the differential evolution algorithm arrives at because it isn’t actually looking at numbers before May 1. It just wants to fit the data between May 1 and now as closely as possible, and it found the way to do that was to jack up the growth rate for all those unseen days. It’s doing it’s job, and that’s fine; we don’t care about that earlier growth rate for this analysis, just what is happening now.

Following the model forward,10 we soon transition to a relatively sedate growth rate r2 of 1%. The transition occurs over a 32-day window (e.g., the time it takes for half the smooth transition between growth rates) defined by s1 that is centered on Feb. 6, (16 days after the first U.S. case), defined by t1. Then the red-state governors reopen things followed by people living it up on Memorial Day weekend. And we wind up with a 5/30 midpoint between the second growth regime where the curve had been flattening nicely and our current scarier one r3 where SARS-COV-2 reminds us who it is with a 1.9% increase in cases per day.

Fortune Telling with Function Fitting

This model of mine is a naive empirical one, using a cool evolutionary algorithm to fit a curve to data. It’s a very elegant curve, constructed from a first-order differential equation with multiple growth rates and smooth transitions and weekly variation, though that doesn’t make it a basis for much extrapolation.

But it certainly does match up to what’s been happening. An error betwen modeled and actual values of 1% or less going back four weeks, and then 2.1% or less for another eight weeks–forever in pandemic time. That’s 79 data points fitted with just ten free parameters. As with a shoe, if the model fits, wear it.

It fits well enough that I will try out some extrapolation anyhow, despite having just acknowleged the limitations of the model for prophetical purposes. The following plot shows what the model is projecting for U.S. reported cases two weeks into the future.

U.S. Reported COVID-19 cases vs days elapsed since 1/22

The upper subplot of this plot shows how many reported U.S. cases the model projects under the assumption that the data will continue to reflect a 1.9% daily growth rate, with an 11% weekly variation imposed by reporting limitations over the weekends.11 That is of course not an entirely safe assumption to make, no matter how closely the model fits to past data, and I have mostly limited my plots to just a couple weeks of extrapolation.12

The bottom two subplots show the new cases being reported each day, first as a percentage of the cumulative cases already reported as of that day and then (bottom subplot) as a simple number of new daily cases.

We have gotten jaded to the horror of this pandemic over the past several months, but take another look at that number in the upper right. It’s a big one: nearly five million people testing positive in August. And it’s increasing fast.

In the middle plot, we are seeing one fiftieth more being infected with each passing day, with weekly variation due to testing and people being off on weekends. And on the bottom plot, sixty thousand plus or minus new cases each and every day, all carrying the risk of a person losing their health and vigor for weeks if not months, in some possibly for a lifetime. Occasionally it’s a lifetime that the virus cuts short.

Yet people are complaining about wearing masks when they go in a store.

Dr. Anthony Fauci expressed the belief (or perhaps just hope) that the number of daily reported cases would never reach 100,000.13 I fully accept that Dr. Fauci has a wealth of knowledge and insight that is not reflected in my naive curve-fitting to the time-series data. But from what I’m seeing on that bottom red curve, it’s hard for me to see how we can avoid that grim milestone.

To do so, we would need a significantly lower daily growth rate during the coming weeks. It would have to go down enough to cause the value of parameter r3 to decrease, perhaps enough to justify yet another growth regime in the model and an additional three parameters. There is of course no way a naive model builder can know that in advance; this is nonlinear curve fitting with a modest and limited amount of extrapolation, not prophecy.

Track Record

This essay will not dwell on the model’s track record; I’ve done plenty of that in previous blog posts. I’ll just offer a couple of observations from back-testing the model, along with a long footnote about some reddit critics, and leave it at that.14

With data from the day before yesterday (7/16), the model projected that today’s New York Times cumulative cases number would have been 3,738,827, an increase of 149,338 over the two days. It was 3,719,110, an actual increase of 129,617. The model was pessimistic by 15%, or 7% off per day. It’s a naive curve-fitting model, and does not inform us whether this is because the weekly variation is increasing or the growth rate is settling back down, or there was just quite a bit of random variation in one direction.

With data from seven days ago (7/11), the model projected that today’s New York Times cumulative cases number would have been 3,703,746, an increase of 443,181 over the week. Again, it was 3,719,110 today. That error is too small to bother even worth trying to calculate. The projection was essentially perfect one week out.

State and Local

My own state of Washington was doing pretty well with this virus up until early June. Now things aren’t looking great at all, with not just the number of daily cases increasing each day, but also the rate of growth in daily cases. Here is what I’ve been calling the “Nightmare Plot” with the full set of information about the model’s fit to Washington’s reported case data along with a couple weeks of extrapolation into the end of July.

Washington State cases vs days elapsed since 1/22

I don’t actually expect that the long-term growth rate for reported COVID-19 cases in Washington State will settle at an absurd 33% per day, despite the model’s best fit assigning that value (0.32583) to the parameter r2. Something’s gotta give long before that happens, because no society can sustain having its population infected with a deadly virus at a reported rate that increases each day by a third of the total number of cases reported thus far.

The reason the curve fit can get away with such a high estimation of r2 is that it paired it with a very large value of t1, 271.65. That corresponds to an interim point of 272 days after January 22 that lies midway between the initial growth rate (nearly zero) and that crazy high final growth rate of nearly 33%. That’s October 20. I don’t believe things will continue increasing in Washington state until then, and neither should you. But it is useful to know that the best fit for the model parameters right now is one that projects a lot more cases, and a continued increase in how fast those cases are coming, for months ahead. That’s what would happen if the virus were allowed to progress as it has.

It can’t, of course. We won’t allow it to, whatever our politics or petty objections to wearing something on our face to protect other people. You can see why that growth rate will have to go back down–one way or the other–by humoring me with an extrapolation of the Washington State model through the end of August, when school is scheduled to start. This plot shows the percentage of Washington’s population that the model expects will have tested positive for COVID-19 on each day.

Percentage testing positive in Washington State vs days since 1/22

According to the model, cranking away on the data in its ignorance about anything people might do in response to the situation, or about whatever limitations there are on how many tests can be processed in a given day or week, we would be seeing 2% of the people in the entire state testing positive on August 25. That’s right around the time the kids would be heading back to campus.15

Assuming the continued accuracy of CDC Director Robert Redfield’s June 29 assessment that there are ten times as many actual cases as reported ones, one in every five citizens of Washington would have actually contracted COVID-19. Long before that, though, you would see masks everywhere even in the rural red eastern half of the state and Karen would finally shut the fuck up already. It’s actually started happening out here now, after months of opposition and denial.

The curve will flatten again, inevitably. This is the one bit of armchair epidemiology I will dare to offer, if you don’t also count my assigning 1/4 of the region’s population to model parameter L. The rest of my work is just looking carefully at data with a highly refined non-linear model that has reflected that data really well thus far and is pretty good at looking into the future a few days, perhaps even a week or two.16

Here’s the Nightmare Plot for Spokane County, by far the most populous one in Eastern Washington:

Spokane County (Eastern WA) cases vs days elapsed since 2/25

In this instance, we have the extreme periodic behavior of zero cases being reported each of the past two Saturdays (including today). But the model isn’t fooled; it mostly accounts for the periodicity by evolving its parameter aw to an unusually high value of 83% (0.83329). The residuals are fine, as normally distributed as would be expected from random Gaussian errors nearly half the time. And thus it can project with some confidence there will be well over a hundred newly reported cases this coming Tuesday and also on Wednesday. I am pretty confident that we (I live in a surrounding county, but Spokane County looms large) will be seeing four thousand reported cases (in a county of just over half a million) by early August.17

Finally, before going into the heavy parental decision that all this data was in service of, I will offer a Nightmare Plot for neighboring Kootenai County, Idaho. I expect the number of reported cases there to double in the next two weeks.

Kootenai County (North Idaho) cases vs days elapsed since 3/21

Hell No, They Won’t Go

I’ll repeat again that my only relevant expertise is in applying nonlinear mathematical models to time-series data.18 But, alongside my wife, I am still am entrusted with a decision that will affect some young people’s lives. Using this limited area of expertise along with a very comprehensive collection of data, we have decided that our high school kids will not be attending campus in person this fall, regardless of what precautions the school administrators take or what requirements they have. We just won’t do it.

How Many is Too Much?

The metric I have been using to assess how serious things are is 1% of the local or national population testing positive for COVID-19. Again relying on the view of CDC Director Robert Redfield (whose agency is now being shunned by the deranged racist narcissist in the White House) that there are ten times as many actual cases as reported,19 this equates to one tenth of the population that has thus far been infected.20

This 1% threshold has already been reached nationally, around July 12. My modeling of Washington State, Spokane County (WA), and Kootenai County (ID) for the weeks ahead makes me believe that it will also be reached locally by the time my kids would be asked to return to campus this fall. Using Dr. Redfield’s 10:1 actual vs reported estimate, every tenth person will have been infected in the region around my kids’ high school.

Think about that number for a moment. One out of every ten residents of the most populous county in Eastern Washington, or indeed the entire state–will have had that virus growing inside their bodies. Imagine one finger on your two hands held up in front of you being a random person in your community who is or has been a host for SARS-COV-2.21

How Many Contagious People Around?

The thought of 10% of the national or local population having contracted COVID-19 is pretty scary, but how many of them will actually pose a danger to my kids and my wife and me? This is a tough question to ask, and the weakest link in my analysis. My wife and I had an important decision to make, and we’re pretty much on our own in the failed state that America has become, and all I have to work with is the reported cases plus whatever assumption I put on top of them.

Let’s first consider the time window of when people can spread the virus to others. In an article published in March and updated July 2, The Harvard Medical School says the

time from exposure to symptom onset (known as the incubation period) is thought to be three to 14 days, though symptoms typically appear within four or five days after exposure.

We know that a person with COVID-19 may be contagious 48 to 72 hours before starting to experience symptoms. Emerging research suggests that people may actually be most likely to spread the virus to others during the 48 hours before they start to experience symptoms.

The article provides little guidance about when infectiousness might end. “Most symptomatic carriers “will no longer be contagious by 10 days after symptoms resolve,” is the best I can find, wondering if this also applies to people whose symptoms last for weeks.

Assembling all these vague numbers together, I wind up with the assumption of a fuzzy, ill-defined “contagion window” extending out to ten days after exposure. I have no idea what the shape of that distribution would look like. Are there lots of people who you still wouldn’t want to be around more than ten days after they were exposed, or just a few? But to keep things simple, to limit the number of people testing positive who I will consider infectious, and to not be quite so alarmist, I’ll assume that the window extends from the exposure date (realistically, it’s probably the day after exposure at the earliest) to ten days after.

So when did all the thousands of people who tested positive on a given day actually get exposed to the virus and start that (highly uncertain) ten-day contagion window?

Unfortunately, the delay from when a person gets exposed until their exposure results in a reported case is variable, long, and may be getting longer as backlogs of tests pile up. In a scathing April 4 critique of the very kind of analysis I’ve attempted to do–by a person who knows a thing or two about modeling data–Nate Silver said he assumes

that there’s a delay of 15 days . . . between infection and the test results showing up in the data though if anything I suspect this is too generous, given the huge testing bottlenecks in places such as California.

I’ll go with his 15 day estimate. In doing so, I am mindful of his warning that the “number of reported COVID-19 cases is not a very useful indicator of anything unless you also know something about how tests are being conducted.” But I will go ahead and make the terribly simplistic yet perhaps still useful assumption that (1) the exposure resulting in a given day’s new daily cases occurred 15 days before that day, and (2) the window in which all those infected people were infectious to others was from 5-15 days before they became a reported case.

This means that you have to look forward 5-15 days along the red projected-cases curve to see how many people around you are infectious on any given day. What the projection does is to give you an idea as to how many of those as-yet uncounted people are out there being contagious right now.

In the U.S. overall, my model is projecting that we will go from yesterday’s 3.7 million reported cases (none of whom are still in that 5-15 day window) to around 4.8 million 15 days from now.22 That’s 1.1 million additional people testing positive, with most of them in that contagion window right now. I’m further assuming, with Dr. Redfield, that there are ten times as many people actually infected on each given day as what the reported case data shows, which means there are maybe 8-10 million Americans you really don’t want to be around at this time.

Nationally, with our current growth rate and my shaky assumptions, it appears that there are now three infectious people for every reported case. That’s a lot of virus walking around.

How about for Washington State? Simply multiplying by three the 2.2% figure I dared to extrapolate above for Washington on 8/27 would result in around 7% of the population being infectious then. That’s a lot. I’m skeptical of it, too.23 So let’s (irresponsibly) project out the actual number of reported cases and then do the math like we just did for the U.S.

Ignorant of the likelihood that the curve will flatten between now and then, the model projects that 1% of Washington’s population will be testing positive on August 7, which is comparable to the situation now in the U.S. overall.24 See the “Percentage testing positive” plot in the section above. Re-doing the plot with cumulative case numbers rather than percentages looks like this:

“No-flattening” long-term projection for reported cases in Washington State

To do a SWAG25 for the percentage of Washington that is infectious on 8/7 with its projected 78,000 reported cases or so, let’s look forward 15 days to when the model has a business-as-usual projection of around 133,000 reported cases. That’s 55,000 additional cases. If most of those new cases appearing on 8/22 are infectious on 8/7, and if the 10x multipler for actual vs reported cases holds true, that’s perhaps nearly half a million Washingtonians capable of giving us COVID-19.

So, when it projects that 1% of my state’s citizens will be testing positive, assuming the virus is still on the rampage, my model tells me that around 6% of the state population will be capable of infecting me if I get too close. As I said before, I doubt if it will be growing as fast then as the model currently projects, but even half that would be too much.26

And of course I could not resist doing the same irresponsible extrapolation for Spokane County:

“No-flattening” long-term projection for reported cases in Spokane County

The model projects around 2,200 new reported cases from 8/7 to 8/22. Ten times that would be around 4% of the county population actually infected, with most of them infectious to others.27

What’s So Bad About 3%?

Imagine that there is indeed a 3% probability of any randomly selected person from the population around you being able to infect you with COVID-19 by coughing, talking, or even just breathing.

If you (or your school kid) encounters just 30 people randomly chosen from the population in a given day, you have only a 40% probability of avoiding any encounters with an infectious COVID-19 carrier. If you mingle in society and encounter a different set of people each day, your probability of avoiding proximity to a COVID-19 carrier go down dramatically each day. After a week, it’s practically impossible to remain free of any such encounters.

Hopefully those encounters are brief and separated by at least six feet (more space is better), with masks on everybody unless outdoors. That doesn’t sound like any sort of high school experience my kids would have this fall, whether attending under reasonable conditions or staying home. No dances, hugs, high fives, being the class clown or acerbic wit, band, etc.

So they will just stay home regardless and wait for our national folly to play itself out. Hopefully before or at least by the time we reach 5% of Americans reported as infected and probably half actually, we will have some herd immunity going and Spring Quarter will look safe.28

If it reopens, each of the students attending our kids’ high school will have a terrifyingly high probability of being exposed at some point through their day to someone infected by COVID-19. This presents an individual and family risk that my wife and I will not ask our kids to bear.

The Ethical Dimension

There is an ethical dimension to this as well. Admittedly, it is a small and secondary part of my considerations because, like you, I consider my health and that of my own family of paramount importance.

But here it is: Do I want to participate in an activity whose existence poses a serious health risk to a teacher, janitor, teacher’s assistant, or administrative employee? A person already underpaid and unappreciated who probably feels compelled to enter that building full of people whose skills in risk assessment and decision-making will not fully develop for a few years yet?

That person may be decades older than the students talking loudly in their classrooms29 or the kids whose bathrooms they are tasked with cleaning. And they may have little choice but to put their bodies at risk for simple economic survival, including, ironically, being able to keep the very health insurance they rely on to keep from going bankrupt if they get sick. They don’t have the luxury my kids have (though they hate it) of sitting home with both parents there.

It is an upside of having older parents that perhaps balances the increased risk we have, but worthless if we don’t make use of it. So we will continue staying home and staying vigilant, and thus deny this virus one small set of hosts to travel with.

Ironically, the parents that this effects most may be teachers with kids in school themselves.30 People in other professions and trades have to figure something out for their kids to be home for two months every summer here in most places in the U.S. But teachers have long enjoyed the perk of summers off, and so haven’t needed to make child care plans for the summers. When they go back to work, their kids go back to school. Well, maybe not this time, if they are in a reopening school district and decide not to subject their kids (and thus, indirectly, themselves) to the risk of infection.

The Decision: A Balance of Harms

Our decision will affect our kids’ future. There are serious consequences to them either way. On one side, this avoids subjecting them to an unacceptably high risk of catching a virus whose likelihood of causing months of illness and disability, long-term damage, and even death is only now being appreciated.31 Or (especially regarding the possibility of death) of passing it on to the two people they love most on earth.

On the other side is the sober realization that these teenagers of ours are going to miss out on much of the activities and flirtations and friendship intimacies of the years we remember from when we were them.32 There’s just no substitute for that experience in “remote learning,” whose name it seems to me refers as much to the likelihood of learning occurring as to the physical distance.

This decision is a balance of harms. It was difficult to make, but allowing our kids back onto their high school campus this fall imposes an individual and family risk on them that they cannot be asked to bear.

There are three times as many people who have been infected across the U.S. as there were at the beginning of May. Now the spread of the virus is growing twice as fast as it was then. Yet the general public and its elected officials (far more so in the GOP and its Dear Leader but to some extent true of both parties) have been acting like re-opening everything is simply inevitable no matter how many refrigerated morgue trucks a hospital needs to have stationed outside, no matter how many millions of people young and old wind up with permanent damage to their lungs, kidneys, hearts, and even their brains.33 No matter how many weeks of suffering many millions have to endure, each successive day growing with the fear and dread of being one of the unlucky long-term sick. Nope, we have to get things re-opened again, no matter what.

Again, I recognize that my wife and I are privileged in not having to leave the house every day for work. But the fact that in-person reopenings of various types are pretty much economic necessities in our dog-eat-dog unfettered capitalism does not make me want to participate in them if I can help it. That’s mostly from a desire for preservation of self and family, but to a small extent to not participate in the mass delusion that everything will be OK.

Notes


  1. This essay isn’t intended to be a collection of scary links, but it’s worthwhile to consider the view of an ICU doctor last week that his patients have gone from having an average age of around 65 to “between 25 to 35, 45 years old” (“Miami Hospital ICU Doctor: New Influx Of Patients Is Younger Than Before,” NPR, July 13

  2. The technical details of the modeling are covered later in this essay. I had spent much of my Python coding time in the past two years working on a free, open-source software project that models power MOSFET devices using Python and a freely available underlying simulation tool called Ngspice. Then a deadly pandemic happened, my family went into a no-nonsense quarantine for months, and frankly it became a bit difficult to concentrate on something as removed from practical life as that. Instead, I took the same nonlinear modeling tools I developed for the simulation project and applied them to time-series data on reported COVID-19 cases. This essay with its plots is the culmination of that effort. 

  3. The model is implemented in an example file covid19.py that is part of my ADE Python package. It’s free software; with the right software skills, you can install it and try it out for yourself. 

  4. Earlier I was using data from The Johns Hopkins University, but have switched to the New York Times data both for the detail it offers as well as its simple and open licensing terms, which are co-extensive with the Creative Commons Attribution-NonCommercial 4.0 International license, plus this:

    “In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.”

    “If you use this data, you must attribute it to ‘The New York Times’ in any publication. If you would like a more expanded description of the data, you could say ‘Data from The New York Times, based on reports from state and local health agencies.’” 

  5. Some of the states and counties I’ve looked at fit well enough to their datasets without all ten parameters used in the national model, and thus have simpler models. Washington State, Spokane County, and Kootenai County have only two growth regimes, seven parameters instead of ten. The AICc metric was used to determine what parameters “earned their keep” and remained in the model. They only do if they result in a lower (better) AICc score. 

  6. A square-root transform is applied to both the modeled and actual new-case numbers to arrive at the residuals. Then the sum of squared error (SSE) is computed by squaring each residual value and adding them up. The purpose of the transform is to lessen the disproportionate impact of later, higher numbers on the curve fit. 

  7. The “normal” or Gaussian probability distribution describes many events in nature, from the noise you hear on a radio to the variation in people’s heights. It is what you eventually wind up with when you look at enough related phenomena together, even if the underlying probability distributions of each one are not normal. 

  8. For some reason, a p of 0.05 is the standard for many statistical tests. So, on that basis, the residuals are well within the acceptable range of normality. 

  9. This value of L is the largest number of people the state/​county population expected to be reported as infected. According to Wikipedia, the lowest estimate for COVID-19 herd immunity of actual (not just reported) cases is 50%.

    But there will always be more actual cases than reported. As mentioned in the essay, on June 25 CDC director Robert Redfield indicated there are 10 times as many actual cases as reported. That would reduce a 50% herd-immunity value of L to 0.05 times population. But the 10x ratio could change as the number of cases increases and testing could increase to the point where nearly every American has been tested.

    To be conservative about the modeling and hopefully avoid inserting too many shaky assumptions into it, I’ve fixed L at 0.25 times the region’s population. This limits the increase in reported cases (regardless of the growth regime) as the number approaches 1/4 of the state or county’s total population.

    It actually doesn’t matter much. At this point in the pandemic, L has almost no impact yet; all of the regions I’ve looked at still have reported case numbers that are single-digit percentages of their populations. 

  10. The curve fitting algorithm actually follows the model backward from the latest day’s data, to which it fixed its cumulative case result. It does this by backwards-integrating a first-order differential equation xd(t,x) whose gory details are shown in the upper left part of the first plot. Applied mathematics in all its glory, and it works. 

  11. The exact parameters are shown in the lower right of the first set of subplots, with the third growth regime’s daily rate r3 equal to 0.18863 and the weekly variation aw equal to 0.11354. 

  12. Extrapolating a nonlinear model is not something to be done lightly even if the underlying phenomena are well understood and predictable. In the case of America’s COVID-19 pandemic, there are a couple of known unknowns (to paraphrase Donald Rumsfeld) worth mentioning in addition to all the unknown unknowns.

    The curve could flatten again if those states whose citizens and governors have not been taking this virus seriously might finally start to do so when enough of the people around them get sick and they see a lot of those people staying sick for a long time, or even dying. A second less comforting possibility is that we may soon see the effects of Trump’s stated desire not to have the reported numbers increase quite so fast (“Slow the testing down, please!”), and of his latest move to keep data from reaching the CDC. Doesn’t it seem that a federal agency named “The Center for Disease Control and Prevention” should be informed about just how many cases there currently are of a pandemic affecting millions of Americans? 

  13. Norah O’Donnel: “Dr. Fauci, do you still think that we could reach 100,000 infections a day?” Fauci: “You know, Norah, I don’t think we will. I hope not. It is conceivable that if we don’t get good control over the current outbreak and we keep spreading into other regions of the country, we could reach 100,000” (CBS News, “Fauci says he doesn’t like ‘to be pitted against the president’ after multiple attacks from the White House,” July 16

  14. Posting about this modeling work of mine on the r/Coronavirus subreddit has resulted in a lot of page views, which has been appreciated since the goal of the writer is after all to be read, and this seems like something important to share. It also results in a few critical comments each time, some of which have actually been helpful in guiding me toward more statistically rigorous modeling. I am finally reminded of the public health official in Argentina who took me to task back in April about not providing any statistical tests about the model’s residuals. It was a valid criticism which I accepted and responded to, and the official gracefully acknowledged frustration and fears about the virus in their country.

    Some of the critical comments have just been exasperating snark from people who consider themselves far above something so naive as fitting data to a curve and daring to claim that the curve means something when it fits very well. A recent interloper gazed down from his high horse to proclaim that I am trying to model sampling error with my weekly-variation component. I tried unsuccessfully to point out that my effort here is to model reported cases, not to imagine some expertise about an unknown number of actual cases of infection out there, hidden in the unreported masses. If the time series of case reports has a periodicity to it, as it obviously does for the U.S. overall and also more than half the states and counties I’ve examined, then that is what I’m modeling more faithfully by adding a periodic component. I predict the reported cases will increase by not much more tomorrow–perhaps even less–than they did today, because the periodic component to the rate at which cases are reported reaches the trough of its wave tomorrow. That says nothing about how cases are “actually” increasing, nor do I ask it to.

    The same inquisitor muttered something about basis functions and suggested I look into learning about the Fourier series. (I’m an electrical engineer with expertise and a dozen patents related to signal processing; I’ve heard of the Fourier series, thanks.) How silly of me to think that there is anything special about fitting a ten-parameter combination of any functions to within 2.1% over 70+ data points. Why, he (we can all guess it was probably a “he,” can’t we?) could pull off that same trick with just about anything. I suggested it was absurd for him to claim that ten free parameters of a Fourier series (five terms with a real and imaginary component apiece, though I’d forgotten about the DC component) would accomplish the same thing, and he made some assertion that he managed it with seven, presumably seven components. No plots or code or anything was provided, of course. And even if he could, he would find the innate periodicity of the Fourier series bringing his case numbers right back to the simple linear increase of the zero-frequency DC component, back and forth in an absurd roller coaster totally unrelated to the problem at hand as his modeled cases come in faster and then slower and then faster again.

    I’ll post this on reddit like my other essays about this model. But this time I think I’ll pass on any further schooling from the boy geniuses there who remember a few textbook cases and fancy terminology but offer no relevant work of their own for comparison. 

  15. Update 7/23: If I kept updating these blog posts with each new day’s data, I’d never stop. But given how sensitive such a long extrapolation is to changes in recent data, I do feel compelled to offer this link to a plot updated with 7/22 data that shows 1.3% of the state testing positive on August 25 instead of 2%. There is just a hint of flattening showing up in the curve. 

  16. For the Washington State data, the model has been a bit less accurate than for the U.S. as a whole. With 7/16 data, it projected 48,001 reported cases today when there were actually 47,563, an error of 34% (16% per day) in the projected vs actual increase from 46,268.

    With data from seven days ago, it projected there would be 46,950 reported cases today. Again, there were 47,563. That’s an actual increase of 6,237, compared to a projected increase of 5,624. The error was around 11% for the week, or around 1.5% per day. 

  17. Admittedly, the model starts to deviate quite a bit from what happened when it looks back more than three weeks. It goes from a worst-case modeled vs actual error of 6.9% on 7/11 to 12.8% on 6/28, 14.5% a week earlier, and 21.9% a week before that. There was an obviously artificial glitch in test results being released around 6/27-6/28, which the model will only approach as it smooths its errors over the whole interval of interest.

    Update 7/23: A Nightmare Plot for Spokane County updated with 7/22 data is here. The model projects 4,000 reported cases on August 4. 

  18. I hope this modeling work gives people some insights about the situation we face here in the U.S. But please note this critical disclaimer: First, I disclaim everything that the New York Times does. I’m pretty sure their lawyers had good reason for putting that stuff in there, so I’m going to repeat it. Except think “Ed Suominen” when you are reading “The New York Times”:

    “The New York Times has made every effort to ensure the accuracy of the information. However, the database may contain typographic errors or inaccuracies and may not be complete or current at any given time. Licensees further agree to assume all liability for any claims that may arise from or relate in any way to their use of the database and to hold The New York Times Company harmless from any such claims.”

    Second, I know very little about biology, beyond a layman’s fascination with it and the way everything evolved. (Including this virus!) I do have some experience with modeling, including using the ADE package of which this COVID-19 modeling is a demonstration file to develop some really cool power semiconductor simulation software that I’ll be releasing in a month or so from when I’m doing the GitHub commit with this COVID-19 example. The software (also to be free and open-source!) has a sophisticated subcircuit model for power MOSFETs that evolves 40+ parameters (an unfathomably huge search space). And it does so with this very package whose example you are now reading.

    So–yes, this is still a disclaimer–I am not an expert in any of the actual realms of medicine, biology, etc. that we rely on for telling us what’s going on with this virus. I just know how to fit models to data, in this case a model that is well understood to apply to biological populations.

    Don’t even think of relying on this analysis or the results of it for any substantive action. If you really find it that important, then investigate for yourself the math, programming, and the theory behind my use of the math for this situation. Run the code, play with it, critique it, consider how well the model does or does not apply. Consider whether the limiting part of the curve might occur more drastically or sooner, thus making this not as big a deal. Listen to experts and the very real reasoning they may have for their own projections about how bad this could get.

    Seriously, experts know stuff. That’s what they do. One of them I recommend paying attention to is Dr. Osterholm at the University of Minnesota’s Center for Infectious Disease Research and Policy (CIDRAP). His interview in the July 1 podcast episode Viral Gravity is sobering but informative and, as we’ve seen in the nearly three weeks since, has been quite accurate about how serious the situation is. 

  19. Washington Post, “CDC chief says coronavirus cases may be 10 times higher than reported,” June 25

  20. There’s nothing magic about my 1% threshold for reported cases, i.e., 10% of the population actually infected. It is just easy to picture 1/10 of a population. That’s one of your digits from both hands. The Romans understood the power of that concept; “decimation” referred to the Roman Army’s practice of brutally killing every tenth man in a legion to terrorize and punish insubordination. 

  21. I have just the finger in mind when it comes to the “leadership” of our current president in this crisis, but that’s another blog post entirely. 

  22. Update 7/23: This projection is unchanged. Here is the Nightmare Plot for the U.S. overall, updated with data through 7/22. 

  23. Update 7/23: With today’s updated data and the 1.3% figure it now projects for 8/27, the 3:1 multiple would yield 4.2% of the population being infectious. Still more than the 3% figure discussed in the main text. 

  24. Update 7/23: Now projected as 0.92%. Close enough. 

  25. Acronym for “scientific wild-ass guess,” an old term of endearment used by engineers who know full well how many shaky assumptions are involved with many a technical endeavor. 

  26. Update 7/23: The projection updated with 7/22 data now looks like this: around 70,000 cases on 8/7, and around 94,000 on 8/22. The increase is projected to be 24,000 rather than 55,000. So, with all the other assumptions kept the same, we would be looking at 3.1% of Washingtonians capable of infecting you on 8/22. Not as much, but the discussion of 3% still applies. 

  27. Update 7/23: With 7/22 data, the “no-flattening” long-term projection for Spokane County is for around 2,000 new reported cases, just a bit less than previously projected. 

  28. There will hopefully be a vaccine early in 2021 but when and for whom? At the rate things are going now, I wouldn’t be surprised if we get to 50% infected before one becomes available for everyone in my family. 

  29. “[L]oud speech can emit thousands of oral fluid droplets per second . . . there is a substantial probability that normal speaking causes airborne virus transmission in confined environments.” Stadnytskyi, Bax, et al., “The airborne lifetime of small speech droplets and their potential importance in SARS-CoV-2 transmission,” Proceedings of the National Academy of Sciences Jun 2020

  30. Rebecca Martinson, a public school teacher in Washington State, explains her decision to not return to class, no matter what, in this thoughtful essay, “Please Don’t Make Me Risk Getting Covid-19 to Teach Your Child,” New York Times, July 18.

    She writes, “My school district and school haven’t ruled out asking us return to in-person teaching in the fall. As careful and proactive as the administration has been when it comes to exploring plans to return to the classroom, nothing I have heard reassures me that I can safely teach in person.” After listing off all the sacrifices teachers have had to make for their students and careers, she says that it “isn’t fair to ask me to be part of a massive, unnecessary science experiment. I am not a human research subject. I will not do it.” 

  31. The danger of this virus to young people has been downplayed. It is frustrating how little statistical data seems to be available on this, but there is a significant minority of even younger people who suffer for months with the aftermath of a Covid19 infection. As high a figure as 5% has been tossed around, but with little statistical support. That is understandable, given that 2/3 of the number of Americans reported as infected have lived through less than six weeks of their positive test result. The majority of “long haulers” have yet to be. 

  32. Admittedly, we never were quite like them. Unlike these poor kids, we never had to suffer the downers of a deranged narcissist bigot wannabe authoritarian as president, a deadly pandemic, and an emerging economic depression all at once. 

  33. “COVID-19: Severe brain damage possible even with mild symptoms,” Deutsche Welle