Sunday, July 19, 2020

Balancing Harms

Why our teenagers won’t be on their high school campus this fall.

Since early March, I’ve been watching the relentless spread of SARS-CoV-2 throughout the United States with morbid fascination and a grim determination not to expose myself or my family to a novel virus that looked plenty dangerous even before its full effects were understood. Now that we are learning more about the long-term suffering and damage it can do through COVID-19, the disease it causes–in younger people as well as those my age and older–this determination has not wavered.1 There is nothing more important in my life right now than protecting my family and my own body from this virus.

We’ve been holed up on our rural property with only occasional trips to get curbside pickup or have open-air driveway visits with friends. During these long months of quarantine, I’ve combined my technical background in signal processing and programming with a long-standing interest in math and data modeling to get myself a uniquely clear view of the COVID-19 pandemic.2 Yes, that’s a bold claim, but there is a lot of work–and, I think, some pretty informative results–to back it up.

Modeling the Spread of COVID-19

That work produced a nonlinear mathematical model for the number of reported cases of the disease, one that fits remarkably closely to what we’ve seen for the past eleven weeks in the U.S. as the curve flattened and then started heading upward again.3 For my own personal interest, I’ve applied the model to reported case data from my own Washington State and the most populous county out here in Eastern Washington, where I live. For some context, I’ve also used it to consider how badly things are going in neighboring Idaho, and to stare at the dumpster fires raging in states like Arizona, California, Florida, and Texas.

This plot shows what’s happened during May, June, and now half of July in the U.S. overall, and how well the model fits the cases reported for each of the past 79 days. The data is generously provided to the public by the New York Times, based on reports from state and local health agencies.4 I hope you will understand from these plots why I dared to see myself as having a uniquely clear view of our pandemic-documentary version of American Horror Story.

U.S. Reported COVID-19 cases vs days elapsed since 1/22

The upper subplot shows the number of reported cases for the entire U.S. since the first of May. There are two curves plotting cumulative case numbers on a logarithmic scale vs the number of days that have elapsed since the first case was reported in this country back on January 22.

The blue curve shows what was actually reported, according to the New York Times data. The red one shows what my model expects was reported for each day in the past, looking backwards from the most recent date (to which it is fixed as an anchor point) with its ten model parameters fitted to the data. The largest error between the data and the model’s fit to it is just 2.2%, way back on May 11, when there were just over a third as many cases as now.5

Under the Hood

You can skip this section if you don’t love math. Your loss.

The skinny subplot below the big one shows the error between the model’s expectations and reality.6 What you want to see in such a residual plot is a relatively even distribution of modeling error vs the amount being modeled. This one looks about as good as you could ask for, especially when you consider the normality-test p value of 0.38.

That means that the “leftovers” from modeling the past data are not much different from what you would get from normally-distributed random noise.7 Because it’s impossible to model noise, you can have some confidence that the model is accounting for almost everything but randomness: it is nearly as probable as not that your residuals would look this way if the error really were just noise.
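
For readers who want to try this kind of normality check themselves, here is a minimal sketch using scipy. The residuals below are synthetic placeholders, not the model’s actual residuals:

```python
import numpy as np
from scipy import stats

# Synthetic placeholder residuals; in the real analysis these would be
# the per-day differences between modeled and reported case numbers.
rng = np.random.default_rng(42)
residuals = rng.normal(loc=0.0, scale=1.0, size=79)

# D'Agostino-Pearson test of the null hypothesis that the residuals
# come from a normal distribution.
stat, p = stats.normaltest(residuals)

# A p value well above 0.05 means normality can't be rejected: the
# leftovers look like random noise, which is what you want to see.
print(f"normality p = {p:.2f}")
```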

It may be more instructive to consider the opposite case: a low p value for the non-normality statistic. Say, 0.02 instead of the actual 0.38. That low p value would indicate that re-running 50 experiments (obviously not possible with a natural experiment like the one we are running with the worst pandemic in a century) would get you residuals that distinct from normal random error only once, on average. That would be a pretty good indication that your model isn’t accounting for some noteworthy phenomenon.8 But that’s definitely not what’s happening here with the fit of this model to reported cases of COVID-19 in the United States, at least not as of July 16.

So my “logistic growth with multiple growth regimes model” is accounting for what we see in the data. It is a naive curve-fitting model that does not assume anything beyond the following:

  1. The number of reported cases of COVID-19 in the U.S. is following a logistic growth model with L (the ultimate upper limit) fixed at 1/4 of the U.S. population,9

  2. but with three separate growth regimes (3 parameters) having smooth transitions between them (4 parameters),

  3. and with a sinusoidal component that imposes a weekly variation (2 parameters) on the current growth rate for each day,

  4. plus, finally, with a fixed number of new cases per day (1 parameter), to allow the model to only account for reported cases on or after May 1.
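
The four assumptions above can be sketched in a few lines of Python. This is a toy illustration, not the actual implementation in covid19.py: the parameter names echo the ones discussed in this essay, but the exact transition function and every numeric value below are assumptions of mine.

```python
import numpy as np

def smooth_step(t, t0, s):
    """Logistic transition from 0 to 1, centered at day t0, width s days."""
    return 1.0 / (1.0 + np.exp(-(t - t0) / s))

def growth_rate(t, r1, r2, r3, t1, s1, t2, s2, aw, phase):
    """Daily growth rate on day t: three regimes (r1, r2, r3) with smooth
    transitions at t1 and t2, modulated by a weekly sinusoid of amplitude aw."""
    r = r1 + (r2 - r1) * smooth_step(t, t1, s1) + (r3 - r2) * smooth_step(t, t2, s2)
    return r * (1.0 + aw * np.sin(2 * np.pi * (t - phase) / 7.0))

def simulate(days, x0, L, b, *params):
    """Step a logistic growth model forward one day at a time:
    dx = r(t) * x * (1 - x/L) + b, with b a fixed number of new daily cases."""
    x = np.empty(days)
    x[0] = x0
    for k in range(1, days):
        r = growth_rate(k, *params)
        x[k] = x[k - 1] + r * x[k - 1] * (1.0 - x[k - 1] / L) + b
    return x

# Illustrative run with made-up values: L fixed at 1/4 of a population,
# plus the ten free parameters (r1, r2, r3, t1, s1, t2, s2, aw, phase, b).
curve = simulate(180, 1.0, 82e6, 50.0,
                 0.30, 0.01, 0.019, 16.0, 32.0, 129.0, 10.0, 0.11, 0.0)
print(f"{curve[-1]:,.0f} cumulative cases after 180 days")
```

The key design point is that the growth rate, not the case count itself, is what changes between regimes, with each transition blended by a logistic step rather than an abrupt switch.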

The best-fit curve has an artificially high initial growth rate r1 of 4.8966 (nearly 500% per day!), which the differential evolution algorithm arrives at because it isn’t actually looking at numbers before May 1. It just wants to fit the data between May 1 and now as closely as possible, and it found the way to do that was to jack up the growth rate for all those unseen days. It’s doing its job, and that’s fine; we don’t care about that earlier growth rate for this analysis, just what is happening now.

Following the model forward,10 we soon transition to a relatively sedate growth rate r2 of 1%. The transition occurs over a 32-day window (i.e., the time it takes for half the smooth transition between growth rates) defined by s1 and centered on Feb. 6 (16 days after the first U.S. case), defined by t1. Then the red-state governors reopen things, followed by people living it up on Memorial Day weekend. And we wind up with a 5/30 midpoint between the second growth regime, where the curve had been flattening nicely, and our current scarier one, r3, where SARS-CoV-2 reminds us who it is with a 1.9% increase in cases per day.

Fortune Telling with Function Fitting

This model of mine is a naive empirical one, using a cool evolutionary algorithm to fit a curve to data. It’s a very elegant curve, constructed from a first-order differential equation with multiple growth rates and smooth transitions and weekly variation, though that doesn’t make it a basis for much extrapolation.
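
As a flavor of how differential evolution fits a curve, here is a minimal, self-contained sketch using scipy’s implementation on synthetic data–not my ADE package, and not the actual ten-parameter model:

```python
import numpy as np
from scipy.optimize import differential_evolution

# Synthetic "reported cases" from a known logistic curve, plus a little noise.
rng = np.random.default_rng(0)
t = np.arange(60.0)
L_true, r_true, t0_true = 1e6, 0.08, 40.0
data = L_true / (1.0 + np.exp(-r_true * (t - t0_true)))
data *= rng.normal(1.0, 0.01, t.size)

def sse(params):
    # Square-root transform before summing squared error, so the later,
    # larger case numbers don't dominate the fit.
    L, r, t0 = params
    model = L / (1.0 + np.exp(-r * (t - t0)))
    return np.sum((np.sqrt(model) - np.sqrt(data)) ** 2)

# Differential evolution searches the bounded parameter space globally,
# with no need for a starting guess or derivatives.
result = differential_evolution(sse, bounds=[(1e5, 1e7), (0.01, 0.5), (0.0, 60.0)],
                                seed=1, tol=1e-8)
print(result.x)  # best-fit (L, r, t0)
```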

But it certainly does match up to what’s been happening. An error between modeled and actual values of 1% or less going back four weeks, and then 2.1% or less for another eight weeks–forever in pandemic time. That’s 79 data points fitted with just ten free parameters. As with a shoe, if the model fits, wear it.

It fits well enough that I will try out some extrapolation anyhow, despite having just acknowledged the limitations of the model for prophetical purposes. The following plot shows what the model is projecting for U.S. reported cases two weeks into the future.

U.S. Reported COVID-19 cases vs days elapsed since 1/22

The upper subplot of this plot shows how many reported U.S. cases the model projects under the assumption that the data will continue to reflect a 1.9% daily growth rate, with an 11% weekly variation imposed by reporting limitations over the weekends.11 That is of course not an entirely safe assumption to make, no matter how closely the model fits to past data, and I have mostly limited my plots to just a couple weeks of extrapolation.12

The bottom two subplots show the new cases being reported each day, first as a percentage of the cumulative cases already reported as of that day and then (bottom subplot) as a simple number of new daily cases.

We have gotten jaded to the horror of this pandemic over the past several months, but take another look at that number in the upper right. It’s a big one: nearly five million people testing positive in August. And it’s increasing fast.

In the middle plot, we are seeing one fiftieth more being infected with each passing day, with weekly variation due to testing and people being off on weekends. And on the bottom plot, sixty thousand plus or minus new cases each and every day, all carrying the risk of a person losing their health and vigor for weeks if not months, in some cases possibly for a lifetime. Occasionally it’s a lifetime that the virus cuts short.

Yet people are complaining about wearing masks when they go in a store.

Dr. Anthony Fauci expressed the belief (or perhaps just hope) that the number of daily reported cases would never reach 100,000.13 I fully accept that Dr. Fauci has a wealth of knowledge and insight that is not reflected in my naive curve-fitting to the time-series data. But from what I’m seeing on that bottom red curve, it’s hard for me to see how we can avoid that grim milestone.

To do so, we would need a significantly lower daily growth rate during the coming weeks. It would have to go down enough to cause the value of parameter r3 to decrease, perhaps enough to justify yet another growth regime in the model and an additional three parameters. There is of course no way a naive model builder can know that in advance; this is nonlinear curve fitting with a modest and limited amount of extrapolation, not prophecy.

Track Record

This essay will not dwell on the model’s track record; I’ve done plenty of that in previous blog posts. I’ll just offer a couple of observations from back-testing the model, along with a long footnote about some reddit critics, and leave it at that.14

With data from the day before yesterday (7/16), the model projected that today’s New York Times cumulative cases number would have been 3,738,827, an increase of 149,338 over the two days. It was 3,719,110, an actual increase of 129,617. The model was pessimistic by 15%, or 7% off per day. It’s a naive curve-fitting model, and does not inform us whether this is because the weekly variation is increasing or the growth rate is settling back down, or there was just quite a bit of random variation in one direction.
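
The arithmetic behind those error figures, for anyone checking my work:

```python
# Back-testing arithmetic for the 7/16 projection checked two days later.
projected_increase = 149_338   # model's projected two-day increase
actual_increase = 129_617      # actual two-day increase per the NYT data

overall_error = projected_increase / actual_increase - 1   # ~15% pessimistic
per_day_error = (1 + overall_error) ** 0.5 - 1             # ~7% per day

print(f"{overall_error:.0%} over two days, about {per_day_error:.0%} per day")
```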

With data from seven days ago (7/11), the model projected that today’s New York Times cumulative cases number would have been 3,703,746, an increase of 443,181 over the week. Again, it was 3,719,110 today. That error is too small to be worth even calculating. The projection was essentially perfect one week out.

State and Local

My own state of Washington was doing pretty well with this virus up until early June. Now things aren’t looking great at all, with not just the number of daily cases increasing each day, but also the rate of growth in daily cases. Here is what I’ve been calling the “Nightmare Plot” with the full set of information about the model’s fit to Washington’s reported case data along with a couple weeks of extrapolation into the end of July.

Washington State cases vs days elapsed since 1/22

I don’t actually expect that the long-term growth rate for reported COVID-19 cases in Washington State will settle at an absurd 33% per day, despite the model’s best fit assigning that value (0.32583) to the parameter r2. Something’s gotta give long before that happens, because no society can sustain having its population infected with a deadly virus at a reported rate that increases each day by a third of the total number of cases reported thus far.

The reason the curve fit can get away with such a high estimation of r2 is that it paired it with a very large value of t1, 271.65. That corresponds to an interim point of 272 days after January 22 that lies midway between the initial growth rate (nearly zero) and that crazy high final growth rate of nearly 33%. That’s October 20. I don’t believe things will continue increasing in Washington state until then, and neither should you. But it is useful to know that the best fit for the model parameters right now is one that projects a lot more cases, and a continued increase in how fast those cases are coming, for months ahead. That’s what would happen if the virus were allowed to progress as it has.

It can’t, of course. We won’t allow it to, whatever our politics or petty objections to wearing something on our face to protect other people. You can see why that growth rate will have to go back down–one way or the other–by humoring me with an extrapolation of the Washington State model through the end of August, when school is scheduled to start. This plot shows the percentage of Washington’s population that the model expects will have tested positive for COVID-19 on each day.

Percentage testing positive in Washington State vs days since 1/22

According to the model, cranking away on the data in its ignorance about anything people might do in response to the situation, or about whatever limitations there are on how many tests can be processed in a given day or week, we would be seeing 2% of the people in the entire state testing positive on August 25. That’s right around the time the kids would be heading back to campus.15

Assuming the continued accuracy of CDC Director Robert Redfield’s June 29 assessment that there are ten times as many actual cases as reported ones, one in every five citizens of Washington would have actually contracted COVID-19. Long before that, though, you would see masks everywhere even in the rural red eastern half of the state and Karen would finally shut the fuck up already. It’s actually started happening out here now, after months of opposition and denial.

The curve will flatten again, inevitably. This is the one bit of armchair epidemiology I will dare to offer, if you don’t also count my assigning 1/4 of the region’s population to model parameter L. The rest of my work is just looking carefully at data with a highly refined non-linear model that has reflected that data really well thus far and is pretty good at looking into the future a few days, perhaps even a week or two.16

Here’s the Nightmare Plot for Spokane County, by far the most populous one in Eastern Washington:

Spokane County (Eastern WA) cases vs days elapsed since 2/25

In this instance, we have the extreme periodic behavior of zero cases being reported each of the past two Saturdays (including today). But the model isn’t fooled; it mostly accounts for the periodicity by evolving its parameter aw to an unusually high value of 83% (0.83329). The residuals are fine, as normally distributed as would be expected from random Gaussian errors nearly half the time. And thus it can project with some confidence there will be well over a hundred newly reported cases this coming Tuesday and also on Wednesday. I am pretty confident that we (I live in a surrounding county, but Spokane County looms large) will be seeing four thousand reported cases (in a county of just over half a million) by early August.17

Finally, before going into the heavy parental decision that all this data was in service of, I will offer a Nightmare Plot for neighboring Kootenai County, Idaho. I expect the number of reported cases there to double in the next two weeks.

Kootenai County (North Idaho) cases vs days elapsed since 3/21

Hell No, They Won’t Go

I’ll repeat again that my only relevant expertise is in applying nonlinear mathematical models to time-series data.18 But, alongside my wife, I am still entrusted with a decision that will affect some young people’s lives. Using this limited area of expertise along with a very comprehensive collection of data, we have decided that our high school kids will not be attending campus in person this fall, regardless of what precautions the school administrators take or what requirements they have. We just won’t do it.

How Many Are Too Many?

The metric I have been using to assess how serious things are is 1% of the local or national population testing positive for COVID-19. Again relying on the view of CDC Director Robert Redfield (whose agency is now being shunned by the deranged racist narcissist in the White House) that there are ten times as many actual cases as reported,19 this equates to one tenth of the population that has thus far been infected.20

This 1% threshold has already been reached nationally, around July 12. My modeling of Washington State, Spokane County (WA), and Kootenai County (ID) for the weeks ahead makes me believe that it will also be reached locally by the time my kids would be asked to return to campus this fall. Using Dr. Redfield’s 10:1 actual vs reported estimate, every tenth person will have been infected in the region around my kids’ high school.

Think about that number for a moment. One out of every ten residents of the most populous county in Eastern Washington–or indeed the entire state–will have had that virus growing inside their bodies. Hold up both hands in front of you and imagine that one of those ten fingers is a random person in your community who is or has been a host for SARS-CoV-2.21

How Many Contagious People Around?

The thought of 10% of the national or local population having contracted COVID-19 is pretty scary, but how many of them will actually pose a danger to my kids and my wife and me? This is a tough question to answer, and the weakest link in my analysis. My wife and I had an important decision to make, we’re pretty much on our own in the failed state that America has become, and all I have to work with is the reported cases plus whatever assumptions I put on top of them.

Let’s first consider the time window of when people can spread the virus to others. In an article published in March and updated July 2, The Harvard Medical School says the

time from exposure to symptom onset (known as the incubation period) is thought to be three to 14 days, though symptoms typically appear within four or five days after exposure.

We know that a person with COVID-19 may be contagious 48 to 72 hours before starting to experience symptoms. Emerging research suggests that people may actually be most likely to spread the virus to others during the 48 hours before they start to experience symptoms.

The article provides little guidance about when infectiousness might end. That most symptomatic carriers “will no longer be contagious by 10 days after symptoms resolve” is the best I can find, leaving me to wonder whether this also applies to people whose symptoms last for weeks.

Assembling all these vague numbers together, I wind up with the assumption of a fuzzy, ill-defined “contagion window” extending out to ten days after exposure. I have no idea what the shape of that distribution would look like. Are there lots of people who you still wouldn’t want to be around more than ten days after they were exposed, or just a few? But to keep things simple, to limit the number of people testing positive who I will consider infectious, and to not be quite so alarmist, I’ll assume that the window extends from the exposure date (realistically, it’s probably the day after exposure at the earliest) to ten days after.

So when did all the thousands of people who tested positive on a given day actually get exposed to the virus and start that (highly uncertain) ten-day contagion window?

Unfortunately, the delay from when a person gets exposed until their exposure results in a reported case is variable, long, and may be getting longer as backlogs of tests pile up. In a scathing April 4 critique of the very kind of analysis I’ve attempted to do–by a person who knows a thing or two about modeling data–Nate Silver said he assumes

that there’s a delay of 15 days . . . between infection and the test results showing up in the data though if anything I suspect this is too generous, given the huge testing bottlenecks in places such as California.

I’ll go with his 15 day estimate. In doing so, I am mindful of his warning that the “number of reported COVID-19 cases is not a very useful indicator of anything unless you also know something about how tests are being conducted.” But I will go ahead and make the terribly simplistic yet perhaps still useful assumption that (1) the exposure resulting in a given day’s new daily cases occurred 15 days before that day, and (2) the window in which all those infected people were infectious to others was from 5-15 days before they became a reported case.

This means that you have to look forward 5-15 days along the red projected-cases curve to see how many people around you are infectious on any given day. What the projection does is to give you an idea as to how many of those as-yet uncounted people are out there being contagious right now.

In the U.S. overall, my model is projecting that we will go from yesterday’s 3.7 million reported cases (none of whom are still in that 5-15 day window) to around 4.8 million 15 days from now.22 That’s 1.1 million additional people testing positive, with most of them in that contagion window right now. I’m further assuming, with Dr. Redfield, that there are ten times as many people actually infected on each given day as what the reported case data shows, which means there are maybe 8-10 million Americans you really don’t want to be around at this time.

Nationally, with our current growth rate and my shaky assumptions, it appears that there are now three infectious people for every reported case. That’s a lot of virus walking around.
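
That three-to-one figure falls out of the numbers above; here is the back-of-the-envelope version, with every input an approximation from the text:

```python
# Back-of-the-envelope from the national numbers above (all approximate).
reported_now = 3.7e6         # cumulative reported cases as of yesterday
reported_in_15d = 4.8e6      # model's projection 15 days out
actual_per_reported = 10     # CDC Director Redfield's estimated ratio

newly_reported = reported_in_15d - reported_now         # ~1.1 million
infectious_now = newly_reported * actual_per_reported   # ~11 million, upper bound

ratio = infectious_now / reported_now
print(round(ratio, 1))  # roughly 3 infectious people per reported case
```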

How about for Washington State? Simply multiplying by three the 2.2% figure I dared to extrapolate above for Washington on 8/27 would result in around 7% of the population being infectious then. That’s a lot. I’m skeptical of it, too.23 So let’s (irresponsibly) project out the actual number of reported cases and then do the math like we just did for the U.S.

Ignorant of the likelihood that the curve will flatten between now and then, the model projects that 1% of Washington’s population will be testing positive on August 7, which is comparable to the situation now in the U.S. overall.24 See the “Percentage testing positive” plot in the section above. Re-doing the plot with cumulative case numbers rather than percentages looks like this:

“No-flattening” long-term projection for reported cases in Washington State

To do a SWAG25 for the percentage of Washington that is infectious on 8/7, with its projected 78,000 or so reported cases, let’s look forward 15 days to when the model has a business-as-usual projection of around 133,000 reported cases. That’s 55,000 additional cases. If most of those new cases appearing by 8/22 are infectious on 8/7, and if the 10x multiplier for actual vs reported cases holds true, that’s perhaps nearly half a million Washingtonians capable of giving us COVID-19.

So, when it projects that 1% of my state’s citizens will be testing positive, assuming the virus is still on the rampage, my model tells me that around 6% of the state population will be capable of infecting me if I get too close. As I said before, I doubt if it will be growing as fast then as the model currently projects, but even half that would be too much.26

And of course I could not resist doing the same irresponsible extrapolation for Spokane County:

“No-flattening” long-term projection for reported cases in Spokane County

The model projects around 2,200 new reported cases from 8/7 to 8/22. Ten times that would be around 4% of the county population actually infected, with most of them infectious to others.27

What’s So Bad About 3%?

Imagine that there is indeed a 3% probability of any randomly selected person from the population around you being able to infect you with COVID-19 by coughing, talking, or even just breathing.

If you (or your school kid) encounter just 30 people randomly chosen from the population in a given day, you have only a 40% probability of avoiding any encounters with an infectious COVID-19 carrier. If you mingle in society and encounter a different set of people each day, your probability of avoiding proximity to a COVID-19 carrier goes down dramatically each day. After a week, it’s practically impossible to remain free of any such encounters.
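
The 40% figure is just compounded probability, assuming (simplistically) that each encounter is an independent 3% draw:

```python
# Probability of avoiding every infectious carrier, assuming each of n
# daily encounters is an independent draw with a 3% chance of being
# with an infectious person.
p_infectious = 0.03
n_encounters = 30

p_avoid_one_day = (1 - p_infectious) ** n_encounters
p_avoid_one_week = p_avoid_one_day ** 7

print(f"one day:  {p_avoid_one_day:.0%}")   # about 40%
print(f"one week: {p_avoid_one_week:.2%}")  # well under 1%
```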

Hopefully those encounters are brief and separated by at least six feet (more space is better), with masks on everybody unless outdoors. That doesn’t sound like any sort of high school experience my kids would have this fall, whether attending under reasonable conditions or staying home. No dances, hugs, high fives, being the class clown or acerbic wit, band, etc.

So they will just stay home regardless and wait for our national folly to play itself out. Hopefully before, or at least by the time, we reach 5% of Americans reported as infected–and, at the 10:1 ratio, probably half actually infected–we will have some herd immunity going and Spring Quarter will look safe.28

If it reopens, each of the students attending our kids’ high school will have a terrifyingly high probability of being exposed at some point through their day to someone infected by COVID-19. This presents an individual and family risk that my wife and I will not ask our kids to bear.

The Ethical Dimension

There is an ethical dimension to this as well. Admittedly, it is a small and secondary part of my considerations because, like you, I consider my health and that of my own family of paramount importance.

But here it is: Do I want to participate in an activity whose existence poses a serious health risk to a teacher, janitor, teacher’s assistant, or administrative employee? A person already underpaid and unappreciated who probably feels compelled to enter that building full of people whose skills in risk assessment and decision-making will not fully develop for a few years yet?

That person may be decades older than the students talking loudly in their classrooms29 or the kids whose bathrooms they are tasked with cleaning. And they may have little choice but to put their bodies at risk for simple economic survival, including, ironically, being able to keep the very health insurance they rely on to keep from going bankrupt if they get sick. They don’t have the luxury my kids have (though they hate it) of sitting home with both parents there.

It is an upside of having older parents that perhaps balances the increased risk we have, but worthless if we don’t make use of it. So we will continue staying home and staying vigilant, and thus deny this virus one small set of hosts to travel with.

Ironically, the parents that this affects most may be teachers with kids in school themselves.30 People in other professions and trades have to figure something out for their kids to be home for two months every summer here in most places in the U.S. But teachers have long enjoyed the perk of summers off, and so haven’t needed to make child care plans for the summers. When they go back to work, their kids go back to school. Well, maybe not this time, if they are in a reopening school district and decide not to subject their kids (and thus, indirectly, themselves) to the risk of infection.

The Decision: A Balance of Harms

Our decision will affect our kids’ future. There are serious consequences to them either way. On one side, this avoids subjecting them to an unacceptably high risk of catching a virus whose likelihood of causing months of illness and disability, long-term damage, and even death is only now being appreciated.31 Or (especially regarding the possibility of death) of passing it on to the two people they love most on earth.

On the other side is the sober realization that these teenagers of ours are going to miss out on much of the activities and flirtations and friendship intimacies of the years we remember from when we were them.32 There’s just no substitute for that experience in “remote learning,” whose name it seems to me refers as much to the likelihood of learning occurring as to the physical distance.

This decision is a balance of harms. It was difficult to make, but allowing our kids back onto their high school campus this fall imposes an individual and family risk on them that they cannot be asked to bear.

There are three times as many people who have been infected across the U.S. as there were at the beginning of May. Now the spread of the virus is growing twice as fast as it was then. Yet the general public and its elected officials (far more so in the GOP and its Dear Leader but to some extent true of both parties) have been acting like re-opening everything is simply inevitable no matter how many refrigerated morgue trucks a hospital needs to have stationed outside, no matter how many millions of people young and old wind up with permanent damage to their lungs, kidneys, hearts, and even their brains.33 No matter how many weeks of suffering many millions have to endure, each successive day growing with the fear and dread of being one of the unlucky long-term sick. Nope, we have to get things re-opened again, no matter what.

Again, I recognize that my wife and I are privileged in not having to leave the house every day for work. But the fact that in-person reopenings of various types are pretty much economic necessities in our dog-eat-dog unfettered capitalism does not make me want to participate in them if I can help it. That’s mostly from a desire for preservation of self and family, but to a small extent to not participate in the mass delusion that everything will be OK.

Notes


  1. This essay isn’t intended to be a collection of scary links, but it’s worthwhile to consider the view of an ICU doctor last week that his patients have gone from having an average age of around 65 to “between 25 to 35, 45 years old” (“Miami Hospital ICU Doctor: New Influx Of Patients Is Younger Than Before,” NPR, July 13). 

  2. The technical details of the modeling are covered later in this essay. I had spent much of my Python coding time in the past two years working on a free, open-source software project that models power MOSFET devices using Python and a freely available underlying simulation tool called Ngspice. Then a deadly pandemic happened, my family went into a no-nonsense quarantine for months, and frankly it became a bit difficult to concentrate on something as removed from practical life as that. Instead, I took the same nonlinear modeling tools I developed for the simulation project and applied them to time-series data on reported COVID-19 cases. This essay with its plots is the culmination of that effort. 

  3. The model is implemented in an example file covid19.py that is part of my ADE Python package. It’s free software; with the right software skills, you can install it and try it out for yourself. 

  4. Earlier I was using data from The Johns Hopkins University, but have switched to the New York Times data both for the detail it offers as well as its simple and open licensing terms, which are co-extensive with the Creative Commons Attribution-NonCommercial 4.0 International license, plus this:

    “In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.”

    “If you use this data, you must attribute it to ‘The New York Times’ in any publication. If you would like a more expanded description of the data, you could say ‘Data from The New York Times, based on reports from state and local health agencies.’” 

  5. Some of the states and counties I’ve looked at fit well enough to their datasets without all ten parameters used in the national model, and thus have simpler models. Washington State, Spokane County, and Kootenai County have only two growth regimes, seven parameters instead of ten. The AICc metric was used to determine what parameters “earned their keep” and remained in the model. They only do if they result in a lower (better) AICc score. 
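
For reference, the AICc score mentioned here can be computed from a least-squares fit like this (a generic textbook sketch, not the exact implementation in ADE):

```python
import math

def aicc(sse, n, k):
    """Corrected Akaike information criterion for a least-squares fit with
    n data points and k fitted parameters; lower scores are better."""
    aic = n * math.log(sse / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# With the same goodness of fit, fewer parameters score better (lower),
# so a parameter only "earns its keep" by improving the fit enough.
print(aicc(100.0, 79, 7) < aicc(100.0, 79, 10))  # True
```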

  6. A square-root transform is applied to both the modeled and actual new-case numbers to arrive at the residuals. Then the sum of squared error (SSE) is computed by squaring each residual value and adding them up. The purpose of the transform is to lessen the disproportionate impact of later, higher numbers on the curve fit. 
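The transform-then-sum computation looks something like this in NumPy form. A minimal sketch, not the actual code in covid19.py, with toy case counts:

```python
import numpy as np

def sse_sqrt(modeled_new, actual_new):
    """Sum of squared error between square-root-transformed daily
    new-case counts. Without the transform, the big numbers late in
    the series would dominate the fit over the small early ones."""
    residuals = np.sqrt(modeled_new) - np.sqrt(actual_new)
    return float(np.sum(residuals ** 2))

# Toy numbers: the same 10% relative error costs 100x more SSE when
# the counts are 100x bigger, instead of 10,000x more on the raw scale.
print(round(sse_sqrt([11000.0], [10000.0]) / sse_sqrt([110.0], [100.0])))  # → 100
```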

  7. The “normal” or Gaussian probability distribution describes many events in nature, from the noise you hear on a radio to the variation in people’s heights. It is what you eventually wind up with when you add together enough independent random contributions, even if the underlying probability distribution of each one is not normal; this is the central limit theorem at work.

  8. For some reason, a p of 0.05 is the standard for many statistical tests. So, on that basis, the residuals are well within the acceptable range of normality. 
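A normality test of this kind can be run with SciPy; here is the D’Agostino-Pearson omnibus test applied to synthetic residuals (whether covid19.py uses this particular test is my assumption, not something stated in the essay):

```python
import numpy as np
from scipy import stats

# Stand-in for the model's residuals: 79 draws from an actual normal
# distribution (the real residuals come out of the curve fit).
rng = np.random.default_rng(0)
residuals = rng.normal(size=79)

# The null hypothesis is that the sample is drawn from a normal
# distribution, so a p above 0.05 means we cannot reject normality.
stat, p = stats.normaltest(residuals)
print(f"p = {p:.3f}")
```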

  9. This value of L is the largest number of people in the state/county population ever expected to be reported as infected. According to Wikipedia, the lowest estimate for COVID-19 herd immunity of actual (not just reported) cases is 50%.

    But there will always be more actual cases than reported. As mentioned in the essay, on June 25 CDC director Robert Redfield indicated there are 10 times as many actual cases as reported. That would reduce a 50% herd-immunity value of L to 0.05 times population. But the 10x ratio could change as the number of cases increases and testing could increase to the point where nearly every American has been tested.

    To be conservative about the modeling and hopefully avoid inserting too many shaky assumptions into it, I’ve fixed L at 0.25 times the region’s population. This limits the increase in reported cases (regardless of the growth regime) as the number approaches 1/4 of the state or county’s total population.

    It actually doesn’t matter much. At this point in the pandemic, L has almost no impact yet; all of the regions I’ve looked at still have reported case numbers that are single-digit percentages of their populations. 
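To see why the fixed cap hardly matters yet, consider the plain logistic growth rate with a carrying capacity. The 0.25-times-population cap and the Washington case count come from the essay; the functional form shown here is the textbook logistic term, and the 2%/day rate is an illustrative guess.

```python
def logistic_rate(x, r, L):
    """Daily new cases under plain logistic growth: r*x*(1 - x/L).
    The (1 - x/L) limiting factor barely matters while x is a
    single-digit percentage of L."""
    return r * x * (1.0 - x / L)

pop = 7_600_000       # roughly Washington State's population
L = 0.25 * pop        # the fixed cap discussed above
x = 47_563            # reported cases, well under 1% of population
r = 0.02              # an illustrative 2%/day growth rate

# The cap trims the growth rate by only about 2.5% at this point:
print(round(logistic_rate(x, r, L) / (r * x), 3))  # → 0.975
```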

  10. The curve-fitting algorithm actually follows the model backward from the latest day’s data, fixing its cumulative case result to that day’s number. It does this by backwards-integrating a first-order differential equation xd(t, x) whose gory details are shown in the upper left part of the first plot. Applied mathematics in all its glory, and it works.
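Backward integration of this kind can be sketched with SciPy. The derivative function below is a hypothetical stand-in, plain logistic growth rather than the full multi-regime model in covid19.py, anchored at a known latest-day count:

```python
from scipy.integrate import solve_ivp

def xd(t, x, r=0.03, L=1.9e6):
    """Hypothetical stand-in for the model's derivative function:
    plain logistic growth in cumulative reported cases x at day t.
    The real xd(t, x) in covid19.py has more terms (growth-regime
    transitions, weekly variation)."""
    return [r * x[0] * (1.0 - x[0] / L)]

# Anchor the curve at the latest day's known cumulative count and
# integrate *backward* in time, as the fitting algorithm does.
t_latest, x_latest = 79.0, 47_563.0
sol = solve_ivp(xd, (t_latest, 0.0), [x_latest], dense_output=True)

# Earlier days fall out of the backward solution:
week_ago = float(sol.sol(t_latest - 7.0)[0])
print(week_ago < x_latest)  # fewer cases a week earlier
```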

  11. The exact parameters are shown in the lower right of the first set of subplots, with the third growth regime’s daily rate r3 equal to 0.18863 and the weekly variation aw equal to 0.11354. 
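As a sketch of what a weekly-variation component can look like, here is a base daily rate modulated by a 7-day sinusoid. The cosine form and zero phase are my illustrative guesses, not the exact formula in covid19.py; only the fitted r3 and aw values come from the plot.

```python
import math

def daily_rate(t, r=0.18863, aw=0.11354):
    """A base daily growth rate r modulated by a weekly reporting
    cycle of amplitude aw, repeating every 7 days."""
    return r * (1.0 + aw * math.cos(2 * math.pi * t / 7.0))

# Sampled at whole days, the effective rate swings by roughly 24%
# from trough to peak with these values:
rates = [daily_rate(t) for t in range(7)]
print(round(max(rates) / min(rates), 2))  # → 1.24
```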

  12. Extrapolating a nonlinear model is not something to be done lightly even if the underlying phenomena are well understood and predictable. In the case of America’s COVID-19 pandemic, there are a couple of known unknowns (to paraphrase Donald Rumsfeld) worth mentioning in addition to all the unknown unknowns.

    The curve could flatten again if those states whose citizens and governors have not been taking this virus seriously finally start to do so, once enough of the people around them get sick and they see a lot of those people staying sick for a long time, or even dying. A second, less comforting possibility is that we may soon see the effects of Trump’s stated desire not to have the reported numbers increase quite so fast (“Slow the testing down, please!”), and of his latest move to keep data from reaching the CDC. Doesn’t it seem that a federal agency named “The Centers for Disease Control and Prevention” should be informed about just how many cases there currently are of a pandemic affecting millions of Americans?

  13. Norah O’Donnell: “Dr. Fauci, do you still think that we could reach 100,000 infections a day?” Fauci: “You know, Norah, I don’t think we will. I hope not. It is conceivable that if we don’t get good control over the current outbreak and we keep spreading into other regions of the country, we could reach 100,000” (CBS News, “Fauci says he doesn’t like ‘to be pitted against the president’ after multiple attacks from the White House,” July 16).

  14. Posting about this modeling work of mine on the r/Coronavirus subreddit has resulted in a lot of page views, which has been appreciated since the goal of the writer is after all to be read, and this seems like something important to share. It also results in a few critical comments each time, some of which have actually been helpful in guiding me toward more statistically rigorous modeling. I am finally reminded of the public health official in Argentina who took me to task back in April about not providing any statistical tests about the model’s residuals. It was a valid criticism which I accepted and responded to, and the official gracefully acknowledged frustration and fears about the virus in their country.

    Some of the critical comments have just been exasperating snark from people who consider themselves far above something so naive as fitting data to a curve and daring to claim that the curve means something when it fits very well. A recent interloper gazed down from his high horse to proclaim that I am trying to model sampling error with my weekly-variation component. I tried unsuccessfully to point out that my effort here is to model reported cases, not to imagine some expertise about an unknown number of actual cases of infection out there, hidden in the unreported masses. If the time series of case reports has a periodicity to it, as it obviously does for the U.S. overall and also more than half the states and counties I’ve examined, then that is what I’m modeling more faithfully by adding a periodic component. I predict the reported cases will increase by not much more tomorrow–perhaps even less–than they did today, because the periodic component to the rate at which cases are reported reaches the trough of its wave tomorrow. That says nothing about how cases are “actually” increasing, nor do I ask it to.

    The same inquisitor muttered something about basis functions and suggested I look into learning about the Fourier series. (I’m an electrical engineer with expertise and a dozen patents related to signal processing; I’ve heard of the Fourier series, thanks.) How silly of me to think that there is anything special about fitting a ten-parameter combination of any functions to within 2.1% over 70+ data points. Why, he (we can all guess it was probably a “he,” can’t we?) could pull off that same trick with just about anything. I suggested it was absurd for him to claim that ten free parameters of a Fourier series (five terms with a real and imaginary component apiece, though I’d forgotten about the DC component) would accomplish the same thing, and he asserted that he managed it with seven, presumably meaning seven terms. No plots or code or anything was provided, of course. And even if he could, he would find the innate periodicity of the Fourier series bringing his case numbers right back to the simple linear increase of the zero-frequency DC component, back and forth in an absurd roller coaster totally unrelated to the problem at hand as his modeled cases come in faster and then slower and then faster again.

    I’ll post this on reddit like my other essays about this model. But this time I think I’ll pass on any further schooling from the boy geniuses there who remember a few textbook cases and fancy terminology but offer no relevant work of their own for comparison. 

  15. Update 7/23: If I kept updating these blog posts with each new day’s data, I’d never stop. But given how sensitive such a long extrapolation is to changes in recent data, I do feel compelled to offer this link to a plot updated with 7/22 data that shows 1.3% of the state testing positive on August 25 instead of 2%. There is just a hint of flattening showing up in the curve. 

  16. For the Washington State data, the model has been a bit less accurate than for the U.S. as a whole. With 7/16 data, it projected 48,001 reported cases today when there were actually 47,563, an error of 34% (16% per day) in the projected vs actual increase from 46,268.

    With data from seven days ago, it projected there would be 46,950 reported cases today. Again, there were 47,563. That’s an actual increase of 6,237, compared to a projected increase of 5,624. The error was around 11% for the week, or around 1.5% per day. 

  17. Admittedly, the model starts to deviate quite a bit from what happened when it looks back more than three weeks. It goes from a worst-case modeled vs actual error of 6.9% on 7/11 to 12.8% on 6/28, 14.5% a week earlier, and 21.9% a week before that. There was an obviously artificial glitch in test results being released around 6/27-6/28, which the model will only approach as it smooths its errors over the whole interval of interest.

    Update 7/23: A Nightmare Plot for Spokane County updated with 7/22 data is here. The model projects 4,000 reported cases on August 4. 

  18. I hope this modeling work gives people some insights about the situation we face here in the U.S. But please note this critical disclaimer: First, I disclaim everything that The New York Times disclaims. I’m pretty sure their lawyers had good reason for putting that stuff in there, so I’m going to repeat it. Except think “Ed Suominen” when you are reading “The New York Times”:

    “The New York Times has made every effort to ensure the accuracy of the information. However, the database may contain typographic errors or inaccuracies and may not be complete or current at any given time. Licensees further agree to assume all liability for any claims that may arise from or relate in any way to their use of the database and to hold The New York Times Company harmless from any such claims.”

    Second, I know very little about biology, beyond a layman’s fascination with it and the way everything evolved. (Including this virus!) I do have some experience with modeling, including using the ADE package (of which this COVID-19 modeling is a demonstration file) to develop some really cool power semiconductor simulation software that I’ll be releasing a month or so after the GitHub commit with this COVID-19 example. The software (also to be free and open-source!) has a sophisticated subcircuit model for power MOSFETs that evolves 40+ parameters (an unfathomably huge search space). And it does so with this very package whose example you are now reading.

    So–yes, this is still a disclaimer–I am not an expert in any of the actual realms of medicine, biology, etc. that we rely on for telling us what’s going on with this virus. I just know how to fit models to data, in this case a model that is well understood to apply to biological populations.

    Don’t even think of relying on this analysis or the results of it for any substantive action. If you really find it that important, then investigate for yourself the math, programming, and the theory behind my use of the math for this situation. Run the code, play with it, critique it, consider how well the model does or does not apply. Consider whether the limiting part of the curve might occur more drastically or sooner, thus making this not as big a deal. Listen to experts and the very real reasoning they may have for their own projections about how bad this could get.

    Seriously, experts know stuff. That’s what they do. One of them I recommend paying attention to is Dr. Osterholm at the University of Minnesota’s Center for Infectious Disease Research and Policy (CIDRAP). His interview in the July 1 podcast episode Viral Gravity is sobering but informative and, as we’ve seen in the nearly three weeks since, has been quite accurate about how serious the situation is. 

  19. Washington Post, “CDC chief says coronavirus cases may be 10 times higher than reported,” June 25

  20. There’s nothing magic about my 1% threshold for reported cases, i.e., 10% of the population actually infected. It is just easy to picture 1/10 of a population: one out of the ten digits on your two hands. The Romans understood the power of that concept; “decimation” referred to the Roman Army’s practice of brutally killing every tenth man in a unit to terrorize and punish insubordination.

  21. I have just the finger in mind when it comes to the “leadership” of our current president in this crisis, but that’s another blog post entirely. 

  22. Update 7/23: This projection is unchanged. Here is the Nightmare Plot for the U.S. overall, updated with data through 7/22. 

  23. Update 7/23: With today’s updated data and the 1.3% figure it now projects for 8/27, the 3:1 multiple would yield 4.2% of the population being infectious. Still more than the 3% figure discussed in the main text. 

  24. Update 7/23: Now projected as 0.92%. Close enough. 

  25. Acronym for “scientific wild-ass guess,” an old term of endearment used by engineers who know full well how many shaky assumptions are involved with many a technical endeavor. 

  26. Update 7/23: The projection updated with 7/22 data now looks like this: around 70,000 cases on 8/7, and around 94,000 on 8/22. The increase is projected to be 24,000 rather than 55,000. So, with all the other assumptions kept the same, we would be looking at 3.1% of Washingtonians capable of infecting you on 8/22. Not as much, but the discussion of 3% still applies. 

  27. Update 7/23: With 7/22 data, the “no-flattening” long-term projection for Spokane County is for around 2,000 new reported cases, just a bit less than previously projected. 

  28. There will hopefully be a vaccine early in 2021, but when, and for whom? At the rate things are going now, I wouldn’t be surprised if we get to 50% infected before one becomes available for everyone in my family.

  29. “[L]oud speech can emit thousands of oral fluid droplets per second . . . there is a substantial probability that normal speaking causes airborne virus transmission in confined environments.” Stadnytskyi, Bax, et al., “The airborne lifetime of small speech droplets and their potential importance in SARS-CoV-2 transmission,” Proceedings of the National Academy of Sciences Jun 2020

  30. Rebecca Martinson, a public school teacher in Washington State, explains her decision to not return to class, no matter what, in this thoughtful essay, “Please Don’t Make Me Risk Getting Covid-19 to Teach Your Child,” New York Times, July 18.

    She writes, “My school district and school haven’t ruled out asking us to return to in-person teaching in the fall. As careful and proactive as the administration has been when it comes to exploring plans to return to the classroom, nothing I have heard reassures me that I can safely teach in person.” After listing off all the sacrifices teachers have had to make for their students and careers, she says that it “isn’t fair to ask me to be part of a massive, unnecessary science experiment. I am not a human research subject. I will not do it.”

  31. The danger of this virus to young people has been downplayed. It is frustrating how little statistical data seems to be available on this, but a significant minority of even younger people suffer for months with the aftermath of a COVID-19 infection. A figure as high as 5% has been tossed around, but with little statistical support. That is understandable, given that two-thirds of the Americans reported as infected received their positive test results less than six weeks ago. The majority of “long haulers” have yet to emerge.

  32. Admittedly, we never were quite like them. Unlike these poor kids, we never had to suffer the downers of a deranged narcissist bigot wannabe authoritarian as president, a deadly pandemic, and an emerging economic depression all at once. 

  33. “COVID-19: Severe brain damage possible even with mild symptoms,” Deutsche Welle

Sunday, April 19, 2020

Coming Through in Waves

You are only coming through in waves
Your lips move but I can’t hear what you’re saying
—Roger Waters, Comfortably Numb (Pink Floyd, 1980)

Update, 4/20/20: Rather than dwell on the accuracy or lack of same in the model’s projections anymore–as if this were some kind of grand chess match where the Knights and Bishops don’t actually trample and lance each other–I will just offer the latest Nightmare Plot with a few observations. The one-million mark has received yet another delay, of one day to 4/28, and the model still projects 1.5M reported cases sometime in the second week of May, probably toward the end of that week. The curve continues to fit past data uncannily well even if it has shortcomings with extrapolations more than a few days out.

Most importantly, I am starting to question the wisdom of continuing these updates of a “reported cases” statistic that I’ve begun to suspect is being influenced by delayed and limited testing as much as or more than by actual suppression of the virus. I do not want to participate in drawing attention to any flattening of a curve if what is pushing it down is a disconnect–whether due to cold political calculations or incompetence–between what that curve actually shows and the actual magnitude of this national and global crisis. Perhaps in Hollywood it doesn’t matter if the curves are real, but this is an important one indeed.

In my previous post a little over a week ago, I summarized the “logistic growth with growth-regime transition” model I’ve developed for reported cases of Covid-19 and offered–with many disclaimers that I include now as well–a few projections:

three quarters of a million cases one week from today, and over a million the week after that. The model’s projection–both now and as it stood on 4/5–is that there will be around a million and a half Americans–one in every two hundred–reporting infection with Covid-19 in early May, with the number still climbing faster every day.

The exact projection for one week from 4/10 (to a degree of precision only useful for evaluating past performance) was 758,921 reported cases in the U.S. On 4/17, there were 699,706. The model was 29% pessimistic about the week’s increase. Instead of 262,386 more cases, there were “only” 203,171.

This works out to a (geometric) average of 3.7% per day. In other words, if the model were that pessimistic every day, projecting 3.7% more than the number of cases that actually were reported and without updating its parameters to try to do better the next day, we would wind up with this much of an error.

That doesn’t seem very far off to me.
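The geometric per-day figure is just the week’s error ratio taken to the one-seventh power; a quick check with the 4/10-to-4/17 numbers:

```python
# Projected vs actual increase in reported cases, 4/10 to 4/17:
projected_increase = 262_386
actual_increase = 203_171

# The weekly error compounds, so the per-day figure is the
# geometric mean: (1 + weekly_error) ** (1/7) - 1.
weekly_error = projected_increase / actual_increase - 1
daily_error = (1 + weekly_error) ** (1 / 7) - 1

print(f"{weekly_error:.0%} for the week, {daily_error:.1%} per day")
# → 29% for the week, 3.7% per day
```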

With its parameters evolved to fit today’s data (4/19), the model is projecting that the million-case mark will be reached by 4/27, a few days later than it expected a week ago. As far as its projection of 1.5M cases goes, it expects that sometime in the second week of May. Perhaps not quite “early May,” but there’s good reason to use imprecise language when making projections.1

Here’s the latest Nightmare Plot. As always, you can click on it to see the important details.

April 19, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Testing, Testing . . .

Unfortunately, I’m starting to wonder if this modeling of reported cases may be doing more harm than good. The comments in the next few paragraphs are from a layman in the relevant fields of biology, medicine, what’s actually going on with this supremely fucked-up White House, etc. But I think it’s time to discuss reported vs actual cases of Covid-19 in the United States.

The fact that the numbers of daily new cases have fairly suddenly hit a ceiling of around 30,000 for about a week now makes me suspicious that my model has become one for how fast we’re able to get people tested, and not for a realistic metric of how many people are infected. If the places doing the testing are unable or unwilling to process more than a certain number per day, of course that is going to result in a flattened curve.

Meanwhile, the numbers of actual infections and sick people waiting for their positive test result may be continuing to soar, unseen by all but victims’ frantic families and, when they finally give up on dealing with it at home, medical personnel who may not be testing even symptomatic patients. The testing backlog may grow and grow, and plots like mine will make it look like everything’s under control because the number of reported cases isn’t rising much at all.

But that can only go on for so long. Eventually, if there is indeed an explosion of actual cases that’s hidden behind the reported-cases curve, it will become obvious to millions of Americans that we’re actually fucked and this thing is going to kill a lot of people. It’s already certain that it will kill many more Americans–perhaps more per capita, too–than citizens of any other country.

I wonder about the armed lunkheads now doing protests agitated by shadowy right-wing groups at the encouragement of their cult leader2 who clearly doesn’t care about anyone but himself and his hold on power. What would their slow little minds do with that undeniable reality after making fools of themselves and infecting themselves, their families, and friends? What sort of toxic dangerous bullshit will their hero be spewing then to cover yet another embarrassment he couldn’t possibly bring himself to face up to like a man, after a long pathetic lifetime of dodging one failure after another?

Track Record

Now, let’s go back to the neat little fantasyland of reported cases, because that’s all I am really qualified to deal with at this point. (I may switch to reported deaths soon; that seems harder to hide.)

In case you’re curious about how the model has been doing with more recent projections of what today’s number would be, here are a few.

On 4/18 (yesterday), the model projected 763,369 cases. There were 759,086. The model was pessimistic by 16% about the number of new cases there would be today.

The previous day, 4/17, the model projected 759,553 cases today, an error (in the increase) of 0.8% for the two days. It was on the pessimistic side–as if that matters with such a small error.

On 4/16, the model projected that there would be 752,522 cases today. The error in the increase (optimistic that time) was 12.4%. But that was over the course of three days, a time interval when the number of reported cases increased nearly sixty thousand. The (geometric) mean error per day was 4%.

———

It may be worth considering some other projections to other dates. The model’s next-day projection on 4/16 was for there to be 696,432 cases on 4/17, which was too optimistic by 10%. The previous day, 4/15, the model projected 689,026 cases on 4/17, an average daily error (in the increase) of 9.7%, also on the optimistic side. And on 4/14, when there were 605,193 cases, the model was projecting 679,654 on 4/17, an average daily error of 8.3%, again only counting the projected vs actual increase and not the absolute number of cases, again on the optimistic side.

Those errors are the result of some bigger new-case numbers coming in than would have been expected had the data conformed to the gradual decrease in growth rate that the model had evolved its parameters to fit each day. Each time a new day’s Johns Hopkins data came in, the parameters got changed to better fit the curve, and that resulted in next-day projections that wound up being lower than what actually happened.

What happened is at least in part that the official U.S. case counts began to “include both confirmed and probable cases and deaths.”3 It doesn’t seem that many states have as yet switched to this new reporting protocol, but it surely has and will continue to increase the numbers my model sees from the Johns Hopkins data.

There was a point where the model’s projection for how many cases there would be on 4/17 ceased to be optimistic and landed pretty much right on. That was 4/12, with 555,313 cases, before those larger numbers started showing up this past week. Then the model projected 701,642 cases on 4/17. That projected increase was off by +1.3%, or an average of 0.3% per day. (It was pessimistic, but by so little it hardly seems worth mentioning which way it was off.)

So, what did the model say about today–one week later–with its parameters evolved to data from 4/12? Its projection was that we would be seeing 755,274 reported cases in the United States today. Again, there were 759,086. That’s an error in the projected (199,961) vs actual (203,773) increase of 1.9%, or 0.3% per day.

I think that counts as a successful, if horrific, extrapolation.

The Sky is the Limit

With data from a couple days ago (4/17), I thought of the Futurama TV show and how Professor Farnsworth often announces some challenging and arguably terrible development by saying, “Good news, everyone!” I was tempted to say the same about the scatter plot of values vs SSE for the model parameter L.

As you may recall, L and r are parameters for the conventional logistic growth component of the model. L is the maximum number of possible cases, which at worst case is limited by the country’s population. The other parameter r is the initial exponential growth rate, before any curve-flattening becomes apparent.
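The conventional logistic component can be sketched in closed form (not the full covid19.py model, and all parameter values below are made up for illustration). Early in an outbreak the curve is indistinguishable from pure exponential growth at rate r, which is part of why L is so hard to pin down from early data:

```python
import math

def logistic(t, L, r, t0):
    """Closed-form logistic curve L / (1 + e^(-r*(t - t0))): grows
    like e^(r*t) early on, then flattens toward the ceiling L
    around the midpoint day t0."""
    return L / (1.0 + math.exp(-r * (t - t0)))

L, r, t0 = 200e6, 0.2, 100.0   # hypothetical worst-case parameters

# Early values carry almost no information about the ceiling L,
# while far past the midpoint the curve sits essentially at L:
early = logistic(10, L, r, t0)
late = logistic(1000, L, r, t0)
print(early < 1e-3 * L, round(late / L, 3))  # → True 1.0
```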

Well, for the first time I was seeing the hints of a possible range for L, and hence a projection by the model of an ultimate worst-case number of U.S. reported cases. That was the “good” part of the “Good news” cackled by a goggle-eyed cartoon character. The challenging and arguably terrible part was that this faint preliminary outline of a range for L, the maximum number of U.S. reported cases ever, seemed to be anywhere from 50 to 200 million.4

Unfortunately, even that bit of good news is no longer apparent in the scatter plot for L. Indeed, the best-fit combination of parameters, after nearly 22,000 simulation runs, has its value of L pretty much at the whole U.S. population. That’s basically right at the upper limit of the parameter range.

We might take a long time to get there, says the model with its parameters evolved to the data for the past week or so, but we will go very high indeed before we’re done.

I won’t include such outlandish projections in the Nightmare Plot, though, because too much can and will change between now and then. Like, say, the President of the United States tweeting out calls for insurrection against their state governments5 and convincing his cult followers to go out and act like it’s just no big deal that three quarters of a million6 of their fellow citizens have now contracted a virus that has killed nearly half as many as those who are officially listed as “recovered.”7

Looking Back vs Forward

Or, Why you should perhaps consider taking this model seriously

It’s been an interesting experience posting these updates to the r/Coronavirus subreddit. Each posting gets this information to a few thousand people, so it still seems worthwhile to continue doing so. Better yet, I have gotten some valuable constructive criticism, sprinkled in between a few stupid lazy insults. Some of that criticism has resulted in direct improvements, such as the inclusion of a normality test on the residuals. (Turns out they are about as normal as it gets, with a p value of over 0.9.) And some of the comments are just plain funny, like this one from “derphurr” regarding my April 10 blog post:

To be fair I’ve been following your numbers from the beginning post. Your 1M cases date has been moving outwards every week/ updated graph.

It might be like fusion energy, it is technically possible we never get there.

Unfortunately, I think the model will beat fusion. (The classic line is “Fusion energy is 30 years away and always will be.”) Its current projection for a million U.S. cases is just a few days later than it was when that comment was made: April 27 rather than April 23.

Another redditor “chalion” from Argentina quoted this from the 4/10 blog post: “On 4/3, the projection was for there to be just over 600,000 cases today, compared to the 496,535 we have had reported at this point in the U.S. Quite a bit off, but remember, that was looking forward a full week.”

Admitting it was harsh (and later graciously apologizing for that), he or she then offered this critique of my extrapolations:

It’s a model that can’t predict a week by a large margin. Clearly it isn’t working. Its just adjusting it daily to past data but says nothing about the future.

He isn’t making a model of the epidemic, it’s making a tool to find a curve close to the current data. Its good and interesting but I don’t think it’s a model of the evolution of this thing.

I responded, and he or she replied in kind (very reasonably), and I’ll quote that exchange for you. But first let me observe that the critique of the 4/3 projection remains valid. The model was expecting a little more than twice as many cases as have been reported thus far. The curve flattened a lot more since then. I won’t complain about that at all, even if it somewhat proves the model’s limitations, at least for extrapolations more than a few days out.

Here’s what I said in response:

You raise some interesting points. I don’t think your assessment in that last paragraph is unfair.

This would only be a model of the epidemic under the assumption that the reported cases will continue to increase in the fashion they have for the past two weeks. If they do, then it will have modeled the epidemic. If they don’t, then something else has occurred to cause the numbers to change, in a way that the model failed to incorporate.

Updating the parameters to fit the new situation is a natural way to continue the process of evolving them. This doesn’t change what the model predicted with the previous parameters. If the model is within 8% of the next day’s number of new cases, I don’t see that as having no predictive value, as “not working.” It seems to me it works quite well for short-term projections.

As far as its alleged inability to predict one week out (from 4/3), the error of its projecting 346,635 new cases rather than the 220,949 that actually got reported between 4/3 and 4/10 is considerable, 56%. I would just ask that the magnitude of that actual increase be considered, too, when passing judgment: the number of cases nearly doubled (1.8x) in that time. It was projecting the number to increase by 2.26x. Not by, say, extrapolating straight out from the 13.5% daily increase that had happened the previous two days and projecting 2.4x. The model was moving the curve, and in the right direction, just not as much as the data then indicated it would.

I’ve decided to take seriously your harshly-worded comment and give it the detailed response it deserves. You do make some good points, and there’s definitely room for humility with the longer-term performance of this model thus far. I’m glad you at least thought it was good and interesting in its own way, and I’d like it to be looked at seriously as an approach for something we both at least agree it works for, fitting a curve close to data of this Covid-19 U.S. epidemic thus far, even if we part company when it comes to extrapolating from that fit.

I will continue to ask for a bit of forbearance about even the large inaccuracy from 4/2 to today. We are still dealing with largely exponential growth, and even the actual reported numbers have been exploding. The model was projecting that we’d have a bit more than 6x as many cases today as we did then, and what we have is around 2.8x as many. If the model had kept on extrapolating that 13.5% daily increase we were having then, seventeen days ago, it would have projected 8.6x as many. It was accounting for some of the curve-flattening that happened, but not enough.

Now, here’s what my interlocutor, who was watching this two months ago as an employee of Argentina’s Ministry of Health, said in response to that:

Yes. I have to agree with you. You are right in every point you make. Even when you describe my wording as harsh. I’m sorry for that.

I’m trying the same as you, modelling Covid19 in my country (Argentina) and I don’t think my results are better than yours. Maybe that was that frustration talking.

I will look at yours model in more detail tomorrow and try to make constructive criticism and not what I did

Keep the effort. And thanks for your thoughtful answer.

Thank you, sir or madam, for the gracious words and also for reminding me about the very real limitations of extrapolating a curve that’s been fitted to time-series data. As a gesture of appreciation, I ran the model on Argentina’s data today and you can see the plot here. I can see why your modeling efforts have gotten you frustrated; my model doesn’t fit what’s been (officially) happening in your country that well either, with a fat-tailed residual distribution that is unlikely (p=0.039) to be normal random variation. Maybe it’s all the jumps in the time-series data for your country, which fortunately still has a tiny number of cases per capita.

Now, even if it’s true that the model is only fitting close to current data and can’t be trusted for what it says will happen in the future,8 I do think it’s done a damn fine job of characterizing what’s happened thus far, at least in the U.S. Just take a glance at that plot above (the top subplot) and the percentage errors in what it’s fitted to the past month of daily data, with just six parameters.

The best fit is essentially a perfect one going back nearly a week. Then it has errors in the single-digit percentages for more than another week, and the error stays modest through the rest of April and the last few days of March.

The error between modeled and actual reported cases doesn’t reach 20% until you go back to 3/14. That’s a long time ago, in pandemic time. The number of cases has multiplied by 278 since then. It almost certainly has one more doubling left to go. If there isn’t some new level of curve flattening that my model can’t anticipate with its best fit to today’s data, there will be more doublings to come.9 Meanwhile, the deranged narcissist in the White House is pushing to get everybody out there infecting each other again.

Hang on, my fellow citizens of the world’s newest and biggest failed state. I think we may be in for a long and bumpy ride.

Notes


  1. The data has been coming in pretty much as expected, and these projections are the same as they have been for the past two days. 

  2. “A trio of far-right, pro-gun provocateurs is behind some of the largest Facebook groups calling for anti-quarantine protests around the country, offering the latest illustration that some seemingly organic demonstrations are being engineered by a network of conservative activists.

    “The Facebook groups target Wisconsin, Ohio, Pennsylvania and New York, and they appear to be the work of Ben Dorr, the political director of a group called ‘Minnesota Gun Rights,’ and his siblings, Christopher and Aaron. By Sunday, the groups had roughly 200,000 members combined, and they continued to expand quickly, days after President Trump endorsed such protests by suggesting citizens should ‘liberate’ their states.” Washington Post, “Pro-gun activists using Facebook groups to push anti-quarantine protests,” 4/19/​20. 

  3. This is explained in some detail on the Worldometer COVID-19 site. 

  4. You can take a look for yourself at the scatter plot of L values vs SSE, after 100 generations of evolution to 4/17 Johns Hopkins data. Members of the final population are shown in red, while replaced individuals from earlier generations are shown in blue. Most of the individuals who–as will likely be the case for many a Trump fan gathered with fellow idiots to protest the sensible pandemic response of his governor–have been displaced by natural selection have low enough fitness (high SSE) that they are not included in this plot. 

  5. Not by accident, I live in one of those blue states that actually has its shit together. We have been seeing around 1% new vs total cases per day, which is a fifth of the national average. Here’s what our non-idiot governor, for whom I will be honored to cast a vote (by mail, duh) in November, had to say today:

    “The president’s statements this morning encourage illegal and dangerous acts. He is putting millions of people in danger of contracting COVID-19. His unhinged rantings and calls for people to “liberate” states could also lead to violence. We’ve seen it before.

    “The president is fomenting domestic rebellion and spreading lies even while his own administration says the virus is real and is deadly, and that we have a long way to go before restrictions can be lifted.

    “Just yesterday, the president stood alongside White House officials and public health experts and said science would guide his plan for easing restrictions. The White House released a sensible plan laying out many of the guidelines that I agree are essential to follow, as we work to resume economic activity. Trump slowly read his script and said the plan was based on ‘hard, verifiable data’ and was done ‘in consultation with scientists, experts and medical professionals across government.’

    “Less than 24 hours later, the president is off the rails. He’s not quoting scientists and doctors but spewing dangerous, anti-democratic rhetoric.

    “We appreciate our continued communication with the vice president, Dr. Birx, Admiral Polowczyk, Admiral Giroir and others in the federal government, but their work is undermined by the president’s irresponsible statements.

    “I hope someday we can look at today’s meltdown as something to be pitied, rather than condemned. But we don’t have that luxury today. There is too much at stake.

    “The president’s call to action today threatens to undermine his own goal of recovery by further delaying the ability of states to amend current interventions in a safe, evidence-based way. His words are likely to cause COVID-19 infections to spike in places where social distancing is working — and if infections are increasing in those places, that will further postpone the 14 days of decline that his own guidance says is necessary before modifying any interventions.

    “I hope political leaders of all sorts will speak out firmly against the president’s calls for rebellion. Americans need to work together to protect each other. It’s the only way to slow the spread of this deadly virus and get us on the road to recovery.” 

  6. And that’s only the number with serious enough symptoms to get their infection confirmed by a test, which remain difficult to get in many parts of the country. 

  7. According to the Worldometer on 4/19 at 7:36 PM Pacific time, there have been 40,555 deaths from Covid-19 and 71,012 recovered, with hundreds of thousands of people fighting for their lives right now. Many of them will not make it. But sure, assholes, go hang out at the beach with the crowd. 

  8. Even I would advise caution about looking to its projections more than a week or so out. 

  9. I’m just a layman about such things, but I understand that the onset of Summer in the U.S. is one such possibility, with warmer temperatures that kill off the virus faster on surfaces. Sure hope so, but then we have Autumn to look forward to, and the example of the 1918 Influenza with its deadly second wave. 

Friday, April 10, 2020

Marching Towards a Million

[T]he feelings of knowing, correctness, conviction, and certainty aren’t deliberate conclusions and conscious choices. They are mental sensations that happen to us.
—Robert Burton. On Being Certain
Update, 4/12: The curve is continuing to flatten a bit more than the model is able to extrapolate from having its six parameters fitted to the data as it stands each day. (I certainly have no objection to things coming in on the low side, even if it reveals some room for further possible improvement in the model, or just some random-walk stochastic behavior that simply can’t be chased via curve fitting and extrapolation, even with more model parameters.) There’s not been enough of a difference to go through this post and do a bunch of edits, so I’ve just added a few footnotes and this link to the latest version of the Nightmare Plot. Don’t get complacent now, OK?

For just over a week now, I’ve been running and re-running the “logistic growth with growth-regime transition” model I’ve developed for reported cases of Covid-19, evolving populations of individuals whose digital DNA consists of six parameters for that model:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1

Two of the parameters, L and r, are for the conventional logistic growth component of the model. L is the maximum number of possible cases, which in the worst case is limited by the country’s population. The other parameter, r, is the initial exponential growth rate, before any curve-flattening becomes apparent.

The “growth-regime transition” component of the model, implemented by flatten(t), has three parameters of its own. There is a fractional reduction rf in growth rate once the lower-growth (flattened-curve) regime is fully in effect. My modification to the conventional logistic-growth model uses the tanh function to smoothly transition from the original growth rate r down to a lower “flattened” growth rate r*(1-rf) over a time interval th (in days). The transition interval is defined as the middle half of the full transition between growth rates, i.e., from still 75% of old vs new down to 25%. The midpoint of the transition is defined by the third parameter t0, in days after 1/22/​20.

Finally, a very small constant number of new cases per day is included as a sixth parameter b. This does manage to “earn its keep” in the model by more flexibly allowing the mean of the residuals to be zeroed out.
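For the curious, the two equations above translate directly into Python. This is just a transcription, with made-up parameter values in the sanity checks; it confirms that flatten(t) runs from 1 at early times down to 1 - rf once the flattened regime is fully in effect:

```python
import math

def flatten(t, rf, t0, th):
    """Smooth growth-regime transition: 1.0 well before t0, falling to
    1 - rf well after it. The 1.1 factor is about 2*atanh(0.5), which
    is what makes th the *middle half* of the transition."""
    return 0.5 * rf * (1 - math.tanh(1.1 * (t - t0) / th)) - rf + 1

def xd(t, x, L, r, rf, t0, th, b):
    """Modeled number of new reported cases per day at time t (days
    since 1/22/20), given cumulative reported cases x."""
    return r * x * (1 - x / L) * flatten(t, rf, t0, th) + b

# Sanity checks with made-up parameters rf=0.8, t0=60, th=10:
assert abs(flatten(-1e6, 0.8, 60, 10) - 1.0) < 1e-6  # long before t0
assert abs(flatten(1e6, 0.8, 60, 10) - 0.2) < 1e-6   # long after: 1 - rf
assert abs(flatten(60, 0.8, 60, 10) - 0.6) < 1e-9    # midpoint: 1 - rf/2
```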

Here is what the model is saying today, with Johns Hopkins data from this evening (4/10, click here for the latest one with 4/12 data).

April 10, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Extrapolating forward (admittedly a perilous enterprise, as I’ve said before with endlessly repeated disclaimers and caveats, applicable yet again here), the model is projecting three quarters of a million cases1 one week from today, and over a million the week after that.2 The model’s projection–both now and as it stood on 4/5–is that there will be around a million and a half Americans–one in every two hundred–reporting infection with Covid-19 in early May, with the number still climbing faster every day.3

The curve is bending downward a bit, yes, but things still look pretty grim.

Past Performance

You’ve probably heard the phrase “past performance does not indicate future results,” and that’s true if something happens (as it often does) that’s not accounted for by the model. Life is messy and complicated, and that includes pandemics. But shitty performance does tell you not to bother looking any further. And that’s definitely not what’s been happening with my little model.

With parameters evolved to yesterday’s Johns Hopkins data (4/9), it had projected today’s increase in the number of cumulative cases to be 32,920, instead of the 35,098 new cases that actually got reported today. That was off by about 6%. (I calculate the error from the projected vs actual increase from the most recent known value, because the model is anchored to that point and can only be given credit or blame for what it projects from that last known point.)
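That error convention is a one-liner; the figures here are the ones just quoted:

```python
def increase_error(projected_increase, actual_increase):
    """Fractional error of a projection, measured on the projected vs
    actual *increase* from the last known (anchored) value, not on the
    absolute cumulative totals."""
    return abs(projected_increase - actual_increase) / actual_increase

# 4/9 parameters projecting 4/10: 32,920 projected new cases vs the
# 35,098 actually reported, an error of about 6%.
err = increase_error(32920, 35098)
```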

With data from the day before yesterday (4/8), it projected yesterday’s number of new cases at 32,385. There were 32,305 new cases yesterday, an error of 0.2%.

And with data from 4/7, the model evolved to parameters that had it projecting 30,849 new cases the next day. There were 30,849 new cases on 4/8, a 6% error.

On 4/5, the model projected 32,346 vs the 32,133 that there were on 4/6, an error of just 0.7%.

The model of course does a little worse when it looks further ahead. You’d expect that of any naive model (i.e., not theoretically informed beyond “the growth rate is going down”) whose six empirically optimized parameters are being extrapolated out along exponential growth. And it’s certainly not anything I’m ashamed of.

On 4/3, the projection was for there to be just over 600,000 cases today, compared to the 496,535 we have had reported at this point in the U.S. Quite a bit off, but remember, that was looking forward a full week.

On 4/5, it was for there to be 510,425 cases today, a total error of less than 8%, again with the error measured from the projected vs actual increase, not the absolute number.4 On 4/7, the model projected there would be 486,367 cases as of today, and that was off (in the increase) by 10%.5

Evolution in Action

(In more ways than one, unfortunately.)

I’ve been wanting to write a little about the whole process of computer evolution that I use to fit the model’s six parameters to the time-series data. It begins with a population of 240 simulated organisms (not a virus!), digital “individuals” whose randomly chosen6 DNA consists of the values of those six parameters, within predefined bounds.
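“Randomly chosen” comes with the qualification in footnote 6: the initial population is drawn from a Latin hypercube, which puts exactly one sample in each of N equal slices of every parameter’s range, covering the six-dimensional search space with minimal clustering. A minimal sketch of the idea (my own illustration, with hypothetical bounds, not ade’s actual implementation):

```python
import random

def latin_hypercube(n, bounds):
    """Generate n individuals over per-parameter (lo, hi) bounds, with
    exactly one sample landing in each of n equal slices of every
    dimension: pseudorandom values, but minimal clustering."""
    dims = len(bounds)
    samples = [[0.0] * dims for _ in range(n)]
    for d, (lo, hi) in enumerate(bounds):
        slices = list(range(n))
        random.shuffle(slices)  # independent slice order per dimension
        width = (hi - lo) / n
        for i, s in enumerate(slices):
            samples[i][d] = lo + (s + random.random()) * width
    return samples

# Six parameters, 240 individuals, with hypothetical bounds:
population = latin_hypercube(240, [
    (1e6, 3e8),     # L: max possible cases, up to ~U.S. population
    (0.05, 0.5),    # r: initial daily growth rate
    (0.0, 1.0),     # rf: fractional growth-rate reduction
    (40.0, 120.0),  # t0: transition midpoint, days after 1/22/20
    (5.0, 60.0),    # th: transition interval, days
    (0.0, 500.0),   # b: constant daily new-case floor
])
```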

After working on this model for over a week, I’ve refined each of those bounds to a reasonable range of possible values. Updating those ranges as the model makes sometimes failed attempts to find a convincing best fit is my sole remaining human-engineering activity, now that the model is designed and the code implementing it is working.

Each of those 240 individuals is given a chance to show how fit its DNA is by running a backwards integration of the model from the last known data point. The model, you may recall from reading my previous blog post, is for the number of new cases each day, not the cumulative number of total cases ever.

The model is a differential equation; xd(t,x), not x(t). So, to have it project the cumulative number of cases x(t), I integrate the differential equation forward or backward from a fixed point represented by the most recent known number of cases, a point to which it is anchored.7
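That anchored, either-direction integration can be sketched with a classic fixed-step RK4 integrator (the actual covid19.py presumably uses its own solver; this toy and its exponential-growth check are mine):

```python
import math

def integrate(f, t_anchor, x_anchor, t_target, steps=1000):
    """Integrate dx/dt = f(t, x) from the anchor point to t_target
    using classic fourth-order Runge-Kutta. The step h comes out
    negative when t_target precedes the anchor, so the same loop runs
    backwards in time from the most recent known value."""
    h = (t_target - t_anchor) / steps
    t, x = t_anchor, x_anchor
    for _ in range(steps):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h * k1 / 2)
        k3 = f(t + h / 2, x + h * k2 / 2)
        k4 = f(t + h, x + h * k3)
        x += (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

# Check with pure exponential growth dx/dt = r*x and a one-day doubling
# time: integrating one day backward from 1,000 cases should give about
# 500, and one day forward about 2,000.
r = math.log(2)
back = integrate(lambda t, x: r * x, 10.0, 1000.0, 9.0)
fwd = integrate(lambda t, x: r * x, 10.0, 1000.0, 11.0)
```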

The modeled number of reported cases is compared to the actual number that were reported, for each day going back what is now a full five weeks’ worth of data.8

The fitness of each individual is measured as the sum of the squared errors (SSE) between each day’s modeled number of reported cases (the value that the model would expect, being integrated backwards) vs. the number of cases there actually were as of that date. The two figures are compared only after they have had a square-root transform applied to them. This limits how much more recent, larger numbers of new daily cases weigh in the fitness calculation vs earlier ones.
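A minimal sketch of that fitness calculation (the function name is mine, not necessarily ade’s API):

```python
import math

def sse_fitness(modeled, actual):
    """Sum of squared errors between modeled and actual daily case
    counts, each square-root transformed first so that the recent,
    larger counts don't completely drown out the earlier, smaller
    ones. Lower SSE means a fitter individual."""
    return sum((math.sqrt(m) - math.sqrt(a)) ** 2
               for m, a in zip(modeled, actual))

# Without the transform, a 10% miss on a 30,000-case day would outweigh
# a 10% miss on a 300-case day by a factor of 10,000 in the SSE; with
# the transform, the factor drops to 100.
```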

Then evolution gets underway, with each population member getting challenged in its turn by an individual spawned from combinations of not two but four population members. The model is run with each mutant offspring’s parameters through the integration and SSE calculation. If a challenger is better (lower SSE) than the population member it is challenging, it replaces it. When all members of the population have received their challenge, possibly having been replaced, evolution proceeds to the next generation.

The whole spawning process is worth a moment of technical discussion. Differential evolution uses an interesting9 sort of mathematical equivalent to some kind of alien four-way sex to create challengers. The trial individual is formed from “crossover” between a “target” (a rough DE equivalent of a mother) and a “donor” individual (the closest thing it has to a father). The donor is formed from the vector sum of a base individual and a scaled vector difference between two randomly chosen other individuals that are distinct from each other and from both the target and base individuals.

That is hard to follow in print, but this equation might help. The many-tentacled alien infant we want from this tangled act is ic, the individual challenging:

id = ib + F*(i0 - i1)
ic = crossover(it, id)

The crossover consists of giving each parameter of the donor individual a chance (usually a very good chance) to appear in the challenger, as opposed to using the target’s parameter. Basically, think of a higher number as favoring paternal rather than maternal inheritance. The default value used by my program (apologies for what is becoming an awkward analogy) is 0.7.
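Putting the donor and crossover equations together, here is a minimal sketch of how one challenger gets spawned (the function name, and forcing at least one donor gene, are my own illustration rather than ade’s exact implementation):

```python
import random

def spawn_challenger(population, target_index, F=0.8, CR=0.7):
    """Differential evolution's four-way spawning: donor = base +
    F*(difference of two others), then crossover with the target. Each
    individual is a list of the six model parameter values. F and CR
    here are illustrative defaults; CR is the "paternal inheritance"
    probability discussed above."""
    others = [i for i in range(len(population)) if i != target_index]
    b, d0, d1 = (population[i] for i in random.sample(others, 3))
    target = population[target_index]
    # Donor: id = ib + F*(i0 - i1)
    donor = [bi + F * (x0 - x1) for bi, x0, x1 in zip(b, d0, d1)]
    # Crossover: each parameter has probability CR of coming from the donor
    challenger = [di if random.random() < CR else ti
                  for di, ti in zip(donor, target)]
    # Ensure the challenger inherits at least one donor gene
    j = random.randrange(len(target))
    challenger[j] = donor[j]
    return challenger
```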

I refine the parameter bounds as needed and sit back while my ade Python package (of which this whole Covid-19 modeling is a single example file, covid19.py) has 75 generations of these simulated individuals spawn and fight it out on a virtual Darwinian savanna. The software dispatches SSE-calculating jobs asynchronously to worker Python interpreters across the timeslots of my six-core CPU. It takes about two minutes, and it produces a plot with the model’s curves in red and what actually happened in blue. It’s what I’ve been calling the Nightmare Plot.

Singing in the Apocalypse

The Nightmare Plot really is horrifying when you think about what those numbers represent. Their relentless upward march–slowing but by no means stopping–is making everyone’s life suck, including my own. The novelty of this whole apocalyptic survival thing is starting to wear off just a bit, even for me.

Maybe I’m a terrible person, but dammit I can’t help experiencing a bit of pride, too. This “little model that could” is proving to be the no-pay, no-fame, no-acceptance academic-mathematical equivalent of, say, some college undergraduate inventing an optimal radio receiver frequency arrangement, now in use by the circuitry of your smartphone, as part of an independent senior project he decided to work on weekend after weekend a quarter century ago. (This hypothetical individual never was much for working in groups.)

So, take it or leave it, folks, you’ve got yourself a six-parameter mathematical model for the number of reported cases of Covid-19. It was never “published” in some elite-accepted overpriced package of specialty information. It wasn’t part of any network of peer reviewers. (Like the aforementioned loner radio geek, I’ve never been one for playing in groups.)

But it does appear to work.

———

A friend of mine told me today that he is selfishly rather enjoying this whole situation. He now has lots of time to learn things on his own that he’s been wanting to work on, time for “driving out to pretty places to take pictures and go for walks in the woods.”

He’d rather be back doing what he does in person. But, he admits, there are upsides.

I assured him that it’s 100% OK to enjoy those upsides, even as I admitted my own feeling of finding it a little weird to derive satisfaction from successfully modeling these awful numbers. But I had the benefit of receiving yesterday some reassurance in this area, as I was talking about this very topic with a friend of mine whose life’s work revolves around how people think and feel about things.

He told me he can’t be as helpful to others in his profession if he isn’t taking care of himself, and that means enjoying life in spite of or even at times because of what is otherwise a horrible situation. Of course he’d rather not be in it, nor would I or you, dear reader with a delicate pair of lungs of your own. But he is, and you are, and I am, and so let’s take what good there is to be had.

Smile and sing and laugh, and take pride in the work that you now have. Even through the Apocalypse.

Notes


  1. With data updated as of 4/12, the projection is now 700,000 by 4/17. A bit lower, and the curve is continuing to flatten, but not by much.

    The careful reader may notice that I always refer to “reported U.S. cases” or some alternate variation and wording. I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many of our fellow citizens have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past two weeks.

  2. With data updated as of 4/12, the projection is now for a bit less than 900,000 cases by 4/24. One redditor cleverly observed that my modeling’s million-case projections have been like those for fusion energy. (The saying is that fusion is 30 years away and always will be.) I won’t dispute that observation at this point; each day’s new data for the past week or so has pushed that projection outward a bit, though never making it look any more implausible to reach eventually. 

  3. Let’s call it mid-May now, given the recent additional curve flattening (4/12 data). 

  4. Although the difference between the two ways of computing the error is smaller with such a large increase from the 4/5 last-known number of 337,072. In case you really want to know, the absolute error from the model projecting forward five days was 2.8%. 

  5. The actual number of cases as of 4/12 from Johns Hopkins’ daily data was 555,313, an increase of 58,778. On 4/10, the model was projecting 567,306 cases, or a projected increase of 70,771. The error in the increase was 20% over the two-day interval, or 10% per day. Not as good as previous days’ next-day predictive performance, but not terrible, either. And since there will always be an error when extrapolating from a curve fit to data having a random component, I’m happy the data is lower than projected and not higher, because I have my own delicate pair of lungs that I’ve grown fond of, too. 

  6. Actually, they’re not quite chosen with uniform randomness: The default is to set up the initial population using a Latin hypercube. That way you get pseudorandom initial parameter values, but with minimal clustering. You want the six-dimensional search space to be explored as fully as possible. 

  7. This is known as an “initial value problem,” the initial value here being the last known number of cases. You can go either direction in time from the initial value. For fitting the model parameters, my algorithm goes backwards from the most recent data. To extrapolate forward and make projections, it goes forward from the same “initial value.” 

  8. Reported case numbers before March 5 are omitted from both the curve fitting and the plots. The modeled (red) curve deviates from the historical data when you go earlier, and I’m not sure a good fit that far back in the history of this thing is relevant to what’s happening now. 

  9. You claim you wouldn’t find alien four-way procreative sex interesting? Well, I don’t believe you.