Sunday, April 19, 2020

Coming Through in Waves

You are only coming through in waves
Your lips move but I can’t hear what you’re saying
—Roger Waters, Comfortably Numb (Pink Floyd, 1980)
Update, 4/20/20: Rather than dwell on the accuracy or lack of same in the model’s projections anymore–as if this were some kind of grand chess match where the Knights and Bishops don’t actually trample and lance each other–I will just offer the latest Nightmare Plot with a few observations. The one-million mark has received yet another delay, of one day to 4/28, and the model still projects 1.5M reported cases sometime in the second week of May, probably toward the end of that week. The curve continues to fit past data uncannily well even if it has shortcomings with extrapolations more than a few days out.

Most importantly, I am starting to question the wisdom of continuing these updates of a “reported cases” statistic that I’ve begun to suspect is being influenced by delayed and limited testing as much as or more than by actual suppression of the virus. I do not want to participate in drawing attention to any flattening of a curve if what is pushing it down is a disconnect–whether due to cold political calculations or incompetence–between what that curve actually shows and the actual magnitude of this national and global crisis. Perhaps in Hollywood it doesn’t matter if the curves are real, but this curve is an important one indeed.

In my previous post a little over a week ago, I summarized the “logistic growth with growth-regime transition” model I’ve developed for reported cases of Covid-19 and offered–with many disclaimers that I include now as well–a few projections:

three quarters of a million cases one week from today, and over a million the week after that. The model’s projection–both now and as it stood on 4/5–is that there will be around a million and a half Americans–one in every two hundred–reporting infection with Covid-19 in early May, with the number still climbing faster every day.

The exact projection for one week from 4/10 (to a degree of precision only useful for evaluating past performance) was 758,921 reported cases in the U.S. On 4/17, there were 699,706. The model was 29% pessimistic about the week’s increase. Instead of 262,386 more cases, there were “only” 203,171.

This works out to a (geometric) average of 3.7% per day. In other words, if the model were that pessimistic every day, projecting 3.7% more than the number of cases that actually were reported and without updating its parameters to try to do better the next day, we would wind up with this much of an error.
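That weekly-vs-daily arithmetic is easy to check; here’s a minimal sketch using the two increases just quoted (not part of the model code):

```python
# Geometric per-day average of the week's projection error, using the
# increases quoted above: 262,386 projected vs 203,171 actual new cases.
projected_increase = 262_386
actual_increase = 203_171

# The model overshot the week's increase by this ratio (~1.29, i.e. 29%)
ratio = projected_increase / actual_increase

# Spread evenly across the 7 days: the equivalent daily error ratio
daily_ratio = ratio ** (1 / 7)

print(f"week: {(ratio - 1) * 100:.0f}% pessimistic")
print(f"per day: {(daily_ratio - 1) * 100:.1f}%")
```

The seventh root is what makes the average geometric rather than arithmetic: compounding a 3.7% daily overshoot for seven days reproduces the 29% weekly overshoot.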

That doesn’t seem very far off to me.

With its parameters evolved to fit today’s data (4/19), the model is projecting that the million-case mark will be reached by 4/27, a few days later than it expected a week ago. As far as its projection of 1.5M cases goes, it expects that sometime in the second week of May. Perhaps not quite “early May,” but there’s good reason to use imprecise language when making projections.1

Here’s the latest Nightmare Plot. As always, you can click on it to see the important details.

April 19, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Testing, Testing . . .

Unfortunately, I’m starting to wonder if this modeling of reported cases may be doing more harm than good. The comments in the next few paragraphs are from a layman in the relevant fields of biology, medicine, what’s actually going on with this supremely fucked-up White House, etc. But I think it’s time to discuss reported vs actual cases of Covid-19 in the United States.

The fact that the numbers of daily new cases have fairly suddenly hit a ceiling of around 30,000 for about a week now makes me suspicious that my model has become one for how fast we’re able to get people tested, and not for a realistic metric of how many people are infected. If the places doing the testing are unable or unwilling to process more than a certain number per day, of course that is going to result in a flattened curve.

Meanwhile, the numbers of actual infections and sick people waiting for their positive test result may be continuing to soar, unseen by all but victims’ frantic families and, when they finally give up on dealing with it at home, medical personnel who may not be testing even symptomatic patients. The testing backlog may grow and grow, and plots like mine will make it look like everything’s under control because the number of reported cases isn’t rising much at all.

But that can only go on for so long. Eventually, if there is indeed an explosion of actual cases that’s hidden behind the reported-cases curve, it will become obvious to millions of Americans that we’re actually fucked and this thing is going to kill a lot of people. It’s already certain that it will kill many more Americans–perhaps more per capita, too–than citizens of any other country.

I wonder about the armed lunkheads now staging protests agitated by shadowy right-wing groups at the encouragement of their cult leader2 who clearly doesn’t care about anyone but himself and his hold on power. What would their slow little minds do with that undeniable reality after making fools of themselves and infecting themselves, their families, and friends? What sort of toxic dangerous bullshit will their hero be spewing then to cover yet another embarrassment he couldn’t possibly bring himself to face up to like a man, after a long pathetic lifetime of dodging one failure after another?

Track Record

Now, let’s go back to the neat little fantasyland of reported cases, because that’s all I am really qualified to deal with at this point. (I may switch to reported deaths soon; that seems harder to hide.)

In case you’re curious about how the model has been doing with more recent projections of what today’s number would be, here are a few.

On 4/18 (yesterday), the model projected 763,369 cases. There were 759,086. The model was pessimistic by 16% about the number of new cases there would be today.

The previous day, 4/17, the model projected 759,553 cases today, an error (in the increase) of 0.8% for the two days. It was on the pessimistic side–as if that matters with such a small error.

On 4/16, the model projected that there would be 752,522 cases today. The error in the increase (optimistic that time) was 12.4%. But that was over the course of three days, a time interval when the number of reported cases increased nearly sixty thousand. The (geometric) mean error per day was 4%.

———

It may be worth considering some other projections to other dates. The model’s next-day projection on 4/16 was for there to be 696,432 cases on 4/17, which was too optimistic by 10%. The previous day, 4/15, the model projected 689,026 cases on 4/17, an average daily error (in the increase) of 9.7%, also on the optimistic side. And on 4/14, when there were 605,193 cases, the model was projecting 679,654 on 4/17, an average daily error of 8.3%, again only counting the projected vs actual increase and not the absolute number of cases, again on the optimistic side.

Those errors are the result of some bigger new-case numbers coming in than would have been expected had the data conformed to the gradual decrease in growth rate that the model had evolved its parameters to fit each day. Each time a new day’s Johns Hopkins data came in, the parameters got changed to better fit the curve, and that resulted in next-day projections that wound up being lower than what actually happened.

What happened is at least in part that the official U.S. case counts began to “include both confirmed and probable cases and deaths.”3 It doesn’t seem that many states have as yet switched to this new reporting protocol, but it surely has and will continue to increase the numbers my model sees from the Johns Hopkins data.

There was a point where the model’s projection for how many cases there would be on 4/17 ceased to be optimistic and landed pretty much right on. That was 4/12, with 555,313 cases, before those larger numbers started showing up this past week. Then the model projected 701,642 cases on 4/17. That projected increase was off by +1.3%, or an average of 0.3% per day. (It was pessimistic, but by so little it hardly seems worth mentioning which way it was off.)

So, what did the model say about today–one week later–with its parameters evolved to data from 4/12? Its projection was that we would be seeing 755,274 reported cases in the United States today. Again, there were 759,086. That’s an error in the projected (199,961) vs actual (203,773) increase of 1.9%, or 0.3% per day.

I think that counts as a successful, if horrific, extrapolation.

The Sky is the Limit

With data from a couple days ago (4/17), I thought of the Futurama TV show and how Professor Farnsworth often announces some challenging and arguably terrible development by saying, “Good news, everyone!” I was tempted to say the same about the scatter plot of values vs SSE for the model parameter L.

As you may recall, L and r are parameters for the conventional logistic growth component of the model. L is the maximum number of possible cases, which at worst case is limited by the country’s population. The other parameter r is the initial exponential growth rate, before any curve-flattening becomes apparent.

Well, for the first time I was seeing the hints of a possible range for L, and hence a projection by the model of an ultimate worst-case number of U.S. reported cases. That was the “good” part of the “Good news” cackled by a goggle-eyed cartoon character. The challenging and arguably terrible part was that this faint preliminary outline of a range for L, the maximum number of U.S. reported cases ever, seemed to be anywhere from 50 to 200 million.4

Unfortunately, even that bit of good news is no longer apparent in the scatter plot for L. Indeed, the best-fit combination of parameters, after nearly 22,000 simulation runs, has its value of L pretty much at the whole U.S. population. That’s basically right at the upper limit of the parameter range.

We might take a long time to get there, says the model with its parameters evolved to the data for the past week or so, but we will go very high indeed before we’re done.

I won’t include such outlandish projections in the Nightmare Plot, though, because too much can and will change between now and then. Like, say, the President of the United States tweeting out calls for insurrection against state governments5 and convincing his cult followers to go out and act like it’s just no big deal that three quarters of a million6 of their fellow citizens have now contracted a virus that has killed nearly half as many as those who are officially listed as “recovered.”7

Looking Back vs Forward

Or, Why you should perhaps consider taking this model seriously

It’s been an interesting experience posting these updates to the r/Coronavirus subreddit. Each posting gets this information to a few thousand people, so it still seems worthwhile to continue doing so. Better yet, I have gotten some valuable constructive criticism, sprinkled in between a few stupid lazy insults. Some of that criticism has resulted in direct improvements, such as the inclusion of a normality test on the residuals. (Turns out they are about as normal as it gets, with a p value of over 0.9.) And some of the comments are just plain funny, like this one from “derphurr” regarding my April 10 blog post:

To be fair I’ve been following your numbers from the beginning post. Your 1M cases date has been moving outwards every week/ updated graph.

It might be like fusion energy, it is technically possible we never get there.

Unfortunately, I think the model will beat fusion. (The classic line is “Fusion energy is 30 years away and always will be.”) Its current projection for a million U.S. cases is just a few days later than it was when that comment was made: April 27 rather than April 23.

Another redditor “chalion” from Argentina quoted this from the 4/10 blog post: “On 4/3, the projection was for there to be just over 600,000 cases today, compared to the 496,535 we have had reported at this point in the U.S. Quite a bit off, but remember, that was looking forward a full week.”

Admitting it was harsh (and later graciously apologizing for that), he or she then offered this critique of my extrapolations:

It’s a model that can’t predict a week by a large margin. Clearly it isn’t working. Its just adjusting it daily to past data but says nothing about the future.

He isn’t making a model of the epidemic, it’s making a tool to find a curve close to the current data. Its good and interesting but I don’t think it’s a model of the evolution of this thing.

I responded, and he or she did in kind (very reasonably), and I’ll quote that exchange for you. But first let me observe that the critique of the 4/3 projection remains valid. The model was expecting a little more than twice as many cases as have been reported thus far. The curve flattened a lot more since then. I won’t complain about that at all, even if it somewhat proves the model’s limitations, at least for extrapolations more than a few days out.

Here’s what I said in response:

You raise some interesting points. I don’t think your assessment in that last paragraph is unfair.

This would only be a model of the epidemic under the assumption that the reported cases will continue to increase in the fashion they have for the past two weeks. If they do, then it will have modeled the epidemic. If they don’t, then something else has occurred to cause the numbers to change, in a way that the model failed to incorporate.

Updating the parameters to fit the new situation is a natural way to continue the process of evolving them. This doesn’t change what the model predicted with the previous parameters. If the model is within 8% of the next day’s number of new cases, I don’t see that as having no predictive value, as “not working.” It seems to me it works quite well for short-term projections.

As far as its alleged inability to predict one week out (from 4/3), the error of its projecting 346,635 new cases rather than the 220,949 that actually got reported between 4/3 and 4/10 is considerable, 56%. I would just ask that the magnitude of that actual increase be considered, too, when passing judgment: the number of cases nearly doubled (1.8x) in that time. It was projecting the number to increase by 2.26x. Not by, say, extrapolating straight out from the 13.5% daily increase that had happened the previous two days and projecting 2.4x. The model was moving the curve, and in the right direction, just not as much as the data then indicated it would.

I’ve decided to take seriously your harshly-worded comment and give it the detailed response it deserves. You do make some good points, and there’s definitely room for humility with the longer-term performance of this model thus far. I’m glad you at least thought it was good and interesting in its own way, and I’d like it to be looked at seriously as an approach for something we both at least agree it works for, fitting a curve close to data of this Covid-19 U.S. epidemic thus far, even if we part company when it comes to extrapolating from that fit.

I will continue to ask for a bit of forbearance about even the large inaccuracy from 4/2 to today. We are still dealing with largely exponential growth, and even the actual reported numbers have been exploding. The model was projecting that we’d have a bit more than 6x as many cases today as we did then, and what we have is around 2.8x as many. If the model had kept on extrapolating that 13.5% daily increase we were having then, seventeen days ago, it would have projected 8.6x as many. It was accounting for some of the curve-flattening that happened, but not enough.

Now, here’s what my interlocutor, who was watching this two months ago as an employee of Argentina’s Ministry of Health, said in response to that:

Yes. I have to agree with you. You are right in every point you make. Even when you describe my wording as harsh. I’m sorry for that.

I’m trying the same as you, modelling Covid19 in my country (Argentina) and I don’t think my results are better than yours. Maybe that was that frustration talking.

I will look at yours model in more detail tomorrow and try to make constructive criticism and not what I did

Keep the effort. And thanks for your thoughtful answer.

Thank you, sir or madam, for the gracious words and also for reminding me about the very real limitations of extrapolating a curve that’s been fitted to time-series data. As a gesture of appreciation, I ran the model on Argentina’s data today and you can see the plot here. I can see why your modeling efforts have gotten you frustrated; my model doesn’t fit what’s been (officially) happening in your country that well either, with a fat-tailed residual distribution that is unlikely (p=0.039) to be normal random variation. Maybe it’s all the jumps in the time-series data for your country, which fortunately still has a tiny number of cases per capita.

Now, even if it’s true that the model is only fitting close to current data and can’t be trusted for what it says will happen in the future,8 I do think it’s done a damn fine job of characterizing what’s happened thus far, at least in the U.S. Just take a glance at that plot above (the top subplot) and the percentage errors in what it’s fitted to the past month of daily data, with just six parameters.

The best fit is essentially a perfect one going back nearly a week. Then it has errors in the single-digit percentages for all of April and the last few days of March.

The error between modeled and actual reported cases doesn’t reach 20% until you go back to 3/14. That’s a long time ago, in pandemic time. The number of cases has multiplied 278 times since then. It almost certainly has one more doubling left to go. If there isn’t some new level of curve flattening that my model can’t anticipate with its best fit to today’s data, there will be more doublings to come.9 Meanwhile, the deranged narcissist in the White House is pushing to get everybody out there infecting each other again.

Hang on, my fellow citizens of the world’s newest and biggest failed state. I think we may be in for a long and bumpy ride.

Notes


  1. The data has been coming in pretty much as expected, and these projections are the same as they have been for the past two days. 

  2. “A trio of far-right, pro-gun provocateurs is behind some of the largest Facebook groups calling for anti-quarantine protests around the country, offering the latest illustration that some seemingly organic demonstrations are being engineered by a network of conservative activists.

    “The Facebook groups target Wisconsin, Ohio, Pennsylvania and New York, and they appear to be the work of Ben Dorr, the political director of a group called ‘Minnesota Gun Rights,’ and his siblings, Christopher and Aaron. By Sunday, the groups had roughly 200,000 members combined, and they continued to expand quickly, days after President Trump endorsed such protests by suggesting citizens should ‘liberate’ their states.” Washington Post, “Pro-gun activists using Facebook groups to push anti-quarantine protests,” 4/19/20. 

  3. This is explained in some detail on the Worldometer COVID-19 site. 

  4. You can take a look for yourself at the scatter plot of L values vs SSE, after 100 generations of evolution to 4/17 Johns Hopkins data. Members of the final population are shown in red, while replaced individuals from earlier generations are shown in blue. Most of the individuals who–as will likely be the case for many a Trump fan gathered with fellow idiots to protest the sensible pandemic response of his governor–have been displaced by natural selection have low enough fitness (high SSE) that they are not included in this plot. 

  5. Not by accident, I live in one of those blue states that actually has its shit together. We have been seeing around 1% new vs total cases per day, which is a fifth of the national average. Here’s what our non-idiot governor, for whom I will be honored to cast a vote (by mail, duh) in November, had to say today:

    “The president’s statements this morning encourage illegal and dangerous acts. He is putting millions of people in danger of contracting COVID-19. His unhinged rantings and calls for people to “liberate” states could also lead to violence. We’ve seen it before.

    “The president is fomenting domestic rebellion and spreading lies even while his own administration says the virus is real and is deadly, and that we have a long way to go before restrictions can be lifted.

    “Just yesterday, the president stood alongside White House officials and public health experts and said science would guide his plan for easing restrictions. The White House released a sensible plan laying out many of the guidelines that I agree are essential to follow, as we work to resume economic activity. Trump slowly read his script and said the plan was based on ‘hard, verifiable data’ and was done ‘in consultation with scientists, experts and medical professionals across government.’

    “Less than 24 hours later, the president is off the rails. He’s not quoting scientists and doctors but spewing dangerous, anti-democratic rhetoric.

    “We appreciate our continued communication with the vice president, Dr. Birx, Admiral Polowczyk, Admiral Giroir and others in the federal government, but their work is undermined by the president’s irresponsible statements.

    “I hope someday we can look at today’s meltdown as something to be pitied, rather than condemned. But we don’t have that luxury today. There is too much at stake.

    “The president’s call to action today threatens to undermine his own goal of recovery by further delaying the ability of states to amend current interventions in a safe, evidence-based way. His words are likely to cause COVID-19 infections to spike in places where social distancing is working — and if infections are increasing in those places, that will further postpone the 14 days of decline that his own guidance says is necessary before modifying any interventions.

    “I hope political leaders of all sorts will speak out firmly against the president’s calls for rebellion. Americans need to work together to protect each other. It’s the only way to slow the spread of this deadly virus and get us on the road to recovery.” 

  6. And that’s only the number with serious enough symptoms to get their infection confirmed by a test, which remain difficult to get in many parts of the country. 

  7. According to the Worldometer on 4/19 at 7:36 PM Pacific time, there have been 40,555 deaths from Covid-19 and 71,012 recovered, with hundreds of thousands of people fighting for their lives right now. Many of them will not make it. But sure, assholes, go hang out at the beach with the crowd. 

  8. Even I would advise caution about looking to its projections more than a week or so out. 

  9. I’m just a layman about such things, but I understand that the onset of Summer in the U.S. is one such possibility, with warmer temperatures that kill off the virus faster on surfaces. Sure hope so, but then we have Autumn to look forward to, and the example of the 1918 Influenza with its deadly second wave. 

Friday, April 10, 2020

Marching Towards a Million

[T]he feelings of knowing, correctness, conviction, and certainty aren’t deliberate conclusions and conscious choices. They are mental sensations that happen to us.
—Robert Burton, On Being Certain
Update, 4/12: The curve is continuing to flatten a bit more than the model is able to extrapolate from having its six parameters fitted to the data as it stands each day. (I certainly have no objection to things coming in on the low side, even if it reveals some room for further possible improvement in the model, or just some random-walk stochastic behavior that simply can’t be chased via curve fitting and extrapolation, even with more model parameters.) There’s not been enough of a difference to go through this post and do a bunch of edits, so I’ve just added a few footnotes and this link to the latest version of the Nightmare Plot. Don’t get complacent now, OK?

For just over a week now, I’ve been running and re-running the “logistic growth with growth-regime transition” model I’ve developed for reported cases of Covid-19, evolving populations of individuals whose digital DNA consist of six parameters for that model:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1

Two of the parameters, L and r, are for the conventional logistic growth component of the model. L is the maximum number of possible cases, which at worst case is limited by the country’s population. The other parameter r is the initial exponential growth rate, before any curve-flattening becomes apparent.

The “growth-regime transition” component of the model, implemented by flatten(t), has three parameters of its own: There is a fractional reduction rf in growth rate once the lower-growth (flattened-curve) regime is fully in effect. My modification to the conventional logistic-growth model uses the tanh function to smoothly transition from the original growth rate r to a lower “flattened” growth rate r*(1-rf) over a time interval th (in days). The transition interval is defined as the middle half of the full transition between growth rates, i.e., from still 75% of old vs new down to 25%. The midpoint of the transition is defined by the third parameter t0, in days after 1/22/20.

Finally, a very small constant number of new cases per day is included as a sixth parameter b. This does manage to “earn its keep” in the model by more flexibly allowing the mean of the residuals to be zeroed out.
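For readers who want to tinker, those two equations translate directly into a few lines of Python. This is just my sketch of the math as written above, not the actual covid19.py code:

```python
from math import tanh

def flatten(t, rf, t0, th):
    # Growth-regime transition: equals 1.0 well before t0 (full growth
    # rate) and 1 - rf well after it (flattened growth rate). The 1.1
    # factor makes th span the middle half (75% down to 25%) of the
    # transition, centered on day t0.
    return 0.5 * rf * (1 - tanh(1.1 * (t - t0) / th)) - rf + 1

def xd(t, x, r, L, rf, t0, th, b):
    # Modeled number of new reported cases on day t (days after 1/22),
    # given x cumulative cases: logistic growth, scaled by the
    # growth-regime transition, plus the small constant b.
    return r * x * (1 - x / L) * flatten(t, rf, t0, th) + b
```

Note that flatten(t) settles at exactly 1-rf for large t, which is why rf reads as a fractional reduction in growth rate rather than the remaining fraction.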

Here is what the model is saying today, with Johns Hopkins data from this evening (4/10, click here for the latest one with 4/12 data).

April 10, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Extrapolating forward (admittedly a perilous enterprise, as I’ve said before with endlessly repeated disclaimers and caveats, applicable yet again here), the model is projecting three quarters of a million cases1 one week from today, and over a million the week after that.2 The model’s projection–both now and as it stood on 4/5–is that there will be around a million and a half Americans–one in every two hundred–reporting infection with Covid-19 in early May, with the number still climbing faster every day.3

The curve is bending downward a bit, yes, but things still look pretty grim.

Past Performance

You’ve probably heard the phrase “past performance does not indicate future results,” and that’s true if something happens (as it often does) that’s not accounted for by the model. Life is messy and complicated, and that includes pandemics. But shitty performance does tell you not to bother looking any further. And that’s definitely not what’s been happening with my little model.

With parameters evolved to yesterday’s Johns Hopkins data (4/9), it had projected today’s increase in the number of cumulative cases to be 32,920 instead of the 35,098 new cases that actually got reported today. That was off by about 6%. (I calculate the error from the projected vs actual increase from the most recent known value, because the model is anchored to that point and can only be given credit or blame for what it projects from that last known point.)

With data from the day before yesterday (4/8), it projected yesterday’s number of new cases at 32,385. There were 32,305 new cases yesterday, an error of 0.2%.

And with data from 4/7, the model evolved to parameters that had it projecting 30,849 new cases the next day. There were 30,849 new cases on 4/8, a 6% error.

On 4/5, the model projected 32,346 vs the 32,133 that there were on 4/6, an error of just 0.7%.

The model of course does a little worse when it looks further ahead. You’d expect that of any naive model (i.e., not theoretically informed beyond “the growth rate is going down”) with six empirically optimized parameters being extrapolated out along exponential growth. And it’s certainly not anything I’m ashamed of.

On 4/3, the projection was for there to be just over 600,000 cases today, compared to the 496,535 we have had reported at this point in the U.S. Quite a bit off, but remember, that was looking forward a full week.

On 4/5, it was for there to be 510,425 cases today, a total error of less than 8%, again with the error measured from the projected vs actual increase, not the absolute number.4 On 4/7, the model projected there would be 486,367 cases as of today, and that was off (in the increase) by 10%.5

Evolution in Action

(In more ways than one, unfortunately.)

I’ve been wanting to write a little about the whole process of computer evolution that I use to fit the model’s six parameters to the time-series data. It begins with a population of 240 simulated organisms (not a virus!), digital “individuals” whose randomly chosen6 DNA consists of the values of those six parameters, within predefined bounds.

After working on this model for over a week, I’ve refined each of those bounds to a reasonable range of possible values. Updating those ranges as the model makes sometimes failed attempts to find a convincing best fit is my sole remaining human-engineering activity, now that the model is designed and the code implementing it is working.

Each of those 240 individuals is given a chance to show how fit its DNA is by running a backwards integration of the model from the last known data point. The model, you may recall from reading my previous blog post, is for the number of new cases each day, not the cumulative number of total cases ever.

The model is a differential equation: xd(t, x), not x(t). So, to have it project the cumulative number of cases x(t), I integrate the differential equation forward or backward from a fixed point represented by the most recent known number of cases, a point to which it is anchored.7
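The anchoring is the key idea, and it can be illustrated with naive one-day Euler stepping. This is just a sketch (the real code uses a proper ODE integrator), and the constant-3%-growth toy model in the usage line is made up:

```python
def integrate(xd, x_known, t_known, t_end):
    # Step the daily-new-cases model xd(t, x) one day at a time from
    # the anchor point (t_known, x_known) to t_end, forward or
    # backward. Returns a dict mapping day -> modeled cumulative cases.
    step = 1 if t_end >= t_known else -1
    x, result = x_known, {t_known: x_known}
    for t in range(t_known, t_end, step):
        # Forward, add the day's modeled new cases; backward, subtract
        # them to recover earlier cumulative totals.
        x = x + step * xd(t, x)
        result[t + step] = x
    return result

# Hypothetical toy model: 3% daily growth, anchored at 1,000 cases on
# day 80, integrated backward to day 73
cases = integrate(lambda t, x: 0.03 * x, x_known=1000, t_known=80, t_end=73)
```

Either direction, the curve is pinned to the most recent known data point, so all credit or blame attaches to the modeled increases away from it.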

The modeled number of reported cases is compared to the actual number that were reported, for each day going back what is now a full five weeks’ worth of data.8

The fitness of each individual is measured as the sum of the squared errors (SSE) between each day’s modeled number of reported cases (the value that the model would expect, being integrated backwards) vs. the number of cases there actually were as of that date. The two figures are compared only after they have had a square-root transform applied to them. This limits how much more recent, larger numbers of new daily cases weigh in the fitness calculation vs earlier ones.
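In code, that fitness calculation amounts to something like this (my sketch of the description above, not the actual ade code):

```python
from math import sqrt

def sse_sqrt(modeled, actual):
    # Fitness: sum of squared errors between modeled and actual daily
    # new-case counts, each square-root transformed first so that the
    # recent, larger daily numbers don't swamp the earlier, smaller
    # ones. Lower is fitter.
    return sum((sqrt(m) - sqrt(a)) ** 2 for m, a in zip(modeled, actual))
```

An individual’s six parameters are exactly as good as the SSE this returns for the backwards-integrated curve they produce.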

Then evolution gets underway, with each population member getting challenged in its turn by a trial individual spawned from a combination of not two but four population members. The model is run through the integration and SSE calculation with each of these mutant offspring’s parameters. If a challenger is better (lower SSE) than the population member it challenges, it replaces it. When all members of the population have received their challenge, possibly having been replaced, evolution proceeds to the next generation.

The whole spawning process is worth a moment of technical discussion. Differential evolution uses an interesting9 sort of mathematical equivalent to some kind of alien four-way sex to create challengers. The trial individual is formed from “crossover” between a “target” (a rough DE equivalent of a mother) and a “donor” individual (the closest thing it has to a father). The donor is formed from the vector sum of a base individual and a scaled vector difference between two randomly chosen other individuals that are distinct from each other and from both the target and base individuals.

That is hard to follow in print, but this equation might help. The many-tentacled alien infant we want from this tangled act is ic, the individual challenging:

id = ib + F*(i0 - i1)
ic = crossover(it, id)

The crossover consists of giving each parameter of the donor individual a chance (usually a very good chance) to appear in the challenger, as opposed to using the target’s parameter. Think of a higher crossover probability as favoring paternal over maternal inheritance. The default value used by my program (apologies for what is becoming an awkward analogy) is 0.7.
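
In code, the whole spawning step might look like this. It is a sketch of the classic DE/rand/1/bin scheme, not the actual ade implementation; all names, and the scaling factor F, are my assumptions:

```python
import numpy as np

rng = np.random.default_rng()

def spawn_challenger(population, i_target, F=0.8, CR=0.7):
    """Build a challenger ic for population[i_target] (the target it):
    donor id = ib + F*(i0 - i1), then binomial crossover between
    target and donor."""
    n_pop, n_params = population.shape
    # Pick base, i0, i1: mutually distinct and distinct from the target.
    choices = [k for k in range(n_pop) if k != i_target]
    i_base, i0, i1 = rng.choice(choices, size=3, replace=False)
    donor = population[i_base] + F * (population[i0] - population[i1])
    # Crossover: each parameter has probability CR of coming from the
    # donor ("paternal"); at least one is forced from the donor so the
    # challenger never equals the target exactly.
    mask = rng.random(n_params) < CR
    mask[rng.integers(n_params)] = True
    return np.where(mask, donor, population[i_target])
```

Each challenger then gets its SSE computed; if it beats the target, it takes the target's place in the next generation.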

I refine the parameter bounds as needed and sit back while my ade Python package (of which this whole Covid-19 modeling is a single example file, covid19.py) has 75 generations of these simulated individuals spawn and fight it out on a virtual Darwinian savanna. The software dispatches SSE-calculating jobs asynchronously to worker Python interpreters across the timeslots of my six-core CPU. It takes about two minutes. The software produces a plot with the model’s curves in red and what actually happened in blue. It’s what I’ve been calling the Nightmare Plot.

Singing in the Apocalypse

The Nightmare Plot really is horrifying when you think about what those numbers represent. Their relentless upward march–slowing but by no means stopping–is making everyone’s life suck including my own. The novelty of this whole apocalyptic survival thing is starting to wear off just a bit, even for me.

Maybe I’m a terrible person, but dammit I can’t help experiencing a bit of pride, too. This “little model that could” is proving to be the no-pay, no-fame, no-acceptance academic-mathematical equivalent of, say, some college undergraduate inventing an optimal radio receiver frequency arrangement, now in use by the circuitry of your smartphone, as part of an independent senior project he decided to work on weekend after weekend a quarter century ago. (This hypothetical individual never was much for working in groups.)

So, take it or leave it, folks, you’ve got yourself a six-parameter mathematical model for the number of reported cases of Covid-19. It was never “published” in some elite-accepted overpriced package of specialty information. It wasn’t part of any network of peer reviewers. (Like the aforementioned loner radio geek, I’ve never been one for playing in groups.)

But it does appear to work.

———

A friend of mine told me today that he is selfishly rather enjoying this whole situation. He now has lots of time to learn things on his own that he’s been wanting to work on, time for “driving out to pretty places to take pictures and go for walks in the woods.”

He’d rather be back doing what he does in person. But, he admits, there are upsides.

I assured him that it’s 100% OK to enjoy those upsides, even as I admitted my own feeling of finding it a little weird to derive satisfaction from successfully modeling these awful numbers. But I had the benefit of receiving yesterday some reassurance in this area, as I was talking about this very topic with a friend of mine whose life’s work revolves around how people think and feel about things.

He told me he can’t be as helpful to others in his profession if he isn’t taking care of himself, and that means enjoying life in spite of or even at times because of what is otherwise a horrible situation. Of course he’d rather not be in it, nor would I or you, dear reader with a delicate pair of lungs of your own. But he is, and you are, and I am, and so let’s take what good there is to be had.

Smile and sing and laugh, and take pride in the work that you now have. Even through the Apocalypse.

Notes


  1. With data updated as of 4/12, the projection is now 700,000 by 4/17. A bit lower, and the curve is continuing to flatten, but not by much.

    The careful reader may notice that I always refer to “reported U.S. cases” or some variation of that wording. I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many of our fellow citizens have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past two weeks.

  2. With data updated as of 4/12, the projection is now for a bit less than 900,000 cases by 4/24. One redditor cleverly observed that my modeling’s million-case projections have been like those for fusion energy. (The saying is that fusion is 30 years away and always will be.) I won’t dispute that observation at this point; each day’s new data for the past week or so has pushed that projection outward a bit, though never making it look any more implausible to reach eventually. 

  3. Let’s call it mid-May now, given the recent additional curve flattening (4/12 data). 

  4. The difference between the two ways of computing the error is smaller, though, with such a large increase from the 4/5 last-known number of 337,072. In case you really want to know, the absolute error from the model projecting forward five days was 2.8%. 

  5. The actual number of cases as of 4/12 from Johns Hopkins’ daily data was 555,313, an increase of 58,778. On 4/10, the model was projecting 567,306 cases, or a projected increase of 70,771. The error in the increase was 20% over the two-day interval, or 10% per day. Not as good as previous days’ next-day predictive performance, but not terrible, either. And since there will always be an error when extrapolating from a curve fit to data having a random component, I’m happy the data is lower than projected and not higher, because I have my own delicate pair of lungs that I’ve grown fond of, too. 

  6. Actually, they’re not quite chosen with uniform randomness: The default is to set up the initial population using a Latin hypercube. That way you get initial pseudorandom parameter values, but with minimal clustering. You want the six-dimensional search space to be explored as fully as possible. 

  7. This is known as an “initial value problem,” the initial value here being the last known number of cases. You can go either direction in time from the initial value. For fitting the model parameters, my algorithm goes backwards from the most recent data. To extrapolate forward and make projections, it goes forward from the same “initial value.” 

  8. Reported cases numbers before March 5 are omitted from both the curve fitting and plots. The modeled (red) curve deviates from the historical data when you go earlier, and I’m not sure a good fit that far back in the history of this thing is relevant to what’s happening now. 

  9. You claim you wouldn’t find alien four-way procreative sex interesting? Well, I don’t believe you. 
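
The Latin hypercube setup mentioned in note 6 can be sketched as follows. This is my own minimal implementation for illustration, not ade's actual initializer:

```python
import numpy as np

rng = np.random.default_rng()

def latin_hypercube_init(n_pop, bounds):
    """Initialize a DE population with Latin hypercube sampling: each
    parameter's range is cut into n_pop equal slices, one sample is
    drawn per slice, and the slice order is shuffled independently per
    parameter so values spread out with minimal clustering."""
    bounds = np.asarray(bounds, dtype=float)   # shape (n_params, 2)
    n_params = bounds.shape[0]
    # One uniform draw inside each of the n_pop slices of [0, 1).
    u = (np.arange(n_pop)[:, None] + rng.random((n_pop, n_params))) / n_pop
    # Shuffle slice order independently for each parameter.
    for j in range(n_params):
        rng.shuffle(u[:, j])
    lo, hi = bounds[:, 0], bounds[:, 1]
    return lo + u * (hi - lo)
```

Every parameter thus gets exactly one individual in each slice of its range, which is what keeps the six-dimensional search space explored as fully as possible.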

Monday, April 6, 2020

Portrait of a Pandemic

Time is a river, a violent current of events, glimpsed once and already carried past us, and another follows and is gone.
—Marcus Aurelius, Meditations
Update, 4/8, 5:45 PM PDT: With the latest data from Johns Hopkins, the new Nightmare Plot is not different enough from the one originally included with this post to warrant editing the post at all. I’m not going to run the numbers again for all the countries here, but of course was curious as to how things are looking in my own. The answer: still not good, even though the curve is continuing to bend downward slightly.1 Today’s 429,052 cases was 10% below what the model projected (three days ago!) that today’s number would be. (That error is in terms of the projected increase, which was 30% rather than the actual 27%.) The model still projects reaching the half-million mark around April 10 and a million cases by month end.

A model I’ve developed for naively fitting to the number of new cases of Covid-19 each day modifies the widely-accepted logistic growth model with a smooth growth-regime transition. It has been closely tracking the dramatic spread of the virus in the United States and other heavily affected countries while accounting for the “flattening of the curve” that is becoming apparent to a varying extent in each of those countries.

The modified model scales the reproduction parameter r by a hyperbolic tangent function of time, with a midpoint time t0 and a half-transition time th. (All time-based parameters are in units of days.) The scaling begins near 1.0 (the full traditional r reproduction rate) and drops by a fractional amount rf as social distancing and lockdowns force the virus into a lower-growth regime that approaches an effective rate of (1-rf)r.

The population-limit parameter L of the logistic growth model remains, though it is still not nearly as consequential or well-defined in the curves of most countries as the curve-flattening effect of rf.2 Finally, there is a very small constant term b, a fixed number of new cases per day. It is mostly included to zero out the mean of the residuals so that I could get decent goodness-of-fit indicators for the model.

The result is a logistic growth model that incorporates the “flattening of the curve” now apparent even in U.S. data by effecting a smooth transition from the original uncontrolled growth rate to a later, lower one:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1
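
Those two lines translate directly into Python (my transcription, with parameter names as in the text):

```python
import math

def flatten(t, rf, t0, th):
    """Growth-rate scaling: ~1.0 well before the transition midpoint
    t0, smoothly dropping to 1 - rf well after it, with the tanh
    transition governed by the half-transition time th."""
    return 0.5 * rf * (1 - math.tanh(1.1 * (t - t0) / th)) - rf + 1

def xd(t, x, r, L, rf, t0, th, b):
    """New reported cases per day: logistic growth r*x*(1 - x/L),
    scaled by the growth-regime transition, plus a small constant b."""
    return r * x * (1 - x / L) * flatten(t, rf, t0, th) + b
```

Note that the scaling settles at 1 - rf, so an rf near 0.9 means the late-stage growth rate is about a tenth of the early one.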

For details, development history, context, and disclaimers, see my series of previous Covid-19 blog posts, starting with the most recent one on April 2.3

Below are projections from the model for various countries, based on Johns Hopkins data from last evening, April 5. I will start with my own mess of a country, not just because I am a citizen of it but because it now has the most cases of Covid-19 on the planet and almost surely will continue to as this nightmare continues.

The United States of America

Extrapolating from its remarkably close fit with data going back more than two weeks now, the model is projecting half a million cases4 by April 10 and a million cases by the end of the month. The residuals (errors between the model’s fit to past data and what the data actually was) are consistent with a completely normal distribution.

Kenneth Regan, professor of mathematics at SUNY Buffalo, suggested that it would be useful to monitor the ratio of new cases reported on a given day to all cases under treatment that day. The idea is that it would reveal progress in a way that is more resistant to issues like how good the reporting is or how widespread the testing is, and that it would show when the data starts to show a peak.5

Although I don’t have data for a number of all cases under treatment on a given day, I’ve tried to implement the spirit of Professor Regan’s idea with a fourth subplot at the bottom of what I’ve been calling the Nightmare Plot, showing how large each day’s new cases are as a percentage of the cumulative number of cases ever reported by that date. It’s an instructive additional visualization, and the suggestion was much appreciated. And as you can see, that metric has been tracking since 3/19 to within a few percent of what the model expected it would be. Thankfully, that red curve is heading steadily downward.

The Nightmare Plot: United States of America, 4/5/20 data

As with any extrapolation from time-series data without reference to underlying theory or events, the model isn’t making perfect projections. With parameters evolved to data from the day before yesterday (4/4) when there were 308,850 U.S. reported cases, the model projected 346,464 cases yesterday. Thankfully, it was pessimistic by 33%; there are “only” 337,072 total (cumulative) cases being reported in last evening’s Johns Hopkins data.

Wait a minute, you might be thinking, the difference between 346,464 and 337,072 seems like a lot less than 33%. True. It’s only 2.8%. But that would give the model too much credit, because it was extrapolating from a fixed, known number that was 91.6% of yesterday’s actual number of cases. The model only gets credit (or blame) for its projected increase from that point, which it said would be 37,614 but actually was 28,222.
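
The arithmetic behind those two error figures, using the numbers above:

```python
# Scoring the 4/4 -> 4/5 projection both ways (numbers from the text).
known = 308850      # last known count when the projection was made (4/4)
projected = 346464  # model's projection for 4/5
actual = 337072     # what was actually reported on 4/5

# Error against the whole cumulative count: flatteringly small.
total_error = (projected - actual) / actual

# Error against just the projected increase: the honest yardstick.
increase_error = (projected - actual) / (actual - known)

print(f"{total_error:.1%} vs. {increase_error:.1%}")
```

The same 9,392-case miss is 2.8% of the cumulative total but 33% of the actual increase, which is why the increase is the figure the model gets scored on.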

This error is a welcome departure from the previous few days when the model had been making dire next-day projections that were uncomfortably close to what actually happened. With data from a day earlier (4/3) when there were 275,586 cases, the model projected there would be 311,138 the day before yesterday (4/4). That represented an error (again, considering only the increase) of 6.9% (also on the pessimistic side). With the data available on 4/2, the model projected that the 243,453 cases that day would become 275,799 on 4/3, an error of just 0.66% in the expected increase.6 That one felt a little spooky.

When you are modeling numbers that represent hundreds of thousands of your fellow citizens getting very sick, you want reality to come in below your projections. I have been hoping for some evidence that the curve is bending downward, and finally it is here. Not by much; that Nightmare Plot still goes to well over a million cases before the month is over.

But I’ll take it. I’m in a much better mood than when I was freaking out over the numbers being so close to my projections for a couple days in a row. The model has been just a bit pessimistic and required some updates to its parameters to better track reduced growth, and that might continue to happen as the days progress and people finally take this thing seriously. Down, curve, down!

———

And with that good mood, I will allow myself some pride in how closely my modification to the logistic growth model is tracking to many more data points than one might expect from its six parameters. Look at that top subplot: There is no more than 8% error between what the model is fitting to the data and what the data actually has been for the past twelve days, going back to 3/24. In that span of time, the numbers being modeled have increased by a factor of six. The 4/4 data fitted the model to within a 5% error going just as many days into the past (back to 3/23). It was pretty much the same with data from the day before, with a comparably low level of errors going back to 3/22.

My evolutionary curve-fitting algorithm does two things to counter the effect of the more recent values (much larger than earlier ones due to exponential growth) having a disproportionate amount of weight. First, the model is for the number of new cases each day, not the cumulative number of total cases ever. It is a differential equation xd(t,x), not x(t). So, to have it project the cumulative number of cases x(t), I integrate the differential equation forward or backward from a fixed point represented by the most recent known number of cases, a point that it is anchored to.7 The number of new daily cases is increasing dramatically, but not quite as dramatically as the cumulative number of cases. So that helps keep the emphasis from being quite as much on more recent data.
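
That anchored integration can be sketched with a simple fixed-step Runge-Kutta solver. This is my own standalone sketch, restating the model's right-hand side inline; the real covid19.py presumably uses a proper ODE library:

```python
import math

def project(params, t_anchor, x_anchor, t_end, step=0.1):
    """Integrate the daily-new-cases model xd(t, x) away from the
    anchor point (t_anchor, x_anchor) with fixed-step RK4: backwards
    in time (t_end < t_anchor) for curve fitting, forwards for
    projections."""
    r, L, rf, t0, th, b = params

    def xd(t, x):
        flatten = 0.5 * rf * (1 - math.tanh(1.1 * (t - t0) / th)) - rf + 1
        return r * x * (1 - x / L) * flatten + b

    h = step if t_end >= t_anchor else -step
    t, x = t_anchor, x_anchor
    while (t_end - t) * h > 0:
        # Clamp the final step so we land exactly on t_end.
        h_eff = h if abs(t_end - t) > abs(h) else t_end - t
        k1 = xd(t, x)
        k2 = xd(t + h_eff / 2, x + h_eff * k1 / 2)
        k3 = xd(t + h_eff / 2, x + h_eff * k2 / 2)
        k4 = xd(t + h_eff, x + h_eff * k3)
        x += (h_eff / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h_eff
    return x
```

The same routine serves both purposes: during fitting it runs backwards over the five weeks of data, and for the Nightmare Plot it runs forwards from the most recent known count.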

Second, I apply a square-root transform to both the modeled and actual new-case numbers before computing the sum of squared error (SSE) between them and then evolving the parameters for minimum SSE. The 28,222 new cases we had yesterday had four times as much influence on the curve fit as the 6,421 new cases we had on 3/17, not sixteen times as much.
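
A quick back-of-the-envelope check of that claim, assuming comparable relative errors on both days so that absolute errors scale with the counts:

```python
# How much more a recent day's count weighs in the SSE than an early
# day's, with and without the square-root transform.
recent, early = 28222, 6421

raw_weight = (recent / early) ** 2   # squared errors scale with count^2
sqrt_weight = recent / early         # after sqrt, they scale with count

print(round(raw_weight, 1), round(sqrt_weight, 1))
```

The exact ratios come out near 19 and 4.4: the transform drops the recent day's weight from roughly the square of the count ratio down to the count ratio itself.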

Despite these efforts, the algorithm is still going to fit recent data more closely, and that fit has been very close indeed, with just 1% maximum error over the past six days. Think about it; for nearly a week as the number of cases has more than doubled, the model stays within 1% of what actually happened as it traces its smooth growth curve backwards in time.

Due to its emphasis on fitting to later data points, the model strays a bit more, percentage-wise, as we peer into the ancient history of early March. But I’m certainly not ashamed of it expecting a little less than half the 217 cases we had one month ago as it looks further backwards from the harsh reality of over a thousand times as many cases now.8 Imagine you were looking at this blog post back then and there was a model telling you that there would be “only” 150,000 cases today. Would you now be upset that it somehow failed to convey the magnitude of the situation to you? I didn’t think so.

———

So, what is making me happier now, with numbers still climbing exponentially and no leveling-off in sight on the right side of that Nightmare Plot? I am hearing the whisper of encouraging things in the combination of six parameters that my software has evolved for the model based on yesterday’s data. The curve is indeed flattening a little bit, despite the stupidity of a lot of Republican governors, nutjob pastors, and spring-break partiers.

Let’s take a quick look at those parameters for ourselves.

USA, 4/5: Final, former, and failed parameter values vs sum of squared error

Each of these six subplots shows the values of one model parameter versus the sum of squared error (SSE) between modeled values and actual values going back just over four weeks. The subplots are zoomed in so that small SSE differences are evident between members of the final population (red dots) and parameter values that had once been part of the population over the course of the 75 generations of evolution (blue dots). Also shown (small black dots) are unsuccessful challengers that never made it into any population, but whose failed parameter combinations are instructive for showing how SSE varies with parameter values.

These plots make it clear that there are some well-defined ranges for all of the model parameters except L, the maximum number of possible cases, which in the worst case is limited by the country’s population. The reason L isn’t better defined is that there is still no evidence in the time-series data of a fixed upper limit to the ultimate number of U.S. cases. The best we can infer from the distribution of L values in the upper-left subplot is that the fit starts to become less plausible if we assume an ultimate upper limit of less than 20 million cases.

Another model parameter that is more of a useful nuisance is b, a very small constant number of new cases per day. With every single model I’ve tried thus far, including this latest and hopefully final one, the parameter evolution favors smaller and smaller values of b tending toward zero. That’s evident in the upper right subplot. This parameter does its job of zeroing-out the mean (average) of the residuals (second subplot from the top in the Nightmare Plot) but that’s pretty much it.

Now onto the really interesting parameters. First, of course, is our old standby r, the initial reproduction or growth rate before any curve-flattening or (eventually) population-limiting begins to take effect. In the U.S., the model says that the number of cases was increasing by around 45% per day for a couple weeks after the first modeled date of 3/5. Looking again at the top of the Nightmare Plot, you can see that the model was assuming too high of a growth rate back then. The blue line was increasing exponentially (which appears as a straight line on a logarithmic plot), but at less than 45% per day. The curve-fitting algorithm tolerated this error as it fiddled with the parameter values to get its remarkably close fit to the data later on.

And that remarkably close fit arrived as we approached the 61-62 days after 1/22/20 that evolved as the optimal value of parameter t0. That happened around March 23. Over the span of a week either direction from that date (th of around 14 days), the growth rate transitioned halfway from its initial 45% per day down to a post-flattening growth rate that’s lower by an impressive 90%. As you can see at the bottom of the Nightmare Plot, the number of reported cases increased by 8.4% yesterday, and that growth rate is expected to continue dropping, though more slowly now.9

For a deadly virus that is right now killing thousands of my fellow citizens, a 90% drop in growth rate seems like good news indeed, even as the escalating numbers will likely continue to stress us all out for the rest of April and probably beyond.

Now let’s take a quick look at Nightmare Plots for other countries that have been impacted by Covid-19. In some of them, the model has problems fitting to the data. I’ve extended the curve-fitting interval for each country as far back as possible without allowing clearly implausible pauses and jumps in reported-case numbers to mess everything up. There’s no avoiding such pauses and jumps that have occurred more recently, like France’s sudden discovery of more than a quarter of its cases on 4/4 alone, but the differential evolution algorithm and the model do their best with what they have to work with.

I will leave you with a final thought to consider as you scroll down and look through how everybody else is doing outside the U.S. None of these other countries–not even Iran with its contemptible mullahs–is being run by anyone so utterly incompetent as our failed trust-fund game show host with his obvious cognitive limitations, profound ignorance, contempt for science and sound public policy, and pathological narcissism that drives every single thing he does. Let’s please not make that mistake again, OK?

Spain

The second-most impacted country behind the good old US of A is Spain with its 126,168 reported cases yesterday.

With parameters evolved to data from a day earlier, when there were 119,199 cases, the model projected 127,844 cases for yesterday. The model overestimated the increase by 24%. That’s not as impressive as the model’s fit with recent data: No more than 2% error going back to 3/29, and no more than an astoundingly low 7% going all the way back to March 15. The residuals are indistinguishable from completely random Gaussian noise.

Spain has flattened that curve pretty well recently and the projected numbers don’t look too bad, perhaps doubling over the next month.

Please note that Spain had 500 cases to our 402 back on March 7, when Donald Trump responded as follows to a reporter’s question about whether he was “concerned that the virus is getting closer to the White House and D.C.”: No, I’m not concerned at all. No, I’m not. No, we’ve done a great job.

Well, if “a great job” means not getting people tested, telling his followers to carry on as usual and causing sycophant Southern Republican governors to delay action until the magnitude of the problem became obvious even to them, and now bullying states about getting the life-saving equipment they need as our numbers are now nearly three times larger and headed for probably at least six times larger over the next month, sure, what a fine and excellent job you’ve done, you fucking moron.

Italy

The model follows a modest recent flattening of the curve, half of which occurred over a one-week interval centered on 3/24. The plot shows that there was also some very slight flattening around 3/15 and 3/12. As the noted group of epidemiologists Pink Floyd observed, “You are only coming through in waves.”

The residuals are not significantly non-normal, though they definitely “fan out” with higher predicted numbers of new daily cases. The effect of this can be seen in the upper subplot, where the model does not track earlier cases that well with their little waves of increasing and then decreasing growth rates.

The constant b evolves to an unusually high range here, reflecting a best-fit solution based on a fairly limited recent data set. The model’s simplistic assumption of a constant 600 new cases per day still works even for the first date shown (3/13). The data modeled and shown doesn’t go back earlier in March because there were large discontinuities before then that provide a questionable basis for curve-fitting.

Italy has had a very difficult time with overtaxed hospitals and lots of people dying. But at least their new cases have been significantly slowing down, as shown in the bottom subplot. Even though they can expect to wind up with several times more of their citizens ultimately reported as infected by Covid-19, it looks like they are at least in the latter stages of having slowed down their rate of growth.

Germany

Germany’s curve is quite impressive, with a significant degree of flattening. Look at this value of rf!

Germany, 4/5 data: Values of parameter rf vs SSE

In addition to demonstrating how much better it is for a country to elect an accomplished scientist as its leader rather than a failed casino owner spouting absolute lunacy10 about Covid-19 just six weeks ago, this plot shows that the parameter L doesn’t seem to be contributing much to the model. The curve flattening is entirely a result of the growth-regime transition, nearly to zero.

France

There have been some jarring discontinuities in France’s data recently, so the model is not doing very well with it. The residuals are definitely not Gaussian random variation, which you can see just by looking at the right side of the residuals subplot, in addition to the essentially zero p value.

The projection is included in the interest of fairness, to make sure that non-impressive results are shown as well. If there is anything definite to be taken away from France’s Nightmare Plot, it’s that they are definitely not out of the woods yet.

Iran

Iran has been surfing waves of infection for weeks now. Just look at the periodicity in the upper subplot and the residuals plot below it! Despite that, there is no significant evidence of non-normality in the residuals. What I don’t like about the way the parameters have evolved is how much they focus on an abrupt growth-regime transition back in early March. It’s completely artificial to assume that the growth rate would drop by half over the course of a single day, and yet that’s what the model sees and so there it went.

This is the price one pays for doing naive modeling of time-series data. You make no assumptions about the underlying events or theory, and you get what you get. As the other plots show, that often works amazingly well, and it might still be working well here, but I’m a little suspicious about any projections for Iran’s future cases. If for no other reason than that having so many waves of infection means that they can expect more totally unpredictable waves in the future. So, perhaps three times as many cases in a month as now, if nothing else weird happens? Or maybe twice or half that many?

United Kingdom

Another instance of an artificially abrupt transition to a lower-growth regime, with one important difference. In the UK, the values of L in the final population have a nicely bounded range, and it’s got a pretty low upper end, well below the country’s population of 67.9 million. The rf parameter is also well-defined, but relatively modest; the model is doing much of its work with conventional logistic-growth behavior.

UK, 4/5 data: Values of parameter L vs SSE

Looks good to me, even if I’m not intuiting any such population upper limit just looking at the UK’s numbers thus far. Maybe the evolution of model parameters over 75 generations is seeing something my eye isn’t. I hope so.

South Korea

These residuals are definitely not normal variation. That’s probably because the model spends so much time tracking the very small increases that have occurred in the past month. Unlike us, the South Koreans knew what they were doing, and did it fast. There’s no further slowdown in growth rate ahead, but it is tiny at this point. Looking good!

Singapore

Singapore had a relatively bad day yesterday, with an abrupt jump of nearly twice as many new cases as in previous days. But their numbers of daily cases are still small.

The residuals are normal, but I’m not convinced that there has been any growth-regime transition like the model shows in the bottom subplot. It just doesn’t look right. And the model’s deviation from older data gives me pause.

Still, a projection of perhaps 10,000 total cases by early next month doesn’t seem too bad to me, even if it winds up being several times that for some reason the model can’t possibly account for right now.

Finland

Finally, little Finland, included not because it has been severely impacted by Covid-19 but because it’s the land of some of my ancestors and some of my friends. The model isn’t tracking Finland’s data particularly well, with residuals that don’t appear at all to be normal random variation. You don’t need the small p value to see that; just look at the discontinuities in the number of reported cases in the top subplot, and the huge momentary spike in new reported cases on 4/4.

The model projects less than 10,000 cases by May, but I honestly don’t think that projection is worth a whole lot right now. If the data comes down in the next couple days, it might be worth looking at again. Meanwhile, Finns will continue practicing the kind of social distancing they always have, long before anyone ever heard of Coronavirus.

Notes


  1. Even worse as far as the state of our fracturing Republic is concerned. I have no models or data for what we may be seeing in the weeks and months ahead during any summertime reprieve we may get from the virus, when there will be no such reprieve from the deranged narcissist and his jackbooted thugs in his cabinet, the Senate, the Supreme Court, and gerrymandered GOP fiefdoms like Wisconsin.

    What will this loathsome cult of corruption and idiot-worship that mutated out of the party of Lincoln and Theodore Roosevelt do after November 4–regardless of which way the electoral college votes after whatever passes for elections in the swing states–if virus cases are rebounding like it’s 1918 all over again with economic conditions more like it’s 1929? I hope if there are sides to be taken, fellow citizen, you will be found on the side of our beloved Constitution. The sacred founding text of a nation that has been under attack for decades, by this President and his predecessors–with its first, second, fourth, and fifth Amendments and a rich history of case law from many decades of carefully reasoned decisions in between times (unfortunately including our own) when the Supreme Court allowed its political colors to show underneath those somber black robes. 

  2. Every parameter in a model should “earn its keep” by having its own robust independent effect on fitness of the curve to the data. Of all the six parameters, L is the least compelling to keep. Its role in limiting the ultimate pervasiveness of the virus is currently being overshadowed by the growth-reduction aspect of the model. But I’m leaving it in for now, looking for signs of its eventual emergence, because it ultimately will serve a purpose as the pandemic finally heads into end stage. 

  3. The full history of posts, going backward in time, is: Modeling a Slightly Flattening COVID-19 Curve (4/2), Into the Rapids (3/25), Pandemic (3/22), and the original Applying the Logistic Growth Model to Covid-19 (3/19). 

  4. Throughout this blog post “cases” refers to the number of Covid-19 cases reported for the particular country under discussion, as provided by the data freely released to the public by Johns Hopkins University. See their Terms of Use, which I have adopted as my own with of course a suitable change in names.

    I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many people have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past two weeks.

  5. Thanks to my dear friend and former boss Louis J. Hoffman for conveying this suggestion back to me from Prof. Regan, along with permission to give him credit for it. 

  6. It was on the pessimistic side, as the model almost always has been, but by such a small amount in that case as to hardly seem worth mentioning. If you are curious, the model was extrapolating the 4/3 data out to a projected 350,510 cases yesterday. Pessimistic again, but not by much: 14% over two days when the actual increase was nearly a hundred thousand cases. 

  7. This is known as an “initial value problem,” the initial value here being the last known number of cases. You can go either direction in time from the initial value. For fitting the model parameters, my algorithm goes backwards from the most recent data. To extrapolate forward and make projections, it goes forward from the same “initial value.” 

  8. The model anchors to the most recent known data point, which for this discussion is last evening’s Johns Hopkins data (4/5) with its 337,072 U.S. reported cases. Remember, the model’s differential equation xd(t,x) is for the expected number of new cases each day, not the cumulative number of cases x(t). So, to have it project the cumulative number of cases, I integrate the differential equation forward or backward from that last fixed point.

    To have it integrating that differential equation backwards a full month and winding up in the same neighborhood with the mere 217 cases we had then, shrinking its backwards projections by more than a thousand in the process, seems pretty remarkable to me. 

  9. In my previous post, I wrote “in praise of the hyperbolic tangent function” for nonlinear modeling, describing how I’ve used it for electronic circuit simulation. It turns out that tanh( . . .) is also quite useful for gradually transitioning from initially unrestrained exponential growth to a lower-growth regime resulting from social distancing and quarantine. 

  10. By the last week of February, “he criticized CNN and MSNBC for ‘panicking markets.’ He said at a South Carolina rally falsely that ‘the Democrat policy of open borders’ had brought the virus into the country. He lashed out at ‘Do Nothing Democrat comrades.’ He tweeted about ‘Cryin’ Chuck Schumer,’ mocking Schumer for arguing that Trump should be more aggressive in fighting the virus. The next week, Trump would blame an Obama administration regulation for slowing the production of test kits. There was no truth to the charge.”

    “Throughout late February, Trump also continued to claim the situation was improving. On Feb. 26, he said: ‘We’re going down, not up. We’re going very substantially down, not up.’ On Feb. 27, he predicted: ‘It’s going to disappear. One day it’s like a miracle it will disappear.’ On Feb. 29, he said a vaccine would be available ‘very quickly’ and ‘very rapidly’ and praised his administration’s actions as ‘the most aggressive taken by any country.’ None of these claims were true.” David Leonhardt, A Complete List of Trump’s Attempts to Play Down Coronavirus, The New York Times (March 15, 2020). 

Thursday, April 2, 2020

Modeling a Slightly Flattening COVID-19 Curve

So last year 37,000 Americans died from the common Flu. It averages between 27,000 and 70,000 per year. Nothing is shut down, life & the economy go on. At this moment there are 546 confirmed cases of CoronaVirus, with 22 deaths. Think about that!
—Donald J. Trump (March 9, 2020)
And it hit the world. And we’re prepared, and we’re doing a great job with it. And it will go away. Just stay calm. It will go away.
—Donald J. Trump (March 10, 2020)
TL;DR: A modification to the logistic growth model that provides a smooth transition to a lower-growth regime is fitting the data for U.S. reported cases obtained on April 2 from Johns Hopkins University with errors of less than 5% each day for the past week, and less than 14% for the week before that. It indicates that there will be around 275,000 U.S. reported cases tomorrow, around 311,000 the day after that, and half a million a week from now (4/9). According to the model’s projections (which must be treated with caution, as with any extrapolation), with exponential growth tempered by “flattening of the curve” proceeding as it has been, the million-case mark will be reached in mid-April. The projection is for a couple million reported U.S. cases by the end of the month. See the Nightmare Plot and all my Covid-19 blog posts for some important development history, context, disclaimer, and even a few thoughts about putting this thing into perspective.
Update, 4/3/20: The exact projection was that there would be 275,799 U.S. reported cases today if the numbers continued as they have for the past two weeks. Today’s number was 275,586. I see no need to update anything else in this blog post, but if you really want to see the latest version of the Nightmare Plot, it is here. There has not been even a 0.1% error between the model and the actual numbers for the past three days.

Over the past two weeks, what began as a simple new usage example to include with my free, open-source evolutionary parameter-finding software wound up turning into something of a ghoulish personal challenge: to identify a statistically plausible mathematical model that would fit decently with the exploding numbers of reported Covid-19 cases in my beloved but increasingly broken country, the United States of America.

I’ve made lots of refinements to the code and its underlying modeling over that time, and discussed them along with some projections (accompanied by important disclaimers, which also apply here) in my blog posts of March 19, March 22, and March 25. Along the way, I tried several different approaches to modeling this thing: the logistic growth model, the power law with exponential decay, and even a linear combination of the two, always accompanied by a small linear term.

What increasingly intrigued and frustrated me, though, was the inability of any of those models to deal with a slight “flattening of the curve” that has become apparent in the U.S. national numbers of recent days. Fitting the logistic growth model xd(t,x)=x*r*(1-x/L) to the data resulted in an unrealistically low value of L, because that’s the best such a simple model could do with the recent reduction in growth. The power-law model with exponential decay1 xd(t)=a*(t-ts)^n*exp(-(t-ts)/t0) provided interesting results that didn’t seem unrealistic and that proved pretty accurate at predicting the next few days’ reported case numbers. But there was no evidence of any realistic value being found for ts to implement the “decay” part of the model, and so the values beyond just a few days out had a very wide span of uncertainty.
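For concreteness, those two candidate models can be sketched as plain Python functions. This is a sketch only, not the actual covid19.py code, and the parameter values in the example call are made up for illustration, not fitted:

```python
import math

# Sketches of the two candidate models discussed above. Real parameter
# values would come from the evolutionary fitter; none of these are fits.

def logistic_xd(t, x, r, L):
    """Logistic growth: new cases per day, with ceiling L."""
    return x * r * (1 - x / L)

def powerlaw_xd(t, a, ts, n, t0):
    """Power law with exponential decay, time-shifted by ts days."""
    if t <= ts:
        return 0.0
    return a * (t - ts)**n * math.exp(-(t - ts) / t0)

# While x is far below L, the logistic model is nearly pure exponential
# growth: with r = 0.35, about 35 new cases per day per 100 cumulative.
print(logistic_xd(10, 100, 0.35, 1e6))  # ~35
```

The exp(-(t-ts)/t0) factor is what would eventually bend the power-law curve downward; as noted above, no realistic value of ts was being found to activate that decay.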

I tried a linear combination of the two models, but that meant a lot of degrees of freedom. The combined model valiantly tried fitting itself with all its parameters to an anomaly in the US reported case data from 3/16 to 3/18, and offered me even less confidence than usual for making extrapolations.

But today an idea occurred to me to try, and it worked. The result is a logistic growth model that does a smooth transition from an earlier growth rate to a later, lower one:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1

There are six parameters:

L is the maximum number of possible cases, which in the worst case is limited by the country’s population.

r is the initial exponential growth rate, before any curve-flattening becomes apparent.

rf is the fractional reduction in growth rate once the transition to the lower-growth (flattened-curve) regime is complete.

th is the time interval (in days) over which the middle half of a smooth transition occurs from the original higher-growth behavior to a fully flattened curve with the lower “flattened” growth rate r*(1-rf).

t0 is the time (in days after 1/22/20) at the middle of the transition from regular logistic-growth behavior to a fully flattened curve.

b is a very small constant number of new cases per day. This is mostly there to zero out the mean of the residuals so that I could get decent goodness-of-fit indicators for the model.2
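Put together, the model and its flattening transition transcribe directly into Python. The parameter values in the example calls are illustrative stand-ins, not the fitted ones; note that the factor 1.1 is roughly 2*atanh(0.5), which is what makes th cover the middle half of the transition:

```python
import math

def flatten(t, rf, t0, th):
    """Smooth growth multiplier: ~1.0 well before t0, ~(1 - rf) well
    after. The 1.1 factor is about 2*atanh(0.5), so the middle half of
    the transition spans roughly th days centered on t0."""
    return 0.5 * rf * (1 - math.tanh(1.1 * (t - t0) / th)) - rf + 1

def xd(t, x, r, L, rf, t0, th, b):
    """Expected number of new cases on day t, given cumulative cases x."""
    return r * x * (1 - x / L) * flatten(t, rf, t0, th) + b

# The two growth regimes, using stand-in values rf=0.32, t0=63, th=6:
print(flatten(0, 0.32, 63, 6))    # ~1.0: unrestrained early growth
print(flatten(200, 0.32, 63, 6))  # ~0.68: flattened-curve regime
```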

———

Here is the result, my latest (and perhaps last?) revision to what I’ve been calling the Nightmare Plot. The image was produced by the latest commit of the covid19.py Python script, using data released this evening from Johns Hopkins University. It’s an important image, so go ahead and click on it to see it at full resolution, or even download it if you want.

April 2, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Extrapolating forward (admittedly a perilous enterprise, but I’m curious and so are you), the model is projecting half a million cases one week from today, and over a million the week after that. The end of April is looking pretty scary indeed, with the potential for one in every hundred Americans to be reporting infection with Covid-193 and the numbers still continuing to climb.

Logistic Growth with Growth-Regime Transition

If it weren’t for the terrifying reality of what is represented by the numbers it spits out, I would be delighted with the beautiful simplicity and performance of this new model. It adds an important feature to the logistic growth model that is well accepted for population-limited reproduction. And it’s a feature uniquely suited to what’s happening right now with the Coronavirus.

This new feature I’m calling “growth-regime transition.” It smoothly reduces the growth rate from its initial unrestrained amount to a lower one. Thus it can account for recent efforts of non-idiot governors and their less ignorant citizenry to “flatten the curve” by staying home and being careful and apart when they have to go out.

And that’s exactly what it’s doing, to a remarkable degree of precision. Even as the numbers have multiplied fivefold since a week ago, the model tracks each day’s cumulative numbers to within 5%. Going back a week further to when there were about 1/70 as many cases (3,499) as today, the model is no more than 13% from each day’s actual numbers. Getting that close that many times in a row with just six parameters, and with the numbers doubling six times, feels a little spooky to me.

It’s not just my gut feeling that indicates this model is really onto something. The residuals (a fancy name for the difference between what the model said a day’s numbers should have been and what they actually were) are shown in the middle of the Nightmare Plot with some information about them. There is no statistically significant evidence that the errors are anything but normal random variation. There may be a little heteroskedasticity to them, but I think they pretty much look like residuals should. The square-root transform I’m applying to the modeled and actual data before calculating the sum of squared errors mostly seems to be dealing with the huge change in magnitude of the numbers over time.
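To illustrate what that square-root transform buys, here is a toy comparison. All numbers below are made up for illustration; they are not the actual model values or residuals:

```python
import math

# Hypothetical cumulative counts (not the real fits) showing why a
# square-root transform is applied before summing squared errors:
# without it, the recent huge-count days swamp everything else.
actual  = [3_499, 13_677, 33_272, 101_657, 243_453]
modeled = [3_600, 13_400, 34_100,  99_800, 244_900]

def sse(modeled, actual, transform=math.sqrt):
    """Sum of squared errors after applying a transform to both."""
    return sum((transform(m) - transform(a)) ** 2
               for m, a in zip(modeled, actual))

raw = sse(modeled, actual, transform=lambda v: v)
rooted = sse(modeled, actual)

# Untransformed, the biggest day dominates the sum; square-rooted, the
# early small-count days' errors still count for something.
print(rooted < raw)  # -> True
```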

The middle plot includes the model’s corrected Akaike information criterion (AICc) score. It’s -8.62 for this “logistic growth with growth-regime transition” model, lower and thus better than for any of the other models I’ve tried. Operating on the same dataset, the AICc was around zero for the 7-parameter linear combination of logistic growth, power law, and linear (projecting 460,791 cases in one week). It was +3.60 for just the logistic growth plus linear model (projecting 384,264 cases in a week).
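For the curious, one common least-squares formulation of AICc looks like the following. I'm assuming this flavor purely for illustration, since the exact formula covid19.py uses isn't shown here; n is the number of fitted days, k the number of model parameters, and sse the sum of squared errors after the transform:

```python
import math

def aicc(sse, n, k):
    """Corrected Akaike information criterion, least-squares form
    (an assumed formulation): lower is better, and the correction
    term penalizes extra parameters more harshly when n is small."""
    aic = n * math.log(sse / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

# With fit quality (sse) and sample size held fixed, a seventh
# parameter must buy a better fit or the score gets worse:
print(aicc(2.5, 30, 6) < aicc(2.5, 30, 7))  # -> True
```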

The power-law model with time-shifting, exponential decay, and a linear term did pretty well as a second runner-up to the new one. Its AICc was -4.58 with today’s Johns Hopkins data, and it is no more than 8% from those actual numbers going back to 3/25. It has nice random-seeming residuals with no evidence whatsoever of non-normality. But it does only track recent data, quickly deviating from what actually happened when you go more than a week back. For what it’s worth, it projects 503,435 cases a week from now.

———

Another promising aspect of the model: Its parameter values lie in nice rounded distributions with respect to the sum of squared error, indicating that they (or most of them, L being a stubbornly hard one to identify at this still-early stage) are all earning their keep.

Here is the distribution of both parameters L and r for the logistic-growth aspect of the model, plotted against SSE (sum of squared error, after the square-root transform is applied to modeled and actual data values). The black dots represent parameter values that challenged some member of the population at some point and failed. The blue dots are values that were part of the population at some point in the 50-200 generations that the differential evolution algorithm ran, but got replaced by somebody better. And the red dots are members of the final population when the evolution algorithm terminated.4

It’s hard to make much of L except that it’s probably a bigger number than any of us would like. The initial growth rate r is very well defined, though, at an increase of around 35% per day.

Logistic growth parameters for April 2 data

The parameters rf (r-flattened), th (half-transition time), and t0 (transition midpoint) are all very nicely defined. The growth rate transitions to about 68% of its original value, with the middle (non-tail) half of the transition taking place over the course of a little over six days centered around 63 days after 1/22/20, or around March 24. The linear term b (actually, a constant number of new cases per day) is tiny–tending toward zero–and would be omitted except that it allows the residuals to be centered around zero without impacting the other parameters.

Growth-rate reduction and linear parameters for April 2 data

In Praise of the Hyperbolic Tangent Function

As I’ve explained in previous posts, my regular-life programming work (I’m a retired electrical engineer and patent agent turned hobbyist open-source software developer) until about a month ago–which seems more like a year–was wrapping up a two-year-long project that does advanced simulation of power semiconductor circuits. I’ve developed a model for power MOSFETs that evolves optimal values of its 40-odd parameters using an asynchronous version of the differential evolution algorithm that I developed for that project.

A function I have grown to know and love from that work is the hyperbolic tangent. It plays an important role in electronic circuit simulation software to provide smooth transitions from one state to another. You can’t expect software that does stepwise integration to behave well when you suddenly change a zero to a one. The integrator wants to follow a smooth path. If nature abhors a vacuum, then electronic circuits–real or simulated–abhor discontinuity. You have to go gradually from one voltage or number to another, even if “gradually” involves a transition time measured in the billionths of a second.
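Here's a toy illustration of the idea, outside any circuit simulator. The function names and numbers are mine, purely for demonstration:

```python
import math

def hard_switch(t, t_mid):
    """A discontinuous 0-to-1 step: poison for a stepwise integrator."""
    return 0.0 if t < t_mid else 1.0

def smooth_switch(t, t_mid, width):
    """A tanh ramp from 0 to 1 around t_mid, with most of the change
    happening within about `width` units of time."""
    return 0.5 * (1 + math.tanh((t - t_mid) / width))

# The smooth version passes through every intermediate value...
print(smooth_switch(5.0, 5.0, 1.0))             # -> 0.5 at the midpoint
# ...and is indistinguishable from the hard step far from t_mid:
print(round(smooth_switch(20.0, 5.0, 1.0), 6))  # -> 1.0
```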

When it comes to “flattening the curve” of a deadly pandemic by incarcerating ourselves for weeks on end, my layman’s hunch was that the transition time would be in days. Well, it was.

According to the best-fitting combination of the six model parameters that evolved, the rate of growth in reported U.S. Covid-19 cases transitions from a 35% daily increase to around 23%, with the middle half of that transition taking six days. The middle of that transition was on March 24. Look at the closeness of fit around that time, and afterward. There’s not even a tenth of a percent error in what the model says yesterday’s numbers should have been. The model’s worst error in the past four days, as the numbers have doubled, has been 1%.

Previous Projections

On March 19, I made my first public post about my modeling efforts. That day, the cumulative number of reported cases of Covid-19 in the United States was 13,677. That is according to data released each day by Johns Hopkins University, on which I have been basing all of my modeling work.

When I refer to “cases” below, it is from this same data set, provided as a CSV file from Johns Hopkins. They’ve done the public a great service by making such a valuable resource freely and openly available. I’m releasing everything from my own 70+ hours of amateur work to the public as well, as a small way of following that spirit.5 The Nightmare Plot is dedicated to the public domain; you can copy it without restriction, though I hope you will leave the identifier there that urges reference to my disclaimer language.

I was using a non-differential form of the logistic growth model, with four parameters: an upper reproduction limit L, the growth exponent k (I now call it r, because the programmer in me hates using k for anything but an index variable), plus two of my own, a time-shifting parameter t0 and a linear term a (I now call it b). The function for the modeled number of total cases, cumulative, on a given day t (units are in the number of days after 1/22/20) was f(t)=L/(1+exp(-k*(t-t0)))+a*t.
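Transcribed into Python with those original parameter names (the values in the example are toy numbers, not the actual 3/19 fit):

```python
import math

def f(t, L, k, t0, a):
    """Modeled cumulative cases on day t (days after 1/22/20):
    a logistic curve plus a small linear term."""
    return L / (1 + math.exp(-k * (t - t0))) + a * t

# At t = t0 the logistic part is exactly half of L, plus the linear term:
L_, k_, t0_, a_ = 800_000, 0.3, 75, 10
print(f(t0_, L_, k_, t0_, a_))  # -> 400750.0
```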

With those parameters evolved for a best fit with the 3/19 data, using the well-known differential evolution algorithm implemented by my ade Python package, that model predicted that there would be 429,414 cases today. It was 76% too pessimistic; we “only” have about 18 times as many cases as then, not 31.

On March 22, a week and a half ago when there were 33,272 U.S. reported cases, I posted again, this time about results from modeling with a linear combination of the regular logistic growth component, a power-law component with exponential decay, and the linear term (constant differential) b fitted to that day’s Johns Hopkins data. The combination model projected that there would be around 350,000 cases today. That was around 40% too high.

Thankfully, there has been some flattening of the curve since then.6 I wasn’t really seeing it show up in my other models, though. My most recent blog post before the one you are now reading, from March 25, was still pessimistic, with a projection of 359,169 cases for today.

I had been waiting and hoping for the model to keep being shown pessimistic, for the moment when some “fancy theoretical factors that I didn’t even bother trying to understand finally emerge in a few days (hell, how about tomorrow?),” and for that curve to finally get “its long-awaited downward bend.” That has happened to some extent, and I hope the open-source code that runs my latest (and hopefully last!) attempt at a model will continue to update to an even more lowered growth rate. Making pretty plots of that happy circumstance is left as an exercise for the Python-savvy reader.

A Final Note

Awful implications aside in a time of global crisis, there is a certain illicit thrill to doing my own naive research and thinking in a field supposedly best left to the experts, enjoying the faint flicker of possibility that I might be seeing the equations slide into place in a way that all their formal training somehow missed.

But, this is as good a time as any to repeat, yet again, that I have no expertise in biology, medicine, or infectious disease. See all the disclaimer stuff in my previous posts. I’m just a retired engineer who has always had a passion for modeling interesting data. I’m really looking forward to releasing a free, open-source Python package that does some really cool modeling of circuits using power MOSFETs, and I hope that makes the free Ngspice circuit simulator a lot more interesting to people.

Also worth repeating:

The model is just the integration of a first-order differential equation that can be extrapolated to predict reported U.S. cases of Covid-19 if the reported cases continue as they have for the past two weeks.
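That integrate-and-extrapolate step can be sketched in a few lines of pure Python. This is a toy stand-in for the actual covid19.py machinery: the growth function, parameters, and anchor values below are all illustrative, not fitted:

```python
# Toy sketch of integrate-and-extrapolate: classic Runge-Kutta (RK4) on
# a first-order model xd(t, x), going either direction from the last
# known data point. All numbers here are illustrative.
r, L = 0.35, 1_000_000.0  # hypothetical growth rate and ceiling

def xd(t, x):
    """New cases per day: plain logistic growth, for illustration."""
    return r * x * (1 - x / L)

def integrate(x0, t0, t1, steps=1000):
    """RK4 from t0 to t1; a decreasing span integrates backward."""
    h = (t1 - t0) / steps
    t, x = t0, x0
    for _ in range(steps):
        k1 = xd(t, x)
        k2 = xd(t + h / 2, x + h / 2 * k1)
        k3 = xd(t + h / 2, x + h / 2 * k2)
        k4 = xd(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

anchor_t, anchor_x = 70.0, 337072.0  # last known day and its total

ahead = integrate(anchor_x, anchor_t, anchor_t + 7)    # project a week out
behind = integrate(anchor_x, anchor_t, anchor_t - 30)  # hindcast a month
print(ahead > anchor_x, 0 < behind < anchor_x)  # -> True True
```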

Be well. And, please, as a fellow citizen who is more worried right now about the health of our Republic than that of my body, remember when you vote in November how much your President fucked things up and how he will have doubtless continued to behave like an idiot wannabe dictator in the months ahead.7 Personally, as I contemplate a near future consisting mostly of hiding away from friends, family, and neighbors to avoid a disease that otherwise could very well drown me in my own lungs, I am not quite yet sick of winning.

Notes


  1. Modified by me with an additional ts parameter. It shifts the start of both growth and decay to some best-fit date well after the first reported case in the Johns Hopkins data set, on 1/22/20. 

  2. With every single model I’ve tried thus far, including this latest and hopefully final one, the parameter evolution favors smaller and smaller values of b tending toward zero. 

  3. The careful reader may notice that I always refer to “reported U.S. cases” or some alternate wording. I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many of our fellow citizens have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past week or two.

  4. You can create these plots for yourself from the .dat files in the ade/examples directory, or from a .dat file you generate yourself. Just run the pv command that gets installed when you run pip install ade. There are some important command-line options; run pv -h for details. 

  5. My ade Python package, of which the covid19.py file is a part, is free to download, run, modify, and incorporate into other work under the terms of the Apache license.

    The terms of use for the Johns Hopkins data (and I expressly incorporate by reference their disclaimer language into this and all my other Covid-19 related blog posts, obviously with my name substituted for theirs) are as follows:

    “This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.” 

  6. At that point, I had not transitioned to modeling the number of new cases each day and then integrating the model’s first-order differential equation into a plot of cumulative cases each day. Instead, a nonlinear regression version of the logistic growth model was fitted to cumulative case numbers. 

  7. To say nothing of his pathological narcissism being the driving force for his every decision, from downplaying the significance of the pandemic just weeks ago and unleashing the wave of infection now emerging in MAGA country, to his demands that governors be nice to him before getting much-needed (and taxpayer-funded) federal assistance.