Monday, April 6, 2020

Portrait of a Pandemic

Time is a river, a violent current of events, glimpsed once and already carried past us, and another follows and is gone.
—Marcus Aurelius, Meditations

A model I’ve developed for naively fitting the number of new Covid-19 cases each day modifies the widely accepted logistic growth model with a smooth growth-regime transition. It has been closely tracking the dramatic spread of the virus in the United States and other heavily affected countries while accounting for the “flattening of the curve” that is becoming apparent, to varying extents, in each of those countries.

The modified model scales the reproduction parameter r by a hyperbolic tangent function of time, with a midpoint time t0 and a half-transition time th. (All time-based parameters are in units of days.) The scaling begins near 1.0 (the full traditional reproduction rate r) and transitions down by a fractional reduction rf as social distancing and lockdowns force the virus into a lower-growth regime, one that approaches an effective rate of (1-rf)*r.

The population-limiting parameter L of the logistic growth model remains, though it is still not nearly as consequential or well-defined in the curves of most countries as the curve-flattening effect of rf.1 Finally, there is a very small constant term b, a fixed number of new cases per day, included mostly to zero out the mean of the residuals so that I could get decent goodness-of-fit indicators for the model.

The result is a logistic growth model that incorporates the “flattening of the curve” now apparent even in U.S. data by effecting a smooth transition from the original uncontrolled growth rate to a later, lower one:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1
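
Here is a minimal Python sketch of those two equations, if you want to play with them yourself; the parameter values in the check at the bottom are round illustrative numbers, not the evolved ones:

import numpy as np

def flatten(t, rf, t0, th):
    # Growth-rate scaling: approximately 1.0 well before t0, approximately (1 - rf) well after it
    return 0.5*rf*(1 - np.tanh(1.1*(t - t0)/th)) - rf + 1

def xd(t, x, r, L, rf, t0, th, b):
    # Modeled number of new cases on day t, given x cumulative cases so far
    return r*x*(1 - x/L)*flatten(t, rf, t0, th) + b

# Illustrative check of the two growth regimes (rf = 0.9, t0 = 61, th = 14):
print(flatten(np.array([0.0, 120.0]), rf=0.9, t0=61.0, th=14.0))  # roughly [1.0, 0.1]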

For details, development history, context, and disclaimers, see my series of previous Covid-19 blog posts, starting with the most recent one on April 2.2

Below are projections from the model for various countries, based on Johns Hopkins data from last evening, April 5. I will start with my own mess of a country, not just because I am a citizen of it but because it now has the most cases of Covid-19 on the planet and almost surely will continue to as this nightmare continues.

The United States of America

Extrapolating from its remarkably close fit with data going back more than two weeks now, the model is projecting half a million cases3 by April 10 and a million cases by the end of the month. The residuals (the differences between the model’s fit to past data and what the data actually was) show no significant departure from a normal distribution.
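
I don’t know which specific normality test covid19.py applies to get that goodness-of-fit indication, so treat this scipy-based check as a stand-in for the general idea, run here on made-up residuals:

import numpy as np
from scipy import stats

# Made-up residuals; the real ones are modeled-minus-actual daily new cases
rng = np.random.default_rng(1)
residuals = rng.normal(0.0, 1.0, size=30)

# D'Agostino-Pearson test: a large p-value means no significant evidence
# that the residuals are anything other than normal random variation
stat, p = stats.normaltest(residuals)
print(round(p, 3))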

Kenneth Regan, professor of mathematics at SUNY Buffalo, suggested that it would be useful to monitor the ratio of new cases reported on a given day to all cases under treatment that day. The idea is that it would reveal progress in a way that is more resistant to issues like how good the reporting is or how widespread the testing is, and that it would show when the data starts to show a peak.4

Although I don’t have data for the number of all cases under treatment on a given day, I’ve tried to implement the spirit of Professor Regan’s idea with a fourth subplot at the bottom of what I’ve been calling the Nightmare Plot, showing how large each day’s new cases are as a percentage of the cumulative number of cases ever reported by that date. It’s an instructive additional visualization, and the suggestion was much appreciated. And as you can see, that metric has been tracking since 3/19 to within a few percent of what the model expected it would be. Thankfully, that red curve is heading steadily downward.
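
Computing that bottom-subplot metric is simple enough; here is a sketch with a purely hypothetical series of cumulative case counts:

import numpy as np

# Hypothetical cumulative reported-case counts, one entry per day
cumulative = np.array([1000, 1400, 1900, 2500, 3200, 4000], dtype=float)

new_cases = np.diff(cumulative)
# Each day's new cases as a percentage of the cumulative total reported by that day
pct_of_total = 100.0 * new_cases / cumulative[1:]
print(np.round(pct_of_total, 1))  # roughly [28.6 26.3 24.0 21.9 20.0]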

The Nightmare Plot: United States of America, 4/5/20 data

As with any extrapolation from time-series data without reference to underlying theory or events, the model isn’t making perfect projections. With parameters evolved to data from the day before yesterday (4/4) when there were 308,850 U.S. reported cases, the model projected 346,464 cases yesterday. Thankfully, it was pessimistic by 33%; there are “only” 337,072 total (cumulative) cases being reported in last evening’s Johns Hopkins data.

Wait a minute, you might be thinking, the difference between 346,464 and 337,072 seems like a lot less than 33%. True. It’s only 2.8%. But that would give the model too much credit, because it was extrapolating from a fixed, known number that was 91.6% of yesterday’s actual number of cases. The model only gets credit (or blame) for its projected increase from that point, which it said would be 37,614 but actually was 28,222.
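
In code, that accounting is just a few lines (all numbers from the paragraphs above):

anchor    = 308850   # known U.S. cases in the 4/4 data the parameters were evolved to
projected = 346464   # the model's projection for 4/5
actual    = 337072   # reported cases in last evening's (4/5) Johns Hopkins data

projected_increase = projected - anchor   # 37,614
actual_increase    = actual - anchor      # 28,222
print(100.0*(projected_increase - actual_increase)/actual_increase)  # about 33 (% pessimistic)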

This error is a welcome departure from the previous few days when the model had been making dire next-day projections that were uncomfortably close to what actually happened. With data from a day earlier (4/3) when there were 275,586 cases, the model projected there would be 311,138 the day before yesterday (4/4). That represented an error (again, considering only the increase) of 6.9% (also on the pessimistic side). With the data available on 4/2, the model projected that the 243,453 cases that day would become 275,799 on 4/3, an error of just 0.66% in the expected increase.5 That one felt a little spooky.

When you are modeling numbers that represent hundreds of thousands of your fellow citizens getting very sick, you want reality to come in below your projections. I have been hoping for some evidence that the curve is bending downward, and finally it is here. Not by much; that Nightmare Plot still goes to well over a million cases before the month is over.

But I’ll take it. I’m in a much better mood than when I was freaking out over the numbers being so close to my projections for a couple days in a row. The model has been just a bit pessimistic and required some updates to its parameters to better track reduced growth, and that might continue to happen as the days progress and people finally take this thing seriously. Down, curve, down!

———

And with that good mood, I will allow myself some pride in how closely my modification to the logistic growth model is tracking many more data points than one might expect from its six parameters. Look at that top subplot: there is no more than 8% error between the model’s fit and the actual data for each of the past twelve days, going back to 3/24. In that span of time, the numbers being modeled have increased by a factor of six. The fit to the 4/4 data was within 5% going just as many days into the past (back to 3/23), and it was much the same with data from the day before, with a comparably low level of error going back to 3/22.

My evolutionary curve-fitting algorithm does two things to counter the effect of the more recent values (much larger than earlier ones due to exponential growth) having a disproportionate amount of weight. First, the model is for the number of new cases each day, not the cumulative number of total cases ever. It is a differential equation xd(t,x), not x(t). So, to have it project the cumulative number of cases x(t), I integrate the differential equation forward or backward from a fixed point represented by the most recent known number of cases, a point that it is anchored to.6 The number of new daily cases is increasing dramatically, but not quite as dramatically as the cumulative number of cases. So that helps keep the emphasis from being quite as much on more recent data.
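
Here is a sketch of that anchored integration using scipy’s solve_ivp. The real work is done by my covid19.py script and ade package, so this is only a stand-in to show the idea, with parameter values that are round numbers in the neighborhood of what is discussed below rather than the actual evolved ones:

import numpy as np
from scipy.integrate import solve_ivp

def flatten(t, rf, t0, th):
    return 0.5*rf*(1 - np.tanh(1.1*(t - t0)/th)) - rf + 1

def xd(t, x, r, L, rf, t0, th, b):
    return r*x*(1 - x/L)*flatten(t, rf, t0, th) + b

params = (0.45, 2e7, 0.9, 61.0, 14.0, 1.0)   # (r, L, rf, t0, th, b), illustrative only
t_anchor, x_anchor = 74.0, 337072.0          # day 74 after 1/22/20 is 4/5, the anchor point

# Backward over the fitting window (to day 43, i.e. 3/5) and forward for projection
past = solve_ivp(xd, (t_anchor, 43.0), [x_anchor], args=params, dense_output=True)
future = solve_ivp(xd, (t_anchor, 100.0), [x_anchor], args=params, dense_output=True)
print(past.sol(50.0)[0], future.sol(90.0)[0])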

Second, I apply a square-root transform to both the modeled and actual new-case numbers before computing the sum of squared error (SSE) between them and then evolving the parameters for minimum SSE. The 28,222 new cases we had yesterday had four times as much influence on the curve fit as the 6,421 new cases we had on 3/17, not sixteen times as much.
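
A sketch of that transformed error metric, with hypothetical numbers:

import numpy as np

def sse_sqrt(modeled_new_cases, actual_new_cases):
    # Sum of squared error after square-root transforming both series, so that a
    # day with ~4x the cases gets ~4x (not ~16x) the influence, for comparable
    # relative errors
    m = np.sqrt(np.asarray(modeled_new_cases, dtype=float))
    a = np.sqrt(np.asarray(actual_new_cases, dtype=float))
    return float(np.sum((m - a)**2))

print(sse_sqrt([30000, 7000], [28222, 6421]))  # hypothetical modeled vs. actual days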

Despite these efforts, the algorithm is still going to fit recent data more closely, and that fit has been very close indeed, with just 1% maximum error over the past six days. Think about it; for nearly a week as the number of cases has more than doubled, the model stays within 1% of what actually happened as it traces its smooth growth curve backwards in time.

Due to its emphasis on fitting to later data points, the model strays a bit more, percentage-wise, as we peer into the ancient history of early March. But I’m certainly not ashamed of it expecting a little less than half the 217 cases we had one month ago as it looks further backwards from the harsh reality of over a thousand times as many cases now.7 Imagine you were looking at this blog post back then and there was a model telling you that there would be “only” 150,000 cases today. Would you now be upset that it somehow failed to convey the magnitude of the situation to you? I didn’t think so.

———

So, what is making me happier now, with numbers still climbing exponentially and no leveling-off in sight on the right side of that Nightmare Plot? I am hearing the whisper of encouraging things in the combination of six parameters that my software has evolved for the model based on yesterday’s data. The curve is indeed flattening a little bit, despite the stupidity of a lot of Republican governors, nutjob pastors, and spring-break partiers.

Let’s take a quick look at those parameters for ourselves.

USA, 4/5: Final, former, and failed parameter values vs sum of squared error

Each of these six subplots shows the values of one model parameter versus the sum of squared error (SSE) between modeled values and actual values going back just over four weeks. The subplots are zoomed in so that small SSE differences are evident between members of the final population (red dots) and parameter values that had once been part of the population over the course of the 75 generations of evolution (blue dots). Also shown (small black dots) are unsuccessful challengers that never made it into any population, but whose failed parameter combinations are instructive for showing how SSE varies with parameter values.

These plots make it clear that there are some well-defined ranges for all of the model parameters except L, the maximum number of possible cases, which at worst case is limited by the country’s population. The reason L isn’t better defined is that there is still no evidence in the time-series data of a fixed upper limit to the ultimate number of U.S. cases. The best we can infer from the distribution of L values in the upper-left subplot is that the fit starts to become less plausible if we assume an ultimate upper limit of less than 20 million cases.

Another model parameter that is more of a useful nuisance is b, a very small constant number of new cases per day. With every single model I’ve tried thus far, including this latest and hopefully final one, the parameter evolution favors smaller and smaller values of b tending toward zero. That’s evident in the upper right subplot. This parameter does its job of zeroing-out the mean (average) of the residuals (second subplot from the top in the Nightmare Plot) but that’s pretty much it.

Now on to the really interesting parameters. First, of course, is our old standby r, the initial reproduction or growth rate before any curve-flattening or (eventually) population-limiting begins to take effect. In the U.S., the model says that the number of cases was increasing by around 45% per day for a couple of weeks after the first modeled date of 3/5. Looking again at the top of the Nightmare Plot, you can see that the model was assuming too high a growth rate back then. The blue line was increasing exponentially (which appears as a straight line on a logarithmic plot), but at less than 45% per day. The curve-fitting algorithm tolerated this error as it fiddled with the parameter values to get its remarkably close fit to the data later on.

And it got that remarkably close fit as we approached the 61-62 days after 1/22/20 that evolved as the optimal value of parameter t0. That happened around March 23. Over the span of a week in either direction from that date (th of around 14 days), the growth rate transitioned halfway from its initial 45% per day down toward a post-flattening growth rate that’s lower by an impressive 90%. As you can see at the bottom of the Nightmare Plot, the number of reported cases increased by 8.4% yesterday, and that growth rate is expected to continue dropping, though more slowly now.8
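
To see what those values imply, here is the flatten() scaling evaluated at a few points, using round numbers close to what the plots show (not the exact evolved values, and ignoring the 1-x/L and b terms):

import numpy as np

def flatten(t, rf, t0, th):
    return 0.5*rf*(1 - np.tanh(1.1*(t - t0)/th)) - rf + 1

r, rf, t0, th = 0.45, 0.90, 61.5, 14.0   # round numbers per the discussion above
for day in (45, 61.5, 95):               # days after 1/22/20: 3/7, ~3/23 (near t0), 4/26
    print(day, round(100*r*flatten(day, rf, t0, th), 1), "% effective daily growth")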

For a deadly virus that is right now killing thousands of my fellow citizens, a 90% drop in growth rate seems like good news indeed, even as the escalating numbers will likely continue to stress us all out for the rest of April and probably beyond.

Now let’s take a quick look at Nightmare Plots for other countries that have been impacted by Covid-19. In some of them, the model has problems fitting to the data. I’ve extended the curve-fitting interval for each country as far back as possible without allowing clearly implausible pauses and jumps in reported-case numbers to mess everything up. There’s no avoiding such pauses and jumps that have occurred more recently, like France’s sudden discovery of more than a quarter of its cases on 4/4 alone, but the differential evolution algorithm and the model do their best with what they have to work with.

I will leave you with a final thought to consider as you scroll down and look through how everybody else is doing outside the U.S. None of these other countries–not even Iran with its contemptible mullahs–is being run by anyone so utterly incompetent as our failed trust-fund game show host with his obvious cognitive limitations, profound ignorance, contempt for science and sound public policy, and pathological narcissism that drives every single thing he does. Let’s please not make that mistake again, OK?

Spain

The second-most impacted country behind the good old US of A is Spain with its 126,168 reported cases yesterday.

With parameters evolved to data from a day earlier, when there were 119,199 cases, the model projected 127,844 cases for yesterday (4/5); it overestimated the increase by 24%. That’s not as impressive as the model’s fit with recent data: no more than 2% error going back to 3/29, and no more than an astoundingly low 7% going all the way back to March 15. The residuals are indistinguishable from random Gaussian noise.

Spain has flattened that curve pretty well recently and the projected numbers don’t look too bad, perhaps doubling over the next month.

Please note that Spain had 500 cases to our 402 back on March 7, when Donald Trump responded as follows to a reporter’s question about whether he was “concerned that the virus is getting closer to the White House and D.C.”: No, I’m not concerned at all. No, I’m not. No, we’ve done a great job.

Well, if “a great job” means not getting people tested, telling his followers to carry on as usual, causing sycophant Southern Republican governors to delay action until the magnitude of the problem became obvious even to them, and now bullying states over the life-saving equipment they need, as our numbers are now nearly three times larger and headed for probably at least six times larger over the next month, then sure, what a fine and excellent job you’ve done, you fucking moron.

Italy

The model follows a modest recent flattening of the curve, half of which occurred over a one-week interval centered on 3/24. The plot shows that there was also some very slight flattening around 3/15 and 3/12. As the noted group of epidemiologists Pink Floyd observed, “You are only coming through in waves.”

The residuals are not significantly non-normal, though they definitely “fan out” with higher predicted numbers of new daily cases. The effect of this can be seen in the upper subplot, where the model does not track earlier cases that well with their little waves of increasing and then decreasing growth rates.

The constant b evolves to an unusually high range here, reflecting a best-fit solution based on a fairly limited recent data set. The model’s simplistic assumption of a constant 600 new cases per day still works even for the first date shown (3/13). The data modeled and shown doesn’t go back earlier in March because there were large discontinuities before then that provide a questionable basis for curve-fitting.

Italy has had a very difficult time with overtaxed hospitals and lots of people dying. But at least their new cases have been significantly slowing down, as shown in the bottom subplot. Even though they can expect to wind up with several times more of their citizens ultimately reported as infected by Covid-19, it looks like they are at least in the latter stages of having slowed down their rate of growth.

Germany

Germany’s curve is quite impressive, with a significant degree of flattening. Look at this value of rf!

Germany, 4/5 data: Values of parameter rf vs SSE

In addition to demonstrating how much better it is for a country to elect an accomplished scientist as its leader rather than a failed casino owner spouting absolute lunacy9 about Covid-19 just six weeks ago, this plot shows that the parameter L doesn’t seem to be contributing much to the model. The curve flattening is entirely a result of the growth-regime transition, which takes the growth rate nearly to zero.

France

There have been some jarring discontinuities in France’s data recently, so the model is not doing very well with it. The residuals are definitely not Gaussian random variation, which you can see just by looking at the right side of the residuals subplot, in addition to the essentially zero p value.

The projection is included in the interest of fairness, to make sure that non-impressive results are shown as well. If there is anything definite to be taken away from France’s Nightmare Plot, it’s that they are definitely not out of the woods yet.

Iran

Iran has been surfing waves of infection for weeks now. Just look at the periodicity in the upper subplot and the residuals plot below it! Despite that, there is no significant evidence of non-normality in the residuals. What I don’t like about the way the parameters have evolved is how much they focus on an abrupt growth-regime transition back in early March. It’s completely artificial to assume that the growth rate would drop by half over the course of a single day, and yet that’s what the model sees and so there it went.

This is the price one pays for doing naive modeling of time-series data. You make no assumptions about the underlying events or theory, and you get what you get. As the other plots show, that often works amazingly well, and it might still be working well here, but I’m a little suspicious of any projections for Iran’s future cases, if for no other reason than that having seen so many waves of infection means they can expect more totally unpredictable waves in the future. So, perhaps three times as many cases in a month as now, if nothing else weird happens? Or maybe twice, or half, as many as that?

United Kingdom

Another instance of an artificially abrupt transition to a lower-growth regime, with one important difference. In the UK, the values of L in the final population have a nicely bounded range, and it’s got a pretty low upper end, well below the country’s population of 67.9 million. The rf parameter is also well-defined, but relatively modest; the model is doing much of its work with conventional logistic-growth behavior.

UK, 4/5 data: Values of parameter L vs SSE

Looks good to me, even if I’m not intuiting any such population upper limit just from looking at the UK’s numbers thus far. Maybe the evolution of model parameters over 75 generations is seeing something my eye isn’t. I hope so.

South Korea

These residuals are definitely not normal variation. That’s probably because the model spends so much time tracking the very small increases that have occurred in the past month. Unlike us, the South Koreans knew what they were doing, and did it fast. There’s no further slowdown in growth rate ahead, but it is tiny at this point. Looking good!

Singapore

Singapore had a relatively bad day yesterday, with an abrupt jump of nearly twice as many new cases as in previous days. But their numbers of daily cases are still small.

The residuals are normal, but I’m not convinced that there has been any growth-regime transition like the one the model shows in the bottom subplot. It just doesn’t look right. And the model’s deviation from older data gives me pause.

Still, a projection of perhaps 10,000 total cases by early next month doesn’t seem too bad to me, even if it winds up being several times that for some reason the model can’t possibly account for right now.

Finland

Finally, little Finland, included not because it has been severely impacted by Covid-19 but because it’s the land of some of my ancestors and some of my friends. The model isn’t tracking Finland’s data particularly well, with residuals that don’t appear at all to be normal random variation. You don’t need the small p value to see that; just look at the discontinuities in the number of reported cases in the top subplot, and the huge momentary spike in new reported cases on 4/4.

The model projects less than 10,000 cases by May, but I honestly don’t think that projection is worth a whole lot right now. If the data comes down in the next couple days, it might be worth looking at again. Meanwhile, Finns will continue practicing the kind of social distancing they always have, long before anyone ever heard of Coronavirus.

Notes


  1. Every parameter in a model should “earn its keep” by having its own robust independent effect on the fitness of the curve to the data. Of all six parameters, L is the least compelling to keep. Its role in limiting the ultimate pervasiveness of the virus is currently being overshadowed by the growth-reduction aspect of the model. But I’m leaving it in for now, looking for signs of its eventual emergence, because it ultimately will serve a purpose as the pandemic finally heads into its end stage.

  2. The full history of posts, going backward in time, is: Modeling a Slightly Flattening COVID-19 Curve (4/2), Into the Rapids (3/25), Pandemic (3/22), and the original Applying the Logistic Growth Model to Covid-19 (3/19). 

  3. Throughout this blog post “cases” refers to the number of Covid-19 cases reported for the particular country under discussion, as provided by the data freely released to the public by Johns Hopkins University. See their Terms of Use, which I have adopted as my own with of course a suitable change in names.

    I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many people have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past two weeks.

  4. Thanks to my dear friend and former boss Louis J. Hoffman for conveying this suggestion back to me from Prof. Regan, along with permission to give him credit for it. 

  5. It was on the pessimistic side, as the model almost always has been, but by such a small amount in that case as to hardly seem worth mentioning. If you are curious, the model was extrapolating the 4/3 data out to a projected 350,510 cases yesterday. Pessimistic again, but not by much: 14% over two days when the actual increase was nearly a hundred thousand cases. 

  6. This is known as an “initial value problem,” the initial value here being the last known number of cases. You can go either direction in time from the initial value. For fitting the model parameters, my algorithm goes backwards from the most recent data. To extrapolate forward and make projections, it goes forward from the same “initial value.” 

  7. The model anchors to the most recent known data point, which for this discussion is last evening’s Johns Hopkins data (4/5) with its 337,072 U.S. reported cases. Remember, the model’s differential equation xd(t,x) is for the expected number of new cases each day, not the cumulative number of cases x(t). So, to have it project the cumulative number of cases, I integrate the differential equation forward or backward from that last fixed point.

    To have it integrate that differential equation backwards a full month and wind up in the same neighborhood as the mere 217 cases we had then, shrinking its backwards projections by a factor of more than a thousand in the process, seems pretty remarkable to me. 

  8. In my previous post, I wrote “in praise of the hyperbolic tangent function” for nonlinear modeling, and how I’ve used it for electronic circuit simulation. It turns out that tanh(...) is also quite useful for gradually transitioning from initially unrestrained exponential growth to a lower-growth regime resulting from social distancing and quarantine. 

  9. By the last week of February, “he criticized CNN and MSNBC for ‘panicking markets.’ He said at a South Carolina rally falsely that ‘the Democrat policy of open borders’ had brought the virus into the country. He lashed out at ‘Do Nothing Democrat comrades.’ He tweeted about ‘Cryin’ Chuck Schumer,’ mocking Schumer for arguing that Trump should be more aggressive in fighting the virus. The next week, Trump would blame an Obama administration regulation for slowing the production of test kits. There was no truth to the charge.”

    “Throughout late February, Trump also continued to claim the situation was improving. On Feb. 26, he said: ‘We’re going down, not up. We’re going very substantially down, not up.’ On Feb. 27, he predicted: ‘It’s going to disappear. One day it’s like a miracle it will disappear.’ On Feb. 29, he said a vaccine would be available ‘very quickly’ and ‘very rapidly’ and praised his administration’s actions as ‘the most aggressive taken by any country.’ None of these claims were true.” David Leonhardt, A Complete List of Trump’s Attempts to Play Down Coronavirus, The New York Times (March 15, 2020). 

Thursday, April 2, 2020

Modeling a Slightly Flattening COVID-19 Curve

So last year 37,000 Americans died from the common Flu. It averages between 27,000 and 70,000 per year. Nothing is shut down, life & the economy go on. At this moment there are 546 confirmed cases of CoronaVirus, with 22 deaths. Think about that!
—Donald J. Trump (March 9, 2020)
And it hit the world. And we’re prepared, and we’re doing a great job with it. And it will go away. Just stay calm. It will go away.
—Donald J. Trump (March 10, 2020)
TL;DR: A modification to the logistic growth model that provides a smooth transition to a lower-growth regime is fitting the data for U.S. reported cases obtained on April 2 from Johns Hopkins University with errors of less than 5% each day for the past week, and less than 14% for the week earlier. It indicates that there will be around 275,000 U.S. reported cases tomorrow, around 311,000 the day after that, and half a million in a week from now (4/9). According to the model’s projections (which must be treated with caution as for any extrapolation), with exponential growth tempered by “flattening of the curve” proceeding as it has been, the million-case mark will be reached in mid-April. The projection is for a couple million reported U.S. cases by the end of the month. See the Nightmare Plot and all my Covid-19 blog posts for some important development history, context, disclaimer, and even a few thoughts about putting this thing into perspective.
Update, 4/3/20: The exact projection was that there would be 275,799 U.S. reported cases today if the numbers continued as they have for the past two weeks. Today’s number was 275,586. I see no need to update anything else in this blog post, but if you really want to see the latest version of the Nightmare plot, it is here. There has not been even a 0.1% error between the model and the actual numbers for the past three days.

Over the past two weeks, what began as a simple new usage example to include with my free, open-source evolutionary parameter-finding software wound up turning into something of a ghoulish personal challenge: to identify a statistically plausible mathematical model that would fit decently with the exploding numbers of reported Covid-19 cases in my beloved but increasingly broken country, the United States of America.

I’ve made lots of refinements to the code and its underlying modeling over that time, and discussed them, along with some projections (accompanied by important disclaimers, which also apply here), in my blog posts of March 19, March 22, and March 25. Along the way, I tried several different approaches to modeling this thing: the logistic growth model, the power law with exponential decay, and even a linear combination of the two, always accompanied by a small linear term.

What increasingly intrigued and frustrated me, though, was the inability of any of those models to deal with a slight “flattening of the curve” that has become apparent in the U.S. national numbers of recent days. Fitting the logistic growth model xd(t,x)=x*r*(1-x/L) to the data resulted in an unrealistically low value of L, because that’s the best such a simple model could do with the recent reduction in growth. The power-law model with exponential decay1 xd(t)=a*(t-ts)^n*exp(-(t-ts)/t0) provided interesting results that didn’t seem unrealistic and that proved pretty accurate at predicting the next few days’ reported case numbers. But there was no evidence of any realistic value being found for ts to implement the “decay” part of the model, and so the values beyond just a few days out had a very wide span of uncertainty.
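
For reference, a minimal sketch of that power-law-with-decay form (parameter values here are purely illustrative):

import numpy as np

def xd_powerlaw(t, a, ts, n, t0):
    # New cases per day: power-law growth in (t - ts), damped by exponential
    # decay with time constant t0; zero before the shifted start time ts
    dt = np.maximum(np.asarray(t, dtype=float) - ts, 0.0)
    return a * dt**n * np.exp(-dt/t0)

print(xd_powerlaw(np.array([35.0, 50.0, 70.0]), a=5.0, ts=30.0, n=2.0, t0=25.0))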

I tried a linear combination of the two models, but that meant a lot of degrees of freedom. The combined model valiantly tried fitting itself with all its parameters to an anomaly in the US reported case data from 3/16 to 3/18, and offered me even less confidence than usual for making extrapolations.

But today an idea occurred to me to try, and it worked. The result is a logistic growth model that does a smooth transition from an earlier growth rate to a later, lower one:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1

There are six parameters:

L is the maximum number of possible cases, which at worst case is limited by the country’s population.

r is the initial exponential growth rate, before any curve-flattening becomes apparent.

rf is the fractional reduction in growth rate at the conclusion of the lower-growth (flattened curve) regime.

th is the time interval (in days) over which the middle half of a smooth transition occurs from the original higher-growth behavior to a fully flattened curve with the lower “flattened” growth rate r*rf.

t0 is the time (in days after 1/22/20) at the middle of the transition from regular logistic-growth behavior to a fully flattened curve.

b is a very small constant number of new cases per day. This is mostly there to zero out the mean of the residuals so that I could get decent goodness-of-fit indicators for the model.2
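
Pulling the pieces together, here is a rough sketch of how those six parameters might be fitted to a daily new-case series. It uses scipy’s differential_evolution and solve_ivp as stand-ins for my ade package and covid19.py; the data, bounds, and settings are placeholders, so don’t read anything into the numbers it produces:

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import differential_evolution

def flatten(t, rf, t0, th):
    return 0.5*rf*(1 - np.tanh(1.1*(t - t0)/th)) - rf + 1

def xd(t, x, r, L, rf, t0, th, b):
    return r*x*(1 - x/L)*flatten(t, rf, t0, th) + b

def sse(params, days, new_cases, t_anchor, x_anchor):
    # Integrate backward from the anchor (most recent known count), then compare
    # modeled vs. actual daily new cases after the square-root transform
    sol = solve_ivp(xd, (t_anchor, days[0]), [x_anchor], args=tuple(params),
                    dense_output=True)
    x_model = sol.sol(days)[0]
    new_model = np.maximum(np.diff(x_model), 0.0)
    return float(np.sum((np.sqrt(new_model) - np.sqrt(new_cases))**2))

# Placeholder data: day numbers since 1/22/20 and a fake exponential-ish series
days = np.arange(44.0, 75.0)
cumulative = 300.0*np.exp(0.30*(days - days[0]))
new_cases = np.diff(cumulative)

bounds = [(0.1, 0.6),    # r
          (3e6, 3e8),    # L
          (0.0, 1.0),    # rf
          (40.0, 90.0),  # t0
          (2.0, 30.0),   # th
          (0.0, 50.0)]   # b
result = differential_evolution(sse, bounds,
                                args=(days, new_cases, days[-1], cumulative[-1]),
                                maxiter=30, popsize=10, seed=1, polish=False)
print(result.x)   # evolved (r, L, rf, t0, th, b)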

———

Here is the result, my latest (and perhaps last?) revision to what I’ve been calling the Nightmare Plot. The image was produced by the latest commit of the covid19.py Python script using data released this evening from Johns Hopkins University. It’s an important image, so go ahead and click on it to see it at full resolution, or even download it if you want.

April 2, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Extrapolating forward (admittedly a perilous enterprise, but I’m curious and so are you), the model is projecting half a million cases one week from today, and over a million the week after that. The end of April is looking pretty scary indeed, with the potential for one in every hundred Americans to be reporting infection with Covid-193 and the numbers still continuing to climb.

Logistic Growth with Growth-Regime Transition

If it weren’t for the terrifying reality of what is represented by the numbers it spits out, I would be delighted with the beautiful simplicity and performance of this new model. It adds an important feature to the logistic growth model that is well accepted for population-limited reproduction. And it’s a feature uniquely suited to what’s happening right now with the Coronavirus.

This new feature I’m calling “growth-regime transition.” It smoothly reduces the growth rate from its initial unrestrained amount to a lower one. Thus it can account for recent efforts of non-idiot governors and their less ignorant citizenry to “flatten the curve” by staying home and being careful and apart when they have to go out.

And that’s exactly what it’s doing, to a remarkable degree of precision. Even as the numbers have multiplied fivefold since a week ago, the model tracks each day’s cumulative numbers to within 5%. Going back a week further to when there were about 1/70 as many cases (3,499) as today, the model is no more than 13% from each day’s actual numbers. Getting that close that many times in a row with just six parameters, and with the numbers doubling six times, feels a little spooky to me.

It’s not just my gut feeling that indicates this model is really onto something. The residuals (a fancy name for the difference between what the model said a day’s numbers should have been and what they actually were) are shown in the middle of the Nightmare Plot with some information about them. There is no statistically significant evidence that the errors are anything but normal random variation. There may be a little heteroskedasticity to them, but I think they pretty much look like residuals should. The square-root transform I’m applying to the modeled and actual data before calculating the sum of squared errors mostly seems to be dealing with the huge change in magnitude of the numbers over time.

The middle plot includes a computation of the model’s corrected Akaike information criterion (AICc) score. It’s -8.62 for this “logistic growth with growth-regime transition” model, lower and thus better than for any of the other models I’ve tried. Operating on the same dataset, the AICc was around zero for the 7-parameter linear combination of logistic growth, power law, and linear (projecting 460,791 cases in one week). It was +3.60 for the logistic growth plus linear model alone (projecting 384,264 cases in a week).
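
I don’t know the exact AICc convention covid19.py uses (definitions differ by additive constants), but the common textbook form under the assumption of normally distributed errors looks like this:

import numpy as np

def aicc(sse, n, k):
    # AIC = n*ln(SSE/n) + 2k, plus the small-sample correction 2k(k+1)/(n-k-1)
    aic = n*np.log(sse/n) + 2*k
    return aic + 2*k*(k + 1)/(n - k - 1)

print(aicc(sse=10.0, n=30, k=6))   # e.g., six parameters fitted to a month of daily values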

The power-law model with time-shifting, exponential decay, and a linear term did pretty well as the runner-up to the new one. Its AICc was -4.58 with today’s Johns Hopkins data, and it is no more than 8% from the actual numbers going back to 3/25. It has nice random-seeming residuals with no evidence whatsoever of non-normality. But it only tracks recent data, quickly deviating from what actually happened when you go more than a week back. For what it’s worth, it projects 503,435 cases a week from now.

———

Another promising aspect of the model: Its parameter values lie in nice rounded distributions with respect to the sum of squared error, indicating that they (or most of them, L being a stubbornly hard one to identify at this still-early stage) are all earning their keep.

Here is the distribution of both parameters L and r for the logistic-growth aspect of the model, plotted against SSE (sum of squared error, after the square-root transform is applied to modeled and actual data values). The black dots represent parameter values that challenged some member of the population at some point and failed. The blue dots are values that were part of the population at some point in the 50-200 generations that the differential evolution algorithm ran, but got replaced by somebody better. And the red dots are members of the final population when the evolution algorithm terminated.4

It’s hard to make much of L except that it’s probably a bigger number than any of us would like. The initial growth rate r is very well defined, though, at an increase of around 35% per day.

Logistic growth parameters for April 2 data

The parameters rf (r-flattened), th (half-transition time), and t0 (transition midpoint) are all very nicely defined. The growth rate transitions to about 68% of its original value, with the middle (non-tail) half of the transition taking place over the course of a little over six days centered around 63 days after 1/22/20, or around March 24. The linear term b (actually, a constant number of new cases per day) is tiny–tending toward zero–and would be omitted except that it allows the residuals to be centered around zero without impacting the other parameters.

Growth-rate reduction and linear parameters for April 2 data

In Praise of the Hyperbolic Tangent Function

As I’ve explained in previous posts, my programming work in regular life (I’m a retired electrical engineer and patent agent turned hobbyist open-source software developer) was, as of about a month ago (it seems more like a year), wrapping up a two-year-long project that does advanced simulation of power semiconductor circuits. I’ve developed a model for power MOSFETs that evolves optimal values of its 40-odd parameters using an asynchronous version of the differential evolution algorithm that I developed for that project.

A function I have grown to know and love from that work is the hyperbolic tangent. It plays an important role in electronic circuit simulation software to provide smooth transitions from one state to another. You can’t expect software that does stepwise integration to behave well when you suddenly change a zero to a one. The integrator wants to follow a smooth path. If nature abhors a vacuum, then electronic circuits–real or simulated–abhor discontinuity. You have to go gradually from one voltage or number to another, even if “gradually” involves a transition time measured in the billionths of a second.

When it comes to “flattening the curve” of a deadly pandemic by incarcerating ourselves for weeks on end, my layman’s hunch was that the transition time would be in days. Well, it was.

According to what evolved as the best-fitting combination of the six model parameters, the rate of growth in reported U.S. Covid-19 cases transitions from a 35% daily increase to around 23%, with the middle half of that transition taking place over the course of six days. The middle of that transition was on March 24. Look at the closeness of fit around that time, and afterward. There’s not even a tenth of a percent error in what the model says yesterday’s numbers should have been. The model’s worst error in the past four days, as the numbers have doubled, has been 1%.

Previous Projections

On March 19, I made my first public post about my modeling efforts. That day, the cumulative number of reported cases of Covid-19 in the United States was 13,677. That is according to data released each day by Johns Hopkins University, on which I have been basing all of my modeling work.

When I refer to “cases” below, it is from this same data set, provided as a CSV file from Johns Hopkins. They’ve done the public a great service by making such a valuable resource freely and openly available. I’m releasing everything from my own 70+ hours of amateur work to the public as well, in a small way of following that spirit.5 The Nightmare Plot is dedicated to the public domain; you can copy it without restriction, though I hope you will leave the identifier there that urges reference to my disclaimer language.

I was using a non-differential form of the logistic growth model, with four parameters: an upper limit L, the growth exponent k (I now call it r, because the programmer in me hates using k for anything but an index variable), plus two of my own, a time-shifting parameter t0 and a linear term a (I now call it b). The function for the modeled cumulative number of total cases on a given day t (in units of days after 1/22/20) was f(t)=L/(1+exp(-k*(t-t0)))+a*t.
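
In code form, a sketch of that earlier four-parameter model is simply:

import numpy as np

def f(t, L, k, t0, a):
    # Cumulative reported cases on day t (days after 1/22/20): a logistic curve
    # saturating at L, time-shifted by t0, plus a small linear term a*t
    return L/(1 + np.exp(-k*(t - t0))) + a*t

print(f(68.0, L=5e5, k=0.25, t0=70.0, a=10.0))   # illustrative values only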

With those parameters evolved for a best fit with the 3/19 data, using the well-known differential evolution algorithm implemented by my ade Python package, that model predicted that there would be 429,414 cases today. It was 76% too pessimistic; we “only” have about 18 times as many cases as then, not 31.

On March 22, a week and a half ago when there were 33,272 U.S. reported cases, I posted again, this time about results from modeling with a linear combination of the regular logistic growth component, a power-law component with exponential decay, and the linear term (constant differential) b fitted to that day’s Johns Hopkins data. The combination model projected that there would be around 350,000 cases today. That was around 40% too high.

Thankfully, there has been some flattening of the curve since then.6 I wasn’t really seeing it show up in my other models, though. My most recent blog post before the one you are now reading, from March 26, was still pessimistic, with a projection of 359,169 cases for today.

I was waiting and hoping for the model to continue to be shown pessimistic due to some “fancy theoretical factors that I didn’t even bother trying to understand finally emerge in a few days (hell, how about tomorrow?),” and for that curve to finally get “its long-awaited downward bend.” That has happened to some extent, and hopefully the open-source code that runs my latest (and hopefully last!) attempt at a model addressing it will continue to update to an even more lowered growth rate. Making pretty plots of that happy circumstance is left as an exercise to the Python-savvy reader.

A Final Note

Awful implications aside in a time of global crisis, there is a certain illicit thrill to doing my own naive research and thinking in a field supposedly best left to the experts, enjoying the faint flicker of possibility that I might be seeing the equations slide into place in a way that all their formal training somehow missed.

But this is as good a time as any to repeat, yet again, that I have no expertise in biology, medicine, or infectious disease. See all the disclaimer stuff in my previous posts. I’m just a retired engineer who has always had a passion for modeling interesting data. I’m really looking forward to releasing a free, open-source Python package that does some really cool modeling of circuits using power MOSFETs, and I hope that makes the free Ngspice circuit simulator a lot more interesting to people.

Also worth repeating:

The model is just the integration of a first-order differential equation that can be extrapolated to predict reported U.S. cases of Covid-19 if the reported cases continue as they have for the past two weeks.

Be well. And, please, as a fellow citizen who is more worried right now about the health of our Republic than that of my body, remember when you vote in November how much your President fucked things up and how he will have doubtless continued to behave like an idiot wannabe dictator in the months ahead.7 Personally, as I contemplate a near future consisting mostly of hiding away from friends, family, and neighbors to avoid a disease that otherwise could very well drown me in my own lungs, I am not quite yet sick of winning.

Notes


  1. Modified by me with an additional ts parameter. It shifts the start of both growth and decay to some best-fit date well after the first reported case in the Johns Hopkins data set, on 1/22/20. 

  2. With every single model I’ve tried thus far, including this latest and hopefully final one, the parameter evolution favors smaller and smaller values of b tending toward zero. 

  3. The careful reader may notice that I always refer to “reported U.S. cases” or some alternate variation and wording. I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many of our fellow citizens have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past week or two.

  4. You can create these plots for yourself from the .dat files in the ade/examples directory, or from a .dat file you generate yourself. Just run the pv command that gets installed when you run pip install ade. There are some important command-line options; run pv -h for details. 

  5. My ade Python package of which the covid19.py file is a part, is free to download, run, modify, and incorporate into other work under the terms of the Apache license.

    The terms of use for the Johns Hopkins data (and I expressly incorporate by reference their disclaimer language into this and all my other Covid-19 related blog posts, obviously with my name substituted for theirs) are as follows:

    “This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.” 

  6. At that point, I had not transitioned to modeling the number of new cases each day and then integrating the model’s first-order differential equation into a plot of cumulative cases each day. Instead, a nonlinear regression version of the logistic growth model was fitted to cumulative case numbers. 

  7. To say nothing of his pathological narcissism being the driving force for his every decision, from downplaying the significance of the pandemic just weeks ago and unleashing the wave of infection now emerging in MAGA country, to his demands that governors be nice to him before getting much-needed (and taxpayer-funded) federal assistance. 

Wednesday, March 25, 2020

Into the Rapids

To every thing there is a season, and a time to every purpose under the heaven.
—Ecclesiastes 3:1

Update, March 26

No, I couldn’t help myself; this blog post reflects an update done March 26.

I am trying to kick the Covid-19 data modeling habit. It’s just not good for my mental health, or probably yours, at this point. If you have been taking the magnitude of this disaster seriously, keep doing so. What more, really, is there useful to say?

Anyhow, in a moment of weakness I put this evening’s Johns Hopkins data into the model-parameter evolver and turned the crank. And out came the latest update to the Nightmare Plot, which I am putting right here at the top, and leaving the one with yesterday’s where it is.

March 26, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Today’s number was 83,836 vs the 82,716 I’d predicted yesterday. The model was 1.4% off, and on the optimistic side this time. The residuals of modeled vs actual data do not fan out and are not significantly outside a normal distribution. A previous run of the same model with parameters evolved to fit data from the day before yesterday predicted 89,614, which was pessimistic by 6.9% over the course of two days’ extrapolation.

The universe is not yet honoring my request to be proved wrong with the curve bending down soon.

Back to March 25

Since the previous blog post, I did some work on my modeling of reported U.S. Covid-19 cases, ran the scary new example for my evolutionary parameter-finding algorithm on that evening’s data from Johns Hopkins, and posted a plot of the results on Facebook. An updated one with today’s data is here (click on it for the full-size version):

March 25, 2020: Reported U.S. Covid-19 cases vs days since 1/22

After discussing some technical details1 of the day’s work, I gave this brief summary about what the model is predicting for numbers of reported Covid-19 cases in the U.S.: Tomorrow, around 66,000. Day after that, around 84,000. Reaching 100,000 the day after that (3/27). A million U.S. reported cases around April 7, doubling every 4-5 days for a while until the virus starts to run out of people to infect.

As I write this, the Worldometer Coronavirus page reports 65,797 cases in the United States.2 I don’t need to tell you how close that is to what the model was predicting yesterday.

The astute observer will be able to visually extend the red line and note that my model predicts more U.S. cases than people in the U.S. around the middle of May. Obviously, that’s not going to happen. It’s a limitation of my modeling that it doesn’t account for population of the country or region.

I’ve limited the amount of extrapolation I bother to show or talk about, since extrapolations can be uncertain even when the underlying model is well understood theoretically. But the obviously impossible prediction beyond the right edge of the plot does have a practical meaning. And it is an important one: I currently don’t see any sign of a slowdown appearing in the data. I wish I did.

Any refinements made to the model to account for such real-world practicalities as a country’s population would deviate from what I’ve been saying about what it actually does: simply predict what will happen if the reported-case data continues as it has, for about two weeks now. (Yesterday, there were slightly fewer newly reported cases than the day before, an obvious anomaly in the data.)

So, what I am seeing today is what I saw days ago already, and it terrifies me. The upward march of numbers continues, along a faint path increasingly visible on a cold gray mountain full of death.

Model Parameters

A couple of points are worth mentioning about model parameters. There is still no plausible upper bound to L, the limiting number of reported cases for the logistic-growth part of the model. Weirdly, there seems to be no plausible lower bound, either; some parameter combinations in the final population have almost no logistic component. Not something I’m very comfortable with, but there is clearly an exponential component to this, and the best few parameter combinations in the final population have L values ranging from 80 million to approaching the full U.S. population.

Here is the final population of both parameters L and r for the logistic-growth component of the model, plotted against SSE (sum of squared error, after the square-root transform is applied to modeled and actual data values):

Logistic growth parameters for March 26 data

The power-law with exponential decay component has its own parameter weirdness worth noting, too. The value of the power n in the function xd(t)=a*(t-ts)^n*exp(-(t-ts)/t0) is considerably lower than what Ziff and Ziff found for a power-law model by itself.3 Indeed, its best-fit values seem to be far less than even linear. Honestly, I’m not sure what to make of that. A very gradual increase in the number of newly reported cases each day over time, perhaps due to improved testing? Again, interpretation is left up to you.

Power-law parameters for March 26 data

Finally, the poor oddball constant term b is still there, just barely, adding its tiny fixed number of new cases per day. Against my own intuition, leaving this term out noticeably worsens the goodness of fit. I believe its utility is in reducing the SSE contribution made by early data points in the time series, allowing a better fit for the exponential and power-law components when things finally start ramping up.

Constant parameter for March 26 data

If you are comfortable with Python and data modeling, I encourage you to clone yourself a copy of the repo for ADE on GitHub, install the package with “pip install -e ade”, modified as needed.4 Run the newly installed ade-examples script, and then run “./covid19.py US” from inside the “~/ade-examples” directory that the script creates. Then you can look at the parameter values that get evolved using the pv command, like this: “pv -r 1.5 covid19.dat”.

The Fantasy of Normalcy

This afternoon [March 25], I had to go out for three unavoidable errands. I hope this is the last time I have to do that for a little while. One of my stops in the hot zone that our world is fast becoming was to the local grocery store. I sat in the parking lot and pulled on a fresh pair of latex gloves, and paused putting my respirator on my head for a moment, because I was seeing something disturbing.

It wasn’t that things were different, it’s that things were too much the same. Here I was, with every reason to believe that, on this day next week, there would be 300,000 Americans reported as infected with an insidious hidden disease that kills one out of every hundred or so who get it, a disease that puts over a dozen of those into the hospital gasping for breath. That there would likely be over a million reporting it the very week after, with the numbers still climbing fast and many more people not yet or not ever getting their own infections included in these numbers.

So I sat there a while with my well-used dirt-stained herbicide respirator5 in my lap, preparing to encounter in all likelihood at least one person carrying this thing around inside that store. Today’s person feeling fine though I sure have been tired the last couple days is next week’s positive test result.

I put on the mask, stepped outside, picked up my three plastic snap-lid storage bins and carried them into the store past the incredulous faces of people just walking around getting stuff. I stood apart from others and waited as they got their shopping carts and then got mine. I put the three plastic tubs in the cart, waited for the old man–looks like you’ve got about an 85% chance6 of surviving a case of this, gramps–to move along more than 6 feet away from me. I headed straight back to the department with the things that we decided we really couldn’t do without, pausing to give everyone a wide berth and just not giving a fuck what they thought about my respirator mask or gloves.

There were two checkout lines, both full. I just stood there projecting an invisible sphere of stay the fuck away from me and the whole man-from-mars appearance. Those two social distancers worked wonderfully except for one not-young woman who I had to outright tell, “Please move away.” And I don’t think she was even into me.

My muffled Martian voice came through the mask, “I’d like to box up my own.” The terrified-looking cashier handed me my stuff as she scanned it and I put it in my plastic tubs, with a loud snap each time I shut the lids. Those tubs never touched the floor.

No, I’m not a rewards club member. Thanks for that receipt that I’m going to crumple in my gloved hands and throw away. Have a nice day.

I unlocked the Jeep and got out a plastic can of sanitizer wipes. I opened the back, and picked up one tub at a time from the cart, wiping it down before setting it into the Jeep. I closed the back and then sanitizer-wiped all of the handles I’d touched with my gloved hands. Then I opened the door again, took off my windbreaker, and stripped down to my swim trunks that I was wearing instead of underwear. The windbreaker and my pants and shoes went into a plastic tub all their own.

I carefully removed each glove, being careful not to touch anything but the edge dangling loose around my wrist, and dropped them into the tub with the dirty clothes. Yes, folks, there is a 50-something-year-old man wearing swim trunks in March in your friendly neighborhood grocery store parking lot. Snap. Lid closed.

Then I put on a fresh pair of pants that were in the Jeep along with a fresh pair of shoes, and finally then sat in the driver’s seat.

The other two stops were much easier. Our local pharmacy clearly understands what’s happening better than the folks at the grocery store, because they had a sign outside inviting customers to take curbside delivery of their medications. And that’s what I did. The pharmacy clerk waved from the passenger side of my Jeep, I unrolled the window and extended an open Ziploc bag for him to drop the prescription bag into. We urged each other to take care and I didn’t even feel like I needed a glove on the hand that held the bag (mine, that enclosed his).

You’re probably wondering what happened with the groceries that were inside those plastic tubs. Answer: bleach solution and rags, followed by plain water and more rags to rinse off the bleach. There were a couple of nonperishables that we aren’t going to need for a while, and with no freezing temperatures in the forecast for a few days, they’re going to stay outside.

Here’s how I explained all of this to my college-age daughter, who I believe has gotten just a little bit less scornful of her doomer father’s pronouncements in recent days: Tonight, I will not lie awake wishing I hadn’t been so insanely careful.

Here’s what remains with me, though, instead of anxiety about having picked up Covid-19 this afternoon. (I really don’t think so.) I am dealing with the angry dreadful realization that my conservative, backwoods community either doesn’t know or doesn’t care about what is heading their way. All those people, casually walking around Safeway, one pair of gloves to be seen anywhere besides my own. And then there was me, marching in there with a cart of plastic tubs, wearing that neon-pink industrial respirator mask and surgical gloves like something you’d be scared of even without knowing that a pandemic was just starting to cripple your country.

I’ll say it again: Today’s person feeling mostly fine is next week’s positive test result. And next week I am expecting7 there to be around 300,000 of those positive test results. Going beyond the model for a moment and just talking out of my ass, I’d bet there will be twice that many walking around feeling mostly fine if starting to worry that maybe this whole Coronavirus thing isn’t a Democrat hoax after all.8 And they will spawn, mostly without knowing, the following week’s new ever-larger batch of positive test results.

This is as good a time as any to repeat, yet again, that I have no expertise in biology, medicine, or infectious disease. I’m just a retired engineer who has always had a passion for modeling interesting data. I’m currently about to release a free, open-source Python package that does some really cool modeling of power MOSFETs. See the disclaimer stuff in my previous posts.

Also worth repeating: The model is just the integration of a first-order differential equation that can be extrapolated to predict reported U.S. cases of Covid-19 if the reported cases continue as they have for the past two weeks. That said, I feel obligated to share one layman’s thought that gives me hope.

Perhaps the testing is accelerating faster than the virus’s replication and thus the model is being overly pessimistic in terms of real cases. In this cheery scenario, the testing is really taking off lately, lots more people being tested every day, and so yes there are more reported cases. But then the testing will start just humming along efficiently and the number of reported cases won’t keep inflating itself so fast.

There is, unfortunately, a dark alternative scenario: The testing is unable to keep up with the true replication of the virus. Yes, an accelerating number of new tests is finally appearing, but it’s not keeping up with how fast people are getting infected. Hospitals get overwhelmed, doctors stop bothering to test. And my stupid scary model winds up underestimating how fast we actually are getting to Armageddon.

This, gentle reader, is why (1) I have stuck to just predicting numbers of reported cases and leaving the rest to you, and (2) I am eager to conclude this little modeling project. For my own mental well-being in an anxious time.

Farewell

After I took my shower and finally felt really clean, after the doorknobs were all wiped down with sanitizer wipes and the food was put away, I started doing “just a little more” work on my modeling code. Then I took a break from it and saw tonight’s total for US reported cases: 65,797. That’s 99.6% of what yesterday’s model and data had predicted.9

There is a weird unsettled feeling involved with seeing a predictive tool one has developed–not without criticism from a few self-proclaimed experts on Reddit–make a prediction so uncannily close to the mark.10 It’s a slowly roiling fog of horror with jarring bright spots of pride, and the guilty unease at the fact that I am seeing lights in there at all at a time like this.

This has happened before, on March 23. Then I wrote on Facebook,

the Worldometer Coronavirus update currently lists 43,734 U.S. cases and my model had predicted 43,639 reported U.S. cases for today’s (still nonexistent) Johns Hopkins numbers. This sort of uncannily close tracking with the data leaves a weird sense of anxious conflicted satisfaction in my mind.

It would be so much better for everyone, including myself, if I were instead being proven wrong. Feeling naive and embarrassed would be a pretty small price to pay to see the curve bending downwards faster than all my modeling predicted. Unfortunately, it still isn’t.

Take that, I find myself thinking of the random Internet guy who pooh-poohed my modeling as naive, unfounded, something only someone not properly educated in the relevant area would do. And then I look at the parts of the plot further to the right and realize what continuing to be right means.

So, let me say something for the record, to critics and fans and that little semi-living armored glob of RNA that has found a really effective way for it to propagate copies of itself: I’m totally fine with my model being full of shit for everything that happens from now on. Really, I can deal with it. I get it–it’s completely naive to try to extrapolate the integration of a nonlinear differential equation from cases already reported to what cases might be reported in the future. A charming if silly and perhaps even a little egomaniacal exercise on the part of a retired engineer with an obsession for developing nonlinear models for complicated things.

When the fancy theoretical factors that I didn’t even bother trying to understand finally emerge in a few days (hell, how about tomorrow?) and that curve finally gets its long-awaited downward bend, I will sit and watch movies and feel sorry for myself and the embarrassing way I’ve never quite managed to grow up and leave things to the experts. But it’s okay, and when we top out at maybe a couple hundred thousand U.S. cases and then the kids go back to school, I will immerse myself in weightlifting or something and trying to forget what an idiot I can be sometimes.

Next month, over the Sunday morning breakfast table with some friends, I’m looking forward to having a laugh about how silly we were all being. Please be gentle with my own ego when it happens, and fervently hope that it really does.

Meanwhile, I’m done here. The time window for these sorts of projections to make a difference is closing fast, even for those who are paying attention to them. It may have closed already. Those people walking around the grocery store will not have their fates altered by something I write tonight, or whether I decide right now that it’s best for my own sanity not to trouble myself to write about this anymore.

Be well. And, for God’s sake, please do not vote for this same idiot narcissist failure of a President to stay in the White House next year. If that needs any explanation at all in your mind at this point, you need to pay attention to a lot more than just my amateurish attempt at Covid-19 modeling.

Notes


  1. Instead of weighting later samples higher in the sum-of-squared error calculation for the fitness function, I applied a square root transform to the number of new cases per day each day. That somewhat mitigates the way exponential growth puts so much more emphasis on the most recent days’ data, while still allowing me to study the residuals of modeled vs. actual daily new cases. (A minimal sketch of this transform follows these notes.)

  2. The Johns Hopkins total came in at 65,778. Running the model again when the new data became available after the original writing of this blog post did not result in different enough projections to warrant anything but a couple of footnotes and of course an update to the Nightmare Plot. 

  3. Anna L. Ziff and Robert M. Ziff, “Fractal kinetics of COVID-19 pandemic (with update 3/1/20).” They fit to an exponent just a bit greater than 3. Perhaps it shouldn’t be surprising that the exponent would fit much lower than that in combination with a logistic growth component, but I still wouldn’t have expected n (what Ziff and Ziff call x) to be down in the cube/quad root range.

  4. The last argument, ade, can be changed to whatever directory your cloned repo is in, i.e., wherever the setup.py file for the package resides.

  5. A typical Spring (not this one!) has me outside spraying noxious invasive weeds with a solution of 2,4-D herbicide. The idea of a mere plant being “invasive” seems a bit quaint right now, though I’ll surely keep going after my remaining holdouts of spotted knapweed and St. John’s wort for many a Spring to come.

  6. Based on some stats provided here, the death rate for people 80+ years old is 14.8%. And, by the way, gramps, it is 60% higher for men than for women. 

  7. See my previous posts for context and disclaimer. Of course you knew I’d eventually say that somewhere. 

  8. Yes, the fucking moron really did call it that: “The Democrats are politicizing the coronavirus . . . This is their new hoax” (Feb. 28, 2020). 

  9. The Nightmare Plot from my March 22 post was predicting just over 70,000 for 3/25, a bit higher than it turned out to be. This is in accordance with the overall trend (a good one!) of the model being proved slightly pessimistic in its extrapolations. But, before you go celebrating some history of erroneous pessimism on my part, the plot from my March 19 blog post optimistically predicted (with an earlier regressive model of cumulative case numbers, rather than the differential equation approach I now prefer) only around 51,000 cases.

  10. See my Reddit posts yesterday and today, referencing this blog post. Now, if I post a comment to that blog referencing this footnote, do I have to update this footnote to reflect that? And then post another Reddit comment? And so on? 
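For anyone puzzling over note 1, here is roughly what that square-root transform amounts to. This is a minimal sketch in plain Python/NumPy, not the actual code in covid19.py, and the function name is mine:

import numpy as np

def sse_sqrt(actual_new_cases, modeled_new_cases):
    # Compare daily new cases in square-root space so that the enormous counts
    # of the most recent days don't completely swamp the sum-of-squared-error
    # value the way they would on the raw scale.
    residuals = np.sqrt(np.asarray(actual_new_cases)) - np.sqrt(np.asarray(modeled_new_cases))
    return float(np.sum(residuals**2))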

Sunday, March 22, 2020

Pandemic

I must die, and must I die groaning too?–Be fettered. Must I be lamenting too?–Exiled. And what hinders me, then, but that I may go smiling, and cheerful, and serene?
—Epictetus, Discourses.
Some pandemics are mild. But some are fierce. If the virus replicates much faster than the immune system learns to defend against it, it will cause severe and sometimes fatal illness, resulting in a pestilence that could easily claim more lives in a single year than AIDS did in 25 [years]. Epidemiologists have warned that the next pandemic could sicken one in every three people on the planet, hospitalize many of those and kill tens to hundreds of millions. The disease would spare no nation, race or income group. There would be no certain way to avoid infection.
—“Preparing for a Pandemic,” W. Wayt Gibbs and Christine Soares, Scientific American 293(5), 44-54 (Nov. 2005).

TL;DR: A very good fit between data obtained on March 19 from Johns Hopkins University and a logistic+linear growth model indicates there will be nearly 60,000 reported cases of Covid-19 in the United States on March 24, around 300,000 cases eight days after that (4/3), and several million cases around mid April, with the numbers doubling every 5-6 days or so for at least another couple weeks. See the Nightmare Plot and the Disclaimer below.

Update, 3/22, 6:25 PM PDT: With the latest data from Johns Hopkins this evening, the latest Nightmare Plot is not different enough from the one originally included with the post to warrant editing the post text. It’s been a long day spent with Covid-19. Today’s reported cases came in at 33,272, not significantly off from what the model had projected with yesterday’s data. The refined model parameters have caused a slight reduction in projected numbers more than a few days out, though not by much. If anything, it continues to be very disturbing to see one’s modeling borne out by the relentless upward march of reported cases. Time for a break from this now.

I worked much of yesterday and this morning on a more sophisticated modeling approach than in my previous post: integrating a differential equation f(t, x) for the number of new cases per day, rather than modeling the total number of reported cases each day directly.

Running the updated code with Johns Hopkins University data published yesterday (3/21) resulted in an updated Nightmare Plot.

Reported U.S. Covid-19 cases vs days since Jan. 22, 2020

Now, this is one really important plot. It shows up way too small in this blog post for you to be able to see its important details. So please click or tap on it to open it as an image by itself.

Please bear with me as I largely repeat a few paragraphs from the previous post. The upper subplot shows the best-fit logistic growth model in red, with the actual number of cumulative reported cases of Covid-19 in blue. The error between the model and the data is shown with each annotation. Look how small the residuals are compared to the exponentially rising numbers of cases. It’s a scarily impressive fit.

The lower subplot shows the number of cases expected to be reported over time, slightly in the past and then extrapolating to the future. Two hundred and fifty generations of running a differential evolution algorithm1 resulted in a 120-member population of combinations of parameters for the model.

I could have terminated the algorithm sooner, and then there would be some visible variation in the extrapolations. But I decided to just plow onwards for five times as many generations to be more certain of finding something really close to optimal. The black dots show expected reported-case values for each separate parameter combination in the 120-member population, plotted in bunches around each day from tomorrow out to mid-May.
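If you’re curious what that kind of parameter search looks like in code, here is a rough, self-contained sketch using SciPy’s built-in differential_evolution rather than my ade package (which does the same sort of evolutionary search, asynchronously). The toy objective, data, and bounds are made up purely for illustration; they are not the model or the values I actually used:

import numpy as np
from scipy.optimize import differential_evolution

# Placeholder daily new-case data; the real runs use the Johns Hopkins numbers.
days = np.arange(60)
new_cases = np.exp(0.2 * days) + 5.0

def fitness(params):
    # Sum-of-squared error between a toy exponential model and the data.
    # In the real fit, params would be the model's parameters instead.
    c0, k = params
    modeled = c0 * np.exp(k * days)
    return float(np.sum((new_cases - modeled)**2))

result = differential_evolution(fitness, bounds=[(0.1, 100.0), (0.01, 0.5)],
                                popsize=20, maxiter=250, seed=1, tol=1e-8)
print(result.x, result.fun)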

Significantly, the subplots both have a logarithmic y-axis. Exponential growth is linear when viewed through a logarithmic lens. When you see that straight line marching steadily upward toward those massive numbers, you really want all your modeling to wind up an embarrassing public failure.

More Power to You

Now, the model includes:

  1. a logistic growth component, as before,

  2. a power-law with exponential decay component, as suggested by Anna L. Ziff and Robert M. Ziff (“Fractal kinetics of COVID-19 pandemic (with update 3/1/20)”),

  3. and a linear component with a small constant number of new cases being reported per day, which only helps improve the closeness of fit early on.2

I tried Ziff and Ziff’s approach by itself but was not impressed with the closeness of fit to the data thus far. This thing is still very much exponential.

With exponential growth, the increase is not some more-than-squared function of time but of the case count itself. When the number of cases grows exponentially, as it has been in the U.S. for about two weeks now, the rapidly increasing number of reported cases feeds on itself. Infected people result in infected people, who then result in still more infected (and infectious!) people.

A power-law approach is only nonlinear in time, not in itself. Sure, the number of new cases will increase dramatically as the days go on, this model says. But it won’t be feeding on itself. The increase is just a function of time passing, like the days suddenly getting longer in Spring. It’s not like an exponential forest fire where what is being consumed also takes its turns consuming.
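A tiny toy comparison (not the model, just an illustration) makes the difference concrete: the power-law curve grows only because the day number grows, while the exponential curve grows because yesterday’s case count grows it.

# Toy numbers only: 25% daily exponential growth vs. cubic-in-time power-law growth.
for t in range(1, 11):
    exponential = 100 * 1.25**t   # feeds on itself: each day is 1.25x the day before
    power_law = 100 * t**3        # depends only on how many days have passed
    print(f"day {t:2d}: exponential {exponential:10.0f}   power-law {power_law:10.0f}")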

I made a good-faith attempt to switch covid19.py to the Vazquez (2006) “power-law with an exponential cutoff.” Any petty pride a data modeler might feel at having his first instincts bettered was pushed aside in the hope that it would prove more accurate and perhaps less scary than the logistic growth model I’d been using. Unfortunately, it didn’t seem to fit the data as well as the logistic growth model does. What I did find to be an improvement, however, was a blended model that included both.

So perhaps Ziff and Ziff are correct when they “tend to predict an S-shaped curve with a tapering off in the near future as is being seen.” Perhaps there are “fractal kinetics” at play that contribute some significant power-law behavior to the data we have now. That doesn’t have to mean it is the only biological or epidemiological factor at play.

The Pretty New Model

To repeat myself, I don’t have any expertise in the relevant areas. But I naively assume that a pandemic can have more than one driving factor. And so I now propose to simply add the power-law modeling as one of two components (plus a perhaps gratuitous constant) of a differential equation model for the number of new cases per day, each day. The other component is, of course, the logistic growth model.

This results in a model with seven parameters. It’s a first-order differential equation:

xd(t, x) = curve_powerlaw(t) + curve_logistic(t, x) + b
curve_logistic(t, x) = x*r*(1 - x/L)
curve_powerlaw(t) = a * (t-ts)^n * exp(-(t-ts)/t0)
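To make that concrete, here is a minimal sketch of integrating the equation above with SciPy. The parameter values are invented purely so the example runs; the real values come out of the differential-evolution fit, and my actual code uses covid19.py with ade rather than solve_ivp:

import numpy as np
from scipy.integrate import solve_ivp

# Made-up parameter values, for illustration only
L, r = 2e6, 0.25                        # logistic limit and reproduction rate
a, ts, n, t0 = 5.0, 10.0, 0.5, 30.0     # power-law-with-cutoff parameters
b = 2.0                                 # small constant number of new cases per day

def xd(t, x):
    # New reported cases per day at day t, given x cases reported so far
    curve_logistic = x * r * (1 - x / L)
    curve_powerlaw = a * (t - ts)**n * np.exp(-(t - ts) / t0) if t > ts else 0.0
    return curve_logistic + curve_powerlaw + b

# Integrate forward from day 50 (with, say, 20,000 cumulative cases) out to day 90
sol = solve_ivp(xd, (50, 90), [20000.0], t_eval=np.arange(50, 91))
print(sol.t[-1], sol.y[0, -1])          # projected cumulative reported cases on day 90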

With any sort of modeling, one must guard against the temptation to overfit the data with ever more sophisticated models. But I believe these 7 parameters all earn their keep with the current behavior of COVID-19. More parameters are not necessarily bad; my MOSFET models have 40+ parameters, all necessary to simulate the behavior of a semiconductor device with very complex underlying physics.

Great (Actually, Shitty) Expectations

So, here’s what I personally expect, if the number of reported cases continues to match the model as well as it has in the past week or so. We will remain in full-on exponential growth for a few more days. Then, around the end of the month, we will see things starting to slow down just a little. But the slowdown will only be in exponential terms, unfortunately, not in the linear way we usually think about things. There will still be more new cases every day than there were the previous day, for weeks to come. It’s just that the daily increases in the number of new cases will themselves stop increasing quite so fast.

I’m expecting a hundred thousand reported U.S. cases by 3/26. That’s up from the 60,000 that I was expecting we would be seeing by then when I last updated my predictions two days ago. My projection for 4/2 is essentially unchanged at around 400,000 cases, and I’m thinking the million-case mark will likely be reached by 4/7 instead of between 4/4 and 4/6. Again, this is still just modeling reported cases, not all of them.

There still does not seem to be any convincing upper limit before the population of the U.S. is approached, sometime in May. To put it in a purely technical way, that is really fucking scary.

Conclusion

Thanks again to my new Facebook friend Ng Yi Kai Aaron, an applied statistician in Singapore, for suggesting I look into the power-law modeling approach. Again, I’ve partially incorporated that into the model, but not entirely because it doesn’t seem to fit the data on its own, not for U.S. cases at least.

At the end of my last blog post, I got a little philosophical. I suggested that a front-row seat watching history get made in one of the shittiest ways imaginable is definitely something not to be passed up. Did you ever wonder what it would be like to watch (or feel socially compelled to watch) half-naked desperate men flail away at each other with weird instruments of death, until one of them was indeed dead? How about hearing the swoosh of the guillotine outside your Paris apartment, followed each time by the roar of an angry mob? Bracing stuff. Sucked to be there, actually, a detail which the history books tend to omit.

So here we are. The pandemic of 2020 is just getting underway. I hope you stocked up on popcorn. I also hope you carefully read my Disclaimer in the previous blog post, because it applies to everything I say here, too. And remember:

The model simply predicts what will happen if the data continues as it has for a little over a week now.

That’s it. The interpretation and explanation is up to you.

I want to add a couple of words about your behavior and the possibility of you having a response something like this: “Well, then I’ll get it anyhow so why bother being careful.”

First, you absolutely do not have to get it. I still believe that the model will have to be adjusted in the future to reflect the then-apparent new reality that people finally got freaked out enough to take this seriously, deciding to risk boredom, a shortage of Twinkies, or even putting off getting minor health conditions addressed, because it’s become apparent how much getting Covid-19 sucks, even for a younger person.

Or maybe we will have enforced lockdowns as this administration finally wakes up (but then see my history of blog and Facebook posts on Trump’s authoritarianism). It’s already starting to happen at the state level, not the shitty wannabe dictatorship I fear from the deranged narcissist but reasonable if drastic measures by grown adults who take their office seriously. Including, perhaps surprisingly, not a few Republicans.

If you can possibly wait this out for a while (of course you’ll need to get some things, see below for what I do), your statistics will start to look better as the number of new cases finally starts to drop each day. It will be a bit like getting to roll two dice instead of just one. Each passing week out of your self-imposed stay-healthy near-quarantine will get a little less nerve-wracking if things continue to go your way. As the saying went in The Hunger Games, may the odds be ever in your favor.

And as for “Why bother if you get it anyway?”, I personally would rather have my hospital stay sometime in, say, June when the number of new cases is finally dropping, when doctors have gotten experienced with treating this thing and are recovered (the ones who make it) and immune. I want the people saving my life not to be fearing for their own. And if you are really into the hermit life, you could wait for that vaccine in about a year.

So, what is a reasonable action to take? Of course, we need to get stuff from time to time. There are very few true hermits anymore, and even preppers are going to find themselves wondering why they didn’t buy more Peanut M&Ms or tampons. I can just tell you what I do and you can laugh at me if you want, or perhaps you’ll find something sensible in it. And remember the disclaimer, goddammit, and that I’m not a doctor or anything like that.

Before it became obvious even to me what a monster this pandemic was going to be, I was fortunate enough to have my wife buy a box of latex gloves. One hundred gloves equals fifty trips outside my van. That will be more than enough, because I’m not going out very often.

When I get to my destination, I park the van, give my nose a good itching, and put on a pair of the gloves. An open paper garbage bag sits in front of the center console, ready to receive the gloves when I’m done. I open the van door, step outside, and shut the door. I go into my pharmacy or grocery store (there’s really no other place I’d bother with at this point) taking advantage of automatic doors wherever possible. Or, failing that, I’ll be honest and say that I try to sort of just be behind somebody who’s already opening the door themselves. Why thank you!

I go get what I need, touching as little as possible, staying the fuck away from everyone else. Don’t even look at me like you’re gonna cough. Maybe for this reason alone, one of those N95 masks I bought months ago for slash burning would be useful. When it’s needed, I pull my credit card out of a Ziploc bag and put it back in while trying not to have it touch the bag.

I do not care what people think about my wearing gloves. At this point, they’re actually probably jealous, or maybe thinking I’ve got a big stash of them that I hoarded from everybody else. Well, I wasn’t even planning on buying the one box.

Now, if you don’t have them, pretend you do, but these gloves have some weird invisible stuff on them that burns your skin on contact.

These imaginary gloves are 100% effective but you just don’t want to touch them except on the inside. You will touch everything you need to, including your credit card, through these imaginary gloves. You don’t want to touch things that you will be touching without the gloves on, because that will leave the invisible stinging stuff on your steering wheel and then your hands will still hurt in a few minutes. So, only stuff you’ll never have to touch again, like that door handle or Harvey Weinstein.

Except there is one solution to the problem of having to touch some things with and without the imaginary gloves on. The stinging stuff washes off with those really strong sanitizing wipes. You just have to wash off any of the stuff you get on your hands, before it starts to hurt. And then your door handle will be clean and so will your hands, ready for you to touch without the gloves on.

The time you take off the imaginary gloves is of course the time when you put a little glob of hand sanitizer into your palms and then rub the stuff around everywhere. Honestly, at this point, I’d probably be doing it twice, and then wiping down the exterior of the hand sanitizer bottle with some leftover sanitizer. Only after all this can you consider your hands ready to touch something without the imaginary gloves, including that terrible itch you’ve had just above your left nostril.

When you get home, put your clothes and shoes in a bag, get a handful of those nasty wipes, look both ways to make sure the neighbors can’t see you in your underwear, and wipe down the door handles. Then go take a shower, and you can really scratch that nose itch.

I’m going to try to step away from the data now, because I’ve said my piece. The only reason I would see to update this post is if something happens dramatically different from what I now expect.

Have a nice Spring.

Notes


  1. Using my free, open-source Python package ade, Asynchronous Differential Evolution. 

  2. Many thanks to Ng Yi Kai Aaron, an applied statistician in Singapore, for introducing me to the Ziff and Ziff approach.