Monday, April 6, 2020

Portrait of a Pandemic

Time is a river, a violent current of events, glimpsed once and already carried past us, and another follows and is gone.
—Marcus Aurelius, Meditations
Update, 4/8, 5:45 PM PDT: With the latest data from Johns Hopkins, the latest NightMare Plot is not different enough from the one originally included with this post to warrant editing the post at all. I’m not going to run the numbers again for all the countries here, but of course was curious as to how things are looking in my own. The answer: still not good, even though the curve is continuing to bend downward slightly.1 Today’s 429,052 cases was 10% below what the model projected (three days ago!) that today’s number would be. (That error is in terms of the projected increase, which was 30% rather than the actual 27%.) The model still projects reaching the half-million mark around April 10 and a million cases by month end.

A model I’ve developed for naively fitting to the number of new cases of Covid-19 each day modifies the widely-accepted logistic growth model with a smooth growth-regime transition. It has been closely tracking the dramatic spread of the virus in the United States and other heavily affected countries while accounting for the “flattening of the curve” that is becoming apparent to a varying extent in each of those countries.

The modified model scales the reproduction parameter r by a hyperbolic tangent function of time, with a midpoint time t0 and a half-transition time th. (All time-based parameters are in units of days.) The scaling begins near 1.0 (full traditional r reproduction rate) and transitions to some lower fractional value rf as social distancing and lockdowns force the virus into a lower-growth regime that approaches an effective rate of rfr.

The population-limited parameter L of the logistic growth model remains, though it is still not nearly as consequential or well-defined in the curves of most countries as the curve-flattening effect of rf.2 Finally, there is a very small constant term b, a fixed number of new cases per day. That is mostly included to zero out the mean of the residuals so that I could get decent goodness-of-fit indicators for the model.

The result is a logistic growth model that incorporates the “flattening of the curve” now apparent even in U.S. data by effecting a smooth transition from the original uncontrolled growth rate to a later, lower one:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1

For details, development history, context, and disclaimers, see my series of previous Covid-19 blog posts, starting with the most recent one on April 2.3

Below are projections from the model for various countries, based on Johns Hopkins data from last evening, April 5. I will start with my own mess of a country, not just because I am a citizen of it but because it now has the most cases of Covid-19 on the planet and almost surely will continue to as this nightmare continues.

The United States of America

Extrapolating from its remarkably close fit with data going back more than two weeks now, the model is projecting half a million cases4 by April 10 and a million cases by the end of the month. The residuals (errors between how closely the model fitted to past data versus what the data actually was) are of a completely normal distribution.

Kenneth Regan, professor of mathematics at SUNY Buffalo, suggested that it would be useful to monitor the ratio of new cases reported on a given day to all cases under treatment that day. The idea is that it would reveal progress in a way that is more resistant to issues like how good the reporting is or how widespread the testing is, and that it would show when the data starts to show a peak.5

Although I don’t have data for a number of all cases under treatment on a given day, I’ve tried to implement the spirit of Professor Regan’s idea with a fourth subplot at the bottom of what I’ve been calling the Nightmare Plot, showing how large each day’s new cases are as a percentage of the cumulative number of cases ever reported by that date. It’s an instructive additional visualization, and the suggestion was much appreciated. And as you can see, that metric has been tracking since since 3/19 to within a few percent of what the model expected it would have been. Thankfully, that red curve is heading steadily downward.

The Nightmare Plot: United States of America, 4/5/20 data

As with any extrapolation from time-series data without reference to underlying theory or events, the model isn’t making perfect projections. With parameters evolved to data from the day before yesterday (4/4) when there were 308,850 U.S. reported cases, the model projected 346,464 cases yesterday. Thankfully, it was pessimistic by 33%; there are “only” 337,072 total (cumulative) cases being reported in last evening’s Johns Hopkins data.

Wait a minute, you might be thinking, the difference between 346,464 and 337,072 seems like a lot less than 33%. True. It’s only 2.8%. But that would give the model too much credit, because it was extrapolating from a fixed, known number that was 91.6% of yesterday’s actual number of cases. The model only gets credit (or blame) for its projected increase from that point, which it said would be 37,614 but actually was 28,222.

This error is a welcome departure from the previous few days when the model had been making dire next-day projections that were uncomfortably close to what actually happened. With data from a day earlier (4/3) when there were 275,586 cases, the model projected there would be 311,138 the day before yesterday (4/4). That represented an error (again, considering only the increase) of 6.9% (also on the pessimistic side). With the data available on 4/2, the model projected that the 243,453 cases that day would become 275,799 on 4/3, an error of just 0.66% in the expected increase.6 That one felt a little spooky.

When you are modeling numbers that represent hundreds of thousands of your fellow citizens getting very sick, you want reality to come in below your projections. I have been hoping for some evidence that the curve is bending downward, and finally it is here. Not by much; that Nightmare Plot still goes to well over a million cases before the month is over.

But I’ll take it. I’m in a much better mood than when I was freaking out over the numbers being so close to my projections for a couple days in a row. The model has been just a bit pessimistic and required some updates to its parameters to better track reduced growth, and that might continue to happen as the days progress and people finally take this thing seriously. Down, curve, down!

———

And with that good mood, I will allow myself some pride in how closely my modification to the logistic growth model is tracking to many more data points than one might expect from its six parameters. Look at that top subplot: There is no more than 8% error between what the model is fitting to the data and what the data actually has been for the past twelve days, going back to 3/24. In that span of time, the numbers being modeled have increased by a factor of six. The 4/4 data fitted the model to within a 5% error going just as many days into the past (back to 3/23). It was pretty much the same with data from the day before, with a comparably low level of errors going back to 3/22.

My evolutionary curve-fitting algorithm does two things to counter the effect of the more recent values (much larger than earlier ones due to exponential growth) having a disproportionate amount of weight. First, the model is for the number of new cases each day, not the cumulative number of total cases ever. It is a differential equation xd(t,x), not x(t). So, to have it project the cumulative number of cases x(t), I integrate the differential equation forward or backward from a fixed point represented by the most recent known number of cases, a point that it is anchored to.7 The number of new daily cases is increasing dramatically, but not quite as dramatically as the cumulative number of cases. So that helps keep the emphasis from being quite as much on more recent data.

Second, I apply a square-root transform to both the modeled and actual new-case numbers before computing the sum of squared error (SSE) between them and then evolving the parameters for minimum SSE. The 28,222 new cases we had yesterday had four times as much influence on the curve fit as the 6,421 new cases we had on 3/17, not sixteen times as much.

Despite these efforts, the algorithm is still going to fit recent data more closely, and that fit has been very close indeed, with just 1% maximum error over the past six days. Think about it; for nearly a week as the number of cases has more than doubled, the model stays within 1% of what actually happened as it traces its smooth growth curve backwards in time.

Due to its emphasis on fitting to later data points, the model strays a bit more, percentage-wise, as we peer into the ancient history of early March. But I’m certainly not ashamed of it expecting a little less than half the 217 cases we had one month ago as it looks further backwards from the harsh reality of over a thousand times as many cases now.8 Imagine you were looking at this blog post back then and there was a model telling you that there would be “only” 150,000 cases today. Would you now be upset that it somehow failed to convey the magnitude of the situation to you? I didn’t think so.

———

So, what is making me happier now, with numbers still climbing exponentially and no leveling-off in sight on the right side of that Nightmare Plot? I am hearing the whisper of encouraging things in the combination of six parameters that my software has evolved for the model based on yesterday’s data. The curve is indeed flattening a little bit, despite the stupidity of a lot of Republican governors, nutjob pastors, and spring-break partiers.

Let’s take a quick look at those parameters for ourselves.

USA, 4/5: Final, former, and failed parameter values vs sum of squared error

Each of these six subplots shows the values of one model parameter versus the sum of squared error (SSE) between modeled values and actual values going back just over four weeks. The subplots are zoomed in so that small SSE differences are evident between members of the final population (red dots) and parameter values that had once been part of the population over the course of the 75 generations of evolution (blue dots). Also shown (small black dots) are unsuccessful challengers that never made it into any population, but whose failed parameter combinations are instructive for showing how SSE varies with parameter values.

These plots make it clear that there are some well-defined ranges for all of the model parameters except L, the maximum number of possible cases, which at worst case is limited by the country’s population. The reason L isn’t better defined is that there is still no evidence in the time-series data of a fixed upper limit to the ultimate number of U.S. cases. The best we can infer from the distribution of L values in the upper-left subplot is that the fit starts to become less plausible if we assume an ultimate upper limit of less than 20 million cases.

Another model parameter that is more of a useful nuisance is b, a very small constant number of new cases per day. With every single model I’ve tried thus far, including this latest and hopefully final one, the parameter evolution favors smaller and smaller values of b tending toward zero. That’s evident in the upper right subplot. This parameter does its job of zeroing-out the mean (average) of the residuals (second subplot from the top in the Nightmare Plot) but that’s pretty much it.

Now onto the really interesting parameters. First, of course, is our old standby r, the initial reproduction or growth rate before any curve-flattening or (eventually) population-limiting begins to take effect. In the U.S., the model says that the number of cases was increasing by around 45% per day for a couple weeks after the first modeled date of 3/5. Looking again at the top of the Nightmare Plot, you can see that the model was assuming too high of a growth rate back then. The blue line was increasing exponentially (which appears as a straight line on a logarithmic plot), but at less than 45% per day. The curve-fitting algorithm tolerated this error as it fiddled with the parameter values to get its remarkably close fit to the data later on.

And when it got that remarkably close fit is when we approached the 61-62 days after 1/22/20 that evolved as an optimal value of parameter t0. That happened around March 23. Over the span of a week either direction from that date (th of around 14 days), the growth rate transitioned halfway from its initial 45% per day down to a post-flattening growth rate that’s lower by an impressive 90%. As you can see at the bottom of the Nightmare Plot, the number of reported cases increased by 8.4% yesterday, and that growth rate is expected to continue dropping, though more slowly now.9

For a deadly virus that is right now killing thousands of my fellow citizens, a 90% drop in growth rate seems like good news indeed, even as the escalating numbers will likely continue to stress us all out for the rest of April and probably beyond.

Now let’s take a quick look at Nightmare Plots for other countries that have been impacted by Covid-19. In some of them, the model has problems fitting to the data. I’ve extended the curve-fitting interval for each country as far back as possible without allowing clearly implausible pauses and jumps in reported-case numbers to mess everything up. There’s no avoiding such pauses and jumps that have occurred more recently, like France’s sudden discovery of more than a quarter of its cases on 4/4 alone, but the differential evolution algorithm and the model do their best with what they have to work with.

I will leave you with final thought to consider as you scroll down and look through how everybody else is doing outside the U.S. None of these other countries–not even Iran with its contemptible mullahs–is being run by anyone so utterly incompetent as our failed trust-fund game show host with his obvious cognitive limitations, profound ignorance, contempt for science and sound public policy, and pathological narcissism that drives every single thing he does. Let’s please not make that mistake again, OK?

Spain

The second-most impacted country behind the good old US of A is Spain with its 126,168 reported cases yesterday.

With parameters evolved to data from a day earlier when there were 119,199 cases, the model projected 127,844 cases on 4/4. The model overestimated the increase by 24%. That’s not as impressive as the model’s fit with recent data: No more than 2% error going back to 3/29, and no more than an astoundingly low 7% going all the way back to March 15. The residuals are indistinguishable from completely random Gaussian noise.

Spain has flattened that curve pretty well recently and the projected numbers don’t look too bad, perhaps doubling over the next month.

Please note that Spain had 500 cases to our 402 back on March 7, when Donald Trump responded as follows to a reporter’s question about whether he was “concerned that the virus is getting closer to the White House and D.C.”: No, I’m not concerned at all. No, I’m not. No, we’ve done a great job.

Well, if “a great job” means not getting people tested, telling his followers to carry on as usual and causing sycophant Southern Republican governors to delay action until the magnitude of the problem became obvious even to them, and now bullying states about getting the life-saving equipment they need as our numbers are now nearly three times larger and headed for probably at least six times larger over the next month, sure, what a fine and excellent job you’ve done, you fucking moron.

Italy

The model follows a modest recent flattening of the curve, half of which occurred over a one-week interval centered on 3/24. The plot shows that there was also some very slight flattening around 3/15 and 3/12. As the noted group of epidemiologists Pink Floyd observed, “You are only coming through in waves.”

The residuals are not significantly non-normal, though they definitely “fan out” with higher predicted numbers of new daily cases. The effect of this can be seen in the upper subplot, where the model does not track earlier cases that well with their little waves of increasing and then decreasing growth rates.

The constant b evolves to an unusually high range here, reflecting a best-fit solution based on a fairly limited recent data set. The model’s simplistic assumption of a constant 600 new cases per day still works even for the first date shown (3/13). The data modeled and shown doesn’t go back earlier in March because there were large discontinuities before then that provide a questionable basis for curve-fitting.

Italy has had a very difficult time with overtaxed hospitals and lots of people dying. But at least their new cases have been significantly slowing down, as shown in the bottom subplot. Even though they can expect to wind up with several times more of their citizens ultimately reported as infected by Covid-19, it looks like they are at least in the latter stages of having slowed down their rate of growth.

Germany

Germany’s curve is quite impressive, with a significant degree of flattening. Look at this value of rf!

Germany, 4/5 data: Values of parameter rf vs SSE

In addition to demonstrating how much better it is for a country to elect an accomplished scientist as one’s leader rather than a failed casino owner spouting absolute lunacy10 about Covid-19 just six weeks ago, this plot shows that the parameter L doesn’t seem to be contributing much to the model. The curve flattening is entirely a result of the growth-regime transition, nearly to zero.

France

There’s been some jarring discontinuities in France’s data recently, so the model is not doing very well with it. The residuals are definitely not Gaussian random variation, which you can see just by looking at the right side of the residuals subplot, in addition to the essentially zero p value.

The projection is included in the interest of fairness, to make sure that non-impressive results are shown as well. If there is anything definite to be taken away from France’s Nightmare Plot, it’s that they are definitely not out of the woods yet.

Iran

Iran has been surfing waves of infection for weeks now. Just look at the periodicity in the upper subplot and the residuals plot below it! Despite that, there is no significant evidence of non-normality in the residuals. What I don’t like about the way the parameters have evolved is how much they focus on an abrupt growth-regime transition back in early March. It’s completely artificial to assume that the growth rate would drop by half over the course of a single day, and yet that’s what the model sees and so there it went.

This is the price one pays for doing naive modeling of time-series data. You make no assumptions about the underlying events or theory, and you get what you get. As the other plots show, that often works amazingly well, and it might still be working well here, but I’m a little suspicious about any projections for Iran’s future cases. If for no other reason than that having so many waves of infection means that they can expect more totally unpredictable waves in the future. So, perhaps three times as many cases in a month as now, if nothing else weird happens? Or maybe twice or half as many at that?

United Kingdom

Another instance of an artificially abrupt transition to a lower-growth regime, with one important difference. In the UK, the values of L in the final population have a nicely bounded range, and it’s got a pretty low upper end, well below the country’s population of 67.9 million. The rf parameter is also well-defined, but relatively modest; the model is doing much of its work with conventional logistic-growth behavior.

UK, 4/5 data: Values of parameter L vs SSE

Looks good to me, even if I’m not inuiting any such population upper limit just looking at the UK’s numbers thus far. Maybe the evolution of model parameters over 75 generations is seeing something my eye isn’t. I hope so.

South Korea

These residuals are definitely not normal variation. That’s probably because the model spends so much time tracking the very small increases that have occurred in the past month. Unlike us, the South Koreans knew what they were doing, and did it fast. There’s no further slowdown in growth rate ahead, but it is tiny at this point. Looking good!

Singapore

Singapore had a relatively bad day yesterday, with an abrupt jump of nearly twice as many new cases as in previous days. But their numbers of daily cases are still small.

The residuals are normal, but I’m not convinced that there has been any growth-regime transition like the model shows in the bottom subplot, though. It just doesn’t look right. And the model’s deviation from older data gives me pause.

Still, a projection of perhaps 10,000 total cases by early next month doesn’t seem too bad to me, even if it winds up being several times that for some reason the model can’t possibly account for right now.

Finland

Finally, little Finland, included not because it has been severely impacted by Covid-19 but because it’s the land of some of my ancestors and some of my friends. The model isn’t tracking Finland’s data particularly well, with residuals that don’t appear at all to be normal random variation. You don’t need the small p value to see that; just look at the discontinuities in the number of reported cases in the top subplot, and the huge momentary spike in new reported cases on 4/4.

The model projects less than 10,000 cases by May, but I honestly don’t think that projection is worth a whole lot right now. If the data comes down in the next couple days, it might be worth looking at again. Meanwhile, Finns will continue practicing the kind of social distancing they always have, long before anyone ever heard of Coronavirus.

Notes


  1. Even worse as far as the state of our fracturing Republic is concerned. I have no models or data for we may be seeing in the weeks and months ahead during any summertime reprieve we may get from the virus, when there will be no such reprieve from the deranged narcissist and his jackbooted thugs in his cabinet, the Senate, the Supreme Court, and gerrymandered GOP fiefdoms like Wisconsin.

    What will this loathsome cult of corruption and idiot-worship that mutated out of the party of Lincoln and Theodore Roosevelt do after November 4–regardless of which way the electoral college votes after whatever passes for elections in the swing states–if virus cases are rebounding like it’s 1918 all over again with economic conditions more like it’s 1929? I hope if there are sides to be taken, fellow citizen, you will be found on the side of our beloved Constitution. The sacred founding text of a nation that has been under attack for decades, by this President and his predecessors–with its first, second, fourth, and fifth Amendments and a rich history of case law from many decades of carefully reasoned decisions in between times (unfortunately including our own) when the Supreme Court allowed its political colors to show underneath those somber black robes. 

  2. Every parameter in a model should “earn its keep” by having its own robust independent effect on fitness of the curve to the data. Of all the six parameters, L is the least compelling to keep. Its role in limiting the ultimate pervasiveness of the virus is currently being overshadowed by the growth-reduction aspect of the model. But I’m leaving it in for now, looking for signs of its eventual emergence, because it ultimately will serve a purpose as the pandemic finally heads into end stage. 

  3. The full history of posts, going backward in time, is: Modeling a Slightly Flattening COVID-19 Curve (4/2), Into the Rapids (3/25), Pandemic (3/22), and the original Applying the Logistic Growth Model to Covid-19 (3/19). 

  4. Throughout this blog post “cases” refers to the number of Covid-19 cases reported for the particular country under discussion, as provided by the data freely released to the public by Johns Hopkins University. See their Terms of Use, which I have adopted as my own with of course a suitable change in names.

    I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many people have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is a pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past two weeks

  5. Thanks to my dear friend and former boss Louis J. Hoffman for conveying this suggestion back to me from Prof. Regan, along with permission to give him credit for it. 

  6. It was on the pessimistic side, as the model almost always has been, but by such a small amount in that case as to hardly seem worth mentioning. If you are curious, the model was extrapolating the 4/3 data out to a projected 350,510 cases yesterday. Pessimistic again, but not by much: 14% over two days when the actual increase was nearly a hundred thousand cases. 

  7. This is known as an “initial value problem,” the initial value here being the last known number of cases. You can go either direction in time from the initial value. For fitting the model parameters, my algorithm goes backwards from the most recent data. To extrapolate forward and make projections, it goes forward from the same “initial value.” 

  8. The model anchors to the most recent known data point, which for this discussion is last evening’s Johns Hopkins data (4/5) with its 337,072 U.S. reported cases. Remember, the model’s differential equation xd(t,x) is for the expected number of new cases each day, not the cumulative number of cases x(t). So, to have it project the cumulative number of cases, I integrate the differential equation forward or backward from that last fixed point.

    To have it integrating that differential equation backwards a full month and winding up in the same neighborhood with the mere 217 cases we had then, shrinking its backwards projections by more than a thousand in the process, seems pretty remarkable to me. 

  9. In my previous post, I wrote “in praise of the hyperbolic tangent function” for nonlinear modeling, and how I’ve used it for electronic circuit simulation. Turns out that tanh( . . .) is also quite useful for gradually transitioning from initially unrestrained exponential growth to a lower-growth regime resulting from social distancing and quarantine. 

  10. By the last week of February, “he criticized CNN and MSNBC for ‘panicking markets.’ He said at a South Carolina rally falsely that ‘the Democrat policy of open borders’ had brought the virus into the country. He lashed out at ‘Do Nothing Democrat comrades.’ He tweeted about ‘Cryin’ Chuck Schumer,’ mocking Schumer for arguing that Trump should be more aggressive in fighting the virus. The next week, Trump would blame an Obama administration regulation for slowing the production of test kits. There was no truth to the charge.”

    “Throughout late February, Trump also continued to claim the situation was improving. On Feb. 26, he said: ‘We’re going down, not up. We’re going very substantially down, not up.’ On Feb. 27, he predicted: ‘It’s going to disappear. One day it’s like a miracle it will disappear.’ On Feb. 29, he said a vaccine would be available ‘very quickly’ and ‘very rapidly’ and praised his administration’s actions as ‘the most aggressive taken by any country.’ None of these claims were true.” David Leonhardt, A Complete List of Trump’s Attempts to Play Down Coronavirus, The New York Times (March 15, 2020).