Sunday, April 19, 2020

Coming Through in Waves

You are only coming through in waves
Your lips move but I can’t hear what you’re saying
—Roger Waters, Comfortably Numb (Pink Floyd, 1980)
Update, 4/20/20: Rather than dwell on the accuracy or lack of same in the model’s projections anymore–as if this were some kind of grand chess match where the Knights and Bishops don’t actually trample and lance each other–I will just offer the latest Nightmare Plot with a few observations. The one-million mark has received yet another delay, of one day to 4/28, and the model still projects 1.5M reported cases sometime in the second week of May, probably toward the end of that week. The curve continues to fit past data uncannily well even if it has shortcomings with extrapolations more than a few days out.

Most importantly, I am starting to question the wisdom of continuing these updates of a “reported cases” statistic that I’ve begun to suspect is being shaped by delayed and limited testing as much as, or more than, by suppression of the virus. I do not want to help draw attention to any flattening of a curve if what is pushing it down is a disconnect–whether due to cold political calculations or incompetence–between what that curve shows and the actual magnitude of this national and global crisis. Perhaps in Hollywood it doesn’t matter if the curves are real, but this is an important one indeed.

In my previous post a little over a week ago, I summarized the “logistic growth with growth-regime transition” model I’ve developed for reported cases of Covid-19 and offered–with many disclaimers that I include now as well–a few projections:

three quarters of a million cases one week from today, and over a million the week after that. The model’s projection–both now and as it stood on 4/5–is that there will be around a million and a half Americans–one in every two hundred–reporting infection with Covid-19 in early May, with the number still climbing faster every day.

The exact projection for one week from 4/10 (to a degree of precision only useful for evaluating past performance) was 758,921 reported cases in the U.S. On 4/17, there were 699,706. The model was 29% pessimistic about the week’s increase. Instead of 262,386 more cases, there were “only” 203,171.

This works out to a (geometric) average of 3.7% per day. In other words, if the model were that pessimistic every day, projecting 3.7% more new cases than were actually reported, and without updating its parameters to try to do better the next day, we would wind up with this much of an error over the week.

That doesn’t seem very far off to me.
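The per-day figure is the geometric mean of the week's error: the ratio of projected to actual increase, raised to the power 1/7. Here is a minimal Python sketch, assuming (as in the text) that only the increase from the starting count is compared. The helper name `daily_error` is hypothetical, and the inputs are the 4/10 and 4/17 figures quoted above.

```python
def daily_error(projected_total, actual_total, start_total, days):
    """Return (overall ratio, per-day geometric-mean ratio) of the
    projected vs actual increase from start_total over `days` days.

    Hypothetical helper for illustration; ratios above 1 mean the
    projection was pessimistic (too many new cases).
    """
    projected_increase = projected_total - start_total
    actual_increase = actual_total - start_total
    ratio = projected_increase / actual_increase
    return ratio, ratio ** (1.0 / days)


# 4/10 starting count, one-week projection for 4/17, actual 4/17 count.
ratio, per_day = daily_error(758_921, 699_706, 496_535, days=7)
print(f"week: {ratio - 1:+.1%}, per day: {per_day - 1:+.1%}")
```

It prints +29.1% for the week and +3.7% per day, matching the numbers above.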

With its parameters evolved to fit today’s data (4/19), the model is projecting that the million-case mark will be reached by 4/27, a few days later than it expected a week ago. As far as its projection of 1.5M cases goes, it expects that sometime in the second week of May. Perhaps not quite “early May,” but there’s good reason to use imprecise language when making projections.1

Here’s the latest Nightmare Plot. As always, you can click on it to see the important details.

April 19, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Testing, Testing . . .

Unfortunately, I’m starting to wonder if this modeling of reported cases may be doing more harm than good. The comments in the next few paragraphs are from a layman in the relevant fields of biology, medicine, what’s actually going on with this supremely fucked-up White House, etc. But I think it’s time to discuss reported vs actual cases of Covid-19 in the United States.

The fact that the number of daily new cases has fairly suddenly hit a ceiling of around 30,000, and stayed there for about a week now, makes me suspicious that my model has become one for how fast we’re able to get people tested, not a realistic metric of how many people are infected. If the places doing the testing are unable or unwilling to process more than a certain number per day, of course that is going to result in a flattened curve.
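To make that concern concrete, here is a toy Python simulation with entirely made-up numbers, not real data. True new infections keep growing exponentially (a hypothetical 5% per day), but only a fixed number of cases per day can be confirmed (a hypothetical capacity of 30,000, echoing the ceiling above), and everyone else waits in a backlog.

```python
def simulate(days, initial_new, growth_rate, test_capacity):
    """Return (reported new cases per day, untested backlog per day).

    All parameters are hypothetical: true new infections grow by
    `growth_rate` per day, but at most `test_capacity` cases per day
    can be confirmed; the rest carry over in a backlog.
    """
    reported, backlog_history = [], []
    backlog = 0.0
    new_infections = float(initial_new)
    for _ in range(days):
        backlog += new_infections
        confirmed = min(backlog, test_capacity)
        backlog -= confirmed
        reported.append(confirmed)
        backlog_history.append(backlog)
        new_infections *= 1 + growth_rate
    return reported, backlog_history


reported, backlog = simulate(days=14, initial_new=25_000,
                             growth_rate=0.05, test_capacity=30_000)
for day, (r, b) in enumerate(zip(reported, backlog), start=1):
    print(f"day {day:2d}: reported {r:8,.0f}   untested backlog {b:10,.0f}")
```

Once the capacity is hit, the reported daily count goes flat and the cumulative reported total climbs in a straight line, which on a plot of reported cases looks like a flattening curve, while the backlog keeps growing.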

Meanwhile, the numbers of actual infections and sick people waiting for their positive test result may be continuing to soar, unseen by all but victims’ frantic families and, when they finally give up on dealing with it at home, medical personnel who may not be testing even symptomatic patients. The testing backlog may grow and grow, and plots like mine will make it look like everything’s under control because the number of reported cases isn’t rising much at all.

But that can only go on for so long. Eventually, if there is indeed an explosion of actual cases that’s hidden behind the reported-cases curve, it will become obvious to millions of Americans that we’re actually fucked and this thing is going to kill a lot of people. It’s already certain that it will kill many more Americans–perhaps more per capita, too–than citizens of any other country.

I wonder about the armed lunkheads now doing protests agitated by shadowy right-wing groups at the encouragement of their cult leader,2 who clearly doesn’t care about anyone but himself and his hold on power. What would their slow little minds do with that undeniable reality after making fools of themselves and infecting themselves, their families, and friends? What sort of toxic dangerous bullshit will their hero be spewing then to cover yet another embarrassment he couldn’t possibly bring himself to face up to like a man, after a long pathetic lifetime of dodging one failure after another?

Track Record

Now, let’s go back to the neat little fantasyland of reported cases, because that’s all I am really qualified to deal with at this point. (I may switch to reported deaths soon; that seems harder to hide.)

In case you’re curious about how the model has been doing with more recent projections of what today’s number would be, here are a few.

On 4/18 (yesterday), the model projected 763,369 cases. There were 759,086. The model was pessimistic by 16% about the number of new cases there would be today.

The previous day, 4/17, the model projected 759,553 cases today, an error (in the increase) of 0.8% for the two days. It was on the pessimistic side–as if that matters with such a small error.

On 4/16, the model projected that there would be 752,522 cases today. The error in the increase (optimistic that time) was 12.4%. But that was over the course of three days, a time interval when the number of reported cases increased by nearly sixty thousand. The (geometric) mean error per day was 4%.

———

It may be worth considering some projections to other dates, too. The model’s next-day projection on 4/16 was for there to be 696,432 cases on 4/17, which was too optimistic by 10%. The previous day, 4/15, the model projected 689,026 cases on 4/17, an average daily error (in the increase) of 9.7%, also on the optimistic side. And on 4/14, when there were 605,193 cases, the model was projecting 679,654 on 4/17, an average daily error of 8.3%, again on the optimistic side (and again counting only the projected vs actual increase, not the absolute number of cases).

Those errors resulted from larger new-case numbers than would have been expected had the data conformed to the gradual decrease in growth rate that the model had evolved its parameters to fit each day. Each time a new day’s Johns Hopkins data came in, the parameters were re-evolved to better fit the curve, and the resulting next-day projections wound up lower than what actually happened.

Part of what happened is that the official U.S. case counts began to “include both confirmed and probable cases and deaths.”3 It doesn’t seem that many states have switched to this new reporting protocol as yet, but it surely has increased, and will continue to increase, the numbers my model sees from the Johns Hopkins data.

There was a point where the model’s projection for how many cases there would be on 4/17 ceased to be optimistic and landed pretty much right on. That was 4/12, with 555,313 cases, before those larger numbers started showing up this past week. On that day, the model projected 701,642 cases on 4/17. That projected increase was off by +1.3%, or an average of 0.3% per day. (It was pessimistic, but by so little it hardly seems worth mentioning which way it was off.)

So, what did the model say about today–one week later–with its parameters evolved to data from 4/12? Its projection was that we would be seeing 755,274 reported cases in the United States today. Again, there were 759,086. That’s an error in the projected (199,961) vs actual (203,773) increase of 1.9%, or 0.3% per day.

I think that counts as a successful, if horrific, extrapolation.
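The comparisons in this section follow a common recipe: take the projected and actual increases from a given starting count, express the error as the larger increase divided by the smaller, and take the days-th root for a per-day figure. The Python sketch below uses a hypothetical helper, `increase_error`, written only for illustration, and covers just the projections whose starting counts appear in this post (4/10, 4/12, and 4/14). The 4/15 and 4/16 starting counts aren't given, so those projections are left out.

```python
# Each entry: (label, starting count, projected count, actual count, days ahead)
PROJECTIONS = [
    ("4/10 -> 4/17", 496_535, 758_921, 699_706, 7),
    ("4/12 -> 4/17", 555_313, 701_642, 699_706, 5),
    ("4/12 -> 4/19", 555_313, 755_274, 759_086, 7),
    ("4/14 -> 4/17", 605_193, 679_654, 699_706, 3),
]


def increase_error(start, projected, actual, days):
    """Compare projected vs actual increase from `start`.

    Returns (direction, total error, mean error per day), where the
    error is the larger increase divided by the smaller, minus one.
    """
    proj_inc = projected - start
    act_inc = actual - start
    direction = "pessimistic" if proj_inc > act_inc else "optimistic"
    ratio = max(proj_inc, act_inc) / min(proj_inc, act_inc)
    return direction, ratio - 1, ratio ** (1 / days) - 1


for label, start, projected, actual, days in PROJECTIONS:
    direction, total, per_day = increase_error(start, projected, actual, days)
    print(f"{label}: {direction}, {total:.1%} total, {per_day:.1%}/day")
```

For these four projections it gives 29.1% and 3.7%/day (pessimistic), 1.3% and 0.3%/day (pessimistic), 1.9% and 0.3%/day (optimistic), and 26.9% and 8.3%/day (optimistic). Dividing the larger increase by the smaller keeps the error positive regardless of direction; it is one reasonable convention, not necessarily the exact one behind every figure in this post.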

The Sky is the Limit

With data from a couple of days ago (4/17), I thought of the Futurama TV show and how Professor Farnsworth often announces some challenging and arguably terrible development by saying, “Good news, everyone!” I was tempted to say the same about the scatter plot of values of the model parameter L vs SSE (the sum of squared errors of the model’s fit to the data).

As you may recall, L and r are parameters for the conventional logistic growth component of the model. L is the maximum number of possible cases, which at worst case is limited by the country’s population. The other parameter r is the initial exponential growth rate, before any curve-flattening becomes apparent.
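For reference, the conventional logistic component is the solution of dx/dt = r*x*(1 - x/L): it grows almost exactly exponentially at rate r while x is far below L, then levels off at L. The Python sketch below uses the textbook closed form with a hypothetical starting count x0 and illustrative parameter values (not fitted ones). It covers only this component, not the growth-regime transition that the full model adds.

```python
import math


def logistic(t, L, r, x0):
    """Closed-form solution of dx/dt = r*x*(1 - x/L) with x(0) = x0.

    L is the ceiling (maximum possible cases), r the initial
    exponential growth rate, x0 a hypothetical starting count.
    """
    return L / (1 + (L / x0 - 1) * math.exp(-r * t))


L, r, x0 = 330e6, 0.2, 1_000  # illustrative values only, not fitted parameters
for t in (0, 10, 50, 100, 150):
    print(t, round(logistic(t, L, r, x0)))
```

With values like these the curve is almost pure exponential growth at rate r for weeks, because x/L stays tiny; only as x approaches L does the logistic term slow it down.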

Well, for the first time I was seeing the hints of a possible range for L, and hence a projection by the model of an ultimate worst-case number of U.S. reported cases. That was the “good” part of the “Good news” cackled by a goggle-eyed cartoon character. The challenging and arguably terrible part was that this faint preliminary outline of a range for L, the maximum number of U.S. reported cases ever, seemed to be anywhere from 50 to 200 million.4

Unfortunately, even that bit of good news is no longer apparent in the scatter plot for L. Indeed, after nearly 22,000 simulation runs, the best-fit combination of parameters has its value of L at pretty much the whole U.S. population. That’s basically right at the upper limit of the parameter range.

We might take a long time to get there, says the model with its parameters evolved to the data for the past week or so, but we will go very high indeed before we’re done.

I won’t include such outlandish projections in the Nightmare Plot, though, because too much can and will change between now and then. Like, say, the President of the United States tweeting out calls for insurrection against their state governments5 and convincing his cult followers to go out and act like it’s just no big deal that three quarters of a million6 of their fellow citizens have now contracted a virus that has killed more than half as many people as are officially listed as “recovered.”7
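The model’s parameters are “evolved”: a population of candidate parameter sets runs over many generations, and whichever candidate has the lower SSE survives. The post doesn’t spell out the algorithm, so purely as an illustration, here is a minimal differential evolution loop (DE/rand/1/bin) in plain Python. It is a generic stand-in, not the code behind these plots, and it fits only a two-parameter logistic curve (L and r, with a hypothetical fixed starting count) to synthetic, noise-free data. Each trial is clipped to its parameter bounds, so a best-fit value can end up sitting exactly on the edge of its range, much as L has here.

```python
import math
import random


def logistic(t, L, r, x0=1_000):
    """Logistic curve with a hypothetical fixed starting count x0."""
    return L / (1 + (L / x0 - 1) * math.exp(-r * t))


def sse(params, data):
    """Sum of squared errors between the model and (t, cases) data."""
    L, r = params
    return sum((logistic(t, L, r) - y) ** 2 for t, y in data)


def evolve(data, bounds, pop_size=20, generations=100, F=0.7, CR=0.9, seed=1):
    """Minimal DE/rand/1/bin; returns (best params, best SSE)."""
    rng = random.Random(seed)
    dims = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [sse(p, data) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dims)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < CR:
                    value = pop[a][d] + F * (pop[b][d] - pop[c][d])
                else:
                    value = pop[i][d]
                trial.append(min(max(value, lo), hi))  # clip to the parameter range
            trial_score = sse(trial, data)
            if trial_score <= scores[i]:  # greedy selection: keep the fitter one
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Synthetic, noise-free data from known parameters (illustration only).
data = [(t, logistic(t, 1e6, 0.15)) for t in range(0, 41, 2)]
(best_L, best_r), best_sse = evolve(data, bounds=[(1e5, 2e6), (0.01, 0.5)])
print(f"L = {best_L:,.0f}, r = {best_r:.3f}, SSE = {best_sse:.3g}")
```

Setting the upper bound for L below the value that generated the synthetic data (for example, 5e5 instead of 2e6) is a quick way to watch the best fit typically pile up at the edge of its allowed range.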

Looking Back vs Forward

Or, Why you should perhaps consider taking this model seriously

It’s been an interesting experience posting these updates to the r/Coronavirus subreddit. Each posting gets this information to a few thousand people, so it still seems worthwhile to continue doing so. Better yet, I have gotten some valuable constructive criticism, sprinkled in between a few stupid lazy insults. Some of that criticism has resulted in direct improvements, such as the inclusion of a normality test on the residuals. (Turns out they are about as normal as it gets, with a p value of over 0.9.) And some of the comments are just plain funny, like this one from “derphurr” regarding my April 10 blog post:

To be fair I’ve been following your numbers from the beginning post. Your 1M cases date has been moving outwards every week/ updated graph.

It might be like fusion energy, it is technically possible we never get there.

Unfortunately, I think the model will beat fusion. (The classic line is “Fusion energy is 30 years away and always will be.”) Its current projection for a million U.S. cases is just a few days later than it was when that comment was made: April 27 rather than April 23.

Another redditor “chalion” from Argentina quoted this from the 4/10 blog post: “On 4/3, the projection was for there to be just over 600,000 cases today, compared to the 496,535 we have had reported at this point in the U.S. Quite a bit off, but remember, that was looking forward a full week.”

Admitting it was harsh (and later graciously apologizing for that), he or she then offered this critique of my extrapolations:

It’s a model that can’t predict a week by a large margin. Clearly it isn’t working. Its just adjusting it daily to past data but says nothing about the future.

He isn’t making a model of the epidemic, it’s making a tool to find a curve close to the current data. Its good and interesting but I don’t think it’s a model of the evolution of this thing.

I responded, and he or she did in kind (very reasonably), and I’ll quote that exchange for you. But first let me observe that the critique of the 4/3 projection remains valid. The model was expecting a little more than twice as many cases as have been reported thus far. The curve has flattened a lot more since then. I won’t complain about that at all, even if it somewhat proves the model’s limitations, at least for extrapolations more than a few days out.

Here’s what I said in response:

You raise some interesting points. I don’t think your assessment in that last paragraph is unfair.

This would only be a model of the epidemic under the assumption that the reported cases will continue to increase in the fashion they have for the past two weeks. If they do, then it will have modeled the epidemic. If they don’t, then something else has occurred to cause the numbers to change, in a way that the model failed to incorporate.

Updating the parameters to fit the new situation is a natural way to continue the process of evolving them. This doesn’t change what the model predicted with the previous parameters. If the model is within 8% of the next day’s number of new cases, I don’t see that as having no predictive value, as “not working.” It seems to me it works quite well for short-term projections.

As far as its alleged inability to predict one week out (from 4/3), the error of its projecting 346,635 new cases rather than the 220,949 that actually got reported between 4/3 and 4/10 is considerable, 56%. I would just ask that the magnitude of that actual increase be considered, too, when passing judgment: the number of cases nearly doubled (1.8x) in that time. It was projecting the number to increase by 2.26x. Not by, say, extrapolating straight out from the 13.5% daily increase that had happened the previous two days and projecting 2.4x. The model was moving the curve, and in the right direction, just not as much as the data then indicated it would.

I’ve decided to take seriously your harshly-worded comment and give it the detailed response it deserves. You do make some good points, and there’s definitely room for humility with the longer-term performance of this model thus far. I’m glad you at least thought it was good and interesting in its own way, and I’d like it to be looked at seriously as an approach for something we both at least agree it works for, fitting a curve close to data of this Covid-19 U.S. epidemic thus far, even if we part company when it comes to extrapolating from that fit.

I will continue to ask for a bit of forbearance about even the large inaccuracy from 4/2 to today. We are still dealing with largely exponential growth, and even the actual reported numbers have been exploding. The model was projecting that we’d have a bit more than 6x as many cases today as we did then, and what we have is around 2.8x as many. If the model had kept on extrapolating that 13.5% daily increase we were having then, seventeen days ago, it would have projected 8.6x as many. It was accounting for some of the curve-flattening that happened, but not enough.
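The multipliers in that reply are plain compound growth: a 13.5% daily increase sustained for n days multiplies the count by 1.135 to the nth power. Here is a quick Python check using only figures quoted in this post; the 4/3 starting count is inferred by subtracting the week’s reported increase from the 4/10 total, and the helper name `multiplier` is just for illustration.

```python
def multiplier(daily_growth, days):
    """Growth factor after `days` days of compound daily growth."""
    return (1 + daily_growth) ** days


# Straight-line extrapolation of the 13.5% daily increase
for days in (7, 17):
    print(f"{days} days at 13.5%/day: x{multiplier(0.135, days):.1f}")

# Actual vs projected one-week multiples from 4/3, using the quoted figures
count_4_10 = 496_535
start_4_3 = count_4_10 - 220_949  # 4/3 count implied by the actual increase
actual_multiple = count_4_10 / start_4_3
projected_multiple = (start_4_3 + 346_635) / start_4_3
print(f"actual: x{actual_multiple:.2f}, projected: x{projected_multiple:.2f}")
```

It gives x2.4 and x8.6 for the straight extrapolations, and x1.80 actual vs x2.26 projected for the week from 4/3.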

Now, here’s what my interlocutor, who was watching this two months ago as an employee of Argentina’s Ministry of Health, said in response to that:

Yes. I have to agree with you. You are right in every point you make. Even when you describe my wording as harsh. I’m sorry for that.

I’m trying the same as you, modelling Covid19 in my country (Argentina) and I don’t think my results are better than yours. Maybe that was that frustration talking.

I will look at yours model in more detail tomorrow and try to make constructive criticism and not what I did

Keep the effort. And thanks for your thoughtful answer.

Thank you, sir or madam, for the gracious words and also for reminding me about the very real limitations of extrapolating a curve that’s been fitted to time-series data. As a gesture of appreciation, I ran the model on Argentina’s data today and you can see the plot here. I can see why your modeling efforts have gotten you frustrated; my model doesn’t fit what’s been (officially) happening in your country that well either, with a fat-tailed residual distribution that is unlikely (p=0.039) to be normal random variation. Maybe it’s all the jumps in the time-series data for your country, which fortunately still has a tiny number of cases per capita.
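The post doesn’t say which normality test produced these p-values. As one illustrative possibility only, here is the Jarque-Bera test in plain Python. It checks whether the residuals’ skewness and excess kurtosis are near zero; fat tails inflate the kurtosis term, pushing the statistic up and the p-value down. Under normality the statistic is approximately chi-square with two degrees of freedom, whose tail probability is simply exp(-JB/2). It is a large-sample approximation, so with about a month of daily residuals it is only a rough guide. The residuals below are made up.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, p-value) for a list of residuals.

    Large-sample test: under normality JB is approximately chi-square
    with 2 degrees of freedom, whose tail probability is exp(-JB / 2).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


# Made-up residuals: small scatter plus two large jumps (fat tails)
fat_tailed = [0.5, -0.5] * 9 + [10.0, -10.0]
jb, p = jarque_bera(fat_tailed)
print(f"JB = {jb:.1f}, p = {p:.2g}")
```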

Now, even if it’s true that the model is only fitting close to current data and can’t be trusted for what it says will happen in the future,8 I do think it’s done a damn fine job of characterizing what’s happened thus far, at least in the U.S. Just take a glance at that plot above (the top subplot) and the percentage errors of its fit to the past month of daily data, with just six parameters.

The best fit is essentially a perfect one going back nearly a week. Before that, it has errors in the single-digit percentages for more than another week, covering all of April and the last few days of March.

The error between modeled and actual reported cases doesn’t reach 20% until you go back to 3/14. That’s a long time ago, in pandemic time. The number of cases has multiplied 278 times since then. It almost certainly has one more doubling left to go. If there isn’t some new level of curve flattening that my model can’t anticipate with its best fit to today’s data, there will be more doublings to come.9 Meanwhile, the deranged narcissist in the White House is pushing to get everybody out there infecting each other again.
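For scale, multiplying 278 times is a little more than eight doublings, and one more doubling of today’s 759,086 reported cases would land just past the 1.5 million the model expects in May. A quick Python check; the 36-day span from 3/14 to 4/19 is simple calendar arithmetic, and the resulting doubling time is only an average over the whole period.

```python
import math

multiple = 278       # growth in reported cases since 3/14, per the text
today = 759_086      # reported U.S. cases on 4/19
days_elapsed = 36    # 3/14 to 4/19

doublings = math.log2(multiple)
doubling_time = days_elapsed / doublings  # average over the whole period
doubled = 2 * today

print(f"{doublings:.1f} doublings since 3/14, one every {doubling_time:.1f} days on average")
print(f"one more doubling: {doubled:,} reported cases")
```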

Hang on, my fellow citizens of the world’s newest and biggest failed state. I think we may be in for a long and bumpy ride.

Notes


  1. The data has been coming in pretty much as expected, and these projections are the same as they have been for the past two days. 

  2. “A trio of far-right, pro-gun provocateurs is behind some of the largest Facebook groups calling for anti-quarantine protests around the country, offering the latest illustration that some seemingly organic demonstrations are being engineered by a network of conservative activists.

    “The Facebook groups target Wisconsin, Ohio, Pennsylvania and New York, and they appear to be the work of Ben Dorr, the political director of a group called ‘Minnesota Gun Rights,’ and his siblings, Christopher and Aaron. By Sunday, the groups had roughly 200,000 members combined, and they continued to expand quickly, days after President Trump endorsed such protests by suggesting citizens should ‘liberate’ their states.” Washington Post, “Pro-gun activists using Facebook groups to push anti-quarantine protests,” 4/19/20. 

  3. This is explained in some detail on the Worldometer COVID-19 site. 

  4. You can take a look for yourself at the scatter plot of L values vs SSE, after 100 generations of evolution to 4/17 Johns Hopkins data. Members of the final population are shown in red, while individuals from earlier generations that were replaced are shown in blue. Most of the individuals who–as will likely be the case for many a Trump fan gathered with fellow idiots to protest the sensible pandemic response of his governor–have been displaced by natural selection have low enough fitness (high SSE) that they are not included in this plot. 

  5. Not by accident, I live in one of those blue states that actually has its shit together. We have been seeing around 1% new vs total cases per day, which is a fifth of the national average. Here’s what our non-idiot governor, for whom I will be honored to cast a vote (by mail, duh) in November, had to say today:

    “The president’s statements this morning encourage illegal and dangerous acts. He is putting millions of people in danger of contracting COVID-19. His unhinged rantings and calls for people to ‘liberate’ states could also lead to violence. We’ve seen it before.

    “The president is fomenting domestic rebellion and spreading lies even while his own administration says the virus is real and is deadly, and that we have a long way to go before restrictions can be lifted.

    “Just yesterday, the president stood alongside White House officials and public health experts and said science would guide his plan for easing restrictions. The White House released a sensible plan laying out many of the guidelines that I agree are essential to follow, as we work to resume economic activity. Trump slowly read his script and said the plan was based on ‘hard, verifiable data’ and was done ‘in consultation with scientists, experts and medical professionals across government.’

    “Less than 24 hours later, the president is off the rails. He’s not quoting scientists and doctors but spewing dangerous, anti-democratic rhetoric.

    “We appreciate our continued communication with the vice president, Dr. Birx, Admiral Polowczyk, Admiral Giroir and others in the federal government, but their work is undermined by the president’s irresponsible statements.

    “I hope someday we can look at today’s meltdown as something to be pitied, rather than condemned. But we don’t have that luxury today. There is too much at stake.

    “The president’s call to action today threatens to undermine his own goal of recovery by further delaying the ability of states to amend current interventions in a safe, evidence-based way. His words are likely to cause COVID-19 infections to spike in places where social distancing is working — and if infections are increasing in those places, that will further postpone the 14 days of decline that his own guidance says is necessary before modifying any interventions.

    “I hope political leaders of all sorts will speak out firmly against the president’s calls for rebellion. Americans need to work together to protect each other. It’s the only way to slow the spread of this deadly virus and get us on the road to recovery.” 

  6. And that’s only the number with symptoms serious enough to get their infection confirmed by a test, and tests remain difficult to get in many parts of the country. 

  7. According to the Worldometer on 4/19 at 7:36 PM Pacific time, there have been 40,555 deaths from Covid-19 and 71,012 recovered, with hundreds of thousands of people fighting for their lives right now. Many of them will not make it. But sure, assholes, go hang out at the beach with the crowd. 

  8. Even I would advise caution about looking to its projections more than a week or so out. 

  9. I’m just a layman about such things, but I understand that the onset of Summer in the U.S. is one such possibility, with warmer temperatures that kill off the virus faster on surfaces. Sure hope so, but then we have Autumn to look forward to, and the example of the 1918 Influenza with its deadly second wave.