Wednesday, March 25, 2020

Into the Rapids

To every thing there is a season, and a time to every purpose under the heaven.
—Ecclesiastes 3:1

Update, March 26

No, I couldn’t help myself; this blog post reflects an update done March 26.

I am trying to kick the Covid-19 data modeling habit. It’s just not good for my mental health, or probably yours, at this point. If you have been taking the magnitude of this disaster seriously, keep doing so. What more, really, is there useful to say?

Anyhow, in a moment of weakness I put this evening’s Johns Hopkins data into the model-parameter evolver and turned the crank. And out came the latest update to the Nightmare Plot, which I am putting right here at the top, and leaving the one with yesterday’s where it is.

March 26, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Today’s number was 83,836 vs the 82,716 I’d predicted yesterday. The model was 1.4% off, and on the optimistic side this time. The residuals of modeled vs actual data do not fan out and are not significantly outside a normal distribution. A previous run of the same model with parameters evolved to fit data from the day before yesterday predicted 89,614, which was pessimistic by 6.9% over the course of two days’ extrapolation.
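The post does not say exactly how those percentages were computed. One convention that reproduces both figures is the gap between prediction and reported count, taken as a fraction of the smaller of the two. A minimal sketch under that assumption (the function name `percent_gap` is illustrative, not part of the modeling code):

```python
def percent_gap(predicted, actual):
    """Gap between two counts as a percentage of the smaller one.

    This is one assumed convention, chosen because it reproduces the
    figures quoted in the text; it is not taken from the author's code.
    """
    low, high = sorted((predicted, actual))
    return 100.0 * (high - low) / low


actual = 83_836                            # reported U.S. cases, March 26
one_day = percent_gap(82_716, actual)      # prediction made one day earlier
two_day = percent_gap(89_614, actual)      # prediction made two days earlier

print(f"one-day extrapolation gap: {one_day:.1f}%")
print(f"two-day extrapolation gap: {two_day:.1f}%")
```

The one-day prediction fell below the reported count (optimistic) and the two-day prediction fell above it (pessimistic), which matches the description in the text.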

The universe is not yet honoring my request to be proved wrong with the curve bending down soon.

Back to March 25

Since the previous blog post, I did some work on my modeling of reported U.S. Covid-19 cases, ran the scary new example for my evolutionary parameter-finding algorithm on that evening’s data from Johns Hopkins, and posted a plot of the results on Facebook. An updated one with today’s data is here (click on it for the full-size version):

March 25, 2020: Reported U.S. Covid-19 cases vs days since 1/22

After discussing some technical details1 of the day’s work, I gave this brief summary about what the model is predicting for the numbers of reported Covid-19 cases in the U.S.: Tomorrow, around 66,000. Day after that, around 84,000. Reaching 100,000 the day after that (3/27). A million U.S. reported cases around April 7, doubling every 4-5 days for a while until the virus starts to run out of people to infect.

As I write this, the Worldometer Coronavirus page reports 65,797 cases in the United States.2 I don’t need to tell you how close that is to what the model was predicting yesterday.

The astute observer will be able to visually extend the red line and note that my model predicts more U.S. cases than people in the U.S. around the middle of May. Obviously, that’s not going to happen. It’s a limitation of my modeling that it doesn’t account for population of the country or region.

I’ve limited the amount of extrapolation I bother to show or talk about, since extrapolations can be uncertain even when the underlying model is well understood theoretically. But the obviously impossible prediction beyond the right edge of the plot does have a practical meaning. And it is an important one: I currently don’t see any sign of a slowdown appearing in the data. I wish I did.

Any refinements made to the model to account for such real-world practicalities as a country’s population would deviate from what I’ve been saying about what it actually does: simply predict what will happen if the reported-case data continues as it has, for about two weeks now. (Yesterday, there were slightly fewer newly reported cases than the day before, an obvious anomaly in the data.)

So, what I am seeing today is what I saw days ago already, and it terrifies me. The upward march of numbers continues, along a faint path increasingly visible on a cold gray mountain full of death.

Model Parameters

A couple of points are worth mentioning about model parameters. There is still no plausible upper bound to L, the limiting number of reported cases for the logistic-growth part of the model. Weirdly, there seems to be no plausible lower bound, either; some parameter combinations in the final population have almost no logistic component. Not something I’m very comfortable with, but there is clearly an exponential component to this, and the best few parameter combinations in the final population have L values ranging from 80 million to approaching the full U.S. population.

Here is the final population of both parameters L and r for the logistic-growth component of the model, plotted against SSE (sum of squared error, after the square-root transform is applied to modeled and actual data values):

Logistic growth parameters for March 26 data
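As a rough illustration of the fitness measure named above, here is a minimal sketch of a sum of squared error computed after a square-root transform of both series. The function name `sse_sqrt` and the sample numbers are made up for the example, and it assumes non-negative daily counts.

```python
import math


def sse_sqrt(modeled, actual):
    """Sum of squared error between square-root-transformed values.

    Assumes every value is non-negative (math.sqrt raises otherwise).
    """
    if len(modeled) != len(actual):
        raise ValueError("modeled and actual must have the same length")
    return sum(
        (math.sqrt(m) - math.sqrt(a)) ** 2
        for m, a in zip(modeled, actual)
    )


# Hypothetical new-cases-per-day values: two early days and one late day.
actual = [4, 9, 10_000]
modeled = [1, 16, 10_201]

# Plain SSE for comparison, with no transform applied.
raw_sse = sum((m - a) ** 2 for m, a in zip(modeled, actual))

print("transformed SSE:", sse_sqrt(modeled, actual))
print("raw SSE:", raw_sse)
```

In this toy data each day contributes equally to the transformed SSE, even though the late-day miss is 201 cases and the early misses are only a few cases. Without the transform, the late day would swamp everything else.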

The power-law with exponential decay component has its own parameter weirdness worth noting, too. The value of the power n in the function xd(t)=a*(t-ts)^n*exp(-(t-ts)/t0) is considerably lower than what Ziff and Ziff found for a power-law model by itself.3 Indeed, its best-fit value seems to be far less than even linear. Honestly, I’m not sure what to make of that. A very gradual increase in the number of newly reported cases each day over time, perhaps due to improved testing? Again, interpretation is left up to you.

Power-law parameters for March 26 data
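To get a feel for what a sub-linear n does to the power-law term, it helps to note that a*(t-ts)^n*exp(-(t-ts)/t0) reaches its maximum at t - ts = n*t0, which follows from setting its derivative to zero. The sketch below checks that numerically. The parameter values are invented for illustration and are not the evolved ones, and returning zero before ts is an assumption of this example.

```python
import math


def curve_powerlaw(t, a, n, ts, t0):
    """Power law with exponential decay: a*(t-ts)^n * exp(-(t-ts)/t0).

    Returning 0 for t <= ts is an assumption of this sketch; it also
    avoids raising a negative number to a fractional power.
    """
    u = t - ts
    if u <= 0:
        return 0.0
    return a * u ** n * math.exp(-u / t0)


# Hypothetical parameters with a sub-linear exponent.
a, n, ts, t0 = 100.0, 0.3, 0.0, 50.0
peak_day = ts + n * t0   # analytic location of the maximum

for day in (1, 5, peak_day, 30, 60):
    print(f"day {day:5.1f}: {curve_powerlaw(day, a, n, ts, t0):8.2f} new cases/day")
```

With a small n the term climbs quickly at first, flattens out near n*t0 days after ts, and then slowly decays, so on its own it contributes a gently varying number of new cases per day rather than runaway growth.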

Finally, the poor oddball constant term b is still there, just barely, adding its tiny fixed number of new cases per day. Against my own intuition, leaving this term out noticeably worsens the goodness of fit. I believe its utility is in reducing the SSE contribution made by early data points in the time series, allowing a better fit for the exponential and power-law components when things finally start ramping up.

Constant parameter for March 26 data

If you are comfortable with Python and data modeling, I encourage you to clone yourself a copy of the repo for ADE on GitHub, install the package with “pip install -e ade”, modified as needed.4 Run the newly installed ade-examples script, and then run “./covid19.py US” from inside the “~/ade-examples” directory that the script creates. Then you can look at the parameter values that get evolved using the pv command, like this: “pv -r 1.5 covid19.dat”.

The Fantasy of Normalcy

This afternoon [March 25], I had to go out for three unavoidable errands. I hope this is the last time I have to do that for a little while. One of my stops in the hot zone that our world is fast becoming was to the local grocery store. I sat in the parking lot and pulled on a fresh pair of latex gloves, and paused putting my respirator on my head for a moment, because I was seeing something disturbing.

It wasn’t that things were different, it’s that things were too much the same. Here I was, with every reason to believe that, on this day next week, there would be 300,000 Americans reported as infected with an insidious hidden disease that kills one out of every hundred or so who get it, a disease that puts over a dozen of those into the hospital gasping for breath. That there would likely be over a million reporting it the very week after, with the numbers still climbing fast and many more people not yet or not ever getting their own infections included in these numbers.

So I sat there a while with my well-used dirt-stained herbicide respirator5 in my lap, preparing to encounter in all likelihood at least one person carrying this thing around inside that store. Today’s person feeling “fine, though I sure have been tired the last couple days” is next week’s positive test result.

I put on the mask, stepped outside, picked up my three plastic snap-lid storage bins and carried them into the store past the incredulous faces of people just walking around getting stuff. I stood apart from others and waited as they got their shopping carts and then got mine. I put the three plastic tubs in the cart, waited for the old man–looks like you’ve got about an 85% chance6 of surviving a case of this, gramps–to move along more than 6 feet away from me. I headed straight back to the department with the things that we decided we really couldn’t do without, pausing to give everyone a wide berth and just not giving a fuck what they thought about my respirator mask or gloves.

There were two checkout lines, both full. I just stood there projecting an invisible sphere of stay the fuck away from me and the whole man-from-mars appearance. Those two social distancers worked wonderfully except for one not-young woman who I had to outright tell, “Please move away.” And I don’t think she was even into me.

My muffled Martian voice came through the mask, “I’d like to box up my own.” The terrified-looking cashier handed me my stuff as she scanned it and I put it in my plastic tubs, with a loud snap each time I shut the lids. Those tubs never touched the floor.

No, I’m not a rewards club member. Thanks for that receipt that I’m going to crumple in my gloved hands and throw away. Have a nice day.

I unlocked the Jeep and got out a plastic can of sanitizer wipes. I opened the back, and picked up one tub at a time from the cart, wiping it down before setting it into the Jeep. I closed the back and then sanitizer-wiped all of the handles I’d touched with my gloved hands. Then I opened the door again, took off my windbreaker, and stripped down to my swim trunks that I was wearing instead of underwear. The windbreaker and my pants and shoes went into a plastic tub all their own.

I carefully removed each glove, being careful not to touch anything but the edge dangling loose around my wrist, and dropped them into the tub with the dirty clothes. Yes, folks, there is a 50-something-year-old man wearing swim trunks in March in your friendly neighborhood grocery store parking lot. Snap. Lid closed.

Then I put on a fresh pair of pants that were in the Jeep along with a fresh pair of shoes, and finally then sat in the driver’s seat.

The other two stops were much easier. Our local pharmacy clearly understands what’s happening better than the folks at the grocery store, because they had a sign outside inviting customers to take curbside delivery of their medications. And that’s what I did. The pharmacy clerk waved from the passenger side of my Jeep, I unrolled the window and extended an open Ziploc bag for him to drop the prescription bag into. We urged each other to take care and I didn’t even feel like I needed a glove on the hand that held the bag (mine, that enclosed his).

You’re probably wondering what happened with the groceries that were inside those plastic tubs. Answer: bleach solution and rags, followed by plain water and more rags to rinse off the bleach. There were a couple of nonperishables that we aren’t going to need for a while, and with no freezing temperatures in the forecast for a few days, they’re going to stay outside.

Here’s how I explained all of this to my college-age daughter, who I believe has gotten just a little bit less scornful of her doomer father’s pronouncements in recent days: Tonight, I will not lie awake wishing I hadn’t been so insanely careful.

Here’s what remains with me, though, instead of anxiety about having picked up Covid-19 this afternoon. (I really don’t think so.) I am dealing with the angry dreadful realization that my conservative, backwoods community either doesn’t know or doesn’t care about what is heading their way. All those people, casually walking around Safeway, one pair of gloves to be seen anywhere besides my own. And then there was me, marching in there with a cart of plastic tubs, wearing that neon-pink industrial respirator mask and surgical gloves like something you’d be scared of even without knowing that a pandemic was just starting to cripple your country.

I’ll say it again: Today’s person feeling mostly fine is next week’s positive test result. And next week I am expecting7 there to be around 300,000 of those positive test results. Going beyond the model for a moment and just talking out of my ass, I’d bet there will be twice that many walking around feeling mostly fine if starting to worry that maybe this whole Coronavirus thing isn’t a Democrat hoax after all.8 And they will spawn, mostly without knowing, the following week’s new ever-larger batch of positive test results.

This is as good a time as any to repeat, yet again, that I have no expertise in biology, medicine, or infectious disease. I’m just a retired engineer who has always had a passion for modeling interesting data. I’m currently about to release a free, open-source Python package that does some really cool modeling of power MOSFETs. See the disclaimer stuff in my previous posts.

Also worth repeating: The model is just the integration of a first-order differential equation that can be extrapolated to predict reported U.S. cases of Covid-19 if the reported cases continue as they have for the past two weeks. That said, I feel obligated to share one layman’s thought that gives me hope.

Perhaps the testing is accelerating faster than the virus’s replication and thus the model is being overly pessimistic in terms of real cases. In this cheery scenario, the testing is really taking off lately, lots more people being tested every day, and so yes there are more reported cases. But then the testing will start just humming along efficiently and the number of reported cases won’t keep inflating itself so fast.

There is, unfortunately, a dark alternative scenario: The testing is unable to keep up with the true replication of the virus. Yes, an accelerating number of new tests is finally appearing, but it’s not keeping up with how fast people are getting infected. Hospitals get overwhelmed, doctors stop bothering to test. And my stupid scary model winds up underestimating how fast we actually are getting to Armageddon.

This, gentle reader, is why (1) I have stuck to just predicting numbers of reported cases and leaving the rest to you, and (2) why I am eager to conclude this little modeling project. For my own mental well-being in an anxious time.

Farewell

After I took my shower and finally felt really clean, after the doorknobs were all wiped down with sanitizer wipes and the food was put away, I started doing “just a little more” work on my modeling code. Then I took a break from it and saw tonight’s total for US reported cases: 65,797. That’s 99.6% of what yesterday’s model and data had predicted.9

There is a weird unsettled feeling involved with seeing a predictive tool one has developed–not without criticism from a few self-proclaimed experts on Reddit–make a prediction so uncannily close to the mark.10 It’s a slowly roiling fog of horror with jarring bright spots of pride, and the guilty unease at the fact that I am seeing lights in there at all at a time like this.

This has happened before, on March 23. Then I wrote on Facebook,

the Worldometer Coronavirus update currently lists 43,734 U.S. cases and my model had predicted 43,639 reported U.S. cases for today’s (still nonexistent) Johns Hopkins numbers. This sort of uncannily close tracking with the data leaves a weird sense of anxious conflicted satisfaction in my mind.

It would be so much better for everyone, including myself, if I were instead being proven wrong. Feeling naive and embarrassed would be a pretty small price to pay to see the curve bending downwards faster than all my modeling predicted. Unfortunately, it still isn’t.

Take that, I find myself thinking of the random Internet guy who pooh-poohed my modeling as naive, unfounded, something only someone not properly educated in the relevant area would do. And then I look at the parts of the plot further to the right and realize what continuing to be right means.

So, let me say something for the record, to critics and fans and that little semi-living armored glob of RNA that has found a really effective way for it to propagate copies of itself: I’m totally fine with my model being full of shit for everything that happens from now on. Really, I can deal with it. I get it–it’s completely naive to try to extrapolate the integration of a nonlinear differential equation from cases already reported to what cases might be reported in the future. A charming if silly and perhaps even a little egomaniacal exercise on the part of a retired engineer with an obsession for developing nonlinear models for complicated things.

When the fancy theoretical factors that I didn’t even bother trying to understand finally emerge in a few days (hell, how about tomorrow?) and that curve finally gets its long-awaited downward bend, I will sit and watch movies and feel sorry for myself and the embarrassing way I’ve never quite managed to grow up and leave things to the experts. But it’s okay, and when we top out at maybe a couple hundred thousand U.S. cases and then the kids go back to school, I will immerse myself in weightlifting or something and try to forget what an idiot I can be sometimes.

Next month, over the Sunday morning breakfast table with some friends, I’m looking forward to having a laugh about how silly we’re all being. Please be gentle with my own ego when it happens, and fervently hope that it really does.

Meanwhile, I’m done here. The time window for these sorts of projections to make a difference is closing fast, even for those who are paying attention to them. It may have closed already. Those people walking around the grocery store will not have their fates altered by something I write tonight, or whether I decide right now that it’s best for my own sanity not to trouble myself to write about this anymore.

Be well. And, for God’s sake, please do not vote for this same idiot narcissist failure of a President to stay in the White House next year. If that needs any explanation at all in your mind at this point, you need to pay attention to a lot more than just my amateurish attempt at Covid-19 modeling.

Notes


  1. Instead of weighting later samples higher in the sum-of-squared error calculation for the fitness function, I applied a square root transform to the number of new cases per day each day. That somewhat mitigates the effect of exponential growth to put much more emphasis on the most recent days’ data while allowing me to study the residuals of modeled vs actual daily new cases. 

  2. The Johns Hopkins total came in at 65,778. Running the model again when the new data became available after the original writing of this blog post did not result in different enough projections to warrant anything but a couple of footnotes and of course an update to the Nightmare Plot. 

  3. Anna L. Ziff and Robert M. Ziff (“Fractal kinetics of COVID-19 pandemic (with update 3/1/20)”). They fit to an exponent just a bit greater than 3. Perhaps it shouldn’t be surprising that the exponent would fit much lower than that in combination with a logistic growth component, but I still wouldn’t have expected n (what Ziff and Ziff call x) to be down in the cube/quad root range. 

  4. The last argument, “ade”, can be changed to whatever directory your cloned repo is in, where the setup.py file for the package resides. 

  5. A typical Spring (not this one!) has me outside spraying noxious invasive weeds with a solution of 2,4D herbicide. The idea of a mere plant being “invasive” seems a bit quaint right now, though I’ll surely keep going after my remaining holdouts of spotted knapweed and St. John’s wort for many a Spring to come. 

  6. Based on some stats provided here, the death rate for people 80+ years old is 14.8%. And, by the way, gramps, it is 60% higher for men than for women. 

  7. See my previous posts for context and disclaimer. Of course you knew I’d eventually say that somewhere. 

  8. Yes, the fucking moron really did call it that: “The Democrats are politicizing the coronavirus . . . This is their new hoax” (Feb. 28, 2020). 

  9. The Nightmare Plot from my March 22 post was predicting just over 70,000 for 3/25, a bit higher than it is. This is in accordance with the overall trend (a good one!) of the model being proved slightly pessimistic in its extrapolations. But, before you go celebrating some history of erroneous pessimism on my part, the plot from my March 19 blog post optimistically predicted (with an earlier regressive model of cumulative case numbers, rather than the differential equation approach I now prefer) only around 51,000 cases. 

  10. See my Reddit posts yesterday and today, referencing this blog post. Now, if I post a comment to that blog referencing this footnote, do I have to update this footnote to reflect that? And then post another Reddit comment? And so on? 

Sunday, March 22, 2020

Pandemic

I must die, and must I die groaning too?–Be fettered. Must I be lamenting too?–Exiled. And what hinders me, then, but that I may go smiling, and cheerful, and serene?
—Epictetus, Discourses.

Some pandemics are mild. But some are fierce. If the virus replicates much faster than the immune system learns to defend against it, it will cause severe and sometimes fatal illness, resulting in a pestilence that could easily claim more lives in a single year than AIDS did in 25 [years]. Epidemiologists have warned that the next pandemic could sicken one in every three people on the planet, hospitalize many of those and kill tens to hundreds of millions. The disease would spare no nation, race or income group. There would be no certain way to avoid infection.
—“Preparing for a Pandemic,” W. Wayt Gibbs and Christine Soares, Scientific American 293(5), 44-54 (Nov. 2005).

TL;DR: A very good fit between data obtained on March 19 from Johns Hopkins University and a logistic+linear growth model indicates that there will be nearly 60,000 reported cases of Covid-19 in the United States on March 24, around 300,000 cases eight days after that (4/3), and several million cases around mid April, with the numbers doubling every 5-6 days or so for at least another couple weeks. See the Nightmare Plot and the Disclaimer below.

Update, 3/22, 6:25 PM PDT: With the latest data from Johns Hopkins this evening, the latest Nightmare Plot is not different enough from the one originally included with the post to warrant editing the post text. It’s been a long day spent with Covid-19. Today’s reported cases came in at 33,272, not significantly off from what the model had projected with yesterday’s data. The refined model parameters have caused a slight reduction in projected numbers more than a few days out, not by much. If anything, it continues to be very disturbing to see one’s modeling borne out by the relentless upward march of reported cases. Time for a break from this now.

I worked much of yesterday and this morning on a more sophisticated modeling approach than in my previous post, integrating a differential equation f(t,x) for the number of new cases per day, on each day, rather than the total number of reported cases each day.

Running the updated code with Johns Hopkins University data published yesterday (3/21) resulted in an updated Nightmare Plot.

Reported U.S. Covid-19 cases vs days since Jan. 22, 2020

Now, this is one really important plot. It shows up way too small in this blog post for you to be able to see its important details. So please click or tap on it to open it as an image by itself.

Please bear with me as I largely repeat a few paragraphs from the previous post. The upper subplot shows the best-fit logistic growth model in red, with the actual number of cumulative reported cases of Covid-19 in blue. The error between the model and the data is shown with each annotation. Look how small the residuals are compared to the exponentially rising numbers of cases. It’s a scarily impressive fit.

The lower subplot shows the number of cases expected to be reported over time, slightly in the past and then extrapolating to the future. Two hundred and fifty generations of running a differential evolution algorithm1 resulted in a 120-member population of combinations of parameters for the model.

I could have terminated the algorithm sooner, and then there would be some visible variation in the extrapolations. But I decided to just plow onwards for five times as many generations to be more certain of finding something really close to optimal. The black dots show expected reported-case values with each separate parameter combination represented by the 120-member population, plotted in bunches around each day from tomorrow out to mid-May.
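For readers unfamiliar with the technique, here is a minimal sketch of a textbook differential evolution loop (the DE/rand/1/bin variant), using the same population size and generation count mentioned above. It is a generic illustration, not the algorithm as implemented in the ADE package. The mutation factor `F`, crossover rate `CR`, and the toy line-fitting problem are all assumptions of this example.

```python
import random


def differential_evolution(cost, bounds, pop_size=120, generations=250,
                           F=0.7, CR=0.9, seed=0):
    """Minimize cost(params) with basic DE/rand/1/bin.

    Returns the final population as (cost, params) pairs, best first.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(p) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)   # at least one gene always mutates
            trial = []
            for j in range(dim):
                if j == j_rand or rng.random() < CR:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)   # clip to the search bounds
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = cost(trial)
            # Greedy selection: the trial replaces the target if no worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    return sorted(zip(costs, pop), key=lambda pair: pair[0])


# Toy problem: recover slope 3 and intercept 2 of a noiseless line.
data = [(x, 3.0 * x + 2.0) for x in range(10)]


def line_sse(params):
    slope, intercept = params
    return sum((slope * x + intercept - y) ** 2 for x, y in data)


final_population = differential_evolution(line_sse, [(-10, 10), (-10, 10)])
best_cost, best_params = final_population[0]
print("best parameters:", best_params, "SSE:", best_cost)
```

Every member of the returned population is a complete parameter combination, which is why a plot can show one dot per member for each extrapolated day. Stopping the loop earlier would leave the members more spread out.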

Significantly, the subplots both have a logarithmic y-axis. Exponential growth is linear when viewed through a logarithmic lens. When you see that straight line marching steadily upward toward those massive numbers, you really want all your modeling to wind up an embarrassing public failure.
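The straight-line claim is easy to check numerically: a series that doubles at a fixed interval rises by the same amount on a log axis at every step. A small sketch with invented numbers (a series starting at 100 and doubling every 3 days):

```python
import math

# Hypothetical series that doubles every 3 days, sampled every 3 days.
doubling_days = 3.0
days = list(range(0, 13, 3))
cases = [100 * 2 ** (day / doubling_days) for day in days]

# On a logarithmic y-axis, the plotted height is log10(cases).
heights = [math.log10(c) for c in cases]
steps = [later - earlier for earlier, later in zip(heights, heights[1:])]

print("cases:    ", [round(c) for c in cases])
print("log steps:", [round(s, 4) for s in steps])
```

The case counts grow by ever larger amounts, but each step in the logarithm is the same size, log10(2), which is what makes the curve look like a straight line on the plot.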

More Power to You

Now, the model includes:

  1. a logistic growth component, as before,

  2. a power-law with exponential decay component, as suggested by Anna L. Ziff and Robert M. Ziff (“Fractal kinetics of COVID-19 pandemic (with update 3/1/20)”),

  3. and a linear component with a small constant number of new cases being reported per day, which only helps improve the closeness of fit early on.2

I tried Ziff and Ziff’s approach by itself but was not impressed with the closeness of fit to the data thus far. This thing is still very much exponential.

With exponential growth, the power-law behavior is not some more-than-squared increase with time but with itself. When the number of cases grows exponentially, as it has been in the U.S. for about two weeks now, the rapidly increasing number of reported cases feeds on itself. Infected people result in infected people, who then result in still more infected (and infectious!) people.

A power-law approach is only nonlinear in time, not in itself. Sure, the number of new cases will increase dramatically as the days go on, this model says. But it won’t be feeding on itself. The increase is just a function of time passing, like the days suddenly getting longer in Spring. It’s not like an exponential forest fire where what is being consumed also takes its turns consuming.

I made a good-faith attempt to switch covid19.py to the Vazquez (2006) “power-law with an exponential cutoff.” Any petty pride as a data modeler to have one’s first instincts bettered is pushed aside in the hope that it would prove more accurate and perhaps less scary than the logistic growth model I’d been using. Unfortunately, it didn’t seem to fit the data as well as the logistic growth model does. What I did find to be an improvement, however, was a blended model that included both.

So perhaps Ziff and Ziff are correct when they “tend to predict an S-shaped curve with a tapering off in the near future as is being seen.” Perhaps there are “fractal kinetics” at play that contribute some significant power-law behavior to the data we have now. It doesn’t have to mean that is the only biological or epidemiological factor at play.

The Pretty New Model

To repeat myself, I don’t have any expertise in the relevant areas. But I naively assume that a pandemic can have more than one driving factor. And so I now propose to simply add the power-law modeling as one of two components (plus a perhaps gratuitous constant) of a differential equation model for the number of new cases per day, each day. The other component is, of course, the logistic growth model.

This results in a model with seven parameters. It’s a first-order differential equation:

xd(t, x) = curve_powerlaw(t) + curve_logistic(t, x) + b
curve_logistic(t, x) = x*r*(1 - x/L)
curve_powerlaw(t) = a * (t-ts)^n * exp(-(t-ts)/t0)

With any sort of modeling, one must guard against the temptation to overfit the data with ever more sophisticated models. But I believe these 7 parameters all earn their keep with the current behavior of COVID-19. More parameters are not necessarily bad; my MOSFET models have 40+ parameters, all necessary to simulate the behavior of a semiconductor device with very complex underlying physics.
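The three equations above can be turned into cumulative case counts by integrating xd(t, x) forward in time. The sketch below does that with a fixed-step fourth-order Runge-Kutta method, which is a choice made for this example and not necessarily what covid19.py does. The parameter values, the starting count, and the rule that the power-law term is zero before ts are all hypothetical.

```python
import math


def curve_logistic(t, x, r, L):
    # Logistic growth term; t is unused but kept to mirror the equations.
    return x * r * (1 - x / L)


def curve_powerlaw(t, a, n, ts, t0):
    # Power law with exponential decay; taken as zero before ts (assumption).
    u = t - ts
    if u <= 0:
        return 0.0
    return a * u ** n * math.exp(-u / t0)


def xd(t, x, p):
    """New cases per day at time t, given x cumulative cases so far."""
    return (curve_powerlaw(t, p["a"], p["n"], p["ts"], p["t0"])
            + curve_logistic(t, x, p["r"], p["L"])
            + p["b"])


def integrate(p, x0, days, steps_per_day=24):
    """Fixed-step RK4; returns cumulative cases for day 0 through `days`."""
    h = 1.0 / steps_per_day
    t, x = 0.0, float(x0)
    totals = [x]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = xd(t, x, p)
            k2 = xd(t + h / 2, x + h * k1 / 2, p)
            k3 = xd(t + h / 2, x + h * k2 / 2, p)
            k4 = xd(t + h, x + h * k3, p)
            x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
            t += h
        totals.append(x)
    return totals


# Seven made-up parameter values, purely for illustration.
params = {"L": 1e8, "r": 0.25, "a": 50.0, "n": 0.3,
          "ts": 0.0, "t0": 30.0, "b": 1.0}
totals = integrate(params, x0=1.0, days=30)
for day in (10, 20, 30):
    print(f"day {day}: about {round(totals[day])} cumulative cases")
```

Because the logistic term depends on x itself, its contribution compounds as the total grows, while the power-law and constant terms depend only on elapsed time.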

Great (Actually, Shitty) Expectations

So, here’s what I personally expect, if the number of reported cases continues to match the model as well as it has in the past week or so. We will remain in full-on exponential growth for a few more days. Then, around the end of the month, we will see things starting to slow down just a little. But the slowdown will only be in exponential terms, unfortunately, not in the linear way we usually think about things. There will still be more new cases every day than there were the previous day, for weeks to come. It’s just that the daily increases in the number of new cases will themselves stop increasing quite so fast.

I’m expecting a hundred thousand reported U.S. cases by 3/26. That’s up from the 60,000 that I was expecting–when I last updated my predictions two days ago–we would be seeing by then. My projection for 4/2 is essentially unchanged at around 400,000 cases, and I’m thinking the million-case mark will likely be reached by 4/7 instead of between 4/4 and 4/6. Again, this is still just modeling reported cases, not all of them.

There still does not seem to be any convincing upper limit before the population of the U.S. is approached, sometime in May. To put it in a purely technical way, that is really fucking scary.

Conclusion

Thanks again to my new Facebook friend Ng Yi Kai Aaron, an applied statistician in Singapore, for suggesting I look into the power-law modeling approach. Again, I’ve partially incorporated that into the model, but not entirely because it doesn’t seem to fit the data on its own, not for U.S. cases at least.

At the end of my last blog post, I got a little philosophical. I suggested that a front-row seat watching history get made in one of the shittiest ways imaginable is definitely something not to be passed up. Did you ever wonder what it would be like to watch (or feel socially compelled to watch) half-naked desperate men flail away at each other with weird instruments of death, until one of them was indeed dead? How about hearing the swoosh of the guillotine outside your Paris apartment, followed each time by the roar of an angry mob? Bracing stuff. Sucked to be there, actually, a detail which the history books tend to omit.

So here we are. The pandemic of 2020 is just getting underway. I hope you stocked up on popcorn. I also hope you carefully read my Disclaimer in the previous blog post, because it applies to everything I say here, too. And remember:

The model simply predicts what will happen if the data continues as it has for a little over a week now.

That’s it. The interpretation and explanation is up to you.

I want to add a couple of words about your behavior and the possibility of you having a response something like this: “Well, then I’ll get it anyhow so why bother being careful.”

First, you absolutely do not have to get it. I still believe that the model will have to be adjusted in the future to reflect the then-apparent new reality that people finally got freaked out enough to take this seriously, deciding to risk boredom, a shortage of twinkies, or even getting minor health conditions addressed because it’s become apparent how much getting Covid-19 sucks, even for a younger person.

Or maybe we will have enforced lockdowns as this administration finally wakes up (but then see my history of blog and Facebook posts on Trump’s authoritarianism). It’s already starting to happen at the state level, not the shitty wannabe dictatorship I fear from the deranged narcissist but reasonable if drastic measures by grown adults who take their office seriously. Including, perhaps surprisingly, not a few Republicans.

If you can possibly wait this out for a while (of course you’ll need to get some things, see below for what I do), your statistics will start to look better as the number of new cases finally starts to drop each day. It will be a bit like getting to roll two dice instead of just one. Each passing week out of your self-imposed stay-healthy near-quarantine will get a little less nerve-wracking if things continue to go your way. As the saying went in Hunger Games, may the odds be ever in your favor.

And as for “Why bother if you get it anyway?”, I personally would rather have my hospital stay sometime in, say, June when the number of new cases is finally dropping. When doctors have gotten experienced with treating this thing and are recovered (the ones who make it) and immune. I want the people saving my life not to be fearing for their own. And if you are really into the hermit life, you could wait for that vaccine in about a year.

So, what is a reasonable action to take? Of course, we need to get stuff from time to time. There are very few true hermits anymore, and even preppers are going to find themselves wondering why they didn’t buy more Peanut M&Ms or tampons. I can just tell you what I do and you can laugh at me if you want, or perhaps there is something sensible. And remember the disclaimer, goddammit, and that I’m not a doctor or anything like that.

Before it became obvious even to me what a monster this pandemic was going to be, I was fortunate enough to have my wife buy a box of latex gloves. One hundred gloves equals fifty trips outside my van. That will be more than enough, because I’m not going out very often.

When I get to my destination, I park the van, give my nose a good itching, and put on a pair of the gloves. An open paper garbage bag sits in front of the center console, ready to receive the gloves when I’m done. I open the van door, step outside, and shut the door. I go into my pharmacy or grocery store (there’s really no other place I’d bother with at this point) taking advantage of automatic doors wherever possible. Or, failing that, I’ll be honest and say that I try to sort of just be behind somebody who’s already opening the door themselves. Why thank you!

I go get what I need, touching as little as possible, staying the fuck away from everyone else. Don’t even look at me like you’re gonna cough. Maybe for this reason alone, one of those N95 masks I bought months ago for slash burning would be useful. When it’s needed, I pull my credit card out of a Ziplock bag and put it back in while trying not to have it touch the bag.

I do not care what people think about my wearing gloves. At this point, they’re actually probably jealous, or maybe thinking I’ve got a big stash of them that I hoarded from everybody else. Well, I wasn’t even planning on buying the one box.

Now, if you don’t have them, pretend you do, but these gloves have some weird invisible stuff on them that burns your skin on contact.

These imaginary gloves are 100% effective but you just don’t want to touch them except on the inside. You will touch everything you need to, including your credit card, through these imaginary gloves. You don’t want to touch things that you will be touching without the gloves on, because that will leave the invisible stinging stuff on your steering wheel and then your hands will still hurt in a few minutes. So, only stuff you’ll never have to touch again, like that door handle or Harvey Weinstein.

Except there is one solution to the problem of having to touch some things with and without the imaginary gloves on. The stinging stuff washes off with those really strong sanitizing wipes. You just have to wash off any of the stuff you get on your hands, before it starts to hurt. And then your door handle will be clean and so will your hands, ready for you to touch without the gloves on.

The time you take off the imaginary gloves is of course the time when you put a little glob of hand sanitizer into your palms and then rub the stuff around everywhere. Honestly, at this point, I’d probably be doing it twice, and then wiping down the exterior of the hand sanitizer bottle with some leftover sanitizer. Only after all this can you consider your hands ready to touch something without the imaginary gloves, including that terrible itch you’ve had just above your left nostril.

When you get home, put your clothes and shoes in a bag, get a handful of those nasty wipes, look both ways to make sure the neighbors can’t see you in your underwear, and wipe down the door handles. Then go take a shower, and you can really scratch that nose itch.

I’m going to try to step away from the data now, because I’ve said my piece. The only reason I would see to update this post is if something happens dramatically different from what I now expect.

Have a nice Spring.

Notes


  1. Using my free, open-source Python package ade, Asynchronous Differential Evolution. 

  2. Many thanks to Ng Yi Kai Aaron, an applied statistician in Singapore, for introducing me to the Ziff and Ziff approach. 

Thursday, March 19, 2020

Applying the Logistic Growth Model to Covid-19

The chief task in life is simply this: to identify and separate matters so that I can say clearly to myself which are externals not under my control, and which have to do with the choices I actually control.
—Epictetus, Discourses.
Dad, you’re just some guy who knows how to obsess over numbers. We have actual people who are experts at this stuff. Go and write it if you want, but don’t feel like you have to!
—Daughter of Ed Suominen, March 2020.
TL;DR: A very good fit between data obtained on March 19 from Johns Hopkins University and a logistic+linear growth model indicates that there will be over 50,000 reported cases of Covid-19 in the United States on March 25, over 300,000 cases one week after that (4/1), and several million cases by early April. See the Nightmare Plot and the Disclaimer below.
Update, March 20: There was a significant uptick in U.S. cases today, bringing us to a total of 13,677 according to Johns Hopkins data provided this evening. The increase is more than was expected this time, and the jump is significant enough that I would rather not publish results with the logistic growth model fitted to the latest data. Doing so results in projections that are considerably higher than what the rest of this blog post discusses. I will wait for tomorrow’s data and then perhaps consider a modification to the model if we get unexpectedly high numbers again, one possibility being to change the linear term to a power-law one of the form a*t^b. That might reflect the effect of better testing without forcing an artificially high value for the exponential model parameter k.1
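
To make that possible modification concrete, here is a minimal Python sketch of the logistic model with the linear term swapped for a power-law one of the form a*t^b. The function name and every parameter value are assumptions made up for illustration; nothing here comes from a fit to real data.

```python
import math

def logistic_power(t, L, k, t0, a, b):
    """Logistic growth plus a power-law term a*t**b (b = 1 gives the linear term)."""
    return L / (1.0 + math.exp(-k * (t - t0))) + a * t ** b

# Hypothetical values, only to show how the exponent changes the added term
t, L, k, t0, a = 58.0, 1.0e6, 0.25, 75.0, 2.0
with_linear = logistic_power(t, L, k, t0, a, b=1.0)
with_power = logistic_power(t, L, k, t0, a, b=2.0)
```

With b above 1 the added term grows faster than a straight line, which is one way extra reported cases from better testing could be absorbed without pushing k higher.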
———

On March 15, I wrote to some friends on Facebook about the latest results of putting my computer evolution code and skills to the task of finding parameters for the logistic growth model as applied to the number of U.S. reported cases of Covid-19. My belief–speaking as someone with expertise in fitting nonlinear models to data but not any kind of expert in the fields of biology, medicine, or infectious disease–was that we would reach 10,000 cases on March 17, and would have reported cases numbering in the hundreds of thousands by March 29.2 By the end of April, I believed there would likely be millions of Americans being reported as having this virus. The rate of growth I thought would be unlikely to even start slowing down before April.

The prediction for the one date that has come and gone was not quite accurate. On March 17, there were 6,421 cases reported in the U.S., a little less than two-thirds of what the model said was most likely. But, in my defense, I ask you to look back at yourself enjoying a Sunday in mid-March. Would you have been truly untroubled just two days ago by hearing that six thousand of your fellow Americans would have a deadly respiratory infection that puts a fifth of its hosts into hospitalization? The model was pessimistic but not ridiculous.

Two days earlier, on March 13, I had introduced the project to my Facebook friends, prefacing the discussion with the acknowledgment that I’m not a biologist, or a doctor, or an infectious disease expert of any kind. Just a retired engineer and inventor who knows how to write Python code and has been working on modeling and simulation for over a year now.

After seeing predictions ranging from dismissive to hysterical about the Coronavirus, I saw a useful if sobering application example of a tool that I’d written specifically for my electronic simulation work, ADE. This new example would apply the fairly well-known “logistic growth model”3 to what has now bloomed into a pandemic.

Writing and running covid19.py forced me to some stark conclusions: In one week (3/20), I said, we would be likely to have over 20,000 U.S. cases. A week later (3/27), around 10 times that. “By the first few days of April we could very plausibly hit the one million mark. There will certainly be nobody saying this is just like the flu by then.”

The most I was–and am–willing to guarantee about those predictions, however, is that the red line in the plot I included with the post is a nearly optimal fit of the function f(t)=L/(1+exp(-k*(t-t0)))+a*t to the number of cases versus time provided by Johns Hopkins University, including an update made that evening.
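
For anyone who wants to poke at the formula, here is a minimal Python sketch of that function. The function name is an assumption, and the parameter values are made up to exercise it; they are not the fitted values from covid19.py.

```python
import math

def logistic_linear(t, L, k, t0, a):
    """Logistic growth plus a linear term: L/(1+exp(-k*(t-t0))) + a*t."""
    return L / (1.0 + math.exp(-k * (t - t0))) + a * t

# Hypothetical parameter values, chosen only to exercise the function
L, k, t0, a = 1.0e6, 0.3, 75.0, 2.0
halfway = logistic_linear(t0, L, k, t0, a)   # at t = t0 the logistic part is exactly L/2
early = logistic_linear(40.0, L, k, t0, a)   # long before t0 the count is tiny next to L
```

Here t is in days since 1/22, L is the eventual total of reported cases, k sets how fast the exponential phase runs, and t0 is the midpoint of the curve.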

So how did that prediction fare? With data updated Thursday evening (3/19), it now appears that there will be around 13,500 reported cases on March 20. Again the model is pessimistic, with 68% as many cases reported as expected to be. But, again, even the lower one is a huge number of people getting very sick all of a sudden. Were you expecting anything like that just five days ago?

If not, don’t feel bad. There was an excellent reason why you might have been surprised then at what is now clearly plausible to anyone looking at the plot below: Your President was telling you that it was no big deal.

With the data available on March 13, the logistic+linear growth model predicted there would be around 200,000 U.S. cases on March 27. That is a little more than double what the model’s current best fit says is most likely. Again, the model was and still may be pessimistic, predicting too many cases. But again, even 90,000 or so infected Americans–with probably 20,000 of them very sick and at least a thousand of those dying and thousands left with permanent lung damage–is a very big deal. And the virus will still just be getting started.

Yesterday, March 18, I released the first version of this blog post with the projection that there would be nearly 15,000 cases tomorrow (3/20). (Actually 14,538.) Once again, the model was a bit pessimistic; the current projection is for 13,549 cases, or 93% as much.4 And the longer-term projections are slightly lower, which is moving in the direction we all want, though only by a little bit.5
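
The "pessimistic by how much" comparisons so far are plain ratios of a later number to an earlier projection. A small sketch of that arithmetic, using only figures quoted in this post (the 13,500 and 20,000 values are the rounded ones given above, and the labels are mine):

```python
# (later figure, earlier projection) pairs quoted in this post
comparisons = {
    "3/17 reported vs 3/15 projection": (6421, 10000),
    "3/20 re-projection vs 3/13 projection": (13500, 20000),   # both rounded
    "3/20 re-projection vs 3/18 projection": (13549, 14538),
}
ratios = {label: later / earlier for label, (later, earlier) in comparisons.items()}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1%}")
```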

———

So, on March 19, here is what the admittedly pessimistic logistic+linear growth model now says, based on Johns Hopkins data updated this evening. The numbers are all in reported U.S. cases:

  • The day after tomorrow (3/21), there will be nearly 18,000 cases.

  • In one week (3/26), there will be more than 60,000 cases.

  • In two weeks (4/2), there will be over 400,000 cases.

  • We will reach the million mark between April 4 and 6.6

  • On April 11, there will be five million cases.7

  • The U.S. outbreak won’t even begin slowing down until mid-April at the earliest. In other words, there will be increasing numbers of new cases until probably around April 24 when there are finally fewer new cases one day than there were the day before.

  • The number of Americans reported as infected by the novel Coronavirus will ultimately reach several tens of millions.

This is some scary shit. And it may be even worse than it looks right now. What the data show, and the model is fitted to, is the number of reported cases; several days ago, some experts in the Seattle area were estimating the number of true cases in Washington State to be several times the number being tested and reported.8 Isn’t it reasonable to expect that to remain largely true? Our medical system will almost certainly become overloaded and the focus will simply turn to saving those lives that can be saved, as it has already in Italy.

But all this is just me talking, not the model. It makes no assumptions or judgments about the data. It doesn’t care if some political situation has caused fewer tests, or suddenly more tests. It doesn’t care about an idiot chief executive downplaying the danger and thus encouraging its spread (at least among his cult following), then abruptly deciding to join the adults in the room.9

The model simply predicts what will happen if the data continues as it has recently, especially as it has in the past few days.

That’s it. The interpretation and explanation is up to you.

The Nightmare Plot

Returning to the model and its neat little world of reported cases, here is a plot from a simulation I ran this evening, whose results I summarized in the bullet points above. It should make you listen very carefully to what you are being told by medical experts about social distancing, washing your hands, not touching your face, and staying the fuck home.

Now, this is one really important plot. It shows up way too small in this blog post for you to be able to see its important details. So please click or tap on it to open it as an image by itself.

Reported U.S. Covid-19 cases vs days since Jan. 22, 2020

You can also click here to see the plot with data from yesterday, 3/18. Open them in two tabs of your browser and then switch between to see how the model is holding up.

The upper subplot shows the best-fit logistic growth model in red, with the actual number of cumulative reported cases of Covid-19 in blue. The error between the model and the data is shown with each annotation. Look how small the residuals are compared to the exponentially rising numbers of cases. It’s a scarily impressive fit, even if the model has proved a bit pessimistic thus far.

The lower subplot shows the number of cases expected to be reported over time, slightly in the past and then extrapolating to the future. Fifty generations of running a differential evolution algorithm10 resulted in a 120-member population of combinations of parameters for the model. I deliberately terminated the algorithm sooner than I would otherwise so that there would be some visible variation in the extrapolations. The black dots show expected reported-case values with parameters from each member of the population, plotted at a bunch of random times from 3/12 to early April.
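
To give a feel for what "running a differential evolution algorithm" means, here is a minimal sketch in plain Python. It is not the ade package and does not reproduce its asynchronous scheme; it is a textbook DE/rand/1/bin loop that I am assuming is a fair stand-in. The data are synthetic, generated from made-up "true" parameters, and the bounds, F, and CR values are arbitrary choices.

```python
import math
import random

def model(t, L, k, t0, a):
    """Logistic growth plus a linear term, as in f(t) above."""
    return L / (1.0 + math.exp(-k * (t - t0))) + a * t

def sse(params, days, cases):
    """Sum of squared errors between the model and the data."""
    return sum((model(t, *params) - y) ** 2 for t, y in zip(days, cases))

def evolve(days, cases, bounds, pop_size=120, generations=50, F=0.7, CR=0.8, seed=1):
    """Plain DE/rand/1/bin; returns the final population sorted by SSE."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [sse(p, days, cases) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than the target
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(len(bounds))  # at least one gene from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < CR:
                    gene = pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                    gene = min(max(gene, lo), hi)  # clip to the search bounds
                else:
                    gene = pop[i][d]
                trial.append(gene)
            trial_score = sse(trial, days, cases)
            if trial_score <= scores[i]:  # greedy replacement
                pop[i], scores[i] = trial, trial_score
    return sorted(zip(scores, pop))

days = list(range(40))
truth = (50000.0, 0.25, 45.0, 1.0)          # made-up "true" parameters
cases = [model(t, *truth) for t in days]    # synthetic, noise-free data
bounds = [(1e3, 1e6), (0.05, 0.5), (20.0, 80.0), (0.0, 5.0)]
final = evolve(days, cases, bounds)
best_sse, best_params = final[0]
```

Each entry of `final` is an (SSE, parameters) pair, so the spread of any one parameter across the population can be inspected directly. Stopping after a modest number of generations leaves that spread visible, which is the effect described above.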

Significantly, the subplots both have a logarithmic y-axis. Exponential growth is linear when viewed through a logarithmic lens. When you see that straight line marching steadily upward toward those massive numbers, you really want all your modeling to wind up an embarrassing public failure.
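
The straight-line claim is easy to check numerically. A small sketch, assuming a pure exponential with a made-up daily growth rate (not a fitted value):

```python
import math

k = 0.15                     # hypothetical daily growth rate, not a fitted value
cases = [100.0 * math.exp(k * day) for day in range(10)]

# On a log axis the day-to-day step is constant: log10(e**k) = k / ln(10)
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]
doubling_time = math.log(2) / k   # days for the count to double at this rate
```

Every entry of `log_steps` is the same number, which is exactly what a straight line on a logarithmic axis means.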

Covering my Posterior

A better way to model this might have been to use a Monte Carlo analysis (e.g., with the Metropolis-Hastings algorithm) to obtain posterior probability distributions for the parameters, and then run a bunch of extrapolations based on parameters drawn from the distributions. But I had the tools handy for using ADE instead; I’ve been wrapping up a year-long project modeling power semiconductor devices using it with the free Ngspice simulation software. So this is what I have to offer, and it seems plenty illuminating to me.
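
For comparison, here is roughly what the Metropolis-Hastings alternative looks like, in a deliberately tiny sketch. It samples a single growth-rate parameter for pure exponential growth rather than the four-parameter model, and everything in it (the synthetic data, the noise level, the flat prior, the proposal width) is an assumption chosen for illustration.

```python
import math
import random

rng = random.Random(0)
k_true, y0, sigma = 0.25, 100.0, 0.1     # made-up values for synthetic data
days = list(range(20))
# Synthetic log-counts: exponential growth with Gaussian noise on the log scale
log_cases = [math.log(y0) + k_true * t + rng.gauss(0.0, sigma) for t in days]

def log_posterior(k):
    """Gaussian log-likelihood with a flat prior on 0 < k < 1."""
    if not 0.0 < k < 1.0:
        return -math.inf
    return -sum((ly - math.log(y0) - k * t) ** 2
                for t, ly in zip(days, log_cases)) / (2 * sigma ** 2)

k, samples = 0.1, []
current = log_posterior(k)
for step in range(6000):
    proposal = k + rng.gauss(0.0, 0.005)          # symmetric random-walk proposal
    candidate = log_posterior(proposal)
    # Accept with probability min(1, posterior ratio)
    if rng.random() < math.exp(min(0.0, candidate - current)):
        k, current = proposal, candidate
    if step >= 1000:                              # discard burn-in
        samples.append(k)

posterior_mean = sum(samples) / len(samples)
```

The list `samples` approximates a posterior distribution for k, and extrapolations drawn from such samples are what would give a real probability band instead of a population scatter.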

But even without having posterior distributions to draw random variates from, what I am seeing in the scatter plots of value vs SSE for parameter L is not reassuring. That parameter represents the total number of cases expected to ever be reported. And the data we have, with its steady logarithmic-scale march upward, is not satisfying my computer evolution algorithm that there is any upper limit before the nation’s entire population is infected.

SSE vs value: Parameter L (3/18 data)

Simply put, this thing is currently showing no signs of slowing down anytime soon. It is very possible, even likely, that these values of L are due more to genetic drift than any optimality-of-fit of the modeling they represent.11

A word of explanation of this scatter plot: The red dots hugging the left side of the plot are values (y-axis) of L in the final population of parameter combinations, plotted against the sum of squared error (x-axis) that those combinations had vs the data. The distribution of values seems to indicate that we shouldn’t hope for less than several million U.S. cases, and that we can’t count on any upper limit before the virus runs out of hosts to infect.12

There is a fair amount of correlation between the model parameter t0 and two other parameters, k and L. The parameter k represents how drastic the exponential behavior is; higher values cause things to blow up faster and thus start to reach limits sooner. Thus the highest values of k in the final population are associated with somewhat lower values of t0. The time when the number of new daily cases reaches its maximum happens a few days earlier.

Regarding the correlation between parameters t0 and L, a positive-valued one this time, it simply makes sense to realize that increasing new cases longer before you finally start to slow down the increases is associated with having more people ultimately infected.
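
The role of t0 as the day when new daily cases peak can be checked numerically. A minimal sketch with made-up parameter values (not the fitted ones):

```python
import math

def model(t, L, k, t0, a):
    """Logistic growth plus a linear term."""
    return L / (1.0 + math.exp(-k * (t - t0))) + a * t

L, k, t0, a = 1.0e6, 0.25, 75.0, 2.0   # hypothetical values, not fitted ones
# New cases per day, measured over the day centered on each integer t
daily_new = {t: model(t + 0.5, L, k, t0, a) - model(t - 0.5, L, k, t0, a)
             for t in range(40, 111)}
peak_day = max(daily_new, key=daily_new.get)
peak_slope = L * k / 4 + a   # analytic slope of f(t) at t = t0
```

The peak lands on t0, and the peak daily count is close to L*k/4 + a, so a larger L with the same k means a higher peak as well as a higher final total.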

Reasons Why Things Might Not Be So Bad

I want to emphasize that there is also the distinct possibility of L coming down by a lot within the next couple of days. (Unfortunately, I thought it would do that a couple days ago already, but it’s done the opposite.) It could still happen for a couple of reasons I can think of:

  • A curtailing effect becoming apparent soon from containment measures that just aren’t being noticed quite yet due to the incubation period.

  • A sudden recent increase in the number of reported cases due to testing finally being available. The rate of tested vs actual may be increasing, not just the absolute number of people testing positive. This would mean that the model is currently getting fitted to an overly dire set of parameters (especially L) due more to recent dramatic increases in reported cases from better testing than exponential spread of the virus.

And there are probably many more reasons I haven’t even imagined why that curve might start bending down sooner than in this simulation. Again, I need to emphasize my lack of biological or medical expertise. And this leads to . . .

The Disclaimer

First, I disclaim everything that Johns Hopkins does when offering the data on which this analysis is based.13 I’m pretty sure their lawyers had good reason for putting that stuff in there, so I’m going to repeat it. Except think “Ed Suominen” when you are reading “The Johns Hopkins University”, and this blog post when you read “the Website.”

This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.

Second, I know very little about biology, beyond a layman’s fascination with it and the way everything evolved. (Including this virus!) I do have some experience with modeling, including using my ADE Python package to develop some really cool power semiconductor simulation software that I’ll be releasing in a month or so from when I’m doing the GitHub commit with this COVID-19 example. The software (also to be free and open-source!) has a sophisticated subcircuit model for power MOSFETs that evolves 40+ parameters (an unfathomably huge search space). It uses the same principle–differential evolution of nonlinear model parameters–as this unfortunate example we find ourselves in.

The model I’m using for the number of reported cases of COVID-19 follows the logistic growth model, with a small (and not terribly significant) linear term added. It has just 4 parameters, and finding the best combination of those parameters is no problem at all for ADE.

Remember, I am not an expert in any of the actual realms of medicine, biology, etc. that we rely on for telling us what’s going on with this virus. I just know how to fit models to data, in this case a model that is well understood to apply to biological populations.

Don’t even think of relying on this analysis or the results of it for any substantive action. If you really find it that important, then investigate for yourself the math, programming, and the theory behind my use of the math for this situation. Run the code, play with it, critique it, consider how well the model does or does not apply. Consider whether the limiting part of the curve might occur more drastically or sooner, thus making this not as big a deal. Listen to experts and the very real reasoning they may have for their own projections about how bad this could get.

It’s on you. I neither can nor will take any responsibility for what you do. I will say this, though: If you haven’t been sitting at home for a week straight already, wash your hands a lot and don’t itch that nose unless you really have to and you just got done with one of those hand washings. It’s a hot zone out there already. You don’t need my fancy modeling to see that.

Finally, if this is getting you down, please think of all the people who were living and loving and looking up at the blue sky even during the fall of Rome and the Black Death. We have a front-row seat on history being made. Yes, it is a worldwide biological cataclysm not seen since the days of polio, smallpox, and the Spanish Flu.

Yes, this really sucks. But you are alive, and there is so much left to see. A world in crisis can sometimes be an exhilarating world to live in, like a sharp fresh breeze tickling your face on a clear winter’s day. Your grandparents saw cold bracing days like these, and were called the Greatest Generation for the way they responded.

To anyone in despair: Leaving the show early would be a sad waste of the seat that was reserved for you. Stick around. Do what you can to make your life a little better, and the lives of those who love you and whom you love. Allow your worries and fears and sadness to seep into the gentle awareness that an entire world now worries with you.

And there is a bit of good news to share, though it may be cold comfort for my fellow citizens in the U.S.

South Korea is fully in its containment phase, well past its t0 that took place over two weeks ago. They followed the logistic growth model all the way to the containment phase. Look at the two curves and annotated +/- numbers in the upper subplot! The lower subplot zooms in on a narrow range of case numbers around 8,000, where it is unlikely to increase much further.

Reported Covid-19 cases in South Korea vs days since Jan. 22, 2020

Italy’s numbers should start leveling off significantly in the next week. They reached t0 yesterday, according to my best fit of the logistic+linear model with this evening’s data. They appear to be headed for around 70,000-80,000 cases, or about 0.1% of their population. Even that doesn’t sound too bad.

Reported Covid-19 cases in Italy vs days since Jan. 22, 2020

Be well. And stay home.

Notes


  1. Ng Yi Kai Aaron pointed out an article referencing a paper (Ziff, Anna L. and Ziff, Robert M., “Fractal kinetics of COVID-19 pandemic,” preprint available online) suggesting that the data from China’s experience with the virus “are very well fit by assuming a power-law behavior with an exponent somewhat greater than two.” 

  2. See the important section entitled Disclaimer

  3. See, e.g., https://services.math.duke.edu/education/ccp/materials/diffeq/logistic/logi1.html

  4. All these significant figures are only used for comparison purposes. It is of course silly to put more than a couple of significant digits on extrapolations this uncertain. 

  5. You may think that’s progress, but I consider it disappointing (as a human being with a pair of lungs, not as a data modeler) that the data is tracking the model’s exponential growth phase so closely, and that t0 seems to remain far in the future.

  6. This projection remains unchanged from the one done with data from yesterday (3/18). 

  7. With yesterday’s data, I thought we would reach the five million mark a day earlier, 4/10. 

  8. Trevor Bedford, for example, a scientist at the Fred Hutchinson Cancer Research Center in Seattle “studying viruses, evolution, and immunity,” has mentioned a 10:1 true vs reported cases ratio. https://twitter.com/trvrb/status/1238643292197150720?s=20.

    “I could easily be off 2-fold in either direction,” he Tweeted on March 13, when there were just over 2,000 cases being reported in the U.S., “but my best guess is that we’re currently in the 10,000 to 40,000 range nationally.” 

  9. Those who follow me on Facebook know how much contempt I have for the incompetent, malicious, destructive asshole who found enough bigots and morons in a key combination of states to make it past the Electoral College. No, I will not mince words. If you still support Donald Trump– knowing that he dismantled the office that Obama had set up to address pandemics, that he fired people with expertise to deal with this, that he downplayed and denied the reality of the problem until just days ago–then I think there is something deeply wrong with you.

    In my previous post, I asked, “Do many of his supporters even realize how much they’ve been played?” I quoted the self-confessed narcissist Sam Vaknin, who wrote that “the narcissist abuses people. He misleads them into believing that they mean something to him, that they are special and dear to him, and that he cares about them. When they discover that it was all a sham and a charade, they are devastated” (Malignant Self-love: Narcissism Revisited, Narcissus Publications, 2015, p. 69.)

    So far the deranged narcissist’s base of support has proven remarkably resilient to plain facts about how much of a sham it really is. I hope that changes very soon. 

  10. Using my free, open-source Python package ade, Asynchronous Differential Evolution. 

  11. Genetic drift is an evolutionary phenomenon where a population “drifts” certain bits of its genetic code toward what appears to be an optimal range when in reality it is just the survivors propagating a consensus that has no actual selection value. I’ve seen it happen with my computer evolution of simulation model parameters just like it happens in nature.

    The final population of L with 3/19 data ranges from around 20,000,000 to more than the population of the U.S., where the logistic model would obviously run into a stark limitation. Not different enough to show an updated plot. 

  12. This scatter plot doesn’t show a real probability distribution, as a Monte Carlo analysis would. But it does seem instructive, to represent a confidence interval of sorts. I’m guessing that it is no narrower than a posterior distribution obtained from a random walk with well-informed priors. On this question, however, my modeling knowledge reaches its current limits. 

  13. The GitHub repo is at https://github.com/CSSEGISandData/COVID-19