
Sunday, July 19, 2020

Balancing Harms

Why our teenagers won’t be on their high school campus this fall.

Since early March, I’ve been watching the relentless spread of SARS-COV-2 throughout the United States with morbid fascination and a grim determination not to expose myself or my family to this novel virus that looked plenty dangerous even without its full effects being understood. Now that we are learning more about the long-term suffering and damage it can do with the COVID-19 disease it causes–in younger people as well as those my age and older–this determination has not wavered.1 There is nothing more important in my life right now than protecting my family and my own body from this virus.

We’ve been holed up on our rural property with only occasional trips to get curbside pickup or have open-air driveway visits with friends. During these long months of quarantine, I’ve combined my technical background in signal processing and programming with a long-standing interest in math and data modeling to get myself a uniquely clear view into the situation with the COVID-19 pandemic.2 Yes, that’s a bold claim, but there is a lot of work and I think some pretty informative results to back it up.

Modeling the Spread of COVID-19

This work was to develop a nonlinear mathematical model for the number of reported cases of the disease that fits remarkably closely to what we’ve seen for the past eleven weeks in the U.S. as the curve flattened and then started heading upward again.3 For my own personal interests, I’ve applied the model to reported case data from my own Washington State and the most populous county out here in Eastern Washington where I live. For some context, I’ve also used it to consider how badly things are going in neighboring Idaho, and to stare at the dumpster fires raging in states like Arizona, California, Florida, and Texas.

This plot shows what’s happened during May, June, and now half of July in the U.S. overall, and how well the model fits with the cases reported for each of the past 79 days. The data is generously provided to the public by the New York Times, based on reports from state and local health agencies.4 I hope you will understand from these plots why I dared to see myself as having a uniquely clear view of our pandemic-documentary version of American Horror Story.

U.S. Reported COVID-19 cases vs days elapsed since 1/22

The upper subplot shows the number of reported cases for the entire U.S. since the first of May. There are two curves plotting cumulative case numbers on a logarithmic scale vs the number of days that have elapsed since the first case was reported in this country back on January 22.

The blue curve shows what was actually reported, according to the New York Times data. The red one shows what my model expects was reported for each day in the past, looking backwards from the most recent date (to which it is fixed as an anchor point) with its ten model parameters fitted to the data. The largest error between the data and the model’s fit to it is just 2.2%, way back on May 11 when there were just over a third as many cases as now.5

Under the Hood

You can skip this section if you don’t love math. Your loss.

The skinny subplot below the big one shows the error between the model’s expectations and reality.6 What you want to see in such a residual plot is a relatively even distribution of modeling error vs the amount being modeled. This one looks about as good as you could ask for, especially when you consider the significance of normality p of 0.38.

That means that the “leftovers” from modeling the past data are not much different from what you would get from normally-distributed random noise.7 Because it’s impossible to model noise, you can have some confidence that the model is accounting for most everything but randomness when it is nearly as probable as not that your residuals would look this random if it were indeed just noise causing the error.

It may be more instructive to consider the opposite case, where there is a low p value for the non-normality statistic. Say, 0.02 instead of the actual 0.38. That low p value would indicate that re-running 50 experiments (obviously not possible with a natural experiment like the one we are running with the worst pandemic in a century) would get you residuals that distinct from normal random error only once on average. That would be a pretty good indication that your model isn’t accounting for some noteworthy phenomenon.8 But that’s definitely not what’s happening here with the fit of this model to reported cases of COVID-19 in the United States, at least not as of July 16.
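
For readers who want to see what those “leftovers” are made of, here is a minimal sketch that follows the description in note 6: take the square root of both the modeled and the actual new-case numbers, subtract, then square and sum. The function names, the sign convention (model minus actual), and the sample numbers are all made up for illustration; the real computation lives in covid19.py and may differ in detail.

```python
import math

def residuals(actual_new, modeled_new):
    """Residuals between square-root-transformed daily new-case counts.

    Sign convention (model minus actual) is an assumption for this sketch.
    """
    return [math.sqrt(m) - math.sqrt(a) for a, m in zip(actual_new, modeled_new)]

def sse(actual_new, modeled_new):
    """Sum of squared error (SSE) over the transformed residuals."""
    return sum(e * e for e in residuals(actual_new, modeled_new))

# Hypothetical numbers, chosen as perfect squares so the arithmetic is easy to check.
actual = [100, 400, 900]
modeled = [121, 361, 900]
print(residuals(actual, modeled))  # [1.0, -1.0, 0.0]
print(sse(actual, modeled))        # 2.0
```

Without the transform, a miss of a few hundred cases on a late, high-count day would swamp everything that happened in May. With it, early and late days pull on the fit more evenly.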

So my “logistic growth with multiple growth regimes model” is accounting for what we see in the data. It is a naive curve-fitting model that does not assume anything beyond the following:

  1. The number of reported cases of COVID-19 in the U.S. is following a logistic growth model with L (the ultimate upper limit) fixed at 1/4 of the U.S. population,9

  2. but with three separate growth regimes (3 parameters) having smooth transitions between them (4 parameters),

  3. and with a sinusoidal component that imposes a weekly variation (2 parameters) on the current growth rate for each day,

  4. plus, finally, with a fixed number of new cases per day (1 parameter), to allow the model to only account for reported cases on or after May 1.
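
The four assumptions above can be sketched in a few lines of Python. Treat this strictly as an illustration, not as the code in covid19.py: the tanh-shaped hand-off between growth rates, the exact form of the weekly term, the simple Euler integration, and every name and number below (`growth_rate`, `simulate`, `tw`, the starting count, and so on) are assumptions of this sketch.

```python
import math

def growth_rate(t, rates, transitions):
    """Blend successive growth rates with smooth tanh steps (assumed shape).

    rates: [r1, r2, ...]; transitions: [(t1, s1), ...] giving the midpoint
    and width, in days, of each hand-off between consecutive rates.
    """
    r = rates[0]
    for (t_mid, width), r_next, r_prev in zip(transitions, rates[1:], rates):
        r += 0.5 * (r_next - r_prev) * (1.0 + math.tanh((t - t_mid) / width))
    return r

def simulate(days, x0, L, rates, transitions, aw, tw, b, steps_per_day=20):
    """Euler-integrate dx/dt = r(t) * w(t) * x * (1 - x/L) + b, sampled daily.

    w(t) is a 7-day sinusoid with amplitude aw and phase shift tw;
    b is the fixed number of new cases per day.
    """
    dt = 1.0 / steps_per_day
    x, t, out = float(x0), 0.0, [float(x0)]
    for _ in range(days):
        for _ in range(steps_per_day):
            weekly = 1.0 + aw * math.sin(2.0 * math.pi * (t - tw) / 7.0)
            r = growth_rate(t, rates, transitions)
            x += dt * (r * weekly * x * (1.0 - x / L) + b)
            t += dt
        out.append(x)
    return out

# Illustrative values only, not the fitted parameters.
rates = [0.05, 0.01, 0.019]              # three growth regimes
transitions = [(20.0, 5.0), (60.0, 5.0)]  # (midpoint day, width) of each hand-off
curve = simulate(days=79, x0=1_000_000, L=82_000_000, rates=rates,
                 transitions=transitions, aw=0.11, tw=0.0, b=1000.0)
```

Counting the knobs gives three rates, two midpoints, two widths, two weekly terms, and one constant, which is the same ten as in the list above.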

The best-fit curve has an artificially high initial growth rate r1 of 4.8966 (nearly 500% per day!), which the differential evolution algorithm arrives at because it isn’t actually looking at numbers before May 1. It just wants to fit the data between May 1 and now as closely as possible, and it found the way to do that was to jack up the growth rate for all those unseen days. It’s doing its job, and that’s fine; we don’t care about that earlier growth rate for this analysis, just what is happening now.

Following the model forward,10 we soon transition to a relatively sedate growth rate r2 of 1%. The transition occurs over a 32-day window (i.e., the time it takes for half the smooth transition between growth rates) defined by s1 that is centered on Feb. 6 (16 days after the first U.S. case), defined by t1. Then the red-state governors reopen things followed by people living it up on Memorial Day weekend. And we wind up with a 5/30 midpoint between the second growth regime where the curve had been flattening nicely and our current scarier one r3 where SARS-COV-2 reminds us who it is with a 1.9% increase in cases per day.

Fortune Telling with Function Fitting

This model of mine is a naive empirical one, using a cool evolutionary algorithm to fit a curve to data. It’s a very elegant curve, constructed from a first-order differential equation with multiple growth rates and smooth transitions and weekly variation, though that doesn’t make it a basis for much extrapolation.

But it certainly does match up to what’s been happening. An error between modeled and actual values of 1% or less going back four weeks, and then 2.1% or less for another eight weeks–forever in pandemic time. That’s 79 data points fitted with just ten free parameters. As with a shoe, if the model fits, wear it.

It fits well enough that I will try out some extrapolation anyhow, despite having just acknowledged the limitations of the model for prophetical purposes. The following plot shows what the model is projecting for U.S. reported cases two weeks into the future.

U.S. Reported COVID-19 cases vs days elapsed since 1/22

The upper subplot of this plot shows how many reported U.S. cases the model projects under the assumption that the data will continue to reflect a 1.9% daily growth rate, with an 11% weekly variation imposed by reporting limitations over the weekends.11 That is of course not an entirely safe assumption to make, no matter how closely the model fits to past data, and I have mostly limited my plots to just a couple weeks of extrapolation.12

The bottom two subplots show the new cases being reported each day, first as a percentage of the cumulative cases already reported as of that day and then (bottom subplot) as a simple number of new daily cases.

We have gotten jaded to the horror of this pandemic over the past several months, but take another look at that number in the upper right. It’s a big one: nearly five million people testing positive in August. And it’s increasing fast.

In the middle plot, we are seeing one fiftieth more being infected with each passing day, with weekly variation due to testing and people being off on weekends. And on the bottom plot, sixty thousand plus or minus new cases each and every day, all carrying the risk of a person losing their health and vigor for weeks if not months, in some possibly for a lifetime. Occasionally it’s a lifetime that the virus cuts short.

Yet people are complaining about wearing masks when they go in a store.

Dr. Anthony Fauci expressed the belief (or perhaps just hope) that the number of daily reported cases would never reach 100,000.13 I fully accept that Dr. Fauci has a wealth of knowledge and insight that is not reflected in my naive curve-fitting to the time-series data. But from what I’m seeing on that bottom red curve, it’s hard for me to see how we can avoid that grim milestone.

To do so, we would need a significantly lower daily growth rate during the coming weeks. It would have to go down enough to cause the value of parameter r3 to decrease, perhaps enough to justify yet another growth regime in the model and an additional three parameters. There is of course no way a naive model builder can know that in advance; this is nonlinear curve fitting with a modest and limited amount of extrapolation, not prophecy.

Track Record

This essay will not dwell on the model’s track record; I’ve done plenty of that in previous blog posts. I’ll just offer a couple of observations from back-testing the model, along with a long footnote about some reddit critics, and leave it at that.14

With data from the day before yesterday (7/16), the model projected that today’s New York Times cumulative cases number would have been 3,738,827, an increase of 149,338 over the two days. It was 3,719,110, an actual increase of 129,617. The model was pessimistic by 15%, or 7% off per day. It’s a naive curve-fitting model, and does not inform us whether this is because the weekly variation is increasing or the growth rate is settling back down, or there was just quite a bit of random variation in one direction.

With data from seven days ago (7/11), the model projected that today’s New York Times cumulative cases number would have been 3,703,746, an increase of 443,181 over the week. Again, it was 3,719,110 today. That error is too small to be worth calculating. The projection was essentially perfect one week out.
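
For anyone who does want to calculate it, the two back-tests above reduce to a few subtractions. This sketch uses only the figures quoted in the text; the function and key names are invented for illustration, and the baseline is backed out from each projection rather than read from the data.

```python
def backtest(projected_total, projected_increase, actual_total):
    """Compare a projected increase with the increase that actually occurred."""
    # Cumulative cases on the day the projection was made.
    baseline = projected_total - projected_increase
    actual_increase = actual_total - baseline
    return {
        "actual_increase": actual_increase,
        # Relative error of the projected increase (positive = pessimistic).
        "increase_error": projected_increase / actual_increase - 1.0,
        # Relative error of the projected cumulative total.
        "total_error": projected_total / actual_total - 1.0,
    }

two_day = backtest(3_738_827, 149_338, 3_719_110)
one_week = backtest(3_703_746, 443_181, 3_719_110)
print(f"two days out: {two_day['increase_error']:+.1%} on the increase")
print(f"one week out: {one_week['increase_error']:+.1%} on the increase, "
      f"{one_week['total_error']:+.1%} on the total")
```

The two-day projection overshoots the increase by about 15%, as stated. The one-week projection undershoots the increase by roughly 3% and the cumulative total by about 0.4%.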

State and Local

My own state of Washington was doing pretty well with this virus up until early June. Now things aren’t looking great at all, with not just the number of daily cases increasing each day, but also the rate of growth in daily cases. Here is what I’ve been calling the “Nightmare Plot” with the full set of information about the model’s fit to Washington’s reported case data along with a couple weeks of extrapolation into the end of July.

Washington State cases vs days elapsed since 1/22

I don’t actually expect that the long-term growth rate for reported COVID-19 cases in Washington State will settle at an absurd 33% per day, despite the model’s best fit assigning that value (0.32583) to the parameter r2. Something’s gotta give long before that happens, because no society can sustain having its population infected with a deadly virus at a reported rate that increases each day by a third of the total number of cases reported thus far.

The reason the curve fit can get away with such a high estimation of r2 is that it paired it with a very large value of t1, 271.65. That corresponds to an interim point of 272 days after January 22 that lies midway between the initial growth rate (nearly zero) and that crazy high final growth rate of nearly 33%. That’s October 20. I don’t believe things will continue increasing in Washington state until then, and neither should you. But it is useful to know that the best fit for the model parameters right now is one that projects a lot more cases, and a continued increase in how fast those cases are coming, for months ahead. That’s what would happen if the virus were allowed to progress as it has.
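
Turning a model time like t1 into a calendar date is simple date arithmetic. A small sketch, assuming day zero is January 22, 2020 as on the plots and that fractional days are rounded to the nearest whole day (the rounding and the function name are choices made for this example):

```python
from datetime import date, timedelta

FIRST_CASE = date(2020, 1, 22)  # day zero for the U.S. and Washington State plots

def day_to_date(days_elapsed):
    """Convert a model time (days since 1/22, possibly fractional) to a calendar date."""
    return FIRST_CASE + timedelta(days=round(days_elapsed))

print(day_to_date(271.65))  # 2020-10-20
```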

It can’t, of course. We won’t allow it to, whatever our politics or petty objections to wearing something on our face to protect other people. You can see why that growth rate will have to go back down–one way or the other–by humoring me with an extrapolation of the Washington State model through the end of August, when school is scheduled to start. This plot shows the percentage of Washington’s population that the model expects will have tested positive for COVID-19 on each day.

Percentage testing positive in Washington State vs days since 1/22

According to the model, cranking away on the data in its ignorance about anything people might do in response to the situation, or about whatever limitations there are on how many tests can be processed in a given day or week, we would be seeing 2% of the people in the entire state testing positive on August 25. That’s right around the time the kids would be heading back to campus.15

Assuming the continued accuracy of CDC Director Robert Redfield’s June 25 assessment that there are ten times as many actual cases as reported ones, one in every five citizens of Washington would have actually contracted COVID-19. Long before that, though, you would see masks everywhere even in the rural red eastern half of the state and Karen would finally shut the fuck up already. It’s actually started happening out here now, after months of opposition and denial.

The curve will flatten again, inevitably. This is the one bit of armchair epidemiology I will dare to offer, if you don’t also count my assigning 1/4 of the region’s population to model parameter L. The rest of my work is just looking carefully at data with a highly refined non-linear model that has reflected that data really well thus far and is pretty good at looking into the future a few days, perhaps even a week or two.16

Here’s the Nightmare Plot for Spokane County, by far the most populous one in Eastern Washington:

Spokane County (Eastern WA) cases vs days elapsed since 2/25

In this instance, we have the extreme periodic behavior of zero cases being reported each of the past two Saturdays (including today). But the model isn’t fooled; it mostly accounts for the periodicity by evolving its parameter aw to an unusually high value of 83% (0.83329). The residuals are fine, as normally distributed as would be expected from random Gaussian errors nearly half the time. And thus it can project with some confidence there will be well over a hundred newly reported cases this coming Tuesday and also on Wednesday. I am pretty confident that we (I live in a surrounding county, but Spokane County looms large) will be seeing four thousand reported cases (in a county of just over half a million) by early August.17

Finally, before going into the heavy parental decision that all this data was in service of, I will offer a Nightmare Plot for neighboring Kootenai County, Idaho. I expect the number of reported cases there to double in the next two weeks.

Kootenai County (North Idaho) cases vs days elapsed since 3/21

Hell No, They Won’t Go

I’ll repeat that my only relevant expertise is in applying nonlinear mathematical models to time-series data.18 But, alongside my wife, I am still entrusted with a decision that will affect some young people’s lives. Using this limited area of expertise along with a very comprehensive collection of data, we have decided that our high school kids will not be attending campus in person this fall, regardless of what precautions the school administrators take or what requirements they have. We just won’t do it.

How Many is Too Much?

The metric I have been using to assess how serious things are is 1% of the local or national population testing positive for COVID-19. Again relying on the view of CDC Director Robert Redfield (whose agency is now being shunned by the deranged racist narcissist in the White House) that there are ten times as many actual cases as reported,19 this equates to one tenth of the population that has thus far been infected.20

This 1% threshold has already been reached nationally, around July 12. My modeling of Washington State, Spokane County (WA), and Kootenai County (ID) for the weeks ahead makes me believe that it will also be reached locally by the time my kids would be asked to return to campus this fall. Using Dr. Redfield’s 10:1 actual vs reported estimate, every tenth person will have been infected in the region around my kids’ high school.

Think about that number for a moment. One out of every ten residents of the most populous county in Eastern Washington–or indeed the entire state–will have had that virus growing inside their bodies. Imagine one finger on your two hands held up in front of you being a random person in your community who is or has been a host for SARS-COV-2.21

How Many Contagious People Around?

The thought of 10% of the national or local population having contracted COVID-19 is pretty scary, but how many of them will actually pose a danger to my kids and my wife and me? This is a tough question to ask, and the weakest link in my analysis. My wife and I had an important decision to make, and we’re pretty much on our own in the failed state that America has become, and all I have to work with is the reported cases plus whatever assumption I put on top of them.

Let’s first consider the time window of when people can spread the virus to others. In an article published in March and updated July 2, The Harvard Medical School says the

time from exposure to symptom onset (known as the incubation period) is thought to be three to 14 days, though symptoms typically appear within four or five days after exposure.

We know that a person with COVID-19 may be contagious 48 to 72 hours before starting to experience symptoms. Emerging research suggests that people may actually be most likely to spread the virus to others during the 48 hours before they start to experience symptoms.

The article provides little guidance about when infectiousness might end. That most symptomatic carriers “will no longer be contagious by 10 days after symptoms resolve” is the best I can find, and it leaves me wondering if this also applies to people whose symptoms last for weeks.

Assembling all these vague numbers together, I wind up with the assumption of a fuzzy, ill-defined “contagion window” extending out to ten days after exposure. I have no idea what the shape of that distribution would look like. Are there lots of people who you still wouldn’t want to be around more than ten days after they were exposed, or just a few? But to keep things simple, to limit the number of people testing positive who I will consider infectious, and to not be quite so alarmist, I’ll assume that the window extends from the exposure date (realistically, it’s probably the day after exposure at the earliest) to ten days after.

So when did all the thousands of people who tested positive on a given day actually get exposed to the virus and start that (highly uncertain) ten-day contagion window?

Unfortunately, the delay from when a person gets exposed until their exposure results in a reported case is variable, long, and may be getting longer as backlogs of tests pile up. In a scathing April 4 critique of the very kind of analysis I’ve attempted to do–by a person who knows a thing or two about modeling data–Nate Silver said he assumes

that there’s a delay of 15 days . . . between infection and the test results showing up in the data though if anything I suspect this is too generous, given the huge testing bottlenecks in places such as California.

I’ll go with his 15 day estimate. In doing so, I am mindful of his warning that the “number of reported COVID-19 cases is not a very useful indicator of anything unless you also know something about how tests are being conducted.” But I will go ahead and make the terribly simplistic yet perhaps still useful assumption that (1) the exposure resulting in a given day’s new daily cases occurred 15 days before that day, and (2) the window in which all those infected people were infectious to others was from 5-15 days before they became a reported case.

This means that you have to look forward 5-15 days along the red projected-cases curve to see how many people around you are infectious on any given day. What the projection does is to give you an idea as to how many of those as-yet uncounted people are out there being contagious right now.

In the U.S. overall, my model is projecting that we will go from yesterday’s 3.7 million reported cases (none of whom are still in that 5-15 day window) to around 4.8 million 15 days from now.22 That’s 1.1 million additional people testing positive, with most of them in that contagion window right now. I’m further assuming, with Dr. Redfield, that there are ten times as many people actually infected on each given day as what the reported case data shows, which means there are maybe 8-10 million Americans you really don’t want to be around at this time.

Nationally, with our current growth rate and my shaky assumptions, it appears that there are now three infectious people for every reported case. That’s a lot of virus walking around.
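
The look-ahead logic can be written down directly. This is a sketch of the stated assumptions only: the projection used here is a plain 1.9%-per-day exponential standing in for the model’s red curve (which also has weekly variation), and the function name and defaults are invented for the example.

```python
def infectious_on(day, cumulative, undercount=10, window=(5, 15)):
    """Estimate how many people are infectious on `day`.

    cumulative[d] is the projected cumulative reported cases on day d.
    People who will be reported between day+5 and day+15 are assumed to be
    inside their contagion window on `day`; `undercount` scales reported
    cases up to actual infections (the 10:1 estimate).
    """
    soonest, latest = window
    upcoming = cumulative[day + latest] - cumulative[day + soonest]
    return upcoming * undercount

# Stand-in projection: 3.7 million cases today, growing 1.9% per day.
projection = [3.7e6 * 1.019 ** d for d in range(16)]
estimate = infectious_on(0, projection)
print(f"{estimate / 1e6:.1f} million")  # 8.4 million
```

Counting only the cases due to be reported 5 to 15 days out lands at the low end of the 8-10 million range above. Counting the full 15-day increase instead gives the 11 million or so behind the three-per-reported-case figure.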

How about for Washington State? Simply multiplying by three the 2.2% figure I dared to extrapolate above for Washington on 8/27 would result in around 7% of the population being infectious then. That’s a lot. I’m skeptical of it, too.23 So let’s (irresponsibly) project out the actual number of reported cases and then do the math like we just did for the U.S.

Ignorant of the likelihood that the curve will flatten between now and then, the model projects that 1% of Washington’s population will be testing positive on August 7, which is comparable to the situation now in the U.S. overall.24 See the “Percentage testing positive” plot in the section above. Re-doing the plot with cumulative case numbers rather than percentages looks like this:

“No-flattening” long-term projection for reported cases in Washington State

To do a SWAG25 for the percentage of Washington that is infectious on 8/7 with its projected 78,000 reported cases or so, let’s look forward 15 days to when the model has a business-as-usual projection of around 133,000 reported cases. That’s 55,000 additional cases. If most of those new cases appearing on 8/22 are infectious on 8/7, and if the 10x multiplier for actual vs reported cases holds true, that’s perhaps nearly half a million Washingtonians capable of giving us COVID-19.

So, when it projects that 1% of my state’s citizens will be testing positive, assuming the virus is still on the rampage, my model tells me that around 6% of the state population will be capable of infecting me if I get too close. As I said before, I doubt if it will be growing as fast then as the model currently projects, but even half that would be too much.26

And of course I could not resist doing the same irresponsible extrapolation for Spokane County:

“No-flattening” long-term projection for reported cases in Spokane County

The model projects around 2,200 new reported cases from 8/7 to 8/22. Ten times that would be around 4% of the county population actually infected, with most of them infectious to others.27
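
The state and county percentages come from the same back-of-envelope formula. In this sketch the populations are assumptions: 7.8 million for Washington is simply what “78,000 cases is 1%” implies, and 520,000 for Spokane County is a stand-in for “just over half a million.” The result is an upper bound, since it treats all of the newly reported cases as infectious rather than most of them.

```python
def share_infectious(new_reported_15d, population, undercount=10):
    """Upper-bound share of the population that is infectious.

    new_reported_15d: cases the model expects to be reported over the next 15 days.
    undercount: assumed ratio of actual to reported cases (the 10:1 estimate).
    """
    return new_reported_15d * undercount / population

wa = share_infectious(55_000, 7_800_000)    # population implied by 78,000 cases = 1%
spokane = share_infectious(2_200, 520_000)  # 520,000 is an assumed round figure
print(f"{wa:.1%}  {spokane:.1%}")  # 7.1%  4.2%
```

Trimming the Washington figure for the “most of them” caveat brings it down toward the roughly 6% quoted above.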

What’s So Bad About 3%?

Imagine that there is indeed a 3% probability of any randomly selected person from the population around you being able to infect you with COVID-19 by coughing, talking, or even just breathing.

If you (or your school kid) encounter just 30 people randomly chosen from the population in a given day, you have only a 40% probability of avoiding any encounters with an infectious COVID-19 carrier. If you mingle in society and encounter a different set of people each day, your probability of avoiding proximity to a COVID-19 carrier goes down dramatically each day. After a week, it’s practically impossible to remain free of any such encounters.
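
The arithmetic behind that 40% is a single power, assuming each encounter is an independent draw from a population in which 3% are infectious (the function name is made up for this sketch):

```python
def p_no_encounter(p_infectious, people_per_day, days=1):
    """Probability that none of the people encountered is infectious.

    Assumes every encounter is an independent random draw and that a
    different set of people is met each day.
    """
    return (1.0 - p_infectious) ** (people_per_day * days)

print(round(p_no_encounter(0.03, 30), 3))          # 0.401
print(round(p_no_encounter(0.03, 30, days=7), 4))  # 0.0017
```

One day of 30 encounters leaves about a 40% chance of meeting no carrier at all. Seven days of that leaves less than a fifth of one percent.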

Hopefully those encounters are brief and separated by at least six feet (more space is better), with masks on everybody unless outdoors. That doesn’t sound like any sort of high school experience my kids would have this fall, whether attending under reasonable conditions or staying home. No dances, hugs, high fives, being the class clown or acerbic wit, band, etc.

So they will just stay home regardless and wait for our national folly to play itself out. Hopefully before or at least by the time we reach 5% of Americans reported as infected and probably half actually, we will have some herd immunity going and Spring Quarter will look safe.28

If it reopens, each of the students attending our kids’ high school will have a terrifyingly high probability of being exposed at some point through their day to someone infected by COVID-19. This presents an individual and family risk that my wife and I will not ask our kids to bear.

The Ethical Dimension

There is an ethical dimension to this as well. Admittedly, it is a small and secondary part of my considerations because, like you, I consider my health and that of my own family of paramount importance.

But here it is: Do I want to participate in an activity whose existence poses a serious health risk to a teacher, janitor, teacher’s assistant, or administrative employee? A person already underpaid and unappreciated who probably feels compelled to enter that building full of people whose skills in risk assessment and decision-making will not fully develop for a few years yet?

That person may be decades older than the students talking loudly in their classrooms29 or the kids whose bathrooms they are tasked with cleaning. And they may have little choice but to put their bodies at risk for simple economic survival, including, ironically, being able to keep the very health insurance they rely on to keep from going bankrupt if they get sick. They don’t have the luxury my kids have (though they hate it) of sitting home with both parents there.

It is an upside of having older parents that perhaps balances the increased risk we have, but worthless if we don’t make use of it. So we will continue staying home and staying vigilant, and thus deny this virus one small set of hosts to travel with.

Ironically, the parents that this affects most may be teachers with kids in school themselves.30 People in other professions and trades have to figure something out for their kids to be home for two months every summer here in most places in the U.S. But teachers have long enjoyed the perk of summers off, and so haven’t needed to make child care plans for the summers. When they go back to work, their kids go back to school. Well, maybe not this time, if they are in a reopening school district and decide not to subject their kids (and thus, indirectly, themselves) to the risk of infection.

The Decision: A Balance of Harms

Our decision will affect our kids’ future. There are serious consequences to them either way. On one side, this avoids subjecting them to an unacceptably high risk of catching a virus whose likelihood of causing months of illness and disability, long-term damage, and even death is only now being appreciated.31 Or (especially regarding the possibility of death) of passing it on to the two people they love most on earth.

On the other side is the sober realization that these teenagers of ours are going to miss out on many of the activities and flirtations and friendship intimacies of the years we remember from when we were them.32 There’s just no substitute for that experience in “remote learning,” whose name it seems to me refers as much to the likelihood of learning occurring as to the physical distance.

This decision is a balance of harms. It was difficult to make, but allowing our kids back onto their high school campus this fall imposes an individual and family risk on them that they cannot be asked to bear.

There are three times as many people who have been infected across the U.S. as there were at the beginning of May. Now the spread of the virus is growing twice as fast as it was then. Yet the general public and its elected officials (far more so in the GOP and its Dear Leader but to some extent true of both parties) have been acting like re-opening everything is simply inevitable no matter how many refrigerated morgue trucks a hospital needs to have stationed outside, no matter how many millions of people young and old wind up with permanent damage to their lungs, kidneys, hearts, and even their brains.33 No matter how many weeks of suffering many millions have to endure, each successive day growing with the fear and dread of being one of the unlucky long-term sick. Nope, we have to get things re-opened again, no matter what.

Again, I recognize that my wife and I are privileged in not having to leave the house every day for work. But the fact that in-person reopenings of various types are pretty much economic necessities in our dog-eat-dog unfettered capitalism does not make me want to participate in them if I can help it. That’s mostly from a desire for preservation of self and family, but to a small extent to not participate in the mass delusion that everything will be OK.

Notes


  1. This essay isn’t intended to be a collection of scary links, but it’s worthwhile to consider the view of an ICU doctor last week that his patients have gone from having an average age of around 65 to “between 25 to 35, 45 years old” (“Miami Hospital ICU Doctor: New Influx Of Patients Is Younger Than Before,” NPR, July 13).

  2. The technical details of the modeling are covered later in this essay. I had spent much of my Python coding time in the past two years working on a free, open-source software project that models power MOSFET devices using Python and a freely available underlying simulation tool called Ngspice. Then a deadly pandemic happened, my family went into a no-nonsense quarantine for months, and frankly it became a bit difficult to concentrate on something as removed from practical life as that. Instead, I took the same nonlinear modeling tools I developed for the simulation project and applied them to time-series data on reported COVID-19 cases. This essay with its plots is the culmination of that effort. 

  3. The model is implemented in an example file covid19.py that is part of my ADE Python package. It’s free software; with the right software skills, you can install it and try it out for yourself. 

  4. Earlier I was using data from The Johns Hopkins University, but have switched to the New York Times data both for the detail it offers as well as its simple and open licensing terms, which are co-extensive with the Creative Commons Attribution-NonCommercial 4.0 International license, plus this:

    “In general, we are making this data publicly available for broad, noncommercial public use including by medical and public health researchers, policymakers, analysts and local news media.”

    “If you use this data, you must attribute it to ‘The New York Times’ in any publication. If you would like a more expanded description of the data, you could say ‘Data from The New York Times, based on reports from state and local health agencies.’” 

  5. Some of the states and counties I’ve looked at fit well enough to their datasets without all ten parameters used in the national model, and thus have simpler models. Washington State, Spokane County, and Kootenai County have only two growth regimes, seven parameters instead of ten. The AICc metric was used to determine what parameters “earned their keep” and remained in the model. They only do if they result in a lower (better) AICc score. 

  6. A square-root transform is applied to both the modeled and actual new-case numbers to arrive at the residuals. Then the sum of squared error (SSE) is computed by squaring each residual value and adding them up. The purpose of the transform is to lessen the disproportionate impact of later, higher numbers on the curve fit. 

  7. The “normal” or Gaussian probability distribution describes many events in nature, from the noise you hear on a radio to the variation in people’s heights. It is what you eventually wind up with when you look at enough related phenomena together, even if the underlying probability distributions of each one are not normal. 

  8. For some reason, a p of 0.05 is the standard for many statistical tests. So, on that basis, the residuals are well within the acceptable range of normality. 

  9. This value of L is the largest number of people in the state or county population expected to be reported as infected. According to Wikipedia, the lowest estimate for COVID-19 herd immunity of actual (not just reported) cases is 50%.

    But there will always be more actual cases than reported. As mentioned in the essay, on June 25 CDC director Robert Redfield indicated there are 10 times as many actual cases as reported. That would reduce a 50% herd-immunity value of L to 0.05 times population. But the 10x ratio could change as the number of cases increases and testing could increase to the point where nearly every American has been tested.

    To be conservative about the modeling and hopefully avoid inserting too many shaky assumptions into it, I’ve fixed L at 0.25 times the region’s population. This limits the increase in reported cases (regardless of the growth regime) as the number approaches 1/4 of the state or county’s total population.

    It actually doesn’t matter much. At this point in the pandemic, L has almost no impact yet; all of the regions I’ve looked at still have reported case numbers that are single-digit percentages of their populations. 

  10. The curve-fitting algorithm actually follows the model backward from the latest day’s data, to which it fixes its cumulative case result. It does this by backwards-integrating a first-order differential equation xd(t,x) whose gory details are shown in the upper left part of the first plot. Applied mathematics in all its glory, and it works.

  11. The exact parameters are shown in the lower right of the first set of subplots, with the third growth regime’s daily rate r3 equal to 0.18863 and the weekly variation aw equal to 0.11354. 

  12. Extrapolating a nonlinear model is not something to be done lightly even if the underlying phenomena are well understood and predictable. In the case of America’s COVID-19 pandemic, there are a couple of known unknowns (to paraphrase Donald Rumsfeld) worth mentioning in addition to all the unknown unknowns.

    The curve could flatten again if the states whose citizens and governors have not been taking this virus seriously finally start to do so, once enough of the people around them get sick and they see a lot of those people staying sick for a long time, or even dying. A second, less comforting possibility is that we may soon see the effects of Trump’s stated desire not to have the reported numbers increase quite so fast (“Slow the testing down, please!”), and of his latest move to keep data from reaching the CDC. Doesn’t it seem that a federal agency named “The Centers for Disease Control and Prevention” should be informed about just how many cases there currently are of a pandemic affecting millions of Americans?

  13. Norah O’Donnell: “Dr. Fauci, do you still think that we could reach 100,000 infections a day?” Fauci: “You know, Norah, I don’t think we will. I hope not. It is conceivable that if we don’t get good control over the current outbreak and we keep spreading into other regions of the country, we could reach 100,000” (CBS News, “Fauci says he doesn’t like ‘to be pitted against the president’ after multiple attacks from the White House,” July 16).

  14. Posting about this modeling work of mine on the r/Coronavirus subreddit has resulted in a lot of page views, which has been appreciated since the goal of the writer is after all to be read, and this seems like something important to share. It also results in a few critical comments each time, some of which have actually been helpful in guiding me toward more statistically rigorous modeling. I am finally reminded of the public health official in Argentina who took me to task back in April about not providing any statistical tests about the model’s residuals. It was a valid criticism which I accepted and responded to, and the official gracefully acknowledged frustration and fears about the virus in their country.

    Some of the critical comments have just been exasperating snark from people who consider themselves far above something so naive as fitting data to a curve and daring to claim that the curve means something when it fits very well. A recent interloper gazed down from his high horse to proclaim that I am trying to model sampling error with my weekly-variation component. I tried unsuccessfully to point out that my effort here is to model reported cases, not to imagine some expertise about an unknown number of actual cases of infection out there, hidden in the unreported masses. If the time series of case reports has a periodicity to it, as it obviously does for the U.S. overall and also more than half the states and counties I’ve examined, then that is what I’m modeling more faithfully by adding a periodic component. I predict the reported cases will increase by not much more tomorrow–perhaps even less–than they did today, because the periodic component to the rate at which cases are reported reaches the trough of its wave tomorrow. That says nothing about how cases are “actually” increasing, nor do I ask it to.

    The same inquisitor muttered something about basis functions and suggested I look into learning about the Fourier series. (I’m an electrical engineer with expertise and a dozen patents related to signal processing; I’ve heard of the Fourier series, thanks.) How silly of me to think that there is anything special about fitting a ten-parameter combination of any functions to within 2.1% over 70+ data points. Why, he (we can all guess it was probably a “he,” can’t we?) could pull off that same trick with just about anything. I suggested it was absurd for him to claim that ten free parameters of a Fourier series (five terms with a real and imaginary component apiece, though I’d forgotten about the DC component) would accomplish the same thing, and he made some assertion that he managed it with seven, presumably seven components. No plots or code or anything was provided, of course. And even if he could, he would find the innate periodicity of the Fourier series bringing his case numbers right back to the simple linear increase of the zero-frequency DC component, back and forth in an absurd roller coaster totally unrelated to the problem at hand as his modeled cases come in faster and then slower and then faster again.

    I’ll post this on reddit like my other essays about this model. But this time I think I’ll pass on any further schooling from the boy geniuses there who remember a few textbook cases and fancy terminology but offer no relevant work of their own for comparison. 

  15. Update 7/23: If I kept updating these blog posts with each new day’s data, I’d never stop. But given how sensitive such a long extrapolation is to changes in recent data, I do feel compelled to offer this link to a plot updated with 7/22 data that shows 1.3% of the state testing positive on August 25 instead of 2%. There is just a hint of flattening showing up in the curve. 

  16. For the Washington State data, the model has been a bit less accurate than for the U.S. as a whole. With 7/16 data, it projected 48,001 reported cases today when there were actually 47,563, an error of 34% (16% per day) in the projected vs actual increase from 46,268.

    With data from seven days ago, it projected there would be 46,950 reported cases today. Again, there were 47,563. That’s an actual increase of 6,237, compared to a projected increase of 5,624. The error was around 11% for the week, or around 1.5% per day. 

  17. Admittedly, the model starts to deviate quite a bit from what happened when it looks back more than three weeks. It goes from a worst-case modeled vs actual error of 6.9% on 7/11 to 12.8% on 6/28, 14.5% a week earlier, and 21.9% a week before that. There was an obviously artificial glitch in test results being released around 6/27-6/28, which the model will only approach as it smooths its errors over the whole interval of interest.

    Update 7/23: A Nightmare Plot for Spokane County updated with 7/22 data is here. The model projects 4,000 reported cases on August 4. 

  18. I hope this modeling work gives people some insights about the situation we face here in the U.S. But please note this critical disclaimer: First, I disclaim everything that the New York Times does. I’m pretty sure their lawyers had good reason for putting that stuff in there, so I’m going to repeat it. Except think “Ed Suominen” when you are reading “The New York Times”:

    “The New York Times has made every effort to ensure the accuracy of the information. However, the database may contain typographic errors or inaccuracies and may not be complete or current at any given time. Licensees further agree to assume all liability for any claims that may arise from or relate in any way to their use of the database and to hold The New York Times Company harmless from any such claims.”

    Second, I know very little about biology, beyond a layman’s fascination with it and the way everything evolved. (Including this virus!) I do have some experience with modeling, including using the ADE package of which this COVID-19 modeling is a demonstration file to develop some really cool power semiconductor simulation software that I’ll be releasing in a month or so from when I’m doing the GitHub commit with this COVID-19 example. The software (also to be free and open-source!) has a sophisticated subcircuit model for power MOSFETs that evolves 40+ parameters (an unfathomably huge search space). And it does so with this very package whose example you are now reading.

    So–yes, this is still a disclaimer–I am not an expert in any of the actual realms of medicine, biology, etc. that we rely on for telling us what’s going on with this virus. I just know how to fit models to data, in this case a model that is well understood to apply to biological populations.

    Don’t even think of relying on this analysis or the results of it for any substantive action. If you really find it that important, then investigate for yourself the math, programming, and the theory behind my use of the math for this situation. Run the code, play with it, critique it, consider how well the model does or does not apply. Consider whether the limiting part of the curve might occur more drastically or sooner, thus making this not as big a deal. Listen to experts and the very real reasoning they may have for their own projections about how bad this could get.

    Seriously, experts know stuff. That’s what they do. One of them I recommend paying attention to is Dr. Osterholm at the University of Minnesota’s Center for Infectious Disease Research and Policy (CIDRAP). His interview in the July 1 podcast episode Viral Gravity is sobering but informative and, as we’ve seen in the nearly three weeks since, has been quite accurate about how serious the situation is. 

  19. Washington Post, “CDC chief says coronavirus cases may be 10 times higher than reported,” June 25

  20. There’s nothing magic about my 1% threshold for reported cases, i.e., 10% of the population actually infected. It is just easy to picture 1/10 of a population. That’s one of your digits from both hands. The Romans understood the power of that concept; “decimation” referred to the Roman Army’s practice of brutally killing every tenth man in a legion to terrorize and punish insubordination. 

  21. I have just the finger in mind when it comes to the “leadership” of our current president in this crisis, but that’s another blog post entirely. 

  22. Update 7/23: This projection is unchanged. Here is the Nightmare Plot for the U.S. overall, updated with data through 7/22. 

  23. Update 7/23: With today’s updated data and the 1.3% figure it now projects for 8/27, the 3:1 multiple would yield 4.2% of the population being infectious. Still more than the 3% figure discussed in the main text. 

  24. Update 7/23: Now projected as 0.92%. Close enough. 

  25. Acronym for “scientific wild-ass guess,” an old term of endearment used by engineers who know full well how many shaky assumptions are involved with many a technical endeavor. 

  26. Update 7/23: The projection updated with 7/22 data now looks like this: around 70,000 cases on 8/7, and around 94,000 on 8/22. The increase is projected to be 24,000 rather than 55,000. So, with all the other assumptions kept the same, we would be looking at 3.1% of Washingtonians capable of infecting you on 8/22. Not as much, but the discussion of 3% still applies. 

  27. Update 7/23: With 7/22 data, the “no-flattening” long-term projection for Spokane County is for around 2,000 new reported cases, just a bit less than previously projected. 

  28. There will hopefully be a vaccine early in 2021 but when and for whom? At the rate things are going now, I wouldn’t be surprised if we get to 50% infected before one becomes available for everyone in my family. 

  29. “[L]oud speech can emit thousands of oral fluid droplets per second . . . there is a substantial probability that normal speaking causes airborne virus transmission in confined environments.” Stadnytskyi, Bax, et al., “The airborne lifetime of small speech droplets and their potential importance in SARS-CoV-2 transmission,” Proceedings of the National Academy of Sciences Jun 2020

  30. Rebecca Martinson, a public school teacher in Washington State, explains her decision to not return to class, no matter what, in this thoughtful essay, “Please Don’t Make Me Risk Getting Covid-19 to Teach Your Child,” New York Times, July 18.

    She writes, “My school district and school haven’t ruled out asking us [to] return to in-person teaching in the fall. As careful and proactive as the administration has been when it comes to exploring plans to return to the classroom, nothing I have heard reassures me that I can safely teach in person.” After listing off all the sacrifices teachers have had to make for their students and careers, she says that it “isn’t fair to ask me to be part of a massive, unnecessary science experiment. I am not a human research subject. I will not do it.”

  31. The danger of this virus to young people has been downplayed. It is frustrating how little statistical data seems to be available on this, but there is a significant minority of even younger people who suffer for months with the aftermath of a COVID-19 infection. A figure as high as 5% has been tossed around, but with little statistical support. That is understandable, given that 2/3 of the Americans reported as infected are less than six weeks past their positive test result. The majority of “long haulers” have yet to emerge.

  32. Admittedly, we never were quite like them. Unlike these poor kids, we never had to suffer the downers of a deranged narcissist bigot wannabe authoritarian as president, a deadly pandemic, and an emerging economic depression all at once. 

  33. “COVID-19: Severe brain damage possible even with mild symptoms,” Deutsche Welle
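Notes 5 and 6 above describe two pieces of bookkeeping: a sum of squared error computed on square-root-transformed values, and an AICc score that decides whether extra parameters earn their keep. Here is a minimal sketch of both. It is not code from covid19.py or the ADE package; the function names are hypothetical, the numbers are made up, and the AICc formula is the common textbook least-squares form, which may differ in detail from what the model code actually computes.

```python
import math

def sqrt_sse(modeled, actual):
    """SSE of residuals taken after square-rooting both series (see note 6)."""
    return sum((math.sqrt(m) - math.sqrt(x)) ** 2 for m, x in zip(modeled, actual))

def aicc(sse, n, k):
    """Textbook AICc for a least-squares fit: n data points, k parameters.

    Assumption: AIC = n*ln(SSE/n) + 2k, plus the small-sample correction.
    """
    aic = n * math.log(sse / n) + 2 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)

# The transform evens out early and late errors: a miss of 21 cases at ~100
# and a miss of 201 cases at ~10,000 both give a residual of 1.
print(sqrt_sse([121.0], [100.0]), sqrt_sse([10201.0], [10000.0]))

# Made-up SSE values: three extra parameters that barely improve the fit
# over 79 days of data produce a higher (worse) AICc, so they would be dropped.
print("7 parameters: ", aicc(100.0, 79, 7))
print("10 parameters:", aicc(98.0, 79, 10))
```

The lower AICc wins, so in this invented comparison the seven-parameter model would be kept.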

Thursday, March 19, 2020

Applying the Logistic Growth Model to Covid-19

The chief task in life is simply this: to identify and separate matters so that I can say clearly to myself which are externals not under my control, and which have to do with the choices I actually control.
—Epictetus, Discourses.
Dad, you’re just some guy who knows how to obsess over numbers. We have actual people who are experts at this stuff. Go and write it if you want, but don’t feel like you have to!
—Daughter of Ed Suominen, March 2020.
TL;DR: A very good fit between data obtained on March 19 from Johns Hopkins University and a logistic+linear growth model indicates that there will be over 50,000 reported cases of Covid-19 in the United States on March 25, over 300,000 cases one week after that (4/1), and several million cases by early April. See the Nightmare Plot and the Disclaimer below.
Update, March 20: There was a significant uptick in U.S. cases today, bringing us to a total of 13,677 according to Johns Hopkins data provided this evening. The increase is more than was expected this time, and the jump is significant enough that I would rather not publish results with the logistic growth model fitted to the latest data. Doing so results in projections that are considerably higher than what the rest of this blog post discusses. I will wait for tomorrow’s data and then perhaps consider a modification to the model if we get unexpectedly high numbers again, one possibility being to change the linear term to a power-law one of the form a*t^b. That might reflect the effect of better testing without forcing an artificially high value for the exponential model parameter k.1
———

On March 15, I wrote to some friends on Facebook about the latest results of putting my computer evolution code and skills to the task of finding parameters for the logistic growth model as applied to the number of U.S. reported cases of Covid-19. My belief–speaking as someone with expertise in fitting nonlinear models to data but not any kind of expert in the fields of biology, medicine, or infectious disease–was that we would reach 10,000 cases on March 17, and would have reported cases numbering in the hundreds of thousands by March 29.2 By the end of April, I believed there would likely be millions of Americans being reported as having this virus. I thought the rate of growth would be unlikely to even start slowing down before April.

The prediction for the one date that has come and gone was not quite accurate. On March 17, there were 6,421 cases reported in the U.S., a little less than two-thirds of what the model said was most likely. But, in my defense, I ask you to look back at yourself enjoying a Sunday in mid-March. Would you have been truly untroubled just two days ago by hearing that six thousand of your fellow Americans would have a deadly respiratory infection that puts a fifth of its hosts into hospitalization? The model was pessimistic but not ridiculous.

Two days earlier, on March 13, I had introduced the project to my Facebook friends, prefacing the discussion with the acknowledgment that I’m not a biologist, or a doctor, or an infectious disease expert of any kind. Just a retired engineer and inventor who knows how to write Python code and has been working on modeling and simulation for over a year now.

After seeing predictions ranging from dismissive to hysterical about the Coronavirus, I saw a useful if sobering application example of a tool that I’d written specifically for my electronic simulation work, ADE. This new example would apply the fairly well-known “logistic growth model”3 to what has now bloomed into a pandemic.

Writing and running covid19.py forced me to some stark conclusions: In one week (3/20), I said, we would be likely to have over 20,000 U.S. cases. A week later (3/27), around 10 times that. “By the first few days of April we could very plausibly hit the one million mark. There will certainly be nobody saying this is just like the flu by then.”

The most I was–and am–willing to guarantee about those predictions, however, is that the red line in the plot I included with the post is a nearly optimal fit of the function f(t) = L/(1+exp(-k*(t-t0))) + a*t to the number of cases versus time provided by Johns Hopkins University, including an update made that evening.
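For anyone who wants to poke at that function directly, here is a minimal sketch of it in plain Python. The function name and the parameter values are made up for illustration; they are not the fitted values behind the plot, and this is not the code in covid19.py.

```python
import math

def logistic_plus_linear(t, L, k, t0, a):
    """f(t) = L/(1+exp(-k*(t-t0))) + a*t

    L:  upper limit of the logistic part (total cases ever reported)
    k:  growth-rate parameter
    t0: time of the logistic midpoint, where new daily cases peak
    a:  slope of the small linear term
    """
    return L / (1.0 + math.exp(-k * (t - t0))) + a * t

# Illustrative values only (hypothetical, not fitted to any data)
L, k, t0, a = 1.0e6, 0.3, 75.0, 10.0
for day in range(50, 101, 5):
    print(f"day {day:3d}: {logistic_plus_linear(day, L, k, t0, a):12,.0f} cases")
```

At t = t0 the logistic part is exactly half of L, which is why t0 marks the turning point of the curve.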

So how did that prediction fare? With data updated Thursday evening (3/19), it now appears that there will be around 13,500 reported cases on March 20. Again the model was pessimistic: that is about 68% as many cases as it projected on March 13. But, again, even the lower number is a huge number of people getting very sick all of a sudden. Were you expecting anything like that just five days ago?

If not, don’t feel bad. There was an excellent reason why you might have been surprised then at what is now clearly plausible to anyone looking at the plot below: Your President was telling you that it was no big deal.

With the data available on March 13, the logistic+linear growth model predicted there would be around 200,000 U.S. cases on March 27. That is a little more than double what the model’s current best fit says is most likely. Again, the model was and still may be pessimistic, predicting too many cases. But again, even 90,000 or so infected Americans–with probably 20,000 of them very sick and at least a thousand of those dying and thousands left with permanent lung damage–is a very big deal. And the virus will still just be getting started.

Yesterday, March 18, I released the first version of this blog post with the projection that there would be nearly 15,000 cases tomorrow (3/20). (Actually 14,538.) Once again, the model was a bit pessimistic; the current projection is for 13,549 cases, or 93% as much.4 And the longer-term projections are slightly lower, which is moving in the direction we all want, though only by a little bit.5

———

So, on March 19, here is what the admittedly pessimistic logistic+linear growth model now says, based on Johns Hopkins data updated this evening. The numbers are all in reported U.S. cases:

  • The day after tomorrow (3/21), there will be nearly 18,000 cases.

  • In one week (3/26), there will be more than 60,000 cases.

  • In two weeks (4/2), there will be over 400,000 cases.

  • We will reach the million mark between April 4 and 6.6

  • On April 11, there will be five million cases.7

  • The U.S. outbreak won’t even begin slowing down until mid-April at the earliest. In other words, there will be increasing numbers of new cases until probably around April 24 when there are finally fewer new cases one day than there were the day before.

  • The number of Americans reported as infected by the novel Coronavirus will ultimately reach several tens of millions.

This is some scary shit. And it may be even worse than it looks right now. What the data show, and what the model is fitted to, is the number of reported cases; several days ago, some experts in the Seattle area were saying that the number of true cases in Washington State was several times the number being tested and reported.8 Isn’t it reasonable to expect that to remain largely true? Our medical system will almost certainly become overloaded and the focus will simply turn to saving those lives that can be saved, as it has already in Italy.

But all this is just me talking, not the model. It makes no assumptions or judgments about the data. It doesn’t care if some political situation has caused fewer tests, or suddenly more tests. It doesn’t care about an idiot chief executive downplaying the danger and thus encouraging its spread (at least among his cult following), then abruptly deciding to join the adults in the room.9

The model simply predicts what will happen if the data continues as it has recently, especially as it has in the past few days.

That’s it. The interpretation and explanation are up to you.

The Nightmare Plot

Returning to the model and its neat little world of reported cases, here is a plot from a simulation I ran this evening, whose results I summarized in the bullet points above. It should make you listen very carefully to what you are being told by medical experts about social distancing, washing your hands, not touching your face, and staying the fuck home.

Now, this is one really important plot. It shows up way too small in this blog post for you to be able to see its important details. So please click or tap on it to open it as an image by itself.

Reported U.S. Covid-19 cases vs days since Jan. 22, 2020

You can also click here to see the plot with data from yesterday, 3/18. Open them in two tabs of your browser and then switch between them to see how the model is holding up.

The upper subplot shows the best-fit logistic growth model in red, with the actual number of cumulative reported cases of Covid-19 in blue. The error between the model and the data is shown with each annotation. Look how small the residuals are compared to the exponentially rising numbers of cases. It’s a scarily impressive fit, even if the model has proved a bit pessimistic thus far.

The lower subplot shows the number of cases expected to be reported over time, slightly in the past and then extrapolating to the future. Fifty generations of running a differential evolution algorithm10 resulted in a 120-member population of combinations of parameters for the model. I deliberately terminated the algorithm sooner than I would otherwise so that there would be some visible variation in the extrapolations. The black dots show expected reported-case values with parameters from each member of the population, plotted at a bunch of random times from 3/12 to early April.
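To give a feel for what “fifty generations” and a “120-member population” mean, here is a bare-bones sketch of classic differential evolution (the DE/rand/1/bin variant) fitting the logistic+linear function to synthetic data. This is not the ADE package or its API, and ADE may well differ in its details. The function names, the bounds, the F and CR settings, and the noise-free made-up “data” are all assumptions for illustration.

```python
import math
import random

def model(t, L, k, t0, a):
    """Logistic growth plus a linear term."""
    return L / (1.0 + math.exp(-k * (t - t0))) + a * t

def sse(params, days, cases):
    """Sum of squared error between the model and the data."""
    return sum((model(t, *params) - x) ** 2 for t, x in zip(days, cases))

def evolve(days, cases, bounds, pop_size=120, generations=50,
           F=0.7, CR=0.9, seed=1):
    """Classic DE/rand/1/bin. Returns (SSE, params) pairs sorted by SSE."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [sse(p, days, cases) for p in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members other than the target i
            r1, r2, r3 = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # always mutate at least one parameter
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < CR:
                    v = pop[r1][d] + F * (pop[r2][d] - pop[r3][d])
                    v = min(max(v, lo), hi)  # clip to the allowed range
                else:
                    v = pop[i][d]
                trial.append(v)
            s = sse(trial, days, cases)
            if s <= scores[i]:  # the challenger replaces the target only if no worse
                pop[i], scores[i] = trial, s
    return sorted(zip(scores, pop))

# Synthetic "reported cases" from made-up parameters, with no noise added
TRUE = (5.0e6, 0.3, 80.0, 5.0)
DAYS = list(range(40, 58))
CASES = [model(t, *TRUE) for t in DAYS]
# Hypothetical search bounds for L, k, t0, a
BOUNDS = [(1.0e4, 3.3e8), (0.05, 1.0), (50.0, 150.0), (0.0, 50.0)]

final = evolve(DAYS, CASES, BOUNDS)
best_sse, best = final[0]
print("best SSE:", best_sse)
print("best L, k, t0, a:", best)
print("range of L in final population:",
      min(p[0] for _, p in final), max(p[0] for _, p in final))
```

Stopping at fifty generations leaves the population spread out rather than collapsed onto one point, which is the same reason the black dots in the plot show visible scatter.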

Significantly, the subplots both have a logarithmic y-axis. Exponential growth is linear when viewed through a logarithmic lens. When you see that straight line marching steadily upward toward those massive numbers, you really want all your modeling to wind up an embarrassing public failure.
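A quick numerical check of that straight-line claim, using made-up numbers rather than anything from the actual data: if cases grow as A*exp(k*t), then log10 of the case count rises by the same amount over every equal time step.

```python
import math

A, k = 100.0, 0.3  # made-up starting count and daily growth rate
days = list(range(0, 31, 5))
cases = [A * math.exp(k * t) for t in days]
log_cases = [math.log10(c) for c in cases]

# On a log scale, every 5-day step adds the same amount: a straight line
steps = [hi - lo for lo, hi in zip(log_cases, log_cases[1:])]

for t, c, lg in zip(days, cases, log_cases):
    print(f"day {t:2d}: {c:12,.0f} cases, log10 = {lg:.3f}")
print("log10 step per 5 days:", [round(s, 6) for s in steps])
```

The constant step equals 5*k/ln(10), so the slope of the line on the log plot is a direct readout of the growth rate k.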

Covering my Posterior

A better way to model this might have been to use a Monte Carlo analysis (e.g., with the Metropolis-Hastings algorithm) to obtain posterior probability distributions for the parameters, and then run a bunch of extrapolations based on parameters drawn from the distributions. But I had the tools handy for using ADE instead; I’ve been wrapping up a year-long project modeling power semiconductor devices using it with the free Ngspice simulation software. So this is what I have to offer, and it seems plenty illuminating to me.

But even without having posterior distributions to draw random variates from, what I am seeing in the scatter plots of value vs SSE for parameter L is not reassuring. That parameter represents the total number of cases expected to ever be reported. And the data we have, with its steady logarithmic-scale march upward, is not satisfying my computer evolution algorithm that there is any upper limit before the nation’s entire population is infected.

SSE vs value: Parameter L (3/18 data)

Simply put, this thing is currently showing no signs of slowing down anytime soon. It is very possible, even likely, that these values of L are due more to genetic drift than any optimality-of-fit of the modeling they represent.11

A word of explanation of this scatter plot: The red dots hugging the left side of the plot are values (y-axis) of L in the final population of parameter combinations, plotted against the sum of squared error (x-axis) that those combinations had vs the data. The distribution of values seems to indicate that we shouldn’t hope for less than several million U.S. cases, and that we can’t count on any upper limit before the virus runs out of hosts to infect.12

There is a fair amount of correlation between the model parameter t0 and two other parameters, k and L. The parameter k represents how drastic the exponential behavior is; higher values cause things to blow up faster and thus start to reach limits sooner. Thus the highest values of k in the final population are associated with somewhat lower values of t0. The time when the number of new daily cases reaches its maximum happens a few days earlier.

Regarding the correlation between parameters t0 and L, a positive-valued one this time, it simply makes sense to realize that increasing new cases longer before you finally start to slow down the increases is associated with having more people ultimately infected.
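One way to see where the positive correlation between t0 and L comes from: well before t0, the logistic curve is approximately L*exp(-k*t0)*exp(k*t), so any two parameter sets with the same value of L*exp(-k*t0) look almost identical in the early data. The sketch below uses made-up values (not fitted ones) to compare two such sets, and also confirms that the daily rate of new cases peaks at t0.

```python
import math

def logistic(t, L, k, t0):
    """Pure logistic curve, without the small linear term."""
    return L / (1.0 + math.exp(-k * (t - t0)))

def daily_rate(t, L, k, t0):
    """Derivative of the logistic curve: new cases per day at time t."""
    s = logistic(t, 1.0, k, t0)
    return L * k * s * (1.0 - s)

k = 0.3
early = (1.0e6, k, 80.0)
# Same k, t0 five days later, and L scaled up so L*exp(-k*t0) is unchanged
late = (1.0e6 * math.exp(k * 5.0), k, 85.0)

print("day   earlier-t0   later-t0")
for t in range(40, 58, 4):
    print(f"{t:3d} {logistic(t, *early):12.1f} {logistic(t, *late):10.1f}")

print("ultimate cases:", round(early[0]), "vs", round(late[0]))

# New daily cases are highest at t0, where the curve reaches L/2
peak_day = max(range(40, 140), key=lambda t: daily_rate(t, *early))
print("new cases peak on day", peak_day)
```

The two curves agree to within a fraction of a percent over the early days, yet the one with the later t0 ends up with about 4.5 times as many total cases. The early data alone cannot tell them apart, which fits with L being so poorly pinned down in the scatter plot.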

Reasons Why Things Might Not Be So Bad

I want to emphasize that there is also the distinct possibility of L coming down by a lot within the next couple of days. (Unfortunately, I thought it would do that a couple days ago already, but it’s done the opposite.) It could still happen for a couple of reasons I can think of:

  • A curtailing effect becoming apparent soon from containment measures that just aren’t being noticed quite yet due to the incubation period.

  • A sudden recent increase in the number of reported cases due to testing finally being available. The rate of tested vs actual may be increasing, not just the absolute number of people testing positive. This would mean that the model is currently getting fitted to an overly dire set of parameters (especially L) due more to recent dramatic increases in reported cases from better testing than exponential spread of the virus.

And there are probably many more reasons I haven’t even imagined why that curve might start bending down sooner than in this simulation. Again, I need to emphasize my lack of biological or medical expertise. And this leads to . . .

The Disclaimer

First, I disclaim everything that Johns Hopkins does when offering the data on which this analysis is based.13 I’m pretty sure their lawyers had good reason for putting that stuff in there, so I’m going to repeat it. Except think “Ed Suominen” when you are reading “The Johns Hopkins University”, and this blog post when you read “the Website.”

This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.

Second, I know very little about biology, beyond a layman’s fascination with it and the way everything evolved. (Including this virus!) I do have some experience with modeling, including using my ADE Python package to develop some really cool power semiconductor simulation software that I’ll be releasing in a month or so from when I’m doing the GitHub commit with this COVID-19 example. The software (also to be free and open-source!) has a sophisticated subcircuit model for power MOSFETs that evolves 40+ parameters (an unfathomably huge search space). It uses the same principle–differential evolution of nonlinear model parameters–as this unfortunate example we find ourselves in.

The model I’m using for the number of reported cases of COVID-19 follows the logistic growth model, with a small (and not terribly significant) linear term added. It has just 4 parameters, and finding the best combination of those parameters is no problem at all for ADE.

Remember, I am not an expert in any of the actual realms of medicine, biology, etc. that we rely on for telling us what’s going on with this virus. I just know how to fit models to data, in this case a model that is well understood to apply to biological populations.

Don’t even think of relying on this analysis or the results of it for any substantive action. If you really find it that important, then investigate for yourself the math, programming, and the theory behind my use of the math for this situation. Run the code, play with it, critique it, consider how well the model does or does not apply. Consider whether the limiting part of the curve might occur more drastically or sooner, thus making this not as big a deal. Listen to experts and the very real reasoning they may have for their own projections about how bad this could get.

It’s on you. I neither can nor will take any responsibility for what you do. I will say this, though: If you haven’t been sitting at home for a week straight already, wash your hands a lot and don’t scratch that nose unless you really have to and you just got done with one of those hand washings. It’s a hot zone out there already. You don’t need my fancy modeling to see that.

Finally, if this is getting you down, please think of all the people who were living and loving and looking up at the blue sky even during the fall of Rome and the Black Death. We have a front-row seat on history being made. Yes, it is a worldwide biological cataclysm not seen since the days of polio, smallpox, and the Spanish Flu.

Yes, this really sucks. But you are alive, and there is so much left to see. A world in crisis can sometimes be an exhilarating world to live in, like a sharp fresh breeze tickling your face on a clear winter’s day. Your grandparents saw cold bracing days like these, and were called the Greatest Generation for the way they responded.

To anyone in despair: Leaving the show early would be a sad waste of the seat that was reserved for you. Stick around. Do what you can to make your life a little better, and the lives of those who love you and whom you love. Allow your worries and fears and sadness to seep into the gentle awareness that an entire world now worries with you.

And there is a bit of good news to share, though it may be cold comfort for my fellow citizens in the U.S.

South Korea is fully in its containment phase, well past its t0 that took place over two weeks ago. They followed the logistic growth model all the way to the containment phase. Look at the two curves and annotated +/- numbers in the upper subplot! The lower subplot zooms in on a narrow range of case numbers around 8,000, where it is unlikely to increase much further.

Reported Covid-19 cases in South Korea vs days since Jan. 22, 2020

Italy’s numbers should start leveling off significantly in the next week. They reached t0 yesterday, according to my best fit of the logistic+linear model with this evening’s data. They appear to be headed for around 70,000-80,000 cases, or a bit more than 0.1% of their population of about 60 million. Even that doesn’t sound too bad.

Reported Covid-19 cases in Italy vs days since Jan. 22, 2020

Be well. And stay home.

Notes


  1. Ng Yi Kai Aaron pointed out an article referencing a paper (Ziff, Anna L. and Ziff, Robert M., “Fractal kinetics of COVID-19 pandemic,” preprint available online) suggesting that the data from China’s experience with the virus “are very well fit by assuming a power-law behavior with an exponent somewhat greater than two.” 

  2. See the important section entitled Disclaimer

  3. See, e.g., https://services.math.duke.edu/education/ccp/materials/diffeq/logistic/logi1.html

  4. All these significant figures are only used for comparison purposes. It is of course silly to put more than a couple of significant digits on extrapolations this uncertain. 

  5. You may think that’s progress, but I consider it disappointing (as a human being with a pair of lungs, not as a data modeler) that the data is tracking the model’s exponential growth phase so closely, and that t0 seems to remain far in the future. 

  6. This projection remains unchanged from the one done with data from yesterday (3/18). 

  7. With yesterday’s data, I thought we would reach the five million mark a day earlier, 4/10. 

  8. Trevor Bedford, for example, a scientist at the Fred Hutchinson Cancer Research Center in Seattle “studying viruses, evolution, and immunity,” has mentioned a 10:1 ratio of true to reported cases. https://twitter.com/trvrb/status/1238643292197150720?s=20

    “I could easily be off 2-fold in either direction,” he tweeted on March 13, when there were just over 2,000 cases being reported in the U.S., “but my best guess is that we’re currently in the 10,000 to 40,000 range nationally.” 

  9. Those who follow me on Facebook know how much contempt I have for the incompetent, malicious, destructive asshole who found enough bigots and morons in a key combination of states to make it past the Electoral College. No, I will not mince words. If you still support Donald Trump–knowing that he dismantled the office that Obama had set up to address pandemics, that he fired people with expertise to deal with this, that he downplayed and denied the reality of the problem until just days ago–then I think there is something deeply wrong with you.

    In my previous post, I asked, “Do many of his supporters even realize how much they’ve been played?” I quoted the self-confessed narcissist Sam Vaknin, who wrote that “the narcissist abuses people. He misleads them into believing that they mean something to him, that they are special and dear to him, and that he cares about them. When they discover that it was all a sham and a charade, they are devastated” (Malignant Self-love: Narcissism Revisited, Narcissus Publications, 2015, p. 69.)

    So far the deranged narcissist’s base of support has proven remarkably resilient to plain facts about how much of a sham it really is. I hope that changes very soon. 

  10. Using my free, open-source Python package ade, Asynchronous Differential Evolution. 

  11. Genetic drift is an evolutionary phenomenon where a population “drifts” certain bits of its genetic code toward what appears to be an optimal range when in reality it is just the survivors propagating a consensus that has no actual selection value. I’ve seen it happen with my computer evolution of simulation model parameters just like it happens in nature.

    The final population of L with 3/19 data ranges from around 20,000,000 to more than the population of the U.S., where the logistic model would obviously run into a stark limitation. Not different enough to show an updated plot. 

  12. This scatter plot doesn’t show a real probability distribution, as a Monte Carlo analysis would. But it does seem instructive, to represent a confidence interval of sorts. I’m guessing that it is no narrower than a posterior distribution obtained from a random walk with well-informed priors. On this question, however, my modeling knowledge reaches its current limits. 

  13. The GitHub repo is at https://github.com/CSSEGISandData/COVID-19

Friday, July 29, 2016

Galaxy Gazing

I think that the dying pray at the last not “please,” but “thank you,” as a guest thanks his host at the door. Falling from airplanes the people are crying thank you, thank you, all down the air; and the cold carriages draw up for them on the rocks. Divinity is not playful. The universe was not made in jest but in solemn incomprehensible earnest. By a power that is unfathomably secret, and holy, and fleet. There is nothing to be done about it, but ignore it, or see.
—Annie Dillard, Pilgrim at Tinker Creek1
The Milky Way from my driveway

Tonight, with clear weather and no moon around, I am up late to look at a dark sky with the first decent pair of binoculars I’ve ever owned. The vaguely textured white blur of the Milky Way that my eyes have long admired, unmagnified, now resolves through the 10x binoculars into clusters of countless stars with crisscrossing fuzzy ribbons of black woven in between.

I pan the circular field of view slowly along our galaxy’s long overhead arc, immersed in the depth I sense above me from my two eyes merging a single image. There’s a satisfying tangible connection between the fine motions of my arms and the slow sweeping past of this collection of a hundred billion stars in our little corner of the universe.

A dim smudge near Cassiopeia teases my eyes’ limits of sensitivity and resolution. I think it’s M52, an open cluster a few thousand light-years away. It was first identified by Charles Messier in 1774. The photons I’m collecting in my binoculars tonight from its 193 or so stars were more than 90% of the way here when Messier peered through his telescope. In the meantime, a nation rose through a rebellion and then quashed one of its own; enslaved, freed, and still long oppressed a large fraction of its citizens; conquered its native peoples and then rescued others from conquest in two world wars.

The smudges are clusters of countless stars.2

These photons had already emerged from their nuclear furnaces by the time some settlements along the river Tiber formed the first humble beginnings of the Roman empire.3 Their journey may even have been halfway underway by then; we’re not sure exactly how far away M52 is from us.4

It’s been a little more than two thousand years since a citizen of that empire, a gifted poet and philosopher, stood next to some pool or pond beneath the night sky. The skies anywhere in Europe were darker than they are now, even at my place out in the country. I imagine Titus Lucretius Carus (c. 99-55 B.C.) looking up at the blazing array of stars overhead, seeing their “images,” which, he muses, must “be able to run through space incalculable / In a moment of time.”5

The pinpoints and patterns of the stars are mirrored in the still water before him, “not turned round intact, but flung straight back / In reverse,” with the features thus shown “in reverse.”6 He moves slightly to one side along the water’s edge and notices how one particularly bright star near the horizon comes abruptly into view from behind a tree. Its direct image and its reflection both wink on instantly–at exactly the same time, as far as he can tell.7

A smooth surface of water is exposed

To a clear sky at night, at once the stars

And constellations of the firmament

Shining serene make answer in the water.

Yet he knows that the “images” raining down from the sky take a longer route when they make the extra trip to the water and back than when they go directly into his eye.

Now do you see how in an instant the image

Falls from the edge of heaven

to the edge of earth?

Wherefore again and yet again I say

How marvellously swift the motion is

Of the bodies which strike our eyes

and make us see.8

Those image-bearing bodies are “marvelously swift” indeed. They move 186,000 miles–more than 23 earth diameters–through the vacuum of space every second. Yet the immense vault of our universe is so incomprehensibly vast that it’s taken most of the span of human civilization for them to reach us, from a relatively nearby neighbor within just our own galaxy (there are at least a hundred billion others).9

My kind of nightlife

Silent and impassive to all the twitches and ripples in the microscopic biofilm of one ordinary planet, in the hundreds of years since Messier noticed this odd feature among the stars–in the thousands filled with death and wars and tears of joy and sorrow since Lucretius did his ancient poolside musings–the photons from its clustered stars continued their long journey outward. Only now do they finally land on my retinas to collapse wave functions and trigger individual rod-shaped cells to launch neurotransmitters down neighboring filaments of cell-strings along my optic nerves.

In my brain, a little smudge registers. Something’s really up there.

The stars in M52 will keep launching their photons all my life, as they have for 35 million years now. They’ll get lost in the sea of light that covers and warms the daylight half of earth, fall through clear skies over the other half in darkness, and remain ignored almost always, as the earth swings around its own little star a few dozen more times until my eyes no longer see anything at all.

And yet, despite my absence, the earth will stay in its orbit and the photons will stream on.

Notes


  1. Does it surprise you to see such ringing words of spirituality as the epigraph to an atheist’s essay? Such prose retains its profound beauty regardless of one’s disagreements with its message. And even with no God in the picture, I am still happy to call whatever was behind the Big Bang, or the quantum fluctuation that unleashed the Big Bang, or whatever was behind that, a “power that is unfathomably secret,” even holy, filling me with a sort of reverence as I gaze upwards at night. 

  2. There’s also some light pollution near the horizon, even out here, miles from the nearest city. I’ve tried to de-emphasize it with reduced yellow and green luminance. 

  3. en.wikipedia.org/wiki/Ancient_Rome 

  4. Because “this cluster is in the plane of the Milky Way,” our available “methods of determining distance are too uncertain,” some yielding estimates “as small as 3,000 light years, while others are as large as 7,000” (Ethan Siegel, “Messier Monday: A Star Cluster on the Bubble, M52,” ScienceBlogs).

  5. Lucretius, Book IV, line 191. From On the Nature of the Universe, Ronald Melville, trans. (Oxford University Press). 

  6. Book IV, lines 295-99. 

  7. It’s not exactly the same time, of course, something I remain well aware of as an electrical engineer with a radio background. Indeed, engineers rely on the known and limited speed of light to do antenna design with all of its resonant and carefully spaced conductive elements. Quarter-wavelength spacings abound. 

  8. Book IV, lines 210-17. 

  9. “How Many Stars Are There In the Universe?”, European Space Agency. I’ve seen another dim smudge out there in the night sky from the nearest of those other galaxies, Andromeda. Its photons took millions of years to reach me instead of thousands.