
Friday, April 10, 2020

Marching Towards a Million

[T]he feelings of knowing, correctness, conviction, and certainty aren’t deliberate conclusions and conscious choices. They are mental sensations that happen to us.
—Robert Burton. On Being Certain
Update, 4/12: The curve is continuing to flatten a bit more than the model is able to extrapolate from having its six parameters fitted to the data as it stands each day. (I certainly have no objection to things coming in on the low side, even if it reveals some room for further possible improvement in the model, or just some random-walk stochastic behavior that simply can’t be chased via curve fitting and extrapolation, even with more model parameters.) There’s not been enough of a difference to go through this post and do a bunch of edits, so I’ve just added a few footnotes and this link to the latest version of the Nightmare Plot. Don’t get complacent now, OK?

For just over a week now, I’ve been running and re-running the “logistic growth with growth-regime transition” model I’ve developed for reported cases of Covid-19, evolving populations of individuals whose digital DNA consists of six parameters for that model:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1

Two of the parameters, L and r, are for the conventional logistic growth component of the model. L is the maximum number of possible cases, which at worst case is limited by the country’s population. The other parameter r is the initial exponential growth rate, before any curve-flattening becomes apparent.

The “growth-regime transition” component of the model, implemented by flatten(t), has three parameters of its own. The parameter rf is the fractional reduction in growth rate at the conclusion of a lower-growth (flattened curve) regime. My modification to the conventional logistic-growth model uses the tanh function to smoothly transition from the original growth rate r to a lower “flattened” growth rate r*(1-rf) over a time interval th (in days). The transition interval is defined as the middle half of the full transition between growth rates, i.e., from 75% old growth rate down to 25%. The midpoint of the transition is defined by the third parameter t0, in days after 1/22/20.

Finally, a very small constant number of new cases per day is included as a sixth parameter b. This does manage to “earn its keep” in the model by more flexibly allowing the mean of the residuals to be zeroed out.
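For concreteness, here is a minimal Python sketch of the two equations above, a direct transcription with parameter names matching the text (nothing here is fitted to real data):

```python
from math import tanh

def flatten(t, rf, t0, th):
    """Growth-regime transition: ~1.0 well before t0, ~(1 - rf) well after."""
    return 0.5 * rf * (1 - tanh(1.1 * (t - t0) / th)) - rf + 1

def xd(t, x, L, r, rf, t0, th, b):
    """New cases per day: logistic growth, scaled by flatten(t), plus b."""
    return r * x * (1 - x / L) * flatten(t, rf, t0, th) + b
```

Well before t0 the scaling is essentially 1.0 (pure logistic growth at rate r); well after it, the formula settles at 1-rf, so the effective growth rate has dropped to roughly r*(1-rf).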

Here is what the model is saying today, with Johns Hopkins data from this evening (4/10, click here for the latest one with 4/12 data).

April 10, 2020: Reported U.S. Covid-19 cases vs days since 1/22

Extrapolating forward (admittedly a perilous enterprise, as I’ve said before with endlessly repeated disclaimers and caveats, applicable yet again here), the model is projecting three quarters of a million cases1 one week from today, and over a million the week after that.2 The model’s projection–both now and as it stood on 4/5–is that there will be around a million and a half Americans–one in every two hundred–reporting infection with Covid-19 in early May, with the number still climbing faster every day.3

The curve is bending downward a bit, yes, but things still look pretty grim.

Past Performance

You’ve probably heard the phrase “past performance does not indicate future results,” and that’s true if something happens (as it often does) that’s not accounted for by the model. Life is messy and complicated, and that includes pandemics. But shitty performance does tell you not to bother looking any further. And that’s definitely not what’s been happening with my little model.

With parameters evolved to yesterday’s Johns Hopkins data (4/9), it had projected today’s increase in the number of cumulative cases to be 32,920 instead of the 35,098 new cases that actually got reported today. That was off by about 6%. (I calculate the error from the projected vs actual increase from the most recent known value, because the model is anchored to that point and can only be given credit or blame for what it projects from that last known point.)
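That increase-based error metric is simple enough to state in code, using the 4/9 → 4/10 figures quoted above:

```python
# Sketch of the error metric described above: the model is only credited
# (or blamed) for the increase it projects beyond the last known value.
def increase_error(projected_increase, actual_increase):
    """Fractional error of the projected increase vs. the actual one."""
    return projected_increase / actual_increase - 1.0

# The 4/9 -> 4/10 figures quoted above:
err = increase_error(32920, 35098)   # about -0.062, a 6% undershoot
```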

With data from the day before yesterday (4/8), it projected yesterday’s number of new cases at 32,385. There were 32,305 new cases yesterday, an error of 0.2%.

And with data from 4/7, the model evolved to parameters that had it projecting 30,849 new cases the next day; the number actually reported on 4/8 differed from that projection by about 6%.

On 4/5, the model projected 32,346 vs the 32,133 that there were on 4/6, an error of just 0.7%.

The model of course does a little worse when it looks further ahead. You’d expect that of any naive model (i.e., not theoretically informed beyond “the growth rate is going down”) whose six empirically optimized parameters are extrapolated through exponential growth. And it’s certainly not anything I’m ashamed of.

On 4/3, the projection was for there to be just over 600,000 cases today, compared to the 496,535 we have had reported at this point in the U.S. Quite a bit off, but remember, that was looking forward a full week.

On 4/5, it was for there to be 510,425 cases today, a total error of less than 8%, again with the error measured from the projected vs actual increase, not the absolute number.4 On 4/7, the model projected there would be 486,367 cases as of today, and that was off (in the increase) by 10%.5

Evolution in Action

(In more ways than one, unfortunately.)

I’ve been wanting to write a little about the whole process of computer evolution that I use to fit the model’s six parameters to the time-series data. It begins with a population of 240 simulated organisms (not a virus!), digital “individuals” whose randomly chosen6 DNA consists of the values of those six parameters, within predefined bounds.

After working on this model for over a week, I’ve refined each of those bounds to a reasonable range of possible values. Updating those ranges as the model makes sometimes failed attempts to find a convincing best fit is my sole remaining human-engineering activity, now that the model is designed and the code implementing it is working.
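Footnote 6 below mentions that the initial population is seeded with a Latin hypercube rather than plain uniform sampling. A sketch of that idea, assuming each parameter simply has (lo, hi) bounds (the function name and details here are mine for illustration, not ade’s API):

```python
import random

def latin_hypercube(bounds, n):
    """Return n individuals whose parameters jointly cover each bounded range."""
    population = [[0.0] * len(bounds) for _ in range(n)]
    for k, (lo, hi) in enumerate(bounds):
        # One slice of the range per individual, shuffled independently per
        # parameter so the combinations are decorrelated.
        slices = list(range(n))
        random.shuffle(slices)
        width = (hi - lo) / n
        for i, s in enumerate(slices):
            population[i][k] = lo + (s + random.random()) * width
    return population
```

Every individual lands in a distinct slice of each parameter’s range, which is what gives the minimal-clustering coverage of the six-dimensional search space.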

Each of those 240 individuals is given a chance to show how fit its DNA is by running a backwards integration of the model from the last known data point. The model, you may recall from reading my previous blog post, is for the number of new cases each day, not the cumulative number of total cases ever.

The model is a differential equation; xd(t,x), not x(t). So, to have it project the cumulative number of cases x(t), I integrate the differential equation forward or backward from a fixed point represented by the most recent known number of cases, a point to which it is anchored.7

The modeled number of reported cases is compared to the actual number that were reported, for each day going back what is now a full five weeks’ worth of data.8

The fitness of each individual is measured as the sum of the squared errors (SSE) between each day’s modeled number of reported cases (the value that the model would expect, being integrated backwards) vs. the number of cases there actually were as of that date. The two figures are compared only after they have had a square-root transform applied to them. This limits how much more recent, larger numbers of new daily cases weigh in the fitness calculation vs earlier ones.
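A pure-Python sketch of that fitness computation, assuming the data is a list of (day, cumulative-cases) pairs on consecutive days with the anchor point last. I use crude one-day Euler steps for the backward integration here; the real code presumably uses a proper ODE solver:

```python
from math import sqrt

def backward_curve(model, data):
    """Integrate the model backwards in time from the last (anchor) point."""
    t_last, x = data[-1]
    modeled = {t_last: x}
    for t in range(t_last - 1, data[0][0] - 1, -1):
        x = x - model(t, x)       # one Euler step backwards per day
        modeled[t] = x
    return modeled

def fitness_sse(model, data):
    """Sum of squared errors in square-root space; lower SSE = fitter DNA."""
    modeled = backward_curve(model, data)
    return sum((sqrt(max(modeled[t], 0.0)) - sqrt(x)) ** 2 for t, x in data)
```

The square-root transform keeps the recent, much larger daily counts from completely dominating the fit, as described above.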

Then evolution gets underway, with each population member getting challenged by an individual, which gets spawned from combinations of not two but four population members. These mutant offspring have the model run with their parameters through the integration and SSE calculation. If they are better (lower SSE) than whichever population member is being challenged in its turn, they replace it. When all members of the population have received their challenge, possibly having been replaced, evolution proceeds to the next generation.
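The challenge-and-replace loop for one generation can be sketched as follows, with `spawn` and `sse` standing in for the four-parent spawning and the integration/SSE evaluation (both hypothetical callables for illustration):

```python
def evolve_generation(population, spawn, sse):
    """Challenge each member in turn; a fitter (lower-SSE) challenger wins."""
    for i, member in enumerate(population):
        challenger = spawn(population, i)
        if sse(challenger) < sse(member):
            population[i] = challenger   # challenger replaces the member
    return population
```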

The whole spawning process is worth a moment of technical discussion. Differential evolution uses an interesting9 sort of mathematical equivalent to some kind of alien four-way sex to create challengers. The trial individual is formed from “crossover” between a “target” (a rough DE equivalent of a mother) and a “donor” individual (the closest thing it has to a father). The donor is formed from the vector sum of a base individual and a scaled vector difference between two randomly chosen other individuals that are distinct from each other and from both the target and base individuals.

That is hard to follow in print, but this equation might help. The many-tentacled alien infant we want from this tangled act is ic, the individual challenging:

id = ib + F*(i0 - i1)
ic = crossover(it, id)

The crossover consists of giving each parameter of the donor individual a chance (usually a very good chance) to appear in the challenger, as opposed to using the target’s parameter. Basically, think of a higher number as being more for paternal rather than maternal inheritance. The default value used by my program (apologies for what is becoming an awkward analogy) is 0.7.
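Putting the two equations together, a sketch of the spawning step. One caveat: forcing a single randomly chosen parameter to always come from the donor is a common DE convention that keeps the challenger from being an exact copy of the target; the post doesn’t describe that detail, so treat it, along with the F and CR defaults, as my assumption:

```python
import random

def spawn_challenger(target, base, i0, i1, F=0.8, CR=0.7):
    """Donor = base + F*(i0 - i1); challenger = crossover(target, donor)."""
    donor = [b + F * (a - c) for b, a, c in zip(base, i0, i1)]
    # Each parameter comes from the donor with probability CR, else from the
    # target. One forced position always inherits from the donor (an
    # assumption; see the lead-in above).
    forced = random.randrange(len(target))
    return [d if (k == forced or random.random() < CR) else t
            for k, (t, d) in enumerate(zip(target, donor))]
```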

I refine the parameter bounds as needed and sit back while my ade Python package (of which this whole Covid-19 modeling is a single example file covid19.py) has 75 generations of these simulated individuals spawn and fight it out on a virtual Darwinian savanna. The software dispatches SSE-calculating jobs asynchronously to worker Python interpreters across the cores of my six-core CPU. It takes about two minutes. The software produces a plot with the model’s curves in red and what actually happened in blue. It’s what I’ve been calling the Nightmare Plot.

Singing in the Apocalypse

The Nightmare Plot really is horrifying when you think about what those numbers represent. Their relentless upward march–slowing but by no means stopping–is making everyone’s life suck including my own. The novelty of this whole apocalyptic survival thing is starting to wear off just a bit, even for me.

Maybe I’m a terrible person, but dammit I can’t help experiencing a bit of pride, too. This “little model that could” is proving to be the no-pay, no-fame, no-acceptance academic-mathematical equivalent of, say, some college undergraduate inventing an optimal radio receiver frequency arrangement, now in use by the circuitry of your smartphone, as part of an independent senior project he decided to work on weekend after weekend a quarter century ago. (This hypothetical individual never was much for working in groups.)

So, take it or leave it, folks, you’ve got yourself a six-parameter mathematical model for the number of reported cases of Covid-19. It was never “published” in some elite-accepted overpriced package of specialty information. It wasn’t part of any network of peer reviewers. (Like the aforementioned loner radio geek, I’ve never been one for playing in groups.)

But it does appear to work.

———

A friend of mine told me today that he is selfishly rather enjoying this whole situation. He now has lots of time to learn things on his own that he’s been wanting to work on, time for “driving out to pretty places to take pictures and go for walks in the woods.”

He’d rather be back doing what he does in person. But, he admits, there are upsides.

I assured him that it’s 100% OK to enjoy those upsides, even as I admitted my own feeling of finding it a little weird to derive satisfaction from successfully modeling these awful numbers. But I had the benefit of receiving yesterday some reassurance in this area, as I was talking about this very topic with a friend of mine whose life’s work revolves around how people think and feel about things.

He told me he can’t be as helpful to others in his profession if he isn’t taking care of himself, and that means enjoying life in spite of or even at times because of what is otherwise a horrible situation. Of course he’d rather not be in it, nor would I or you, dear reader with a delicate pair of lungs of your own. But he is, and you are, and I am, and so let’s take what good there is to be had.

Smile and sing and laugh, and take pride in the work that you now have. Even through the Apocalypse.

Notes


  1. With data updated as of 4/12, the projection is now 700,000 by 4/17. A bit lower, and the curve is continuing to flatten, but not by much.

    The careful reader may notice that I always refer to “reported U.S. cases” or some alternate variation and wording. I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many of our fellow citizens have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past two weeks.

  2. With data updated as of 4/12, the projection is now for a bit less than 900,000 cases by 4/24. One redditor cleverly observed that my modeling’s million-case projections have been like those for fusion energy. (The saying is that fusion is 30 years away and always will be.) I won’t dispute that observation at this point; each day’s new data for the past week or so has pushed that projection outward a bit, though never making it look any more implausible to reach eventually. 

  3. Let’s call it mid-May now, given the recent additional curve flattening (4/12 data). 

  4. Although the difference between the two ways of computing the error is smaller with such a large increase from the 4/5 last-known number of 337,072. In case you really want to know, the absolute error from the model projecting forward five days was 2.8%. 

  5. The actual number of cases as of 4/12 from Johns Hopkins’ daily data was 555,313, an increase of 58,778. On 4/10, the model was projecting 567,306 cases, or a projected increase of 70,771. The error in the increase was 20% over the two-day interval, or 10% per day. Not as good as previous days’ next-day predictive performance, but not terrible, either. And since there will always be an error when extrapolating from a curve fit to data having a random component, I’m happy the data is lower than projected and not higher, because I have my own delicate pair of lungs that I’ve grown fond of, too. 

  6. Actually, they’re not quite chosen with uniform randomness: The default is to set up the initial population using a Latin hypercube. That way you get pseudorandom initial parameter values, but with minimal clustering. You want the six-dimensional search space to be explored as fully as possible. 

  7. This is known as an “initial value problem,” the initial value here being the last known number of cases. You can go either direction in time from the initial value. For fitting the model parameters, my algorithm goes backwards from the most recent data. To extrapolate forward and make projections, it goes forward from the same “initial value.” 

  8. Reported case numbers before March 5 are omitted from both the curve fitting and plots. The modeled (red) curve deviates from the historical data when you go earlier, and I’m not sure a good fit that far back in the history of this thing is relevant to what’s happening now. 

  9. You claim you wouldn’t find alien four-way procreative sex interesting? Well, I don’t believe you. 

Monday, April 6, 2020

Portrait of a Pandemic

Time is a river, a violent current of events, glimpsed once and already carried past us, and another follows and is gone.
—Marcus Aurelius, Meditations
Update, 4/8, 5:45 PM PDT: With the latest data from Johns Hopkins, the latest Nightmare Plot is not different enough from the one originally included with this post to warrant editing the post at all. I’m not going to run the numbers again for all the countries here, but of course was curious as to how things are looking in my own. The answer: still not good, even though the curve is continuing to bend downward slightly.1 Today’s 429,052 cases was 10% below what the model projected (three days ago!) that today’s number would be. (That error is in terms of the projected increase, which was 30% rather than the actual 27%.) The model still projects reaching the half-million mark around April 10 and a million cases by month end.

A model I’ve developed for naively fitting to the number of new cases of Covid-19 each day modifies the widely-accepted logistic growth model with a smooth growth-regime transition. It has been closely tracking the dramatic spread of the virus in the United States and other heavily affected countries while accounting for the “flattening of the curve” that is becoming apparent to a varying extent in each of those countries.

The modified model scales the reproduction parameter r by a hyperbolic tangent function of time, with a midpoint time t0 and a half-transition time th. (All time-based parameters are in units of days.) The scaling begins near 1.0 (the full traditional reproduction rate r) and transitions downward by a fractional reduction rf as social distancing and lockdowns force the virus into a lower-growth regime that approaches an effective rate of (1-rf)*r.

The population-limited parameter L of the logistic growth model remains, though it is still not nearly as consequential or well-defined in the curves of most countries as the curve-flattening effect of rf.2 Finally, there is a very small constant term b, a fixed number of new cases per day. That is mostly included to zero out the mean of the residuals so that I could get decent goodness-of-fit indicators for the model.

The result is a logistic growth model that incorporates the “flattening of the curve” now apparent even in U.S. data by effecting a smooth transition from the original uncontrolled growth rate to a later, lower one:

xd(t, x) = r*x*(1-x/L) * flatten(t) + b
flatten(t) = 0.5*rf*(1-tanh(1.1*(t-t0)/th))-rf+1

For details, development history, context, and disclaimers, see my series of previous Covid-19 blog posts, starting with the most recent one on April 2.3

Below are projections from the model for various countries, based on Johns Hopkins data from last evening, April 5. I will start with my own mess of a country, not just because I am a citizen of it but because it now has the most cases of Covid-19 on the planet and almost surely will continue to as this nightmare continues.

The United States of America

Extrapolating from its remarkably close fit with data going back more than two weeks now, the model is projecting half a million cases4 by April 10 and a million cases by the end of the month. The residuals (the errors between the model’s fit to past data and what the data actually was) follow a completely normal distribution.

Kenneth Regan, professor of mathematics at SUNY Buffalo, suggested that it would be useful to monitor the ratio of new cases reported on a given day to all cases under treatment that day. The idea is that it would reveal progress in a way that is more resistant to issues like how good the reporting is or how widespread the testing is, and that it would show when the data starts to show a peak.5

Although I don’t have data for a number of all cases under treatment on a given day, I’ve tried to implement the spirit of Professor Regan’s idea with a fourth subplot at the bottom of what I’ve been calling the Nightmare Plot, showing how large each day’s new cases are as a percentage of the cumulative number of cases ever reported by that date. It’s an instructive additional visualization, and the suggestion was much appreciated. And as you can see, that metric has been tracking since 3/19 to within a few percent of what the model expected it would have been. Thankfully, that red curve is heading steadily downward.

The Nightmare Plot: United States of America, 4/5/20 data

As with any extrapolation from time-series data without reference to underlying theory or events, the model isn’t making perfect projections. With parameters evolved to data from the day before yesterday (4/4) when there were 308,850 U.S. reported cases, the model projected 346,464 cases yesterday. Thankfully, it was pessimistic by 33%; there are “only” 337,072 total (cumulative) cases being reported in last evening’s Johns Hopkins data.

Wait a minute, you might be thinking, the difference between 346,464 and 337,072 seems like a lot less than 33%. True. It’s only 2.8%. But that would give the model too much credit, because it was extrapolating from a fixed, known number that was 91.6% of yesterday’s actual number of cases. The model only gets credit (or blame) for its projected increase from that point, which it said would be 37,614 but actually was 28,222.

This error is a welcome departure from the previous few days when the model had been making dire next-day projections that were uncomfortably close to what actually happened. With data from a day earlier (4/3) when there were 275,586 cases, the model projected there would be 311,138 the day before yesterday (4/4). That represented an error (again, considering only the increase) of 6.9% (also on the pessimistic side). With the data available on 4/2, the model projected that the 243,453 cases that day would become 275,799 on 4/3, an error of just 0.66% in the expected increase.6 That one felt a little spooky.

When you are modeling numbers that represent hundreds of thousands of your fellow citizens getting very sick, you want reality to come in below your projections. I have been hoping for some evidence that the curve is bending downward, and finally it is here. Not by much; that Nightmare Plot still goes to well over a million cases before the month is over.

But I’ll take it. I’m in a much better mood than when I was freaking out over the numbers being so close to my projections for a couple days in a row. The model has been just a bit pessimistic and required some updates to its parameters to better track reduced growth, and that might continue to happen as the days progress and people finally take this thing seriously. Down, curve, down!

———

And with that good mood, I will allow myself some pride in how closely my modification to the logistic growth model is tracking to many more data points than one might expect from its six parameters. Look at that top subplot: There is no more than 8% error between what the model is fitting to the data and what the data actually has been for the past twelve days, going back to 3/24. In that span of time, the numbers being modeled have increased by a factor of six. The 4/4 data fitted the model to within a 5% error going just as many days into the past (back to 3/23). It was pretty much the same with data from the day before, with a comparably low level of errors going back to 3/22.

My evolutionary curve-fitting algorithm does two things to counter the effect of the more recent values (much larger than earlier ones due to exponential growth) having a disproportionate amount of weight. First, the model is for the number of new cases each day, not the cumulative number of total cases ever. It is a differential equation xd(t,x), not x(t). So, to have it project the cumulative number of cases x(t), I integrate the differential equation forward or backward from a fixed point represented by the most recent known number of cases, a point that it is anchored to.7 The number of new daily cases is increasing dramatically, but not quite as dramatically as the cumulative number of cases. So that helps keep the emphasis from being quite as much on more recent data.

Second, I apply a square-root transform to both the modeled and actual new-case numbers before computing the sum of squared error (SSE) between them and then evolving the parameters for minimum SSE. The 28,222 new cases we had yesterday had four times as much influence on the curve fit as the 6,421 new cases we had on 3/17, not sixteen times as much.

Despite these efforts, the algorithm is still going to fit recent data more closely, and that fit has been very close indeed, with just 1% maximum error over the past six days. Think about it; for nearly a week as the number of cases has more than doubled, the model stays within 1% of what actually happened as it traces its smooth growth curve backwards in time.

Due to its emphasis on fitting to later data points, the model strays a bit more, percentage-wise, as we peer into the ancient history of early March. But I’m certainly not ashamed of it expecting a little less than half the 217 cases we had one month ago as it looks further backwards from the harsh reality of over a thousand times as many cases now.8 Imagine you were looking at this blog post back then and there was a model telling you that there would be “only” 150,000 cases today. Would you now be upset that it somehow failed to convey the magnitude of the situation to you? I didn’t think so.

———

So, what is making me happier now, with numbers still climbing exponentially and no leveling-off in sight on the right side of that Nightmare Plot? I am hearing the whisper of encouraging things in the combination of six parameters that my software has evolved for the model based on yesterday’s data. The curve is indeed flattening a little bit, despite the stupidity of a lot of Republican governors, nutjob pastors, and spring-break partiers.

Let’s take a quick look at those parameters for ourselves.

USA, 4/5: Final, former, and failed parameter values vs sum of squared error

Each of these six subplots shows the values of one model parameter versus the sum of squared error (SSE) between modeled values and actual values going back just over four weeks. The subplots are zoomed in so that small SSE differences are evident between members of the final population (red dots) and parameter values that had once been part of the population over the course of the 75 generations of evolution (blue dots). Also shown (small black dots) are unsuccessful challengers that never made it into any population, but whose failed parameter combinations are instructive for showing how SSE varies with parameter values.

These plots make it clear that there are some well-defined ranges for all of the model parameters except L, the maximum number of possible cases, which at worst case is limited by the country’s population. The reason L isn’t better defined is that there is still no evidence in the time-series data of a fixed upper limit to the ultimate number of U.S. cases. The best we can infer from the distribution of L values in the upper-left subplot is that the fit starts to become less plausible if we assume an ultimate upper limit of less than 20 million cases.

Another model parameter that is more of a useful nuisance is b, a very small constant number of new cases per day. With every single model I’ve tried thus far, including this latest and hopefully final one, the parameter evolution favors smaller and smaller values of b tending toward zero. That’s evident in the upper right subplot. This parameter does its job of zeroing-out the mean (average) of the residuals (second subplot from the top in the Nightmare Plot) but that’s pretty much it.

Now onto the really interesting parameters. First, of course, is our old standby r, the initial reproduction or growth rate before any curve-flattening or (eventually) population-limiting begins to take effect. In the U.S., the model says that the number of cases was increasing by around 45% per day for a couple weeks after the first modeled date of 3/5. Looking again at the top of the Nightmare Plot, you can see that the model was assuming too high of a growth rate back then. The blue line was increasing exponentially (which appears as a straight line on a logarithmic plot), but at less than 45% per day. The curve-fitting algorithm tolerated this error as it fiddled with the parameter values to get its remarkably close fit to the data later on.

And it got that remarkably close fit as we approached 61-62 days after 1/22/20, the range that evolved as the optimal value of parameter t0. That happened around March 23. Over the span of a week either direction from that date (th of around 14 days), the growth rate transitioned halfway from its initial 45% per day down to a post-flattening growth rate that’s lower by an impressive 90%. As you can see at the bottom of the Nightmare Plot, the number of reported cases increased by 8.4% yesterday, and that growth rate is expected to continue dropping, though more slowly now.9

For a deadly virus that is right now killing thousands of my fellow citizens, a 90% drop in growth rate seems like good news indeed, even as the escalating numbers will likely continue to stress us all out for the rest of April and probably beyond.

Now let’s take a quick look at Nightmare Plots for other countries that have been impacted by Covid-19. In some of them, the model has problems fitting to the data. I’ve extended the curve-fitting interval for each country as far back as possible without allowing clearly implausible pauses and jumps in reported-case numbers to mess everything up. There’s no avoiding such pauses and jumps that have occurred more recently, like France’s sudden discovery of more than a quarter of its cases on 4/4 alone, but the differential evolution algorithm and the model do their best with what they have to work with.

I will leave you with a final thought to consider as you scroll down and look through how everybody else is doing outside the U.S. None of these other countries–not even Iran with its contemptible mullahs–is being run by anyone so utterly incompetent as our failed trust-fund game show host with his obvious cognitive limitations, profound ignorance, contempt for science and sound public policy, and pathological narcissism that drives every single thing he does. Let’s please not make that mistake again, OK?

Spain

The second-most impacted country behind the good old US of A is Spain with its 126,168 reported cases yesterday.

With parameters evolved to data from a day earlier when there were 119,199 cases, the model projected 127,844 cases on 4/4. The model overestimated the increase by 24%. That’s not as impressive as the model’s fit with recent data: No more than 2% error going back to 3/29, and no more than an astoundingly low 7% going all the way back to March 15. The residuals are indistinguishable from completely random Gaussian noise.

Spain has flattened that curve pretty well recently and the projected numbers don’t look too bad, perhaps doubling over the next month.

Please note that Spain had 500 cases to our 402 back on March 7, when Donald Trump responded as follows to a reporter’s question about whether he was “concerned that the virus is getting closer to the White House and D.C.”: No, I’m not concerned at all. No, I’m not. No, we’ve done a great job.

Well, if “a great job” means not getting people tested, telling his followers to carry on as usual and causing sycophant Southern Republican governors to delay action until the magnitude of the problem became obvious even to them, and now bullying states about getting the life-saving equipment they need as our numbers are now nearly three times larger and headed for probably at least six times larger over the next month, sure, what a fine and excellent job you’ve done, you fucking moron.

Italy

The model follows a modest recent flattening of the curve, half of which occurred over a one-week interval centered on 3/24. The plot shows that there was also some very slight flattening around 3/15 and 3/12. As the noted group of epidemiologists Pink Floyd observed, “You are only coming through in waves.”

The residuals are not significantly non-normal, though they definitely “fan out” with higher predicted numbers of new daily cases. The effect of this can be seen in the upper subplot, where the model does not track earlier cases that well with their little waves of increasing and then decreasing growth rates.

The constant b evolves to an unusually high range here, reflecting a best-fit solution based on a fairly limited recent data set. The model’s simplistic assumption of a constant 600 new cases per day still works even for the first date shown (3/13). The data modeled and shown doesn’t go back earlier in March because there were large discontinuities before then that provide a questionable basis for curve-fitting.

Italy has had a very difficult time with overtaxed hospitals and lots of people dying. But at least their new cases have been significantly slowing down, as shown in the bottom subplot. Even though they can expect to wind up with several times more of their citizens ultimately reported as infected by Covid-19, it looks like they are at least in the latter stages of having slowed down their rate of growth.

Germany

Germany’s curve is quite impressive, with a significant degree of flattening. Look at this value of rf!

Germany, 4/5 data: Values of parameter rf vs SSE

In addition to demonstrating how much better it is for a country to elect an accomplished scientist as its leader rather than a failed casino owner spouting absolute lunacy10 about Covid-19 just six weeks ago, this plot shows that the parameter L doesn’t seem to be contributing much to the model. The curve flattening is entirely a result of the growth-regime transition, which takes the growth rate nearly to zero.

France

There have been some jarring discontinuities in France’s data recently, so the model is not doing very well with it. The residuals are definitely not Gaussian random variation, which you can see just by looking at the right side of the residuals subplot, in addition to the essentially zero p value.

The projection is included in the interest of fairness, to make sure that non-impressive results are shown as well. If there is anything definite to be taken away from France’s Nightmare Plot, it’s that they are definitely not out of the woods yet.

Iran

Iran has been surfing waves of infection for weeks now. Just look at the periodicity in the upper subplot and the residuals plot below it! Despite that, there is no significant evidence of non-normality in the residuals. What I don’t like about the way the parameters have evolved is how much they focus on an abrupt growth-regime transition back in early March. It’s completely artificial to assume that the growth rate would drop by half over the course of a single day, and yet that’s what the model sees and so there it went.

This is the price one pays for doing naive modeling of time-series data. You make no assumptions about the underlying events or theory, and you get what you get. As the other plots show, that often works amazingly well, and it might still be working well here, but I’m a little suspicious of any projections for Iran’s future cases, if for no other reason than that so many waves of infection suggest more totally unpredictable waves in the future. So, perhaps three times as many cases in a month as now, if nothing else weird happens? Or maybe twice that, or half?

United Kingdom

Another instance of an artificially abrupt transition to a lower-growth regime, with one important difference. In the UK, the values of L in the final population have a nicely bounded range, and it’s got a pretty low upper end, well below the country’s population of 67.9 million. The rf parameter is also well-defined, but relatively modest; the model is doing much of its work with conventional logistic-growth behavior.

UK, 4/5 data: Values of parameter L vs SSE

Looks good to me, even if I’m not intuiting any such population upper limit just from looking at the UK’s numbers thus far. Maybe the evolution of model parameters over 75 generations is seeing something my eye isn’t. I hope so.

South Korea

These residuals are definitely not normal variation. That’s probably because the model spends so much time tracking the very small increases that have occurred in the past month. Unlike us, the South Koreans knew what they were doing, and did it fast. There’s no further slowdown ahead, but the growth rate is already tiny at this point. Looking good!

Singapore

Singapore had a relatively bad day yesterday, with an abrupt jump of nearly twice as many new cases as in previous days. But their numbers of daily cases are still small.

The residuals are normal, but I’m not convinced that there has been any growth-regime transition like the model shows in the bottom subplot. It just doesn’t look right. And the model’s deviation from older data gives me pause.

Still, a projection of perhaps 10,000 total cases by early next month doesn’t seem too bad to me, even if it winds up being several times that for some reason the model can’t possibly account for right now.

Finland

Finally, little Finland, included not because it has been severely impacted by Covid-19 but because it’s the land of some of my ancestors and some of my friends. The model isn’t tracking Finland’s data particularly well, with residuals that don’t appear at all to be normal random variation. You don’t need the small p value to see that; just look at the discontinuities in the number of reported cases in the top subplot, and the huge momentary spike in new reported cases on 4/4.

The model projects less than 10,000 cases by May, but I honestly don’t think that projection is worth a whole lot right now. If the data comes down in the next couple days, it might be worth looking at again. Meanwhile, Finns will continue practicing the kind of social distancing they always have, long before anyone ever heard of Coronavirus.

Notes


  1. Even worse as far as the state of our fracturing Republic is concerned. I have no models or data for what we may be seeing in the weeks and months ahead during any summertime reprieve we may get from the virus, when there will be no such reprieve from the deranged narcissist and his jackbooted thugs in his cabinet, the Senate, the Supreme Court, and gerrymandered GOP fiefdoms like Wisconsin.

    What will this loathsome cult of corruption and idiot-worship that mutated out of the party of Lincoln and Theodore Roosevelt do after November 3–regardless of which way the electoral college votes after whatever passes for elections in the swing states–if virus cases are rebounding like it’s 1918 all over again, with economic conditions more like 1929? I hope that if there are sides to be taken, fellow citizen, you will be found on the side of our beloved Constitution: the sacred founding text of a nation that has been under attack for decades, by this President and his predecessors, with its first, second, fourth, and fifth Amendments and a rich history of case law from many decades of carefully reasoned decisions, in between those times (unfortunately including our own) when the Supreme Court allowed its political colors to show underneath those somber black robes. 

  2. Every parameter in a model should “earn its keep” by having its own robust independent effect on fitness of the curve to the data. Of all the six parameters, L is the least compelling to keep. Its role in limiting the ultimate pervasiveness of the virus is currently being overshadowed by the growth-reduction aspect of the model. But I’m leaving it in for now, looking for signs of its eventual emergence, because it ultimately will serve a purpose as the pandemic finally heads into end stage. 

  3. The full history of posts, going backward in time, is: Modeling a Slightly Flattening COVID-19 Curve (4/2), Into the Rapids (3/25), Pandemic (3/22), and the original Applying the Logistic Growth Model to Covid-19 (3/19). 

  4. Throughout this blog post “cases” refers to the number of Covid-19 cases reported for the particular country under discussion, as provided by the data freely released to the public by Johns Hopkins University. See their Terms of Use, which I have adopted as my own with of course a suitable change in names.

    I will repeat yet again that I have no expertise in biology, medicine, or the spread of infectious disease, and so I try not to speculate on how many people have actually gotten infected with this thing without it ever being reported. Again, I’m just a retired engineer who has spent years constructing nonlinear models of mostly electronic things and believes this one is pretty well grounded for doing the following, and only the following: predicting what will happen if the data continues as it has recently, especially as it has in the past two weeks.

  5. Thanks to my dear friend and former boss Louis J. Hoffman for conveying this suggestion back to me from Prof. Regan, along with permission to give him credit for it. 

  6. It was on the pessimistic side, as the model almost always has been, but by such a small amount in that case as to hardly seem worth mentioning. If you are curious, the model was extrapolating the 4/3 data out to a projected 350,510 cases yesterday. Pessimistic again, but not by much: 14% over two days when the actual increase was nearly a hundred thousand cases. 

  7. This is known as an “initial value problem,” the initial value here being the last known number of cases. You can go either direction in time from the initial value. For fitting the model parameters, my algorithm goes backwards from the most recent data. To extrapolate forward and make projections, it goes forward from the same “initial value.” 

  8. The model anchors to the most recent known data point, which for this discussion is last evening’s Johns Hopkins data (4/5) with its 337,072 U.S. reported cases. Remember, the model’s differential equation xd(t,x) is for the expected number of new cases each day, not the cumulative number of cases x(t). So, to have it project the cumulative number of cases, I integrate the differential equation forward or backward from that last fixed point.

    To have it integrating that differential equation backwards a full month and winding up in the same neighborhood with the mere 217 cases we had then, shrinking its backwards projections by more than a thousand in the process, seems pretty remarkable to me. 

  9. In my previous post, I wrote “in praise of the hyperbolic tangent function” for nonlinear modeling, and about how I’ve used it for electronic circuit simulation. It turns out that tanh(...) is also quite useful for gradually transitioning from initially unrestrained exponential growth to a lower-growth regime resulting from social distancing and quarantine. 

  10. By the last week of February, “he criticized CNN and MSNBC for ‘panicking markets.’ He said at a South Carolina rally falsely that ‘the Democrat policy of open borders’ had brought the virus into the country. He lashed out at ‘Do Nothing Democrat comrades.’ He tweeted about ‘Cryin’ Chuck Schumer,’ mocking Schumer for arguing that Trump should be more aggressive in fighting the virus. The next week, Trump would blame an Obama administration regulation for slowing the production of test kits. There was no truth to the charge.”

    “Throughout late February, Trump also continued to claim the situation was improving. On Feb. 26, he said: ‘We’re going down, not up. We’re going very substantially down, not up.’ On Feb. 27, he predicted: ‘It’s going to disappear. One day it’s like a miracle it will disappear.’ On Feb. 29, he said a vaccine would be available ‘very quickly’ and ‘very rapidly’ and praised his administration’s actions as ‘the most aggressive taken by any country.’ None of these claims were true.” David Leonhardt, A Complete List of Trump’s Attempts to Play Down Coronavirus, The New York Times (March 15, 2020). 
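The anchor-and-integrate scheme described in notes 7 and 8 can be sketched with SciPy’s general-purpose ODE solver. Every parameter value below is a made-up stand-in for the evolved ones; only the 337,072-case anchor point comes from the text.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameter values, for illustration only.
L, r, rf, t0, th, b = 3e7, 0.25, 0.8, 70.0, 10.0, 600.0

def flatten(t):
    """Growth-regime transition multiplier from the model definition."""
    return 0.5 * rf * (1 - np.tanh(1.1 * (t - t0) / th)) - rf + 1

def xd(t, x):
    """New cases per day: logistic growth scaled by the regime transition."""
    return r * x * (1 - x / L) * flatten(t) + b

# Anchor at the most recent known data point, then integrate the same
# differential equation backward (for fitting) and forward (to project).
t_anchor = 74.0        # 4/5/20, in days after 1/22/20
x_anchor = 337_072.0   # U.S. reported cases that evening
back = solve_ivp(xd, (t_anchor, t_anchor - 20.0), [x_anchor])
fwd = solve_ivp(xd, (t_anchor, t_anchor + 30.0), [x_anchor])
```

Note that `solve_ivp` accepts a decreasing time span, which is what makes the backward leg of the initial value problem a one-liner.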

Thursday, March 19, 2020

Applying the Logistic Growth Model to Covid-19

The chief task in life is simply this: to identify and separate matters so that I can say clearly to myself which are externals not under my control, and which have to do with the choices I actually control.
—Epictetus, Discourses.
Dad, you’re just some guy who knows how to obsess over numbers. We have actual people who are experts at this stuff. Go and write it if you want, but don’t feel like you have to!
—Daughter of Ed Suominen, March 2020.
TL;DR: A very good fit between data obtained on March 19 from Johns Hopkins University and a logistic+linear growth model indicates that there will be over 50,000 reported cases of Covid-19 in the United States on March 25, over 300,000 cases one week after that (4/1), and several million cases by the second week of April. See the Nightmare Plot and the Disclaimer below.
Update, March 20: There was a significant uptick in U.S. cases today, bringing us to a total of 13,677 according to Johns Hopkins data provided this evening. The increase is more than was expected this time, and the jump is significant enough that I would rather not publish results with the logistic growth model fitted to the latest data. Doing so results in projections that are considerably higher than what the rest of this blog post discusses. I will wait for tomorrow’s data and then perhaps consider a modification to the model if we get unexpectedly high numbers again, one possibility being to change the linear term to a power-law one of the form a*t^b. That might reflect the effect of better testing without forcing an artificially high value for the exponential model parameter k.1
———

On March 15, I wrote to some friends on Facebook about the latest results of putting my computer evolution code and skills to the task of finding parameters for the logistic growth model as applied to the number of U.S. reported cases of Covid-19. My belief–speaking as someone with expertise in fitting nonlinear models to data but not any kind of expert in the fields of biology, medicine, or infectious disease–was that we would reach 10,000 cases on March 17, and would have reported cases numbering in the hundreds of thousands by March 29.2 By the end of April, I believed there would likely be millions of Americans being reported as having this virus. I thought the rate of growth would be unlikely even to start slowing down before April.

The prediction for the one date that has come and gone was not quite accurate. On March 17, there were 6,421 cases reported in the U.S., a little less than two-thirds of what the model said was most likely. But, in my defense, I ask you to look back at yourself enjoying a Sunday in mid-March. Would you have been truly untroubled just two days ago by hearing that six thousand of your fellow Americans would have a deadly respiratory infection that puts a fifth of its hosts into hospitalization? The model was pessimistic but not ridiculous.

Two days earlier, on March 13, I had introduced the project to my Facebook friends, prefacing the discussion with the acknowledgment that I’m not a biologist, or a doctor, or an infectious disease expert of any kind. Just a retired engineer and inventor who knows how to write Python code and has been working on modeling and simulation for over a year now.

After seeing predictions ranging from dismissive to hysterical about the Coronavirus, I saw a useful if sobering application example of a tool that I’d written specifically for my electronic simulation work, ADE. This new example would apply the fairly well-known “logistic growth model”3 to what has now bloomed into a pandemic.

Writing and running covid19.py forced me to some stark conclusions: In one week (3/20), I said, we would be likely to have over 20,000 U.S. cases. A week later (3/27), around 10 times that. “By the first few days of April we could very plausibly hit the one million mark. There will certainly be nobody saying this is just like the flu by then.”

The most I was–and am–willing to guarantee about those predictions, however, is that the red line in the plot I included with the post is a nearly optimal fit of the function f(t)=L/(1+exp(-k*(t-t0)))+a*t to the number of cases versus time provided by Johns Hopkins University, including an update made that evening.
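For the curious, here is what fitting that four-parameter function looks like with SciPy’s off-the-shelf least-squares fitter, standing in for the differential evolution used for the actual results. The data below are synthetic, generated from made-up parameters, so the fit has a known right answer.

```python
import numpy as np
from scipy.optimize import curve_fit

def f(t, L, k, t0, a):
    """Logistic growth plus a small linear term: L/(1+exp(-k*(t-t0))) + a*t."""
    return L / (1 + np.exp(-k * (t - t0))) + a * t

# Synthetic "case counts" from known parameters, with 2% noise.
rng = np.random.default_rng(1)
t = np.arange(60, dtype=float)
data = f(t, 1e6, 0.25, 30.0, 20.0) * rng.normal(1.0, 0.02, size=t.size)

# Recover the parameters, starting from a rough initial guess.
popt, pcov = curve_fit(f, t, data, p0=[8e5, 0.2, 28.0, 10.0], maxfev=20000)
L_fit, k_fit, t0_fit, a_fit = popt
```

A local least-squares fitter like this needs a decent initial guess; the appeal of differential evolution is precisely that it searches the whole bounded parameter space instead.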

So how did that prediction fare? With data updated Thursday evening (3/19), it now appears that there will be around 13,500 reported cases on March 20. Again the model is pessimistic, with 68% as many cases reported as expected. But, again, even the lower number is a huge number of people getting very sick all of a sudden. Were you expecting anything like that just five days ago?

If not, don’t feel bad. There was an excellent reason why you might have been surprised then at what is now clearly plausible to anyone looking at the plot below: Your President was telling you that it was no big deal.

With the data available on March 13, the logistic+linear growth model predicted there would be around 200,000 U.S. cases on March 27. That is a little more than double what the model’s current best fit says is most likely. Again, the model was and still may be pessimistic, predicting too many cases. But again, even 90,000 or so infected Americans–with probably 20,000 of them very sick and at least a thousand of those dying and thousands left with permanent lung damage–is a very big deal. And the virus will still just be getting started.

Yesterday, March 18, I released the first version of this blog post with the projection that there would be nearly 15,000 cases tomorrow (3/20). (Actually 14,538.) Once again, the model was a bit pessimistic; the current projection is for 13,549 cases, or 93% as much.4 And the longer-term projections are slightly lower, which is moving in the direction we all want, though only by a little bit.5

———

So, on March 19, here is what the admittedly pessimistic logistic+linear growth model now says, based on Johns Hopkins data updated this evening. The numbers are all in reported U.S. cases:

  • The day after tomorrow (3/21), there will be nearly 18,000 cases.

  • In one week (3/26), there will be more than 60,000 cases.

  • In two weeks (4/2), there will be over 400,000 cases.

  • We will reach the million mark between April 4 and 6.6

  • On April 11, there will be five million cases.7

  • The U.S. outbreak won’t even begin slowing down until mid-April at the earliest. In other words, there will be increasing numbers of new cases until probably around April 24 when there are finally fewer new cases one day than there were the day before.

  • The number of Americans reported as being infected by the novel Coronavirus will ultimately reach several tens of millions.

This is some scary shit. And it may be even worse than it looks right now. What the data show, and the model is fitted to, is the number of reported cases; several days ago, some experts in the Seattle area were saying that the number of true cases in Washington State was several times the number being tested and reported.8 Isn’t it reasonable to expect that to remain largely true? Our medical system will almost certainly become overloaded and the focus will simply turn to saving those lives that can be saved, as it has already in Italy.

But all this is just me talking, not the model. It makes no assumptions or judgments about the data. It doesn’t care if some political situation has caused fewer tests, or suddenly more tests. It doesn’t care about an idiot chief executive downplaying the danger and thus encouraging its spread (at least among his cult following), then abruptly deciding to join the adults in the room.9

The model simply predicts what will happen if the data continues as it has recently, especially as it has in the past few days.

That’s it. The interpretation and explanation are up to you.

The Nightmare Plot

Returning to the model and its neat little world of reported cases, here is a plot from a simulation I ran this evening, whose results I summarized in the bullet points above. It should make you listen very carefully to what you are being told by medical experts about social distancing, washing your hands, not touching your face, and staying the fuck home.

Now, this is one really important plot. It shows up way too small in this blog post for you to be able to see its important details. So please click or tap on it to open it as an image by itself.

Reported U.S. Covid-19 cases vs days since Jan. 22, 2020

You can also click here to see the plot with data from yesterday, 3/18. Open them in two tabs of your browser and then switch between them to see how the model is holding up.

The upper subplot shows the best-fit logistic growth model in red, with the actual number of cumulative reported cases of Covid-19 in blue. The error between the model and the data is shown with each annotation. Look how small the residuals are compared to the exponentially rising numbers of cases. It’s a scarily impressive fit, even if the model has proved a bit pessimistic thus far.

The lower subplot shows the number of cases expected to be reported over time, slightly in the past and then extrapolating to the future. Fifty generations of running a differential evolution algorithm10 resulted in a 120-member population of combinations of parameters for the model. I deliberately terminated the algorithm sooner than I would otherwise so that there would be some visible variation in the extrapolations. The black dots show expected reported-case values with parameters from each member of the population, plotted at a bunch of random times from 3/12 to early April.
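My ADE package isn’t needed to try the idea out; SciPy ships a serviceable differential evolution implementation. This sketch evolves the same four logistic+linear parameters against noise-free synthetic data; all values and bounds below are made up for illustration.

```python
import numpy as np
from scipy.optimize import differential_evolution

def f(t, L, k, t0, a):
    """Logistic growth plus a small linear term."""
    return L / (1 + np.exp(-k * (t - t0))) + a * t

t = np.arange(60, dtype=float)
data = f(t, 1e6, 0.25, 30.0, 20.0)  # synthetic, noise-free "cases"

def sse(params):
    """Sum of squared error: the fitness that evolution minimizes."""
    return float(np.sum((f(t, *params) - data) ** 2))

result = differential_evolution(
    sse,
    bounds=[(1e5, 1e7),    # L: ultimate number of reported cases
            (0.01, 1.0),   # k: exponential growth rate
            (0.0, 120.0),  # t0: day of peak daily new cases
            (0.0, 100.0)], # a: linear term
    popsize=30, maxiter=200, tol=1e-10, seed=3)
```

The scatter plots of parameter value vs SSE discussed below come from inspecting the final evolved population, not just the single best member that `result` reports.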

Significantly, the subplots both have a logarithmic y-axis. Exponential growth is linear when viewed through a logarithmic lens. When you see that straight line marching steadily upward toward those massive numbers, you really want all your modeling to wind up being an embarrassing public failure.

Covering my Posterior

A better way to model this might have been to use a Monte Carlo analysis (e.g., with the Metropolis-Hastings algorithm) to obtain posterior probability distributions for the parameters, and then run a bunch of extrapolations based on parameters drawn from the distributions. But I had the tools handy for using ADE instead; I’ve been wrapping up a year-long project modeling power semiconductor devices using it with the free Ngspice simulation software. So this is what I have to offer, and it seems plenty illuminating to me.
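For anyone wanting to see what the Metropolis-Hastings alternative looks like, here is a toy random-walk sampler targeting a standard normal distribution. In real use the log-posterior would come from the model’s fit to the case data; everything here is illustrative.

```python
import numpy as np

def metropolis_hastings(log_post, x0, n_samples, step, seed=0):
    """Minimal random-walk Metropolis sampler for a 1-D parameter."""
    rng = np.random.default_rng(seed)
    x, lp = float(x0), log_post(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + rng.normal(0.0, step)  # symmetric proposal
        lp_prop = log_post(prop)
        # Accept with probability min(1, posterior ratio).
        if rng.uniform() < np.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Toy target: a standard normal posterior, log p(x) = -x^2/2 + const.
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 20000, 1.0)
```

Extrapolations drawn from such posterior samples would carry honest uncertainty, which the evolved population’s scatter only approximates.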

But even without having posterior distributions to draw random variates from, what I am seeing in the scatter plots of value vs SSE for parameter L is not reassuring. That parameter represents the total number of cases expected to ever be reported. And the data we have, with its steady logarithmic-scale march upward, is not satisfying my computer evolution algorithm that there is any upper limit before the nation’s entire population is infected.

SSE vs value: Parameter L (3/18 data)

Simply put, this thing is currently showing no signs of slowing down anytime soon. It is very possible, even likely, that these values of L are due more to genetic drift than any optimality-of-fit of the modeling they represent.11

A word of explanation of this scatter plot: The red dots hugging the left side of the plot are values (y-axis) of L in the final population of parameter combinations, plotted against the sum of squared error (x-axis) that those combinations had vs the data. The distribution of values seems to indicate that we shouldn’t hope for less than several million U.S. cases, and that we can’t count on any upper limit before the virus runs out of hosts to infect.12

There is a fair amount of correlation between the model parameter t0 and two other parameters, k and L. The parameter k represents how drastic the exponential behavior is; higher values cause things to blow up faster and thus start to reach limits sooner. Thus the highest values of k in the final population are associated with somewhat lower values of t0. The time when the number of new daily cases reaches its maximum happens a few days earlier.

Regarding the correlation between parameters t0 and L–a positive one this time–it simply makes sense: the longer new cases keep increasing before the slowdown finally begins, the more people ultimately get infected.

Reasons Why Things Might Not Be So Bad

I want to emphasize that there is also the distinct possibility of L coming down by a lot within the next couple of days. (Unfortunately, I thought it would do that a couple days ago already, but it’s done the opposite.) It could still happen for a couple of reasons I can think of:

  • A curtailing effect becoming apparent soon from containment measures that just aren’t being noticed quite yet due to the incubation period.

  • A sudden recent increase in the number of reported cases due to testing finally being available. The rate of tested vs actual may be increasing, not just the absolute number of people testing positive. This would mean that the model is currently getting fitted to an overly dire set of parameters (especially L) due more to recent dramatic increases in reported cases from better testing than exponential spread of the virus.

And there are probably many more reasons I haven’t even imagined why that curve might start bending down sooner than in this simulation. Again, I need to emphasize my lack of biological or medical expertise. And this leads to . . .

The Disclaimer

First, I disclaim everything that Johns Hopkins does when offering the data on which this analysis is based.13 I’m pretty sure their lawyers had good reason for putting that stuff in there, so I’m going to repeat it. Except think “Ed Suominen” when you are reading “The Johns Hopkins University”, and this blog post when you read “the Website.”

This GitHub repo and its contents herein, including all data, mapping, and analysis, copyright 2020 Johns Hopkins University, all rights reserved, is provided to the public strictly for educational and academic research purposes. The Website relies upon publicly available data from multiple sources, that do not always agree. The Johns Hopkins University hereby disclaims any and all representations and warranties with respect to the Website, including accuracy, fitness for use, and merchantability. Reliance on the Website for medical guidance or use of the Website in commerce is strictly prohibited.

Second, I know very little about biology, beyond a layman’s fascination with it and the way everything evolved. (Including this virus!) I do have some experience with modeling, including using my ADE Python package to develop some really cool power semiconductor simulation software that I’ll be releasing in a month or so from when I’m doing the GitHub commit with this COVID-19 example. The software (also to be free and open-source!) has a sophisticated subcircuit model for power MOSFETs that evolves 40+ parameters (an unfathomably huge search space). It uses the same principle–differential evolution of nonlinear model parameters–as this unfortunate example we find ourselves in.

The model I’m using for the number of reported cases of COVID-19 follows the logistic growth model, with a small (and not terribly significant) linear term added. It has just 4 parameters, and finding the best combination of those parameters is no problem at all for ADE.

Remember, I am not an expert in any of the actual realms of medicine, biology, etc. that we rely on for telling us what’s going on with this virus. I just know how to fit models to data, in this case a model that is well understood to apply to biological populations.

Don’t even think of relying on this analysis or the results of it for any substantive action. If you really find it that important, then investigate for yourself the math, programming, and the theory behind my use of the math for this situation. Run the code, play with it, critique it, consider how well the model does or does not apply. Consider whether the limiting part of the curve might occur more drastically or sooner, thus making this not as big a deal. Listen to experts and the very real reasoning they may have for their own projections about how bad this could get.

It’s on you. I neither can nor will take any responsibility for what you do. I will say this, though: If you haven’t been sitting at home for a week straight already, wash your hands a lot and don’t itch that nose unless you really have to and you just got done with one of those hand washings. It’s a hot zone out there already. You don’t need my fancy modeling to see that.

Finally, if this is getting you down, please think of all the people who were living and loving and looking up at the blue sky even during the fall of Rome and the Black Death. We have a front-row seat on history being made. Yes, it is a worldwide biological cataclysm not seen since the days of polio, smallpox, and the Spanish Flu.

Yes, this really sucks. But you are alive, and there is so much left to see. A world in crisis can sometimes be an exhilarating world to live in, like a sharp fresh breeze tickling your face on a clear winter’s day. Your grandparents saw cold bracing days like these, and were called the Greatest Generation for the way they responded.

To anyone in despair: Leaving the show early would be a sad waste of the seat that was reserved for you. Stick around. Do what you can to make your life a little better, and the lives of those who love you and whom you love. Allow your worries and fears and sadness to seep into the gentle awareness that an entire world now worries with you.

And there is a bit of good news to share, though it may be cold comfort for my fellow citizens in the U.S.

South Korea is fully in its containment phase, well past its t0 that took place over two weeks ago. They followed the logistic growth model all the way to the containment phase. Look at the two curves and annotated +/- numbers in the upper subplot! The lower subplot zooms in on a narrow range of case numbers around 8,000, where it is unlikely to increase much further.

Reported Covid-19 cases in South Korea vs days since Jan. 22, 2020

Italy’s numbers should start leveling off significantly in the next week. They reached t0 yesterday, according to my best fit of the logistic+linear model with this evening’s data. They appear to be headed for around 70,000-80,000 cases, or about 1% of their population. Even that doesn’t sound too bad.

Reported Covid-19 cases in Italy vs days since Jan. 22, 2020

Be well. And stay home.

Notes


  1. Ng Yi Kai Aaron pointed out an article referencing a paper (Ziff, Anna L. and Ziff, Robert M., “Fractal kinetics of COVID-19 pandemic,” preprint available online) suggesting that the data from China’s experience with the virus “are very well fit by assuming a power-law behavior with an exponent somewhat greater than two.” 

  2. See the important section entitled Disclaimer

  3. See, e.g., https://services.math.duke.edu/education/ccp/materials/diffeq/logistic/logi1.html 

  4. All these significant figures are only used for comparison purposes. It is of course silly to put more than a couple of significant digits on extrapolations this uncertain. 

  5. You may think that’s progress, but I consider it disappointing (as a human being with a pair of lungs, not as a data modeler) that the data is tracking the model’s exponential growth phase so closely, and that t0 seems to remain far in the future. 

  6. This projection remains unchanged from the one done with data from yesterday (3/18). 

  7. With yesterday’s data, I thought we would reach the five million mark a day earlier, 4/10. 

  8. Trevor Bedford, for example, a scientist at the Fred Hutchinson Cancer Research Center in Seattle “studying viruses, evolution, and immunity,” has mentioned a 10:1 true vs reported cases ratio. https://twitter.com/trvrb/status/1238643292197150720?s=20.

    “I could easily be off 2-fold in either direction,” he Tweeted on March 13, when there were just over 2,000 cases being reported in the U.S., “but my best guess is that we’re currently in the 10,000 to 40,000 range nationally.” 

  9. Those who follow me on Facebook know how much contempt I have for the incompetent, malicious, destructive asshole who found enough bigots and morons in a key combination of states to make it past the Electoral College. No, I will not mince words. If you still support Donald Trump– knowing that he dismantled the office that Obama had set up to address pandemics, that he fired people with expertise to deal with this, that he downplayed and denied the reality of the problem until just days ago–then I think there is something deeply wrong with you.

    In my previous post, I asked, “Do many of his supporters even realize how much they’ve been played?” I quoted the self-confessed narcissist Sam Vaknin, who wrote that “the narcissist abuses people. He misleads them into believing that they mean something to him, that they are special and dear to him, and that he cares about them. When they discover that it was all a sham and a charade, they are devastated” (Malignant Self-love: Narcissism Revisited, Narcissus Publications, 2015, p. 69).

    So far the deranged narcissist’s base of support has proven remarkably resilient to plain facts about how much of a sham it really is. I hope that changes very soon. 

  10. Using my free, open-source Python package ade, Asynchronous Differential Evolution.
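    The ade package runs its evolution asynchronously across CPU cores; the underlying idea it builds on can be sketched generically. This is textbook DE/rand/1/bin, not ade’s actual API, and every name here is illustrative (a full implementation would also exclude the target index i from the three donors):

    ```python
    import numpy as np

    def de_minimize(f, bounds, pop_size=20, F=0.8, CR=0.9, gens=200, seed=0):
        """Minimal differential-evolution sketch: evolve a population of
        parameter vectors, replacing a member whenever a mutated-and-
        crossed-over trial vector scores better."""
        rng = np.random.default_rng(seed)
        lo, hi = np.array(bounds, dtype=float).T
        pop = rng.uniform(lo, hi, (pop_size, len(bounds)))
        scores = np.array([f(p) for p in pop])
        for _ in range(gens):
            for i in range(pop_size):
                # Mutation: donor vector from three random members
                a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
                mutant = np.clip(a + F * (b - c), lo, hi)
                # Binomial crossover with the current member
                cross = rng.random(len(bounds)) < CR
                trial = np.where(cross, mutant, pop[i])
                # Greedy selection: keep whichever scores lower
                s = f(trial)
                if s < scores[i]:
                    pop[i], scores[i] = trial, s
        return pop[scores.argmin()], scores.min()

    # Toy usage: recover the minimum of a shifted quadratic
    best, score = de_minimize(lambda p: np.sum((p - 3.0)**2),
                              bounds=[(-10, 10)] * 2)
    ```

    In the Covid model’s case, each individual’s “digital DNA” would be the six-parameter vector (L, r, rf, t0, th, b), and f would score the model curve against the reported-case data.
    
    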

  11. Genetic drift is an evolutionary phenomenon where a population “drifts” certain bits of its genetic code toward what appears to be an optimal range when in reality it is just the survivors propagating a consensus that has no actual selection value. I’ve seen it happen with my computer evolution of simulation model parameters just like it happens in nature.

    The final population’s values of L with 3/19 data range from around 20,000,000 to more than the population of the U.S., where the logistic model would obviously run into a stark limitation. The range is not different enough to warrant an updated plot. 

  12. This scatter plot doesn’t show a real probability distribution, as a Monte Carlo analysis would. But it does seem instructive, representing a confidence interval of sorts. I’m guessing that it is no narrower than a posterior distribution obtained from a random walk with well-informed priors. On this question, however, my modeling knowledge reaches its current limits. 

  13. The GitHub repo is at https://github.com/CSSEGISandData/COVID-19.

Tuesday, August 11, 2015

Faith vs. Fact: Two Opposing Sides of the Coyne

The methodological conflicts between science and religion cannot be brokered, for faith has no reliable way to find truth. It is no more compatible for someone to be a scientist in the lab and a believer in the church than it is for someone to be a science-based physician who practices homeopathic medicine in her spare time.
Faith vs. Fact
Faith vs. Fact, fascinating folio from fellow feline fan
Book Review: Faith vs. Fact by Jerry A. Coyne. New York: Viking Penguin (2015).

I’ve read about a dozen books during this hot summer of broken weather records and burning forests, most of them relating to a scientific issue that is but should not be contentious: drastic, ongoing, and potentially devastating human-caused climate change. Three of these works stand out in my mind.

Under a Green Sky by paleontologist Peter Ward tells an engaging tale about cataclysmic extinction events while cautioning about our headlong rush into what might well be another one, caused not by volcanic activity or an asteroid but our reckless burning–in a slim century of explosive human activity–of fossilized carbon that took millions of years to accumulate. Paolo Bacigalupi makes similar warnings using fiction in The Water Knife, “a near-future thriller that casts new light on how we live today and what may be in store for us tomorrow.” (Hint: You’re screwed, especially if you live in Arizona or Nevada.)

And then there is an autographed hardback volume that especially weighed heavy in my hands as I sat sweating in the evenings among my drying trees. It’s significant to me not just because it addresses the mindset of those who deny the slow changes happening right outside their windows, but because it represents the single biggest shift in my own little life: from faith to fact. The goal of its author, evolutionary biologist and religion critic Jerry Coyne, is for people to do what came so hard for me as a Christian fundamentalist, and apparently does for millions of Americans in the thrall of our fossil-fueled Western lifestyle: “produce good reasons for what they believe–not only in religion, but in any area in which evidence can be brought to bear.”1

“Nothing less than the future of our planet is at stake” when it comes to climate-change denialism, and Dr. Coyne devotes a few pages of his book to a discussion of that.2 Despite “the nearly unanimous view of climate scientists that the earth is warming because of human-generated emissions of greenhouse gases,” a dismaying number of Americans and their congressional representatives have no interest in slowing our massive dumping of carbon into the atmosphere. To him and me both, the “ability of people to ignore inconvenient truths that conflict with their faith, whether or not the faith be religious, is astonishing.”3 Yet I had that ability myself, too, ignoring and denying all the evidence against the Laestadian Christianity that long had been the most important aspect of my life.4

That form of faith was a religious one, of course, which is almost entirely the focus of Coyne’s book rather than some secular faith in Fox News pundits and talk radio. They are not entirely disconnected: He notes a correlation between church attendance and acceptance of scientific realities about evolution, the Big Bang, the Earth’s age, and human-caused global warming.5 (You can guess which way the correlation goes; sermons are not known for encouraging scientific thinking.)

Faith vs. Fact is a personal book to me for a couple more reasons that are worth mentioning before (finally!) proceeding into a detailed review of it. The odd little sect in which I was raised gets mentioned: “Laestadianism, a conservative branch of Lutheranism, considers itself the only true faith: only its roughly 60,000 adherents are eligible for salvation, with the billions of others on earth doomed to eternal torment.” Not at all inaccurate, but possibly not the way Laestadianism would like to be introduced to thousands of people.6

Laestadianism gets some exposure (Faith vs. Fact, p. 84)

And it was a real thrill to see my name listed alongside various personal heroes of mine–Dan Barker, Richard Dawkins, Peter Boghossian, Sean Carroll, Dan Dennett, Sam Harris, John Loftus, the late Victor Stenger–when Dr. Coyne thanked some “diverse friends and colleagues” for help and encouragement on his acknowledgements page. After he read and offered comments on a book of my own, after some enjoyable correspondence, and after a warm conversation about cats and atheists (not unrelated topics, really) at a conference where we finally met, I would be honored to call Dr. Coyne a friend.

So, full disclosure, an unbiased reviewer of this book I am not. But let’s go ahead and take a deeper look.

Competitors for Truth

“Science and religion,” writes Coyne in his Preface to the book, “are competitors in the business of finding out what is true about our universe.”7 This pretty much summarizes his thinking on the topic, and he makes it abundantly clear which side he judges to be the winner.

All the revelations in all the world’s scriptures have never told us that a molecule of benzene has six carbon atoms arranged in a ring, or that the Earth is 4.5 billion years old. It is this asymmetry of knowledge that, despite religion’s truth claims, makes its adherents embrace the fallacious claim that religion and science occupy separate magisteria.8

That NOMA (non-overlapping magisteria) claim was advanced by Stephen Jay Gould in hopes that religion and science could get along somehow. Coyne devotes several pages to dismantling Gould’s idea of a “potential harmony through difference of science and religion, both properly conceived and limited.” The problem, Coyne says, is that one word properly, because “real religion is frequently and stubbornly improper.” Religion tends to trespass on the boundaries of science, even though it rarely happens the other way around: The “vast majority of scientists are happy to pursue their calling as an entirely naturalistic enterprise.”9 This “reliance on naturalism” is

not an assumption decided in advance, but a result of experience–the experience of men like Darwin and Laplace who found that the only way forward was to posit natural rather than supernatural explanations. Because of this success, and the recurrent failure of supernaturalism to explain anything about the universe, naturalism is now taken for granted as the guiding principle of science.10

As a scientist (or an engineer, to add my own experience into the mix), you don’t gaze upward for answers when you’re working in the lab, except maybe if a buzzing light fixture is generating electromagnetic interference. Coyne offers the amusing yet powerful example of someone who spends their life looking in vain for the Loch Ness monster. After all that effort, “stalking the lake with a camera, sounding it with sonar, and sending submersibles into its depths,” they find nothing. Which is more sensible at that point, he asks,

to conclude provisionally that the monster simply isn’t there, or to throw up your hands and say, “It might be there; I’m not sure”? Most people would give the first response–unless they’re talking about God.11

The reason, of course, is that there is so much at stake–an eternity of reward or punishment, one’s entire social network–when it comes to talking about God. I remember consciously denying myself the mental luxury of even allowing for the possibility of His absence. What a delicious relief it was when I finally could!

One scientist who has taken Coyne’s difficult but honest first option, after 20 years of investigation, is Dr. Susan Blackmore. “At some point something snapped,” she writes in a 2010 essay. “Instead of struggling to fit my chance results into yet another doomed theory of the paranormal, I faced up to the awful possibility that I might have been wrong from the start–that perhaps there were no paranormal phenomena at all. I had to change my mind.”12 It’s an inspiring story, and I find Blackmore’s absence in this section of the book a bit unfortunate, a lost opportunity to point out that it can be done by a principled thinker.

Can’t We All Just Get Along?

Coyne has little patience for NOMA, for the efforts by theologians and science popularizers alike to avoid the appearance that a competition even exists between science and religion. Simple self-preservation makes it attractive for the liberal religious, while a strategic desire “to avoid alienating religious people” motivates scientific organizations.13 For their separate reasons, they all want to let religion save face by granting it some invisible sphere of truth outside the world of observation and explanation. That he terms “accommodationism,” a harmful “weakening of our organs of reason by promoting useless methods of finding truth.”14

Accommodationist cat: Will trade much-tabbed book for tummy rub

Some science-savvy theologians claim that their sophisticated forms of faith offer “other ways of knowing” what science hasn’t yet explained. There are indeed plenty of those questions; one that Coyne mentions is why the speed of light is constant in a vacuum. Fine, he says: Provide some concrete faith-based answers, and “tell us not only what those answers are, but how they would convince either nonbelievers or members of other faiths. And let those ‘other ways of knowing’ make predictions in the same way that science does.”

But of course they don’t, and can’t. He offers a parallel to the challenge Christopher Hitchens made to believers for an example of ethical behavior only they could perform. The Coyne challenge is this: “[G]ive me a single verified fact about reality that came from Scripture or Revelation alone and then was confirmed only later by science or empirical observation.”15 Neither challenge has ever had a credible response.16

It’s not just that religions are incompatible with science, Coyne says. Unlike science, whose many different disciplines “share a core methodology based on doubt, replication, reason, and observation,”17 religion is splintered into countless varieties that are incompatible with each other. Yet “this incompatibility wasn’t inevitable: if the particulars of belief and dogma were somehow bestowed on humans by a god, there’s no obvious reason why there should be more than one brand of faith.”18

This argument resonates with me for a reason Coyne probably never thought of when he made it: patent law. I’ve obtained over a dozen patents, for commercially successful technology. What those pieces of paper give you is the right to exclude others from making and using what you’ve invented, a right that you can then license and sell to others, or exercise yourself to avoid competition during the 20-year patent term.19 Now, an omnipotent God has the ultimate patent. He could just squash everything but the One True Religion that he supposedly invented, and that would be that. But that doesn’t happen, because there is no such patent holder.

Something else I’ve done is to spend an embarrassing number of hours studying and writing about those “particulars of belief and dogma” in all their hair-splitting details–not just between Protestantism and Catholicism, not just between different forms of Lutheranism, but between different forms of Laestadian Lutheranism. So I offer a hearty secular Amen to another excellent point Coyne makes along those lines: “Given that most religious people acquire their faith through accidents of birth, and those faiths are conflicting, it’s very likely that the tenets of a randomly specified religion are wrong. How can you tell if yours is right?”20

Uh, because the guys in suits who are telling you that it is too right are really, really sure of it–because their fathers in suits who told them about it were, too? Never mind those other guys at the “heretic” church one town over, who are telling a story whose differences are slight but of incomprehensible importance, and who have no less basis for making their own claims. Yeah, right.

At this point in my review, and in my life, I have the blessed freedom to offer the real answer to that dilemma, for those uncomfortable pew-sitters reading this who are suffering through the churnings of doubt: Revelation without observation is bullshit. A more refined and civilized statement, perhaps, is Coyne’s summary of his claims about the co-existence of religion and science. But it is no less direct. The two

are incompatible because they have different methods for getting knowledge about reality, have different ways of assessing the reliability of that knowledge, and, in the end, arrive at conflicting conclusions about the universe. “Knowledge” acquired by religion is at odds not only with scientific knowledge, but also with knowledge professed by other religions. In the end, religion’s methods, unlike those of science, are useless for understanding reality.21

Come on, now, Jerry. Stop being all nice and diplomatic and vague, and tell us what you really think!

The Chimpanzee in the Room

For most everyone in the United States and probably many other places around the world, mentioning science and religion together will evoke a third topic: evolution. “While not the only scientific theory that contradicts scripture,” Coyne observes, “evolution has implications, involving materialism, human exceptionalism, and morality, that are distressing to many believers.”22 But, as I observed in my first book after confronting those issues, then still a troubled believer of sorts in theism if no longer my childhood fundamentalism, theological imperative does not equal truth.23

The truth about evolution is simply undeniable to any reasonably informed and thoughtful individual. As Coyne (who has spent decades working directly in the field) notes, “it is supported by mountains of scientific data–at least as much data as support the uncontroversial ‘germ theory’ that infectious diseases are caused by microorganisms.”24 Indeed, we see the deadly results of evolution in action, right before our eyes, whenever new generations of those microorganisms acquire new resistances to our dwindling stocks of effective antibiotics.

And yet denial persists, to an astounding degree. Coyne summarizes the results of a 2014 Gallup poll: “fully 42%” of Americans polled “were straight biblical young-Earth creationists, agreeing that humans were created in our present form within the last ten thousand years.” Fewer than one in five “accepted evolution the way biologists do, as a naturalistic, unguided process.” The reason is not a lack of evidence, which is simply overwhelming–countless thousands of published findings from numerous scientific disciplines. Nor is it a lack of opportunity for people to learn about that evidence; Coyne notes that “we live in an age of unprecedented science popularization.”25 Indeed, he has been one of the forces behind that with his own book, deservedly a best-seller, Why Evolution is True.

This is not about the evidence. It is about a fearful, irrational denial of reality by those who cannot afford to deviate from the party line of their precious religions. In the concluding pages of Evolving out of Eden, Dr. Robert M. Price and I reflected on the mindset of the Christian fundamentalist, a place I myself had still been uncomfortably occupying not long earlier. Things get difficult for him, we wrote,

if he peers outside the safety of church society and “healthy” reading materials to glean some awareness of the many other theological problems lurking in the tall grass of science. He may recognize himself (and Jesus!) as an evolved primate, and Original Sin as an absurd doctrine built on unscientific sand. The very rationale of the atonement collapses, along with all those “sins” his pastor carries on about, which come to look like natural, even healthy traits that allowed his ancestors to replicate and eventually produce him. The God of all Creation he once praised while musing over every tree and sunset goes quiet and cold, fading into an impersonal set of laws and forces that forms life out of randomness shaped by countless acts of suffering and death.

It should be no surprise to see so many Eden dwellers turn away from all this and scurry back to retrenchment and denial, the burden of intellectual dishonesty and cognitive dissonance still lighter than the terrifying alternative. The only other options are to water down one’s faith with accommodationism, which brings its own dishonesty and dissonance, or abandon it altogether. But science has set forth the flaming sword, and the Garden cannot remain occupied for long.26

Coyne provides some useful discussion of the theological dangers in that tall grass, too, including a crystal-clear falsification of the whole Adam and Eve idea (pp. 126-27), experimental demonstrations “that no external force seems to be producing mutations in an adaptively useful way” (p. 138), and a thorough debunking of the “fine-tuning” argument (pp. 160-66). Faith vs. Fact is not a book limited or even really focused on the theological problems posed by evolutionary reality, but it certainly gives the reader a flavor of what is keeping those poll numbers so high, one decade after the next, while the science marches on.

Facing Facts

“The vast majority of believers don’t want their faith examined skeptically,” Coyne observes in his concluding chapter about why this all matters. Nor “do they honestly examine other faiths to find why they see their own as true and those others as false.” What religion does, instead, is to defend “its claims by turning them into a watertight edifice immune to refutation.” The preachers and imams and their faithful listeners aren’t really interested in what is true; if they were, they would acknowledge that what they are currently thinking might not be. But that is a step they do not and cannot take, despite Coyne’s eminently sensible proposition that it is “better to find out how the world really works instead of making up stories about it, or accepting stories concocted centuries ago.”

I am no longer so concerned about religion as I used to be, and I hope for the same world that Jerry Coyne wants: “one in which the strength of one’s beliefs about matters of fact is proportional to the evidence . . . where it is okay to reserve judgment if one doesn’t know the answer, and where it’s not seen as offensive to doubt the claims of others.”27 I want that world, too, and I try to live my life as if it has already arrived.

But our culture is pervaded with irrationality and stubborn beliefs in what is palpably not true, and that has a way of creeping into one’s life regardless. It is not just felt in the aftershocks of religion rejected–the loss of a social network, the worries about superstitions being taught to children, the difficulties experienced by loved ones still inside the church walls. It also manifests in outbreaks of measles caused by vaccine deniers, in the disparaging and defunding of our educational system by an uninterested and even hostile public, and in what has concerned me most during this summer of heat and drought and smoke: climate change whose human causes and even whose very presence so many are still denying.

Not something I want to lose. [Flickr page]

“Doctrines may be a frightful burden,” William Catton wrote a generation ago in Overshoot: The Ecological Basis of Revolutionary Change, another of the worthy books I’ve read during these past months. For, “with the prestige of antiquity and tradition, they deprive the living generation of an open-minded capacity to face facts.”28 It is a piece of the same puzzle that Coyne describes, just focused on a different form of faith–in limitless growth without consequence.

To avoid despairing of our ongoing ecological disaster, we have constructed ourselves a giant cargo cult, in which our modern “faith in science and technology as infallible solvers of any conceivable problem can be, in a post-exuberant world, just as superstitious” as that of the Melanesians who constructed runways in anticipation of John Frum’s return with piles of loot. Catton describes this in a chapter of his 1982 work whose title is eerily close to Coyne’s: “Faith versus Fact.” He writes that the “modern Cargoist who expects to be bailed out of this year’s ecological predicament by next year’s technological breakthrough holds similar beliefs because of his inadequate knowledge of ecology and of technology’s role in it. Both Cargoist faiths rest upon the quicksand of fundamental ignorance lubricated by superficial knowledge.”29

This is not a faith from which I can just walk away, as I did with Christian fundamentalism, difficult as that was. So I do my empty penances (Catton: “We may come to feel guilty about stealing from the future, but we will continue to do it”) and look outside the window, air conditioner running, at my big trees that have lived through a hundred summers. They may not survive many more as hot and dry as the one that is burning the American West right now. And I find myself wishing for a sanctuary in which I might sing, to keep those facts away. But I know better, and this is the way I will always live, with a mind clear and free, still with more joy than sorrow just the same.

———
See Jerry Coyne’s book page for more information about Faith vs. Fact, a highly recommended read. If you are wrestling with doubts about a religion that you’re not sure is true anymore, and science has any part in that struggle, give yourself a few days with this work. Reality can be difficult, but the pain of trying to deny it when you know better is far worse.
My thanks to Jerry for his nice write-up of this review.

Notes


  1. Faith vs. Fact, p. xxii. 

  2. p. 246. 

  3. p. 245. 

  4. See my first book, An Examination of the Pearl. 

  5. Faith vs. Fact, p. 245. 

  6. The situation is actually worse than Dr. Coyne may realize. You also have to be the right kind of Laestadian to be saved, a faithful member of the correct one of at least five different splinter groups who all make their own extreme exclusivity claims. 

  7. p. xvi. 

  8. pp. 195-96. 

  9. p. 108. 

  10. p. 92. 

  11. p. 95. 

  12. “Why I Had to Change My Mind.” In Richard Gross, Psychology: The Science of Mind and Behaviour, 6th ed. (London: Hodder Education), pp. 86-87. The quote is from a draft version available online at susanblackmore.co.uk/Chapters/Gross2010.htm.

  13. p. 93. 

  14. p. xxi. 

  15. p. 91. Back when I was a faithful Bible believer, I would have responded to the challenge with Jesus’ examples of saved and unsaved people at the moment of his second coming: “I tell you, on that night there will be two in one bed; one will be taken and the other will be left. There will be two women grinding at the same place; one will be taken and the other will be left. Two men will be in the field; one will be taken and the other will be left” (Luke 17:34-36, NASB). They didn’t realize the earth was round back when that was written, I used to think, so how would a human author know to use workday examples along with a nocturnal one? But, alas, that last part about men working in the field wasn’t in the original text, and the odds of such an accidental “revelation” never occurring in thousands of lines of Scripture are very low indeed. 

  16. pp. 227-28. 

  17. p. 86. 

  18. p. 85. 

  19. The twenty-year term begins on the day you file the patent application, although there are no enforceable rights until claims appear in an issued patent. Some limited term extensions are possible due to certain administrative delays in getting the patent grant, but overall, patents differ from the Mickey-mouse charade of perpetual legislative updates to copyright terms in that patented ideas do usefully pass to the public good. I expect to see Walt’s precious mouse in the public domain when he can skate over frozen hellfire, perhaps to the tune of “Let It Go.” 

  20. p. 85. 

  21. p. 64. 

  22. p. 59. 

  23. An Examination of the Pearl (2012), §4.3.1: “But theological imperative does not equal truth. It couldn’t do so even when the Church had the rack and the stake at its disposal. The facts just sit there, mute, uncaring about how vehemently people deny their existence. . . . The only alternative to accepting the overwhelming evidence of man’s non-Adamic, evolutionary origins is to say that the evidence is false and was planted by God in fossils, vestigial body parts, patterns of speciation, ongoing and directly observed evolutionary changes, and a newly discovered treasure trove of information in our own DNA that matches up remarkably with all the observations that had been made beforehand. There is absolutely nothing contradicting that evidence except some ancient Hebrew writings (which themselves contradict each other) and the mountain of theology that has piled up on top of those writings over the centuries.” 

  24. Faith vs. Fact, p. 59. 

  25. p. 60. 

  26. Robert M. Price and Edwin A. Suominen, Evolving out of Eden. Valley, WA: Tellectual Press (2013), p. 311. 

  27. Faith vs. Fact, p. 260. 

  28. William R. Catton, Overshoot: The Ecological Basis of Revolutionary Change. Urbana and Chicago: University of Illinois Press (1982), Ch. 5, “The End of Exuberance.” Citing an 1896 essay by sociologist William Graham Sumner. 

  29. Catton, Ch. 11, “Faith versus Fact.”