Introduction
Our first model was the exponential growth model. It failed spectacularly in two respects. First, everyone became infected. Second, more people became infected than there are people. Here I want to look at a very simple model that limits the number of infections to the actual number of people. Unfortunately, in this model, everyone will still become infected. Although this model is not particularly useful on its own, the ideas behind it form the basis for many useful models.
The SI model
In this model there are two types of individuals, the susceptible (S), and the infectious (I). Once an individual becomes infected (gets the disease) they are immediately infectious (can pass it on). There is no cure in this model, so once an individual becomes infectious, they remain infectious. But the infection is not lethal, so no one dies. These conditions make the SI model simple and easily implemented. Some of the future models will build on the structure of this model. Future developments will allow people to recover and not remain infectious. A period of time between becoming infected and being infectious (the latent period) can be added.
The key concept
The model assumes the infection spreads when an infected individual encounters an individual who is susceptible to being infected. Someone who is not infected, but can be infected (is not immune), is called susceptible. Not all encounters result in infection. Unless the two get physically close enough, for long enough, it is unlikely that the infection will pass to the susceptible individual. So we only count the encounters that have the possibility of passing on the infection. Let the total number of people who are susceptible as a function of time be S(t). Let the number of infected people as a function of time be denoted I(t). We measure time in days.
An average, typical, infected individual makes contact with other people during the day. Let’s assume that λ of these contacts (each day) are capable of passing on the infection. Here λ does not need to be an integer; it represents an average for the population. But the infection can only be passed on if the other person is susceptible. The actual number of infections that one person generates is reduced when the number of susceptible people starts declining. Sometimes the encounter is with a susceptible individual, and other times the encounter is with another infectious individual. But together the total number of encounters is λ.
Let N be the total number of people in the population. Then the effective rate of infection λeff by a single individual becomes
$$\lambda_{\text{eff}} = \lambda \frac{S(t)}{N}.$$
Here $S(t)/N$ is the fraction of the people (on average) our infectious person encounters who are capable of receiving the disease. Therefore the number of people infected per day by each infected individual becomes $\lambda S(t)/N$.
Since there are I(t) infected individuals, each of which is infecting $\lambda S(t)/N$ people per day, the rate of new infections is just $\lambda S(t) I(t)/N$. Therefore we can write a differential equation for the function I(t).
$$\frac{dI}{dt} = \lambda \frac{S(t)}{N} I(t) \qquad (1)$$
In this model the total population consists of infected and non-infected individuals. (Remember no one dies). Therefore
N = S(t) + I(t) or
S(t) = N - I(t).
Substituting this into equation 1 we find
$$\frac{dI}{dt} = \lambda \frac{N - I(t)}{N} I(t) \qquad (2)$$
This can be written in a slightly more transparent way, by isolating the N.
$$\frac{dI}{dt} = \lambda I(t) \left(1 - \frac{I(t)}{N}\right) \qquad (3)$$
Here λ, the effective contacts per day, is a constant that simply controls the rate at which things proceed. N is the total population. When I(t) becomes equal to N, there are no more susceptible people, and the rate of infection drops to zero. To solve this we also need an initial condition, the number of infectious people at time zero.
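The rate equation can also be stepped forward numerically, without knowing the exact solution. Here is a minimal sketch, assuming the equation has the form dI/dt = λ I (1 − I/N) described above. The function names, the small population of 1,000, and the step size are illustrative choices made for this sketch; it is not the implementation linked at the end of this page.

```python
import math


def si_rate(i, lam, n):
    """Assumed SI rate equation: dI/dt = lam * I * (1 - I/N)."""
    return lam * i * (1.0 - i / n)


def integrate_si(i0, lam, n, days, steps_per_day=100):
    """Integrate the rate equation with classical fourth-order Runge-Kutta.

    Returns a list with the number of infected at the end of each day;
    index 0 holds the initial condition.
    """
    h = 1.0 / steps_per_day  # step size in days
    i = float(i0)
    daily = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = si_rate(i, lam, n)
            k2 = si_rate(i + 0.5 * h * k1, lam, n)
            k3 = si_rate(i + 0.5 * h * k2, lam, n)
            k4 = si_rate(i + h * k3, lam, n)
            i += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        daily.append(i)
    return daily


if __name__ == "__main__":
    # Illustrative values only: 1,000 people, one initial infection,
    # and lam = ln 2 so the early growth doubles once per day.
    daily = integrate_si(i0=1, lam=math.log(2), n=1000, days=30)
    for day in (0, 5, 10, 15, 30):
        print(day, round(daily[day], 1))
```

The printed values should rise roughly like powers of 2 at first and then level off just below the population size, which is the behaviour described above: the rate of infection drops to zero as I(t) approaches N.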
The Solution.
Let’s assume that the number of infectious individuals at t = 0 is $I_0$. This is an example of the logistic equation, which has a well-known solution. In this case:
$$I(t) = \frac{N}{1 + \left(\frac{N}{I_0} - 1\right) e^{-\lambda t}} \qquad (4)$$
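As a quick check, the standard closed-form solution of the logistic equation can be written as a short Python function and tested against its own requirements. This is a sketch that assumes the solution in the form I(t) = N / (1 + (N/I0 − 1) e^(−λt)); the function name and the test values are illustrative.

```python
import math


def logistic_infected(t, i0, lam, n):
    """Closed-form logistic solution, assumed in the form
    I(t) = N / (1 + (N/I0 - 1) * exp(-lam * t))."""
    return n / (1.0 + (n / i0 - 1.0) * math.exp(-lam * t))


if __name__ == "__main__":
    i0, lam, n = 5.0, 0.3, 1000.0  # illustrative values only

    # 1. The initial condition is reproduced at t = 0.
    print(logistic_infected(0.0, i0, lam, n))

    # 2. The curve saturates at the total population for large t.
    print(logistic_infected(200.0, i0, lam, n))

    # 3. A centred finite difference of I(t) matches lam * I * (1 - I/N).
    t, dt = 12.0, 1e-5
    slope = (logistic_infected(t + dt, i0, lam, n)
             - logistic_infected(t - dt, i0, lam, n)) / (2.0 * dt)
    i_now = logistic_infected(t, i0, lam, n)
    print(slope, lam * i_now * (1.0 - i_now / n))
```

The last two printed numbers should agree closely, which confirms that the function satisfies the rate equation and not just the boundary behaviour.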
Sample Solution
The plot below is an example for a population of 7.7 billion (roughly the current world population). The logistic curve uses λ = ln 2, the natural logarithm of 2. This is chosen so that in the early stages the number of infections doubles with every unit of time, matching the exponential, which is also shown on the chart (see part 1). Both the exponential and the logistic curves start off with one infection.
The exponential and the logistic curves agree until about the 26th doubling (t = 26), where they differ by about 1%. At the 31st doubling they differ by 22%: the logistic curve is slowing down. By the 33rd doubling the exponential is predicting more infections than there are people, while the logistic equation has about 53% of the people infected.
By the 46th “doubling” the logistic equation predicts that the number of susceptible people has dropped to under a million, or about 0.01% of the total population. The problem of infections rising above the total population has been solved, but the model has only delayed the inevitable by a few doublings.
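The numbers quoted above can be reproduced with a few lines of Python. This sketch uses the values stated in the text (N = 7.7 billion, λ = ln 2, one initial infection) and assumes the standard closed-form logistic solution; the function and key names are illustrative.

```python
import math

N = 7.7e9          # total population used in the text
LAM = math.log(2)  # one doubling per unit of time in the early stages


def exponential(t, i0=1.0, lam=LAM):
    """Unbounded exponential growth from part 1."""
    return i0 * math.exp(lam * t)


def logistic(t, i0=1.0, lam=LAM, n=N):
    """Standard closed-form logistic solution (assumed form)."""
    return n / (1.0 + (n / i0 - 1.0) * math.exp(-lam * t))


def compare(t):
    """Summarise how the two curves differ at time t."""
    e, l = exponential(t), logistic(t)
    return {
        "exponential": e,
        "logistic": l,
        "shortfall": 1.0 - l / e,    # how far the logistic lags the exponential
        "fraction_infected": l / N,  # logistic infections as a share of N
        "susceptible": N - l,        # people not yet infected
    }


if __name__ == "__main__":
    for t in (26, 31, 33, 46):
        row = compare(t)
        print(t,
              f"shortfall={row['shortfall']:.2%}",
              f"infected={row['fraction_infected']:.2%}",
              f"susceptible={row['susceptible']:.3g}")
```

Here the "shortfall" is measured relative to the exponential curve, which is the convention that matches the percentages quoted above.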
A real case study
As I write this we are in the middle of the COVID-19 pandemic. In its situation reports on the pandemic, the World Health Organization (WHO) has released data for the Chinese province of Hubei, which contains the city of Wuhan. (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports ) Since Hubei has gone through a cycle of growth and then decline of COVID-19, it makes a good, relevant case study.
Below is a plot of the total number of cases (infections) for days 6-16 (WHO Situation Reports 6 through 16) in Hubei. Also shown is a fitted exponential curve. The coefficient in the exponent (our λ) is 0.2137 per day, and the number of infected people on day 0 (actually day 5, where we are beginning our analysis) is 1,619 people. The only other number we need to fit the logistic curve is the population of Hubei (59.17 million people).
The plot for Hubei has to be shown on a semi-logarithmic plot; otherwise the Hubei data would be indistinguishable from the horizontal axis! Notice that equal distances on the vertical axis are now powers of 10. The Hubei data are the open circles. In the middle of this curve China changed the definition of a confirmed case. This happened between days 26 and 27 on the above plot. The sudden jump and associated shift in the curve was removed by subtracting 17,962 cases from the subsequent data. Both the exponential (red) and the logistic curve (black) track the early days pretty well, but they don’t saturate correctly. The exponential curve of course doesn’t saturate at all (see part 1)! The logistic curve saturates at the total population of Hubei, which is still a bit off the plot.
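One common way to fit an exponential like this is a straight-line least-squares fit to the logarithm of the case counts. The sketch below shows that technique; it is not necessarily how the fit quoted above was produced. The WHO case counts are not reproduced here, so the example uses synthetic stand-in data generated from the quoted fit values (1,619 and 0.2137), and the function name is illustrative.

```python
import math


def fit_exponential(days, cases):
    """Fit cases = i0 * exp(lam * day) by least squares on ln(cases).

    Returns (i0, lam).
    """
    logs = [math.log(c) for c in cases]
    count = len(days)
    mean_t = sum(days) / count
    mean_y = sum(logs) / count
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    lam = sxy / sxx                       # slope of the line
    i0 = math.exp(mean_y - lam * mean_t)  # intercept, back on a linear scale
    return i0, lam


# SYNTHETIC stand-in data, NOT the WHO counts: points generated from the
# fit values quoted in the text, for days 1..11 after the analysis start.
days = list(range(1, 12))
cases = [1619 * math.exp(0.2137 * t) for t in days]

i0_fit, lam_fit = fit_exponential(days, cases)
doubling_time = math.log(2) / lam_fit  # days for the case count to double

if __name__ == "__main__":
    print(i0_fit, lam_fit, doubling_time)
```

With real data the points will scatter around the line, so the recovered values will not match any single observation exactly. The doubling time ln 2 / λ is a convenient way to read the fitted coefficient.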
The goal of epidemiologists and public health officials is to prevent the entire population from getting the disease. China clearly achieved this goal. So we still have some work to do on our models!
Trying to save the logistic curve.
Before we give up on the logistic curve altogether, let’s try something on an ad hoc basis. Perhaps this will provide a little motivation for part III. No matter how we change λ, the logistic curve will always result in everyone being infected. What λ changes is how long a doubling takes in the early stages. It is N that controls where the logistic curve saturates. Rather than use the total population as N, let’s use the final number of cases. This is worthless as a model, since the final number of cases is something we would like the model to show us, not something we want to have to put into the model. However, this will provide some insight into part III. Here is the fit.
The plot above has a linear vertical axis. The fit to the data is reasonable. The logistic curve seems to show a slightly slower rise and a faster leveling off. What might this tell us about the next modification to the model?
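The separate roles of λ and N can be checked numerically. The sketch below assumes the standard closed-form logistic solution and uses the Hubei values quoted earlier (1,619 initial cases, 59.17 million people). The cap of 50,000 is a hypothetical placeholder chosen only for illustration; it is not the final case count and is not taken from the WHO data.

```python
import math


def logistic(t, i0, lam, n):
    """Standard closed-form logistic solution (assumed form)."""
    return n / (1.0 + (n / i0 - 1.0) * math.exp(-lam * t))


def time_to_fraction(frac, i0, lam, n):
    """Time at which I(t) reaches frac * n, found by inverting logistic()."""
    return math.log((n / i0 - 1.0) * frac / (1.0 - frac)) / lam


I0 = 1619.0
POPULATION = 59.17e6
PLACEHOLDER_CAP = 50_000.0  # hypothetical value, for illustration only

if __name__ == "__main__":
    # Changing lam moves the curve in time but not its plateau.
    for lam in (0.1, 0.2137, 0.4):
        half = time_to_fraction(0.5, I0, lam, POPULATION)
        plateau = logistic(1000.0, I0, lam, POPULATION)
        print(f"lam={lam}: half the population at day {half:.1f}, "
              f"plateau {plateau:.3g}")

    # Changing N moves the plateau itself.
    print(logistic(1000.0, I0, 0.2137, PLACEHOLDER_CAP))
```

Whatever value of λ is used, the plateau stays at N. Only replacing N with a smaller number brings the saturation level down, which is why the ad hoc fit has to be told the final number of cases in advance.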
So what is missing?
When someone gets sick they either recover or die. If they die, they can no longer infect people. In some diseases, after recovery, people remain susceptible to getting sick a second time. One example is the common cold. In other diseases, after recovery, people become immune, at least for some time period. Thus there are two possible modifications to this model. In one, people recover and rejoin the susceptible pool (Susceptible Infectious Susceptible, or SIS models). In the other, they exhibit immunity, at least during the time period of the model (Susceptible Infectious Recovered, or SIR models). We will consider the SIS model first, and then turn our attention to SIR models.
Implementation Possibilities
For an implementation in Python of the solution of the ordinary differential equation, see this page: https://davidalarrabee.com/?page_id=866
For an implementation in Python using a Monte Carlo approach, see this page (under development). In this approach individual people are simulated with a random number generator.
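To give a flavour of the random-number approach, here is a minimal sketch of a stochastic SI simulation. It is not the implementation referred to above. It assumes one-day time steps and that each susceptible person is infected on a given day with probability 1 − exp(−λ I/N); the function name, the population of 1,000, and the seed are illustrative.

```python
import math
import random


def simulate_si(n, i0, lam, days, seed=0):
    """Simulate the SI model one person and one day at a time.

    Returns the number of infected people at the end of each day;
    index 0 holds the initial condition.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    infected = i0
    history = [infected]
    for _ in range(days):
        susceptible = n - infected
        # Assumption: a susceptible person escapes infection today with
        # probability exp(-lam * I / N). When lam * I / N is small, the
        # expected number of new cases is close to lam * S * I / N.
        p_infect = 1.0 - math.exp(-lam * infected / n)
        new_cases = sum(1 for _ in range(susceptible)
                        if rng.random() < p_infect)
        infected += new_cases
        history.append(infected)
    return history


if __name__ == "__main__":
    history = simulate_si(n=1000, i0=1, lam=math.log(2), days=60, seed=1)
    print(history[::5])
```

Because there is no recovery in this model, the count can only go up, and it can never exceed the population. Different seeds give different early histories, since the first few infections are a matter of chance.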