Introduction
So far we have used models that are described by differential equations. At the heart of an epidemic or pandemic is the spread of the pathogen from one individual to another. Predicting if Uncle John gets infected from doing the errands around town is virtually impossible. Put one might be able to estimate the probability that Uncle John gets infected. This suggests some sort of random number modelling of the system.
Monte Carlo is a famous casino which has lent its name to a technique for using random trials to model complex systems. The basic idea is simple. Start out with a large number of individuals that are susceptible, but not yet infected. Have another group of individuals that are infectious. Use a the probability of getting infected and a random number generator to infect the susceptible individuals over time.
The details
The major input parameters to the model need to be determined:
numpeople = is the number of people in the simulation run. Since this disease isn’t deadly, this number doesn’t change (we are neglecting births and deaths from other causes).
numinfected = This is the number of people that are infected at the start of the simulation. Since we are dealing with random numbers, too low a value will make the statistical noise dominate the startup of the epidemic.
numtimesteps = is the number of iterations of the simulation. It wouldn’t be hard to change the code to an end time.
dt = the size of a timestep in days.
plambda = is the lambda in the differential equation. It is the number of contacts a day that an infected person makes that have the possibility of being infectious.
After checking the data we create a list of persons where each entry is a single character corresponding to their status. Each individual is either an S (susceptible) or an I (Infectious). In this model once you become infectious, you remain infectious.
After some housekeeping (opening an output file, printing headers, etc.) we enter the main loop.
- Determine the current number of infectious and Susceptible people
- Determine the probability of a susceptible individual getting infected in the next time interval. This is easy if the time interval is a day. If the infection rate is high enough, it might be better to use a fraction of a day as a time-step. Two approached are discussed below.
- Go though the list of susceptible people. For each susceptible person generate a random number between zero and one. If the number is less than the probability in 1, mark that person infected.
- Compute the theoretical value
- recompute the number of susceptible and infected persons as a result of this time step.
- compare the simulation and the theoretical results
- print it out to the screen and the file
- repeat the process starting at one until we have used up the iterations. (Yes this is the function of the for loop before 1….)
Time step control
In the differential equation the probability of a susceptible becoming infected changes continuously, even during a single day. In this simulation the probability is fixed during the time step using values determined at the start of the time-step. As the time-step goes to zero the Monte Carlo results should approach the differential equation approach (spoiler it does). The probability of a susceptible person being infected in a time interval gets smaller as gets smaller. How important is this?
If the disease is progressing rapidly then the time interval might want to be a fraction of a day. To put it another way the change in the number of infected people should be a small fraction of the current number of infected people.
or
since every one gets infected and the number of susceptable people starts off close to N we have that
so for a rapidly advancing disease the time interval might need to be smaller than a day. A python code follows and you can try this out for yourself.
In the Monte Carlo code we need a way of generating a probability for infection for a time interval and parameter . Here are two strategies. They both agree when is very small or equal to 1, but differ at moderate values. If the probability parameter is
a) multiply by This assumes that we can ignore the change in I(t) over a day and that the that the probabilities in the sub interval simply “adds up” to the total probability. It works … but .. if you know something about probabilities you might recognize that this really doesn’t work.
b) consider a group of A people going though a sequence of time-steps. We will assume that at each time-step the probability of getting infected is p. After the first time-step the number of people that are infected is Ap, but the number of people that are not infected is A(1-p). After n time intervals the number of people that remain uninfected is Since the original number was A, the number that were infected in those time intervals is just or . But the number of intervals during a day is simply and the probability in the subinterval is related to the probability during a full day by
Solving this for p in terms of yields
p becomes the probability during a time-step of size when the initial full day probability is . In the software below the statements for both options are present, with the simple option a commented out.
Other issues
Another difference is “noise” that is generated by using random numbers. You can see this by running the software with different random number “seeds.” Different seeds generate different pseudo random number sequences. This allows the person evaluating the computer runs to generate multiple runs with different seeds to see the size of the random number noise. If the initial number of infected persons is low, then this noise becomes quite evident at the the start of the simulation. This is one reason the code as supplied has a very large population and a large number of initially infected individuals.
In the real world the number of people being infected at 3 am in the morning is probably lower than those being infected at noon. The number of people being infected during Wednesday may be different than the number of people being infected on Saturday or Sunday. Peoples lives have rhythms that are not reflected in either the differential equations or in the Monte Carlo calculations. So I would be hesitant to say one method is preferable to the other. Models don’t include everything. The modeler needs to judge what can be left out without significant influence on the questions the modeler is trying to answer.
We will examine a third option when we consider the next model (SIS) that will build off of the Monte Carlo approach.
Python Implementation
Here is a sample Python code that implements the Monte Carlo technique. I have added code to drop the data to a comma delimited file. This allows the user to load the data into Excel or other spreadsheet program. Hopefully the comments in the code are sufficient for you too follow the code. I have avoided the use of lots of special procedures to create a modern looking piece of code. For the small code it seemed preferable to make it as legible as possible with a simple start at the top go to the bottom structure.
import math
# import the random number generator and math
from random import seed
from random import random
# seed random number generator
seed(1)
#*****************************************************************
# MONTE CARLO SI Epidemeological Code
# Version 1.03
# May 15, 2020
#*****************************************************************
vers = "SI monte carlo code vs. 1.03 May 15, 2020 by David A. Larrabee \n"
header = "Time, Susceptible, Infected, new infections, Predicted # infected,"
header = header+" error (%)"
EOL="\n" # end of line character
print (vers)
# ----------------------------------------------------------------------
# initialization of the code
# ----------------------------------------------------------------------
# input parameters.
numpeople = 300000 # the number of people in the simulation run
numinfected = 1000 # the initial number of infected people must be less
# the number of people
numtimesteps = 900 # the number of days to run the simulation
dt = 0.2 # the fraction of a day in a timestep
plambda = 1.5/14 # the parameter in SI, number of people an infected
# person infects each day
datafile="SImontecarlo103.csv" # name of the file to store data.
# Data check and set up for the run
if numinfected > numpeople: # just to prevent an error, trivial case, everyone
numpeople = numinfected # starts out infected
if numinfected < 1:
numinfected = 1
persons = [] # list holding the status of the persons
for i in range(numpeople):
persons.append("S") # set the default value to suspectible
for i in range(numinfected):
persons[i]="I" # set status to infected for the select few
frac = (numpeople-numinfected)/numinfected # parameter for the exact solution
# if desired log the results in a data file
if datafile != "":
fhandler=open(datafile,"a")
strng=vers+EOL
strng=strng+'Total number of people =, '+str(numpeople)+ EOL
strng=strng+"Total number of people initially infected =, "
strng=strng+str(numinfected)+EOL
strng=strng+" lanmbda =,"+str(plambda)+EOL
strng=strng+" number of timesteps, "+str(numtimesteps)+EOL
strng=strng+" fraction of a day in a timestep, "+str(dt)+EOL
strng=strng+header+EOL
fhandler.write(strng)
print(header)
# -----------------------------------------------------------------
# MAIN LOOP
# -----------------------------------------------------------------
for timestep in range(numtimesteps):
# total up the number of infected and susceptable people at the start
time = (timestep + 1)*dt # the actual time at the end of this timestep
numsusceptable = 0
numinfected = 0
for i in range(len(persons)):
if persons[i] == "S":
numsusceptable = numsusceptable + 1
else:
numinfected = numinfected + 1
# this is the probablity that a susceptable person becomes infected
# it the the number of people that are infected in the SI model divided
# the number of susceptible people. It should be less than one.
# if the interval is less than a day the probability needs to be adjusted
# multiplying the infectious probability by DT gives good results for DT
# is one, or DT is small. A better approach is the second below (see
# on line writeup)
if numsusceptable >0:
infectprobp = plambda * numinfected / numpeople # probability of
# infectprob =infectprobp*dt
infectprob = 1.0-(1.0-infectprobp)**dt
else:
infectprob = 1. # everyone is infected, if you were susceptible
# eveyone else is trying to infect you!
if infectprob >1 : # shouldn't happen....
infectprob = 1.0
print("infectprob >1")
# throw the die
newinfections = 0
for i in range(numpeople):
if persons[i]== "S":
if random() <= infectprob:
persons[i]="I"
newinfections = newinfections +1
theory = numpeople/(1+frac*math.exp(-1.0*plambda*time))
inf = numinfected + newinfections
sus = numsusceptable - newinfections
err = (inf - theory)/theory*100.0
print('{:6.2f}'.format(time),'{:10d}'.format(sus), '{:9d}'.format(inf),
'{:14d}'.format(newinfections), '{:9.0f}'.format(theory),
'{:+.2f}'.format(err),'%' )
if datafile != "":
strng=str(time)+","+str(sus)+','+str(inf)+','+str(newinfections)
strng=strng+","+str(theory)+','+str(err)+EOL
fhandler.write(strng)
fhandler.close()
print("done")
Code Results
Here is the start of the code output to the screen for the code input data above. For the first few time steps.
SI monte carlo code vs. 1.03 May 15, 2020 by David A. Larrabee
Time, Susceptible, Infected, new infections, Predicted # infected, error (%)
0.20 298983 1017 17 1022 -0.45 %
0.40 298961 1039 22 1044 -0.44 %
0.60 298942 1058 19 1066 -0.77 %
0.80 298922 1078 20 1089 -1.03 %
1.00 298904 1096 18 1113 -1.50 %
1.20 298875 1125 29 1137 -1.03 %
1.40 298859 1141 16 1161 -1.74 %
1.60 298833 1167 26 1186 -1.62 %
1.80 298805 1195 28 1212 -1.39 %
2.00 298783 1217 22 1238 -1.70 %
2.20 298750 1250 33 1265 -1.16 %
2.40 298729 1271 21 1292 -1.62 %
2.60 298703 1297 26 1320 -1.73 %
2.80 298678 1322 25 1348 -1.95 %
3.00 298644 1356 34 1377 -1.55 %
3.20 298614 1386 30 1407 -1.50 %
3.40 298592 1408 22 1437 -2.04 %
3.60 298560 1440 32 1468 -1.93 %
3.80 298537 1463 23 1500 -2.47 %
4.00 298500 1500 37 1532 -2.11 %
4.20 298460 1540 40 1565 -1.62 %
4.40 298430 1570 30 1599 -1.82 %
4.60 298402 1598 28 1634 -2.17 %
4.80 298374 1626 28 1669 -2.56 %
5.00 298341 1659 33 1705 -2.68 %
5.20 298303 1697 38 1741 -2.55 %
5.40 298264 1736 39 1779 -2.41 %
5.60 298228 1772 36 1817 -2.48 %
5.80 298195 1805 33 1856 -2.76 %
6.00 298146 1854 49 1896 -2.23 %
6.20 298102 1898 44 1937 -2.01 %
6.40 298066 1934 36 1979 -2.26 %
6.60 298017 1983 49 2021 -1.89 %
6.80 297978 2022 39 2065 -2.07 %
7.00 297942 2058 36 2109 -2.43 %
7.20 297904 2096 38 2155 -2.72 %
7.40 297864 2136 40 2201 -2.95 %
7.60 297815 2185 49 2248 -2.81 %
7.80 297773 2227 42 2296 -3.02 %
8.00 297728 2272 45 2346 -3.15 %
Comparison with exact solutions
The plot below was generated from the Comma delimited files and loaded into Excel. The dashed line is the solution of the SI model differential Equation. The green circles are the Monte Carlo simulation using a time step of 1 day. The red line is a Monte Carlo simulation with a time step of 1/5 of a day. The other parameters are the same as in the code above.
Next Steps
Next we will consider the first useful model, SIS or Susceptible Infectious Susceptible. (In process).