Part 2.3 Monte Carlo SI Model Implementation

Introduction

So far we have used models that are described by differential equations.  At the heart of an epidemic or pandemic is the spread of the pathogen from one individual to another.  Predicting if Uncle John gets infected from doing the errands around town is virtually impossible.  Put one might be able to estimate the probability that Uncle John gets infected.  This suggests some sort of random number modelling of the system.  

Monte Carlo is a famous casino which has lent its name to a technique for using random trials to model complex systems.   The basic idea is simple.  Start out with a large number of individuals that are susceptible, but not yet infected.   Have another group of individuals that are infectious.  Use a the probability of getting infected and a random number generator to infect the susceptible individuals over time.

The details

The major input parameters to the model need to be determined:

numpeople = is the number of people in the simulation run. Since this disease isn’t deadly, this number doesn’t change (we are neglecting births and deaths from other causes).

numinfected = This is the number of people that are infected at the start of the simulation. Since we are dealing with random numbers, too low a value will make the statistical noise dominate the startup of the epidemic.

numtimesteps = is the number of iterations of the simulation. It wouldn’t be hard to change the code to an end time.

dt = the size of a timestep in days.

plambda = is the lambda in the differential equation. It is the number of contacts a day that an infected person makes that have the possibility of being infectious.

After checking the data we create a list of persons where each entry is a single character corresponding to their status. Each individual is either an S (susceptible) or an I (Infectious). In this model once you become infectious, you remain infectious.

After some housekeeping (opening an output file, printing headers, etc.) we enter the main loop.

  1. Determine the current number of infectious and Susceptible people
  2. Determine the probability of a susceptible individual getting infected in the next time interval. This is easy if the time interval is a day. If the infection rate is high enough, it might be better to use a fraction of a day as a time-step. Two approached are discussed below.
  3. Go though the list of susceptible people. For each susceptible person generate a random number between zero and one. If the number is less than the probability in 1, mark that person infected.
  4. Compute the theoretical value
  5. recompute the number of susceptible and infected persons as a result of this time step.
  6. compare the simulation and the theoretical results
  7. print it out to the screen and the file
  8. repeat the process starting at one until we have used up the iterations. (Yes this is the function of the for loop before 1….)

Time step control

In the differential equation the probability of a susceptible becoming infected changes continuously, even during a single day.   In this simulation the probability is fixed during the time step using values determined at the start of the time-step.  As the time-step goes to zero the Monte Carlo results should approach the differential equation approach (spoiler it does).   The probability of a susceptible person being infected in a time interval \delta t gets smaller as \delta t gets smaller.  How important is this?

If the disease is progressing rapidly then the time interval might want to be a fraction of a day.  To put it another way the change in the number of infected people should be a small fraction of the current number of infected people.

    \[ \lambda \frac{I(t)S(t)}{N} \delta t << I(t) \]

or

    \[ \lambda \frac{S(t)}{N} \delta t << 1  \]

since every one gets infected and the number of susceptable people starts off close to N we have that 

    \[  \lambda  \delta t << 1  \]

so for a rapidly advancing disease the time interval might need to be smaller than a day.  A python code follows and you can try this out for yourself.

In the Monte Carlo code we need a way of generating a probability for infection for a time interval \delta t and parameter \lambda.  Here are two strategies.  They both agree when \delta t is very small or equal to 1, but differ at moderate values.  If \delta t = 1 the probability parameter is \lambda \frac{I}{N}

 a) multiply \lambda \frac{I}{N} by \delta t This assumes that we can ignore the change in I(t) over a day and that the that the probabilities in the sub interval  simply “adds up”  to the total probability.  It works … but .. if you know something about probabilities you might recognize that this really doesn’t work.

b)  consider a group of A people going though a sequence of time-steps.  We will assume that at each time-step the probability of getting infected is p.  After the first time-step the number of people that are infected is Ap, but the number of people that are not infected is A(1-p).  After n time intervals the number of people that remain uninfected is A (1-p)^n  Since the original number was A, the number that were infected in those time intervals is just A-A (1-p)^n or A(1-(1-p)^n).   But the number of intervals during a day is simply \frac{1}{\delta t} and the probability in the subinterval  is related to the probability during a full day by

    \[  1-(1-p)^{\frac{1}{\delta t} }= \lambda_i = \lambda \frac{I}{N} \]

Solving this for p in terms of \lambda_i  yields

    \[ p = 1-(1-\lambda_i)^{\delta t} \]

p becomes the probability during a time-step of size \delta t when the initial full day probability is \lambda_i.  In the software below the statements for both options are present, with the simple option a commented out.  

Other issues

Another difference is “noise” that is generated by using random numbers.  You can see this by running the software with different random number “seeds.”  Different seeds generate different pseudo random number sequences.  This allows the person evaluating the computer runs to generate multiple runs with different seeds to see the size of the random number noise.   If the initial number of infected persons is low, then this noise becomes quite evident at the the start of the simulation. This is one reason the code as supplied has a very large population and a large number of initially infected individuals.

In the real world the number of people being infected at 3 am in the morning is probably lower than those being infected at noon.  The number of people being infected during Wednesday may be different than the number of people being infected on Saturday or Sunday.  Peoples lives have rhythms that are not reflected in either the differential equations or in the Monte Carlo calculations.  So I would be hesitant to say one method is preferable to the other.  Models don’t include everything. The modeler needs to judge what can be left out without significant influence on the questions the modeler is trying to answer.

We will examine a third option when we consider the next model (SIS) that will build off of the Monte Carlo approach.

Python Implementation

Here is a sample Python code that implements the Monte Carlo technique.  I have added code to drop the data to a comma delimited file.  This allows the user to load the data into Excel or other spreadsheet program.  Hopefully the comments in the code are sufficient for you too follow the code.  I have avoided the use of lots of special procedures to create a modern looking piece of code.  For the small code it seemed preferable to make it as legible as possible with a simple start at the top go to the bottom structure.  

                    

import math
# import the random number generator and math
from random import seed
from random import random
# seed random number generator
seed(1)

#*****************************************************************
#  MONTE CARLO SI Epidemeological Code
#      Version 1.03   
#      May 15, 2020
#*****************************************************************

vers = "SI monte carlo code vs. 1.03 May 15, 2020 by David A. Larrabee \n"
header = "Time, Susceptible, Infected, new infections, Predicted # infected,"
header = header+" error (%)"
EOL="\n"   # end of line character

print (vers)

# ----------------------------------------------------------------------
#   initialization of the code
# ----------------------------------------------------------------------
# input parameters.

numpeople    = 300000   # the number of people in the simulation run
numinfected  = 1000     # the initial number of infected people must be less 
                        # the number of people
numtimesteps = 900       # the number of days to run the simulation
dt           = 0.2      # the fraction of a day in a timestep
plambda      = 1.5/14   # the parameter in SI, number of people an infected
                        # person infects each day

datafile="SImontecarlo103.csv"  # name of the file to store data.  

#  Data check and set up for the run

if numinfected > numpeople:  # just to prevent an error, trivial case, everyone
    numpeople = numinfected  # starts out infected
    
if numinfected < 1:
    numinfected = 1    
    
persons = []                # list holding the status of the persons

for i in range(numpeople):
    persons.append("S")     # set the default value to suspectible
    
for i in range(numinfected):
    persons[i]="I"          # set status to infected for the select few 
    
frac = (numpeople-numinfected)/numinfected # parameter for the exact solution

# if desired log the results in a data file

if datafile != "":
     fhandler=open(datafile,"a")
     strng=vers+EOL
     strng=strng+'Total number of people =, '+str(numpeople)+ EOL
     strng=strng+"Total number of people initially infected =, "
     strng=strng+str(numinfected)+EOL
     strng=strng+" lanmbda =,"+str(plambda)+EOL
     strng=strng+" number of timesteps, "+str(numtimesteps)+EOL
     strng=strng+" fraction of a day in a timestep, "+str(dt)+EOL     
     strng=strng+header+EOL
     fhandler.write(strng)
     
print(header)
    
# -----------------------------------------------------------------
#   MAIN LOOP
# -----------------------------------------------------------------

for timestep in range(numtimesteps):
    
     #  total up the number of infected and susceptable people at the start
     time = (timestep + 1)*dt   # the actual time at the end of this timestep
     numsusceptable = 0
     numinfected    = 0
     
     for i in range(len(persons)):
         if persons[i] == "S":
             numsusceptable = numsusceptable + 1
         else:
             numinfected = numinfected + 1
             
     # this is the probablity that a susceptable person becomes infected
     # it the the number of people that are infected in the SI model divided
     # the number of susceptible people.  It should be less than one.
     # if the interval is less than a day the probability needs to be adjusted
     # multiplying the infectious probability by DT gives good results for DT
     # is one, or DT is small.  A better approach is the second below (see 
     # on line writeup)
             
     if numsusceptable >0:
         infectprobp = plambda * numinfected / numpeople   # probability of
#         infectprob =infectprobp*dt
         infectprob = 1.0-(1.0-infectprobp)**dt
     else:
         infectprob = 1.   # everyone is infected, if you were susceptible 
                           # eveyone else is trying to infect you!
         
     if infectprob >1 :    # shouldn't happen....
         infectprob = 1.0
         print("infectprob >1")
         
     # throw the die

     newinfections = 0
         
     for i in range(numpeople):
         if persons[i]== "S":
             if random() <= infectprob:
                 persons[i]="I"
                 newinfections = newinfections +1
                 
     theory = numpeople/(1+frac*math.exp(-1.0*plambda*time))
     
     inf = numinfected + newinfections
     sus = numsusceptable - newinfections
     err = (inf - theory)/theory*100.0
    
     print('{:6.2f}'.format(time),'{:10d}'.format(sus), '{:9d}'.format(inf),
           '{:14d}'.format(newinfections), '{:9.0f}'.format(theory),
           '{:+.2f}'.format(err),'%' )
     
     if datafile != "":
         strng=str(time)+","+str(sus)+','+str(inf)+','+str(newinfections)
         strng=strng+","+str(theory)+','+str(err)+EOL
         fhandler.write(strng)
         
fhandler.close()
print("done")           
                 

Code Results

Here is the start of the code output to the screen for the code input data above.  For the first few time steps.

                    

SI monte carlo code vs. 1.03 May 15, 2020 by David A. Larrabee 

Time, Susceptible, Infected, new infections, Predicted # infected, error (%)
  0.20     298983      1017             17      1022 -0.45 %
  0.40     298961      1039             22      1044 -0.44 %
  0.60     298942      1058             19      1066 -0.77 %
  0.80     298922      1078             20      1089 -1.03 %
  1.00     298904      1096             18      1113 -1.50 %
  1.20     298875      1125             29      1137 -1.03 %
  1.40     298859      1141             16      1161 -1.74 %
  1.60     298833      1167             26      1186 -1.62 %
  1.80     298805      1195             28      1212 -1.39 %
  2.00     298783      1217             22      1238 -1.70 %
  2.20     298750      1250             33      1265 -1.16 %
  2.40     298729      1271             21      1292 -1.62 %
  2.60     298703      1297             26      1320 -1.73 %
  2.80     298678      1322             25      1348 -1.95 %
  3.00     298644      1356             34      1377 -1.55 %
  3.20     298614      1386             30      1407 -1.50 %
  3.40     298592      1408             22      1437 -2.04 %
  3.60     298560      1440             32      1468 -1.93 %
  3.80     298537      1463             23      1500 -2.47 %
  4.00     298500      1500             37      1532 -2.11 %
  4.20     298460      1540             40      1565 -1.62 %
  4.40     298430      1570             30      1599 -1.82 %
  4.60     298402      1598             28      1634 -2.17 %
  4.80     298374      1626             28      1669 -2.56 %
  5.00     298341      1659             33      1705 -2.68 %
  5.20     298303      1697             38      1741 -2.55 %
  5.40     298264      1736             39      1779 -2.41 %
  5.60     298228      1772             36      1817 -2.48 %
  5.80     298195      1805             33      1856 -2.76 %
  6.00     298146      1854             49      1896 -2.23 %
  6.20     298102      1898             44      1937 -2.01 %
  6.40     298066      1934             36      1979 -2.26 %
  6.60     298017      1983             49      2021 -1.89 %
  6.80     297978      2022             39      2065 -2.07 %
  7.00     297942      2058             36      2109 -2.43 %
  7.20     297904      2096             38      2155 -2.72 %
  7.40     297864      2136             40      2201 -2.95 %
  7.60     297815      2185             49      2248 -2.81 %
  7.80     297773      2227             42      2296 -3.02 %
  8.00     297728      2272             45      2346 -3.15 %

 

Comparison with exact solutions

The plot below was generated from the Comma delimited files and loaded into Excel.  The dashed line is the solution of the SI model differential Equation.  The green circles are the Monte Carlo simulation using a time step of 1 day.  The red line is a Monte Carlo simulation with a time step of 1/5 of a day.  The other parameters are the same as in the code above.

Next Steps

Next we will consider the first useful model, SIS or Susceptible Infectious Susceptible. (In process).

A dialog between Science, economics and Religion