|
#+title: Example: Constant Parameters in terms of R0
|
|
#+title: Example: Constant parameters with historical prevalence
|
|
|
|
|
|
Created by Alexander Zarebski
|
|
Created by Alexander Zarebski
|
|
|
|
|
|
Last edited using version =v0.3.0=.
|
|
Last edited using =BEAST v2.6.7= and =timtam v0.3.1=.
|
|
|
|
|
|
This tutorial demonstrates how to use TimTam to estimate the reproduction number
|
|
This tutorial demonstrates how to use TimTam to estimate the reproduction number
|
|
and prevalence of infection at two points in time using a sequence alignment and
|
|
and prevalence of infection at two points in time using a sequence alignment and
|
... | @@ -14,72 +14,78 @@ genomic data to estimate epidemiological parameters from outbreaks. |
... | @@ -14,72 +14,78 @@ genomic data to estimate epidemiological parameters from outbreaks. |
|
|
|
|
|
* Data and model
|
|
* Data and model
|
|
|
|
|
|
In this tutorial, we will work with data obtained from a simulated epidemic. The
|
|
In this tutorial, we will estimate the reproduction number and prevalence of
|
|
data consists of a sequence alignment, [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/sample-sequences.fasta][sample-sequences.fasta]] and a time series
|
|
infection in a simulated epidemic. The data consists of a sequence alignment,
|
|
of daily case counts, [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/aggregated-occurrences.csv][aggregated-occurrences.csv]]. These data were simulated from
|
|
[[https://github.com/aezarebski/timtam2/wiki/tutorial-7/sample-sequences.fasta][sample-sequences.fasta]], and a time series of daily case counts,
|
|
a birth-death process in which individuals infect (therefore 'give birth to')
|
|
[[https://github.com/aezarebski/timtam2/wiki/tutorial-7/aggregated-occurrences.csv][aggregated-occurrences.csv]]. These data were simulated from a (constante rate)
|
|
new individuals and cease to be infectious (therefore 'die') after a given time.
|
|
birth-death process in which individuals infect (therefore 'give birth to') new
|
|
From the epidemic process, individuals that cease to be infectious as they
|
|
individuals and cease to be infectious (therefore 'die') after an exponentially
|
|
recover or are removed from the epidemic due to a fatal outcome of the
|
|
distributed amount of time. From the epidemic process, individuals that cease to
|
|
infection. However, from our data collection perspective, these 'removed'
|
|
be infectious as they recover or are removed from the epidemic due to a fatal
|
|
individuals correspond to one of three possible scenarios: i) they recover from
|
|
outcome of the infection. However, from our data collection perspective, these
|
|
the infection without ever being diagnosed (in which case we don't observe their
|
|
'removed' individuals correspond to one of three possible scenarios: i) they
|
|
infection at all), ii) they can be diagnosed and have a genetic sequence
|
|
recover from the infection without ever being diagnosed (in which case we don't
|
|
generated (in which case we get a time stamped sequence for that case) or they
|
|
observe their infection at all), ii) they can be diagnosed and have a genetic
|
|
can be diagnosed but generate an unsequenced sample (in which case they appear
|
|
sequence generated (in which case we get a time stamped sequence for that case)
|
|
in the dataset as a confirmed case but there is no sequence data attached to the
|
|
or they can be diagnosed but generate an unsequenced sample (in which case they
|
|
case). For the unsequenced samples, they are aggregated into daily counts giving
|
|
appear in the dataset as a confirmed case but there is no sequence data attached
|
|
rise to the time series data.
|
|
to the case). For the unsequenced samples, they are aggregated into daily counts
|
|
|
|
giving rise to the time series data.
|
|
|
|
|
|
The following figure shows the number of infectious individuals ("infectious")
|
|
The following figure shows the number of infectious individuals ("infectious")
|
|
through time along with the cumulative number of people who were previously
|
|
through time along with the cumulative number of people who were previously
|
|
infectious, for example, who have recovered or been sampled. Note from the
|
|
infectious but no longer are, e.g. because they have recovered or been sampled.
|
|
subtitle that there are 2222 infectious individuals at the present. The number
|
|
Note from the subtitle that there are 2222 infectious individuals at the
|
|
of infectious individuals is growing exponentially, as indicated by the linear
|
|
present. The number of infectious individuals is growing exponentially, as
|
|
growth on a logarithmic /y/-scale.
|
|
indicated by the following figure.
|
|
|
|
|
|
[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-7-simulation-trajectory.png]]
|
|
[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-7-simulation-trajectory.png]]
|
|
|
|
|
|
The following figure shows the daily number of confirmed cases: infectious that
|
|
The next figure shows the daily number of confirmed cases, i.e. the number of
|
|
have been observed but for which a sequence was not collected. These data form
|
|
people that have tested positive on that day (and are infectious) but for which
|
|
the time series that we will use.
|
|
a viral genome sequence was not recorded. These data, the daily counts, form the
|
|
|
|
time series that we will use.
|
|
|
|
|
|
[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-7-occurrence-time-series.png]]
|
|
[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-7-occurrence-time-series.png]]
|
|
|
|
|
|
Note that at the end of the simulated epidemic there are 2222 infectious
|
|
These data were simulated with [[http://tgvaughan.github.io/MASTER][MASTER]], with parameters chosen so the epidemic
|
|
individuals. The parameters used to simulate this epidemic are set to represent
|
|
has a basic reproduction number of \(1.85\). To simplify this tutorial we will
|
|
a pathogen with a basic reproduction number of \(1.85\). To simplify this
|
|
assume a known net removal rate, denoted by [[https://github.com/aezarebski/timtam2/wiki/Glossary#sigma][sigma]], of (\(0.054 = 0.046 +
|
|
tutorial we will assume a known net removal rate, denoted by [[https://github.com/aezarebski/timtam2/wiki/Glossary#sigma][sigma]], of
|
|
0.008\)), and the pathogen evolves with a substitution rate of (\(10^{-3}\))
|
|
(\(0.046 + 0.008\)), and that the pathogen evolves with a molecular clock rate
|
|
substitutions/site/day constantly across the tree. Both values fall within the
|
|
of (\(10^{-3}\)) substitutions/site/day. Both values fall within the ballpark of
|
|
ballpark of what might be expected for an RNA virus. The origin time for our
|
|
what might be expected for an RNA virus. The origin time for our simulated
|
|
simulated epidemic took place (\(70\)) days before the end of the simulation.
|
|
epidemic took place (\(70\)) days before the end of the simulation.
|
|
|
|
|
|
|
|
* Analysis
|
|
* Analysis
|
|
|
|
|
|
** Setting up a scaffold with BEAUti
|
|
We will start by using BEAUti to set up an XML file which handles most of the
|
|
|
|
analysis. We will then tweak this so that the historical prevalence is
|
|
|
|
estimated.
|
|
|
|
|
|
|
|
** Setting up with BEAUti
|
|
|
|
|
|
1. Create a work folder in your computer and download the [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/sample-sequences.fasta][sample-sequences.fasta]]
|
|
1. Create a work folder in your computer and download the [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/sample-sequences.fasta][sample-sequences.fasta]]
|
|
and [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/aggregated-occurrences.csv][aggregated-occurrences.csv]] files there.
|
|
and [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/aggregated-occurrences.csv][aggregated-occurrences.csv]] files there.
|
|
2. Create an XML file using BEAUti.
|
|
2. Create an XML file using BEAUti.
|
|
1. Load the sequence data into BEAUti
|
|
1. Load the sequence FASTA file into BEAUti
|
|
2. Parse the tip-dates using the default auto-configured values function.
|
|
2. In the *Tip Dates* tab, parse the tip-dates using the default
|
|
|
|
auto-configured values function.
|
|
3. Leave the site model tab with the default values.
|
|
3. Leave the site model tab with the default values.
|
|
4. Set the clock rate to =0.001= in the clock model tab.
|
|
4. In the *Clock Model* tab, set the clock rate to =0.001=.
|
|
5. Select the /Tim Tam Time Series Model/ on the tree prior tab.
|
|
5. In the *Priors* tab, select the /Tim Tam Time Series Model/.
|
|
6. (Optional) set the chain length to a smaller value if you are in a hurry;
|
|
6. (Optional) set the chain length to a smaller value if you are in a hurry;
|
|
1,000,000 MCMC steps should do.
|
|
1,000,000 MCMC steps should do.
|
|
7. Save the resulting file as an XML, eg =timtam.xml=.
|
|
7. Save the resulting file as an XML, e.g. =timtam.xml=.
|
|
|
|
|
|
** Adjusting the XML for our model
|
|
** Adjusting the XML for our model
|
|
|
|
|
|
1. We now need to estimate some parameter values to add to the XML. To do this,
|
|
1. We now need to compute some values to add to the XML. To do this, download
|
|
download and run the R script [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/print-disaster-data.R][print-disaster-data.R]] and keep a copy of the
|
|
and run the R script [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/print-disaster-data.R][print-disaster-data.R]] and keep a copy of the data that
|
|
data that gets printed out on the Console or Terminal. There should be two
|
|
gets printed out on the Console or Terminal. There should be two sets of
|
|
sets of values, these are the representation of the time series data that
|
|
values, these are the representation of the time series data that TimTam
|
|
TimTam expects. We do this because the times in the simulation are relative
|
|
expects. We do this because the times in the simulation are relative to date
|
|
to date of the last sequence we have; we therefore need to shift the times in
|
|
of the last sequence we have; we therefore need to shift the times in the XML
|
|
the XML so they agree with the ones we just computed.
|
|
so they agree with the ones we just computed.
|
|
|
|
|
|
#+begin_src sh
|
|
#+begin_src sh
|
|
Rscript print-disaster-data.R
|
|
Rscript print-disaster-data.R
|
... | @@ -89,13 +95,15 @@ epidemic took place (\(70\)) days before the end of the simulation. |
... | @@ -89,13 +95,15 @@ epidemic took place (\(70\)) days before the end of the simulation. |
|
following edits:
|
|
following edits:
|
|
- Remove operators relating to the clock rate, as we are working with a known
|
|
- Remove operators relating to the clock rate, as we are working with a known
|
|
(from the simulation parameters) evolutionary rate. There should be two of
|
|
(from the simulation parameters) evolutionary rate. There should be two of
|
|
these clock operators, you can identify them because they begin with
|
|
these clock operators, you can identify them because they have the clock
|
|
~<operator id="clock.rate"~
|
|
rate among the parameters that they act on. Search =clockRate= to find
|
|
- We need to tweak =historySizes= and =historyChecks=: these are used to
|
|
these in the file.
|
|
estimate historical prevalence. There are several things that will need to
|
|
- We need to tweak =historySizes= and =historyChecks= which are the
|
|
be removed:
|
|
parameters used to estimate the historical prevalence. There are several
|
|
+ the =stateNode= for =historySizes=, should have its value set to
|
|
things that will need to be edited:
|
|
something similar to =2000 3000= as a starting point,
|
|
+ the =stateNode= for =TTHistorySizes.t=, should have its value set to
|
|
|
|
something similar to =2000 3000= as a starting point, (*n.b.* you may
|
|
|
|
need to adjust this to different values for your own dataset.)
|
|
+ the prior distribution on the history sizes can be left as it is,
|
|
+ the prior distribution on the history sizes can be left as it is,
|
|
+ the =operator= on the history sizes can be left as it is,
|
|
+ the =operator= on the history sizes can be left as it is,
|
|
+ the =log= for the history sizes can be left as it is,
|
|
+ the =log= for the history sizes can be left as it is,
|
... | @@ -107,10 +115,10 @@ epidemic took place (\(70\)) days before the end of the simulation. |
... | @@ -107,10 +115,10 @@ epidemic took place (\(70\)) days before the end of the simulation. |
|
distribution node. You should recognise it by the =id= value of
|
|
distribution node. You should recognise it by the =id= value of
|
|
=TimTam.t:sample-sequences=. This will format the time series of the
|
|
=TimTam.t:sample-sequences=. This will format the time series of the
|
|
sampling times and sizes so that they are included properly. Note that
|
|
sampling times and sizes so that they are included properly. Note that
|
|
BEAUti may have added ~dimension='3'~ to these nodes, in which case delete
|
|
BEAUti may have added ~dimension='3'~ to these nodes, in which case, delete
|
|
these attributes.
|
|
these attributes.
|
|
- Double check that =sigma= is set to the known value of \(0.1\) in
|
|
- Double check that =sigma= (the net removal rate) is set to the known value
|
|
=TimTam.d:sample-sequences=.
|
|
of \(0.1\) in =TimTam.d:sample-sequences=.
|
|
- Double check that the value of =originTime= is set to \(70\).
|
|
- Double check that the value of =originTime= is set to \(70\).
|
|
- Save the resulting XML which should look like [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/timtam.xml][timtam.xml]].
|
|
- Save the resulting XML which should look like [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/timtam.xml][timtam.xml]].
|
|
|
|
|
... | @@ -119,7 +127,8 @@ epidemic took place (\(70\)) days before the end of the simulation. |
... | @@ -119,7 +127,8 @@ epidemic took place (\(70\)) days before the end of the simulation. |
|
1. Run MCMC by executing the XML file on BEAST (you may want to grab a coffee
|
|
1. Run MCMC by executing the XML file on BEAST (you may want to grab a coffee
|
|
while this is running.)
|
|
while this is running.)
|
|
2. Download the simulation data (which is needed for subsequent visualisations)
|
|
2. Download the simulation data (which is needed for subsequent visualisations)
|
|
in [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/tutorial-7-simulation-output.json][tutorial-7-simulation-output.json]].
|
|
in [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/tutorial-7-simulation-output.json][tutorial-7-simulation-output.json]] and make sure it is in your working
|
|
|
|
directory.
|
|
3. Plot the results using [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/post-processing.R][post-processing.R]]. This will produce two figures
|
|
3. Plot the results using [[https://github.com/aezarebski/timtam2/wiki/tutorial-7/post-processing.R][post-processing.R]]. This will produce two figures
|
|
similar to the ones shown below, summarising some of the epidemiological
|
|
similar to the ones shown below, summarising some of the epidemiological
|
|
estimates from your analysis.
|
|
estimates from your analysis.
|
... | | ... | |