Changes

commit 761c9e0b1c446bf401cb0333c5a5e99b6512a27e Author: Alex Zarebski <aezarebski@gmail.com> Date: Thu Aug 25 15:17:39 2022 +0100 run through this tutorial again with the new template commit 238f6f0dbf9760c5fa14e36beeeca1c1bf860fa1 Author: Alex Zarebski <aezarebski@gmail.com> Date: Wed Aug 24 16:35:52 2022 +0100 update to use the new templates commit 80a745619ef91326620db7f74281e81a9c2706fe Author: Alex Zarebski <aezarebski@gmail.com> Date: Tue Aug 23 16:44:37 2022 +0100 Squashed commit of the following: commit 1615355922b4c8dc3fdbdcfbe8d8097f90371bae Author: Alex Zarebski <aezarebski@gmail.com> Date: Tue Aug 23 16:40:44 2022 +0100 editing docs commit 0e8e0ef1e65331c9232bcc2670c48f5d1ec850eb Author: Alex Zarebski <aezarebski@gmail.com> Date: Tue Aug 23 16:38:47 2022 +0100 upgrade the documentation commit b68227d3742381925d3e8665e1bc2b894fa637dd Author: Alex Zarebski <aezarebski@gmail.com> Date: Tue Aug 23 16:32:38 2022 +0100 include results figures commit 5fc8adeacf1671288f3b8f0b25d31c2eb2e2ab48 Author: Alex Zarebski <aezarebski@gmail.com> Date: Tue Aug 23 16:31:31 2022 +0100 include materials for new tutorial commit f53bac828882d9511de474024de0b2192af5592f Author: Alex Zarebski <aezarebski@gmail.com> Date: Tue Aug 23 14:53:05 2022 +0100 organise some materials commit 4feb7a8f5c7564fbba58f8ea728a5d971b139814 Author: Alex Zarebski <aezarebski@gmail.com> Date: Sat Aug 20 16:00:27 2022 +0100 glossary commit b7085653c47b51ddaf9de4b1e39dae81c39856b1 Author: Alex Zarebski <aezarebski@gmail.com> Date: Sat Aug 20 15:30:47 2022 +0100 add a note about the new parameterisation commit 31927a99718ffbbb4ec0ad0e05cca60dfeb4898a Author: Alex Zarebski <aezarebski@gmail.com> Date: Fri Aug 19 17:11:36 2022 +0100 update class diagram commit cc073f54c9ebecce03e345ad24d3024f3d2c8dc1 Author: Alex Zarebski <aezarebski@gmail.com> Date: Fri Aug 19 16:21:25 2022 +0100 include notes about alternative params commit ec27df583be94f1fbc0fdb200c373caa94b6e0ba Author: Alex Zarebski <aezarebski@gmail.com> Date: Fri Aug 19 15:40:51 2022 +0100 use the newer r0 formulation commit 27b79b96dfc2991a980fb50b8062e11461cf3e9c Author: Alex Zarebski <aezarebski@gmail.com> Date: Fri Aug 19 15:29:29 2022 +0100 include note about branches on wiki commit c3250912d7f8051ad5416d8c9cca032c9d3f8485 Author: Alex Zarebski <aezarebski@gmail.com> Date: Fri Aug 19 15:26:39 2022 +0100 Squashed commit of the following: commit 2c08e51eb9f0271f2a499c3ac2f58c8ae2484b03 Author: Alex Zarebski <aezarebski@gmail.com> Date: Fri Aug 19 15:22:58 2022 +0100 update tutorial documentation commit bc61cedaaecffacae0375f1ee04b050276c27fe5 Author: Alex Zarebski <aezarebski@gmail.com> Date: Fri Aug 19 14:51:46 2022 +0100 include a bunch of scripts to make this easier to produce
Alex Zarebski · 520d96fe
--- a/Constant-Parameters-Canonical.org
+++ b/Constant-Parameters-Canonical.org
+#+title: Example: Constant Parameters
+Created by Alexander Zarebski
+Edited by [[https://github.com/BernardoGG][Bernardo Gutierrez]] on 12/08/2022
+Last edited using version =v0.3.0=.
+This tutorial demonstrates how to use TimTam to estimate the reproduction number
+and prevalence of infection using a sequence alignment and a time series of
+confirmed cases. Often in genomic epidemiology, the available set of sequences is
+not representative of the total number of recorded cases due to limitations in
+sequencing capacity, accessibility to samples or other reasons. TimTam facilitates
+the inclusion of case counts combined with available genomic data to estimate
+epidemiological parameters from outbreaks.
+* Data and model
+In this tutorial, we will work with data obtained from a simulated epidemic. The
+data consists of a sequence alignment, [[https://github.com/aezarebski/timtam2/wiki/tutorial-1/sample-sequences.fasta][sample-sequences.fasta]] and a time series
+of daily case counts, [[https://github.com/aezarebski/timtam2/wiki/tutorial-1/aggregated-occurrences.csv][aggregated-occurrences.csv]]. These data were simulated from
+a birth-death process in which individuals infect (therefore 'give birth to')
+new individuals and cease to be infectious (therefore 'die') after a given time.
+From the epidemic process, individuals that cease to be infectious as they
+recover or are removed from the epidemic due to a fatal outcome of the
+infection. However, from our data collection perspective, these 'removed'
+individuals correspond to one of three possible scenarios: i) they recover from
+the infection without ever being diagnosed (in which case we don't observe their
+infection at all), ii) they can be diagnosed and have a genetic sequence
+generated (in which case we get a time stamped sequence for that case) or they
+can be diagnosed but generate an unsequenced sample (in which case they appear
+in the dataset as a confirmed case but there is no sequence data attached to the
+case). For the unsequenced samples, they are aggregated into daily counts giving
+rise to the time series data.
+The following figure shows the number of infectious individuals ("infectious")
+through time along with the cumulative number of people who were previously
+infectious, for example, who have recovered or been sampled. Note from the
+subtitle that there are 2222 infectious individuals at the present. The number
+of infectious individuals is growing exponentially, as indicated by the linear
+growth on a logarithmic /y/-scale.
+[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-1-simulated-trajectories.png]]
+The following figure shows the daily number of confirmed cases: infectious that
+have been observed but for which a sequence was not collected. These data form
+the time series that we will use.
+[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-1-occurrence-time-series.png]]
+Note that at the end of the simulated epidemic there are approximately 2250
+infectious individuals. The parameters used to simulate this epidemic are set to
+represent a pathogen with a basic reproduction number of \(1.85\). To simplify
+this tutorial we will assume a known death rate of (\(0.046\)), and that the
+pathogen evolves with a molecular clock rate of (\(10^{-3}\))
+substitutions/site/day. Both values fall within the ballpark of what might be
+expected for an RNA virus. The origin time for our simulated epidemic took place
+(\(70\)) days before the end of the simulation.
+* Analysis
+1. Create a work folder in your computer and download the [[https://github.com/aezarebski/timtam2/wiki/tutorial-0/sample-sequences.fasta][sample-sequences.fasta]] and
+   [[https://github.com/aezarebski/timtam2/wiki/tutorial-0/aggregated-occurrences.csv][aggregated-occurrences.csv]] files there.
+2. Create an XML file using BEAUti.
+   1. Load the sequence data into BEAUti
+   2. Parse the tip-dates using the default auto-configured values function.
+   3. Leave the site model tab with the default values.
+   4. Set the clock rate to =0.001= in the clock model tab.
+   5. Select the /Tim Tam Model/ on the tree prior tab.
+   6. (Optional) set the chain length to a smaller value if you are in a hurry;
+      1,000,000 MCMC steps should do.
+   7. Save the resulting file as an XML.
+3. We now need to estimate some parameter values to add to the XML. to do this,
+   download and run the R script [[https://github.com/aezarebski/timtam2/wiki/tutorial-1/print-disaster-data.R][print-disaster-data.R]] and keep a copy of the
+   data that gets printed out on the Console or Terminal. There should be two
+   sets of values, these are the representation of the time series data that
+   TimTam expects. We do this because the times in the simulation are relative
+   to date of the last sequence we have; we therefore need to shift the times in
+   the XML so they agree with the ones we just computed.
+4. Now it's time to tweak the XML by opening it in a text editor and making the
+   following edits:
+   - Remove operators relating to the clock rate, as we are working with a known
+     (from the simulation parameters) evolutionary rate. There should be two of
+     these clock operators, you can identify them because they begin with
+     ~<operator id="clock.rate"~
+   - Remove all references to =historySizes= and =historyChecks=, these are used
+     to estimate historical prevalence and are not needed for this example.
+   - Copy and paste the two number series from the R script output as the values
+     of =disasterTimes= and =disasterSizes= respectively, in the TimTam
+     distribution node. You should recognise it by the =id=
+     =TimTam.t:sample-sequences=. This will format the time series of the
+     sampling times and sizes so that they are included properly.
+   - The final XML should look like [[https://github.com/aezarebski/timtam2/wiki/tutorial-1/timtam.xml][timtam.xml]].
+5. Run MCMC by executing the XML file on BEAST (you may want to grab a coffee
+   while this is running.)
+6. Plot the results using [[https://github.com/aezarebski/timtam2/wiki/tutorial-1/post-processing.R][post-processing.R]]. This will produce two figures
+   similar to the ones shown below, summarising some of the epidemiological
+   estimates from your analysis.
+* Results
+By following this tutorial, we have used a multiple sequence alignment and a
+time series of observed (but not sequenced) case data to estimate the basic
+reproduction number of a pathogen and the prevalence of the disease at the time
+of the last observation in our simulated epidemic.
+The first figure shows the posterior distribution of the (basic) reproduction
+number as inferred during your BEAST run. The true value for the reproduction
+number (defined as a known value in the simulation) is shown as a vertical line.
+[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-1-marginals.png]]
+Note how in the plot, the 95% HPD includes the real value of the basic reproduction
+number. We would expect to recover the true value of an estimated parameter credible
+interval when working with real-world data.
+The second figure shows the posterior distribution of the prevalence at the time
+of the last observation, i.e. the number of infectious people at the present time.
+Recall from the observation we made above that this number should be approximately
+2250 infectious cases at that time. How does the estimate from the figure below and
+the estimate from your own analysis compare to the true number of infectious cases
+at this last time point?
+[[https://raw.githubusercontent.com/wiki/aezarebski/timtam2/images/tutorial-1-prevalence.png]]
+The plots produced by the provided R script from your analysis should look very
+similar to the ones shown above but not identical. Remember that the MCMC
+samples across parameter space randomly, but while the exact values vary from
+one run to the next, the distribution of these values should be mostly the same.