---
title: "Simulations and test harnesses"
output: html_document
vignette: >
  %\VignetteIndexEntry{Simulations and test harnesses}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Background

Testing the inference of $R_t$ and growth rate estimates needs a ground truth,
which is missing in real outbreaks as infection events cannot be truly observed.
Testing accuracy of estimates of $R_t$ for example needs one or more simulations
of outbreaks that have known parametrisation. `ggoutbreak` contains some
functions to generate synthetic datasets which exhibit some of the complexities
we observed in the COVID-19 pandemic. There are two levels at which the
simulations work, aggregate counts or case line list simulations.

```{r setup}
knitr::opts_chunk$set(echo = TRUE)
here::i_am("vignettes/simulation-test-models.Rmd")
source(here::here("vignettes/vignette-utils.R"))
library(ggoutbreak)
library(patchwork)

plot_changes = function(sim, name, max_y = NA, mapping = ggplot2::aes(), case=FALSE, fn = attr(sim,"fn"), changes = NULL, ...) {
  
  if (!is.null(changes)) {
    col = intersect(c("growth","R"),colnames(changes))
    events = changes %>% dplyr::transmute(
      start = as.Date(as.time_period(t, unit=time_unit)),
      end = NA,
      label = sprintf(paste0(col,"=%1.2f"),!!rlang::sym(col))
    )
  }
  else
    events = attr(sim,"events")
  
  force(fn)
  if ("statistic" %in% colnames(sim)) sim = sim %>% dplyr::filter(statistic == "infections") 
  
  ggplot2::ggplot(mapping=mapping)+
    {if (is.null(events)) NULL else geom_events(events=events)}+
    {if (is.null(fn)) NULL else ggplot2::geom_function(fun = ~ fn(t=as.time_period(.x, sim$time)), xlim=as.Date(range(sim$time)), n = length(sim$time))}+
    {if (!"rt.weighted" %in% colnames(sim)) NULL else ggplot2::geom_step(data = sim, mapping=ggplot2::aes(x=as.Date(time), y=rt.weighted))}+
    {if (!"rt.inst" %in% colnames(sim)) NULL else list(
      ggplot2::geom_point(data = sim, mapping=ggplot2::aes(x=as.Date(time), y=rt.inst),size=0.5),
      ggplot2::geom_line(data = sim, mapping=ggplot2::aes(x=as.Date(time), y=rt.inst),alpha=0.1)
    )}+
    {if (!case || !"rt.case" %in% colnames(sim)) NULL else list(
      ggplot2::geom_point(data = sim, mapping=ggplot2::aes(x=as.Date(time), y=rt.case), colour="blue",size=0.5),
      ggplot2::geom_line(data = sim, mapping=ggplot2::aes(x=as.Date(time), y=rt.case), colour="blue",alpha=0.1)
    )}+
    ggplot2::ylab(name)+ggplot2::xlab(NULL)+
    ggplot2::coord_cartesian(ylim=c(NA,max_y))
}

```


`ggoutbreak` includes a set of simulations that can be used to generate test
data with known parameters. These can be used to validate the output of 
model parameter estimates, calculated from simulated data for which we have gold standard
values for those parameters. Simulations are both at an aggregate level generating 
count data, or at an individual level generating line lists.

The parameters implemented so far are as follows:

- Importation rate
- Reproduction number
- Growth rate
- Infectivity profile (generation time or the time delay of infectee infection 
from time of infector infection)
- Case ascertainment rate
- Probability of symptoms given infection
- Time delay of symptom onset from time of infection
- Probability of hospital admission given infection
- Time delay of hospital admission from time of infection
- Probability of death given infection
- Time delay of death from time of infection
- Probability of testing given infection (or given symptoms)
- Time delay of test sampling from time of infection (or symptoms)
- Time delay of test result from time of test sampling

Depending on the model some of these may be specified in different ways. Different
delays and rates of observation can be added ad-hoc and the simulations can be 
stratified by different classes. In some simulations this can be propagated using
an equivalent to a contact matrix.

# Specifying time varying parameters

The simulation framework is highly configurable, and a key part of this is
specifying parameters that vary over time, or have some random or non random day
to day variation. Most simulation parameters must be given as a time varying
function, but other parameters are often available. `ggoutbreak` includes
various ways of generating these functions, and simulation configuration options
that expect a function are identified by the `fn_...` prefix, for example
`fn_p_symptomatic`.

Functions are evaluated in the context of the simulation data frame, as it is
being built. These will always have a time and a count column, and may have other
columns too. The evaluation happens internally but here we use the `.ts_evaluate`
function to demonstrate the results.

```{r}
demo = tibble::tibble(
  time = 0:9,
  class = rep(c("one","two"),5),
  flag = c(rep(TRUE,5),rep(FALSE,5))
)
```

A static value can be supplied as a purrr style lambda.

```{r}
.ts_evaluate( ~ 0.5, demo )
```

A time varying function can be supplied as a purrr style lambda where the first
parameter is the `time` column of the `demo` dataframe

```{r}
.ts_evaluate( ~ ifelse(.x < 5, 2, 0.5), demo )
```

An alternative syntax uses an anonymous function with named parameters. The
time column is always shortened to `t`. This format can be used to key off other 
variables to allow a time varying, and class specific parameter to be returned

```{r}
.ts_evaluate( \(t, class) dplyr::case_when(
  class == "one" ~ "variant 1 R_t value",
  class == "two" ~ "variant 2 R_t value"
), demo )
```

If you don't know what columns are available providing an empty or otherwise
incorrect function results in the valid parameter names being displayed in the
error message:

```{r}
try(.ts_evaluate( \() {} , demo ))
```

## Step and linear functions

A common need is to set a number of fixed levels, or fixed knots and have
the function interpolate between them. The step function might be used to
parametrise $R_t$ in the event of a lock-down for example.

To make this simple most simulation functions will take a `changes` dataframe
that defines the time point and the new value of, either the reproduction number
or the growth rate at the time point. These can be used as parametrisation for
the reproduction number.

```{r}

changes = tibble::tibble(
  t = c(0,20,40,60,80), 
  growth = c(0.1,0,-0.1,0,0.1)
)

fn = cfg_step_fn(changes)
ggplot2::ggplot()+ggplot2::geom_function(fun = fn, xlim = c(0,100))+ggplot2::ylab("growth rate")

```

`cfg_linear_fn` produces a very similar effect with less abrupt changes in $R_t$
and changes occurring such that the value at the knot is correct.

```{r}

changes = tibble::tibble(
  t = c(0,10,20,30,40,50,60,70,80), 
  R = c(1,1.5,1.75,1.25,0.9,0.7,0.8,0.95,1)
)

fn = cfg_linear_fn(changes)
ggplot2::ggplot()+
  ggplot2::geom_function(fun = fn, xlim = c(0,100))+ggplot2::ylab("reproduction number")+
  ggplot2::geom_hline(yintercept=1,linetype="dashed")
  

```

## Random functions

For some applications (e.g. ascertainment, importation) it might useful to have
a value that includes some random noise, but with the an expected value of the
average. This is easiest with a `purrr` style lambda.

```{r}
# A random normally distributed value with mean 5 and SD 1
# `.x` here will be interpreted as time as the first parameter, and in this
# case is only used to size the returned random gaussian.
.ts_evaluate(~ stats::rnorm(.x,5,1), demo)
```

For convenience `ggoutbreak` includes some RNGs that are parametrised by mean and a
dispersion parameter.

```{r}

# A random Bernoulli parametrised by probability.
.ts_evaluate(~ rbern(.x, p = 0.5), demo)

# A beta distributed quantity parametrised by probability and dispersion (1-high to 0-none)
.ts_evaluate(~ rbeta2(.x, prob = 0.7, kappa = 0.1), demo)

# A log normal parametrised by mean and SD on the true scale
.ts_evaluate(~ rlnorm2(.x, mean = 5, sd =1), demo)

# A gamma parametrised by mean and SD on the true scale
.ts_evaluate(~ rgamma2(.x, mean = 5, sd =1), demo)
```

These functions can be combined with other logic to make a time dependent random
number, in this case the first 5 random gammas are drawn from a distribution 
with mean 1 and SD 1 and the last 5 from mean 6 and SD 1:

```{r}
# A random number from a gamma distribution parametrised by mean and SD on the true scale
# where the mean is a time varying value. Again here `.x` is the time and it is
# being used to define the number of returned value sand the mean of these values
.ts_evaluate(~ rgamma2(.x, mean = ifelse(.x < 5,1,6), sd =1), demo)

```

Although time dependence is the most likely scenario, any simulation component
can be used to control gamma and beta distributed quantities. This allows for
configuring variant specific hospitalisation rates, or delays for example.

```{r}

hospitalisation_prob_fn = cfg_beta_prob_rng(
  probability_fn = \(variant) ifelse(variant=="alpha", 0.2, 0.02),
  kappa_fn = ~ 0.1
)


demo = tibble::tibble(
  t = 1:200,
  variant = c(rep("wildtype",100),rep("alpha", 100))
) %>% dplyr::mutate(
  value = .ts_evaluate(hospitalisation_prob_fn, .)
)

ggplot2::ggplot(demo, ggplot2::aes(x=variant,y=value))+ggplot2::geom_point(position = "jitter")+
  ggplot2::ylab("probability hospitalisation")


```

## Delay distributions and delay RNGs

A key parameter is the generation interval or infectivity profile that defines
the delay between infection of infector and infectee in a transmission chain.
Other delays such as infection to symptoms, or infection to admission also need
to be parametrised. This is done using a empirical probability distribution as
described in [this vignette](infectivity-profile-discretisation.html) (referred
to as 'IP distributions'). In general IP distributions are either used for a
convolution of summary count data, to simulate the effect of delay on an
aggregate measure, or for random sampling to generate a delay for each
individual in a line list. There are a couple of ways of constructing these
for a simulation:

```{r}

gamma_ip = make_gamma_ip(median_of_mean = 5, median_of_sd = 1)
emp_ip = make_empirical_ip(omega = c(0,0,0,1,1,2,3,3,2,1,0,0))

p1=gamma_ip %>% plot_ip() + ggplot2::facet_wrap(~"gamma")
p2=emp_ip %>% plot_ip() + ggplot2::facet_wrap(~"empirical")
p1+p2+patchwork::plot_layout(nrow=1)#,axes = "collect")
```

The IP distributions are used directly for convolution in aggregate simulations. 
For line list simulations we want a random sample to apply to individuals. For 
gamma based IP distributions the `rgamma` or `rgamma2` functions could be used 
directly. For empirical distributions we can generate random numbers

```{r}
emp_sample_fn = cfg_ip_sampler_rng(emp_ip)

tmp = tibble::tibble(
  serial_interval = emp_sample_fn(2000)
)

p3 = p2 %above% ggplot2::geom_histogram(data=tmp,mapping=ggplot2::aes(x=serial_interval, y=ggplot2::after_stat(density)),
      breaks = c(0,seq(0.5,12)),fill="grey80",colour="grey40")
p3

```

These IP delay distributions may themselves change over time, or change as a
result of other characteristics of the simulation. This is difficult to demo
outside of the simulation, and we will give some examples later. In principle 
though a time varying function for selecting one of 2 IP delays might look like
this:

```{r}

# We start by defining a time varying function for the mean of a gamma
delay_mean_fn = cfg_linear_fn(tibble::tribble(
  ~t , ~delay,
  0, 3, # 3 day delays initially,
  4, 3, # Until day 4. Between day 4 and 10 delays improving
  10, 1 # by day 10 delays steady at 1 day
))

# We wrap this in a a function to calculate a IP for each time point
delay_fn = function(t) purrr::map( delay_mean_fn(t), function(mean) {
  make_gamma_ip(median_of_mean = mean, median_of_sd = sqrt(mean))
})

# Within the simulation this delay function will be evaluated for each day
# here we do it manually

times = 0:14
delay_t = delay_fn(t=times)

# this is a list of IP distributions.
tibble::tibble(ip = delay_t, t=times) %>% tidyr::unnest(ip) %>% dplyr::glimpse() %>%
  ggplot2::ggplot()+
  ggplot2::geom_rect(ggplot2::aes(xmin=t,xmax=t+1,ymin=a0,ymax=a1,fill = probability))+
  ggplot2::xlab("time")+ggplot2::ylab("delay")

```

For time varying delays in individual based models things are a little simpler.
We create RNG that has a mean for every individual in the simulation.

```{r}

delay_mean_fn = cfg_linear_fn(tibble::tribble(
  ~t , ~delay,
  0, 3, # 3 day delays initially,
  4, 3, # Until day 4. Between day 4 and 10 delays improving
  10, 1 # by day 10 delays steady at 1 day
))

delay_rng = function(t) { rgamma2(t, delay_mean_fn(t)) }

times = rep(0:14,1000)
delay = delay_rng(t=times)

tibble::tibble(delay = delay, t=times) %>%
  ggplot2::ggplot()+ggplot2::geom_boxplot(ggplot2::aes(x=as.factor(t),y=delay),outliers = FALSE)+
  ggplot2::xlab("time")+ggplot2::ylab("delay")

```

These approaches can be applied to any delay distribution, other than time to
infection such as time to hospitalisation, and so these can be made to vary on
other variables in the simulation such as variant class or potentially patient
age.

## Periodic functions

A periodic function is most obviously useful to generate seasonal forcing:

```{r}

# A simple to interpret and easy to make integrate to 0, so that growth remains
# bounded. 
growth_rate_fn = \(t) dplyr::case_when(
  t %% 365 < 40 ~ 0.1,
  t %% 365 < 80 ~ -0.1,
  TRUE ~ 0)
ggplot2::ggplot()+ggplot2::geom_function(fun = growth_rate_fn, xlim = c(0,365*2))+ggplot2::ylab("growth rate")

```

Other important uses for periodic functions is to simulate the patterns of
within week variation. This is often seen in testing data as a result of delays
to being tested, or to test processing over the weekend. To model this we
have two variants of weekly periodicity that use a gamma distribution for the
delay. One can be used represent the delay distribution for convolution in count
based simulations and another for random sampling in individual based simulations.

```{r}
gamma_means = c(1,1,1,1,4,3,2)
delay_fn = cfg_weekly_ip_fn(mean=gamma_means)

# delay_fn can be used as the parameter for a delay distribution convolution 
# format_ip takes a probability based delay distribution and gives us a summary
# the mean, sd, etc.
.ts_evaluate(delay_fn, tibble::tibble(t=1:7)) %>% 
  purrr::map_chr(format_ip)
```

```{r}
# for line list functions a RNG with weekly periodicity generates samples 
# of the interval that is consistent with the parameters passed to 
# `cfg_weekly_gamma_rng`
delay_rng = cfg_weekly_gamma_rng(mean=gamma_means)

tibble::tibble(t=rep(1:7,100)) %>%
  dplyr::mutate(sample = .ts_evaluate(delay_rng, .)) %>%
  dplyr::group_by(t) %>%
  dplyr::summarise(
    mean = mean(sample),
    sd = stats::sd(sample)
  ) %>%
  dplyr::mutate(
    expected_mean = gamma_means,
    expected_sd = sqrt(gamma_means)
  )

```

We will demonstrate this in a simulation later.

# Simulation cookbook

## Count based simulations

Count based models are typically based around a poisson process where the 
expected number of cases per day is expressed as a function of imports and 
growth, either in terms of the exponential growth rate, or a combination of the
reproduction number and infectivity profile (generation time).

### Basic growth rate possion model with ascertainment noise.

Daily count simulations are based around defined incidence rates, which can be
expressed in terms of a time dependent growth rate and an time dependent
importation rate. From this a incidence time series can be generated using a
poisson or negative binomial distribution. We can also define a simulation in
terms of time dependent reproduction number, time dependent infectivity profile,
and a time dependent importation rate.

```{r}

withr::with_options(list('day_zero'=Sys.Date()),{

  # define a growth rate time series
  changes = tibble::tibble(
    t = c(0,15,40,60,80), 
    growth = c(0.1,0.01,-0.075,0,0.05)
  )
  
  sim = sim_poisson_model(
    changes = changes,
    max_time = 120,
    time_unit = "1 day"
  )
  
  p1 = plot_changes(sim, "parametrised growth rate", date_breaks = "2 weeks") 
  p2 = ggoutbreak::plot_counts(sim, events = attr(sim,"events"), date_breaks = "2 weeks")
  p1+p2+patchwork::plot_annotation(tag_levels = "A")+plot_layout(ncol=1,axes = "collect")

})
```

This produces summary statistics only. With the growth rate and an infectivity
profile, we can use the methods of Wallinga and Lipsitch to infer a theoretical
value of $R_t$, giving us a ground truth to validate $R_t$ estimates.

```{r}

# COVID-19 generation time estimates from Ganyani et al 2020.
ip = make_gamma_ip(5.2, 3.78, 6.78, 1.72, 0.91, 3.93, epiestim_sampler=FALSE, epiestim_compat=FALSE)

fn_Rt = changes %>% 
  dplyr::transmute(
    t,
    R_t = wallinga_lipsitch(growth, y = ip$probability, a0 = ip$a0, a1 = ip$a1)) %>%
  cfg_step_fn()

plot_changes(sim, "parametrised reproduction number", fn=fn_Rt, date_breaks = "2 weeks") 


```

Simulated noise can be introduced by scaling the incidence rate by a random
ascertainment rate, which can vary over time. The randomness factor (`kappa` - a
measure of dispersion) is calibrated to range from 0 (none) to 1 (maximum).

```{r}

sim2 = dplyr::bind_rows(lapply(c(0,0.25,0.5,1), function(k) 
    sim %>% 
    sim_apply_ascertainment(
      fn_asc = ~ rbeta2(.x, prob = 0.7, kappa = k)
    ) %>%
    dplyr::mutate(ascertainment_noise = k) 
  )) %>%
  dplyr::group_by(ascertainment_noise)

ggoutbreak::plot_counts(sim2, events = attr(sim,"events"), date_breaks = "2 weeks")+
  ggplot2::facet_wrap(~sprintf("randomness: %1.2f",ascertainment_noise),ncol=2)

# Noise introduced by random day to day ascertainment variation:

```


In this example we select a time varying delay distribution, reflecting test
processing delays, due to setting up testing and then a public holiday. These
can be applied as a time-varying convolution filter on summary case count
simulations to simulate the effect that changing delay distributions has on
aggregate case counts. For example dynamic delays in test processing can result
in slow down and catch up anomalies.

```{r}

ip_mean_fn = cfg_linear_fn(tibble::tribble(
  ~t , ~delay,
  0, 3, # 3 day avg delays initially
  10, 1, # by day 10 delays decreased to 1
  25, 1, # remain steady until day 25
  26, 3, # public holiday on day 26 
  27, 3, # and day 27
  28, 1 # returning to normal on day 28
))

p_detection_fn = cfg_linear_fn(tibble::tribble(
  ~t , ~p,
  0, 0.1, # initial low probability of detection
  10, 0.7,
  90, 0.7,
  120,0.3
))

delay_detection_fn = ~ make_gamma_ip(median_of_mean = ip_mean_fn(.x))

sim3 = sim %>% sim_convolution(
  p_fn = p_detection_fn,
  delay_fn = delay_detection_fn,
  output = "detected"
) %>% sim_delayed_observation(
  input = "detected",
  delay_fn = ~ make_gamma_ip(7,median_of_sd = 10)
)

p1=plot_changes(sim3,name = "delays (days)", fn = ip_mean_fn)+ggplot2::ylim(0,NA)
p2=plot_changes(sim3,name = "detection", fn = p_detection_fn)+ggplot2::ylim(0,1)
p3=plot_counts(sim3,events = attr(sim3,"events"),mapping=ggplot2::aes(colour=statistic))+ggplot2::geom_line()

p1+p2+p3+patchwork::plot_layout(ncol=1,heights=c(1,1,2),axes = "collect")


```

### Seasonal outbreak using a reproduction number

We can simulate a seasonal outbreak by providing a periodic growth rate or $R_t$
as a parameter to a simple poisson model. In this example we also simulated a 
default set of delays which keying off an infection event, generate a delayed
set of symptom observations, test samples, results, hospitalisations and death 
time series.

```{r}


seasonal_rt_fn = \(t) dplyr::case_when(
  t %% 365 < 20 ~ 2.5,
  t %% 365 < 60 ~ 0.6,
  TRUE ~ 1
)

seasonal_imports = \(t) ifelse(
  t %% 365 < 5, 5, 0 
)


seasonal_sim = sim_poisson_Rt_model(
    max_time = 365*3,
    fn_Rt = seasonal_rt_fn,
    fn_imports = seasonal_imports,
    fn_ip = ~ make_gamma_ip(6)
  )

#profvis::profvis(
system.time({
  seasonal_sim = seasonal_sim %>% sim_apply_delay()
})
#)

p1=plot_changes(
        seasonal_sim, 
        name = "Rt", 
        fn = seasonal_rt_fn, 
        max_y = 3)

p2=plot_counts(
        seasonal_sim, 
        events = attr(seasonal_sim,"events"),
        mapping=ggplot2::aes(colour=statistic),size=0.25)+
      ggplot2::geom_line()+
      ggplot2::facet_wrap(~ statistic,ncol=1,scales = "free_y")

p1+p2+patchwork::plot_layout(ncol=1,axes = "collect",heights=c(1,10))


```

# Line list simulations

When we want to investigate estimates of delays such as the serial interval, or
rates such as the case fatality rate, it is necessary to simulate individuals
and their interaction. To do this we use a branching process model parametrised
directly with $R_t$. The time varying $R_t$ defines the expected number of
onward infections for each individual which is sampled using a poisson or
negative binomial distribution. Each infectee is then given a time of infection
depending on a specified infectivity profile, and the times of further events
such as symptom onset, hospitalisation, death, test sampling, and test results
can be assigned, following known event probabilities and delay distributions.
Summary counts of all these events indexed on the time of event, demonstrate
well known biases, such as right truncation. This simulation is directly
parametrised with a theoretical $R_t$ value; summary of the network edges gives
us a realised case reproduction number which is specific for an individual
simulation, and the force of infection gives us a instantaneous reproduction
number estimate for the individual simulation. From the parametrised infectivity
profile we can again use the methods of Wallinga and Lipsitch to solve for a
theoretical growth rate.

## Basic outbreak with delays and censoring

A minimal outbreak simulation using a branching process model takes a time
varying function for the reproduction number and a time varying function for the
number of imported cases. This generates a set of infection events for the 
specified time period (staring at day 0).

```{r}

# A simple outbreak
linelist = sim_branching_process(
   fn_Rt = ~ ifelse(.x < 40, 1.5, 0.95),
   max_time = 120,
   seed = 100,
   fn_imports = ~ ifelse(.x<10,8,0)
)

plot_cases(linelist, events = attr(linelist,"events"),individual=TRUE)+ggplot2::coord_cartesian()

```


From this set of infections we can add in a observations, based on the
probability of observation and the delay of observation. Both of these are time
varying functions, which can for example vary as a result of the day of the
week. In this example we add a "sequencing" column set that describes when a
sample was taken for genomic sequencing. This sampling is delayed slightly more
on infections arising at the weekend.

```{r}
linelist2 = linelist %>% sim_delay(
  p_fn = ~ 0.6, 
  delay_fn = cfg_weekly_gamma_rng(c(3,3,3,3,3,3.5,3.5), sd=1),
  output = "sequencing"
)

# The delay function results in additional columns prefixed by "sequencing"
# of reach individual.
linelist2 %>% dplyr::glimpse()

```

With this additional date we can summarise both time points into daily counts 
of infections, and in this case "sequencing" samples, which shows some periodicity
as a result of the extended delay over the weekend. However it is common to 
also have some sort of delay reporting the results of a test. In this case we have added
a longish delay in getting the results back. As we are aggregating on sample date
we will therefore have a right censoring effect on positive sample counts.

```{r}
summary = linelist2 %>% sim_summarise_linelist(
    censoring = list(
      "sequencing" = ~ rgamma2(.x, mean = 14)
    )
  )

# plot the counts for comparison.
p1=plot_changes(summary,"Rt",max_y = 2.0, case = TRUE)
p2=plot_counts(summary,events = attr(summary,"events"), mapping=ggplot2::aes(colour=statistic),size=0.5)+ggplot2::geom_line()
patchwork::wrap_plots(p1,p2,ncol=1,axes="collect")

```


## Variant introduction: Alpha example

If the infectivity profile is changing dynamically over the simulation it alters
the relationship between reproduction number and growth rate.

For line list models the infectivity profile could potentially vary from
individual to individual depending on the variant they are infected with, which
can be a cause of more rapid growth of a variant. It is hard to differentiate
between a higher reproduction number and a shorter generation time, especially 
in the short term. However a decreased generation time will lead to a steeper 
but flatter curve compared to an increased $R_t$, which will tend to take off
more gradually, from the trajectory of the original variant, but curve away more
quickly as the higher exponential rate kicks in.


```{r}

variant_imports_df = tibble::tribble(
  ~time, ~variant, ~count,
  0:4, "wild-type", 5,
  20:24, "alpha", 5
)

# The fist scenario is a two variant outbreak with one variant with a shorter
# generation time.

variant_ip_fn = cfg_gamma_ip_fn(
    mean_fn = \(variant) dplyr::case_when(
      # Two infectivity profiles one for each variant.
      variant=="wild-type" ~ 8,
      variant=="alpha" ~ 5) # variant 2 has a shorter generation time
)


scenario1 = sim_branching_process(
  max_time = 120,
  fn_Rt = ~ 1.15, # Both variants have the same fixed R_t
  fn_ip = variant_ip_fn,
  imports_df = variant_imports_df,
  seed=101
)


# The second scenario involves two variants one with a slight reproduction 
# number advantage. Both have the same (longer) generation time

variant_Rt_fn = \(t,variant) dplyr::case_when(
  variant=="wild-type" ~ 1.15,
  variant=="alpha" ~ 1.3
)

scenario2 = sim_branching_process(
  max_time = 120,
  fn_Rt = variant_Rt_fn, 
  fn_ip = ~ make_gamma_ip(8), # Both variants have the same fixed ip with mean 8
  imports_df = variant_imports_df,
  seed=100
)

comparison = dplyr::bind_rows(
  scenario1 %>% sim_summarise_linelist(variant) %>% dplyr::mutate(scenario = "1 - Shorter GT"),
  scenario2 %>% sim_summarise_linelist(variant) %>% dplyr::mutate(scenario = "2 - Rt advantage")
)

p1=plot_changes(comparison,"Rt", max_y = 2.5,mapping = ggplot2::aes(colour=variant))+ggplot2::facet_wrap(~scenario)
p2=plot_counts(comparison %>% dplyr::group_by(scenario,variant), mapping = ggplot2::aes(colour=variant))+ggplot2::facet_wrap(~scenario)

patchwork::wrap_plots(p1,p2,ncol=1,axes="collect")

```

## Two variants: Delta outbreak

In the pandemic Delta started spreading whilst Alpha was declining. This 
could not have been due to the generation time, which cannot change the overall
direction of the epidemic on its own. It must have been associated with a more
transmissible nature.

```{r}

variant_imports_df = tibble::tribble(
  ~time, ~variant, ~count,
  -20:-1, "alpha", 300,
  15:19, "delta", 5
)

variant_Rt_fn = \(t,variant) dplyr::case_when(
  t < 0 ~ 0,
  variant=="alpha" ~ 0.8,
  variant=="delta" ~ 1.3
)

scenario = sim_branching_process(
  max_time = 120,
  fn_Rt = variant_Rt_fn, 
  fn_ip = ~ make_gamma_ip(8), # Both variants have the same fixed ip with mean 8
  imports_df = variant_imports_df,
  seed=100
)

summary1 =  scenario %>% sim_summarise_linelist(variant) %>% dplyr::filter(time >= 0) 
summary2 =  scenario %>% sim_summarise_linelist() %>% dplyr::filter(time >= 0)

p1=plot_counts(summary1, mapping=ggplot2::aes(colour=variant))
p2=plot_counts(summary2)

p3=ggplot2::ggplot(summary1, ggplot2::aes(x=as.Date(time), colour=variant))+
    ggplot2::geom_point(ggplot2::aes(y=rt.inst))+
    ggplot2::geom_line(ggplot2::aes(y=rt.weighted))+
    ggplot2::coord_cartesian(ylim=c(0,3.5))+
    ggplot2::xlab(NULL)

p4=ggplot2::ggplot(summary2, ggplot2::aes(x=as.Date(time)))+
    ggplot2::geom_point(ggplot2::aes(y=rt.inst))+
    ggplot2::geom_line(ggplot2::aes(y=rt.weighted))+
    ggplot2::coord_cartesian(ylim=c(0,3.5))+
    ggplot2::xlab(NULL)

patchwork::wrap_plots(p1,p2,p3,p4,ncol=2,axes="collect")


```

## Over-dispersion

In the branching process model, each infected individual has a daily probability
of infecting someone based on the current reproduction number and the
infectivity profile. This is sampled using either a poisson or a negative
binomial distribution depending on a dispersion parameter (`kappa==1` is poisson
distributed, anything larger than this is a negative binomial). This can be made
time dependent, so that we can simulate changes in response to, for example,
shutting social venues. Very over-dispersed outbreaks have a higher chance of 
becoming extinct in the early stage for the same reproduction number, but this 
is not very realistic. More likely is a scenario where a subset of the 
population have a high super-spreading potential.

```{r}

kappa = c(1,1.5,2,3,4)

comparison = dplyr::bind_rows(lapply(kappa, function(k) {
  
  sim_branching_process(
    max_time = 80,
    fn_Rt = ~ 2, # Both variants have the same fixed R_t
    fn_ip = ~ make_gamma_ip(8),
    fn_import = ~ ifelse(.x<10,4,0),
    fn_kappa = ~ k,
    seed=101
  ) %>% 
    sim_summarise_linelist() %>%
    dplyr::mutate(scenario = sprintf("Dispersion: %1.2f",k))
  
}))

plot_counts(comparison %>% dplyr::group_by(class=scenario))+scale_y_log1p()
```

## Age stratification & contact matrices

We can specify metadata in the branching process model. In the example above it
was the variant, and was transmitted to the infectee. If we can define a probabilistic
mapping between metadata classes we can simulate stratified populations such as 
by age. To do this we define a function that reacts to the metadata and assigns
a infectee class. We also need to define the initial metadata for imported cases.

```{r}

# This simulation is seeded in the younger age groups:
strat_imports_df = tibble::tribble(
  ~time, ~age_cat, ~count,
  0:4, "child", 5,
  0:4, "adolescent", 5
) %>% dplyr::mutate(age_cat = factor(age_cat,levels = c("child","adolescent","adult","elderly")))

# The next generation function in this form is similar to a 
# next generation matrix however it does not include R_t components: 
strat_fn_list_next_gen = list(
  "age_cat" = ~ dplyr::case_when(
    .x == "child" ~ rcategorical(.x, prob = c("child"=0.4,"adolescent"=0.2,"adult"=0.4,"elderly"=0)),
    .x == "adolescent" ~ rcategorical(.x, prob = c("child"=0.2,"adolescent"=0.4,"adult"=0.3,"elderly"=0.1)),
    .x == "adult" ~ rcategorical(.x, prob = c("child"=0.1,"adolescent"=0.1,"adult"=0.4,"elderly"=0.4)),
    .x == "elderly" ~ rcategorical(.x, prob = c("child"=0,"adolescent"=0.2,"adult"=0.3,"elderly"=0.5)),
  )
)

strat_fn_Rt = \(age_cat) dplyr::case_when(
  age_cat == "child" ~ 1.2,
  age_cat == "adolescent" ~ 1.2,
  age_cat == "adult" ~ 1.2,
  age_cat == "elderly" ~ 2,
)

strat_bpm = sim_branching_process(
    max_time = 80,
    fn_Rt = strat_fn_Rt,
    fn_ip = ~ make_gamma_ip(8),
    imports_df = strat_imports_df,
    fn_list_next_gen = strat_fn_list_next_gen,
    seed=101
  ) %>% dplyr::mutate(age_cat = factor(age_cat,levels = c("child","adolescent","adult","elderly")))

plot_cases(strat_bpm,mapping = ggplot2::aes(fill=age_cat),individual=TRUE)+ggplot2::coord_cartesian()


```

# Future steps

```{r}
#TODO: defaults in function that line up with sim_apply_delay_convolution
#' # move this to vignette examples
#' tmp2 = tmp %>% sim_apply_delay(
#'   fn_p_symptomatic = ~ 0.8,
#'   fn_symptom_delay = ~ stats::rgamma(.x, shape = 3),
#'   fn_p_tested = \(symptomatic,...) ifelse(symptomatic, 0.8, 0.1),
#'   fn_sample_delay = ~ stats::rgamma(.x, shape = 2),
#'   fn_result_delay = \(sample_time,...) dplyr::case_when(
#'       floor(sample_time) %% 7 %in% c(0,1,2,3,4) ~ stats::rgamma(sample_time, shape=2), # Sun, Mon to Thurs
#'       floor(sample_time) %% 7 == 5 ~ stats::rgamma(sample_time, shape=4), # Fri
#'       floor(sample_time) %% 7 == 6 ~ stats::rgamma(sample_time, shape=3), # Sat
#'   )
#' )
#' 
#' 
#' 
#' # Delay distribution example:
#' tmp3 = sim_branching_process(
#'   changes = tibble::tibble(t = c(0,20,40,60,80,110), R_t = c(1.8,1.5,0.9,1.5,0.8,1.2)),
#'   kappa = 1,
#'   max_time = 120,
#'   seed = 100,
#'   summarise = FALSE
#'  ) %>% sim_apply_delay(
#'   fn_p_symptomatic = ~ 0.8,
#'   fn_symptom_delay = ~ stats::rgamma(.x, shape = 3),
#'   fn_p_symptomatic = ~ 0.8,
#'   fn_symptom_delay = ~ stats::rgamma(.x, shape = 3),
#'   fn_p_tested = \(symptomatic,...) ifelse(symptomatic, 0.8, 0.1),
#'   fn_sample_delay = \(symptomatic, symptom_time, ...) dplyr::case_when(
#'       !symptomatic ~ stats::runif(symptomatic, max = 14),
#'       TRUE ~ cfg_weekly_gamma_rng(c(1,1,1,1,3,2,2))(symptom_time)
#'   ),
#'   fn_result_delay = \(sample_time,...) cfg_weekly_gamma_rng(c(1,1,1,1,3,2,2))(sample_time)
#'  )
#'
#' tmp4 = tmp3 %>% sim_summarise_linelist()
#'
#' ggplot2::ggplot(tmp4 %>% dplyr::filter(!is.na(count)), ggplot2::aes(x=time, colour=statistic))+
#'   ggplot2::geom_line(ggplot2::aes(y=count))+
#'   ggplot2::facet_wrap(~statistic,ncol=1)
#' }
#'
#' #' tmp2 = sim_branching_process(
#'   changes = tibble::tibble(t = c(0,20,40,60,80,110), R = c(1.8,1.5,0.9,1.5,0.8,1.2)),
#'   kappa = 1,
#'   max_time = 120,
#'   seed = 100
#' ) %>% sim_summarise_linelist()
#'
#' ggplot2::ggplot(tmp2, ggplot2::aes(x=time))+
#'   ggplot2::geom_point(ggplot2::aes(y=count))+
#'   ggplot2::geom_line(ggplot2::aes(y=original))
#'
# ggplot2::ggplot(tmp2, ggplot2::aes(x=time))+
#    ggplot2::geom_step(ggplot2::aes(y=rt.weighted), colour="black")+
#    ggplot2::geom_point(ggplot2::aes(y=rt.inst, colour="Instantaneous"))+
#    ggplot2::geom_point(ggplot2::aes(y=rt.case, colour="Case"))+
#    ggplot2::coord_cartesian(ylim=c(0,3))

```