Title: Estimate Incidence, Proportions and Exponential Growth Rates
Description: Simple statistical models and visualisations for calculating the incidence, proportion, exponential growth rate, and reproduction number of infectious disease case time series. This toolkit was largely developed during the COVID-19 pandemic.
Authors: Robert Challen [aut, cre, cph]
Maintainer: Robert Challen <[email protected]>
License: MIT + file LICENSE
Version: 0.3.1
Built: 2024-11-14 03:19:51 UTC
Source: https://github.com/ai4ci/ggoutbreak
Generate a multinomial outbreak defined by per-class growth rates and a Poisson model
.test_multinomial(
  changes = tibble::tibble(
    time = c(0, 20, 40, 60, 80),
    variant1 = c(0.1, 0, -0.1, 0, 0.1),
    variant2 = c(0.15, 0.05, -0.05, -0.01, 0.05),
    variant3 = c(0, 0.05, -0.05, 0.05, -0.05)
  ),
  initial = c(100, 1, 100),
  ...
)
changes: a list of time points and growth rates per week per class
initial: the size of the initial outbreak per class
...: arguments passed on to

A case count time series including class, count and time columns.
if (interactive()) {
  ggplot2::ggplot(
    .test_multinomial(),
    ggplot2::aes(x = time, y = count, colour = class)
  ) + ggplot2::geom_point()
}
Convert time period to dates
## S3 method for class 'time_period'
as.Date(x, ...)

## S3 method for class 'time_period'
as.POSIXct(x, ...)
x: a time_period
...: not used

A vector of dates representing the start of each of the input time_period entries.
as.POSIXct(time_period): Convert to a vector of POSIXct
Time periods are just a zero-based numeric representation of dates with a time unit baked in. This allows variable-length periods (e.g. days or weeks) and fractional days to be represented in a consistent(ish) way.
as.time_period(x, unit = NULL, start_date = NULL, anchor = NULL, ...)

## S3 method for class 'time_period'
c(..., recursive = FALSE)

## S3 method for class 'time_period'
x[...]

## S3 replacement method for class 'time_period'
x[...] <- value

## S3 method for class 'time_period'
x[[...]]

## S3 replacement method for class 'time_period'
x[[...]] <- value

is.time_period(x)

## S3 method for class 'time_period'
print(x, ...)
x: a vector of numbers (may be integer or real) or a time_period
unit: the length of one unit of time. This will be either an integer number of days, or a specification such as "1 week", or another
start_date: the zero time date, as something that can be coerced to a date. If the
anchor: only relevant if
...: used for subtype implementations
recursive: concatenate recursively
value: the value
A time_period class, consisting of a vector of numbers, with attributes for time period and start_date.
c(time_period): combine time_periods
`[`(time_period): subset a time_period
`[`(time_period) <- value: assign values to a subset of a time_period
`[[`(time_period): get a value in a time_period
`[[`(time_period) <- value: assign a value in a time_period
is.time_period(): check whether something is a time_period
print(time_period): print a time_period
# 100 weeks from 2020-01-01
tmp = as.time_period(0:100, 7, "2020-01-01")
as.Date(tmp)
range(tmp)
min(tmp)
tmp2 = as.integer(as.Date(tmp))
# testthat::expect_true(all(na.omit(tmp2 - lag(tmp2)) == 7))
tmp2 = as.time_period(0:23, 1/24, "2020-01-01")
as.POSIXct(tmp2)

# convert timeseries to new "unit"
tmp = as.time_period(0:100, 7, "2020-01-01")
tmp2 = as.time_period(tmp, 1)
testthat::expect_equal(as.numeric(tmp2), 0:100*7)
A scales breaks generator for log1p scales
breaks_log1p(n = 5, base = 10)
n: the number of breaks
base: the base for the breaks

A function for ggplot scale breaks.
ggplot2::ggplot(ggplot2::diamonds, ggplot2::aes(x = price)) +
  ggplot2::geom_density() +
  ggplot2::scale_x_continuous(trans = "log1p", breaks = breaks_log1p())
The covid_infectivity_profile dataframe structure specification
A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
time (positive_double) - the end of the time period (in days)
probability (proportion) - the probability of infection between the end of the previous time period and time
Must be grouped by: boot (exactly).
A default value is defined.
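A dataframe with this structure can be sketched by discretising a continuous distribution. Everything numeric below is illustrative (the gamma parameters are placeholders, not the package's actual COVID-19 profile); only the column layout follows the specification above.

```r
# Illustrative sketch: a single-bootstrap infectivity profile with the
# documented columns (boot, time, probability), built by discretising a
# gamma distribution into daily periods. Parameters are placeholders.
times <- 1:14                                             # end of each daily period
probability <- diff(pgamma(c(0, times), shape = 2.5, rate = 0.5))
profile <- data.frame(
  boot = 1,                  # bootstrap identifier (default 1)
  time = times,              # positive_double: end of period in days
  probability = probability  # proportion: P(infection) in that period
)
sum(profile$probability)     # close to 1 over the support
```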
The counterpart to date_seq_dates(). Take an original set of data and place it within a regular time series, where the periodicity of the time series may be expressed as a number of days, weeks, months, quarters, or years, and the periods are defined by an anchoring date, a day of the week, or by reference to the start or end of the input dates. This can either return the periods as dates or factors (e.g. for plotting) or as a time_period for analysis that relies on a numeric representation of the date or duration from the anchor.
cut_date(
  dates,
  unit,
  anchor = "start",
  output = c("date", "factor", "time_period"),
  dfmt = "%d/%b/%y",
  ifmt = "{start} — {end}",
  ...
)
dates: a set of dates
unit: a period, e.g. "1 week"
anchor: one of a date, "start" or "end", or a weekday name, e.g. "mon"; this will always be the start of one of the time periods we are cutting into
output: return the result as either a "date" (the default), an ordered "factor" with the date ranges as a label, or a "time_period". The result is named with labels referring to the
dfmt: the
ifmt: a
...: ignored

A set of dates, times or factor levels, representing the start of the period the date falls into, where the period is defined by the duration and the anchor.
dates = as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-03",NA))
fs = ggoutbreak::date_seq(dates, "2 days")
dates - cut_date(dates, "2 days")
cut_date(dates, unit = "2 days", output = "time_period")

# A weekly set of dates:
dates2 = Sys.Date() + floor(stats::runif(50, max = 10)) * 7

# in this specific situation the final date is not truncated because the
# input data is seen as an exact match for the whole output period.
cut_date(dates2, "1 week", "sun", output = "factor")
cut_date(dates2, dfmt = "%d/%b", output = "factor", unit = "2 weeks", anchor = "sun")
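For comparison, base R's cut.Date implements a much simpler version of this binning: it anchors only at the earliest date and returns a factor, with none of cut_date()'s anchor, output or labelling options.

```r
# Base R analogue (sketch): bin dates into regular two-day periods.
dates <- as.Date("2020-01-01") + 0:5
cut(dates, breaks = "2 days")
# the six dates fall into three two-day bins starting
# 2020-01-01, 2020-01-03 and 2020-01-05
```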
This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.
date_seq(x, period, ...)
x: a numeric or date vector
period: the gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.
...: for subtype methods

A vector of the same type as the input.
date_seq(c(1, 2, 4, 5, 10), 1)
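The underlying idea can be sketched in base R with a hypothetical helper (not the package implementation): check that the observations lie on a regular grid, then fill the gaps with seq().

```r
# Hypothetical helper illustrating date_seq() on numeric input: verify
# the stated periodicity, then return the complete regular sequence.
fill_gaps <- function(x, period = 1, tol = 1e-6) {
  offsets <- (x - min(x)) / period
  if (any(abs(offsets - round(offsets)) > tol)) {
    stop("data is not of the stated periodicity")
  }
  seq(min(x), max(x), by = period)
}
fill_gaps(c(1, 2, 4, 6), 1)  # returns 1 2 3 4 5 6
```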
Derive from a vector of observation dates a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years, etc. E.g. this can convert a random set of dates to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01), a day of the week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly, ending on the last date of the data). There is also a problem when we consider data that may have incomplete start and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.
## S3 method for class 'Date'
date_seq(x, period = .day_interval(x), anchor = "start", complete = FALSE, ...)
x: a vector of dates, possibly including NA values
period: the gap between observations, as a number of days or as a natural language definition of the period such as "1 week", "2 weeks", "1 month", etc. If not given this will be derived from the dates.
anchor: defines a day that appears in the sequence (if it were to extend that far). Given either as a date, or "start", "end" or a day of the week, e.g. "mon".
complete: truncate incomplete start and end periods
...: ignored

A vector of dates for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.
date_seq(as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-01",NA)), "2 days")
This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.
## S3 method for class 'numeric'
date_seq(x, period = 1, tol = 1e-06, ...)
x: a numeric or date vector
period: the gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.
tol: numerical tolerance for checking periodicity
...: for subtype methods

A vector of the same type as the input.
date_seq(c(1, 2, 4, 5, 10), 1)
Expand a time_period vector to the full range of possible times

Derive, from a vector of observation time_periods, a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years, etc. E.g. this can convert a random set of times to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01), a day of the week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly, ending on the last date of the data). There is also a problem when we consider data that may have incomplete start and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.
## S3 method for class 'time_period'
date_seq(x, period = attributes(x)$unit, complete = FALSE, ...)
x: a time period vector
period: the gap between observations, as a number of days or as a natural language definition of the period such as "1 week", "2 weeks", "1 month", etc. If not given this will be derived from the dates.
complete: truncate incomplete start and end periods
...: ignored

A vector of time_periods for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.
tmp = as.time_period(c(0,10,100), 7, "2020-01-01")
date_seq(tmp, "7 days")
date_seq(tmp, "1 day")
Using a start_date and a unit specification
date_to_time(
  dates,
  unit = .day_interval(dates),
  start_date = getOption("day_zero", "2019-12-29")
)
dates: a vector of dates to convert
unit: a specification of the unit of the resulting time series. Will be determined from the periodicity of the dates if not specified. If another
start_date: the origin of the conversion. Defaults to the beginning of the COVID-19 pandemic.

A vector of class time_period.
times = date_to_time(as.Date("2019-12-29") + 0:100, "1 week")
dates = time_to_date(times)
The unit of doubling times is always days.
doubling_time(x, ...)
x: a dataframe calculated from either proportion or incidence growth rate estimates
...: not used

The same dataframe with additional columns for doubling time or relative doubling time, plus confidence intervals.
ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window = 21) %>%
  ggoutbreak::doubling_time() %>%
  dplyr::glimpse()
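The relationship being exploited is standard: under exponential growth C(t) = C0 * exp(r * t), with r per day, the doubling time T solves exp(r * T) = 2, i.e. T = log(2) / r days. A minimal sketch with illustrative numbers (not the package internals, which also propagate confidence intervals):

```r
# Doubling time from a continuous per-day growth rate r:
# exp(r * T) = 2  =>  T = log(2) / r
growth_rate <- 0.1     # 10% per day, illustrative
log(2) / growth_rate   # doubling time: about 6.93 days
# a negative rate implies decay; the same formula on |r| gives a halving time
log(2) / abs(-0.05)    # halving time: about 13.86 days
```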
SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the growth rate of the epidemic from various data sources (including an early version of ggoutbreak).
data(england_consensus_growth_rate)
A dataframe containing the following columns:
date (date) - the date of the estimate
low (numeric) - the lower published estimate of the growth rate
high (numeric) - the higher published estimate of the growth rate
No mandatory groupings.
No default value.
111 rows and 3 columns
SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the reproduction number of the epidemic from various data sources.
data(england_consensus_rt)
A dataframe containing the following columns:
date (date) - the date of the estimate
low (numeric) - the lower published estimate of the reproduction number
high (numeric) - the higher published estimate of the reproduction number
No mandatory groupings.
No default value.
113 rows and 3 columns
A dataset of the daily count of COVID-19 cases by age group in England, downloaded from the UKHSA coronavirus API and formatted for use in ggoutbreak. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates.
data(england_covid)
A dataframe containing the following columns:
date (as.Date) - the date column
class (enum(00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 35_39, 40_44, 45_49, 50_54, 55_59, 60_64, 65_69, 70_74, 75_79, 80_84, 85_89, 90+)) - the class column
count (numeric) - the test positives for each age group
denom (numeric) - the test positives for all age groups
time (time_period) - the time column
Must be grouped by: class (and other groupings allowed).
No default value.
26790 rows and 5 columns
The coronavirus.gov.uk dashboard published tests conducted and positive results as separate data sets for a range of geographies. In this case the data is combined, with testing rate as denominator and positives as count, for England.
data(england_covid_pcr_positivity)
A dataframe containing the following columns:
date (date) - a daily time series
time (time_period) - the time column
count (numeric) - test positives in England on that day
denom (numeric) - total tests conducted on that day
No mandatory groupings.
No default value.
1413 rows and 4 columns
An age group stratified dataset from
data(england_covid_proportion)
A dataframe containing the following columns:
class (character) - the age group
date (date) - the start date of a week
count (numeric) - the count of COVID positives
denom (numeric) - the number of COVID tests performed
population (numeric) - the size of the population at this age group
time (time_period) - the time column (weekly)
Must be grouped by: class (and other groupings allowed).
No default value.
1050 rows and 6 columns
the coronavirus.gov.uk site for positive cases aggregated to 10 year age groups and by weekly time.
NHS test and trace data, which reported regional by age group testing effort, aggregated to country level.
ONS 2021 census population aggregated to 10 year age groups.
Population counts by 5 year age group for England only from the 2021 census.
data(england_demographics)
A dataframe containing the following columns:
class (enum(00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 35_39, 40_44, 45_49, 50_54, 55_59, 60_64, 65_69, 70_74, 75_79, 80_84, 85_89, 90+)) - the class column
population (numeric) - the population count column
baseline_proportion (numeric) - the baseline proportion is the proportion this age group makes up of the total.
Must be grouped by: class (and other groupings allowed).
No default value.
19 rows and 3 columns
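A baseline_proportion column of this kind can be recomputed from the population column alone; a sketch with made-up numbers (not the actual census figures):

```r
# baseline_proportion = share of the total population in each class
demog <- data.frame(
  class = c("00_04", "05_09", "10_14"),  # illustrative classes
  population = c(3000, 3200, 3300)       # made-up counts
)
demog$baseline_proportion <- demog$population / sum(demog$population)
sum(demog$baseline_proportion)           # proportions sum to 1
```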
https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationandhouseholdestimatesenglandandwalescensus2021/census2021/census2021firstresultsenglandwales1.xlsx
This includes mainly the dates of lockdowns, releases from social distancing measures and the dates that new variants were first detected.
data(england_events)
A dataframe containing the following columns:
label (character) - the event label
start (date) - the event start date
end (date) - the (optional) event end date
No mandatory groupings.
No default value.
13 rows and 3 columns
Check-in (social activity) and alert (self-isolation instruction) data from the NHS COVID-19 app, aggregated to country level on a week-by-week basis.
data(england_nhs_app)
A dataframe containing the following columns:
date (date) - the start date of the week
alerts (integer) - the count of self-isolation alerts
visits (integer) - the number of venue check-ins representing visits to social venues.
time (time_period) - the time column
No mandatory groupings.
No default value.
137 rows and 4 columns
The COVID-19 ONS infection survey took a random sample of the population and provides an estimate of the prevalence of COVID-19 that is supposedly free from ascertainment bias.
data(england_ons_infection_survey)
A dataframe containing the following columns:
date (date) - the date column
geography (character) - the geography column
proportion.0.5 (numeric) - the median proportion of people in the region testing positive for COVID-19
proportion.0.025 (numeric) - the lower CI of the proportion of people in the region testing positive for COVID-19
proportion.0.975 (numeric) - the upper CI of the proportion of people in the region testing positive for COVID-19
denom (integer) - the sample size on which this estimate was made (daily rate inferred from weekly sample sizes.)
time (time_period) - the time column
No mandatory groupings.
No default value.
9820 rows and 7 columns
The data is available here: https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/coronaviruscovid19infectionsurveydata/2023/20230310covid19infectionsurveydatasetsengland.xlsx
Data from the COG-UK and Sanger centre sequencing programme. The data were made available through the Wellcome foundation at lower tier local authority level, and are a weekly time series of counts per variant. Variants were assigned using the tree structure of the Pango lineage. Different sub-lineages are aggregated to the major WHO variants of concern.
data(england_variants)
A dataframe containing the following columns:
date (date) - the end date of the week
time (time_period) - the time column
class (enum(Other, Alpha (B.1.1.7), Delta (B.1.617.2), Delta (AY.4), Omicron (Other), Omicron (BA.2), Omicron (BA.4), Omicron (BA.5), XBB (Other), Kraken (XBB.1.5), Arcturus (XBB.1.16), Eris (EG.5.1))) - the class column
who_class (enum(Other, Alpha, Delta, Omicron, Kraken, Arcturus, Eris)) - the who_class column
count (numeric) - the weekly count column
denom (numeric) - the number of sequences performed in that week
Must be grouped by: class (and other groupings allowed).
No default value.
479 rows and 6 columns
Format date as dmy
fdmy(date)
date: a date to convert

The formatted date.
fdmy(Sys.Date())
The x axis must be a date.
geom_events(
  events = i_events,
  event_label_size = 7,
  event_label_colour = "black",
  event_label_angle = -30,
  event_line_colour = "grey50",
  event_fill_colour = "grey50",
  hide_labels = FALSE,
  guide_axis = ggplot2::derive(),
  ...
)
events: significant events or time spans; a default value is defined
event_label_size: how big to make the event label
event_label_colour: the event label colour
event_label_angle: the event label angle
event_line_colour: the event line colour
event_fill_colour: the event area fill colour
hide_labels: do not show labels at all
guide_axis: a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position, amongst other things.
...: arguments passed on to

A set of geoms for a timeseries.
A dataset of the weekly count of COVID-19 cases by age group in Germany, downloaded from the Robert Koch Institute SurvStat service and formatted for growth rate estimation. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates.
data(germany_covid)
A dataframe containing the following columns:
class (enum(0–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80+, Unknown, .ordered=TRUE)) - the age group
date (as.Date) - the date column
count (integer) - the test positives for each age group
time (time_period) - the time column
denom (integer) - the test positives for all age groups
Must be grouped by: class (and other groupings allowed).
No default value.
2070 rows and 6 columns
Derived from the Robert Koch Survstat service by comparing counts and incidence rates.
data(germany_demographics)
A dataframe containing the following columns:
class (enum(0–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80+, .ordered=TRUE)) - the class column
population (integer) - the population column
Must be grouped by: class (and other groupings allowed).
No default value.
10 rows and 2 columns
Check whether vector is a date
is.Date(x)
x: a vector to check

TRUE if dates, FALSE otherwise.
is.Date(Sys.Date())
Create a set of labels for a time period based on the start and duration of the period. The format is configurable using the start and end dates and the dfmt and ifmt parameters; however, if the time period has names then these are used in preference.
## S3 method for class 'time_period'
labels(
  object,
  ...,
  dfmt = "%d/%b",
  ifmt = "{start} — {end}",
  na.value = "Unknown"
)
object: a set of decimal times as a time_period
...: not used
dfmt: a
ifmt: a glue spec referring to
na.value: a label for

A set of character labels for the time.
eg = as.time_period(Sys.Date() + 0:10*7, anchor = "start")
labels(eg)
labels(eg, ifmt = "{start}", dfmt = "%d/%b/%y")
labels(eg, ifmt = "until {end}", dfmt = "%d %b %Y")

# labels retained in constructor:
eg2 = Sys.Date() + 0:10*7
names(eg2) = paste0("week ", 0:10)
labels(eg2)
labels(as.time_period(eg2, anchor = "start"))
Perform logit scaling with correct axis formatting. Not to be used directly, but with ggplot (e.g. ggplot2::scale_y_continuous(trans = "logit")).
logit_trans(n = 5, ...)
n: the number of breaks
...: not used, for compatibility

A scales object.
library(ggplot2)
library(tibble)

tibble::tibble(pvalue = c(0.001, 0.05, 0.1), fold_change = 1:3) %>%
  ggplot2::ggplot(aes(fold_change, pvalue)) +
  ggplot2::geom_point() +
  ggplot2::scale_y_continuous(trans = "logit")
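The transform pair behind a logit scale is available in base R: qlogis() computes log(p / (1 - p)) and plogis() is its inverse, which is the forward/inverse pair a scales transformation object needs.

```r
# Forward and inverse logit via base R (sketch of what the scale uses)
p <- c(0.001, 0.05, 0.1)
qlogis(p)           # log(p / (1 - p)), maps (0,1) onto the real line
plogis(qlogis(p))   # inverse round-trips to the original proportions
```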
max.Date returns an integer and -Inf for a set of NA dates. This is usually inconvenient.
max_date(x, ...)
x: a vector of dates
...: ignored

A date; "0001-01-01" if there is no well defined maximum.
max_date(NA)
min.Date returns an integer and Inf for a set of NA dates. This is usually inconvenient.
min_date(x, ...)
x: a vector of dates
...: ignored

A date; "9999-12-31" if there is no well defined minimum.
min_date(NA)
Takes a list of times, classes and counts, e.g. a COG-UK variant-like data set with time, (multinomial) class (e.g. variant) and count being the count in that time period. Fits a quadratic B-spline on time to the proportion of the data using nnet::multinom, with approximately one degree of freedom per class and per window units of the time series.
multinomial_nnet_model(
  d = i_multinomial_input,
  ...,
  window = 14,
  frequency = "1 day",
  predict = TRUE
)
d: multiclass count input
...: not used; present to allow the proportion model to be used in a group_modify
window: the number of data points between knots; smaller values result in less smoothing, larger values in more
frequency: the density of the output estimates
predict: return a prediction. If FALSE we return the model.

A new dataframe with time (as a time period), class, and proportion.0.5 columns, or a model object.
if (FALSE) {
  # not run due to long run time
  tmp = ggoutbreak::england_covid %>%
    dplyr::filter(date > "2022-01-01") %>%
    ggoutbreak::multinomial_nnet_model(window = 21) %>%
    dplyr::glimpse()
}
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_incidence(
  modelled = i_timeseries,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)
modelled: model output from processing the
...: not used
population_unit: what population unit do you want the incidence in, e.g. per 100K
normalise_time: the default behaviour for incidence is to keep it in the same time units as the input data. If this parameter is set to
A dataframe with incidence rates per unit capita, containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale
incidence.per_capita.se.fit (double) - the standard error of the incidence per capita rate estimate on a log scale
incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)
incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)
incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)
population_unit (double) - The population unit on which the per capita incidence rate is calculated
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window = 21) %>%
  ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
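The arithmetic being applied is simple (sketch with illustrative numbers, not the package implementation, which also transforms the confidence intervals):

```r
# per-capita incidence = incidence / population * population_unit
incidence <- 150          # cases per day in one group (illustrative)
population <- 2.5e6       # group population size (illustrative)
population_unit <- 1e5    # report per 100,000
incidence / population * population_unit   # 6 cases per 100,000 per day
```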
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_incidence.incidence(
  modelled = i_incidence_model,
  pop = i_population_data,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)
modelled: model output from processing the
pop: the population data, which must be grouped in the same way as
...: not used
population_unit: what population unit do you want the incidence in, e.g. per 100K
normalise_time: the default behaviour for incidence is to keep it in the same time units as the input data. If this parameter is set to
A dataframe with incidence rates per unit capita, containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale
incidence.per_capita.se.fit (double) - the standard error of the incidence per capita rate estimate on a log scale
incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)
incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)
incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)
population_unit (double) - The population unit on which the per capita incidence rate is calculated
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and that we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_incidence.proportion(
  modelled = i_proportion_model,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)
modelled |
Model output from processing the A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
not used |
population_unit |
the population unit in which to express the incidence, e.g. per 100K |
normalise_time |
The default behaviour for incidence is to keep it in
the same time units as the input data. If this parameter is set to |
This scales a proportion model by the population unit to make it comparable to an incidence model.
a dataframe with incidence rates per unit capita. A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale
incidence.per_capita.se.fit (double) - the standard error of the incidence per capita rate estimate on a log scale
incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)
incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)
incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)
population_unit (double) - The population unit on which the per capita incidence rate is calculated
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
This assumes case distribution proportions are stratified by a population grouping, e.g. geography or age, and that we have estimates of the size of that population during that time period. Normalising by population proportion allows us to compare groups.
normalise_proportion(
  modelled = i_proportion_model,
  base = i_baseline_proportion_data,
  ...
)
modelled |
Model output from processing the A dataframe containing the following columns:
No mandatory groupings. No default value. |
base |
The baseline data must be grouped in the same way as A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
not used |
a dataframe with incidence rates per unit capita. A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (double) - the standard error of the proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
risk_ratio.0.025 (positive_double) - lower confidence limit of the excess risk ratio for a population group
risk_ratio.0.5 (positive_double) - median estimate of the excess risk ratio for a population group
risk_ratio.0.975 (positive_double) - upper confidence limit of the excess risk ratio for a population group
baseline_proportion (proportion) - The population baseline risk from which the excess risk ratio is based
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  ggoutbreak::normalise_proportion(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
plot_growth_phase(tmp)
Plot an incidence or proportion vs. growth phase diagram
plot_growth_phase(
  modelled = i_timestamped,
  timepoints = NULL,
  duration = max(dplyr::count(modelled)$n),
  interval = 7,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  cis = TRUE,
  ...
)
modelled |
Either: A dataframe containing the following columns:
No mandatory groupings. No default value. OR: A dataframe containing the following columns:
No mandatory groupings. No default value. |
timepoints |
timepoints (as |
duration |
the length of the growth rate phase trail |
interval |
the length of time between markers on the phase plot |
mapping |
a |
cis |
should the phases be marked with confidence intervals? |
... |
Arguments passed on to
|
a ggplot timeseries
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))
# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)
timepoints = as.Date(c("Lockdown 1" = "2020-03-30", "Lockdown 2" = "2020-12-31"))
plot_growth_phase(tmp2, timepoints, duration=108)
Growth rate timeseries diagram
plot_growth_rate(
  modelled = i_timeseries,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
Either: A dataframe containing the following columns:
No mandatory groupings. No default value. OR: A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to |
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot timeseries
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))
# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)
# Default pdf device doesn't support unicode
plot_growth_rate(tmp2, colour="blue")
tmp3 = ggoutbreak::england_covid %>%
  proportion_locfit_model()
# Default pdf device doesn't support unicode
plot_growth_rate(tmp3)
Plot an incidence timeseries
plot_incidence(
  modelled = i_incidence_model,
  raw = i_incidence_data,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
An optional estimate of the incidence time series. If A dataframe containing the following columns:
No mandatory groupings. No default value.
|
raw |
The raw count data A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to
|
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot object
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))
# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)
plot_incidence(tmp2, tmp %>% dplyr::cross_join(tmp_pop), colour="blue", size=0.25)
Plot a multinomial proportions model
plot_multinomial(
  modelled = i_multinomial_proportion_model,
  ...,
  mapping = ggplot2::aes(fill = class),
  events = i_events,
  normalise = FALSE
)
modelled |
the multinomial count data A dataframe containing the following columns:
Must be grouped by: class (exactly). No default value. |
... |
Arguments passed on to |
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
normalise |
make sure the probabilities add up to one - this can be a bad idea if you know you may have missing values. |
a ggplot
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()
plot_multinomial(tmp, normalise=TRUE) +
  ggplot2::scale_fill_viridis_d()
Plot a proportions timeseries
plot_proportion(
  modelled = i_proportion_model,
  raw = i_proportion_data,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
Proportion model estimates A dataframe containing the following columns:
No mandatory groupings. No default value. |
raw |
Raw count data A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to
|
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot object
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()
plot_proportion(tmp) +
  ggplot2::scale_fill_viridis_d(aesthetics = c("fill","colour"))
Reproduction number timeseries diagram
plot_rt(
  modelled = i_reproduction_number,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
the modelled Rt estimate A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to |
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot timeseries
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
if (FALSE) {
  tmp2 = tmp %>%
    poisson_locfit_model() %>%
    rt_from_growth_rate()
  # comparing Rt from growth rates with England consensus Rt:
  plot_rt(tmp2, colour="blue") +
    geom_errorbar(
      data = england_consensus_rt,
      mapping = aes(x=date-21, ymin=low, ymax=high),
      colour = "red"
    )
}
This uses a generalised linear model to fit a quasi-poisson model with a time-varying rate, modelled as a natural cubic spline with approximately one degree of freedom per window units of the time series.
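The general form can be sketched in base R (an assumption about the approach, not the package source): a quasi-poisson GLM on counts with a natural cubic spline in time, giving roughly one degree of freedom per window observations.

```r
library(splines)
set.seed(1)
# Simulated daily counts with a slowly varying rate
time <- 1:140
count <- rpois(length(time), lambda = exp(2 + sin(time / 20)))

window <- 14
df <- max(1, round(length(time) / window))  # ~1 degree of freedom per window

fit <- glm(count ~ ns(time, df = df), family = quasipoisson())
pred <- predict(fit, se.fit = TRUE)  # fit and standard error on the log scale
head(exp(pred$fit))                  # incidence estimates on the true scale
```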
poisson_glm_model(d = i_incidence_input, ..., window = 14, frequency = "1 day")
d |
Count model input A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.fit (double) - an estimate of the incidence rate on a log scale
incidence.se.fit (double) - the standard error of the incidence rate estimate on a log scale
incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)
incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)
incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_glm_model(window=21) %>%
  dplyr::glimpse()
Takes a list of times and counts and fits a quasi-poisson model with a log link function to the count data, using local regression from the locfit package.
poisson_locfit_model(
  d = i_incidence_input,
  ...,
  window = 14,
  deg = 1,
  frequency = "1 day",
  predict = TRUE
)
d |
input data A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
deg |
polynomial degree (min 1) - a higher degree results in less
smoothing, a lower degree in more. A degree of 1 fits
a piecewise linear model. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
predict |
result is a prediction dataframe. If false we return the
|
This results in an incidence rate estimate plus an absolute exponential growth rate estimate, both based on the time unit of the input data (e.g. for daily data the rate will be cases per day and the growth rate will be daily).
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.fit (double) - an estimate of the incidence rate on a log scale
incidence.se.fit (double) - the standard error of the incidence rate estimate on a log scale
incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)
incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)
incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)
growth.fit (double) - an estimate of the growth rate
growth.se.fit (double) - the standard error of the growth rate
growth.0.025 (double) - lower confidence limit of the growth rate
growth.0.5 (double) - median estimate of the growth rate
growth.0.975 (double) - upper confidence limit of the growth rate
No mandatory groupings.
No default value.
ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  dplyr::glimpse()
This uses a generalised linear model to fit a quasi-binomial model with a time-varying rate, modelled as a natural cubic spline with approximately one degree of freedom per window units of the time series.
proportion_glm_model(
  d = i_proportion_input,
  ...,
  window = 14,
  frequency = "1 day"
)
d |
Proportion model input A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (double) - the standard error of the proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
No mandatory groupings.
No default value.
# TODO: find out cause of the warnings
# "observations with zero weight not used for calculating dispersion"
suppressWarnings(
  ggoutbreak::england_covid %>%
    ggoutbreak::proportion_glm_model(window=21) %>%
    dplyr::glimpse()
)
Takes a list of times, counts and a denominator and fits a quasi-binomial model with a logit link function to the proportion data, using local regression from the locfit package.
proportion_locfit_model(
  d = i_proportion_input,
  ...,
  window = 14,
  deg = 1,
  frequency = "1 day",
  predict = TRUE
)
d |
the input A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
deg |
polynomial degree (min 1) - a higher degree results in less
smoothing, a lower degree in more. A degree of 1 fits
a piecewise linear model. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
predict |
result is a prediction dataframe. If false we return the
|
This expects d to contain time, count and denom columns, where denom is the relevant denominator - e.g. all tests conducted.
This results in a one-versus-others comparison binomial proportion estimate plus a relative growth rate estimate specifying how much quicker this is growing compared to the growth of the denominator.
The denominator may be the sum of all subgroups (denom = sum(count)), e.g. in the situation where there are multiple variants of a disease circulating, in which case the relative growth is that of the subgroup compared to the overall. You can make this a one-versus-others comparison by making the denominator exclude the current item (e.g. denom = sum(count) - count).
The denominator can also be used to express the size of the population tested. This gives us a relative growth rate that is different in essence to the previous and may be a better estimate of the true growth rate in the situation where testing effort is variable, or capacity saturated.
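A hypothetical grouped dataset shows how the denominator choices described above are constructed (the column names follow the package's conventions, but the data is made up for illustration):

```r
library(dplyr)
df <- data.frame(
  time  = rep(1:3, each = 2),
  class = rep(c("variant1", "variant2"), 3),
  count = c(10, 5, 12, 8, 15, 9)
)
# subgroup versus the overall total:
df %>% group_by(time) %>% mutate(denom = sum(count))
# one-versus-others: denominator excludes the current class:
df %>% group_by(time) %>% mutate(denom = sum(count) - count)
```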
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (double) - the standard error of the proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
relative.growth.fit (double) - an estimate of the relative growth rate
relative.growth.se.fit (double) - the standard error of the relative growth rate
relative.growth.0.025 (double) - lower confidence limit of the relative growth rate
relative.growth.0.5 (double) - median estimate of the relative growth rate
relative.growth.0.975 (double) - upper confidence limit of the relative growth rate
No mandatory groupings.
No default value.
ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()
Reband a discrete quantity, e.g. an age-banded population, or a discrete probability distribution, e.g. a serial interval distribution. This method fits a monotonically increasing spline to the cumulative distribution (including the upper and lower limits) and interpolates using that spline to the new cut points.
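The idea can be sketched with a monotone spline in base R (a simplified assumption about the method; the real function also handles labelling, limits and rounding):

```r
x    <- c(15, 65, 80, 120)  # upper limits of input bands (the open 80+ band
                            # closed at an assumed xlim of 120)
y    <- c(30, 55, 10, 5)    # quantity in each band
xout <- c(5, 10, 40, 120)   # new upper limits

# Monotone spline through the cumulative totals, then difference at new cuts
cdf <- splinefun(c(0, x), c(0, cumsum(y)), method = "hyman")
diff(cdf(c(0, xout)))       # rebanded quantities; sums to sum(y)
```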
reband_discrete(
  x,
  y,
  xout,
  xlim = c(0, NA),
  ytotal = c(0, sum(y)),
  digits = 0,
  labelling = c("positive_integer", "inclusive", "exclusive"),
  sep = "-"
)
x |
a set of upper limits of bands, e.g. for age: 0-14;15-64;65-79;80+ is 15,65,80,NA |
y |
a set of quantities for each band e.g. population figures |
xout |
a set of new upper limits |
xlim |
Upper and lower limits for x. If the last band is e.g. 80+ in the input and we want to know the 85+ band in the output, some kind of maximum upper limit is needed to interpolate to. |
ytotal |
upper and lower limits for y. If the interpolation values fall outside of
x then the min and max limits of y are given by this. This would be |
digits |
if the |
labelling |
are the |
sep |
separator for names e.g. |
a rebanded set of discrete values, guaranteed to sum to the same as y
ul = stringr::str_extract(england_demographics$class, "_([0-9]+)", group = 1) %>%
  as.numeric()
tmp = reband_discrete(
  ul, england_demographics$population,
  c(5,10,15,40,80), xlim=c(0,120)
)
tmp
sum(tmp)
sum(england_demographics$population)
Calculate a reproduction number estimate from incidence data using the EpiEstim library and an empirical generation time distribution. This uses resampling to propagate uncertainty in the generation time estimates. It is quite slow for each time series, depending on the number of bootstraps and samples in the infectivity profile.
rt_epiestim(
  df = i_incidence_input,
  ip = i_infectivity_profile,
  bootstraps = 2000,
  window = 14,
  mean_prior = 1,
  std_prior = 2,
  ...
)
df |
Count data. Extra groups are allowed. A dataframe containing the following columns:
Ungrouped. No default value. |
ip |
infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
bootstraps |
|
window |
|
mean_prior |
the prior for the $R_t$ estimate. When sample size is low the $R_t$ estimate will revert to this prior. In EpiEstim the default is a high number to allow detection of insufficient data but this tends to create anomalies in the early part of infection timeseries. A possible value is $R_0$ but in fact this also will be a poor choice for the value of $R_t$ when case numbers drop to a low value. |
std_prior |
the prior for the $R_t$ SD. |
... |
not used |
This will calculate a reproduction number for each group in the input dataframe.
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
if (FALSE) {
  # not run due to long running
  tmp2 = tmp %>% rt_epiestim()
}
Calculate a reproduction number estimate from the growth rate using the Wallinga & Lipsitch (2007) method and an empirical generation time distribution. This uses resampling to propagate uncertainty in the growth rate estimates.
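The core of the Wallinga & Lipsitch relationship is R = 1 / M(-r), where M is the moment generating function of the generation interval distribution. A minimal sketch of that relationship, with an assumed (hypothetical) generation interval:

```r
r  <- 0.05                               # growth rate per day
tj <- 1:14                               # days since infection
wj <- dgamma(tj, shape = 2, rate = 0.4)  # hypothetical generation interval
wj <- wj / sum(wj)                       # discretise to sum to 1

R <- 1 / sum(wj * exp(-r * tj))          # implied reproduction number
R
```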
rt_from_growth_rate( df = i_growth_rate, ip = i_infectivity_profile, bootstraps = 2000 )
df |
Growth rate estimates A dataframe containing the following columns:
No mandatory groupings. No default value. |
ip |
Infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
bootstraps |
|
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
if (FALSE) {
  # not run
  tmp2 = tmp %>%
    poisson_locfit_model() %>%
    rt_from_growth_rate()
}
Calculate a reproduction number estimate from growth rate using the methods described in the vignette "Estimating the reproduction number from modelled incidence" and using an empirical generation time distribution.
rt_from_incidence(df = i_incidence_model, ip = i_infectivity_profile)
df |
Count data A dataframe containing the following columns:
No mandatory groupings. No default value. |
ip |
Infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
No mandatory groupings.
No default value.
df = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count)) %>%
  poisson_locfit_model()
if (FALSE) {
  # not run
  tmp2 = df %>% rt_from_incidence()
}
A log1p y scale
scale_y_log1p(..., n = 5, base = 10, dp = 0)
... |
Other arguments passed on to |
n |
the number of major breaks |
base |
the base for the logarithm |
dp |
decimal points |
a ggplot scale
A logit y scale
scale_y_logit(...)
... |
Other arguments passed on to |
a ggplot scale
Aggregate time series data preserving the time series
time_aggregate( df = i_timestamped, ..., .groups = NULL, .cols = NULL, .fns = NULL )
df |
an optionally grouped time series. Grouping should not include the time
column. The grouping works differently from |
... |
A set of |
.groups |
as per |
.cols |
Optional tidyselect column specification for |
.fns |
Optional a set of function specifications as per |
the summarised time series preserving the time
column, and with the grouping
structure involving one fewer level than the input
ggoutbreak::england_covid %>%
  time_aggregate(count = sum(count), denom = sum(denom)) %>%
  dplyr::glimpse()
ggoutbreak::england_covid %>%
  time_aggregate(.fns=mean) %>%
  dplyr::glimpse()
This is principally designed to take a record of single events and produce a summary time series of event counts by group, class and date. The default behaviour is to guess the cadence of the input data and summarise the event line list to a (set of) regular time series counts for use in incidence and growth rate estimates.
time_summarise( df = i_dated, unit, anchor = "start", rectangular = FALSE, ..., .fill = list(count = 0) )
df |
a line list of data you want to summarise, optionally grouped.
If this is grouped then each group is treated independently. The remaining
columns must contain a |
unit |
a period e.g. "1 week" |
anchor |
one of a date, "start", "end", or a weekday name (e.g. "mon"). The anchor will always fall at the start of one of the time periods being cut into |
rectangular |
should the resulting time series be the same length for all groups? This is only safe if you can be sure that your data is complete for all subgroups, otherwise missing data will be treated as zero counts. This matters when leading or trailing gaps in one subgroup are due to a reporting delay in that subgroup, in which case a rectangular time series will erroneously fill in zero counts for the missing data. |
... |
a specification for dplyr::summarise(...) - optional, and if not provided a
|
.fill |
a list similar to tidyr::complete for values to fill variables with |
If the data is given with a class
column, the time series are interpreted as
having a denominator consisting of all the different classes within each time period.
Classes may be subtypes (e.g. variants, serotypes) or markers for test positivity.
In either case the resulting time series will contain counts for each class and
denominators for their combination.
There is flexibility for other kinds of summarisation if the raw data is not
count based (e.g. means of continuous variables), but in that case the slider
package is usually a better choice, as time_summarise only considers
non-overlapping time periods of fixed length.
There is another use case where an existing time series at one
frequency is aggregated to a less frequent basis (e.g. moving from a
daily time series to a weekly one). In this case the input will already contain a
count
column. In this mode no checks are made that all of the more frequent events
are present before summarisation, so the result may aggregate different numbers
of input periods (e.g. going from weeks to months, each month may contain 4 or 5
weeks).
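The caveat above can be made concrete with a small sketch (Python, hypothetical data; not ggoutbreak's implementation): binning 26 consecutive week-start dates by calendar month shows that monthly aggregates cover an uneven number of weeks.

```python
from collections import Counter
from datetime import date, timedelta

# 26 weekly periods starting Monday 2020-01-06, grouped by the
# calendar month their start date falls in. Some months span 4
# weeks, others 5, so weekly-to-monthly totals are not comparable
# without normalising for period length.
weeks = [date(2020, 1, 6) + timedelta(weeks=i) for i in range(26)]
weeks_per_month = Counter((d.year, d.month) for d in weeks)
```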
The output depends on whether or not the input was grouped and
had a class
column. The most detailed output will be:
A dataframe containing the following columns:
denom (positive_integer) - Total test counts associated with the specified timeframe
count (positive_integer) - Positive case counts associated with the specified timeframe
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
No mandatory groupings.
No default value.
or a more minimal output if the input is only a plain list of dated events:
A dataframe containing the following columns:
count (positive_integer) - Positive case counts associated with the specified timeframe
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
No mandatory groupings.
No default value.
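The core counting step described above can be sketched as follows (Python, purely illustrative; `weekly_counts` is a hypothetical helper, not ggoutbreak's code): bin a line list of event dates into non-overlapping fixed-width periods from an anchor, filling empty periods with zero counts.

```python
from collections import Counter
from datetime import date, timedelta

def weekly_counts(event_dates, anchor=None):
    """Hedged sketch of summarising a line list to a regular weekly
    count time series: events are binned into non-overlapping 7-day
    periods anchored at the earliest date (or a supplied anchor),
    and intervening empty weeks get an explicit zero count."""
    start = anchor or min(event_dates)
    idx = Counter((d - start).days // 7 for d in event_dates)
    n = max(idx) + 1
    return [
        {"time": start + timedelta(weeks=i), "count": idx.get(i, 0)}
        for i in range(n)
    ]
```

Note this only extends to the last observed event, which is why the rectangular argument matters: trailing gaps in one subgroup may be reporting delay rather than true zeros.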
Convert a set of timepoints to dates
time_to_date( timepoints, unit = attr(timepoints, "unit"), start_date = attr(timepoints, "start_date") )
timepoints |
a set of numeric time points |
unit |
the period / unit of the time points, which will be extracted from timepoints if possible |
start_date |
the zero day of the time series, will be extracted from timepoints if possible |
a vector of dates
times = date_to_time(as.Date("2019-12-29")+0:100, "1 week") dates = time_to_date(times)
Type coercion to a time_period class
type.time_period(x)
x |
a vector to be coerced to a time period |
a time_period
or an error
type.time_period(1:100)
Calculate the reproduction number from a growth rate estimate and an infectivity profile
wallinga_lipsitch(r, y, a = 1:length(y))
r |
a growth rate (may be a vector) |
y |
an empirical infectivity profile as a probability vector, starting at |
a |
the end time of the estimate (defaults to single days). |
a reproduction number estimate based on r
wallinga_lipsitch(r=seq(-0.1,0.1,length.out=9), y=dgamma(1:50, 5,2))
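The underlying relation is the Wallinga & Lipsitch (2007) link between the exponential growth rate and the reproduction number via the infectivity profile. The sketch below (Python, assuming unit-width intervals; ggoutbreak's exact discretisation may differ) shows the discrete form: R is the growth rate divided by the profile-weighted sum of exponential decay over each interval.

```python
import math

def wallinga_lipsitch_R(r, y):
    """Sketch of the Wallinga & Lipsitch growth-rate-to-R relation
    for a discrete infectivity profile y, where y[a-1] is the
    probability mass on the interval ending at a = 1, 2, ...
    Assumes unit-width intervals; not ggoutbreak's exact code."""
    if r == 0:
        return 1.0  # limit of the expression below as r -> 0
    # R = r / sum_a y_a * (exp(-r*(a-1)) - exp(-r*a))
    denom = sum(
        p * (math.exp(-r * (a - 1)) - math.exp(-r * a))
        for a, p in enumerate(y, start=1)
    )
    return r / denom
```

As expected, r = 0 maps to R = 1, positive growth rates map to R > 1, and negative growth rates to R < 1, with the infectivity profile controlling how steeply R departs from 1.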