Package 'ggoutbreak'

Title: Estimate Incidence, Proportions and Exponential Growth Rates
Description: Simple statistical models and visualisations for calculating the incidence, proportion, exponential growth rate, and reproduction number of infectious disease case time series. This toolkit was largely developed during the COVID-19 pandemic.
Authors: Robert Challen [aut, cre, cph]
Maintainer: Robert Challen <[email protected]>
License: MIT + file LICENSE
Version: 0.4.1
Built: 2025-02-07 09:32:33 UTC
Source: https://github.com/ai4ci/ggoutbreak

Help Index


Insert a layer at the bottom of a ggplot

Description

Insert a layer at the bottom of a ggplot

Usage

plot %above% layer

Arguments

plot

the plot to add the layer to

layer

the layer to insert underneath the plot

Value

a ggplot
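
Examples

A minimal sketch of `%above%` (not from the package examples; assumes ggoutbreak and ggplot2 are installed):

```r
library(ggplot2)
library(ggoutbreak)

p = ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# insert a shaded annotation underneath the existing points,
# so it does not obscure them
p2 = p %above%
  annotate("rect", xmin = 2, xmax = 3, ymin = -Inf, ymax = Inf, alpha = 0.2)
```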


Convert time period to dates

Description

Convert time period to dates

Usage

## S3 method for class 'time_period'
as.Date(x, ...)

## S3 method for class 'time_period'
as.POSIXct(x, ...)

Arguments

x

a time_period

...

not used

Value

a vector of dates representing the start of each of the input time_period entries

Functions

  • as.POSIXct(time_period): Convert to a vector of POSIXct


Convert to a time period class

Description

Time periods are just a zero based numeric representation of dates with a time unit baked in. This allows variable length periods (e.g. days or weeks), and fractional days to be represented in a consistent(ish) way

Usage

as.time_period(x, unit = NULL, start_date = NULL, anchor = NULL, ...)

## S3 method for class 'time_period'
c(..., recursive = F)

## S3 method for class 'time_period'
x[...]

## S3 replacement method for class 'time_period'
x[...] <- value

## S3 method for class 'time_period'
x[[...]]

## S3 replacement method for class 'time_period'
x[[...]] <- value

## S3 method for class 'time_period'
seq(from, to = from, ...)

is.time_period(x)

## S3 method for class 'time_period'
print(x, ...)

Arguments

x

a vector of numbers (may be integer or real) or a time_period

unit

the length of one unit of time. This will be either an integer number of days, or a specification such as "1 week", or another time_period. If x is a time_period and the unit is different from that of x, this will return a new time_period using the new units.

start_date

the zero time date as something that can be coerced to a date. If the x input is already a time_period and this is different to its start_date then it will be recalibrated to use the new start date.

anchor

only relevant if x is a vector of dates and start_date is not specified. This is a date, or "start" or "end", or a weekday name e.g. "mon". With the vector of dates in x it will find a reference date for the time series. If this is NULL and start_date is also NULL it will fall back to getOption("day_zero","2019-12-29")

...

used for subtype implementations

recursive

concatenate recursively

value

the value

from, to

the starting and (maximal) end values of the sequence. Of length 1 unless just from is supplied as an unnamed argument.

Value

a time_period class, consisting of a vector of numbers, with attributes for time period and start_date

Functions

  • c(time_period): Combine time_period

  • [: Subset a time_period

  • `[`(time_period) <- value: Assign values to a subset of a time_period

  • [[: Get a value in a time_period

  • `[[`(time_period) <- value: Assign a value in a time_period

  • seq(time_period): Create a sequence using time_periods

  • is.time_period(): Check whether an object is a time_period

  • print(time_period): Print a time_period

Examples

# 100 weeks from 2020-01-01

tmp = as.time_period(0:100, 7, "2020-01-01")
as.Date(tmp)

range(tmp)
min(tmp)
tmp2 = as.integer(as.Date(tmp))
# testthat::expect_true(all(na.omit(tmp2-lag(tmp2)) == 7))

tmp2 = as.time_period(0:23, 1/24, "2020-01-01")
as.POSIXct(tmp2)

# convert timeseries to new "unit"
tmp = as.time_period(0:100, 7, "2020-01-01")
tmp2 = as.time_period(tmp,1)
testthat::expect_equal(as.numeric(tmp2), 0:100*7)

A scales breaks generator for log1p scales

Description

A scales breaks generator for log1p scales

Usage

breaks_log1p(n = 5, base = 10)

Arguments

n

the number of breaks

base

the base for the breaks

Value

a function for ggplot scale breaks

Examples

ggplot2::ggplot(ggplot2::diamonds, ggplot2::aes(x=price))+
  ggplot2::geom_density()+
  ggplot2::scale_x_continuous(trans="log1p", breaks=breaks_log1p())

Generate a random probability based on features of the simulation

Description

Generate a random probability based on features of the simulation

Usage

cfg_beta_prob_rng(probability_fn = ~0.8, kappa_fn = ~0.1)

Arguments

probability_fn

a function which gives the time-varying mean of a beta distribution. The function will be called minimally with .x or t, which will be the time as a time period. Other variables may be present.

kappa_fn

a function which gives the time-varying dispersion of a beta distribution. The function will be called minimally with .x or t, which will be the time period, and .y or mean, which will be the mean. Other variables may be present.

Value

a time dependent function that inputs a time (as a time_period) and returns a probability for each day defined by the probability_fn and kappa_fn

See Also

rbeta2()

Examples

fn = cfg_beta_prob_rng(~ ifelse(.x<=5,0.1,0.9))
fn(1:10)

Get an IP generating function from time varying mean and SD of a gamma distribution

Description

Get an IP generating function from time varying mean and SD of a gamma distribution

Usage

cfg_gamma_ip_fn(mean_fn = ~2, sd_fn = function(mean) sqrt(mean))

Arguments

mean_fn

a function which gives the time-varying mean of a gamma distribution. The function will be called minimally with .x or t, which will be the time as a time period. Other variables may be present.

sd_fn

a function which gives the time-varying standard deviation of a gamma distribution. The function will be called minimally with .x or t, which will be the time period, and .y or mean, which will be the mean. Other variables may be present.

Value

a time dependent function that inputs a time (as a time_period) and returns an ip delay distribution for each day defined by the mean_fn and sd_fn

Examples

fn = cfg_gamma_ip_fn(mean_fn = \(t) ifelse(t < 5, 4, 2))
# a gamma function that changes mean at time 5
fn(4)
fn(7)

Randomly sample from an empirical distribution

Description

This is used for random sampling from the infectivity profile for times to infection, for example. There is nothing to stop you putting in a delay distribution with negative times but strange things may happen in your simulation.

Usage

cfg_ip_sampler_rng(ip = i_empirical_ip)

Arguments

ip

a long format empirical distribution

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly).

A default value is defined.

Value

a function which accepts an n parameter and produces n random samples from the ip distribution

Examples

tmp = cfg_ip_sampler_rng(ganyani_ip_2)(10000)

# This discretised ganyani distribution is based on these figures:
# mean: 5.2 (3.78-6.78) and sd: 1.72 (0.91-3.93)
format_ip(ganyani_ip_2)

mean(tmp) # Should be about 5.2
stats::sd(tmp) # Should be about 1.72

Linear function from dataframe

Description

Linear function from dataframe

Usage

cfg_linear_fn(changes, ..., col_name = setdiff(colnames(changes), "t"))

Arguments

changes

a dataframe with t and <col_name> columns which define the change points for a piecewise linear function.

...

not used

col_name

the value column (optional if only 2 columns)

Value

a function that inputs a vector t and returns a linearly interpolated value from <col_name>
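
Examples

A minimal sketch with hypothetical change points (assumes ggoutbreak is installed): a reproduction number falling linearly from 2 at t=0 to 0.8 at t=50.

```r
library(ggoutbreak)

# hypothetical change points for a piecewise linear R_t profile
changes = tibble::tibble(t = c(0, 50), R = c(2, 0.8))
fn = cfg_linear_fn(changes)

# values between change points are linearly interpolated
fn(c(0, 25, 50))
```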


Step function from dataframe

Description

Step function from dataframe

Usage

cfg_step_fn(changes, ..., col_name = setdiff(colnames(changes), "t"))

Arguments

changes

a dataframe with t and <col_name> columns which define the cut points for a step function.

...

not used

col_name

the value column (optional if only 2 columns)

Value

a function that inputs a vector t and returns the next smallest corresponding value in <col_name> (or the first one)
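
Examples

A minimal sketch with hypothetical cut points (assumes ggoutbreak is installed): an ascertainment probability stepping from 0.3 to 0.7 at t=20.

```r
library(ggoutbreak)

# hypothetical cut points for a stepwise ascertainment probability
changes = tibble::tibble(t = c(0, 20), p = c(0.3, 0.7))
fn = cfg_step_fn(changes)

# each time maps to the value at the largest cut point not after it
fn(c(5, 20, 40))
```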


Sample from a multinomial transition matrix

Description

This is particularly designed for use within the fn_list_next_gen parameter of sim_branching_process(), to allow age group contact matrices to be applied, for example (assuming a categorical age).

Usage

cfg_transition_fn(transition)

Arguments

transition

a transition matrix or long format dataframe. If a matrix, the columns should sum to 1; the column names are the input class and the row names are the output class. The dataframe format must have input, output, and probability columns.

Value

a function that given an input will return samples of the output class according to the probability distributions.

Examples

age = rep(c("child","adult","elderly"),100)

fn = cfg_transition_fn(tibble::tribble(
  ~input, ~output, ~probability,
  "child", "child", 0.5,
  "child", "adult", 0.5,
  "adult", "child", 0.5,
  "adult", "adult", 0.3,
  "adult", "elderly", 0.2,
  "elderly","elderly", 0.5,
  "elderly","adult", 0.5,
))

table(fn(age),age)

Weekly delay function with day of week effect

Description

This function returns a random number generator from a gamma distribution that has a weekly period and a configurable degree of additional variability around the weekly pattern. It is suited to delays of things like testing which may depend on the day of the week.

Usage

cfg_weekly_gamma_rng(
  mean = c(1, 1, 1, 1, 4, 3, 2),
  sd = sqrt(mean),
  week_starts = weekdays(as.Date("2024-10-14"))
)

Arguments

mean

a mean amount of delay for each day of the week

sd

the SD of the delay

week_starts

locale description of the first day of the week (default is "Monday").

Value

a random number generator taking a time parameter t and returning a delay duration for that time, depending on the day of the week.

Examples

fn = cfg_weekly_gamma_rng(c(1,1,1,1,4,3,2))
matrix(fn(1:42),ncol=7,byrow=TRUE)

Weekly convolution distribution function

Description

Weekly convolution distribution function

Usage

cfg_weekly_ip_fn(
  mean = c(1, 1, 1, 1, 4, 3, 2),
  sd = sqrt(mean),
  week_starts = weekdays(as.Date("2024-10-14"))
)

Arguments

mean

The means of a gamma distributed delay function for each weekday

sd

The sds of a gamma distributed delay function for each weekday

week_starts

locale description of the first day of the week (default is "Monday").

Value

a time dependent function that inputs a time (as a time_period) and generates an IP delay distribution for each day varying by day of week

Examples

cat(sapply(cfg_weekly_ip_fn()(1:7),format_ip),sep = "\n")

Random probability function with day of week effect

Description

This function returns a random probability generator that has a weekly period and a configurable degree of additional variability around the weekly pattern. It is suited to probabilities of things like testing which may depend on the day of the week.

Usage

cfg_weekly_proportion_rng(
  prob = c(0.8, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5),
  kappa = 0.1,
  week_starts = weekdays(as.Date("2024-10-14"))
)

Arguments

prob

the rates of e.g. ascertainment for each day of the week.

kappa

dispersion parameter between 0 and 1. 0 is no dispersion; 1 is the maximum.

week_starts

locale description of the first day of the week (default is "Monday").

Value

a random number generator function taking a time parameter t and returning a probability of ascertainment for that time, depending on the day of the week etc.

Examples

fn = cfg_weekly_proportion_rng(c(0.9,0.9,0.9,0.9,0.9,0.1,0.1))
matrix(fn(1:42),ncol=7,byrow=TRUE)

A COVID-19 infectivity profile based on an empirical resampling approach

Description

An infectivity profile derived from a meta-analysis of serial intervals.

Usage

data(covid_ip)

Format

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of infection between previous time period until time

  • time (double) - the end of the time period (in days)

Must be grouped by: boot (exactly).

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of infection between previous time period until time

  • tau (numeric) - the time index this probability relates to (in days)

  • a0 (numeric) - the beginning of the time period

  • a1 (numeric) - the end of the time period

Grouped by: boot.

1400 rows and 5 columns
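
Examples

A quick look at the dataset. The use of format_ip() here is an assumption based on its appearance in the cfg_ip_sampler_rng() examples; it is not taken from the package documentation for this dataset.

```r
library(ggoutbreak)
data(covid_ip)

# summarise the bootstrapped infectivity profile
format_ip(covid_ip)
```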


Test sensitivity of PCR tests

Description

The probability of detecting COVID using PCR given time since infection, based on Binny et al 2023.

Usage

data(covid_test_sensitivity)

Format

A dataframe containing the following columns:

  • tau (numeric) - the time column

  • probability (numeric) - the probability column

  • boot (integer) - the boot column

Must be grouped by: boot (and other groupings allowed).

5100 rows and 3 columns

References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9384503/


The COVID-19 viral shedding duration

Description

The COVID-19 viral shedding duration

Usage

data(covid_viral_shedding)

Format

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of infection between previous time period until time

  • tau (numeric) - the time index this probability relates to (in days)

  • a0 (numeric) - the beginning of the time period

  • a1 (numeric) - the end of the time period

Grouped by: boot.

2600 rows and 3 columns

References

https://www.nature.com/articles/s41467-020-20568-4 From Von Kampen et al.


Places a set of dates within a regular time series

Description

The counterpart to date_seq_dates(). Take an original set of data and place it within a regular time series, where the periodicity of the time series may be expressed as numbers of days, weeks, months, quarters, or years, and the periods are defined by an anchoring date, a day of the week, or by reference to the start or end of the input dates. This can either return the periods as dates or factors (e.g. for plotting), or as a time_period for analysis that relies on a numeric representation of the date or duration from the anchor.

Usage

cut_date(
  dates,
  unit,
  anchor = "start",
  output = c("date", "factor", "time_period"),
  dfmt = "%d/%b/%y",
  ifmt = "{start} — {end}",
  ...
)

Arguments

dates

a set of dates

unit

a period e.g. "1 week"

anchor

one of a date, "start" or "end", or a weekday name e.g. "mon". This will always be the start of one of the time periods we are cutting into.

output

return the result as either a "date" (the default), an ordered "factor" with the date ranges as a label, or as a "time_period". The result is named with labels referring to the periods.

dfmt

the strptime format for the dates in the labels

ifmt

a sprintf format for the period label containing %s exactly twice.

...

ignored

Value

a set of dates, times or a factor level, representing the start of the period the date falls into, where the period is defined by the duration and the anchor

Examples

dates = as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-03",NA))
fs = ggoutbreak::date_seq(dates, "2 days")
dates - cut_date(dates, "2 days")
cut_date(dates,unit="2 days", output="time_period")

# A weekly set of dates:
dates2 = Sys.Date() + floor(stats::runif(50,max=10))*7

# in this specific situation the final date is not truncated because the
# input data is seen as an exact match for the whole output period.
cut_date(dates2, "1 week", "sun", output="factor")
cut_date(dates2, dfmt = "%d/%b", output="factor", unit = "2 weeks", anchor="sun")

Create the full sequence of values in a vector

Description

This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.

Usage

date_seq(x, period, ...)

Arguments

x

a numeric or date vector

period

Gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.

...

for subtype methods

Value

a vector of the same type as the input

Examples

date_seq(c(1, 2, 4, 5, 10), 1)

Expand a date vector to the full range of possible dates

Description

Derive, from a vector of observation dates, a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years etcetera. E.g. this can convert a random set of dates to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01), or a day of week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly ending on the last date of the data). There is also a problem when we consider data that may have incomplete start and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.

Usage

## S3 method for class 'Date'
date_seq(x, period = .day_interval(x), anchor = "start", complete = FALSE, ...)

Arguments

x

a vector of dates, possibly including NA values

period

the gap between observations as a number of days or as a natural language definition of the period such as "1 week", '2 weeks', '1 month', etcetera. If not given this will be derived from the dates.

anchor

defines a day that appears in the sequence (if it were to extend that far). Given either as a date, or "start", "end" or a day of the week, e.g. "mon".

complete

truncate incomplete start and end periods

...

ignored

Value

a vector of dates for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.

Examples

date_seq(as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-01",NA)), "2 days")

Create the full sequence of values in a vector

Description

This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.

Usage

## S3 method for class 'numeric'
date_seq(x, period = 1, tol = 1e-06, ...)

Arguments

x

a numeric or date vector

period

Gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.

tol

Numerical tolerance for checking periodicity.

...

for subtype methods

Value

a vector of the same type as the input

Examples

date_seq(c(1, 2, 4, 5, 10), 1)

Expand a time_period vector to the full range of possible times

Description

Derive, from a vector of observation time_periods, a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years etc. E.g. this can convert a random set of times to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01), or a day of week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly ending on the last date of the data). There is also a problem when we consider data that may have incomplete start and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.

Usage

## S3 method for class 'time_period'
date_seq(x, period = attributes(x)$unit, complete = FALSE, ...)

Arguments

x

a time period vector

period

the gap between observations as a number of days or as a natural language definition of the period such as "1 week", '2 weeks', '1 month', etc. If not given this will be derived from the dates.

complete

truncate incomplete start and end periods

...

ignored

Value

a vector of time_periods for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.

Examples

tmp = as.time_period(c(0,10,100), 7, "2020-01-01")
date_seq(tmp, "7 days")
date_seq(tmp, "1 day")

Convert a set of dates to numeric timepoints

Description

Using a start_date and a unit specification

Usage

date_to_time(
  dates,
  unit = .day_interval(dates),
  start_date = getOption("day_zero", "2019-12-29")
)

Arguments

dates

a vector of dates to convert

unit

a specification of the unit of the resulting time series. Will be determined from the periodicity of the dates if not specified. If another time_period is given as the unit then the unit of that time_period will be used.

start_date

the origin of the conversion. Defaults to the beginning of the COVID pandemic

Value

a vector of class time_period

Examples

times = date_to_time(as.Date("2019-12-29")+0:100, "1 week")
dates = time_to_date(times)

The Beta Distribution

Description

Density, distribution function, quantile function and random generation for the Beta distribution with parameters shape1 and shape2 (and optional non-centrality parameter ncp).

Usage

dbeta2(x, prob, kappa, log = FALSE)

Arguments

x

vector of quantiles

prob

the mean probability (vectorised)

kappa

a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised)

log

logical; if TRUE, probabilities p are given as log(p).

Value

dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

See Also

stats::dbeta()

Examples

dbeta2(c(0.25,0.5,0.75), 0.5, 0.25)

The Gamma Distribution

Description

Density, distribution function, quantile function and random generation for the Gamma distribution with parameters shape and scale.

Usage

dgamma2(x, mean, sd = sqrt(mean), log = FALSE, convex = TRUE)

Arguments

x

vector of quantiles

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

log

logical; if TRUE, probabilities p are given as log(p).

convex

Show a warning if the distribution selected is not a convex function

Value

dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

See Also

stats::dgamma()

Examples

dgamma2(seq(0,4,0.25), 2, 1)

The Log Normal Distribution

Description

Density, distribution function, quantile function and random generation for the log normal distribution whose logarithm has mean equal to meanlog and standard deviation equal to sdlog.

Usage

dlnorm2(x, mean = 1, sd = sqrt(exp(1) - 1), log = FALSE)

Arguments

x

vector of quantiles

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

log

logical; if TRUE, probabilities p are given as log(p).

Details

The log normal distribution has density

f(x) = \frac{1}{\sqrt{2\pi}\sigma x} e^{-(\log(x) - \mu)^2 / (2\sigma^2)}

where \mu and \sigma are the mean and standard deviation of the logarithm. The mean is E(X) = \exp(\mu + \sigma^2/2), the median is med(X) = \exp(\mu), and the variance is Var(X) = \exp(2\mu + \sigma^2)(\exp(\sigma^2) - 1); hence the coefficient of variation is \sqrt{\exp(\sigma^2) - 1}, which is approximately \sigma when that is small (e.g., \sigma < 1/2).

Value

dlnorm gives the density, plnorm gives the distribution function, qlnorm gives the quantile function, and rlnorm generates random deviates.

The length of the result is determined by n for rlnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

Note

The cumulative hazard H(t) = -\log(1 - F(t)) is -plnorm(t, r, lower = FALSE, log = TRUE).

Source

dlnorm is calculated from the definition (in ‘Details’). [pqr]lnorm are based on the relationship to the normal.

Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

Distributions for other standard distributions, including dnorm for the normal distribution.

Examples

dlnorm2(seq(0,4,0.25), 2, 1)

The Negative Binomial Distribution

Description

Density, distribution function, quantile function and random generation for the negative binomial distribution with parameters size and prob.

Usage

dnbinom2(x, mean, sd = sqrt(mean), log = FALSE)

Arguments

x

vector of (non-negative integer) quantiles.

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

log

logical; if TRUE, probabilities p are given as log(p).

Details

The negative binomial distribution with size = n and prob = p has density

p(x) = \frac{\Gamma(x+n)}{\Gamma(n)\, x!} p^n (1-p)^x

for x = 0, 1, 2, \ldots, n > 0 and 0 < p \le 1.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is \mu = n(1-p)/p and the variance is n(1-p)/p^2.

A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)

An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.

If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.

The quantile is defined as the smallest value x such that F(x) \ge p, where F is the distribution function.

Value

dnbinom gives the density, pnbinom gives the distribution function, qnbinom gives the quantile function, and rnbinom generates random deviates.

Invalid size or prob will result in return value NaN, with a warning.

The length of the result is determined by n for rnbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

rnbinom returns a vector of type integer unless generated values exceed the maximum representable integer when double values are returned.

Source

dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

pnbinom uses pbeta.

qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search.

rnbinom uses the derivation as a gamma mixture of Poisson distributions, see

Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

See Also

Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.

Examples

dnbinom2(0:5, 2, sqrt(2))

Doubling time from growth rate

Description

The unit of doubling times is always days.

Usage

doubling_time(x, ...)

Arguments

x

a dataframe calculated from either proportion or incidence growth rate calculations:

e.g. A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

  • growth.fit (double) - an estimate of the growth rate

  • growth.se.fit (positive_double) - the standard error of the growth rate

  • growth.0.025 (double) - lower confidence limit of the growth rate

  • growth.0.5 (double) - median estimate of the growth rate

  • growth.0.975 (double) - upper confidence limit of the growth rate

Any grouping allowed.

OR

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of the proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

  • relative.growth.fit (double) - an estimate of the relative growth rate

  • relative.growth.se.fit (positive_double) - the standard error of the relative growth rate

  • relative.growth.0.025 (double) - lower confidence limit of the relative growth rate

  • relative.growth.0.5 (double) - median estimate of the relative growth rate

  • relative.growth.0.975 (double) - upper confidence limit of the relative growth rate

Any grouping allowed.

...

not used

Value

the same dataframe with additional columns for doubling time or relative doubling time plus confidence intervals.

Examples

ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::doubling_time() %>%
  dplyr::glimpse()

The Du empirical serial interval dataset

Description

From Z. Du, X. Xu, Y. Wu, L. Wang, B. J. Cowling, and L. A. Meyers, ‘Serial Interval of COVID-19 among Publicly Reported Confirmed Cases’, Emerg Infect Dis, vol. 26, no. 6, pp. 1341–1343, Jun. 2020, doi: 10.3201/eid2606.200357.

Usage

data(du_serial_interval_ip)

Format

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of infection between previous time period until time

  • tau (numeric) - the time index this probability relates to (in days)

  • a0 (numeric) - the beginning of the time period

  • a1 (numeric) - the end of the time period

Grouped by: boot.

2603 rows and 5 columns


Wedge distribution

Description

The wedge distribution has a domain of 0 to 1 and has a linear probability density function over that domain.

Usage

dwedge(x, a, lower.tail = TRUE, log = FALSE)

Arguments

x

vector of quantiles

a

a gradient from -2 (left skewed) to 2 (right skewed)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log

logical; if TRUE, probabilities p are given as log(p).

Details

rwedge can be combined with quantile functions to skew standard distributions, introduce correlation, or down-weight certain parts of a distribution.

Value

a vector of probabilities, quantiles, densities or samples.

Examples

pwedge(seq(0,1,0.1), a=1)
dwedge(seq(0,1,0.1), a=1)
qwedge(c(0.25,0.5,0.75), a=-1)

stats::cor(
  stats::qnorm(rwedge(1000, a=2)),
  stats::qnorm(rwedge(1000, a=-2))
)
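For reference, a linear density on [0, 1] that integrates to one must have the form f(x) = 1 + a(x − 1/2); a base-R sketch of the density and CDF under that assumed parameterisation (the package's exact parameterisation may differ):

```r
# Wedge density sketch: linear on [0, 1] with gradient a; for |a| <= 2
# the density stays non-negative, matching the documented range of 'a'.
dwedge_sketch <- function(x, a) ifelse(x >= 0 & x <= 1, 1 + a * (x - 0.5), 0)

# CDF: integrate the density from 0 to q.
pwedge_sketch <- function(q, a) {
  q <- pmin(pmax(q, 0), 1)
  q + (a / 2) * q * (q - 1)
}

integrate(dwedge_sketch, 0, 1, a = 1)$value  # ~1: a valid density
pwedge_sketch(c(0, 0.5, 1), a = 2)           # CDF rises from 0 to 1
```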

The SPI-M-O England consensus growth rate

Description

SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the growth rate of the epidemic from various data sources (including with an early version of ggoutbreak).

Usage

data(england_consensus_growth_rate)

Format

A dataframe containing the following columns:

  • date (date) - the date of the estimate

  • low (numeric) - the lower published estimate of the growth rate

  • high (numeric) - the higher published estimate of the growth rate

No mandatory groupings.

No default value.

111 rows and 3 columns


The SPI-M-O England consensus reproduction number

Description

SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the reproduction number of the epidemic from various data sources.

Usage

data(england_consensus_rt)

Format

A dataframe containing the following columns:

  • date (date) - the date of the estimate

  • low (numeric) - the lower published estimate of the reproduction number

  • high (numeric) - the higher published estimate of the reproduction number

No mandatory groupings.

No default value.

113 rows and 3 columns


Daily COVID-19 case counts by age group in England

Description

A dataset of the daily count of COVID-19 cases by age group in England, downloaded from the UKHSA coronavirus API and formatted for use in ggoutbreak. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates, by age group.

Usage

data(england_covid)

Format

A dataframe containing the following columns:

  • date (as.Date) - the date column

  • class (enum(⁠00_04⁠,⁠05_09⁠,⁠10_14⁠,⁠15_19⁠,⁠20_24⁠,⁠25_29⁠,⁠30_34⁠,⁠35_39⁠,⁠40_44⁠,⁠45_49⁠,⁠50_54⁠,⁠55_59⁠,⁠60_64⁠,⁠65_69⁠,⁠70_74⁠,⁠75_79⁠,⁠80_84⁠,⁠85_89⁠,⁠90+⁠)) - the class column

  • count (numeric) - the test positives for each age group

  • denom (numeric) - the test positives for all age groups

  • time (time_period) - the time column

Must be grouped by: class (and other groupings allowed).

No default value.

26790 rows and 5 columns

Details

You may want england_covid_proportion instead, which includes the population denominator. The denominator here is the total number of positive tests across all age groups, not the number of tests taken.


England COVID-19 PCR test positivity

Description

The coronavirus.gov.uk dashboard published tests conducted and positive results as separate data sets for a range of geographies. Here the two are combined for England, with total tests as the denominator and positives as the count.

Usage

data(england_covid_pcr_positivity)

Format

A dataframe containing the following columns:

  • date (date) - a daily time series

  • time (time_period) - the time column

  • count (numeric) - test positives in England on that day

  • denom (numeric) - total tests conducted on that day

No mandatory groupings.

No default value.

1413 rows and 4 columns


England COVID by age group for ascertainment

Description

An age-group-stratified dataset derived from the sources listed in the Details section.

Usage

data(england_covid_proportion)

Format

A dataframe containing the following columns:

  • class (character) - the age group

  • date (date) - the start date of a week

  • count (numeric) - the count of COVID positives

  • denom (numeric) - the number of COVID tests performed

  • population (numeric) - the size of the population at this age group

  • time (time_period) - the time column (weekly)

Must be grouped by: class (and other groupings allowed).

No default value.

1050 rows and 6 columns

Details

  • the coronavirus.gov.uk site for positive cases, aggregated to 10 year age groups and weekly time points.

  • NHS Test and Trace data, which reported testing effort by region and age group, aggregated to country level.

  • ONS 2021 census population aggregated to 10 year age groups.


England demographics

Description

Population counts by 5 year age group for England only from the 2021 census.

Usage

data(england_demographics)

Format

A dataframe containing the following columns:

  • class (enum(⁠00_04⁠,⁠05_09⁠,⁠10_14⁠,⁠15_19⁠,⁠20_24⁠,⁠25_29⁠,⁠30_34⁠,⁠35_39⁠,⁠40_44⁠,⁠45_49⁠,⁠50_54⁠,⁠55_59⁠,⁠60_64⁠,⁠65_69⁠,⁠70_74⁠,⁠75_79⁠,⁠80_84⁠,⁠85_89⁠,⁠90+⁠)) - the class column

  • population (numeric) - the population count column

  • baseline_proportion (numeric) - the baseline proportion is the proportion this age group makes up of the total.

Must be grouped by: class (and other groupings allowed).

No default value.

19 rows and 3 columns

Source

https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationandhouseholdestimatesenglandandwalescensus2021/census2021/census2021firstresultsenglandwales1.xlsx


Key dates in the COVID-19 response in England

Description

These are mainly the dates of lockdowns, releases from social distancing measures, and the dates that new variants were first detected.

Usage

data(england_events)

Format

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the event start date

  • end (date) - the (optional) event end date

No mandatory groupings.

No default value.

13 rows and 3 columns


NHS COVID-19 app data

Description

Check-in (social activity) and alert (self-isolation instruction) data from the NHS COVID-19 app, aggregated to country level on a week-by-week basis.

Usage

data(england_nhs_app)

Format

A dataframe containing the following columns:

  • date (date) - the start date of the week

  • alerts (integer) - the count of self-isolation alerts

  • visits (integer) - the number of venue check-ins representing visits to social venues.

  • time (time_period) - the time column

No mandatory groupings.

No default value.

137 rows and 4 columns


The england_ons_infection_survey dataset

Description

The COVID-19 ONS infection survey took a random sample of the population and provides an estimate of the prevalence of COVID-19 that is supposedly free from ascertainment bias.

Usage

data(england_ons_infection_survey)

Format

A dataframe containing the following columns:

  • date (date) - the date column

  • geography (character) - the geography column

  • prevalence.0.5 (numeric) - the median proportion of people in the region testing positive for COVID-19

  • prevalence.0.025 (numeric) - the lower CI of the proportion of people in the region testing positive for COVID-19

  • prevalence.0.975 (numeric) - the upper CI of the proportion of people in the region testing positive for COVID-19

  • denom (integer) - the sample size on which this estimate was made (daily rate inferred from weekly sample sizes)

  • time (time_period) - the time column

No mandatory groupings.

No default value.

9820 rows and 7 columns

Details

The data is available here: https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/coronaviruscovid19infectionsurveydata/2023/20230310covid19infectionsurveydatasetsengland.xlsx


Counts of COVID-19 variants

Description

Data from the COG-UK and Sanger Centre sequencing programme. The data were made available through the Wellcome Foundation at lower tier local authority level, and form a weekly time series of counts per variant. Variants were assigned using the tree structure of the Pango lineage. Different sub-lineages are aggregated to the major WHO variants of concern.

Usage

data(england_variants)

Format

A dataframe containing the following columns:

  • date (date) - the end date of the week

  • time (time_period) - the time column

  • class (enum(Other,Alpha (B.1.1.7),Delta (B.1.617.2),Delta (AY.4),Omicron (Other),Omicron (BA.2),Omicron (BA.4),Omicron (BA.5),XBB (Other),Kraken (XBB.1.5),Arcturus (XBB.1.16),Eris (EG.5.1))) - the class column

  • who_class (enum(Other,Alpha,Delta,Omicron,Kraken,Arcturus,Eris)) - the who_class column

  • count (numeric) - the weekly count column

  • denom (numeric) - the number of sequences performed in that week

Must be grouped by: class (and other groupings allowed).

No default value.

479 rows and 6 columns


Format date as dmy

Description

Format date as dmy

Usage

fdmy(date)

Arguments

date

a date to convert

Value

the formatted date

Examples

fdmy(Sys.Date())

Print a summary of an infectivity profile

Description

Print a summary of an infectivity profile

Usage

format_ip(ip = i_empirical_ip)

Arguments

ip

the infectivity profile to summarise. The a0 and a1 columns are optional if tau is given.

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly).

A default value is defined.

Value

an infectivity profile description

Examples

format_ip(ganyani_ip)

A COVID-19 infectivity profile based on Ganyani et al. 2020

Description

A COVID-19 infectivity profile based on Ganyani et al. 2020

The Ganyani serial interval dataset, compatible with EpiEstim

Usage

data(ganyani_ip)

Format

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of infection between the previous time point and time tau

  • tau (numeric) - the time index this probability relates to (in days)

  • a0 (numeric) - the beginning of the time period

  • a1 (numeric) - the end of the time period

Grouped by: boot (exactly).

2400 rows and 5 columns

References

https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.17.2000257


A COVID-19 infectivity profile based on Ganyani et al. 2020

Description

This version is discretised in a manner that makes it incompatible with EpiEstim.

Usage

data(ganyani_ip_2)

Format

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of infection between the previous time point and time tau

  • tau (numeric) - the time index this probability relates to (in days)

  • a0 (numeric) - the beginning of the time period

  • a1 (numeric) - the end of the time period

Grouped by: boot (exactly).

2800 rows and 5 columns

References

https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.17.2000257


Add time series event markers to a time series plot.

Description

The x axis must be a date.

Usage

geom_events(
  events = i_events,
  event_label_size = 7,
  event_label_colour = "black",
  event_label_angle = -30,
  event_line_colour = "grey50",
  event_fill_colour = "grey50",
  hide_labels = FALSE,
  guide_axis = ggplot2::derive(),
  ...
)

Arguments

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis: left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

Value

a set of geoms for a time series.


Weekly COVID-19 case counts by age group in Germany

Description

A dataset of the weekly count of COVID-19 cases by age group in Germany, downloaded from the Robert Koch Institute SurvStat service and formatted for use in ggoutbreak. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates.

Usage

data(germany_covid)

Format

A dataframe containing the following columns:

  • class (enum(⁠0–14⁠,⁠15–19⁠,⁠20–24⁠,⁠25–29⁠,⁠30–39⁠,⁠40–49⁠,⁠50–59⁠,⁠60–69⁠,⁠70–79⁠,⁠80+⁠,Unknown, .ordered=TRUE)) - the age group

  • date (as.Date) - the date column

  • count (integer) - the test positives for each age group

  • time (time_period) - the time column

  • denom (integer) - the test positives for all age groups

Must be grouped by: class (and other groupings allowed).

No default value.

2070 rows and 6 columns


Germany demographics

Description

Derived from the Robert Koch Survstat service by comparing counts and incidence rates.

Usage

data(germany_demographics)

Format

A dataframe containing the following columns:

  • class (enum(⁠0–14⁠,⁠15–19⁠,⁠20–24⁠,⁠25–29⁠,⁠30–39⁠,⁠40–49⁠,⁠50–59⁠,⁠60–69⁠,⁠70–79⁠,⁠80+⁠, .ordered=TRUE)) - the class column

  • population (integer) - the population column

Must be grouped by: class (and other groupings allowed).

No default value.

10 rows and 2 columns


Infers a daily baseline population for a timeseries

Description

This function augments any timeseries with a population denominator. The population data may be static estimates, or a set of estimates at time points. The population data may be grouped in which case the grouping might be geographical area or age group or gender for example. The two inputs must have compatible grouping (i.e. all the groups in the population data must be present in the timeseries).

Usage

infer_population(df = i_timeseries, pop = i_population_data)

Arguments

df

A time series, or a grouped collection of time series.

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Any grouping allowed.

pop

The population data must be grouped in the same way as df. It might also have a time column as a time_period if the population is not static

A dataframe containing the following columns:

  • population (positive_integer) - Size of population

Any grouping allowed.

Value

the df timeseries with additional population column

Examples

ggoutbreak::england_covid %>%
  ggoutbreak::infer_population(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
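Conceptually this is a grouped join: each row of the time series gains the population figure for its group. A minimal base-R sketch with hypothetical data (the real function also handles time-varying populations and checks grouping compatibility):

```r
# Hypothetical grouped time series and a static population lookup.
ts  <- data.frame(class = rep(c("child", "adult"), each = 3),
                  time  = rep(1:3, 2),
                  count = c(5, 7, 6, 40, 44, 50))
pop <- data.frame(class = c("child", "adult"),
                  population = c(12e6, 44e6))

# Join on the shared grouping column: every observation gets its
# group's population as a denominator column.
merge(ts, pop, by = "class")
```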

Infer the prevalence of disease from incidence estimates and population size.

Description

Log-scaled incidence estimates are used to generate samples of incidence. These are convolved with the infectivity profile (as a set of discrete distributions) and the result is evaluated as a fraction of the population. The result is fitted to a logit-normal distribution (using moments) and quantiles are produced from the samples.

Usage

infer_prevalence(
  modelled = i_incidence_model,
  pop = i_population_data,
  ip = i_discrete_ip,
  bootstraps = 1000,
  seed = Sys.time(),
  ...
)

Arguments

modelled

Model output from processing the raw dataframe with something like poisson_locfit_model

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

pop

The population data must be grouped in the same way as modelled.

A dataframe containing the following columns:

  • population (positive_integer) - Size of population

Any grouping allowed.

ip

A discrete distribution representing the probability of detection of disease on a given day after infection. This will not necessarily sum to one, but each entry will not exceed the asymptomatic fraction.

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • tau (integer + complete) - the days since the index event.

Must be grouped by: boot (exactly).

A default value is defined.

bootstraps

the number of samples to take at each time point. This will be rounded up to a whole multiple of the infectivity profile distribution length.

seed

a random number seed for reproducibility

...

not used

Value

the modelled input with additional proportion columns

Examples

tmp = ggoutbreak::england_covid %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate(count=sum(count)) %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::infer_prevalence(
     pop = 56489700,
     # N.B. timing is difficult in this example, as we are using cases.
     ip = ggoutbreak::covid_test_sensitivity
  )

if(interactive()) {
  plot_prevalence(tmp)
}
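The core of the calculation described above can be sketched in base R with point estimates only (the real function works with log-normal incidence samples and bootstrapped profiles, and produces quantiles; the numbers below are purely illustrative):

```r
# Illustrative inputs: daily incidence, and an assumed probability of
# still being detectable tau days after infection.
incidence  <- c(100, 120, 150, 180, 200, 210, 205)
p_detect   <- c(0.3, 0.6, 0.5, 0.3, 0.1)
population <- 1e6

# Convolve incidence with the detection profile: people infected tau
# days ago who would still test positive today, summed over tau.
# (The first few values are NA until a full window is available.)
detectable <- stats::filter(incidence, p_detect,
                            method = "convolution", sides = 1)
prevalence <- as.numeric(detectable) / population
prevalence
```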

Calculate a risk ratio from incidence (experimental)

Description

This enables incidence rates to be compared to a baseline figure for incidence. The baseline could come, for example, from a population average or from average incidence over time. The output is an incidence rate ratio. The baseline_incidence column is a rate of events per unit time; the time unit is expected to be the same as that of the date in modelled, and this is not checked.

Usage

infer_rate_ratio(
  modelled = i_incidence_model,
  base = i_baseline_incidence_data,
  ...
)

Arguments

modelled

Model output from something like poisson_locfit_model(). It really makes sense if this is a grouped model.

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

base

The baseline data must be grouped in the same way as modelled. It may be a time series but does not have to be. See the example and note this may change in the future.

A dataframe containing the following columns:

  • baseline_incidence (positive_double) - Baseline raw incidence rate as count data

Any grouping allowed.

...

not used

Value

a dataframe with incidence rate ratios for each of the classes in modelled. A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • rate_ratio.0.025 (positive_double) - lower confidence limit of the rate ratio for a population group

  • rate_ratio.0.5 (positive_double) - median estimate of the rate ratio for a population group

  • rate_ratio.0.975 (positive_double) - upper confidence limit of the rate ratio for a population group

Any grouping allowed.

Examples

baseline = ggoutbreak::england_covid %>%
  ggoutbreak::time_aggregate() %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  dplyr::mutate(baseline_incidence = incidence.0.5)


tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::infer_rate_ratio(baseline) %>%
  dplyr::glimpse()
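The rate ratio itself is a simple division; a sketch of how a log-normal interval could be attached to it, treating the baseline as fixed (an assumption for illustration; how the package actually combines the uncertainty is not shown here):

```r
# Group incidence estimate on the log scale, with its standard error,
# against a fixed baseline incidence (events per unit time).
inc_fit  <- log(12)
inc_se   <- 0.1
baseline <- 50

log_rr <- inc_fit - log(baseline)  # ratios become differences on log scale
c(lower  = exp(log_rr - 1.96 * inc_se),
  median = exp(log_rr),            # 12 / 50 = 0.24
  upper  = exp(log_rr + 1.96 * inc_se))
```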

Calculate a normalised risk ratio from proportions

Description

This assumes that, for example, case distribution proportions are stratified by a population grouping (e.g. geography or age) and that we have estimates of the size of that population during that time period. Normalising by population proportion allows us to compare the relative risk of the outcome in each group against the expected population risk if the outcome were evenly distributed across the population. Proportions other than population fractions may also be useful comparators. At the moment this does not handle any uncertainty.

Usage

infer_risk_ratio(
  modelled = i_proportion_model,
  base = i_baseline_proportion_data,
  ...
)

Arguments

modelled

Model output from something like proportion_locfit_model()

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

Any grouping allowed.

base

The baseline data must be grouped in the same way as modelled. It may be a time series but does not have to be.

A dataframe containing the following columns:

  • baseline_proportion (proportion) - Baseline proportion for comparison

Any grouping allowed.

...

not used

Value

a dataframe with relative risk / risk ratio columns. A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • risk_ratio.0.025 (positive_double) - lower confidence limit of the excess risk ratio for a population group

  • risk_ratio.0.5 (positive_double) - median estimate of the excess risk ratio for a population group

  • risk_ratio.0.975 (positive_double) - upper confidence limit of the excess risk ratio for a population group

Any grouping allowed.

Examples

tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  ggoutbreak::infer_risk_ratio(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()

if(interactive()) {
  plot_growth_phase(tmp)
}
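Since no uncertainty is handled, the central estimate is just the observed proportion divided by the baseline proportion; a one-line illustration with made-up numbers:

```r
# An age group making up 10% of the population but 15% of cases has a
# risk ratio of ~1.5: 50% over-represented relative to an even spread.
observed_prop <- 0.15
baseline_prop <- 0.10
observed_prop / baseline_prop
```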

Strictly integer breaks for continuous scale

Description

Strictly integer breaks for continuous scale

Usage

integer_breaks(n = 5, ...)

Arguments

n

number of breaks

...

further arguments for methods.

Value

a ggplot breaks function
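A common way to build such a breaks function (a sketch of the general idea, not necessarily this package's implementation):

```r
# Returns a function that ggplot2 can call with the data limits;
# pretty() proposes breaks and floor()/unique() force them to integers.
integer_breaks_sketch <- function(n = 5) {
  function(x) unique(floor(pretty(x, n)))
}

integer_breaks_sketch()(c(0.2, 7.9))  # integer breaks covering the range
```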


Calculate a growth rate from a reproduction number and an infectivity profile

Description

Calculate a growth rate from a reproduction number and an infectivity profile

Usage

inv_wallinga_lipsitch(
  Rt,
  y = i_empirical_ip,
  a1 = seq(0.5, length.out = length(y)),
  a0 = dplyr::lag(a1, default = 0)
)

Arguments

Rt

a vector of reproduction numbers

y

an empirical infectivity profile as a probability vector or as a dataframe of format: A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly). A default value is defined.

a1

the end time of the infectivity profile probability estimate (defaults to 0.5,1.5,2.5,...).

a0

the start time of the infectivity profile probability estimate (defaults to 0,0.5,1.5,...).

Value

a vector of growth rates

Examples

y=stats::pgamma(seq(0.5,length.out=50), 5,2)-stats::pgamma(c(0,seq(0.5,length.out=49)), 5,2)
inv_wallinga_lipsitch(Rt=seq(0.5,2.5,length.out=9), y=y)
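The underlying relationship (Wallinga & Lipsitch 2007) links R and r through the generation interval distribution; for a simple point-mass discretisation it is R = 1 / sum(y_i * exp(-r * tau_i)) (the package integrates over the a0–a1 intervals, so its results will differ slightly). Inverting it for r given R needs a numerical root-finder:

```r
# Example generation interval as a discrete probability vector.
y   <- dgamma(1:14, 5, 2)
y   <- y / sum(y)
tau <- 1:14

# Solve 1 / sum(y * exp(-r * tau)) = R for the growth rate r.
r_from_R <- function(R) {
  uniroot(function(r) 1 / sum(y * exp(-r * tau)) - R,
          interval = c(-1, 1))$root
}

r_from_R(1)    # ~0: R = 1 means neither growth nor decline
r_from_R(1.5)  # positive growth rate
```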

Check whether vector is a date

Description

Check whether vector is a date

Usage

is.Date(x)

Arguments

x

a vector to check

Value

TRUE if dates, FALSE otherwise

Examples

is.Date(Sys.Date())

Extract Parts of a POSIXt or Date Object

Description

Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.

Usage

## S3 method for class 'time_period'
julian(x, ...)

Arguments

x

an object inheriting from class "POSIXt" or "Date".

...

arguments for other methods.

Value

weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").

quarters returns a character vector of "Q1" to "Q4".

julian returns the number of days (possibly fractional) since the origin, with the origin as a "origin" attribute. All time calculations in R are done ignoring leap-seconds.

Note

Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.

See Also

DateTimeClasses, Date; Sys.getlocale("LC_TIME") crucially for months() and weekdays().

Examples

## first two are locale dependent:
weekdays(.leap.seconds)
months  (.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6,  etc !!)

##' Transform (Time-)Date vector  to  convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
    DF <- as.data.frame(unclass(as.POSIXlt( dt )))
    `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds)    # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler:  Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date


## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE
## in the proleptic Julian calendar. More recently it is defined in
## 'Terrestrial Time', which differs from UTC by a few seconds.
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time

Label a time period

Description

Create a set of labels for a time period based on the start and duration of the period. The format is configurable using the start and end dates and the dfmt and ifmt parameters; however, if the time period has names then these are used in preference.

Usage

## S3 method for class 'time_period'
labels(
  object,
  ...,
  dfmt = "%d/%b",
  ifmt = "{start} — {end}",
  na.value = "Unknown"
)

Arguments

object

a set of decimal times as a time_period

...

not used

dfmt

a strptime format specification for the format of the date

ifmt

a glue spec referring to start and end of the period as a formatted date

na.value

a label for NA times

Value

a set of character labels for the time

Examples

eg = as.time_period(Sys.Date()+0:10*7, anchor="start")
labels(eg)
labels(eg, ifmt="{start}", dfmt="%d/%b/%y")
labels(eg, ifmt="until {end}", dfmt="%d %b %Y")

# labels retained in constructor:
eg2 = Sys.Date()+0:10*7
names(eg2) = paste0("week ",0:10)
labels(eg2)
labels(as.time_period(eg2, anchor="start"))

logit scale

Description

Perform logit scaling with correct axis formatting. Not to be used directly, but via ggplot2 (e.g. ggplot2::scale_y_continuous(trans = "logit")).

Usage

logit_trans(n = 5, ...)

Arguments

n

the number of breaks

...

not used, for compatibility

Value

A scales object

Examples

library(ggplot2)
library(tibble)

tibble::tibble(pvalue = c(0.001, 0.05, 0.1), fold_change = 1:3) %>%
 ggplot2::ggplot(ggplot2::aes(fold_change, pvalue)) +
 ggplot2::geom_point() +
 ggplot2::scale_y_continuous(trans = "logit")

Recover a long format infectivity profile from an EpiEstim style matrix

Description

Recover a long format infectivity profile from an EpiEstim style matrix

Usage

make_empirical_ip(omega, normalise = TRUE)

Arguments

omega

a matrix of probabilities, starting at time zero, with each column representing one possible infectivity profile, and the first value being the probability for the interval from time zero to 0.5. Or a vector of probabilities for one single profile, resulting in 1 bootstrap.

normalise

is this a probability mass function? If TRUE (the default) the sum is normalised to one. If FALSE then the input matrix or vector is clipped so that its maximum value is one. If a number is given then the PMF is scaled to that value.

Value

a long format ip delay distribution

Examples

format_ip(make_empirical_ip(c(0,0,1,1,1,2,2,2,1,1)))
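
The matrix form described above can also be sketched directly; here each column is one bootstrap replicate, and the column values are illustrative unnormalised weights:

```r
# two bootstrap profiles as columns of an EpiEstim style matrix:
omega = cbind(c(0, 1, 2, 1, 0), c(0, 2, 2, 1, 1))
# with normalise = TRUE (the default) each profile is rescaled to sum to one:
format_ip(make_empirical_ip(omega))
```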

Make an infectivity profile from published data

Description

The infectivity profile is typically fitted to data by MCMC and reported as the median and 95% credible intervals of the mean and the SD of (usually) a gamma distribution. This function generates a discrete infectivity probability distribution representing the chance that an infectee was infected on any specific day after the infector was infected (given that the infectee was infected).

Usage

make_gamma_ip(
  median_of_mean,
  lower_ci_of_mean = median_of_mean,
  upper_ci_of_mean = median_of_mean,
  median_of_sd = sqrt(median_of_mean),
  lower_ci_of_sd = median_of_sd,
  upper_ci_of_sd = median_of_sd,
  correlation = NA,
  n_boots = 100,
  epiestim_compat = FALSE,
  epiestim_sampler = epiestim_compat,
  z_crit = 0.95
)

Arguments

median_of_mean, lower_ci_of_mean, upper_ci_of_mean

Parameters of the infectivity profile mean.

median_of_sd, lower_ci_of_sd, upper_ci_of_sd

Parameters of the infectivity profile SD.

correlation

the correlation between mean and SD. This is optional and will be inferred if not changed from the default NA value.

n_boots

The number of samples to generate.

epiestim_compat

Use EpiEstim to generate the infectivity profiles. A TRUE value here results in an infectivity profile with a probability of 0 for day 0.

epiestim_sampler

Use EpiEstim to generate the random samples using independent truncated normal distributions for mean and SD based on the parameters above. If FALSE then it will use log-normal distributions with correlation.

z_crit

the width of the confidence intervals (defaults to 95%).

Details

EpiEstim generates these distributions by sampling from a truncated normal distribution for both mean and sd. The means and sds thus produced are discretised using a gamma distribution offset by 1 day, to enforce that the probability of infection on day zero is zero.

This constraint changes the shape of the distribution somewhat and may cause a small bias (although there is no ground truth to evaluate). A different sampling and discretisation strategy is provided. The sampler uses log-normal distributions for both mean and SD with a degree of correlation. The discretiser assigns probabilities directly from the CDF of the gamma distribution without offset. This results in non-zero values for the probability at time zero and can only be used with Rt estimation methods that can handle zero/negative serial intervals (e.g. rt_from_incidence, rt_from_renewal, or rt_from_growth_rate).

Value

a long format infectivity profile data frame, or a list of dataframes if input is a vector.

Examples

# COVID-19 estimates from Ganyani et al 2020.
tmp = make_gamma_ip(5.2, 3.78, 6.78, 1.72, 0.91, 3.93,
  epiestim_sampler=FALSE, epiestim_compat=FALSE)

tmp %>%
  dplyr::group_by(boot) %>%
  dplyr::summarise(
    mean = sum(tau*probability),
    sd = sqrt(sum((tau-sum(tau*probability))^2*probability))
  ) %>%
  dplyr::summarise(
    mean = sprintf("%1.2f [%1.2f-%1.2f]",
      stats::quantile(mean,0.5),
      stats::quantile(mean,0.025),
      stats::quantile(mean,0.975)),
    sd = sprintf("%1.2f [%1.2f-%1.2f]",
      stats::quantile(sd,0.5),
      stats::quantile(sd,0.025),
      stats::quantile(sd,0.975))
  )

if(interactive()) {
  plot_ip(tmp, alpha=0.1) +
    ggplot2::coord_cartesian(xlim=c(0,15))
}

means = c(3,4,5)
ips = make_gamma_ip(means)

if (interactive()) {
  purrr::map2(ips,means, ~ .x %>% dplyr::mutate(label = sprintf("Mean: %1.2f",.y))) %>%
    purrr::map( ~ plot_ip(.x,alpha=0.1)+ggplot2::facet_wrap(~label))
}

Make an infectivity profile from posterior samples

Description

The infectivity profile is typically fitted to data by MCMC as a gamma distribution. This function generates a discrete infectivity probability distribution representing the chance that an infectee was infected on any specific day after the infector was infected (given that the infectee was infected), from posterior samples.

Usage

make_posterior_ip(
  ...,
  mean,
  sd,
  shape,
  rate,
  scale,
  epiestim_compat = FALSE,
  n_boots = 100
)

Arguments

...

not used, must be empty

mean

a vector of gamma distribution means

sd

a vector of gamma distribution sds

shape

a vector of gamma distribution shape parameters

rate

a vector of gamma distribution rate parameters

scale

a vector of gamma distribution scale parameters

epiestim_compat

Use EpiEstim to generate the infectivity profiles. A TRUE value here results in an infectivity profile with a probability of 0 for day 0.

n_boots

if there are more posterior samples than this limit then a maximum of n_boots ip distributions will be created (randomly sampled).

Details

If using EpiEstim and coarseDataTools::dic.fit.mcmc the output of the MCMC will be an S4 object with a samples slot, containing a dataframe of shape=var1 and scale=var2 columns. To use this output with make_posterior_ip invoke it like this:

do.call(make_posterior_ip, SI_fit_clever@samples %>% dplyr::rename(shape=var1, scale=var2))

N.b. only one combination of mean and sd, shape and rate, or shape and scale is required.

Value

a long format ip delay distribution

Examples

tmp = make_posterior_ip(
  mean = stats::rnorm(100,5,0.1),
  sd = stats::rnorm(100,1.5,0.1)
)
tmp %>% dplyr::glimpse()
if (interactive()) plot_ip(tmp)
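
Since only one parameter pairing is needed, the same kind of profile can be specified via shape and scale samples instead of mean and sd (the parameter values here are illustrative only):

```r
tmp2 = make_posterior_ip(
  shape = stats::rnorm(100, 11, 0.5),
  scale = stats::rnorm(100, 0.45, 0.02)
)
tmp2 %>% dplyr::glimpse()
```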

Re-sample an empirical IP distribution direct from data

Description

Suits larger contact-tracing data sets where there is a delay between two events which may or may not be precisely known.

Usage

make_resampled_ip(
  tau,
  min_tau = pmax(tau - 0.5, truncate),
  max_tau = tau + 0.5,
  add_noise = TRUE,
  truncate = 0,
  n_boots = 100,
  seed = Sys.time()
)

Arguments

tau

the delay between first and second events

min_tau

the minimum delay for interval censored delays

max_tau

the maximum delay for interval censored delays

add_noise

adds noise to each data point for each replicate

truncate

what is the minimum realistic value of the parameter

n_boots

number of replicates to generate

seed

a random number seed for reproducibility

Value

a long format ip delay distribution

Examples

tau = rgamma2(100, 5,2)
ip = make_resampled_ip(min_tau = tau-1, max_tau = tau+1, seed = 100)
if(interactive()) {
  plot_ip(ip,alpha=0.1)
}

The maximum of a set of dates

Description

max.Date returns an integer and -Inf for a set of NA dates. This is usually inconvenient.

Usage

max_date(x, ...)

Arguments

x

a vector of dates

...

ignored

Value

a date. 0001-01-01 if there is no well-defined maximum.

Examples

max_date(NA)

The minimum of a set of dates

Description

min.Date returns an integer and Inf for a set of NA dates. This is usually inconvenient.

Usage

min_date(x, ...)

Arguments

x

a vector of dates

...

ignored

Value

a date. 9999-12-31 if there is no well-defined minimum.

Examples

min_date(NA)

Extract Parts of a POSIXt or Date Object

Description

Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.

Usage

## S3 method for class 'time_period'
months(x, abbreviate = FALSE)

Arguments

x

an object inheriting from class "POSIXt" or "Date".

abbreviate

logical vector (possibly recycled). Should the names be abbreviated?

Value

weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").

quarters returns a character vector of "Q1" to "Q4".

julian returns the number of days (possibly fractional) since the origin, with the origin as a "origin" attribute. All time calculations in R are done ignoring leap-seconds.

Note

Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.

See Also

DateTimeClasses, Date; Sys.getlocale("LC_TIME") crucially for months() and weekdays().

Examples

## first two are locale dependent:
weekdays(.leap.seconds)
months  (.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6,  etc !!)

##' Transform (Time-)Date vector  to  convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
    DF <- as.data.frame(unclass(as.POSIXlt( dt )))
    `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds)    # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler:  Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date


## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE,
## in the proleptic Julian calendar.  More exactly, in
## 'Terrestrial Time', which differs from UTC by a few seconds.
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time

Multinomial time-series model.

Description

Takes a list of times, classes and counts, e.g. a COG-UK variant-like data set with time, (multinomial) class (e.g. variant) and count being the count in that time period. Fits a quadratic B-spline on time to the proportion of the data using nnet::multinom, with approximately one degree of freedom per class and per window units of the time series.

Usage

multinomial_nnet_model(
  d = i_multinomial_input,
  ...,
  window = 14,
  frequency = "1 day",
  predict = TRUE
)

Arguments

d

multi-class count input

...

not used, present to allow the proportion model to be used in a group_modify

window

a number of data points between knots; smaller values result in less smoothing, larger values in more.

frequency

the density of the output estimates.

predict

return a prediction. If FALSE the model object is returned instead.

Value

a new dataframe with time (as a time period), class, and proportion.0.5, or a model object

Examples

if (interactive()) {
  # not run due to long running
  tmp = ggoutbreak::england_covid %>%
    dplyr::filter(date > "2022-01-01") %>%
    ggoutbreak::multinomial_nnet_model(window=21) %>%
    dplyr::glimpse()
}

Calculate a normalised count per capita

Description

This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.

Usage

normalise_count(
  raw = i_incidence_data,
  pop = i_population_data,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)

Arguments

raw

The count data

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Any grouping allowed.

pop

The population data must be grouped in the same way as raw.

A dataframe containing the following columns:

  • population (positive_integer) - Size of population

Any grouping allowed.

...

not used

population_unit

What population unit do you want the count data normalised to e.g. per 100K

normalise_time

The default behaviour for normalising is to keep it in the same time units as the input data. If this parameter is set to TRUE the incidence rates are calculated per year. If given as a lubridate period string e.g. "1 week" then the incidence is calculated over that time period.

Value

a dataframe with per capita incidence rates. A dataframe containing the following columns:

  • population (positive_integer) - Size of population

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • population_unit (double) - The population unit on which the per capita incidence rate is calculated

  • time_unit (lubridate::as.period) - The time period over which the per capita incidence rate is calculated

Any grouping allowed.

Examples

tmp = ggoutbreak::england_covid %>%
  ggoutbreak::normalise_count(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
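
Normalising the time unit as well, using the lubridate period string form described above (a sketch; output columns are as documented under Value):

```r
# incidence per 100K per year rather than per day:
tmp2 = ggoutbreak::england_covid %>%
  ggoutbreak::normalise_count(
    ggoutbreak::england_demographics,
    normalise_time = "1 year"
  ) %>%
  dplyr::glimpse()
```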

Calculate a normalised incidence rate per capita

Description

This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.

Usage

normalise_incidence(
  modelled = i_incidence_model,
  pop = i_population_data,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)

Arguments

modelled

Model output from processing the raw dataframe with something like poisson_locfit_model

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

pop

The population data must be grouped in the same way as modelled.

A dataframe containing the following columns:

  • population (positive_integer) - Size of population

Any grouping allowed.

...

not used

population_unit

what population unit do you want the incidence in e.g. per 100K

normalise_time

The default behaviour for incidence is to keep it in the same time units as the input data. If this parameter is set to TRUE the incidence rates are calculated per year. If given as a lubridate period string e.g. "1 day" then the incidence is calculated over that time period.

Value

a dataframe with per capita incidence rates. A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale

  • incidence.per_capita.se.fit (positive_double) - the standard error of the incidence per capita rate estimate on a log scale

  • incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)

  • incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)

  • incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)

  • population_unit (double) - The population unit on which the per capita incidence rate is calculated

  • time_unit (lubridate::as.period) - The time period over which the per capita incidence rate is calculated

Any grouping allowed.

Examples

tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()

The Beta Distribution

Description

Density, distribution function, quantile function and random generation for the Beta distribution, reparametrised in terms of a mean probability (prob) and a coefficient of variation (kappa).

Usage

pbeta2(q, prob, kappa, lower.tail = TRUE, log.p = FALSE)

Arguments

q

vector of quantiles

prob

the mean probability (vectorised)

kappa

a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

Value

dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

See Also

stats::pbeta()

Examples

pbeta2(c(0.25,0.5,0.75), 0.5, 0.25)

The Gamma Distribution

Description

Density, distribution function, quantile function and random generation for the Gamma distribution, reparametrised in terms of its mean and standard deviation.

Usage

pgamma2(
  q,
  mean,
  sd = sqrt(mean),
  lower.tail = TRUE,
  log.p = FALSE,
  convex = TRUE
)

Arguments

q

vector of quantiles

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

convex

Show a warning if the distribution selected is not a convex function

Value

dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

See Also

stats::pgamma()

Examples

pgamma2(seq(0,4,0.25), 2, 1)
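
The mean/sd parametrisation corresponds to the usual gamma moment matching, shape = (mean/sd)^2 and rate = mean/sd^2; assuming that mapping, the call above is equivalent to:

```r
q = seq(0, 4, 0.25)
# mean = 2, sd = 1  =>  shape = 4, rate = 2 (moment matching, assumed):
all.equal(pgamma2(q, mean = 2, sd = 1),
          stats::pgamma(q, shape = 4, rate = 2))
```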

The Log Normal Distribution

Description

Density, distribution function, quantile function and random generation for the log normal distribution, reparametrised in terms of its mean and standard deviation on the true (untransformed) scale.

Usage

plnorm2(q, mean = 1, sd = sqrt(exp(1) - 1), lower.tail = TRUE, log.p = FALSE)

Arguments

q

vector of quantiles

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x], otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

Details

The log normal distribution has density

f(x) = 1 / (sqrt(2*pi) * sigma * x) * exp(-(log(x) - mu)^2 / (2*sigma^2))

where mu and sigma are the mean and standard deviation of the logarithm. The mean is E(X) = exp(mu + sigma^2/2), the median is med(X) = exp(mu), and the variance is Var(X) = exp(2*mu + sigma^2) * (exp(sigma^2) - 1); hence the coefficient of variation is sqrt(exp(sigma^2) - 1), which is approximately sigma when that is small (e.g. sigma < 1/2).

Value

dlnorm gives the density, plnorm gives the distribution function, qlnorm gives the quantile function, and rlnorm generates random deviates.

The length of the result is determined by n for rlnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

Note

The cumulative hazard H(t) = -log(1 - F(t)) is -plnorm(t, r, lower = FALSE, log = TRUE).

Source

dlnorm is calculated from the definition (in ‘Details’). [pqr]lnorm are based on the relationship to the normal.

Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

Distributions for other standard distributions, including dnorm for the normal distribution.

Examples

plnorm2(seq(0,4,0.25), 2, 1)
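
The true-scale mean and sd map onto meanlog and sdlog by standard log-normal moment matching; assuming that mapping, the call above is equivalent to:

```r
m = 2; s = 1
sdlog = sqrt(log(1 + (s/m)^2))   # from CV = sqrt(exp(sigma^2) - 1)
meanlog = log(m) - sdlog^2 / 2   # from E(X) = exp(mu + sigma^2/2)
all.equal(plnorm2(seq(0, 4, 0.25), m, s),
          stats::plnorm(seq(0, 4, 0.25), meanlog, sdlog))
```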

Plot raw case counts as a histogram

Description

Plot raw case counts as a histogram

Usage

plot_cases(
  raw = i_timestamped,
  ...,
  mapping = .check_for_aes(raw, ..., class_aes = "fill"),
  events = i_events
)

Arguments

raw

The raw case data either as a summarised count or as a line-list

A dataframe containing the following columns:

  • time (ggoutbreak::time_period) - A set of events with a timestamp as a 'time_period'

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, The position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

mapping

a ggplot2::aes mapping. Most importantly setting the fill to something if there are multiple types of event in the plot. If a class column is present the mapping will default to using this.

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot object

Examples

# 50 random times:
tmp = tibble::tibble(
  time = as.time_period( sample.int(10,50,replace=TRUE) ,"1 day"),
  class = rep(c("one","two","three"), length.out=50)
)

if(interactive()) {
  plot_cases(tmp)
}

Plot a raw case count timeseries

Description

Plot a raw case count timeseries

Usage

plot_counts(
  raw = i_incidence_data,
  ...,
  mapping = .check_for_aes(raw, ...),
  events = i_events
)

Arguments

raw

The raw count data, or the raw count data normalised by population ( see normalise_count())

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, The position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot.

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot object

Examples

# example code

tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count)) %>%
  normalise_count(pop=56489700, population_unit=1000, normalise_time=TRUE)

# normalised by England population (56489700 people)

if(interactive()) {
  plot_counts(tmp, colour="blue",size=0.25)
}
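
The events argument documented above takes a dataframe of labelled dates or time spans. A minimal sketch of constructing one by hand, matching the documented columns (label, start, end) — the dates and labels here are purely illustrative, not real intervention dates:

```r
# Hypothetical events dataframe; column layout follows the documentation
# above, the specific dates are illustrative only:
lockdowns = dplyr::tibble(
  label = c("Lockdown 1", "Lockdown 2"),
  start = as.Date(c("2020-03-26", "2020-11-05")),
  end   = as.Date(c("2020-06-15", "2020-12-02"))
)

if(interactive()) {
  plot_counts(tmp, events = lockdowns, colour = "blue", size = 0.25)
}
```

Rows with an NA end are drawn as single events rather than shaded spans.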

Plot an incidence or proportion versus growth phase diagram

Description

Plot an incidence or proportion versus growth phase diagram

Usage

plot_growth_phase(
  modelled = i_timestamped,
  timepoints = NULL,
  duration = max(dplyr::count(modelled)$n),
  interval = 7,
  mapping = if (interfacer::is_col_present(modelled, class)) ggplot2::aes(colour = class)
    else ggplot2::aes(),
  cis = TRUE,
  ...
)

Arguments

modelled

Either:

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

  • growth.fit (double) - an estimate of the growth rate

  • growth.se.fit (positive_double) - the standard error of the growth rate

  • growth.0.025 (double) - lower confidence limit of the growth rate

  • growth.0.5 (double) - median estimate of the growth rate

  • growth.0.975 (double) - upper confidence limit of the growth rate

Any grouping allowed.

OR:

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of the proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

  • relative.growth.fit (double) - an estimate of the relative growth rate

  • relative.growth.se.fit (positive_double) - the standard error of the relative growth rate

  • relative.growth.0.025 (double) - lower confidence limit of the relative growth rate

  • relative.growth.0.5 (double) - median estimate of the relative growth rate

  • relative.growth.0.975 (double) - upper confidence limit of the relative growth rate

Any grouping allowed.

timepoints

the time points (a Date or time_period vector) at which to plot phase diagrams. If multiple, this results in a sequence of plots as facets. If NULL (the default) the last time point in the series is used

duration

the length of the growth rate phase trail

interval

the length of time between markers on the phase plot

mapping

a ggplot2::aes() mapping

cis

logical; should the phases be marked with confidence intervals?

...

Named arguments passed on to geom_events

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

Named arguments passed on to ggplot2::facet_wrap

facets

A set of variables or expressions quoted by vars() and defining faceting groups on the rows or columns dimension. The variables can be named (the names are passed to labeller).

For compatibility with the classic interface, can also be a formula or character vector. Use either a one sided formula, ~a + b, or a character vector, c("a", "b").

nrow,ncol

Number of rows and columns.

scales

Should scales be fixed ("fixed", the default), free ("free"), or free in one dimension ("free_x", "free_y")?

shrink

If TRUE, will shrink scales to fit output of statistics, not raw data. If FALSE, will be range of raw data before statistical summary.

labeller

A function that takes one data frame of labels and returns a list or data frame of character vectors. Each input column corresponds to one factor. Thus there will be more than one with vars(cyl, am). Each output column gets displayed as one separate line in the strip label. This function should inherit from the "labeller" S3 class for compatibility with labeller(). You can use different labeling functions for different kinds of labels, for example use label_parsed() for formatting facet labels. label_value() is used by default, check it for more details and pointers to other options.

as.table

If TRUE, the default, the facets are laid out like a table with highest values at the bottom-right. If FALSE, the facets are laid out like a plot with the highest value at the top-right.

switch

By default, the labels are displayed on the top and right of the plot. If "x", the top labels will be displayed to the bottom. If "y", the right-hand side labels will be displayed to the left. Can also be set to "both".

drop

If TRUE, the default, all factor levels not used in the data will automatically be dropped. If FALSE, all factor levels will be shown, regardless of whether or not they appear in the data.

dir

Direction: either "h" for horizontal, the default, or "v", for vertical.

strip.position

By default, the labels are displayed on the top of the plot. Using strip.position it is possible to place the labels on any of the four sides by setting strip.position = c("top", "bottom", "left", "right")

axes

Determines which axes will be drawn in case of fixed scales. When "margins" (default), axes will be drawn at the exterior margins. "all_x" and "all_y" will draw the respective axes at the interior panels too, whereas "all" will draw all axes at all panels.

axis.labels

Determines whether to draw labels for interior axes when the scale is fixed and the axis argument is not "margins". When "all" (default), all interior axes get labels. When "margins", only the exterior axes get labels, and the interior axes get none. When "all_x" or "all_y", only draws the labels at the interior axes in the x- or y-direction respectively.

Value

a ggplot

Examples

tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))

tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))

# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)

timepoints = as.Date(c("Lockdown 1" = "2020-03-30", "Lockdown 2" = "2020-12-31"))

if(interactive()) {
  plot_growth_phase(tmp2, timepoints, duration=108)
}

Growth rate timeseries diagram

Description

Growth rate timeseries diagram

Usage

plot_growth_rate(
  modelled = i_timeseries,
  ...,
  mapping = .check_for_aes(modelled, ...),
  events = i_events
)

Arguments

modelled

Either:

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

  • growth.fit (double) - an estimate of the growth rate

  • growth.se.fit (positive_double) - the standard error of the growth rate

  • growth.0.025 (double) - lower confidence limit of the growth rate

  • growth.0.5 (double) - median estimate of the growth rate

  • growth.0.975 (double) - upper confidence limit of the growth rate

Any grouping allowed.

OR:

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of the proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

  • relative.growth.fit (double) - an estimate of the relative growth rate

  • relative.growth.se.fit (positive_double) - the standard error of the relative growth rate

  • relative.growth.0.025 (double) - lower confidence limit of the relative growth rate

  • relative.growth.0.5 (double) - median estimate of the relative growth rate

  • relative.growth.0.975 (double) - upper confidence limit of the relative growth rate

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot.

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot

Examples

# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))

tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))

# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)

if(interactive()) {
  plot_growth_rate(tmp2,colour="blue")
}

tmp3 = ggoutbreak::england_covid %>%
  proportion_locfit_model()

if(interactive()) {
  plot_growth_rate(tmp3)
}

Plot an incidence timeseries

Description

Plot an incidence timeseries

Usage

plot_incidence(
  modelled = i_incidence_model,
  raw = i_incidence_data,
  ...,
  mapping = .check_for_aes(modelled, ...),
  events = i_events
)

Arguments

modelled

An optional estimate of the incidence time series. If modelled is missing it is estimated from raw using a poisson_locfit_model; in this case the parameters window and deg may be supplied to control the fit.

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

modelled can also be the output of normalise_incidence, in which case the plot uses the per capita rates calculated by that function.

raw

The raw count data

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

Named arguments passed on to poisson_locfit_model

d

input data

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

...

not used; present to allow the model function to be used in a group_modify

window

a number of data points defining the bandwidth of the estimate: smaller values result in less smoothing, larger values in more. The default of 14 is calibrated for data provided at a daily frequency; for weekly data a lower value may be preferred. - (defaults to 14)

deg

polynomial degree (minimum 1) - a higher degree results in less smoothing, a lower degree in more. A degree of 1 fits a linear model piecewise. - (defaults to 1)

frequency

the density of the output estimates as a time period such as "7 days" or "2 weeks". - (defaults to "1 day")

predict

if TRUE (the default) the result is a prediction dataframe; if FALSE the fitted locfit models are returned (advanced). - (defaults to TRUE)

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot.

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot object

Examples

# example code

tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count)) %>%
  normalise_count(56489700, population_unit=1000, normalise_time="1 year")

tmp2 = tmp %>% poisson_locfit_model()

if(interactive()) {
  plot_incidence(tmp2,tmp,colour="blue",size=0.25)
}
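
As noted in the modelled argument above, plot_incidence can fit the model itself when only raw data is supplied, forwarding window and deg to poisson_locfit_model. A sketch; the tuning values here are illustrative, not recommendations:

```r
# 'modelled' omitted: plot_incidence estimates the incidence internally
# via poisson_locfit_model; 'window' and 'deg' tune that fit
# (values illustrative):
if(interactive()) {
  plot_incidence(raw = tmp, window = 21, deg = 2, colour = "blue")
}
```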

Plot an infectivity profile

Description

Plot an infectivity profile

Usage

plot_ip(ip = i_empirical_ip, alpha = 0.1, ...)

Arguments

ip

A long format infectivity profile

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly).

A default value is defined.

alpha

the alpha value of the bootstrap lines

...

passed onto geom_segment for controlling line thickness, alpha etc.

Value

a ggplot object

Examples

if(interactive()) {
  plot_ip(ganyani_ip,alpha=0.1)
}
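
Because extra arguments are forwarded to geom_segment, standard ggplot2 line aesthetics can be tuned alongside alpha. A sketch; the linewidth value is illustrative:

```r
if(interactive()) {
  # thinner, fainter bootstrap replicate lines via geom_segment arguments:
  plot_ip(ganyani_ip, alpha = 0.05, linewidth = 0.25)
}
```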

Plot a multinomial proportions model

Description

Plot a multinomial proportions model

Usage

plot_multinomial(
  modelled = i_multinomial_proportion_model,
  ...,
  mapping = ggplot2::aes(fill = class),
  events = i_events,
  normalise = FALSE
)

Arguments

modelled

the multinomial count data

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • class (factor) - A factor specifying the type of observation. This will be things like variant, or serotype, for a multinomial model. Any missing data points are ignored.

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

Must be grouped by: class (exactly).

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot.

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

normalise

make sure the probabilities add up to one; this can be a bad idea if you know you may have missing values.

Value

a ggplot

Examples

tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()

if(interactive()) {
  plot_multinomial(tmp, normalise=TRUE)+
    ggplot2::scale_fill_viridis_d()
}
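
The events argument described above can be assembled by hand. A minimal sketch (the dates and labels here are illustrative, not from the package):

```r
# A hypothetical events dataframe with one time span and one point event.
# `label`, `start` and `end` are the columns listed above; an NA `end`
# marks a single (point) event rather than a span.
events_df = tibble::tibble(
  label = c("Lockdown", "Variant detected"),
  start = as.Date(c("2020-03-26", "2020-12-08")),
  end   = as.Date(c("2020-05-11", NA))
)
```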

Plot a proportions timeseries

Description

Plot a proportions timeseries

Usage

plot_prevalence(
  modelled = i_prevalence_model,
  raw = i_proportion_data,
  ...,
  mapping = .check_for_aes(modelled, ...),
  events = i_events
)

Arguments

modelled

Proportion model estimates

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • prevalence.0.025 (proportion) - lower confidence limit of prevalence (true scale)

  • prevalence.0.5 (proportion) - median estimate of prevalence (true scale)

  • prevalence.0.975 (proportion) - upper confidence limit of prevalence (true scale)

Any grouping allowed.

raw

Raw count data

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

Named arguments passed on to proportion_locfit_model

d

the input

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

...

not used and present to allow proportion model to be used in a group_modify

window

a number of data points defining the bandwidth of the estimate; smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided at a daily frequency; with weekly data a lower value may be preferred. - (defaults to 14)

deg

polynomial degree (min 1) - a higher degree results in less smoothing, a lower degree in more smoothing. A degree of 1 fits a linear model piecewise. - (defaults to 1)

frequency

the density of the output estimates as a time period such as '7 days' or '2 weeks'. - (defaults to "1 day")

predict

if TRUE the result is a prediction dataframe; if FALSE the fitted locfit models are returned (advanced). - (defaults to TRUE)

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot object

Examples

if(interactive()) {
  plot_prevalence(
    ggoutbreak::england_ons_infection_survey,
    mapping = ggplot2::aes(colour=geography))
}

Plot a proportions timeseries

Description

Plot a proportions timeseries

Usage

plot_proportion(
  modelled = i_proportion_model,
  raw = i_proportion_data,
  ...,
  mapping = .check_for_aes(modelled, ...),
  events = i_events
)

Arguments

modelled

Proportion model estimates

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

Any grouping allowed.

raw

Raw count data

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

Named arguments passed on to proportion_locfit_model

d

the input

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

...

not used and present to allow proportion model to be used in a group_modify

window

a number of data points defining the bandwidth of the estimate; smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided at a daily frequency; with weekly data a lower value may be preferred. - (defaults to 14)

deg

polynomial degree (min 1) - a higher degree results in less smoothing, a lower degree in more smoothing. A degree of 1 fits a linear model piecewise. - (defaults to 1)

frequency

the density of the output estimates as a time period such as '7 days' or '2 weeks'. - (defaults to "1 day")

predict

if TRUE the result is a prediction dataframe; if FALSE the fitted locfit models are returned (advanced). - (defaults to TRUE)

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot object

Examples

tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()

if(interactive()) {
  plot_proportion(tmp)+
    ggplot2::scale_fill_viridis_d(aesthetics = c("fill","colour"))
}

Plot a raw case count proportion timeseries

Description

Plot a raw case count proportion timeseries

Usage

plot_proportions(
  raw = i_proportion_data,
  ...,
  mapping = .check_for_aes(raw, ...),
  events = i_events
)

Arguments

raw

The raw count and denominator data

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot object

Examples

# example code

tmp = tibble::tibble(
  time = as.time_period(1:10, "1 day"),
  count = 101:110
) %>% dplyr::mutate(
  denom = count*time
)

if(interactive()) {
  plot_proportions(tmp)+ggplot2::geom_line()
}

Reproduction number timeseries diagram

Description

Reproduction number timeseries diagram

Usage

plot_rt(
  modelled = i_reproduction_number,
  ...,
  mapping = .check_for_aes(modelled, ...),
  events = i_events
)

Arguments

modelled

the modelled Rt estimate

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • rt.fit (double) - an estimate of the reproduction number

  • rt.se.fit (positive_double) - the standard error of the reproduction number

  • rt.0.025 (double) - lower confidence limit of the reproduction number

  • rt.0.5 (double) - median estimate of the reproduction number

  • rt.0.975 (double) - upper confidence limit of the reproduction number

Any grouping allowed.

...

Named arguments passed on to geom_events

event_label_size

how big to make the event label

event_label_colour

the event label colour

event_label_angle

the event label angle

event_line_colour

the event line colour

event_fill_colour

the event area fill

hide_labels

do not show labels at all

guide_axis

a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position amongst other things.

...

Named arguments passed on to ggplot2::scale_x_date

name

The name of the scale. Used as the axis or legend title. If waiver(), the default, the name of the scale is taken from the first mapping used for that aesthetic. If NULL, the legend title will be omitted.

breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_breaks

  • A Date/POSIXct vector giving positions of breaks

  • A function that takes the limits as input and returns breaks as output

date_breaks

A string giving the distance between breaks like "2 weeks", or "10 years". If both breaks and date_breaks are specified, date_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

labels

One of:

  • NULL for no labels

  • waiver() for the default labels computed by the transformation object

  • A character vector giving labels (must be same length as breaks)

  • An expression vector (must be the same length as breaks). See ?plotmath for details.

  • A function that takes the breaks as input and returns labels as output. Also accepts rlang lambda function notation.

date_labels

A string giving the formatting specification for the labels. Codes are defined in strftime(). If both labels and date_labels are specified, date_labels wins.

minor_breaks

One of:

  • NULL for no breaks

  • waiver() for the breaks specified by date_minor_breaks

  • A Date/POSIXct vector giving positions of minor breaks

  • A function that takes the limits as input and returns minor breaks as output

date_minor_breaks

A string giving the distance between minor breaks like "2 weeks", or "10 years". If both minor_breaks and date_minor_breaks are specified, date_minor_breaks wins. Valid specifications are 'sec', 'min', 'hour', 'day', 'week', 'month' or 'year', optionally followed by 's'.

limits

One of:

  • NULL to use the default scale range

  • A numeric vector of length two providing limits of the scale. Use NA to refer to the existing minimum or maximum

  • A function that accepts the existing (automatic) limits and returns new limits. Also accepts rlang lambda function notation. Note that setting limits on positional scales will remove data outside of the limits. If the purpose is to zoom, use the limit argument in the coordinate system (see coord_cartesian()).

expand

For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables.

oob

One of:

  • Function that handles limits outside of the scale limits (out of bounds). Also accepts rlang lambda function notation.

  • The default (scales::censor()) replaces out of bounds values with NA.

  • scales::squish() for squishing out of bounds values into range.

  • scales::squish_infinite() for squishing infinite values into range.

guide

A function used to create a guide or its name. See guides() for more information.

position

For position scales, the position of the axis. left or right for y axes, top or bottom for x axes.

sec.axis

sec_axis() is used to specify a secondary axis.

timezone

The timezone to use for display on the axes. The default (NULL) uses the timezone encoded in the data.

na.value

Missing values will be replaced with this value.

mapping

a ggplot2::aes mapping. Most importantly, set the colour aesthetic if there are multiple incidence time series in the plot

events

Significant events or time spans

A dataframe containing the following columns:

  • label (character) - the event label

  • start (date) - the start date, or the date of the event

  • end (date) - the end date or NA if a single event

Any grouping allowed.

A default value is defined.

Value

a ggplot timeseries

Examples

# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))

if (interactive()) {

  tmp2 = tmp %>%
    poisson_locfit_model() %>%
    rt_from_growth_rate()

  # comparing RT from growth rates with England consensus Rt:
  plot_rt(tmp2,colour="blue")+
    ggplot2::geom_errorbar(
      data=england_consensus_rt,
      mapping=ggplot2::aes(x=date-21,ymin=low,ymax=high),
      colour="red")

}

The Negative Binomial Distribution

Description

Density, distribution function, quantile function and random generation for the negative binomial distribution with parameters size and prob.

Usage

pnbinom2(q, mean, sd = sqrt(mean), lower.tail = TRUE, log.p = FALSE)

Arguments

q

vector of quantiles.

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x].

log.p

logical; if TRUE, probabilities p are given as log(p).

Details

The negative binomial distribution with size = n and prob = p has density

p(x) = Γ(x+n) / (Γ(n) x!) · p^n (1-p)^x

for x = 0, 1, 2, …, n > 0 and 0 < p ≤ 1.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is μ = n(1-p)/p and the variance is n(1-p)/p².

A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)

An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.
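
The mean/sd parametrization of pnbinom2 can be related to the mu/size parametrization above by solving sd² = mu + mu²/size for size. A sketch of that conversion (an assumption about how pnbinom2 works internally, not stated on this page):

```r
mu = 2
sd = sqrt(2.5)              # must satisfy sd^2 > mu for a finite size
size = mu^2 / (sd^2 - mu)   # from var = mu + mu^2 / size
# stats::pnbinom accepts mu directly alongside size:
stats::pnbinom(0:5, size = size, mu = mu)
# when sd^2 == mu the size parameter diverges and the distribution
# approaches the Poisson limit
```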

If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.

The quantile is defined as the smallest value x such that F(x) ≥ p, where F is the distribution function.

Value

dnbinom gives the density, pnbinom gives the distribution function, qnbinom gives the quantile function, and rnbinom generates random deviates.

Invalid size or prob will result in return value NaN, with a warning.

The length of the result is determined by n for rnbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

rnbinom returns a vector of type integer unless generated values exceed the maximum representable integer when double values are returned.

Source

dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

pnbinom uses pbeta.

qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search.

rnbinom uses the derivation as a gamma mixture of Poisson distributions, see

Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

See Also

Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.

Examples

pnbinom2(0:5, 2, sqrt(2))

Poisson time-series model.

Description

This uses a generalised linear model to fit a quasi-poisson model with a time-varying rate as a natural cubic spline with approximately one degree of freedom per window units of the time series.

Usage

poisson_glm_model(d = i_incidence_input, ..., window = 14, frequency = "1 day")

Arguments

d

Count model input

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

...

not used and present to allow the model to be used in a group_modify

window

a number of data points defining the bandwidth of the estimate; smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided at a daily frequency; with weekly data a lower value may be preferred. - (defaults to 14)

frequency

the density of the output estimates as a time period such as '7 days' or '2 weeks'. - (defaults to "1 day")

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

Examples

ggoutbreak::england_covid %>%
 dplyr::filter(date < "2021-01-01") %>%
 time_aggregate(count=sum(count)) %>%
 ggoutbreak::poisson_glm_model(window=21) %>%
 dplyr::glimpse()
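
The description above amounts to a quasi-poisson GLM on a natural cubic spline basis of time. A minimal sketch of that idea (illustrative only, not the package's actual internals; fit_spline_poisson is a hypothetical helper):

```r
# Hypothetical: quasi-poisson GLM with roughly one spline degree of
# freedom per `window` time units, mirroring the description above.
fit_spline_poisson = function(d, window = 14) {
  df = max(1, floor(diff(range(as.numeric(d$time))) / window))
  stats::glm(
    count ~ splines::ns(as.numeric(time), df = df),
    family = stats::quasipoisson(link = "log"),
    data = d
  )
}
```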

Poisson time-series model.

Description

Takes a list of times and counts and fits a quasi-poisson model with a log link function to the count data, using local regression from the locfit package.

Usage

poisson_locfit_model(
  d = i_incidence_input,
  ...,
  window = 14,
  deg = 1,
  frequency = "1 day",
  predict = TRUE
)

Arguments

d

input data

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

...

not used and present to allow the model to be used in a group_modify

window

a number of data points defining the bandwidth of the estimate: smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided on a daily frequency; with weekly data a lower value may be preferred. - (defaults to 14)

deg

polynomial degree (min 1) - a higher degree results in less smoothing, a lower degree in more. A degree of 1 fits a linear model piecewise. - (defaults to 1)

frequency

the density of the output estimates as a time period such as '7 days' or '2 weeks'. - (defaults to "1 day")

predict

if TRUE the result is a prediction dataframe; if FALSE the locfit models are returned (advanced use). - (defaults to TRUE)

Details

This results in an incidence rate estimate plus an absolute exponential growth rate estimate, both based on the time unit of the input data (e.g. for daily data the rate will be cases per day and the growth rate will be daily).

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

  • growth.fit (double) - an estimate of the growth rate

  • growth.se.fit (positive_double) - the standard error of the growth rate

  • growth.0.025 (double) - lower confidence limit of the growth rate

  • growth.0.5 (double) - median estimate of the growth rate

  • growth.0.975 (double) - upper confidence limit of the growth rate

Any grouping allowed.

Examples

withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{
  ggoutbreak::england_covid %>%
    ggoutbreak::poisson_locfit_model(window=21) %>%
    dplyr::glimpse()
})

Binomial time-series model.

Description

This uses a generalised linear model to fit a quasi-binomial model with a time-varying rate, modelled as a natural cubic spline with approximately one degree of freedom per window units of the time series.

Usage

proportion_glm_model(
  d = i_proportion_input,
  ...,
  window = 14,
  frequency = "1 day"
)

Arguments

d

Proportion model input

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

...

not used; present to allow the proportion model to be used in a group_modify

window

a number of data points defining the bandwidth of the estimate: smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided on a daily frequency; with weekly data a lower value may be preferred. - (defaults to 14)

frequency

the density of the output estimates as a time period such as '7 days' or '2 weeks'. - (defaults to "1 day")

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of the proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

Any grouping allowed.

Examples

ggoutbreak::england_covid_proportion %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate() %>%
  ggoutbreak::proportion_glm_model(window=5) %>%
  dplyr::glimpse()


A binomial proportion estimate and associated exponential growth rate

Description

Takes a list of times, counts and a denominator and fits a quasi-binomial model with a logit link function to the proportion data, using local regression from the locfit package.

Usage

proportion_locfit_model(
  d = i_proportion_input,
  ...,
  window = 14,
  deg = 1,
  frequency = "1 day",
  predict = TRUE
)

Arguments

d

the input

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

...

not used; present to allow the proportion model to be used in a group_modify

window

a number of data points defining the bandwidth of the estimate: smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided on a daily frequency; with weekly data a lower value may be preferred. - (defaults to 14)

deg

polynomial degree (min 1) - a higher degree results in less smoothing, a lower degree in more. A degree of 1 fits a linear model piecewise. - (defaults to 1)

frequency

the density of the output estimates as a time period such as '7 days' or '2 weeks'. - (defaults to "1 day")

predict

if TRUE the result is a prediction dataframe; if FALSE the locfit models are returned (advanced use). - (defaults to TRUE)

Details

This expects d to contain one combination of:

  • time and count and denom columns - e.g. all tests conducted.

This results in a one-versus-others binomial proportion estimate, plus a relative growth rate estimate specifying how much more quickly this subgroup is growing compared to the growth of the denominator.

The denominator may be the sum of all subgroups, denom = sum(count), e.g. in the situation where there are multiple variants of a disease circulating; in that case the relative growth is that of the subgroup compared to the overall total. You can make this a one-versus-others comparison by making the denominator exclude the current item (e.g. denom = sum(count)-count).

The denominator can also be used to express the size of the population tested. This gives a relative growth rate that is different in nature from the previous one, and may be a better estimate of the true growth rate in situations where testing effort is variable or capacity is saturated.
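The two denominator choices described above can be sketched with dplyr. This is an illustrative sketch, not package code: the example data frame and its class column are hypothetical, and only the count/time column names follow this package's input specification.

```r
library(dplyr)
library(tibble)

# Hypothetical counts for two circulating variants per time step
df <- tibble(
  time  = rep(1:3, each = 2),
  class = rep(c("variant_A", "variant_B"), times = 3),
  count = c(5, 15, 8, 12, 13, 7)
)

res <- df %>%
  group_by(time) %>%
  mutate(
    denom_overall = sum(count),         # subgroup vs overall comparison
    denom_others  = sum(count) - count  # one-versus-others comparison
  ) %>%
  ungroup()

res
```

Either denom_overall or denom_others would then be supplied as the denom column expected by proportion_locfit_model.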

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of the proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

  • relative.growth.fit (double) - an estimate of the relative growth rate

  • relative.growth.se.fit (positive_double) - the standard error of the relative growth rate

  • relative.growth.0.025 (double) - lower confidence limit of the relative growth rate

  • relative.growth.0.5 (double) - median estimate of the relative growth rate

  • relative.growth.0.975 (double) - upper confidence limit of the relative growth rate

Any grouping allowed.

Examples

ggoutbreak::england_covid_proportion %>%
  time_aggregate() %>%
  proportion_locfit_model(window=5, deg=2) %>%
  dplyr::glimpse()

Wedge distribution

Description

The wedge distribution has a domain of 0 to 1 and has a linear probability density function over that domain.

Usage

pwedge(q, a, log.p = FALSE)

Arguments

q

vector of quantiles

a

a gradient from -2 (left skewed) to 2 (right skewed)

log.p

logical; if TRUE, probabilities p are given as log(p).

Details

rwedge can be combined with quantile functions to skew standard distributions, introduce correlation, or down-weight certain parts of the distribution.

Value

a vector of probabilities, quantiles, densities or samples.

Examples

pwedge(seq(0,1,0.1), a=1)
dwedge(seq(0,1,0.1), a=1)
qwedge(c(0.25,0.5,0.75), a=-1)

stats::cor(
  stats::qnorm(rwedge(1000, a=2)),
  stats::qnorm(rwedge(1000, a=-2))
)

The Beta Distribution

Description

Density, distribution function, quantile function and random generation for the Beta distribution with parameters shape1 and shape2 (and optional non-centrality parameter ncp).

Usage

qbeta2(p, prob, kappa, lower.tail = TRUE, log.p = FALSE)

Arguments

p

vector of probabilities

prob

the mean probability (vectorised)

kappa

a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

Value

dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

See Also

stats::qbeta()

Examples

qbeta2(c(0.25,0.5,0.75), 0.5, 0.25)

The Gamma Distribution

Description

Density, distribution function, quantile function and random generation for the Gamma distribution with parameters shape and scale.

Usage

qgamma2(
  p,
  mean,
  sd = sqrt(mean),
  lower.tail = TRUE,
  log.p = FALSE,
  convex = TRUE
)

Arguments

p

vector of probabilities

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

convex

Show a warning if the distribution selected is not a convex function

Value

dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
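The mean/sd arguments of qgamma2 replace the usual shape and scale. A minimal base-R sketch of the standard moment-matching relationship between the two parameterisations (an assumption about the internals, shown for illustration only; the helper name is hypothetical):

```r
# Moment matching for the gamma distribution:
# mean = shape * scale and sd^2 = shape * scale^2, hence:
q_gamma_mean_sd <- function(p, mean, sd = sqrt(mean)) {
  shape <- mean^2 / sd^2   # dimensionless shape parameter
  scale <- sd^2 / mean     # scale carries the units of the mean
  stats::qgamma(p, shape = shape, scale = scale)
}

q_gamma_mean_sd(c(0.25, 0.5, 0.75), mean = 2, sd = 1)
```

Note that with the default sd = sqrt(mean), the shape equals the mean and the scale is 1.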

See Also

stats::qgamma()

Examples

qgamma2(c(0.25,0.5,0.75), 2, 1)

The Log Normal Distribution

Description

Density, distribution function, quantile function and random generation for the log normal distribution whose logarithm has mean equal to meanlog and standard deviation equal to sdlog.

Usage

qlnorm2(p, mean = 1, sd = sqrt(exp(1) - 1), lower.tail = TRUE, log.p = FALSE)

Arguments

p

vector of probabilities.

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

Details

The log normal distribution has density

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-(\log(x) - \mu)^2 / (2\sigma^2)}

where \mu and \sigma are the mean and standard deviation of the logarithm. The mean is E(X) = exp(\mu + \sigma^2/2), the median is med(X) = exp(\mu), and the variance is Var(X) = exp(2\mu + \sigma^2)(exp(\sigma^2) - 1); hence the coefficient of variation is \sqrt{exp(\sigma^2) - 1}, which is approximately \sigma when that is small (e.g. \sigma < 1/2).
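The mean/sd arguments of qlnorm2 are on the natural scale. Inverting the moment formulas above gives the standard conversion to meanlog/sdlog; this is a base-R sketch under the assumption that qlnorm2 uses this moment matching (the helper name is hypothetical):

```r
# Convert a natural-scale mean and sd to meanlog/sdlog:
# sd^2/mean^2 = exp(sdlog^2) - 1  and  mean = exp(meanlog + sdlog^2/2)
q_lnorm_mean_sd <- function(p, mean = 1, sd = sqrt(exp(1) - 1)) {
  sdlog   <- sqrt(log(1 + (sd / mean)^2))
  meanlog <- log(mean) - sdlog^2 / 2
  stats::qlnorm(p, meanlog = meanlog, sdlog = sdlog)
}

q_lnorm_mean_sd(c(0.25, 0.5, 0.75), mean = 2, sd = 1)
```

With the defaults (mean = 1, sd = sqrt(exp(1) - 1)) this reduces to sdlog = 1 and meanlog = -1/2.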

Value

dlnorm gives the density, plnorm gives the distribution function, qlnorm gives the quantile function, and rlnorm generates random deviates.

The length of the result is determined by n for rlnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

Note

The cumulative hazard H(t) = -\log(1 - F(t)) is -plnorm(t, r, lower = FALSE, log = TRUE).

Source

dlnorm is calculated from the definition (in ‘Details’). [pqr]lnorm are based on the relationship to the normal.

Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

Distributions for other standard distributions, including dnorm for the normal distribution.

Examples

qlnorm2(c(0.25,0.5,0.72), 2, 1)

The Negative Binomial Distribution

Description

Density, distribution function, quantile function and random generation for the negative binomial distribution with parameters size and prob.

Usage

qnbinom2(p, mean, sd = sqrt(mean), lower.tail = TRUE, log.p = FALSE)

Arguments

p

vector of probabilities.

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

Details

The negative binomial distribution with size = n and prob = p has density

p(x) = \frac{\Gamma(x+n)}{\Gamma(n)\, x!} p^n (1-p)^x

for x = 0, 1, 2, \ldots, n > 0 and 0 < p \le 1.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is \mu = n(1-p)/p and the variance is n(1-p)/p^2.

A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)

An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.
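The mean/sd arguments of qnbinom2 can be related to the mu/size parameterisation above by moment matching: from var = mu + mu^2/size, size = mean^2/(sd^2 - mean), which requires sd^2 > mean (overdispersion). A base-R sketch under this assumption (the helper name is hypothetical):

```r
# Mean/sd to mu/size via moment matching; requires sd^2 > mean.
# As sd^2 approaches mean, size grows without bound (the Poisson limit).
q_nbinom_mean_sd <- function(p, mean, sd) {
  size <- mean^2 / (sd^2 - mean)  # from var = mu + mu^2/size
  stats::qnbinom(p, mu = mean, size = size)
}

q_nbinom_mean_sd(c(0.25, 0.5, 0.75), mean = 5, sd = 3)
```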

If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.

The quantile is defined as the smallest value x such that F(x) \ge p, where F is the distribution function.

Value

dnbinom gives the density, pnbinom gives the distribution function, qnbinom gives the quantile function, and rnbinom generates random deviates.

Invalid size or prob will result in return value NaN, with a warning.

The length of the result is determined by n for rnbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

rnbinom returns a vector of type integer unless generated values exceed the maximum representable integer when double values are returned.

Source

dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

pnbinom uses pbeta.

qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search.

rnbinom uses the derivation as a gamma mixture of Poisson distributions, see

Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

See Also

Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.

Examples

qnbinom2(c(0.25,0.5,0.75), 5, sqrt(5))

Identify estimate lags in a model

Description

A specific parameter or set of parameters can be estimated by a pipeline. This function applies the pipeline to a synthetic epidemic with sawtooth incidence resulting from a stepped growth rate function. The lag between synthetic input and estimate is assessed by minimising the root mean square error between input and estimate over a range of lag offsets.

Usage

quantify_lag(pipeline, ip = i_empirical_ip, lags = -10:30)

Arguments

pipeline

a function taking an input dataset and an infectivity profile as inputs and producing an estimate as output. This is the whole parametrised pipeline including any other inputs. This can be a purrr style function.

ip

the infectivity profile.

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly).

A default value is defined.

lags

a vector with the delays to test. Defaults to -10 to +30 days

Value

a lag analysis dataframe containing the estimate type and the lag in days that the estimate is behind the actual observation

Examples

pipeline = ~ .x %>% poisson_locfit_model() %>% rt_from_incidence(ip = .y)
lag_analysis = quantify_lag(pipeline)

quantify_lag(~ rt_epiestim(.x,ip = .y))

Extract Parts of a POSIXt or Date Object

Description

Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.

Usage

## S3 method for class 'time_period'
quarters(x, ...)

Arguments

x

an object inheriting from class "POSIXt" or "Date".

...

arguments for other methods.

Value

weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").

quarters returns a character vector of "Q1" to "Q4".

julian returns the number of days (possibly fractional) since the origin, with the origin as an "origin" attribute. All time calculations in R are done ignoring leap-seconds.

Note

Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.

See Also

DateTimeClasses, Date; Sys.getlocale("LC_TIME") crucially for months() and weekdays().

Examples

## first two are locale dependent:
weekdays(.leap.seconds)
months  (.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6,  etc !!)

##' Transform (Time-)Date vector  to  convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
    DF <- as.data.frame(unclass(as.POSIXlt( dt )))
    `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds)    # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler:  Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date


## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE
## in the proleptic Julian calendar.  More precisely, it is counted in
## 'Terrestrial Time' which differs from UTC by a few seconds
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time

Wedge distribution

Description

The wedge distribution has a domain of 0 to 1 and has a linear probability density function over that domain.

Usage

qwedge(p, a, lower.tail = TRUE, log.p = FALSE)

Arguments

p

vector of probabilities

a

a gradient from -2 (left skewed) to 2 (right skewed)

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

log.p

logical; if TRUE, probabilities p are given as log(p).

Details

rwedge can be combined with quantile functions to skew standard distributions, introduce correlation, or down-weight certain parts of the distribution.

Value

a vector of probabilities, quantiles, densities or samples.

Examples

pwedge(seq(0,1,0.1), a=1)
dwedge(seq(0,1,0.1), a=1)
qwedge(c(0.25,0.5,0.75), a=-1)

stats::cor(
  stats::qnorm(rwedge(1000, a=2)),
  stats::qnorm(rwedge(1000, a=-2))
)

A random Bernoulli sample as a logical value

Description

A random Bernoulli sample as a logical value

Usage

rbern(n, prob)

Arguments

n

number of observations

prob

the mean probability (vectorised)

Value

a vector of logical values of size n

Examples

table(rbern(100, 0.25))

The Beta Distribution

Description

Density, distribution function, quantile function and random generation for the Beta distribution with parameters shape1 and shape2 (and optional non-centrality parameter ncp).

Usage

rbeta2(n, prob, kappa)

Arguments

n

number of observations

prob

the mean probability (vectorised)

kappa

a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised)

Value

dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

See Also

stats::rbeta()

Examples

rbeta2(3, c(0.1,0.5,0.9),0.1)

Sampling from the multinomial equivalent of the Bernoulli distribution

Description

Sampling from the multinomial equivalent of the Bernoulli distribution

Usage

rcategorical(n, prob, factor = FALSE)

Arguments

n

a sample size

prob

a (optionally named) vector of probabilities that will be normalised to sum to 1

factor

if not FALSE the result is a factor; levels are taken from the names of prob, or, if factor is itself a character vector, from factor.

Value

a vector of random class labels of length n. Labels come from names of prob or from a character vector in factor.

Examples

prob = c("one"=0.1,"two"=0.2,"seven"=0.7)
table(rcategorical(1000,prob))
rcategorical(10,prob,factor=TRUE)
rcategorical(10,rep(1,26),factor=letters)

Random count data from a discrete gamma distribution

Description

For count data that is under-dispersed we can use a gamma distribution rounded to the nearest whole number. This is the same as the method of discretisation in make_gamma_ip, and suits delay distributions where there is less variability than can be represented with a Poisson or negative binomial distribution.

Usage

rdiscgamma(n, mean, sd, kappa)

Arguments

n

number of observations

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

kappa

a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised)

Value

an integer valued vector from a gamma distribution.

See Also

stats::rgamma()

Examples

rdiscgamma(10, 2, 1)

Reband any discrete distribution

Description

Rebands discrete data such as an age-banded population, or a discrete probability distribution such as a serial interval distribution. This method fits a monotonically increasing spline to the cumulative distribution (including the upper and lower limits) and interpolates using that spline to the new cut points.

Usage

reband_discrete(
  x,
  y,
  xout,
  xlim = c(0, NA),
  ytotal = c(0, sum(y)),
  digits = 0,
  labelling = c("positive_integer", "inclusive", "exclusive"),
  sep = "-"
)

Arguments

x

a set of upper limits of bands, e.g. for age: 0-14;15-64;65-79;80+ is 15,65,80,NA

y

a set of quantities for each band e.g. population figures

xout

a set of new upper limits

xlim

Upper and lower limits for x. If the last band is e.g. 80+ in the input and we want to know the 85+ band in the output, some kind of maximum upper limit is needed to interpolate to.

ytotal

upper and lower limits for y. If the interpolation values fall outside of x then the minimum and maximum limits of y are given by this. This would be c(0,1) for a probability distribution, for example.

digits

if the xout value is continuous then how many significant figures to put in the labels

labelling

are the xout values interpretable as an inclusive upper limit, an exclusive upper limit, or as the upper limit of a 'positive_integer' quantity

sep

separator for names e.g. 18-24 or '18 to 24'

Value

a re-banded set of discrete values, guaranteed to sum to the same as y

Examples

ul = stringr::str_extract(england_demographics$class, "_([0-9]+)",group = 1) %>%
  as.numeric()

tmp = reband_discrete(
  ul, england_demographics$population,
  c(5,10,15,40,80), xlim=c(0,120))

tmp

sum(tmp)
sum(england_demographics$population)

Re-parametrised distributions

Description

Re-parametrised distributions

Arguments

n

number of observations

x

vector of quantiles

q

vector of quantiles

p

vector of probabilities

log

logical; if TRUE, probabilities p are given as log(p).

log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

convex

Show a warning if the distribution selected is not a convex function

prob

the mean probability (vectorised)

kappa

a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised)

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)


Rescale a timeseries in the temporal dimension

Description

Sometimes we may have, for example, modelled incidence or growth rates on weekly data, resulting in cases per week and a growth rate per week. We may wish to use this to estimate the reproduction number using algorithms that assume a daily incidence. Not everything has a dependence on time: things such as proportions or prevalence will not change.

Usage

rescale_model(df = i_timeseries, time_unit)

Arguments

df

A data frame containing modelled output. This will modify the following columns if present:

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • growth.fit (double) - an estimate of the growth rate

  • growth.se.fit (positive_double) - the standard error of the growth rate

  • growth.0.025 (double) - lower confidence limit of the growth rate

  • growth.0.5 (double) - median estimate of the growth rate

  • growth.0.975 (double) - upper confidence limit of the growth rate

Any grouping allowed.

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • proportion.fit (double) - an estimate of the proportion on a logit scale

  • proportion.se.fit (positive_double) - the standard error of the proportion estimate on a logit scale

  • proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)

  • proportion.0.5 (proportion) - median estimate of proportion (true scale)

  • proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)

  • relative.growth.fit (double) - an estimate of the relative growth rate

  • relative.growth.se.fit (positive_double) - the standard error of the relative growth rate

  • relative.growth.0.025 (double) - lower confidence limit of the relative growth rate

  • relative.growth.0.5 (double) - median estimate of the relative growth rate

  • relative.growth.0.975 (double) - upper confidence limit of the relative growth rate

Any grouping allowed.

time_unit

a lubridate period string such as "1 day"

Value

the same time series with different time unit, and adjusted incidence and growth rate figures.

Examples

sim = sim_poisson_model(time_unit = "1 week")
incidence = sim %>% poisson_locfit_model(frequency = "1 day", deg = 2, window=5)
incidence2 = incidence %>% rescale_model(time_unit = "1 day")
incidence2 %>% dplyr::glimpse()

The Gamma Distribution

Description

Density, distribution function, quantile function and random generation for the Gamma distribution with parameters shape and scale.

Usage

rgamma2(n, mean, sd = sqrt(mean), convex = TRUE)

Arguments

n

number of observations

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

convex

Show a warning if the distribution selected is not a convex function

Value

dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

See Also

stats::rgamma()

Examples

rgamma2(10, 2, 1)

The Log Normal Distribution

Description

Density, distribution function, quantile function and random generation for the log normal distribution whose logarithm has mean equal to meanlog and standard deviation equal to sdlog.

Usage

rlnorm2(n, mean = 1, sd = sqrt(exp(1) - 1))

Arguments

n

number of observations. If length(n) > 1, the length is taken to be the number required.

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

Details

The log normal distribution has density

f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} e^{-(\log(x) - \mu)^2 / (2\sigma^2)}

where \mu and \sigma are the mean and standard deviation of the logarithm. The mean is E(X) = exp(\mu + \sigma^2/2), the median is med(X) = exp(\mu), and the variance is Var(X) = exp(2\mu + \sigma^2)(exp(\sigma^2) - 1); hence the coefficient of variation is \sqrt{exp(\sigma^2) - 1}, which is approximately \sigma when that is small (e.g. \sigma < 1/2).

Value

rlnorm2 generates random deviates.

The length of the result is determined by n.

The numerical arguments other than n are recycled to the length of the result.

Note

The cumulative hazard H(t) = -log(1 - F(t)) is -plnorm(t, r, lower = FALSE, log = TRUE).

Source

dlnorm is calculated from the definition (in ‘Details’). [pqr]lnorm are based on the relationship to the normal.

Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

See Also

Distributions for other standard distributions, including dnorm for the normal distribution.

Examples

rlnorm2(10, 2, 1)
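The moment formulas in 'Details' give the conversion from true-scale mean and sd to meanlog and sdlog, which a sampler of this kind can use. A sketch (rlnorm2_sketch is an invented name, not the package source):

```r
# Sketch: convert true-scale mean/sd to meanlog/sdlog, then sample.
# Inverting E(X) = exp(meanlog + sdlog^2/2) and
# Var(X) = E(X)^2 * (exp(sdlog^2) - 1) gives:
#   sdlog^2 = log(1 + (sd/mean)^2)
#   meanlog = log(mean) - sdlog^2 / 2
rlnorm2_sketch <- function(n, mean = 1, sd = sqrt(exp(1) - 1)) {
  sdlog <- sqrt(log(1 + (sd / mean)^2))
  meanlog <- log(mean) - sdlog^2 / 2
  stats::rlnorm(n, meanlog = meanlog, sdlog = sdlog)
}
```

Note that the default sd = sqrt(exp(1) - 1) with mean = 1 recovers the standard log normal with sdlog = 1.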

The Negative Binomial Distribution

Description

Random generation for the negative binomial distribution, re-parametrised in terms of its mean and standard deviation rather than size and prob.

Usage

rnbinom2(n, mean, sd = sqrt(mean))

Arguments

n

number of observations. If length(n) > 1, the length is taken to be the number required.

mean

the mean value on the true scale (vectorised)

sd

the standard deviation on the true scale (vectorised)

Details

The negative binomial distribution with size = n and prob = p has density

p(x) = Gamma(x+n) / (Gamma(n) x!) p^n (1-p)^x

for x = 0, 1, 2, ..., n > 0 and 0 < p <= 1.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is mu = n(1-p)/p and the variance is n(1-p)/p^2.

A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)

An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.

If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.

The quantile is defined as the smallest value x such that F(x) >= p, where F is the distribution function.

Value

rnbinom2 generates random deviates.

Invalid mean or sd arguments will result in return value NaN, with a warning.

The length of the result is determined by n.

The numerical arguments other than n are recycled to the length of the result.

rnbinom2 returns a vector of type integer unless generated values exceed the maximum representable integer, when double values are returned.

Source

dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom).

pnbinom uses pbeta.

qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search.

rnbinom uses the derivation as a gamma mixture of Poisson distributions, see

Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

See Also

Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.

Examples

rnbinom2(10, 5, sqrt(5))
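The mean/sd parametrisation maps onto the mu/size parametrisation described in 'Details'. A sketch of the conversion (rnbinom2_sketch is an invented name, not the package source; the Poisson fallback for var <= mean is an assumption):

```r
# Sketch: convert mean/sd to the mu/size parametrisation of stats::rnbinom.
# From var = mu + mu^2/size:  size = mean^2 / (var - mean), requiring
# var > mean (over-dispersion relative to the Poisson).
rnbinom2_sketch <- function(n, mean, sd = sqrt(mean)) {
  v <- sd^2
  if (v <= mean) return(stats::rpois(n, mean))  # Poisson limit when var <= mean
  size <- mean^2 / (v - mean)
  stats::rnbinom(n, mu = mean, size = size)
}
```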

Reproduction number estimate using the Cori method

Description

Calculate a reproduction number estimate from incidence data using a reimplementation of the Cori method and an empirical generation time distribution. A mixture distribution is used to transmit uncertainty in the generation time estimates. Several changes are made relative to the original implementation. Firstly, there is no technical requirement for the infectivity profile to be strictly positive in time. Secondly, this implementation tolerates missing count values (NA values must be filtered out, though). Thirdly, for a given time point t it applies all Rt estimates whose window spans time point t, rather than only those that end on time point t. Fourthly, multiple window widths can be calculated in parallel and aggregated. All of this tends to increase uncertainty in the result, particularly in the time dimension, which addresses some of the issues seen with EpiEstim during the pandemic. Finally, it is quite a bit quicker, especially if approximate quantiles are all that are needed.

Usage

rt_cori(
  df = i_incidence_input,
  ip = i_discrete_ip,
  window = 14,
  mean_prior = 1,
  std_prior = 2,
  ...,
  epiestim_compat = FALSE,
  approx = FALSE
)

Arguments

df

The count data. Extra groups are allowed.

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

ip

A long format infectivity profile.

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • tau (integer + complete) - the days since the index event.

Must be grouped by: boot (exactly).

A default value is defined.

window

the widths of the Cori method window to include in the estimate. This can be a vector of values and all windows will be calculated and aggregated.

mean_prior

the prior for the $R_t$ estimate. When the sample size is low the $R_t$ estimate will revert to this prior. In EpiEstim the default is a high number to allow detection of insufficient data, but this tends to create anomalies in the early part of infection time series. A possible alternative is $R_0$, but this will also be a poor choice for $R_t$ when case numbers drop to a low value.

std_prior

the prior for the $R_t$ SD.

...

not used

epiestim_compat

produce an estimate of Rt using only windows that end on the time t rather than all windows that span time t. If this option is selected there can also only be one value for window.

approx

approximate the quantiles of the mixture distribution with a gamma distribution with the same first two moments (mean and SD).

Details

This will calculate a reproduction number for each group in the input dataframe.

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • rt.fit (double) - an estimate of the reproduction number

  • rt.se.fit (positive_double) - the standard error of the reproduction number

  • rt.0.025 (double) - lower confidence limit of the reproduction number

  • rt.0.5 (double) - median estimate of the reproduction number

  • rt.0.975 (double) - upper confidence limit of the reproduction number

Any grouping allowed.

Examples

#TODO: speed up example

tmp = ggoutbreak::england_covid  %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate(count=sum(count))

tmp2 = tmp %>% rt_cori(epiestim_compat = TRUE)
tmp3 = tmp %>% rt_cori(window=c(5:14), approx=TRUE)

comp = dplyr::bind_rows(
  tmp2 %>% dplyr::mutate(class = "EpiEstim"),
  tmp3 %>% dplyr::mutate(class = "Cori+")
) %>% dplyr::group_by(class)

if (interactive()) {
  plot_rt(comp)+
   ggplot2::geom_errorbar(
     data=england_consensus_rt %>% dplyr::filter(date < "2021-01-01"),
     mapping=ggplot2::aes(x=date,ymin=low,ymax=high),colour="black")+
   ggplot2::coord_cartesian(ylim=c(0.5,1.75))
}
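The core of the Cori-method update for a single group and a single window can be sketched as follows (illustrative only; the package implementation additionally handles multiple windows, mixture quantiles and uncertain infectivity profiles as described above, and cori_rt_sketch is an invented name):

```r
# Sketch of the Cori-method gamma posterior for the window ending at the
# last time point. I is a vector of daily counts, w a daily infectivity
# profile (probabilities for tau = 1, 2, ...).
cori_rt_sketch <- function(I, w, window = 14, mean_prior = 1, std_prior = 2) {
  a0 <- (mean_prior / std_prior)^2   # gamma prior shape
  b0 <- std_prior^2 / mean_prior     # gamma prior scale
  # total infectiousness Lambda[t] = sum_s w[s] * I[t - s]
  Lambda <- stats::filter(I, c(0, w), sides = 1)
  t_end <- length(I)
  idx <- (t_end - window + 1):t_end  # the single window ending at t_end
  shape <- a0 + sum(I[idx])
  rate  <- 1 / b0 + sum(Lambda[idx])
  c(mean  = shape / rate,
    lower = stats::qgamma(0.025, shape, rate),
    upper = stats::qgamma(0.975, shape, rate))
}
```

With constant incidence and an infectivity profile summing to 1 the posterior mean is close to 1, as expected for a flat epidemic.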

EpiEstim reproduction number

Description

Calculate a reproduction number estimate from incidence data using the EpiEstim library and an empirical generation time distribution. This uses resampling to transmit uncertainty in generation time estimates. This is quite slow for each time series depending on the number of bootstraps and samples in the infectivity profile.

Usage

rt_epiestim(
  df = i_incidence_input,
  ip = i_discrete_ip,
  bootstraps = 2000,
  window = 14,
  mean_prior = 1,
  std_prior = 2,
  ...
)

Arguments

df

Count data. Extra groups are allowed.

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Ungrouped.

ip

infectivity profile

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • tau (integer + complete) - the days since the index event.

Must be grouped by: boot (exactly).

A default value is defined.

bootstraps

the number of bootstrap samples to take for each point.

window

the width of the EpiEstim window

mean_prior

the prior for the $R_t$ estimate. When the sample size is low the $R_t$ estimate will revert to this prior. In EpiEstim the default is a high number to allow detection of insufficient data, but this tends to create anomalies in the early part of infection time series. A possible alternative is $R_0$, but this will also be a poor choice for $R_t$ when case numbers drop to a low value.

std_prior

the prior for the $R_t$ SD.

...

not used

Details

This will calculate a reproduction number for each group in the input dataframe.

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • rt.fit (double) - an estimate of the reproduction number

  • rt.se.fit (positive_double) - the standard error of the reproduction number

  • rt.0.025 (double) - lower confidence limit of the reproduction number

  • rt.0.5 (double) - median estimate of the reproduction number

  • rt.0.975 (double) - upper confidence limit of the reproduction number

Any grouping allowed.

Examples

tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))

if (interactive()) {
  # not run due to long running
  tmp2 = tmp %>% rt_epiestim()
}

Wallinga-Lipsitch reproduction number from growth rates

Description

Calculate a reproduction number estimate from growth rates using the Wallinga 2007 method and an empirical generation time distribution. This uses resampling to transmit uncertainty in the growth rate estimates. It also handles time series that are not on a daily cadence (although this is experimental). The resulting estimate is neither an instantaneous (backward looking) nor a case (forward looking) reproduction number, but somewhere between the two.

Usage

rt_from_growth_rate(
  df = i_growth_rate,
  ip = i_empirical_ip,
  bootstraps = 1000,
  seed = Sys.time()
)

Arguments

df

Growth rate estimates

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • growth.fit (double) - an estimate of the growth rate

  • growth.se.fit (positive_double) - the standard error of the growth rate

  • growth.0.025 (double) - lower confidence limit of the growth rate

  • growth.0.5 (double) - median estimate of the growth rate

  • growth.0.975 (double) - upper confidence limit of the growth rate

Any grouping allowed.

ip

Infectivity profile

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly).

A default value is defined.

bootstraps

the number of bootstrap samples to take for each point.

seed

a random number generator seed

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • rt.fit (double) - an estimate of the reproduction number

  • rt.se.fit (positive_double) - the standard error of the reproduction number

  • rt.0.025 (double) - lower confidence limit of the reproduction number

  • rt.0.5 (double) - median estimate of the reproduction number

  • rt.0.975 (double) - upper confidence limit of the reproduction number

Any grouping allowed.

Examples

tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))


if (interactive()) {
  # not run
  withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{
   tmp2 = tmp %>%
      poisson_locfit_model() %>%
      rt_from_growth_rate()
  })
}
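For an empirical generation interval histogram with bins [a0, a1) and probabilities y, the Wallinga & Lipsitch (2007) relationship between the growth rate r and R can be sketched as below (wl_rt_sketch is an invented name; the package method additionally resamples over the uncertainty in r and in the infectivity profile):

```r
# Sketch of the Wallinga-Lipsitch growth rate to reproduction number
# conversion: R = r / sum_i y_i (e^{-r a0_i} - e^{-r a1_i}) / (a1_i - a0_i),
# i.e. the reciprocal of the moment generating function of the generation
# interval evaluated at -r, for a piecewise-uniform distribution.
wl_rt_sketch <- function(r, y, a0, a1) {
  if (abs(r) < 1e-9) return(1)  # R -> 1 as the growth rate goes to 0
  r / sum(y * (exp(-r * a0) - exp(-r * a1)) / (a1 - a0))
}
```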

Reproduction number from modelled incidence

Description

Calculate a reproduction number estimate from modelled incidence using the methods described in the vignette "Estimating the reproduction number from modelled incidence" and a set of empirical generation time distributions. This assumes that the modelled incidence has the same time unit as the ip distribution, and that this is daily. If this is not the case then rescale_model() may be able to fix it.

Usage

rt_from_incidence(df = i_incidence_model, ip = i_discrete_ip, approx = FALSE)

Arguments

df

modelled incidence estimate

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

ip

an infectivity profile (aka generation time distribution)

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • tau (integer + complete) - the days since the index event.

Must be grouped by: boot (exactly).

A default value is defined.

approx

use a faster, but approximate, estimate of quantiles

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • rt.fit (double) - an estimate of the reproduction number

  • rt.se.fit (positive_double) - the standard error of the reproduction number

  • rt.0.025 (double) - lower confidence limit of the reproduction number

  • rt.0.5 (double) - median estimate of the reproduction number

  • rt.0.975 (double) - upper confidence limit of the reproduction number

Any grouping allowed.

Examples

df = ggoutbreak::england_covid %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate(count=sum(count)) %>%
  poisson_locfit_model()

if (interactive()) {
  # not run
  withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{
    tmp3 = df %>% rt_from_incidence(ganyani_ip)
  })
}

tmp = df %>% rt_from_incidence(du_serial_interval_ip)
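The underlying renewal-equation relationship, R_t = I_t / sum_s w_s I_{t-s}, can be sketched for a point estimate of incidence (renewal_rt_sketch is an invented name; the package method additionally propagates the uncertainty in the incidence fit as described in the vignette):

```r
# Sketch: point estimate of R_t from the renewal equation. I is a vector of
# daily incidence, w a daily infectivity profile (probabilities for tau >= 1).
renewal_rt_sketch <- function(I, w) {
  # total infectiousness Lambda[t] = sum_s w[s] * I[t - s]
  Lambda <- stats::filter(I, c(0, w), sides = 1)
  as.numeric(I / Lambda)  # NA during the run-in where Lambda is undefined
}
```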

Reproduction number from renewal equation applied to modelled incidence using statistical re-sampling

Description

Calculate a reproduction number estimate from modelled incidence estimates, by statistical sampling from a log-normally distributed incidence estimate, combined with an uncertain infectivity profile specified as multiple discrete empirical distributions.

Usage

rt_from_renewal(
  df = i_incidence_model,
  ip = i_discrete_ip,
  bootstraps = 1000,
  seed = Sys.time()
)

Arguments

df

modelled incidence estimate

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

  • incidence.fit (double) - an estimate of the incidence rate on a log scale

  • incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale

  • incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)

  • incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)

  • incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)

Any grouping allowed.

ip

infectivity profile

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • tau (integer + complete) - the days since the index event.

Must be grouped by: boot (exactly).

A default value is defined.

bootstraps

the number of samples to take at each time point. This will be rounded up to a whole multiple of the infectivity profile distribution length.

seed

a random number seed for reproducibility

Value

A dataframe containing the following columns:

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

  • rt.fit (double) - an estimate of the reproduction number

  • rt.se.fit (positive_double) - the standard error of the reproduction number

  • rt.0.025 (double) - lower confidence limit of the reproduction number

  • rt.0.5 (double) - median estimate of the reproduction number

  • rt.0.975 (double) - upper confidence limit of the reproduction number

Any grouping allowed.

Examples

df = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count)) %>%
    poisson_locfit_model()

if (interactive()) {
  # not run
  withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{
   tmp2 = df %>%
      rt_from_renewal(ganyani_ip)
  })

}
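The resampling scheme can be sketched as: draw incidence trajectories from the log-normal fit, push each one through the renewal equation, and summarise the samples into quantiles (rt_renewal_sample_sketch is an invented name; the package implementation also iterates over bootstrapped infectivity profiles):

```r
# Sketch of the re-sampling approach: log-normal incidence draws combined
# with a renewal-equation R_t estimate, summarised as quantiles per time.
rt_renewal_sample_sketch <- function(fit, se_fit, w, n_boot = 200) {
  Tn <- length(fit)
  samples <- sapply(seq_len(n_boot), function(i) {
    I <- stats::rlnorm(Tn, meanlog = fit, sdlog = se_fit)
    Lambda <- as.numeric(stats::filter(I, c(0, w), sides = 1))
    (I / Lambda)[-seq_along(w)]  # drop the run-in where Lambda is undefined
  })
  # rows: 2.5%, 50%, 97.5% quantiles; columns: time points after the run-in
  apply(samples, 1, stats::quantile, probs = c(0.025, 0.5, 0.975))
}
```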

Wedge distribution

Description

The wedge distribution has a domain of 0 to 1 and has a linear probability density function over that domain.

Usage

rwedge(n, a)

Arguments

n

number of observations

a

a gradient from -2 (left skewed) to 2 (right skewed)

Details

rwedge can be combined with quantile functions to skew standard distributions, introduce correlation, or down-weight certain parts of a distribution.

Value

a vector of probabilities, quantiles, densities or samples.

Examples

pwedge(seq(0,1,0.1), a=1)
dwedge(seq(0,1,0.1), a=1)
qwedge(c(0.25,0.5,0.75), a=-1)

stats::cor(
  stats::qnorm(rwedge(1000, a=2)),
  stats::qnorm(rwedge(1000, a=-2))
)
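Assuming the linear density f(x) = 1 + a (x - 1/2) on [0, 1] (an assumption about the parametrisation, consistent with a running from -2 to 2 so the density stays non-negative), the density and distribution functions can be sketched as (the _sketch names are invented, not the package source):

```r
# Sketch of the wedge density under the assumed linear form
# f(x) = 1 + a * (x - 1/2) on [0, 1].
dwedge_sketch <- function(x, a) ifelse(x < 0 | x > 1, 0, 1 + a * (x - 0.5))

# CDF obtained by integrating the density: F(q) = q + a * q * (q - 1) / 2
pwedge_sketch <- function(q, a) {
  q <- pmin(pmax(q, 0), 1)
  q + a * q * (q - 1) / 2
}
```

At a = 2 or a = -2 the density touches zero at one end of the domain, which matches the stated bounds on the gradient.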

A log1p y scale

Description

A log1p y scale

Usage

scale_y_log1p(..., n = 5, base = 10, dp = 0)

Arguments

...

Other arguments passed on to scale_(x|y)_continuous()

n

the number of major breaks

base

the base for the logarithm

dp

decimal points

Value

a ggplot scale


A logit y scale

Description

A logit y scale

Usage

scale_y_logit(...)

Arguments

...

Other arguments passed on to scale_(x|y)_continuous()

Value

a ggplot scale


Calculate scoring statistics from predictions using scoringutils.

Description

This performs a range of quantile based, and if cumulative distribution functions are available, continuous scoring metrics for each estimate time-point.

Usage

score_estimate(est, obs, lags = NULL)

Arguments

est

a dataframe of estimates of incidence, growth rate or reproduction number, based off a simulation or data with known parameters.

obs

a dataframe of the ground truth, sharing the same grouping as est, with at least one column named XXX.obs, where XXX is one of rt, growth or incidence, or any other column group predicted in est.

lags

a data frame of estimate types and lags, as output by quantify_lag(). If multiple models are included then the columns must match those in obs.

Value

a dataframe of scoring metrics

Examples

tmp2 = test_bpm %>% sim_summarise_linelist()

withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{
   est = tmp2 %>% poisson_locfit_model() %>% rt_from_incidence()
})

obs = tmp2 %>% dplyr::mutate(rt.obs = rt.weighted)
score_estimate(est,obs) %>% dplyr::glimpse()

Apply an ascertainment bias to the observed case counts.

Description

Apply an ascertainment bias to the observed case counts.

Usage

sim_apply_ascertainment(df = i_sim_count_data, fn_asc = ~1, seed = Sys.time())

Arguments

df

a count dataframe from e.g. sim_poisson_model() or sim_summarise_linelist()

A dataframe containing the following columns:

  • statistic (character) - An identifier for the statistic, whether that be infections, admissions, deaths

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Minimally grouped by: statistic (and other groupings allowed).

fn_asc

a function that takes a single input vector t and returns a probability of ascertainment, e.g. ~ stats::rbeta(.x, 20, 80), ~ rbeta2(.x, prob=<probability>, kappa=<dispersion>), or cfg_weekly_proportion_rng()

seed

a RNG seed

Value

a dataframe with original column, and count column modified to include ascertainment bias.

Examples

tibble::tibble(
  statistic = "incidence",
  time=as.time_period(1:10,"1 day"),
  count=rep(100,10)
) %>% dplyr::group_by(statistic) %>% sim_apply_ascertainment(~ ifelse(.x<=5,0.1,0.9))
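Under the hood this amounts to binomial thinning of the true counts with a time-varying probability, which can be sketched as (apply_ascertainment_sketch is an invented name, not the package source):

```r
# Sketch: observed counts are binomial samples of the true counts, with an
# ascertainment probability that may vary over time.
apply_ascertainment_sketch <- function(count, time, fn_asc) {
  stats::rbinom(length(count), size = count, prob = fn_asc(time))
}
```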

Apply delay distribution to count or linelist data

Description

Events include symptom onset, admission, death, test sampling, test processing

Usage

sim_apply_delay(
  df,
  ...,
  fn_p_symptomatic = ~0.5,
  fn_p_admitted = ~0.1,
  fn_p_died = ~0.05,
  fn_p_tested = ~0.8,
  seed = Sys.time()
)

Arguments

df

a line list dataframe arising from e.g. sim_branching_process()

...

Named arguments passed on to sim_apply_delay.linelist

fn_symptom_delay,fn_admission_delay,fn_death_delay

a function that calculates the time to event onset from infection. This will be called with a vector of infection times as the first parameter (time), but all other columns of df are also available, as well as the symptomatic, died, and admitted flags. The function must be vectorised on its inputs (and consume additional inputs with ...). A purrr style lambda is OK, e.g. ~ stats::rgamma(.x, shape = 3), and the first parameter will be infection time. If you have a discrete probability profile for this then you can use cfg_ip_sampler_rng(ip_symptoms).

fn_sample_delay

This function returns the time from either symptom onset (if symptomatic) or from infection (if asymptomatic) until a sample is taken. (N.B. it might be better to model this as a screening test probability plus a screening test frequency, rather than overloading this function.)

fn_result_delay

Identical to other functions except the first parameter will be sample_time rather than time of infection. This is the time from sampling to the result being available.

Named arguments passed on to sim_apply_delay.count_data

fn_symptom_profile,fn_admission_profile,fn_death_profile

a function that takes time and returns the probability density of symptoms, admissions, or deaths over time since infection (i.e. tau) as an ip delay distribution. If possible it is a very good idea to pre-compute these distributions as they need to be assigned to every line in the input and this can be very slow.

fn_sample_profile

a function that takes time and returns the probability density of test sample being taken over time since symptoms.

fn_result_profile

a function that takes time and returns the probability density of test result being available over time since test sampling.

fn_p_symptomatic, fn_p_admitted, fn_p_died, fn_p_tested

Function that returns a probability between 0 and 1 for each row of the input dataframe. A purrr style lambda is OK (e.g. ~ 1 for always true); the first parameter will be time of infection. The function must be vectorised on its inputs (and consume additional inputs with ...)

seed

RNG seed for reproducibility

Value

Depends on input, either:

  • a wide format line list with additional XX, XX_time and XX_delay columns, for each of the set of statistics generated.

  • a long format set of counts of different statistics, i.e. infections, symptoms, admission, death, sample (tests taken), results (test results).

Examples

tmp = sim_branching_process(
  changes = tibble::tibble(t = c(0,20,40,60,80,110), R = c(1.8,1.5,0.9,1.5,0.8,1.2)),
  max_time = 120,
  seed = 100
)

tmp2 = tmp %>% sim_apply_delay()
tmp2 %>% dplyr::glimpse()

Generate a line list from a branching process model parametrised by reproduction number

Description

Generate a line list from a branching process model parametrised by reproduction number

Usage

sim_branching_process(
  changes = tibble::tibble(t = c(0, 40), rt = c(2.5, 0.8)),
  max_time = 80,
  seed = Sys.time(),
  fn_Rt = cfg_step_fn(changes),
  fn_ip = ~test_ip,
  fn_kappa = ~1,
  imports_df = NULL,
  fn_imports = ~ifelse(.x == 0, 30, 0),
  fn_list_next_gen = list(),
  ...
)

Arguments

changes

a dataframe containing a t time column and R reproduction number parameter. This parameter is optional if fn_Rt is specified

max_time

maximum duration of simulation

seed

random seed

fn_Rt

can be specified instead of changes df. This is a vectorised function that accepts a time parameter and returns a reproduction number. If both this and changes are specified this takes preference.

fn_ip

a function that takes input vector t (and/or class) and returns an infectivity profile at times t.

fn_kappa

a vectorised function taking t and other imported case metadata, returning a dispersion parameter controlling the likelihood of individual super-spreading. This must be between 1 and Inf, with 1 being standard Poisson dispersion and larger values representing over-dispersion.

imports_df

a data frame containing minimally time and count columns, plus any metadata about the imports in additional columns. Metadata columns can inform the fn_Rt, fn_kappa and fn_ip functions as additional parameters.

fn_imports

a time varying function that defines the number of infected importations. If imports_df is defined then it is used instead.

fn_list_next_gen

a named list of functions. The name corresponds to metadata columns in the simulation; the function is a purrr style mapping that will replace the old value in the named column with a new one. Such a function can be generated with cfg_transition_fn() when a transition probability matrix is involved, or it can be specified directly as a case_when style function. The function must be vectorised and assume no grouping structure. If the function has named parameters it can reference any of the metadata columns, or time (as t). The rcategorical() function may be useful in this scenario.

...

not used

Value

a line list of cases, with individual ids, infection times, and infector ids, for a simulated outbreak

Examples

tmp = sim_branching_process(
  changes = tibble::tibble(t = c(0,40), R = c(1.5,0.8)),
  max_time = 120,
  seed = 100,
  fn_imports = ~ ifelse(.x<10,1,0)
)

if(interactive()) {
  plot_cases(tmp, mapping=ggplot2::aes(fill=as.factor(generation)),linewidth=0.1)
}

# imports can also be specified as a dataframe, which allows additional
# metadata in the line list. An example of which is as follows:
imports_df = tibble::tribble(
  ~time, ~variant, ~count,
  0:4, "wild-type", 100,
  10:14, "alpha", 5,
)
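The generative core of such a model can be sketched as a loop over generations, where each case produces a Poisson(R(t)) number of secondary cases at gamma-distributed generation intervals (an illustrative sketch with invented names and an assumed gamma generation interval; the package implementation additionally supports dispersion, import metadata and next-generation functions):

```r
# Minimal branching-process sketch parametrised by a time-varying R(t).
sim_bp_sketch <- function(fn_Rt, max_time = 80, n_imports = 30, seed = 1) {
  set.seed(seed)
  times <- rep(0, n_imports)  # infection times of the current generation
  out <- times
  while (length(times) > 0) {
    # number of secondary cases per infector at reproduction number R(t)
    n_sec <- stats::rpois(length(times), fn_Rt(times))
    # gamma-distributed generation intervals (assumed mean of 5 days)
    new_t <- rep(times, n_sec) + stats::rgamma(sum(n_sec), shape = 5, rate = 1)
    times <- new_t[new_t <= max_time]
    out <- c(out, times)
  }
  out  # vector of infection times for the simulated outbreak
}
```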

Apply a time varying probability and convolution to count data

Description

Standard convolution assumes a single delay distribution. This is not what we see in reality, as delays can depend on many factors, including the day of the week. This function applies a convolution to an input time series where that convolution is expressed as a function (usually of time, but it can be of anything in the input dataframe). The convolution result is then sampled using a Poisson or negative binomial distribution.

Usage

sim_convolution(
  df = i_sim_count_data,
  p_fn,
  delay_fn,
  ...,
  input = "infections",
  output,
  kappa = 1,
  from = c("count", "rate")
)

Arguments

df

a count dataframe from e.g. sim_poisson_model() or sim_summarise_linelist()

A dataframe containing the following columns:

  • statistic (character) - An identifier for the statistic, whether that be infections, admissions, deaths

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a 'time_period'

Minimally grouped by: statistic (and other groupings allowed).

p_fn

a function that takes a time parameter and returns the probability of observation, given that something occurs. A no-op for this parameter is ~ 1.

delay_fn

a function that takes time and returns the probability of observation (given it occurred) over time since infection (i.e. tau) as an ip delay distribution. This does not have to sum to 1 (e.g. when mapping incidence to prevalence), but if it does not then the combination of p_fn and delay_fn is less easy to interpret. This should behave sensibly if p changes halfway through a convolution. See cfg_weekly_ip_fn() and cfg_gamma_ip_fn() for helper functions to construct this parameter. A no-op for this parameter would be ~ ifelse(.x==0,1,0).

...

not used

input

the input statistic

output

the output statistic

kappa

dispersion, scaled such that Poisson dispersion is 1. Values must be 0 (no dispersion), 1 (Poisson dispersion), or greater than 1 for over-dispersion.

from

controls whether future counts are based on previous counts or on the underlying rate. Defaults to count, but rate is a possibility if you want to base the convolution on a more theoretical value rather than observed cases. Either way the convolution generates a new rate, which is in turn sampled into a Poisson or negative binomial count.

Value

return the result of applying this convolution to the data.

Examples

weekday_delay = make_gamma_ip(median_of_mean = 5, median_of_sd = 2)
weekend_delay = make_gamma_ip(median_of_mean = 6, median_of_sd = 2)

delay_fn = ~ ifelse(.x %% 7 %in% c(6,7), list(weekend_delay), list(weekday_delay))
p_fn = ~ ifelse(.x < 20, 0.5, 0.75)

data = tibble::tibble(
    time=1:40,
    count = rep(100,40),
    rate = rep(100,40),
    statistic="infections") %>% dplyr::group_by(statistic)
delayed = data %>%
    sim_convolution(p_fn,delay_fn,output="delayed") %>%
    dplyr::filter(statistic=="delayed")
if (interactive()) ggplot2::ggplot(delayed,ggplot2::aes(x=time))+
  ggplot2::geom_line(ggplot2::aes(y=rate))+
  ggplot2::geom_line(ggplot2::aes(y=count))

# other example delay functions
delay_fn = cfg_gamma_ip_fn( ~ ifelse(.x<5, 8, 4))
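The convolution core, with a delay distribution that may differ by source day, can be sketched as below (convolve_counts_sketch is an invented name; the package additionally samples the resulting rate into a Poisson or negative binomial count):

```r
# Sketch of a discrete convolution with a time-varying delay: each source-day
# count is thinned by p[s] and spread forward according to that day's delay
# distribution, here given as a list of probability vectors indexed by day.
convolve_counts_sketch <- function(count, p, w_list) {
  Tn <- length(count)
  out <- numeric(Tn)
  for (s in seq_len(Tn)) {           # s = day the source event occurs
    w <- w_list[[s]]
    for (tau in seq_along(w) - 1) {  # tau = delay in days
      t <- s + tau
      if (t <= Tn) out[t] <- out[t] + p[s] * count[s] * w[tau + 1]
    }
  }
  out
}
```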

Apply a time-varying probability and delay function to linelist data

Description

Apply a time-varying probability and delay function to linelist data

Usage

sim_delay(
  df = i_sim_linelist,
  p_fn,
  delay_fn,
  input = "time",
  output = "event",
  seed = Sys.time()
)

Arguments

df

a line list dataframe arising from e.g. sim_branching_process()

A dataframe containing the following columns:

  • id (unique_id) - Patient level unique id

  • time (ggoutbreak::time_period) - Time of infection. A 'time_period'

Any grouping allowed.

p_fn

Function that returns a probability between 0 and 1 for each row of the input dataframe. A purrr style lambda is OK (e.g. ~ 1 for always true); its first parameter will be the time of infection. The function must be vectorised on its inputs (and consume additional inputs with ...)

delay_fn

A function that calculates the time to event onset from the input time. This will be called with a vector of infection times as the first parameter (time), but all other columns of df are also available, as are the symptomatic, died, and admitted flags. The function must be vectorised on its inputs (and consume additional inputs with ...). A purrr style lambda is OK, e.g. ~ stats::rgamma(.x, shape = 3), and the first parameter will be the infection time. If you have a discrete probability profile for this then you can use cfg_ip_sampler_rng(ip_symptoms) without the tilde.

input

the time column from which to calculate the delay.

output

an output column set name (defaults to "event")

seed

RNG seed for reproducibility

Value

the line list with extra columns with prefix given by output, specifying whether the event was observed, the delay and the simulation time.

Examples

tmp = sim_branching_process(
  changes = tibble::tibble(t = c(0,20,40,60,80,110), R = c(1.8,1.5,0.9,1.5,0.8,1.2)),
  max_time = 120,
  seed = 100
)

tmp2 = tmp %>% sim_delay(
  p_fn = ~ rbern(.x, 0.8),
  delay_fn = ~ rgamma2(.x, mean = 5),
)
tmp2 %>% dplyr::glimpse()

Apply a right censoring to count data.

Description

Delayed observation means that if, for example, a case is only attributed to a disease after a delay, there is right censoring of the data. There can be very complex patterns of right censoring if, for example, observations are batched and published weekly. During COVID in the UK some death data was published frequently but some was retrospectively reported at monthly intervals, depending on where the patient died, which led to complex time dependent biases in the death data. Given a description of the delay, this function will simulate this effect for count data. In another example, there were delays reporting test results in the run up to Christmas, which resulted in case rates apparently dropping as schools broke up. This could have affected the timing of the 2021 Christmas lockdown.

Usage

sim_delayed_observation(
  df = i_sim_count_data,
  delay_fn,
  ...,
  input = "infections",
  output = input,
  max_time = max(df$time)
)

Arguments

df

a count dataframe from e.g. sim_poisson_model() or sim_summarise_linelist()

delay_fn

a function that takes time and returns the probability of observation (given it occurred) over time since infection (i.e. tau) as an ip delay distribution. This does not have to sum to 1 (e.g. mapping incidence to prevalence), but if not then it will behave as if some fraction of events are not observed (or are observed multiple times). See cfg_weekly_ip_fn() and cfg_gamma_ip_fn() for helper functions to construct this parameter.

...

not used

input

the input statistic (defaults to count)

output

the output column name (defaults to same as input)

max_time

the date on which censoring is taking place.

Value

the result of applying this right censoring to the data.

Examples

weekday_delay = make_gamma_ip(median_of_mean = 5, median_of_sd = 2)
weekend_delay = make_gamma_ip(median_of_mean = 7, median_of_sd = 2)

delay_fn = ~ ifelse(.x %% 7 %in% c(6,7), list(weekend_delay), list(weekday_delay))

data = tibble::tibble(time=1:40, count = rep(100,40), statistic="infections") %>%
  dplyr::group_by(statistic) %>%
  sim_delayed_observation(delay_fn,output="delayed")

if (interactive()) ggplot2::ggplot(data,ggplot2::aes(x=time,colour=statistic))+
  ggplot2::geom_line(ggplot2::aes(y=count))

Generate a multinomial outbreak defined by per class growth rates and a poisson model

Description

Generate a multinomial outbreak defined by per class growth rates and a poisson model

Usage

sim_multinomial(
  changes = tibble::tibble(t = c(0, 20, 40, 60, 80), variant1 = c(0.1, 0, -0.1, 0, 0.1),
    variant2 = c(0.15, 0.05, -0.05, -0.01, 0.05), variant3 = c(0, 0.05, -0.05, +0.05,
    -0.05)),
  initial = c(100, 100, 100),
  time_unit = "1 day",
  ...
)

Arguments

changes

a dataframe of time points in column t, with growth rates per week for each class in the other columns.

initial

the size of the initial outbreak per class. There should be one entry per class

time_unit

e.g. a daily or weekly time series: "1 day", "7 days"

...

Named arguments passed on to sim_poisson_model

fn_imports

a function that takes input vector t and returns the number of imported cases at times t.

seed

a random seed

kappa

a dispersion parameter. 1 is no dispersion (compared to Poisson); smaller values mean more dispersion.

max_time

the desired length of the time series.

fn_growth

a function that takes input vector t and returns the growth rates at times t

Value

a case count time series including class, count and time columns

Examples

if (interactive()) {
  plot_counts(
    sim_multinomial() %>% dplyr::glimpse()
  )
}

Generate an outbreak case count series defined by growth rates using a poisson model.

Description

Generate an outbreak case count series defined by growth rates using a poisson model.

Usage

sim_poisson_model(
  changes = tibble::tibble(t = c(0, 20, 40, 60, 80), growth = c(0.1, 0, -0.1, 0, 0.1)),
  kappa = 1,
  max_time = 104,
  seed = Sys.time(),
  fn_growth = cfg_step_fn(changes),
  fn_imports = ~ifelse(.x == 0, 100, 0),
  time_unit = "1 day"
)

Arguments

changes

a dataframe holding change time points (t) and growth rate per week (growth) columns

kappa

a dispersion parameter. 1 is no dispersion (compared to Poisson); smaller values mean more dispersion.

max_time

the desired length of the time series.

seed

a random seed

fn_growth

a function that takes input vector t and returns the growth rates at times t

fn_imports

a function that takes input vector t and returns the number of imported cases at times t.

time_unit

e.g. a daily or weekly time series: "1 day", "7 days"

Value

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

Ungrouped.

Examples

tmp2 = sim_poisson_model(seed=100, fn_imports = ~ ifelse(.x %in% c(0,50),100,0))

if (interactive()) {
  ggplot2::ggplot(tmp2)+ggplot2::geom_point(ggplot2::aes(x=time,y=count))
}

Generate an outbreak case count series defined by Reproduction number using a poisson model.

Description

Generate an outbreak case count series defined by Reproduction number using a poisson model.

Usage

sim_poisson_Rt_model(
  changes = tibble::tibble(t = c(0, 40), rt = c(2.5, 0.8)),
  kappa = 1,
  max_time = 80,
  seed = Sys.time(),
  fn_Rt = cfg_step_fn(changes),
  fn_imports = ~ifelse(.x == 0, 30, 0),
  fn_ip = ~test_ip,
  time_unit = "1 day"
)

Arguments

changes

a dataframe holding change time points (t) and reproduction number (rt) columns

kappa

a dispersion parameter. 1 is no dispersion (compared to Poisson); smaller values mean more dispersion.

max_time

the desired length of the time series.

seed

a random seed

fn_Rt

a function that takes input vector t and returns the instantaneous reproduction number at time t

fn_imports

a function that takes input vector t and returns the number of imported cases at times t.

fn_ip

a function that takes input vector t and returns an infectivity profile at times t.

time_unit

e.g. a daily or weekly time series: "1 day", "7 days"

Value

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified timeframe

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

Ungrouped.

Examples

tmp = sim_poisson_Rt_model(kappa=1, seed=100, fn_imports = ~ ifelse(.x %in% c(0,50),100,0))

if (interactive()) {
  ggplot2::ggplot(tmp,ggplot2::aes(x=time,y=count))+ggplot2::geom_point()+
    ggplot2::geom_line()
}

Summarise a line list

Description

This function converts a line list into a daily count of incident cases, plus infections, admissions, deaths, test samples, and test results if present. Censoring of these counts can also be defined. While summarising, various network measures, such as the forward looking case reproduction number, are also calculated.

Usage

sim_summarise_linelist(
  df = i_sim_linelist,
  ...,
  censoring = list(),
  max_time = max(df$time)
)

Arguments

df

a line list dataframe arising from e.g. sim_branching_process()

A dataframe containing the following columns:

  • id (unique_id) - Patient level unique id

  • time (ggoutbreak::time_period) - Time of infection. A 'time_period'

Any grouping allowed.

...

the grouping to include in the summarisation.

censoring

a named list of column names (without the _time suffix) of the kind created by sim_delay() or sim_apply_delay(), each with an associated function defining the reporting delay that the column experiences. Within this function t (or .x for a purrr lambda) refers to the XX_time column, i.e. when the event being reported happened, while time refers to the simulated infection time. N.B. since infection is not observed you can't censor it.

max_time

the censoring time for this observation.

Value

a count data frame with various R_t measures, ascertainment, and underlying infection count.

Examples

sim = sim_branching_process(
  changes = tibble::tibble(t = c(0,40), R = c(1.7,0.8)),
  max_time = 120,
  seed = 100,
  fn_imports = ~ ifelse(.x==0,100,0)
)

tmp = sim %>% sim_summarise_linelist()

p1 = plot_counts(tmp)

p2 = ggplot2::ggplot(tmp, ggplot2::aes(x=as.Date(time)))+
  ggplot2::geom_point(ggplot2::aes(y=rt.case,colour="case"))+
  ggplot2::geom_point(ggplot2::aes(y=rt.inst,colour="instantaneous"))+
  ggplot2::geom_line(ggplot2::aes(y=rt.weighted))+
  ggplot2::coord_cartesian(ylim=c(0,3.5))+
  ggplot2::xlab(NULL)

patchwork::wrap_plots(p1,p2,ncol=1,axes="collect")

Generate a simple time-series of cases based on a growth rate step function

Description

This time-series has no statistical noise and is useful for testing things. It has fixed known values for infections and growth rate (fixed at 0.05 and -0.05 per day), and an instantaneous reproduction number which is based on the provided infectivity profile. A fixed denominator gives a known proportion and a relative growth rate that is the same as the growth rate.

Usage

sim_test_data(ip = test_ip, duration = 500, period = 50)

Arguments

ip

an infectivity profile. Any uncertainty will be collapsed into the central distribution.

duration

the total length of the time-series

period

the duration of each positive or negative growth phase

Value

a time series with count, incidence, growth, rt, proportion and relative.growth columns

Examples

sim_test_data() %>% dplyr::glimpse()

Generate a single infectivity profile from multiple bootstraps

Description

Generate a single infectivity profile from multiple bootstraps

Usage

summarise_ip(ip = i_empirical_ip)

Arguments

ip

the infectivity profile to summarise. The a0 and a1 columns are optional if tau is given.

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly).

A default value is defined.

Value

an infectivity profile
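
# this page has no packaged example, but the central idea can be sketched in
# base R. This is an illustrative assumption, not ggoutbreak's implementation:
# average the per-interval probabilities across bootstraps and renormalise.

```r
boot_ip = data.frame(
  boot = rep(1:3, each = 4),
  tau = rep(1:4, times = 3),
  probability = c(0.1, 0.4, 0.3, 0.2,
                  0.2, 0.3, 0.3, 0.2,
                  0.15, 0.35, 0.3, 0.2)
)
# mean probability per time interval across the three bootstraps
central = aggregate(probability ~ tau, data = boot_ip, FUN = mean)
# renormalise so the summarised profile is still a probability distribution
central$probability = central$probability / sum(central$probability)
```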


An example of the linelist output of the branching process model simulation

Description

This is generated using the test_ip infectivity profile and also includes a delay to symptom onset, which is a random gamma distributed quantity with a mean of 6 and an sd of 2.

Usage

data(test_bpm)

Format

A dataframe containing the following columns:

  • time (as.time_period) - the time column

  • id (integer) - an id per individual

  • generation_interval (numeric) - the generation_interval column

  • infector (integer) - the infector id

  • generation (numeric) - the generation column

  • symptom_onset (logical) - the flag for onset of symptoms

  • symptom_onset_delay (numeric) - the time to onset of symptoms from infection

  • symptom_onset_time (as.time_period) - the time of symptom onset

333126 rows and 8 columns


A test infectivity profile generated from a set of discretised gamma distributions with parameters mean 5 (95% CI 4-6) and sd 2 (95% CI 1.5-2.5).

Description

A test infectivity profile generated from a set of discretised gamma distributions with parameters mean 5 (95% CI 4-6) and sd 2 (95% CI 1.5-2.5).

Usage

data(test_ip)

Format

A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of infection from the previous time period until this time

  • tau (numeric) - the time index this probability relates to (in days)

  • a0 (numeric) - the beginning of the time period

  • a1 (numeric) - the end of the time period

Grouped by: boot.

2000 rows and 5 columns


An example of the linelist output of the poisson model simulation with defined $R_t$

Description

This is generated using the test_ip infectivity profile

Usage

data(test_poisson_rt)

Format

A dataframe containing the following columns:

  • time (as.time_period) - the time column

  • rt (numeric) - the time varying rt column (parameters)

  • imports (numeric) - the imports column

  • rate (numeric) - the poisson rate column (underlying infection rate)

  • count (integer) - the count column

  • statistic (character) - the statistic column

81 rows and 6 columns


A serial interval estimated from simulated data

Description

This serial interval is resampled from the first 1000 patients in the test_bpm dataset for whom both infector and infectee have symptoms. These patients are generated with a symptom delay from infection of mean 6 days and SD 2 (discrete under-dispersed gamma) and an infectivity profile with mean 5 days and SD 2, as defined in the test_ip dataset. This serial interval is relevant to the estimation of $R_t$ from symptomatic case counts, but it includes negative times and so cannot be used with EpiEstim.

Usage

data(test_serial)

Format

A dataframe containing the following columns:

  • tau (numeric) - the time delay between symptoms in infector and infectee

  • a0 (numeric) - the a0 column

  • a1 (numeric) - the a1 column

  • probability (numeric) - the probability column

  • boot (integer) - the boot column

Minimally grouped by: boot (and other groupings allowed).

2166 rows and 5 columns


A test time series dataset

Description

A test time series dataset

Usage

data(test_ts)

Format

A dataframe containing the following columns:

  • time (as.time_period) - the time column

  • growth (numeric) - the growth column

  • incidence (numeric) - the incidence column

  • rt (numeric) - the rt column

  • denom (numeric) - the denom column

  • proportion (numeric) - the proportion column

  • relative.growth (numeric) - the relative.growth column

  • count (numeric) - the count column

Any grouping allowed.

501 rows and 8 columns


Aggregate time series data preserving the time series

Description

Aggregate time series data preserving the time series

Usage

time_aggregate(
  df = i_timestamped,
  ...,
  .groups = NULL,
  .cols = NULL,
  .fns = NULL
)

Arguments

df

an optionally grouped time series. Grouping should not include the time column. The grouping works differently from dplyr::summarise in that the last level of the non-time grouping is lost in this operation, so the subgroup you wish to aggregate over should be included in the grouping.

...

A set of dplyr::summarise statements, or additional parameters for .fns

.groups

as per dplyr::summarise

.cols

Optional tidyselect column specification for dplyr::across. If .fns is given and .cols is not specified then the columns to summarise are identified automatically; in doing this any Date columns are dropped. If this is not what you want then .cols or ... must be given.

.fns

Optionally, a set of function specifications as per dplyr::across

Value

the summarised time series preserving the time column, and with the grouping structure involving one fewer levels than the input

Examples

ggoutbreak::england_covid %>%
  time_aggregate(count = sum(count), denom = sum(denom)) %>%
  dplyr::glimpse()

ggoutbreak::england_covid %>%
  time_aggregate(.fns=mean) %>%
  dplyr::glimpse()

Summarise data from a line list to a time-series of counts.

Description

This is principally designed to take a record of single events and produce a summary time-series count of events by group, class and date. The default behaviour is to guess the cadence of the input data and summarise the event line list to a (set of) regular time-series counts for use in incidence and growth rate estimates.

Usage

time_summarise(
  df = i_dated,
  unit,
  anchor = "start",
  rectangular = FALSE,
  ...,
  .fill = list(count = 0)
)

Arguments

df

a line list of data you want to summarise, optionally grouped. If this is grouped then each group is treated independently. The remaining columns must contain a date column and may contain a class column. If a count column is present the counts will be summed, otherwise each individual row will be counted as a single event (as a linelist)

unit

a period e.g. "1 week"

anchor

one of a date, "start", "end", or a weekday name (e.g. "mon"); this will always be the start of one of the time periods we are cutting into

rectangular

should the resulting time series be the same length for all groups? This is only appropriate if you can be sure that your data is complete for all subgroups; otherwise missing data will be treated as zero counts. This is important if leading or trailing missing data in one subgroup can be due to a reporting delay in that subgroup, in which case a rectangular time series will erroneously fill in zero counts for this missing data.

...

a spec for a dplyr::summarise(...) - optional; if not provided, a count = dplyr::n() or a count = sum(count) is performed.

.fill

a list similar to tidyr::complete for values to fill variables with

Details

If the data is given with a class column the time series are interpreted as having a denominator, consisting of all the different classes within a time period. This may be subtypes (e.g. variants, serotypes) or markers for test positivity. In either case the resulting time series will have counts for all classes and denominators for the combination.

There is flexibility for other kinds of summarisation if the raw data is not count based (e.g. means of continuous variables), but in this case the slider package is usually going to be better, as time_summarise only looks at non-overlapping time periods of fixed length.

There is another use case where an existing time series at one frequency is aggregated to a less frequent basis (e.g. moving from a daily time series to a weekly one). In this case the input will contain a count column. In this mode no checks are made that the more frequent events are all present before summarisation, so the result may include different numbers of input periods (e.g. going from weeks to months there may be 4 or 5 weeks in each month).
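
# the core behaviour for a plain dated linelist can be sketched in base R.
# This is illustrative only; the package's cut points, anchoring and grouping
# are more configurable.

```r
# 4 events per day for 28 days starting on a Monday => 28 events per week
linelist = data.frame(date = as.Date("2024-01-01") + rep(0:27, times = 4))
# cut event dates into weekly periods and count events per period
weekly = as.data.frame(table(cut(linelist$date, "1 week")), responseName = "count")
names(weekly)[1] = "time"
```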

Value

The output depends on whether or not the input was grouped and had a class column. The most detailed output will be:

A dataframe containing the following columns:

  • denom (positive_integer) - Total test counts associated with the specified time frame

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

Any grouping allowed.

or a more minimal output if the input is only a plain list of dated events:

A dataframe containing the following columns:

  • count (positive_integer) - Positive case counts associated with the specified time frame

  • time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

Any grouping allowed.


Convert a set of time points to dates

Description

Convert a set of time points to dates

Usage

time_to_date(
  timepoints,
  unit = attr(timepoints, "unit"),
  start_date = attr(timepoints, "start_date")
)

Arguments

timepoints

a set of numeric time points

unit

the period / unit of the time points, which will be extracted from time points if possible

start_date

the zero day of the time series, will be extracted from time points if possible

Value

a vector of dates

Examples

times = date_to_time(as.Date("2019-12-29")+0:100, "1 week")
dates = time_to_date(times)

Calculate the reproduction number from a growth rate estimate and an infectivity profile

Description

This function uses a single empirical distribution for the infectivity profile (aka the generation time)

Usage

wallinga_lipsitch(
  r,
  y = i_empirical_ip,
  a1 = seq(0.5, length.out = length(y)),
  a0 = dplyr::lag(a1, default = 0)
)

Arguments

r

a growth rate (may be a vector)

y

an empirical infectivity profile either as a probability vector or as a dataframe of format: A dataframe containing the following columns:

  • boot (anything + default(1)) - a bootstrap identifier

  • probability (proportion) - the probability of new event during this period.

  • a0 (double) - the beginning of the time period (in days)

  • a1 (double) - the end of the time period (in days)

Must be grouped by: boot (exactly). A default value is defined.

a1

the end time of the infectivity profile probability estimate (defaults to 0.5,1.5,2.5,...).

a0

the start time of the infectivity profile probability estimate (defaults to 0,0.5,1.5,...).

Value

a reproduction number estimate based on r

Examples

wallinga_lipsitch(r=seq(-0.1,0.1,length.out=9), y=stats::dgamma(1:50, 5,2))
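
# the underlying relationship is the Euler-Lotka identity: for a
# piecewise-uniform infectivity profile y over intervals [a0, a1),
#   R = r / sum_i( y_i * (exp(-r*a0_i) - exp(-r*a1_i)) / (a1_i - a0_i) )
# with the limit R = 1/sum(y) at r = 0. A hedged base-R sketch of this
# formula (not the package implementation):

```r
wl_sketch = function(r, y, a1 = seq(0.5, length.out = length(y)),
                     a0 = c(0, a1[-length(a1)])) {
  sapply(r, function(ri) {
    if (abs(ri) < 1e-12) return(1 / sum(y))  # limit as r -> 0
    ri / sum(y * (exp(-ri * a0) - exp(-ri * a1)) / (a1 - a0))
  })
}
y = dgamma(1:50, 5, 2)
y = y / sum(y)  # normalise the discretised profile
wl_sketch(seq(-0.1, 0.1, length.out = 9), y)
```

For a normalised profile, r = 0 maps to R = 1, positive growth to R > 1, and negative growth to R < 1.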

Wedge distribution

Description

The wedge distribution has a domain of 0 to 1 and has a linear probability density function over that domain.

Arguments

n

number of observations

x

vector of quantiles

q

vector of quantiles

p

vector of probabilities

log

logical; if TRUE, probabilities p are given as log(p).

log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are P[X<=x] otherwise P[X>x].

a

a gradient from -2 (left skewed) to 2 (right skewed)

Details

rwedge can be combined with quantile functions to skew standard distributions, to introduce correlation, or to down-weight certain parts of a distribution.

Value

a vector of probabilities, quantiles, densities or samples.

Examples

pwedge(seq(0,1,0.1), a=1)
dwedge(seq(0,1,0.1), a=1)
qwedge(c(0.25,0.5,0.75), a=-1)

stats::cor(
  stats::qnorm(rwedge(1000, a=2)),
  stats::qnorm(rwedge(1000, a=-2))
)
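
# a linear density on [0,1] with gradient a can be written
# f(x) = 1 + a*(x - 1/2), which integrates to 1 for any a in [-2, 2]. This is
# a hedged base-R sketch under that assumed parameterisation, which may
# differ from the package's exact definition:

```r
# density: zero outside [0,1], linear with gradient a inside
dwedge_sketch = function(x, a = 0) ifelse(x < 0 | x > 1, 0, 1 + a * (x - 0.5))
# CDF: integral of the density, F(q) = q*(1 - a/2) + a*q^2/2
pwedge_sketch = function(q, a = 0) {
  q = pmin(pmax(q, 0), 1)
  q * (1 - a / 2) + a * q^2 / 2
}
```

At a = 2 the density rises from f(0) = 0 to f(1) = 2; a = 0 recovers the uniform distribution.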

Extract Parts of a POSIXt or Date Object

Description

Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.

Usage

## S3 method for class 'time_period'
weekdays(x, abbreviate = FALSE)

Arguments

x

an object inheriting from class "POSIXt" or "Date".

abbreviate

logical vector (possibly recycled). Should the names be abbreviated?

Value

weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").

quarters returns a character vector of "Q1" to "Q4".

julian returns the number of days (possibly fractional) since the origin, with the origin as a "origin" attribute. All time calculations in R are done ignoring leap-seconds.

Note

Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.

See Also

DateTimeClasses, Date; Sys.getlocale("LC_TIME") crucially for months() and weekdays().

Examples

## first two are locale dependent:
weekdays(.leap.seconds)
months  (.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6,  etc !!)

##' Transform (Time-)Date vector  to  convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
    DF <- as.data.frame(unclass(as.POSIXlt( dt )))
    `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds)    # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler:  Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date


## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE
## in the proleptic Julian calendar. More recently it has been defined in
## 'Terrestrial Time', which differs from UTC by a few seconds.
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time