Title: Estimate Incidence, Proportions and Exponential Growth Rates
Description: Simple statistical models and visualisations for calculating the incidence, proportion, exponential growth rate, and reproduction number of infectious disease case time series. This toolkit was largely developed during the COVID-19 pandemic.
Authors: Robert Challen [aut, cre, cph]
Maintainer: Robert Challen <[email protected]>
License: MIT + file LICENSE
Version: 0.4.1
Built: 2025-02-07 09:32:33 UTC
Source: https://github.com/ai4ci/ggoutbreak
Insert a layer at the bottom of a ggplot

plot %above% layer

plot: the plot to add the layer to
layer: the layer to insert underneath the plot
a ggplot
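For illustration, a minimal sketch (not from the package documentation) that inserts a shaded background rectangle beneath an existing line layer; ggplot2::annotate() returns a layer, so it can appear on the right hand side:

p = ggplot2::ggplot(ggplot2::economics, ggplot2::aes(date, unemploy)) +
  ggplot2::geom_line()
# the rectangle is drawn underneath the already-plotted line rather than over it
p %above% ggplot2::annotate("rect",
  xmin = as.Date("1980-01-01"), xmax = as.Date("1985-01-01"),
  ymin = -Inf, ymax = Inf, fill = "grey90")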
Convert time period to dates
## S3 method for class 'time_period'
as.Date(x, ...)

## S3 method for class 'time_period'
as.POSIXct(x, ...)

x: a time_period
...: not used

a vector of dates representing the start of each of the input time_period entries

as.POSIXct(time_period): Convert to a vector of POSIXct
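For example (mirroring the as.time_period() examples below, which assume weekly periods anchored at 2020-01-01):

tmp = as.time_period(0:3, 7, "2020-01-01")
as.Date(tmp)  # 2020-01-01, 2020-01-08, 2020-01-15, 2020-01-22
as.POSIXct(as.time_period(0:3, 1/24, "2020-01-01"))  # hourly periods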
Time periods are just a zero-based numeric representation of dates with a time unit baked in. This allows variable-length periods (e.g. days or weeks) and fractional days to be represented in a consistent(ish) way.
as.time_period(x, unit = NULL, start_date = NULL, anchor = NULL, ...)

## S3 method for class 'time_period'
c(..., recursive = F)

## S3 method for class 'time_period'
x[...]

## S3 replacement method for class 'time_period'
x[...] <- value

## S3 method for class 'time_period'
x[[...]]

## S3 replacement method for class 'time_period'
x[[...]] <- value

## S3 method for class 'time_period'
seq(from, to = from, ...)

is.time_period(x)

## S3 method for class 'time_period'
print(x, ...)

x: a vector of numbers (may be integer or real) or a time_period
unit: the length of one unit of time. This will be either an integer number of days, or a specification such as "1 week", or another time_period
start_date: the zero time date as something that can be coerced to a date. If the …
anchor: only relevant if …
...: used for subtype implementations
recursive: concatenate recursively
value: the value
from, to: the starting and (maximal) end values of the sequence. Of length 1.

a time_period class, consisting of a vector of numbers, with attributes for time period and start_date

c(time_period): Combine time_period
[: Subset a time_period
`[`(time_period) <- value: Assign values to a subset of a time_period
[[: Get a value in a time_period
`[[`(time_period) <- value: Assign a value in a time_period
seq(time_period): Create a sequence using time_periods
is.time_period(): Check if something is a time_period
print(time_period): Print a time_period

# 100 weeks from 2020-01-01
tmp = as.time_period(0:100, 7, "2020-01-01")
as.Date(tmp)
range(tmp)
min(tmp)
tmp2 = as.integer(as.Date(tmp))
# testthat::expect_true(all(na.omit(tmp2-lag(tmp2)) == 7))
tmp2 = as.time_period(0:23, 1/24, "2020-01-01")
as.POSIXct(tmp2)

# convert timeseries to a new "unit"
tmp = as.time_period(0:100, 7, "2020-01-01")
tmp2 = as.time_period(tmp, 1)
testthat::expect_equal(as.numeric(tmp2), 0:100*7)
A scales breaks generator for log1p scales
breaks_log1p(n = 5, base = 10)
n: the number of breaks
base: the base for the breaks

a function for ggplot scale breaks

ggplot2::ggplot(ggplot2::diamonds, ggplot2::aes(x=price)) +
  ggplot2::geom_density() +
  ggplot2::scale_x_continuous(trans="log1p", breaks=breaks_log1p())
Generate a random probability based on features of the simulation
cfg_beta_prob_rng(probability_fn = ~0.8, kappa_fn = ~0.1)
probability_fn: a function which gives the time-varying mean of a beta distribution. The function will be called minimally with …
kappa_fn: a function which gives the time-varying dispersion of a beta distribution. The function will be called minimally with …

a time dependent function that inputs a time (as a time_period) and returns a probability for each day defined by the probability_fn and kappa_fn

fn = cfg_beta_prob_rng(~ ifelse(.x <= 5, 0.1, 0.9))
fn(1:10)
Get an IP generating function from time-varying mean and SD of a gamma function

cfg_gamma_ip_fn(mean_fn = ~2, sd_fn = function(mean) sqrt(mean))

mean_fn: a function which gives the time-varying mean of a gamma distribution. The function will be called minimally with …
sd_fn: a function which gives the time-varying SD of a gamma distribution. The function will be called minimally with …

a time dependent function that inputs a time (as a time_period) and returns an ip delay distribution for each day defined by the mean_fn and sd_fn

# a gamma function that changes mean at time 5
fn = cfg_gamma_ip_fn(mean_fn = \(t) ifelse(t < 5, 4, 2))
fn(4)
fn(7)
This is used for random sampling from the infectivity profile for times to infection, for example. There is nothing to stop you putting in a delay distribution with negative times but strange things may happen in your simulation.
cfg_ip_sampler_rng(ip = i_empirical_ip)
ip: a long format empirical distribution. A dataframe containing the following columns: … Must be grouped by: boot (exactly). A default value is defined.

a function which accepts an n parameter and produces random samples from the ip distribution

tmp = cfg_ip_sampler_rng(ganyani_ip_2)(10000)
# This discretised ganyani distribution is based on these figures:
# mean: 5.2 (3.78-6.78) and sd: 1.72 (0.91-3.93)
format_ip(ganyani_ip_2)
mean(tmp)       # Should be about 5.2
stats::sd(tmp)  # Should be about 1.72
Linear function from dataframe
cfg_linear_fn(changes, ..., col_name = setdiff(colnames(changes), "t"))
changes: a dataframe with a t column and one or more value columns
...: not used
col_name: the value column (optional if only 2 columns)

a function that inputs a vector t and returns a linearly interpolated value from <col_name>
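A short sketch of the expected behaviour (assuming a hypothetical changes dataframe with a t column and a single R value column):

changes = tibble::tibble(t = c(0, 10, 20), R = c(2.5, 1.0, 0.8))
fn = cfg_linear_fn(changes)
# linear interpolation between the supplied points:
fn(c(0, 5, 15))  # 2.50, 1.75, 0.90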
Step function from dataframe
cfg_step_fn(changes, ..., col_name = setdiff(colnames(changes), "t"))
changes: a dataframe with a t column and one or more value columns
...: not used
col_name: the value column (optional if only 2 columns)

a function that inputs a vector t and returns the next smallest corresponding value in <col_name> (or the first one)
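A short sketch of the expected behaviour, using the same hypothetical changes dataframe as in the linear example above:

changes = tibble::tibble(t = c(0, 10, 20), R = c(2.5, 1.0, 0.8))
fn = cfg_step_fn(changes)
# the value changes stepwise at each change point:
fn(c(0, 5, 10, 15, 25))  # 2.5, 2.5, 1.0, 1.0, 0.8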
This is particularly designed for use within the fn_list_next_gen parameter of sim_branching_process(), to allow age group contact matrices to be applied for example (assuming a categorical age).

cfg_transition_fn(transition)

transition: a transition matrix or long format dataframe. If a matrix, the columns should sum to 1; the column names are the input class and the row names are the output class. The dataframe format must have input, output, and probability columns.

a function that given an input will return samples of the output class according to the probability distributions.

age = rep(c("child","adult","elderly"), 100)
fn = cfg_transition_fn(tibble::tribble(
  ~input,    ~output,   ~probability,
  "child",   "child",   0.5,
  "child",   "adult",   0.5,
  "adult",   "child",   0.5,
  "adult",   "adult",   0.3,
  "adult",   "elderly", 0.2,
  "elderly", "elderly", 0.5,
  "elderly", "adult",   0.5,
))
table(fn(age), age)
This function returns a random number generator from a gamma distribution that has a weekly period and a configurable degree of additional variability around the weekly pattern. It is suited to delays of things like testing which may depend on the day of the week.

cfg_weekly_gamma_rng(
  mean = c(1, 1, 1, 1, 4, 3, 2),
  sd = sqrt(mean),
  week_starts = weekdays(as.Date("2024-10-14"))
)

mean: a mean amount of delay for each day of the week
sd: the SD of the delay
week_starts: locale description of the first day of the week (default is "Monday")

a random number generator taking a time parameter t and returning a duration at that time t, depending on the day of the week.

fn = cfg_weekly_gamma_rng(c(1,1,1,1,4,3,2))
matrix(fn(1:42), ncol=7, byrow=TRUE)
Weekly convolution distribution function
cfg_weekly_ip_fn(
  mean = c(1, 1, 1, 1, 4, 3, 2),
  sd = sqrt(mean),
  week_starts = weekdays(as.Date("2024-10-14"))
)

mean: the means of a gamma distributed delay function for each weekday
sd: the SDs of a gamma distributed delay function for each weekday
week_starts: locale description of the first day of the week (default is "Monday")

a time dependent function that inputs a time (as a time_period) and generates an IP delay distribution for each day, varying by day of the week

cat(sapply(cfg_weekly_ip_fn()(1:7), format_ip), sep="\n")
This function returns a random probability generator that has a weekly period and a configurable degree of additional variability around the weekly pattern. It is suited to probabilities of things like testing which may depend on the day of the week.

cfg_weekly_proportion_rng(
  prob = c(0.8, 0.8, 0.8, 0.8, 0.8, 0.5, 0.5),
  kappa = 0.1,
  week_starts = weekdays(as.Date("2024-10-14"))
)

prob: the rates of e.g. ascertainment for each day of the week
kappa: dispersion parameter between 0 and 1, where 0 is no dispersion and 1 is maximum dispersion
week_starts: locale description of the first day of the week (default is "Monday")

a random number generator function taking a time parameter t and returning a probability of ascertainment for that time, depending on the day of the week etc.

fn = cfg_weekly_proportion_rng(c(0.9,0.9,0.9,0.9,0.9,0.1,0.1))
matrix(fn(1:42), ncol=7, byrow=TRUE)
An infectivity profile derived from a meta-analysis of serial intervals.
data(covid_ip)

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time
time (double) - the end of the time period (in days)
Must be grouped by: boot (exactly).

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time tau
tau (numeric) - the time index this probability relates to (in days)
a0 (numeric) - the beginning of the time period
a1 (numeric) - the end of the time period
Grouped by: boot.

1400 rows and 5 columns
The probability of detecting COVID-19 by PCR given time since infection, based on Binny et al. 2023.

data(covid_test_sensitivity)
A dataframe containing the following columns:
tau (numeric) - the time column
probability (numeric) - the probability column
boot (integer) - the boot column
Must be grouped by: boot (and other groupings allowed).
5100 rows and 3 columns
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9384503/
The COVID-19 viral shedding duration
data(covid_viral_shedding)

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time tau
tau (numeric) - the time index this probability relates to (in days)
a0 (numeric) - the beginning of the time period
a1 (numeric) - the end of the time period
Grouped by: boot.

2600 rows and 3 columns

From van Kampen et al.: https://www.nature.com/articles/s41467-020-20568-4
The counterpart to date_seq_dates(). Take an original set of data and place it within a regular time series, where the periodicity of the time series may be expressed as numbers of days, weeks, months, quarters, or years, and the periods are defined by an anchoring date, a day of the week, or by reference to the start or end of the input dates. This can either return the periods as dates or factors (e.g. for plotting), or as a time_period for analysis that relies on a numeric representation of the date or duration from the anchor.

cut_date(
  dates,
  unit,
  anchor = "start",
  output = c("date", "factor", "time_period"),
  dfmt = "%d/%b/%y",
  ifmt = "{start} — {end}",
  ...
)

dates: a set of dates
unit: a period e.g. "1 week"
anchor: one of a date, "start" or "end", or a weekday name e.g. "mon"; this will always be the start of one of the time periods we are cutting into
output: return the result as either a "date" (the default), an ordered "factor" with the date ranges as a label, or a "time_period". The result is named with labels referring to the …
dfmt: the …
ifmt: a …
...: ignored

a set of dates, times or a factor level, representing the start of the period the date falls into, where the period is defined by the duration and the anchor

dates = as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-03",NA))
fs = ggoutbreak::date_seq(dates, "2 days")
dates - cut_date(dates, "2 days")
cut_date(dates, unit="2 days", output="time_period")

# A weekly set of dates:
dates2 = Sys.Date() + floor(stats::runif(50, max=10))*7
# in this specific situation the final date is not truncated because the
# input data is seen as an exact match for the whole output period.
cut_date(dates2, "1 week", "sun", output="factor")
cut_date(dates2, dfmt="%d/%b", output="factor", unit="2 weeks", anchor="sun")
This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.
date_seq(x, period, ...)
x: a numeric or date vector
period: the gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.
...: for subtype methods

a vector of the same type as the input

date_seq(c(1, 2, 4, 5, 10), 1)
Derive, from a vector of observation dates, a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years, etcetera. E.g. this can convert a random set of dates to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01) or a day of week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly ending on the last date of the data). There is also a problem when we consider data that may have incomplete starting and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.

## S3 method for class 'Date'
date_seq(x, period = .day_interval(x), anchor = "start", complete = FALSE, ...)

x: a vector of dates, possibly including NA values
period: the gap between observations as a number of days or as a natural language definition of the period such as "1 week", "2 weeks", "1 month", etcetera. If not given this will be derived from the dates.
anchor: defines a day that appears in the sequence (if it were to extend that far). Given either as a date, or "start", "end" or a day of the week, e.g. "mon".
complete: truncate incomplete start and end periods
...: ignored

a vector of dates for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.

date_seq(as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-01",NA)), "2 days")
This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.
## S3 method for class 'numeric'
date_seq(x, period = 1, tol = 1e-06, ...)

x: a numeric or date vector
period: the gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.
tol: numerical tolerance for checking periodicity
...: for subtype methods

a vector of the same type as the input

date_seq(c(1, 2, 4, 5, 10), 1)
Expand a time_period vector to the full range of possible times

Derive, from a vector of observation time_periods, a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years etc. E.g. this can convert a random set of times to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01) or a day of week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly ending on the last date of the data). There is also a problem when we consider data that may have incomplete starting and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.

## S3 method for class 'time_period'
date_seq(x, period = attributes(x)$unit, complete = FALSE, ...)

x: a time period vector
period: the gap between observations as a number of days or as a natural language definition of the period such as "1 week", "2 weeks", "1 month", etc. If not given this will be derived from the dates.
complete: truncate incomplete start and end periods
...: ignored

a vector of time_periods for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.

tmp = as.time_period(c(0,10,100), 7, "2020-01-01")
date_seq(tmp, "7 days")
date_seq(tmp, "1 day")
Using a start_date and a unit specification
date_to_time(
  dates,
  unit = .day_interval(dates),
  start_date = getOption("day_zero", "2019-12-29")
)

dates: a vector of dates to convert
unit: a specification of the unit of the resulting time series. Will be determined from the periodicity of the dates if not specified. If another …
start_date: the origin of the conversion. Defaults to the beginning of the COVID-19 pandemic.

a vector of class time_period

times = date_to_time(as.Date("2019-12-29")+0:100, "1 week")
dates = time_to_date(times)
Density, distribution function, quantile function and random generation for the Beta distribution with parameters shape1 and shape2 (and optional non-centrality parameter ncp).

dbeta2(x, prob, kappa, log = FALSE)

x: vector of quantiles
prob: the mean probability (vectorised)
kappa: a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised)
log: logical; if TRUE, probabilities p are given as log(p).

dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.
Invalid arguments will result in return value NaN, with a warning.
The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.
The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

dbeta2(c(0.25,0.5,0.75), 0.5, 0.25)
Density, distribution function, quantile function and random generation for the Gamma distribution with parameters shape and scale.

dgamma2(x, mean, sd = sqrt(mean), log = FALSE, convex = TRUE)

x: vector of quantiles
mean: the mean value on the true scale (vectorised)
sd: the standard deviation on the true scale (vectorised)
log: logical; if TRUE, probabilities p are given as log(p).
convex: show a warning if the distribution selected is not a convex function

dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.
Invalid arguments will result in return value NaN, with a warning.
The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.
The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

dgamma2(seq(0,4,0.25), 2, 1)
Density, distribution function, quantile function and random generation for the log normal distribution whose logarithm has mean equal to meanlog and standard deviation equal to sdlog.

dlnorm2(x, mean = 1, sd = sqrt(exp(1) - 1), log = FALSE)

x: vector of quantiles
mean: the mean value on the true scale (vectorised)
sd: the standard deviation on the true scale (vectorised)
log: logical; if TRUE, probabilities p are given as log(p).

The log normal distribution has density

f(x) = 1/(sqrt(2*pi)*sigma*x) * exp(-(log(x) - mu)^2 / (2*sigma^2))

where mu and sigma are the mean and standard deviation of the logarithm. The mean is E(X) = exp(mu + 1/2*sigma^2), the median is med(X) = exp(mu), and the variance is Var(X) = exp(2*mu + sigma^2)*(exp(sigma^2) - 1), and hence the coefficient of variation is sqrt(exp(sigma^2) - 1), which is approximately sigma when that is small (e.g., sigma < 1/2).

dlnorm gives the density, plnorm gives the distribution function, qlnorm gives the quantile function, and rlnorm generates random deviates.
The length of the result is determined by n for rlnorm, and is the maximum of the lengths of the numerical arguments for the other functions.
The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
The cumulative hazard H(t) = -log(1 - F(t)) is -plnorm(t, r, lower = FALSE, log = TRUE).
dlnorm is calculated from the definition (in 'Details'). [pqr]lnorm are based on the relationship to the normal. Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.

Distributions for other standard distributions, including dnorm for the normal distribution.

dlnorm2(seq(0,4,0.25), 2, 1)
Density, distribution function, quantile function and random generation for the negative binomial distribution with parameters size and prob.

dnbinom2(x, mean, sd = sqrt(mean), log = FALSE)

x: vector of (non-negative integer) quantiles
mean: the mean value on the true scale (vectorised)
sd: the standard deviation on the true scale (vectorised)
log: logical; if TRUE, probabilities p are given as log(p).

The negative binomial distribution with size = n and prob = p has density

p(x) = Gamma(x+n) / (Gamma(n) * x!) * p^n * (1-p)^x

for x = 0, 1, 2, ..., n > 0 and 0 < p <= 1. This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is mu = n(1-p)/p and the variance is n(1-p)/p^2.

A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)

An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.

If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.

The quantile is defined as the smallest value x such that F(x) >= p, where F is the distribution function.

dnbinom gives the density, pnbinom gives the distribution function, qnbinom gives the quantile function, and rnbinom generates random deviates.

Invalid size or prob will result in return value NaN, with a warning.

The length of the result is determined by n for rnbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

rnbinom returns a vector of type integer unless generated values exceed the maximum representable integer, when double values are returned.

dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom). pnbinom uses pbeta. qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search. rnbinom uses the derivation as a gamma mixture of Poisson distributions, see Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.

Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.

dnbinom2(0:5, 2, sqrt(2))
The unit of doubling times is always days.
doubling_time(x, ...)
x: a dataframe calculated from either proportion or incidence growth rate calculations, e.g. a dataframe containing the following columns: … Any grouping allowed. OR a dataframe containing the following columns: … Any grouping allowed.
...: not used

the same dataframe with additional columns for doubling time or relative doubling time, plus confidence intervals.

ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::doubling_time() %>%
  dplyr::glimpse()
From Z. Du, X. Xu, Y. Wu, L. Wang, B. J. Cowling, and L. A. Meyers, ‘Serial Interval of COVID-19 among Publicly Reported Confirmed Cases’, Emerg Infect Dis, vol. 26, no. 6, pp. 1341–1343, Jun. 2020, doi: 10.3201/eid2606.200357.
data(du_serial_interval_ip)

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time tau
tau (numeric) - the time index this probability relates to (in days)
a0 (numeric) - the beginning of the time period
a1 (numeric) - the end of the time period
Grouped by: boot.

2603 rows and 5 columns
The wedge distribution has a domain of 0 to 1 and has a linear probability density function over that domain.
dwedge(x, a, lower.tail = TRUE, log = FALSE)

x: vector of quantiles
a: a gradient from -2 (left skewed) to 2 (right skewed)
lower.tail: logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]
log: logical; if TRUE, probabilities p are given as log(p).

rwedge can be combined with quantile functions to skew standard distributions, or to introduce correlation, or to down-weight certain parts of the distribution.

a vector of probabilities, quantiles, densities or samples.

pwedge(seq(0,1,0.1), a=1)
dwedge(seq(0,1,0.1), a=1)
qwedge(c(0.25,0.5,0.75), a=-1)
stats::cor(
  stats::qnorm(rwedge(1000, a=2)),
  stats::qnorm(rwedge(1000, a=-2))
)
SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the growth rate of the epidemic from various data sources (including with an early version of ggoutbreak).

data(england_consensus_growth_rate)
A dataframe containing the following columns:
date (date) - the date of the estimate
low (numeric) - the lower published estimate of the growth rate
high (numeric) - the higher published estimate of the growth rate
No mandatory groupings.
No default value.
111 rows and 3 columns
SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the reproduction number of the epidemic from various data sources.
data(england_consensus_rt)
A dataframe containing the following columns:
date (date) - the date of the estimate
low (numeric) - the lower published estimate of the reproduction number
high (numeric) - the higher published estimate of the reproduction number
No mandatory groupings.
No default value.
113 rows and 3 columns
A dataset of the daily count of COVID-19 cases by age group in England, downloaded from the UKHSA coronavirus API and formatted for use in ggoutbreak. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates, by age group.

data(england_covid)

A dataframe containing the following columns:
date (as.Date) - the date column
class (enum: 00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 35_39, 40_44, 45_49, 50_54, 55_59, 60_64, 65_69, 70_74, 75_79, 80_84, 85_89, 90+) - the class column
count (numeric) - the test positives for each age group
denom (numeric) - the test positives for all age groups
time (time_period) - the time column
Must be grouped by: class (and other groupings allowed).
No default value.
26790 rows and 5 columns

You may want england_covid_proportion instead, which includes the population denominator. The denominator here is the total number of positive tests in all age groups, and not the tests taken or population denominators.
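For example, the structure can be inspected with:

data(england_covid)
dplyr::glimpse(england_covid)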
The coronavirus.gov.uk dashboard published tests conducted and positive results as separate data sets for a range of geographies. In this case the data is combined, with testing rate as denominator and positives as count, for England.

data(england_covid_pcr_positivity)
A dataframe containing the following columns:
date (date) - a daily time series
time (time_period) - the time column
count (numeric) - test positives in England on that day
denom (numeric) - total tests conducted on that day
No mandatory groupings.
No default value.
1413 rows and 4 columns
An age group stratified dataset derived from the sources listed below.

data(england_covid_proportion)
A dataframe containing the following columns:
class (character) - the age group
date (date) - the start date of a week
count (numeric) - the count of COVID positives
denom (numeric) - the number of COVID tests performed
population (numeric) - the size of the population at this age group
time (time_period) - the time column (weekly)
Must be grouped by: class (and other groupings allowed).
No default value.
1050 rows and 6 columns
The coronavirus.gov.uk site for positive cases, aggregated to 10 year age groups and weekly time points.
NHS test and trace data, which reported regional testing effort by age group, aggregated to country level.
ONS 2021 census population, aggregated to 10 year age groups.
Population counts by 5 year age group for England only from the 2021 census.
data(england_demographics)

A dataframe containing the following columns:
class (enum: 00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 35_39, 40_44, 45_49, 50_54, 55_59, 60_64, 65_69, 70_74, 75_79, 80_84, 85_89, 90+) - the class column
population (numeric) - the population count column
baseline_proportion (numeric) - the proportion this age group makes up of the total
Must be grouped by: class (and other groupings allowed).
No default value.
19 rows and 3 columns
https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationandhouseholdestimatesenglandandwalescensus2021/census2021/census2021firstresultsenglandwales1.xlsx
This includes mainly the dates of lockdowns, releases from social distancing measures and the dates that new variants were first detected.
data(england_events)
A dataframe containing the following columns:
label (character) - the event label
start (date) - the event start date
end (date) - the (optional) event end date
No mandatory groupings.
No default value.
13 rows and 3 columns
Check-in (social activity) and alert (self-isolation instruction) data from the NHS COVID-19 app, aggregated to country level on a week by week basis.

data(england_nhs_app)
A dataframe containing the following columns:
date (date) - the start date of the week
alerts (integer) - the count of self-isolation alerts
visits (integer) - the number of venue check-ins representing visits to social venues.
time (time_period) - the time column
No mandatory groupings.
No default value.
137 rows and 4 columns
The COVID-19 ONS infection survey took a random sample of the population and provides an estimate of the prevalence of COVID-19 that is supposedly free from ascertainment bias.
data(england_ons_infection_survey)
A dataframe containing the following columns:
date (date) - the date column
geography (character) - the geography column
prevalence.0.5 (numeric) - the median proportion of people in the region testing positive for COVID-19
prevalence.0.025 (numeric) - the lower CI of the proportion of people in the region testing positive for COVID-19
prevalence.0.975 (numeric) - the upper CI of the proportion of people in the region testing positive for COVID-19
denom (integer) - the sample size on which this estimate was made (daily rate inferred from weekly sample sizes.)
time (time_period) - the time column
No mandatory groupings.
No default value.
9820 rows and 7 columns
The data is available here: https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/coronaviruscovid19infectionsurveydata/2023/20230310covid19infectionsurveydatasetsengland.xlsx
Data from the COG-UK and Sanger Centre sequencing programme. The data were made available through the Wellcome Foundation at lower tier local authority level, and are a weekly time series of counts per variant. Variants were assigned using the tree structure of the Pango lineage. Different sub-lineages are aggregated to the major WHO variants of concern.

data(england_variants)

A dataframe containing the following columns:
date (date) - the end date of the week
time (time_period) - the time column
class (enum: Other, Alpha (B.1.1.7), Delta (B.1.617.2), Delta (AY.4), Omicron (Other), Omicron (BA.2), Omicron (BA.4), Omicron (BA.5), XBB (Other), Kraken (XBB.1.5), Arcturus (XBB.1.16), Eris (EG.5.1)) - the class column
who_class (enum: Other, Alpha, Delta, Omicron, Kraken, Arcturus, Eris) - the who_class column
count (numeric) - the weekly count column
denom (numeric) - the number of sequences performed in that week
Must be grouped by: class (and other groupings allowed).
No default value.
479 rows and 6 columns
Format date as dmy
fdmy(date)
date: a date to convert

the formatted date

fdmy(Sys.Date())
Print a summary of an infectivity profile
format_ip(ip = i_empirical_ip)
ip: the infectivity profile to summarise. A dataframe containing the following columns: … Must be grouped by: boot (exactly). A default value is defined.

an infectivity profile description

format_ip(ganyani_ip)
A COVID-19 infectivity profile based on Ganyani et al. 2020
The Ganyani serial interval dataset, compatible with EpiEstim

data(ganyani_ip)

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time tau
tau (double) - the time index this probability relates to (in days)
a0 - the beginning of the time period
a1 - the end of the time period
Grouped by boot (exactly).

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time tau
tau (numeric) - the time index this probability relates to (in days)
a0 (numeric) - the beginning of the time period
a1 (numeric) - the end of the time period
Grouped by: boot.

2400 rows and 5 columns
https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.17.2000257
This version is discretised in a manner that makes it incompatible with EpiEstim.

data(ganyani_ip_2)

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time tau
tau (double) - the time index this probability relates to (in days)
a0 - the beginning of the time period
a1 - the end of the time period
Grouped by boot (exactly).

A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection from the end of the previous time period up to time tau
tau (numeric) - the time index this probability relates to (in days)
a0 (numeric) - the beginning of the time period
a1 (numeric) - the end of the time period
Grouped by: boot.
2800 rows and 5 columns
https://www.eurosurveillance.org/content/10.2807/1560-7917.ES.2020.25.17.2000257
The x axis must be a date.
geom_events(
  events = i_events,
  event_label_size = 7,
  event_label_colour = "black",
  event_label_angle = -30,
  event_line_colour = "grey50",
  event_fill_colour = "grey50",
  hide_labels = FALSE,
  guide_axis = ggplot2::derive(),
  ...
)

events: significant events or time spans. A dataframe containing the following columns: … Any grouping allowed. A default value is defined.
event_label_size: how big to make the event label
event_label_colour: the event label colour
event_label_angle: the event label angle
event_line_colour: the event line colour
event_fill_colour: the event area fill colour
hide_labels: do not show labels at all
guide_axis: a guide axis configuration for the labels (see …)
...: named arguments passed on to …

a set of geoms for a time series.
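A minimal sketch, assuming a plot with a date x axis and the packaged england_events dataset:

if (interactive()) {
  ggplot2::ggplot(ggoutbreak::england_covid, ggplot2::aes(x = date, y = count)) +
    ggplot2::geom_point() +
    # event lines, spans and labels are added underneath and around the data
    geom_events(events = ggoutbreak::england_events)
}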
A dataset of the weekly count of COVID-19 cases by age group in Germany, downloaded from the Robert Koch Institute SurvStat service and formatted for use in growth rate estimation. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates.

data(germany_covid)

A dataframe containing the following columns:
class (enum: 0–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80+, Unknown; ordered) - the age group
date (as.Date) - the date column
count (integer) - the test positives for each age group
time (time_period) - the time column
denom (integer) - the test positives for all age groups
Must be grouped by: class (and other groupings allowed).
No default value.
2070 rows and 6 columns
Derived from the Robert Koch Institute SurvStat service by comparing counts and incidence rates.

data(germany_demographics)

A dataframe containing the following columns:
class (enum: 0–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80+; ordered) - the class column
population (integer) - the population column
Must be grouped by: class (and other groupings allowed).
No default value.
10 rows and 2 columns
This function augments any timeseries with a population denominator. The population data may be static estimates, or a set of estimates at time points. The population data may be grouped in which case the grouping might be geographical area or age group or gender for example. The two inputs must have compatible grouping (i.e. all the groups in the population data must be present in the timeseries).
infer_population(df = i_timeseries, pop = i_population_data)
df: a time series, or a grouped collection of time series. A dataframe containing the following columns: … Any grouping allowed.
pop: the population data, which must be grouped in the same way as df. A dataframe containing the following columns: … Any grouping allowed.

the df timeseries with an additional population column

ggoutbreak::england_covid %>%
  ggoutbreak::infer_population(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
Log-scaled incidence estimates are used to generate samples of incidence. These are convolved with the infectivity profile (as a set of discrete distributions) and the result is evaluated as a fraction of population. The result is fitted to a logit-normal (using moments) and quantiles produced from samples.
infer_prevalence(
  modelled = i_incidence_model,
  pop = i_population_data,
  ip = i_discrete_ip,
  bootstraps = 1000,
  seed = Sys.time(),
  ...
)

modelled: model output from processing the … A dataframe containing the following columns: … Any grouping allowed.
pop: the population data, which must be grouped in the same way as modelled. A dataframe containing the following columns: … Any grouping allowed.
ip: a discrete distribution representing the probability of detection of disease on a given day after infection. This will not necessarily sum to one, but each entry will not exceed the asymptomatic fraction. A dataframe containing the following columns: … Must be grouped by: boot (exactly). A default value is defined.
bootstraps: the number of samples to take at each time point. This will be rounded up to a whole multiple of the infectivity profile distribution length.
seed: a random number seed for reproducibility
...: not used

the modelled input with additional proportion columns

tmp = ggoutbreak::england_covid %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate(count=sum(count)) %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::infer_prevalence(
    pop = 56489700,
    # N.B. timing is difficult in this example, as we are using cases.
    ip = ggoutbreak::covid_test_sensitivity
  )

if (interactive()) {
  plot_prevalence(tmp)
}
This enables incidence rates to be compared to a baseline figure for incidence. The baseline could come, for example, from a population average or from average incidence over time. The output is an incidence rate ratio. The incidence_baseline column is a rate of events per unit time. The time unit is expected to be the same as that of the date in modelled, and this is not checked.
infer_rate_ratio(
  modelled = i_incidence_model,
  base = i_baseline_incidence_data,
  ...
)

modelled: model output from something like … A dataframe containing the following columns: … Any grouping allowed.
base: the baseline data, which must be grouped in the same way as modelled. A dataframe containing the following columns: … Any grouping allowed.
...: not used
a dataframe with incidence rate ratios for each of the classes in modelled. A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rate_ratio.0.025 (positive_double) - lower confidence limit of the rate ratio for a population group
rate_ratio.0.5 (positive_double) - median estimate of the rate ratio for a population group
rate_ratio.0.975 (positive_double) - upper confidence limit of the rate ratio for a population group
Any grouping allowed.
baseline = ggoutbreak::england_covid %>%
  ggoutbreak::time_aggregate() %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  dplyr::mutate(baseline_incidence = incidence.0.5)

tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::infer_rate_ratio(baseline) %>%
  dplyr::glimpse()
This assumes that, for example, case distribution proportions are stratified by a population grouping, e.g. geography or age, and that we have estimates of the size of that population during that time period. Normalising by population proportion allows us to compare the relative risk of the outcome in groups, compared to the expected population risk if the outcome were evenly distributed across the population. There may be proportions other than population fractions that are useful to compare. At the moment this does not handle any uncertainty.
infer_risk_ratio(
  modelled = i_proportion_model,
  base = i_baseline_proportion_data,
  ...
)

modelled: model output from something like … A dataframe containing the following columns: … Any grouping allowed.
base: the baseline data, which must be grouped in the same way as modelled. A dataframe containing the following columns: … Any grouping allowed.
...: not used
a dataframe with relative risk / risk ratio columns. A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
risk_ratio.0.025 (positive_double) - lower confidence limit of the excess risk ratio for a population group
risk_ratio.0.5 (positive_double) - median estimate of the excess risk ratio for a population group
risk_ratio.0.975 (positive_double) - upper confidence limit of the excess risk ratio for a population group
Any grouping allowed.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  ggoutbreak::infer_risk_ratio(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()

if (interactive()) {
  plot_growth_phase(tmp)
}
Strictly integer breaks for continuous scale
integer_breaks(n = 5, ...)
n: number of breaks
...: further arguments for methods
a ggplot breaks function
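For example, forcing whole-number x axis breaks:

ggplot2::ggplot(ggplot2::mpg, ggplot2::aes(x = cyl, y = hwy)) +
  ggplot2::geom_point() +
  ggplot2::scale_x_continuous(breaks = integer_breaks())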
Calculate a growth rate from a reproduction number and an infectivity profile
inv_wallinga_lipsitch(
  Rt,
  y = i_empirical_ip,
  a1 = seq(0.5, length.out = length(y)),
  a0 = dplyr::lag(a1, default = 0)
)

Rt: a vector of reproduction numbers
y: an empirical infectivity profile as a probability vector, or as a dataframe containing the following columns: … Must be grouped by: boot (exactly). A default value is defined.
a1: the end time of the infectivity profile probability estimate (defaults to 0.5, 1.5, 2.5, ...)
a0: the start time of the infectivity profile probability estimate (defaults to 0, 0.5, 1.5, ...)

a vector of growth rates

y = stats::pgamma(seq(0.5,length.out=50), 5, 2) -
  stats::pgamma(c(0,seq(0.5,length.out=49)), 5, 2)
inv_wallinga_lipsitch(Rt=seq(0.5,2.5,length.out=9), y=y)
Check whether vector is a date
is.Date(x)
x: a vector to check

TRUE if dates, FALSE otherwise

is.Date(Sys.Date())
Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.
## S3 method for class 'time_period'
julian(x, ...)

x: an object inheriting from class …
...: arguments for other methods

weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").
quarters returns a character vector of "Q1" to "Q4".
julian returns the number of days (possibly fractional) since the origin, with the origin as an "origin" attribute.
All time calculations in R are done ignoring leap-seconds.

Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.

DateTimeClasses, Date; Sys.getlocale("LC_TIME") crucially for months() and weekdays().

## first two are locale dependent:
weekdays(.leap.seconds)
months(.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6, etc !!)

##' Transform (Time-)Date vector to convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
  DF <- as.data.frame(unclass(as.POSIXlt(dt)))
  `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds)     # date+time
dt2df(Sys.Date() + 0:9)  # date

##' Even simpler: Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date

## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE
## in the proleptic Julian calendar. More recently, in
## 'Terrestrial Time' which differs from UTC by a few seconds.
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588)  # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time
Create a set of labels for a time period based on the start and duration of the period. The format is configurable using the start and end dates and the dfmt and ifmt parameters; however, if the time period has names then these are used in preference.

## S3 method for class 'time_period'
labels(object, ..., dfmt = "%d/%b", ifmt = "{start} — {end}", na.value = "Unknown")

object: a set of decimal times as a time_period
...: not used
dfmt: a …
ifmt: a glue spec referring to …
na.value: a label for …

a set of character labels for the time

eg = as.time_period(Sys.Date()+0:10*7, anchor="start")
labels(eg)
labels(eg, ifmt="{start}", dfmt="%d/%b/%y")
labels(eg, ifmt="until {end}", dfmt="%d %b %Y")

# labels retained in constructor:
eg2 = Sys.Date()+0:10*7
names(eg2) = paste0("week ", 0:10)
labels(eg2)
labels(as.time_period(eg2, anchor="start"))
Perform logit scaling with correct axis formatting. To not be used directly but with ggplot (e.g. ggplot2::scale_y_continuous(trans = "logit") )
logit_trans(n = 5, ...)
n |
the number of breaks |
... |
not used, for compatibility |
A scales object
library(ggplot2)
library(tibble)

tibble::tibble(pvalue = c(0.001, 0.05, 0.1), fold_change = 1:3) %>%
  ggplot2::ggplot(aes(fold_change, pvalue)) +
  ggplot2::geom_point() +
  ggplot2::scale_y_continuous(trans = "logit")
Recover a long format infectivity profile from an EpiEstim style matrix
make_empirical_ip(omega, normalise = TRUE)
omega |
a matrix of probabilities, starting at time zero, with each column representing one possible infectivity profile, and the first value being the probability for the interval from time zero to 0.5. Or a vector of probabilities for one single profile, resulting in 1 bootstrap. |
normalise |
is this a probability mass function? In which case we make the sum of this equal to one (the default). If FALSE the values are left as provided. |
a long format ip delay distribution
format_ip(make_empirical_ip(c(0,0,1,1,1,2,2,2,1,1)))
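As an additional sketch of the matrix form (the values here are purely illustrative): each column is treated as one bootstrap replicate of the same infectivity profile, and each column is normalised to sum to one because normalise defaults to TRUE.

# two hypothetical bootstraps of the same profile, one per column
omega = cbind(
  c(0, 1, 2, 1, 0.5),
  c(0, 1.2, 1.8, 1, 0.5)
)
format_ip(make_empirical_ip(omega))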
The infectivity profile is typically fitted to data by MCMC and reported as the median and 95% credible intervals of the mean and the SD of (usually) a gamma distribution. This function generates a discrete infectivity probability distribution representing the chance that an infectee was infected on any specific day after the infector was infected (given that the infectee was infected).
make_gamma_ip( median_of_mean, lower_ci_of_mean = median_of_mean, upper_ci_of_mean = median_of_mean, median_of_sd = sqrt(median_of_mean), lower_ci_of_sd = median_of_sd, upper_ci_of_sd = median_of_sd, correlation = NA, n_boots = 100, epiestim_compat = FALSE, epiestim_sampler = epiestim_compat, z_crit = 0.95 )
median_of_mean , lower_ci_of_mean , upper_ci_of_mean
|
Parameters of the infectivity profile mean. |
median_of_sd , lower_ci_of_sd , upper_ci_of_sd
|
Parameters of the infectivity profile SD. |
correlation |
the correlation between mean and sd. This is optional and will be inferred if not changed from the default |
n_boots |
The number of samples to generate. |
epiestim_compat |
Use EpiEstim compatible sampling and discretisation (see Details) |
epiestim_sampler |
Use the EpiEstim truncated normal sampler (see Details) |
z_crit |
the width of the confidence intervals (defaults to 95%). |
EpiEstim generates these distributions by sampling from a truncated normal distribution for both mean and sd. The means and sds thus produced are discretised using a gamma distribution offset by 1 day, to enforce that the probability of infection on day zero is zero. This constraint changes the shape of the distribution somewhat and may cause a small bias (although there is no ground truth to evaluate).

A different sampling and discretisation strategy is provided. The sampler uses log-normal distributions for both mean and SD with a degree of correlation. The discretiser assigns probabilities directly from the CDF of the gamma distribution without offset. This results in non-zero values for the probability at time zero and can only be used with Rt estimation methods that can handle zero/negative serial intervals (e.g. rt_from_incidence, rt_from_renewal, or rt_from_growth_rate).
a long format infectivity profile data frame, or a list of dataframes if input is a vector.
# COVID-19 estimates from Ganyani et al 2020.
tmp = make_gamma_ip(5.2, 3.78, 6.78, 1.72, 0.91, 3.93,
  epiestim_sampler=FALSE, epiestim_compat=FALSE)

tmp %>%
  dplyr::group_by(boot) %>%
  dplyr::summarise(
    mean = sum(tau*probability),
    sd = sqrt(sum((tau-sum(tau*probability))^2*probability))
  ) %>%
  dplyr::summarise(
    mean = sprintf("%1.2f [%1.2f-%1.2f]",
      stats::quantile(mean,0.5), stats::quantile(mean,0.025), stats::quantile(mean,0.975)),
    sd = sprintf("%1.2f [%1.2f-%1.2f]",
      stats::quantile(sd,0.5), stats::quantile(sd,0.025), stats::quantile(sd,0.975))
  )

if(interactive()) {
  plot_ip(tmp, alpha=0.1) +
    ggplot2::coord_cartesian(xlim=c(0,15))
}

means = c(3,4,5)
ips = make_gamma_ip(means)

if (interactive()) {
  purrr::map2(ips, means, ~ .x %>% dplyr::mutate(label = sprintf("Mean: %1.2f",.y))) %>%
    purrr::map( ~ plot_ip(.x, alpha=0.1) + ggplot2::facet_wrap(~label))
}
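As an illustrative sketch (not from the package documentation), the two discretisation strategies can be contrasted by the probability mass they place at or before time zero; the EpiEstim compatible mode forces this to be zero, while the CDF-based discretiser does not:

compat = make_gamma_ip(5, epiestim_compat = TRUE)
direct = make_gamma_ip(5, epiestim_compat = FALSE)
# average probability mass at tau <= 0 across bootstraps:
compat %>% dplyr::ungroup() %>%
  dplyr::summarise(p0 = sum(probability[tau <= 0]) / dplyr::n_distinct(boot))
direct %>% dplyr::ungroup() %>%
  dplyr::summarise(p0 = sum(probability[tau <= 0]) / dplyr::n_distinct(boot))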
The infectivity profile is typically fitted to data by MCMC as a gamma distribution. This function generates a discrete infectivity probability distribution representing the chance that an infectee was infected on any specific day after the infector was infected (given that the infectee was infected), from posterior samples.
make_posterior_ip( ..., mean, sd, shape, rate, scale, epiestim_compat = FALSE, n_boots = 100 )
... |
not used, must be empty |
mean |
a vector of gamma distribution means |
sd |
a vector of gamma distribution sds |
shape |
a vector of gamma distribution shape parameters |
rate |
a vector of gamma distribution rate parameters |
scale |
a vector of gamma distribution scale parameters |
epiestim_compat |
Use |
n_boots |
if there are more posterior samples than this limit then a maximum of n_boots samples are used |
If using EpiEstim and coarseDataTools::dic.fit.mcmc the output of the MCMC will be an S4 object with a samples slot, containing a dataframe with shape=var1 and scale=var2 columns. To use this output with make_posterior_ip invoke it like this:
do.call(make_posterior_ip, SI_fit_clever@samples %>% dplyr::rename(shape=var1, scale=var2))
N.b. only one combination of mean and sd, shape and rate, or shape and scale, is required.
a long format ip delay distribution
tmp = make_posterior_ip(
  mean = stats::rnorm(100,5,0.1),
  sd = stats::rnorm(100,1.5,0.1)
)
tmp %>% dplyr::glimpse()
if (interactive()) plot_ip(tmp)
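The same function can be driven directly from shape and scale posterior draws; a minimal sketch with purely illustrative parameter values:

tmp2 = make_posterior_ip(
  shape = stats::rnorm(100, 25, 1),   # hypothetical posterior draws
  scale = stats::rnorm(100, 0.2, 0.01)
)
tmp2 %>% dplyr::glimpse()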
Suits larger contact tracing data sets where there is a delay between two events which may or may not be precisely known.
make_resampled_ip( tau, min_tau = pmax(tau - 0.5, truncate), max_tau = tau + 0.5, add_noise = TRUE, truncate = 0, n_boots = 100, seed = Sys.time() )
tau |
the delay between first and second events |
min_tau |
the minimum delay for interval censored delays |
max_tau |
the maximum delay for interval censored delays |
add_noise |
adds noise to each data point for each replicate |
truncate |
what is the minimum realistic value of the parameter |
n_boots |
number of replicates to generate |
seed |
a random number seed for reproducibility |
a long format ip delay distribution
tau = rgamma2(100, 5, 2)
ip = make_resampled_ip(min_tau = tau-1, max_tau = tau+1, seed = 100)
if(interactive()) {
  plot_ip(ip, alpha=0.1)
}
max.Date returns an integer and -Inf for a set of NA dates. This is usually inconvenient.
max_date(x, ...)
x |
a vector of dates |
... |
ignored |
a date. "0001-01-01" if there is no well defined maximum.
max_date(NA)
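For comparison, a sketch of the base R behaviour that motivates this helper:

# base R: the maximum of a set of NA dates is -Inf, with a warning
suppressWarnings(max(as.Date(NA), na.rm = TRUE))
# max_date returns a well defined (if arbitrary) date instead
max_date(NA)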
min.Date returns an integer and Inf for a set of NA dates. This is usually inconvenient.
min_date(x, ...)
x |
a vector of dates |
... |
ignored |
a date. "9999-12-31" if there is no well defined minimum.
min_date(NA)
Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.
## S3 method for class 'time_period'
months(x, abbreviate = FALSE)
x |
an object inheriting from class time_period |
abbreviate |
logical vector (possibly recycled). Should the names be abbreviated? |
weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").

quarters returns a character vector of "Q1" to "Q4".

julian returns the number of days (possibly fractional) since the origin, with the origin as an "origin" attribute.
All time calculations in R are done ignoring leap-seconds.
Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.
DateTimeClasses, Date; Sys.getlocale("LC_TIME") crucially for months() and weekdays().
## first two are locale dependent:
weekdays(.leap.seconds)
months (.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6, etc !!)

##' Transform (Time-)Date vector to convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
  DF <- as.data.frame(unclass(as.POSIXlt( dt )))
  `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds)    # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler: Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date

## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE
## in the proleptic Julian calendar. More recently, in
## 'Terrestrial Time' which differs from UTC by a few seconds
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time
Takes a list of times, classes and counts, e.g. a COG-UK variant-like data set with time, (multinomial) class (e.g. variant) and count being the count in that time period. Fits a quadratic B-spline on time to the proportion of the data using nnet::multinom, with approximately one degree of freedom per class and per window units of the time series.
multinomial_nnet_model( d = i_multinomial_input, ..., window = 14, frequency = "1 day", predict = TRUE )
d |
multi-class count input |
... |
not used and present to allow the model to be used in a group_modify |
window |
a number of data points between knots; smaller values result in less smoothing, larger values in more. |
frequency |
the density of the output estimates. |
predict |
if TRUE the result is a prediction dataframe; if FALSE the fitted model is returned. |
a new dataframe with time (as a time period), class, and proportion.0.5 columns, or a model object
if (interactive()) {
  # not run due to long running
  tmp = ggoutbreak::england_covid %>%
    dplyr::filter(date > "2022-01-01") %>%
    ggoutbreak::multinomial_nnet_model(window=21) %>%
    dplyr::glimpse()
}
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_count( raw = i_incidence_data, pop = i_population_data, ..., population_unit = 1e+05, normalise_time = FALSE )
raw |
The count data A dataframe containing the following columns:
Any grouping allowed. |
pop |
The population data must be grouped in the same way as A dataframe containing the following columns:
Any grouping allowed. |
... |
not used |
population_unit |
What population unit do you want the count data normalised to e.g. per 100K |
normalise_time |
The default behaviour for normalising is to keep it in the same time units as the input data. If this parameter is set to TRUE, or to a time period such as "1 year", the result is converted to that time unit. |
a dataframe with incidence rates per unit capita. A dataframe containing the following columns:
population (positive_integer) - Size of population
count (positive_integer) - Positive case counts associated with the specified time frame
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
population_unit (double) - The population unit on which the per capita incidence rate is calculated
time_unit (lubridate::as.period) - The time period over which the per capita incidence rate is calculated
Any grouping allowed.
tmp = ggoutbreak::england_covid %>% ggoutbreak::normalise_count(ggoutbreak::england_demographics) %>% dplyr::glimpse()
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_incidence( modelled = i_incidence_model, pop = i_population_data, ..., population_unit = 1e+05, normalise_time = FALSE )
modelled |
Model output from processing the raw count data. A dataframe containing the following columns:
Any grouping allowed. |
pop |
The population data must be grouped in the same way as A dataframe containing the following columns:
Any grouping allowed. |
... |
not used |
population_unit |
what population unit do you want the incidence in e.g. per 100K |
normalise_time |
The default behaviour for incidence is to keep it in the same time units as the input data. If this parameter is set to TRUE, or to a time period such as "1 year", the result is converted to that time unit. |
a dataframe with incidence rates per unit capita. A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale
incidence.per_capita.se.fit (positive_double) - the standard error of the incidence per capita rate estimate on a log scale
incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)
incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)
incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)
population_unit (double) - The population unit on which the per capita incidence rate is calculated
time_unit (lubridate::as.period) - The time period over which the per capita incidence rate is calculated
Any grouping allowed.
tmp = ggoutbreak::england_covid %>% ggoutbreak::poisson_locfit_model(window=21) %>% ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>% dplyr::glimpse()
Density, distribution function, quantile function and random generation for the Beta distribution with parameters shape1 and shape2 (and optional non-centrality parameter ncp).
pbeta2(q, prob, kappa, lower.tail = TRUE, log.p = FALSE)
q |
vector of quantiles |
prob |
the mean probability (vectorised) |
kappa |
a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
pbeta2(c(0.25,0.5,0.75), 0.5, 0.25)
Density, distribution function, quantile function and random generation for the Gamma distribution with parameters shape and scale.
pgamma2( q, mean, sd = sqrt(mean), lower.tail = TRUE, log.p = FALSE, convex = TRUE )
q |
vector of quantiles |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
convex |
Show a warning if the distribution selected is not a convex function |
dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
pgamma2(seq(0,4,0.25), 2, 1)
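A minimal sketch of the mean/sd to shape/rate conversion implied by this parametrisation, using the standard gamma moment identities (that pgamma2 uses exactly this mapping internally is an assumption):

m = 2; s = 1
shape = (m/s)^2 # shape = mean^2 / variance
rate = m/s^2    # rate = mean / variance
all.equal(
  pgamma2(seq(0,4,0.25), m, s),
  stats::pgamma(seq(0,4,0.25), shape = shape, rate = rate)
)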
Density, distribution function, quantile function and random generation for the log normal distribution whose logarithm has mean equal to meanlog and standard deviation equal to sdlog.
plnorm2(q, mean = 1, sd = sqrt(exp(1) - 1), lower.tail = TRUE, log.p = FALSE)
q |
vector of quantiles |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
The log normal distribution has density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-(\log x - \mu)^2/(2\sigma^2)}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the logarithm. The mean is $E(X) = e^{\mu + \sigma^2/2}$, the median is $med(X) = e^{\mu}$, and the variance is $Var(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$; hence the coefficient of variation is $\sqrt{e^{\sigma^2} - 1}$, which is approximately $\sigma$ when that is small (e.g., $\sigma < 1/2$).
dlnorm gives the density, plnorm gives the distribution function, qlnorm gives the quantile function, and rlnorm generates random deviates.

The length of the result is determined by n for rlnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
The cumulative hazard $H(t) = -\log(1 - F(t))$ is -plnorm(t, r, lower = FALSE, log = TRUE).

dlnorm is calculated from the definition (in ‘Details’). [pqr]lnorm are based on the relationship to the normal. Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Distributions for other standard distributions, including dnorm for the normal distribution.
plnorm2(seq(0,4,0.25), 2, 1)
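A minimal sketch of how a true-scale mean and sd map onto meanlog and sdlog using the standard log normal moment identities (that plnorm2 uses exactly this mapping internally is an assumption):

m = 2; s = 1
sdlog = sqrt(log(1 + (s/m)^2)) # from CV^2 = exp(sdlog^2) - 1
meanlog = log(m) - sdlog^2/2   # from E(X) = exp(meanlog + sdlog^2/2)
all.equal(
  plnorm2(seq(0,4,0.25), m, s),
  stats::plnorm(seq(0,4,0.25), meanlog, sdlog)
)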
Plot raw case counts as a histogram
plot_cases( raw = i_timestamped, ..., mapping = .check_for_aes(raw, ..., class_aes = "fill"), events = i_events )
raw |
The raw case data either as a summarised count or as a line-list A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot object
# 50 random times:
tmp = tibble::tibble(
  time = as.time_period(sample.int(10,50,replace=TRUE), "1 day"),
  class = rep(c("one","two","three"), length.out=50)
)
if(interactive()) {
  plot_cases(tmp)
}
Plot a raw case count timeseries
plot_counts( raw = i_incidence_data, ..., mapping = .check_for_aes(raw, ...), events = i_events )
raw |
The raw count data, or the raw count data normalised by population (see normalise_count). A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot object
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count)) %>%
  # normalised by England population (56489700 people)
  normalise_count(pop=56489700, population_unit=1000, normalise_time=TRUE)

if(interactive()) {
  plot_counts(tmp, colour="blue", size=0.25)
}
Plot an incidence or proportion versus growth phase diagram
plot_growth_phase( modelled = i_timestamped, timepoints = NULL, duration = max(dplyr::count(modelled)$n), interval = 7, mapping = if (interfacer::is_col_present(modelled, class)) ggplot2::aes(colour = class) else ggplot2::aes(), cis = TRUE, ... )
modelled |
Either: A dataframe containing the following columns:
Any grouping allowed. OR: A dataframe containing the following columns:
Any grouping allowed. |
timepoints |
time points (as dates or time_period values) to mark on the plot |
duration |
the length of the growth rate phase trail |
interval |
the length of time between markers on the phase plot |
mapping |
a ggplot2::aes mapping |
cis |
logical; should the phases be marked with confidence intervals? |
... |
Named arguments passed on to
Named arguments passed on to
|
a ggplot
tmp = ggoutbreak::england_covid %>% time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))

# If the incidence is normalised by population
tmp2 = tmp %>% poisson_locfit_model() %>% normalise_incidence(tmp_pop)

timepoints = as.Date(c("Lockdown 1" = "2020-03-30", "Lockdown 2" = "2020-12-31"))

if(interactive()) {
  plot_growth_phase(tmp2, timepoints, duration=108)
}
Growth rate timeseries diagram
plot_growth_rate( modelled = i_timeseries, ..., mapping = .check_for_aes(modelled, ...), events = i_events )
modelled |
Either: A dataframe containing the following columns:
Any grouping allowed. OR: A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot
# example code
tmp = ggoutbreak::england_covid %>% time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))

# If the incidence is normalised by population
tmp2 = tmp %>% poisson_locfit_model() %>% normalise_incidence(tmp_pop)

if(interactive()) {
  plot_growth_rate(tmp2, colour="blue")
}

tmp3 = ggoutbreak::england_covid %>% proportion_locfit_model()

if(interactive()) {
  plot_growth_rate(tmp3)
}
Plot an incidence timeseries
plot_incidence( modelled = i_incidence_model, raw = i_incidence_data, ..., mapping = .check_for_aes(modelled, ...), events = i_events )
modelled |
An optional estimate of the incidence time series. If provided, a dataframe containing the following columns:
Any grouping allowed.
|
raw |
The raw count data A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot object
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count)) %>%
  normalise_count(56489700, population_unit=1000, normalise_time="1 year")

tmp2 = tmp %>% poisson_locfit_model()

if(interactive()) {
  plot_incidence(tmp2, tmp, colour="blue", size=0.25)
}
Plot an infectivity profile
plot_ip(ip = i_empirical_ip, alpha = 0.1, ...)
ip |
A long format infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
alpha |
the alpha value of the bootstrap lines |
... |
passed on to geom_segment for controlling line thickness, alpha etc. |
a ggplot object
if(interactive()) { plot_ip(ganyani_ip,alpha=0.1) }
Plot a multinomial proportions model
plot_multinomial( modelled = i_multinomial_proportion_model, ..., mapping = ggplot2::aes(fill = class), events = i_events, normalise = FALSE )
modelled |
the multinomial count data A dataframe containing the following columns:
Must be grouped by: class (exactly). |
... |
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
normalise |
make sure the probabilities add up to one - this can be a bad idea if you know you may have missing values. |
a ggplot
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()

if(interactive()) {
  plot_multinomial(tmp, normalise=TRUE) +
    ggplot2::scale_fill_viridis_d()
}
Plot a proportions timeseries
plot_prevalence( modelled = i_prevalence_model, raw = i_proportion_data, ..., mapping = .check_for_aes(modelled, ...), events = i_events )
modelled |
Proportion model estimates A dataframe containing the following columns:
Any grouping allowed. |
raw |
Raw count data A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot object
if(interactive()) { plot_prevalence( ggoutbreak::england_ons_infection_survey, mapping = ggplot2::aes(colour=geography)) }
Plot a proportions timeseries
plot_proportion( modelled = i_proportion_model, raw = i_proportion_data, ..., mapping = .check_for_aes(modelled, ...), events = i_events )
modelled |
Proportion model estimates A dataframe containing the following columns:
Any grouping allowed. |
raw |
Raw count data A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot object
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()

if(interactive()) {
  plot_proportion(tmp) +
    ggplot2::scale_fill_viridis_d(aesthetics = c("fill","colour"))
}
Plot a raw case count proportion timeseries
plot_proportions( raw = i_proportion_data, ..., mapping = .check_for_aes(raw, ...), events = i_events )
raw |
The raw count and denominator data A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot object
# example code
tmp = tibble::tibble(
  time = as.time_period(1:10, "1 day"),
  count = 101:110
) %>% dplyr::mutate(
  denom = count*time
)

if(interactive()) {
  plot_proportions(tmp) + ggplot2::geom_line()
}
Reproduction number timeseries diagram
plot_rt( modelled = i_reproduction_number, ..., mapping = .check_for_aes(modelled, ...), events = i_events )
modelled |
the modelled Rt estimate A dataframe containing the following columns:
Any grouping allowed. |
... |
Named arguments passed on to
|
mapping |
a ggplot2::aes mapping |
events |
Significant events or time spans A dataframe containing the following columns:
Any grouping allowed. A default value is defined. |
a ggplot timeseries
# example code
tmp = ggoutbreak::england_covid %>% time_aggregate(count=sum(count))

if (interactive()) {
  tmp2 = tmp %>% poisson_locfit_model() %>% rt_from_growth_rate()
  # comparing Rt from growth rates with England consensus Rt:
  plot_rt(tmp2, colour="blue") +
    ggplot2::geom_errorbar(
      data=england_consensus_rt,
      mapping=ggplot2::aes(x=date-21, ymin=low, ymax=high),
      colour="red")
}
Density, distribution function, quantile function and random generation for the negative binomial distribution with parameters size and prob.
pnbinom2(q, mean, sd = sqrt(mean), lower.tail = TRUE, log.p = FALSE)
q |
vector of quantiles. |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
The negative binomial distribution with size $= n$ and prob $= p$ has density

$$p(x) = \frac{\Gamma(x+n)}{\Gamma(n)\, x!}\, p^n (1-p)^x$$

for $x = 0, 1, 2, \ldots$, $n > 0$ and $0 < p \le 1$.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is $\mu = n(1-p)/p$ and the variance is $n(1-p)/p^2$.
A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)
An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.
If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.
The quantile is defined as the smallest value $x$ such that $F(x) \ge p$, where $F$ is the distribution function.
dnbinom gives the density, pnbinom gives the distribution function, qnbinom gives the quantile function, and rnbinom generates random deviates.

Invalid size or prob will result in return value NaN, with a warning.

The length of the result is determined by n for rnbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
rnbinom returns a vector of type integer unless generated values exceed the maximum representable integer, when double values are returned.
dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom). pnbinom uses pbeta. qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search. rnbinom uses the derivation as a gamma mixture of Poisson distributions, see
Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.
Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.
pnbinom2(0:5, 2, sqrt(2))
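A minimal sketch of the mean/sd to (size, prob) mapping implied by the ecological parametrisation above (an assumption about the internals; it requires sd^2 > mean):

m = 2; s = 2
size = m^2 / (s^2 - m)    # from variance = mean + mean^2/size
prob = size / (size + m)
all.equal(
  pnbinom2(0:5, m, s),
  stats::pnbinom(0:5, size = size, prob = prob)
)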
This uses a generalised linear model to fit a quasi-Poisson model with a time varying rate as a natural cubic spline with approximately one degree of freedom per window units of the time series.
poisson_glm_model(d = i_incidence_input, ..., window = 14, frequency = "1 day")
d |
Count model input A dataframe containing the following columns:
Ungrouped. |
... |
not used and present to allow the model to be used in a group_modify |
window |
a number of data points defining the bandwidth of the estimate; smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided on a daily frequency; with weekly data a lower value may be preferred. (defaults to 14) |
frequency |
the density of the output estimates as a time period such as "1 day" or "1 week". (defaults to "1 day") |
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.fit (double) - an estimate of the incidence rate on a log scale
incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale
incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)
incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)
incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)
Any grouping allowed.
ggoutbreak::england_covid %>% dplyr::filter(date < "2021-01-01") %>% time_aggregate(count=sum(count)) %>% ggoutbreak::poisson_glm_model(window=21) %>% dplyr::glimpse()
Takes a list of times and counts, and fits a quasi-Poisson model with a log link function to the count data, using local regression from the package locfit.
poisson_locfit_model( d = i_incidence_input, ..., window = 14, deg = 1, frequency = "1 day", predict = TRUE )
d |
input data A dataframe containing the following columns:
Ungrouped. |
... |
not used and present to allow the model to be used in a group_modify |
window |
a number of data points defining the bandwidth of the estimate; smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided on a daily frequency; with weekly data a lower value may be preferred. (defaults to 14) |
deg |
polynomial degree (min 1); a higher degree results in less smoothing, lower values in more smoothing. A degree of 1 fits a linear model piecewise. (defaults to 1) |
frequency |
the density of the output estimates as a time period such as "1 day" or "1 week". (defaults to "1 day") |
predict |
if TRUE the result is a prediction dataframe; if FALSE we return the fitted model object. |
This results in an incidence rate estimate plus an absolute exponential growth rate estimate, both based on the time unit of the input data (e.g. for daily data the rate will be cases per day and the growth rate will be daily).
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.fit (double) - an estimate of the incidence rate on a log scale
incidence.se.fit (positive_double) - the standard error of the incidence rate estimate on a log scale
incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)
incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)
incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)
growth.fit (double) - an estimate of the growth rate
growth.se.fit (positive_double) - the standard error the growth rate
growth.0.025 (double) - lower confidence limit of the growth rate
growth.0.5 (double) - median estimate of the growth rate
growth.0.975 (double) - upper confidence limit of the growth rate
Any grouping allowed.
withr::with_options(list("ggoutbreak.keep_cdf"=TRUE),{ ggoutbreak::england_covid %>% ggoutbreak::poisson_locfit_model(window=21) %>% dplyr::glimpse() })
This uses a generalised linear model to fit a quasi-binomial model with a time varying rate as a natural cubic spline with approximately one degree of freedom per window units of the time series.
proportion_glm_model( d = i_proportion_input, ..., window = 14, frequency = "1 day" )
d |
Proportion model input A dataframe containing the following columns:
Ungrouped. |
... |
not used and present to allow the model to be used in a group_modify |
window |
a number of data points defining the bandwidth of the estimate; smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided on a daily frequency; with weekly data a lower value may be preferred. (defaults to 14) |
frequency |
the density of the output estimates as a time period such as "1 day" or "1 week". (defaults to "1 day") |
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (positive_double) - the standard error of proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
Any grouping allowed.
ggoutbreak::england_covid_proportion %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate() %>%
  ggoutbreak::proportion_glm_model(window=5) %>%
  dplyr::glimpse()

# TODO: deal with error conditions
# "observations with zero weight not used for calculating dispersion"
Takes a list of times, counts and a denominator, and fits a quasi-binomial model with a logit link function to the proportion data, using local regression from the package locfit.
proportion_locfit_model( d = i_proportion_input, ..., window = 14, deg = 1, frequency = "1 day", predict = TRUE )
d |
the input A dataframe containing the following columns:
Ungrouped. |
... |
not used and present to allow the model to be used in a group_modify |
window |
a number of data points defining the bandwidth of the estimate; smaller values result in less smoothing, larger values in more. The default value of 14 is calibrated for data provided on a daily frequency; with weekly data a lower value may be preferred. (defaults to 14) |
deg |
polynomial degree (min 1); a higher degree results in less smoothing, lower values in more smoothing. A degree of 1 fits a linear model piecewise. (defaults to 1) |
frequency |
the density of the output estimates as a time period such as "1 day" or "1 week". (defaults to "1 day") |
predict |
if TRUE the result is a prediction dataframe; if FALSE we return the fitted model object. |
This expects d to contain time, count and denom columns, where denom is the denominator, e.g. all tests conducted.
This results in a one-versus-others binomial proportion estimate, plus a relative growth rate estimate specifying how much more quickly this subgroup is growing compared to the growth of the denominator.
The denominator may be the sum of all subgroups, denom = sum(count), e.g. in the situation where there are multiple variants of a disease circulating, in which case the relative growth is that of the subgroup compared to the overall. You can make this a one-versus-others comparison by making the denominator exclude the current item (e.g. denom = sum(count)-count).
The denominator can also be used to express the size of the population tested. This gives us a relative growth rate that is different in essence from the previous one, and may be a better estimate of the true growth rate in situations where testing effort is variable or capacity is saturated.
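A minimal sketch of constructing the two denominator choices described above, assuming a long format dataframe with time, class and count columns such as ggoutbreak::england_covid:

d = ggoutbreak::england_covid %>%
  dplyr::ungroup() %>%
  dplyr::group_by(time) %>%
  dplyr::mutate(
    denom_overall = sum(count),        # subgroup versus the whole
    denom_others = sum(count) - count  # one-versus-others comparison
  ) %>%
  dplyr::ungroup()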
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (positive_double) - the standard error of proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
relative.growth.fit (double) - an estimate of the relative growth rate
relative.growth.se.fit (positive_double) - the standard error the relative growth rate
relative.growth.0.025 (double) - lower confidence limit of the relative growth rate
relative.growth.0.5 (double) - median estimate of the relative growth rate
relative.growth.0.975 (double) - upper confidence limit of the relative growth rate
Any grouping allowed.
ggoutbreak::england_covid_proportion %>%
  time_aggregate() %>%
  proportion_locfit_model(window=5, deg=2) %>%
  dplyr::glimpse()
The wedge distribution has a domain of 0 to 1 and has a linear probability density function over that domain.
pwedge(q, a, log.p = FALSE)
q |
vector of quantiles |
a |
a gradient from -2 (left skewed) to 2 (right skewed) |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
The rwedge function can be combined with quantile functions to skew standard distributions, to introduce correlation, or to down-weight certain parts of the distribution.
a vector of probabilities, quantiles, densities or samples.
pwedge(seq(0,1,0.1), a=1)
dwedge(seq(0,1,0.1), a=1)
qwedge(c(0.25,0.5,0.75), a=-1)
stats::cor(
  stats::qnorm(rwedge(1000, a=2)),
  stats::qnorm(rwedge(1000, a=-2))
)
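A quick sketch checking that the linear density integrates to one over [0, 1], and that pwedge is its integral (a sanity check, not from the package documentation):

stats::integrate(function(x) dwedge(x, a=1), 0, 1) # ~ 1
all.equal(
  pwedge(0.7, a=1),
  stats::integrate(function(x) dwedge(x, a=1), 0, 0.7)$value
)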
Density, distribution function, quantile function and random generation for the Beta distribution with parameters shape1 and shape2 (and optional non-centrality parameter ncp).
qbeta2(p, prob, kappa, lower.tail = TRUE, log.p = FALSE)
p |
vector of probabilities |
prob |
the mean probability (vectorised) |
kappa |
a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
dbeta gives the density, pbeta the distribution function, qbeta the quantile function, and rbeta generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rbeta, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
qbeta2(c(0.25,0.5,0.75), 0.5, 0.25)
Density, distribution function, quantile function and random generation for the Gamma distribution with parameters shape and scale.
qgamma2( p, mean, sd = sqrt(mean), lower.tail = TRUE, log.p = FALSE, convex = TRUE )
p |
vector of probabilities |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
convex |
Show a warning if the distribution selected is not a convex function |
dgamma gives the density, pgamma gives the distribution function, qgamma gives the quantile function, and rgamma generates random deviates.

Invalid arguments will result in return value NaN, with a warning.

The length of the result is determined by n for rgamma, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
qgamma2(c(0.25,0.5,0.75), 2, 1)
Density, distribution function, quantile function and random generation for the log normal distribution whose logarithm has mean equal to meanlog and standard deviation equal to sdlog.
qlnorm2(p, mean = 1, sd = sqrt(exp(1) - 1), lower.tail = TRUE, log.p = FALSE)
p |
vector of probabilities. |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
The log normal distribution has density

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-(\log x - \mu)^2/(2\sigma^2)}$$

where $\mu$ and $\sigma$ are the mean and standard deviation of the logarithm. The mean is $E(X) = e^{\mu + \sigma^2/2}$, the median is $med(X) = e^{\mu}$, and the variance is $Var(X) = e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$; hence the coefficient of variation is $\sqrt{e^{\sigma^2} - 1}$, which is approximately $\sigma$ when that is small (e.g., $\sigma < 1/2$).
dlnorm gives the density, plnorm gives the distribution function, qlnorm gives the quantile function, and rlnorm generates random deviates.

The length of the result is determined by n for rlnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
The cumulative hazard $H(t) = -\log(1 - F(t))$ is -plnorm(t, r, lower = FALSE, log = TRUE).

dlnorm is calculated from the definition (in ‘Details’). [pqr]lnorm are based on the relationship to the normal. Consequently, they model a single point mass at exp(meanlog) for the boundary case sdlog = 0.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Distributions for other standard distributions, including dnorm for the normal distribution.
qlnorm2(c(0.25,0.5,0.75), 2, 1)
Density, distribution function, quantile function and random generation for the negative binomial distribution with parameters size and prob.
qnbinom2(p, mean, sd = sqrt(mean), lower.tail = TRUE, log.p = FALSE)
p |
vector of probabilities. |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
lower.tail |
logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
The negative binomial distribution with size $= n$ and prob $= p$ has density

$$p(x) = \frac{\Gamma(x+n)}{\Gamma(n)\, x!}\, p^n (1-p)^x$$

for $x = 0, 1, 2, \ldots$, $n > 0$ and $0 < p \le 1$.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached. The mean is $\mu = n(1-p)/p$ and the variance is $n(1-p)/p^2$.
A negative binomial distribution can also arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.)
An alternative parametrization (often used in ecology) is by the mean mu (see above), and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization.
If an element of x is not integer, the result of dnbinom is zero, with a warning.

The case size == 0 is the distribution concentrated at zero. This is the limiting distribution for size approaching zero, even if mu rather than prob is held constant. Notice though, that the mean of the limit distribution is 0, whatever the value of mu.
The quantile is defined as the smallest value $x$ such that $F(x) \ge p$, where $F$ is the distribution function.
dnbinom gives the density, pnbinom gives the distribution function, qnbinom gives the quantile function, and rnbinom generates random deviates.

Invalid size or prob will result in return value NaN, with a warning.

The length of the result is determined by n for rnbinom, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.
rnbinom returns a vector of type integer unless generated values exceed the maximum representable integer, when double values are returned.
dnbinom computes via binomial probabilities, using code contributed by Catherine Loader (see dbinom). pnbinom uses pbeta. qnbinom uses the Cornish–Fisher Expansion to include a skewness correction to a normal approximation, followed by a search. rnbinom uses the derivation as a gamma mixture of Poisson distributions, see
Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.
Distributions for standard distributions, including dbinom for the binomial, dpois for the Poisson and dgeom for the geometric distribution, which is a special case of the negative binomial.
qnbinom2(c(0.25,0.5,0.75), 5, sqrt(5))
A specific parameter or set of parameters can be estimated by a pipeline. This function applies the pipeline to a synthetic epidemic with sawtooth incidence resulting from a stepped growth rate function. The lag between synthetic input and estimate is assessed by minimising the root mean square error between input and estimate over a range of lag offsets.
quantify_lag(pipeline, ip = i_empirical_ip, lags = -10:30)
pipeline |
a function taking an input dataset and an infectivity profile as inputs and producing an estimate as output. This is the whole parametrised pipeline including any other inputs. This can be a purrr style lambda, as in the examples. |
ip |
the infectivity profile. A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
lags |
a vector with the delays to test. Defaults to -10 to +30 days |
a lag analysis dataframe containing the estimate
type and the lag
in days that the estimate is behind the actual observation
pipeline = ~ .x %>% poisson_locfit_model() %>% rt_from_incidence(ip = .y)
lag_analysis = quantify_lag(pipeline)
quantify_lag(~ rt_epiestim(.x, ip = .y))
Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.
## S3 method for class 'time_period' quarters(x, ...)
x |
an object inheriting from class |
... |
arguments for other methods. |
weekdays
and months
return a character
vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME")
.
quarters
returns a character vector of "Q1"
to
"Q4"
.
julian
returns the number of days (possibly fractional)
since the origin, with the origin as a "origin"
attribute.
All time calculations in R are done ignoring leap-seconds.
Other components such as the day of the month or the year are
very easy to compute: just use as.POSIXlt
and extract
the relevant component. Alternatively (especially if the components
are desired as character strings), use strftime
.
DateTimeClasses
, Date
;
Sys.getlocale("LC_TIME")
crucially for months()
and weekdays()
.
## first two are locale dependent:
weekdays(.leap.seconds)
months(.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6, etc !!)

##' Transform (Time-)Date vector to convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
  DF <- as.data.frame(unclass(as.POSIXlt(dt)))
  `names<-`(cbind(dt, DF, deparse.level = 0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds) # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler: Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by = 1, length.out = 30)) # has R 1.0.0's release date

## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BCE
## in the proleptic Julian calendar. More recently, in
## 'Terrestrial Time' which differs from UTC by a few seconds.
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time
The wedge distribution has a domain of 0 to 1 and a linear probability density function over that domain.
qwedge(p, a, lower.tail = TRUE, log.p = FALSE)
p |
vector of probabilities |
a |
a gradient from -2 (left skewed) to 2 (right skewed) |
lower.tail |
logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x]. |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
The rwedge function can be combined with quantile functions to skew standard
distributions, or to introduce correlation or down-weight certain parts of
the distribution.
a vector of probabilities, quantiles, densities or samples.
pwedge(seq(0, 1, 0.1), a = 1)
dwedge(seq(0, 1, 0.1), a = 1)
qwedge(c(0.25, 0.5, 0.75), a = -1)
stats::cor(
  stats::qnorm(rwedge(1000, a = 2)),
  stats::qnorm(rwedge(1000, a = -2))
)
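A linear density on 0 to 1 with gradient a is fully determined by the requirement that it integrates to 1. A minimal sketch of the implied density and distribution functions (the closed form is an assumption consistent with the documented gradient range, not the package source):

# f(x) = 1 + a*(x - 1/2) on [0, 1]: integrates to 1 and is non-negative
# for -2 <= a <= 2; the CDF follows by integration
dwedge_sketch = function(x, a) ifelse(x >= 0 & x <= 1, 1 + a * (x - 0.5), 0)
pwedge_sketch = function(q, a) {
  q = pmin(pmax(q, 0), 1) # clamp to the support
  (a / 2) * q^2 + (1 - a / 2) * q
}
# compare against the package implementation:
all.equal(dwedge_sketch(seq(0, 1, 0.1), a = 1), dwedge(seq(0, 1, 0.1), a = 1))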
A random Bernoulli sample as a logical value
rbern(n, prob)
n |
number of observations |
prob |
the mean probability (vectorised) |
a vector of logical values of size n
table(rbern(100, 0.25))
Density, distribution function, quantile function and random
generation for the Beta distribution with parameters shape1
and
shape2
(and optional non-centrality parameter ncp
).
rbeta2(n, prob, kappa)
n |
number of observations |
prob |
the mean probability (vectorised) |
kappa |
a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised) |
dbeta
gives the density, pbeta
the distribution
function, qbeta
the quantile function, and rbeta
generates random deviates.
Invalid arguments will result in return value NaN
, with a warning.
The length of the result is determined by n
for
rbeta
, and is the maximum of the lengths of the
numerical arguments for the other functions.
The numerical arguments other than n
are recycled to the
length of the result. Only the first elements of the logical
arguments are used.
rbeta2(3, c(0.1,0.5,0.9),0.1)
Sampling from the multinomial equivalent of the Bernoulli distribution
rcategorical(n, prob, factor = FALSE)
n |
a sample size |
prob |
a (optionally named) vector of probabilities that will be normalised to sum to 1 |
factor |
if not FALSE then factor levels will either be taken from the
names of |
a vector of random class labels of length n
. Labels come from names
of prob
or from a character vector in factor
.
prob = c("one"=0.1,"two"=0.2,"seven"=0.7) table(rcategorical(1000,prob)) rcategorical(10,prob,factor=TRUE) rcategorical(10,rep(1,26),factor=letters)
prob = c("one"=0.1,"two"=0.2,"seven"=0.7) table(rcategorical(1000,prob)) rcategorical(10,prob,factor=TRUE) rcategorical(10,rep(1,26),factor=letters)
For count data that is under-dispersed we can use a gamma distribution rounded
to the nearest whole number. This is the same as the method of discretisation
in make_gamma_ip
, and suits delay distributions where there is less variability
than can be represented with a Poisson or negative binomial distribution.
rdiscgamma(n, mean, sd, kappa)
n |
number of observations |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
kappa |
a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised) |
an integer valued vector from a gamma distribution.
rdiscgamma(10, 2, 1)
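As an illustrative check (not package code), a rounded gamma sample with sd below sqrt(mean) is under-dispersed relative to a Poisson sample with the same mean:

set.seed(1)
x_gamma = rdiscgamma(10000, mean = 5, sd = 1)
x_pois = stats::rpois(10000, lambda = 5)
# the rounded gamma variance stays near sd^2 = 1, well below the
# Poisson variance of ~5
c(var_discgamma = stats::var(x_gamma), var_poisson = stats::var(x_pois))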
Reband discrete data, e.g. an age banded population, or a discrete probability distribution, e.g. a serial interval distribution. This method fits a monotonically increasing spline to the cumulative distribution (including the upper and lower limits) and interpolates using that spline at the new cut points.
reband_discrete(
  x, y, xout,
  xlim = c(0, NA),
  ytotal = c(0, sum(y)),
  digits = 0,
  labelling = c("positive_integer", "inclusive", "exclusive"),
  sep = "-"
)
x |
a set of upper limits of bands, e.g. for age: 0-14;15-64;65-79;80+ is 15,65,80,NA |
y |
a set of quantities for each band e.g. population figures |
xout |
a set of new upper limits |
xlim |
Upper and lower limits for x. If the last band is e.g. 80+ in the input and we want to know the 85+ band in the output, some kind of maximum upper limit is needed to interpolate to. |
ytotal |
upper and lower limits for y. If the interpolation values fall outside of
x then the minimum and maximum limits of y are given by this. This would be |
digits |
if the |
labelling |
are the |
sep |
separator for names e.g. |
a re-banded set of discrete values, guaranteed to sum to the same as y
ul = stringr::str_extract(england_demographics$class, "_([0-9]+)", group = 1) %>%
  as.numeric()
tmp = reband_discrete(
  ul, england_demographics$population,
  c(5, 10, 15, 40, 80), xlim = c(0, 120))
tmp
sum(tmp)
sum(england_demographics$population)
Re-parametrised distributions
n |
number of observations |
x |
vector of quantiles |
q |
vector of quantiles |
p |
vector of probabilities |
log |
logical; if TRUE, probabilities p are given as log(p). |
log.p |
logical; if TRUE, probabilities p are given as log(p). |
lower.tail |
logical; if TRUE (default), probabilities are P[X ≤ x], otherwise P[X > x]. |
convex |
Show a warning if the distribution selected is not a convex function |
prob |
the mean probability (vectorised) |
kappa |
a coefficient of variation, where 0 is no variability and 1 is maximal variability (vectorised) |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
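Since the r*2 functions are parametrised by moments on the natural scale, sample statistics should recover the inputs. An illustrative consistency check (assuming the signatures documented below):

set.seed(42)
sapply(
  list(
    gamma = rgamma2(1e5, mean = 2, sd = 1),
    lnorm = rlnorm2(1e5, mean = 2, sd = 1),
    nbinom = rnbinom2(1e5, mean = 5, sd = 3)
  ),
  function(x) c(mean = mean(x), sd = stats::sd(x))
)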
Sometimes we may have, for example, modelled incidence or growth rates on weekly data, resulting in cases per week and growth rate per week. We may wish to use this to estimate the reproduction number using algorithms that assume a daily incidence. Not everything has a dependence on time: quantities such as proportions or prevalence will not change.
rescale_model(df = i_timeseries, time_unit)
df |
A data frame containing modelled output. This will modify the following columns if present: A dataframe containing the following columns:
Any grouping allowed. A dataframe containing the following columns:
Any grouping allowed. A dataframe containing the following columns:
Any grouping allowed. |
time_unit |
a |
the same time series with different time unit, and adjusted incidence and growth rate figures.
sim = sim_poisson_model(time_unit = "1 week")
incidence = sim %>% poisson_locfit_model(frequency = "1 day", deg = 2, window = 5)
incidence2 = incidence %>% rescale_model(time_unit = "1 day")
incidence2 %>% dplyr::glimpse()
Density, distribution function, quantile function and random
generation for the Gamma distribution with parameters shape
and
scale
.
rgamma2(n, mean, sd = sqrt(mean), convex = TRUE)
n |
number of observations |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
convex |
Show a warning if the distribution selected is not a convex function |
dgamma
gives the density,
pgamma
gives the distribution function,
qgamma
gives the quantile function, and
rgamma
generates random deviates.
Invalid arguments will result in return value NaN
, with a warning.
The length of the result is determined by n
for
rgamma
, and is the maximum of the lengths of the
numerical arguments for the other functions.
The numerical arguments other than n
are recycled to the
length of the result. Only the first elements of the logical
arguments are used.
rgamma2(10, 2, 1)
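A minimal sketch of the presumed moment matching, assuming rgamma2 simply re-parametrises stats::rgamma (the helper name and mapping are illustrative, not the package source):

# shape = (mean/sd)^2 and rate = mean/sd^2 give exactly the requested
# mean (shape/rate) and variance (shape/rate^2 = sd^2)
rgamma2_sketch = function(n, mean, sd = sqrt(mean)) {
  stats::rgamma(n, shape = (mean / sd)^2, rate = mean / sd^2)
}
set.seed(1)
x = rgamma2_sketch(1e5, 2, 1)
c(mean = mean(x), sd = stats::sd(x))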
Density, distribution function, quantile function and random
generation for the log normal distribution whose logarithm has mean
equal to meanlog
and standard deviation equal to sdlog
.
rlnorm2(n, mean = 1, sd = sqrt(exp(1) - 1))
n |
number of observations. If |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
The log normal distribution has density

f(x) = 1 / (√(2π) σ x) · exp(−(log(x) − μ)² / (2σ²))

where μ and σ are the mean and standard deviation of the logarithm.
The mean is E(X) = exp(μ + σ²/2), the median is med(X) = exp(μ), and the
variance is Var(X) = exp(2μ + σ²)(exp(σ²) − 1), and hence the coefficient
of variation is √(exp(σ²) − 1), which is approximately σ when that is
small (e.g., σ < 1/2).
dlnorm
gives the density,
plnorm
gives the distribution function,
qlnorm
gives the quantile function, and
rlnorm
generates random deviates.
The length of the result is determined by n
for
rlnorm
, and is the maximum of the lengths of the
numerical arguments for the other functions.
The numerical arguments other than n
are recycled to the
length of the result. Only the first elements of the logical
arguments are used.
The cumulative hazard H(t) = −log(1 − F(t)) is
-plnorm(t, r, lower = FALSE, log = TRUE).
dlnorm
is calculated from the definition (in ‘Details’).
[pqr]lnorm
are based on the relationship to the normal.
Consequently, they model a single point mass at exp(meanlog)
for the boundary case sdlog = 0
.
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.
Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 14. Wiley, New York.
Distributions for other standard distributions, including
dnorm
for the normal distribution.
rlnorm2(10, 2, 1)
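A minimal sketch of the presumed true-scale conversion, assuming rlnorm2 re-parametrises stats::rlnorm by matching moments (illustrative, not the package source):

# sdlog^2 = log(1 + sd^2/mean^2) and meanlog = log(mean) - sdlog^2/2
# recover the requested mean and sd on the natural scale; the documented
# default sd = sqrt(exp(1) - 1) corresponds to sdlog = 1 when mean = 1
rlnorm2_sketch = function(n, mean = 1, sd = sqrt(exp(1) - 1)) {
  sdlog = sqrt(log(1 + sd^2 / mean^2))
  stats::rlnorm(n, meanlog = log(mean) - sdlog^2 / 2, sdlog = sdlog)
}
set.seed(1)
x = rlnorm2_sketch(1e5, 2, 1)
c(mean = mean(x), sd = stats::sd(x))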
Density, distribution function, quantile function and random
generation for the negative binomial distribution with parameters
size
and prob
.
rnbinom2(n, mean, sd = sqrt(mean))
n |
number of observations. If |
mean |
the mean value on the true scale (vectorised) |
sd |
the standard deviation on the true scale (vectorised) |
The negative binomial distribution with size = n and prob = p has density

p(x) = Γ(x+n) / (Γ(n) x!) · p^n · (1-p)^x

for x = 0, 1, 2, …, n > 0 and 0 < p ≤ 1.

This represents the number of failures which occur in a sequence of
Bernoulli trials before a target number of successes is reached.
The mean is μ = n(1-p)/p and variance n(1-p)/p².

A negative binomial distribution can also arise as a mixture of
Poisson distributions with mean distributed as a gamma distribution
(see pgamma) with scale parameter (1 - prob)/prob
and shape parameter size. (This definition allows non-integer
values of size.)
An alternative parametrization (often used in ecology) is by the
mean mu
(see above), and size
, the dispersion
parameter, where prob
= size/(size+mu)
. The variance
is mu + mu^2/size
in this parametrization.
If an element of x
is not integer, the result of dnbinom
is zero, with a warning.
The case size == 0
is the distribution concentrated at zero.
This is the limiting distribution for size
approaching zero,
even if mu
rather than prob
is held constant. Notice
though, that the mean of the limit distribution is 0, whatever the
value of mu
.
The quantile is defined as the smallest value x such that F(x) ≥ p, where
F is the distribution function.
dnbinom
gives the density,
pnbinom
gives the distribution function,
qnbinom
gives the quantile function, and
rnbinom
generates random deviates.
Invalid size
or prob
will result in return value
NaN
, with a warning.
The length of the result is determined by n
for
rnbinom
, and is the maximum of the lengths of the
numerical arguments for the other functions.
The numerical arguments other than n
are recycled to the
length of the result. Only the first elements of the logical
arguments are used.
rnbinom
returns a vector of type integer unless generated
values exceed the maximum representable integer when double
values are returned.
dnbinom
computes via binomial probabilities, using code
contributed by Catherine Loader (see dbinom
).
pnbinom
uses pbeta
.
qnbinom
uses the Cornish–Fisher Expansion to include a skewness
correction to a normal approximation, followed by a search.
rnbinom
uses the derivation as a gamma mixture of Poisson
distributions, see
Devroye, L. (1986) Non-Uniform Random Variate Generation. Springer-Verlag, New York. Page 480.
Distributions for standard distributions, including
dbinom
for the binomial, dpois
for the
Poisson and dgeom
for the geometric distribution, which
is a special case of the negative binomial.
rnbinom2(10, 5, sqrt(5))
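A minimal sketch of the presumed size/prob conversion (illustrative only; the package's rnbinom2 evidently also accepts the Poisson boundary sd = sqrt(mean), which this sketch rejects):

# matching mean = size*(1-prob)/prob and variance = mean/prob gives
# prob = mean/sd^2 and size = mean^2/(sd^2 - mean), requiring sd^2 > mean
rnbinom2_sketch = function(n, mean, sd) {
  if (any(sd^2 <= mean)) stop("sd^2 must exceed mean for a negative binomial")
  stats::rnbinom(n, size = mean^2 / (sd^2 - mean), prob = mean / sd^2)
}
set.seed(1)
x = rnbinom2_sketch(1e5, 5, 3)
c(mean = mean(x), sd = stats::sd(x))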
Calculate a reproduction number estimate from incidence data using a reimplementation
of the Cori method and an empirical generation time distribution. This uses
a mixture distribution to transmit uncertainty in generation time estimates.
A number of changes in the implementation are made. Firstly there is no technical
limitation to the infectivity profile being strictly positive in time.
Secondly this implementation should tolerate missing count values (NA values
must be filtered out though). Thirdly for a given time point t
this applies
all Rt
estimates for which the window spans time point t
rather than end
on time point t
, and fourthly this implementation allows multiple windows
widths to be calculated in parallel and aggregated. All of this tends to
increase uncertainty in the result particularly in the time dimension, which
addresses some of the issue seem with EpiEstim
during the pandemic. Finally
it is quite a bit quicker, especially if approximate quantiles are all that
are needed.
rt_cori(
  df = i_incidence_input,
  ip = i_discrete_ip,
  window = 14,
  mean_prior = 1,
  std_prior = 2,
  ...,
  epiestim_compat = FALSE,
  approx = FALSE
)
df |
The count data. Extra groups are allowed. A dataframe containing the following columns:
Ungrouped. |
ip |
A long format infectivity profile. A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
window |
|
mean_prior |
the prior for the $R_t$ estimate. When sample size is low the
$R_t$ estimate will revert to this prior. In |
std_prior |
the prior for the $R_t$ SD. |
... |
not used |
epiestim_compat |
produce an estimate of |
approx |
approximate the quantiles of the mixture distribution with a gamma distribution with the same first mean and SD. |
This will calculate a reproduction number for each group in the input dataframe.
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (positive_double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
Any grouping allowed.
#TODO: speed up example
tmp = ggoutbreak::england_covid %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate(count = sum(count))

tmp2 = tmp %>% rt_cori(epiestim_compat = TRUE)
tmp3 = tmp %>% rt_cori(window = c(5:14), approx = TRUE)

comp = dplyr::bind_rows(
  tmp2 %>% dplyr::mutate(class = "EpiEstim"),
  tmp3 %>% dplyr::mutate(class = "Cori+")
) %>% dplyr::group_by(class)

if (interactive()) {
  plot_rt(comp) +
    ggplot2::geom_errorbar(
      data = england_consensus_rt %>% dplyr::filter(date < "2021-01-01"),
      mapping = ggplot2::aes(x = date, ymin = low, ymax = high),
      colour = "black") +
    ggplot2::coord_cartesian(ylim = c(0.5, 1.75))
}
EpiEstim reproduction number

Calculate a reproduction number estimate from incidence data using the
EpiEstim
library and an empirical generation time distribution. This uses
resampling to transmit uncertainty in generation time estimates. This is
quite slow for each time series depending on the number of bootstraps and
samples in the infectivity profile.
rt_epiestim( df = i_incidence_input, ip = i_discrete_ip, bootstraps = 2000, window = 14, mean_prior = 1, std_prior = 2, ... )
df |
Count data. Extra groups are allowed. A dataframe containing the following columns:
Ungrouped. |
ip |
infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
bootstraps |
|
window |
|
mean_prior |
the prior for the $R_t$ estimate. When sample size is low the
$R_t$ estimate will revert to this prior. In |
std_prior |
the prior for the $R_t$ SD. |
... |
not used |
This will calculate a reproduction number for each group in the input dataframe.
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (positive_double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
Any grouping allowed.
tmp = ggoutbreak::england_covid %>% time_aggregate(count = sum(count))
if (interactive()) {
  # not run due to long running
  tmp2 = tmp %>% rt_epiestim()
}
Calculate a reproduction number estimate from growth rate using the method of Wallinga and Lipsitch (2007) and an empirical generation time distribution. This uses resampling to transmit uncertainty in growth rate estimates. It also handles time-series that are not on a daily cadence (although this is experimental). The reproduction number estimate is neither an instantaneous (backward looking) nor a case (forward looking) reproduction number but somewhere between the two.
rt_from_growth_rate( df = i_growth_rate, ip = i_empirical_ip, bootstraps = 1000, seed = Sys.time() )
df |
Growth rate estimates A dataframe containing the following columns:
Any grouping allowed. |
ip |
Infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
bootstraps |
|
seed |
a random number generator seed |
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (positive_double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
Any grouping allowed.
tmp = ggoutbreak::england_covid %>% time_aggregate(count = sum(count))
if (interactive()) {
  # not run
  withr::with_options(list("ggoutbreak.keep_cdf" = TRUE), {
    tmp2 = tmp %>% poisson_locfit_model() %>% rt_from_growth_rate()
  })
}
Calculate a reproduction number estimate from modelled incidence using the
methods described in the vignette "Estimating the reproduction number from
modelled incidence" and using a set of empirical generation time
distributions. This assumes that modelled incidence has the same time unit as
the ip
distribution, and that this is daily; if this is not the case then
rescale_model()
may be able to fix it.
rt_from_incidence(df = i_incidence_model, ip = i_discrete_ip, approx = FALSE)
df |
modelled incidence estimate A dataframe containing the following columns:
Any grouping allowed. |
ip |
an infectivity profile (aka generation time distribution) A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
approx |
use a faster, but approximate, estimate of quantiles |
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (positive_double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
Any grouping allowed.
df = ggoutbreak::england_covid %>%
  dplyr::filter(date < "2021-01-01") %>%
  time_aggregate(count = sum(count)) %>%
  poisson_locfit_model()

if (interactive()) {
  # not run
  withr::with_options(list("ggoutbreak.keep_cdf" = TRUE), {
    tmp3 = df %>% rt_from_incidence(ganyani_ip)
  })
}

tmp = df %>% rt_from_incidence(du_serial_interval_ip)
Calculate a reproduction number estimate from modelled incidence estimates, by statistical sampling from a log-normally distributed incidence estimate, combined with uncertain infectivity profile specified as multiple discrete empirical distributions.
rt_from_renewal( df = i_incidence_model, ip = i_discrete_ip, bootstraps = 1000, seed = Sys.time() )
df |
modelled incidence estimate A dataframe containing the following columns:
Any grouping allowed. |
ip |
infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
bootstraps |
the number of samples to take at each time point. This will be rounded up to a whole multiple of the infectivity profile distribution length. |
seed |
a random number seed for reproducibility |
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (positive_double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
Any grouping allowed.
df = ggoutbreak::england_covid %>%
  time_aggregate(count = sum(count)) %>%
  poisson_locfit_model()

if (interactive()) {
  # not run
  withr::with_options(list("ggoutbreak.keep_cdf" = TRUE), {
    tmp2 = df %>% rt_from_renewal(ganyani_ip)
  })
}
The wedge distribution has a domain of 0 to 1 and a linear probability density function over that domain.
rwedge(n, a)
n |
number of observations |
a |
a gradient from -2 (left skewed) to 2 (right skewed) |
The rwedge function can be combined with quantile functions to skew standard
distributions, or to introduce correlation or down-weight certain parts of
the distribution.
a vector of probabilities, quantiles, densities or samples.
pwedge(seq(0, 1, 0.1), a = 1)
dwedge(seq(0, 1, 0.1), a = 1)
qwedge(c(0.25, 0.5, 0.75), a = -1)
stats::cor(
  stats::qnorm(rwedge(1000, a = 2)),
  stats::qnorm(rwedge(1000, a = -2))
)
A log1p y scale
scale_y_log1p(..., n = 5, base = 10, dp = 0)
... |
Other arguments passed on to |
n |
the number of major breaks |
base |
the base for the logarithm |
dp |
decimal points |
a ggplot scale
A logit y scale
scale_y_logit(...)
... |
Other arguments passed on to |
a ggplot scale
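An illustrative use of the two scales (the data frame and columns here are made up for the sketch; only scale_y_log1p() and scale_y_logit() come from the package):

df = data.frame(
  time = 1:50,
  count = stats::rpois(50, lambda = 1:50),
  p = stats::plogis(seq(-3, 3, length.out = 50))
)
# counts on a log1p axis, so zero counts remain plottable
ggplot2::ggplot(df, ggplot2::aes(time, count)) +
  ggplot2::geom_line() + scale_y_log1p()
# proportions on a logit axis
ggplot2::ggplot(df, ggplot2::aes(time, p)) +
  ggplot2::geom_line() + scale_y_logit()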
Score estimates against a ground truth using scoringutils.

This performs a range of quantile-based and, if cumulative distribution functions are available, continuous scoring metrics for each estimate time-point.
score_estimate(est, obs, lags = NULL)
est |
a dataframe of estimates of incidence, growth rate or reproduction number based on a simulation or data with known parameters. |
obs |
a dataframe of the ground truth, sharing the same grouping as
|
lags |
a data frame of estimate types and lags as output by |
a dataframe of scoring metrics
tmp2 = test_bpm %>% sim_summarise_linelist()
withr::with_options(list("ggoutbreak.keep_cdf" = TRUE), {
  est = tmp2 %>% poisson_locfit_model() %>% rt_from_incidence()
})
obs = tmp2 %>% dplyr::mutate(rt.obs = rt.weighted)
score_estimate(est, obs) %>% dplyr::glimpse()
Apply an ascertainment bias to the observed case counts.
sim_apply_ascertainment(df = i_sim_count_data, fn_asc = ~1, seed = Sys.time())
df |
a count dataframe from e.g. A dataframe containing the following columns:
Minimally grouped by: statistic (and other groupings allowed). |
fn_asc |
a function that takes a single input vector |
seed |
a RNG seed |
a dataframe with original
column, and count
column modified to
include ascertainment bias.
tibble::tibble(
  statistic = "incidence",
  time = as.time_period(1:10, "1 day"),
  count = rep(100, 10)
) %>%
  dplyr::group_by(statistic) %>%
  sim_apply_ascertainment(~ ifelse(.x <= 5, 0.1, 0.9))
Events include symptom onset, admission, death, test sampling, and test processing.
sim_apply_delay(
  df,
  ...,
  fn_p_symptomatic = ~0.5,
  fn_p_admitted = ~0.1,
  fn_p_died = ~0.05,
  fn_p_tested = ~0.8,
  seed = Sys.time()
)
df |
a line list dataframe arising from e.g. |
... |
Named arguments passed on to
Named arguments passed on to
|
fn_p_symptomatic , fn_p_admitted , fn_p_died , fn_p_tested
|
Function that
returns a probability between 0 and 1 for each row of the input dataframe.
A |
seed |
RNG seed for reproducibility |
Depends on input, either:
a wide format line list with additional XX
, XX_time
and XX_delay
columns,
for each of the set of statistics generated.
a long format set of counts of different statistics i.e. infections
,
symptoms
, admission
, death
, sample
(tests taken), results
(test results) .
tmp = sim_branching_process(
  changes = tibble::tibble(
    t = c(0, 20, 40, 60, 80, 110),
    R = c(1.8, 1.5, 0.9, 1.5, 0.8, 1.2)),
  max_time = 120,
  seed = 100
)
tmp2 = tmp %>% sim_apply_delay()
tmp2 %>% dplyr::glimpse()
Generate a line list from a branching process model parametrised by reproduction number
sim_branching_process(
  changes = tibble::tibble(t = c(0, 40), rt = c(2.5, 0.8)),
  max_time = 80,
  seed = Sys.time(),
  fn_Rt = cfg_step_fn(changes),
  fn_ip = ~test_ip,
  fn_kappa = ~1,
  imports_df = NULL,
  fn_imports = ~ifelse(.x == 0, 30, 0),
  fn_list_next_gen = list(),
  ...
)
changes |
a dataframe containing a |
max_time |
maximum duration of simulation |
seed |
random seed |
fn_Rt |
can be specified instead of |
fn_ip |
a function that takes input vector |
fn_kappa |
a vectorised function taking |
imports_df |
a data frame containing minimally |
fn_imports |
a time varying function that defines the number of infected
importations. If |
fn_list_next_gen |
a named list of functions. The name corresponds to
metadata columns in the simulation, the function is a |
... |
not used |
a line list of cases, with individual ids, infection times, and infector ids, for a simulated outbreak
tmp = sim_branching_process(
  changes = tibble::tibble(t = c(0, 40), R = c(1.5, 0.8)),
  max_time = 120,
  seed = 100,
  fn_imports = ~ ifelse(.x < 10, 1, 0)
)
if (interactive()) {
  plot_cases(tmp, mapping = ggplot2::aes(fill = as.factor(generation)), linewidth = 0.1)
}
# imports can also be specified as a dataframe, which allows additional
# metadata in the line list. An example of which is as follows:
imports_df = tibble::tribble(
  ~time, ~variant, ~count,
  0:4, "wild-type", 100,
  10:14, "alpha", 5,
)
Standard convolution assumes one delay distribution. This is not what we see in reality, as delays can depend on many factors, including the day of the week. This function applies a convolution to an input time-series when that convolution is expressed as a function (usually of time, but it can be of anything in the input dataframe). The convolution is then sampled using a Poisson or negative binomial distribution.
sim_convolution(
  df = i_sim_count_data,
  p_fn,
  delay_fn,
  ...,
  input = "infections",
  output,
  kappa = 1,
  from = c("count", "rate")
)
df |
a count dataframe from e.g. A dataframe containing the following columns:
Minimally grouped by: statistic (and other groupings allowed). |
p_fn |
a function that takes a time parameter and returns the probability of
observation given that something occurs. A no-op for this
parameter is |
delay_fn |
a function that takes time and returns the probability of
observation (given it occurred) over time since infection (i.e. |
... |
not used |
input |
the input statistic |
output |
the output statistic |
kappa |
dispersion. scaled such that poisson dispersion is 1. Values must be 0 (no dispersion), 1 (poisson dispersion) or greater than 1 for over-dispersion. |
from |
Controls if you base future counts on previous counts or on
underlying rate, defaults to |
return the result of applying this convolution to the data.
weekday_delay = make_gamma_ip(median_of_mean = 5, median_of_sd = 2)
weekend_delay = make_gamma_ip(median_of_mean = 6, median_of_sd = 2)
delay_fn = ~ ifelse(.x %% 7 %in% c(6, 7), list(weekend_delay), list(weekday_delay))
p_fn = ~ ifelse(.x < 20, 0.5, 0.75)

data = tibble::tibble(
  time = 1:40,
  count = rep(100, 40),
  rate = rep(100, 40),
  statistic = "infections"
) %>% dplyr::group_by(statistic)

delayed = data %>%
  sim_convolution(p_fn, delay_fn, output = "delayed") %>%
  dplyr::filter(statistic == "delayed")

if (interactive())
  ggplot2::ggplot(delayed, ggplot2::aes(x = time)) +
    ggplot2::geom_line(ggplot2::aes(y = rate)) +
    ggplot2::geom_line(ggplot2::aes(y = count))

# other example delay functions
delay_fn = cfg_gamma_ip_fn(~ ifelse(.x < 5, 8, 4))
Apply a time-varying probability and delay function to linelist data
sim_delay( df = i_sim_linelist, p_fn, delay_fn, input = "time", output = "event", seed = Sys.time() )
df |
a line list dataframe arising from e.g. A dataframe containing the following columns:
Any grouping allowed. |
p_fn |
Function that returns a probability between 0 and 1 for each row
of the input dataframe. A |
delay_fn |
A function that calculates the time to event onset from the
|
input |
a time column from which to calculate the delay from. |
output |
an output column set name (defaults to |
seed |
RNG seed for reproducibility |
the line list with extra columns with prefix given by output
,
specifying whether the event was observed, the delay and the simulation
time.
tmp = sim_branching_process(
  changes = tibble::tibble(
    t = c(0, 20, 40, 60, 80, 110),
    R = c(1.8, 1.5, 0.9, 1.5, 0.8, 1.2)),
  max_time = 120,
  seed = 100
)
tmp2 = tmp %>% sim_delay(
  p_fn = ~ rbern(.x, 0.8),
  delay_fn = ~ rgamma2(.x, mean = 5),
)
tmp2 %>% dplyr::glimpse()
Delayed observations mean that if, for example, a case is only attributed to a disease after a delay, there is right censoring of the data. There can be very complex patterns of right censoring if, for example, observations are batched and published weekly. During COVID in the UK some death data was published frequently, but some was retrospectively reported at monthly intervals depending on where the patient died, which led to complex time dependent biases in the death data. Given a description of the delay, this function will simulate this effect for count data. In another example, there were delays in reporting test results in the run-up to Christmas, which resulted in case rates apparently dropping as schools broke up. This could have affected the timing of the 2021 Christmas lockdown.
sim_delayed_observation( df = i_sim_count_data, delay_fn, ..., input = "infections", output = input, max_time = max(df$time) )
df |
a count dataframe from e.g. |
delay_fn |
a function that takes time and returns the probability of
observation (given it occurred) over time since infection (i.e. tau) as an
ip delay distribution. This does not have to sum to 1 (e.g. mapping
incidence to prevalence) but if not then it will behave as if some fraction
or events are not observed (or observed multiple times). See
|
... |
not used |
input |
the input statistic (defaults to |
output |
the output column name (defaults to same as |
max_time |
the date on which censoring is taking place. |
the result of applying this right censoring to the data.
weekday_delay = make_gamma_ip(median_of_mean = 5, median_of_sd = 2)
weekend_delay = make_gamma_ip(median_of_mean = 7, median_of_sd = 2)
delay_fn = ~ ifelse(.x %% 7 %in% c(6, 7), list(weekend_delay), list(weekday_delay))

data = tibble::tibble(time = 1:40, count = rep(100, 40), statistic = "infections") %>%
  dplyr::group_by(statistic) %>%
  sim_delayed_observation(delay_fn, output = "delayed")

if (interactive())
  ggplot2::ggplot(data, ggplot2::aes(x = time, colour = statistic)) +
    ggplot2::geom_line(ggplot2::aes(y = count))
Generate a multinomial outbreak defined by per-class growth rates and a Poisson model
sim_multinomial(
  changes = tibble::tibble(
    t = c(0, 20, 40, 60, 80),
    variant1 = c(0.1, 0, -0.1, 0, 0.1),
    variant2 = c(0.15, 0.05, -0.05, -0.01, 0.05),
    variant3 = c(0, 0.05, -0.05, +0.05, -0.05),
  ),
  initial = c(100, 100, 100),
  time_unit = "1 day",
  ...
)
changes |
a list of time points in column |
initial |
the size of the initial outbreak per class. There should be one entry per class |
time_unit |
e.g. a daily or weekly time series: "1 day", "7 days" |
... |
Named arguments passed on to
|
a case count time series including class
, count
and time
columns
if (interactive()) { plot_counts( sim_multinomial() %>% dplyr::glimpse() ) }
Generate an outbreak case count series defined by growth rates using a Poisson model.
sim_poisson_model(
  changes = tibble::tibble(t = c(0, 20, 40, 60, 80), growth = c(0.1, 0, -0.1, 0, 0.1)),
  kappa = 1,
  max_time = 104,
  seed = Sys.time(),
  fn_growth = cfg_step_fn(changes),
  fn_imports = ~ifelse(.x == 0, 100, 0),
  time_unit = "1 day"
)
changes |
a dataframe holding change time points ( |
kappa |
a dispersion parameter. 1 is no dispersion (compared to Poisson); smaller values mean more dispersion. |
max_time |
the desired length of the time series, |
seed |
a random seed |
fn_growth |
a function that takes input vector |
fn_imports |
a function that takes input vector |
time_unit |
e.g. a daily or weekly time series: "1 day", "7 days" |
A dataframe containing the following columns:
count (positive_integer) - Positive case counts associated with the specified time frame
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
Ungrouped.
tmp2 = sim_poisson_model(seed = 100, fn_imports = ~ ifelse(.x %in% c(0, 50), 100, 0))
if (interactive()) {
  ggplot2::ggplot(tmp2) + ggplot2::geom_point(ggplot2::aes(x = time, y = count))
}
Generate an outbreak case count series defined by reproduction number using a Poisson model.
sim_poisson_Rt_model(
  changes = tibble::tibble(t = c(0, 40), rt = c(2.5, 0.8)),
  kappa = 1,
  max_time = 80,
  seed = Sys.time(),
  fn_Rt = cfg_step_fn(changes),
  fn_imports = ~ifelse(.x == 0, 30, 0),
  fn_ip = ~test_ip,
  time_unit = "1 day"
)
changes |
a dataframe holding change time points ( |
kappa |
a dispersion parameter. 1 is no dispersion (compared to Poisson); smaller values mean more dispersion. |
max_time |
the desired length of the time series, |
seed |
a random seed |
fn_Rt |
a function that takes input vector |
fn_imports |
a function that takes input vector |
fn_ip |
a function that takes input vector |
time_unit |
e.g. a daily or weekly time series: "1 day", "7 days" |
A dataframe containing the following columns:
count (positive_integer) - Positive case counts associated with the specified timeframe
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
Ungrouped.
tmp = sim_poisson_Rt_model(kappa = 1, seed = 100, fn_imports = ~ ifelse(.x %in% c(0, 50), 100, 0))
if (interactive()) {
  ggplot2::ggplot(tmp, ggplot2::aes(x = time, y = count)) +
    ggplot2::geom_point() +
    ggplot2::geom_line()
}
This function converts a line list into a daily count of incident cases, plus infections, admissions, deaths, test samples and test results if present. Censoring of these counts can also be defined. While summarising, various network measures, such as the forward looking case reproduction number, are also calculated.
sim_summarise_linelist( df = i_sim_linelist, ..., censoring = list(), max_time = max(df$time) )
df |
a line list dataframe arising from e.g. A dataframe containing the following columns:
Any grouping allowed. |
... |
the grouping to include in the summarisation. |
censoring |
a named list of column names (without the |
max_time |
the censoring time for this observation. |
a count data frame with various R_t measures, ascertainment, and underlying infection count.
sim = sim_branching_process(
  changes = tibble::tibble(t = c(0, 40), R = c(1.7, 0.8)),
  max_time = 120,
  seed = 100,
  fn_imports = ~ ifelse(.x == 0, 100, 0)
)
tmp = sim %>% sim_summarise_linelist()

p1 = plot_counts(tmp)
p2 = ggplot2::ggplot(tmp, ggplot2::aes(x = as.Date(time))) +
  ggplot2::geom_point(ggplot2::aes(y = rt.case, colour = "case")) +
  ggplot2::geom_point(ggplot2::aes(y = rt.inst, colour = "instantaneous")) +
  ggplot2::geom_line(ggplot2::aes(y = rt.weighted)) +
  ggplot2::coord_cartesian(ylim = c(0, 3.5)) +
  ggplot2::xlab(NULL)

patchwork::wrap_plots(p1, p2, ncol = 1, axes = "collect")
This time-series has no statistical noise and is useful for testing things. It has fixed known values for infections and growth rate (the growth rate alternates between 0.05 and -0.05 per day), and an instantaneous reproduction number which is based on the provided infectivity profile. A fixed denominator gives a known proportion and a relative growth rate that is the same as the growth rate.
sim_test_data(ip = test_ip, duration = 500, period = 50)
ip |
an infectivity profile. any uncertainty will be collapsed into the central distribution. |
duration |
the total length of the time-series |
period |
the duration of each positive or negative growth phase |
a time series with count
, incidence
, growth
, rt
,
proportion
and relative.growth
columns
sim_test_data() %>% dplyr::glimpse()
Generate a single infectivity profile from multiple bootstraps
summarise_ip(ip = i_empirical_ip)
ip |
the infectivity profile to summarise. A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
an infectivity profile
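An illustrative use, collapsing the bundled bootstrapped test_ip profile to a single distribution (assuming test_ip as documented below):

ip_single = summarise_ip(test_ip)
ip_single %>% dplyr::glimpse()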
This is generated using the test_ip
infectivity profile and
also includes a delay to symptom onset which is a random gamma distributed
quantity with mean of 6 and sd of 2
data(test_bpm)
A dataframe containing the following columns:
time (as.time_period) - the time column
id (integer) - an id per individual
generation_interval (numeric) - the generation_interval column
infector (integer) - the infector id
generation (numeric) - the generation column
symptom_onset (logical) - the flag for onset of symptoms
symptom_onset_delay (numeric) - the time to onset of symptoms from infection
symptom_onset_time (as.time_period) - the time of symptom onset
333126 rows and 8 columns
A test infectivity profile generated from a set of discretised gamma distributions with parameters mean 5 (95% CI 4-6) and sd 2 (95% CI 1.5-2.5).
data(test_ip)
A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
probability (proportion) - the probability of infection between previous time period until time
tau (numeric) - the time index this probability relates to (in days)
a0 (numeric) - the beginning of the time period
a1 (numeric) - the end of the time period
Grouped by: boot.
2000 rows and 5 columns
This is generated using the test_ip
infectivity profile
data(test_poisson_rt)
A dataframe containing the following columns:
time (as.time_period) - the time column
rt (numeric) - the time varying rt column (parameters)
imports (numeric) - the imports column
rate (numeric) - the poisson rate column (underlying infection rate)
count (integer) - the count column
statistic (character) - the statistic column
81 rows and 6 columns
This serial interval is resampled from the first 1000 patients in the
test_bpm
dataset for whom both infector and infectee have symptoms. These
patients are generated with a symptom delay of mean 6 days and SD 2 from
infection (discrete under-dispersed gamma) and an infectivity profile with
mean 5 days and SD 2 as defined in test_ip
dataset. This serial interval
is relevant to the estimation of $R_t$ from symptomatic case counts but
includes negative times, and cannot be used with EpiEstim
.
data(test_serial)
A dataframe containing the following columns:
tau (numeric) - the time delay between symptoms in infector and infectee
a0 (numeric) - the a0 column
a1 (numeric) - the a1 column
probability (numeric) - the probability column
boot (integer) - the boot column
Minimally grouped by: boot (and other groupings allowed).
2166 rows and 5 columns
A test time series dataset
data(test_ts)
A dataframe containing the following columns:
time (as.time_period) - the time column
growth (numeric) - the growth column
incidence (numeric) - the incidence column
rt (numeric) - the rt column
denom (numeric) - the denom column
proportion (numeric) - the proportion column
relative.growth (numeric) - the relative.growth column
count (numeric) - the count column
Any grouping allowed.
501 rows and 8 columns
Aggregate time series data preserving the time series
time_aggregate(
  df = i_timestamped,
  ...,
  .groups = NULL,
  .cols = NULL,
  .fns = NULL
)
df |
an optionally grouped time series. Grouping should not include the time
column. The grouping works differently from |
... |
A set of |
.groups |
as per |
.cols |
Optional tidyselect column specification for |
.fns |
Optional a set of function specifications as per |
the summarised time series preserving the time
column, and with the grouping
structure involving one fewer levels than the input
ggoutbreak::england_covid %>%
  time_aggregate(count = sum(count), denom = sum(denom)) %>%
  dplyr::glimpse()

ggoutbreak::england_covid %>%
  time_aggregate(.fns = mean) %>%
  dplyr::glimpse()
This principally is designed to take a record of single events and produce a summary time-series count of events by group, class and date. The default behaviour is to guess the cadence of the input data and summarise the event line list to a (set of) regular time-series counts for use in incidence and growth rate estimates.
time_summarise(
  df = i_dated,
  unit,
  anchor = "start",
  rectangular = FALSE,
  ...,
  .fill = list(count = 0)
)
df |
a line list of data you want to summarise, optionally grouped.
If this is grouped then each group is treated independently. The remaining
columns must contain a |
unit |
a period e.g. "1 week" |
anchor |
one of a date, "start" or "end" or a weekday name e.g. "mon" this will always be one of the start of the time periods we are cutting into |
rectangular |
should the resulting time series be the same length for all groups. This is only the case if you can be sure that your data is complete for all subgroups, otherwise missing data will be treated as zero counts. This is important if leading and trailing missing data in one subgroup can be due to a reporting delay in that subgroup, in which case a rectangular time series will erroneously fill in zero counts for this missing data. |
... |
a spec for a dplyr::summary(...) - optional, and if not provided a
|
.fill |
a list similar to tidyr::complete for values to fill variables with |
If the data is given with a class
column the time series are interpreted as
having a denominator, consisting of all the different classes within a time period.
This may be subtypes (e.g. variants, serotypes) or markers for test positivity.
In either case the resulting time series will have counts for all classes and
denominators for the combination.
There is flexibility for other kinds of summarisation if the raw data is not
count based (e.g. means of continuous variables), but in this case the slider
package is usually a better choice, as time_summarise only looks at
non-overlapping time periods with fixed lengths.
There is another use case where an existing timeseries on a particular
frequency is aggregated to another less frequent basis (e.g. moving from a
daily timeseries to a weekly one). In this case the input will contain a
count
column. In this mode no checks are made that the more frequent events
are all present before summarisation, so the result may include different
numbers of input periods (e.g. going from weeks to months may include 4 or 5
weeks in each month).
The output depends on whether or not the input was grouped and had a class column. The most detailed output will be a dataframe containing the following columns:

denom (positive_integer) - Total test counts associated with the specified time frame
count (positive_integer) - Positive case counts associated with the specified time frame
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

Any grouping allowed.

A more minimal output is produced if the input is only a plain list of dated events: a dataframe containing the following columns:

count (positive_integer) - Positive case counts associated with the specified time frame
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period

Any grouping allowed.
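As an illustrative sketch (not taken from the package examples), the line-list mode might be used as follows; the synthetic data and the date and class column names are assumptions based on the description above:

# hypothetical line list: 200 dated test results over roughly two months
events = dplyr::tibble(
  date = as.Date("2020-01-01") + sample(0:59, 200, replace = TRUE),
  class = sample(c("positive", "negative"), 200, replace = TRUE)
)

# summarise to weekly counts per class, with denominators across classes
events %>%
  time_summarise(unit = "1 week", anchor = "start") %>%
  dplyr::glimpse()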
Convert a set of time points to dates
time_to_date( timepoints, unit = attr(timepoints, "unit"), start_date = attr(timepoints, "start_date") )
timepoints | a set of numeric time points
unit | the period / unit of the time points, which will be extracted from the time points if possible
start_date | the zero day of the time series, which will be extracted from the time points if possible
a vector of dates
times = date_to_time(as.Date("2019-12-29") + 0:100, "1 week")
dates = time_to_date(times)
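Raw numeric time points can also be converted by supplying the unit and start date explicitly (an illustrative sketch, not from the package examples):

# weekly time points 0..10, anchored to an assumed start date
time_to_date(0:10, unit = "1 week", start_date = as.Date("2020-01-01"))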
This function implements the Wallinga-Lipsitch conversion of an exponential growth rate into a reproduction number, using a single empirical distribution for the infectivity profile (a.k.a. the generation time).
wallinga_lipsitch( r, y = i_empirical_ip, a1 = seq(0.5, length.out = length(y)), a0 = dplyr::lag(a1, default = 0) )
r | a growth rate (may be a vector)
y | an empirical infectivity profile, either as a probability vector or as a dataframe, which must be grouped by: boot (exactly). A default value is defined.
a1 | the end time of the infectivity profile probability estimate (defaults to 0.5, 1.5, 2.5, ...)
a0 | the start time of the infectivity profile probability estimate (defaults to 0, 0.5, 1.5, ...)
a reproduction number estimate based on r
wallinga_lipsitch(r=seq(-0.1,0.1,length.out=9), y=stats::dgamma(1:50, 5,2))
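For intuition, the sketch below implements the discretised Euler-Lotka relationship that underlies this estimate, assuming the standard Wallinga & Lipsitch (2007) form; it is illustrative only and not necessarily identical to the package implementation (which, for example, also accepts bootstrapped infectivity profiles):

# Sketch of R = r / sum_i y_i * (exp(-r*a0_i) - exp(-r*a1_i)) / (a1_i - a0_i),
# the discretised Euler-Lotka equation of Wallinga & Lipsitch (2007).
wl_sketch = function(r, y,
                     a1 = seq(0.5, length.out = length(y)),
                     a0 = dplyr::lag(a1, default = 0)) {
  y = y / sum(y) # normalise the infectivity profile to a probability mass
  sapply(r, function(ri) {
    if (abs(ri) < 1e-9) return(1) # R tends to 1 as the growth rate tends to zero
    ri / sum(y * (exp(-ri * a0) - exp(-ri * a1)) / (a1 - a0))
  })
}

wl_sketch(r = seq(-0.1, 0.1, length.out = 9), y = stats::dgamma(1:50, 5, 2))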
The wedge distribution has a domain of 0 to 1, with a linear probability density function over that domain.
n | number of observations
x | vector of quantiles
q | vector of quantiles
p | vector of probabilities
log | logical; if TRUE, probabilities p are given as log(p)
log.p | logical; if TRUE, probabilities p are given as log(p)
lower.tail | logical; if TRUE (default), probabilities are P[X <= x], otherwise P[X > x]
a | a gradient from -2 (left skewed) to 2 (right skewed)
rwedge can be combined with quantile functions to skew standard distributions, to introduce correlation, or to down-weight certain parts of a distribution.
a vector of probabilities, quantiles, densities or samples.
pwedge(seq(0, 1, 0.1), a = 1)
dwedge(seq(0, 1, 0.1), a = 1)
qwedge(c(0.25, 0.5, 0.75), a = -1)
stats::cor(
  stats::qnorm(rwedge(1000, a = 2)),
  stats::qnorm(rwedge(1000, a = -2))
)
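For intuition, the sketch below shows one plausible parameterisation, assuming a density f(x) = 1 + a*(x - 1/2) on [0, 1]; this is an illustrative assumption, and the package's actual parameterisation may differ:

# Sketch assuming density f(x) = 1 + a*(x - 1/2) on [0, 1], which is
# non-negative (a valid density) whenever -2 <= a <= 2.
dwedge_sketch = function(x, a = 0) ifelse(x < 0 | x > 1, 0, 1 + a * (x - 0.5))

# CDF: F(q) = q + (a/2) * q * (q - 1)
pwedge_sketch = function(q, a = 0) {
  q = pmin(pmax(q, 0), 1)
  q + (a / 2) * q * (q - 1)
}

# Quantile function: solve (a/2) q^2 + (1 - a/2) q - p = 0 for q in [0, 1]
qwedge_sketch = function(p, a = 0) {
  if (abs(a) < 1e-9) return(p) # reduces to the uniform when a = 0
  b = 1 - a / 2
  (-b + sqrt(b^2 + 2 * a * p)) / a
}

# random sampling by inverse transform
rwedge_sketch = function(n, a = 0) qwedge_sketch(stats::runif(n), a)

# sanity check: the density integrates to ~1
stats::integrate(dwedge_sketch, 0, 1, a = 1.5)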
Extract the weekday, month or quarter, or the Julian time (days since some origin). These are generic functions: the methods for the internal date-time classes are documented here.
## S3 method for class 'time_period' weekdays(x, abbreviate = FALSE)
x | an object inheriting from class time_period
abbreviate | logical vector (possibly recycled). Should the names be abbreviated?
weekdays and months return a character vector of names in the locale in use, i.e., Sys.getlocale("LC_TIME").

quarters returns a character vector of "Q1" to "Q4".

julian returns the number of days (possibly fractional) since the origin, with the origin as an "origin" attribute.
All time calculations in R are done ignoring leap-seconds. Other components such as the day of the month or the year are very easy to compute: just use as.POSIXlt and extract the relevant component. Alternatively (especially if the components are desired as character strings), use strftime.
DateTimeClasses, Date; Sys.getlocale("LC_TIME"), crucially for months() and weekdays().
## first two are locale dependent:
weekdays(.leap.seconds)
months(.leap.seconds)
quarters(.leap.seconds)

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6, etc !!)

##' Transform (Time-)Date vector to convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt))) {
  DF <- as.data.frame(unclass(as.POSIXlt(dt)))
  `names<-`(cbind(dt, DF, deparse.level = 0L), c(dName, names(DF)))
}
## e.g.,
dt2df(.leap.seconds) # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler: Date -> Matrix - dropping time info {sec,min,hour, isdst}
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x))[4:7])
## e.g.,
d2mat(seq(as.Date("2000-02-02"), by = 1, length.out = 30)) # has R 1.0.0's release date

## Julian Day Number (JDN, https://en.wikipedia.org/wiki/Julian_day)
## is the number of days since noon UTC on the first day of 4713 BC
## in the proleptic Julian calendar. To more recently, in
## 'Terrestrial Time' which differs from UTC by a few seconds.
## See https://en.wikipedia.org/wiki/Terrestrial_Time
julian(Sys.Date(), -2440588) # from a day
floor(as.numeric(julian(Sys.time())) + 2440587.5) # from a date-time
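As an illustrative sketch (not from the package examples), the method can be applied to a daily time_period constructed as in the as.time_period examples:

# daily time series starting Wednesday 2020-01-01
tmp = as.time_period(0:6, 1, "2020-01-01")
weekdays(tmp)
weekdays(tmp, abbreviate = TRUE)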