Title: Estimate Incidence, Proportions and Exponential Growth Rates
Description: Simple statistical models and visualisations for calculating the incidence, proportion, exponential growth rate, and reproduction number of infectious disease case time series. This toolkit was largely developed during the COVID-19 pandemic.
Authors: Robert Challen [aut, cre, cph]
Maintainer: Robert Challen <[email protected]>
License: MIT + file LICENSE
Version: 0.3.1
Built: 2024-11-14 03:19:51 UTC
Source: https://github.com/ai4ci/ggoutbreak
Generate a multinomial outbreak defined by per-class growth rates and a Poisson model
.test_multinomial(
  changes = tibble::tibble(
    time = c(0, 20, 40, 60, 80),
    variant1 = c(0.1, 0, -0.1, 0, 0.1),
    variant2 = c(0.15, 0.05, -0.05, -0.01, 0.05),
    variant3 = c(0, 0.05, -0.05, 0.05, -0.05)
  ),
  initial = c(100, 1, 100),
  ...
)
changes: a list of time points and growth rates per week per class
initial: the size of the initial outbreak per class
...: arguments passed on to

A case count time series including class, count and time columns.
if (interactive()) {
  ggplot2::ggplot(
    .test_multinomial(),
    ggplot2::aes(x = time, y = count, colour = class)
  ) + ggplot2::geom_point()
}
Convert time period to dates
## S3 method for class 'time_period'
as.Date(x, ...)

## S3 method for class 'time_period'
as.POSIXct(x, ...)
x: a time_period
...: not used

A vector of dates representing the start of each of the input time_period entries.
as.POSIXct(time_period): Convert to a vector of POSIXct
Time periods are just a zero-based numeric representation of dates with a time unit baked in. This allows variable-length periods (e.g. days or weeks) and fractional days to be represented in a consistent(ish) way.
as.time_period(x, unit = NULL, start_date = NULL, anchor = NULL, ...)

## S3 method for class 'time_period'
c(..., recursive = FALSE)

## S3 method for class 'time_period'
x[...]

## S3 replacement method for class 'time_period'
x[...] <- value

## S3 method for class 'time_period'
x[[...]]

## S3 replacement method for class 'time_period'
x[[...]] <- value

is.time_period(x)

## S3 method for class 'time_period'
print(x, ...)
x: a vector of numbers (may be integer or real) or a time_period
unit: the length of one unit of time. This will be either an integer number of days, or a specification such as "1 week", or another
start_date: the zero time date, as something that can be coerced to a date. If the
anchor: only relevant if
...: used for subtype implementations
recursive: concatenate recursively
value: the value
A time_period class, consisting of a vector of numbers, with attributes for time period and start_date.
c(time_period): combine time_periods
`[`(time_period): subset a time_period
`[`(time_period) <- value: assign values to a subset of a time_period
`[[`(time_period): get a value in a time_period
`[[`(time_period) <- value: assign a value in a time_period
is.time_period(): check whether something is a time_period
print(time_period): print a time_period
# 100 weeks from 2020-01-01
tmp = as.time_period(0:100, 7, "2020-01-01")
as.Date(tmp)
range(tmp)
min(tmp)
tmp2 = as.integer(as.Date(tmp))
# testthat::expect_true(all(na.omit(tmp2 - lag(tmp2)) == 7))
tmp2 = as.time_period(0:23, 1/24, "2020-01-01")
as.POSIXct(tmp2)

# convert timeseries to new "unit"
tmp = as.time_period(0:100, 7, "2020-01-01")
tmp2 = as.time_period(tmp, 1)
testthat::expect_equal(as.numeric(tmp2), 0:100*7)
A scales breaks generator for log1p scales
breaks_log1p(n = 5, base = 10)
n: the number of breaks
base: the base for the breaks

A function for ggplot scale breaks.
ggplot2::ggplot(ggplot2::diamonds, ggplot2::aes(x = price)) +
  ggplot2::geom_density() +
  ggplot2::scale_x_continuous(trans = "log1p", breaks = breaks_log1p())
The covid_infectivity_profile dataframe structure specification
A dataframe containing the following columns:
boot (anything + default(1)) - a bootstrap identifier
time (positive_double) - the end of the time period (in days)
probability (proportion) - the probability of infection between the end of the previous time period and time
Must be grouped by: boot (exactly).
A default value is defined.
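A dataframe with this structure can be sketched by discretising a continuous distribution. Everything numeric below is illustrative (the gamma parameters are placeholders, not the package's actual COVID-19 profile); only the column layout follows the specification above.

```r
# Illustrative sketch: a single-bootstrap infectivity profile with the
# documented columns (boot, time, probability), built by discretising a
# gamma distribution into daily periods. Parameters are placeholders.
times <- 1:14                                             # end of each daily period
probability <- diff(pgamma(c(0, times), shape = 2.5, rate = 0.5))
profile <- data.frame(
  boot = 1,                  # bootstrap identifier (default 1)
  time = times,              # positive_double: end of period in days
  probability = probability  # proportion: P(infection) in that period
)
sum(profile$probability)     # close to 1 over the support
```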
The counterpart to date_seq_dates(). Take an original set of data and place it within a regular time series, where the periodicity of the time series may be expressed as a number of days, weeks, months, quarters, or years, and the periods are defined by an anchoring date, a day of the week, or by reference to the start or end of the input dates. This can either return the periods as dates or factors (e.g. for plotting) or as a time_period for analysis that relies on a numeric representation of the date or duration from the anchor.
cut_date(
  dates,
  unit,
  anchor = "start",
  output = c("date", "factor", "time_period"),
  dfmt = "%d/%b/%y",
  ifmt = "{start} — {end}",
  ...
)
dates: a set of dates
unit: a period, e.g. "1 week"
anchor: one of a date, "start" or "end", or a weekday name, e.g. "mon"; this will always be the start of one of the time periods we are cutting into
output: return the result as either a "date" (the default), an ordered "factor" with the date ranges as a label, or a "time_period". The result is named with labels referring to the
dfmt: the
ifmt: a
...: ignored

A set of dates, times or factor levels, representing the start of the period the date falls into, where the period is defined by the duration and the anchor.
dates = as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-03",NA))
fs = ggoutbreak::date_seq(dates, "2 days")
dates - cut_date(dates, "2 days")
cut_date(dates, unit = "2 days", output = "time_period")

# A weekly set of dates:
dates2 = Sys.Date() + floor(stats::runif(50, max = 10)) * 7

# in this specific situation the final date is not truncated because the
# input data is seen as an exact match for the whole output period.
cut_date(dates2, "1 week", "sun", output = "factor")
cut_date(dates2, dfmt = "%d/%b", output = "factor", unit = "2 weeks", anchor = "sun")
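For comparison, base R's cut.Date implements a much simpler version of this binning: it anchors only at the earliest date and returns a factor, with none of cut_date()'s anchor, output or labelling options.

```r
# Base R analogue (sketch): bin dates into regular two-day periods.
dates <- as.Date("2020-01-01") + 0:5
cut(dates, breaks = "2 days")
# the six dates fall into three two-day bins starting
# 2020-01-01, 2020-01-03 and 2020-01-05
```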
This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.
date_seq(x, period, ...)
x: a numeric or date vector
period: the gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.
...: for subtype methods

A vector of the same type as the input.
date_seq(c(1, 2, 4, 5, 10), 1)
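The underlying idea can be sketched in base R with a hypothetical helper (not the package implementation): check that the observations lie on a regular grid, then fill the gaps with seq().

```r
# Hypothetical helper illustrating date_seq() on numeric input: verify
# the stated periodicity, then return the complete regular sequence.
fill_gaps <- function(x, period = 1, tol = 1e-6) {
  offsets <- (x - min(x)) / period
  if (any(abs(offsets - round(offsets)) > tol)) {
    stop("data is not of the stated periodicity")
  }
  seq(min(x), max(x), by = period)
}
fill_gaps(c(1, 2, 4, 6), 1)  # returns 1 2 3 4 5 6
```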
Derive from a vector of observation dates a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years, etc. E.g. this can convert a random set of dates to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01), a day of the week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly, ending on the last date of the data). There is also a problem when we consider data that may have incomplete start and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.
## S3 method for class 'Date'
date_seq(x, period = .day_interval(x), anchor = "start", complete = FALSE, ...)
x: a vector of dates, possibly including NA values
period: the gap between observations, as a number of days or as a natural language definition of the period such as "1 week", "2 weeks", "1 month", etc. If not given this will be derived from the dates.
anchor: defines a day that appears in the sequence (if it were to extend that far). Given either as a date, or "start", "end" or a day of the week, e.g. "mon".
complete: truncate incomplete start and end periods
...: ignored

A vector of dates for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.
date_seq(as.Date(c("2020-01-01","2020-02-01","2020-01-15","2020-02-01",NA)), "2 days")
This is useful if you want to fill in missing values that should have been observed but weren't. For example, date_seq(c(1, 2, 4, 6), 1) will return 1:6.
## S3 method for class 'numeric'
date_seq(x, period = 1, tol = 1e-06, ...)
x: a numeric or date vector
period: the gap between each observation. The existing data will be checked to ensure that it is actually of this periodicity.
tol: numerical tolerance for checking periodicity
...: for subtype methods

A vector of the same type as the input.
date_seq(c(1, 2, 4, 5, 10), 1)
Expand a time_period vector to the full range of possible times

Derive, from a vector of observation time_periods, a complete ordered sequence of periods in a regular time series, where the length of the periods is specified as a number of days, weeks, years, etc. E.g. this can convert a random set of times to an ordered complete list of 1 week intervals (or 2 month intervals) spanning the same range as the dates. This has some interesting problems regarding where to put breaks within a month or week. Often this is either based on a specific date (e.g. yearly periods starting at 2020-01-01), a day of the week (e.g. 2 weekly periods starting on a Sunday), or maybe relative to the input time series (weekly, ending on the last date of the data). There is also a problem when we consider data that may have incomplete start and end periods, which may not be comparable to other periods, and we may need to exclude these from the result.
## S3 method for class 'time_period'
date_seq(x, period = attributes(x)$unit, complete = FALSE, ...)
x: a time period vector
period: the gap between observations, as a number of days or as a natural language definition of the period such as "1 week", "2 weeks", "1 month", etc. If not given this will be derived from the dates.
complete: truncate incomplete start and end periods
...: ignored

A vector of time_periods for regular periods between the minimum and maximum of dates, with the boundaries defined by the anchor.
tmp = as.time_period(c(0,10,100), 7, "2020-01-01")
date_seq(tmp, "7 days")
date_seq(tmp, "1 day")
Using a start_date and a unit specification
date_to_time(
  dates,
  unit = .day_interval(dates),
  start_date = getOption("day_zero", "2019-12-29")
)
dates: a vector of dates to convert
unit: a specification of the unit of the resulting time series. Will be determined from the periodicity of the dates if not specified. If another
start_date: the origin of the conversion. Defaults to the beginning of the COVID-19 pandemic.

A vector of class time_period.
times = date_to_time(as.Date("2019-12-29") + 0:100, "1 week")
dates = time_to_date(times)
The unit of doubling times is always days.
doubling_time(x, ...)
x: a dataframe calculated from either proportion or incidence growth rate estimates
...: not used

The same dataframe with additional columns for doubling time or relative doubling time, plus confidence intervals.
ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window = 21) %>%
  ggoutbreak::doubling_time() %>%
  dplyr::glimpse()
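The relationship being exploited is standard: under exponential growth C(t) = C0 * exp(r * t), with r per day, the doubling time T solves exp(r * T) = 2, i.e. T = log(2) / r days. A minimal sketch with illustrative numbers (not the package internals, which also propagate confidence intervals):

```r
# Doubling time from a continuous per-day growth rate r:
# exp(r * T) = 2  =>  T = log(2) / r
growth_rate <- 0.1     # 10% per day, illustrative
log(2) / growth_rate   # doubling time: about 6.93 days
# a negative rate implies decay; the same formula on |r| gives a halving time
log(2) / abs(-0.05)    # halving time: about 13.86 days
```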
SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the growth rate of the epidemic from various data sources (including an early version of ggoutbreak).
data(england_consensus_growth_rate)
A dataframe containing the following columns:
date (date) - the date of the estimate
low (numeric) - the lower published estimate of the growth rate
high (numeric) - the higher published estimate of the growth rate
No mandatory groupings.
No default value.
111 rows and 3 columns
SPI-M-O used a range of different statistical and mechanistic models to produce estimates of the reproduction number of the epidemic from various data sources.
data(england_consensus_rt)
A dataframe containing the following columns:
date (date) - the date of the estimate
low (numeric) - the lower published estimate of the reproduction number
high (numeric) - the higher published estimate of the reproduction number
No mandatory groupings.
No default value.
113 rows and 3 columns
A dataset of the daily count of COVID-19 cases by age group in England, downloaded from the UKHSA coronavirus API and formatted for use in ggoutbreak. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates.
data(england_covid)
A dataframe containing the following columns:
date (as.Date) - the date column
class (enum(00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 35_39, 40_44, 45_49, 50_54, 55_59, 60_64, 65_69, 70_74, 75_79, 80_84, 85_89, 90+)) - the class column
count (numeric) - the test positives for each age group
denom (numeric) - the test positives for all age groups
time (time_period) - the time column
Must be grouped by: class (and other groupings allowed).
No default value.
26790 rows and 5 columns
The coronavirus.gov.uk dashboard published tests conducted and positive results as separate data sets for a range of geographies. In this case the data is combined, with testing rate as denominator and positives as count, for England.
data(england_covid_pcr_positivity)
A dataframe containing the following columns:
date (date) - a daily time series
time (time_period) - the time column
count (numeric) - test positives in England on that day
denom (numeric) - total tests conducted on that day
No mandatory groupings.
No default value.
1413 rows and 4 columns
An age group stratified dataset from
data(england_covid_proportion)
A dataframe containing the following columns:
class (character) - the age group
date (date) - the start date of a week
count (numeric) - the count of COVID positives
denom (numeric) - the number of COVID tests performed
population (numeric) - the size of the population at this age group
time (time_period) - the time column (weekly)
Must be grouped by: class (and other groupings allowed).
No default value.
1050 rows and 6 columns
the coronavirus.gov.uk site for positive cases aggregated to 10 year age groups and by weekly time.
NHS test and trace data, which reported regional by age group testing effort, aggregated to country level.
ONS 2021 census population aggregated to 10 year age groups.
Population counts by 5 year age group for England only from the 2021 census.
data(england_demographics)
A dataframe containing the following columns:
class (enum(00_04, 05_09, 10_14, 15_19, 20_24, 25_29, 30_34, 35_39, 40_44, 45_49, 50_54, 55_59, 60_64, 65_69, 70_74, 75_79, 80_84, 85_89, 90+)) - the class column
population (numeric) - the population count column
baseline_proportion (numeric) - the baseline proportion is the proportion this age group makes up of the total.
Must be grouped by: class (and other groupings allowed).
No default value.
19 rows and 3 columns
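A baseline_proportion column of this kind can be recomputed from the population column alone; a sketch with made-up numbers (not the actual census figures):

```r
# baseline_proportion = share of the total population in each class
demog <- data.frame(
  class = c("00_04", "05_09", "10_14"),  # illustrative classes
  population = c(3000, 3200, 3300)       # made-up counts
)
demog$baseline_proportion <- demog$population / sum(demog$population)
sum(demog$baseline_proportion)           # proportions sum to 1
```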
https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/populationandmigration/populationestimates/datasets/populationandhouseholdestimatesenglandandwalescensus2021/census2021/census2021firstresultsenglandwales1.xlsx
This includes mainly the dates of lockdowns, releases from social distancing measures and the dates that new variants were first detected.
data(england_events)
A dataframe containing the following columns:
label (character) - the event label
start (date) - the event start date
end (date) - the (optional) event end date
No mandatory groupings.
No default value.
13 rows and 3 columns
Check-in (social activity) and alert (self-isolation instruction) data from the NHS COVID-19 app, aggregated to country level on a week-by-week basis.
data(england_nhs_app)
A dataframe containing the following columns:
date (date) - the start date of the week
alerts (integer) - the count of self-isolation alerts
visits (integer) - the number of venue check-ins representing visits to social venues.
time (time_period) - the time column
No mandatory groupings.
No default value.
137 rows and 4 columns
The COVID-19 ONS infection survey took a random sample of the population and provides an estimate of the prevalence of COVID-19 that is supposedly free from ascertainment bias.
data(england_ons_infection_survey)
A dataframe containing the following columns:
date (date) - the date column
geography (character) - the geography column
proportion.0.5 (numeric) - the median proportion of people in the region testing positive for COVID-19
proportion.0.025 (numeric) - the lower CI of the proportion of people in the region testing positive for COVID-19
proportion.0.975 (numeric) - the upper CI of the proportion of people in the region testing positive for COVID-19
denom (integer) - the sample size on which this estimate was made (daily rate inferred from weekly sample sizes.)
time (time_period) - the time column
No mandatory groupings.
No default value.
9820 rows and 7 columns
The data is available here: https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/healthandsocialcare/conditionsanddiseases/datasets/coronaviruscovid19infectionsurveydata/2023/20230310covid19infectionsurveydatasetsengland.xlsx
Data from the COG-UK and Sanger centre sequencing programme. The data were made available through the Wellcome foundation at lower tier local authority level, and are a weekly time series of counts per variant. Variants were assigned using the tree structure of the Pango lineage. Different sub-lineages are aggregated to the major WHO variants of concern.
data(england_variants)
A dataframe containing the following columns:
date (date) - the end date of the week
time (time_period) - the time column
class (enum(Other, Alpha (B.1.1.7), Delta (B.1.617.2), Delta (AY.4), Omicron (Other), Omicron (BA.2), Omicron (BA.4), Omicron (BA.5), XBB (Other), Kraken (XBB.1.5), Arcturus (XBB.1.16), Eris (EG.5.1))) - the class column
who_class (enum(Other, Alpha, Delta, Omicron, Kraken, Arcturus, Eris)) - the who_class column
count (numeric) - the weekly count column
denom (numeric) - the number of sequences performed in that week
Must be grouped by: class (and other groupings allowed).
No default value.
479 rows and 6 columns
Format date as dmy
fdmy(date)
date: a date to convert

The formatted date.
fdmy(Sys.Date())
The x axis must be a date.
geom_events(
  events = i_events,
  event_label_size = 7,
  event_label_colour = "black",
  event_label_angle = -30,
  event_line_colour = "grey50",
  event_fill_colour = "grey50",
  hide_labels = FALSE,
  guide_axis = ggplot2::derive(),
  ...
)
events: significant events or time spans; a default value is defined
event_label_size: how big to make the event label
event_label_colour: the event label colour
event_label_angle: the event label angle
event_line_colour: the event line colour
event_fill_colour: the event area fill colour
hide_labels: do not show labels at all
guide_axis: a guide axis configuration for the labels (see ggplot2::guide_axis and ggplot2::dup_axis). This can be used to specify a position, amongst other things.
...: arguments passed on to

A set of geoms for a timeseries.
A dataset of the weekly count of COVID-19 cases by age group in Germany, downloaded from the Robert Koch Institute SurvStat service and formatted for growth rate estimation. A denominator is calculated which is the overall positive count for all age groups. This data set can be used to calculate group-wise incidence and absolute growth rates, and group-wise proportions and relative growth rates.
data(germany_covid)
A dataframe containing the following columns:
class (enum(0–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80+, Unknown, .ordered=TRUE)) - the age group
date (as.Date) - the date column
count (integer) - the test positives for each age group
time (time_period) - the time column
denom (integer) - the test positives for all age groups
Must be grouped by: class (and other groupings allowed).
No default value.
2070 rows and 6 columns
Derived from the Robert Koch Survstat service by comparing counts and incidence rates.
data(germany_demographics)
A dataframe containing the following columns:
class (enum(0–14, 15–19, 20–24, 25–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80+, .ordered=TRUE)) - the class column
population (integer) - the population column
Must be grouped by: class (and other groupings allowed).
No default value.
10 rows and 2 columns
Check whether vector is a date
is.Date(x)
x: a vector to check

TRUE if dates, FALSE otherwise.
is.Date(Sys.Date())
Create a set of labels for a time period based on the start and duration of the period. The format is configurable using the start and end dates and the dfmt and ifmt parameters; however, if the time period has names then these are used in preference.
## S3 method for class 'time_period'
labels(
  object,
  ...,
  dfmt = "%d/%b",
  ifmt = "{start} — {end}",
  na.value = "Unknown"
)
object: a set of decimal times as a time_period
...: not used
dfmt: a
ifmt: a glue spec referring to
na.value: a label for

A set of character labels for the time.
eg = as.time_period(Sys.Date() + 0:10*7, anchor = "start")
labels(eg)
labels(eg, ifmt = "{start}", dfmt = "%d/%b/%y")
labels(eg, ifmt = "until {end}", dfmt = "%d %b %Y")

# labels retained in constructor:
eg2 = Sys.Date() + 0:10*7
names(eg2) = paste0("week ", 0:10)
labels(eg2)
labels(as.time_period(eg2, anchor = "start"))
Perform logit scaling with correct axis formatting. Not to be used directly, but with ggplot (e.g. ggplot2::scale_y_continuous(trans = "logit")).
logit_trans(n = 5, ...)
n: the number of breaks
...: not used, for compatibility

A scales object.
library(ggplot2)
library(tibble)

tibble::tibble(pvalue = c(0.001, 0.05, 0.1), fold_change = 1:3) %>%
  ggplot2::ggplot(aes(fold_change, pvalue)) +
  ggplot2::geom_point() +
  ggplot2::scale_y_continuous(trans = "logit")
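The transform pair behind a logit scale is available in base R: qlogis() computes log(p / (1 - p)) and plogis() is its inverse, which is the forward/inverse pair a scales transformation object needs.

```r
# Forward and inverse logit via base R (sketch of what the scale uses)
p <- c(0.001, 0.05, 0.1)
qlogis(p)           # log(p / (1 - p)), maps (0,1) onto the real line
plogis(qlogis(p))   # inverse round-trips to the original proportions
```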
max.Date returns an integer and -Inf for a set of NA dates. This is usually inconvenient.
max_date(x, ...)
x: a vector of dates
...: ignored

A date; "0001-01-01" if there is no well defined maximum.
max_date(NA)
min.Date returns an integer and Inf for a set of NA dates. This is usually inconvenient.
min_date(x, ...)
x: a vector of dates
...: ignored

A date; "9999-12-31" if there is no well defined minimum.
min_date(NA)
Takes a list of times, classes and counts, e.g. a COG-UK variant-like data set with time, (multinomial) class (e.g. variant) and count being the count in that time period. Fits a quadratic B-spline on time to the proportion of the data using nnet::multinom, with approximately one degree of freedom per class and per window units of the time series.
multinomial_nnet_model(
  d = i_multinomial_input,
  ...,
  window = 14,
  frequency = "1 day",
  predict = TRUE
)
d: multiclass count input
...: not used; present to allow the proportion model to be used in a group_modify
window: the number of data points between knots; smaller values result in less smoothing, larger values in more
frequency: the density of the output estimates
predict: return a prediction. If FALSE we return the model.

A new dataframe with time (as a time period), class, and proportion.0.5 columns, or a model object.
if (FALSE) {
  # not run due to long run time
  tmp = ggoutbreak::england_covid %>%
    dplyr::filter(date > "2022-01-01") %>%
    ggoutbreak::multinomial_nnet_model(window = 21) %>%
    dplyr::glimpse()
}
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_incidence(
  modelled = i_timeseries,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)
modelled: model output from processing the
...: not used
population_unit: what population unit do you want the incidence in, e.g. per 100K
normalise_time: the default behaviour for incidence is to keep it in the same time units as the input data. If this parameter is set to
A dataframe with incidence rates per unit capita, containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale
incidence.per_capita.se.fit (double) - the standard error of the incidence per capita rate estimate on a log scale
incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)
incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)
incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)
population_unit (double) - The population unit on which the per capita incidence rate is calculated
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window = 21) %>%
  ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
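The arithmetic being applied is simple (sketch with illustrative numbers, not the package implementation, which also transforms the confidence intervals):

```r
# per-capita incidence = incidence / population * population_unit
incidence <- 150          # cases per day in one group (illustrative)
population <- 2.5e6       # group population size (illustrative)
population_unit <- 1e5    # report per 100,000
incidence / population * population_unit   # 6 cases per 100,000 per day
```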
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_incidence.incidence(
  modelled = i_incidence_model,
  pop = i_population_data,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)
modelled: model output from processing the
pop: the population data, which must be grouped in the same way as
...: not used
population_unit: what population unit do you want the incidence in, e.g. per 100K
normalise_time: the default behaviour for incidence is to keep it in the same time units as the input data. If this parameter is set to
A dataframe with incidence rates per unit capita, containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale
incidence.per_capita.se.fit (double) - the standard error of the incidence per capita rate estimate on a log scale
incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)
incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)
incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)
population_unit (double) - The population unit on which the per capita incidence rate is calculated
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
This assumes positive disease counts are stratified by a population grouping, e.g. geography or age, and that we have estimates of the size of that population during that time period. Normalising by population size allows us to compare groups.
normalise_incidence.proportion(
  modelled = i_proportion_model,
  ...,
  population_unit = 1e+05,
  normalise_time = FALSE
)
modelled |
Model output from processing the A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
not used |
population_unit |
the population unit in which to express the incidence, e.g. per 100K |
normalise_time |
The default behaviour for incidence is to keep it in
the same time units as the input data. If this parameter is set to |
This scales a proportion model by the population unit to make it comparable to an incidence model.
a dataframe with incidence rates per unit capita. A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.per_capita.fit (double) - an estimate of the incidence per capita rate on a log scale
incidence.per_capita.se.fit (double) - the standard error of the incidence per capita rate estimate on a log scale
incidence.per_capita.0.025 (positive_double) - lower confidence limit of the incidence per capita rate (true scale)
incidence.per_capita.0.5 (positive_double) - median estimate of the incidence per capita rate (true scale)
incidence.per_capita.0.975 (positive_double) - upper confidence limit of the incidence per capita rate (true scale)
population_unit (double) - The population unit on which the per capita incidence rate is calculated
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
This assumes case distribution proportions are stratified by a population grouping, e.g. geography or age, and that we have estimates of the size of that population during that time period. Normalising by population proportion allows us to compare groups.
normalise_proportion(
  modelled = i_proportion_model,
  base = i_baseline_proportion_data,
  ...
)
modelled |
Model output from processing the A dataframe containing the following columns:
No mandatory groupings. No default value. |
base |
The baseline data must be grouped in the same way as A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
not used |
a dataframe with incidence rates per unit capita. A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (double) - the standard error of the proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
risk_ratio.0.025 (positive_double) - lower confidence limit of the excess risk ratio for a population group
risk_ratio.0.5 (positive_double) - median estimate of the excess risk ratio for a population group
risk_ratio.0.975 (positive_double) - upper confidence limit of the excess risk ratio for a population group
baseline_proportion (proportion) - The population baseline risk from which the excess risk ratio is based
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  ggoutbreak::normalise_proportion(ggoutbreak::england_demographics) %>%
  dplyr::glimpse()
plot_growth_phase(tmp)
Plot an incidence or proportion vs. growth phase diagram
plot_growth_phase(
  modelled = i_timestamped,
  timepoints = NULL,
  duration = max(dplyr::count(modelled)$n),
  interval = 7,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  cis = TRUE,
  ...
)
modelled |
Either: A dataframe containing the following columns:
No mandatory groupings. No default value. OR: A dataframe containing the following columns:
No mandatory groupings. No default value. |
timepoints |
timepoints (as |
duration |
the length of the growth rate phase trail |
interval |
the length of time between markers on the phase plot |
mapping |
a |
cis |
should the phases be marked with confidence intervals? |
... |
Arguments passed on to
|
a ggplot timeseries
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))
# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)
timepoints = as.Date(c("Lockdown 1" = "2020-03-30", "Lockdown 2" = "2020-12-31"))
plot_growth_phase(tmp2, timepoints, duration=108)
Growth rate timeseries diagram
plot_growth_rate(
  modelled = i_timeseries,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
Either: A dataframe containing the following columns:
No mandatory groupings. No default value. OR: A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to |
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot timeseries
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))
# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)
# Default pdf device doesn't support unicode
plot_growth_rate(tmp2, colour="blue")
tmp3 = ggoutbreak::england_covid %>%
  proportion_locfit_model()
# Default pdf device doesn't support unicode
plot_growth_rate(tmp3)
Plot an incidence timeseries
plot_incidence(
  modelled = i_incidence_model,
  raw = i_incidence_data,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
An optional estimate of the incidence time series. If A dataframe containing the following columns:
No mandatory groupings. No default value.
|
raw |
The raw count data A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to
|
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot object
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
tmp_pop = ggoutbreak::england_demographics %>%
  dplyr::ungroup() %>%
  dplyr::summarise(population = sum(population))
# If the incidence is normalised by population
tmp2 = tmp %>%
  poisson_locfit_model() %>%
  normalise_incidence(tmp_pop)
plot_incidence(tmp2, tmp %>% dplyr::cross_join(tmp_pop), colour="blue", size=0.25)
Plot a multinomial proportions model
plot_multinomial(
  modelled = i_multinomial_proportion_model,
  ...,
  mapping = ggplot2::aes(fill = class),
  events = i_events,
  normalise = FALSE
)
modelled |
the multinomial count data A dataframe containing the following columns:
Must be grouped by: class (exactly). No default value. |
... |
Arguments passed on to |
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
normalise |
make sure the probabilities add up to one - this can be a bad idea if you know you may have missing values. |
a ggplot
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()
plot_multinomial(tmp, normalise=TRUE) +
  ggplot2::scale_fill_viridis_d()
Plot a proportions timeseries
plot_proportion(
  modelled = i_proportion_model,
  raw = i_proportion_data,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
Proportion model estimates A dataframe containing the following columns:
No mandatory groupings. No default value. |
raw |
Raw count data A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to
|
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot object
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()
plot_proportion(tmp) +
  ggplot2::scale_fill_viridis_d(aesthetics = c("fill","colour"))
Reproduction number timeseries diagram
plot_rt(
  modelled = i_reproduction_number,
  ...,
  mapping = if (interfacer::is_col_present(modelled, class))
    ggplot2::aes(colour = class) else ggplot2::aes(),
  events = i_events
)
modelled |
the modelled Rt estimate A dataframe containing the following columns:
No mandatory groupings. No default value. |
... |
Arguments passed on to |
mapping |
a |
events |
Significant events or time spans A dataframe containing the following columns:
No mandatory groupings. A default value is defined. |
a ggplot timeseries
# example code
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
if (FALSE) {
  tmp2 = tmp %>%
    poisson_locfit_model() %>%
    rt_from_growth_rate()
  # comparing Rt from growth rates with England consensus Rt:
  plot_rt(tmp2, colour="blue") +
    geom_errorbar(
      data = england_consensus_rt,
      mapping = aes(x=date-21, ymin=low, ymax=high),
      colour = "red"
    )
}
This uses a generalised linear model to fit a quasi-poisson model with a time-varying rate, modelled as a natural cubic spline with approximately one degree of freedom per window units of the time series.
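The general form can be sketched in base R (an assumption about the approach, not the package source): a quasi-poisson GLM on counts with a natural cubic spline in time, giving roughly one degree of freedom per window observations.

```r
library(splines)
set.seed(1)
# Simulated daily counts with a slowly varying rate
time <- 1:140
count <- rpois(length(time), lambda = exp(2 + sin(time / 20)))

window <- 14
df <- max(1, round(length(time) / window))  # ~1 degree of freedom per window

fit <- glm(count ~ ns(time, df = df), family = quasipoisson())
pred <- predict(fit, se.fit = TRUE)  # fit and standard error on the log scale
head(exp(pred$fit))                  # incidence estimates on the true scale
```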
poisson_glm_model(d = i_incidence_input, ..., window = 14, frequency = "1 day")
d |
Count model input A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.fit (double) - an estimate of the incidence rate on a log scale
incidence.se.fit (double) - the standard error of the incidence rate estimate on a log scale
incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)
incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)
incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  ggoutbreak::poisson_glm_model(window=21) %>%
  dplyr::glimpse()
Takes a list of times and counts and fits a quasi-poisson model with a log link function to the count data, using local regression from the locfit package.
poisson_locfit_model(
  d = i_incidence_input,
  ...,
  window = 14,
  deg = 1,
  frequency = "1 day",
  predict = TRUE
)
d |
input data A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
deg |
polynomial degree (min 1) - a higher degree results in less
smoothing, a lower degree in more. A degree of 1 fits
a piecewise linear model. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
predict |
result is a prediction dataframe. If false we return the
|
This results in an incidence rate estimate plus an absolute exponential growth rate estimate, both based on the time unit of the input data (e.g. for daily data the rate will be cases per day and the growth rate will be daily).
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
incidence.fit (double) - an estimate of the incidence rate on a log scale
incidence.se.fit (double) - the standard error of the incidence rate estimate on a log scale
incidence.0.025 (positive_double) - lower confidence limit of the incidence rate (true scale)
incidence.0.5 (positive_double) - median estimate of the incidence rate (true scale)
incidence.0.975 (positive_double) - upper confidence limit of the incidence rate (true scale)
growth.fit (double) - an estimate of the growth rate
growth.se.fit (double) - the standard error of the growth rate
growth.0.025 (double) - lower confidence limit of the growth rate
growth.0.5 (double) - median estimate of the growth rate
growth.0.975 (double) - upper confidence limit of the growth rate
No mandatory groupings.
No default value.
ggoutbreak::england_covid %>%
  ggoutbreak::poisson_locfit_model(window=21) %>%
  dplyr::glimpse()
This uses a generalised linear model to fit a quasi-binomial model with a time-varying rate, modelled as a natural cubic spline with approximately one degree of freedom per window units of the time series.
proportion_glm_model(
  d = i_proportion_input,
  ...,
  window = 14,
  frequency = "1 day"
)
d |
Proportion model input A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (double) - the standard error of the proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
No mandatory groupings.
No default value.
# TODO: find out cause of the warnings
# "observations with zero weight not used for calculating dispersion"
suppressWarnings(
  ggoutbreak::england_covid %>%
    ggoutbreak::proportion_glm_model(window=21) %>%
    dplyr::glimpse()
)
Takes a list of times, counts and a denominator and fits a quasi-binomial model with a logit link function to the proportion data, using local regression from the locfit package.
proportion_locfit_model(
  d = i_proportion_input,
  ...,
  window = 14,
  deg = 1,
  frequency = "1 day",
  predict = TRUE
)
d |
the input A dataframe containing the following columns:
Ungrouped. No default value. |
... |
not used and present to allow proportion model to be used in a
|
window |
a number of data points defining the bandwidth of the estimate;
smaller values result in less smoothing, larger values in more. The default
value of 14 is calibrated for data provided at a daily frequency; with
weekly data a lower value may be preferred. - (defaults to |
deg |
polynomial degree (min 1) - a higher degree results in less
smoothing, a lower degree in more. A degree of 1 fits
a piecewise linear model. - (defaults to |
frequency |
the density of the output estimates as a time period such as
|
predict |
result is a prediction dataframe. If false we return the
|
This expects d to contain time, count and denom columns, where denom is the relevant denominator - e.g. all tests conducted.
This results in a one-versus-others comparison binomial proportion estimate plus a relative growth rate estimate specifying how much quicker this is growing compared to the growth of the denominator.
The denominator may be the sum of all subgroups (denom = sum(count)), e.g. in the situation where there are multiple variants of a disease circulating, in which case the relative growth is that of the subgroup compared to the overall. You can make this a one-versus-others comparison by making the denominator exclude the current item (e.g. denom = sum(count) - count).
The denominator can also be used to express the size of the population tested. This gives us a relative growth rate that is different in essence to the previous and may be a better estimate of the true growth rate in the situation where testing effort is variable, or capacity saturated.
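A hypothetical grouped dataset shows how the denominator choices described above are constructed (the column names follow the package's conventions, but the data is made up for illustration):

```r
library(dplyr)
df <- data.frame(
  time  = rep(1:3, each = 2),
  class = rep(c("variant1", "variant2"), 3),
  count = c(10, 5, 12, 8, 15, 9)
)
# subgroup versus the overall total:
df %>% group_by(time) %>% mutate(denom = sum(count))
# one-versus-others: denominator excludes the current class:
df %>% group_by(time) %>% mutate(denom = sum(count) - count)
```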
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
proportion.fit (double) - an estimate of the proportion on a logit scale
proportion.se.fit (double) - the standard error of the proportion estimate on a logit scale
proportion.0.025 (proportion) - lower confidence limit of proportion (true scale)
proportion.0.5 (proportion) - median estimate of proportion (true scale)
proportion.0.975 (proportion) - upper confidence limit of proportion (true scale)
relative.growth.fit (double) - an estimate of the relative growth rate
relative.growth.se.fit (double) - the standard error of the relative growth rate
relative.growth.0.025 (double) - lower confidence limit of the relative growth rate
relative.growth.0.5 (double) - median estimate of the relative growth rate
relative.growth.0.975 (double) - upper confidence limit of the relative growth rate
No mandatory groupings.
No default value.
ggoutbreak::england_covid %>%
  ggoutbreak::proportion_locfit_model(window=21) %>%
  dplyr::glimpse()
Reband a discrete quantity, e.g. an age-banded population, or a discrete probability distribution, e.g. a serial interval distribution. This method fits a monotonically increasing spline to the cumulative distribution (including the upper and lower limits) and interpolates using that spline to the new cut points.
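The idea can be sketched with a monotone spline in base R (a simplified assumption about the method; the real function also handles labelling, limits and rounding):

```r
x    <- c(15, 65, 80, 120)  # upper limits of input bands (the open 80+ band
                            # closed at an assumed xlim of 120)
y    <- c(30, 55, 10, 5)    # quantity in each band
xout <- c(5, 10, 40, 120)   # new upper limits

# Monotone spline through the cumulative totals, then difference at new cuts
cdf <- splinefun(c(0, x), c(0, cumsum(y)), method = "hyman")
diff(cdf(c(0, xout)))       # rebanded quantities; sums to sum(y)
```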
reband_discrete(
  x,
  y,
  xout,
  xlim = c(0, NA),
  ytotal = c(0, sum(y)),
  digits = 0,
  labelling = c("positive_integer", "inclusive", "exclusive"),
  sep = "-"
)
x |
a set of upper limits of bands, e.g. for age: 0-14;15-64;65-79;80+ is 15,65,80,NA |
y |
a set of quantities for each band e.g. population figures |
xout |
a set of new upper limits |
xlim |
Upper and lower limits for x. If the last band is e.g. 80+ in the input and we want to know the 85+ band in the output, some kind of maximum upper limit is needed to interpolate to. |
ytotal |
upper and lower limits for y. If the interpolation values fall outside of
x then the min and max limits of y are given by this. This would be |
digits |
if the |
labelling |
are the |
sep |
separator for names e.g. |
a rebanded set of discrete values, guaranteed to sum to the same as y
ul = stringr::str_extract(england_demographics$class, "_([0-9]+)", group = 1) %>%
  as.numeric()
tmp = reband_discrete(
  ul, england_demographics$population,
  c(5,10,15,40,80), xlim=c(0,120)
)
tmp
sum(tmp)
sum(england_demographics$population)
Calculate a reproduction number estimate from incidence data using the EpiEstim library and an empirical generation time distribution. This uses resampling to propagate uncertainty in the generation time estimates. It is quite slow for each time series, depending on the number of bootstraps and samples in the infectivity profile.
rt_epiestim(
  df = i_incidence_input,
  ip = i_infectivity_profile,
  bootstraps = 2000,
  window = 14,
  mean_prior = 1,
  std_prior = 2,
  ...
)
df |
Count data. Extra groups are allowed. A dataframe containing the following columns:
Ungrouped. No default value. |
ip |
infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
bootstraps |
|
window |
|
mean_prior |
the prior for the $R_t$ estimate. When sample size is low the $R_t$ estimate will revert to this prior. In EpiEstim the default is a high number to allow detection of insufficient data but this tends to create anomalies in the early part of infection timeseries. A possible value is $R_0$ but in fact this also will be a poor choice for the value of $R_t$ when case numbers drop to a low value. |
std_prior |
the prior for the $R_t$ SD. |
... |
not used |
This will calculate a reproduction number for each group in the input dataframe.
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
if (FALSE) {
  # not run due to long running
  tmp2 = tmp %>% rt_epiestim()
}
Calculate a reproduction number estimate from the growth rate using the Wallinga & Lipsitch (2007) method and an empirical generation time distribution. This uses resampling to propagate uncertainty in the growth rate estimates.
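The core of the Wallinga & Lipsitch relationship is R = 1 / M(-r), where M is the moment generating function of the generation interval distribution. A minimal sketch of that relationship, with an assumed (hypothetical) generation interval:

```r
r  <- 0.05                               # growth rate per day
tj <- 1:14                               # days since infection
wj <- dgamma(tj, shape = 2, rate = 0.4)  # hypothetical generation interval
wj <- wj / sum(wj)                       # discretise to sum to 1

R <- 1 / sum(wj * exp(-r * tj))          # implied reproduction number
R
```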
rt_from_growth_rate( df = i_growth_rate, ip = i_infectivity_profile, bootstraps = 2000 )
df |
Growth rate estimates A dataframe containing the following columns:
No mandatory groupings. No default value. |
ip |
Infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
bootstraps |
|
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
No mandatory groupings.
No default value.
tmp = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count))
if (FALSE) {
  # not run
  tmp2 = tmp %>%
    poisson_locfit_model() %>%
    rt_from_growth_rate()
}
Calculate a reproduction number estimate from growth rate using the methods described in the vignette "Estimating the reproduction number from modelled incidence" and using an empirical generation time distribution.
rt_from_incidence(df = i_incidence_model, ip = i_infectivity_profile)
df |
Count data A dataframe containing the following columns:
No mandatory groupings. No default value. |
ip |
Infectivity profile A dataframe containing the following columns:
Must be grouped by: boot (exactly). A default value is defined. |
A dataframe containing the following columns:
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
rt.fit (double) - an estimate of the reproduction number
rt.se.fit (double) - the standard error of the reproduction number
rt.0.025 (double) - lower confidence limit of the reproduction number
rt.0.5 (double) - median estimate of the reproduction number
rt.0.975 (double) - upper confidence limit of the reproduction number
No mandatory groupings.
No default value.
df = ggoutbreak::england_covid %>%
  time_aggregate(count=sum(count)) %>%
  poisson_locfit_model()
if (FALSE) {
  # not run
  tmp2 = df %>% rt_from_incidence()
}
A log1p y scale
scale_y_log1p(..., n = 5, base = 10, dp = 0)
... |
Other arguments passed on to |
n |
the number of major breaks |
base |
the base for the logarithm |
dp |
decimal points |
a ggplot scale
A logit y scale
scale_y_logit(...)
... |
Other arguments passed on to |
a ggplot scale
Aggregate time series data preserving the time series
time_aggregate( df = i_timestamped, ..., .groups = NULL, .cols = NULL, .fns = NULL )
df |
an optionally grouped time series. Grouping should not include the time
column. The grouping works differently from |
... |
A set of |
.groups |
as per |
.cols |
Optional tidyselect column specification for |
.fns |
Optional a set of function specifications as per |
the summarised time series preserving the time
column, and with the grouping
structure involving one fewer level than the input
ggoutbreak::england_covid %>%
  time_aggregate(count = sum(count), denom = sum(denom)) %>%
  dplyr::glimpse()
ggoutbreak::england_covid %>%
  time_aggregate(.fns=mean) %>%
  dplyr::glimpse()
This is principally designed to take a record of single events and produce a summary time series of event counts by group, class and date. The default behaviour is to guess the cadence of the input data and summarise the event line list to a (set of) regular time series counts for use in incidence and growth rate estimates.
time_summarise( df = i_dated, unit, anchor = "start", rectangular = FALSE, ..., .fill = list(count = 0) )
df |
a line list of data you want to summarise, optionally grouped.
If this is grouped then each group is treated independently. The remaining
columns must contain a |
unit |
a period e.g. "1 week" |
anchor |
one of a date, "start", "end", or a weekday name (e.g. "mon"). The anchor will always fall at the start of one of the time periods being cut into |
rectangular |
should the resulting time series be the same length for all groups? This is only safe if you can be sure that your data is complete for all subgroups, otherwise missing data will be treated as zero counts. This matters when leading or trailing gaps in one subgroup are due to a reporting delay in that subgroup, in which case a rectangular time series will erroneously fill in zero counts for the missing data. |
... |
a specification for dplyr::summarise(...) - optional, and if not provided a
|
.fill |
a list similar to tidyr::complete for values to fill variables with |
If the data is given with a class
column, the time series are interpreted as
having a denominator consisting of all the different classes within each time period.
Classes may be subtypes (e.g. variants, serotypes) or markers for test positivity.
In either case the resulting time series will contain counts for each class and
denominators for their combination.
There is flexibility for other kinds of summarisation if the raw data is not
count based (e.g. means of continuous variables), but in that case the slider
package is usually a better choice, as time_summarise only considers
non-overlapping time periods of fixed length.
There is another use case where an existing time series at one
frequency is aggregated to a less frequent basis (e.g. moving from a
daily time series to a weekly one). In this case the input will already contain a
count
column. In this mode no checks are made that all of the more frequent events
are present before summarisation, so the result may aggregate different numbers
of input periods (e.g. going from weeks to months, each month may contain 4 or 5
weeks).
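The caveat above can be made concrete with a small sketch (Python, hypothetical data; not ggoutbreak's implementation): binning 26 consecutive week-start dates by calendar month shows that monthly aggregates cover an uneven number of weeks.

```python
from collections import Counter
from datetime import date, timedelta

# 26 weekly periods starting Monday 2020-01-06, grouped by the
# calendar month their start date falls in. Some months span 4
# weeks, others 5, so weekly-to-monthly totals are not comparable
# without normalising for period length.
weeks = [date(2020, 1, 6) + timedelta(weeks=i) for i in range(26)]
weeks_per_month = Counter((d.year, d.month) for d in weeks)
```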
The output depends on whether or not the input was grouped and
had a class
column. The most detailed output will be:
A dataframe containing the following columns:
denom (positive_integer) - Total test counts associated with the specified timeframe
count (positive_integer) - Positive case counts associated with the specified timeframe
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
No mandatory groupings.
No default value.
or a more minimal output if the input is only a plain list of dated events:
A dataframe containing the following columns:
count (positive_integer) - Positive case counts associated with the specified timeframe
time (ggoutbreak::time_period + group_unique) - A (usually complete) set of singular observations per unit time as a time_period
No mandatory groupings.
No default value.
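The core counting step described above can be sketched as follows (Python, purely illustrative; `weekly_counts` is a hypothetical helper, not ggoutbreak's code): bin a line list of event dates into non-overlapping fixed-width periods from an anchor, filling empty periods with zero counts.

```python
from collections import Counter
from datetime import date, timedelta

def weekly_counts(event_dates, anchor=None):
    """Hedged sketch of summarising a line list to a regular weekly
    count time series: events are binned into non-overlapping 7-day
    periods anchored at the earliest date (or a supplied anchor),
    and intervening empty weeks get an explicit zero count."""
    start = anchor or min(event_dates)
    idx = Counter((d - start).days // 7 for d in event_dates)
    n = max(idx) + 1
    return [
        {"time": start + timedelta(weeks=i), "count": idx.get(i, 0)}
        for i in range(n)
    ]
```

Note this only extends to the last observed event, which is why the rectangular argument matters: trailing gaps in one subgroup may be reporting delay rather than true zeros.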
Convert a set of timepoints to dates
time_to_date( timepoints, unit = attr(timepoints, "unit"), start_date = attr(timepoints, "start_date") )
timepoints |
a set of numeric time points |
unit |
the period / unit of the time points, which will be extracted from timepoints if possible |
start_date |
the zero day of the time series, will be extracted from timepoints if possible |
a vector of dates
times = date_to_time(as.Date("2019-12-29")+0:100, "1 week") dates = time_to_date(times)
Type coercion to a time_period class
type.time_period(x)
x |
a vector to be coerced to a time period |
a time_period
or an error
type.time_period(1:100)
Calculate the reproduction number from a growth rate estimate and an infectivity profile
wallinga_lipsitch(r, y, a = 1:length(y))
r |
a growth rate (may be a vector) |
y |
an empirical infectivity profile as a probability vector, starting at |
a |
the end time of the estimate (defaults to single days). |
a reproduction number estimate based on r
wallinga_lipsitch(r=seq(-0.1,0.1,length.out=9), y=dgamma(1:50, 5,2))
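The underlying relation is the Wallinga & Lipsitch (2007) link between the exponential growth rate and the reproduction number via the infectivity profile. The sketch below (Python, assuming unit-width intervals; ggoutbreak's exact discretisation may differ) shows the discrete form: R is the growth rate divided by the profile-weighted sum of exponential decay over each interval.

```python
import math

def wallinga_lipsitch_R(r, y):
    """Sketch of the Wallinga & Lipsitch growth-rate-to-R relation
    for a discrete infectivity profile y, where y[a-1] is the
    probability mass on the interval ending at a = 1, 2, ...
    Assumes unit-width intervals; not ggoutbreak's exact code."""
    if r == 0:
        return 1.0  # limit of the expression below as r -> 0
    # R = r / sum_a y_a * (exp(-r*(a-1)) - exp(-r*a))
    denom = sum(
        p * (math.exp(-r * (a - 1)) - math.exp(-r * a))
        for a, p in enumerate(y, start=1)
    )
    return r / denom
```

As expected, r = 0 maps to R = 1, positive growth rates map to R > 1, and negative growth rates to R < 1, with the infectivity profile controlling how steeply R departs from 1.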