--- title: "Population comparisons and incidence" output: html_document vignette: > %\VignetteIndexEntry{Population comparisons and incidence} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) library(patchwork) library(ggoutbreak) here::i_am("vignettes/incidence-trends.Rmd") source(here::here("vignettes/vignette-utils.R")) ``` Comparisons between epidemic states in different populations separated by for example geography, or in this case age, requires a normalisation by the population size in either absolute or proportional terms. Based on the 2021 census we have some demo data for the demographics of England: ```{r} dplyr::glimpse(england_demographics) ``` # Incidence Poisson rate model A plot of the normalised incidence rates of COVID-19 by population size, shows initially the rate of COVID cases was highest in the elderly. By late 2020 this pattern had changed and rates were uniform accross age groups. In early 2021 as vaccination took hold and school testing was rolled out, younger age groups had higher rates of COVID positive tests, with a curious spike in the very young age groups around November 2021. In early 2022 the pattern reversed and the elderly again became the age group with the highest rates, and that pattern has persisted until the present. ```{r} tmp = ggoutbreak::england_covid %>% ggoutbreak::poisson_locfit_model(window=21) %>% ggoutbreak::normalise_incidence(ggoutbreak::england_demographics) raw_pop = ggoutbreak::england_covid %>% dplyr::inner_join(england_demographics, by="class") plot_incidence(tmp,raw = raw_pop, size=0.25)+scale_y_log1p(n=7)+ ggplot2::scale_colour_viridis_d(aesthetics = c("fill","colour")) ``` The use of test positives as a proxy for COVID incidence is clearlly potentially biased by testing (partilcularly in the first wave where testing was limited to those in hospital). A more reliable comparison in this situation would be a test positivie proportion, but unfortunately testing rates are not published broken down by age. The exponential growth rate is already normalised by population size. Comparisons of the growth rate in these populations gives an idea of how tightly coupled they are. In most age groups the epidemic is growing and shrinking in sync apart from possibly the very young. COVID detections in this age group were not particularly reliable though and this would be easy to over-interpret. ```{r} plot_growth_rate(tmp)+ ggplot2::scale_fill_viridis_d(aesthetics = c("fill","colour"))+ ggplot2::coord_cartesian(ylim=c(-0.15,0.15)) ``` The combination of growth and normalised incidence allows us to compare the epidemic state at different time points, in this case Christmas day in 2020, 2021 and 2022. This shows the same data as the previous graphs. ```{r} plot_growth_phase(tmp, timepoints = as.Date(c("Xmas 2020"="2020-12-25","Xmas 2021"="2021-12-25","Xmas 2022"="2022-12-25")), duration = 70, interval = 7 )+ ggplot2::scale_colour_viridis_d() ``` # Proportion model There are two possible proportions models which woudl be of interest here. As mentioned about the proportion of positive tests in each age group would give us a clearer picture of whether the differences between age groups are down to differential testing, but unfortunately not available in this data set. The second potential use is the distribution of ages in the test positive age group. This age distribution gives us some information about where the burden of disease is in the population but is also biased by test prioritisation. A multinomial proportion shows similar patterns as the normalised incidence plot above: ```{r} tmp2 = ggoutbreak::england_covid %>% ggoutbreak::proportion_locfit_model(window=21) p1 = plot_multinomial(tmp2,normalise = TRUE)+ ggplot2::scale_fill_viridis_d() p2 = ggplot2::ggplot(england_demographics)+ ggplot2::geom_bar(ggplot2::aes(x="baseline",y=population/sum(population)*100,fill=class), stat="identity", position="stack", colour="black", linewidth=0.1)+ ggplot2::scale_fill_viridis_d(guide="none")+ ggplot2::xlab(NULL)+ ggplot2::ylab(NULL)+ ggplot2::theme(axis.text.y = ggplot2::element_blank())+ ggplot2::coord_cartesian(expand=FALSE) p1+p2+patchwork::plot_layout(nrow=1,widths = c(20,1),guides = "collect") ``` The age distribution of test positives can be normalised by the age distribution of the population. This give us a relative proportion of age groups in people testing positive versus expected in the population. This is conceptually a relative risk but of age group given COVID status i.e. it is $\frac{P(age = 80+|COVID+)}{P(age = 80+)}$ At any point in time for a given population this quantity is centred around 1 and comparing it to growth rate gives us a possibly clearer picture of the trajectory of the relative distribution of COVID in the population. In Xmas 2021 although the majority of cases was in the young, relatively high growth in the elderly population meant that they were catching up, and from above we can see that by early 2022 the elderly had highest COVID positive rates. In 2022 however, the separation of the age groups was established and the trajectories were acting to preserve that separation. ```{r} tmp3 = tmp2 %>% normalise_proportion(england_demographics) plot_growth_phase(tmp3, timepoints = as.Date(c("Xmas 2020"="2020-12-25","Xmas 2021"="2021-12-25","Xmas 2022"="2022-12-25")), duration = 70, interval = 7 )+ ggplot2::scale_colour_viridis_d() ``` # TODO: Regional breakdown of testing effort and positivity by age group was published as part of test and trace. This had a by-age and by-region breakdown, until they shut down test and trace. https://www.gov.uk/government/publications/weekly-statistics-for-nhs-test-and-trace-england-2-to-15-june-2022 From this we could look at by age group proportion and incidence models and look into ascertainment bias in age groups. ```{r} prop = ggoutbreak::england_covid_proportion %>% ggoutbreak::proportion_locfit_model(window=5) plot_proportion(prop)+ggplot2::scale_fill_viridis_d(aesthetics = c("colour","fill")) tmp_pop = ggoutbreak::england_covid_proportion %>% dplyr::select(class,population) %>% dplyr::distinct() pois = ggoutbreak::england_covid_proportion %>% ggoutbreak::poisson_locfit_model(window=5) %>% ggoutbreak::normalise_incidence(tmp_pop) plot_incidence(pois)+ggplot2::scale_fill_viridis_d(aesthetics = c("colour","fill")) ``` ## Pre test probability Using the covid infection survey we can look at population prevalence based on random sampling versus estimates of incidence based on test positivity. This has some relationship to the pre test probability of testing, although would need some sort of accounting for the fact that one is incidence and the other prevalence. ```{r} p1 = ggoutbreak::plot_proportion( ggoutbreak::england_ons_infection_survey %>% dplyr::filter(geography == "England") %>% dplyr::mutate( proportion.fit = NA, proportion.se.fit=NA, ),events = ggoutbreak::england_events)+ ggplot2::theme( axis.title.x = ggplot2::element_blank(), axis.text.x = ggplot2::element_blank(), axis.text.x.bottom = ggplot2::element_blank() ) p2 = ggoutbreak::england_covid_pcr_positivity %>% proportion_locfit_model() %>% dplyr::inner_join(ggoutbreak::england_ons_infection_survey %>% dplyr::filter(geography=="England") %>% dplyr::rename_with(.cols = starts_with("proportion"), .fn = ~stringr::str_replace(.x,"proportion","ons")), by=c("time") ) %>% dplyr::transmute(date=date, pre_test_odds = proportion.0.5 / ons.0.5) %>% ggplot2::ggplot(ggplot2::aes(x=date,y=pre_test_odds)) + ggplot2::geom_line() + ggplot2::geom_hline(yintercept=1, colour="grey40",linetype="dashed") + ggplot2::ylab("Pre-test odds ratio COVID") + ggoutbreak::geom_events(events = ggoutbreak::england_events,hide_labels = TRUE)+ ggplot2::theme( axis.title.x = ggplot2::element_blank(), axis.text.x = ggplot2::element_blank(), axis.text.x.bottom = ggplot2::element_blank(), axis.text.x.top = ggplot2::element_blank() ) ``` Likewise the test positives per head of population can be compared to the prevalence. In this case the ratio has a connection with the ascertainment rate although the infectivity profile needs to be taken into account here (or more accurately the probability the test is positive given the sample is taken on a specific day post infection and the patient is infected), as the prevalence number represents people who are infectious on a given day, and the test positives is closer to the incidence. ```{r} england_pop = sum(ggoutbreak::england_demographics$population) p3 = ggoutbreak::england_covid_pcr_positivity %>% dplyr::select(-denom) %>% dplyr::mutate(denom = england_pop) %>% proportion_locfit_model() %>% dplyr::inner_join(england_ons_infection_survey %>% dplyr::filter(geography=="England") %>% dplyr::rename_with(.cols = starts_with("proportion"), .fn = ~stringr::str_replace(.x,"proportion","ons")), by=c("time") ) %>% dplyr::transmute(date=date, ascertainment = proportion.0.5 / ons.0.5 * 100) %>% ggplot2::ggplot(ggplot2::aes(x=date,y=ascertainment)) + ggplot2::geom_line() + ggplot2::ylab("Ascertainment rate (%)") + ggoutbreak::geom_events(events = ggoutbreak::england_events,hide_labels = TRUE) p1+p2+p3+patchwork::plot_layout(ncol=1) ```