| Title: | Helpful Functions for Cleaning Surveillance Data |
|---|---|
| Description: | Helpful functions for the cleaning and manipulation of surveillance data, especially with regard to the creation and validation of panel data from individual-level surveillance data. |
| Authors: | Richard Aubrey White [aut, cre] (ORCID: <https://orcid.org/0000-0002-6747-1726>) |
| Maintainer: | Richard Aubrey White |
| License: | MIT + file LICENSE |
| Version: | 2026.7.1 |
| Built: | 2026-07-02 10:21:23 UTC |
| Source: | https://github.com/niphr/cstidy |
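Attempts to expand the dataset forward in time, adding rows so that each time series extends up to the specified max_isoyear, max_isoyearweek, or max_date.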
A time series is defined as a unique combination of:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
*_id
*_tag
expand_time_to(x, max_isoyear = NULL, max_isoyearweek = NULL, max_date = NULL, ...)
| Argument | Description |
|---|---|
| x | An object of type |
| max_isoyear | Maximum isoyear to expand each isoyear time series up to. |
| max_isoyearweek | Maximum isoyearweek to expand each isoyearweek time series up to. |
| max_date | Maximum date to expand each daily time series up to. |
| ... | Not used. |
csfmt_rts_data_v2, a larger dataset with additional rows for the added time points.
Other csfmt_rts_data:
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
set_csfmt_rts_data_v3(),
unique_time_series()
x <- cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2()
cstidy::expand_time_to(x, max_isoyearweek = "2022-10")
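As a rough illustration of the idea behind expand_time_to for daily data, the sketch below uses only base R on a plain data.frame. The helper expand_daily_sketch and its keys argument are hypothetical, not part of cstidy: it groups rows into time series by the key columns, then appends one row per missing day up to max_date, with the value columns set to NA. The real function works on csfmt_rts_data_v2 objects and also handles isoyear and isoyearweek series.

```r
# Hypothetical base-R sketch, not cstidy's implementation.
expand_daily_sketch <- function(d, max_date, keys = c("location_code", "age", "sex")) {
  max_date <- as.Date(max_date)
  # one data.frame per time series (unique combination of the key columns)
  pieces <- split(d, d[keys], drop = TRUE)
  out <- lapply(pieces, function(p) {
    last <- max(p$date)
    if (last >= max_date) return(p)
    new_dates <- seq(last + 1, max_date, by = "day")
    # copy the key columns from the first row, one row per new date
    new_rows <- p[rep(1, length(new_dates)), keys, drop = FALSE]
    new_rows$date <- new_dates
    # values for the added days are unknown
    for (col in setdiff(names(p), c(keys, "date"))) new_rows[[col]] <- NA
    rbind(p, new_rows[names(p)])
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

d <- data.frame(
  location_code = c("a", "a", "b"),
  age = "total",
  sex = "total",
  date = as.Date(c("2022-01-01", "2022-01-02", "2022-01-03")),
  cases_n = c(5, 6, 7)
)
x <- expand_daily_sketch(d, max_date = "2022-01-04")
x
```

Series "a" gains two rows and series "b" gains one, so each ends on 2022-01-04.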
Generates some test data
generate_test_data(fmt = "csfmt_rts_data_v2")
| Argument | Description |
|---|---|
| fmt | Data format ( |
csfmt_rts_data_v2, a dataset containing fake data.
cstidy::generate_test_data("csfmt_rts_data_v2")
Looks up the time columns (such as isoyear, isoweek, season, and date) that correspond to a vector of dates, isoyearweeks, or isoyears, returning them as a data.table restricted to the requested columns.
heal_time_csfmt_rts_data_v1(x, cols, granularity_time = "date")
| Argument | Description |
|---|---|
| x | A vector containing either dates, isoyearweek, or isoyear. |
| cols | Columns to restrict the output to. |
| granularity_time | date, isoyearweek, or isoyear, depending on the values contained in x. |
data.table, a dataset with time columns corresponding to the values given in x.
cstidy::heal_time_csfmt_rts_data_v1(
  as.Date(c("2022-01-01", "2022-06-15")),
  cols = c("isoyear", "isoyearweek", "date"),
  granularity_time = "date"
)
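For granularity_time = "date", several of these time columns can be derived directly from the dates with base R's ISO 8601 format codes (%G for the ISO year, %V for the ISO week). The sketch below is a hypothetical helper, not cstidy's implementation. It covers only isoyear, isoweek, isoyearweek, calyear, and calmonth (season, seasonweek, and calyearmonth are left out because this sketch does not assume their exact formats) and mimics cols by subsetting the result.

```r
# Hypothetical base-R sketch, not cstidy's implementation.
time_columns_from_date <- function(x, cols) {
  res <- data.frame(
    isoyear = as.integer(format(x, "%G")),
    isoweek = as.integer(format(x, "%V")),
    isoyearweek = format(x, "%G-%V"),
    calyear = as.integer(format(x, "%Y")),
    calmonth = as.integer(format(x, "%m")),
    date = x
  )
  # restrict the output to the requested columns
  res[, cols, drop = FALSE]
}

res <- time_columns_from_date(
  as.Date(c("2022-01-01", "2022-06-15")),
  cols = c("isoyear", "isoyearweek", "date")
)
res
```

Note that 2022-01-01 falls in ISO week 52 of 2021, which is why isoyear and calyear can differ around New Year.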
Looks up the time columns (such as isoyear, isoweek, isoquarter, season, and date) that correspond to a vector of dates, isoyearweeks, seasons, or isoyears, returning them as a data.table restricted to the requested columns.
heal_time_csfmt_rts_data_v2(x, cols, granularity_time = "date")
| Argument | Description |
|---|---|
| x | A vector containing dates, isoyearweek, season, or isoyear. |
| cols | Columns to restrict the output to. |
| granularity_time | One of "date", "isoyearweek", "season", or "isoyear", matching the values contained in x. |
data.table, a dataset with time columns corresponding to the values given in x.
cstidy::heal_time_csfmt_rts_data_v2(
  c("2022-01", "2022-02"),
  cols = c("isoyear", "isoweek", "season", "date"),
  granularity_time = "isoyearweek"
)
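Going the other way, an isoyearweek string has to be mapped to a calendar date. A common approach, shown here as a hypothetical base-R helper rather than cstidy's own code, relies on 4 January always falling in ISO week 1: step back to that week's Monday, then add whole weeks. Which weekday cstidy stores in date for weekly data is not stated here, so the sketch simply returns the Monday.

```r
# Hypothetical base-R sketch, not cstidy's implementation.
isoyearweek_to_columns <- function(x) {
  isoyear <- as.integer(substr(x, 1, 4))
  isoweek <- as.integer(substr(x, 6, 7))
  # 4 January is always in ISO week 1; %u gives the weekday with Monday = 1
  jan4 <- as.Date(sprintf("%d-01-04", isoyear))
  week1_monday <- jan4 - (as.integer(format(jan4, "%u")) - 1)
  data.frame(
    isoyear = isoyear,
    isoweek = isoweek,
    isoyearweek = x,
    monday = week1_monday + 7 * (isoweek - 1)
  )
}

res <- isoyearweek_to_columns(c("2022-01", "2022-02"))
res
```

Formatting the returned Mondays with "%G-%V" recovers the input strings, which is a cheap consistency check.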
Summarises the data structure of a single column inside a dataset. For each
combination of granularity_time, granularity_geo, age, and sex it records
whether the column is structurally missing, only NA, only data, or a mix of
data and NA. The result can be passed to plot() for a visual overview.
identify_data_structure(x, col, ...)
## S3 method for class 'csfmt_rts_data_v2'
identify_data_structure(x, col, ...)
## S3 method for class 'tbl_Microsoft SQL Server'
identify_data_structure(x, col, ...)
| Argument | Description |
|---|---|
| x | An object of type |
| col | Column name (character) whose data structure is summarised. |
| ... | Arguments passed to or from other methods. |
csfmt_rts_data_structure_hash_v2, a summary object that can be plotted.
Other csfmt_rts_data:
expand_time_to(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
set_csfmt_rts_data_v3(),
unique_time_series()
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
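The classification can be approximated on a plain data.frame as below. This is a hypothetical base-R sketch, not cstidy's code. It assumes that "structurally missing" means a combination of the grouping columns that never occurs in the data, and the status labels are chosen for illustration only.

```r
# Hypothetical base-R sketch, not cstidy's implementation.
classify_column_structure <- function(d, col,
                                      by = c("granularity_time", "granularity_geo", "age", "sex")) {
  # every combination of the observed levels of the grouping columns
  grid <- expand.grid(lapply(d[by], unique), stringsAsFactors = FALSE)
  key_d <- do.call(paste, c(d[by], sep = "|"))
  key_grid <- do.call(paste, c(grid[by], sep = "|"))
  grid$status <- vapply(key_grid, function(k) {
    v <- d[[col]][key_d == k]
    if (length(v) == 0) {
      "structurally missing"
    } else if (all(is.na(v))) {
      "only NA"
    } else if (!anyNA(v)) {
      "only data"
    } else {
      "data and NA"
    }
  }, character(1), USE.NAMES = FALSE)
  grid
}

d <- data.frame(
  granularity_time = "date",
  granularity_geo = c("nation", "county", "county", "county"),
  age = "total",
  sex = c("total", "total", "male", "male"),
  deaths_n = c(1, NA, 3, NA)
)
res <- classify_column_structure(d, "deaths_n")
res
```

Here the nation/male stratum never appears in d, so it is reported as structurally missing rather than as NA.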
This data comes from the Norwegian Surveillance System for Communicable Diseases (MSIS). The date corresponds to when the PCR test was taken.
nor_covid19_cases_by_time_location_csfmt_rts_v1
A csfmt_rts_data_v1 with 11028 rows and 18 variables:
day/isoweek
nation, county
nor
norge, 11 counties
2020
total
Isoyear of event
Isoweek of event
Isoyearweek of event
Season of event
Seasonweek of event
Calyear of event
Calmonth of event
Calyearmonth of event
Date of event
Number of confirmed covid19 cases
Number of confirmed covid19 cases per 100,000 population
The raw number of cases and cases per 100,000 population are recorded.
This data was extracted on 2022-05-04.
head(cstidy::nor_covid19_cases_by_time_location_csfmt_rts_v1)
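The per-100,000 rate is the case count divided by the population, multiplied by 100,000. The counts and population figures below are made up for illustration, and the variable names are hypothetical; the dataset documentation does not state which population denominators were used.

```r
# Hypothetical counts and populations, for illustration only.
cases_n <- c(50, 3)
population <- c(250000, 1000000)
cases_pr100000 <- cases_n / population * 100000
cases_pr100000
```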
This data was extracted on 2022-05-04.
nor_covid19_icu_and_hospitalization_csfmt_rts_v1
A csfmt_rts_data_v1 with 919 rows and 18 variables:
day/isoweek
nation
nor
norge
2020
total
Isoyear of event
Isoweek of event
Isoyearweek of event
Season of event
Seasonweek of event
Calyear of event
Calmonth of event
Calyearmonth of event
Date of event
Number of new admissions to the ICU with a positive PCR test
Number of new hospitalizations with Covid-19 as the primary cause
head(cstidy::nor_covid19_icu_and_hospitalization_csfmt_rts_v1)
Remove class csfmt_rts_data_*
remove_class_csfmt_rts_data(x)
| Argument | Description |
|---|---|
| x | data.table |
No return value, called for the side effect of removing the csfmt_rts_data class from x.
Other csfmt_rts_data:
expand_time_to(),
identify_data_structure(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
set_csfmt_rts_data_v3(),
unique_time_series()
x <- cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2()
class(x)
cstidy::remove_class_csfmt_rts_data(x)
class(x)
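Conceptually, this drops the csfmt_rts_data_* entry from the class vector. The base-R sketch below (strip_csfmt_class is a hypothetical name) shows that idea. Unlike cstidy::remove_class_csfmt_rts_data(), assigning class() in base R modifies a copy rather than x by reference, so the result has to be reassigned.

```r
# Hypothetical base-R sketch, not cstidy's implementation.
strip_csfmt_class <- function(x) {
  cls <- class(x)
  class(x) <- cls[!startsWith(cls, "csfmt_rts_data")]
  x
}

x <- data.frame(a = 1:3)
class(x) <- c("csfmt_rts_data_v2", class(x))
y <- strip_csfmt_class(x)
class(x)  # original still carries the csfmt class
class(y)
```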
set_csfmt_rts_data_v1 converts a data.table to csfmt_rts_data_v1 by reference.
csfmt_rts_data_v1 creates a new csfmt_rts_data_v1 (not by reference) from either a data.table or data.frame.
set_csfmt_rts_data_v1(x, create_unified_columns = TRUE, heal = TRUE)
csfmt_rts_data_v1(x, create_unified_columns = TRUE, heal = TRUE)
| Argument | Description |
|---|---|
| x | The data.table to be converted to csfmt_rts_data_v1. |
| create_unified_columns | If TRUE, create the unified columns. |
| heal | If TRUE, impute missing values on creation. |
set_csfmt_rts_data_v1: x is converted to csfmt_rts_data_v1 in place (by reference) and returned invisibly.
csfmt_rts_data_v1: a new, duplicated csfmt_rts_data_v1; the input is not modified.
csfmt_rts_data_v1 supports smart assignment for time and geography: when one of the variables below that is followed by a colon (location_code, isoyear, isoyearweek, date) is assigned using :=, the variables listed under it are imputed automatically.
location_code:
granularity_geo
country_iso3
isoyear:
granularity_time
isoweek
isoyearweek
season
seasonweek
calyear
calmonth
calyearmonth
date
isoyearweek:
granularity_time
isoyear
isoweek
season
seasonweek
calyear
calmonth
calyearmonth
date
date:
granularity_time
isoyear
isoweek
isoyearweek
season
seasonweek
calyear
calmonth
calyearmonth
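For location_code, the smart assignment amounts to a lookup: once a new code is assigned, the dependent geography columns are filled in from reference data. The sketch below mimics that with a small hypothetical lookup table (the code county_nor03 and the function assign_location_code are illustrative assumptions, not cstidy's API) and, unlike csfmt_rts_data_v1, returns a modified copy instead of working by reference through :=.

```r
# Hypothetical lookup table; cstidy uses its own reference data instead.
location_lookup <- data.frame(
  location_code = c("norge", "county_nor03"),
  granularity_geo = c("nation", "county"),
  country_iso3 = c("nor", "nor")
)

# Hypothetical helper: assign location_code in rows i and impute the
# dependent geography columns from the lookup table.
assign_location_code <- function(d, i, value, lookup = location_lookup) {
  d$location_code[i] <- value
  m <- match(d$location_code[i], lookup$location_code)
  d$granularity_geo[i] <- lookup$granularity_geo[m]
  d$country_iso3[i] <- lookup$country_iso3[m]
  d
}

d <- data.frame(
  location_code = c("county_nor03", "county_nor03"),
  granularity_geo = c("county", "county"),
  country_iso3 = c("nor", "nor")
)
d <- assign_location_code(d, 2, "norge")
d
```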
csfmt_rts_data_v1 contains 16 unified columns:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
isoyear
isoweek
isoyearweek
season
seasonweek
calyear
calmonth
calyearmonth
date
Other csfmt_rts_data:
expand_time_to(),
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v2(),
set_csfmt_rts_data_v3(),
unique_time_series()
set_csfmt_rts_data_v2 converts a data.table to csfmt_rts_data_v2 by reference.
csfmt_rts_data_v2 creates a new csfmt_rts_data_v2 (not by reference) from either a data.table or data.frame.
set_csfmt_rts_data_v2(x, create_unified_columns = TRUE, heal = TRUE)
csfmt_rts_data_v2(x, create_unified_columns = TRUE, heal = TRUE)
| Argument | Description |
|---|---|
| x | The data.table to be converted to csfmt_rts_data_v2. |
| create_unified_columns | If TRUE, create the unified columns. |
| heal | If TRUE, impute missing values on creation. |
For more details see the vignette:
vignette("csfmt_rts_data_v2", package = "cstidy")
set_csfmt_rts_data_v2: x is converted to csfmt_rts_data_v2 in place (by reference) and returned invisibly.
csfmt_rts_data_v2: a new, duplicated csfmt_rts_data_v2; the input is not modified.
csfmt_rts_data_v2 supports smart assignment for time and geography: when one of the variables below that is followed by a colon (location_code, isoyear, isoyearweek, season, date) is assigned using :=, the variables listed under it are imputed automatically.
location_code:
granularity_geo
country_iso3
isoyear:
granularity_time
isoweek
isoyearweek
isoquarter
isoyearquarter
season
seasonweek
calyear
calmonth
calyearmonth
date
isoyearweek:
granularity_time
isoyear
isoweek
isoquarter
isoyearquarter
season
seasonweek
calyear
calmonth
calyearmonth
date
season:
granularity_time
isoyear
isoweek
isoyearweek
isoquarter
isoyearquarter
seasonweek
calyear
calmonth
calyearmonth
date
date:
granularity_time
isoyear
isoweek
isoyearweek
isoquarter
isoyearquarter
season
seasonweek
calyear
calmonth
calyearmonth
csfmt_rts_data_v2 contains 18 unified columns:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
isoyear
isoweek
isoyearweek
isoquarter
isoyearquarter
season
seasonweek
calyear
calmonth
calyearmonth
date
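A quick way to see which of these unified columns a plain data.frame still lacks is a set difference against the list above. The vector and function names below are hypothetical and not exported by cstidy.

```r
# The 18 unified columns of csfmt_rts_data_v2, as listed above.
unified_columns_v2 <- c(
  "granularity_time", "granularity_geo", "country_iso3", "location_code",
  "border", "age", "sex", "isoyear", "isoweek", "isoyearweek",
  "isoquarter", "isoyearquarter", "season", "seasonweek",
  "calyear", "calmonth", "calyearmonth", "date"
)

# Hypothetical helper: unified columns not present in d.
missing_unified_columns <- function(d) setdiff(unified_columns_v2, names(d))

d <- data.frame(location_code = "norge", date = as.Date("2022-01-01"), cases_n = 5)
missing_unified_columns(d)
```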
Other csfmt_rts_data:
expand_time_to(),
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v3(),
unique_time_series()
# Create some fake data as data.table
d <- cstidy::generate_test_data(fmt = "csfmt_rts_data_v2")
d <- d[1:5]
# convert to csfmt_rts_data_v2 by reference
cstidy::set_csfmt_rts_data_v2(d, create_unified_columns = TRUE)
# d[1, isoyearweek := "2021-01"]
d
d[2, isoyear := 2019]
d
d[3, date := as.Date("2020-01-01")]
d
d[4, c("isoyear", "isoyearweek") := .(2021, "2021-01")]
d
d[5, c("location_code") := .("norge")]
d
# Investigating the data structure of one column inside a dataset
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
# Investigating the data structure via summary
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  summary()
csfmt_rts_data_v3 has the same unified columns as csfmt_rts_data_v2, but it
does not override [ for self-healing (healing is explicit) and it uses a
content-hash time_series_id.
set_csfmt_rts_data_v3(x, create_unified_columns = TRUE, heal = TRUE)
csfmt_rts_data_v3(x, create_unified_columns = TRUE, heal = TRUE)
| Argument | Description |
|---|---|
| x | The data.table to convert (by reference). |
| create_unified_columns | If TRUE, create the unified columns. |
| heal | If TRUE, impute missing time/geo columns on creation. |
set_csfmt_rts_data_v3: x, modified by reference and returned invisibly.
csfmt_rts_data_v3: a new csfmt_rts_data_v3 (not by reference).
Other csfmt_rts_data:
expand_time_to(),
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
unique_time_series()
Attempts to identify the unique time series that exist in this dataset.
A time series is defined as a unique combination of:
granularity_time
granularity_geo
country_iso3
location_code
border
age
sex
*_id
*_tag
unique_time_series(x, set_time_series_id = FALSE, ...)
| Argument | Description |
|---|---|
| x | An object of type |
| set_time_series_id | If TRUE, then |
| ... | Not used. |
data.table, a dataset that lists all the unique time series in x.
Other csfmt_rts_data:
expand_time_to(),
identify_data_structure(),
remove_class_csfmt_rts_data(),
set_csfmt_rts_data_v1(),
set_csfmt_rts_data_v2(),
set_csfmt_rts_data_v3()
x <- cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2()
cstidy::unique_time_series(x)
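Following the definition of a time series above, the set of time series can be sketched in base R as the unique rows over the fixed identifying columns plus any column whose name ends in _id or _tag. unique_time_series_sketch, the location codes, and the source_tag column are hypothetical; the real function also offers set_time_series_id.

```r
# Hypothetical base-R sketch, not cstidy's implementation.
unique_time_series_sketch <- function(d) {
  fixed <- c("granularity_time", "granularity_geo", "country_iso3",
             "location_code", "border", "age", "sex")
  # any *_id or *_tag column also identifies a time series
  extra <- grep("(_id|_tag)$", names(d), value = TRUE)
  by <- intersect(c(fixed, extra), names(d))
  res <- unique(d[by])
  rownames(res) <- NULL
  res
}

d <- data.frame(
  granularity_time = "date", granularity_geo = "county", country_iso3 = "nor",
  location_code = rep(c("county_nor03", "county_nor11"), each = 3),
  border = 2020L, age = "total", sex = "total",
  date = rep(as.Date("2022-01-01") + 0:2, times = 2),
  cases_n = 1:6, source_tag = "msis"
)
unique_time_series_sketch(d)
```

Time and value columns (date, cases_n) are excluded, so six rows collapse to two time series.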