---
title: "Data format (csfmt_rts_data_v2)"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data format (csfmt_rts_data_v2)}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE}
if (!isTRUE(getOption("knitr.in.progress"))) {
  base_folder <- "vignettes/"
} else {
  base_folder <- ""
}

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r echo=FALSE, include=FALSE}
library(data.table)
library(magrittr)
library(ggplot2)
```

## csfmt_rts_data_v2

This document describes the `csfmt_rts_data_v2` data format, which the Core Surveillance team uses for real-time surveillance of infectious diseases.

## Style

**Language**

English is the primary language for code. Norwegian abbreviations and data source names (`msis`, `daar`, `sysvak`, `normomo`) are kept as-is.

**Capital letters**

Avoid capital letters wherever possible, including in filenames (e.g. `data.rds` is preferred to `data.RDS`).

**snake_case or camelCase?**

Use snake_case.

**Timestamping file names**

Result files (e.g. reports) should include a creation timestamp in the filename so the most recent version is easy to identify and Airflow errors are easy to trace.

e.g. `Epidemiologisk_situasjonsrapport_2021-05-31_0659.docx` for a report generated on 2021-05-31 at 06:59.
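
A minimal sketch of how such a filename could be built in base R. The helper name `timestamped_filename()` is hypothetical and not part of any cs* package, and the timestamp pattern `%Y-%m-%d_%H%M` is inferred from the example above.

```r
# Hypothetical helper: append a creation timestamp (YYYY-MM-DD_HHMM) to a file stem.
timestamped_filename <- function(stem, ext, time = Sys.time()) {
  paste0(stem, "_", format(time, "%Y-%m-%d_%H%M"), ".", ext)
}

# A report generated on 2021-05-31 at 06:59
report_time <- as.POSIXct("2021-05-31 06:59:00", tz = "UTC")
report_name <- timestamped_filename("Epidemiologisk_situasjonsrapport", "docx", report_time)
print(report_name)
```

Because the timestamp runs from the largest unit (year) to the smallest (minute), sorting the filenames alphabetically also sorts them chronologically.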

## Ordering of variables

When variable ordering matters, use the following sequence:

- time
- location
- age
- sex

e.g. a database table could be called `msis_by_time_location_age_sex`, and a file could be called `2020_oslo_05-10_male.xlsx`.

## Time

Time conversion functions are available from [cstime](https://niphr.github.io/cstime/). Missing time values should be coded as `NA`.

```{r echo=FALSE, results='asis'}
d <- rbind(
  data.frame(
    granularity_time = "date",
    class = "Date",
    fn = "as.Date",
    example = "2021-12-31"
  ),
  data.frame(
    granularity_time = "isoyear",
    class = "numeric",
    fn = "cstime::isoyear_n",
    example = "2021"
  ),
  data.frame(
    granularity_time = "isoyearweek",
    class = "character",
    fn = "cstime::isoyearweek_c",
    example = "\"2021-01\""
  ),
  data.frame(
    granularity_time = "event_*_date1_to_date2",
    class = "character",
    fn = "as.character",
    example = "\"event_covid19_norway_vaccination_2020_12_02_to_9999_09_09\", \"event_covid19_norway_2020_02_21_to_9999_09_09\""
  )
)

gt::gt(d) %>%
  gt::tab_options(
    table.width = "1100px"
  ) %>%
  gt::tab_header(title = "Valid times in the csverse format") %>%
  gt::cols_label(
    granularity_time = "Time (Granularity)",
    class = "Class",
    fn = "Function",
    example = "Example(s)"
  ) %>%
  gt::cols_width(
    granularity_time ~ "20%",
    class ~ "15%",
    fn ~ "20%",
    example ~ "45%"
  ) %>%
  gt::tab_footnote(
    footnote = "If the event is ongoing, then the 'to' date should be 9999_09_09.",
    locations = gt::cells_body(
      columns = granularity_time,
      rows = stringr::str_detect(granularity_time, "event")
    )
  )
```
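
As a rough illustration of the `event_*_date1_to_date2` pattern, the sketch below builds an event string in base R. The helper `event_granularity()` is hypothetical (it is not a cstime or cstidy function); it only assumes the naming pattern and the `9999_09_09` convention for ongoing events described in the table above.

```r
# Hypothetical helper: build a granularity_time value for an event.
# `to = NULL` marks the event as ongoing, which is coded as 9999_09_09.
event_granularity <- function(label, from, to = NULL) {
  fmt <- function(x) format(as.Date(x), "%Y_%m_%d")
  end <- if (is.null(to)) "9999_09_09" else fmt(to)
  paste0("event_", label, "_", fmt(from), "_to_", end)
}

ongoing_event <- event_granularity("covid19_norway", "2020-02-21")
print(ongoing_event)

# A made-up, already finished event (label and dates are for illustration only)
finished_event <- event_granularity("example", "2021-01-01", "2021-03-31")
print(finished_event)
```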

### Approved events

The following are approved events:

```{r echo=FALSE, results='asis'}
d <- rbind(
  data.frame(
    granularity_time = "event_covid19_norway_2020_02_21_to_9999_09_09",
    definition = "Covid-19 outbreak in Norway (on-going)."
  ),
  data.frame(
    granularity_time = "event_covid19_norway_vaccination_2020_12_02_to_9999_09_09",
    definition = "Covid-19 vaccination campaign in Norway (on-going)."
  )
)

gt::gt(d) %>%
  gt::tab_options(
    table.width = "750px"
  ) %>%
  gt::cols_width(
    granularity_time ~ "65%",
    definition ~ "35%"
  ) %>%
  gt::cols_align(
    align = "left"
  ) %>%
  gt::tab_header(title = "Approved events")
```

## Location

Locations come from [csdata](https://niphr.github.io/csdata/). The full list of valid location codes and types is in `csdata::nor_locations_names()`. Uncommon or internal-use columns are indicated by strikethrough text in the table.

```{r echo=FALSE, results='asis'}
d <- csdata::nor_locations_names()[, .(
  n = .N,
  location_code = location_code[1],
  location_name = location_name[1],
  location_name_description_nb = location_name_description_nb[1],
  location_name_file_nb_utf = location_name_file_nb_utf[1],
  location_name_file_nb_ascii = location_name_file_nb_ascii[1]
),
by = .(granularity_geo)
]

gt::gt(d) %>%
  gt::tab_options(
    table.width = "1500px"
  ) %>%
  gt::tab_header(title = "Valid locations and location types in the csverse format") %>%
  gt::cols_label(
    granularity_geo = "Geo (Granularity)",
    n = "N"
  ) %>%
  # gt::cols_width(
  #   granularity_time ~ "20%",
  #   class ~ "15%",
  #   fn ~ "20%",
  #   example ~ "55%"
  # ) %>%
  gt::tab_spanner(
    label = "Examples",
    columns = c(location_code, location_name, location_name_description_nb, location_name_file_nb_utf, location_name_file_nb_ascii)
  ) %>%
  gt::tab_footnote(
    footnote = gt::md("**location_code**: Used a) **inside datasets** and b) in data **file names** for transfer of data/results between analytic systems. All values are unique."),
    locations = gt::cells_column_labels(
      columns = location_code
    )
  ) %>%
  gt::tab_footnote(
    footnote = gt::md("**location_name**: Used (rarely) **inside results** (figures, tables, documents). Can be confusing as some names are duplicated. Its rare usage is demarcated by a line through the text."),
    locations = gt::cells_column_labels(
      columns = location_name
    )
  ) %>%
  gt::tab_style(
    style = list(
      gt::cell_text(decorate = "line-through")
    ),
    locations = gt::cells_body(
      columns = location_name,
      rows = gt::everything()
    )
  ) %>%
  gt::tab_footnote(
    footnote = gt::md("**location_name_description_nb**: Used (frequently) **inside results** (figures, tables, documents). All values are unique."),
    locations = gt::cells_column_labels(
      columns = location_name_description_nb
    )
  ) %>%
  gt::tab_footnote(
    footnote = gt::md("**location_name_file_nb_utf**: Used (frequently) in the **file names** for results (figures, tables, documents). All values are unique."),
    locations = gt::cells_column_labels(
      columns = location_name_file_nb_utf
    )
  ) %>%
  gt::tab_footnote(
    footnote = gt::md("**location_name_file_nb_ascii**: Used (rarely) in the **file names** for results (figures, tables, documents). Used if file systems have problems with the Norwegian letters æøå. All values are unique."),
    locations = gt::cells_column_labels(
      columns = location_name_file_nb_ascii
    )
  ) %>%
  gt::tab_footnote(
    footnote = "Bo- og arbeidsmarkedsregioner. Housing and labor market regions.",
    locations = gt::cells_body(
      columns = granularity_geo,
      rows = granularity_geo == "baregion"
    )
  ) %>%
  gt::tab_footnote(
    footnote = "Mattilsynet-regioner. Food authority regions.",
    locations = gt::cells_body(
      columns = granularity_geo,
      rows = granularity_geo == "faregion"
    )
  )
```

## Ages

Ages are stored as character strings and must always contain 3 digits. Age ranges join the two bounds with an underscore (e.g. `005_010`).

Use `085p` rather than `>=085` or `85+`; this ensures clean conversion between long and wide formats.

```{r echo=FALSE, results='asis'}
d <- rbind(
  data.frame(
    value = "\"000\"",
    class = "character",
    definition = "One year age group (0 year olds)"
  ),
  data.frame(
    value = "\"079\"",
    class = "character",
    definition = "One year age group (79 year olds)"
  ),
  data.frame(
    value = "\"000_004\"",
    class = "character",
    definition = "Age span of 0-4 year olds"
  ),
  data.frame(
    value = "\"065p\"",
    class = "character",
    definition = "Age span of >=65 year olds"
  ),
  data.frame(
    value = "\"missing\"",
    class = "character",
    definition = "Missing/unknown"
  ),
  data.frame(
    value = "\"total\"",
    class = "character",
    definition = "Everyone"
  )
)

gt::gt(d) %>%
  gt::tab_header(title = "Valid ages in the csverse format") %>%
  gt::cols_label(
    value = "Value",
    class = "Class",
    definition = "Definition"
  )
```

This format keeps data in a consistent sort order and produces valid variable names when the data is pivoted to wide format.

Missing ages should be coded as `"missing"`.
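
The sketch below shows one way to produce these age values in base R, and why the zero-padding and the `p` suffix help. `format_age()` is a hypothetical helper written for this illustration, not a function from the cs* packages.

```r
# Hypothetical helper: format a single age, an age span, or an open-ended span.
format_age <- function(from, to = from) {
  lower <- sprintf("%03d", as.integer(from))
  if (is.infinite(to)) {
    return(paste0(lower, "p")) # open-ended span, e.g. "065p"
  }
  if (to == from) {
    return(lower) # one-year age group, e.g. "079"
  }
  paste0(lower, "_", sprintf("%03d", as.integer(to))) # closed span, e.g. "000_004"
}

ages <- c(format_age(0), format_age(79), format_age(0, 4), format_age(65, Inf))
print(ages)

# Zero-padding keeps the character sort order equal to the numeric order.
padded_sorted <- sort(c("100", "005", "010"), method = "radix")
unpadded_sorted <- sort(c("100", "5", "10"), method = "radix")
print(padded_sorted)
print(unpadded_sorted)

# With a prefix such as "age_", "085p" is a syntactically valid R name; "85+" is not.
is_valid_085p <- make.names("age_085p") == "age_085p"
is_valid_85plus <- make.names("age_85+") == "age_85+"
```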

## Sex

Sex is stored as a character string.

```{r echo=FALSE, results='asis'}
d <- rbind(
  data.frame(
    value = "\"male\"",
    class = "character",
    definition = "Male"
  ),
  data.frame(
    value = "\"female\"",
    class = "character",
    definition = "Female"
  ),
  data.frame(
    value = "\"missing\"",
    class = "character",
    definition = "Missing/unknown"
  ),
  data.frame(
    value = "\"total\"",
    class = "character",
    definition = "Everyone"
  )
)

gt::gt(d) %>%
  gt::tab_header(title = "Valid sexes in the csverse format") %>%
  gt::cols_label(
    value = "Value",
    class = "Class",
    definition = "Definition"
  )
```

Missing values should be coded as `"missing"`.

## Unified columns

Every dataset in `csfmt_rts_data_v2` format contains the 18 columns listed below. Time conversion functions are in [cstime](https://niphr.github.io/cstime/).

```{r echo=FALSE, results='asis'}
d <- rbind(
  data.frame(
    variable = "granularity_time",
    accepted_values = gt::html("\"date\", \"isoyearweek\", \"isoyear\", \"event_*_*_to_*\" (<a href='#time'>Time</a>)"),
    definition = "Granularity of time"
  ),
  data.frame(
    variable = "granularity_geo",
    accepted_values = paste0("\"", paste0(unique(csdata::nor_locations_names()$granularity_geo), collapse = "\", \""), "\""),
    definition = "Granularity of geography"
  ),
  data.frame(
    variable = "country_iso3",
    accepted_values = "\"nor\", \"den\", \"swe\", \"fin\"",
    definition = "ISO3 country code."
  ),
  data.frame(
    variable = "location_code",
    accepted_values = gt::html("\"norge\", \"countyXX\", \"municipXXXX\", ... (<a href='#location'>Location</a>)"),
    definition = "Location code"
  ),
  data.frame(
    variable = "border",
    accepted_values = "2020",
    definition = "The borders (municipality mergers, 'kommunesammenslåing') that location_code represents"
  ),
  data.frame(
    variable = "age",
    accepted_values = gt::html("\"000\", \"001\", \"000_004\", \"065p\", \"total\", \"missing\", ... (<a href='#ages'>Ages</a>)"),
    definition = "Age in years"
  ),
  data.frame(
    variable = "sex",
    accepted_values = gt::html("\"male\", \"female\", \"total\", \"missing\" (<a href='#sex'>Sex</a>)"),
    definition = "Sex"
  ),
  data.frame(
    variable = "isoyear",
    accepted_values = "YYYY",
    definition = "Use function cstime::*_to_isoyear_n"
  ),
  data.frame(
    variable = "isoweek",
    accepted_values = "1, 2, ..., 53",
    definition = "Use functions cstime::*_to_isoweek_n"
  ),
  data.frame(
    variable = "isoyearweek",
    accepted_values = "\"YYYY-WW\"",
    definition = "Use function cstime::*_to_isoyearweek_c"
  ),
  data.frame(
    variable = "isoquarter",
    accepted_values = "1, 2, 3, 4",
    definition = "Use functions cstime::*_to_isoquarter_n"
  ),
  data.frame(
    variable = "isoyearquarter",
    accepted_values = "\"2021-Q01\"",
    definition = "Use function cstime::*_to_isoyearquarter_c"
  ),
  data.frame(
    variable = "season",
    accepted_values = "\"YYYY/YYYY\"",
    definition = "Seasons start in week 30 and finish in week 29."
  ),
  data.frame(
    variable = "seasonweek",
    accepted_values = "1, 2, ..., 23, 23.5, 24, ..., 52",
    definition = "isoweek = 30 -> seasonweek = 1. isoweek = 52 -> seasonweek = 23. isoweek = 53 -> seasonweek = 23.5. isoweek = 1 -> seasonweek = 24. isoweek = 29 -> seasonweek = 52. This is used primarily for plotting/analysis reasons."
  ),
  data.frame(
    variable = "calyear",
    accepted_values = "..., 2020, 2021, ...",
    definition = "Calendar years."
  ),
  data.frame(
    variable = "calmonth",
    accepted_values = "1, 2, ..., 11, 12",
    definition = "Calendar months."
  ),
  data.frame(
    variable = "calyearmonth",
    accepted_values = "\"2021-M01\"",
    definition = "Calendar year and month."
  ),
  data.frame(
    variable = "date",
    accepted_values = "YYYY-MM-DD",
    definition = "Always corresponds to the last date in the time period. E.g. if granularity_time=='isoyearweek' then date is the Sunday of that week. If granularity_time=='event_*_date1_to_9999_09_09' then date is 9999-09-09."
  )
) %>%
  dplyr::mutate(
    accepted_values = purrr::map(accepted_values, ~ gt::html(as.character(.)))
  )

gt::gt(d) %>%
  gt::tab_options(
    table.width = "750px"
  ) %>%
  gt::cols_width(
    variable ~ "20%",
    accepted_values ~ "40%",
    definition ~ "40%"
  ) %>%
  gt::cols_align(
    align = "left"
  ) %>%
  gt::tab_header(title = "Unified columns (18) in the csverse format csfmt_rts_data_v2") %>%
  gt::cols_label(
    variable = "Variable",
    accepted_values = "Accepted values",
    definition = "Definition"
  )
```
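
The `season` and `seasonweek` rules in the table can be written out as a short base R sketch. The two helpers below are hypothetical and follow only the mapping stated in the table; they are not the cstime implementation.

```r
# Hypothetical helper: isoweek 30 -> 1, ..., 52 -> 23, 53 -> 23.5, 1 -> 24, ..., 29 -> 52
seasonweek_sketch <- function(isoweek) {
  ifelse(
    isoweek == 53,
    23.5,
    ifelse(isoweek >= 30, isoweek - 29, isoweek + 23)
  )
}

# Hypothetical helper: seasons start in isoweek 30 and finish in isoweek 29.
season_sketch <- function(isoyear, isoweek) {
  start <- ifelse(isoweek >= 30, isoyear, isoyear - 1)
  paste0(start, "/", start + 1)
}

seasonweeks <- seasonweek_sketch(c(30, 52, 53, 1, 29))
print(seasonweeks)

# isoweek 30 of 2021 opens a new season; isoweek 29 of 2021 closes the previous one.
seasons <- season_sketch(c(2021, 2021), c(30, 29))
print(seasons)
```

Placing isoweek 53 at 23.5 keeps the other 52 weeks at the same position on a plot axis in every season.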

### Smart assignment

`csfmt_rts_data_v2` supports smart assignment for time and geography. When the **bold** variables below are set with `:=`, the associated columns are automatically derived.

**location_code**:

- granularity_geo
- country_iso3

**isoyear**:

- granularity_time
- isoweek
- isoyearweek
- isoquarter
- isoyearquarter
- season
- seasonweek
- calyear
- calmonth
- calyearmonth
- date

**isoyearweek**:

- granularity_time
- isoyear
- isoweek
- isoquarter
- isoyearquarter
- season
- seasonweek
- calyear
- calmonth
- calyearmonth
- date

**date**:

- granularity_time
- isoyear
- isoweek
- isoyearweek
- isoquarter
- isoyearquarter
- season
- seasonweek
- calyear
- calmonth
- calyearmonth
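
To make the idea concrete, the sketch below derives a few of the listed columns from a `date` using only base R. It is an approximation for illustration: `derive_from_date()` is a hypothetical helper, the real derivation happens inside cstidy, and the quarter and season columns are left out. It assumes that the `%G` and `%V` (ISO 8601 year and week) conversion specifications are supported by `format()` on the platform in use.

```r
# Hypothetical helper: derive some unified time columns from a date.
derive_from_date <- function(date) {
  date <- as.Date(date)
  data.frame(
    granularity_time = "date",
    isoyear = as.integer(format(date, "%G")),
    isoweek = as.integer(format(date, "%V")),
    isoyearweek = format(date, "%G-%V"),
    calyear = as.integer(format(date, "%Y")),
    calmonth = as.integer(format(date, "%m")),
    calyearmonth = format(date, "%Y-M%m"),
    date = date,
    stringsAsFactors = FALSE
  )
}

# 2021-01-03 is a Sunday that belongs to ISO week 53 of 2020,
# so isoyear and calyear differ for this date.
derived <- derive_from_date(c("2021-01-03", "2021-12-31"))
print(derived)
```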

## Context-specific columns

Any variable not in the [unified columns](#unified-columns) is a context-specific column. Its name is built from two mandatory sections (description and format) and up to five optional sections (time, statistics, forecast, censored/status, formatted), joined by underscores:

```{yaml}
description[_time][_statistics]_format[_forecast][_censored/status][_formatted]
```

Brackets denote optional sections. In practice, most column names use only a subset of these.
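
As a small illustration of the template, the sketch below joins the sections in the prescribed order. `build_column_name()` is a hypothetical helper written for this vignette, not a cstidy function; omitted optional sections are simply skipped.

```r
# Hypothetical helper: join the name sections in template order.
# NULL sections are dropped by c(), so only the supplied sections appear.
build_column_name <- function(description, format,
                              time = NULL, statistics = NULL,
                              forecast = NULL, status = NULL, formatted = NULL) {
  paste(
    c(description, time, statistics, format, forecast, status, formatted),
    collapse = "_"
  )
}

name_simple <- build_column_name("deaths_registered", "n")
name_time <- build_column_name("covid19_cases_testdate", "n", time = "sum0_13")
name_stat <- build_column_name(
  "deaths_nowcasted_baseline", "n",
  statistics = "predinterval_q02x5"
)
name_censored <- build_column_name("deaths_nowcasted", "n", status = "censored")

print(c(name_simple, name_time, name_stat, name_censored))
```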


```{r echo=FALSE, results='asis'}
d <- rbind(
  data.frame(
    x = "",
    examples = "deaths, consultations, cases",
    definition = "Simple."
  ),
  data.frame(
    x = "",
    examples = "deaths_registered, deaths_nowcasted, deaths_nowcasted_baseline",
    definition = "Slightly complex."
  ),
  data.frame(
    x = "",
    examples = "hospital_deaths, vax_administered_dose_1, vax_coverage_dose_1, msis_cases_testdate, msis_cases_regdate",
    definition = "Complex."
  ),
  data.frame(
    x = "",
    examples = "outcome, exposure, model",
    definition = "Generally used in conjunction with 'tag' (see 'Format')."
  ),
  data.frame(
    x = "",
    examples = "sum0_13",
    definition = "The sum of values for the given date and the previous 13 days. If granularity_time=='isoyearweek' and the given isoweek has full data, then it is the sum of values for the Sunday in the given isoweek and the previous 13 days. If granularity_time=='isoyearweek' and the given isoweek does not have full data, or granularity_time=='event_*_to_9999_09_09' (ongoing event), then it is the sum of values for the last day with data and the previous 13 days."
  ),
  data.frame(
    x = "",
    examples = "sum0_999999",
    definition = "The sum of all days with data."
  ),
  data.frame(
    x = "",
    examples = "daymean0_13",
    definition = "The mean of all the daily observations for the given date and the previous 13 days."
  ),
  data.frame(
    x = "",
    examples = "isoweekmean0_13",
    definition = "The mean of all the weekly observations for the given date and the previous 13 days (i.e. the last 2 weeks)."
  ),
  data.frame(
    x = "",
    examples = "predinterval_q02x5",
    definition = "Prediction interval for the baseline (2.5th quantile). 'x' is used to denote a decimal point, so that we can differentiate between 100 (100x0) and 10.0 (10x0)."
  ),
  data.frame(
    x = "",
    examples = "credintervalobs_q02x5",
    definition = "Credibility interval for a new observation of data according to the baseline model (2.5th quantile)."
  ),
  data.frame(
    x = "",
    examples = "credintervalmean_q02x5",
    definition = "Credibility interval for the mean of the data according to the baseline model (2.5th quantile)."
  ),
  data.frame(
    x = "",
    examples = "*interval*_q50x0",
    definition = "Generally speaking, the 50th percentile is the expected value."
  ),
  data.frame(
    x = "",
    examples = "id/tag",
    definition = "Used when data is in long format, to indicate an id variable. Frequently combined with descriptions of 'outcome', 'exposure', 'model'. id is used for numeric columns. tag is used for character columns."
  ),
  data.frame(
    x = "",
    examples = "n",
    definition = "Numerical value"
  ),
  data.frame(
    x = "",
    examples = "pr1",
    definition = "Proportion (between 0 and 1)"
  ),
  data.frame(
    x = "",
    examples = "pr100",
    definition = "Percentage (between 0 and 100)"
  ),
  data.frame(
    x = "",
    examples = "pr100000, prX",
    definition = "Rate per X"
  ),
  data.frame(
    x = "",
    examples = "date",
    definition = "Date"
  ),
  data.frame(
    x = "",
    examples = "bool",
    definition = "TRUE/FALSE"
  ),
  data.frame(
    x = "",
    examples = "forecast",
    definition = "TRUE/FALSE. Only used when a column contains both forecasted and non-forecasted data."
  ),
  data.frame(
    x = "",
    examples = "censored",
    definition = "TRUE/FALSE"
  ),
  data.frame(
    x = "",
    examples = "status",
    definition = "Character."
  )
)

gt::gt(d) %>%
  gt::tab_options(
    table.width = "750px"
  ) %>%
  gt::cols_width(
    x ~ "5%",
    examples ~ "35%",
    definition ~ "60%"
  ) %>%
  gt::cols_align(
    align = "left"
  ) %>%
  gt::tab_header(title = "Context-specific columns in the csverse format csfmt_rts_data_v2") %>%
  gt::cols_label(
    x = "",
    examples = "Examples",
    definition = "Definition"
  ) %>%
  gt::tab_row_group(
    label = "Censored/Status (optional)",
    rows = 21:nrow(d)
  ) %>%
  gt::tab_row_group(
    label = "Forecast (optional)",
    rows = 20
  ) %>%
  gt::tab_row_group(
    label = "Format (mandatory)",
    rows = 13:19
  ) %>%
  gt::tab_row_group(
    label = "Statistics (optional)",
    rows = 9:12
  ) %>%
  gt::tab_row_group(
    label = "Time (optional)",
    rows = 5:8
  ) %>%
  gt::tab_row_group(
    label = "Description (mandatory)",
    rows = 1:4
  )
```
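
The quantile suffix used in the statistics section (`q02x5`, `q97x5`, `q50x0`) can be generated mechanically. The sketch below is one possible way to do so in base R; `quantile_label()` is a hypothetical helper, and the rule (one decimal place, at least two digits before the decimal point, `x` instead of `.`) is inferred from the examples in the table.

```r
# Hypothetical helper: turn a percentage such as 2.5 into a label such as "q02x5".
quantile_label <- function(percent) {
  # One decimal place, zero-padded to a width of 4 characters (e.g. "02.5")
  padded <- sprintf("%04.1f", percent)
  # Replace the decimal point with "x" so the label is safe in a column name
  paste0("q", sub(".", "x", padded, fixed = TRUE))
}

labels <- quantile_label(c(2.5, 50, 97.5, 100, 10))
print(labels)
```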

### Examples

In the examples below, the description, time, statistics, format, forecast, and censored/status sections are separated by `/` for readability.

Death and nowcasting:

- **deaths_registered/\_n**: Number of registered deaths.
- **deaths_nowcasted/\_n**: Number of registered deaths, corrected for registration delay (nowcasting).
- **deaths_nowcasted/\_n/\_forecast**: Has 'deaths_nowcasted_n' been forecasted (i.e. nowcasted)?
- **deaths_nowcasted/\_n/\_censored**: Has 'deaths_nowcasted_n' been censored?
- **deaths_nowcasted/\_n/\_status**: Status of 'deaths_nowcasted_n' in relation to 'deaths_nowcasted_baseline_credintervalobs_q\*_n'.
- **deaths_nowcasted/\_credintervalobs_q02x5/\_n**: The 2.5th quantile of where we expect the real number of registered deaths (Bayesian).
- **deaths_nowcasted_baseline/\_n**: Expected number of nowcasted deaths (i.e. baseline).
- **deaths_nowcasted_baseline/\_predinterval_q02x5/\_n**: The 2.5th quantile of an expected new observation of nowcasted deaths (frequentist).
- **deaths_nowcasted_baseline/\_predinterval_q97x5/\_n**: The 97.5th quantile of an expected new observation of nowcasted deaths (frequentist).
- **deaths_nowcasted_baseline/\_credintervalobs_q02x5/\_n**: The 2.5th quantile of where we expect an observation of nowcasted deaths (Bayesian).
- **deaths_nowcasted_baseline/\_credintervalmean_q02x5/\_n**: The 2.5th quantile of the mean of nowcasted deaths (Bayesian).

Covid-19 case counts:

- **covid19_cases_regdate/\_n**: Number of covid-19 cases by registration date.
- **covid19_cases_testdate/\_n**: Number of covid-19 cases by testing date.
- **covid19_cases_testdate/\_sum0_13/\_n**: The sum of 14 days of cases. The cases below assume that the current date is '2022-02-07' (the Monday of isoyearweek '2022-06').
    - When granularity_time=='date' and date=='2022-01-20', the value is the sum of covid19_cases_testdate_n between '2022-01-07' and '2022-01-20'.
    - When granularity_time=='isoyearweek' and isoyearweek=='2022-03' (a week with full data), the value is the sum of covid19_cases_testdate_n between '2022-01-10' (Monday in isoyearweek '2022-02') and '2022-01-23' (Sunday in isoyearweek '2022-03').
    - When granularity_time=='isoyearweek' and isoyearweek=='2022-06' (a week without full data), the value is the sum of covid19_cases_testdate_n between the last day with available data and 13 days prior.
    - When granularity_time=='event_covid19_norway_2020_02_21_to_9999_09_09', the value is the sum of covid19_cases_testdate_n between the last day with available data and 13 days prior.
- **covid19_cases_testdate/\_sum0_999999/\_n**: The sum of all recorded days of cases.
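
The 14-day window behind `sum0_13` for `granularity_time=='date'` can be checked with a few lines of base R. The helper `sum0_13_window()` and the daily case counts below are made up for illustration.

```r
# Hypothetical helper: the first and last date covered by a sum0_13 value.
sum0_13_window <- function(last_date) {
  last_date <- as.Date(last_date)
  list(from = last_date - 13, to = last_date)
}

window <- sum0_13_window("2022-01-20")
print(window)

# Made-up daily data for January 2022: the day of the month is used as the case count.
dates <- seq(as.Date("2022-01-01"), as.Date("2022-01-31"), by = "day")
cases_n <- seq_along(dates)

# Sum the given date and the previous 13 days (14 days in total).
in_window <- dates >= window$from & dates <= window$to
cases_sum0_13_n <- sum(cases_n[in_window])
print(cases_sum0_13_n)
```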

Covid-19 test events:

- **covid19_testevents/\_n**: Number of covid-19 test events (i.e. a person getting tested within a 7 day period).
- **covid19_testevents_pos/\_pr1**: Proportion (0-1) of covid-19 test events that were positive.
- **covid19_testevents_pos/\_pr100**: Percentage (0-100) of covid-19 test events that were positive.
- **covid19_testevents_pos/\_sum0_13/\_pr100**: Percentage (0-100) of covid-19 test events that were positive over the last 14 days.
- **covid19_testevents_pos/\_daymean0_13/\_pr100**: For each of the last 14 days, calculate the percentage (0-100) of covid-19 test events that were positive, and then take the mean of these 14 values.
- **covid19_testevents_pos/\_isoweekmean0_13/\_pr100**: For each of the last two 7-day periods (days 0-6 and days 7-13), calculate the percentage (0-100) of covid-19 test events that were positive, and then take the mean of these 2 values.
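
The three percentage variants differ only in where the aggregation happens. The base R sketch below uses made-up numbers, chosen so that one day has an unusually high number of tests, to show that they generally give different results.

```r
# Made-up daily data for the last 14 days, oldest first (day 13 down to day 0).
tests <- c(rep(100, 7), 400, rep(100, 6))
positives <- rep(10, 14)

# sum0_13: pool all 14 days, then calculate one percentage.
sum0_13_pr100 <- 100 * sum(positives) / sum(tests)

# daymean0_13: one percentage per day, then the mean of the 14 values.
daymean0_13_pr100 <- mean(100 * positives / tests)

# isoweekmean0_13: one percentage per 7-day period, then the mean of the 2 values.
period <- rep(c("days_7_13", "days_0_6"), each = 7)
period_pr100 <- 100 * tapply(positives, period, sum) / tapply(tests, period, sum)
isoweekmean0_13_pr100 <- mean(period_pr100)

print(round(c(
  sum0_13 = sum0_13_pr100,
  daymean0_13 = daymean0_13_pr100,
  isoweekmean0_13 = isoweekmean0_13_pr100
), 2))
```

The pooled `sum0_13` version weights each test event equally, whereas the two mean versions weight each day or each 7-day period equally, so a single day with many tests influences them less.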

Vaccination:

- **covid19_vax_administered_dose_1/\_n**: Number of people who received their first dose during this day/isoweek/event. The corresponding age is permanently fixed (a person who received their first dose at age 21 will always have received their first dose at age 21).
- **covid19_vax_coverage_dose_1/\_n**: Number of people who (on the last day of the day/isoweek/event) have received 1 dose of vaccine. The corresponding age is fixed at the last day of the day/isoweek/event.

## In action

The examples below show smart assignment, collapsing, and class removal on a small test dataset.

```{r}
d <- cstidy::generate_test_data()[1:5]
cstidy::set_csfmt_rts_data_v2(d)

# Looking at the dataset
d[]

# Smart assignment of time columns (note how granularity_time, isoyear, isoyearweek, date all change)
d[1, isoyearweek := "2021-01"]
d

# Smart assignment of time columns (note how granularity_time, isoyear, isoyearweek, date all change)
d[2, isoyear := 2019]
d

# Smart assignment of time columns (note how granularity_time, isoyear, isoyearweek, date all change)
d[4:5, date := as.Date("2020-01-01")]
d

# Smart assignment fails when multiple time columns are set
d[1, c("isoyear", "isoyearweek") := .(2021, "2021-01")]
d

# Smart assignment of geo columns
d[1, c("location_code") := .("norge")]
d

# Collapsing down to different levels, and healing the dataset
# (so that it can be worked on further with regards to real time surveillance)
d[, .(deaths_n = sum(deaths_n), location_code = "norge"), keyby = .(granularity_time)] %>%
  cstidy::set_csfmt_rts_data_v2(create_unified_columns = TRUE) %>%
  print()

# Collapsing down to different levels, without healing the dataset and without
# removing the class csfmt_rts_data_v2 (this is uncommon)
d[, .(deaths_n = sum(deaths_n), location_code = "norge"), keyby = .(granularity_time)] %>%
  print()

# Collapsing to different levels, and removing the class csfmt_rts_data_v2 because
# it is going to be used in new output/analyses
d[, .(deaths_n = sum(deaths_n), location_code = "norge"), keyby = .(granularity_time)] %>%
  cstidy::remove_class_csfmt_rts_data() %>%
  print()
```

## Expand time to

`cstidy::expand_time_to()` adds rows to extend a dataset up to a given future time point.

```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  dplyr::filter(location_code == "county03") %>%
  cstidy::expand_time_to(max_isoyearweek = "2022-08") %>%
  print()
```

## Time series

`cstidy::unique_time_series()` counts the distinct time series in a dataset.

```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::unique_time_series()
```

## Summary

`summary()` gives a concise overview of the data structure.

```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  summary()
```

## Identifying the data structure of one column

`cstidy::identify_data_structure()` inspects a single column and returns a plottable object.

```{r}
cstidy::generate_test_data() %>%
  cstidy::set_csfmt_rts_data_v2() %>%
  cstidy::identify_data_structure("deaths_n") %>%
  plot()
```

## Reference (Location)

The table below lists all valid `location_code` and `location_name_description_nb` values, the two most commonly used location identifiers. The full dataset is available in `csdata::nor_locations_names()`.

```{r echo=FALSE, results='asis'}
d <- csdata::nor_locations_names()[, .(
  location_order = paste0("#", location_order),
  location_code,
  location_name_description_nb
)]

gt::gt(d) %>%
  gt::tab_options(
    table.width = "750px"
  ) %>%
  gt::tab_header(title = "Reference table of location_code and location_name_description_nb") %>%
  gt::cols_label(
    location_order = "#"
  )
```