
dataset

An ETL dataset is a collection of tables. The dataset metadata fields are the attributes of the DatasetMeta object in ETL.

dataset.description

type: string | recommended (often automatic)

Description of the dataset, in one or a few paragraphs, covering the content of its tables. It is mostly for internal purposes or for users of our data catalog.

  • Must start with a capital letter.
  • Must end with a period.
  • Should not mention other metadata fields (e.g. producer or date_published), unless those fields are crucial to describing the data product.
  • Should describe the dataset (i.e. the collection of tables resulting from one or more original data products).
  • Should ideally contain just one or a few paragraphs that describe its content succinctly.
  • Should be used only to override the automatic description (which is usually the description of the containing table). For example, use it when the dataset contains multiple tables.
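
The two "Must" rules above are mechanical enough to check in code. The following is a minimal sketch, not part of ETL: the helper name check_description is hypothetical, and it only covers the capital-letter and trailing-period rules (the "Should" rules need human judgement).

```python
def check_description(description: str) -> list[str]:
    """Return the 'Must' rules that a dataset description violates.

    Hypothetical helper for illustration; it is not an ETL function.
    """
    text = description.strip()
    if not text:
        return ["Description is empty."]

    problems = []
    # Rule: must start with a capital letter.
    # Note: a description starting with a digit is also flagged by this check.
    if not text[0].isupper():
        problems.append("Must start with a capital letter.")
    # Rule: must end with a period.
    if not text.endswith("."):
        problems.append("Must end with a period.")
    return problems
```

An empty list means both rules pass; otherwise each entry names the rule that failed.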

dataset.licenses

type: array

List of all licenses that have been involved in the processing history of the indicators in this dataset.

  • Note: Licenses should be propagated automatically from snapshots. Therefore, this field should only be manually filled out if automatic propagation fails. In the near future, this field may not even exist, since licenses should only exist inside origins.

dataset.sources

type: array

(DEPRECATED, no longer in use.) List of all sources of the indicators in this dataset.


dataset.title

type: string | required (often automatic)

Title of the dataset: a one-line description of the dataset, mostly for internal purposes or for users of our data catalog.

  • Must start with a capital letter.
  • Must not end with a period.
  • Should identify the dataset (i.e. the collection of tables resulting from one or more original data products).
  • Should be used only to override the automatic title (which is usually the title of the containing table). For example, use it when the dataset contains multiple tables.
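
As a rough illustration of the title rules above, the sketch below checks the parts that can be tested mechanically: a single line, a leading capital letter, and no trailing period. The helper name check_title is hypothetical and is not an ETL function.

```python
def check_title(title: str) -> list[str]:
    """Return the mechanical title rules that a dataset title violates.

    Hypothetical helper for illustration; it is not an ETL function.
    """
    text = title.strip()
    if not text:
        return ["Title is empty."]

    problems = []
    # The title is described as a one-line description of the dataset.
    if "\n" in text:
        problems.append("Must be a single line.")
    # Rule: must start with a capital letter.
    if not text[0].isupper():
        problems.append("Must start with a capital letter.")
    # Rule: must not end with a period.
    if text.endswith("."):
        problems.append("Must not end with a period.")
    return problems
```

An empty list means the title passes these checks. Whether the title actually identifies the dataset still has to be judged by a person.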

dataset.update_period_days

type: integer | required

Expected number of days between consecutive updates of this dataset by OWID, typically 30, 90 or 365.

  • Must be defined in the garden step.
  • Must be an integer.
  • Must specify the update period of OWID's data, not the producer's data (although they may often coincide, e.g. 365).

  DO       DON'T
  «7»      «2023-01-07»
  «30»     «monthly»
  «90»     «0.2»
  «365»    «1/365»
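
The DO and DON'T values above can be told apart with a simple type check. This is a minimal sketch under the assumption that the value has already been loaded into Python (for example from a metadata file); the function name is hypothetical, and it checks only the "must be an integer" rule, not whether the period is sensible for a given dataset.

```python
def is_valid_update_period_days(value: object) -> bool:
    """Return True if value is a plain integer number of days.

    Hypothetical helper for illustration; it is not an ETL function.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


# Values from the DO column are accepted.
accepted = [v for v in (7, 30, 90, 365) if is_valid_update_period_days(v)]

# Values from the DON'T column (dates, words, fractions) are rejected.
rejected = [
    v for v in ("2023-01-07", "monthly", 0.2, "1/365")
    if not is_valid_update_period_days(v)
]
```

Dates, frequency words and fractions fail because they are strings or floats rather than integers.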