dataset
An ETL dataset is a collection of tables. The dataset metadata fields are the attributes of the DatasetMeta object in ETL.
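As a rough mental model, the dataset-level fields documented in this section can be pictured as a plain Python dataclass. This is a minimal sketch: the class name DatasetMetaSketch and the field types are assumptions made for illustration. The real DatasetMeta in ETL has its own definition, which may include more fields.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DatasetMetaSketch:
    """Hypothetical stand-in for ETL's DatasetMeta.

    Holds only the fields documented in this section; it is NOT the real class.
    """

    title: Optional[str] = None  # required (often automatic)
    description: Optional[str] = None  # recommended (often automatic)
    update_period_days: Optional[int] = None  # required, set in the garden step
    origins: list[Any] = field(default_factory=list)  # required (automatic)
    licenses: list[Any] = field(default_factory=list)  # normally propagated from snapshots
    sources: list[Any] = field(default_factory=list)  # no longer in use; use origins


meta = DatasetMetaSketch(title="Population and GDP", update_period_days=365)
print(meta.origins)  # []
```

Using field(default_factory=list) gives each instance its own empty list, so appending an origin to one dataset's metadata never leaks into another's.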
dataset.description
type: string | recommended (often automatic)
Description of the dataset (mostly for internal purposes, or for users of our data catalog): one paragraph (or a few) describing the content of its tables.
- Must start with a capital letter.
- Must end with a period.
- Should not mention other metadata fields (e.g. producer or date_published). Exception: when those fields are crucial to the description of the data product.
- Should describe the dataset (i.e. the collection of tables resulting from one or more original data products).
- Should ideally contain just one or a few paragraphs, that describe its content succinctly.
- Should be used only to override the automatic description (which usually is the description of the containing table). For example, use it when the dataset contains multiple tables.
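The two mechanical rules for the description (capital letter at the start, period at the end) are easy to check automatically. The following is a minimal sketch of such a check; check_description is a hypothetical helper, not part of ETL, and the content guidelines above still need a human reviewer.

```python
from __future__ import annotations


def check_description(description: str) -> list[str]:
    """Return the problems found in a dataset description (hypothetical helper).

    Only the mechanical rules are checked: capital letter at the start and
    period at the end. Content guidelines are left to human review.
    """
    text = description.strip()
    if not text:
        return ["Description is empty."]
    problems = []
    if not text[0].isupper():
        problems.append("Description must start with a capital letter.")
    if not text.endswith("."):
        problems.append("Description must end with a period.")
    return problems


print(check_description("This dataset combines population and GDP tables."))  # []
print(check_description("combines population and GDP tables"))
```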
dataset.licenses
type: array
List of all licenses that have been involved in the processing history of the indicators in this dataset. NOTE: Licenses should be propagated automatically from snapshots. Therefore, this field should only be manually filled out if automatic propagation fails. In the near future, this field may not even exist, since licenses should only exist inside origins.
dataset.origins
type: array | required (automatic)
List of all origins of the indicators in this dataset. NOTE: Origins should be propagated automatically from snapshots. Therefore, this field should only be manually filled out if automatic propagation fails.
dataset.sources
type: array
List of all sources of the indicators in this dataset. NOTE: This field is no longer in use; use origins instead.
dataset.title
type: string | required (often automatic)
Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset.
- Must start with a capital letter.
- Must not end with a period.
- Should identify the dataset (i.e. the collection of tables resulting from one or more original data products).
- Should be used only to override the automatic title (which usually is the title of the containing table). For example, use it when the dataset contains multiple tables.
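Similarly, the mechanical rules for the title can be sketched as a check. check_title is a hypothetical helper, not part of ETL; treating any newline as a violation of the "one-line description" rule is an assumption of this sketch.

```python
from __future__ import annotations


def check_title(title: str) -> list[str]:
    """Return the problems found in a dataset title (hypothetical helper)."""
    text = title.strip()
    if not text:
        return ["Title is empty."]
    problems = []
    if not text[0].isupper():
        problems.append("Title must start with a capital letter.")
    if text.endswith("."):
        problems.append("Title must not end with a period.")
    # Assumption: any line break violates the one-line rule.
    if "\n" in text:
        problems.append("Title should fit on one line.")
    return problems


print(check_title("Population and GDP"))  # []
print(check_title("population and GDP."))
```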
dataset.update_period_days
type: integer | required
Expected number of days between consecutive updates of this dataset by OWID, typically 30, 90 or 365.
- Must be defined in the garden step.
- Must be an integer.
- Must specify the update period of OWID's data, not the producer's data (although they may often coincide, e.g. 365).
DO | DON'T |
---|---|
7 | 2023-01-07 |
30 | monthly |
90 | 0.2 |
365 | 1/365 |
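The DO/DON'T values above can be validated with a small check. This is a minimal sketch: check_update_period_days is a hypothetical helper, not part of ETL, and rejecting zero or negative values is an assumption (the text only says the value must be an integer number of days).

```python
from __future__ import annotations


def check_update_period_days(value: object) -> list[str]:
    """Return the problems found in an update_period_days value (hypothetical helper)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return [f"update_period_days must be an integer, got {value!r}."]
    # Assumption: a period of zero or fewer days is meaningless.
    if value <= 0:
        return ["update_period_days must be a positive number of days."]
    return []


# Values taken from the DO/DON'T table.
for good in [7, 30, 90, 365]:
    print(good, check_update_period_days(good))
for bad in ["2023-01-07", "monthly", 0.2, "1/365"]:
    print(bad, check_update_period_days(bad))
```

Every DO value passes, and every DON'T value fails the integer check: a date string, a frequency word, a float and a fraction are all rejected.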