Metadata reference¶
variable
¶
An indicator (also commonly called 'variable') is a collection of data points (usually a time series) with metadata. The indicator metadata fields are the attributes of the VariableMeta
object in ETL.
variable.description_from_producer
¶
type: string
| recommended (if existing)
Description of the indicator written by the producer, if any was given.
- Must start with a capital letter.
- Must end with a period.
- Should be identical to the producer's text, except for some formatting changes, typo corrections, or other appropriate minor edits.
- Should only be given if the producer clearly provides such definitions in a structured way. Avoid spending time searching for a definition given by the producer elsewhere.
variable.description_key
¶
type: array
| recommended (for curated indicators)
List of key pieces of information about the indicator.
- Must be a list of one or more short paragraphs.
- Each paragraph must start with a capital letter.
- Each paragraph must end with a period.
- Must not contain
description_short
(although there might be some overlap of information). - Should contain all the key information about the indicator (except that already given in
description_short
). - Should include the key information given in other fields like
grapher_config.subtitle
(if different fromdescription_short
) andgrapher_config.note
. - Should not contain information about processing (which should be in
description_processing
). - Should only contain information that is key to the public.
- Anything that is too detailed or technical should be left in the code.
variable.description_processing
¶
type: string
| required (if applicable)
Relevant information about the processing of the indicator done by OWID.
- Must start with a capital letter.
- Must end with a period.
- Must be used if important editorial decisions have been taken during data processing.
- Must not be used to describe common processing steps like country harmonization.
- Should only contain key processing information to the public.
- Anything that is too detailed or technical should be left in the code.
variable.description_short
¶
type: string
| required
One or a few lines that complement the title to have a short description of the indicator.
- Must start with a capital letter.
- Must end with a period.
- Must be one short paragraph (for example suitable to fit in a chart subtitle).
- Should not mention any other metadata fields (like information about the processing, or the origins, or the units). Exceptions:
- The unit can be mentioned if it is crucial for the description.
variable.display
¶
We keep display for the time being as the 'less powerful sibling' of grapher config.
variable.display.color
¶
type: string
Color to use for the indicator in e.g. line charts.
variable.display.conversionFactor
¶
type: number
Conversion factor to apply to indicator values.
- Note: We should avoid using this, and instead convert data and units (and possibly other metadata fields where the units are mentioned) consistently in the ETL grapher step.
variable.display.description
¶
type: string
Description to display for the indicator, to replace the indicator's description
.
variable.display.entityAnnotationsMap
¶
type: string
Entity annotations
variable.display.includeInTable
¶
type: boolean
Whether to render this indicator in the table sheet.
variable.display.isProjection
¶
type: boolean
, string
, string
Indicates if this time series is a forward projection (if so then this is rendered differently in e.g. line charts).
variable.display.name
¶
type: string
| required
Indicator's title to display in the legend of a chart. NOTE: For backwards compatibility, display.name
also replaces the indicator's title in other public places. Therefore, whenever display.name
is defined, title_public
should also be defined.
- Must be very short, to fit the legend of a chart.
- Must not end with a period.
- Should not mention other metadata fields like
producer
orversion
.
DO | DON'T |
---|---|
«Agriculture » |
«Nitrous oxide emissions from agriculture » |
«Area harvested » |
«Barley | 00000044 || Area harvested | 005312 || hectares » |
variable.display.numDecimalPlaces
¶
type: integer
, string
, string
Number of decimal places to show in charts (and in the table tab).
variable.display.numSignificantFigures
¶
type: integer
, string
Number of significant rounding figures in charts.
variable.display.roundingMode
¶
type: string
Specifies the rounding mode to use.
variable.display.shortUnit
¶
type: string
Short unit to use in charts instead of the indicator's short_unit
.
variable.display.tableDisplay
¶
Configuration for the table tab for this indicator, with options hideAbsoluteChange
and hideRelativeChange
.
variable.display.tableDisplay.hideAbsoluteChange
¶
type: boolean
Whether to hide the absolute change.
variable.display.tableDisplay.hideRelativeChange
¶
type: boolean
Whether to hide the relative change.
variable.display.tolerance
¶
type: integer
Tolerance (in years or days) to use in charts. If data points are missing, the closest data point will be shown, if it lies within the specified tolerance.
variable.display.unit
¶
type: string
Unit to use in charts instead of the indicator's unit
.
variable.display.yearIsDay
¶
type: boolean
Switch to indicate if the number in the year column represents a day (since zeroDay) or a year.
variable.display.zeroDay
¶
type: string
ISO date day string for the starting date if yearIsDay
is True
.
variable.license
¶
type: string
| required (in the future this could be automatic)
License of the indicator, which depends on the indicator's processing level and the origins' licenses.
- If the indicator's
processing_level
is major, assignCC BY 4.0
. - If the indicator's
processing_level
is minor, choose the most strict license among the origins'licenses
.
variable.origins
¶
type: array
List of all origins of the indicator.
- Note: Origins should be propagated automatically from snapshots. Therefore, this field should only be manually filled out if automatic propagation fails.
variable.presentation
¶
An indicator's presentation defines how the indicator's metadata will be shown on our website (e.g. in data pages). The indicator presentation metadata fields are the attributes of the VariablePresentationMeta
object in ETL.
variable.presentation.attribution
¶
type: string
| optional
Citation of the indicator's origins, to override the automatic format producer1 (year1); producer2 (year2)
.
- Must start with a capital letter. Exceptions:
- The name of the institution or the author must be spelled with small letter, e.g.
van Haasteren
.
- The name of the institution or the author must be spelled with small letter, e.g.
- Must join multiple attributions by a
;
. - Must not end in a period (and must not end in
;
). - Must contain the year of
date_published
, for each origin, in parenthesis. - Should only be used when the automatic format
producer1 (year1); producer2 (year2)
needs to be overridden.
DO | DON'T |
---|---|
«Energy Institute - Statistical Review of World Energy (2023); Ember (2022) » |
«UN (2023), WHO (2023) » |
variable.presentation.attribution_short
¶
type: string
| recommended (for curated indicators)
Very short citation of the indicator's main producer(s).
- Must start with a capital letter. Exceptions:
- The name of the institution or the author must be spelled with small letter, e.g.
van Haasteren
.
- The name of the institution or the author must be spelled with small letter, e.g.
- Must not end in a period.
- Should be very short.
- Should be used if the automatic concatenation of origin's
attribution_short
are too long. In those cases, choose the most importantattribution
(e.g. the main producer of the data).
variable.presentation.faqs
¶
type: array
| recommended (for curated indicators)
List of references to questions in an FAQ google document, relevant to the indicator.
- Each reference must contain
fragment_id
(question identifier) andgdoc_id
(document identifier).
variable.presentation.grapher_config
¶
Our World in Data grapher configuration. Most of the fields can be left empty and will be filled with reasonable default values.
Find more details on its attributes here.
variable.presentation.title_public
¶
type: string
| optional
Indicator title to be shown in public places like data pages, that overrides the indicator's title.
- Must start with a capital letter.
- Must not end with a period.
- Must be one short sentence (a few words).
- Should not mention other metadata fields like
producer
orversion
. - Should help OWID and expert users identify the indicator.
- For big datasets where constructing human-readable titles is hard (e.g. FAOSTAT), this field is mandatory, to improve the public appearance of the title.
- When
display.name
is defined (to edit the default title in the legend of a chart),title_public
must also be defined (otherwisedisplay.name
will be used as a public title).
variable.presentation.title_variant
¶
type: string
| optional
Short disambiguation of the title that references a special feature of the methods or nature of the data.
- Must start with a capital letter.
- Must not end in a period.
- Should be very short.
- Should only be used if the indicator's title is ambiguous (e.g. if there are multiple indicators with the same title).
- Should not reference the data provider. For that, instead, use
attribution_short
.
DO | DON'T |
---|---|
«Age-standardized » |
«The data is age-standardized » |
«Historical data » |
«Historical data from 1800 to 2010 » |
variable.presentation.topic_tags
¶
type: array
| recommended (for curated indicators)
List of topics where the indicator is relevant.
- Must be an existing topic tag, and be spelled correctly (see the list of topic tags in datasette (requires Tailscale).
- The first tag must correspond to the most relevant topic page (since that topic page will be used in citations of this indicator).
- Should contain 1, 2, or at most 3 tags.
variable.presentation_license
¶
License to display for the indicator, overriding license
.
variable.presentation_license.name
¶
type: string
variable.presentation_license.url
¶
type: string
variable.processing_level
¶
type: ['minor', 'major']
, string
| required (in the future this could be automatic).
Level of processing that the indicator values have experienced.
- Must be
minor
if the indicator has undergone only minor operations since its origin:- Rename entities (e.g. countries or columns)
- Multiplication by a constant (e.g. unit change)
- Drop missing values.
- Must be
major
if any other operation is used:- Data aggregates (e.g. sum data for continents or income groups)
- Operations between indicators (e.g. per capita, percentages, annual changes)
- Concatenation of indicators, etc.
variable.short_unit
¶
type: string
| required
Characters that represent the unit we use to measure the indicator value.
- Must follow the rules of capitalization of the International System of Units, when applicable.
- Must not end with a period.
- Must be empty if the indicator has no units.
- Should not contain spaces.
- If, for clarity, we prefer to simplify the units in a chart, e.g. to show
kWh
instead ofkWh/person
, usedisplay.short_unit
for the simplified units, and keep the correct one inindicator.short_unit
(and ensure there is no ambiguity in the chart).
DO | DON'T |
---|---|
«t/ha » |
«t / ha » |
«% » |
«pct » |
«kWh/person » |
«pc » |
variable.sort
¶
type: array
variable.sources
¶
type: array
List of all sources of the indicator. Automatically filled. NOTE: This is no longer in use, you should use origins.
variable.title
¶
type: string
| required
Title of the indicator, which is a few words definition of the indicator.
- Must start with a capital letter.
- Must not end with a period.
- Must be one short sentence (a few words).
- For 'small datasets', this should be the publicly displayed title. For 'big datasets' (like FAOSTAT, with many dimensions), it can be less human-readable, optimized for internal searches (then, use
presentation.title_public
for the public title). - Should not mention other metadata fields like
producer
orversion
.
DO | DON'T |
---|---|
«Number of neutron star mergers in the Milky Way » |
«Number of neutron stars (NASA) » |
«Share of neutron star mergers that happen in the Milky Way » |
«Share of neutron star mergers that happen in the Milky Way (2023) » |
«Barley | 00000044 || Area harvested | 005312 || hectares » |
«Barley » |
variable.type
¶
type: string
Indicator type is usually automatically inferred from the data, but must be manually set for ordinal and categorical types.
variable.unit
¶
type: string
| required
Very concise name of the unit we use to measure the indicator values.
- Must not start with a capital letter.
- Must not end with a period.
- Must be empty if the indicator has no units.
- Must be in plural.
- Must be a metric unit when applicable.
- Should not use symbols like “/”.
- If it is a derived unit, use 'per' to denote a division, e.g. '... per hectare', or '... per person'.
- Should be '%' for percentages.
DO | DON'T |
---|---|
«tonnes per hectare » |
«tonnes/hectare » |
«kilowatts per person » |
«kilowatts per capita » |
origin
¶
An indicator's origin is the information about the snapshot where the indicator's data and metadata came from. A snapshot is a subset of data (a 'slice') taken on a specific day from a data product (often a public dataset, but sometimes a paper or a database). The producer of the data product is typically an institution or a set of authors.
A snapshot often coincides with the data product (e.g. the dataset is a public csv file, and we download the entire file). But sometimes the data product is a bigger object (e.g. a set of files, a paper or a database) and the snapshot is just a particular subset of the data product (e.g. one of the files, or a table from a paper, or the result of a query). The origin fields are the attributes of the Origin
object in ETL.
origin.attribution
¶
type: string
| optional
Citation of the data product to be used when the automatic format producer (year)
needs to be overridden.
- Must start with a capital letter. Exceptions:
- The name of the institution or the author must be spelled with small letter, e.g.
van Haasteren
.
- The name of the institution or the author must be spelled with small letter, e.g.
- Must not end with a period.
- Must end with the year of
date_published
in parenthesis. - Must not include any semicolon
;
. - Should only be used if the automatic attribution format
producer (year)
is considered uninformative. For example, when the title of the data product is well known and should be cited along with the producer, or when the original version of the data product should also be mentioned. - If this field is used to mention the data product, follow the preferred format
{producer} - {title} {version_producer} ({year})
(whereversion_producer
may be omitted). - If the producer explicitly asked for a specific short citation, follow their guidelines and ignore the above.
DO | DON'T |
---|---|
«Energy Institute - Statistical Review of World Energy (2023) » |
«Statistical Review of World Energy, Energy Institute (2023) », «Statistical Review of World Energy (Energy Institute, 2023) » |
origin.attribution_short
¶
type: string
| recommended
Shorter version of attribution
(without the year), usually an acronym of the producer, to be used in public places that are short on space.
- Must start with a capital letter. Exceptions:
- The name of the institution or the author must be spelled with small letter, e.g.
van Haasteren
.
- The name of the institution or the author must be spelled with small letter, e.g.
- Must not end with a period.
- Should refer to the producer or the data product (if well known), not the year or any other field.
- Should be an acronym, if the acronym is well-known, otherwise a brief name.
DO | DON'T |
---|---|
«FAO » |
«UN FAO », «FAO (2023) » |
«World Bank » |
«WB » |
origin.citation_full
¶
type: string
| required
Full citation of the data product. If the producer expressed how to cite them, we should follow their guidelines.
- Must start with a capital letter.
- Must end with a period.
- Must include (wherever is appropriate) the year of publication, i.e. the year given in
date_published
. - If the producer specified how to cite them, this field should be identical to the producer's text, except for some formatting changes, typo corrections, or other appropriate minor edits.
- Note: This field can be as long as necessary to follow the producer's guidelines.
- If the origin is the compilation of multiple sources, they can be added here as a list.
origin.date_accessed
¶
type: string
| required
Exact day when the producer's data (in its current version) was downloaded by OWID.
- Must be a date with format
YYYY-MM-DD
. - Must be the date when the current version of the producer's data was accessed (not any other previous version).
«2023-09-07
»
origin.date_published
¶
type: string
| required
Exact day (or year, if exact day is unknown) when the producer's data (in its current version) was published.
- Must be a date with format
YYYY-MM-DD
, or, exceptionally,YYYY
. - Must be the date when the current version of the dataset was published (not when the dataset was first released).
«2023-09-07
»
«2023
»
origin.description
¶
type: string
| recommended
Description of the original data product.
- Must start with a capital letter.
- Must end with a period.
- Must not mention other metadata fields like
producer
orversion_producer
. Exceptions:- These other metadata fields are crucial in the description of the data product.
- Should describe the data product, not the snapshot (i.e. the subset of data we extract from the data product).
- Should ideally contain just one or a few paragraphs, that describe the data product succinctly.
- If the producer provides a good description, use that, either exactly or conveniently rephrased.
origin.description_snapshot
¶
type: string
| recommended (if the data product and snapshot do not coincide)
Additional information to append to the description of the data product, in order to describe the snapshot (i.e. the subset of data that we extract from the data product).
- Must start with a capital letter.
- Must end with a period.
- Should be defined only if the data product and the snapshot do not coincide.
- Should not repeat information given in
description
(the description of the data product). - Should not mention other metadata fields.
- If fields like
producer
ordate_published
are mentioned, placeholders should be used.
- If fields like
origin.license
¶
An origin's license is the license, assigned by a producer, of the data product from where we extracted the indicator's original data and metadata.
origin.license.name
¶
type: string
| required
Name of the license. Find more details on licensing at https://creativecommons.org/share-your-work/cclicenses/.
- If it's a standard license, e.g. CC, it should be one of the acronyms in the examples below.
- If the license is CC, but the version is not specified, assume 4.0.
- If it's a custom license defined by the producer, it should follow the producer's text.
- When the license of an external dataset is not specified, temporarily assume
CC BY 4.0
. Contact the producer before publishing.- If there is no response after a few days, ask Ed or Este and decide on a case-by-case basis.
«Public domain
»
«CC0
»
«PDM
»
«CC BY 4.0
»
«CC BY-SA 4.0
»
«© GISAID 2023
»
origin.license.url
¶
type: string
| required (if existing)
URL leading to the producer's website where the dataset license is specified.
- Must be a complete URL, i.e.
http...
. - Must not lead to a Creative Commons website or any other generic page, but to the place where the producer specifies the license of the data.
- If the license is specified inside, say, a PDF document, the URL should be the download link of that document.
- When the license of an external dataset is not specified, leave
url
empty.- Do not use the URL of the main page of the dataset if the license is not mentioned anywhere.
origin.producer
¶
type: string
| required
Name of the institution or the author(s) that produced the data product.
- Must start with a capital letter. Exceptions:
- The name of the institution or the author must be spelled with small letter, e.g.
van Haasteren
.
- The name of the institution or the author must be spelled with small letter, e.g.
- Must not end with a period. Exceptions:
- When using
et al.
(for papers with multiple authors).
- When using
- Must not include a date or year.
- Must not mention
Our World in Data
orOWID
. - Must not include any semicolon
;
. - Regarding authors:
- One author:
Williams
. - Two authors:
Williams and Jones
. - Three or more authors:
Williams et al.
.
- One author:
- Regarding acronyms:
- If the acronym is more well known than the full name, use just the acronym, e.g.
NASA
. - If the acronym is not well known, use the full name, e.g.
Energy Institute
. - If the institution explicitly asks, follow their guidelines, e.g.
Food and Agriculture Organization of the United Nations
(instead ofFAO
).
- If the acronym is more well known than the full name, use just the acronym, e.g.
DO | DON'T |
---|---|
«NASA » |
«NASA (2023) », «N.A.S.A. », «N A S A », «National Aeronautics and Space Administration », «Our World in Data based on NASA » |
«World Bank » |
«WB » |
«Williams et al. » |
«Williams et al. (2023) », «Williams et al », «John Williams et al. » |
«van Haasteren et al. » |
«Van Haasteren et al. » |
«Williams and Jones » |
«Williams & Jones », «John Williams and Indiana Jones » |
What should be the value if there are multiple producers?
We don't have a clear guideline for this at the moment, and depending on the case you might want to specify all the producers. However, a good option is to use 'Various sources'.
origin.title
¶
type: string
| required
Title of the original data product.
- Must start with a capital letter.
- Must not end with a period.
- Must not mention other metadata fields like
producer
orversion_producer
. Exceptions:- The name of the origin is well known and includes other metadata fields.
- Should identify the data product, not the snapshot (i.e. the subset of data that we extract from the data product).
- If the producer's data product has a well-known name, use that name exactly (except for minor changes like typos).
- If the producer's data product does not have a well-known name, use a short sentence that describes its content.
DO | DON'T |
---|---|
«Global Carbon Budget » |
«Global Carbon Budget (fossil fuels) » |
origin.title_snapshot
¶
type: string
| required (if different from title
)
Title of the snapshot (i.e. the subset of data that we extract from the data product).
- Must start with a capital letter.
- Must not end with a period.
- Must not mention other metadata fields like
producer
orversion_producer
. Exceptions:- The name of the origin is well known and includes other metadata fields.
- The producer's data product has a well-known name, and the snapshot is a specific slice of the data product. If so, use the format 'Data product - Specific slice'.
- Note: This means that the title of the snapshot may contain the title of the data product.
- Must not include any semicolon
;
. - Should only be used when the snapshot does not coincide with the entire data product.
- Should not include words like
data
,dataset
ordatabase
, unless that's part of a well-known name of the origin. - If the producer's data product does not have a well-known name, use a short sentence that describes the snapshot.
DO | DON'T |
---|---|
«Global Carbon Budget - Fossil fuels » |
«Global Carbon Budget » |
«Neutron star mergers » |
«Neutron star mergers (NASA, 2023) », «Data on neutron star mergers », «Neutron star mergers dataset » |
origin.url_download
¶
type: string
| required (if existing)
Producer's URL that directly downloads their data as a single file.
- Must be a complete URL or S3 URI, i.e.
http...
. - Must be a direct download link.
- The URL must not lead to a website that requires user input to download the dataset. If there is no direct download URL, this field should be empty.
«https://data.some_institution.com/dataset_12/data.csv
»
«s3://owid-private/data.csv
»
origin.url_main
¶
type: string
| required
Producer's URL leading to the main website of the original data product.
- Must be a complete URL, i.e.
http...
. - Should lead to a website where the data product is described.
«https://data.some_institution.com/dataset_12
»
origin.version_producer
¶
type: string
, number
| recommended (if existing)
Producer's version of the data product.
- Should be used if the producer specifies the version of the data product.
- Should follow the same naming as the producer, e.g.
v13
,2023.a
,version II
.
table
¶
A table is a collection of indicators that share the same index. The table metadata fields are the attributes of the TableMeta
object in ETL.
table.common
¶
table.description
¶
type: string
| recommended (often automatic)
Description of the table (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the table.
- Must start with a capital letter.
- Must end with a period.
- Should not mention other metadata fields (e.g.
producer
ordate_published
). Exceptions:- The other metadata fields are crucial in the description of the data product.
- Should ideally contain just one or a few paragraphs, that describe its content succinctly.
- Should be used only to override the automatic description (which usually is the description of the origin). For example, use it when the table has multiple origins.
table.title
¶
type: string
| required (often automatic)
Title of the table (mostly for internal purposes, or for users of our data catalog) which is a few words description of the table.
- Must start with a capital letter.
- Must not end with a period.
- Should identify the table.
- Should be used only to override the automatic title (which usually is the title of the origin). For example, use it when the table has multiple origins.
table.variables
¶
An indicator (also commonly called 'variable') is a collection of data points (usually a time series) with metadata. The indicator metadata fields are the attributes of the VariableMeta
object in ETL.
dataset
¶
An ETL dataset is a collection of tables. The dataset metadata fields are the attributes of the DatasetMeta
object in ETL.
dataset.description
¶
type: string
| recommended (often automatic)
Description of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one- (or a few) paragraph description of the content of the tables.
- Must start with a capital letter.
- Must end with a period.
- Should not mention other metadata fields (e.g.
producer
ordate_published
). Exceptions:- The other metadata fields are crucial in the description of the data product.
- Should describe the dataset (i.e. the collection of tables resulting from one or more original data products).
- Should ideally contain just one or a few paragraphs, that describe its content succinctly.
- Should be used only to override the automatic description (which usually is the description of the containing table). For example, use it when the dataset contains multiple tables.
dataset.licenses
¶
type: array
List of all licenses that have been involved in the processing history of the indicators in this dataset.
- Note: Licenses should be propagated automatically from snapshots. Therefore, this field should only be manually filled out if automatic propagation fails. In the near future, this field may not even exist, since
licenses
should only exist insideorigins
.
dataset.non_redistributable
¶
type: boolean
Whether the dataset is non-redistributable, and therefore data should not be downloadable from the chart.
dataset.sources
¶
type: array
(DEPRECATED, no longer in use). List of all sources of the indicators in this dataset.
dataset.title
¶
type: string
| required (often automatic)
Title of the dataset (mostly for internal purposes, or for users of our data catalog) which is a one-line description of the dataset.
- Must start with a capital letter.
- Must not end with a period.
- Should identify the dataset (i.e. the collection of tables resulting from one or more original data products).
- Should be used only to override the automatic title (which usually is the title of the containing table). For example, use it when the dataset contains multiple tables.
dataset.update_period_days
¶
type: integer
| required
Expected number of days between consecutive updates of this dataset by OWID, typically 30
, 90
or 365
. Set to 0
if we do not have the intention to update it.
- Must be defined in the garden step.
- Must be an integer.
- Must specify the update period of OWID's data, not the producer's data (although they may often coincide, e.g.
365
).
DO | DON'T |
---|---|
«7 » |
«2023-01-07 » |
«30 » |
«monthly » |
«90 » |
«0.2 » |
«365 » |
«1/365 » |