origin
An indicator's origin is the information about the snapshot where the indicator's data and metadata came from. A snapshot is a subset of data (a 'slice') taken on a specific day from a data product (often a public dataset, but sometimes a paper or a database). The producer of the data product is typically an institution or a set of authors.
A snapshot often coincides with the data product (e.g. the dataset is a public CSV file, and we download the entire file). But sometimes the data product is a bigger object (e.g. a set of files, a paper or a database) and the snapshot is just a particular subset of the data product (e.g. one of the files, a table from a paper, or the result of a query). The origin fields are the attributes of the `Origin` object in ETL.
origin.attribution
type: string | optional
Citation of the data product to be used when the automatic format `producer (year)` needs to be overridden.
- Must start with a capital letter. Exceptions:
  - The name of the institution or the author is spelled with a lowercase initial letter, e.g. `van Haasteren`.
- Must not end with a period.
- Must end with the year of `date_published` in parentheses.
- Must not include any semicolon `;`.
- Should only be used if the automatic attribution format `producer (year)` is considered uninformative. For example, when the title of the data product is well known and should be cited along with the producer, or when the original version of the data product should also be mentioned.
- If this field is used to mention the data product, follow the preferred format `{producer} - {title} {version_producer} ({year})` (where `version_producer` may be omitted).
- If the producer explicitly asked for a specific short citation, follow their guidelines and ignore the above.
DO | DON'T |
---|---|
«Energy Institute - Statistical Review of World Energy (2023)» | «Statistical Review of World Energy, Energy Institute (2023)», «Statistical Review of World Energy (Energy Institute, 2023)» |
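
The mechanical parts of the attribution rules above can be sketched as a small check. This is only an illustration, not part of ETL: the function names `build_attribution` and `attribution_problems` are hypothetical, and the sketch assumes `date_published` is given as `YYYY-MM-DD` or `YYYY`. The capital-letter rule is left out because its exception (names such as `van Haasteren`) cannot be decided mechanically.

```python
from typing import List, Optional


def build_attribution(producer: str, title: str, year: int,
                      version_producer: Optional[str] = None) -> str:
    """Assemble '{producer} - {title} {version_producer} ({year})'.

    The version is omitted when it is not provided.
    """
    parts = [f"{producer} - {title}"]
    if version_producer:
        parts.append(str(version_producer))
    parts.append(f"({year})")
    return " ".join(parts)


def attribution_problems(attribution: str, date_published: str) -> List[str]:
    """Return the rule violations found in an attribution string.

    Assumes date_published starts with a four-digit year.
    """
    problems = []
    year = date_published[:4]
    if not attribution.endswith(f"({year})"):
        problems.append(f"does not end with ({year})")
    if ";" in attribution:
        problems.append("includes a semicolon")
    if attribution.endswith("."):
        problems.append("ends with a period")
    return problems


print(build_attribution("Energy Institute", "Statistical Review of World Energy", 2023))
```

An empty list from `attribution_problems` only means that none of these three checks failed; whether the attribution is informative enough remains a judgment call.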
origin.attribution_short
type: string | recommended
Shorter version of `attribution` (without the year), usually an acronym of the producer, to be used in public places that are short on space.
- Must start with a capital letter. Exceptions:
  - The name of the institution or the author is spelled with a lowercase initial letter, e.g. `van Haasteren`.
- Must not end with a period.
- Should refer to the producer or the data product (if well known), not the year or any other field.
- Should be an acronym, if the acronym is well-known, otherwise a brief name.
DO | DON'T |
---|---|
«FAO» | «UN FAO», «FAO (2023)» |
«World Bank» | «WB» |
origin.citation_full
type: string | required
Full citation of the data product. If the producer expressed how to cite them, we should follow their guidelines.
- Must start with a capital letter.
- Must end with a period.
- If the producer specified how to cite them, this field should be identical to the producer's text, except for some formatting changes, typo corrections, or other appropriate minor edits.
- Note: This field can be as long as necessary to follow the producer's guidelines.
- If the origin is the compilation of multiple sources, they can be added here as a list.
origin.date_accessed
type: string | required
Exact day when the producer's data (in its current version) was downloaded by OWID.
- Must be a date with format `YYYY-MM-DD`.
- Must be the date when the current version of the producer's data was accessed (not any other previous version).
«2023-09-07»
origin.date_published
type: string | required
Exact day (or year, if exact day is unknown) when the producer's data (in its current version) was published.
- Must be a date with format `YYYY-MM-DD`, or, exceptionally, `YYYY`.
- Must be the date when the current version of the dataset was published (not when the dataset was first released).
«2023-09-07»
«2023»
«latest»
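
The date formats required for `date_accessed` and `date_published` can be checked along the following lines. This is a minimal sketch with hypothetical function names, not the validation ETL itself performs. It accepts the literal `latest` for `date_published` only because that value appears among the examples above; how that value is meant to be used is an assumption here.

```python
import re
from datetime import date


def is_valid_date_accessed(value: str) -> bool:
    """True if value is a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        # Rejects impossible dates such as 2023-02-30.
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_valid_date_published(value: str) -> bool:
    """True for YYYY-MM-DD, a bare YYYY, or the literal 'latest' (assumed)."""
    if value == "latest":
        return True
    if re.fullmatch(r"[0-9]{4}", value):
        return True
    return is_valid_date_accessed(value)


for candidate in ["2023-09-07", "2023", "latest", "07/09/2023"]:
    print(candidate, is_valid_date_published(candidate))
```

The regular expression runs before `date.fromisoformat` so that only the exact `YYYY-MM-DD` layout reaches the parser.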
origin.description
type: string | recommended
Description of the original data product.
- Must start with a capital letter.
- Must end with a period.
- Must not mention other metadata fields like `producer` or `version_producer`. Exceptions:
  - These other metadata fields are crucial in the description of the data product.
- Should describe the data product, not the snapshot (i.e. the subset of data we extract from the data product).
- Should ideally contain just one or a few paragraphs, that describe the data product succinctly.
- If the producer provides a good description, use that, either exactly or conveniently rephrased.
origin.description_snapshot
type: string | recommended (if the data product and snapshot do not coincide)
Additional information to append to the description of the data product, in order to describe the snapshot (i.e. the subset of data that we extract from the data product).
- Must start with a capital letter.
- Must end with a period.
- Should be defined only if the data product and the snapshot do not coincide.
- Should not repeat information given in `description` (the description of the data product).
- Should not mention other metadata fields.
  - If fields like `producer` or `date_published` are mentioned, placeholders should be used.
origin.license
An origin's license is the license, assigned by the producer, of the data product from which we extracted the indicator's original data and metadata.
origin.license.name
type: string | required
Name of the license.
- If it's a standard license, e.g. CC, it should be one of the acronyms in the examples below.
- If the license is CC, but the version is not specified, assume 4.0.
- If it's a custom license defined by the producer, it should follow the producer's text.
- When the license of an external dataset is not specified, temporarily assume `CC BY 4.0`. Contact the producer before publishing.
  - If there is no response after a few days, ask Ed or Este and decide on a case-by-case basis.
«Public domain»
«CC0»
«PDM»
«CC BY 4.0»
«CC BY-SA 4.0»
«© GISAID 2023»
origin.license.url
type: string | required (if existing)
URL leading to the producer's website where the dataset license is specified.
- Must be a complete URL, i.e. `http...`.
- Must not lead to a Creative Commons website or any other generic page, but to the place where the producer specifies the license of the data.
- If the license is specified inside, say, a PDF document, the URL should be the download link of that document.
- When the license of an external dataset is not specified, leave `url` empty.
  - Do not use the URL of the main page of the dataset if the license is not mentioned anywhere.
origin.producer
type: string | required
Name of the institution or the author(s) that produced the data product.
- Must start with a capital letter. Exceptions:
  - The name of the institution or the author is spelled with a lowercase initial letter, e.g. `van Haasteren`.
- Must not end with a period. Exceptions:
  - When using `et al.` (for papers with multiple authors).
- Must not include a date or year.
- Must not mention `Our World in Data` or `OWID`.
- Must not include any semicolon `;`.
- Regarding authors:
  - One author: `Williams`.
  - Two authors: `Williams and Jones`.
  - Three or more authors: `Williams et al.`.
- Regarding acronyms:
  - If the acronym is better known than the full name, use just the acronym, e.g. `NASA`.
  - If the acronym is not well known, use the full name, e.g. `Energy Institute`.
  - If the institution explicitly asks, follow their guidelines, e.g. `Food and Agriculture Organization of the United Nations` (instead of `FAO`).
DO | DON'T |
---|---|
«NASA» | «NASA (2023)», «N.A.S.A.», «N A S A», «National Aeronautics and Space Administration», «Our World in Data based on NASA» |
«World Bank» | «WB» |
«Williams et al.» | «Williams et al. (2023)», «Williams et al», «John Williams et al.» |
«van Haasteren et al.» | «Van Haasteren et al.» |
«Williams and Jones» | «Williams & Jones», «John Williams and Indiana Jones» |
What should be the value if there are multiple producers?
We don't have a clear guideline for this at the moment, and depending on the case you might want to specify all the producers. However, a good option is to use 'Various sources'.
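
The author and formatting rules for `producer` can be sketched as follows. The helper names `format_authors` and `producer_problems` are hypothetical and not part of ETL, and the year check is a heuristic (any standalone four-digit number from 1000 to 2099) that is assumed here for illustration.

```python
import re
from typing import List


def format_authors(surnames: List[str]) -> str:
    """Format author surnames: one, two ('A and B'), or three or more ('A et al.')."""
    if not surnames:
        raise ValueError("at least one author is required")
    if len(surnames) == 1:
        return surnames[0]
    if len(surnames) == 2:
        return f"{surnames[0]} and {surnames[1]}"
    return f"{surnames[0]} et al."


def producer_problems(producer: str) -> List[str]:
    """Return the rule violations that can be detected mechanically."""
    problems = []
    if producer.endswith(".") and not producer.endswith("et al."):
        problems.append("ends with a period")
    if re.search(r"\bet al$", producer):
        problems.append("'et al.' is missing its final period")
    # Heuristic: a standalone four-digit number is treated as a year.
    if re.search(r"\b(?:1[0-9]{3}|20[0-9]{2})\b", producer):
        problems.append("includes a year")
    if re.search(r"our world in data|\bowid\b", producer, flags=re.IGNORECASE):
        problems.append("mentions Our World in Data")
    if ";" in producer:
        problems.append("includes a semicolon")
    return problems


print(format_authors(["van Haasteren", "Williams", "Jones"]))
print(producer_problems("NASA (2023)"))
```

Rules that depend on judgment, such as whether an acronym is well known or whether first names were included, are not covered by this sketch.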
origin.title
type: string | required
Title of the original data product.
- Must start with a capital letter.
- Must not end with a period.
- Must not mention other metadata fields like `producer` or `version_producer`. Exceptions:
  - The name of the origin is well known and includes other metadata fields.
- Should identify the data product, not the snapshot (i.e. the subset of data that we extract from the data product).
- If the producer's data product has a well-known name, use that name exactly (except for minor changes like typos).
- If the producer's data product does not have a well-known name, use a short sentence that describes its content.
DO | DON'T |
---|---|
«Global Carbon Budget» | «Global Carbon Budget (fossil fuels)» |
origin.title_snapshot
type: string | required (if different from `title`)
Title of the snapshot (i.e. the subset of data that we extract from the data product).
- Must start with a capital letter.
- Must not end with a period.
- Must not mention other metadata fields like `producer` or `version_producer`. Exceptions:
  - The name of the origin is well known and includes other metadata fields.
- Must not include any semicolon `;`.
- Should only be used when the snapshot does not coincide with the entire data product.
- Should not include words like `data`, `dataset` or `database`, unless that's part of a well-known name of the origin.
- If the producer's data product has a well-known name, and the snapshot is a specific slice of the data product, use a title like 'Data product - Specific slice'. NOTE: The title of the snapshot may contain the title of the data product.
- If the producer's data product does not have a well-known name, use a short sentence that describes the snapshot.
DO | DON'T |
---|---|
«Global Carbon Budget - Fossil fuels» | «Global Carbon Budget» |
«Neutron star mergers» | «Neutron star mergers (NASA, 2023)», «Data on neutron star mergers», «Neutron star mergers dataset» |
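
The 'Data product - Specific slice' pattern and the simpler `title_snapshot` rules can be illustrated with the sketch below. The function names are hypothetical, and the word check flags every occurrence of `data`, `dataset` or `database`, so a match still needs a human to decide whether the word belongs to a well-known name.

```python
import re
from typing import List

# Words that should normally not appear in a snapshot title.
DISCOURAGED_WORDS = re.compile(r"\b(?:data|dataset|database)\b", re.IGNORECASE)


def build_title_snapshot(title: str, slice_name: str) -> str:
    """Combine the data product title and the slice as 'Data product - Specific slice'."""
    return f"{title} - {slice_name}"


def title_snapshot_problems(title_snapshot: str, title: str) -> List[str]:
    """Return the rule violations that can be detected mechanically."""
    problems = []
    if title_snapshot == title:
        problems.append("identical to title, so title_snapshot should not be set")
    if title_snapshot.endswith("."):
        problems.append("ends with a period")
    if ";" in title_snapshot:
        problems.append("includes a semicolon")
    if DISCOURAGED_WORDS.search(title_snapshot):
        problems.append("contains 'data', 'dataset' or 'database'")
    return problems


print(build_title_snapshot("Global Carbon Budget", "Fossil fuels"))
print(title_snapshot_problems("Neutron star mergers dataset", "Neutron star mergers"))
```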
origin.url_download
type: string | required (if existing)
Producer's URL that directly downloads their data as a single file.
- Must be a complete URL, i.e. `http...`.
- Must be a direct download link.
- The URL must not lead to a website that requires user input to download the dataset. If there is no direct download URL, this field should be empty.
«https://data.some_institution.com/dataset_12/data.csv»
origin.url_main
type: string | required
Producer's URL leading to the main website of the original data product.
- Must be a complete URL, i.e. `http...`.
- Should lead to a website where the data product is described.
«https://data.some_institution.com/dataset_12»
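
The "complete URL" requirement shared by `url_main`, `url_download` and `license.url` can be approximated with the standard library. This is a sketch under the assumption that "complete" means an `http` or `https` scheme plus a host; the function name `is_complete_url` is hypothetical.

```python
from urllib.parse import urlparse


def is_complete_url(value: str) -> bool:
    """True if value has an http(s) scheme and a host part."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_complete_url("https://data.some_institution.com/dataset_12"))
print(is_complete_url("data.some_institution.com/dataset_12"))
```

Whether a `url_download` value is really a direct download link, or whether a `license.url` points to the producer's own license text, cannot be decided from the string alone and still has to be verified by hand.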
origin.version_producer
type: string, number | recommended (if existing)
Producer's version of the data product.
- Should be used if the producer specifies the version of the data product.
- Should follow the same naming as the producer, e.g. `v13`, `2023.a`, `version II`.
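
To summarize which fields are marked as required above, the sketch below checks a plain dictionary for them. It is not the ETL `Origin` object: the dictionary layout, the function name `missing_required` and all example values are hypothetical, and fields that are only required "if existing" or "if different from `title`" are deliberately left out.

```python
from typing import Any, Dict, List

# Fields marked as unconditionally required in this reference.
REQUIRED_FIELDS = [
    "producer",
    "title",
    "citation_full",
    "date_accessed",
    "date_published",
    "url_main",
]


def missing_required(origin: Dict[str, Any]) -> List[str]:
    """Return the names of required fields that are absent or empty."""
    missing = [name for name in REQUIRED_FIELDS if not origin.get(name)]
    license_info = origin.get("license") or {}
    if not license_info.get("name"):
        missing.append("license.name")
    return missing


# Hypothetical example values, for illustration only.
example_origin = {
    "producer": "Energy Institute",
    "title": "Statistical Review of World Energy",
    "citation_full": "Energy Institute - Statistical Review of World Energy (2023).",
    "date_accessed": "2023-09-07",
    "date_published": "2023",
    "url_main": "https://data.some_institution.com/dataset_12",
    "license": {"name": "CC BY 4.0"},
}

print(missing_required(example_origin))
print(missing_required({"producer": "NASA"}))
```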