Skip to content

The URI

We use URIs throughout all the ETL to identify files and datasets. The format of a URI varies depending on the step we are dealing with, but in general they follow the following convention:

<prefix>://<path>

Prefix

Most of the time, the prefix will either be snapshot or data. The former is used for snapshots of upstream dat files, and the latter for ETL datasets (with different levels of curation).

Prefix Description
snapshot Used for snapshot steps.
data Used for meadow, garden, grapher and most of the ETL steps where we operate with curated Datasets.
walden ⚠ Deprecated. Used before the introduction of snapshot.
backport Used to import datasets from the OWID database that are not present in the ETL.

Path

The format of the path is different depending on the prefix.

Path for snapshot://

snapshot://<namespace>/<version>/<filename>.<extension>

where

Prefix Description
namespace Used to group files from similar topics or sources. Namespace are typically source names (e.g. un) or topic names (e.g. health).
version Version of the file. Typically, we use the date the file was downloaded in the format YYYY-mm-dd.
filename Name of the downloaded file.
extension Extension of the file.

Example

snapshot://ember/2023-02-20/yearly_electricity.csv

Path for data://

data://<channel>/<namespace>/<version>/<dataset-name>

where

Prefix Description
channel Denotes the curation level of the dataset. Possible values include meadow, garden, grapher, explorers.
namespace Used to group datasets from similar topics or sources. Namespace are typically source names (e.g. un) or topic names (e.g. health).
version Version of the file. Typically, we use the date the file was downloaded in the format YYYY-mm-dd.
dataset-name Short name of the curated dataset (e.g. un_wpp).

Examples

  • Meadow: data://meadow/nasa/2023-03-06/ozone_hole_area
  • Garden: data://garden/nasa/2023-03-06/ozone_hole_area
  • Grapher: data://grapher/nasa/2023-03-06/ozone_hole_area
  • Explorers: data://explorers/faostat/2023-02-22/food_explorer

Path for walden://

walden steps are no longer used. Use snapshot instead.

walden://<namespace>/<version>/<dataset-name>

where

Prefix Description
namespace Used to group files from similar topics or sources. Namespace are typically source names (e.g. un) or topic names (e.g. health).
version Version of the file. Typically, we use the date the file was downloaded in the format YYYY-mm-dd.
dataset-name Short name of the curated dataset (e.g. un_wpp).

Example

walden://irena/2022-10-07/renewable_electricity_capacity_and_generation

Path for backport://

backport://backport/owid/latest/<dataset-name>

where

Prefix Description
dataset-name Name of the dataset. It follows the format dataset_<dataset-id>_<dataset-name>, where dataset-id corresponds to the dataset Grapher id and dataset-name is the name of the dataset in Grapher (with lower case, and all symbols replaced by underscores).

Example

backport://backport/owid/latest/dataset_5676_global_health_observatory__world_health_organization__2022_08