Our World in Data has a whole team dedicated to data management. The team takes data from publicly available sources (e.g. the UN Food and Agriculture Organisation) and makes it available to our researchers, who analyse it and create visualisations for their articles.
The ETL project supports an opinionated data management workflow, which separates a data manager's work into several stages:
snapshot → format → harmonise → import → publish
The design of the ETL involves steps that mirror the stages above, which help us to meet several design goals of the project:
1. Snapshot step: Take a snapshot of the upstream data product and store it on our end.
2. Meadow step: Bring the data into a common format.
3. Garden step: Harmonise the names of countries, genders and any other columns we may want to join on. Also do the necessary data processing to make the dataset usable for our needs.
4. Grapher step: Import the data to our internal MySQL database.
A data manager must implement all these steps to make something chartable on the Our World in Data site.
When all steps (1 to 4) are implemented, the data is available for publication on our site. The publication step can involve creating new charts or updating existing ones with the new data.
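
To make the flow concrete, the four steps can be sketched as a chain of small functions. This is only an illustrative sketch, not the ETL's actual API: the function names, the sample CSV, the `COUNTRY_MAPPING` table and the in-memory `database` dict (standing in for the MySQL database) are all assumptions made for the example.

```python
import csv
import io

# Hypothetical upstream payload; a real snapshot would come from the data provider.
UPSTREAM_CSV = """Area,Year,Value
United States of America,2020,10.5
Viet Nam,2020,3.2
"""

# Hypothetical mapping from upstream country names to harmonised names.
COUNTRY_MAPPING = {
    "United States of America": "United States",
    "Viet Nam": "Vietnam",
}


def snapshot_step(raw: str) -> str:
    """Snapshot: keep an unmodified copy of the upstream data."""
    return raw


def meadow_step(snapshot: str) -> list[dict]:
    """Meadow: parse the snapshot into a common format (typed records)."""
    reader = csv.DictReader(io.StringIO(snapshot))
    return [
        {"country": row["Area"], "year": int(row["Year"]), "value": float(row["Value"])}
        for row in reader
    ]


def garden_step(rows: list[dict]) -> list[dict]:
    """Garden: harmonise country names; unknown names are left unchanged."""
    return [
        {**row, "country": COUNTRY_MAPPING.get(row["country"], row["country"])}
        for row in rows
    ]


def grapher_step(rows: list[dict], database: dict) -> None:
    """Grapher: stand-in for the database import; writes into a plain dict."""
    for row in rows:
        database[(row["country"], row["year"])] = row["value"]


def run_pipeline() -> dict:
    """Run the four steps in order and return the populated stand-in database."""
    database: dict = {}
    rows = garden_step(meadow_step(snapshot_step(UPSTREAM_CSV)))
    grapher_step(rows, database)
    return database
```

Calling `run_pipeline()` returns a dict keyed by `(country, year)` with harmonised country names. Each function consumes only the output of the previous one, which is the property the staged workflow relies on.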
Note that there are other steps, which are used less frequently but are useful in particular cases.