Our World In Data - Data APIs¶
Our World in Data offers a curated collection of charts on our website, with data and metadata accessible via our Public Chart API. The API provides data in CSV and JSON formats over HTTP, enabling seamless integration with any programming language. It is specifically designed to support the creation of interactive charts.
We also maintain a larger data catalog within our ETL system, where we fetch, process, and prepare data used for our charts. This catalog contains significantly more data, though its level of curation varies across different sections. It also has an API, albeit one that is currently less accessible; at this time, we only offer a Python client for interacting with it. Unlike the Public Chart API, which exclusively provides time series data by time (typically year) and entity (typically country), our ETL catalog includes larger datasets with additional dimensions, such as age group and gender breakdowns.
This documentation briefly describes both of these APIs.
Chart data API¶
Our chart API is structured around charts on our website, i.e. at https://ourworldindata.org/grapher/* . You can find charts by searching our data catalog at https://ourworldindata.org/data.
Once you've found the chart with the data you need, simply append ".csv" to the URL to download the data or ".metadata.json" to retrieve the metadata. You can also add ".zip" to download a ZIP file that includes both files, along with a README in markdown format describing the data.
An example for our life expectancy chart:
- https://ourworldindata.org/grapher/life-expectancy - the page on our website where you can see the chart
- https://ourworldindata.org/grapher/life-expectancy.csv - the data for this chart (see below for options)
- https://ourworldindata.org/grapher/life-expectancy.metadata.json - the metadata for this chart, like the chart title, the units, how to cite the data sources
- https://ourworldindata.org/grapher/life-expectancy.zip - the two files above plus a README, as a ZIP archive
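As a minimal sketch of the URL pattern above, the following Python builds the four URLs for a chart slug and downloads the CSV using only the standard library. The helper names chart_urls and fetch_csv_rows are hypothetical and not part of any OWID library, and the User-Agent value is an assumption. The network call is wrapped in a function, so nothing is downloaded until you call it.

```python
import csv
import io
import urllib.request

GRAPHER_BASE = "https://ourworldindata.org/grapher"


def chart_urls(slug: str) -> dict:
    """Return the page, CSV, metadata and ZIP URLs for a chart slug."""
    page = f"{GRAPHER_BASE}/{slug}"
    return {
        "page": page,
        "csv": f"{page}.csv",
        "metadata": f"{page}.metadata.json",
        "zip": f"{page}.zip",
    }


def fetch_csv_rows(slug: str) -> list:
    """Download a chart's CSV and parse it into a list of row dicts."""
    # Assumption: we send an explicit User-Agent in case the server
    # rejects requests that lack one; the value itself is arbitrary.
    request = urllib.request.Request(
        chart_urls(slug)["csv"],
        headers={"User-Agent": "owid-docs-example/0.1"},
    )
    with urllib.request.urlopen(request) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))
```

Calling fetch_csv_rows("life-expectancy") should then return one dict per CSV row, keyed by the column names in the header.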
Options¶
The following options can be specified for all of these endpoints:
csvType
- full (default): Get the full data, i.e. all time points and all entities
- filtered: Get only the data needed to display the visible chart. Different chart types return different subsets of the full data. For a map this will download data for only a single year but all countries, for a line chart it will be the selected time range and visible entities, and so on for other chart types.
Note that if you use filtered, the other query parameters in the URL will change what is downloaded. E.g. if you navigate to our life-expectancy chart, select the country "Italy" and change the time range to 1980-2000, you will see that the URL in the browser is modified to include ?time=1980..2000&country=~ITA. When you make a request to any of the endpoints above, you can include any of these modifications to get exactly that data:
https://ourworldindata.org/grapher/life-expectancy.csv?csvType=filtered&time=1980..2000&country=~ITA
useColumnShortNames
- false (default): Column names are long, use capitalization and whitespace - e.g. Period life expectancy at birth - Sex: all - Age: 0
- true: Column names are short and don't use whitespace - e.g. life_expectancy_0__sex_all__age_0
https://ourworldindata.org/grapher/life-expectancy.csv?useColumnShortNames=true
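The options above are ordinary query parameters, so they can be assembled programmatically. The sketch below is one way to do that with the standard library; the function chart_csv_url and its argument names are hypothetical, and only the query parameter names (csvType, useColumnShortNames, time, country) come from this documentation.

```python
from urllib.parse import urlencode

GRAPHER_BASE = "https://ourworldindata.org/grapher"


def chart_csv_url(slug, csv_type="full", short_names=False, time=None, country=None):
    """Build a chart CSV URL, adding only the options that differ from the defaults."""
    params = {}
    if csv_type != "full":
        params["csvType"] = csv_type
    if short_names:
        params["useColumnShortNames"] = "true"
    # time and country mirror what the browser URL shows after
    # selecting a time range and entities on the chart page.
    if time is not None:
        params["time"] = time
    if country is not None:
        params["country"] = country
    url = f"{GRAPHER_BASE}/{slug}.csv"
    return f"{url}?{urlencode(params)}" if params else url


print(chart_csv_url("life-expectancy", csv_type="filtered", time="1980..2000", country="~ITA"))
```

With the arguments shown, this reproduces the filtered example URL given above. Leaving out all options returns the plain .csv URL.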
Example notebooks¶
Check out this list of public example notebooks that demonstrate the use of our chart API:
- Example python notebook on Google Colab using Pandas
- ObservableHQ notebook using Javascript to recreate the life expectancy chart
CSV structure¶
Each row in the CSV file corresponds to an observation for an entity (most often a country or region) at a specific time point (generally a year). For example, the first lines of the CSV for our life expectancy chart (the header followed by two rows of data) appear as follows:
Entity,Code,Year,Period life expectancy at birth - Sex: all - Age: 0
Afghanistan,AFG,1950,27.7275
Afghanistan,AFG,1951,27.9634
The first two columns in the CSV file are "Entity" and "Code." "Entity" is the name of the entity, typically a country, such as "United States." "Code" is the OWID internal entity code used for countries or regions. For standard countries, this matches the ISO alpha-3 code (e.g., "USA"); for non-standard or historical countries, we use custom codes. Country and region names are standardized across all Our World in Data datasets, allowing you to join multiple datasets using either of these columns.
The third column is either "Year" or "Day". If the data is annual, this is "Year" and contains only the year as an integer. If the column is "Day", the column contains a date string in the form "YYYY-MM-DD".
The final columns are the data columns, which are the time series that power the chart. For simple line charts there is only a single data column, whereas more complex charts can have more columns.
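To make the column layout concrete, here is a small sketch that parses the sample rows shown above with the standard library. It treats the third column as an integer year or an ISO date depending on its name, and converts the remaining columns to numbers. The function name parse_chart_csv is hypothetical, and the assumption that every data column is numeric will not hold for every chart.

```python
import csv
import io
from datetime import date

SAMPLE_CSV = """Entity,Code,Year,Period life expectancy at birth - Sex: all - Age: 0
Afghanistan,AFG,1950,27.7275
Afghanistan,AFG,1951,27.9634
"""


def parse_chart_csv(text):
    """Parse chart CSV text into records with a typed time column and numeric values."""
    reader = csv.DictReader(io.StringIO(text))
    time_column = reader.fieldnames[2]    # third column: "Year" or "Day"
    data_columns = reader.fieldnames[3:]  # everything after it is data
    records = []
    for row in reader:
        if time_column == "Day":
            time_value = date.fromisoformat(row[time_column])  # "YYYY-MM-DD"
        else:
            time_value = int(row[time_column])
        record = {"Entity": row["Entity"], "Code": row["Code"], time_column: time_value}
        for column in data_columns:
            # Assumption: data columns are numeric; empty cells become None.
            record[column] = float(row[column]) if row[column] != "" else None
        records.append(record)
    return records


records = parse_chart_csv(SAMPLE_CSV)
print(records[0])
```

Because entity names and codes are standardized, records parsed this way from two different charts can be matched on the ("Code", "Year") pair.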
Metadata structure¶
The .metadata.json file contains metadata about the data package. The "chart" key contains information to recreate the chart, like the title, subtitle etc. The "columns" key contains information about each of the columns in the CSV, like the unit, timespan covered, citation for the data etc. Here is a (slightly shortened) example of the metadata for the life-expectancy chart:
{
  "chart": {
    "title": "Life expectancy",
    "subtitle": "The [period life expectancy](#dod:period-life-expectancy) at birth, in a given year.",
    "citation": "UN WPP (2022); HMD (2023); Zijdeman et al. (2015); Riley (2005)",
    "originalChartUrl": "https://ourworldindata.org/grapher/life-expectancy",
    "selection": ["World", "Americas", "Europe", "Africa", "Asia", "Oceania"]
  },
  "columns": {
    "Period life expectancy at birth - Sex: all - Age: 0": {
      "titleShort": "Life expectancy at birth",
      "titleLong": "Life expectancy at birth - Various sources – period tables",
      "descriptionShort": "The period life expectancy at birth, in a given year.",
      "descriptionKey": [
        "Period life expectancy is a metric that summarizes death rates across all age groups in one particular year.",
        "..."
      ],
      "shortUnit": "years",
      "unit": "years",
      "timespan": "1543-2021",
      "type": "Numeric",
      "owidVariableId": 815383,
      "shortName": "life_expectancy_0__sex_all__age_0",
      "lastUpdated": "2023-10-10",
      "nextUpdate": "2024-11-30",
      "citationShort": "UN WPP (2022); HMD (2023); Zijdeman et al. (2015); Riley (2005) – with minor processing by Our World in Data",
      "citationLong": "UN WPP (2022); HMD (2023); Zijdeman et al. (2015); Riley (2005) – ...",
      "fullMetadata": "https://api.ourworldindata.org/v1/indicators/815383.metadata.json"
    }
  },
  "dateDownloaded": "2024-10-30"
}
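As an illustration of how this structure might be consumed, the sketch below walks the "columns" key of a trimmed copy of the example above and collects the short name, unit and timespan of each data column. The function name summarize_columns is hypothetical. It uses .get() on the assumption that not every field is present for every column.

```python
import json

# Trimmed copy of the example metadata shown above.
SAMPLE_METADATA = json.loads("""
{
  "chart": {
    "title": "Life expectancy",
    "citation": "UN WPP (2022); HMD (2023); Zijdeman et al. (2015); Riley (2005)"
  },
  "columns": {
    "Period life expectancy at birth - Sex: all - Age: 0": {
      "unit": "years",
      "timespan": "1543-2021",
      "shortName": "life_expectancy_0__sex_all__age_0"
    }
  }
}
""")


def summarize_columns(metadata):
    """Return one summary dict per data column described in the metadata."""
    summaries = []
    for name, info in metadata["columns"].items():
        summaries.append(
            {
                "column": name,  # the long column name used in the CSV header
                "short_name": info.get("shortName"),
                "unit": info.get("unit"),
                "timespan": info.get("timespan"),
            }
        )
    return summaries


for summary in summarize_columns(SAMPLE_METADATA):
    print(summary)
```

The "column" and "short_name" values correspond to the two header styles selected by the useColumnShortNames option.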
ETL catalog API¶
The ETL catalog API makes it possible to access the dataframes our data scientists use to prepare the data for our public charts.
When using this API, you have access to the public catalog of data processed by our data team. The catalog indexes tables of data, rather than datasets or individual indicators. To learn more, read about our data model.
At the moment, this API only supports Python.
Our ETL API is in beta
We currently only provide a Python API for our ETL catalog. Our hope is to extend this to other languages in the future. Please report any issues that you find.
(see example notebook)
owid-catalog¶
A Pythonic API for working with OWID's data catalog.
Status: experimental, APIs likely to change
Overview¶
Our World in Data is building a new data catalog, with the goal of our datasets being reproducible and transparent to the general public. That project is our etl, which going forward will contain the recipes for all the datasets we republish.
This library allows you to query our data catalog programmatically, and get back data in the form of Pandas data frames, perfect for data pipelines or Jupyter notebook explorations.
graph TB
etl -->|reads| walden[upstream datasets]
etl -->|generates| s3[data catalog]
catalog[owid-catalog-py] -->|queries| s3
We would love feedback on how we can make this library and the overall data catalog better. Feel free to send us an email at info@ourworldindata.org, or start a discussion on GitHub.
Quickstart¶
Install with pip install owid-catalog. Then you can get data in two different ways.
Charts catalog¶
This API attempts to give you exactly the data you see in a chart on our site.
from owid.catalog import charts
# get the data for one chart by URL
df = charts.get_data('https://ourworldindata.org/grapher/life-expectancy')
Notice that the last part of the URL is the chart's slug, its identifier, in this case life-expectancy. Using the slug alone also works.
df = charts.get_data('life-expectancy')
Data science API¶
We also curate much more data than is available on our site. To access that in efficient binary (Feather) format, use our data science API.
This API is designed for use in Jupyter notebooks.
from owid import catalog
# look for Covid-19 data, return a data frame of matches
catalog.find('covid')
# load Covid-19 data from the Our World in Data namespace as a data frame
df = catalog.find('covid', namespace='owid').load()
There may be multiple versions of the same dataset in a catalog; each will have a unique path. To easily load the same dataset again, you should record its path and load it this way:
from owid import catalog
path = 'garden/ihme_gbd/2023-05-15/gbd_mental_health_prevalence_rate/gbd_mental_health_prevalence_rate'
rc = catalog.RemoteCatalog()
df = rc[path]
Development¶
You need Python 3.9+, uv and make installed. Clone the repo, then you can simply run:
# run all unit tests and CI checks
make test
# watch for changes, then run all checks
make watch
Changelog¶
v0.3.11
- Add support for Python 3.12 in pypackage.toml
v0.3.10
- Add experimental chart data API in owid.catalog.charts
v0.3.9
- Switch from isort & black & flake8 to ruff
v0.3.8
- Pin dataclasses-json==0.5.8 to fix error with python3.9
v0.3.7
- Fix bugs.
- Improve metadata propagation.
- Improve metadata YAML file handling, to have common definitions.
- Remove DatasetMeta.origins.
v0.3.6
- Fixed tons of bugs
- processing.py module with pandas-like functions that propagate metadata
- Support for Dynamic YAML files
- Support for R2 alongside S3
v0.3.5
- Remove catalog.frames; use owid-repack package instead
- Relax dependency constraints
- Add optional channel argument to DatasetMeta
- Stop supporting metadata in Parquet format, load JSON sidecar instead
- Fix errors when creating new Table columns
v0.3.4
- Bump pyarrow dependency to enable Python 3.11 support
v0.3.3
- Add more arguments to Table.__init__ that are often used in ETL
- Add Dataset.update_metadata function for updating metadata from YAML file
- Python 3.11 support via update of pyarrow dependency
v0.3.2
- Fix a bug in Catalog.__getitem__()
- Replace mypy type checker by pyright
v0.3.1
- Sort imports with isort
- Change black line length to 120
- Add grapher channel
- Support path-based indexing into catalogs
v0.3.0
- Update OWID_CATALOG_VERSION to 3
- Support multiple formats per table
- Support reading and writing parquet files with embedded metadata
- Optional repack argument when adding tables to dataset
- Underscore |
- Get version field from DatasetMeta init
- Resolve collisions of underscore_table function
- Convert version to str and load json dimensions
v0.2.9
- Allow multiple channels in catalog.find function
v0.2.8
- Update OWID_CATALOG_VERSION to 2
v0.2.7
- Split datasets into channels (garden, meadow, open_numbers, ...) and make garden default one
- Add .find_latest method to Catalog
- Add flag is_public for public/private datasets
- Enforce snake_case for table, dataset and variable short names
- Add fields published_by and published_at to Source
- Added a list of supported and unsupported operations on columns
- Updated pyarrow
v0.2.5
- Fix ability to load remote CSV tables
v0.2.4
- Update the default catalog URL to use a CDN
v0.2.3
- Fix methods for finding and loading data from a LocalCatalog
v0.2.2
- Repack frames to compact dtypes on Table.to_feather()
v0.2.1
- Fix key typo used in version check
v0.2.0
- Copy dataset metadata into tables, to make tables more traceable
- Add API versioning, and a requirement to update if your version of this library is too old
v0.1.1
- Add support for Python 3.8
v0.1.0
- Initial release, including searching and fetching data from a remote catalog