Python API (beta)¶
Installing¶
The owid-catalog package provides API access to the data. Install it with:
$ pip install owid-catalog
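To confirm the install worked, you can ask Python which version of the distribution it sees. This is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not part of owid-catalog.

```python
from importlib import metadata


def installed_version(dist_name="owid-catalog"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "owid-catalog is not installed")
```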
Searching for data¶
First, import the owid.catalog module:
from owid import catalog
You can search for an appropriate table of data using the find(query) method:
catalog.find('covid')
| | table | dataset | version | namespace | channel | is_public | dimensions | path | formats |
|---|---|---|---|---|---|---|---|---|---|
| 676 | covid | covid19 | 2023-01-10 | owid | garden | True | [iso_code, date] | garden/owid/latest/covid/covid | [feather, parquet] |
catalog.find('energy').head()
| | table | dataset | version | namespace | channel | is_public | dimensions | path | formats |
|---|---|---|---|---|---|---|---|---|---|
| 690 | energy_consumption | energy_consumption | 2022-07-27 | eia | garden | True | [country, year] | garden/eia/2022-07-27/energy_consumption/energ... | [feather, parquet] |
| 691 | energy_mix | energy_mix | 2022-07-14 | bp | garden | True | [country, year] | garden/bp/2022-07-14/energy_mix/energy_mix | [feather, parquet] |
| 692 | energy_mix | energy_mix | 2022-12-28 | bp | garden | True | [country, year] | garden/bp/2022-12-28/energy_mix/energy_mix | [feather, parquet] |
| 762 | global_primary_energy | global_primary_energy | 2017-01-01 | smil | garden | True | [country, year] | garden/smil/2017-01-01/global_primary_energy/g... | [feather, parquet] |
| 763 | global_primary_energy | global_primary_energy | 2022-09-09 | energy | garden | True | [country, year] | garden/energy/2022-09-09/global_primary_energy... | [feather, parquet] |
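Judging from the untruncated rows above, the path column appears to follow a channel/namespace/version/dataset/table layout. That layout is inferred from the listing rather than documented here, so treat the following as a sketch under that assumption; `TablePath` and `parse_path` are hypothetical names.

```python
from typing import NamedTuple


class TablePath(NamedTuple):
    channel: str
    namespace: str
    version: str
    dataset: str
    table: str


def parse_path(path: str) -> TablePath:
    """Split a catalog path into its five assumed components."""
    parts = path.split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 path components, got {len(parts)}: {path!r}")
    return TablePath(*parts)


p = parse_path("garden/bp/2022-07-14/energy_mix/energy_mix")
print(p.channel, p.namespace, p.version)
```

Note that the version segment is not always a date: the covid row above uses latest in its path.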
Loading data¶
The find() method returns a data frame of results. Calling .load() on any row in the frame fetches the data into memory:
results = catalog.find('energy')
df = results.iloc[2].load()
df.head()
| country | year | country_code | hydro__twh__direct | nuclear__twh__direct | solar__twh__direct | wind__twh__direct | other_renewables__twh__direct | coal__twh | oil__twh | gas__twh | biofuels__twh | ... | nuclear_per_capita__kwh__direct | nuclear_per_capita__kwh__equivalent | other_renewables_per_capita__kwh__direct | other_renewables_per_capita__kwh__equivalent | renewables_per_capita__kwh__direct | renewables_per_capita__kwh__equivalent | solar_per_capita__kwh__direct | solar_per_capita__kwh__equivalent | wind_per_capita__kwh__direct | wind_per_capita__kwh__equivalent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Africa | 1965 | OWID_AFR | 13.905635 | 0.0 | NaN | 0.0 | NaN | 323.496155 | 341.262787 | 9.543755 | NaN | ... | 0.0 | 0.0 | NaN | NaN | 43.259445 | 127.917709 | NaN | NaN | 0.0 | 0.0 |
| | 1966 | OWID_AFR | 15.510005 | 0.0 | NaN | 0.0 | NaN | 323.122192 | 369.486572 | 10.669916 | NaN | ... | 0.0 | 0.0 | NaN | NaN | 47.048717 | 139.122559 | NaN | NaN | 0.0 | 0.0 |
| | 1967 | OWID_AFR | 16.190636 | 0.0 | NaN | 0.0 | NaN | 330.291595 | 368.125244 | 10.545670 | NaN | ... | 0.0 | 0.0 | NaN | NaN | 47.878628 | 141.576599 | NaN | NaN | 0.0 | 0.0 |
| | 1968 | OWID_AFR | 18.938341 | 0.0 | NaN | 0.0 | NaN | 343.512878 | 389.199829 | 10.688969 | NaN | ... | 0.0 | 0.0 | NaN | NaN | 54.580433 | 161.393753 | NaN | NaN | 0.0 | 0.0 |
| | 1969 | OWID_AFR | 22.100891 | 0.0 | NaN | 0.0 | NaN | 346.642883 | 396.922852 | 12.491999 | NaN | ... | 0.0 | 0.0 | NaN | NaN | 62.068840 | 183.536865 | NaN | NaN | 0.0 | 0.0 |
5 rows × 96 columns
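Picking a row by position, as results.iloc[2] does, depends on the order in which the search results come back. The sketch below shows one way to choose the newest version of a table by name instead. It works on plain dictionaries copied from the energy search listing rather than on a live results frame, and `latest_row` is a hypothetical helper, not part of owid-catalog.

```python
# Rows copied from the energy search listing (index label, table, version, namespace).
rows = [
    {"index": 690, "table": "energy_consumption", "version": "2022-07-27", "namespace": "eia"},
    {"index": 691, "table": "energy_mix", "version": "2022-07-14", "namespace": "bp"},
    {"index": 692, "table": "energy_mix", "version": "2022-12-28", "namespace": "bp"},
]


def latest_row(rows, table):
    """Return the row for `table` with the most recent version string."""
    candidates = [r for r in rows if r["table"] == table]
    if not candidates:
        raise LookupError(f"no rows for table {table!r}")
    # ISO dates (YYYY-MM-DD) sort chronologically when compared as plain strings.
    return max(candidates, key=lambda r: r["version"])


print(latest_row(rows, "energy_mix")["index"])  # 692
```

Going by the listing, index 692 is the third row of the results, the same row that results.iloc[2] selects.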
In cases like the COVID-19 query above, where there is only one match, you can call .load() on the set of results directly:
catalog.find('covid').load().head()
| iso_code | date | continent | location | total_cases | new_cases | new_cases_smoothed | total_deaths | new_deaths | new_deaths_smoothed | total_cases_per_million | new_cases_per_million | ... | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy | human_development_index | population | excess_mortality_cumulative_absolute | excess_mortality_cumulative | excess_mortality | excess_mortality_cumulative_per_million |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AFG | 2020-02-24 | Asia | Afghanistan | 5 | 5 | NaN | <NA> | <NA> | NaN | 0.122 | 0.122 | ... | NaN | 37.745998 | 0.5 | 64.830002 | 0.511 | 41128772 | NaN | NaN | NaN | NaN |
| | 2020-02-25 | Asia | Afghanistan | 5 | 0 | NaN | <NA> | <NA> | NaN | 0.122 | 0.000 | ... | NaN | 37.745998 | 0.5 | 64.830002 | 0.511 | 41128772 | NaN | NaN | NaN | NaN |
| | 2020-02-26 | Asia | Afghanistan | 5 | 0 | NaN | <NA> | <NA> | NaN | 0.122 | 0.000 | ... | NaN | 37.745998 | 0.5 | 64.830002 | 0.511 | 41128772 | NaN | NaN | NaN | NaN |
| | 2020-02-27 | Asia | Afghanistan | 5 | 0 | NaN | <NA> | <NA> | NaN | 0.122 | 0.000 | ... | NaN | 37.745998 | 0.5 | 64.830002 | 0.511 | 41128772 | NaN | NaN | NaN | NaN |
| | 2020-02-28 | Asia | Afghanistan | 5 | 0 | NaN | <NA> | <NA> | NaN | 0.122 | 0.000 | ... | NaN | 37.745998 | 0.5 | 64.830002 | 0.511 | 41128772 | NaN | NaN | NaN | NaN |
5 rows × 65 columns
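Because the direct .load() shortcut is described only for the single-match case, a script may want to check the number of matches first. The following is a minimal sketch: `load_single` is a hypothetical helper, and it assumes only what the examples above show, namely that the results object supports len(), .iloc[...] and a row-level .load().

```python
def load_single(results):
    """Load the only match in `results`, or raise with a descriptive message."""
    n = len(results)
    if n != 1:
        raise ValueError(
            f"expected exactly one match, found {n}; narrow the query or pick a row"
        )
    return results.iloc[0].load()


# Intended usage (requires owid-catalog and network access):
# df = load_single(catalog.find('covid'))
```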
Metadata¶
Data fetched from the API supports metadata at every level. Let's take a look by loading an energy mix dataset:
df = catalog.find('energy_mix').iloc[0].load()
Table metadata¶
Every data frame loaded is accompanied by a .metadata object that describes it in more detail.
table_meta = df.metadata
table_meta.title
'Energy mix from BP'
Dataset metadata¶
Tables keep a reference to the dataset that they're part of, which also has information such as the sources of the data.
dataset_meta = df.metadata.dataset
dataset_meta.sources[0].name
'Our World in Data based on BP Statistical Review of World Energy (2022)'
Variable metadata¶
Each column in the data frame also has a .metadata object which describes that specific variable in additional detail.
df.coal__pct_growth.metadata.title
'Coal (% growth)'
df.coal__pct_growth.metadata.unit
'%'
To learn more about what metadata fields are available, see the owid.catalog.meta module.
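When inspecting several variables, it can help to gather a few metadata attributes into one dictionary. The sketch below does this with getattr so that missing fields are skipped. `summarize_metadata` is a hypothetical helper, and the SimpleNamespace object is only a stand-in for a real .metadata object, filled with the title and unit values shown above.

```python
from types import SimpleNamespace


def summarize_metadata(meta, fields=("title", "unit")):
    """Collect the requested attributes from a metadata object, skipping absent or None ones."""
    summary = {}
    for name in fields:
        value = getattr(meta, name, None)
        if value is not None:
            summary[name] = value
    return summary


# Stand-in for df.coal__pct_growth.metadata, using the values shown above.
variable_meta = SimpleNamespace(title="Coal (% growth)", unit="%")
print(summarize_metadata(variable_meta))  # {'title': 'Coal (% growth)', 'unit': '%'}
```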
Channels¶
Our catalog publishes multiple channels of data, including:
- garden: curated data that has been harmonized and gone through Our World in Data's editorial process
- meadow: data from an upstream provider in unmodified form
- backport: data from our legacy data catalog
- open_numbers: a mirror of Gapminder's Open Numbers datasets
By default, find() will only search in garden, but you can also specify other channels to search in. This can be useful since our modern data catalog does not contain all the data used on our site.
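The text does not show how channels are passed to find(). The sketch below assumes a `channels` keyword argument that accepts a list of channel names; verify this against the version of owid-catalog you have installed. `normalize_channels` and `find_in_channels` are hypothetical helper names.

```python
def normalize_channels(channels):
    """Accept a single channel name or a sequence of names; return a list."""
    if isinstance(channels, str):
        return [channels]
    return list(channels)


def find_in_channels(query, channels=("garden",)):
    """Search the catalog in the given channels (a sketch under the assumption above)."""
    # Imported lazily so this block can be defined without the package installed.
    from owid import catalog

    # ASSUMPTION: find() takes a `channels` keyword; check your installed version.
    return catalog.find(query, channels=normalize_channels(channels))


# Intended usage (requires owid-catalog and network access):
# results = find_in_channels('energy', channels=['garden', 'meadow'])
```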
Issues and feedback¶
We want this library to be as useful as possible. If you have any feedback on the library itself or the data behind it, please reach out:
- General feedback: https://github.com/owid/etl/discussions
- Issues: https://github.com/owid/owid-catalog-py/issues