Consuming datasets
Now that our data/
folder has a table built, we can try reading it. We recommend using our library owid-catalog
.
You can load the complete dataset using the owid.catalog.Dataset
object:
>>> from owid.catalog import Dataset
>>> ds = Dataset('data/garden/un/2022-11-29/undp_hdr')
We can access the metadata of the dataset by using Dataset.metadata
:
>>> ds.metadata
Dataset(path='data/garden/un/2022-11-29/undp_hdr/', metadata=DatasetMeta(namespace='un', short_name='undp_hdr', title='Human Development Report - UNDP (2021-22)', description="The 2021/2022 Human Development Report is the latest in the series of global Human Development Reports published by the United Nations Development Programme (UNDP) since 1990 as independent and analytically and empirically grounded discussions of major development issues, trends and policies.\n\nAdditional resources related to the 2021/2022 Human Development Report can be found online at http://hdr.undp.org. Resources on the website include digital versions and translations of the Report and the overview in more than 10 languages, an interactive web version of the Report, a set of background papers and think pieces commissioned for the Report, interactive data visualizations and databases of human development indicators, full explanations of the sources and methodologies used in the Report's composite indices, country insights and other background materials, and previous global, regional and national Human Development Reports. Corrections and addenda are also available online.\n\nTechnical notes may be found at https://hdr.undp.org/sites/default/files/2021-22_HDR/hdr2021-22_technical_notes.pdf.\n", sources=[Source(name='UNDP, Human Development Report (2021-22)', description='The 2021/2022 Human Development Report is the latest in the series of global Human Development Reports published by the United Nations Development Programme (UNDP) since 1990 as independent and analytically and empirically grounded discussions of major development issues, trends and policies.\n\nAdditional resources related to the 2021/2022 Human Development Report can be found online at http://hdr.undp.org. Resources on the website include digital versions and translations of the Report and the overview in more than 10 languages, an interactive web version of the Report, a set of background papers and think pieces commissioned for the Report, interactive data visualizations and databases of human development indicators, full explanations of the sources and methodologies used in the Report’s composite indices, country insights and other background materials, and previous global, regional and national Human Development Reports. Corrections and addenda are also available online.\n\nTechnical notes (region definitions, reports, etc.) can be found at https://hdr.undp.org/sites/default/files/2021-22_HDR/hdr2021-22_technical_notes.pdf.\n', url='https://hdr.undp.org/', source_data_url='https://hdr.undp.org/sites/default/files/2021-22_HDR/HDR21-22_Composite_indices_complete_time_series.csv', owid_data_url=None, date_accessed='2022-11-29', publication_date='2022-09-08', publication_year=2022, published_by=None, publisher_source=None)], licenses=[License(name='CC BY 3.0 IGO', url='https://hdr.undp.org/copyright-and-terms-use')], is_public=True, additional_info=None, version='2022-11-29', source_checksum='b806f3297dfa67e996487b1c3602c94f'))
To load the data as a table run:
>>> tb = ds["undp_hdr"]
>>> tb.head()
abr co2_prod coef_ineq diff_hdi_phdi ... pr_m rankdiff_hdi_phdi se_f se_m
country year ...
Afghanistan 1990 142.960007 0.209727 NaN 1.098901 ... NaN <NA> 0.700485 5.419458
1991 147.524994 0.182525 NaN 1.075269 ... NaN <NA> 0.772361 5.583395
1992 147.520996 0.095233 NaN 1.045296 ... NaN <NA> 0.844236 5.747332
1993 147.895996 0.084285 NaN 1.010101 ... NaN <NA> 0.916112 5.911269
1994 155.669006 0.075054 NaN 1.027397 ... NaN <NA> 0.987988 6.075205
[5 rows x 39 columns]
We can see that this dataset provides several indicators, reported by country and year, which are the primary key for this table.
The object tb
is an instance of owid.catalog.Table
, which is a wrapper around the well stablished pandas.DataFrame
class. Using these custom
objects (from our library owid-catalog
) allow us to enrich the data with metadata.