Processing log

In progress.

The processing log is an experimental feature that is not yet fully tested.

When enabled, this log records every operation applied to an indicator in that indicator's metadata, so you can track the processing history of each indicator in a dataset. It is particularly useful for visualizing the data pipeline and for debugging.
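To make the idea concrete, here is a toy model in plain Python. It is not the owid.catalog implementation, and we assume nothing about its internals: LogEntry, Indicator, load and add are hypothetical names. Each indicator carries a list of entries, and every operation inherits its inputs' entries and appends one describing itself, so the full history travels with the result.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogEntry:
    """One recorded operation (hypothetical structure, for illustration only)."""
    operation: str
    target: str
    parents: tuple[str, ...]


@dataclass
class Indicator:
    """Toy stand-in for an indicator: a name, its values and a log in its metadata."""
    name: str
    values: list[float]
    log: list[LogEntry] = field(default_factory=list)


def load(name: str, values: list[float]) -> Indicator:
    # Loading starts a fresh history with a single "load" entry.
    return Indicator(name, values, [LogEntry("load", name, ())])


def add(a: Indicator, b: Indicator, name: str) -> Indicator:
    values = [x + y for x, y in zip(a.values, b.values)]
    # Inherit both parents' histories (without duplicating shared ancestry),
    # then record this operation on top.
    history = a.log + [e for e in b.log if e not in a.log]
    return Indicator(name, values, history + [LogEntry("+", name, (a.name, b.name))])


pop = load("population", [10.0, 20.0])
mig = load("migrants", [1.0, 2.0])
total = add(pop, mig, "total")
for entry in total.log:
    print(entry.operation, entry.parents, "->", entry.target)
# load () -> population
# load () -> migrants
# + ('population', 'migrants') -> total
```

Because every entry names its parents and its target, the entries form a directed graph, which is what makes it possible to draw the history as a pipeline diagram.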

To enable the processing log, set the environment variable PROCESSING_LOG to 1. For example:

```shell
PROCESSING_LOG=1 etl meadow/dummy/2020-01-01/dummy --force
```
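If you prefer to launch the step from a Python script, you can set the variable for that single child process instead of exporting it in your shell. This is a sketch using only the standard library; processing_log_env and run_step_with_log are hypothetical helpers, and the etl command and its arguments are taken from the example above.

```python
import os
import subprocess
from typing import Mapping, Optional


def processing_log_env(base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with PROCESSING_LOG=1; the input is not modified."""
    env = dict(os.environ if base is None else base)
    env["PROCESSING_LOG"] = "1"
    return env


def run_step_with_log(step: str) -> subprocess.CompletedProcess:
    """Hypothetical helper: same as `PROCESSING_LOG=1 etl <step> --force` in a shell."""
    # Only the child process sees the variable; this Python session keeps its own environment.
    return subprocess.run(["etl", step, "--force"], env=processing_log_env(), check=True)
```

For example, run_step_with_log("meadow/dummy/2020-01-01/dummy") runs the same command as the shell example, assuming etl is on your PATH. Scoping the variable to the child process also keeps it out of the session you later use to display the log.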

To visualize the processing log in a browser, run the following code, for example from a notebook:

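```python
from etl.paths import DATA_DIR
from owid.catalog import Dataset

ds = Dataset(DATA_DIR / "meadow/dummy/2020-01-01/dummy")
tab = ds["dummy"]
tab["dummy_variable"].metadata.processing_log.display(DATA_DIR)  # log of one indicator
```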

Make sure the PROCESSING_LOG environment variable is unset or set to "0" when you display the log; otherwise, the diagram will show only a single "load" operation. For this reason, avoid setting the variable in the .env file, since it would then also be set when you display the log.
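In a notebook, one way to guarantee this is to clear the variable from the process environment in the first cell, before importing owid.catalog. We do not assume when the library reads the variable, so clearing it before the import is the safe order. disable_processing_log is a hypothetical helper.

```python
import os


def disable_processing_log() -> None:
    """Remove PROCESSING_LOG from this process if it is set to anything other than "0"."""
    if os.environ.get("PROCESSING_LOG", "0") != "0":
        # Removing the variable is equivalent to leaving it unset.
        os.environ.pop("PROCESSING_LOG")


disable_processing_log()
# ...then import owid.catalog, load the dataset and call .display(DATA_DIR) as above.
```

This only affects the current process; anything that later re-reads a .env file setting the variable would turn it back on, which is another reason to keep it out of .env.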

Custom processing log entry

Some functions are so complex that their individual operations clutter the diagram. You can wrap such a function with the @pl.wrap decorator to collapse all of its operations into a single log entry. For example:

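```python
from owid.catalog import Table
from owid.catalog import processing_log as pl


@pl.wrap("complex_processing")
def func(tab: Table) -> Table:
    # All operations performed here are logged as one "complex_processing" entry.
    ...
    return tab
```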
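The effect of such a decorator can be illustrated with a toy log in plain Python. This is not the owid.catalog implementation: LOG, record, wrap, scale and complex_step are hypothetical names. While the wrapped function runs, entries from its inner operations are suppressed, and a single entry with the given name is recorded when it returns.

```python
import functools
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)

LOG: list[str] = []  # toy processing log: one operation name per entry
_depth = 0           # greater than 0 while a wrapped function is running


def record(operation: str) -> None:
    # Operations are only logged when no wrapped function is running.
    if _depth == 0:
        LOG.append(operation)


def wrap(name: str) -> Callable[[F], F]:
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def inner(*args, **kwargs):
            global _depth
            _depth += 1
            try:
                result = func(*args, **kwargs)
            finally:
                _depth -= 1
            record(name)  # one entry for the whole function
            return result
        return inner  # type: ignore[return-value]
    return decorator


def scale(x: float) -> float:
    record("multiply")
    return x * 2


@wrap("complex_processing")
def complex_step(x: float) -> float:
    record("add")
    return scale(x + 1)


complex_step(3.0)
print(LOG)  # ['complex_processing']
```

Using a depth counter rather than a boolean flag means that wrapped functions called inside other wrapped functions are also folded into the outermost entry.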