Grapher schema sync
The grapher chart-config schema is owned by the web team in owid-grapher and published at https://files.ourworldindata.org/schemas/grapher-schema.NNN.json. It is mutated in place without version bumps (e.g. dumbbell plots landed in grapher-schema.010.json directly), so ETL needs a way to follow upstream changes.
ETL keeps a vendored pin of the upstream schema — a committed snapshot, same model as a lockfile — and derives or checks everything else against it:
owid-grapher (web team)
│ publishes / mutates grapher-schema.010.json
▼
schemas/grapher-schema.010.json ← vendored pin (the only upstream copy in ETL)
├─→ etl/collection/model/schema_types.py generated from it ← unit test (offline)
├─→ multidim/explorer-schema $refs resolved against it ← at runtime (offline)
└─→ dataset-schema.json embedded block checked against it ← unit test (offline, enum-deep)
vendored ↔ live upstream ← scheduled workflow + integration tests
Everything inside the repo is machine-checked against the pin on every PR; only the pin itself can lag upstream, and that is what the scheduled workflow watches.
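The one comparison that cannot be settled inside the repo, vendored pin versus live upstream, could look roughly like the sketch below. It is an illustration only, not the workflow's actual code: the names `canonical_digest` and `pin_is_stale` are hypothetical, and it assumes that key order and whitespace should not count as a change.

```python
import hashlib
import json


def canonical_digest(schema: dict) -> str:
    """Hash a schema after serializing it with sorted keys and fixed separators."""
    payload = json.dumps(schema, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def pin_is_stale(vendored: dict, upstream: dict) -> bool:
    """Return True when the upstream schema differs in content from the vendored pin."""
    return canonical_digest(vendored) != canonical_digest(upstream)
```

Comparing canonical digests rather than raw bytes avoids opening a PR for a pure re-serialization of the upstream file; whether the real workflow makes that distinction is not stated here.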
The scheduled workflow
.github/workflows/sync-grapher-schema.yml runs on weekdays at 07:00 UTC (plus manual workflow_dispatch). When everything is in sync it is a no-op (two curls).
| Upstream event | Workflow action |
|---|---|
| Nothing changed | No-op |
| Pinned schema mutated in place | Refreshes the vendored copy, regenerates schema_types.py, and opens/updates a draft PR on the auto-sync-grapher-schema branch. The PR's own CI flags any remaining manual propagation. |
| New schema version published (grapher-schema.latest.json $id ≠ our pin) | Opens an issue (deduped by title) pointing at the version-bump procedure. Not auto-PR'd, since a bump needs judgment. |
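The decision table above can be read as a small pure function. The following is a sketch under stated assumptions, not the workflow's implementation: `plan_actions` and the action labels are hypothetical, "our pin" is assumed to mean the `$id` of the vendored file, and the two upstream events are assumed to be independent, so both can fire in one run.

```python
from typing import List


def plan_actions(vendored: dict, upstream_pinned: dict, latest_id: str) -> List[str]:
    """Map the observed upstream state onto the workflow actions in the table."""
    actions: List[str] = []
    if vendored != upstream_pinned:
        # The pinned schema version was mutated in place.
        actions.append("draft-pr")
    if latest_id != vendored.get("$id"):
        # grapher-schema.latest.json points at a different version than the pin.
        actions.append("issue")
    return actions or ["no-op"]
```

Keeping the decision separate from the fetching makes it testable offline, in the same spirit as the unit tests that guard the rest of the chain.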
Failure modes and caveats
- Red CI on a bot PR is by design, not a malfunction: test_grapher_config_schema_sync fails when the upstream change also needs manual propagation (see below). The failing test output is the todo list.
- Branch force-update: if upstream changes again before a bot PR is merged, the next run force-updates the auto-sync-grapher-schema branch and can clobber manual commits on it. Finish and merge bot PRs promptly; if you need more time, move the work to your own branch.
- files.ourworldindata.org unreachable, dependency installation or generator failures → the job fails loudly in the Actions tab; nothing is skipped silently.
- Bot PRs are created with the default GITHUB_TOKEN: Buildkite CI triggers normally (external webhook), but GitHub-Actions-based CI would not.
Completing a sync: the /sync-grapher-schema skill
The automatic part of a sync (vendored refresh + type regeneration) is committed by the workflow. The judgment part is guided by the internal Claude Code skill /sync-grapher-schema:
- mirroring the upstream diff into the grapher_config block embedded in schemas/dataset-schema.json, preserving the deliberate ETL-side deviations (Jinja oneOf escape hatches, the extra WorldMap chart type, the ETL-only data/includedEntities properties);
- adding $refs for genuinely new properties to schemas/multidim-schema.json / schemas/explorer-schema.json;
- handling the rarer version bump (new grapher-schema.NNN): bumping DEFAULT_GRAPHER_SCHEMA in etl/config.py, updating $refs, re-vendoring.
The skill can also be run ad hoc, e.g. when the web team announces a change and you don't want to wait for the cron; in that case the skill performs the refresh itself. Alternatively, trigger the workflow manually via workflow_dispatch.
Never edit schema_types.py by hand
etl/collection/model/schema_types.py is fully generated by scripts/generate_schema_types.py; a unit test fails if it doesn't round-trip. Hand-written types belong in etl/collection/model/params.py.
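A round-trip check of this kind can be sketched in a few lines. This is a hedged illustration, not the content of the actual unit test: `is_up_to_date` and its `render` callback are hypothetical stand-ins for however scripts/generate_schema_types.py exposes its output.

```python
from pathlib import Path
from typing import Callable


def is_up_to_date(generated_path: Path, render: Callable[[], str]) -> bool:
    """Return True when the committed file equals what the generator would write now.

    `render` is a hypothetical callable that returns the generator's output as text.
    """
    return generated_path.read_text(encoding="utf-8") == render()
```

Because the check compares the full text, any hand edit to the generated file, however small, makes it fail, which is what makes the "never edit by hand" rule enforceable.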
Who does what
| Who | Responsibility |
|---|---|
| Web team | Nothing ETL-specific: publish the schema and announce changes, as they already do. |
| The workflow | Detect upstream changes; open the draft PR / issue. |
| Auto-assigned reviewer | Triage bot PRs: review the vendored diff; CI green → mark ready and merge; CI red → run /sync-grapher-schema on the branch (or delegate). |
| Anyone on the data team | Can run /sync-grapher-schema; picks up "new version" issues. |
| CI on every PR | Refuses internally-inconsistent states (schemas ↔ generated types ↔ embedded block). |
Drift guards reference
| Drift vector | Guard |
|---|---|
| schemas/*.json edited without regenerating types, or generated file hand-edited | tests/test_schema_types_generation.py (unit, offline) |
| Embedded grapher_config in dataset-schema.json out of sync with the vendored pin | test_grapher_config_schema_sync in tests/test_metadata_schemas.py (unit, offline, enum-deep) |
| Vendored pin stale vs live upstream (in-place mutation) | scheduled workflow → draft PR; test_vendored_grapher_schema_is_current (integration) |
| Upstream publishes a new schema version | scheduled workflow → issue; test_no_newer_grapher_schema_version (integration) |
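To make "enum-deep" concrete, the sketch below walks two schemas and reports every path where both define an enum but with different values. It is one possible reading of the term, not the logic of test_grapher_config_schema_sync; the helper names `iter_enums` and `enum_drift` are hypothetical.

```python
from typing import Any, Dict, Iterator, Tuple

Path_ = Tuple[str, ...]


def iter_enums(node: Any, path: Path_ = ()) -> Iterator[Tuple[Path_, frozenset]]:
    """Yield (path, enum values) for every "enum" list found anywhere in a schema."""
    if isinstance(node, dict):
        enum = node.get("enum")
        if isinstance(enum, list):
            # Stringify values so that unhashable entries cannot break the set.
            yield path, frozenset(map(str, enum))
        for key, child in node.items():
            yield from iter_enums(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_enums(child, path + (str(index),))


def enum_drift(pin: dict, embedded: dict) -> Dict[Path_, Tuple[frozenset, frozenset]]:
    """Return the shared paths whose enum values differ, as (pin, embedded) pairs."""
    pin_enums = dict(iter_enums(pin))
    embedded_enums = dict(iter_enums(embedded))
    return {
        path: (pin_enums[path], embedded_enums[path])
        for path in pin_enums.keys() & embedded_enums.keys()
        if pin_enums[path] != embedded_enums[path]
    }
```

A real guard would also need an allowlist for the deliberate ETL-side deviations, such as the extra WorldMap chart type, which this sketch would report as drift.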