Meaning of the "indexed_at" property

Which changes or updates have an impact on the “indexed_at” property of a provider, dataset or series?

  • When a series (not an observation) is added to a dataset, is the indexed_at property of the parent dataset updated?

  • When a series (not an observation) is removed from a dataset, is the indexed_at property of the parent dataset updated?

  • When a series (not an observation) is updated (for example a typo is fixed), is the indexed_at property of the parent dataset updated?

  • When a dataset is added to a provider, is the indexed_at property of the provider updated?

  • When a dataset is removed from a provider, is the indexed_at property of the provider updated?

  • When a dataset is updated (for example a typo is fixed), is the indexed_at property of the provider updated?

Thank you in advance!
Regards

The indexed_at property is an internal metadata field representing the latest update of the JSON documents stored in the Apache Solr index that DBnomics uses to implement full-text search (on the homepage of the website) and search by dimension (on a dataset page of the website).

When a fetcher runs, it triggers the pipeline (defined by this file), which contains a job named “Index converted data with Solr”. This job processes the datasets that were downloaded and converted just before. Each dataset is re-indexed if its content changed, which covers the data and metadata of its series as well as the dataset metadata. The indexed_at property of the documents for series and datasets is updated with the same ISO date string. At the end, the indexed_at property of the JSON document of the provider is also updated.
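To make the stamping rule concrete, here is a minimal sketch in Python. It is not the actual DBnomics indexing job: the function name `reindex`, the dict layout and the `changed_codes` argument are hypothetical, and I assume the provider document is stamped at the end of every run, as described above.

```python
from datetime import datetime, timezone


def reindex(provider, datasets, changed_codes, now=None):
    """Stamp re-indexed documents with one shared ISO date string.

    Hypothetical sketch: documents are plain dicts, and `changed_codes`
    is the set of dataset codes whose content changed in this run.
    """
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    for dataset in datasets:
        if dataset["code"] not in changed_codes:
            continue  # unchanged content: documents keep their old indexed_at
        dataset["indexed_at"] = stamp
        for series in dataset["series"]:
            # Series documents get the same string as their dataset.
            series["indexed_at"] = stamp
    # Assumption: the provider document is stamped last, on every run.
    provider["indexed_at"] = stamp
    return stamp
```

With this logic, adding, removing or modifying a series marks its dataset as changed, so the dataset, its series and the provider all end up with the same new indexed_at value.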

There are no typo fixes on the DBnomics side: fetchers just download and convert data from the provider again and again. If the provider fixes a typo, it is treated as an update, the same way as when data is added, modified or deleted.

So all of your questions have a “yes” answer!

Hope it is clear enough.

I would like to add that indexed_at is not the best way to reflect dataset update dates. DBnomics could someday move to exposing the download date instead, because it really reflects the date when data was fetched from the provider. The conversion and indexing dates would then be internal implementation details of the DBnomics data acquisition pipeline.
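In the meantime, if you keep the last indexed_at value you saw for a dataset, you can compare it with the current one to detect a re-index. This is only a sketch under assumptions: `was_reindexed` is a hypothetical helper, and both values are assumed to be ISO 8601 strings in the same format (both with a UTC offset, or both without).

```python
from datetime import datetime


def _parse_iso(value):
    # Older Python versions do not accept a trailing "Z" in fromisoformat,
    # so it is normalized to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def was_reindexed(previous_indexed_at, current_indexed_at):
    """Return True if the document was re-indexed after the stored date."""
    return _parse_iso(current_indexed_at) > _parse_iso(previous_indexed_at)
```

Keep in mind that this tells you when the document was indexed, not when the data was downloaded from the provider.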
