Open Numbers

Open Numbers: Crowd-sourced harmonization of global & local statistics

It’s a community for data-crunchers who harmonize statistics to make them more useful for a fact-based worldview. Gapminder moderates the process to make sure that participants know what they are doing.

In Open Numbers, every dataset is structured according to the DDF model, which has 4 elements: DataPoints (“simple data storage”), Entities (“defining single-dimensional data”), Concepts (“giving information about your column headers”) and Metadata (“giving information about your data”). This is comparable to SDMX.
Data is stored as CSV files (though this is not mandatory).
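To make the DDF elements more concrete, here is a minimal sketch that reads a toy DDF-style dataset with Python's standard `csv` module. It assumes the `ddf--concepts.csv` / `ddf--entities--...` / `ddf--datapoints--<indicator>--by--<keys>.csv` file naming seen in Open Numbers repositories. The indicator `my_indicator`, the helper functions and all values are hypothetical placeholders, and the Metadata element is left out.

```python
import csv
import io

# Hypothetical, minimal DDF-style dataset held in memory instead of on disk.
# File names follow the "ddf--<type>--..." convention; values are dummies.
FILES = {
    "ddf--concepts.csv": (
        "concept,concept_type,name\n"
        "country,entity_domain,Country\n"
        "time,time,Time\n"
        "name,string,Name\n"
        "my_indicator,measure,My indicator\n"
    ),
    "ddf--entities--country.csv": (
        "country,name\n"
        "swe,Sweden\n"
        "fra,France\n"
    ),
    "ddf--datapoints--my_indicator--by--country--time.csv": (
        "country,time,my_indicator\n"
        "swe,2000,1.0\n"
        "fra,2000,2.0\n"
    ),
}

PREFIX = "ddf--datapoints--"


def read_table(name):
    """Parse one CSV 'file' into a list of dicts (one dict per row)."""
    return list(csv.DictReader(io.StringIO(FILES[name])))


def load_ddf():
    """Return concepts, entities and datapoints as plain Python structures."""
    concepts = {row["concept"]: row for row in read_table("ddf--concepts.csv")}
    entities = {row["country"]: row for row in read_table("ddf--entities--country.csv")}
    datapoints = {}
    for name in FILES:
        if name.startswith(PREFIX):
            # "<indicator>--by--<key1>--<key2>" tells us which columns are keys.
            stem = name[len(PREFIX):-len(".csv")]
            indicator, keys = stem.split("--by--")
            key_cols = keys.split("--")
            for row in read_table(name):
                key = tuple(row[k] for k in key_cols)
                datapoints[(indicator, key)] = float(row[indicator])
    return concepts, entities, datapoints


concepts, entities, datapoints = load_ddf()
print(entities["swe"]["name"], datapoints[("my_indicator", ("swe", "2000"))])
# Sweden 1.0
```

Deriving the key columns from the file name keeps each CSV self-describing, which fits the "Concepts describe your column headers" idea above.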

Ultimate use of Open Numbers: assembling datasets for visualisation in the Gapminder tool.

Example: ‘ddf--gapminder--gdp_per_capita_cppp’. Assembled from World Bank data (1990-2015) and Maddison data (~200 years). Methodology.
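The methodology itself is not summarised in these notes, so the following is only an illustration of one common way to extend a short recent series with a long historical one: ratio splicing at the first year both series share. It is an assumption, not necessarily what Gapminder does for this dataset, and the function name and numbers are hypothetical.

```python
def splice(recent: dict[int, float], historical: dict[int, float]) -> dict[int, float]:
    """Extend `recent` back in time with `historical`, rescaled so that both
    series agree at the earliest year they share (ratio splicing).
    Hypothetical helper, not Gapminder's documented method."""
    overlap = sorted(set(recent) & set(historical))
    if not overlap:
        raise ValueError("the two series must share at least one year")
    link_year = overlap[0]
    factor = recent[link_year] / historical[link_year]
    # Historical values are only used for years before the link year.
    spliced = {
        year: value * factor
        for year, value in historical.items()
        if year < link_year
    }
    # The recent source wins for every year it covers.
    spliced.update(recent)
    return dict(sorted(spliced.items()))


# Hypothetical usage with dummy numbers (not real GDP figures).
recent = {1990: 200.0, 1991: 210.0}
historical = {1988: 90.0, 1989: 95.0, 1990: 100.0}
print(splice(recent, historical))
# {1988: 180.0, 1989: 190.0, 1990: 200.0, 1991: 210.0}
```

Rescaling keeps the level of the recent source while borrowing only the growth pattern of the older one; a real method would also have to deal with gaps, differing price bases and so on.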

Useful links:

2017-09-26 Discussion with Jasper Heeffer (Gapminder / Open Numbers)

Gapminder is a 7-person team, including 2 developers (some work is outsourced).

The old Gapminder World graph used Flash, with data stored in Google Spreadsheets. Open Numbers started about 1.5 years ago, for the new version of Gapminder Tools.

Data is stored in CSV because it is easy to work with and it is what researchers want: they can dig in, create their own datasets, etc.
However, CSV is not the best format to fetch data from, as it is not optimised for queries.
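One common workaround (an assumption here, not something described in the discussion) is to keep CSV as the storage format and load it into an indexed database when fast lookups are needed. A minimal sketch using Python's built-in `sqlite3`; the table, column names and values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical DDF-style datapoints (placeholder values, not real statistics).
CSV_TEXT = (
    "country,time,my_indicator\n"
    "swe,2000,1.0\n"
    "swe,2001,1.5\n"
    "fra,2000,2.0\n"
)


def load_into_sqlite(csv_text):
    """Copy CSV rows into an in-memory SQLite table indexed on (country, time)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datapoints (country TEXT, time INTEGER, my_indicator REAL)"
    )
    conn.execute("CREATE INDEX idx_country_time ON datapoints (country, time)")
    rows = [
        (r["country"], int(r["time"]), float(r["my_indicator"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO datapoints VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def query(conn, country, start, end):
    """Return (time, value) pairs for one country with start <= time <= end."""
    cur = conn.execute(
        "SELECT time, my_indicator FROM datapoints "
        "WHERE country = ? AND time BETWEEN ? AND ? ORDER BY time",
        (country, start, end),
    )
    return cur.fetchall()


conn = load_into_sqlite(CSV_TEXT)
print(query(conn, "swe", 2000, 2001))
# [(2000, 1.0), (2001, 1.5)]
```

The CSV files stay the source of truth that researchers can download, and the database is only a disposable, query-friendly copy.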

Architecture

Python script => transformed to DDF => data stored in CSV
Semantic harmonization: for example, country names/IDs are changed so that they are consistent across all stored datasets, using tables that define and match the countries and territories frequently used across many other datasets (see all alternative names/IDs for countries).
Data is stored harmonized but an unchanged copy is also kept.
“DDF Chef”: a Python script (what we would call a fetcher).
“DDF Recipe”: how to mix and harmonize (“cook”) multiple datasets.
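The harmonization and “recipe” ideas above can be sketched as follows. This is a hypothetical illustration, not the actual DDF Chef API or recipe format: the synonym table, the source datasets, the `RECIPE` structure and the `cook` function are all invented for the example, and the values are dummies. It maps alternative country names to harmonized IDs, merges the sources, reports names that could not be matched, and keeps an unchanged copy of every input.

```python
import copy

# Hypothetical synonym table: alternative names/IDs -> harmonized country ID.
SYNONYMS = {
    "sweden": "swe",
    "kingdom of sweden": "swe",
    "se": "swe",
    "france": "fra",
    "french republic": "fra",
    "fr": "fra",
}

# Two hypothetical source datasets using different country spellings (dummy values).
SOURCES = {
    "source_a": [{"country": "Sweden", "time": 2000, "value": 1.0}],
    "source_b": [
        {"country": "FR", "time": 2000, "value": 2.0},
        {"country": "Atlantis", "time": 2000, "value": 3.0},
    ],
}

# Hypothetical recipe: ingredients listed in priority order (later ones win).
RECIPE = {"ingredients": ["source_a", "source_b"]}


def harmonize_country(rows):
    """Map country names/IDs to harmonized IDs; collect names with no match."""
    out, unmatched = [], []
    for row in rows:
        key = row["country"].strip().lower()
        if key in SYNONYMS:
            out.append({**row, "country": SYNONYMS[key]})
        else:
            unmatched.append(row["country"])
    return out, unmatched


def cook(recipe, sources):
    """Harmonize every ingredient, merge them, and keep the raw inputs untouched."""
    raw = {name: copy.deepcopy(sources[name]) for name in recipe["ingredients"]}
    merged, unmatched = {}, []
    for name in recipe["ingredients"]:
        rows, missing = harmonize_country(sources[name])
        unmatched.extend(missing)
        for row in rows:
            # A later ingredient overrides an earlier one on the same (country, time).
            merged[(row["country"], row["time"])] = row["value"]
    return {"raw": raw, "merged": merged, "unmatched": unmatched}


result = cook(RECIPE, SOURCES)
print(result["merged"])     # {('swe', 2000): 1.0, ('fra', 2000): 2.0}
print(result["unmatched"])  # ['Atlantis']
```

Reporting unmatched names instead of silently dropping them is what would let a community extend the synonym table over time.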

Challenges

  • Automate more
  • Fetch more sources
  • Crowdsource the harmonization
  • Get an overview of fetchers (which ones will be updated, when, etc.)

Their strength is in data visualisation

Vizabi: powerful visualisation tools developed in-house (GitHub)

Example of a dataviz project they have done: Södertörnsmodellen

Potential for collaboration

DB.nomics is, in a sense, building the “data architecture”: a platform on which other projects could be built on the user side (visualisation, reuse, mixing, harmonization, etc.).
Users can be individuals, but also other websites/platforms.
Gapminder could be one of these users, focusing on the dataviz, since the architecture (fetching, aggregating, etc.) is not their strength.
Data.world: similar idea
See also Quandl