Open Numbers

johan · June 27, 2017, 12:41pm

Open Numbers: Crowd-sourced harmonization of global & local statistics

It’s a community for data-crunchers who harmonize statistics to make them more useful for a fact-based worldview. Gapminder moderates the process to make sure people participating know what they are doing.

johan · August 30, 2017, 5:12pm

In Open Numbers, each dataset is structured according to the DDF model. It has 4 elements: DataPoints (“simple data storage”), Entities (“defining single-dimensional data”), Concepts (“giving information about your column headers”), Metadata (“giving information about your data”) => similar to SDMX.
The data is stored in CSV (though this is not mandatory).
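
To make the model a bit more concrete, here is a minimal Python sketch of how Concepts, Entities and DataPoints could relate to each other when stored as CSV. It is only an illustration: the file names imitate the DDFcsv naming pattern, and every row and value is a made-up placeholder, not taken from a real Open Numbers dataset. The Metadata element (the datapackage file) is left out, and `read_table` and `describe_datapoints` are hypothetical helper names.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the CSV files of a tiny DDF-style dataset.
# File names and rows are illustrative assumptions, not a real dataset.
FILES = {
    "ddf--concepts.csv": (
        "concept,concept_type,name\n"
        "geo,entity_domain,Geographic location\n"
        "time,time,Year\n"
        "gdp_per_capita,measure,GDP per capita\n"
    ),
    "ddf--entities--geo.csv": (
        "geo,name\n"
        "swe,Sweden\n"
        "fra,France\n"
    ),
    "ddf--datapoints--gdp_per_capita--by--geo--time.csv": (
        "geo,time,gdp_per_capita\n"
        "swe,2014,100\n"
        "swe,2015,110\n"
        "fra,2015,105\n"
    ),
}


def read_table(name):
    """Parse one CSV 'file' into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(FILES[name])))


def describe_datapoints(name):
    """Join datapoints with concepts (column meaning) and entities (key labels)."""
    concepts = {row["concept"]: row for row in read_table("ddf--concepts.csv")}
    geo_names = {row["geo"]: row["name"] for row in read_table("ddf--entities--geo.csv")}
    lines = []
    for row in read_table(name):
        for column, value in row.items():
            # Only measure columns carry values; geo and time are the keys.
            if concepts[column]["concept_type"] == "measure":
                lines.append(
                    f"{concepts[column]['name']} of {geo_names[row['geo']]} "
                    f"in {row['time']}: {value}"
                )
    return lines


if __name__ == "__main__":
    for line in describe_datapoints("ddf--datapoints--gdp_per_capita--by--geo--time.csv"):
        print(line)
```

The point of the split is that the datapoints file stays a plain table of keys and values, while the meaning of each column header and each key lives in separate, reusable tables.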

End use of Open Numbers: assembling datasets for visualization in the Gapminder tool.

Example: ‘ddf--gapminder--gdp_per_capita_cppp’. Assembled from World Bank data (1990-2015) + Maddison data (~200 years). Methodology.

Useful links:

Systema Globalis (main dataset used in tools on the official Gapminder website. It contains local & global statistics combined from hundreds of sources)
An introduction to DDF
DDFcsv format
DDFcsv datapackage
DDF_utils

johan · September 28, 2017, 11:37am

2017-09-26 Discussion with Jasper Heeffer (Gapminder / Open Numbers)

Gapminder is a 7-person team, including 2 developers (some work is outsourced).

The old Gapminder World graph used Flash, with data kept in Google Spreadsheets. Open Numbers started about 1.5 years ago for the new version of Gapminder Tools.

Data is stored in CSV because it’s easy to work with, and that’s what researchers want: to dig in, create their own datasets, etc.
However, CSV is not the best format to fetch things from, as it’s not optimised for queries.

Architecture

Python script => transformed to DDF => data stored in CSV
Semantic harmonization: for example country names/IDs are changed in order to be harmonized across all stored datasets, using tables to define and match countries and territories frequently used across many other datasets (see all alternative names/IDs for countries).
Data is stored harmonized but an unchanged copy is also kept.
“DDF Chef”: a Python script (what we would call a fetcher).
“DDF Recipe”: how to mix and harmonize (“cook”) multiple datasets.
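
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `harmonize` and `cook`, the synonym table, the "earlier dataset wins" rule and the sample rows are all hypothetical, and none of this is the actual DDF Chef or DDF Recipe API.

```python
# Hypothetical synonym table: alternative country names/IDs -> one canonical ID.
COUNTRY_SYNONYMS = {
    "sweden": "swe", "se": "swe", "swe": "swe",
    "france": "fra", "fr": "fra", "fra": "fra",
}


def harmonize(rows):
    """Return a harmonized copy of rows; the input list is left unchanged."""
    harmonized = []
    for row in rows:
        key = row["country"].strip().lower()
        if key not in COUNTRY_SYNONYMS:
            raise KeyError(f"no canonical ID known for {row['country']!r}")
        harmonized.append(
            {"geo": COUNTRY_SYNONYMS[key], "time": row["time"], "value": row["value"]}
        )
    return harmonized


def cook(*datasets):
    """Mix harmonized datasets keyed by (geo, time).

    Assumed rule for this sketch: when two datasets overlap,
    the value from the dataset listed first is kept.
    """
    merged = {}
    for dataset in datasets:
        for row in dataset:
            merged.setdefault((row["geo"], row["time"]), row["value"])
    return [{"geo": g, "time": t, "value": v} for (g, t), v in sorted(merged.items())]


# Made-up rows standing in for two fetched sources that name countries differently.
source_a = [{"country": "Sweden", "time": 2015, "value": 2.0}]
source_b = [
    {"country": "SE", "time": 2014, "value": 1.0},
    {"country": "SE", "time": 2015, "value": 9.0},
]

if __name__ == "__main__":
    print(cook(harmonize(source_a), harmonize(source_b)))
```

Because `harmonize` builds new rows instead of editing the input, the unchanged copy of each source remains available next to the harmonized one.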

Challenges

Automate more
Fetch more sources
Crowdsource the harmonization
Get an overview of fetchers (which ones will be updated, when, etc.)

Their strength is in data visualisation

Vizabi : Powerful visualisation tools developed in-house (GitHub)

Example of dataviz project they’ve done: Södertörnsmodellen

Potential of collaboration

DB.nomics is essentially building the “data architecture”: a platform on which other user-side projects could be built (visualisation, reuse, mixing, harmonization, etc.).
Users can be individuals but also other websites/platforms
Gapminder could be one of these users, focused on dataviz, since the architecture (fetching, aggregating, etc.) is not their strength.
Data.world : similar idea
See also Quandl
