How would I get a list of all datasets for a given data provider?
I would like to ask for:
- All providers
- All datasets for a given provider
- Attributes of dataset
Thank you.
For example, in R I can do:
dim = rdbnomics::rdb_dimensions(provider_code = "IMF", dataset_code = dataset)
countries = dim$IMF$`WEO:2021-10`$`weo-country`
series = dim$IMF$`WEO:2021-10`$`weo-subject`
Can I do the same in Python?
Hi @toast,
The DBnomics Python client can fetch series only. It may be enhanced, and we’ll keep your needs in mind for future updates of the package.
However, in the meantime, you can query the DBnomics API directly and parse the JSON response; I'll help you with that below.
See also the API documentation.
To get all providers, load this URL: https://api.db.nomics.world/v22/providers and read the providers/docs key path of the JSON response.
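As a minimal sketch of reading that key path in Python: the snippet below uses only the standard library (urllib.request and json) rather than requests, and the helper names get_key_path and fetch_providers are hypothetical, not part of the DBnomics client.

```python
import json
import urllib.request

API_BASE_URL = "https://api.db.nomics.world/v22"


def get_key_path(data, key_path):
    """Follow a slash-separated key path such as "providers/docs" in parsed JSON."""
    for key in key_path.split("/"):
        data = data[key]
    return data


def fetch_providers():
    """Download the providers list and return the items under providers/docs."""
    with urllib.request.urlopen(f"{API_BASE_URL}/providers", timeout=30) as response:
        return get_key_path(json.load(response), "providers/docs")


# Usage (performs a network request against the live API):
# provider_codes = [provider["code"] for provider in fetch_providers()]
```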
To get all datasets of a given provider, there are 2 ways:
You can request the category tree of the provider (the same one you see on the provider page of the DBnomics website) by calling https://api.db.nomics.world/v22/providers/{provider_code} and reading the category_tree key of the JSON response.
Please note that the category tree can be hierarchical (but not always), so you'll have to flatten it yourself, for example with a recursive function. The leaves of the tree are the datasets.
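A recursive flattening could look like the sketch below. It assumes that category nodes carry a "children" list while dataset leaves do not; please check this against a real category_tree response before relying on it. The names fetch_category_tree and iter_datasets are hypothetical.

```python
import json
import urllib.request

API_BASE_URL = "https://api.db.nomics.world/v22"


def fetch_category_tree(provider_code):
    """Return the category_tree key of the provider endpoint's JSON response."""
    url = f"{API_BASE_URL}/providers/{provider_code}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["category_tree"]


def iter_datasets(nodes):
    """Yield the leaves (datasets) of a possibly nested category tree.

    Assumption: a node with a "children" list is a category; any other
    node is treated as a dataset.
    """
    for node in nodes:
        children = node.get("children")
        if children is None:
            yield node  # leaf: a dataset
        else:
            yield from iter_datasets(children)  # category: recurse into it


# Usage (performs a network request):
# dataset_codes = [node["code"] for node in iter_datasets(fetch_category_tree("IMF"))]
```

A generator keeps the traversal lazy, and a flat (non-hierarchical) tree simply yields every top-level node.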
Or you can use this URL: https://api.db.nomics.world/v22/datasets/{provider_code} and read the datasets/docs key path of the JSON response. Each item of the list gives you the dataset code, along with many other metadata you probably don't need and can simply ignore.
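If a provider has many datasets, the list may be split into pages. Here is a sketch that keeps requesting pages until it has collected every code. It assumes that the "datasets" object reports num_found and that the endpoint accepts an offset query parameter; please confirm both in the API docs. The fetch parameter is hypothetical and only exists so the paging logic can be tried without the network.

```python
import json
import urllib.parse
import urllib.request

API_BASE_URL = "https://api.db.nomics.world/v22"


def fetch_json(url, params=None):
    """GET a URL (with optional query parameters) and parse the JSON body."""
    if params:
        url = f"{url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def fetch_dataset_codes(provider_code, fetch=fetch_json):
    """Collect all dataset codes of a provider from the datasets/docs key path.

    Assumptions: the "datasets" object reports "num_found" (total number of
    datasets), and the endpoint accepts an "offset" query parameter.
    """
    url = f"{API_BASE_URL}/datasets/{provider_code}"
    codes = []
    while True:
        page = fetch(url, {"offset": len(codes)})["datasets"]
        docs = page["docs"]
        codes.extend(doc["code"] for doc in docs)
        # Stop on an empty page, or once every reported dataset is collected;
        # if "num_found" is missing, only the first page is read.
        if not docs or len(codes) >= page.get("num_found", len(codes)):
            return codes


# Usage (performs network requests):
# imf_dataset_codes = fetch_dataset_codes("IMF")
```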
Dataset attributes are not returned by the API; this is tracked in #265, and there is a draft merge request about it.
Could you give an example? I'm not sure we're talking about the same “attributes”.
Your R example suggests that you actually need the dataset dimensions, which are not available from the Python client. This should definitely be added.
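Until then, a rough equivalent of rdb_dimensions could be built from the items of the datasets/docs key path, using their dimensions_values_labels field. This is a hypothetical sketch: dataset_dimensions is not a client function, and it assumes that field maps each dimension code to its values and labels. It returns a nested dict, so looking up "weo-country" or "weo-subject" plays the role of the countries and series lookups in your R code.

```python
def dataset_dimensions(dataset_docs, dataset_code):
    """Return {dimension_code: {value_code: label}} for one dataset.

    dataset_docs: the list read from the datasets/docs key path.
    Assumption: each item may carry "dimensions_values_labels", mapping a
    dimension code to its values; dict() accepts either a {code: label}
    mapping or a list of [code, label] pairs.
    """
    for doc in dataset_docs:
        if doc.get("code") == dataset_code:
            values_labels = doc.get("dimensions_values_labels", {})
            return {dim: dict(values) for dim, values in values_labels.items()}
    raise KeyError(f"dataset {dataset_code!r} not found in dataset_docs")


# Usage, with docs read from datasets/docs for provider IMF:
# dims = dataset_dimensions(docs, "WEO:2021-10")
# countries = dims["weo-country"]
# series = dims["weo-subject"]
```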
I hope this helps; if you need more info, please ask.
OK, I think it looks something like this: get all providers, then get their respective datasets and the metadata required to query each dataset. This is exactly what I was after, thank you!
#%%
import requests

API_BASE_URL = 'https://api.db.nomics.world/v22'

#%%
# All provider codes
providers_response = requests.get(f'{API_BASE_URL}/providers')
providers_response.raise_for_status()
providers_json = providers_response.json()
providers = [provider['code'] for provider in providers_json['providers']['docs']]
providers

#%%
# All datasets for a provider
provider_code = 'IMF'
datasets_response = requests.get(f'{API_BASE_URL}/datasets/{provider_code}')
datasets_response.raise_for_status()
datasets_json = datasets_response.json()
datasets_provider = [dataset['code'] for dataset in datasets_json['datasets']['docs']]
datasets_provider

#%%
# Values of the INDICATOR dimension of the first dataset in the list
# (not every dataset has an INDICATOR dimension, hence .get() with defaults)
first_dataset = datasets_json['datasets']['docs'][0]
indicators = first_dataset.get('dimensions_values_labels', {}).get('INDICATOR', {})
indicators