The International Monetary Fund (IMF) provides a great resource on inflation: the IMF Consumer Price Index (CPI) database. This database receives inflation data from nations across the globe and combines it into a single unified table. Because this requires the co-operation of every country represented in the database, it can sometimes take a little while for the latest data to be published. This delay leaves the database temporarily out of date, often by more than a month, and incomplete where national statistics offices don’t proactively send data to the IMF.
At Africa Data Hub, our mission is to provide data journalists and researchers with up-to-date, accurate, African-based data. In service of this mission, we have undertaken to maintain an inflation database focused specifically on African countries. It uses the IMF database as the key dataset and is updated with data released by individual African nations as it becomes available. This allows us to maintain one of the most up-to-date inflation datasets in the world based exclusively on African datasets.
We gather the latest inflation data from different African countries by visiting the relevant websites (typically the country’s national statistics bureau), searching for the latest inflation statistics, and then downloading the data. This data is often contained in official reports, published in PDF format. This can make extracting and using the data tables contained within the reports difficult and time-consuming. On rare occasions, the reports are accompanied by a separate data table release in CSV or XLSX format, which is much easier to use. The first step in combining the data from the IMF and the various countries is to extract the relevant data and convert it all to the same format.
The full list of data sources that we monitor can be found here.
Once we have collected the documents, we store them here.
Extracting data and combining it into one dataset
One advantage of formal reports published in PDF is that they are typically consistently structured, with only the content changing from one release to the next. This allowed us to build an automated process, tailored to each country, for extracting data from the PDF tables.
The data is extracted using a Python module called Tabula (look out for a blog post on this process, coming soon!), then cleaned and arranged in a manner that is consistent across all datasets, before being stored as a CSV. Each country has its own CSV file, with a row for each inflation indicator and a column for each data release. In cases where the structure of the report changes fairly often, we read the values off the report by hand and manually update that country's CSV file.
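To make the per-country CSV layout concrete, here is a minimal sketch of adding a new data release to a country's table. It is not our production code: the function name add_release, the "indicator" column header, the release labels, and all the values are assumptions made up for illustration, and the Tabula extraction step itself is left out. It uses only Python's standard library.

```python
import csv
import io


def add_release(csv_text, release, values):
    """Return CSV text with one release column added or updated.

    csv_text: existing country table; the first column is 'indicator'
              (hypothetical header), the rest are one column per release.
    release:  label for the release column, e.g. '2023-05'.
    values:   mapping of indicator name -> value for that release.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or ["indicator"])
    rows = list(reader)

    if release not in fieldnames:
        fieldnames.append(release)

    seen = set()
    for row in rows:
        seen.add(row["indicator"])
        # Keep any existing cell when this release has no value for the indicator.
        row[release] = values.get(row["indicator"], row.get(release) or "")

    # Indicators that appear for the first time get a new row with blank history.
    for indicator, value in values.items():
        if indicator not in seen:
            new_row = {name: "" for name in fieldnames}
            new_row["indicator"] = indicator
            new_row[release] = value
            rows.append(new_row)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up numbers, purely to show the shape of the table.
existing = "indicator,2023-04\nHeadline CPI,6.8\nFood,13.9\n"
print(add_release(existing, "2023-05", {"Headline CPI": 6.3, "Food": 11.8}))
```

Keeping one row per indicator and one column per release means a new report only ever appends a column, so earlier releases stay untouched and easy to compare.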
Next in the process, we take our latest Africa Data Hub Inflation dataset, update it with the latest IMF dataset, and then update it further with the data scraped from individual countries.
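The layered update described above can be sketched as follows. This is an illustration under stated assumptions rather than our actual pipeline: the function layer_sources, the (country, indicator, period) key, and every value shown are hypothetical. The idea is simply that later sources override earlier ones wherever they overlap.

```python
def layer_sources(*sources):
    """Merge observation mappings; later sources override earlier ones.

    Each source maps (country, indicator, period) -> value.
    A value of None is treated as missing and never erases existing data.
    """
    combined = {}
    for source in sources:
        for key, value in source.items():
            if value is not None:
                combined[key] = value
    return combined


# Made-up values, purely for illustration.
adh_previous = {("KE", "Headline CPI", "2023-03"): 9.2}
imf_latest = {
    ("KE", "Headline CPI", "2023-03"): 9.2,
    ("KE", "Headline CPI", "2023-04"): 7.9,
}
country_scraped = {("KE", "Headline CPI", "2023-05"): 8.0}

# Order matters: existing dataset first, then IMF, then country releases.
combined = layer_sources(adh_previous, imf_latest, country_scraped)
for key in sorted(combined):
    print(key, combined[key])
```

Passing the country-level data last means a figure published by a national statistics office takes precedence over the IMF value for the same country, indicator, and period.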
Once the data has been combined, we save the final dataset as a CSV file and post it on the Africa Data Hub CKAN data repository here. We also push all of the source data and the scraped CSV files for each country to our GitHub repository, and provide public access to this data via links in each country dataset on CKAN, found here. Finally, the data can be accessed and viewed via our Inflation Observer.