Africa Data Hub seeks to support and promote quality data journalism in Africa by providing newsrooms with easy access to quality African data. African data is typically difficult to find, stored in unwieldy formats, and often out of date.
Africa Data Hub is working to remedy this by actively seeking out interesting and useful African datasets, converting them to more easily accessible formats, and storing them all on our open-source online CKAN data repository. Once the data is made available, we make every effort to keep it up to date with the latest releases from our various sources. In this way, we find data from all over Africa and bring it together on one platform, where it can be searched, accessed and downloaded by anyone, anywhere, anytime.
What is CKAN?
The data repository is powered by CKAN, an open-source data management system that is used by hundreds of organisations around the world, including the national governments of the USA, Canada, Singapore and Australia. We use it to host any data we find that we believe may be useful in serving our mandate to promote quality data journalism in Africa.
CKAN can be a little intimidating at first, but once you are able to navigate our repository, you should be able to work with any other CKAN instance you find.
Dataset vs Resource
A key concept to understand when using CKAN is the difference between a dataset and a resource. A dataset is a collection of related data resources, while a resource is a single file.
It may be useful to think of a dataset as a folder on your computer and a resource as a file in that folder. For example, our data repository contains a dataset called “Human Development Index”. Within this dataset, there are three data resources relating to articles that we have written about the Human Development Index. If you click on one of the resources, you will be taken to the actual data resource, which you can then download.
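The folder-and-file analogy maps onto the JSON that CKAN's Action API returns for a dataset (the `package_show` action): the dataset record carries a `resources` list, and each resource has its own name, format and URL. The sketch below uses made-up sample values in place of a real API response, so the resource names and the `ckan.example.org` URLs are illustrative assumptions, not actual entries in the repository.

```python
# Hypothetical sample, shaped like the "result" object of CKAN's package_show.
# Resource names and URLs are placeholders for illustration only.
sample_dataset = {
    "title": "Human Development Index",
    "resources": [
        {"name": "Resource A", "format": "CSV", "url": "https://ckan.example.org/a.csv"},
        {"name": "Resource B", "format": "XLSX", "url": "https://ckan.example.org/b.xlsx"},
        {"name": "Resource C", "format": "CSV", "url": "https://ckan.example.org/c.csv"},
    ],
}


def list_resources(dataset):
    """Return (name, format, url) for every resource (file) in a dataset (folder)."""
    return [
        (r.get("name", ""), r.get("format", ""), r.get("url", ""))
        for r in dataset.get("resources", [])
    ]


# Print one line per resource in the dataset.
for name, fmt, url in list_resources(sample_dataset):
    print(f"{fmt:5} {name} -> {url}")
```

A dataset with no resources simply yields an empty list, which mirrors an empty folder.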
Each dataset and every data resource in our repository has its own set of metadata. This provides context for a given dataset or resource which helps you to understand what it is about, where it came from, how it was created and how it can be used. The metadata also provides information regarding the licence under which the data can be used and shared.
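For readers who work with the repository programmatically, the same metadata is available as fields on the dataset record returned by CKAN's API. The sketch below pulls out a few of the standard CKAN dataset fields (`title`, `notes`, `license_title`, `organization`, `tags`, `metadata_modified`); the sample values are invented for illustration, and a given dataset may leave some of these fields empty.

```python
def summarise_metadata(dataset):
    """Collect the metadata fields most useful for judging a dataset."""
    organisation = dataset.get("organization") or {}
    return {
        "title": dataset.get("title"),
        "description": dataset.get("notes"),
        "licence": dataset.get("license_title"),
        "organisation": organisation.get("title"),
        "last_modified": dataset.get("metadata_modified"),
        "tags": [tag["name"] for tag in dataset.get("tags", [])],
    }


# Hypothetical record; every value here is a placeholder, not real metadata.
sample_dataset = {
    "title": "Example dataset",
    "notes": "A short description of where the data came from.",
    "license_title": "Creative Commons Attribution",
    "organization": {"title": "Example Partner"},
    "metadata_modified": "2023-01-15T10:00:00",
    "tags": [{"name": "gender"}, {"name": "geodata"}],
}

summary = summarise_metadata(sample_dataset)
for field, value in summary.items():
    print(f"{field}: {value}")
```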
Finding and accessing data
Our data is organised in terms of datasets, organisations, groups and tags.
There are two ways to search for datasets on the Africa Data Hub data repository: typing terms into the search bar found at the top of almost every page of the repository, and filtering a list of search results. Entering a search term causes CKAN to look for matching terms in the titles, descriptions, locations and tags of datasets. This is very similar to searching for something on Google. The resulting list of items can be further refined using the filter options on the left side of the search results. You can filter by organisation, group, tag, format, and licence.
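The same search-then-filter approach can be sketched against CKAN's Action API, where `package_search` takes a free-text query (`q`) and a filter query (`fq`). In the sketch below, `BASE_URL` is a placeholder rather than the repository's real address, and the filter field names (`organization`, `groups`, `tags`, `res_format`, `license_id`) are the ones a default CKAN search index normally exposes; a particular instance may be configured differently.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Placeholder address: replace with the CKAN instance you want to query.
BASE_URL = "https://ckan.example.org"


def build_search_url(term, filters=None, rows=10, base_url=BASE_URL):
    """Build a package_search URL from a search term and optional filters."""
    params = {"q": term, "rows": rows}
    if filters:
        # fq narrows the results, like the filter options beside the result list.
        params["fq"] = " ".join(
            f'{field}:"{value}"' for field, value in filters.items()
        )
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


def search(term, filters=None, rows=10, base_url=BASE_URL):
    """Run the search and return the list of matching dataset records."""
    url = build_search_url(term, filters, rows, base_url)
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return payload["result"]["results"]


# Example: datasets matching "inflation", tagged "gender", with CSV resources.
url = build_search_url("inflation", {"tags": "gender", "res_format": "CSV"})
print(url)
```

Calling `search(...)` needs network access and a real base URL, which is why only the URL is built here.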
Once you have narrowed your search down to a manageable list of datasets, click on one to see whether it contains any resources that you wish to use. You can download any resource by clicking Download in the Explore dropdown button to the right of the resource. Pay attention to the format of each resource and make sure you download the one in the format you need.
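If you fetch data with a script rather than through the website, the format check can be automated. The sketch below is one possible approach, not the repository's own tooling: `pick_resource` chooses the first resource matching an ordered list of preferred formats, and `download_resource` saves it to disk. The sample dataset and its URLs are hypothetical placeholders.

```python
import shutil
import urllib.request


def pick_resource(dataset, preferred_formats):
    """Return the first resource matching the formats, in order of preference."""
    resources = dataset.get("resources", [])
    for wanted in preferred_formats:
        for resource in resources:
            if (resource.get("format") or "").upper() == wanted.upper():
                return resource
    return None  # nothing in the dataset is in a format we can use


def download_resource(resource, destination):
    """Save the file behind a resource's URL to a local path."""
    with urllib.request.urlopen(resource["url"]) as response:
        with open(destination, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    return destination


# Hypothetical dataset record; names and URLs are placeholders.
sample_dataset = {
    "title": "Example dataset",
    "resources": [
        {"name": "Report", "format": "PDF", "url": "https://ckan.example.org/r.pdf"},
        {"name": "Table", "format": "xlsx", "url": "https://ckan.example.org/t.xlsx"},
    ],
}

chosen = pick_resource(sample_dataset, ["CSV", "XLSX"])
print(chosen["name"] if chosen else "No suitable format found")
```

Once a real resource has been chosen, `download_resource(chosen, "table.xlsx")` would save it locally.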
If you prefer to navigate the repository in a more structured way, data can also be retrieved by clicking on the metadata tags in menus. For example…
You may see references to 'Organisations' in the navigation menus for CKAN. These are partners within the Africa Data Hub network who manage and maintain datasets as part of ongoing collaborations.
Groups indicate data categories, for example Health data or Economic data. A dataset may belong to more than one group. Groups also serve as folders for all datasets used in a particular data tool or project. Finally, if several different datasets have all been sourced from the same place, then all of those datasets are placed in a group named after that source. See, for example, the Humanitarian Data Exchange.
Tags identify interesting characteristics of the data. For example, if the data contains gender information, it is tagged with gender; if it is geographical data, it is tagged with geodata.
Types of data available on the Africa Data Hub data repository
The data on the Africa Data Hub repository is available in a wide variety of formats, from spreadsheets (e.g. XLSX, CSV) to geographic data (e.g. SHP, GeoJSON) to images (e.g. GeoTIFF, PNG) and, in some cases, documents such as PDF and DOCX. Some datasets include the same data in different formats under different resources. It is therefore not always necessary to download an entire dataset; instead, look only for the formats that you are comfortable with or are able to use. Note that the same data can differ in file size from one format to another.
Many countries tend to release their data in PDF format. Unfortunately, data in PDF form is usually difficult to work with. When we come across PDF data that we believe could be useful for African data journalists, we therefore try to extract the data and present it in CSV or XLSX format, which is much easier to work with. When we do this, we try to include links to the original PDF documents in the dataset so that anyone who wishes to use the extracted data can verify its authenticity and correctness. An example of this process can be found in the data behind our Inflation Observer.