CKAN – Open Data Portal

In this document, every reference to CKAN means the CKAN Open Data Portal installed on the IBM-KWMC machine image. CKAN is one of the fundamental components of the Open Data Toolkit provided by Knowle West Media Centre and IBM.

For more information on CKAN and the CKAN portal please visit the Open Knowledge Foundation at:

https://okfn.org

CKAN is an Open Data portal that makes it easy to collect and upload data in formats a computer can understand. This means the application specialises not in the traditional forms of data you may be used to dealing with, such as spreadsheets, but in formats such as JSON, CSV and XML. What are these formats? They are machine-readable data formats, which make it much easier to write applications that use the data. We will show examples of each in the Appendix Section.
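To make the three formats concrete, here is a small sketch, assuming Python 3 and only its standard library. The record (a station name and a temperature) is invented for illustration and does not come from the toolkit; it is written once in each format and read back into the same kind of Python structure.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same made-up record written in each of the three formats.
JSON_TEXT = '[{"station": "Knowle West", "temperature": 14.5}]'
CSV_TEXT = "station,temperature\nKnowle West,14.5\n"
XML_TEXT = (
    "<readings>"
    "<reading><station>Knowle West</station><temperature>14.5</temperature></reading>"
    "</readings>"
)


def from_json(text):
    """Parse a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


def from_csv(text):
    """Parse CSV text with a header row; every value comes back as a string."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_xml(text):
    """Parse <reading> elements; every value comes back as a string."""
    root = ET.fromstring(text)
    return [
        {child.tag: child.text for child in reading}
        for reading in root.findall("reading")
    ]


for name, records in [
    ("JSON", from_json(JSON_TEXT)),
    ("CSV", from_csv(CSV_TEXT)),
    ("XML", from_xml(XML_TEXT)),
]:
    print(name, records)
```

Note that JSON keeps the temperature as a number, while CSV and XML hand every value back as text.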

Web Portal Access

CKAN can be found on the pre-built image at:

When you go to this link you should be shown a page similar to this:

[Image: ckan]

The first thing you will need to do in the CKAN portal is create a user account for yourself. There is a default user (admin/(Password Redacted. Contact datatoolkit@kwmc.org.uk for password)); however, you will probably want a lower-privileged account. To get one, click the Register button in the top right-hand corner and create your account. Once you have an account you can create your first dataset. To do so, click “Datasets” in the top bar and then click “Add Dataset”. During dataset creation you will have to give the dataset a descriptive title and a description. This can be seen below:

[Image: ckan-adddataset]

Once the dataset is correctly labelled, resources can be uploaded. A resource can be a fragment of CSV data entered as text, a web link, an uploaded file or a link to a hosted API. If the data being uploaded is in a format that CKAN recognises and knows how to process, previews of the data will be available from the resource panel in the later steps. You can upload files by pressing the “Upload” button on the Add resource page.

[Image: ckan-upload1]

Once you have finished creating a dataset and uploading a resource, go back to the main CKAN page. CKAN acts as a large data hub, and part of that job is to make the data entrusted to it searchable and accessible. You can search the data in the portal by going to the front page and typing in the search box.

Programmatic Access

We have just performed a search, but how would we do the same search programmatically? That is easy: open your web browser and use the API to perform the same operation in a machine-readable way. You may want to install the JSONViewer Chrome plugin to make the raw data output a bit easier on the eye. The difference it makes is shown in the images below.

[Image: ckan-jsonsafari]

[Image: ckan-jsonchrome]

The web link you need to type into the browser to search for a dataset is:

Here the search string is test. Any datasets that match the search are included in the results, along with any data and resources attached to them. This is our first use of the CKAN API, which is fully documented here:

http://docs.ckan.org/en/ckan-2.2/api.html
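As an illustration of what comes back from a search, the sketch below walks through a response and lists each dataset with its resources. It assumes Python 3. The response text is a shortened, invented sample in the general shape used by the CKAN action API (a success flag and a result containing the matching datasets); the ID, titles and example.org address are placeholders, and a real response carries many more fields.

```python
import json

# A shortened, made-up response: a "success" flag plus a "result" holding
# a "count" and the list of matching datasets under "results".
SAMPLE_RESPONSE = """
{
  "success": true,
  "result": {
    "count": 1,
    "results": [
      {
        "id": "0000-example-id",
        "name": "test",
        "title": "Test dataset",
        "resources": [
          {"name": "readings.csv", "format": "CSV",
           "url": "http://example.org/readings.csv"}
        ]
      }
    ]
  }
}
"""


def summarise(response_text):
    """Return one (title, [resource names]) pair per dataset in the response."""
    response = json.loads(response_text)
    if not response.get("success"):
        raise ValueError("CKAN reported that the request failed")
    summary = []
    for dataset in response["result"]["results"]:
        resource_names = [
            resource.get("name") for resource in dataset.get("resources", [])
        ]
        summary.append((dataset["title"], resource_names))
    return summary


for title, resource_names in summarise(SAMPLE_RESPONSE):
    print(title, resource_names)
```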

Now that we know how to access the CKAN API, we can use this knowledge to access data programmatically. We will show you a quick, simple Python program which accesses the CKAN API and returns some information.

Python is probably the simplest useful programming language you can use. Although daunting at first, with a bit of practice Python will become a very powerful tool which can be used for many things. If you are on Mac or Linux, Python should already be installed; if you are on Windows, you will need to install it. To get the package for Windows download it here:

Alternatively, you can use Python online to test the following code. It can be accessed here:

Once you have Python, we will make a Python script (or run the code interactively). This is described in section 2.5, “Producing our first source file!”, of the “Introduction to Programming Workshop”, which is included in the appendices of this document.

I will search for a “test” dataset and return its ID number. The program is as follows:
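A minimal sketch of such a program, assuming Python 3 and the standard library only. The address in CKAN_URL is a placeholder to be replaced with the address of your own portal, and the function names are illustrative. It uses the package_search action of the CKAN API, with the search string passed as the q parameter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: a placeholder address; replace it with your own CKAN portal.
CKAN_URL = "http://localhost"


def build_search_url(base_url, query):
    """Build the package_search address for a search string."""
    return (
        base_url.rstrip("/")
        + "/api/3/action/package_search?"
        + urlencode({"q": query})
    )


def extract_ids(response):
    """Return the ID of every dataset in a decoded package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN reported that the search failed")
    return [dataset["id"] for dataset in response["result"]["results"]]


def search_dataset_ids(base_url, query):
    """Ask the portal for datasets matching the query and return their IDs."""
    with urlopen(build_search_url(base_url, query)) as reply:
        response = json.loads(reply.read().decode("utf-8"))
    return extract_ids(response)


# Example (needs a running portal at CKAN_URL):
#     print(search_dataset_ids(CKAN_URL, "test"))
```

Keeping the URL building and the response handling in separate small functions means each can be tried on its own, without a portal running.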

If you seriously want to get into using the CKAN API, we recommend the requests Python package, as it makes accessing URLs much easier. Documentation for requests can be found here:

The same code written with requests will often be shorter. For this example it is similar in length, but for larger programs the benefits of using requests will become obvious.
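As a rough sketch of a search for dataset IDs written with requests, assuming Python 3 and that the requests package has been installed. The function name and the optional http argument are illustrative choices rather than part of CKAN or requests; the argument only exists so that the function can be tried without a live portal.

```python
def search_dataset_ids(base_url, query, http=None):
    """Return the IDs of datasets matching the query.

    `http` is any object with a requests-style get() method. It defaults
    to the requests module itself.
    """
    if http is None:
        import requests  # third-party package: pip install requests
        http = requests
    reply = http.get(
        base_url.rstrip("/") + "/api/3/action/package_search",
        params={"q": query},  # requests builds the ?q=... part for us
        timeout=10,
    )
    reply.raise_for_status()  # turn HTTP error codes into exceptions
    return [dataset["id"] for dataset in reply.json()["result"]["results"]]


# Example (needs a running portal; the address is a placeholder):
#     print(search_dataset_ids("http://localhost", "test"))
```

Here requests takes care of encoding the query string, checking the HTTP status and decoding the JSON, which is where the savings come from in larger programs.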

For a quick introduction to Python, see my other document, available here:
https://github.com/thomasmortensson/bristolraspberrypipythontutorial/raw/master/TeachingGuide.pdf
I have also made this available in the appendices at the end of this document.