A group of hackers, programmers, researchers and policy makers met on the premises of the Lower House of the Dutch Parliament on September 8, 2012 to try to hack the parliamentary database. They were there at the invitation of the outgoing Speaker of the Lower House, Mrs Gerdi Verbeet. The occasion for the get-together was that the parliamentary database, Parlis, is now being made available as open data. Open data is data that is freely available and can be used by everyone: there is no fee and no copyright restriction, it is easy to find, and it is provided in a machine-readable format.

Mieke van Heesewijk and Josien Pieterse are the joint founders and directors of the non-profit foundation Netwerk Democratie, a platform for democratic innovation. In cooperation with two other organizations, Hack de Overheid (“hack the government”) and Open State, Netwerk Democratie organized the Apps for Democracy event. Mark Bastiaans, a researcher at the Dutch scientific research organization TNO, and a team of colleagues developed an application based on the parliamentary database. I talked with them about open data and how it helps to shape democracy in the twenty-first century.

The term hacking is used in this article in its original sense, which is “using innovation to create new applications for an existing system”. The connotation of unauthorized penetration into systems, which later came to be associated with the word, is not intended here.

As old as the Internet

Tessel: Is open data something new?

Mieke: Linking datasets is as old as the Internet, but the topic that’s getting a lot of attention right now is open data and democracy. For a good while already, various programmers, universities, the TNO and activists have been advocating making public data open. The nice thing now is that we see growing support for open data in a broader circle, including government bodies and companies.



Josien: This is about public data: data that is generated using public funds or data that concerns the public. The idea is that this data belongs to everybody because we have all contributed to collecting it. The data becomes open when government bodies or companies put it in an accessible location in a usable format. What’s especially interesting is that this allows new combinations to be generated from the data streams, which leads to the creation of new knowledge and the discovery of new relationships.

Opening up the parliamentary database

Josien: Opening up Parlis, the parliamentary database, is a good example. It contains information of interest to citizens so they can keep track of the democratic process, such as the voting records of the parties, parliamentary motions and parliamentary questions. This information was already public, but it was released in PDF format. That’s of little use to researchers, since it takes a lot of effort to access the information. Making the data available in machine-readable form allows applications to be developed.

At the invitation of Gerdi Verbeet, the former Speaker of the Lower House, Netwerk Democratie organized the Apps for Democracy event on September 8, 2012, in cooperation with other organizations. The event was held in the Parliament Building and consisted of a hackathon where programmers built applications to access the Parlis database. There were also workshops on open data.

Mieke: A hackathon right in the Parliament Building is a world first. It’s really nice to see programmers sitting in the Lower House hacking away. It also shows the nerve of the Lower House in opening their doors to hackers. Hackers are actually treated very poorly in the Netherlands, and you can see this fear of hackers with some of the political parties in the Lower House. They seem to feel that you should have as little to do with them as possible. This despite the fact that ethical hackers can be of tremendous benefit for transparency and security, particularly in this phase of democracy. The hackathon in the Lower House marked a turning point.

Hackathon

Mark: Together with a team of TNO employees, we looked at ways to visualize data from Parlis. We built an application for this during the hackathon. With our tool, you can arrange the members of Parliament in the Lower House in various groups based on specific parameters. For example, you can sort them by experience or by the number of submitted and approved motions. That way you can see at a glance which member of the House has submitted the most motions.

We added a data source of our own, where you can see which words are mentioned frequently in the media in connection with a particular member of Parliament. That’s the nice thing about open data – that you can combine information from different sources.
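The kind of sorting and combining Mark describes can be sketched in a few lines of Python. This is only an illustration, not the TNO application: the member names, the field names (`years_in_house`, `motions_submitted`, `motions_approved`) and all the numbers are invented, and the helper functions `rank_by` and `combine` are hypothetical.

```python
# Hypothetical sample data: names, fields and numbers are invented for
# illustration and are not taken from Parlis or from the TNO tool.
members = [
    {"name": "Member A", "years_in_house": 10, "motions_submitted": 12, "motions_approved": 5},
    {"name": "Member B", "years_in_house": 2, "motions_submitted": 20, "motions_approved": 3},
    {"name": "Member C", "years_in_house": 6, "motions_submitted": 7, "motions_approved": 7},
]

# A second, independent source (also invented): words that often appear in
# the media together with a member's name.
media_words = {
    "Member A": ["budget", "healthcare"],
    "Member B": ["privacy"],
}


def rank_by(records, field):
    """Return the records sorted from the highest to the lowest value of `field`."""
    return sorted(records, key=lambda record: record[field], reverse=True)


def combine(records, words_by_name):
    """Attach the media words to each member record, matching on the name."""
    return [
        {**record, "media_words": words_by_name.get(record["name"], [])}
        for record in records
    ]


combined = combine(members, media_words)
most_motions = rank_by(combined, "motions_submitted")[0]
most_experienced = rank_by(combined, "years_in_house")[0]

print(most_motions["name"], most_motions["media_words"])
print(most_experienced["name"])
```

Matching on a plain name is the weakest part of this sketch. Real datasets are easier to combine when both sources share a stable identifier for each person.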

Tessel: What requirements does open data have to meet so that developers can use it?

Mark: That depends on the type of developer. Tim Berners-Lee, the inventor of the World Wide Web, devised a five-star scale for open data. A single star is awarded to data with properties that meet the minimum requirements for open data: unstructured data that is published under an open license. Five stars are awarded to data that is annotated in a manner that allows it to be linked semantically to other datasets. This is called “linked open data”. Data with five stars is better for a developer than data with just one star.
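As a rough illustration of how the scale builds up, the sketch below rates a dataset by counting how many consecutive levels it satisfies. The interview only describes the first and the fifth star. The three levels in between (structured data, a non-proprietary format, URIs as identifiers) follow the commonly cited summary of the scheme. The `Dataset` class and the `stars` function are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Properties of a published dataset (hypothetical model for this sketch)."""
    open_license: bool   # 1 star: published under an open license, any format
    structured: bool     # 2 stars: machine-readable structure, e.g. a spreadsheet
    open_format: bool    # 3 stars: non-proprietary format, e.g. CSV
    uses_uris: bool      # 4 stars: things are identified by URIs
    linked: bool         # 5 stars: linked to other datasets


def stars(dataset):
    """Count the levels met in order; each star requires all the previous ones."""
    levels = (
        dataset.open_license,
        dataset.structured,
        dataset.open_format,
        dataset.uses_uris,
        dataset.linked,
    )
    count = 0
    for met in levels:
        if not met:
            break
        count += 1
    return count


scanned_pdf = Dataset(True, False, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_open_data = Dataset(True, True, True, True, True)

print(stars(scanned_pdf), stars(csv_file), stars(linked_open_data))
```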

In theory, developers can extract information from unstructured data by using automatic analysis methods, but that is rather complicated and not very exact. A scanned PDF document is an example of unstructured data. It is actually an image, so you have to use optical character recognition (OCR) to extract the individual letters. Then you have to use text analysis to transform the letters into structured text. People have been working on OCR for a long time, and open source programs for OCR are available, but for the average developer it is easier if you have data that is directly machine readable, such as an Excel worksheet or a CSV file.
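To show why directly machine-readable data is easier to work with, here is a minimal sketch that reads a small CSV text with Python's standard `csv` module. No OCR or text analysis is involved: the columns arrive already separated and named. The column names and values are invented for illustration and do not reflect the actual Parlis export.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be read from a file.
SAMPLE = """member,party,motions_submitted
Member A,Party X,12
Member B,Party Y,20
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = load_rows(SAMPLE)

# Values arrive as strings, so numeric columns still need an explicit conversion.
total = sum(int(row["motions_submitted"]) for row in rows)

print(rows[0]["member"], total)
```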

For me as a developer, it is also important that the meaning of the data is clear and that the semantics and relationships in the data are well defined. For building our application, we received a dump of several tables in the Parlis database. It is a relational database, which means that a column in one table is related to a column in another table. It takes a while to find out what the relationships are and what they actually mean. Fortunately, documentation about the exact relationships was provided with the data, and the organization had also generated a data model, but it still took a lot of detective work. This means that data owners must also supply metadata with the actual data, as otherwise developers won’t use it and will most likely find some other dataset.
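The relationships Mark mentions can be illustrated with a small in-memory SQLite database. The schema below is an assumption made up for this sketch, not the real Parlis data model: a `motion` table whose `submitter_id` column refers to the `id` column of a `member` table. Once that relationship is known, a single join answers a question such as how many motions each member submitted.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE member (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE motion (
        id           INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        submitter_id INTEGER NOT NULL REFERENCES member(id),
        approved     INTEGER NOT NULL  -- 1 if approved, 0 if not
    );
    INSERT INTO member VALUES (1, 'Member A'), (2, 'Member B');
    INSERT INTO motion VALUES
        (1, 'Motion 1', 1, 1),
        (2, 'Motion 2', 2, 0),
        (3, 'Motion 3', 2, 1);
    """
)

# The join follows the documented relationship motion.submitter_id -> member.id.
rows = conn.execute(
    """
    SELECT member.name,
           COUNT(motion.id)     AS submitted,
           SUM(motion.approved) AS approved
    FROM member
    JOIN motion ON motion.submitter_id = member.id
    GROUP BY member.id, member.name
    ORDER BY submitted DESC
    """
).fetchall()

for name, submitted, approved in rows:
    print(name, submitted, approved)
```

Without the metadata that says `submitter_id` points at `member.id`, a developer would have to guess that relationship from the column names and values, which is the detective work described above.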

Privacy and corrupted data

Tessel: Although many people have been advocating open data for a long time, government bodies were initially opposed to the idea. Now we’ve reached a turning point. Why is that?

Josien: It’s naturally a difficult process. Lots of programmers say, “Hey, just open it up and see what happens”. However, that's not how it works with the Lower House. They are very cautious about making things open.

Mieke: The Lower House is rightfully concerned about ensuring that only clean data is made open. That is the difficulty for institutions. Their position is that you can't put corrupted datasets online. They first want to get their internal information management in order, but even the best organizations can’t do that.

Josien: Another reason is privacy. Letters are very interesting. Among other things, they let you see which organizations are doing lobbying, and that reveals power structures. However, this is only possible if it is clear who sent the letter, and that falls under privacy legislation. Consequently, this information is not available now, which is too bad because it is naturally very interesting information.

Mieke: Privacy and data corruption are often given as reasons, and they are legitimate concerns.

Changing circumstances

Mieke: The main reason that the idea finally got off the ground in the Netherlands is that the Ministry of Economic Affairs put their weight behind it. They realized that it’s possible to make money with open data. Interesting new applications can be built by linking datasets with each other, and that is good for technological innovation and a knowledge-based economy. The parliamentary data is now hitching along for the ride.

You can see a change in attitude in the government. The government has less and less money available. That’s why they need more and more help from citizens, entrepreneurs and all sorts of activists to get things done. They are looking for more collaboration with the social sector in order to achieve innovation. More and more government bodies are becoming aware of this.

Ten years ago, a lot of data that was collected with public funds was farmed out to companies. Now the government realizes that there are problems because this caused the information to be cut off from people who actually have a right to it. An example of this is postal code data. When the Dutch post office was privatized to form what is now TNT, it was given the right to manage the postal codes as a sort of dowry. However, there are lots of applications on the Internet that work with postal codes. Developers had to pay license fees to TNT in order to use up-to-date postal code data. The question is: who owns that data? After ten years of lobbying by a group called “Free the Postal Code”, in 2012 the postal codes were finally released as open data.

Josien: There was also another thing that happened: Gerdi Verbeet actually changed her position. She considers it important that citizens understand what happens in the political sector. At first her view was fairly traditional, but as a result of discussions with people who have open data high on their agenda, she came to realize that a transparent democracy also offers more opportunities for citizens to get involved in what the government does. She gradually came to see that making information open can play a part in this. That’s why she started promoting it in the political world.

Tessel: Is the government on side now?

Mark: European legislation and regulations are being prepared to encourage governments to make data open. The details of the implementation are the only thing still being negotiated in Brussels. One thing is clear: things are heading in that direction top-down, but the material has not yet trickled down the chain. You see a lot of initiatives. An incredible number of hackers, enthusiasts and technically adept people are keen to get started. Things are moving at the top policy level and at the grass roots level, but we're still waiting for action at the intermediate level. This sort of change has to take place in the social realm, and that will take a bit longer than just banging a bit of code together and hacking a few systems.

This article first appeared in the March issue of Elektor

Image: Cornell.edu