I’m reading Becky Hogge’s Open Data Study, a review of open data and government transparency efforts.
First, in many cases, the data does not exist in any electronic form, and in some cases record-keeping systems may not exist at all. Nathaniel Heller:
My simplest example for this would be years ago, talking with the government in Senegal and trying to plan an intervention based on electronic property records and…the Senegalese government was at first very enthusiastic. And then we started talking about the physical challenge of it and what we ended up discovering is that before we built an electronic property records system we actually had to build a property records system. It wasn’t clear that data existed in paper form and that to build that sort of government data transparency system we needed, in many cases we would have to do the basic data collection.
From Ethan Zuckerman:
I think it would be great to start mapping what datasets exist within governments, but I’m going to stand by my skepticism: I think a lot of the data that you want…it’s not clear that those records are getting digitized or digitized in any meaningful way.
In some cases the data does exist, but requires a lot of work to put into a form that you can distribute on the Web. I really admire Mzalendo, a site where activists in Kenya try to get their hands on any government records they can find, cutting and pasting and even retyping. Here’s Ory Okolloh:
All the work we do is manual, so we have to literally cut and paste information if we can find it. It’s gotten a lot better from when we started. Now things like the Hansard [transcripts of parliamentary debates] are on the website pretty much in soft copy and up to date. So it’s improved but it’s still either in a PDF or Word document that we can’t crawl or extract information from.
[I’m reading a lot of papers and reports on open data and civic data. You can check out my reading list here. Civic Data/Open Data Reading List — LW]
The PDFs. OMFG. The PDFs.
Paul Bradshaw, ‘Two reasons why every journalist should know about scraping’
Sandra Crucianelli, Data journalism: The good, the bad and the ugly
Barrett Sheridan, Is Cue the Cure for Information Overload?
Vivek Kundra’s Harvard study on the potential of open data and data journalism in a democracy:
In today’s world, open data leveraged by networks is the fuel that powers important decisions at each level of society—from government, to business, to community, to households—but it is also a product of our every activity at every level of our existence.
Channeling the power of this open data and the network effect can help:
- Fight government corruption, improve accountability and enhance government services
- Change the default setting of government to open, transparent and participatory
- Create new models of journalism to separate signal from noise to provide meaningful insights
- Launch multi-billion dollar businesses based on public sector data