I’m reading Becky Hogge’s Open Data Study, a review of open data and government transparency efforts.
First, in many cases, the data does not exist in any electronic form, and in some cases record-keeping systems may not exist at all. Nathaniel Heller:
My simplest example for this would be years ago, talking with the government in Senegal and trying to plan an intervention based on electronic property records and…the Senegalese government was at first very enthusiastic. And then we started talking about the physical challenge of it and what we ended up discovering is that before we built an electronic property records system we actually had to build a property records system. It wasn’t clear that data existed in paper form and that to build that sort of government data transparency system we needed, in many cases we would have to do the basic data collection.
From Ethan Zuckerman:
I think it would be great to start mapping what datasets exist within governments, but I’m going to stand by my skepticism: I think a lot of the data that you want…it’s not clear that those records are getting digitized or digitized in any meaningful way.
In some cases the data does exist, but requires a lot of work to put into a form that you can distribute on the Web. I really admire Mzalendo, a site where activists in Kenya try to get their hands on any government records they can find, cutting and pasting and even retyping. Here’s Ory Okolloh:
All the work we do is manual, so we have to literally cut and paste information if we can find it. It’s gotten a lot better from when we started. Now things like the Hansard [transcripts of parliamentary debates] are on the website pretty much in soft copy and up to date. So it’s improved but it’s still either in a PDF or Word document that we can’t crawl or extract information from.
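To make the "cut and paste" problem concrete, here is a minimal Python sketch of the step that comes after you have somehow gotten plain text out of a PDF or Word file. It assumes a made-up transcript layout in which each speaker turn starts with "Name: "; the sample text, the `SPEAKER_LINE` pattern, and the function names are all hypothetical and are not Mzalendo's actual process or the real Hansard format.

```python
import csv
import io
import re

# Assumption: each speaker turn begins with "Name: " at the start of a line.
# Real transcripts will need a pattern tuned to their own layout.
SPEAKER_LINE = re.compile(r"^(?P<speaker>[A-Z][A-Za-z.' -]{1,60}):\s*(?P<text>.*)$")


def parse_turns(transcript):
    """Split plain transcript text into (speaker, speech) pairs."""
    turns = []
    for raw in transcript.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            # A new speaker turn starts here.
            turns.append([match.group("speaker"), match.group("text")])
        elif turns:
            # Continuation line: append it to the current speaker's speech.
            turns[-1][1] = (turns[-1][1] + " " + line).strip()
    return [tuple(turn) for turn in turns]


def turns_to_csv(turns):
    """Serialize the turns as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["speaker", "speech"])
    writer.writerows(turns)
    return buffer.getvalue()


# Invented sample text, for illustration only.
SAMPLE = """\
Mr. Speaker: Order! The House will come to order.
Hon. Member A: I rise to ask about the road budget.
The figures have not been published.
Mr. Speaker: The Minister may respond.
"""

if __name__ == "__main__":
    print(turns_to_csv(parse_turns(SAMPLE)))
```

The hard part Okolloh describes, getting clean text out of the PDF or Word file in the first place, is deliberately left out here; this only shows why machine-readable text matters once you have it.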
[I’m reading a lot of papers and reports on open data and civic data. You can check out my reading list here. Civic Data/Open Data Reading List — LW]
The PDFs. OMFG. The PDFs.

These recipes may be most helpful to journalists who are trying to learn programming and already know the basics. If you’re already an experienced programmer, you might learn about a new library or tool you haven’t tried yet.
(via lifeandcode)
If you’re a fellow data mapper, here’s Data Roller, which converts shapefiles (.shp) to .csv format [h/t @datenjournalist]
Paul Bradshaw, ‘Two reasons why every journalist should know about scraping’
The BBC College of Journalism’s Jonathan Stoneman, David Donald of the Center for Public Integrity, the New York Times’ Aron Pilhofer, Birmingham City University’s Paul Bradshaw, and the BBC’s Martin Rosenbaum talk data journalism.
More videos from the BBC Academy College of Journalism.

Sandra Crucianelli, Data journalism: The good, the bad and the ugly

Barrett Sheridan, Is Cue the Cure for Information Overload?
The Copy Editor asks: That’s for the average person. Any ballpark figure on the quantity of information processed by journalists every day?
Interactive designer Julian Koschwitz’s art installation on journalists killed worldwide
Vivek Kundra’s Harvard study on the potential of open data and data journalism in a democracy:
In today’s world, open data leveraged by networks is the fuel that powers important decisions at each level of society—from government, to business, to community, to households—but it is also a product of our every activity at every level of our existence.
Channeling the power of this open data and the network effect can help:
- Fight government corruption, improve accountability and enhance government services
- Change the default setting of government to open, transparent and participatory
- Create new models of journalism to separate signal from noise to provide meaningful insights
- Launch multi-billion dollar businesses based on public sector data
This is the presentation I delivered Wednesday at a Center for Media Freedom and Responsibility lecture-workshop at the Asian Institute of Management in Makati.
I may have turned Prof. Luis Teodoro and ma’am Melinda Quintos de Jesus into fans of data visualization.
