February 14 2012

The bond between data and journalism grows stronger

While reporters and editors have been the traditional vectors for information gathering and dissemination, the flattened information environment of 2012 now has news breaking first online, not on the newsdesk.

That doesn't mean that the integrated media organizations of today don't play a crucial role. Far from it. In the information age, journalists are needed more than ever to curate, verify, analyze and synthesize the wash of data.

To learn more about the shifting world of data journalism, I interviewed Liliana Bounegru (@bb_liliana), project coordinator of SYNC3 and Data Driven Journalism at the European Journalism Centre.

What's the difference between the data journalism of today and the computer-assisted reporting (CAR) of the past?

Liliana Bounegru: There is a "continuity and change" debate going on around the label "data journalism" and its relationship with previous journalistic practices that employ computational techniques to analyze datasets.

Some argue [PDF] that there is a difference between CAR and data journalism. They say that CAR is a technique for gathering and analyzing data as a way of enhancing (usually investigative) reportage, whereas data journalism pays attention to the way that data sits within the whole journalistic workflow. In this sense, data journalism pays equal attention to finding stories and to the data itself. Hence, we find the Guardian Datablog or the Texas Tribune publishing datasets alongside stories, or even just datasets by themselves for people to analyze and explore.

Another difference is that in the past, investigative reporters would suffer from a poverty of information relating to a question they were trying to answer or an issue that they were trying to address. While this is, of course, still the case, there is also an overwhelming abundance of information that journalists don't necessarily know what to do with. They don't know how to get value out of data. As Philip Meyer recently wrote to me: "When information was scarce, most of our efforts were devoted to hunting and gathering. Now that information is abundant, processing is more important."

On the other hand, some argue that there is no difference between data journalism and computer-assisted reporting. It is by now common sense that even the most recent media practices have histories as well as something new in them. Rather than debating whether or not data journalism is completely novel, a more fruitful position would be to consider it as part of a longer tradition but responding to new circumstances and conditions. Even if there might not be a difference in goals and techniques, the emergence of the label "data journalism" at the beginning of the century indicates a new phase wherein the sheer volume of data that is freely available online combined with sophisticated user-centric tools enables more people to work with more data more easily than ever before. Data journalism is about mass data literacy.

What does data journalism mean for the future of journalism? Are there new business models here?

Liliana Bounegru: There are all kinds of interesting new business models emerging with data journalism. Media companies are becoming increasingly innovative with the way they produce revenues, moving away from subscription-based models and advertising to offering consultancy services, as in the case of the German award-winning OpenDataCity.

Digital technologies and the web are fundamentally changing the way we do journalism. Data journalism is one part in the ecosystem of tools and practices that have sprung up around data sites and services. Quoting and sharing source materials (structured data) is in the nature of the hyperlink structure of the web and in the way we are accustomed to navigating information today. By enabling anyone to drill down into data sources and find information that is relevant to them as individuals or to their community, as well as to do fact checking, data journalism provides a much needed service coming from a trustworthy source. Quoting and linking to data sources is specific to data journalism at the moment, but seamless integration of data in the fabric of media is increasingly the direction journalism is going in the future. As Tim Berners-Lee says, "data-driven journalism is the future".

What data-driven journalism initiatives have caught your attention?

Liliana Bounegru: The data journalism project is one of my favorites. It addresses a real problem: The European Union (EU) is spending 48% of its budget on agriculture subsidies, yet the money doesn't reach those who need it.

Tracking payments and recipients of agriculture subsidies from the European Union to all member states is a difficult task. The data is scattered in different places in different formats, with some missing and some scanned in from paper records. It is hard to piece it together to form a comprehensive picture of how funds are distributed. The project not only made the data available to anyone in an easy to understand way, but it also advocated for policy changes and better transparency laws.

LRA Crisis Tracker

Another of my favorite examples is the LRA Crisis Tracker, a real-time crisis mapping platform and data collection system. The tracker makes information about the attacks and movements of the Lord's Resistance Army (LRA) in Africa publicly available. It helps to inform local communities, as well as the organizations that support the affected communities, about the activities of the LRA through an early-warning radio network in order to reduce their response time to incidents.

I am also a big fan of much of the work done by the Guardian Datablog. You can find lots of other examples featured on, along with interviews, case studies and tutorials.

I've talked to people like Chicago Tribune news app developer Brian Boyer about the emerging "newsroom stack." What do you feel are the key tools of the data journalist?

Liliana Bounegru: Experienced data journalists list spreadsheets as a top data journalism tool. Open source tools and web-based applications for data cleaning, analysis and visualization play very important roles in finding and presenting data stories. I have been involved in organizing several workshops on ScraperWiki and Google Refine for data collection and analysis. We found that participants were quite able to quickly ask and answer new kinds of questions with these tools.

How does data journalism relate to open data and open government?

Liliana Bounegru: Open government data means that more people can access and reuse official information published by government bodies. This in itself is not enough. It is increasingly important that journalists can keep up and are equipped with skills and resources to understand open government data. Journalists need to know what official data means, what it says and what it leaves out. They need to know what kind of picture is being presented of an issue.

Public bodies are very experienced in presenting data to the public in support of official policies and practices. Journalists, however, will often not have this level of literacy. Only by equipping journalists with the skills to use data more effectively can we break the current asymmetry, where our understanding of the information that matters is mediated by governments, companies and other experts. In a nutshell, open data advocates push for more data, and data journalists help the public to use, explore and evaluate it.

This interview has been edited and condensed for clarity.

Photo on associated home and category pages: NYTimes: 365/360 - 1984 (in color) by blprnt_van, on Flickr.


April 27 2011

Linked data creates a new lens for examining the U.S. Civil War

Screenshot from the Civil War Data 150 project

April 2011 marks the 150th anniversary of the first hostilities of the U.S. Civil War, and museums, municipalities, historic sites, and schools are making their preparations for the events and exhibits to commemorate it. While, no doubt, times are tough for funding cultural heritage projects, there's a lot of excitement around the sesquicentennial, making it a great opportunity for those exploring how technology can make history more interactive.

It's also a great opportunity to pursue linked data efforts across these museums and historic sites, in turn making this historical information more discoverable and interoperable. That's what the Civil War Data 150 project is undertaking, and I asked two of the project organizers — Scott Nesbitt, Civil War historian and associate director of the Digital Scholarship Lab at the University of Richmond, and Jon Voss, founder of LookBackMaps — about how the Civil War anniversary will help boost linked data and digital history efforts.

What opportunities does the sesquicentennial provide for museums, historical sites, data geeks, and developers?

Scott Nesbitt: We're in a time of remarkable collaboration across institutional barriers. Just as an example, a wide array of institutions ranging from the Slave Trail Commission to the Museum of the Confederacy came together recently in Richmond, Virginia to commemorate Civil War and Emancipation Day. More than 3,000 visitors braved the rain to tour these sites and hear presentations about new discoveries and data sources that are still emerging. In the same way, building technical links to data currently held by institutions within a single community and across the country is really quite an opportunity.

Jon Voss: The cultural heritage community has been preparing for the sesquicentennial for several years — it's a huge opportunity to engage and educate new audiences about the Civil War. Many archives and libraries have been bringing their Civil War collections to the web in new ways, and we've seen a host of new digitization efforts as well. And while there are incredible curated exhibits and events across the country, an increasing number of institutions are recognizing the power of direct discovery and are making raw data and metadata open and accessible to developers and the general public. Combined, there's a myriad of possibilities to discover and analyze the Civil War in new ways during the four-year commemoration.

How does linked data benefit the study of Civil War history?

Jon Voss: Perhaps the most exciting possibility of applying linked open data to Civil War history is to connect information and images across many standalone databases and view them together in any number of applications. One element of this is discovery — finding images associated with one regiment in multiple institutions, for instance. But more important is the ability to combine that information in an entirely new way. Just the ability to search across historical collections is a radical development, as search engines typically aren't able to crawl databases. Part of what linked data does is expose metadata that's been pretty much hidden up until now.
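To make the idea of connecting standalone databases concrete, here is a minimal Python sketch of the discovery case Voss describes: finding images tied to one regiment across several institutions. It assumes each institution publishes records that identify the regiment with the same shared URI. The collection names, URIs, field names, and the helper `images_by_regiment` are all hypothetical, invented for illustration, and are not taken from the Civil War Data 150 project.

```python
from collections import defaultdict

# Hypothetical records from two institutions. The URIs, file names, and
# field names are invented for illustration only.
ARCHIVE_A = [
    {"regiment": "http://example.org/regiment/54th-massachusetts", "image": "a/001.jpg"},
    {"regiment": "http://example.org/regiment/20th-maine", "image": "a/002.jpg"},
]
MUSEUM_B = [
    {"regiment": "http://example.org/regiment/54th-massachusetts", "image": "b/17.jpg"},
]


def images_by_regiment(*collections):
    """Group image references from every collection under the shared regiment URI."""
    merged = defaultdict(list)
    for records in collections:
        for record in records:
            merged[record["regiment"]].append(record["image"])
    return dict(merged)


linked = images_by_regiment(ARCHIVE_A, MUSEUM_B)
print(linked["http://example.org/regiment/54th-massachusetts"])
# prints ['a/001.jpg', 'b/17.jpg']
```

The shared identifier does the work here: because both collections point at the same URI for the regiment, their records can be combined without either institution changing its own database.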

What are some of the new things we can learn thanks to this sort of approach?

Jon Voss: Already we're learning about the variety and sheer amount of data out there. More than 3 million Americans fought in the Civil War, and there is an enormous amount of paper left behind, including muster rolls, medical records, food shipments, pensions, photographs, correspondence, and first-hand accounts.

Scott Nesbitt: With new links between data sources, historians will be able to imagine new questions to ask that would have been discarded as nearly impossible to answer before. We don't know, for example, whether Union regiments made up of working-class men or farmers were more likely to go out of their way to set enslaved men and women free in the South, or whether units made up primarily of Republican or Democratic men were more likely to confiscate food from devastated southern farms. So the possibilities for historians are exciting.

What are the most interesting data projects you see happening in conjunction with the anniversary?

Scott Nesbitt: Linked data presents an exciting challenge: How do we begin to make sense of the patterns within large datasets and between disparate kinds of data? At the University of Richmond, we have been building "Hidden Patterns of the Civil War," a suite of projects devoted to exploring these possibilities.

"Hidden Patterns of the Civil War" collects a number of interrelated data projects.

Jon Voss: We've already seen some great data visualizations from media outlets like the History Channel, Washington Post, and The New York Times, and I expect to see a lot more of that as we go. But what's on the horizon are augmented reality and location-based apps that really bring the harsh reality of the Civil War to life — there are a few of these projects already in the works. There will also be lots of opportunities for people to transcribe and map documents and photos. That will help us make the links between disparate datasets.

What's really exciting is that all of the data we work with on this project will be permanently open and free to use.

Associated photo used on home and category pages: General John P. Hatch by The U.S. National Archives, on Flickr

This interview was edited and condensed.

