
May 09 2012

Four short links: 9 May 2012

  1. We Need Version Control for Real Stuff (Chris Anderson) -- This is pointing us toward the next step, a GitHub for stuff. If open source hardware is going to take off like open source software, we need this. (via Evil Mad Scientist)
  2. Graduates and Post-Graduates on Food Stamps (Chronicle of Higher Education) -- two points for me here: the inherent evil of not paying a living wage; and the pain of market signals that particular occupations and specialisations are not as useful as once they were. I imagine it's hard to repurpose the specific knowledge in a Masters of Medieval History to some other field, though hopefully the skills of diligent hard work, rapid acquisition of knowledge, and critical thought will apply to new jobs. Expect more of this as we replace human labour with automation. I look forward to the software startup which creates work for people outside the organisation; the ultimate "create more value than you capture".
  3. Explore Exoplanets with Gestural Interfaces -- uses John Underkoffler's Oblong gestural interface. Underkoffler came up with the Minority Report interface which has fed the dreams of designers for years.
  4. Book Marketing Lessons Learned (Sarah Milstein) -- I really liked this honest appraisal of how Baratunde Thurston marketed his "How to be Black" book, and am doubly chuffed that it appeared on the O'Reilly Radar blog. I was fascinated by his Street Team, but knew I wanted to bring it to your attention when I read this. Start with your inner circle. I had an epiphany with Gary Vaynerchuk. I asked: "Did I ever ask you to buy my book?" He said, "Yeah, I bought it yesterday." I talked about his book, but cash on the table — it didn't happen. He wished he had identified everyone he knows, sending a personal note explaining: "A) buy the book; B) this means a lot to me. You owe me or I will owe you. Here's some things you can do to help: If you have speaking opportunities, let me know. For instance, I would love to speak at schools." Make it easy for people who want to help you. Everything else is bonus. If you haven't already converted the inner circle, you've skipped a critical step. "Let the people who already love you show it" is the skill I feel like I've spent years working on, and still have years to go.

March 08 2012

Profile of the Data Journalist: The Storyteller and The Teacher

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted in-person and email interviews during the 2012 NICAR Conference and published a series of data journalist profiles here at Radar.

Sarah Cohen (@sarahduke), the Knight professor of the practice of journalism and public policy at Duke University, and Anthony DeBarros (@AnthonyDB), the senior database editor at USA Today, were both important sources of historical perspective for my feature on how data journalism is evolving from "computer-assisted reporting" (CAR) to a powerful Web-enabled practice that uses cloud computing, machine learning and algorithms to make sense of unstructured data.

The latter halves of our interviews, which focused upon their personal and professional experience, follow.

What data journalism project are you the most proud of working on or creating?

DeBarros: "In 2006, my USA TODAY colleague Robert Davis and I built a database of 620 students killed on or near college campuses and mined it to show how freshmen were uniquely vulnerable. It was a heart-breaking but vitally important story to tell. We won the 2007 Missouri Lifestyle Journalism Awards for the piece, and followed it with an equally wrenching look at student deaths from fires."

Cohen: "I'd have to say the Pulitzer-winning series on child deaths in DC, in which we documented that children were dying in predictable circumstances after key mistakes by people who knew that their agencies had specific flaws that could let them fall through the cracks.

I liked working on the Post's POTUS Tracker and Head Count. Those were Web projects geared toward accumulating lots of little bits about Obama's schedule and his appointees, respectively, that we could share with our readers while simultaneously building an important dataset for use down the road. Some of the Post's Solyndra and related stories, I have heard, came partly from studying the president's trips in POTUS Tracker.

There was one story, called "Misplaced Trust," on DC's guardianship system, that created immediate change in Superior Court, which was gratifying. "Harvesting Cash," our 18-month project on farm subsidies, also helped point out important problems in that system.

The last one, I'll note, is a piece of a project I worked on, in which the DC water authority refused to release the results of a massive lead testing effort, which in turn had shown widespread contamination. We got the survey from a source, but it was on paper.

After scanning, parsing, and geocoding, we sent out a team of reporters to neighborhoods to spot check the data, and also do some reporting on the neighborhoods. We ended up with a story about people who didn't know what was near them.

We also had an interesting experience: the water authority called our editor to complain that we were going to put all of the addresses online -- they felt that it was violating people's privacy, even though we weren't identifying the owners or the residents. It was more important to them that we keep people in the dark about their blocks. Our editor at the time, Len Downie, said, "You're right. We shouldn't just put it on the Web." He also ordered up a special section to put them all in print."

Where do you turn to keep your skills updated or learn new things?

Cohen: "It's actually a little harder now that I'm out of the newsroom, surprisingly. Before, I would just dive into learning something when I'd heard it was possible and I wanted to use it to get to a story. Now I'm less driven, and I have to force myself a little more. I'm hoping to start doing more reporting again soon, and that the Reporters' Lab will help there too.

Lately, I've been spending more time with people from other disciplines to understand better what's possible, like machine learning and speech recognition at Carnegie Mellon and MIT, or natural language processing at Stanford. I can't DO them, but getting a chance to understand what's out there is useful. NewsFoo, SparkCamp and NICAR are the three places that had the best bang this year. I wish I could have gone to Strata, even if I didn't understand it all."

DeBarros: For surveillance, I follow really smart people on Twitter and have several key Google Reader subscriptions.

To learn, I spend a lot of time training after work hours. I've really been pushing myself in the last couple of years to up my game and stay relevant, particularly by learning Python, Linux and web development. Then I bring it back to the office and use it for web scraping and app building.
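The interview doesn't show DeBarros's actual scrapers, but the core of the web scraping he describes can be sketched with nothing beyond Python's standard library. The HTML snippet and link paths below are invented for illustration; in a real scraper the markup would come from `urllib.request.urlopen(url).read()`.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every anchor tag seen in the document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# A small inline snippet keeps the sketch self-contained.
sample = '<ul><li><a href="/story/1">One</a></li><li><a href="/story/2">Two</a></li></ul>'

parser = LinkExtractor()
parser.feed(sample)
print(parser.links)  # ['/story/1', '/story/2']
```

In practice most scrapers reach for a third-party parser, but the event-driven pattern is the same: walk the tags, keep the pieces you need.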

Why are data journalism and "news apps" important, in the context of the contemporary digital environment for information?

Cohen: "I think anything that gets more leverage out of fewer people is important in this age, because fewer people are working full time holding government accountable. The news apps help get more eyes on what the government is doing by sharing more of the material we work with and letting readers see it for themselves. I also think it helps with credibility -- the 'show your work' ethos -- because it forces newsrooms to be more transparent with readers / viewers.

For instance, now, when I'm judging an investigative prize, I am quite suspicious of any project that doesn't let you see each item, i.e., when they say, "there were 300 cases that followed this pattern," I want to see all 300 cases, or all cases with the 300 marked, so I can see whether I agree."

DeBarros: "They're important because we're living in a data-driven culture. A data-savvy journalist can use the Twitter API or a spreadsheet to find news as readily as he or she can use the telephone to call a source. Not only that, we serve many readers who are accustomed to dealing with data every day -- accountants, educators, researchers, marketers. If we're going to capture their attention, we need to speak the language of data with authority. And they are smart enough to know whether we've done our research correctly or not.

As for news apps, they're important because -- when done right -- they can make large amounts of data easily understood and relevant to each person using them."

These interviews were edited and condensed for clarity.


Profile of the Data Journalist: The Hacks Hacker

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted a series of email interviews during the 2012 NICAR Conference. This interview followed the conference and featured a remote participant who diligently used social media and the World Wide Web to document and share the best of NICAR:

Chrys Wu (@MacDiva) is a data journalist and user engagement strategist based in New York City. Our interview follows.

Where do you work now? What is a day in your life like?

I work with clients through my company, Matchstrike, which specializes in user engagement strategy. It's a combination of user experience research, design and program planning. Businesses turn to me to figure out how to keep people's attention, create community and tie that back to return on investment.

I also launch Hacks/Hackers chapters around the world and co-organize the group in New York with Al Shaw of ProPublica and Jacqui Cox of The New York Times.

Both things involve seeking out people and ideas, asking questions, reading, wireframing and understanding what motivates people as individuals and as groups.

How did you get started in data journalism? Did you get any special degrees or certificates?

I had a stats class in high school with a really terrific instructor who also happened to be the varsity basketball coach. He was kind of like our John Wooden. Realizing the importance of statistics, being able to organize and interpret data — and learning how to be skeptical of claims (e.g., where "4 out of 5 dentists agree" comes from) — has always stayed with me.

Other than that class and studying journalism at university, what I know has come from exploring (finding what's out there), doing (making something) and working (making something for money). I think that's pretty similar to most journalists and journalist-developers currently in the field.

Though I've spent several years in newsrooms (most notably with the Los Angeles Times and CBS Digital Media Group), most of my journalism and communications career has been as a freelancer. One of my earliest clients specialized in fundraising for Skid Row shelters. I quantified the need cases for her proposals. That involved working closely with the city health and child welfare departments and digging through a lot of data.

Once I figured that out, it was important to balance the data with narrative. Numbers and charts have a much more profound impact on people if they're framed by an idea to latch onto and a compelling story to share.

Did you have any mentors? Who? What were the most important resources they shared with you?

I don't have individual mentors, but there's an active community with a huge body of work out there to learn from. It's one of the reasons why I've been collecting things on Delicious and Pinboard, and it's why I try my best to put everything that's taught at NICAR on my blog.

I always try to look beyond journalism to see what people are thinking about and doing in other fields. Great ideas can come from everywhere. There are lots of very smart people willing to share what they know.

What does your personal data journalism "stack" look like? What tools could you not live without?

I use Coda and TextMate most often. For wireframing, I'm a big fan of OmniGraffle. I code in Ruby, and a little bit in Python. I'm starting to learn how to use R for dataset manipulation and for its maps library.

For keeping tabs on new but not urgent-to-read material, I use my friend Samuel Clay's RSS reader, Newsblur.

What data journalism project are you the most proud of working on or creating?

I'm most proud of working with the Hacks/Hackers community. Since 2009, we've grown to more than 40 groups worldwide, with each locality bringing journalists, designers and developers together to push what's possible for news.

As I say, talking is good; making is better — and the individual Hacks/Hackers chapters have all done some version of that: presentations, demos, classes and hack days. They're all opportunities to share knowledge, make friends and create new things that help people better understand what's happening around them.

Where do you turn to keep your skills updated or learn new things?

MIT's open courses have been great. There are also blogs, mailing lists, meetups, lectures and conferences. And then there's talking with friends and people they know.

Why are data journalism and "news apps" important, in the context of the contemporary digital environment for information?

I like Amanda Cox's view of the importance of reporting through data. She's a New York Times graphics editor who comes from a statistics background. To paraphrase: Presenting a pile of facts and numbers without directing people toward any avenue of understanding is not useful.

Journalism is fundamentally about fact-finding and opening eyes. One of the best ways to do that, especially when lots of people are affected by something, is to interweave narrative with quantifiable information.

Data journalism and news apps create the lens that shows people the big picture they couldn't see but maybe had a hunch about otherwise. That's important for a greater understanding of the things that matter to us as individuals and as a society.

This interview has been edited and condensed for clarity.

March 06 2012

Profile of the Data Journalist: The Daily Visualizer

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted a series of email interviews during the 2012 NICAR Conference.

Matt Stiles (@stiles), a data journalist based in Washington, D.C., maintains a popular Daily Visualization blog. Our interview follows.

Where do you work now? What is a day in your life like?

I work at NPR, where I oversee data journalism on the StateImpact project, a local-national partnership between us and member stations. My typical day always begins with a morning "scrum" meeting among the D.C. team as part of our agile development process. I spend time acquiring and analyzing data throughout each day, and I typically work directly with reporters, training them on software and data visualization techniques. I also spend time planning news apps and interactives, a process that requires close consultation with reporters, designers and developers.

How did you get started in data journalism? Did you get any special degrees or certificates?

No special training or certificates, though I did attend three NICAR boot camps (databases, mapping, statistics) over the years.

Did you have any mentors? Who? What were the most important resources they shared with you?

I have several mentors, both on the reporting side and the data side. For data, I wouldn't be where I am today without the help of two people: Chase Davis and Jennifer LaFleur. Jen got me interested early, and has helped me with formal and informal training over the years. Chase helped me with day-to-day questions when we worked together at the Houston Chronicle.

What does your personal data journalism "stack" look like? What tools could you not live without?

I have a MacBook that runs Windows 7. I have the basic CAR suite (Excel/Access, ArcGIS, SPSS, etc.) but also plenty of open-source tools, such as R for visualization or MySQL/Postgres for databases. I use Coda and TextMate for coding. I use BBEdit and Python for text manipulation. I also couldn't live without Photoshop and Illustrator for cleaning up graphics.
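Stiles doesn't show his scripts, but the kind of text manipulation he mentions is often a few lines of Python with the `re` module. As a sketch, here is a typical newsroom chore — normalizing inconsistently formatted names exported from an agency database (the sample rows are invented):

```python
import re

# Messy "LAST, FIRST" strings as they might arrive from an export.
raw_rows = ["  SMITH,   JOHN ", "doe,jane", "ROE ,  RICHARD"]

def clean_name(row):
    # Split on the comma, tolerating any surrounding whitespace,
    # then normalize casing and reorder as "First Last".
    last, first = re.split(r"\s*,\s*", row.strip(), maxsplit=1)
    return f"{first.strip().title()} {last.strip().title()}"

cleaned = [clean_name(r) for r in raw_rows]
print(cleaned)  # ['John Smith', 'Jane Doe', 'Richard Roe']
```

Chores like this are tedious and error-prone by hand but trivially repeatable in a script, which is why text manipulation shows up in almost every CAR toolkit.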

What data journalism project are you the most proud of working on or creating?

I'm most proud of the online data library I created (and others have since expanded) at The Texas Tribune, but we're building some sweet apps at NPR. That's only going to expand now that we've created a national news apps team, which I'm joining soon.

Where do you turn to keep your skills updated or learn new things?

I read blogs, subscribe to email lists and attend lots of conferences for inspiration. There's no silver bullet. If you love this stuff, you'll keep up.

Why are data journalism and "news apps" important, in the context of the contemporary digital environment for information?

More and more information is coming at us every day. The deluge is so vast. Data journalism at its core is important because it's about facts, not anecdotes.

Apps are important because Americans are already savvy data consumers, even if they don't know it. We must get them thinking -- or, even better, not thinking -- about news consumption in the same way they think about syncing their iPads or booking flights on Priceline or purchasing items on eBay. These are all "apps" that are familiar to many people. Interactive news should be, too.

This interview has been edited and condensed for clarity.

March 02 2012

Profile of the Data Journalist: The Visualizer

Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.

To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted a series of email interviews during the 2012 NICAR Conference.

Michelle Minkoff (@MichelleMinkoff) is an investigative developer/journalist based in Washington, D.C. Our interview follows.

Where do you work now? What is a day in your life like?

I am an Interactive Producer at the Associated Press' Washington DC bureau, where I focus on news applications related to politics and the election, as well as general mapping for our interactives on the Web. While my days pretty much always involve sitting in front of a computer, the actual tasks themselves can vary wildly. I may be chatting with reporters and editors in politics, environment, education, national security or myriad other beats about upcoming stories and how to use data to support reporting or create interactive stories. I might be gathering data, reformatting it or crafting Web applications. I spend a great deal of time creating interactive mapping systems, working a lot with geographic data, and collaborating with cartographers, editors and designers to decide how to best display it.

I split my time between working closely with my colleagues in the Washington bureau on the reporting/editing side, and my fellow interactive team members, only one of whom is also in DC. Our team is global, headquartered in New York, but with members spanning the globe from Phoenix to Bangkok.

It's a question of striking a balance between what needs to be done on daily deadlines for breaking news, longer-term stories that are often investigative, and creating frameworks that help The Associated Press make the most of the Web's interactive nature in the long run.

How did you get started in data journalism? Did you get any special degrees or certificates?

I caught the bug when I took a computer-assisted reporting class from Derek Willis, a member of the New York Times' Interactive News Team, at Northwestern's journalism school where I was a grad student. I was fascinated by the role that technology could play in journalism for reporting and presentation, and very quickly got hooked. I also quickly discovered that I could lose track of hours playing with these tools, and that what came naturally to me was not as natural to others. I would spend days reporting for class, on and off Capitol Hill, and nights exchanging gchats with Derek and other data journalists he introduced me to. I started to understand SQL, advanced Excel, and fairly quickly thereafter, Python and Django.

I followed this up with an independent study in data visualization back at Medill's Chicago campus, under Rich Gordon. I practiced making Django apps, played with the Processing visualization language. I voraciously read through all the Tufte books. As a final project, I created a package about the persistence of Chicago art galleries that encompasses text, Flash visualization and a searchable database.

I have a concentration in Interactive Journalism as part of my Medill master's degree, but the courses mentioned above are only a partial component of that concentration.

Did you have any mentors? Who? What were the most important resources they shared with you?

The question here is in the wrong tense. I currently "do" have many mentors, and I don't know how I would do my job without what they've shared in the past, and in the present. Derek, mentioned above, was the first. He introduced me to his friend Matt [Waite], and then he told me there was a whole group of people doing this work at NICAR. Literally hundreds of people from that organization have helped me at various places on my journey, and I believe strongly in the mantra of "paying it forward" as they have -- no one can know it all, so we pass on what we've learned, so more people can do even better work.

Other key folks I've had the privilege to work with include all of the Los Angeles Times' Data Desk's members, which includes reporters, editors and Web developers. I worked most closely with Ben Welsh and Ken Schwencke, who answered many questions, and were extremely encouraging when I was at the very beginning of my journey.

At my current job at The Associated Press, I'm lucky to have teammates who mentor me in design, mapping and various Washington-based beats. Each is helpful in his or her own way.

Special attention deserves to be called to Jonathan Stray, who's my official boss, but also a fantastic mentor who enables me to do what I do. He's helping me to learn the appropriate technical skills to execute what I see in my head, as well as learn how to learn. He's not just teaching me the answers to the problems we encounter in our daily work, but also helping me learn how to better solve them, and work this whole "thing I do" into a sustainable career path. And all with more patience than I have for myself.

What does your personal data journalism "stack" look like? What tools could you not live without?

No matter how advanced our tools get, I always find myself coming back to Excel first to do simple work. It helps us get an overall handle on a data set. I also will often quickly bring data into SQLite, using a Firefox extension that lets a user run SQL queries with no database setup. I'm more comfortable asking complicated questions of data that way. I also like to use Google's Chart Tools to create quick visualizations for myself to better understand a story.
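The appeal of the no-setup SQLite workflow Minkoff describes is easy to demonstrate. A minimal sketch in Python's built-in `sqlite3` module, using an in-memory database and invented spending rows for illustration:

```python
import sqlite3

# An in-memory database stands in for the no-setup workflow described above;
# the agency-spending rows are invented for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE spending (agency TEXT, amount REAL)")
con.executemany(
    "INSERT INTO spending VALUES (?, ?)",
    [("Parks", 1200.0), ("Parks", 800.0), ("Transit", 5000.0)],
)

# The kind of "complicated question" that is awkward in a spreadsheet
# but one line of SQL: total spending per agency, largest first.
rows = con.execute(
    "SELECT agency, SUM(amount) FROM spending GROUP BY agency ORDER BY 2 DESC"
).fetchall()
print(rows)  # [('Transit', 5000.0), ('Parks', 2000.0)]
```

The same GROUP BY query works unchanged whether the table holds three rows or three million, which is why reporters reach for SQL once a data set outgrows a spreadsheet.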

When it comes to presentation, since I've been doing a lot with mapping recently, I don't know what I'd do without my favorite open source tools, TileMill and Leaflet. Building a map stack is hard work, but the work that others have done before has made it a lot easier.

If we consider programming languages tools (which I do), JavaScript is my new Swiss army knife. Prior to coming to the AP, I did a lot with Python and Django, but I've learned a lot about what I like to call "Really Hard JavaScript." It's not just about manipulating the colors of a background on a Web page, but parsing, analyzing and presenting data. When I need to do more complex work to manipulate data, I use a combination of Ruby and Python -- depending on which has better tools for the job. For XML parsing, I like Ruby more. For simplifying geo data, I prefer Python.
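The parsing work Minkoff describes — pulling structure out of a data feed rather than styling a page — can be sketched with the standard library. The election-style XML below is invented, and a real AP feed would arrive over HTTP, but the shape of the task is the same:

```python
import xml.etree.ElementTree as ET

# A tiny invented results feed; the tags and numbers are illustrative only.
feed = """<results>
  <state code="OH"><votes candidate="A">120</votes><votes candidate="B">95</votes></state>
  <state code="VA"><votes candidate="A">80</votes><votes candidate="B">88</votes></state>
</results>"""

root = ET.fromstring(feed)
totals = {}
for votes in root.iter("votes"):
    cand = votes.get("candidate")
    totals[cand] = totals.get(cand, 0) + int(votes.text)

print(totals)  # {'A': 200, 'B': 183}
```

Whether it is done in Ruby, Python or "Really Hard JavaScript," this parse-then-aggregate step is the unglamorous core of most news apps.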

What data journalism project are you the most proud of working on or creating?

That would be "Road to 270", a project we did at the AP that allows users to test out hypothetical "what-if" scenarios for the national election, painting states to choose which candidate a state's electoral votes could go to. It combines demographic and past election data with the ability for users to make a choice and deeply engage with the interactive. It's not just telling the user a story, but informing the user by allowing him or her to be part of the story. That, I believe, is when data journalism becomes its most compelling and informative.

It also uses some advanced technical mapping skills that were new to me. I greatly enjoyed the thrill of learning how to structure a complex application, and add new tools to my toolkit. Now, I don't just have those new tools, but a better understanding of how to add other new tools.

Where do you turn to keep your skills updated or learn new things?

I look at other projects, both within the journalism industry and in general visualization communities. The Web inspector is my best friend. I'm always looking to see how people did things. I read blogs voraciously, and have a fairly robust Google Reader set of people whose work I follow closely. I also frequently turn to video tutorials (I tend to learn best that way). Hanging out on listservs for free tools I use (such as Leaflet), programming languages I care about (Python), or projects whose mission our work is related to (Sunlight Foundation) helps me engage with a community that cares about similar issues.

Help sites like Stack Overflow, and pretty much anything I can find on Google, are my other best friends. The not-so-secret secret of data journalism: we're learning as we go. That's part of what makes it so fun.

Really, the learning is not about paper or electronic resources. Like so much of journalism, this is best conquered, I argue, with persistence and stick-to-it-ness. I approach the process of data journalism and Web development as a beat. We attend key meetings. Instead of city council, it's NICAR. We develop vast rolodexes. I know people who have myriad specialties and feel comfortable calling on them. In return, I help people all over the world with this sort of work whenever I can, because it's that important. While we may work for competing places, we're really working toward the same goal: improving the way we inform the public about what's going on in our world. That knowledge matters a great deal.

Why are data journalism and "news apps" important, in the context of the contemporary digital environment for information?

More and more information is coming at us every day. The deluge is so vast that we need to not just say things are true, but prove those truths with verifiable facts. Data journalism allows for great specificity, and truths based in the scientific method. Using computers to commit data journalism allows us to process great amounts of information much more efficiently, and make the world more comprehensible to a user.

Also, while we are working with big data, often only a subset of that data is valuable to a specific user. Data journalism and Web development skills allow us to customize those subsets for our various users, such as by localizing a map. That helps us give a more relevant and useful experience to each individual we serve.

Perhaps most importantly, more and more information is digital, and is coming at us through the Internet. It simply makes sense to display that information with a similar environment in which it's provided. Information is dispensed in a different way now than it was five years ago. It will be totally different in another five years. So, our explanations of that environment should match. We must make the most of the Internet to tell our stories differently now than we did before, and differently than we will in the future.

Knowing things are constantly changing, being at the forefront of that change, and enabling the public to understand and participate in that change, is a large part of what makes data journalism so exciting and fundamentally essential.

This interview has been edited and condensed for clarity.

September 15 2011

Global Adaptation Index enables better data-driven decisions

The launch of the Global Adaptation Index (GaIn) literally puts a powerful open data browser into the hands of anyone with a connected mobile device. The index rates a given country's vulnerability to environmental shifts precipitated by climate change, its readiness to adapt to such changes, and its ability to utilize investment capital that would address the state of those vulnerabilities.

Global Adaptation Index

The Global Adaptation Index combines development indicators from 161 countries into a map that provides quick access to thousands of open data records. All of the data visualizations are powered by indicators that are openly available and downloadable under a Creative Commons license.

"All of the technology that we're using is a way to bring this information close to society," said Bruno Sanchez-Andrade Nuño, the director of science and technology at the Global Adaptation Institute (GAI), the organization that launched the index.

Open data, open methodology

The project was helped by the World Bank's move to open data, including the release of its full development database. "All data is from sources that are already open," said Ian Noble, chief scientist at GAI. "We would not use any data that had restrictions. We can point people through to the data source and encourage them to download the data."

Being open in this manner is "the most effective way of testing and improving the index," said Noble. "We have to be certain that data is from a quality, authoritative source and be able to give you an immediate source for it, like the FAO, WHO or disaster database."

"It's not only the data that's open, but also our methodology," said Nuño. "The World Bank's open data portal is a really good base, with something like 70% of our data going through that portal. With some of the rest of the data, we see lots of gaps. We're trying to make all values consistent."


Node.js powers the data browser

"This initiative is a big deal in the open data space as it shows a maturing from doing open data hacking competitions to powering a portal that will help channel billions of investment dollars over the next several years," said Development Seed associate Bonnie Bugle in a prepared statement. Development Seed built the site with open source tools, including Node.js and CouchDB.

The choice of Node is a useful indicator of where the cutting edge of open source technology is moving. "The most important breakthrough is moving beyond PHP and Drupal — our initial thought — to Node.js," said Nuño. "Drupal and PHP are robust and well known, but this seems like the next big thing. We really wanted to push the limits of what's possible. Node.js is faster and allows for more connections. If you navigate countries using the data browser, you're just two clicks away from the source data. It doesn't feel like a web page. It feels native."

Speed of access and interoperability were important considerations, said Nuño. "It works on an iOS device or on a slow connection, like GPRS." Noble said he had even accessed it from rural Australia using an iPad.

Highlights from the GAI press conference are available in the following video:

Global Adaptation Index Press Conference: Data Browser Launched from Development Seed on Vimeo.


September 07 2011

Look at Cook sets a high bar for open government data visualizations

Every month, more open government data is available online. Open government data is being used in mobile apps, baked into search engines or incorporated into powerful data visualizations. An important part of that trend is that local governments are becoming data suppliers.

For local, state and federal governments, however, releasing data is not enough. Someone has to put it to work, pulling the data together to create cohesive stories so citizens and other stakeholders can gain more knowledge. Sometimes this work is performed by public servants, though data visualization and user experience design have historically not been the strong suits of government employees. In the hands of skilled developers and designers, however, open data can be used to tell powerful stories.

One of the best recent efforts at visualizing local open government data can be found at Look at Cook, which tracks government budgets and expenditures from 1993-2011 in Cook County, Illinois.


The site was designed and developed by Derek Eder and Nick Rougeux, in collaboration with Cook County Commissioner John Fritchey. Below, Eder explains how they built the site, the civic stack tools they applied, and the problems Look at Cook aims to solve.

Why did you build Look at Cook?

Derek Eder: After being installed as a Cook County Commissioner, John Fritchey, along with the rest of the Board of Commissioners, had to tackle a very difficult budget season. He realized that even though the budget books were presented in the best accounting format possible and were also posted online in PDF format, this information was still not friendly to the public. After some internal discussion, one of his staff members, Seth Lavin, approached me and Nick Rougeux and asked us to develop a visualization that would let the public easily explore and understand the budget in greater detail. Seth and I had previously connected through some of Chicago's open government social functions, and we were looking for an opportunity for the county and the open government community to collaborate.


What problems does Look at Cook solve for government?

Derek Eder: Look at Cook shines a light on what's working in the system and what's not. Cook County, along with many other municipalities, has its fair share of problems, but before you can even try to fix any of them, you need to understand what they are. This visualization does exactly that. You can look at the Jail Diversion department in the Public Safety Fund and compare it to the Corrections and Juvenile Detention departments. They have an inverse relationship, and you can actually see one affecting the other between 2005 and 2007. There are probably dozens of other stories like these hidden within the budget data. All that was needed was an easy way to find and correlate them — which anyone can now do with our tool.

Look at Cook visualization example
Is there a relationship between the lower funding for Cook County's Jail Diversion and Crime Prevention division and the higher funding levels for the Department of Corrections and the Juvenile Temporary Detention Center divisions? (Click to enlarge.)

What problems does Look at Cook solve for citizens?

Derek Eder: Working on and now using Look at Cook opened my eyes to what Cook County government does. In Chicago especially, there is a big disconnect between where the county begins and where the city ends. Now I can see that the county runs specific hospitals and jails, maintains highways, and manages dozens of other civic institutions. Additionally, I know how much money it is spending on each, and I can begin to understand just how $3.5 billion is spent every year. If I'm interested, I can take it a step further and start asking questions about why the county spends money on what it does and how it has been distributed over the last 18 years. Examples include:

  • Why did the Clerk of the Circuit Court get a 480% increase in its budget between 2007 and 2008? See the 2008 public safety fund.
  • How is the Cook County Board President going to deal with a 74% decrease in appropriations for 2011? See the 2011 president data.
  • What happened in 2008 when the Secretary of the Board of Commissioners got its funding reallocated to the individual District Commissioners? See the 2008 corporate fund.

As a citizen, I now have a powerful tool for asking these questions and being more involved in my local government.

What data did you use?

Derek Eder: We were given budget data in a fairly raw format as a basic spreadsheet broken down into appropriations and expenditures by department and year. That data went back to 1993. Collectively, we and Commissioner Fritchey's office agreed that clear descriptions of everything were crucial to the success of the site, so his office diligently spent the time to write and collect them. They also made connections between all the data points so we could see what control officer was in charge of what department, and they hunted down the official websites for each department.

What tools did you use to build Look at Cook?

Derek Eder: Our research began with basic charts in Excel to get an initial idea of what the data looked like. Considering the nature of the data, we knew we wanted to show trends over time and let people compare departments, funds, and control officers. This made line and bar charts a natural choice. From there, we created a couple iterations of wireframes and storyboards to get an idea of the visual layout and style. Given our prior technical experience building websites at Webitects, we decided to use free tools like jQuery for front-end functionality and Google Fusion Tables to house the data. We're also big fans of Google Analytics, so we're using it to track how people are using the site.

Specifically, we used jQuery for front-end functionality, Google Fusion Tables to house the data, and Google Analytics to track how people use the site.

What design principles did you apply?

Derek Eder: Our guiding principles were clarity and transparency. We were already familiar with other popular visualizations, like the New York Times' federal budget and the Death and Taxes poster from WallStats. While they were intriguing, they seemed to lack some of these traits. We wanted to illustrate the budget in a way that anyone could explore without being an expert in county government. From a visual standpoint, the goal was to present the information professionally and essentially let the visuals get out of the way so the data could be the focus.

We feel that designing with data means that the data should do most of the talking. Effective design encourages people to explore information without making them feel overwhelmed. A good example of this is how we progressively expose more information as people drill down into departments and control officers. Effective design should also create some level of emotional connection with people so they understand what they're seeing. For example, someone may know one of the control officers or have had an experience with one of the departments. This small connection draws their attention to those areas and gets them to ask questions about why things are the way they are.

This interview was edited and condensed.


August 26 2011

Social, mapping and mobile data tell the story of Hurricane Irene

As Hurricane Irene bears down on the East Coast, millions of people are bracing for the impact of what could be a multi-billion-dollar disaster.

We've been through hurricanes before. What's different about this one is the unprecedented level of connectivity that now exists up and down the East Coast. According to the most recent numbers from the Pew Internet & American Life Project, for the first time more than 50% of American adults use social networks; 35% of American adults have smartphones; and 78% of American adults are connected to the Internet. Combined, those factors mean that we now see earthquake tweets spread faster than the seismic waves themselves. The growth of an Internet of things is an important evolution. What we're seeing this weekend is the importance of an Internet of people.

As citizens look for hurricane information online, government websites are in high demand. In this information ecosystem, media, government and citizens alike will play a critical role in sharing information about what's happening and providing help to one another. The federal government is providing information on Hurricane Irene and sharing news and advisories in real time on the radio, television, mobile devices and online, using social media channels like @fema.

Over the next 72 hours, a networked public can share the storm's effects in real time, providing city, state and federal officials unprecedented insight into what's happening. Citizens will be acting as sensors in the midst of the storm, creating an ad hoc system of networked accountability through data. Efforts are already underway to organize and collect the crisis data that citizens are generating, along with putting to use the open data that city and state governments have released.

Following are just a few examples of how data is playing a role in hurricane response and reporting.

Open data in the Big Apple

The city of New York is squarely in the path of Hurricane Irene and has initiated mandatory evacuations from low-lying areas. The NYC Mayor's Office has been providing frequent updates to New Yorkers as the hurricane approaches, including links to an evacuation map, embedded below:

NYC Hurricane Evacuation Map

Geographic data for NYC Hurricane Evacuation Zones and Hurricane Evacuation Centers is publicly available on the NYC DataMine. To find and use this open data, search for "Data by Agency" and select "Office of Emergency Management (OEM)." Developers can also download Google Earth KMZ files for the Hurricane Evacuation Zones. If you have any trouble accessing these files, civic technologist Philip Ashlock is mirroring NYC Irene data and links on Amazon Web Services (AWS).

"This data is already being used to power a range of hurricane evacuation zone maps completely independent of the City of New York, including at the New York Times," said Rachel Sterne, chief digital officer of New York City. "As always, we support and encourage developers to develop civic applications using public data."


Partnering with citizens in Maryland

"We're partnering with SeeClickFix to collect reports from citizens about the effects from Irene to help first responders," said Bryan Sivak, Maryland's chief innovation officer, in a phone interview. The state has invited its citizens to share and view hurricane data throughout the state.

"This is interesting from a state perspective because there are very few things that we are responsible for or have the ability to fix. Any tree branches or wires that go down will be fixed by a local town or a utility. The whole purpose is to give our first responders another channel. We're operating under the perspective that more information is better information. By having more eyes and ears out there reporting data, we can make better informed decisions from an emergency management perspective. We just want to stress that this is a channel for communication, as opposed to a way to get something fixed. If this channel is useful in terms of managing the situation, we'll work with local governments in the future to see if it can help them."


SeeClickFix has been working on enabling government to use citizens as public sensors since its founding. We'll see if they can help Maryland with Hurricane Irene this weekend.

[Disclosure: O'Reilly AlphaTech Ventures is an investor in SeeClickFix.]

The best hurricane tracker ever?

In the face of the storm, the New York Times has given New Yorkers one of the best examples of data journalism I've seen to date, a hurricane tracker that puts open data from the National Weather Service to beautiful use.

If you want a virtuoso human curation of the storm, New York Times reporter Brian Stelter is down in the Carolinas and reporting live via Twitter.

Crisismapping the hurricane

A Hurricane Irene crisis map is already online, where volunteers have stood up an instance of Ushahidi:

Mashing up social and geospatial data

ESRI has also posted a mashup that combines video and tweets onto an interactive map, embedded below:

The Florida Division of Emergency Management is maintaining, with support from DHS Science and Technology, a mashup of curated Twitter accounts. You can download live shape files of tweeters and KML files to use if you wish.

Google adds data layers

There is also a wealth of GIS and weather data feeds powering Google's Hurricane Season mashup:


If you have more data stories or sources from Hurricane Irene, please let me know on Twitter at @digiphile. If you're safe, dry and connected, you can also help Crisis Commons by contributing to the Hurricane Irene wiki.

August 19 2011

Visualizing hunger in the Horn of Africa

Drought, conflict and rising food prices have put the lives of millions of people in the Horn of Africa at risk. Today, on World Humanitarian Day, citizens and governments alike are looking for ways to help victims of the East Africa drought. According to the State Department, more people than the combined populations of New York City and Houston need urgent assistance in the Horn of Africa. To understand the scope of the unfolding humanitarian disaster, explore the embedded map below.

The map was built by Development Seed using open source tools and open data. It includes estimates from the Famine Early Warning System Network (FEWS NET) and the Food Security and Nutrition Analysis Unit - Somalia (FSNAU), coupled with data from the UN Office for the Coordination of Humanitarian Affairs (UN OCHA). The map mashes up operational data from the World Food Program with situational data to show how resources are being allocated.

"This is about more than just creating a new map," writes Nate Smith, a data lead at Development Seed:

This map makes information actionable and makes it easy to see both the extent of the crisis and the response to it. It allows people to quickly find information about how to easily contribute much-needed donations to support aid efforts on the ground, and see where those donations are actually going. In the Horn of Africa, the World Food Programme can feed one person for one day with just $0.50. Using this map it is possible to see what is needed, budget-wise, to feed those in need, and how close the World Food Programme is to achieving this. Going forward, new location and shipment data will be posted in near real-time, keeping the data as accurate as possible.
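The arithmetic behind that budget view is simple enough to work through. The $0.50-per-person-per-day figure comes from the article; the head count and duration below are hypothetical, for illustration only:

```javascript
// Back-of-the-envelope version of the map's budget calculation.
const costPerPersonPerDay = 0.5; // USD, World Food Programme figure cited above
const peopleInNeed = 12000000;   // hypothetical count, for illustration
const days = 30;                 // hypothetical duration, for illustration

const budgetNeeded = costPerPersonPerDay * peopleInNeed * days;
console.log(budgetNeeded); // 180000000, i.e. USD 180 million for one month
```

Comparing a figure like this against actual donations received is what lets the map show "how close the World Food Programme is" to covering the need.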

Development Seed has also applied a fundamental platform principle by making it easy to spread both the data and message through social tools and embeddable code.

If you'd like to donate to organizations that are working to help people directly affected by the crisis, a list of charities has been posted online. If you'd prefer to donate directly to the World Food Program, you can also text AID to 27722 using your mobile phone to give $10 to help those affected by the Horn of Africa crisis.


August 12 2011

Visualization of the Week: Visualizing the Library Catalog

WorldCat, the world's largest library catalog, has launched a new interactive tool that lets users visually explore the catalog, specifically the relationships between WorldCat "Identities." A WorldCat Identity can be a person (an author or a fictional or non-fictional character, for example), a thing (an animal or a boat, for example), or a corporation.

A screenshot from the WorldCat Identity Network. Click to visit the full interactive version.

The WorldCat Identity Network uses the WorldCat Search API and the WorldCat Identities Web Service to create an interactive map.

Using these Identity Maps, users can see how subject-based identities are interconnected. For example, they could see relationships not only between authors and their characters, but also between authors and between subjects. Below each Identity Map, the tool also gives a list of relevant titles found in WorldCat.


Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We're always looking for leads, so please drop us a line if there's a visualization you think we should know about.

August 11 2011

Strata Week: Twitter's coming Storm, data and maps from the London riots

Here are a few of the data stories that caught my attention this week:

Twitter's coming Storm


In a blog post late last week, Twitter announced that it plans to open source Storm, its Hadoop-like data processing tool. Storm was developed by BackType, the social media analytics company that Twitter acquired last month. Several of BackType's other technologies, including ElephantDB, have already been open sourced, and Storm will join them this fall, according to Nathan Marz, formerly of BackType and now of Twitter.

Marz's post digs into how Storm works as well as how it can be applied. He notes that a Storm cluster is only "superficially similar" to a Hadoop cluster. Instead of running MapReduce "jobs," Storm runs "topologies." One of the key differences is that a MapReduce job eventually finishes, whereas a topology processes messages "forever (or until you kill it)." This makes Storm useful, among other things, for processing real-time streams of data, continuous computation, and distributed RPC.
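The job-versus-topology distinction can be sketched in a few lines. This is a conceptual contrast only, not Storm's actual API (which is JVM-based): a batch job runs over finite input and finishes, while a topology is a long-lived pipeline that handles each message as it arrives, until killed.

```javascript
// Batch "job": consumes its whole input, then completes.
function runJob(records) {
  return records.map((r) => r.toUpperCase());
}

// Stream "topology": wiring that stays alive and processes each
// incoming tuple immediately -- there is no notion of "done".
function makeTopology(handleTuple) {
  return {
    emit(tuple) {
      handleTuple(tuple); // in Storm this would flow from spouts to bolts
    },
  };
}

const seen = [];
const topology = makeTopology((t) => seen.push(t.toUpperCase()));
topology.emit('click');   // messages keep arriving...
topology.emit('retweet'); // ...and keep being processed
```

The point of the sketch is the shape of the two abstractions: `runJob` returns when its input is exhausted, whereas `topology` has no terminating condition at all.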

Touting the technology's ease of use, Marz lists the complexities handled "under the hood": guaranteed message processing, robust process management, fault detection and automatic reassignment, efficient message passing, and local and distributed modes. More details -- and more documentation -- will follow on September 19, when Storm is officially open sourced.


Mapping the London riots

Using real-time social streams and mapping tools in a crisis situation is hardly new. We've seen citizens, developers, journalists and governments alike undertake these efforts following a multitude of natural disasters. But the violence that erupted in London over the weekend has proven yet again that these data tools are important both for safety and for analysis and understanding. Indeed, as journalist Kevin Anderson argued, "data journalists and social scientists should join forces" to understand the causes and motivations for the riots, rather than relying on the more traditional "hours of speculation on television and acres of newsprint positing theories."

NPR's Matt Stiles was just one of the data journalists who picked up the mantle. Using data from The Guardian, he created a map that highlighted riot locations, overlaid on a colored representation of "indices of deprivation." This makes a pretty compelling visualization, demonstrating that the areas with the most incidents of violence are also the least well-off areas of London.


In a reflective piece in PaidContent, James Cridland examined his experience trying to use social media to map the riots. He created a Google Map on which he marked "verified incident areas." As he describes it, however, that verification became quite challenging. His "lessons learned" included realizations about what constitutes a reliable source.

"Twitter is not a reliable source: I lost count of the amount of times I was told that riots were occurring in Derby or Manchester. They weren't, yet on Twitter they were being reported as fact, despite the Derbyshire Constabulary and Greater Manchester Police issuing denials on Twitter. I realised that, in order for this map to be useful, every entry needed to be verified, and verifiable for others too. For every report, I searched Google News, Twitter, and major news sites to try and establish some sort of verification. My criteria was that something had to be reported by an established news organisation (BBC, Sky, local newspapers) or by multiple people on Twitter in different ways."

Cridland points out that the traditional news media wasn't reliable either, as the BBC for example reported disturbances that never occurred or misreported their location.

"Many people don't know what a reliable source is," he concludes. "I discovered it was surprisingly easy to check the veracity of claims being made on Twitter by using the Internet to check and cross-reference, rather than blindly retweet."

When data disappears

Following the riots in the U.K., there is now a trove of data -- from Blackberry Messenger, from Twitter, from CCTV -- that the authorities can utilize to investigate "what happened." There are also probably plenty of people who wish that data would just disappear.

What happens when that actually happens? How can we ensure that important digital information is preserved? Those were the questions asked in an op-ed in Sunday's New York Times. Kari Kraus, an assistant professor in the College of Information Studies and the English department at the University of Maryland, makes a strong case for why "digitization" isn't really the end of the road when it comes to preservation.

"For all its many promises, digital storage is perishable, perhaps even more so than paper. Disks corrode, bits 'rot' and hardware becomes obsolete.

"But that doesn't mean digital preservation is pointless: if we're going to save even a fraction of the trillions of bits of data churned out every year, we can't think of digital preservation in the same way we do paper preservation. We have to stop thinking about how to save data only after it's no longer needed, as when an author donates her papers to an archive. Instead, we must look for ways to continuously maintain and improve it. In other words, we must stop preserving digital material and start curating it."


She points to the efforts made to curate and preserve video games, which highlight the struggle of saving not just the content -- the games -- but also the technology -- NES cartridges, for example, as well as the gaming systems themselves. "It might seem silly to look to video-game fans for lessons on how to save our informational heritage, but in fact complex interactive games represent the outer limit of what we can do with digital preservation." By figuring out the complexities of preserving this sort of material -- a game or a console, for example -- we can get a better sense of how to develop systems to preserve other things, whether it's our Twitter archives, digital maps of London, or genetic data.

Got data news?

Send me an email.


May 05 2011

Interactive mapping and open data illustrate excess federal property

Last week, I reported how open source tools make mapping easier. Yesterday, the White House showed how open data can be visualized, posting a massive new interactive feature. The map was published as the White House proposed legislation to create an independent commission to identify civilian properties that can be sold, closed or destroyed.

The interactive map of excess federal property is beautiful and fast, and it shows the locations of approximately half of the 14,000 buildings and structures currently designated as excess by the White House. Many Department of Defense structures are not mapped, given national security concerns.

As the biggest property owner in the United States, the federal government has an immense amount of data about its holdings. By mapping out the locations, the White House has taken the step of not only putting open data to good use, but also educating online visitors about just how much property is out there.

For those that wish to download the dataset themselves, the White House has made it available as a zipped .csv file. The White House also released an infographic that provides a static look at the data.

Click to enlarge

As USA Today reported, however, while identifying surplus buildings is a step toward greater transparency, knowing what can be sold won't be so easy:

A USA TODAY analysis shows that just 82 of the 12,218 surplus properties have been identified as candidates to sell. That's partly because the federal data are from 2009, and many might have already been sold, said Danny Werfel, the OMB's controller.

In other words, actually divesting government of the excess property will be harder than mapping it. Financial, legal and political roadblocks will persist. That said, with bipartisan support there might be billions of dollars in maintenance out there that could be saved. And with the release of open government data in structured form, civic developers can work with it. On balance, that's a public good.

The rapid evolution of tools for mapping open data is an important trend for the intersection of data, new media, citizens and society. Whether it's mapping issues, mapping broadband access or mapping crisis data, geospatial technology is giving citizens and policy makers new insight into the world we inhabit.


February 17 2011

Broadband availability and speed visualized in new government map

Today, the United States Department of Commerce's National Telecommunications and Information Administration (NTIA) unveiled a new National Broadband Map.

The map includes more than 25 million searchable records, and it incorporates crowdsourced reporting. Built entirely upon WordPress, the map is also one of the largest implementations of open source software and open data in government to date.

Importantly, the data behind the map shows that despite an increase in broadband adoption to 68%, a digital divide persists between citizens who have full access to the rich media of the 2011 Internet and those who are limited by geography or means.


The launch of a national map of broadband Internet access fulfills a Congressional mandate created by the 2009 federal stimulus, which directed regulators to collect better data to show which communities have broadband access — and which do not. The National Broadband Map is searchable right down to the individual census block.

"Broadband is as vital and transformative today as electricity was in the 20th century," said FCC chairman Julius Genachowski in a press briefing today. "Millions live in areas where they can't get access even if they want it." Genachowski asserted that extending broadband access to the nearly one third of Americans still without high-speed Internet is essential for the United States to remain globally competitive.

The FCC chairman also noted that the release of the map was not only important in terms of what it could tell legislators and regulators but that it was "also part of making government more open and participatory," with respect to how it used technology to help citizens drive solutions.

As Anne Neville, director of the State Broadband Initiative at the NTIA, explains in the first post on the Broadband Map blog, crowdsourcing will be an important part of gathering more data. Wherever broadband speed data isn't available, the Commerce Department wants people to submit reports using the speed test apps. By reporting dead zones, citizens can add further results to the database of more than 2 million reports that have already been filed.

The creators of the map showed some social media savvy by providing short URLs for maps and creating the @BroadbandMap Twitter account (though the account hadn't sent any tweets at the time this post went live).

The designers of the map said during the press briefing that it embodied "the spirit of the Internet through open formats and protocols." Specifically, the National Broadband Map was built on the LAMP stack (Linux, Apache, MySQL and PHP) familiar to open source developers everywhere. Currently, the site has around 35 RESTful APIs that will enable developers to write applications to look at specific providers. The open government data behind the National Broadband Map can also be downloaded for anyone to use. According to Commerce Department officials, this data will be updated twice a year over the next five years.

Responding to reporters' questions on how the new map might be used by regulators, NTIA administrator Lawrence E. Strickling said that the National Broadband Map will be of great use to all manner of people, particularly for those interested in economic development. There is "nothing about our map that dictates that it will be regulatory," he noted.

That said, at least one visualization from the online gallery could certainly be used to direct more truth in advertising: a comparison of advertised broadband speed vs. actual speed as shown in testing.

The FCC chairman and other staff have also indicated that a national map of broadband access will enable policy makers to better target resources toward bringing more people online, particularly if Universal Service Fund reform allows for increased funding of rural broadband. While data from more than 600 broadband providers is factored into the map, there's still more that civic developers might do in working with user-submitted data and government data to show how much choice consumers have in broadband access in a specific area.

Given that access to the Internet has become increasingly important to economic, educational, professional and commercial opportunities, understanding who has it and who doesn't is an important component of forming better public policy. Whether the United States government is able to successfully provide broadband access to all Americans through a combination of public and private partnerships and open spectrum reallocation is one of the central challenges of the moment.

February 16 2011

Google Public Data Explorer goes public

The explosion of data has created important new roles for mapping tools, data journalism and data science. Today, Google opened the Google Public Data Explorer so that anyone can upload and visualize their own datasets.

Uploading a dataset is straightforward. Once a dataset has been uploaded, users can easily link to it or embed it elsewhere. For instance, embedded below is a visualization of unemployment rates in the continental United States. Click play to watch the rates change over time, with the expected alarming growth over the past three years.

As Cliff Kuang writes at Fast Company's design blog, Google's infographic tools went online after the company bought Gapminder's Trendalyzer, the data visualization technology developed by Dr. Hans Rosling's Gapminder Foundation.

Google Public Data Explorer isn't the first big data visualization app to go online, as Mike Melanson pointed out over at ReadWriteWeb. Sites like Factual, CKAN, InfoChimps and Amazon's Public Data Sets are also making it easier for people to work with big data.


Of note to government agencies: Google is looking for partnerships with "official providers" of public data, which can request to have their datasets appear in the Public Data Explorer directory.

In a post on Google's official blog, Omar Benjelloun, technical lead of Google's public data team, wrote more about Public Data Explorer and the different ways that the search giant has been working with public data:

Together with our data provider partners, we've curated 27 datasets including more than 300 data metrics. You can now use the Public Data Explorer to visualize everything from labor productivity (OECD) to Internet speed (Ookla) to gender balance in parliaments (UNECE) to government debt levels (IMF) to population density by municipality (Statistics Catalonia), with more data being added every week.

Google also introduced a new metadata format, the Dataset Publishing Language (DSPL). DSPL is an XML-based format that Google says will support rich, interactive visualizations like those in the Public Data Explorer.
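A DSPL dataset pairs an XML metadata file with CSV data tables. The abbreviated sketch below is illustrative only: the element names follow the published DSPL schema as I understand it, but the dataset, concepts, and file names are invented for this example, and a real file would need more required metadata.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal, hypothetical DSPL sketch; not a complete or validated dataset. -->
<dspl xmlns="http://schemas.google.com/dspl/2010">
  <info>
    <name><value>Example unemployment dataset</value></name>
  </info>
  <provider>
    <name><value>Example provider</value></name>
  </provider>
  <concepts>
    <!-- Concepts define the dimensions and metrics of the dataset. -->
    <concept id="state">
      <info><name><value>State</value></name></info>
      <type ref="string"/>
    </concept>
    <concept id="unemployment_rate">
      <info><name><value>Unemployment rate</value></name></info>
      <type ref="float"/>
    </concept>
  </concepts>
  <slices>
    <!-- A slice ties dimensions and metrics to a backing data table. -->
    <slice id="unemployment_by_state">
      <dimension concept="state"/>
      <metric concept="unemployment_rate"/>
      <table ref="unemployment_table"/>
    </slice>
  </slices>
  <tables>
    <!-- The actual numbers live in a separate CSV file. -->
    <table id="unemployment_table">
      <column id="state" type="string"/>
      <column id="unemployment_rate" type="float"/>
      <data><file format="csv">unemployment.csv</file></data>
    </table>
  </tables>
</dspl>
```

The XML file and its CSV tables are zipped together and uploaded to the Public Data Explorer, which then renders the slices as interactive charts.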

As is Google's way, the company has created a helpful embeddable document that explains how to use Public Data Explorer:

And for those interested in what democratized data visualization means to journalism, check out Megan Garber's thoughtful article at the Nieman Journalism Lab.
