
September 20 2011

BuzzData: Come for the data, stay for the community

As the data deluge created by the activities of global industries accelerates, the need for decision makers to find the signal in the noise will only grow. Therein lies the promise of data science, from data visualizations to dashboards to predictive algorithms that filter the exaflood and produce meaning for those who need it most. Data consumers and data producers, however, are both challenged by "dirty data" and limited access to the expertise and insight they need. To put it another way, as Alistair Croll has observed here at Radar, if you can't derive value, there's no such thing as big data.

BuzzData, based in Toronto, Canada, is one of several startups looking to help bridge that gap. BuzzData launched this spring with a combination of online community and social networking that is reminiscent of what GitHub provides for code. The thinking here is that every dataset will have a community of interest around the topic it describes, no matter how niche it might be. Once uploaded, each dataset has tabs for tracking versions, visualizations, related articles, attachments and comments. BuzzData users can "follow" datasets, just as they would a user on Twitter or a page on Facebook.

"User experience is key to building a community around data, and that's what BuzzData seems to be set on doing," said Marshall Kirkpatrick, lead writer at ReadWriteWeb, in an interview. "Right now it's a little rough around the edges to use, but it's very pretty, and that's going to open a lot of doors. Hopefully a lot of creative minds will walk through those doors and do things with the data they find there that no single person would have thought of or been capable of doing on their own."

The value proposition that BuzzData offers will depend upon many more users showing up and engaging with one another and, most importantly, the data itself. For now, the site remains in limited beta with hundreds of users, including at least one government entity, the City of Vancouver.

"Right now, people email an Excel spreadsheet around or spend time clobbering a shared file on a network," said Mark Opauszky, the startup's CEO, in an interview late this summer. "Our behind-the-scenes energy is focused on interfaces so that you can talk through BuzzData instead. We're working to bring the same powerful tools that programmers have for source code into the world of data. Ultimately, you're not adding and removing lines of code — you're adding and removing columns of data."

Opauszky said that BuzzData is actively talking with data publishers about the potential of the platform: "What BuzzData will ultimately offer when we move beyond a minimum viable product is for organizations to have their own territory in that data. There is a 'brandability' to that option. We've found it very easy to make this case to corporations, as they're already spending dollars, usually on social networks, to try to understand this."

That corporate constituency may well be where BuzzData finds its business model, though the executive team was careful to caution that they're remaining flexible. It's "absolutely a freemium model," said Opauszky. "It's a fundamentally free system, but people can pay a nominal fee on an individual basis for some enhanced features — primarily the ability to privatize data projects, which by default are open. Once in a while, people will find that they're on to something and want a smaller context. They may want to share files, commercialize a data product, or want to designate where data is stored geographically."

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code ORM30


Open data communities

"We're starting to see analysis happen, where people tell 'data stories' that are evolving in ways they didn't necessarily expect when they posted data on BuzzData," said Opauszky. "Once data is uploaded, we see people use it, fork it, and evolve data stories in all sorts of directions that the original data publishers didn't perceive."

For instance, a dataset of open data hubs worldwide has attracted a community that improved the original upload considerably. BuzzData featured the work of James McKinney, a civic hacker from Montreal, Canada, in making it so. (The original post embedded a Google Map mashing up the locations.)


The hope is that communities of developers, policy wonks, media, and designers will self-aggregate around datasets on the site and collectively improve them. Hints of that future are already present, as open government advocate David Eaves highlighted in his post on open source data journalism at BuzzData. As Eaves pointed out, it isn't just media companies that should be paying attention to the trends around open data journalism:

For years I argued that governments — and especially politicians — interested in open data have an unhealthy appetite for applications. They like the idea of sexy apps on smart phones enabling citizens to do cool things. To be clear, I think apps are cool, too. I hope in cities and jurisdictions with open data we see more of them. But open data isn't just about apps. It's about the analysis.

Imagine a city's budget up on BuzzData. Imagine the flow rates of the water or sewage system. Or the inventory of trees. Think of how a community of interested and engaged "followers" could supplement that data, analyze it, and visualize it. Maybe they would be able to explain it to others better, to find savings or potential problems, or develop new forms of risk assessment.

Open data journalism

"It's an interesting service that's cutting down barriers to open data crunching," said Craig Saila, director of digital products at the Globe and Mail, Canada's national newspaper, in an interview. He said that the Globe and Mail has started to open up the data that it's collecting, like forest fire data, at the Globe and Mail BuzzData account.

"We're a traditional paper with a strong digital component that will be a huge driver in the future," said Saila. "We're putting data out there and letting our audiences play with it. The licensing provides us with a neutral source that we can use to share data. We're working with data suppliers to release the data that we have or are collecting, exposing the Globe's journalism to more people. In a lot of ways, it's beneficial to the Globe to share census information, press releases and statistics."

The Globe and Mail is not, however, hosting any sensitive information there. "In terms of confidential information, I'm not sure if we're ready as a news organization to put that in the cloud," said Saila. "We're just starting to explore open data as a thing to share, following the Guardian model."

Saila said that he's found the private collaboration model useful. "We're working on a big data project where we need to combine all of the sources, and we're trying to munge them all together in a safe place," he said. "It's a great space for journalists to connect and normalize public data."

The BuzzData team emphasized that they're not trying to be another data marketplace, like Infochimps, or replace Excel. "We made an early decision not to reinvent the wheel," said Opauszky, "but instead to try to be a water cooler, in the same way that people go to Vimeo to share their work. People don't go to Flickr to edit photos or YouTube to edit videos. The value is to be the connective tissue of what's happening."

If that question about "what's happening?" sounds familiar to Twitter users, it's because that kind of stream is part of BuzzData's vision for the future of open data communities.

"One of the things that will become more apparent is that everything in the interface is real time," said Opauszky. "We think that topics will ultimately become one of the most popular features on the site. People will come from the Guardian or the Economist for the data and stay for the conversation. Those topics are hives for peers and collaborators. We think that BuzzData can provide an even 'closer to the feed' source of information for people's interests, similar to the way that journalists monitor feeds in Tweetdeck."


September 08 2011

Strata Week: MapReduce gets its arms around a million songs

Here are some of the data stories that caught my attention this week.

A million songs and MapReduce

Earlier this year, Echo Nest and LabROSA at Columbia University released the Million Song Dataset, a freely available collection of audio features and metadata for a million contemporary popular music tracks. The purpose of the dataset, among other things, was to help encourage research on music algorithms. But as Paul Lamere, director of Echo Nest's Developer Platform, makes clear, getting started with the dataset can be daunting.

In a post on his Music Machinery blog, Lamere explains how to use Amazon's Elastic MapReduce to process the data. In fact, Echo Nest has loaded the entire Million Song Dataset into a single S3 bucket, available at http://tbmmsd.s3.amazonaws.com/. The bucket contains approximately 300 files, each with data on about 3,000 tracks. Lamere also points to a small subset of the data — just 20 tracks — available in a file on GitHub, and he created track.py to parse track data and return a dictionary containing all of it.
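As a rough illustration of what such a parsing helper does — turning one delimited track record into a dictionary — here is a minimal sketch. The field layout below is an invented subset for illustration; Lamere's actual track.py on GitHub defines the dataset's real schema.

```python
# Illustrative sketch of a track parser: one tab-delimited record in,
# one dictionary out. FIELDS is an assumed subset, not the real schema.

FIELDS = ["track_id", "artist_name", "title", "duration", "tempo"]

def parse_track_line(line):
    """Turn one tab-delimited track record into a dictionary."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

record = parse_track_line("TRAAAAW128F429D538\tCasual\tI Didn't Mean To\t218.93\t92.2")
print(record["title"])  # I Didn't Mean To
```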


GPS steps in where memory fails

After decades of cycling without incident, New York Times science writer John Markoff experienced what every cyclist dreads: a major crash, one that resulted in a broken nose, a deep gash on his knee, and road rash aplenty. He was knocked unconscious by the crash, unable to remember what had happened to cause it. In a recent piece in the NYT, he chronicled the steps he took to reconstruct the accident.

He did so by turning to the GPS data tracked by the Garmin 305 on his bicycle. Typically, devices like this are used to track the distance and location of rides, as well as a cyclist's pedaling and heart rates. But as Markoff investigated his own crash, he found that the data stored in these devices can be used to ascertain what happens in cycling accidents.

In investigating the crash he couldn't remember, Markoff was able to piece together data about his trip:

My Garmin was unharmed, and when I uploaded the data I could see that in the roughly eight seconds before I crashed, my speed went from 30 to 10 miles per hour — and then 0 — while my heart rate stayed a constant 126. By entering the GPS data into Google Maps, I could see just where I crashed.

I realized I did have several disconnected memories. One was of my hands being thrown off the handlebars violently, but I had no sense of where I was when it happened. With a friend, Bill Duvall, who many years ago also raced for the local bike club Pedali Alpini, I went back to the spot. La Honda Road cuts a steep and curving path through the redwoods. Just above where the GPS data said I crashed, we could see a long, thin, deep pothole. (It was even visible in Google's street view.) If my tire hit that, it could easily have taken me down. I also had a fleeting recollection of my mangled dark glasses, and on the side of the road, I stooped and picked up one of the lenses, which was deeply scratched.

From the swift deceleration, I deduced that when my hands were thrown from the handlebars, I must have managed to reach my brakes again in time to slow down before I fell. My right hand was pinned under the brake lever when I hit the ground, causing the nasty road rash.

It's one thing for a rider to reconstruct his own accident, but Markoff says insurance companies are also starting to pay attention to this sort of data. As one lawyer notes in the Times article, "Frankly, it's probably going to be a booming new industry for experts."
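The deceleration Markoff describes can be checked with back-of-the-envelope arithmetic. The speeds and the rough eight-second window are taken from the excerpt; the unit conversion is standard.

```python
# Back-of-the-envelope check of the deceleration described in the excerpt.

MPH_TO_MS = 0.44704  # metres per second per mile per hour

v_start = 30 * MPH_TO_MS  # ~13.4 m/s
v_end = 10 * MPH_TO_MS    # ~4.5 m/s
duration = 8.0            # seconds, per the excerpt

decel = (v_start - v_end) / duration
print(round(decel, 2))  # 1.12 (m/s^2) — steady braking rather than an instant stop
```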

Crowdsourcing and crisis mapping from WWI

The explosion of mobile, mapping, and web technologies has facilitated the rise of crowdsourcing during crisis situations, giving citizens and NGOs — among others — the ability to contribute to and coordinate emergency responses. But as Patrick Meier, director of crisis mapping and partnerships at Ushahidi, has found, there are examples of crisis mapping that pre-date our Internet age.

Meier highlights maps he discovered from World War I at the National Air and Space Museum, pointing to the government's request for citizens to help with the mapping process:

In the event of a hostile aircraft being seen in country districts, the nearest Naval, Military or Police Authorities should, if possible, be advised immediately by Telephone of the time of appearance, the direction of flight, and whether the aircraft is an Airship or an Aeroplane.

And he asks a number of very interesting questions: How often were these maps updated? What sources were used? And "would public opinion at the time have differed had live crowdsourced crisis maps existed?"

Got data news?

Feel free to email me.


August 26 2011

Visualization of the Week: Social media and the UK riots

The recent violence in the U.K. has led some in the British government to propose banning people from social networks during times of civic unrest. Prime Minister David Cameron told an emergency session of Parliament:

Everyone watching these horrific actions will be struck by how they were organised via social media. We are working with the police, the intelligence services and industry to look at whether it would be right to stop people communicating via these websites and services when we know they are plotting violence, disorder and criminality.

But The Guardian has analyzed some 2.5 million tweets relating to the events in the U.K., and the newspaper's findings challenge the government's contention that Twitter and other social networks were used to organize violence.

More than 206,000 tweets — about 8% of the total — focused on coordinating clean-up efforts following the rioting and looting.

In the interactive visualization, you can see, for the various communities where outbreaks of violence occurred, the relationship between the events themselves and the surge of Twitter activity. In the majority of cases, Twitter activity increased after, not before, the violence.

Screenshot from the Guardian's interactive visualization of Twitter traffic during the U.K. riots.

The Guardian says it will continue to examine this database in the coming weeks, just as the British government continues its inquiries into the relationship between social media and violence.
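The two findings above can be sketched as a toy analysis. The 206,000 and 2.5 million figures come from the article; the incident time and tweet timestamps below are invented for illustration.

```python
# Toy sketch of the Guardian-style analysis. Figures on the first line are
# from the article; the timestamps further down are invented.

from datetime import datetime

# Share of tweets devoted to clean-up coordination:
print(f"{206_000 / 2_500_000:.0%}")  # 8%

# Timing: count tweets before vs. after a given outbreak of violence.
incident = datetime(2011, 8, 8, 21, 0)
tweets = [
    datetime(2011, 8, 8, 20, 30),
    datetime(2011, 8, 8, 21, 15),
    datetime(2011, 8, 8, 22, 5),
    datetime(2011, 8, 8, 23, 40),
]
before = sum(t < incident for t in tweets)
after = sum(t >= incident for t in tweets)
print(before, after)  # 1 3 — most of the activity follows the event
```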

Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We're always looking for leads, so please drop a line if there's a visualization you think we should know about.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code STN11RAD


