
February 23 2012

Strata Week: Infochimps makes a platform play

Here are a few of the data stories that caught my attention this week.

Infochimps makes its big data expertise available in a platform

The big data marketplace Infochimps announced this week that it will begin offering the platform that it's built for itself to other companies, as both a platform-as-a-service and an on-premise solution. "The technical needs for Infochimps are pretty substantial," says CEO Joe Kelly, and the company now plans to help others get up to speed with implementing a big data infrastructure.

Infochimps has offered datasets for download or via API for a number of years (see my May 2011 interview with the company here), but the startup is now making the transition to offer its infrastructure to others. Likening its big data marketplace to an "iTunes for data," Infochimps says it's clear that we still need a lot more "iPods" in production before most companies are able to handle the big data deluge.

Infochimps will now offer its in-house expertise to others. That includes a number of tools that one might expect: AWS, Hadoop, and Pig. But it also includes Ironfan, Infochimps' management tool built on top of Chef.

Infochimps isn't abandoning the big data marketplace piece of its business. However, its move to support companies with their big data efforts is an indication that there's still quite a bit of work to do before everyone's ready to "do stuff" with the big data we're accumulating.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.



Save 20% on registration with the code RADAR20

How do you anonymize online publications?

A fascinating piece of research is set to appear at IEEE S&P on the subject of Internet-scale authorship identification based on "stylometry," which is an analysis of writing style. The paper was co-authored by Arvind Narayanan, Hristo Paskov, Neil Gong, John Bethencourt, Emil Stefanov, Richard Shin and Dawn Song. They've been able to correctly identify writers 20% of the time based on looking at what they've published online before. It's a finding with serious implications for online anonymity and free speech, the team notes.

"The good news for authors who would like to protect themselves against de-anonymization is it appears that manually changing one's style is enough to throw off these attacks," says Narayanan.

Open data for the public good

O'Reilly Media has just published a report on "Data for the Public Good." In the report, Alex Howard makes the argument for a systemic approach to thinking about open data and the public sector, examining the case for a "public good" around public data as well as around governmental, journalistic, healthcare, and crisis situations (to name but a few scenarios and applications).

Howard notes that the success of recent open data initiatives "won't depend on any single chief information officer, chief executive or brilliant developer. Data for the public good will be driven by a distributed community of media, nonprofits, academics and civic advocates focused on better outcomes, more informed communities and the new news, in whatever form it is delivered." Although many municipalities have made the case for open data initiatives, there's more to the puzzle, Howard argues, including recognizing the importance of personal data and making the case for "hybridized public-private data."

The "Data for the Public Good" report is available for free as a PDF, ePUB, or MOBI download.

Got data news?

Feel free to email me.


August 16 2010

Stepping it up with Transit Score

Where you live has a huge impact on how much you drive. If your neighborhood has easy access to public transportation or there are a lot of amenities nearby, you can walk more and drive less (thus saving money while getting a little exercise). Front Seat's Walk Score has become a well-known metric for determining a place's walkability (Radar post). However, it tells only a fraction of the story. How walkable a place is tells you very little about the public transportation options. Today Front Seat is releasing Transit Score, a measure of how accessible public transportation is at a given location, and Commute Reports, which let you determine your commuting options.

To use Transit Score, just search for a location on the Walk Score site. Below the map (which shows all of the local amenities) you'll find your overall score. For example, the neighborhood of Capitol Hill in Seattle has a great Walk Score of 95 and an iffy Transit Score of 71.

[Image: Transit Score]

However, it really all depends on one's personal needs and commute. An "iffy" Transit Score can be just fine if those bus lines go right to where you work. If you click on the commute tab you can figure out what your options are. Techies living in Capitol Hill who work in Redmond, WA (a common commute that I experienced in a former life) have multiple bus and biking options:

[Image: Transit Score commute]

There is an API for Transit Score and it is already being used by Zip Realty (a launch partner). The Walk Score API currently handles 3 million requests per day. Both APIs reside on Google App Engine, so those 3 million requests only cost Front Seat $10 per day.
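To put those figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the two numbers quoted above (3 million requests and $10 per day) and assumes, purely for illustration, that cost and traffic are spread evenly across the day; the function names are made up for this sketch.

```python
# Figures quoted in the post. Spreading cost and traffic evenly is an assumption.
REQUESTS_PER_DAY = 3_000_000
COST_PER_DAY_USD = 10.0
SECONDS_PER_DAY = 24 * 60 * 60


def cost_per_million_requests(requests_per_day: int, cost_per_day: float) -> float:
    """Return the hosting cost in USD per one million requests."""
    return cost_per_day / requests_per_day * 1_000_000


def average_requests_per_second(requests_per_day: int) -> float:
    """Return the average request rate, assuming perfectly even traffic."""
    return requests_per_day / SECONDS_PER_DAY


per_million = cost_per_million_requests(REQUESTS_PER_DAY, COST_PER_DAY_USD)
avg_rps = average_requests_per_second(REQUESTS_PER_DAY)

print(f"${per_million:.2f} per million requests")
print(f"{avg_rps:.1f} requests per second on average")
```

That works out to roughly $3.33 per million requests, at an average of about 35 requests per second.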

[Image: agencies without open data]

Transit Score is only available in cities (114 agencies in total) that release their data in the GTFS format (General Transit Feed Specification; that G used to stand for Google). Front Seat pulls in the normalized data from the handy GTFS Data Exchange. Through trial and error, Front Seat has learned to only use data from cities that are reaching out to developers. There was a lot of transit data released via FOIA requests, but it was very messy, out of date and error-ridden. Front Seat is currently waiting for 695 transit agencies to release their data. This wall of government shame is kept on City-Go-Round, a marketplace for transportation apps.
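For readers who haven't seen GTFS data, a feed is essentially a bundle of comma-separated text files, one of which (routes.txt) lists each route along with a numeric route_type code for its mode. The sketch below shows how little code it takes to read such a file. The sample rows are made up for illustration and are not real agency data, and the helper name is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up sample in the shape of a GTFS routes.txt file (not real agency data).
SAMPLE_ROUTES_TXT = """route_id,route_short_name,route_long_name,route_type
r1,A,Example Bus Route A,3
r2,B,Example Bus Route B,3
r3,LR,Example Light Rail Line,0
"""

# A subset of the route_type codes defined by GTFS.
ROUTE_TYPE_NAMES = {
    "0": "tram/light rail",
    "1": "subway/metro",
    "2": "rail",
    "3": "bus",
    "4": "ferry",
}


def count_routes_by_mode(routes_txt: str) -> Counter:
    """Count the routes in a routes.txt payload, grouped by transit mode."""
    reader = csv.DictReader(io.StringIO(routes_txt))
    return Counter(ROUTE_TYPE_NAMES.get(row["route_type"], "other") for row in reader)


counts = count_routes_by_mode(SAMPLE_ROUTES_TXT)
for mode, total in counts.items():
    print(f"{mode}: {total}")
```

Because every agency's feed uses the same column names and codes, the same few lines would work on any of them, which is a big part of why a common format matters.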

[Image: Seattle transit heat map]

Front Seat is very open about how they calculate Transit Score (and Walk Score, for that matter). The algorithm uses route frequency (a proxy for location: more frequent equals more important, e.g., buses come more often downtown), type (rail is deemed better than bus) and route distance. The heat map to the left shows Seattle's transit
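To make the three ingredients above concrete, here is a toy sketch of how frequency, type and distance could be combined into a single number. This is not Front Seat's formula: the mode weights, the linear distance penalty, the 1 km cutoff, and the reading of "route distance" as distance to the nearest stop are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the post says only that rail is deemed better than bus.
MODE_WEIGHT = {"rail": 2.0, "bus": 1.0}

# Hypothetical cutoff: stops farther away than this contribute nothing.
MAX_DISTANCE_KM = 1.0


@dataclass
class Route:
    mode: str             # "rail" or "bus"
    trips_per_week: int   # service frequency
    distance_km: float    # assumed: distance from the location to the nearest stop


def route_value(route: Route) -> float:
    """More frequent, higher-mode, closer routes contribute more."""
    # Assumed linear penalty: full credit at the stop, none at the cutoff.
    distance_factor = max(0.0, 1.0 - route.distance_km / MAX_DISTANCE_KM)
    return route.trips_per_week * MODE_WEIGHT[route.mode] * distance_factor


def raw_transit_value(routes: list[Route]) -> float:
    """Sum the contributions of every nearby route."""
    return sum(route_value(r) for r in routes)


# Two invented locations to compare.
downtown = [Route("rail", 700, 0.2), Route("bus", 1000, 0.1)]
suburb = [Route("bus", 100, 0.5)]

print("downtown:", round(raw_transit_value(downtown), 1))
print("suburb:", round(raw_transit_value(suburb), 1))
```

Even with invented weights, the sketch shows why a spot near frequent rail and bus service ends up far ahead of one with a single infrequent bus line.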

Front Seat is a Seattle-based civic software company. They make money off ads and a pro version of the Walk Score API. Transit Score was funded by a grant from The Rockefeller Foundation. Their many projects back up the claim of being civic-oriented.

Transit Score is a great example of why government agencies should open their data. Citizens can make better decisions when they have the data.
