April 21 2011

Data News: Week in Review

The Where 2.0 Conference was held April 19-21 in Santa Clara, Calif., so it's no surprise there were plenty of location-based developments in the data space this week. Here are a few of the data stories, place-based and otherwise, that caught my eye.

Your iPhone tracks your location

On Wednesday, Pete Warden and Alasdair Allan made headlines with their discovery of an iPhone file that tracks its owner's location. The iPhone appears to use cell-tower triangulation to periodically record the user's latitude and longitude, storing the data in a file that lives on the phone and is copied to the user's computer whenever the device is synced.

According to their research, the file appears to have arrived with the iOS 4 update, since that is when the recordings begin. The file's existence raises questions (what does Apple plan to do with this data?), but more disconcerting may be that the file is unencrypted, leaving this trove of location data stored locally but unprotected. Apple doesn't appear to transmit the data, but according to Warden and Allan, no other device seems to keep a comparable file.

Privacy and security questions aside, the data is quite compelling, thanks in no small part to iPhone Tracker, the tool Warden and Allan built to read this file on a user's computer and visualize the owner's movements. Your phone has been surreptitiously tracking you, but the resulting maps replay a fascinating and fairly accurate record of where you've traveled since June 2010.

Crowdsourced data versus "real statistics"

Ushahidi co-founder Erik Hersman wrote a strong defense of crowdsourced data this week in his post, "The Immediacy of the Crowd." It was a response to a post that appeared last month on the blog of the social enterprise organization Benetech. The title of the Benetech post, "Crowdsourced data is not a substitute for real statistics," probably makes it immediately clear why Ushahidi would object.

The Benetech post (along with a subsequent Fast Company article) suggests that crowdsourced data from mobile phones and SMS can "lead rescue teams in the wrong direction" and that such data may be unsuitable for statistical analysis or modeling.

On one hand, this is an interesting and important academic debate. Which is more useful, crowdsourced data or statistical patterns? Are there patterns in crowdsourced data that we can use, either in aggregate or as real-time predictions?

But the back and forth between the blogs, as Hersman observes in his post, overlooks an important element: Crisis response is messy and hardly a "clinical environment where we all get to sit back, sift data and take our time to make a decision."

U.S. Senate finally releases its financial data ... in PDF

It's been almost two years since the U.S. Senate agreed to make the official record of its expenditures publicly available online. This week the Senate finally revealed its plan for releasing the information: according to the Sunlight Foundation, the Senate will begin releasing records in November, covering the period from April to September.

But the data will be in PDF format. As the Sunlight Foundation notes with dismay:

The legislation was rather clearly intended to create the release of actual data, not data in the difficult-to-reuse form of a paper document. Unfortunately, PDF documents can meet the standard of searchable (as long as the text is exposed), and itemized (if the items are listed), so the Senate is getting by on a technicality, and reaching for the lowest common denominator.

How do we demand more accessible, structured datasets? Or, how do we challenge the PDF?

Got data news?

Feel free to email me.


April 14 2011

Data News: Week in Review

Here are a few of the data stories that caught my eye this week.

Your personal data analyzed (at a genetic level)

Personal genomics company 23andMe made its DNA test available for free (sort of) on Tuesday. Want to know if you have the genetic markers that may predispose you to heart disease, alcoholism, or breast cancer? A free test is hard to pass up.

This up-close look at your personal genetic data does come with strings attached: you have to sign up for 23andMe's $9-a-month Personal Genome Service, which works out to $108 over a year (12 × $9), so the "free" test still costs more than $100 a year.

Nevertheless, this latest push is another win for 23andMe, a Silicon Valley startup that offers DNA analysis as a retail product, not simply a medical service. That's an important distinction. By giving this data to "consumers," and not just to "patients," 23andMe signals a shift in the way we think about our medical information and our personal, chromosomal data. It also raises some big questions: Has our genomic data become a commodity? And if so, how much control do we have over its access, its sale, and any potential profit?

Hacking education with data

According to a recent Brookings Institution survey, Americans want more data about their local schools. But despite the best efforts of open data projects, that information is still quite limited: census data, test scores, and the like.

The situation could improve with the announcement that the education non-profit DonorsChoose is opening its data to developers for a Hacking Education contest. DonorsChoose, which acts as a Kickstarter of sorts for education, gives teachers a platform to pitch their projects and their classroom needs. Some 165,000 teachers in more than 43,000 public schools have submitted 300,000-plus projects, and in turn have inspired around $80 million in charitable giving.

All that data (the types of projects, the amount of funding, the resources requested, the types of schools, donors' search strings, and donors' financial commitments) is being made available through the DonorsChoose contest. In addition to analyses of the data, the non-profit is also seeking developers to build apps on its API.

The grand prize? A trophy. But it's awarded by Stephen Colbert and includes tickets for you and three friends to see a taping of "The Colbert Report."

Cloudera releases a new version of Hadoop

Cloudera, one of the primary contributors to Apache Hadoop, released a new version of its Hadoop distribution this week. Version 3 (CDH3) contains more than 1,000 patches and changes, many of which will be contributed back to the open source Hadoop project.

While Hadoop itself is free and open source, Cloudera makes its money by selling enterprise support. Much of the coverage of this latest version focused on Cloudera's position as the leader in this space. GigaOm's Derrick Harris writes:

CDH3 is a big reason that, despite a recent spate of Hadoop-based big data products either on the market or about to be there, Cloudera says it isn't sweating all the new competition. Another is that Cloudera doesn't think competitive vendors have what it takes to cut into Cloudera's business.

Got data news?

Suggestions and stories are always welcome, so feel free to contact me with ideas.
