
May 22 2012

Quantified me

For some reason I have an aversion to the quantified self terminology. I guess I'm suspicious of excessive overt tracking of stuff that I hope to make into unconscious habit. It probably goes back to when I used to be a runner. I ran a couple of marathons and I would of course log every run and used upcoming races to motivate my training. I ran with a pulse monitor and used the real-time feedback to adjust my pace to the intention of each training session.

I was incredibly disciplined about my training right up until I stopped improving. Once I plateaued I just couldn't stick with it. I experienced a similar pattern with biking, rowing, yoga, and everything else I tried. Train hard, track everything, plateau, quit.

Then a few years ago I read about a study that looked at motivation and it made the point that sometimes leaving things open-ended actually improves our ability to stick with it. I've been looking for that study for two years but can't find it again. It has stuck in my head though and fundamentally changed how I think about things. It's made me much more skeptical of the value of competitions and other goals in achieving long-term fitness. And something is different for me now because I've been doing CrossFit for three years without quitting. Of course, it might just be that I haven't plateaued yet. But I also think nurturing an open-ended mindset has helped.

Having plateaued and quit so many times I guess I'm just skeptical of the value of tracking the minutiae of my exercise life. I wouldn't have known I plateaued if I hadn't tracked the data after all.

So not too long ago when Sara Winge forwarded me a link to an article on the "datasexual" with the subject line "You've been memed" I was taken aback. "Me? I don't track stuff. I don't own a Fitbit. In fact, I'm a huge skeptic of the value of all this stuff. To me it seems too much like putting the cart of technology before the horse of just doing the work." But then I thought about it honestly and I had to admit it. Who am I kidding? I'm an obsessive tracker.

I track every CrossFit workout on Beyond The Whiteboard. I started a paleo / ancestral health diet in December and I use a kitchen scale to measure portions. I kept a journal of every meal for three months and when that got cumbersome I started taking a picture of each meal with my phone. I do it to encourage consciousness of what I'm eating and to make sure I'm keeping my macronutrient balance where it should be. I weigh myself at least three times each week and log weight, waist, and neck measurements each time to estimate body fat.
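The post doesn't say which formula turns those measurements into a body fat estimate. One common choice that uses waist and neck is the U.S. Navy circumference method, sketched below for men. Treat it as an assumption rather than the author's method: it also needs height, which the post never mentions, and the function names are made up for illustration.

```python
import math


def navy_body_fat_pct_male(waist_cm: float, neck_cm: float, height_cm: float) -> float:
    """Estimate body fat percentage for men with the U.S. Navy circumference method.

    Assumption: this is one widely used formula, not necessarily the one
    the author relies on. All inputs are in centimeters.
    """
    if waist_cm <= neck_cm:
        raise ValueError("waist must be larger than neck")
    density = (
        1.0324
        - 0.19077 * math.log10(waist_cm - neck_cm)
        + 0.15456 * math.log10(height_cm)
    )
    return 495.0 / density - 450.0


def fat_mass_kg(weight_kg: float, body_fat_pct: float) -> float:
    """Body weight only enters when converting the percentage into fat mass."""
    return weight_kg * body_fat_pct / 100.0
```

Logging the same three numbers a few times a week, as described above, would give a trend line rather than a precise reading; circumference formulas are rough estimates.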

Quantified data

Not too long ago, after I rowed what felt like a fast 2k during a CrossFit workout, I dug up my old logs from the '90s to see how it compared to the twenty-something me (slower of course, but not awful). I still had those logs and knew where to find them.

From there it gets more obsessive. Once I changed my eating habits I started getting a full lipid panel and other tests every three months to assess the impact of my new high fat / low carb diet (I get over 2/3 of calories from fats now). The next time around I plan to add tests for inflammation markers and a few other things.
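The "over 2/3 of calories from fats" figure is simple arithmetic once a day's food log is expressed in grams. A minimal sketch, assuming the standard approximation of 9 kcal per gram of fat and 4 kcal per gram of protein or carbohydrate (the function name and sample numbers are hypothetical):

```python
# Approximate energy density per gram (standard rounded factors).
KCAL_PER_GRAM = {"fat": 9, "protein": 4, "carbs": 4}


def calorie_shares(grams: dict) -> dict:
    """Return each macronutrient's share of total calories (0.0 to 1.0)."""
    kcal = {name: grams.get(name, 0) * factor for name, factor in KCAL_PER_GRAM.items()}
    total = sum(kcal.values())
    if total == 0:
        raise ValueError("no calories logged")
    return {name: value / total for name, value in kcal.items()}


# Hypothetical day: 150 g fat, 100 g protein, 50 g carbs.
shares = calorie_shares({"fat": 150, "protein": 100, "carbs": 50})
print({name: round(share, 2) for name, share in shares.items()})
```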

I wasn't happy with my doctor only being able to order fasting blood sugar though, so I bought a glucometer and started monitoring my own real-time blood sugar. I measure fasting and +1, +2, and +3 hour postprandial glucose levels after various meals to evaluate my insulin response and to better tune my diet. I also occasionally measure pre- and post-workout glucose levels to optimize when to work out relative to mealtime.
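One way to compare meals from those four readings is to look at the rise above the fasting value and the area under that rise. The sketch below is an illustration, not the author's procedure: the function name is hypothetical, the sample readings are invented, and clamping dips below baseline to zero is a simplification.

```python
def glucose_response(readings):
    """Summarize one meal from (hours_after_meal, mg_dl) pairs.

    The reading at hour 0 is treated as the fasting baseline.
    Returns the peak rise, when it occurred, and the incremental
    area under the curve (trapezoid rule, in mg/dL * hours).
    """
    readings = sorted(readings)
    baseline = readings[0][1]
    # Rise above baseline at each time point; dips below baseline count as 0.
    rises = [(hour, max(level - baseline, 0)) for hour, level in readings]
    incremental_auc = sum(
        (t2 - t1) * (r1 + r2) / 2
        for (t1, r1), (t2, r2) in zip(rises, rises[1:])
    )
    peak_hour, peak_rise = max(rises, key=lambda point: point[1])
    return {
        "baseline": baseline,
        "peak_rise": peak_rise,
        "peak_hour": peak_hour,
        "incremental_auc": incremental_auc,
    }


# Invented example: fasting, then +1, +2, +3 hours after a meal.
print(glucose_response([(0, 85), (1, 140), (2, 110), (3, 90)]))
```

Running the same summary over several logged meals would make it easier to see which ones produce the largest or longest excursions.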

Periodic at-home A1c tests verify that my long-term glucose levels are in keeping with what I'm measuring in real time, both as a cross-check on test accuracy and to help me interpret the short-term results. Oh, and I ordered a 23andMe test kit to see (among other things) if I have any genetic predisposition to diabetes.
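The cross-check between A1c and meter readings can be sketched with the commonly cited linear conversion from A1c to estimated average glucose (eAG = 28.7 x A1c - 46.7, in mg/dL). This is an assumption about how such a comparison might be done, with hypothetical function names; a handful of spot readings is not a true average, so a gap between the two numbers is expected.

```python
from statistics import mean


def estimated_average_glucose(a1c_percent: float) -> float:
    """Convert an A1c percentage to estimated average glucose in mg/dL."""
    return 28.7 * a1c_percent - 46.7


def meter_vs_a1c(meter_readings_mg_dl, a1c_percent: float) -> float:
    """Difference between the mean meter reading and the A1c-derived estimate.

    Positive means the meter readings run higher than the A1c suggests.
    """
    return mean(meter_readings_mg_dl) - estimated_average_glucose(a1c_percent)


# Invented example: an A1c of 5.0% and three spot readings.
print(round(estimated_average_glucose(5.0), 1))
print(round(meter_vs_a1c([90, 100, 110], 5.0), 1))
```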

So, I guess I have to admit it. Quantifying the self isn't just something other people do, it's something I do. Yet I remain a skeptic.

The line I'm trying to walk is between obsessive tracking that results in post-plateau burnout and using tracking to maintain awareness and intention while trying to remain open ended. "Maybe I'll work out today." "Maybe I'll lose a few pounds, or maybe I'll gain a few." But at the same time I want to take advantage of the awareness that comes from tracking. More importantly, I want to know what the data says about how healthy I am. A degradation in insulin response wouldn't just be a problem with a plateauing exercise program after all, it would have major long-term health impact.


April 01 2012

What is smart disclosure?

Citizens generate an enormous amount of economically valuable data through interactions with companies and government. Earlier this year, a report from the World Economic Forum and McKinsey Consulting described the emergence of personal data as a new asset class. The value created from such data does not, however, always go to the benefit of consumers, particularly when third parties collect it, separating people from their personal data.

The emergence of new technologies and government policies has provided an opportunity to both empower consumers and create new markets from "smarter disclosure" of this personal data. Smart disclosure is when a private company or government agency provides people with periodic access to their own data in open formats that enable them to easily put the data to use. Specifically, smart disclosure refers to the timely release of data in standardized, machine-readable formats in ways that enable consumers to make better decisions about finance, healthcare, energy or other contexts.

Smart disclosure is "a new tool that helps provide consumers with greater access to the information they need to make informed choices," wrote Cass Sunstein, the U.S. administrator of the White House Office of Information and Regulatory Affairs (OIRA), in a post on smart disclosure on the White House blog. Sunstein delivered a keynote address at the White House Summit on smart disclosure at the U.S. National Archives on Friday. He authored a memorandum providing guidance on smart disclosure from OIRA in September 2011.

Smart disclosure is part of the final United States National Action Plan for its participation in the Open Government Partnership. Speaking at the launch of the Open Government Partnership in New York City last September, the president specifically referred to the role of smart disclosure in the United States:

"We’ve developed new tools -- called 'smart disclosures' -- so that the data we make public can help people make health care choices, help small businesses innovate, and help scientists achieve new breakthroughs," said President Obama. "We’ve been promoting greater disclosure of government information, empowering citizens with new ways to participate in their democracy," he said. "We are releasing more data in usable forms on health and safety and the environment, because information is power, and helping people make informed decisions and entrepreneurs turn data into new products, they create new jobs."

In the months since the announcement, the U.S. National Science and Technology Council established a smart disclosure task force dedicated to promoting better policies and implementation across government.

"In many contexts, the federal government uses disclosure as a way to ensure that consumers know what they are purchasing and are able to compare alternatives," wrote Sunstein at the White House blog. "Consider nutrition facts labels, the newly designed automobile fuel economy labels, and  Modern technologies are giving rise to a series of new possibilities for promoting informed decisions."

Smart disclosure is a "case of the Administration asking agencies to focus on making available high value data (as distinct from traditional transparency and accountability data) for purposes other than decreasing corruption in government," wrote New York Law School professor Beth Noveck, the former U.S. deputy chief technology officer for open government, in an email. "It starts from the premise that consumers, when given access to information and useful decision tools built by third parties using that information, can self-regulate and stand on a more level playing field with companies who otherwise seek to obfuscate." The choice of Todd Park as United States CTO also sends a message about the importance of smart disclosure to the administration, she said.

The United Kingdom's “midata” initiative is an important smart disclosure case study outside of the United States. Progress there has come in large part because the UK, unlike the United States, has a privacy law that gives citizens the right to access their personal data held by private companies. In the UK, however, companies have been complying with the law in a way that did not realize the real potential value of that right to data, which is to say that a citizen could request personal data and it would arrive in the mail weeks later at a cost of a few dozen pounds. The UK government has launched a voluntary public-private partnership to enable companies to comply with the law by making the data available online in open formats. The recent introduction of the Consumer Privacy Bill of Rights from the White House and Privacy Report from the FTC suggests that such rights to personal data ownership might be negotiated, in principle, much as a right to credit reports has been in the past.

Four categories of smart disclosure

One of the most powerful versions of smart disclosure is when data on products or services (including pricing algorithms, quality, and features) is combined with personal data (like customer usage history, credit score, health, energy and education data) into "choice engines" (like search engines, interactive maps or mobile applications) that enable consumers to make better decisions in context, at the point of a buying or contractual decision. There are four broad categories where smart disclosure applies:

  1. When government releases data about products or services. For instance, when the Department of Health and Human Services releases hospital quality ratings, the Securities and Exchange Commission releases public company financial filings in machine-readable formats, or the Department of Education puts data about more than 7,000 institutions online in a College Navigator for prospective students.
  2. When government releases personal data about a citizen. For instance, when the Department of Veterans Affairs gives veterans access to health records using the "Blue Button" or the IRS provides citizens with online access to their electronic tax transcript. The work of BrightScope liberating financial advisor data and 401(k) data has been an early signal of how data drives the innovation economy.
  3. When a private company releases information about products or services in machine readable formats. Entrepreneurs can then use that data to empower consumers. For instance, both and Hello Wallet may enhance consumer finance decisions.
  4. When a private company releases personal data about usage to a citizen. For instance, when a power utility company provides a household access to its energy usage data through the Green Button or when banks allow customers to download their transaction histories in a machine-readable format to use at or similar services. As with the Blue Button for healthcare data and consumer finance, the White House asserts that providing energy consumers with secure access to information about energy usage will increase innovation in the sector and empower citizens with more information.
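To make the fourth category concrete, here is a minimal sketch of what a consumer-side tool could do with a downloaded transaction history. The CSV layout (date, description, category, amount) is a hypothetical format invented for this example, not a standard any bank is known to publish.

```python
import csv
import io
from collections import defaultdict

# Hypothetical export format; real bank downloads vary widely.
SAMPLE_CSV = """date,description,category,amount
2012-03-01,Grocery store,food,-54.20
2012-03-02,Electric utility,energy,-81.00
2012-03-05,Cafe,food,-6.80
2012-03-15,Paycheck,income,1500.00
"""


def spending_by_category(csv_text: str) -> dict:
    """Total outgoing amounts per category from a machine-readable export."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = float(row["amount"])
        if amount < 0:  # negative amounts are treated as spending
            totals[row["category"]] += -amount
    return dict(totals)


print(spending_by_category(SAMPLE_CSV))
```

The point of a standardized format is that a few lines like these work for every provider, instead of requiring a separate scraper for each one.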

An expanding colorwheel of buttons

Should smart disclosure initiatives continue to gather steam, citizens could see “Blue Button”-like and "Green Button"-like solutions for every kind of data government or industry collects about citizens. For example, the Department of Defense has military training and experience records. Social Security and the Internal Revenue Service have the financial history of citizens, such as earnings and income. The Department of Veterans Affairs and Centers for Medicare and Medicaid Services have personal health records.

More "Green Button"-like mechanisms could enable secure, private access to the data that private industry collects about citizen services. The latter could include mobile phone bills, credit card fees, mortgage disclosures, mutual fund fees and more, except where there are legal restrictions, such as for national security reasons.

Earlier this year, influential venture capitalist Fred Wilson encouraged entrepreneurs and VCs to get behind open data. Writing on his widely read blog, Wilson urged developers to adopt the Green Button.

"This is the kind of innovation that gets me excited," Wilson wrote. "The Green Button is like OAuth for energy data. It is a simple standard that the utilities can implement on one side and web/mobile developers can implement on the other side. And the result is a ton of information sharing about energy consumption and in all likelihood energy savings that result from more informed consumers."

When citizens gain access to data and put it to work, they can tap it to make better choices about everything from finance to healthcare to real estate, much in the same way that Web applications like Hipmunk and Zillow let consumers make more informed decisions.

"I'm a big fan of simplicity and open standards to unleash a lot of innovation," wrote Wilson. "APIs and open data aren't always simple concepts for end users. Green Buttons and Blue Buttons are pretty simple concepts that most consumers will understand. I'm hoping we soon see Yellow Buttons, Red Buttons, Purple Buttons, and Orange Buttons too. Let's get behind these open data initiatives. Let's build them into our apps. And let's pressure our hospitals, utilities, and other institutions to support them."

The next generation of open data is personal data, wrote open government analyst David Eaves this month:

I would love to see the blue button and green button initiative spread to companies and jurisdictions outside the United States. There is no reason why for example there cannot be Blue Buttons on the Provincial Health Care website in Canada, or the UK. Nor is there any reason why provincial energy corporations like BC Hydro or Bullfrog Energy (there's a progressive company that would get this) couldn't implement the Green Button. Doing so would enable Canadian software developers to create applications that could use this data and help citizens and tap into the US market. Conversely, Canadian citizens could tap into applications created in the US.

The opportunity here is huge. Not only could this revolutionize citizens' access to their own health and energy consumption data, it would reduce the costs of sharing health care records, which in turn could potentially create savings for the industry at large.

Data drives consumer finance innovation

Despite recent headlines about the Green Button and the household energy data market, the biggest US smart disclosure story of this type is currently consumer finance, where there is already significant private sector activity.

For instance, if a consumer visits, you can get personalized recommendations for a cheaper cell phone plan based on your calling history. will make specific recommendations on how to save (and alternative products to use) based on an analysis of the accounts it is pulling data from. Hello Wallet is enabled by smart disclosure by banks and government data. The sector's success hints at the innovation that's possible when people get open, portable access to their personal data in a consumer market of sufficient size and value to attract entrepreneurial activity.
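The cell phone example is the basic shape of a "choice engine": product data (plan pricing) combined with personal data (usage history) to rank options. A minimal sketch follows, with an invented pricing model and made-up plans; no real service's algorithm is being described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical plan model: flat fee plus per-minute overage."""
    name: str
    monthly_fee: float
    included_minutes: int
    overage_per_minute: float


def monthly_cost(plan: Plan, minutes_used: int) -> float:
    """Cost of one month on a plan given actual minutes used."""
    extra = max(minutes_used - plan.included_minutes, 0)
    return plan.monthly_fee + extra * plan.overage_per_minute


def cheapest_plan(plans, usage_history):
    """Replay the consumer's own history against each plan and pick the cheapest."""
    def total(plan):
        return sum(monthly_cost(plan, minutes) for minutes in usage_history)

    best = min(plans, key=total)
    return best, total(best)


plans = [
    Plan("Basic", 30.0, 300, 0.25),
    Plan("Plus", 50.0, 700, 0.25),
]
history = [250, 520, 610]  # minutes used in each of the last three months
best, cost = cheapest_plan(plans, history)
print(best.name, cost)
```

Without access to the usage history, the same comparison has to rely on the consumer's guess about their own behavior, which is the gap smart disclosure is meant to close.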

Such innovation is enabled in part because entrepreneurs and developers can go directly to data aggregation intermediaries like Yodlee or CashEdge and license the data, meaning that they do not have to strike deals directly with each of the private companies or build their own screen scraping technology, although some do go it alone.

"How do people actually make decisions?  How can data help improve those decisions in complex markets?  Research questions like these in behavioral economics are priorities for both the Russell Sage Foundation and the Alfred P. Sloan Foundation," said Daniel Goroff, a Sloan Program Director, in an interview yesterday.  "That's why we are launching a 'Smart Disclosure Research and Demonstration Design Competition.'  If you have ideas and want to win a prize,  please send a short essay.  Even if you are not in a position to carry out the work, we are especially interested in finding and funding projects that can help measure the costs and benefits of existing or novel 'choice engines.'" 

What is the future of smart disclosure?

This kind of vibrant innovation could spread to many other sectors, like energy, health, education, telecommunication, food and nutrition, if relevant data were liberated. The Green Button is an early signal in this area, with the potential to spread to 27 million households around the United States. The Blue Button, with over 800,000 current users, is spreading to private health plans like Aetna and Walgreens, with the potential to spread to 21 million users.

Despite an increasing number of powerful tools that enable data journalists and scientists to interrogate data, many of even the most literate consumers do not look at data themselves, particularly if it is in machine-readable, as opposed to human-readable, formats. Instead, they digest it from ratings agencies, consumer reports and guides to the best services or products in a given area. Increasingly, entrepreneurs are combining data with applications, algorithms and improved user interfaces to provide consumers with "choice engines."

As Tim O'Reilly outlined in his keynote speech yesterday, the future of smart disclosure includes more than quarterly data disclosure from the SEC or banks. If you're really lining up with the future, you have to think about real-time data and real-time data systems, he said. Tim outlined 10 key lessons in his presentation, an annotated version of which is embedded below.

The Future of Smart Disclosure (pdf)

When released through smart disclosure, data resembles a classic "public good" in a broader economic sense. Disclosures of such open data in a useful format are currently under-produced by the marketplace, suggesting a potential role for government in the facilitation of its release. Generally, consumers do not have access to it today.

Well over a century ago, President Lincoln said that "the legitimate object of government is to do for the people what needs to be done, but which they cannot by individual effort do at all, or do so well, for themselves." The thesis behind smart disclosure in the 21st century is that when consumers have access to that personal data and the market creates new tools to put it to work, citizens will be empowered to make economic, education and lifestyle choices that enable them to live healthier, wealthier, and -- in the most aspirational sense -- happier lives.

"Moving the government into the 21st century should be applauded," wrote Richard Thaler, an economics professor at the University of Chicago, in the New York Times last year. In a time when so many citizens are struggling with economic woes, unemployment and the high costs of energy, education and healthcare, better tools that help them invest in and benefit from personal data are sorely needed.


March 15 2012

Strata Week: Infographics for all

Here are some of the data stories that caught my attention this week.

More infographics incoming, thanks to Create

The visualization site launched a new tool this week that helps users create their own infographics. Aptly called Create, the new feature lets people take publicly available datasets (such as information from a Twitter hashtag), select a template, and publish their own infographics.
Segment from a Create infographic of the #strataconf hashtag.

As GigaOm's Derrick Harris observes, it's fairly easy to spot the limitations of this service: in the data you can use, in the templates that are available, and in the visualizations that are created. But after talking to the site's co-founder and Chief Content Officer Lee Sherman about some "serious customization options" that are in the works, Harris wonders if a tool like this could spark interest in data science:

"The problem is that we need more people with math skills to meet growing employer demand for data scientists and data analysts. But how do you get started caring about data in the first place when the barriers are so high? Really working with data requires a deep understanding of both math and statistics, and Excel isn't exactly a barrel of monkeys (nor are the charts it creates)."

Could a tool like this be an on-ramp for more folks to start caring about and playing with data?

San Francisco upgrades its open data initiative

Late last week, San Francisco Mayor Ed Lee unveiled a new cloud-based open data website that will replace the city's earlier site, one of the earliest examples of civic open data initiatives.


"By making City data more accessible to the public secures San Francisco's future as the world's first 2.0 City," said Lee in an announcement. "It's only natural that we move our Open Data platform to the cloud and adopt modern open interface to facilitate that flow and access to information and develop better tools to enhance City services."

The city's Chief Innovation Officer Jay Nath told TechCrunch that the update to the website expands access to information while saving the city money.

The new site contains some 175 datasets, including map-based crime data, active business listings, and various financial datasets. It's powered by the Seattle-based data startup Socrata.

The personal analytics of Stephen Wolfram

"One day I'm sure everyone will routinely collect all sorts of data about themselves," writes Mathematica and Wolfram Alpha creator Stephen Wolfram. "But because I've been interested in data for a very long time, I started doing this long ago. I actually assumed lots of other people were doing it too, but apparently they were not. And so now I have what is probably one of the world's largest collections of personal data."

And what a fascinating collection of data it is, including emails received and sent, phone calls made, calendar events planned, keystrokes made, and steps taken. Through this, you can see Wolfram's sleep, social, and work patterns, and even how various chapters of his book and Mathematica projects took shape.

"The overall pattern is fairly clear," Wolfram writes. "It's meetings and collaborative work during the day, a dinnertime break, more meetings and collaborative work, and then in the later evening more work on my own. I have to say that looking at all this data, I am struck by how shockingly regular many aspects of it are. But in general, I am happy to see it. For my consistent experience has been that the more routine I can make the basic practical aspects of my life, the more I am able to be energetic — and spontaneous — about intellectual and other things."

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20

Got data news?

Feel free to email me.


February 09 2012

Strata Week: Your personal automated data scientist

Here are a few of the data stories that caught my attention this week:

Wolfram|Alpha Pro: An on-call data scientist

The computational knowledge engine Wolfram|Alpha unveiled a pro version this week. For $4.99 per month ($2.99 for students), Wolfram|Alpha Pro offers access to more of the computational power "under the hood" of the site, in part by allowing users to upload their own datasets, which Wolfram|Alpha will in turn analyze.

This includes:

  • Text files — Wolfram|Alpha will respond with the character and word count, provide an estimate on how long it would take to read aloud, and reveal the most common word, average sentence length and more.
  • Spreadsheets — It will crunch the numbers and return a variety of statistics and graphs.
  • Image files — It will analyze the image's dimensions, size, and colors, and let you apply several different filters.

Wolfram|Alpha Pro subscribers can upload and analyze their own datasets.
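The text-file statistics listed above are easy to approximate by hand, which gives a feel for what such an automated first pass involves. This sketch is not how Wolfram|Alpha computes anything; the function name, the tokenizing rules, and the 150 words-per-minute speaking rate are all assumptions made for illustration.

```python
import re
from collections import Counter


def text_stats(text: str, words_per_minute: int = 150) -> dict:
    """Rough text summary: counts, most common word, sentence length, read-aloud time."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Treat runs of ., ! or ? as sentence boundaries (a crude heuristic).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    most_common = Counter(words).most_common(1)[0][0] if words else None
    return {
        "characters": len(text),
        "words": len(words),
        "most_common_word": most_common,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "read_aloud_minutes": len(words) / words_per_minute,
    }


print(text_stats("The cat sat. The cat ran fast!"))
```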

There's also a new extended keyboard that contains the Greek alphabet and other special characters for manually entering data. Data and analysis from these entries and any queries can also be downloaded.

"In a sense," writes Wolfram's founder Stephen Wolfram, "the concept is to imagine what a good data scientist would do if confronted with your data, then just immediately and automatically do that — and show you the results."

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Crisis-mapping and data protection standards

Ushahidi's Patrick Meier takes a look at the recently released Data Protection Manual issued by the International Organization for Migration (IOM). According to the IOM, the manual is meant to serve as a guide to help:

" ... protect the personal data of the migrants in its care. It follows concerns about the general increase in data theft and loss and the recognition that hackers are finding ever more sophisticated ways of breaking into personal files. The IOM Data Protection Manual aims to protect the integrity and confidentiality of personal data and to prevent inappropriate disclosure."

Meier describes the manual as "required reading" but notes that there is no mention of social media in the 150-page document. "This is perfectly understandable given IOM's work," he writes, "but there is no denying that disaster-affected communities are becoming more digitally-enabled — and thus, increasingly the source of important, user-generated information."

Meier moves through the Data Protection Manual's principles, highlighting the ones that may be challenged when it comes to user-generated, crowdsourced data and raising important questions about consent, privacy, and security.

Doubting the dating industry's algorithms

Many online dating websites claim that their algorithms are able to help match singles with their perfect mate. But a forthcoming article in "Psychological Science in the Public Interest," a journal of the Association for Psychological Science, casts some doubt on the data science of dating.

According to the article's lead author Eli Finkel, associate professor of social psychology at Northwestern University, "there is no compelling evidence that any online dating matching algorithm actually works." Finkel argues that dating sites' algorithms do not "adhere to the standards of science," and adds that "it is unlikely that their algorithms can work, even in principle, given the limitations of the sorts of matching procedures that these sites use."

It's "relationship science" versus the in-take questions that most dating sites ask in order to help users create their profiles and suggest matches. Finkel and his coauthors note that some of the strongest predictors for good relationships — such as how couples interact under pressure — aren't assessed by dating sites.

The paper calls for the creation of a panel to grade the scientific credibility of each online dating site.

Got data news?

Feel free to email me.


January 11 2012

The rise of programmable self

Programmable self is a riff on the Quantified Self (QS). It is a simple concept:

Quantify what you want to change about yourself + motivational hacks = personal change success.

There are several potential "motivation hacks" that people regularly employ. The simplest of these is peer pressure. You could tell all of your co-workers every morning whether you kept your diet last night, for instance. Lots of research has shown that sort of thing is an effective motivator for change. Of course, you can make peer pressure digital by doing the same thing on Facebook/Twitter/Google+/whatever. Peer pressure has two components: shame and praise. It's motivating to avoid shame and to get praise. Do it because of a tweet and voilà, you have digital peer pressure motivation.

Several books have recently popularized using money, in one form or another, as a motivational tool. There is some evidence, for instance, that people feel worse about losing $10 than they feel good about earning $10. This is called loss aversion, and it can easily be turned into a motivational hack. Having trouble finishing that book? Give 10 envelopes with $100 each to your best friend. Instruct them to mail an envelope to your favorite (or most hated) charity for each month that you do not finish a chapter. Essentially, you've made your friend a "referee" of your motivational hack.

So, is there any potential to automate this process? To use software to hack your own motivation? One of the coolest applications that does just that is, which is designed to electronically manage contracts you make with yourself.

But that, by itself, is not programmable self.

Programmable self is the combination of a digital motivation hack, like Stickk, with a digital system that tracks behavior, like Fitbit (that's the Quantified Self part). You have to have both. Recently, for example, Stickk started supporting the use of the Withings Scale to support weight entries. Withings is a Wi-Fi-enabled scale that broadcasts your weight automagically to the Withings servers. From there, Withings will send your weight generally wherever you want: HealthVault, other personal health record (PHR) systems, or over to With that feature, Stickk became a programmable-self platform.
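The combination described here, tracked data feeding a commitment contract, can be sketched in a few lines. Everything below is hypothetical: the class, the linear weekly targets, and the per-week stake are invented for illustration and do not reflect how Stickk or Withings actually work.

```python
from dataclasses import dataclass


@dataclass
class WeightCommitment:
    """A made-up contract: hit a linear weekly target or forfeit that week's stake."""
    start_weight: float
    goal_weight: float
    weeks: int
    stake_per_week: float

    def target_for_week(self, week: int) -> float:
        step = (self.start_weight - self.goal_weight) / self.weeks
        return self.start_weight - step * week


def settle(commitment: WeightCommitment, weekly_weights) -> float:
    """Return the total stake forfeited, given one tracked weigh-in per week.

    The weigh-ins stand in for data arriving automatically from a
    connected scale (the Quantified Self half of the loop).
    """
    forfeited = 0.0
    for week, weight in enumerate(weekly_weights, start=1):
        if weight > commitment.target_for_week(week):
            forfeited += commitment.stake_per_week  # the motivation hack half
    return forfeited


contract = WeightCommitment(start_weight=200.0, goal_weight=190.0, weeks=5, stake_per_week=10.0)
print(settle(contract, [197.5, 196.5, 193.0, 192.0, 191.0]))
```

The design point is that no human referee is needed: once the measurements arrive automatically, the consequence can be applied automatically too.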

Stickk is pretty old, and Lose It or Lose It, which is focused specifically on losing weight, is also ancient in Internet time. It launched in 2009. The site requires you to take a picture of a weekly weigh-in (you actually photograph the scale) and send it in. That counts as digital tracking, but I wonder if it supports Withings (or if it will).

In October 2011, Beeminder launched, billing itself as a direct Stickk competitor, but "for data geeks." Indeed, it is a little geeky: Beeminder is focused on weight change and other goals that are numerically similar to weight gain. The notion is that there is a proper path for the improvement of certain numbers — as well as a little "data jitter" to eliminate — in order to improve. Beeminder also refers to the classical term for the lack of self discipline: akrasia — so bonus points for that.

Last November, The Eatery launched from Massive Health. Massive Health is a massively funded dream team, and their first app is a classic programmable-self experiment. You simply take pictures of your food with your camera (digital tracking = photos) and let others rate your food choices (motivation hack = praise/shame). It's a good idea, and you can expect lots more from Massive Health that qualifies as programmable self.

Recently, GymPact made a big splash, even ending up in a New York Times blog post. GymPact is an iOS (soon Android) app that lets you check in at the gym. If you fail to check in, you get charged a fee. If you do keep your commitment to go to the gym, then you also earn some of the money from all of the people who failed to go to the gym.
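The pooled-penalty idea can be illustrated with a toy settlement. The rules here are assumptions made for the sketch (a flat fee per missed day, the whole pool split evenly among members who met their commitment, no cut for the operator); the post does not describe GymPact's actual payout formula.

```python
def settle_week(members: dict, fee_per_miss: float) -> dict:
    """Compute each member's net result for one week.

    members maps a name to (committed_days, checked_in_days).
    Negative results are charges; positive results are rewards.
    """
    charges = {
        name: max(committed - done, 0) * fee_per_miss
        for name, (committed, done) in members.items()
    }
    pool = sum(charges.values())
    keepers = [name for name, (committed, done) in members.items() if done >= committed]
    reward = pool / len(keepers) if keepers else 0.0
    return {
        name: (reward if name in keepers else 0.0) - charges[name]
        for name in members
    }


# Invented example: Bob misses two of his three committed days.
print(settle_week({"ann": (3, 3), "bob": (3, 1), "cy": (4, 4)}, fee_per_miss=5.0))
```

This pairs the loss-aversion stick with a praise-like carrot: the money lost by those who skip becomes the reward for those who show up.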

Finally, Buster Benson and Jen S. McCabe are working on, which might be the first of the programmable-self platform plays.

All of these count as programmable self. I seriously doubt that any of these companies were aware of my original interview about programmable self or would even be comfortable with the term, which sounds pretty geeky and devious. (Which is, of course, why I love it.)

Other friends of mine in the serious games/games for health/gamification movement would probably count as programmable self, too. But some of them seem convinced that "fun" can have a deeper component in motivation than some of the more aggressive techniques that I, and other programmable self people, seem to favor. I should also mention that I am hardly the only one in the QS movement stumbling in this direction.

I will be writing about programmable self on Radar occasionally, but there is a lot more going on than I can track here. That's why I've also made a Tumblr about the subject and filled it with all of the "software for behavior change" goodness that anyone can take. My @fredtrotter Twitter account is mostly focused on programmable self as well.

Most importantly, I want to hear about what you have tried to do with your own personal change hacks, especially those that impact your health in one way or another. For that, I have set up a Programmable Self Google Group. Please join us. Some of the top minds in behavior change are already subscribers.

The Quantified Self movement is not primarily about the "tool creators" who make stuff for people to use, but a movement of users who defy the boundaries of tools and manage to create innovative quantification tools on their own. Many of these efforts also count as programmable-self approaches. No discussion of programmable self can ignore the work of individuals, so here is a decidedly non-exhaustive list of people innovating in this space:

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

January 06 2012

Visualization of the Week: AntiMap

A new mobile phone app, AntiMap Log, allows users to record their own data as they move around. The app uses the phone's GPS and compass sensors to capture the following data: latitude, longitude, compass direction, speed, distance, and time.

While the AntiMap Log — available for both Android and iPhone — is the data-gathering component, it's just one part of a trio of open source tools. AntiMap Simple and AntiMap Video provide the visualization and analysis components.

AntiMap Video was originally designed to help snowboarders visualize their data in real-time, synced with footage of their rides. Here's a demo video:

That same snowboarder data is also used in the following visualization:

AntiMap snowboard visualization

AntiMap describes the visualization:

Circles are used to visualise the plotted data. The color of each circle is mapped to the compass data (0˚ = black, 360˚ = white), and the size of each circle is mapped to the speed data (bigger circles = faster) ... You can see from the visualisation, during heelside turns (left) the colours are a lot whiter/brighter than toeside turns (right). The sharper/more obvious colour changes indicate either sudden turns or spins (eg. the few black rings right in the centre).
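
The mapping AntiMap describes (compass heading to gray level, speed to circle size) can be sketched as two small functions. This is a guess at the idea rather than AntiMap's own code; the function names, the 0-255 gray scale, and the radius range are assumptions.

```python
def compass_to_gray(heading_deg):
    """Map a compass heading to a gray level: 0 degrees is black (0), 360 is white (255)."""
    clamped = max(0.0, min(360.0, heading_deg))
    return round(clamped / 360.0 * 255)


def speed_to_radius(speed, max_speed, min_radius=2.0, max_radius=20.0):
    """Map speed to circle radius so that faster samples draw bigger circles."""
    if max_speed <= 0:
        return min_radius
    fraction = max(0.0, min(1.0, speed / max_speed))
    return min_radius + fraction * (max_radius - min_radius)
```

Each logged sample would then be drawn as one circle at its position, colored by `compass_to_gray` and sized by `speed_to_radius`.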

Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We're always looking for leads, so please drop a line if there's a visualization you think we should know about.

June 09 2011

Strata Week: The fears of face recognition

Here are the data stories that caught my attention this week.

Face recognition and Facebook

Face recognition technology isn't really a new Facebook feature, but until now it's only been available for U.S. users. The switch was flipped this week and face recognition made available for international users, prompting an outcry about privacy and an EU probe into the matter. The concerns involve using face recognition technology to tag users in photos without their consent.

As TechCrunch's Jason Kincaid points out, however, the fact that people can be tagged in photos without their consent happens with or without the face recognition technology:

To reiterate: the EU may conclude that Facebook users should be able to pre-approve their tags, and I don't necessarily think that would be a bad thing (I'm sick of tag spam, for one). But conflating this with the spookiness of facial recognition seems like a mistake — we should save that outcry for when companies really do start doing creepy things with the technology.

Screenshot of Facebook's "Suggest Tags" menu (user photos were edited out of this image).

Tim O'Reilly wrote here on Radar that, in fact, Facebook's strategy for rolling out face recognition technology may be just the ticket:

Face recognition is here to stay. My question is whether to pretend that it doesn't exist, and leave its use to government agencies, repressive regimes, marketing data mining firms, insurance companies, and other monolithic entities, or whether to come to grips with it as a society by making it commonplace and useful, figuring out the downsides, and regulating those downsides.

Analyzing hacked passwords

Much of the uproar around recent hacks and security breaches has focused on the weaknesses of corporate systems themselves, as well as the impact stolen data might have on customers. But software architect Troy Hunt has turned his attention to a different matter, analyzing the passwords that were stolen.

Hunt has examined the 37,000 some-odd passwords that were made available via BitTorrent, just a small section of the million or so that LulzSec claimed to have taken in its latest breach of Sony Pictures. Hunt looked at the passwords in terms of length, randomness, uniqueness, and character types — generally accepted as the standards for password entropy. In other words, the more of these variables that you have, the stronger your password.

And no surprise, he found that most passwords aren't particularly strong.

Ninety-three percent of accounts were between six and 10 characters in length, and 50% were less than eight characters. Length is only one indicator of strength, and Hunt found that less than 4% of the passwords he analyzed had three or more character types (as in, capital letters, lower case letters, numbers, and so on). Half the passwords had only one character type, and of those, 90% were all lower case letters. Furthermore, less than 1% of passwords contained a non-alphanumeric character. There were a fair number of identical passwords, with "password," "123456," and "abc123" among the most common, and 20% of the passwords in this particular batch were repeats.
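
The kind of tally behind those figures can be sketched in a few lines. This is not Hunt's actual method or code, just an illustration of counting length and character types; the function names and the four character classes are assumptions.

```python
import string


def character_types(password):
    """Return the set of character classes that appear in a password."""
    types = set()
    for ch in password:
        if ch in string.ascii_lowercase:
            types.add("lower")
        elif ch in string.ascii_uppercase:
            types.add("upper")
        elif ch in string.digits:
            types.add("digit")
        else:
            types.add("other")
    return types


def summarize(passwords):
    """Return the share of passwords that are short, varied, or single-type."""
    total = len(passwords)
    if total == 0:
        return {"short": 0.0, "three_or_more_types": 0.0, "single_type": 0.0}
    short = sum(1 for p in passwords if len(p) < 8)
    varied = sum(1 for p in passwords if len(character_types(p)) >= 3)
    single = sum(1 for p in passwords if len(character_types(p)) == 1)
    return {
        "short": short / total,
        "three_or_more_types": varied / total,
        "single_type": single / total,
    }
```

Run against a list of passwords, `summarize` gives the fractions that are under eight characters, that mix three or more character types, and that use only one.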

Just as problematic as these weak passwords, of course, is the repetition of passwords across multiple databases. Although only 88 email addresses in this batch taken from Sony Pictures can be found in a similar data-dump from the stolen Gawker email addresses, two-thirds of those people used the same password to register on both sites.

"Based on the finding above," writes Hunt, "there's a statistically good chance that the majority of them will work with other websites. How many Gmail or eBay or Facebook accounts are we holding the keys to here? And of course 'we' is a bit misleading because anyone can grab these off the net right now. Scary stuff."

While the recent exploits demonstrate some of the ongoing problems around system security, Hunt's work highlights that there are a fair number of Internet users who are still not protecting themselves.
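
The cross-site comparison amounts to intersecting two sets of email addresses and checking which of the shared accounts use the same password. A minimal sketch, assuming each dump has been loaded into a dictionary of email to password (the function name and data shape are hypothetical, and this is not Hunt's code):

```python
def reused_credentials(dump_a, dump_b):
    """Compare two breaches, each given as a dict of email -> password.

    Returns the emails present in both dumps, and the subset of those
    that used an identical password on both sites.
    """
    shared = set(dump_a) & set(dump_b)
    reused = {email for email in shared if dump_a[email] == dump_b[email]}
    return shared, reused
```

Dividing the size of the second set by the size of the first gives the reuse rate among people who appear in both breaches.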

Archival data helps game developers recreate 1940s Los Angeles

The new video game L.A. Noire was released last month to great reviews, with many praising the accuracy of the game's 1940s Los Angeles setting.

Nathan Masters explains how the game's developers contacted archivists at a number of different collections in order to piece together the data about the city. Detailed WPA maps were found at the Huntington Library. U.S. Geological Survey data and photos were used from the UCLA Department of Geography and the Spence Air Photo Collection. From the Dick Whittington and Los Angeles Examiner photography collections at USC came images of cityscapes from the era. Numerous other libraries were consulted as well.

The Atlantic's Alexis Madrigal makes the wonderful suggestion for the game makers Rockstar Games to release the model for others to study and remix.

Got data news?

Feel free to email me.


May 26 2011

Mobile apps and the quiet handling of data

The web was never designed to be personal. Until Netscape added cookies to its servers and browsers in 1994 there was no way for a web server to store data on a user's computer. In 1996 there was a bit of a ruckus in the media about the privacy implications of cookies, then everyone relaxed a bit and got used to them.

Fifteen years later, the European Union has leapt into action and is now keen to enforce legislation in this area (despite a last-minute reprieve for the UK). As cookies are clearly defined and limited in scope, they make a good attack surface for legislators.

The Internet, and mobile in particular, have moved on a bit in the last 15 years, however.

Mobile apps, scattering data

I would happily predict that even in 20 years there will not be a 100% reliable, always-on, cheap wireless broadband option.

So unless you reside in Mountain View, Calif., luxuriating in virtually unlimited mobile data connectivity, I think you're going to find living 100% on the mobile web to be a pretty miserable experience.

Conversely, it will be harder and harder to find examples of apps on mobile devices that do not benefit from connection to data networks. So, unfortunately for the legislators, the once-clear boundary between device and service continues to blur and morph.

Software and data on iPhones and other devices are going to remain smeared across devices, the open web, and various other data services. Let's look at how this currently works.

Unique Device Identifiers (UDIDs)

To track a user across multiple apps you'd need some way of putting a unique tag on each device so that no matter which app read it, you'd know you had the same person.

This is precisely what the Unique Device Identifier (UDID) number on iOS devices can do. It's easily available to the writer of an app, and it cannot practically be changed or deleted.

These UDIDs allow developers to link data collected by different apps. (Interestingly, as the UDID acronym gets bandied about it will probably become irrationally feared.) Apple forbids the sharing of this data between companies, but within a company there is no effective means of preventing this.

The Shared Keychain in iOS allows apps published by a single developer to share data if they find themselves installed on the same iOS device — no network required.

Here's a theoretical example of how this might apply to apps from an insurance company:

  • You provide your date of birth to a motor insurance app to get a quick quote.
  • A year later you download a pension calculator app from a different division of the same company.
  • The pension app already knows your age, so it can get straight down to convincing you to buy savings products.
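
To make the linkage concrete, here is a rough sketch of what the company's back end could do with records from its two apps. It is plain Python rather than anything from the iOS SDK, and the record layout, identifiers, and function name are all made up for illustration.

```python
from collections import defaultdict


def link_by_device(records):
    """Merge data reported by different apps into one profile per device.

    records is an iterable of (device_id, app_name, data_dict) tuples.
    """
    profiles = defaultdict(dict)
    for device_id, _app_name, data in records:
        # Because the identifier is the same in every app on the device,
        # the fields simply accumulate under one key.
        profiles[device_id].update(data)
    return dict(profiles)


records = [
    ("udid-1", "motor_quote", {"date_of_birth": "1975-03-02"}),
    ("udid-1", "pension_calculator", {"retirement_age": 65}),
    ("udid-2", "motor_quote", {"date_of_birth": "1988-11-20"}),
]
profiles = link_by_device(records)
```

The profile for `udid-1` ends up holding both the date of birth and the retirement age, even though the user gave each piece to a different app.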

Data access to the Internet, with local storage on the device

The elephant in the room when talking about data protection is the fact that any app can silently connect to the Internet and send and receive data to its heart's content.

Developers are encouraged to show a spinner to indicate that the network is being accessed, but this is a guideline rather than an enforced requirement. This is not all about tracking users, of course. These capabilities allow things like remote throttling of app usage, enabling of new features, binding of sponsor data to parts of the app, updating media in the app, syncing with other services, etc. As there is no clear way to identify personal or tracking data within the app's local storage, any focused privacy legislation will be tricky.

The bottom line is that your iPhone apps are increasingly likely to be using a full set of web services without you ever setting up accounts, accepting terms and conditions, logging in or even being aware of it.

Apple Push Notification Service (APNS)

One of my personal favorites in terms of potential unexpected consequences is the Apple Push Notification Service (APNS), which allows developers to remotely pop up messages on iOS devices, or add badges with numbers to the icons of their apps.

That's all relatively straightforward, but there is also the ability to make the iPhone play any audio file included in your app, whether the app is open at the time or not. Check the push permissions for "Angry Birds" for an example (Settings > Notifications).

When you installed "Angry Birds," Rovio explicitly asked for your permission to play sounds from Angry Birds on your phone whenever they like.

As an aside — If you're looking for true Internet notoriety, then gaining control of Rovio's servers would allow you to remotely command all iOS devices with Angry Birds installed to "tweet" (audibly, not via Twitter) in unison.

Data handling on PCs vs. data handling on mobile apps

Given all that's happening right now, how are we doing on transparency and consent? Let's compare some of the warnings and alerts you might get from three different use cases:

Case 1: Installing software on your PC that uses data on the Internet

  • Warning: this software was downloaded from the Internet
  • Please enter your administrator password to install
  • Antivirus warning: new software identified
  • Firewall warning: Unauthorized software trying to connect to the Internet

And when you run your new PC-based software:

  • Please provide your email to register your account
  • Please set a password
  • Click the confirm link in the email we've sent you to authorize your account
  • Accept the terms and conditions

Case 2: Accessing a website through a PC

  • Please install Flash plugin / authorize Java applet / install Silverlight
  • Register or log in
  • Provide email address / password
  • Click link in registration confirmation email
  • Can I set a cookie on your PC? (Thank you, EU)
  • Please accept the terms and conditions

Case 3: Installing an Internet-enabled app on your iPhone

  • Tap to install app
  • Errr...
  • That's it

Some final thoughts

The comparison between PC-based software and smartphone software shown above is stark, with many implications. There's a lot to work out, and there's a lot to debate. With that in mind, here are a few discussion points I think are worth exploring:

  • The "app way" of working could be great for business, but it only works if you trust the app delivery platform and the app developer. Organizations create and destroy trust in many ways, and we might benefit from a more explicit review of or focus on this.
  • Developers could be more open about what they are doing, but explaining technical issues in plain English can be tough. Frankly, most users aren't that interested, either.
  • New laws to control use of cookies are focusing on what legislators can see and understand. Legislation will always trail technology, leading to more "privacy theater."
  • Broader technology legislation that relies on applying judgement and intelligent interpretation may succeed more than narrow, knee-jerk legislation and zero tolerance.
  • The iPad brings the smartphone approach closer to the standard PC. Expect Mac OS X Lion to bring it all the way.
  • Just because it fits in your pocket, doesn't make it private.


April 14 2011

Data News: Week in Review

Here are a few of the data stories that caught my eye this week.

Your personal data analyzed (at a genetic level)

Personal genomics company 23andMe made its DNA test available for free (sort of) on Tuesday of this week. Want to know if you have the genetic markers that may predispose you to heart disease, alcoholism, or breast cancer? A free test is hard to pass up.

This up-close look at your personal cellular data does come with certain strings attached: you have to sign up for 23andMe's $9-a-month Personal Genome Service. That brings the total cost to more than $100 a year.

Nevertheless, this latest push is another win for 23andMe, a Silicon Valley startup that is offering DNA analysis as a retail product, not simply a medical service. That's an important distinction. The move by 23andMe to give this data to "consumers" — and not just "patients" — signals a shift in the way we think about our medical information and our personal, chromosomal data. It also raises some big questions: Does this mean our genomic data has become a commodity? And if so, how much do we control the access, sale, and potential profit?

Hacking education with data

According to a recent Brookings Institution survey, Americans want more data about their local schools. But despite the best efforts of open data projects, that information is still quite limited: census data, test scores, and the like.

The situation could improve with the announcement that the education non-profit DonorsChoose is opening its data to developers for a Hacking Education contest. DonorsChoose, which acts as a Kickstarter of sorts for education, gives teachers a platform to pitch their projects and their classroom needs. Some 165,000 teachers in more than 43,000 public schools have submitted 300,000-plus projects, and in turn have inspired around $80 million in charitable giving.

All that data — the types of projects, the amount of funding, the resources requests, the types of schools, donors' search strings, donors' financial commitment — is being made available via the DonorsChoose contest. In addition to analysis of the data, the non-profit is also seeking developers to build apps based on its API.

The grand prize? A trophy. But it's awarded by Stephen Colbert and includes tickets for you and three friends to see a taping of "The Colbert Report."

Cloudera releases a new version of Hadoop

Cloudera, one of the primary contributors to Apache Hadoop, has released a new version of its Hadoop distribution this week. Version 3 (CDH3) contains more than 1,000 patches and changes, many of which will be contributed back to the open source Hadoop project.

While Hadoop's big data management is free and open source, Cloudera makes its money selling enterprise support. Much of the coverage of this latest version focused on Cloudera's position as the leader in this space. GigaOm's Derrick Harris says that:

CDH3 is a big reason that, despite a recent spate of Hadoop-based big data products either on the market or about to be there, Cloudera says it isn't sweating all the new competition. Another is that Cloudera doesn't think competitive vendors have what it takes to cut into Cloudera's business.

Got data news?

Suggestions and stories are always welcome, so feel free to contact me with ideas.


Personal data is the future, but does anybody care?

Like most people, I tend to surround myself with like-minded folks. Most of my dinner party conversations turn into rousing debates on the future of web standards, or which company will unlock the true power of personal data on the web, or how can we mark our bits with emotional cues to make our web experiences more human. That sort of thing.

But every now and then, I reconnect with old friends and even meet new people who don't find a conversation on data rousing at all. They have other things on their minds and they haven't thought about cookies or the amount of data Facebook is collecting on us. The mere utterance of the phrase "silos of data" kills a perfectly lovely conversation.

The problem is that understanding our personal data is important for everyone, not just geeks. People spend an incredible amount of time on Facebook, Google, Amazon, Twitter and other websites, creating content and telling the world how we feel, what we consume, how we think, and what we care about. And none of this belongs to us. I usually rile up a bit of a reaction when I mention that all of that time and energy spent is sold to advertisers, but the reaction boils down to privacy issues rather than understanding the value of that information.

As I've been building my own personal data collection startup, I've thought a great deal about how I could communicate the value of knowing and owning your own data to non-geeks. The answer came to me after making a list of all of the personal data collection applications I have signed up for. I looked at those I use religiously versus those I've abandoned. Those I use religiously include: RunKeeper, TripIt, Foursquare, Gowalla, Fitbit, Mint, Hashable, OKCupid, and Foodspotting. Those that I love the idea of, but have since left behind, include: Hunch, Blippy, 23andMe, GoodReads, Plancast and Dopplr.

I know that others' lists will be different, but the point is that this process allowed me to step back and really think about what sort of real-time value I was getting out of gathering my own data. I was able to boil the results down to three categories that, I believe, could be used to incentivize personal data collection for just about anybody. These categories are:

  1. Utility
  2. Serendipity
  3. Self-expression

In order to incentivize the continued use of any personal data collection application, you either have to really excel in one of these areas or cover all three. Let me explain.

(Disclosure: O'Reilly AlphaTech Ventures is an investor in Foursquare, RunKeeper, and TripIt.)

1. Utility

Probably the best example of utility from collecting personal data is Mint.com. By merely hooking up your online bank accounts, you get a snapshot of where you're spending your money, how much you have left, and you're given suggestions on where you can improve your financial situation. Mint.com was so handy for so many people that they had to do very little marketing. Their users became rabid fans and told stories to everyone who would listen about how Mint saved them all kinds of money, exposed fees that they didn't know they were paying, and helped them get savvier about their finances.

Utility in itself isn't sexy, but if you make it incredibly beneficial and impossible to live without, people will pay for it. Utility includes things like:

  • Tracking — How much you spend, where you've been, how much you've consumed, when you did that thing you used to do last, etc.
  • Augmentation — Anything that can extrapolate from and add to the raw data is helpful.
  • Organization — Ways to sort and make sense of the raw data.
  • Visualization — A way to present the data so it is easy to interpret.

Utility is where TripIt proved more useful to me than Dopplr. I thought Dopplr's lovely design and serendipity-driven features were going to win me over, but at the end of the day, it was the usefulness of TripIt's easy itineraries, flight tracking, and augmentation through things like weather and maps, that led me to use it religiously while I almost completely forgot about Dopplr.

2. Serendipity

OKCupid does the same basic thing that Hunch does: it asks the user to answer endless questions about their personal tastes and preferences. However, OKCupid has something over Hunch to incentivize users to actually spend the time to answer those questions: serendipity. And it's not just any serendipity, it's serendipity at its finest: the promise of finding love.

Answering questions about myself is kind of a fun notion ... once. People take personality quizzes all of the time online, but when we get the results, what is the first thing we do? We share it with friends. And once our friends have done the test and we all compare notes, that's it. We don't really go back. Where it gets interesting is if these results lead us to discover new friends, potential mates, cool stuff and ideas that could change our lives. And it isn't enough to have this happen once. It has to uncover serendipitous moments over and over again.

Serendipity is also the core element that drives my usage of geo-location applications. A couple of months ago, I was in New York and checked into a pizza place near Union Square. I looked to see who had checked in recently and Mark Suster's smiling face appeared. I had never met Mark, but I'd always wanted to and Foursquare allowed me to connect with him serendipitously in a place I never expected to. It's moments like these that drive me to continue to check in even though it takes time and effort to do so.

Serendipity is, ultimately, how you use the data to connect people to people, people to things that may interest them, and people to opportunities.

3. Self-expression

There is definitely a utility in using an application like RunKeeper, but that's not primarily why I use it. I'm pretty proud of my commitment to training and my progress with running. RunKeeper gives me that tool I need to strongly signal to everyone who follows me that I'm a runner. The more I log my runs, the more people express how impressed they are, and so the more I log my runs. It's cyclical.

Other applications signal personal tastes as well. Foodspotting signals that you are cultured (if you take shots of a variety of ethnic foods), healthy (if you post organic, vegetarian or the like), indulgent (posting desserts, expensive meals, decadent burgers, etc.) or the like. Hashable signals you are a mover and a shaker without being accused of namedropping. signals whether you have hipster or hip-hop leanings. I have to admit that I've been known to turn off the scrobbler when in a pop music mood. Why make the effort to stop the scrobbler and start the scrobbler again? Because I'm aware of the signals I'm sending.

The self-expression or taste signaling dimension of our personal data collection has the strongest potential for creating the ultimate personalized web experience. It's yet to be completely explored. We are practically screaming who we are and what we like as we post our activity on social applications, yet most recommendation engines and data mining engines continue to put us in traditional demographic and psychographic boxes. The potential of all this will be unlocked when emotional/taste data is mapped to products, check-ins, and our activity across social applications.

I'm looking forward to the day that personal data collection is part of the popular vernacular. Until then, it is up to us, the geeks and developers of these applications, to help people collect these moments so they receive real-time value.


February 11 2011

The Locker Project: data for the people

Singly, a new company that made an appearance at the Strata Conference Startup showcase, exists to provide oxygen and commercial support to the open source Locker Project, and the new protocol TeleHash.

With some wonderful serendipity I met Singly on my first night at Strata. The next day, I talked in depth to Jeremie Miller and Simon Murtha-Smith, two of the three Singly co-founders (see later in this post). I also had the opportunity to ask Tim O'Reilly and Roger Magoulas for some of their thoughts on the significance of this project (see below for their comments).

It was a real "pinch myself in case I need to wake up from a dream" experience for me, to stumble across Jeremie Miller with Simon Murtha-Smith sitting behind a handwritten sign demoing Singly at Strata (see my pic opening this post). As Marshall Kirkpatrick noted at ReadWriteWeb:

Jeremie Miller is a revered figure among developers, best known for building XMPP, the open source protocol that powers most of the Instant Messaging apps in the world. Now Miller has raised funds and is building a team that will develop software aimed directly at the future of the web.

Singly, by giving people the ability to do things with their own data, has the potential to change our world. And, as Kirkpatrick notes, this won't be the first time Jeremie has done that.

I was drawn over to the Singly table when an awesome app they were demonstrating caught my eye. Fizz, an application from Bloom, was running on a locker with data aggregated from three different places.

Fizz is an intriguing early manifestation of capabilities never seen before on the web. It provides the ability for us to control, aggregate, share and play with our own data streams, and bring together the bits and pieces of our digital selves scattered about the web (for more about Bloom and Singly, see Tim O'Reilly's comments below). The picture below is my Fizz. The large circles represent people and the small circles represent their status updates. Bloom explained the functionality in a company blog post:

Clicking a circle will reveal its contents. Typing in the search box will highlight matching statuses. This is an early preview of our work and we'll be adding more features in the next few weeks. We'd love to hear your feedback and suggestions.

If you are not already familiar with the Bloom team — Ben Cerveny, Tom Carden, and Jesper Sparre Andersen — go directly to their about page and you will understand why the match of Bloom and the Locker Project is a cause for great delight.

The Locker Project: A whole new way to connect from the protocol up

Singly, the Locker Project, and TeleHash take on and deliver an elegant and open solution to some of the holy grails of the next generation of networked communications. I have written on, and been nibbling at the edges of some of these grails in various projects myself for quite a while now. A glance at the monster mash of my pre-Strata post will give you an idea of how important I think Singly is.

That previous post raised the question of how to invert the search pyramid and to transform search into a social, democratic act. But if you are really interested in social search, I suggest staying keyed into what Singly is doing with the Locker Project.

One of Singly's three founders, Simon Murtha-Smith, was building a company called Introspectr, a social aggregator and search product. Another Singly founder, Jason Cavnar, was working on a similar project. They came together as Singly because social aggregation and search is a very hard problem for one company to solve. They realized that the basic infrastructure needs to be open source and built on an open protocol.

To me, what is so important about the Locker Project is that it is built on a new open protocol, TeleHash. And having the Singly team focused on supplying tools and the trust/security layer for the Locker Project will mean that developers will have the whole stack.

I asked Miller to explain the relationship between TeleHash, the Locker Project and Singly.

Tish Shute: What is TeleHash?

Jeremie Miller: It's a peer-to-peer protocol to move bits of data for applications around. Not file sharing, but it's for actual applications to find each other and connect. So if you had an app and I had an app, whenever we're running that app on our devices, we can actually find those other devices from each other and then connect. Our applications can connect and do something. TeleHash is actually what has led to the Locker project itself.

Tish Shute: So TeleHash led to the Locker Project and the Locker Project led to Singly?

Jeremie Miller: Singly is a company that is sponsoring the open source Locker Project.

TeleHash is a protocol that lets the lockers connect with each other and share things. The locker is like all of your data. It's sort of like a digital person.

Singly interview

Left to right: Jeremie Miller, Jason Cavnar, Simon Murtha-Smith. I took the pic above of all three founders being interviewed by Marshall Kirkpatrick of ReadWriteWeb. I think we will look back on this moment and say it was an inflection point for the web. At least I tweeted that!

Tish Shute: A locker stores bits and pieces of your digital self?

Jeremie Miller: Yes. So TeleHash lets the lockers directly peer-to-peer connect with each other and share things. Singly, as a company, is going to be hosting lockers first and foremost. But the Locker Project is an open source project. You can have a locker in your machine or you can install it wherever you want.

Tish Shute: Will Singly provide the trust layer and hosting?

Jeremie Miller: Yeah. Singly is a company that will host lockers. Also, when people build applications that run inside your locker or use your data, you need to be able to trust them. Maybe it's initially social data that you don't care that much about, but once you add browsing history, your health data, your running logs, or sleeping information, it's important to be careful about what you're running inside your locker and sharing.

So Singly will also look at the applications that are available that you can install and actually run them and look at what data they access. It will be able to come back and either certify or vouch for them.

I hope in the long-run, as this grows and builds, that power users may actually be able to buy a small device that they can plug into their home network and that would be their locker. Wouldn't that be cool? This little hard drive that you plug in.
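The vetting step Miller describes (run an application, watch what data it touches, then vouch for it or not) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Locker`, `App`, `AuditedView`, and `vet` are hypothetical names, not part of the Locker Project or any Singly API, and the sketch assumes an app declares up front which data categories it needs.

```python
from dataclasses import dataclass, field


@dataclass
class Locker:
    """Hypothetical locker: maps a data category to a list of records."""
    data: dict = field(default_factory=dict)


@dataclass
class App:
    """Hypothetical app descriptor with the categories it claims to need."""
    name: str
    declared: frozenset


class AuditedView:
    """Wraps a locker and records every category an app reads."""

    def __init__(self, locker):
        self._locker = locker
        self.accessed = set()

    def read(self, category):
        self.accessed.add(category)
        # Return a copy so the app cannot mutate the locker's records.
        return list(self._locker.data.get(category, []))


def vet(app, run, locker):
    """Run the app against an audited view of the locker.

    Returns (ok, undeclared): ok is True only if the app read nothing
    beyond the categories it declared.
    """
    view = AuditedView(locker)
    run(view)
    undeclared = view.accessed - app.declared
    return (not undeclared, undeclared)


locker = Locker({"social": ["status update"], "health": ["sleep log"]})

honest = App("status-cloud", frozenset({"social"}))
honest_result = vet(honest, lambda view: view.read("social"), locker)

nosy = App("nosy-widget", frozenset({"social"}))
nosy_result = vet(
    nosy, lambda view: (view.read("social"), view.read("health")), locker
)
```

In this sketch the honest app passes, while the second app is flagged because it read the `health` category without declaring it. A real certification process would need much more than this, such as sandboxing and review of network traffic.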

Tish Shute: Architecturally, are TeleHash and the Locker Project related to your work on XMPP?

Jeremie Miller: XMPP in Jabber was designed for the specific purpose of instant messaging, but it was still a federated model in that you still had to go through a central point. It was designed with that in mind — for the communication path to be routed through somewhere. Where I've evolved is that I'm fascinated with truly distributed protocols that are completely decentralized so that things are going peer-to-peer instead of going through any server.

Peer-to-peer has gotten a pretty bad rap over the last 10 years because of file sharing, but the potential for it is awesome. There's so many really good things that can be done with peer-to-peer, and it hasn't gotten used much.

But the other side of the peer-to-peer thing, which I think is critically important, is the explosion of computing devices around an individual, both in the home and on our person. I look at my home network router and I've got 30 devices in my house on Wi-Fi. That's a lot of devices.

But right now, to work with those devices I'm almost always going through a server somewhere, or through a data center somewhere. That's ridiculous.

Tish Shute: So we need a peer-to-peer network just to manage our own devices?

Jeremie Miller: A peer-to-peer network, yes. My phone should be talking straight to my computer, or to the iPad, or to the washing machine, or to the refrigerator. The applications in my TV should all be talking peer-to-peer. And it should be easy to do that. It shouldn't be that the only way you can do that is to go through a data center somewhere.

Note: Updates to the Locker Project will be posted through @lockerproject and GitHub

The Locker Project is not just "one more rebel army trying to undo these big data aggregations"

I discussed the impact TeleHash, the Locker Project and Singly might have on social network incumbents with Roger Magoulas and Tim O'Reilly. Both had insightful comments.

Magoulas pointed out:

I think Singly has Facebook-like aspects, but I think a better description is an app platform that integrates your personal and social network data — including data from Facebook.

Singly is likely to have challenges with some of their data sources, particularly if it gains traction with users. I like the app platform business model, although they face risks getting critical mass and app developer attention. I also like how they plan on using open source connectors to keep up with changing social network platforms.

Jeremie [Miller] has credibility with the open source community and is likely to find cooperating developers. The team seems to bring complementary strengths to the project and you can tell they all work well together.

Tim O'Reilly elaborated on the potential of this platform to bring something new to the ecosystem. Our interview follows:

Tish Shute: Will the Locker Project be able to break the lock of big sites, like Facebook, controlling everyone's data? Sometimes I feel we are stuck in the era of Zyngification, where you have to do what Zynga did and leverage the system in order to gain traction or do anything with social data.

Tim O'Reilly: I don't think breaking the Facebook lock is the objective of the Locker Project. The value of Facebook is having your data there with other people's data. What Singly may be able to do is give people better tools for managing their data. If you can take the data from various sites and manage it yourself, then you can potentially make better decisions about what you're going to allow and not allow. Right now, the interfaces on a lot of these sites make it difficult to understand the implications of making your data available.

If this is done right, it will create a marketplace where people will build interfaces that provide more control over personal data. People will still want to put data on sites for the same reason you put money in the bank: it's more valuable when it's combined with other people's money.

To conceive of this project as one more rebel army trying to undo these big data aggregations is just the wrong way to frame it.

Tish Shute: Framing the question the way you just did — that this is not just one more rebel army — might mean the stage at Strata will be filled with new startups next year. That's what I thought when I found out what the Locker Project and Singly are about: that we're about to see an explosion of creativity with personal and social data.

Tim O'Reilly: The tools we have now are pretty primitive. If we get a better set of tools, I think we'll see a lot of innovation. Some of those startups might be acquired by Facebook or Google, but if those smaller companies give people better visibility and control over their data, that's a good thing.

Tish Shute: I loved the marriage between Singly and Bloom [mentioned above]. It's interesting because Ben Cerveny and the Bloom team haven't really talked a lot about Bloom yet. I gather Bloom is moving toward consumer-facing work with data?

Tim O'Reilly: People think of data visualization as output, and the insight that I think Ben has had with Bloom is that data visualization will become a means of input and control.

I've started to feel that visualization as a way of making sense of complex data is kind of a dead-end. What you really want to do is build feedback loops where people can actually figure something out. Being able to manipulate data in real-time is an important shift. Data visualizations would then become interfaces rather than reports.

This post was edited and condensed. A longer version, featuring additional interviews and analysis, is available at UgoTrade.
