June 28 2013

Four short links: 28 June 2013

  1. Huxley vs Orwell — buy Amusing Ourselves to Death if this rings true. The future is here, it’s just not evenly surveilled. (via rone)
  2. KeyMe — keys in the cloud. (Digital designs as backups for physical objects)
  3. Motorola Advanced Technology and Products Group — The philosophy behind Motorola ATAP is to create an organization with the same level of appetite for technology advancement as DARPA, but with a consumer focus. It is a pretty interesting place to be. And they hired the excellent Johnny Chung Lee.
  4. Internet Credit Union — Internet Archive starts a Credit Union. Can’t wait to see memes on debit cards.

May 29 2013

Four short links: 29 May 2013

  1. Quick Reads of Notable New Zealanders — notable for two reasons: (a) CC-NC-BY licensed, and (b) gorgeous gorgeous web design. Not what one normally associates with Government web sites!
  2. svg.js — Javascript library for making and munging SVG images. (via Nelson Minar)
  3. Linkbot: Create with Robots (Kickstarter) — accessible and expandable modular robot. Loaded w/ absolute encoding, accelerometer, rechargeable lithium ion battery and ZigBee. (via IEEE Spectrum)
  4. The Promise and Peril of Real-Time Corrections to Political Misperceptions (PDF) — paper presenting results of an experiment comparing the effects of real-time corrections to corrections that are presented after a short distractor task. Although real-time corrections are modestly more effective than delayed corrections overall, closer inspection reveals that this is only true among individuals predisposed to reject the false claim. In contrast, individuals whose attitudes are supported by the inaccurate information distrust the source more when corrections are presented in real time, yielding beliefs comparable to those never exposed to a correction. We find no evidence of realtime corrections encouraging counterargument. Strategies for reducing these biases are discussed. So much for the Google Glass bullshit detector transforming politics. (via Vaughan Bell)

May 16 2013

Four short links: 16 May 2013

  1. Australian Filter Scope Creep — The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios.
  2. Embedding Actions in Gmail — after years of benign neglect, it’s good to see Gmail worked on again. We’ve said for years that email’s a fertile ground for doing stuff better, and Google seem to have the religion. (see Send Money with Gmail for more).
  3. What Keeps Me Up at Night (Matt Webb) — Matt’s building a business around connected devices. Here he explains why the category could be owned by any of the big players. In times like this I remember Howard Aiken’s advice: Don’t worry about people stealing your ideas. If it is original you will have to ram it down their throats.
  4. Image Texture Predicts Avian Density and Species Richness (PLOSone) — Surprisingly and interestingly, remotely sensed vegetation structure measures (i.e., image texture) were often better predictors of avian density and species richness than field-measured vegetation structure, and thus show promise as a valuable tool for mapping habitat quality and characterizing biodiversity across broad areas.

January 18 2013

RFP-EZ: Making it easier for small companies to bid on government contracts

A few years ago, when I was doing the research that led to my work in open government, I had a conversation with Aneesh Chopra, later the first Federal CTO but at the time the Secretary of Technology for the Commonwealth of Virginia. I remember him telling me about the frustration of being in government: knowing that you could go to someone down the street to build a website in a week, but still having to put the job through procurement, a process taking nine months and resulting in a website costing ten times or more what it could have cost if he’d just been able to hire someone on the open market.

Much of the difficulty stems from stringent legal regulations that make it difficult for companies to compete and do business with government. (Like so many government regulations, these rules were designed with good intentions after scandals involving government officials steering contracts to their friends, but need to be simplified and updated for current circumstances.) The regulations are so complex that often, the people who do business with the federal government are more specialized in understanding that regulation than they are in the technology they’re providing. As a result, there are specialized intermediaries whose sole business is bidding on government jobs, and then subcontracting them to people who can actually do the work.

The problem has been compounded by the fact that many things that were once hard and expensive are now easy and cheap. But government rules make it hard to adopt cutting edge technology.

That’s why I’m excited to see the Small Business Administration launch RFP-EZ as part of the White House’s Presidential Innovation Fellows program. It’s a small step towards getting the door open — towards making it easier for new businesses to sell to government. RFP-EZ simplifies both the process for small companies to bid on government jobs and the process for government officials to post their requests. Hopefully it will increase government’s access to technology, increase competition in the federal space, and lower prices.

This is a huge opportunity for web developers and other commercial technology providers. Government is the largest buyer on the planet, and your potential to work on stuff that matters is unparalleled when you’re working with the platform of government. When government and private industry work together to solve problems, amazing things can happen. RFP-EZ is a step in that direction.

If you’re a startup or consulting firm with a desire to make a difference, and a desire for revenue, I’d encourage you to check out what RFP-EZ has to offer. There are a few projects awaiting bids now, and from what I hear, more on the way. (This is still an experiment, and successful outcomes will lead to more jobs being posted.) If you’ve got a solution to the problems that are posted, take a step towards working on stuff that matters at scale.

I have another reason for urging innovative companies to participate. This project is an experiment. Take a look at the Federal Register notice about the project. It’s a pilot that has a clear start and end date. They’re using the pilot to gather data, learn from it, and iterate. They’ve given themselves room to succeed and permission to fail.  I’d like to see government do more of both. Your participation will encourage that response.

October 22 2012

What I learned about #debates, social media and being a pundit on Al Jazeera English

Earlier this month, when I was asked by Al Jazeera English if I’d like to go on live television to analyze the online side of the presidential debates, I didn’t immediately accept. I’d be facing a live international audience at a moment of intense political interest, without a great wealth of on-air training. That said, I felt honored to be asked by Al Jazeera. I’ve been following the network’s steady evolution over the past two decades, building from early beginnings during the first Gulf War to its current position as one of the best sources of live coverage and hard news from the Middle East. When Tahrir Square was at the height of its foment during the Arab Spring, Al Jazeera was livestreaming it online to the rest of the world.

I’ve been showing a slide in a presentation for months now that features Al Jazeera’s “The Stream” as a notable combination of social media, online video and broadcast journalism since its inception.

So, by and large, the choice was clear: say “yes,” and then figure out how to do a good job.

As is ever the case with new assignments, what would follow from that choice wasn’t as easy as it might have seemed. Some of the nuts and bolts of appearing were quite straightforward: Do a long pre-interview with the producer about my work and my perspective on how the Internet and social media were changing the dynamics of a live political event like the debate. (I captured much of that thinking here at Radar, in a post on digital feedback loops and the debate.) Go through makeup each time. Get wired up with a mic and an earpiece that connected me to the control room. Review each show’s outline, script and online engagement channels, from Twitter to YouTube to Google+ to Reddit.

I was also afforded a few luxuries that bordered on the surreal: a driver who picked me up and took me home from the studio. Bottled spring water. A modest honorarium to hang out in a television studio for a couple of hours and talk for a few intense minutes about which moments from the debates resonated online and why. The realization that my perspective could be seen by millions in Al Jazeera English’s international audience. People would be watching. I’d need to deliver something worth their time.

Entering The Stream

Live television doesn’t give anyone much room for error. On this particular show, The Stream, there was no room for a deep dive into analysis. We had time to answer a couple of questions about what happened on social media during the debates. Some spots were 30 seconds. Adding context under those constraints is a huge challenge. How much do you assume the people viewing know? What moments do you highlight? For this debate show, I had to assume that they watched the two candidates spar — but were they following the firehose of commentary on Twitter? Even if they were, given how personalized social media has become, it was inevitable that what viewers saw online would differ from what we saw in the studio.

When we saw the campaigns focus on Twitter during the debates, I saw that as news, and said as much. While the campaigns were also on Facebook, Google+, Tumblr, YouTube and blogs, along with the people formerly known as the audience, the forum for real-time social politics in the fall of 2012 remained Twitter, in all its character-limited glory.

Once the debates ended each night, campaigns and voters turned to the new watercoolers of the moment — blogs and article comment sections — to discuss what they’d seen. They went to Facebook and Google+ to share their reactions. To their credit, the Stream producers used Google+ Hangouts to immediately ask undecided voters what they thought and bring in political journalists to share their impressions. It’s a great use of the platform to involve more people in a show using the tools of the moment.

I’ve embedded each of the debate videos below, along with the full length episode of The Stream on data mining in the 2012 election. (I think I delivered, based upon the feedback I’ve received since in person and online, but I’m quite open to feedback if you’d like to comment.)

The Stream: Presidential Debates [10/3/2012]

The Stream: Vice Presidential Debate [10/11/2012]

The Stream: Presidential debates pre-show [10/16/2012]

On memes, social journalism and listening

The first two presidential debates and the vice-presidential debate spawned online memes. Given the issues before the country and the world, reducing these debates to those rapid expressions and the other moments that catalyzed strong online reactions was inherently self-limiting. The role of The Stream during the debates, however, was to look at these political events through the prism of social media to explain quickly and precisely what popped online. At this point, if you’re following the election, you’ve probably heard of at least two of them: Big Bird and “binders full of women.” (I explain both in the videos embedded above.) We also saw peaks of attention and debate conflict reflected online, from Vice President Biden’s use of “malarkey” to the reaction to CNN chief political correspondent Candy Crowley’s real-time correction of former Massachusetts Governor Mitt Romney, who had challenged President Obama over his use of “act of terror” the day after the United States mission in Benghazi, Libya, was attacked.

There are limits to what you can discern through highlighting memes. While it might be easy to dismiss memes as silly or out-of-context moments, I think they serve a symbolic, even totemic role for people who share them online. There’s also a simple historic parallel: animated GIFs are the political cartoons of the present.

Reducing the role of social media in networked political debates to just Twitter, GIFs and status updates, however, would be a mistake. The combination of embeddable online video, blogs and wikis is part of a blueprint for democratic participation that enables people to explore the issues debated in depth, which is particularly relevant if cable news shows fail to do so.

There’s also a risk of distracting from what we can learn about how the candidates would make policy or leadership decisions. I participated in a Google+ Hangout hosted by Storify last week about social media and elections. The panel of “social journalists” shared their perspectives on how the #debates are being covered in this hyper-connected moment — and whether social media is playing a positive role or not.

Personally, I see the role of social media in the current election as a mixed bag. Networked fact checking is a positive development. The campaigns and media alike can find interesting trends in real-time sentiment analysis, if they dive into the social data. I also see an important role for the broader Internet in providing as much analysis on policy or context as people are willing to search for, on social media or off.

There’s a risk, however, that public opinion or impressions of the debates are being prematurely shaped by the campaigns and their proxies, or that confirmation bias is being reaffirmed through homophilic relationships that are not representative of the electorate as a whole.

All that being said, after these three shows, I plan to watch the last presidential debate, on foreign policy, differently. I’m going to pocket my smartphone, sleeve my iPad and keep my laptop closed. Instead of tracking the real-time feedback during the debates and participating in the ebb and flow of the conversation, I’m just going to actively listen and take notes. There are many foreign policy questions that will confront the next president of the United States. Tonight, I want to hear the responses of the candidates, unadorned by real-time spin, fact checking, debate bingo or instant reaction.

Afterwards, I’ll go back online to read liveblogs, see where the candidates may have gone awry, and look abroad to see how the world is reacting to a debate on foreign policy that stands to directly affect billions of people who will never vote in a U.S. election. First, however, I’ll form my own impressions, supported by the virtues of solitude, not the clamor of social media.

October 11 2012

Culture transmission is bi-directional

I read this piece in the New York Times the other day and have read it two or three more times since then. It dives into the controversy around DARPA’s involvement in hacker space funding. But frankly, every time I come across this controversy, I’m baffled.

I usually associate this sort of government distrust with Tea Party-led Republicans. The left, and even many of us in the middle, generally have more faith in government institutions. We’re more likely to view government as a tool to implement the collective will of the people. Lots of us figure that government is necessary, or at least useful, to accomplish things that are too big or hairy for any other group of citizens to achieve (in fact, a careful reading of Hayek will show even he thought so – commence comment flame war in 3 ..2 ..1 …).

So, to summarize, the right dislikes big government and typically the left embraces it. At least, right up until the moment the military is involved. Then the right worships big government (largely at the temple of the History Channel) and the left despises it.

Of course, I don’t know anything about the politics of the people criticizing this DARPA funding, just that they are worried that defense money will be a corrupting influence on the maker movement. Which would imply that they think Defense Department values are corrupting. And they might be right to have some concerns. While the U.S. military services are probably the single most competent piece of our entire government, the defense industrial complex that equips them is pretty damned awful. It’s inefficient, spends more time on political engineering than actual engineering, and is where most of the world’s bad suits go to get rumpled. And there is no doubt that money is a vector along which culture and values will readily travel, so I suppose it’s reasonable to fear that the maker movement could be changed by it.

But what everyone seems to be missing is that this isn’t a one-way process and the military, via DARPA, is essentially saying “we want to absorb not just your technology but the culture of openness by which you create it.” That’s an amazing opportunity and shouldn’t be ignored. The money is one vector, but the interactions, magical projects, and collaboration are another, perhaps more powerful vector, along which the values of the maker movement can be swabbed directly into one of the most influential elements of our society. This is opportunity!

O’Reilly is participating in the DARPA MENTOR program and Dale has already discussed our involvement at length. So I need to disclose it, but this post isn’t about that. This post is about the idea that the military has been a change agent in our society many times before. This is an opportunity to do it again and for makers to influence how it happens.

For quite a few years, I worked in the defense space and, frankly, took a lot of crap for it from my friends on the left coast. But I always felt that the military was an important part of American society regardless of whether you agreed with its purpose or actual use, and that the best way to counter its less desirable tendencies was to engage with it. So while I worked my day job I also spent those years aggressively advocating open source software, emergent and incremental software processes, and “permissionless programming” web platforms for the DoD. I thought that the military could benefit from all of these things, but I also explicitly felt that they were a vector along which the cultural attributes of openness, transparency, and experimentation would readily travel. Those open and emergent ideas were a culture virus and I intended to shed them everywhere I could.

If you’re a technologist, you know that the military has always pushed the envelope. Silicon Valley itself began with Stanford’s government partnership during the Second World War. The world’s first interactive computer was Whirlwind, a component piece of the massive air defense program SAGE. So, if your vision is to unleash a democratized third industrial revolution based on the maker model, this is your opportunity. If you can insert open culture and values into the defense establishment at the same time, even better.

September 24 2012

What caused New York’s startup boom?

Since the crisis of 2008, New York City’s massive financial sector — the city’s richest economic engine, once seen to have unlimited potential for growth — has languished. In the meantime, attention has turned to its nascent startup sector, home to Foursquare, Tumblr, 10gen, Etsy and Gilt, where VC investment has surged even as it’s been flat in other big U.S. tech centers (PDF).

I’ve started to poke around the tech community here with a view toward eventually publishing a paper on the rise of New York’s startup scene. In my initial conversations, I’ve come up with a few broad questions I’ll focus on, and I’d welcome thoughts from this blog’s legion of smart readers on any of these.

  • How many people in New York’s startup community came from finance, and under what conditions did they make the move? In 2003, Google was a five-year-old, privately-held startup and Bear Stearns was an 80-year-old pillar of the financial sector. Five years later, Google was a pillar of the technical economy and among the world’s biggest companies; Bear Stearns had ceased to exist. Bright quantitatively-minded people who might have pursued finance for its stability and lucre now see that sector as unstable and not necessarily lucrative; its advantage over the technology sector in those respects has disappeared. Joining a 10-person startup is very different from taking a job at Google, but the comparative appeal of the two sectors has dramatically shifted.
  • To what degree have anchor institutions played a role in the New York startup scene? The relationship between Stanford University and Silicon Valley is well-documented; I’d like to figure out who’s producing steady streams of bright technologists in New York. Google’s Chelsea office, opened in 2006, now employs close to 3,000 people, and its alumni include Dennis Crowley, founder of Foursquare. That office is now old enough that it can generate a high volume of spin-offs as Googlers look for new challenges. And Columbia and NYU (and soon a Cornell-Technion consortium) have embraced New York’s startup community.
  • Does New York’s urban fabric make its labor market more liquid? Changing jobs in Silicon Valley can mean an extra 40 minutes on your commute if you have to slog up the 101 during rush hour. New York’s main business districts are much more compact; if you change jobs from a bank in Midtown to a startup on 28th Street, your commute won’t change by more than 10 minutes.
  • What are the dominant practice areas in New York’s tech scene, and how do they relate to the human capital available here? Have refugees from the finance, media and advertising industries brought with them distinctive skills from those areas? How much of the startup community here is targeted at acquiring those industries as clients?
  • What’s the city doing in response to the growth of its tech industry, and what can other cities learn from New York’s model? Other old, established cities like Chicago, Pittsburgh, Philadelphia and Washington claim to have robust startup communities. What do these cities have in common, and how have their governments reacted to the emergence of their tech communities? The emergence of a tech startup scene here could be particularly fortunate for New York in light of its dependence on the finance industry (at the peak of the finance boom, the industry contributed 20% and 13% of New York State’s and City’s income tax revenues, respectively; those figures in 2011 were 14% and 7%). To what degree can a city or state government desperate for diversification bring a startup community into existence?

Send along any ideas in the comments below!

August 21 2012

Three kinds of big data

In the past couple of years, marketers and pundits have spent a lot of time labeling everything “big data.” The reasoning goes something like this:

  • Everything is on the Internet.
  • The Internet has a lot of data.
  • Therefore, everything is big data.

When you have a hammer, everything looks like a nail. When you have a Hadoop deployment, everything looks like big data. And if you’re trying to cloak your company in the mantle of a burgeoning industry, big data will do just fine. But seeing big data everywhere is a sure way to hasten the inevitable fall from the peak of high expectations to the trough of disillusionment.

We saw this with cloud computing. From early idealists saying everything would live in a magical, limitless, free data center to today’s pragmatism about virtualization and infrastructure, we soon took off our rose-colored glasses and put on welding goggles so we could actually build stuff.

So where will big data go to grow up?

Once we get over ourselves and start rolling up our sleeves, I think big data will fall into three major buckets: Enterprise BI, Civil Engineering, and Customer Relationship Optimization. This is where we’ll see most IT spending, most government oversight, and most early adoption in the next few years.

Enterprise BI 2.0

For decades, analysts have relied on business intelligence (BI) products like Hyperion, MicroStrategy and Cognos to crunch large amounts of information and generate reports. Data warehouses and BI tools are great at answering the same question — such as “what were Mary’s sales this quarter?” — over and over again. But they’ve been less good at the exploratory, what-if, unpredictable questions that matter for planning and decision making because that kind of fast exploration of unstructured data is traditionally hard to do and therefore expensive.

Most “legacy” BI tools are constrained in two ways:

  • First, they’ve been schema-then-capture tools in which the analyst decides what to collect, then later captures that data for analysis.
  • Second, they’ve typically focused on reporting what Avinash Kaushik (channeling Donald Rumsfeld) refers to as “known unknowns” — things we know we don’t know and therefore generate reports on.

These tools are used for reporting and operational purposes, usually focused on controlling costs, executing against an existing plan, and reporting on how things are going.

As my Strata co-chair Edd Dumbill pointed out when I asked for thoughts on this piece:

“The predominant functional application of big data technologies today is in ETL (Extract, Transform, and Load). I’ve heard the figure that it’s about 80% of Hadoop applications. Just the real grunt work of log file or sensor processing before loading into an analytic database like Vertica.”
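
To make that grunt work concrete, here is a minimal sketch (mine, not from the post) of the kind of log-file processing Dumbill describes: parse raw web-server logs into structured rows, then hand the result to an analytic database. The log format and file names are assumptions.

    import csv
    import re

    # A made-up Apache-style access log line looks like:
    # 127.0.0.1 - - [10/Oct/2012:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
    LOG_PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
    )

    def parse(line):
        """Return structured fields from one raw log line, or None if malformed."""
        m = LOG_PATTERN.match(line)
        return m.groupdict() if m else None

    with open('access.log') as raw, open('requests.csv', 'w', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=['ip', 'ts', 'method', 'path', 'status', 'size'])
        writer.writeheader()
        for line in raw:
            row = parse(line)
            if row:  # skip malformed lines instead of failing the whole job
                writer.writerow(row)
    # The cleaned CSV is then bulk-loaded into an analytic database
    # (Vertica, in Dumbill's example).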

The availability of cheap, fast computers and storage, as well as open source tools, has made it okay to capture first and ask questions later. That changes how we use data because it makes it okay to speculate beyond the initial question that triggered the collection of data.

What’s more, the speed with which we can get results — sometimes as fast as a human can ask them — makes data easier to explore interactively. This combination of interactivity and speculation takes BI into the realm of “unknown unknowns,” the insights that can produce a competitive advantage or an out-of-the-box differentiator.
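
The capture-first loop is easiest to picture with a dataframe library. A hedged sketch, assuming events were logged as JSON lines with invented field names (timestamp, product, amount, region):

    import pandas as pd

    # Events captured before anyone decided which questions to ask of them.
    df = pd.read_json('events.jsonl', lines=True)

    # A speculative first question: what sells on which day of the week?
    df['day'] = pd.to_datetime(df['timestamp']).dt.day_name()
    print(df.groupby(['day', 'product'])['amount'].sum().nlargest(10))

    # Answers come back fast enough to ask the next question immediately,
    # slicing the same raw capture a different way.
    print(df.groupby('region')['amount'].mean())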

We saw this shift in cloud computing: first, big public clouds wooed green-field startups. Then, in a few years, incumbent IT vendors introduced their private cloud offerings. Private clouds included only a fraction of the benefits of public clouds, but were nevertheless a sufficient blend of smoke, mirrors, and features to delay the inevitable move to public resources by a few years and appease the business. For better or worse, that’s where most of IT budgets are being spent today according to IDC, Gartner, and others.

In the next few years, then, look for acquisitions and product introductions — and not a little vaporware — as BI vendors that enterprises trust bring them “big data lite”: enough to satisfy their CEO’s golf buddies, but not so much that their jobs are threatened. This, after all, is how change comes to big organizations.

Ultimately, we’ll see traditional “known unknowns” BI reporting living alongside big-data-powered data import and cleanup, and fast, exploratory “unknown unknowns” interactivity.

Civil Engineering

The second use of big data is in society and government. Already, data mining can be used to predict disease outbreaks, understand traffic patterns, and improve education.

Cities are facing budget crunches, infrastructure problems, and crowding as people arrive from rural areas. Solving these problems is urgent, and cities are perfect labs for big data initiatives. Take a metropolis like New York: hackathons; open feeds of public data; and a population that generates a flood of information as it shops, commutes, gets sick, eats, and just goes about its daily life.

Datagotham is just one example of a city's efforts to hack itself

I think municipal data is one of the big three for several reasons: it’s a good tie breaker for partisanship, we have new interfaces everyone can understand, and we finally have a mostly-connected citizenry.

In an era of partisan bickering, hard numbers can settle the debate. So, they’re not just good government; they’re good politics. Expect to see big data applied to social issues, helping us to make funding more effective and scarce government resources more efficient (perhaps to the chagrin of some public servants and lobbyists). As this works in the world’s biggest cities, it’ll spread to smaller ones, to states, and to municipalities.

Making data accessible to citizens is possible, too: Siri and Google Now show the potential for personalized agents; Narrative Science takes complex data and turns it into words the masses can consume easily; Watson and Wolfram Alpha can give smart answers, either through curated reasoning or making smart guesses.

For the first time, we have a connected citizenry armed (for the most part) with smartphones. Nielsen estimated that smartphones would overtake feature phones in 2011, and that concentration is high in urban cores. The App Store is full of apps for bus schedules, commuters, local events, and other tools that can quickly become how governments connect with their citizens and manage their bureaucracies.

The consequence of all this, of course, is more data. Once governments go digital, their interactions with citizens can be easily instrumented and analyzed for waste or efficiency. That’s sure to provoke resistance from those who don’t like the scrutiny or accountability, but it’s a side effect of digitization: every industry that goes digital gets analyzed and optimized, whether it likes it or not.

Customer Relationship Optimization

The final home of applied big data is marketing. More specifically, it’s improving the relationship with consumers so companies can, as Sergio Zyman once said, sell them more stuff, more often, for more money, more efficiently.

The biggest data systems today are focused on web analytics, ad optimization, and the like. Many of today’s most popular architectures were weaned on ads and marketing, and have their ancestry in direct marketing plans. They’re just more focused than the comparatively blunt instruments with which direct marketers used to work.

The number of contact points in a company has multiplied significantly. Where once there was a phone number and a mailing address, today there are web pages, social media accounts, and more. Tracking users across all these channels — and turning every click, like, share, friend, or retweet into the start of a long funnel that leads, inexorably, to revenue — is a big challenge. It’s also one that companies like Salesforce understand, with investments in chat, social media monitoring, co-browsing, and more.

This is what’s lately been referred to as the “360-degree customer view” (though it’s not clear that companies will actually act on customer data if they have it, or whether doing so will become a compliance minefield). Big data is already intricately linked to online marketing, but it will branch out in two ways.

First, it’ll go from online to offline. Near-field-equipped smartphones with ambient check-in are a marketer’s wet dream, and they’re coming to pockets everywhere. It’ll be possible to track queue lengths, store traffic, and more, giving retailers fresh insights into their brick-and-mortar sales. Ultimately, companies will bring the optimization that online retail has enjoyed to an offline world as consumers become trackable.

Second, it’ll go from Wall Street (or maybe that’s Madison Avenue and Middlefield Road) to Main Street. Tools will get easier to use, and while small businesses might not have a BI platform, they’ll have a tablet or a smartphone that they can bring to their places of business. Mobile payment players like Square are already making them reconsider the checkout process. Adding portable customer intelligence to the tool suite of local companies will broaden how we use marketing tools.

Headlong into the trough

That’s my bet for the next three years, given the molasses of market confusion, vendor promises, and unrealistic expectations we’re about to contend with. Will big data change the world? Absolutely. Will it be able to defy the usual cycle of earnest adoption, crushing disappointment, and eventual rebirth all technologies must travel? Certainly not.

June 29 2012

Four short links: 29 June 2012

  1. Personalization (Chris Lehmann) -- We should be careful about how we use that term, and we should be very skeptical of how well computerized programs can really personalize for kids. Most of what I see - especially from curriculum and assessment vendors - involves personalization of pace while still maintaining standardization of content. This.
  2. Unveiling Quadrigram (Near Future Laboratory) -- a Visual Programming Environment to gather, shape and share living data. By living data we mean data that are constantly changing and accumulating. They can come from social network, sensor feeds, human activity, surveys, or any kind of operation that produce digital information.
  3. Tim O'Reilly at MIT Media Lab (Ethan Zuckerman) -- a great recap of a Tim talk. There's an interesting discussion of the unmeasured value created by peer-to-peer activities (such as those made dead simple by the Internet), which is one of the new areas we're digging into here at O'Reilly.
  4. The State vs the Internet (David Eaves) -- we've all seen many ways in which the Internet is undermining the power of nation states. A session at Foo asked how it was going to end (which would give way first?), and this is an excellent recap. It could be that the corporation is actually the entity best positioned to adapt to the internet age. Small enough to leverage networks, big enough to generate a community that is actually loyal and engaged.

February 24 2012

Top stories: February 20-24, 2012

Here's a look at the top stories published across O'Reilly sites this week.

Data for the public good
The explosion of big data, open data and social data offers new opportunities to address humanity's biggest challenges. The open question is no longer if data can be used for the public good, but how.

Building the health information infrastructure for the modern epatient
The National Coordinator for Health IT, Dr. Farzad Mostashari, discusses patient empowerment, data access and ownership, and other important trends in healthcare.

Big data in the cloud
Big data and cloud technology go hand-in-hand, but it's comparatively early days. Strata conference chair Edd Dumbill explains the cloud landscape and compares the offerings of Amazon, Google and Microsoft.

Everyone has a big data problem
MetaLayer's Jonathan Gosier talks about the need to democratize data tools because everyone has a big data problem.

Three reasons why direct billing is ready for its close-up
David Sims looks at the state of direct billing and explains why it's poised to catch on beyond online games and media.


Strata 2012, Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work. Save 20% on Strata registration with the code RADAR20.

Cloud photo: Big cloud by jojo nicdao, on Flickr

February 22 2012

Data for the public good

Can data save the world? Not on its own. As an age of technology-fueled transparency, open innovation and big data dawns around the world, the success of new policy won't depend on any single chief information officer, chief executive or brilliant developer. Data for the public good will be driven by a distributed community of media, nonprofits, academics and civic advocates focused on better outcomes, more informed communities and the new news, in whatever form it is delivered.

Advocates, watchdogs and government officials now have new tools for data journalism and open government. Globally, there's a wave of transparency that will wash over every industry and government, from finance to healthcare to crime.

In that context, open government is about much more than open data — just look at the issues that flow around the #opengov hashtag on Twitter, including the nature of identity, privacy, security, procurement, culture, cloud computing, civic engagement, participatory democracy, corruption, civic entrepreneurship and transparency.

If we accept the premise that Gov 2.0 is a potent combination of open government, mobile, open data, social media, collective intelligence and connectivity, the lessons of the past year suggest that a tidal wave of technology-fueled change is still building worldwide.

The Economist's support for open government data remains salient today:

"Public access to government figures is certain to release economic value and encourage entrepreneurship. That has already happened with weather data and with America's GPS satellite-navigation system that was opened for full commercial use a decade ago. And many firms make a good living out of searching for or repackaging patent filings."

As Clive Thompson reported at Wired last year, public sector data can help fuel jobs, and "shoving more public data into the commons could kick-start billions in economic activity." In the transportation sector, for instance, transit data is open government fuel for economic growth.

There is a tremendous amount of work ahead in building upon the foundations that civil society has constructed over decades. If you want a deep look at what the work of digitizing data really looks like, read Carl Malamud's interview with Slashdot on opening government data.

Data for the public good, however, goes far beyond government's own actions. In many cases, it will happen despite government action — or, often, inaction — as civic developers, data scientists and clinicians pioneer better analysis, visualization and feedback loops.

For every civic startup or regulation, there's a backstory that often involves a broad number of stakeholders. Governments have to commit to open up themselves but will, in many cases, need external expertise or even funding to do so. Citizens, industry and developers have to show up to use the data, demonstrating that there's not only demand, but also skill outside of government to put open data to work in service of accountability, citizen utility and economic opportunity. Galvanizing the co-creation of civic services, policies or apps isn't easy, but tapping the potential of the civic surplus has attracted the attention of governments around the world.

There are many challenges to overcome before that vision comes to pass. For one, data quality and access remain poor. Socrata's open data study identified progress, but also pointed to a clear need for improvement: only 30% of developers surveyed said that government data was available, and of that, 50% of the data was unusable.

Open data will not be a silver bullet to all of society's ills, but an increasing number of states are assembling platforms and stimulating an app economy.

Results-oriented mayors like Rahm Emanuel and Mike Bloomberg are committing to opening Chicago and opening government data in New York City, respectively.

Following are examples of where data for the public good is already having an impact upon the world we live in, along with some ideas about what lies ahead.

Financial good

Anyone looking for civic entrepreneurship will be hard pressed to find a better recent example than BrightScope. The efforts of Mike and Ryan Alfred are in line with traditional entrepreneurship: identifying an opportunity in a market that no one else has created value around, building a team to capitalize on it, and then investing years of hard work to execute on that vision. In the process, BrightScope has made government data about the financial industry more usable, searchable and open to the public.

Due to the efforts of these two entrepreneurs and their California-based startup, anyone who wants to learn more about financial advisers before tapping one to manage their assets can do so online.

Prior to BrightScope, the adviser data was locked up at the Securities and Exchange Commission (SEC) and the Financial Industry Regulatory Authority (FINRA).

"Ryan and I knew this data was there because we were advisers," said BrightScope co-founder Mike Alfred in a 2011 interview. "We knew data had been filed, but it wasn't clear what was being done with it. We'd never seen it liberated from the government databases."

While they knew the public data existed and had their idea years ago, Alfred said it didn't happen because they "weren't in the mindset of being data entrepreneurs" yet. "By going after 401(k) first, we could build the capacity to process large amounts of data," Alfred said. "We could take that data and present it on the web in a way that would be usable to the consumer."

Notably, the government data that BrightScope has gathered on financial advisers goes further than a given profile page. Over time, as search engines like Google and Bing index the information, the data has become searchable in places consumers are actually looking for it. That's aligned with one of the laws for open data that Tim O'Reilly has been sharing for years: Don't make people find data. Make data find the people.

As agencies adapt to new business relationships, consumers are starting to see increased access to government data. Now, more data that the nation's regulatory agencies collected on behalf of the public can be searched and understood by the public. Open data can improve lives, not least through adding more transparency into a financial sector that desperately needs more of it. This kind of data transparency will give the best financial advisers the advantage they deserve and make it much harder for your Aunt Betty to choose someone with a history of financial malpractice.

The next phase of financial data for good will use big data analysis and algorithmic consumer advice tools, or "choice engines," to help consumers make better decisions. The vast majority of consumers are unlikely to ever look directly at raw datasets themselves. Instead, they'll use mobile applications, search engines and social recommendations to make smarter choices.

There are already early examples of such services emerging. Billshrink, for example, lets consumers get personalized recommendations for a cheaper cell phone plan based on calling histories. Mint makes specific recommendations on how a citizen can save money based upon data analysis of the accounts added. Moreover, much of the innovation in this area is enabled by the ability of entrepreneurs and developers to go directly to data aggregation intermediaries like Yodlee or CashEdge to license the data.

Transit data as economic fuel

Transit data continues to be one of the richest and most dynamic areas for co-creation of services. Around the United States and beyond, there has been a blossoming of innovation in the city transit sector, driven by the passion of citizens and fueled by the release of real-time transit data by city governments.

Francisca Rojas, research director at the Harvard Kennedy School's Transparency Policy Project, has investigated the dynamics behind the disclosure of data by transit agencies in the United States, which she calls one of the most successful implementations of open government. "In just a few years, a rich community has developed around this data, with visionary champions for disclosure inside transit agencies collaborating with eager software developers to deliver multiple ways for riders to access real-time information about transit," wrote Rojas.

The Massachusetts Bay Transit Authority (MBTA) learned from Portland, Oregon's, TriMet that open data is better. "This was the best thing the MBTA had done in its history," said Laurel Ruma, O'Reilly's director of talent and a long-time resident in greater Boston, in her 2010 Ignite talk on real-time transit data. The MBTA's move to make real-time data available and support it has spawned a new ecosystem of mobile applications, many of which are featured at MBTA.com.

There are now 44 different consumer-facing applications for the TriMet system. Chicago, Washington and New York City also have a growing ecosystem of applications.
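
A rough sketch of the data layer behind such an app, assuming an agency that publishes a GTFS-realtime vehicle positions feed (the URL is a placeholder, and not every agency mentioned here used this format at the time):

    import requests
    from google.transit import gtfs_realtime_pb2  # pip install gtfs-realtime-bindings

    FEED_URL = 'https://example.com/gtfs-rt/vehicle-positions'  # placeholder

    feed = gtfs_realtime_pb2.FeedMessage()
    feed.ParseFromString(requests.get(FEED_URL, timeout=10).content)

    # Print where each vehicle is right now, by route.
    for entity in feed.entity:
        if entity.HasField('vehicle'):
            v = entity.vehicle
            print(v.trip.route_id, v.position.latitude, v.position.longitude)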

As more sensors go online in smarter cities, tracking the movements of traffic patterns will enable public administrators to optimize routes, schedules and capacity, driving efficiency and a better allocation of resources.

Transparency and civic goods

As John Wonderlich, policy director at the Sunlight Foundation, observed last year, access to legislative data brings citizens closer to their representatives. "When developers and programmers have better access to the data of Congress, they can better build the databases and tools that let the rest of us connect with the legislature."

That's the promise of the Sunlight Foundation's work, in general: Technology-fueled transparency will help fight corruption, fraud and reveal the influence behind policies. That work is guided by data, generated, scraped and aggregated from government and regulatory bodies. The Sunlight Foundation has been focused on opening up Congress through technology since the organization was founded. Some of its efforts culminated recently with the publication of a live XML feed for the House floor and a transparency portal for House legislative documents.

There are other horizons for transparency through open government data, which broadly refers to public sector records that have been made available to citizens. For a canonical resource on what makes such releases truly "open," consult the "8 Principles of Open Government Data."

For instance, while gerrymandering has been part of American civic life since the birth of the republic, one of the best policy innovations of 2011 may offer hope for improving the redistricting process. DistrictBuilder, an open-source tool created by the Public Mapping Project, allows anyone to easily create legal districts.

"During the last year, thousands of members of the public have participated in online redistricting and have created hundreds of valid public plans," said Micah Altman, senior research scientist at Harvard University Institute for Quantitative Social Science, via an email last year.

"In substantial part, this is due to the project's effort and software. This year represents a huge increase in participation compared to previous rounds of redistricting — for example, the number of plans produced and shared by members of the public this year is roughly 100 times the number of plans submitted by the public in the last round of redistricting 10 years ago," Altman said. "Furthermore, the extensive news coverage has helped make a whole new set of people aware of the issue and has re framed it as a problem that citizens can actively participate in to solve, rather than simply complain about."

Principles for data in the public good

As a result of digital technology, our collective public memory can now be shared and expanded upon daily. In a recent lecture on public data for public good at Code for America, Michal Migurski of Stamen Design made the point that part of the global financial crisis came through a crisis in public knowledge, citing "The Destruction of Economic Facts," by Hernando de Soto.

To arrive at virtuous feedback loops that amplify the signals that information-inundated citizens, regulators, executives and elected leaders need to make better decisions, data providers and infomediaries will need to embrace key principles, as Migurski's lecture outlined.

First, "data drives demand," wrote Tim O'Reilly, who attended the lecture and distilled Migurski's insights. "When Stamen launched crimespotting.org, it made people aware that the data existed. It was there, but until they put visualization front and center, it might as well not have been."

Second, "public demand drives better data," wrote O'Reilly. "Crimespotting led Oakland to improve their data publishing practices. The stability of the data and publishing on the web made it possible to have this data addressable with public links. There's an 'official version,' and that version is public, rather than hidden."

Third, "version control adds dimension to data," wrote O'Reilly. "Part of what matters so much when open source, the web, and open data meet government is that practices that developers take for granted become part of the way the public gets access to data. Rather than static snapshots, there's a sense that you can expect to move through time with the data."

The case for open data

Accountability and transparency are important civic goods, but adopting open data requires grounded arguments for a city chief financial officer to support these initiatives. When it comes to making a business case for open data, John Tolva, the chief technology officer for Chicago, identified four areas that support the investment in open government:

  1. Trust — "Open data can build or rebuild trust in the people we serve," Tolva said. "That pays dividends over time."
  2. Accountability of the work force — "We've built a performance dashboard with KPIs [key performance indicators] that track where the city directly touches a resident."
  3. Business building — "Weather apps, transit apps ... that's the easy stuff," he said. "Companies built on reading vital signs of the human body could be reading the vital signs of the city."
  4. Urban analytics — "Brett [Goldstein] established probability curves for violent crime. Now we're trying to do that elsewhere, uncovering cost savings, intervention points, and efficiencies."

New York City is also using data internally. The city is doing things like applying predictive analytics to building code violations and housing data to try to understand where potential fire risks might exist.

"The thing that's really exciting to me, better than internal data, of course, is open data," said New York City chief digital officer Rachel Sterne during her talk at Strata New York 2011. "This, I think, is where we really start to reach the potential of New York City becoming a platform like some of the bigger commercial platforms and open data platforms. How can New York City, with the enormous amount of data and resources we have, think of itself the same way Facebook has an API ecosystem or Twitter does? This can enable us to produce a more user-centric experience of government. It democratizes the exchange of information and services. If someone wants to do a better job than we are in communicating something, it's all out there. It empowers citizens to collaboratively create solutions. It's not just the consumption but the co-production of government services and democracy."

The promise of data journalism

The ascendance of data journalism in media and government will continue to gather force in the years ahead.

Journalists and citizens are confronted by unprecedented amounts of data and an expanded number of news sources, including a social web populated by our friends, family and colleagues. Newsrooms, the traditional hosts for information gathering and dissemination, are now part of a flattened environment for news. Developments often break first on social networks, and that information is then curated by a combination of professionals and amateurs. News is then analyzed and synthesized into contextualized journalism.

Data is being scraped by journalists, generated from citizen reporting, or gleaned from massive information dumps — such as with the Guardian's formidable data journalism, as detailed in a recent ebook. ScraperWiki, a favorite tool of civic coders at Code for America and elsewhere, enables anyone to collect, store and publish public data. As we grapple with the consumption challenges presented by this deluge of data, new publishing platforms are also empowering us to gather, refine, analyze and share data ourselves, turning it into information.
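
In the spirit of ScraperWiki (though this sketch uses none of its API), the basic scraping loop is short: fetch a page of public records, pull a table out of the HTML, and store it as CSV. The URL and page structure here are invented.

    import csv
    import requests
    from bs4 import BeautifulSoup

    URL = 'https://example.gov/inspections.html'  # hypothetical records page

    soup = BeautifulSoup(requests.get(URL, timeout=10).text, 'html.parser')
    table = soup.find('table')

    with open('inspections.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        for tr in table.find_all('tr'):
            # Header rows use <th>, data rows use <td>; capture both.
            writer.writerow(cell.get_text(strip=True) for cell in tr.find_all(['th', 'td']))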

There are a growing number of data journalism efforts around the world, from New York Times interactive features to the award-winning investigative work of ProPublica. Here are just a few promising examples:

  • Spending Stories, from the Open Knowledge Foundation, is designed to add context to news stories based upon government data by connecting stories to the data used.
  • Poderopedia is trying to bring more transparency to Chile, using data visualizations that draw upon a database of editorial and crowdsourced data.
  • The State Decoded is working to make the law more user-friendly.
  • Public Laboratory is a tool kit and online community for grassroots data gathering and research that builds upon the success of Grassroots Mapping.
  • Internews and its local partner Nai Mediawatch launched a new website that shows incidents of violence against journalists in Afghanistan.

Open aid and development

The World Bank has been taking unprecedented steps to make its data more open and usable to everyone. The data.worldbank.org website that launched in September 2010 was designed to make the bank's open data easier to use. In the months since, more than 100 applications have been built using the data.

"Up until very recently, there was almost no way to figure out where a development project was," said Aleem Walji, practice manager for innovation and technology at the World Bank Institute, in an interview last year. "That was true for all donors, including us. You could go into a data bank, find a project ID, download a 100-page document, and somewhere it might mention it. To look at it all on a country level was impossible. That's exactly the kind of organization-centric search that's possible now with extracted information on a map, mashed up with indicators. All of sudden, donors and recipients can both look at relationships."

Open data efforts are not limited to development. More data-driven transparency in aid spending is also going online. Last year, the United States Agency for International Development (USAID) launched a public engagement effort to raise awareness about the devastating famine in the Horn of Africa. The FWD campaign includes a combination of open data, mapping and citizen engagement.

"Frankly, it's the first foray the agency is taking into open government, open data, and citizen engagement online," said Haley Van Dyck, director of digital strategy at USAID, in an interview last year.

"We recognize there is a lot more to do on this front, but are happy to start moving the ball forward. This campaign is different than anything USAID has done in the past. It is based on informing, engaging, and connecting with the American people to partner with us on these dire but solvable problems. We want to change not only the way USAID communicates with the American public, but also the way we share information."

USAID built and embedded interactive maps on the FWD site. The agency created the maps with open source mapping tools and published the datasets it used to make these maps on data.gov. All are available to the public and media to download and embed as well.

The combination of publishing maps and the open data that drives them simultaneously online is a significant step forward for any government agency, and it sets a worthy bar for future efforts to meet. USAID accomplished this by migrating its data to an open, machine-readable format.

"In the past, we released our data in inaccessible formats — mostly PDFs — that are often unable to be used effectively," said Van Dyck. "USAID is one of the premiere data collectors in the international development space. We want to start making that data open, making that data sharable, and using that data to tell stories about the crisis and the work we are doing on the ground in an interactive way."

Crisis data and emergency response

Unprecedented levels of connectivity now exist around the world. According to a 2011 survey from the Pew Internet & American Life Project, more than 50% of American adults use social networks, 35% of American adults have smartphones, and 78% of American adults are connected to the Internet. Combined, those factors mean that we now see earthquake tweets spread faster than the seismic waves themselves. Networked publics can share the effects of disasters in real time, giving officials unprecedented insight into what's happening. Citizens act as sensors in the midst of the storm, creating an ad hoc system of networked accountability through data.

The growth of an Internet of Things is an important evolution. What we saw during Hurricane Irene in 2011 was the increasing importance of an Internet of people, where citizens act as sensors during an emergency. Emergency management practitioners and first responders have woken up to the potential of using social data for enhanced situational awareness and resource allocation.

An historic emergency social data summit in Washington in 2010 highlighted how relevant this area has become. And last year's hearing in the United States Senate on the role of social media in emergency management was "a turning point in Gov 2.0," said Brian Humphrey of the Los Angeles Fire Department.

The Red Cross has been at the forefront of using social data in a time of need. That's not entirely by choice, given that news of disasters has consistently broken first on Twitter. The challenge is for the men and women entrusted with coordinating response to identify signals in the noise.

First responders and crisis managers are using a growing suite of tools for gathering information and sharing crucial messages internally and with the public. Structured social data and geospatial mapping suggest one direction where these tools are evolving in the field.

A web application from ESRI deployed during historic floods in Australia demonstrated how crowdsourced social intelligence provided by Ushahidi can enable emergency social data to be integrated into crisis response in a meaningful way.

The Australian flooding web app includes the ability to toggle layers from OpenStreetMap, satellite imagery, and topography, and then filter by time or report type. By adding structured social data, the web app provides geospatial information system (GIS) operators with valuable situational awareness that goes beyond standard reporting, including the locations of property damage, roads affected, hazards, evacuations and power outages.
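
To see the pattern in code, here is a minimal sketch of a layered incident map built with the open source folium library for Python. It illustrates the approach rather than ESRI's actual application, and the sample reports and coordinates are invented.

    # A toggleable incident layer over an OpenStreetMap base, in the
    # spirit of the app described above. Sample data is invented.
    import folium

    m = folium.Map(location=[-27.47, 153.03], zoom_start=10, tiles="OpenStreetMap")

    reports = [
        {"lat": -27.48, "lon": 153.01, "kind": "road affected"},
        {"lat": -27.44, "lon": 153.06, "kind": "power outage"},
    ]

    layer = folium.FeatureGroup(name="Flood reports")
    for r in reports:
        folium.Marker([r["lat"], r["lon"]], popup=r["kind"]).add_to(layer)
    layer.add_to(m)

    folium.LayerControl().add_to(m)  # the widget that lets users toggle layers
    m.save("flood_map.html")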

Long before the floods, and before the Red Cross joined Twitter, Brian Humphrey of the Los Angeles Fire Department (LAFD) was already online, listening. "The biggest gap directly involves response agencies and the Red Cross," said Humphrey, who currently serves as the LAFD's public affairs officer. "Through social media, we're trying to narrow that gap between response and recovery to offer real-time relief."

After the devastating 2010 earthquake in Haiti, the evolution of volunteers working collaboratively online also offered a glimpse into the potential of citizen-generated data. Crisis Commons has acted as a sort of "geeks without borders." Around the world, developers, GIS engineers, online media professionals and volunteers collaborated on information technology projects to support disaster relief for post-earthquake Haiti, mapping streets on OpenStreetMap and collecting crisis data on Ushahidi.

Healthcare

What happens when patients find out how good their doctors really are? That was the question Harvard Medical School professor Dr. Atul Gawande asked in the New Yorker nearly a decade ago.

The narrative he told in that essay makes the history of quality improvement in medicine compelling, connecting it to the creation of a data registry at the Cystic Fibrosis Foundation in the 1950s. As Gawande detailed, that data was privately held. After it became open, life expectancy for cystic fibrosis patients tripled.

In 2012, the new hope is in big data, where techniques for finding meaning in the huge amounts of unstructured data generated by healthcare diagnostics offer immense promise.

The trouble, say medical experts, is that data availability and quality remain significant pain points that are holding back existing programs.

There are, literally, bright spots that suggest what's possible. Dr. Gawande's 2011 essay, which considered whether "hotspotting" with health data could lower medical costs by giving the neediest patients better care, offered another perspective on the issue. Early outcomes made the approach look compelling. As Dr. Gawande detailed, when a Medicare demonstration program paid medical institutions to coordinate care for their most chronically expensive beneficiaries, hospital stays and trips to the emergency room dropped more than 15% over the course of three years. A test program taking a similar approach in Atlantic City saw a 25% drop in costs.

Through sharing data and knowledge, and then creating a system to convert ideas into practice, clinicians in the ImproveCareNow network were able to improve the remission rate for Crohn's disease from 49% to 67% without the introduction of new drugs.

In Britain, researchers found that the outcomes for adult cardiac patients improved after the publication of information on death rates. With the release of meaningful new open government data about performance and outcomes from the British national healthcare system, similar improvements may be on the way.

"I do believe we are at the beginning of a revolutionary moment in health care, when patients and clinicians collect and share data, working together to create more effective health care systems," said Susannah Fox, associate director for digital strategy at the Pew Internet and Life Project, in an interview in January. Fox's research has documented the social life of health information, the concept of peer-to-peer healthcare, and the role of the Internet among people living with chronic disease.

In the past few years, entrepreneurs, developers and government agencies have been collaboratively exploring the power of open data to improve health. In the United States, the open data story in healthcare is evolving quickly, from new mobile apps that lead to better health decisions to data spurring changes in care at the U.S. Department of Veterans Affairs.

Since he entered public service, Todd Park, the first chief technology officer of the U.S. Department of Health and Human Services (HHS), has focused on unleashing the power of open data to improve health. If you aren't familiar with this story, read the Atlantic's feature article that explores Park's efforts to revolutionize the healthcare industry through better use of data.

Park has focused on releasing data at Health.Data.Gov. In a speech to a Hacks and Hackers meetup in New York City in 2011, Park emphasized that HHS wasn't just releasing new data: "[We're] also making existing data truly accessible or usable," he said, taking "stuff that's in a book or on a website and turning it into machine-readable data or an API."

Park said it's still quite early in the project and that the work isn't just about data — it's about how and where it's used. "Data by itself isn't useful. You don't go and download data and slather data on yourself and get healed," he said. "Data is useful when it's integrated with other stuff that does useful jobs for doctors, patients and consumers."

What lies ahead

There are four trends that warrant special attention as we look to the future of data for public good: civic network effects, hybridized data models, personal data ownership and smart disclosure.

Civic network effects

Community is a key ingredient in successful open government data initiatives. It's not enough to simply release data and hope that venture capitalists and developers magically become aware of the opportunity to put it to work. Marketing open government data is what repeatedly brought federal Chief Technology Officer Aneesh Chopra and Park out to Silicon Valley, New York City and other business and tech hubs.

Despite the addition of topical communities to Data.gov, conferences and new media efforts, government's attempts to act as an "impatient convener" can only go so far. Civic developer and startup communities are building a distributed ecosystem that can grow that community, from BuzzData to Socrata to newer efforts like Max Ogden's DataCouch.

Smart disclosure

There are enormous economic and civic good opportunities in the "smart disclosure" of personal data, whereby a private company or government institution provides a person with access to his or her own data in open formats. Smart disclosure is defined by Cass Sunstein, Administrator of the White House Office of Information and Regulatory Affairs, as a process that "refers to the timely release of complex information and data in standardized, machine-readable formats in ways that enable consumers to make informed decisions."

For instance, the quarterly financial statements of the top public companies in the world are now available online through the Securities and Exchange Commission.
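
As a quick illustration of how machine-readable those disclosures are, the short Python sketch below pulls a company's recent 10-Q filings from EDGAR's public Atom feed. The CIK used is Apple's, chosen only as an example, and the contact address in the User-Agent header is a placeholder.

    # List a company's recent 10-Q filings via EDGAR's Atom output.
    import urllib.request
    import xml.etree.ElementTree as ET

    URL = ("https://www.sec.gov/cgi-bin/browse-edgar"
           "?action=getcompany&CIK=0000320193&type=10-Q&output=atom")
    req = urllib.request.Request(URL, headers={"User-Agent": "you@example.com"})

    with urllib.request.urlopen(req) as f:
        root = ET.parse(f).getroot()

    ns = {"a": "http://www.w3.org/2005/Atom"}
    for entry in root.findall("a:entry", ns):
        print(entry.find("a:title", ns).text, entry.find("a:updated", ns).text)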

Why does it matter? The interactions of citizens with companies or government entities generate a huge amount of economically valuable data. If consumers and regulators had access to that data, they could tap it to make better choices about everything from finance to healthcare to real estate, much in the same way that web applications like Hipmunk and Zillow let consumers make more informed decisions.

Personal data assets

When a trend makes it to the World Economic Forum (WEF) in Davos, it's generally evidence that the trend is gathering steam. A report titled "Personal Data: The Emergence of a New Asset Class" suggests that 2012 will be the year when citizens start thinking more about data ownership, whether that data is generated by private companies or the public sector.

"Increasing the control that individuals have over the manner in which their personal data is collected, managed and shared will spur a host of new services and applications," wrote the paper's authors. "As some put it, personal data will be the new 'oil' — a valuable resource of the 21st century. It will emerge as a new asset class touching all aspects of society."

The idea of data as a currency is still in its infancy, as Strata Conference chair Edd Dumbill has emphasized. The Locker Project, which provides people with the ability to move their own data around, is one of many approaches.

The growth of the Quantified Self movement and of online communities like PatientsLikeMe and 23andMe underscores the strength of this trend. In the U.S. federal government, the Blue Button initiative, which enables veterans to download personal health data, has now spread to all federal employees and earned adoption at Aetna and Kaiser Permanente.

In early 2012, a Green Button was launched to unleash energy data in the same way. Venture capitalist Fred Wilson called the Green Button an "OAuth for energy data."

Wilson wrote:

"It is a simple standard that the utilities can implement on one side and web/mobile developers can implement on the other side. And the result is a ton of information sharing about energy consumption and, in all likelihood, energy savings that result from more informed consumers."

Hybridized public-private data

Free or low-cost online tools are empowering citizens to do more than donate money or blood: Now they can donate time or expertise, or even act as sensors. In the United States, we saw a leading edge of this phenomenon in the Gulf of Mexico, where Oil Reporter, an open source oil spill reporting app, provided a prototype for data collection via smartphone. In Japan, an analogous effort called Safecast grew and matured in the wake of the nuclear disaster that followed the massive earthquake and tsunami of 2011.

Open source software and citizens acting as sensors have steadily been integrated into journalism over the past few years, most dramatically in the videos and pictures uploaded after the 2009 Iran election and during 2011's Arab Spring.

Citizen science looks like the next frontier. Safecast is combining open data collected by citizen science with academic, NGO and open government data (where available), and then making it widely available. Similar projects blending public and experimental data are percolating elsewhere.

Public data is a public good

Despite the myriad challenges presented by legitimate concerns about privacy, security, intellectual property and liability, the promise of more informed citizens is significant. McKinsey's 2011 report dubbed big data the next frontier for innovation, with billions of dollars of economic value yet to be created. When that innovation is applied on behalf of the public good, whether it's in city planning, transit, healthcare, government accountability or situational awareness, those effects will be extended.

We're entering the feedback economy, where dynamic feedback loops between customers and corporations, partners and providers, citizens and governments, or regulators and companies can drive both efficiencies and leaner, smarter governments.

The exabyte age will bring with it the twin challenges of information overload and overconsumption, both of which will require organizations of all sizes to use the emerging toolboxes for filtering, analysis and action. To create public good from public goods — the public sector data that governments collect, the private sector data that is being collected and the social data that we generate ourselves — we will need to collectively forge new compacts that honor existing laws and visionary agreements that enable the new data science to put the data to work.

Photo: NYTimes: 365/360 - 1984 (in color) by blprnt_van, on Flickr

November 07 2011

Four short links: 7 November 2011

  1. California and Bust (Vanity Fair) — Michael Lewis digs into city and state finances, and the news ain't good.
  2. Tonido Plug 2 — with only watts a day, you could have your own low-cost compute farm that runs off a car battery and a cheap solar panel.
  3. William Gibson Interview (The Paris Review) — It's harder to imagine the past that went away than it is to imagine the future. What we were prior to our latest batch of technology is, in a way, unknowable. It would be harder to accurately imagine what New York City was like the day before the advent of broadcast television than to imagine what it will be like after life-size broadcast holography comes online. But actually the New York without the television is more mysterious, because we've already been there and nobody paid any attention. That world is gone.
  4. Zen and the Art of Making (Phil Torrone) — thoughts on the difference between beginners and experts, and why the beginner's mindset is intoxicating and addictive.

October 27 2011

Strata Week: IBM puts Hadoop in the cloud

Here are a few of the data stories that caught my attention this week.

IBM's cloud-based Hadoop offering looks to make data analytics easier

At its conference in Las Vegas this week, IBM made a number of major big-data announcements, including making its Hadoop-based product InfoSphere BigInsights available immediately via the company's SmartCloud platform. InfoSphere BigInsights was unveiled earlier this year, and it is hardly the first offering that Big Blue is making to help its customers handle big data. The last few weeks have seen other major players also move toward Hadoop offerings — namely Oracle and Microsoft — but IBM is offering its service in the cloud, something that those other companies aren't yet doing. (For its part, Microsoft does say that a Hadoop service will come to Azure by the end of the year.)

IBM joins Amazon Web Services as the only other company currently offering Hadoop in the cloud, notes GigaOm's Derrick Harris. "Big data — and Hadoop, in particular — has largely been relegated to on-premise deployments because of the sheer amount of data involved," he writes, "but the cloud will be a more natural home for those workloads as companies begin analyzing more data that originates on the web."

Harris also points out that IBM's Hadoop offering is "fairly unique" insofar as it targets businesses rather than programmers. IBM itself contends that "bringing big data analytics to the cloud means clients can capture and analyze any data without the need for Hadoop skills, or having to install, run, or maintain hardware and software."

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Cleaning up location data with Factual Resolve

The data platform Factual launched a new API for developers this week that tackles one of the more frustrating problems with location data: incomplete records. Called Factual Resolve, the new offering is, according to a company blog post, an "entity resolution API that can complete partial records, match one entity against another, and aid in de-duping and normalizing datasets."

Developers using Resolve tell it what they know about an entity (say, a venue name) and the API can return the rest of the information that Factual knows based on its database of U.S. places — address, category, latitude and longitude, and so on.
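
In code, that workflow looks roughly like the sketch below: send the attributes you know and inspect the resolved candidates that come back. The endpoint path, parameter names and response fields here are assumptions; check Factual's API documentation for the real ones.

    # Resolve a partial place record against Factual's database.
    # Endpoint, parameters and response fields are illustrative.
    import json
    import requests

    API_KEY = "YOUR_FACTUAL_KEY"  # placeholder
    known = {"name": "Coupa Cafe", "locality": "Palo Alto"}

    resp = requests.get(
        "https://api.v3.factual.com/places/resolve",  # assumed endpoint
        params={"values": json.dumps(known), "KEY": API_KEY},
    )
    for row in resp.json().get("response", {}).get("data", []):
        if row.get("resolved"):  # assumed flag marking an unambiguous match
            print(row.get("address"), row.get("latitude"), row.get("longitude"))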

Tyler Bell, Factual's director of product, discussed the intersection of location and big data in a video interview at this year's Where 2.0 conference.

Google and governments' data requests

As part of its efforts toward better transparency, Google has updated its Government Requests tool this week with information about the number of requests the company has received for user data since the beginning of 2011.

This is the first time that Google is disclosing not just the number of requests, but the number of user accounts specified as well. It's also made the raw data available so that interested developers and researchers can study and visualize the information.

According to Google, requests from U.S. government officials for content removal were up 70% in this reporting period (January-June 2011) versus the previous six months, and user data requests were up 29% over the same span. Google also says it received requests from local law enforcement agencies to take down various YouTube videos — one on police brutality, one that was allegedly defamatory — but that it did not comply. Of the 5,950 user data requests (impacting some 11,000 user accounts) submitted between January and June 2011, however, Google complied, fully or partially, with 93%.

The U.S. was hardly the only government making an increased number of requests to Google. Spain, South Korea, and the U.K., for example, also made more requests. Several countries, including Sri Lanka and the Cook Islands, made their first requests.
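
Because the raw data is downloadable, sanity checks like these take only a few lines. Here is a sketch assuming a CSV export with one row per country and reporting period; the filename and column names are hypothetical.

    # Rank countries by user data requests for January-June 2011.
    import pandas as pd

    df = pd.read_csv("google-user-data-requests.csv")  # hypothetical export
    h1_2011 = df[df["period"] == "2011-H1"]
    top = (h1_2011.groupby("country")["user_data_requests"]
           .sum()
           .sort_values(ascending=False))
    print(top.head(10))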

Got data news?

Feel free to email me.

July 15 2011

Why don't they get it?

Predicting the development and adoption of new technology is difficult, with problems ranging from focusing too deeply on detailed technical features to being swept along by an emotional gut feeling that may be unique to you.

When our forecasts don't pan out, we can feel great frustration with the people in the market, whom we see as extremely cynical, irrational or stupid. Sometimes we just plain don't understand them.

In the mid 1990s I worked for a research company. At one point I produced forecasts of UK broadband penetration that turned out to be about 10 years too aggressive. We had totally underestimated the reluctance of BT to let go of potential ISDN revenues and invest in ADSL. The long-term technical call was correct, but the timing was naïve. I had understood the technology, but not the people.

Fast-forward 15 years and a tweet from Tim O'Reilly touched a nerve:

Sometimes copyright-protected industries are so far out of line, and I wonder why government doesn't see it http://bit.ly/lEWy9A #overreach

The piece he linked to highlighted television broadcasters' efforts to get the government involved in what are currently purely commercial negotiations between program producers and broadcasters. Rather than negotiate payments for rights, some broadcasters are asking the government to require program makers to grant those rights through new legislation.

You can see why broadcasters would want this, but how could a government come to think it would be a good idea for the economy as a whole?

To quote the distinguished Princeton psychologist and economist Daniel Kahneman, I believe it's because human beings are "endlessly complicated and interesting." Rather than take the knee-jerk response that politicians must just be in the pocket of big media, I'm going to look at how some aspects of human behavior make this kind of highly damaging legislation more likely.

We need only assume that politicians are people who are "against unemployment" and "against crime."

Fear of loss is much stronger than desire for gain

Kahneman's groundbreaking research with Amos Tversky on loss aversion showed that the fear of losing something generally (and strongly) outweighs the desire to acquire it.

So when an established industry like broadcasting cries out that there will be massive job losses if they don't get legislative support, then politicians' fear of loss will often greatly outweigh any desire to loosen or enact legislation to encourage innovation and new job creation.

If the broadcasters are able to convince the government that what was once considered fair use of material should really be seen as criminal copyright infringement, then they might also be able to push the "against crime" button.

Web 2.0 Summit, being held October 17-19 in San Francisco, will examine "The Data Frame" — focusing on the impact of data in today's networked economy.

Save $300 on registration with the code RADAR

A bird in the hand

Enacting heavy-handed legislation to support old, lumbering businesses can positively damage the prospects of new businesses and future job creation.

As investors, we learn through discounted cashflow analysis how to compare a dollar now with a dollar next year. We can apply this to compare a job now with a job in the future. Unfortunately, humans are often poor at this and undervalue future events compared to immediate circumstances (so-called hyperbolic discounting). In our broadcast media case, a perceived crime now (e.g. file sharing) can seem to totally outweigh the value of possible future job creation.
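
The difference between the two models is easy to see in a few lines of Python; the discount rates below are illustrative, not empirical.

    # Compare standard (exponential) discounting with hyperbolic discounting.
    def dcf(value, rate, years):
        return value / (1 + rate) ** years   # discounted cashflow

    def hyperbolic(value, k, years):
        return value / (1 + k * years)       # steep near-term drop-off

    for years in (1, 2, 5, 10):
        print(f"{years:>2} yr: DCF ${dcf(100, 0.05, years):6.2f}  "
              f"hyperbolic ${hyperbolic(100, 0.30, years):6.2f}")

At ten years out, the hyperbolic discounter values the future job at a quarter of its face value, while the 5% cashflow model still values it at more than 60%. That gap is exactly what a "crime now" framing exploits.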

Political bandwidth is very narrow

To discuss or influence government policy we face a massive bandwidth problem, as politicians need to be able to state their position as crisp soundbites.

Allied to this is the fact that politically-engaged people tend to feel the urge to pick a side or risk being portrayed as spineless ditherers. Having picked a side, another psychological bias (confirmation bias) can kick in, leading to a tendency to fit evidence to their current viewpoint. Broadcast legislation is a walk in the park compared to climate change legislation, however. And I'll leave evolution denial to The Onion ...

In our media example, there are two sides to copyright (and also patent) law:

  1. Encouragement of innovation and creativity
  2. Punishment of criminal infringers

Our broadcasters are pushing hard to make the first point disappear off the radar, and ensure that copyright is perceived solely as an open-and-shut "protection of property" issue, similar to housebreaking or auto theft.

Unless the proponents of innovation can reclaim this ground, there will be room only for the simplest of soundbites, and those soundbites will eventually lead to works of art like the PROTECT IP Act.

So how do you combine an understanding of people and technologies?

The world is complex. Even the most sophisticated attempt to model "things" has led to a realization that this can only take us so far, and that we must put people's behavior at the center of our models.

If you doubt this, ask Google what they are up to with Google+.

I'm not a psychologist — I have an iPhone app development business and a background in media and business strategy. However, I would strongly recommend that anyone discussing or commenting on new technology get to know as many of the quirks and biases of human behavior as they can, as you're modeling people first and technology second.

I'm sure you've come across Freakonomics, but if you really want to swallow up the rest of the day I can recommend the Wikipedia list of cognitive biases, which has more than 100 listed reasons why people don't behave like technology.

So, next time you find yourself wondering why elegant and simple logical assumptions have once again been poleaxed by "some bunch of [insert your favorite insult here]," you'll probably find that some, if not all, of these cognitive effects are at play somewhere in the model.

July 07 2011

Developer Week in Review: The unglamorous life of video game developers

In an effort to Stop the Madness Now, this week's review contains no references to lawsuits, no rumors about Apple products, and no discussion of recent court cases of any kind. It is also 100% gluten-free.

Suddenly, enterprise server development doesn't seem that bad

Ever had one of those days, filled with endless meetings, when you wish you could be working on something fun, exciting, and wildly creative? You know, like a video game. Well, as it turns out, you might as well fantasize about working in a sweatshop making shirts, because the working conditions appear to be equivalent.

That's the conclusion that people are coming to, as details emerge about the horrific conditions under which the game "L.A. Noire" was produced. The reports paint a picture of never-ending work weeks, verbal abuse, and unpaid overtime. Now imagine being stuck in that kind of workplace for seven years.

OSCON 2011 — Join today's open source innovators, builders, and pioneers July 25-29 as they gather at the Oregon Convention Center in Portland, Ore.

Save 20% on registration with the code OS11RAD

The Internet: It can route around any malfunction except politics

In recent days there have been several governmental attempts to break the Internet in the name of the public good (if, in some cases, the public is defined as the owners of copyrighted material). We begin with that bastion of free speech, the government of Australia. With cries of "Think of the Children!" echoing around Ayers Rock, two major Aussie ISPs began voluntarily blacklisting a list of allegedly child-porn-friendly sites generously provided to them by the government. Previous versions of this list have been helpfully supplied to the rest of the world by WikiLeaks (the contents of the list are secret), and the list has in the past included such dens of depravity as a dental website. Having the government decide what websites people can visit ... nope, nothing could possibly go wrong here.

At least the folks Down Under aren't trying to fundamentally subvert the working mechanisms that make the Internet function. For that level of creativity, you need to turn to the US Congress, which seems willing to break the Internet if it makes the film and record industries happy. The latest version of the PROTECT IP Act (I won't make you endure the incredibly contorted words that make up the acronym) would require DNS providers to let the government seize DNS records at will if it believed the associated domains were involved in intellectual property violations. Kind of like the DMCA, as implemented by Stalin (notice how I cleverly avoided invoking Godwin's law there, by switching dictators ...). Not surprisingly, the people who actually have to make the Internet function are not amused.



Required summer reading: The most dangerous software errors


The good people of MITRE have just released the 2011 list of the top 25 most dangerous software errors. Several of them have made the Week in Review before, usually right after a major company was taken down by one of them. If you don't know these culprits by heart, you should, because the bad guys certainly do!

After visiting the site, I desperately want a CWE/CAPEC t-shirt. The security ninjas at work will love it. I guess I'll have to wait until this year's comes out, alas.

Got news?

Please send tips and leads here.

April 21 2011

ePayments Week: Where adds context to PayPal

Here's what caught my attention in the payment space this week.

EBay buys a hyper-local friend for PayPal

EBay's purchase of Where, a mobile app for finding local deals, gives the gift of context to PayPal. It's the second deal in recent weeks that connects a payment provider with a check-in service or advertiser to close the loop from discovery to payment. Foursquare demoed a similar link-up at SXSW last month. EBay will bring the whole deal in-house, integrating PayPal into the Where app so that users can discover deals in Where and then pay for them with a single click. Erick Schonfeld at TechCrunch offers a solid rationale for the purchase, and also notes the data play inherent in it. All the data eBay has on its and PayPal's users could help Where serve up more relevant offers and advertising.

PayPal explained the deal in the context of other acquisitions it's making. Amanda Pires, PayPal's senior director of global communications, said in a blog post that "Local commerce companies like Where are blurring the lines between in-store and online shopping." Last month, EBay made another purchase that similarly crossed lines when it said it would buy GSI Commerce, a provider of e-commerce services for retail brands. That deal could eventually put PayPal at the register of physical stores. With the Where acquisition, now they'll have a way to get you to the store, too.



O'Reilly authors discuss iPhone's built-in travel log


This week's big news in geolocation came from Alasdair Allan and Pete Warden, who reported their discovery of an unencrypted file on iPhones (and their synced computers) tracking their movements since they upgraded to iOS 4 sometime last summer.

Allan and Warden discussed their discovery at Where 2.0 on Wednesday. Although Apple had yet to offer an explanation of the file to them (or to media inquiries), Allan and Warden speculated that the data came from interactions between the phone and radio cell towers, whether a call, a text, a data packet, or simply a locating signal. For Allan, it added up to 29,000 points of data over 293 days.

As both hastened to point out, telecom carriers already have this kind of information on you, regardless of what kind of phone you carry. But that data is treated with a higher level of security, since it's considered sensitive. "What's interesting about this data is that it's unencrypted and available," said Allan. "It's insecure." (See Alasdair's post for more details on the discovery and the open source app they created to manipulate and visualize the data.)

Responding to comments that this data had already been discovered and was well known, Allan said during a Where 2.0 session: "It's not well known. We're pretty geeky. If we didn't know, then a lot of people didn't know."
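
For the curious, reading the file is trivial once you have it, which is exactly Allan and Warden's point. Here is a sketch assuming the SQLite layout described in public write-ups of the discovery, a CellLocation table with Timestamp, Latitude and Longitude columns; verify the schema against their open source app.

    # Count the location points in the unencrypted iPhone cache.
    import sqlite3

    conn = sqlite3.connect("consolidated.db")
    rows = conn.execute(
        "SELECT Timestamp, Latitude, Longitude FROM CellLocation ORDER BY Timestamp"
    ).fetchall()
    conn.close()

    print(f"{len(rows)} recorded points")  # ~29,000 over 293 days in Allan's case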

White House calls for identity ecosystem

Just days before Barack Obama headed out to Palo Alto to host a virtual town hall meeting in the real-world space that houses Facebook's headquarters, the White House backed a plan to spur private industry to create more secure forms of online identity. Noting that identity theft and online fraud are serious problems that cost the economy billions every year, the administration called on private industry to come up with a solution that might free the citizenry from the tyranny of dozens of username/password combinations.

Kashmir Hill on Forbes.com wrote that the government's aim is to create an "identity ecosystem," which sounds a lot like the plan that OpenID has been advocating for a while. Emily Badger on Miller-McCune.com looked closely at the line the administration is walking between showing leadership or looking like Big Brother. Badger talked with Amie Stepanovich, national security counsel for the Electronic Privacy Information Center in Washington. The interview gives the sense the White House tiptoed carefully around this point, making sure it wasn't suggesting a government-issued national online identity number (something that's been kicked around before but wouldn't be received well by most citizens) and scrubbing any sign of the Department of Homeland Security's involvement (even though, Badger notes, they've been involved in the formative thinking on this issue for years).

Any authentication system raises new risks. If a security key fob like the ones RSA provides is necessary, people will lose it. Mobile phones could be used, too, but they're just as easy to lose. Biometrics tap a validation mechanism that's harder to lose, but it's not clear whether people are willing to put up with a retina scan just to access their Netflix queues.



Got news?


News tips and suggestions are always welcome, so please send them along.


If you're interested in learning more about the payment development space, check out PayPal X DevZone, a collaboration between O'Reilly and PayPal.

March 24 2011

Search Notes: Google and government scrutiny

This week's column explores the latest in how we access information online and how the courts and governments are weighing in.

Google continues to be one of the primary ways we navigate the web

A recent Citi report using comScore data is the latest to illustrate how much we rely on Google to find information online.

The report found that Google is the top source of traffic for 74% of the 35 properties analyzed and that Google traffic has remained steady or increased for 69% of them.

However, it was a slightly different picture for media sites, as many saw less traffic from Google and more traffic from Facebook.

Also, a recent Pew study found that for the 24% of Americans who get most of their political news from the internet, Google comes in third at 13% (after CNN and Yahoo).

More generally, 67% of Americans get most of their political news from TV and 27% rely on newspapers (the latter down from 33% in 2002). This matches the broader trend for media noted in a recent comprehensive study by the Pew Research Center's Internet & American Life Project and the Project for Excellence in Journalism, in partnership with the John S. and James L. Knight Foundation.

Google and governments, courts, and other legal entanglements

Google's mission is to "organize the world's information and make it universally accessible and useful." Notice the use of the word "world" rather than "Internet." They're organizing our email, our voice mail, and the earth.

While having everything at our fingertips at a moment's notice is awesome, it also can make governments and courts nervous.

Case in point: the U.S. Senate is planning to hold an antitrust investigation into Google's "dominance over Internet search" and its increasing competition with ecommerce sites.

Senator Herb Kohl noted that the "Internet continues to grow in importance to the national economy." He wants to look into allegations by websites that they "are being treated unfairly in search ranking, and in their ability to purchase search advertising."

Texas also recently launched an antitrust investigation into Google, seeking access to information about how both organic and paid results are ranked.

Of course, if Google reveals too much, its systems can be gamed and searchers won't get the best results. Site owners would lose out, too, as the most relevant and useful results wouldn't appear at the top.

Why should we trust Google to rank results fairly? Ultimately, if they build a searcher experience that doesn't benefit the searcher, they could lose users and market share, so it's in their best interest to continue on their stated path.



"Right to be forgotten"


Another fairly recent case involves the Spanish courts. Google search simply indexes and ranks content that exists on the web. When something negative appears about a person or company, they will sometimes ask Google to remove it, but Google's stance is typically that the person or company has to work with the content owner to remove the content — Google just indexes what is public. (Exceptions to this exist.)

In Spain (and other parts of Europe), individuals have "the right to be forgotten," but this doesn't apply to newspapers, which are protected by freedom of expression rules. Does it apply to Google's index of that newspaper content? Courts have apparently ruled both that freedom of expression rules don't apply to search engines and that Google is a publisher, subject to the same laws as newspapers.

A Spanish plastic surgeon wants Google to remove a negative newspaper article from 1991 from their search results (although he can't legally ask the newspaper itself to remove the article). The Wall Street Journal sums up the case this way:

The Spanish regulator says that in situations where having material included in search results leads to a massive disclosure of personal data, the individual concerned has the right to ask the search engine to remove it on privacy grounds. Google calls that censorship.

Google does remove content based on government requests when legally obligated to do so and it makes a summary of those requests available.

Sidenote to anyone upset about a negative newspaper article appearing in search results: It's probably a bad idea to try to bribe the journalist into taking the content down.



Google can't become the "Alexandria of out-of-print books" quite yet


Search isn't the only area being scrutinized. Google has also been scanning the world's books and making them universally accessible. The courts just rejected a settlement between Google and the Authors Guild that created an opt-out model for authors. Neither Google nor the Authors Guild is happy. Authors Guild president Scott Turow said, "this Alexandria of out-of-print books appears lost at the moment."



Block any site from your Google search results


Since we all use Google to navigate the web, it makes sense that we'd want a personal Google where we can block the sites we don't like. Last month in this column, we talked about Google's Chrome extension that enabled searchers to create a personal blocklist. Now this ability is open to everyone. Once you click on a listing and then return to the search results, the listing you clicked includes a "block all results" link. Click that and you'll never see results from that site again. You can manage the block list in your Google account.



Bye, AllTheWeb!


Google may seem unstoppable, but only a few years before Google launched, another search engine dominated the web. Alta Vista launched in late 1995 with innovative crawling technology that helped it gain vast popularity. Alta Vista later lost out to Google and was acquired by Yahoo. In late 2010, Yahoo announced it was closing down several properties, including Alta Vista.

That hasn't happened yet, but AllTheWeb, another of Yahoo's search properties, is closing April 4th, at which time you'll be redirected to Yahoo. Alta Vista can't be far behind.

March 14 2011

An Open Letter to Hillary Clinton | protecthonesty.tumblr.com 2011-03-14

An Open Letter to Secretary of State Hillary Clinton Regarding P.J. Crowley’s Resignation

March 14, 2011

Secretary of State Hillary Clinton
US Department of State
2201 C Street NW
Washington, DC 20520

Dear Madam Secretary,

We the undersigned are writing to express our severe disappointment at the resignation of P.J. Crowley, Assistant Secretary for the Bureau of Public Affairs at the State Department.

A number of us were present at the meeting where Mr. Crowley expressed his personal opinions, but all of us are concerned to learn that Mr. Crowley’s statements appear to have led to his resignation.  In the context of an open and honest discussion in an academic institution, we were eager to hear Mr. Crowley’s views and willing to give him our opinions and advice. It is this type of openness to dissenting opinions, frankness of assessments, and honesty of discourse that leads to both the advancement of human knowledge and the healthy function of an open, democratic society.  We are discouraged to find such dialogue prompting the resignation of a public official. If public officials are made to fear expressing their truthful opinions, we have laid the groundwork for ineffective, dishonest, and unresponsive governance.

We hope that you agree with such sentiments and we look forward to seeing renewed support for frank civic dialogue at the State Department.
 
Signed:

[...]

March 03 2011

Pressure Grows for Answers on Fracking

Congressional Democrats demand answers about the safety of hydraulic fracturing after revelations that wastewater from such drilling is regularly dumped into rivers and streams without proper treatment.