February 13 2014

Four short links: 13 February 2014

  1. The Common Crawl WWW Ranking — open data, open methodology, behind an open ranking of the top sites on the web. Preprint paper available. (via Slashdot)
  2. Felton’s Sensors (Quartz) — inside the gadgets Nicholas Felton uses to quantify himself.
  3. Myo Armband (IEEE Spectrum) — armband input device with eight EMG (electromyography) muscle activity sensors along with a nine-axis inertial measurement unit (that’s three axes each for accelerometer, gyro, and magnetometer), meaning that you get forearm gesture sensing along with relative motion sensing (as opposed to absolute position). The EMG sensors pick up on the electrical potential generated by muscle cells, and with the Myo on your forearm, the sensors can read all of the muscles that control your fingers, letting them spy on finger position as well as grip strength.
  4. Bitcoin Exchanges Under Massive and Concerted Attack — he who lives by the network dies by the network. A DDoS attack is taking Bitcoin’s transaction malleability problem and applying it to many transactions in the network, simultaneously. “So as transactions are being created, malformed/parallel transactions are also being created so as to create a fog of confusion over the entire network, which then affects almost every single implementation out there,” he added. Antonopoulos went on to say that Blockchain.info’s implementation is not affected, but some exchanges have been affected – their internal accounting systems are gradually going out of sync with the network.

January 30 2014

Four short links: 30 January 2014

  1. $200k of Spaceships Destroyed (The Verge) — More than 2,200 of the game’s players, members of EVE’s largest alliances, came together to shoot each other out of the sky. The resultant damage was valued at more than $200,000 of real-world money. [...] Already, the battle has had an impact on the economics and politics of EVE’s universe: as both sides scramble to rearm and rebuild, the price of in-game resource tritanium is starting to rise. “This sort of conflict,” Coker said, “is what science fiction warned us about.”
  2. Google Now Has an AI Ethics Committee (HuffPo) — sorry for the HuffPo link. One of the requirements of the DeepMind acquisition was that Google agreed to create an AI safety and ethics review board to ensure this technology is developed safely. Page’s First Law of Robotics: A robot may not block an advertisement, nor through inaction, allow an advertisement to come to harm.
  3. Academic Torrents — a scalable, secure, and fault-tolerant repository for data, with blazing fast download speeds built on BitTorrent.
  4. Hack Schools Meet California Regulators (Venturebeat) — turns out vocational training is a regulated profession. Regulation meets disruption; they annihilate in a burst of press releases.

January 21 2014

Four short links: 21 January 2014

  1. On Being a Senior Engineer (Etsy) — Mature engineers know that no matter how complete, elegant, or superior their designs are, it won’t matter if no one wants to work alongside them because they are assholes.
  2. Control Theory (Coursera) — Learn about how to make mobile robots move in effective, safe, predictable, and collaborative ways using modern control theory. (via DIY Drones)
  3. US Moves Towards Open Access (WaPo) — Congress passed a budget that will make about half of taxpayer-funded research available to the public.
  4. NHS Patient Data Available for Companies to Buy (The Guardian) — Once live, organisations such as university research departments – but also insurers and drug companies – will be able to apply to the new Health and Social Care Information Centre (HSCIC) to gain access to the database, called care.data. If an application is approved then firms will have to pay to extract this information, which will be scrubbed of some personal identifiers but not enough to make the information completely anonymous – a process known as “pseudonymisation”. Recipe for disaster as it has been repeatedly shown that it’s easy to identify individuals, given enough scrubbed data. Can’t see why the NHS just doesn’t make it an app in Facebook. “Nat’s Prostate status: it’s complicated.”

December 23 2013

Four short links: 23 December 2013

  1. DelFly Explorer — 20 grams, 9 minutes of autonomous flight, via barometer and new stereo vision system. (via Wayne Radinsky)
  2. Banning Autonomous Killing Machines (Tech Republic) — While no autonomous weapons have been built yet, it’s not a theoretical concern, either. Late last year, the U.S. Department of Defense (DoD) released its policy around how autonomous weapons should be used if they were to be deployed in the battlefield. The policy limits how they should operate, but definitely doesn’t ban them. (via Slashdot)
  3. Scientific Data Lost at Alarming Rate — says scientific paper PUBLISHED BEHIND A PAYWALL.
  4. Security of Browser Extension Password Managers (PDF) — This research shows that the examined password managers made design decisions that greatly increase the chance of users unknowingly exposing their passwords through application-level flaws. Many of the flaws relate to the browser-integrated password managers that don’t follow the same-origin policy that is crucial to browser security. In the case of password managers, this means that passwords could be filled into unintended credential forms, making password theft easier.

December 04 2013

Four short links: 4 December 2013

  1. Skyjack — drone that takes over other drones. Welcome to the Malware of Things.
  2. Bootstrap World — a curricular module for students ages 12-16, which teaches algebraic and geometric concepts through computer programming. (via Esther Wojicki)
  3. Harvest — open source, BSD-licensed toolkit for building web applications for integrating, discovering, and reporting data. Designed for biomedical data first. (via Mozilla Science Lab)
  4. Project ILIAD — crowdsourced antibiotic discovery.

November 26 2013

Four short links: 26 November 2013

  1. The Death and Life of Great Internet Cities — “The sense that you were given some space on the Internet, and allowed to do anything you wanted to in that space, it’s completely gone from these new social sites,” said Scott. “Like prisoners, or livestock, or anybody locked in institution, I am sure the residents of these new places don’t even notice the walls anymore.”
  2. What You’re Not Supposed To Do With Google Glass (Esquire) — Maybe I can put these interruptions to good use. I once read that in ancient Rome, when a general came home victorious, they’d throw him a triumphal parade. But there was always a slave who walked behind the general, whispering in his ear to keep him humble. “You are mortal,” the slave would say. I’ve always wanted a modern nonslave version of this — a way to remind myself to keep perspective. And Glass seemed the first gadget that would allow me to do that. In the morning, I schedule a series of messages to e-mail myself throughout the day. “You are mortal.” “You are going to die someday.” “Stop being a selfish bastard and think about others.” (via BoingBoing)
  3. Neural Networks and Deep Learning — Chapter 1 up and free, and there’s an IndieGogo campaign to fund the rest.
  4. What We Know and Don’t Know — That highly controlled approach creates the misconception that fossils come out of the ground with labels attached. Or worse, that discovery comes from cloaked geniuses instead of open discussion. We’re hoping to combat these misconceptions by pursuing an open approach. This is today’s evolutionary science, not the science of fifty years ago. We’re here sharing science. [...] Science isn’t the answers, science is the process. Open science in paleoanthropology.

November 18 2013

Four short links: 18 November 2013

  1. The Virtuous Pipeline of Code (Public Resource) — Chicago partnering with Public Resource to open its legal codes for good. “This is great! What can we do to help?” Bravo Chicago, and everyone else—take note!
  2. Smithsonian’s 3D Data — models of 21 objects, from a gunboat to the Wright Brothers’ plane, to a wooly mammoth skeleton, to Lincoln’s life masks. I wasn’t able to find a rights statement on the site which explicitly governed the 3D models. (via Smithsonian Magazine)
  3. Anki’s Robot Cars (Xconomy) — The common characteristics of these future products, in Sofman’s mind: “Relatively simple and elegant hardware; incredibly complicated software; and Web and wireless connectivity to be able to continually expand the experience over time.” (via Slashdot)
  4. An Empirical Evaluation of TCP Performance in Online Games — We show that because TCP was originally designed for unidirectional and network-limited bulk data transfers, it cannot adapt well to MMORPG traffic. In particular, the window-based congestion control and the fast retransmit algorithm for loss recovery are ineffective. Furthermore, TCP is overkill, as not every game packet needs to be transmitted in a reliable and orderly manner. We also show that the degraded network performance did impact users’ willingness to continue a game. A minimal illustrative sketch of the datagram alternative follows below.
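This sketch is purely illustrative and not from the paper: it sends frequent, loss-tolerant game-state updates as UDP datagrams, where a stale update is simply dropped rather than retransmitted in order. The server address, port, and message format are assumptions.

```python
import json
import socket
import time

# Illustrative only: loss-tolerant game-state updates as UDP datagrams.
# A stale position update is worthless, so there is no point paying TCP's
# in-order, reliable-delivery overhead to retransmit it.
SERVER = ("127.0.0.1", 9999)  # placeholder address for a hypothetical game server

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_state(player_id: int, x: float, y: float, seq: int) -> None:
    # Each datagram is self-contained; a receiver just discards anything
    # older than the highest sequence number it has already seen.
    payload = json.dumps({"id": player_id, "x": x, "y": y, "seq": seq})
    sock.sendto(payload.encode("utf-8"), SERVER)

for seq in range(10):
    send_state(player_id=42, x=1.0 + seq, y=2.0, seq=seq)
    time.sleep(0.05)  # roughly 20 updates per second
```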

September 29 2013

Four short links: 1 October 2013

  1. Farmbot Wiki — open-source, scalable, automated precision farming machines.
  2. Amazon’s Chaotic Storage — photos from inside an Amazon warehouse. At the heart of the operation is a sophisticated database that tracks and monitors every single product that enters/leaves the warehouse and keeps a tally on every single shelf space and whether it’s empty or contains a product. Software-optimised spaces, for habitation by augmented humans.
  3. Public Safety Codes of the World — Kickstarter project to fund the release of public safety codes.
  4. #xoxo Thoreau Talk (Maciej Ceglowski) — exquisitely good talk by the Pinboard creator, on success, simplicity, and focus.

August 01 2013

Four short links: 2 August 2013

  1. Unhappy Truckers and Other Algorithmic Problems — Even the insides of vans are subjected to a kind of routing algorithm; the next time you get a package, look for a three-letter code, like “RDL.” That means “rear door left,” and it is so the driver has to take as few steps as possible to locate the package. (via Sam Minnee)
  2. Fuel3D: A Sub-$1000 3D Scanner (Kickstarter) — a point-and-shoot 3D imaging system that captures extremely high resolution mesh and color information of objects. Fuel3D is the world’s first 3D scanner to combine pre-calibrated stereo cameras with photometric imaging to capture and process files in seconds.
  3. Corporate Open Source Anti-Patterns (YouTube) — Bryan Cantrill’s talk, slides here. (via Daniel Bachhuber)
  4. Hacking for Humanity (The Economist) — Getting PhDs and data specialists to donate their skills to charities is the idea behind the event’s organizer, DataKind UK, an offshoot of the American nonprofit group of the same name.

April 30 2013

Linking open data to augmented intelligence and the economy

After years of steady growth, open data is now entering into public discourse, particularly in the public sector. If President Barack Obama decides to put the White House’s long-awaited new open data mandate before the nation this spring, it will finally enter the mainstream.

As more governments, businesses, media organizations and institutions adopt open data initiatives, interest in the evidence behind such releases and in the outcomes from them is similarly increasing. High hopes abound in many sectors, from development to energy to health to safety to transportation.

“Today, the digital revolution fueled by open data is starting to do for the modern world of agriculture what the industrial revolution did for agricultural productivity over the past century,” said Secretary of Agriculture Tom Vilsack, speaking at the G-8 Open Data for Agriculture Conference.

As other countries consider releasing their public sector information as open data in machine-readable formats on the Internet, they’ll need to consider and learn from years of effort behind data.gov.uk in the United Kingdom, data.gov in the United States, and Kenya’s open data initiative.

One of the crucial sources of analysis for the success or failure of open data efforts will necessarily be research institutions and academics. That’s precisely why research from the Open Data Institute and Professor Nigel Shadbolt (@Nigel_Shadbolt) will matter in the months and years ahead.

In the following interview, Professor Shadbolt and I discuss what lies ahead. His responses were lightly edited for content and clarity.

How does your research on artificial intelligence (AI) relate to open data?

AI has always fascinated me. The quest for understanding what makes us smart and how we can make computers smart has always engaged me. While we’re trying to understand the principles of human intelligence and build a “brain in a box, smarter robots” or better speech processing algorithms, the world’s gone and done a different kind of AI: augmented intelligence. The web, with billions of human brains, has a new kind of collective and distributive capability that we couldn’t even see coming in AI. A number of us have coined a phrase, “Web science,” to understand the Web at a systems level, much as we do when we think about human biology. We talk about “systems biology” because there are just so many elements: technical, organizational, cultural.

The Web really captured my attention ten years ago as this really new manifestation of collective problem-solving. If you think about the link into earlier work I’d done, in what was called “knowledge engineering” or knowledge-based systems, there the problem was that all of the knowledge resided on systems on people’s desks. What the web has done is finish this with something that looks a lot like a supremely distributed database. Now, that distributed knowledge base is one version of the Semantic Web. The way I got into open data was the notion of using linked data and semantic Web technologies to integrate data at scale across the web — and one really high value source of data is open government data.
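To make the linked-data idea concrete, here is a minimal sketch (my own illustration, not Shadbolt’s tooling) using the rdflib library: two RDF files that identify the same suppliers by shared URIs are merged into one graph and queried together. The file names and the ex: vocabulary are placeholders.

```python
from rdflib import Graph

# Minimal linked-data integration sketch. The files and vocabulary are
# placeholders; any two RDF sources that share URIs for the same entities
# can be merged and queried this way.
g = Graph()
g.parse("spending.ttl", format="turtle")    # e.g. local-authority spending data
g.parse("companies.ttl", format="turtle")   # e.g. a company register

# Because both sources identify suppliers by the same URI, one SPARQL query
# can join across the merged graph.
query = """
PREFIX ex: <http://example.org/schema/>
SELECT ?companyName (SUM(?amount) AS ?total)
WHERE {
  ?payment ex:paidTo ?company ;
           ex:amount ?amount .
  ?company ex:name   ?companyName .
}
GROUP BY ?companyName
ORDER BY DESC(?total)
"""
for row in g.query(query):
    print(row.companyName, row.total)
```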

What was the reason behind the founding and funding of the Open Data Institute (ODI)?

The open government data piece originated in work I did in 2003 and 2004. We were looking at this whole idea of putting new data-linking standards on the Web. I had a project in the United Kingdom that was working with government to show the opportunities to use these techniques to link data. As in all of these things, that work was reported to Parliament. There was real interest in it, but not really top-level heavy “political cover” interest. Tim Berners-Lee’s engagement with the previous prime minister led to Gordon Brown appointing Tim and me to look at setting up data.gov.uk and getting data released, with the current coalition government then taking that forward.

Throughout this time, Tim and I have been arguing that we could really do with a central focus, an institute whose principal motivation was working out how we could find real value in this data. The ODI does exactly that. It’s got about $60 million of public money over five years to incubate companies, build capacity, train people, and ensure that the public sector is supplying high quality data that can be consumed. The fundamental idea is that you ensure high quality supply by generating a strong demand side. The demand side isn’t just the public sector; it’s also the private sector.

What have we learned so far about what works and what doesn’t? What are the strategies or approaches that have some evidence behind them?

I think there are some clear learnings. One that I’ve been banging on about recently has been that yes, it really does matter to turn the dial so that governments have a presumption to publish non-personal public data. If you would publish it anyway, under a Freedom of Information request or whatever your local legislative equivalent is, why aren’t you publishing it anyway as open data? That, as a behavioral change, is a big one for many administrations where either the existing workflow or culture is, “Okay, we collect it. We sit on it. We do some analysis on it, and we might give it away piecemeal if people ask for it.” We should construct the publication process from the outset with a presumption to publish openly. That’s still something that we are two or three years away from, working hard with the public sector to work out how to do it and how to do it properly.

We’ve also learned that in many jurisdictions, the amount of [open data] expertise within administrations and within departments is slight. There just isn’t really the skillset, in many cases, for people to know what it is to publish using technology platforms. So there’s a capability-building piece, too.

One of the most important things is it’s not enough to just put lots and lots of datasets out there. It would be great if the “presumption to publish” meant they were all out there anyway — but when you haven’t got any datasets out there and you’re thinking about where to start, the tough question is to say, “How can I publish data that matters to people?”

The data that matters is revealed when we look at the download stats on these various UK, US and other [open data] sites. There’s a very, very distinctive parallel curve. Some datasets are very, very heavily utilized. You suspect they have high utility to many, many people. Many of the others, if they can be found at all, aren’t being used particularly much. That’s not to say that, under that long tail, there aren’t large amounts of use. A particularly arcane open dataset may have exquisite use to a small number of people.

The real truth is that it’s easy to republish your national statistics. It’s much harder to do a serious job on publishing your spending data in detail, publishing police and crime data, publishing educational data, publishing actual overall health performance indicators. These are tough datasets to release. As people are fond of saying, it holds politicians’ feet to the fire. It’s easy to build a site that’s full of stuff — but does the stuff actually matter? And does it have any economic utility?

Page views and traffic aren’t ideal metrics for measuring success for an open data platform. What should people measure, in terms of actual outcomes in citizens’ lives? Improved services or money saved? Performance or corrupt politicians held accountable? Companies started or new markets created?

You’ve enumerated some of them. It’s certainly true that one of the challenges is to instrument the effect or the impact. Actually, it’s the last thing that governments, nation states, regions or cities who are enthused to do this thing do. It’s quite hard.

Datasets, once downloaded, may then be virally reproduced all over the place, so that you don’t notice the reuse from a government site. Most of the open licensing that is so essential to this effort usually has a requirement for attribution. Those licenses should be embedded in the machine readable datasets themselves. Not enough attention is paid to that piece of process: actually noticing, when you’re looking at other applications, other data and publishing efforts, that attribution is there. We should be smarter about getting better sense from the attribution data.
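As a purely illustrative sketch of that point (the field names, publisher, and records below are invented, not an ODI or government specification), a dataset can carry its own machine-readable license and attribution so the requirement travels with every copy of the file:

```python
import json

# Illustrative only: a dataset that embeds its license and attribution metadata.
dataset = {
    "metadata": {
        "title": "Local authority spending, 2012-13",
        # e.g. the UK Open Government Licence (URL shown as an example)
        "license": "http://www.nationalarchives.gov.uk/doc/open-government-licence/version/2/",
        "attribution": "Contains public sector information licensed under the Open Government Licence.",
        "publisher": "Example Borough Council",  # placeholder publisher
    },
    "records": [
        {"date": "2013-01-15", "supplier": "Acme Ltd", "amount": 12500.00},
    ],
}

with open("spending.json", "w") as f:
    json.dump(dataset, f, indent=2)

# A consumer can then check, and surface, the attribution automatically.
with open("spending.json") as f:
    meta = json.load(f)["metadata"]
print(meta["attribution"], "(", meta["license"], ")")
```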

The other sources of impact, though: How do you evidence actual internal efficiencies and internal government-wide benefits of open data? I had an interesting discussion recently, where the department of IT had said, “You know, I thought this was all stick and no carrot. I thought this was all in overhead, to get my data out there for other people’s benefits, but we’re now finding it so much easier to re-consume our own data and repurpose it in other contexts that it’s taken a huge amount of friction out of our own publication efforts.”

Quantified measures would really help, if we had standard methods to notice those kinds of impacts. Our economists, people whose work is about understanding where value is created, really haven’t embraced open markets, particularly open data markets, in a very substantial way. I think we need a good number of capable economists piling into this, trying to understand new forms of value and what the values are that are created.

I think a lot of the traditional models don’t stand up here. Bizarrely, it’s much easier to measure impact when information scarcity exists and you have something that I don’t, and I have to pay you a certain fee for that stuff. I can measure that value. When you’ve taken that asymmetry out, when you’ve made open data available more widely, what are the new things that flourish? In some respects, you’ll take some value out of the market, but you’re going to replace it by wider, more distributed, capable services. This is a key issue.

The ODI will certainly be commissioning and is undertaking work in this area. We published a piece of work jointly with Deloitte in London, looking at evidence-linked methodology.

You mentioned the demand-side of open data. What are you learning in that area — and what’s being done?

There’s an interesting tension here. If we turn the dial in the governmental mindset to the “presumption to publish” — and in the UK, our public data principles actually embrace that as government policy — you are meant to publish unless there’s a personal-information or national-security reason why you would not. In a sense, you say, “Well, we just publish everything out there? That’s what we’ll do. Some of it will have utility, and some of it won’t.”

When the Web took off, and you offered pages as a business or an individual, you didn’t foresee the link-making that would occur. You didn’t foresee that PageRank would ultimately give you a measure of your importance and relevance in the world and could even be monetized after the fact. You didn’t foresee that those pages have their own essential network effect: the more pages there are that interconnect, the more value is created, and so there is a strong argument [for publishing them].

So, you know, just publish. In truth, the demand side is an absolutely great and essential test of whether actually [publishing data] does matter.

Again, to take the Web as an analogy, large amounts of the Web are unattended to, neglected, and rot. It’s just stuff nobody cares about, actually. What we’re seeing in the open data effort in the UK is that it’s clear that some data is very privileged. It’s at the center of lots of other datasets.

In particular, [data about] location, occurrence, and when things occurred, and stable ways of identifying those things which are occurring. Then, of course, the data space that relates to companies, their identifications, the contracts they call, and the spending they engage in. That is the meat and drink of business intelligence apps all across the planet. If you started to turn off an ability for any business intelligence to access legal identifiers or business identifiers, all sorts of oversight would fall apart, apart from anything else.

The demand side [of open data] can be characterized. It’s not just economic. It will have to do with transparency, accountability and regulatory action. The economic side of open data gives you huge room for maneuver and substantial credibility when you can say, “Look, this dataset of spending data in the UK, published by local authorities, is the subject of detailed analytics from companies who look at all data about how local authorities and governments are spending their money. They sell procurement analysis insights back to business and on to third parties and other parts of the business world, saying ‘This is the shape of how the UK PLC is buying.’”

What are some of the lessons we can learn from how the World Wide Web grew and the value that it’s delivered around the world?

That’s always a worry, that, in some sense, the empowered get more powerful. What we do see, in open data in particular, is that new sorts of players who couldn’t enter the game at all before are now able to.

My favorite example is in mass transportation. In the UK, we had to fight quite hard to get some of the data from bus, rail and other forms of transportation made openly available. Until that was done, there was a pretty small number of suppliers in this market.

In London, where all of it was made available from the Transport for London Authority, there’s just been an explosion of apps and businesses who are giving you subtly distinct experiences as users of that data. I’ve got about eight or nine apps on my phone that give me interestingly distinctive views of moving about the city of London. I couldn’t have predicted or anticipated that many of those would exist.

I’m sure the companies who held that data could’ve spent large amounts of money and still not given me anything like the experience I now have. The flood of innovation around the data has really been significant, and there are now many, many more players and stakeholders in that space.

The Web taught us that serendipitous reuse, where you can’t anticipate where the bright idea comes from, is what is so empowering. The flipside of that is that it also reveals that, in some cases, the data isn’t necessarily of a quality that you might’ve thought. This effort might allow for civic improvement or indeed, business improvement in some cases, where businesses come and improve the data the state holds.

What’s happening in the UK with the so-called “MiData Initiative,” which posits that people have a right to access and use personal data disclosed to them?

I think this is every bit as potentially disruptive and important as open government data. We’re starting to see the emergence of what we might think of as a new class of important data, “personal assets.”

People have talked about “personal information management systems” for a long time now. Frequently, it’s revolved around managing your calendar or your contact list, but it’s much deeper. Imagine that you, the consumer, or you, the citizen, had a central locus of authority around data that was relevant to you: consumer data from retail, from the banks that you deal with, from the telcos you interact with, from the utilities you get your gas, water and electricity from. Imagine if that data infosphere was something that you could access easily, with a right to reuse and redistribute it as you saw fit.

The canonical example, of course, is health data. It isn’t only data that business holds; it’s also data the state holds, like your health records, educational transcripts, welfare, tax, or any number of areas.

In the UK, we’ve been working towards empowering consumers, in particular through this MiData program. We’re trying to get to a place where consumers have a right to data held about their transactions by businesses, [released] back to them in a reusable and flexible way. We’ve been working on a voluntary program in this area for the last year. We have a consultation on taking up powers to require large companies to give that information back. There is a commitment in the UK, for the first time, to get health records back to patients as data they control, but I think it has to go much more widely.

Personal data is a natural complement to open data. Some of the most interesting applications I’m sure we’re going to see in this area are where you take your personal data and enrich it with open data relating to businesses, the services of government, or the actual trading environment you’re in. In the UK, we’ve got six large energy companies that compete to sell energy to you.

Why shouldn’t groups and individuals be able to get together and collectively purchase in the same way that corporations can purchase and get their discounts? Why can’t individuals be in a spot market, effectively, where it’s easy to move from one supplier to another? Along with those efficiencies in the market and improvements in service delivery, it’s about empowering consumers at the end of the day.

This post is part of our ongoing series on the open data economy.

April 18 2013

Sprinting toward the future of Jamaica

Creating the conditions for startups to form is now a policy imperative for governments around the world, as Julian Jay Robinson, minister of state in Jamaica’s Ministry of Science, Technology, Energy and Mining, reminded the attendees at the “Developing the Caribbean” conference last week in Kingston, Jamaica.

Robinson said Jamaica is working on deploying wireless broadband access, securing networks and stimulating tech entrepreneurship around the island, a set of priorities that would have sounded of the moment in Washington, Paris, Hong Kong or Bangalore. He also described open access and open data as fundamental parts of democratic governance, explicitly aligning the release of public data with economic development and anti-corruption efforts. Robinson also pledged to help ensure that Jamaica’s open data efforts would be successful, offering a key ally within government to members of civil society.

The interest in adding technical ability and capacity around the Caribbean was sparked by other efforts around the world, particularly Kenya’s open government data efforts. That’s what led the organizers to invite Paul Kukubo to speak about Kenya’s experience, which Robinson noted might be more relevant to Jamaica than that of the global north.

Kukubo, the head of Kenya’s Information, Communication and Technology Board, was a key player in getting the country’s open data initiative off the ground and evangelizing it to developers in Nairobi. At the conference, Kukubo gave Jamaicans two key pieces of advice. First, open data efforts must be aligned with national priorities, from reducing corruption to improving digital services to economic development.

“You can’t do your open data initiative outside of what you’re trying to do for your country,” said Kukubo.

Second, political leadership is essential to success. In Kenya, the president was personally involved in open data, Kukubo said. Now that a new president has been officially elected, however, there are new questions about what happens next, particularly given that pickup in Kenya’s development community hasn’t been as dynamic as officials might have hoped. There’s also a significant issue on the demand-side of open data, with respect to the absence of a Freedom of Information Law in Kenya.

When I asked Kukubo about these issues, he said he expects a Freedom of Information law will be passed this year in Kenya. He also replied that the momentum on open data wasn’t just about the supply side.

“We feel that in the usage side, especially with respect to the developer ecosystem, we haven’t necessarily gotten as much traction from developers using data and interpreting cleverly as we might have wanted to have,” he said. “We’re putting more into that area.”

With respect to leadership, Kukubo pointed out that newly elected Kenyan President Uhuru Kenyatta drove open data release and policy when he was the minister of finance. Kukubo expects him to be very supportive of open data in office.

The development of open data in Jamaica, by way of contrast, has been driven by academia, said professor Maurice McNaughton, director of the Center of Excellence at the Mona School of Business at the University of the West Indies (UWI). The Caribbean Open Institute, for instance, has been working closely with Jamaica’s Rural Agriculture Development Authority (RADA). There are high hopes that releases of more data from RADA and other Jamaican institutions will improve Jamaica’s economy and the effectiveness of its government.

Open data could add $35 million annually to the Jamaican economy, said Damian Cox, director of the Access to Information Unit in the Office of the Prime Minister, citing a United Nations estimate. Cox also explicitly aligned open data with measuring progress toward Millennium Development Goals, positing that increasing the availability of data will enable the civil society, government agencies and the UN to more accurately assess success.

The development of (open) data-driven journalism

Developing the Caribbean focused on the demand side of open data as well, particularly the role of intermediaries in collecting, cleaning, fact checking, and presenting data, matched with necessary narrative and context. That kind of work is precisely what data-driven journalism does, which is why it was one of the major themes of the conference. I was invited to give an overview of data-driven journalism that connected some trends and highlighted the best work in the field.

I’ve written quite a bit about how data-driven journalism is making sense of the world elsewhere, with a report yet to come. What I found in Jamaica is that media there have long since begun experimenting in the field, from the investigative journalism at Panos Caribbean to the relatively recent launch of diGJamaica by the Gleaner Company.

diGJamaica is modeled upon the Jamaican Handbook and includes more than a million pages from The Gleaner newspaper, going back to 1834. The site publishes directories of public entities and public data, including visualizations. It charges for access to the archives.

Legends and legacies

Olympic champion Usain Bolt, photographed in his (fast) car at the UWI/Usain Bolt Track in Mona, Jamaica.

Normally, meeting the fastest man on earth would be the most memorable part of any trip. The moment that left the deepest impression from my journey to the Caribbean, however, came not from encountering Usain Bolt on a run but from within a seminar room on a university campus.

As a member of a panel of judges, I saw dozens of young people present after working for 30 hours at a hackathon at the University of the West Indies. While even the most mature of the working apps was still a prototype, the best of them were squarely focused on issues that affect real Jamaicans: scoring the risk of farmers that needed banking loans and collecting and sharing data about produce.

The winning team created a working mobile app that would enable government officials to collect data at farms. While none of the apps are likely to be adopted by the agricultural agency in its current form, or show up in the Google Play store this week, the experience the teams gained will help them in the future.

As I left the island, the perspective that I’d taken away from trips to Brazil, Moldova and Africa last year was further confirmed: technical talent and creativity can be found everywhere in the world, along with considerable passion to apply design thinking, data and mobile technology to improve the societies people live within. This is innovation that matters, not just clones of popular social networking apps — though the judges saw more than a couple of those ideas flow by as well.

In the years ahead, Jamaican developers will play an important role in media, commerce and government on the island. If attracting young people to engineering and teaching them to code is the long-term legacy of efforts like Developing the Caribbean, it will deserve its own thumbs up from Mr. Bolt. The track to that future looks wide open.

Disclosure: the cost of my travel to Jamaica was paid for by the organizers of the Developing the Caribbean conference.

March 28 2013

Four short links: 28 March 2013

  1. What American Startups Can Learn From the Cutthroat Chinese Software Industry — It follows that the idea of “viral” or “organic” growth doesn’t exist in China. “User acquisition is all about media buys. Platform-to-platform in China is war, and it is fought viciously and bitterly. If you have a Gmail account and send an email to, for example, NetEase163.com, which is the local web dominant player, it will most likely go to spam or junk folders regardless of your settings. Just to get an email to go through to your inbox, the company sending the email needs to have a special partnership.” This entire article is a horror show.
  2. White House Hangout Maker Movement (Whitehouse) — During the Hangout, Tom Kalil will discuss the elements of an “all hands on deck” effort to promote Making, with participants including: Dale Dougherty, Founder and Publisher of MAKE; Tara Tiger Brown, Los Angeles Makerspace; Super Awesome Sylvia, Super Awesome Maker Show; Saul Griffith, Co-Founder, Otherlab; Venkatesh Prasad, Ford.
  3. Municipal Codes of DC Freed (BoingBoing) — more good work by Carl Malamud. He’s specifically providing data for apps.
  4. The Modern Malware Review (PDF) — 90% of fully undetected malware was delivered via web-browsing; It took antivirus vendors 4 times as long to detect malware from web-based applications as opposed to email (20 days for web, 5 days for email); FTP was observed to be exceptionally high-risk.

March 22 2013

Sensoring the news

When I went to the 2013 SXSW Interactive Festival to host a conversation with NPR’s Javaun Moradi about sensors, society and the media, I thought we would be talking about the future of data journalism. By the time I left the event, I’d learned that sensor journalism had long since arrived and been applied. Today, inexpensive, easy-to-use open source hardware is making it easier for media outlets to create data themselves.

“Interest in sensor data has grown dramatically over the last year,” said Moradi. “Groups are experimenting in the areas of environmental monitoring, journalism, human rights activism, and civic accountability.” His post on what sensor networks mean for journalism sparked our collaboration after we connected in December 2011 about how data was being used in the media.

Associated Press visualization of Beijing air quality. See related feature.

At a SXSW panel on “sensoring the news,” Sarah Williams, an assistant professor at MIT, described how the Civic Data Design Project had partnered with the Associated Press to independently measure air quality in Beijing.

Prior to the 2008 Olympics, the coaches of the Olympic teams had expressed serious concern about the impact of air pollution on the athletes. That, in turn, put pressure on the Chinese government to take substantive steps to improve those conditions. While the Chinese government released an index of air quality, explained Williams, they didn’t explain what went into it, nor did they provide the raw data.

The Beijing Air Tracks project arose from the need to determine what the conditions on the ground really were. AP reporters carried sensors connected to their cellphones to detect particulate and carbon monoxide levels, enabling them to report air quality conditions back in real-time as they moved around the Olympic venues and city.
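The general pattern is simple even though the AP’s actual pipeline isn’t public. In the hedged sketch below, a particulate sensor is assumed to print one numeric reading per line over a serial link, and each reading is stamped with a time and location and posted to a hypothetical collection endpoint.

```python
import time

import requests
import serial  # pyserial

# Sketch of the general mobile-sensing pattern, not the AP's actual code.
COLLECTOR_URL = "https://example.org/api/readings"  # hypothetical endpoint
PORT = "/dev/ttyUSB0"                               # typical USB serial port

sensor = serial.Serial(PORT, baudrate=9600, timeout=2)

def read_pm() -> float:
    # Assumes the sensor emits one numeric particulate reading per line.
    line = sensor.readline().decode("ascii", errors="ignore").strip()
    return float(line)

while True:
    reading = {
        "pm": read_pm(),
        "timestamp": time.time(),
        "lat": 39.9909,   # placeholder coordinates
        "lon": 116.3907,
    }
    requests.post(COLLECTOR_URL, json=reading, timeout=10)
    time.sleep(60)  # one reading per minute
```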

The sensor data helped the AP measure the effect of policy decisions that the Chinese government made, said Williams, from closing down factories to widespread shutdowns of different kinds of industries. The results from the sensor journalism project, which showed a decrease in particulates but conditions 12 to 25 times worse than New York City on certain days, were published as an interactive data visualization.

Associated Press mashup of particulate levels and photography at the Olympic stadium in Beijing over time.

This AP project is a prime example of how sensors, data journalism, and old-fashioned, on-the-ground reporting can be combined to shine a new level of accountability on official reports. It won’t be the last time this happens, either. Around the world, from the Amazon to Los Angeles to Japan, sensor data is now being put to use by civic media and journalists.

Sensing civic media

There are an increasing number of sensors in our lives, said John Keefe, a data news editor for WNYC, speaking at his SXSW panel in Austin. From the physical sensors in smartphones to new possibilities built with Arduino or Raspberry Pi hardware, Keefe highlighted how journalists could seize hold of new possibilities.

“Google takes data from maps and Android phones and creates traffic data,” Keefe said. “In a sense, that’s sensor data being used live in a public service. What are we doing in journalism like that? What could we do?”

The evolution of Safecast offers a glimpse of networked accountability, collecting and publishing radiation data through sensors, citizen science and the Internet. The project, which won last year’s Knight News Challenge on data, is now building the infrastructure to enable people to help monitor air quality in Los Angeles.

Sensor journalism is also being applied to make sense of the world using remote sensing data and satellite imagery. The director of that project, Gustavo Faleiros, recently described how environmental reporting can be combined with civic media to collect data, with relevant projects in Asia, Africa and the Americas. For instance, Faleiros cited an environmental monitoring project led by Eric Paulos of the University of California at Berkeley’s Center for New Media, where sensors on taxis were used to gather data in Accra, Ghana.

Another direction that sensor data could be applied lies in social justice and education. At SXSW, Sarah Williams described [slides] how the Air Quality Egg, an open source hardware device, is being used to make an argument for public improvements. At the Cypress Hills Community School, kids are bringing the eggs home, measuring air quality and putting data online, said Williams.

Air Quality Eggs at Cypress Hill Community School.

“Health sensors are useful when they can compare personal real-time data against population-wide data,” said Nadav Aharony, who also spoke on our panel in Austin.

Aharony talked about how Behavio, a startup based upon his research on smartphones and data at MIT, has created funf, an open source sensing toolkit for Android devices. Aharony’s team has now deployed an integration with Dropbox that requires no coding ability to use.

According to Aharony, the One Laptop Per Child project is using funf in tablets deployed in Africa, in areas where there are no schools. Researchers will use funf as a behavioral tool to sense how children are interacting with the devices, including whether tablets are next to one another.

Sensing citizen science

While challenges lie ahead, it’s clear that sensors will be used to create data where there was none before. At SXSW, Williams described a project in Nairobi, Kenya, where cellphones are being used to map informal bus systems.

The Digital Matatus project is publishing the data into the General Transit Feed Specification (GTFS), one of the most promising emerging global standards for transit data. “Hopefully, a year from now [we] will have all the bus routes from Nairobi,” Williams said.
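For a sense of what publishing into GTFS involves, here is a minimal sketch that writes the required columns of a stops.txt file; the two stops are invented placeholders, not Digital Matatus data.

```python
import csv

# stop_id, stop_name, stop_lat and stop_lon are the required columns of a
# GTFS stops.txt file. A full feed adds routes.txt, trips.txt, stop_times.txt
# and more.
stops = [
    {"stop_id": "S001", "stop_name": "Kencom", "stop_lat": -1.2864, "stop_lon": 36.8250},
    {"stop_id": "S002", "stop_name": "Westlands", "stop_lat": -1.2676, "stop_lon": 36.8108},
]

with open("stops.txt", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["stop_id", "stop_name", "stop_lat", "stop_lon"])
    writer.writeheader()
    writer.writerows(stops)
```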

Map of Matatus stops in Nairobi, Kenya

Data journalism has long depended upon official data released by agencies. In recent years, data journalists have begun scraping data. Sensors allow another step in that evolution to take place, where civic media can create data to inform the public interest.

Matt Waite, a professor of practice and head of the Drone Journalism Lab at the University of Nebraska-Lincoln, joined the panel in Austin using a Google Hangout and shared how he and his students are experimenting with sensors to gather data for projects.

Journalists are going to run up against stories where no one has data, he said. “The old way was to give up,” said Waite. “I don’t think that’s the way to do it.”

Sensors give journalists a new, interesting way to enlist a distributed audience in gathering needed data, he explained. “Is it ‘capital N’ news? Probably not,” said Waite. “But it’s something people are really interested in. The easy part is getting a parts list together and writing software. The hard part is the creative process it takes to figure out what we are going to measure and what it means.”

In an interview with the Nieman Journalism Lab on sensor journalism, Waite also raised practical concerns with the quality of data collection that can be gathered with inexpensive hardware. “One legitimate concern about doing this is, you’re talking about doing it with the cheapest software you can find,” Waite told the Nieman Lab’s Caroline O’Donovan. “It’s not expertly calibrated. It’s not as sensitive as it possibly could be.”

Those are questions that will be explored practically in New York in the months ahead, when New York City’s public radio station will be collaborating with the Columbia School of Public Health to collect data about New York’s environmental conditions. They’ll put particulate detectors, carbon dioxide monitors, leg motion sensors, audio monitors, cameras and GPS trackers on bicycles and ride around the city collecting pollution data.

“At WNYC, we already do crowdsourcing, where we ask our audience to do something,” said Keefe. “What if we could get our audience to do something with this? What if you could get an audience to work with you to solve a problem?”

Keefe also announced the Cicada Project, where WNYC is inviting its listeners to build homemade sensors and track the emergence of cicadas this spring across New Jersey, New York and the Northeast region.

This cicada tracker project is a 21st century parallel to the role that birders have played for decades in the annual Christmas Bird Count, creating new horizons for citizen science and public media.

Update: WNYC’s public is responding in interesting ways that go beyond donations. On Twitter, Keefe highlighted the work of a NYC-based hacker, Guan, who was able to make a cicada tracker for $20, 1/4 the cost of WNYC’s kit.

Sensing challenges ahead

Just as civic technologists need to be mindful of “solutionism,” so too will data journalists need to be aware of the “sensorism” that exists in the health care world, as John Wilbanks pointed out this winter.

“Sensorism is rife in the sciences,” Wilbanks wrote. “Pick a data generation task that used to be human centric and odds are someone is trying to automate and parallelize it (often via solutionism, oddly — there’s an app to generate that data). What’s missing is the epistemic transformation that makes the data emerging from sensors actually useful to make a scientific conclusion — or a policy decision supposedly based on a scientific consensus.”

Anyone looking to practice sensor journalism will face interesting challenges, from incorrect conclusions based upon faulty data to increased risks to journalists carrying the sensors, to gaming or misreporting.

“Data accuracy is both a real and a perceived problem,” said Moradi at SXSW. “Third-party verification by journalists or other non-aligned groups may be needed.”

Much as in the cases of “drone journalism” and data journalism, context, usage and ethics have to be considered before you launch a quadcopter, fire up a scraper or embed sensors around your city. The question you come back to is whether you’re facing a new ethical problem or an old ethical problem with new technology, suggested Waite at SXSW. “The truth is, for most ethical issues you can find an analogue.”

It may be, however, that sensor data, applied to taking a “social MRI” or other uses, may present us with novel challenges. For instance, who owns the data? Who can access or use it? Under what conditions?

A GPS device is a form of sensor, after all, and one that’s quite useful to law enforcement. While the Supreme Court ruled that the use of a GPS device for tracking a person without a warrant was unconstitutional, sensor data from cellphones may provide law enforcement with equal or greater insight into a target’s movements. Journalists may well face unexpected questions about protecting sources if their sensor data captures the movements or actions of a person of interest.

“There’s a lot of concern around privacy,” said Moradi. “What data can the government request? Will private companies abuse personal data for marketing or sales? Do citizens have the right to personal data held by companies and government?”

Aharony outlined many of the issues in a 2011 paper on stealing reality, exploring what happens when criminals become data scientists.

“It’s like a slow-moving attack if you attach yourself to someone’s communication,” said Aharony, in a follow-up interview in Austin. “‘iPhonegate‘ didn’t surprise people who know about mobile app data or how the cellular network is architected. Look at what happened to Path. You can make mistakes without meaning to. You have to think about this and encrypt the data.”

This post is part of our series investigating data journalism.

March 19 2013

The City of Chicago wants you to fork its data on GitHub

GitHub has been gaining new prominence as the use of open source software in government grows.

Earlier this month, I included a few thoughts from Chicago’s chief information officer, Brett Goldstein, about the city’s use of GitHub, in a piece exploring GitHub’s role in government.

While Goldstein says that Chicago’s open data portal will remain the primary means through which Chicago releases public sector data, publishing open data on GitHub is an experiment that will be interesting to watch, in terms of whether it affects reuse or collaboration around it.

In a followup email, Goldstein, who also serves as Chicago’s chief data officer, shared more about why the city is on GitHub and what they’re learning. Our discussion follows.

The City of Chicago is on GitHub.

What has your experience on GitHub been like to date?

Brett Goldstein: It has been a positive experience so far. Our local developer community is very excited by the MIT License on these datasets, and we have received positive reactions from outside of Chicago as well.

This is a new experiment for us, so we are learning along with the community. For instance, GitHub was not built to be a data portal, so it was difficult to upload our buildings dataset, which was over 2GB. We are rethinking how to deploy that data more efficiently.

Why use GitHub, as opposed to some other data repository?

Brett Goldstein: GitHub provides the ability to download, fork, make pull requests, and merge changes back to the original data. This is a new experiment, where we can see if it’s possible to crowdsource better data. GitHub provides the necessary functionality. We already had a presence on GitHub, so it was a natural extension to that as a complement to our existing data portal.
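That fork-and-pull-request loop can even be driven through GitHub’s REST API, as in the hedged sketch below; the repository name, branch names, and token are placeholders, not the city’s actual workflow.

```python
import requests

# Sketch of proposing a data fix via the GitHub REST API (v3).
TOKEN = "ghp_yourtokenhere"                       # placeholder personal access token
HEADERS = {"Authorization": f"token {TOKEN}", "Accept": "application/vnd.github+json"}
OWNER, REPO = "Chicago", "example-dataset"        # hypothetical dataset repository

# 1. Fork the dataset repository into your own account.
requests.post(f"https://api.github.com/repos/{OWNER}/{REPO}/forks", headers=HEADERS)

# 2. Clone the fork, correct a record, commit, and push a branch
#    (ordinary git, omitted here).

# 3. Open a pull request proposing the corrected data back to the city.
pr = requests.post(
    f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
    headers=HEADERS,
    json={
        "title": "Fix one mislocated record",
        "head": "your-username:fix-record",       # placeholder branch
        "base": "master",
        "body": "Corrects coordinates for one record; source cited in the diff.",
    },
)
print(pr.status_code, pr.json().get("html_url"))
```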

Why does it make sense for the city to use or publish open source code?

Brett Goldstein: Three reasons. First, it solves issues with incorporating data in open source and proprietary projects. The city’s data is available to be used publicly, and this step removes any remaining licensing barriers. These datasets were targeted because they are incredibly useful in the daily life of residents and visitors to Chicago. They are the most likely to be used in outside projects. We hope this data can be incorporated into existing projects. We also hope that developers will feel more comfortable developing applications or services based on an open source license.

Second, it fits within the city’s ethos and vision for data. These datasets are items that are visible in daily life — transportation and buildings. It is not proprietary data and should be open, editable, and usable by the public.

Third, we engage in projects like this because they ultimately benefit the people of Chicago. Not only do our residents get better apps when we do what we can to support a more creative and vibrant developer community, they also will get a smarter and more nimble government using tools that are created by sharing data.

We open source many of our projects because we feel the methodology and data will benefit other municipalities.

Is anyone pulling it or collaborating with you? Have you used that code? Would you, if it happened?

Brett Goldstein: We collaborated with Ian Dees, who is a significant contributor to OpenStreetMap, to launch this idea. We anticipate that buildings data will be integrated into OpenStreetMap now that it’s available with a compatible license.

We have had 21 forks and a handful of pull requests fixing some issues in our README. We have not had a pull request fixing the actual data.

We do intend to merge requests to fix the data and are working on our internal process to review, reject, and merge requests. This is an exciting experiment for us, really at the forefront of what governments are doing, and we are learning along with the community as well.

Is anyone using the open data that wasn’t before, now that it’s JSON?

Brett Goldstein: We seem to be reaching a new audience with posting data on GitHub, working in tandem with our heavily trafficked data portal. A core goal of this administration is to make data open and available. We have one of the most ambitious open data programs in the country. Our portal has over 400 datasets that are machine readable, downloadable and searchable. Since it’s hosted on Socrata, basic analysis of the data is possible as well.
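Those portal datasets can also be read programmatically through Socrata’s SODA endpoints. A minimal sketch follows; the four-by-four dataset identifier is a placeholder, and real identifiers are listed on data.cityofchicago.org.

```python
import requests

DATASET_ID = "xxxx-xxxx"  # placeholder Socrata dataset identifier
url = f"https://data.cityofchicago.org/resource/{DATASET_ID}.json"

# $limit is a standard SoQL query parameter.
resp = requests.get(url, params={"$limit": 1000})
resp.raise_for_status()
rows = resp.json()

print(f"Fetched {len(rows)} rows")
if rows:
    print("Columns:", sorted(rows[0].keys()))
```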

March 08 2013

GitHub gains new prominence as the use of open source within governments grows

When it comes to government IT in 2013, GitHub may have surpassed Twitter and Facebook as the most interesting social network.

GitHub’s profile has been rising recently, from a Wired article about open source in government, to its high profile use by the White House and within the Consumer Financial Protection Bureau. This March, after the first White House hackathon in February, the administration’s digital team posted its new API standards on GitHub. In addition to the U.S., code from the United Kingdom, Canada, Argentina and Finland is also on the platform.

“We’re reaching a tipping point where we’re seeing more collaboration not only within government agencies, but also between different agencies, and between the government and the public,” said GitHub head of communications Liz Clinkenbeard, when I asked her for comment.

Overall, 2012 was a breakout year for the use of GitHub by government, with more than 350 government code repositories by year’s end.

Total number of government repositories on GitHub.

In January 2012, the British government committed the code for GOV.UK to GitHub.

NASA, after its first commit, added 11 more code repositories over the course of the year.

In September, the new Open Gov Foundation published the code for the MADISON legislative platform. In December, the U.S. Code went on GitHub.

GitHub’s profile was raised further in Washington this week when Ben Balter was announced as the company’s federal liaison. Balter made some open source history last year, when he was part of the federal government’s first agency-to-agency pull request. He also was a big part of giving the White House some much-needed geek cred when he coded the administration’s digital government strategy in HTML5.

Balter will be GitHub’s first government-focused employee. He won’t, however, be saddled with an undecipherable title. In a sly dig at the slow-moving institutions of government, and in keeping with GitHub’s love for octocats, Balter will be the first “Government Bureaucat,” focused on “helping government to do all sorts of governmenty things, well, more awesomely,” wrote GitHub CIO Scott Chacon.

Part of Balter’s job will be to evangelize the use of GitHub’s platform as well as open source in government, in general. The latter will come naturally to him, given how he and the other Presidential Innovation Fellows approached their work.

“Virtually everything the Presidential Innovation Fellows touched was open sourced,” said Balter when I interviewed him earlier this week. “That’s everything from better IT procurement software to internal tools that we used to streamline paperwork. Even more important, much of that development (particularly RFPEZ) happened entirely in the open. We were taking the open source ethos and applying it to how government solutions were developed, regardless whether or not the code was eventually public. That’s a big shift.”

Balter is a proponent of social coding in the open as a means of providing some transparency to interested citizens. “You can go back and see why an agency made a certain decision, especially when tools like these are used to aid formal decision making,” he said. “That can have an empowering effect on the public.”

Forking code in city hall and beyond

There’s notable government activity beyond the Beltway as well.

The City of Chicago is now on GitHub, where chief data officer and city CIO Brett Goldstein is releasing open data as JSON files, along with open source code.

Both Goldstein and Philadelphia chief data officer Mark Headd are also laudably participating in conversations about code and data on Hacker News threads.

“Chicago has released over 400 datasets using our data portal, which is located at data.cityofchicago.org,” Goldstein wrote on Hacker News. While Goldstein says that the city’s portal will remain the primary way the city releases public sector data, publishing data on GitHub is an experiment that will be interesting to watch, in terms of whether it affects reuse.

“We hope [the datasets on GitHub] will be widely used by open source projects, businesses, or non-profits,” wrote Goldstein. “GitHub also allows ongoing collaboration on editing and improving data, unlike the typical portal technology. Because it’s an open source license, data can be hosted on other services, and we’d also like to see applications that could facilitate easier editing of geographic data by non-technical users.”
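To make the idea of reusing those GitHub-hosted datasets concrete, here is a small sketch in Python. The City of Chicago’s GitHub organization is real, but the repository and file names below are hypothetical placeholders, so treat the URL as an assumption to adjust for an actual dataset.

```python
# Sketch: loading a JSON dataset published in a city GitHub repository.
# The organization (github.com/Chicago) is real; the repository and file
# path below are hypothetical placeholders.
import json
import urllib.request

RAW_URL = (
    "https://raw.githubusercontent.com/Chicago/"
    "example-dataset/master/data.json"  # hypothetical repo and path
)

with urllib.request.urlopen(RAW_URL) as resp:
    records = json.load(resp)

print(f"Loaded {len(records)} records")
```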

Headd is also on GitHub in a professional capacity, where he and his colleagues have been publishing code to a City of Philadelphia repository.

“We use [GitHub] to share some of our official city apps,” commented Headd on the same Hacker News thread. “These are usually simple web apps built with tools like Bootstrap and jQuery. We’ll be open sourcing more of these going forward. Not only are we interested in sharing the code for these apps, we’re actively encouraging people to fork, improve and send pull requests.”

While there’s still a long road ahead for widespread code sharing between the public and government, the economic circumstances of cities and agencies could create the conditions for more code sharing inside government. In a TED Talk last year, Clay Shirky suggested that adopting open source methods for collaboration could even transform government.

A more modest (although still audacious) goal would be to simply change how government IT is done.

“I’ve often said, the hardest part of being a software developer is training yourself to Google the problem first and see if someone else has already solved it,” said Balter during our interview. “I think we’re going to see government begin to learn that lesson, especially as budgets begin to tighten. It’s a relative ‘app store’ of technology solutions just waiting to be used or improved upon. That’s the first step: rather than going out to a contractor and reinventing the wheel each time, it’s training ourselves that we’re part of a larger ecosystem and to look for prior art. On the flip side, it’s about contributing back to that commons once the problem has been solved. It’s about realizing you’re part of a community. We’re quickly approaching a tipping point where it’s going to be easier for government to work together than alone. All this means that a taxpayer’s dollar can go further, do more with less, and ultimately deliver better citizen services.”

Some people may understandably bridle at including open source code and open data under the broader umbrella of “open government,” particularly if such efforts are not balanced by adherence to good government principles around transparency and accountability.

That said, there’s reason to hail collaboration around software and data as bona fide examples of 21st century civic participation, where better platforms for social coding enable improved outcomes. The commits and pulls of staff and residents on GitHub may feel like small steps, but they represent measurable progress toward more government not just of the people, but with the people.

“Open source in government is nothing new,” said Balter. “What’s new is that we’re finally approaching a tipping point at which, for federal employees, it’s going to be easier to work together, than work apart. Whereas before, ‘open source’ often meant compiling, zipping, and uploading, when you fuse the internal development tools with the external publishing tools, and you make those tools incredibly easy to use, participating in the open source community becomes trivial. Often, it can be more painful for an agency to avoid it completely. I think we’re about to see a big uptick in the amount of open source participation, and not just in the traditional sense. Open source can be between business units within an agency. Often the left hand doesn’t know what the right is doing between agencies. The problems agencies face are not unique. Often the taxpayer is paying to solve the same problem multiple times. Ultimately, in a collaborative commons with the public, we’re working together to make our government better.”

February 25 2013

Governments looking for economic ROI must focus on open data with business value

There’s increasing interest in the open data economy from the research wings of consulting firms. Capgemini Consulting just published a new report on the open data economy. McKinsey’s Global Institute is following up its research on big data with an inquiry into open data and government innovation. Deloitte has been taking a long look at open data business models. Forrester says open data isn’t (just) for governments anymore, with more research on the way. If Bain & Company doesn’t update its work on “data as an asset” this year to meet inbound interest in open data from the public sector, it may well find itself in the unusual position of lagging the market for intellectual expertise.

As Radar readers know, I’ve been trying to “make dollars and sense” of the open data economy since December, looking at investments, business models and entrepreneurs.

In January, I interviewed Harvey Lewis, the research director for the analytics department of Deloitte U.K. Lewis, who holds a doctorate in hypersonic aerodynamics, has been working for nearly 20 years on projects in the public sector, defense industry and national security. Today, he’s responsible for applying an analytical eye to consumer businesses, manufacturing, banking, insurance and the public sector. Over the past year, his team has been examining the impact of open data releases on the economy of the United Kingdom. The British government’s embrace of open data makes such research timely.

Given the many constituencies interested in open data these days, from advocates for transparency and good government to organizations interested in co-creating civic services to entrepreneurs focused on building and scaling sustainable startups, one insight stood out from our discussion in particular:

“The things you do to enable transparency … aren’t necessarily the same things you do to enable economic growth and economic impact,” said Lewis.

“For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.”

The rest of our interview, lightly edited for content and clarity, follows.

Why is Deloitte interested in open data?

Harvey Lewis: In late 2011, we realized that open data was probably going to be one of those areas that was likely to be transformational, maybe not in the short term, but certainly in the long term. A lot of the technology that companies are using to do analysis of data will become increasingly commoditized, so the advantage that people were going to get was going to come through their interpretations of data and by looking for other commercial mechanisms for getting value from data.

The great thing about open data is that it provides those opportunities. It provides, in some ways, a level playing field and ways of creating revenue and opportunities that just don’t exist in other spaces.

You’ve been investigating the demand for open data from businesses. How have you approached the research?

Harvey Lewis: We’ve been working with professor Nigel Shadbolt in the U.K., who is one of the great champions on the global stage for open data. He and I started work on our open data activity back about 12 months ago.

Our interest was not so much in open government data but more the spectrum of open data, from government, business and individual citizens. We thought we would run an exercise over the spring of 2012, inviting various organizations to come and debate open data. We were very keen to get a cross-section of people from public and private sectors in those discussions because we wanted to understand what businesses thought of open data. We published a report [PDF] in June of last year, which was largely qualitative, looking at what we thought was happening in the world of open data, from a business perspective.

There were four main hypotheses to that vision:

The first part was that we thought every business should have a strategy to explore open data. If you look at the quantity of data that’s now available globally, even just from government, it’s an extraordinary amount, if you measure it just by the number of datasets that are published. In the U.K., it’s in the tens of thousands. In the U.S., it’s in the hundreds of thousands. There’s a vast resource of data that’s freely available that can be used to supplement existing sources of information, proprietary or otherwise, and enrich companies’ views of the world.

The second part was that businesses themselves would start to open up their data. There are different ways of gaining revenue and value from data if they opened it up. This was quite a controversial subject, as I’m sure you might imagine, in some of the discussions. Nevertheless, we’re starting already to see companies releasing subsets of their data on competition websites, inviting the crowd to come up with innovative solutions. We’re also seeing evidence that companies are releasing their data to improve the way they interact with their customers. I think one of the great broad impacts of businesses opening up their data is reputational enhancement — and that can have a real economic benefit.

The third part of our hypothesis was that open data would inspire customer engagement. That is, I think, a great topic for exploration within the public sector itself. Releasing this data isn’t just about “publishing it and they will come” — it’s about releasing data and using that data to engage in a different type of conversation with citizens and consumers.

Certainly in the U.K., we’re starting to see the fruits of that and some new initiatives. There’s a concept called “midata” in the U.K., where the government is encouraging service providers to release consumer data back to individuals so they can shop around for the best deals in the market. I think that’s a great vision for open data.

The fourth part was the privacy and the ethical responsibilities that come with the processing of open data, with companies and government starting to work more closely together to come up with a new paradigm for responsibility and privacy.

Nigel Shadbolt and I committed to doing further work on the economic business case for open data to try to address some of these hypothetical views of the future.

That launched this second phase of our work, which was trying to quantify that economic benefit. We decided very early on, because of Nigel Shadbolt’s relationship to the Open Data Institute, to work closely with that organization, as it was born in the summer of 2012.

We spent a lot of time gathering data. Particularly, we were looking at whether or not we could infer from the demand for open data from a variety of government portals what the economic benefit would be. We looked to a number of other measures and data sources, including a very broad balance sheet analysis to try to infer how companies were increasingly using data to run their businesses and benefit their businesses.

What did you find in this inquiry?

Harvey Lewis: We published a second report, called “Open Growth,” in early December of last year. The fundamental problem in trying to estimate the economic benefit is around, essentially, a lack of data. It sounds quite ironic, doesn’t it, that there’s a lack of data to quantify the effect of open data?

In particular, it’s still early days for determining economic benefit. When you’re trying to uncover second-order effects in the economy due to open data, it’s very early days to be able to see those effects percolate through different sectors. We were really challenged. Nevertheless, we were able to look quite closely at the sorts of data that the U.K. government had been publishing and draw some conclusions about what that meant for the economy.

For example, we were able to categorize nearly 40,000 datasets that are publicly available from the U.K. government and other public bodies in the U.K. into a number of discrete categories. Thirty-three percent of the data that was being published by the government was related to government expenditure. A large slice of the data that was being supplied had to do with the economy, demographics and health.

Does more transparency lead to positive economic outcomes?

Harvey Lewis: In the U.K., and certainly to some extent in the U.S., there are multiple objectives at work in open data.

One of the primary objectives is transparency, publishing data that allows citizens to really kick the tires on public services, hopefully leading them to be improved, to increase quality and choice for individual citizens.

The things you do to enable transparency, however, aren’t necessarily the same things you do to enable economic growth and economic impact. For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.

Put some guarantees around those data sources to preserve their formats, longevity and utility, so that businesses have the confidence to use them and start building companies on the backs of them. Investors have got to have confidence that data will be available in the long term.

Those are the steps you take for economic growth. They’re quite different from the steps you might take for transparency, which is about making sure that all data that has a potential bearing on public services and cities and interpretation of government policy is made available.

You defined five business model archetypes in your report: “suppliers, aggregators, developers, enrichers and enablers.” Which examples have been sustainable?

Harvey Lewis: In coming up with that list, we did an analysis of as many companies as we could find. We tried to appraise business models from publicly available information to get a better understanding of what they were doing with the data and how they were generating revenue from it.

We had a long list of about 15 or 16 discrete business models that we were then able to cluster into these five archetypes.

Suppliers are publishing open data, including, of course, public sector bodies. Some businesses are publishing their data. While there may be no direct financial return if they publish data as open data and make it freely available, there are nevertheless other benefits that are going to become very meaningful in the future.

It’s something that a lot of businesses won’t be able to ignore, particularly when it comes to sustainability and financial data. Consumers are putting a lot of businesses under a great deal of scrutiny now to make sure that businesses are operating with integrity and can be trusted. A lot of this is about public good or customer good, and that can be quite intangible.

The second area, aggregators, is perhaps the largest. Organizations are pooling publicly available data, combining it and producing insights from it that are useful. They’re starting to sell those insights to businesses. One example in the report takes open data from the public body that all companies that are operating in the U.K. have to register with. They combine that data with other sources from the web, social media and elsewhere to produce intelligence that other businesses can use. They’re growing at quite a phenomenal rate.

We’re seeing a decline of organizations that are purely aggregating public sources of information. I don’t think there’s a sustainable business model there. Companies in particular areas, like business intelligence and energy and utilities, are taking public data and deriving insights from it. It’s the insights that have monetary value, not the data itself.

The third are the classic app developers. This is of greatest interest where the data that is provided by the public sector is granular, real-time, updated frequently and close to the hearts of ordinary citizens. Transport data, crime data, and health data are probably the three types of data where software developed on the back of that data is going to have the greatest impact.

In the U.K., we’re seeing a lot of transport applications that enable people to plan journeys across what is, in some cases, quite a fragmented transport infrastructure — and get real benefits as a result. I think it’s only a matter of time before we start to see health data being turned into applications in exactly the same way, allowing individuals to make more informed choices, understand their own health and how to improve it and so on.

The fourth area, enrichers, is a very interesting one. We think this is the “dark matter” of the open data economy. These are larger, typically established businesses that are hoovering up significant quantities of open data and combining it with their own proprietary sources to offer services to customers. These sorts of services have traditionally existed and aren’t going to go away if the open data supplies dry up. They are hugely powerful. I’m thinking of insurers and retailers who have a lot of their own data about customers and are seeking better models of risk and understanding of customers. I think it’s difficult to measure economic benefit coming from this particular archetype.

The last area is enablers. These are organizations that don’t make money from open data directly but provide platforms and technologies that other businesses and individuals use. Competition websites are a very good example, where they provide a facility that allows businesses, public sector institutions, or research institutions to make subsets of their data available to seek solutions from the crowd.

Those are the five principal archetypes. The one that stands out, underpinning the open data market at the moment, is the “enricher” model. I think the hope is that the startups and small-to-medium enterprises in the aggregation and the developer areas are going to be the new engine for growth in open data.

Do you see adjustments being made based upon demand? Or are U.K. data releases conditioned upon what the government finds easy or politically low-risk?

Harvey Lewis: This comes back to my point about multiple objectives. The government in the U.K. is addressing a set of objectives through its open data initiative, one of which is economic growth. I’m sure it’s the same as in other countries around the world.

If the question is whether the government is releasing the right data to meet a transparency objective, then the answer is “yes.” Is it releasing the right data from an economic growth perspective? The answer is “almost.” It’s certainly doing a better job of that all the time.

This is where the Open Data Institute really comes to the fore, because their remit, as far as the government is concerned, is to stimulate demand. They’re able to go back to the government and say, “Look, the real opportunity here is in the wholesale and retail sector. Or in the real estate sector — there are large swaths of government data that are valuable and relevant to this sector that are underutilized.” That’s an opportunity for the government to engage with businesses in those sectors, to encourage the use of open data and to demonstrate the benefits and outcomes that they can achieve.

It’s a very good question, but it depends on which objective you’re thinking about as to whether or not the answer is the right one. I think if you look toward the Danish government, for example, and the way that they’re approaching open data, there’s been a priority on economic growth. The sorts of datasets they’re releasing are going to stimulate growth in the Danish market, but they may not satisfy fully the requirements that one might expect from a transparency perspective or social growth perspective.

Does data format or method of release matter for outcomes, to the extent that you could measure it?

Harvey Lewis: From our analysis, data released through APIs and, in particular, transport data was in significant demand. There were noticeably more applications being built on the back of transport data published through an API than in almost any other area.

As a mechanism for making it easy for businesses to get hold of data, APIs are pretty crucial. Being able to provide data using that mechanism is a very good way of stimulating use.

Based on some of the other work that we’ve been doing, there’s a big push to release data in its raw form. CSV is talked about quite a lot. In some cases, that works well. In other cases, it is a barrier to entry for small-to-medium enterprises.

To go back to the general practitioner prescribing data, each month’s data is published as a single CSV file. The file is about half a gigabyte and typically contains over four million records. If you’re a small-to-medium enterprise with limited resources — or even if you’re a journalist — you cannot open that data file in typical desktop or laptop software. There are just too many records. Even if you can find software that will open it, running queries on it takes a very long time.
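The size problem Lewis describes is partly a tooling problem: a half-gigabyte CSV defeats a spreadsheet, but it can still be processed a row at a time. Here is a minimal sketch in Python of that streaming approach; the file name and column name are hypothetical stand-ins for the real prescribing extract.

```python
# Minimal sketch: aggregating a multi-million-row CSV without loading it all
# into memory. The file name and the "practice_code" column are hypothetical
# stand-ins for the real prescribing extract's layout.
import csv
from collections import Counter

row_counts = Counter()
with open("prescribing_extract.csv", newline="") as f:  # hypothetical file
    reader = csv.DictReader(f)
    for row in reader:  # streams one row at a time
        row_counts[row.get("practice_code", "unknown")] += 1

print("Total rows:", sum(row_counts.values()))
print("Top 5 practices by row count:", row_counts.most_common(5))
```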

There’s a natural barrier to entry for some formats that you really only appreciate once you try to process and get to grips with the data. That, I think, is something that needs to be thought through.

There’s an imperative to get data out there, but if you provide that data in a format that small-to-medium enterprises can’t use, I think it’s unfair. Larger businesses have the tools and the specialist capability to look at these files. That creates a problem, an economic barrier. It also creates a transparency barrier because although you may be publishing the data, no one can access it. Then you don’t get the benefits of increased transparency and accountability.

Where you’ve got potentially high-value datasets in health, crime, spending data and energy and environment data, a lot of care needs to be put into what formats are going to make that most easily accessible.

It isn’t always obvious. It isn’t the CSV file. It certainly isn’t the PDF! It isn’t anything, actually, that requires specialist knowledge and tools.

What are the next steps for your research inquiry?

Harvey Lewis: We’re continuing our work, trying to formulate ideas and methods. That includes using case studies and use cases, getting information from the public sector about how much it costs to generate the data, and looking at accounts of actual scenarios.

Understanding the economic impact, despite its challenges, is really important to policymakers around open data, to ensure that the benefits of releasing open data outweigh the costs of producing it. That’s absolutely essential to the business case of open data.

The other part of our activity is focusing on the insights that can be derived from open data that benefit the public sector or private sector companies. We’re looking quite hard at the growth opportunities in open data and the areas where significant cost savings or efficiencies can be gained.

We’re also looking at some interesting potential policy areas by mashing up different sources of data. For example, can you go some way to understanding the relationship between crime and mental health? With the release of detailed crime data and detailed prescribing data, there’s an opportunity, at a very granular level, to understand potential correlations and then do some research into the underlying causes. The focus of our research is subtly shifting toward more use-case type analysis, rather than looking at an abstract, generic picture about open data.

Bottom line: does releasing open data lead to significant economic benefit?

Harvey Lewis: My instinct and the data we have today suggest that it is going to lead to significant economic benefit. Precisely how big that benefit is needs further study.

I think it’s likely to be more in the realm of the broader impacts and some of the intangibles where we see the greatest impact, not necessarily through new businesses starting up and more businesses using open data. We will see those things.


This post is part of our ongoing investigation into the open data economy.

February 22 2013

White House moves to increase public access to scientific research online

Today, the White House responded to a We The People e-petition that asked for free online access to taxpayer-funded research.

As part of the response, John Holdren, the director of the White House Office of Science and Technology Policy, released a memorandum today directing agencies with “more than $100 million in research and development expenditures to develop plans to make the results of federally-funded research publicly available free of charge within 12 months after original publication.”

The Obama administration has been considering access to federally funded scientific research for years, including a report to Congress in March 2012. The relevant e-petition, which had gathered more than 65,000 signatures, had gone unanswered since May of last year.

As Hayley Tsukayama notes in the Washington Post, the White House acknowledged the open access policies of the National Institutes of Health as a successful model for sharing research.

“This is a big win for researchers, taxpayers, and everyone who depends on research for new medicines, useful technologies, or effective public policies,” said Peter Suber, Director of the Public Knowledge Open Access Project, in a release. “Assuring public access to non-classified publicly-funded research is a long-standing interest of Public Knowledge, and we thank the Obama Administration for taking this significant step.”

Every federal agency covered by this memorandum will eventually need to “ensure that the public can read, download, and analyze in digital form final peer-reviewed manuscripts or final published documents within a timeframe that is appropriate for each type of research conducted or sponsored by the agency.”
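As a sense of what “read, download, and analyze in digital form” can look like in practice, here is a hedged sketch in Python that harvests article metadata over OAI-PMH, the standard protocol many open-access repositories support. The PubMed Central endpoint below is the commonly cited one, but treat the exact URL as an assumption to verify against NCBI’s documentation.

```python
# Sketch: harvesting open-access article metadata over OAI-PMH.
# The endpoint URL is assumed (PubMed Central's commonly cited OAI service);
# the protocol parameters (verb, metadataPrefix) are standard OAI-PMH.
import xml.etree.ElementTree as ET
import requests

OAI_ENDPOINT = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"  # assumed URL
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

resp = requests.get(
    OAI_ENDPOINT,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
resp.raise_for_status()
root = ET.fromstring(resp.content)

# Print the Dublin Core title of each record in the first page of results.
for record in root.findall(".//oai:record", NS):
    title = record.find(".//dc:title", NS)
    if title is not None:
        print(title.text)
```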

An open government success story?

From the day they were announced, one of the biggest question marks about We The People e-petitions has always been whether the administration would make policy changes or take public stances it had not taken before on a given issue.

While the memorandum and the potential outcomes from its release come with caveats, from the $100 million threshold to exceptions for national security or economic competition, this answer from the director of the White House Office of Science and Technology Policy, accompanied by a memorandum directing agencies to make a plan for public access to research, is a substantive outcome.

While there are many reasons to be critical of some open government initiatives, it certainly appears that today, We The People were heard in the halls of government.

An earlier version of this post appears on the Radar Tumblr, including tweets regarding the policy change. Photo Credit: ajc1 on Flickr.


February 21 2013

VA looks to apply innovation to better care and service for veterans

There are few areas as emblematic of a nation’s values as how it treats the veterans of its wars. As improved battlefield care keeps more soldiers alive from injuries that would have been lethal in past wars, more grievously injured veterans survive to come home to the United States.

Upon return, however, the newest veterans face many of the challenges that previous generations have encountered, ranging from re-entering the civilian workforce to rehabilitating broken bodies and treating traumatic brain injuries. As they come home, they are encumbered by more than scars and memories. Their war records are missing. When they apply for benefits, they’re added to a growing backlog of claims at the Department of Veterans Affairs (VA). And even as the raw number of claims grows to nearly 900,000, the average time to process them is also rising. According to Aaron Glantz of the Center for Investigative Reporting, veterans now wait an average of 272 days for their claims to be processed, with some dying in the interim.

While new teams and technologies are being deployed to help with the backlog, a recent report (PDF) from the Office of the Inspector General of the Veterans Administration found that new software deployed around the country that was designed to help reduce the backlog was actually adding to it. While high error rates, disorganization and mishandled claims may be traced to issues with training and implementation of the new systems, the transition from paper-based records to a digital system is proving to be difficult and deeply painful to veterans and families applying for benefits. As Andrew McAfee bluntly put it more than two years ago, these kinds of bureaucratic issues aren’t just a problem to be fixed: “they’re a moral stain on the country.”

Given that context, the launch of a new VA innovation center today takes on a different meaning. The scale and gravity of the problems that the VA faces demand true innovation: new ideas, technology or methodologies that challenge and improve upon existing processes and systems, improving the lives of people or the function of the society that they live within.

“When we set out in 2010 to knowingly adopt the ‘I word’, we did so with the full knowledge that there had to be something there,” said Jonah J. Czerwinski, senior advisor to VA Secretary Eric Shinseki and director of the VA Innovation Initiative, in a recent interview. “We chose to define value around four measurable attributes that mean something to taxpayers, veterans, Congressional delegations and staff: access, quality, cost control and customer satisfaction. The hard part was making it real. We focused for the first year on creating a foundation for what we knew had to justify its own existence, including identifying problem areas.”

The new VA Center for Innovation (VACI) is the descendent of the VA’s Innovation Initiative (VAi2), which was launched in 2010. Along with the VACI, the VA announced that it would adopt an innovation fellows program, following the successful example set by the White House, Department of Health and Human Services and the Consumer Financial Protection Bureau, and bring in an “entrepreneur-in-residence.” The new VACI will back 13 new projects from an industry competition, including improvements to prosthetics, automated sterilization, the Blue Button and cochlear implants. The VA also released a report on the VACI’s progress to date.

“We’re delving into new ways of providing audiology at great distances,” said Czerwinski, “delivering video into the home cheaply, with on-demand care, and the first wearable automatic kidney. Skeptics can judge any innovation endeavor by different measures. The question is whether at the end of the cycle if it’s still relevant.”

The rest of my interview with Czerwinski follows, slightly edited for clarity and content.

Why launch an “innovation center?”

Jonah J. Czerwinski: When we started VAi2, our intent was to delve into the projects the secretary charged us with achieving. The secretary has big goals: eliminate homelessness, eliminate backlog, increase access to care.

It’s not enough for an organization to create a VC fund. It’s the way in which we structure ourselves and find compelling new ways of solving problems. We had more ways to do that. The reason why we have a center for innovation is not because we need to start innovating — we have been innovating for decades, at local levels. We’ve been disaggregated in different ways. We may accomplish objectives, but the organization as a whole may not benefit.

We have a cultural mission with the center that’s a little more subtle. It’s not just about funding different areas. It’s about changing from a culture where people are incented to manage problems in perpetuity to one in which people are incented to solve problems. It’s not enough to reduce backlog by a percentage point or the number of re-admissions with an infection. How do you reward someone for eliminating something wholesale?

We want our workforce to be part of that objective, to be part of coming up with those ideas. The innovation competition we started in 2009 led to 75 ideas for solving problems. We have projects in almost every state now.

How will innovation help with the claims backlog?

Jonah J. Czerwinski: It’s complicated. Tech, laws, people factors, process factors, preferences by parts of interest groups all combine to make this hard. We hear different answers, depending upon the state. The variation is frustrating because it seems unfair. There are process improvements that you can’t solve from a central office. It can’t be solved simply by creating a new claims process. We can’t hire people to do this for us. It is inherently a governmental duty.

We’ve started to wrestle with automation, end-to-end. We have a Fast Track Initiative, where we’re asking how you would take a process that starts with a veteran and ends with a decision. The insurance industry does this. We’ve hired a company to create the first end-to-end claims process as a prototype. It works well enough that it created a new definition of what’s in the realm of the possible. It’s created permission to start revisiting the rules. There’s going to be a better way to automate the claims process.

What’s changed for veterans because of the “Blue Button?”

Jonah J. Czerwinski: There’s a use case where veterans receive care from both the VA and private sector hospitals. That happens about half the time. A non-VA hospital doesn’t have VistA, our EHR [electronic health record]. If a patient goes there for care, like for an ER visit during a weekend because of congestive heart failure, doctors don’t have the information that we know about the patient at the VA. We can provide it for them without interoperability issues. That’s one direction. It’s also a way to create transparency in quality of care, if the hospital has visibility into your healthcare status.

In terms of continuity of care, when that veteran comes back to a VA hospital, the techs don’t have visibility into what happened at the other hospital. A veteran can download clinical information and bring that back. We now have a level of care between the public and private sector you never had before.

February 13 2013

Personal data ownership drives market transparency and empowers consumers

On Monday morning, the Obama administration launched a new community focused on consumer data at Data.gov. While there was no new data to be found among the 507 datasets listed there, it was the first time that smart disclosure had an official home in the federal government.

Image via Data.gov.

“Smart disclosure means transparent, plain language, comprehensive synthesis and analysis of data that helps consumers make better-informed decisions,” said Christopher Meyer, the vice president for external affairs and information services at Consumers Union, the nonprofit that publishes “Consumer Reports,” in an interview. “The Obama administration deserves credit for championing agency disclosure of data sets and pulling it together into one web site. The best outcome will be widespread consumer use of the tools — and that remains to be seen.”

You can find the new community at Consumer.Data.gov or data.gov/consumer. Both URLs forward visitors to the same landing page, where they can explore the data, past challenges, external resources on the topic, in addition to a page about smart disclosure, blog posts, forums and feedback.

“Analyzing data and giving plain language understanding of that data to consumers is a critical part of what Consumer Reports does,” said Meyer. “Having hundreds of data sets available on one (hopefully) easy-to-use platform will enable us to provide even more useful information to consumers at a time when family budgets are tight and health care and financial ‘choices’ have never been more plentiful.”

The newest community brings the total number of communities on Data.gov to 16. A survey of the existing communities didn’t turn up much recent activity in the forums or blogs, although the health care community at HealthData.gov has more signs of life than others and there are ongoing challenges at Challenge.gov associated with many different topics.

Another side of open?

Smart disclosure is one of the 17 initiatives that the U.S. committed to as part of the National Action Plan for the Open Government Partnership.

“We’ve developed new tools — called ‘smart disclosures’ — so that the data we make public can help people make health care choices, help small businesses innovate, and help scientists achieve new breakthroughs,” said President Obama, speaking at the launch of the Open Government Partnership in New York City in September 2011. “We’ve been promoting greater disclosure of government information, empowering citizens with new ways to participate in their democracy. We are releasing more data in usable forms on health and safety and the environment, because information is power, and helping people make informed decisions and entrepreneurs turn data into new products, they create new jobs.”

In the months since, the Obama administration has been promoting the use of smart disclosure across federal government through a task force (PDF), working to embed the practice as part of the ways that agencies deliver on consumer policy. The United Kingdom’s “Midata” initiative is an important smart disclosure case study outside of the United States.

In 2012, the U.S. Treasury Department launched a finance data community, joining open data initiatives in health care, energy, education, development and safety.

“I think you have to say that what has been accomplished so far is mostly [that] the release of government data has spawned a new generation of apps,” said Richard Thaler, professor of behavioral science and economics at the University of Chicago, in an interview. “This has been a win-win for business and consumers. New businesses are created to utilize the now available government data, and consumers now know when the next bus will arrive. The next step will be to get the private sector data into the picture — but that is only the bright future at this stage, rather than something that has already been accomplished. It is great that the government has led the way in releasing data, since it will give them more credibility when they ask private companies to do the same.”

Open data as catalyst?

While their business or organizational goals for data usage may diverge, consumer advocates, entrepreneurs and media are all looking for more insight into what’s actually happening in marketplaces for goods and services.

“Data releases are critical,” said Meyer. “First, even raw, less consumer-friendly data can help change government and industry behavior when it is published. Second, sunlight truly is the best disinfectant. We believe government and industry want to do right by consumers. Scrutiny of data makes the next iteration better, whether it’s produced by the government or a hospital.”

What will make these kinds of disclosures “smart?” When they involve timely, regular release of personal data in standardized, machine readable formats. When data is more liquid, it can easily be ingested by entrepreneurs and developers to be used in tools and services to help people to make more informed decisions as they navigate marketplaces for finance, health care, energy, education or other areas.

“We use government datasets a great deal in the health care space,” said Meyer. “We use CMS ‘Hospital Compare’ data to publish ratings on patient experience and re-admissions. To develop ratings of preventive services for heart disease, we rely on the U.S. Preventive Services Task Force.”

The stories of Brightscope and Panjiva are instructive: both startups had to invest significant time, money and engineering talent in acquiring and cleaning up government data before they could put it to work adding transparency to supply chains or financial advisers.

“It’s cliche, but true – knowledge is power,” said Yaron Samid, the CEO of BillGuard, in an interview. “In BillGuard’s case, when we inform consumers about a charge on their credit bill that was disputed by thousands of other consumers or a known grey charge merchant before they shop, it empowers them to make active choices in protecting their money – and spending it, penny for penny, how they choose and explicitly authorize. The release and cross-sector collaboration of billing dispute data will empower consumers and help put an end to deceptive sales and billing practices, the same way crowdsourced “mark as spam” data did for the anti-spam industry.”

What tools exist for smart disclosure today?

If you look through the tools and services at the new alpha.data.gov, quite a few of the examples are tools that use smart disclosure. When they solve knotty problems, such consumer-facing products or services have the potential to scale quickly.

As Meyer pointed out in our interview, however, which ones catch on is still an open question.

“We are still in the nascent stage of identifying many smart disclosure outcomes that have benefited consumers in a practical way,” he said. “Where we can see demonstrable progress is the government’s acknowledgement that freeing the data is the first and most necessary step to giving private sector innovators opportunity to move the marketplace in a pro-consumer direction.”

The difference between open data on a government website and data put to work where consumers are making decisions, however, is significant.

“‘Freeing the data’ is just the first step,” said Meyer. “It has to be organized in a consumer-friendly format. That means a much more intense effort by the government to understand what consumers want and how they can best absorb the data. Consumer Reports and its policy and action arm, Consumers Union, have spent an enormous amount of time trying to get federal and state governments and private health providers to release information about hospital-acquired infections in order to prevent medical harms that kill 100,000 people a year. We’re making progress with government agencies, although we have a long way to go.”

There has already been some movement in sectors where consumers are used to downloading data, like banking. For instance, BillShrink and Hello Wallet use government and private sector data to help people to make better consumer finance decisions. OPower combines energy efficiency data from appliances and government data on energy usage and weather to produce personalized advice on how to save money on energy bills. BillGuard analyzes millions of billing disputes to find “grey charge” patterns on credit cards and debit cards. (Disclosure: Tim O’Reilly is on BillGuard’s Advisory Board and is a shareholder in the startup.)

“To get an idea of the potential here, think about what has happened to the travel agent business,” said Thaler. “That industry has essentially been replaced by websites serving as choice engines. While this has been a loss to those who used to be travel agents, I think most consumers feel they are better served by being able to search the various travel and lodging options via the Internet. When it comes to choosing a calling plan or a credit card, it is very difficult to get the necessary data, either on prices or on one’s own utilization, to make a good choice. The same is true for mortgages. If we can make the underlying data available, we can help consumers make much better choices in these and other domains, and at the same time make these industries more competitive and transparent. There are similar opportunities in education, especially in the post-high school, for-profit sector.”
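Thaler’s “choice engine” idea is easy to sketch once the underlying data exists. The toy example below, in Python, assumes the two things smart disclosure would have to provide: a machine-readable record of a consumer’s own usage and a machine-readable list of plan tariffs. All names and numbers are hypothetical.

```python
# Toy sketch of a "choice engine": given a consumer's own usage data and a
# machine-readable set of plan tariffs, rank the plans by estimated cost.
# All plans, prices and usage figures are hypothetical.
from dataclasses import dataclass

@dataclass
class Plan:
    name: str
    monthly_fee: float
    included_minutes: int
    per_extra_minute: float

    def estimated_cost(self, minutes_used: int) -> float:
        extra = max(0, minutes_used - self.included_minutes)
        return self.monthly_fee + extra * self.per_extra_minute

# Smart disclosure would supply both sides of this comparison as data.
my_monthly_minutes = 620  # from the consumer's own usage history
plans = [
    Plan("Basic", 20.0, 300, 0.10),
    Plan("Plus", 35.0, 700, 0.08),
    Plan("Unlimited", 55.0, 10_000, 0.0),
]

ranked = sorted(plans, key=lambda p: p.estimated_cost(my_monthly_minutes))
for plan in ranked:
    print(f"{plan.name}: ${plan.estimated_cost(my_monthly_minutes):.2f}/month")
```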

Recent data releases have the potential to create new insights into previously opaque markets.

“There are also citizen complaint registries that have been created either by statute (Consumer Product Safety Improvement Act of 2008) or by federal agencies, like the Consumer Financial Protection Bureau (CFPB). [These registries] will create rich datasets that industry can use to improve their products and consumer advocates can analyze to point out where the marketplace hasn’t worked,” said Meyer.

In 2012, the CFPB, in fact, began publishing a new database online. As was the case with the Consumer Product Safety Commission in 2011, the consumer complaint database did not go online without industry opposition, as Suzy Khimm reported in her feature story on the CFPB. That said, the CFPB has been making consumer complaints available to the public online since last June.

That data is now being consumed by BillGuard, enabling more consumers to derive benefit that might not have been available otherwise.

“The CFPB has made their consumer complaint database open to the public,” said Samid. “Billing disputes are the No. 1 complaint category for credit cards. We also source consumer complaint data from the web and anonymized billing disputes directly from banks. We are working with other government agencies to share our findings about grey charges, but cannot disclose those relationships just yet.”
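For readers who want to poke at the complaint data themselves, the CFPB database can be downloaded in bulk as CSV. Here is a minimal sketch in Python of counting complaints by product; the file name and column label are assumptions about the export’s layout, so check them against the actual download.

```python
# Minimal sketch: counting CFPB consumer complaints by product from a bulk
# CSV export. The file name and the "Product" column label are assumptions
# about the export's layout.
import csv
from collections import Counter

complaints_by_product = Counter()
with open("cfpb_complaints.csv", newline="") as f:  # hypothetical file name
    reader = csv.DictReader(f)
    for row in reader:
        complaints_by_product[row.get("Product", "unknown")] += 1

for product, count in complaints_by_product.most_common(10):
    print(f"{product}: {count}")
```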

“Choice engines” for an open data economy

Many of this emerging class of services use multiple datasets to provide consumers with insight into their choices. For instance, reviews and experiences of prior customers can be mashed up with regulatory data from government agencies, including complaints. Data from patient reviews could power health care startups. The integration of food inspection data into Yelp will give consumers more insights into dining decisions. Trulia and Zillow suggest another direction for government data use, as seen in real estate.

If these early examples are any guide, there’s an interesting role for consumer policy makers and regulators to play: open data stewards and suppliers. Given that the release of such data has an effect on the market for products and services, expect more companies in affected industries to resist such initiatives, much in the same way that the CPSC and CFPB databases were opposed by industry. Such resistance may be subtle, where government data collection is portrayed as part of a regulator’s mission but its release into the marketplace is undermined.

Nonetheless, smart disclosure taps into larger trends, in particular “personal data ownership” and consumer empowerment. The growth of an energy usage management sector and participatory health care show how personal data can be used, once acquired. The use of behavioral science in combination with such data is of great interest to business and should attract the attention of policy makers, legislators and regulators.

After all, convening and pursuing smart disclosure initiatives puts government in an interesting role. If government agencies or private companies then choose to apply behavioral economics in programs or policies, with an eye on improving health or financial well-being, how should the policies themselves be disclosed? What principles matter?

“The guideline I suggest is that if a firm is keeping track of your usage and purchases, then you should be able to get access to that data in a machine-readable, standardized format that, with one click, you could upload to a search engine website,” said Thaler. “As for the proper balance, I am proposing only that consumers have access to their raw purchase history, not proprietary inferences the firm may have drawn. To give an example, you should have a right to download the list of all the movies you have rented from Netflix, but not the conclusions they have reached about what sort of movies you might also like. Also, any policy like this should begin with larger firms that already have sophisticated information systems keeping track of consumer data. For those firms, the costs of providing the data to their consumers should be minor.”

Given the growth of student loans, more transparency and understanding around higher education choices is needed. For that to happen, prospective students will need more access to their own personal data to build the profile they can then use to get personalized recommendations about education, along with data from higher education institutions, including outcomes for different kinds of students, from graduation rates to job placement.

Disclosures of data regarding outcomes can have other effects as well.

“I referenced the hospital-acquired infection battle earlier,” said Meyer. “In 1999, the Institute of Medicine released a study, ‘To Err Is Human,’ that showed tens of thousands of consumers were dying because of preventable medical harms. Consumers Union started a campaign in 2003 to reduce the number of deaths due to hospital-acquired infections. Our plan was to get laws passed in states that required disclosure of infections. We have helped get laws passed in 30 states, which is great, but getting the states to comply with useful data has been difficult. We’re starting to see progress in reducing infections but it’s taken a long time.”


This post is part of our ongoing investigation into the open data economy.
