
April 30 2013

Linking open data to augmented intelligence and the economy

After years of steady growth, open data is now entering the public discourse, particularly in the public sector. If President Barack Obama decides to put the White House’s long-awaited new open data mandate before the nation this spring, it will finally enter the mainstream.

As more governments, businesses, media organizations and institutions adopt open data initiatives, interest in the evidence behind its release and the outcomes that flow from it is growing. High hopes abound in many sectors, from development to energy to health to safety to transportation.

“Today, the digital revolution fueled by open data is starting to do for the modern world of agriculture what the industrial revolution did for agricultural productivity over the past century,” said Secretary of Agriculture Tom Vilsack, speaking at the G-8 Open Data for Agriculture Conference.

As other countries consider releasing their public sector information as data in machine-readable formats onto the Internet, they’ll need to learn from years of effort at data.gov.uk, data.gov in the United States, and Kenya’s open data initiative in Africa.

One of the crucial sources of analysis for the success or failure of open data efforts will necessarily be research institutions and academics. That’s precisely why research from the Open Data Institute and Professor Nigel Shadbolt (@Nigel_Shadbolt) will matter in the months and years ahead.

In the following interview, Professor Shadbolt and I discuss what lies ahead. His responses were lightly edited for content and clarity.

How does your research on artificial intelligence (AI) relate to open data?

AI has always fascinated me. The quest for understanding what makes us smart and how we can make computers smart has always engaged me. While we’re trying to understand the principles of human intelligence and build a “brain in a box,” smarter robots or better speech processing algorithms, the world’s gone and done a different kind of AI: augmented intelligence. The web, with billions of human brains, has a new kind of collective and distributed capability that we couldn’t even see coming in AI. A number of us have coined a phrase, “Web science,” to understand the Web at a systems level, much as we do when we think about human biology. We talk about “systems biology” because there are just so many elements: technical, organizational, cultural.

The Web really captured my attention ten years ago as this really new manifestation of collective problem-solving. If you think about the link to earlier work I’d done, in what was called “knowledge engineering” or knowledge-based systems, the problem there was that all of the knowledge resided in systems on people’s desks. What the web has done is replace that with something that looks a lot like a supremely distributed database. Now, that distributed knowledge base is one version of the Semantic Web. The way I got into open data was the notion of using linked data and Semantic Web technologies to integrate data at scale across the web — and one really high-value source of data is open government data.

What was the reason behind the founding and funding of the Open Data Institute (ODI)?

The open government data piece originated in work I did in 2003 and 2004. We were looking at this whole idea of putting new data-linking standards on the Web. I had a project in the United Kingdom that was working with government to show the opportunities to use these techniques to link data. As in all of these things, that work was reported to Parliament. There was real interest in it, but not really top-level, heavy “political cover” interest. Tim Berners-Lee’s engagement with the previous prime minister led to Gordon Brown appointing Tim and me to look at setting up data.gov.uk and getting data released, and then to the current coalition government taking that forward.

Throughout this time, Tim and I have been arguing that we could really do with a central focus, an institute whose principal motivation was working out how we could find real value in this data. The ODI does exactly that. It’s got about $16 million of public money over five years to incubate companies, build capacity, train people, and ensure that the public sector is supplying high quality data that can be consumed. The fundamental idea is that you ensure high quality supply by generating a strong demand side. And the demand side isn’t just the public sector; it’s also the private sector.

What have we learned so far about what works and what doesn’t? What are the strategies or approaches that have some evidence behind them?

I think there are some clear learnings. One that I’ve been banging on about recently is that yes, it really does matter to turn the dial so that governments have a presumption to publish non-personal public data. If you would publish it anyway under a Freedom of Information request, or whatever your local legislative equivalent is, why aren’t you publishing it as open data? That, as a behavioral change, is a big one for many administrations where the existing workflow or culture is, “Okay, we collect it. We sit on it. We do some analysis on it, and we might give it away piecemeal if people ask for it.” We should construct the publication process from the outset with a presumption to publish openly. That’s still something that we are two or three years away from, working hard with the public sector to work out how to do it and how to do it properly.

We’ve also learned that in many jurisdictions, the amount of [open data] expertise within administrations and within departments is slight. There just isn’t really the skillset, in many cases, for people to know what it is to publish using technology platforms. So there’s a capability-building piece, too.

One of the most important things is it’s not enough to just put lots and lots of datasets out there. It would be great if the “presumption to publish” meant they were all out there anyway — but when you haven’t got any datasets out there and you’re thinking about where to start, the tough question is to say, “How can I publish data that matters to people?”

The data that matters is revealed if we look at the download stats on these various UK, US and other [open data] sites. There’s a very, very distinctive long-tail curve. Some datasets are very, very heavily utilized. You suspect they have high utility to many, many people. Many of the others, if they can be found at all, aren’t being used particularly much. That’s not to say that, under that long tail, there aren’t large amounts of use. A particularly arcane open dataset may have exquisite use to a small number of people.

The real truth is that it’s easy to republish your national statistics. It’s much harder to do a serious job on publishing your spending data in detail, publishing police and crime data, publishing educational data, publishing actual overall health performance indicators. These are tough datasets to release. As people are fond of saying, it holds politicians’ feet to the fire. It’s easy to build a site that’s full of stuff — but does the stuff actually matter? And does it have any economic utility?

Page views and traffic aren’t ideal metrics for measuring success for an open data platform. What should people measure, in terms of actual outcomes in citizens’ lives? Improved services or money saved? Performance or corrupt politicians held accountable? Companies started or new markets created?

You’ve enumerated some of them. It’s certainly true that one of the challenges is to instrument the effect or the impact. Actually, it’s the last thing that the governments, nation states, regions or cities who are enthused to do this actually do. It’s quite hard.

Datasets, once downloaded, may then be virally reproduced all over the place, so that you don’t notice the reuse from a government site. Most of the open licensing that is so essential to this effort carries a requirement for attribution. Those licenses should be embedded in the machine-readable datasets themselves. Not enough attention is paid to that piece of the process: actually noticing, when you’re looking at other applications and other data and publishing efforts, that the attribution is there. We should be smarter about getting better sense from the attribution data.
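
To make that concrete, here is a minimal sketch of what embedding a license and an attribution statement in a machine-readable dataset description might look like, using DCAT and Dublin Core terms serialized as JSON-LD from Python. The dataset title and publisher are hypothetical; the Open Government Licence URL and the vocabularies are real, but this is an illustration rather than a prescribed format.

```python
# Minimal sketch (not a prescribed format): metadata that travels with a published
# dataset and carries its license and attribution in machine-readable form.
# The dataset title and publisher below are hypothetical examples.
import json

metadata = {
    "@context": {
        "dct": "http://purl.org/dc/terms/",
        "dcat": "http://www.w3.org/ns/dcat#",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Local authority spending over £500",  # hypothetical dataset
    "dct:publisher": "Example Borough Council",          # hypothetical publisher
    "dct:license": "http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/",
    "dct:rights": "Contains public sector information licensed under the Open Government Licence v3.0.",
}

# Publish this alongside (or inside) the data itself so that any downstream
# application can discover the license and surface the required attribution.
with open("dataset.jsonld", "w", encoding="utf-8") as f:
    json.dump(metadata, f, ensure_ascii=False, indent=2)
```

The point is not the particular vocabulary; it is that a machine consuming the data can find the license and the required attribution without a human having to read a terms page.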

The other sources of impact, though: How do you evidence actual internal efficiencies and internal government-wide benefits of open data? I had an interesting discussion recently, where the IT department had said, “You know, I thought this was all stick and no carrot. I thought this was all an overhead, to get my data out there for other people’s benefit, but we’re now finding it so much easier to re-consume our own data and repurpose it in other contexts that it’s taken a huge amount of friction out of our own publication efforts.”

Quantified measures would really help, if we had standard methods to notice those kinds of impacts. Our economists, the people whose work is understanding where value is created, really haven’t embraced open markets, particularly open data markets, in a very substantial way. I think we need a good number of capable economists piling into this, trying to understand new forms of value and what the values are that are created.

I think a lot of the traditional models don’t stand up here. Bizarrely, it’s much easier to measure impact when information scarcity exists: you have something that I don’t, and I have to pay you a certain fee for that stuff. I can measure that value. When you’ve taken that asymmetry out, when you’ve made open data available more widely, what are the new things that flourish? In some respects, you’ll take some value out of the market, but you’re going to replace it with wider, more distributed, more capable services. This is a key issue.

The ODI will certainly be commissioning and is undertaking work in this area. We published a piece of work jointly with Deloitte in London, looking at evidence-linked methodology.

You mentioned the demand-side of open data. What are you learning in that area — and what’s being done?

There’s an interesting tension here. If we turn the dial in the governmental mindset to the “presumption to publish” — and in the UK, our public data principles actually embrace that as government policy — you are meant to publish unless there’s a reason of personal information or national security why you would not. In a sense, you say, “Well, we’ll just publish everything out there. That’s what we’ll do. Some of it will have utility, and some of it won’t.”

When the Web took off and you offered pages as a business or an individual, you didn’t foresee the link-making that would occur. You didn’t foresee that PageRank would ultimately give you a measure of your importance and relevance in the world and could even be monetized after the fact. You didn’t foresee that those pages have their own essential network effect: the more pages there are that interconnect, the more value is created, and so there is a strong argument [for publishing them].

So, you know, just publish. In truth, the demand side is an absolutely great and essential test of whether actually [publishing data] does matter.

Again, to take the Web as an analogy, large amounts of the Web are unattended to, neglected, and rot. It’s just stuff nobody cares about, actually. What we’re seeing in the open data effort in the UK is that it’s clear that some data is very privileged. It’s at the center of lots of other datasets.

In particular, [data about] location, what occurred and when it occurred, and stable ways of identifying the things that are occurring. Then, of course, the data space that relates to companies: their identifiers, the contracts they enter into, and the spending they engage in. That is the meat and drink of business intelligence apps all across the planet. If you started to turn off the ability for any business intelligence to access legal identifiers or business identifiers, all sorts of oversight would fall apart, apart from anything else.

The demand side [of open data] can be characterized. It’s not just economic. It will have to do with transparency, accountability and regulatory action. The economic side of open data gives you huge room for maneuver and substantial credibility when you can say, “Look, this dataset of spending data in the UK, published by local authorities, is the subject of detailed analytics from companies who look at all the data about how local authorities and governments are spending their money. They sell procurement analysis insights back to business and on to third parties and other parts of the business world, saying ‘This is the shape of how UK plc is buying.’”

What are some of the lessons we can learn from how the World Wide Web grew and the value that it’s delivered around the world?

That’s always a worry: that, in some sense, the empowered get more powerful. What we do see, in open data in particular, is new sorts of players entering a game they couldn’t enter at all before.

My favorite example is in mass transportation. In the UK, we had to fight quite hard to get some of the data from bus, rail and other forms of transportation made openly available. Until that was done, there was a pretty small number of suppliers in this market.

In London, where all of it was made available by Transport for London, there’s just been an explosion of apps and businesses giving you subtly distinct experiences as users of that data. I’ve got about eight or nine apps on my phone that give me interestingly distinctive views of moving about the city of London. I couldn’t have predicted or anticipated that many of those would exist.

I’m sure the companies who held that data could’ve spent large amounts of money and still not given me anything like the experience I now have. The flood of innovation around the data has really been significant, and there are many, many more players and stakeholders in that space.

The Web taught us that serendipitous reuse, where you can’t anticipate where the bright idea comes from, is what is so empowering. The flipside of that is that it also reveals that, in some cases, the data isn’t necessarily of a quality that you might’ve thought. This effort might allow for civic improvement or indeed, business improvement in some cases, where businesses come and improve the data the state holds.

What’s happening in the UK with the so-called “MiData Initiative,” which posits that people have a right to access and use personal data disclosed to them?

I think this is every bit as potentially disruptive and important as open government data. We’re starting to see the emergence of what we might think of as a new class of important data, “personal assets.”

People have talked about “personal information management systems” for a long time now. Frequently, it’s revolved around managing your calendar or your contact list, but it’s much deeper. Imagine that you, the consumer, or you, the citizen, had a central locus of authority around data that was relevant to you: consumer data from retail, from the banks that you deal with, from the telcos you interact with, from the utilities you get your gas, water and electricity from. Imagine if that data infosphere was something that you could access easily, with a right to reuse and redistribute it as you saw fit.

The canonical example, of course, is health data. It isn’t just data that business holds; it’s also data the state holds, like your health records, educational transcript, welfare, tax, or any number of areas.

In the UK, we’ve been working towards empowering consumers, in particular through this MiData program. We’re trying to get to a place where consumers have a right to the data held about their transactions by businesses, [released] back to them in a reusable and flexible way. We’ve been working on a voluntary program in this area for the last year. We have a consultation on taking powers to require large companies to give that information back. There is a commitment in the UK, for the first time, to get health records back to patients as data they control, but I think it has to go much more widely.

Personal data is a natural complement to open data. Some of the most interesting applications I’m sure we’re going to see in this area are where you take your personal data and enrich it with open data relating to businesses, the services of government, or the actual trading environment you’re in. In the UK, we’ve got six large energy companies that compete to sell energy to you.

Why shouldn’t groups and individuals be able to get together and collectively purchase in the same way that corporations can purchase and get their discounts? Why can’t individuals be in a spot market, effectively, where it’s easy to move from one supplier to another? Along with those efficiencies in the market and improvements in service delivery, it’s about empowering consumers at the end of the day.
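
As a toy illustration of that combination, here is a minimal sketch of what enriching a household’s own consumption data with an open dataset of tariffs could look like when comparing energy suppliers. The suppliers, rates and usage figure are entirely made up.

```python
# Hypothetical sketch: combine personal consumption data (MiData-style) with an
# open dataset of energy tariffs to find the cheapest supplier for a household.
# Supplier names, rates and the usage figure are made up for illustration.

annual_usage_kwh = 3_100  # from the household's own consumption data

# Open tariff data: supplier -> (daily standing charge in £, unit rate in £/kWh)
tariffs = {
    "Supplier A": (0.25, 0.145),
    "Supplier B": (0.20, 0.152),
    "Supplier C": (0.30, 0.138),
}

def annual_cost(standing_charge: float, unit_rate: float, usage_kwh: float) -> float:
    """Estimated yearly bill: the standing charge every day plus usage at the unit rate."""
    return standing_charge * 365 + unit_rate * usage_kwh

costs = {
    name: annual_cost(charge, rate, annual_usage_kwh)
    for name, (charge, rate) in tariffs.items()
}

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: £{cost:,.2f} per year")
```

Collective purchasing is then just the same calculation run over pooled usage data for a group of households.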

This post is part of our ongoing series on the open data economy.

February 05 2013

Investing in the open data economy

If you had 10 million pounds to spend on open data research, development and startups, what would you do with it? That’s precisely the opportunity that Gavin Starks (@AgentGav) has been given as the first CEO of the Open Data Institute (ODI) in the United Kingdom.

The ODI, which officially opened last September, was founded by Sir Tim Berners-Lee and Professor Nigel Shadbolt. The independent, non-partisan, “limited by guarantee” nonprofit is a hybrid institution focused on unlocking the value in open data by incubating startups, advising governments, and educating students and media.

Previously, Starks was the founder and chairman of AMEE, a social enterprise that scored environmental costs and risks for businesses. (O’Reilly AlphaTech Ventures was one of its funders.) He’s also worked in the arts, science and technology. I spoke to Starks about the work of the ODI and open data earlier this winter as part of our continuing series investigating the open data economy.

What have you accomplished to date?

Gavin Starks: We opened our offices on the first of October last year. Over the first 12 weeks of operation, we’ve had a phenomenal run. The ODI is looking to create value to help everyone address some of the greatest challenges of our time, whether that’s in education, health, in our economy or to benefit our environment.

Since October, we’ve had literally hundreds of people through the door. We’ve secured $750,000 in matched funding from the Omidyar Network, on top of a 10-million-pound investment from the UK Government’s Technology Strategy Board. We’ve helped identify 200 million pounds a year in savings for the health service in the UK.

200 million pounds? What do you base that estimate upon?

Gavin Starks: Part of our remit is to bring together the main experts from different areas. To illustrate the kind of benefit that I think we can bring here, one part of what we’re doing is to try and unlock data supply.

The Health Service in the UK started to release a lot of its prescription information as open data about nine months ago. We worked with some of the main experts in the health service and with a big data analytics firm, Mastodon C, a startup that we’re incubating at the ODI.

Together, they identified potential areas of investigation. The data science folks drilled into every single prescription. (I think the dataset was something like 47 million rows of data.) What they were looking at was the difference between proprietary drugs and generics, where there may be a generic equivalent. In many cases, there is no clinical difference between the generic equivalent and the proprietary drug — and so the cost difference is huge. It might be 81 pence for a generic versus more than 20 pounds for a drug that’s still under license.

Looking at the entire dataset, the analytics revealed different prescribing patterns and, from those, cost differences. If we had carried out this research a year ago, for example, we could have saved 200 million pounds over the last year. It really is quite significant. That’s one class of drugs, in one area. We think this research could be repeated against different classes of drugs and replicated internationally.
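
To show the shape of that arithmetic, here is a minimal sketch in Python with pandas. It assumes a simplified prescribing file with drug_name, items and actual_cost columns and an illustrative mapping from two proprietary statins to their generic molecules; it is not the schema of the published NHS data, nor Mastodon C’s actual pipeline, which worked per prescription and per clinical commissioning group and accounted for clinical equivalence.

```python
# Hypothetical sketch: estimate how much could have been saved if proprietary
# statin prescriptions had been dispensed as their generic equivalents.
# Column names and the drug mapping below are illustrative assumptions.
import pandas as pd

# Assumed layout: one row per practice, month and drug, with the number of
# prescribed items and the total cost in pounds.
df = pd.read_csv("prescribing.csv")  # columns: drug_name, items, actual_cost

# Illustrative mapping from proprietary brands to the generic molecule.
GENERIC_OF = {
    "Lipitor": "Atorvastatin",
    "Crestor": "Rosuvastatin",
}

# Observed cost per item for each generic, taken from the same dataset.
generics = df[df["drug_name"].isin(GENERIC_OF.values())]
totals = generics.groupby("drug_name")[["actual_cost", "items"]].sum()
generic_cost_per_item = totals["actual_cost"] / totals["items"]

# What the proprietary prescriptions actually cost, versus what the same
# number of items would have cost at the generic price per item.
proprietary = df[df["drug_name"].isin(GENERIC_OF.keys())].copy()
proprietary["cost_if_generic"] = (
    proprietary["drug_name"].map(GENERIC_OF).map(generic_cost_per_item)
    * proprietary["items"]
)

potential_saving = proprietary["actual_cost"].sum() - proprietary["cost_if_generic"].sum()
print(f"Estimated potential saving: £{potential_saving:,.0f}")
```

The core of the estimate is simply what the proprietary prescriptions actually cost, minus what the same number of items would have cost at the observed generic price per item.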

[Figure: Percentage of proprietary statin prescribing by CCG, September 2011 – May 2012. Image credit: PrescribingAnalytics.com]

Which open data business models are the most exciting to you?

Gavin Starks: I think there are lots of different areas to explore here. There are areas where cost savings can be brought to any organization, whether it’s a public sector or private sector organization. There are also areas of new innovation. (I think they’re quite different directions.) Some of the work that we’ve done with the prescription data, that’s where you’re looking at efficiencies.

We’ve got other startups based in our offices here in Shoreditch, London, that are looking at transportation information. They’re looking at location-based services and other forms of analytics within the commercial sectors: financial information, credit ratings, those kinds of areas. When you start to pull together different levels of open data that have been available but haven’t been that accessible in the past, there are new services that can be derived from them.

What creates a paid or value-added service? It’s essential that we create a starting point where free and open access to the data itself can be made available, for certain use cases, to as many people as possible. You stimulate innovation when people can gain access and discern new insight from that data.

Having the data aggregated, structured and accessible in an automated way is worth paying for. There could be a service-level-agreement-based model. There could be a carve-out of use cases. You could borrow from the Creative Commons world and say, “If you’re going to have a share alike license on this, then that’s fine, you can use it for free. But if you’re going to start creating closed assets, as a result, there may be a charge for the use of data at that point.”

I think there’s a whole range of different data models, but really, the goal here is to try and discern what insight can be derived from existing datasets and what happens when you start mashing them up with other datasets.

What are the greatest challenges to achieving some of the economic outcomes that the UK Cabinet Office has described?

Gavin Starks: I think there are many challenges. One of the key ones is just understanding. One challenge we’ve heard consistently from pretty much everybody has been, “We believe there’s a huge amount of potential here, but where do we start?”

Part of the ODI’s mission is to provide training, education and assets that enable people to begin on that journey. We’re in the process right now of designing our first dozen or so training programs. We’re working at one level with the World Bank to train the world’s politicians and national leaders, and we’re working at the other end with schools to create programs that fit with existing graduate courses.

Education is one of the biggest challenges. We want to train more than technologists — we also want to train lawyers and journalists about the business case to enable people to understand and move forward at the same pace. There’s very little point in just saying, “There is huge value here,” without trying to demonstrate that return on investment (ROI) and value case at the same time.

What is the ODI’s approach to incubating civic startups?

Gavin Starks: There are two parts to it. One is unlocking supply. We’re working with different government departments and public sector agencies to help them understand what unlocking supply means. Creating structured, addressable, repeatable data creates the supply piece so that you can actually start to build a business. It’s very high-risk to try and build a business when you don’t have a guarantee of supply.

Two, encouraging and incubating the demand side. We’ve got six startups in our space already. They’re all at different stages. Some of them are very early on, just trying to navigate toward the value here that we can discern from the data. Others are more mature, and maybe have some existing revenue streams, but they’re looking at how to really make this scale.

What we’ve found to be of benefit so far — and again, we’re only three months in — is our ability to network and convene the different stakeholders. We can take a small startup and get them in front of one of the large corporations and help them bridge that sales process. Helping them communicate their ideas in a clear way, where the value is obvious to the end customer, is important.

What are some of the approaches that have worked to unlock value from open government data?

Gavin Starks: We’re not believers in “If you build it, they will come.” You need to create a guaranteed data supply, but you also need to really engage with people to start to unlock ideas.

We’ve been running our own hackathons, but I think there’s a difference in the way that we’ve structured them and organized them. We include domain experts and frame the hack events around a specific problem or a specific set of problems. For example, we had a weekend-long hackathon in the health space, looking at different datasets, convening domain experts and technical experts.

It involved competitions, where the winner gets a seat at the ODI to take their idea forward. It might be that an idea turns into a business, it might turn into a project, or it might just turn into a research program.

I think that you need to really lead people by the hand through the process of innovation, helping them and supporting them to unlock the value, rather than just having the datasets there and expecting them to be used.

Given the cost the UK’s National Audit Office ascribed to opening data, is the investment worth it?

Gavin Starks: This is like the early days of the web. There are lots of estimates about how much everything is going to be worth and what sort of ROI people are going to see. The honest answer, I think, is that we’ve yet to see.

The reason I’m very excited about this area is that I see the same potential as I saw in the mid-1990s, when I got involved with the web. The same patterns exist today. There are new datasets and ecosystems coming into existence that can be data-mined. They can be joined together in novel ways. They can bridge the virtual and physical worlds. They can bring together people who have not been able to collaborate in different ways.

There’s a huge amount of value to be unlocked. There will be some dead ends, as we had in the web’s development, but there will be some incredible wins. We’re trying to refine our own skills around identifying where those potential hot spots might be.

Health services is an area where it’s really obvious there are a lot of benefits. There are clear benefits from opening up transportation and location-based services. You can see the potential behind energy efficiency, creating efficient supply chains and opening up more information around education.

You can see resonant points. We’re really drilling into those and asking, “What happens when you really put together the right domain experts and the supportive backers?”

Those backers can be financial as well as in industry. The Open Data Institute has been pulling together those experts and providing a neutral space for that innovation to happen.

Which of those areas have the most clear economic value, in terms of creating shorter term returns on investment and quick wins?

Gavin Starks: I don’t think there’s a single answer to that question. If you look at location-based services, corporate data, health data or education, there are examples and use cases in different parts of the world where they will have different weightings.

If you were looking at water sanitation in areas of the world where it is absent, then that data may provide a more immediate return than unlocking huge amounts of transportation information.

In Denmark, look at the release of the equivalent of zip code data and more detailed addresses. I believe the numbers there went from a four-fold return to a 17-fold return, in terms of the value to the country of its investment in decent address-level data.

This is one area where we’ve provided a consultation response in the UK. I think it may vary from state to state in the U.S. There may be areas where a specific focus on health would be very beneficial, and there may be areas where a focus on energy efficiency would be most beneficial.

What conditions lead to beneficial outcomes for open data?

Gavin Starks: A lot of the real issues are not really about the technology. When it comes to the technology, we know what a lot of the solutions are. How can we address or improve the data quality? What standards need to exist? What anonymity, privacy or secrecy needs to exist around the data? How do we really measure the outcomes? What are the circumstances where stakeholders need to get involved?

You definitely need political buy-in, but there also needs to be a sense of what the data landscape is. What’s the inventory? What’s the legal situation? Who has access? What kind of access is required? What does success look like against a particular use case?

You could be looking at health in somewhere like Rwanda, you could be looking at a national statistics office in a particular country where they may not have access to the data themselves, and they don’t have very much access to resources. You could be looking at contracting, government procurement and improving simple accountability, where there may be more information flow than there is around energy data, for example.

I think there’s a range of different use cases that we need to really explore here. We’re looking for great use cases where we can say, “This is something that’s simple to achieve, that’s repeatable, that helps lower costs and stimulate innovation.”

We are really at the beginning of a journey here.

Red Hat made headlines for becoming the first billion-dollar open source company. What do you think the first billion-dollar open data company will be?

Gavin Starks: It would not be unlikely for that to be in the health arena.


This interview has been edited and condensed for clarity. This post is part of our ongoing investigation into the open data economy.

