
March 19 2013

The City of Chicago wants you to fork its data on GitHub

GitHub has been gaining new prominence as the use of open source software in government grows.

Earlier this month, I included a few thoughts from Chicago’s chief information officer, Brett Goldstein, about the city’s use of GitHub, in a piece exploring GitHub’s role in government.

While Goldstein says that Chicago’s open data portal will remain the primary means through which Chicago releases public sector data, publishing open data on GitHub is an experiment that will be interesting to watch, in terms of whether it affects reuse or collaboration around it.

In a follow-up email, Goldstein, who also serves as Chicago’s chief data officer, shared more about why the city is on GitHub and what they’re learning. Our discussion follows.

The City of Chicago is on GitHub.

What has your experience on GitHub been like to date?

Brett Goldstein: It has been a positive experience so far. Our local developer community is very excited by the MIT License on these datasets, and we have received positive reactions from outside of Chicago as well.

This is a new experiment for us, so we are learning along with the community. For instance, GitHub was not built to be a data portal, so it was difficult to upload our buildings dataset, which was over 2GB. We are rethinking how to deploy that data more efficiently.

Why use GitHub, as opposed to some other data repository?

Brett Goldstein: GitHub provides the ability to download, fork, make pull requests, and merge changes back to the original data. This is a new experiment, where we can see if it’s possible to crowdsource better data. GitHub provides the necessary functionality. We already had a presence on GitHub, so it was a natural extension to that as a complement to our existing data portal.

Why does it make sense for the city to use or publish open source code?

Brett Goldstein: Three reasons. First, it solves issues with incorporating data in open source and proprietary projects. The city’s data is available to be used publicly, and this step removes any remaining licensing barriers. These datasets were targeted because they are incredibly useful in the daily life of residents and visitors to Chicago. They are the most likely to be used in outside projects. We hope this data can be incorporated into existing projects. We also hope that developers will feel more comfortable developing applications or services based on an open source license.

Second, it fits within the city’s ethos and vision for data. These datasets are items that are visible in daily life — transportation and buildings. It is not proprietary data and should be open, editable, and usable by the public.

Third, we engage in projects like this because they ultimately benefit the people of Chicago. Not only do our residents get better apps when we do what we can to support a more creative and vibrant developer community, they also will get a smarter and more nimble government using tools that are created by sharing data.

We open source many of our projects because we feel the methodology and data will benefit other municipalities.

Is anyone pulling it or collaborating with you? Have you used that code? Would you, if it happened?

Brett Goldstein: We collaborated with Ian Dees, who is a significant contributor to OpenStreetMap, to launch this idea. We anticipate that buildings data will be integrated in OpenStreetMap now that it’s available with a compatible license.

We have had 21 forks and a handful of pull requests fixing some issues in our README. We have not had a pull request fixing the actual data.

We do intend to merge requests to fix the data and are working on our internal process to review, reject, and merge requests. This is an exciting experiment for us, really at the forefront of what governments are doing, and we are learning along with the community as well.
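
As an editorial aside, part of a review step like the one described above could be automated before a human looks at a pull request. The following is only a minimal sketch, not the city's actual process. It assumes the contributed file is a GeoJSON-style `FeatureCollection` (the interview says only that the data is JSON), the function name `find_problems` is hypothetical, and the sample documents are made up. It also treats a null geometry as a problem, which is stricter than the GeoJSON specification requires.

```python
import json


def find_problems(text):
    """Return a list of human-readable problems in a GeoJSON-style document."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(doc, dict) or doc.get("type") != "FeatureCollection":
        return ["top-level object is not a FeatureCollection"]
    features = doc.get("features")
    if not isinstance(features, list):
        return ["'features' is missing or not a list"]
    problems = []
    for index, feature in enumerate(features):
        if not isinstance(feature, dict) or feature.get("type") != "Feature":
            problems.append(f"feature {index}: not a Feature object")
        elif not isinstance(feature.get("geometry"), dict):
            # Assumption: this sketch rejects null geometries outright.
            problems.append(f"feature {index}: missing geometry")
    return problems


# Made-up sample documents standing in for a contributed dataset file.
good = (
    '{"type": "FeatureCollection", "features": ['
    '{"type": "Feature", "geometry": {"type": "Point", "coordinates": [0, 0]},'
    ' "properties": {}}]}'
)
bad = (
    '{"type": "FeatureCollection", "features": ['
    '{"type": "Feature", "geometry": null, "properties": {}}]}'
)

good_problems = find_problems(good)
bad_problems = find_problems(bad)
print(good_problems)
print(bad_problems)
```

A check like this only catches structural damage. Whether an edited building outline is actually more accurate than the original still needs a person, or a comparison against another source, to decide.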

Is anyone using the open data that wasn’t before, now that it’s JSON?

Brett Goldstein: We seem to be reaching a new audience with posting data on GitHub, working in tandem with our heavily trafficked data portal. A core goal of this administration is to make data open and available. We have one of the most ambitious open data programs in the country. Our portal has over 400 datasets that are machine readable, downloadable and searchable. Since it’s hosted on Socrata, basic analysis of the data is possible as well.

March 08 2013

GitHub gains new prominence as the use of open source within governments grows

When it comes to government IT in 2013, GitHub may have surpassed Twitter and Facebook as the most interesting social network.

GitHub’s profile has been rising recently, from a Wired article about open source in government, to its high profile use by the White House and within the Consumer Financial Protection Bureau. This March, after the first White House hackathon in February, the administration’s digital team posted its new API standards on GitHub. In addition to the U.S., code from the United Kingdom, Canada, Argentina and Finland is also on the platform.

“We’re reaching a tipping point where we’re seeing more collaboration not only within government agencies, but also between different agencies, and between the government and the public,” said GitHub head of communications Liz Clinkenbeard, when I asked her for comment.

Overall, 2012 was a breakout year for the use of GitHub by government, with more than 350 government code repositories by year’s end.

Total number of government repositories on GitHub.

In January 2012, the British government committed the code for GOV.UK to GitHub.

NASA, after its first commit, added 11 more code repositories over the course of the year.

In September, the new Open Gov Foundation published the code for the MADISON legislative platform. In December, the U.S. Code went on GitHub.

GitHub’s profile was raised further in Washington this week when Ben Balter was announced as the company’s federal liaison. Balter made some open source history last year, when he was part of the federal government’s first agency-to-agency pull request. He also was a big part of giving the White House some much-needed geek cred when he coded the administration’s digital government strategy in HTML5.

Balter will be GitHub’s first government-focused employee. He won’t, however, be saddled with an undecipherable title. In a sly dig at the slow-moving institutions of government, and in keeping with GitHub’s love for octocats, Balter will be the first “Government Bureaucat,” focused on “helping government to do all sorts of governmenty things, well, more awesomely,” wrote GitHub CIO Scott Chacon.

Part of Balter’s job will be to evangelize the use of GitHub’s platform as well as open source in government, in general. The latter will come naturally to him, given how he and the other Presidential Innovation Fellows approached their work.

“Virtually everything the Presidential Innovation Fellows touched was open sourced,” said Balter when I interviewed him earlier this week. “That’s everything from better IT procurement software to internal tools that we used to streamline paperwork. Even more important, much of that development (particularly RFPEZ) happened entirely in the open. We were taking the open source ethos and applying it to how government solutions were developed, regardless whether or not the code was eventually public. That’s a big shift.”

Balter is a proponent of social coding in the open as a means of providing some transparency to interested citizens. “You can go back and see why an agency made a certain decision, especially when tools like these are used to aid formal decision making,” he said. “That can have an empowering effect on the public.”

Forking code in city hall and beyond

There’s notable government activity beyond the Beltway as well.

The City of Chicago is now on GitHub, where chief data officer and city CIO Brett Goldstein is releasing open data as JSON files, along with open source code.

Both Goldstein and Philadelphia chief data officer Mark Headd are also laudably participating in conversations about code and data on Hacker News threads.

“Chicago has released over 400 datasets using our data portal, which is located at,” Goldstein wrote on Hacker News. While Goldstein says that the city’s portal will remain the primary way they release public sector data, publishing data on GitHub is an experiment that will be interesting to watch, in terms of whether it affects reuse.

“We hope [the datasets on GitHub] will be widely used by open source projects, businesses, or non-profits,” wrote Goldstein. “GitHub also allows an on-going collaboration with editing and improving data, unlike the typical portal technology. Because it’s an open source license, data can be hosted on other services, and we’d also like to see applications that could facilitate easier editing of geographic data by non-technical users.”

Headd is also on GitHub in a professional capacity, where he and his colleagues have been publishing code to a City of Philadelphia repository.

“We use [GitHub] to share some of our official city apps,” commented Headd on the same Hacker News thread. “These are usually simple web apps built with tools like Bootstrap and jQuery. We’ll be open sourcing more of these going forward. Not only are we interested in sharing the code for these apps, we’re actively encouraging people to fork, improve and send pull requests.”

While there’s still a long road ahead for widespread code sharing between the public and government, the economic circumstances of cities and agencies could create the conditions for more code sharing inside government. In a TED Talk last year, Clay Shirky suggested that adopting open source methods for collaboration could even transform government.

A more modest (although still audacious) goal would be to simply change how government IT is done.

“I’ve often said, the hardest part of being a software developer is training yourself to Google the problem first and see if someone else has already solved it,” said Balter during our interview. “I think we’re going to see government begin to learn that lesson, especially as budgets begin to tighten. It’s a relative ‘app store’ of technology solutions just waiting to be used or improved upon. That’s the first step: rather than going out to a contractor and reinventing the wheel each time, it’s training ourselves that we’re part of a larger ecosystem and to look for prior art. On the flip side, it’s about contributing back to that commons once the problem has been solved. It’s about realizing you’re part of a community. We’re quickly approaching a tipping point where it’s going to be easier for government to work together than alone. All this means that a taxpayer’s dollar can go further, do more with less, and ultimately deliver better citizen services.”

Some people may understandably bridle at including open source code and open data under the broader umbrella of “open government,” particularly if such efforts are not balanced by adherence to good government principles around transparency and accountability.

That said, there’s reason to hail collaboration around software and data as bona fide examples of 21st-century civic participation, where better platforms for social coding enable improved outcomes. The commits and pulls of staff and residents on GitHub may feel like small steps, but they represent measurable progress toward more government not just of the people, but with the people.

“Open source in government is nothing new,” said Balter. “What’s new is that we’re finally approaching a tipping point at which, for federal employees, it’s going to be easier to work together, than work apart. Whereas before, ‘open source’ often meant compiling, zipping, and uploading, when you fuse the internal development tools with the external publishing tools, and you make those tools incredibly easy to use, participating in the open source community becomes trivial. Often, it can be more painful for an agency to avoid it completely. I think we’re about to see a big uptick in the amount of open source participation, and not just in the traditional sense. Open source can be between business units within an agency. Often the left hand doesn’t know what the right is doing between agencies. The problems agencies face are not unique. Often the taxpayer is paying to solve the same problem multiple times. Ultimately, in a collaborative commons with the public, we’re working together to make our government better.”

February 25 2013

Governments looking for economic ROI must focus on open data with business value

There’s increasing interest in the open data economy from the research wings of consulting firms. Capgemini Consulting just published a new report on the open data economy. McKinsey’s Global Institute is following up its research on big data with an inquiry into open data and government innovation. Deloitte has been taking a long look at open data business models. Forrester says open data isn’t (just) for governments anymore and says more research is coming. If Bain & Company doesn’t update its work on “data as an asset” this year to meet inbound interest in open data from the public sector, it may well find itself in the unusual position of lagging the market for intellectual expertise.

As Radar readers know, I’ve been trying to “make dollars and sense” of the open data economy since December, looking at investments, business models and entrepreneurs.

In January, I interviewed Harvey Lewis, the research director for the analytics department of Deloitte U.K. Lewis, who holds a doctorate in hypersonic aerodynamics, has been working for nearly 20 years on projects in the public sector, defense industry and national security. Today, he’s responsible for applying an analytical eye to consumer businesses, manufacturing, banking, insurance and the public sector. Over the past year, his team has been examining the impact of open data releases on the economy of the United Kingdom. The British government’s embrace of open data makes such research timely.

Given the many constituencies interested in open data these days, from advocates for transparency and good government to organizations interested in co-creating civic services to entrepreneurs focused on building and scaling sustainable startups, one insight stood out from our discussion in particular:

“The things you do to enable transparency … aren’t necessarily the same things you do to enable economic growth and economic impact,” said Lewis.

“For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.”

The rest of our interview, lightly edited for content and clarity, follows.

Why is Deloitte interested in open data?

Harvey Lewis: In late 2011, we realized that open data was probably going to be one of those areas that was likely to be transformational, maybe not in the short term, but certainly in the long term. A lot of the technology that companies are using to do analysis of data will become increasingly commoditized, so the advantage that people were going to get was going to come through their interpretations of data and by looking for other commercial mechanisms for getting value from data.

The great thing about open data is that it provides those opportunities. It provides, in some ways, a level playing field and ways of creating revenue and opportunities that just don’t exist in other spaces.

You’ve been investigating the demand for open data from businesses. How have you approached the research?

Harvey Lewis: We’ve been working with professor Nigel Shadbolt in the U.K., who is one of the great champions on the global stage for open data. He and I started work on our open data activity back about 12 months ago.

Our interest was not so much in open government data but more the spectrum of open data, from government, business and individual citizens. We thought we would run an exercise over the spring of 2012, inviting various organizations to come and debate open data. We were very keen to get a cross-section of people from public and private sectors in those discussions because we wanted to understand what businesses thought of open data. We published a report [PDF] in June of last year, which was largely qualitative, looking at what we thought was happening in the world of open data, from a business perspective.

There were four main hypotheses to that vision:

The first part was that we thought every business should have a strategy to explore open data. If you look at the quantity of data that’s now available globally, even just from government, it’s an extraordinary amount, if you measure it just by the number of datasets that are published. In the U.K., it’s in the tens of thousands. In the U.S., it’s in the hundreds of thousands. There’s a vast resource of data that’s freely available that can be used to supplement existing sources of information, proprietary or otherwise, and enrich companies’ views of the world.

The second part was that businesses themselves would start to open up their data. There are different ways of gaining revenue and value from data if they opened it up. This was quite a controversial subject, as I’m sure you might imagine, in some of the discussions. Nevertheless, we’re starting already to see companies releasing subsets of their data on competition websites, inviting the crowd to come up with innovative solutions. We’re also seeing evidence that companies are releasing their data to improve the way they interact with their customers. I think one of the great broad impacts of businesses opening up their data is reputational enhancement — and that can have a real economic benefit.

The third part of our hypothesis was that open data would inspire customer engagement. That is, I think, a great topic for exploration within the public sector itself. Releasing this data isn’t just about “publishing it and they will come” — it’s about releasing data and using that data to engage in a different type of conversation with citizens and consumers.

Certainly in the U.K., we’re starting to see the fruits of that and some new initiatives. There’s a concept called “midata” in the U.K., where the government is encouraging service providers to release consumer data back to individuals so they can shop around for the best deals in the market. I think that’s a great vision for open data.

The fourth part was the privacy and the ethical responsibilities that come with the processing of open data, with companies and government starting to work more closely together to come up with a new paradigm for responsibility and privacy.

Nigel Shadbolt and I committed to doing further work on the economic business case for open data to try to address some of these hypothetical views of the future.

That launched this second phase of our work, which was trying to quantify that economic benefit. We decided very early on, because of Nigel Shadbolt’s relationship to the Open Data Institute, to work closely with that organization, as it was born in the summer of 2012.

We spent a lot of time gathering data. Particularly, we were looking at whether or not we could infer from the demand for open data from a variety of government portals what the economic benefit would be. We looked to a number of other measures and data sources, including a very broad balance sheet analysis to try to infer how companies were increasingly using data to run their businesses and benefit their businesses.

What did you find in this inquiry?

Harvey Lewis: We published a second report, called “Open Growth,” in early December of last year. The fundamental problem in trying to estimate the economic benefit is around, essentially, a lack of data. It sounds quite ironic, doesn’t it, that there’s a lack of data to quantify the effect of open data?

In particular, it’s still early days for determining economic benefit. When you’re trying to uncover second-order effects in the economy due to open data, it’s very early days to be able to see those effects percolate through different sectors. We were really challenged. Nevertheless, we were able to look quite closely at the sorts of data that the U.K. government had been publishing and draw some conclusions about what that meant for the economy.

For example, we were able to categorize nearly 40,000 datasets that are publicly available from the U.K. government and other public bodies in the U.K. into a number of discrete categories. Thirty-three percent of the data that was being published by the government was related to government expenditure. A large slice of the data that was being supplied had to do with the economy, demographics and health.

Does more transparency lead to positive economic outcomes?

Harvey Lewis: In the U.K., and certainly to some extent in the U.S., there are multiple objectives at work in open data.

One of the primary objectives is transparency, publishing data that allows citizens to really kick the tires on public services, hopefully leading them to be improved, to increase quality and choice for individual citizens.

The things you do to enable transparency, however, aren’t necessarily the same things you do to enable economic growth and economic impact. For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.

Put some guarantees around those data sources to preserve their formats, longevity and utility, so that businesses have the confidence to use them and start building companies on the backs of them. Investors have got to have confidence that data will be available in the long term.

Those are the steps you take for economic growth. They’re quite different from the steps you might take for transparency, which is about making sure that all data that has a potential bearing on public services and cities and interpretation of government policy is made available.

You defined five business model archetypes in your report: “suppliers, aggregators, developers, enrichers and enablers.” Which examples have been sustainable?

Harvey Lewis: In coming up with that list, we did an analysis of as many companies as we could find. We tried to appraise business models from publicly available information to get a better understanding of what they were doing with the data and how they were generating revenue from it.

We had a long list of about 15 or 16 discrete business models that we were then able to cluster into these five archetypes.

Suppliers are publishing open data, including, of course, public sector bodies. Some businesses are publishing their data. While there may be no direct financial return if they publish data as open data and make it freely available, there are nevertheless other benefits that are going to become very meaningful in the future.

It’s something that a lot of businesses won’t be able to ignore, particularly when it comes to sustainability and financial data. Consumers are putting a lot of businesses under a great deal of scrutiny now to make sure that businesses are operating with integrity and can be trusted. A lot of this is about public good or customer good, and that can be quite intangible.

The second area, aggregators, is perhaps the largest. Organizations are pooling publicly available data, combining it and producing insights from it that are useful. They’re starting to sell those insights to businesses. One example in the report takes open data from the public body that all companies that are operating in the U.K. have to register with. They combine that data with other sources from the web, social media and elsewhere to produce intelligence that other businesses can use. They’re growing at quite a phenomenal rate.

We’re seeing a decline of organizations that are purely aggregating public sources of information. I don’t think there’s a sustainable business model there. Particular areas, like business intelligence, energy and utilities, are taking public data and are getting insights. It’s the insights that have monetary value, not the data itself.

The third are the classic app developers. This is of greatest interest where the data that is provided by the public sector is granular, real-time, updated frequently and close to the hearts of ordinary citizens. Transport data, crime data, and health data are probably the three types of data where software developed on the back of that data is going to have the greatest impact.

In the U.K., we’re seeing a lot of transport applications that enable people to plan journeys across what is, in some cases, quite a fragmented transport infrastructure — and get real benefits as a result. I think it’s only a matter of time before we start to see health data being turned into applications in exactly the same way, allowing individuals to make more informed choices, understand their own health and how to improve it and so on.

The fourth area, enrichers, is a very interesting one. We think this is the “dark matter” of the open data economy. These are larger, typically established businesses that are hoovering significant quantities of open data and combining it with their own proprietary sources to offer services to customers. These sorts of services have traditionally existed and aren’t going to go away if the open data supplies dry up. They are hugely powerful. I’m thinking of insurers and retailers who have a lot of their own data about customers and are seeking better models of risk and understanding of customers. I think it’s difficult to measure economic benefit coming from this particular archetype.

The last area is enablers. These are organizations that don’t make money from open data directly but provide platforms and technologies that other businesses and individuals use. Competition websites are a very good example, where they provide a facility that allows businesses, public sector institutions, or research institutions to make subsets of their data available to seek solutions from the crowd.

Those are the five principal archetypes. The one that stands out, underpinning the open data market at the moment, is the “enricher” model. I think the hope is that the startups and small-to-medium enterprises in the aggregation and the developer areas are going to be the new engine for growth in open data.

Do you see adjustments being made based upon demand? Or are U.K. data releases conditioned upon what the government finds easy or politically low-risk?

Harvey Lewis: This comes back to my point about multiple objectives. The government in the U.K. is addressing a set of objectives through its open data initiative, one of which is economic growth. I’m sure it’s the same as in other countries around the world.

If the question is whether the government is releasing the right data to meet a transparency objective, then the answer is “yes.” Is it releasing the right data from an economic growth perspective? The answer is “almost.” It’s certainly doing an increasingly better job at that.

This is where the Open Data Institute really comes to the fore, because their remit, as far as the government is concerned, is to stimulate demand. They’re able to go back to the government and say, “Look, the real opportunity here is in the wholesale and retail sector. Or in the real estate sector — there are large swaths of government data that are valuable and relevant to this sector that are underutilized.” That’s an opportunity for the government to engage with businesses in those sectors, to encourage the use of open data and to demonstrate the benefits and outcomes that they can achieve.

It’s a very good question, but it depends on which objective you’re thinking about as to whether or not the answer is the right one. I think if you look toward the Danish government, for example, and the way that they’re approaching open data, there’s been a priority on economic growth. The sorts of datasets they’re releasing are going to stimulate growth in the Danish market, but they may not satisfy fully the requirements that one might expect from a transparency perspective or social growth perspective.

Does data format or method of release matter for outcomes, to the extent that you could measure it?

Harvey Lewis: From our analysis, data released through APIs and, in particular, transport data was in significant demand. There were noticeably more applications being built on the back of transport data published through an API than in almost any other area.

As a mechanism for making it easy for businesses to get hold of data, APIs are pretty crucial. Being able to provide data using that mechanism is a very good way of stimulating use.

Based on some of the other work that we’ve been doing, there’s a big push to release data in its raw form. CSV is talked about quite a lot. In some cases, that works well. In other cases, it is a barrier to entry for small-to-medium enterprises.

To go back to the general practitioner prescribing data, a single month’s worth of data is published in a CSV file each month. The file size is about half a gigabyte and contains typically over four million records. If you’re a small-to-medium enterprise with limited resources — or even if you’re a journalist — you cannot open that data file in typical desktop or laptop software. There’s just too many records. Even if you can find software that will open it, running queries on it takes a very long time.

There’s a natural barrier to entry for some formats that you really only appreciate once you try to process and get to grips with the data. That, I think, is something that needs to be thought through.

There’s an imperative to get data out there, but if you provide that data in a format that small-to-medium enterprises can’t use, I think it’s unfair. Larger businesses have the tools and the specialist capability to look at these files. That creates a problem, an economic barrier. It also creates a transparency barrier because although you may be publishing the data, no one can access it. Then you don’t get the benefits of increased transparency and accountability.

Where you’ve got potentially high-value datasets in health, crime, spending data and energy and environment data, a lot of care needs to be put into what formats are going to make that most easily accessible.

It isn’t always obvious. It isn’t the CSV file. It certainly isn’t the PDF! It isn’t anything, actually, that requires specialist knowledge and tools.

What are the next steps for your research inquiry?

Harvey Lewis: We’re continuing our work, trying to formulate ideas and methods. That includes using case studies and use cases, getting information from the public sector about how much it costs to generate the data, and looking at accounts of actual scenarios.

Understanding the economic impact, despite its challenges, is really important to policymakers around open data, to ensure that the benefits of releasing open data outweigh the costs of producing it. That’s absolutely essential to the business case of open data.

The other part of our activity is focusing on the insights that can be derived from open data that benefit the public sector or private sector companies. We’re looking quite hard at the growth opportunities in open data and the areas where significant cost savings or efficiencies can be gained.

We’re also looking at some interesting potential policy areas by mashing up different sources of data. For example, can you go some way to understanding the relationship between crime and mental health? With the release of detailed crime data and detailed prescribing data, there’s an opportunity, at a very granular level, to understand potential correlations and then do some research into the underlying causes. The focus of our research is subtly shifting toward more use-case type analysis, rather than looking at an abstract, generic picture about open data.

Bottom line: does releasing open data lead to significant economic benefit?

Harvey Lewis: My instinct and the data we have today suggest that it is going to lead to significant economic benefit. Precisely how big that benefit is needs further study.

I think it’s likely to be more in the realm of the broader impacts and some of the intangibles where we see the greatest impact, not necessarily through new businesses starting up and more businesses using open data. We will see those things.

This post is part of our ongoing investigation into the open data economy.

February 23 2013

Rufus Pollock: Where is the evidence that closed data is better?

Politics, business and society could all benefit from open data, argues economist Rufus Pollock in an interview. But one has to distinguish carefully between data that is truly open and data that is merely shared or publicly viewable, he says. A turn toward more open data is emerging.

About: Rufus Pollock is an economist and one of the founders and directors of the Open Knowledge Foundation. He is an Associate at the Centre for Intellectual Property and Information Law at the University of Cambridge and a Fellow of the Shuttleworth Foundation.

The open data scene is a colorful mix: governments and public administrations, companies, civil society actors, journalists and others come together with different motivations and goals. What is the connecting element?

Rufus Pollock: At its core, it is about information, and about the fact that services and products are built on information. Take an example: a producer delivers bread to supermarkets, which we buy as customers. That producer needs information about roads and traffic, for instance. Or I am planning a trip: I need information about the weather at my destination. For a better health system, you need information and data about the situation in hospitals. In all of these cases, open data can lead to improvements.

Rufus Pollock. Photo: Sebastiaan ter Burg, CC BY-SA.

Of course, open data serves different purposes, and different actors weigh them differently. For some, it is about more transparency and accountability from governments and administrations; others start companies and create jobs.

Another goal is more efficient and better public services. Why do we have a market economy? Because we assume that it allocates and distributes resources better than other systems. And open data makes it possible to use resources better and waste less.

So all of these purposes are legitimate. Which ones matter more naturally varies from person to person.

One hears from public administrations that open data would require a new spirit or a new culture within the administration. The German administration in particular goes back to the Prussian reforms and is still shaped by them. How do you intend to convince administrations of open data? A nice app here, a platform there, is probably not enough.

Rufus Pollock: We are seeing a growing body of evidence that open data is beneficial. We have seen open data and more transparency lower mortality rates in hospitals. Researchers have studied the economic and social benefits of open data.

And open data is not expensive. The cost of opening data that already exists is usually close to zero or very low. I would also like to turn the argument around: where is the evidence that it is better to keep data closed and sell it? Often data is not available at all and is not even sold. I would very much like to see the evidence that this is better.

But things are changing. Much more data is being opened. Even among governments, the attitude is now different from what it was two years ago. You can also see it in the fact that the Fraunhofer FOKUS institute, which does not exactly belong to the radical wing, has held a conference on the subject. This is increasingly becoming mainstream.

In Germany there was also a great uproar when, for instance, the credit agency Schufa announced that it wanted to examine Facebook data in a research project to see whether it could be used for credit checks. This Facebook data is of course not open, but it is nonetheless public. That comes with fears. Is the concern not justified that data mainly benefits those who know how to handle it? Ultimately a minority, even if the data lies in plain view?

Rufus Pollock: One answer is that users should have more control over their own data. I should be able to choose, for example, with whom I share my “social graph,” my relationships on Facebook. Of course, one can always draw inferences, even when someone does not share their data. In this case, the issue is also a lack of consent.

Open data, by contrast, is usually the result of a decision. We should also distinguish between data that is truly open and data that is merely “more shared.” There is a fundamental difference here.

One can distinguish between the legal, technical and political aspects of the debate about open data. Where do you see the biggest problems in opening data as you understand it?

Rufus Pollock: So far, the problem has mainly been getting existing data opened. That takes time and commitment. The technical problems are not huge. They exist, of course, but with a little ingenuity they can be solved.

The legal problems are obviously weighty: it is good that one can decide to open data. But we could also make property-like rights in data less sweeping and far-reaching. If such monopoly rights are really needed, they should be limited in time, just as copyright is limited in time.

It is important, of course, to distinguish between intellectual property rights and personality rights. My right to privacy lasts a lifetime. But the right to control data could and should be much shorter, perhaps a few years at most.

Photo: Calistobreeze, CC BY-NC-SA.

February 22 2013

White House moves to increase public access to scientific research online

Today, the White House responded to a We The People e-petition that asked for free online access to taxpayer-funded research.

As part of the response, John Holdren, the director of the White House Office of Science and Technology Policy, released a memorandum today directing agencies with “more than $100 million in research and development expenditures to develop plans to make the results of federally-funded research publically available free of charge within 12 months after original publication.”

The Obama administration has been considering access to federally funded scientific research for years, including a report to Congress in March 2012. The relevant e-petition, which had gathered more than 65,000 signatures, had gone unanswered since May of last year.

As Hayley Tsukayama notes in the Washington Post, the White House acknowledged the open access policies of the National Institutes of Health as a successful model for sharing research.

“This is a big win for researchers, taxpayers, and everyone who depends on research for new medicines, useful technologies, or effective public policies,” said Peter Suber, Director of the Public Knowledge Open Access Project, in a release. “Assuring public access to non-classified publicly-funded research is a long-standing interest of Public Knowledge, and we thank the Obama Administration for taking this significant step.”

Every federal agency covered by this memorandum will eventually need to “ensure that the public can read, download, and analyze in digital form final peer-reviewed manuscripts or final published documents within a timeframe that is appropriate for each type of research conducted or sponsored by the agency.”

An open government success story?

From the day they were announced, one of the biggest question marks about We The People e-petitions has always been whether the administration would make policy changes or take public stances it had not before on a given issue.

While the memorandum and the potential outcomes from its release come with caveats, from a $100 million threshold to national security or economic competition, this answer from the director of the White House Office of Science and Technology Policy, accompanied by a memorandum directing agencies to make a plan for public access to research, is a substantive outcome.

While there are many reasons to be critical of some open government initiatives, it certainly appears that today, We The People were heard in the halls of government.

An earlier version of this post appears on the Radar Tumblr, including tweets regarding the policy change. Photo Credit: ajc1 on Flickr.


February 21 2013

VA looks to apply innovation to better care and service for veterans

There are few areas as emblematic of a nation’s values as how it treats the veterans of its wars. As improved battlefield care keeps more soldiers alive from injuries that would have been lethal in past wars, more grievously injured veterans survive to come home to the United States.

Upon return, however, the newest veterans face many of the challenges that previous generations have encountered, ranging from re-entering the civilian workforce to rehabilitating broken bodies and treating traumatic brain injuries. As they come home, they are encumbered by more than scars and memories. Their war records are missing. When they apply for benefits, they’re added to a growing backlog of claims at the Department of Veterans Affairs (VA). And even as the raw number of claims grows to nearly 900,000, the average time to process them is also rising. According to Aaron Glantz of the Center for Investigative Reporting, veterans now wait an average of 272 days for their claims to be processed, with some dying in the interim.

While new teams and technologies are being deployed to help with the backlog, a recent report (PDF) from the Office of the Inspector General of the Veterans Administration found that new software deployed around the country that was designed to help reduce the backlog was actually adding to it. While high error rates, disorganization and mishandled claims may be traced to issues with training and implementation of the new systems, the transition from paper-based records to a digital system is proving to be difficult and deeply painful to veterans and families applying for benefits. As Andrew McAfee bluntly put it more than two years ago, these kinds of bureaucratic issues aren’t just a problem to be fixed: “they’re a moral stain on the country.”

Given that context, the launch of a new VA innovation center today takes on a different meaning. The scale and gravity of the problems that the VA faces demand true innovation: new ideas, technology or methodologies that challenge and improve upon existing processes and systems, improving the lives of people or the function of the society that they live within.

“When we set out in 2010 to knowingly adopt the ‘I word’, we did so with the full knowledge that there had to be something there,” said Jonah J. Czerwinski, senior advisor to VA Secretary Eric Shinseki and director of the VA Innovation Initiative, in a recent interview. “We chose to define value around four measurable attributes that mean something to taxpayers, veterans, Congressional delegations and staff: access, quality, cost control and customer satisfaction. The hard part was making it real. We focused for the first year on creating a foundation for what we knew had to justify its own existence, including identifying problem areas.”

The new VA Center for Innovation (VACI) is the descendant of the VA’s Innovation Initiative (VAi2), which was launched in 2010. Along with the VACI, the VA announced that it would adopt an innovation fellows program, following the successful example set by the White House, Department of Health and Human Services and the Consumer Financial Protection Bureau, and bring in an “entrepreneur-in-residence.” The new VACI will back 13 new projects from an industry competition, including improvements to prosthetics, automated sterilization, the Blue Button and cochlear implants. The VA also released a report on the VACI’s progress to date.

“We’re delving into new ways of providing audiology at great distances,” said Czerwinski, “delivering video into the home cheaply, with on-demand care, and the first wearable automatic kidney. Skeptics can judge any innovation endeavor by different measures. The question is whether at the end of the cycle if it’s still relevant.”

The rest of my interview with Czerwinski follows, slightly edited for clarity and content.

Why launch an “innovation center?”

Jonah J. Czerwinski: When we started VAi2, our intent was delving into the projects the secretary charged us with achieving. The secretary has big goals: eliminate homelessness, eliminate backlog, increase access to care.

It’s not enough for an organization to create a VC fund. It’s the way in which we structure ourselves and find compelling new ways of solving problems. We had more ways to do that. The reason why we have a center for innovation is not because we need to start innovating; we have been innovating for decades, at local levels. We’ve been disaggregated in different ways. We may accomplish objectives but the organization as a whole may not benefit.

We have a cultural mission with the center that’s a little more subtle. It’s not just about funding different areas. It’s about changing from a culture where people are incented to manage problems in perpetuity to one in which people are incented to solve problems. It’s not enough to reduce backlog by a percentage point or the number of re-admissions with an infection. How do you reward someone for eliminating something wholesale?

We want our workforce to be part of that objective, to be part of coming up with those ideas. The innovation competition started in 2009 led to 75 ideas to solve problems. We have projects in almost every state now.

How will innovation help with the claims backlog?

Jonah J. Czerwinski: It’s complicated. Tech, laws, people factors, process factors, preferences by parts of interest groups all combine to make this hard. We hear different answers, depending upon the state. The variation is frustrating because it seems unfair. There are process improvements that you can’t solve from a central office. It can’t be solved simply by creating a new claims process. We can’t hire people to do this for us. It is inherently a governmental duty.

We’ve started to wrestle with automation, end-to-end. We have a Fast Track Initiative, where we’re asking how would you take a process, starting with a veteran, and end up with a decision. The insurance industry does this. We’ve hired a company to create the first end-to-end claims process as a prototype. It works enough that it created a new definition for what’s in the realm of the possible. It’s created permission to start revisiting the rules. There’s going to be a better way to automate the claims process.

What’s changed for veterans because of the “Blue Button?”

Jonah J. Czerwinski: There’s a use case where veterans receive care from both the VA and private sector hospitals. That happens about half the time. A non-VA hospital doesn’t have VISTA, our EHR [electronic health record.] If a patient goes there for care, like for an ER visit during a weekend because of congestive heart failure, doctors don’t have the information that we know about the patient at the VA. We can provide it for them without interoperability issues. That’s one direction. It’s also a way to create transparency in quality of care, if the hospital has visibility in your healthcare status.

In terms of continuity of care, when that veteran comes back to a VA hospital, the techs don’t have visibility into what happened at the other hospital. A veteran can download clinical information and bring that back. We now have a level of care between the public and private sector you never had before.

February 14 2013

Information mining: From the quarry of science

Large volumes of research data are analyzed by machine using “information mining” techniques in order to discover new statistical patterns. This raises technical and legal questions and problems: PDF files are poor ground for digging for data. Should information mining in science be generally permitted, or regulated through licensing?

In quite a few scientific disciplines, the boundaries between text and software are blurring, for instance when one considers living documents that are subject to updates, or moves to developing texts collaboratively in environments such as GitHub or Figshare. If texts are viewed as a kind of compiled software, their source code, the research data, deserves attention too. As Jenny Molloy of the Open Knowledge Foundation sums it up: “Science is built on data.”

Text publications document the creation of a state of knowledge, which is rewarded in the form of citations or project grants. The underlying data often remains hidden, unless it is made available as open access. That carries certain risks: scientists who made no contribution to collecting the data could analyze it and become competitors to the original data producers.

On the other hand, open accessibility multiplies scientific progress and the exploitation of the data, since countless scientists can analyze it. This crowd component of data use is complemented by the technical possibilities of data mining. Digitally available research data is analyzed automatically and with computer support, whether it consists of data series, tables, graphs, audio and video files, software or texts.

Discovering patterns in mountains of data

Digital availability and machine analysis mark the rise of data-driven science, which identifies statistical patterns in seemingly endless data and then seeks to explain them scientifically. This approach complements traditional theory- and hypothesis-driven science, which starts from theories, derives hypotheses, designs survey instruments, then collects data and subsequently analyzes it.

To exploit the possibilities of the new methods, however, the data should be openly available. This is what the Panton Principles call for, for example: research data may be openly used, analyzed and redistributed in every possible way, as long as the data producers are credited. Even these conditions fall away when the results are released into the public domain.

Poking around in PDF files

In practice, research data is sometimes available, whether only to subscribers of scientific journals or to everyone, but it is not necessarily open. Not legally, because even information in readable formats is far from always under a license that explicitly permits data mining. Nor technically, because data is often found in sealed PDF files that cannot be analyzed by machine. The open science community pointedly describes this with an analogy: extracting data from a PDF file is like trying to turn a hamburger back into a cow.

Commercial actors whose business model rests on making information scarce are positioning themselves against text and data mining. In a consultation (PDF) by the Intellectual Property Office in the United Kingdom, numerous such information providers spoke out against a blanket permission for content mining of copyrighted material for scientific purposes, even when a researcher’s institution is allowed to access the content by subscription and even though the research results were produced with public money.

Some of the information providers proposed regulating access through licenses, which, following the traditional business model, would presumably come at a cost. The chemist Peter Murray-Rust, for instance, was granted permission by one publisher to text-mine publications after two years of tough negotiation, but only on condition that the rights to the results would go to the publisher and the results would not be made publicly accessible.

The benefits of openness

From an economic perspective, however, data and text mining have enormous potential: according to a McKinsey study, their application in science could bring the European economy 250 billion euros in added value per year. But that presupposes that information is openly available, because excluding commercial use of data prevents new services and products from being developed.

Murray-Rust, for instance, developed techniques for data mining of crystallographic data whose results can be very fruitful for creating new pharmaceutical agents. If commercial exploitation of the analyzed data is not permitted, pharmaceutical companies will shy away from using it. Not least, text and data mining also enables more efficient information retrieval, for instance when recommendation services analyze material and suggest relevant data or texts to researchers, cutting short laborious searches.

Ulrich Herb is the editor of the freely available anthology “Open Initiatives: Offenheit in der digitalen Welt und Wissenschaft,” a doctoral candidate working on “Open Social Science,” an open access expert at Saarland University, and a freelance science and publishing consultant. Photo: Flickr/Born1945, CC BY-SA.

February 13 2013

Personal data ownership drives market transparency and empowers consumers

On Monday morning, the Obama administration launched a new community focused on consumer data. While there was no new data to be found among the 507 datasets listed there, it was the first time that smart disclosure had an official home in the federal government.


“Smart disclosure means transparent, plain language, comprehensive, synthesis and analysis of data that helps consumers make better-informed decisions,” said Christopher Meyer, the vice president for external affairs and information services at Consumers Union, the nonprofit that publishes “Consumer Reports,” in an interview. “The Obama administration deserves credit for championing agency disclosure of data sets and pulling it together into one web site. The best outcome will be widespread consumer use of the tools — and that remains to be seen.”

The new community can be reached at two URLs. Both forward visitors to the same landing page, where they can explore the data, past challenges and external resources on the topic, in addition to a page about smart disclosure, blog posts, forums and feedback.

“Analyzing data and giving plain language understanding of that data to consumers is a critical part of what Consumer Reports does,” said Meyer. “Having hundreds of data sets available on one (hopefully) easy-to-use platform will enable us to provide even more useful information to consumers at a time when family budgets are tight and health care and financial ‘choices’ have never been more plentiful.”

The newest community brings the total number of communities to 16. A survey of the existing communities didn’t turn up much recent activity in the forums or blogs, although the health care community has more signs of life than others, and there are ongoing challenges associated with many different topics.

Another side of open?

Smart disclosure is one of the 17 initiatives that the U.S. committed to as part of the National Action Plan for the Open Government Partnership.

“We’ve developed new tools — called ‘smart disclosures’ — so that the data we make public can help people make health care choices, help small businesses innovate, and help scientists achieve new breakthroughs,” said President Obama, speaking at the launch of the Open Government Partnership in New York City in September 2011. “We’ve been promoting greater disclosure of government information, empowering citizens with new ways to participate in their democracy. We are releasing more data in usable forms on health and safety and the environment, because information is power, and helping people make informed decisions and entrepreneurs turn data into new products, they create new jobs.”

In the months since, the Obama administration has been promoting the use of smart disclosure across federal government through a task force (PDF), working to embed the practice as part of the ways that agencies deliver on consumer policy. The United Kingdom’s “Midata” initiative is an important smart disclosure case study outside of the United States.

In 2012, the U.S. Treasury Department launched a finance data community, joining open data initiatives in health care, energy, education, development and safety.

“I think you have to say that what has been accomplished so far is mostly [that] the release of government data has spawned a new generation of apps,” said Richard Thaler, professor of behavioral science and economics at the University of Chicago, in an interview. “This has been a win-win for business and consumers. New businesses are created to utilize the now available government data, and consumers now know when the next bus will arrive. The next step will be to get the private sector data into the picture — but that is only the bright future at this stage, rather than something that has already been accomplished. It is great that the government has led the way in releasing data, since it will give them more credibility when they ask private companies to do the same.”

Open data as catalyst?

While their business or organizational goals for data usage may diverge, consumer advocates, entrepreneurs and media are all looking for more insight into what’s actually happening in marketplaces for goods and services.

“Data releases are critical,” said Meyer. “First, even raw, less consumer-friendly data can help change government and industry behavior when it is published. Second, sunlight truly is the best disinfectant. We believe government and industry want to do right by consumers. Scrutiny of data makes the next iteration better, whether it’s produced by the government or a hospital.”

What will make these kinds of disclosures “smart?” When they involve timely, regular release of personal data in standardized, machine readable formats. When data is more liquid, it can easily be ingested by entrepreneurs and developers to be used in tools and services to help people to make more informed decisions as they navigate marketplaces for finance, health care, energy, education or other areas.

“We use government datasets a great deal in the health care space,” said Meyer. “We use CMS ‘Hospital Compare’ data to publish ratings on patient experience and re-admissions. To develop ratings of preventive services for heart disease, we rely on the U.S. Preventive Services Task Force.”

The stories of Brightscope and Panjiva are instructive: both startups had to invest significant time, money and engineering talent in acquiring and cleaning up government data before they could put it to work adding transparency to supply chains or financial advisers.

“It’s cliche, but true – knowledge is power,” said Yaron Samid, the CEO of BillGuard, in an interview. “In BillGuard’s case, when we inform consumers about a charge on their credit bill that was disputed by thousands of other consumers or a known grey charge merchant before they shop, it empowers them to make active choices in protecting their money – and spending it, penny for penny, how they choose and explicitly authorize. The release and cross-sector collaboration of billing dispute data will empower consumers and help put an end to deceptive sales and billing practices, the same way crowdsourced “mark as spam” data did for the anti-spam industry.”

What tools exist for smart disclosure today?

If you look through the tools and services listed in the new community, quite a few of the examples use smart disclosure. When they solve knotty problems, such consumer-facing products or services have the potential to scale massively and quickly.

As Meyer pointed out in our interview, however, which ones catch on is still an open question.

“We are still in the nascent stage of identifying many smart disclosure outcomes that have benefited consumers in a practical way,” he said. “Where we can see demonstrable progress is the government’s acknowledgement that freeing the data is the first and most necessary step to giving private sector innovators opportunity to move the marketplace in a pro-consumer direction.”

The difference between open data on a government website and data put to work where consumers are making decisions, however, is significant.

“‘Freeing the data’ is just the first step,” said Meyer. “It has to be organized in a consumer-friendly format. That means a much more intense effort by the government to understand what consumers want and how they can best absorb the data. Consumer Reports and its policy and action arm, Consumers Union, have spent an enormous amount of time trying to get federal and state governments and private health providers to release information about hospital-acquired infections in order to prevent medical harms that kill 100,000 people a year. We’re making progress with government agencies, although we have a long way to go.”

There has already been some movement in sectors where consumers are used to downloading data, like banking. For instance, BillShrink and Hello Wallet use government and private sector data to help people to make better consumer finance decisions. OPower combines energy efficiency data from appliances and government data on energy usage and weather to produce personalized advice on how to save money on energy bills. BillGuard analyzes millions of billing disputes to find “grey charge” patterns on credit cards and debit cards. (Disclosure: Tim O’Reilly is on BillGuard’s Advisory Board and is a shareholder in the startup.)

“To get an idea of the potential here, think about what has happened to the travel agent business,” said Thaler. “That industry has essentially been replaced by websites serving as choice engines. While this has been a loss to those who used to be travel agents, I think most consumers feel they are better served by being able to search the various travel and lodging options via the Internet. When it comes to choosing a calling plan or a credit card, it is very difficult to get the necessary data, either on prices or on one’s own utilization, to make a good choice. The same is true for mortgages. If we can make the underlying data available, we can help consumers make much better choices in these and other domains, and at the same time make these industries more competitive and transparent. There are similar opportunities in education, especially in the post-high school, for-profit sector.”

Recent data releases have the potential to create new insights into previously opaque markets.

“There are also citizen complaint registries that have been created either by statute (Consumer Product Safety Improvement Act of 2008) or by federal agencies, like the Consumer Financial Protection Bureau (CFPB). [These registries] will create rich datasets that industry can use to improve their products and consumer advocates can analyze to point out where the marketplace hasn’t worked,” said Meyer.

In 2012, the CFPB, in fact, began publishing a new database online. As was the case with the Consumer Product Safety Commission in 2011, the consumer complaint database did not go online without industry opposition, as Suzy Khimm reported in her feature story on the CFPB. That said, the CFPB has been making consumer complaints available to the public online since last June.

That data is now being consumed by BillGuard, enabling more consumers to derive benefit that might not have been available otherwise.

“The CFPB has made their consumer complaint database open to the public,” said Samid. “Billing disputes are the No. 1 complaint category for credit cards. We also source consumer complaint data from the web and anonymized billing disputes directly from banks. We are working with other government agencies to share our findings about grey charges, but cannot disclose those relationships just yet.”

“Choice engines” for an open data economy

Many services in this emerging class use multiple datasets to provide consumers with insight into their choices. For instance, reviews and experiences of prior customers can be mashed up with regulatory data from government agencies, including complaints. Data from patient reviews could power health care startups. The integration of food inspection data into Yelp will give consumers more insights into dining decisions. Trulia and Zillow suggest another direction for government data use, as seen in real estate.

If these early examples are any guide, there’s an interesting role for consumer policy makers and regulators to play: open data stewards and suppliers. Given that the release of such data has an effect on the market for products and services, expect more companies in affected industries to resist such initiatives, much in the same way that the CPSC and CFPB databases were opposed by industry. Such resistance may be subtle, where government data collection is portrayed as part of a regulator’s mission but its release into the marketplace is undermined.

Nonetheless, smart disclosure taps into larger trends, in particular “personal data ownership” and consumer empowerment. The growth of an energy usage management sector and participatory health care show how personal data can be used, once acquired. The use of behavioral science in combination with such data is of great interest to business interests and should attract the attention of policy makers, legislators and regulators.

After all, convening and pursuing smart disclosure initiatives puts government in an interesting role. If government agencies or private companies then choose to apply behavioral economics in programs or policies, with an eye on improving health or financial well-being, how should the policies themselves be disclosed? What principles matter?

“The guideline I suggest is that if a firm is keeping track of your usage and purchases, then you should be able to get access to that data in a machine-readable, standardized format that, with one click, you could upload to a search engine website,” said Thaler. “As for the proper balance, I am proposing only that consumers have access to their raw purchase history, not proprietary inferences the firm may have drawn. To give an example, you should have a right to download the list of all the movies you have rented from Netflix, but not the conclusions they have reached about what sort of movies you might also like. Also, any policy like this should begin with larger firms that already have sophisticated information systems keeping track of consumer data. For those firms, the costs of providing the data to their consumers should be minor.”

Given the growth of student loans, more transparency and understanding for higher education choices is needed. For that to happen, prospective students will need more access to their own personal data to build the profile that they can then use to get personalized recommendations about education, along with data from higher education institutions, including outcomes for different kinds of students, from graduation rates to job placement.

Disclosures of data regarding outcomes can have other effects as well.

“I referenced the hospital-acquired infection battle earlier,” said Meyer. “In 1999, the Institute of Medicine released a study, ‘To Err Is Human,’ that showed tens of thousands of consumers were dying because of preventable medical harms. Consumers Union started a campaign in 2003 to reduce the number of deaths due to hospital-acquired infections. Our plan was to get laws passed in states that required disclosure of infections. We have helped get laws passed in 30 states, which is great, but getting the states to comply with useful data has been difficult. We’re starting to see progress in reducing infections but it’s taken a long time.”

This post is part of our ongoing investigation into the open data economy.

February 08 2013

The federal data portal: Prussia on the internet

The German federal government is trying its hand at “open data,” yet instead of being praised it is confronted with a protest website. How can that be? It comes down to terminology, rights, communities and technology. An attempt at an explanation.

The federal and state governments are getting a data portal whose name lost the word “Open” a few days before the official launch. Various initiatives, organizations and activists are voicing their outrage in an open letter to the Federal Ministry of the Interior. The signatories want, at the very least, not to serve as a fig leaf for a failing approach. That was a real risk, because those responsible for the new portal, “Govdata – Das Datenportal für Deutschland” (Govdata: The Data Portal for Germany), have been trying to involve the portal’s potential users in the planning through several so-called “Community Workshops.” Nobody criticizes this approach in principle; rather, it should be a matter of course for government projects of this kind. But the fact that the community that was brought in is now explicitly distancing itself shows that the approach has at least partly failed in this case.

The protest is sparked mainly by three points:

  • First, with Govdata Deutschland the federal and state governments do not clearly commit to the standards and definitions of “open” that have long been recognized internationally among open data activists. Instead, they are creating a national stand-alone solution and, in the view of the protest letter’s signatories, setting a bad example.
  • Second, it is left to the agencies themselves whether they contribute data to the portal and, if so, which uses of the data they permit.
  • Third, critics consider the data available through the portal so far to be “snooze data” (“Schnarchdaten”) that hardly anyone wants. This, they say, results from non-binding or misplaced priorities about which data should be made available. It is obvious that the locations of dog waste bins are less interesting than the energy consumption of public facilities. Taken together, the open letter carries the accusation that German politics is indulging in half-hearted steps and merely feigning a willingness to innovate.

So “Open” no longer appears in the portal’s name. That at least spares the federal government the criticism that it is engaging in false labeling and watering down the term “open data.” This already points to a very fundamental difference of opinion between the community and the federal government. That it was discovered so late is due mainly to something very mundane: time pressure.

Informed sources say that Cebit 2013 dictated the schedule. In order to have something to present in the direction of “Open Government Data” by then, a study was commissioned from the Fraunhofer Institute for Open Communication Systems (FOKUS), which reportedly was given far too little time to complete. The technical implementation of the study’s recommendations proceeded just as hastily, and even at that stage those recommendations paid too little attention to the meaning of the sensitive term “open data.” All of this happened, and is still happening, under the leadership of the Federal Ministry of the Interior, but in constant coordination with a diversely staffed federal-state working group. Which points to the next cause of the dispute.

Red tape 2.0 instead of political direction

The sluggishness of such a coordination process is obvious. That it tends toward lukewarm results is a matter of experience. Both could only have been mitigated by functional leadership at the federal level, equipped with the corresponding political authority. But there is none. In the federal system, the federal government has no legal power to “govern straight through” on matters of public data, nor does the federal government apparently have the political will to at least take on a strong bellwether role beyond the division of powers.

In the states, different ministries are responsible, in one case even the agriculture ministry. As a result, staff at the working level of the Federal Ministry of the Interior do somehow want “open data,” but they are always mere petitioners to the heads of agencies, who, lacking a clear political directive from their respective state governments, feel bound above all by their own standards. And the terms of use for Govdata look accordingly: their two variants of a “Datenlizenz Deutschland” (Data License Germany) are at the center of the protesters’ criticism.

No standards, no open, it’s that simple

The signatories of the open letter insist on standards. Besides a few standard license models, these include above all the definitions and principles curated by various non-governmental actors. None of these sources is binding in any way or embedded in supranational structures or agreements. Nobody holds worldwide trademark rights to the term “open data” and can thereby dictate which data are truly “open.” But at least the rules have so far always been arrived at by consensus among a large number of participants worldwide.

When it comes to the release of data held by public bodies, known as “Public Sector Information” (PSI), the USA is cited again and again as the prime example of a state that is very generous with data. When in doubt, it is said, government data there can be viewed and reused by anyone, with classified material being the exception. The American data portal is regarded as the great model for making public data available electronically, followed by its British counterpart. Unfortunately, public administration in the Prussian mold works differently.

Administrative tradition, given an electronic turn

Before the Freedom of Information Act (Informationsfreiheitsgesetz, IFG) was passed at the federal level, along with corresponding state laws, the rule was clear: when in doubt, administrative proceedings are subject to official secrecy. Anyone who wanted to inspect files had to demonstrate a special entitlement. Even today, many agencies try by hook or by crook to obstruct information requests under the IFG and comparable laws by interpreting the exemptions creatively. Initiatives such as “Frag den Staat” use modern means to work against this.

Critics therefore fear that many German agencies, if they release the really interesting data at all, will choose the option under which only non-commercial use of their data is permitted. But that license contradicts the widely recognized principle for open data that the data must also be reusable commercially. In fact, it is even worse.

Admittedly, the data portals of other countries are not beyond reproach either, measured against the pure doctrine of “open data.” But Govdata, following the Fraunhofer study, takes a particularly idiosyncratic path. Instead of governing the use of the data by contract, as all standard license models in the open content world from Wikipedia to Linux do, it prefers to rely on German administrative law, thereby creating another cause of the dispute that has flared up. The problem is hard to convey to legal laypeople; here is an attempt.

Facts are free, except when German agencies hold them

Anyone who deals in any way with so-called “intellectual property” can usually rely on one principle: ideas and facts are free. They are simply not “protectable,” meaning they are not covered by the automatic protection of copyright. This principle is not all-encompassing: anyone set on it can have technical ideas protected by patents, within narrow limits. European database protection blankets large parts of the data landscape with a problematic, indirect form of investment protection. With its plan for an ancillary copyright for press publishers, the federal government is currently even in the process of making the tiniest text snippets subject to licensing. But a single fact, a plain piece of information such as the maximum daytime temperature in Berlin on Christmas Eve 2012, can be used freely, even if it is contained in a database that is otherwise protected as a whole. Yet anyone who relies on this principle has not reckoned with the seemingly unlimited possibilities of German administrative law.

The Copyright Act is part of civil law, not administrative law. All those who have to operate within the framework of civil law (private individuals and companies, for example) have to make do with the tools that civil law offers them. The Copyright Act does not provide for a protective right over individual facts, so there is no suitable tool for exercising legally enforceable control over individual facts. So far, so clear. Government bodies, however, can choose a different framework instead of the civil law one, namely that of administrative law, within which they can, in a sense, build their own legal tools. It works like this:

Unless an agency is legally obliged to act in a particular way, it can set almost any rules it likes for everything it does. These rules are then laid down in a so-called “administrative act” (Verwaltungsakt) and enforced by means of official “administrative coercion” (Verwaltungszwang). An administrative act therefore works like a law in miniature. Terms of use can also be defined in this form. The principle mentioned above, that facts are free and reusable once you have them, can be circumvented this way. Govdata is taking exactly this route in order to let agencies control the use of the data they provide in even finer detail than civil law would allow. Right down to the individual fact. Whether this control is actually meant to be exercised is something that probably not even the members of the federal-state working group are clear about. That alone would be an argument against providing for it at all.

Data projects half illegal, if need be?

All of this makes it understandable why the current debate is being conducted at such a fundamental level. Not only is Germany creating its own data rules, which have to be observed additionally whenever its data are mixed with data from elsewhere. That increases the effort for data projects and widens the legal gray areas. On top of that, the reach of the data rules is being pushed beyond the familiar limits of civil law. That this will noticeably dampen the open data community is possible, but by no means certain. Some projects will simply give up thinking about the legal questions around their work. And just get started. If there is trouble, that can also be a welcome marketing boost. Nevertheless, the current dispute could and should have been avoided. It makes the launch of Govdata look like a missed opportunity.

First published in abridged form on the Data Blog at Zeit Online. This text is published under the Creative Commons Attribution 3.0 Germany license (Namensnennung 3.0 de). John Hendrik Weitzmann is one of the signatories of


February 07 2013

Looking at the many faces and forms of data journalism

Over the past year, I’ve been investigating data journalism. In that work, I’ve found no better source for understanding the who, where, what, how and why of what’s happening in this area than the journalists who are using and even building the tools needed to make sense of the exabyte age. Yesterday, I hosted a Google Hangout with several notable practitioners of data journalism. Video of the discussion is embedded below:

Over the course of the discussion, we talked about what data journalism is, how journalists are using it, the importance of storytelling, ethics, the role of open source and “showing your work” and much more.


Guests on the hangout included:


Here are just a few of the sites, services and projects we discussed:

In addition, you can see more of our data journalism coverage here.

February 05 2013

Investing in the open data economy

If you had 10 million pounds to spend on open data research, development and startups, what would you do with it? That’s precisely the opportunity that Gavin Starks (@AgentGav) has been given as the first CEO of the Open Data Institute (ODI) in the United Kingdom.

The ODI, which officially opened last September, was founded by Sir Tim Berners-Lee and Professor Nigel Shadbolt. The independent, non-partisan, “limited by guarantee” nonprofit is a hybrid institution focused on unlocking the value in open data by incubating startups, advising governments, and educating students and media.

Previously, Starks was the founder and chairman of AMEE, a social enterprise that scored environmental costs and risks for businesses. (O’Reilly’s AlphaTech Ventures was one of its funders.) He’s also worked in the arts, science and technology. I spoke to Starks about the work of the ODI and open data earlier this winter as part of our continuing series investigating the open data economy.

What have you accomplished to date?

Gavin Starks: We opened our offices on the first of October last year. Over the first 12 weeks of operation, we’ve had a phenomenal run. The ODI is looking to create value to help everyone address some of the greatest challenges of our time, whether that’s in education, health, in our economy or to benefit our environment.

Since October, we’ve had literally hundreds of people through the door. We’ve secured $750,000 in matched funding from the Omidyar Network, on top of a 10-million-pound investment from the UK Government’s Technology Strategy Board. We’ve helped identify 200 million pounds a year in savings for the health service in the UK.

200 million pounds? What do you base that estimate upon?

Gavin Starks: Part of our remit is to bring together the main experts from different areas. To illustrate the kind of benefit that I think we can bring here, one part of what we’re doing is to try and unlock data supply.

The Health Service in the UK started to release a lot of its prescription information as open data about nine months ago. We worked with some of the main experts in the health service and with a big data analytics firm, Mastodon C, which is a startup that we’re incubating at the ODI.

Together, they identified potential areas of investigation. The data science folks drilled into every single prescription. (I think the dataset was something like 47 million rows of data.) What they were looking at there was the difference between proprietary drugs and generics, where there may be a generic equivalent. In many cases, there is no clinical difference between the generic equivalent and the proprietary drug, yet the cost difference is huge. It might range from 81 pence for a generic to more than 20 pounds for a drug that’s still under license.

Looking at the entire dataset, the analytics revealed different patterns, and from that, cost differences. If we had carried out this research a year earlier, for example, we could have saved 200 million pounds over the last year. It really is quite significant. That’s on one class of drugs, on one area. We think this research could be repeated against different classes of drugs and replicated internationally.
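To make the arithmetic behind that kind of estimate concrete, here is a minimal sketch in Python. It is not the ODI's or Mastodon C's method: the `Prescription` record, the region labels and the four sample rows are invented for illustration, and the only figures borrowed from the interview are the 81 pence and 20 pound prices Starks mentions.

```python
from dataclasses import dataclass


@dataclass
class Prescription:
    region: str        # hypothetical prescribing area label
    proprietary: bool  # True if a branded drug was dispensed
    items: int         # number of items dispensed
    unit_cost: float   # cost per item, in pounds


# Assumed generic price, borrowed from the 81p figure quoted in the interview.
GENERIC_UNIT_COST = 0.81


def potential_savings(rows):
    """Sum what would be saved if every proprietary item cost the generic price."""
    return sum(
        r.items * (r.unit_cost - GENERIC_UNIT_COST)
        for r in rows
        if r.proprietary and r.unit_cost > GENERIC_UNIT_COST
    )


def proprietary_share(rows):
    """Return, per region, the fraction of dispensed items that were proprietary."""
    totals, branded = {}, {}
    for r in rows:
        totals[r.region] = totals.get(r.region, 0) + r.items
        if r.proprietary:
            branded[r.region] = branded.get(r.region, 0) + r.items
    return {region: branded.get(region, 0) / n for region, n in totals.items()}


# Invented sample data, not real prescribing figures.
rows = [
    Prescription("A", True, 100, 20.00),
    Prescription("A", False, 900, 0.81),
    Prescription("B", True, 500, 20.00),
    Prescription("B", False, 500, 0.81),
]

print(round(potential_savings(rows), 2))
print(proprietary_share(rows))
```

The per-region share is the sort of figure a map of proprietary prescribing would plot, while the savings total is simply the price gap multiplied by the number of items that could have been switched.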

UK statin map screenshot: percentage of proprietary statin prescribing by CCG, Sept 2011 – May 2012.

Which open data business models are the most exciting to you?

Gavin Starks: I think there’s lots of different areas to explore here. There are areas where there can be cost savings brought to any organization, whether it’s public sector or private sector organizations. There’s also areas of new innovation. (I think that they’re quite different directions.) Some of the work that we’ve done with the prescription data, that’s where you’re looking at efficiencies.

We’ve got other startups that are based in our offices here in Shoreditch in London that are looking at transportation information. They’re looking at location-based services and other forms of analytics within the commercial sectors: financial information, credit ratings, those kinds of areas. When you start to pull together different levels of open data that have been available but haven’t been that accessible in the past, there’s new services that can be derived from them.

What creates a paid or value-add service? It’s essential that we create a starting point where free and open access to the data itself can be made available for certain use cases for as many people as possible. There, you stimulate innovation if you can gain access to discern new insight from that data.

Having the data aggregated, structured and accessible in an automated way is worth paying for. There could be a service-level-agreement-based model. There could be a carve-out of use cases. You could borrow from the Creative Commons world and say, “If you’re going to have a share alike license on this, then that’s fine, you can use it for free. But if you’re going to start creating closed assets, as a result, there may be a charge for the use of data at that point.”

I think there’s a whole range of different data models, but really, the goal here is to try and discern what insight can be derived from existing datasets and what happens when you start mashing them up with other datasets.

What are the greatest challenges to achieving some of the economic outcomes that the UK Cabinet Office has described?

Gavin Starks: I think there are many challenges. One of the key ones is just understanding. One challenge we’ve heard consistently from pretty much everybody has been, “We believe there’s a huge amount of potential here, but where do we start?”

Part of the ODI’s mission is to provide training, education and assets that enable people to begin on that journey. We’re in the process right now of designing our first dozen or so training programs. We’re working at one level with the World Bank to train the world’s politicians and national leaders, and we’re working at the other end with schools to create programs that fit with existing graduate courses.

Education is one of the biggest challenges. We want to train more than technologists — we also want to train lawyers and journalists about the business case to enable people to understand and move forward at the same pace. There’s very little point in just saying, “There is huge value here,” without trying to demonstrate that return on investment (ROI) and value case at the same time.

What is the ODI’s approach to incubating civic startups?

Gavin Starks: There are two parts to it. One is unlocking supply. We’re working with different government departments and public sector agencies to help them understand what unlocking supply means. Creating structured, addressable, repeatable data creates the supply piece so that you can actually start to build a business. It’s very high-risk to try and build a business when you don’t have a guarantee of supply.

Two, encouraging and incubating the demand side. We’ve got six startups in our space already. They’re all at different stages. Some of them are very early on, just trying to navigate toward the value here that we can discern from the data. Others are more mature, and maybe have some existing revenue streams, but they’re looking at how to really make this scale.

What we’ve found is of benefit so far — and again, we’re only three months in — is our ability to network and convene the different stakeholders. We can take a small startup and get them in front of one of the large corporations and help them bridge that sales process. Helping them communicate their ideas in a clear way, where the value is obvious to the end customer, is important.

What are some of the approaches that have worked to unlock value from open government data?

Gavin Starks: We’re not believers in “If you build it, they will come.” You need to create a guaranteed data supply, but you also need to really engage with people to start to unlock ideas.

We’ve been running our own hackathons, but I think there’s a difference in the way that we’ve structured them and organized them. We include domain experts and frame the hack events around a specific problem or a specific set of problems. For example, we had a weekend-long hackathon in the health space, looking at different datasets, convening domain experts and technical experts.

It involved competitions, where the winner gets a seat at the ODI to take their idea forward. It might be that an idea turns into a business, it might turn into a project, or it might just turn into a research program.

I think that you need to really lead people by the hand through the process of innovation, helping them and supporting them to unlock the value, rather than just having the datasets there and expecting them to be used.

Given the cost the UK’s National Audit Office ascribed to opening data, is the investment worth it?

Gavin Starks: This is like the early days of the web. There are lots of estimates about how much everything is going to be worth and what sort of ROI people are going to see. The honest answer, I think, is that we’ve yet to see.

The reason I’m very excited about this area is that I see the same potential as I saw in the mid-1990s, when I got involved with the web. The same patterns exist today. There are new datasets and ecosystems coming into existence that can be data-mined. They can be joined together in novel ways. They can bridge the virtual and physical worlds. They can bring together people who have not been able to collaborate in different ways.

There’s a huge amount of value to be unlocked. There will be some dead ends, as we had in the web’s development, but there will be some incredible wins. We’re trying to refine our own skills around identifying where those potential hot spots might be.

Health services is an area where it’s really obvious there’s a lot of benefits. There are clear benefits from opening up transportation and location-based services. You can see the potential behind energy efficiency, creating efficient supply chains and opening up more information around education.

You can see resonant points. We’re really drilling into those and asking, “What happens when you really put together the right domain experts and the supportive backers?”

Those backers can be financial as well as in industry. The Open Data Institute has been pulling together those experts and providing a neutral space for that innovation to happen.

Which of those areas have the most clear economic value, in terms of creating shorter term returns on investment and quick wins?

Gavin Starks: I don’t think there’s a single answer to that question. If you look at location-based services, corporate data, health data or education, there are examples and use cases in different parts of the world where they will have different weightings.

If you were looking at water sanitation in areas of the world where there is an absence of it, then that may provide a more immediate return than unlocking huge amounts of transportation information.

In Denmark, look at the release of the equivalent of zip code data and the more detailed addresses. I believe the numbers there went from four-fold return to 17-fold return, in terms of value to the country of their investment in decent address-level data.

This is one area on which we’ve provided a consultation response in the UK. I think it may vary from state-to-state in the U.S., or maybe in areas where the specific focus on health would be very beneficial. There may be areas where a focus on energy efficiency may be most beneficial.

What conditions lead to beneficial outcomes for open data?

Gavin Starks: A lot of the real issues are not really about the technology. When it comes to the technology, we know what a lot of the solutions are. How can we address or improve the data quality? What standards need to exist? What anonymity, privacy or secrecy needs to exist around the data? How do we really measure the outcomes? What are the circumstances where stakeholders need to get involved?

You definitely need political buy-in, but there also needs to be a sense of what the data landscape is. What’s the inventory? What’s the legal situation? Who has access? What kind of access is required? What does success look like against a particular use case?

You could be looking at health in somewhere like Rwanda, you could be looking at a national statistics office in a particular country where they may not have access to the data themselves, and they don’t have very much access to resources. You could be looking at contracting, government procurement and improving simple accountability, where there may be more information flow than there is around energy data, for example.

I think there’s a range of different use cases that we need to really explore here. We’re looking for great use cases where we can say, “This is something that’s simple to achieve, that’s repeatable, that helps lower costs and stimulate innovation.”

We are really at the beginning of a journey here.

Red Hat made headlines for becoming the first billion-dollar open source company. What do you think the first billion-dollar open data company will be?

Gavin Starks: It would be not unlikely for that to be in the health arena.

This interview has been edited and condensed for clarity. This post is part of our ongoing investigation into the open data economy.


February 01 2013

Collaboration for open scientific data

Scientific organizations worldwide are discussing open access to research data.


January 31 2013

NASA launches second International Space Apps Challenge

From April 20 to April 21, the weekend before Earth Day, the second international Space Apps Challenge will invite developers on all seven continents to the bridge to contribute code to NASA projects.

Given longstanding concerns about the sustainability of apps contests, I was curious about NASA’s thinking behind launching this challenge. When I asked NASA’s open government team about the work, I immediately heard back from Nick Skytland (@Skytland), who heads up NASA’s open innovation team.

“The International Space Apps Challenge was a different approach from other federal government ‘app contests’ held before,” replied Skytland, via email.

“Instead of incentivizing technology development through open data and a prize purse, we sought to create a unique platform for international technological cooperation through a weekend-long event hosted in multiple locations across the world. We didn’t just focus on developing software apps, but actually included open hardware, citizen science, and data visualization as well.”

Aspects of that answer will please many open data advocates, like Clay Johnson or David Eaves. When Eaves recently looked at apps contests, in the context of his work on Open Data Day (coming up on February 23rd), he emphasized the importance of events that build community and applications that meet the needs of citizens or respond to business demand.

The rest of my email interview with Skytland follows.

Why is the International Space Apps Challenge worth doing again?

Nick Skytland: We see the International Space Apps Challenge event as a valuable platform for the Agency because it:

  • Creates new technologies and approaches that can solve some of the key challenges of space exploration, as well as making current efforts more cost-effective.
  • Uses open data and technology to address global needs to improve life on Earth and in space.
  • Demonstrates our commitment to the principles of the Open Government Partnership in a concrete way.

What were the results from the first challenge?

Nick Skytland: More than 100 unique open-source solutions were developed in less than 48 hours.

There were six winning apps, but the real “result” of the challenge was a 2,000+ person community engaged in and excited about space exploration, ready to apply that experience to challenges identified by the agency at relatively low cost and on a short timeline.

How does this challenge contribute to NASA’s mission?

Nick Skytland: There were many direct benefits. The first International Space Apps Challenge offered seven challenges specific to satellite hardware and payloads, including submissions from at least two commercial organizations. These challenges received multiple solutions in the areas of satellite tracking, suborbital payloads, command and control systems, and leveraging commercial smartphone technology for orbital remote sensing.

Additionally, a large focus of the Space Apps Challenge is on citizen innovation in the commercial space sector, lowering the cost and barriers to space so that it becomes easier to enter the market. By focusing on citizen entrepreneurship, Space Apps enables NASA to be deeply involved with the quickly emerging space startup culture. The event was extremely helpful in encouraging the collection and dissemination of space-derived data.

As you know, we have amazing open data. Space Apps is a key opportunity for us to continue to open new data sources and invite citizens to use them. Space Apps also encouraged the development of new technologies and new industries, like the space-based 3D printing industry and open-source ROVs (remotely operated submersibles for underwater exploration).

How much of the code from more than 200 “solutions” is still in use?

Nick Skytland: We didn’t track this last time around, but almost all (if not all) of the code is still available online, many of the projects continued on well after the event, and some teams continue to work on their projects today. The best example of this is the Pineapple Project, which participated in numerous other hackathons after the 2012 International Space Apps Challenge and just recently was accepted into the Geeks Without Borders accelerator program.

Of the 71 challenges that were offered last year, a low percentage were NASA challenges — about 13, if I recall correctly. There are many reasons for this, mostly that cultural adoption of open government philosophies within government is just slow. What last year did for us is lay the groundwork. Now we have much more buy-in and interest in what can be done. This year, our challenges from NASA are much more mission-focused and relevant to needs program managers have within the agency.

Additionally, many of the externally submitted challenges we have come from other agencies who are interested in using space apps as a platform to address needs they have. Most notably, we recently worked with the Peace Corps on the Innovation Challenge they offered at RHoK in December 2012, with great results.

The International Space Apps Challenge was a way for us not only to move forward technology development, drawing on the talents and initiative of bright-minded developers, engineers, and technologists, but also a platform to actually engage people who have a passion and desire to make an immediate impact on the world.

What’s new in 2013?

Nick Skytland: Our goal for this year is to improve the platform, create an even better engagement experience, and focus the collective talents of people around the world on developing technological solutions that are relevant and immediately useful.

We have a high level of internal buy-in at NASA and a lot of participation outside NASA, from both other government organizations and local leads in many new locations. Fortunately, this means we can focus our efforts on making this a meaningful event, and we are well ahead of the curve in terms of planning to do this.

To date, 44 locations have confirmed their participation and we have six spots remaining, although four of these are reserved as placeholders for cities we are pursuing. We have 50 challenge ideas already drafted for the event, 25 of which come directly from NASA. We will be releasing the entire list of challenges around March 15th on

We have 55 organizations so far that are supporting the event, including seven other U.S. government organizations, and international agencies. Embassies or consulates are either directly leading or hosting the events in Monterrey, Krakow, Sofia, Jakarta, Santa Cruz, Rome, London and Auckland.


January 28 2013

Open data economy: Eight business models for open data and insight from Deloitte UK

When I asked whether the push to free up government data was resulting in economic activity and startup creation, I started to receive emails from people around the United States and Europe. I’ll be publishing more of what I learned in our ongoing series of open data interviews and profiles over the next month, but two responses are worth sharing now.

Open questions about open growth

The first response concerned Deloitte’s ongoing research into open data in the United Kingdom [PDF], conducted in collaboration with the Open Data Institute.

Harvey Lewis, one of the primary investigators for the research project, recently wrote about some of Deloitte’s preliminary findings at the Open Government Partnership’s blog in a post on “open growth.” To date, Deloitte has not found the quantitative evidence the team needs to definitively demonstrate the economic value of open data. That said, the team found much of interest in the space:

“… new businesses and new business models are beginning to emerge: Suppliers, aggregators, developers, enrichers and enablers. Working with the Open Data Institute, Deloitte has been investigating the demand for open data from businesses. Looking at the actual supply of and demand for open data in the UK provides some indication of the breadth of sectors the data is relevant to and the scale of data they could be considering.

The research suggests that the key link in the value chain for open data is the consumer (or the citizen). On balance, consumer-driven sectors of the economy will benefit most from open government data that has direct relevance to the choices individuals make as part of their day-to-day lives.”

I interviewed Lewis last week about Deloitte’s findings — stay tuned for more insight into that research in February.

8 business models for open data

Michele Osella, a researcher and business analyst in the Business Model & Policy Innovation Unit at the Istituto Superiore Mario Boella in Italy, wrote in to share examples of emerging business models based upon the research I cited in my post in December. His email reminded me that in Europe, open data is often discussed in the context of public sector information (PSI). Ongoing case studies of re-use are available at the European Public Sector Information Platform website.

Osella linked to a presentation on business models in PSI reuse and shared a list of eight business models, including case studies for six of them:

  1. Premium Product / Service.
  2. Freemium Product / Service. None of the 13 enterprises interviewed by us falls into this case, but a slew of instances may be provided: a classic example in this vein is represented by mobile apps related to public transportation in urban areas. [Link added.]
  3. Open Source. OpenCorporates and OpenPolis
  4. Infrastructural Razor & Blades. Public Data Sets on Amazon Web Services
  5. Demand-Oriented Platform. DataMarket and Infochimps
  6. Supply-Oriented Platform. Socrata and Microsoft Open Government Data Initiative
  7. Free, as Branded Advertising. IBM City Forward, IBM Many Eyes or Google Public Data Explorer
  8. White-Label Development. This business model has not consolidated yet, but some embryonic attempts seem to be particularly promising.

Agree? Disagree? Have other examples of these models or other business models? Please let me know in the comments or write in to

In the meantime, here are several other posts that have informed my investigation into open data business models:

This post is part of our ongoing investigation into the open data economy.

Four short links: 28 January 2013

  1. Aaron’s Army — powerful words from Carl Malamud. Aaron was part of an army of citizens that believes democracy only works when the citizenry are informed, when we know about our rights—and our obligations. An army that believes we must make justice and knowledge available to all—not just the well born or those that have grabbed the reins of power—so that we may govern ourselves more wisely.
  2. Vaurien, the Chaos TCP Monkey: a project at Netflix to enhance the infrastructure tolerance. The Chaos Monkey will randomly shut down some servers or block some network connections, and the system is supposed to survive these events. It’s a way to verify the high availability and tolerance of the system. (via Pete Warden)
  3. Foto Forensics — tool which uses image processing algorithms to help you identify doctoring in images. The creator’s deconstruction of Victoria’s Secret catalogue model photos is impressive. (via Nelson Minar)
  4. All Trials Registered — Ben Goldacre steps up his campaign to ensure trial data is reported and used accurately. I’m astonished that there are people who would withhold data, obfuscate results, or opt out of the system entirely, let alone that those people would vigorously assert that they are, in fact, professional scientists.

GEMA vs. YouTube: Blocking-notice dispute and a new visualization

The never-ending story of the GEMA vs. YouTube dispute keeps producing new headlines. Having already sent YouTube a cease-and-desist letter, GEMA is now suing. It takes the view that the blocking notices displayed in place of videos are nothing more than an attempt to sway public opinion. For all the criticism of GEMA, that is hard to dismiss: the blocking notices present, at the very least, a one-sided interpretation of the dispute, even if others are surely not solely to blame for GEMA’s battered image.

In any case, the Berlin agency OpenDataCity published a visualization of the dispute today (which unfortunately cannot yet be embedded at less than 520 pixels wide). Based on roughly 200,000 search queries, it compiled a “Top 1000” list. The visualization shows how many videos are blocked, and with which stated reasons, in Germany and in other countries. Lorenz Matzat has the background.

Supported by MyVideo. Built by OpenDataCity. The application is licensed under CC-BY 3.0.

January 25 2013

Transparency, Chinese style

China has its own freedom of information law, which allows citizens, media or NGOs to obtain data that the state collects.


January 23 2013

Making open data more valuable, one micropayment at a time

When it comes to making sense of the open data economy, tracking cents is valuable. In San Francisco, where Mayor Ed Lee’s administration has reinvigorated city efforts to release open data for economic benefits, entrepreneur Yo Yoshida has made the City by the Bay’s government data central to his mobile ecommerce startup, Appallicious.

Appallicious is positioning its Skipitt mobile platform as a way for cities to easily process mobile transactions for their residents. The startup is generating revenue from each transaction the city takes with its platform using micropayments, a strategy that’s novel in the world of open data but has enabled Appallicious to make enough money to hire more employees and look to expand to other municipalities. I spoke to Yoshida last fall about his startup, what it’s like to go through city procurement, and whether he sees a market opportunity in more open government data.

Where did the idea for Appallicious come from?

Yo Yoshida: About three years ago, I was working on another platform with a friend that I met years ago, working on a company called Beaker. We discovered a number of problems. One of them was being able to find our way around San Francisco and not only get information, but be able to transact with different services and facilities, including going to a football game at the 49ers stadium. Why couldn’t we order a beer to our seats or order merchandise? Or find the food trucks that were sitting in some of the parks and then place an order from that?

So we were looking at what solutions were out there via mobile. We started exploring how to go about doing this. We looked first at the vendors and approaching them. That’s been done with a lot of other specific verticals. We started talking to the city a little bit. We looked at the open data legislation that was coming out at that time and said, “This is the information we need, but now we also need to be able to figure out how to monetize and populate that.”

We set about starting to build a platform that could not only support one type of transaction — ordering merchandise or something like that — but provide what I needed as a citizen to fulfill my needs and solve problems. We approached San Francisco Recreations and Parks because we had heard, through a third party, that they had been looking for a solution like this for two years. We showed them what we were doing. They asked us to come back with a demonstration of a product in a few weeks. We came back and showed them the first iteration of a mobile app.

Essentially, what we built was a mobile commerce platform that supports multiple tenants of financial transactions using open data. We enable the government — or whoever we’re working with — to be able to manage it from a multi-tiered, hierarchical structure.

We’ve built this platform to enable government to manage all of their mobile technology and transactions through software as a service.

What’s your business model?

Yo Yoshida: San Francisco Recreations and Parks has 1,200 facilities in San Francisco. The parks are free. The museums, obviously, are not, but they all sit on park land. You’re talking about permits, reservations for picnic tables. You have all of these different facilities, and all sorts of different ways to transact at each of these facilities. What we’ve done is create an informational piece for the public, which gives them the ability to find all sorts of facilities.

There’s two different models for the financial piece. One is subscription-based.

However, with San Francisco Recreations and Parks, we saw a bigger and a more sustainable proposition in taking micropayments on transactions. There’s tons of transactions going on every day, from permitting to making reservations to scheduling classes to ticketing for events. Golden Gate Park gets 15 million visitors a year, including those visiting the Botanical Gardens, the Japanese Tea Gardens, and the California Academy of Sciences. Essentially, what we’re setting up is a micropayment or a convenience fee on each of those transactions.

San Francisco’s Recreations and Parks annual revenue alone is $35 million. That’s a percentage of ticket sales and lease prices for everything that all of these different properties sit on. Their extended reach is $200 million plus. So if we were to tap into that marketplace and take micropayments on them, we’re looking at a couple million dollars a year for us.
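
As a rough check on that estimate, the arithmetic can be sketched in a few lines of Python. The $35 million and $200 million figures come from Yoshida’s answer above. Reading “a couple million dollars” as $2 million is an assumption, and the function name `implied_fee_rate` is hypothetical, used only for illustration. None of this reflects fee levels that Appallicious or the city has published.

```python
def implied_fee_rate(target_revenue: float, transaction_volume: float) -> float:
    """Return the share of transaction volume needed to reach target_revenue."""
    if transaction_volume <= 0:
        raise ValueError("transaction_volume must be positive")
    return target_revenue / transaction_volume


# Figures quoted in the interview (US dollars per year).
ANNUAL_REVENUE = 35_000_000    # department's stated annual revenue
EXTENDED_REACH = 200_000_000   # stated "extended reach" (given as $200 million plus)

# Assumption: "a couple million dollars a year" is read here as $2 million.
TARGET_REVENUE = 2_000_000

rate_on_reach = implied_fee_rate(TARGET_REVENUE, EXTENDED_REACH)
rate_on_revenue = implied_fee_rate(TARGET_REVENUE, ANNUAL_REVENUE)

print(f"Implied fee on extended reach: {rate_on_reach:.1%}")    # 1.0%
print(f"Implied fee on annual revenue: {rate_on_revenue:.1%}")  # 5.7%
```

Under those assumptions, the estimate corresponds to an average fee of about 1% if it were collected across the full extended reach, or nearly 6% if it were collected only on the department’s own annual revenue.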

How big is your company now?

Yo Yoshida: We started with two people. We are now about to hire a total of 12. We expect to grow to maybe 30 by next summer, all depending on our funding rounds as they come through. We have interest from other cities, like San Diego, Denver and Los Angeles. We’re basically a plug-and-play solution for government or cities to be able to take open data, plug it in and then start creating financial pools out of it for the consumers to be able to have easy transactions.

Can other cities “plug and play” open data into your system?

Yo Yoshida: The biggest pain for me, obviously, is the transactions. Some cities have to pass legislation. If they have open data, plugging in and getting the informational piece out first, which is what Recreations and Parks is doing, essentially, is a no-brainer.

If someone has good open datasets, it would take maybe a month to implement this for an entire large city, depending on the departments. You first would have the tools for everyone to be able to find their way around. For instance, there’s always been pain points with Muni, like finding the three-day passes. There’s no reason why you shouldn’t have that built into your map and into your directions if you’re going to one of those facilities, and then be able to use that to actually go to the museums as well.

Entrepreneurs trying to use government data sometimes describe challenges around its quality. Is that true here?

Yo Yoshida: We had to work with San Francisco on that, but each of the departments that we’re working with has assigned someone to clean up the data. You can’t have bad data in there. We’ve had that pain point in our past conversations. Frequently, it is a three-month wait time for them to clean up their data.

The Department of Public Health is doing it now. Their GIS person usually is the person that gets assigned to making sure all of the data that’s opened up to the public is cleaned up. He’s done an amazing job cleaning up all of the data points. It’s been a win-win situation because they all want this technology. They know they have to have clean data to get it, so they’re cleaning up their data.

Do you think more startups will target government as a customer?

Yo Yoshida: The procurement process was a long and grueling process. A lot of it came from the City Attorney’s office not understanding what this was, what this technology is like and that they can’t own everything. We did struggle a little bit there. We were very patient. We educated them as we went along. Most small startups can’t get to that place yet.

I think having someone sitting above that who actually understands software as a service and drives these things through a few times so they can get used to this process is going to make a huge difference for entrepreneurs.

We see this type of development and drive from the Mayor’s office as a huge opportunity to get the process streamlined and more efficient, so that entrepreneurs can actually come up and create technology. I mean, we suffered for a year, but we got it through. Hopefully, that will pave the way for others. With the new legislation, we’re hoping that they’re going to make it a much more efficient process and have someone there that actually understands this process.

The barriers to entry were so high before. If they streamline the process for entrepreneurs, there’s an incredible ability to access extreme amounts of revenue.

Is there a market opportunity in the open data San Francisco is releasing?

Yo Yoshida: There’s a small market play selling apps. I think you’re going to see, with companies like ours, that there truly is an ability to innovate on top of open data.

There absolutely is opportunity. It’s created us. We know that there’s going to be competitors coming along behind us, filling some needs that we can’t. The subscription-based model is going to probably work for several departments, like the Department of Public Health.

As far as hackathons and stuff like that, personally, I think they’re very innovative, but they’re not sustainable. There are definitely companies that are sustainable moving forward.

As far as I can tell, we are pretty much the first sustainable one on the scene. Our projected numbers, just off of micropayments, are going to not only generate revenue for us, but generate revenue for the city. I am looking at this as a sustainable company that can move forward and scale through and accommodate every type of city.

I see lots of new apps and lots of great informational apps, but they don’t make money. You have to sustain the technology. As you know, every version needs a new update. Who’s going to be maintaining that? How are you going to pay for the maintenance and how are you going to pay for the staff to do it? You have to create the real company. Our infrastructure is created to be a sustainable solution for cities moving forward.

This interview has been edited and condensed for clarity. This post is part of our ongoing investigation into the open data economy.


January 04 2013

Open Data & Open Government: for show rather than for transparency

In other countries, open data, meaning open access to public sector data, is a top-level priority; in Germany, it is more of a neglected stepchild.

