
June 29 2012

UK Cabinet Office relaunches Data.gov.uk, releases open data white paper

The British government is doubling down on the notion that open data can be a catalyst for increased government transparency, civic utility and economic prosperity.

Yesterday, the United Kingdom's Cabinet Office hosted an event in London, England, to highlight the release of a new white paper on "unleashing the potential of open data," linked at the bottom of the post, and the relaunch of Data.gov.uk, the country's open data platform. The site now has over 9,000 data sets on it, according to the Cabinet Office.

In the video below, Francis Maude, minister for the Cabinet Office, talks about the white paper, which was the result of a public consultation over the last year.

"I think it's all good overall," commented author Dr. Ben Goldacre, via email.

"The UK government have been saying the right things about data for a long time: that it's the 21st century's raw material, that it has economic and social benefits, that privacy issues need caution, and so on. That in itself is reassuring, as governments can sometimes be completely clueless about this kind of stuff.

They also get the nerdy details: that standards matter, and so on. Also, all the stuff about building reciprocal relationships with developers, building coder capacity, two way relationships to improve datasets etc is all great. The star rating system for departments is neat, as one lesson from this whole area is simple structured public feedback often improves services.

The main concern is that the core reference data hasn't been released for free. The Postcode Address File allows developers to convert addresses into postcodes: this kind of dataset is like the road network of the digital domain, and it needs to be open with free movement so businesses and services can meet users. Our excellent Ordnance Survey maps are still locked up at the more detailed levels, which is problematic since a lot of geographical data from local government uses OS data too, so release of that is hindered. Companies House data is also still pay only.

The Cabinet Office seem to have been fighting hard for this stuff, which is great, but it's proving difficult to release."

The Guardian's Datablog published a smart, cogent analysis of the open data white paper and a spreadsheet of the government's commitments under it.

I strongly agree with Simon Rogers, the editor of the Datablog, that one of the most significant elements of the white paper is its acknowledgement of the need to engage developers and solicit their feedback on the quality and availability of open government data.

"Traditionally, government has almost ignored developers, even as prime users of its data," wrote Simon Rogers at the Guardian. "This commitment to take that community into account is probably the most striking part of this White Paper, which will allow users to ask government for specific datasets, feedback on how they've used them and, crucially, 'inform us when there are anomalies or mistakes in our data.'"

The past several years have shown such engagement is a critical aspect of building communities around open data. Directly engaging entrepreneurs, venture capitalists, industry and academia is, as US CTO Todd Park's success with stimulating innovation around open health data has demonstrated, necessary for downstream success. Publishing high quality open data online is, in that context, necessary but insufficient for better downstream outcomes for citizens. Given the costs incurred in publishing open data, this investment of time and energy in community engagement can't be overemphasized, and the inclusion of this strategic element in the white paper is notable.

All that being said, an actual strategy for developer engagement was not published in the white paper - stay tuned on that count.

Maude, Berners-Lee and Pollock on open data

Earlier this spring, I interviewed Francis Maude, the United Kingdom's minister for the Cabinet Office, about the responsibilities and opportunities for open government and transparency, including its relationship to prosperity and security. The video of our interview is embedded below:

The British government has also now officially adopted the "5 star" rubric of one of its most celebrated citizens, World Wide Web inventor Tim Berners-Lee, for evaluating the quality of open government data. Below, I've embedded my interview on open data with Berners-Lee, which remains relevant today:

For another view from civil society on what, exactly, open data is and why it matters, watch my interview with Rufus Pollock, the co-founder of the Open Knowledge Foundation, below. The Open Knowledge Foundation supports the Comprehensive Knowledge Archive Network (CKAN), the open source open data platform software that underpins the Data.gov.uk site.

UK government white paper on open data

June 22 2012

The emerging political force of the network of networks

The shape and substance of our networked world has been emerging for decades. Over the past year, the promise of the Internet as a platform for collective action moved from theory to practice, as networked movements of protesters and consumers have used connection technologies around the world in the service of their causes.

This month, more eyes and minds came alive to the potential of this historic moment during the ninth Personal Democracy Forum (PDF) in New York City, where for two intense days the nexus of technology, politics and campaigns came together on stage (and off) in a compelling, provocative mix of TED-style keynotes and lightning talks, longer panels, and the slipstream serendipity of hallway conversations and the backchannel on Twitter.


If you are interested in the intersection of politics, technology, social change and the Internet, PDF has long since become a must-attend event, as many of the most prominent members of the "Internet public" convene to talk about what's changing and why.

The first day began with a huge helping of technology policy, followed by a hint of triumphalism regarding the newfound power of the Internet in politics, balanced by Jaron Lanier's concern about the impact of the digital economy on the middle class. The conference kicked off with a conversation between two members of the United States Congress who were central to the historic online movement that halted the progression of the Stop Online Piracy Act (SOPA) and the Protect IP Act (PIPA) in the U.S. House of Representatives and Senate: Representative Darrell Issa (R-CA) and Senator Ron Wyden (D-OR). You can watch a video of their conversation with Personal Democracy Media founder Andrew Rasiej below:

During this conversation, Rep. Issa and Sen. Ron Wyden introduced a proposal for a "Digital Bill of Rights." They published a draft set of principles on MADISON, the online legislation platform built last December during the first Congressional hackathon.

Both legislators pointed to different policy choices that stand to affect billions of people, ranging from proposed legislation about intellectual property, to the broader issues of online innovation and Internet freedom, to international agreements like the Anti-Counterfeiting Trade Agreement (ACTA) and the Trans-Pacific Partnership (TPP). Such policy choices also include online and network security: Rep. Issa sponsored and voted for CISPA, whereas Sen. Wyden is opposed to a similar legislative approach in the Senate. SOPA, PIPA, ACTA and TPP have all been posted on MADISON for public comment.


On the second day of PDF, conversations and talks turned toward not only what is happening around the networked world but what could be in store for citizens in failed states in the developing world or those inhabiting huge cities in the West, with implications that can be simultaneously exhilarating and discomfiting. There was a strong current of discussion about the power of "adhocracy" and the force of the networked movements that are now forming, dissolving and reforming in new ways, eddying around the foundations of established societal institutions around the globe. Micah Sifry, co-founder of the Personal Democracy Forum, hailed five of these talks as exemplars of the "radical power of the Internet public."

These keynotes, by Chris Soghoian, Dave Parry, Peter Fein, Sascha Meinrath and Deanna Zandt, "could serve as a 50-minute primer on the radical power of the Internet public to change the world, why it's so important to nurture that public, where some of the threats to the Internet are coming from, and how people are routing around them to build a future 'intranet' that might well stand free from governmental and corporate control," wrote Sifry. (Three of them are embedded individually below; the rest you can watch in the complete video catalog at the bottom of this section.)

Given the historic changes in the Middle East and Africa over the past year during the Arab Spring, or the networked protests we've seen during the Occupy movement or over elections in Russia or austerity measures in Greece, it's no surprise that there was great interest in not just talking about what was happening, but why. This year, PDF attendees were also fortunate to hear about the experiences of netizens in China and Russia. The degree of change created by adding wireless Internet connectivity, social networking and online video to increasingly networked societies will vary from country to country. There are clearly powerful lessons that can be gleaned from the experiences of other humans around the globe. Learning where social change is happening (or not) and understanding how our world is changing due to the influence of networks is core to being a digitally literate citizen in the 21st century.

Declaring that we, as a nation or global polity, stand at a historic inflection point for the future of the Open Web or the role of the Internet in presidential politics or the balance of digital security and privacy feels, frankly, like a reiteration of past punditry, going well back to the .com boom in the 1990s.

That said, it doesn't make it less true. We've never been this connected to a network of networks, nor have the public, governments and corporations been so acutely aware of the risks and rewards that those connection technologies pose. It wasn't an accident that Muammar Gaddafi namechecked Facebook before his fall, nor that the current President of the United States (and his opponent in the upcoming election) are talking directly with the public over the Internet. One area that PDF might have dwelt more upon is the dark side of networks, from organized crime and crimesourcing to government-sponsored hacking to the consequences of poorly considered online videos or updates.

We live in a moment of breathtaking technological changes that stand to disrupt nearly every sector of society, for good or ill. Many thanks to the curators and conveners of this year's conference for amplifying the voices of those whose work focuses on documenting and understanding how our digital world is changing — and a special thanks to all of the inspiring people who are not only being the change they wish to see in the world but making it.

Below, I've embedded a selection of the PDF 12 talks that resonated with me. These videos should serve as a starting point, however, not an ending: every person on the program of this year's conference had something important to share, from Baratunde Thurston to Jan Hemme to Susan Crawford to Leslie Harris to Carne Ross to the RIAA's Cary Sherman — and the list goes on and on. You can watch all 45 talks from PDF 2012 (at least, the ones that have been uploaded to YouTube by the Personal Democracy Media team) in the player below:

Yochai Benkler | SOPA/PIPA: A Case Study in Networked Discourse and Activism

In this talk, Harvard law professor Yochai Benkler (@ybenkler) discussed using the Berkman Center's Media Cloud to trace how the Internet became a networked platform for collective action against SOPA and PIPA. Benkler applies a fascinating term — the "attention backbone" — to describe how influential nodes in a network direct traffic and awareness to research or data. If you're interested in the evolution of the blueprint for democratic participation online, you'll find this talk compelling.

Sascha Meinrath | Commotion and the Rise of the Intranet Era

Mesh networks have become an important — and growing — force for carrying connectivity to more citizens around the world. The work of Sascha Meinrath (@SashaMeinrath) at the Open Technology Institute in the New America Foundation is well worth following.

Mark Surman | Making Movements: What Punk Rock, Scouting, and the Royal Society Can Teach

Mark Surman (@msurman), the executive director of the Mozilla Foundation, shared a draft of his PDF talk prior to the conference. He offered his thoughts on "movement making," connecting lessons from punk rock, scouting and the Royal Society.

With the onrush of mobile apps and the swift rise of Facebook, what we think about as the Internet — the open platform that is the World Wide Web — is changing. Surman contrasted the Internet of today, enabled by an end-to-end principle and built upon open-source technologies and open protocols, with the one of permissions, walled gardens and controlled app stores that we're seeing grow around the world. "Tim Berners-Lee built the idea that the web should be LEGO into its very design," said Surman. We'll see whether all of these pieces (loosely joined?) fit as well together in the future.

Juan Pardinas | OGP: Global Steroids for National Reformers

There are substantial responsibilities and challenges inherent in moving forward with the historic Open Government Partnership (OGP) that officially launched in New York City last September. Juan Pardinas (@jepardinas) took the position that OGP will have a positive impact on the world and that the seat civil society has at the partnership's table will matter. By the time the next annual OGP conference rolls around in 2013, history may well have rendered its own verdict on whether this effort will endure to lasting effect.

Given diplomatic challenges around South Africa's proposed secrecy law, all of the stakeholders in the Open Government Partnership will need to keep pressure on other stakeholders if significant progress is going to be made. If OGP is to be judged more than a PR opportunity for politicians and diplomats to make bold framing statements, government and civil society leaders will need to do more to hold countries accountable to the commitments required for participation: all participating countries must submit Action Plans after a bona fide public consultation. Moreover, they'll need to define the metrics by which progress should be judged and be clear with citizens about the timelines for change.

Michael Anti | Walking Along the Great Firewall

Michael Anti (@mranti) is a Chinese journalist and political blogger who has earned global attention for activism in the service of freedom of the press in China. When Anti was exiled from Facebook over its real names policy, his account deletion became an important example for other activists around the world. At PDF, he shared a frank perspective on where free speech stands in China, including how the Chinese government is responding to the challenges of their increasingly networked society. For perspective, there are now more Internet users in China (an estimated 350 million) than the total population of the United States. As you'll hear in Anti's talk, the Chinese government is learning and watching what happens elsewhere.





Masha Gessen | The Future of the Russian Protest Movement

Masha Gessen (@mashagessen), a Russian and American journalist, threw a bucket of ice water on any hopes that increasing Internet penetration or social media would, in and of themselves, lead to improvements in governance, reduce corruption, or improve the ability of Russia's people to petition their government for redress of grievances.





An Xiao Mina | Internet Street Art and Social Change in China

This beautiful and challenging talk by Mina (@anxiaostudio) offered a fascinating insight: memes are the street art of the censored web. If you want to learn more about how Chinese artists and citizens are communicating online, watch this creative, compelling presentation. (Note: there are naked people in this video, which will make it NSFW in some workplaces.)

Chris Soghoian | Lessons from the Bin Laden Raid and Cyberwar

Soghoian (@csoghoian), who has a well-earned reputation for finding privacy and security issues in the products and services of the world's biggest tech companies, offered up a talk that made three strong points:

  1. Automatic security updates are generally quite a good thing for users.
  2. It's highly problematic if governments create viruses that masquerade as such updates.
  3. The federal government could use an official who owns consumer IT security, not just "cybersecurity" at the corporate or national level.

Zac Moffatt | The Real Story of 2012: Using Digital for Persuasion

Moffatt (@zacmoffatt) is the digital director for the Mitt Romney presidential campaign. In his talk, Moffatt said 2012 will be the first election cycle where persuasion and mobilization will be core elements of the digital experience. Connecting with millions of voters who have moved to the Internet is clearly a strategic priority for his team — and it appears to be paying off. The Guardian reported recently that the Romney campaign is closing the digital data gap with the Obama campaign.


Nick Judd wrote up further analysis of Moffatt's talk on digital strategy over at TechPresident.

Alex Torpey | The Local Revolution

Alex Torpey (@AlexTorpey) attracted widespread attention when he was elected mayor of South Orange, New Jersey, last year at the age of 23. In the months since he was elected, Torpey has been trying to interest his peers in politics. His talk at PDF focused on asking for more participation in local government and a rethinking of partisanship: Torpey ran as an independent. As Gov 2.0 goes local, Mayor Torpey looks likely to be one of its leaders.

Gilad Lotan | Networked Power: What We Learn From Data

If you're interested in a data-driven analysis of networked political power and media influence, Gilad Lotan's talk is a must-watch. Lotan, who tweets as @gilgul, crunched massive amounts of tweets to help the people formerly known as the audience better understand networked movements for change.






Cheryl Contee | The End of the Digital Divide

Jack and Jill Politics co-founder Cheryl Contee (@cheryl) took a profoundly personal approach when she talked about the death and rebirth of the digital divide. She posited that what underserved citizens in the United States now face isn't so much the classic concerns of the 1990s, where citizens weren't connected to the Internet, but rather a skills gap for open jobs and a lack of investment to address those issues in poor and minority communities. She also highlighted how important mentorship can be in bridging that divide. When Contee shared how Yale computer lab director Margaret Krebs helped her, she briefly teared up — and she called on technologists, innovators and leaders to give others a hand up.

Tracing the storify of PDF 12

I published a storify of Personal Democracy Forum 2012 after the event. Incomplete though it may be, it preserves some thoughtful commentary and context shared in the Twittersphere during the event.

June 08 2012

mHealth apps are just the beginning of the disruption in healthcare from open health data

Two years ago, the potential of government making health information as useful as weather data felt like an abstraction. Healthcare data could give citizens a "blue dot" for navigating health and illness, akin to the one GPS data fuels on the glowing maps of geolocated mobile devices that are in more and more hands.

After all, profound changes in entire industries take years, even generations, to occur. In government, the pace of progress can feel even slower, measured in evolutionary time and epochs.

Sometimes, history works differently, particularly given the effect of rapid technological change. It's only a little more than a decade since President Clinton announced he would unscramble global positioning system (GPS) data for civilian use. President Obama's second U.S. chief technology officer, Todd Park, has estimated that GPS data has unlocked some $90 billion in value in the United States.

In that context, the arc of the Health Data Initiative (HDI) in the United States might leave some jaded observers with whiplash. From a small beginning, the initiative to put health data to work has now expanded around the United States and attracted great interest from abroad, including observers from England's National Health Service eager to understand what strategies have unlocked innovation around public data sets.

While the potential of government health data driving innovation may well have felt like an abstraction to many observers, in June 2012, real health apps and services are here -- and their potential to change how society accesses health information, delivers care, lowers costs, connects patients to one another, creates jobs, empowers caregivers and cuts fraud is profound. The venture capital community seems to have noticed the opportunity here: according to HHS Secretary Sebelius, investment in healthcare startups is up 60% since 2009.

Headlines about rockstar Bon Jovi 'rocking Datapalooza' and the smorgasbord of health apps on display, however, while both understandable and largely warranted, don't convey the deeper undercurrent of change.

On March 10, 2010, the initiative started with 36 people brainstorming in a room. On June 2, 2010, approximately 325 in-person attendees saw 7 health apps demoed at an historic forum in the theater of the Institute of Medicine in Washington, D.C., with another 10 apps packed into an expo in the rotunda outside. All of the apps or services used open government data from the United States Department of Health and Human Services (HHS).

In 2012, 242 applications or services that were based upon or use open data were submitted for consideration to the third annual Health Datapalooza. About 70 health app exhibitors made it to the expo. The conference itself had some 1,400 registered attendees, not counting press and staff, and sold out in advance of the event at the cavernous Washington Convention Center in DC. On Wednesday, I asked Dr. Bob Kocher, now of Venrock and the Brookings Institution, about how the Health Data Initiative has grown and evolved. Dr. Kocher was instrumental to its founding when he served in the Obama administration. Our interview is embedded below:

Revolutionizing the healthcare industry (in HHS Secretary Sebelius's words, reformulating what Wired executive editor Thomas Goetz calls "latent data" into "lazy data") has meant years of work unlocking government data and actively engaging the developer, entrepreneurial and venture capital communities. While the process of making health data open and machine-readable is far from done, there has been incontrovertible progress in standing up new application programming interfaces (APIs) that enable entrepreneurs, academic institutions and government itself to retrieve it on demand. On Monday, in concert with the Health Datapalooza, a new version of HealthData.gov launched, including the release of new data sets that enable comparisons not just of hospital quality but of insurance fees as well.

Two years later, the blossoming of the HDI Forum into a massive conference that attracted the interest of the media, venture capitalists and entrepreneurs from around the nation is a development that few people would have predicted in 2010, but one that is welcome to a nation starved for solutions to spiraling healthcare costs and for some sign of action from a federal government that all too frequently looks broken.

"The immense fiscal pressure driving 'innovation' in the health context actually means belated leveraging of data insights other industries take for granted from customer databases," said Chuck Curran, executive director and general counsel or the Network Advertising Initiative, when interviewed at this year's HDI Forum. For example, he suggested, look at "the dashboarding of latent/lazy data on community health, combined with geographic visualizations, to enable “hotspot”-focused interventions, or info about service plan information like the new HHS interface for insurance plan data (including the API).

Curran also highlighted the role that fiscal pressure is having in making both individual payers and employers a natural source of business funding and adoption for entrepreneurs innovating with health data, with apps like My Drugs Costs holding the potential to help citizens and businesses alike cut down on an estimated $95 billion in annual unnecessary spending on pharmaceuticals.

Curran said that health app providers have fully internalized smart disclosure: "it’s not enough to have open data available for specialist analysis -- there must be simplified interfaces for actionable insights and patient ownership of the care plan."

For entrepreneurs eyeing the healthcare industry and established players within it, the 2012 Health Datapalooza offers an excellent opportunity to "take the pulse of mHealth," as Jody Ranck wrote at GigaOm this week:

Roughly 95 percent of the potential entrepreneur pool doesn’t know that these vast stores of data exist, so the HHS is working to increase awareness through the Health Data Initiative. The results have been astounding. Numerous companies, including Google and Microsoft, have held health-data code-a-thons and Health 2.0 developer challenges. These have produced applications in a fraction of the time it has historically taken. Applications for understanding and managing chronic diseases, finding the best healthcare provider, locating clinical trials and helping doctors find the best specialist for a given condition have been built based on the open data available through the initiative.

In addition to the Health Datapalooza, the Health Data Initiative hosts other events which have spawned more health innovators. Rock Health, a Health 2.0 incubator, launched at the White House Startup America Roundtable at SXSW 2011. In the wake of these successful events, StartUp Health, a network of health startup incubators, entrepreneurs and investors, was created. The organization is focused on building a robust ecosystem that can support entrepreneurs in the health and wellness space.

This health data ecosystem has now spread around the United States, from Silicon Valley to New York to Louisiana. During this year's Health Datapalooza, I spoke with Ramesh Kolluru, a technologist who works at the University of Louisiana, about his work on a hackathon in Louisiana, the "Cajun Codefest," and his impressions of the forum in Washington:

One story that stood out from this year's crop of health data apps was Symcat, an mHealth app that enables people to look up their symptoms and find nearby hospitals and clinics. The application was developed by two medical students at Johns Hopkins University who happened to share a passion for tinkering, engineering and healthcare. They put their passion to work - and somehow found the time (remember, they're in medical school) to build a beautiful, usable health app. The pair landed a $100,000 prize from the Robert Wood Johnson Foundation for their efforts. In the video embedded below, I interview Craig Munsen, one of the medical students, about his application. (Notably, the pair intends to use their prize to invest in the business, not pay off medical school debt.)

There are more notable applications and services to profile from this year's expo, and in the weeks ahead, expect to see some of them here on Radar. For now, it's important to recognize the work of all of the men and women who have worked so hard over the past two years to create public good from public data.

Releasing and making open health data useful, however, is about far more than these mHealth apps: It's about saving lives, improving the quality of care, adding more transparency to a system that needs it, and creating jobs. Park spoke with me this spring about how open data relates to much more than consumer-facing mHealth apps:

As the US CTO seeks to scale open data across federal government by applying the lessons learned in the health data initiative, look for more industries to receive digital fuel for innovation, from energy to education to transit and finance. The White House digital government strategy explicitly embraces releasing open data in APIs to enable more accountability, civic utility and economic value creation.

While major challenges lie ahead, from data quality to security or privacy, the opportunity to extend the data revolution in healthcare to other industries looks more tangible now than it has in years past.

Business publications, including the Wall Street Journal, have woken up to the disruptive potential of open government data. As Michael Hickins wrote this week, "The potential applications for data from agencies as disparate as the Department of Transportation and Department of Labor are endless, and will affect businesses in every industry imaginable. Including yours. But if you can think of how that data could let someone disrupt your business, you can stop that from happening by getting there first."

This growing health data movement is not contained within any single city, state, agency or company. It's beautifully chaotic, decentralized, and self-propelled, said Park this past week.

"The Health Data Initiative is no longer a government initiative," he said. "It's an American one. "

May 29 2012

US CTO seeks to scale agile thinking and open data across federal government

In the 21st century, federal government must go mobile, putting government services and information at the fingertips of citizens, said United States Chief Technology Officer Todd Park in a recent wide-ranging interview. "That's the first digital government result, outcome, and objective that's desired."

To achieve that vision, Park and U.S. chief information officer Steven VanRoekel are working together to improve how government shares data, architects new digital services and collaborates across agencies to reduce costs and increase productivity through smarter use of information technology.

Park, who was chosen by President Obama to be the second CTO of the United States in March, has been (relatively) quiet over the course of his first two months on the job.

Last Wednesday, that changed. Park launched a new Presidential Innovation Fellows program, in concert with VanRoekel's new digital government strategy, at TechCrunch's Disrupt conference in New York City. This was followed by another event for a government audience at the Interior Department headquarters in Washington, D.C. Last Friday, he presented his team's agenda to the President's Council of Advisors on Science and Technology.

"The way I think about the strategy is that you're really talking about three elements," said Park, in our interview. "First, it's going mobile, putting government services at the literal fingertips of the people in the same way that basically every other industry and sector has done. Second, it's being smarter about how we procure technology as we move government in this direction. Finally, it's liberating data. In the end, it's the idea of 'government as a platform.'"

"We're looking for a few good men and women"

In the context of the nation's new digital government strategy, Park announced the launch of five projects that this new class of Innovation Fellows will be entrusted with implementing: a broad Open Data Initiative, Blue Button for America, RFP-EZ, The 20% Campaign, and MyGov.

The idea of the Presidential Innovation Fellows Program, said Park, is to bring in people from outside government to work with innovators inside the government. These agile teams will work together within a six-month time frame to deliver results.

The fellowships are basically scaling up the idea of "entrepreneurs in residence," said Park. "It's a portfolio of five projects that, on top of the digital government strategy, will advance the implementation of it in a variety of ways."

The biggest challenge to bringing the five programs that the US CTO has proposed to successful completion is getting 15 talented men and women to join his team and implement them. There's reason for optimism. Park shared via email that:

"... within 24 hours of TechCrunch Disrupt, 600 people had already registered via Whitehouse.gov to apply to be a Presidential Innovation Fellow, and another several hundred people had expressed interest in following and engaging in the five projects in some other capacity."

To put that in context, Code for America received 550 applications for 24 fellowships last year. That makes both of these fellowships more competitive than getting into Harvard in 2012, which received 34,285 applications for its next freshman class. There appears to be considerable appetite for a different kind of public service that applies technology and data for the public good.

Park is enthusiastic about putting open government data to work on behalf of the American people, amplifying the vision that his predecessor, Aneesh Chopra, championed around the country for the past three years.

"The fellows are going to have an extraordinary opportunity to make government work better for their fellow citizens," said Park in our interview. "These projects leverage, substantiate and push forward the whole principle of liberating data. Liberate data."

"To me, one of the aspects of the strategy about which I am most excited, that sends my heart into overdrive, is the idea that going forward, the default state of government data shall be open and machine-readable," said Park. "I think that's just fantastic. You'll want to, of course, evolve the legacy data as fast as you can in that same direction. Setting that as 'this is how we are rolling going forward' — and this is where we expect data to ultimately go — is just terrific."

In the videos and interview that follow, Park talks more about his vision for each of the programs.

A federal government-wide Open Data Initiative

In the video below, Park discusses the Presidential Innovation Fellows program and introduces the first program, which focuses on open data:

Park: The Open Data Initiative is a program to seed and expand the work that we're doing to liberate government data as a platform; to encourage, on a voluntary basis, the liberation of data by corporations as part of the national data platform; and to actively stimulate the development of new tools and services, and enhance existing tools and services, leveraging the data to help improve Americans' lives in very tangible ways and create jobs for the future.

This leverages the Open Government Directive to say "look, the default going forward is open data." Also the directive to "API-ize" two high priority datasets and also, in targeted ways, go beyond that, and really push to get more data out there in, critically, machine-readable form, in APIs, and to educate the entrepreneurs and innovators of the world that it's there through meetups, and hackathons, and challenges, and "Datapaloozas."

We're doubling down on the Health Data Initiative. We are also launching a much more high-profile Safety Data Initiative, which we kicked off last week; an Energy Data Initiative, which kicked off this week; an Education Data Initiative, which we're kicking off soon; and an Impact Data Initiative, which is about liberating data with respect to inputs and outputs in the non-profit space.

We're also going to be exploring an initiative in the realm of personal finance, enabling Americans to access copies of their financial data from public sector agencies and private sector institutions. So, the format that we're going to be leveraging to execute these initiatives is cloned from the Health Data Initiative.

This will make new data available. It will also take the existing public data that is unusable to developers, i.e. in the form of PDFs, books or static websites, and turn it into liquid machine-readable, downloadable, accessible data via API. Then — because we're consistently hearing that 95% of the innovators and entrepreneurs who could turn our data into magic don't even know the data exists, let alone that it's available to them — engage the developer community and the entrepreneurial community with the data from the beginning. Let them know it's there, get their feedback, make it better.

Blue Button for America

Park: The idea is to develop an open source patient portal capability that will replace My HealtheVet, which is the Veterans Administration's current patient portal. This will actually allow the Blue Button itself to iterate and evolve more rapidly, so that every time you add more data to it, it won't require heart surgery. It will be a lot easier, and of course will be open source, so that anyone else who wants to use it can use it as well. On top of that, we're going to do a lot of "biz dev" in America to get the word out about Blue Button and encourage more and more holders of data in the private sector to adopt Blue Button. We're also going to work to help stimulate more tool development by entrepreneurs that can upload Blue Button data and make it useful in all kinds of ways for patients. That's Blue Button for America.

What is RFP-EZ?

Park: The objective is "buying smarter." The project that we're working on with the Small Business Administration is called "RFP-EZ."

Basically, it's the idea of setting up a streamlined process for the government to procure solutions from innovative, high-growth tech companies. As you know, most high-growth companies regard the government as way too difficult to sell to.

That A) deprives startups and high-growth companies of the government as a marketplace and, B) perhaps even more problematically, actually deprives the government of their solutions.

The hope here is, through the actions of the RFP-EZ team, to create a process and a prototype through which the government can much more easily procure solutions from innovative private firms.

It A) opens up this emerging market called "the government" to high-tech startups and B) infects the government with more of their solutions, which are, pound for pound, radically more effective and cost-efficient than a lot of the stuff that the government is currently procuring through conventional channels. That's RFP-EZ.

The 20% Campaign

Park: The 20% Campaign is a project that's being championed by USAID. It's an effort at USAID, working with other government agencies, NGOs and companies, to catalyze the movement of foreign assistance payments from cash to electronic payment. So, just for example, USAID pays its contractors electronically, obviously, but the contractor who, say, pays highway workers in Afghanistan, or the way that police officers get paid in Afghanistan, is actually principally via cash. Or has been. And that creates all kinds of waste, fraud and abuse issues.

The idea is actually to move to electronic payment, including mobile payment — and this has the potential to significantly cut waste, fraud and abuse, to improve financial inclusion, and to actually let people use their phones to access bank accounts set up for them. That leads to all kinds of good things, including safety: it's not ideal to be carrying around large amounts of cash in highly kinetic environments.

The Afghan National Police started paying certain contingents of police officers via mobile phones and mobile payments, as opposed to cash, and what happened is that the police officers started reporting up to a 30% raise. Of course, their pay hadn't changed, but basically, when it was in cash, a bunch of it got lost. This is obviously a good thing, but it's even more important when you realize that the amount they ultimately, physically received in cash was less than what the Taliban in that province was paying people to join the Taliban — but the mobile payment, at that same level of salary, was greater than what the Taliban was paying. That's a critical difference.

It's basically taking foreign assistance payments through the last mile to mobile.

MyGov is the U.S. version of Gov.uk

Park: MyGov is an effort to rapidly prototype a citizen-centric system that gives Americans the information and resources of government that are right for them. Think of it as a personalized channel for Americans to be able to access information and resources across government, and a way to get feedback from citizens about that information and those resources.

How do you plan to scale what you learned while you were HHS CTO to all of the federal government?

Park: Specifically, we're doing exactly the same thing we did with the Health Data Initiative, kicking off the initiatives with a "data jam" — an ideation workshop where we invite, just like with health data, 40 amazing tech and energy minds, tech and safety innovators, to a room — at the White House, in the case of the Safety Data Initiative, or at Stanford University, in the case of the Energy Initiative.

We walk into the room for several hours and say, "Here's a big pile of data. What would you do with this data?" And they invent 15 or 20 new classes of products or services of the future that we could build with the data. And then we challenge them to, at the end of the session, build prototypes or actual working products that instantiate their ideas in 90 days, to be highlighted at a White House-hosted Safety Datapalooza, Energy Datapalooza, Education Datapalooza, Impact Datapalooza, etc.

We also take the intellectual capital from the workshops, publish it on the White House website, and publicize the opportunity around the country: Discover the data, come up with your own ideas, build prototypes, and throw your hat in the ring to showcase at a Datapalooza.

What happens at the Datapaloozas — our experience in health guides us — is that, first of all, the prototypes and working products inspire many more innovators to actually build new services, products and features, because the data suddenly becomes really concrete to them, in terms of how it could be used.

Secondly, it helps persuade additional folks in the government to liberate more data, making it available, making it machine-readable, as opposed to saying, "Look, I don't know what the upside is. I can only imagine downsides." What happened in health is, when they went to a Datapalooza, they actually saw that, if data is made available, then at no cost to you and no cost to taxpayers, other people who are very smart will build incredible things that actually enhance your mission. And so you should do the same.

As more data gets liberated, that then leads to more products and services getting built, which then inspires more data liberation, which then leads to more products and services getting built — so you have a virtuous spiral, like what's happened in health.

The objective of each of these initiatives is not just to liberate data. Data by itself isn't helpful. You can't eat data. You can't pour data on a wound and heal it. You can't pour data on your house and make it more energy efficient. Data is only useful if it's applied to deliver benefit. The whole point of this exercise, the whole point of these kickoff efforts, is to catalyze the development of an ecosystem of data supply and data use to improve the lives of Americans in very tangible ways — and create jobs.

We have the developers and the suppliers of data actually talk to each other, create value for the American people, and then rinse, wash, repeat.

We're recruiting, to join the team of Presidential Innovation Fellows, entrepreneurs and developers from the outside to come in and help with this effort to liberate data, make it machine-readable, and get it out there to entrepreneurs and help catalyze development of this ecosystem.

We went to TechCrunch Disrupt for a reason: it's right smack dab in the middle of the people we want to recruit. We invite people to check out the projects on WhiteHouse.gov and, if they're interested in applying to be a fellow, to indicate their interest. Even if they can't come to DC for six-plus months to be a fellow but want to follow one of the projects, or contribute or help in some way, we are inviting them to express interest in that as well. For example, if you're an entrepreneur, and you're really interested in the education space, and learning about what data is available in education, you can check out the project, look at the data, and perhaps you can build something really good to show at the Education Datapalooza.

Is open data just about government data? What about smart disclosure?

Park: In the context of the Open Data Initiatives projects, it's not just about liberation of government health data: it's also about government catalyzing the release, on a voluntary basis, of private sector data.

Obviously, scaling Blue Button will extend the open data ecosystem. We're also doubling down on Green Button. I was just in California to host discussions around Green Button. Utilities representing 31 million households and businesses have now committed to make Green Button happen. Close to 10 million households and businesses already have access to Green Button data.

There's also a whole bunch of conversation happening about, at some point later this year, having the first utilities add the option of what we're calling "Green Button Connect." Right now, the Green Button is a download, where you go to a website, hit a green button and bam, you download your data. Green Button Connect is the ability for you to say as a consumer, "I authorize this third party to receive a continuous feed of my electricity usage data."

That creates massive additional opportunity for new products and services. That could go live later this year.
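To make the distinction concrete: a Green Button download is just a structured XML file of interval usage readings that any third-party tool can parse, and a "Green Button Connect" third party would do essentially the same parsing against an authorized, continuously updated feed rather than a one-time download. Below is a minimal sketch of that parsing, assuming an ESPI-style Green Button export; the element names, the espi namespace URI and the greenbutton.xml filename are illustrative assumptions, not a reference to any particular utility's feed.

    import xml.etree.ElementTree as ET

    # Namespace assumed from the NAESB ESPI / Green Button XML format;
    # a real export from a utility may differ.
    NS = {"espi": "http://naesb.org/espi"}

    def interval_readings(path):
        """Yield (start_epoch_seconds, usage_value) pairs from a Green Button file."""
        tree = ET.parse(path)
        for reading in tree.iter("{http://naesb.org/espi}IntervalReading"):
            start = reading.findtext("espi:timePeriod/espi:start", namespaces=NS)
            value = reading.findtext("espi:value", namespaces=NS)
            if start is not None and value is not None:
                yield int(start), int(value)

    if __name__ == "__main__":
        # Hypothetical file name; units and scaling come from the file's ReadingType.
        readings = list(interval_readings("greenbutton.xml"))
        print(f"{len(readings)} interval readings, total value {sum(v for _, v in readings)}")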

As part of the education data initiative, we are pursuing the launch and scale up of something called "My Data," which will have a red color button. (It will probably, ultimately, be called "Red Button.") This is the ability for students and their families to download an electronic copy of their student loan data, of their transcript data, of their academic assessment data.

That notion of people getting their own data, whether it's your health data, your education data, your finance data, your energy use data, that's an important part of these open data initiatives as well, with government helping to catalyze the release of that data to then feed the ecosystem.

How does open data specifically relate to the things that Americans care about: access to healthcare, reducing energy bills, giving their kids more educational opportunities, and job creation? Is this just about apps?

Park: In healthcare, for example, you'll see a growing array of examples that leverage data to create tangible benefits in many, many ways for Americans. Everything from helping me find the right doctor or hospital for my family, to being notified of a clinical trial that fits my profile and could save my life, to the ability to get the latest and greatest information about how to manage my asthma and diabetes via government knowledge in the National Library of Medicine.

There is a whole shift in healthcare systems away from pay-for-volume of services to basically paying to get people healthy. It goes by lots of different names — accountable care organizations or episodic payment — but the fundamental common theme is that the doctors and hospitals increasingly will be paid to keep people healthy and to co-ordinate their care, and keep them out of the hospital, and out of the ER.

There's a whole fleet of companies and services that utilize data to help doctors and hospitals do that work, like utilizing Medicare claims data to help identify segments of a patient population that are at real risk and likely to need the ER or hospital soon. There are tools that help journalists easily identify public health issues, like healthcare outcome disparities by race, gender and ethnicity. There are tools that help county commissioners and mayors understand what's going on in a community, from a health standpoint, and make better policy decisions, like showing them food deserts. There's just a whole fleet of rapidly growing services for consumers, for doctors, nurses, journalists, employers, public policy makers, that help them make decisions, help them deliver improved health and healthcare, and create jobs, all at the same time.

That's very exciting. Look at all of those products and services — a subset of them are the ones that self-identify to us to actually be exhibited at the Health Datapaloozas. Look at the 20 healthcare apps that were at the first Datapalooza or the 50 that were at the second. This year, there are 230 companies that are being narrowed down to a total of about 100 that will be at the Datapalooza. They collectively serve millions of people today, either through brand new products and services or through new features on existing platforms. They help people in ways that we would never have thought of, let alone built.

The taxpayer dollars expended here were zero. We basically just took our data, made it available in machine-readable format, educated entrepreneurs that it was there, and they did the rest. Think about these other sectors, and think about what's possible in those sectors.

In education, with the data that we've made available, you can imagine much better tools to help you shop for the college that will deliver the biggest bang for your buck and is the best fit for your situation.

We've actually made available a bunch of data about college outcomes and are making more data available in machine-readable form so it can feed college search tools much better. We are going to be enabling students to download machine-readable copies of their own financial aid application, student loan data and school records. That will really turbo charge "smart scholarship" and school search capabilities for those students. You can actually mash that up with college outcomes in a really powerful, personalized college and scholarship search engine that is enabled by your personal data plus machine-readable data. Tools that help kids and their parents pick the right college for their education and get the right financial aid, that's something government is going to facilitate.

In the energy space, there are apps and services that help you leverage your Green Button data and other data to really assess your electricity usage compared to that of others and get concrete tips on how you can actually save yourself money. We're already seeing very clever, very cool efforts to integrate gamification and social networking into that kind of app, to make it a lot more fun and engaging — and make yourself money.

One dataset that's particularly spectacular that we're making a lot more usable is the EnergyStar database. It's got 40,000 different appliances, everything from washing machines to servers, that consumers and businesses use. We are creating a much, much easier-to-use, public, downloadable EnergyStar database. It's got really detailed information on the energy use profiles and performance of each of these 40,000 appliances and devices. Imagine that actually integrated into much smarter services.

On safety, the kinds of ideas that people are bringing together are awesome. They're everything from using publicly available safety data to plot the optimal route for your kid to walk home or for a first responder to travel through a city and get to a place most expeditiously.

There's this super awesome resource on Data.gov called the "Safer Products API," which is published by the Consumer Product Safety Commission (CPSC). Consumers send in safety reports to CPSC, but until March of last year, you had to FOIA [Freedom of Information Act] CPSC to get them. So what they've now done is publish the entire database of these reports, without you having to FOIA them, and make it available through an API.

One of the ideas that came up is that people buy products on eBay, Craigslist, etc., all the time, but some huge percentage of Americans never get to know about a recall — a recall of a crib, a recall of a toy. And even when a company recalls new products, old products are in circulation. What if someone built the ability to integrate the recall data and attach it to all the stuff in the eBays and Craigslists of the world?
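As a rough illustration of that idea, the sketch below checks a product listing against CPSC recall data before it is published. The endpoint URL, the format=json and RecallTitle query parameters, and the field names in the response are assumptions about CPSC's public recall web service, so treat this as the shape of the integration rather than working documentation.

    import json
    import urllib.parse
    import urllib.request

    # Assumed CPSC recall REST endpoint; check CPSC's developer documentation
    # for the actual URL, parameters and response fields.
    RECALL_ENDPOINT = "https://www.saferproducts.gov/RestWebServices/Recall"

    def find_recalls(search_term):
        """Return recall records whose title mentions the search term."""
        query = urllib.parse.urlencode({"format": "json", "RecallTitle": search_term})
        with urllib.request.urlopen(f"{RECALL_ENDPOINT}?{query}") as resp:
            return json.load(resp)

    if __name__ == "__main__":
        # A marketplace could run this check before publishing a used-goods listing.
        for recall in find_recalls("drop-side crib"):
            print(recall.get("RecallDate"), recall.get("Title") or recall.get("RecallTitle"))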

Former CIO Vivek Kundra often touted government recall apps based upon government data during his tenure. Is this API the same thing, shared again, or something new?

Park: I think the smartest thing the government can do with data like product recalls data is not build our own shopping sites, or our own product information sites: it's to get the information out there in machine-readable form, so that lots and lots of other platforms that already have audiences of millions of people, and that are really good at creating shopping or product comparison experiences, get the data into their hands and can integrate it seamlessly into what they do. I feel that that's really the core play that the government should be engaged in.

I don't know if the Safer Products API was included in the recall app. What I do know is that before 2011, you had to FOIA to get the data. I think that even if the government included it in some app the government built, it's important for it to get used by lots and lots of other apps that have a collective audience that's massively greater than any app the government could itself build.

Another example of this is the Hospital Compare website. The Hospital Compare website has been around for a long time. Nobody knows about it. There was a survey done that found 94% of Americans didn't know that hospital quality data was available, let alone that there was a Hospital Compare website. So the notion was A) making the Hospital Compare data downloadable and B) deploying it in API form, which we actually did a year and a half ago at Medicare.gov.

That then makes the data much easier for lots of other platforms to incorporate, platforms that are far more likely than HospitalCompare.gov to be able to present the information in actionable forms for citizens. Even if we build our own apps, we have to get this data out to lots of other people who can help people with it. To do that, we have to make it machine-readable, we have to put it into RESTful APIs — or at least make it downloadable — and get the word out to entrepreneurs that it's something they can use.

This is a stunning arbitrage opportunity. Even if you take all this data and you "API-ize" it, it's not automatic that entrepreneurs are going to know it's there.

Let's assume that the hospital quality data is good — which it is — and that you build it, and put it into an API. If nobody knows about it, you've delivered no value to the American people. People don't care whether you API a bunch of data. What they care about is that when they need to find a hospital, like I did, for my baby, I can get that information.

The private sector, in the places where we have pushed the pedal to the metal on this, has just demonstrated the incredible ability to make this data a lot more relevant and help a lot more people with it than we could have by ourselves.

White House photo used on associated home and category pages: white house by dcJohn, on Flickr

May 22 2012

Data journalism research at Columbia aims to close data science skills gap

Successfully applying data science to the practice of journalism requires more than providing context and finding clarity in vast amounts of unstructured data: it will require media organizations to think differently about how they work and who they venerate. It will mean evolving toward a multidisciplinary approach to delivering stories, where reporters, videographers, news application developers, interactive designers, editors and community moderators collaborate on storytelling, instead of being segregated by departments or buildings.

The role models for this emerging practice of data journalism won't be found on broadcast television or on the lists of the top journalists over the past century. They're drawn from the increasing pool of people who are building new breeds of newsrooms and extending the practice of computational journalism. They see the reporting that provisions their journalism as data, a body of work that can itself be collected, analyzed, shared and used to create longitudinal insights about the ways that society, industry or government are changing. (Or not, as the case may be.)

In a recent interview, Emily Bell (@EmilyBell), director of the Tow Center for Digital Journalism at the Columbia University School of Journalism, offered her perspective about what's needed to train the data journalists of the future and the changes that still need to occur in media organizations to maximize their potential. In this context, while the roles of institutions and journalism education are themselves evolving, both will still fundamentally matter for "what's next," as practitioners adapt to changing newsonomics.

Our discussion took place in the context of a notable investment in the future of data journalism: a $2 million research grant to Columbia University from the Knight Foundation to research and distribute best practices for digital reportage, data visualizations and measuring impact. Bell explained how the research effort will help newsrooms determine what's next on the Knight Foundation's blog:

The knowledge gap that exists between the cutting edge of data science, how information spreads, its effects on people who consume information and the average newsroom is wide. We want to encourage those with the skills in these fields and an interest and knowledge in journalism to produce research projects and ideas that will both help explain this world and also provide guidance for journalism in the tricky area of ‘what next’. It is an aim to produce work which is widely accessible and immediately relevant to both those producing journalism and also those learning the skills of journalism.

We are focusing on funding research projects which relate to the transparency of public information and its intersection with journalism, research into what might broadly be termed data journalism, and the third area of ‘impact’ or, more simply put, what works and what doesn’t.

Our interview, lightly edited for content and clarity, follows.

What did you do before you became director of the Tow Center for Digital Journalism?

I spent ten years where I was editor-in-chief of The Guardian website. During the last four of those, I was also overall director of digital content for all The Guardian properties. That included things like mobile applications, et cetera, but from the editorial side.

Over the course of that decade, you saw one or two things change online, in terms of what journalists could do, the tools available to them and the news consumption habits of people. You also saw the media industry change, in terms of the business models and institutions that support journalism as we think of it. What are the biggest challenges and opportunities for the future of journalism?

For newspapers, there was an early warning system: newspaper circulation has not really risen consistently since the early 1980s. We had a long trajectory of increased production and, actually, an overall systemic decline, which was masked by a very, very healthy advertising market. That market went on an incredible bull run at a time of more static pictures and just "widening the pipe," which I think fooled a lot of journalism outlets and publishers into thinking that that was the real disruption.

And, of course, it wasn’t.

The real disruption was the ability of anybody anywhere to upload multimedia content and share it with anybody else who was on a connected device. That was the thing that really hit hard, when you look at 2004 onwards.

What journalism has to do is reinvent its processes, its business models and its skillsets to function in a world where human capital does not scale well, in terms of sifting, presenting and explaining all of this information. That’s really the key to it.

The skills that journalists need to do that -- including identifying a story, knowing why something is important and putting it in context -- are incredibly important. But how you do that, which particular elements you now use to tell that story are changing.

Those now include the skills of understanding the platform that you’re operating on and the technologies which are shaping your audiences’ behaviors and the world of data.

By data, I don’t just mean large caches of numbers you might be given or might be released by institutions: I mean that the data thrown off by all of our activity, all the time, is simply transforming the speed and the scope of what can be explained and reported on and identified as stories at a really astonishing speed. If you don’t have the fundamental tools to understand why that change is important and you don’t have the tools to help you interpret and get those stories out to a wide public, then you’re going to struggle to be a sustainable journalist.

The challenge for sustainable journalism going forward is not so different from what exists in other industries: there's a skills gap. Data scientists and data journalists use almost the exact same tools. What are the tools and skills that are needed to make sense of all of this data that you talked about? What will you do to catalog and educate students about them?

It's interesting when you say that the skills of these two disciplines are very similar, which is absolutely right. First of all, you need a basic level of numeracy -- and maybe not just a basic level, but a more sophisticated understanding of statistical analysis. That's not something which is routinely taught in journalism schools, but I think it will increasingly have to be.

The second thing is having some coding skills or some computer science understanding to help with identifying the best, most efficient tools and the various ways that data is manipulated.

The third thing is that when you're talking about 'data scientists,' it's really a combination of those skills. Adding data doesn't mean you no longer need the other journalism skills, which do not change: understanding context, understanding what the story might be, and knowing how to derive that from the data that you're given or the data that exists. If it's straightforward, how do you collect it? How do you analyze it? How do you interpret it and present it?

It’s easy to say, but it’s difficult to do. It’s particularly difficult to reorient the skillsets of an industry which have very much resided around the idea of a written story and an ability with editing. Even in the places where I would say there’s sophisticated use of data in journalism, it’s still a minority sport.

I’ve talked to several heads of data in large news organizations and they’ve said, “We have this huge skills gap because we can find plenty of people who can do the math; we can find plenty of people who are data scientists; we can’t find enough people who have those skills but also have a passion or an interest in telling stories in a journalistic context and making those relatable.”

You need a mindset which is about putting this in the context of the story and spotting stories, as well as having creative and interesting ideas about how you can actually collect this material for your own stories. It's not a passive kind of processing function if you're a data journalist: it's an active seeking, inquiring and discovery process. I think that that's something which is actually available to all journalists.

Think about just local information and how local reporters go out and speak to people every day on the beat, collect information, et cetera. At the moment, most reporters don't structure the information they get from those entities in a way that will help them find patterns and build new stories in the future.

This is not just about an amazing graphic that the New York Times does with census data over the past 150 years. This is about almost every story. Almost every story has some component of reusability or a component where you can collect the data in a way that helps your reporting in the future.

Doing that requires a level of knowledge about the tools that you're using, like coding, Google Refine or Fusion Tables. There are lots of freely available tools out there that are making this easier. But if you don't have the mindset that understands why this is going to help you and make you a better reporter, then it's sometimes hard to motivate journalists to see why they might want to grab on.
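As a concrete, if simplified, illustration of the kind of structured collection Bell describes, here is a sketch using Python and pandas to keep beat reporting in a reusable form. The file name and columns are invented for the example; the point is only that consistently structured notes can be queried again later.

```python
# Minimal sketch: keeping beat notes structured so they can be re-queried later.
# The CSV file and its columns (date, agency, amount, note) are invented for
# illustration; any consistently structured log would work the same way.
import pandas as pd

spending = pd.read_csv("council_spending_log.csv", parse_dates=["date"])

# Clean obvious inconsistencies so later aggregation is trustworthy.
spending["agency"] = spending["agency"].str.strip().str.title()

# Which agencies' spending has grown the most year over year?
by_year = (
    spending.groupby([spending["date"].dt.year, "agency"])["amount"]
    .sum()
    .unstack("agency")
)
print(by_year.pct_change())
```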

The other thing to say, which is really important, is there is currently a lack of both jobs and role models for people to point to and say, “I want to be that person.”

I think the final thing I would say to the industry is we're getting a lot of smart journalists now. We are one of the schools where all of our digital concentration students this year get a basic grounding in data journalism. Every single one of them. We have an advanced course taught by Susan McGregor in data visualization. But we're producing people from the school now who are being hired to do these jobs, and the people who are hiring them are saying, "Write your own job description, because we know we want you to do something, we just don't quite know what it is. Can you tell us?"

You can’t cookie-cutter these people out of schools and drop them into existing roles in newsrooms, because those roles are still developing. What we’re seeing are some very smart reporters with data-centric mindsets and also the ability to do these stories -- but they want to be out reporting. They don’t want to be confined to a desk and a spreadsheet. Some editors find that very hard to understand: “Well, what does that job look like?”

I think that this is where working with the industry, we can start to figure some of these things out, produce some experimental work or stories, and do some of the thinking in the classroom that helps people figure out what this whole new world is going to look like.

What do journalism schools need to do to close this 'skills gap?' How do they need to respond to changing business models? What combination of education, training and hands-on experience must they provide?

One of the first things they need to do is identify the problem clearly and be honest about it. I like to think that we’ve done that at Columbia, although I’m not a data journalist. I don’t have a background in it. I’m a writer. I am, if you like, completely the old school.

But one of the things I did do at The Guardian was help people who early on said to me, “Some of this transformation means that we have to think about data as being a core part of what we do.” Because of the political context and the position I was in, I was able to recognize that that was an important thing that they were saying, and we could push through changes and adoption in those areas of the newsroom.

That’s how The Guardian became interested in data. It’s the same in journalism school. One of the early things that we talked about [at Columbia] was how we needed to shift some of what the school did on its axis and acknowledge that this was going to be a key part of what we do in the future. Once we acknowledged that that is something we had to work towards, [we hired] Susan McGregor from the Wall Street Journal’s Interactive Team. She’s an expert in data journalism and has an MA in technology in education.

If you say to me, “Well, what’s the grand vision here?” I would say the same thing I would say to anybody: over time, and hopefully not too long a course of time, we want to attract a type of student that is interested in and capable of this approach. That means getting out and motivating and talking to people. It means producing attractive examples which high school children and undergraduate programs think about [in their studies]. It means talking to the CS [computer science] programs -- and, in fact, more about talking to those programs and math majors than you would be talking to the liberal arts professors or the historians or the lawyers or the people who have traditionally been involved.

I think that has an effect: it starts to show people who are oriented towards storytelling but have capabilities which align more with data science skill sets that there’s a real task for them. We can’t message that early enough as an industry. We can’t message it early enough as educators to get people into those tracks. We have to really make sure that the teaching is high quality and that we’re not just carried away with the idea of the new thing; we need to think pretty deeply about how we get those skills.

What sort of basic statistical teaching do you need? What are the skills you need for data visualization? How do you need to introduce design as well as computer science skills into the classroom, in a way which makes sense for stories? How do you tier that understanding?

You're always going to produce superstars. Hopefully, we’ll be producing superstars in this arena soon as well.

We need to take the mission seriously. Then we need to build resources around it. And that’s difficult for educational organizations because it takes time to introduce new courses. It takes time to signal that this is something you think is important.

I think we’ve done a reasonable job of that so far at Columbia, but we’ve got a lot further to go. It's important that institutions like Columbia do take the lead and demonstrate that we think this is something that has to be a core curriculum component.

That’s hard, because journalism schools are known for producing writers. They’re known for different types of narratives. They are not necessarily lauded for producing math or computer science majors. That has to change.


May 16 2012

How to start a successful business in health care at Health 2.0 conference

Great piles of cash are descending on entrepreneurs who develop health care apps, but that doesn't make it any easier to create a useful one that your audience will adopt. Furthermore, lowered costs and streamlined application development techniques let you fashion a working prototype faster than ever, but that also reduces the time you can fumble around looking for a business model. These were some of the insights I got at Spring Fling 2012: Matchpoint Boston, put on by Health 2.0 this week.

This conference was a bit of a grab-bag, including one-on-one meetings between entrepreneurs and their potential funders and customers, keynotes and panels by health care experts, round-table discussions among peers, and lightning-talk demos. I think the hallway track was the most potent part of this conference, and it was probably planned that way. The variety at the conference mirrors the work of Health 2.0 itself, which includes local chapters, challenges, an influential blog, and partnerships with a range of organizations. Overall, I appreciated the chance to get a snapshot of a critical industry searching for ways to make a positive difference in the world while capitalizing on ways to cut down on the blatant waste and mismanagement that bedevil the multi-trillion-dollar health care field.

Let's look, for instance, at the benefits of faster development time. Health IT companies go through fairly standard early stages (idea, prototype, incubator, venture capital funding) but cochairs Indu Subaiya and Matthew Holt showed slides demonstrating that modern techniques can leave companies in the red for less time and accelerate earnings. On the other hand, Jonathan Bush of athenahealth gave a keynote listing bits of advice for company founders and admitting that his own company had made significant errors that required time to recover from. Does the fast pace of modern development leave less room for company heads to make the inevitable mistakes?

I also heard Margaret Laws, director of the California HealthCare Foundation's Innovations Fund, warn that most of the current applications being developed for health care aim to salve common concerns among doctors or patients but don't address what she calls the "crisis points" in health care. Brad Fluegel of Health Evolution Partners observed that, with the flood of new entrepreneurs in health IT, a lot of old ideas are being recycled without adequate attention to why they failed before.

I'm afraid this blog is coming out too negative, focusing on the dour and the dire, but I do believe that health IT needs to acknowledge its risks in order to avoid squandering the money and attention it's getting, and on the positive side to reap the benefits of this incredibly fertile moment of possibilities in health care. Truly, there's a lot to celebrate in health IT as well. Here are some of the fascinating start-ups I saw at the show:

  • hellohealth aims at that vast area of health care planning and administration that cries out for efficiency improvements--the area where we could do the most good by cutting costs without cutting back on effective patient care. Presenter Shahid Shah described the company as the intersection of patient management with revenue cycle management. They plan to help physicians manage appointments and follow-ups better, and rationalize the whole patient experience.

  • hellohealth will offer portals for patients as well. They're unique, so far as I know, in charging patients for certain features.

  • Corey Booker demo'd onPulse, which aims to bring together doctors with groups of patients, and patients with groups of the doctors treating them. For instance, when a doctor finds an online article of interest to diabetics, she can share it with all the patients in her practice suffering from diabetes. onPulse also makes it easier for a doctor to draw in others who are treating the same patient. The information built up about their interactions can be preserved for billing.

    onPulse overlaps in several respects with HealthTap, a doctor-patient site that I've covered several times and for which an onPulse staffer expressed admiration. But HealthTap leaves discussions out in the open, whereas onPulse connects doctors and patients in private.

  • HealthPasskey.com is another one of these patient/doctor services with a patient portal. It allows doctors to upload continuity of care documents in the standard CCD format to the patient's site, and supports various services such as making appointments.

    A couple weeks ago I reported a controversy over hospitals' claims that they couldn't share patient records with the patients. Check out the innovative services I've just highlighted here as a context for judging whether the technical and legal challenges for hospitals are really too daunting. I recognize that each of the sites I've described picks off particular pieces of the EHR problem and that opening up the whole kit and caboodle is a larger task, but these sites still prove that all the capabilities are in place for institutions willing to exploit them.

  • GlobalMed has recently released a suitcase-sized box that contains all the tools required to do a standard medical exam. This allows traveling nurse practitioners or other licensed personnel to do a quick check-up at a patient's location without requiring a doctor or a trip to the clinic. Images can also be taken. Everything gets uploaded to a site where a doctor can do an assessment and mark up records later. The suitcase weighs about 30 pounds, rolls on wheels, and costs about $30,000 (price to come down if they start manufacturing in high quantities).

  • SwipeSense won Health 2.0's 100 Day Innovation Challenge. They make a simple device that hospital staff can wear on their belts and wipe their hands on. This may not be as good as washing your hands, but takes advantage of people's natural behavior and reduces the chance of infections. It also picks up when someone is using the device and creates reports about compliance. SwipeSense is being tested at the Rush University Medical Center.

  • Thryve, one of several apps that helps you track your food intake and make better choices, won the highest audience approval at Thursday's Launch! demos.

  • Winner of last weekend's developer challenge was No Sleep Kills, an app that aims to reduce accidents related to sleep deprivation (I need a corresponding app to guard against errors from sleep-deprived blogging). You can enter information on your recent sleep patterns and get back a warning not to drive.

It's worth noting that the last item in that list, No Sleep Kills, draws information from Health and Human Services's Healthy People site. This raises the final issue I want to bring up in regard to the Spring Fling. Sophisticated developers know their work depends heavily on data about public health and on groups of patients. HHS has actually just released another major trove of public health statistics. Our collective knowledge of who needs help, what works, and who best delivers the care would be immensely enhanced if doctors and institutions who currently guard their data would be willing to open it up in aggregate, non-identifiable form. I recently promoted this ideal in coverage of Sage Congress.

In the entirely laudable drive to monetize improvements in health care, I would like the health IT field to choose solutions that open up data rather than keep it proprietary. One of the biggest problems with health care, in this age of big data and incredibly sophisticated statistical tools, is our tragedy of the anti-commons where each institution seeks to gain competitive advantage through hoarding its data. They don't necessarily use their own data in socially beneficial ways, either (they're more interested in ratcheting up opportunities for marketing expensive care). We need collective sources of data in order to make the most of innovation.

OSCON 2012 Healthcare Track — The conjunction of open source and open data with health technology promises to improve creaking infrastructure and give greater control and engagement to patients. Learn more at OSCON 2012, being held July 16-20 in Portland, Oregon.

Save 20% on registration with the code RADAR20

May 04 2012

Top Stories: April 30-May 4, 2012

Here's a look at the top stories published across O'Reilly sites this week.

The U.K.'s battle for open standards
Influence, money, a bit of drama — not things you typically associate with open standards, yet that's what the U.K. government is facing as it evaluates open options.

Mobile web development isn't slowing down
Over the last two years, mobile web development has continued its rapid evolution. In this interview, Fluent speaker and "Programming the Mobile Web" author Maximiliano Firtman discusses the short-term changes that caught his attention.

Editorial Radar: Functional languages
O'Reilly editors Mike Loukides and Mike Hendrickson discuss the advantages of functional programming languages and how functional language techniques can be deployed with almost any language.


Jason Grigsby and Lyza Danger Gardner on mobile web design
In this Velocity podcast, the co-authors of "Head First Mobile Web" discuss mobile website optimization, mobile design considerations, and common mobile development mistakes.

Parliament / Big Ben photo: UK parliament by Alan Cleaver, on Flickr


Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference, May 29 - 31 in San Francisco. Save 20% on registration with the code RADAR20.

May 03 2012

Strata Week: Google offers big data analytics

Here are the data stories that caught my attention this week.

BigQuery for everyone

Google has released its big data analytics service BigQuery to the public. Initially made available to a small number of developers late last year, the service is now open to anyone who signs up. A free account lets you query up to 100 GB of data per month, with the option to pay for additional queries and/or storage.

"Google's aim may be to sell data storage in the cloud, as much as it is to sell analytic software," says The New York Times' Quentin Hardy. "A company using BigQuery has to have data stored in the cloud data system, which costs 12 cents a gigabyte a month, for up to two terabytes, or 2,000 gigabytes. Above that, prices are negotiated with Google. BigQuery analysis costs 3.5 cents a gigabyte of data processed."

The interface for BigQuery is meant to lower the bar for these sorts of analytics — there's a UI and a REST interface. In the Times article, Google project manager Ju-kay Kwek says Google is hoping developers build tools that encourage widespread use of the product by executives and other non-developers.

If folks are looking for something to cut their teeth on with BigQuery, GitHub's public timeline is now a publicly available dataset. The data is being synced regularly, so you can query things like popular languages and popular repos. To that end, GitHub is running a data visualization contest.
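For a sense of what cutting your teeth on that dataset might look like, here is a sketch that counts the most common repository languages. The table and column names are assumptions about the public GitHub timeline sample's schema, and the google-cloud-bigquery client shown is the current Python library rather than the 2012-era API described above.

```python
# Minimal sketch: ranking languages in the public GitHub timeline on BigQuery.
# The dataset/table and column names are assumptions about the public sample's
# schema; google-cloud-bigquery is the modern client, not the 2012-era API.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default Google Cloud credentials/project

query = """
    SELECT repository_language AS language, COUNT(*) AS events
    FROM `publicdata.samples.github_timeline`
    WHERE repository_language IS NOT NULL
    GROUP BY language
    ORDER BY events DESC
    LIMIT 10
"""

for row in client.query(query).result():
    print(row.language, row.events)
```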

The Data Journalism Handbook

The Data Journalism Handbook had its release this week at the 2012 International Journalism Festival in Italy. The book, which is freely available and openly licensed, was a joint effort of the European Journalism Centre and the Open Knowledge Foundation. It's meant to serve as a reference for those interested in the field of data journalism.

In the introduction, Deutsche Welle's Mirko Lorenz writes:

"Today, news stories are flowing in as they happen, from multiple sources, eye-witnesses, blogs, and what has happened is filtered through a vast network of social connections, being ranked, commented and more often than not, ignored. This is why data journalism is so important. Gathering, filtering and visualizing what is happening beyond what the eye can see has a growing value."


Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.



Save 20% on registration with the code RADAR20

Open data is a joke?

Tom Slee fired a shot across the bow of the open data movement with a post this week arguing that "the open data movement is a joke." Moreover, it's not a movement at all, he contends. Slee turns a critical eye to the Canadian government's open data efforts in particular, noting that: "The Harper government's actions around 'open government,' and the lack of any significant consequences for those actions, show just how empty the word 'open' has become."

Slee is also critical of open data efforts outside the government, calling the open data movement "a phrase dragged out by media-oriented personalities to cloak a private-sector initiative in the mantle of progressive politics."

Open data activist David Eaves responded strongly to Slee's post with one of his own, acknowledging his own frustrations with "one of the most — if not the most — closed and controlling [governments] in Canada's history." But Eaves takes exception to the ways in which Slee characterizes the open data movement. He contends that many of the corporations involved with the open data movement — something Slee charges has corrupted open data — are U.S. corporations (and points out that in Canada, "most companies don't even know what open data is"). Eaves adds, too, that many of these corporations are led by geeks.

Eaves writes:

"Just as an authoritarian regime can run on open-source software, so too might it engage in open data. Open data is not the solution for Open Government (I don't believe there is a single solution, or that Open Government is an achievable state of being — just a goal to pursue consistently), and I don't believe anyone has made the case that it is. I know I haven't. But I do believe open data can help. Like many others, I believe access to government information can lead to better informed public policy debates and hopefully some improved services for citizens (such as access to transit information). I'm not deluded into thinking that open data is going to provide a steady stream of obvious 'gotcha moments' where government malfeasance is discovered, but I am hopeful that government data can arm citizens with information that the government is using to inform its decisions so that they can better challenge, and ultimately help hold accountable, said government."

Got data news?

Feel free to email me.


May 02 2012

The UK's battle for open standards

Many of you are probably not aware, but there is an ongoing battle within the U.K. that will shape the future of the U.K. tech industry. It's all about open standards.

Last year, the Cabinet Office ran a consultation on open standards covering 970 CIOs and academics. The result of this consultation was a policy (PDF) in favour of royalty-free (RF) open standards in the U.K. I'm not going to go through the benefits of open standards in this space, other than to note that they are essential for the U.K.'s future competitive position, for spurring on innovation and creating a level playing field within the tech field. For those who wish to read more on this subject, Mark Thompson, the only academic I know to have published a paper on open standards in a quality peer reviewed journal, has provided an excellent overview.

Normally, I put these battles into an historical context, and I certainly have a plethora of examples of past industries attempting to lobby against future change. However, to keep this short I'll simply note that the incumbent industry has reacted to the Cabinet Office policy with attempts to redefine open standards to include non-open FRAND (fair, reasonable and non discriminatory) licenses and portray some sort of legitimate debate of RF versus FRAND, which doesn't exist.

Whilst this is clearly wrong and underhanded, there's another story I wish to focus on. It relates to the accusations that the meetings have been filled with "spokespeople for big vendors to argue in favour of paid-for software, specifically giving advocates of FRAND the chance to argue that free software on RF terms would be a bad thing" as reported by TechWeek Europe.

The back story is that since the Government policy on open standards was put in place, the Cabinet Office was pressured into a u-turn and running another consultation by various standards bodies and other vested interests. The arguments used were either fortuitous misunderstandings of the policy or willful misinformation in favour of current business interests. The Cabinet Office then appeared to relent to the pressure and undertake a second set of consultations. What happened next shows the sorry behaviour of lobbyists in our industry.

"Software patent heavyweights piled into the first public meeting," filling the room with unrepresentative views backed up by vendors flying in senior individuals from the U.S. It apparently seems that the chair of the roundtable was himself a paid lobbyist working on behalf of those vested interests, a fact that he forgot to mention to the Cabinet Office. Microsoft has now been "accused of trying to secretly influence government consultation."

What's surprising is that the majority of this had been uncovered by two journalists — Mark Ballard at Computer Weekly and Glyn Moody — who work mainly outside the mainstream media. In fact, the mainstream media has remained silent on the issue, with the notable exception of The Guardian.

The end result of the work of these two journalists is that the Cabinet Office has had to extend the consultation and, as noted by The Guardian, "rerun one of its discussion roundtables after it found that an independent facilitator of one of its discussions was simultaneously advising Microsoft on the consultation."

So, we have two plucky journalists who stand alone uncovering large corporations that are bullying Government to protect profits worth hundreds of millions. Our heroes' journey uncovers gerrymandering, skullduggery, rampant conflicts of interests, dubious ethics and a host of other sordid details and ... hold on, this sounds like a Hollywood script, not real life. Why on earth isn't mainstream media all over this, especially given the leaked Bell Pottinger memo on exploiting citizen initiatives?

The silence makes me wonder whether investigative journalism into things that might matter and might make a positive difference doesn't sell much advertising? Would it help if the open standards battle had celebrity endorsement? Alas, that's not the case and the battle for open standards might have been extended, but it is still ongoing. This issue is as important to the U.K. as SOPA / PIPA were to the U.S., but rather than fighting against a Government trying to do something that harms the growth of future industry, we are fighting with a Government trying to do the right thing and benefit a nation.

If you're too busy to help, that's understandable, but don't ever grumble about why the U.K. Government doesn't do more to support open standards and open source. The U.K. Government is trying to make a difference. It's trying to fight a good fight against a huge and well-funded lobby, but it needs you to turn up.

The battle for open standards needs help, so get involved.


April 18 2012

What responsibilities and challenges come with open government?

A historic Open Government Partnership launched in New York City last September with 8 founding countries. Months later, representatives from 73 countries and 55 governments have come together to present their open government action plans and formally endorse the principles in the Open Government Partnership. Yesterday, hundreds of attendees from government, civil society, media and the private sector watched in person and online as Brazilian President Dilma Rousseff spoke about her country's efforts to root out corruption and engage the Brazilian people in governance and more active citizenship. United States Secretary of State Hillary Clinton preceded her, defining an open or closed society as a key dividing line of the 21st century.

Today's agenda includes more regional breakouts and an opening plenary session on the "Responsibility and Challenges that Come with Openness." If you have an Internet connection, you should be able to watch the discussion in the embedded player below:

Watch live streaming video from ogp2012 at livestream.com

The plenary will feature Walid al-Saqaf of YemenPortal.net & Alkasir; Minister Francis Maude from the United Kingdom; Tunisian Secretary of State Ben Abbes; and Fernando Rodrigues, an investigative journalist from Folha de São Paulo in Brazil.

The liveblog of the entire proceedings is embedded below.



April 10 2012

Open source is interoperable with smarter government at the CFPB

When you look at the government IT landscape of 2012, federal CIOs are being asked to address a lot of needs. They have to accomplish their missions. They need to be able to scale initiatives to tens of thousands of agency workers. They're under pressure to address not just network security but web security and mobile device security. They also need to be innovative, because all of this is supported by the same or less funding. These are common requirements in every agency.

As the first federal "start-up agency" in a generation, some of those needs at the Consumer Financial Protection Bureau (CFPB) are even more pressing. On the other hand, the opportunity for the agency to be smarter, leaner and "open from the beginning" is also immense.

Progress establishing the agency's infrastructure and culture over the first 16 months has been promising, save for the larger context of getting a director at the helm. Enabling open government by design isn't just a catchphrase at the CFPB. There has been a bold vision behind the CFPB from the outset, where a 21st century regulator would leverage new technologies to find problems in the economy before the next great financial crisis escalates.

In the private sector, there's great interest right now in finding actionable insight in large volumes of data. Making sense of big data is increasingly being viewed as a strategic imperative in the public sector as well. Recently, the White House put its stamp on that reality with a $200 million big data research and development initiative, including a focus on improving the available tools. There's now an entire ecosystem of software around Hadoop, which is itself open source code. The problem that now exists in many organizations, across the public and private sector, is not so much that the technology to manipulate big data isn't available: it's that the expertise to apply big data doesn't exist in-house. The data science talent shortage is real.

People who work and play in the open source community understand the importance of sharing code, especially when that action leads to improving the code base. That's not necessarily an ethic or a perspective that has been pervasive across the federal government. That does seem to be slowly changing, with leadership from the top: the White House used Drupal for its site and has since contributed modules back into the open source community, including one that helps with 508 compliance.

In an in-person interview last week, CFPB CIO Chris Willey (@ChrisWilleyDC) and acting deputy CIO Matthew Burton (@MatthewBurton) sat down to talk about the agency's new open source policy, government IT, security, programming in-house, the myths around code-sharing, and big data.

The fact that this government IT leadership team is strongly supportive of sharing code back to the open source community is probably the most interesting part of this policy, as Scott Merrill picked up in his post on the CFPB and Github.

Our interview follows.

In addition to leading the CFPB's development team over the past year and a half, Burton was just named acting deputy chief information officer. What will that mean?

Willey: He hasn't been leading the software development team the whole time. In fact, we only really had an org chart as of October. In the time that he's been here, Matt has led his team to some amazing things. We're going to talk about one of them today, but we've also got a great intranet. We've got some great internal apps that are being built and that we've built. We've unleashed one version of the supervision system that helps bank examiners do their work in the field. We've got a lot of faith he's going to do great things.

What it actually means is that he's going to be backing me up as CIO. Even though we're a fairly small organization, we have an awful lot going on. We have 76 active IT projects, for example. We're just building a team. We're actually doubling in size this fiscal year, from about 35 staff to 70, as well as adding lots of contractors. We're just growing the whole pie. We've got 800 people on board now. We're going to have 1,100 on board in the whole bureau by the end of the fiscal year. There's a lot happening, and I recognize we need to have some additional hands and brain cells helping me out.

With respect to building an internal IT team, what's the thinking behind having technical talent inside of an agency like this one? What does that change, in terms of your relationship with technology and your capacity to work?

Burton: I think it's all about experimentation. Having technical people on staff allows an organization to do new things. I think the way most agencies work is that when they have a technical need, they don't have the technical people on staff to make it happen so instead, that need becomes larger and larger until it justifies the contract. And by then, the problem is very difficult to solve.

By having developers and designers in-house, we can constantly be addressing things as they come up. In some cases, before the businesses even know it's a problem. By doing that, we're constantly staying ahead of the curve instead of always reacting to problems that we're facing.

How do you use open source technology to accomplish your mission? What are the tools you're using now?

Willey: We're actually trying to use open source in every aspect of what we do. It's not just in software development, although that's been a big focus for us. We're trying to do it on the infrastructure side as well.

As we look at network and system monitoring, we look at the tools that help us manage the infrastructure. As I've mentioned in the past, we are 100% in the cloud today. Open source has been a big help for us in giving us the ability to manipulate those infrastructures that we have out there.

At the end of the day, we want to bring in the tools that make the most sense for the business needs. It's not about only selecting open source or having necessarily a preference for open source.

What we've seen is that over time, the open source marketplace has matured. A lot of tools that might not have been ready for prime time a year ago or two years ago are today. By bringing them into the fold, we potentially save money. We potentially have systems that we can extend. We could more easily integrate with the other things that we have inside the shop that maybe we built or maybe things that we've acquired through other means. Open source gives us a lot of flexibility because there's a lot of opportunities to do things that we might not be able to do with some proprietary software.

Can you share a couple of specific examples of open source tools that you're using and what you actually use them for within mission?

Willey: On network monitoring, for example, we're using ZFS, which is an open source monitoring tool. We've been working with Nagios as well. Nagios, we actually inherited from Treasury — and while Treasury's not necessarily known for its use of open source technologies, it uses that internally for network monitoring. Splunk is another one that we have been using for web analysis. [After the interview, Burton and Willey also shared that they built the CFPB's intranet on MediaWiki, the software that drives Wikipedia.]

Burton: On the development side, we've invested a lot in Django and WordPress. Our site is a hybrid of them. It's WordPress at its core, with Django on top of that.

In November of 2010, actually a few weeks before I started here, Merici [Vinton] called me and said, "Matt, what should we use for our website?"

And I said, "Well, what's it going to do?"

And she said, "At first, it's going to be a blog with a few pages."

And this website needed to be up and running by February. And there was no hosting; there was nothing. There were no developers.

So I said, "Use WordPress."

And by early February, we had our website up. I'm not sure that would have been possible if we had to go through a lengthy procurement process for something not open source.

We use a lot of jQuery. We use Linux servers. For development ops, we use Selenium and Jenkins and Git to manage our releases and source code. We actually have GitHub Enterprise, which although not open source, is very sharing-focused. It encourages sharing internally. And we're using GitHub on the public side to share our code. It's great to have the same interface internally as we're using externally.

Developers and citizens alike can go to github.com/cfpb and see code that you've released back to the public and for other federal agencies. What projects are there?

Burton: These are the ones that came up between basic building blocks. They range from code that may not strike an outside developer as that interesting but that's really useful for the government, all the way to things that we created from scratch that are very developer-focused and are going to be very useful for any developer.

On the first side of that spectrum, there's an app that we made for transit subsidy enrollment. Treasury used to manage our transit subsidy balances. That involved going to a webpage that you would print out, write on with a pen and then fax to someone.

Willey: Or scan and email it.

Burton: Right. And then once you'd had your supervisor sign it and faxed it over to someone, eventually, several weeks later, you would get your benefits. We started to take over that process, and the human resources office came to us and asked, "How can we do this better?"

Obviously, that should just be a web form that you type into, one that auto-fills any details it knows about you. You press submit and it goes into the database, which goes directly to the DOT [Department of Transportation]. So that's what we made. We demoed that for DOT and they really liked it. USAID is also into it. It's encouraging to see that something really simple could prove really useful for other agencies.
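A minimal sketch of the kind of form Burton describes might look like the following, using Django (which the team says it invests in heavily). The model fields and the pre-fill behavior are illustrative assumptions, not the CFPB's actual application; in a real project these pieces would live in an app's models.py, forms.py and views.py.

```python
# Minimal sketch of a transit-subsidy enrollment form in Django. The model
# fields and pre-fill behavior are illustrative assumptions, not the CFPB's
# actual application.
from django import forms
from django.db import models
from django.shortcuts import redirect, render


class TransitSubsidyRequest(models.Model):
    employee_name = models.CharField(max_length=200)
    email = models.EmailField()
    monthly_commute_cost = models.DecimalField(max_digits=7, decimal_places=2)
    submitted_at = models.DateTimeField(auto_now_add=True)


class TransitSubsidyForm(forms.ModelForm):
    class Meta:
        model = TransitSubsidyRequest
        fields = ["employee_name", "email", "monthly_commute_cost"]


def enroll(request):
    # Pre-fill whatever is already known about the logged-in employee, so the
    # form replaces the print/sign/fax workflow with a single submission.
    initial = {
        "employee_name": request.user.get_full_name(),
        "email": request.user.email,
    }
    form = TransitSubsidyForm(request.POST or None, initial=initial)
    if form.is_valid():
        form.save()  # lands in the database for hand-off to DOT
        return redirect("enrollment-complete")
    return render(request, "transit/enroll.html", {"form": form})
```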

On the other side of the spectrum, we use a lot of Django tools. As an example, we have a tool we just released through our website called "Ask CFPB." It's a Django-based question and answer tool, with a series of questions and answers.

Now, the content is managed in Django. All of the content is managed from our staging server behind the firewall. When we need to get that content, we need to get the update from staging over to production.

Before, what we had to do was pick up the entire database, copy it and then move it over to production, which was kind of a nightmare. And there was no Django tool for selectively moving data modifications.

So we sat there and we thought, "Oh, we really need something to do that, because we're going to be doing a lot of it. We can't be copying the database over every time we need to correct a copy." So two of our developers developed a Django app called "Nudge." Basically, if you've ever seen a Django admin, you just go into it and it says, "Hey, here's everything that's changed. What do you want to move over?"

You can pick and choose what you want to move over and, with the click of a button, it goes to production. I think that's something that every Django developer will have a use for if they have a staging server.

In a way, we were sort of surprised it didn't exist. So, we needed it. We built it. Now we're giving it back and anybody in the world can use it.
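Nudge itself is published on the bureau's GitHub account; the sketch below only illustrates the underlying idea of selective promotion, using Django's built-in serializers to export just the records an editor picks on staging and apply them on production. The model and the "changed since" filter are assumptions for illustration, not Nudge's actual code.

```python
# Minimal sketch of selective staging-to-production content promotion, the idea
# behind the CFPB's "Nudge" app. It uses Django's built-in serializers; the
# Answer model and the "changed since" filter are illustrative assumptions,
# not Nudge's actual implementation.
from django.core import serializers

from qna.models import Answer  # hypothetical content model on the staging site


def export_changes(since, out_path="changes.json"):
    """On staging: dump only the records an editor wants to promote."""
    changed = Answer.objects.filter(updated_at__gte=since)
    with open(out_path, "w") as fh:
        serializers.serialize("json", changed, stream=fh)


def import_changes(in_path="changes.json"):
    """On production: apply just those records, leaving everything else alone."""
    with open(in_path) as fh:
        for obj in serializers.deserialize("json", fh.read()):
            obj.save()
```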

You mentioned the cloud. I know that CFPB is very associated with Treasury. Are you using Treasury's FISMA moderate cloud?

Willey: We have a mix of what I would say are private and public clouds. On the public side, we're using our own cloud environments that we have established. On the private side, we are using Treasury for some of our apps. We're slowly migrating off of Treasury systems onto our own cloud infrastructure or our own cloud.

In the case of email, for example, we're looking at email as a service. So we'll be looking at Google, Microsoft and others just to see what's out there and what we might be able to use.

Why is it important for the CFPB to share code back to the public? And who else in the federal government has done something like this, aside from the folks at the White House?

Burton: We see it the same way that we believe the rest of the open source community sees it: The only way this stuff is going to get better and become more viable is if people share. Without that, then it'll only be hobbyists. It'll only be people who build their own little personal thing. Maybe it's great. Maybe it's not. Open source gets better by the community actually contributing to it. So it's self-interest in a lot of ways. If the tools get better, then what we have available to us gets better. We can actually do our mission better.

Using the transit subsidy enrollment application example, it's also an opportunity for government to help itself, for one agency to help another. We've created this thing. Every federal agency has a transit subsidy program. They all need to allow people to enroll in it. Therefore, it's immediately useful to any other agency in the federal government. That's just a matter of government improving its own processes.

If one group does it, why should another group have to figure it out or have to pay lots of money to have it figured out? Why not just share it internally and then everybody benefits?

Why do you think it's taken until 2012 to have that insight actually be made into reality in terms of a policy?

Burton: I think to some degree, the tools have changed. The ability to actually do this easily is a lot better now than it was even a year or two ago. Government also traditionally lags behind the private sector in a lot of ways. I think that's changing, too. With this administration in particular, I think what we've seen is that government has started to become a little bit on parity with the private sector, including some of the thinking around how to use technology to improve business processes. That's really exciting. And I think as a result, there are a lot of great people coming in as developers and designers who want to work in the federal government because they see that change.

Willey: It's also because we're new. There are two things behind that. First, we're able to sort of craft a technology philosophy with a modern perspective. So we can, from our founding, ask "What is the right way to do this?" Other agencies, if they want to do this, have to turn around decades of culture. We don't have that burden. I think that's a big reason why we're able to do this.

The second thing is a lot of agencies don't have the intense need that we do. We have 76 projects to do. We have to use every means available to us.

We can't say, "We're not going to use a large share of the software that's available to us." That's just not an option. We have to say, "Yes, we will consider this as a commercial good, just like any other piece of proprietary software."

In terms of the broader context for technology and policy, how does open source relate to open government?

Willey: When I was working for the District, Apps for Democracy was a big contest that we did around opening data and then asking developers to write applications using that data that could then be used by anybody. We said that the next logical step was to sort of create more participatory government. And in my mind, open sourcing the projects that we do is a way of asking the citizenry to participate in the active government.

So by putting something in the public space, somebody could pick that up. Maybe not the transit subsidy enrollment project — but maybe some other project that we've put out there that's useful outside of government as well as inside of government. Somebody can pick that code up, contribute to it and then we benefit. In that way, the public is helping us make government better.

When you have conversations around open source in government, what do you say about what it means to put your code online and to have people look at it or work on it? Can you take changes that people make to the code base to improve it and then use it yourself?

Willey: Everything that we put out there will be reviewed by our security team. The goal is that, by the time it's out there, it doesn't have any security vulnerabilities. If someone does discover a security vulnerability, however, we'll be sharing that code in a way that makes it much more likely that someone will point it out to us, and maybe even provide a fix, than exploit it, because it's out there. They wouldn't be exploiting our instance of the code; they would be working with the code on Github.com.

I've seen people in government with a misperception of what open source means. They hear that it's code that anyone can contribute to. I think that they don't understand that you're controlling your own instance of it. They think that anyone can come along and just write anything into your code that they like. And, of course, it's not like that.

I think as we talk more and more about this to other agencies, we might run into that, but I think it'll be good to have strong advocates in government, especially on the security side, who can say, "No, that's not the case; it doesn't work that way."

Burton: We have a firewall between our public and private instances of Git as well. So even if somebody contributes code, that's also reviewed on the way in. We wouldn't implement it unless we made sure that, from a security perspective, the code was not malicious. We're taking those precautions as well.

I can't point to one specifically, but I know that there have been articles and studies done on the relative security of open source. I think the consensus in the industry is that the peer review process of open source actually helps from a security perspective. It's not that you have a chaos of people contributing code whenever they want to. It improves the process. It's like the thinking behind academic papers. You do peer review because it enhances the quality of the work. I think that's true for open source as well.

We actually want to create a community of peer reviewers of code within the federal government. As we talk to agencies, we want people to actually use the stuff we build. We want them to contribute to it. We actually want them to be a community. As each agency contributes things, the other agencies can actually review that code and help each other from that perspective as well.

It's actually fairly hard. As we build more projects, it's going to put a little bit of a strain on our IT security team, doing an extra level of scrutiny to make sure that the code going out is safe. But the only way to get there is to grow that pie. And I think by talking with other agencies, we'll be able to do that.

A classic open source koan is that "with many eyes, all bugs become shallow." In IT security, is it that with many eyes, all worms become shallow?

Burton: What the Department of Defense said was if someone has malicious intent and the code isn't available, they'll have some way of getting the code. But if it is available and everyone has access to it, then any vulnerabilities that are there are much more likely to be corrected than before they're exploited.

How do you see open source contributing to your ability to get insights from large amounts of data? If you're recruiting developers, can they actually make a difference in helping their fellow citizens?

Burton: It's all about recruiting. As we go out and we bring on data people and software developers, we're looking for that kind of expertise. We're looking for people that have worked with PostgreSQL. We're looking for people that have worked with Solr. We're looking for people that have worked with Hadoop, because then we can start to build that expertise in-house. Those tools are out there.

R is an interesting example. What we're finding is that as more people come out of academia into the professional world, they're used to using R from school. And then they have to learn a different tool when they're actually working in the marketplace.

It's similar with the Mac versus the PC. You get people using the Mac in college — and suddenly they have to go to a Windows interface. Why impose that on them? If they're going to be extremely productive with a tool like R, why not allow that to be used?

We're starting to see, in some pockets of the bureau, push from the business side to actually use some of these tools, which is great. That's another change I think that's happened in the last couple of years.

Before, there would've been big resistance on that kind of thing. Now that we're getting pushed a little bit, we have to respond to that. We also think it's worth it that we do.


April 09 2012

The Consumer Financial Protection Bureau shares code built for the people with the people

Editor's Note: This guest post is written by Matthew Burton, the acting deputy chief information officer of the Consumer Financial Protection Bureau (@CFPB). The quiet evolution in government IT has been a long road, with many forks. In the original version of this piece, published on the CFPB's blog, Burton needed to take the time to explain what open source software is because many people in government and citizens in the country still don't understand it, unlike readers here at Radar. That's why the post below includes a short section outlining the basics of open source. — Alex Howard.


The Consumer Financial Protection Bureau (CFPB) was fortunate to be born in the digital era. We've been able to rethink many of the practices that make financial products confusing to consumers and certain regulations burdensome for businesses. We've also been able to launch the CFPB with a state-of-the-art technical infrastructure that's more stable and more cost-effective than an equivalent system was just 10 years ago.

Many of the things we're doing are new to government, which has made them difficult to achieve. But the hard part lies ahead. While our current technology is great, those of us on the CFPB's Technology & Innovation team will have failed if we're still using the same tools 10 years from now. Our goal is not to tie the Bureau to 2012's technology, but to create something that stays modern and relevant — no matter the year.

Good internal technology policies can help, especially the policy that governs our use of software source code. We are unveiling that policy today.

Source code is the set of instructions that tells software how to work. This is distinct from data, which is the content that a user inputs into the software. Unlike data, most users never see software source code; it works behind the scenes while the users interact with their data through a more intuitive, human-friendly interface.

Some software lets users modify its source code, so that they can tweak the code to achieve their own goals if the software doesn't specifically do what users want. Source code that can be freely modified and redistributed is known as "open-source software," and it has been instrumental to the CFPB's innovation efforts for a few reasons:

  • It is usually very easy and inexpensive to acquire: there are no ongoing licensing fees, so once acquired, the product is ours.
  • It keeps our data open. If we decide one day to move our website to another platform, we don't have to worry about whether the current platform is going to keep us from exporting all of our data. (Only some proprietary software keeps its data open, but all open source software does so.)
  • It lets us use tailor-made tools without having to build those tools from scratch. This lets us do things that nobody else has ever done, and do them quickly.

Until recently, the federal government was hesitant to adopt open-source software due to a perceived ambiguity around its legal status as a commercial good. In 2009, however, the Department of Defense made it clear that open source software products are on equal footing with their proprietary counterparts.

We agree, and the first section of our source code policy is unequivocal: We use open-source software, and we do so because it helps us fulfill our mission.

Open-source software works because it enables people from around the world to share their contributions with each other. The CFPB has benefited tremendously from other people's efforts, so it's only right that we give back to the community by sharing our work with others.

This brings us to the second part of our policy: When we build our own software or contract with a third party to build it for us, we will share the code with the public at no charge. Exceptions will be made when source code exposes sensitive details that would put the Bureau at risk for security breaches; but we believe that, in general, hiding source code does not make the software safer.

We're sharing our code for a few reasons:

  • First, it is the right thing to do: the Bureau will use public dollars to create the source code, so the public should have access to that creation.
  • Second, it gives the public a window into how a government agency conducts its business. Our job is to protect consumers and to regulate financial institutions, and every citizen deserves to know exactly how we perform those missions.
  • Third, code sharing makes our products better. By letting the development community propose modifications, our software will become more stable, more secure, and more powerful with less time and expense from our team. Sharing our code positions us to maintain a technological pace that would otherwise be impossible for a government agency.

The CFPB is serious about building great technology. This policy will not necessarily make that an easy job, but it will make the goal achievable.

Our policy is available in three formats: HTML, for easy access; PDF, for good presentation; and as a GitHub Gist, which will make it easy for other organizations to adopt a similar policy and will allow the public to easily track any revisions we make to the policy.
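
Because a Gist is an ordinary git repository under the hood, its revision history is public and scriptable. As a rough illustration (not part of the CFPB's policy or tooling, and with a placeholder Gist ID), a few lines of Python against GitHub's public Gist API can list every revision of such a policy document:

    # Sketch: list the revisions of a Gist via GitHub's public API.
    # The Gist ID is a placeholder, not the CFPB's actual policy Gist.
    import json
    import urllib.request

    GIST_ID = "0123456789abcdef0123"  # hypothetical

    with urllib.request.urlopen(f"https://api.github.com/gists/{GIST_ID}") as resp:
        gist = json.load(resp)

    # Every Gist is a git repository, so the API exposes its commit history.
    for revision in gist.get("history", []):
        print(revision["committed_at"], revision["version"][:7])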

If you're a coder, keep an eye on our GitHub account. We'll be releasing code for a few projects in the coming weeks.


April 05 2012

Steep climb for National Cancer Institute toward open source collaboration

Although a lot of government agencies produce open source software, hardly any develop relationships with a community of outside programmers, testers, and other contributors. I recently spoke to John Speakman of the National Cancer Institute to learn about their crowdsourcing initiative and the barriers they've encountered.

First let's orient ourselves a bit--forgive me for dumping out a lot of abbreviations and organizational affiliations here. The NCI is part of the National Institutes of Health. Speakman is the Chief Program Officer for NCI's Center for Biomedical Informatics and Information Technology. Their major open source software initiative is the Cancer Biomedical Informatics Grid (caBIG), which supports tools for transferring and manipulating cancer research data. For example, it provides access to data classifying the carcinogenic aspects of genes (The Cancer Genome Atlas) and resources to help researchers ask questions of and visualize this data (the Cancer Molecular Analysis Portal).

Plenty of outside researchers use caBIG software, but it's a one-way street, somewhat in the way the Department of Veterans Affairs used to release its VistA software. NCI sees the advantages of a give-and-take such as the CONNECT project has achieved, through assiduous cultivation of interested outside contributors, and wants to wean its outside users away from the dependent relationship that has been all take and no give. And even the VA decided last year that a more collaborative arrangement for VistA would benefit them, thus putting the software under the guidance of an independent non-profit, the Open Source Electronic Health Record Agent (OSEHRA).

Another model is Forge.mil, which the Department of Defense set up with the help of CollabNet, the well-known organization in charge of the Subversion revision control tool. Forge.mil represents a collaboration between the DoD and private contractors, encouraging them to create shared libraries that hopefully increase each contractor's productivity, but it is not open source.

The OSEHRA model--creating an independent, non-government custodian--seems a robust solution, although it takes a lot of effort and risks failure if the organization can't create a community around the project. (Communities don't just spring into being at the snap of a bureaucrat's fingers, as many corporations have found to their regret.) In the case of CONNECT, the independent Alembic Foundation stepped in to fill the gap after a lawsuit stalled CONNECT's development within the government. According to Alembic co-founder David Riley, with the contract issues resolved, CONNECT's original sponsor--the Office of the National Coordinator--is spinning off CONNECT to a private sector, open source entity, and work is underway to merge the two baselines.

Whether an agency manages its own project or spins off management, it has to invest a lot of work to turn an internal project into one that appeals to outside developers. This burden has been discovered by many private corporations as well as public entities. Tasks include:

  • Setting up public repositories for code and data.

  • Creating a clean software package with good version control that makes downloading and uploading simple.

  • Possibly adding an API to encourage third-party plugins, an effort that may require a good deal of refactoring and a definition of clear interfaces.

  • Substantially adding to the documentation.

  • General purging of internal code and data (sometimes even passwords!) that get in the way of general use.

Companies and institutions have also learned that "build it and they will come" doesn't usually work. An open source or open data initiative must be promoted vigorously, usually with challenges and competitions such as those the Department of Health and Human Services offers in its annual Health Data Initiative forums (a.k.a. datapaloozas).

With these considerations in mind, the NCI decided in the summer of 2011 to start looking for guidance and potential collaborators. Here, laws designed long ago to combat cronyism put up barriers. The NCI was not allowed to contact anyone it wanted out of the blue. Instead, it had to issue a Request for Information and talk to the people who responded. Although the RFI went online, it obviously wasn't widely seen. After all, do you regularly look for RFIs and RFPs from government agencies? If so, I can safely guess that you're paid by a large company or lobbying agency to follow a particular area of interest.

RFIs and RFPs are released as a gesture toward transparency, but in reality they just make it easier for the usual crowd of established contractors and lobbyists to build on the relationships they already have with agencies. And true to form, the NCI received only a limited set of responses, leaving it frustrated in its attempts to talk to new actors with the expertise it needed for its open source efforts.

And because the RFI had to allow a limited time window for responses, there is no point in responding to it now.

Still, Speakman and his colleagues are educating themselves and meeting with stakeholders. Cancer research is a hot topic drawing zealous attention from many academic and commercial entities, and they're hungry for data. Already, the NCI is encouraged by the initial positive response from the cancer informatics community, many of whom are eager to see the caBIG software deposited in an open repository like GitHub right away. Luckily, HHS has already negotiated terms of service with GitHub and SourceForge, removing at least one important barrier to entry. The NCI is packaging its first tool (a laboratory information management system called caLIMS) for deposit into a public repository. So I'm hoping the NCI is too caBIG to fail.

April 01 2012

What is smart disclosure?

Citizens generate an enormous amount of economically valuable data through interactions with companies and government. Earlier this year, a report from the World Economic Forum and McKinsey Consulting described the emergence of personal data as a new asset class. The value created from such data does not, however, always go to the benefit of consumers, particularly when third parties collect it, separating people from their personal data.

The emergence of new technologies and government policies has provided an opportunity to both empower consumers and create new markets from "smarter disclosure" of this personal data. Smart disclosure is when a private company or government agency provides a person with periodic access to his or her own data in open formats that enable them to easily put the data to use. Specifically, smart disclosure refers to the timely release of data in standardized, machine readable formats in ways that enable consumers to make better decisions about finance, healthcare, energy or other contexts.

Smart disclosure is "a new tool that helps provide consumers with greater access to the information they need to make informed choices," wrote Cass Sunstein, the U.S. administrator of the White House Office of Information and Regulatory Affairs (OIRA), in a post on smart disclosure on the White House blog. Sunstein delivered a keynote address at the White House Summit on smart disclosure at the U.S. National Archives on Friday. He authored a memorandum providing  guidance on smart disclosure guidance from OIRA in September 2011.

Smart disclosure is part of the final United States National Action Plan for its participation in the Open Government Partnership. Speaking at the launch of the Open Government Partnership in New York City last September, the president specifically referred to the role of smart disclosure in the United States:

"We’ve developed new tools -- called 'smart disclosures' -- so that the data we make public can help people make health care choices, help small businesses innovate, and help scientists achieve new breakthroughs," said President Obama. "We’ve been promoting greater disclosure of government information, empowering citizens with new ways to participate in their democracy," said President Obama. "We are releasing more data in usable forms on health and safety and the environment, because information is power, and helping people make informed decisions and entrepreneurs turn data into new products, they create new jobs."

In the months since the announcement, the U.S. National Science and Technology Council established a smart disclosure task force dedicated to promoting better policies and implementation across government.

"In many contexts, the federal government uses disclosure as a way to ensure that consumers know what they are purchasing and are able to compare alternatives," wrote Sunstein at the White House blog. "Consider nutrition facts labels, the newly designed automobile fuel economy labels, and ChooseMyPlate.gov.  Modern technologies are giving rise to a series of new possibilities for promoting informed decisions."

Smart disclosure is a "case of the Administration asking agencies to focus on making available high value data (as distinct from traditional transparency and accountability data) for purposes other than decreasing corruption in government," wrote New York Law School professor Beth Noveck, the former U.S. deputy chief technology officer for open government, in an email. "It starts from the premise that consumers, when given access to information and useful decision tools built by third parties using that information, can self-regulate and stand on a more level playing field with companies who otherwise seek to obfuscate." The choice of Todd Park as United States CTO also sends a message about the importance of smart disclosure to the administration, she said.

The United Kingdom's "midata" initiative is an important smart disclosure case study outside of the United States. Progress there has come in large part because the UK, unlike the United States, has a privacy law that gives citizens the right to access their personal data held by private companies. In the UK, however, companies have been complying with the law in a way that did not realize the real potential value of that right to data, which is to say that a citizen could request personal data and it would arrive in the mail weeks later at a cost of a few dozen pounds. The UK government has launched a voluntary public-private partnership to enable companies to comply with the law by making the data available online in open formats. The recent introduction of the Consumer Privacy Bill of Rights from the White House and the Privacy Report from the FTC suggests that such rights to personal data ownership might be negotiated, in principle, much as a right to credit reports has been in the past.

Four categories of smart disclosure

One of the most powerful versions of smart disclosure is when data on products or services (including pricing algorithms, quality, and features) is combined with personal data (like customer usage history, credit score, health, energy and education data) into "choice engines" (like search engines, interactive maps or mobile applications) that enable consumers to make better decisions in context, at the point of a buying or contractual decision. There are four broad categories where smart disclosure applies:

  1. When government releases data about products or services. For instance, when the Department of Health and Human Services releases hospital quality ratings, the Securities and Exchange Commission releases public company financial filings in machine-readable formats at XBRL.SEC.gov, or the Department of Education puts data about more than 7,000 institutions online in a College Navigator for prospective students.
  2. When government releases personal data about a citizen. For instance, when the Department of Veterans Affairs gives veterans access to health records using the "Blue Button" or the IRS provides citizens with online access to their electronic tax transcript. The work of BrightScope liberating financial advisor data and 401(k) data has been an early signal of how data drives the innovation economy.
  3. When a private company releases information about products or services in machine readable formats. Entrepreneurs can then use that data to empower consumers. For instance, both Billshrink.com and Hello Wallet may enhance consumer finance decisions.
  4. When a private company releases personal data about usage to a citizen. For instance, when a power utility company provides a household access to its energy usage data through the Green Button, or when banks allow customers to download their transaction histories in a machine-readable format to use at Mint.com or similar services. As with the Blue Button for healthcare data and consumer finance, the White House asserts that providing energy consumers with secure access to information about energy usage will increase innovation in the sector and empower citizens with more information.
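
A minimal sketch of the "choice engine" idea behind these categories, combining machine-readable product data with a consumer's own usage data, might look like the following. The plans and usage figures are invented for illustration; real services such as Billshrink work against far richer data.

    # Toy "choice engine": combine product data (plans) with a consumer's own
    # smart-disclosed usage data to recommend the cheapest option.
    # All figures below are invented for illustration.
    plans = [
        {"name": "Plan A", "monthly_fee": 40.0, "included_minutes": 300, "per_extra_minute": 0.25},
        {"name": "Plan B", "monthly_fee": 55.0, "included_minutes": 700, "per_extra_minute": 0.10},
        {"name": "Plan C", "monthly_fee": 70.0, "included_minutes": 1200, "per_extra_minute": 0.05},
    ]

    usage_minutes = 640  # from the consumer's own calling history

    def monthly_cost(plan, minutes):
        extra = max(0, minutes - plan["included_minutes"])
        return plan["monthly_fee"] + extra * plan["per_extra_minute"]

    for plan in plans:
        print(f"{plan['name']}: ${monthly_cost(plan, usage_minutes):.2f}")

    best = min(plans, key=lambda p: monthly_cost(p, usage_minutes))
    print(f"Recommended: {best['name']}")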

An expanding colorwheel of buttons

Should smart disclosure initiatives continue to gather steam, citizens could see "Blue Button"-like and "Green Button"-like solutions for every kind of data government or industry collects about them. For example, the Department of Defense has military training and experience records. Social Security and the Internal Revenue Service have citizens' financial histories, such as earnings and income. The Department of Veterans Affairs and the Centers for Medicare and Medicaid Services have personal health records.

More "Green Button"-like mechanisms could enable secure, private access to private industry collects about citizen services. The latter could includes mobile phone bills, credit card fees, mortgage disclosures, mutual fund fee and more, except where there are legal restrictions, as for national security reasons.

Earlier this year, influential venture capitalist Fred Wilson encouraged entrepreneurs and VCs to get behind open data. Writing on his widely read blog, Wilson urged developers to adopt the Green Button.

"This is the kind of innovation that gets me excited," Wilson wrote. "The Green Button is like OAuth for energy data. It is a simple standard that the utilities can implement on one side and web/mobile developers can implement on the other side. And the result is a ton of information sharing about energy consumption and in all likelihood energy savings that result from more informed consumers.

When citizens gain access to data and put it to work, they can tap it to make better choices about everything from finance to healthcare to real estate, much in the same way that Web applications like Hipmunk and Zillow let consumers make more informed decisions.

"I'm a big fan of simplicity and open standards to unleash a lot of innovation," wrote Wilson. "APIs and open data aren't always simple concepts for end users. Green Buttons and Blue Buttons are pretty simple concepts that most consumers will understand. I'm hoping we soon see Yellow Buttons, Red Buttons, Purple Buttons, and Orange Buttons too. Let's get behind these open data initiatives. Let's build them into our apps. And let's pressure our hospitals, utilities, and other institutions to support them."

The next generation of open data is personal data, wrote open government analyst David Eaves this month:

I would love to see the blue button and green button initiative spread to companies and jurisdictions outside the United States. There is no reason why for example there cannot be Blue Buttons on the Provincial Health Care website in Canada, or the UK. Nor is there any reason why provincial energy corporations like BC Hydro or Bullfrog Energy (there's a progressive company that would get this) couldn't implement the Green Button. Doing so would enable Canadian software developers to create applications that could use this data and help citizens and tap into the US market. Conversely, Canadian citizens could tap into applications created in the US.

The opportunity here is huge. Not only could this revolutionize citizens access to their own health and energy consumption data, it would reduce the costs of sharing health care records, which in turn could potentially create savings for the industry at large.

Data drives consumer finance innovation

Despite recent headlines about the Green Button and the household energy data market, the biggest US smart disclosure story of this type is currently consumer finance, where there is already significant private sector activity going on today.

For instance, a consumer who visits Billshrink.com can get personalized recommendations for a cheaper cell phone plan based on his or her calling history. Mint.com will make specific recommendations on how to save (and alternative products to use) based on an analysis of the accounts it is pulling data from. Hello Wallet is enabled by smart disclosure by banks and government data. The sector's success hints at the innovation that's possible when people get open, portable access to their personal data in a consumer market of sufficient size and value to attract entrepreneurial activity.

Such innovation is enabled in part because entrepreneurs and developers can go directly to data aggregation intermediaries like Yodlee or CashEdge and license the data, meaning that they do not have to strike deals directly with each of the private companies or build their own screen scraping technology, although some do go it alone.

"How do people actually make decisions?  How can data help improve those decisions in complex markets?  Research questions like these in behavioral economics are priorities for both the Russell Sage Foundation and the Alfred P. Sloan Foundation," said Daniel Goroff, a Sloan Program Director, in an interview yesterday.  "That's why we are launching a 'Smart Disclosure Research and Demonstration Design Competition.'  If you have ideas and want to win a prize,  please send Innocentive.com a short essay.  Even if you are not in a position to carry out the work, we are especially interested in finding and funding projects that can help measure the costs and benefits of existing or novel 'choice engines.'" 

What is the future of smart disclosure?

This kind of vibrant innovation could spread to many other sectors, like energy, health, education, telecommunication, food and nutrition, if relevant data were liberated. The Green Button is an early signal in this area, with the potential to spread to 27 million households around the United States. The Blue Button, with over 800,000 current users, is spreading to private health plans like Aetna and Walgreens, with the potential to spread to 21 million users.

Despite an increasing number of powerful tools that enable data journalists and scientists to interrogate data, many of even the most literate consumers do not look at data themselves, particularly if it is in machine-readable, as opposed to human-readable, formats. Instead, they digest it from ratings agencies, consumer reports and guides to the best services or products in a given area. Increasingly, entrepreneurs are combining data with applications, algorithms and improved user interfaces to provide consumers with "choice engines."

As Tim O'Reilly outlined in his keynote speech yesterday, the future of smart disclosure includes more than quarterly data disclosure from the SEC or banks. If you're really lining up with the future, you have to think about real-time data and real-time data systems, he said. Tim outlined 10 key lessons in his presentation, an annotated version of which is embedded below.

The Future of Smart Disclosure (pdf)

When released through smart disclosure, data resembles a classic "public good" in a broader economic sense. Disclosures of such open data in a useful format are currently under-produced by the marketplace, suggesting a potential role for government in the facilitation of its release. Generally, consumers do not have access to it today.

Well over a century ago, President Lincoln said that "the legitimate object of government is to do for the people what needs to be done, but which they cannot by individual effort do at all, or do so well, for themselves." The thesis behind smart disclosure in the 21st century is that when consumers have access to their personal data and the market creates new tools to put it to work, citizens will be empowered to make economic, education and lifestyle choices that enable them to live healthier, wealthier, and -- in the most aspirational sense -- happier lives.

"Moving the government into the 21st century should be applauded," wrote Richard Thaler, an economics professor at the University of Chicago, in the New York Times last year. In a time when so many citizens are struggling with economic woes, unemployment and the high costs of energy, education and healthcare, better tools that help them invest and benefit from personal data are sorely needed..

March 15 2012

Strata Week: Infographics for all

Here are some of the data stories that caught my attention this week.

More infographics incoming, thanks to Visual.ly Create

The visualization site Visual.ly launched a new tool this week that helps users create their own infographics. Aptly called Visual.ly Create, the new feature lets people take publicly available datasets (such as information from a Twitter hashtag), select a template, and publish their own infographics.

Segment from a Visual.ly Create infographic of the #strataconf hashtag.

As GigaOm's Derrick Harris observes, it's fairly easy to spot the limitations with this service — in the data you can use, in the templates that are available, and in the visualizations that are created. But after talking to Visual.ly's co-founder and Chief Content Officer Lee Sherman about some "serious customization options" that are in the works, Harris wonders if a tool like this could be something to spawn interest in data science:

"The problem is that we need more people with math skills to meet growing employer demand for data scientists and data analysts. But how do you get started caring about data in the first place when the barriers are so high? Really working with data requires a deep understanding of both math and statistics, and Excel isn't exactly a barrel of monkeys (nor are the charts it creates)."

Could Visual.ly be an on-ramp for more folks to start caring about and playing with data?

San Francisco upgrades its open data initiative

Late last week, San Francisco Mayor Ed Lee unveiled the new data.SFgov.org, a cloud-based open data website that will replace DataSF.org, one of the earliest examples of civic open data initiatives.


"By making City data more accessible to the public secures San Francisco's future as the world's first 2.0 City," said Lee in an announcement. "It's only natural that we move our Open Data platform to the cloud and adopt modern open interface to facilitate that flow and access to information and develop better tools to enhance City services."

The city's Chief Innovation Officer Jay Nath told TechCrunch that the update to the website expands access to information while saving the city money.

The new site contains some 175 datasets, including map-based crime data, active business listings, and various financial datasets. It's powered by the Seattle-based data startup Socrata.

The personal analytics of Stephen Wolfram

"One day I'm sure everyone will routinely collect all sorts of data about themselves," writes Mathematica and Wolfram Alpha creator Stephen Wolfram. "But because I've been interested in data for a very long time, I started doing this long ago. I actually assumed lots of other people were doing it too, but apparently they were not. And so now I have what is probably one of the world's largest collections of personal data."

And what a fascinating collection of data it is, including emails received and sent, phone calls made, calendar events planned, keystrokes made, and steps taken. Through this, you can see Wolfram's sleep, social, and work patterns, and even how various chapters of his book and Mathematica projects took shape.

"The overall pattern is fairly clear," Wolfram writes. "It's meetings and collaborative work during the day, a dinnertime break, more meetings and collaborative work, and then in the later evening more work on my own. I have to say that looking at all this data, I am struck by how shockingly regular many aspects of it are. But in general, I am happy to see it. For my consistent experience has been that the more routine I can make the basic practical aspects of my life, the more I am able to be energetic — and spontaneous — about intellectual and other things."


Got data news?

Feel free to email me.


March 09 2012

OK, I Admit It. I have a mancrush on the new Federal CTO, Todd Park

I couldn't be more delighted by the announcement today that Todd Park has been named the new Chief Technology Officer for the United States, replacing Aneesh Chopra.

I first met Todd in 2008 at the urging of Mitch Kapor, who thought that Todd was the best exemplar in the healthcare world of my ideas about the power of data to transform business and society, and that I would find him to be a kindred spirit. And so it was. My lunch with Todd turned into a multi-hour brainstorm as we walked around the cliffs of Lands End in San Francisco. Todd was on fire with ideas about how to change healthcare, and the opportunity of the new job he'd just accepted, to become the CTO at HHS.

Subsequently, I helped Todd to organize a series of workshops and conferences at HHS to plan and execute their open data strategy. I met with Todd and told him how important it was not just to make data public and hope developers would come, but to actually do developer evangelism. I told him how various tech companies ran their developer programs, including some stories about Amazon's rollout of AWS: they had first held a small, private event to which they invited people and companies who'd been unofficially hacking on their data, told them their plans, and recruited them to build apps against the new APIs that were planned. Then, when they made their public announcement, they had cool apps to show, not just good intentions.

Todd immediately grasped the blueprint, and executed with astonishing speed. Before long, he held a workshop for an invited group of developers, entrepreneurs and health data wonks to map out useful data that could be liberated, and useful applications that could be built with it. Six months later, he held a public conference to showcase the 40-odd applications that had been developed. Now in its third year, the event has grown into what Todd calls the Health Datapalooza. As noted on GigaOm, the event has already led to several venture-backed startups. (Applications are open for startups to be showcased at this year's event, June 5-6 in Washington D.C.)

Since I introduced him to Eric Ries, author of The Lean Startup, Todd has been introducing the methodology to Washington, insisting on programs that can show real results (learning and pivots) in only 90 days. He just knows how to make stuff happen.

Todd is also an incredibly inspiring speaker. At my various Gov 2.0 events, he routinely got a standing ovation. His enthusiasm, insight, and optimism are infectious.

When Todd Park talks, I listen. (Photo by James Duncan Davidson from the 2010 Gov 2.0 Summit. http://www.flickr.com/photos/oreillyconf/4967787323/in/photostream/)

Many will ask about Todd's technical credentials. After all, he is trained as a healthcare economist, not an engineer or scientist. There are three good answers:

1. Economists are playing an incredibly important role at today's technology companies, as extracting meaning and monetization from massive amounts of data becomes one of the key levers of success and competitive advantage. (Think Hal Varian at Google, working to optimize the ad auction.) Healthcare in particular is one of those areas where science, human factors, and economics are on a collision course, but virtually every sector of our nation is undergoing a transformation as a result of intelligence derived from data analysis. That's why I put Todd on my list for Forbes.com of the world's most important data scientists.

2. Todd is an enormously successful technology entrepreneur, with two brilliant companies - Athenahealth and Castlight Health - under his belt. In each case, he was able to succeed by understanding the power of data to transform an industry.

3. He's an amazing learner. In a 1998 interview describing the founding of Athena Health, he described his leadership philosophy: "Put enough of an idea together to inspire a team of really good people to jump with you into a general zone like medical practices. Then, just learn as much as you possibly can and what you really can do to be helpful and then act against that opportunity. No question."

Todd is one of the most remarkable people I've ever met, in a career filled with remarkable people. As Alex Howard notes, he should be an inspiration for more "retired" tech entrepreneurs to go into government. This is a guy who could do literally anything he put his mind to, and he's taking up the challenge of making our government smarter about technology. I want to put out a request to all my friends in the technology world: if Todd calls you and asks you for help, please take the call, and do whatever he asks.

March 01 2012

In the age of big data, data journalism has profound importance for society

The promise of data journalism was a strong theme throughout the National Institute for Computer-Assisted Reporting's (NICAR) 2012 conference. In 2012, making sense of big data through narrative and context, particularly unstructured data, will be a central goal for data scientists around the world, whether they work in newsrooms, Wall Street or Silicon Valley. Notably, that goal will be substantially enabled by a growing set of common tools, whether they're employed by government technologists opening up Chicago's data, healthcare technologists or newsroom developers.

At NICAR 2012, you could literally see the code underpinning the future of journalism written - or at least projected - on the walls.

"The energy level was incredible," said David Herzog, associate professor for print and digital news at the Missouri School of Journalism, in an email interview after NICAR. "I didn't see participants wringing their hands and worrying about the future of journalism. They're too busy building it."

Just as open civic software is increasingly baked into government, open source is playing a pivotal role in the new data journalism.

"Free and open-source tools dominated," said Herzog. "It's clear from the panels and hands-on classes that free and open source tools have eliminated the barrier to entry in terms of many software costs."

While many developers are agnostic with respect to which tools they use to get a job done, the people who are building and sharing tools for data journalism are often doing it with open source code. As Dan Sinker, the head of the Knight-Mozilla News Technology Partnership for Mozilla, wrote afterwards, journo-coders took NICAR 12 "to a whole new level."

While some of that open source development was definitely driven by the requirements of the Knight News Challenge, which funded the PANDA and Overview projects, there's also a collaborative spirit in evidence throughout this community.

This is a group of people who are fiercely committed to "showing your work" -- and for newsroom developers, that means sharing your code. To put it another way, code, don't tell. Sessions on Python, Django, mapping, Google Refine and Google Fusion tables were packed at NICAR 12.

No, this is not your father's computer-assisted reporting.

"I thought this stacked up as the best NICAR conference since the first in 1993," said Herzog. "It's always been tough to choose from the menu of panels, demos and hands-on classes at NICAR conferences. But I thought there was an abundance of great, informative, sessions put on by the participants. Also, I think NICAR offered a good range of options for newbies and experts alike. For instance, attendees could learn how to map using Google Fusion tables on the beginner's end, or PostGIS and qGIS at the advanced level. Harvesting data through web scraping has become an ever bigger deal for data journalists. At the same time, it's getting easier for folks with no or little programming chops to scrape using tools like spreadsheets, Google Refine and ScraperWiki. "

On the history of NICAR

According to IRE, NICAR was founded in 1989. Since its founding, the Institute has trained thousands of journalists how to find, collect and publish electronic information.

Today, "the NICAR conference helps journalists, hackers, and developers figure out best practices, best methods,and best digital tools for doing journalism that involves data analysis and classic reporting in the field," said Brant Houston, former executive director of Investigative Reporters and Editors, in an email interview. "The NICAR conference also obviously includes investigative journalism and the standards for data integrity and credibility."

"I believe the first IRE-sponsored [conference] was in 1993 in Raleigh, when a few reporters were trying to acquire and learn to use spreadsheets, database managers, etc. on newly open electronic records," said Sarah Cohen, the Knight professor of the practice of journalism and public policy at Duke University, in an email interview. "Elliott Jaspin was going around the country teaching reporters how to get data off of 9-track tapes. There really was no public Internet. At the time, it was really, really hard to use the new PC's, and a few reporters were trying to find new stories. The famous ones had been Elliott's school bus drivers who had drunk driving records and the Atlanta Color of Money series on redlining."

"St. Louis was my 10th NICAR conference," said Anthony DeBarros, the senior database editor at USA Today, in an email interview. "My first was in 1999 in Boston. The conference is a place where news nerds can gather and remind themselves that they're not alone in their love of numbers, data analysis, writing code and finding great stories by poring over columns in a spreadsheet. It serves as an important training vehicle for journalists getting started with data in the newsroom, and it's always kept journalists apprised of technological developments that offer new ways of finding and telling stories. At the same time, its connection to IRE keeps it firmly rooted in the best aspects of investigative reporting -- digging up stories that serve the public good.

Baby, you can drive my CAR

Long before we started talking about "data journalism," the practice of computer-assisted reporting (CAR) was growing around the world.

"The practice of CAR has changed over time as the tools and environment in the digital world has changed," said Houston. "So it began in the time of mainframes in the late 60s and then moved onto PCs (which increased speed and flexibility of analysis and presentation) and then moved onto the Web, which accelerated the ability to gather, analyze and present data. The basic goals have remained the same. To sift through data and make sense of it, often with social science methods. CAR tends to be an "umbrella" term - one that includes precision journalism and data driven journalism and any methodology that makes sense of date such as visualization and effective presentations of data."

On one level, CAR is still around because the journalism world hasn't coined a good term to use instead.

"Computer-assisted reporting" is an antiquated term, but most people who practice it have recognized that for years," said DeBarros. "It sticks around because no one has yet to come up with a dynamite replacement. Phil Meyer, the godfather of the movement, wrote a seminal book called "Precision Journalism, and that term is a good one to describe that segment of CAR that deals with statistics and the use of social science methods in newsgathering. As an umbrella term, data journalism seems to be the best description at the moment, probably because it adequately covers most of the areas that CAR has become -- from traditional data-driven reporting to the newer category of news applications."

The most significant shift in CAR may well be when all of those computers being used for reporting were connected through the network of networks in the 1990s.

"It may seem obvious, but of course the Internet changed it all, and for a while it got smushed in with trying to learn how to navigate the Internet for stories, and how to download data," said Cohen. "Then there was a stage when everyone was building internal intranets to deliver public records inside newsrooms to help find people on deadline, etc. So for much of the time, it was focused on reporting, not publishing or presentation. Now the data journalism folks have emerged from the other direction: People who are using data obtained through APIs who often skip the reporting side, and use the same techniques to deliver unfiltered information to their readers in an easier format the the government is giving us. But I think it's starting to come back together -- the so-called data journalists are getting more interested in reporting, and the more traditional CAR reporters are interested in getting their stories on the web in more interesting ways.

Whatever you call it, the goals are still the same.

"CAR has always been about using data to find and tell stories," said DeBarros. "And it still is. What has changed in recent years is more emphasis toward online presentations (interactive maps and applications) and the coding skills required to produce them (JavaScript, HTML/CSS, Django, Ruby on Rails). Earlier NICAR conferences revolved much more around the best stories of the year and how to use data techniques to cover particular topics and beats. That's still in place. But more recently, the conference and the practice has widened to include much more coding and presentation topics. That reflects the state of media -- every newsroom is working overtime to make its content work well on the web, on mobile, and on apps, and data journalists tend to be forward thinkers so it's not surprising that the conference would expand to include those topics."

What stood out at NICAR 2012?

The tools and tactics on display at NICAR were enough to convince Tyler Dukes at Duke to write that "NICAR taught me I know nothing." Browse through the tools, slides and links from NICAR 2012 curated by Chrys Wu to get a sense of just how much is out there. The big theme, however, without a doubt, was data.

"Data really is the meat of the conference, and a quick scan of the schedule shows there were tons of sessions on all kinds of data topics, from the Census to healthcare to crime to education," said DeBarros.

What I saw everywhere at NICAR was interest not simply in what data was out there, however, but in how to get it and put it to use, from finding stories and sources to providing empirical evidence to back up other reporting to telling stories with maps and visualizations.

"A major theme was the analysis of data (using spreadsheets, data managers, GIS) that gives journalism more credibility by seeing patterns, trends and outliers," said Houston. "Other themes included collection and analysis of social media, visualization of data, planning and organizing stories based on data analysis, programming for web scraping (data collection from the Web) and mashing up various Web programs."

"Harvesting data through web scraping has become an ever bigger deal for data journalists," said Herzog. "At the same time, it's getting easier for folks with no or little programming chops to scrape using tools like spreadsheets, Google Refine and ScraperWiki. That said, another message for me was how important programming has become. No, not all journalists or even data journalists need to learn programming. But as Rich Gordon at Medill has said, all journalists should have an appreciation and understanding of what it can do."

Cohen similarly pointed to data, specifically its form. "The theme that I saw this year was a focus on unstructured rather than structured data," she said. "For a long time, we've been hammering governments to give us 'data' in columns and rows. I think we're increasingly seeing that stories just as likely (if not more likely) come from the unstructured information that comes from documents, audio and video, tweets, other social media -- from government and non-government sources. The other theme is that there is a lot more collaboration, openness and sharing among competing news organizations. (Witness PANDA and census.ire.org and the New York Times campaign finance API). But it only goes so far -- you don't see ProPublica sharing the 40+ states' medical licensure data that Dan scraped with everyone else. (I have to admit, though, I haven't asked him to share.) IRE has always been about sharing techniques and tools --- now we're actually sharing source material."

While data dominated NICAR 12, other trends mattered as well, from open mapping tools to macroeconomic trends in the media industry. "A lot of newsrooms are grappling with rapid change in mapping technology," said DeBarros. "Many of us for years did quite well with Flash, but the lack of support for Flash on iPad has fueled exploration into maps built on open source technologies that work across a range of online environments. Many newsrooms are grappling with this, and the number of mapping sessions at the conference reflected this."

There's also serious context to the interest in developing data journalism skills. More than 166 U.S. newspapers have stopped putting out a print edition or closed down altogether since 2008, resulting in more than 35,000 job losses or buyouts in the newspaper industry since 2007.

"The economic slump and the fundamental change in the print publishing business means that journalists are more aware of the business side than ever," said DeBarros, "and I think the conference reflected that more than in the past. There was a great session on turning your good work into money by Chase Davis and Matt Wynn, for example. I was on a panel talking about the business reasons for starting APIs. The general unease most journalists feel knowing that our industry still faces difficult economic times. Watching a new generation of journalists come into the fold has been exciting."

One notable aspect of that next generation of data journalists is that it does not appear likely to look or sound the same as the newsrooms of the 20th century.

"This was the most diverse conference that I can remember," said Herzog. "I saw more women and people of color than ever before. We had data journalists from many countries: Korea, the U.K., Serbia, Germany, Canada, Latin America, Denmark, Sweden and more. Also, the conference is much more diverse in terms of professional skills and interests. Web 2.0 entrepreneurs, programmers, open data advocates, data visualization specialists, educators, and app builders mixed with traditional CAR jockeys. I also saw a younger crowd, a new generation of data journalists who are moving into the fold. For many of the participants, this was their first conference."

What problems does data journalism face?

While the tools are improving, there are still immense challenges ahead, from the technology itself to education to resources in the newsroom. "A major unsolved challenge is making the analysis of unstructured data easier and faster to do. Those working on this include myself, Sarah Cohen, the DocumentCloud team, teams at AP and Chicago Tribune and many others," said Houston.

There's also the matter of improving the level of fundamental numeracy in the media. "This is going to sound basic, but there are still far too many journalists around the world who cannot open an Excel spreadsheet, sort the values or write an equation to determine percentage change," said DeBarros, "and that includes a large number of the college interns I see year after year, which really scares me. Journalism programs need to step up and understand that we live in a data-rich society, and math skills and basic data analysis skills are highly relevant to journalism. The 400+ journalists at NICAR still represent something of an outlier in the industry, and that has to change if journalism is going to remain relevant in an information-based culture."
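
For the record, the arithmetic DeBarros is describing fits in a line; a minimal example, with made-up figures:

    # Percentage change, the basic newsroom math described above:
    # (new - old) / old * 100
    old_value, new_value = 1250, 1075  # e.g., last year's figure vs. this year's (sample numbers)
    pct_change = (new_value - old_value) / old_value * 100
    print(f"{pct_change:+.1f}%")  # prints -14.0%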

In that context, Cohen has high hopes for a new project, the Reporters Lab. "The big unsolved problem to me is that it's still just too hard to use 'data' writ large," she said. "You might have seen 4 or 5 panels on how to scrape data [at NICAR]. People have to write one-off computer programs using Python or Ruby or something to scrape a site, rather than use a tool like Kapow, because newsrooms can't (and never have) invest that kind of money into something that really isn't mission-critical. I think Kapow and its cousins cost $20,000-$40,000 a year. Our project is to find those kinds of holes and create, commission or adapt free, open source tools for regular reporters to use, not the data journalist skilled in programming. We're building communities of people who want to work on these problems."
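
The one-off scrapers Cohen describes are usually only a few dozen lines. A hedged sketch of the common pattern (fetch a page, pull the rows out of an HTML table, write a CSV) is below; the URL and table structure are placeholders, not a real government site.

    # Sketch of a typical one-off newsroom scraper: fetch a page, pull rows
    # from an HTML table, write them to a CSV. URL and markup are placeholders.
    import csv
    import urllib.request
    from html.parser import HTMLParser

    URL = "http://example.gov/inspections.html"  # hypothetical page

    class TableParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.rows, self.row, self.in_cell = [], [], False

        def handle_starttag(self, tag, attrs):
            if tag == "tr":
                self.row = []
            elif tag in ("td", "th"):
                self.in_cell = True

        def handle_endtag(self, tag):
            if tag == "tr" and self.row:
                self.rows.append(self.row)
            elif tag in ("td", "th"):
                self.in_cell = False

        def handle_data(self, data):
            if self.in_cell:
                self.row.append(data.strip())

    html = urllib.request.urlopen(URL).read().decode("utf-8", errors="replace")
    parser = TableParser()
    parser.feed(html)

    with open("inspections.csv", "w", newline="") as f:
        csv.writer(f).writerows(parser.rows)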

What role does data journalism play in open government?

On the third day of NICAR 2012, I presented on "open data journalism," which, to paraphrase Jonathan Stray, I'd define as obtaining, reporting upon, curating and publishing open data in the public interest. As someone who's been following the open government movement closely for a few years now, the parallels between what civic hackers are doing and what this community of data journalists is working on are inescapable. They're focused on putting data to work for the public good, whether it's in the public interest, for profit, in the service of civic utility or, in the biggest crossover, government accountability.

To do so will require that data journalists and civic coders alike apply the powerful emerging tools in the newsroom stack to the explosion of digital bits and bytes from government, business and our fellow citizens.

The need for data journalism, in the context of massive amounts of government data being released, could not be any more timely, particularly given persistent data quality issues.

"I can't find any downsides of more data rather than less," said Cohen, "but I worry about a few things."

First, emphasized Cohen, there's an issue of whether data is created open from the beginning -- and the consequences of 'sanitizing' it before release. "The demand for structured, nicely scrubbed data for the purpose of building apps can result in fake records rather than real records being released. USASpending.gov is a good example of that -- we don't get access to the actual spending records like invoices and purchase orders that agencies use, or the systems they use to actually do their business. Instead we have a side system whose only purpose is to make it public, so it's not a high priority inside agencies and there's no natural audit trail on it. It's not used to spend money, so mistakes aren't likely to be caught."

Second, there's the question of whether information relevant to an investigation has been scrubbed for release. "We get the lowest common denominator of information," she said. "There are a lot of records used for accountability that depend on our ability to see personally identifiable information (as opposed to private or personal information, which isn't the same thing). For instance, if you want to do stories on how farm subsidies are paid, you kind of have to know who gets them. If you want to do something on fraud in FEMA claims, you have to be able to find the people and businesses who get the aid. But when it gets pushed out as open government data, it often gets scrubbed of important details and then we have a harder time getting them under FOIA because the agencies say the records are already public."

To address those two issues, Cohen recommends getting more source documents, as a historian would. "I think what we can do is to push harder for actual records, and to not settle for what the White House wants to give us," she said. "We also have to get better at using records that aren't held in nice, neat forms -- they're not born that way, and we should get better at using records in whatever form they exist."

Why do data journalism and news apps matter?

Given the economic and technological context, it might seem like the case for data journalism should make itself. "CAR, data journalism, precision journalism, and news apps all are crucial to journalism -- and the future of journalism -- because they make sense of the tremendous amounts of data," said Houston, "so that people can understand the world and make sensible decisions and policies."

Given the reality that those practicing data journalism remain a tiny percentage of the world's media, however, there's clearly still a need for its foremost practitioners to show why it matters, in terms of impact.

"We're living in a data-driven culture," said DeBarros. "A data-savvy journalist can use the Twitter API or a spreadsheet to find news as readily as he or she can use the telephone to call a source. Not only that, we serve many readers who are accustomed to dealing with data every day -- accountants, educators, researchers, marketers. If we're going to capture their attention, we need to speak the language of data with authority. And they are smart enough to know whether we've done our research correctly or not. As for news apps, they're important because -- when done right -- they can make large amounts of data easily understood and relevant to each person using them."

New tools, same rules

While the platforms and toolkits for journalism are evolving and the sources of data are exploding, many things haven't changed. For one, the ethics that guide the choices of the profession remain central to the journalism of the 21st century, as NPR's new ethics guide makes clear.

Whether news developers are rendering data in real-time, validating data in the real world, or improving news coverage with data, good data journalism still must tell a story. And as Erika Owens reflected in her own blog after NICAR, looking back upon a group field trip to the marvelous City Museum in St. Louis, journalism is also joyous, whether one is "crafting the perfect lede or slaying an infuriating bug."

Whether the tool is a smartphone, notebook or dataset, these tools must also extend investigative reporting, as the Los Angeles Times' Doug Smith emphasized to me at the NICAR conference.

If text is the next frontier in data journalism, harnessing the power of big data, it will be in the service of telling stories more effectively. Digital journalism and digital humanities are merging in the service of a more informed society.

Profiles of the data journalist

To learn more about the people who are redefining the practice of computer-assisted reporting, and in some cases building the newsroom stack for the 21st century, Radar conducted a series of email interviews with data journalists during the 2012 NICAR conference. The first two of the series are linked below:

February 13 2012

Open innovation works in the public sector, say federal CTOs

President Barack Obama named Aneesh Chopra as the nation’s first chief technology officer in April 2009. In the nearly three years since, he was a tireless, passionate advocate for applying technology to make government and society work better. If you're not familiar with the work of the nation's first CTO, make sure to read Nancy Scola's extended "exit interview" with Aneesh Chopra at the Atlantic, where he was clear about his role: "As an advisor to the president, I have three main responsibilities," he said: "To make sure he has the best information to make the right policy calls for the country, which is a question of my judgment."

On his last day at the White House, Chopra released an "open innovator's toolkit" that highlights twenty different case studies in how he, his staff and his fellow chief technology officers at federal agencies have been trying to stimulate innovation in government.

Chopra announced the toolkit last week at a forum on open innovation at the Center for American Progress in Washington. The forum was moderated by former Virginia congressman Tom Perriello, who currently serves as counselor for policy to the Center for American Progress, and featured Todd Park, U.S. Department of Health and Human Services CTO; Peter Levin, senior advisor to the Veterans Affairs Secretary and U.S. Department of Veterans Affairs CTO; and Chris Vein, deputy U.S. CTO for government innovation at the White House Office of Science and Technology Policy. Video of the event is embedded below:

An open innovator's toolkit

"Today, we are unveiling 20 specific techniques that are in of themselves interesting and useful -- but they speak to this broader movement of how we are shifting, in many ways, or expanding upon the traditional policy levers of government," said Chopra in his remarks on Wednesday. In the interview with the Atlantic and in last week's forum, Chopra laid out four pillars in the administration's approach to open innovation:

  • Moving beyond providing public sector data by request to publishing machine-readable open data by default
  • Engaging with the public not simply as a regulator but as "impatient convener"
  • Using prizes and competitions to achieve outcomes, not just procurements
  • Focusing on attracting talented people to government by allowing them to serve as “entrepreneurs-in-residence.”

"We are clearly moving to a world where you don't just get data by requesting it but it's the default setting to publish it," said Chopra. "We're moving to a world where we're acting beyond the role of regulator to one of 'impatient convening.' We are clearly moving to a world where we're not just investing through mechanisms like procurement and RFPs to one where where we're tapping into the expertise of the American people through challenges, prizes and competition. And we are changing the face of government, recruiting individuals who have more of an entrepreneur-in-residence feel than a traditional careerist position that has in it the expectation of a lifetime of service. "

"Entrepreneurs and innovators around the country are contributing to our greater good. In some cases, they're coming in for a tour of duty, as you'll hear from Todd and Peter. But in many others, they're coming in where they can and how they can because if we tap into the collective expertise of the American people we can actually overcome some of the most vexing challenges that today, when you read the newspaper and you watch Washington, you say, 'Gosh, do we have it in us' to get beyond the divisions and these challenges, not just at the federal government but across all level of the public sector."

Open innovation, applied

Applying open innovation "is a task we've seen deployed effectively across our nation's most innovative companies," writes Chopra in the memorandum on open innovation that the White House released this week. "Procter & Gamble's 'Connect+Develop' strategy to source 50% of its innovations from the outside; Amazon's 'Just Do It' awards to celebrate innovative ideas from within; and Facebook's 'Development Platform' that generated an estimated 180,000 jobs in 2011 focused on growing the economy while returning benefits to Facebook in the process."

The examples that Chopra cited are "bona fide," said MIT principal research scientist Andrew McAfee, via email. "Open innovation or crowdsourcing or whatever you want to call it is real, and is (slowly) making inroads into mainstream (i.e., non-high-tech) corporate America. P&G is real. Innocentive is real. Kickstarter is real. Idea solicitations like the ones from Starbucks are real, and lead-user innovation is really real."

McAfee also shared the insight of Eric von Hippel on innovation:

"What is changing is that it is getting easier for consumers to innovate, with the Internet and such tools, and it is becoming more visible for the same reason. Historically, though, the only person who had the incentive to publicize innovation was the producer. People build institutions around how a process works, and the mass production era's products were built by mass production companies, but they weren't invented by them. When you create institutions like mass production companies, you create the infrastructure to help and protect them, such as heavy patent protection. Now, though, we see that innovation is distributed, open and collaborative."

In his remarks, Chopra hailed a crowdsourced approach to the design of DARPA's next-generation combat vehicle, where an idea from a U.S. immigrant led to a better outcome. "The techniques we’ve deployed along the way have empowered innovators, consumers, and policymakers at all levels to better use technology, data, and innovation," wrote Chopra in the memo.

"We’ve demonstrated that “open innovation,” the crowdsourcing of citizen expertise to enhance government innovation, delivers real results. Fundamentally, we believe that the American people, when equipped with the right tools, can solve many problems." To be fair, the "toolkit" in question amounts more to a list of links and case studies than a detailed manual or textbook, but people interested in innovating in government at the local, state and national level should find it useful.

The question now is whether the country and its citizens will be the "winners in the productivity revolutions of the future," posed Chopra, looking to the markets for mobile technology, healthcare and clean energy. In that context, Chopra said that "open data is an active ingredient" in job creation and economic development, citing existing examples. Six million Californians can now download their energy data through the Green Button, said Chopra, with new Web apps like Watt Quiz providing better interfaces for citizens to make more informed consumption decisions.
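Green Button data arrives as XML files of metered interval readings. As a purely illustrative sketch of what a consumer-facing app like the ones Chopra mentioned might do with such a file, the snippet below parses a simplified, made-up reading format and totals usage; the element and attribute names are stand-ins, not the actual Green Button (ESPI) schema.

    # A simplified sketch of reading Green Button-style interval data.
    # The XML below is hypothetical stand-in data, not the real ESPI schema.
    import xml.etree.ElementTree as ET

    SAMPLE = """
    <usage>
      <reading start="2012-01-01T00:00:00" seconds="3600" watthours="420"/>
      <reading start="2012-01-01T01:00:00" seconds="3600" watthours="380"/>
      <reading start="2012-01-01T02:00:00" seconds="3600" watthours="910"/>
    </usage>
    """

    def total_kwh(xml_text):
        """Sum hourly watt-hour readings and return kilowatt-hours."""
        root = ET.fromstring(xml_text)
        wh = sum(int(r.get("watthours")) for r in root.iter("reading"))
        return wh / 1000.0

    if __name__ == "__main__":
        print("Total usage: %.2f kWh" % total_kwh(SAMPLE))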

More than 76,000 Americans found places to get treatment or health services using iTriage, said Chopra, with open data spurring better healthcare decisions by a more informed mobile citizenry. He also hailed the role of collaborative innovation in open government, citing the mobile healthcare app Ginger.io.

Open government platforms

During his tenure as US CTO, Chopra was a proponent of open data and participatory platforms, and one of the Obama administration's most prominent evangelists for the use of technology to make government more open and collaborative. Our September 2010 interview on his work is embedded below:

In his talk last Wednesday, Chopra highlighted two notable examples of open government. First, he described the "startup culture" at the Consumer Financial Protection Bureau, highlighting the process by which the new .gov agency designed a better mortgage disclosure form.

Second, Chopra cited two e-petitions to veto the Stop Online Piracy Act and the Protect IP Act on the White House e-petition platform, We The People, as an important example of open government in action. The e-petitions, which gathered more than 103,000 signatures, are proof that when citizens are given the opportunity to participate, they will, said Chopra. The White House response came at a historic moment, in the week the Web changed Washington. "SOPA/PIPA is exactly what We the People was meant to do," Chopra told Nancy Scola:

Traditionally, Congress formally requests a Statement of Administration Policy, called a "SAP." Requests for SAPs come in all the time from Congress. We respond based on the dynamics of Washington, priorities and timelines. One would argue that a Washington-centric approach would have been to await the request for a SAP and publish it, oftentimes when a major vote is happening. If you contrast that with where SOPA/PIPA was, still in committee or just getting out of committee, and not yet on the floor, traditionally a White House would not issue a SAP that early. So on the train we were on, the routine Washington line of business, we would have awaited the right time to issue a SAP, and done it at congressional request. It just wasn't time yet. The We the People process flipped upside-down to whom we are responsible for providing input. In gathering over a hundred thousand signatures on SOPA/PIPA, the American people effectively demanded a SAP.

Innovation for healthcare and veterans

"I think people will embrace the open innovation approach because it works," said Todd Park at last week's forum, citing examples at Novartis, Aventis and Walgreens, amongst others. Park cited "Joy's Law," by Sun Microsystems computer science pioneer Bill Joy: "no matter who you are, you have to remember that most of the smart people don't work for you."

Part of making that work is opening up systems in a way that enables citizens, developers and industry to collaborate in creating solutions. "We're moving the culture away from proprietary, closed systems … into something that is modular, standards-based & open," said Peter Levin.

If you went to the Veterans Affairs website in 2009, you couldn't see where you were in the process, said Levin. One of the ways to solve that problem is to create a platform for people to talk to each other, he explained, which the VA was able to do through its Facebook page.

That may be a "colossal policy change," in his view, but it had an important result: "the whole patronizing fear that if we open up dialogue, open up channels, you'll create a problem you can't undo - that's not true for us," he said.

If you want to rock and roll, emphasized Park, don't just have your own smart people work on a challenge. That's an approach Aventis executives found success with in a diabetes data challenge. Walgreens will be installing "Health Guides" at its stores to act as a free "health concierge," said Park. Rather than building them the way the company normally would have, Walgreens launched a challenge and, in under three months, got 50 credible prototypes. Now, said Park, mHealthCoach is building Health Guides for Walgreens.

One of the most important observations Park made, however, may have been that there has been too much focus on apps created from open data, as opposed to data informing policymakers and caregivers. If you want to revolutionize the healthcare industry, open data needs to be at the fingertips of the people who need it most, where they need it most, when they need it most.

For instance, at a recent conference, he said, "Aetna rolled out this innovation called a nurse." If you want to have data help people, build a better IT cockpit for that nurse that helps that person become more omniscient. Have the nurse talk over the telephone with a person who can then be helped by the power of the open data in front of the healthcare worker.

Who will pick up the first federal CTO's baton?

Tim O'Reilly made a case for Chopra in April 2009, when the news of his selection leaked. Tim put the role of a federal CTO in the context of someone who provides "visionary leadership, to help a company (or in this case, a government) explore the transformative potential of new technology." In many respects, he delivered upon that goal during his tenure. The person who fills the role will need to provide similar leadership, and to do so in a difficult context, given economic and political headwinds that confront the White House.

As he turns the page to the next chapter of his career -- one which, according to sources cited by the Washington Post, might lead him into politics in Virginia -- the open question now is whom President Obama will choose to be the next "T" in the White House Office of Science and Technology Policy, a role that remains undefined in terms of Congressional action.

The administration made a strong choice in federal CIO Steven VanRoekel. Inside of government, Park and Levin are both strong candidates for the role, along with Andrew Blumenthal, CTO at the Bureau of Alcohol, Tobacco, Firearms and Explosives. In the interim, Chris Vein, deputy chief technology officer for public sector innovation, is carrying the open government innovation banner in the White House.

In this election year, whom the administration chooses to pick up the baton from Chopra will be an important symbol of its commitment to harnessing technology on behalf of the American people. Given the need for open innovation to address the nation's grand challenges, from healthcare to energy to education, the person tapped to run this next leg will play an important role in the country's future.


February 01 2012

With GOV.UK, British government redefines the online government platform

The British government has launched a beta of its GOV.UK platform, testing a single domain that could be used throughout government. The new single government domain will eventually replace Directgov, the UK government portal that launched back in 2004. GOV.UK is aimed squarely at delivering faster digital services to citizens through a much improved user interface, at decreased cost.

Unfortunately, far too often .gov websites cost millions and don't deliver as needed. GOV.UK is mobile-friendly, platform-agnostic, scalable, open source, built with HTML5, hosted in the cloud and open for feedback. Those criteria collectively embody the default for how governments should approach their online efforts in the 21st century.

gov.uk screenshot

“Digital public services should be easy to find and simple to use - they must also be cost effective and SME-friendly," said Francis Maude, the British Minister for the Cabinet Office, in a prepared statement. "The beta release of a single domain takes us one step closer to this goal."

Tom Loosemore, deputy director of the UK's Government Digital Service, introduced the beta of GOV.UK at the Government Digital Service blog, including a great deal of context on its development and history. Over at the Financial Times Tech blog, Tim Bradshaw published an excellent review of the GOV.UK beta.

As Bradshaw highlights, what's notable about the new beta is not just the site itself but the team and culture behind it: that of a large startup, not the more ponderous bureaucracy of Whitehall, the traditional "analogue" institution.

GOV.UK is a watershed in how government approaches Web design, both in terms of what you see online and how it was developed. The British team of developers, designers and managers behind the platform collaboratively built GOV.UK in-house using agile development and the kind of iterative processes one generally only sees in modern Web design shops. Given that this platform is designed to serve as a common online architecture for the government of the United Kingdom, that's meaningful.

“Our approach is changing," said Maude. "IT needs to be commissioned or rented, rather than procured in huge, expensive contracts of long duration. We are embracing new, cloud-based start-ups and enterprise companies and this will bring benefits for small and medium sized enterprises here in the UK and so contribute to growth.”

The designers of GOV.UK, in fact, specifically describe it as "government as a platform," in terms of something that others can build upon. It was open from the start, given that the new site was built entirely using open source tools. The code behind GOV.UK has been released as open source code on GitHub.

"For me, this platform is all about putting the user needs first in the delivery of public services online in the UK," said Mike Bracken, executive director of government digital services. Bracken is the former director of digital development at the Guardian News and Media and was involved in setting up MySociety. "For too long, user need has been trumped by internal demands, existing technology choices and restrictive procurement practices. Gov.uk puts user need firmly in charge of all our digital thinking, and about time too."

The GOV.UK stack

Reached via email, Bracken explained more about the technology choices that have gone into GOV.UK, starting with the platform diagram below.

gov.uk screenshot

Why create an open source stack? "Why not?" asked Bracken. "It's a government platform, and as such it belongs to us all and we want people to contribute and share in its development."

While many local, state and federal sites in the United States have chosen to adapt and use WordPress or Drupal as open government platforms, the UK team started afresh.

"Much of the code is based on our earlier alpha, which we launched in May last year as an early prototype for a single platform," said Bracken. "We learnt from the journey, and rewrote some key components recently, one key element of the prototype in scale."

According to Bracken, the budget for the beta is £1.7 million, which they are running under at present. (By way of contrast, the open government reboot of FCC.gov was estimated to cost $1.35 million.) There are about 40 developers coding on GOV.UK, said Bracken, but the entire Government Digital Service has around 120 staff, with up to 1,800 external testers. They also used several external development houses to complement their team, some for only two weeks at a time.

Why build an entirely new open government platform? "It works," said Bracken. "It's inherently flexible, best of breed and completely modular. And it doesn't require any software licenses."

Bracken believes that GOV.UK will give the British government agility, flexibility and the freedom to change as they go, which are, as he noted, not characteristics aligned with the usual technology build in the UK -- or elsewhere, for that matter.

Given the British government's ambitious plans for open data, the GOV.UK platform will also need to act as, well, a platform. On that count, they're still planning, not implementing.

"With regard to API's, our long term plan is to 'go wholesale,' by which we mean expose data and services via API's," said Bracken. "We are at the early stages of mapping out key attributes, particularly around identity services, so to be fair it's early days yet. The inherent flexibility does allow for us to accommodate future changes, but it would be premature to make substantial claims to back up API delivery at this point."

The GOV.UK platform will be adaptable for the purposes of city government as well, over time. "We aim to migrate key department sites onto it in the first period of migration, and then look at government agencies," said Bracken. "The migration, with over 400 domains to review, will take more than a year. We aim to offer various platform services which meet the needs of all Government service providers."

Making GOV.UK citizen-centric

The GOV.UK platform was also designed to be citizen-centric, keeping the tasks that people come to a government site to accomplish in mind. Its designers, apparently amply supplied with classic British humor, dubbed the engine that tracks them the "Needotron."

"We didn't just identify top needs," said Loosemore, via email. "We built a machine to manage them for us now and in the future. Currently there are 667!" Loosemore said that they've open sourced the Needotron code, for those interested in tracking needs of their own.

"There are some of the Top needs we've not got to properly yet," said Loosemore. "For example, job search is still sub-optimal, as is the stuff to do with losing your passport."

According to Loosemore, some of the top needs that citizens have when they come to a government site in the UK are determining the minimum wage, learning when the public and bank holidays are, or finding out when the clocks change for British Summer Time. They also come to central government to pay their council tax, which is actually a local function, but GOV.UK is designed to route those users to the correct site using geolocation.

This beta will have the top 1,000 things you would need to do with government, said Maude, speaking at the Sunlight Foundation this week. (If that's so, there are more than 300 yet to go.)

"There's massive change needed in our approach to how to digitize what we do," he said. "Instead of locking in with a massive supplier, we need to be thinking of it the other way around. What do people need from government? Work from the outside in and redesign processes."

In his comments, Maude emphasized the importance of citizen-centricity, with respect to interfaces. We don't need to educate people on how to use a service, he said. We need to educate government on how to serve the citizen.

"Like U.S., the U.K. has a huge budget deficit," he said. "The public expects to be able to transact with government in a cheap, easy way. This enables them to do it in a cheaper, easier way, with choices. It's not about cutting 10 or 20% from the cost but how to do it for 10 or 20% of the total cost."

The tech behind GOV.UK

James Stewart, who was the tech lead on the beta of GOV.UK, recently blogged about its browser support. He emailed me the following breakdown of the rest of the technology behind GOV.UK.

Hosting and Infrastructure:

  • DNS hosted by Dyn.com
  • Servers are Amazon EC2 instances running Ubuntu 10.04 LTS
  • Email (internal alerts) sent via Amazon SES and Gmail
  • Miscellaneous file storage on Amazon S3
  • Jetty application server
  • Nginx, Apache and mod_passenger
  • Jenkins continuous integration server
  • Caching by Varnish
  • Configuration management using Puppet

Front end

  • JavaScript uses jQuery, jQuery UI, Chosen, and a variety of other plugins
  • Gill Sans, provided by fonts.com
  • Google web font loader

Languages, Frameworks and Plugins

"Most of the application code is written in Ruby, running on a mixture of Rails and Sinatra," said Stewart. "Rails and Sinatra gave us the right balance of productivity and clean code, and were well known to the team we've assembled. We've used a range of gems along with these, full details of which can be found in the Gemfiles at Github.com/alphagov."

The router for GOV.UK is written in Scala and uses Scalatra for its internal API, said Stewart. "The router distributes requests to the appropriate backend apps, allowing us to keep individual apps very focused on a particular problem without exposing that to visitors," said Stewart. "We did a bake-off between a Ruby implementation and a Scala implementation and were convinced that the Scala version was better able to handle the high level of concurrency this app will require."
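The GOV.UK router itself is Scala and isn't reproduced here; as a rough illustration of the prefix-based routing idea Stewart describes, here is a minimal sketch in Python that maps request paths to the backend app responsible for them. The backend names and URLs are hypothetical.

    # Illustrative only: map incoming request paths to the backend app that
    # owns that part of the site, so each app stays focused on one problem.
    # Backend names and URLs are hypothetical.
    BACKENDS = {
        "/browse": "http://frontend.internal:3000",
        "/bank-holidays": "http://calendars.internal:3001",
        "/search": "http://search.internal:3002",
    }

    def route(path, default="http://static.internal:3999"):
        """Return the base URL of the backend that should handle `path`."""
        for prefix, backend in sorted(BACKENDS.items(),
                                      key=lambda kv: len(kv[0]), reverse=True):
            if path == prefix or path.startswith(prefix + "/"):
                return backend
        return default

    if __name__ == "__main__":
        print(route("/bank-holidays/england-and-wales"))  # calendars backend
        print(route("/contact"))                          # falls through to default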

Databases

  • MongoDB. "We started out building everything using MySQL but moved to MongoDB as we realised how much of our content fitted its document-centric approach," said Stewart. "Over time we've been more and more impressed with it and expect to increase our usage of it in the future."
  • MySQL, hosted using Amazon's RDS platform. "Some of the data we need to store is still essentially relational and we use MySQL to store that," said Stewart. "Amazon RDS takes away many of the scaling and resilience concerns we had with that, without requiring changes to our application code."
  • MaPit geocoding and information service from mySociety. "MaPit not only does conventional geocoding," said Stewart, in terms of determining the latitude and longitude for a given postcode, but "it also gives us details of all the local government areas a postcode is in, which lets us point visitors to relevant local services." (A sketch of such a lookup follows this list.)
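As a concrete illustration of the kind of lookup Stewart describes, here is a minimal sketch against mySociety's public MaPit service. It assumes the third-party requests library, and the endpoint path and response fields reflect my reading of MaPit's API rather than anything specified by GDS.

    # A minimal sketch of a postcode lookup against mySociety's MaPit service.
    # Endpoint and response fields are assumptions about the public API.
    import requests

    def areas_for_postcode(postcode):
        """Return the names of local government areas covering a UK postcode."""
        url = "https://mapit.mysociety.org/postcode/%s" % postcode.replace(" ", "")
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        data = resp.json()
        # MaPit is assumed to return a dict of area IDs to details under "areas".
        return [area["name"] for area in data.get("areas", {}).values()]

    if __name__ == "__main__":
        for name in areas_for_postcode("SW1A 1AA"):
            print(name)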

Collaboration tools

gov.uk screenshot

  • Campfire for team chat
  • Google Apps
  • MediaWiki
  • Pivotal Tracker
  • Many, many index cards.


January 24 2012

"The President of the United States is on the phone. Would you like to Hangout on Google+?"

We're suddenly very close to science fiction becoming reality television, live streamed to large and small screens around the world. On Monday, January 30th, 2012, the fireside chats that FDR hosted on citizens' radios in the 20th century will have a digital analogue in the new millennium: President Barack Obama will host a Google+ Hangout from the West Wing, only a few weeks after the White House joined Google+.

Screenshot of President Obama sending a tweet through the @whitehouse account
A screenshot from July 6, 2011, of President Obama sending his first tweet through the @whitehouse account. On January 30, he'll host the first presidential Hangout on Google+.

If you have a question for the president, you can ask it by submitting a video to the White House's video channel, where you can also vote upon other questions. The president will be answering "several of the most popular questions that have been submitted through YouTube, and some of the people who submitted questions will even be invited to join the president in the Hangout and take part in the live conversation," explained Kori Schulman, deputy director of digital content at the White House, at the White House blog.

The real-time presidency

This upcoming "President Hangout" offers a fascinating window into what bids to be a disruptive scenario to citizen-to-government (or citizen-to-citizen) communications in our near future. Mobile Hangouts on smartphones running the world's biggest mobile operating system, Android, could enable citizens to connect to important conversations from wherever a call finds them.

Such town halls could be live streamed and shared through Facebook, Google+ or the White House's iOS app, reaching hundreds of millions of people connected through mobile broadband. In the future, we might even see iOS cameras enable citizens to "get some FaceTime with the president" through his iPad. The quality of the video on the iPad 2 is poor now, as owners know, but what if Apple adds a camera to the iPad 3 as good as the one it added to the iPhone 4S? That would enable instant video chat through 100 million-plus connected iOS devices, along with the millions of MacBooks and iMacs that have webcams.

In that future, I can't help but think of the video phones from "The Jetsons." Or "Blade Runner," "Minority Report," "The Fifth Element" or "Total Recall." Or, better yet, "Star Trek," since Gene Roddenberry's vision of a peaceful future is a lot better than the dystopian epics Philip K. Dick tended to write.

Style or open government substance?

The technology we have in our hands right now, of course, is pretty exciting. The prospect of a presidential Hangout has naturally been getting plenty of attention in the media, from CNET to Mashable to the L.A. Times to NextGov, where Joseph Marks has one of the smartest takes to date. In his post, Marks, a close observer of how the White House is using technology in support of open government, goes right to the heart of what analysts and the media should be asking: What does this mean and how will it work?

The administration is touting the Google Plus event as 'the first completely-virtual interview from the White House.' It's not entirely clear what that means. It could signal merely that the president will respond directly to questioners' YouTube videos rather than having them keyed up by a moderator. In past social media Town Halls conducted through Twitter, Facebook and LinkedIn, Obama has typically shared the stage with a moderator who introduced and sometimes picked questions. If questioners are able to ask their questions directly, including follow-up questions through the Hangout feature, that would be a more significant innovation.

To put it another way, will the first presidential Google+ Hangout be about substance, or is it about burnishing the president's tech-savvy image and credentials in an election year?

When I asked that question openly on Twitter, Gadi Ben Yehuda, who analyzes and teaches about the government's use of social media for IBM, replied: "Both, I bet. Message is medium, after all. Style, in this case, is part of substance."

As it happens, Macon Phillips, director of digital strategy at the White House, was also listening. "What criteria would you use to answer that question?" he asked. Noah Chestnut, director of digital media at Hamilton Place Strategies in D.C., suggested the following criteria: "Q's asked, length + content of A's, follow-up Q's vs. cursory, who writes the process stories."

As I analyze this new experiment in digital democracy, I will look at A) whether the questions answered were based upon the ones most citizens wanted asked and B) whether the answers were rehashed talking points or specific to the intent of the questions asked. That latter point was one fair critique I've seen levied by the writers at techPresident after the first "Twitter Townhall" last July.

In reply, Phillips tweeted: "Well, if the past 2 post-SOTU [State of the Union] events are any indication, you should be optimistic! One the exciting things about the Hangout format is that conversational aspect." As evidence for this assertion, Phillips linked to videos of YouTube interviews with President Obama after the 2010 and 2011 State of the Union addresses. The president answered questions sourced from the Google Moderator tool on the CitizenTube channel.

There are process questions that matter as well. Will Steve Grove, head of community partnerships at Google+, be asking the questions? Or will the president himself respond directly to the questions of citizens?

Phillips replied that there will be a "little bit of both to involve both the voting prior and the participants during." He also told the Associated Press that the White House would have no role in choosing the questions or participants in the Hangout. "For online engagement to be interesting, it has to be honest," Phillips said. "We want to give Americans more control over this conversation and the chance to ask questions they care about."

In other words, citizens will be able to ask the president questions directly via YouTube and, if chosen, may have the opportunity to join him in the Hangout. When I asked Phillips my own follow-up question, he suggested that "for specifics on format, better to connect w/@GROVE but we are planning for ?'s that are voted on & others asked live."

I was unable to reach Grove. However, he told the Associated Press that the Hangout "will make for a really personal conversation with the president that's never really happened before."

Will there be #realtalk in real time?

Direct interactivity through a Hangout could also introduce that rare element that's missing at many presidential appearances: unscripted moments. That's what the editors of techPresident will be watching for in this new experiment. "Our prevailing hypothesis around here is that one great promise of the Internet in politics is to create unscripted moments, opportunities to yank politicians off of their talking points and into a confrontation with the real and complex problems America faces today," wrote Nick Judd. "We saw this in July at the very end of the Twitter event with Obama. Reid Epstein saw a similar occurrence when former Massachusetts Gov. Mitt Romney's presidential aspirations took him to a New Hampshire diner, where he met a gay veteran who asked him about same-sex marriage. We're hungrily looking for examples of this in the integrations of the Internet and of social media in presidential debates, and not finding many so far."

What will be particularly interesting will be the opportunities that citizens have to ask follow-up questions in the Hangout if they're not satisfied with an answer. That feedback loop is what tends to be missing from these online forums. Most citizens have never had the opportunity to ask the kind of informed, aggressive follow-up questions heard at, say, a presidential press conference at the White House. The evolution of these platforms will occur when organizations stop merely "adopting" them and start actually using them -- in this case, using the killer app of the Google+ platform to connect directly with the American people.

As of this morning, 30,594 people have submitted 16,047 questions and cast 208,431 votes. Currently, the most popular video questions are about stopping the PROTECT IP Act and Stop Online Piracy Act (SOPA) and the Anti-Counterfeiting Trade Agreement (ACTA), which would establish international standards for intellectual property. The top question comes from "Anonymous," and asks "Mr. President, it's all good and well that SOPA and PIPA are slowed down in Congress, but what are you doing about ACTA? This is an international agreement which could prove much more devastating."

To date, President Obama has not commented extensively on ACTA or either of these bills. If any of those questions are answered, it will indeed be evidence that the White House is listening and that the president's commitment "to creating a system of transparency, public participation, and collaboration" using social media and technology is genuine.

A version of this post originally appeared on Google+.

