
May 22 2012

Data journalism research at Columbia aims to close data science skills gap

Successfully applying data science to the practice of journalism requires more than providing context and finding clarity in vast amounts of unstructured data: it will require media organizations to think differently about how they work and who they venerate. It will mean evolving towards a multidisciplinary approach to delivering stories, where reporters, videographers, news application developers, interactive designers, editors and community moderators collaborate on storytelling, instead of being segregated by departments or buildings.

The role models for this emerging practice of data journalism won't be found on broadcast television or on the lists of the top journalists over the past century. They're drawn from the increasing pool of people who are building new breeds of newsrooms and extending the practice of computational journalism. They see the reporting that provisions their journalism as data, a body of work that can itself be collected, analyzed, shared and used to create longitudinal insights about the ways that society, industry or government are changing. (Or not, as the case may be.)

In a recent interview, Emily Bell (@EmilyBell), director of the Tow Center for Digital Journalism at the Columbia University School of Journalism, offered her perspective about what's needed to train the data journalists of the future and the changes that still need to occur in media organizations to maximize their potential. In this context, while institutions and journalism education are themselves evolving, they both will still fundamentally matter for "what's next," as practitioners adapt to changing newsonomics.

Our discussion took place in the context of a notable investment in the future of data journalism: a $2 million research grant to Columbia University from the Knight Foundation to research and distribute best practices for digital reportage, data visualizations and measuring impact. On the Knight Foundation's blog, Bell explained more about how the research effort will help newsrooms determine what's next:

The knowledge gap that exists between the cutting edge of data science, how information spreads, its effects on people who consume information and the average newsroom is wide. We want to encourage those with the skills in these fields and an interest and knowledge in journalism to produce research projects and ideas that will both help explain this world and also provide guidance for journalism in the tricky area of ‘what next’. It is an aim to produce work which is widely accessible and immediately relevant to both those producing journalism and also those learning the skills of journalism.

We are focusing on funding research projects which relate to the transparency of public information and its intersection with journalism, research into what might broadly be termed data journalism, and the third area of ‘impact’ or, more simply put, what works and what doesn’t.

Our interview, lightly edited for content and clarity, follows.

What did you do before you became director of the Tow Center for Digital Journalism?

I spent ten years as editor-in-chief of The Guardian website. During the last four of those, I was also overall director of digital content for all The Guardian properties. That included things like mobile applications, et cetera, but from the editorial side.

Over the course of that decade, you saw one or two things change online, in terms of what journalists could do, the tools available to them and the news consumption habits of people. You also saw the media industry change, in terms of the business models and institutions that support journalism as we think of it. What are the biggest challenges and opportunities for the future of journalism?

For newspapers, there was an early warning system: newspaper circulation has not really consistently risen since the early 1980s. We had a long trajectory of increased production and, actually, an overall systemic decline, which was masked by a very, very healthy advertising market. That market really went on an incredible bull run against a more static picture, and the response was just to "widen the pipe," which I think fooled a lot of journalism outlets and publishers into thinking that that was the real disruption.

And, of course, it wasn’t.

The real disruption was the ability of anybody anywhere to upload multimedia content and share it with anybody else who was on a connected device. That was the thing that really hit hard, when you look at 2004 onwards.

What journalism has to do is reinvent its processes, its business models and its skillsets to function in a world where human capital does not scale well, in terms of sifting, presenting and explaining all of this information. That’s really the key to it.

The skills that journalists need to do that -- including identifying a story, knowing why something is important and putting it in context -- are incredibly important. But how you do that, and which particular elements you now use to tell that story, is changing.

Those now include the skills of understanding the platform that you’re operating on and the technologies which are shaping your audiences’ behaviors and the world of data.

By data, I don’t just mean large caches of numbers you might be given or might be released by institutions: I mean that the data thrown off by all of our activity, all the time, is simply transforming the speed and the scope of what can be explained and reported on and identified as stories at a really astonishing speed. If you don’t have the fundamental tools to understand why that change is important and you don’t have the tools to help you interpret and get those stories out to a wide public, then you’re going to struggle to be a sustainable journalist.

The challenge for sustainable journalism going forward is not so different from what exists in other industries: there's a skills gap. Data scientists and data journalists use almost the exact same tools. What are the tools and skills that are needed to make sense of all of this data that you talked about? What will you do to catalog and educate students about them?

It's interesting when you say that the skills of these disciplines are very similar, which is absolutely right. First of all, you need a basic level of numeracy -- and maybe not just a basic level, but a more sophisticated understanding of statistical analysis. That's not something which is routinely taught in journalism schools, but I think it increasingly will have to be.

The second thing is having some coding skills or some computer science understanding to help with identifying the best, most efficient tools and the various ways that data is manipulated.

The third thing is that when you're talking about 'data scientists,' it's really a combination of those skills. Adding data doesn't mean you can drop the other journalism skills, which do not change: understanding context, understanding what the story might be, and knowing how to derive that from the data that you're given or the data that exists. If it's straightforward, how do you collect it? How do you analyze it? How do you interpret it and present it?

It’s easy to say, but it’s difficult to do. It’s particularly difficult to reorient the skillsets of an industry which has very much resided around the idea of a written story and an ability with editing. Even in the places where I would say there’s sophisticated use of data in journalism, it’s still a minority sport.

I’ve talked to several heads of data in large news organizations and they’ve said, “We have this huge skills gap because we can find plenty of people who can do the math; we can find plenty of people who are data scientists; we can’t find enough people who have those skills but also have a passion or an interest in telling stories in a journalistic context and making those relatable.”

You need a mindset which is about putting this in the context of the story and spotting stories, as well as having creative and interesting ideas about how you can actually collect this material for your own stories. It’s not a passive kind of processing function if you’re a data journalist: it’s an active seeking, inquiring and discovery process. I think that that’s something which is actually available to all journalists.

Think about just local information and how local reporters go out and speak to people every day on the beat, collect information, et cetera. At the moment, most don’t structure the information they get from those sources in a way that will help them find patterns and build new stories in the future.

This is not just about an amazing graphic that the New York Times does with census data over the past 150 years. This is about almost every story. Almost every story has some component of reusability or a component where you can collect the data in a way that helps your reporting in the future.

To do that requires a level of knowledge about the tools that you’re using, like coding, Google Refine or Fusion Tables. There are lots of freely available tools out there that are making this easier. But, if you don’t have the mindset that approaches, understands and knows why this is going to help you and make you a better reporter, then it’s sometimes hard to motivate journalists to see why they might want to grab on.

The other thing to say, which is really important, is there is currently a lack of both jobs and role models for people to point to and say, “I want to be that person.”

I think the final thing I would say to the industry is we’re getting a lot of smart journalists now. We are one of the schools where every digital concentration student this year gets a basic grounding in data journalism. Every single one of them. We have an advanced course taught by Susan McGregor in data visualization. But we’re producing people from the school now who are being hired to do these jobs, and the people who are hiring them are saying, “Write your own job description because we know we want you to do something, we just don’t quite know what it is. Can you tell us?”

You can’t cookie-cutter these people out of schools and drop them into existing roles in newsrooms, because those roles are still developing. What we’re seeing are some very smart reporters with data-centric mindsets and also the ability to do these stories -- but they want to be out reporting. They don’t want to be confined to a desk and a spreadsheet. Some editors find that very hard to understand: “Well, what does that job look like?”

I think that this is where working with the industry, we can start to figure some of these things out, produce some experimental work or stories, and do some of the thinking in the classroom that helps people figure out what this whole new world is going to look like.

What do journalism schools need to do to close this 'skills gap?' How do they need to respond to changing business models? What combination of education, training and hands-on experience must they provide?

One of the first things they need to do is identify the problem clearly and be honest about it. I like to think that we’ve done that at Columbia, although I’m not a data journalist. I don’t have a background in it. I’m a writer. I am, if you like, completely the old school.

But one of the things I did do at The Guardian was help people who early on said to me, “Some of this transformation means that we have to think about data as being a core part of what we do.” Because of the political context and the position I was in, I was able to recognize that what they were saying was important, and we could push through changes and adoption in those areas of the newsroom.

That’s how The Guardian became interested in data. It’s the same in journalism school. One of the early things that we talked about [at Columbia] was how we needed to shift some of what the school did on its axis and acknowledge that this was going to be a key part of what we do in the future. Once we acknowledged that that is something we had to work towards, [we hired] Susan McGregor from the Wall Street Journal’s Interactive Team. She’s an expert in data journalism and has an MA in technology in education.

If you say to me, “Well, what’s the grand vision here?” I would say the same thing I would say to anybody: over time, and hopefully not too long a course of time, we want to attract a type of student that is interested and capable in this approach. That means getting out and motivating and talking to people. It means producing attractive examples which high school children and undergraduate programs think about [in their studies]. It means talking to the CS [computer science] programs -- and, in fact, spending more time talking to those programs and to math majors than to the liberal arts professors or the historians or the lawyers or the people who have traditionally been involved.

I think that has an effect: it starts to show people who are oriented towards storytelling but have capabilities which align more closely with data science skill sets that there’s a real track for them. We can’t message that early enough as an industry. We can’t message it early enough as educators to get people into those tracks. We have to really make sure that the teaching is high quality and that we don’t just get carried away with the idea of the new thing; we need to think pretty deeply about how we get those skills.

What sort of basic statistical teaching do you need? What are the skills you need for data visualization? How do you need to introduce design as well as computer science skills into the classroom, in a way which makes sense for stories? How do you tier that understanding?

You're always going to produce superstars. Hopefully, we’ll be producing superstars in this arena soon as well.

We need to take the mission seriously. Then we need to build resources around it. And that’s difficult for educational organizations because it takes time to introduce new courses. It takes time to signal that this is something you think is important.

I think we’ve done a reasonable job of that so far at Columbia, but we’ve got a lot further to go. It's important that institutions like Columbia do take the lead and demonstrate that we think this is something that has to be a core curriculum component.

That’s hard, because journalism schools are known for producing writers. They’re known for different types of narratives. They are not necessarily lauded for producing math or computer science majors. That has to change.


May 16 2012

How to start a successful business in health care at Health 2.0 conference

Great piles of cash are descending on entrepreneurs who develop health care apps, but that doesn't make it any easier to create a useful one that your audience will adopt. Furthermore, lowered costs and streamlined application development techniques let you fashion a working prototype faster than ever, but that also reduces the time you can fumble around looking for a business model. These were some of the insights I got at Spring Fling 2012: Matchpoint Boston, put on by Health 2.0 this week.

This conference was a bit of a grab-bag, including one-on-one meetings between entrepreneurs and their potential funders and customers, keynotes and panels by health care experts, round-table discussions among peers, and lightning-talk demos. I think the hallway track was the most potent part of this conference, and it was probably planned that way. The variety at the conference mirrors the work of Health 2.0 itself, which includes local chapters, challenges, an influential blog, and partnerships with a range of organizations. Overall, I appreciated the chance to get a snapshot of a critical industry searching for ways to make a positive difference in the world while capitalizing on ways to cut down on the blatant waste and mismanagement that bedevil the multi-trillion-dollar health care field.

Let's look, for instance, at the benefits of faster development time. Health IT companies go through fairly standard early stages (idea, prototype, incubator, venture capital funding), but conference co-chairs Indu Subaiya and Matthew Holt showed slides demonstrating that modern techniques can leave companies in the red for less time and accelerate earnings. On the other hand, Jonathan Bush of athenahealth gave a keynote listing bits of advice for company founders and admitting that his own company had made significant errors that required time to recover from. Does the fast pace of modern development leave less room for company heads to make the inevitable mistakes?

I also heard Margaret Laws, director of the California HealthCare Foundation's Innovations Fund, warn that most of the current applications being developed for health care aim to salve common concerns among doctors or patients but don't address what she calls the "crisis points" in health care. Brad Fluegel of Health Evolution Partners observed that, with the flood of new entrepreneurs in health IT, a lot of old ideas are being recycled without adequate attention to why they failed before.

I'm afraid this blog is coming out too negative, focusing on the dour and the dire, but I do believe that health IT needs to acknowledge its risks in order to avoid squandering the money and attention it's getting, and on the positive side to reap the benefits of this incredibly fertile moment of possibilities in health care. Truly, there's a lot to celebrate in health IT as well. Here are some of the fascinating start-ups I saw at the show:

  • hellohealth aims at that vast area of health care planning and administration that cries out for efficiency improvements--the area where we could do the most good by cutting costs without cutting back on effective patient care. Presenter Shahid Shah described the company as the intersection of patient management with revenue cycle management. They plan to help physicians manage appointments and follow-ups better, and rationalize the whole patient experience.

  • hellohealth will offer portals for patients as well. They're unique, so far as I know, in charging patients for certain features.

  • Corey Booker demo'd onPulse, which aims to bring together doctors with groups of patients, and patients with groups of the doctors treating them. For instance, when a doctor finds an online article of interest to diabetics, she can share it with all the patients in her practice suffering from diabetes. onPulse also makes it easier for a doctor to draw in others who are treating the same patient. The information built up about their interactions can be preserved for billing.

    onPulse overlaps in several respects with HealthTap, a doctor-patient site that I've covered several times and for which an onPulse staffer expressed admiration. But HealthTap leaves discussions out in the open, whereas onPulse connects doctors and patients in private.

  • HealthPasskey.com is another one of these patient/doctor services with a patient portal. It allows doctors to upload continuity of care documents in the standard CCD format to the patient's site, and supports various services such as making appointments.

    A couple weeks ago I reported a controversy over hospitals' claims that they couldn't share patient records with the patients. Check out the innovative services I've just highlighted here as a context for judging whether the technical and legal challenges for hospitals are really too daunting. I recognize that each of the sites I've described picks off particular pieces of the EHR problem and that opening up the whole kit and caboodle is a larger task, but these sites still prove that all the capabilities are in place for institutions willing to exploit them.

  • GlobalMed has recently released a suitcase-sized box that contains all the tools required to do a standard medical exam. This allows traveling nurse practitioners or other licensed personnel to do a quick check-up at a patient's location without requiring a doctor or a trip to the clinic. Images can also be taken. Everything gets uploaded to a site where a doctor can do an assessment and mark up records later. The suitcase weighs about 30 pounds, rolls on wheels, and costs about $30,000 (price to come down if they start manufacturing in high quantities).

  • SwipeSense won Health 2.0's 100 Day Innovation Challenge. They make a simple device that hospital staff can wear on their belts and wipe their hands on. This may not be as good as washing your hands, but takes advantage of people's natural behavior and reduces the chance of infections. It also picks up when someone is using the device and creates reports about compliance. SwipeSense is being tested at the Rush University Medical Center.

  • Thryve, one of several apps that help you track your food intake and make better choices, won the highest audience approval at Thursday's Launch! demos.

  • Winner of last weekend's developer challenge was No Sleep Kills, an app that aims to reduce accidents related to sleep deprivation (I need a corresponding app to guard against errors from sleep-deprived blogging). You can enter information on your recent sleep patterns and get back a warning not to drive.

It's worth noting that the last item in that list, No Sleep Kills, draws information from Health and Human Services' Healthy People site. This raises the final issue I want to bring up in regard to the Spring Fling. Sophisticated developers know their work depends heavily on data about public health and on groups of patients. HHS has actually just released another major trove of public health statistics. Our collective knowledge of who needs help, what works, and who best delivers the care would be immensely enhanced if doctors and institutions who currently guard their data would be willing to open it up in aggregate, non-identifiable form. I recently promoted this ideal in coverage of Sage Congress.

In the entirely laudable drive to monetize improvements in health care, I would like the health IT field to choose solutions that open up data rather than keep it proprietary. One of the biggest problems with health care, in this age of big data and incredibly sophisticated statistical tools, is our tragedy of the anti-commons where each institution seeks to gain competitive advantage through hoarding its data. They don't necessarily use their own data in socially beneficial ways, either (they're more interested in ratcheting up opportunities for marketing expensive care). We need collective sources of data in order to make the most of innovation.

OSCON 2012 Healthcare Track — The conjunction of open source and open data with health technology promises to improve creaking infrastructure and give greater control and engagement to patients. Learn more at OSCON 2012, being held July 16-20 in Portland, Oregon.

Save 20% on registration with the code RADAR20

April 18 2012

What responsibilities and challenges come with open government?

A historic Open Government Partnership launched in New York City last September with eight founding countries. Months later, representatives from 73 countries and 55 governments have come together to present their open government action plans and formally endorse the principles in the Open Government Partnership. Yesterday, hundreds of attendees from government, civil society, media and the private sector watched in person and online as Brazilian President Dilma Rousseff spoke about her country's efforts to root out corruption and engage the Brazilian people in governance and more active citizenship. United States Secretary of State Hillary Clinton preceded her, defining an open or closed society as a key dividing line of the 21st century.

Today's agenda includes more regional breakouts and an opening plenary session on the "Responsibility and Challenges that Come with Openness." If you have an Internet connection, you should be able to watch the discussion in the embedded player below:

Watch live streaming video from ogp2012 at livestream.com

The plenary will feature Walid al-Saqaf of YemenPortal.net & Alkasir, minister Francis Maude from the United Kingdom, Tunisian Secretary of State Ben Abbes, and Fernando Rodrigues, an investigative journalist from Folha de São Paulo in Brazil.

The liveblog of the entire proceedings is embedded below.



April 05 2012

Steep climb for National Cancer Institute toward open source collaboration

Although a lot of government agencies produce open source software, hardly any develop relationships with a community of outside programmers, testers, and other contributors. I recently spoke to John Speakman of the National Cancer Institute to learn about their crowdsourcing initiative and the barriers they've encountered.

First let's orient ourselves a bit--forgive me for dumping out a lot of abbreviations and organizational affiliations here. The NCI is part of the National Institutes of Health. Speakman is the Chief Program Officer for NCI's Center for Biomedical Informatics and Information Technology. Their major open source software initiative is the Cancer Biomedical Informatics Grid (caBIG), which supports tools for transferring and manipulating cancer research data. For example, it provides access to data classifying the carcinogenic aspects of genes (The Cancer Genome Atlas) and resources to help researchers ask questions of and visualize this data (the Cancer Molecular Analysis Portal).

Plenty of outside researchers use caBIG software, but it's a one-way street, somewhat in the way the Department of Veterans Affairs used to release its VistA software. NCI sees the advantages of a give-and-take such as the CONNECT project has achieved, through assiduous cultivation of interested outside contributors, and wants to wean its outside users away from the dependent relationship that has been all take and no give. And even the VA decided last year that a more collaborative arrangement for VistA would benefit them, thus putting the software under the guidance of an independent non-profit, the Open Source Electronic Health Record Agent (OSEHRA).

Another model is Forge.mil, which the Department of Defense set up with the help of CollabNet, the well-known organization in charge of the Subversion revision control tool. Forge.mil represents a collaboration between the DoD and private contractors, encouraging them to create shared libraries that hopefully increase each contractor's productivity, but it is not open source.

The OSEHRA model--creating an independent, non-government custodian--seems a robust solution, although it takes a lot of effort and risks failure if the organization can't create a community around the project. (Communities don't just spring into being at the snap of a bureaucrat's fingers, as many corporations have found to their regret.) In the case of CONNECT, the independent Alembic Foundation stepped in to fill the gap after a lawsuit stalled CONNECT's development within the government. According to Alembic co-founder David Riley, with the contract issues resolved, CONNECT's original sponsor--the Office of the National Coordinator--is spinning off CONNECT to a private sector, open source entity, and work is underway to merge the two baselines.

Whether an agency manages its own project or spins off management, it has to invest a lot of work to turn an internal project into one that appeals to outside developers. This burden has been discovered by many private corporations as well as public entities. Tasks include:

  • Setting up public repositories for code and data.

  • Creating a clean software package with good version control that makes downloading and uploading simple.

  • Possibly adding an API to encourage third-party plugins, an effort that may require a good deal of refactoring and a definition of clear interfaces.

  • Substantially adding to the documentation.

  • General purging of internal code and data (sometimes even passwords!) that get in the way of general use; a quick audit like the sketch below can help.
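As an illustration of that last item, here is a minimal sketch of a pre-release audit, assuming a Python environment; the patterns, file extensions and paths are illustrative only, not a substitute for a real security review:

    import os
    import re

    # Illustrative patterns for credentials that often lurk in internal code.
    SUSPECT = re.compile(r'(password|passwd|secret|api[_-]?key)\s*[:=]', re.IGNORECASE)

    def scan_tree(root):
        """Walk a source tree and flag lines that look like embedded credentials."""
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                if not name.endswith(('.py', '.java', '.properties', '.xml', '.cfg')):
                    continue
                path = os.path.join(dirpath, name)
                with open(path, errors='ignore') as f:
                    for lineno, line in enumerate(f, 1):
                        if SUSPECT.search(line):
                            print('%s:%d: %s' % (path, lineno, line.strip()))

    if __name__ == '__main__':
        scan_tree('.')  # run from the top of the tree being prepared for release

Anything such a scan flags still has to be scrubbed from the project's revision history, not just from the current files, before the repository goes public.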

Companies and institutions have also learned that "build it and they will come" doesn't usually work. An open source or open data initiative must be promoted vigorously, usually with challenges and competitions such as the Department of Health and Human Services offers in its annual Health Data Initiative forums (a.k.a. datapaloozas).

With these considerations in mind, the NCI decided in the summer of 2011 to start looking for guidance and potential collaborators. Here, laws designed long ago to combat cronyism put up barriers. The NCI was not allowed to contact anyone it wanted out of the blue. Instead, it had to issue a Request for Information and talk to people who responded. Although the RFI went online, it obviously wasn't widely seen. After all, do you regularly look for RFIs and RFPs from government agencies? If so, I can safely guess that you're paid by a large company or lobbying agency to follow a particular area of interest.

RFIs and RFPs are released as a gesture toward transparency, but in reality they just make it easier for the usual crowd of established contractors and lobbyists to build on the relationships they already have with agencies. And true to form, the NCI received only a limited set of responses, and was frustrated in its attempts to talk to new actors with the expertise it needed for its open source efforts.

And because the RFI had to allow a limited time window for responses, there is no point in responding to it now.

Still, Speakman and his colleagues are educating themselves and meeting with stakeholders. Cancer research is a hot topic drawing zealous attention from many academic and commercial entities, and they're hungry for data. Already, the NCI is encouraged by the initial positive response from the cancer informatics community, many of whom are eager to see the caBIG software deposited in an open repository like GitHub right away. Luckily, HHS has already negotiated terms of service with GitHub and SourceForge, removing at least one important barrier to entry. The NCI is packaging its first tool (a laboratory information management system called caLIMS) for deposit into a public repository. So I'm hoping the NCI is too caBIG to fail.

March 09 2012

HHS CTO Todd Park to serve as the second chief technology officer of the United States

The White House has announced that Todd Park (@Todd_Park), the chief technology officer for the Department of Health and Human Services, will step into the role left open by Aneesh Chopra, the first person to hold the newly created position.

At the White House blog, John P. Holdren, assistant to the president for science and technology and director of the White House Office of Science and Technology Policy, wrote that:

For nearly three years, Todd has served as CTO of the U.S. Department of Health and Human Services, where he was a hugely energetic force for positive change. He led the successful execution of an array of breakthrough initiatives, including the creation of HealthCare.gov, the first website to provide consumers with a comprehensive inventory of public and private health insurance plans available across the Nation by zip code in a single, easy-to-use tool.

I knew Park's young family could be a factor in whether he would be the next US CTO, given that he'd already served longer than perhaps expected. That said, if the President of the United States asked you to serve as his CTO, would you say no?

This is some of the best personnel news to come out of Washington and the federal government under President Obama. Park has been working to revolutionize the healthcare industry at HHS since 2009, and in the private sector as an entrepreneur since 1997. Now he'll have the opportunity to try to improve how the entire federal government works through technology. It's a daunting challenge, but one that he may have been born to take on. Park is charismatic, understands technology on a systems level, and has been successful in applying open innovation and a lean startup approach to government at HHS.

White House director of digital Macon Phillips was "thrilled" about the choice.

As a close observer of the impact of technology on government, it's extremely exciting to hear that HHS's "entrepreneur in residence" is moving into a much bigger stage. Park's entrepreneurial energy and experience drive both his outlook and execution. He also seems to grok project management, which former US CIO Vivek Kundra identified as a core skill to encourage in the public sector. If he's able to harness the power of data to the benefit of the entire country, the outcome could be massive public good.

It's a shame more "Todd Parks" don't serve in government — but then there are very few of them in the world.

On a 30,000-foot level, his personal story is deeply compelling. He's the son of a brilliant immigrant who came here from Korea, attained a graduate-level education, spent his career in a company in the United States and raised a family, including a son who then went on to live the American dream, founding two successful healthcare companies and retiring a wealthy man.

From a 2008 interview on Park's background:

"My father emigrated to the United States from rural South Korea in the late 60s on a scholarship to the University of Utah. He got a PhD in chemical engineering and joined Dow Chemical. He worked there for the next 30 years. He actually has about 72 patents, more patents than anybody in Dow Chemical’s history except for Dr. Dow himself.

He raised me in a small town in Ohio. He sacrificed a lot to try to give me the best options he could. I went to Harvard for my undergrad education. I actually wanted to be in the Naval Academy and I really had my heart set on that, but then Dad and Mom sat me down one evening and said, “Son, no pressure, but we’ve wanted you to go to Harvard since before you were born.” The way they said it, I knew there was no hyperbole. I knew they were serious. I said, “Jeez, if you’re that serious about it, fine, I’ll go.” So I went.

In the matter of what I do in my life, nothing will ever compare to what my dad did: growing up in the Korean War, born dirt poor, emigrating to a brand new country, and becoming one of the most decorated chemical engineers in the world. My entire life is a quest to live up to half of what my dad actually did in some ways. That’s my background. We’re an immigrant family."

That's a powerful narrative, and one that I think should be compelling to the nation — and maybe the world — right now. Park was a successful entrepreneur, retired in his thirties to spend time with his family, and then received the call to enter public service.

As Park describes it, he was planning to retire from the 24x7 life of an entrepreneur, spend time with his young family, and become a healthcare investor when he received an email from HHS Deputy Secretary Bill Corr asking him to become the HHS CTO. As a long-time admirer of Corr, Park took the meeting.

"At the end of the meeting I said, 'This is actually a really amazing job. I'd really love to do this job, but I'll be divorced,'" Park recalled. "Bill replied that that would be bad, and if you're going to be divorced you shouldn't do this job. But why don't you go back and talk to Amy about it and see what she says?

"So I talked to Amy about it, and she was incredibly angry. But then after four days she came back to me and said, 'If they're really creating an entrepreneur in residence job at HHS, it's your national duty to take that job. And as much as I can't believe I'm saying this, I'll move back to the East Coast -- which I hate -- with our baby, to be there with you.'"

The country needs more examples of public servants like Park and his family. If Facebook, Twitter and other startups mint thousands of millionaires and a new class of founders who can "retire" early, I hope some of them will be inspired and become "entrepreneurs in residence" at the federal, state and city level as well.

Park could be a transformational figure of some magnitude in our history, if the politics, the resources and other external forces — war, natural or economic disaster — don't thwart his good work. That's all out of his control, of course, but the prospects here are notable.

For more context on the next US CTO, I've embedded my September 2010 interview with Park about his work at HHS below:

And, befitting the timing, here's an interview with Park about health data from the 2011 SXSW Interactive festival:

Congratulations to Park and condolences to HHS, which will have a hard time filling his shoes.

February 21 2012

Building the health information infrastructure for the modern epatient

To learn more about what levers the government is pulling to catalyze innovation in the healthcare system, I turned to Dr. Farzad Mostashari (@Farzad_ONC). As the National Coordinator for Health IT, Mostashari is one of the most important public officials entrusted with improving the nation's healthcare system through smarter use of technology.

Mostashari, a public-health informatics specialist, was named ONC chief in April 2011, replacing Dr. David Blumenthal. Mostashari's full biography, available at HHS.gov, notes that he "was one of the lead investigators in the outbreaks of West Nile Virus and anthrax in New York City, and was among the first developers of real-time electronic disease surveillance systems nationwide."

I talked to Mostashari on the same day that he published a look back over 2011, which he hailed as a year of momentous progress in health information technology. Our interview follows.

What excites you about your work? What trends matter here?

Farzad Mostashari‏: Well, it's a really fun job. It feels like this is the ideal time for this health IT revolution to tie into other massive megatrends that are happening around consumer and patient empowerment, payment and delivery reform, as I talked about in my TED Med Talk with Aneesh Chopra.

These three streams [how patients are cared for, how care is paid for, and how people take care of their own health] coming together feels great. And it really feels like we're making amazing progress.

How does what's happening today grow out of the passage of the Health Information Technology for Economic and Clinical Health (HITECH) Act in 2009?

Farzad Mostashari‏: HITECH was a key part of ARRA, the American Recovery and Reinvestment Act. This is the reinvestment part. People think of roadways and runways and railways. This is the information infrastructure for healthcare.

In the past two years, we made as much progress on adoption as we had made in the past 20 years before that. We doubled the adoption of electronic health records in physician offices between the time the stimulus passed and now. What that says is that a large number of barriers have been addressed, including the financial barriers that are addressed by the health IT incentive payments.

It also, I think, points to the innovation that's happening in the health IT marketplace, with more products that people want to buy and want to use, and an explosion in the number of options people have.

The programs we put in place, like the Regional Health IT Extension Centers modeled after the Agriculture Extension program, give a helping hand. There are local nonprofits throughout the country that are working with one-third of all primary care providers in this country to help them adopt electronic health records, particularly smaller practices and maybe health centers, critical access hospitals and so forth.

This is obviously a big lift and a big change for medicine. It moves at what Jay Walker called "med speed," not tech speed. The pace of transformation in medicine that's happening right now may be unparalleled. It's a good thing.

Healthcare providers have a number of options as they adopt electronic health records. How do you think about the choice between open source versus proprietary options?

Farzad Mostashari‏: We're pretty agnostic in terms of the technology and the business model. What matters are the outcomes. We've really left the decisions about what technology to use to the people who have to live with it, like the doctors and hospitals who make the purchases.

There are definitely some very successful models, not only on the EHR side, but also on the health information exchange side.

(Note: For more on this subject, read Brian Ahier's Radar post on the Health Internet.)

What role do open standards play in the future of healthcare?

Farzad Mostashari‏: We are passionate believers in open standards. We think that everybody should be using them. We've gotten really great participation by vendors of open source and proprietary software, in terms of participating in an open standards development process.

I think what we've enabled, through things like modular certification, is a lot more innovation. Different pieces of the entire ecosystem could be done through reducing the barrier to entry, enabling a variety of different innovative startups to come to the field. What we're seeing is, a lot of the time, this is migrating from installed software to web services.

If we're setting up a reference implementation of the standards, like the Connect software or popHealth, we do it through a process where the result is open source. I think the government as a platform approach at the Veterans Affairs department, DoD, and so forth is tremendously important.

How is the mobile revolution changing healthcare?

Farzad Mostashari: We had Jay Walker talking about big change [at a recent ONC Grantee Meeting]. I just have this indelible image of him waving in his left hand a clay cone with cuneiform on it that is from 2,000 B.C. — 4,000 years ago — and in his right hand he held his iPhone.

He was saying both of them represented the cutting edge of technology that evolved to meet consumer need. His strong assertion was that this is absolutely going to revolutionize what happens in medicine at tech speed. Again, not "med speed."

I had the experience of being at my clinic, where I get care, and the pharmacist, sitting in his starched white coat behind the counter, told me that I should take this medicine at night.

And I said, "Well, it's easier for me to take it in the morning." And he said, "Well, it works better at night."

And I asked, acting as an empowered patient, "Well, what's the half life?" And he answered, "Okay. Let me look it up."

He started clacking away at his pharmacy information system; clickity clack, clickity clack. I can't see what he's doing. And then he says, "Ah hell," and he pulls out his smartphone and Googles it.

There's now a democratization of information and information tools, where we're pushing the analytics to the cloud. Being able to put that in the hand of not just every doctor or every healthcare provider but every patient is absolutely going to be that third strand of the DNA, putting us on the right path for getting healthcare that results in health.

We're making sure that people know they have a right to get their own data, making sure that the policies are aligned with that. We're making sure that we make it easy for doctors to give patients their own information through things like the Direct Project, the Blue Button, meaningful use requirements, or the Consumer E-Health Pledge.

We have more than 250 organizations that collectively hold data for 100 million Americans that pledge to make it easy for people to get electronic copies of their own data.

Do you think people will take ownership of their personal health data and engage in what Susannah Fox has described as "peer-to-peer healthcare"?

Farzad Mostashari: I think that it will be not just possible, not even just okay, but actually encouraged for patients to be engaged in their care as partners. Let the epatient help. I think we're going to see that emerging as there's more access and more tools for people to do stuff with their data once they get it through things like the health data initiative. We're also beginning to work with stakeholder groups, like Consumers Union, the American Nurses Association and some of the disease groups, to change attitudes so that it's okay to ask for your own records.

This interview was edited and condensed. Photo from The Office of the National Coordinator for Health Information Technology.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20


January 20 2012

Massachusetts Open Checkbook: running through the ledger of choices and challenges in open government

On December 5, Massachusetts Governor Deval Patrick joined with state treasurer Steven Grossman to create an open government initiative with the promising moniker Open Checkbook. The site launched to some acclaim and has received over 220,000 hits. I decided to take a look at what's offered and what's missing from this site, and to ask someone in the government here in Massachusetts to describe their thinking in creating the site. The results can give us some insight into the effort it takes at each stage to release government data--and even more significantly, what it takes to increase the data's value.

As a finance project, Open Checkbook homes in on one area of open government: how the state spends its money. With Open Checkbook you can find out where the money goes in the Massachusetts state government, right down to particular salaries or particular payments to vendors. This is highly welcome in a tight economy, especially in a state that is still often unfairly tarred as "Taxachusetts," decades after tax rates were lowered--a state where news of patronage and pension scandals is common enough to get tiresome--a state where cynical voters have put referendum questions on the ballot in favor of lower taxes at least three times.

I discussed Open Checkbook with Jeffrey Simon, who works for the Governor as the director of the state's economic stimulus program and who was involved in Open Checkbook from the beginning. The site is run by a steering committee formed by the Governor and Treasurer and made up of members of their staffs. The approach used in Open Checkbook is based on the experience they had developing the state's stimulus program, and the website that Director Simon's office created for that program. The steering committee has been eager to add context to data, helping visitors who are uninitiated in the arcana of state budgeting get a sense of what expenditures are for.

A first look at the web site

Let's take a quick tour of this service. To get to the home page, visit the main Massachusetts government portal, look down the right-hand side, and click on "Open Checkbook." You can then explore finances along several dimensions. For instance, choosing "View by Department" gives you a table of expenditures at a high level of abstraction, ranging from "Administration and Finance" to "Transportation." An attractive pie chart also appears. At this high level, the pie chart strikes one as odd because it shows an entirely different breakdown of expenditures from the table on the left. However, it makes sense once you realize that the pie chart reflects a breakdown of expenses within a particular category: for instance, what percentage of "Administration and Finance" went to various commitments, such as the Department of Revenue. You can investigate expenditures by drilling down in two ways:

  • View "Administration and Finance" in more detail by clicking on the plus sign next to its row in the table.

  • View the percentages of different expenditures by hovering over parts of the pie chart, and then view (for instance) the different Departments of Revenue by clicking on its segment in the pie chart.

If you persist in clicking down through the table on the left, you eventually see payments made to individual vendors. And then, by hovering over a vendor's name, you can pull up "Vendor Details" and eventually see a particular expenditure on a particular date, including the Fund Name and Account Name. Other parts of the Open Checkbook site break down expenditures by vendor or by relatively abstract "spending categories" instead of by department.

At lower levels, one also encounters some of the neat twists added by the site's creators. By hovering over the name of an expenditure, one can pull up a brief description. By clicking on the name of an agency, one can go to the agency's web page.

Expert reactions

Open Checkbook gets very high marks from Kaitlin Lee of the Sunlight Foundation. The level of detail provided on each vendor goes far beyond data provided by most states. Crucially, the state shows full information about the program that funded each expenditure (funding type, account number, object code classification, and fund name, along with whether the source is federal or a general fund), so that researchers can trace the flow of money from programs all the way through to payments. Future plans to include data on tax credits impressed Lee, because most states don't consider tax credits worth reporting in their data sets--but of course, a tax credit does represent an expenditure, and sometimes a quite controversial one. (The Boston Globe, for instance, recently highlighted tax credits to film companies that took wing on their own.) As for the commitment by Open Checkbook to update the site nightly, Lee enthusiastically called it "unprecedented." She was also impressed with their informative FAQ and list of future plans.

I asked Beth Noveck of New York Law School (and formerly the Deputy Chief Technology Officer in the Obama Administration) for a comment. She writes:

Open Checkbook is a fabulous exemplar of a government using open data to make itself more transparent to the public. To become even more accountable and effective, the next iteration of Checkbook 2.0 should ideally track how people are using the data downstream, and how agencies are encouraging the creation of useful mashups and visualizations. To promote this kind of participation, the creators can start by articulating what kinds of visualizations they want to see people build, and create a feedback loop by showing how they have taken action (e.g., projects cut, money saved) based on this data.

I also exchanged email with John F. Moore, Founder and CEO of Government in The Lab, who wrote me:

Massachusetts, especially under Governor Patrick, has taken a proactive role in using technology to better engage citizens in the process of government. Their early efforts on social media led to a very high quality social media toolkit for all government employees to use, providing best practices and legal guidance. The Open Checkbook project is another solid step forward, working to engage citizens by providing the information that is most interesting to citizens in tough economic times: how the government is spending money. From the viewpoint of the average citizen I think the site does a great job. It is fairly easy to use, has a search that works as you would expect, and allows people to easily answer basic questions.

The State of Massachusetts would be well served by running more advertising to continue to build awareness among citizens of this resource. Too many government entities invest time and money building wonderful solutions only to have them underutilized by citizens due to the lack of awareness that the solutions even exist.

Levels of usability for open data

After delving down to the detail pages of Open Checkbook, one can download a table in the form of a CSV file, as mentioned earlier. Why is this only at low levels, rather than at more comprehensive levels where one can glean more interesting data? Part of the reason has to do with sheer volume: some pages have very long tables and data downloads would strain the web site. But a more fundamental challenge exists.

Let's think for a moment about the role of structuring data. Contributions follow a power law. In open government, a few programmers with both the knowledge and the zeal to create applications will sift through government data, and the rest of the population will gratefully consume the information presented in easy-to-read forms. So the value of government data--the topic with which I started off this posting--depends a lot on its accessibility to programmers to turn it into consumable information.

Alexander Howard has reported on a five-star rating system Tim Berners-Lee developed for government data, which he nicely laid out in a video of a Government 2.0 presentation. Because Berners-Lee's system highlights relatively abstract concepts of open formats and Linked Data, I will condense the possibilities to three:

Moving paper documents to the Internet

In this most rudimentary effort, agencies put up PDFs or Microsoft Word files instead of printing them. This saves an investigator the cost of a stamp or a trip down to the registry, and definitely can make a quantitative difference in the amount of research using government information. But it's not clear how much of this information is useful for third-party applications, and extracting such data becomes an immense chore.

Adding metadata

At the next level, documents are enhanced with semantic mark-up. Laws, regulations, and council agendas, for instance, can be marked up to indicate titles, dates, summaries, and other structured sections. (The U.S. Congress's Thomas site is one of the best-known examples.) Raw data is presented in tabular form, often with pie charts or other visualizations, and sophisticated systems let you sort and filter displays by their columns.

Programmatic access

At the highest level in current use, data can be loaded into programs for big data research. The key here is a regular format (comma-separated value files are fine), although APIs are useful to allow queries according to criteria chosen by the user, such as "Show me all agencies that spent over one million dollars on consultants in 2009."
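To make that concrete: once data is published in a regular format, the query above is only a few lines of code. Here is a minimal sketch assuming a hypothetical expenditures.csv export with agency, category, year and amount columns -- the file and column names are invented for illustration, not Open Checkbook's actual schema:

    import csv
    from collections import defaultdict

    # Sum consultant spending per agency for 2009 from a hypothetical CSV export.
    totals = defaultdict(float)
    with open('expenditures.csv', newline='') as f:
        for row in csv.DictReader(f):
            if row['category'] == 'Consultants' and row['year'] == '2009':
                totals[row['agency']] += float(row['amount'])

    # "Show me all agencies that spent over one million dollars on consultants in 2009."
    for agency, total in sorted(totals.items()):
        if total > 1000000:
            print(agency, round(total, 2))

An API simply moves this filtering to the server, so the user downloads only the rows that match instead of the whole table.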

Programmatic access is the empowering factor that brings data to cell phones and visualizations. (We'll take another look at Berners-Lee's Linked Data later.) And it's increasingly common. Many of the data sets on the federal government's open government data platform, Data.gov, can be downloaded programmatically. Socrata is a good model for the use of an API, providing precise data types that facilitate number-crunching by computer. The state of California doesn't seem to have APIs, but does sport some impressive data sets. For instance, with a few clicks, I can download a file of electricity consumption in California by any combination of counties, sectors, and years.

Lee believes that an API, while useful for retrieving small quantities of data, isn't too important. She's just as happy with one button allowing you to download all the data from the site in a ZIP file. Organizations like the Sunlight Foundation can then create API access to the data for the pleasure of other researchers. Lee's criticism of Open Checkbook, therefore, focuses on how little data you get with each download. Each download is limited to the first 10,000 rows, which may be nowhere near enough to retrieve the data on a large page. But the state could probably upgrade the site fairly easily to meet her criteria.

The research you can do on data is limited to a great extent by the way the data was collected. While I can easily compare electricity use by county in California, I can't ask, "What percentage of electricity went to air conditioning?" In the future, if California institutes a "smart grid," we could even investigate the consumption of air conditioning (and perhaps snoop on each other in interesting ways). Meanwhile, someone might be able to ferret out some facts about air conditioning by downloading information on a county basis and applying data retrieved elsewhere about climate.

Open Checkbook is considering an API, according to Simon, but the steering committee has to evaluate the costs of creating the API and compare it to potential benefits. So it's not certain that Massachusetts will move soon to the highest level of my hierarchy.

It should also be noted that Open Checkbook contains only state expenditures. Municipalities aren't covered (except that one can find what the state paid to them), nor are independent agencies like the MBTA transit system. Incorporating all those entities into Open Checkbook is a long-term goal.

Currently, one can just retrieve tables manually in CSV format (which can then be loaded into any one of many popular spreadsheets or scripting libraries). Furthermore, each table covers a small data set. What would it take to process data on a statewide basis?

The costs of consistent data

Tabular data requires the consistent application of categories during data entry. Data would be of little use (and in fact would offer the deceptive appearance of useful information) if different agencies classified something like advertising or disability payments under different categories but had those differing classifications combined in one table.

To solve this, Berners-Lee promotes Linked Data, the cause to which he has devoted most of his public life for the past few years. Technologies for Linked Data haven't been widely adopted yet, but they are producing impressive results in scattered applications. (The thrust behind Linked Data is beyond the scope of this article. I'll just say that if you're comfortable with the concepts of taxonomies and ontologies you'll adapt quickly to Linked Data, and that if you're not comfortable with those concepts you'll probably want to tiptoe off and not hear any more.) Berners-Lee assures us in his Government 2.0 video that Linked Data solutions can be developed informally and incrementally, shared on a voluntary basis among institutions without bureaucratic intervention. But the creation of a data description is only one small step toward harmonizing data sets.
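For flavor only, here is a tiny sketch of the idea using Python's rdflib: two agencies keep their own category labels but link both to one shared concept, so tables built from either source can later be merged safely. All of the URIs are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS, SKOS

# Hypothetical vocabularies for two agencies and a shared state vocabulary.
AGENCY_A = Namespace("http://example.gov/agency-a/terms/")
AGENCY_B = Namespace("http://example.gov/agency-b/terms/")
STATE = Namespace("http://example.gov/vocab/")

g = Graph()
g.bind("skos", SKOS)

# Each agency keeps its own term, but both map to one shared concept.
g.add((STATE.ConsultingServices, RDFS.label, Literal("Consulting services")))
g.add((AGENCY_A.Consultants, SKOS.exactMatch, STATE.ConsultingServices))
g.add((AGENCY_B.ProfessionalServices, SKOS.exactMatch, STATE.ConsultingServices))

print(g.serialize(format="turtle"))
```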

To answer the question mentioned earlier in this posting ("Show me all agencies that spent over one million dollars on consultants in 2009"), the state would have to impose rigid workflow rules about categorizing and entering data. Training and enforcement could entail prohibitive costs. Extracting data in structured formats from many different databases is also a burden.

If consistency could be achieved, the next level would be to coordinate data sets across many different jurisdictions. But this would take a good deal of effort to set and apply standards. Again, creating a data description is just one step in the process of collecting useful data, whether it's data on consultants in Open Checkbook or the use of electricity for air conditioning in California.

Public reactions and future plans

Still, the current level of organization in Open Checkbook promises benefits both inside and outside the state administration. For instance, the Committee for Public Counsel Services streamlined its accounts by combining small payments into checks of $50 or more. Simon is hoping that financial officers in agencies can find who is paying individually for services that the state could purchase more cheaply in bulk. The Secretary of Administration and Finance can quickly find out what an expenditure was for and to whom it was paid, without having to pick up the phone and ask for a report.

Journalists and activists can also indulge their investigative impulses. The salaries of all employees and pensions paid to retirees are all visible. Given some of the recent abuses reported in the press, a lot of citizens will welcome this transparency.

Simon says that the Governor and the Treasurer also want to use the data to create a dialog with the public. In just the month since the data was released, many comments and suggestions have come in. Some of the interactions include:

  • One vendor complained that the site listed her home address. Investigation revealed that this was a systemic filtering problem affecting about 100 vendors. The team fixed the problem within 12 hours, and the Comptroller made permanent corrections to the accounting system so the issue will not recur.

  • After receiving multiple questions, the site developed a standard response form and updated their FAQs.

  • Users pointed out incompatibilities with Internet Explorer 9. A link was created on the Open Checkbook home page instructing users how to enable IE9 compatibility mode. A future update will fix the problem permanently.

Simon welcomes error reports and would like even more public feedback. Lots of positive comments have also been received, such as "I am overjoyed that we now have a tool like this in the state of Massachusetts" and "Finally we have a politician who follows through." Eventually, I think, it would be valuable for the state to set up forums with logins and discussion areas to spark group discussions of topics of interest to the public.

December 30 2011

2011 Gov 2.0 year in review

By most accounts, the biggest stories of 2011 were the Arab Spring, the historic earthquake and tsunami in Japan, and the death of Osama Bin Laden. In each case, an increasingly networked world experienced those events together through a growing number of screens. At the beginning of the year, a Pew Internet survey emphasized the Internet's importance in civil society. By year's end, more people were connected than ever before.

Time magazine named 2011 the year of the protester, as apt a choice as "You" was in 2006. "No one could have known that when a Tunisian fruit vendor set himself on fire in a public square, it would incite protests that would topple dictators and start a global wave of dissent," noted Time. "In 2011, protesters didn't just voice their complaints; they changed the world."

The Arab Spring extended well through summer, fall and winter, fueled by decades of unemployment, repression, and autocratic rule in Tunisia, Egypt, Libya, Syria, Yemen and Bahrain. This year's timeline of protest, revolution and uprising was not created by connection technologies, but by year's end, it had been accelerated by millions of brave young people connected to one another and the rest of the world through cell phones, social networks and the Internet.  

"We use Facebook to schedule the protests, Twitter to coordinate, and YouTube to tell the world," said an unnamed activist in Cairo in January.

In the months that followed, the Occupy Wall Street movement used the same tools in the parks and streets of the United States to protest economic inequality and call for accountability in the financial industry, albeit without the same revolutionary results.

This was the year where unemployment remained stubbornly high in the United States and around the world, putting job creation and economic growth atop the nation's priority list.

The theme that defined governments in Europe, particularly England, was austerity, as a growing debt crisis and financial contagion spread and persisted throughout the year. In Washington, the theme might be gridlock, symbolized by a threatened government shutdown in April and then brinkmanship over the debt crisis during the summer. As the year came to a close, a dispute between the White House, Senate and House over the extension of payroll tax cuts rounded out a long year of divided government.

We also saw a growing conflict between closed and open. It was a year that included social media adoption by government and a year where governments took measures to censor and block it. It was a year when we learned to think different about hacking, even while the "hacktivism" embodied in groups like Anonymous worried officials and executives in boardrooms around the world.

The United States bid farewell to its first CIO, Vivek Kundra, and welcomed his replacement, Steven VanRoekel, who advanced a "future first" vision for government that focuses on cloud, open standards, modularity and shared services. VanRoekel brought a .com mentality to the FCC, including a perspective that "everything should be an API," which caught the attention of some tech observers. While Kundra may have left government, his legacy remains: cloud computing and open data aren't going away in federal government, according to his replacement and General Services Administration (GSA) officials.

This was the year where the death of Steve Jobs caused more than a few people to wonder what Jobs would do as president. His legacy will resonate for many years to come, including the App Store that informed the vision of government as a platform.

If you look back at a January interview with Clay Johnson on key trends for Gov 2.0 and open government in 2011, some of his predictions were borne out. The House of Representatives did indeed compete with the White House on open government, though not in story lines that played out in the national media or Sunday morning talk shows. The House Oversight and Government Reform Committee took a tough look at the executive branch's progress in a hearing on open government. Other predictions? Not so much. Rural broadband stalled. Transparency as infrastructure is still in the future. We're still waiting for that to be automated, though when the collective intelligence of people in Washington looks at new versions of bills tied to the social web, there's at least a kludge.

Many of the issues and themes in 2011 were extensions of those in the 2010 Gov 2.0 Year in Review: the idea of government as a platform spread around the world; gated governments faced disruption; open government initiatives were stuck in beta; open data went global; and laws and regulations were chasing technology, online privacy, cloud computing, open source and citizen engagement.

"It's tough to choose which issue dominated the year in transparency, but I'd say that the Open Government Partnership, the E-government funding fight, and the Super Committee all loomed large for Sunlight," said John Wonderlich, policy director for the Sunlight Foundation. "On the state level, I'd include Utah's fight over FOI laws, Tennessee's Governor exempting himself from financial disclosure requirements, and the Wisconsin fight as very notable issues.  And the rise of Super PACs and undisclosed money in politics is probably an issue we're only just starting to see."

Three dominant tech policy issues

Privacy, identity and cybersecurity dominated tech policy headlines coming out of D.C. all year. By year's end, however, no major cybersecurity or consumer privacy bill had made it through the U.S. Congress to the president's desk. In the meantime, the Federal Trade Commission (FTC) made its own moves. As a result, Google, Facebook and Twitter are all now subject to "audits" by the FTC every two years.

On the identity and cybersecurity fronts, there was progress: The U.S. government's National Strategy for Trusted Identities in Cyberspace addressed key issues around creating an "identity ecosystem" online. Implementation, however, will require continued effort and innovation from the private sector. By year's end, Verizon became the first identity provider to receive Level of Assurance 3 credentialing from the U.S. government. Look for more identity providers to follow in 2012, with citizens gaining increased access to government services online as a result.

A meme goes mainstream

This was the year when the story of local governments using technology with citizens earned more attention from mainstream media, including outlets like the Associated Press and National Public Radio.

In February, the AP published a story about how cities are using tech to cull ideas from citizens. In the private sector, leveraging collective intelligence is often called crowdsourcing. In open government, it's "citizensourcing." In cities around the country, the approach is gaining traction.

At Yahoo Canada, Carmi Levy wrote that the future of government is citizen focused. In his view, open government is about leveraging technology and citizens to do more with less. It's about doing more than leaving or speaking up: it's making government work better.

In November, NPR listeners learned more about the open government movement around the country when the Kojo Nnamdi Show hosted an hour-long discussion on local Gov 2.0 on WAMU in Washington, D.C. Around the same time, the Associated Press reported that a flood of government data is fueling the rise of city apps:

New York, San Francisco and other cities are now working together to develop data standards that will make it possible for apps to interact with data from any city. The idea, advocates of open data say, is to transform government from a centralized provider of services into a platform on which citizens can build their own tools to make government work better.

Gov 2.0 goes local

All around the country, pockets of innovation and creativity could be found, as "doing more with less" became a familiar mantra in many councils and state houses. New open data platforms or citizen-led initiatives sprouted everywhere.

Here's just a sample of what happened at the local level in 2011:

If you want the full fire hose, including setbacks to open government on the state level, read the archives of the Sunlight Foundation's blog, which aggregated news throughout the year.

Several cities in the United States hopped on the open government and open data bandwagon in 2011. Baltimore empowered its citizens to act as sensors with new mobile apps and Open311. New York City is opening government data and working to create new relationships with citizens and civic developers in the service of smart government. Further afield, Britain earned well-deserved attention for seeking alpha, with its web initiatives and an open architecture that could be relevant to local governments everywhere.

In 2011, a model open government initiative gained traction in Cook County. In 2012, we'll see if other municipalities follow. The good news is that the Pew Internet & American Life Project found that open government is tied to higher levels of community satisfaction. That carrot for politicians comes up against the reality that in a time of decreased resources, being more open has to make economic sense and lead to better services or more efficiency, not just be "the right thing to do."

One of the best stories in open government came from Chicago, where sustainability and analytics are guiding Chicago's open data and app contest efforts. The city's approach offers important insights to governments at all levels. Can the Internet help disrupt the power of Chicago lobbyists through transparency? We'll learn more in 2012.

Rise of the civic startups

This year, early entrants like SeeClickFix and CitySourced became relatively old hat with the rise of a new class of civic startups that aspire to interface with the existing architectures of democracy. Some hope to augment what exists, others to replicate democratic institutions in digital form. [Disclosure: O'Reilly AlphaTech Ventures is an investor in SeeClickFix.]

This year, new players like ElectNext, OpenGovernment.org, Civic Commons, Votizen and POPVOX entered the mix alongside many other examples of social media and government innovation. [Disclosure: Tim O'Reilly was an early angel investor in POPVOX.]

In Canada, BuzzData aspires to be the GitHub of datasets. Simpl launched as a platform to bridge the connection between social innovators and government. NationBuilder went live with its new online activism platform.

Existing civic startups made progress as well. BrightScope unlocked government data on financial advisers and made the information publicly available so it could be indexed by search engines. The Sunlight Foundation put open government programming on TV and a health app in your pocket. Code for America's 2011 annual report offered insight into the startup nonprofit's accomplishments.

Emerging civic media

The 2011 Knight News Challenge winners illustrated data's ascendance in media and government. It's clear that data journalism and data tools will play key roles in the future of media and open government.

It was in that context that the evolution of Safecast offered us a glimpse into the future of networked accountability, as citizen science and open data help to inform our understanding of the world. After a tsunami caused a nuclear disaster in Japan, a radiation detection network started aggregating and publishing data. Open sensor networks look like an important part of journalism's future.

Other parts of the future of news are more nebulous, though there was no shortage of discussion about it. The question of where citizens will get their local news wasn't answered in 2011. A Pew survey of local news sources revealed the influence of social and mobile trends, along with a generation gap. As newsprint fades, what will replace it for communities? We don't know yet.

Some working models are likely to be found in civic media, where new change agents aren't just talking about the future of news; they're building it. Whether it's mobile innovation or the "Freedom Box," there's change afoot.

This was also a deadly year for journalists. The annual report from the Committee to Protect Journalists found 44 journalists were killed in the line of duty, with the deaths of dozens more potentially associated with the process of gathering and sharing information. Only one in six people lives in a country with a free press, according to the 2011 report on world press freedom from Freedom House.

Open source in government

At the federal level, open source continued its quiet revolution in government IT. In April, the new version of FCC.gov incorporated the principles of Web 2.0 into the FCC's online operations. From open data to platform thinking, the reboot elevated FCC.gov from one of the worst federal websites to one of the best. In August, the Energy Department estimated that the new Energy.gov would save $10 million annually through a combination of open source technology and cloud computing.

The White House launched IT Dashboard and released parts of it as open source code. (It remains to be seen whether the code from those platforms is re-used in the market.)

NASA's commitment to open source and its game plan for open government were up for discussion at the recent NASA Open Source Summit. One of NASA's open source projects, Nebula, saw its technology used in an eponymous startup. Nebula, the company, combines open source software and hardware in an appliance. If Nebula succeeds, its "cloud controller" could enable every company to implement cloud computing.

In cities, the adoption of "Change By Us" in Philadelphia and of OpenDataPhilly's code in Chattanooga showed the potential of reusable civic software.

At the end of 2011, Civic Commons opened up its marketplace. The Marketplace is designed to be a resource for open source government apps. As Nick Judd observed at techPresident, both Civic Commons and its Marketplace "propose to make fundamental changes to the way local governments procure IT goods and services."

Open government goes global

As White House tech talent came and went, open government continued to grow globally.

In September, a global Open Government Partnership (OGP) launched in New York City. Video of the launch, beginning with examples of open government innovation from around the world, is embedded below:

Making the Open Government Partnership work won't be easy, but it's an important initiative to watch in 2012. As The Economist's review of the Open Government Partnership highlights, one of the most important elements is the United States' commitment to join the Extractive Industries Transparency Initiative. If this initiative bears fruit, citizens will have a chance to see how much of the payments oil and gas companies send to governments actually ends up in the public's coffers.

Even before the official launch of the OGP, there was reason to think that something important was afoot globally in the intersection of governments, technology and society. In Africa, the government of Kenya launched Open Kenya and looked to the country's dynamic development community to make useful applications for its citizens. In Canada, British Columbia joined the ranks of governments embracing open government platforms. Canadian citizens in the province of British Columbia now have three new websites that focus on open government data, making information related to accountability available and providing easier access to services and officials. In India, the seeds of Gov 2.0 started bearing fruit through a growing raft of civil society initiatives. In Russia, Rospil.info aimed to expose state corruption.

For open government advocates, the biggest advance of the year was "the recognition of the need for transparency of government information world wide as a means for holding government and its officials accountable," said Ellen Miller, executive director of the Sunlight Foundation, via email. "The transparency genie is out of the bottle — world wide — and it's not going back into the darkness of that lantern ever again.  Progress will be slow, but it will be progress."

Federal open government initiatives

"Cuts in e-gov funds, Data.gov evolution, Challenge.gov and the launch of many contests were the big stories of the year," commented Steve Ressler, the founder of Govloop. Ressler saw Gov 2.0 go from a shiny thing to people critically asking how it delivers results.

At the beginning of the year, OMB Watch released a report that found progress on open government but a long road ahead. At the end of 2011, the Sunlight Foundation assessed the Open Government Directive two years on and found "mixed results." John Wonderlich put it this way:

Openness without information is emptiness.  If some agencies won't even share the plans they've made for publishing new information, how far can their commitment to openness possibly go? The Open Government Directive has caused a lot of good.  And it has also often failed to live up to its promise, the administration's rhetoric, and agencies' own self-imposed compliance plans. We should remember that Presidential rhetoric and bureaucratic commitments are not the same thing as results, especially as even more administration work happens through broad, plan-making executive actions and plans.

In 2011, reports of the death of open government were greatly exaggerated. That doesn't mean its health in the United States federal government is robust. In popular culture, of course, its image is even worse. In April, Jon Stewart and the Daily Show mocked the Obama administration and the president for a perceived lack of transparency.

Stewart and many other commentators have understandably wondered why the president's meeting with open government advocates to receive a transparency award wasn't on the official schedule or covered by the media. A first-hand account of the meeting from open government advocate Danielle Brian offered a useful perspective on the issues that arose that go beyond a sound bite or one-liner.

Some projects are always going to be judged as more or less effective in delivering on the mission of government than others. An open government approach to creating a "Health Internet" may be the most disruptive of them. For those who expected to see rapid, dynamic changes in Washington fueled by technology, however, the bloom has long since come off of the proverbial rose. Open government is looking a lot more like an ultra-marathon than a 400-yard dash. As a conference at the National Archives reminded the open government community, media access to government information also has a long way to go.

Reports on citizen participation and rulemaking from America Speaks offered open government guidance beyond technology. Overall, the administration received mixed marks. While America Speaks found that government agencies "display an admirable willingness to experiment with new tools and techniques to involve citizens with their decision-making processes," it also found the "Open Government Initiative and most Federal Agency plans have failed to offer standards for what constitutes high-quality public participation."

On the one hand, agencies are increasing the number of people devoted to public engagement and using a range of online and offline forums. On the other, "deliberative processes, in which citizens learn, express points of view, and have a chance to find common ground, are rarely incorporated." Getting to a more social open government is going to take a lot more work.

There were other notable landmarks. After months of preparation, the latest .gov startup went live. While ConsumerFinance.gov went online back in February, the Consumer Financial Protection Bureau (CFPB) officially launched on the anniversary of H.R.4173 (the Dodd-Frank Wall Street Reform and Consumer Protection Act), with Richard Cordray nominated to lead it. By year's end, however, he still had not been confirmed. Questions about the future of the agency remain, but to give credit where credit is due: the new consumer bureau has been open to ideas about how it can do its work better. This approach is what led New York Times personal finance columnist Ron Lieber to muse recently that "its openness thus far suggests the tantalizing possibility that it could be the nation's first open-source regulator."

When a regulator asks for help redesigning a mortgage disclosure form, something interesting is afoot.

It's extremely rare that an agency gets built from scratch, particularly in this economic and political context. It's notable, in that context, that the 21st century regulator embraced many of the principles of open government in leveraging technology to stand up the Consumer Financial Protection Bureau.

This fall, I talked with Danny Weitzner, White House deputy chief technology officer for Internet policy, about the administration's open government progress in 2011. Our interview is embedded below:

In our interview, we talked about what the Internet means to government and society, intellectual property, the risks of a balkanized Internet, digital privacy, the Direct Project, a "right to connect," ICE takedowns and open data initiatives. On the last issue, the Blue Button movement, which enables veterans to download a personal health record, now has a website: BlueButtonData.org. In September, Federal CTO Aneesh Chopra challenged the energy industry to collaborate in the design of a "green button" modeled after the Blue Button. All three of California's public utilities have agreed to standardize energy data for that idea.

Tim O'Reilly talked with Chopra and White House deputy CTO for public sector innovation Chris Vein about the White House's action plan for open government innovation at the Strata Summit in September. According to Chopra, the administration is expanding Data.gov communities to agencies, focusing on "smart disclosure" and building out "government as a platform," with an eye to embracing more open innovators.

As part of its commitments to the Open Government Partnership, the White House also launched an e-petitions platform this fall called "We The People."

The White House has now asked for feedback on the U.S. Open Government National Action Plan, focusing on best practices and metrics for public participation. Early responses include focusing on outcomes first and drawing attention to success, not compliance. If you're interested in giving your input, Chopra is asking the country questions on Quora.

Opening the People's House

Despite the abysmal public perception of Congress, genuine institutional changes, driven by the GOP embracing innovation and transparency, are happening incrementally in the House of Representatives. As Tim O'Reilly observed earlier in the year, the current leadership of the House is doing a better job on transparency than its predecessors.

In April, Speaker John Boehner and Majority Leader Eric Cantor sent a letter to the House Clerk about releasing legislative data. Then, in September, a live XML feed for the House floor went online. Yes, there's a long way to go on open legislative data quality in Congress — but at year's end,  following the first "Congressional hackathon," the House approved sweeping open data standards.

The House also made progress in opening up its recorded videos to the nation. In January, Carl Malamud helped make the hearings of the House Committee on Oversight and Government Reform available on the Internet in high-quality video at house.resource.org. Later in the year, HouseLive.gov brought live video to mobile devices.

Despite the adoption of Twitter and Facebook by the majority of senators and representatives, Congress as a whole still faces challenges in identifying constituents on social media.

It's also worth noting that, no matter what efforts have been made to open the People's House through technology, at year's end, this was the least popular Congress in history.


Open data

The open data movement received three significant endorsements on the world stage in 2011.

1. Open government data was featured in the launch of the Open Government Partnership.

That launch, however, offered an opportunity to reflect upon the fundamental conditions for open government to exist. Simply opening up data is not a replacement for a Constitution that enforces a rule of law, free and fair elections, an effective judiciary, decent schools, basic regulatory bodies or civil society, particularly if the data does not relate to meaningful aspects of society. That said, open data is a key pillar of how policy makers are now thinking about open government around the world.

2. The World Bank continued to expand what it calls "open development" with its own open data efforts.

The World Bank is building upon the 2010 launch of data.worldbank.org. It's now helping countries prepare and launch open government data platforms, including support for Kenya. In December, the World Bank hosted a webinar about how countries can start and run open government data ecosystems, launched an online open data community, and published a series of research papers on the topic.

Realizing the Vision of Open Government Data (Long Version): Opportunities, Challenges and Pitfalls

3. The European Union threw its support behind open data.

The BBC reported that Europe's governments are "sitting on assets that could be worth 40bn euros ($52bn, £33.6bn) a year" in public sector data. In addition, the European Commission has launched an open data strategy for the EU. Here's Neelie Kroes, vice president of the European Commission, on public data for all:

Big data means big opportunities. These opportunities can flow from public and private data — or indeed from mixing the two. But a public sector lead can set an example, allowing the same taxpayers who have paid for the data to be gathered to benefit from its wider use. In my opinion, data should be open and available by default and exceptions should be justified — not the other way around, as is too often the case still today.

Access to public data also has an important and growing economic significance. Open data can be fuel for innovation, growth and job creation. The overall economic impact across the whole EU could be tens of billions of Euros per year. That's amazing, of course! But, big data is not just about big money. It promises a host of socially and environmentally beneficial uses too — for example, in healthcare or through the analysis of pollution patterns. It can help make citizens' lives easier, more informed, more connected.


As Glynn Moody wrote at Computer World UK, Europe is starting to get it.

Open data is not a partisan issue, in the view of professor Nigel Shadbolt. In 2012, Shadbolt will lead an "Open Data Institute" in England with Tim Berners-Lee.

Shadbolt is not out on a limb on this issue. In Canada and Britain, conservative governments supported new open data initiatives. In 2011, open government data also gathered bipartisan support in Washington when Rep. Darrell Issa introduced the DATA Act to track government spending. We talked about that and other open government issues this fall during an interview at the Strata Conference:

There was no shortage of other open data milestones, from Google adding the Public Data Explorer to its suite of free data tools to an International Open Government Data Camp in Poland.

In New York City, social, mapping and mobile data told the story of Hurricane Irene. In the information ecosystem of 2011, media, government and citizens alike played a critical role in sharing information about what's happening in natural disasters, putting open data to work and providing help to one another.

Here at Radar, MySociety founder Tom Steinberg sounded a cautionary note about creating sustainable open data projects with purpose. The next wave of government app contests needs to incorporate sustainability, community, and civic value. Whether developers are asked to participate in app contests, federal challenges, or civic hackathons in 2012, the architects behind these efforts need to focus on the needs of citizens.

Open mapping

One of the biggest challenges government agencies and municipalities have is converting open data into information from which people can easily draw knowledge. One of the most powerful ways humanity has developed to communicate information over time is through maps. If you can take data in an open form and map it out, then you have an opportunity to tell stories in a way that's relevant to a region or personalized to an individual.

There were enough new mapping projects in 2011 that they deserved their own category. In general, the barrier to entry for mapping got lower thanks to new open source platforms like MapBox, which powered the Global Adaptation Index and a map of the humanitarian emergency in the Horn of Africa. And data.nai.org.af charted attacks on the media on an interactive map of Afghanistan.

IssueMap.org, a new project launched by the FCC and FortiusOne, aimed to convert open data into knowledge and insight. The National Broadband Map, one of the largest implementations of open source and open data in government to date, displayed more than 25 million records and incorporated crowdsourced reporting. A new interactive feature posted at WhiteHouse.gov used open data to visualize excess federal property.

"Maps can be a very valuable part of transparency in government," wrote Jack Dangermond, founder of ESRI. "Maps give people a greater understanding of the world around them. They can help tell stories and, many times, be more valuable than the data itself. They provide a context for taxpayers to better understand how spending or decisions are being made in a circumstance of where they work and live. Maps help us describe conditions and situations, and help tell stories, often related to one's own understanding of content."

Social media use grows in government

When there's a holiday, disaster, sporting event, political debate or any other public happening, we now experience it collectively. In 2011, we were reminded that there were a lot of experiences that used to be exclusively private that are now public because of the impact of social media, from breakups to flirting to police brutality. From remembering MLK online to civil disobedience at the #Occupy protests, we now can share what we're seeing with an increasingly networked global citizenry.

Those same updates, however, can be used by autocratic regimes to track down protestors, dissidents and journalists. If the question is whether the Internet and social media are tools of freedom or tools of oppression, the answer may have to be "yes." If online influence is essential to 21st century governance, however, how should government leaders proceed?

Some answers could be found in the lessons learned by the Federal Emergency Management Agency (FEMA), the Red Cross and Crisis Commons that were entered into the Congressional Record when the U.S. Senate heard testimony on the role of social media in crisis response.

If you're a soldier, you should approach social media carefully. The U.S. Army issued a handy social media manual to help soldiers, and the Department of Veterans Affairs issued a progressive social media policy.

A forum on social media at the National Archives featured a preview of a "citizen archivist dashboard" and a lively discussion of the past, present and future of social media — a future which will certainly include the growth of networks in many countries. For instance, in 2011, Chinese social media found its legs.

For a comprehensive discussion of how governments dealt with social media in 2011, check out this piece I wrote for National Journal.

Intellectual property and Internet freedom

In 2011, the United Nations said that disconnecting Internet users is a breach of human rights. That didn't stop governments around the world from considering it under certain conditions. The UN report came at an important time. As Mathew Ingram wrote at GigaOm, reporting on a UNESCO report on freedom of expression online, governments are still trying to kill, replace or undo the Internet.

In 2011, Russia earned special notice when it blocked proposals for freedoms in cyberspace. The Russian blogosphere came under attack in April. This fall, DDoS attacks were used in Russia after the elections in an attempt to squelch free speech. As Russian activists get connected, they'll be risking much to express their discontent.

In May, the eG8 showed that online innovation and freedom of expression still need strong defenders. While the first eG8 Forum in Paris featured hundreds of business and digital luminaries, the policies discussed were of serious concern to entrepreneurs, activists, media and citizens around the world. If the Internet has become the public arena for our time, as the official G8 statement that followed the Forum emphasized, then defending the openness and freedoms that have supported its development is more important than ever.

That need became clearer at year's end when the United States Congress considered anti-piracy bills that could cripple Internet industries. In 2012, the Stop Online Piracy Act (SOPA) and PROTECT IP Act will be before Congress again. Many citizens are hoping that their representatives decide not to break the Internet.

After all, if an open Internet is the basis for democracy flourishing around the world, billions of people will be counting upon our leaders to keep it open and accessible.

What story defined the year for you?

On Govloop, the government social network, the community held its own debate on the issue of the year. There, the threat of a government shutdown led the list. A related issue, "austerity," was the story that defined government in 2011 in Chris Dorobek's poll. I asked people on Govloop, Quora, Twitter, Facebook and Google+ what the most important Gov 2.0 or open government story of 2011 was and why. Their answers focused on what happened in the U.S. rather than around the globe, but here's what I heard:

1. The departure of Kundra and White House deputy CTO for open government Beth Noveck mattered

"The biggest story of the year was Vivek Kundra and Beth Noveck leaving the White House," commented Andy Krzmarzick, director of community engagement at Govloop. "Those personnel changes really stalled momentum, generally speaking, on the federal level. I respect their successors immensely, but I think they have an uphill climb as we head into an election year and resisters dig in their heels to wait it out and see if there is a change in administration before they spend a lot of time and energy at this stage of the game. Fortunately, the movement has enough of a ground swell that we'll carry the torch forward regardless of leadership ... but it sure helps to have strong champions."

Terell Jones, director of green IT solutions at EcomNets, agreed. "The departure of Vivek Kundra as CIO of the United States. Under his watch they developed the Cloud Computing Strategy, the 25 Point Plan, and the Federal Data Center Consolidation Initiative (FDCCI). He saved the federal government millions, but they cut his budget so he would be ineffective; so, he escaped to Harvard University," commented Jones. "He may have been frustrated with the speed at which government moves, but he made great strides in the right direction. I hope his replacement will stay the course."

2. Budget cuts to the Office of Management and Budget's E-Government Fund

"I think the biggest story is the Open Government budget cuts," commented Steve Radick, a lead associate with Booz Allen Hamilton, which consults with federal agencies. "After all, these seemed to be the writing on the wall for Vivek's departure, and forced everyone to re-think why open government was so important. It wasn't just for the sake of becoming a more open government — open government needed to be about more than that. It needed to show real mission impact. I think these budget cuts and the subsequent realization of the Gov 2.0 community that Gov 2.0 efforts needed to be deeper than just retweets, friends, and fans was the biggest story of 2011."

3. Insider trading in Congress

"I think the most important story of the year was the 60 Minutes expose on insider trading in Congress," commented Joe Flood, a D.C.-area writer and former web editor at DC.gov and NOAA. "It demonstrated the power of data to illuminate connections that were hidden, showing how members of Congress made stock trades based upon their inside information on pending legislation. It showed what could be done with open data as well as why government transparency is so vital."

4. Hackathons

"I feel like 2011 was kind of the year of the hackathon," commented Karen Suhaka, founder of Legination. "Might just be my perception, but the idea seems to be gaining significant steam."

5. iPads in government

"I think the winner should be iPads on the House Floor and in committee hearings," commented Josh Spayher, a Chicago attorney and creator of GovSM.com. "[It] totally transforms the way members of Congress can access information when they need it."

6. Social media in emergencies, National Archives and Records Administration (NARA), and open government in the European Union

"I think there was significant progress in the use of social media for emergency alerts/warnings and disaster response this year," commented Mollie Walker, editor of FierceGovernmentIT.  "It also shows agencies are letting this evolve beyond a broadcast medium and seeing the value of a feedback loop for mission-critical action. Although it hasn't really come to fruition yet (it's technically in the "operational" phase, though development and migration appear to still be in progress), I think the NARA's electronic record archive has some positive implications for open government going forward. It's something to watch for in 2012, but the fact that NARA tied up a lot of loose ends in 2011 was a big win. The open government efforts in the E.U. are also worth noting. While there have been isolated initiatives in the U.S. and U.K., seeing a governing body such as the E.U. set new standards for openness could have a broader impact on how the rest of the world manages and shares public information."

If you think there's another story that deserves to be listed, please let us know in the comments.

The year ahead

What should we expect in the year ahead? Some predictions are easier than others. The Pew Internet & American Life Project found that more than 50% of U.S. adults used the Internet for political purposes during the 2010 midterm elections. Pew's research also showed that a majority of U.S. citizens now turn to the web for news and information about politics. Expect that to grow in 2012.

This year, there was evidence of the maker movement's potential for education, jobs and innovation. That same DIY spirit will matter even more in the year ahead. We also saw the impact of apps that matter, like a mobile geolocation app that connected first responders to heart attack victims. If developers want to make an impact, we need more applications that help us help one another.

In 2011, there were more ways for citizens to provide feedback to their governments than perhaps ever before. In 2012, the open question will be whether "We the People" will use these new participatory platforms to help government work better.

The evolution of these kinds of platforms is neither U.S.-centric nor limited to tech-savvy college students. Citizen engagement matters more now in every sense: crowdfunding, crowdsourcing, crowdmapping, collective intelligence, group translation, and human sensor networks. There's a growth in "do-it-ourselves" (DIO) government, or, as the folks at techPresident like to say, "We government." As institutions shift from eGov to WeGov, leaders will be looking more to all of us to help them in the transition.


December 20 2011

There's a map for that

On November 6, 2012, millions of citizens in the United States will elect or re-elect representatives in Congress. Long before those citizens reach the polls, however, their elected representatives and their political allies in the state legislatures will have selected their voters.

Given powerful new data analysis tools, the practice of "gerrymandering," or creating partisan, incumbent-protected electoral districts through the manipulation of maps, has reached new heights in the 21st century. The drawing of these maps has been one of the least transparent processes in governance. Public participation has been limited or even blocked by the authorities in charge of redistricting.

While gerrymandering has been part of American civic life since the birth of the republic, one of the best policy innovations of 2011 may offer hope for improving the redistricting process. DistrictBuilder, an open-source tool created by the Public Mapping Project, allows anyone to easily create legal districts.

Michael P. McDonald, associate professor at George Mason University and director of the U.S. Elections Project, and Micah Altman, senior research scientist at Harvard University's Institute for Quantitative Social Science, collaborated on the creation of DistrictBuilder with Azavea.

"During the last year, thousands of members of the public have participated in online redistricting and have created hundreds of valid public plans," said Altman, via an email. "In substantial part, this is due to the project's effort and software. This year represents a huge increase in participation compared to previous rounds of redistricting — for example, the number of plans produced and shared by members of the public this year is roughly 100 times the number of plans submitted by the public in the last round of redistricting 10 years ago. Furthermore, the extensive news coverage has helped make a whole new set of people aware of the issue and has reframed it as a problem that citizens can actively participate in to solve, rather than simply complain about."

For more on the potential and the challenges present here, watch the C-SPAN video of the Brookings Institution discussion on Congressional redistricting and gerrymandering, including what's happening in states such as California and Maryland. Participants include Norm Ornstein of the American Enterprise Institute and David Wasserman of the Cook Political Report. 

The technology of district building

DistrictBuilder lets users analyze whether a given map complies with federal and advocacy-oriented standards. That means maps created with DistrictBuilder are legal and may be submitted to a given state's authority. The software pulls data from several sources, including the 2010 US Census (race, age, population and ethnicity); election data; and map data, including how the current districts are drawn. Districts can also be divided by county lines, overall competitiveness between parties, and voting age. Each district must have the same total population, though districts are not required to have the same number of eligible voters.
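The equal-population rule is the easiest of those standards to check in code. Here is a minimal sketch with invented block data; a real plan would run the same arithmetic over 2010 Census blocks:

```python
from collections import defaultdict

# Invented census blocks, each assigned to a district by the map drawer.
blocks = [
    {"block_id": "0001", "district": 1, "population": 1200},
    {"block_id": "0002", "district": 1, "population": 800},
    {"block_id": "0003", "district": 2, "population": 1100},
    {"block_id": "0004", "district": 2, "population": 950},
]

# Total population per district.
totals = defaultdict(int)
for block in blocks:
    totals[block["district"]] += block["population"]

# Compare each district against the ideal (equal) population.
ideal = sum(totals.values()) / len(totals)
for district, population in sorted(totals.items()):
    deviation = (population - ideal) / ideal
    print(f"District {district}: {population} people ({deviation:+.1%} from ideal)")
```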

On the tech side, DistrictBuilder is a combination of Django, GeoServer, Celery, jQuery, PostgreSQL, and PostGIS. For more developer-related posts about DistrictBuilder, visit the Azavea website. A webinar that explains how to use DistrictBuilder is available here.
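As a rough illustration of how the geographic pieces of such a stack fit together, here is a sketch of a PostGIS query, issued from Python with psycopg2, that dissolves assigned census blocks into district boundaries. The database, table, and column names are assumptions, not DistrictBuilder's actual schema:

```python
import psycopg2

# Connect to a hypothetical redistricting database.
conn = psycopg2.connect("dbname=redistricting")
with conn, conn.cursor() as cur:
    # ST_Union merges block geometries into one shape per district;
    # SUM tallies the population assigned to each district.
    cur.execute(
        """
        SELECT district_id,
               SUM(population)              AS total_population,
               ST_AsGeoJSON(ST_Union(geom)) AS boundary
        FROM census_blocks
        WHERE district_id IS NOT NULL
        GROUP BY district_id
        ORDER BY district_id;
        """
    )
    for district_id, total_population, boundary in cur.fetchall():
        print(district_id, total_population, f"{len(boundary)} chars of GeoJSON")
conn.close()
```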

DistrictBuilder is not the first attempt to make software that lets citizens try their hands at redistricting. ESRI launched a web-based application for Los Angeles this year.

"The online app makes redistricting accessible to a wide audience, increasing the transparency of the process and encouraging citizen engagement," said Mark Greninger, geographic information officer for the County of Los Angeles, in a prepared statement. "Citizens feel more confident because they are able to build their own plans online from wherever they are most comfortable. The tool is flexible enough to accommodate a lot of information and does not require specialized technical capabilities."

DistrictBuilder does, however, look like an upgrade to existing options available online. "There are a handful of tools" that enable citizens to participate, said Justin Massa in an email. Massa was the director of project and grant development at the Metro Chicago Information Center (MCIC) and is currently the founder and CEO of Food Genius. "An ESRI plugin and Borderline jump to mind although I know there are more, but all of them are proprietary and quite expensive. There's a few web-based versions, but none of them were usable in my testing."

Redistricting competitions

DistrictBuilder is being used in several state competitions to stimulate more public participation in the redistricting process and improve the maps themselves. "While gerrymandering is unlikely to be the driving force in the trend toward polarization in U.S. politics, it would result in a significant number of seats changing hands, and this could have a substantial effect on what laws get passed," said Altman. "We don't necessarily expect that software alone will change this, or that the legislatures will adopt public plans (even where they are clearly better) but making software and data available, holding competitions, and hosting sites where the public can easily evaluate and create plans that pass legal muster, has increased participation and awareness dramatically."

The New York Redistricting Project (NYRP) is hosting an open competition to redistrict New York congressional and state legislative districts. NYRP is collaborating with the Center for Electoral Politics and Democracy at Fordham University in an effort to see if college students can outclass Albany. The deadline for entering the New York student competition is Jan. 5, and the contest is open to all NY students.

In Philadelphia, FixPhillyDistricts.com included cash prizes when it kicked off in August of this year. By the end of September, citizensourced redistricting efforts reached the finish line, though it's unclear how much impact they had. In Virginia, a similar competition is taking aim at the "rigged redistricting process."

"This [DistrictBuilder] redistricting software is available not only to students, but to the public at large," said Costas Panagopoulos in a phone interview. At Fordham University, Panagopoulos is an assistant professor of political science, the director of the Center for Electoral Politics and Democracy, and the director of the graduate program in Elections and Campaign Management. "It's open source, user friendly and has no costs associated with it. It's a great opportunity for people to get involved and have the tools they need to design maps as alternatives for legislatures to consider."

Panagopoulos says maps created in DistrictBuilder can matter when redistricting disputes end up in the courts. "We have seen evidence from other states where competitions have been held," he said. "Official government entities have looked to maps that have been drawn by students for guidance. In Virginia, students submitted maps that enhanced minority representation. There are elements in the plan that will be officially adopted."

While it might seem unlikely that a map created by a team of students will be adopted, elements created by students in New York could make their way into discussions in Albany, posited Panagopoulos. "Our sense is that the criteria students will use to design maps will be somewhat different than what lawmakers will choose to pursue," he said. "Lawmakers may take concerns about protecting incumbents or partisan interests more to heart than citizens will. At the end of the day, if lawmakers think that a plan is ultimately worse off for both parties, they may adopt something that's more benign. That's what happened in the last round of redistricting. Legislators pushed through a different map rather than the one imposed by a judge."

For a concrete example of how the politics play out in one state, look at Texas. Ross Ramsey, the executive editor of The Texas Tribune, wrote about redistricting in the Texas legislature and courts:

The 2010 elections put overwhelming Republican majorities in both houses of the Legislature just as the time came to draw new political maps for state legislators, the Congressional delegation and members of the State Board of Education. Those Republicans drew maps to give each district an even number of people and to maximize the number of Republican districts that could be created, they thought, under the Voting Rights Act and the federal and state constitutions.

Or look at Illinois, where a Democratic redistricting plan would maximize the number of Democratic districts in that state. Or Pennsylvania, where a new map is drawing condemnation for being "rife with gerrymandering," according to Eric Boehm of the PA Independent.

While redistricting has historically not been the most accessible governance issue to the voting public, historic levels of dissatisfaction with the United States Congress could be channeled into more civic engagement. "The bottom line is that the public never had an opportunity to be as involved in redistricting as they are now," said Panagopoulos. "It's important that the public get involved."


Better redistricting software requires better data

Redistricting is "an emerging open-government issue that, for whatever reason, hasn't gotten a ton of attention yet from our part of the world," wrote Massa. "This scene is filled with proprietary datasets, intentionally confusing legislative proposals, antiquated laws that don't compel the publication of shape files, and election results data that is unbelievably messy."

As is the case with other open-government platforms, DistrictBuilder will only work with the right base data. "About a year ago, MCIC worked on a voting data project just for seven counties around Chicago," said Massa. "We found that none of the data we obtained from county election boards matched what the Census published as part of the '08 boundary files." In other words, a hoary software adage applies: "garbage in, garbage out."

That's where MCIC has played a role. "MCIC has been working with the Midwest Democracy Network to implement DistrictBuilder for six states in the Midwest," wrote Massa. According to Massa, Illinois, Indiana, Wisconsin, Michigan, and Ohio didn't have anything available at the state level; of the six states, only Minnesota publishes clean data. Earlier this year, MCIC launched DistrictBuilder for Minnesota.

"The unfortunate part is that the data to power a truly democratic process exists," said Massa. "We all know that no one is hand-drawing maps and then typing out the lengthy legislative proposals that describe, in text, the boundaries of a district. The fact that the political parties use tech and data to craft their proposals and then, in most cases, refuse to publish the data they used to make their decisions, or electronic versions of the proposals themselves, is particularly infuriating. This is a prime example of data 'empowering the empowered'."

Image Credit: Elkanah Tisdale's illustration of gerrymandering, via Wikipedia.


September 20 2011

Historic global Open Government Partnership launches in New York City

Open government is about to assume a higher profile in foreign affairs. On July 12, 2011, the State Department hosted an historic gathering in Washington to announce the Open Government Partnership (OGP) with Brazil and six other nations. Today in New York City, this unprecedented global partnership will launch. Heads of state, representatives of civil society, members of the free press and technologists will convene at the New York offices of Google to hail the "Power of Open" around the world. In the afternoon, President Obama and the leaders of seven other countries will announce their national action plans and commitments to open government. I'll be liveblogging the event here on the Radar Gov 2.0 channel and tweeting out pictures to Tumblr and other social platforms. Virtual participants will be able to watch the launch at Google's YouTube channel at 9 AM EST.

Some 43 countries have now indicated their intent to join this international open government partnership, with the vast majority joining the founding eight members, led by Brazil and the United States. The formation of the OGP built upon the bilateral U.S.-Indian partnership on open government announced during President Obama's trip to India last November, although India subsequently withdrew from the OGP in July.

In her remarks on July 12 at the State Department, Secretary of State Clinton explicitly connected open government to economic activity. "We've also seen the correlation between openness in government and success in the economic sphere," said Clinton. "Countries committed to defending transparency and fighting corruption are often more attractive to entrepreneurs. And if you can create small- and medium-size businesses, you have a broader base for economic activity. At a time when global competition for trade and investment is fierce, openness is not just good for governance, it is also good for sustainable growth in GDP."

In the week following Clinton's speech, I spoke with Maria Otero, Under Secretary of State for Democracy and Global Affairs at the United States State Department, about the Open Government Partnership and what it will mean. Our interview follows. You can also listen to an audio recording of our discussion, embedded above.

Can you explain how open government and a greater degree of transparency or accountability are related to investment, economic output or activity?

"I think what the secretary said really summarizes well one aspect of what's economic growth and even economic development in a country, which is really how the rest of the world perceive it and how the rest of the world measures risk when you invest in a country," said Otero. "Clearly, if anyone looks at the components of country risk as you invest, issues that have to do with transparency and accountability are present within the factors that comprise that equation.

Otero explored other aspects of open government that arose in discussions at the forums at the State Department in July. "One was clearly that transparency will ensure that resources are used for what they are meant to be used for in their totality, in part because you are seeing the transfer of funds and the use of funds, to make sure that some of that is not being set aside for other things or in some way taken out for corrupt reasons," she said. "That concept of being able to use a country's revenues in order to carry out a government's mandate and plan is again one way which the economic concept becomes important. Even if you're talking about health, if in fact you're providing improved health services to your population, you are improving their capacity to be productive citizens and to contribute to the economy. I mean, you can just go across the board."

"Another thing that came up that was very interesting, and it was actually brought up on Kenya, was the degree to which they themselves were not asking to collect information completely, but now that they are, how it is that they look at some of the items that they import into the country they they themselves could produced or could have. Just looking more carefully both at their balance of trade issues, recording all the information, giving emphasis to using data to make decisions, led, certainly the Kenyan participants, to give a couple of examples of how their imports had decreased in a couple of areas."

"These are different ways that open government can address directly the question that you are asking. I think we're going to come up with a lot more applications for open government that relate to reducing costs, said Otero. "As countries do this work more and more, we will see, especially when they are looking at the budget and the way the resources are allocated, that this will also, and the Secretary talked about this, conceivably have an impact on the tax revenue base of a country, because there are many citizens, either for excuse or otherwise, say 'well, why am I going to pay taxes if it's going to go into the pockets of some bureaucrat and it's really not going to bring about changes.' The minute you have more transparency and people begin to see how their taxes are being used, you then again increase the tax revenue that the country has available."

I brought up how the new city government in Chicago is thinking about data and the global movement towards open data, which Otero said is part of the OGP. For city government under Mayor Emanuel, open data is viewed as a means for government to understand its own operations, become more productive and deploy its resources more efficiently and effectively. The example of Chicago led Otero to highlight an aspect of the Open Government Partnership that she found very interesting. "It is open to developed countries that have cities like Chicago, and developing countries, like a Kenya," said Otero.

"The point is that some of these tools for transparency can be used even by countries that one might think may not have the resources to be able to do that, or even the know how," she said. "In fact, it is available across the board and that is one of the characteristics of the Open Government Partnership, both recognizing that and ensuring that the leadership in this partnership from the outset is comprised of countries from the north or from the south. Again, showing examples of how you can do this in the south that are attainable to the countries that want to do that. It's very interesting that we can talk about Chicago and, say, Kisumu, Kenya in the same breath."

What concrete outcomes for open government around the world should citizens, advocates, entrepreneurs and technologists be looking for from this partnership?

"The partnership is really the first time that there is a multilateral platform to address these issues," said Otero. "The partnership could have focused on countries come in and present best practices and exchange ideas and then just go home," said Otero. "The partnership is really focused on first having countries participate that have already demonstrated interest in this area and have already put in place a number of specific things and the material laid out, if you will, the minimum standards that are being requested. What the partnership really looks for is to provide a mechanism by which the countries can each develop their own national plans on ways to expand what they're doing on transparency, accountability, and civic engagement, or to start new initiatives for them. That is really what is very different and important about this partnership, is that it is very action- and results-oriented."

When countries join the Open Government Partnership, they commit themselves to address one of several "grand challenges." "They can be anything from improving public services to addressing public integrity issues to managing public resources," said Otero. "Using these challenges, they need to be able to create a plan. Now countries can, of course, choose what they will address. The partnership is not saying 'now all of you have to do the same thing.' It's very much based upon each country's assessment of the specific areas it is interested in addressing. The Partnership is challenging countries to identify those areas of most interest to them, and then to be able to develop a plan that will allow them to make changes and have some real results come out of this. The broad vision for this effort is to really mobilize countries to do something very concrete and in the process develop their own capacity for doing it. Of course here, one can note that there will be some resources available to help countries do this work. That's really at the core of the work."

One clear difference that we see today from past decades is the reality of an increasingly wired citizenry. "The role of technology in doing all of this is very apparent to anyone that's been alive in the last decade," she said. "How countries are using technology, everything from using social media to creating their own websites to a variety of different things is really impressive and very innovative. So, of course, the private sector, if they've got any brains in their head, are seeing this as an important business opportunity."

"Whether you're creating new apps or working with directly with different governments, keeping your eyes open in this space, you also create different mechanisms, different technologies that can be of use to government. The bottom line is that the real effort here and the real outcome that would make the Open Government Partnership successful is signing up a significant number of countries that participate, and having those countries launch their own national plans and carry them out."

What were some of the platforms and technologies that have inspired you?

In Estonia, they talk about creating a "paperless government," Otero observed. "They really are creating 'e-governance,' as they call it, throughout, which is really quite amazing," she said. "In Iceland, it's very interesting that they're using social media to be able to have citizens participate in the redrafting of their constitution. They're using Facebook, and Twitter, and other things to just be able to communicate with the population."

Otero also pointed to the dynamic technology sector in Kenya, which launched an open government data platform this summer. "Kenyans have advanced in technology more than any country in Africa," said Otero, citing the M-PESA system and the way that Kenyans can access information and record data using mobile phones. "I think the Kenyans understand the importance of being able to use this data, and some of the ideas that they put forth were more related to this area of saving resources and making some of the money available for other work." Otero also referenced open government work in Mexico, England, Honduras, Tanzania and Uganda.

India withdrew from the partnership, reportedly over concerns about a third party "audit" of its progress. Can you offer any more detail?

"It makes all the sense in the world to have independent experts who don't do an audit, which is a word that you used, but really assess, and look, and monitor the progress that's being made," said Otero. "They do this in a way to maintain that accountability, but also to make sure that you're not rating these countries or grading them or putting them in a category from 1 to 100 or whatever. That process is in place that was decided upon and all the countries believe that it adds vigor and rigor to this effort. I think, as you said, India has provided great value in this area of open government, of transparency, of accountability. They have done very important work, and they are strongly committed to the principles that are espoused by the Open Government Partnership. In fact, in the time that they worked directly, they really contributed a great deal. I think, right now, the government has indicated that they can't participate, and I think that the reason is precisely the one that you've laid out."

"I think that they will continue to follow the progress of the partnership. Many countries have bilateral relationships with India and continue to address these kinds of issue in a more bilateral way, because they have a great deal to contribute, both to this initiative and the overall work in transparency. I think, certainly, we completely respect their decision right now to watch this closely but not be part of it right now, and to continue doing their work internally. That's really the way that I understand their position."

Progress and setbacks toward open government

Over the summer and fall, analysis and information have steadily emerged about what this open government partnership will mean for open government in the United States and around the world. David Sasaki wondered if the OGP was "democracy building 2.0." Greg Michener echoed his analysis, wondering if Brazil was fit to lead the OGP. Global Integrity explained its role in the OGP. Emma Smith questioned whether the Philippines is serious about open government.

In the U.S., OMB Watch posited that the OGP could drive U.S. commitments, particularly if, as John Wonderlich of the Sunlight Foundation suggested, a U.S. national plan for open government was matched by subsequent follow-through. The White House open government "status update" capped a historic week for open government in Washington, as the administration prepares to launch e-petitions. Quiet successes, however, have been matched with setbacks to open government in Washington over the past three years. The Obama administration now faces an uncertain future for funding of the Office of Management and Budget's open government initiatives after the U.S. Senate appropriations committee shortchanged the Electronic Government Fund by some $10 million last week. With these proposed funding cuts, the U.S. Congress is, as OMB Watch put it, "about to underfund the very tools that will tell them how federal money is being spent." When President Obama announces the U.S. National Plan for Open Government (PDF, embedded below), the implementation will have to be undertaken in that context.

The future of funding for open government platforms coming from the White House, however, now must be taken in the context of a much broader narrative that includes dozens of other countries and hundreds of millions of other citizens. Aleem Walji, writing at the World Bank, put the effort in the context of a broad move from "eGov" to "WeGov." His analysis captures something important: whatever action the United States does or does not take in its own movement towards more transparent, accountable or participatory government, there is a global movement towards transparency that is now changing the relationship of the governed to their governments. Unprecedented levels of connectivity and mobile devices have created new connections between citizens and information that lie outside of traditional methods of government command and control. The future of open government may well literally be in all of our hands.

This interview was condensed and edited. A full audio recording is embedded above.

September 19 2011

Promoting Open Source Software in Government: The Challenges of Motivation and Follow-Through


The Journal of Information Technology & Politics has just published a special issue on open source software. My article "Promoting Open Source Software in Government: The Challenges of Motivation and Follow-Through" appears in this issue, and the publisher has given me permission to put a prepublication draft online.

The main subject of the article is the battle between the Open Document Format (ODF) and Microsoft's Office standard, OOXML, which might sound like a quaint echo of a bygone era but is still a critical issue in open government. During the time my article developed, I saw new trends in government procurement--such as the Apps for Democracy challenge and the data.gov site--and incorporated some of the potential they represent into the piece.

Working with the publisher Taylor & Francis was enriching. The prepublication draft I gave them ranged far and wide among topics, and although these topics pleased the peer reviewers, my style did not. They demanded a much more rigorous accounting of theses and their justification. In response to their critique, I shortened the article a lot and oriented it around the four main criteria for successful adoption of open source by government agencies:

  1. An external trigger, such as a deadline for upgrading existing software

  2. An emphasis on strategic goals, rather than a naive focus on cost

  3. A principled commitment to open source among managers and IT staff responsible for making the transition, accompanied by the technical sophistication and creativity to implement an open source strategy

  4. High-level support at the policy-making level, such as the legislature or city council

Whenever I tell colleagues about the special issue on open source, they ask whether it's available under a Creative Commons license, or at least online for free download. This was also the first issue I raised with the editor as soon as my article was accepted, and he raised it with the publisher, but they decided to stick to their usual licensing policies. Allowing authors to put up a prepublication draft is adroit marketing, but also represents a pretty open policy as academic journals go.

On the one hand, I see the decision to leave the articles under a conventional license as organizational inertia, and a form of inertia I can sympathize with. It's hard to make an exception to one's business model and legal process for a single issue of a journal. Moreover, working as I do for a publisher, I feel strongly that each publisher should make the licensing and distribution choices it feels are right for it.

But reflecting on the academic review process I had just undergone, I realized that the licensing choice reflected the significant difference between my attitude toward the topic and the attitude taken by academics who run journals. I have been "embedded" in free software communities for years and see my writing as an emerging distillation of what they have taught me. To people like me who promote open information, making our papers open is a logical expression of the values we're promoting in writing the papers.

But the academic approach is much more stand-offish. An anthropologist doesn't feel that he needs to invoke tribal spirits before writing about the tribe's rituals to invoke spirits, nor does a political scientist feel it necessary to organize a worker's revolution in order to write about Marxism. And having outsiders critique practices is valuable. I value the process that improved my paper.

But something special happens when an academic produces insights from the midst of a community or a movement. It's like illuminating a light-emitting diode instead of just "shining light on a subject." I recently finished the book by Henry Jenkins, Fans, Bloggers, and Gamers: Media Consumers in a Digital Age, which hammers on this issue. As with his better-known book Convergence Culture, Jenkins is convinced that research about popular culture is uniquely informed by participating in fan communities. These communities don't waste much attention on licenses and copyrights. They aren't merely fawning enthusiasts, either--they critique the culture roughly and demandingly. I wonder what other disciplines could take from Jenkins.

August 17 2011

Opening government, the Chicago way

Cities are experimenting with releasing more public data, engaging with citizens on social networks, adopting open source software, and finding ways to use new technologies to work with their citizens. They've been doing it through the depth of the Great Recession, amidst aging infrastructure, spiraling costs and flat or falling budgets. In that context, using technology and the Internet to make government work better and cities smarter is no longer a "nice to have" ... it's become a must-have.

In 2011, with the election of former White House chief of staff and congressman Rahm Emanuel, Chicago has joined the ranks of cities embracing the open government movement. Before his inauguration, Emanuel released a strategic plan that explicitly endorsed open data as a part of Chicago's future. The new administration hired its first chief technology officer, John Tolva, and a chief data officer, Brett Goldstein. In the months since, the new Chicago government is doing something notable, as far as governments go: it's following through on some of its open government promises.

Interviews with Chicago journalists and open government advocates, along with Tolva and Goldstein themselves, led me to a clear conclusion: there's something new going on in the Windy City that's worth sharing with the rest of the country and world.

"Appointing Tolva and Goldstein was one of the biggest ways in which Rahm has followed through," said Virginia Carlson, president of the Metro Chicago Information Center (MCIC), in an interview this summer. "The two of them make for a powerhouse, with Brett helping with releasing the data, in terms of the APIs and the time he's spent with the community."

The city has been releasing about two datasets a week since the new administration came into office, said Brian Boyer, news application developer for the Chicago Tribune. (That data trend is a big part of what motivated Boyer to work on the Panda Project.)

From where Tolva sits, what's happening in Chicago is not limited to open data or involving the tech community in improving the city. The culture of the mayor's office "changed radically with Mayor Emanuel," said Tolva (@ChicagoCTO), speaking in a phone interview this summer. "I'm seeing the passion of the startup world here."

There's a long road ahead for open government in Chicago — the legacy of corruption, fraud and graft in City Hall there is legendary, after all — but it's safe to say that a new generation of civic coders and city officials are doing what they can to upgrade both the tone and technology that underpins city life.

"There was a lot of catching up to do," allowed Tolva. "A lot of it has been the open data publication. We've been getting very high-value datasets out almost every day. We launched an app competition. We got a performance dashboard up."

All of that is only the first step, he said. "It's part of a larger vision for stoking the entrepreneurial fires, where open data is used for much more than transparency. Data is a kind of raw material that the city encourages people to use. We're working on a digital roadmap and thinking more broadly. What can we do that will help businesses make the city more livable in a systemic way? One way we're going about that is rethinking what public space means. What are the kinds of data and interoperability standards that will allow that invisible architecture to be as accessible as a park is, and as malleable in purpose?"


Tolva also offered some constructive criticism for the technologists in the open government movement to consider: "The community of civic nerds has not done a great job at engaging the big civic innovators who have no knowledge of technical skills or that area," he said. "We're trying to bring them together. One of my roles — the reason we're in the mayor's office — is to try to be that translator between the architects and the urban planners of the world and technologists."

Tolva said he is working on both economic development and applying technology to empower others to help the city work better. "I'm working with the commissioner to evangelize and convene innovators in Chicago's technology community, including Threadless, Groupon and EveryBlock. We want to promote that sensibility from the mayor's office, in terms of business developments. One third of my job is the analytics part of that, bringing data-driven decision making to the city departments, down into the individual commissions."

The movement toward opening up Chicago's government data predates the Emanuel administration, as Carlson reminded me when I asked about new releases since the inauguration.

"The conversations started in November of 2009," she said. "The city has been building its data catalog for over a year and a half. We've been waiting for someone to come in and pull the switch. Maybe one quarter of what's now available was available before the new mayor took office. Three quarters of the data was sitting on internal servers waiting for someone to say, 'yes, we can publish it!' The salaries of city workers, for instance, was absolutely something that Rahm has released, along with lots of 311 data."

311 data has been the target of much of the initial open government activity in cities around the country, given the insight it can provide into the problems that citizens are reporting and the customer service they receive from their governments. When city officials can look at what 311 data can reveal about their urban environments, for instance, new opportunities emerge for improving the way government can target its efforts in cooperation with developers and citizens. That's the kind of "citizensourcing" smarter government that Tolva is looking to tap into in Chicago.
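As a concrete illustration, here is a hedged sketch of the sort of first pass a developer might take at 311 data published through a Socrata (SODA) endpoint like Chicago's; the dataset ID and field name below are placeholders, not the city's actual schema:

    # Sketch: tally 311 service requests by type from a SODA endpoint.
    # The dataset ID and field name are hypothetical placeholders.
    from collections import Counter
    import requests

    URL = "https://data.cityofchicago.org/resource/EXAMPLE-ID.json"
    rows = requests.get(URL, params={"$limit": 5000}, timeout=30).json()

    # A ranked tally shows at a glance where 311 volume concentrates.
    by_type = Counter(row.get("service_request_type", "unknown") for row in rows)
    for sr_type, count in by_type.most_common(10):
        print(f"{count:6d}  {sr_type}")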

"This is as much about citizens talking to the infrastructure of the city as infrastructure talking to itself," he said. "It's where urban informatics and smarter cities cross over to Gov 2.0. There are efficiencies to be gained by having both approaches. You get the best of both worlds by getting an Internet of things to grow."

The most important thing that Tolva said that he has been able to change in the first months of the young administration is integrating technology into more of Chicago's governing culture. "If a policy point is being debated, and decisions are being made, people are saying 'let's go look at the data.' The people in office are new enough that they can't run on anecdotes. There's the beginning of a culture merging political sensibility with what the city is telling us."

That culture sounds more than a little like the new data journalism, applied to an emerging civic stack.

"I'm proud — and a bit harried by — the number of people asking for a regression analysis," said Tolva. "We have policy analysts who are dabbling with ArcGIS and trying Python."

The business case for open data

Like every other metropolis, Chicago has budget constraints. In the current economic climate, spending public dollars has to provide a return for taxpayers. Accountability and transparency are important civic goods — but making a business case for open data requires more grounded arguments for a city CFO to support these initiatives.

"The mayor is firmly committed to innovation that really matters and that can be built upon," said Tolva. When it comes to the business case for open data, Tolva identified four areas that support the investment, including an economic rationale:

  1. Trust — "Open data can build or rebuild trust in the people we serve," Tolva said. "That pays dividends over time."
  2. Accountability of the work force — "We've built a performance dashboard with KPIs [key performance indicators] that track where the city directly touches a resident."
  3. Business building — "Weather apps, transit apps, that's the easy stuff," he said. "Companies built on reading vital signs of the human body could be reading the vital signs of the city."
  4. Urban analytics — "Brett [Goldstein] established probability curves for violent crime. Now we're trying to do that elsewhere, uncovering cost savings, intervention points, and efficiencies."

Opening Chicago's data

Opening up Chicago's government data further will take time, expertise, and political support, along with a lot of hard work. Applying it is no different. For now, Tolva and Goldstein have the first three components firmly in hand. The last is what lies ahead.

"In the realm of public safety, I had a good sense of the relevant data structures," said Goldstein in an interview this summer. "The city is an enterprise that's so large, with so many different functions and so many different data structures, that making sense of the landscape and developing a plan is a challenge."

From enterprise resource planning systems to public health to transportation, there's great diversity in how city data is structured and stored.

"One of the things Chicago has done very well is collect data," Goldstein said. "Now, one of the things we need to do is develop a holistic vision for an enterprise data architecture and data warehouse. How to do you take the things that are meaningful from architecture and then make them meaningful to the public?"

Given the challenges involved here, it wasn't surprising to hear Goldstein say that "we're not where I want to be yet" — but he's approaching the process methodically. "I want to know the entire lay of the land, have everything mapped out and understand the next steps."

As he looks ahead, Goldstein is less worried about access or load concerns, given the city's use of the Socrata online platform for open data. He's more focused on sustainable design.

"I want to make sure that the path we take the city on is sustainable and has a more open architecture," he said. "I find that when we choose proprietary solutions, it's hard to get the data out. If I'm going to sit down and code, I'm going to do it in Python, use Linux, and I'm going to be happier about it.

Goldstein is well aware of persistent issues around data quality that have dogged the use — and reputation — of open government data releases. "I'm very traditional in how I deal with data," he said. "It's the same as working with analytics. You need to make sure data is clean and high quality."

The process to get to clean data is, as Goldstein described it, quite methodical: "We have multiple phases for how we roll out data internally, starting with working with the business owner. We figure out how we'll get it out of the transactional database. After that, we determine if it's clean, if it's verified, and if we can sign off on it technically."

The last step is analyzing whether the process is sustainable. "Some people send a spreadsheet, upload it and maintain it manually," said Goldstein. "That's not sustainable. We have hundreds of datasets. We're not going to do that. You need to write code that updates data on its own, and then you can focus on new datasets."
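Goldstein doesn't spell out the city's pipeline, but the principle of code that updates data on its own, rather than hand-uploaded spreadsheets, might look roughly like this sketch, with SQLite standing in for the transactional database and a hypothetical portal endpoint (Socrata's real publishing workflow has more steps and credentials):

    # Sketch: pull fresh rows from a transactional store and push them
    # to an open data portal on a schedule. Endpoint, credentials, and
    # schema are all hypothetical.
    import sqlite3
    import requests

    PORTAL = "https://data.example.gov/resource/EXAMPLE-ID.json"
    AUTH = ("publisher@example.gov", "app-password")

    def refresh():
        conn = sqlite3.connect("transactions.db")
        cur = conn.execute(
            "SELECT permit_id, issued_date, address FROM permits "
            "WHERE issued_date >= date('now', '-1 day')"
        )
        rows = [
            {"permit_id": pid, "issued_date": issued, "address": addr}
            for pid, issued, addr in cur
        ]
        conn.close()
        # Upsert the new rows instead of re-uploading a spreadsheet by hand.
        resp = requests.post(PORTAL, json=rows, auth=AUTH, timeout=60)
        resp.raise_for_status()
        print(f"Pushed {len(rows)} rows: {resp.status_code}")

    if __name__ == "__main__":
        refresh()  # run from cron or a scheduler rather than by hand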

At a high level, Chicago's chief data officer emphasizes the value of open data in providing the city with insight into its business processes. "Opening data alone isn't enough," Goldstein said. "We're giving people the data to make meaningful apps and do meaningful research — but are we putting out a tabular dataset? Is it spatially enabled? Are we offering KML files directly versus a downloadable file? If we keep the KML file updated, then [application developers] can access the data directly from the app."

In this respect, Goldstein's focus on making data clean, sustainable and directly available suggests that he's attuned to what citizens want when they build applications. An open data study from late last year found that a majority of citizens prefer to explore and interact with data online, as opposed to downloading data to examine in a spreadsheet.

To fully embrace this vision, however, Chicago is going to have to build out its data capabilities to become a smarter city. "The first step is moving over to a more open platform," said Goldstein. "You don't have to make a multi-million-dollar investment to get a fancy GUI and something meaningful. If you bring something over to Linux, between Python and R you can produce some remarkable outcomes. These are some really low-cost solutions."

They're looking to use city data to make the city more productive and the processes better, said MCIC president Virginia Carlson. "For example, what if the city wants to understand zoning and the retail food landscape? Using its own food licensing and food inspection data, they can see where food is being sold. If Walmart is coming in, can the city mine its own data to understand where food deserts are and have a much richer understanding of its landscape?"

The city won't be working on this alone either, emphasized Goldstein. "We have great academic partners and lots of people coming to the table. We don't need to be afraid of using these tools. It's high time."

Refining apps competitions

The design of the Apps for Metro Chicago competition offers some insight into how Chicago has learned from what other cities have done in their own open government and open data efforts. The competition is taking a next-generation approach, trying to provide technical assistance and connect communities with software developers.

"When I think about where we are, versus a San Francisco or Boston, it's because of examples of what worked and what didn't," said Tolva. "The judging criteria for the competition takes into account the sustainability of an idea, along with its cross-platform nature." In the video below, city officials talk about open data and building applications that are useful to the community.

Given the points that have been raised about the sustainability of apps contests, tying development to the demonstrated needs of citizens looks like an idea whose time has come. Look to the submitted ideas for version 3.0 of the NYC BigApps competition, for instance.

"We've elevated business viability in the judging rubric and are working with a great partner, MCIC," said Tolva. With regard to NYC BigApps 3, "there are all kinds of apps that we'd love to have," he allowed, but the applications in Apps for Metro Chicago have to solve business problems.

"The judging rubric has it that you have to demonstrate community participation and then release open source code," said Carlson. "The app has to be free to users for a year. We're very conscious that we don't want this to be a big competition ... and then it's over."

Tolva also focused on building community around apps contests and bringing more voices into the process. "We're using the Apps for Chicago to get a new kind of civic engagement and participation, which you can get involved in whether you write code or not," he said. "We've invited community leaders and groups to the table. The idea for a 'Yelp for social services' didn't come from a technologist, for example. We're curating ideas from non-technologists."

The hypothesis in Chicago is that this hybrid strategy will result in better outcomes for taxpayers, developers and, crucially, citizens. "The apps competition needed to have a data expert, with someone outside of the city running it," said Carlson. "Justin Massa helped write the rules. Chicago was the first place to bring in unbiased external experts. Can we understand what we need to by doing open data right? This story is just beginning. The question will be whether, in six to eight months, this model works. We need to promote data sharing and cleanliness between data departments, to have data tickets, an internal account and a liaison, who can share that information, getting that productivity feedback and communication with developers."

The better part of an apps competition is the feedback on the data, said Carlson: not just how the city can use data on the public-facing side, but how it can apply data on the enterprise architecture side. "We're trying to capitalize on the cool factor to enhance internal processes, working with staff, and trying to get data to understand the city."

Writing the rough code of history

"We have been trying to get data out of state and local government for more than 20 years," said Carlson. "For me, to see this tide coming along from loosely affiliated millennials willing to stay up all night is inspiring. That's what's creating the energy to free up the data — this distributed network that's been living and breathing opening up the data."

There's more than the energy of millennials to celebrate here, however, as she emphasized. "They're pushing the data out to citizens as a way of running the city," she said. "It's in a business enterprise kind of way — that's the way Rahm is thinking about it. Using it internally hasn't been emphasized a lot, but it's a big part of what they're trying to do."

To get anywhere close to achieving that goal, Chicago will have to close the IT gap between the public and private sector, particularly in the emerging field of data science.

From the outside, it looks like the city's technology officials are hungry to improve how Chicago uses technology. "In the private sector and research community, we do cutting-edge work," said Goldstein. "Why shouldn't the government do this? Why should the bar be any lower?"

For now, as the new administration finds its way, there's hope that Chicago will take a leading role among other cities adopting open government.

"The combination of committed political leadership, engaged civic leaders and a vibrant start-up scene has made Chicago the place to watch for people who care about technology and society," said John S. Bracken, director of media innovation at the John S. and James L. Knight Foundation, when asked for comment. "We're living in what is potentially one of the most important times in the city's history."

Photo: Chicago Skyline @ Night by Rhys Asplundh, on Flickr





July 31 2011

App outreach and sustainability: lessons learned by Portland, Oregon

Having decided to hang around Portland for a couple of days after the Open Source convention, I attended a hackathon sponsored by the City of Portland and a number of local high tech companies, and talked to Rick Nixon (program manager for technology initiatives in the Portland city government) about the two big problems faced by contests and challenges for government apps: encouraging developers to turn their cool apps into sustainable products, and getting the public to use them.

It's now widely recognized that most of the apps produced by government challenges are quickly abandoned. None of the apps that won awards at the original government challenge--Vivek Kundra's celebrated Apps for Democracy contest in Washington, DC--still exist.

Correction: Alex Howard tells me one of the Apps for Democracy winners is still in use, and points out that other cities have found strategies for sustainability.

And how could one expect a developer to put in the time to maintain an app, much less turn it into a robust, broadly useful tool for the general public? Productizing software requires a major investment. User interface design is a skill all its own, databases have to be maintained, APIs require documentation that nobody enjoys writing, and so forth. (Customer service is yet another burden--one that Nixon finds himself taking on for apps developed by private individuals for the city of Portland.) Developers quit their day jobs when they decide to pursue interesting products. The payoff for something in the public sphere just isn't there.

If a government's goal is just to let the commercial world know that a data set is available, a challenge may be just the thing, even if no direct long-term applications emerge. But as Nixon pointed out, award ceremonies create a very short blip in the public attention. Governments and private foundations may soon decide that the money sunk into challenges and awards is money wasted--especially as challenges proliferate, as I've seen them do in the field of health.

Because traditional incentives can never bulk up enough muscle to make it worthwhile for a developer to productize a government app, governments can try taking the exact opposite approach and require any winning app to be open source. That's what Portland's CivicApps does. Nixon says they also require a winning developer to offer the app online for at least a year after the contest. This gives the app time to gain some traction.

Because nearly any app that's useful to one government is useful to many, open source should make support a trivial problem. For instance, take Portland's city council agenda API, which lets programmers issue queries like "show me the votes on item 506" or "what was the disposition of item 95?" On the front end, a city developer named Oscar Godson created a nice wizard, with features such as prepopulated fields and picklists, that lets staff quickly create agendas. The data format for storing agendas is JSON and the API is so simple that I started retrieving fields in 5 minutes of Ruby coding. And at the session introducing the API, several people suggested enhancements. (I suggested a diff facility and a search facility, and someone else suggested that session times be coded in standard formats so that people could plan when to arrive.) Why couldn't hundreds of governments chip in to support such a project?
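The post doesn't include that five-minute Ruby script or the API's exact URL, but the shape of such queries, sketched here in Python against a hypothetical endpoint and response layout, is roughly this:

    # Sketch: query a council agenda API for votes and dispositions.
    # The endpoint URL and JSON field names are hypothetical.
    import requests

    BASE = "http://api.example-portland.gov/council/agenda"

    # "Show me the votes on item 506."
    item = requests.get(f"{BASE}/items/506.json", timeout=30).json()
    for vote in item.get("votes", []):
        print(vote["member"], vote["position"])

    # "What was the disposition of item 95?"
    item = requests.get(f"{BASE}/items/95.json", timeout=30).json()
    print(item.get("disposition"))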

Code for America, a public service organization for programmers supported by O'Reilly and many other institutions, combines a variety of strategies. All projects are open source, but developers are hooked up with projects for a long enough period to achieve real development milestones. But there may still be a role for the macho theatrics of a one-day hackathon or short-term challenge.

Enhancing the platform available to developers can also stimulate more apps. Nixon pointed out that, when Portland first released geographic data in the form of Shapefiles, a local developer created a site to serve them up more easily via an API, mobilizing others to create more apps. He is now part of the Code for America effort doing exactly the same thing--serving up geographic data--for other large municipalities.

Public acceptance is the other big problem. A few apps hit the big time, notably the Portland PDX bus app that tells you how soon a bus is coming so you can minimize the time you wait out in the rain. But most remain unknown and unappreciated. Nixon and I saw no way forward here, except perhaps to lead with increasing public involvement in government, in the hope that this involvement will result in increased use of the software that facilitates it.

The wealth of simple APIs made a lot of people productive today. The applications presented at the end of the Portland hackathon were:

  • A mapping program that shows how much one's friends know each other, clustering people together who know each other well

  • An information retrieval program that organizes movies to help you find one to watch

  • A natural language processing application that finds and displays activities related to a particular location

  • An event planner that lets you combine the users of many different social networks, as well as email and text messaging users (grand prize winner)


  • A JSON parser written in Lua communicating with a GTK user interface written in Scheme (just for the exercise)

  • A popularity sorter for the city council agenda, basing popularity on the number of comments posted

  • A JavaScript implementation of LinkedIn Circles

  • A geographic display of local institutions matching a search string, using the Twilio API

  • A visualization of votes among city council members

  • An aggregator for likes and comments on Facebook and (eventually) other sites

  • A resume generator using LinkedIn data

  • A tool for generating consistent location names for different parts of the world that call things by different terms

Approximately 130 man-and-woman hours went into today's achievements. A project like Code for America multiplies that by hundreds.

June 16 2011

Advances, setbacks, and continuing impediments to government transparency

I heard yesterday about the good, the bad, and the edgy in open government at Computers, Freedom & Privacy, being held this week in Washington, DC. A panel that covered open meetings laws and social networking started with a summary by Andy Wilson of Public Citizen Texas of how Utah and Texas trended in different directions.

Utah state legislators recently suffered embarrassments when email messages were demanded and released under the open records law in that state, the Government Records Access and Management Act (GRAMA). In their urgency to protect future email from public exposure, they passed (and the governor signed) a bill called HB 477 that went to the extreme of cutting off public access to most records. Widespread outrage accompanied the act, predictably organizing under the slogan "Don't Kill GRAMA!", and succeeded in restoring access to records.

In Texas, during the same period, activists succeeded in moving the state forward in terms of open records. Public utilities were required to put data online, over their objections that it represented "competitive information." Electronic filing was instituted for new classes of government information.

Wilson said that Texas legislators were no more enlightened than Utah's. In fact, he called Texas "even more parochial and conservative" than Utah. He attributed the successes in Texas to a well-organized NGO sector, and to their advantage in acting proactively instead of in reaction to some shock. They used financial arguments to bring many new records online, pointing out how much paper and postage it would save.

Readers of the Government 2.0 site know many of the US Administration's achievements in transparency and public participation, and have probably heard about the unfortunate budget cuts that will devastate sites such as Recovery.gov and Data.gov (Congress reduced this "Electronic Government" fund from a requested $34 million to $8 million). Daniel Schuman of the Sunlight Foundation mentioned that a Congressional hearing will be held on its budget today, but not in a fashion that gives one confidence in Congress's commitment to open government: he said it will be in a room that seats ten people, with no webcasting.

Nevertheless, even Congress has made great strides in opening itself up to the public. After taking back the House in 2010, the Republicans created several rules opening up their procedures to public scrutiny. The Sunlight Foundation proposed a Public Online Information Act to standardize the release of government information in open, "user-friendly" formats. The bill was introduced into the previous Congress but failed, and has been introduced again.

Joe Newman of the Project on Government Oversight discussed the barriers to opening Federal email and to the use of social networks such as Twitter by legislators and agencies. Plenty are on these networks: nearly all Senators have at least one Twitter feed, and most of the House as well. But many simply use it as an extra channel for announcing their press releases. "Self-promotion is not transparency," Newman pointed out. Real social networking success comes when citizens talk back--hopefully in productive ways, but even flaming shows that there's some chance of engagement. Coffee Party USA, represented at the panel by founder Annabel Park, may be one stimulus to reaching out from the citizen side.

It's ironic that one of the best exemplars of how to use Twitter in Congress, before his fall, was Anthony Weiner. (Before this session, I had felt relieved to get through a day in Washington without anyone mentioning Weiner.) His tweets had information, personality, and appeal. There was some controversy over whether his handle, @repweiner, was proper because it tied his name to the office he held, but the convention is widespread. Congressional use of Twitter dropped 30% after the Weiner scandal hit, but Newman assured us the setback will be temporary.

I brought up a point I have made before in blogs, that government use of commercial sites such as Facebook and Twitter ties their users and citizens into networks whose purposes and goals might not align with the purposes and goals of government. However, Newman warned against asking the government to set up its own social network, particularly if law enforcement can snoop around at what people are storing there. Schuman said there is no way to avoid the popular networks. "Politicians go where their constituents are. If people are on Facebook, they go on Facebook; if people are in the town square they go to the town square."

On the agency side, people are reluctant to take up social media until they're backed up by clear policies, and Newman said these aren't in place yet. The problem is not a lack of policy memoranda--quite the opposite. Half a dozen agencies have weighed in with numerous documents about the use of social media. Few federal employees are going to read through them all and follow the references to other documents. Those who do will come out the other end still not sure where the source of authority resides. Meanwhile, Congress has tried and failed two years in a row to legislate how Federal agencies should store their email. An Electronic Preservation Act also failed to pass last year, and this year (as part of a larger bill) seems to be held up by partisan wrangling. But successes in open government, both in Congress and in the states, demonstrate that it appeals to politicians across the aisle.

June 12 2011

How a Health 2.0 code-a-thon works

I had a blast today at my first Health 2.0 code-a-thon. These are held regularly in different cities; today's was in Washington, DC. Another one will be held on the weekend following (please pardon the plug) O'Reilly's Open Source Convention. Today I kibitzed and occasionally probed teams' decisions with questions without trying to code (or get in the way), and this participation was completely consistent with the wide range of things people were doing. A code-a-thon is a place where people with data in search of ideas meet people with ideas in search of data.

Health 2.0 leadership and staff: Matthew Holt, Lizzie Dunklee, and Shelle Hyde

At the furthest corner of the open space generously given to us by Kaiser Permanente Center for Total Health, one team of three to four people sullenly huddled around a table and pounded their laptops for hours, never saying a word at any time I was there to notice. Two meters away from them sat a clump of voluble health care developers producing nothing concrete at all, but visibly enjoying their conversations around the general theme of "what seniors want."

I talked to two coding teams about their projects. The first was taking data generated by an agency in Washington, DC about HIV-positive residents and trying to produce visualizations of important trends and variations. The other took records from the Department of Veterans Affairs' Blue Button site and mashed them up with information available about medications from the National Cancer Institute's thesaurus through their LexEVS tool. The goal was simply to let a veteran position the mouse over the name of the medication in the Blue Button output and have a description of that medication pop up.

Choice of technology is a central task in any programming project. At a code-a-thon, agile soon morphs into quick and dirty. The HIV team had data in spreadsheet format, so the leader tried at first just to stuff it into a Google Doc and use Google Charts to make the visualizations. The Blue Button project leader managed to load the plaintext format into an XML schema, and planned to use Greasemonkey to add the popup. This choice was based on privacy concerns: he wanted to confine data to the screen of the veteran, and didn't want anything that could potentially send the veteran's information to a remote system.
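The team's actual implementation was XML plus a Greasemonkey script, but the core mashup step, recognizing medication names in Blue Button plaintext and attaching a thesaurus description, can be sketched like this, with a hypothetical file layout and an in-memory stand-in for the NCI thesaurus lookup:

    # Sketch: pull medication names out of Blue Button plaintext and
    # attach a description. The line layout and lookup table here are
    # illustrative only, not the real record format or LexEVS API.
    import re

    DESCRIPTIONS = {  # stand-in for an NCI thesaurus / LexEVS query
        "LISINOPRIL": "ACE inhibitor used to treat high blood pressure.",
    }

    def annotate(blue_button_text):
        for line in blue_button_text.splitlines():
            match = re.match(r"Medication:\s*(.+)", line)
            if match:
                name = match.group(1).strip().upper()
                yield name, DESCRIPTIONS.get(name, "(no description found)")

    sample = "Medication: Lisinopril\nMedication: Metformin\n"
    for name, desc in annotate(sample):
        print(f"{name}: {desc}")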

Both projects started with three coders, but the HIV one soon attracted another group of three who worked intensively on it through lunch and during the whole afternoon. Although the HIV project ended up with two or three times the number of coders as the Blue Button project, the HIV effort remained two separate, loosely coordinating teams. This was even reflected by their positions at opposite sides of the room.

I soon noticed two other handicaps the HIV team(s) had to grapple with. The first consisted of problems with the input data, starting with the date format (it was simply strings such as 6/11/2011, not a true date in the format Google Docs supports). A second handicap was absolutely classic and has derailed many projects of a bigger scale than this one: the team wasn't sure what data to select and how to visualize it. Confusion reigned over which demographics would be of most interest to the agency that gave them the data, and how to handle complex relationships such as different risk factors for getting HIV. The project leader was familiar with the agency and probably could have enunciated a vision, but for some reason it was hard to get across to the teams.
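The fix for that first handicap is small but typical of code-a-thon data wrangling. A sketch with pandas, using a hypothetical column name:

    # Sketch: the spreadsheet held strings like "6/11/2011", which
    # chart tools won't treat as dates. Column name is hypothetical.
    import pandas as pd

    df = pd.DataFrame({"diagnosis_date": ["6/11/2011", "12/3/2010"]})
    df["diagnosis_date"] = pd.to_datetime(df["diagnosis_date"], format="%m/%d/%Y")
    print(df.dtypes)  # diagnosis_date is now datetime64[ns]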

Turnout was low for this code-a-thon. Attendance shrank to about 15 for the presentations of projects. But the tension built as six o'clock approached. At the last minute, I was asked to be one of the judges.

There's a happy ending to all this: every team overcame its essential difficulties and annoying blocks.

  • Aether, the quietly intense team whose activity was totally opaque to me, pulled off a stunningly deft feat of programming. They are trying to improve patient compliance by using SMS text messaging to help the patient stay in contact with the physician and remain conscious of his own role in his treatment. A patient registers his cell phone number (or is registered by his doctor) and can then enter relevant information, such as a daily glucose reading, which the tool displays in a graph. (A minimal sketch of this kind of SMS flow appears after this list.) Next steps include adding notifications so the system can remind patients to participate or give them advice. The ability to compare physicians is also a goal. Aether won first prize today.

  • The Blue Button team achieved its basic goal of mashing up the NCI thesaurus with medications on a veteran's display. The output is crude (a lot of XML tags come out in the display, and the inserted text currently overlays the screen instead of being a hover-over), but the proof of concept succeeded. This was a big achievement for two coders with self-described rusty skills. Next steps include hooking up with other data sets and augmenting other fields such as allergies. This team won second prize today.

  • SeeDC, the HIV team, succeeded in curating their data--which they estimated took up half their time today--and ultimately stored it in a relational database while using the project leader's favorite platform (Django) to generate visualizations. Another team member stuck to Google Charts and also produced some very nice displays. One of the team's goals is to make it easier for their agency not only to view the implications of their data but to generate reports for higher-level agencies. They also plan to work with the agency to help them collect and store cleaner data. This includes moving from paper forms to the web or a mobile interface.

  • SeNeSo (Senior Network Social) aims to improve the elderly's social life and their enjoyment of available activities. The proposed platform (no coding was done) includes a calendar, notifications of events, and event invitations. The platform could be integrated with some larger social networking site like Facebook. I could tell, by watching the team's discussions throughout the day, that those discussions helped the team dramatically focus and scale down their goals to something achievable and clearly of value.

  • A final project used Google Refine to filter, sort, and check data from FDA product labels (mashed up with some privately collected data) on drugs submitted by firms.
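As promised above, here is a minimal sketch of an Aether-style SMS loop, written with today's Flask and Twilio helper libraries rather than whatever the team actually built; the parsing and storage are deliberately naive:

    # Sketch: a patient texts a reading ("glucose 110"), the server
    # stores it and replies. Demo-only in-memory storage.
    from flask import Flask, request
    from twilio.twiml.messaging_response import MessagingResponse

    app = Flask(__name__)
    readings = {}  # phone number -> list of glucose values

    @app.route("/sms", methods=["POST"])
    def incoming_sms():
        phone = request.form["From"]
        body = request.form["Body"].strip()   # e.g. "glucose 110"
        value = int(body.split()[-1])
        readings.setdefault(phone, []).append(value)

        reply = MessagingResponse()
        reply.message(f"Logged {value}. {len(readings[phone])} readings on file.")
        return str(reply)

    if __name__ == "__main__":
        app.run(port=5000)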

For the Health 2.0 organization, the code-a-thons form a sequence leading up to an annual San Francisco event. The points I want to draw from this event are that 1) joining a code-a-thon for a day is lots of fun, 2) you can meet really fascinating and talented people at code-a-thons, 3) great ideas can really take off at these events, and 4) you don't have to have domain-specific knowledge (health care in this instance) or even be a professional developer to contribute.


June 10 2011

Challenges aired at Health Data Initiative Forum

A major bash by the Department of Health and Human Services and the Institute of Medicine--together with the NIH, EPA, and others--drew hundreds of people to Washington, DC yesterday to discuss the use of government data in health care. By placing "challenges" in the title of this article, I'm indulging in a play on words. Most readers will come expecting me to talk about problems that need to be grappled with and resolved. But the "challenges" I'm talking about are contests held by numerous institutions to produce applications that consume health care data and ultimately deliver benefits for patients, doctors, policy makers, and the public at large.

The challenges to which so many developers have responded draw on data that HHS has loaded in a programmable format onto Health.Data.Gov--and that it continues to release at a fast clip (see for instance the Health Indicators Warehouse). The zeal with which HHS has organized its data under convenient APIs puts it among the foremost agencies in the open government movement.

Todd Park, HHS CTO and hero to everyone in the open government movement, described the "datapalooza" as follows: "Today is just a day of massive celebration." And there was certainly a lot of mutual back-slapping among Obama Administration leaders on the stage. But the strength of the new applications shown off at the conference proved there was indeed reason to celebrate. The release of data and its use by developers ranging from college kids to international corporations is a non-partisan cause for exultation.

By bringing us to a new plateau, these apps also show some of the next steps we need to take in health care to extract real improvements in care and cost savings from the data on which the apps are based. So there are actually "challenges" of the problematic type implicit in the conference after all. I'll start by describing the progress shown by some of the apps I saw; a briefer but more comprehensive list can be found on the blog of my colleague, Brian Ahier. Then I'll move on to next steps.

A sample of recent health-care apps and challenges

In the interest of full disclosure, I should mention that I submitted some ratings of apps as a judge in the Health 2.0 Developer Challenge. Some of the interesting projects I discovered at the conference include:

SleepBot

This won the top prize in Go Viral to Improve Health, a special contest for college students run by the Institute of Medicine and the National Academy of Engineering. The app was developed by two students, one of whom said she never gets enough sleep because she goes to school full-time and works two jobs. How, one might ask, did she have time to write a winning app? I hope that after winning this prize she can quit one job.

The app makes it extremely easy to measure sleep and compare different nights. You just press a button that says "I'm going to sleep!" and the app records the current time. When you press the button in the morning to turn off the alarm on your mobile device, the app records you as waking up. You can display your sleep times for multiple nights and save the data for running statistics. I think this app would make an excellent complement to the Zeo sleep monitor, which I've covered in other blogs.
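
The data model behind such an app is simple enough to sketch. The fragment below is not SleepBot's actual code, just an illustration in Python of the two-button flow: record one timestamp when the user announces sleep, another when the alarm is turned off, and run statistics over the saved pairs.

    # Illustration of SleepBot's two-button flow (not the app's actual
    # code): log a timestamp at "I'm going to sleep!", another when the
    # alarm is turned off, and compute statistics over the nights.

    from datetime import datetime

    sleep_log = []        # completed (slept_at, woke_at) pairs
    pending_sleep = None  # set when the sleep button is pressed

    def going_to_sleep():
        global pending_sleep
        pending_sleep = datetime.now()

    def alarm_turned_off():
        global pending_sleep
        if pending_sleep is not None:
            sleep_log.append((pending_sleep, datetime.now()))
            pending_sleep = None

    def average_hours():
        """Average sleep duration, in hours, across recorded nights."""
        if not sleep_log:
            return 0.0
        total = sum((woke - slept).total_seconds()
                    for slept, woke in sleep_log)
        return total / len(sleep_log) / 3600.0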

Organized Wisdom

This site connects doctors and patients, somewhat like HealthTap, which I covered two months ago. The data aspect of Organized Wisdom that I find interesting is their display of common ailments--all the usual ones you'd expect, such as asthma and diabetes--associated with each geographic area you search for. This data comes straight from healthdata.gov. You can also click on each ailment to pull up an NIH page about it. This data could be used by doctors to intervene more effectively in the locations where they work, or by patients to choose where to live or to prepare for the risks where they do live.
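
Stripped of the web interface, the display logic amounts to looking up the searched area and ranking its most common ailments. A toy sketch, with invented prevalence figures standing in for the healthdata.gov numbers:

    # Toy sketch of ranking common ailments for a searched area. The
    # prevalence figures are invented; the real site draws its numbers
    # from healthdata.gov.

    AILMENTS_BY_AREA = {
        "Boston, MA": {"asthma": 9.8, "diabetes": 7.1, "hypertension": 24.0},
        "Houston, TX": {"asthma": 7.5, "diabetes": 9.9, "hypertension": 27.3},
    }

    def top_ailments(area, count=3):
        """Return the most prevalent ailments for an area, highest first."""
        rates = AILMENTS_BY_AREA.get(area, {})
        return sorted(rates.items(), key=lambda item: item[1],
                      reverse=True)[:count]

    print(top_ailments("Houston, TX"))
    # [('hypertension', 27.3), ('diabetes', 9.9), ('asthma', 7.5)]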

I should note that the conference also highlighted a journalist's site called

MyNYP

This patient portal, run by the New York-Presbyterian hospital, illustrates the evolution of current clinical institutions toward patient-centered care, a major focus of health activists. By exposing patient data on a portal, a hospital can turn its EHR (electronic health record) into a PHR (personal health record). MyNYP also resembles many hospital portals in exposing its data in the lithe CCR format and providing an interface to Microsoft HealthVault, so that patients can mingle their hospital data with other data they accumulate. A web developer from MyNYP told me they provide a limited way for patients to upload data as well. I expect this feature will expand as patients demand ways to provide their doctors with observations of daily living.

The interface to MyNYP is clean and readable, but one aspect shows the complexity of demands for usable interfaces. Each hospital visit is a separate item, providing its own button for you to upload its data to HealthVault. For the neediest patients--those likely to care most about their data--this could mean having to click buttons dozens of times to upload a few months' worth of data. A simple list of checkboxes on the side and an "All" button, such as Gmail provides, could let patients select data more easily.

Lumeris Maestro

This is heavy-duty software for helping hospitals and clinics meet the requirements of "accountable delivery," which could be the formation of an Accountable Care Organization (one of the hot trends in current movements to lower health care costs) or some other type of integrated care. For instance, it can help you check areas of variability (is one set of doctors treating a condition substantially differently from another?). The software helps organizations integrate clinical, pharmacy, and other data. It is designed to let organizations keep their current EHRs and run the accountability functions on top of them.

I give this project special attention here because its presentation went by very fast and raced through a number of buzzwords. I trust that experts in the field appreciated the importance of the goals and design, but the general public could easily have overlooked the project.

Feedback Disease Outbreak Investigation System

If you can figure out from the name what this project does, you don't need to read this blog. The project accumulates data about outbreaks of food-related illness and data that traces the origin of food delivered to various stores and restaurants. It provides elegant and animated visualizations that help an epidemiologist quickly track down the origin of food poisoning, hopefully in time to prevent further victims.
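
The heart of such a traceback is conceptually simple: intersect the supply chains of every location reporting illness. Here is a minimal sketch of that step, with invented supplier data; the real system layers time, shipment volumes, and those animated visualizations on top.

    # Minimal sketch of outbreak traceback: intersect the supplier sets
    # of all locations reporting illness to find candidate common
    # origins. The supplier data is invented for illustration.

    SUPPLIERS = {
        "Restaurant A": {"Farm 1", "Farm 2", "Distributor X"},
        "Grocery B": {"Farm 2", "Farm 3", "Distributor X"},
        "Cafe C": {"Farm 2", "Distributor Y"},
    }

    def common_origins(outbreak_sites):
        """Suppliers shared by every site reporting illness."""
        supplier_sets = [SUPPLIERS[site] for site in outbreak_sites]
        return set.intersection(*supplier_sets) if supplier_sets else set()

    print(common_origins(["Restaurant A", "Grocery B", "Cafe C"]))
    # {'Farm 2'} -- the only supplier common to all three sites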

As with Lumeris Maestro, the presentation of this project at the conference was rather pro forma. The presenters didn't even bother to update their prepared script to mention the recent spectacular and widely reported outbreak of E. coli deaths in Europe. But the demo persuasively conveyed the value of this software under crisis conditions.

Healthline

This site also excited a special sympathy in me because it's heavily based on taxonomies. When searching for in-depth data about a heart attack, it's important to know that clinicians refer to it as a myocardial infarction. Healthline correlates all these different forms of terminology (along with diagnostic codes) and offers rich information to the general public, including beautiful anatomical diagrams.
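
At its simplest, that correlation is a normalization table that maps lay terms and diagnostic codes to a canonical clinical concept. A sketch follows; the mappings are illustrative, not Healthline's actual taxonomy.

    # Sketch of taxonomy normalization: map lay terms and diagnostic
    # codes to one canonical clinical concept. The mappings are
    # illustrative, not Healthline's actual taxonomy.

    SYNONYMS = {
        "heart attack": "myocardial infarction",
        "mi": "myocardial infarction",
        "410": "myocardial infarction",  # ICD-9 code for acute MI
        "high blood pressure": "hypertension",
    }

    def normalize(term):
        """Resolve a search term or code to its canonical concept."""
        key = term.strip().lower()
        return SYNONYMS.get(key, key)

    assert normalize("Heart Attack") == "myocardial infarction"
    assert normalize("410") == "myocardial infarction"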

From the app to the ecosystem

The notion of an "ecosystem" for health care apps was invoked at least twice during yesterday's conference, by Park and by Risa Lavizzo-Mourey, President of the Robert Wood Johnson Foundation. I believe they were conceiving of this ecosystem as a business environment that makes software development lucrative enough to attract more apps. Kathleen Sebelius, HHS secretary, indicated in her opening remarks that the contestants for this year's challenges had to demonstrate sustainable business models as well as good applications. And Organized Wisdom announced an initiative called StartUpHealth, which plans to provide funding and mentoring over a period of years to promising health-related entrepreneurs. All well and good. But I will use "ecosystem" in this section to denote a rather different set of supports and usage.

The first part of the ecosystem is getting accurate and useful data for the apps to play with. Opening government databases is critically important, but improving them is the next step. Much of the practice required by the government under "meaningful use" criteria is directed at making sure the data submitted by doctors is complete, consistent, and truly reflective of what they do in the clinic. One common example of dirty data is provided by diagnostic codes, which researchers complain are usually chosen to bill for the highest possible payment rather than to indicate what's really wrong with the patient.
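
To make "dirty data" concrete, here is a toy sketch of the kind of checks this implies: flag records that are incomplete or whose billing code contradicts the documented diagnosis. The field names and the single consistency rule are invented for illustration.

    # Toy sketch of data-quality checks: flag records that are
    # incomplete or internally inconsistent. Field names and the
    # consistency rule are invented examples.

    def check_record(record):
        problems = []
        for field in ("patient_id", "diagnosis_code", "visit_date"):
            if not record.get(field):
                problems.append("missing %s" % field)
        # Crude consistency rule: the billing code should match the
        # documented diagnosis (invented mapping for illustration).
        if (record.get("diagnosis_code") == "410"
                and record.get("diagnosis") != "myocardial infarction"):
            problems.append("code/diagnosis mismatch")
        return problems

    print(check_record({"patient_id": "123", "diagnosis_code": "410",
                        "diagnosis": "chest pain", "visit_date": ""}))
    # ['missing visit_date', 'code/diagnosis mismatch']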

In short, we may have to do the best we can for years to come with incomplete or misleading data. And we need to create better standards (as I described in my blog from the Massachusetts Governor's Health Care Conference) as well as incorporate them into the electronic products clinicians use every day.

App developers should also coordinate with each other to use data formats in common. Letting a thousand flowers bloom should not entail letting a thousand XML schemas proliferate. As we've seen in this article, healthdata.gov itself offers some degree of standardization, and so do industry formats such as the CCR.

More standards will need to be introduced so that consumers can easily plug machines such as blood pressure monitors into commodity cell phones, and let the apps recognize and communicate with such external devices.

Once the app is outfitted and running, we need to incorporate its results into actual changes in patient care. This reflects a question I asked of several app developers: your tool will be eagerly adopted by patients who are responsible, tech savvy, and conscious of their health options. But those are the patients already most likely to take care of their health. What about the famously unresponsive patient? (This population was highlighted at the conference by a project called ElizaLIVE.) A useful app will be able to integrate with an EHR or PHR, along with devices consulted by the patient and any other interventions planned by the doctor or by public health officials.

Thus, the ecosystem I envision looks like this figure:

A health care app at the center of an ecosystem

So the health care field is left with plenty of challenges of the problematic type, after all, to accompany its challenges of the contest type. A lot is going on in this area. The National Cancer Institute, Walgreens, and Sanofi-aventis all announced new challenges. The University of Michigan announced a graduate program in health informatics, and the Robert Wood Johnson Foundation announced a Health Data Consortium (of which O'Reilly Media is a member) for pushing the incorporation of data into more apps and into health initiatives in general.

June 03 2011

Should the patent office open its internal guidelines to the public?

Anyone following policy issues around technological innovation has noticed the power and scope of patents expanding over time. For instance, most people are aware of the Supreme Court's decision to allow the patenting of genes. Computer experts are more concerned about the decisions allowing the patenting of software. Many forces contribute to the expanding reach of the patent system over time, and to understand them better I recommend a thoughtful, readable summary by law professor Melissa F. Wasserman.

Wasserman argues that the patent office, the appeals court that reviews its decisions, and even Congress have incentives to keep expanding patents. Her anecdotes strike home and her reasoning is lucid, although of course we lack experimental methods for testing her hypotheses. (That is, we can't prove that patent examiners or courts were biased by looking at statistics.) I think you'll find her article quite readable, with most of the fussy legal language relegated to the footnotes. (I heard about the article thanks to an email from Harvard Law School's Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics.)

As a simple example of the bias toward extending patents, consider that nobody ever appeals a patent examiner's decision to grant a patent, but aggrieved applicants often appeal decisions to deny one. And defending the decision to deny a patent costs the patent office a lot of money, which it can't make up from fees. Because the appeals court hears of dubious decisions only when a patent is denied, it has no opportunity to say, "Whoa there, stop expanding the patent system."

But it gets even worse. Wasserman offers several subtle reasons why having a denial reversed hurts the patent office, whereas it hardly ever suffers if a patent is successfully challenged years later.

One of the most interesting observations in the paper--which Wasserman makes briefly in passing, on page 14--is that the administrators of the patent office provide guidance to examiners in a number of internal memos that are never exposed to the public. Here is a cause for open government advocates: show us the memos that contain criteria for approving or denying patents!

Wasserman is not unsympathetic to the patent office. On the contrary, she raises the question above the usual cries of "poor, overworked examiners" or "corporate-friendly, biased judges" and finds systemic reasons for today's patent bloat. Her proposed remedies range from making it easier to challenge a patent right at the start to overhauling the funding of the patent office so that it gets the support it needs both for approving and for denying patents.

February 16 2011

Google Public Data Explorer goes public

The explosion of data has created important new roles for mapping tools, data journalism and data science. Today, Google opened the Google Public Data Explorer to the public, making it possible for anyone to upload and visualize datasets.

Uploading a dataset is straightforward. Once the data sets have been uploaded, users can easily link to them or embed them. For instance, embedded below is a data visualization of unemployment rates in the continental United States. Click play to watch it change over time, with the expected alarming growth over the past three years.

As Cliff Kuang writes at Fast Company's design blog, Google's infographic tools went online after the company bought Gapminder's Trendalyzer, the data visualization technology invented by Dr. Hans Rosling.

Google Public Data Explorer isn't the first big data visualization app to go online, as Mike Melanson pointed out over at ReadWriteWeb. Sites like Factual, CKAN, InfoChimps and Amazon's Public Data Sets are also making it easier for people to work with big data.

Of note to government agencies: Google is looking for partnerships with "official providers" of public data, which can request to have their datasets appear in the Public Data Explorer directory.

In a post on Google's official blog, Omar Benjelloun, technical lead of Google's public data team, wrote more about Public Data Explorer and the different ways that the search giant has been working with public data:

Together with our data provider partners, we've curated 27 datasets including more than 300 data metrics. You can now use the Public Data Explorer to visualize everything from labor productivity (OECD) to Internet speed (Ookla) to gender balance in parliaments (UNECE) to government debt levels (IMF) to population density by municipality (Statistics Catalonia), with more data being added every week.

Google also introduced a new metadata format, the Dataset Publishing Language (DSPL). DSPL is an XML-based format that Google says will support rich, interactive visualizations like those in the Public Data Explorer.
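
To give a feel for the format: a DSPL dataset bundles an XML metadata file (defining concepts, slices, and tables) with CSV data files. The abridged sketch below follows my reading of Google's DSPL tutorial; treat the element names as approximate and check the official documentation for the exact schema.

    # Abridged sketch of a DSPL dataset: XML metadata describing
    # concepts, slices, and tables, alongside CSV data files. Element
    # names follow my reading of Google's DSPL tutorial and should be
    # verified against the official documentation.

    DSPL_METADATA = """\
    <dspl xmlns="http://schemas.google.com/dspl/2010">
      <info><name><value>Unemployment by state</value></name></info>
      <concepts>
        <concept id="state"><type ref="string"/></concept>
        <concept id="rate"><type ref="float"/></concept>
      </concepts>
      <slices>
        <slice id="states_slice">
          <dimension concept="state"/>
          <metric concept="rate"/>
          <table ref="states_table"/>
        </slice>
      </slices>
      <tables>
        <table id="states_table">
          <column id="state" type="string"/>
          <column id="rate" type="float"/>
          <data><file format="csv">states.csv</file></data>
        </table>
      </tables>
    </dspl>
    """

    with open("dataset.xml", "w") as f:
        f.write(DSPL_METADATA)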

For those interested, Google has, as is its way, created a helpful embeddable document that explains how to use Public Data Explorer:

And for those interested in what democratized data visualization means to journalism, check out Megan Garber's thoughtful article at the Nieman Journalism Lab.
