
February 06 2014

Self-directed learning, and O’Reilly’s role in the ConnectED program

I wanted to provide a bit of perspective on the donation, announced on Wednesday by the White House, of a Safari Books Online subscription providing access to O’Reilly Media books, videos, and other educational content to every high school in the country.

First off, this came up very suddenly, with a request from the White House that reached me only on Monday, as the White House and Department of Education were gearing up for Wednesday’s announcement about broadband and iPads in schools. I had a followup conversation with David Edelman, a young staffer who taught himself programming by reading O’Reilly books when in middle school, and launched a web development firm while in high school. He made the case that connectivity alone, without content, wasn’t all it could be. And he thought of his own experience, and he thought of us.

So we began brainstorming whether there was any way we could donate a library of O’Reilly ebooks to every high school in the country. Fortunately, there may be a relatively easy way for us to do that, via Safari Books Online, the subscription service we launched in 2000 in partnership with the Pearson Technology Group. Safari already offers access to corporations and colleges in addition to individuals, so we should be able to work out some kind of special library as part of this offering.

Andrew Savikas, the CEO of Safari, was game. We still haven’t figured out all the details on how we’ll be implementing the program, but in essence, we’ll be providing a custom Safari subscription containing a rich library of content from O’Reilly (and potentially other publishers, if they want to join us) to all high schools in the US.

What’s interesting here is that when we think about education, we often think about investing in teachers. And yes, teachers are incredibly important. But they are only one of the resources we provide to motivated students.

I can’t tell you how often people come up to me and say, “I taught myself everything I know about programming from your books.” In fast-moving fields like software development, people learn from their peers, by looking at source code, and reading books or watching videos to learn more about how things work. They teach themselves.

And if this is true of our adult customers, it is also true of high schoolers and even middle schoolers. I still laugh to remember when it came time to sign the contract for Adam Goldstein’s first book with us, AppleScript: The Missing Manual, and he sheepishly confessed that his mother would have to sign for him, because he was only sixteen. His proposal had been flawless – over email, how were we to know how young he was? Adam went on to be an Internet entrepreneur, founder and CEO of the Hipmunk travel search engine.

Other people from O’Reilly’s extended circle of friends who may be well known to you who began their software careers in high school or younger include Eric Ries of Lean Startup fame, Dylan Field of Figma, Alex Rampell of TrialPay, and, sadly, Aaron Swartz.

As David explained the goals of the ConnectED program, he made the point that if only one or two kids in every school gets fired up to build and learn on their own, that could make a huge difference to the future of our country.

It’s easy to see how kids get exposed to programming when they live in Silicon Valley or another high-tech hub. It’s a lot harder in many other parts of the country. So we’re glad to be part of the ConnectED program, and hope that one day we’ll all be using powerful new services that got built because some kid, somewhere, got his start programming as a result of our participation in this initiative.

June 14 2013

Four short links: 14 June 2013

  1. How Geeks Opened up the UK Government (Guardian) — excellent video introduction to how the UK is transforming its civil service to digital delivery. Most powerful moment for me was scrolling through various depts’ web sites and seeing consistent visual design.
  2. Tools for Working Remotely — Braid’s set of tools (Trello, Hackpad, Slingshot, etc.) for remote software teams.
  3. Git Push to Deploy on Google App Engine — Enabling this feature will create a remote Git repository for your application’s source code. Pushing your application’s source code to this repository will simultaneously archive the latest version of the code and deploy it to the App Engine platform.
  4. Amazon’s 3D Printer Store — printers and supplies. Deeply underwhelming moment of it arriving in the mainstream.

May 01 2013

Towards a more open world

Last September, I gave a 5 minute Ignite talk at the tenth Ignite DC. The video just became available. My talk, embedded below, focused on what I’ve been writing about here at Radar for the past three years: open government, journalism, media, mobile technology and more.

The 20 slides that I used for the Ignite were a condensed version of a much longer presentation I’d created for a talk on open data and journalism in Moldova, which I’ve also embedded below.

April 30 2013

Linking open data to augmented intelligence and the economy

After years of steady growth, open data is now entering into public discourse, particularly in the public sector. If President Barack Obama decides to put the White House’s long-awaited new open data mandate before the nation this spring, it will finally enter the mainstream.

As more governments, businesses, media organizations and institutions adopt open data initiatives, interest in the evidence behind those releases and the outcomes from them is increasing as well. High hopes abound in many sectors, from development to energy to health to safety to transportation.

“Today, the digital revolution fueled by open data is starting to do for the modern world of agriculture what the industrial revolution did for agricultural productivity over the past century,” said Secretary of Agriculture Tom Vilsack, speaking at the G-8 Open Data for Agriculture Conference.

As other countries consider releasing their public sector information onto the Internet as data in machine-readable formats, they’ll need to consider and learn from years of effort at data.gov.uk in the United Kingdom, data.gov in the United States, and Kenya’s open data initiative.

One of the crucial sources of analysis for the success or failure of open data efforts will necessarily be research institutions and academics. That’s precisely why research from the Open Data Institute and Professor Nigel Shadbolt (@Nigel_Shadbolt) will matter in the months and years ahead.

In the following interview, Professor Shadbolt and I discuss what lies ahead. His responses were lightly edited for content and clarity.

How does your research on artificial intelligence (AI) relate to open data?

AI has always fascinated me. The quest for understanding what makes us smart and how we can make computers smart has always engaged me. While we’re trying to understand the principles of human intelligence and build a “brain in a box,” smarter robots or better speech processing algorithms, the world’s gone and done a different kind of AI: augmented intelligence. The web, with billions of human brains, has a new kind of collective and distributed capability that we couldn’t even see coming in AI. A number of us have coined a phrase, “Web science,” to understand the Web at a systems level, much as we do when we think about human biology. We talk about “systems biology” because there are just so many elements: technical, organizational, cultural.

The Web really captured my attention ten years ago as this really new manifestation of collective problem-solving. If you think about the link into earlier work I’d done, in what was called “knowledge engineering” or knowledge-based systems, there the problem was that all of the knowledge resided on systems on people’s desks. What the web has done is replace this with something that looks a lot like a supremely distributed database. Now, that distributed knowledge base is one version of the Semantic Web. The way I got into open data was the notion of using linked data and Semantic Web technologies to integrate data at scale across the web — and one really high value source of data is open government data.

What was the reason behind the founding and funding of the Open Data Institute (ODI)?

The open government data piece originated in work I did in 2003 and 2004. We were looking at this whole idea of putting new data-linking standards on the Web. I had a project in the United Kingdom that was working with government to show the opportunities to use these techniques to link data. As in all of these things, that work was reported to Parliament. There was real interest in it, but not really top-level heavy “political cover” interest. Tim Berners-Lee’s engagement with the previous prime minister led to Gordon Brown appointing Tim and me to look at setting up data.gov.uk, getting data released, and then the current coalition government taking that forward.

Throughout this time, Tim and I have been arguing that we could really do with a central focus, an institute whose principal motivation was working out how we could find real value in this data. The ODI does exactly that. It’s got about $60 million of public money over five years to incubate companies, build capacity, train people, and ensure that the public sector is supplying high quality data that can be consumed. The fundamental idea is that you ensure high quality supply by generating a strong demand side. A good demand side isn’t just the public sector, it’s also the private sector.

What have we learned so far about what works and what doesn’t? What are the strategies or approaches that have some evidence behind them?

I think there are some clear learnings. One that I’ve been banging on about recently has been that yes, it really does matter to turn the dial so that governments have a presumption to publish non-personal public data. If you would publish it anyway, under a Freedom of Information request or whatever your local legislative equivalent is, why aren’t you publishing it as open data? That, as a behavioral change, is a big one for many administrations where either the existing workflow or culture is, “Okay, we collect it. We sit on it. We do some analysis on it, and we might give it away piecemeal if people ask for it.” We should construct the publication process from the outset with a presumption to publish openly. That’s still something that we are two or three years away from, working hard with the public sector to work out how to do it and how to do it properly.

We’ve also learned that in many jurisdictions, the amount of [open data] expertise within administrations and within departments is slight. There just isn’t really the skillset, in many cases, for people to know what it is to publish using technology platforms. So there’s a capability-building piece, too.

One of the most important things is it’s not enough to just put lots and lots of datasets out there. It would be great if the “presumption to publish” meant they were all out there anyway — but when you haven’t got any datasets out there and you’re thinking about where to start, the tough question is to say, “How can I publish data that matters to people?”

The data that matters is revealed when we look at the download stats on these various UK, US and other [open data] sites. There’s a very, very distinctive power law curve. Some datasets are very, very heavily utilized. You suspect they have high utility to many, many people. Many of the others, if they can be found at all, aren’t being used particularly much. That’s not to say that, under that long tail, there aren’t large amounts of use. A particularly arcane open dataset may have exquisite use to a small number of people.

The real truth is that it’s easy to republish your national statistics. It’s much harder to do a serious job on publishing your spending data in detail, publishing police and crime data, publishing educational data, publishing actual overall health performance indicators. These are tough datasets to release. As people are fond of saying, it holds politicians’ feet to the fire. It’s easy to build a site that’s full of stuff — but does the stuff actually matter? And does it have any economic utility?

Page views and traffic aren’t ideal metrics for measuring success for an open data platform. What should people measure, in terms of actual outcomes in citizens’ lives? Improved services or money saved? Performance or corrupt politicians held accountable? Companies started or new markets created?

You’ve enumerated some of them. It’s certainly true that one of the challenges is to instrument the effect or the impact. Actually, that’s the last thing that governments, nation states, regions or cities that are enthused about open data actually do. It’s quite hard.

Datasets, once downloaded, may then be virally reproduced all over the place, so that you don’t notice it from a government site. Most of the open licenses that are so essential to this effort include a requirement for attribution. Those licenses should be embedded in the machine-readable datasets themselves. Not enough attention is paid to that piece of process, to actually noticing, when you’re looking at other applications, other data and publishing efforts, that attribution is there. We should be smarter about getting better sense from the attribution data.

The other sources of impact, though: How do you evidence actual internal efficiencies and internal government-wide benefits of open data? I had an interesting discussion recently, where the department of IT had said, “You know, I thought this was all stick and no carrot. I thought this was all in overhead, to get my data out there for other people’s benefits, but we’re now finding it so much easier to re-consume our own data and repurpose it in other contexts that it’s taken a huge amount of friction out of our own publication efforts.”

Quantified measures would really help, if we had standard methods to notice those kinds of impacts. Our economists, people whose impact is around understanding where value is created, really haven’t embraced open markets, particularly open data markets, in a very substantial way. I think we need a good number of capable economists piling into this, trying to understand new forms of value and what the values are that are created.

I think a lot of the traditional models don’t stand up here. Bizarrely, it’s much easier to measure impact when information scarcity exists and you have something that I don’t, and I have to pay you a certain fee for that stuff. I can measure that value. When you’ve taken that asymmetry out, when you’ve made open data available more widely, what are the new things that flourish? In some respects, you’ll take some value out of the market, but you’re going to replace it by wider, more distributed, capable services. This is a key issue.

The ODI will certainly be commissioning and is undertaking work in this area. We published a piece of work jointly with Deloitte in London, looking at evidence-linked methodology.

You mentioned the demand-side of open data. What are you learning in that area — and what’s being done?

There’s an interesting tension here. If we turn the dial in the governmental mindset to the “presumption to publish” — and in the UK, our public data principles actually embrace that as government policy — you are meant to publish unless there’s a personal-information or national-security reason why you would not. In a sense, you say, “Well, we’ll just publish everything out there. That’s what we’ll do. Some of it will have utility, and some of it won’t.”

When the Web took off, and you offered pages as a business or an individual, you didn’t foresee the link-making that would occur. You didn’t foresee that PageRank would ultimately give you a measure of your importance and relevance in the world and could even be monetized after the fact. You didn’t foresee that those pages have their own essential network effect, that the more pages there are that interconnect, the more value is created out of them, and so there is a strong argument [for publishing them].

So, you know, just publish. In truth, the demand side is an absolutely great and essential test of whether actually [publishing data] does matter.

Again, to take the Web as an analogy, large amounts of the Web are unattended to, neglected, and rot. It’s just stuff nobody cares about, actually. What we’re seeing in the open data effort in the UK is that it’s clear that some data is very privileged. It’s at the center of lots of other datasets.

In particular, [data about] location, occurrence, and when things occurred, and stable ways of identifying those things which are occurring. Then, of course, the data space that relates to companies, their identifications, the contracts they call, and the spending they engage in. That is the meat and drink of business intelligence apps all across the planet. If you started to turn off an ability for any business intelligence to access legal identifiers or business identifiers, all sorts of oversight would fall apart, apart from anything else.

The demand side [of open data] can be characterized. It’s not just economic. It will have to do with transparency, accountability and regulatory action. The economic side of open data gives you huge room for maneuver and substantial credibility when you can say, “Look, this dataset of spending data in the UK, published by local authorities, is the subject of detailed analytics from companies who look at all data about how local authorities and governments are spending their money. They sell procurement analysis insights back to business and on to third parties and other parts of the business world, saying ‘This is the shape of how the UK PLC is buying.’”

What are some of the lessons we can learn from how the World Wide Web grew and the value that it’s delivered around the world?

That’s always a worry, that, in some sense, the empowered get more powerful. What we do see, in open data in particular, is new sorts of players entering a game they couldn’t enter at all before.

My favorite example is in mass transportation. In the UK, we had to fight quite hard to get some of the data from bus, rail and other forms of transportation made openly available. Until that was done, there was a pretty small number of suppliers in this market.

In London, where all of it was made available from the Transport for London Authority, there’s just been an explosion of apps and businesses who are giving you subtly distinct experiences as users of that data. I’ve got about eight or nine apps on my phone that give me interestingly distinctive views of moving about the city of London. I couldn’t have predicted or anticipated that many of those would exist.

I’m sure the companies who held that data could’ve spent large amounts of money and still not given me anything like the experience I now have. The flood of innovation around the data has really been significant, bringing many, many more players and stakeholders into that space.

The Web taught us that serendipitous reuse, where you can’t anticipate where the bright idea comes from, is what is so empowering. The flipside of that is that it also reveals that, in some cases, the data isn’t necessarily of a quality that you might’ve thought. This effort might allow for civic improvement or indeed, business improvement in some cases, where businesses come and improve the data the state holds.

What’s happening in the UK with the so-called “MiData Initiative,” which posits that people have a right to access and use personal data disclosed to them?

I think this is every bit as potentially disruptive and important as open government data. We’re starting to see the emergence of what we might think of as a new class of important data, “personal assets.”

People have talked about “personal information management systems” for a long time now. Frequently, it’s revolved around managing your calendar or your contact list, but it’s much deeper. Imagine that you, the consumer, or you, the citizen, had a central locus of authority around data that was relevant to you: consumer data from retail, from the banks that you deal with, from the telcos you interact with, from the utilities you get your gas, water and electricity from. Imagine if that data infosphere was something that you could access easily, with a right to reuse and redistribute it as you saw fit.

The canonical example, of course, is health data. It isn’t only data that business holds; it’s also data the state holds, like your health records, educational transcript, welfare, tax, or any number of areas.

In the UK, we’ve been working towards empowering consumers, in particular through this MiData program. We’re trying to get to a place where consumers have a right to data held about their transactions by businesses, [released] back to them in a reusable and flexible way. We’ve been working on a voluntary program in this area for the last year. We have a consultation on taking a power to require large companies to give that information back. There is a commitment in the UK, for the first time, to get health records back to patients as data they control, but I think it has to go much more widely.

Personal data is a natural complement to open data. Some of the most interesting applications I’m sure we’re going to see in this area are where you take your personal data and enrich it with open data relating to businesses, the services of government, or the actual trading environment you’re in. In the UK, we’ve got six large energy companies that compete to sell energy to you.

Why shouldn’t groups and individuals be able to get together and collectively purchase in the same way that corporations can purchase and get their discounts? Why can’t individuals be in a spot market, effectively, where it’s easy to move from one supplier to another? Along with those efficiencies in the market and improvements in service delivery, it’s about empowering consumers at the end of the day.

This post is part of our ongoing series on the open data economy.

April 18 2013

Sprinting toward the future of Jamaica

Creating the conditions for startups to form is now a policy imperative for governments around the world, as Julian Jay Robinson, minister of state in Jamaica’s Ministry of Science, Technology, Energy and Mining, reminded the attendees at the “Developing the Caribbean” conference last week in Kingston, Jamaica.


Robinson said Jamaica is working on deploying wireless broadband access, securing networks and stimulating tech entrepreneurship around the island, a set of priorities that would have sounded of the moment in Washington, Paris, Hong Kong or Bangalore. He also described open access and open data as fundamental parts of democratic governance, explicitly aligning the release of public data with economic development and anti-corruption efforts. Robinson also pledged to help ensure that Jamaica’s open data efforts would be successful, offering a key ally within government to members of civil society.

The interest in adding technical ability and capacity around the Caribbean was sparked by other efforts around the world, particularly Kenya’s open government data efforts. That’s what led the organizers to invite Paul Kukubo to speak about Kenya’s experience, which Robinson noted might be more relevant to Jamaica than that of the global north.

Kukubo, the head of Kenya’s Information, Communication and Technology Board, was a key player in getting the country’s open data initiative off the ground and evangelizing it to developers in Nairobi. At the conference, Kukubo gave Jamaicans two key pieces of advice. First, open data efforts must be aligned with national priorities, from reducing corruption to improving digital services to economic development.

“You can’t do your open data initiative outside of what you’re trying to do for your country,” said Kukubo.

Second, political leadership is essential to success. In Kenya, the president was personally involved in open data, Kukubo said. Now that a new president has been officially elected, however, there are new questions about what happens next, particularly given that pickup in Kenya’s development community hasn’t been as dynamic as officials might have hoped. There’s also a significant issue on the demand-side of open data, with respect to the absence of a Freedom of Information Law in Kenya.

When I asked Kukubo about these issues, he said he expects a Freedom of Information law will be passed this year in Kenya. He also replied that the momentum on open data wasn’t just about the supply side.

“We feel that in the usage side, especially with respect to the developer ecosystem, we haven’t necessarily gotten as much traction from developers using data and interpreting it cleverly as we might have wanted to have,” he said. “We’re putting more into that area.”

With respect to leadership, Kukubo pointed out that newly elected Kenyan President Uhuru Kenyatta drove open data release and policy when he was the minister of finance. Kukubo expects him to be very supportive of open data in office.

The development of open data in Jamaica, by way of contrast, has been driven by academia, said professor Maurice McNaughton, director of the Center of Excellence at the Mona School of Business at the University of the West Indies (UWI). The Caribbean Open Institute, for instance, has been working closely with Jamaica’s Rural Agriculture Development Authority (RADA). There are high hopes that releases of more data from RADA and other Jamaican institutions will improve Jamaica’s economy and the effectiveness of its government.

Open data could add $35 million annually to the Jamaican economy, said Damian Cox, director of the Access to Information Unit in the Office of the Prime Minister, citing a United Nations estimate. Cox also explicitly aligned open data with measuring progress toward Millennium Development Goals, positing that increasing the availability of data will enable the civil society, government agencies and the UN to more accurately assess success.

The development of (open) data-driven journalism

Developing the Caribbean focused on the demand side of open data as well, particularly the role of intermediaries in collecting, cleaning, fact checking, and presenting data, matched with necessary narrative and context. That kind of work is precisely what data-driven journalism does, which is why it was one of the major themes of the conference. I was invited to give an overview of data-driven journalism that connected some trends and highlighted the best work in the field.

I’ve written quite a bit about how data-driven journalism is making sense of the world elsewhere, with a report yet to come. What I found in Jamaica is that media there have long since begun experimenting in the field, from the investigative journalism at Panos Caribbean to the relatively recent launch of diGJamaica by the Gleaner Company.

diGJamaica is modeled upon the Jamaican Handbook and includes more than a million pages from The Gleaner newspaper, going back to 1834. The site publishes directories of public entities and public data, including visualizations. It charges for access to the archives.

Legends and legacies


Olympic champion Usain Bolt, photographed in his (fast) car at the UWI/Usain Bolt Track in Mona, Jamaica.

Normally, meeting the fastest man on earth would be the most memorable part of any trip. The moment that left the deepest impression from my journey to the Caribbean, however, came not from encountering Usain Bolt on a run but from within a seminar room on a university campus.

As a member of a panel of judges, I saw dozens of young people present after working for 30 hours at a hackathon at the University of the West Indies. While even the most mature of the working apps was still a prototype, the best of them were squarely focused on issues that affect real Jamaicans: scoring the credit risk of farmers who needed bank loans, and collecting and sharing data about produce.

The winning team created a working mobile app that would enable government officials to collect data at farms. While none of the apps are likely to be adopted by the agricultural agency in their current form, or show up in the Google Play store this week, the experience the teams gained will help them in the future.

As I left the island, the perspective that I’d taken away from trips to Brazil, Moldova and Africa last year was further confirmed: technical talent and creativity can be found everywhere in the world, along with considerable passion to apply design thinking, data and mobile technology to improve the societies people live within. This is innovation that matters, not just clones of popular social networking apps — though the judges saw more than a couple of those ideas flow by as well.

In the years ahead, Jamaican developers will play an important role in media, commerce and government on the island. If attracting young people to engineering and teaching them to code is the long-term legacy of efforts like Developing the Caribbean, it will deserve its own thumbs up from Mr. Bolt. The track to that future looks wide open.


Disclosure: the cost of my travel to Jamaica was paid for by the organizers of the Developing the Caribbean conference.

April 04 2013

Four short links: 4 April 2013

  1. geo-bootstrap — Twitter Bootstrap fork that looks like a classic geocities page. Because. (via Narciso Jaramillo)
  2. Digital Public Library of America — public libraries sharing full text and metadata for scans, coordinating digitisation, maximum reuse. See The Verge piece. (via Dan Cohen)
  3. Snake Robots — I don’t think this is a joke. The snake robot’s versatile abilities make it a useful tool for reaching locations or viewpoints that humans or other equipment cannot. The robots are able to climb to a high vantage point, maneuver through a variety of terrains, and fit through tight spaces like fences or pipes. These abilities can be useful for scouting and reconnaissance applications in either urban or natural environments. Watch the video, the nightmares will haunt you. (via Aaron Straup Cope)
  4. The Power of Data in Aboriginal Hands (PDF) — critique of government statistical data gathering of Aboriginal populations. That ABS [Australian Bureau of Statistics] survey is designed to assist governments, commentators or academics who want to construct policies that shape our lives or encourage a one-sided public discourse about us and our position in the Australian nation. The survey does not provide information that Indigenous people can use to advance our position because the data is aggregated at the national or state level or within the broad ABS categories of very remote, remote, regional or urban Australia. These categories are constructed in the imagination of the Australian nation state. They are not geographic, social or cultural spaces that have relevance to Aboriginal people. [...] The Australian nation’s foundation document of 1901 explicitly excluded Indigenous people from being counted in the national census. That provision in the constitution, combined with Section 51, sub section 26, which empowered the Commonwealth to make special laws for ‘the people of any race, other than the Aboriginal race in any State’ was an unambiguous and defining statement about Australian nation building. The Founding Fathers mandated the federated governments of Australia to oversee the disappearance of Aboriginal people in Australia.

January 31 2013

NASA launches second International Space Apps Challenge

From April 20 to April 21, over Earth Day weekend, the second international Space Apps Challenge will invite developers on all seven continents to the bridge to contribute code to NASA projects.


Given longstanding concerns about the sustainability of apps contests, I was curious about NASA’s thinking behind launching this challenge. When I asked NASA’s open government team about the work, I immediately heard back from Nick Skytland (@Skytland), who heads up NASA’s open innovation team.

“The International Space Apps Challenge was a different approach from other federal government ‘app contests’ held before,” replied Skytland, via email.

“Instead of incentivizing technology development through open data and a prize purse, we sought to create a unique platform for international technological cooperation through a weekend-long event hosted in multiple locations across the world. We didn’t just focus on developing software apps, but actually included open hardware, citizen science, and data visualization as well.”

Aspects of that answer will please many open data advocates, like Clay Johnson or David Eaves. When Eaves recently looked at apps contests, in the context of his work on Open Data Day (coming up on February 23rd), he emphasized the importance of events that build community and applications that meet the needs of citizens or respond to business demand.

The rest of my email interview with Skytland follows.

Why is the International Space Apps Challenge worth doing again?

Nick Skytland: We see the International Space Apps Challenge event as a valuable platform for the Agency because it:

  • Creates new technologies and approaches that can solve some of the key challenges of space exploration, as well as making current efforts more cost-effective.
  • Uses open data and technology to address global needs to improve life on Earth and in space.
  • Demonstrates our commitment to the principles of the Open Government Partnership in a concrete way.

What were the results from the first challenge?

Nick Skytland: More than 100 unique open-source solutions were developed in less than 48 hours.

There were 6 winning apps, but the real “result” of the challenge was a 2,000+ person community engaged in and excited about space exploration, ready to apply that experience to challenges identified by the agency at relatively low cost and on a short timeline.

How does this challenge contribute to NASA’s mission?

Nick Skytland: There were many direct benefits. The first International Space Apps Challenge offered seven challenges specific to satellite hardware and payloads, including submissions from at least two commercial organizations. These challenges received multiple solutions in the areas of satellite tracking, suborbital payloads, command and control systems, and leveraging commercial smartphone technology for orbital remote sensing.

Additionally, a large focus of the Space Apps Challenge is on citizen innovation in the commercial space sector, lowering the cost and barriers to space so that it becomes easier to enter the market. By focusing on citizen entrepreneurship, Space Apps enables NASA to be deeply involved with the quickly emerging space startup culture. The event was extremely helpful in encouraging the collection and dissemination of space-derived data.

As you know, we have amazing open data. Space Apps is a key opportunity for us to continue to open new data sources and invite citizens to use them. Space Apps also encouraged the development of new technologies and new industries, like the space-based 3D printing industry and open-source ROVs (remotely operated submersibles for underwater exploration).

How much of the code from more than 200 “solutions” is still in use?

Nick Skytland: We didn’t track this last time around, but almost all (if not all) of the code is still available online, many of the projects continued on well after the event, and some teams continue to work on their projects today. The best example of this is the Pineapple Project, which participated in numerous other hackathons after the 2012 International Space Apps Challenge and just recently was accepted into the Geeks Without Borders accelerator program.

Of the 71 challenges that were offered last year, a low percentage were NASA challenges — about 13, if I recall correctly. There are many reasons for this, mostly that cultural adoption of open government philosophies within government is just slow. What last year did for us is lay the groundwork. Now we have much more buy-in and interest in what can be done. This year, our challenges from NASA are much more mission-focused and relevant to needs program managers have within the agency.

Additionally, many of the externally submitted challenges we have come from other agencies who are interested in using space apps as a platform to address needs they have. Most notably, we recently worked with the Peace Corps on the Innovation Challenge they offered at RHoK in December 2012, with great results.

The International Space Apps Challenge was not only a way for us to move technology development forward, drawing on the talents and initiative of bright-minded developers, engineers, and technologists, but also a platform to actually engage people who have a passion and desire to make an immediate impact on the world.

What’s new in 2013?

Nick Skytland: Our goal for this year is to improve the platform, create an even better engagement experience, and focus the collective talents of people around the world on developing technological solutions that are relevant and immediately useful.

We have a high level of internal buy-in at NASA and a lot of participation outside NASA, from both other government organizations and local leads in many new locations. Fortunately, this means we can focus our efforts on making this a meaningful event, and we are well ahead of the curve in terms of planning to do this.

To date, 44 locations have confirmed their participation and we have six spots remaining, although four of these are reserved as placeholders for cities we are pursuing. We have 50 challenge ideas already drafted for the event, 25 of which come directly from NASA. We will be releasing the entire list of challenges around March 15th on spaceappschallenge.org.

We have 55 organizations so far that are supporting the event, including seven other U.S. government organizations, and international agencies. Embassies or consulates are either directly leading or hosting the events in Monterrey, Krakow, Sofia, Jakarta, Santa Cruz, Rome, London and Auckland.

 

January 17 2013

Yelp partners with NYC and SF on restaurant inspection data

One of the key notions in my “Government as a Platform” advocacy has been that there are other ways to partner with the private sector besides hiring contractors and buying technology. One of the best of these is to provide data that can be used by the private sector to build or enrich their own citizen-facing services. Yes, the government runs a weather website but it’s more important that data from government weather satellites shows up on the Weather Channel, your local TV and radio stations, Google and Bing weather feeds, and so on. They already have more eyeballs and ears combined than the government could or should possibly acquire for its own website.

That’s why I’m so excited to see a joint effort by New York City, San Francisco, and Yelp to incorporate government health inspection data into Yelp reviews. I was involved in some early discussions and made some introductions, and have been delighted to see the project take shape.

My biggest contribution was to point to GTFS as a model. Bibiana McHugh at the city of Portland’s TriMet transit agency reached out to Google, Bing, and others with the question: “If we came up with a standard format for transit schedules, could you use it?” Google Transit was the result — a service that has spread to many other U.S. cities. When you rejoice in the convenience of getting transit timetables on your phone, remember to thank Portland officials as well as Google.
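For readers who haven’t worked with it, GTFS is essentially a zip archive of plain CSV files (stops.txt, routes.txt, trips.txt, and so on). The Python sketch below shows how a consumer might read stop locations from such a feed; the stop_id, stop_name, stop_lat and stop_lon columns follow the public GTFS reference, but treat this as a minimal illustration rather than code from any agency’s actual pipeline.

```python
import csv
import io
import zipfile

# Minimal sketch of consuming a GTFS feed (a zip archive of CSV files).
# The stops.txt file and its stop_id / stop_name / stop_lat / stop_lon
# columns come from the public GTFS reference; real feeds carry many more.
def load_stops(feed_path):
    stops = {}
    with zipfile.ZipFile(feed_path) as feed:
        with feed.open("stops.txt") as raw:
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8-sig"))
            for row in reader:
                stops[row["stop_id"]] = (
                    row["stop_name"],
                    float(row["stop_lat"]),
                    float(row["stop_lon"]),
                )
    return stops

if __name__ == "__main__":
    # Hypothetical local file name; any valid GTFS zip would do.
    for stop_id, (name, lat, lon) in list(load_stops("gtfs.zip").items())[:5]:
        print(stop_id, name, lat, lon)
```

The design choice that made GTFS spread is visible even in a sketch this small: plain CSV in a zip file is something any transit agency can export and any developer can parse with a standard library.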

In a similar way, Yelp, New York, and San Francisco came up with a data format for health inspection data. The specification is at http://yelp.com/healthscores. It will reportedly be announced at the US Conference of Mayors with San Francisco Mayor Ed Lee today.

Code for America built a site for other municipalities to pledge support. I’d also love to see support in other local restaurant review services from companies like Foursquare, Google, Microsoft, and Yahoo!  This is, as Chris Anderson of TED likes to say, “an idea worth spreading.”

December 06 2012

The United States (Code) is on Github

When Congress launched Congress.gov in beta, they didn’t open the data. This fall, a trio of open government developers took it upon themselves to do what custodians of the U.S. Code and laws in the Library of Congress could have done years ago: published data and scrapers for legislation in Congress from THOMAS.gov in the public domain. The data at github.com/unitedstates is published using an “unlicense” and updated nightly. Credit for releasing this data to the public goes to Sunlight Foundation developer Eric Mill, GovTrack.us founder Josh Tauberer and New York Times developer Derek Willis.

“It would be fantastic if the relevant bodies published this data themselves and made these datasets and scrapers unnecessary,” said Mill, in an email interview. “It would increase the information’s accuracy and timeliness, and probably its breadth. It would certainly save us a lot of work! Until that time, I hope that our approach to this data, based on the joint experience of developers who have each worked with it for years, can model to government what developers who aim to serve the public are actually looking for online.”

If the People’s House is going to become a platform for the people, it will need to release its data to the people. If Congressional leaders want THOMAS.gov to be a platform for members of Congress, legislative staff, civic developers and media, the Library of Congress will need to release structured legislative data. THOMAS is also not updated in real-time, which means that there will continue to be a lag between a bill’s introduction and the nation’s ability to read the bill before a vote.

Until that happens, however, this combination of scraping and open source data publishing offers a way forward on Congressional data to be released to the public, wrote Willis, on his personal blog:

Two years ago, there was a round of blog posts touched off by Clay Johnson that asked, “Why shouldn’t there be a GitHub for data?” My own view at the time was that availability of the data wasn’t as much an issue as smart usage and documentation of it: ‘We need to import, prune, massage, convert. It’s how we learn.’

Turns out that GitHub actually makes this easier, and I’ve had a conversion of sorts to the idea of putting data in version control systems that make it easier to view, download and report issues with data … I’m excited to see this repository grow to include not only other congressional information from THOMAS and the new Congress.gov site, but also related data from other sources. That this is already happening only shows me that for common government data this is a great way to go.

In the future, legislation data could be used to show iterations of laws and improve the ability of communities at OpenCongress, POPVOX or CrunchGov to discover and discuss proposals. As Congress incorporates more tablets on the floor during debates, such data could also be used to update legislative dashboards.

The choice to use Github as a platform for government data and scraper code is another significant milestone in a breakout year for Github’s use in government. In January, the British government committed GOV.UK code to Github. NASA, after contributing its first code in January, added 11 code repositories this year. In August, the White House committed code to Github. In September, the Open Gov Foundation open sourced the MADISON crowdsourced legislation platform.

The choice to use Github for this scraper and legislative data, however, presents a new and interesting iteration in the site’s open source story.

“Github is a great fit for this because it’s neutral ground and it’s a welcoming environment for other potential contributors,” wrote Sunlight Labs director Tom Lee, in an email. “Sunlight expects to invest substantial resources in maintaining and improving this codebase, but it’s not ours: we think the data made available by this code belongs to every American. Consequently the project needed to embrace a form that ensures that it will continue to exist, and be free of encumbrances, in a way that’s not dependent on any one organization’s fortunes.”

Mill, an open government developer at Sunlight Labs, shared more perspective in the rest of our email interview, below.

Is this based on the GovTrack.us scraper?

Eric Mill: All three of us have contributed at least one code change to our new THOMAS scraper; the majority of the code was written by me. Some of the code has been taken or adapted from Josh’s work.

The scraper that currently actively populates the information on GovTrack is an older Perl-based scraper. None of that code was used directly in this project. Josh had undertaken an incomplete, experimental rewrite of these scrapers in Python about a year ago (code), but my understanding is it never got to the point of replacing GovTrack’s original Perl scripts.

We used the code from this rewrite in our new scraper, and it was extremely helpful in two ways: providing a roadmap of how THOMAS’ URLs and sitemap work, and parsing meaning out of the text of official actions.

Parsing the meaning out of action text is, I would say, about half the value and work of the project. When you look at a page on GovTrack or OpenCongress and see the timeline of a bill’s life — “Passed House,” “Signed by the President,” etc. — that information is only obtainable by analyzing the order and nature of the sentences of the official actions that THOMAS lists. Sentences are finicky, inconsistent things, and extracting meaning from them is tricky work. Just scraping them out of THOMAS.gov’s HTML is only half the battle. Josh has experience at doing this for GovTrack. The code in which this experience was encapsulated drastically reduced how long it took to create this.
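To make the flavor of that work concrete, here is a deliberately reduced Python sketch of the kind of action-sentence classification Mill describes. It is not code from the unitedstates scraper; the patterns and labels are invented for illustration, and the real parser handles far more variation and takes the ordering of actions into account.

```python
import re

# Illustrative only: map THOMAS-style action sentences to coarse event labels.
# The real scraper distinguishes many more cases (vetoes, resolutions,
# conference reports, chamber-specific passage types, etc.).
ACTION_PATTERNS = [
    (re.compile(r"became public law", re.I), "enacted"),
    (re.compile(r"signed by (the )?president", re.I), "signed"),
    (re.compile(r"passed.*\bhouse\b", re.I), "passed_house"),
    (re.compile(r"passed.*\bsenate\b", re.I), "passed_senate"),
    (re.compile(r"referred to", re.I), "referred"),
]

def classify_action(text):
    for pattern, label in ACTION_PATTERNS:
        if pattern.search(text):
            return label
    return "other"

if __name__ == "__main__":
    print(classify_action("Passed/agreed to in House: On passage Passed by recorded vote."))
    print(classify_action("Signed by President."))
    print(classify_action("Referred to the Committee on the Judiciary."))
```

Even this toy version shows why the work is finicky: the labels only come out right if the patterns anticipate the many ways THOMAS phrases the same event.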

How long did this take to build?

Eric Mill: Creating the whole scraper, and the accompanying dataset, was about 4 weeks of work on my part. About half of that time was spent actually scraping — reverse engineering THOMAS’ HTML — and the other half was spent creating the necessary framework, documentation, and general level of rigor for this to be a project that the community can invest in and rely on.

There will certainly be more work to come. THOMAS is shutting down in a year, to be replaced by Congress.gov. As Congress.gov grows to have the same level of data as THOMAS, we’ll gradually transition the scraper to use Congress.gov as its data source.

Was this data online before? What’s new?

Eric Mill: All of the data in this project has existed in an open way at GovTrack.us, which has provided bulk data downloads for years. The Sunlight Foundation and OpenCongress have both created applications based on this data, as have many other people and organizations.

This project was undertaken as a collaboration because Josh and I believed that the data was fundamental enough that it should exist in a public, owner-less commons, and that the code to generate it should be in the same place.

There are other benefits, too. Although the source code to GovTrack’s scrapers has been available, it depends on being embedded in GovTrack’s system, and the use of a database server. It was also written in Perl, a language less widely used today, and produced only XML. This new Python scraper has no other dependencies, runs without a database, and generates both JSON and XML. It can be easily extended to output other data formats.
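As a purely hypothetical illustration of what “easily extended to output other data formats” can look like downstream, one could flatten the scraper’s per-bill JSON files into a spreadsheet-friendly CSV. The directory layout and field names used below (data.json files with bill_id, official_title and sponsor keys) are assumptions for the sketch; consult the repository’s documentation for the structure it actually generates.

```python
import csv
import glob
import json

# Hypothetical post-processing sketch: flatten per-bill JSON files into one CSV.
# The data/*/bills/*/*/data.json layout and the bill_id / official_title /
# sponsor fields are assumptions for illustration, not a documented schema.
def bills_to_csv(data_dir, out_path):
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["bill_id", "title", "sponsor"])
        for path in sorted(glob.glob(f"{data_dir}/*/bills/*/*/data.json")):
            with open(path, encoding="utf-8") as f:
                bill = json.load(f)
            sponsor = (bill.get("sponsor") or {}).get("name", "")
            writer.writerow([
                bill.get("bill_id", ""),
                bill.get("official_title", ""),
                sponsor,
            ])

if __name__ == "__main__":
    bills_to_csv("data", "bills.csv")
```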

Finally, everyone who worked on the project has had experience in dealing with legislative information. We were able to use that to make various improvements to how the data is structured and presented that make it easier for developers to use the data quickly and connect it to other data sources.

Searches for bills in Scout use data collected directly from this scraper. What else are people doing with the data?

Eric Mill: Right now, I only know for a fact that the Sunlight Foundation is using the data. GovTrack recently sent an email to its developer list announcing that in the near future, its existing dataset would be deprecated in favor of this new one, so the data should be used in GovTrack before long.

Pleasantly, I’ve found nearly nothing new by switching from GovTrack’s original dataset to this one. GovTrack’s data has always had a high level of quality. So far, the new dataset looks to be as good.

Is it common to host open data on Github?

Eric Mill: Not really. Github’s not designed for large-scale data hosting. This is an experiment to see whether this is a useful place to host it. The primary benefit is that no single person or organization (besides Github) is paying for download bandwidth.

The data is published as a convenience, for people to quickly download for analysis or curiosity. I expect that any person or project that intends to integrate the data into their work on an ongoing basis will do so by using the scraper, not downloading the data repeatedly from Github. It’s not our intent that anyone make their project dependent on the Github download links.

Laudably, Josh Tauberer donated his legislator dataset and converted it to YAML. What’s YAML?

Eric Mill: YAML is a lightweight data format intended to be easy for humans to both read and write. This dataset, unlike the one scraped from THOMAS, is maintained mostly through manual effort. Therefore, the data itself needs to be in source control, it needs to not be scary to look at and it needs to be obvious how to fix or improve it.

What’s in this legislator dataset? What can be done with it?

Eric Mill: The legislator dataset contains information about members of Congress from 1789 to the present day. It is a wealth of vital data for anyone doing any sort of application or analysis of members of Congress. This includes a breakdown of their name, a crosswalk of identifiers on other services, and social media accounts. Crucially, it also includes a member of Congress’ change in party, chamber, and name over time.

For example, it’s a pretty necessary companion to the dataset that our scraper gathers from THOMAS. THOMAS tells you the name of the person who sponsored this bill in 2003, and gives you a THOMAS-specific ID number. But it doesn’t tell you what that person’s party was at the time, or if the person is still a member of the same chamber now as they were in 2003 (or whether they’re in office at all). So if you want to say “how many Republicans sponsored bills in 2003,” or if you’d like to draw in information from outside sources, such as campaign finance information, you will need a dataset like the one that’s been publicly donated here.
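A rough sketch of the kind of join Mill describes, counting bill sponsors by party as of a given date. The record shape assumed in the comments (an id block with a thomas identifier, and a list of terms carrying party and start/end dates) is my approximation of the legislator YAML rather than a documented schema, and the example uses the third-party PyYAML package.

```python
from collections import Counter
from datetime import date

import yaml  # third-party: pip install pyyaml

def _as_date(value):
    # PyYAML may already parse ISO dates into date objects; handle both cases.
    return value if isinstance(value, date) else date.fromisoformat(str(value))

# Assumed record shape (an approximation, not a documented schema):
#   id:    {thomas: "00123", bioguide: "A000000", ...}
#   terms: [{start: "2003-01-07", end: "2005-01-03", party: "Republican", ...}, ...]
def party_on(legislators, thomas_id, when):
    for leg in legislators:
        if leg.get("id", {}).get("thomas") == thomas_id:
            for term in leg.get("terms", []):
                if _as_date(term["start"]) <= when <= _as_date(term["end"]):
                    return term.get("party", "Unknown")
    return "Unknown"

def count_sponsor_parties(legislator_yaml, sponsor_thomas_ids, when):
    with open(legislator_yaml, encoding="utf-8") as f:
        legislators = yaml.safe_load(f)
    return Counter(party_on(legislators, tid, when) for tid in sponsor_thomas_ids)

if __name__ == "__main__":
    # Hypothetical inputs: a legislators file plus sponsor IDs pulled from bill JSON.
    print(count_sponsor_parties("legislators-historical.yaml", ["00123"], date(2003, 6, 1)))
```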

Sunlight’s API on members of Congress is easily the most prominent API, widely used by people and organizations to build systems that involve legislators. That API’s data is a tiny subset of this new one.

You moved a legal citation extractor and a U.S. Code parser into this code. What do they do here?

Eric Mill: The legal citation extractor, called “Citation,” plucks references to the US Code (and other things) out of text. Just about any system that deals with legal documents benefits from discovering links between those documents. For example, I use this project to power US Code searches on Scout, so that the site returns results that cite some piece of the law, regardless of how that citation is formatted. There’s no text-based search, simple or advanced, that would bring back results matching a variety of formats or matching subsections — something dedicated to the arcane craft of citation formats is required.

The citation extractor is built to be easy for others to invest in. It’s a stand-alone tool that can be used through the command line, HTTP, or directly through JavaScript. This makes it suitable for the front-end or back-end, and easy to integrate into a project written in any language. It’s very far from complete, but even now it’s already proven extremely useful at creating powerful features for us that weren’t possible before.
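As an illustration of the general technique rather than of the Citation tool’s actual interface, a very reduced extractor for one common U.S. Code citation shape might look like the following Python; the real project normalizes many more citation formats and subsection styles.

```python
import re

# Toy illustration of citation extraction, not the unitedstates "citation" API.
# Matches shapes like "5 U.S.C. 552", "5 USC § 552(b)(3)", "42 U.S.C. 1983".
USC_PATTERN = re.compile(
    r"(?P<title>\d+)\s+U\.?\s?S\.?\s?C\.?\s*(?:§+\s*)?"
    r"(?P<section>\d+[a-z]?(?:\([a-zA-Z0-9]+\))*)"
)

def extract_usc_citations(text):
    return [
        {"title": m.group("title"), "section": m.group("section")}
        for m in USC_PATTERN.finditer(text)
    ]

if __name__ == "__main__":
    sample = "Exemptions are listed in 5 U.S.C. 552(b)(3), and see also 5 USC § 552a."
    print(extract_usc_citations(sample))
```

The point Mill makes about text search holds even here: pulling out a normalized title and section is what lets a search index match “5 USC 552” against “5 U.S.C. § 552(b)(3)”, something plain keyword matching cannot do.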

The parser for the U.S. Code itself, which will produce another dataset, was written by my colleague Thom Neale. The U.S. Code is published by the government in various formats, but none of them are suitable for easy reuse. The Office of the Law Revision Counsel, which publishes the U.S. Code, is planning on producing a dedicated XML version of the US Code, but they only began the procurement process recently. It could be quite some time before it appears.

Thom’s work parses the “locator code” form of the data, which is a binary format designed for telling GPO’s typesetting machines how to print documents. It is very specialized and very complicated. This parser is still in an early stage and not in use in production anywhere yet. When it’s ready, it’ll produce reliable JSON files containing the law of the United States in a sensible, reusable form.

Does Github’s organization structure makes a data commons possible?

Eric Mill: Github deliberately aligns its interests with the open source community, so it is possible to host all of our code and data there for free. Github offers unlimited public repositories, collaborators, bandwidth, and disk space to organizations and users at no charge. They do this while being an extremely successful, profitable business.

On Github, there are two types of accounts: users and organizations. Organizations are independent entities, but no one has to log in as an organization or share a password. Instead, at least one user will be marked as the “owner” of an organization. Ownership can easily change hands or be distributed amongst various users. This means that Josh, Derek, and I can all have equal ownership of the “unitedstates” repositories and data. Any of us can extend that ownership to anyone we want in a simple, secure way, without password sharing.

Github as a company has established both a space and a culture that values the commons. All software development work, from hobbyist to non-profit to corporation, from web to mobile to enterprise, benefits from a foundation of open source code. Github is the best living example of this truth, so it’s not surprising to me that it was the best fit for our work.

Why is this important to the public?

Eric Mill: The work and artifacts of our government should be available in bulk, for easy download, in accessible formats, and without license restrictions. This is a principle that may sound important and obvious to every technologist out there, but it’s rarely the case in practice. When it is, the bag is usually mixed. Not every member of the public will be able or want to interact directly with our data or scrapers. That’s fine. Developers are the force multipliers of public information. Every citizen can benefit somehow from what a developer can build with government information.


November 26 2012

Investigating data journalism

Great journalism has always been based on adding context, clarity and compelling storytelling to facts. While the tools have improved, the art is the same: explaining the who, what, where, when and why behind the story. The explosion of data, however, provides new opportunities to think about reporting, analysis and publishing stories.

As you may know, there’s already a Data Journalism Handbook to help journalists get started. (I contributed some commentary to it). Over the next month, I’m going to be investigating the best data journalism tools currently in use and the data-driven business models that are working for news startups. We’ll then publish a report that shares those insights and combines them with our profiles of data journalists.

Why dig deeper? Getting to the heart of what’s hype and what’s actually new and noteworthy is worth doing. I’d like to know, for instance, whether tutorials specifically designed for journalists can be useful, as Joe Brockmeier suggested at ReadWrite. On a broader scale, how many data journalists are working today? How many will be needed? What are the primary tools they rely upon now? What will they need in 2013? Who are the leaders or primary drivers in the area? What are the most notable projects? What organizations are embracing data journalism, and why?

This isn’t a new interest for me, but it’s one I’d like to ground in more research. When I was offered an opportunity to give a talk at the second International Open Government Data Conference at the World Bank this July, I chose to talk about open data journalism and invited practitioners on stage to share what they do. If you watch the talk and the ensuing discussion in the video below, you’ll pick up great insight from the work of the Sunlight Foundation, the experience of Homicide Watch and why the World Bank is focused on open data journalism in developing countries.

The sites and themes that I explored in that talk will be familiar to Radar readers, focusing on the changing dynamic between the people formerly known as the audience and the editors, researchers and reporters who are charged with making sense of the data deluge for the public good. If you’ve watched one of my Ignites or my Berkman Center talk, much of this won’t be new to you, but the short talk should be a good overview of where I think this aspect of data journalism is going and why I think it’s worth paying attention to today.

For instance, at the Open Government Data Conference Bill Allison talked about how open data creates government accountability and reveals political corruption. We heard from Chris Amico, a data journalist who created a platform to help a court reporter tell the story of every homicide in a city. And we heard from Craig Hammer how the World Bank is working to build capacity in media organizations around the world to use data to show citizens how and where borrowed development dollars are being spent on their behalf.

The last point, regarding capacity, is a critical one. Just as McKinsey identified a gap between available analytic talent and the demand created by big data, there is a data science skills gap in journalism. Rapidly expanding troves of data are useless without the skills to analyze them, whatever the context. An overemphasis on tech skills could exclude the best candidates for these jobs — but there will need to be training to build those skills.

This reality hasn’t gone unnoticed by foundations or the academy. In May, the Knight Foundation gave Columbia University $2 million for research to help close the data science skills gap. (I expect to be talking to Emily Bell, Jonathan Stray and the other instructors and students.)

Media organizations must be able to put data to work, a need that was amply demonstrated during Hurricane Sandy, when public open government data feeds became critical infrastructure.

What I’d like to hear from you is what you see working around the world, from the Guardian to ProPublica, and what you’re working on, and where. To kick things off, I’d like to know which organizations are doing the most innovative work in data journalism.

Please weigh in through the comments or drop me a line at alex@oreilly.com or at @digiphile on Twitter.

November 02 2012

Charging up: Networking resources and recovery after Hurricane Sandy

Even though the direct danger from Hurricane Sandy has passed, lower Manhattan and many parts of Connecticut and New Jersey remain disaster zones, with millions of people still without power, reduced access to food and gas, and widespread damage from flooding. As of yesterday, according to reports from The Wall Street Journal, thousands of residents remain in high-rise buildings with no water, power or heat.

E-government services are in heavy demand, from registering for disaster aid to finding resources, like those offered by the Office of the New York City Advocate. People who need to find shelter can use the Red Cross shelter app. FEMA has set up a dedicated landing page for Hurricane Sandy and a direct means to apply for disaster assistance.

Public officials have embraced social media during the disaster as never before, sharing information about where to find help.

No power and diminished wireless capacity, however, mean that the Internet is not accessible in many homes. In the post below, learn more about what you can do on the ground to help and how you can contribute online.

For those who have lost power, using Twitter offline to stay connected to those updates is useful — along with using weather radios.

That said, for those that can get connected on mobile devices, there are digital resources emerging, from a crowdsourced Sandy coworking map in NYC to an OpenTrip Planner app for navigating affected transit options. This Google Maps mashup shows where to find food, shelter and charging stations in Hoboken, New Jersey.

In these conditions, mobile devices are even more crucial connectors to friends, family, services, resources and information. With that shift, government websites must be more mobile-friendly and offer ways to get information through text messaging.

Widespread power outages also mean that sharing the means to keep devices charged is now an act of community and charity.

Ways to help with Sandy relief

A decade ago, if there was a disaster, you could donate money and blood. In 2012, you can also donate your time and skills. New York Times blogger Jeremy Zillar has compiled a list of hurricane recovery and disaster recovery resources. The conditions on the ground also mean that finding ways to physically help matters.

WNYC has a list of volunteer options around NYC. The Occupy Wall Street movement has shifted to “Occupy Sandy,” focusing on getting volunteers to help pick up and deliver food in neighborhoods around New York City. As Nick Judd reported for TechPresident, this “people-powered recovery” has volunteers processing incoming offers of help and requests for aid.

They’re working with Recovers.org, a new civic startup, which has now registered some 5,000 volunteers from around the New York City area. Recovers is pooling resources and supplies with community centers and churches to help in affected communities around the region.

If you want to help but are far away from directly volunteering in New York, Connecticut or New Jersey, there are several efforts underway to volunteer online, including hackathons around the world tomorrow. Just as open government data feeds critical infrastructure during disasters, it is also integral to recovery and relief. To make that data matter to affected populations, however, the data must be put to use. That’s where the following efforts come in.

“There are a number of ways tech people can help right now,” commented Gisli Olafsson, Emergency Response Director at NetHope, reached via email. “The digital volunteer communities are coordinating many of those efforts over a Skype chat group that we established a few days before Sandy arrived. I asked them for input and here are their suggestions:

  1. Sign up and participate in the crisis camps that are being organized this weekend at Geeks Without Borders and Sandy Crisis Camp.
  2. Help create visualizations and fill in the map gaps. Here is a link to all the maps we know about so far. Help people find out what map to look at for x,y,z.
  3. View damage photos to help rate damage assessments at Sandy OpenStreetMap. There are over 2000 images to identify and so far over 1000 helpers.”

Currently, there are Crisis Camps scheduled for Boston, Portland, Washington (DC), Galway (Ireland), San Francisco, Seattle, Auckland (NZ) and Denver, at RubyCon.

“If you are in any of those cities, please go to the Sandy CrisisCamp blog post and sign up for the EventBrite for the CrisisCamp you want to attend in person or virtually,” writes Chad Catacchio (@chadcat), Crisis Commons communication lead.

“If you want to start a camp in your city this weekend, we are still open to the idea, but time is running short (it might be better to aim for next week),” he wrote.

UPDATE: New York-based nonprofit DataKind tweeted that they’re trying to rally the NY Tech community to pitch in, in real life, on Saturday and linked to a new Facebook group. New York’s tech volunteers have already been at work helping city residents over the last 24 hours, with the New York Tech Meetup organizing hurricane recovery efforts.

People with technical skills in the New York area who want to help can volunteer online here and check out the NY Tech responds blog.

As Hurricane Sandy approached, hackers built tools to understand the storm. Now that it’s passed, “Hurricane Hackers” are working on projects to help with the recovery. The crisis camp in Boston will be hosted at the MIT Media Lab by Hurricane Hackers this weekend.

Sandy Crisis Camps already have several projects in the works. “We have been asked by FEMA to build and maintain a damage assessment map for the entire state of Rhode Island,” writes Catacchio. He continues:

“We will also be assisting in monitoring social media and other channels and directing reports to FEMA there. We’ll be building the map using ArcGIS and will be needing a wide range of skill sets from developers to communications to mapping. Before the weekend, we could certainly use some help from ArcGIS folks in getting the map ready for reporting, so if that is of interest, please email Pascal Schuback at pascal@crisiscommons.org. Secondly, there has been an ask by NYU and the consortium of colleges in NYC to help them determine hotel capacity/vacancy as well as gas stations that are open and serving fuel. If other official requests for aid come in, we will let the community know. Right now, we DO anticipate more official requests, and again, if you are working with the official response/recovery and need tech support assistance, please let us know: email either Pascal or David Black at david@crisiscommons.org. We are looking to have a productive weekend of tackling real needs to help the helpers on the ground serving those affected by this terrible storm.”

Related:

October 31 2012

NYC’s PLAN to alert citizens to danger during Hurricane Sandy

Starting at around 8:36 PM ET last night, as Hurricane Sandy began to flood the streets of lower Manhattan, many New Yorkers began to receive an unexpected message: a text alert on their mobile phones that strongly urged them to seek shelter. It showed up on iPhones and Android devices alike.

While the message was clear enough, the way that these messages ended up on the screens may not have been clear to recipients or observers. And still other New Yorkers were left wondering why emergency alerts weren’t on their phones.

Here’s the explanation: the emergency alerts that went out last night came from New York’s Personal Localized Alerting Network, the “PLAN” the Big Apple launched in late 2011.

NYC chief digital officer Rachel Haot confirmed that the messages New Yorkers received last night were the result of a public-private partnership between the Federal Communications Commission, the Federal Emergency Management Agency, the New York City Office of Emergency Management (OEM), the CTIA and wireless carriers.

While the alerts may look quite similar to text messages, they are delivered over a separate channel that runs in parallel to regular texts, enabling them to get through even when text traffic is congested. NYC’s PLAN is the local version of the Commercial Mobile Alert System (CMAS) that has been rolling out nationwide over the last year.

“This new technology could make a tremendous difference during disasters like the recent tornadoes in Alabama where minutes – or even seconds – of extra warning could make the difference between life and death,” said FCC chairman Julius Genachowski, speaking last May in New York City. “And we saw the difference alerting systems can make in Japan, where they have an earthquake early warning system that issued alerts that saved lives.”

NYC was the first city to have it up and running, last December, and less than a year later, the alerts showed up where and when they mattered.

The first such message I saw shared by a New Yorker actually came on October 28th, when the chief digital officer of the Columbia Journalism School, Sree Sreenivasan, tweeted about receiving the alert.

He tweeted out the second alert he received, on the night of the 29th, as well.

These PLAN alerts go out to everyone in a targeted geographic area who has an enabled mobile device, allowing emergency management officials at the state and local level to get an alert to the right people at the right time. And in an emergency like a hurricane, earthquake or fire, connecting affected residents to critical information at the right time and place is essential.

While the government texting him gave national security writer Marc Ambinder some qualms about privacy, the way the data is handled looks much less disconcerting than, say, needing to opt out of sharing location data or wireless wiretapping.

PLAN alerts are free and automatic, unlike opt-in messages from Notify NYC or signing up for email alerts from OEM.

Not all New Yorkers received an emergency alert during Sandy because not all mobile devices have the necessary hardware installed or the relevant software updates. As of May 2011, new iPhones and Android devices already had the chip. (Most older phones, not so much.)

These alerts don’t go out for minor issues, either: the system is only used by authorized state, local or national officials during public safety emergencies. Officials send an alert to CMAS, where it is authenticated, and the system then pushes it out to all enabled devices in the targeted geographic area.

Consumers receive only three types of messages: alerts issued by the President, Amber Alerts, and alerts involving “imminent threats to safety or life.” The last category covers the ones that went out about Hurricane Sandy in NYC last night.

According to the FCC, participating mobile carriers can allow their subscribers to block all but Presidential alerts, although it may be a little complicated to navigate a website or call center to do so. By 2014, every mobile phone sold in the United States must be CMAS-capable. (You can learn more about CMAS in this PDF). Whether such mobile phones should be subsidized for the poor is a larger question that will be left to the next administration.

As more consumers replace their devices in the years ahead, more people around the United States will also be able to receive these messages, benefiting from a public-private partnership that actually worked to deliver on improved public safety.

At least one New Yorker got the message and listened to it:

“If ‘act’ means stay put, then why yes I did,” tweeted Noreen Whysel, operations manager of the Information Architecture Institute. “It was enough to convince my husband from going out….”

Here’s hoping New York City doesn’t have to use this PLAN to tell her and others about impending disaster again soon.

October 19 2012

San Francisco looks to tap into the open data economy

As interest in open data continues to grow around the world, cities have become laboratories for participatory democracy. They’re also ground zero for new experiments in spawning civic startups that deliver city services or enable new relationships between the people and city government. San Francisco was one of the first municipalities in the United States to embrace the city as a platform paradigm in 2009, with the launch of an open data platform.

Years later, the city government is pushing to use its open data to accelerate economic development. On Monday, San Francisco announced revised open data legislation to enable that change and highlighted civic entrepreneurs who are putting the city’s data to work in new mobile apps.

City staff have already published the revised open data legislation on GitHub. (If other cities want to “fork” it, clone away.) David Chiu, the chairman of the San Francisco Board of Supervisors, the city’s legislative body, introduced the new version on Monday and submitted it on Tuesday. A vote is expected before the end of the year.

Speaking at the offices of the Hatchery in San Francisco, Chiu observed that, by and large, the data that San Francisco has put out showed the city in a positive light. In the future, he suggested, that should change. Chiu challenged the city and the smartest citizens of San Francisco to release more data, figure out where the city could take risks, be more entrepreneurial and use data to hold the city accountable. In his remarks, he said that San Francisco is working on open budgeting but is still months away from getting the data that it needs.

Rise of the CDO

This new version of the open data legislation will create a chief data officer (CDO) position, assign coordinators for open data in each city department, and make it clear in procurement language that the city owns data and retains access to it.

“Timelines, mandates and especially the part about getting them to inventory what data they collect are all really good,” said Luke Fretwell, founder of Govfresh, which covers open government in San Francisco. “It’s important that’s in place. Otherwise, there’s no way to be accountable. Previous directives didn’t do it.”

The city’s new CDO will “be responsible for sharing city data with the public, facilitating the sharing of information between City departments, and analyzing how data sets can be used to improve city decision making,” according to the revised legislation.

In creating a CDO, San Francisco is running a play from the open data playbooks of Chicago and Philadelphia. (San Francisco’s new CDO will be a member of the mayor’s staff in the budget office.) Moreover, the growth of CDOs around the country confirms the newfound importance of civic data in cities. If open government data is to be a strategic asset that can be developed for the public good, civic utility and economic value, it follows that it needs better stewards.

Assigning a coordinator in each department is also an acknowledgement that open data consumers need a point of contact and accountability. In theory, this could help create better feedback loops between the city and the cohort of civic entrepreneurs that this policy is aimed at stimulating.

Who owns the data?

San Francisco’s experience with NextBus and a conflict over NextMuni real-time data is a notable case study for other cities and states that are considering similar policies.

The revised legislation directs the Committee on Information Technology (COIT) to, within 60 days from the passage of the legislation, enact “rules for including open data requirements in applicable City contracts and standard contract provisions that promote the City’s open data policies, including, where appropriate, provisions to ensure that the City retains ownership of City data and the ability to post the data on data.sfgov.org or make it available through other means.”

That language makes it clear that it’s the city that owns city data, not a private company. That’s in line with a principle that open government data is a public good that should be available to the public, not locked up in a proprietary format or a for-pay database. There’s some nuance to the issue, in terms of thinking through what rights a private company that invests in acquiring and cleaning up government data holds, but the basic principle that the public should have access to public data is sound. The procurement practices in place will mean that any newly purchased system that captures structured data must have a public API.
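To make the “public API” requirement concrete, here is a minimal sketch (mine, not the city’s) of pulling records from San Francisco’s Socrata-powered portal at data.sfgov.org. The dataset identifier is a placeholder, and $limit/$offset are standard Socrata query parameters rather than anything specific to this legislation.

    # Minimal sketch, not an official example: fetch rows from San Francisco's
    # Socrata-powered open data portal (data.sfgov.org) over its public JSON API.
    # DATASET_ID is a placeholder -- substitute the identifier of a real dataset
    # from the portal. $limit and $offset are standard Socrata query parameters.
    import json
    import urllib.request

    PORTAL = "https://data.sfgov.org/resource"
    DATASET_ID = "xxxx-xxxx"  # hypothetical placeholder

    def fetch_rows(dataset_id, limit=100, offset=0):
        url = "%s/%s.json?$limit=%d&$offset=%d" % (PORTAL, dataset_id, limit, offset)
        with urllib.request.urlopen(url) as response:
            return json.load(response)

    if __name__ == "__main__":
        for row in fetch_rows(DATASET_ID, limit=5):
            print(row)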

Putting open data to work

Speaking at the Hatchery on Monday, Mayor Ed Lee highlighted three projects that each showcase open data put to use. The new Rec & Park app (iOS download), built by San Francisco-based startup Appallicious, enables citizens to find trails, dog parks, playgrounds and other recreational resources on a mobile device. “Outside” (iOS download), from San Francisco-based 100plus, encourages users to complete “healthy missions” in their neighborhoods. The third project, from mapping giant Esri, is a beautiful web-based visualization of San Francisco’s urban growth based upon open data from San Francisco’s planning departments.

The power of prediction

Over the past three years, transparency, accountability, cost savings and mobile apps have constituted much of the rationale for open data in cities. Now, San Francisco is renewing its pitch for the role of open data in job creation, alongside increased efficiency and improved services.

Jon Walton, San Francisco’s chief information officer (CIO), identified two next steps for San Francisco in an interview earlier this year: working with other cities to create a federated model (now online at cities.data.gov) and using its own data internally to identify and solve issues. (San Francisco and cities everywhere will benefit from looking to New York City’s work with predictive data analytics.)

“We’re thinking about using data behind the firewalls,” said Walton. “We want to give people a graduated approach, in terms of whether they want to share data for themselves, to a department, to the city, or worldwide.”

On that count, it’s notable that Mayor Lee is now publicly encouraging more data sharing between private companies that are collecting data in San Francisco. As TechCrunch reported, the San Francisco government quietly passed a new milestone when it added to its open data platform private-sector datasets on pedestrian and traffic movement collected by Motionloft.

“This gives the city a new metric on when and where congestion happens, and how many pedestrians and vehicles indicate a slowdown will occur,” said Motionloft CEO Jon Mills, in an interview.

Mills sees opportunities ahead to apply predictive data analytics to life and death situations by providing geospatial intelligence for first responders in the city.

“We go even further when police and fire data are brought in to show the relation between emergency situations and our data,” he said. “What patterns cause emergencies in different neighborhoods or blocks? We’ll know, and the city will be able to avoid many horrible situations.”

Such data-sharing could have a real impact on department bottom lines: while “Twitter311” created a lot of buzz in the social media world, access to real-time transit data is estimated to have saved San Francisco more than $1 million a year by reducing the volume of San Francisco 311 calls by 21.7%.

Open data visualization can also enable public servants to understand how city residents are interacting and living in an urban area. For instance, a map of San Francisco pedestrian injuries shows high-injury corridors that merit more attention.

Open data and crowdsourcing will not solve all IT ills

While San Francisco was an early adopter of open data, that investment hasn’t changed an underlying reality: the city government remains burdened by a legacy of dysfunctional tech infrastructure, as detailed in a report issued in August 2012 by the City and County of San Francisco.

“San Francisco’s city-wide technology governing structure is ineffective and poorly organized, hampered by a hands-off Mayor, a weak Committee on Information Technology, an unreliable Department of Technology, and a departmentalized culture that only reinforces the City’s technological ineffectiveness,” state the report’s authors.

San Francisco government has embraced technologically progressive laws and rhetoric, but hasn’t always followed through on them, from setting deadlines to reforming human resources, code sharing or procurement.

“Departments with budgets in the tens of millions of dollars — including the very agency tasked with policing government ethics — still have miles to go,” commented Adriel Hampton, a Gov 2.0 advocate and former San Francisco government staffer, in an interview earlier this year.

Hampton, who has turned his advocacy to legal standards for open data in California and to working at Nationbuilder, a campaign software startup, says that San Francisco has used technology “very poorly” over the past decade. While he credited the city’s efforts in mobile government and recent progress on open data, the larger system is plagued with problems that are endemic in government IT.

Hampton said the city’s e-government efforts largely remain in silos. “Lots of departments have e-services, but there has been no significant progress in integrating processes across departments, and some agencies are doing great while others are a mess,” commented Hampton. “Want to do business in SF? Here’s a sea of PDFs.”

The long-standing issues here go beyond policy, in his view. “San Francisco has a very fragmented IT structure, where the CIO doesn’t have real authority, and proven inability to deliver on multi-departmental IT projects,” he said. As an example, Hampton pointed to San Francisco’s Justice Information Tracking System, a $25 million, 10-year project that has made some progress, but still has not been delivered.

“The City is very good at creating feel-good requirements for its vendors that simply result in compliant companies marking up and reselling everything from hardware to IT software and services,” he commented. “This makes for not only higher costs and bureaucratic waste, but huge openings for fraud. Contracting reform was the number one issue identified in the ImproveSF employee ideation exercise in 2010, but it sure didn’t make the press release.”

Hampton sees the need for two major reforms to keep San Francisco on a path to progress: empowering the CIO position with more direct authority over departmental IT projects, and reforming how San Francisco procures technology, an issue he says affects all other parts of the IT landscape. The reason city IT is so bad, he says, is that it’s run by a 13-member council. “[The] poor CIO’s hardly got a shot.”

All that said, Hampton gives David Chiu and San Francisco city government high marks for their recent actions. “Bringing in Socrata to power the open data portal is a solid move and shows commitment to executing on the open data principle,” he said.

While catalyzing more civic entrepreneurship is important, creating enduring structural change in how San Francisco uses technology will require improving how the city government collects, stores, consumes and releases data, along with how it procures, governs and builds upon technology.

On that count, Chicago’s experience may be relevant. Efforts to open government data there have led to both progress and direction, as Chicago CTO John Tolva blogged in January:

“Open data and its analysis are the basis of our permission to interject the following questions into policy debate: How can we quantify the subject-matter underlying a given decision? How can we parse the vital signs of our city to guide our policymaking? … It isn’t just app competitions and civic altruism that prompts developers to create applications from government data. 2011 was the year when it became clear that there’s a new kind of startup ecosystem taking root on the edges of government. Open data is increasingly seen as a foundation for new businesses built using open source technologies, agile development methods, and competitive pricing. High-profile failures of enterprise technology initiatives and the acute budget and resource constraints inside government only make this more appealing.”

Open data and job creation?

While internal efficiencies and cost savings are key requirements for city CIOs, they don’t hold the political cachet of new jobs and startups, particularly in an election year. San Francisco is now explicitly connecting its release of open data to jobs.

“San Francisco’s open data policies are creating jobs, improving our city and making it easier for residents and visitors to communicate with government,” commented Mayor Lee, via email.

Lee is optimistic about the future, too: “I know that, at the heart of this data, there will be a lot more jobs created,” he said on Monday at the Hatchery.

Open data’s potential for job creation is also complemented by its role as a raw material for existing businesses. “This legislation creates more opportunities for the Esri community to create data-driven decision products,” said Bronwyn Agrios, a project manager at Esri, in an interview.

Esri, however, as an established cloud mapping giant, is in a different position than startups enabled by open data. Communications strategist Brian Purchia, the former new media director for former San Francisco Mayor Gavin Newsom, points to Appallicious.

Appallicious “would not have been possible without [San Francisco's] open data efforts,” said Purchia. “They have hired about 10 folks and are looking to expand to other cities.”

The startup’s software drives the city’s new Rec & Park app, including the potential to enable mobile transactions in the next iteration.

“Motionloft will absolutely grow from our involvement in San Francisco open data,” said Motionloft CEO Mills. “By providing some great data and tools to the city of San Francisco, it enables Motionloft to develop solutions for other cities and government agencies. We’ll be hiring developers, sales people, and data experts to keep up with our plans to grow this nationwide, and internationally.”

The next big question for these startups, as with so many others in nearby Silicon Valley, is whether their initial successes can scale. For that to happen for startups that depend upon government data, other cities will not only need to open up more data, they’ll need to standardize it.

Motionloft, at least, has already moved beyond the Bay Area, although other cities haven’t incorporated its data yet. Esri, as a major enterprise provider of proprietary software to local governments, has some skin in this game.

“City governments are typically using Esri software in some capacity,” said Agrios. “It will certainly be interesting to see how geo data standards emerge given the rapid involvement of civic startups eagerly consuming city data. Location-aware technologists on both sides of the fence, private and public, will need to work together to figure this out.”

If the marketplace for civic applications based upon open data develops further, it could help with a key issue that has dogged the results of city app contests: sustainability. It could also help with a huge problem for city governments: the cost of providing e-services to more mobile residents as budgets continue to tighten.

San Francisco CIO Walton sees an even bigger opportunity for the growth of civic apps that go far beyond the Bay Area, if cities can coordinate their efforts.

“There’s lots of potential here,” Walton said. “The challenge is replicating successes like Open311 in other verticals. If you look at the grand scale of time, we’re just getting started. For instance, I use Nextbus, an open source app that uses San Francisco’s open data … If I have Nextbus on my phone, when I get off a plane in Chicago or New York City, I want to be able to use it there, too. I think we can achieve that by working together.”
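Walton’s point about Open311 is essentially an argument for standard APIs: if every city implements the same Open311 GeoReport v2 endpoints, the same client code works anywhere just by swapping the base URL. The sketch below assumes that; the endpoint URLs are placeholders, not real city addresses.

    # Rough sketch of why a standard like Open311 makes civic apps portable: the
    # same client works against any city implementing the GeoReport v2 spec.
    # The base URLs below are placeholders -- each city publishes its own endpoint.
    import json
    import urllib.request

    CITY_ENDPOINTS = {
        "san_francisco": "https://example-sf.gov/open311/v2",   # placeholder
        "chicago": "https://example-chi.gov/open311/v2",        # placeholder
    }

    def list_services(city):
        """Return the service types (e.g., potholes, graffiti) a city accepts reports for."""
        url = CITY_ENDPOINTS[city] + "/services.json"
        with urllib.request.urlopen(url) as response:
            return [service["service_name"] for service in json.load(response)]

    if __name__ == "__main__":
        for city in CITY_ENDPOINTS:
            print(city, list_services(city))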

If a national movement toward open data and civic apps gathers more momentum, perhaps we’ll solve a perplexing problem, mused Walton.

“In a sense, we have transferred the intellectual property for apps to the public,” he said. “On one hand, that’s great, but I’m always concerned about what happens when an app stops working. By creating data standards and making apps portable, we will create enough users so that there’s enough community to support an application.”

Related:

October 17 2012

Data from health care reviews could power “Yelp for health care” startups

[Photo: A hospital in Maine]

Given where my work and health have taken me this year, I’ve been thinking much more about the relationship of the Internet and health data to accountability and patient-driven health care.

When I was looking for a place in Maine to go for care this summer, I went online to look at my options. I consulted hospital data from the government at HospitalCompare.HHS.gov and patient feedback data on Yelp, and then made a decision based upon proximity and those ratings. If I had been closer to where I live in Washington D.C., I would also have consulted friends, peers or neighbors for their recommendations of local medical establishments.

My brush with needing to find health care when I was far from home reminded me of the prism that collective intelligence can now provide for the treatment choices we make, if we have access to the Internet.

Patients today are sharing more of their health data and experiences online voluntarily, which in turn means that the Internet is shaping health care. There’s a growing phenomenon of “e-patients” and caregivers going online to find communities and information about illness and disability.

Aided by search engines and social media, newly empowered patients are discussing health conditions with others suffering from disease and sickness — and they’re taking that peer-to-peer health care knowledge into their doctors’ offices with them, frequently on mobile devices. E-patients are sharing their health data of their own volition because they have a serious health condition, want to get healthy, and are willing.

From the perspective of practicing physicians and hospitals, the trend of patients contributing to and consulting on online forums adds the potential for errors, fraud, or misunderstanding. And yet, I don’t think there’s any going back from a networked future of peer-to-peer health care, any more than we can turn back the dial on networked politics or disaster response.

What’s needed in all three of these areas is better data that informs better data-driven decisions. Some of that data will come from industry, some from government, and some from citizens.

This fall, the Obama administration proposed a system for patients to report medical mistakes. The system would create a new “consumer reporting system for patient safety” that would enable patients to tell the federal government about unsafe practices or errors. This kind of review data, if validated by government, could be baked into the next generation of consumer “choice engines,” adding another layer for people, like me, searching for care online.

There are precedents for the collection and publishing of consumer data, including the Consumer Product Safety Commission’s public complaint database at SaferProducts.gov and the Consumer Financial Protection Bureau’s complaint database. Each met with initial resistance from industry but has successfully gone online without massive abuse or misuse, at least to date.

It will be interesting to see how medical associations, hospitals and doctors react. Given that such data could amount to government collecting data relevant to thousands of “Yelps for health care,” there’s both potential and reason for caution. Health care is a bit different than product safety or consumer finance, particularly with respect to how a patient experiences or understands his or her treatment or outcomes for a given injury or illness. For those that support or oppose this approach, there is an opportunity for public comment on proposed data collection at the Federal Register.

The power of performance data

Combining patient review data with government-collected performance data could be quite powerful, helping to drive better decisions and add more transparency to health care.

In the United Kingdom, officials are keen to find the right balance between open data, transparency and prosperity.

“David Cameron, the Prime Minister, has made open data a top priority because of the evidence that this public asset can transform outcomes and effectiveness, as well as accountability,” said Tim Kelsey, in an interview this year. He used to head up the United Kingdom’s transparency and open data efforts and now works at its National Health Service.

“There is a good evidence base to support this,” said Kelsey. “Probably the most famous example is how, in cardiac surgery, surgeons on both sides of the Atlantic have reduced the number of patient deaths through comparative analysis of their outcomes.”

More data collected by patients, advocates, governments and industry could help to shed light on the performance of more physicians and clinics engaged in other expensive and lifesaving surgeries and associated outcomes.

Should that be extrapolated across the medical industry, it’s a safe bet that some medical practices or physicians will use whatever tools or legislative influence they have to fight or discredit websites, services or data that put them in a poor light. This might parallel the reception that BrightScope’s profiles of financial advisors have received from industry.

When I talked recently with Dr. Atul Gawande about health data and caregivers, he said more transparency in these areas is crucial: