
February 25 2013

Governments looking for economic ROI must focus on open data with business value

There’s increasing interest in the open data economy from the research wings of consulting firms. Capgemini Consulting just published a new report on the open data economy. McKinsey’s Global Institute is following up its research on big data with an inquiry into open data and government innovation. Deloitte has been taking a long look at open data business models. Forrester says open data isn’t (just) for governments anymore, with more research on the way. If Bain & Company doesn’t update its work on “data as an asset” this year to meet inbound interest in open data from the public sector, it may well find itself in the unusual position of lagging the market for intellectual expertise.

As Radar readers know, I’ve been trying to “make dollars and sense” of the open data economy since December, looking at investments, business models and entrepreneurs.

In January, I interviewed Harvey Lewis, the research director for the analytics department of Deloitte U.K. Lewis, who holds a doctorate in hypersonic aerodynamics, has been working for nearly 20 years on projects in the public sector, defense industry and national security. Today, he’s responsible for applying an analytical eye to consumer businesses, manufacturing, banking, insurance and the public sector. Over the past year, his team has been examining the impact of open data releases on the economy of the United Kingdom. The British government’s embrace of open data makes such research timely.

Given the many constituencies interested in open data these days, from advocates for transparency and good government to organizations interested in co-creating civic services to entrepreneurs focused on building and scaling sustainable startups, one insight stood out from our discussion in particular:

“The things you do to enable transparency … aren’t necessarily the same things you do to enable economic growth and economic impact,” said Lewis.

“For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.”

The rest of our interview, lightly edited for content and clarity, follows.

Why is Deloitte interested in open data?

Harvey Lewis: In late 2011, we realized that open data was probably going to be one of those areas that was likely to be transformational, maybe not in the short term, but certainly in the long term. A lot of the technology that companies are using to do analysis of data will become increasingly commoditized, so the advantage that people were going to get was going to come through their interpretations of data and by looking for other commercial mechanisms for getting value from data.

The great thing about open data is that it provides those opportunities. It provides, in some ways, a level playing field and ways of creating revenue and opportunities that just don’t exist in other spaces.

You’ve been investigating the demand for open data from businesses. How have you approached the research?

Harvey Lewis: We’ve been working with professor Nigel Shadbolt in the U.K., who is one of the great champions on the global stage for open data. He and I started work on our open data activity back about 12 months ago.

Our interest was not so much in open government data but more the spectrum of open data, from government, business and individual citizens. We thought we would run an exercise over the spring of 2012, inviting various organizations to come and debate open data. We were very keen to get a cross-section of people from public and private sectors in those discussions because we wanted to understand what businesses thought of open data. We published a report [PDF] in June of last year, which was largely qualitative, looking at what we thought was happening in the world of open data, from a business perspective.

There were four main hypotheses to that vision:

The first part was that we thought every business should have a strategy to explore open data. If you look at the quantity of data that’s now available globally, even just from government, it’s an extraordinary amount, if you measure it just by the number of datasets that are published. In the U.K., it’s in the tens of thousands. In the U.S., it’s in the hundreds of thousands. There’s a vast resource of data that’s freely available that can be used to supplement existing sources of information, proprietary or otherwise, and enrich companies’ views of the world.

The second part was that businesses themselves would start to open up their data. There are different ways of gaining revenue and value from data if they opened it up. This was quite a controversial subject, as I’m sure you might imagine, in some of the discussions. Nevertheless, we’re starting already to see companies releasing subsets of their data on competition websites, inviting the crowd to come up with innovative solutions. We’re also seeing evidence that companies are releasing their data to improve the way they interact with their customers. I think one of the great broad impacts of businesses opening up their data is reputational enhancement — and that can have a real economic benefit.

The third part of our hypothesis was that open data would inspire customer engagement. That is, I think, a great topic for exploration within the public sector itself. Releasing this data isn’t just about “publishing it and they will come” — it’s about releasing data and using that data to engage in a different type of conversation with citizens and consumers.

Certainly in the U.K., we’re starting to see the fruits of that and some new initiatives. There’s a concept called “midata” in the U.K., where the government is encouraging service providers to release consumer data back to individuals so they can shop around for the best deals in the market. I think that’s a great vision for open data.

The fourth part was the privacy and the ethical responsibilities that come with the processing of open data, with companies and government starting to work more closely together to come up with a new paradigm for responsibility and privacy.

Nigel Shadbolt and I committed to doing further work on the economic business case for open data to try to address some of these hypothetical views of the future.

That launched this second phase of our work, which was trying to quantify that economic benefit. We decided very early on, because of Nigel Shadbolt’s relationship to the Open Data Institute, to work closely with that organization, as it was born in the summer of 2012.

We spent a lot of time gathering data. In particular, we were looking at whether we could infer the economic benefit from the demand for open data across a variety of government portals. We looked at a number of other measures and data sources, including a very broad balance sheet analysis to try to infer how companies were increasingly using data to run their businesses and benefit their businesses.

What did you find in this inquiry?

Harvey Lewis: We published a second report, called “Open Growth,” in early December of last year. The fundamental problem in trying to estimate the economic benefit is around, essentially, a lack of data. It sounds quite ironic, doesn’t it, that there’s a lack of data to quantify the effect of open data?

In particular, it’s still early days for determining economic benefit. When you’re trying to uncover second-order effects in the economy due to open data, it’s very early days to be able to see those effects percolate through different sectors. We were really challenged. Nevertheless, we were able to look quite closely at the sorts of data that the U.K. government had been publishing and draw some conclusions about what that meant for the economy.

For example, we were able to categorize nearly 40,000 datasets that are publicly available from the U.K. government and other public bodies in the U.K. into a number of discrete categories. Thirty-three percent of the data that was being published by the government was related to government expenditure. A large slice of the data that was being supplied had to do with the economy, demographics and health.

Does more transparency lead to positive economic outcomes?

Harvey Lewis: In the U.K., and certainly to some extent in the U.S., there are multiple objectives at work in open data.

One of the primary objectives is transparency, publishing data that allows citizens to really kick the tires on public services, hopefully leading them to be improved, to increase quality and choice for individual citizens.

The things you do to enable transparency, however, aren’t necessarily the same things you do to enable economic growth and economic impact. For economic growth, focus on data that are likely to diffuse throughout the economy in the widest and greatest possible way. That’s dynamic data, data that’s granular, collected on a regular basis, updated, and made available through APIs that application developers and businesses can use.

Put some guarantees around those data sources to preserve their formats, longevity and utility, so that businesses have the confidence to use them and start building companies on the backs of them. Investors have got to have confidence that data will be available in the long term.

Those are the steps you take for economic growth. They’re quite different from the steps you might take for transparency, which is about making sure that all data that has a potential bearing on public services and cities and interpretation of government policy is made available.

You defined five business model archetypes in your report: “suppliers, aggregators, developers, enrichers and enablers.” Which examples have been sustainable?

Harvey Lewis: In coming up with that list, we did an analysis of as many companies as we could find. We tried to appraise business models from publicly available information to get a better understanding of what they were doing with the data and how they were generating revenue from it.

We had a long list of about 15 or 16 discrete business models that we were then able to cluster into these five archetypes.

Suppliers are publishing open data, including, of course, public sector bodies. Some businesses are publishing their data. While there may be no direct financial return if they publish data as open data and make it freely available, there are nevertheless other benefits that are going to become very meaningful in the future.

It’s something that a lot of businesses won’t be able to ignore, particularly when it comes to sustainability and financial data. Consumers are putting a lot of businesses under a great deal of scrutiny now to make sure that businesses are operating with integrity and can be trusted. A lot of this is about public good or customer good, and that can be quite intangible.

The second area, aggregators, is perhaps the largest. Organizations are pooling publicly available data, combining it and producing insights from it that are useful. They’re starting to sell those insights to businesses. One example in the report takes open data from the public body that all companies that are operating in the U.K. have to register with. They combine that data with other sources from the web, social media and elsewhere to produce intelligence that other businesses can use. They’re growing at quite a phenomenal rate.

We’re seeing a decline of organizations that are purely aggregating public sources of information. I don’t think there’s a sustainable business model there. Particular areas, like business intelligence, energy and utilities, are taking public data and are getting insights. It’s the insights that have monetary value, not the data itself.

The third are the classic app developers. This is of greatest interest where the data that is provided by the public sector is granular, real-time, updated frequently and close to the hearts of ordinary citizens. Transport data, crime data, and health data are probably the three types of data where software developed on the back of that data is going to have the greatest impact.

In the U.K., we’re seeing a lot of transport applications that enable people to plan journeys across what is, in some cases, quite a fragmented transport infrastructure — and get real benefits as a result. I think it’s only a matter of time before we start to see health data being turned into applications in exactly the same way, allowing individuals to make more informed choices, understand their own health and how to improve it and so on.

The fourth area, enrichers, is a very interesting one. We think this is the “dark matter” of the open data economy. These are larger, typically established businesses that are hoovering up significant quantities of open data and combining it with their own proprietary sources to offer services to customers. These sorts of services have traditionally existed and aren’t going to go away if the open data supplies dry up. They are hugely powerful. I’m thinking of insurers and retailers who have a lot of their own data about customers and are seeking better models of risk and understanding of customers. I think it’s difficult to measure economic benefit coming from this particular archetype.

The last area is enablers. These are organizations that don’t make money from open data directly but provide platforms and technologies that other businesses and individuals use. Competition websites are a very good example, where they provide a facility that allows businesses, public sector institutions, or research institutions to make subsets of their data available to seek solutions from the crowd.

Those are the five principal archetypes. The one that stands out, underpinning the open data market at the moment, is the “enricher” model. I think the hope is that the startups and small-to-medium enterprises in the aggregation and the developer areas are going to be the new engine for growth in open data.

Do you see adjustments being made based upon demand? Or are U.K. data releases conditioned upon what the government finds easy or politically low-risk?

Harvey Lewis: This comes back to my point about multiple objectives. The government in the U.K. is addressing a set of objectives through its open data initiative, one of which is economic growth. I’m sure it’s the same as in other countries around the world.

If the question is whether the government is releasing the right data to meet a transparency objective, then the answer is “yes.” Is it releasing the right data from an economic growth perspective? The answer is “almost.” It’s certainly doing an increasingly better job at that.

This is where the Open Data Institute really comes to the fore, because their remit, as far as the government is concerned, is to stimulate demand. They’re able to go back to the government and say, “Look, the real opportunity here is in the wholesale and retail sector. Or in the real estate sector — there are large swaths of government data that are valuable and relevant to this sector that are underutilized.” That’s an opportunity for the government to engage with businesses in those sectors, to encourage the use of open data and to demonstrate the benefits and outcomes that they can achieve.

It’s a very good question, but it depends on which objective you’re thinking about as to whether or not the answer is the right one. I think if you look toward the Danish government, for example, and the way that they’re approaching open data, there’s been a priority on economic growth. The sorts of datasets they’re releasing are going to stimulate growth in the Danish market, but they may not satisfy fully the requirements that one might expect from a transparency perspective or social growth perspective.

Does data format or method of release matter for outcomes, to the extent that you could measure it?

Harvey Lewis: From our analysis, data released through APIs and, in particular, transport data was in significant demand. There were noticeably more applications being built on the back of transport data published through an API than in almost any other area.

As a mechanism for making it easy for businesses to get hold of data, APIs are pretty crucial. Being able to provide data using that mechanism is a very good way of stimulating use.
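To make that concrete for developers, here is a minimal sketch of what “made available through APIs” looks like in practice; the endpoint, query parameters and field names below are invented for illustration, not a real U.K. service.

```python
import json
import urllib.request

# Hypothetical open data endpoint: the URL, query parameters and field names
# are placeholders for illustration, not a real U.K. transport API.
API_URL = "https://data.example.gov.uk/transport/departures?station=KGX&format=json"


def fetch_departures(url=API_URL):
    """Fetch live departure records from an open data API as a list of dicts."""
    with urllib.request.urlopen(url, timeout=10) as response:
        payload = json.load(response)
    # Assumes the API wraps its records in a "departures" key.
    return payload.get("departures", [])


if __name__ == "__main__":
    for record in fetch_departures()[:5]:
        print(record.get("service"), record.get("expected_time"))
```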

Based on some of the other work that we’ve been doing, there’s a big push to release data in its raw form. CSV is talked about quite a lot. In some cases, that works well. In other cases, it is a barrier to entry for small-to-medium enterprises.

To go back to the general practitioner prescribing data: a single month’s worth of data is published as one CSV file. The file size is about half a gigabyte, and it typically contains over four million records. If you’re a small-to-medium enterprise with limited resources — or even if you’re a journalist — you cannot open that data file in typical desktop or laptop software. There are just too many records. Even if you can find software that will open it, running queries on it takes a very long time.
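For readers who want to work around that barrier on ordinary hardware, one option is to stream the file in chunks rather than open it whole. The sketch below assumes pandas and invented column names; the real prescribing files have their own schema.

```python
import pandas as pd

# Placeholder file and column names; the real GP prescribing files use their
# own schema. This only illustrates chunked processing of a very large CSV.
CSV_PATH = "gp_prescribing_month.csv"

totals = {}
# Stream the ~0.5 GB file in 100,000-row chunks instead of loading 4M+ rows at once.
for chunk in pd.read_csv(CSV_PATH, chunksize=100_000):
    # Aggregate a numeric column per practice as we go.
    grouped = chunk.groupby("practice_code")["items"].sum()
    for practice, items in grouped.items():
        totals[practice] = totals.get(practice, 0) + items

# Print the ten practices with the most items prescribed.
for practice, items in sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(practice, items)
```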

There’s a natural barrier to entry for some formats that you really only appreciate once you try to process and get to grips with the data. That, I think, is something that needs to be thought through.

There’s an imperative to get data out there, but if you provide that data in a format that small-to-medium enterprises can’t use, I think it’s unfair. Larger businesses have the tools and the specialist capability to look at these files. That creates a problem, an economic barrier. It also creates a transparency barrier because although you may be publishing the data, no one can access it. Then you don’t get the benefits of increased transparency and accountability.

Where you’ve got potentially high-value datasets in health, crime, spending data and energy and environment data, a lot of care needs to be put into what formats are going to make that most easily accessible.

It isn’t always obvious. It isn’t the CSV file. It certainly isn’t the PDF! It isn’t anything, actually, that requires specialist knowledge and tools.

What are the next steps for your research inquiry?

Harvey Lewis: We’re continuing our work, trying to formulate ideas and methods. That includes using case studies and use cases, getting information from the public sector about how much it costs to generate the data, and looking at accounts of actual scenarios.

Understanding the economic impact, despite its challenges, is really important to policymakers around open data, to ensure that the benefits of releasing open data outweigh the costs of producing it. That’s absolutely essential to the business case of open data.

The other part of our activity is focusing on the insights that can be derived from open data that benefit the public sector or private sector companies. We’re looking quite hard at the growth opportunities in open data and the areas where significant cost savings or efficiencies can be gained.

We’re also looking at some interesting potential policy areas by mashing up different sources of data. For example, can you go some way to understanding the relationship between crime and mental health? With the release of detailed crime data and detailed prescribing data, there’s an opportunity, at a very granular level, to understand potential correlations and then do some research into the underlying causes. The focus of our research is subtly shifting toward more use-case type analysis, rather than looking at an abstract, generic picture about open data.

Bottom line: does releasing open data lead to significant economic benefit?

Harvey Lewis: My instinct and the data we have today suggest that it is going to lead to significant economic benefit. Precisely how big that benefit is needs further study.

I think it’s likely to be more in the realm of the broader impacts and some of the intangibles where we see the greatest impact, not necessarily through new businesses starting up and more businesses using open data. We will see those things, too.


This post is part of our ongoing investigation into the open data economy.

January 23 2013

Making open data more valuable, one micropayment at a time

When it comes to making sense of the open data economy, tracking cents is valuable. In San Francisco, where Mayor Ed Lee’s administration has reinvigorated city efforts to release open data for economic benefits, entrepreneur Yo Yoshida has made the City by the Bay’s government data central to his mobile ecommerce startup, Appallicious.

Appallicious is positioning its Skipitt mobile platform as a way for cities to easily process mobile transactions for their residents. The startup generates revenue by taking a micropayment on each transaction the city processes through its platform, a strategy that’s novel in the world of open data but one that has enabled Appallicious to make enough money to hire more employees and look to expand to other municipalities. I spoke to Yoshida last fall about his startup, what it’s like to go through city procurement, and whether he sees a market opportunity in more open government data.

Where did the idea for Appallicious come from?

Yo Yoshida: About three years ago, I was working on another platform with a friend that I met years ago, working on a company called Beaker. We discovered a number of problems. One of them was being able to find our way around San Francisco and not only get information, but be able to transact with different services and facilities, including going to a football game at the 49ers stadium. Why couldn’t we order a beer to our seats or order merchandise? Or find the food trucks that were sitting in some of the parks and then place an order from that?

So we were looking at what solutions were out there via mobile. We started exploring how to go about doing this. We looked first at the vendors and approaching them. That’s been done with a lot of other specific verticals. We started talking to the city a little bit. We looked at the open data legislation that was coming out at that time and said, “This is the information we need, but now we also need to be able to figure out how to monetize and populate that.”

We set about starting to build a platform that could not only support one type of transaction — ordering merchandise or something like that — but provide what I needed as a citizen to fulfill my needs and solve problems. We approached San Francisco Recreations and Parks because we had heard, through a third party, that they had been looking for a solution like this for two years. We showed them what we were doing. They asked us to come back with a demonstration of a product in a few weeks. We came back and showed them the first iteration of a mobile app.

Essentially, what we built was a mobile commerce platform that supports multiple tenants of financial transactions using open data. We enable the government — or whoever we’re working with — to be able to manage it from a multi-tiered, hierarchical structure.

We’ve built this platform to enable government to manage all of their mobile technology and transactions through software as a service.

What’s your business model?

Yo Yoshida: San Francisco Recreations and Parks has 1,200 facilities in San Francisco. The parks are free. The museums, obviously, are not, but they all sit on park land. You’re talking about permits, reservations for picnic tables. You have all of these different facilities, and all sorts of different ways to transact at each of these facilities. What we’ve done is create an informational piece for the public, which gives them the ability to find all sorts of facilities.

There are two different models for the financial piece. One is subscription-based.

However, with San Francisco Recreations and Parks, we saw a bigger and a more sustainable proposition in taking micropayments on transactions. There are tons of transactions going on every day, from permitting to making reservations to scheduling classes to ticketing for events. Golden Gate Park gets 15 million visitors a year, including those visiting the Botanical Gardens, the Japanese Tea Garden, and the California Academy of Sciences. Essentially, what we’re setting up is a micropayment or a convenience fee on each of those transactions.

San Francisco Recreations and Parks’ annual revenue alone is $35 million. That’s a percentage of ticket sales and lease prices for everything that all of these different properties sit on. Their extended reach is $200 million plus. So if we were to tap into that marketplace and take micropayments on them, we’re looking at a couple million dollars a year for us.
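As a back-of-envelope check on those figures: a small per-transaction convenience fee on that volume does land in the low millions. The 1% rate below is an assumption for illustration, since Yoshida doesn’t state the actual fee.

```python
# Back-of-envelope estimate of convenience-fee revenue on the figures cited
# in the interview. The 1% fee rate is an illustrative assumption; the actual
# Appallicious rate is not stated.
direct_revenue = 35_000_000    # SF Recreations and Parks annual revenue (USD)
extended_reach = 200_000_000   # "extended reach" cited in the interview (USD)
fee_rate = 0.01                # assumed per-transaction convenience fee

print(f"Fee on direct revenue: ${direct_revenue * fee_rate:,.0f}")   # roughly $350,000
print(f"Fee on extended reach: ${extended_reach * fee_rate:,.0f}")   # roughly $2,000,000 a year
```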

How big is your company now?

Yo Yoshida: We started with two people. We are now about to hire a total of 12. We expect to grow to maybe 30 by next summer, all depending on our funding rounds as they come through. We have interest from other cities, like San Diego, Denver and Los Angeles. We’re basically a plug-and-play solution for government or cities to be able to take open data, plug it in and then start creating financial pools out of it for the consumers to be able to have easy transactions.

Can other cities “plug and play” open data into your system?

Yo Yoshida: The biggest pain for me, obviously, is the transactions. Some cities have to pass legislation. If they have open data, plugging in and getting the informational piece out first, which is what Recreations and Parks is doing, essentially, is a no-brainer.

If someone has good open datasets, it would take maybe a month to implement this for an entire large city, depending on the departments. You first would have the tools for everyone to be able to find their way around. For instance, there have always been pain points with Muni, like finding the three-day passes. There’s no reason why you shouldn’t have that built into your map and into your directions if you’re going to one of those facilities, and then be able to use that to actually go to the museums as well.

Entrepreneurs trying to use government data sometimes describe challenges around its quality. Is that true here?

Yo Yoshida: We had to work with San Francisco on that, but each of the departments that we’re working with has assigned someone to clean up the data. You can’t have bad data in there. We’ve had that pain point in our past conversations. Frequently, it is a three-month wait time for them to clean up their data.

The Department of Public Health is doing it now. Their GIS person usually is the person that gets assigned to making sure all of the data that’s opened up to the public is cleaned up. He’s done an amazing job cleaning up all of the data points. It’s been a win-win situation because they all want this technology. They know they have to have clean data to get it, so they’re cleaning up their data.

Do you think more startups will target government as a customer?

Yo Yoshida: The procurement process was a long and grueling process. A lot of it came from the City Attorney’s office not understanding what this was, what this technology is like and that they can’t own everything. We did struggle a little bit there. We were very patient. We educated them as we went along. Most small startups can’t get to that place yet.

I think having someone sitting above that who actually understands software as a service and drives these things through a few times so they can get used to this process is going to make a huge difference for entrepreneurs.

We see this type of development and drive from the Mayor’s office as a huge opportunity to get the process streamlined and more efficient, so that entrepreneurs can actually come up and create technology. I mean, we suffered for a year, but we got it through. Hopefully, that will pave the way for others. With the new legislation, we’re hoping that they’re going to make it a much more efficient process and have someone there that actually understands this process.

The barriers to entry were so high before. If they streamline the process for entrepreneurs, there’s an incredible ability to access extreme amounts of revenue.

Is there a market opportunity in the open data San Francisco is releasing?

Yo Yoshida: There’s a small market play selling apps. I think you’re going to see, with companies like ours, that there truly is an ability to innovate on top of open data.

There absolutely is opportunity. It’s created us. We know that there’s going to be competitors coming along behind us, filling some needs that we can’t. The subscription-based model is going to probably work for several departments, like the Department of Public Health.

As far as hackathons and stuff like that, personally, I think they’re very innovative, but they’re not sustainable. There are definitely companies that are sustainable moving forward.

As far as I can tell, we are pretty much the first sustainable one on the scene. Our projected numbers, just off of micropayments, are going to not only generate revenue for us, but generate revenue for the city. I am looking at this as a sustainable company that can move forward and scale through and accommodate every type of city.

I see lots of new apps and lots of great informational apps, but they don’t make money. You have to sustain the technology. As you know, every version needs a new update. Who’s going to be maintaining that? How are you going to pay for the maintenance and how are you going to pay for the staff to do it? You have to create the real company. Our infrastructure is created to be a sustainable solution for cities moving forward.

This interview has been edited and condensed for clarity. This post is part of our ongoing investigation into the open data economy.

January 10 2013

Want to analyze performance data for accountability? Focus on quality first.

Here’s an ageless insight that will endure well beyond the “era of big data“: poor collection practices and aging IT will derail any institutional efforts to use data analysis to improve performance.

According to an investigation by the Los Angeles Times, poor record-keeping is holding back state government efforts to upgrade California’s 911 system. As with any database project, beware “garbage in, garbage out,” or “GIGO.”

As Ben Welsh and Robert J. Lopez reported for the L.A. Times in December, California’s Emergency Medical Services Authority has been working to centralize performance data since 2009.

Unfortunately, it’s difficult to achieve data-driven improvements or manage against perceived issues by applying big data to the public sector if the data collection itself is flawed. The L.A. Times reported quality issues stemming from everything from how response times were measured to record-keeping on paper to a failure to keep records at all.

Image Credit: Ben Welsh, who mapped 911 response time data for the Los Angeles Times.

When I shared this story with the Radar team, Nat Torkington suggested revisiting the “Observe, Orient, Decide, and Act” (OODA) loop familiar to military strategists.

“If your observations are flawed, your decisions will be too,” wrote Nat, in an email exchange. “If you pump technology investment into the D phase, without similarly improving the Os, you’ll make your crappy decisions faster.”

Alistair Croll explored the relevance of OODA to big data in his post on the feedback economy last year. If California wants to catalyze the use of data-driven analysis to improve response times that vary by geography and jurisdiction, it should start with the first “O.”

The set of factors at play here, however, means that there won’t be a single silver bullet for putting California’s effort back on track. Lack of participation and reporting standards, and old IT systems are all at issue — and given California’s ongoing financial issues, upgrading the latter and requiring local fire departments and ambulance firms to spend time and money on data collection will not be an easy sell.

Filed from the data desk

The investigative work of the L.A. Times was substantially supported by its Data Desk, a team of reporters and web developers that specializes in maps, databases, analysis and visualization. I included their interactive visualization mapping how fast the Los Angeles Fire Department responded to calls in my recent post on how data journalism is making sense of the world. When I profiled Ben Welsh’s work last year in our data journalist series, he told me this kind of project is exactly the sort of work he’s most proud of doing.

“As we all know, there’s a lot of data out there,” said Welsh, in our interview, “and, as anyone who works with it knows, most of it is crap. The projects I’m most proud of have taken large, ugly datasets and refined them into something worth knowing: a nut graf in an investigative story or a data-driven app that gives the reader some new insight into the world around them.”

The Data Desk set a high bar in this most recent investigation by not only making sense of the data, but also releasing the data behind the open source maps of California’s emergency medical agencies it published as part of the series.

This isn’t the first time they’ve made code available. As Welsh noted in a post about the series, the Data Desk has “previously written about the technical methods used to conduct [the] investigation, released the base layer created for an interactive map of response times and contributed the locations of LAFD’s 106 fire stations to OpenStreetMap.”

Creating an open source newsroom is not easy. In sharing not only its code but its data, the Los Angeles Times is setting a notable example for the practice of open journalism in the 21st century, building out the newsroom stack and hinting at media’s networked future.

This post is part of our series investigating data journalism.

August 13 2012

A grisly job for data scientists

Javier Reveron went missing from Ohio in 2004. His wallet turned up in New York City, but he was nowhere to be found. By the time his parents arrived to search for him and hand out fliers, his remains had already been buried in an unmarked indigent grave. In New York, where coroner’s resources are precious, remains wait a few months to be claimed before they’re buried by convicts in a potter’s field on uninhabited Hart Island, just off the Bronx in Long Island Sound.

The story, reported by the New York Times last week, has as happy an ending as it could given that beginning. In 2010 Reveron’s parents added him to a national database of missing persons. A month later police in New York matched him to an unidentified body and his remains were disinterred, cremated and given burial ceremonies in Ohio.

Reveron’s ordeal suggests an intriguing, and impactful, machine-learning problem. The Department of Justice maintains separate national, public databases for missing people, unidentified people and unclaimed people. Many records are full of rich data that is almost never a perfect match to data in other databases — hair color entered by a police department might differ from how it’s remembered by a missing person’s family; weights fluctuate; scars appear. Photos are provided for many missing people and some unidentified people, and matching them is difficult. Free-text fields in many entries describe the circumstances under which missing people lived and died; a predilection for hitchhiking could be linked to a death by the side of a road.
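To make the matching problem concrete, here is a minimal sketch of scoring how well a missing-person record lines up with an unidentified-remains record. The fields, weights and tolerances are illustrative assumptions, not the DOJ’s actual schema or any production matching method.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Rough 0-1 similarity for free-text fields such as circumstances or notes."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(missing, unidentified):
    """Score how well a missing-person record matches an unidentified-remains record.

    The fields, weights and tolerances here are illustrative assumptions,
    not the DOJ's schema or any production matching method.
    """
    score = 0.0
    # Categorical fields may disagree (a family's memory vs. a police report),
    # so an exact match only adds modest evidence.
    if missing["hair_color"] == unidentified["hair_color"]:
        score += 0.2
    # Weights fluctuate, so use a tolerance band rather than exact equality.
    if abs(missing["weight_lbs"] - unidentified["weight_lbs"]) <= 20:
        score += 0.3
    # Free-text circumstances carry the richest signal, e.g. a predilection for
    # hitchhiking vs. remains found by a roadside.
    score += 0.5 * text_similarity(missing["circumstances"], unidentified["notes"])
    return score


missing = {"hair_color": "brown", "weight_lbs": 160,
           "circumstances": "last seen hitchhiking along an interstate"}
unidentified = {"hair_color": "brown", "weight_lbs": 172,
                "notes": "found near a highway shoulder, possible hitchhiker"}

print(f"match score: {match_score(missing, unidentified):.2f}")
```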

I’ve called the Department of Justice (DOJ) to ask about the extent to which they’ve worked with computer scientists to match missing and unidentified people, and will update when I hear back. One thing that’s not immediately apparent is the public availability of the necessary training set — cases that have been successfully matched and removed from the lists. The DOJ apparently doesn’t comment on resolved cases, which could make getting this data difficult. But perhaps there’s room for a coalition to request the anonymized data and manage it to the DOJ’s satisfaction while distributing it to capable data scientists.

Photo: Missing Person: Ai Weiwei by Daquella manera, on Flickr
