
August 20 2013

Ouvrir sa propre agence de notation grâce à l’open data ? - Rsln Mag
http://www.rslnmag.fr/post/2013/08/19/Ouvrir-sa-propre-agence-de-notation-grace-a-lopen-data-.aspx

Marc Joffe of the Open Knowledge Foundation has built a financial risk calculator for California cities based on open data... Enough to thumb its nose at Moody's, Fitch or Standard & Poor's. Tags: internetactu2net internetactu fing #opendata #politiquespubliques #villelegere #citelabo (...)

#dashboard

August 08 2013

SPIP sait faire de l’open data
http://spip-love-opendata.nursit.com

It "would therefore be possible" to export seenthis in its entirety.
(see http://seenthis.net/messages/163383#message163664)

#rdf #opendata #spip

July 12 2013

Open Data (1/3) : la technique a-t-elle pris le pas ?
http://www.internetactu.net/2013/07/12/open-data-13-la-technique-a-t-elle-pris-le-pas

The second edition of the European Open Data Week was held this year in Marseille from June 25 to 28, 2013 (see our report on the first edition, held in Nantes in May 2012). And the overall impression is very different from last year's. While the week was more European (certainly owing to the partnership with the project…

#données_publiques #opendata #opendataweek

June 29 2012

UK Cabinet Office relaunches Data.gov.uk, releases open data white paper

The British government is doubling down on the notion that open data can be a catalyst for increased government transparency, civic utility and economic prosperity.

Yesterday, the United Kingdom's Cabinet Office hosted an event in London, England, to highlight the release of a new white paper on "unleashing the potential of open data," linked at the bottom of the post, and the relaunch of Data.gov.uk, the country's open data platform. The site now has over 9,000 data sets on it, according to the Cabinet Office.

In the video below, Francis Maude, minister for the Cabinet Office, talks about the white paper, which was the result of a public consultation over the last year.

"I think it's all good overall," commented author Dr. Ben Goldacre, via email.

"The UK government have been saying the right things about data for a long time: that it's the 21st century's raw material, that it has economic and social benefits, that privacy issues need caution, and so on. That in itself is reassuring, as governments can sometimes be completely clueless about this kind of stuff.

They also get the nerdy details: that standards matter, and so on. Also, all the stuff about building reciprocal relationships with developers, building coder capacity, two way relationships to improve datasets etc is all great. The star rating system for departments is neat, as one lesson from this whole area is simple structured public feedback often improves services.

The main concern is that the core reference data hasn't been released for free. The Postcode Address File allows developers to convert addresses into postcodes: this kind of dataset is like the road network of the digital domain, and it needs to be open with free movement so businesses and services can meet users. Our excellent Ordnance Survey maps are still locked up at the more detailed levels, which is problematic since a lot of geographical data from local government uses OS data too, so release of that is hindered. Companies House data is also still pay only.

The Cabinet Office seem to have been fighting hard for this stuff, which is great, but it's proving difficult to release."

The Guardian's Datablog published a smart, cogent analysis of the open data white paper and a spreadsheet of the government's commitments under it.

I strongly agree with Simon Rogers, the editor of the Datablog, that one of the most significant elements of the white paper is its acknowledgement of the need to engage developers and solicit their feedback on the quality and availability of open government data.

"Traditionally, government has almost ignored developers, even as prime users of its data," wrote Simon Rogers at the Guardian. "This commitment to take that community into account is probably the most striking part of this White Paper, which will allow users to ask government for specific datasets, feedback on how they've used them and, crucially, 'inform us when there are anomalies or mistakes in our data.'"

The past several years have shown such engagement is a critical aspect of building communities around open data. Directly engaging entrepreneurs, venture capitalists, industry and academia is, as US CTO Todd Park's success with stimulating innovation around open health data has demonstrated, necessary for downstream success. Publishing high quality open data online is, in that context, necessary but insufficient for better downstream outcomes for citizens. Given the costs incurred in publishing open data, this investment of time and energy in community engagement can't be overemphasized - and the inclusion of this strategic element in the white paper is notable.

All that being said, an actual strategy for developer engagement was not published in the white paper - stay tuned on that count.

Maude, Berners-Lee and Pollock on open data

Earlier this spring, I interviewed Francis Maude, the United Kingdom's minister for the Cabinet Office, about the responsibilities and opportunities for open government and transparency, including its relationship to prosperity and security. The video of our interview is embedded below:

The British government has also now officially adopted the "5 star" rubric of one of its most celebrated citizens, World Wide Web inventor Tim Berners-Lee, for evaluating the quality of open government data. Below, I've embedded my interview on open data with Berners-Lee, which remains relevant today:

For another view from civil society on what, exactly, open data is and why it matters, watch my interview with Rufus Pollock, the co-founder of the Open Knowledge Foundation, below. The Open Knowledge Foundation supports the Comprehensive Knowledge Archive Network (CKAN), the open source open data platform software that underpins the Data.gov.uk site.

UK government white paper on open data

June 08 2012

mHealth apps are just the beginning of the disruption in healthcare from open health data

Two years ago, the potential of government making health information as useful as weather data felt like an abstraction. Healthcare data could give citizens the same kind of "blue dot" for navigating health and illness that GPS data provides on the glowing maps of the geolocated mobile devices in more and more hands.

After all, profound changes in entire industries take years, even generations, to occur. In government, the pace of progress can feel even slower, measured in evolutionary time and epochs.

Sometimes, history works differently, particularly given the effect of rapid technological changes. It's only a little more than a decade since President Clinton announced he would unscramble Global Positioning System (GPS) data for civilian use. President Obama's second U.S. chief technology officer, Todd Park, has estimated that GPS data has unlocked some $90 billion in value in the United States.

In that context, the arc of the Health Data Initiative (HDI) in the United States might leave some jaded observers with whiplash. From a small beginning, the initiative to put health data to work has now expanded around the United States and attracted great interest from abroad, including observers from England's National Health Service eager to understand what strategies have unlocked innovation around public data sets.

While the potential of government health data driving innovation may well have felt like an abstraction to many observers, in June 2012, real health apps and services are here -- and their potential to change how society accesses health information, delivers care, lowers costs, connects patients to one another, creates jobs, empowers caregivers and cuts fraud is profound. The venture capital community seems to have noticed the opportunity: according to HHS Secretary Sebelius, investment in healthcare startups is up 60% since 2009.

Headlines about rockstar Bon Jovi 'rocking Datapalooza' and the smorgasbord of health apps on display, however, while both understandable and largely warranted, don't convey the deeper undercurrent of change.

On March 10, 2010, the initiative started with 36 people brainstorming in a room. On June 2, 2010, approximately 325 in-person attendees saw 7 health apps demoed at an historic forum in the theater of the Institute of Medicine in Washington, D.C., with another 10 apps packed into an expo in the rotunda outside. All of the apps or services used open government data from the United States Department of Health and Human Services (HHS).

In 2012, 242 applications or services that were based upon or use open data were submitted for consideration to the third annual Health Datapalooza. About 70 health app exhibitors made it to the expo. The conference itself had some 1,400 registered attendees, not counting press and staff, and sold out in advance of the event in the cavernous Washington Convention Center in DC. On Wednesday, I asked Dr. Bob Kocher, now of Venrock Capital and the Brookings Institution, about how the Health Data Initiative has grown and evolved. Dr. Kocher was instrumental in its founding when he served in the Obama administration. Our interview is embedded below:

Revolutionizing the healthcare industry -- in HHS Secretary Sebelius's words, reformulating Wired executive editor Thomas Goetz's 'latent data' to "lazy data" -- has meant years of work unlocking government data and actively engaging the developer, entrepreneurial and venture capital communities. While the process of making health data open and machine-readable is far from done, there has been incontrovertible progress in standing up new application programming interfaces (APIs) that enable entrepreneurs, academic institutions and government itself to retrieve it on demand. On Monday, in concert with the Health Datapalooza, a new version of HealthData.gov launched, including the release of new data sets that enable not just hospital quality comparisons but comparisons of insurance fees as well.

Two years later, the blossoming of the HDI Forum into a massive conference that attracted the interest of the media, venture capitalists and entrepreneurs from around the nation is a development few people would have predicted in 2010 -- but one that is welcome to a nation starved for solutions to spiraling healthcare costs and for signs of action from a federal government that all too frequently looks broken.

"The immense fiscal pressure driving 'innovation' in the health context actually means belated leveraging of data insights other industries take for granted from customer databases," said Chuck Curran, executive director and general counsel of the Network Advertising Initiative, when interviewed at this year's HDI Forum. For example, he suggested, look at "the dashboarding of latent/lazy data on community health, combined with geographic visualizations, to enable 'hotspot'-focused interventions, or info about service plan information like the new HHS interface for insurance plan data (including the API)."

Curran also highlighted the role that fiscal pressure is having on making both individual payers and employers a natural source of business funding and adoption for entrepreneurs innovating with health data, with apps like My Drugs Costs holding the potential to help citizens and businesses alike cut down on an estimated $95 billion dollars in annual unnecessary spending on pharmaceuticals.

Curran said that health app providers have fully internalized smart disclosure: "it's not enough to have open data available for specialist analysis -- there must be simplified interfaces for actionable insights and patient ownership of the care plan."

For entrepreneurs eyeing the healthcare industry and established players within it, the 2012 Health Datapalooza offers an excellent opportunity to "take the pulse of mHealth," as Jody Ranck wrote at GigaOm this week:

Roughly 95 percent of the potential entrepreneur pool doesn’t know that these vast stores of data exist, so the HHS is working to increase awareness through the Health Data Initiative. The results have been astounding. Numerous companies, including Google and Microsoft, have held health-data code-a-thons and Health 2.0 developer challenges. These have produced applications in a fraction of the time it has historically taken. Applications for understanding and managing chronic diseases, finding the best healthcare provider, locating clinical trials and helping doctors find the best specialist for a given condition have been built based on the open data available through the initiative.

In addition to the Health Datapalooza, the Health Data Initiative hosts other events which have spawned more health innovators. RockHealth, a Health 2.0 incubator, launched at the White House Startup America Roundtable at SXSW 2011. In the wake of these successful events, StartUp Health, a network of health startup incubators, entrepreneurs and investors, was created. The organization is focused on building a robust ecosystem that can support entrepreneurs in the health and wellness space.

This health data ecosystem has now spread around the United States, from Silicon Valley to New York to Louisiana. During this year's Health Datapalooza, I spoke with Ramesh Kolluru, a technologist who works at the University of Louisiana, about his work on a hackathon in Louisiana, the "Cajun Codefest," and his impressions of the forum in Washington:

One story that stood out from this year's crop of health data apps was Symcat, an mHealth app that enables people to look up their symptoms and find nearby hospitals and clinics. The application was developed by two medical students at Johns Hopkins University who happened to share a passion for tinkering, engineering and healthcare. They put their passion to work - and somehow found the time (remember, they're in medical school) to build a beautiful, usable health app. The pair landed a $100,000 prize from the Robert Wood Johnson Foundation for their efforts. In the video embedded below, I interview Craig Munsen, one of the medical students, about his application. (Notably, the pair intends to use their prize to invest in the business, not pay off medical school debt.)

There are more notable applications and services to profile from this year's expo - and in the weeks ahead, expect to see some of them here on Radar. For now, it's important to recognize the work of all of the men and women who have worked so hard over the past two years to create public good from public data.

Releasing and making open health data useful, however, is about far more than these mHealth apps: It's about saving lives, improving the quality of care, adding more transparency to a system that needs it, and creating jobs. Park spoke with me this spring about how open data relates to much more than consumer-facing mHealth apps:

As the US CTO seeks to scale open data across federal government by applying the lessons learned in the health data initiative, look for more industries to receive digital fuel for innovation, from energy to education to transit and finance. The White House digital government strategy explicitly embraces releasing open data in APIs to enable more accountability, civic utility and economic value creation.

While major challenges lie ahead, from data quality to security or privacy, the opportunity to extend the data revolution in healthcare to other industries looks more tangible now than it has in years past.

Business publications, including the Wall Street Journal, have woken up to the disruptive potential of open government data. As Michael Hickins wrote this week, "The potential applications for data from agencies as disparate as the Department of Transportation and Department of Labor are endless, and will affect businesses in every industry imaginable. Including yours. But if you can think of how that data could let someone disrupt your business, you can stop that from happening by getting there first."

This growing health data movement is not confined to any single city, state, agency or company. It's beautifully chaotic, decentralized, and self-propelled, said Park this past week.

"The Health Data Initiative is no longer a government initiative," he said. "It's an American one."

June 04 2012

Can Future Advisor be the self-driving car for financial advice?

Last year, venture capitalist Marc Andreessen famously wrote that software is eating the world. The impact of algorithms upon media, education, healthcare and government, among many other verticals, is just beginning to be felt, with consequences for the disrupted industries still unfolding.

Whether it's the prospect of IBM's Watson offering a diagnosis to a patient or Google's self-driving car taking over on the morning commute, there are going to be serious concerns raised about safety, power, control and influence.

Doctors and lawyers note, for good reason, that their public appearances on radio, television and the Internet should not be viewed as medical or legal advice. While financial advice may not pose the same threat to a citizen as an incorrect medical diagnosis or treatment, poor advice could have pretty significant downstream outcomes.

That risk isn't stopping a new crop of startups from looking for a piece of the billions of dollars paid every year to financial advisors. Future Advisor launched in 2010 with the goal of providing better financial advice through the Internet using data and algorithms. They're competing against startups like Wealthfront and Betterment, among others.

Not everyone is convinced of the validity of this algorithmically mediated approach to financial advice. Mike Alfred, the co-founder of BrightScope (which has liberated financial advisor data itself), wrote in Forbes this spring that online investment firms are wrong about financial advisors:

"While singularity proponents may disagree with me here, I believe that some professions have a fundamentally human component that will never be replaced by computers, machines, or algorithms. Josh Brown, an independent advisor at Fusion Analytics Investment Partners in NYC, recently wrote that 'for 12,000 years, anywhere someone has had wealth through the history of civilization, there's been a desire to pay others for advice in managing it.' In some ways, it's no different from the reason why many seek out the help of a psychiatrist. People want the comfort of a human presence when things aren't going well. A computer arguably may know how to allocate funds in a normal market environment, but can it talk you off the cliff when things go to hell? I don't think so. Ric Edelman, Chairman & CEO of Edelman Financial Services, brings up another important point. According to him, 'most consumers are delegators and procrastinators, and need the advisor to get them to do what they know they need to do but won't do if left on their own'."

To get the other side of this story, I recently talked with Bo Lu (@bolu), one of the two co-founders of Future Advisor. Lu explained how the service works, where the data comes from and whether we should fear the dispassionate influence of our new robotic financial advisor overlords.

Where did the idea for Future Advisor come from?

Lu: The story behind Future Advisor is one of personal frustration. We started the company in 2010 when my co-founder and I were working at Microsoft. Our friends who had reached their mid-20s were really making money for the first time in their lives. They were now being asked to make decisions, such as "Where do I open an IRA? What do I do with my 401K?" As is often the case, they went to the friend who had the most experience, which in this case turned out to be me. So I said, "Well, let's just find you guys a good financial advisor and then we'll do this," because somehow in my mind, I thought, "Financial advisors do this."

It turned out that all of the financial advisors we found fell into two distinct classes. One was folks who were really nice but who essentially said, in very kind words, "Maybe you'd be more comfortable at the lower-stakes table." We didn't meet any of their minimums. You needed a million dollars, or at least half a million, to get their services.

The other kind of financial advisor, the kind without minimums, immediately started trying to sell my friends term life insurance and annuities. I'm like, "These guys are 25. There's no reason for you to be doing this." Then I realized there was a misalignment of incentives there. We noticed that our friends were making a small set of the same mistakes over and over again, such as not having the right diversification for their age and their portfolio, or paying too much in mutual fund fees. Most people didn't understand that mutual funds charged fees and were not being tax efficient. We said, "Okay, this looks like a data problem that we can help solve for you guys." That's the genesis out of which Future Advisor was born.

What problem are you working on solving?

Bo Lu: Future Advisor is really trying to do one single thing: deliver on the vision that high-quality financial advice should be able to be produced cheaply and, thus, be broadly accessible to everyone.

If you look at the current U.S. market of financial advisors and you multiply the number of financial advisors in the U.S. — which is roughly a quarter-million people — by what is generally accepted to be a full book of clients, you'll realize that even at full capacity, the U.S. advisor market can serve only about 11% of U.S. households.
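As a rough back-of-the-envelope check on that math (a minimal sketch; the book size and household count below are illustrative assumptions, not figures from the interview):

    # Back-of-the-envelope check on the "about 11% of households" claim.
    advisors = 250_000           # "roughly a quarter-million" advisors (from the interview)
    clients_per_advisor = 50     # assumed "full book" of client households per advisor
    us_households = 115_000_000  # assumed number of U.S. households

    served = advisors * clients_per_advisor
    print(f"Households served at full capacity: {served:,}")          # 12,500,000
    print(f"Share of U.S. households: {served / us_households:.0%}")  # roughly 11%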

In serving that 11% of U.S. households, the advisory market for retail investing makes about $20 billion. This is a classic market where a service is extremely expensive and, in being so, can only serve a small percentage of the addressable market. As we walked into this, we realized that we're part of something bigger. If you look at 60 years ago, a big problem was that everyone wanted a color television and they just weren't being manufactured quickly or cheaply enough. Manufacturing scale has caught up to us. Now, everything you want you generally can have, because manufactured things are cheap. Creating services is still extremely expensive and non-scalable. Healthcare as a service, education as a service and, of course, financial advisory services come to mind. What we're doing is taking information technology, like computer science, to scale a service in the way the electrical engineering of our forefathers scaled manufacturing.

How big is the team? How are you working together?

Bo Lu: The team has eight people in Seattle. It's almost exactly half finance and half engineering. We unabashedly have a bunch of engineers from MIT, which is where my co-founder went to school, essentially sucking the brains out of the finance team and putting them in software. It's really funny because a lot of the time when we design an algorithm, we actually just sit down and say, "Okay, let's look at a bunch of examples, see what the intuitive decisions of the finance people are, and then try to encode them."

We rely heavily on the existing academic literature in both computational finance and economics because a lot of this work has been done. The interesting thing is that the knowledge is not the problem. The knowledge exists, and it's unequivocal in the things that are good for investors. Paying less in fees is good for investors. Being more tax efficient is good for investors. How to do that is relatively easy. What's hard for the industry for a long time has been to scalably apply those principles in a nuanced way to everybody's unique situation. That's something that software is uniquely good at doing.

How do you think about the responsibility of providing financial advice that traditionally has been offered by highly certified professionals who've taken exams, worked at banks, and are expensive to get to because of that professional experience?

Bo Lu: There's a couple of answers to that question, one of which is the folks on our team have the certifications that people look for. We've got certified financial advisors*, CFAs, which is a private designation on the team. We have math PhDs from the University of Washington on the team. The people who create the software are the caliber of people that you would want to be sitting down with you and helping you with your finances in the first place.

The second part of that is that we ourselves are a registered investment advisor. You'll see many websites that on the bottom say, "This is not intended to be financial advice." We don't say that. This is intended to be financial advice. We're registered federally with the SEC as a registered investment advisor and have passed all of the exams necessary.

*In the interview, Lu said that FutureAdvisor has 'certified financial advisors'. In this context, CFA stood for something else: the Future Advisor team includes Simon Moore, a chartered financial analyst, who advises the startup on investment algorithm design.

Where does the financial data behind the site come from?

Bo Lu: From the consumer side, the site has only four steps. These four steps are very familiar to anyone who's used a financial advisor before. A client signs up for the product. It's a free web service, designed to help everyone. In step one, they answer a couple of questions about their personal situation: age, how much they make, when they want to retire. Then they're asked the kinds of questions that good financial advisors ask, such as questions about risk tolerance. Here, you start to see that we rely on academic work as much as possible.

There is a great set of work out of the University of Kentucky on risk tolerance questionnaires. Whereas most companies just use some questionnaire they came up with internally, we went and scoured the literature to find exact questions that were specifically worded — and that have been tested under those wordings to yield statistically significant differences in measured risk tolerance. So we use those questions. With that information, the algorithm can then come up with a target portfolio allocation for the customer.
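As an illustration of that step, here is a minimal sketch of how a scored questionnaire and an age might map to a target allocation. The scoring scale, thresholds and asset mix are hypothetical; this is not FutureAdvisor's actual model.

    # Hypothetical sketch: map age and a risk-questionnaire score to a target allocation.
    def target_allocation(age: int, risk_score: int) -> dict:
        # Start from a simple age-based bond share, then tilt it by risk tolerance.
        bond_share = min(max((age - 20) / 100, 0.1), 0.6)  # assumed 10%..60% in bonds
        bond_share -= (risk_score - 5) * 0.02               # risk_score assumed on a 1..10 scale
        bond_share = min(max(bond_share, 0.05), 0.8)
        stocks = 1 - bond_share
        return {
            "domestic_stocks": round(stocks * 0.6, 2),
            "international_stocks": round(stocks * 0.4, 2),
            "bonds": round(bond_share, 2),
        }

    print(target_allocation(age=27, risk_score=7))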

In step two, the customer can synchronize or import data from their existing financial institutions into the software. We use Yodlee, which you've written about before. It's the same technology that Mint used to import detailed data about what you already hold in your 401K, in your IRA, and in all of your other investment accounts.

Step three is the dashboard. The dashboard shows your investments at a level that makes sense, unlike current brokerages, which, when you log in, tell you how much money you have, list the funds you hold, and show how much they've changed in the last 24 hours of trading. We answer four questions on the dashboard.

  1. Am I on track?
  2. Am I well-diversified for this goal?
  3. Am I overpaying in hidden fees in my mutual funds?
  4. Am I as tax efficient as I could be?

We answer those four questions and then, in the final step of the process, we give algorithmically-generated, step-by-step instructions about how to improve your portfolio. This includes specific advice like "sell this many shares of Fund X to buy this many shares of Fund Y" in your IRA. When consumers see this, they can go and, with this help, clean up their portfolios. It's kind of like diagnosis and prescription for your portfolio.

There are three separate streams of data underlying the product. One is the Yodlee stream, which is detailed holdings data from hundreds of financial institutions. Two is data about what's in a fund. That comes from Morningstar. Morningstar, of course, gets it from the SEC because mutual funds are required to disclose this. So we can tell, for example, if a fund is an international fund or a domestic fund, what the fees are, and what it holds. The third is a dataset we have to bring in ourselves: 401K data from the Department of Labor.

On top of this triad of datasets sits our algorithm, which has undergone six to eight months of beta testing with customers. (We launched the product in March 2012.) That algorithm asks, "Okay, given these three datasets, what is the current state of your portfolio? What is the minimum number of moves to reduce both transaction costs and any capital gains that you might incur to get you from where you are to roughly where you need to be?" That's how the product works under the covers.
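To make the "diagnosis and prescription" idea concrete, here is a minimal sketch of combining imported holdings, fund metadata and a target allocation to flag problems. The data structures, fund names and thresholds are invented for illustration and are not the company's actual algorithm.

    # Illustrative "diagnosis" pass over a portfolio; all values are invented.
    holdings = {"FUND_X": 12_000, "FUND_Y": 3_000}   # dollars held, per imported account data
    fund_info = {                                    # per-fund metadata (asset class, expense ratio)
        "FUND_X": {"asset_class": "domestic_stocks", "expense_ratio": 0.012},
        "FUND_Y": {"asset_class": "bonds", "expense_ratio": 0.0007},
    }
    target = {"domestic_stocks": 0.56, "international_stocks": 0.38, "bonds": 0.06}

    total = sum(holdings.values())
    actual = {}
    for fund, dollars in holdings.items():
        cls = fund_info[fund]["asset_class"]
        actual[cls] = actual.get(cls, 0) + dollars / total

    # Flag drift from the target allocation and high-fee funds.
    for cls, share in target.items():
        drift = actual.get(cls, 0) - share
        if abs(drift) > 0.05:                        # 5% drift threshold (assumed)
            print(f"{cls}: off target by {drift:+.0%}")
    for fund, info in fund_info.items():
        if info["expense_ratio"] > 0.005:            # 0.5% fee threshold (assumed)
            print(f"{fund}: expense ratio {info['expense_ratio']:.2%} looks high")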

What's the business model?

Bo Lu: You can think of it as similar to Redfin. Redfin allows individual realtors to do more work by using algorithms to help them with all of the repetitive parts. Our product and web service are free and will always be free. Information wants to be free. That's how we work in software. It doesn't cost us anything for an additional person to come and use the website.

The way that Future Advisor makes money is that we charge for advisor time. A small percentage of customers will have individual questions about their specific situation or want to talk to a human being and have them answer some questions. This is actually good in two ways.

One, it helps the transition from a purely human service to what we think will eventually be an almost purely digital service. People who are somewhere along that continuum of wanting someone to talk to but don't need someone full-time to talk to can still do that.

Two, those conversations are a great way for us to find out, in aggregate, what the things are that the software doesn't yet do or doesn't do well. Overall, if we take a ton of calls that are all the same, then it means there's an opportunity for the software to step in, scale that process, and help people who don't want to call us or who can't afford to call us to get that information.

What's the next step?

Bo Lu: This is a problem that has a dramatic possible impact attached to it. Personal investing, what the industry calls "retail investing," is a closed-loop system. Money goes in, and it's your money, and it stays there for a while. Then it comes out, and it's still your money. There's very little additional value creation by the financial advisory industry.

It may sound like I'm going out on a limb to say this, but it's generally accepted that the value creation of you and I putting our hard-earned money into the market is actually done by companies. Companies deploy that capital, they grow, and they return that capital in the form of higher stock prices or dividends, fueling the engine of our economic growth.

There are companies across the country and across the world adding value to people's lives. There's little to no value to be added by financial advisors trying to pick stocks. It's actually academically proven that there's negative value to be added there because it turns out the only people who make money are financial advisors.

This is a $20 billion market. But really what that means is that it's a $20 billion tax on individual American investors. If we're successful, we're going to reduce that $20 billion tax to a much smaller number by orders of magnitude. The money that's saved is kept by individual investors, and they keep more of what's theirs.

Because of the size of this market and the size of the possible impact, we are venture-backed, because we can really change the world for the better if we're successful. There are a bunch of great folks in the Valley who have done a lot of work in money and the democratization of software and money tools.

What's the vision for the future of your startup?

Bo Lu: I was just reading your story about smart disclosure a little while ago. There's a great analogy in there that I think applies aptly to us. It's maps. The first maps were paper. Today if you look at the way a retail investor absorbs information, it's mostly paper. They get a prospectus in the mail. They have a bunch of disclosures they have to sign — and the paper is extremely hard to read. I don't know if you've ever tried to read a prospectus; it's something that very few of us enjoy. (I happen to be one of them, but I understand if not everyone's me.) They're extremely hard to parse.

Then we moved on to the digital age of folks taking the data embedded in those prospectuses and making them available. That was Morningstar, right? Now we're moving into the age of folks taking that data and mating it with other data, such as 401K data and your own personal financial holdings data, to make individual personalized recommendations. That's Future Advisor the way it is today.

But just as maps moved from paper to Google Maps, it didn't stop there. It has moved on to autonomous cars. There will be a day when you and I don't ever have to look at a map because, rather than the map being a tool to help me make the decision to get somewhere, the map will be a part of a service I use that just gets the job done. It gets me from point A to point B.

In finance, the job is to invest my money properly. Steward it so that it grows, so that it's there for me when I retire. That's our vision as well. We're going to move from being an information service to actually doing it for you. It's just a default way so that if you do nothing, your financial assets are well taken care of. That's what we think is the ultimate vision of this: Everything works beautifully and you no longer have to think about it.

We're now asked to make ridiculous decisions about spreading money between a checking account, an IRA, a savings account and a 401K, which really make no sense to most of us. The vision is to have one pot of money that invests itself correctly, that you put money into when you earn money. You take money out when you spend it. You don't have to make any decisions that you were never trained nor educated to make about your own personal finances because it just does the right thing. The self-driving car is our vision.

Connecting the future of personal finance with an autonomous car is an interesting perspective. Just as with outsourcing driving, however, there's the potential for negative outcomes. Do you have any concerns about the algorithm going awry?

Bo Lu: We are extremely cognizant of the weighty matters that we are working with here. We have a ton of testing that happens internally. You could even criticize us, as a software development firm, in that we're moving slower than other software development firms. We're not going to move as quickly as Twitter or Foursquare because, to be honest, if they mess up, it's not that big a deal. We're extremely careful about it.

At the same time, I think the Google self-driving car analogy is apt because people immediately say, "Well, what if the car gets into an accident?" Those kinds of fears exist in all fields that matter.


Analysis: Why this matters

"The analogy that comes to mind for me isn't the self-driving car," commented Mike Loukides, via email. "It's personalized medicine."

One of the big problems in health care is that to qualify treatments, we do testing over a very wide sample, and reject it if it doesn't work better than a placebo. But what about drugs that are 100% effective on 10% of the population, but 0% effective on 90%? They're almost certainly rejected. It strikes me that what Future Advisor is doing isn't so much helping you to go on autopilot, but getting beyond generic prescriptions and generating customized advice, just as a future MD might be able to do a DNA sequence in his office and generate a custom treatment.

The secret sauce for Future Advisor is the combination of personal data, open government data and proprietary algorithms. The key to realizing value, in this context, is combining multiple data streams with a user interface that's easy for a consumer to navigate. That combination has long been known by another name: It's a mashup. But the mashups of 2012 have something that those of 2002 didn't have, at least in volume or quality: data.

Future Advisor, Redfin (real estate) or Castlight (healthcare) are all interesting examples of entrepreneurs creating data products from democratized government data. Future Advisor uses data from consumers and the U.S. Department of Labor, Redfin synthesizes data from economists and government agencies, and Castlight uses health data from the U.S. Department of Health and Human Services. In each case, they provide a valuable service and/or product by making sense of that data deluge.

May 29 2012

US CTO seeks to scale agile thinking and open data across federal government

In the 21st century, federal government must go mobile, putting government services and information at the fingertips of citizens, said United States Chief Technology Officer Todd Park in a recent wide-ranging interview. "That's the first digital government result, outcome, and objective that's desired."

To achieve that vision, Park and U.S. chief information officer Steven VanRoekel are working together to improve how government shares data, architects new digital services and collaborates across agencies to reduce costs and increase productivity through smarter use of information technology.

Park, who was chosen by President Obama to be the second CTO of the United States in March, has been (relatively) quiet over the course of his first two months on the job.

Last Wednesday, that changed. Park launched a new Presidential Innovation Fellows program, in concert with VanRoekel's new digital government strategy, at TechCrunch's Disrupt conference in New York City. This was followed by another event for a government audience at the Interior Department headquarters in Washington, D.C. Last Friday, he presented his team's agenda to the President's Council of Advisors on Science and Technology.

"The way I think about the strategy is that you're really talking about three elements," said Park, in our interview. "First, it's going mobile, putting government services at the literal fingertips of the people in the same way that basically every other industry and sector has done. Second, it's being smarter about how we procure technology as we move government in this direction. Finally, it's liberating data. In the end, it's the idea of 'government as a platform.'"

"We're looking for a few good men and women"

In the context of the nation's new digital government strategy, Park announced the launch of five projects that this new class of Innovation Fellows will be entrusted with implementing: a broad Open Data Initiative, Blue Button for America, RFP-EZ, The 20% Campaign, and MyGov.

The idea of the Presidential Innovation Fellows Program, said Park, is to bring in people from outside government to work with innovators inside the government. These agile teams will work together within a six-month time frame to deliver results.

The fellowships are basically scaling up the idea of "entrepreneurs in residence," said Park. "It's a portfolio of five projects that, on top of the digital government strategy, will advance the implementation of it in a variety of ways."

The biggest challenge in bringing the five programs the US CTO has proposed to successful completion will be getting 15 talented men and women to join his team and implement them. There's reason for optimism. Park shared via email that:

"... within 24 hours of TechCrunch Disrupt, 600 people had already registered via Whitehouse.gov to apply to be a Presidential Innovation Fellow, and another several hundred people had expressed interest in following and engaging in the five projects in some other capacity."

To put that in context, Code for America received 550 applications for 24 fellowships last year. That makes both of these fellowships more competitive than getting into Harvard in 2012, which received 34,285 applications for its next freshman class. There appears to be considerable appetite for a different kind of public service that applies technology and data for the public good.

Park is enthusiastic about putting open government data to work on behalf of the American people, amplifying the vision that his predecessor, Aneesh Chopra, championed around the country for the past three years.

"The fellows are going to have an extraordinary opportunity to make government work better for their fellow citizens," said Park in our interview. "These projects leverage, substantiate and push forward the whole principle of liberating data. Liberate data."

"To me, one of the aspects of the strategy about which I am most excited, that sends my heart into overdrive, is the idea that going forward, the default state of government data shall be open and machine-readable," said Park. "I think that's just fantastic. You'll want to, of course, evolve the legacy data as fast as you can in that same direction. Setting that as 'this is how we are rolling going forward' — and this is where we expect data to ultimately go — is just terrific."

In the videos and interview that follow, Park talks more about his vision for each of the programs.

A federal government-wide Open Data Initiative

In the video below, Park discusses the Presidential Innovation Fellows program and introduces the first program, which focuses on open data:

Park: The Open Data Initiative is a program to seed and expand the work that we're doing to liberate government data as a platform; to encourage, on a voluntary basis, the liberation of data by corporations as part of the national data platform; and to actively stimulate the development of new tools and services -- and enhance existing tools and services -- that leverage the data to help improve Americans' lives in very tangible ways and create jobs for the future.

This leverages the Open Government Directive to say "look, the default going forward is open data." It also leverages the directive to "API-ize" two high-priority datasets and, in targeted ways, to go beyond that and really push to get more data out there in, critically, machine-readable form, in APIs, and to educate the entrepreneurs and innovators of the world that it's there through meetups, hackathons, challenges, and "Datapaloozas."

We're doubling down on the Health Data Initiative, and we are also launching a much more high-profile Safety Data Initiative, which we kicked off last week; an Energy Data Initiative, which kicked off this week; an Education Data Initiative, which we're kicking off soon; and an Impact Data Initiative, which is about liberating data with respect to inputs and outputs in the non-profit space.

We're also going to be exploring an initiative in the realm of personal finance, enabling Americans to access copies of their financial data from public sector agencies and private sector institutions. So, the format that we're going to be leveraging to execute these initiatives is cloned from the Health Data Initiative.

This will make new data available. It will also take the existing public data that is unusable to developers, i.e. in the form of PDFs, books or static websites, and turn it into liquid machine-readable, downloadable, accessible data via API. Then — because we're consistently hearing that 95% of the innovators and entrepreneurs who could turn our data into magic don't even know the data exists, let alone that it's available to them — engage the developer community and the entrepreneurial community with the data from the beginning. Let them know it's there, get their feedback, make it better.

Blue Button for America

Park: The idea is to develop an open source patient portal capability that will replace My HealtheVet, which is the Veterans Administration's current patient portal. This will actually allow the Blue Button itself to iterate and evolve more rapidly, so that every time you add more data to it, it won't require heart surgery. It will be a lot easier, and of course it will be open source, so that anyone else who wants to use it can use it as well. On top of that, we're going to do a lot of "biz dev" in America to get the word out about Blue Button and encourage more and more holders of data in the private sector to adopt Blue Button. We're also going to work to help stimulate more tool development by entrepreneurs that can upload Blue Button data and make it useful in all kinds of ways for patients. That's Blue Button for America.

What is RFP-EZ?

Park: The objective is "buying smarter." The project that we're working on with the Small Business Administration is called "RFP-EZ."

Basically, it's the idea of setting up a streamlined process for the government to procure solutions from innovative, high-growth tech companies. As you know, most high-growth companies regard the government as way too difficult to sell to.

That A) deprives startups and high-growth companies of the government as a marketplace and, B) perhaps even more problematically, actually deprives the government of their solutions.

The hope here is, through the actions of the RFP-EZ team, to create a process and a prototype through which the government can much more easily procure solutions from innovative private firms.

It A) opens up this emerging market called "the government" to high-tech startups and B) infects the government with more of their solutions, which are, pound for pound, radically more effective and cost-efficient than a lot of the stuff that the government is currently procuring through conventional channels. That's RFP-EZ.

The 20% Campaign

Park: The 20% Campaign is a project that's being championed by USAID. It's an effort at USAID, working with other government agencies, NGOs and companies, to catalyze the movement of foreign assistance payments from cash to electronic payment. So, just for example, USAID pays its contractors electronically, obviously, but the contractor who, say, pays highway workers in Afghanistan, or the way that police officers get paid in Afghanistan, is actually principally via cash. Or has been. And that creates all kinds of waste, fraud, and abuse issues.

The idea is actually to move to electronic payment, including mobile payment — and this has the potential to significantly cut waste, fraud and abuse, to improve financial inclusion, and to actually let people, on their phones, access bank accounts set up for them. That leads to all kinds of good things, including safety: it's not ideal to be carrying around large amounts of cash in highly kinetic environments.

The Afghan National Police started paying certain contingents of police officers via mobile phones and mobile payments, as opposed to cash, and what happened is that the police officers started reporting what felt like up to a 30% raise. Of course, their pay hadn't changed, but when it was paid in cash, a bunch of it got lost along the way. This is obviously a good thing, but it's even more important if you realize that what they were paid in cash, what they ultimately physically received, was less than what the Taliban in that province was paying people to join the Taliban — but the mobile payment, at that level of salary, was greater than what the Taliban was paying. That's a critical difference.

It's basically taking foreign assistance payments through the last mile to mobile.

MyGov is the U.S. version of Gov.uk

Park: MyGov is an effort to rapidly prototype a citizen-centric system that gives Americans the information and resources of government that are right for them. Think of it as a personalized channel for Americans to access information and resources across government, and a way to get feedback from citizens about that information and those resources.

How do you plan to scale what you learned while you were HHS CTO across all of the federal government?

Park: Specifically, we're doing exactly the same thing we did with the Health Data Initiative, kicking off the initiatives with a "data jam" — an ideation workshop where we invite, just like with health data, 40 amazing tech and energy minds, or tech and safety innovators, into a room — at the White House, in the case of the Safety Data Initiative, or at Stanford University, in the case of the Energy Data Initiative.

We walk into the room for several hours and say, "Here's a big pile of data. What would you do with this data?" And they invent 15 or 20 new classes of products or services of the future that we could build with the data. And then we challenge them, at the end of the session, to build prototypes or actual working products that instantiate their ideas within 90 days, to be highlighted at a White House-hosted Safety Datapalooza, Energy Datapalooza, Education Datapalooza, Impact Datapalooza, etc.

We also take the intellectual capital from the workshops, publish it on the White House website, and publicize the opportunity around the country: Discover the data, come up with your own ideas, build prototypes, and throw your hat in the ring to showcase at a Datapalooza.

What happens at the Datapaloozas — our experience in health guides us — is that, first of all, the prototypes and working products inspire many more innovators to actually build new services, products and features, because the data suddenly becomes really concrete to them, in terms of how it could be used.

Secondly, it helps persuade additional folks in the government to liberate more data, making it available, making it machine-readable, as opposed to saying, "Look, I don't know what the upside is. I can only imagine downsides." What happened in health is, when they went to a Datapalooza, they actually saw that, if data is made available, then at no cost to you and no cost to taxpayers, other people who are very smart will build incredible things that actually enhance your mission. And so you should do the same.

As more data gets liberated, that then leads to more products and services getting built, which then inspires more data liberation, which then leads to more products and services getting built — so you have a virtuous spiral, like what's happened in health.

The objective of each of these initiatives is not just to liberate data. Data by itself isn't helpful. You can't eat data. You can't pour data on a wound and heal it. You can't pour data on your house and make it more energy efficient. Data is only useful if it's applied to deliver benefit. The whole point of this exercise, the whole point of these kickoff efforts, is to catalyze the development of an ecosystem of data supply and data use to improve the lives of Americans in very tangible ways — and create jobs.

We have the developers and the suppliers of data actually talk to each other, create value for the American people, and then rinse, wash, repeat.

We're recruiting, to join the team of Presidential Innovation Fellows, entrepreneurs and developers from the outside to come in and help with this effort to liberate data, make it machine-readable, and get it out there to entrepreneurs and help catalyze development of this ecosystem.

We went to TechCrunch Disrupt for a reason: it's right smack dab in the middle of the people we want to recruit. We invite people to check out the projects on WhiteHouse.gov and, if they're interested in applying to be a fellow, to indicate their interest. Even if they can't come to DC for six-plus months to be a fellow but want to follow one of the projects or contribute or help in some way, we are inviting them to express interest in that as well. For example, if you're an entrepreneur, and you're really interested in the education space, and in learning about what data is available in education, you can check out the project, look at the data, and perhaps build something really good to show at the Education Datapalooza.

Is open data just about government data? What about smart disclosure?

Park: In the context of the Open Data Initiatives projects, it's not just about liberation of government health data: it's also about government catalyzing the release, on a voluntary basis, of private sector data.

Obviously, scaling Blue Button will extend the open data ecosystem. We're also doubling down on Green Button. I was just in California to host discussions around Green Button. Utilities representing 31 million households and businesses have now committed to make Green Button happen. Close to 10 million households and businesses already have access to Green Button data.

There's also a whole bunch of conversation happening about, at some point later this year, having the first utilities add the option of what we're calling "Green Button Connect." Right now, the Green Button is a download, where you go to a website, hit a green button and bam, you download your data. Green Button Connect is the ability for you to say as a consumer, "I authorize this third party to receive a continuous feed of my electricity usage data."

That creates massive additional opportunity for new products and services. That could go live later this year.
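For a sense of what working with a Green Button download involves, here is a minimal sketch that sums up the usage in an exported file. It assumes the ESPI Atom/XML layout (espi:IntervalReading elements in the http://naesb.org/espi namespace); the file name is hypothetical and details can vary by utility.

    # Sketch: total up electricity usage from a downloaded Green Button XML file.
    import xml.etree.ElementTree as ET

    NS = {"espi": "http://naesb.org/espi"}

    tree = ET.parse("greenbutton_download.xml")    # hypothetical file exported from a utility site
    readings = tree.getroot().findall(".//espi:IntervalReading/espi:value", NS)
    total_wh = sum(int(v.text) for v in readings)  # ESPI values are typically watt-hours
    print(f"Total usage in file: {total_wh / 1000:.1f} kWh")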

As part of the education data initiative, we are pursuing the launch and scale up of something called "My Data," which will have a red color button. (It will probably, ultimately, be called "Red Button.") This is the ability for students and their families to download an electronic copy of their student loan data, of their transcript data, of their academic assessment data.

That notion of people getting their own data, whether it's your health data, your education data, your finance data, your energy use data, that's an important part of these open data initiatives as well, with government helping to catalyze the release of that data to then feed the ecosystem.

How does open data specifically relate to the things that Americans care about: access to healthcare, reducing energy bills, giving their kids more educational opportunities, and job creation? Is this just about apps?

Park: In healthcare, for example, you'll see a growing array of examples that leverage data to create tangible benefit in many, many ways for Americans. Everything from helping me find the right doctor or hospital for my family, to being notified of a clinical trial that matches my profile and could save my life, to the ability to get the latest and greatest information about how to manage my asthma or diabetes via government knowledge in the National Library of Medicine.

There is a whole shift in healthcare systems away from paying for volume of services to basically paying to get people healthy. It goes by lots of different names — accountable care organizations or episodic payment — but the fundamental common theme is that doctors and hospitals increasingly will be paid to keep people healthy, to coordinate their care, and to keep them out of the hospital and out of the ER.

There's a whole fleet of companies and services that utilize data to help doctors and hospitals do that work, like using Medicare claims data to help identify segments of a patient population that are at real risk and likely to end up in the ER or hospital soon. There are tools that help journalists easily identify public health issues, like healthcare outcome disparities by race, gender and ethnicity. There are tools that help county commissioners and mayors understand what's going on in a community, from a health standpoint, and make better policy decisions, like showing them food deserts. There's just a whole fleet of rapidly growing services for consumers, doctors, nurses, journalists, employers, and public policy makers that help them make decisions, help them deliver improved health and healthcare, and create jobs, all at the same time.

That's very exciting. If you look at all of those products and services (a subset of them are the ones that self-identify to us to be exhibited at the Health Datapaloozas), look at the 20 healthcare apps that were at the first Datapalooza or the 50 that were at the second. This year, there are 230 companies being narrowed down to a total of about 100 that will be at the Datapalooza. They collectively serve millions of people today, either through brand new products and services or through new features on existing platforms. They help people in ways that we would never have thought of, let alone built.

The taxpayer dollars expended here were zero. We basically just took our data, made it available in machine-readable format, educated entrepreneurs that it was there, and they did the rest. Think about these other sectors, and think about what's possible in those sectors.

In education, with the data that we've made available, you can imagine much better tools to help you shop for the college that will deliver the biggest bang for your buck and is the best fit for your situation.

We've actually made available a bunch of data about college outcomes and are making more data available in machine-readable form so it can feed college search tools much better. We are going to be enabling students to download machine-readable copies of their own financial aid applications, student loan data and school records. That will really turbocharge "smart scholarship" and school search capabilities for those students. You can mash that up with college outcomes data in a really powerful, personalized college and scholarship search engine, enabled by your personal data plus machine-readable government data. Tools that help kids and their parents pick the right college for their education and get the right financial aid: that's something government is going to facilitate.

In the energy space, there are apps and services that help you leverage your Green Button data and other data to really assess your electricity usage compared to that of others and get concrete tips on how you can actually save yourself money. We're already seeing very clever, very cool efforts to integrate gamification and social networking into that kind of app, to make it a lot more fun and engaging — and make yourself money.

One dataset that's particularly spectacular, and that we're making a lot more usable, is the EnergyStar database. It's got 40,000 different appliances, everything from washing machines to servers, that consumers and businesses use. We are creating a much, much easier-to-use, public, downloadable EnergyStar database. It's got really detailed information on the energy use profiles and performance of each of these 40,000 appliances and devices. Imagine that integrated into much smarter services.
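To make that concrete for developers: once the database is downloadable, ranking appliances by energy use is a few lines of analysis. The sketch below is illustrative only; the file name and column names are assumptions, not the actual EnergyStar export schema.

    # A minimal, hedged sketch: filtering a downloadable appliance-efficiency
    # table. "energystar_appliances.csv" and its columns are assumptions for
    # illustration; the real EnergyStar export will differ.
    import pandas as pd

    appliances = pd.read_csv("energystar_appliances.csv")

    # Keep one product category and rank by an assumed annual-energy-use column.
    washers = appliances[appliances["category"] == "Clothes Washer"]
    most_efficient = washers.sort_values("annual_kwh").head(5)

    print(most_efficient[["brand", "model", "annual_kwh"]])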

On safety, the kinds of ideas that people are bringing together are awesome. They're everything from using publicly available safety data to plot the optimal route for your kid to walk home or for a first responder to travel through a city and get to a place most expeditiously.

There's this super awesome resource on Data.gov called the "Safer Products API," which is published by the Consumer Product Safety Commission (CPSC). Consumers send in safety reports to CPSC, but until March of last year, you had to FOIA [Freedom of Information Act] CPSC to get them. What they've now done is publish the entire database of these reports, no FOIA request required, and make it available through an API.
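For developers, "available through an API" means a recall lookup can be a single HTTP request. Here is a minimal sketch, assuming a JSON REST endpoint and query parameters along the lines of CPSC's published recall web service; the URL and field names are illustrative, not a verified contract.

    # A hedged sketch of querying a product-safety REST API for recall reports.
    # The endpoint URL, parameters and field names are assumptions modeled on
    # CPSC's recall web service, not a verified contract.
    import requests

    ENDPOINT = "https://www.saferproducts.gov/RestWebServices/Recall"

    response = requests.get(ENDPOINT, params={"RecallTitle": "crib", "format": "json"})
    response.raise_for_status()

    for recall in response.json():
        # Each record is assumed to carry a date and a title.
        print(recall.get("RecallDate"), recall.get("Title"))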

One of the ideas that came up is that people buy products on eBay, Craigslist, etc., all the time, but some huge percentage of Americans never get to know about a recall — a recall of a crib, a recall of a toy. And even when a company recalls new products, old products are still in circulation. What if someone built the ability to integrate the recall data and attach it to all the stuff in the eBays and Craigslists of the world?

Former CIO Vivek Kundra often touted government recall apps based upon government data during his tenure. Is this API the same thing, shared again, or something new?

Park: I think the smartest thing the government can do with data like product recalls data is not build our own shopping sites, or our own product information sites: it's to get the information out there in machine-readable form, so that lots and lots of other platforms that have audiences with millions of people already, and who are really good at creating shopping experiences or product comparison experiences, get the data into their hands, so that they can integrate it seamlessly into what they do. I feel that that's really the core play that the government should be engaged in.

I don't know if the Safer Products API was included in the recall app. What I do know is that before 2011, you had to FOIA to get the data. I think that even if the government included it in some app the government built, that it's important for it to get used by lots and lots of other apps that have a collective audience that's massively greater than any app the government could itself build.

Another example of this is the Hospital Compare website. The Hospital Compare website has been around for a long time. Nobody knows about it. There was a survey done that found 94% of Americans didn't know that hospital quality data was available, let alone that there was a Hospital Compare website. So the notion was A) to make the hospital compare data downloadable and B) to put it in API form, which we actually did a year and a half ago at Medicare.gov.

That makes the data much easier for lots of other platforms to incorporate, platforms that are far more likely than HospitalCompare.gov to be able to present the information in actionable forms for citizens. Even if we build our own apps, we have to get this data out to lots of other people who can help people with it. To do that, we have to make it machine-readable, we have to put it into RESTful APIs — or at least make it downloadable — and get the word out to entrepreneurs that it's something they can use.

This is a stunning arbitrage opportunity. Even if you take all this data and you "API-ize" it, it's not automatic that entrepreneurs are going to know it's there.

Let's assume that the hospital quality data is good — which it is — and that you build it and put it into an API. If nobody knows about it, you've delivered no value to the American people. People don't care whether you API a bunch of data. What they care about is that when they need to find a hospital, as I did for my baby, they can get that information.

The private sector, in the places where we have pushed the pedal to the metal on this, has demonstrated an incredible ability to make this data a lot more relevant and to help a lot more people with it than we could have by ourselves.

White House photo used on associated home and category pages: white house by dcJohn, on Flickr

May 24 2012

Strata Week: Visualizing a better life

Here are a few of the data stories that caught my attention this week:

Visualizing a better life

How do you compare the quality of life in different countries? As The Guardian's Simon Rogers points out, GDP has commonly been the indicator used to show a country's economic strength, but it's insufficient for comparing the quality of life and happiness of people.

To help build a better picture of what quality of life means to people, the Organization for Economic Cooperation and Development (OECD) built the Your Better Life Index. The index lets people select the things that matter to them: housing, income, jobs, community, education, environment, governance, health, life satisfaction, safety and work-life balance. The OECD launched the tool last year and offered an update this week, adding data on gender and inequality.

Screenshot from OECD's Your Better Life Index.

"It's counted as a major success by the OECD," writes Rogers, "particularly as users consistently rank quality of life indicators such as education, environment, governance, health, life satisfaction, safety and work-life balance above more traditional ones. Designed by Moritz Stefaner and Raureif, it's also rather beautiful."

The countries that come out on top most often based on users' rankings: "Denmark (life satisfaction and work-life balance), Switzerland (health and jobs), Finland (education), Japan (safety), Sweden (environment), and the USA (income)."
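Under the hood, "select the things that matter to you" is essentially a weighted average: each indicator is normalized to a common scale, then combined using the user's chosen weights. Here is a toy sketch with invented scores; the real index normalizes actual OECD indicator data to a 0-10 scale before weighting.

    # Toy illustration of a user-weighted well-being index. The scores below
    # are invented for illustration; the OECD normalizes real indicator data
    # to a 0-10 scale before weighting.
    scores = {
        "Denmark":     {"work_life_balance": 9.0, "income": 5.5, "safety": 8.5},
        "Switzerland": {"work_life_balance": 7.5, "income": 7.8, "safety": 9.0},
        "USA":         {"work_life_balance": 5.5, "income": 9.5, "safety": 7.0},
    }

    # A user who cares most about work-life balance, somewhat about safety.
    weights = {"work_life_balance": 5, "income": 1, "safety": 2}
    total_weight = sum(weights.values())

    def index(country_scores):
        return sum(country_scores[k] * w for k, w in weights.items()) / total_weight

    # Rank countries by this user's personalized index.
    for country in sorted(scores, key=lambda c: -index(scores[c])):
        print(country, round(index(scores[country]), 2))

Change the weights and the ranking changes, which is exactly why users' preferences reorder the countries in the OECD tool.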

Researchers' access to data

The New York Times' John Markoff examines social science research and the growing problem of datasets that are not made available to other scholars. Opening data helps make sure that research results can be verified. But Markoff suggests that in many cases, data is being kept private and proprietary.

Much of the data he's talking about here is:

"... gathered by researchers at companies like Facebook, Google and Microsoft from patterns of cellphone calls, text messages and Internet clicks by millions of users around the world. Companies often refuse to make such information public, sometimes for competitive reasons and sometimes to protect customers' privacy. But to many scientists, the practice is an invitation to bad science, secrecy and even potential fraud."

"The debate will only intensify as large companies with deep pockets do more research about their users," Markoff predicts.

Updates to Hadoop

Apache has released the alpha version of Hadoop 2.0.0. We should stress "alpha" here, and as Hortonworks' Arun Murthy notes, it's "not ready to run in production." However, he adds the update "is still an important step forward, as it represents the very first release that delivers new and important capabilities," including: HDFS HA (manual failover) and next generation MapReduce.

In other Hadoop news, MapR has unveiled a series of new features and initiatives for its Hadoop distribution, including release of a fully compliant ODBC 3.52 driver, support for the Linux Pluggable Authentication Modules (PAM), and the availability of the source code for several of its components.

Have data news to share?

Feel free to email me.

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR




Knight Foundation grants $2 million for data journalism research

Every day, the public hears more about technology and media entrepreneurs, from when they start in garages and dorm rooms all the way up until they go public, get acquired or go spectacularly bust. The way the world mourned the passing of Steve Jobs last year and the way young people now look to Mark Zuckerberg as a model for what's possible offer some insight into that dynamic.

For those who want to follow in their footsteps, the most interesting elements of those stories will be the muddy details of who came up with the idea, who wrote the first lines of code, who funded them, how they were mentored and then how the startup executed upon their ideas.

Today, foundations and institutions alike are getting involved in the startup ecosystem, but with a different hook than the venture capitalists on Sand Hill Road in California or Y Combinator: They're looking for smart, ambitious social entrepreneurs who want to start civic startups and increase the social capital of the world. From the Code for America Civic Accelerator to the Omidyar Foundation to Google.org to the Knight Foundation's News Challenge, there's more access to seed capital than ever before.

There are many reasons to watch what the Knight Foundation is doing, in particular, as it shifts how it funds digital journalism projects. The foundation's grants are going toward supporting many elements of the broader open government movement, from civic media to government transparency projects to data journalism platforms.

Many of these projects — or elements and code from them — have a chance at becoming part of the plumbing of digital democracy in the 21st century, although we're still on the first steps of the long road of that development.

This model for catalyzing civic innovation in the public interest is, in the broader sweep of history, still relatively new. (Then again, so is the medium you're reading this post on.) One barrier that the Internet has helped lower is the process of discovering and selecting good ideas to fund while letting bad ideas fall by the wayside. Another is changing how ideas are capitalized, whether through microfunding approaches or through crowdfunding platforms like Kickstarter, which distribute opportunities to participate in helping products or services go to market.

When the Pebble smartwatch received $10 million through Kickstarter this year, it offered a notable data point into how this model could work. We'll see how others follow.

These models could contribute to the development of small pieces of civic architecture around the world, loosely joining networks in civil society with mobile technology, lightweight programming languages and open data.

After years of watching how the winners of the Knight News Challenges have — or have not — contributed to this potential future, its architects are looking at big questions: How should resources be allocated in newsrooms? What should be measured? Are governments more transparent and accountable due to the use of public data by journalists? What data is available? What isn't? What's useful and relevant to the lives of citizens? How can data visualization, news applications and interactive maps inform and engage readers?

In the context of these questions, the fact that the next Knight News Challenge will focus on data will create important new opportunities to augment the practice of journalism and accelerate the pace of open government. John Bracken (@jsb), the Knight Foundation's program director for journalism and media innovation, offered an explanation for this focus on the foundation's blog:

"Knight News Challenge: Data is a call for making sense of this onslaught of information. 'As data sits teetering between opportunity and crisis, we need people who can shift the scales and transform data into real assets,' wrote Roger Ehrenberg earlier this year.

"Or, as danah boyd has put it, 'Data is cheap, but making sense of it is not.'

"The CIA, the NBA's Houston Rockets, startups like BrightTag and Personal ('every detail of your life is data') — they're all trying to make sense out of data. We hope that this News Challenge will uncover similar innovators discovering ways for applying data towards informing citizens and communities."

Regardless of what happens with this News Challenge, some of those big data questions stand a much better chance of being answered because of the Knight Foundation's $2 million grant to Columbia University to research and distribute best practices for digital reporting, data visualizations and measuring impact.

Earlier this spring, I spoke with Emily Bell, the director of the Tow Center for Digital Journalism, about how this data journalism research at Columbia will close the data science "skills gap" in newsrooms. Bell is now entrusted with creating the architecture for learning that will teach the next generation of data journalists at Columbia University.

In search of the reasoning behind the grant, I talked to Michael Maness (@MichaelManess), vice president of journalism and media innovations at the Knight Foundation. Our interview, lightly edited for content and clarity, follows.

The last time I checked, you're in charge of funding ideas that will make the world better through journalism and technology. Is that about right?

Michael Maness: That's the hope. What we're trying to do is make sure that we're accelerating innovation in the journalism and media space that continues to help inform and engage communities. We think that's vital for democracy. What I do is work on those issues and fund ideas around that to not only make it easier for journalists to do their work, but citizens to engage in that same practice.

The Knight News Challenge has changed a bit over the last couple of years. How has the new process been going?

Michael Maness: I've been in the job a little bit more than a year. I came in at the tail end of 2011 and the News Challenge of 2011. We had some great winners, but we noticed that the amount of time from when you applied to the News Challenge to when you were funded could be up to 10 months by the time everything was done, and certainly eight months in terms of the process. So we reduced that to about 10 weeks. It's intense for the judges to do that, but we wanted to move more quickly, recognizing the speed of disruption and the energy of innovation and how fast it's moving.

We've also switched to a thematic approach. We're going to do three [themes] this year. The point is to fund, as fast as possible, the ideas that we think are interesting and that we think will have a big impact.

This last round was around networks. One reason we focused on networks is the apparent rise of network power. The second reason is that we get people who say, for example, "This is the new Twitter for X" or "This is the new Facebook for journalists." Our point is that, actually, you should be using and leveraging existing networks for that.

We found when we looked back at the last five years of the News Challenge that people who came in with networks or built networks in accordance with what they're doing had a higher and faster scaling rate. We want to start targeting areas to do that, too.

We hear a lot about entrepreneurs, young people and the technology itself, but schools and libraries seem really important to me. How will existing institutions be part of the future that you're funding and building?

Michael Maness: One of the things that we're doing is moving into more "prototyping" types of grants and then finding ways of scaling those out, helping get ideas into a proof-of-concept phase so users kick the tires and look for scaling afterward.

In terms of the institutions, one of the things that we've seen that's been a bit of a frustration point is making sure that when we have innovations, [we're] finding the best ways to parlay those into absorption in these kinds of institutions.

A really good standout for that, from a couple years ago as a News Challenge winner, is DocumentCloud, which has been adopted by a lot of the larger legacy media institutions. From a university standpoint, we know one of the things that is key is getting involvement with students as practitioners. They're trying these things out and they're doing the two kinds of modeling that we're talking about. They're using the newest tools in the curriculum.

That's one of the reasons we made the grant [to Columbia.] They have a good track record. The other reason is that you have a real practitioner there with Emily Bell, doing all of her digital work from The Guardian and really knowing how to implement understandings and new ways of reporting. She's been vital. We see her as someone who has lived in an actual newsroom, pulling in those digital projects and finding new ways for journalists to implement them.

The other aspect is that there are just a lot of unknowns in this space. As we move forward, using these new tools for data visualization, for database reporting, what are the things that work? What are the things that are hard to do? What are the ideas that make the most impact? What efficiencies can we find to help newsrooms do it? We didn't really have a great body of knowledge around that, and that's one of the things that's really exciting about the project at Columbia.

How will you make sure the results of the research go beyond Columbia's ivy-covered walls?

Michael Maness: That was a big thing that we talked about, too, because it's not in us to do a lot of white papers around something like this. It doesn't really disseminate. A lot of this grant is around making sure that there are convocations.

We talk a lot about the creation of content objects. If you're studying data visualization, we should be making sure that we're producing that as well. This will be something that's ongoing and emerging. Definitely, a part of it is that some of these resources will go to hold gatherings, to send people out from Columbia to disseminate [research] and also to produce findings in a way that can be moved very easily around a digital ecosystem.

We want to make sure that you're running into this work a lot. This is something that we've baked into the grant, and we're going to be experimenting with, I think, as it moves forward. But I hear you, that if we did all of this — and it got captured behind ivy walls — it's not beneficial to the industry.


May 22 2012

Data journalism research at Columbia aims to close data science skills gap

Successfully applying data science to the practice of journalism requires more than providing context and finding clarity in vast amounts of unstructured data: it will require media organizations to think differently about how they work and whom they venerate. It will mean evolving toward a multidisciplinary approach to delivering stories, where reporters, videographers, news application developers, interactive designers, editors and community moderators collaborate on storytelling, instead of being segregated by departments or buildings.

The role models for this emerging practice of data journalism won't be found on broadcast television or on lists of the top journalists of the past century. They're drawn from the growing pool of people who are building new breeds of newsrooms and extending the practice of computational journalism. They see the reporting that provisions their journalism as data, a body of work that can itself be collected, analyzed, shared and used to create longitudinal insights about the ways that society, industry or government are changing. (Or not, as the case may be.)

In a recent interview, Emily Bell (@EmilyBell), director of the Tow Center for Digital Journalism at the Columbia University School of Journalism, offered her perspective on what's needed to train the data journalists of the future and the changes that still need to occur in media organizations to maximize their potential. In this context, while the roles of institutions and journalism education are themselves evolving, both will still fundamentally matter for "what's next," as practitioners adapt to changing newsonomics.

Our discussion took place in the context of a notable investment in the future of data journalism: a $2 million research grant to Columbia University from the Knight Foundation to research and distribute best practices for digital reportage, data visualizations and measuring impact. Bell explained more about how the research effort will help newsrooms determine what's next on the Knight Foundation's blog:

The knowledge gap that exists between the cutting edge of data science, how information spreads, its effects on people who consume information and the average newsroom is wide. We want to encourage those with the skills in these fields and an interest and knowledge in journalism to produce research projects and ideas that will both help explain this world and also provide guidance for journalism in the tricky area of ‘what next’. It is an aim to produce work which is widely accessible and immediately relevant to both those producing journalism and also those learning the skills of journalism.

We are focusing on funding research projects which relate to the transparency of public information and its intersection with journalism, research into what might broadly be termed data journalism, and the third area of ‘impact’ or, more simply put, what works and what doesn’t.

Our interview, lightly edited for content and clarity, follows.

What did you do before you became director of the Tow Center for Digital Journalism?

I spent ten years as editor-in-chief of The Guardian website. During the last four of those, I was also overall director of digital content for all The Guardian properties. That included things like mobile applications, et cetera, but from the editorial side.

Over the course of that decade, you saw one or two things change online, in terms of what journalists could do, the tools available to them and the news consumption habits of people. You also saw the media industry change, in terms of the business models and institutions that support journalism as we think of it. What are the biggest challenges and opportunities for the future journalism?

For newspapers, there was an early warning system: newspaper circulation has not really risen consistently since the early 1980s. We had a long trajectory of increased production and, actually, an overall systemic decline, which was masked by a very, very healthy advertising market that went on an incredible bull run against a more static picture and just "widened the pipe," which I think fooled a lot of journalism outlets and publishers into thinking that that was the real disruption.

And, of course, it wasn’t.

The real disruption was the ability of anybody anywhere to upload multimedia content and share it with anybody else who was on a connected device. That was the thing that really hit hard, when you look at 2004 onwards.

What journalism has to do is reinvent its processes, its business models and its skillsets to function in a world where human capital does not scale well, in terms of sifting, presenting and explaining all of this information. That’s really the key to it.

The skills that journalists need to do that -- including identifying a story, knowing why something is important and putting it in context -- are incredibly important. But how you do that, which particular elements you now use to tell that story are changing.

Those now include the skills of understanding the platform that you’re operating on and the technologies which are shaping your audiences’ behaviors and the world of data.

By data, I don’t just mean large caches of numbers you might be given or might be released by institutions: I mean that the data thrown off by all of our activity, all the time, is simply transforming the speed and the scope of what can be explained and reported on and identified as stories at a really astonishing speed. If you don’t have the fundamental tools to understand why that change is important and you don’t have the tools to help you interpret and get those stories out to a wide public, then you’re going to struggle to be a sustainable journalist.

The challenge for sustainable journalism going forward is not so different from what exists in other industries: there's a skills gap. Data scientists and data journalists use almost the exact same tools. What are the tools and skills that are needed to make sense of all of this data that you talked about? What will you do to catalog and educate students about them?

It's interesting when you say that the skills of these disciplines are very similar, which is absolutely right. First of all, you need a basic level of numeracy - and maybe not just a basic level, but a more sophisticated understanding of statistical analysis. That's not something which is routinely taught in journalism schools but that I think will increasingly have to be.

The second thing is having some coding skills or some computer science understanding to help with identifying the best, most efficient tools and the various ways that data is manipulated.

The third thing is that when you're talking about 'data scientists,' it's really a combination of those skills. Adding data doesn't mean you don't have to have the other journalism skills, which do not change: understanding context, understanding what the story might be, and knowing how to derive that from the data that you're given or the data that exists. Put straightforwardly: how do you collect it? How do you analyze it? How do you interpret it and present it?

It’s easy to say, but it’s difficult to do. It’s particularly difficult to reorient the skillsets of an industry which have very much resided around the idea of a written story and an ability with editing. Even in the places where I would say there’s sophisticated use of data in journalism, it’s still a minority sport.

I’ve talked to several heads of data in large news organizations and they’ve said, “We have this huge skills gap because we can find plenty of people who can do the math; we can find plenty of people who are data scientists; we can’t find enough people who have those skills but also have a passion or an interest in telling stories in a journalistic context and making those relatable.”

You need a mindset which is about putting this in the context of the story and spotting stories, as well having creative and interesting ideas about how you can actually collect this material for your own stories. It’s not a passive kind of processing function if you’re a data journalist: it’s an active speaking, inquiring and discovery process. I think that that’s something which is actually available to all journalists.

Think about just local information and how local reporters go out and speak to people every day on the beat, collect information, et cetera. At the moment, most don't structure the information they get from those entities in a way that will help them find patterns and build new stories in the future.

This is not just about an amazing graphic that the New York Times does with census data over the past 150 years. This is about almost every story. Almost every story has some component of reusability or a component where you can collect the data in a way that helps your reporting in the future.

To do that requires a level of knowledge about the tools that you’re using, like coding, Google Refine or Fusion Tables. There are lots of freely available tools out there that are making this easier. But, if you don’t have the mindset that approaches, understands and knows why this is going to help you and make you a better reporter, then it’s sometimes hard to motivate journalists to see why they might want to grab on.

The other thing to say, which is really important, is there is currently a lack of both jobs and role models for people to point to and say, “I want to be that person.”

I think the final thing I would say to the industry is we’re getting a lot of smart journalists now. We are one of the schools where all of our digital concentrations from students this year include a basic grounding in data journalism. Every single one of them. We have an advanced course taught by Susan McGregor in data visualization. But we’re producing people from the school now, who are being hired to do these jobs, and the people who are hiring them are saying, “Write your own job description because we know we want you to do something, we just don’t quite know what it is. Can you tell us?”

You can't cookie-cutter these people out of schools and drop them into existing roles in newsrooms, because those roles are still developing. What we're seeing are some very smart reporters with data-centric mindsets and also the ability to do these stories -- but they want to be out reporting. They don't want to be confined to a desk and a spreadsheet. Some editors find that very hard to understand: "Well, what does that job look like?"

I think that this is where working with the industry, we can start to figure some of these things out, produce some experimental work or stories, and do some of the thinking in the classroom that helps people figure out what this whole new world is going to look like.

What do journalism schools need to do to close this 'skills gap?' How do they need to respond to changing business models? What combination of education, training and hands-on experience must they provide?

One of the first things they need to do is identify the problem clearly and be honest about it. I like to think that we’ve done that at Columbia, although I’m not a data journalist. I don’t have a background in it. I’m a writer. I am, if you like, completely the old school.

But one of the things I did do at The Guardian was help the people who said to me, early on, "Some of this transformation means that we have to think about data as being a core part of what we do." Because of the political context and the position I was in, I was able to recognize that that was an important thing they were saying, and we could push through changes and adoption in those areas of the newsroom.

That’s how The Guardian became interested in data. It’s the same in journalism school. One of the early things that we talked about [at Columbia] was how we needed to shift some of what the school did on its axis and acknowledge that this was going to be key part of what we do in the future. Once we acknowledged that that is something we had to work towards, [we hired] Susan McGregor from the Wall Street Journal’s Interactive Team. She’s an expert in data journalism and has an MA in technology in education.

If you say to me, “Well, what’s the ground vision here?” I would say the same thing I would say to anybody: over time, and hopefully not too long a course of time, we want to attract a type of student that is interested and capable in this approach. That means getting out and motivating and talking to people. It means producing attractive examples which high school children and undergraduate programs think about [in their studies]. It means talking to the CS [computer science] programs -- and, in fact, more about talking to those programs and math majors than you would be talking to the liberal arts professors or the historians or the lawyers or the people who have traditionally been involved.

I think that has an effect: it starts to show people who are oriented towards storytelling, but whose capabilities align more with data science skill sets, that there's a real task for them. We can't message that early enough as an industry. We can't message it early enough as educators to get people into those tracks. We have to really make sure that the teaching is high quality and that we're not just carried away with the idea of the new thing; we need to think pretty deeply about how we get those skills.

What sort of basic sort of statistical teaching do you need? What are the skills you need for data visualization? How do you need to introduce design as well as computer science skills into the classroom, in a way which makes sense for stories? How do you tier that understanding?

You're always going to produce superstars. Hopefully, we’ll be producing superstars in this arena soon as well.

We need to take the mission seriously. Then we need to build resources around it. And that’s difficult for educational organizations because it takes time to introduce new courses. It takes time to signal that this is something you think is important.

I think we’ve done a reasonable job of that so far at Columbia, but we’ve got a lot further to go. It's important that institutions like Columbia do take the lead and demonstrate that we think this is something that has to be a core curriculum component.

That’s hard, because journalism schools are known for producing writers. They’re known for different types of narratives. They are not necessarily lauded for producing math or computer science majors. That has to change.


May 16 2012

How to start a successful business in health care at Health 2.0 conference

Great piles of cash are descending on entrepreneurs who develop health care apps, but that doesn't make it any easier to create a useful one that your audience will adopt. Furthermore, lowered costs and streamlined application development techniques let you fashion a working prototype faster than ever, but that also reduces the time you can spend fumbling around looking for a business model. These were some of the insights I took away from Spring Fling 2012: Matchpoint Boston, put on by Health 2.0 this week.

This conference was a bit of a grab-bag, including one-on-one meetings between entrepreneurs and their potential funders and customers, keynotes and panels by health care experts, round-table discussions among peers, and lightning-talk demos. I think the hallway track was the most potent part of this conference, and it was probably planned that way. The variety at the conference mirrors the work of Health 2.0 itself, which includes local chapters, challenges, an influential blog, and partnerships with a range of organizations. Overall, I appreciated the chance to get a snapshot of a critical industry searching for ways to make a positive difference in the world while capitalizing on ways to cut down on the blatant waste and mismanagement that bedevil the multi-trillion-dollar health care field.

Let's look, for instance, at the benefits of faster development time. Health IT companies go through fairly standard early stages (idea, prototype, incubator, venture capital funding) but cochairs Indu Subaiya and Matthew Holt showed slides demonstrating that modern techniques can leave companies in the red for less time and accelerate earnings. On the other hand, Jonathan Bush of athenahealth gave a keynote listing bits of advice for company founders and admitting that his own company had made significant errors that required time to recover from. Does the fast pace of modern development leave less room for company heads to make the inevitable mistakes?

I also heard Margaret Laws, director of the California HealthCare Foundation's Innovations Fund, warn that most of the current applications being developed for health care aim to salve common concerns among doctors or patients but don't address what she calls the "crisis points" in health care. Brad Fluegel of Health Evolution Partners observed that, with the flood of new entrepreneurs in health IT, a lot of old ideas are being recycled without adequate attention to why they failed before.

I'm afraid this blog is coming out too negative, focusing on the dour and the dire, but I do believe that health IT needs to acknowledge its risks in order to avoid squandering the money and attention it's getting, and on the positive side to reap the benefits of this incredibly fertile moment of possibilities in health care. Truly, there's a lot to celebrate in health IT as well. Here are some of the fascinating start-ups I saw at the show:

  • hellohealth aims at that vast area of health care planning and administration that cries out for efficiency improvements--the area where we could do the most good by cutting costs without cutting back on effective patient care. Presenter Shahid Shah described the company as the intersection of patient management with revenue cycle management. They plan to help physicians manage appointments and follow-ups better, and rationalize the whole patient experience.

  • hellohealth will offer portals for patients as well. They're unique, so far as I know, in charging patients for certain features.

  • Corey Booker demo'd onPulse, which aims to bring together doctors with groups of patients, and patients with groups of the doctors treating them. For instance, when a doctor finds an online article of interest to diabetics, she can share it with all the patients in her practice suffering from diabetes. onPulse also makes it easier for a doctor to draw in others who are treating the same patient. The information built up about their interactions can be preserved for billing.

    onPulse overlaps in several respects with HealthTap, a doctor-patient site that I've covered several times and for which an onPulse staffer expressed admiration. But HealthTap leaves discussions out in the open, whereas onPulse connects doctors and patients in private.

  • HealthPasskey.com is another one of these patient/doctor services with a patient portal. It allows doctors to upload continuity of care documents in the standard CCD format to the patient's site, and supports various services such as making appointments.

    A couple of weeks ago I reported on a controversy over hospitals' claims that they couldn't share patient records with the patients themselves. Check out the innovative services I've just highlighted here as a context for judging whether the technical and legal challenges for hospitals are really so daunting. I recognize that each of the sites I've described picks off a particular piece of the EHR problem, and that opening up the whole kit and caboodle is a larger task, but these sites still prove that all the capabilities are in place for institutions willing to exploit them.

  • GlobalMed has recently released a suitcase-sized box that contains all the tools required to do a standard medical exam. This allows traveling nurse practitioners or other licensed personnel to do a quick check-up at a patient's location without requiring a doctor or a trip to the clinic. Images can also be taken. Everything gets uploaded to a site where a doctor can do an assessment and mark up records later. The suitcase weighs about 30 pounds, rolls on wheels, and costs about $30,000 (price to come down if they start manufacturing in high quantities).

  • SwipeSense won Health 2.0's 100 Day Innovation Challenge. They make a simple device that hospital staff can wear on their belts and wipe their hands on. This may not be as good as washing your hands, but takes advantage of people's natural behavior and reduces the chance of infections. It also picks up when someone is using the device and creates reports about compliance. SwipeSense is being tested at the Rush University Medical Center.

  • Thryve, one of several apps that helps you track your food intake and make better choices, won the highest audience approval at Thursday's Launch! demos.

  • Winner of last weekend's developer challenge was No Sleep Kills, an app that aims to reduce accidents related to sleep deprivation (I need a corresponding app to guard against errors from sleep-deprived blogging). You can enter information on your recent sleep patterns and get back a warning not to drive.

It's worth noting that the last item in that list, No Sleep Kills, draws information from Health and Human Services's Healthy People site. This raises the final issue I want to bring up in regard to the Spring Fling. Sophisticated developers know their work depends heavily on data about public health and on groups of patients. HHS has actually just released another major trove of public health statistics. Our collective knowledge of who needs help, what works, and who best delivers the care would be immensely enhanced if doctors and institutions who currently guard their data would be willing to open it up in aggregate, non-identifiable form. I recently promoted this ideal in coverage of Sage Congress.

In the entirely laudable drive to monetize improvements in health care, I would like the health IT field to choose solutions that open up data rather than keep it proprietary. One of the biggest problems with health care, in this age of big data and incredibly sophisticated statistical tools, is our tragedy of the anti-commons where each institution seeks to gain competitive advantage through hoarding its data. They don't necessarily use their own data in socially beneficial ways, either (they're more interested in ratcheting up opportunities for marketing expensive care). We need collective sources of data in order to make the most of innovation.

OSCON 2012 Healthcare Track — The conjunction of open source and open data with health technology promises to improve creaking infrastructure and give greater control and engagement to patients. Learn more at OSCON 2012, being held July 16-20 in Portland, Oregon.

Save 20% on registration with the code RADAR20

May 03 2012

Strata Week: Google offers big data analytics

Here are the data stories that caught my attention this week.

BigQuery for everyone

Google has released its big data analytics service BigQuery to the public. Initially made available to a small number of developers late last year, the service is now open for anyone to sign up. A free account lets you query up to 100 GB of data per month, with the option to pay for additional queries and/or storage.

"Google's aim may be to sell data storage in the cloud, as much as it is to sell analytic software," says The New York Times' Quentin Hardy. "A company using BigQuery has to have data stored in the cloud data system, which costs 12 cents a gigabyte a month, for up to two terabytes, or 2,000 gigabytes. Above that, prices are negotiated with Google. BigQuery analysis costs 3.5 cents a gigabyte of data processed."

The interface for BigQuery is meant to lower the bar for these sorts of analytics — there's a UI and a REST interface. In the Times article, Google project manager Ju-kay Kwek says Google is hoping developers build tools that encourage widespread use of the product by executives and other non-developers.

If folks are looking for something to cut their teeth on with BigQuery, GitHub's public timeline is now a publicly available dataset. The data is being synced regularly, so you can query things like popular languages and popular repos. To that end, GitHub is running a data visualization contest.
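A natural first query over the timeline is counting new repositories by language. Below is a minimal sketch using the Python BigQuery client; the table and column names are placeholders, since the public GitHub dataset has been renamed and reshaped over time, so treat them as assumptions rather than the actual schema.

    # A hedged sketch of running a BigQuery query over a public GitHub dataset.
    # The table name and columns are placeholders; the public GitHub timeline
    # data has changed shape and name over time.
    from google.cloud import bigquery

    client = bigquery.Client()  # uses your default project credentials

    query = """
        SELECT repository_language AS language, COUNT(*) AS new_repos
        FROM `githubarchive.timeline`          -- placeholder table name
        WHERE type = 'CreateEvent'
        GROUP BY language
        ORDER BY new_repos DESC
        LIMIT 10
    """

    # Submit the query and print the ten most popular languages for new repos.
    for row in client.query(query).result():
        print(row["language"], row["new_repos"])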

The Data Journalism Handbook

The Data Journalism Handbook had its release this week at the 2012 International Journalism Festival in Italy. The book, which is freely available and openly licensed, was a joint effort of the European Journalism Centre and the Open Knowledge Foundation. It's meant to serve as a reference for those interested in the field of data journalism.

In the introduction, Deutsche Welle's Mirko Lorenz writes:

"Today, news stories are flowing in as they happen, from multiple sources, eye-witnesses, blogs, and what has happened is filtered through a vast network of social connections, being ranked, commented and more often than not, ignored. This is why data journalism is so important. Gathering, filtering and visualizing what is happening beyond what the eye can see has a growing value."


Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.



Save 20% on registration with the code RADAR20

Open data is a joke?

Tom Slee fired a shot across the bow of the open data movement with a post this week arguing that "the open data movement is a joke." Moreover, it's not a movement at all, he contends. Slee turns a critical eye to the Canadian government's open data efforts in particular, noting that: "The Harper government's actions around 'open government,' and the lack of any significant consequences for those actions, show just how empty the word 'open' has become."

Slee is also critical of open data efforts outside the government, calling the open data movement "a phrase dragged out by media-oriented personalities to cloak a private-sector initiative in the mantle of progressive politics."

Open data activist David Eaves responded strongly to Slee's post with one of his own, recognizing his own frustrations with "one of the most — if not the most — closed and controlling [governments] in Canada's history." But Eaves takes exception with the ways in which Slee characterizes the open data movement. He contends that many of the corporations involved with the open data movement — something Slee charges has corrupted open data — are U.S. corporations (and points out that in Canada, "most companies don't even know what open data is"). Eaves adds, too, that many of these corporations are led by geeks.

Eaves writes:

"Just as an authoritarian regime can run on open-source software, so too might it engage in open data. Open data is not the solution for Open Government (I don't believe there is a single solution, or that Open Government is an achievable state of being — just a goal to pursue consistently), and I don't believe anyone has made the case that it is. I know I haven't. But I do believe open data can help. Like many others, I believe access to government information can lead to better informed public policy debates and hopefully some improved services for citizens (such as access to transit information). I'm not deluded into thinking that open data is going to provide a steady stream of obvious 'gotcha moments' where government malfeasance is discovered, but I am hopeful that government data can arm citizens with information that the government is using to inform its decisions so that they can better challenge, and ultimately help hold accountable, said government."

Got data news?

Feel free to email me.


April 19 2012

Strata Week: The rise of the robot essay graders

Here are a few of the data stories that caught my attention this week.

Automated essay-scoring software scores as well as humans

Robot essay graders: They grade the same as humans. That's the conclusion of a study conducted by the University of Akron's Dean of the College of Education Mark Shermis and Kaggle data scientist Ben Hamner. The researchers examined some 22,000 essays administered to junior high and high school students as part of their states' standardized testing processes, comparing the grades given by human graders with those given by automated grading software. They found that "overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre" (PDF of the report).

"The demonstration showed conclusively that automated essay scoring systems are fast, accurate, and cost effective," says Tom Vander Ark, managing partner at the investment firm Learn Capital, in a press release touting the study's results.

The study coincides with an active competition hosted on Kaggle and sponsored by the Hewlett Foundation, in which data scientists are challenged with developing the best algorithm to automatically grade student essays. "Better tests support better learning," noted the foundation's Education Program Director Barbara Chow in the press release. "This demonstration of rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments. And, the more we can use essays to assess what students have learned, the greater the likelihood they'll master important academic content, critical thinking, and effective communication."
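To give a flavor of what entrants to such a competition build, a common baseline vectorizes the essay text and fits a regression against the human-assigned scores. The sketch below is a toy baseline only, with placeholder essays; it is nowhere near the feature-rich systems described in the study or built for the competition.

    # A toy baseline for automated essay scoring: bag-of-words features plus
    # ridge regression against human-assigned grades. Real systems add far
    # richer features (grammar, discourse, prompt-specific signals).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge

    train_essays = ["The experiment shows that ...", "I think plants need ...", "Water is ..."]
    train_scores = [4, 2, 3]  # grades assigned by human raters

    vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
    features = vectorizer.fit_transform(train_essays)

    model = Ridge(alpha=1.0).fit(features, train_scores)

    # Score a new, unseen essay; round/clamp to the rubric's scale in practice.
    new_essay = ["Evaporation happens faster when ..."]
    predicted = model.predict(vectorizer.transform(new_essay))
    print(round(float(predicted[0])))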

Personally, I like writing for a human audience. Bots leave really stupid blog comments — but I bet there's an algorithm for that too.

Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.

Save 20% on registration with the code RADAR20

Scaling Instagram

The billion-dollar acquisition of the mobile photo-sharing app Instagram was big news last week. The news coincided with a presentation by co-founder Mike Krieger at an AirBnB Tech Talk about how the startup managed to scale to 30 million users worldwide with a small team of back-end developers (a very small team, in fact). Krieger's presentation is interesting in its own right, of course, but news of the acquisition by Facebook certainly fueled interest — in the deal and in the tech under the Instagram hood.

Krieger's slides can be found here. The presentation details some of the early and ongoing challenges of handling the app's growing number of users and their photos (including the recent roll-out of an Android app, which added another million new users in just 12 hours). Although Instagram hasn't suffered any major outages like those seen at Twitter and Tumblr, Krieger does note a number of early problems, including a missing favicon.ico that was causing a lot of 404 errors in Django.
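That favicon detail is a reminder of how mundane early scaling problems can be: browsers request /favicon.ico on every page load, and if no URL pattern answers it, each miss becomes a logged 404 doing useless work. A minimal Django-style fix is a redirect to a statically served file; this is a sketch, with illustrative paths, and the URL-configuration API differs across Django versions.

    # urls.py -- a minimal sketch of silencing favicon.ico 404s in Django by
    # redirecting the request to a statically served file. Paths are
    # illustrative; adjust to your project's static file setup.
    from django.urls import path
    from django.views.generic import RedirectView

    urlpatterns = [
        path("favicon.ico", RedirectView.as_view(url="/static/favicon.ico", permanent=True)),
        # ... the rest of your routes
    ]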

Auditing data.gov.uk

The UK's National Audit Office has just released its look at the government's open data efforts, reports The Guardian. Although the open data initiative gets good marks for the "tsunami of data" it's released — 8,300 datasets — there remain questions about cost and usage.

Governmental departments estimate they spend between £53,000 and £500,000 each year on publishing the data, with the police crime maps, for example, costing £300,000 to set up and £150,000 per year to maintain. And it's not clear that the data is in demand, according to the National Audit Office report: "None of the departments reported significant spontaneous public demand for the standard dataset releases." This doesn't account for the ways in which third-party vendors may be using the data, however.

Big Data Week

April 23-29 is "Big Data Week," an event created by DataSift that will feature meetups and hackathons in several cities around the world. Big Data Week aims to bring together the "core communities" — data scientists, data technologies, data visualization, and data business. A list of events is available on the Big Data Week website.

Got data news?

Feel free to email me.

Photo: Taking a test at the Real Estate Investing College by Casey Serin, on Flickr

April 18 2012

What responsibilities and challenges come with open government?

A historic Open Government Partnership launched in New York City last September with eight founding countries. Months later, representatives from 73 countries and 55 governments have come together to present their open government action plans and formally endorse the principles of the Open Government Partnership. Yesterday, hundreds of attendees from government, civil society, media and the private sector watched in person and online as Brazilian President Dilma Rousseff spoke about her country's efforts to root out corruption and engage the Brazilian people in governance and more active citizenship. United States Secretary of State Hillary Clinton preceded her, describing the divide between open and closed societies as a key dividing line of the 21st century.

Today's agenda includes more regional breakout