
August 24 2012

The Direct Project has teeth, but it needs pseudonymity

Yesterday, Meaningful Use Stage 2 was released.

You can read the final rule here and you can read the announcement here.

As we read and parse the 900 or so pages of government-issued goodness, you can expect lots of commentary and discussion. Geek Doctor already has a summary and Motorcycle Guy can be expected to help us all parse the various health IT standards that have been newly blessed. Expect Brian Ahier to also be worth reading over the next couple of days.

I just wanted to highlight one thing about the newly released rules. As suspected, the actual use of the Direct Project will be a requirement. That means certified electronic health record (EHR) systems will have to implement it, and doctors and hospitals will have to exchange data with it. Awesome.

More importantly, this will be the first health IT interoperability standard with teeth. The National Institute of Standards and Technology (NIST) will be setting up an interoperability test server. It will not be enough to say that you support Direct. People will have to prove it. I love it. This has been the problem with Health Level 7 et al for years: without central testing, a standard is always unreliable and weak. Make no mistake, this is a critical and important move from the Office of the National Coordinator for Health Information Technology (ONC).

(Have I mentioned that I love that Farzad Mostashari, our current National Coordinator, uses Twitter? I also love that he has a sense of humor!)

Now we just need to make sure that patient pseudonymity is supported on the Directed Exchange network. To do otherwise is to force patients to trust the whole network rather than merely their own doctors. I have already made that case, but it is really nice to see that both Arien Malec (founding coordinator of the Direct Project) and Sean Nolan (chief architect at Microsoft HealthVault) have weighed in with similar thoughts. Malec wrote a lovely piece that details how to translate patient pseudonymity into NIST assurance levels. Nolan talked about how difficult it would be for HealthVault to have to do identity proofing on patients.

In order to emphasize my point in a more public way, I have beaten everyone to the punch and registered the account DaffyDuck@direct.healthvault.com. Everyone seems to think this is just the kind of madness that we need to avoid. But this is exactly the kind of madness that patients need to really protect their privacy.

Here’s an example. Let’s imagine that I am a pain patient seeking treatment from a pain specialist named Dr. John Doe, who works at the Pain No More clinic. His Direct address might be john.doe@direct.painnomore.com.

Now if I provide DaffyDuck@direct.healthvault.com to Dr. Doe and Dr. Doe can be sure that he is always talking to me when he communicates with that address, then there is nothing else that needs to happen here. There never needs to be a formal cryptographic association between DaffyDuck@direct.healthvault.com and Fred Trotter. I know that there is a connection and my doctor knows that there is a connection and those are the only people that need to know.

If any cryptographic or otherwise published association were to exist, then anyone who had access to my public certificates and/or knew of communication between john.doe@direct.painnomore.com and DaffyDuck@direct.healthvault.com could make a pretty good guess about my health care status. I am not actually interested in trusting the Directed Exchange network. I am interested in trusting through the Directed Exchange network. Pseudonymity gives both me and my doctor that privilege. If a patient wants to give a different Direct email address to every doctor they work with, they should have that option.
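
To make that concrete, here is a minimal sketch (my own illustration, not part of the Direct specification) of how a patient-side tool might mint a distinct pseudonymous Direct address for each provider and keep the mapping strictly local; the HISP domain and helper names are invented.

```python
import secrets

# Hypothetical sketch: a patient-side tool that mints a fresh pseudonymous
# Direct address for each provider and keeps the mapping strictly local.
# The domain and data structures are illustrative, not part of the Direct spec.

LOCAL_MAPPING = {}  # provider address -> pseudonymous address (never published)

def mint_pseudonym(provider_address, hisp_domain="direct.example-hisp.com"):
    """Create (or reuse) a random, unlinkable address for one provider."""
    if provider_address not in LOCAL_MAPPING:
        alias = "patient-" + secrets.token_hex(6)   # e.g. patient-9f2c1a7b3d4e
        LOCAL_MAPPING[provider_address] = f"{alias}@{hisp_domain}"
    return LOCAL_MAPPING[provider_address]

if __name__ == "__main__":
    pain_clinic = mint_pseudonym("john.doe@direct.painnomore.com")
    primary_care = mint_pseudonym("jane.roe@direct.familymed.example")
    print(pain_clinic)    # a different, unlinkable address per provider
    print(primary_care)
    # Only the patient and each individual doctor know which alias maps to whom;
    # nothing published cryptographically ties the aliases to the patient's identity.
```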

This is a critical patient privacy feature of the Direct protocol and it was designed in from the beginning. It is critical that later policy makers not screw this up.


August 23 2012

Balancing health privacy with innovation will rely on improving informed consent

Society is now faced with how to balance the privacy of the individual patient against the immense social good that could come from greater health data sharing. Making health data more open and fluid has the potential to be both hugely beneficial and enormously harmful to patients. As my colleague Alistair Croll put it this summer, big data may well be a civil rights issue that much of the world doesn’t know about yet.

This will likely be a tension that persists throughout my lifetime as technology spreads around the world. While big data breaches are likely to make headlines, more subtle uses of health data have the potential to enable employers, insurers or governments to discriminate — or worse. Figuring out shopping habits can also allow a company to determine that a teenager was pregnant before her father did. People simply don’t realize how much about their lives can be intuited through analysis of their data exhaust.

To unlock the potential of health data for the public good, informed consent must mean something. Patients must be given the information and context for how and why their health data will be used in clear, transparent ways. To do otherwise is to duck the responsibility that comes with the immense power of big data.

In search of an informed opinion on all of these issues, I called up Deven McGraw (@HealthPrivacy), the director of the Health Privacy Project at the Center for Democracy and Technology (CDT). Our interview, lightly edited for content and clarity, follows.

Should people feel better about, say, getting their genome decoded because the Patient Protection and Affordable Care Act (PPACA) was upheld by the Supreme Court? What about other health-data-based discrimination?

Deven McGraw: The reality that someone could get data and use it in a way that harms people, and the inability to get affordable health care insurance or to get insurance at all, has been a significant driver of the concerns people have about health data for a very long time.

It’s not the only driver of people’s privacy concerns. Just removing the capacity for entities to do harm to individuals using their health data is probably not going to fully resolve the problem.

It’s important to pursue from a policy standpoint, but it’s also the case that people feel stigmatized by their health data. They feel health care is something they want to be able to pursue privately, even if the chances are very low that anybody could get the information and actually harm them with it by denying them insurance or denying them employment, which is an area we actually haven’t fully fixed. Your ability to get life insurance or disability insurance was not fixed by the Affordable Care Act.

Even if you fix all of those issues, privacy protections are about building an ecosystem in health care that people will trust. When they need to seek care that might be deemed to be sensitive to them, they feel like they can go get care and have some degree of confidence that that information isn’t going to be shared outside of those who have a need to know it, like health care providers or their insurance company if they are seeking to be reimbursed for care.

Obviously, public health can play a role. The average individual doesn’t realize that, often, their health data is sent to public health authorities if they have certain conditions or diseases, or even just as a matter of routine reporting for surveillance purposes.

Some of this is about keeping a trustworthy environment for individuals so they can seek the care they need. That’s a key goal for privacy. The other aspect of it is making sure we have the data available for important public purposes, but in a way that respects the fact that this data is sensitive.

We need to not be disrupting the trust people have in the health care system. If you can’t give people some reasonable assurance about how their data is used, there are lots of folks who will decline to seek care or will lie about health conditions when truthfulness is important.

Are health care providers and services being honest about health data use?

Deven McGraw: Transparency and openness about how we use health data in this country is seriously lacking. Part of it is the challenge of being up front with people, disclosing things they need to know but not overwhelming them with so much information in a consent form that they just sign on the bottom and don’t read it and don’t fully understand it.

It’s really hard to get notice and transparency right, and it’s a constant struggle. The FTC report on privacy talks a lot about how hard it is to be transparent with people about data sharing on the Internet or data collection on your mobile phone.

Ideally, for people to be truly informed, you’d give them an exhaustive amount of information, right? But if you give them too much information, the chances that they’ll read it and understand it are really low. So then people end up saying “yes” to things they don’t even realize they’re saying “yes” to.

On the other hand, we haven’t put enough effort into trying different ways of educating people. We, for too long, have assumed that, in a regulatory regime that provides permissive data sharing within the health care context, people will just trust their doctors.

I’ve been to a seminar on researchers getting access to data. The response of one of the researchers to the issue of “How do you regulate data uses for research?” and “What’s the role of consent?” and “What’s the role of institutional review boards?” was, “Well, people should just trust researchers.”

Maybe some people trust researchers, but that’s not really good enough. You have to earn trust. There’s a lot of room for innovative thinking along those lines. It’s something I have been increasingly itchy to try to dive into in more detail with folks who have expertise in other disciplines, like sociology, anthropology and community-building. What does it take to build trusted infrastructures that are transparent, open and that people are comfortable participating in?

There’s no magic endpoint for privacy, like, “Oh, we have privacy now,” versus, “Oh, we don’t have privacy.” To me, the magic endpoint is whether we have a health care data ecosystem that most people trust. It’s not perfect, but it’s good enough. I don’t think we’re quite there yet.

What specifically needs to happen on the openness and transparency side?

Deven McGraw: When I hear about state-based or community-based health information exchanges (HIE) going out and having town meetings with people in advance of building the HIE, working with the physicians in their communities to make sure they’re having conversations with their patients about what’s happening in the community, the electronic records movement and the HIE they’re building, that’s exactly the kind of work you need to do. When I hear about initiatives where people have actually spent the time and resources to educate patients, it warms my heart.

Yes, it’s fairly time- and resource-intensive, but in my view, it pays huge dividends on the backend, in terms of the level of trust and buy-in the community has to what you’re doing. It’s not that big of a leap. If you live in a community where people tend to go to church on Sundays, reach out to the churches. Ask pastors if you can speak to their congregations. Or bring them along and have them speak to their own congregations. Do tailored outreach to people through vehicles they already trust.

I think a lot of folks are pressed for time and resources, and feeling like digitization of the health care system should have happened yesterday. People are dying from errors in care and not getting their care coordinated. All of that is true. But this is a huge change in health care, and we have to do the hard work of outreach and engagement of patients in the community to do it right. In many ways, it’s a community-by-community effort. We’re not one great ad campaign away from solving the issue.

Is there mistrust for good reason? There have been many years of data breaches, coupled with new fears sparked by hacks enabled by electronic health record (EHR) adoption.

Deven McGraw: Part of it is when one organization has a breach, it’s like they all did. There is a collective sense that the health care industry, overall, doesn’t have its act together. It can’t quite figure out how to do electronic records right when we have breach after breach after breach. If breaches were rare, that would be one thing, but they’re still far too frequent. Institutions aren’t taking the basic steps they could take to reduce breaches. You’re never going to eliminate them, but you certainly can reduce them below where we are today.

In the context of certain secondary data uses, like when parents find out after the fact that blood spots collected from their infants at birth are being used for multiple purposes, you don’t want to surprise people about what you’re doing with their health information, the health information of their children, and that of other family members.

I think most people would be quite comfortable with many uses of health data, including those that do not necessarily directly benefit them but benefit human beings generally, or people who have the same disease, or people like them. In general, we’re actually a fairly generous people, but we don’t want to be surprised by unexpected use.

There’s a tremendous amount of work to do. We have a tendency to think issues like secondary use get resolved by asking for people’s consent ahead of time. Consent certainly plays an important role in protecting people’s privacy and giving them some sense of control over their health care information, but because consent in practice actually doesn’t do such a great job, we can’t over-rely on it to create a trust ecosystem. We have to do more on the openness and transparency side so that people are brought along with where we’re going with these health information technology initiatives.

What do doctors’ offices need to do to mitigate risks from EHR adoption?

Deven McGraw: It’s absolutely true that digitizing data in the absence of the adoption of technical security safeguards puts it much more at risk. You cannot hack into a paper file. If you lose a paper file, you’ve lost one paper file. If you lose a laptop, you’ve lost hundreds of thousands of records, if they’re on there and you didn’t encrypt the data.

Having said that, there are so many tools that you can adopt in technology with data in a digital form that are much stronger from a security standpoint than is true in paper. You can set role-based access controls for who can access a file and track who has accessed a file. You can’t do that with paper. You can use encryption technology. You can use stronger identity and authentication levels in order to make sure the person accessing the data is, in fact, authorized to do so and is the person they say they are on the other end of the transaction.
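
As a rough sketch of the kinds of safeguards described here, the snippet below shows a toy role-based access check with an audit trail; the roles, permissions, and record identifiers are invented for illustration.

```python
from datetime import datetime, timezone

# Illustrative only: a toy role-based access check with an audit trail, the
# sort of safeguard that is easy in digital systems and impossible on paper.
ROLE_PERMISSIONS = {
    "physician": {"read", "write"},
    "billing": {"read"},
    "front_desk": set(),          # no access to clinical notes
}

audit_log = []  # every access attempt is recorded, allowed or not

def access_record(user, role, record_id, action):
    allowed = action in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user, "role": role,
        "record": record_id, "action": action,
        "allowed": allowed,
    })
    return allowed

if __name__ == "__main__":
    access_record("dr_smith", "physician", "patient-42", "read")    # allowed
    access_record("temp_clerk", "front_desk", "patient-42", "read")  # denied, but logged
    for entry in audit_log:
        print(entry)
```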

We do need people to adopt those technologies and to use them. You’re talking about a health care industry that has stewardship over some of the most sensitive data we have out there. It’s not the nuclear codes, but for a lot of people, it’s incredibly sensitive data — and yet, we trust the security of that data to rank amateurs. Honestly, there’s no other way around that. The people who create the data are physicians. Most of them don’t have any experience in digital security.

We have to count on the vendors of those systems to build in security safeguards. Then, we have to count on giving physicians and their staffs as much guidance as we can so they can actually deploy those safeguards and don’t create workarounds to them that create bigger holes in the security of the data and potentially create patient safety issues. It’s an enormously complex problem, but it’s not the reason to say, “Well, we can’t do this.”

Due to the efforts of many advocates, as you well know, health data has become a big part of the discussion around open data. What are the risks and benefits?

Deven McGraw: Honestly, people throw the term “open data” around a lot, and I don’t think we have a clear, agreed-upon definition for what that is. It’s a mistake to think that open data means all health data, fully identifiable, available to anybody, for any purpose, for any reason. That would be a totally “open data” environment. No rules, no restrictions, you get what you need. It certainly would be transformative and disruptive. We’d probably learn an awful lot from the data. But at the same time, we’ve potentially completely blown trust in the system because we can give no guarantees to anybody about what’s going to happen with their data.

Open data means creating rules that provide greater access to data but with certain privacy protections in place, such as protections on minimizing the identifiability of the data. That typically has been the way government health data initiatives, for example, have been put forth: the data that’s open, that’s really widely accessible, is data with a very low risk of being identified with a particular patient. The focus is typically on the patient side, but I think, even in the government health data initiatives that I’m aware of, it’s also not identifiable to a particular provider. It’s aggregate data that says, “How often is that very expensive cardiac surgery done and in what populations of patients? What are the general outcomes?” That’s all valuable information but not data at the granular level, where it’s traceable to an individual and, therefore, puts at risk the notion they can confidentially receive care.

We have a legal regime that opens the doors to data use much wider if you mask identifiers in data, remove them from a dataset, or use statistical techniques to render data to have a very low risk of re-identification.
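
For a sense of what masking identifiers can look like in code, here is a minimal sketch loosely in the spirit of the HIPAA Safe Harbor approach; the field names are invented, and real de-identification requires far more rigor than this.

```python
# Loosely inspired by the HIPAA Safe Harbor method: strip direct identifiers
# and coarsen quasi-identifiers. Field names are invented for the example, and
# real de-identification requires far more care than this sketch.
DIRECT_IDENTIFIERS = {"name", "ssn", "phone", "email", "street_address", "mrn"}

def deidentify(record):
    out = {}
    for key, value in record.items():
        if key in DIRECT_IDENTIFIERS:
            continue                            # drop direct identifiers entirely
        if key == "zip":
            out["zip3"] = str(value)[:3]        # keep only the 3-digit ZIP prefix
        elif key == "birth_date":
            out["birth_year"] = str(value)[:4]  # keep the year only
        else:
            out[key] = value
    return out

record = {"name": "Jane Roe", "ssn": "000-00-0000", "zip": "94110",
          "birth_date": "1971-03-14", "diagnosis": "type 2 diabetes"}
print(deidentify(record))
# {'zip3': '941', 'birth_year': '1971', 'diagnosis': 'type 2 diabetes'}
```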

We don’t have a perfect regulatory regime on that front. We don’t have any strict prohibitions against re-identifying that data. We don’t have any mechanisms to hold people accountable if they do re-identify the data, or if they release a dataset that then is subsequently re-identified because they were sloppy in how they de-identified it. We don’t have the regulatory regime that we need to create an open data ecosystem that loosens some of the regulatory constraints on data but in a way that still protects individual privacy to the maximum extent possible.

Again, it’s a balance. What we’re trying to achieve is a very low risk of re-identification; it’s impossible to achieve no risk of re-identification and still have any utility in the data whatsoever, or so I’m told by researchers.

It is absolutely the path we need to proceed down. Our health care system is so messed up and fails so many people so much of the time. If we don’t start using this data, learning from it and deploying testing initiatives more robustly, getting rid of the ones that don’t work and more aggressively pursuing the interventions that do, we’re never going to move the needle. And consumers suffer from that. They suffer as much — or more, quite frankly — than they do from violations of their privacy. The end goal here is we need to create a health care system that works and that people trust. You need to be pursuing both of those goals.

Congress hasn’t had much appetite for passing new health care legislation in this election year, aside from the House trying to repeal PPACA 33 times. That would seem to leave reform up to the U.S. Department of Health and Human Services (HHS), for now. Where do we stand with rulemaking around creating regulatory regimes like those you’ve described?

Deven McGraw: HHS certainly has made progress in some areas and is much more proactive on the issue of health privacy than I think they have been in the past. On the other hand, I’m not sure I can point to significant milestones that have been met.

Some of that isn’t completely their fault. Within an administration, there are multiple decision-makers. For any sort of policy matter where you want to move the ball forward, there’s a fair amount of process and approval up the food chain that has to happen. In an election year, in particular, that whole mechanism gets jammed up in ways that are often disappointing.

We still don’t have finalized HIPAA rules from the HITECH changes, which is really unfortunate. And I’m now thinking we won’t see them until November. Similarly, there was a study on de-identification that Congress called for in the HITECH legislation. It’s two years late, creeping up on three, and we still haven’t seen it.

You can point to those and you sort of throw up your hands and say, “What’s going on? Who’s minding the store?” If we know and appreciate that we need to build this trust environment to move the needle forward on using health IT to address quality and cost issues, then it starts to look very bad in terms of a report card for the agency on those elements.

On the other hand, you have the Office of the National Coordinator for Health IT doing more work through setting funding conditions on states to get them to adopt privacy frameworks for health information exchanges.

You have progress being made by the Office for Civil Rights on HIPAA enforcement. They’re doing audits. They now have more enforcement actions in the last year than they had in the total number of years the regulations were in effect prior to this year. They’re getting serious.

From a research perspective, the other thing I would mention is the efforts to try to make the common rule — the set of rules that governs federally funded research — more consistent with HIPAA and more workable for researchers. But there’s still a lot of work to be done on that initiative as well.

We started the conversation by saying these are really complex issues. They don’t get fixed overnight. In some respects, fast action is less important than getting it right, but we really should be making faster progress than we are.

What does the trend toward epatients and peer-to-peer health care mean for privacy, prevention and informed consent?

Deven McGraw: I think the epatient movement and increase in people’s use of Internet technologies, like social media, to connect with one another and to share data and experiences in order to improve their care is an enormously positive development. It’s a huge game-changer. And, of course, it will have an impact on privacy.

One of the things we’re going to have to keep an eye on is the fact that one out of six people, when they’re surveyed, say they practice what we call “privacy protective behaviors.” They lie to their physicians. They don’t go to seek the care they need, which is often the case with respect to mental illness. Or they seek care out of their area in order to prevent people they might know who work in their local hospital from seeing their data.

But that’s only one out of six people who say that, so there are an awful lot of people who, from the start, even when they’re healthy, are completely comfortable being open with their data. Certainly when you’re sick, your desire is to get better. And when you’re seriously sick, your desire is to save your life. Anything you can do to do that means whatever qualms you may have had about sharing your data, if they existed at all, go right out the window.

On the other hand, we have to build an ecosystem that the one out of six people can use as well. That’s what I’m focusing on, in particular, in the consumer-facing health space, the “Health 2.0 space” and on social media sites. It really should be the choice of the individual about how much data they share. There needs to be a lot of transparency about how that data is used.

When I look at a site like PatientsLikeMe, I know some privacy advocates think it’s horrifying and that those people are crazy for sharing the level of detail in their data on that site. On the other hand, I have read few privacy policies that are as transparent and open about what they do with data as PatientsLikeMe’s policy. They’re very up front about what happens with that data. I’m confident that people who go on the site absolutely know what they’re doing. It’s not my job to tell them they can’t do it.

But we also need to create environments so people can get the benefits of sharing their experiences with other patients who have their disease — because it’s enormously empowering and groundbreaking from a research standpoint — without telling people they have to throw all of their inhibitions out the door.

You clearly care about these issues deeply. How did you end up in your current position?

Deven McGraw: I was working at the National Partnership for Women and Families, which is another nonprofit advocacy organization here in town [Washington, D.C.], as their chief operating officer. I had been working on health information technology policy issues — specifically, the use of technology to improve health care quality and trying to normalize or reduce costs. I was getting increasingly involved in being a consumer representative at meetings on health information technology adoption and applauding health information technology adoption, and thinking about what the benefits for consumers were and how we can make sure that those happen.

The one issue that kept coming up in those conversations was that we know we need to build in privacy protections for this data and we know we have HIPAA — so where are the gaps? What do we need to do to move the ball forward? I never had enough time to really drill down on that issue because I was the chief operating officer of a nonprofit.

At the time, the Health Privacy Project was an independent nonprofit organization that had been founded and led by one dynamic woman, Janlori Goldman. She was living in New York and was ready to transition the work to somebody else. When CDT approached me about being the director of the Health Privacy Project, they were moving it into CDT to take advantage of all the technology and Internet expertise there, at a time when we’re trying to move health care aggressively into the digital space. It was a perfect storm: I had been wishing I had more time to think through the privacy issues, and then this job came along, aligned with the way I like to do policy work, which is to sit down with stakeholders and try to figure out a solution that ideally works for everybody.

From a timing perspective, it couldn’t have been more perfect. It was right during the consideration of bills on health IT. There were hearings on health information technology that we were invited to testify in. We wrote papers to put ourselves on the map, in terms of our theory about how to do privacy well in health IT and what the role of patient consent should be in privacy, because a lot of the debate was really spinning around that one issue. It’s been a terrific experience. It’s an enormous challenge.


August 14 2012

Solving the Wanamaker problem for health care

By Tim O’Reilly, Julie Steele, Mike Loukides and Colin Hill

“The best minds of my generation are thinking about how to make people click ads.” — Jeff Hammerbacher, early Facebook employee

“Work on stuff that matters.” — Tim O’Reilly

Doctors in operating room with data

In the early days of the 20th century, department store magnate John Wanamaker famously said, “I know that half of my advertising doesn’t work. The problem is that I don’t know which half.”

The consumer Internet revolution was fueled by a search for the answer to Wanamaker’s question. Google AdWords and the pay-per-click model transformed a business in which advertisers paid for ad impressions into one in which they pay for results. “Cost per thousand impressions” (CPM) was replaced by “cost per click” (CPC), and a new industry was born. It’s important to understand why CPC replaced CPM, though. Superficially, it’s because Google was able to track when a user clicked on a link, and was therefore able to bill based on success. But billing based on success doesn’t fundamentally change anything unless you can also change the success rate, and that’s what Google was able to do. By using data to understand each user’s behavior, Google was able to place advertisements that an individual was likely to click. They knew “which half” of their advertising was more likely to be effective, and didn’t bother with the rest.
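
A toy calculation (with invented numbers) makes the point: per-click billing matters only because targeting raises the click-through rate, and that is what actually lowers the cost of a successful ad.

```python
# Toy numbers to illustrate why targeting, not just per-click billing, changed
# the economics of advertising. All figures are invented for the example.
impressions = 1_000_000

def cost_per_click(spend, click_through_rate):
    clicks = impressions * click_through_rate
    return spend / clicks

# Campaign billed per thousand impressions (CPM) at $2 CPM:
spend = (impressions / 1000) * 2.00

print(cost_per_click(spend, 0.001))  # ~$2.00 per click at an untargeted 0.1% CTR
print(cost_per_click(spend, 0.01))   # ~$0.20 per click when targeting lifts CTR to 1%
```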

Since then, data and predictive analytics have driven ever deeper insight into user behavior such that companies like Google, Facebook, Twitter, Zynga, and LinkedIn are fundamentally data companies. And data isn’t just transforming the consumer Internet. It is transforming finance, design, and manufacturing — and perhaps most importantly, health care.

How is data science transforming health care? There are many ways in which health care is changing, and needs to change. We’re focusing on one particular issue: the problem Wanamaker described when talking about his advertising. How do you make sure you’re spending money effectively? Is it possible to know what will work in advance?

Too often, when doctors order a treatment, whether it’s surgery or an over-the-counter medication, they are applying a “standard of care” treatment or some variation that is based on their own intuition, effectively hoping for the best. The sad truth of medicine is that we don’t really understand the relationship between treatments and outcomes. We have studies to show that various treatments will work more often than placebos; but, like Wanamaker, we know that much of our medicine doesn’t work for half of our patients; we just don’t know which half. At least, not in advance. One of data science’s many promises is that, if we can collect data about medical treatments and use that data effectively, we’ll be able to predict more accurately which treatments will be effective for which patient, and which treatments won’t.

A better understanding of the relationship between treatments, outcomes, and patients will have a huge impact on the practice of medicine in the United States. Health care is expensive. The U.S. spends over $2.6 trillion on health care every year, an amount that constitutes a serious fiscal burden for government, businesses, and our society as a whole. These costs include over $600 billion of unexplained variations in treatments: treatments that cause no differences in outcomes, or even make the patient’s condition worse. We have reached a point at which our need to understand treatment effectiveness has become vital — to the health care system and to the health and sustainability of the economy overall.

Why do we believe that data science has the potential to revolutionize health care? After all, the medical industry has had data for generations: clinical studies, insurance data, hospital records. But the health care industry is now awash in data in a way that it has never been before: from biological data such as gene expression, next-generation DNA sequence data, proteomics, and metabolomics, to clinical data and health outcomes data contained in ever more prevalent electronic health records (EHRs) and longitudinal drug and medical claims. We have entered a new era in which we can work on massive datasets effectively, combining data from clinical trials and direct observation by practicing physicians (the records generated by our $2.6 trillion of medical expense). When we combine data with the resources needed to work on the data, we can start asking the important questions, the Wanamaker questions, about what treatments work and for whom.

The opportunities are huge: for entrepreneurs and data scientists looking to put their skills to work disrupting a large market, for researchers trying to make sense out of the flood of data they are now generating, and for existing companies (including health insurance companies, biotech, pharmaceutical, and medical device companies, hospitals and other care providers) that are looking to remake their businesses for the coming world of outcome-based payment models.

Making health care more effective


What, specifically, does data allow us to do that we couldn’t do before? For the past 60 or so years of medical history, we’ve treated patients as some sort of an average. A doctor would diagnose a condition and recommend a treatment based on what worked for most people, as reflected in large clinical studies. Over the years, we’ve become more sophisticated about what that average patient means, but that same statistical approach didn’t allow for differences between patients. A treatment was deemed effective or ineffective, safe or unsafe, based on double-blind studies that rarely took into account the differences between patients. The exceptions to this are relatively recent and have been dominated by cancer treatments, the first being Herceptin for breast cancer in women who over-express the Her2 receptor. With the data that’s now available, we can go much further, for a broad range of diseases and interventions that include not just drugs but also surgery, disease management programs, medical devices, patient adherence, and care delivery.

For a long time, we thought that Tamoxifen was roughly 80% effective for breast cancer patients. But now we know much more: we know that it’s 100% effective in 70 to 80% of the patients, and ineffective in the rest. That’s not word games, because we can now use genetic markers to tell whether it’s likely to be effective or ineffective for any given patient, and we can tell in advance whether to treat with Tamoxifen or to try something else.

Two factors lie behind this new approach to medicine: a different way of using data, and the availability of new kinds of data. It’s not just stating that the drug is effective on most patients, based on trials (indeed, 80% is an enviable success rate); it’s using artificial intelligence techniques to divide the patients into groups and then determine the difference between those groups. We’re not asking whether the drug is effective; we’re asking a fundamentally different question: “for which patients is this drug effective?” We’re asking about the patients, not just the treatments. A drug that’s only effective on 1% of patients might be very valuable if we can tell who that 1% is, though it would certainly be rejected by any traditional clinical trial.
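
A minimal sketch of that shift, using synthetic patients and a hypothetical marker: instead of reporting one blended response rate, group the patients and estimate effectiveness per group.

```python
from collections import defaultdict

# Synthetic illustration of asking "for which patients is this drug effective?"
# rather than "is this drug effective?". The marker and outcomes are invented.
patients = [
    {"marker": "positive", "responded": True},
    {"marker": "positive", "responded": True},
    {"marker": "positive", "responded": True},
    {"marker": "negative", "responded": False},
    {"marker": "negative", "responded": False},
    {"marker": "negative", "responded": True},
]

by_marker = defaultdict(list)
for p in patients:
    by_marker[p["marker"]].append(p["responded"])

overall = sum(p["responded"] for p in patients) / len(patients)
print(f"overall response rate: {overall:.0%}")            # one blended number
for marker, outcomes in by_marker.items():
    rate = sum(outcomes) / len(outcomes)
    print(f"marker {marker}: {rate:.0%} response rate")   # per-group rates tell a different story
```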

More than that, asking questions about patients is only possible because we’re using data that wasn’t available until recently: DNA sequencing was only invented in the mid-1970s, and is only now coming into its own as a medical tool. What we’ve seen with Tamoxifen is as clear a solution to the Wanamaker problem as you could ask for: we now know when that treatment will be effective. If you can do the same thing with millions of cancer patients, you will both improve outcomes and save money.

Dr. Lukas Wartman, a cancer researcher who was himself diagnosed with terminal leukemia, was successfully treated with sunitinib, a drug that was only approved for kidney cancer. Sequencing the genes of both the patient’s healthy cells and cancerous cells led to the discovery of a protein that was out of control and encouraging the spread of the cancer. The gene responsible for manufacturing this protein could potentially be inhibited by the kidney drug, although it had never been tested for this application. This unorthodox treatment was surprisingly effective: Wartman is now in remission.

While this treatment was exotic and expensive, what’s important isn’t the expense but the potential for new kinds of diagnosis. The price of gene sequencing has been plummeting; it will be a common doctor’s office procedure in a few years. And through Amazon and Google, you can now “rent” a cloud-based supercomputing cluster that can solve huge analytic problems for a few hundred dollars per hour. What is now exotic inevitably becomes routine.

But even more important: we’re looking at a completely different approach to treatment. Rather than a treatment that works 80% of the time, or even 100% of the time for 80% of the patients, a treatment might be effective for a small group. It might be entirely specific to the individual; the next cancer patient may have a different protein that’s out of control, an entirely different genetic cause for the disease. Treatments that are specific to one patient don’t exist in medicine as it’s currently practiced; how could you ever do an FDA trial for a medication that’s only going to be used once to treat a certain kind of cancer?

Foundation Medicine is at the forefront of this new era in cancer treatment. They use next-generation DNA sequencing to discover DNA sequence mutations and deletions that are currently used in standard of care treatments, as well as many other actionable mutations that are tied to drugs for other types of cancer. They are creating a patient-outcomes repository that will be the fuel for discovering the relationship between mutations and drugs. Foundation has identified, in 50% of cancer cases, DNA mutations for which drugs exist but are not currently used in the standard of care for the patient’s particular cancer (information via a private communication).

The ability to do large-scale computing on genetic data gives us the ability to understand the origins of disease. If we can understand why an anti-cancer drug is effective (what specific proteins it affects), and if we can understand what genetic factors are causing the cancer to spread, then we’re able to use the tools at our disposal much more effectively. Rather than using imprecise treatments organized around symptoms, we’ll be able to target the actual causes of disease, and design treatments tuned to the biology of the specific patient. Eventually, we’ll be able to treat 100% of the patients 100% of the time, precisely because we realize that each patient presents a unique problem.

Personalized treatment is just one area in which we can solve the Wanamaker problem with data. Hospital admissions are extremely expensive. Data can make hospital systems more efficient and help avoid preventable complications such as blood clots and hospital re-admissions. It can also help address the challenge of hot-spotting (a term coined by Atul Gawande): finding people who use an inordinate amount of health care resources. By looking at data from hospital visits, Dr. Jeffrey Brenner of Camden, NJ, was able to determine that “just one per cent of the hundred thousand people who made use of Camden’s medical facilities accounted for thirty per cent of its costs.” Furthermore, many of these people came from two apartment buildings. Designing more effective medical care for these patients was difficult: it doesn’t fit our health insurance system, the patients are often dealing with many serious medical issues (addiction and obesity are frequent complications), and they have trouble trusting doctors and social workers. It’s counter-intuitive, but spending more on some patients now results in spending less on them later, when they would otherwise become really sick. While it’s a work in progress, it looks like building appropriate systems to target these high-risk patients and treat them before they’re hospitalized will bring significant savings.

Many poor health outcomes are attributable to patients who don’t take their medications. Eliza, a Boston-based company started by Alexandra Drane, has pioneered approaches to improve compliance through interactive communication with patients. Eliza improves patient drug compliance by tracking which types of reminders work on which types of people; it’s similar to the way companies like Google target advertisements to individual consumers. By using data to analyze each patient’s behavior, Eliza can generate reminders that are more likely to be effective. The results aren’t surprising: if patients take their medicine as prescribed, they are more likely to get better. And if they get better, they are less likely to require further, more expensive treatment. Again, we’re using data to solve Wanamaker’s problem in medicine: we’re spending our resources on what’s effective, on the reminders most likely to get patients to take their medications.
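
The core bookkeeping behind that kind of targeting can be sketched in a few lines; the patient segments, reminder channels, and outcomes below are invented, and a production system like Eliza’s would be far more sophisticated.

```python
from collections import defaultdict

# Invented segments and reminder channels; a real system would model this far
# more carefully, but the core bookkeeping is success counts per pairing.
stats = defaultdict(lambda: {"sent": 0, "refilled": 0})

def record_outcome(segment, reminder, refilled):
    stats[(segment, reminder)]["sent"] += 1
    stats[(segment, reminder)]["refilled"] += int(refilled)

def best_reminder(segment, options):
    """Pick the reminder type with the best observed refill rate for this segment."""
    def rate(option):
        s = stats[(segment, option)]
        return s["refilled"] / s["sent"] if s["sent"] else 0.0
    return max(options, key=rate)

# Simulated history
record_outcome("age_65_plus", "phone_call", True)
record_outcome("age_65_plus", "phone_call", True)
record_outcome("age_65_plus", "text_message", False)
record_outcome("age_65_plus", "text_message", True)

print(best_reminder("age_65_plus", ["phone_call", "text_message"]))  # phone_call
```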

More data, more sources

The examples we’ve looked at so far have been limited to traditional sources of medical data: hospitals, research centers, doctor’s offices, insurers. The Internet has enabled the formation of patient networks aimed at sharing data. Health social networks are now some of the largest patient communities. As of November 2011, PatientsLikeMe has over 120,000 patients in 500 different condition groups; ACOR has over 100,000 patients in 127 cancer support groups; 23andMe has over 100,000 members in their genomic database; and diabetes health social network SugarStats has over 10,000 members. These are just the larger communities; thousands of small communities have formed around rare diseases, or even uncommon experiences with common diseases. All of these communities are generating data that they voluntarily share with each other and the world.

Increasingly, what they share is not just anecdotal, but includes an array of clinical data. For this reason, these groups are being recruited for large-scale crowdsourced clinical outcomes research.

Thanks to ubiquitous data networking through the mobile network, we can go several steps further. In the past two or three years, there’s been a flood of personal fitness devices (such as the Fitbit) for monitoring your personal activity. There are mobile apps for taking your pulse, and an iPhone attachment for measuring your glucose. There has been talk of mobile applications that would constantly listen to a patient’s speech and detect changes that might be the precursor to a stroke, or would use the accelerometer to report falls. Tanzeem Choudhury has developed an app called Be Well that is intended primarily for victims of depression, though it can be used by anyone. Be Well monitors the user’s sleep cycles, the amount of time they spend talking, and the amount of time they spend walking. The data is scored, and the app makes appropriate recommendations, based both on the individual patient and on data collected across all the app’s users.
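
As a hypothetical illustration of “the data is scored,” here is one way a daily well-being score could be assembled from sleep, conversation, and walking time; the weights, targets, and thresholds are invented and are not Be Well’s actual model.

```python
# Hypothetical scoring in the spirit of apps like Be Well: combine a few
# passively sensed signals into a single daily well-being score. The targets,
# weights, and tolerances are invented; the real app's model is not shown here.
def daily_score(sleep_hours, talk_minutes, walk_minutes):
    def closeness(value, target, tolerance):
        """1.0 at the target, falling off linearly to 0.0 at +/- tolerance."""
        return max(0.0, 1.0 - abs(value - target) / tolerance)

    sleep = closeness(sleep_hours, target=7.5, tolerance=4.0)
    social = closeness(talk_minutes, target=60, tolerance=60)
    activity = closeness(walk_minutes, target=45, tolerance=45)
    return round(100 * (0.4 * sleep + 0.3 * social + 0.3 * activity), 1)

print(daily_score(sleep_hours=7.0, talk_minutes=40, walk_minutes=30))  # 75.0, a decent day
print(daily_score(sleep_hours=3.5, talk_minutes=5, walk_minutes=0))    # 2.5, flags a bad day
```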

Continuous monitoring of critical patients in hospitals has been normal for years; but we now have the tools to monitor patients constantly, in their home, at work, wherever they happen to be. And if this sounds like big brother, at this point most of the patients are willing. We don’t want to transform our lives into hospital experiences; far from it! But we can collect and use the data we constantly emit, our “data smog,” to maintain our health, to become conscious of our behavior, and to detect oncoming conditions before they become serious. The most effective medical care is the medical care you avoid because you don’t need it.

Paying for results

Once we’re on the road toward more effective health care, we can look at other ways in which Wanamaker’s problem shows up in the medical industry. It’s clear that we don’t want to pay for treatments that are ineffective. Wanamaker wanted to know which part of his advertising was effective, not just to make better ads, but also so that he wouldn’t have to buy the advertisements that wouldn’t work. He wanted to pay for results, not for ad placements. Now that we’re starting to understand how to make treatment effective, now that we understand that it’s more than rolling the dice and hoping that a treatment that works for a typical patient will be effective for you, we can take the next step: Can we change the underlying incentives in the medical system? Can we make the system better by paying for results, rather than paying for procedures?

It’s shocking just how badly the incentives in our current medical system are aligned with outcomes. If you see an orthopedist, you’re likely to get an MRI, most likely at a facility owned by the orthopedist’s practice. On one hand, it’s good medicine to know what you’re doing before you operate. But how often does that MRI result in a different treatment? How often is the MRI required just because it’s part of the protocol, when it’s perfectly obvious what the doctor needs to do? Many men have had PSA tests for prostate cancer; but in most cases, aggressive treatment of prostate cancer is a bigger risk than the disease itself. Yet the test itself is a significant profit center. Think again about Tamoxifen, and about the pharmaceutical company that makes it. In our current system, what does “100% effective in 80% of the patients” mean, except for a 20% loss in sales? That’s because the drug company is paid for the treatment, not for the result; it has no financial interest in whether any individual patient gets better. (Whether a statistically significant number of patients has side-effects is a different issue.) And at the same time, bringing a new drug to market is very expensive, and might not be worthwhile if it will only be used on the remaining 20% of the patients. And that’s assuming that one drug, not two, or 20, or 200 will be required to treat the unlucky 20% effectively.

It doesn’t have to be this way.

In the U.K., Johnson & Johnson, faced with the possibility of losing reimbursements for their multiple myeloma drug Velcade, agreed to refund the money for patients who did not respond to the drug. Several other pay-for-performance drug deals have followed since, paving the way for the ultimate transition in pharmaceutical company business models in which their product is health outcomes instead of pills. Such a transition would rely more heavily on real-world outcome data (are patients actually getting better?), rather than controlled clinical trials, and would use molecular diagnostics to create personalized “treatment algorithms.” Pharmaceutical companies would also focus more on drug compliance to ensure health outcomes were being achieved. This would ultimately align the interests of drug makers with patients, their providers, and payors.

Similarly, rather than paying for treatments and procedures, can we pay hospitals and doctors for results? That’s what Accountable Care Organizations (ACOs) are about. ACOs are a leap forward in business model design, where the provider shoulders any financial risk. ACOs represent a new framing of the much maligned HMO approaches from the ’90s, which did not work. HMOs tried to use statistics to predict and prevent unneeded care. The ACO model, rather than controlling doctors with what the data says they “should” do, uses data to measure how each doctor performs. Doctors are paid for successes, not for the procedures they administer. The main advantage that the ACO model has over the HMO model is how good the data is, and how that data is leveraged. The ACO model aligns incentives with outcomes: a practice that owns an MRI facility isn’t incentivized to order MRIs when they’re not necessary. It is incentivized to use all the data at its disposal to determine the most effective treatment for the patient, and to follow through on that treatment with a minimum of unnecessary testing.

When we know which procedures are likely to be successful, we’ll be in a position where we can pay only for the health care that works. When we can do that, we’ve solved Wanamaker’s problem for health care.

Enabling data

Data science is not optional in health care reform; it is the linchpin of the whole process. All of the examples we’ve seen, ranging from cancer treatment to detecting hot spots where additional intervention will make hospital admission unnecessary, depend on using data effectively: taking advantage of new data sources and new analytics techniques, in addition to the data the medical profession has had all along.

But it’s too simple just to say “we need data.” We’ve had data all along: handwritten records in manila folders on acres and acres of shelving. Insurance company records. But it’s all been locked up in silos: insurance silos, hospital silos, and many, many doctor’s office silos. Data doesn’t help if it can’t be moved, if data sources can’t be combined.

There are two big issues here. First, a surprising number of medical records are still either hand-written, or in digital formats that are scarcely better than hand-written (for example, scanned images of hand-written records). Getting medical records into a format that’s computable is a prerequisite for almost any kind of progress. Second, we need to break down those silos.

Anyone who has worked with data knows that, in any problem, 90% of the work is getting the data into a form in which it can be used; the analysis itself is often simple. We need electronic health records: patient data in a more-or-less standard form that can be shared efficiently, data that can be moved from one location to another at the speed of the Internet. Not all data formats are created equal, and some are certainly better than others; but at this point, any machine-readable format, even simple text files, is better than nothing. While there are currently hundreds of different formats for electronic health records, the fact that they’re electronic means that they can be converted from one form into another. Standardizing on a single format would make things much easier, but just getting the data into some electronic form, any form at all, is the first step.
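
The claim that any electronic format can be converted into another is easy to demonstrate; the sketch below turns a flat CSV export into JSON. The column names are made up, and real EHR conversions are far more involved than this.

```python
import csv
import io
import json

# Trivial illustration: once records are machine-readable at all, converting
# between representations is routine. Column names and codes are invented.
csv_export = """patient_id,date,code,description
1001,2012-05-01,250.00,Type 2 diabetes without complications
1001,2012-06-15,401.9,Essential hypertension
"""

records = list(csv.DictReader(io.StringIO(csv_export)))
print(json.dumps(records, indent=2))  # the same data, now in a different electronic form
```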

Once we have electronic health records, we can link doctor’s offices, labs, hospitals, and insurers into a data network, so that all patient data is immediately stored in a data center: every prescription, every procedure, and whether that treatment was effective or not. This isn’t some futuristic dream; it’s technology we have now. Building this network would be substantially simpler and cheaper than building the networks and data centers now operated by Google, Facebook, Amazon, Apple, and many other large technology companies. It’s not even close to pushing the limits.

Electronic health records enable us to go far beyond the current mechanism of clinical trials. In the past, once a drug was approved in trials, that was effectively the end of the story: running more tests to determine whether it’s effective in practice would be a huge expense. A physician might get a sense for whether any treatment worked, but that evidence is essentially anecdotal: it’s easy to believe that something is effective because that’s what you want to see. And if it’s shared with other doctors, it’s shared while chatting at a medical convention. But with electronic health records, it’s possible (and not even terribly expensive) to collect documentation from thousands of physicians treating millions of patients. We can find out when and where a drug was prescribed, why, and whether there was a good outcome. We can ask questions that are never part of clinical trials: is the medication used in combination with anything else? What other conditions is the patient being treated for? We can use machine learning techniques to discover unexpected combinations of drugs that work well together, or to predict adverse reactions. We’re no longer limited by clinical trials; every patient can be part of an ongoing evaluation of whether his treatment is effective, and under what conditions. Technically, this isn’t hard. The only difficult part is getting the data to move, getting it into a form where it’s easily transferred from the doctor’s office to analytics centers.
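
As a toy example of the kind of question pooled records make cheap to ask, the sketch below tallies outcomes for patients on a drug alone versus in combination with another; the records and drug names are synthetic, and the counts are correlational only.

```python
from collections import defaultdict
from itertools import combinations

# Synthetic records: once prescriptions and outcomes live in pooled electronic
# records, "which drug combinations go with better outcomes?" becomes a simple
# counting exercise (correlational, not proof of causation).
records = [
    {"drugs": {"drug_a"},           "improved": True},
    {"drugs": {"drug_a"},           "improved": False},
    {"drugs": {"drug_a", "drug_b"}, "improved": True},
    {"drugs": {"drug_a", "drug_b"}, "improved": True},
    {"drugs": {"drug_b"},           "improved": False},
]

counts = defaultdict(lambda: [0, 0])   # combo -> [improved, total]
for r in records:
    for size in (1, 2):
        for combo in combinations(sorted(r["drugs"]), size):
            counts[combo][0] += r["improved"]
            counts[combo][1] += 1

for combo, (improved, total) in sorted(counts.items()):
    print(f"{'+'.join(combo):15s} {improved}/{total} improved")
```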

Solving problems of hot-spotting (individual patients or groups of patients consuming inordinate medical resources) requires a different combination of information. You can’t locate hot spots if you don’t have physical addresses. Physical addresses can be geocoded (converted from addresses to longitude and latitude, which is more useful for mapping problems) easily enough, once you have them, but you need access to patient records from all the hospitals operating in the area under study. And you need access to insurance records to determine how much health care patients are requiring, and to evaluate whether special interventions for these patients are effective. Not only does this require electronic records, it requires cooperation across different organizations (breaking down silos), and assurance that the data won’t be misused (patient privacy). Again, the enabling factor is our ability to combine data from different sources; once you have the data, the solutions come easily.
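
The Camden-style finding, a tiny fraction of patients accounting for a large share of costs, falls out of simple aggregation once claims from different providers can be combined per patient. The claim amounts below are invented.

```python
from collections import defaultdict

# Invented claims data. The point: once claims from different providers can be
# combined per patient, finding the "hot spots" is straightforward aggregation.
claims = [
    ("patient_1", 30_000), ("patient_1", 25_000),   # one very high utilizer
    ("patient_2", 1_500), ("patient_3", 900),
    ("patient_4", 2_200), ("patient_5", 600),
]

totals = defaultdict(float)
for patient, cost in claims:
    totals[patient] += cost

ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
grand_total = sum(totals.values())
top_patient, top_cost = ranked[0]
print(f"{top_patient} accounts for {top_cost / grand_total:.0%} of all costs in this toy dataset")
# In Camden, roughly 1% of patients accounted for about 30% of costs.
```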

Breaking down silos has a lot to do with aligning incentives. Currently, hospitals are trying to optimize their income from medical treatments, while insurance companies are trying to optimize their income by minimizing payments, and doctors are just trying to keep their heads above water. There’s little incentive to cooperate. But as financial pressures rise, it will become critically important for everyone in the health care system, from the patient to the insurance executive, to ensure that they are getting the most for their money. While there’s intense cultural resistance to be overcome (through our experience in data science, we’ve learned that it’s often difficult to break down silos within an organization, let alone between organizations), the pressure of delivering more effective health care for less money will eventually break the silos down. The old zero-sum game of winners and losers must end if we’re going to have a medical system that’s effective over the coming decades.

Data becomes infinitely more powerful when you can mix data from different sources: many doctor’s offices, hospital admission records, address databases, and even the rapidly increasing stream of data coming from personal fitness devices. The challenge isn’t employing our statistics more carefully, precisely, or guardedly. It’s about letting go of an old paradigm that starts by assuming only certain variables are key and ends by correlating only these variables. This paradigm worked well when data was scarce, but if you think about it, these assumptions arise precisely because data is scarce. We didn’t study the relationship between leukemia and kidney cancers because that would require asking a huge set of questions that would require collecting a lot of data; and a connection between leukemia and kidney cancer is no more likely than a connection between leukemia and flu. But the existence of data is no longer a problem: we’re collecting the data all the time. Electronic health records let us move the data around so that we can assemble a collection of cases that goes far beyond a particular practice, a particular hospital, a particular study. So now, we can use machine learning techniques to identify and test all possible hypotheses, rather than just the small set that intuition might suggest. And finally, with enough data, we can get beyond correlation to causation: rather than saying “A and B are correlated,” we’ll be able to say “A causes B,” and know what to do about it.
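
Here is a small synthetic sketch of what “test all possible hypotheses” looks like at toy scale: screen every pair of conditions for co-occurrence instead of only the pairs intuition suggests. At real scale this needs multiple-testing corrections, and co-occurrence alone is still correlation, not causation.

```python
from collections import Counter
from itertools import combinations

# Synthetic toy: with pooled records we can screen *every* pair of conditions
# for co-occurrence rather than only the pairs intuition suggests. Real-scale
# screening needs multiple-testing corrections; co-occurrence is not causation.
patients = [
    {"leukemia", "flu"},
    {"kidney_cancer", "hypertension"},
    {"leukemia", "kidney_cancer"},
    {"hypertension", "flu"},
    {"leukemia", "kidney_cancer", "hypertension"},
]

pair_counts = Counter()
for conditions in patients:
    for pair in combinations(sorted(conditions), 2):
        pair_counts[pair] += 1

for pair, count in pair_counts.most_common(3):
    print(f"{pair}: co-occurs in {count} of {len(patients)} patients")
```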

Building the health care system we want

The U.S. ranks 37th among developed economies in life expectancy and other measures of health, while far outspending other countries on per-capita health care. We spend 18% of GDP on health care, while other countries on average spend on the order of 10% of GDP. We spend a lot of money on treatments that don’t work, because we have, at best, a poor understanding of what will and won’t work.

Part of the problem is cultural. In a country where even pets can have hip replacement surgery, it’s hard to imagine not spending every penny you have to prolong Grandma’s life — or your own. The U.S. is a wealthy nation, and health care is something we choose to spend our money on. But wealthy or not, nobody wants ineffective treatments. Nobody wants to roll the dice and hope that their biology is similar enough to a hypothetical “average” patient. No one wants a “winner take all” payment system in which the patient is always the loser, paying for procedures whether or not they are helpful or necessary. Like Wanamaker with his advertisements, we want to know what works, and we want to pay for what works. We want a smarter system where treatments are designed to be effective on our individual biologies; where treatments are administered effectively; where our hospitals are used effectively; and where we pay for outcomes, not for procedures.

We’re on the verge of that new system now. We don’t have it yet, but we can see it around the corner. Ultra-cheap DNA sequencing in the doctor’s office, massive inexpensive computing power, the availability of EHRs to study whether treatments are effective even after the FDA trials are over, and improved techniques for analyzing data are the tools that will bring this new system about. The tools are here now; it’s up to us to put them into use.


August 13 2012

A grisly job for data scientists

Javier Reveron went missing from Ohio in 2004. His wallet turned up in New York City, but he was nowhere to be found. By the time his parents arrived to search for him and hand out fliers, his remains had already been buried in an unmarked indigent grave. In New York, where coroner’s resources are precious, remains wait a few months to be claimed before they’re buried by convicts in a potter’s field on uninhabited Hart Island, just off the Bronx in Long Island Sound.

The story, reported by the New York Times last week, has as happy an ending as it could given that beginning. In 2010 Reveron’s parents added him to a national database of missing persons. A month later police in New York matched him to an unidentified body and his remains were disinterred, cremated and given burial ceremonies in Ohio.

Reveron’s ordeal suggests an intriguing, and impactful, machine-learning problem. The Department of Justice maintains separate national, public databases for missing people, unidentified people and unclaimed people. Many records are full of rich data that is almost never a perfect match to data in other databases — hair color entered by a police department might differ from how it’s remembered by a missing person’s family; weights fluctuate; scars appear. Photos are provided for many missing people and some unidentified people, and matching them is difficult. Free-text fields in many entries describe the circumstances under which missing people lived and died; a predilection for hitchhiking could be linked to a death by the side of a road.
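
As a flavor of what that matching might look like, here is a rough Python sketch. The field names, weights, and toy records are invented, and a real matcher would be trained on resolved cases rather than hand-tuned.

    # Illustrative sketch of scoring one missing-person report against one
    # set of unidentified remains. Field names, weights, and the toy records
    # are hypothetical; real records are far richer, and a production matcher
    # would be learned from resolved cases rather than hand-weighted.
    from difflib import SequenceMatcher

    def text_sim(a: str, b: str) -> float:
        """Rough 0-1 similarity for free-text fields."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def match_score(missing: dict, unidentified: dict) -> float:
        score = 0.0
        if missing["sex"] == unidentified["sex"]:
            score += 0.3
        # Numeric fields tolerate reporting error: weights fluctuate,
        # heights are estimated.
        if abs(missing["height_cm"] - unidentified["height_cm"]) <= 5:
            score += 0.2
        if abs(missing["weight_kg"] - unidentified["weight_kg"]) <= 10:
            score += 0.1
        # Hair color remembered by family may differ from the coroner's entry.
        score += 0.1 * text_sim(missing["hair"], unidentified["hair"])
        # Free-text circumstances: a hitchhiker matched to a roadside death.
        score += 0.3 * text_sim(missing["circumstances"],
                                unidentified["circumstances"])
        return score

    missing = {"sex": "M", "height_cm": 178, "weight_kg": 70,
               "hair": "dark brown",
               "circumstances": "last seen hitchhiking on the interstate"}
    unidentified = {"sex": "M", "height_cm": 180, "weight_kg": 74,
                    "hair": "brown",
                    "circumstances": "found by the side of an interstate road"}

    print(f"match score: {match_score(missing, unidentified):.2f}")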

I’ve called the Department of Justice (DOJ) to ask about the extent to which they’ve worked with computer scientists to match missing and unidentified people, and will update when I hear back. One thing that’s not immediately apparent is the public availability of the necessary training set — cases that have been successfully matched and removed from the lists. The DOJ apparently doesn’t comment on resolved cases, which could make getting this data difficult. But perhaps there’s room for a coalition to request the anonymized data and manage it to the DOJ’s satisfaction while distributing it to capable data scientists.

Photo: Missing Person: Ai Weiwei by Daquella manera, on Flickr


With new maps and apps, the case for open transit gets stronger

Earlier this year, the news broke that Apple would be dropping default support for transit in iOS 6. For people (like me) who use the iPhone to check transit routes and times when they travel, that would mean losing a key feature. It also has the potential to decrease the demand for open transit data from cities, which has open government advocates like Clay Johnson concerned about public transportation and iOS 6.

This summer, New York City-based non-profit Open Plans launched a Kickstarter campaign to fund a new iPhone transit app to fill in the gap.

“From the public perspective, this campaign is about putting an important feature back on the iPhone,” wrote Kevin Webb, a principal at Open Plans, via email. “But for those of us in the open government community, this is about demonstrating why open data matters. There’s no reason why important civic infrastructure should get bound up in a fight between Apple and Google. And in communities with public GTFS, it won’t.”

Open Plans already had a head start in creating a patch for the problem: they’ve been working with transit agencies over the past few years to build OpenTripPlanner, an open source application that uses open transit data to help citizens make transit decisions.

“We were already working on the back-end to support this application but decided to pursue the app development when we heard about Apple’s plans with iOS,” explained Webb. “We were surprised by the public response around this issue (the tens of thousands who joined Walkscore’s petition and wanted to offer a constructive response).”

Crowdfunding digital city infrastructure?

That’s where Kickstarter and crowdfunding come into the picture. The Kickstarter campaign would help Open Plans make OpenTripPlanner a native iPhone app, followed by Android and HTML5 apps down the road. Open Plans’ developers have decided that given mobile browser limitations in iOS, particularly the speed of JavaScript apps, an HTML5 app isn’t a replacement for a native app.

Kickstarter has emerged as a platform for more than backing ideas for cool iPod watches or services. Increasingly, it’s looking like Kickstarter could be a new way for communities to collectively fund the creation of civic apps or services for their towns that government isn’t agile enough to deliver for them. While that’s sure to make some people in traditional positions of power uneasy, it also might be a way to do an end-run around traditional procurement processes — contingent upon cities acting as platforms for civic startups to build upon.

“We get foundation and agency-based contract support for our work already,” wrote Webb. “However, we’ve discovered that foundations aren’t interested in these kinds of rider-facing tools, and most agencies don’t have the discretion or the budget to support the development of something universal. As a result, these kinds of projects require speculative investment. One of the awesome things about open data is that it lets folks respond directly and constructively by building something to solve a need, rather than waiting on others to fix it for them.

“Given our experience with transit and open data, we knew that this was a solvable problem; it just required someone to step up to the challenge. We were well positioned to take on that role. However, as a non-profit, we don’t have unlimited resources, so we’d ask for help. Kickstarter seems like the right fit, given the widespread public interest in the problem, and an interesting way to get the message out about our perspective. Not only do we get to raise a little money, but we’re also sharing the story about why open data and open source matter for public infrastructure with a new audience.”

Civic code in active re-use

Webb, who has previously staked out a position that iOS 6 will promote innovation in public transit, says that OpenTripPlanner is already a thriving open source project, with a recent open transit launch in New Orleans, a refresh in Portland and other betas soon to come.

In a welcome development for DC cyclists (including this writer), a version of OpenTripPlanner went live recently at BikePlanner.org. The web app, which notably uses OpenStreetMap as a base layer, lets users either plot a course for their own bike or tap into the Capital Bikeshare network in DC. BikePlanner is a responsive HTML5 app, which means that it looks good and works well on a laptop, iPad, iPhone or Android device.

Focusing just on open transit apps, however, would miss the larger picture of new opportunities to improve digital city infrastructure.

There’s a lot more at stake than just rider-facing tools, in Webb’s view — from urban accessibility to extending the GTFS data ecosystem.

“There’s a real need to build a national (and eventually international) transit data infrastructure,” said Webb. “Right now, the USDOT has completely fallen down on the job. The GTFS support we see today is entirely organic, and there’s no clear guidance anywhere about making data public or even creating GTFS in the first place. That means building universal apps takes a lot of effort just wrangling data.”

August 09 2012

Five elements of reform that health providers would rather not hear about

The quantum leap we need in patient care requires a complete overhaul of record-keeping and health IT. Leaders of the health care field know this and have been urging the changes on health care providers for years, but the providers are having trouble accepting the changes for several reasons.

What’s holding them back? Change certainly costs money, but the industry is already groaning its way through enormous paradigm shifts to meet the current financial and regulatory climate, so the money might as well be directed to things that work. Training staff to handle patients differently is also difficult, but the staff on the floor of these institutions are experiencing burn-out and can be inspired by a new direction. The fundamental resistance seems to stem from the expectations of health providers and their vendors about the control they need to conduct their business profitably.

A few months ago I wrote an article titled Five Tough Lessons I Had to Learn About Health Care. Here I’ll delineate some elements of a new health care system that are promoted by thought leaders, that echo the evolution of other industries, and that will seem utterly natural in a couple of decades, but that providers are loath to consider. I feel that leaders in the field are not confronting that resistance with an equivalent sense of conviction that these changes are crucial.

1. Reform will not succeed unless electronic records standardize on a common, robust format

Records are not static. They must be combined, parsed, and analyzed to be useful. In the health care field, records must travel with the patient. Furthermore, we need an explosion of data analysis applications in order to drive diagnosis, public health planning, and research into new treatments.

Interoperability is a common mantra these days in talking about electronic health records, but I don’t think the power and urgency of record formats can be conveyed in eight-syllable words. It can be conveyed better by a site that uses data about hospital procedures, costs, and patient satisfaction to help consumers choose a desirable hospital. Or an app that might prevent a million heart attacks and strokes.

Data-wise (or data-ignorant), doctors are stuck in the 1980s, buying proprietary record systems that don’t work together even between different departments in a hospital, or between outpatient clinics and their affiliated hospitals. Now the vendors are responding to pressures from both government and the market by promising interoperability. The federal government has taken this promise as good coin, hoping that vendors will provide windows onto their data. It never really happens. Every baby step toward opening up one field or another requires additional payments to vendors or consultants.

That’s why exchanging patient data (health information exchange) requires a multi-million dollar investment, year after year, and why most HIEs go under. And that’s why the HL7 committee, putatively responsible for defining standards for electronic health records, keeps on putting out new, complicated variations on a long history of formats that were not well enough defined to ensure compatibility among vendors.

The Direct project and perhaps the nascent RHEx RESTful exchange standard will let hospitals exchange the limited types of information that the government forces them to exchange. But it won’t create a platform (as suggested in this PDF slideshow) for the hundreds of applications we need to extract useful data from records. Nor will it open the records to the masses of data we need to start collecting. It remains to be seen whether Accountable Care Organizations, which are the latest reform in U.S. health care and are described in this video, will be able to use current standards to exchange the data that each member institution needs to coordinate care. Shahid Shah has laid out in glorious detail the elements of open data exchange in health care.

2. Reform will not succeed unless massive amounts of patient data are collected

We aren’t giving patients the most effective treatments because we just don’t know enough about what works. This extends throughout the health care system:

  • We can’t prescribe a drug tailored to the patient because we don’t collect enough data about patients and their reactions to the drug.

  • We can’t be sure drugs are safe and effective because we don’t collect data about how patients fare on those drugs.

  • We don’t see a heart attack or other crisis coming because we don’t track the vital signs of at-risk populations on a daily basis.

  • We don’t make sure patients follow through on treatment plans because we don’t track whether they take their medications and perform their exercises.

  • We don’t target people who need treatment because we don’t keep track of their risk factors.

Some institutions have adopted a holistic approach to health, but as a society there’s a huge amount more that we could do in this area. O’Reilly is hosting a conference called Strata Rx on this subject.

Leaders in the field know what health care providers could accomplish with data. A recent article even advises policy-makers to focus on the data instead of the electronic records. The question is whether providers are technically and organizationally prepped to accept it in such quantities and variety. When doctors and hospitals think they own the patients’ records, they resist putting in anything but their own notes and observations, along with lab results they order. We’ve got to change the concept of ownership, which strikes deep into their culture.

3. Reform will not succeed unless patients are in charge of their records

Doctors are currently acting in isolation, occasionally consulting with the other providers seen by their patients but rarely sharing detailed information. It falls on the patient, or a family advocate, to remember that one drug or treatment interferes with another or to remind treatment centers of follow-up plans. And any data collected by the patient remains confined to scribbled notes or (in the modern Quantified Self equivalent) a web site that’s disconnected from the official records.

Doctors don’t trust patients. They have some good reasons for this: medical records are complicated documents in which a slight rewording or typographical error can change the meaning enough to risk a life. But walling off patients from records doesn’t insulate them against errors: on the contrary, patients catch errors entered by staff all the time. So ultimately it’s better to bring the patient onto the team and educate her. If a problem with records altered by patients–deliberately or through accidental misuse–turns up down the line, digital certificates can be deployed to sign doctor records and output from devices.

The amounts of data we’re talking about get really big fast. Genomic information and radiological images, in particular, can occupy dozens of gigabytes of space. But hospitals are moving to the cloud anyway. Practice Fusion just announced that they serve 150,000 medical practitioners and that “One in four doctors selecting an EHR today chooses Practice Fusion.” So we can just hand over the keys to the patients and storage will grow along with need.

The movement for patient empowerment will take off, as experts in health reform told US government representatives, when patients are in charge of their records. To treat people, doctors will have to ask for the records, and the patients can offer the full range of treatment histories, vital signs, and observations of daily living they’ve collected. Applications will arise that can search the data for patterns and relevant facts.

Once again, the US government is trying to stimulate patient empowerment by requiring doctors to open their records to patients. But most institutions meet the formal requirements by providing portals that patients can log into, the way we can view flight reservations on airlines. We need the patients to become the pilots. We also need to give them the information they need to navigate.

4. Reform will not succeed unless providers conform to practice guidelines

Now that the government is forcing doctors to release information about outcomes, patients can start to choose doctors and hospitals that offer the best chances of success. The providers will have to apply more rigor to their activities, using checklists and more, to bring up the scores of the less successful providers. Medicine is both a science and an art, but many lag on the science (that is, doing what has been statistically proven to produce the best likely outcome), even at prestigious institutions.

Patient choice is restricted by arbitrary insurance rules, unfortunately. These also contribute to the utterly crazy difficulty of determining what a medical procedure will cost, as reported by e-Patient Dave and WBUR radio. Straightening out this problem goes way beyond the doctors and hospitals, and settling on a fair, predictable cost structure will benefit them almost as much as patients and taxpayers. Even some insurers have started to see that the system is reaching a dead end and are erecting new payment mechanisms.

5. Reform will not succeed unless providers and patients can form partnerships

I’m always talking about technologies and data in my articles, but none of that constitutes health. Just as student testing is a poor model for education, data collection is a poor model for medical care. What patients want is time to talk intensively with their providers about their needs, and providers voice the same desires.

Data and good record keeping can help us use our resources more efficiently and deal with the physician shortage, partly by spreading out jobs among other clinical staff. Computer systems can’t deal with complex and overlapping syndromes, or persuade patients to adopt practices that are good for them. Relationships will always have to be in the forefront. Health IT expert Fred Trotter says, “Time is the gas that makes the relationship go, but the technology should be focused on fuel efficiency.”

Arien Malec, former contractor for the Office of the National Coordinator, used to give a speech about the evolution of medical care. Before the revolution in antibiotics, doctors had few tools to actually cure patients, but they lived with their patients in the same community and knew their needs through and through. As we’ve improved the science of medicine, we’ve lost that personal connection. Malec argued that better records could help doctors really know their patients again. But conversations are necessary too.

The risks and rewards of a health data commons

As I wrote earlier this year in an ebook on data for the public good, while the idea of data as a currency is still in its infancy, it’s important to think about where the future is taking us and our personal data.

If the Obama administration’s smart disclosure initiatives gather steam, more citizens will be able to do more than think about personal data: they’ll be able to access their financial, health, education, or energy data. In the U.S. federal government, the Blue Button initiative, which initially enabled veterans to download personal health data, is now spreading to all federal employees and has also been adopted by private institutions like Aetna and Kaiser Permanente. Putting health data to work stands to benefit hundreds of millions of people. The Locker Project, which provides people with the ability to move and store personal data, is another approach to watch.

The promise of more access to personal data, however, is balanced by accompanying risks. Smartphones, tablets, and flash drives, after all, are lost or stolen every day. Given the potential of mhealth, big data, and health care information technology, researchers and policy makers alike are moving forward with their applications. As they do so, conversations and rulemaking about health care privacy will need to take into account not just data collection or retention but context and use.

Put simply, businesses must confront the ethical issues tied to massive aggregation and data analysis. Given that context, Fred Trotter’s post on who owns health data is a crucial read. As Fred highlights, the real issue is not ownership, per se, but “What rights do patients have regarding health care data that refers to them?”

Would, for instance, those rights include the ability to donate personal data to a data commons, much in the same way organs are donated now for research? That question isn’t exactly hypothetical, as the following interview with John Wilbanks highlights.

Wilbanks, a senior fellow at the Kauffman Foundation and director of the Consent to Research Project, has been an advocate for open data and open access for years, including a stint at Creative Commons; a fellowship at the World Wide Web Consortium; and experience in the academic, business, and legislative worlds. Wilbanks will be speaking at the Strata Rx Conference in October.

Our interview, lightly edited for content and clarity, follows.

Where did you start your career? Where has it taken you?

John Wilbanks: I got into all of this, in many ways, because I studied philosophy 20 years ago. What I studied inside of philosophy was semantics. In the ’90s, that was actually sort of pointless because there wasn’t much semantic stuff happening computationally.

In the late ’90s, I started playing around with biotech data, mainly because I was dating a biologist. I was sort of shocked at how the data was being represented. It wasn’t being represented in a way that was very semantic, in my opinion. I started a software company and we ran that for a while, [and then] sold it during the crash.

I went to the World Wide Web Consortium, where I spent a year helping start their Semantic Web for Life Sciences project. While I was there, Creative Commons (CC) asked me to come and start their science project because I had known a lot of those guys. When I started my company, I was at the Berkman Center at Harvard Law School, and that’s where Creative Commons emerged from, so I knew the people. I knew the policy and I had gone off and had this bioinformatics software adventure.

I spent most of the last eight years at CC working on trying to build different commons in science. We looked at open access to scientific literature, which is probably where we had the most success because that’s copyright-centric. We looked at patents. We looked at physical laboratory materials, like stem cells in mice. We looked at different legal regimes to share those things. And we looked at data. We looked at both the technology aspects and legal aspects of sharing data and making it useful.

A couple of times over those years, we almost pivoted from science to health because science is so institutional that it’s really hard for any of the individual players to create sharing systems. It’s not like software, where anyone with a PC and an Internet connection can contribute to free software, or Flickr, where anybody with a digital camera can license something under CC. Most scientists are actually restricted by their institutions. They can’t share, even if they want to.

Health kept being interesting because it was the individual patients who had a motivation to actually create something different than the system did. At the same time, we were watching and seeing the capacity of individuals to capture data about themselves exploding. So, at the same time that the capacity of the system to capture data about you exploded, your own capacity to capture data exploded.

That, to me, started taking on some of the interesting contours that make Creative Commons successful, which was that you didn’t need a large number of people. You didn’t need a very large percentage of Wikipedia users to create Wikipedia. You didn’t need a large percentage of free software users to create free software. If this capacity to generate data about your health was exploding, you didn’t need a very large percentage of those people to create an awesome data resource: you needed to create the legal and technical systems for the people who did choose to share to make that sharing useful.

Since Creative Commons is really a copyright-centric organization, I left because the power on which you’re going to build a commons of health data is going to be privacy power, not copyright power. What I do now is work on informed consent, which is the legal system you need to work with instead of copyright licenses, as well as the technologies that then store, clean, and forward user-generated data to computational health and computational disease research.

What are the major barriers to people being able to donate their data in the same way they might donate their organs?

John Wilbanks: Right now, it looks an awful lot like getting onto the Internet before there was the web. The big ISPs kind of dominated the early adopters of computer technologies. You had AOL. You had CompuServe. You had Prodigy. And they didn’t communicate with each other. You couldn’t send email from AOL to CompuServe.

What you have now depends on the kind of data. If the data that interests you is your genotype, you’re probably a 23andMe customer and you’ve got a bunch of your data at 23andMe. If you are the kind of person who has a chronic illness and likes to share information about that illness, you’re probably a customer at PatientsLikeMe. But those two systems don’t interoperate. You can’t send data from one to the other very effectively or really at all.

On top of that, the system has data about you. Your insurance company has your billing records. Your physician has your medical records. Your pharmacy has your pharmacy records. And if you do quantified self, you’ve got your own set of data streams. You’ve got your Fitbit, the data coming off of your smartphone, and your meal data.

Almost all of these are basically populating different silos. In some cases, you have the right to download certain pieces of the data. For the most part, you don’t. It’s really hard for you, as an individual, to build your own, multidimensional picture of your data, whereas it’s actually fairly easy for all of those companies to sell your data to one another. There’s not a lot of technology that lets you share.

What are some of the early signals we’re seeing about data usage moving into actual regulatory language?

John Wilbanks: The regulatory language actually makes it fairly hard to do contextual privacy waiving, in a Creative Commons sense. It’s hard to do granular permissions around privacy in the way you can do granular conditional copyright grants because you don’t have intellectual property. The only legal tool you have is a contract, and the contracts don’t have a lot of teeth.

It’s pretty hard to do anything beyond a gift. It’s more like organ donation, where you don’t get to decide where the organs go. What I’m working on is basically a donation, not a conditional gift. The regulatory environment makes it quite hard to do anything besides that.

There was a public comment period that just finished. It’s an announcement of proposed rulemaking on what’s called the Common Rule, which is the Department of Health and Human Services privacy language. It was looking to re-examine the rules around letting de-identified data or anonymized data out for widespread use. They got a bunch of comments.

There’s controversy as to how de-identified data can actually be and still be useful. There is going to be, probably, a three-to-five year process where they rewrite the Common Rule and it’ll be more modern. No one knows how modern, but it will be at least more modern when that finishes.

Then there’s another piece in the US — HIPAA — which creates a totally separate regime. In some ways, it is the same as the Common Rule, but not always. I don’t think that’s going to get opened up. The way HIPAA works is that they have 17 direct identifiers that are labeled as identifying information. If you strip those out, it’s considered de-identified.

There’s an 18th bucket, which is anything else that can reasonably identify people. It’s really hard to hit. Right now, your genome is not considered to fall under that. I would be willing to bet within a year or two, it will be.

From a regulatory perspective, you’ve got these overlapping regimes that don’t quite fit and both of them are moving targets. That creates a lot of uncertainty from an investment perspective or from an analytics perspective.
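
To make the de-identification mechanics Wilbanks describes (HIPAA’s Safe Harbor approach) a little more concrete, here is a toy Python sketch of stripping direct identifiers from a record. The field list is a hypothetical subset, not the regulatory list, and it does nothing about the catch-all bucket he mentions.

    # Toy sketch of a Safe Harbor-style scrub: drop direct identifiers and
    # coarsen dates and geography before sharing a record. The field names
    # are a hypothetical subset, not the regulatory list, and nothing here
    # handles the catch-all bucket of "anything else that could identify".
    DIRECT_IDENTIFIER_FIELDS = {
        "name", "street_address", "phone", "email", "ssn",
        "medical_record_number", "health_plan_id", "full_face_photo",
    }

    def deidentify(record: dict) -> dict:
        clean = {k: v for k, v in record.items()
                 if k not in DIRECT_IDENTIFIER_FIELDS}
        if "birth_date" in clean:          # keep only the year
            clean["birth_year"] = clean.pop("birth_date")[:4]
        if "zip" in clean:                 # keep only the 3-digit prefix
            clean["zip3"] = clean.pop("zip")[:3]
        return clean

    record = {"name": "Jane Doe", "ssn": "000-00-0000",
              "birth_date": "1970-05-01", "zip": "94301",
              "diagnosis": "type 2 diabetes"}
    print(deidentify(record))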

How are you thinking about a “health data commons,” in terms of weighing potential risks against potential social good?

John Wilbanks: I think that that’s a personal judgment as to the risk-benefit decision. Part of the difficulty is that the regulations are very syntactic — “This is what re-identification is” — whereas the concept of harm, benefit, or risk is actually something that’s deeply personal. If you are sick, if you have cancer or a rare disease, you have a very different idea of what risk is compared to somebody who thinks of him or herself as healthy.

What we see — and this is borne out in the Framingham Heart Study and all sorts of other longitudinal surveys — is that people’s attitudes toward risk and benefit change depending on their circumstances. Their own context really affects what they think is risky and what they think isn’t risky.

I believe that the early data donors are likely to be people for whom there isn’t a lot of risk perceived because the health system already knows that they’re sick. The health system is already denying them coverage, denying their requests for PET scans, denying their requests for access to care. That’s based on actuarial tables, not on their personal data. It’s based on their medical history.

If you’re in that group of people, then the perceived risk is actually pretty low compared to the idea that your data might actually get used or to the idea that you’re no longer passive. Even if it’s just a donation, you’re doing something outside of the system that’s accelerating the odds of getting something discovered. I think that’s the natural group.

If you think back to the numbers of users who are required to create free software or Wikipedia, to create a cultural commons, a very low percentage is needed to create a useful resource.

Depending on who you talk to, somewhere between 5% and 10% of all Americans either have a rare disease, have it in their first-order family, or have a friend with a rare disease. Each individual disease might not have very many people suffering from it, but if you net them all up, it’s a lot of people. Getting several hundred thousand to a few million people enrolled is not an outrageous idea.

When you look at the existing examples of where such commons have come together, what have been the most important concrete positive outcomes for society?

John Wilbanks: I don’t think we have really even started to see them because most people don’t have computable data about themselves. Most people, if they have any data about themselves, have scans of their medical records.

What we really know is that there’s an opportunity cost to not trying, which is that the existing system is really inefficient, very bad at discovering drugs, and very bad at getting those drugs to market in a timely fashion.

That’s one of the reasons we’re doing this as an experiment. We would like to see exactly how effective big computational approaches are on health data. The problem is that there are two ways to get there.

One is through a set of monopoly companies coming together and working together. That’s how semiconductors work. The other is through an open network approach. There’s not a lot of evidence that things besides these two approaches work. Government intervention is probably not going to work.

Obviously, I come down on the open network side. But there’s an implicit belief, I think, both in the people who are pushing the cooperating monopolies approach and the people who are pushing the open networks approach, that there’s enormous power in the big-data-driven approach. We’re just leaving that on the table right now by not having enough data aggregated.

The benefits to health that will come out will be the ability to increasingly, by looking at a multidimensional picture of a person, predict with some confidence whether or not a drug will work, or whether they’re going to get sick, or how sick they’re going to get, or what lifestyle changes they can make to mitigate an illness. Right now, basically, we really don’t know very much.

Pretty Simple Data Privacy

John Wilbanks discussed “Pretty Simple Data Privacy” during a Strata Online Conference in January 2012. His presentation begins at the 7:18 mark of the session video.

Strata Rx — Strata Rx, being held Oct. 16-17 in San Francisco, is the first conference to bring data science to the urgent issues confronting health care.

Save 20% on registration with the code RADAR20

Photo: Science Commons

August 03 2012

Palo Alto looks to use open data to embrace ‘city as a platform’

In the 21st century, one of the strategies cities around the world are embracing to improve services, increase accountability and stimulate economic activity is to publish open data online. The vision for New York City as a data platform earned wider attention last year, when the Big Apple’s first chief digital officer, Rachel Sterne, pitched the idea to the public.

This week, the city of Palo Alto in California joined more than a dozen cities around the United States and the globe when it launched its own open data platform. The platform includes an application programming interface (API) that enables direct access to open government data, published as JSON over a RESTful interface. Datasets can also be embedded elsewhere on the web, much like YouTube videos.
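
For developers, the access pattern is simple: request a dataset over HTTP and parse the JSON. The endpoint and dataset name in the sketch below are placeholders chosen for illustration, not Palo Alto’s actual URLs.

    # Illustrative only: fetch a dataset from a city open-data API that
    # serves JSON over REST. The URL and dataset name are placeholders,
    # not Palo Alto's actual endpoints.
    import json
    from urllib.request import urlopen

    DATASET_URL = "https://opendata.example-city.gov/api/v1/datasets/city-trees.json"

    with urlopen(DATASET_URL) as response:
        dataset = json.load(response)

    # Print a few records to see what the city published.
    for record in dataset.get("records", [])[:5]:
        print(record)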

“We’re excited to bring the value of Open Data to our community. It is a natural complement to our goal of becoming a leading digital city and a connected community,” said James Keene, Palo Alto City Manager, in a prepared statement. “By making valuable datasets easily available to our residents, we’re further removing the barriers to a more inclusive and transparent local government here in Palo Alto.”

The city initially published open datasets that include the 2010 census data, pavement condition, city tree locations, park locations, bicycle paths and hiking trails, creek water level, rainfall and utility data. Open data about Palo Alto budgets, campaign finance, government salaries, regulations, licensing, or performance — which would all offer more insight into traditional metrics for government accountability — were not part of this first release.

“We are delighted to work with a local, innovative Silicon Valley start-up,” said Dr. Jonathan Reichental, Palo Alto’s chief information officer, in a prepared statement. (Junar’s U.S. offices are in Palo Alto.) “Rather than just publishing lists of datasets, the cloud-based Junar platform has enhancement and visualization capabilities that make the data useful even before it is downloaded or consumed by a software application.”

Notably, the city chose to use Junar, a Chilean software company that raised $1.2 million in funding in May 2012. Junar provides data access in the cloud through the software-as-a-service model. There’s now a more competitive marketplace for open data platforms than has existed in years past, with a new venture-backed startup joining the space.

“The City of Palo Alto joins a group of forward-thinking organizations that are using Open Data as a foundation for more efficient delivery of services, information, and enabling innovation,” said Diego May, CEO and co-founder of Junar, in a prepared statement. “By opening data with the Junar Platform, the City of Palo Alto is exposing and sharing valuable data assets and is also empowering citizens to use and create new applications and services.”

The success or failure of Palo Alto’s push to become a more digital city might be more fairly judged in a year, when measuring downstream consumption of its open data in applications and services by citizens — or by government in increasing productivity — will be possible.

In the meantime, Reichental (who may be familiar to Radar readers as O’Reilly Media’s former CIO) provided more perspective via email on what he’s up to in Palo Alto.

What does it mean for a “city to be a platform?”

Reichental: We think of this as both a broad metaphor and a practicality. Not only do our citizens want to be plugged in to our government operations — open data being one way to achieve this among others — but we want our community and other interested parties to build capability on top of our existing data and services. Recognizing the increasing limitations of local government means you have to find creative ways to extend it and engage with those that have the skills and resources to build a rich and seamless public-private partnership.

Why launch an open data initiative now? What success stories convinced you to make the investment?

Reichental: It’s a response to our community’s desire to easily access their data and our want as a City to unleash the data for better community decision-making and solution development.

We also believe that over time an open data portal will become a standard government offering. Palo Alto wants to be ahead of the curve and create a positive model for other communities.

Seldom does a week pass when a software engineer in our community doesn’t ask me for access to a large dataset to build an app. Earlier this year, the City participated in a hackathon at Stanford University that produced a prototype web application in less than 24 hours. We provided the data. They provided the skills. The results were so impressive, we were convinced then that we should scale this model.

How much work did it take to make your data more open? Is it machine-readable? What format? What cost was involved?

Reichental: We’re experimenting with running our IT department like a start-up, so we’re moving fast. We went from vendor selection to live in just a few weeks. The data in our platform can be exported as a CSV or to a Google Spreadsheet. In addition, we provide an API for direct access to the data. The bulk of the cost was internal staff time. The actual software, which is cloud-based, was under $5000 for the first year.

What are the best examples of open data initiatives delivering sustainable services to citizens?

Reichental: Too many to mention. I really like what they’re doing in San Francisco (http://apps.sfgov.org/showcase/) but there are amazing things happening on data.gov and in New York City. Lots of other cities in the US doing neat things. The UK has done some high-quality budget accountability work.

Are you consuming your own open data?

Reichental: You bet we are.

Why does having an API matter?

Reichental: We believe the main advantage of having an API is for app development. Of course, there will be other use cases that we can’t even think of right now.

Why did you choose Junar instead of Socrata, CKAN or the OGPL from the U.S. federal government?

Reichental: We did review most of the products in the marketplace including some open source solutions. Each had merits. We ultimately decided on Junar for a 1-year commitment, as it seemed to strike the right balance of features, cost, and vision alignment.

Palo Alto has a couple developers in it. How are you engaging them to work with your data?

Reichental: That’s quite the understatement! The buzz already in the developer community is palpable. We’ve been swamped with requests and ideas already. We think one of the first places we’ll see good usage is in the myriad of hackathons/code jams held in the area.

What are the conditions for using your data or making apps?

Reichental: Our terms and conditions are straightforward. The data can be freely used by anyone for almost any purpose, but the condition of use is that the City has no liability or relationship with the use of the data or any derivative.

You told Mashable that you’re trying to act like a “lean startup.” What does that mean, in practice?

Reichental: This initiative is a good example. Rather than spend time making the go-live product perfect, we went for speed-to-market with the minimally viable solution to get community feedback. We’ll use that feedback to quickly improve on the solution.

With the recent go-live of our redesigned public website, we launched it initially as a beta site, warts and all. We received lots of valuable feedback, made many of the suggested changes, and then cut over from the beta to production. We ended up with a better product.

Our intent is to get more useful capability out to our community and City staff in shorter time. We want to function as close as we can with the community that we serve. And that’s a lot of amazing start-ups.

August 01 2012

Big data is our generation’s civil rights issue, and we don’t know it

Data doesn’t invade people’s lives. Lack of control over how it’s used does.

What’s really driving so-called big data isn’t the volume of information. It turns out big data doesn’t have to be all that big. Rather, it’s about a reconsideration of the fundamental economics of analyzing data.

For decades, there’s been a fundamental tension between three attributes of databases. You can have the data fast; you can have it big; or you can have it varied. The catch is, you can’t have all three at once.

The big data trifecta.

I’d first heard this as the “three V’s of data”: Volume, Variety, and Velocity. Traditionally, getting two was easy but getting three was very, very, very expensive.

The advent of clouds, platforms like Hadoop, and the inexorable march of Moore’s Law means that now, analyzing data is trivially inexpensive. And when things become so cheap that they’re practically free, big changes happen — just look at the advent of steam power, or the copying of digital music, or the rise of home printing. Abundance replaces scarcity, and we invent new business models.

In the old, data-is-scarce model, companies had to decide what to collect first, and then collect it. A traditional enterprise data warehouse might have tracked sales of widgets by color, region, and size. This act of deciding what to store and how to store it is called designing the schema, and in many ways, it’s the moment where someone decides what the data is about. It’s the instant of context.

That needs repeating:

You decide what data is about the moment you define its schema.

With the new, data-is-abundant model, we collect first and ask questions later. The schema comes after the collection. Indeed, big data success stories like Splunk, Palantir, and others are prized because of their ability to make sense of content well after it’s been collected — sometimes called a schema-less query. This means we collect information long before we decide what it’s for.
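
A toy Python sketch of that collect-first, schema-later pattern, using invented events: the raw records are stored untouched, and a “schema” only appears when a question is finally asked of them.

    # Toy sketch of "collect first, ask questions later": store raw,
    # heterogeneous events untouched, and only impose structure when a
    # question is asked. The events and fields are invented.
    raw_events = [
        {"user": "a1", "action": "purchase", "item": "widget", "color": "red", "price": 9.99},
        {"user": "b2", "action": "view", "item": "gadget"},
        {"user": "a1", "action": "purchase", "item": "widget", "color": "blue", "price": 9.99},
        {"user": "c3", "action": "purchase", "item": "gizmo", "price": 24.50},
    ]

    # The "schema" appears only at query time: here we decide, after the
    # fact, that purchases grouped by some field are what the data is about.
    def revenue_by(events, field):
        totals = {}
        for event in events:
            if event.get("action") != "purchase":
                continue
            key = event.get(field, "unknown")
            totals[key] = totals.get(key, 0.0) + event.get("price", 0.0)
        return totals

    print(revenue_by(raw_events, "color"))
    print(revenue_by(raw_events, "item"))  # a different question, same raw data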

And this is a dangerous thing.

When bank managers tried to restrict loans to residents of certain areas (a practice known as redlining), Congress stepped in to stop it with the Fair Housing Act of 1968. Legislators were able to legislate against the discrimination, making it illegal to change loan policy based on someone’s race.

Home Owners’ Loan Corporation map showing redlining of “hazardous” districts in 1936.


“Personalization” is another word for discrimination. We’re not discriminating if we tailor things to you based on what we know about you — right? That’s just better service.

In one case, American Express used purchase history to adjust credit limits based on where a customer shopped, despite his excellent credit history:

Johnson says his jaw dropped when he read one of the reasons American Express gave for lowering his credit limit: “Other customers who have used their card at establishments where you recently shopped have a poor repayment history with American Express.”

Some of the things white men liked in 2010, according to OKCupid.

We’re seeing the start of this slippery slope everywhere from tailored credit-card limits like this one to car insurance based on driver profiles. In this regard, big data is a civil rights issue, but it’s one that society in general is ill-equipped to deal with.

We’re great at using taste to predict things about people. OKCupid’s 2010 blog post “The Real Stuff White People Like” showed just how easily we can use information to guess at race. It’s a real eye-opener (and the guys who wrote it didn’t include everything they learned — some of it was a bit too controversial). They simply looked at the words one group used that other groups didn’t often use. The result was a list of “trigger” words for a particular race or gender.
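
The underlying technique can be sketched in a few lines of Python. The two tiny corpora below are invented stand-ins for the groups; the point is only the relative-frequency comparison.

    # Rough sketch of the "trigger words" idea: rank the words one group
    # uses far more often than another. The two tiny corpora are invented;
    # only the relative-frequency comparison matters.
    from collections import Counter

    group_a_posts = ["camping and craft beer this weekend",
                     "ultimate frisbee then craft beer with friends"]
    group_b_posts = ["sunday dinner with family and gospel choir",
                     "gospel choir rehearsal then family dinner"]

    def word_freq(posts):
        counts = Counter(word for post in posts for word in post.split())
        total = sum(counts.values())
        return {word: count / total for word, count in counts.items()}

    freq_a, freq_b = word_freq(group_a_posts), word_freq(group_b_posts)

    # Words over-represented in group A relative to group B (the small
    # constant avoids dividing by zero for words group B never uses).
    triggers = sorted(freq_a,
                      key=lambda w: freq_a[w] / (freq_b.get(w, 0) + 1e-6),
                      reverse=True)
    print(triggers[:5])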

Now run this backwards. If I know you like these things, or see you mention them in blog posts, on Facebook, or in tweets, then there’s a good chance I know your gender and your race, and maybe even your religion and your sexual orientation. And that I can personalize my marketing efforts towards you.

That makes it a civil rights issue.

If I collect information on the music you listen to, you might assume I will use that data in order to suggest new songs, or share it with your friends. But instead, I could use it to guess at your racial background. And then I could use that data to deny you a loan.

Want another example? Check out Private Data In Public Ways, something I wrote a few months ago after seeing a talk at Big Data London, which discusses how publicly available last name information can be used to generate racial boundary maps:

Screen from the Mapping London project.


This TED talk by Malte Spitz does a great job of explaining the challenges of tracking citizens today, and he speculates about whether the Berlin Wall would ever have come down if the Stasi had had access to phone records in the way today’s governments do.

So how do we regulate the way data is used?

The only way to deal with this properly is to somehow link what the data is with how it can be used. I might, for example, say that my musical tastes should be used for song recommendation, but not for banking decisions.
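
One way to picture that link is to tag each record with the purposes its owner has allowed and make every consumer declare a purpose before reading it. The Python sketch below is purely hypothetical, not a description of any existing system.

    # Hypothetical sketch: each record carries the purposes its owner has
    # allowed, and every consumer must declare a purpose before reading it.
    # Purpose names and fields are invented for illustration.
    from dataclasses import dataclass, field

    @dataclass
    class TaggedRecord:
        owner: str
        data: dict
        allowed_purposes: set = field(default_factory=set)

    class PurposeError(PermissionError):
        pass

    def read(record: TaggedRecord, purpose: str) -> dict:
        if purpose not in record.allowed_purposes:
            raise PurposeError(f"{purpose!r} not permitted by {record.owner}")
        return record.data

    listening = TaggedRecord(
        owner="me",
        data={"top_genres": ["soul", "gospel"]},
        allowed_purposes={"song_recommendation"},
    )

    print(read(listening, "song_recommendation"))   # allowed
    try:
        read(listening, "credit_decision")          # refused: wrong purpose
    except PurposeError as err:
        print("blocked:", err)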

Tying data to permissions can be done through encryption, which is slow, riddled with DRM, burdensome, hard to implement, and bad for innovation. Or it can be done through legislation, which has about as much chance of success as regulating spam: it feels great, but it’s damned hard to enforce.

There are brilliant examples of how a quantified society can improve the way we live, love, work, and play. Big data helps detect disease outbreaks, improve how students learn, reveal political partisanship, and save hundreds of millions of dollars for commuters — to pick just four examples. These are benefits we simply can’t ignore as we try to survive on a planet bursting with people and shaken by climate and energy crises.

But governments need to balance their reliance on data with checks on how that reliance erodes privacy and creates civil and moral issues we haven’t thought through. It’s something that most of the electorate isn’t thinking about, and yet it affects every purchase they make.

This should be fun.

This post originally appeared on Solve for Interesting. This version has been lightly edited.

Strata Conference + Hadoop World — The O’Reilly Strata Conference, being held Oct. 23-25 in New York City, explores the changes brought to technology and business by big data, data science, and pervasive computing. This year, Strata has joined forces with Hadoop World.

Save 20% on registration with the code RADAR20


July 31 2012

On email privacy, Twitter’s ToS and owning your own platform

The existential challenge for the Internet and society remains that the technology platforms that constitute what many people regard as the new public square are owned by private companies. If you missed the news, Guy Adams, a journalist at the Independent newspaper in England, was suspended by Twitter after he tweeted the corporate email address of an NBC executive, Gary Zenkel. Zenkel is in charge of NBC’s Olympics coverage.

Like many other observers, I assumed that NBC had seen the tweet and filed an objection with Twitter about the email address being tweeted. The email address, after all, was shared with the exhortation to Adams’ followers to write to Zenkel about frustrations with NBC’s coverage of the Olympics, a number of which Jim Stogdill memorably expressed here at Radar and Heidi Moore compared to Wall Street’s hubris.

Today, Guy Adams published two more columns. The first shared his correspondence with Twitter, including a copy of a written statement from an NBC spokesman, Christopher McCloskey, which indicated that NBC’s social media department was alerted to Adams’ tweet by Twitter. The second column, which followed the @GuyAdams account being reinstated, indicated that NBC had withdrawn its original complaint. Adams tweeted Twitter’s statement: “we have just received an update from the complainant retracting their original request. Therefore your account has been unsuspended.”

Since the account is back up, is the case over? A tempest in a Twitter teapot? Well, not so much. I see at least three important issues here, related to electronic privacy, Twitter’s terms of service, censorship, and how many people think about social media and the Web.

Is a corporate email address private?

Washington Post media critic Erik Wemple is at a loss to explain how tweeting this corporate email address rises to the level of disclosing private information.

Can a corporate email address based upon a known nomenclature and used by tens of thousands of people be “private”? A 2010 Supreme Court ruling on privacy established that electronic messages sent on a corporate server are not private, at least from the employer. But a corporate email address itself? Hmm. Yes, the corporate email address Adams tweeted was available online prior to the tweet, if you knew how to find it in a Web search. Danny Sullivan, however, made a strong case that the email address wasn’t widely available in Google, although Adams said he was able to find it in under a minute. There’s also an argument that because an address can be guessed, it is public. Jeff Jarvis and other journalists are saying it isn’t private, using the logic that because NBC’s email nomenclature is standardized, the address can be easily deduced. I “co-signed” Reuters’ Jack Shafer’s tweet making that assertion.

The question to ask privacy experts, then, is whether a corporate email address is “private” or not.

Fred Cate, a law professor at the Indiana University Maurer School of Law, however, commented via email that “a corporate email address can be private, in the sense that a company protects it and has a legitimate interest in it not being disclosed.” Can it lose its private character due to unauthorized disclosure online? “The answer is probably and regrettably ‘it depends,’” he wrote. “It depends on the breadth of the unauthorized dissemination and the sensitivity of the information and the likely harm if more widely disclosed. An email address that has been disclosed in public blogs would seem fairly widely available, the information is hardly sensitive, and any harm can be avoided by changing the address, so the argument for privacy seems pretty weak to me.”

Danielle Citron, professor of law at the University of Maryland, suggests that because Zenkel did not publish his corporate email address on NBC’s site, there’s an argument, though a weak one, that NBC’s corporate email addresses are private information disclosed only to a select audience.

“Under privacy tort common law, an unpublished home address has been deemed by courts to be private for purposes of the public disclosure of private fact tort if the publication appeared online, even though many people know the address offline,” wrote Citron in an email. “This arose in a cyber harassment case involving privacy torts. Privacy is not a binary concept; that is, one can have privacy in public. In Nader v. GM, the NY [Court of Appeals] found that GM’s zealous surveillance of Ralph Nader, including looking over his shoulder while he took out money from the bank, constituted intrusion upon his seclusion, even though he was in public. Now, the court did not find surveillance itself a privacy violation. It was the fact that the surveillance yielded information Nader would have thought no one could see, that is, how much he took out of the bank machine.”

Email is, however, a different case than home addresses, as Citron allowed. “Far fewer people know one’s home address — neighbors and friends — if a home address is unlisted, whereas email addresses are shared with countless people and there is no analogous means to keep them unpublished like home addresses and phone numbers,” Citron wrote. “These qualities may indeed make it a tough sell to suggest that the email address is private.”

Perhaps ironically, the NBC executive’s email address has now been published by many major media outlets and blogs, making it one of the most public email addresses on the planet. Hello, Streisand effect.

Did Twitter break its own Terms of Service?

Specifically, was tweeting someone’s publicly available *work* email address a violation of Twitter’s rules? To a large extent, this hinges upon the answer to the first issue, privacy.

If a given email address is already public — and this one had been available online for over a year — one line of thinking goes that it can’t be private. Twitter’s position is that it considers a corporate email address to be private and that sharing it therefore breaks the ToS. Alex Macgillivray, Twitter’s general counsel, clarified the company’s approach to trust and safety in a post on Twitter’s blog:

We’ve seen a lot of commentary about whether we should have considered a corporate email address to be private information. There are many individuals who may use their work email address for a variety of personal reasons — and they may not. Our Trust and Safety team does not have insight into the use of every user’s email address, and we need a policy that we can implement across all of our users in every instance.

“I do not think privacy can be defined for third parties by terms of service,” wrote Cate, via email. “If Twitter wants to say that the company will treat its users’ email addresses as private it’s fine, but I don’t think it can convincingly say that  other email addresses available in public are suddenly private.”

“If the corporate email was published online previously by the company or by himself, it likely would not amount to public disclosure of private fact under tort law and likely would not meet the strict terms of the TOS, which says nonpublic. Twitter’s policy about email addresses stems from its judgment that people should not use its service to publicize non-public email addresses, even though such an address is not a secret and countless people in communication with the person know it,” wrote Citron. “Unless Twitter says explicitly, ‘we are adopting this rule for privacy reasons,’ there are reasons that have nothing to do with privacy that might animate that decision, such as preventing fraud.”

The bottom line is that Twitter is a private company with a Terms of Service. It’s not a public utility, as Dave Winer highlighted yesterday, following up today with another argument for a distributed, open system for microblogging. Simply put, there *are* principles for use of Twitter’s platform. They’re in the Rules, Terms of Service and strictures around its API, the evolution of which was recently walked through over at the real-time report.

Ultimately, private companies are bound by the regulations of the FTC or FCC or other relevant regulatory bodies, along with their own rules, not the wishes of users. If Twitter’s users don’t like them or lose trust, their option is to stop using the service or complain loudly. I certainly agree with Jillian C. York, who argues at the EFF that the Guy Adams case demonstrates that Twitter needs a more robust appeals process.

There’s also the question of how the ToS is applied to celebrities on Twitter, who are an attraction for millions of users. In the past, Justin Bieber tweeted someone else’s personal phone number, and Spike Lee tweeted a home address, causing someone in Florida to receive death threats. Neither celebrity was suspended. In one case, @QueenOfSpain had to get a court order to see any action taken on death threats made on Twitter. Twitter’s Safety team has absolutely taken action in some cases, but it certainly might look like there’s a different standard here. The question to ask is whether tickets were filed against Lee or Bieber by the people who were personally affected. Without a ticket, there would be no suspension. Twitter has not commented on that count, under its policy of not commenting about individual users.

Own your own platform

In the wake of this move, there should be some careful consideration by journalists who use Twitter about where and how they do it. Macgillivray did explain where Twitter went awry, confirming that someone on the media partnership side of the house flagged a tweet to NBC and reaffirming the principle that Twitter does not remove content on demand:

…we want to apologize for the part of this story that we did mess up. The team working closely with NBC around our Olympics partnership did proactively identify a Tweet that was in violation of the Twitter Rules and encouraged them to file a support ticket with our Trust and Safety team to report the violation, as has now been reported publicly.

Our Trust and Safety team did not know that part of the story and acted on the report as they would any other.

As I stated earlier, we do not proactively report or remove content on behalf of other users no matter who they are. This behavior is not acceptable and undermines the trust our users have in us. We should not and cannot be in the business of proactively monitoring and flagging content, no matter who the user is — whether a business partner, celebrity or friend. As of earlier today, the account has been unsuspended, and we will actively work to ensure this does not happen again.

As I've written elsewhere, looking at Twitter, censorship and Internet freedom, my sense is that, of all of the major social media players, Twitter has been one of the leaders in the technology community in sticking up for its users. It's taken some notable stands, particularly in fighting to make public a U.S. Justice Department subpoena for user data.

"Twitter is so hands off, only stepping in to ban people in really narrow circumstances like impersonation and tweeting personal information like non-public email addresses. It also bans impersonation and harassment understood VERY NARROWLY, as credible threats of imminent physical harm," wrote Citron. "That is Twitter's choice. By my lights, and from conversations with their safety folks, they are very deferential to speech. Indeed, their whole policy is 'we are a speech platform,' implying that what transpires there is public speech and hence subject to great latitude."

Much of the good will Twitter had built up, however, may have evaporated after this week. My perspective is that this episode absolutely drives home (again) the need to own your own platform online, particularly for media entities and government. While there is clearly enormous utility in “going where the people are” online to participate in conversations, share news and listen to learn what’s happening, that activity doesn’t come without strings or terms of service.

To be clear, I don't plan on leaving Twitter any time soon. I do think that Macgillivray's explanation highlights the need for the company to get its internal house in order, in terms of a church-and-state relationship between its trust and safety team, which makes suspension decisions, and its media partnerships team, which works with parties that might be aggrieved by what Twitter users are tweeting. If Twitter becomes a media company — a future this NBC Olympics deal suggests — such distinctions could be just as important for it as the traditional "church and state" separation between editorial and business operations at newspaper companies and broadcasters.

Owning your own platform does mean that a media organization could be censored by a distributed denial of service (DDoS) attack (a tactic that has been used in Russia), and that it must get a domain name, set up Web hosting and run a content management system — but the barrier to entry on all three counts has fallen radically.

The open Internet and World Wide Web, fragile and insecure as they may seem at times, remain the surest way to publish what you want and have it remain online, accessible to the networked world. When you own your own platform online, it’s much harder for a third party company nervous about the reaction of advertisers or media partners to take your voice away.

July 30 2012

Mobile participatory budgeting helps raise tax revenues in Congo

In a world awash in data, connected by social networks and focused on the next big thing, stories about genuine innovation get buried behind the newest shiny app or global development initiative. For billions of people around the world, the reality is that inequality in resources, access to education or clean water, or functional local government remain serious concerns.

South Kivu, a province on the eastern border of the Democratic Republic of Congo, has been devastated by the wars that have ravaged the region over the past decade.

Despite that grim context, a pilot program has borne unexpected fruit. Mobile technology, civic participation, smarter governance and systems thinking combined not only to give citizens more of a voice in their government but also to increase tax revenues. Sometimes, positive change happens where one might least expect it. The video below tells the story. After the jump, World Bank experts talk about the story behind the story.

"Beyond creating a more inclusive environment, the beauty of the project in South Kivu is that citizen participation translates into demonstrated and measurable results on mobilizing more public funds for services for the poor," said Boris Weber, team leader for ICT4Gov at the World Bank Institute for Open Government, in an interview in Washington. "This makes a strong case when we ask ourselves where the return on investment of open government approaches is."

Gathering support

The World Bank acted as a convener, in this context, said Tiago Peixoto, an open government specialist at the World Bank, in an interview. The Bank brought together the provincial government and local government to identify the governance issues and propose strategies to address them.

The challenge was straightforward: the South Kivu provincial government needed to relay revenues to lower levels of government to fund services but wasn't doing so, both because of a lack of incentives and because of concerns about how the funds would be spent.

What came out of a four-day meeting was a request for a feasibility study on participatory budgeting from the World Bank, said Peixoto.

Initially, the Bank found good conditions with respect to civil society, which remained strong despite years of war. They found a participatory budgeting expert in Cameroon, who came and ran workshops with local governments on how the process would work. They chose some cities as control groups to introduce some scientific rigor.

They shared scholarship on participatory budgeting with all the stakeholders, emphasizing that research shows participation is more effective than penalties in taxation compliance.

“It’s like the process of ownership,” said Peixoto in our interview. “Once you see where money is going, you see how government can work. When you see a wish list, where some things happen and others do not because people aren’t paying, it changes perspectives.”

Hitting the books

When asked to provide more context on the scholarship in this area, Peixoto obliged, via email.

“As shown in a cross-national analysis by Torgler & Schneider (2009), citizens are more willing to pay taxes when they perceive that their preferences are properly taken into account by public institutions,” he wrote.

“Along these lines, the existing evidence suggests the existence of a causal relationship between citizen participation processes and levels of tax compliance. For instance, studies show that Swiss cantons with higher levels of democratic participation present lower tax evasion rates (Pommerehne & Weck-Hannemann 1996, Pommerehne & Frey 1992, Frey 1997). This effect is particularly strong when it comes to direct citizen participation in budgetary decisions, i.e. fiscal referendum (Frey & Feld 2002, Frey et al. 2004, Torgler 2005):

“The fiscal exchange relationship between taxpayers and the state also depends on the politico-economic framework within which the government acts. It has, in particular, been argued that the extent of citizens’ political participation rights systematically affects the kind of tax policy pursued by the government and its tax authority. (…) The more direct democratic the political decision-making procedures of a canton are, the lower is tax evasion according to these studies” (Feld & Frey 2005:29)

“According to his (Torgler) estimates, tax morale is significantly higher in direct democratic cantons. Distinguishing between different instruments of direct democracy, he finds that the fiscal referendum has the highest positive influence on tax morale” (Feld & Frey 2005:19)

Participatory budgeting, which has been gaining more attention in cities in the United States as more governments implement open government initiatives, has had particular success in Brazil, pointed out Peixoto, who is native to that country.

“In the Latin American context, a number of authors have observed a similar relationship with regard to participatory budgeting processes,” wrote Peixoto.

“In the municipality of Porto Alegre (BR) for instance, Schneider and Baquero (2006) show that the adoption of PB led to a substantive increase in tax revenues. In another study Zamboni (2007) compares the performance of similar Brazilian municipalities with and without PB processes: even when controlling for other factors, the study finds a significant relationship between the existence of PB and the increase in tax revenues. Another comparative study of 25 municipalities in Latin America and Europe also finds a significant reduction in levels of tax delinquency after the adoption of PB (Cabannes 2004):

"What is the relationship between the PB process and the municipality's tax revenues? Most respondent cities indicated that the PB process entailed an increase in tax revenues and a decrease in delinquency. In Campinas, Recife and Cuenca, tax revenues increased significantly in a very few years; in Porto Alegre, property tax delinquency dropped from 20 per cent to 15 per cent and, in less than ten years, property taxes grew from 6 per cent to almost 12 per cent of the municipality's revenues. Mundo Novo, in Brazil, also emphasized the drop in tax delinquency and relates it to the transparency of public administration entailed by PB. The immediate visibility of the work and services that result from PB also tends to change the citizenry's taxpaying habits." (Cabannes 2004:36)

Presenting the mayors with the results of that research provided them a strong incentive to try participatory budgeting, emphasized Peixoto. The results from the pilot, however, provided evidence of the efficacy of the practice:

The World Bank found that tax compliance in Kabare went from 7% to 12%. In Ibanda, the impact of the pilot was even greater, with a 16-fold increase in tax compliance. After the pilot, the provincial government decided to start transferring money to local areas — but only if cities used participatory budgeting in the process.

"This was an eye-opening process," said Jean Bunani, senior counsel in the Ministry of Budget in South Kivu, in a prepared statement provided by the World Bank. Bunani was one of the beneficiaries of the project. "As a result, the province started transferring funds to local governments to start providing basic services to citizens," he said. "This had been mandated by law for years – and for years the law had been ignored."

Peixoto insisted that the results in South Kivu be interpreted with caution. "It is difficult to confirm a causal relationship between participatory budgeting and the increase in tax compliance at a scientific level thus far," he said, "but the evidence collected so far and the testimonials of local officials suggest the existence of this causality."

Mobile technology helped increase civic participation

“The way the citizens and the provincial Government of South Kivu took ownership of this project shows that technology can help build more inclusive decision making processes even in fragile and low-tech environments,” said Weber.

The vast majority of people in the area don’t have computers. Mobile phones may be one of the most important technologies to enter the region in decades – and people value them, walking long distances to generators to keep them charged.

“This is a place where they don’t have electricity for houses but they charge phones,” said Weber.

The mobile technology initiative was coordinated with the cooperation of the local mobile operators and funded by the World Bank. Purchased in bulk, one million text messages cost $10,000 — about a penny per message, so the more than 250,000 messages sent in support of the project as of February 2012 would have cost roughly $2,500 at that rate. Whenever there was a regional meeting to deliberate about where to spend budget funds, every handset under the local cellular towers would get a text message about it. And after the meeting, everyone would get a message with the results.

“The benefits from the participatory budgeting outweigh those costs enormously,” said Peixoto. “For more people to pay taxes, they need to know participatory budgeting exists. That’s the mass mobilization. There was already a substantial increase in the year 2010, when the local government started consulting grassroots organizations on an informal basis. Nevertheless, the process gains steam in 2011, in which the full methodology of participatory budgeting is really in place, with direct participation of the citizens and with the support of mobile telephony. Please note that these results are evaluated with control groups. In other words, in cities without the participatory budgeting process, the same behavior is not identified.”
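To make those broadcast mechanics concrete, here is a minimal sketch in Python of the "announce the meeting, then announce the results" pattern described above. The gateway URL, API key, parameter names and phone numbers are hypothetical placeholders — this is not the interface of any actual Congolese operator, only an illustration of the loop.

```python
# A minimal sketch of the SMS broadcast pattern described above.
# The gateway URL, API key, parameter names and phone numbers are all
# hypothetical placeholders, not any real operator's interface.
import requests

GATEWAY_URL = "https://sms-gateway.example/send"  # placeholder endpoint
API_KEY = "REPLACE_ME"

def broadcast(numbers, message):
    """Send one text message to every subscriber number in the list."""
    for number in numbers:
        requests.post(GATEWAY_URL, data={
            "api_key": API_KEY,
            "to": number,
            "text": message,
        })

# Announce a budget meeting, then follow up with the results.
subscribers = ["+243990000001", "+243990000002"]  # handsets under the local towers
broadcast(subscribers, "Participatory budget meeting Saturday, 10h, town hall.")
broadcast(subscribers, "Results: road repair and school roofing were selected.")
```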

Peixoto followed up with a review of research on public participation. "Some evidence suggests that participation may be even more effective at curbing tax evasion than traditional deterrence measures, such as fines and controls," he wrote. "At odds with conventional economic reasoning, the literature in the field of 'tax morale' suggests that citizen participation actually comes across as a better remedy for tax evasion than commonly adopted deterrence policies (e.g. Torgler 2005, Feld & Frey 2007, Feld & Torgler 2007)."

Weber noted that it's hard to know exactly what mobile penetration is in the South Kivu area. Unreliable research estimates put handset ownership at 14%, he said, and people also share devices, which means the figure is not an accurate measure of access for research purposes.

“The new global focus on results-based aid is creating strong demand for better feedback data,” said Weber.

"This type of initiative can provide it and help crowdsource the monitoring of development impact. As more governments start to make commitments as part of the Open Government Partnership, they now need to stand up to the challenge of how to engage citizens in a meaningful manner," said Weber. "This type of project provides us with valuable lessons on how donors can support governments in this effort."

Finding the ROI for open government

This project clearly shows some of the benefits associated with open government, said Peixoto, but only when citizens are involved in the process. Technology, in that context, is an enabler but is not sufficient by itself.

"These benefits are only generated when there is real engagement of the citizens," said Peixoto. "Both the politicians and those working on the ground are very convinced that the transparency of the budget in itself would not suffice to generate the results that we now observe. Nevertheless, ensuring that real engagement happens is an extremely long process, which cannot take place without having all the stakeholders involved."

In that context, making participatory budgeting work using mobile devices isn’t just about working with the mayors, regional government, development officials, citizens or telecommunications companies: it’s about systems thinking and collaboration between all of the stakeholders.

"While we spend 10% of our time convincing governments to make their budgets transparent, the other 90% is spent convincing them to let citizens have a real say on where the money is going," said Peixoto. "The road from transparency to accountability is neither obvious, nor an easy one."

He also noted that technology complements what he describes as the core components of citizen engagement: participatory design and institutional reform. Which is to say, rethinking processes and institutions comes first, followed by figuring out how to architect technology to support them. In the Congo, mobile phones supported participatory budgeting by reducing associated costs, avoiding elite capture, maintaining public engagement and raising awareness of the process, which helped gather popular support.

Exploring mobile government

South Kivu is also experimenting with mobile phone voting, including some beta tests during the budget meetings.

“Mobile voting is expected to be implemented in full-scale, enabling a large number of inhabitants to remotely participate in the process of budget allocation,” said Peixoto. “The program is now going beyond the pilots, aiming to institutionalize ICT facilitated participatory budgeting in other provinces in the DR Congo and beyond.”

Weber said that after the pilots, 100% of citizens asked preferred to vote by mobile. “With ballots, there were huge lines,” he said. “It’s really about the costs of participation. This spoke to them.”

An important next step will be finding more ways to bridge the digital divide to engage the community in an ongoing dialogue about governance, including more analog methods like “data murals.”

“We are interested in inverting the logics of innovation,” wrote Peixoto. “The replication of contents and processes from the offline world into the online world has been the focus of people working in this field for a long time. Now we are looking how we can bring elements of the online world to the offline reality of South-Kivu, where access to the Internet is extremely scarce. Hence, one of our priorities for the beginning of next year is the creation of ‘data murals’ in which budget visualizations often available in online environments will be painted on the walls of the cities. If the logic is that of ‘going where citizens already are,’ in South-Kivu we will bring data and data visualization to the streets: budget data in citizen readable format.”

For instance, in Brazil, they're painting the numbers for the budget on the wall, said Peixoto. "So, you could paint data visualizations on street walls. Fancy online data visualizations are very nice – but what if there's no Internet? You need to get creative. We want to bring data visualization to the streets, with a physical version of a dashboard where you go back and do updates as steps happen."

Peixoto expects that the experience in South Kivu will also help inform how upcoming participatory budgeting initiatives mediated with mobile technology will be implemented elsewhere in the world, from Brazil to Cameroon to the Dominican Republic.

“Organizations working with participatory budgeting in the United States and Europe have already demonstrated interest in learning from the experience of South-Kivu and the use of mobile phones,” observed Peixoto, who said that the approach is already being replicated in Cameroon, the Dominican Republic and (back) in Brazil.

“In Cameroon, we are making a randomized experiment to assess the impact of SMS as a means to mobilize citizens and avoid elite capture,” he said.

The World Bank has set up a beta website for the Cameroon project and is working on another one with the Open Knowledge Foundation, similar to "Where does my money go?" The visualizations will be used to inform the participatory budgeting process, providing citizens with the means to indicate where they'd like money to be allocated.

"This all goes to show that innovations in open government go both ways, from developing to developed countries," said Peixoto. "The fact that people are not blogging about it in English does not mean that it does not exist. Sometimes people are just too busy making it happen."

July 26 2012

Esther Dyson on health data, “preemptive healthcare” and the next big thing

If we look ahead to the next decade, it’s worth wondering whether the way we think about health and healthcare will have shifted. Will healthcare technology be a panacea? Will it drive even higher costs, creating a broader divide between digital haves and have-nots? Will opening health data empower patients or empower companies?

As ever, there will be good outcomes and bad outcomes, and not just in the medical sense. There's a great deal of ferment around the potential for mobile applications right now, from the FDA's potential decision to regulate them to a reported high abandonment rate. There are also significant questions about privacy, patient empowerment and meaningful use of electronic healthcare records.

When I've talked to U.S. CTO Todd Park or Dr. Farzad Mostashari, they've been excited about the prospect of health data fueling better dashboards and algorithms that give frontline caregivers access to critical information about the people they're looking after, providing insight at the point of contact.

Kathleen Sebelius, the U.S. Secretary for Health and Human Services, said at this year’s Health Datapalooza that venture capital investment in the Healthcare IT area is up 60 percent since 2009.

Given that context, I was more than a little curious to hear what Esther Dyson (@edyson) is thinking about when she looks at the intersection of healthcare, data and information technology.

"yes, but the sharks must love it!""yes, but the sharks must love it!"

[Photo Credit: Rick Smolan, via Esther Dyson]

Dyson, who started her career as a journalist, is now an angel investor and philanthropist. She is a strong supporter of "preemptive healthcare" — and she's putting her money where her interest lies, through her investments. She'll be speaking at the Strata Rx conference this October in San Francisco.

Our interview, which was lightly edited for content and clarity, follows.

How do you see healthcare changing?

Dyson: There are multiple perspectives. The one I've got does not invalidate the others, nor is it intended to trump any of the others, but it's the one that I focus on — and that's really "health" as opposed to "healthcare."

If you maintain good health, you can avoid healthcare. That’s one of those great and unrealizable goals, but it’s realizable in part. Any healthcare you can avoid because you’re healthy is valuable.

What I’m mostly focused on is trying to change people’s behavior. You’ll get agreement from almost everybody that eating right, not smoking, getting exercise, avoiding too much stress, and sleeping a lot are good for your health.

The challenge is what makes people do those things, and that's where there's a real lack of data. So a lot of what I'm doing is investing in that space. There's evidence-based medicine. There's also evidence-based prevention, and that's even harder to validate.

Right now, a lot of people are doing a lot of different things. Many of them are collecting data, which over time, with luck, will prove that some of these things I’m going to talk about are valuable.

What does the landscape for healthcare products and services look like to you today?

Dyson: I see three markets.

There’s the traditional healthcare market, which is what people usually talk about. It’s drugs, clinics, hospitals, doctors, therapies, devices, insurance companies, data processors, or electronic health records.

Then there’s the market for bad health, which people don’t talk about a lot, at least not in those terms, but it’s huge. It’s the products and all of the advertising around everything from sugared soft drinks to cigarettes to recreational drugs to things that keep you from going to bed, going to sleep, keep you on the couch, and keep you immobile. I mentioned cigarettes and alcohol, I think. That’s a huge market. People are being encouraged to engage in unhealthy behaviors, whether it’s stuff that might be healthy in moderation or stuff that just isn’t healthy at all.

The new [third] market for health existed already as health clubs. What's exciting is that there's now an explicit market for things that are designed to change your behavior. Usually, they're information- and social-based. These are the quantified-self tools – analytical tools, tools for sharing, tools for fostering collaboration or competition with people who behave in a healthy way. Most of those have very little data to back them up. It's that people think they make sense. The business models are still not too clear, because if I'm healthy, who's going to pay for that? The chances are that if I'll pay for it, I'm already kind of a health nut and don't need it as much as someone who isn't.

Pharma companies will pay for some such things, especially if they think that they can sell people drugs in conjunction with them. I’ll sell you a cholesterol lowering drug through a service that encourages you to exercise, for example. That’s a nice market. You go to the pre-diabetics and you sell them your statin. Various vendors of sports clubs and so forth will fund this. But over time, I expect you’re going to see employers realize the value of this, then finally long-term insurance companies and perhaps government. But it’s a market that operates mostly on faith at this point.

Speaking of faith, Rock Health shared data that around 80 percent of mobile health apps are being abandoned by consumers after two weeks. Thoughts?

Dyson: To me, that’s infant mortality. The challenge is to take the 20 percent and then make those persist. But yeah, you’re right, people try a lot of stuff and it turns out to be confusing and not well-designed, et cetera.

If you look ahead a decade, what are the big barriers for health data and mobile technology playing a beneficial role, as opposed to a more dystopian one?

Dyson: Well, the benign version is we’ve done a lot of experimentation. We’ve discovered that most apps have an 80 percent abandon rate, but the 20 percent that are persisting get better and better and better. So the 80 percent that are abandoned vanish and the marketplace and the vendors focus on the 20 percent. And we get broad adoption. You get onto the subway in New York and everybody’s thin and healthy.

Yeah, that's not going to happen. But there's some impact. Employers understand what the value of this is. There's a lot more to do than just these [mobile] apps. The employers start serving only healthy food in the cafeteria. Actually, one big sign is going to be what they serve for breakfast at Strata Rx. I was at the Kauffman Life Sciences Entrepreneur Conference and they had muffins, bagels and cream cheese.

Carbohydrates and fat, in other words.

Dyson: And sugar-filled yogurts. That was the first day. They responded to somebody’s tweet [the second day] and it was better. But it’s not just the advertising. It’s the selection of stuff that you get when you go to these events or when you go to a hotel or you go to school or you go to your cafeteria at your office.

Defaults are tremendously important. That’s why I’m a big fan of what Bloomberg’s trying to do in New York. If you really want to buy two servings of soda, that’s fine, but the default serving should be one. I mean personally, I’d get rid of them entirely, but anyway. You know, make the defaults smaller dinner plates. All of this stuff really does have an impact.

Anyway, ten years from now, evidence has shown what works. What works is, in fact, working because people are doing it. A lot of this is social norms have changed. The early adopters have adopted, the late adopters are being carried along in the wake — just like there are still people who smoke, but it’s no longer the norm.

Do you have concerns or hopes for the risks and rewards of open health data releases?

Dyson: If we have a sensible healthcare system, the data will be helpful. Hospitals will say, “Oh my God, this guy’s at-risk, let’s prevent him getting sick.” Hospitals and the payers will know, “Gee, if we let this guy get sick, it’s going to cost us a lot more in the long run. And we actually have a business model that operates long-term rather than simply tries to minimize cost in the short-term.”

And insurance companies will say, “Gee, I’m paying for this guy. I better keep him healthy.” So the most important thing is for us to have a system that works long-term like that.

What role will personal data ownership play in the healthcare system of the future?

Dyson: Well, first we have to define what it is. I mean, from my point-of-view, you own your own data. On the other hand, if you want care, you’ve got to share it.

I think people are way too paranoid about their data. There will, inevitably, be data spills. We should try to avoid them, but we should also not encourage paranoia. If you have a rational economic system, privacy will be an issue, but financial security will not. Those two have gotten kind of mingled in people’s minds.

Yes, I may just want to keep it quiet that I have a sexually transmitted disease, but it’s not going to affect my ability to get treatment or to get insurance if I’ve got it. On the other hand, if I have to pay a little more for my diet soda or my hamburger because it’s being taxed, I don’t think that’s such a bad idea. Not that I want somebody recording how many hamburgers I eat, just tax them — but you don’t need to tax me personally: tax the hamburger.

What about the potential for the quantified self-movement to someday potentially reveal that hamburger consumption to insurers?

Dyson: You know, people are paranoid about insurers. They’re too busy. They’re not tracking the hamburgers you eat. They’re insuring populations. I mean seriously, you know? I went to get insurance and I told Aetna, “You can have my genetic profile.” And they said, “We wouldn’t know what to do with it.” I mean seriously, I’m not saying that’s entirely impossible ever in some kind of dystopia, but I really think people obsess too much about this kind of stuff.

How should — or could — startups in healthcare be differentiating themselves? What are the big problems that they could be working on solving?

Dyson: The whole social aspect. How do you design a game, a social interaction, that encourages people to react the way you want them to react? I mean, it's just like: what's the difference between Facebook and Friendster? They both had the same potential user base. One was successful; one wasn't. It's the quality of the analytics you show individuals about their behavior. It's the narratives, the tools and the affordances that you give them for interacting with their friends. It's what makes one app different from another. They all use the same data in the end, but some of them are very, very different.

For what it’s worth, of the hundreds of companies that Rock Health or anybody else will tell you about, probably a third of them will disappear. One tenth will be highly successful and will acquire the remaining 57 percent.

What are the models that exist right now of the current landscape of healthcare startups that are really interesting to you? Why?

Dyson: I don’t think there’s a single one. There’s bunches of them occupying different places.

One area I really like is user-generated research and experiments. Obviously, 23andMe*. Deep analysis of your own data and the option to share it with other people and with researchers. User-generated data science research is really fascinating.

And then social affordances, like Kia's Health Rally, where people interact with one another. Omada Health (which I'm an investor in) is a Rock Health company which says we can't do it all ourselves — there's a designated counselor for a group. It's right now focused on pre-diabetics.

I love that, partly because I think it’s going to be effective, and partly because I really like it as an employment model. I think our country is too focused on manufacturing and there’s a way to turn more people into health counselors. I mean, I’d take all of the laid off auto workers and turn them into gym teachers, and all the laid off engineers and turn them into data scientists or people developing health apps. Or something like that.

[*Dyson is an investor in 23andMe.]

What’s the biggest myth in the health data world? What’s the thing that drives you up the wall, so to speak?

Dyson: The biggest myth is that any single thing is the solution. The biggest need is for long-term thinking, which is everything from an individual thinking long-term about the impact of behavior to a financial institution thinking long-term and having the incentive to think long-term.

Individuals need to be influenced by psychology. Institutions, and the individuals in them, are employees that can be motivated or not. As an institution, they need financial incentives that are aligned with the long-term rather than the short-term.

That, again, goes back to having a vested interest in the health of people rather than in the cost of care.

Employers, to some extent, have that already. Your employer wants you to be healthy. They want you to show up for work, be cheerful, motivated and well rested. They get a benefit from you being healthy, far beyond simply avoiding the cost of your care.

Whereas the insurance companies, at this point, simply pass it through. If the insurance company is too effective, they actually have to lower their premiums, which is crazy. It’s really not insurance: it’s a cost-sharing and administration role that the insurance companies play. That’s something a lot of people don’t get. That needs to be fixed, one way or another.

July 25 2012

Mr. Issa logs on from Washington

To update an old proverb for the Information Age, digital politics makes strange bedfellows. In the current polarized atmosphere of Washington, certain issues create more interesting combinations than others.

In that context, it would be an understatement to say that it's been interesting to watch how Representative Darrell Issa (R-CA) has added his voice to the open government and Internet policy communities over the last several years.

Rep. Issa was a key member of the coalition of open government advocates, digital rights advocates, electronic privacy wonks, Internet entrepreneurs, nonprofits, media organizations and members of Congress that formed to oppose the passage of the Stop Online Piracy Act (SOPA) and the PROTECT IP Act (PIPA) this winter. Rep. Issa strongly opposed SOPA after its introduction last fall and, working with key allies on the U.S. House Judiciary Committee, effectively filibustered its advance by introducing dozens of amendments during the bill's markup.

The delay created time over Congress’ holiday recess for opposition to SOPA and its companion bill in the Senate (The PROTECT IP Act) to build, culminating in a historic “black out day” on January 18, 2012. Both bills were halted.

While he worked across the aisle on SOPA and PIPA, Rep. Issa has been fiercely partisan in other respects, using his powerful position as the chairman of the U.S. House Oversight and Government Reform Committee to investigate various policy choices and actions of the Obama administration and federal agencies. During the same time period, he's also become one of the most vocal proponents of open government data and Internet freedom in Congress, from drafting legislation to standardize federal finance data to opposing bills that stood to create uncertainty in the domain name system. He also sponsored the ill-conceived Research Works Act, which expired after receiving fierce criticism from open access advocates.

In recent years, Rep. Issa and his office have used the Web and social media to advance his legislative agenda, demonstrating in the process a willingness to directly engage with citizens and public officials alike on Twitter as @DarrellIssa, even to the extent of going onto Reddit to personally do an "Ask Me Anything." Regardless of where one stands on his politics, the extent to which he and his staff have embraced using the Web to experiment with more participatory democracy has set an example that perhaps no other member of Congress has matched.

In June 2012, I interviewed Rep. Issa over the phone, covering a broad range of his legislative and oversight work, including the purpose of his new foundation and his views on regulation, open data, and technology policy in general. More context on other political issues, his personal life, business background and political career can be found at his Wikipedia entry and in Ryan Lizza's New Yorker feature.

Our interview, lightly edited for content and clarity, is broken out into a series of posts that each explore different aspects of the conversation. Below, we talk about open government data and his new “Open Gov Foundation.”

What is the Open Gov Foundation?

In June, Representative Darrell Issa (R-CA) launched an “Open Gov Foundation” at the 2012 Personal Democracy Forum. Rep. Issa said then the foundation would institutionalize the work he’s done while in office, in particular “Project MADISON,” the online legislative markup software that his technology staff and contractors developed and launched after the first Congressional hackathon last December. If you visit the Open Gov Foundation website, you’ll read language about creating “platforms” for government data, from regulatory data to legislative data.

Congressman Issa's office stated that the Open Gov Foundation will be registered as a non-partisan 501(c)(3) by mid-fall 2012. A year from now, he would like to have made "major headway" on the MADISON project, with the software working in a number of different places — not just the federal House but elsewhere.

For that to happen, the MADISON code will almost certainly need to be open sourced — a prospect that the Congressman indicated in our interview is highly likely — and integrated into other open government projects. On that count, Congressman Issa listed a series of organizations that he admires in the context of open government work, including the Sunlight Foundation, Govtrack, public.resource.org, the New York State Senate, OpenCongress and the Open Knowledge Foundation.

The general thrust of his admiration, said the Congressman, comes from the fact that these people are not just working hard to get government data out there and deliver raw data, but are building things that are useful with that data, helping to create tools that bridge the gap for citizens.

What do you hope to achieve with the Open Government Foundation?

Rep. Issa: I’ve observed over 12 years that this expression that people use in Congress is actually a truism. And the expression they use is you’re entitled to your opinion but not your facts.

Well, the problem in government is that, in fact, facts seem to be very fungible. People will have their research, somebody will have theirs. Their ability to get raw data in a format where everybody can see it and then reach, if you will, opinions as to what it means, tends to be limited.

The whole goal that I’d like to have, whether it’s routing out waste and fraud — or honestly knowing what somebody’s proposal is, let’s just say SOPA and PIPA — is [to] get transparency in real-time. Get it directly to any and all consumers, knowing that in some cases, it can be as simple as a Google search by the public. In other cases, there would need to be digesting and analysis, but at least the raw data would be equally available to everyone.

What that does is it eliminates one of the steps that people like Ron Wyden and myself find ourselves in. Ron and I probably reach different conclusions if we’re given the same facts. He will see the part of the cup that is empty and needs government to fill it. And I will see the part that exists only because government isn’t providing all of the answers. But first, we have to have the same set of facts. That’s one of the reasons that a lot of our initiatives absolutely are equally desired by the left and the right, even though once we have the facts, we may reach different conclusions on policy.

Does that mean more bulk data from Congress, which you supported with an amendment to a recent appropriations bill?

Rep. Issa: Let’s say it’s not about the quantity of data; it’s about whether or not there’s meaningful metadata attached to it. If you want to find every penny being spent on breast cancer research, there’s no way to compare different programs, different dollars in different agencies today. And yet, you may want to find that.

What we learned with the control board — or the oversight board that went with the stimulus — was that you’ve got to bring together all of the data if you’re going to find, if you will, people who are doing the same things in different parts of government and not have to find out only forensically after you’ve had rip-off artists rip-off the government.

The other example is on the granting of grants and other programs. That’s what we’re really going for in the DATA Act: to get that level of information that can, in fact, be used across platforms to find like data that becomes meaningful information.

Do you think more open government data would remove some of the information asymmetries around D.C.?

Rep. Issa: A lot of people have monetized the compiling of data in addition to monetizing the consulting as to what its meaning is. What we would like to do is take the monetization of data and take it down to a number that is effectively zero. Analysis by people who really have value-added will always be part of the equation.

Do you envision putting the MADISON Code onto GitHub, for open source developers in this country and around the world to use and deploy in their own legislatures if they wish?

Rep. Issa: Actually, the reason that we’ve formed a public nonprofit is for just that reason. I don’t want to own it or control it or to produce it for any one purpose, but rather, a purpose of open government. So if it spawns hundreds of other not-for-profits, that’s great. If people are able to monetize some of the value provided by that service, then I can also live with that.

I think once you create government information and, for that matter, appropriate private sector information, in easier and easier to use formats, people will monetize it. Oddly enough, they'll monetize it for a fairly low price, because when the data is easy to get, you have to create value at a low cost. That which is hard, you can charge a fortune to provide to those who need it.

Will you be posting the budget of the Open Gov Foundation in an open format so people know where the funding is coming from and what it’s being spent on?

Rep. Issa: Absolutely. Although, at this point, we’re not inviting any other contributions of cash, we will take in-kind contributions. But at least for the short run, I’ll fund it out of my own private foundation. Once we have a board established and a set of policies to determine the relationships that would occur in the way of people who might contribute, then we’ll open it up. And at that point, the posting would become complex. Right now, it’s fairly easy: whatever money it needs, the Issa Family Foundation will provide to get this thing up and going.

Rethinking regulatory reform in the Internet age

As the cover story of a February issue of The Economist highlighted, concerns about an over-regulated America are cresting in this election year, with headlines from that same magazine decrying “excessive environmental regulation” and calling for more accurate measurement of the cost of regulations. Deleting regulations is far from easy to do but there does appear to be a political tailwind behind doing so.

As a legislator and chairman of the House Oversight and Government Reform Committee, it's fair to say that Representative Darrell Issa (R-CA) has been quite active in publicly discussing the issue of regulations and regulatory burdens upon business. As a former technology entrepreneur — and a successful one at that (he's the wealthiest member of Congress) — Rep. Issa does have first-hand knowledge of what it takes to run a business, to bring products to market, and to deal with the various regulations.

In a wide-ranging interview earlier this summer, Rep. Issa commented on a number of issues related to open government and the work of the committee. When we talked about smart disclosure and reforming the Freedom of Information Act, I posed several questions about regulatory data, in the context of its role in the marketplace for products and services. Our interview on regulation is below, followed by a look at how his office and the White House are trying to use the Web to improve regulatory reform and involve citizens in the debate.

What role does the release of regulatory data from the various agencies, in the form of smart disclosure or otherwise, have in creating market transparency, bringing products to market or enabling citizens to understand the quality of said products? What is the baseline for regulation? For instance, after flying a lot recently, I've felt grateful the FAA had regulations that meant my flights would be safe when I flew back and forth across the country or ocean. There's some baseline for the appropriate amount of regulation, but it's never entirely clear what that might be.

Rep. Issa: I’ll give you a good example of why regulations that you believe in, you don’t believe in. Do you believe it’s dangerous to have your cell phone on as you’re going across country?

My understanding is that it is extremely likely that many people's cellphones have, in fact, been left on while they fly cross country or while they take off and land. The probability of people not having switched them off is high. To date, I have not heard of a documented case where a switched-on cellphone interfered with the avionics of a plane. [See Nick Bilton's reporting on the FAA and gadgets in the New York Times.] That logically suggests to me that it's not as much of a risk as has been posited, but I haven't seen the data.

Rep. Issa: So, on a regulatory basis, your country is lying to you. I'm making the statement as I'm asking the question. Of course your country's lying to you about the risk. Of course there's a valid reason to turn off your cell phone: it's so you won't be distracted while they're telling you where the exit is. So rather than say, "Look, we have the right to have you shut off your cellphone and we believe that for safety purposes you should do it, but let's not kid each other: if you've got it on final so you can get your emails a little earlier by 30 seconds and you don't mind your battery going dead a little faster, it probably has no real risk."

The fact is your government has regulatory power to regulate an action for which they don’t actually have a good faith belief it’s causing damage. Just the opposite: they have the knowledge that these units are on all the time by accident, in people’s luggage, and our planes still don’t crash.

My problem with regulations is they need to have a cost benefit. And that cost benefit — the burden has to be against the regulator, not for the regulator. So when the EPA says, "You need to take the arsenic out of water," as they did a number of years ago, it sounded great, but the number was arbitrary and they had no science. And what ended up happening in New Mexico was that people's small water districts went out of business. In some cases, people went back to taking whatever was in their well, and you go, "Well, why didn't they have a number that they could justify — one you absolutely had to have, otherwise it was hurting you?" Well, the answer is because they never did the science; they just did the regulations.

So where does the balance lie, in your opinion?

Rep Issa: When it comes to individual liberty, I try to be as absolute as possible. When it comes to regulatory needs, I tend to be as limited as possible, both because of people’s liberty, but also because government has a tendency to want to grow itself. And if you let it grow itself, one day you wake up like the frogs that were slowly boiled because they were put in the water and didn’t notice it getting warm until they were cooked.

When I've traveled abroad, I've heard from citizens of other countries, particularly in the developing world, that one of the things they admire about the U.S. is that we have an FDA, an EPA, an FTC and other regulatory bodies, which they see holding our quite powerful corporations to some level of account. What role do those institutions have in the 21st century in holding private interests, which have incredible amounts of power in our world, accountable to the people?

Issa: I gave you the EPA example because there was a debate that ultimately the EPA won on arsenic to the detriment of whole communities who disagreed, who said, you haven’t made the case as to why you picked a particular level. They all supported the idea that water should be clean. The question is at what point of the cost-benefit was it the right level of clean. And I remember that one.

Let me give you one in closing that's probably perfect. Today, the FDA is unable to ensure that generic cancer drugs and antibiotics are in sufficient supply, which was one of its mandates. And as a result, there's a whole bootleg market developing — and the left and the right are both concerned about it — for both cancer drugs and antibiotics, because there's a shortage. But the FDA had a regulatory responsibility to ensure that the shortage didn't occur, and they're failing at it. So the FDA has a job it's not doing.

Additionally, people are traveling to Europe and other places to get drugs which are saving lives because they’re getting approved in those countries quicker. These are western countries with the equivalent of FDA, but they’re getting approved quicker and clinical trials are going better and moving over there.

So when we look at the FDA, you're not attacking it because you think we shouldn't have the Food and Drug Administration dealing with, in particular, the efficacy of medicines, but because the FDA is falling short: its time to market is getting longer and longer, meaning people are being denied innovative drugs.

Can the Web help with regulatory reform and e-rulemaking?

Representative Issa, whose committee heard testimony on regulatory impediments to job creation last week, is not alone in the U.S. House in his interest in streamlining regulations. This week, Speaker Boehner and his caucus have been pushing to "cut the red tape" by limiting or loosening regulations on small businesses until unemployment falls to 6%.

The administration has not been inactive on this front, although it's fair to say that House Republicans have made clear that its progress toward regulatory reform to date has been unsatisfactory. One early case study can be found in the FCC's open Internet rules and net neutrality, where OpenInternet.gov was used to collect public feedback on proposed rules. Public comments on OpenInternet.gov were entered into the official record, which was something of a watershed in e-rulemaking. The full version of the final rules, however, was not shared with the public until days after they were voted upon.

In January 2011, President Barack Obama issued an executive order focused on reforming regulation and regulatory review. One element of the order was particularly notable for observers watching to see whether citizen engagement is part of this administration's open government efforts: its focus upon public participation in the regulatory process.

As I've written elsewhere, this order is part of a larger effort towards e-rulemaking by the administration. In February 2012, Regulations.gov relaunched with an API and some social media features, with an eye towards gaining more public participation. This electronic infrastructure will almost certainly be carried over into future administrations, regardless of the political persuasion of the occupant of the Oval Office.
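For developers, "relaunched with an API" means dockets and public comments can, in principle, be pulled programmatically rather than scraped. The sketch below shows only the general shape of such a request; the base URL, endpoint path, parameter names, docket ID and response keys are hypothetical placeholders rather than the actual Regulations.gov interface, which has changed over time and requires a registered API key.

```python
# A minimal sketch of pulling public comments from a rulemaking API.
# Every URL, path, parameter name and response key below is a hypothetical
# placeholder; consult the current Regulations.gov developer documentation
# for the real endpoints and authentication requirements.
import requests

API_BASE = "https://api.example-rulemaking.gov"   # placeholder, not the real host
API_KEY = "YOUR_REGISTERED_KEY"                   # such APIs typically require a key

def fetch_comments(docket_id, page_size=25):
    """Return one page of public comments filed on a given docket."""
    resp = requests.get(
        API_BASE + "/comments",                    # hypothetical endpoint
        params={"docketId": docket_id, "pageSize": page_size},
        headers={"X-Api-Key": API_KEY},
    )
    resp.raise_for_status()
    return resp.json()

# Example: list the first page of comments on a hypothetical docket.
for comment in fetch_comments("EPA-HQ-OAR-2012-0001").get("comments", []):
    print(comment.get("title"))
```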

This summer, Cass Sunstein, the administrator of the Office of Information and Regulatory Affairs in the White House, asked the American people for more ideas on how the federal government could "streamline, simplify or eliminate federal regulations to help businesses and individuals."

As the Wall Street Journal reported last year, the ongoing regulatory review by OIRA is a nod to serious, long-standing concerns in the business community about excessive regulation hampering investment and job creation as citizens struggle to recover from the effects of the Great Recession.

It's not clear yet whether an upgraded Regulations.gov will make any difference in the quality of regulatory outcomes. Rulemaking and regulatory review are, virtually by their nature, wonky and involve esoteric processes that rely upon knowledge of existing laws and regulations.

In the future, better outcomes might come from smart government approaches, such as adopting what Tim O'Reilly has described as "algorithmic regulation": applying the dynamic feedback loops that Web giants use to police their systems against malware and spam to the government agencies entrusted with protecting the public interest.
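As a thought experiment only, the toy sketch below illustrates that feedback-loop idea: instead of fixing a rule by fiat, a threshold adapts as new outcome data arrives, much the way spam filters tighten or relax based on observed abuse. The function, numbers and scenario are entirely hypothetical.

```python
# A toy illustration of the "algorithmic regulation" feedback-loop idea.
# All names and numbers here are hypothetical.

def update_threshold(threshold, violation_rate, target_rate=0.02, step=0.1):
    """Nudge an inspection threshold toward whatever level keeps the
    observed violation rate near the target, rather than fixing it by fiat."""
    if violation_rate > target_rate:
        return threshold * (1 - step)   # violations rising: lower the bar, inspect more
    return threshold * (1 + step)       # violations low: relax, reduce the burden

threshold = 100.0                       # e.g., only inspect facilities scoring above this
for observed_rate in [0.05, 0.04, 0.03, 0.02, 0.01]:   # simulated monthly outcome data
    threshold = update_threshold(threshold, observed_rate)
    print(round(threshold, 1))          # prints 90.0, 81.0, 72.9, 80.2, 88.2
```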

In the present, however, while the Internet could involve many more people in the process, improved outcomes will depend upon a digitally literate populace that's willing to spend some of its civic surplus on public participation in identifying problematic regulations. That would also require legislators and staff, regulators and agency workers to use the dynamic social Web of 2012 to listen as well as to broadcast.

To put it another way, getting to "Regulations 2.0" will require "Citizen 2.0" — and we'll need the combined efforts of all our schools, universities, libraries, non-profits and open government advocates to have a hope of successfully making that upgrade.

Do citizens have a ‘right to record’ in the digital age?

When Representative Darrell Issa (R-CA) and I talked this summer about his proposal for a digital Bill of Rights, I followed up by asking him about whether it might be more productive to focus on the rights that we already have in the digital context.

That conversation naturally led to a question about freedom of assembly and freedom of the press, both of which came under some pressure in the United States during the Occupy protests of the past year. Our interview follows.

How can we make sure that the ‘inalienable rights’ that we are endowed with already are receiving oversight and enforcement from our representatives?

Rep Issa: I think that when we’re aghast at what China’s doing to Google, it helps us say we’re so upset about that. I think we need to take examples, we need to see what we don’t like and what the American people want us to protect. Sometimes you look abroad for really bad behavior and then you look internally to find similar behavior, maybe a quantum leap lower, but it’s still there.

You mentioned Washington D.C. and Occupy. I think that’s a classic example where free speech was turned into free camping. The rights of the public broadly to enjoy an asset that was set aside for public use [were involved], where the Mayor — who doesn’t happen to be from my party or even my ideology — comes to us and says, “We’ve got rats, we’ve got crud. We’ve got all of these things that are spreading into the rest of the city. These people are not exercising their free rights for most of the day, what they’re doing is camping on grounds that were not designed or built or prepared for that.”

As it went on week after week after week, it wasn’t camping overnight, between the day you arrive and the protest the next day: it was effectively ‘living in.’ That’s a good example where the rules are pretty well understood, the history is pretty understood, and the enforcement at the end was pretty consistent with what it’s been over the years. Your presence can be a protest, but, at some point, your presence becomes simply an impediment to other people’s rights.

Are you concerned that there have been dozens of journalists arrested at Occupy protests? And I don’t mean just citizens livestreaming what’s happening, although one could make a case that they’re committing ‘acts of journalism.’ I’m referring to credentialed journalists arrested while they’re actively chronicling what law enforcement is doing to their fellow citizens.

Rep Issa: There’s a separate question, which is whether law enforcement is entitled to the cloak of secrecy as they pull you over for a DWI and their camera isn’t on when they rough you up but is on when you resist arrest. Those are areas of personal liberty.

We’re not dealing ‘digitally,’ but we are dealing with an era in which a policeman or other individuals demand the rights to video you involuntarily, when it suits them, and then object if you want to video their doing the same event but from an independent perspective.

Do I think the court has to rule on that? Absolutely. Do I think you have to find the anecdotal examples of most egregious behavior in order to prove the point? Probably. But I think there have been a number of them.

When they talk about arresting journalists, whether credentialed or not, the court has to weigh in and say the police should not be afraid of a camera. If they’re afraid of a camera, they might be afraid of a witness. And if they’re afraid of you and I watching for some valid reason, great. But if you and I watching and the equivalent, digitally capturing it, then they’ve crossed a line.

I want to be careful. Some of the arrests of journalists, some of those arrest examples include, basically, misbehavior of journalists getting in the face of people, shoving cameras at them and asking questions designed to be less than what you would call passive. Those are not necessarily the best example. It’s sort of like you look at the paparazzi and Princess Di dying: it wasn’t the finest day to claim that paparazzi had rights.

On the other hand, if it’s somebody who from a distance who is observing and video recording an actual arrest of or holding of some individual who is simply walking down the street and says, “Hey, don’t stop me, you haven’t got a right” — the two are very different. And passive observation by the press is a better one to take to the court because it’s a slam dunk First Amendment [case].

Aggressive behavior by press who get in the face and blocks somebody trying to move is always a little bit more of a call where you and I could probably find a point in which we would say that the First Amendment line has been crossed in that somebody else’s rights have been infringed.

This is important, because this is where the left and the right should come to a common ground. Strict adherence to rights, even when it’s inconvenient, is part of what makes America a better country.

Addendum

This week, the Chief of the Metropolitan Police in the District of Columbia issued an order [PDF] affirming the public’s right to photograph and film police officers who are performing official business. The MPD’s action came as part of a court-mandated settlement of a lawsuit brought by Jerome Vorus, who claimed he was wrongly detained by the police department after photographing police activity.

The order “recognizes that members of the general public have a First Amendment right to video record, photograph, and/or audio record MPD members while MPD members are conducting official business or while acting in an official capacity in any public space, unless such recordings interfere with police activity.”

Given instances of documented interference with credentialed media in the city of New York, such guidance might be useful in the five boroughs as well.

Democratizing data, and other notes from the Open Source convention

There has been an enormous amount of talk over the past few years about open data and what it can do for society, but proponents have largely come to admit: data is not democratizing in itself. This topic is hotly debated, and a nice summary of the viewpoints is available in this PDF containing articles by noted experts. At the Open Source convention last week, I thought a lot about the democratizing potential of data and how it could be realized.

Who benefits from data sets

At a high level, large businesses and other well-funded organizations have three natural advantages over the general public in the exploitation of data sets:

  • The resources to gather the data
  • The resources to do the necessary programming to crunch and interpret the data
  • The resources to act on the results

These advantages will probably always exist, but data can be useful to the public too. We have some tricks that can compensate for each of the large institutions’ advantages:

  • Crowdsourcing can create data sets that can help everybody, including the formation of new businesses. OpenStreetMap, a SaaS project based on open source software, is a superb example. Its maps have been built up through years of contributions by people trying to support their communities, and it supports interesting features missing from proprietary map projects, such as tools for laying out bike paths.

  • Data-crunching is where developers, like those at the Open Source convention, come in. Working at non-profits, during weekend challenges, or just on impulse, they can code up the algorithms that make sense of data sets and build apps that visualize the results and accept interaction from people with less technical training.

  • Some apps, such as reports of neighborhood crime or available health facilities, can benefit individuals, but we can really drive progress by joining together in community organizations or other associations that use the data. I saw a fantastic presentation by high school students in the Boston area who demonstrated a correlation between funding for summer jobs programs and lowered homicides in the inner city, and they won more funding from the Massachusetts legislature with that presentation. (A minimal sketch of that kind of correlation analysis follows this list.)
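To make the data-crunching point concrete, here is a minimal sketch of that kind of correlation analysis in Python. The file name and column names are hypothetical stand-ins, not the students' actual data or code.

    import pandas as pd

    # Hypothetical file: one row per year, with summer jobs funding (dollars)
    # and the number of homicides recorded that year.
    df = pd.read_csv("boston_summer_jobs_vs_homicides.csv")

    # Pearson correlation between funding levels and homicides.
    r = df["summer_jobs_funding"].corr(df["homicides"])
    print(f"Correlation between funding and homicides: {r:.2f}")

Correlation alone doesn't prove causation, of course, but a strongly negative value is exactly the kind of evidence that can anchor a presentation to legislators.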

Health care track

This year was the third in which the Open Source convention offered a health care track. IT plays a growing role in health care, but a lot of the established institutions are creaking forward slowly, encountering lots of organizational and cultural barriers to making good use of computers. This year our presentations clustered around areas where innovation is most robust: personal tracking, using data behind the scenes to improve care, and international development.

Open source coders Fred Trotter and David Neary gave popular talks about running and tracking one's achievements. Bob Evans discussed a project named PACO that he started at Google to track productivity by individuals and in groups of people who come together for mutual support, while Anne Wright and Candide Kemmler described the ambitious BodyTrack project. Jason Levitt presented the science of sitting (and how to make it better for you).

In a high-energy presentation, systems developer Shahid Shah described the cornucopia of high-quality, structured data that will be made available when devices are hooked together. “Gigabytes of data is being lost every minute from every patient hooked up to hospital monitors,” he said. DDS, HTTP, and XMPP are among the standards that will make an interconnected device mesh possible. Michael Italia described the promise of genome sequencing and the challenges it raises, including storage requirements and the social impacts of storing sensitive data about people’s propensity for disease. Mohamed ElMallah showed how it was sometimes possible to work around proprietary barriers in electronic health records and use them for research.

Representatives from OpenMRS and IntraHealth International spoke about the difficulties and successes of introducing IT into very poor areas of the world, where systems need to be powered by their own electricity generators. A maintainable project can't be dropped in by external NGO staff, but must cultivate local experts and take a whole-systems approach. Programmers in Rwanda, for instance, have developed enough expertise by now in OpenMRS to help clinics in neighboring countries install it. Leaders of OSEHRA, which is responsible for improving the Department of Veterans Affairs' VistA and developing a community around it, spoke to a very engaged audience about their work untangling and regularizing twenty years' worth of code.

In general, I was pleased with the modest growth of the health care track this year (most sessions drew about thirty people, and several drew a lot more) and with both the energy and the expertise of the people who came. Many attendees play an important role in furthering health IT.

Other thoughts

The Open Source convention reflected much of the buzz surrounding developments in computing. Full-day sessions on OpenStack and Gluster were totally filled. A focus on developing web pages came through in the popularity of talks about HTML5 and jQuery (now a platform all its own, with extensions sprouting in all directions). Perl still has a strong community. A few years ago, Ruby on Rails was the must-learn platform, and knock-off derivatives appeared in almost every other programming language imaginable. Now the Rails paradigm has been eclipsed (at least in the pursuit of learning) by Node.js, which was recently ported to Microsoft platforms, and its imitators.

No two OSCons are the same, but the conference continues to track what matters to developers and IT staff and to attract crowds every year. I enjoyed nearly all the speakers, who often pump excitement into the driest of technical topics through their own sense of continuing wonder. This is an industry where imagination's wildest thoughts become everyday products.

Should the Freedom of Information Act extend to data in private companies?

The Freedom of Information Act (FOIA), which gives the people and press the right to access information from government, is one of the pillars of open government in the modern age. In the United States, FOIA is relatively new — it was originally enacted on July 4, 1966. As other countries around the world enshrine the principle into their legal systems, new questions about FOIA are arising, particularly when private industry takes on services that previously were delivered by government.

In that context, one of the federal open government initiatives worth watching in 2012 is "smart disclosure": the targeted release, by government and by private industry, of information about citizens or about the services they consume. Smart disclosure is notable because there's some "there there." It's not just a matter of it being one of the "flagship open government initiatives" under the U.S. National Plan for open government, or that a White House Smart Disclosure Summit in March featured a standing-room-only audience at the National Archives. Compared to other initiatives, there has been relatively strong uptake of data from government and the private sector, and strong use of that data in the consumer finance sector. Citizens can download their bank records and use them to make different decisions.

Earlier this summer, I interviewed Representative Darrell Issa (R-CA) about a number of issues related to open government, including what he thought of “smart disclosure” initiatives.

“These are areas of legitimate concern,” he said. “Europeans have a completely different set of criteria for what they consider to be data that can be released on behalf of their people. They are much more liberal in what you can find out but then they’re much more conservative in how long the data can be kept. We, on the other hand, limit how much data you can get by comparison. But you can keep it forever. It’s very hard to reconcile those two standards. But more importantly, the American people don’t agree with either one of them.”

Rep Issa told me that including data collected about individuals by private actors, like financial institutions or insurers, is the "most important thing" that could be added to the Freedom of Information Act. That's a notable position, given that the U.S. Federal Trade Commission called on Congress earlier this spring to enact baseline privacy legislation and require more transparency from data brokers.

While online privacy debates have been going on in Washington for years now, legislators and regulators alike might consider the role of personal data ownership, where data is a currency that citizens control and may spend. As a matter of principle, the big (multi-billion dollar?) question may be whether the American people should have ownership of the data that is collected about them by financial institutions, insurers, telecommunications companies or government agencies, similar to a credit report.

“As we’re reforming the Freedom of Information Act, the information held about you by anybody is yours unless there’s an affirmative defense to keep it,” said Rep. Issa.

For example, if you’re the subject of a criminal investigation in a drug deal, you shouldn’t be able to FOIA and find out what the feds know about you. That would be inappropriate for obvious reasons.If people are gathering the data about your financial well being and you want your FICO score, you should be able to get it. We accept that. So do you have a medical FICO score? Of course you do. You have a life insurance FICO score. I’m using the acronym, not for what it stands, but for how people look at it.

Each of those areas, you should give people access. The question is how do we get it into a common understanding of freedom of information both publicly and privately. And we should get there. And we can get there. I’ll go back to square one for a second. Without the DATA Act, you actually can’t expect your government to deliver it to you because they wouldn’t be able to find it. You’d be going agency to agency rather than saying, “Look, I want to know what you know about me. I want to know what you think about me. And I have a right.” Once we have a format in which they can’t hide behind [it] being burdensome to find it, then you should be able to get it.

I also asked Rep. Issa about the role that releasing government data into the marketplace plays in creating more transparent markets, including for data that private companies collect. I specifically called out the release of financial data, which is already released as XBRL data through the SEC's Office of Interactive Disclosure. Issa said that "government being able to aggregate data [and] make it available in useable formats is one of the least expensive and most valuable things government does."

When it came to the initiatives that the Consumer Financial Protection Bureau, the U.S. Treasury Department and other federal agencies are taking, in terms of releasing government data back to the people, Rep. Issa highlighted some of the complexity he anticipates in personal data disclosures from industry:

“If you go back to your earlier question, what is somebody’s private information? That’s the only fly in that ointment,” said Rep. Issa. “If it’s my private information and I do not want to release it, then without a compelling need of America — not just a ‘nice to have,’ but compelling need — you don’t have the right to have it. So your point you’re leading to is well, shouldn’t they have all of this information? And the answer is, it depends. You know, people choose to belong to the Better Business Bureau. If they choose not to belong to the Better Business Bureau, should they have to give their information? The answer is no. If I belong to a credit rating agency or a credit exchange and I voluntarily give my credit experience so I can get other people’s credit experience, that’s an opt-in. One of the problems with the federal government is when they force the turning over of individual data and then use it for individual action against that entity, it’s a different standard than when they voluntarily receive data and aggregate it for the common good.”

When I asked FTC chairman Jon Leibowitz about whether citizens have a right to their data at a press conference earlier this year, he offered support for the idea. “With respect to data brokers, these are cyberazzi that collect information from consumers and consumers have no interface with them. They’re invisible to consumers. And so we have called for, and we actually have supported this for quite some time, legislation to create parameters and rights to correct inaccurate data by consumers in terms of baseline privacy. We’ve also called for specific data streaming legislation, which has been a bipartisan priority for the commerce committee and energy committee for quite some time.

… In the report, we talk about some of the gaps now, because it's our sense that companies are doing things that are very much like credit reporting agencies, but they might not be within the ambit of the FCRA."

As Congress opens an inquiry into data brokers, it will be interesting to see whether legislators agree with Rep. Issa or the FTC chairman — and whether they draft bills that extend to citizens digital rights to their data.


Does the Open Government Partnership merit more oversight and attention?

Brazilian President Dilma Rousseff speaks at the 2012 annual Open Government Partnership conference

There are any number of responsibilities and challenges inherent in moving forward with the historic Open Government Partnership (OGP) that officially launched last September. Global Integrity's recent assessment of the national action plans submitted to the Open Government Partnership by participating countries found cause for both concern and optimism, as I've highlighted previously.

The National Action Plan commits the United States to 18 different open government initiatives, including implementing the Extractive Industries Transparency Initiative (EITI). One of the primary functions of the committee that Representative Darrell Issa (R-CA) chairs in the U.S. House is to provide oversight of what's happening in the Executive Branch of government. In that context, the Oversight and Government Reform Committee has an important role in overseeing not just what the proposals are but how they're actually executed by agencies. In March 2011, the committee held a hearing on open government initiatives in the United States.

Earlier this summer, I interviewed Rep. Issa about a number of issues related to open government at the federal level including the involvement of the United States in OGP. Here’s what he had to say on the topic:

There always will be people who only see the negative of the United Nations or before that, the League of Nations. There will be people who find the World Trade Organization a group that needs to be struck down, because they view the access by the developed nations to assets of the developing nations works to their detriment.

Using those as backdrops, any time lawful representatives of governments come together to see if, in fact, there’s a win-win, I applaud it. The question that I have with this formation is will they come back to their people and stand the test of the traditional question of what is sovereign and what isn’t? And more importantly, see if they have the will of their people broadly through actual new statutes. A lot of what we’re seeing in agreement abroad right now is that individuals from our government go over. They agree to agree, but they never come back and make the circle, of do the American people agree. Do their representatives have the information, and an intervening election, so that when they vote for it, they’re voting for something akin to a treaty?

I think you see it in TPP [The Trans Pacific Partnership], and other things, that sometimes what you do is you say, “Well, we’re bound internationally for that which has not been bought into by the country itself, the people of the country.” I’m broadly for these kinds of talks. I’m decisively against finding out that you’re bound to something that wasn’t approved, not just by legislative representatives but by the American people, because I can give somebody authority to go have a conversation. I can’t give them authority to make a deal on behalf of the American people that the American people don’t know until after the deal has been made.

With respect to the concerns Rep. Issa raised about whether the American people have been consulted, each one of these national action plans for the Open Government Partnership was arrived at through a public consultation with the people of the country in question. (I was present at the third White House open government partnership consultation as a member of civil society and posted my notes online.) There has been criticism about whether those public consultations were good enough, including the one held by our neighbor to the north, up in Canada. (Full disclosure: I was asked to sit on Canada's open government advisory board and made a series of recommendations for Canada.) Once agreed to, it will be up to civil society and Congress to hold the government of a country accountable for implementing the plans.

There will be inevitable diplomatic challenges for OGP, from South Africa's proposed secrecy law to Russia's membership. Given that context, all of the stakeholders in the Open Government Partnership — from the government co-chairs in Brazil and the United Kingdom to the leaders of participating countries to the members of civil society that have been given a seat at the table — will need to keep pressure on other stakeholders if significant progress is going to be made on all of these fronts. If the next President of the United States doesn't directly support the partnership and its principles on the campaign trail and in actions, it will leave considerable room for other countries to score diplomatic points for joining without delivering upon the promise of its requirements for their people. If OGP is to be judged more than a PR opportunity for politicians and diplomats to make bold framing statements, government and civil society leaders will need to do more to hold countries accountable to the commitments required for participation: they must submit Action Plans after a bona fide public consultation. Moreover, they'll need to define the metrics by which progress should be judged and be clear with citizens about the timelines for change.

How will “open government” play into Election 2012?

It remains to be seen whether open government or OGP comes up as a significant issue in the presidential campaign or in the context of this year's Congressional elections. While the Obama and Romney campaigns are heavily criticizing one another on the issue of "transparency," from the White House's mixed record to the former Massachusetts governor's record in office and work on the Winter Olympics, the future of U.S. involvement in the partnership or its commitments in the plan isn't making it onto the campaign stump. For that matter, neither is open innovation in the public sector, including the use of prizes and challenges, or lean government.

That’s unfortunate. While there may be a strong rationale for both candidates for the presidency to focus on other issues than the emerging, often nebulous field of “open government,” including fundamental concerns like the economy, foreign policy, energy, education or healthcare, more open policies stand to benefit each of those areas. For instance, at the launch of OGP last September in New York, World Wide Web inventor Tim Berners-Lee argued that more transparency in aid and financial markets attracts more investment in developing countries. The party that would stand to benefit the most from competition on open government would be the American people.

And, while the ambiguity of open government and open data has been driving discussions online for months now, there’s just enough traction behind initiatives around open health data, energy data, and smart disclosure for policy makers, legislators and the electorate to pay a bit more attention to what’s happening in those areas.

Image Credit: DL Photo/CGU at the 2012 Open Government Partnership Conference

Uncertain prospects for the DATA Act in the Senate

The old adage that "you can't manage what you can't measure" is often applied to organizations in today's data-drenched world. Given the enormity of the United States federal government, breaking down the estimated $3.7 trillion in the 2012 budget into its individual allocations, much less drilling down to individual outlays to specific programs and subsequent performance, is no easy task. There are several sources that policy wonks can turn to for applying open data to journalism, but the flagship database of federal government spending at USASpending.gov simply isn't anywhere near as accurate as it needs to be to source stories. The issues with USASpending.gov have been extensively chronicled by the Sunlight Foundation in its ClearSpending project, which found that nearly $1.3 trillion of federal spending as reported on the open data website was inaccurate.
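To give a sense of what that kind of audit involves, here is a minimal sketch of a ClearSpending-style consistency check in Python. The file names and column names are hypothetical stand-ins, not Sunlight's actual pipeline or the real USASpending schema.

    import pandas as pd

    # Hypothetical extracts: spending as reported on USASpending.gov, and the
    # totals the agencies themselves report for the same programs.
    reported = pd.read_csv("usaspending_program_totals.csv")  # program_id, reported_total
    budgeted = pd.read_csv("agency_program_totals.csv")       # program_id, obligated_total

    merged = reported.merge(budgeted, on="program_id", how="outer", indicator=True)

    # Completeness: programs that appear in one source but not the other.
    missing = merged[merged["_merge"] != "both"]

    # Consistency: programs where the two totals disagree by more than 5%.
    both = merged[merged["_merge"] == "both"].copy()
    both["gap"] = (both["reported_total"] - both["obligated_total"]).abs()
    inconsistent = both[both["gap"] > 0.05 * both["obligated_total"].abs()]

    print(f"{len(missing)} programs missing from one source")
    print(f"${inconsistent['gap'].sum():,.0f} in inconsistently reported spending")

Scaled up across agencies and fiscal years, mismatches of this kind are the raw material for metrics like ClearSpending's.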

If the people are to gain more insight into how their taxes are being spent, Congress will need to send President Obama a bill to sign to improve the quality of federal spending data. In the spring of 2012, the U.S. House passed by unanimous voice vote the DATA Act, a signature piece of legislation from Representative Darrell Issa (R-CA). H.R. 2146 requires every United States federal government agency to report its spending data in a standardized way and establish uniform reporting standards for recipients of federal funds.

"The DATA Act will transform how we are able to monitor government spending online," said Ellen Miller, co-founder and executive director of the Sunlight Foundation, in a prepared statement. "We've said time and time again that transparency is not a partisan issue, and we are proud to see there was broad support across the aisle for the bill. The DATA Act will increase transparency for federal spending data and expand when, where and how it is available online." The DATA Act also received support from a broad range of other open government stalwarts, from OMB Watch to Citizens for Responsibility and Ethics in Washington (CREW):

Orgs in Support of DATA Act

Discussing DATA

I spoke with Rep. Issa, who serves as the chairman of the U.S. House Oversight and Government Reform Committee, about the DATA Act and the broader issues around open government data at the Strata Conference in New York City.

Daniel Schuman, the Sunlight Foundation’s legislative counsel, summarized our conversation on open government data over at the Sunlight Foundation’s blog. Video of our discussion is embedded below.

Rep. Issa: …when I work with [Inspector Generals], they would love to have access to predictive [data analytics tools]. Today, they only have forensic. And in many cases, they have like stove pipe forensic. They only know after the fact, a portion of the data, and it frustrates them. We’re going to change that.

The DATA Act is bipartisan, which here in Washington is very unusual. One of the reasons is that people who want to know from the left and the right want to be in the know. We believe that by mandating standard reporting and a process of greater transparency and, of course, the tools created to make this easy and inexpensive for the private sector to participate in will give us an opportunity which will at some time be used by the left or the right or often used by simply people who have a vested interested in advising the private sector accurately on what is, has and will become events in government or for that matter, events in the private sector that are being aggregated through the government.

Your industry is going to be essential because if we give you more accurate, more easily compiled data, unless you turn it into information that’s valuable, we haven’t really accomplished what we want to. The same is true, though, unless you do it, my IGs won’t have private sector solutions that allow them to pick up COTS or near COTS solutions that are affordable and valuable and use them in evaluating government to drive out waste and fraud in government.

What’s next for the DATA Act?

The Senate version of the DATA Act, which is sponsored by Senator Mark Warner (D-VA), remains "pending" in the Homeland Security and Government Affairs Committee after a hearing last week, despite the considerable efforts of a new Data Transparency Coalition to move the bill. The hearing came one week after the coalition held a public DATA Demo Day that featured technology companies demonstrating different uses of standardized federal spending data, including claims that it could have prevented the scandal over excessive conference spending in the General Services Administration.

At the hearing, Senator Warner proposed an amended version of the DATA Act that would drop the independent board modeled on the Recovery Accountability and Transparency Board that oversaw spending from the American Recovery and Reinvestment Act of 2009, as Joseph Marks reported for Next Gov.

The DATA Act, however, received a hearing but not a markup, as Daniel Schuman, the legislative counsel of the Sunlight Foundation, wrote at the transparency advocate's blog. (For those who aren't well versed in the legislative process, "markups" are when amendments are considered. The DATA Act will have to pass through the HSGAC committee to get to the floor of the Senate.) In his summary of the hearing, Schuman highlighted the opposition of the Office of Management and Budget and the U.S. Treasury Department to the Act's provisions:

Gene Dodaro, the Comptroller General, testified about a newly-issued GAO report on federal spending transparency, which alternatively praised and criticized OMB’s efforts to comply with legislation to improve information availability. During the Q&A, Dodaro explained that it may be helpful for Congress to enact legislation declaring what spending information it wants to have available to the public, as a way of establishing priorities and direction.

OMB Controller Daniel Werfel’s testimony [PDF] focused on OMB’s efforts to improve the accuracy and availability of spending information, largely defending the administration’s record. During the Q&A, Werfel emphasized that new legislation is not necessary to implement spending transparency as the administration already has the necessary authority. While his testimony highlighted the administration’s claims of what it has accomplished, it did not engage the concerns that OMB has dragged its feet over the last 4 years, or that OMB — as an arm of the president — may have mixed incentives about releasing potentially politically damaging information. He did explain that OMB has not released a statement of administration policy on the DATA Act, but that OMB (unsurprisingly) is less than enthusiastic about shifting responsibility over standard-setting and implementation to an independent body.

Treasury Department Assistant Fiscal Secretary Richard Gregg testified [PDF] about ongoing internal efforts at Treasury to improve data quality and projects that will yield results in the future. During the Q&A, Gregg explained that legislation isn’t needed for financial transparency, leadership in the executive branch would be sufficient. This raises the question of whether sufficient leadership is being exercised.

The question of leadership that Schuman raised is a good one, as is the one regarding incentives. During July's International Open Government Data Conference in DC, Kaitlin Bline, the senior developer working on the Sunlight Foundation's Clearspending project, said that the problems with the government spending data on USASpending.gov come from oversight, not technology.

Bline was blunter in her post on the subject. Historically, oversight of federal spending has been the province of the General Accounting Office, Congressional committees performing oversight of federal agencies, or special commissions, notably the Truman Committee during World War II. In the decades since, the work of inspectors general and Congressional staffers has been augmented by fraud detection technology, a critical innovation given the estimated $70 billion in improper payments made by the federal government within the Medicare and Medicaid programs alone. (The fraud detection technology that was developed at PayPal and spun out into Palantir Technologies, in fact, has been deployed to that end.)

The promise of standardizing federal spending data, grant data — or performance data — is that those entrusted with oversight could be empowered with predictive data analytics tools and teams to discover patterns and shift policy to address them.

While the huge budget deficit in the United States is highly unlikely to be closed by cutting fraud and waste alone, making federal spending machine-readable and putting it online clearly holds promise to save taxpayer dollars. First, however, the quality of government spending data must be improved.

Important questions about the DATA Act remain, from the cost of its implementation for cities and states, which would have to report on federal grants, to the overall cost of the bill to the federal government. The Congressional Budget Office estimated that the DATA Act would cost the government $575 million to implement over 5 years. In response to the CBO, House Oversight staff have estimated annual savings from standards and a centralized spending database that would more than offset that outlay, including:

  • $41 million in funds recovered from questionable recipients
  • $63 million in funds withheld from questionable recipients
  • $5 billion in savings recommended by inspectors general
  • unknown savings resulting from better internal spending control and better oversight by Congressional appropriators.

No formal subsequent action on the DATA Act has been scheduled in the Senate and, with the August recess looming and many eyes turning to cybersecurity legislation, there are uncertain prospects for its passage in this election year’s legislative calendar.

The need for the federal government, watchdogs and the people to be able to accurately track the spending of taxpayer dollars through high quality open government data, however, remains acute.

July 23 2012

The dark side of data

Map of France in Google Earth by Steven La Roux

A few weeks ago, Tom Slee published "Seeing Like a Geek," a thoughtful article on the dark side of open data. He starts with the story of a Dalit community in India, whose land was transferred to a group of higher-caste Mudaliars through bureaucratic manipulation under the guise of standardizing and digitizing property records. While this sounds like a good idea, it gave a wealthier, more powerful group a chance to erase older, traditional records that hadn't been properly codified. One effect of passing laws requiring standardized, digital data is to marginalize all data that can't be standardized or digitized, and to marginalize the people who don't control the process of standardization.

That’s a serious problem. It’s sad to see oppression and property theft riding in under the guise of transparency and openness. But the issue isn’t open data, but how data is used.

Jesus said "the poor are with you always" not because the poor aren't a legitimate area of concern (only an American fundamentalist would say that), but because they're an intractable problem that won't go away. The poor are going to be the victims of any changes in technology; it isn't surprising that the wealthy in India used data to marginalize the land holdings of the poor. In a similar vein, when Europeans came to North America, I imagine they told the natives "So, you got a deed to all this land?," a narrative that's still being played out with indigenous people around the world.

The issue is how data is used. If the wealthy can manipulate legislators to wipe out generations of records and folk knowledge as “inaccurate,” then there’s a problem. A group like DataKind could go in and figure out a way to codify that older generation of knowledge. Then at least, if that isn’t acceptable to the government, it would be clear that the problem lies in political manipulation, not in the data itself. And note that a government could wipe out generations of “inaccurate records” without any requirement that the new records be open. In years past the monied classes would have just taken what they wanted, with the government’s support. The availability of open data gives a plausible pretext, but it’s certainly not a prerequisite (nor should it be blamed) for manipulation by the 0.1%.

One can see the opposite happening, too: witness the recent legislation in North Carolina barring the use of data that shows sea level rise. Open data may be the only possible resource against forces that are interested in suppressing science. What we're seeing here is a full-scale retreat from data and what it can teach us: an attempt to push the furniture against the door to prevent the data from getting in and changing the way we act.

The digital publishing landscape

Slee is on shakier ground when he claims that the digitization of books has allowed Amazon to undermine publishers and booksellers. Yes, there’s technological upheaval, and that necessarily drives changes in business models. Business models change; if they didn’t, we’d still have the Pony Express and stagecoaches. O’Reilly Media is thriving, in part because we have a viable digital publishing strategy; publishers without a viable digital strategy are failing.

But what about booksellers? The demise of the local bookstore has, in my observation, as much to do with Barnes & Noble superstores (and the now-defunct Borders), as with Amazon, and it played out long before the rise of ebooks.

I live in a town in southern Connecticut, roughly a half-hour's drive from the two nearest B&N outlets. Guilford and Madison, the town immediately to the east, both have thriving independent bookstores. One has a coffeeshop, stages many, many author events (roughly one a day), and runs many other innovative programs (birthday parties, book-of-the-month services, even ebook sales). The other is just a small local bookstore with a good collection and knowledgeable staff. The town to the west lost its bookstore several years ago, possibly before Amazon even existed. Long before the Internet became a factor, it had reduced itself to cheap gift items and soft porn magazines. So: data may threaten middlemen, though it's not at all clear to me that middlemen can't respond competitively. Or that they are really threatened by "data," as opposed to large centralized competitors.

There are also countervailing benefits. With ebooks, access is democratized. Anyone, anywhere has access to what used to be available only in limited, mostly privileged locations. At O’Reilly, we now sell ebooks in countries we were never able to reach in print. Our print sales overseas never exceeded 30% of our sales; for ebooks, overseas represents more than half the total, with customers as far away as Azerbaijan.

Slee also points to the music labels as an industry that has been marginalized by open data. I really refuse to listen to whining about all the money that the music labels are losing. We've had too many years of crap product generated by marketing people who only care about finding the next Justin Bieber to take the "creative industry" and its sycophants seriously.

Privacy by design

Data inevitably brings privacy issues into play. As Slee points out (and as Jeff Jonas has before him), apparently insignificant pieces of data can be put together to form a surprisingly accurate picture of who you are, a picture that can be sold. It's useless to pretend that there won't be increased surveillance in any foreseeable future, or that there won't be an increase in targeted advertising (which is, technically, much the same thing).
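To make that concrete, here is a minimal sketch of how two innocuous-looking data sets can be linked on shared attributes. The records, column names and values are entirely hypothetical; the point is only that a join on a few "insignificant" fields is enough to re-identify people.

    import pandas as pd

    # A de-identified purchase log: no names, just a few demographic fields.
    purchases = pd.DataFrame([
        {"zip": "06437", "birth_year": 1971, "gender": "M", "purchase": "knee brace"},
        {"zip": "06443", "birth_year": 1985, "gender": "F", "purchase": "prenatal vitamins"},
    ])

    # A public record, such as a voter roll, with names but nothing sensitive.
    voter_roll = pd.DataFrame([
        {"name": "Pat Smith", "zip": "06437", "birth_year": 1971, "gender": "M"},
        {"name": "Lee Jones", "zip": "06443", "birth_year": 1985, "gender": "F"},
    ])

    # Joining on the shared quasi-identifiers re-attaches names to purchases.
    linked = purchases.merge(voter_roll, on=["zip", "birth_year", "gender"])
    print(linked[["name", "purchase"]])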

We can bemoan that shift, celebrate it, or try to subvert it, but we can’t pretend that it isn’t happening. We shouldn’t even pretend that it’s new, or that it has anything to do with openness. What is a credit bureau if not an organization that buys and sells data about your financial history, with no pretense of openness?

Jonas’s concept of “privacy by design” is an important attempt to address privacy
issues in big data. Jonas envisions a day when “I have more privacy features than you” is a marketing advantage. It’s certainly a claim I’d like to see Facebook make.

Absent a solution like Jonas’, data is going to be collected, bought, sold, and used for marketing and other purposes, whether it is “open” or not. I do not think we can get to Jonas’s world, where privacy is something consumers demand, without going through a stage where data is open and public. It’s too easy to live with the illusion of privacy that thrives in a closed world.

I agree that the notion that "open data" is an unalloyed public good is mistaken, and Tom Slee has done a good job of pointing that out. It underscores the importance of a still-nascent ethical consensus about how to use data, along with the importance of data watchdogs, DataKind, and other organizations devoted to the public good. (I don't understand why he argues that Apple and Amazon "undermine community activism"; that seems wrong, particularly in light of Apple's re-joining the EPEAT green certification system for its products after a net-driven consumer protest.) Data collection is going to happen whether we like it or not, and whether it's open or not. I am convinced that private data is a public bad, and I'm less afraid of data that's open. That doesn't make it necessarily a good; that depends on how the data is used, and on the people who are using it.

Image Credit: Steven La Roux
