May 06 2012
The state of health IT according to the American Hospital Association
Last week, the American Hospital Association released a major document. Framed as comments on a major federal initiative, the proposed Stage 2 Meaningful Use criteria by the Centers for Medicare & Medicaid Services (CMS), the letter also conveys a rather sorrowful message about the state of health IT in the United States. One request--to put brakes on the requirement for hospitals to let patients see their own information electronically--has received particularly strong coverage and vigorous responses from e-Patient Dave deBronkart, Regina Holliday, Dr. Adrian Gropper, Fred Trotter, the Center for Democracy and Technology, and others.
I think the AHA has overreached in its bid to slow down patient access to data, which I'll examine later in this article. But to me, the most poignant aspect of the AHA letter is its careful accumulation of data to show the huge gap between what health care calls for and what hospitals, vendors, standards bodies, and even the government are capable of providing.
Two AHA staff were generous enough to talk to me on very short notice and offer some clarifications that I'll include with the article.
A survey of the U.S. health care system
According to the AHA (translated into my own rather harsh words), the state of health IT in American hospitals is as follows:
Few hospitals and doctors can fulfill basic requirements of health care quality and cost control. For instance, 62% could not record basic patient health indicators such as weight and blood pressure (page 51 of their report) in electronic health records (EHRs).
Many EHR vendors can't support the meaningful use criteria in real-life settings, even when their systems were officially certified to do so. I'll cite some statements from the AHA report later in the article. Meaningful use is a big package of reforms, of course, promulgated over just a few years, but it's also difficult because vendors and hospitals had been heading for a long time in the opposite direction: toward closed, limited functionality.
Doctors still record huge globs of patient data in unstructured text format, where they are unavailable for quality reporting, tracking clinical effectiveness, etc. Data is often unstructured because humans are complex and their symptoms don't fit into easy categories. Yet doctors have learned to make diagnoses for purposes of payment and other requirements; we need to learn what other forms of information are worth formalizing for the sake of better public health.
Quality reporting is a mess. The measures currently being reported are unreliable, and standards have not been put in place to allow valid comparisons of measures from different hospitals.
Government hasn't stepped up to the plate to perform its role in supporting electronic reporting. For instance, the Centers for Medicare & Medicaid Services (CMS) wants the hospitals to report lots of quality measures, but its own electronic reporting system is still in the testing stages, so hospitals must enter data through a cumbersome and error-prone manual "attestation." States aren't ready to accept electronic submissions either. The Direct project is moving along, but its contribution to health data exchange is still very new.
There's no easy place to assign blame for a system that is killing hundreds of thousands of people a year while sticking the US public with rising costs. The AHA letter constantly assures us that the association approves of the meaningful use objectives, but says their implementation in a foreseeable time frame is unfeasible. "We can envision a time when all automated quality reporting will occur effortlessly in a reliable and valid fashion. However, we are not there yet." (pp. 42-43)
So the AHA's petition to the CMS can be summarized overall as, "Slow everything down, but keep the payments coming."
AHA staff referred to the extensively researched article, A Progress Report On Electronic Health Records In U.S. Hospitals. It corroborates observations that adoption of EHRs has vastly increased between 2010 and 2011. However, the capabilities of the EHRs and hospitals using them have not kept up with meaningful use requirements, particularly among small rural hospitals with few opportunities to hire sophisticated computer technicians, etc. Some small hospitals have trouble even getting an EHR vendor to talk to them.
Why all this matters
Before looking at some details, let me lay out some of the reasons that meaningful use criteria are so important to patients and the general public:
After treatment, data must be transferred quickly to patients and the next organizations treating them (such as rehab centers and visiting nurses) so that the patients receive proper care.
Quality measures are critical so that hospitals can be exposed to sunshine, the best disinfectant, and be shamed into lowering costs and reducing errors.
Data must be collected by public agencies so that data crunchers can find improvements in outreach and treatment. Hospitals love to keep their data private, but that gives them relatively tiny samples on which to base decisions, and they often lack the skills to analyze the data.
No one can predict what will break logjams and propel health care forward, but patient engagement seems crucial because most health care problems in developed countries involve lifestyle issues such as smoking and body weight. To provide the kind of instant, pervasive patient engagement that can produce change, we need electronic records that are open to innovative apps, that can accept data from the patient-centered medical home, and that link together all care-givers.
The state of electronic health records
The EHR industry does not come out well in the AHA list of woes. The letter cites "unworkable, but certified, vendor products" (p. 3) and says, "Current experience is marked by limited vendor and workforce capacity." (p. 7) The latter complaint points to one of the big hurdles facing health care reform: we don't have enough staff who understand computer systems and who can adapt their behavior to use them effectively.
Functionality falls far short of real hospital needs:
...one hospital system spent more than $1 million on a quality reporting tool from its vendor that was, for the most part, an unwieldy data entry screen. Even medication orders placed using CPOE [computerized physician order entry] needed to be manually re-entered for the CQM [clinical quality measure] calculation. Even then, the data were not reliable, despite seven months of working with the vendor to attempt to get it right. Thus, after tremendous investment of financial and human resources, the data are not useful. (p. 45)
The AHA claims that vendors were lax in testing their systems, and that the government abetted the omission: "the proposals within the certification regulation require vendors to incorporate all of the data elements needed to calculate only one CQM. There is no proposal to require that certified EHRs be capable of generating all of the relevant CQMs proposed/finalized by CMS." (p. 41) With perhaps a subtle sarcasm, the AHA proposes, "CMS should not require providers to report more e-measures than vendors are required to generate." (p. 36)
Vendors kind of take it on the chin for fundamental failures in electronic capabilities. "AHA survey data indicate that only 10 percent of hospitals had a patient portal of any kind in Fall 2011. Our members report that none had anywhere near the functionality required by this objective. In canvassing vendors, they report no technology companies can currently support this volume of data or the listed functions." (p. 26)
We can add an observation from the College of Healthcare Information Management Executives (CHIME):
...in Stage 1, some vendors were able to dictate which clinical quality measures providers chose to report--not based on the priorities of the provider, but based on the capabilities of the system. Subsequently, market forces corrected this and vendors have gone on to develop more capabilities. But this anecdote provides an important lesson when segmenting certification criteria--indeed for most technologies in general--flexibility for users necessitates consistent and robust standards for developers. In short, the 2014 Edition must require more of the vendor community if providers are to have space to pursue meaningful use of Meaningful Use. (p. 2)
Better standards--which take time to develop--could improve the situation, which is why the Office of the National Coordinator (ONC) has set up a Health IT Standards Committee. For instance, the AHA says, "we have discovered that vendors needed to program many decisions into EHRs that were not included in the e-specifications. Not only has this resulted in rampant inconsistencies between different vendors, it produced inconsistent measure results when the e-measures are compared to their counterparts in the Inpatient Quality Reporting (IQR) Program." (p. 35)
The AHA goes so far as to say, "The market cannot sustain this level of chaos." (p. 7) They conclude that the government is pushing too hard. One of their claims, though, comes across as eccentric: "Providers and vendors agree that the meaningful use program has stifled innovation in the development of new uses of EHRs." (p. 9)
To me, all the evidence points in the opposite direction. The vendors were happy for decades to push systems that performed minimal record-keeping and modest support such as formularies at huge costs, and the hospitals that adopted EHRs failed to ask for more. It wasn't a case of market failure because, as I have pointed out (and others have too), health care is not a market. But nothing would have changed had not the government stepped in.
Patient empowerment
Now for the point that has received the most press: the AHA's request to weaken the rules giving patients access to their data. Once again, the AHA claims to favor patient access--and actually, they have helped hospitals over the years to give patients summaries of care, mostly on paper--but are passing on the evidence they have accumulated from their members that the systems will not be in place to support electronic distribution for some time. I won't repeat all the criticisms of the experts mentioned at the beginning of this article, but will provide some perspective about patient engagement.
Let's start with the AHA's request to let the hospital choose the format for patient data (pp. 25-26). So long as hospitals can do that, we will be left with formats that are not interoperable. Many hospitals will choose formats that are human-readable but not machine-readable, so that correlations and useful data cannot be extracted programmatically. Perhaps the technology lags in this area--but if the records are not in structured format already, hospitals themselves lose critical opportunities to check for errors, mine data for trends, and perform other useful tasks with their records.
The AHA raises alarms at the difficulties of providing data. They claim that for each patient who is treated, the hospital will have to invest resources "determining which records are relevant and appropriate." (p. 26) "It is also unclear whether a hospital would be expected to spend resources to post information and verify that all of the data listed are available within 36 hours." (p. 27)
From my perspective, the patient download provisions would simply require hospitals to clean up their ways of recording data so that it is in a useable and structured format for all, including their own staff. Just evaluate what the AHA is admitting to in the following passage: "Transferring these clinical observations into a structured, coded problem list in the EHR requires significant changes to work flows and training to ensure accuracy. It also increases time demands for documentation by physicians who already are stretched thin." (p. 27)
People used to getting instant information from commercial web sites find it very hard to justify even the 36-hour delay offered by the Stage 2 meaningful use guidelines. Amazon.com can provide me with information on all my current and recent orders. Google offers me, like every registered user, a dashboard that shows everything they track about me, including all my web searches going back to mid-2006. They probably do this to assure people that they are not the egregious privacy violators they are regularly accused of being. Nevertheless, it shows that sites collecting data can make it available to users without friction, and with adequate security to manage privacy risks.
The AHA staff made a good point in talking to me. The CMS "transmit" requirement would let a patient ask the hospital to send his records to any institution or individual of his choice. First of all, this would assume that the recipient has encrypted email or access to an encrypted web site. And it could be hard for a hospital to make sure both the requester and the intended recipient are who they claim to be. "The transmit function also heightens security risks, as the hospital could be asked to send data to an individual with whom it has no existing relationship and no mechanism for authentication of their identity." (p. 27) Countering this claim, Gropper and the Society for Participatory Medicine offer the open OAuth standard to give patients easy and secure access. But while OAuth is a fairly stable standard, the AHA's concerns are justified because it hasn't been applied yet to the health care field.
Unfortunately, allowing a patient to send his or her data to a third party is central to Accountable Care Organizations (ACOs), which hold the promise of improving patient care by sharing data among cooperating health care providers. If the "transmit" provision is delayed, I don't see how ACOs can take off.
The AHA drastically reduces the information hospitals would have to give patients, at least for the next stage of the requirements. Among the material they would remove are diagnoses, the reason for hospitalization, providers of care during hospitalization, vital signs at discharge, laboratory test results, the care transition summary and plan for next provider of care, and discharge instructions for patient. (p. 27) All this vastly reduces the value of data for increasing quality care. For instance, removing lab test results will lead to expensive and redundant retesting. (However, the AHA staff told me they support the ability of patients to get results directly from the labs.)
I'll conclude this section with the interesting observation that the CHIME comments on meaningful use I mentioned earlier say nothing about the patient engagement rules. In other words, the hospital CIOs in CHIME don't back up the hospitals' own claims.
Some reasonable AHA objections
Now I'm happy to turn to AHA proposals that leave fewer impediments to the achievement of better health care. Their 49-page letter (plus appendices) details many aspects of Stage 2 that seem unnecessarily burdensome or of questionable value.
It seems reasonable to me to ask the ONC, "Remove measures that make the performance of hospitals and EPs contingent on the actions of others." (p. 2) For instance, to engage in successful exchanges of patient data, hospitals depend on their partners (labs, nursing homes, other hospitals) to have Stage 2 capabilities, and given the slow rate of adoption, such partners could be really hard to find.
The same goes for patient downloads. Not only do hospitals have to permit patients to get access to data over the Internet, but they have to get 10% of the patients to actually do it. I don't think the tools are in place yet for patients to make good use of the data. When data is available, apps for processing the data will flood the market and patients will gradually understand the data's value, but right now there are few reasons to download it: perhaps to give it to a relative who is caring for the patient or to a health provider who doesn't have the technical means to request the data directly. Such uses may allow hospitals to reach the 10% required by the Stage 2 rule, but why make them responsible?
The AHA documents a growing digital divide among hospitals and other health care providers. "Rural, smaller and nonteaching hospitals have fewer financial and technical resources at their disposal. They also are starting from a lower base of adoption." (p. 59) The open source community needs to step up here. There are plenty of free software solutions to choose from, but small providers can't use them unless they become as easy to set up and configure as MySQL or even LibreOffice.
The AHA is talking from deep experience when it questions whether patients will actually be able to make use of medical images. "Images are generally very large files, and would require that the individual downloading or receiving the file have specialized, expensive software to access the images. The effort required to make the images available would be tremendous." (p. 26) We must remember that parts of our country don't even have high-speed Internet access.
The AHA's detailed comments about CMS penalties for the slow adoption of EHRs (pp. 9-18) also seem to reflect the hard realities out in the field.
But their attitude toward HIPAA is unclear. They point out that Congress required meaningful use to "take into account the requirements of HIPAA privacy and security law." (p. 25) Nevertheless, they ask the ONC to remove its HIPAA-related clauses from meaningful use because HIPAA is already administered by the Office for Civil Rights (OCR). It's reasonable to remove redundancy by keeping regulations under a single agency, but the AHA admits that the OCR proposal itself is "significantly flawed." Their staff explained to me that their goal is to wait for the next version of the OCR's own proposal, which should be released soon, before creating a new requirement that could well be redundant or conflicting.
Unless we level the playing field for small providers, an enormous wave of buy-outs and consolidation will occur. Market forces and the push to form ACOs are already causing such consolidation. Maybe it's even a good thing--who feels nostalgic for the corner grocery? But consolidation will make it even more important to empower patients with their data, in order to counterbalance the power of the health care institutions.
A closing note about hospital inertia
The AHA includes in its letter some valuable data about difficulties and costs of implementing new systems (pp. 47-48). They say, "More than one hospital executive has reported that managing the meaningful use implementation has been more challenging than building a new hospital, even while acknowledging the need to move ahead." (p. 49)
What I find particularly troublesome about their report is that the AHA offers no hint that the hospitals spent all this money to put in place new workflows that could improve care. All the money went to EHRs and the minimal training and installation they require. What will it take for hospitals to make the culture changes that reap the potential benefits of EHRs and data transfers? The public needs to start asking tough questions, and the Stage 2 requirements should be robust enough to give these questions a basis.
Principles of patient access in Directed Exchange
The Health Insurance Portability and Accountability Act (HIPAA) is good law. HIPAA formalized principles of patient privacy that should have been codified industry norms for more than 50 years (better late than never). HIPAA provided the right to patients in the U.S. to get access to their own healthcare records. The law struck reasonable balances on hundreds of complicated issues in order to achieve these goals. The law solved more problems, by far, than it created. Which is as close to the definition of good government as I can imagine. Patients are better off after HIPAA than before.
Sadly, the "letter of the law" in HIPAA is frequently either ignored or, worse, fully embraced in order to make patient access to their own healthcare data more cumbersome. This is evidenced nowhere better than in Regina Holliday's experience with access to her husband's medical records. To make a long story short, she was able to acquire an unpublished manuscript of a Stephen King novel sooner and for less money than she was able to get her husband's medical records.
Principle zero: Some clinicians will do anything they can to make patient access to their health records impossible or cumbersome.
Regina's work, detailing her experience with her husband, is titled 73 cents, because that's how much it cost to get one page of her husband's medical record. HIPAA allows hospitals and clinicians to charge a "reasonable" copying fee for access to patient records. The problem with that is that in the digital age, a single healthcare record printout looks like this:

A partial printout of a patient's medical record.
This is what happens when you print out a digital health record. Having patients pay the copying costs for access to medical records makes a simple presumption: there are only a few pages there. Obviously no patient will be able to afford copying costs in the age of all-digital records.
Principle one: Patient access to their own healthcare records must be digital once the record is digital.
Once you concede that access to the patient's medical record must be digital, we can discuss the push vs. pull question. When someone else on the Internet has data that is important to you, you can generally find ways to have it "pushed" to you or you can choose to "pull" it. The simplest example is the weather. You can always check the weather easily online by visiting a website (by pulling). But you can also have software text you when it is going to rain (by pushing).
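The difference between the two models can be sketched in a few lines of Python. This is only an illustration: the `WeatherService` class, its `current()`, `subscribe()` and `update()` methods, and the forecast strings are hypothetical names invented for this example, not part of any real weather or health data API.

```python
from typing import Callable, List


class WeatherService:
    """Hypothetical data holder, used only to illustrate push vs. pull."""

    def __init__(self) -> None:
        self._forecast = "sunny"
        self._subscribers: List[Callable[[str], None]] = []

    def current(self) -> str:
        # Pull: the consumer asks whenever it wants to know.
        return self._forecast

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Push: the consumer registers once and is notified on every change.
        self._subscribers.append(callback)

    def update(self, forecast: str) -> None:
        # A change on the data holder's side triggers the push.
        self._forecast = forecast
        for notify in self._subscribers:
            notify(forecast)


service = WeatherService()
received: List[str] = []
service.subscribe(received.append)   # push: register a callback once

pulled_before = service.current()    # pull: explicit request
service.update("rain")               # the service pushes the change out
pulled_after = service.current()     # pull again to see the new value
```

With pull, the consumer learns about the rain only if it happens to ask again; with push, the change arrives without the consumer doing anything further.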
There are advantages to both push and pull approaches for patient access to data. People who are excited about the pull model tend to focus on the benefits of the "portal" requirements in Meaningful Use, and those who favor the push model are excited about directed exchange. Without getting into the debate, I can posit that there are some cases where push access to patient data is critical. Without supporting patient participation in directed exchange, we relegate patients to second-class citizens with regard to healthcare exchange. That is unacceptable. Patients should be first-class citizens in healthcare exchange.
Principle two: Patients should be able to participate in health information exchange as first-class citizens.
The Office of the National Coordinator for Health Information Technology (ONC) should be applauded for requiring directed exchange with patients in the current proposed rule. I hope that ONC does not back off of this new requirement.
The current proposed rulemaking, however, is silent on a critical issue for directed health information exchange. How do we ensure that providers will not refuse to communicate with patients over directed exchange because of bogus "security concerns"? As we see with the copying costs under HIPAA, every potential barrier to a patient's access to data will be used against patients.
There are already rumors of cases in the pilots of directed exchange where organizations are using the trust architecture of the Direct Project to refuse to communicate with certain parties. While that might be reasonable between institutions (do you really think Planned Parenthood will ever automate communication with Catholic charity clinics or vice-versa?), it is absolutely critical that this not hamper patient-clinician communication.
When we first designed the Direct Project Trust model, we presumed that patient-clinician communication would take place based on "business-card" identity verification. That meant that when a patient provided a clinician with a public key (no matter how they did that) the clinician would trust it because the patient provided the public key. We did this because we knew that if clinicians could reject a patient's public key based on "security concerns," they would do so. Either the clinicians (or more likely the vendors that they hired) would choose directed exchange "partners" that were "approved" and "secure," ensuring that the patient's experience of directed exchange was merely a more extensive menu of patient portal options. Patient data is very valuable and controlling the flow of patient data is central to more business plans than I care to count.
In order for patients to be first-class citizens in health information exchange, they should have the right to send their records, in an automated fashion, anywhere they want, even if that means sending them to a service that the patient is enthusiastic about but the clinician disapproves of (e.g. qpid.me). In the world of secure email enabled by public-key infrastructure (PKI), that translates to a simple rule: clinicians must accept any public key/direct address presented by a patient in a reasonable manner. This acceptance must be unconditional, but should probably mean limiting the acceptance of that key to communication with just that patient. Anything less than this means that the patient is a second-class citizen with regard to the information exchange of their own data.
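One way to picture "unconditional, but limited to that patient" is the following minimal Python sketch. It is an assumption-laden illustration, not the Direct Project's actual trust implementation: the `ClinicianTrustStore` class, its method names, and the use of plain strings as key fingerprints are all hypothetical, and no real certificate handling is performed.

```python
from typing import Dict, Set


class ClinicianTrustStore:
    """Hypothetical store: keys are accepted per patient, never globally."""

    def __init__(self) -> None:
        # Maps a patient identifier to the key fingerprints that patient presented.
        self._accepted: Dict[str, Set[str]] = {}

    def accept_from_patient(self, patient_id: str, key_fingerprint: str) -> None:
        # Unconditional acceptance: no list of "approved" partners is consulted.
        self._accepted.setdefault(patient_id, set()).add(key_fingerprint)

    def may_send(self, patient_id: str, key_fingerprint: str) -> bool:
        # The key is trusted only for the patient who presented it.
        return key_fingerprint in self._accepted.get(patient_id, set())


store = ClinicianTrustStore()
store.accept_from_patient("patient-a", "fp-1234")  # made-up identifiers

own_data_ok = store.may_send("patient-a", "fp-1234")
other_patient_ok = store.may_send("patient-b", "fp-1234")
```

The design choice worth noticing is that acceptance never passes through an approval step, while the per-patient scoping keeps a key presented by one patient from becoming a general-purpose trusted endpoint.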
Conclusion: ONC should require that clinicians communicate with a patient's chosen directed exchange provider, which means accepting any public key presented by a patient in a reasonable manner.
The community at Direct Trust is working hard to agree on what "reasonable manner" should mean, exactly. Here is my latest proposal on the subject, and here are similar ideas from Dr. David Kibbe. Eventually the Direct Trust community will knock out a firm understanding on the specific ways that might be "reasonable" for a patient to provide a certificate. But we are certainly agreed that without firm requirements on certificate acceptance, this issue will be used by clinicians to limit where patients can send their own data.
As the U.S. federal government prepares to pay healthcare providers to adopt electronic health records (EHRs), it will insist that those doctors, hospitals, etc. show that they are using the new software in clinically meaningful ways. On Monday (May 7, 2012) it will be accepting comments on the second stage of the requirements that clinicians must meet in order to receive compensation. These requirements are usually short-handed as "meaningful use."
I will be submitting this blog post as my comments to that process. Others will be submitting comments that directly contradict the principles and conclusions I write here. Most notably the American Hospital Association (AHA) has argued that the requirements for patient portals and for providing patients with access to their digital record should be entirely removed from the meaningful use standards (PDF). Specifically:
"Our members are particularly concerned with the proposed objective to provide patients with the ability to view, download and transmit large volumes of protected health information via the Internet (a "patient portal"). The AHA believes that this objective is not feasible as proposed, raises significant security issues, and goes well beyond current technical capacity. We also believe that CMS should not include this objective because the Office of Civil Rights, and not CMS, regulates how health care providers and other covered entities fulfill their obligations under the Health Insurance Portability and Accountability Act (HIPAA), including the obligation to give patients access to their health records."
This is fairly ironic, since the report also says:
"To date, OCR has received comments on its own significantly flawed original proposal to implement this section of HITECH, but has yet to finalize the standard."
Apparently, AHA is not satisfied with any government agency's interpretation of giving electronic access to patient data. The AHA would prefer that patients continue to wait the same amount of time for access to their digital records that they do for their paper records. Specifically:
"Further, 30 days are necessary to make determinations about how to respond to a request no matter the format of the protected health information. While providing an electronic copy of protected health information maintained in an EHR eventually may be facilitated more easily by technology, the process of determining which records are relevant and appropriate takes the same amount of time as it does for evaluating paper records."
Of course, this is entirely false. Indeed, HIPAA does maintain that certain parts of healthcare records (e.g. a psychiatrist's notes) and disclosures (e.g. when the FBI asks for records) are not subject to patient access. An EHR should be capable of understanding which parts of an EHR record are subject to HIPAA and which are not. If the EHR system can understand this distinction, then responses to HIPAA requests can be made in near-real-time. If the EHR system cannot make the distinction between which portions of the record to automatically provide to honor a HIPAA patient access request, then having 30 days is not going to be enough. Can you imagine a nurse reading through the entire stack of papers above to ensure that a certain mental health diagnosis is redacted?
One of the most critical features of patient participation in directed exchange is the patient's capacity to prevent the spread of bad information as it is happening. Apparently, the AHA believes that patients should tolerate the spread of misinformation in their health records to other institutions for a month before correcting it. This of course works in every situation where patients can wait a whole month to get correct information to other hospitals and clinicians.
I would like to be the first to welcome the American Hospital Association to the digital age. (Okay, maybe the second.) From a technology perspective, there is nothing at all that would prevent patients from receiving copies of their updated digital health records seconds after they are "signed" by their clinicians. Inside those seconds is plenty of time to digitally determine whether sharing with the patient is appropriate, legal and safe. Seconds after a patient like me receives data, I intend to process it in an automated fashion. It is not unreasonable, in this new digital world, for me to get a text message that a doctor has ordered a medication that I am allergic to. I wish to get that message after the doctor has ordered the medication, but before I receive it in my IV.
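To make the point concrete, here is a rough Python sketch of how a system that tags each part of a record could decide automatically what to release. Everything in it is an assumption made for illustration: the category names, the `WITHHELD_CATEGORIES` set, the `releasable_sections` function and the sample record are invented, and real access rules are considerably more detailed.

```python
from typing import Dict, List

# Assumption for illustration: every section of the record carries a category
# tag, and these (made-up) categories are the ones withheld from patient access.
WITHHELD_CATEGORIES = {"psychotherapy_notes", "law_enforcement_disclosure"}


def releasable_sections(record: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the sections that can be released to the patient automatically."""
    return [s for s in record if s["category"] not in WITHHELD_CATEGORIES]


# Invented sample record, not real patient data.
sample_record = [
    {"category": "lab_result", "text": "Potassium within normal range"},
    {"category": "psychotherapy_notes", "text": "Session notes"},
    {"category": "medication_order", "text": "Ibuprofen 200 mg"},
]

released = releasable_sections(sample_record)
released_categories = [s["category"] for s in released]
```

If the sections are tagged when they are written, the filtering itself takes a fraction of a second, which is the contrast with a person paging through a printout.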
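The kind of automated processing I have in mind does not need to be sophisticated. The following Python sketch is a toy under stated assumptions: the `allergy_alerts` function, the message wording and the medication names are hypothetical, matching is by exact name only, and actually sending a text message is left out.

```python
from typing import Iterable, List, Set


def allergy_alerts(ordered_medications: Iterable[str],
                   known_allergies: Set[str]) -> List[str]:
    """Return one alert message per ordered medication the patient is allergic to."""
    # Compare case-insensitively; real systems would match on drug codes and classes.
    allergies = {name.lower() for name in known_allergies}
    return [
        f"ALERT: {medication} was ordered, but you have a recorded allergy to it"
        for medication in ordered_medications
        if medication.lower() in allergies
    ]


# Invented example data, not real orders.
alerts = allergy_alerts(["Penicillin", "Ibuprofen"], {"penicillin"})
```

A check like this is only useful if the order reaches the patient's software within seconds of being signed, which is the argument for pushing data rather than releasing it days later.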
In this new digital world, 36 hours is unreasonable. It means that humans continue to be involved in tasks that can be performed perfectly by a computer without errors. Even 36 hours means that doctors, nurses and hospital administrators are still "thinking in paper." Thirty-six hours means that you still do not view me, the patient, as an equal data partner. It means that I am blind to the data in your hospital at the only time it really matters, which is right now. Health data that is 36 hours old can only be analyzed as a post-mortem, and data that is 30 days old is already rotting. To me as a patient, 36 hours is a short-term solution. It is an opportunity for you to rethink how information flows in your hospitals. It is an opportunity for you to rethink the notions of "inside" the hospital and "outside" the hospital.
This is not to say that I do not take your point regarding the reconciliation of the policies from the perspective of HIPAA and meaningful use. Having two timelines for compliance is difficult. But the reconciliation is to speed HIPAA up, not slow meaningful use down. The notion that you will give patients a stack of paper like the one above 30 days after it is useful is a bad joke. It was a bad joke 20 years ago, when the technologies already existed to fix the problem, but you decided that the patient's experience was not worth that investment.
There is always something you can do, if you feel as strongly about this as I do.
Meaningful Use and Beyond: A Guide for IT Staff in Health Care. Meaningful Use underlies a major federal incentives program for medical offices and hospitals that pays doctors and clinicians to move to electronic health records (EHR). This book is a rosetta stone for the IT implementer who wants to help organizations harness EHR systems.
Photo: Medical record printout by jodi0327, on Flickr
Related:
- The Direct Project: Healthcare communication gets an upgrade
- Epatients: The hackers of the healthcare world
- Building the health information infrastructure for the modern epatient
- Why geeks should care about meaningful use and ACOs
- See more of Radar's health IT coverage
May 03 2012
Strata Week: Google offers big data analytics
Here are the data stories that caught my attention this week.
BigQuery for everyone
Google has released its big data analytics service BigQuery to the public. Initially made available to a small number of developers late last year, now anyone can sign up for the service. A free account lets you query up to 100 GB of data per month, with the option to pay for additional queries and/or storage.
"Google's aim may be to sell data storage in the cloud, as much as it is to sell analytic software," says The New York Times' Quentin Hardy. "A company using BigQuery has to have data stored in the cloud data system, which costs 12 cents a gigabyte a month, for up to two terabytes, or 2,000 gigabytes. Above that, prices are negotiated with Google. BigQuery analysis costs 3.5 cents a gigabyte of data processed."
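The prices quoted above translate into a simple monthly estimate. The sketch below uses only the figures from the Times quote (12 cents per gigabyte per month for storage up to 2,000 gigabytes, 3.5 cents per gigabyte processed); the function name is made up for illustration, and the free monthly query allowance is deliberately ignored because the quote does not say how it interacts with paid use.

```python
STORAGE_USD_PER_GB_MONTH = 0.12  # quoted storage price, up to 2,000 GB
QUERY_USD_PER_GB = 0.035         # quoted price per gigabyte processed
NEGOTIATED_ABOVE_GB = 2000       # above this, prices are negotiated


def monthly_cost(stored_gb, processed_gb):
    """Estimate one month's bill in US dollars from the quoted list prices."""
    if stored_gb > NEGOTIATED_ABOVE_GB:
        raise ValueError("storage above 2,000 GB is priced by negotiation")
    cost = stored_gb * STORAGE_USD_PER_GB_MONTH + processed_gb * QUERY_USD_PER_GB
    return round(cost, 2)


# 500 GB stored and 1,000 GB scanned by queries in a month:
# 500 * 0.12 + 1000 * 0.035 = 60 + 35
print(monthly_cost(500, 1000))  # 95.0
```

At these rates, a workload that scans its whole data set several times a month quickly pays more for queries than for storage.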
The interface for BigQuery is meant to lower the bar for these sorts of analytics — there's a UI and a REST interface. In the Times article, Google project manager Ju-kay Kwek says Google is hoping developers build tools that encourage widespread use of the product by executives and other non-developers.
If folks are looking for something to cut their teeth on with BigQuery, GitHub's public timeline is now a publicly available dataset. The data is being synced regularly, so you can query things like popular languages and popular repos. To that end, GitHub is running a data visualization contest.
The Data Journalism Handbook
The Data Journalism Handbook had its release this week at the 2012 International Journalism Festival in Italy. The book, which is freely available and openly licensed, was a joint effort of the European Journalism Centre and the Open Knowledge Foundation. It's meant to serve as a reference for those interested in the field of data journalism.
In the introduction, "Deutsche Welle's" Mirko Lorenz writes:
"Today, news stories are flowing in as they happen, from multiple sources, eye-witnesses, blogs, and what has happened is filtered through a vast network of social connections, being ranked, commented and more often than not, ignored. This is why data journalism is so important. Gathering, filtering and visualizing what is happening beyond what the eye can see has a growing value."
Velocity 2012: Web Operations & Performance. The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif. Save 20% on registration with the code RADAR20
Open data is a joke?
Tim Slee fired a shot across the bow of the open data movement with a post this week arguing that "the open data movement is a joke." Moreover, it's not a movement at all, he contends. Slee turns a critical eye to the Canadian government's open data efforts in particular, noting that: "The Harper government's actions around 'open government,' and the lack of any significant consequences for those actions, show just how empty the word 'open' has become."
Slee is also critical of open data efforts outside the government, calling the open data movement "a phrase dragged out by media-oriented personalities to cloak a private-sector initiative in the mantle of progressive politics."
Open data activist David Eaves responded strongly to Slee's post with one of his own, recognizing his own frustrations with "one of the most — if not the most — closed and controlling [governments] in Canada's history." But Eaves takes exception with the ways in which Slee characterizes the open data movement. He contends that many of the corporations involved with the open data movement — something Slee charges has corrupted open data — are U.S. corporations (and points out that in Canada, "most companies don't even know what open data is"). Eaves adds, too, that many of these corporations are led by geeks.
"Just as an authoritarian regime can run on open-source software, so too might it engage in open data. Open data is not the solution for Open Government (I don't believe there is a single solution, or that Open Government is an achievable state of being — just a goal to pursue consistently), and I don't believe anyone has made the case that it is. I know I haven't. But I do believe open data can help. Like many others, I believe access to government information can lead to better informed public policy debates and hopefully some improved services for citizens (such as access to transit information). I'm not deluded into thinking that open data is going to provide a steady stream of obvious 'gotcha moments' where government malfeasance is discovered, but I am hopeful that government data can arm citizens with information that the government is using to inform its decisions so that they can better challenge, and ultimately help hold accountable, said government."
Got data news?
Feel free to email me.
Related:
- Big data in the cloud
- The state of open government in Canada
- New open-data initiatives in Canada and the UK
- The bond between data and journalism grows stronger
- Data journalism, data tools, and the newsroom stack
May 02 2012
The UK's battle for open standards
Many of you are probably not aware, but there is an ongoing battle within the U.K. that will shape the future of the U.K. tech industry. It's all about open standards.
Last year, the Cabinet Office ran a consultation on open standards covering 970 CIOs and academics. The result of this consultation was a policy (PDF) in favour of royalty-free (RF) open standards in the U.K. I'm not going to go through the benefits of open standards in this space, other than to note that they are essential for the U.K.'s future competitive position, for spurring on innovation and creating a level playing field within the tech field. For those who wish to read more on this subject, Mark Thompson, the only academic I know to have published a paper on open standards in a quality peer-reviewed journal, has provided an excellent overview.
Normally, I put these battles into a historical context, and I certainly have a plethora of examples of past industries attempting to lobby against future change. However, to keep this short I'll simply note that the incumbent industry has reacted to the Cabinet Office policy with attempts to redefine open standards to include non-open FRAND (fair, reasonable and non-discriminatory) licenses and to portray some sort of legitimate debate of RF versus FRAND, which doesn't exist.
Whilst this is clearly wrong and underhanded, there's another story I wish to focus on. It relates to the accusations that the meetings have been filled with "spokespeople for big vendors to argue in favour of paid-for software, specifically giving advocates of FRAND the chance to argue that free software on RF terms would be a bad thing" as reported by TechWeek Europe.
The back story is that since the Government policy on open standards was put in place, the Cabinet Office was pressured into a u-turn and running another consultation by various standards bodies and other vested interests. The arguments used were either fortuitous misunderstandings of the policy or willful misinformation in favour of current business interests. The Cabinet Office then appeared to relent to the pressure and undertake a second set of consultations. What happened next shows the sorry behaviour of lobbyists in our industry.
"Software patent heavyweights piled into the first public meeting," filling the room with unrepresentative views backed up by vendors flying in senior individuals from the U.S. It appears that the chair of the roundtable was himself a paid lobbyist working on behalf of those vested interests, a fact that he forgot to mention to the Cabinet Office. Microsoft has now been "accused of trying to secretly influence government consultation."
What's surprising is that the majority of this had been uncovered by two journalists — Mark Ballard at Computer Weekly and Glyn Moody — who work mainly outside the mainstream media. In fact, the mainstream media has remained silent on the issue, with the notable exception of The Guardian.
The end result of the work of these two journalists is that the Cabinet Office has had to extend the consultation and, as noted by The Guardian, "rerun one of its discussion roundtables after it found that an independent facilitator of one of its discussions was simultaneously advising Microsoft on the consultation."
So, we have two plucky journalists who stand alone uncovering large corporations that are bullying Government to protect profits worth hundreds of millions. Our heroes' journey uncovers gerrymandering, skullduggery, rampant conflicts of interest, dubious ethics and a host of other sordid details and ... hold on, this sounds like a Hollywood script, not real life. Why on earth isn't mainstream media all over this, especially given the leaked Bell Pottinger memo on exploiting citizen initiatives?
The silence makes me wonder whether investigative journalism into things that might matter and might make a positive difference doesn't sell much advertising? Would it help if the open standards battle had celebrity endorsement? Alas, that's not the case and the battle for open standards might have been extended, but it is still ongoing. This issue is as important to the U.K. as SOPA / PIPA were to the U.S., but rather than fighting against a Government trying to do something that harms the growth of future industry, we are fighting with a Government trying to do the right thing and benefit a nation.
If you're too busy to help, that's understandable, but don't ever grumble about why the U.K. Government doesn't do more to support open standards and open source. The U.K. Government is trying to make a difference. It's trying to fight a good fight against a huge and well-funded lobby, but it needs you to turn up.
The battle for open standards needs help, so get involved.
Related:
- Promoting Open Source Software in Government: The Challenges of Motivation and Follow-Through
- Cost is only part of the Gov 2.0 open source story
- With GOV.UK, British government redefines the online government platform
- Government IT's quiet open source evolution
May 01 2012
Recombinant Research: Sage Congress plans for patient engagement
Clinical trials are the pathway for approving drug use, but they aren't good enough. That has become clear as a number of drugs (Vioxx being the most famous) have been blessed by the FDA, but disqualified after years of widespread use revealed either lack of efficacy or dangerous side effects. And the measures taken by the FDA recently to solve this embarrassing problem continue the heavyweight bureaucratic methods it has always employed: more trials, raising the costs of every drug and slowing down approval. Although I don't agree with the opinion of Avik S. A. Roy (reprinted in Forbes) that Phase III trials tend to be arbitrary, I do believe it is time to look for other ways to test drugs for safety and efficacy.
First article in the series: Recombinant Research: Sage Congress Promotes Data Sharing in Genetics.
But the Vioxx problem is just one instance of the wider malaise afflicting the drug industry. Drug companies just aren't producing enough new medications, either to solve pressing public needs or to keep up their own earnings. Vicki Seyfert-Margolis of the FDA built on her noteworthy speech at last year's Sage Congress (reported in one of my articles about the conference) with the statistic that drug companies submitted 20% fewer medications to the FDA between 2001 and 2007. Their blockbuster drugs produce far fewer profits than before as patents expire and fewer new drugs emerge (a predicament called the "patent cliff"). Seyfert-Margolis intimated that this crisis is the cause of layoffs in the industry, although I heard elsewhere that the companies are outsourcing more research, so perhaps the downsizing is just a reallocation of the same money.
Benefits of patient involvement
The field has failed to rise to the challenges posed by new complexity. Speakers at Sage Congress seemed to feel that genetic research has gone off the tracks. As the previous article in this series explained, Sage Bionetworks wants researchers to break the logjam by sharing data and code in GitHub fashion. And surprisingly, pharma is hurting enough to consider going along with an open research system. They're bleeding from a situation where as much as 80% of each clinical analysis is spent retrieving, formatting, and curating the data. Meanwhile, Kathy Giusti of the Multiple Myeloma Research Foundation says that in their work, open clinical trials are 60% faster.
Attendees at a breakout session I sat in on, including numerous managers from major pharma companies, expressed confidence that they could expand public or "pre-competitive" research in the direction Sage Congress proposed. The sector left to engage is the one that's central to all this work--the public.
If we could collect wide-ranging data from, say, 50,000 individuals (a May 2013 goal cited by John Wilbanks of Sage Bionetworks, a Kauffman Foundation Fellow), we could uncover a lot of trends that clinical trials are too narrow to turn up. Wilbanks ultimately wants millions of such data samples, and another attendee claimed that "technology will be ready by 2020 for a billion people to maintain their own molecular and longitudinal health data." And Jamie Heywood of PatientsLikeMe, in his keynote, claimed to have demonstrated through shared patient notes that some drugs were ineffective long before the FDA or manufacturers made the discoveries. He decried the current system of validating drugs for use and then failing to follow up with more studies, snorting that, "Validated means that I have ceased the process of learning."
But patients have good reasons to keep a close hold on their health data, fearing that an insurance company, an identity thief, a drug marketer, or even their own employer will find and misuse it. They already have little enough control over it, because the annoying consent forms we always have shoved in our faces when we come to a clinic give away a lot of rights. Current laws allow all kinds of funny business, as shown in the famous case of the Vermont law against data mining, which gave the Supreme Court a chance to say that marketers can do anything they damn please with your data, under the excuse that it's de-identified.
In a noteworthy poll by Sage Bionetworks, 80% of academics claimed they were comfortable sharing their personal health data with family members, but only 31% of citizen advocates would do so. If that 31% is more representative of patients and the general public, how many would open their data to strangers, even when supposedly de-identified?
The Sage Bionetworks approach to patient consent
It's basic research that loses. So Wilbanks and a team have been working for the past year on a "portable consent" procedure. This is meant to overcome the hurdle by which a patient has to be contacted and give consent anew each time a new researcher wants data related to his or her genetics, conditions, or treatment. The ideal behind portable consent is to treat the entire research community as a trusted user.
The current plan for portable consent provides three tiers:
Tier 1
No restrictions on data, so long as researchers follow the terms of service. Hopefully, millions of people will choose this tier.
Tier 2
A middle ground. Someone with asthma may state that his data can be used only by asthma researchers, for example.
Tier 3
Carefully controlled. Meant for data coming from sensitive populations, along with anything that includes genetic information.
Synapse provides a trusted identification service. If researchers find a person with useful characteristics in the last two tiers, and are not authorized automatically to use that person's data, they can contact Synapse with the random number assigned to the person. Synapse keeps the original email address of the person on file and will contact him or her to request consent.
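The three tiers and the contact step can be sketched as a small registry. Everything in this sketch is hypothetical: the class and method names are invented for illustration, and treating Tier 3 as "always ask first" is an assumption, since the plan described above only calls it carefully controlled. It is not Synapse's actual interface.

```python
import secrets

TIER_OPEN, TIER_RESTRICTED, TIER_CONTROLLED = 1, 2, 3


class ConsentRegistry:
    """Hypothetical stand-in for the trusted identification service."""

    def __init__(self):
        self._email = {}    # pseudonym -> e-mail, never shown to researchers
        self._consent = {}  # pseudonym -> (tier, allowed research fields)
        self.outbox = []    # consent requests the registry would send out

    def enroll(self, email, tier, allowed_fields=()):
        """Register a participant and return the random identifier."""
        pseudonym = secrets.token_hex(8)
        self._email[pseudonym] = email
        self._consent[pseudonym] = (tier, frozenset(allowed_fields))
        return pseudonym

    def may_use(self, pseudonym, research_field):
        """True if a researcher in this field is authorized automatically."""
        tier, allowed = self._consent[pseudonym]
        if tier == TIER_OPEN:
            return True
        if tier == TIER_RESTRICTED:
            return research_field in allowed
        return False  # assumption: Tier 3 always requires an explicit request

    def request_consent(self, pseudonym, research_field):
        """Queue a request; the researcher only ever handles the pseudonym."""
        self.outbox.append((self._email[pseudonym], research_field))


registry = ConsentRegistry()
patient = registry.enroll("pat@example.org", TIER_RESTRICTED, ["asthma"])
print(registry.may_use(patient, "asthma"))
print(registry.may_use(patient, "oncology"))
registry.request_consent(patient, "oncology")
```

The design point is that the mapping from random identifier to email address lives only inside the registry, so a researcher can ask for consent without ever learning who the person is.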
Portable consent also involves a lot of patient education. People will sign up through a software wizard that explains the risks. After choosing portable consent, the person decides how much to put in: 23andMe data, prescriptions, or whatever they choose to release.
Sharon Terry of the Genetic Alliance said that patient advocates currently try to control patient data in order to force researchers to share the work they base on that data. Portable consent loosens this control, but the field may be ready for its more flexible conditions for sharing.
Pharma companies and genetics researchers have lots to gain from access to enormous repositories of patient data. But what do the patients get from it? Leaders in health care already recognize that patients are more than experimental subjects and passive recipients of treatment. The recent ONC proposal for Stage 2 of Meaningful Use includes several requirements to share treatment data with the people being treated (which seems kind of a no-brainer when stated this baldly) and the ONC has a Consumer/Patient Engagement Power Team.
Sage Congress is fully engaged in the patient engagement movement too. One result is the BRIDGE initiative, a joint project of Sage Bionetworks and Ashoka with funding from the Robert Wood Johnson Foundation, to solicit questions and suggestions for research from patients. Researchers can go for years researching a condition without even touching on some symptom that patients care about. Listening to patients in the long run produces more cooperation and more funding.
Portable consent requires a leap of faith, because as Wilbanks admits, releasing aggregates of patient data means that over time, a patient is almost certain to be re-identified. Statistical techniques are just getting too sophisticated and compute power is growing too fast for anyone to hide behind current tricks such as using only the first three digits of a five-digit postal code. Portable consent requires the data repository to grant access only to bona fide researchers and to set terms of use, including a ban on re-identifying patients. Still, researchers will have rights to do research, redistribute data, and derive products from it. Audits will be built in.
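A toy example shows why truncating the postal code is a weak defense. The sketch below, with made-up records and hypothetical function names, groups people by a truncated postal code combined with two other ordinary attributes and reports anyone whose combination is unique in the data set. Those people are the easiest to re-identify by matching against another source.

```python
from collections import Counter


def quasi_identifier(record):
    """Combine a truncated postal code with other commonly released fields."""
    return (record["zip"][:3], record["birth_year"], record["sex"])


def unique_records(records):
    """Return records whose quasi-identifier appears only once."""
    counts = Counter(quasi_identifier(r) for r in records)
    return [r for r in records if counts[quasi_identifier(r)] == 1]


# Fabricated sample data, for illustration only.
records = [
    {"zip": "98101", "birth_year": 1970, "sex": "F"},
    {"zip": "98109", "birth_year": 1970, "sex": "F"},
    {"zip": "98115", "birth_year": 1942, "sex": "M"},
]
print(len(unique_records(records)))  # 1
```

The first two records hide behind each other, but the third is alone in its group even though only three digits of the postal code were kept. Each additional released attribute makes such singletons more common.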
But as mentioned by Kelly Edwards of the University of Washington, tools and legal contracts can contribute to trust, but trust is ultimately based on shared values. Portable consent, properly done, engages with frameworks like Synapse to create a culture of respect for data.
In fact, I think the combination of the contractual framework in portable consent and a platform like Synapse, with its terms of use, might make a big difference in protecting patient privacy. Seyfert-Margolis cited predictions that 500 million smartphone users will be using medical apps by 2015. But mobile apps are notoriously greedy for personal data and cavalier toward user rights. Suppose all those smartphone users stored their data in a repository with clear terms of use and employed portable consent to grant access to the apps? We might all be safer.
The final article in this series will evaluate the prospects for open research in genetics, with a look at the grip of journal publishing on the field, and some comparisons to the success of free and open source software.
Next: Breaking Open Rewards and Incentives. All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.
OSCON 2012. Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain. Save 20% on registration with the code RADAR20
April 30 2012
Recombinant Research: Sage Congress promotes data sharing in genetics
Given the exponential drop in the cost of personal genome sequencing (you can get a basic DNA test from 23andMe for a couple hundred dollars, and a full sequence will probably soon come down to one thousand dollars in cost), a new dawn seems to be breaking forth for biological research. Yet the assessment of genetics research at the recent Sage Congress was highly cautionary. Various speakers chided their own field for tilling the same ground over and over, ignoring the urgent needs of patients, and just plain researching the wrong things.
Sage Congress also has some plans to fix all that. These projects include tools for sharing data and storing it in cloud facilities, running challenges, injecting new fertility into collaboration projects, and ways to gather more patient data and bring patients into the planning process. Through two days of demos, keynotes, panels, and breakout sessions, Sage Congress brought its vision to a high-level cohort of 230 attendees from universities, pharmaceutical companies, government health agencies, and others who can make change in the field.
In the course of this series of articles, I'll pinpoint some of the pain points that can force researchers, pharmaceutical companies, doctors, and patients to work together better. I'll offer a look at the importance of public input, legal frameworks for cooperation, the role of standards, and a number of other topics. But we'll start by seeing what Sage Bionetworks and its pals have done over the past year.
Synapse: providing the tools for genetics collaboration
Everybody understands that change is driven by people and the culture they form around them, not by tools, but good tools can make it a heck of a lot easier to drive change. To give genetics researchers the best environment available to share their work, Sage Bionetworks created the Synapse platform.
Synapse recognizes that data sets in biological research are getting too large to share through simple data transfers. For instance, in his keynote about cancer research (where he kindly treated us to pictures of cancer victims during lunch), UC Santa Cruz professor David Haussler announced plans to store 25,000 cases at 200 gigabytes per case in the Cancer Genome Atlas, also known as TCGA in what seems to be a clever pun on the four nucleotides in DNA. Storage requirements thus work out to 5 petabytes, which Haussler wants to be expandable to 20 petabytes. In the face of big data like this, the job becomes moving the code to the data, not moving the data to the code.
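The storage figure above is easy to check, and a rough transfer estimate shows why moving the data is impractical. The case count and per-case size come from Haussler's talk as reported here; the 1 gigabit per second link speed is purely an assumption for illustration, and decimal units (1 petabyte = 1,000,000 gigabytes) are used throughout.

```python
GB_PER_PB = 1_000_000  # decimal units


def total_petabytes(cases, gb_per_case):
    """Total storage in petabytes for a number of cases of a given size."""
    return cases * gb_per_case / GB_PER_PB


def transfer_days(petabytes, gbit_per_s):
    """Days needed to copy the data over a link running at full speed."""
    total_bits = petabytes * 1e15 * 8
    seconds = total_bits / (gbit_per_s * 1e9)
    return seconds / 86_400


size_pb = total_petabytes(25_000, 200)
print(size_pb)  # 5.0
print(round(transfer_days(size_pb, 1.0)))  # 463
```

Under that assumed link speed, copying 5 petabytes would take well over a year, while the analysis code a researcher wants to run is typically a few kilobytes.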
Synapse points to data sets contributed by cooperating researchers, but also lets you pull up a console in a web browser to run R or Python code on the data. Some effort goes into tagging each data set with associated metadata: tissue type, species tested, last update, number of samples, etc. Thus, you can search across Synapse to find data sets that are pertinent to your research.
One group working with Synapse has already harmonized and normalized the data sets in TCGA so that a researcher can quickly mix and run stats on them to extract emerging patterns. The effort took about one and a half full-time employees for six months, but the project leader is confident that with the system in place, "we can activate a similar size repository in hours."
This contribution highlights an important principle behind Synapse (appropriately called "viral" by some people in the open source movement): when you have manipulated and improved upon the data you find through Synapse, you should put your work back into Synapse. This work could include cleaning up outlier data, adding metadata, and so on. To make work sharing even easier, Synapse has plans to incorporate the Amazon Simple Workflow Service (SWF). It also hopes to add web interfaces to allow non-programmers to do useful work with data.
The Synapse development effort was an impressive one, coming up with a feature-rich Beta version in a year with just four coders. And Synapse code is entirely open source. So not only is the data distributed, but the creators will be happy for research institutions to set up their own Synapse sites. This may make Synapse more appealing to geneticists who are prevented by inertia from visiting the original Synapse.
Mike Kellen, introducing Synapse, compared its potential impact to that of moving research from a world of journals to a world like GitHub, where people record and share every detail of their work and plans. Along these lines, Synapse records who has used a data set. This has many benefits:
- Researchers can meet up with others doing related work.
- It gives public interest advocates a hook with which to call on those who benefit commercially from Synapse--as we hope the pharmaceutical companies will--to contribute money or other resources.
- Members of the public can monitor accesses for suspicious uses that may be unethical.
There's plenty more work to be done to get data in good shape for sharing. Researchers must agree on some kind of metadata--the dreaded notion of ontologies came up several times--and clean up their data. They must learn about data provenance and versioning.
But sharing is critical for such basics of science as reproducing results. One source estimates that 75% of published results in genetics can't be replicated. A later article in this series will examine a new model in which enough metainformation is shared about a study for it to be reproduced and, even more important, to be a foundation for further research.
With this Beta release of Synapse, Sage Bionetworks feels it is ready for a new initiative to promote collaboration in biological research. But how do you get biologists around the world to start using Synapse? For one, try an activity that's gotten popular nowadays: a research challenge.
The Sage DREAM challenge
Sage Bionetworks' DREAM challenge asks genetics researchers to find predictors of the progression of breast cancer. The challenge uses data from 2000 women diagnosed with breast cancer, combining information on DNA alterations affecting how their genes were expressed in the tumors, clinical information about their tumor status, and their outcomes over ten years. The challenge is to build models integrating the alterations with molecular markers and clinical features to predict which women will have the most aggressive disease over a ten year period.
Several hidden aspects of the challenge make it a clever vehicle for Sage Bionetworks' values and goals. First, breast cancer is a scourge whose urgency is matched by its stubborn resistance to diagnosis. The famous 2009 recommendations of the U.S. Preventive Services Task Force, after all the controversy was aired, left us with the dismal truth that we don't know a good way to predict breast cancer. Some women get mastectomies in the total absence of symptoms based just on frightening family histories. In short, breast cancer puts the research and health care communities in a quandary.
We need finer-grained predictors to say who is likely to get breast cancer, and standard research efforts up to now have fallen short. The Sage proposal is to marshal experts in a new way that combines their strengths, asking them to publish models that show the complex interactions between gene targets and influences from the environment. Sage Bionetworks will publish data sets at regular intervals that it uses to measure the predictive ability of each model. A totally fresh data set will be used at the end to choose the winning model.
The process behind the challenge--particularly the need to upload code in order to run it on the Synapse site--automatically forces model builders to publish all their code. According to Stephen Friend, founder of Sage Bionetworks, "this brings a level of accountability, transparency, and reproducibility not previously achieved in clinical data model challenges."
Finally, the process has two more effects: it shows off the huge amount of genetic data that can be accessed through Synapse, and it encourages researchers to look at each other's models in order to boost their own efforts. In less than a month, the challenge has already received more than 100 models from 10 sources.
The reward for winning the challenge is publication in a respected journal, the gold medal still sought by academic researchers. (More on shattering this obelisk later in the series.) Science Translational Medicine will accept results of the evaluation as a stand-in for peer review, a real breakthrough for Sage Bionetworks because it validates their software-based, evidence-driven process.
Finally, the DREAM challenge promotes use of the Synapse infrastructure, and in particular the method of bringing the code to the data. Google is donating server space for the challenge, which levels the playing field for researchers, freeing them from paying for their own computing.
A single challenge doesn't solve all the problems of incentives, of course. We still need to persuade researchers to put up their code and data on a kind of genetic GitHub, persuade pharmaceutical companies to support open research, and persuade the general public to share data about their phenomes (life data) and genes--all topics for upcoming articles in the series.
Next: Sage Congress Plans for Patient Engagement. All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.
April 27 2012
Passage of CISPA in the U.S. House highlights need for viable cybersecurity legislation
To paraphrase Ben Franklin, he who sacrifices online freedom for the sake of cybersecurity deserves neither. Last night, the Cyber Intelligence Sharing and Protection Act (CISPA) (H.R. 3523) through the United States House of Representatives was sent to a vote a day earlier than scheduled. CISPA passed the House by a vote of 250-180, defying a threatened veto from the White House. The passage of CISPA now sets up a fierce debate in the Senate, where Senate Majority Leader Harry Reid (D-NV) has indicated that he wishes to bring cybersecurity legislation forward for a vote in May.
The votes on H.R. 3523 broke down along largely partisan lines, although dozens of both Democrats and Republicans voted for or against it in the final tally. CISPA was introduced last November and approved by the House Intelligence Committee by a 17-1 vote before the end of 2011, which meant that the public has had months to view and comment upon the bill. The bill has 112 cosponsors and received no significant opposition from major U.S. corporations, including the social networking giants and telecommunications companies who would be subject to its contents.
In fact, as an analysis of campaign donations by Maplight showed, over the past two years interest groups that support CISPA have outspent those that oppose it by 12 to 1. The supporters range from defense contractors and cable and satellite TV providers to software makers, cellular companies and online computer services.
While the version of CISPA that passed shifted before the final vote, ProPublica's explainer on CISPA remains a useful resource for people who wish to understand its contents. Declan McCullagh, CNET's tech policy reporter, has also been following the bill closely since it was introduced and he has published an excellent FAQ explaining how CISPA would affect you.
As TechDirt observed last night, the final version of CISPA (available as a PDF from docs.house.gov) contained a broader scope for the types of information collected in the name of security. Specifically, CISPA now would allow the federal government to use information for the purpose of investigation and prosecution of cybersecurity crimes, protection of individuals, and the protection of children. In this context, a "cybersecurity crime" would be defined as any crime that involves network disruption or "hacking."
Civil libertarians, from the Electronic Frontier Foundation (EFF) to the American Civil Liberties Union, have been fiercely resisting CISPA for months. "CISPA goes too far for little reason," said Michelle Richardson, the ACLU legislative counsel, in a statement on Thursday. "Cybersecurity does not have to mean abdication of Americans' online privacy. As we've seen repeatedly, once the government gets expansive national security authorities, there's no going back. We encourage the Senate to let this horrible bill fade into obscurity."
Today, there is widespread alarm online over the passage of CISPA, from David Gewirtz calling it heinous at ZDNet to Alexander Furnas exploring its troubling aspects to it being called a direct threat to Internet privacy over at WebProNews.
The Center for Democracy and Technology issued a statement that it was:
"... disappointed that House leadership chose to block amendments on two core issues we had long identified — the flow of information from the private sector directly to NSA and the use of that information for national security purposes unrelated to cybersecurity. Reps. Thompson, Schakowsky, and Lofgren wrote amendments to address those issues, but the leadership did not allow votes on those amendments. Such momentous issues deserved a vote of the full House. We intend to press these issues when the Senate takes up its cybersecurity legislation."
Alexander Furnas included a warning in his nuanced exploration of the bill at The Atlantic:
"CISPA supporters — a list that surprisingly includes SOPA opponent Congressman Darrell Issa — are quick to point out that the bill does not obligate disclosure of any kind. Participation is 'totally voluntary.' They are right, of course, there is no obligation for a private company to participate in CISPA information sharing. However, this misses the point. The cost of this information sharing — in terms of privacy lost and civil liberties violated — is borne by individual customers and Internet users. For them, nothing about CISPA is voluntary and for them there is no recourse. CISPA leaves the protection of peoples' privacy in the hands of companies who don't have a strong incentive to care. Sure, transparency might lead to market pressure on these companies to act in good conscience; but CISPA ensures that no such transparency exists. Without correctly aligned incentives, where control over the data being gathered and shared (or at least knowledge of that sharing) is subject to public accountability and respectful of individual right to privacy, CISPA will inevitably lead to an eco-system that tends towards disclosure and abuse."
The context that already exists around digital technology, civil rights and national security must also be acknowledged for the purposes of public debate. As the EFF's Trevor Timm emphasized earlier this week, once national security is invoked, both civilian and law enforcement agencies wield enormous powers to track and log information about citizens' lives without their knowledge or any practical ability to gain access to the records involved.
On that count, CISPA provoked significant concerns from the open government community, with the Sunlight Foundation's John Wonderlich calling the bill terrible for transparency because it proposes to limit public oversight of the work of information collection and sharing within the federal government.
"The FOIA is, in many ways, the fundamental safeguard for public oversight of government's activities," wrote Wonderlich. "CISPA dismisses it entirely, for the core activities of the newly proposed powers under the bill. If this level of disregard for public accountability exists throughout the other provisions, then CISPA is a mess. Even if it isn't, creating a whole new FOIA exemption for information that is poorly defined and doesn't even exist yet is irresponsible, and should be opposed."
What's the way forward?
The good news, for those concerned about what passage of the bill will mean for the Internet and online privacy, is that now the legislative process turns to the Senate. The open government community's triumphalism around the passage of the DATA Act and the gathering gloom and doom around CISPA all meet the same reality in this respect: checks and balances in the other chamber of Congress and a threatened veto from the White House.
Well done, founding fathers.
On the latter count, the White House has made it clear that the administration views CISPA as a huge overreach on privacy, driving a truck through existing privacy protections. The Obama administration has stated (PDF) that CISPA:
"... effectively treats domestic cybersecurity as an intelligence activity and thus, significantly departs from longstanding efforts to treat the Internet and cyberspace as civilian spheres. The Administration believes that a civilian agency — the Department of Homeland Security — must have a central role in domestic cybersecurity, including for conducting and overseeing the exchange of cybersecurity information with the private sector and with sector-specific Federal agencies."
At a news conference yesterday in Washington, the Republican leadership of the House characterized the administration's position differently. "The White House believes the government ought to control the Internet, government ought to set standards, and government ought to take care of everything that's needed for cybersecurity," said Speaker of the House John Boehner (R-Ohio), who voted for CISPA. "They're in a camp all by themselves."
Representative Mike Rogers (R-Michigan) -- the primary sponsor of the bill, along with Representative Dutch Ruppersberger (D-Maryland) -- accused opponents of "obfuscation" on the House floor yesterday.
While there are people who are not comfortable with the Department of Homeland Security (DHS) holding the keys to the nation's "cyberdefense" — particularly given the expertise and capabilities that rest in the military and intelligence communities — the prospect of military surveillance of citizens within the domestic United States is not likely to be one that the founding fathers would support, particularly without significant oversight from the Congress.
CISPA does not, however, formally grant either the National Security Agency or DHS any more powers than they already hold under existing legislation, such as the Patriot Act. It would, however, enable more information sharing between private companies and government agencies, including threat information pertinent to legitimate national security concerns.
It's crucial to recognize that cybersecurity legislation has been percolating in the Senate for years now without passage. Civilian oversight is a key issue in that wrangling, which has involved major bills ranging from Senator Lieberman's cybersecurity proposals to the ICE Act from Senator Carper to Senator McCain's proposals.
If the fight over CISPA is "just beginning", as Andy Greenberg wrote in Forbes today, it's important that everyone who is getting involved because of concerns over civil liberties or privacy recognizes that CISPA is not like SOPA, as Brian Fung wrote in the American Prospect, particularly after provisions regarding intellectual property were dropped:
"At some point, privacy groups will have to come to an agreement with Congress over Internet legislation or risk being tarred as obstructionists. That, combined with the fact that most ordinary Americans lack the means to distinguish among the vagaries of different bills, suggests that Congress is likely to win out over the objections of EFF and the ACLU sooner rather than later. Thinking of CISPA as just another SOPA not only prolongs the inevitable — it's a poor analogy that obscures more than it reveals."
That doesn't mean that those objections aren't important or necessary. It does mean, however, that anyone who wishes to join the debate must recognize that genuine security threats do exist, even though massive hype about a potential "Cyber 9/11" perpetuated by contractors that stand to benefit from spending continues to pervade the media. There are legitimate concerns regarding the theft of industrial secrets, "crimesourcing" by organized crime and the reality of digital agents from the Chinese, Iranian and Russian governments — along with non-state actors — exploring the IT infrastructure of the United States.
The simple reality is that in Washington, national security trumps everything. It's not like intellectual property or energy or education or healthcare. What anyone who wishes to get involved in this debate will need to do is to support an affirmative vision for what roles the federal government and the private sector should play in securing the nation's critical infrastructure against electronic attacks. And the relationship of business and government complicates cybersecurity quite a bit, as "Inside Cyber Warfare" author Jeffrey Carr explained here at Radar in February:
"Due to the dependence of the U.S. government upon private contractors, the insecurity of one impacts the security of the other. The fact is that there are an unlimited number of ways that an attacker can compromise a person, organization or government agency due to the interdependencies and connectedness that exist between both."
The good news today is that increased awareness of the issue will drive more public debate about what's to be done. During the week the Web changed Washington in January, the world saw how the Internet can act as a platform for collective action against a bill.
Civil liberties groups have vowed to continue advocating against the passage of any vaguely drafted bill in the Senate.
On Monday, more than 60 distinguished IT security professionals, academics and engineers published an open letter to Congress urging opposition to any "'cybersecurity' initiative that does not explicitly include appropriate methods to ensure the protection of users’ civil liberties."
The open question now, as with intellectual property, is whether major broadcast and print media outlets in the United States will take their role of educating citizens seriously enough for the nation to meaningfully participate in legislative action.
This is a debate that will balance the freedoms that the nation has fought hard to achieve and defend throughout its history against the dangers we collectively face in a century when digital technologies have become interwoven into the everyday lives of citizens. We live in a networked age, with new attendant risks and rewards.
Citizens should hold their legislators accountable for supporting bills that balance civil liberties, public oversight and privacy protections with improvements to how the public and private sectors monitor, mitigate and share information about network security threats in the 21st century.
April 18 2012
What responsibilities and challenges come with open government?
A historic Open Government Partnership launched in New York City last September with 8 founding countries. Months later representatives from 73 countries and 55 governments have come together to present their open government action plans and formally endorse the principles in the Open Government Partnership. Yesterday, hundreds of attendees from government, civil society, media and the private sector watched in person and online as Brazilian President Dilma Rousseff spoke about her country's efforts to root out corruption and engage the Brazilian people in governance and more active citizenship. United States Secretary of State Hillary Clinton preceded her, defining an open or closed society as a key dividing line of the 21st century.
Today's agenda includes more regional breakouts and an opening plenary session on the "Responsibility and Challenges that Come with Openness."
The plenary will feature Walid al-Saqaf of YemenPortal.net & Alkasir, Minister Francis Maude from the United Kingdom, Tunisian Secretary of State Ben Abbes, and Fernando Rodrigues, an investigative journalist from Folha de São Paulo in Brazil.
April 10 2012
Open source is interoperable with smarter government at the CFPB
When you look at the government IT landscape of 2012, federal CIOs are being asked to address a lot of needs. They have to accomplish their mission. They need to be able to scale initiatives to tens of thousands of agency workers. They're under pressure to address not just network security but web security and mobile device security. They also need to be innovative, because all of this is supported by the same or less funding. These are common requirements in every agency.
As the first federal "start-up agency" in a generation, some of those needs at the Consumer Financial Protection Bureau (CFPB) are even more pressing. On the other hand, the opportunity for the agency to be smarter, leaner and "open from the beginning" is also immense.
Progress in establishing the agency's infrastructure and culture over the first 16 months has been promising, save for the larger context of getting a director at the helm. Enabling open government by design isn't just a catchphrase at the CFPB. There has been a bold vision behind the CFPB from the outset, where a 21st century regulator would leverage new technologies to find problems in the economy before the next great financial crisis escalates.
In the private sector, there's great interest right now in finding actionable insight in large volumes of data. Making sense of big data is increasingly being viewed as a strategic imperative in the public sector as well. Recently, the White House put its stamp on that reality with a $200 million big data research and development initiative, including a focus on improving the available tools. There's now an entire ecosystem of software around Hadoop, which is itself open source code. The problem that now exists in many organizations, across the public and private sector, is not so much that the technology to manipulate big data isn't available: it's that the expertise to apply big data doesn't exist in-house. The data science talent shortage is real.
People who work and play in the open source community understand the importance of sharing code, especially when that action leads to improving the code base. That's not necessarily an ethic or a perspective that has been pervasive across the federal government. That does seem to be slowly changing, with leadership from the top: the White House used Drupal for its site and has since contributed modules back into the open source community, including one that helps with 508 compliance.
In an in-person interview last week, CFPB CIO Chris Willey (@ChrisWilleyDC) and acting deputy CIO Matthew Burton (@MatthewBurton) sat down to talk about the agency's new open source policy, government IT, security, programming in-house, the myths around code-sharing, and big data.
The fact that this government IT leadership team is strongly supportive of sharing code back to the open source community is probably the most interesting part of this policy, as Scott Merrill picked up in his post on the CFPB and Github.
Our interview follows.
In addition to being the leader of the CFPB's development team over the past year and a half, Burton was just named acting deputy chief information officer. What will that mean?
Willey: He hasn't been leading the software development team the whole time. In fact, we only really had an org chart as of October. In the time that he's been here, Matt has led his team to some amazing things. We're going to talk about one of them today, but we've also got a great intranet. We've got some great internal apps that are being built and that we've built. We've unleashed one version of the supervision system that helps bank examiners do their work in the field. We've got a lot of faith he's going to do great things.
What it actually means is that he's going to be backing me up as CIO. Even though we're a fairly small organization, we have an awful lot going on. We have 76 active IT projects, for example. We're just building a team. We're actually doubling in size this fiscal year, from about 35 staff to 70, as well as adding lots of contractors. We're just growing the whole pie. We've got 800 people on board now. We're going to have 1,100 on board in the whole bureau by the end of the fiscal year. There's a lot happening, and I recognize we need to have some additional hands and brain cells helping me out.
With respect to building an internal IT team, what's the thinking behind having technical talent inside of an agency like this one? What does that change, in terms of your relationship with technology and your capacity to work?
Burton: I think it's all about experimentation. Having technical people on staff allows an organization to do new things. I think the way most agencies work is that when they have a technical need, they don't have the technical people on staff to make it happen so instead, that need becomes larger and larger until it justifies the contract. And by then, the problem is very difficult to solve.
By having developers and designers in-house, we can constantly be addressing things as they come up. In some cases, before the businesses even know it's a problem. By doing that, we're constantly staying ahead of the curve instead of always reacting to problems that we're facing.
How do you use open source technology to accomplish your mission? What are the tools you're using now?
Willey: We're actually trying to use open source in every aspect of what we do. It's not just in software development, although that's been a big focus for us. We're trying to do it on the infrastructure side as well.
As we look at network and system monitoring, we look at the tools that help us manage the infrastructure. As I've mentioned in the past, we are 100% in the cloud today. Open source has been a big help for us in giving us the ability to manipulate those infrastructures that we have out there.
At the end of the day, we want to bring in the tools that make the most sense for the business needs. It's not about only selecting open source or having necessarily a preference for open source.
What we've seen is that over time, the open source marketplace has matured. A lot of tools that might not have been ready for prime time a year ago or two years ago are today. By bringing them into the fold, we potentially save money. We potentially have systems that we can extend. We could more easily integrate with the other things that we have inside the shop that maybe we built or maybe things that we've acquired through other means. Open source gives us a lot of flexibility because there's a lot of opportunities to do things that we might not be able to do with some proprietary software.
Can you share a couple of specific examples of open source tools that you're using and what you actually use them for within mission?
Willey: On network monitoring, for example, we're using ZFS, which is an open source monitoring tool. We've been working with Nagios as well. Nagios, we actually inherited from Treasury — and while Treasury's not necessarily known for its use of open source technologies, it uses that internally for network monitoring. Splunk is another one that we have been using for web analysis. [After the interview, Burton and Willey also shared that they built the CFPB's intranet on MediaWiki, the software that drives Wikipedia.]
Burton: On the development side, we've invested a lot in Django and WordPress. Our site is a hybrid of them. It's WordPress at its core, with Django on top of that.
In November of 2010, it was actually a few weeks before I started here, Merici [Vinton] called me and said, "Matt, what should we use for our website?"
And I said, "Well, what's it going to do?"
And she said, "At first, it's going to be a blog with a few pages."
And this website needed to be up and running by February. And there was no hosting; there was nothing. There were no developers.
So I said, "Use WordPress."
And by early February, we had our website up. I'm not sure that would have been possible if we had to go through a lengthy procurement process for something not open source.
We use a lot of jQuery. We use Linux servers. For development ops, we use Selenium and Jenkins and Git to manage our releases and source code. We actually have GitHub Enterprise, which although not open source, is very sharing-focused. It encourages sharing internally. And we're using GitHub on the public side to share our code. It's great to have the same interface internally as we're using externally.
Developers and citizens alike can go to github.com/cfpb and see code that you've released back to the public and for other federal agencies. What projects are there?
Burton: These are the ones that came up between basic building blocks. They range from code that may not strike an outside developer as that interesting but that's really useful for the government, all the way to things that we created from scratch that are very developer-focused and are going to be very useful for any developer.
On the first side of that spectrum, there's an app that we made for transit subsidy enrollment. Treasury used to manage our transit subsidy balances. That involved going to a webpage that you would print out, write into with a pen and then fax to someone.
Willey: Or scan and email it.
Burton: Right. And then once you'd had your supervisor sign it, faxed it over to someone, eventually, several weeks later, you would get your benefits. We started to take over that process and the human resources office came to us and asked, "How can we do this better?"
Obviously, that should just be a web form that you type into, that will auto fill any detail it knows about you. You press submit and it goes into the database, which goes directly to the DOT [Department of Transportation]. So that's what we made. We demoed that for DOT and they really like it. USAID is also into it. It's encouraging to see that something really simple could prove really useful for other agencies.
On the other side of the spectrum, we use a lot of Django tools. As an example, we have a tool we just released through our website called "Ask CFPB." It's a Django-based question and answer tool, with a series of questions and answers.
Now, the content is managed in Django. All of the content is managed from our staging server behind the firewall. When we need to get that content, we need to get the update from staging over to production.
Before, what we had to do was pick up the entire database, copy it and then move it over to production, which was kind of a nightmare. And there was no Django tool for selectively moving data modifications.
So we sat there and we thought, "Oh, we really need something to do that because we're going to be doing a lot of that. We can't be copying the database over every time we need to correct a copy." So two of our developers developed a Django app called "Nudge." Basically, if you've ever seen a Django admin, you just go into it and it says, "Hey, here's everything that's changed. What do you want to move over?"
You can pick and choose what you want to move over and, with the click of a button, it goes to production. I think that's something that every Django developer will have a use for if they have a staging server.
In a way, we were sort of surprised it didn't exist. So, we needed it. We built it. Now we're giving it back and anybody in the world can use it.
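The workflow Burton describes (list everything that differs between staging and production, then push only the items someone selects) can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Nudge's code or API: the `Change` class, the `pending_changes` and `push_selected` functions, and the dictionary stand-ins for the two databases are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One difference between staging and production (hypothetical type)."""
    key: str
    old: Optional[str]  # value in production, None if the record is absent there
    new: Optional[str]  # value in staging, None if the record was deleted there


def pending_changes(staging: Dict[str, str], production: Dict[str, str]) -> List[Change]:
    """List every record whose staging value differs from production."""
    changes = []
    for key in sorted(staging.keys() | production.keys()):
        old = production.get(key)
        new = staging.get(key)
        if old != new:
            changes.append(Change(key, old, new))
    return changes


def push_selected(selected: List[Change], production: Dict[str, str]) -> None:
    """Apply only the chosen changes to production, leaving the rest alone."""
    for change in selected:
        if change.new is None:
            production.pop(change.key, None)  # record was deleted on staging
        else:
            production[change.key] = change.new


if __name__ == "__main__":
    # Dictionaries stand in for the staging and production databases.
    staging = {"q1": "Corrected answer", "q2": "Unchanged", "q3": "Draft question"}
    production = {"q1": "Answer with a typo", "q2": "Unchanged"}

    changes = pending_changes(staging, production)
    for change in changes:
        print(change)

    # Move the corrected answer over, but hold back the draft question.
    push_selected([c for c in changes if c.key == "q1"], production)
    print(production)
```

The point of the design is that production is only ever touched through an explicit selection, so a single corrected answer can go live without copying the whole database or publishing unfinished drafts.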
You mentioned the cloud. I know that CFPB is very associated with Treasury. Are you using Treasury's FISMA moderate cloud?
Willey: We have a mix of what I would say are private and public clouds. On the public side, we're using our own cloud environments that we have established. On the private side, we are using Treasury for some of our apps. We're slowly migrating off of Treasury systems onto our own cloud infrastructure.
In the case of email, for example, we're looking at email as a service. So we'll be looking at Google, Microsoft and others just to see what's out there and what we might be able to use.
Why is it important for the CFPB to share code back to the public? And who else in the federal government has done something like this, aside from the folks at the White House?
Burton: We see it the same way that we believe the rest of the open source community sees it: The only way this stuff is going to get better and become more viable is if people share. Without that, then it'll only be hobbyists. It'll only be people who build their own little personal thing. Maybe it's great. Maybe it's not. Open source gets better by the community actually contributing to it. So it's self-interest in a lot of ways. If the tools get better, then what we have available to us, therefore, gets better. We can actually do our mission better.
Using the transit subsidy enrollment application example, it's also an opportunity for government to help itself, for one agency to help another. We've created this thing. Every federal agency has a transit subsidy program. They all need to allow people to enroll in it. Therefore, it's immediately useful to any other agency in the federal government. That's just a matter of government improving its own processes.
If one group does it, why should another group have to figure it out or have to pay lots of money to have it figured out? Why not just share it internally and then everybody benefits?
Why do you think it's taken until 2012 to have that insight actually be made into reality in terms of a policy?
Burton: I think to some degree, the tools have changed. The ability to actually do this easily is a lot better now than it was even a year or two ago. Government also traditionally lags behind the private sector in a lot of ways. I think that's changing, too. With this administration in particular, I think what we've seen is that government has started to become a little bit on parity with the private sector, including some of the thinking around how to use technology to improve business processes. That's really exciting. And I think as a result, there are a lot of great people coming in as developers and designers who want to work in the federal government because they see that change.
Willey: It's also because we're new. There are two things behind that. First, we're able to sort of craft a technology philosophy with a modern perspective. So we can, from our founding, ask "What is the right way to do this?" Other agencies, if they want to do this, have to turn around decades of culture. We don't have that burden. I think that's a big reason why we're able to do this.
The second thing is a lot of agencies don't have the intense need that we do. We have 76 projects to do. We have to use every means available to us.
We can't say, "We're not going to use a large share of the software that's available to us." That's just not an option. We have to say, "Yes, we will consider this as a commercial good, just like any other piece of proprietary software."
In terms of the broader context for technology and policy, how does open source relate to open government?
Willey: When I was working for the District, Apps for Democracy was a big contest that we did around opening data and then asking developers to write applications using that data that could then be used by anybody. We said that the next logical step was to sort of create more participatory government. And in my mind, open sourcing the projects that we do is a way of asking the citizenry to participate in the act of government.
So by putting something in the public space, somebody could pick that up. Maybe not the transit subsidy enrollment project — but maybe some other project that we've put out there that's useful outside of government as well as inside of government. Somebody can pick that code up, contribute to it and then we benefit. In that way, the public is helping us make government better.
When you have conversations around open source in government, what do you say about what it means to put your code online and to have people look at it or work on it? Can you take changes that people make to the code base to improve it and then use it yourself?
Willey: Everything that we put out there will be reviewed by our security team. The goal is that, by the time it's out there, not to have any security vulnerabilities. If someone does discover a security vulnerability, however, we'll be sharing that code in a way that makes it much more likely that someone will point it out to us and maybe even provide a fix than they will exploit it because it's out there. They wouldn't be exploiting our instance of the code; they would be working with the code on Github.com.
I've seen people in government with a misperception of what open source means. They hear that it's code that anyone can contribute to. I think that they don't understand that you're controlling your own instance of it. They think that anyone can come along and just write anything into your code that they like. And, of course, it's not like that.
I think as we talk more and more about this to other agencies, we might run into that, but I think it'll be good to have strong advocates in government, especially on the security side, who can say, "No, that's not the case; it doesn't work that way."
Burton: We have a firewall between our public and private instances of Git as well. So even if somebody contributes code, that's also reviewed on the way in. We wouldn't implement it unless we made sure that, from a security perspective, the code was not malicious. We're taking those precautions as well.
I can't point to one specifically, but I know that there have been articles and studies done on the relative security of open source. I think the consensus in the industry is that the peer review process of open source actually helps from a security perspective. It's not that you have a chaos of people contributing code whenever they want to. It improves the process. It's like the thinking behind academic papers. You do peer review because it enhances the quality of the work. I think that's true for open source as well.
We actually want to create a community of peer reviewers of code within the federal government. As we talk to agencies, we want people to actually use the stuff we build. We want them to contribute to it. We actually want them to be a community. As each agency contributes things, the other agencies can actually review that code and help each other from that perspective as well.
It's actually fairly hard. As we build more projects, it's going to put a little bit of a strain on our IT security team, doing an extra level of scrutiny to make sure that the code going out is safe. But the only way to get there is to grow that pie. And I think by talking with other agencies, we'll be able to do that.
A classic open source koan is that "with many eyes, all bugs become shallow." In IT security, is it that with many eyes, all worms become shallow?
Burton: What the Department of Defense said was if someone has malicious intent and the code isn't available, they'll have some way of getting the code. But if it is available and everyone has access to it, then any vulnerabilities that are there are much more likely to be corrected before they're exploited.
How do you see open source contributing to your ability to get insights from large amounts of data? If you're recruiting developers, can they actually make a difference in helping their fellow citizens?
Burton: It's all about recruiting. As we go out and we bring on data people and software developers, we're looking for that kind of expertise. We're looking for people that have worked with PostgreSQL. We're looking for people that have worked with Solr. We're looking for people that have worked with Hadoop, because then we can start to build that expertise in-house. Those tools are out there.
R is an interesting example. What we're finding is that as more people are coming out of academia into the professional world, they're actually used to using R in school. And then they have to come out and learn a different tool when they're actually working in the marketplace.
It's similar with the Mac versus the PC. You get people using the Mac in college — and suddenly they have to go to a Windows interface. Why impose that on them? If they're going to be extremely productive with a tool like R, why not allow that to be used?
We're starting to see, in some pockets of the bureau, push from the business side to actually use some of these tools, which is great. That's another change I think that's happened in the last couple of years.
Before, there would've been big resistance on that kind of thing. Now that we're getting pushed a little bit, we have to respond to that. We also think it's worth it that we do.
Related:
- The Consumer Financial Protection Bureau shares code built for the people with the people
- Government IT's quiet open source evolution
- Cost is only part of the Gov 2.0 open source story
- Promoting Open Source Software in Government: The Challenges of Motivation and Follow-Through
Carsharing saves U.S. city governments millions in operating costs
One of the most dynamic sectors of the sharing economy is the trend in large cities toward more collaborative consumption — and the entrepreneurs have followed, from Airbnb to Getable to Freecycle. Whether it's co-working, bike sharing, exchanging books and videos, or cohabiting hackerspaces and community garden spaces, there are green shoots throughout the economy that suggest the way we work, play and learn is changing due to the impact of connection technologies and the Great Recession.
This isn't just about the classic dilemma of "buy vs. rent." It's about whether people or organizations can pool limited resources to more efficiently access tools or services as needed and then pass them back into a commons, if appropriate.
Speaking to TechCrunch last year, Lauren Anderson floated the idea that a collaborative consumption revolution might be as "significant as the Industrial Revolution." We'll see about that. The new sharing economy is clearly a powerful force, as a recent report (PDF) by Latitude Research and Shareable Magazine highlighted, but it's not clear yet if it's going to transform society and production in the same way that industrialized mass production did in the 19th and 20th centuries.

Infographic from "The New Sharing Economy" study. Read the report (PDF) and see a larger version of this image.
Carsharing is saving
What is clear is that, after years of spreading through the private sector, collaborative consumption is coming to government, and it's making a difference. A specific example: Carsharing via Zipcar in city car fleets is saving money and enabling government to increase its efficacy and decrease its use of natural resources.
After finally making inroads into cities, Zipcar is saving taxpayers real money in the public sector. Technology developed by the car-sharing startup is being used in 10 cities and municipalities in 2012. If data from a pilot with the United States General Services Administration fleet pans out, the technology could also be adopted across the sprawling federal agency's vehicles, saving tens of millions of dollars of operating expenses through smarter use of new technology.
"Now the politics are past, the data are there," said Michael Serafino, general manager for Zipcar's university and FastFleet programs, in a phone interview. "Collaborative consumption isn't so different from other technology. We're all used to networked laser printers. The car is just a tool to do business. People are starting to come around to the idea that it can be shared."
As with many other city needs, vehicle fleet management in the public sector shares commonalities across all cities. In every case, municipal governments need to find a way to use the vehicles that the city owns more efficiently to save scarce funds.
The FastFleet product has been around for a little more than three years, said Serafino. Zipcar started it in beta and then took a "methodical approach" to rolling it out.
FastFleet uses the same mechanism that's used throughout thousands of cars in the Zipcar fleet: a magnetized smartcard paired with a card reader in the windshield that can communicate with a central web-based reservation system.
There's a one-time setup charge to get a car wired for the system and then a per-month charge for the FastFleet service. The cost of that installation varies, predicated upon the make of vehicles, type of vehicles and tech that goes into them. Zipcar earns its revenue in a model quite similar to cloud computing and software-as-a-service, where operational costs are billed based upon usage.
Currently, Washington, D.C., Chicago, Santa Cruz, Calif., Boston, New York and Wilmington, Del. are all using FastFleet to add carsharing capabilities to their fleets, with more cities on the way. (Zipcar's representative declined to identify which municipalities are next.)
Boston's pilot cut its fleet in half
"Lots of cities have departments where someone occasionally needs a car," said Matthew Mayrl, chief of staff in the Boston Public Works department, during a phone interview.
"They buy one and then use it semi-frequently, maybe one to two times per week. But they do need it, so they can't give up the car. That means it's not being used for highest utilization."
The utilization issue is the key pain point, in terms of both efficiency and cost. Depending on the make and model, it generally costs between $3,000 and $7,000 on average for a municipality to operate a vehicle, said Serafino. "Utilization is about 30% in most municipal fleets," he said.
That's where collaborative consumption became relevant to Boston. Mayrl said Boston's Public Works Department talked to Zipcar representatives with two goals in mind: get out of a manual reservation system and reduce the number of cars the city uses, which would reduce costs in the process. "Our public works was, for a long time, administered by a city motor pool," Mayrl said. "It was pretty old school: stop by, get keys, borrow a car."
While Boston did decide to join up with Zipcar, public sector workers aren't using actual Zipcars. The city has licensed Zipcar's FastFleet technology and is adding it to the existing fleet.
One benefit to using just the tech is that it can be integrated with cars that are already branded with the "City of Boston," pointed out Mayrl. That's crucial when the assessing office is visiting a household, he said: In that context, it's important to be identified.
Boston started a pilot in February that was rolled out to existing users of public works vehicles, along with two pilots in assessing and the Department of Motor Vehicles. The program started by taking the oldest cars off the road and training the relevant potential drivers. Using carsharing, the city of Boston was able to reduce the number of vehicles in the pilot by over 50%.
"Previously, there were 28 cars between DPW [the Public Works department] and those elsewhere in the department," said Mayrl. "That's been cut in half. Now we have 12 to 14 cars without any missed reservations. This holds a lot of promise, only a month in. We don't have to worry about maintenance or whether someone is parked in the wrong place or cleaning snow off a car. We hope that if this is successful, we can roll it out to other departments."
The District's fleet gets leaner
While a 50% reduction in fleet size looks like significant cost savings, Serafino said that a 2:1 ratio is actually a conservative number.
"We strive for 3:1," Serafino said. "The one thing we have is data. We capture and gather data from every single use of every single vehicle by every single driver, at a very granular level, including whenever a driver gets in and out. That allows a city to measure real utilization and efficiency. Using those numbers, officials can drive policy and other things. You can take effective utilization and real utilization and say, 'we're taking away these four cars from this area.' You can use hard data gathered by the system to make financial and efficiency decisions."
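To make the idea of measuring "real utilization" from per-trip data concrete, here is a minimal sketch in Python. The trip log, the vehicle IDs and the eight-hour working day are all hypothetical, and the article does not say how FastFleet itself defines or computes utilization; this only illustrates one plausible definition (hours checked out divided by hours available).

```python
from datetime import datetime

# Hypothetical trip log: (vehicle_id, checkout_time, return_time).
# None of these records come from the article.
TRIPS = [
    ("car-1", datetime(2012, 3, 5, 9, 0), datetime(2012, 3, 5, 11, 0)),
    ("car-1", datetime(2012, 3, 5, 13, 0), datetime(2012, 3, 5, 15, 0)),
    ("car-2", datetime(2012, 3, 5, 10, 0), datetime(2012, 3, 5, 11, 0)),
]


def utilization(trips, available_hours):
    """Return {vehicle_id: fraction of available hours the vehicle was checked out}."""
    hours_used = {}
    for vehicle_id, start, end in trips:
        hours = (end - start).total_seconds() / 3600
        hours_used[vehicle_id] = hours_used.get(vehicle_id, 0.0) + hours
    return {v: h / available_hours for v, h in hours_used.items()}


# Assumption: each vehicle is available for an 8-hour working day.
rates = utilization(TRIPS, available_hours=8)
for vehicle_id, rate in sorted(rates.items()):
    print(f"{vehicle_id}: {rate:.1%}")
```

With numbers like these in hand, a fleet manager could compare each vehicle against a target rate and decide which cars to consolidate.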
Based upon the results to date, Serafino said he expects Washington, DC, to triple its investment in the system. "The original pilot was started by a mandated reduction by [former DC Mayor Adrian] Fenty, who said 'make this goal,' and 'get it done by this date.' Overall, DC went from 365 to 80 vehicles by consolidating and cooperating."
Serafino estimated the reduction represents about 50% of the opportunity for DC to save money. "The leader of the DC Department of Public Works wants to do more," he said. "The final plans are to get to a couple of hundred vehicles under management, resulting in another reduction by at least 200 cars." Serafino estimated potential net cost savings would be north of $1 million per year.
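As a rough plausibility check on that figure, the sketch below multiplies the number of vehicles removed by the $3,000 to $7,000 operating cost range Serafino quoted earlier. It assumes the range is an annual per-vehicle cost (the article does not state the period) and it ignores FastFleet setup and monthly fees, which the article does not give, so the result is a gross range rather than Serafino's net estimate.

```python
# Operating cost range per vehicle quoted in the article (assumed to be per year).
COST_LOW = 3_000
COST_HIGH = 7_000


def gross_savings_range(vehicles_removed):
    """Return (low, high) gross savings in dollars for removing vehicles from a fleet."""
    return vehicles_removed * COST_LOW, vehicles_removed * COST_HIGH


# DC's planned further reduction of at least 200 cars.
dc_low, dc_high = gross_savings_range(200)
print(f"DC: ${dc_low:,} to ${dc_high:,}")

# Boston's pilot, going from 28 cars to roughly 14.
boston_low, boston_high = gross_savings_range(28 - 14)
print(f"Boston pilot: ${boston_low:,} to ${boston_high:,}")
```

Under these assumptions, a 200-car reduction lands between $600,000 and $1.4 million, which brackets the "north of $1 million" estimate.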
There is a floor, however, for how lean a city's car fleet can become — and a ceiling for optimal utilization as well.
"The more you reduce, the harder it gets," said Serafino. "DC may have gone too far, by going down to 80 [vehicles]. It has hurt mobility." If you cut into fat deep enough, in other words, eventually you hit muscle and bone.
"DC is passing 70% utilization on a per-day basis," said Serafino. "They have three to four people using each of the cars every day. The trip profile, in the government sense, is different from other customers. We don't expect to go over 80%. There is a point where you can get too lean. DC has kind of gotten there now."
In Boston, Mayrl said they did a financial analysis of how to reduce costs from their car fleet. "It was cheaper to better manage the cars we have than to buy new ones. Technology helps us do that. [Carsharing] had already been done in a couple of other cities. Chicago does it. The city of DC does it. We went to a competitive bid for an online vehicle fleet management software system. [Zipcar] was the only respondent."
Given that FastFleet has been around for more than three years and there's a strong business case for employing the technology, the rate of adoption by American cities might seem to be a little slow to outside observers. What would be missing from that analysis are the barriers to entry for startups that want to compete in the market for government services.
"What hit us was the sales cycle," said Zipcar's Serafino. "The average is about 18 months to two years on city deals. That's why they're all popping now, with more announcements to come soon."
The problem, Serafino mused, was not making the case for potential cost savings. "Cities will only act as sensitive as politics will allow," said Serafino.
"Boston, San Francisco, New York and Chicago are trying. The problem is the automotive and vehicle culture," Serafino said. "That, combined with the financial aspects of decentralized budgeting for fleets, is the bane of fleet managers. Most automotive fleet managers in cities don't control their own destinies. Chicago is one of the very few cities where they can control the entire fleet."
Cities do have other options to use technology to manage their car fleets, from telematics providers to GPS devices to web-based reservation systems, each of which may be comparatively less expensive to buy off the shelf.
One place that Zipcar will continue to face competition at the local level is from companies that provide key vending machines, which are essentially automated devices on garage walls.
"You go get a key and go to a car," said Serafino. "If you have 20 cars in one location, it's not as likely to make sense to choose our system. If you have 50 cars in three locations, that's a different context. You can't just pick up a keybox and move it."
Collaborative consumption goes federal?
Zipcar is continuing along the long on-ramp to working with government. The next step for the company may be to help Uncle Sam with the federal government's car fleet.
As noted previously, the U.S. General Services Administration (GSA) has already done a collaborative consumption pilot using part of its immense vehicle fleet. Serafino says the GSA is now using that data to prepare a broader procurement action for a request for proposals.
The scale for potential cost savings is significant: The GSA manages some 210,000 vehicles, including a small but growing number of electric vehicles.
Given congressional pressure to find cost savings in the federal budget, if the GSA can increase the utilization of its fleet in a way that's even vaguely comparable to the savings that cities are finding, collaborative consumption could become quite popular in Congress.
If carsharing at the federal level succeeded similarly well at scale, members of Congress and staff who became familiar with collaborative consumption through the wildly popular Capital Bikeshare program may well see the sharing economy in a new light.
"There's a broader international trend to work to share resources more efficiently, from energy to physical infrastructure," said Mayrl. "Like every good city, we're copying the successful stuff elsewhere."
Related:
- It's a time-sharing world
- Why businesses need to embrace sharing and open systems
- Measuring the economic impact of the Sharing Economy
- Lisa Gansky, "The Mesh: Why the Future of Business is Sharing"
- A future of cities fueled by citizens, open data and collaborative consumption
April 09 2012
The Consumer Financial Protection Bureau shares code built for the people with the people
Editor's Note: This guest post is written by Matthew Burton, the acting deputy chief information officer of the Consumer Financial Protection Bureau (@CFPB). The quiet evolution in government IT has been a long road, with many forks. In the original version of this piece, published on the CFPB's blog, Burton needed to take the time to explain what open source software is because many people in government and citizens in the country still don't understand it, unlike readers here at Radar. That's why the post below includes a short section outlining the basics of open source. — Alex Howard.
The Consumer Financial Protection Bureau (CFPB) was fortunate to be born in the digital era. We've been able to rethink many of the practices that make financial products confusing to consumers and certain regulations burdensome for businesses. We've also been able to launch the CFPB with a state-of-the-art technical infrastructure that's more stable and more cost-effective than an equivalent system was just 10 years ago.
Many of the things we're doing are new to government, which has made them difficult to achieve. But the hard part lies ahead. While our current technology is great, those of us on the CFPB's Technology & Innovation team will have failed if we're still using the same tools 10 years from now. Our goal is not to tie the Bureau to 2012's technology, but to create something that stays modern and relevant — no matter the year.
Good internal technology policies can help, especially the policy that governs our use of software source code. We are unveiling that policy today.
Source code is the set of instructions that tells software how to work. This is distinct from data, which is the content that a user inputs into the software. Unlike data, most users never see software source code; it works behind the scenes while the users interact with their data through a more intuitive, human-friendly interface.
Some software lets users modify its source code, so that they can tweak the code to achieve their own goals if the software doesn't specifically do what users want. Source code that can be freely modified and redistributed is known as "open-source software," and it has been instrumental to the CFPB's innovation efforts for a few reasons:
- It is usually very easy to acquire, as there are no ongoing licensing fees. Just pay once, and the product is yours.
- It keeps our data open. If we decide one day to move our website to another platform, we don't have to worry about whether the current platform is going to keep us from exporting all of our data. (Only some proprietary software keeps its data open, but all open source software does so.)
- It lets us use tailor-made tools without having to build those tools from scratch. This lets us do things that nobody else has ever done, and do them quickly.
Until recently, the federal government was hesitant to adopt open-source software due to a perceived ambiguity around its legal status as a commercial good. In 2009, however, the Department of Defense made it clear that open source software products are on equal footing with their proprietary counterparts.
We agree, and the first section of our source code policy is unequivocal: We use open-source software, and we do so because it helps us fulfill our mission.
Open-source software works because it enables people from around the world to share their contributions with each other. The CFPB has benefited tremendously from other people's efforts, so it's only right that we give back to the community by sharing our work with others.
This brings us to the second part of our policy: When we build our own software or contract with a third party to build it for us, we will share the code with the public at no charge. Exceptions will be made when source code exposes sensitive details that would put the Bureau at risk for security breaches; but we believe that, in general, hiding source code does not make the software safer.
We're sharing our code for a few reasons:
- First, it is the right thing to do: the Bureau will use public dollars to create the source code, so the public should have access to that creation.
- Second, it gives the public a window into how a government agency conducts its business. Our job is to protect consumers and to regulate financial institutions, and every citizen deserves to know exactly how we perform those missions.
- Third, code sharing makes our products better. By letting the development community propose modifications, our software will become more stable, more secure, and more powerful with less time and expense from our team. Sharing our code positions us to maintain a technological pace that would otherwise be impossible for a government agency.
The CFPB is serious about building great technology. This policy will not necessarily make that an easy job, but it will make the goal achievable.
Our policy is available in three formats: HTML, for easy access; PDF, for good presentation; and as a GitHub Gist, which will make it easy for other organizations to adopt a similar policy and will allow the public to easily track any revisions we make to the policy.
If you're a coder, keep an eye on our GitHub account. We'll be releasing code for a few projects in the coming weeks.
Related:
- Government IT's quiet open source evolution
- Cost is only part of the Gov 2.0 open source story
- Promoting Open Source Software in Government: The Challenges of Motivation and Follow-Through
April 05 2012
Steep climb for National Cancer Institute toward open source collaboration
Although a lot of government agencies produce open source software, hardly any develop relationships with a community of outside programmers, testers, and other contributors. I recently spoke to John Speakman of the National Cancer Institute to learn about their crowdsourcing initiative and the barriers they've encountered.
First let's orient ourselves a bit--forgive me for dumping out a lot of abbreviations and organizational affiliations here. The NCI is part of the National Institutes of Health. Speakman is the Chief Program Officer for NCI's Center for Biomedical Informatics and Information Technology. Their major open source software initiative is the Cancer Biomedical Informatics Grid (caBIG), which supports tools for transferring and manipulating cancer research data. For example, it provides access to data classifying the carcinogenic aspects of genes (The Cancer Genome Atlas) and resources to help researchers ask questions of and visualize this data (the Cancer Molecular Analysis Portal).
Plenty of outside researchers use caBIG software, but it's a one-way street, somewhat in the way the Department of Veterans Affairs used to release its VistA software. NCI sees the advantages of a give-and-take such as the CONNECT project has achieved, through assiduous cultivation of interested outside contributors, and wants to wean its outside users away from the dependent relationship that has been all take and no give. And even the VA decided last year that a more collaborative arrangement for VistA would benefit them, thus putting the software under the guidance of an independent non-profit, the Open Source Electronic Health Record Agent (OSEHRA).
Another model is Forge.mil, which the Department of Defense set up with the help of CollabNet, the well-known organization in charge of the Subversion revision control tool. Forge.mil represents a collaboration between the DoD and private contractors, encouraging them to create shared libraries that hopefully increase each contractor's productivity, but it is not open source.
The OSEHRA model--creating an independent, non-government custodian--seems a robust solution, although it takes a lot of effort and risks failure if the organization can't create a community around the project. (Communities don't just spring into being at the snap of a bureaucrat's fingers, as many corporations have found to their regret.) In the case of CONNECT, the independent Alembic Foundation stepped in to fill the gap after a lawsuit stalled CONNECT's development within the government. According to Alembic co-founder David Riley, with the contract issues resolved, CONNECT's original sponsor--the Office of the National Coordinator--is spinning off CONNECT to a private sector, open source entity, and work is underway to merge the two baselines.
Whether an agency manages its own project or spins off management, it has to invest a lot of work to turn an internal project into one that appeals to outside developers. This burden has been discovered by many private corporations as well as public entities. Tasks include:
- Setting up public repositories for code and data.
- Creating a clean software package with good version control that makes downloading and uploading simple.
- Possibly adding an API to encourage third-party plugins, an effort that may require a good deal of refactoring and a definition of clear interfaces.
- Substantially adding to the documentation.
- General purging of internal code and data (sometimes even passwords!) that get in the way of general use.
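The last task, purging passwords and other internal data before release, can be partly automated. The following is a minimal sketch in Python of a pre-release scan for lines that look like hard-coded credentials. The pattern, the function names and the sample text are illustrative assumptions, not anything the NCI is described as using, and a simple pattern match like this will miss many secrets and flag some harmless lines.

```python
import re
from pathlib import Path

# Illustrative pattern: a credential-like name followed by an assigned value.
SECRET_PATTERN = re.compile(
    r"(password|passwd|secret|api[_-]?key)\s*[:=]\s*\S+", re.IGNORECASE
)


def find_suspect_lines(text):
    """Return (line_number, stripped_line) pairs that look like hard-coded secrets."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SECRET_PATTERN.search(line)
    ]


def scan_tree(root):
    """Scan every readable text file under root; return {path: suspect lines}."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        hits = find_suspect_lines(text)
        if hits:
            findings[str(path)] = hits
    return findings


# Hypothetical configuration snippet used only to demonstrate the scan.
sample = 'db_host = "localhost"\ndb_password = "hunter2"\n'
print(find_suspect_lines(sample))
```

Calling `scan_tree(".")` on a working copy before the first public push would list files that deserve a manual look; the repository history needs separate attention, since a deleted password can still live in old commits.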
Companies and institutions have also learned that "build it and they will come" doesn't usually work. An open source or open data initiative must be promoted vigorously, usually with challenges and competitions such as the Department of Health and Human Services offers in its annual Health Data Initiative forums (a.k.a. datapaloozas).
With these considerations in mind, the NCI decided in the summer of 2011 to start looking for guidance and potential collaborators. Here, laws designed long ago to combat cronyism put up barriers. The NCI was not allowed to contact anyone it wanted out of the blue. Instead, it had to issue a Request for Information and talk to people who responded. Although the RFI went online, it obviously wasn't widely seen. After all, do you regularly look for RFIs and RFPs from government agencies? If so, I can safely guess that you're paid by a large company or lobbying agency to follow a particular area of interest.
RFIs and RFPs are released as a gesture toward transparency, but in reality they just make it easier for the usual crowd of established contractors and lobbyists to build on the relationships they already have with agencies. And true to form, the NCI received only a limited set of responses and was frustrated in its attempts to talk to new actors with the expertise it needed for its open source efforts.
And because the RFI had to allow a limited time window for responses, there is no point in responding to it now.
Still, Speakman and his colleagues are educating themselves and meeting with stakeholders. Cancer research is a hot topic drawing zealous attention from many academic and commercial entities, and they're hungry for data. Already, the NCI is encouraged by the initial positive response from the cancer informatics community, many of whom are eager to see the caBIG software deposited in an open repository like GitHub right away. Luckily, HHS has already negotiated terms of service with GitHub and SourceForge, removing at least one important barrier to entry. The NCI is packaging its first tool (a laboratory information management system called caLIMS) for deposit into a public repository. So I'm hoping the NCI is too caBIG to fail.
April 01 2012
What is smart disclosure?
Citizens generate an enormous amount of economically valuable data through interactions with companies and government. Earlier this year, a report from the World Economic Forum and McKinsey Consulting described the emergence of personal data as "a new asset class." The value created from such data does not, however, always go to the benefit of consumers, particularly when third parties collect it, separating people from their personal data.
The emergence of new technologies and government policies has provided an opportunity to both empower consumers and create new markets from "smarter disclosure" of this personal data. Smart disclosure is when a private company or government agency provides a person with periodic access to his or her own data in open formats that enable them to easily put the data to use. Specifically, smart disclosure refers to the timely release of data in standardized, machine readable formats in ways that enable consumers to make better decisions about finance, healthcare, energy or other contexts.
Smart disclosure is "a new tool that helps provide consumers with greater access to the information they need to make informed choices," wrote Cass Sunstein, the U.S. administrator of the White House Office of Information and Regulatory Affairs (OIRA), in a post on smart disclosure on the White House blog. Sunstein delivered a keynote address at the White House Summit on smart disclosure at the U.S. National Archives on Friday. He authored a memorandum from OIRA providing guidance on smart disclosure in September 2011.
Smart disclosure is part of the final United States National Action Plan for its participation in the Open Government Partnership. Speaking at the launch of the Open Government Partnership in New York City last September, the president specifically referred to the role of smart disclosure in the United States:
"We’ve developed new tools -- called 'smart disclosures' -- so that the data we make public can help people make health care choices, help small businesses innovate, and help scientists achieve new breakthroughs," said President Obama. "We’ve been promoting greater disclosure of government information, empowering citizens with new ways to participate in their democracy," he continued. "We are releasing more data in usable forms on health and safety and the environment, because information is power, and helping people make informed decisions and entrepreneurs turn data into new products, they create new jobs."
In the months since the announcement, the U.S. National Science and Technology Council established a smart disclosure task force dedicated to promoting better policies and implementation across government.
"In many contexts, the federal government uses disclosure as a way to ensure that consumers know what they are purchasing and are able to compare alternatives," wrote Sunstein at the White House blog. "Consider nutrition facts labels, the newly designed automobile fuel economy labels, and ChooseMyPlate.gov. Modern technologies are giving rise to a series of new possibilities for promoting informed decisions."
Smart disclosure is a "case of the Administration asking agencies to focus on making available high value data (as distinct from traditional transparency and accountability data) for purposes other than decreasing corruption in government," wrote New York Law School professor Beth Noveck, the former U.S. deputy chief technology officer for open government, in an email. "It starts from the premise that consumers, when given access to information and useful decision tools built by third parties using that information, can self-regulate and stand on a more level playing field with companies who otherwise seek to obfuscate." The choice of Todd Park as United States CTO also sends a message about the importance of smart disclosure to the administration, she said.
The United Kingdom's “midata” smart disclosure initiative is an important smart disclosure case study outside of the United States. Progress there has come in large part because the UK, unlike the United States, has a privacy law that gives citizens the right to access their personal data held by private companies. In the UK, however, companies have been complying with the law in a way that did not realize the real potential value of that right to data, which is to say that a citizen could request personal data and it would arrive in the mail weeks later at a cost of a few dozen pounds. The UK government has launched a voluntary public-private partnership to enable companies to comply with the law by making the data available online in open formats. The recent introduction of the Consumer Privacy Bill of Rights from the White House and Privacy Report from the FTC suggests that such rights to personal data ownership might be negotiated, in principle, much as a right to credit reports has been in the past.
Four categories of smart disclosure
One of the most powerful versions of smart disclosure is when data on products or services (including pricing algorithms, quality, and features) is combined with personal data (like customer usage history, credit score, health, energy and education data) into "choice engines" (like search engines, interactive maps or mobile applications) that enable consumers to make better decisions in context, at the point of a buying or contractual decision. There are four broad categories where smart disclosure applies:
- When government releases data about products or services. For instance, when the Department of Health and Human Services releases hospital quality ratings, the Securities and Exchange Commission releases public company financial filings in machine-readable formats at XBRL.SEC.gov, or the Department of Education puts data about more than 7,000 institutions online in a College Navigator for prospective students.
- When government releases personal data about a citizen. For instance, when the Department of Veterans Affairs gives veterans access to health records using the "Blue Button" or the IRS provides citizens with online access to their electronic tax transcript. The work of BrightScope liberating financial advisor data and 401(k) data has been an early signal of how data drives the innovation economy.
- When a private company releases information about products or services in machine readable formats. Entrepreneurs can then use that data to empower consumers. For instance, both Billshrink.com and Hello Wallet may enhance consumer finance decisions.
- When a private company releases personal data about usage to a citizen. For instance, when a power utility company provides a household access to its energy usage data through the Green Button or when banks allow customers to download their transaction histories in a machine readable format to use at Mint.com or similar services. As with the Blue Button for healthcare data and consumer finance, the White House asserts that providing energy consumers with secure access to information about energy usage will increase innovation in the sector and empower citizens with more information.
An expanding colorwheel of buttons
Should smart disclosure initiatives continue to gather steam, citizens could see “Blue Button”-like and "Green Button"-like solutions for every kind of data government or industry collects about citizens. For example, the Department of Defense has military training and experience records. Social Security and the Internal Revenue Service have the financial history of citizens, such as earnings and income. The Department of Veterans Affairs and Centers for Medicare and Medicaid Services have personal health records.
More "Green Button"-like mechanisms could enable secure, private access to the data that private industry collects about citizen services. The latter could include mobile phone bills, credit card fees, mortgage disclosures, mutual fund fees and more, except where there are legal restrictions, as for national security reasons.
Earlier this year, influential venture capitalist Fred Wilson encouraged entrepreneurs and VCs to get behind open data. Writing on his widely read blog, Wilson urged developers to adopt the Green Button.
"This is the kind of innovation that gets me excited," Wilson wrote. "The Green Button is like OAuth for energy data. It is a simple standard that the utilities can implement on one side and web/mobile developers can implement on the other side. And the result is a ton of information sharing about energy consumption and in all likelihood energy savings that result from more informed consumers."
When citizens gain access to data and put it to work, they can tap it to make better choices about everything from finance to healthcare to real estate, much in the same way that Web applications like Hipmunk and Zillow let consumers make more informed decisions.
"I'm a big fan of simplicity and open standards to unleash a lot of innovation," wrote Wilson. "APIs and open data aren't always simple concepts for end users. Green Buttons and Blue Buttons are pretty simple concepts that most consumers will understand. I'm hoping we soon see Yellow Buttons, Red Buttons, Purple Buttons, and Orange Buttons too. Let's get behind these open data initiatives. Let's build them into our apps. And let's pressure our hospitals, utilities, and other institutions to support them."
The next generation of open data is personal data, wrote open government analyst David Eaves this month:
I would love to see the blue button and green button initiative spread to companies and jurisdictions outside the United States. There is no reason why for example there cannot be Blue Buttons on the Provincial Health Care website in Canada, or the UK. Nor is there any reason why provincial energy corporations like BC Hydro or Bullfrog Energy (there's a progressive company that would get this) couldn't implement the Green Button. Doing so would enable Canadian software developers to create applications that could use this data and help citizens and tap into the US market. Conversely, Canadian citizens could tap into applications created in the US.
The opportunity here is huge. Not only could this revolutionize citizens' access to their own health and energy consumption data, it would reduce the costs of sharing health care records, which in turn could potentially create savings for the industry at large.
Data drives consumer finance innovation
Despite recent headlines about the Green Button and the household energy data market, the biggest US smart disclosure story of this type is currently consumer finance, where there is already significant private sector activity.
For instance, a consumer who visits Billshrink.com can get personalized recommendations for a cheaper cell phone plan based on his or her calling history. Mint.com will make specific recommendations on how to save (and alternative products to use) based on an analysis of the accounts it is pulling data from. Hello Wallet is enabled by smart disclosure by banks and government data. The sector's success hints at the innovation that's possible when people get open, portable access to their personal data in a consumer market of sufficient size and value to attract entrepreneurial activity.
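To make the idea concrete, here is a minimal sketch of the kind of comparison a "choice engine" might run when it combines product data (plan prices) with personal data (a customer's usage history). The plan names, prices, and the Plan, monthly_cost and cheapest_plan helpers are all hypothetical and are not drawn from Billshrink or any real service; actual engines weigh far more factors than minutes used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A hypothetical cell phone plan; every field is an illustrative assumption."""
    name: str
    monthly_fee: float
    included_minutes: int
    overage_per_minute: float


def monthly_cost(plan, minutes_used):
    """Cost of one month on a plan: the flat fee plus any overage charges."""
    extra_minutes = max(0, minutes_used - plan.included_minutes)
    return plan.monthly_fee + extra_minutes * plan.overage_per_minute


def cheapest_plan(plans, usage_history):
    """Return the plan with the lowest average monthly cost for this usage history."""
    if not usage_history:
        raise ValueError("usage_history must contain at least one month")

    def average_cost(plan):
        total = sum(monthly_cost(plan, minutes) for minutes in usage_history)
        return total / len(usage_history)

    return min(plans, key=average_cost)


# Product data: made-up plans that a carrier might publish in machine-readable form.
plans = [
    Plan("Basic", 30.0, 300, 0.25),
    Plan("Plus", 50.0, 700, 0.10),
    Plan("Unlimited", 80.0, 10**9, 0.0),
]

# Personal data: made-up minutes used per month, as a customer might download them.
usage = [650, 720, 580]

best = cheapest_plan(plans, usage)
print(best.name)
```

The point of the sketch is only that neither data set is useful alone: the recommendation depends on having both the published prices and the individual's own history in a form software can read.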
Such innovation is enabled in part because entrepreneurs and developers can go directly to data aggregation intermediaries like Yodlee or CashEdge and license the data, meaning that they do not have to strike deals directly with each of the private companies or build their own screen scraping technology, although some do go it alone.
"How do people actually make decisions? How can data help improve those decisions in complex markets? Research questions like these in behavioral economics are priorities for both the Russell Sage Foundation and the Alfred P. Sloan Foundation," said Daniel Goroff, a Sloan Program Director, in an interview yesterday. "That's why we are launching a 'Smart Disclosure Research and Demonstration Design Competition.' If you have ideas and want to win a prize, please send Innocentive.com a short essay. Even if you are not in a position to carry out the work, we are especially interested in finding and funding projects that can help measure the costs and benefits of existing or novel 'choice engines.'"
What is the future of smart disclosure?
This kind of vibrant innovation could spread to many other sectors, like energy, health, education, telecommunication, food and nutrition, if relevant data were liberated. The Green Button is an early signal in this area, with the potential to spread to 27 million households around the United States. The Blue Button, with over 800,000 current users, is spreading to private health plans like Aetna and Walgreens, with the potential to spread to 21 million users.
Despite an increasing number of powerful tools that enable data journalists and scientists to interrogate data, many of even the most literate consumers do not look at data themselves, particularly if it is in machine-readable, as opposed to human-readable, formats. Instead, they digest it from ratings agencies, consumer reports and guides to the best services or products in a given area. Increasingly, entrepreneurs are combining data with applications, algorithms and improved user interfaces to provide consumers with "choice engines."
As Tim O'Reilly outlined in his keynote speech yesterday, the future of smart disclosure includes more than quarterly data disclosure from the SEC or banks. If you're really lining up with the future, you have to think about real-time data and real-time data systems, he said. Tim outlined 10 key lessons in his presentation, an annotated version of which is embedded below.
When released through smart disclosure, data resembles a classic "public good" in a broader economic sense. Disclosures of such open data in a useful format are currently under-produced by the marketplace, suggesting a potential role for government in the facilitation of its release. Generally, consumers do not have access to it today.
Well over a century ago, President Lincoln said that "the legitimate object of government is to do for the people what needs to be done, but which they cannot by individual effort do at all, or do so well, for themselves." The thesis behind smart disclosure in the 21st century is that when consumers have access to that personal data and the market creates new tools to put it to work, citizens will be empowered to make economic, education and lifestyle choices that enable them to live healthier, wealthier, and -- in the most aspirational sense -- happier lives.
"Moving the government into the 21st century should be applauded," wrote Richard Thaler, an economics professor at the University of Chicago, in the New York Times last year. In a time when so many citizens are struggling with economic woes, unemployment and the high costs of energy, education and healthcare, better tools that help them invest in and benefit from personal data are sorely needed.
March 27 2012
FTC calls on Congress to enact baseline privacy legislation and more transparency of data brokers
Over a century ago, Supreme Court Justice Louis Brandeis "could not have imagined phones that keep track of where we are going, search engines that predict what we're thinking, advertisers that monitor what we're reading, and data brokers who maintain dossiers of every who, what, where, when and how of our lives," said Federal Trade Commission Chairman Jon Leibowitz yesterday morning in Washington, announcing the release of the final version of its framework on consumer privacy.
"But he knew that, when technology changes dramatically, consumers need privacy protections that update just as quickly. So we issue our report today to ensure that, online and off, the right to privacy, that 'right most valued by civilized men,' remains relevant and robust to Americans in the 21st century as it was nearly 100 years ago."
What, exactly, privacy means in this digital age is still being defined all around us, reflected in the increasing number of small screens, cameras and explosion of data. The FTC's final report, "Protecting Consumer Privacy in an Era of Rapid Change: Recommendations For Businesses and Policymakers," makes a strong recommendation to Congress to draft and pass a strong consumer privacy law that provides rules of the road for the various entities that have the responsibility for protecting sensitive data.
The final report clearly enumerates the same three basic principles that the draft of the FTC's privacy framework outlined for companies:
- Privacy by design, where privacy is "built in" at every stage that an application, service or product is developed
- Simplified choice, wherein consumers are empowered to make informed decisions by clear information about how their data will be used at a relevant "time and context," including a "Do Not Track" mechanism, and businesses are freed of the burden of providing unnecessary choices
- Greater transparency, where the collection and use of consumer data is made more clear to those who own it.
"We are demanding more and better protections for consumer privacy not because industry is ignoring the issue," said Leibowitz today. "In fact, the best companies already follow the privacy principles we lay out in the report. In the last year, online advertisers, major browser companies, and the W3C -- an Internet standard setting group -- have all made strides towards putting into place the foundation of a Do Not Track system, and we commit to continue working with them until all consumers can easily and effectively choose not to be tracked. I'm optimistic that we'll get the job done by the end of the year."
According to the FTC, the nation's top consumer watchdog received over 450 comments on the draft online privacy report that it released in December 2010. In response to "technological advances" and comments, the FTC revised the privacy framework in several areas. (For a broad overview of the final FTC privacy framework, read Dan Rowinski's overview at ReadWriteWeb and the Information Law Group's summary of the commission report on consumer privacy).
First, it will not apply to "companies that collect and do not transfer only non-sensitive data from fewer than 5,000 consumers a year," an exemption meant to spare small businesses the burden of compliance. Second, the FTC has brought action against Google and Facebook since the draft report was issued. Those actions -- and the agreements reached -- provide a model and guidance for other companies.
Third, the FTC made specific recommendations to companies that offer mobile services that include improved privacy protections and disclosures that are short, clear and effective on small screens. Fourth, the report also outlined "heightened privacy concerns" about large platform providers, such as ISPs, "operating systems, browsers and social media companies," seeking to "comprehensively track consumers' online activities." When asked about "social plug-ins" from such a platform, chairman Leibowitz provided Facebook's "Like" button as an example. (Google's +1 button is presumably another such mechanism.)
Finally, the final report also included a specific recommendation with respect to "data brokers," which chairman Leibowitz described as "cyberazzi" on Monday, echoing remarks at the National Press Club in November 2011. Over at Forbes, Kashmir Hill reports that the FTC officially defined data brokers as those who "collect and traffic in the data we leave behind when we travel through virtual and brick-and-mortar spaces."
During the press conference, chairman Leibowitz said that American citizens should be able to see what information is held about them and "have the right to correct inaccurate data," much as they do with credit reports. Specifically, the FTC has called on data brokers to "make their operations more transparent by creating a centralized website to identify themselves, and to disclose how they collect and use consumer data. In addition, the website should detail the choices that data brokers provide consumers about their own information."
While the majority of the tech media's stories about the FTC today focused on "Do Not Track" prospects and mechanisms, or the privacy framework's impact on mobile, apps and social media, the reality of this historic moment is that it's the world's data brokers that currently hold immense amounts of information regarding just about everyone "on the grid," even if they never "Like" something on Facebook, turn on a smartphone or buy and use an app.
In other words, even though the FTC's recommendations for privacy by design led TechMeme yesterday, that wasn't new news. CNET's Declan McCullagh, one of the closest observers of Washington tech policy in the media, picked up on the focus, writing that the FTC stops short of calling for a new DNT law but "asks Congress to enact a new law that 'would provide consumers with access to information about them held by a data broker' such as Lexis Nexis, US Search, or Reed Elsevier subsidiary Choicepoint -- many of which have been the subject of FTC enforcement actions in the last few years." As McCullagh reported, the American Civil Liberties Union "applauded" the FTC's focus on data brokers.
They should. As Ryan Singel pointed out at Wired, the FTC's report does "call for federal legislation that would force transparency on giant data collection companies like Choicepoint and Lexis Nexis. Few Americans know about those companies’ databases but they are used by law enforcement, employers and landlords."
Would we, as Hill wondered, be less freaked out if we could see what data brokers have on us? It's a good question, and one that could be answered should the industry coalesce around providing consumers access to their personal data in that context, just as utilities are beginning to do with energy data.
Another year without privacy legislation?
Whether it's "baseline privacy protections" or more transparency for data brokers, the FTC is looking to Congress to act. Whether it will or not is another matter. While the online privacy debate was just about as hot in Washington nearly two years ago as it is today, no significant laws were passed. The probability of significant consumer privacy legislation advancing in this session of Congress currently appears quite low. While at least four major privacy bills have been introduced in the U.S. House and Senate, "none of that legislation is likely to make it into law in this Congressional session, however, given the heavy schedule of pending matters and re-election campaigns," wrote Tanzina Vega and Edward Wyatt in the New York Times.
The push the FTC gave yesterday was welcomed in some quarters. "We look forward to working with the FTC toward legislation and further developing the issues presented in the report," said Leslie Harris, president of the Center for Democracy and Technology (CDT), in a prepared release. CDT also endorsed the FTC's guidance on "Do Not Track" and focus on large platform providers. Earlier this winter, a coalition of Internet giants, including Google, Yahoo, Microsoft, and AOL, committed to adopt “Do Not Track technology” in most Web browsers by the end of 2012. These companies, which deliver almost 90 percent of online behavioral advertisements, have agreed not to track consumers if they choose to opt out of online tracking using the Do Not Track mechanism, which will likely manifest as a button or browser plug-in. All companies that have made this commitment will be subject to FTC enforcement.
By way of contrast, Jim Harper, the Cato Institute's director of information policy studies, called the framework a "groundhog report on privacy," describing it as "regulatory cheerleading of the same kind our government’s all-purpose trade regulator put out a dozen years ago." In May of 2000, wrote Harper, "the FTC issued a report finding “that legislation is necessary to ensure further implementation of fair information practices online” and recommending a framework for such legislation. Congress did not act on that, and things are humming along today without top-down regulation of information practices on the Internet."
Overall, the "industry here has a self-interest beyond avoiding legislation," said Leibowitz during today's press conference. Consumers have very serious concerns about privacy, he went on, alluding to polling data, surveys and conversations, and "better, clearer privacy policies" will lead to people having "more trust in doing business online."
This FTC privacy framework and the White House's consumer privacy bill of rights will, at minimum, inform the debates going forward. What happens next will depend upon Congress finding a way to protect privacy and industry innovation. It will be a difficult balance to strike, particularly given concerns about protecting children online and the continued march of data breaches around the country.
Making technology more accessible
I interviewed Princeton professor Ed Felten, the FTC's chief technologist and co-author of "Government Data and the Invisible Hand" (2009), after yesterday's FTC press conference at FTC headquarters in D.C. In December 2010, we spoke about the FTC's 'Do Not Track' proposal, after the release of the draft report.
Felten launched "Tech at the FTC" last Friday morning, a new blog that he hopes will play a number of different roles in the discussion of technology, government and society.
"It will combine Freedom to Tinker posts," he said, "some of which were op-ed, some more like teaching. The latter is what I'm looking for: explanations of sophisticated technical information that cross over to a non-technical audience."
Felten wants to start a conversation that's "interesting to general public" and "draws them into the discussion" about the intersection of regulation and technology. One aspect of that will be a connected Twitter account, @TechFTC, along with his established social identity, @EdFelten.
Possible future topics will include security issues around passwords and authentication of people in digital environments, both of which Felten finds interesting as they relate to policy. He said that he expects to write about technology stories that are in the news, with the intent of helping citizens to understand at an accessible level what the takeaway is for them.
Social media and the Internet are "useful to give people a window into the way people in government are thinking about these issues," said Felten. "They let people see that people in government are thinking about technology in a sophisticated way. It's easy to fall into the trap where people in government don't know about technology. That's part of the goal: speak to the technical community in their language.
"Part of my job is to be an ambassador to the technology community, through speaking to and with the public," said Felten. "The blog will help people know how to talk to the FTC and who to talk to, if they want to. People think that we don't want to talk to them. Just emailing us, just calling us, is usually the best way to get a conversation started. You usually don't need a formal process to do this -- and those conversations are really valuable."
In that context, he plans to write more posts like the one that went live Monday morning, on tech highlights of the FTC privacy report, in which he highlighted four sections of the framework that the computer science professor thought would be of interest to techies:
- De-identified data (pp. 18-22): Data that is truly de-identified (or anonymous) can’t be used to infer anything about an individual person or device, so it doesn’t raise privacy concerns. Of course, it’s not enough just to say that data is anonymous, or that it falls outside some narrow notion of PII. But beyond that, figuring out whether your dataset is really de-identified can be challenging. If you’re going to claim that data is de-identified, you need to have a good reason -- the report calls it a “reasonable level of justified confidence” -- for claiming that the data does not allow inferences about individuals. What “reasonable” means -- how confident you have to be -- depends on how much data there is, and what the consequences of a breach would be. But here’s a good rule of thumb: if you plan to use a dataset to personalize or target content to individual consumers, it’s probably not de-identified.
- Sensitive data (pp. 47-48): Certain types of information, such as health and financial information, information about children, and individual geolocation, are sensitive and ought to be treated with special care, for example by getting explicit consent from users before collecting it. If your service is targeted toward sensitive data, perhaps because of its subject matter or target audience, then you should take extra care to provide transparency and choice and to limit collection and use of information. If you run a general-purpose site that incidentally collects a little bit of sensitive information, your responsibilities will be more limited.
- Mobile disclosures (pp. 33-34): The FTC is concerned that too few mobile apps disclose their privacy practices. Companies often say that users accept their data practices in exchange for getting a service. But how can users accept your practices if you don’t say what they are? A better disclosure would tell users not only what data you’re collecting, but also how you are going to use it and with whom you’ll share it. The challenging part is how to make all of this clear to users without subjecting them to a long privacy policy that they probably won’t have time to read. FTC staff will be holding a workshop to discuss these issues.
- Do Not Track (pp. 52-55): DNT gives users a choice about whether to be tracked by third parties as they move across the web. In this section of the report, the FTC reiterates its five criteria for a successful DNT system, reviews the status of major efforts including the ad industry’s self-regulatory program and the W3C’s work toward a standard for DNT, and talks about what steps remain to get to a system that is practical for consumers and companies alike.
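For developers wondering what honoring the last of those items might look like in practice, here is a minimal sketch. It assumes the browser signals the user's choice with the HTTP request header "DNT: 1", as in the W3C's draft work on a DNT standard; the wants_no_tracking and build_response helpers and the tracker snippet are hypothetical. The FTC report does not prescribe any implementation, and what counts as "tracking" is a policy question this code does not settle.

```python
def wants_no_tracking(headers):
    """Return True when the request headers include `DNT: 1`."""
    # HTTP header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"


def build_response(headers, page_html, tracker_snippet):
    """Append the (hypothetical) third-party tracking snippet only if DNT is not set."""
    if wants_no_tracking(headers):
        return page_html
    return page_html + tracker_snippet


# Example: the same page served to two visitors with different preferences.
page = "<p>Hello</p>"
tracker = "<script src='tracker.js'></script>"  # placeholder, not a real script

print(build_response({"DNT": "1"}, page, tracker))
print(build_response({"User-Agent": "example"}, page, tracker))
```

A request with no DNT header at all is treated here as "no preference expressed," which is a design choice for the sketch rather than anything the report requires.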
When asked what developers and startup founders should be thinking about with respect to the FTC's privacy framework, Felten emphasized those three basic principles -- privacy by design, simplified choice, greater transparency -- and then offered some common sense:
"Start with the basic question of 'what Section 5 means for you,'" he suggested. "If you make a promise to consumers in your privacy policy, consumers are entitled to rely on that. The FTC has brought cases against companies that made them and didn't hold up their responsibility around privacy. You have a responsibility to protect consumer data. If not, you may find yourself on the wrong side of the FTC Act if there's a breach and it harms consumers."
March 26 2012
Five tough lessons I had to learn about health care
Working in the health care space has forced me to give up many hopes and expectations that I had a few years ago. Forgive me for being cynical (it's an easy feeling to have following the country's largest health IT conference, as I reported a month ago), and indeed some positive trends do step in to shore up hope. I'll go over the redeeming factors after listing the five tough lessons.
1. The health care field will not adopt a Silicon Valley mentality
Wild, willful, ego-driven experimentation--a zeal for throwing money after intriguing ideas with minimal business plans--has seemed to work for the computer field, and much of the world is trying to adopt a "California optimism." A lot of venture capitalists and technology fans deem this attitude the way to redeem health care from its morass of expensive solutions that don't lead to cures. But it won't happen, at least not the way they paint it.
Health care is one of the most regulated fields in public life, and we want it that way. From the moment we walk into a health facility, we expect the staff to be following rigorous policies to avoid infections. (They don't, but we expect them to.) And not just anybody can set up a shield outside the door and call themselves a doctor. In the nineteenth century it was easier, but we don't consider that a golden age of medicine.
Instead, doctors go through some of the longest and most demanding training that exists in the world today. And even after they're licensed, they have to regularly sign up for continuing education to keep practicing. Other fields in medicine are similar. The whole industry is constrained by endless requirements that make sure the insiders remain in their seats and no "disruptive technologies" raise surprises. Just ask a legal expert about the complex mesh of Federal and state regulations that a health care provider has to navigate to protect patient privacy--and you do want your medical records to be private, don't you?--before you rave about the Silicon Valley mentality. Also read the O'Reilly book by Fred Trotter and David Uhlman about the health care system as it really is.
Nor can patients change treatments with the ease of closing down a Facebook account. Once a patient has established a trust relationship with a doctor and obtained a treatment plan, he or she won't say, "I think I'll go down the road to another center that charges $100 less for this procedure." And indeed, health reform doesn't prosper from breaking down treatments into individual chunks. Progress lies in the opposite direction: the redemptive potential of long-term relationships.
2. Regulations can't force change
I am very impressed with the HITECH act (a product of the American Recovery and Reinvestment Act, more than the Affordable Care Act) that set modern health reform in motion, as well as the efforts of the Department of Health and Human Services to push institutions forward. But change in health care, like education, boils down to the interaction in a room between a professional and a client. Just as lesson plans and tests can't ensure that a teacher inspires a child to learn, regulations can't keep a doctor from ordering an unnecessary test to placate an anxious patient.
We can offer clinical decision support to suggest what has worked for other patients, but we can't keep a patient from asking for an expensive procedure that has a 10% chance of making him better (and a 20% chance of making him worse), nor can we make the moral decision about what treatment to pursue, for the patient or the doctor. Each patient is different, anyway. No one wants to be a statistic.
3. The insurance companies are not the locus of cost and treatment problems
Health insurers are a favorite target of hatred by Americans, exemplified by Michael Moore's 2007 movie Sicko and more surprisingly in the 1997 romantic comedy As Good as It Gets, where I saw an audience applaud as Helen Hunt delivered a rant against health maintenance organizations. A lot of activists, looking at other countries, declare that our problems would be solved (well, would improve a lot) if we got private insurers out of the picture.
Sure, there's a lot of waste in the current insurance system, which deliberately stretches out the task of payment and makes it take up the days of full-time staff in each doctor's office. But that's not the cause of the main problems in either costs or treatment failures. The problems lie with the beloved treatment staff. We can respect their hard work and the lives they save, but we don't have to respect them for releasing patients from hospitals without adequate follow-up, or for ordering unnecessary radiation that creates harm for patients, or for the preventable errors that still (after years of publicity) kill 90,000 to 100,000 patients a year.
4. Doctors don't want to be care managers
The premise of health reform is to integrate patients into a larger plan for managing a population. A doctor is supposed to manage a case load and keep his or her pipeline full while not spending too much. The thrust of various remuneration schemes, old and new, that go beyond fee for service (capitation, global payment systems) is to reward a doctor for handling patients of a particular type (for instance, elderly people with hypertension) at a particular cost. But doctors aren't trained for this. They want to fix the immediate, presenting complaint and send the patient home until they're needed again. Some think longitudinally, and diligently try to treat the whole person rather than a symptom. But managing their treatment options as a finite resource is just not in their skill set.
The United Kingdom--host of one of the world's great national care systems--is about to launch a bold new program where doctors have to do case management. The doctors are rebelling. If this is the future of medicine, we'll have to find new medical personnel to do it.
5. Patients don't want to be care managers
Now that the medical field has responded superbly to acute health problems, we are left with long-term problems that require lifestyle and environmental changes. The patient is even more important than the doctor in these modern ills. But the patients who cost the most and need to make the most far-ranging changes are demonstrating an immunity to good advice. They didn't get emphysema or Type 2 diabetes by acting healthily in the first place, and they aren't about to climb out of their condition voluntarily either.
You know what the problem with chronic disease is? Its worst effects are not likely to show up early in life when lifestyle change could make the most difference. (Serious pain can come quickly from some chronic illnesses, such as asthma and Crohn's disease, but these are also hard to fix through lifestyle changes, if by "lifestyle change" you mean breathing clean air.) The changes a patient would have to make to prevent smoking-related lung disease or obesity-related problems would require a piercing re-evaluation of his course of life, which few can do. And incidentally, they are neither motivated nor trained to store their own personal health records.
Hope for the future
Despite the disappointments I've undergone in learning about health care, I expect the system to change for the better. It has to, because the public just won't tolerate more precipitous price hikes and sub-standard care.
There's a paucity of citations in my five lessons because they tend not to be laid out bluntly in research or opinion pieces; for the most part, they emerged gradually over many hallway conversations I had. Each of the five lessons contains a "not," indicating that they attack common myths. Myths (in the traditional sense) in fact are very useful constructs, because they organize the understanding of the world that societies have trouble articulating in other ways. We can realize that myths are historically inaccurate while finding positive steps forward in them.
The Silicon Valley mentality will have some effect through new devices and mobile phone apps that promote healthy activity. They can help with everything from basic compliance--remembering to take prescribed meds--to promoting fitness crazes and keeping disabled people in their homes. Lectures given once a year in the doctor's office don't lead to deep personal change, but having a helper nearby (even a digital one) can impel a person to act better, hour by hour and day by day. This has been proven by psychologists over and over: motivation is best delivered in small, regular doses (a theme found in my posting from HIMSS).
Because the most needy patients are often the most recalcitrant ones, personal responsibility has to intersect with professional guidance. A doctor has to work with the patient, and other staff can shore up good habits as well. This requires the doctors' electronic record systems to accept patient data, such as weight and mood. Projects such as Indivo X support these enhancements, which traditional electronic record systems are ill-prepared for.
Although doctors eschew case management, there are plenty of other professionals who can help them with it, and forming Accountable Care Organizations gives the treatment staff access to such help. Tons of potential savings lie in the data that clinicians could collect and aggregate. Still more data is being loaded by the federal government regularly at Health.Data.Gov. ACOs and other large institutions can hire people who love to crunch big data (if such staff can be found, because they're in extremely high demand now in almost every industry) to create systems that slide seamlessly into clinical decision support and provide guidelines for better treatment, as well as handle the clinic's logistics better. So what we need to do is train a lot more experts in big data to understand the health care field and crunch its numbers.
Change will be disruptive, and will not be welcomed with open arms. Those who want a better system need to look at the areas where change is most likely to make a difference.
March 17 2012
Profile of the Data Journalist: The Homicide Watch
Around the globe, the bond between data and journalism is growing stronger. In an age of big data, the growing importance of data journalism lies in the ability of its practitioners to provide context, clarity and, perhaps most important, find truth in the expanding amount of digital content in the world. In that context, data journalism has profound importance for society.
To learn more about the people who are doing this work and, in some cases, building the newsroom stack for the 21st century, I conducted in-person and email interviews during the 2012 NICAR Conference and published a series of data journalist profiles here at Radar.
Chris Amico (@eyeseast) is a journalist and web developer based in Washington, DC, where he works on NPR's State Impact project, building a platform for local reporters covering issues in their states. Laura Norton Amico (@LauraNorton) is the editor of Homicide Watch (@HomicideWatch), an online community news platform in Washington, D.C. that aspires to cover every homicide in the District of Columbia. And yes, the similar names aren't a coincidence: the Amicos were married in 2010.
Since Homicide Watch launched in 2009, it's been earning praise and interest from around the digital world, including a profile by the Nieman Lab at Harvard University that asked whether a local blog "could fill the gaps of DC's homicide coverage." Notably, Homicide Watch has turned up a number of unreported murders.
In the process, the site has also highlighted an important emerging set of data that other digital editors should consider: using inbound search engine analytics for reporting. As Steve Myers reported for the Poynter Institute, Homicide Watch used clues in site search queries to ID a homicide victim. We'll see if the Knight Foundation thinks this idea has legs: the husband and wife team have applied for a Knight News Challenge grant to build a toolkit for real-time investigative reporting from site analytics.
The Amicos' success with the site -- which saw big growth in 2011 -- offers an important case study into why organizing beats may well hold similar importance as investigative projects. It also will be a case study with respect to sustainability and business models for the "new news," as Homicide Watch looks to license its platform to news outlets across the country.
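As a rough illustration of what reporting from search analytics could look like (this is not Homicide Watch's actual code), the sketch below scans a list of inbound search queries and flags the ones that recur but match no name already in the database. The function name, the threshold, and the sample queries are all hypothetical.

```python
from collections import Counter


def unmatched_queries(queries, known_names, min_count=2):
    """Return repeated search queries that match no known name.

    queries: raw search strings taken from site analytics (hypothetical input).
    known_names: names already recorded in the site's database.
    min_count: how often a query must recur before it is flagged.
    """
    known = {name.lower() for name in known_names}
    # Normalize case and whitespace so near-identical queries are counted together.
    counts = Counter(q.strip().lower() for q in queries if q.strip())
    flagged = [
        (query, n)
        for query, n in counts.items()
        if n >= min_count and not any(name in query for name in known)
    ]
    # Most frequent first, so the strongest signal is at the top.
    return sorted(flagged, key=lambda pair: (-pair[1], pair[0]))


sample_queries = [
    "john doe shooting",
    "John Doe shooting",
    "jane roe obituary",
    "jane roe obituary ",
    "jane roe obituary",
    "court calendar",
]
leads = unmatched_queries(sample_queries, known_names=["John Doe"])
print(leads)
```

A flagged query is only a lead for a reporter to check, not a confirmed fact; the point of the sketch is that unfamiliar names recurring in search traffic can be surfaced automatically.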
Below, I've embedded a presentation on Homicide Watch from the January 2012 meeting of the Online News Association. Our interview follows.
Where do you work now? What is a day in your life like?
Laura: I work full time right now for Homicide Watch, a database driven beat publishing platform for covering homicides. Our flagship site is in DC, and I’m the editor and primary reporter on that site as well as running business operations for the brand.
My typical days start with reporting. First, news checks, and maybe posting some quick posts on anything that’s happened overnight. After that, it’s usually off to court to attend hearings and trials, get documents, reporting stuff. I usually have a to-do list for the day that includes business meetings, scheduling freelancers, mapping out long-term projects, doing interviews about the site, managing our accounting, dealing with awards applications, blogging about the start-up data journalism life on my personal blog and for ONA at journalists.org, guest teaching the occasional journalism class, and meeting deadlines for freelance stories. The work day never really ends; I’m online keeping an eye on things until I go to bed.
Chris: I work for NPR, on the State Impact project, where I build news apps and tools for journalists. With Homicide Watch, I work in short bursts, usually an hour before dinner and a few hours after. I’m a night owl, so if I let myself, I’ll work until 1 or 2 a.m., just hacking at small bugs on the site. I keep a long list of little things I can fix, so I can dip into the codebase, fix something and deploy it, then do something else. Big features, like tracking case outcomes, tend to come from weekend code sprints.
How did you get started in data journalism? Did you get any special degrees or certificates?
Laura: Homicide Watch DC was my first data project. I’ve learned everything I know now from conceiving of the site, managing it as Chris built it, and from working on it. Homicide Watch DC started as a spreadsheet. Our start-up kit for newsrooms starting Homicide Watch sites still includes filling out a spreadsheet. The best lesson I learned when I was starting out was to find out what all the pieces are and learn how to manage them in the simplest way possible.
Chris: My first job was covering local schools in southern California, and data kept creeping into my beat. I liked having firm answers to tough questions, so I made sure I knew, for example, how many graduates at a given high school met the minimum requirements for college. California just has this wealth of education data available, and when I started asking questions of the data, I got stories that were way more interesting.
I lived in Dalian, China for a while. I helped start a local news site with two other expats (Alex Bowman and Rick Martin). We put everything we knew about the city -- restaurant reviews, blog posts, photos from Flickr -- into one big database and mapped it all. It was this awakening moment when suddenly we had this resource where all the information we had was interlinked. When I came back to California, I sat down with a book on Python and Django and started teaching myself to code. I spent a year freelancing in the Bay Area, writing for newspapers by day, learning Python by night. Then the NewsHour hired me.
Did you have any mentors? Who? What were the most important resources they shared with you?
Laura: Chris really coached me through the complexities of data journalism when we were creating the site. He taught me that data questions are editorial questions. When I realized that data could be discussed as an editorial approach, it opened the crime beat up. I learned to ask questions of the information I was gathering in a new way.
Chris: My education has been really informal. I worked with a great reporter at my first job, Bob Wilson, who is a great interviewer of both people and spreadsheets. At NewsHour, I worked with Dante Chinni on Patchwork Nation, who taught me about reporting around a central organizing principle. Since I’ve started coding, I’ve ended up in this great little community of programmer-journalists where people bounce ideas around and help each other out.
What does your personal data journalism "stack" look like? What tools could you not live without?
Laura: The site itself and its database, which I report to and from, WordPress, WordPress analytics, Google Analytics, Google Calendar, Twitter, Facebook, Storify, DocumentCloud, VINElink, and DC Superior Court’s online case lookup.
Chris: Since I write more Python than prose these days, I spend most of my time in a text editor (usually TextMate) on a MacBook Pro. I try not to do anything with git.
What data journalism project are you the most proud of working on or creating?
Laura: Homicide Watch is the best thing I’ve ever done. It’s not just about the data, and it’s not just about the journalism, but it’s about meeting a community need in an innovative way. I started thinking about a Homicide Watch-type site when I was trying to follow a few local cases shortly after moving to DC. It was nearly impossible to find news sources for the information. I did find that family and friends of victims and suspects were posting newsy updates in unusual places -- online obituaries and Facebook memorial pages, for example. I thought a lot about how a news product could fit the expressed need for news, information, and a way for the community to stay in touch about cases.
The data part developed very naturally out of that. The earliest description of the site was “everything a reporter would have in their notebook or on their desk while covering a murder case from start to finish.” That’s still one of the guiding principles of the site, but it’s also meant that organizing that information is super important. What good is making court dates public if you’re not doing it on a calendar, for example?
We started, like I said, with a spreadsheet that listed everything we knew: victim name, age, race, gender, method of death, place of death, link to obituary, photo, suspect name, age, race, gender, case status, incarceration status, detective name, age, race, gender, phone number, judge assigned to case, attorneys connected to the case, co-defendants, connections to other murder cases.
And those are just the basics. Any reporter covering a murder case, crime to conviction, should have that information. What Homicide Watch does is organize it, make as much of it public as we can, and then report from it. It’s led to some pretty cool work, from developing a method to discover news tips in analytics, to simply building news packages that accomplish more than anyone else can.
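To make the spreadsheet description concrete, here is a minimal sketch of how a subset of those fields might be modeled as a record. It is not the site's actual schema: the class names, the field names, and the "closed" status value are assumptions chosen for illustration, and only some of the listed columns are included.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    """Shared demographic fields the spreadsheet lists for victims and suspects."""
    name: str
    age: Optional[int] = None
    race: Optional[str] = None
    gender: Optional[str] = None


@dataclass
class HomicideCase:
    """One row of the hypothetical spreadsheet, with repeated columns grouped."""
    victim: Person
    method_of_death: Optional[str] = None
    place_of_death: Optional[str] = None
    obituary_url: Optional[str] = None
    suspects: List[Person] = field(default_factory=list)
    case_status: Optional[str] = None  # e.g. "open" or "closed" (assumed values)
    judge: Optional[str] = None
    attorneys: List[str] = field(default_factory=list)
    related_case_ids: List[str] = field(default_factory=list)

    def is_open(self) -> bool:
        # Treat anything not explicitly closed as still open.
        return self.case_status != "closed"


example = HomicideCase(
    victim=Person(name="Jane Roe", age=34),
    suspects=[Person(name="John Doe")],
    case_status="open",
)
print(example.is_open(), len(example.suspects))
```

Grouping the repeated name, age, race, and gender columns into one `Person` type is a design choice of this sketch; a flat spreadsheet row would carry the same information.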
Chris: Homicide Watch is really the project I wanted to build for years. It’s data-driven beat reporting, where the platform and the editorial direction are tightly coupled. In a lot of ways, it’s what I had in mind when I was writing about frameworks for reporting.
The site is built to be a crime reporter’s toolkit. It’s built around the way Laura works, based on our conversations over the dinner table for the first six months of the site’s existence. Building it meant understanding the legal system, doing reporting and modeling reality in ways I hadn’t done before, and that was a challenge on both the technical and editorial side.
Where do you turn to keep your skills updated or learn new things?
Laura: Assigning myself new projects and tasks is the best way for me to learn; it forces me to find solutions for what I want to do. I’m not great at seeking out resources on my own, but I keep a close eye on Twitter for what others are doing, saying about it, and reading.
Chris: Part of my usual morning news reading is a run through a bunch of programming blogs. I try to get exposed to technologies that have no immediate use to me, just so it keeps me thinking about other ways to approach a problem and to see what other problems people are trying to solve.
I spend a lot of time trying to reverse-engineer other people’s projects, too. Whenever someone launches a new news app, I’ll try to find the data behind it, take a dive through the source code if it’s available and generally see if I can reconstruct how it came together.
Why are data journalism and "news apps" important, in the context of the contemporary digital environment for information?
Laura: Working on Homicide Watch has taught me that news is about so much more than “stories.” If you think about a typical crime brief, for example, there’s a lot of information in there, starting with the "who-what-where-when." Once that brief is filed and published, though, all of that information disappears.
Working with news apps gives us the ability to harness that information and reuse/repackage it. It’s about slicing our reporting in as many ways as possible in order to make the most of it. On Homicide Watch, that means maintaining a database and creating features like victims’ and suspects’ pages. Those features help regroup, refocus, and curate the reporting into evergreen resources that benefit both reporters and the community.
Chris: Spend some time with your site analytics. You’ll find that there’s no one thing your audience wants. There isn’t even really one audience. Lots of people want lots of different things at different times, or at least different views of the information you have.
One of our design goals with Homicide Watch is “never hit a dead end.” A user may come in looking for information about a certain case, then decide she’s curious about a related issue, then wonder which cases are closed. We want users to be able to explore what we’ve gathered and to be able to answer their own questions. Stories are part of that, but stories are data, too.
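One way to read the "never hit a dead end" goal is that every case page should offer at least one onward link. The sketch below shows that idea under stated assumptions: cases are plain dictionaries with hypothetical `status` and `related` keys, and the link labels are invented for illustration. It is not a description of how Homicide Watch is implemented.

```python
def related_links(case_id, cases):
    """Build onward links for one case page so a reader always has a next step."""
    case = cases[case_id]
    links = []
    # Cases explicitly connected to this one (for example, shared co-defendants).
    for other_id in case.get("related", []):
        if other_id in cases:
            links.append(("related case", other_id))
    # Other cases that share this case's status, such as all closed cases.
    for other_id, other in cases.items():
        if other_id != case_id and other["status"] == case["status"]:
            links.append(("same status: " + case["status"], other_id))
    # Fallback: if nothing else applies, point back to the full case list.
    if not links:
        links.append(("all cases", "index"))
    return links


cases = {
    "c1": {"status": "closed", "related": ["c2"]},
    "c2": {"status": "open", "related": []},
    "c3": {"status": "closed", "related": []},
}
print(related_links("c1", cases))
print(related_links("c2", cases))
```

The fallback link is what enforces the goal: even a case with no connections and a unique status still leads somewhere.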
March 15 2012
Left and right and wrong
Sometimes I find a picture or a blog post that leaps off the screen at me and says "your readers must see this as it applies to health IT."
Normal Modes, a solid UX company based in Houston, sends me fairly good UX tips on a regular basis. The last one featured this photo (used with permission):
Normal Modes points out, very clearly, that points of confusion like this are bad for users. They regard it as their job, as UX experts, to eliminate this kind of experience for users. Their analysis about how to do this is right on.
I have seen this kind of error in EHR systems and PHR systems on countless occasions. From an engineering perspective, it is really useful to take a moment and consider how something like this happens. First, you have two different "levels" of operation here. One is concerned with how traffic flows in the parking lot. The other is concerned with directions in the parking lot. For whatever reasons, these two "parking lot features" were implemented separately by people who had access to two different sets of resources. It stands to reason that the people who had access to white paint and stencils to make the sign on the right were the same people using stencils to mark the parking spots. It stands to reason that the people who had access to the professional sign-making system were somewhat removed from the people actually designing the parking lot.
In short, what you are seeing here is the artifact of a political and process disconnect. In health IT, there are constant political disconnects that cause similar issues. The EHR vendor is one political group, the insurance companies another, and the government is so large that it actually has multiple groups with different agendas. (HHS alone has so many sub groups that it's very difficult to completely follow what is happening.)
As enthusiastic as I am about the potential for meaningful use incentives, I think there will be lots of artifacts like this in EHRs that do not make much sense because the EHR vendor was pulled in a new direction by these incentives.
I have said in almost every talk about health IT I have ever given that the problems in health IT are political and not technical. I think it is my most tweeted quote. But sometimes a picture is worth a thousand words.
Meaningful Use and Beyond: A Guide for IT Staff in Health Care -- Meaningful Use underlies a major federal incentives program for medical offices and hospitals that pays doctors and clinicians to move to electronic health records (EHR). This book is a rosetta stone for the IT implementer who wants to help organizations harness EHR systems.
Related:
- The Direct Project: Healthcare communication gets an upgrade
- Epatients: The hackers of the healthcare world
- Building the health information infrastructure for the modern epatient
- Why geeks should care about meaningful use and ACOs
- See more of Radar's health IT coverage
Strata Week: Infographics for all
Here are some of the data stories that caught my attention this week.
More infographics incoming, thanks to Visual.ly Create
The visualization site Visual.ly launched a new tool this week that helps users create their own infographics. Aptly called Visual.ly Create, the new feature lets people take publicly available datasets (such as information from a Twitter hashtag), select a template, and publish their own infographics.

Segment from a Visual.ly Create infographic of the #stratconf hashtag.
As GigaOm's Derrick Harris observes, it's fairly easy to spot the limitations with this service — in the data you can use, in the templates that are available, and in the visualizations that are created. But after talking to Visual.ly's co-founder and Chief Content Officer Lee Sherman about some "serious customization options" that are in the works, Harris wonders if a tool like this could be something to spawn interest in data science:
"The problem is that we need more people with math skills to meet growing employer demand for data scientists and data analysts. But how do you get started caring about data in the first place when the barriers are so high? Really working with data requires a deep understanding of both math and statistics, and Excel isn't exactly a barrel of monkeys (nor are the charts it creates)."
Could Visual.ly be an on-ramp for more folks to start caring about and playing with data?
San Francisco upgrades its open data initiative
Late last week, San Francisco Mayor Ed Lee unveiled the new data.SFgov.org, a cloud-based open data website that will replace DataSF.org, one of the earliest examples of civic open data initiatives.
"By making City data more accessible to the public secures San Francisco's future as the world's first 2.0 City," said Lee in an announcement. "It's only natural that we move our Open Data platform to the cloud and adopt modern open interface to facilitate that flow and access to information and develop better tools to enhance City services."
The city's Chief Innovation Officer Jay Nath told TechCrunch that the update to the website expands access to information while saving the city money.
The new site contains some 175 datasets, including map-based crime data, active business listings, and various financial datasets. It's powered by the Seattle-based data startup Socrata.
The personal analytics of Stephen Wolfram
"One day I'm sure everyone will routinely collect all sorts of data about themselves," writes Mathematica and Wolfram Alpha creator Stephen Wolfram. "But because I've been interested in data for a very long time, I started doing this long ago. I actually assumed lots of other people were doing it too, but apparently they were not. And so now I have what is probably one of the world's largest collections of personal data."
And what a fascinating collection of data it is, including emails received and sent, phone calls made, calendar events planned, keystrokes made, and steps taken. Through this, you can see Wolfram's sleep, social, and work patterns, and even how various chapters of his book and Mathematica projects took shape.
"The overall pattern is fairly clear," Wolfram writes. "It's meetings and collaborative work during the day, a dinnertime break, more meetings and collaborative work, and then in the later evening more work on my own. I have to say that looking at all this data, I am struck by how shockingly regular many aspects of it are. But in general, I am happy to see it. For my consistent experience has been that the more routine I can make the basic practical aspects of my life, the more I am able to be energetic — and spontaneous — about intellectual and other things."
Fluent Conference: JavaScript & Beyond -- Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.). Save 20% on registration with the code RADAR20.
Got data news?
Feel free to email me.
March 12 2012
Parts of healthcare are moving to the cloud
Healthcare providers are increasingly required to do more with less. Regulations, HIPAA, Meaningful Use, recovery audit contractor (RAC) audits and decreasing revenues are motivating providers to consider cloud computing as a solution to potentially help them cut costs, maintain quality, meet regulations, and increase productivity.
Some electronic health record (EHR) vendors now offer their products as cloud-based services, an approach intended to help providers better manage the IT investments needed to support EHR implementations. And just as we've seen in other industries, there is an ongoing debate within healthcare as to the viability of cloud-based solutions given the care needed for patient privacy and sensitive personal information.
Providers' trust in the public cloud is still relatively weak, but increasing numbers are considering using private clouds. However, EHR applications hosted in the cloud do seem to be gaining traction.
One example of a cloud-based EHR offering is CareCloud. My fellow Radar blogger Andy Oram wrote about them two years ago at HIMSS, and they have made significant progress since then. CareCloud creates apps that help medical professionals run their businesses. Those apps include a community collaboration and communication platform to securely share patient information, a medical practice management system for billing and scheduling, and a revenue cycle management service. CareCloud also provides electronic health records. It's built with Ruby on Rails, a web application framework for the Ruby programming language that is well suited to rapid development of web applications. CareCloud was a co-winner of the IBM Global Entrepreneur Silicon Valley SmartCamp competition in 2010 (see video below).
I ran into the folks at CareCloud at the HIMSS 2012 conference and was impressed with both their use of open source and their strategy on leveraging the cloud in healthcare. Mike Cuesta, CareCloud's director of marketing and user experience, defined CareCloud's strategy as one of future survival.
"Being able to deliver the product across platforms is crucial," Cuesta said. "In healthcare there is a glaring lack of modern web apps. What we wanted to do was create an elegant and user-friendly application that is accessible anywhere. Companies have to be able to deliver a desktop-class experience that works across platforms."
CareCloud relies on open source. "I had my eyes opened to open source about eight years ago when I was looking for a project management system," said CareCloud CTO Tom Packert. "I discovered I could use something like dotproject, which is a GPL-licensed PHP-MySQL web-based project management application. It only took us a day to put it up on SUSE Linux and we didn't need SQL seat licenses. Open source allows you to scale horizontally. It's not as scary as a lot of people think it is."
Another EHR in the cloud is athenahealth. Athenahealth's co-founders Todd Park, the new U.S. chief technology officer (CTO), and Jonathan Bush, purchased a birthing practice in 1997. Soon, like most medical practices, they were buried in paper and spent most of their resources trying to get paid. Searching for innovative solutions led them to create their own software. Enlisting the help of Todd's younger brother Ed, a software developer, they created an EHR and financial revenue cycle system with a rules engine of dynamic billing rules data. I met Ed Park at HIMSS when I remarked that he looked a lot like Todd Park, and Jonathan introduced him to me as Todd's "younger, smarter, and much better looking brother." Apparently his programming skills are paying off ...
This year, athenahealth was named to the TR50, Technology Review's third annual list of the world's most innovative technology companies. At this year's HIMSS conference, athenahealth showed the company's plans for an iPhone app that will give its EHR users access to certain features of its athenaClinicals cloud-based platform. An iPad version of the web-based athenahealth EHR app is also currently under development and set to launch in 2013.
Being based on cloud technology makes athenahealth much more nimble in launching mobile products and services. In the video below, I discuss with Jonathan Bush how athenahealth is using the cloud in their EHR.
(Thanks to Nate DeNiro and Open Affairs Television for their assistance with this video.)
Related:
- AI will eventually drive healthcare, but not anytime soon
- Medical imaging in the cloud: a conversation about eMix
- Health gets personal in the cloud
March 09 2012
OK, I Admit It. I have a mancrush on the new Federal CTO, Todd Park
I couldn't be more delighted by the announcement today that Todd Park has been named the new Chief Technology Officer for the United States, replacing Aneesh Chopra.
I first met Todd in 2008 at the urging of Mitch Kapor, who thought that Todd was the best exemplar in the healthcare world of my ideas about the power of data to transform business and society, and that I would find him to be a kindred spirit. And so it was. My lunch with Todd turned into a multi-hour brainstorm as we walked around the cliffs of Lands End in San Francisco. Todd was on fire with ideas about how to change healthcare, and the opportunity of the new job he'd just accepted, to become the CTO at HHS.
Subsequently, I helped Todd to organize a series of workshops and conferences at HHS to plan and execute their open data strategy. I met with Todd and told him how important it was not just to make data public and hope developers would come, but to actually do developer evangelism. I told him how various tech companies ran their developer programs, including some stories about Amazon's rollout of AWS: they had first held a small, private event to which they invited people and companies who'd been unofficially hacking on their data, told them their plans, and recruited them to build apps against the new APIs that were planned. Then, when they made their public announcement, they had cool apps to show, not just good intentions.
Todd immediately grasped the blueprint, and executed with astonishing speed. Before long, he held a workshop for an invited group of developers, entrepreneurs and health data wonks to map out useful data that could be liberated, and useful applications that could be built with it. Six months later, he held a public conference to showcase the 40-odd applications that had been developed. Now in its third year, the event has grown into what Todd calls the Health Datapalooza. As noted on GigaOm, the event has already led to several venture-backed startups. (Applications are open for startups to be showcased at this year's event, June 5-6 in Washington D.C.)
Since I introduced him to Eric Ries, author of The Lean Startup, Todd has been introducing the methodology to Washington, insisting on programs that can show real results (learning and pivots) in only 90 days. He just knows how to make stuff happen.
Todd is also an incredibly inspiring speaker. At my various Gov 2.0 events, he routinely got a standing ovation. His enthusiasm, insight, and optimism are infectious.
When Todd Park talks, I listen. (Photo by James Duncan Davidson from the 2010 Gov 2.0 Summit. http://www.flickr.com/photos/oreillyconf/4967787323/in/photostream/)
Many will ask about Todd's technical credentials. After all, he is trained as a healthcare economist, not an engineer or scientist. There are three good answers:
1. Economists are playing an incredibly important role at today's technology companies, as extracting meaning and monetization from massive amounts of data becomes one of the key levers of success and competitive advantage. (Think Hal Varian at Google, working to optimize the ad auction.) Healthcare in particular is one of those areas where science, human factors, and economics are on a collision course, but virtually every sector of our nation is undergoing a transformation as a result of intelligence derived from data analysis. That's why I put Todd on my list for Forbes.com of the world's most important data scientists.
2. Todd is an enormously successful technology entrepreneur, with two brilliant companies - Athenahealth and Castlight Health - under his belt. In each case, he was able to succeed by understanding the power of data to transform an industry.
3. He's an amazing learner. In a 1998 interview describing the founding of Athena Health, he described his leadership philosophy: "Put enough of an idea together to inspire a team of really good people to jump with you into a general zone like medical practices. Then, just learn as much as you possibly can and what you really can do to be helpful and then act against that opportunity. No question."
Todd is one of the most remarkable people I've ever met, in a career filled with remarkable people. As Alex Howard notes, he should be an inspiration for more "retired" tech entrepreneurs to go into government. This is a guy who could do literally anything he put his mind to, and he's taking up the challenge of making our government smarter about technology. I want to put out a request to all my friends in the technology world: if Todd calls you and asks you for help, please take the call, and do whatever he asks.