
October 16 2011

BioCurious opens its lab in Sunnyvale, CA

When I got to the BioCurious lab yesterday evening, they were just cleaning up some old coffee makers. These, I learned, had been turned into sous vide cookers in that day's class.

New lab at BioCurious

Sous vide cookers are sort of the gourmet rage at the moment. One normally costs several hundred dollars, but BioCurious offered a class for $117 where seventeen participants learned to build their own cookers and took them home at the end. They actually cooked steak during the class--and I'm told it came out very well--but of course, sous vide cookers are also useful for biological experiments because they hold temperatures very steady.

The class used Arduinos to provide the temperature control for the coffee pots and other basic hardware, so the lesson was more about electronics than biology. But it's a great illustration of several aspects of what BioCurious is doing: a mission of involving ordinary people off the street in biological experiments, using hands-on learning, and promoting open source hardware and software.
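The control logic such an Arduino build typically runs is simple on/off ("bang-bang") switching around a setpoint. Here is a minimal sketch of that idea in Python, with a toy simulation standing in for the real sensor and heater relay; the setpoint, hysteresis, and heating rates are invented for illustration, not the class's actual values.

```python
# Hypothetical sketch of on/off temperature control for a sous vide rig.
# simulate() stands in for the real sensor/relay loop on the Arduino.

SETPOINT_C = 60.0   # target water temperature
HYSTERESIS = 0.5    # dead band, to avoid rapid relay switching

def heater_should_run(temp_c, heater_on):
    """Decide the relay state from the current reading."""
    if temp_c < SETPOINT_C - HYSTERESIS:
        return True
    if temp_c > SETPOINT_C + HYSTERESIS:
        return False
    return heater_on  # inside the dead band, keep the current state

def simulate(start_c, steps):
    """Toy physics: heater adds 0.3 C per step, losses remove 0.1 C."""
    temp, heater_on = start_c, False
    for _ in range(steps):
        heater_on = heater_should_run(temp, heater_on)
        temp += 0.3 if heater_on else -0.1
    return temp

print(round(simulate(20.0, 400), 1))  # settles near the setpoint
```

The hysteresis band is the one design choice that matters here: without it, the relay would chatter on and off every cycle, which shortens its life.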

Other classes have taught people to insert dyes into cells (in order to teach basic skills such as pipetting), to run tests on food for genetically modified ingredients, and to run computer analyses on people's personal DNA sequences. The latter class involved interesting philosophical discussions about how much to trust their amateur analyses and how to handle potentially disturbing revelations about their genetic make-up. All the participants in that class got their sequencing done at 23andme first, so they had sequences to work with and could compare their own work with what the professionals turned up.
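The amateur analyses in that DNA class start from a raw genotype export like the one 23andme provides: tab-separated lines of marker ID (rsid), chromosome, position, and genotype, with "#" comment lines. A minimal sketch of looking up a marker follows; the sample lines are made up for illustration.

```python
# Sketch of loading a 23andMe-style raw genotype export and looking up
# one marker. The sample data below is invented for illustration.

SAMPLE = """\
# rsid\tchromosome\tposition\tgenotype
rs4988235\t2\t136608646\tAA
rs1815739\t11\t66560624\tCT
"""

def load_genotypes(text):
    """Map rsid -> genotype, skipping comment lines."""
    calls = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        rsid, _chrom, _pos, genotype = line.split("\t")
        calls[rsid] = genotype
    return calls

calls = load_genotypes(SAMPLE)
print(calls["rs1815739"])
```

Everything interesting (and philosophically fraught, as the class discovered) happens after this step, when you decide what a genotype at a given marker is supposed to mean.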

Experiments at BioCurious are not just about health. Synthetic biologists, for instance, are trying a lot of different ways to create eco-friendly synthetic fuels.

BioCurious is not a substitute for formal training in biochemistry, biology, and genetics. But it is a place for people to get a feel for what biologists do, and for real biologists without access to expensive equipment to do the research of their dreams.

In a back room (where I was allowed to go after being strenuously warned not to touch anything--BioCurious is an official BSL 1 facility, and they're lucky the city of Sunnyvale allowed them to open), one of the staff showed a traditional polymerase chain reaction (PCR) machine, which costs several thousand dollars and is critical for amplifying DNA.

Traditional commercial PCR

A couple of BioCurious founders analyzed the functions of a PCR machine and, out of plywood and off-the-shelf parts, built an OpenPCR with open hardware specs. At $599, OpenPCR opens up genetic research to a far greater audience.

BioCurious staffer with OpenPCR
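What a PCR machine automates is easy to sketch as data: a thermal-cycling program that repeats denature, anneal, and extend steps dozens of times. The temperatures and durations below are textbook defaults, not OpenPCR's actual firmware values.

```python
# Sketch of a PCR thermal-cycling program; values are textbook defaults.

CYCLE = [
    ("denature", 95, 30),   # separate the DNA strands, 30 s
    ("anneal",   55, 30),   # let primers bind to the template, 30 s
    ("extend",   72, 60),   # polymerase copies the template, 60 s
]

def program(cycles=30):
    """Expand the repeating cycle into a flat list of (step, temp_c, secs)."""
    return CYCLE * cycles

def total_minutes(cycles=30):
    return sum(secs for _, _, secs in program(cycles)) / 60

print(total_minutes())  # run time for the default 30 cycles
```

Holding those temperatures precisely, cycle after cycle, is the whole job of the machine--which is why a plywood box with good temperature control can do it.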

How low-budget is BioCurious? After meeting for a year in somebody's garage, they finally opened this space three weeks ago with funds raised through Kickstarter. All the staff and instructors are volunteers. They keep such a tight rein on spending that a staffer told me they could keep the place open by teaching one class per week. Of the $117 students spent today for their five-hour class, $80 went to hardware.

BioCurious isn't unique (a similar space has been set up in New York City, and some movements such as synthetic biology promote open information), but it's got a rare knack for making people comfortable with processes and ideas that normally put them off. When executive director Eri Gentry introduces the idea to many people, they react with alarm and put up their hands, as if they're afraid of being overwhelmed by technobabble. (I interviewed Gentry (MP3) before a talk she gave at this year's O'Reilly Open Source Convention.)

Founder and executive director Eri Gentry

BioCurious attacks that fear and miscomprehension. Like Hacker Dojo, another Silicon Valley stalwart whose happy hour I attended Friday night, it wants to be an open space for open-minded people. Hacker Dojo and BioCurious will banish forever the stereotype of the scientist or engineer as a socially maladroit loner. The attendees are unfailingly welcoming and interested in talking about what they do in ways that make it understandable.

I thought of my two children, both of whom pursued musical careers. I wondered how they would have felt about music if kids weren't exposed to music until junior high school, whereupon they were sat down and forced to learn the circle of fifths and first species counterpoint. That's sort of how we present biology to the public--and then, even those who do show an interest are denied access to affordable equipment. BioCurious is on the cusp of a new scientific revolution.

Eri Gentry with Andy Oram in lab

October 07 2011

OpenStack Foundation requires further definition

For outsiders, the major news of interest from this week's OpenStack conference in Boston was the announcement of an OpenStack Foundation. I attended the conference yesterday where the official announcement was made, and tried to find out more about the move. But this will be a short posting because there's not much to say. The thinness of detail about the Foundation is probably a good sign, because it means that Rackspace and its partners are seeking input from the community about important parameters.

OpenStack is going to be the universal cloud platform of the future. This is assured by the huge backing and scads of funding from major companies, both potential users and vendors. (Dell and HP had big presences at the show.) Even if the leadership flubs a few things, the backers will pick them up, dust them off, and propel them on their way forward.

But the leadership has made some flubs--just the garden-variety types made by other leaders of other open source projects that are not so fortunate (or unfortunate) to be under such a strong spotlight. Most of the attendees expressed the view that the project, barely a year old, just needs to mature a bit and get through its awkward adolescent phase.

The whole premise of OpenStack is freedom from vendor lock-in. So Rackspace knew its stewardship had to come to an end. One keynoter today suggested that OpenStack invite seasoned leaders from the famous foundations that have taken the helm of free software projects--Apache, Mozilla, Linux, GNOME--to join its board and give it sage advice. But OpenStack is in a unique position. These other projects had a few years to achieve code stability and gather a robust community before becoming the intense objects of desire among major corporations who, although they undoubtedly benefited the projects, brought competing agendas. OpenStack got the corporate attention first.

It's also making a pilgrimage into a land dominated by giants such as VMware and Microsoft. Interestingly, the people at this conference expressed less concern about the competition presented by those companies than the ambiguous love expressed by companies with complicated relationships to OpenStack, notably Red Hat.

Will the OpenStack Foundation control the code or just manage the business side of the project? How will it attract developers and build community? What role do governments play, given that cloud computing raises substantial regulatory issues? I heard lots of questions like these, all apparently to be decided in the months to come. As one attendee said at the governance forum, "Let's not talk here about details, but about how we're going to talk about details."

And a colleague said to me afterward, "It's exciting to be in at the start of something big." I agree, but other than saying it's big, we don't know much about it.


September 19 2011

Promoting Open Source Software in Government: The Challenges of Motivation and Follow-Through

The Journal of Information Technology & Politics has just published a special issue on open source software. My article "Promoting Open Source Software in Government: The Challenges of Motivation and Follow-Through" appears in this issue, and the publisher has given me permission to put a prepublication draft online.

The main subject of the article is the battle between the Open Document Format (ODF) and Microsoft's Office standard, OOXML, which might sound like a quaint echo of a bygone era but is still a critical issue in open government. But during the time my article developed, I saw new trends in government procurement--such as the Apps for Democracy challenge and the site--and incorporated some of the potential they represent into the piece.

Working with the publisher Taylor & Francis was enriching. The prepublication draft I gave them ranged far and wide among topics, and although these topics pleased the peer reviewers, my style did not. They demanded a much more rigorous accounting of theses and their justification. In response to their critique, I shortened the article a lot and oriented it around the four main criteria for successful adoption of open source by government agencies:

  1. An external trigger, such as a deadline for upgrading existing software

  2. An emphasis on strategic goals, rather than a naive focus on cost

  3. A principled commitment to open source among managers and IT staff responsible for making the transition, accompanied by the technical sophistication and creativity to implement an open source strategy

  4. High-level support at the policy-making level, such as the legislature or city council

Whenever I tell colleagues about the special issue on open source, they ask whether it's available under a Creative Commons license, or at least online for free download. This was also the first issue I raised with the editor as soon as my article was accepted, and he raised it with the publisher, but they decided to stick to their usual licensing policies. Allowing authors to put up a prepublication draft is adroit marketing, but also represents a pretty open policy as academic journals go.

On the one hand, I see the decision to leave the articles under a conventional license as organizational inertia, and a form of inertia I can sympathize with. It's hard to make an exception to one's business model and legal process for a single issue of a journal. Moreover, working as I do for a publisher, I feel strongly that each publisher should make the licensing and distribution choices that it feels are right for it.

But reflecting on the academic review process I had just undergone, I realized that the licensing choice reflected the significant difference between my attitude toward the topic and the attitude taken by academics who run journals. I have been "embedded" in free software communities for years and see my writing as an emerging distillation of what they have taught me. To people like me who promote open information, making our papers open is a logical expression of the values we're promoting in writing the papers.

But the academic approach is much more stand-offish. An anthropologist doesn't feel that he needs to invoke tribal spirits before writing about the tribe's rituals to invoke spirits, nor does a political scientist feel it necessary to organize a workers' revolution in order to write about Marxism. And having outsiders critique practices is valuable. I value the process that improved my paper.

But something special happens when an academic produces insights from the midst of a community or a movement. It's like illuminating a light-emitting diode instead of just "shining light on a subject." I recently finished the book by Henry Jenkins, Fans, Bloggers, and Gamers: Media Consumers in a Digital Age, which hammers on this issue. As with his better-known book Convergence Culture, Jenkins is convinced that research about popular culture is uniquely informed by participating in fan communities. These communities don't waste much attention on licenses and copyrights. They aren't merely fawning enthusiasts, either--they critique the culture roughly and demandingly. I wonder what other disciplines could take from Jenkins.

August 26 2011

How Free Software Contributed to the Success of Steve Jobs and Apple

We all have to celebrate the career of Steve Jobs and thank him for the tremendous improvements he has brought to computer interfaces and hardware. The guy's amazing, OK? But Apple is something of a control-freak environment with a heavy-handed approach to things such as product announcements and the App Store. An undercurrent of disgruntled consumers and policy-minded free software advocates has transferred their historic antipathy for Microsoft to Apple, now that it has become the brilliant business success of the new century. So I'd like to bring everybody together again for an acknowledgment of how important free software has been to Jobs and to Apple.

In the great Second Coming, when Jobs returned to Apple in 1996, he drove two big changes right away: porting over OpenSTEP from NeXT and adopting a version of the open source BSD as Apple's new operating system. OpenSTEP was a proprietary, platform-independent set of APIs for Solaris, Windows, and NeXTSTEP. It was derived from NeXTSTEP itself, the operating system that ran on Jobs's m68k-based NeXT computers. But NeXT worked with the then-powerful Sun Microsystems, which had based its own wildly popular SunOS on BSD. OpenSTEP became the basis for the familiar Cocoa libraries and run-time that Apple developers now depend on.

(It may seem strange to use the word "Open" in the name of a proprietary system. But back then--and in some circles even today--the most rudimentary efforts at interoperability were used to justify the term. Anybody remember the Open Software Foundation?)

The foundation for the ground-breaking and still strong Mac OS X was a version of BSD based on NetBSD and FreeBSD but incorporating some unique elements. Adopting BSD brought numerous advantages: it permitted the Mac to multitask, and it made simple the porting of a huge range of Unix-based and BSD-based applications that would expand the Mac from its original role as a desktop for creative artists to a much more robust and widely deployable system.

Particularly valuable to Apple--and related to its adoption of a Unix variant--was the port of open source Samba, developed for Linux. Samba reverse engineers the SMB/CIFS protocol and related protocols that permit computers to join Microsoft local networks. Apple also (like NeXT) used the historic GCC compiler developed by Richard Stallman, and adopted KDE's KHTML browser engine (now known as WebKit) for Safari. These free software packages were insanely great; that's why Mac OS X incorporated them.

I think it is the familiarity of the Unix and BSD software that makes the Mac popular among geeks; it is now by far the most popular laptop one sees at computer conferences. And because of all the great server software that runs on the Mac thanks to its BSD core, it's gradually growing in popularity as a server for homes and small businesses.

Apple knew it had a good thing in its BSD-based kernel, because it chose to use it also in the iPhone and follow-on products. As I have reported before, the presence of BSD libraries and tools helped a group of free software advocates reverse engineer the iPhone API and create a public library that permitted people outside Apple for the first time to create applications for the iPhone. This led to a thriving community of iPhone apps, none of them approved by Apple of course, but Apple came out with its own API many months later and legitimized the external developer community with its App Store.

Although the BSD license allowed Apple to keep its changes proprietary, it chose to open-source the resulting operating system under the name Darwin. As a separate project, though, Darwin hasn't seen wide use.

BSD was not Jobs's first alliance with free software. The NeXT computer was based on the open source Mach 3 kernel developed by Richard Rashid at Carnegie Mellon University. Mach emulated BSD (even though Rashid personally expressed a distaste for it) for its programmer and user interface. Some elements of Mach 3 were incorporated into Darwin, and (to digress a bit) Mach 3 has gone on to have major effects on the computer industry. It was an inspiration for the microkernel design of Microsoft's NT system, which thrust Microsoft into the modern age of operating systems and servers especially. And Rashid himself took a position as Senior Vice President of Research at Microsoft a few years ago.

The impacts of broad, leaderless, idea-based movements are often surprising and hard to trace, and that's true of open source and free software. The triumphs of Steve Jobs demonstrate this principle--even though free software is the antithesis of how Apple runs its own business. Innovators such as Andrew Tridgell, with Samba and rsync, just keep amazing us over and over again, showing that free software doesn't recognize limits to its accomplishments. A lot of computing history would be very different, and poorer, without it.

Thanks to Karl Fogel, Brian Jepson, and Don Marti for comments that enhanced this posting.

July 31 2011

App outreach and sustainability: lessons learned by Portland, Oregon

Having decided to hang around Portland for a couple days after the Open Source convention, I attended a hackathon sponsored by the City of Portland and a number of local high tech companies, and talked to Rick Nixon (program manager for technology initiatives in the Portland city government) about the two big problems faced by contests and challenges in government apps: encouraging developers to turn their cool apps into sustainable products, and getting the public to use them.

It's now widely recognized that most of the apps produced by government challenges are quickly abandoned. None of the apps that won awards at the original government challenge--Vivek Kundra's celebrated Apps for Democracy contest in Washington, DC--still exist.

Correction: Alex Howard tells me one of the Apps for Democracy winners is still in use, and points out that other cities have found strategies for sustainability.

And how could one expect a developer to put in the time to maintain an app, much less turn it into a robust, broadly useful tool for the general public? Productizing software requires a major investment. User interface design is a skill all its own, databases have to be maintained, APIs require documentation that nobody enjoys writing, and so forth. (Customer service is yet another burden--one that Nixon finds himself taking on for apps developed by private individuals for the city of Portland.) Developers quit their day jobs when they decide to pursue interesting products. The payoff for something in the public sphere just isn't there.

If a government's goal is just to let the commercial world know that a data set is available, a challenge may be just the thing to do, even if no direct long-term applications emerge. But as Nixon pointed out, award ceremonies create a very short blip in the public attention. Governments and private foundations may soon decide that the money sunk into challenges and awards is money wasted--especially as the number of challenges proliferates, as I've seen them do in the field of health.

Because traditional incentives can never bulk up enough muscle to make it worthwhile for a developer to productize a government app, governments can try taking the exact opposite approach and require any winning app to be open source. That's what Portland's CivicApps does. Nixon says they also require a winning developer to offer the app online for at least a year after the contest. This gives the app time to gain some traction.

Because nearly any app that's useful to one government is useful to many, open source should make support a trivial problem. For instance, take Portland's city council agenda API, which lets programmers issue queries like "show me the votes on item 506" or "what was the disposition of item 95?" On the front end, a city developer named Oscar Godson created a nice wizard, with features such as prepopulated fields and picklists, that lets staff quickly create agendas. The data format for storing agendas is JSON and the API is so simple that I started retrieving fields in 5 minutes of Ruby coding. And at the session introducing the API, several people suggested enhancements. (I suggested a diff facility and a search facility, and someone else suggested that session times be coded in standard formats so that people could plan when to arrive.) Why couldn't hundreds of governments chip in to support such a project?
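The kind of five-minute query the agenda API invites can be sketched against an inlined sample instead of the live endpoint. The JSON layout, item numbers, and vote records below are invented for illustration; Portland's actual API fields will differ.

```python
# Sketch of querying a council-agenda JSON feed. The sample data and
# field names are hypothetical, not Portland's real schema.
import json

SAMPLE = json.loads("""
{"items": [
  {"number": 506, "title": "Bike lane extension",
   "votes": {"Adams": "aye", "Fritz": "aye", "Saltzman": "nay"}},
  {"number": 95, "title": "Budget amendment",
   "disposition": "passed"}
]}
""")

def votes_on(agenda, item_number):
    """Return the recorded votes for one agenda item, or None."""
    for item in agenda["items"]:
        if item["number"] == item_number:
            return item.get("votes")
    return None

print(votes_on(SAMPLE, 506))
```

With data this flat, the suggested enhancements (diff, search, standardized session times) are all afternoon projects--which is exactly the argument for many governments sharing one codebase.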

Code for America, a public service organization for programmers supported by O'Reilly and many other institutions, combines a variety of strategies. All projects are open source, but developers are hooked up with projects for a long enough period to achieve real development milestones. But there may still be a role for the macho theatrics of a one-day hackathon or short-term challenge.

Enhancing the platform available to developers can also stimulate more apps. Nixon pointed out that, when Portland first released geographic data in the form of Shapefiles, a local developer created a site to serve them up more easily via an API, mobilizing others to create more apps. He is now part of the Code For America effort doing exactly the same thing--serving up geographic data--for other large municipalities.

Public acceptance is the other big problem. A few apps hit the big time, notably the Portland PDX bus app that tells you how soon a bus is coming so you can minimize the time you wait out in the rain. But most remain unknown and unappreciated. Nixon and I saw no way forward here, except perhaps that one must lead the way with increasing public involvement in government, and that this involvement will result in an increased use of software that facilitates it.

The wealth of simple APIs made a lot of people productive today. The applications presented at the end of the Portland hackathon were:

  • A mapping program that shows how much one's friends know each other, clustering people together who know each other well

  • An information retrieval program that organizes movies to help you find one to watch

  • A natural language processing application that finds and displays activities related to a particular location

  • An event planner that lets you combine the users of many different social networks, as well as email and text messaging users (grand prize winner)

  • A JSON parser written in Lua communicating with a GTK user interface written in Scheme (just for the exercise)

  • A popularity sorter for the city council agenda, basing popularity on the number of comments posted

  • A JavaScript implementation of LinkedIn Circles

  • A geographic display of local institutions matching a search string, using the Twilio API

  • A visualization of votes among city council members

  • An aggregator for likes and comments on Facebook and (eventually) other sites

  • A resume generator using LinkedIn data

  • A tool for generating consistent location names for different parts of the world that call things by different terms

Approximately 130 man-and-woman hours went into today's achievements. A project like Code for America multiplies that by hundreds.

July 30 2011

Report from Open Source convention health track, 2011

Open source software in health care? It's limited to a few pockets of use--at least in the United States--but if you look at it a bit, you start to wonder why any health care institution uses any proprietary software at all.

What the evidence suggests

Take the conference session by University of Chicago researchers commissioned to produce a report for Congress on open source in health care. They found several open source packages that met the needs for electronic records at rural providers with few resources, such as safety-net providers.

They found that providers who adopted open source started to make the changes that the adoption of electronic health records (or any major new system) is supposed to bring about, but rarely does in proprietary health settings.

  • They offer the kinds of extra attention to patients that improve their health, such as asking them questions about long-term health issues.

  • They coordinate care better between departments.

  • They have improved their workflows, saving a lot of money.

And incidentally, deployment of an open source EHR cost an estimated 40% as much as deploying a proprietary one.

Not many clinics of the type examined--those in rural, low-income areas--have the time and money to install electronic records, and far fewer use open source ones. But the half-dozen examined by the Chicago team were clear success stories. They covered a variety of areas and populations, and three used WorldVistA while three used other EHRs.

Their recommendations are:

  • Greater coordination between open source EHR developers and communities, to explain what open source is and how it benefits providers.

  • Forming a Community of Practice on health centers using open source EHRs.

  • Greater involvement from the Federal Government, not to sponsor open source, but to make communities aware that it's an option.

Why do so few providers adopt open source EHRs? The team attributed the problem partly to prejudice against open source. But I picked up another, deeper concern from their talk. They said success in implementing open source EHRs depends on a "strong, visionary leadership team." As much as we admire health providers, teams like that are hard to form and consequently hard to find. But of course, any significant improvement in work processes would require such a team. What the study demonstrated is that it happens more in the environment of an open source product.

There are some caveats to keep in mind when considering these findings--some limitations to the study. First, the researchers had very little data about the costs of implementing proprietary health care systems, because the vendors won't allow customers to discuss it, and just two studies have been published. Second, the sample of open source projects was small, although the consistency of positive results was impressive. And the researchers started out sympathetic to open source. Despite the endorsement of open source represented by their findings, they recognized that it's harder to find open source and that all the beneficial customizations take time and money. During a Birds-of-a-Feather session later in the conference, many of us agreed that proprietary solutions are here for quite some time, and can benefit by incorporating open source components.

The study nevertheless remains important and deserves to be released to Congress and the public by the Department of Health and Human Services. There's no point to keeping it under wraps; the researchers are proceeding with phase 2 of the study with independent funding and are sure to release it.

So who uses open source?

It's nice to hear about open source projects (and we had presentations on several at last year's OSCon health care track) but the question on the ground is what it's like to actually put one in place. The implementation story we heard this year was from a team involving Roberts-Hoffman Software and Tolven.

Roberts-Hoffman is an OSCon success story. Last year they received a contract from a small health care provider to complete a huge EHR project in a crazily short amount of time, including such big-ticket requirements as meeting HIPAA requirements. Roberts-Hoffman knew little about open source, but surmised that the customization it permitted would let them achieve their goal. Roberts-Hoffman CEO Vickie Hoffman therefore attended OSCon 2010, where she met a number of participants in the health care track (including me) and settled on Tolven as their provider.

The customer put some bumps in the road to the open source approach. For instance, they asked with some anxiety whether an open source product would expose their data. Hoffman had a little educating to do.

Another hurdle was finding a vendor to take medication orders. Luckily, Lexicomp was willing to work with a small provider and showed a desire to have an open source solution for providers. Roberts-Hoffman ended up developing a Tolven module using Lexicomp's API and contributing it back to Tolven. This proprietary/open source merger was generally quite successful, although it was extra work providing tests that someone could run without a Lexicomp license.

In addition to meeting what originally seemed an impossible schedule, Tolven allowed an unusual degree of customization through templating, and ensured the system would work with standard medical vocabularies.

Why can't you deliver my data?

After presentations on health information exchanges at OSCON, I started to ruminate about data delivery. My wife and I had some problems with appliances this past Spring and indulged in some purchases of common household items, a gas grill from one company and a washing machine from another. Each offered free delivery. So if low-margin department stores can deliver 100-pound appliances, why can't my doctor deliver my data to a specialist I'm referred to?

The CONNECT Gateway and Direct project hopefully solve that problem. CONNECT is the older solution, with Direct offering an easier-to-implement system that small health care providers will appreciate. Both have the goal of allowing health care providers to exchange patient data with each other, and with other necessary organizations such as public health agencies, in a secure manner.

David Riley, who directed the conversion of CONNECT to an open-source, community-driven project at the Office of the National Coordinator in the Department of Health and Human Services, kicked off OSCon's health care track by describing the latest developments. He had led off last year's health care track with a perspective on CONNECT delivered from his role in government, and he moved smoothly this time into covering the events of the past year as a private developer.

The open-source and community aspects certainly proved their value when a controversy and lawsuit over government contracts threatened to stop development on CONNECT. Although that's all been resolved now, Riley decided in the Spring to leave government and set up an independent non-profit foundation, Alembic, to guide CONNECT. The original developers moved over to Alembic, notably Brian Behlendorf, and a number of new companies and contributors came along. Most of the vendors who had started out on the ONC project stayed with the ONC, and were advised by Riley to do so until Alembic's course was firm.

Lots of foundations handle open source projects (Apache, etc.) but Riley and Behlendorf decided none of them were proper for a government-centric health care project. CONNECT demanded a unique blend of sensitivity to the health care field and experience dealing with government agencies, who have special contract rules and have trouble dealing with communities. For instance, government agencies are tasked by Congress with developing particular solutions in a particular time frame, and cannot cite as an excuse that some developer had to take time off to get a full-time job elsewhere.

Riley knows how to handle the myriad pressures of these projects, and has brought that expertise to Alembic. CONNECT software has been released and further developed under a BSD license as the Aurion project. Now that the ONC is back on track and is making changes of its own, the two projects are trying to heal the fork and are following each other's changes closely. Because Aurion has to handle sensitive personal data deftly, Riley hopes to generalize some of the software and create other projects for handling personal data.

Two Microsoft staff came to OSCon to describe Direct and the open-source .NET libraries implementing it. It turned out that many in the audience were uninformed about Direct (despite an intense outreach effort by the ONC) and showed a good deal of confusion about it. So speakers Vaibhav Bhandari and Ali Emami spent the whole time allotted (and more) explaining Direct, with time for just a couple of slides pointing out what the .NET libraries can do.

Part of the problem is that security is broken down into several different functions in ONC's solution. Direct does not help you decide whether to trust the person you're sending data to (you need to establish a trust relationship through a third party that grants certificates) or find out where to send it (you need to know the correspondent's email address or another connection point). But two providers or other health care entities who make an agreement to share data can use Direct to do so over email or other upcoming interfaces.

There was a lot of cynicism among attendees and speakers about whether government efforts, even with excellent protocols and libraries, can get doctors to offer patients and other doctors the necessary access to data. I think the reason I can get a big-box store to deliver an appliance but I can't get my doctor to deliver data is that the big-box store is part of a market, and therefore wants to please the customer. Despite all our talk of free markets in this country, health care is not a market. Instead, it's a grossly subsidized system where no one has choice. And it's not just the patients who suffer. Control is removed from the providers and payers as well.

The problem will be solved when patients start acting like customers and making appropriate demands. If you could say, "I'm not filling out those patient history forms one more time--you just get the information where I'm going," it might have an effect. More practically speaking, let's provide simple tools that let patients store their history on USB keys or some similar medium, so we can walk into a doctor's office and say "Here, load this up and you'll have everything you need."
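As a sketch of the USB-key idea floated above, here is what a minimal portable patient history might look like. The file name and field names are hypothetical inventions for illustration; a real export would use an industry-standard format such as CCR or CCD rather than ad hoc JSON.

```python
import json
import tempfile
from pathlib import Path

def export_history(history, mount_point):
    """Write a patient's history to a removable drive as plain JSON.

    The layout here is illustrative only, not a standard format.
    """
    out = Path(mount_point) / "patient_history.json"
    out.write_text(json.dumps(history, indent=2))
    return out

def import_history(mount_point):
    """Read the history back, e.g. at the next doctor's office."""
    return json.loads((Path(mount_point) / "patient_history.json").read_text())

record = {
    "allergies": ["penicillin"],
    "medications": ["lisinopril 10 mg"],
    "visits": [{"date": "2011-05-02", "reason": "hypertension follow-up"}],
}

# A temporary directory stands in for a mounted USB key.
usb = tempfile.mkdtemp()
export_history(record, usb)
restored = import_history(usb)
```

The hard part, of course, is not the file copy but agreeing on the format, which is exactly where standards such as CCD come in.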

What about you, now?

Patient control goes beyond data. It's really core to solving our crisis in health care and costs. A lot of sessions at OSCon covered things patients could do to take control of their health and their data, but most of them were assigned to the citizen health track (I mentioned them at the end of my preview article a week ago) and I couldn't attend them because they were concurrent with the health care track.

Eri Gentry delivered an inspiring keynote about her work in the biology start-up BioCurious, Karen Sandler (who had spoken in last year's health care track) scared us all with the importance of putting open source software in medical devices, and Fred Trotter gave a brief but riveting summary of the problems in health care. Fred also led a session on the Quantified Self, which was largely a discussion with the audience about ways we could encourage better behavior in ourselves and the public at large.

Guaranteed to cause meaningful change

I've already touched on the importance of changing how most health care institutions treat patients, and how open source can help. David Uhlman (who has written a book for O'Reilly with Fred Trotter) covered the complex topic of meaningful use, a phrase that appeared in the recovery act of 2009 and that drives just about all the change in current U.S. institutions. The term "meaningful use" implies that providers do more than install electronic systems; they use them in ways that benefit the patients, the institutions themselves, and the government agencies that depend on their data and treatments.

But Uhlman pointed out that doctors and health administrators--let alone the vendors of EHRs--focus on the incentive money and seem eager to do the minimum that gets them a payout. This is self-defeating, because the government will raise the requirements for meaningful use over the years, overwhelming quick-and-dirty implementations that fail to solve real problems. Of course, the health providers keep pushing the more stringent requirements back to later years, but they'll have to face the music someday. Perhaps the delay will be good for everyone in the long run, because it will give open source products a chance to demonstrate their value and make inroads where they are desperately needed.

As a crude incentive to install electronic records, meaningful use has been a big success. Before the recovery act was passed, 15%-20% of U.S. providers had EHRs. Now the figure is 60% or 70%, and by the end of 2012 it will probably be 90%. But it remains to be seen whether doctors use these systems to make better clinical decisions, follow up with patients so they comply with treatments, and eliminate waste.

Uhlman said that technology accounts for about 20% of the solution. The rest is workflow. For instance, every provider should talk to patients on every visit about central health concerns, such as hypertension and smoking. Research has suggested that this will add 30% more time per visit. If it reduces illness and hospital admissions, of course, we'll all end up paying less in taxes and insurance. His slogan: meaningful use is a payout for quality data.

It may be surprising--especially to an OSCon audience--that one of the biggest hurdles to achieving meaningful use is basic computer skills. We're talking here about typing information in correctly, knowing that you need to scroll down to see all the information on the screen, and the like. All the institutions Uhlman visits think they're in fine shape and that everybody has the basic skills, but every examination he's done shows that 20%-30% of the staff are novices in computer use. And of course, facilities are loath to spend extra money to develop these skills.

Open source everywhere

Open source has image and marketing problems in the health care field, but solutions are emerging all over the place. Three open source systems right now are certified for meaningful use: ClearHealth (Uhlman's own product), CareVue from MedSphere, and WorldVistA. OpenEMR is likely to join them soon, having completed the testing phase. vxVistA is certified but may depend on some proprietary pieces (the status was unclear during the discussion).

Two other intriguing projects presented at OSCon this year were popHealth and Indivo X. I interviewed architects from Indivo X and popHealth before they came to speak at OSCon. I'll just say here that popHealth has two valuable functions. It helps providers improve quality by providing a simple web interface that makes it easy for them to view and compare their quality measures (for instance, whether they offered appropriate treatment for overweight patients). Additionally, popHealth saves a huge amount of tedious manual effort by letting them automatically generate reports about these measures for government agencies. Indivo fills the highly valued space of personal health records. It is highly modular, permitting new data sources and apps to be added; in fact, speaker Daniel Haas wants it to be an "app store" for medical applications. Both projects use modern languages, frameworks, and databases, facilitating adoption and use.
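A quality measure like the one described above boils down to a numerator, a denominator, and a rate. The record layout and measure below are simplified inventions in the spirit of the overweight-treatment example, not popHealth's actual data model:

```python
def overweight_counseling_measure(patients):
    """A toy quality measure: the fraction of overweight patients
    (BMI >= 25) whose chart records diet/exercise counseling.
    Returns (numerator, denominator, rate)."""
    denominator = [p for p in patients if p["bmi"] >= 25]
    numerator = [p for p in denominator if p.get("counseled")]
    rate = len(numerator) / len(denominator) if denominator else 0.0
    return len(numerator), len(denominator), rate

patients = [
    {"bmi": 31.2, "counseled": True},
    {"bmi": 27.5, "counseled": False},
    {"bmi": 22.0, "counseled": False},  # below 25, so not in the denominator
    {"bmi": 29.1, "counseled": True},
]

num, den, rate = overweight_counseling_measure(patients)
# 2 of the 3 overweight patients were counseled, so the rate is 2/3
```

popHealth's value is in running many such measures across a whole patient population, showing them on the web, and generating the agency reports automatically.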

Other health care track sessions

An excellent and stimulating track was rounded out with several other talks.

Shahid Shah delivered a talk on connecting medical devices to electronic record systems. He adroitly showed how the data collected from these devices is the most timely and accurate data we can get (better than direct reports from patients or doctors, and faster than labs), but we currently let it slip away from us. He also went over standard pieces of the open source stacks that facilitate the connection of devices, talked a bit about regulations, and discussed the role of routine engineering practices such as risk assessments and simulations.

Continuing on the quality theme, David Richards shared some lessons he learned designing a clinical decision support system. It's a demanding discipline. Accuracy is critical, but results must be available quickly so the doctor can use them to make decisions during the patient visit. Furthermore, the suggestions returned must be clear and precise.

Charlie Quinn talked about the collection of genetic information to achieve earlier diagnoses of serious conditions. I could not attend his talk because I was needed at another last-minute meeting, but I sat down for a while with him later.

The motto at his Benaroya Research Institute is to have diagnosis be more science, less art. With three drops of blood, they can do a range of tests on patients suspected of having particular health conditions. Genomic information in the blood can tell a lot about health, because blood contains viruses and other genomic material besides the patient's own genes.

Tests can compare the patients to each other and to a healthy population, narrowing down comparisons by age, race, and other demographics. As an example, the institute took samples before a vaccine was administered, and then at several frequent intervals in the month afterward. They could tell when the vaccine had the most powerful effect on the body.

The open source connection here is the institute's desire to share data among multiple institutions so that more patients can be compared and more correlations can be made. Quinn said it's hard to get institutions to open up their data.

All in all, I was energized by the health care track this year, and really impressed with the knowledge and commitment of the people I met. Audience questions were well-informed and contributed a lot to the presentations. OSCon shows that open source health care, although it hasn't broken into the mainstream yet, already inspires a passionate and highly competent community.

July 11 2011

popHealth open source software permits viewing and reporting of quality measures in health care

A couple weeks ago I talked to two members of the popHealth project, which culls quality measures from electronic health records and formats them either for convenient display--so providers can review their quality measures on the Web--or for submission to regulators who require reports on these measures. popHealth was produced by the MITRE corporation under a grant from the Office of the National Coordinator at the US Department of Health and Human Services. One of my interviewees, Andrew Gregorowicz, will speak about popHealth at the Open Source convention later this month.

Videos of the interviews follow.

Lisa Tutterow: The importance of quality measures in health care, and the niche filled by open source popHealth

Lisa Tutterow: How popHealth improves the reporting of quality measures in health care

Andrew Gregorowicz: popHealth's extendability and use of RESTful interfaces

Andrew Gregorowicz: popHealth's use of standard information from electronic health records, the goals of making it open source, and technical information

Andrew Gregorowicz: The relation of popHealth to standards, and the related hData project

Useful links:

Two other interviews with speakers at OSCon's health care track this year include Shahid N. Shah on medical devices and open source and Indivo X personal health record: an interview with Daniel Haas of Children's Hospital.

July 06 2011

OSCon preview: Shahid N. Shah on medical devices and open source

I talked recently with Shahid N. Shah, who is speaking in the health care track at the O'Reilly Open Source convention later this month about The Implications of Open Source Technologies in Safety-critical Medical Device Platforms. Shahid and I discussed:

Podcast (MP3)

  • Why the data generated from medical devices is particularly reliable patient-related information, and its value for improving treatment

  • The value of connecting these devices to electronic health records, and the kinds of research this enables

  • The role of open source software in making it easier for device manufacturers to add connectivity--and to get it approved by the FDA

  • How it's time for regulators such as the Department of Health and Human Services to take a look at how devices can contribute to better health care

Another OSCon health-care-related posting is my video interview about the Indivo X personal health record with Daniel Haas.

June 27 2011

Open source personal health record: no need to open Google Health

The news went out Friday that Google is shutting down Google Health. This portal was, along with Microsoft HealthVault (which is still going strong), the world's best-known place for people to store health information on themselves. Google Health and Microsoft HealthVault were widely cited as harbingers of a new zeal for taking control of one's body and becoming a partner with one's doctors in being responsible for health.

Great ideas, but hardly anybody uses these two services. Many more people use a PHR provided by their hospital or general practitioner, which is not quite the point of a PHR because you see many practitioners over the course of your life and your data ought to be integrated in one place where you can always get it.

Predictably, free software advocates say, "make Google Health open source!" This also misses the point. The unique attributes of cloud computing were covered in a series of articles I put up a few months ago. As I explain there, the source code for services such as Google Health is not that important. The key characteristic that makes Google Health and Microsoft HealthVault appealing is...that they are run by Google and Microsoft. Those companies were banking on the trust that the public has for large, well-endowed institutions to maintain a service. And Google's decision to shutter its Health service (quite reasonable because of its slow take-off) illustrates the weakness of such cloud services.

The real future of PHRs is already here in the form of open source projects that people can take in any direction they want. One is Indivo, whose lead architect I recently interviewed (video) and which is also covered in a useful blog about the end of Google Health by an author of mine, Fred Trotter.

Two other projects worth following are OpenMRS and Tolven (which includes a PHR). People are talking about extending the Department of Veterans Affairs' Blue Button, and Trotter notes that HealtheVet (the software behind Blue Button) is also an open source PHR.

Whatever features a PHR may offer are overshadowed by the key ability to accept data in industry-standard formats and interact with a wide range of devices. A good piece of free software can be endlessly enhanced with these capabilities.

So in short, there are great projects that are already open source and worth contributing to and implementing. The question of who is best suited to host the services is still open. I'm not picking winners, but as we get more and more sensors, personal health monitors, and other devices emitting data about ourselves, the PHR will find a home.

May 08 2011

Feeding the community fuels advances at Red Hat and JBoss

I wouldn't dare claim to pinpoint what makes Red Hat the most successful company with a pervasive open source strategy, but one intriguing thing sticks out: their free software development strategy is the precise inverse of most companies based on open source.

Take the way Red Hat put together CloudForms, one of their major announcements at last week's annual Red Hat Summit and JBoss World. As technology, CloudForms represents one of the many efforts in the computer industry to move up the stack in cloud computing, with tools for managing, migrating, and otherwise dealing with operating system instances, along with a promise (welcome in this age of cloud outages) to allow easy switches between vendors and prevent lock-in. But CloudForms is actually a blend of 79 SourceForge projects. Red Hat created it by finding appropriate free software technologies and persuading the developers to work together toward this common vision.

I heard this story from vice president Scott Farrand of Hewlett-Packard. Their own toehold in this crowded space is the HP edition, a product offering that manages ProLiant server hosts and Flex Fabric networking to provide a platform for CloudForms.

The point of this story is that Red Hat rarely creates products like other open source companies, which tend to grow out of a single project and keep pretty close control over the core. Red Hat makes sure to maintain a healthy, independent community-based project. Furthermore, many open source companies try to keep ahead of the community, running centralized beta programs and sometimes keeping advanced features in proprietary versions of the product. In contrast, the community runs ahead of Red Hat projects. Whether it's the Fedora Linux distribution, the Drools platform underlying JBoss's BPM platform, JBoss Application Server lying behind JBoss's EAP offering, or many other projects forming the foundation of Red Hat and JBoss offerings, the volunteers typically do the experimentation and stabilize new features before the company puts together a stable package to support.

Red Hat Summit and JBoss World was huge, and I got to attend only a handful of the keynotes and sessions. I spent five hours manning the booth for Open Source for America, which got a lot of positive attention from conference attendees. Several other worthy causes, such as reducing poverty, attracted a lot of volunteers.

In general, what I heard at the show didn't represent eye-catching innovations or sudden changes in direction, but solid progress along the lines laid out by Red Hat and JBoss in previous years. I'll report here on a few technical advances.

PaaS standardization: OpenShift

Red Hat has seized on the current computing mantra of our time, which is freedom in the cloud. (I wrote a series on this theme, culminating in a proposal for an open architecture for SaaS.) Whereas CloudForms covers the IaaS space, Red Hat's other big product announcement, OpenShift, tries to broaden the reach of PaaS. By standardizing various parts of the programming environment, Red Hat hopes to bring everyone together regardless of programming language, database back-end, or other options. For example, OpenShift is flexible enough to support PostgreSQL from EnterpriseDB, CouchDB from Couchbase, and MongoDB from 10gen, among the many partners Red Hat has lined up.

KVM optimization

The KVM virtualization platform, a direct competitor to VMware (and another project emerging from and remaining a community effort), continues to refine its performance and offer an increasing number of new features.

  • Linux hugepages (2 megabytes instead of 4 kilobytes) can lead to a performance improvement ranging from 24% to 46%, particularly when running databases.

  • Creating a virtual network path for each application can improve performance by reducing network bottlenecks.

  • vhost_net improves performance through bypassing the user-space virtualization model, QEMU.

  • Single Root I/O Virtualization (SR-IOV) allows direct access from a virtual host to an I/O device, improving performance but precluding migration of the instance to another physical host.

libvirt is much improved and is now the recommended administrative tool.
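The hugepage gain listed above comes largely from shrinking the number of pages (and hence TLB entries and page-table walks) needed to map guest memory. A quick back-of-the-envelope calculation, using an illustrative 4 GB guest:

```python
GUEST_RAM = 4 * 1024**3   # a 4 GB guest, chosen for illustration
SMALL_PAGE = 4 * 1024     # default x86 page size: 4 KB
HUGE_PAGE = 2 * 1024**2   # x86_64 hugepage size: 2 MB

small_pages = GUEST_RAM // SMALL_PAGE  # pages needed with 4 KB pages
huge_pages = GUEST_RAM // HUGE_PAGE    # pages needed with 2 MB hugepages
ratio = small_pages // huge_pages      # 512x fewer mappings with hugepages
```

Databases benefit most because their large, randomly accessed buffer pools put the heaviest pressure on the TLB.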

JBoss AS and EAP

Performance and multi-node management seemed to be the obsessions driving AS 7. Performance improvements, which have led to a ten-fold speedup and almost ten times less memory use between AS 6 and AS 7, include:

  • A standardization of server requirements (ports used, etc.) so that these requirements can be brought up concurrently during system start-up

  • Reorganization of the code to better support multicore systems

  • A cache to overcome the performance hit in Java reflection.

Management enhancements include:

  • Combining nodes into domains where they can be managed as a unit

  • The ability to manage nodes through any scripting language, aided by a standard representation of configuration data types in a dynamic model with a JSON representation

  • Synching the GUI with the XML files so that a change made in either place will show up in the other

  • Offering a choice whether to bring up a server right away at system start-up, or later on an as-needed basis

  • Cycle detection when servers fail and are restarted

May 06 2011

Collaborative genetics, part 5: Next steps for genetic commons

Previous installment: Private practice, how to respect the patient

Sage is growing, and everything they're doing to promote the commons now will likely continue. They'll sign up more pharma companies to contribute data and more researchers to work in teams, such as in the Federation.

Although genetics seems to be a narrow area, it's pretty central to everything that government, hospitals, and even insurers want to achieve in lowering costs and improving care. This research is at the heart of such tasks as:

  • Making drug development faster and cheaper (drugs are now a major source of inflation in health care, particularly among the growing elderly population)

  • Discovering in advance which patients will fail to respond to drugs, thus lowering costs and allowing them to access correct treatments faster

  • Improving our knowledge of the incidence and course of diseases in general

From my perspective--knowing little about medical research but a fair amount about software--the two biggest areas that need attention are standardized formats and software tools to support such activities as network modeling and analyzing results. Each institution tends to be on its own, but there are probably a lot of refined tools out there that could help everybody.

Researchers may well underestimate how much effort needs to go into standardizing software tools and formats, and how much pay-off that work would produce. Researchers tend to be loners, brave mountaineers who like to scale the peaks on their own and solve each problem through heroism along the way. Investing in a few cams and rappels could greatly enhance their success.

Publicity and public engagement are good for any initiative, but my guess is that, if Sage and its collaborators develop some awesome tools and show more of the results we started to see at this conference, other institutions will find their way to them.

This posting is the last of a five-part series.

May 05 2011

Collaborative genetics, part 4: Private practice, how to respect the patient

Previous installment: Dividing the pie, from research to patents

The fear of revealing patient data pervades the medical field, from the Hippocratic Oath to the signs posted all over hospitals reminding staff not to discuss patients in the hallways and elevators. HIPAA's privacy provisions are the parts most routinely cited, and many hospitals overreach their legal mandates, making it even harder than the law requires to get data. Whereas Americans have gotten used to the wanton collection of data in other spheres of life, health care persists in its idyllic island of innocence (and we react with outrage whenever this innocence proves illusory).

In my break-out session about terms of service, a lot of the talk revolved around privacy. The attendees respectfully acknowledged privacy's importance on one level while grumbling, on another, about how the laws got in the way of good research. Their attitudes struck me as inconsistent and lacking resolve, but overall I felt that these leaders in the field of health lacked an appreciation for the sacredness of privacy as part of the trust a patient has for her doctor and the health care system.

Even Peter Kapitein, in his keynote, railed that concerns for privacy were just excuses used by institutions to withhold information they didn't want the public to know. This is often true, but I felt he went too far in uttering: "No patient will say, please don't use my data if it will help me or help someone else in my position." This is not what surveys show, such as Dr. Alan Westin's 2008 report to the FTC. When I spoke to Kapitein afterward, he acknowledged that he had exaggerated his point for the sake of rhetoric, and that he recognized the importance of privacy in many situations. Still, I fear that his strong statement might have a bad effect on his audience.

We all know that de-identified data is vulnerable to re-identification and that many patients have good reason to fear what would happen if certain people got word of their conditions. It's widely acknowledged that many patients withhold information from their own doctors out of embarrassment. They still need to have a choice when researchers ask for data too. Distrust of medical research is common among racial minorities, still angry at the notorious Tuskegee syphilis study and recently irked again by researchers' callous attitude toward the family of Henrietta Lacks.

Wilbanks recommends that the terms of service for the commons prohibit unethical uses of data, and specifically the combination of data from different sources for re-identification of patients.

It's ironic that one vulnerability might be forced on the Sage commons by patients themselves. Many patients offer their data to researchers with the stipulation that the patients can hear back about any information the researchers find out about them; this is called a "patient grant-back."

Grant-backs introduce significant ethical concerns, aside from privacy, because researchers could well find that the patient has a genetic makeup strongly disposing him to a condition for which there's no treatment, such as Huntington's Disease. Researchers may also find out things that sound scary and require professional interpretation to put into context. One doctor I talked to said the researcher should communicate any findings to the patient's doctor, not the patient himself. But that would be even harder to arrange.

In terms of privacy, requiring a researcher to contact the patient introduces a new threat of attack and places a huge administrative burden on the researchers, as well as any repository such as the commons. It means that the de-identified data must be linked in a database to contact information for the patient. Even if careful measures are taken to separate the two databases, an intruder has a much better chance of getting the data than if the patient left no such trace. Patients should be told that this is a really bad deal for them.
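The linkage threat described above can be made concrete with a toy example. The two tables and their shared pseudonym key are hypothetical, and real systems are more elaborate, but the joint-compromise risk is the same: the grant-back requirement forces the repository to keep a second table that maps research pseudonyms back to contact information.

```python
# De-identified research records, keyed by pseudonym.
research_db = {
    "p1": {"condition": "hypertension", "age_band": "40-49"},
}

# The grant-back stipulation forces a second table, keyed by the SAME
# pseudonyms, so the researcher can reach the patient with findings.
contact_db = {
    "p1": {"email": "patient@example.com"},
}

# An intruder who obtains both tables re-identifies every record with a
# trivial join on the shared key.
reidentified = {
    pid: {**research_db[pid], **contact_db[pid]}
    for pid in research_db.keys() & contact_db.keys()
}
```

Keeping the two tables on separate systems raises the bar, but a patient who leaves no contact trace at all cannot be re-identified this way.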

This posting is one of a five-part series. Final installment: Next steps for genetic commons

May 04 2011

Collaborative genetics, part 3: Dividing the pie, from research to patents

Previous installment: Five Easy Pieces, Sage's Federation

What motivates scientists, companies, and funders to develop bold new treatments? Of course, everybody enters the field out of a passion to save humanity. But back to our question--what motivates scientists, companies, and funders to develop bold new treatments?

The explanations vary not only for different parts of the industry (academics, pharma companies, biotech firms, government agencies such as NIH, foundations, patient advocacy groups) but for institutions at different levels in their field. And of course, individual scientists differ, some seeking only advancement in their departments and the rewards of publishing, whereas others jump whole-heartedly into the money pipeline.

The most illustrious and successful projects in open source and open culture have found ways to attract both funds and content. Sage, the Federation, and Arch2POCM will have to find their own sources of ammunition.

The fuse for this reaction may have to begin with funders in government and the major foundations. Currently they treat only a published paper as fulfillment of a contract. The journals also have to be brought into the game. All the other elements of the data chain that precede and follow publication need to get their due.

Sage is creating a format for citing data sets, which can be used in the same ways researchers cite papers. A researcher named Eric Schadt also announced a new journal, Open Network Biology, with a commitment to publishing research papers along with the network models used, the software behind the results, and the underlying data.

Although researchers are competitive, they also recognize the importance of sharing information, so they are persuadable. If a researcher believes someone else may validate and help to improve her data, she has a strong incentive to make it open. Releasing her data can also raise her visibility in the field, independent of publications.

Arch2POCM has even more ticklish tasks. On the one hand, they want to direct researchers toward work that has a good chance of producing a treatment--and even this goal is muddled by the desire to encourage more risk-taking and a willingness to look beyond familiar genes. (Edwards ran statistics on studies in journals and found them severely clustered among a few genes that had already been thoroughly explored, mostly ignoring the other 80% or 90% of the human genome even though it is known to have characteristics of interest in the health field. His highly skewed graph drew a strong response of concern from the audience.)

According to Teri Melese, Director of Research Technologies and Alliances for UCSF School of Medicine, pharma companies already have programs aiming to promote research that uses their compounds by offering independent researchers the opportunity to submit their ideas for studies. But the company has to approve each project, and although the researcher can publish results, the data used in the experiment remains tightly under the control of the researcher or the company. This kind of program shows that nearly infinite degrees of compromise lie between totally closed research systems and a completely open commons--but to get the benefits of openness, companies and researchers will need to relinquish a lot more control than they have been willing to up till now.

The prospect of better research should attract funders, and Arch2POCM targets pharma companies in particular to pony up millions for research. The companies have a precedent for sharing data under the rubric of "pre-competitive research." According to Sage staffer Lara Mangravite, eight pharma companies have donated some research data to Sage.

The big trick is to leave as much of the results in the commons as possible while finding the right point in the development process where companies can extract compounds or other information and commercialize them as drugs. Sage would like to keep the core genetic information free from patents, but is willing to let commercial results be patented. Stephen Friend told me, "It is important to maintain a freedom of interoperability for the data and the metadata found within the hosted models of disease. Individuals and companies can still reduce to practice some of the proposed functions for proteins and file patents on these findings without contaminating the freedom of the information hosted on the commons platform."

The following diagram tries to show, in a necessarily over-simplified form, the elements that go into research and are produced by research. Naturally, results of some research are fed back in circular form to further research. Each input is represented as a circle, and is accompanied by a list of stake-holders who can assert ownership over it. Medicines, patents, and biological markers are listed at the bottom as obvious outputs that are not considered as part of the research inputs.

[Diagram of research inputs and outputs]

Inputs and outputs of research

The relationships ultimately worked out between the Sage commons and the pharma companies--which could be different for different classes of disease--will be crucial. The risk of being too strict is to drive away funds, while the risk of being too accommodating is to watch the commons collapse into just another consortium that divvies up rewards among participants without giving the rest of the world open access.

What about making researchers who use the commons return the results of their research to the commons? This is another ticklish issue that was endlessly discussed in (and long after) a break-out session I attended. The Sage commons was compared repeatedly to software, with references to well-known licenses such as the GNU GPL, but analogies were ultimately unhelpful. A genetic commons represents a unique relationship among data related to particular patients, information of use in various stages of research, and commercially valuable products (to the tune of billions of dollars).

Sage and its supporters naturally want to draw as much research into the commons as they can. It would be easy to draw up reciprocal terms of service along the lines of, "If you use our data, give back your research results"--easy to draw up but hard to enforce. John Wilbanks, a staff person at Creative Commons who has worked heavily with Sage, said such a stipulation would be a suggestion rather than a requirement. If someone uses data without returning the resulting research, members of the community around the commons could express their disapproval by not citing the research.

But all this bothers me. Every open system recognizes that it has to co-exist with a proprietary outer world, and provide resources to that world. Even the GNU GPL doesn't require you to make your application free software if you compile it with the GNU compiler or run it on Linux. Furthermore, the legal basis for strict reciprocity is dubious, because data is not protected by copyright or any other intellectual property regime. (See my article on collections of information.)

I think that outsiders attending the Congress lacked a fundamental appreciation of the purpose of an information commons. One has to think long-term--not "what am I going to get from this particular contribution?" but "what might I be able to do in 10 years that I can't do today once a huge collection of tools and data is in the public domain?" Nobody knows, when they put something into the commons, what the benefits will be. The whole idea is that people will pop up out of nowhere and use the information in ways that the original donors could not imagine. That was the case for Linux and is starting to come true for Android. It's the driving motivation behind the open government movement. You have to have a certain faith to create a commons.

At regular points during the Congress, attendees pointed out that no legitimate motivation exists in health care unless it is aimed ultimately toward improving life for the patients. The medical field refers to the experiences of patients--the ways they react to drugs and other treatments--as "post-market effects," which gives you a hint where the field's priorities currently lie.

Patients were placed front and center in the opening keynote by Peter Kapitein, a middle-aged Dutch banker who suffered a series of wrenching (but successful) treatments for lymphoma. I'll focus in on one of his statements later.

It was even more impressive to hear a central concern for the patient's experience expressed by Vicki Seyfert-Margolis, Senior Science Advisor to the FDA's Chief Scientist. Speaking for herself, not as a representative of the FDA, she chastised industry, academia, and government alike for not moving fast enough to collaborate and crowdsource their work, then suggested that in the end the patients will force change upon us all. While suggesting that the long approval times for drugs and (especially) medical devices lie in factors outside the FDA's control, she also said the FDA is taking complaints about its process seriously and has undertaken a holistic review of the situation under its current director.

[Photo of Vicki Seyfert-Margolis]

Vicki Seyfert-Margolis keynoting at Sage Commons Congress

The real problem is not slow approval, but a lack of drug candidates. Submissions by pharmaceutical companies have been declining over the years. The problem returns to the old-fashioned ways the industry works: walled off into individual labs that repeat each other's work over and over and can't learn from each other.

One of the Congress's break-out groups was tasked with outreach and building bonds with the public. Not only could Sage benefit if the public understood its mission and accomplishments (one reason I'm writing this blog) but patients are key sources for the information needed in the commons.

There was some discussion about whether Sage should take on the area served by PatientsLikeMe and DIYgenomics, accepting individual donations of information. I'm also a bit dubious about how far a Facebook page will reach. The connection between patient input and useful genetic information is attenuated and greatly segmented. It may be better to partner with the organizations that interact directly with individuals among the public.

It's more promising to form relationships with patient advocacy groups, as a staff person from the Genetic Alliance pointed out. Advocacy groups can find patients for drug trials (many trials are canceled for lack of appropriate patients) and turn over genetic and phenotypic information collected from those patients. (A "phenotype" basically refers to everything interesting about you that expresses your state of health. It could include aspects of your body and mental state, vital statistics, a history of diseases and syndromes you've had, and more.)

This posting is one of a five-part series. Next installment: Private practice, how to respect the patient

May 03 2011

Collaborative genetics, part 2: Five Easy Pieces, Sage's Federation

Previous installment:
The ambitious goals of Sage Commons Congress

A pilot project was launched by Sage with four university partners under the moniker of the Federation, which sounds like something out of a spy thriller (or Star Trek, which was honored in stock photos in the PowerPoint presentation about this topic). Hopefully, the only thrill will be that expressed by the participants. Three of them presented the results of their research into aspects of aging and disease at the Congress.

The ultimate goal of the Federation is to bring together labs from different places to act like a single lab. Its current operation is more modest. As a pilot, the Federation received no independent funding. Each institution agreed simply to allow their researchers to collaborate with the other four institutions. Atul Butte of Stanford reported that the lack of explicit funding probably limited collaboration. In particular, the senior staff on each project communicated very little because their attention was always drawn to other tasks demanded by their institutions, such as writing grant proposals. Junior faculty did most of the actual collaboration.

As one audience member pointed out, "passion doesn't scale." But funding will change the Federation as well, skewing it toward whatever gets rewarded.

[Photo of audience and podium]

Audience and podium at Sage Commons Congress

When the Federation grows, the question it faces is whether to incorporate more institutions in a single entity under Sage's benign tutelage or to spawn new Federations that co-exist in their own loose federation. But the issue of motivations and rewards has to be tackled.

Another organization launched by Sage is Arch2POCM, whose name requires a bit of elucidation. The "Arch" refers to "archipelago" as a metaphor for a loose association among collaborating organizations. POCM stands for Proof of Clinical Mechanism. The founders of Arch2POCM believe that if the trials leading to POCM (or the more familiar proof of concept, POC) were done by public/private partnerships free of intellectual property rights, companies could benefit from reduced redundancy while still finding plenty of opportunities to file patents on their proprietary variants.

Arch2POCM, which held its own summit with a range of stakeholders in conjunction with the larger Congress, seeks to establish shared, patent-free research on a firm financial basis, putting organizational processes in place to reward scientists for producing data and research that go into the commons. Arch2POCM's reach is ambitious: to find new biological markers (substances or techniques for tracking what happens in genes), and even the compounds (core components of effective drugs) that treat diseases.

The pay-off for a successful Arch2POCM project is enticing. Not only could drugs be developed much more cheaply and quickly, but we might learn more about the precise ways they affect patients so that we can save patients from taking drugs that are ineffective in their individual cases, and eliminate adverse effects. To get there, incentives once again come to the fore. A related platform called Synapse hosts the data models, providing a place to pick targets and host the clinical data produced by the open-access clinical trials.

This posting is one of a five-part series. Next installment: Dividing the pie, from research to patents

May 02 2011

Collaborative genetics, Part 1: The ambitious goals of Sage Commons Congress

In a field rife with drug-addicted industries that derive billions of dollars from a single product, and stocked with researchers who scramble for government grants (sadly cut back by the recent US federal budget), the open sharing of genetic data and tools may seem a dream. But it must be more than a dream when the Sage Commons Congress can draw 150 attendees (turning away many more) from research institutions such as the Netherlands Bioinformatica Centre and Massachusetts General Hospital, leading universities from the US and Europe, a whole roster of drug companies (Pfizer, Merck, Novartis, Lilly, Genentech), tech companies such as Microsoft, and foundations such as Alfred P. Sloan, and representatives from the FDA and the White House. I felt distinctly ill at ease trying to fit into such a well-educated crowd, but was welcomed warmly and soon found myself using words such as "phenotype" and "patient stratification."

Money is not the only complicating factor when trying to share knowledge about our genes and their effect on our health. The complex relationships of information generation, and how credit is handed out for that information, make biomedical data a case study all its own.

The complexity of health research data

I listened a couple weeks ago as researchers at this congress, held by Sage Bionetworks, questioned some of their basic practices, and I realized that they are on the leading edge of redefining what we consider information. For most of the history of science, information consisted of a published paper, and the scientist tucked his raw data in a moldy desk drawer. Now we are seeing a trend in scientific journals toward requiring authors to release the raw data with the paper (one such repository in biology is Dryad). But this is only the beginning. Consider what remains to be done:

  • It takes 18 to 24 months to get a paper published. The journal and author usually don't want to release the data until the date of publication, and some add an arbitrary waiting period after publication. That's an extra 18 to 24 months (a whole era in some fields) during which that data is withheld from researchers who could have built new discoveries on it.

  • Data must be curated, which includes:

  • Being checked for corrupt data and missing fields (experimental artifacts)

  • Normalization

  • Verifying HIPAA compliance and other assurances that data has been properly de-identified

  • Possible formatting according to some standard

  • Reviewing for internal and external validity

  • Advocates of sharing hope this work can be crowdsourced to other researchers who want to use the data. But then who gets credited and rewarded for the work?

  • Negative results--experiments showing that a treatment doesn't work--are extremely important, and the data behind them is even more important. Of course, knowing where other researchers or companies failed could boost the efforts of other researchers and companies. Furthermore this data may help accomplish patient stratification--that is, show when some patients will benefit and some will not, even when their symptoms seem the same. The medical field is notorious for suppressing negative results, and the data rarely reaches researchers who can use it.

  • When researchers choose to release data--or are forced to do so by their publishers--it can be in an atrocious state because it missed out on the curation steps just mentioned. The data may also be in a format that makes it hard to extract useful information, either because no one has developed and promulgated an appropriate format, or because the researcher didn't have time to adopt it. Other researchers may not even be able to determine exactly what the format is. Sage is working on very simple text-based formats that provide a lowest common denominator that will help researchers get started.

  • Workflows and practices in the workplace have a big effect on the values taken by the data. These are very hard to document, but can help a great deal in reproducing and validating results. Geneticists are starting to use a workflow documentation tool called Taverna to record the ways they coordinate different software tools and data sets.

  • Data can be interpreted in multiple ways. Different teams look for different criteria and apply different standards of quality. It would be useful to share these variations.

  • A repeated theme at the Congress was "going beyond the narrative." The narrative here is the published article. Each article tells a story and draws conclusions. But a lot goes on behind the scenes in the art and science of medicine. Furthermore, letting new hypotheses emerge from data is just as important as verifying the narrative provided by one's initial hypothesis.
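To make the format and curation problems above concrete, here is a minimal sketch of the kind of lowest-common-denominator, text-based format described in the list. The format, field names, and data are invented for illustration; they are not Sage's actual formats.

```python
import csv
import io

# A hypothetical lowest-common-denominator format: plain tab-separated
# text with a single header row. Invented for illustration only.
RAW = """sample_id\tgene\texpression
S001\tHER2\t4.2
S002\tHER2\t
S003\tAKT1\t1.7
"""

def parse_records(text):
    """Parse the text into dicts, flagging rows with missing fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    clean, flagged = [], []
    for row in reader:
        if all(v not in (None, "") for v in row.values()):
            row["expression"] = float(row["expression"])
            clean.append(row)
        else:
            flagged.append(row)  # candidates for curation, not silently dropped
    return clean, flagged

clean, flagged = parse_records(RAW)
print(len(clean), len(flagged))  # 2 clean rows, 1 row needing curation
```

A format this simple sacrifices expressiveness, but any researcher can produce and parse it, which is the point of a lowest common denominator.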

    One of the big questions raised in my mind--and not covered in the conference--was the effect it would have on the education of the next generation of scientists were teams to expose all those hidden aspects of data: the workflows, the curation and validation techniques, the interpretations. Perhaps you wouldn't need to attend the University of California at Berkeley to get a Berkeley education, or risk so many parking tickets along the way. Certainly, young researchers would have powerful resources for developing their craft, just as programmers have with the source code for free software.

    I've just gone over a bit of the material that the organizers of the Sage Commons Congress want their field to share. Let's turn to some of the structures and mechanisms.

    Of networks

    Take a step back. Why do geneticists need to share data? There are oodles of precedents, of course: the Human Genome Project, biobricks, the Astrophysics Data System (shown off in a keynote by Alyssa A. Goodman from Harvard), open courseware, open access journals, and countless individual repositories put up by scientists. A particularly relevant data sharing initiative is the International HapMap Project, working on a public map of the human genome "which will describe the common patterns of human DNA sequence variation." This is not a loose crowdsourcing project, but more like a consortium of ten large research centers promising to release results publicly and forgo patents on the results.

    The field of genetics presents specific challenges that frustrate old ways of working as individuals in labs that hoard data. Basically, networks of genetic expression require networks of researchers to untangle them.

    In the beginning, geneticists modeled activities in the cell through linear paths. A particular protein would activate or inhibit a particular gene that would then trigger other activities with ultimate effects on the human body.

    They found that relatively few activities could be explained linearly, though. The action of a protein might be stymied by the presence of others. And those other actors have histories of their own, with different pathways triggering or inhibiting pathways at many points. Stephen Friend, President of Sage Bionetworks, offers the example of an important gene implicated in breast cancer, the Human Epidermal growth factor Receptor 2, HER2/neu. The drugs that target this protein are weakened when another protein, Akt, is present.

    Trying to map these behaviors, scientists come up with meshes of paths. The field depends now on these network models. And one of its key goals is to evaluate these network models--not as true or false, right or wrong, because they are simply models that represent the life of the cell about as well as the New York subway map represents the life of the city--but for the models' usefulness in predicting outcomes of treatments.

    Network models containing many actors and many paths--that's why collaborations among research projects could contribute to our understanding of genetic expression. But geneticists have no forum for storing and exchanging networks. And nobody records them in the same format, which makes them difficult to build, trade, evaluate, and reuse.

    The Human Genome Project is a wonderful resource for scientists, but it contains nothing about gene expression, nothing about the network models and workflows and methods of curation mentioned earlier, nothing about software tools and templates to promote sharing, and ultimately nothing that can lead to treatments. This huge, multi-dimensional area is what the Sage Commons Congress is taking on.

    More collaboration, and a better understanding of network models, may save a field that is approaching crisis. The return on investment for pharmaceutical research, according to researcher Aled Edwards, has gone down over the past 20 years. In 2009, American companies spent one hundred billion dollars on research but got only 21 drugs approved, and only 7 of those were truly novel. Meanwhile, 90% of drug trials fail. And to throw in a statistic from another talk (Vicki Seyfert-Margolis of the FDA), drug side effects create medical problems in 7% of patients who take the drugs and require medical interventions in 3% or more of cases.

    This posting is one of a five-part series. Next installment: Five Easy Pieces, Sage's Federation

    April 15 2011

    Wrap-up of 2011 MySQL Conference

    Two themes: mix your relational database with less formal solutions and move to the cloud. Those were the messages coming from O'Reilly's MySQL conference this week. Naturally it included many other talks of a more immediate practical nature: data warehousing and business intelligence, performance (both in MySQL configuration and in the environment, which includes the changes caused by replacing disks with Flash), how to scale up, and new features in both MySQL and its children. But everyone seemed to agree that MySQL does not stand alone.

    The world of databases has changed both in scale and in use. As Baron Schwartz said in his broad-vision keynote, databases are starting to need to handle petabytes. And he criticized open source database options as having poorer performance than proprietary ones. As for use, the databases struggle to meet two types of requirements: requests from business users for expeditious reports on new relationships, and data mining that traverses relatively unstructured data such as friend relationships, comments on web pages, and network traffic.

    Some speakers introduced NoSQL with a bit of sarcasm, as if they had to provide an interface to HBase or MongoDB as a check-off item. At the other extreme, in his keynote, Brian Aker summed up his philosophy about Drizzle by saying, "We are not an island unto ourselves, we are infrastructure."

    Judging from the informal audience polls, most of the audience had not explored NoSQL or the cloud yet. Most of the speakers about these technologies offered a mix of basic introductory material and useful practical information to meet the needs of their audience, who came, listened, and asked questions. I heard more give and take during the talks about traditional MySQL topics, because the audience seemed well versed in them.

    The analysts and experts are telling us we can save money and improve scalability using EC2-style cloud solutions, and adopt new techniques to achieve the familiar goals of reliability and fast response time. I think a more subtle challenge of the cloud was barely mentioned: it encourages a kind of replication that fuzzes the notion of consistency and runs against the ideal of a unique instance for each item of data. Of course, everyone uses replication for production relational databases anyway, both to avert disaster and for load balancing, so the ideal has been blurred for a long time. As we explore the potential of cloud systems as content delivery networks, they blur the single-instance ideal. Sarah Novotny, while warning about the risks of replication, gave a talk about some practical considerations in making it work, such as tolerating inconsistency for data about sessions.

    What about NoSQL solutions, which have co-existed with relational databases for decades? Everybody knows about key-value stores, and Memcache has always partnered with MySQL to serve data quickly. I had a talk with Roger Magoulas, an O'Reilly researcher, about some of the things you sacrifice if you use a NoSQL solution instead of a relational database and why that might be OK.

    Redundancy and consistency

    Instead of storing an attribute such as "supplier" or "model number" in a separate table, most NoSQL solutions make it a part of the record for each individual member of the database. The increased disk space or RAM required becomes irrelevant in an age when those resources are so cheap and abundant. What's more significant is that a programmer can store any supplier or model number she wants, instead of having to select from a fixed set enforced by foreign key constraints. This can introduce inconsistencies and errors, but practical database experts have known for a long time that perfect accuracy is a chimera (see the work of Jeff Jonas) and modern data analytics work around the noise. When you're looking for statistical trends, whether in ocean samples or customer preferences, you don't care whether 800 of your 8 million records have corrupt data in the field you're aggregating.
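As a toy illustration of the trade-off just described (all names and figures are invented), a denormalized store repeats the "supplier" attribute in every record, and an aggregation simply skips the occasional corrupt value:

```python
# Denormalized, NoSQL-style records: each one carries its own
# "supplier" field instead of a foreign key into a suppliers table.
records = [
    {"part": "P1", "supplier": "Acme", "price": 10.0},
    {"part": "P2", "supplier": "Acme", "price": 12.0},
    {"part": "P3", "supplier": "acme ", "price": 11.0},     # inconsistent spelling
    {"part": "P4", "supplier": "Bolt Co", "price": "N/A"},  # corrupt field
]

def average_price(records):
    """Aggregate while skipping records whose price field is corrupt."""
    prices = [r["price"] for r in records if isinstance(r["price"], (int, float))]
    return sum(prices) / len(prices)

print(average_price(records))  # 11.0 -- the corrupt record is ignored
```

When the goal is a statistical trend rather than a ledger, dropping one bad record out of four (or 800 out of 8 million) barely moves the result.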

    Decentralizing decisions and control

    A relational database potentially gives the DBA most of the power: it is the DBA that creates the schema, defines stored procedures and triggers, and detects and repairs inconsistencies. Modern databases such as MySQL have already blurred the formerly rigid boundaries between DBA and application programmer. In the early days, the programmer had to do a lot of the work that was reserved for DBAs in traditional databases. NoSQL clearly undermines the control freaks even more. As I've already said, enforcing consistency is not as important nowadays as it once seemed, and modern programming languages offer other ways (such as enums and sets) to prevent errors.
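For example, the kind of application-level check just mentioned might use an enum in place of a foreign key constraint. This is a hypothetical sketch in Python, not any particular schema:

```python
from enum import Enum

# The set of valid suppliers lives in application code rather than in
# a database-enforced foreign key. Names are invented for illustration.
class Supplier(Enum):
    ACME = "Acme"
    BOLT_CO = "Bolt Co"

def make_record(part, supplier):
    """Build a record, rejecting suppliers outside the known set."""
    if not isinstance(supplier, Supplier):
        raise ValueError(f"unknown supplier: {supplier!r}")
    return {"part": part, "supplier": supplier.value}

print(make_record("P1", Supplier.ACME))  # accepted
try:
    make_record("P2", "Acme Corp")       # a typo is caught at write time
except ValueError as err:
    print(err)
```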


    Flexibility for future uses

    I think this is still the big advantage of relational databases. Their complicated schemas and join semantics allow data mining and extended uses that evolve over the years. Many NoSQL databases are designed around the particular needs of an organization at a particular time, and require records to be ordered in the way you want to access them. And I think this is why, as discussed in some of the sessions at this conference, many people start with their raw data in some NoSQL data store and leave the results of their processing in a relational database.

    The mood this week was business-like. I've attended conferences held by emerging communities that crackle with the excitement of building something new that no one can quite anticipate; the MySQL conference wasn't like that. The attendees have a job to do and an ROI to prove; they wanted to learn whatever would help them do that.

    But the relational model still reflects most of the data handling needs we have, so MySQL will stick around. This may actually be the best environment it has ever enjoyed. Oracle still devotes a crack programming team (which includes several O'Reilly authors) to meeting corporate needs through performance improvements and simple tools. Monty Program has forked off MariaDB and Percona has popularized XtraDB, all the while contributing new features under the GPL that any implementation can use. Drizzle strips MySQL down to a core while making old goals such as multi-master replication feasible. A host of companies in layered applications such as business intelligence cluster around MySQL.

    MySQL spawns its own alternative access modes, such as the HandlerSocket plugin that returns data quickly to simple queries while leaving the full relational power of the database in place for other uses. And vendors continue to find intriguing alternatives such as the Xeround cloud service that automates fail-over, scaling, and sharding while preserving MySQL semantics. I don't think any DBA's skills will become obsolete.

    April 13 2011

    What VMware's Cloud Foundry announcement is about

    I chatted today about VMware's Cloud Foundry with Roger Bodamer, the EVP of products and technology at 10Gen. 10Gen's MongoDB is one of three back-ends (along with MySQL and Redis) supported from the start by Cloud Foundry.

    If I understand Cloud Foundry and VMware's declared "Open PaaS" strategy, it should fill a gap in services. Suppose you are a developer who wants to loosen the bonds between your programs and the hardware they run on, for the sake of flexibility, fast ramp-up, or cost savings. Your choices are:

    • An IaaS (Infrastructure as a Service) product, which hands you an emulation of bare metal where you run an appliance (which you may need to build up yourself) combining an operating system, application, and related services such as DNS, firewall, and a database.

    • You can implement IaaS on your own hardware using a virtualization solution such as VMware's products, Azure, Eucalyptus, or RPM. Alternatively, you can rent space on a service such as Amazon's EC2 or Rackspace.

    • A PaaS (Platform as a Service) product, which operates at a much higher level: the vendor manages the operating system, language runtime, and supporting services, and you supply only your application code.

    By now, the popular APIs for IaaS have been satisfactorily emulated so that you can move your application fairly easily from one vendor to another. Some APIs, notably OpenStack, were designed explicitly to eliminate the friction of moving an app and increase the competition in the IaaS space.

    Until now, the PaaS situation was much more closed. VMware claims to do for PaaS what Eucalyptus and OpenStack want to do for IaaS. VMware has a conventional cloud service called Cloud Foundry, but will offer the code under an open source license. RightScale has already announced that you can use it to run a Cloud Foundry application on EC2. And a large site could run Cloud Foundry on its own hardware, just as it runs VMware.

    Cloud Foundry is aggressively open middleware, offering a flexible way to administer applications with a variety of options on the top and bottom. As mentioned already, you can interact with MongoDB, MySQL, or Redis as your storage. (However, you have to use the particular API offered by each back-end; there is no common Cloud Foundry interface that can be translated to the chosen back end.) You can use Spring, Rails, or Node.js as your programming environment.

    So open source Cloud Foundry may prove to be a step toward more openness in the cloud arena, as many people call for and I analyzed in a series of articles last year. VMware will, if the gamble pays off, gain more customers by hedging against lock-in and will sell its tools to those who host PaaS on their own servers. The success of the effort will depend on the robustness of the solution, ease of management, and the rate of adoption by programmers and sites.

    March 23 2011

    SMART challenge and P4: open source projects look toward the broader use of health records

    In a country where doctors are still struggling to transfer basic
    patient information (such as continuity of care records) from one
    clinic to another, it may seem premature to think about seamless data
    exchange between a patient and multiple care organizations to support
    such things as real-time interventions in patient behavior and better
    clinical decision support. But this is precisely what medicine will
    need for the next breakthrough in making patients better and reducing
    costs. And many of the building blocks have recently fallen into place.

    Two recent open source developments have noticed these opportunities
    and hope to create new ones from them. One is the SMART Apps for
    Health contest, based on the SMART Platform that is one of the
    darlings of Federal CTO Aneesh Chopra and other advocates for health
    care innovation. The other development is P4, the brainchild of a
    physician named Adrian Gropper, who has recognized the importance of
    electronic records and made the leap into technology.

    SMART challenge: Next steps for a quickly spreading open source API

    I'm hoping the SMART Platform augurs the future of health IT: an open
    source project that proprietary vendors are rushing to adopt. The
    simple goal of SMART is to pull together health data from any
    appropriate source--labs, radiology, diagnoses, and even
    administrative information--and provide it in a common,
    well-documented, simple format so any programmer can write an app to
    process it. It's a sign of the mess electronic records have become
    over the years that this functionality hasn't emerged till now. And
    it's a sign of the tremendous strides health IT has made recently that
    SMART (and the building blocks on which it is based) has been so
    widely embraced.

    SMART has been released under the GPL, and is based on two other
    important open source projects: the INDIVO health record system and
    the I2B2 informatics system. Like
    INDIVO, the SMART project was largely developed by Children's Hospital
    Boston, and was presented at a meeting I attended today by Dr. Kenneth
    D. Mandl, a director of the Intelligent Health Laboratory at the
    hospital and at Harvard Medical School. SMART started out with the
    goal of providing a RESTful API into data. Not surprisingly, as Mandl
    reported, the team quickly found itself plunged into the task of
    developing standards for health-related data. Current standards either
    didn't apply to the data they were exposing or were inappropriate for
    the new uses to which they wanted to put it.

    Health data is currently stored in a Babel of formats. Converting them
    all to a single pure information stream is hopeless; to make them
    available to research one must translate them on the fly to some
    universally recognized format. That's one of the goals of the report
    on health care released in December 2010 by the President's
    Council of Advisors on Science and Technology. SMART is developing
    software to do the translation and serve up data from whatever desired
    source in "containers." Applications can then query the containers
    through SMART's API to retrieve data and feed the results to
    research and clinical applications.

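In deliberately simplified form, the on-the-fly translation described above might look like the sketch below. The source formats and field names are invented for illustration; they are not SMART's actual containers or API:

```python
# Two invented source formats for the same kind of lab result.
lab_system_a = {"pt": "12345", "test": "LDL", "val": 130.0, "units": "mg/dL"}
lab_system_b = {"patient_id": "12345", "analyte": "LDL", "result": "130 mg/dL"}

def normalize_a(rec):
    """Translate system A's record into the common shape."""
    return {"patient": rec["pt"], "test": rec["test"],
            "value": rec["val"], "units": rec["units"]}

def normalize_b(rec):
    """Translate system B's combined result string into the common shape."""
    value, units = rec["result"].split()
    return {"patient": rec["patient_id"], "test": rec["analyte"],
            "value": float(value), "units": units}

# The "container": records from both sources, translated on the fly
# into one shape that applications can query uniformly.
container = [normalize_a(lab_system_a), normalize_b(lab_system_b)]
assert container[0] == container[1]  # both sources yield the same record
```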
    Justifying SMART, Mandl presented solid principles of modern data
    processing that will be familiar to regular Radar readers:

    Data as a platform

    Storage should be as flexible and free of bias as possible, so that
    innovators can easily write new applications that do surprising and
    wonderful things with it. This principle contrasts starkly with most
    current health records, which make the data conform to a single
    original purpose and make it hard to extract the data for any other
    use, much less keep it clean enough for unanticipated uses. (Talk to
    doctors about how little the diagnoses they enter for billing purposes
    have to do with the actual treatments patients need.)

    An "Appstore for health"

    New applications should be welcome from any quarter. Mandl is hoping
    that apps will eventually cost just a few dollars, like a cell phone
    app. (Note to Apple: Mandl and the audience tended to use the terms
    "iPhone" and "Appstore" in a casual manner that slid from metaphors to
    generic terms for mobile devices and program repositories.) Mandl said
    that his team's evaluation of apps would be on the loose side, more
    like Android than iPhone, but that the environment would not be a
    "Wild West." At each hospital or clinic, IT staff could set up their
    own repositories of approved apps, and add custom-built ones.

    A "learning health system"

    Data should be the engine behind continuous improvement of our health
    care system. As Mandl said, "every patient should be an opportunity to
    learn."

    Open source and open standards

    As we've seen, standards are a prerequisite for data as a platform.
    Open source has done well for SMART and the platforms on which it is
    based. But the current challenge, notably, allows proprietary as well
    as open source submissions. This agnosticism about licensing is a
    common factor across such challenges. Apparently the sponsors believe
    they will encourage more and better submissions by allowing the
    developers to keep control over the resulting code. But while most
    challenge rules require at least some kind of right to use the app,
    the SMART challenge is totally silent on rights. The danger, of
    course, is that the developers will get tired of maintaining an app
    or will add onerous features after it becomes popular.

    An impressive list of electronic record vendors has promised support
    for SMART or integrated it into products in some way: Cerner, Siemens,
    Google, Microsoft, General Electric, and more. SMART seems to be on
    its way to a clean sweep of the electronic health care record
    industry. And one of its projects is aimed at the next frontier:
    integrating devices such as blood glucose readers into the system.

    P4: Bringing patients into the health record and their own treatment

    SMART is a widely championed collaboration among stellar institutions;
    P4 is the modest suggestion of a single doctor, Adrian Gropper. But
    I'm including P4 in this blog because I think it's incredibly elegant.
    As you delve into it, the concept evolves from seeming quite clever to
    seeming completely natural.

    The project aims to create a lightweight communication system based on
    standards and open source software. Any device or application that the
    patient runs to record such things as blood pressure or mood could be
    hooked into the system. Furthermore, the patient would be able to
    share data with multiple care providers in a fine-grained way--just
    the cholesterol and blood pressure readings, for example, or just
    vaccination information. (This was another goal of the PCAST report
    mentioned in the previous section.)

    Communicating medical records is such a central plank of health care
    reform that a division of Health and Human Services called the Office
    of the National Coordinator created two major open source projects
    with the help of electronic health record vendors: CONNECT and
    Direct. The latter is more
    lightweight, recently having released libraries that support the
    secure exchange of data over email.

    Vendors will jump in now and produce systems they can sell to doctors
    for the exchange of continuity of care records. But Gropper wants the
    patients to have the same capabilities. To do that, he is linking up
    Direct with another open source project developed by the Markle
    Foundation for the Veterans Administration and Department of Defense:
    Blue Button.

    Blue Button is a patient portal with a particularly simple interface.
    Log in to your account, press the button, and get a flat file in an
    easy-to-read format. Linked Data proponents grumble that the format is
    not structured enough, but like HTML it is simple to use and can be
    extended in the future.
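The simplicity the Linked Data proponents grumble about is also what makes the format tractable. This toy parser shows how little code a labeled flat file demands; the sample content and field labels below are hypothetical, not the VA's actual Blue Button layout.

```python
# A toy parser for a Blue Button-style labeled flat file. The labels
# and values are invented for illustration; the real export has its
# own layout. The point: a flat "LABEL: value" format can be consumed
# with a few lines of code, no special tooling required.

SAMPLE = """\
NAME: DOE, JOHN
BLOOD PRESSURE: 120/80
CHOLESTEROL: 180
"""

def parse_flat_record(text: str) -> dict:
    """Turn labeled lines into a dictionary, skipping anything unlabeled."""
    record = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            record[key.strip()] = value.strip()
    return record

print(parse_flat_record(SAMPLE)["CHOLESTEROL"])  # 180
```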

    Blue Button is currently only a one-way system, however. A veteran can
    look at his health data but can't upload new information. Nor can
    multiple providers share the data. P4 will fix all that by using a
    Direct interface to create two-way channels. If you are recovering
    from a broken leg and want to upload your range-of-motion progress
    every day, you will be able to do this (given that a format for the
    data is designed and universally recognized) with your orthopedic
    surgeon, your physical therapist, and your primary care provider. P4
    will permit fine-grained access, so you can send out only the data you
    think is relevant to each institution.
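The fine-grained access just described boils down to filtering one authoritative record through a per-provider policy. A minimal sketch, with entirely hypothetical field names and sharing rules:

```python
# A sketch of P4-style fine-grained sharing: the patient holds the full
# record, and each provider receives only the fields the patient has
# approved. Field names and the sharing policy are hypothetical.

full_record = {
    "cholesterol": 180,
    "blood_pressure": "120/80",
    "vaccinations": ["influenza", "tetanus"],
    "mood_diary": ["..."],
}

# What the patient has agreed to share with each provider.
policy = {
    "cardiologist": {"cholesterol", "blood_pressure"},
    "school_nurse": {"vaccinations"},
}

def share_with(provider: str) -> dict:
    """Return only the subset of the record this provider may see."""
    allowed = policy.get(provider, set())
    return {k: v for k, v in full_record.items() if k in allowed}

print(sorted(share_with("cardiologist")))  # ['blood_pressure', 'cholesterol']
```

An unlisted provider gets an empty record, which is the safe default for this kind of policy.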

    Gropper is aiming to put together a team of open source coders to
    present this project to a VA challenge. Details can be found on the
    P4 web page.

    January 04 2011

    Health care at the O'Reilly Open Source Convention -- Call for Proposals is open

    The O'Reilly Open Source Convention
    is offering a health care track for the second year in a row. We had
    a wonderful health care track last year (summarized in our report to
    the Robert Wood Johnson Foundation; I'll also post some links to
    videos, interviews, and blogs at the end of this article), and we're
    planning to build on our coverage of last year's topics as well as
    add some topics that got short shrift last year.

    Topics that didn't receive as much coverage last year as (I think)
    they deserved, and that we hope to feature this year, include:

    • Roles of standards in health record formats, and weaknesses that need
      to be addressed

    • Communication with devices (ranging from ordinary cell phones to
      specialized medical devices) and their use to improve care

    • Use of electronic records and clinical decision support
      outside the United States

    Important topics that we covered last year and whose developments we
    should continue to follow include:

    • The use of electronically collected data for research and
      evidence-based medicine

    • How to deploy electronic health records (particularly open source) in
      clinical settings

    • Secure health record exchange through CONNECT and the Direct Project

    • Programming electronic records, including web APIs, in order to add,
      extract, and perform calculations on data

    • Security, identity management, and patient control over records

    Please share this article with any appropriate forums or individuals
    and let them know about the conference's Call for Proposals.

    A partial list follows of podcasts, videos, and other content related
    to last year's conference.

    December 22 2010

    Reaching the pinnacle: truly open web services and clouds

    Previous section:

    Why web services should be released as free software

    Free software in the cloud isn't just a nice-sounding ideal or even an efficient way to push innovation forward. Opening the cloud also opens the path to a bountiful environment of computing for all. Here are the steps to a better computing future.

    Provide choice

    The first layer of benefits when companies release their source code
    is incremental: incorporating bug fixes, promoting value-added
    resellers, finding new staff among volunteer programmers. But a free
    software cloud should go far beyond this.

    Remember that web services can be run virtually now. When you log in
    to a site to handle mail, CRM, or some other service, you may be
    firing up a virtual service within a hardware cloud.

    So web and cloud providers can set up a gallery of alternative
    services, trading off various features or offering alternative
    look-and-feel interfaces. Instead of just logging into a SaaS site
    and accepting whatever the administrators have put up
    that day, users could choose from a menu, and perhaps even upload
    their own preferred version of the service. The SaaS site would then
    launch the chosen application in the cloud. Published APIs would allow
    users on different software versions to work together.

    If a developer outside the company creates a new version with
    substantial enhancements, the company can offer it as an option. If
    new features slow down performance, the company can allow clients to
    decide whether the delays are worth it. To keep things simple for
    casual clients, there will probably always be a default service, but
    those who want alternatives can have them.

    Vendors can provide "alpha" or test sites where people can try out new
    versions created by the vendor or by outsiders. Like stand-alone
    software, cloud software can move through different stages of testing
    and verification.

    And providing such sandboxes can also be helpful to developers in
    general. A developer would no longer have to take the trouble to
    download, install, and configure software on a local computer to do
    development and testing. Just log into the sandbox and play.
    Google offers
    The Go Playground
    to encourage students of its Go language. CiviCRM,
    which is a free software server (not a cloud or web service) offers a
    sandbox for testing new
    features. A web service company in electronic health records,
    Practice Fusion,
    which issued an API challenge in September, is now creating a sandbox
    for third-party developers to test the API functionality on its
    platform. I would encourage web and cloud services to go even
    farther: open their own source code and provide sandboxes for people
    to rewrite and try out new versions.

    Let's take a moment for another possible benefit of running a
    service as a virtual instance. Infected computer systems present a
    serious danger to users (who can suffer from identity theft if their
    personal data is scooped up) and other systems, which can be
    victimized by denial-of-service attacks or infections of their own.
    One traditional response is trusted computing, an awkward tower of
    authorizations reaching right down into the firmware or hardware.
    In trusted computing, the computer itself checks
    to make sure that a recognized and uncompromised operating system is
    running at boot time. The operating system then validates each
    application before launching it.

    Trusted computing is Byzantine and overly controlling. The hardware
    manufacturer gets to decide which operating system you use, and
    through that which applications you use. Wouldn't users prefer
    to run cloud instances that are born anew each time they log in? That
    would wipe out any infection and ensure a trusted environment at the
    start of each session without cumbersome gatekeeping.

    Loosen the bonds on data

    As we've seen, one of the biggest fears keeping potential clients away
    from web services and cloud computing is the risk entailed in leaving
    their data in the hands of another company. Here it can get lost,
    stolen, or misused for nefarious purposes.

    But data doesn't have to be stored on the computer where the
    processing is done, or even at the same vendor. A user could fire up a
    web or cloud service, submit a data source and data store, and keep
    results in the data store. IaaS-style cloud computing involves
    encrypted instances of operating systems, and if web services did the
    same, users would automatically be protected from malicious
    prying. There is still a potential privacy issue whenever a user runs
    software on someone else's server, because it could skim off private
    data and give it to a marketing firm or law enforcement.

    Alert web service vendors such as Google know they have to assuage
    user fears of locked-in data. In Google's case, it created an
    engineering effort called the Data Liberation Front (see an article
    by two Google employees,

    The Case Against Data Lock-in
    ). This will allow users to extract
    their data in a format that makes it feasible to reconstitute it in
    its original format on another system, but it doesn't actually sever
    the data from the service as I'm suggesting.

    A careful client would store data in several places (to guard against
    loss in case one has a disk failure or other catastrophe). The client
    would then submit one location to the web service for processing, and
    store the data back in all locations or store it in the original
    source and then copy it later, after making sure it has not been
    corrupted.

    A liability issue remains when calculation and data are separated. If
    the client experiences loss or corruption, was the web service or the
    data storage service responsible? A ping-pong scenario could easily
    develop, with the web services provider saying the data storage
    service corrupted a disk sector, the data storage service saying the
    web service produced incorrect output, and the confused client left
    furious with no recourse.

    This could perhaps be solved by a hash or digest, a very stable and
    widely-used practice used to ensure that any change to the data, even
    the flip of a single bit, produces a different output value. A digest
    is a small number that represents a larger batch of data. Algorithms
    that create digests are fast but generate output that's reasonably
    unguessable. Each time the same input is submitted to the algorithm,
    it is guaranteed to generate the same digest, but any change to the
    input (through purposeful fiddling or an inadvertent error) will
    produce a different digest.

    The web service could log each completed activity along with the
    digest of the data it produces. The data service writes the data,
    reads it back, and computes a new digest. Any discrepancy signals a
    problem on the data service side, which it can fix by repeating the
    write. In the future, if data is corrupted but has the original
    digest, the client can blame the web service, because the web service
    must have written corrupt data in the first place.
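The protocol above can be demonstrated with any standard digest algorithm; this sketch uses SHA-256 from Python's standard library, with a byte string standing in for the stored data.

```python
# The digest protocol described above, sketched with SHA-256. The web
# service logs a digest when it produces data; the storage service
# recomputes the digest after writing; a mismatch pinpoints which
# party corrupted the data.

import hashlib

def digest(data: bytes) -> str:
    """Small, stable fingerprint: any change to the input changes it."""
    return hashlib.sha256(data).hexdigest()

# Web service produces output and logs its digest.
output = b"cholesterol,180\nblood_pressure,120/80\n"
logged = digest(output)

# Storage service writes the data, reads it back, and verifies.
stored = bytes(output)           # stand-in for a write/read round trip
assert digest(stored) == logged  # a discrepancy here = storage-side fault

# Later, a corrupted copy no longer matches the logged digest, so the
# client can tell the data changed after the web service produced it.
corrupted = stored.replace(b"180", b"190")
print(digest(corrupted) == logged)  # False: even a small change is detected
```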

    Sascha Meinrath, a wireless networking expert, would like to see
    programs run both on local devices and in the cloud. Each
    program could exploit the speed and security of the local device but
    reach seamlessly back to remote resources when necessary, rather like
    a microprocessor uses the local caches as much as possible and faults
    back to main memory when needed. Such a dual arrangement would offer
    flexibility, making it possible to continue work offline, keep
    particularly sensitive data off the network, and let the user trade
    off compute power for network usage on a case-by-case basis. (Wireless
    use on a mobile device can also run down the battery real fast.)
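Meinrath's cache analogy maps directly onto a read path: try the local store first and "fault" to the remote service only on a miss. In this sketch, plain dictionaries and a stub function stand in for the device's storage and the cloud API.

```python
# A sketch of the local-plus-cloud idea: consult the fast local store
# first and fall back to the remote service only on a miss, the way a
# processor falls back from cache to main memory. The stores here are
# stand-ins, not a real device database or cloud API.

local_store = {"today_notes": "draft saved offline"}

def fetch_remote(key: str) -> str:
    # Stand-in for a network call to the cloud copy of the data.
    return f"cloud value for {key}"

def read(key: str) -> str:
    if key in local_store:
        return local_store[key]   # fast, offline-capable path
    value = fetch_remote(key)     # slower network path
    local_store[key] = value      # cache locally for next time
    return value

print(read("today_notes"))   # served locally, no network use
print(read("old_archive"))   # faulted to the cloud, then cached
```

The same structure lets a user keep sensitive keys out of `fetch_remote` entirely, trading compute and network use case by case as the article suggests.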

    Before concluding, I should touch on another trend that some
    developers hope will free users from proprietary cloud services:
    peer-to-peer systems. The concept behind peer-to-peer is appealing,
    and it has been gaining more attention recently: individuals run
    servers on their systems at home or work and serve up the data they
    want. But such systems are hard to implement, for reasons I laid out
    in two articles,

    From P2P to Web Services: Addressing and Coordination

    From P2P to Web Services: Trust
    . Running your own
    software is somewhat moot anyway, because you're well advised to store
    your data somewhere else in addition to your own system. So long as
    you're employing a back-up service to keep your data safe in case of
    catastrophe, you might as well take advantage of other cloud services
    as well.

    I also don't believe that individual sites maintained by
    individuals will remain the sources for important data, as the
    peer-to-peer model postulates. Someone is going to mine that data and
    aggregate it--just look at the proliferation of Twitter search
    services. So even if users try to live the ideal of keeping control
    over their data, and use distributed technologies like the
    Diaspora project,
    they will end up surrendering at least some control and data to an
    aggregator.
    A sunny future for clouds and free software together

    The architecture I'm suggesting for computing makes free software even
    more accessible than the current practice of putting software on the
    Internet where individuals have to download and install it. The cloud
    can make free software as convenient as Gmail. In fact, for free
    software that consumes a lot of resources, the cloud can open it up to
    people who can't afford powerful computers to run the software.

    Web service offerings would migrate to my vision of a free software
    cloud by splitting into several parts, any or all of them free
    software. A host would simply provide the hardware and
    scheduling for the rest of the parts. A guest or
    appliance would contain the creative software implementing
    the service. A sandbox with tools for compilation, debugging,
    and source control would make it easy for developers to create new
    versions of the guest. And data would represent the results
    of the service's calculations in a clearly documented
    format. Customers would run the default guest, or select another guest
    on the vendor's site or from another developer. The guest would output
    data in the standardized format, to be stored in a location of the
    customer's choice and resubmitted for the next run.
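The host/guest/data split above can be sketched as a tiny dispatch layer: interchangeable guests implement the same service API, the host runs whichever one the customer picks, and the output stays in a documented shared format. All names here are illustrative.

```python
# A minimal sketch of the host/guest/data architecture described above.
# Guests are interchangeable implementations of one service; the host
# just schedules the chosen guest and hands back output in a shared,
# documented format. Everything here is a hypothetical illustration.

from typing import Callable

# Two competing guest implementations of the same service API.
def default_guest(data: list) -> dict:
    return {"format": "v1", "total": sum(data)}

def community_guest(data: list) -> dict:
    # An outside developer's enhanced version: same format, extra field.
    return {"format": "v1", "total": sum(data), "mean": sum(data) / len(data)}

GUESTS: dict = {
    "default": default_guest,
    "community": community_guest,
}

def host_run(guest_name: str, data: list) -> dict:
    """The host runs the selected guest; output stays in the shared format."""
    return GUESTS[guest_name](data)

print(host_run("default", [1, 2, 3])["total"])     # 6
print("mean" in host_run("community", [1, 2, 3]))  # True
```

Because both guests emit the same `format`, the customer's stored data survives switching guests, which is the portability the article is arguing for.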

    With cloud computing, the platform you're on no longer becomes
    important. The application is everything and the computer is (almost)
    nothing. The application itself may also devolve into a variety of
    mashed-up components created by different development teams and
    communicating over well-defined APIs, a trend I suggested almost a
    decade ago in an article titled

    Applications, User Interfaces, and Servers in the Soup

    The merger of free software with cloud and web services is a win-win.
    The convenience of IaaS and PaaS opens up opportunities for
    developers, whereas SaaS simplifies the use of software and extends its
    reach. Opening the source code, in turn, makes the cloud more
    appealing and more powerful. The transition will take a buy-in from
    cloud and SaaS providers, a change in the software development
    process, a stronger link between computational and data clouds, and
    new conventions to be learned by clients of the services. Let's get
    the word out.

    (I'd like to thank Don Marti for suggesting additional ideas for this
    article, including the fear of creating a two-tier user society, the
    chance to shatter the tyranny of IT departments, the poor quality of
    source code created for web services, and the value of logging
    information on user interaction. I would also like to thank Sascha
    Meinrath for the idea of seamless computing for local devices and the
    cloud, Anne Gentle for her idea about running test and production
    systems in the same cloud, and Karl Fogel for several suggestions,
    especially the value of usage statistics for programmers of web
    services.)
