May 02 2012

Recombinant Research: Breaking open rewards and incentives

In the previous articles in this series I've looked at problems in current medical research, and at the legal and technical solutions proposed by Sage Bionetworks. Pilot projects have shown encouraging results, but to move from a hothouse environment of experimentation to the mainstream of one of the world's most lucrative and tradition-bound industries, Sage Bionetworks must aim for that industry's nucleus: rewards and incentives.

Previous article in the series: Sage Congress plans for patient engagement.

Think about the publication system, that wretchedly inadequate medium for transferring information about experiments. Getting the data on which a study was based is incredibly hard; getting the actual samples or access to patients is usually impossible. Just as boiling vegetables drains most of their nutrients into the water, publishing results of an experiment throws away what is most valuable.

But the publication system has been built into the foundation of employment and funding over the centuries. A massive industry distributes published results to libraries and research institutions around the world, and maintains iron control over access to that network through peer review and editorial discretion. Even more important, funding grants require publication (though only very recently have they begun to require release of the data behind a study). And of course, advancement in one's field requires publication.

Lawrence Lessig, in his keynote, castigated for-profit journals for restricting access to knowledge in order to puff up profits. A chart in his talk showed skyrocketing prices for for-profit journals in comparison to non-profit journals. Lessig is not out on the radical fringe in this regard; Harvard Library is calling the current pricing situation "untenable" in a move toward open access echoed by many in academia.

Lawrence Lessig keynote at Sage Congress.

How do we open up this system that seemed to serve science so well for so long, but is now becoming a drag on it? One approach is to expand the notion of publication. This is what Sage Bionetworks is doing with Science Translational Medicine in publishing validated biological models, as mentioned in an earlier article. An even more extensive reset of the publication model is found in Open Network Biology (ONB), an online journal. The publishers require that an article be accompanied by the biological model, the data and code used to produce the model, a description of the algorithm, and a platform to aid in reproducing results.

But neither of these worthy projects changes the external conditions that prop up the current publication system.

When one tries to design a reward system that gives deserved credit to other things besides the final results of an experiment, as some participants did at Sage Congress, great unknowns loom up. Is normalizing and cleaning data an activity worth praise and recognition? How about combining data sets from many different projects, as a Synapse researcher did for the TCGA? How much credit do you assign researchers at each step of the necessary procedure for a successful experiment?

Let's turn to the case of free software to look at an example of success in open sharing. It's clear that free software has swept the computer world. Most web sites use free software ranging from the server on which they run to the language compilers that deliver their code. Everybody knows that the most popular mobile platform, Android, is based on Linux, although fewer realize that the next most popular mobile platforms, Apple's iPhones and iPads, run on an operating system derived in large part from the open source BSD Unix. We could go on and on citing ways in which free and open source software have changed the field.

The mechanism by which free and open source software staked out its dominance in so many areas has not been authoritatively established, but I think many programmers agree on a few key points:

  • Computer professionals encountered free software early in their careers, particularly as students or tinkerers, and brought their predilection for it into jobs they took at stodgier institutions such as banks and government agencies. Their managers deferred to them on choices for programming tools, and the rest is history.

  • Of course, computer professionals would not have chosen the free tools had they not been fit for the job (and often best for the job). Why is free software so good? Probably because the people creating it have complete jurisdiction over what to produce and how much time to spend producing it, unlike in commercial ventures with requirements established through marketing surveys and deadlines set unreasonably by management.

  • Different pieces of free software are easy to hook up, because one can alter their interfaces as necessary. Free software developers tend to look for other tools and platforms that could work with their own, and provide hooks into them (Apache, free database engines such as MySQL, and other such platforms are often accommodated). Customers of proprietary software, in contrast, experience constant frustration when they try to introduce a new component or change components, because the software vendors are hostile to outside code (except when they are eager to fill a niche left by a competitor with market dominance). Formal standards cannot overcome vendor recalcitrance--a painful truth particularly obvious in health care with quasi-standards such as HL7.

  • Free software scales. Programmers work on it tirelessly until it's as efficient as it needs to be, and when one solution just can't scale any more, programmers can create new components such as Cassandra, CouchDB, or Redis that meet new needs.

Are there lessons we can take from this success story? Biological research doesn't fit the circumstances that made open source software a success. For instance, researchers start out low on the totem pole in very proprietary-minded institutions, and don't get to choose new ways of working. But the cleverer ones are beginning to break out and try more collaboration. Software and Internet connections help.

Researchers tend to choose formats and procedures on an ad hoc, project by project basis. They haven't paid enough attention to making their procedures and data sets work with those produced by other teams. This has got to change, and Sage Bionetworks is working hard on it.

Research is labor-intensive. It needs desperately to scale, as I have pointed out throughout this series, but to do so it needs entire new paradigms for thinking about biological models, workflow, and teamwork. This too is part of Sage Bionetworks' mission.

Certain problems are particularly resistant in research:

  • Conditions that affect small populations have trouble raising funds for research. The Sage Congress initiatives can lower research costs by pooling data from the affected population and helping researchers work more closely with patients.

  • Computation and statistical methods are very difficult fields, and biological research is competing with every other industry for the rare individuals who know these well. All we can do is bolster educational programs for both computer scientists and biologists to get more of these people.

  • There's a long lag time before one knows the effects of treatments. As Jamie Heywood's keynote suggested, this is partly solved by collecting longitudinal data on many patients and letting them talk among themselves.

Another process change has revolutionized the computer field: agile programming. That paradigm stresses close collaboration with the end-users whom the software is supposed to benefit, and a willingness to throw out old models and experiment. BRIDGE and other patient initiatives hold out the hope of a similar shift in medical research.

All these things are needed to rescue the study of genetics. It's a lot to do all at once. Progress on some fronts was more apparent than on others at this year's Sage Congress. But as more people get drawn in, and sometimes fumbling experiments produce maps for changing direction, we may start to see real outcomes from these efforts in the coming years.

All articles in this series, and others I've written about Sage Congress, are available through a bundle.

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR20

May 01 2012

Recombinant Research: Sage Congress plans for patient engagement

Clinical trials are the pathway for approving drug use, but they aren't good enough. That has become clear as a number of drugs (Vioxx being the most famous) have been blessed by the FDA, only to be disqualified after years of widespread use revealed either a lack of efficacy or dangerous side effects. And the measures the FDA has taken recently to solve this embarrassing problem continue the heavyweight bureaucratic methods it has always employed: more trials, which raise the cost of every drug and slow down approval. Although I don't agree with the opinion of Avik S. A. Roy (reprinted in Forbes) that Phase III trials tend to be arbitrary, I do believe it is time to look for other ways to test drugs for safety and efficacy.

First article in the series: Recombinant Research: Sage Congress Promotes Data Sharing in Genetics.

But the Vioxx problem is just one instance of the wider malaise afflicting the drug industry. They just aren't producing enough new medications, either to solve pressing public needs or to keep up their own earnings. Vicki Seyfert-Margolis of the FDA built on her noteworthy speech at last year's Sage Congress (reported in one of my articles about the conference) with the statistic that drug companies submitted 20% fewer medications to the FDA between 2001 and 2007. Their blockbuster drugs produce far smaller profits than before as patents expire and fewer new drugs emerge (a predicament called the "patent cliff"). Seyfert-Margolis intimated that this crisis is the cause of layoffs in the industry, although I heard elsewhere that the companies are outsourcing more research, so perhaps the downsizing is just a reallocation of the same money.

Benefits of patient involvement

The field has failed to rise to the challenges posed by new complexity. Speakers at Sage Congress seemed to feel that genetic research has gone off the tracks. As the previous article in this series explained, Sage Bionetworks wants researchers to break the logjam by sharing data and code in GitHub fashion. And surprisingly, pharma is hurting enough to consider going along with an open research system. They're bleeding from a situation where as much as 80% of the effort in each clinical analysis is spent retrieving, formatting, and curating the data. Meanwhile, Kathy Giusti of the Multiple Myeloma Research Foundation says that in their work, open clinical trials are 60% faster.

Attendees at a breakout session where I sat in, including numerous managers from major pharma companies, expressed confidence that they could expand public or "pre-competitive" research in the direction Sage Congress proposed. The sector left to engage is the one that's central to all this work--the public.

If we could collect wide-ranging data from, say, 50,000 individuals (a May 2013 goal cited by John Wilbanks of Sage Bionetworks, a Kauffman Foundation Fellow), we could uncover a lot of trends that clinical trials are too narrow to turn up. Wilbanks ultimately wants millions of such data samples, and another attendee claimed that "technology will be ready by 2020 for a billion people to maintain their own molecular and longitudinal health data." And Jamie Heywood of PatientsLikeMe, in his keynote, claimed to have demonstrated through shared patient notes that some drugs were ineffective long before the FDA or manufacturers made the discoveries. He decried the current system of validating drugs for use and then failing to follow up with more studies, snorting that, "Validated means that I have ceased the process of learning."

But patients have good reasons to keep a close hold on their health data, fearing that an insurance company, an identity thief, a drug marketer, or even their own employer will find and misuse it. They already have little enough control over it, because the annoying consent forms we always have shoved in our faces when we come to a clinic give away a lot of rights. Current laws allow all kinds of funny business, as shown in the famous case of the Vermont law against data mining, which gave the Supreme Court a chance to say that marketers can do anything they damn please with your data, under the excuse that it's de-identified.

In a noteworthy poll by Sage Bionetworks, 80% of academics claimed they were comfortable sharing their personal health data with family members, but only 31% of citizen advocates would do so. If that 31% is more representative of patients and the general public, how many would open their data to strangers, even when supposedly de-identified?

The Sage Bionetworks approach to patient consent

When patients hold back their data, it's basic research that loses. So Wilbanks and a team have been working for the past year on a "portable consent" procedure. This is meant to overcome the hurdle by which a patient has to be contacted and give consent anew each time a new researcher wants data related to his or her genetics, conditions, or treatment. The ideal behind portable consent is to treat the entire research community as a trusted user.

The current plan for portable consent provides three tiers:

Tier 1

No restrictions on data, so long as researchers follow the terms of service. Hopefully, millions of people will choose this tier.

Tier 2

A middle ground. Someone with asthma may state that his data can be used only by asthma researchers, for example.

Tier 3

Carefully controlled. Meant for data coming from sensitive populations, along with anything that includes genetic information.
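To make the three tiers concrete, here is a minimal Python sketch of the access decision they imply. It is not Sage Bionetworks' implementation: the class names, the `may_access` function, the assumption that every tier requires the terms of service, and the rule that a Tier 3 participant approves researchers one by one are all hypothetical simplifications of the descriptions above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    OPEN = 1        # Tier 1: anything allowed under the terms of service
    RESTRICTED = 2  # Tier 2: only for conditions the participant names
    CONTROLLED = 3  # Tier 3: only researchers explicitly approved (assumed rule)


@dataclass
class Participant:
    tier: Tier
    allowed_conditions: set = field(default_factory=set)    # consulted in Tier 2
    approved_researchers: set = field(default_factory=set)  # consulted in Tier 3


@dataclass
class Request:
    researcher: str
    condition: str        # condition the study investigates
    accepted_terms: bool  # has the researcher agreed to the terms of service?


def may_access(participant, request):
    """Hypothetical check of one request against one participant's tier."""
    if not request.accepted_terms:
        return False  # assumption: every tier requires the terms of service
    if participant.tier is Tier.OPEN:
        return True
    if participant.tier is Tier.RESTRICTED:
        return request.condition in participant.allowed_conditions
    return request.researcher in participant.approved_researchers


asthma_patient = Participant(Tier.RESTRICTED, allowed_conditions={"asthma"})
print(may_access(asthma_patient, Request("r1", "asthma", True)))    # prints True
print(may_access(asthma_patient, Request("r1", "diabetes", True)))  # prints False
```

The asthma participant from the Tier 2 description admits an asthma study and refuses a diabetes study. A real system would also have to handle revoked consent and audits, which this sketch ignores.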

Synapse provides a trusted identification service. If researchers find a person with useful characteristics in the last two tiers, and are not authorized automatically to use that person's data, they can contact Synapse with the random number assigned to the person. Synapse keeps the original email address of the person on file and will contact him or her to request consent.
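The re-contact arrangement can be sketched as a small broker that stores email addresses privately and hands researchers only a random identifier. This is a hypothetical Python illustration of the idea, not Synapse's actual service; `ContactBroker`, its methods, and the in-memory `outbox` that stands in for sending email are all assumptions.

```python
import secrets


class ContactBroker:
    """Hypothetical re-contact service: researchers see only random IDs."""

    def __init__(self):
        self._emails = {}  # random ID -> email address, never exposed to researchers
        self.outbox = []   # (email, message) pairs standing in for email actually sent

    def enroll(self, email):
        pid = secrets.token_hex(8)  # 16 hex characters, unrelated to the person
        self._emails[pid] = email
        return pid

    def request_consent(self, pid, researcher, purpose):
        """Forward a consent request to the participant behind pid, if known."""
        email = self._emails.get(pid)
        if email is None:
            return False
        self.outbox.append((email, f"{researcher} asks to use your data for: {purpose}"))
        return True


broker = ContactBroker()
pid = broker.enroll("participant@example.org")
broker.request_consent(pid, "Dr. Example", "an asthma study")
```

Generating the identifier with the `secrets` module, rather than hashing the email address, is deliberate: anyone who guesses an address could recompute its hash, while a random token carries no information about the person.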

Portable consent also involves a lot of patient education. People will sign up through a software wizard that explains the risks. After choosing portable consent, the person decides how much to put in: 23andMe data, prescriptions, or whatever they choose to release.

Sharon Terry of the Genetic Alliance said that patient advocates currently try to control patient data in order to force researchers to share the work they base on that data. Portable consent loosens this control, but the field may be ready for its more flexible conditions for sharing.

Pharma companies and genetics researchers have lots to gain from access to enormous repositories of patient data. But what do the patients get from it? Leaders in health care already recognize that patients are more than experimental subjects and passive recipients of treatment. The recent ONC proposal for Stage 2 of Meaningful Use includes several requirements to share treatment data with the people being treated (which seems kind of a no-brainer when stated this baldly) and the ONC has a Consumer/Patient Engagement Power Team.

Sage Congress is fully engaged in the patient engagement movement too. One result is the BRIDGE initiative, a joint project of Sage Bionetworks and Ashoka with funding from the Robert Wood Johnson Foundation, to solicit questions and suggestions for research from patients. Researchers can go for years researching a condition without even touching on some symptom that patients care about. Listening to patients in the long run produces more cooperation and more funding.

Portable consent requires a leap of faith, because as Wilbanks admits, releasing aggregates of patient data means that over time, a patient is almost certain to be re-identified. Statistical techniques are just getting too sophisticated and compute power growing too fast for anyone to hide behind current tricks such as using only the first three digits of a five-digit postal code. Portable consent requires the data repository to grant access only to bona fide researchers and to set terms of use, including a ban on re-identifying patients. Still, researchers will have rights to do research, redistribute data, and derive products from it. Audits will be built in.
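A toy calculation shows why tricks like truncating postal codes offer limited protection. The Python sketch below, using invented records and a hypothetical `generalize` helper, measures k-anonymity: the size of the smallest group of records sharing the same quasi-identifiers (here ZIP code, birth date, and sex). Truncating the ZIP code to three digits and the birth date to a year raises k, but only from 1 to 2 in this example, so each person still hides among very few others, and linking to outside data can shrink that group further.

```python
from collections import Counter

# Invented example records: (ZIP code, birth date, sex)
records = [
    ("02139", "1970-03-02", "F"),
    ("02141", "1970-11-20", "F"),
    ("02142", "1985-06-01", "M"),
    ("02144", "1985-01-09", "M"),
]


def generalize(record):
    """Keep only the first three ZIP digits and the birth year."""
    zip_code, birth_date, sex = record
    return (zip_code[:3], birth_date[:4], sex)


def k_anonymity(rows):
    """Size of the smallest group of identical quasi-identifier tuples."""
    return min(Counter(rows).values())


print(k_anonymity(records))                           # prints 1
print(k_anonymity([generalize(r) for r in records]))  # prints 2
```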

But as Kelly Edwards of the University of Washington pointed out, tools and legal contracts can contribute to trust, yet trust is ultimately based on shared values. Portable consent, properly done, engages with frameworks like Synapse to create a culture of respect for data.

In fact, I think the combination of the contractual framework in portable consent and a platform like Synapse, with its terms of use, might make a big difference in protecting patient privacy. Seyfert-Margolis cited predictions that 500 million smartphone users will be using medical apps by 2015. But mobile apps are notoriously greedy for personal data and cavalier toward user rights. Suppose all those smartphone users stored their data in a repository with clear terms of use and employed portable consent to grant access to the apps? We might all be safer.

The final article in this series will evaluate the prospects for open research in genetics, with a look at the grip of journal publishing on the field, and some comparisons to the success of free and open source software.

Next: Breaking Open Rewards and Incentives. All articles in this series, and others I've written about Sage Congress, are available through a bundle.

April 30 2012

Recombinant Research: Sage Congress promotes data sharing in genetics

Given the exponential drop in the cost of personal genome sequencing (you can get a basic DNA test from 23andMe for a couple hundred dollars, and a full sequence will probably soon come down to one thousand dollars in cost), a new dawn seems to be breaking forth for biological research. Yet the assessment of genetics research at the recent Sage Congress was highly cautionary. Various speakers chided their own field for tilling the same ground over and over, ignoring the urgent needs of patients, and just plain researching the wrong things.

Sage Congress also has some plans to fix all that. These projects include tools for sharing data and storing it in cloud facilities, running challenges, injecting new fertility into collaboration projects, and ways to gather more patient data and bring patients into the planning process. Through two days of demos, keynotes, panels, and breakout sessions, Sage Congress brought its vision to a high-level cohort of 230 attendees from universities, pharmaceutical companies, government health agencies, and others who can make change in the field.

In the course of this series of articles, I'll pinpoint some of the pain points that can force researchers, pharmaceutical companies, doctors, and patients to work together better. I'll offer a look at the importance of public input, legal frameworks for cooperation, the role of standards, and a number of other topics. But we'll start by seeing what Sage Bionetworks and its pals have done over the past year.

Synapse: providing the tools for genetics collaboration

Everybody understands that change is driven by people and the culture they form around them, not by tools, but good tools can make it a heck of a lot easier to drive change. To give genetics researchers the best environment available to share their work, Sage Bionetworks created the Synapse platform.

Synapse recognizes that data sets in biological research are getting too large to share through simple data transfers. For instance, in his keynote about cancer research (where he kindly treated us to pictures of cancer victims during lunch), UC Santa Cruz professor David Haussler announced plans to store 25,000 cases at 200 gigabytes per case in the Cancer Genome Atlas, also known as TCGA, in what seems to be a clever pun on the letters for the four nucleotides in DNA (T, C, G, and A). Storage requirements thus work out to 5 petabytes, which Haussler wants to be expandable to 20 petabytes. In the face of big data like this, the job becomes moving the code to the data, not moving the data to the code.
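The arithmetic behind Haussler's figures, assuming decimal units (1 PB = 10^6 GB):

```latex
25{,}000\ \text{cases} \times 200\ \frac{\text{GB}}{\text{case}}
  = 5 \times 10^{6}\ \text{GB} = 5\ \text{PB},
\qquad \frac{20\ \text{PB}}{5\ \text{PB}} = 4
```

So the expandable target is four times the initial requirement, a volume at which copying the whole archive to every interested lab stops being realistic.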

Synapse points to data sets contributed by cooperating researchers, but also lets you pull up a console in a web browser to run R or Python code on the data. Some effort goes into tagging each data set with associated metadata: tissue type, species tested, last update, number of samples, etc. Thus, you can search across Synapse to find data sets that are pertinent to your research.
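Metadata-driven search of this kind can be sketched in a few lines of Python. The catalog, the `DataSet` fields (mirroring the tissue type, species, last update, and sample count mentioned above), and the `search` function are hypothetical stand-ins; this is not the Synapse client API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSet:
    name: str
    tissue: str
    species: str
    last_update: date
    n_samples: int


def search(catalog, min_samples=0, **criteria):
    """Return data sets whose metadata equal every keyword criterion
    and that have at least min_samples samples."""
    return [
        d for d in catalog
        if d.n_samples >= min_samples
        and all(getattr(d, key) == value for key, value in criteria.items())
    ]


# Invented catalog entries for illustration
catalog = [
    DataSet("brca_expression", "breast", "human", date(2012, 3, 1), 500),
    DataSet("liver_panel", "liver", "mouse", date(2011, 7, 15), 80),
    DataSet("brca_pilot", "breast", "human", date(2010, 1, 1), 40),
]

hits = search(catalog, tissue="breast", species="human", min_samples=100)
print([d.name for d in hits])  # prints ['brca_expression']
```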

One group working with Synapse has already harmonized and normalized the data sets in TCGA so that a researcher can quickly mix and run stats on them to extract emerging patterns. The effort took about one and a half full-time employees for six months, but the project leader is confident that with the system in place, "we can activate a similar size repository in hours."
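The article doesn't say how the TCGA data sets were harmonized. One common approach, shown here only as an assumed illustration, is to standardize each data set separately (converting each measurement to a z-score within its own data set) before pooling, so that differences in scale between labs or platforms don't swamp real signals. The function names and inputs are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize one data set: subtract its mean, divide by its standard deviation."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant data carries no relative information
    return [(v - mu) / sd for v in values]


def harmonize(datasets):
    """datasets maps a data set name to measurements of the same gene.
    Each data set is standardized on its own, then the results are pooled,
    keeping the source name so batch effects can still be checked later."""
    pooled = []
    for name, values in datasets.items():
        pooled.extend((name, z) for z in zscore(values))
    return pooled


# Hypothetical: two labs measured the same gene on different scales.
pooled = harmonize({"lab_a": [10.0, 12.0, 14.0], "lab_b": [100.0, 120.0, 140.0]})
```

After harmonization both labs' values become roughly -1.22, 0.0, and 1.22, so they can be compared directly. The trade-off is that per-data-set standardization also erases any genuine difference in average level between cohorts, so the choice of normalization should travel with the shared data.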

This contribution highlights an important principle behind Synapse (appropriately called "viral" by some people in the open source movement): when you have manipulated and improved upon the data you find through Synapse, you should put your work back into Synapse. This work could include cleaning up outlier data, adding metadata, and so on. To make work sharing even easier, Synapse has plans to incorporate the Amazon Simple Workflow Service (SWF). It also hopes to add web interfaces to allow non-programmers to do useful work with data.

The Synapse development effort was an impressive one, coming up with a feature-rich Beta version in a year with just four coders. And Synapse code is entirely open source. So not only is the data distributed, but the creators will be happy for research institutions to set up their own Synapse sites. This may make Synapse more appealing to geneticists who are prevented by inertia from visiting the original Synapse.

Mike Kellen, introducing Synapse, compared its potential impact to that of moving research from a world of journals to a world like GitHub, where people record and share every detail of their work and plans. Along these lines, Synapse records who has used a data set. This has many benefits:

  • Researchers can meet up with others doing related work.

  • It gives public interest advocates a hook with which to call on those who benefit commercially from Synapse--as we hope the pharmaceutical companies will--to contribute money or other resources.

  • Members of the public can monitor accesses for suspicious uses that may be unethical.
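Recording who has used each data set supports all three of these benefits. A minimal, hypothetical Python sketch (the `AccessLog` class is invented for illustration, not part of Synapse) shows how such a log could answer both "who used this data set?" and "who else works with the same data I do?"

```python
from collections import defaultdict


class AccessLog:
    """Hypothetical record of which researchers have used which data sets."""

    def __init__(self):
        self._users = defaultdict(set)  # data set name -> researchers who used it

    def record(self, researcher, dataset):
        self._users[dataset].add(researcher)

    def users_of(self, dataset):
        # .get avoids creating an empty entry for unknown data sets
        return sorted(self._users.get(dataset, ()))

    def related_researchers(self, researcher):
        """Others who used at least one of the same data sets."""
        peers = set()
        for users in self._users.values():
            if researcher in users:
                peers |= users
        peers.discard(researcher)
        return sorted(peers)


log = AccessLog()
log.record("ana", "tcga_brca")
log.record("ben", "tcga_brca")
log.record("ben", "mouse_liver")
log.record("cy", "mouse_liver")
print(log.related_researchers("ben"))  # prints ['ana', 'cy']
```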

There's plenty more work to be done to get data in good shape for sharing. Researchers must agree on some kind of metadata--the dreaded notion of ontologies came up several times--and clean up their data. They must learn about data provenance and versioning.

But sharing is critical for such basics of science as reproducing results. One source estimates that 75% of published results in genetics can't be replicated. A later article in this series will examine a new model in which enough metainformation is shared about a study for it to be reproduced, and even more important to be a foundation for further research.

With this Beta release of Synapse, Sage Bionetworks feels it is ready for a new initiative to promote collaboration in biological research. But how do you get biologists around the world to start using Synapse? For one, try an activity that's gotten popular nowadays: a research challenge.

The Sage DREAM challenge

Sage Bionetworks' DREAM challenge asks genetics researchers to find predictors of the progression of breast cancer. The challenge uses data from 2,000 women diagnosed with breast cancer, combining information on DNA alterations affecting how their genes were expressed in the tumors, clinical information about their tumor status, and their outcomes over ten years. The challenge is to build models integrating the alterations with molecular markers and clinical features to predict which women will have the most aggressive disease over a ten-year period.

Several hidden aspects of the challenge make it a clever vehicle for Sage Bionetworks' values and goals. First, breast cancer is a scourge whose urgency is matched by its stubborn resistance to diagnosis. The famous 2009 recommendations of the U.S. Preventive Services Task Force, after all the controversy was aired, left us with the dismal truth that we don't know a good way to predict breast cancer. Some women get mastectomies in the total absence of symptoms based just on frightening family histories. In short, breast cancer puts the research and health care communities in a quandary.

We need finer-grained predictors to say who is likely to get breast cancer, and standard research efforts up to now have fallen short. The Sage proposal is to marshal experts in a new way that combines their strengths, asking them to publish models that show the complex interactions between gene targets and influences from the environment. Sage Bionetworks will publish data sets at regular intervals that it uses to measure the predictive ability of each model. A totally fresh data set will be used at the end to choose the winning model.
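Scoring models on data released at intervals, and finally on a fresh data set, can be sketched as follows. The article doesn't name the challenge's metric; this Python sketch assumes Harrell's concordance index, a standard measure for survival predictions: the fraction of comparable patient pairs in which the patient the model rates as higher-risk actually progressed first. The function names and the toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C. A pair (i, j) is comparable when patient i had the event
    (events[i] == 1) before patient j's last observation (times[i] < times[j]).
    It is concordant when the model gave i the higher risk; ties count 1/2."""
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def leaderboard(models, times, events):
    """models maps a model name to its predicted risks for the held-out patients."""
    scores = {name: concordance_index(times, events, r) for name, r in models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy held-out data: years of follow-up, 1 = disease progressed, 0 = censored
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
ranking = leaderboard({"good": [4, 3, 2, 1], "bad": [1, 2, 3, 4]}, times, events)
```

Running `leaderboard` on each newly released data set gives interim standings; running it once on data no participant has seen gives the final ranking, which guards against models tuned to the interim data.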

The process behind the challenge--particularly the need to upload code in order to run it on the Synapse site--automatically forces model builders to publish all their code. According to Stephen Friend, founder of Sage Bionetworks, "this brings a level of accountability, transparency, and reproducibility not previously achieved in clinical data model challenges."

In addition, the process has two more effects: it shows off the huge amount of genetic data that can be accessed through Synapse, and it encourages researchers to look at each other's models in order to boost their own efforts. In less than a month, the challenge already received more than 100 models from 10 sources.

The reward for winning the challenge is publication in a respected journal, the gold medal still sought by academic researchers. (More on shattering this obelisk later in the series.) Science Translational Medicine will accept results of the evaluation as a stand-in for peer review, a real breakthrough for Sage Bionetworks because it validates their software-based, evidence-driven process.

Finally, the DREAM challenge promotes use of the Synapse infrastructure, and in particular the method of bringing the code to the data. Google is donating server space for the challenge, which levels the playing field for researchers, freeing them from paying for their own computing.

A single challenge doesn't solve all the problems of incentives, of course. We still need to persuade researchers to put up their code and data on a kind of genetic GitHub, persuade pharmaceutical companies to support open research, and persuade the general public to share data about their phenomes (life data) and genes--all topics for upcoming articles in the series.

Next: Sage Congress Plans for Patient Engagement. All articles in this series, and others I've written about Sage Congress, are available through a bundle.

April 19 2012

Sage Congress: The synthesis of open source with genetics

For several years, O'Reilly Radar has been covering the exciting potential that open source software, open data, and a general attitude of sharing and cooperation bring to health care. Along with many exemplary open source projects in areas directly affecting the public (such as the VA's Blue in electronic medical records and the Direct project in data exchange), the study of disease is undergoing a paradigm shift.

Sage Bionetworks stands at the center of a wide range of academic researchers, pharmaceutical companies, government agencies, and health providers realizing that the old closed system of tiny teams who race each other to a cure has got to change. Today's complex health problems, such as Alzheimer's, AIDS, and cancer, are too big for a single team. And these institutions are slowly wrenching themselves out of the habit of data hoarding and finding ways to work together.

A couple of weeks ago I talked to the founder of Sage Bionetworks, Stephen Friend, about recent advances in open source in this area, and the projects to be highlighted at the upcoming Sage Commons congress. Steve is careful to call this a "congress" instead of a "conference" because all attendees are supposed to pitch in and contribute to the meme pool. I covered Sage Congress in a series of articles last year. The following podcast ranges over topics such as:

  • what is Sage Bionetworks [Discussed at the 00:25 mark];
  • the commitment of participants to open source software [Discussed at the 01:01 mark];
  • how open source can support a business model in drug development [Discussed at the 01:40 mark];
  • a look at the upcoming congress [Discussed at the 03:47 mark];
  • citizen-led contributions or network science [Discussed at the 06:12 mark];
  • data sharing philosophy [Discussed at the 09:01 mark];
  • when projects are shared with other institutions [Discussed at the 12:43 mark];
  • how to democratize medicine [Discussed at the 17:10 mark];
  • a portable legal consent approach where the patient controls his or her own data [Discussed at the 20:07 mark];
  • solving the problem of non-sharing in the industry [Discussed at the 22:15 mark]; and
  • key speakers at the congress [Discussed at the 26:35 mark].

Sessions from the congress will be broadcast live via webcast and posted on the Internet.

May 06 2011

Collaborative genetics, part 5: Next steps for genetic commons

Previous installment: Private practice, how to respect the patient

Sage is growing, and everything they're doing to promote the commons now will likely continue. They'll sign up more pharma companies to contribute data and more researchers to work in teams, such as in the Federation.

Although genetics seems to be a narrow area, it's pretty central to everything that government, hospitals, and even insurers want to achieve in lowering costs and improving care. This research is at the heart of such tasks as:

  • Making drug development faster and cheaper (drugs are now a major source of inflation in health care, particularly among the growing elderly population)

  • Discovering in advance which patients will fail to respond to drugs, thus lowering costs and allowing them to access correct treatments faster

  • Improving our knowledge of the incidence and course of diseases in general

From my perspective--knowing little about medical research but a fair amount about software--the two biggest areas that need attention are standardized formats and software tools to support such activities as network modeling and analyzing results. Each institution tends to be on its own, but there are probably a lot of refined tools out there that could help everybody.

Researchers may well underestimate how much effort needs to go into standardizing software tools and formats, and how much pay-off that work would produce. Researchers tend to be loners, brave mountaineers who like to scale the peaks on their own and solve each problem through heroism along the way. Investing in a few cams and rappels could greatly enhance their success.

Publicity and public engagement are good for any initiative, but my guess is that, if Sage and its collaborators develop some awesome tools and show more of the results we started to see at this conference, other institutions will find their way to them.

This posting is the last of a five-part series.

May 05 2011

Collaborative genetics, part 4: Private practice, how to respect the patient

Previous installment: Dividing the pie, from research to patents

The fear of revealing patient data pervades the medical field, from the Hippocratic Oath to the signs posted all over hospitals reminding staff not to discuss patients in the hallways and elevators. HIPAA's privacy provisions are the rules most routinely cited, and many hospitals overreach their legal mandates, making it even harder than the law requires to get data. Whereas Americans have gotten used to the wanton collection of data in other spheres of life, health care persists in its idyllic island of innocence (and we react with outrage whenever this innocence proves illusory).

In my break-out session about terms of service, a lot of the talk revolved around privacy. The attendees acknowledged its importance respectfully on one level and grumbled about how the laws got in the way of good research on another. Their attitudes struck me as inconsistent and lacking resolve, but overall I felt that these leaders in the field of health lacked an appreciation for the sacredness of privacy as part of the trust a patient has for her doctor and the health care system.

Even Peter Kapitein, in his keynote, railed that concerns for privacy were just excuses used by institutions to withhold information they didn't want the public to know. This is often true, but I felt he went too far in uttering: "No patient will say, please don't use my data if it will help me or help someone else in my position." This is not what surveys show, such as Dr. Alan Westin's 2008 report to the FTC. When I spoke to Kapitein afterward, he acknowledged that he had exaggerated his point for the sake of rhetoric, and that he recognized the importance of privacy in many situations. Still, I fear that his strong statement might have a bad effect on his audience.

We all know that de-identified data is vulnerable to re-identification and that many patients have good reason to fear what would happen if certain people got word of their conditions. It's widely acknowledged that many patients withhold information from their own doctors out of embarrassment. They still need to have a choice when researchers ask for data too. Distrust of medical research is common among racial minorities, still angry at the notorious Tuskegee syphilis study and recently irked again by researchers' callous attitude toward the family of Henrietta Lacks.

Wilbanks recommends that the terms of service for the commons prohibit unethical uses of data, and specifically the combination of data from different sources for re-identification of patients.

It's ironic that one vulnerability might be forced on the Sage commons by patients themselves. Many patients offer their data to researchers with the stipulation that the patients can hear back about any information the researchers find out about them; this is called a "patient grant-back."

Grant-backs introduce significant ethical concerns, aside from privacy, because researchers could well find that the patient has a genetic makeup strongly disposing him to a condition for which there's no treatment, such as Huntington's Disease. Researchers may also find out things that sound scary and require professional interpretation to put into context. One doctor I talked to said the researcher should communicate any findings to the patient's doctor, not the patient himself. But that would be even harder to arrange.

In terms of privacy, requiring a researcher to contact the patient introduces a new threat of attack and places a huge administrative burden on the researchers, as well as any repository such as the commons. It means that the de-identified data must be linked in a database to contact information for the patient. Even if careful measures are taken to separate the two databases, an intruder has a much better chance of getting the data than if the patient left no such trace. Patients should be told that this is a really bad deal for them.
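To see why that linkage worries me, here is a minimal sketch of one common way such a link might be built: a keyed hash (HMAC-SHA256) turns each patient ID into a stable pseudonym, and the research row and the contact row share that pseudonym. The field names, the key handling, and the choice of HMAC are all assumptions for illustration, not a description of how Sage or any repository actually does it. The point is that anyone who obtains both tables, or the key, can join them back together.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held by whoever administers grant-backs.
# If this key or the contact table leaks, the research data is re-linked to patients.
LINKAGE_KEY = secrets.token_bytes(32)


def pseudonym(patient_id: str, key: bytes = LINKAGE_KEY) -> str:
    """Derive a stable pseudonymous ID from a patient ID with HMAC-SHA256."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def split_record(record: dict) -> tuple:
    """Split one intake record into a research row and a separate contact row.

    Both rows carry the same pseudonym, which is exactly the link a grant-back needs
    and exactly what an intruder holding both tables can exploit.
    """
    pid = pseudonym(record["patient_id"])
    research_row = {"pid": pid, "genotype": record["genotype"]}  # no direct contact details
    contact_row = {"pid": pid, "email": record["email"]}         # stored separately
    return research_row, contact_row


# Hypothetical example record.
research, contact = split_record(
    {"patient_id": "MRN-001", "genotype": "AA", "email": "pat@example.org"}
)
```

Keeping the two tables in separate databases raises the bar, but as long as the shared pseudonym exists, the separation is only as strong as the weaker of the two systems.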

This posting is one of a five-part series. Final installment: Next steps for genetic commons

May 04 2011

Collaborative genetics, part 3: Dividing the pie, from research to patents

Previous installment: Five Easy Pieces, Sage's Federation

What motivates scientists, companies, and funders to develop bold new treatments? Of course, everybody enters the field out of a passion to save humanity. But back to our question--what motivates scientists, companies, and funders to develop bold new treatments?

The explanations vary not only for different parts of the industry (academics, pharma companies, biotech firms, government agencies such as NIH, foundations, patient advocacy groups) but for institutions at different levels in their field. And of course, individual scientists differ, some seeking only advancement in their departments and the rewards of publishing, whereas others jump whole-heartedly into the money pipeline.

The most illustrious and successful projects in open source and open culture have found ways to attract both funds and content. Sage, the Federation, and Arch2POCM will have to find their own sources of ammunition.

The fuse for this reaction may have to begin with funders in government and the major foundations. Currently they treat only a published paper as fulfillment of a contract. The journals also have to be brought into the game. All the other elements of the data chain that precede and follow publication need to get their due.

Sage is creating a format for citing data sets, which can be used in the same ways researchers cite papers. A researcher named Eric Schadt also announced a new journal, Open Network Biology, with the commitment of publishing research papers along with the network models used, the software behind the results, and the underlying data.

Although researchers are competitive, they also recognize the importance of sharing information, so they are persuadable. If a researcher believes someone else may validate and help to improve her data, she has a strong incentive to make it open. Releasing her data can also raise her visibility in the field, independent of publications.

Arch2POCM has even more ticklish tasks. For one thing, they want to direct researchers toward work that has a good chance of producing a treatment--and even this goal is muddled by the desire to encourage more risk-taking and a willingness to look beyond familiar genes. (Edwards ran statistics on studies in journals and found them severely clustered among a few genes that had already been thoroughly explored, mostly ignoring the other 80% or 90% of the human genome even though it is known to have characteristics of interest in the health field. His highly skewed graph drew a strong response of concern from the audience.)

According to Teri Melese, Director of Research Technologies and Alliances for UCSF School of Medicine, pharma companies already have programs aiming to promote research that uses their compounds by offering independent researchers the opportunity to submit their ideas for studies. But the company has to approve each project, and although the researcher can publish results, the data used in the experiment remains tightly under the control of the researcher or the company. This kind of program shows that nearly infinite degrees of compromise lie between totally closed research systems and a completely open commons--but to get the benefits of openness, companies and researchers will need to relinquish a lot more control than they have been willing to up till now.

The prospect of better research should attract funders, and Arch2POCM targets pharma companies in particular to pony up millions for research. The companies have a precedent for sharing data under the rubric of "pre-competitive research." According to Sage staffer Lara Mangravite, eight pharma companies have donated some research data to Sage.

The big trick is to leave as much of the results in the commons as possible while finding the right point in the development process where companies can extract compounds or other information and commercialize them as drugs. Sage would like to keep the core genetic information free from patents, but is willing to let commercial results be patented. Stephen Friend told me, "It is important to maintain a freedom of interoperability for the data and the metadata found within the hosted models of disease. Individuals and companies can still reduce to practice some of the proposed functions for proteins and file patents on these findings without contaminating the freedom of the information hosted on the commons platform."

The following diagram tries to show, in a necessarily over-simplified form, the elements that go into research and are produced by research. Naturally, results of some research are fed back in circular form to further research. Each input is represented as a circle, and is accompanied by a list of stake-holders who can assert ownership over it. Medicines, patents, and biological markers are listed at the bottom as obvious outputs that are not considered as part of the research inputs.

[Diagram of research inputs and outputs]

Inputs and outputs of research

The relationships ultimately worked out between the Sage commons and the pharma companies--which could be different for different classes of disease--will be crucial. The risk of being too strict is to drive away funds, while the risk of being too accommodating is to watch the commons collapse into just another consortium that divvies up rewards among participants without giving the rest of the world open access.

What about making researchers who use the commons return the results of their research to the commons? This is another ticklish issue that was endlessly discussed in (and long after) a break-out session I attended. The Sage commons was compared repeatedly to software, with references to well-known licenses such as the GNU GPL, but analogies were ultimately unhelpful. A genetic commons represents a unique relationship among data related to particular patients, information of use in various stages of research, and commercially valuable products (to the tune of billions of dollars).

Sage and its supporters naturally want to draw as much research into the commons as they can. It would be easy to draw up reciprocal terms of service along the lines of, "If you use our data, give back your research results"--easy to draw up but hard to enforce. John Wilbanks, a staff person at Creative Commons who has worked heavily with Sage, said such a stipulation would be a suggestion rather than a requirement. If someone uses data without returning the resulting research, members of the community around the commons could express their disapproval by not citing the research.

But all this bothers me. Every open system recognizes that it has to co-exist with a proprietary outer world, and provide resources to that world. Even the GNU GPL doesn't require you to make your application free software if you compile it with the GNU compiler or run it on Linux. Furthermore, the legal basis for strict reciprocity is dubious, because data is not protected by copyright or any other intellectual property regime. (See my article on collections of information.)

I think that outsiders attending the Congress lacked a fundamental appreciation of the purpose of an information commons. One has to think long-term--not "what am I going to get from this particular contribution?" but "what might I be able to do in 10 years that I can't do today once a huge collection of tools and data is in the public domain?" Nobody knows, when they put something into the commons, what the benefits will be. The whole idea is that people will pop up out of nowhere and use the information in ways that the original donors could not imagine. That was the case for Linux and is starting to come true for Android. It's the driving motivation behind the open government movement. You have to have a certain faith to create a commons.

At regular points during the Congress, attendees pointed out that no legitimate motivation exists in health care unless it is aimed ultimately toward improving life for the patients. The medical field refers to the experiences of patients--the ways they react to drugs and other treatments--as "post-market effects," which gives you a hint where the field's priorities currently lie.

Patients were placed front and center in the opening keynote by Peter Kapitein, a middle-aged Dutch banker who suffered a series of wrenching (but successful) treatments for lymphoma. I'll focus in on one of his statements later.

It was even more impressive to hear a central concern for the patient's experience expressed by Vicki Seyfert-Margolis, Senior Science Advisor to the FDA's Chief Scientist. Speaking for herself, not as a representative of the FDA, she chastised industry, academia, and government alike for not moving fast enough to collaborate and crowdsource their work, then suggested that in the end the patients will force change upon us all. While suggesting that the long approval times for drugs and (especially) medical devices lie in factors outside the FDA's control, she also said the FDA is taking complaints about its process seriously and has launched a holistic review of the situation under its current director.

[Photo of Vicki Seyfert-Margolis]

Vicki Seyfert-Margolis keynoting at Sage Commons Congress

The real problem is not slow approval, but a lack of drug candidates. Submissions by pharmaceutical companies have been declining over the years. The problem returns to the old-fashioned ways the industry works: walled off into individual labs that repeat each other's work over and over and can't learn from each other.

One of the Congress's break-out groups was tasked with outreach and building bonds with the public. Not only could Sage benefit if the public understood its mission and accomplishments (one reason I'm writing this blog) but patients are key sources for the information needed in the commons.

There was some discussion about whether Sage should take on the area served by PatientsLikeMe and DIYgenomics, accepting individual donations of information. I'm also a bit dubious about how far a Facebook page will reach. The connection between patient input and useful genetic information is attenuated and greatly segmented. It may be better to partner with the organizations that interact directly with individuals among the public.

It's more promising to form relationships with patient advocacy groups, as a staff person from the Genetic Alliance pointed out. Advocacy groups can find patients for drug trials (many trials are canceled for lack of appropriate patients) and turn over genetic and phenotypic information collected from those patients. (A "phenotype" basically refers to everything interesting about you that expresses your state of health. It could include aspects of your body and mental state, vital statistics, a history of diseases and syndromes you've had, and more.)

This posting is one of a five-part series. Next installment: Private practice, how to respect the patient

May 03 2011

Collaborative genetics, part 2: Five Easy Pieces, Sage's Federation

Previous installment: The ambitious goals of Sage Commons Congress

A pilot project was launched by Sage with four university partners under the moniker of the Federation, which sounds like something out of a spy thriller (or Star Trek, which was honored in stock photos in the PowerPoint presentation about this topic). Hopefully, the only thrill will be that expressed by the participants. Three of them presented the results of their research into aspects of aging and disease at the Congress.

The ultimate goal of the Federation is to bring together labs from different places to act like a single lab. Its current operation is more modest. As a pilot, the Federation received no independent funding. Each institution agreed simply to allow their researchers to collaborate with the other four institutions. Atul Butte of Stanford reported that the lack of explicit funding probably limited collaboration. In particular, the senior staff on each project communicated very little because their attention was always drawn to other tasks demanded by their institutions, such as writing grant proposals. Junior faculty did most of the actual collaboration.

As one audience member pointed out, "passion doesn't scale." But funding will change the Federation as well, skewing it toward whatever gets rewarded.

[Photo of audience and podium]

Audience and podium at Sage Commons Congress

When the Federation grows, the question it faces is whether to incorporate more institutions in a single entity under Sage's benign tutelage or to spawn new Federations that co-exist in their own loose federation. But the issue of motivations and rewards has to be tackled.

Another organization launched by Sage is Arch2POCM, whose name requires a bit of elucidation. The "Arch" refers to "archipelago" as a metaphor for a loose association among collaborating organizations. POCM stands for Proof of Clinical Mechanism. The founders of Arch2POCM believe that if the trials leading to POCM (or the more familiar proof of concept, POC) were done by public/private partnerships free of intellectual property rights, companies could benefit from reduced redundancy while still finding plenty of opportunities to file patents on their proprietary variants.

Arch2POCM, which held its own summit with a range of stakeholders in conjunction with the larger Congress, seeks to establish shared, patent-free research on a firm financial basis, putting organizational processes in place to reward scientists for producing data and research that go into the commons. Arch2POCM's reach is ambitious: to find new biological markers (substances or techniques for tracking what happens in genes), and even the compounds (core components of effective drugs) that treat diseases.

The pay-off for a successful Arch2POCM project is enticing. Not only could drugs be developed much more cheaply and quickly, but we might learn more about the precise ways they affect patients so that we can save patients from taking drugs that are ineffective in their individual cases, and eliminate adverse effects. To get there, incentives once again come to the fore. A related platform called Synapse hosts the data models, providing a place to pick targets and host the clinical data produced by the open-access clinical trials.

This posting is one of a five-part series. Next installment: Dividing the pie, from research to patents

May 02 2011

Collaborative genetics, Part 1: The ambitious goals of Sage Commons Congress

In a field rife with drug-addicted industries that derive billions of dollars from a single product, and stocked with researchers who scramble for government grants (sadly cut back by the recent US federal budget), the open sharing of genetic data and tools may seem a dream. But it must be more than a dream when the Sage Commons Congress can draw 150 attendees (turning away many more) from research institutions such as the Netherlands Bioinformatica Centre and Massachusetts General Hospital, leading universities from the US and Europe, a whole roster of drug companies (Pfizer, Merck, Novartis, Lilly, Genentech), tech companies such as Microsoft, foundations such as Alfred P. Sloan, and representatives from the FDA and the White House. I felt distinctly ill at ease trying to fit into such a well-educated crowd, but was welcomed warmly and soon found myself using words such as "phenotype" and "patient stratification."

Money is not the only complicating factor when trying to share knowledge about our genes and their effect on our health. The complex relationships of information generation, and how credit is handed out for that information, make biomedical data a case study all its own.

The complexity of health research data

I listened a couple weeks ago as researchers at this congress, held by Sage Bionetworks, questioned some of their basic practices, and I realized that they are on the leading edge of redefining what we consider information. For most of the history of science, information consisted of a published paper, and the scientist tucked his raw data in a moldy desk drawer. Now we are seeing a trend in scientific journals toward requiring authors to release the raw data with the paper (one such repository in biology is Dryad). But this is only the beginning. Consider what remains to be done:

  • It takes 18 to 24 months to get a paper published. The journal and author usually don't want to release the data until the date of publication, and some add an arbitrary waiting period after publication. That's an extra 18 to 24 months (a whole era in some fields) during which that data is withheld from researchers who could have built new discoveries on it.

  • Data must be curated, which includes:

      • Being checked for corrupt data and missing fields (experimental artifacts)

      • Normalization

      • Verifying HIPAA compliance and other assurances that data has been properly de-identified

      • Possible formatting according to some standard

      • Reviewing for internal and external validity

  • Advocates of sharing hope this work can be crowdsourced to other researchers who want to use the data. But then who gets credited and rewarded for the work?

  • Negative results--experiments showing that a treatment doesn't work--are extremely important, and the data behind them is even more important. Of course, knowing where other researchers or companies failed could boost the efforts of other researchers and companies. Furthermore this data may help accomplish patient stratification--that is, show when some patients will benefit and some will not, even when their symptoms seem the same. The medical field is notorious for suppressing negative results, and the data rarely reaches researchers who can use it.

  • When researchers choose to release data--or are forced to do so by their publishers--it can be in an atrocious state because it missed out on the curation steps just mentioned. The data may also be in a format that makes it hard to extract useful information, either because no one has developed and promulgated an appropriate format, or because the researcher didn't have time to adopt it. Other researchers may not even be able to determine exactly what the format is. Sage is working on very simple text-based formats that provide a lowest common denominator that will help researchers get started.

  • Workflows and practices in the workplace have a big effect on the values taken by the data. These are very hard to document, but can help a great deal in reproducing and validating results. Geneticists are starting to use a workflow documentation tool called Taverna to record the ways they coordinate different software tools and data sets.

  • Data can be interpreted in multiple ways. Different teams look for different criteria and apply different standards of quality. It would be useful to share these variations.

  • A repeated theme at the Congress was "going beyond the narrative." The narrative here is the published article. Each article tells a story and draws conclusions. But a lot goes on behind the scenes in the art and science of medicine. Furthermore, letting new hypotheses emerge from data is just as important as verifying the narrative provided by one's initial hypothesis.
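The first two curation steps in the list above, checking for missing fields and normalization, are mechanical enough to sketch. This is a minimal illustration under assumed conditions: the field names and records are hypothetical, and z-score scaling is just one common normalization choice, not necessarily what any particular lab or Sage uses. It does not attempt the harder steps such as de-identification or validity review.

```python
import statistics

# Hypothetical required fields for a released data set.
REQUIRED_FIELDS = ("sample_id", "tissue", "expression")


def find_incomplete(records, required=REQUIRED_FIELDS):
    """Return IDs (or row labels) of records with a missing or empty required field."""
    flagged = []
    for i, rec in enumerate(records):
        if any(rec.get(field) in (None, "") for field in required):
            flagged.append(rec.get("sample_id") or f"row {i}")
    return flagged


def z_normalize(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # all values identical; nothing to scale
    return [(v - mean) / sd for v in values]


# Hypothetical raw records; S2 is missing its tissue annotation.
records = [
    {"sample_id": "S1", "tissue": "liver", "expression": 2.0},
    {"sample_id": "S2", "tissue": "", "expression": 4.0},
    {"sample_id": "S3", "tissue": "liver", "expression": 6.0},
]

incomplete = find_incomplete(records)
complete = [r for r in records if r["sample_id"] not in incomplete]
normalized = z_normalize([r["expression"] for r in complete])
```

Here S2 is set aside, and the two remaining expression values become -1.0 and 1.0. Which records to drop and which normalization to apply are exactly the kinds of choices that, if published alongside the data, would let others reproduce or challenge the results.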

One of the big questions raised in my mind--and not covered in the conference--was the effect it would have on the education of the next generation of scientists were teams to expose all those hidden aspects of data: the workflows, the curation and validation techniques, the interpretations. Perhaps you wouldn't need to attend the University of California at Berkeley to get a Berkeley education, or risk so many parking tickets along the way. Certainly, young researchers would have powerful resources for developing their craft, just as programmers have with the source code for free software.

I've just gone over a bit of the material that the organizers of the Sage Commons Congress want their field to share. Let's turn to some of the structures and mechanisms.

Of networks

Take a step back. Why do geneticists need to share data? There are oodles of precedents, of course: the Human Genome Project, biobricks, the Astrophysics Data System (shown off in a keynote by Alyssa A. Goodman from Harvard), open courseware, open access journals, and countless individual repositories put up by scientists. A particularly relevant data sharing initiative is the International HapMap Project, working on a public map of the human genome "which will describe the common patterns of human DNA sequence variation." This is not a loose crowdsourcing project, but more like a consortium of ten large research centers promising to release results publicly and forgo patents on the results.

The field of genetics presents specific challenges that frustrate old ways of working as individuals in labs that hoard data. Basically, networks of genetic expression require networks of researchers to untangle them.

In the beginning, geneticists modeled activities in the cell through linear paths. A particular protein would activate or inhibit a particular gene that would then trigger other activities with ultimate effects on the human body.

They found that relatively few activities could be explained linearly, though. The action of a protein might be stymied by the presence of others. And those other actors have histories of their own, with different pathways triggering or inhibiting pathways at many points. Stephen Friend, President of Sage Bionetworks, offers the example of an important gene implicated in breast cancer, the human epidermal growth factor receptor 2 (HER2/neu). The drugs that target this protein are weakened when another protein, Akt, is present.

Trying to map these behaviors, scientists come up with meshes of paths. The field depends now on these network models. And one of its key goals is to evaluate these network models--not as true or false, right or wrong, because they are simply models that represent the life of the cell about as well as the New York subway map represents the life of the city--but for the models' usefulness in predicting outcomes of treatments.

Network models containing many actors and many paths--that's why collaborations among research projects could contribute to our understanding of genetic expression. But geneticists have no forum for storing and exchanging networks. And nobody records them in the same format, which makes them difficult to build, trade, evaluate, and reuse.
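To make the idea of a shared, plain-text network format concrete, here is a minimal sketch assuming a hypothetical three-column, tab-separated edge list: source, target, and effect (+1 for activation, -1 for inhibition). This is not the format Sage is developing, and the node names are placeholders rather than real genes or proteins. It shows why a network is more than a linear path: one node can have both an activator and an inhibitor feeding into it.

```python
from collections import defaultdict, deque

# Hypothetical tab-separated edge list: source, target, effect (+1 activates, -1 inhibits).
# Node names are placeholders, not real biology.
EDGES_TSV = """\
protein_A\tgene_B\t+1
gene_B\tprotein_C\t+1
protein_D\tgene_B\t-1
protein_C\tgene_E\t-1
"""


def parse_network(text):
    """Parse the edge list into an adjacency map: source -> [(target, effect), ...]."""
    graph = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        source, target, effect = line.split("\t")
        graph[source].append((target, int(effect)))
    return dict(graph)


def regulators_of(graph, node):
    """Return every (regulator, effect) pair with an edge pointing at `node`, sorted by name."""
    return sorted(
        (src, eff) for src, targets in graph.items() for tgt, eff in targets if tgt == node
    )


def downstream(graph, start):
    """Return all nodes reachable from `start` via breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target, _effect in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


network = parse_network(EDGES_TSV)
```

A linear reading would see only protein_A, then gene_B, then protein_C. Asking `regulators_of(network, "gene_B")` exposes protein_D as an inhibitor sitting on the same path, the kind of interaction that only shows up when many teams' edges land in one shared model.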

The Human Genome Project is a wonderful resource for scientists, but it contains nothing about gene expression, nothing about the network models and workflows and methods of curation mentioned earlier, nothing about software tools and templates to promote sharing, and ultimately nothing that can lead to treatments. This huge, multi-dimensional area is what the Sage Commons Congress is taking on.

More collaboration, and a better understanding of network models, may save a field that is approaching crisis. The return on investment for pharmaceutical research, according to researcher Aled Edwards, has gone down over the past 20 years. In 2009, American companies spent one hundred billion dollars on research but got only 21 drugs approved, and only 7 of those were truly novel. Meanwhile, 90% of drug trials fail. And to throw in a statistic from another talk (Vicki Seyfert-Margolis from the FDA), drug side effects create medical problems in 7% of patients who take the drugs, and require medical interventions in 3% or more of cases.

This posting is one of a five-part series. Next installment: Five Easy Pieces, Sage's Federation
