About
Abbrev: oAnth. Motto: 'Nothing to Hide' (#25c3/#CCC). To those born later, a cautionary example and a lasting remembrance. Loc: München (Munich, Germany).
Intended: a kaleidoscope of reposts, feeds & direct postings in EN, DE and FR.
Selected entries from oAnth are provided via scoop.it - oAnth miscellaneous.
Active posting on this Tumblelog Diary [microblogging -- WP] started in Jan 2009; nonetheless, a great number of earlier entries are indirectly included via RSS feeds.
Selection by entry types, starting with the latest: links, texts, quotes, files, videos, images.
See likewise >> 02myTagManual >> latest compilations
Links & feeds to my Posterous-account are protected - pls use password: oA:acc_
:: at twitter >> 02mytwi01 ... diaspora* >> oAnth ... friendfeed >> 02myffeed01 ::
ABOUT THE CURRENT SOUP.IO STATUS - latest entry 2012-03-27
2012-05-08 - oAnth: during the coming days I will hardly be able to pursue personal online activities; only RSS import will be provided, if soup.io works regularly.
May 02 2012
Recombinant Research: Breaking open rewards and incentives
In the previous articles in this series I've looked at problems in current medical research, and at the legal and technical solutions proposed by Sage Bionetworks. Pilot projects have shown encouraging results, but to move from a hothouse environment of experimentation to the mainstream of one of the world's most lucrative and tradition-bound industries, Sage Bionetworks must aim for its nucleus: rewards and incentives.
Previous article in the series: Sage Congress plans for patient engagement.
Think about the publication system, that wretchedly inadequate medium for transferring information about experiments. Getting the data on which a study was based is incredibly hard; getting the actual samples or access to patients is usually impossible. Just as boiling vegetables drains most of their nutrients into the water, publishing results of an experiment throws away what is most valuable.
But the publication system has been built into the foundation of employment and funding over the centuries. A massive industry provides distribution of published results to libraries and research institutions around the world, and maintains iron control over access to that network through peer review and editorial discretion. Even more important, funding grants require publication (and, only very recently, release of the data behind the study). And of course, advancement in one's field requires publication.
Lawrence Lessig, in his keynote, castigated for-profit journals for restricting access to knowledge in order to puff up profits. A chart in his talk showed skyrocketing prices for for-profit journals in comparison to non-profit journals. Lessig is not out on the radical fringe in this regard; Harvard Library is calling the current pricing situation "untenable" in a move toward open access echoed by many in academia.

Lawrence Lessig keynote at Sage Congress.
How do we open up this system that seemed to serve science so well for so long, but is now becoming a drag on it? One approach is to expand the notion of publication. This is what Sage Bionetworks is doing with Science Translational Medicine in publishing validated biological models, as mentioned in an earlier article. An even more extensive reset of the publication model is found in Open Network Biology (ONB), an online journal. The publishers require that an article be accompanied by the biological model, the data and code used to produce the model, a description of the algorithm, and a platform to aid in reproducing results.
But neither of these worthy projects changes the external conditions that prop up the current publication system.
When one tries to design a reward system that gives deserved credit to other things besides the final results of an experiment, as some participants did at Sage Congress, great unknowns loom up. Is normalizing and cleaning data an activity worth praise and recognition? How about combining data sets from many different projects, as a Synapse researcher did for the TCGA? How much credit do you assign researchers at each step of the necessary procedure for a successful experiment?
Let's turn to the case of free software to look at an example of success in open sharing. It's clear that free software has swept the computer world. Most web sites use free software ranging from the server on which they run to the language compilers that deliver their code. Everybody knows that the most popular mobile platform, Android, is based on Linux, although fewer realize that the next most popular mobile platforms, Apple's iPhones and iPads, run on an operating system derived in part from open source BSD code. We could go on and on citing ways in which free and open source software have changed the field.
The mechanism by which free and open source software staked out its dominance in so many areas has not been authoritatively established, but I think many programmers agree on a few key points:
- Computer professionals encountered free software early in their careers, particularly as students or tinkerers, and brought their predilection for it into jobs they took at stodgier institutions such as banks and government agencies. Their managers deferred to them on choices for programming tools, and the rest is history.
- Of course, computer professionals would not have chosen the free tools had they not been fit for the job (and often best for the job). Why is free software so good? Probably because the people creating it have complete jurisdiction over what to produce and how much time to spend producing it, unlike in commercial ventures with requirements established through marketing surveys and deadlines set unreasonably by management.
- Different pieces of free software are easy to hook up, because one can alter their interfaces as necessary. Free software developers tend to look for other tools and platforms that could work with their own, and provide hooks into them (Apache, free database engines such as MySQL, and other such platforms are often accommodated). Customers of proprietary software, in contrast, experience constant frustration when they try to introduce a new component or change components, because the software vendors are hostile to outside code (except when they are eager to fill a niche left by a competitor with market dominance). Formal standards cannot overcome vendor recalcitrance--a painful truth particularly obvious in health care with quasi-standards such as HL7.
- Free software scales. Programmers work on it tirelessly until it's as efficient as it needs to be, and when one solution just can't scale any more, programmers can create new components such as Cassandra, CouchDB, or Redis that meet new needs.
Are there lessons we can take from this success story? Biological research doesn't fit the circumstances that made open source software a success. For instance, researchers start out low on the totem pole in very proprietary-minded institutions, and don't get to choose new ways of working. But the cleverer ones are beginning to break out and try more collaboration. Software and Internet connections help.
Researchers tend to choose formats and procedures on an ad hoc, project by project basis. They haven't paid enough attention to making their procedures and data sets work with those produced by other teams. This has got to change, and Sage Bionetworks is working hard on it.
Research is labor-intensive. It needs desperately to scale, as I have pointed out throughout this article, but to do so it needs entirely new paradigms for thinking about biological models, workflow, and teamwork. This too is part of Sage Bionetworks' mission.
Certain problems are particularly resistant in research:
- Conditions that affect small populations have trouble raising funds for research. The Sage Congress initiatives can lower research costs by pooling data from the affected population and helping researchers work more closely with patients.
- Computation and statistical methods are very difficult fields, and biological research is competing with every other industry for the rare individuals who know these well. All we can do is bolster educational programs for both computer scientists and biologists to get more of these people.
- There's a long lag time before one knows the effects of treatments. As Heywood's keynote suggested, this is partly solved by collecting longitudinal data on many patients and letting them talk among themselves.
Another process change has revolutionized the computer field: agile programming. That paradigm stresses close collaboration with the end-users whom the software is supposed to benefit, and a willingness to throw out old models and experiment. BRIDGE and other patient initiatives hold out the hope of a similar shift in medical research.
All these things are needed to rescue the study of genetics. It's a lot to do all at once. Progress on some fronts was more apparent than on others at this year's Sage Congress. But as more people get drawn in, and sometimes fumbling experiments produce maps for changing direction, we may start to see real outcomes from the efforts in upcoming years.
All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.
OSCON 2012: Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain. Save 20% on registration with the code RADAR20
May 01 2012
Recombinant Research: Sage Congress plans for patient engagement
Clinical trials are the pathway for approving drug use, but they aren't good enough. That has become clear as a number of drugs (Vioxx being the most famous) have been blessed by the FDA, but disqualified after years of widespread use reveal either lack of efficacy or dangerous side effects. And the measures taken by the FDA recently to solve this embarrassing problem continue the heavy-weight bureaucratic methods it has always employed: more trials, raising the costs of every drug and slowing down approval. Although I don't agree with the opinion of Avik S. A. Roy (reprinted in Forbes) that Phase III trials tend to be arbitrary, I do believe it is time to look for other ways to test drugs for safety and efficacy.
First article in the series: Recombinant Research: Sage Congress Promotes Data Sharing in Genetics.
But the Vioxx problem is just one instance of the wider malaise afflicting the drug industry. They just aren't producing enough new medications, either to solve pressing public needs or to keep up their own earnings. Vicki Seyfert-Margolis of the FDA built on her noteworthy speech at last year's Sage Congress (reported in one of my articles about the conference) with the statistic that drug companies submitted 20% fewer medications to the FDA between 2001 and 2007. Their blockbuster drugs produce far fewer profits than before as patents expire and fewer new drugs emerge (a predicament called the "patent cliff"). Seyfert-Margolis intimated that this crisis is the cause of layoffs in the industry, although I heard elsewhere that the companies are outsourcing more research, so perhaps the downsizing is just a reallocation of the same money.
Benefits of patient involvement
The field has failed to rise to the challenges posed by new complexity. Speakers at Sage Congress seemed to feel that genetic research has gone off the tracks. As the previous article in this series explained, Sage Bionetworks wants researchers to break the logjam by sharing data and code in GitHub fashion. And surprisingly, pharma is hurting enough to consider going along with an open research system. They're bleeding from a situation where as much as 80% of each clinical analysis is spent retrieving, formatting, and curating the data. Meanwhile, Kathy Giusti of the Multiple Myeloma Research Foundation says that in their work, open clinical trials are 60% faster.
Attendees at a breakout session where I sat in, including numerous managers from major pharma companies, expressed confidence that they could expand public or "pre-competitive" research in the direction Sage Congress proposed. The sector left to engage is the one that's central to all this work--the public.
If we could collect wide-ranging data from, say, 50,000 individuals (a May 2013 goal cited by John Wilbanks of Sage Bionetworks, a Kauffman Foundation Fellow), we could uncover a lot of trends that clinical trials are too narrow to turn up. Wilbanks ultimately wants millions of such data samples, and another attendee claimed that "technology will be ready by 2020 for a billion people to maintain their own molecular and longitudinal health data." And Jamie Heywood of PatientsLikeMe, in his keynote, claimed to have demonstrated through shared patient notes that some drugs were ineffective long before the FDA or manufacturers made the discoveries. He decried the current system of validating drugs for use and then failing to follow up with more studies, snorting that, "Validated means that I have ceased the process of learning."
But patients have good reasons to keep a close hold on their health data, fearing that an insurance company, an identity thief, a drug marketer, or even their own employer will find and misuse it. They already have little enough control over it, because the annoying consent forms we always have shoved in our faces when we come to a clinic give away a lot of rights. Current laws allow all kinds of funny business, as shown in the famous case of the Vermont law against data mining, which gave the Supreme Court a chance to say that marketers can do anything they damn please with your data, under the excuse that it's de-identified.
In a noteworthy poll by Sage Bionetworks, 80% of academics claimed they were comfortable sharing their personal health data with family members, but only 31% of citizen advocates would do so. If that 31% is more representative of patients and the general public, how many would open their data to strangers, even when supposedly de-identified?
The Sage Bionetworks approach to patient consent
It's basic research that loses. So Wilbanks and a team have been working for the past year on a "portable consent" procedure. This is meant to overcome the hurdle by which a patient has to be contacted and give consent anew each time a new researcher wants data related to his or her genetics, conditions, or treatment. The ideal behind portable consent is to treat the entire research community as a trusted user.
The current plan for portable consent provides three tiers:
- Tier 1: No restrictions on data, so long as researchers follow the terms of service. Hopefully, millions of people will choose this tier.
- Tier 2: A middle ground. Someone with asthma may state that his data can be used only by asthma researchers, for example.
- Tier 3: Carefully controlled. Meant for data coming from sensitive populations, along with anything that includes genetic information.
Synapse provides a trusted identification service. If researchers find a person with useful characteristics in the last two tiers, and are not authorized automatically to use that person's data, they can contact Synapse with the random number assigned to the person. Synapse keeps the original email address of the person on file and will contact him or her to request consent.
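The three tiers and the contact step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sage Bionetworks' implementation: the names `Tier`, `ConsentRecord`, `can_use`, and `needs_contact` are hypothetical, a Tier 2 restriction is assumed to be a set of permitted research areas, and Tier 3 data is assumed never to be released without an explicit request.

```python
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    OPEN = 1        # Tier 1: no restrictions beyond the terms of service
    RESTRICTED = 2  # Tier 2: donor limits use, e.g. to asthma research
    CONTROLLED = 3  # Tier 3: sensitive populations or genetic data


@dataclass
class ConsentRecord:
    # Hypothetical record; the article only says each person gets a random number.
    random_id: str
    tier: Tier
    allowed_areas: set = field(default_factory=set)  # consulted for Tier 2 only


def can_use(record: ConsentRecord, research_area: str, accepted_terms: bool) -> bool:
    """Return True if the researcher may use the data without further consent."""
    if not accepted_terms:
        return False
    if record.tier is Tier.OPEN:
        return True
    if record.tier is Tier.RESTRICTED:
        return research_area in record.allowed_areas
    return False  # assumption: Tier 3 always requires an explicit request


def needs_contact(record: ConsentRecord, research_area: str, accepted_terms: bool) -> bool:
    """True when the researcher should ask the identification service to reach the donor."""
    return (
        accepted_terms
        and record.tier is not Tier.OPEN
        and not can_use(record, research_area, accepted_terms)
    )
```

In this sketch a cancer researcher looking at an asthma-only Tier 2 record gets `can_use(...) == False` and `needs_contact(...) == True`, which corresponds to passing the person's random number to the identification service so that it can email the request for consent.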
Portable consent also involves a lot of patient education. People will sign up through a software wizard that explains the risks. After choosing portable consent, the person decides how much to put in: 23andMe data, prescriptions, or whatever they choose to release.
Sharon Terry of the Genetic Alliance said that patient advocates currently try to control patient data in order to force researchers to share the work they base on that data. Portable consent loosens this control, but the field may be ready for its more flexible conditions for sharing.
Pharma companies and genetics researchers have lots to gain from access to enormous repositories of patient data. But what do the patients get from it? Leaders in health care already recognize that patients are more than experimental subjects and passive recipients of treatment. The recent ONC proposal for Stage 2 of Meaningful Use includes several requirements to share treatment data with the people being treated (which seems kind of a no-brainer when stated this baldly) and the ONC has a Consumer/Patient Engagement Power Team.
Sage Congress is fully engaged in the patient engagement movement too. One result is the BRIDGE initiative, a joint project of Sage Bionetworks and Ashoka with funding from the Robert Wood Johnson Foundation, to solicit questions and suggestions for research from patients. Researchers can go for years researching a condition without even touching on some symptom that patients care about. Listening to patients in the long run produces more cooperation and more funding.
Portable consent requires a leap of faith, because as Wilbanks admits, releasing aggregates of patient data means that over time, a patient is almost certain to be re-identified. Statistical techniques are just getting too sophisticated and compute power growing too fast for anyone to hide behind current tricks such as using only the first three digits of a five-digit postal code. Portable consent requires the data repository to grant access only to bona fide researchers and to set terms of use, including a ban on re-identifying patients. Still, researchers will have rights to do research, redistribute data, and derive products from it. Audits will be built in.
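To see why truncating postal codes offers thin cover, here is a small sketch in Python. The records are made up for illustration, and the helper names `generalize_zip` and `smallest_group` are hypothetical. It counts how many records share the same combination of attributes; a group of size one means that person stands alone in the data.

```python
from collections import Counter


def generalize_zip(zip_code: str) -> str:
    """Keep only the first three digits of a five-digit postal code."""
    return zip_code[:3] + "**"


def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same values for keys."""
    groups = Counter(tuple(r[k] for k in keys) for r in records)
    return min(groups.values())


# Made-up records, for illustration only.
records = [
    {"zip": "02139", "birth_year": 1961, "sex": "F"},
    {"zip": "02138", "birth_year": 1961, "sex": "F"},
    {"zip": "02143", "birth_year": 1975, "sex": "M"},
    {"zip": "02144", "birth_year": 1975, "sex": "M"},
    {"zip": "02145", "birth_year": 1988, "sex": "F"},
]
masked = [{**r, "zip": generalize_zip(r["zip"])} for r in records]

print(smallest_group(masked, ["zip"]))                       # 5
print(smallest_group(masked, ["zip", "birth_year", "sex"]))  # 1
```

The truncated postal code alone puts all five people in one group, but combining it with two other ordinary attributes already isolates one of them.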
But as mentioned by Kelly Edwards of the University of Washington, tools and legal contracts can contribute to trust, but trust is ultimately based on shared values. Portable consent, properly done, engages with frameworks like Synapse to create a culture of respect for data.
In fact, I think the combination of the contractual framework in portable consent and a platform like Synapse, with its terms of use, might make a big difference in protecting patient privacy. Seyfert-Margolis cited predictions that 500 million smartphone users will be using medical apps by 2015. But mobile apps are notoriously greedy for personal data and cavalier toward user rights. Suppose all those smartphone users stored their data in a repository with clear terms of use and employed portable consent to grant access to the apps? We might all be safer.
The final article in this series will evaluate the prospects for open research in genetics, with a look at the grip of journal publishing on the field, and some comparisons to the success of free and open source software.
Next: Breaking Open Rewards and Incentives. All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.
April 30 2012
Recombinant Research: Sage Congress promotes data sharing in genetics
Given the exponential drop in the cost of personal genome sequencing (you can get a basic DNA test from 23andMe for a couple hundred dollars, and a full sequence will probably soon come down to one thousand dollars in cost), a new dawn seems to be breaking forth for biological research. Yet the assessment of genetics research at the recent Sage Congress was highly cautionary. Various speakers chided their own field for tilling the same ground over and over, ignoring the urgent needs of patients, and just plain researching the wrong things.
Sage Congress also has some plans to fix all that. These projects include tools for sharing data and storing it in cloud facilities, running challenges, injecting new fertility into collaboration projects, and ways to gather more patient data and bring patients into the planning process. Through two days of demos, keynotes, panels, and breakout sessions, Sage Congress brought its vision to a high-level cohort of 230 attendees from universities, pharmaceutical companies, government health agencies, and others who can make change in the field.
In the course of this series of articles, I'll pinpoint some of the pain points that can force researchers, pharmaceutical companies, doctors, and patients to work together better. I'll offer a look at the importance of public input, legal frameworks for cooperation, the role of standards, and a number of other topics. But we'll start by seeing what Sage Bionetworks and its pals have done over the past year.
Synapse: providing the tools for genetics collaboration
Everybody understands that change is driven by people and the culture they form around them, not by tools, but good tools can make it a heck of a lot easier to drive change. To give genetics researchers the best environment available to share their work, Sage Bionetworks created the Synapse platform.
Synapse recognizes that data sets in biological research are getting too large to share through simple data transfers. For instance, in his keynote about cancer research (where he kindly treated us to pictures of cancer victims during lunch), UC Santa Cruz professor David Haussler announced plans to store 25,000 cases at 200 gigabytes per case in the Cancer Genome Atlas, also known as TCGA in what seems to be a clever pun on the four nucleotides in DNA. Storage requirements thus work out to 5 petabytes, which Haussler wants to be expandable to 20 petabytes. In the face of big data like this, the job becomes moving the code to the data, not moving the data to the code.
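The storage figure can be checked with a few lines of Python, using decimal units (1 petabyte = 1,000,000 gigabytes). The second half is a rough illustration of why moving the data is impractical; the sustained 1 gigabit per second link is an assumption for the sake of the example, not a figure from the keynote.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB

cases = 25_000
gb_per_case = 200

total_gb = cases * gb_per_case
total_pb = total_gb / GB_PER_PB
print(total_pb)  # 5.0

# Assumed link speed, for illustration only: a sustained 1 gigabit per second.
link_gbit_per_s = 1
seconds = total_gb * 8 / link_gbit_per_s  # 8 bits per byte
days = seconds / 86_400
print(round(days))  # 463
```

Under that assumption, copying the full 5 petabytes would take well over a year, while a script of analysis code moves in moments.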
Synapse points to data sets contributed by cooperating researchers, but also lets you pull up a console in a web browser to run R or Python code on the data. Some effort goes into tagging each data set with associated metadata: tissue type, species tested, last update, number of samples, etc. Thus, you can search across Synapse to find data sets that are pertinent to your research.
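A metadata search of this kind can be sketched as follows. This is not the Synapse API: the `DataSet` class, the `search` function, and the catalog entries are hypothetical, and only the field names (tissue type, species, last update, number of samples) come from the description above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DataSet:
    name: str
    tissue_type: str
    species: str
    last_update: date
    num_samples: int


def search(catalog, **criteria):
    """Return data sets whose metadata fields equal every given criterion."""
    return [
        ds for ds in catalog
        if all(getattr(ds, key) == value for key, value in criteria.items())
    ]


# Made-up catalog entries, for illustration only.
catalog = [
    DataSet("study-a", "breast", "human", date(2012, 3, 1), 500),
    DataSet("study-b", "liver", "mouse", date(2011, 11, 15), 120),
    DataSet("study-c", "breast", "mouse", date(2012, 1, 20), 80),
]

hits = search(catalog, tissue_type="breast", species="human")
print([ds.name for ds in hits])  # ['study-a']
```

The point of the sketch is that consistent tagging is what makes the search possible at all: a data set without a `tissue_type` value simply cannot be found this way.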
One group working with Synapse has already harmonized and normalized the data sets in TCGA so that a researcher can quickly mix and run stats on them to extract emerging patterns. The effort took about one and a half full-time employees for six months, but the project leader is confident that with the system in place, "we can activate a similar size repository in hours."
This contribution highlights an important principle behind Synapse (appropriately called "viral" by some people in the open source movement): when you have manipulated and improved upon the data you find through Synapse, you should put your work back into Synapse. This work could include cleaning up outlier data, adding metadata, and so on. To make work sharing even easier, Synapse has plans to incorporate the Amazon Simple Workflow Service (SWF). It also hopes to add web interfaces to allow non-programmers to do useful work with data.
The Synapse development effort was an impressive one, coming up with a feature-rich Beta version in a year with just four coders. And Synapse code is entirely open source. So not only is the data distributed, but the creators will be happy for research institutions to set up their own Synapse sites. This may make Synapse more appealing to geneticists who are prevented by inertia from visiting the original Synapse.
Mike Kellen, introducing Synapse, compared its potential impact to that of moving research from a world of journals to a world like GitHub, where people record and share every detail of their work and plans. Along these lines, Synapse records who has used a data set. This has many benefits:
- Researchers can meet up with others doing related work.
- It gives public interest advocates a hook with which to call on those who benefit commercially from Synapse--as we hope the pharmaceutical companies will--to contribute money or other resources.
- Members of the public can monitor accesses for suspicious uses that may be unethical.
There's plenty more work to be done to get data in good shape for sharing. Researchers must agree on some kind of metadata--the dreaded notion of ontologies came up several times--and clean up their data. They must learn about data provenance and versioning.
But sharing is critical for such basics of science as reproducing results. One source estimates that 75% of published results in genetics can't be replicated. A later article in this series will examine a new model in which enough metainformation is shared about a study for it to be reproduced, and even more important to be a foundation for further research.
With this Beta release of Synapse, Sage Bionetworks feels it is ready for a new initiative to promote collaboration in biological research. But how do you get biologists around the world to start using Synapse? For one, try an activity that's gotten popular nowadays: a research challenge.
The Sage DREAM challenge
Sage Bionetworks' DREAM challenge asks genetics researchers to find predictors of the progression of breast cancer. The challenge uses data from 2000 women diagnosed with breast cancer, combining information on DNA alterations affecting how their genes were expressed in the tumors, clinical information about their tumor status, and their outcomes over ten years. The challenge is to build models integrating the alterations with molecular markers and clinical features to predict which women will have the most aggressive disease over a ten year period.
Several hidden aspects of the challenge make it a clever vehicle for Sage Bionetworks' values and goals. First, breast cancer is a scourge whose urgency is matched by its stubborn resistance to diagnosis. The famous 2009 recommendations of the U.S. Preventive Services Task Force, after all the controversy was aired, left us with the dismal truth that we don't know a good way to predict breast cancer. Some women get mastectomies in the total absence of symptoms based just on frightening family histories. In short, breast cancer puts the research and health care communities in a quandary.
We need finer-grained predictors to say who is likely to get breast cancer, and standard research efforts up to now have fallen short. The Sage proposal is to marshal experts in a new way that combines their strengths, asking them to publish models that show the complex interactions between gene targets and influences from the environment. Sage Bionetworks will publish data sets at regular intervals that it uses to measure the predictive ability of each model. A totally fresh data set will be used at the end to choose the winning model.
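The evaluation scheme just described, with periodic scoring followed by a final ranking on data no one has seen, can be sketched in Python. Everything here is a stand-in: the toy models, the made-up data, and the use of plain accuracy as the score are assumptions for illustration, not the challenge's actual metric or code.

```python
def accuracy(model, data):
    """Fraction of (features, outcome) pairs the model predicts correctly."""
    correct = sum(1 for features, outcome in data if model(features) == outcome)
    return correct / len(data)


def leaderboard(models, data):
    """Rank model names by score on one data set, best first."""
    scores = {name: accuracy(model, data) for name, model in models.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Two toy models over a single made-up marker value.
models = {
    "threshold-5": lambda x: x > 5,
    "always-true": lambda x: True,
}

interim = [(2, False), (7, True), (9, True), (4, True)]  # periodic scoring set
final = [(1, False), (3, False), (8, True), (6, True)]   # fresh set, used once at the end

print(leaderboard(models, interim))
print(leaderboard(models, final)[0][0])  # threshold-5
```

Holding back the final data set matters because a model can be tuned, knowingly or not, to the interim sets it has been scored against; only the unseen set measures how well it generalizes.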
The process behind the challenge--particularly the need to upload code in order to run it on the Synapse site--automatically forces model builders to publish all their code. According to Stephen Friend, founder of Sage Bionetworks, "this brings a level of accountability, transparency, and reproducibility not previously achieved in clinical data model challenges."
Finally, the process has two more effects: it shows off the huge amount of genetic data that can be accessed through Synapse, and it encourages researchers to look at each other's models in order to boost their own efforts. In less than a month, the challenge has already received more than 100 models from 10 sources.
The reward for winning the challenge is publication in a respected journal, the gold medal still sought by academic researchers. (More on shattering this obelisk later in the series.) Science Translational Medicine will accept results of the evaluation as a stand-in for peer review, a real breakthrough for Sage Bionetworks because it validates their software-based, evidence-driven process.
Finally, the DREAM challenge promotes use of the Synapse infrastructure, and in particular the method of bringing the code to the data. Google is donating server space for the challenge, which levels the playing field for researchers, freeing them from paying for their own computing.
A single challenge doesn't solve all the problems of incentives, of course. We still need to persuade researchers to put up their code and data on a kind of genetic GitHub, persuade pharmaceutical companies to support open research, and persuade the general public to share data about their phenomes (life data) and genes--all topics for upcoming articles in the series.
Next: Sage Congress Plans for Patient Engagement. All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.
April 19 2012
Sage Congress: The synthesis of open source with genetics
For several years, O'Reilly Radar has been covering the exciting potential that open source software, open data, and a general attitude of sharing and cooperation bring to health care. Along with many exemplary open source projects in areas directly affecting the public, such as the VA's Blue Button in electronic medical records and the Direct project (http://wiki.directproject.org/) in data exchange, the study of disease is undergoing a paradigm shift.
Sage Bionetworks stands at the center of a wide range of academic researchers, pharmaceutical companies, government agencies, and health providers realizing that the old closed system of tiny teams who race each other to a cure has got to change. Today's complex health problems, such as Alzheimer's, AIDS, and cancer, are too big for a single team. And these institutions are slowly wrenching themselves out of the habit of data hoarding and finding ways to work together.
A couple weeks ago I talked to the founder of Sage Bionetworks, Stephen Friend, about recent advances in open source in this area, and the projects to be highlighted at the upcoming Sage Commons congress (http://sagecongress.org/). Steve is careful to call this a "congress" instead of a "conference" because all attendees are supposed to pitch in and contribute to the meme pool. I covered Sage Congress in a series of articles last year. The following podcast ranges over topics such as:
- what is Sage Bionetworks [Discussed at the 00:25 mark];
- the commitment of participants to open source software [Discussed at the 01:01 mark];
- how open source can support a business model in drug development [Discussed at the 01:40 mark];
- a look at the upcoming congress [Discussed at the 03:47 mark];
- citizen-led contributions or network science [Discussed at the 06:12 mark];
- data sharing philosophy [Discussed at the 09:01 mark];
- when projects are shared with other institutions [Discussed at the 12:43 mark];
- how to democratize medicine [Discussed at the 17:10 mark];
- a portable legal consent approach where the patient controls his or her own data [Discussed at the 20:07 mark];
- solving the problem of non-sharing in the industry [Discussed at the 22:15 mark]; and
- key speakers at the congress [Discussed at the 26:35 mark].
Sessions from the congress will be broadcast live via webcast and posted on the Internet.
April 14 2012
April 06 2012
Promoting and documenting a small software project: VoIP Drupal update
Isn't the integration of mobile phones and the Web one of the hot topics in modern technology? If so, VoIP Drupal should become a fixture of web development and administration. I have been meeting with leaders of the project to help with their documentation and publicity. I reported on my first meeting with them here on Radar, and this posting is one of a series of follow-ups I plan to write.
Immediate pressures
When Leo Burd, the lead developer of VoIP Drupal, first pulled together a small cohort of supporters to help with documentation and promotion, only three weeks were left before DrupalCon, the major Drupal annual conference. Leo was doing some presentations there, and the pressing question was what users would want in order to get interested in VoIP Drupal, learn more about it, and be able to get started.
Leo was indefatigable. He planned to get some new modules finished before the conference, but in addition to coding and preparing his own presentations, he wanted to address the lack of introductory materials because it might get in the way of enticing new users to try VoIP Drupal.
In the end, Leo led a webinar to drum up interest. The little group of half a dozen self-selected fans reviewed his slides, sat in on a preview, and helped whip it into a really well-focused, hard-hitting survey. This webinar had 94 participants from 19 countries and more than 40 companies.
Michele Metts, a non-profit activist and Drupal consultant, also stepped up to create slides and lead webinars explaining VoIP Drupal basics.

Slide from a VoIP Drupal webinar
Although we agreed that many parts of the system needed more documentation, it was a more effective use of time at this point to do webinars. VoIP Drupal has many interactive aspects that are best shown off through demonstrations. As I'll explain, documentation would require more resources.
Perceptions and challenges
Responses to the webinar and to Leo's DrupalCon presentation helped us understand better what it would take to win more adherents to this useful tool. Leo reported that few people knew of the existence of VoIP Drupal, but that when they heard of it, they assumed it would be expensive. His impression was that they knew traditional PBX systems and didn't realize how much more low-cost and light-weight VoIP is.
In my opinion, Web administrators (and many content providers) still see the Web and the telephone as separate worlds, never to meet. This despite the increasing popularity of running queries and Internet apps on mobile devices. We have to address server-side apathy. There should come a day when the things VoIP Drupal does are taken for granted: people leaving phone numbers on Web sites to receive SMS or voice updates, pressing Web buttons on their cell phones to get an interactive voice menu, etc.
The versatility of VoIP Drupal gives it fantastic potential but makes it harder to explain in an elevator speech. The ways in which voice can be integrated with the Web are nearly infinite. In addition, two distinctly different settings can benefit from VoIP. One consists of highly structured corporate sites, which can use VoIP Drupal for such typical bureaucratic tasks as offering interactive voice menus and routing customers to extensions. The other consists of sites serving underdeveloped areas, where people are more adept at using their phones than dealing with text-based Web sites. We need different pitches for each potential user.
Finally (and here my training as an editor pokes in), there are a number of different audiences who need to understand VoIP Drupal on different levels. Content providers need to understand what it can do and see models for incorporating it into their Drupal sites. Administrators need to understand how to find scripts and offer them to content providers. And programmers need to learn how to write fresh scripts. We also want to attract open source developers who could help enhance and maintain VoIP Drupal. There is even a tool called Visual VoIP Drupal that reduces the programming skill required to create a new script to something close to a minimum, which creates yet another audience.
Documentation needs
My goal in joining the informal VoIP Drupal promotion team was to use this project as a test case for exploring what software projects do for documentation, and how they could do better.
A number of sites have successfully implemented VoIP Drupal, some of them in quite sophisticated ways, but you could call these the alpha developers or early adopters. They managed, for instance, to find the reference documentation that requires several clicks to access, and which I did not notice for weeks. And needless to say, they required no other documentation, because the tutorials are extremely rudimentary.
I think that VoIP Drupal documentation is typical of early software projects--and in fact better than many. Projects tend to toss the reader a brief tutorial, provide some examples with inadequate explanatory text, and finally dunk the reader in the middle of an API reference. The tutorial tends to be fragile, in the sense that any problem encountered by the reader (a missing step, a change in environment) leaves her in an unrecoverable state, because there is no documentation about the way the system works and its assumptions. And the context for understanding reference documentation is missing, so that it can be used only by experienced developers who have seen similar programs in other environments.
Useful VoIP Drupal documentation includes (these are examples):
Motivational, overview documentation to stimulate the imagination
A very preliminary tutorial that starts a reader off with a minimal program using VoIP's high-level scripting library, and even more minimal documentation for the underlying PHP programming calls.
Some preliminary guidelines for core scripting activities.
Reference documentation in the form of a dump of doc strings from source code.
Our team knew that more descriptive text was needed to pull together these pieces, but whoever wrote a document would need some intensive time with a core developer. This was unfeasible in the rush to get the software in shape for DrupalCon. I found out about the difficulties of producing documents first-hand when I decided to tackle Visual VoIP Drupal, which looked simple and intuitive. Unfortunately, there were a lot of oddities in its behavior, and the simplicity of the interface didn't save me from having to know some of the subtler and less documented aspects of VoIP Drupal programming, such as handling input variables.
In a recent teleconference, I asked a bunch of preliminary questions and got a better idea of what documentation already covers, as well as a starting point for doing more documentation.
Current documentation tasks, in my opinion, include:
Expand the tutorials to show more of the capabilities of VoIP Drupal.
Provide explanations of key topics, such as different ways to handle voice input, keyboard input, and the metainformation provided by VoIP Drupal about each call. The developers' provision of a simple scripting system on top of PHP, and even more the creation of Visual VoIP Drupal, demonstrates their commitment to reaching non-programmers, but we have to follow through by filling in the background they lack.
Create a few videos or webinars on Visual VoIP Drupal.
Make links to the reference documentation more prominent, and link to it liberally in the tutorials and background documents. The use of doc strings from the code is reasonable for reference documentation, because nobody is asking it to look pretty and we want to maximize the probability that it will stay up to date.
Ask the community for examples and case studies, and describe what makes each one an interesting use of VoIP Drupal.
There's plenty that could be done, such as describing how to integrate VoIP Drupal into existing PHP code (and therefore more fully into existing Drupal pages), but that can be postponed. Leo said he's "particularly interested in working with Micky and the rest [of] this group in the creation of a Visual VoIP Drupal webinar, and one about things we can do with VoIP Drupal right out of the box, with no or minimum programming."
What motivates people like Michele and me to put so much time into this project? Certainly, Michele wants to promote a project that she uses in her own work so that it thrives and continues to evolve. I can use what I learn from this work to provide services to other open source communities. But beyond all these individual rewards is a gut feeling that VoIP Drupal is cool. Using it is fun, and talking about it is also fun. Projects have achieved success for more light-weight reasons.
April 05 2012
Steep climb for National Cancer Institute toward open source collaboration
Although a lot of government agencies produce open source software, hardly any develop relationships with a community of outside programmers, testers, and other contributors. I recently spoke to John Speakman of the National Cancer Institute to learn about their crowdsourcing initiative and the barriers they've encountered.
First let's orient ourselves a bit--forgive me for dumping out a lot of abbreviations and organizational affiliations here. The NCI is part of the National Institutes of Health. Speakman is the Chief Program Officer for NCI's Center for Biomedical Informatics and Information Technology. Their major open source software initiative is the Cancer Biomedical Informatics Grid (caBIG), which supports tools for transferring and manipulating cancer research data. For example, it provides access to data classifying the carcinogenic aspects of genes (The Cancer Genome Atlas) and resources to help researchers ask questions of and visualize this data (the Cancer Molecular Analysis Portal).
Plenty of outside researchers use caBIG software, but it's a one-way street, somewhat in the way the Department of Veterans Affairs used to release its VistA software. NCI sees the advantages of a give-and-take such as the CONNECT project has achieved, through assiduous cultivation of interested outside contributors, and wants to wean its outside users away from the dependent relationship that has been all take and no give. And even the VA decided last year that a more collaborative arrangement for VistA would benefit them, thus putting the software under the guidance of an independent non-profit, the Open Source Electronic Health Record Agent (OSEHRA).
Another model is Forge.mil, which the Department of Defense set up with the help of CollabNet, the well-known organization in charge of the Subversion revision control tool. Forge.mil represents a collaboration between the DoD and private contractors, encouraging them to create shared libraries that hopefully increase each contractor's productivity, but it is not open source.
The OSEHRA model--creating an independent, non-government custodian--seems a robust solution, although it takes a lot of effort and risks failure if the organization can't create a community around the project. (Communities don't just spring into being at the snap of a bureaucrat's fingers, as many corporations have found to their regret.) In the case of CONNECT, the independent Alembic Foundation stepped in to fill the gap after a lawsuit stalled CONNECT's development within the government. According to Alembic co-founder David Riley, with the contract issues resolved, CONNECT's original sponsor--the Office of the National Coordinator--is spinning off CONNECT to a private sector, open source entity, and work is underway to merge the two baselines.
Whether an agency manages its own project or spins off management, it has to invest a lot of work to turn an internal project into one that appeals to outside developers. This burden has been discovered by many private corporations as well as public entities. Tasks include:
Setting up public repositories for code and data.
Creating a clean software package with good version control that makes downloading and uploading simple.
Possibly adding an API to encourage third-party plugins, an effort that may require a good deal of refactoring and a definition of clear interfaces.
Substantially adding to the documentation.
General purging of internal code and data (sometimes even passwords!) that get in the way of general use.
Companies and institutions have also learned that "build it and they will come" doesn't usually work. An open source or open data initiative must be promoted vigorously, usually with challenges and competitions such as the Department of Health and Human Services offers in its annual Health Data Initiative forums (a.k.a. datapaloozas).
With these considerations in mind, the NCI decided in the summer of 2011 to start looking for guidance and potential collaborators. Here, laws designed long ago to combat cronyism put up barriers. The NCI was not allowed to contact anyone it wanted out of the blue. Instead, it had to issue a Request for Information and talk to people who responded. Although the RFI went online, it obviously wasn't widely seen. After all, do you regularly look for RFIs and RFPs from government agencies? If so, I can safely guess that you're paid by a large company or lobbying agency to follow a particular area of interest.
RFIs and RFPs are released as a gesture toward transparency, but in reality they just make it easier for the usual crowd of established contractors and lobbyists to build on the relationships they already have with agencies. And true to form, the NCI received only a limited set of responses and was frustrated in its attempts to talk to new actors with the expertise it needed for its open source efforts.
And because the RFI had to allow a limited time window for responses, there is no point in responding to it now.
Still, Speakman and his colleagues are educating themselves and meeting with stakeholders. Cancer research is a hot topic drawing zealous attention from many academic and commercial entities, and they're hungry for data. Already, the NCI is encouraged by the initial positive response from the cancer informatics community, many of whom are eager to see the caBIG software deposited in an open repository like GitHub right away. Luckily, HHS has already negotiated terms of service with GitHub and SourceForge, removing at least one important barrier to entry. The NCI is packaging its first tool (a laboratory information management system called caLIMS) for deposit into a public repository. So I'm hoping the NCI is too caBIG to fail.
February 23 2012
Report from HIMSS 2012: toward interoperability and openness
I was wondering how it would feel to be in the midst of 35,000 people whose livelihoods are driven by the decisions of a large institution at the moment when that institution releases a major set of rules. I didn't really find out, though. The 35,000 people I speak of are the attendees of the HIMSS conference and the institution is the Department of Health and Human Services. But HHS just sort of half-released the rules (called Stage 2 of meaningful use), telling us that they would appear online tomorrow and meanwhile rushing over a few of the key points in a presentation that drew overflow crowds in two rooms.
The reaction, I sensed, was a mix of relief and frustration. Relief because Farzad Mostashari, National Coordinator for Health Information Technology, promised us the rules would be familiar and would hew closely to what advisors had requested. Frustration, however, at not seeing the details. The few snippets put up on the screen contained enough ambiguities and poorly worded phrases that I'm glad there's a 60-day comment period before the final rules are adopted.
There isn't much one can say about the Stage 2 rules until they are posted and the experts have a chance to parse them closely, and I'm a bit reluctant to throw onto the Internet one of potentially 35,000 reactions to the announcement, but a few points struck me enough to be worth writing about. Mostashari used his pulpit for several pronouncements about the rules:
HHS would push ahead on goals for interoperability and health information exchange. "We can't wait five years," said Mostashari. He emphasized the phrase "standard-based" in referring to HIE.
Patient engagement was another priority. To attest to Stage 2, institutions will have to allow at least half their patients to download and transfer their records.
They would strive for continuous quality improvement and clinical decision support, key goals enabled by the building blocks of meaningful use.
Two key pillars of the Stage 2 announcement are requirements to use the Direct project for data exchange and HL7's consolidated CDA for the format (the only data exchange I heard mentioned was a summary of care, which is all that most institutions exchange when a patient is referred).
The announcement demonstrates the confidence that HHS has in the Direct project, which it launched just a couple years ago and which exemplifies a successful joint government/private sector project. Direct will allow health care providers of any size and financial endowment to use email or the Web to share summaries of care. (I mentioned it in yesterday's article.) With Direct, we can hope to leave the cumbersome and costly days of health information exchange behind. The older and more complex CONNECT project will be an option as well.
The other half of that announcement, regarding adoption of the CDA (incarnated as a CCD for summaries of care), is a loss for the older CCR format, which was an option in Stage 1. The CCR was the Silicon Valley version of health data, a sleek and consistent XML format used by Google Health and Microsoft HealthVault. But health care experts criticized the CCR as not rich enough to convey the information institutions need, so it lost out to the more complex CCD.
The news on formats is good overall, though. The HL7 consortium, which has historically funded itself by requiring organizations to become members in order to use its standards, is opening some of them for free use. This is critical for the development of open source projects. And at an HL7 panel today, a spokesperson said they would like to head more in the direction of free licensing and have to determine whether they can survive financially while doing so.
So I'm feeling optimistic that U.S. health care is moving "toward interoperability and openness," the phrase I used in the title of this article and also used in a posting from HIMSS two years ago.
HHS allowed late-coming institutions (those who began the Stage 1 process in 2011) to continue at Stage 1 for another year. This is welcome because they have so much work to do, but means that providers who want to demonstrate Stage 2 information exchange may have trouble because they can't do it with other providers who are ready only for Stage 1.
HHS endorsed some other standards today as well, notably SNOMED for diseases and LRI for lab results. Another nice tidbit from the summit is the requirement to use electronic medication administration (for instance, bar codes to check for errors in giving medicine) to foster patient safety.
February 22 2012
Report from HIMSS: health care tries to leap the chasm from the average to the superb
I couldn't attend the session today on StealthVest--and small surprise. Who wouldn't want to come see an Arduino-based garment that can hold numerous health-monitoring devices in a way that is supposed to feel like a completely normal piece of clothing? As with many events at the HIMSS conference, which has registered over 35,000 people (at least four thousand more than last year), the StealthVest presentation drew an overflow crowd.
StealthVest sounds incredibly cool (and I may have another chance to report on it Thursday), but when I gave up on getting into the talk I walked downstairs to a session that sounds kind of boring but may actually be more significant: Practical Application of Control Theory to Improve Capacity in a Clinical Setting.
The speakers in this session, from Banner Gateway Medical Center in Gilbert, Arizona, laid out a fairly standard use of analytics to predict when the hospital units are likely to exceed their capacity, and then to adjust patient and provider schedules to smooth out the curve. The basic idea comes from chemical engineering, and requires them to monitor all the factors that lead patients to come in to the hospital and that determine how long they stay. Queuing theory can show when things are likely to get tight. Hospitals care a lot about these workflow issues, as Fred Trotter and David Uhlman discuss in the O'Reilly book Beyond Meaningful Use, and they have a real effect on patient care too.
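To give a feel for how queuing theory flags a unit that is about to get tight, here is a minimal sketch using the textbook Erlang B formula. It is not the method the Banner Gateway speakers presented. It assumes patients arrive at random (Poisson arrivals) and that a patient who finds no free bed goes elsewhere, and the 30-bed unit and arrival figures are made up for illustration.

```python
def offered_load(arrivals_per_day: float, mean_stay_days: float) -> float:
    """Average number of beds that would be in use if capacity were unlimited."""
    return arrivals_per_day * mean_stay_days


def erlang_b(load: float, beds: int) -> float:
    """Probability that an arriving patient finds every bed occupied.

    Standard Erlang B recursion; assumes Poisson arrivals and that
    patients who find no bed are diverted rather than queued.
    """
    blocking = 1.0  # with zero beds, every arrival is blocked
    for n in range(1, beds + 1):
        blocking = load * blocking / (n + load * blocking)
    return blocking


if __name__ == "__main__":
    # Made-up numbers for a hypothetical 30-bed unit with a 4-day mean stay.
    for arrivals in (5.0, 6.0, 7.0):
        load = offered_load(arrivals, mean_stay_days=4.0)
        chance_full = erlang_b(load, beds=30)
        print(f"{arrivals:.0f} arrivals/day: load {load:.0f} beds, "
              f"chance of a full unit {chance_full:.3f}")
```

The point of the exercise is that the chance of a full unit climbs steeply, not linearly, as the load approaches the number of beds, which is why smoothing out arrivals and discharges pays off.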
The reason I find this topic interesting is that capacity planning leads fairly quickly to visible cost savings. So hospitals are likely to do it. Furthermore, once they go down the path of collecting long-term data and crunching it, they may extend the practice to clinical decision support, public health reporting, and other things that can make a big difference to patient care.
A few stats about data in U.S. health care
Do we need a big push to do such things? We sure do, and that's why meaningful use was introduced into HITECH sections of the American Recovery and Reinvestment Act. HHS released mounds of government health data on Health.data.gov hoping to serve a similar purpose. Let's just take a look at how far the United States is from using its health data effectively.
Last November, a CompTIA survey (reported by Health Care IT News) found that only 28% of providers have comprehensive EHRs in use, and another 17% have partial implementations. One has to remember that even a "comprehensive" EHR is unlikely to support the sophisticated data mining, information exchange, and process improvement that will eventually lead to lower costs and better care.
According to a recent Beacon Partners survey (PDF), half of the responding institutions have not yet set up an infrastructure for pursuing health information exchange, although 70% consider it a priority. The main problem, according to a HIMSS survey, is budget: HIEs are shockingly expensive. There's more to this story, which I reported on from a recent conference in Massachusetts.
Stats like these had to be considered when HIMSS board chair Charlene S. Underwood extolled the organization's achievements in the morning keynote. HIMSS has promoted good causes, but only recently has it addressed cost, interoperability, and open source issues that can allow health IT to break out of the elite of institutions large or sophisticated enough to adopt the right practices.
As signs of change, I am particularly happy to hear of HIMSS's new collaboration with Open Health Tools and their acquisition of the mHealth summit. These should guide the health care field toward more patient engagement and adaptable computer systems. HIEs are another area crying out for change.
An HIE optimist
With the flaccid figures for HIE adoption in mind, I met Charles Parisot, chair of Interoperability Standards and Testing Manager for EHRA, which is HIMSS's Electronic Health Records Association. The biggest EHR vendors and HIEs come together in this association, and Parisot was just stoked with positive stories about their advances.
His take on the cost of HIEs is that most of them just do it in a brute force manner that doesn't work. They actually copy the data from each institution into a central database, which is hard to manage from many standpoints. The HIEs that have done it right (notably in New York state and parts of Tennessee) are sleek and low-cost. The solution involves:
Keeping the data at the health care providers, and storing in the HIE only some glue data that associates the patient and the type of data to the provider.
Keeping all metadata about formats out of the HIE, so that new formats, new codes, and new types of data can easily be introduced into the system without recoding the HIE.
Breaking information exchange down into constituent parts--the data itself, the exchange protocols, identification, standards for encryption and integrity, etc.--and finding standard solutions for each of these.
So EHRA has developed profiles (also known by their ONC term, implementation specifications) that indicate which standard is used for each part of the data exchange. Metadata can be stored in the core HL7 document, the Clinical Document Architecture, and differences between implementations of HL7 documents by different vendors can also be documented.
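The first two points in the list above, keeping records at the providers and storing only glue data in the HIE, can be pictured with a small sketch. Everything here is hypothetical: the class names, the fields, and the idea of a provider as a plain function are assumptions made for illustration, not anything taken from EHRA's profiles or from a real HIE.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A provider exposes some way to fetch a record it holds; here, a plain function.
FetchFn = Callable[[str, str], dict]


@dataclass(frozen=True)
class Pointer:
    """Glue data kept by the HIE: who holds what, never the record itself."""
    patient_id: str
    record_type: str  # an opaque label such as "summary_of_care"
    provider: str


class RecordLocator:
    """Hypothetical HIE index that stores pointers only."""

    def __init__(self) -> None:
        self._pointers: List[Pointer] = []
        self._providers: Dict[str, FetchFn] = {}

    def register_provider(self, name: str, fetch: FetchFn) -> None:
        self._providers[name] = fetch

    def publish(self, pointer: Pointer) -> None:
        """A provider announces that it holds a record of some type."""
        self._pointers.append(pointer)

    def locate(self, patient_id: str, record_type: str) -> List[Pointer]:
        return [p for p in self._pointers
                if p.patient_id == patient_id and p.record_type == record_type]

    def pull(self, patient_id: str, record_type: str) -> List[dict]:
        """Fetch each matching record from the provider that holds it."""
        return [self._providers[p.provider](patient_id, record_type)
                for p in self.locate(patient_id, record_type)]
```

Because the index treats the record type as an opaque label and never parses a record, a new format or code set changes only what the providers return, not the index itself.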
A view of different architectures in their approach can be found in an EHRA white paper, Supporting a Robust Health Information Exchange Strategy with a Pragmatic Transport Framework. As testament to their success, Parisot claimed that the interoperability lab (a huge part of the exhibit hall floor space, and a popular destination for attendees) could set up the software connecting all the vendors' and HIEs' systems in one hour.
I asked him about the simple email solution promised by the government's Direct project, and whether that may be the path forward for small, cash-strapped providers. He accepted that Direct is part of the solution, but warned that it doesn't make things so simple. Unless two providers have a pre-existing relationship, they need to be part of a directory or even a set of federated directories, and assure their identities through digital signatures.
And what if a large hospital receives hundreds of email messages a day from various doctors who don't even know to whom their patients are being referred? Parisot says metadata must accompany any communications--and he's found that it's more effective for institutions to pull the data they want than for referring physicians to push it.
Intelligence for hospitals
Finally, Parisot told me EHRA has developed standards for submitting data to EHRs from 350 types of devices, and has 50 manufacturers working on devices with these standards. I visited a booth of iSirona as an example. They accept basic monitoring data such as pulses from different systems that use different formats, and translate over 50 items of information into a simple text format that they transmit to an EHR. They also add networking to devices that communicate only over cables. Outlying values can be rejected by a person monitoring the data. The vendor pointed out that format translation will be necessary for some time to come, because neither vendors nor hospitals will replace their devices simply to implement a new data transfer protocol.
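A rough sketch of that kind of format translation follows. The two input formats, the pipe-delimited output line, and the review ranges are all invented for illustration and do not describe iSirona's product or any real device protocol.

```python
import json
from typing import Dict, List, Tuple

# Made-up ranges used only to decide which readings a person should review.
REVIEW_RANGES = {"pulse": (30.0, 220.0), "spo2": (70.0, 100.0)}


def parse_vendor_a(raw: str) -> Dict[str, float]:
    """Hypothetical format A: 'HR=72;SPO2=98'."""
    names = {"HR": "pulse", "SPO2": "spo2"}
    readings = {}
    for field in raw.split(";"):
        key, value = field.split("=")
        readings[names[key.strip()]] = float(value)
    return readings


def parse_vendor_b(raw: str) -> Dict[str, float]:
    """Hypothetical format B: JSON such as '{"pulse_bpm": 72, "oxygen_pct": 98}'."""
    names = {"pulse_bpm": "pulse", "oxygen_pct": "spo2"}
    return {names[key]: float(value) for key, value in json.loads(raw).items()}


def to_text_lines(patient_id: str,
                  readings: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (lines ready for the EHR, lines held for a person to review)."""
    accepted: List[str] = []
    held: List[str] = []
    for item, value in sorted(readings.items()):
        line = f"{patient_id}|{item}|{value:g}"
        low, high = REVIEW_RANGES[item]
        if low <= value <= high:
            accepted.append(line)
        else:
            held.append(line)
    return accepted, held
```

Out-of-range readings are held back rather than discarded, so the decision to reject a value stays with the person monitoring the data.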
For more about devices, I dropped by one of the most entertaining parts of the conference, the Intelligent Hospital Pavilion. Here, after a badge scan, you are somberly led through a series of locked doors into simulated hospital rooms where you get to watch actors in nursing outfits work with lifesize dolls and check innumerable monitors. I think the information overload is barely ameliorated and may be worsened by the arrays of constantly updated screens.
But the background presentation is persuasive: by attaching RFIDs and all sorts of other devices to everything from people to equipment, and basically making the hospital more like a factory, providers can radically speed up responses in emergency situations and reduce errors. Some devices use the ISM "junk" band, whereas more critical ones use dedicated spectrum. Redundancy is built in throughout the background servers.
Waiting for the main event
The US health care field held its breath most of last week, waiting for Stage 2 meaningful use guidelines from HHS. The announcement never came, nor did it come this morning as many people had hoped. Because meaningful use is the major theme of HIMSS, and many sessions were planned on helping providers move to Stage 2, the delay in the announcement put the conference in an awkward position.
HIMSS is also nonplussed over a delay in another initiative, the adoption of a new standard in the classification of disease and procedures. ICD-10 is actually pretty old, having been standardized in the 1980s, and the U.S. lags decades behind other countries in adopting it. Advantages touted for ICD-10 are:
It incorporates newer discoveries in medicine than the dominant standard in the U.S., ICD-9, and therefore permits better disease tracking and treatment.
Additionally, it's much more detailed than ICD-9 (with an order of magnitude more classifications). This allows the recording of more information but complicates the job of classifying a patient correctly.
ICD-10 is rather controversial. Some people would prefer to base clinical decisions on SNOMED, a standard described in the Beyond Meaningful Use book mentioned earlier. Ultimately, doctors lobbied hard against the HHS timeline for adopting ICD-10 because providers are so busy with meaningful use. (But of course, the goals of adopting meaningful use are closely tied to the goals of adopting ICD-10.) It was the pushback from these institutions that led HHS to accede and announce a delay. HIMSS and many of its members were disappointed by the delay.
In addition, there is an upcoming standard, ICD-11, whose sandal some say ICD-10 is not even worthy to lace. A strong suggestion that the industry just move to ICD-11 was aired in Government Health IT, and the possibility was raised in Health Care IT News as well. In addition to reflecting the newest knowledge about disease, ICD-11 is praised for its interaction with SNOMED and its use of Semantic Web technology.
That last point makes me a bit worried. The Semantic Web has not been widely adopted, and if people in the health IT field think ICD-10 is complex, how are they going to deal with drawing up and following relationships through OWL? I plan to learn more about ICD-11 at the conference.
February 17 2012
Documentation strategy for a small software project: launching VoIP Drupal introductions
VoIP Drupal is a window onto the promises and challenges faced by a new open source project, including its documentation. At O'Reilly, we've been conscious for some time that we lack a business model for documenting new collaborative projects--near the beginning, at the stage where they could use the most help with good materials to promote their work, but don't have a community large enough to support a book--and I joined VoIP Drupal to explore how a professional editor can help such a team.
Small projects can reach a certain maturity with poor and sparse documentation. But the critical move from early adopters to mainstream requires a lot more hand-holding for prospective users. And these projects can spare hardly any developer time for documentation. Users and fans can be helpful here, but their documentation needs to be checked and updated over time; furthermore, reliance on spontaneous contributions from users leads to spotty and unpredictable coverage.
Large projects can hire technical writers, but what they do is very different from traditional documentation; they must be community managers as well as writers and editors (see Anne Gentle's book Conversation and Community: The Social Web for Documentation). So these projects can benefit from research into communities also.
I met at the MIT Media Lab this week with Leo Burd, the inventor of VoIP Drupal, and a couple other supporters, notably Micky Metts of DrupalConnection.com. We worked out some long-term plans for firming up VoIP Drupal's documentation and other training materials. But we also had to deal with an urgent need for materials to offer at DrupalCon, which begins in just over one month.
Challenges
One of the difficulties of explaining VoIP Drupal is that it's just so versatile. The foundations are simple:
- A thin wrapper around PHP permits developers to write simple scripts that dial phone numbers, send SMS messages, etc. These scripts run on services that initiate connections and do translation between voice and text (Tropo, Twilio, and the free Plivo are currently supported).
- Administrators on Drupal sites can use the Drupal interface to configure VoIP Drupal modules and add phone/SMS scripts to their sites.
- Content providers can use the VoIP Drupal capabilities provided by their administrators to do such things as send text messages to site users, or to enable site users to record messages using their phone or computer.
Already you can see one challenge: VoIP Drupal has three different audiences that need very different documentation. In fact, we've thought of two more audiences: decision-makers who might build a business or service on top of VoIP Drupal, and potential team members who will maintain and build new features.
Some juicy modules built on top of VoIP Drupal's core extend its versatility to the point where it's hard to explain on an elevator ride what VoIP Drupal could do. Leo tosses out a few ideas such as:
- Emergency awareness systems that use multiple channels to reach out to a population living in a certain area. That would require a combination of user profiling, mapping, and communication capabilities that tend to be extremely hard to put together in a single package.
- Community polling/voting systems that are accessible via web, SMS, email, phone, etc.
- CRM systems that keep track of (and even record) phone interactions, organize group conference calls with the click of a button, etc.
- Voice-based bulletin boards.
- Adding multiple authentication mechanisms to a site.
- Sending SMS event notifications based on Google Calendars.
In theory you could create a complete voice and SMS based system out of VoIP Drupal and ignore the web site altogether, but that would be a rather cumbersome exercise. VoIP Drupal is well-suited to integrating voice and the Web--and it leaves lots of room for creativity.
Long-term development
A community project, we agreed, needs to be incremental and will result in widely distributed documents. Some people like big manuals, but most want a quickie getting-started guide and then lots of chances to explore different options at their own pace. Communities are good for developing small documents of different types. The challenge is finding someone to cover any particular feature, as well as to do the sometimes tedious work of updating the document over time.
We decided that videos would be valuable for the administrators and content providers, because they work through graphical interfaces. However, the material should also be documented in plain text. This expands access to the material in two ways. First, VoIP Drupal may be popular in parts of the world where bandwidth limitations make it hard to view videos. Second, the text pages are easier to translate into other languages.
Just as a video can be worth a thousand words, working scripts can replace a dozen explanations. Leo will set up a code contribution site on GitHub. This is more work than it may seem, because malicious or buggy scripts can wreak havoc for users (imagine someone getting a thousand identical SMS messages over the course of a single hour, for instance), so contributions have to be vetted.
Some projects assign a knowledgeable person or two to create an outline, then ask community members to fill it in. I find this approach too restrictive. Having a huge unfilled structure is just depressing. And one has to grab the excitement of volunteers wherever it happens to land. Just asking them to document what they love about a project will get you more material than presenting them with a mandate to cover certain topics.
But then how do you get crucial features documented? Wait and watch forums for people discussing those features. When someone seems particularly knowledgeable and eager to help, ask him or her for a longer document that covers the feature. You then have to reward this person for doing the work, and a couple ways that make sense in this situation include:
- Get an editor to tighten up the document and work with the author to make a really professional article out of it.
- Highlight it on your web site and make sure people can find it easily. For many volunteers, seeing their material widely used is the best reward.
We also agreed that we should divide documentation into practical, how-to documents and conceptual documents. Users like to grab a hello-world document and throw together their first program. As they start to shape their own projects, they realize they don't really understand how the system fits together and that they need some background concepts. Here is where most software projects fail. They assume that the reader understands the reasoning behind the design and knows how best to use it.
Good conceptual documentation is hard to produce, partly because the lead developers have the concepts so deeply ingrained that they don't realize what it is that other people don't know. Breaking the problems down into small chunks, though, can make it easier to produce useful guides.
Like many software projects, VoIP Drupal documentation currently starts the reader off with a list of modules. The team members liked an idea of mine to replace these with brief tutorials or use cases. Each would start with a goal or question (what the reader wants to accomplish) and then introduce the relevant module. In general, given the flexibility of VoIP Drupal, we agreed we need a lot more "why and when" documentation.
Immediate preparations
Before we take on a major restructuring and expansion of documentation, though, we have a tight deadline for producing some key videos and documents. Leo is going to lead a development workshop at DrupalCon, and he has to determine the minimum documentation needed to make it a productive experience. He also wants to do a webinar on February 28 or 29, and a series of videos on basic topics such as installing VoIP Drupal, a survey of successful sites using it, and a nifty graphical interface called Visual VoIP Drupal. Visual VoIP Drupal, which will be released in a few weeks, is one of the new features Leo would like to promote in order to excite users. It lets a programmer select blocks and blend them into a script through a GUI, instead of typing all the code.
The next few weeks will bring a flurry of work to realize our vision.
January 20 2012
Developer Week in Review: Early thoughts on iBooks Author
One down, two to go, Patriots-wise. Thankfully, this week's game is on Sunday, so it doesn't conflict with my son's 17th birthday on Saturday. They grow up so quickly; I can remember him playing with his Comfy Keyboard, and now he's writing C code for robots.
A few thoughts on iBooks Author and Apple's textbook move
Thursday's announcement of Apple's new iBooks Author package isn't developer news per se, but I thought I'd drop in a few initial thoughts before jumping into the meat of the WIR because it will have an impact on the community in several ways.
Most directly, it is another insidious lock-in that Apple is wrapping inside a candy-covered package. Since iBooks produced with the tool can only be viewed in full on iOS devices, textbooks and other material produced with iBooks Author will not be available (at least in the snazzy new interactive form) on Kindles or other ereaders. If Apple wanted to play fair, it should make the new iBooks format an open standard. Of course, this would cut Apple out of its cut of the royalties as well as yielding the all-important control of the user experience that Steve Jobs installed as a core value in the company.
On a different level, this could radically change the textbook and publishing industry. It will make it easier to keep textbooks up to date and start to loosen the least-common-denominator stranglehold that huge school districts have on the textbook creation process. On the other hand, I can see a day when pressure from interest groups results in nine different textbooks being used in the same class, one of which ignores evolution, one of which emphasizes the role of Antarctic-Americans in U.S. history, etc.
It's also another step in the disintermediation of publishing since the cost of getting your book out to the world just dropped to zero (not counting proofreading, indexing, editing, marketing, and all the other nice things a traditional publisher does for a writer). I wonder if Apple is going to enforce the same puritanical standards on iBooks as they do on apps. What are they going to do when someone submits a My Little Pony / Silent Hill crossover fanfic as an iBook?
Another item off my bucket list
I've been to Australia. I've had an animal cover book published. And now I've been called a moron (collectively) by Richard Stallman.
The occasion was the previously mentioned panel on the legacy of Steve Jobs, in which I participated this past weekend. As could have been expected, Stallman started in describing Jobs as someone who the world would have been better off without. He spent the rest of the hour defending the position that it doesn't matter how unusable the free alternative to a proprietary platform is, only that it's free. When we disagreed, he shouted us down as "morons."
As I've mentioned before, that position makes a few invalid assumptions. One is that people's lives will be better if they use a crappy free software package over well-polished commercial products. In reality, the perils of commercial software that Stallman demonizes so consistently are largely hypothetical, whereas the usability issues of most consumer-facing free software are very real. For the 99.999% of people who aren't software professionals, the important factor is whether the darn thing works, not if they can swap out an internal module.
The other false premise at play here is that companies are Snidely Whiplash wanna-bes that go out of their way to oppress the masses. Stallman, to his credit as a savvy propagandist, has co-opted the slogans of the Occupy Wall Street movement, referring to the 1% frequently. The reality is that when companies try to pull shady stunts, especially in the software industry, they usually get caught and have to face the music. Remember the furor over Apple's allegedly accidental recording of location data on the iPhone? Stallman's dystopian future, where corporations use proprietary platforms as a tool of subjugation, has pretty much failed every time it's actually been tried on the ground. I'm not saying corporations are angels, or even that they have the consumer's best interests in mind, it's just that they aren't run by demonic beings that eat babies and plot the enslavement of humanity.
Achievement unlocked: Erased user's hard drive
Sometimes life as a software engineer may seem like a game, but Microsoft evidently wants to turn it into a real one. The company has announced a new plug-in for Visual Studio that lets you earn achievements for coding practices and other developer-related activities.
Most of them are tongue in cheek, but I'm terrified that we may start seeing these achievements in live production code as developers compete to earn them all. Among the more fear-inspiring:
- "Write 20 single letter class-level variables in one file. Kudos to you for being cryptic!"
- "Write a single line of 300 characters long. Who needs carriage returns?"
- "More than 10 overloads of a method. You could go with this or you could go with that."
Got news?
Please send tips and leads here.
Related:
- Apple's mind-bogglingly greedy and evil license agreement
- A Closer Look at iBooks Author, Textbooks and Exclusivity
- FOSS isn't always the answer
- Master a new skill? Here's your badge
- More Developer Week in Review coverage
December 01 2011
Could closed core prove a more robust model than open core?
When participating recently in a sprint held at Google to document four free software projects, I thought about what might have prompted Google to invest in this effort. Their willingness to provide a hotel, work space, and food for some thirty participants, along with staff support all week long, demonstrates their commitment to nurturing open source.
Google is one of several companies for which I'll coin the term "closed core." The code on which they build their business and make their money is secret. (And given the enormous infrastructure it takes to provide a search service, opening the source code wouldn't do much to stimulate competition, as I point out in a posting on O'Reilly's Radar blog.) But they depend on a huge range of free software, ranging from Linux running on their racks to numerous programming languages and libraries that they've drawn on to develop their services.
So Google contributes a lot back to the free software community. They release code for many non-essential functions. They promote the adoption of standards such as HTML 5. They have been among the first companies to offer APIs for important functions, including their popular Google Maps. They have opened the source code to Android (although its development remains under their control), which has been the determining factor in making Android devices compete with the arguably more highly-functioning iOS products. They even created a whole new programming language (Go) and are working on another.
Google is not the only "closed core" company (for instance, Facebook has also built their service around APIs and released their Cassandra project). Microsoft has a whole open source program, including some important contributions to health IT. Scads of other companies, such as IBM, Hewlett Packard, and VMware, have complex relationships to open source software that don't fit a simple "open core" or "closed core" model. But the closed core trend represents a fertile collaboration between communities and companies that have businesses in specific areas. The closed core model requires businesses to determine where their unique value lies and to be generous in offering the public extra code that supports their infrastructure but does not drive revenue.
This model may prove more robust and lasting than open core, which attracts companies occupying minor positions in their industries. The shining example of open core is MySQL, but its complex status, including a long history of dual licensing and simultaneous development by several organizations, makes it a difficult model from which to draw lessons about the whole movement. In particular, Software as a Service redefines the relationships that the free software movement has traditionally defined between open and proprietary. Deploying and monitoring the core SaaS software creates large areas for potential innovation, as we saw with Cassandra, where a company can benefit from turning their code into a community project.
November 23 2011
Intellectual Property Strategy: a book, a panel, and a movement
I attended one of Berkman's panels of leading thinkers at Harvard on Monday, and picked up (legitimately) a copy of John Palfrey's new book, Intellectual Property Strategy. The speakers on Monday, who included household names of the free culture movement such as Lawrence Lessig and Eric von Hippel, emphasized the culture shift that is breaking the seemingly iron grip of current policies that favor wealthy companies with portfolios of patents and copyrights. But I think even these speakers failed to convey how huge a sea change is underway.
The general tone was regret for the attitude among most institutions--including universities and other non-profits with public missions--that keeps information closed by default. This is what Palfrey calls the notion of intellectual property as "sword and shield," mostly of value either to launch lawsuits or discourage others from doing so. Law professor Jonathan Zittrain (whose book The Future of the Internet I reviewed at its release) pointed out that when Harvard professors wanted to offer cyberseminars, the provost first opposed it, then insisted they change the term to "an online lecture and discussion series." In the wake of Stanford's popular online AI class, Harvard may look more benignly on these efforts to reach out beyond its privileged campus in the future.
Lessig said that, at the beginning of the free culture movement, an observer would expect it to make its first inroads among the educational communities and universities, while those who make a business out of their IP (artists, film-makers, musicians) would be the last to grudgingly go along. Instead, the opposite has happened. Large numbers of creative people are experimenting with Creative Commons and shared projects, while the educators have been slow to see the light.
And Terry Fisher, director of Berkman, reported that in his world travels to teach about intellectual property, he has found wide gaps between chief officers in their understanding and acceptance of the idea that sharing and openness can be valuable. Those who have benefitted a lot from the current regime have trouble seeing any alternative. And while he assumed in the past that big IP holders would never change until forced to, he now has a softer view and thinks that education can move them in the right direction.
And that's Palfrey's goal. He wants to modulate the sword-and-shield approach--although not jettisoning it entirely--with a more positive view that looks for ways to improve an organization's bottom line or mission by sharing, either through an open license or more nuanced arrangements.
His book therefore blends two perspectives on the intellectual property scene, at sometimes awkward angles reminiscent of a Picasso portrait. On the one hand, he encourages all organizations--including non-profits, who are usually left out of IP texts--to carry out knowledge management, consider where they are innovating intellectually, and think about how licensing can spread best practices while bringing in extra revenue. This is not a new idea to intellectual property lawyers and other such specialists, but it's valuable for the CxO-level readers at which this book is aimed.
On the other hand, Palfrey is a fan of freedom, which can range from a classic open source license to such midway strategies as providing an API to data or asking one's customers to design products. These are all familiar practices to free culture and free software advocates. And although Palfrey acknowledges that traditional intellectual property conflicts with the ideal of openness, he spends much of the book trying to harmonize the two approaches, a noble quest that I'm not sure brings back the grail.
The book could also provide fodder for Richard Stallman and others who deride the term "intellectual property" altogether. What thread ties together licensing a Disney character to put on a children's product, patenting a medicine, and releasing music on iTunes? Nothing, from my point of view, except the very words "intellectual property." Yet they all are covered in this book, requiring it to stay at a high level of abstraction.
Still, the basic messages in the book hold up, and MIT Press is doing some interesting experiments with the book itself. They are releasing an iPad app based on it (later to be ported to Android) and setting up a web site for case studies, video interviews, and perhaps ultimately comments from readers. Many of us at the forum encouraged this exploration of reader interactivity, and I asked, "When does this stop being a book and stop being an app, and become a movement?"
Monday's panel also honored Palfrey for his long work at Harvard and congratulated him on a move that surprised everyone at Berkman: he is leaving at the end of the semester to become head of the prep school, Phillips Academy. Here he will push forward innovation in another area that excites him, the effect of digital media on the education and development of young people. He researched these ideas in his book Born Digital, which I reviewed on Radar. Palfrey assured me on Monday that he would not eliminate all print books from the Phillips Academy library, as Cushing Academy did a couple years ago.
November 21 2011
VoIP Drupal reaches out to the developing world
I don't know why so few of us turned up on Saturday for the VoIP Drupal hackathon. As a way to integrate voice and SMS into a Drupal site, the VoIP modules form a door through which Drupal can move into a vast world of touch tone telephones, smart telephones, and text messaging, and therefore toward integrating a huge range of users in developing regions who use those technologies instead of desktop or laptop computers. Perhaps Boston isn't the right place or November the right month for a workshop (although the weather was quite nice), but just four of us gathered to get the low-down on VoIP Drupal from Leo Burd, a research scientist at MIT's Media Lab and Center for Civic Media.
Together with just a couple other developers, he is putting together modules that support Twilio and Tropo, two cloud platforms that are highly scalable and provide telephone and SMS capabilities accessible from different countries. For cases where those services are not available or desirable, VoIP Drupal provides support for FreeSWITCH, an open source telephony platform, via the Plivo communication framework/API.
Most of the time we played with the scripting language using the VoIP Drupal sandbox. The scripting language is a domain-specific language for VoIP built on top of Drupal's module language, PHP. It has about 15 commands to create interactive calls doing such things as recording and playing back audio, handling input from the telephone keypad, managing conference calls, and sending and receiving SMS messages. A trivial script I created went like this:
$script->addSetVoice('woman');
$script->addSetLanguage('fr');
$script->addSay('Voiçi un message. Ne répondez pas.');
$script->addHangup();
(The scripting language requires you to create the $script object first, but the sandbox does that for you silently.) When I played this back, I got a pretty authentic sounding Parisian voice, having even the suitably cavalier tone when she told me not to talk in return (although she was tolerant of my spelling mistakes).
Of course, much richer applications are available through the scripting language. It is mostly linear, although you can define and call subroutines, you can set and retrieve variables, and there is a primitive assembly-language-like statement that lets you branch to a label based on a condition. Furthermore, the modules' full power is available through a PHP API. The Drupal administration menu allows you to specify a script to play when the site makes an outgoing call, a script to play when someone calls the site's phone number, and a script to play when someone sends a text message to the site's phone number.
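To make the "mostly linear, with labels and a conditional branch" idea concrete, here is a minimal sketch of how such a script might be executed. It is written in Python as a toy model, not in PHP against the real modules: the command names (say, set, goto_if, label, hangup) and the run function are hypothetical and are not taken from the VoIP Drupal API.

```python
# Toy model of a linear call script: a flat list of commands with labels,
# variables, and a conditional jump. All command names are hypothetical
# and do not come from the VoIP Drupal API.

def run(script, variables=None):
    """Execute commands in order and return the list of phrases 'spoken'."""
    variables = dict(variables or {})
    # Map each label name to the index of the command that declares it.
    labels = {cmd[1]: i for i, cmd in enumerate(script) if cmd[0] == "label"}
    spoken = []
    pc = 0  # index of the next command to run
    while pc < len(script):
        op, *args = script[pc]
        pc += 1
        if op == "say":
            # Fill {name} placeholders from the current variables.
            spoken.append(args[0].format(**variables))
        elif op == "set":
            name, value = args
            variables[name] = value
        elif op == "goto_if":
            # Branch to a label only when the variable has the expected value.
            name, expected, target = args
            if variables.get(name) == expected:
                pc = labels[target]
        elif op == "hangup":
            break
        elif op == "label":
            continue  # labels only mark a position
        else:
            raise ValueError(f"unknown command: {op}")
    return spoken


menu = [
    ("say", "Press 1 for opening hours."),
    ("goto_if", "digit", "1", "hours"),
    ("say", "Goodbye."),
    ("hangup",),
    ("label", "hours"),
    ("set", "closing", "five"),
    ("say", "We close at {closing}."),
    ("hangup",),
]
```

Calling run(menu, {"digit": "1"}) plays the first prompt, jumps to the hours label, and finishes with the closing-time message; any other digit falls through to "Goodbye." The point of the sketch is only that a flat command list plus one conditional jump is enough to express a simple phone menu.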
Burd showed off a site put together by a non-profit in Dorchester (a low-income area of Boston, Mass.) together with the MIT Center for Civic Media. A group of young students recorded some descriptions of nearby locations of interest. These locations display plaques with a phone number for the web site and an extension unique for their location. Someone dialing in hears the message and is invited to record his or her own opinions or stories about that part of the city. Attendees today were so impressed that they said, if this application could be released as a drop-in module, it would boost the use of the VoIP modules immediately.
Just a few of the many uses for VoIP in Drupal include:
- The equivalent of mass mailings via voice calls and SMS, so you can send messages, for instance, to people who sign up for political campaigns
- Letting visitors leave voice mail or add verbal comments to the site
- Embedding a phone interface on the web page so people can make VoIP calls directly from your site, with no extra stand-alone software such as Skype
- Providing a conference call service through your site
- Letting people sign up for groups to receive SMS messages on chosen topics of interest
More modules are under construction; an overview is available on the Drupal web site. A messaging module lets you send a voice message that is delivered by phone, email, or SMS, as preferred by the site's visitor. The modules are developed for Drupal version 6, but Burd plans to create Drupal 7 modules as soon as the version 6 ones reach their 1.0 release, depending on interest from the community.
Of the underlying services supported, Twilio offers voice generation for four languages. Tropo supports voice generation for 24 languages, and can also do speech-to-text. Both of those companies have been very friendly to the VoIP Drupal project and promote it at Drupal conferences. FreeSWITCH and Plivo require you to do a lot of the work that Twilio and Tropo will do for you. But FreeSWITCH is useful for areas without Twilio or Tropo support, and for high volume use because it tends to cost less under those circumstances. FreeSWITCH also gives the programmer more control over the server and allows you to run everything from the same box. Overall, VoIP Drupal represents another step toward an Internet where communicating by voice is taken for granted.
October 21 2011
Wrap-up from FLOSS Manuals book sprint at Google
At several points during this week's documentation sprint at Google, I talked with the founder of FLOSS Manuals, Adam Hyde, who developed the doc sprint as it is practiced today. Our conversation often returned to the differences between the group writing experience we had this week and traditional publishing. The willingness of my boss at O'Reilly Media to send me to this conference shows how interested the company is in learning what we might be able to take from sprints.
Some of the differences between sprints and traditional publishing are quite subtle. The collaborative process is obviously different, but many people outside publishing might not realize just how deeply the egoless collaboration of sprints flies in the face of the traditional publishing model. The reason is that publishing has long depended on the star author. In whatever way a person becomes this kind of star, whether by working his way up the journalism hierarchy like Thomas Friedman or bursting on the scene with a dazzling personal story like Greg Mortenson (author of Three Cups of Tea), stardom is almost the only way to sell books in profitable numbers. Authors who use the books themselves to build stardom still need to keep themselves in the public limelight somehow. Without colorful personalities, the publishing industry needs a new way to make money (along with Hollywood, television, and pop music).
But that's not the end of differences. Publishers also need to promise a certain amount of content, whereas sprinters and other free documentation projects can just put out what they feel like writing and say, "If you want more, add it." Traditional publishing will alienate readers if books come out with certain topics missing. Furthermore, if a book lacks a popular topic that a competitor has, the competitor will trounce the less well-endowed book in the market. So publishers are not simply inventing needs to maintain control over the development effort. They're not exerting control just to tamp down on unauthorized distribution or something like that. When they sell content, users have expectations that publishers strive to meet, so they need strong control over the content and the schedule for each book.
But O'Reilly, along with other publishers across the industry, is trying to change expectations. The goal of comprehensiveness conflicts with another goal, timeliness, that is becoming more and more important. We're responding in three ways that all bring us closer to what FLOSS Manuals is doing: we put out "early releases" containing parts of books that are underway, we sign contracts for projects on limited topics that are very short by design, and we're experimenting with systems that are even closer to the FLOSS Manuals system, allowing authors to change a book at whim and publish a new version immediately.
Although FLOSS Manuals produces free books and gets almost none of its funding from sales (the funding comes from grants and from the sponsors of sprints), the idea of sprinting is still compatible with traditional publishing, in which sales are the whole point. Traditional publishers tend to give several thousand dollars to authors in the form of advances, and if the author takes several months to produce a book, we don't see the royalties that pay us back for that investment for a long time. Why not spend a few thousand dollars to bring a team of authors to a pleasant but distraction-free location (I have to admit that Google headquarters is not at all distraction-free) and pay for a week of intense writing?
Authors would probably find it much more appealing to take a one-week vacation and say good-bye to their families for this time than to spend months stealing time on evenings and weekends and apologizing for not being fully present.
The problem, as I explained in my first posting this week, is that you never quite know what you're going to get from a sprint. In addition, the material is still rough at the end of a week and has to absorb a lot of work to rise to the standards of professional publishing. Still, many technical publishers would be happy to get over a hundred pages of relevant material in a single week.
Publishers who fail to make documents free and open might be more disadvantaged when seeking remote contributions. Sprints don't get many contributions from people outside the room where they are conducted, but sometimes advice and support weigh in on some critical, highly technical point. The sprints I have participated in (sometimes remotely) benefited from answers that came out of the cloud to resolve difficult questions. For instance, one commenter on this week's KDE conference warned us we were using product names all wrong and had us go back through the book to make sure our branding was correct.
Will people offer their time to help authors and publishers develop closed books? O'Reilly has put books online during development, and random visitors do offer fixes and comments. There is some good will toward anyone who wants to offer guidance that a community considers important. But free, open documents are likely to draw even more help from crowdsourcing.
At the summit today, with the books all wrapped up and published, we held a feedback session. The organizers asked us our opinions on the sprint process, the writing tools, and how to make the sprint more effective. Our facilitator raised three issues that, once again, reminded me of the requirements of traditional publishing:
- Taking long-term responsibility for a document. How does one motivate people to contribute to it? In the case of free software communities, they need to make updates a communal responsibility and integrate the document into their project life cycle just like the software.
- Promoting the document. Without lots of hype, people will not notice the existence of the book and pick it up. Promotion is pretty much the same no matter how content is produced (social networking, blogging, and video play big roles nowadays), but free books are distinguished by the goal of sharing widely without concern for authorial control or payment. Furthermore, while FLOSS Manuals is conscious of branding, it does not use copyright or trademarks to restrict use of graphics or other trade dress.
- Integrating a document into a community. This is related to both maintenance and promotion. But every great book has a community around it, and there are lots of ways people can use them in training and other member-building activities. Forums and feedback pages are also important.
Over the past decade, a system of information generation has grown up in parallel with the traditional expert-driven system. In the old system everyone defers to an expert, while in the new system the public combines its resources. In the old system, documents are fixed after publication, whereas in the new system they are fluid. The old system was driven by the author's ego and increasingly by the demand for generating money, whereas the new system has revenue possibilities but has a strong sense of responsibility for the welfare of communities.
Mixtures of grassroots content generation and unique expertise have existed (Homer, for instance) and more models will be found. Understanding the points of commonality between the systems will help us develop such models.
(All my postings from this sprint are listed in a bit.ly bundle.)
FLOSS Manuals books published after three-day sprint
The final day of the FLOSS Manuals documentation sprint at Google began with a bit of a reprieve from Sprintmeister Adam Hyde's dictum that we should do no new writing. He allowed us to continue work till noon, time that the KDE team spent partly in heated arguments over whether we had provided enough coverage of key topics (the KDE project architecture, instructions for filing bug reports, etc.), partly in scrutinizing dubious material the book had inherited from the official documentation, and (at least a couple of us) actually writing material for chapters that readers may or may not find useful, such as a glossary.
I worried yesterday that the excitement of writing a complete book would be succeeded by the boring work of checking flow and consistency. Some drudgery was involved, but the final reading allowed groups to revisit their ways of presenting concepts and bringing in the reader.
Having done everything I thought I could do for the KDE team, I switched to OpenStreetMap, who produced a short, nicely paced, well-illustrated user guide. I think it's really cool that Google, which invests heavily in its own mapping service, helps OpenStreetMap as well. (They are often represented in Google Summer of Code.)
After dinner we started publishing our books. The new publication process at FLOSS Manuals loads the books not only to the FLOSS Manuals main page but also to Lulu for purchase.

Publishing books at doc sprint
Joining the pilgrimage that all institutions are making toward wider data use, FLOSS Manuals is exposing more and more of the writing process. As described by founder Adam Hyde in a blog posting today, Visualising your book, recently added tools help participants and friends follow the progress of the book (you can view a list of chapters edited on an RSS feed, for instance) and get a sense of what was done. For instance, a timeline with circles representing chapter edits shows you which chapters had the most edits and when activity took place. (Pierre Commenge created the visualization for FLOSS Manuals.)

Participants at doc sprint
(All my postings from this sprint are listed in a bit.ly bundle, https://bitly.com/bundles/praxagora/4.)
October 20 2011
Day two of FLOSS Manuals book sprint at Google Summer of Code summit
We started the second day of the FLOSS Manuals sprint with a circle encounter where each person shared some impressions of the first day. Several reported that they had worked on wikis and other online documentation before, but discovered that doing a book was quite different (I could have told them that, of course). They knew that a book had to be more organized, and offer more background than typical online documentation. More fundamentally, they felt more responsibility toward a wider range of readers, knowing that the book would be held up as an authority on the software they worked on and cared so much about.
We noted how gratifying it was to get questions answered instantly and be able to go through several rounds of editing in just a couple minutes. I admitted that I had been infected with the enthusiasm of the KDE developers I was working with, but had to maintain a bit of critical distance, an ability to say, "Hey, you're telling me this piece of software is wonderful, but I find its use gnarly and convoluted."
As I explained in Monday's posting, all the writing had to fit pretty much into two days. Each of the four teams started yesterday by creating an outline, and I'm sure my team was not the only one to revise it constantly throughout the day.

Circle at beginning of the day
Today, the KDE team took a final look at the outline and discussed everything we'd like to add to it. We pretty much finalized it early in the day and just filled in the blanks for the next eleven hours. I continued to raise flags about what I felt were insufficiently detailed explanations, and got impatient enough to write a few passages of my own in the evening.

Celebrating our approach to the end of the KDE writing effort
The KDE book is fairly standard developer documentation, albeit a beginner's guide with lots of practical advice about working in the KDE environment with the community. As a relatively conventional book, it was probably a little easier to write (but also probably less fun) than the more high-level approaches taken by some other teams that were trying to demonstrate to potential customers that their projects were worth adopting. Story-telling will be hard to detect in the KDE book.
And we finished! Now I'm afraid we'll find tomorrow boring, because we won't be allowed (and probably won't need) to add substantial new material. Instead, we'll be doing things like checking everything for consistency, removing references to missing passages, adding terms to the glossary, and other unrewarding slogs through a document that is far too familiar to us already. The only difference between the other team members and me is that I may be assigned to do this work on some other project.
(All my postings from this sprint are listed in a bit.ly bundle.)
October 19 2011
Day one of FLOSS Manuals book sprint at Google Summer of Code summit
Four teams at Google launched into endeavors that will lead, less than 72 hours from now, to complete books on four open source projects (KDE, OpenStreetMap, OpenMRS, and Sahana Eden). Most participants were recruited on the basis of a dream and a promise, so going through the first third of our sprint was eye-opening for nearly everybody. Although I had participated in one sprint before on-site and two sprints remotely, I found that the modus operandi has changed so much during the past year of experimentation that I too had a lot to learn.
Our doc sprint coordinator, Adam Hyde, told each team to spend an hour making an outline. The team to which I was assigned, KDE, took nearly two, and part way through Adam came in to tell us to stop because we had enough topics for three days of work. We then dug in to filling in the outline through a mix of fresh writing and cutting and pasting material from the official KDE docs. The latter required a complete overhaul, and naturally proved often to be more than a year out of date.

KDE team at doc sprint
The KDE team's focus on developer documentation spared them the open-ended discussions over scope that the other teams had to undergo. But at key points during the writing, we still were forced to examine passages that appeared too hurried and unsubstantiated, evidence of gaps in information. At each point we had to determine what the hidden topics were, and then whether to remove all references to them (as we did, for instance, on the topic of getting permission to commit code fixes) or to expand them into new chapters of their own (as we did for internationalization). The latter choice created a dilemma of its own, because none of the team members present had experience with internationalization, so we reached out and tried to contact remote KDE experts who could write the chapter.
The biggest kudos today go to Sahana Eden, I think. I reported yesterday that the team expressed deep differences of opinion about the audience they should address and how they should organize their sprint. Today they made some choices and got a huge amount of documentation down on the screen. Much of it was clearly provisional (they were booed for including so many bulleted lists) but it was evidence of their thinking and a framework for further development.

Sahana team at doc sprint
My own team had a lot of people with serious jet lag, and we had some trouble going from 9:00 in the morning to 9:30 at night. But we created (or untangled, as the case may be) some 60 pages of text. We reorganized the book at least once per hour, a process that the FLOSS Manuals interface makes as easy as drag and drop. A good heuristic was to choose a section title for each group of chapters. If we couldn't find a good title, we had to break up the group.
The end of the day brought us to the half-way mark for writing. We are told we need to complete everything by the end of the evening tomorrow and spend the final day rearranging and cleaning up text. More than a race against time, this is proving to be a race against complexity.

Topics for discussion at doc sprint
October 18 2011
FLOSS Manuals sprint starts at Google Summer of Code summit
Five days of intense book production kicked off today at the FLOSS Manuals sprint, hosted by Google. Four free software projects have each sent three to five volunteers to write books about the projects this week. Along the way we'll all learn about the group writing process and the particular use of book sprints to make documentation for free software.
I came here to provide whatever editorial help I can and to see the similarities and differences between conventional publishing and the intense community effort represented by book sprints. I plan to spend time with each of the four projects, participating in their discussions and trying to learn what works best by comparing what they bring in the way of expertise and ideas to their projects. All the work will be done out in the open on the FLOSS Manuals site for the summit, so you are welcome also to log in and watch the progress of the books or even contribute.
A book in a week sounds like a pretty cool achievement, whether for a free software project or a publisher. In fact, the first day (today) and last day of the sprint are unconferences, so there are only three days for actual writing. The first hour tomorrow will be devoted to choosing a high-level outline for each project, and then they will be off and running.
And there are many cautions about trying to apply this model to conventional publishing. First, the books are never really finished at the end of the sprint, even though they go up for viewing and for sale immediately. I've seen that they have many rough spots, such as redundant sections written by different people on the same topic, and mistakes in cross-references or references to non-existent material. Naturally, they also need a copy-edit. This doesn't detract from the value of the material produced. It just means they need some straightening out to be considered professional quality.
Books that come from sprints are also quite short. I think a typical length is 125 pages, growing over time as follow-up sprints are held. The length also depends of course on the number of people working on the sprint. We have the minimum for a good sprint here at Google, because the three to five team members will be joined by one or two people like me who are unaffiliated.
Finally, the content of a sprint book is decided on an ad hoc basis. FLOSS Manuals founder Adam Hyde explained today that his view of outlining and planning has evolved considerably. He quite rationally assumed at first that every book should have a detailed outline before the sprint started. Then he found that one could not impose an outline on sprinters, but had to let them choose subjects they wanted to cover. Each sprinter brings certain passions, and in such an intense environment one can only go with the flow and let each person write what interests him or her. Somehow, the books pull together into a coherent product, but one cannot guarantee they'll have exactly what the market is asking for. I, in fact, was involved in the planning of a FLOSS Manuals sprint for the CiviCRM manual (the first edition of a book that is now in its third) and witnessed the sprinters toss out an outline that I had spent weeks producing with community leaders.
So a sprint book is different in every way from a traditionally published manual, and I imagine this will be true for community documentation in general.
The discussions today uncovered the desires and concerns of the sprinters, and offered some formal presentations to prepare us, we hope, for the unique experience of doing a book sprint. The concerns expressed by sprinters were pretty easy to anticipate. How does one motivate community members to write? How can a project maintain a book in a timely manner after it is produced? What is the role of graphics and multimedia? How does one produce multiple translations?
Janet Swisher, a documentation expert from Austin who is on the board of FLOSS Manuals, gave a presentation asking project leaders to think about basic questions such as why a user would use their software and what typical uses are. Her goal was to bring home the traditional lessons of good writing: empathy for a well-defined audience. "If I had a nickel for every web site I've visited put up by an open source project that doesn't state what the software is for..." she said. That's just a single egregious instance of the general lack of understanding of the audience that free software authors suffer from.
Later, Michael McAndrew of the CiviCRM project took us several steps further along the path, asking what the project leaders would enjoy documenting and "what would be insane to leave out." I sat with the group from Sahana to watch as they grappled with the pressures these questions created. This is sure one passionate group of volunteers, caring deeply about what they do. Splits appeared concerning how much time to devote to high-level concepts versus practical details, which audiences to serve, and what to emphasize in the outline. I have no doubt, however, listening to them listen to each other, that they'll have their plan after the first hour tomorrow and will be off and running.
October 16 2011
BioCurious opens its lab in Sunnyvale, CA
When I got to the BioCurious lab yesterday evening, they were just cleaning up some old coffee makers. These, I learned, had been turned into sous vide cookers in that day's class.

New lab at BioCurious
Sous vide cookers are sort of the gourmet rage at the moment. One normally costs several hundred dollars, but BioCurious offered a class for $117 where seventeen participants learned to build their own cookers and took them home at the end. They actually cooked steak during the class--and I'm told that it came out very good--but of course, sous vide cookers are also useful for biological experiments because they hold temperatures very steady.
The class used Arduinos to provide the temperature control for the coffee pots and other basic hardware, so the lesson was more about electronics than biology. But it's a great illustration of several aspects of what BioCurious is doing: a mission of involving ordinary people off the street in biological experiments, using hands-on learning, and promoting open source hardware and software.
Other classes have taught people to insert dyes into cells (in order to teach basic skills such as pipetting), to run tests on food for genetically modified ingredients, and to run computer analyses on people's personal DNA sequences. The last of these classes involved interesting philosophical discussions about how much to trust their amateur analyses and how to handle potentially disturbing revelations about their genetic make-up. All the participants in that class got their sequencing done at 23andme first, so they had sequences to work with and could compare their own work with what the professionals turned up.
Experiments at BioCurious are not just about health. Synthetic biologists, for instance, are trying a lot of different ways to create eco-friendly synthetic fuels.
BioCurious is not a substitute for formal training in biochemistry, biology, and genetics. But it is a place for people to get a feel for what biologists do, and for real biologists without access to expensive equipment to do the research of their dreams.
In a back room (where I was allowed to go after being strenuously warned not to touch anything--BioCurious is an official BSL 1 facility, and they're lucky the city of Sunnyvale allowed them to open), one of the staff showed a traditional polymerase chain reaction (PCR) machine, which costs several thousand dollars and is critical for sequencing DNA.

Traditional commercial PCR
A couple BioCurious founders analyzed the functions of a PCR and, out of plywood and off-the-shelf parts, built an OpenPCR with open hardware specs. At $599, OpenPCR opens up genetic research to a far greater audience.

BioCurious staffer with OpenPCR
How low-budget is BioCurious? After meeting for a year in somebody's garage, they finally opened this space three weeks ago with funds raised through Kickstarter. All the staff and instructors are volunteers. They keep such a tight rein on spending that a staffer told me they could keep the place open by teaching one class per week. Of the $117 students spent today for their five-hour class, $80 went to hardware.
BioCurious isn't unique (a similar space has been set up in New York City, and some movements such as synthetic biology promote open information), but it's got a rare knack for making people comfortable with processes and ideas that normally put them off. When executive director Eri Gentry introduces the idea to many people, they react with alarm and put up their hands, as if they're afraid of being overwhelmed by technobabble. (I interviewed Gentry (MP3) before a talk she gave at this year's O'Reilly Open Source Convention.)

Founder and executive director Eri Gentry
BioCurious attacks that fear and miscomprehension. Like Hacker Dojo, another Silicon Valley stalwart whose happy hour I attended Friday night, they want an open space for open-minded people. Hacker Dojo and BioCurious will banish forever the stereotype of the scientist or engineer as a socially maladroit loner. The attendees are stringently welcoming and interested in talking about what they do in ways that make it understandable.
I thought of my two children, both of whom pursued musical careers. I wondered how they would have felt about music if kids weren't exposed to music until junior high school, whereupon they were sat down and forced to learn the circle of fifths and first species counterpoint. That's sort of how we present biology to the public--and then, even those who do show an interest are denied access to affordable equipment. BioCurious is on the cusp of a new scientific revolution.

Eri Gentry with Andy Oram in lab

