
May 02 2012

Recombinant Research: Breaking open rewards and incentives

In the previous articles in this series I've looked at problems in current medical research, and at the legal and technical solutions proposed by Sage Bionetworks. Pilot projects have shown encouraging results, but to move from a hothouse environment of experimentation to the mainstream of one of the world's most lucrative and tradition-bound industries, Sage Bionetworks must aim at that industry's nucleus: rewards and incentives.

Previous article in the series: Sage Congress plans for patient engagement.

Think about the publication system, that wretchedly inadequate medium for transferring information about experiments. Getting the data on which a study was based is incredibly hard; getting the actual samples or access to patients is usually impossible. Just as boiling vegetables drains most of their nutrients into the water, publishing results of an experiment throws away what is most valuable.

But the publication system has been built into the foundation of employment and funding over the centuries. A massive industry provides distribution of published results to libraries and research institutions around the world, and maintains iron control over access to that network through peer review and editorial discretion. Even more important, funding grants require publication (and only very recently have they begun to require the data behind a study). And of course, advancement in one's field requires publication.

Lawrence Lessig, in his keynote, castigated for-profit journals for restricting access to knowledge in order to puff up profits. A chart in his talk showed skyrocketing prices for for-profit journals in comparison to non-profit journals. Lessig is not out on the radical fringe in this regard; Harvard Library is calling the current pricing situation "untenable" in a move toward open access echoed by many in academia.

Lawrence Lessig keynote at Sage Congress.

How do we open up this system that seemed to serve science so well for so long, but is now becoming a drag on it? One approach is to expand the notion of publication. This is what Sage Bionetworks is doing with Science Translational Medicine in publishing validated biological models, as mentioned in an earlier article. An even more extensive reset of the publication model is found in Open Network Biology (ONB), an online journal. The publishers require that an article be accompanied by the biological model, the data and code used to produce the model, a description of the algorithm, and a platform to aid in reproducing results.

But neither of these worthy projects changes the external conditions that prop up the current publication system.

When one tries to design a reward system that gives deserved credit to other things besides the final results of an experiment, as some participants did at Sage Congress, great unknowns loom up. Is normalizing and cleaning data an activity worth praise and recognition? How about combining data sets from many different projects, as a Synapse researcher did for the TCGA? How much credit do you assign researchers at each step of the necessary procedure for a successful experiment?

Let's turn to the case of free software to look at an example of success in open sharing. It's clear that free software has swept the computer world. Most web sites use free software ranging from the server on which they run to the language compilers that deliver their code. Everybody knows that the most popular mobile platform, Android, is based on Linux, although fewer realize that the next most popular mobile platforms, Apple's iPhones and iPads, run on an operating system that incorporates large parts of the open BSD code base. We could go on and on citing ways in which free and open source software have changed the field.

The mechanism by which free and open source software staked out its dominance in so many areas has not been authoritatively established, but I think many programmers agree on a few key points:

  • Computer professionals encountered free software early in their careers, particularly as students or tinkerers, and brought their predilection for it into jobs they took at stodgier institutions such as banks and government agencies. Their managers deferred to them on choices for programming tools, and the rest is history.

  • Of course, computer professionals would not have chosen the free tools had they not been fit for the job (and often best for the job). Why is free software so good? Probably because the people creating it have complete jurisdiction over what to produce and how much time to spend producing it, unlike in commercial ventures with requirements established through marketing surveys and deadlines set unreasonably by management.

  • Different pieces of free software are easy to hook up, because one can alter their interfaces as necessary. Free software developers tend to look for other tools and platforms that could work with their own, and provide hooks into them (Apache, free database engines such as MySQL, and other such platforms are often accommodated.) Customers of proprietary software, in contrast, experience constant frustration when they try to introduce a new component or change components, because the software vendors are hostile to outside code (except when they are eager to fill a niche left by a competitor with market dominance). Formal standards cannot overcome vendor recalcitrance--a painful truth particularly obvious in health care with quasi-standards such as HL7.

  • Free software scales. Programmers work on it tirelessly until it's as efficient as it needs to be, and when one solution just can't scale any more, programmers can create new components such as Cassandra, CouchDB, or Redis that meet new needs.

Are there lessons we can take from this success story? Biological research doesn't fit the circumstances that made open source software a success. For instance, researchers start out low on the totem pole in very proprietary-minded institutions, and don't get to choose new ways of working. But the cleverer ones are beginning to break out and try more collaboration. Software and Internet connections help.

Researchers tend to choose formats and procedures on an ad hoc, project by project basis. They haven't paid enough attention to making their procedures and data sets work with those produced by other teams. This has got to change, and Sage Bionetworks is working hard on it.

Research is labor-intensive. It needs desperately to scale, as I have pointed out throughout this article, but to do so it needs entire new paradigms for thinking about biological models, workflow, and teamwork. This too is part of Sage Bionetworks' mission.

Certain problems in research are particularly resistant:

  • Conditions that affect small populations have trouble raising funds for research. The Sage Congress initiatives can lower research costs by pooling data from the affected population and helping researchers work more closely with patients.

  • Computation and statistical methods are very difficult fields, and biological research is competing with every other industry for the rare individuals who know these well. All we can do is bolster educational programs for both computer scientists and biologists to get more of these people.

  • There's a long lag time before one knows the effects of treatments. As Heywood's keynote suggested, this is partly solved by collecting longitudinal data on many patients and letting them talk among themselves.

Another process change has revolutionized the computer field: agile programming. That paradigm stresses close collaboration with the end-users whom the software is supposed to benefit, and a willingness to throw out old models and experiment. BRIDGE and other patient initiatives hold out the hope of a similar shift in medical research.

All these things are needed to rescue the study of genetics. It's a lot to do all at once. Progress on some fronts was more apparent than on others at this year's Sage Congress. But as more people get drawn in, and sometimes fumbling experiments produce maps for changing direction, we may start to see real outcomes from the efforts in upcoming years.

All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.


May 01 2012

Recombinant Research: Sage Congress plans for patient engagement

Clinical trials are the pathway for approving drug use, but they aren't good enough. That has become clear as a number of drugs (Vioxx being the most famous) have been blessed by the FDA, but disqualified after years of widespread use revealed either lack of efficacy or dangerous side effects. And the measures taken by the FDA recently to solve this embarrassing problem continue the heavyweight bureaucratic methods it has always employed: more trials, raising the costs of every drug and slowing down approval. Although I don't agree with the opinion of Avik S. A. Roy (reprinted in Forbes) that Phase III trials tend to be arbitrary, I do believe it is time to look for other ways to test drugs for safety and efficacy.

First article in the series: Recombinant Research: Sage Congress Promotes Data Sharing in Genetics.

But the Vioxx problem is just one instance of the wider malaise afflicting the drug industry. They just aren't producing enough new medications, either to solve pressing public needs or to keep up their own earnings. Vicki Seyfert-Margolis of the FDA built on her noteworthy speech at last year's Sage Congress (reported in one of my articles about the conference) with the statistic that drug companies submitted 20% fewer medications to the FDA between 2001 and 2007. Their blockbuster drugs produce far fewer profits than before as patents expire and fewer new drugs emerge (a predicament called the "patent cliff"). Seyfert-Margolis intimated that this crisis is the cause of layoffs in the industry, although I heard elsewhere that the companies are outsourcing more research, so perhaps the downsizing is just a reallocation of the same money.

Benefits of patient involvement

The field has failed to rise to the challenges posed by new complexity. Speakers at Sage Congress seemed to feel that genetic research has gone off the tracks. As the previous article in this series explained, Sage Bionetworks wants researchers to break the logjam by sharing data and code in GitHub fashion. And surprisingly, pharma is hurting enough to consider going along with an open research system. They're bleeding from a situation where as much as 80% of the effort in each clinical analysis is spent retrieving, formatting, and curating the data. Meanwhile, Kathy Giusti of the Multiple Myeloma Research Foundation says that in their work, open clinical trials are 60% faster.

Attendees at a breakout session I sat in on, including numerous managers from major pharma companies, expressed confidence that they could expand public or "pre-competitive" research in the direction Sage Congress proposed. The sector left to engage is the one that's central to all this work--the public.

If we could collect wide-ranging data from, say, 50,000 individuals (a May 2013 goal cited by John Wilbanks of Sage Bionetworks, a Kauffman Foundation Fellow), we could uncover a lot of trends that clinical trials are too narrow to turn up. Wilbanks ultimately wants millions of such data samples, and another attendee claimed that "technology will be ready by 2020 for a billion people to maintain their own molecular and longitudinal health data." And Jamie Heywood of PatientsLikeMe, in his keynote, claimed to have demonstrated through shared patient notes that some drugs were ineffective long before the FDA or manufacturers made the discoveries. He decried the current system of validating drugs for use and then failing to follow up with more studies, snorting, "Validated means that I have ceased the process of learning."

But patients have good reasons to keep a close hold on their health data, fearing that an insurance company, an identity thief, a drug marketer, or even their own employer will find and misuse it. They already have little enough control over it, because the annoying consent forms we always have shoved in our faces when we come to a clinic give away a lot of rights. Current laws allow all kinds of funny business, as shown in the famous case of the Vermont law against data mining, which gave the Supreme Court a chance to say that marketers can do anything they damn please with your data, under the excuse that it's de-identified.

In a noteworthy poll by Sage Bionetworks, 80% of academics claimed they were comfortable sharing their personal health data with family members, but only 31% of citizen advocates would do so. If that 31% is more representative of patients and the general public, how many would open their data to strangers, even when supposedly de-identified?

The Sage Bionetworks approach to patient consent

It's basic research that loses. So Wilbanks and a team have been working for the past year on a "portable consent" procedure. This is meant to overcome the hurdle by which a patient has to be contacted and give consent anew each time a new researcher wants data related to his or her genetics, conditions, or treatment. The ideal behind portable consent is to treat the entire research community as a trusted user.

The current plan for portable consent provides three tiers:

Tier 1

No restrictions on data, so long as researchers follow the terms of service. Hopefully, millions of people will choose this tier.

Tier 2

A middle ground. Someone with asthma may state that his data can be used only by asthma researchers, for example.

Tier 3

Carefully controlled. Meant for data coming from sensitive populations, along with anything that includes genetic information.

Synapse provides a trusted identification service. If researchers find a person with useful characteristics in the last two tiers, and are not authorized automatically to use that person's data, they can contact Synapse with the random number assigned to the person. Synapse keeps the original email address of the person on file and will contact him or her to request consent.
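
To make the three tiers and the re-contact step concrete, here is a minimal Python sketch of the logic described above. The tier numbers follow the description, but the class and function names and the identifier format (ConsentRecord, may_use_automatically, request_consent, "anon-0001") are hypothetical and not part of any actual Synapse API.

    from dataclasses import dataclass, field

    # Tier numbers follow the portable-consent description above.
    TIER_OPEN = 1        # usable by anyone who follows the terms of service
    TIER_RESTRICTED = 2  # usable only for purposes the patient has named
    TIER_CONTROLLED = 3  # sensitive populations and anything with genetic data

    @dataclass
    class ConsentRecord:
        synapse_id: str                                      # random ID Synapse assigns
        tier: int
        allowed_purposes: set = field(default_factory=set)   # meaningful for tier 2

    def may_use_automatically(record: ConsentRecord, purpose: str) -> bool:
        """True if the researcher can use the data without re-contacting the patient."""
        if record.tier == TIER_OPEN:
            return True
        if record.tier == TIER_RESTRICTED:
            return purpose in record.allowed_purposes
        return False  # tier 3 always requires an explicit request

    def request_consent(record: ConsentRecord, purpose: str) -> None:
        """Stand-in for Synapse emailing the patient; researchers see only the random ID."""
        print(f"Asking Synapse to contact {record.synapse_id} about use for {purpose}")

    # Example: a diabetes study encountering a tier-2 record limited to asthma research.
    record = ConsentRecord("anon-0001", TIER_RESTRICTED, {"asthma"})
    if not may_use_automatically(record, "diabetes"):
        request_consent(record, "diabetes")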

Portable consent also involves a lot of patient education. People will sign up through a software wizard that explains the risks. After choosing portable consent, the person decides how much to put in: 23andMe data, prescriptions, or whatever they choose to release.

Sharon Terry of the Genetic Alliance said that patient advocates currently try to control patient data in order to force researchers to share the work they base on that data. Portable consent loosens this control, but the field may be ready for its more flexible conditions for sharing.

Pharma companies and genetics researchers have lots to gain from access to enormous repositories of patient data. But what do the patients get from it? Leaders in health care already recognize that patients are more than experimental subjects and passive recipients of treatment. The recent ONC proposal for Stage 2 of Meaningful Use includes several requirements to share treatment data with the people being treated (which seems kind of a no-brainer when stated this baldly) and the ONC has a Consumer/Patient Engagement Power Team.

Sage Congress is fully engaged in the patient engagement movement too. One result is the BRIDGE initiative, a joint project of Sage Bionetworks and Ashoka with funding from the Robert Wood Johnson Foundation, to solicit questions and suggestions for research from patients. Researchers can go for years researching a condition without even touching on some symptom that patients care about. Listening to patients in the long run produces more cooperation and more funding.

Portable consent requires a leap of faith, because as Wilbanks admits, releasing aggregates of patient data means that over time, a patient is almost certain to be re-identified. Statistical techniques are getting too sophisticated, and compute power is growing too fast, for anyone to hide behind current tricks such as using only the first three digits of a five-digit postal code. Portable consent requires the data repository to grant access only to bona fide researchers and to set terms of use, including a ban on re-identifying patients. Still, researchers will have rights to do research, redistribute data, and derive products from it. Audits will be built in.
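
To see why the postal-code trick offers so little protection, consider a toy de-identification pass of the kind just described. The record and its fields are invented purely for illustration; the point is that even the coarsened values, taken together, still narrow the candidate population sharply and can often be linked to an individual through other public data sets.

    # Toy example of generalizing quasi-identifiers; the record is invented.
    record = {"zip": "02139", "birth_year": 1961, "sex": "F", "diagnosis": "asthma"}

    def generalize(rec):
        """Keep only coarse quasi-identifiers: ZIP3 and birth decade."""
        return {
            "zip3": rec["zip"][:3],                        # "02139" -> "021"
            "birth_decade": rec["birth_year"] // 10 * 10,  # 1961 -> 1960
            "sex": rec["sex"],
            "diagnosis": rec["diagnosis"],
        }

    print(generalize(record))
    # {'zip3': '021', 'birth_decade': 1960, 'sex': 'F', 'diagnosis': 'asthma'}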

But as Kelly Edwards of the University of Washington noted, tools and legal contracts can only contribute to trust; trust itself is ultimately based on shared values. Portable consent, properly done, engages with frameworks like Synapse to create a culture of respect for data.

In fact, I think the combination of the contractual framework in portable consent and a platform like Synapse, with its terms of use, might make a big difference in protecting patient privacy. Seyfert-Margolis cited predictions that 500 million smartphone users will be using medical apps by 2015. But mobile apps are notoriously greedy for personal data and cavalier toward user rights. Suppose all those smartphone users stored their data in a repository with clear terms of use and employed portable consent to grant access to the apps? We might all be safer.

The final article in this series will evaluate the prospects for open research in genetics, with a look at the grip of journal publishing on the field, and some comparisons to the success of free and open source software.

Next: Breaking Open Rewards and Incentives. All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.


April 30 2012

Recombinant Research: Sage Congress promotes data sharing in genetics

Given the exponential drop in the cost of personal genome sequencing (you can get a basic DNA test from 23andMe for a couple hundred dollars, and a full sequence will probably soon come down to a thousand dollars), a new dawn seems to be breaking for biological research. Yet the assessment of genetics research at the recent Sage Congress was highly cautionary. Various speakers chided their own field for tilling the same ground over and over, ignoring the urgent needs of patients, and just plain researching the wrong things.

Sage Congress also has some plans to fix all that. These plans include tools for sharing data and storing it in cloud facilities, research challenges, injections of new fertility into collaboration projects, and ways to gather more patient data and bring patients into the planning process. Through two days of demos, keynotes, panels, and breakout sessions, Sage Congress brought its vision to a high-level cohort of 230 attendees from universities, pharmaceutical companies, government health agencies, and others who can make change in the field.

In the course of this series of articles, I'll pinpoint some of the pain points that can force researchers, pharmaceutical companies, doctors, and patients to work together better. I'll offer a look at the importance of public input, legal frameworks for cooperation, the role of standards, and a number of other topics. But we'll start by seeing what Sage Bionetworks and its pals have done over the past year.

Synapse: providing the tools for genetics collaboration

Everybody understands that change is driven by people and the culture they form around them, not by tools, but good tools can make it a heck of a lot easier to drive change. To give genetics researchers the best environment available to share their work, Sage Bionetworks created the Synapse platform.

Synapse recognizes that data sets in biological research are getting too large to share through simple data transfers. For instance, in his keynote about cancer research (where he kindly treated us to pictures of cancer victims during lunch), UC Santa Cruz professor David Haussler announced plans to store 25,000 cases at 200 gigabytes per case in the Cancer Genome Atlas, also known as TCGA in what seems to be a clever pun on the four nucleotides in DNA. Storage requirements thus work out to 5 petabytes, which Haussler wants to be expandable to 20 petabytes. In the face of big data like this, the job becomes moving the code to the data, not moving the data to the code.

Synapse points to data sets contributed by cooperating researchers, but also lets you pull up a console in a web browser to run R or Python code on the data. Some effort goes into tagging each data set with associated metadata: tissue type, species tested, last update, number of samples, etc. Thus, you can search across Synapse to find data sets that are pertinent to your research.
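
To give a feel for the "bring the code to the data" workflow, here is a minimal sketch using the Synapse Python client (synapseclient) together with pandas. It only illustrates the general pattern of logging in, fetching a data set by its Synapse ID, and loading it for analysis; the entity ID and file format are placeholders, and the client's exact calls may vary between versions.

    import synapseclient
    import pandas as pd

    # Connect to Synapse (assumes credentials are configured locally).
    syn = synapseclient.Synapse()
    syn.login()

    # Fetch a data set by its Synapse ID (placeholder ID for illustration).
    # The file is downloaded locally, and the entity carries metadata such as
    # tissue type, species tested, and last update.
    entity = syn.get("syn123456")

    # Run the analysis next to the data, e.g. load a tab-separated matrix.
    expression = pd.read_csv(entity.path, sep="\t", index_col=0)
    print(expression.shape)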

One group working with Synapse has already harmonized and normalized the data sets in TCGA so that a researcher can quickly mix and run stats on them to extract emerging patterns. The effort took about one and a half full-time employees for six months, but the project leader is confident that with the system in place, "we can activate a similar size repository in hours."

This contribution highlights an important principle behind Synapse (appropriately called "viral" by some people in the open source movement): when you have manipulated and improved upon the data you find through Synapse, you should put your work back into Synapse. This work could include cleaning up outlier data, adding metadata, and so on. To make work sharing even easier, Synapse has plans to incorporate the Amazon Simple Workflow Service (SWF). It also hopes to add web interfaces to allow non-programmers to do useful work with data.

The Synapse development effort was an impressive one, coming up with a feature-rich Beta version in a year with just four coders. And Synapse code is entirely open source. So not only is the data distributed, but the creators will be happy for research institutions to set up their own Synapse sites. This may make Synapse more appealing to geneticists who are prevented by inertia from visiting the original Synapse.

Mike Kellen, introducing Synapse, compared its potential impact to that of moving research from a world of journals to a world like GitHub, where people record and share every detail of their work and plans. Along these lines, Synapse records who has used a data set. This has many benefits:

  • Researchers can meet up with others doing related work.

  • It gives public interest advocates a hook with which to call on those who benefit commercially from Synapse--as we hope the pharmaceutical companies will--to contribute money or other resources.

  • Members of the public can monitor accesses for suspicious uses that may be unethical.

There's plenty more work to be done to get data in good shape for sharing. Researchers must agree on some kind of metadata--the dreaded notion of ontologies came up several times--and clean up their data. They must learn about data provenance and versioning.

But sharing is critical for such basics of science as reproducing results. One source estimates that 75% of published results in genetics can't be replicated. A later article in this series will examine a new model in which enough metainformation is shared about a study for it to be reproduced and, even more important, to serve as a foundation for further research.

With this Beta release of Synapse, Sage Bionetworks feels it is ready for a new initiative to promote collaboration in biological research. But how do you get biologists around the world to start using Synapse? For one, try an activity that's gotten popular nowadays: a research challenge.

The Sage DREAM challenge

Sage Bionetworks' DREAM challenge asks genetics researchers to find predictors of the progression of breast cancer. The challenge uses data from 2,000 women diagnosed with breast cancer, combining information on DNA alterations affecting how their genes were expressed in the tumors, clinical information about their tumor status, and their outcomes over ten years. The task is to build models that integrate the alterations with molecular markers and clinical features to predict which women will have the most aggressive disease over a ten-year period.

Several hidden aspects of the challenge make it a clever vehicle for Sage Bionetworks' values and goals. First, breast cancer is a scourge whose urgency is matched by its stubborn resistance to diagnosis. The famous 2009 recommendations of the U.S. Preventive Services Task Force, after all the controversy was aired, left us with the dismal truth that we don't know a good way to predict breast cancer. Some women get mastectomies in the total absence of symptoms based just on frightening family histories. In short, breast cancer puts the research and health care communities in a quandary.

We need finer-grained predictors to say who is likely to get breast cancer, and standard research efforts up to now have fallen short. The Sage proposal is to marshal experts in a new way that combines their strengths, asking them to publish models that show the complex interactions between gene targets and influences from the environment. Sage Bionetworks will publish data sets at regular intervals that it uses to measure the predictive ability of each model. A totally fresh data set will be used at the end to choose the winning model.
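
In spirit, this evaluation loop is ordinary held-out model scoring, just run by the organizers against data the modelers never see. The sketch below uses synthetic data and scikit-learn purely to illustrate that pattern; it is not the actual DREAM scoring code, and the features, labels, and metric are stand-ins.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    # Synthetic stand-ins: molecular plus clinical features, and a binary label
    # for "aggressive disease within ten years".
    n_patients, n_features = 2000, 50
    X = rng.normal(size=(n_patients, n_features))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_patients) > 0).astype(int)

    # The modeler trains on the released data...
    train, held_out = slice(0, 1500), slice(1500, None)
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])

    # ...and the organizers score every submitted model on data the modeler
    # never saw, repeating the exercise at intervals and once more on a final
    # fresh data set to pick the winner.
    score = roc_auc_score(y[held_out], model.predict_proba(X[held_out])[:, 1])
    print(f"held-out AUC: {score:.3f}")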

The process behind the challenge--particularly the need to upload code in order to run it on the Synapse site--automatically forces model builders to publish all their code. According to Stephen Friend, founder of Sage Bionetworks, "this brings a level of accountability, transparency, and reproducibility not previously achieved in clinical data model challenges."

Finally, the process has two more effects: it shows off the huge amount of genetic data that can be accessed through Synapse, and it encourages researchers to look at each other's models in order to boost their own efforts. In less than a month, the challenge has already received more than 100 models from 10 sources.

The reward for winning the challenge is publication in a respected journal, the gold medal still sought by academic researchers. (More on shattering this obelisk later in the series.) Science Translational Medicine will accept results of the evaluation as a stand-in for peer review, a real breakthrough for Sage Bionetworks because it validates their software-based, evidence-driven process.

Finally, the DREAM challenge promotes use of the Synapse infrastructure, and in particular the method of bringing the code to the data. Google is donating server space for the challenge, which levels the playing field for researchers, freeing them from paying for their own computing.

A single challenge doesn't solve all the problems of incentives, of course. We still need to persuade researchers to put up their code and data on a kind of genetic GitHub, persuade pharmaceutical companies to support open research, and persuade the general public to share data about their phenomes (life data) and genes--all topics for upcoming articles in the series.

Next: Sage Congress Plans for Patient Engagement. All articles in this series, and others I've written about Sage Congress, are available through a bit.ly bundle.


April 05 2012

Steep climb for National Cancer Institute toward open source collaboration

Although a lot of government agencies produce open source software, hardly any develop relationships with a community of outside programmers, testers, and other contributors. I recently spoke to John Speakman of the National Cancer Institute to learn about their crowdsourcing initiative and the barriers they've encountered.

First let's orient ourselves a bit--forgive me for dumping out a lot of abbreviations and organizational affiliations here. The NCI is part of the National Institutes of Health. Speakman is the Chief Program Officer for NCI's Center for Biomedical Informatics and Information Technology. Their major open source software initiative is the Cancer Biomedical Informatics Grid (caBIG), which supports tools for transferring and manipulating cancer research data. For example, it provides access to data classifying the carcinogenic aspects of genes (The Cancer Genome Atlas) and resources to help researchers ask questions of and visualize this data (the Cancer Molecular Analysis Portal).

Plenty of outside researchers use caBIG software, but it's a one-way street, somewhat in the way the Department of Veterans Affairs used to release its VistA software. NCI sees the advantages of a give-and-take such as the CONNECT project has achieved, through assiduous cultivation of interested outside contributors, and wants to wean its outside users away from the dependent relationship that has been all take and no give. And even the VA decided last year that a more collaborative arrangement for VistA would benefit them, thus putting the software under the guidance of an independent non-profit, the Open Source Electronic Health Record Agent (OSEHRA).

Another model is Forge.mil, which the Department of Defense set up with the help of CollabNet, the well-known organization in charge of the Subversion revision control tool. Forge.mil represents a collaboration between the DoD and private contractors, encouraging them to create shared libraries that hopefully increase each contractor's productivity, but it is not open source.

The OSEHRA model--creating an independent, non-government custodian--seems a robust solution, although it takes a lot of effort and risks failure if the organization can't create a community around the project. (Communities don't just spring into being at the snap of a bureaucrat's fingers, as many corporations have found to their regret.) In the case of CONNECT, the independent Alembic Foundation stepped in to fill the gap after a lawsuit stalled CONNECT's development within the government. According to Alembic co-founder David Riley, with the contract issues resolved, CONNECT's original sponsor--the Office of the National Coordinator--is spinning off CONNECT to a private sector, open source entity, and work is underway to merge the two baselines.

Whether an agency manages its own project or spins off management, it has to invest a lot of work to turn an internal project into one that appeals to outside developers. This burden has been discovered by many private corporations as well as public entities. Tasks include:

  • Setting up public repositories for code and data.

  • Creating a clean software package with good version control that makes downloading and uploading simple.

  • Possibly adding an API to encourage third-party plugins, an effort that may require a good deal of refactoring and a definition of clear interfaces.

  • Substantially adding to the documentation.

  • General purging of internal code and data (sometimes even passwords!) that get in the way of general use.

Companies and institutions have also learned that "build it and they will come" doesn't usually work. An open source or open data initiative must be promoted vigorously, usually with challenges and competitions such as those the Department of Health and Human Services offers in its annual Health Data Initiative forums (a.k.a. datapaloozas).

With these considerations in mind, the NCI decided in the summer of 2011 to start looking for guidance and potential collaborators. Here, laws designed long ago to combat cronyism put up barriers. The NCI was not allowed to contact anyone it wanted out of the blue. Instead, it had to issue a Request for Information and talk only to people who responded. Although the RFI went online, it obviously wasn't widely seen. After all, do you regularly look for RFIs and RFPs from government agencies? If so, I can safely guess that you're paid by a large company or lobbying agency to follow a particular area of interest.

RFIs and RFPs are released as a gesture toward transparency, but in reality they just make it easier for the usual crowd of established contractors and lobbyists to build on the relationships they already have with agencies. And true to form, the NCI received only a limited set of responses, and was frustrated in its attempts to talk to new actors with the expertise it needed for its open source efforts.

And because the RFI had to allow a limited time window for responses, there is no point in responding to it now.

Still, Speakman and his colleagues are educating themselves and meeting with stakeholders. Cancer research is a hot topic drawing zealous attention from many academic and commercial entities, and they're hungry for data. Already, the NCI is encouraged by the initial positive response from the cancer informatics community, many of whom are eager to see the caBIG software deposited in an open repository like GitHub right away. Luckily, HHS has already negotiated terms of service with GitHub and SourceForge, removing at least one important barrier to entry. The NCI is packaging its first tool (a laboratory information management system called caLIMS) for deposit into a public repository. So I'm hoping the NCI is too caBIG to fail.

February 17 2012

Documentation strategy for a small software project: launching VoIP Drupal introductions

VoIP Drupal is a window onto the promises and challenges faced by a new open source project, including its documentation. At O'Reilly, we've been conscious for some time that we lack a business model for documenting new collaborative projects--near the beginning, at the stage where they could use the most help with good materials to promote their work, but don't have a community large enough to support a book--and I joined VoIP Drupal to explore how a professional editor can help such a team.

Small projects can reach a certain maturity with poor and sparse documentation. But the critical move from early adopters to the mainstream requires a lot more hand-holding for prospective users. And these projects can spare hardly any developer time for documentation. Users and fans can be helpful here, but their documentation needs to be checked and updated over time; furthermore, reliance on spontaneous contributions from users leads to spotty and unpredictable coverage.

Large projects can hire technical writers, but what they do is very different from traditional documentation; they must be community managers as well as writers and editors (see Anne Gentle's book Conversation and Community: The Social Web for Documentation). So these projects can benefit from research into communities also.

I met at the MIT Media Lab this week with Leo Burd, the inventor of VoIP Drupal, and a couple other supporters, notably Micky Metts of DrupalConnection.com. We worked out some long-term plans for firming up VoIP Drupal's documentation and other training materials. But we also had to deal with an urgent need for materials to offer at DrupalCon, which begins in just over one month.

Challenges

One of the difficulties of explaining VoIP Drupal is that it's just so versatile. The foundations are simple:

  • A thin wrapper around PHP permits developers to write simple scripts that dial phone numbers, send SMS messages, etc. These scripts run on services that initiate connections and do translation between voice and text (Tropo, Twilio, and the free Plivo are currently supported).

  • Administrators on Drupal sites can use the Drupal interface to configure VoIP Drupal modules and add phone/SMS scripts to their sites.

  • Content providers can use the VoIP Drupal capabilities provided by their administrators to do such things as send text messages to site users, or to enable site users to record messages using their phone or computer.

Already you can see one challenge: VoIP Drupal has three different audiences that need very different documentation. In fact, we've thought of two more audiences: decision-makers who might build a business or service on top of VoIP Drupal, and potential team members who will maintain and build new features.

Some juicy modules built on top of VoIP Drupal's core extend its versatility to the point where it's hard to explain on an elevator ride what VoIP Drupal could do. Leo tosses out a few ideas such as:

  • Emergency awareness systems that use multiple channels to reach out to a population living in a certain area. That would require a combination of user profiling, mapping, and communication capabilities that tends to be extremely hard to put together in a single package.

  • Community polling/voting systems that are accessible via web, SMS, email, phone, etc.

  • CRM systems that keep track (and even record) phone interactions, organize group conference calls with the click of a button, etc.

  • Voice-based bulletin boards.

  • Adding multiple authentication mechanisms to a site.

  • Sending SMS event notifications based on Google Calendars.

In theory you could create a complete voice and SMS based system out of VoIP Drupal and ignore the web site altogether, but that would be a rather cumbersome exercise. VoIP Drupal is well-suited to integrating voice and the Web--and it leaves lots of room for creativity.

Long-term development

A community project, we agreed, needs to be incremental and will result in widely distributed documents. Some people like big manuals, but most want a quickie getting-started guide and then lots of chances to explore different options at their own pace. Communities are good for developing small documents of different types. The challenge is finding someone to cover any particular feature, as well as to do the sometimes tedious work of updating the document over time.

We decided that videos would be valuable for the administrators and content providers, because they work through graphical interfaces. However, the material should also be documented in plain text. This expands access to the material in two ways. First, VoIP Drupal may be popular in parts of the world where bandwidth limitations make it hard to view videos. Second, the text pages are easier to translate into other languages.

Just as a video can be worth a thousand words, working scripts can replace a dozen explanations. Leo will set up a code contribution site on Github. This is more work than it may seem, because malicious or buggy scripts can wreak havoc for users (imagine someone getting a thousand identical SMS messages over the course of a single hour, for instance), so contributions have to be vetted.

Some projects assign a knowledgeable person or two to create an outline, then ask community members to fill it in. I find this approach too restrictive. Having a huge unfilled structure is just depressing. And one has to grab the excitement of volunteers wherever it happens to land. Just asking them to document what they love about a project will get you more material than presenting them with a mandate to cover certain topics.

But then how do you get crucial features documented? Wait and watch forums for people discussing those features. When someone seems particularly knowledgeable and eager to help, ask him or her for a longer document that covers the feature. You then have to reward this person for doing the work, and a couple ways that make sense in this situation include:

  • Get an editor to tighten up the document and work with the author to make a really professional article out of it.

  • Highlight it on your web site and make sure people can find it easily. For many volunteers, seeing their material widely used is the best reward.

We also agreed that we should divide documentation into practical, how-to documents and conceptual documents. Users like to grab a hello-world document and throw together their first program. As they start to shape their own projects, they realize they don't really understand how the system fits together and that they need some background concepts. Here is where most software projects fail. They assume that the reader understands the reasoning behind the design and knows how best to use it.

Good conceptual documentation is hard to produce, partly because the lead developers have the concepts so deeply ingrained that they don't realize what it is that other people don't know. Breaking the problems down into small chunks, though, can make it easier to produce useful guides.

Like many software projects, VoIP Drupal documentation currently starts the reader off with a list of modules. The team members liked an idea of mine to replace these with brief tutorials or use cases. Each would start with a goal or question (what the reader wants to accomplish) and then introduce the relevant module. In general, given the flexibility of VoIP Drupal, we agreed we need a lot more "why and when" documentation.

Immediate preparations

Before we take on a major restructuring and expansion of documentation, though, we have a tight deadline for producing some key videos and documents. Leo is going to lead a development workshop at DrupalCon, and he has to determine the minimum documentation needed to make it a productive experience. He also wants to do a webinar on February 28 or 29, and a series of videos on basic topics such as installing VoIP Drupal, a survey of successful sites using it, and a nifty graphical interface called Visual VoIP Drupal. Visual VoIP Drupal, which will be released in a few weeks, is one of the new features Leo would like to promote in order to excite users. It lets a programmer select blocks and blend them into a script through a GUI, instead of typing all the code.

The next few weeks will bring a flurry of work to realize our vision.


October 21 2011

Wrap-up from FLOSS Manuals book sprint at Google

At several points during this week's documentation sprint at Google, I talked with the founder of FLOSS Manuals, Adam Hyde, who developed the doc sprint as it is practiced today. Our conversation often returned to the differences between the group writing experience we had this week and traditional publishing. The willingness of my boss at O'Reilly Media to send me to this conference shows how interested the company is in learning what we might be able to take from sprints.

Some of the differences between sprints and traditional publishing are quite subtle. The collaborative process is obviously different, but many people outside publishing might not realize just how deeply the egoless collaboration of sprints flies in the face of the traditional publishing model. The reason is that publishing has long depended on the star author. In whatever way a person becomes this kind of star, whether by working his way up the journalism hierarchy like Thomas Friedman or bursting on the scene with a dazzling personal story like Greg Mortenson (author of Three Cups of Tea), stardom is almost the only way to sell books in profitable numbers. Authors who use the books themselves to build stardom still need to keep themselves in the public limelight somehow. Without colorful personalities, the publishing industry needs a new way to make money (along with Hollywood, television, and pop music).

But that's not the end of differences. Publishers also need to promise a certain amount of content, whereas sprinters and other free documentation projects can just put out what they feel like writing and say, "If you want more, add it." Traditional publishing will alienate readers if books come out with certain topics missing. Furthermore, if a book lacks a popular topic that a competitor has, the competitor will trounce the less well-endowed book in the market. So publishers are not simply inventing needs to maintain control over the development effort. They're not exerting control just to tamp down on unauthorized distribution or something like that. When they sell content, users have expectations that publishers strive to meet, so they need strong control over the content and the schedule for each book.

But O'Reilly, along with other publishers across the industry, is trying to change expectations. The goal of comprehensiveness conflicts with another goal, timeliness, that is becoming more and more important. We're responding in three ways that all bring us closer to what FLOSS Manuals is doing: we put out "early releases" containing parts of books that are underway, we sign contracts for projects on limited topics that are very short by design, and we're experimenting with systems that are even closer to the FLOSS Manuals system, allowing authors to change a book at whim and publish a new version immediately.

Although FLOSS Manuals produces free books and gets almost none of its funding from sales (the funding comes from grants and from the sponsors of sprints), the idea of sprinting is still compatible with traditional publishing, in which sales are the whole point. Traditional publishers tend to give several thousand dollars to authors in the form of advances, and if the author takes several months to produce a book, we don't see the royalties that pay us back for that investment for a long time. Why not spend a few thousand dollars to bring a team of authors to a pleasant but distraction-free location (I have to admit that Google headquarters is not at all distraction-free) and pay for a week of intense writing?

Authors would probably find it much more appealing to take a one-week vacation and say good-bye to their families for this time than to spend months stealing time on evenings and weekends and apologizing for not being fully present.

The problem, as I explained in my first posting this week, is that you never quite know what you're going to get from a sprint. In addition, the material is still rough at the end of a week and has to absorb a lot of work to rise to the standards of professional publishing. Still, many technical publishers would be happy to get over a hundred pages of relevant material in a single week.

Publishers who fail to make documents free and open might be more disadvantaged when seeking remote contributions. Sprints don't get many contributions from people outside the room where they are conducted, but sometimes advice and support weigh in on some critical, highly technical point. The sprints I have participated in (sometimes remotely) benefited from answers that came out of the cloud to resolve difficult questions. For instance, one commenter on this week's KDE book warned us we were using product names all wrong and had us go back through the book to make sure our branding was correct.

Will people offer their time to help authors and publishers develop closed books? O'Reilly has put books online during development, and random visitors do offer fixes and comments. There is some good will toward anyone who wants to offer guidance that a community considers important. But free, open documents are likely to draw even more help from crowdsourcing.

At the summit today, with the books all wrapped up and published, we held a feedback session. The organizers asked us our opinions on the sprint process, the writing tools, and how to make the sprint more effective. Our facilitator raised three issues that, once again, reminded me of the requirements of traditional publishing:

  • Taking long-term responsibility for a document. How does one motivate people to contribute to it? In the case of free software communities, they need to make updates a communal responsibility and integrate the document into their project life cycle just like the software.

  • Promoting the document. Without lots of hype, people will not notice the existence of the book and pick it up. Promotion is pretty much the same no matter how content is produced (social networking, blogging, and video play big roles nowadays), but free books are distinguished by the goal of sharing widely without concern for authorial control or payment. Furthermore, while FLOSS Manuals is conscious of branding, it does not use copyright or trademarks to restrict use of graphics or other trade dress.

  • Integrating a document into a community. This is related to both maintenance and promotion. But every great book has a community around it, and there are lots of ways people can use a book in training and other member-building activities. Forums and feedback pages are also important.

Over the past decade, a system of information generation has grown up in parallel with the traditional expert-driven system. In the old system everyone defers to an expert, while in the new system the public combines its resources. In the old system, documents are fixed after publication, whereas in the new system they are fluid. The old system was driven by the author's ego and increasingly by the demand for generating money, whereas the new system has revenue possibilities but has a strong sense of responsibility for the welfare of communities.

Mixtures of grassroots content generation and unique expertise have existed (Homer, for instance) and more models will be found. Understanding the points of commonality between the systems will help us develop such models.

(All my postings from this sprint are listed in a bit.ly bundle.)

FLOSS Manuals books published after three-day sprint

The final day of the FLOSS Manuals documentation sprint at Google began with a bit of a reprieve from Sprintmeister Adam Hyde's dictum that we should do no new writing. He allowed us to continue work till noon, time that the KDE team spent partly in heated arguments over whether we had provided enough coverage of key topics (the KDE project architecture, instructions for filing bug reports, etc.), partly in scrutinizing dubious material the book had inherited from the official documentation, and (at least a couple of us) actually writing material for chapters that readers may or may not find useful, such as a glossary.

I worried yesterday that the excitement of writing a complete book would be succeeded by the boring work of checking flow and consistency. Some drudgery was involved, but the final reading allowed groups to revisit their ways of presenting concepts and bringing in the reader.

Having done everything I thought I could do for the KDE team, I switched to OpenStreetMap, who produced a short, nicely paced, well-illustrated user guide. I think it's really cool that Google, which invests heavily in its own mapping service, helps OpenStreetMap as well. (They are often represented in Google Summer of Code.)

After dinner we started publishing our books. The new publication process at FLOSS Manuals loads the books not only to the FLOSS Manuals main page but to Lulu for purchase.

Publishing books at doc sprint

Joining the pilgrimage that all institutions are making toward wider data use, FLOSS Manuals is exposing more and more of the writing process. As described by founder Adam Hyde in a blog posting today, Visualising your book, FLOSS Manuals recently added tools that help participants and friends follow the progress of a book (a list of edited chapters is available as an RSS feed, for example) and get a sense of what was done. For instance, a timeline with circles representing chapter edits shows you which chapters had the most edits and when activity took place. (Pierre Commenge created the visualization for FLOSS Manuals.)

Participants at doc sprint

(All my postings from this sprint are listed in a bit.ly bundle: https://bitly.com/bundles/praxagora/4)

October 20 2011

Day two of FLOSS Manuals book sprint at Google Summer of Code summit

We started the second day of the FLOSS Manuals sprint with a circle encounter where each person shared some impressions of the first day. Several reported that they had worked on wikis and other online documentation before, but discovered that doing a book was quite different (I could have told them that, of course). They knew that a book had to be more organized, and offer more background than typical online documentation. More fundamentally, they felt more responsibility toward a wider range of readers, knowing that the book would be held up as an authority on the software they worked on and cared so much about.

We noted how gratifying it was to get questions answered instantly and be able to go through several rounds of editing in just a couple minutes. I admitted that I had been infected with the enthusiasm of the KDE developers I was working with, but had to maintain a bit of critical distance, an ability to say, "Hey, you're telling me this piece of software is wonderful, but I find its use gnarly and convoluted."

As I explained in Monday's posting, all the writing had to fit pretty much into two days. Each of the four teams started yesterday by creating an outline, and I'm sure my team was not the only one to revise it constantly throughout the day.

Circle at beginning of the day

Today, the KDE team took a final look at the outline and discussed everything we'd like to add to it. We pretty much finalized it early in the day and just filled in the blanks for the next eleven hours. I continued to raise flags about what I felt were insufficiently detailed explanations, and got impatient enough to write a few passages of my own in the evening.

Celebrating our approach to the end of the KDE writing effort

The KDE book is fairly standard developer documentation, albeit a beginner's guide with lots of practical advice about working in the KDE environment with the community. As a relatively conventional book, it was probably a little easier to write (but also probably less fun) than the more high-level approaches taken by some other teams that were trying to demonstrate to potential customers that their projects were worth adopting. Story-telling will be hard to detect in the KDE book.

And we finished! Now I'm afraid we'll find tomorrow boring, because we won't be allowed (and probably won't need) to add substantial new material. Instead, we'll be doing things like checking everything for consistency, removing references to missing passages, adding terms to the glossary, and other unrewarding slogs through a document that is far too familiar to us already. The only difference between the other team members and me is that I may be assigned to do this work on some other project.

(All my postings from this sprint are listed in a bit.ly bundle.)

October 19 2011

Day one of FLOSS Manuals book sprint at Google Summer of Code summit

Four teams at Google launched into endeavors that will lead, less than 72 hours from now, to complete books on four open source projects (KDE, OpenStreetMap, OpenMRS, and Sahana Eden). Most participants were recruited on the basis of a dream and a promise, so going through the first third of our sprint was eye-opening for nearly everybody. Although I had participated in one sprint before on-site and two sprints remotely, I found that the modus operandi has changed so much during the past year of experimentation that I too had a lot to learn.

Our doc sprint coordinator, Adam Hyde, told each team to spend an hour making an outline. The team to which I was assigned, KDE, took nearly two, and partway through Adam came in to tell us to stop because we had enough topics for three days of work. We then dug in to filling in the outline through a mix of fresh writing and cutting and pasting material from the official KDE docs. The latter required a complete overhaul, and often proved to be more than a year out of date.

KDE team at doc sprint

The KDE team's focus on developer documentation spared them the open-ended discussions over scope that the other teams had to undergo. But at key points during the writing, we still were forced to examine passages that appeared too hurried and unsubstantiated, evidence of gaps in information. At each point we had to determine what the hidden topics were, and then whether to remove all references to them (as we did, for instance, on the topic of getting permission to commit code fixes) or to expand them into new chapters of their own (as we did for internationalization). The latter choice created a dilemma of its own, because none of the team members present had experience with internationalization, so we reached out and tried to contact remote KDE experts who could write the chapter.

The biggest kudos today go to Sahana Eden, I think. I reported yesterday that the team expressed deep differences of opinion about the audience they should address and how they should organize their sprint. Today they made some choices and got a huge amount of documentation down on the screen. Much of it was clearly provisional (they were booed for including so many bulleted lists) but it was evidence of their thinking and a framework for further development.

Sahana team at doc sprint

My own team had a lot of people with serious jet lag, and we had some trouble going from 9:00 in the morning to 9:30 at night. But we created (or untangled, as the case may be) some 60 pages of text. We reorganized the book at least once per hour, a process that the FLOSS Manuals interface makes as easy as drag and drop. A good heuristic was to choose a section title for each group of chapters. If we couldn't find a good title, we had to break up the group.

The end of the day brought us to the half-way mark for writing. We are told we need to complete everything by the end of the evening tomorrow and spend the final day rearranging and cleaning up text. More than a race against time, this is proving to be a race against complexity.

Topics for discussion at doc sprint

October 18 2011

FLOSS Manuals sprint starts at Google Summer of Code summit

Five days of intense book production kicked off today at the FLOSS Manuals sprint, hosted by Google. Four free software projects have each sent three to five volunteers to write books about the projects this week. Along the way we'll all learn about the group writing process and the particular use of book sprints to make documentation for free software.

I came here to provide whatever editorial help I can and to see the similarities and differences between conventional publishing and the intense community effort represented by book sprints. I plan to spend time with each of the four projects, participating in their discussions and trying to learn what works best by comparing what they bring in the way of expertise and ideas to their projects. All the work will be done out in the open on the FLOSS Manuals site for the summit, so you are welcome also to log in and watch the progress of the books or even contribute.

A book in a week sounds like a pretty cool achievement, whether for a free software project or a publisher. In fact, the first day (today) and last day of the sprint are unconferences, so there are only three days for actual writing. The first hour tomorrow will be devoted to choosing a high-level outline for each project, and then they will be off and running.

And there are many cautions about trying to apply this model to conventional publishing. First, the books are never really finished at the end of the sprint, even though they go up for viewing and for sale immediately. I've seen that they have many rough spots, such as redundant sections written by different people on the same topic, and mistakes in cross-references or references to non-existent material. Naturally, they also need a copy-edit. This doesn't detract from the value of the material produced. It just means they need some straightening out to be considered professional quality.

Books that come from sprints are also quite short. I think a typical length is 125 pages, growing over time as follow-up sprints are held. The length also depends of course on the number of people working on the sprint. We have the minimum for a good sprint here at Google, because the three to five team members will be joined by one or two people like me who are unaffiliated.

Finally, the content of a sprint book is decided on an ad hoc basis. FLOSS Manuals founder Adam Hyde explained today that his view of outlining and planning has evolved considerably. He quite rationally assumed at first that every book should have a detailed outline before the sprint started. Then he found that one could not impose an outline on sprinters, but had to let them choose subjects they wanted to cover. Each sprinter brings certain passions, and in such an intense environment one can only go with the flow and let each person write what interests him or her. Somehow, the books pull together into a coherent product, but one cannot guarantee they'll have exactly what the market is asking for. I, in fact, was involved in the planning of a FLOSS Manuals sprint for the CiviCRM manual (the first edition of a book that is now in its third) and witnessed the sprinters toss out an outline that I had spent weeks producing with community leaders.

So a sprint is different in every way from a traditional published manual, and I imagine this will be true for community documentation in general.

The discussions today uncovered the desires and concerns of the sprinters, and offered some formal presentations to prepare us, we hope, for the unique experience of doing a book sprint. The concerns expressed by sprinters were pretty easy to anticipate. How does one motivate community members to write? How can a project maintain a book in a timely manner after it is produced? What is the role of graphics and multimedia? How does one produce multiple translations?

Janet Swisher, a documentation expert from Austin who is on the board of FLOSS Manuals, gave a presentation asking project leaders to think about basic questions such as why a user would use their software and what typical uses are. Her goal was to bring home the traditional lessons of good writing: empathy for a well-defined audience. "If I had a nickel for every web site I've visited put up by an open source project that doesn't state what the software is for..." she said. That's just a single egregious instance of the general lack of understanding of the audience that free software authors suffer from.

Later, Michael McAndrew of the CiviCRM project took us several steps further along the path, asking what the project leaders would enjoy documenting and "what would be insane to leave out." I sat with the group from Sahana to watch as they grappled with the pressures these questions created. This is sure one passionate group of volunteers, caring deeply about what they do. Splits appeared concerning how much time to devote to high-level concepts versus practical details, which audiences to serve, and what to emphasize in the outline. I have no doubt, however, listening to them listen to each other, that they'll have their plan after the first hour tomorrow and will be off and running.

March 02 2011

Software patents, prior art, and revelations of the Peer to Patent review

A report from the Peer to Patent initiative
(http://us1.campaign-archive1.com/?u=33d934c165e69e4b507504c2b&id=8771dc3ae5&e=77c352ede8#mctoc1)
shows that the project is having salutary effects on the patent system.
Besides the greater openness that Peer to Patent promotes in
evaluating individual patent applications, it is creating a new
transparency and understanding of the functioning of the patent system
as a whole. I'll give some background to help readers understand the
significance of Manny Schecter's newsletter item, which concerns prior
art that exists outside of patents. I'll add my own comments about
software patents.


Let's remind ourselves of the basic rule of patenting: no one should
get a patent for something that was done before by someone else. Even
if you never knew that some guy on the other side of the world thought
of adding a new screw to a boiler, you can't get a patent on
the idea of adding a screw in that place for that purpose. The other
guy's work is called prior art, and such prior art can be
found in all kinds of places: marketing brochures, academic journals,
or actual objects that operate currently or operated any time in the
past. For software (which is of particular interest to most readers
of this blog), prior art could well be source code.

Now for the big lapse at the U.S. Patent Office: they rarely look for
prior art out in the real world. They mostly check previously granted
U.S. patents--a pretty narrow view of technology. And that has
seriously harmed software patenting.

Software was considered a form of thinking rather than as a process or
machine up until the early 1980s, and therefore unpatentable. Patents
started to be granted on software in the United States in the early
1980s and took off in a big way in the 1990s. (A useful history has
been put up by Bitlaw at
http://www.bitlaw.com/software-patent/history.html.) This sudden turn
meant that patent examiners were asked to evaluate applications in a
field where there were no patents previously. So of course they couldn't
find prior art. It would have been quixotic in any case to expect
examiners--allowed less than 20 hours per patent--to learn a new field
of software and go out among the millions of lines of code to search
for examples of what they were being asked to grant patents for.

In many parts of the world, software is still considered unsuitable
for patenting, but it's worth noting that the European Union has been
handing out patents on software without acknowledging them as such,
because a hard-fought battle among free software advocates has kept
software officially unpatentable.

In the U.S., patents have been handed out right and left for two
decades now, so the prior art does exist within patents on software.
But that makes things even worse. First, the bad patents handed out
over the initial decades continue to weigh down software with
lawsuits that lack merit. Second, the precedent of so many unmerited
patents gives examiners the impression that it's OK to grant patents
on the same kinds of overly broad and obvious topics now.

Now to Schecter's article. He says the patent office has long
acknowledged that they look mostly to patents for prior art, but they
won't admit that this is a problem. One has to prove to them that
there is important prior art out in the field, and that this prior art
can actually lead to the denial of applications.

And Peer to Patent has accomplished that. From Schecter:

Approximately 20% of patent applications in the pilot were rejected in
view of prior art references submitted through Peer To Patent, and
over half of the references applied by examiners as grounds for those
rejections were non-patent prior art.

The discussion over the patent process, which has progressed so
painfully slowly over many years, now takes a decisive step forward.
Prior art in the field should be taken into account during the process
of examining patents. The next question is how.

Peer to Patent and related efforts such as Article One Partners
(http://www.articleonepartners.com/) offer a powerful step toward a
solution. Much of the tinkering proposed in current debates, such as
the number of patent examiners, the damages awarded for infringement,
and so forth (a bill was debated in the Senate today, I've heard),
will do much less to cut down the backlog of 700,000 applications and
improve outcomes than we could achieve through serious public input.

I am not a zealot on the subject of software patents. I've read a lot
of patent applications and court rulings about patents (see, for
instance, my analysis of the Bilski decision at
http://www.praxagora.com/andyo/article/patent_bilski_aftermath.html)
and explored the case for software patents sympathetically in another
article (http://radar.oreilly.com/archives/2007/09/three_vantage_p.html).
But I have to come down on the side of the position that software and
business processes, like other areas of pure human thought, have no
place in the patent system.

Maybe Rivest, Shamir, and Adleman deserved their famous patent
(http://www.google.com/patents?vid=4405829), now expired, on
public-key cryptography--that was a huge leap of thought
making a historic change in how computers are used in the world. But
the modern patents I've seen are nothing like the RSA algorithm. They
represent cheap patches on tired old practices. Proponents of software
patents may win their battle in the halls of power, but they have lost
their argument on the grounds of the patents to which their policy has
led. Sorry, there's just too much crap out there.

Media old and new are mobilized for effective causes

The bright light of social media has attracted the attention of followers in every discipline, from media and academia to corporate marketing and social causes. There was something for everybody today in the talk by researcher Sasha Costanza-Chock at Harvard's Berkman Center on Transmedia Mobilization. He began with a stern admonition to treat conventional and broadcast media as critical resources, and moved ultimately to a warning to treat social networks and other peer-driven media as critical resources. I hope I can reproduce at least a smidgen of the insights and research findings he squeezed into a forty-five-minute talk (itself a compressed version of a much longer presentation he has delivered elsewhere).

Control the message, control the funding

Consultants (not normally known for welcoming a wide range of outside opinions themselves) have been browbeating the corporations and governments over social media, trying to get it through their noggins that Twitter and Facebook are not merely a new set of media outlets to fill with their PR and propaganda. Corporations and governments are notoriously terrified of losing control over "the message," but that is the only way they will ever get a message out in the new world of peer-to-peer communications. According to Costanza-Chock, non-profit social causes suffer from the same skittishness. "Some of them will go to the bitter end trying to maintain top-down control," he laughed. But "the smart ones" will get the point and become equal participants in forums that they don't try to dominate.

Another cogent observation, developed further in his discussion with the audience, drew a line between messaging and funding. Non-profits depend on foundations and other large donors, and need to demonstrate to them that the non-profit has actually done something. "We exchanged messages with 100,000 participants on MySpace" comes in sounding worth a lot less than, "We shot three documentaries and distributed press releases to four hundred media outlets." Sasha would like to see forums on social media for funders as well as the non-profits they fund. All sides need to learn the value of being peers in a distributed system, and how to use that role effectively.

Is the Internet necessary?

Sasha's key research for this talk involved the 2006 pro-immigrant demonstrations that played a role in bringing down the Sensenbrenner Bill that would have imposed severe restrictions on citizens in an attempt to marginalize and restrict the movements of immigrants. The protests filled the streets of major cities across the country, producing the largest demonstrations ever seen on U.S. soil. How did media play a role?

Sasha started by seemingly downplaying the importance of Internet-based media. He went over recent statistics about computer use, broadband access, and cell phone ownership among different demographics and showed that lower-class, Spanish-speaking residents (the base of the protestors in Los Angeles, where he carried out his research) were woefully under-represented. It would appear that the Internet was not a major factor in the largest demonstrations in U.S. history. But he found that it played a subtle role in combination with traditional media.

Immigrants are also largely shut out of mainstream media; it's a red-letter day when a piece about their lives or written from their point of view appears on page 10 of the local paper. Most of the mobilization, therefore, Sasha attributed to Spanish talk radio, which Los Angeles immigrants turned on all day and whose hosts made a conscious decision to work together and promote social action around the Sensenbrenner Bill.

Sasha also discovered other elements of traditional media, such as a documentary movie about Latino protests in 1960s Los Angeles that aired shortly before the demonstrations. And here's where social media came in: high school students who played roles in the documentary posted clips of their parts on MySpace. There were other creative uses of YouTube and the social media sites to spread the word about protests. Therefore, the Internet can't be dismissed. It could not have done much without material from traditional media to draw on, but the traditional media would not have had such a powerful effect without the Internet either.

One interesting aspect of Sasha's research concerned identity formation. You can't join with people in a cause unless you view the group as part of your identity, and traditional media go a long way toward helping people form their identities. Just by helping to make a video, you can start to identify with a cause. It's interesting how revolutionaries in countries such as Tunisia and Egypt formed identities as a nation in opposition to their leaders instead of (as most dictators strive to achieve) in sympathy with them. So identity formation is a critical process, and we don't know yet how much social networks can do to further it.

In conclusion, it seems that old and new media will co-exist for an indefinite period of time, and will reinforce each other. Interesting questions were raised in the audience about whether the new environment can create meeting spaces where people on opposing sides can converse productively, or whether it will be restricted to the heavily emotion-laden and one-sided narratives that we see so much of nowadays. One can't control the message, but the message can sure be powerful.

February 18 2011

An era in which to curate skills: report from Tools of Change conference

Three days of intensive
discussion about the current state of publishing
wrapped up last
night in New York City. Let's take a tour from start to end.

If I were to draw one point from the first session I attended, I would
say that metadata is a key competitive point for publishers. First,
facts such as who has reviewed a book, how many editions it has been
through, and other such things we call "metadata" can be valuable to
readers and institutions you want to sell content to. Second, it can
be valuable to you internally as part of curation (which I'll get to
later). Basically, metadata makes content far more useful. But it's
so tedious to add and to collate with other content's metadata that
few people out in the field bother to add it. Publishers are
well-placed to do it because they have the resources to pay people for
that unappreciated task.
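
To make the collation problem concrete, here is a minimal sketch in
Python of the kind of merging a publisher's systems have to do. The
field names and merge rule are invented for illustration and don't
reflect any particular publisher's actual schema.

    # Hypothetical illustration of collating partial metadata records for
    # one title. Field names and merge rules are invented for this example.
    def merge_metadata(*records):
        """Later records fill in fields earlier ones lack; list-valued
        fields such as reviews are concatenated rather than overwritten."""
        merged = {}
        for record in records:
            for key, value in record.items():
                if isinstance(value, list):
                    merged.setdefault(key, []).extend(value)
                else:
                    merged.setdefault(key, value)
        return merged

    catalog_entry = {"title": "An Example Title", "edition": 2}
    review_feed = {"title": "An Example Title",
                   "reviews": [{"source": "a trade journal", "rating": 4}]}
    print(merge_metadata(catalog_entry, review_feed))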

If I were to leap to the other end of the conference and draw one
point from the closing keynotes, I would say that the key to survival
is to start with the determination that you're going to win, and to
derive your strategy from that. The closing keynoters offered a couple
strategies along those lines.


Kathy Sierra claimed she started her Head First series
(http://oreilly.com/store/series/headfirst.html) with no vision
loftier than to make money and beat the competition. The key to sales,
as she has often explained in her talks and articles on "Creating
Passionate Users," is not to promote the author or the topic but to
promote what the reader could become.

Ben Huh of I Can Has Cheezburger
(http://www.toccon.com/toc2011/public/schedule/detail/17571) said that
one must plan a book to be a bestseller, the way car or appliance
manufacturers plan to meet the demands of their buyers.

Thus passed the start and end of this conference. A lot happened in
between. I'll cover a few topics in this blog.

Skills for the future of publishing

There clearly is an undercurrent of worry, if not actual panic, in
publishing. We see what is happening to newspapers, we watch our
markets shrink like those of the music and television industries, and
we check our own balance sheets as Borders Books slides into
bankruptcy. I cannot remember another conference where I heard, as I
did this week, the leader of a major corporation air doubts from the
podium about the future of her company and her career.

Many speakers combatted this sense of helplessness, of course, but
their advice often came across as, "Everything is going haywire and
you can't possibly imagine what the field will look like in a few
years, so just hang on and go with the flow. And by the way,
completely overturn your workflows and revamp your skill sets."

Nevertheless, I repeatedly heard references to four classic skills
that still rule in the field of publishing. These skills were always
important and will remain important, but they have to shift and in
some ways to merge.

Two of these skills are research and sales. Although one was usually
expected to do research on the market and topic before writing and do
sales afterward, the talks by Sierra, Huh, and others suggested that
these are continuous activities, and hard to separate. The big buzz in
all the content industries is about getting closer to one's audience.
There is never a start and end to the process.

The consensus is that casual exploitation of social
networking--sending out postings and updates and trying to chat with
readers online--won't sell your content. Your readers are a market and
must be researched like one: using surveys, statistical analysis, and
so on. This news can be a relief to the thousands of authors who feel
guilty (and perhaps are made to feel guilty by their publishers)
because they don't get pleasure from reporting things on Facebook
ranging from the breakfast cereal they ate to their latest brilliant
insight. But the question of how to bring one's audience into one's
project--a topic I'll refer to as crowdsourcing and cover later--is a
difficult one.

Authoring and curation are even more fundamental skills. Curation has
traditionally meant just making sure assets are safe, uncorrupted, and
ready for use, but it has broadened (particularly in the keynote by
Steve Rosenbaum, http://www.toccon.com/toc2011/public/schedule/detail/18000)
to include gathering information, filtering and tagging it, and
generally understanding what's useful to different audiences. This has
always been a publisher's role. In the age of abundant digital
content, the gathering and filtering functions can dwarf the editorial
side of publishing. Thus, although Thomson Reuters has enormous
resources of its own, it also generates value by tracking the assets
of many other organizations
(http://www.toccon.com/toc2011/public/schedule/detail/17827).

When working with other people's material, curation, authoring, and
editing all start to merge. Perhaps organizing other people's work
into a meaningful sequence is as valuable as authoring one's own. In
short, curation adds value in ways that are different from authoring
but increasingly valid.

Capitalizing on the old

I am not ready to change my business cards from saying "Editor" to
"Curator" because that would make it look like I'm presiding over a
museum. Indeed, I fear that many publishers are dragged down by their
legacy holdings, which may go back a hundred years or more. I talked
to one publisher who felt like his time was mostly taken up with huge
holdings of classics that had been converted to digital form, and he
was struggling to find time just to learn how his firm could add the
kinds of interactivity, multimedia, links, and other enhancements that
people at the show were saying these digital works deserved.

We hope that no publishers will end up as museums, but some may have
to survive by partnering with newer, more limber companies that grok
the digital age better, rather as the publisher of T.S. Eliot's
classic poem The Waste Land partnered with Touch Press, the
new imprint set up by Wolfram Research and discussed in a keynote by
Theodore Gray (http://www.toccon.com/toc2011/public/schedule/detail/17732).
Readers will expect more than plain old rendered
text from their glittery devices, and Gray's immensely successful book
The Elements (which brought to life some very classic content, the
periodic table) shows one model for giving them what they want.

Two polar extremes

Gray defined his formula for success as one of bringing together top
talent in every field, rather as Hollywood film-makers do. Gray claims
to bring together "real authors" (meaning people with extraordinary
perspectives to offer), video producers with top industry
qualifications, and programmers whose skills go beyond the ordinary.

I can't fault Gray's formula--in fact, Head First follows the same
model by devoting huge resources to each book and choosing its topics
carefully to sell well--but if it was the only formula for success,
the book industry would put out a few hundred products each year like
Hollywood does. Gray did not offer this economic analysis himself, but
it's the only way I see it working financially. Not only would this
shift squelch the thousands of quirky niche offerings that make
publishing a joy at present, but I also don't consider it sustainable. How will
the next generation of top producers mature and experiment? If there
is no business model to support the long tail, they'll never develop
the skills needed in Gray's kind of business.

Business models are also a problem at the other extreme,
crowdsourcing. Everybody would like to draw on the insights of
readers. We've learned from popular books such as
The Wisdom of
Crowds
and Wikinomics that our
public has valuable things to say, and our own works grow in value if
we mine them adeptly. There are innumerable conversations going on out
there, on the forums and the rating sites, and the social networks,
and publishers want to draw those conversations into the book. The
problem is that our customers are very happy in the communities they
have created themselves, and while they will drop in on our site to
rate a product or correct an error, they won't create the digital
equivalent of Paris's nineteenth-century cafe culture for us.

Because I have been fascinated for years by online information sharing
and have researched it a fair amount
(http://www.praxagora.com/community_documentation/), I made use of the
conference in the appropriate way by organizing a roundtable for
anyone who was interested under the
subject, "Can crowdsourcing coexist with monetization?" Some of the
projects the participants discussed included:

  • A book review site that pays experts for reviews, and then opens up
    the site to the public for their comments. This approach uses
    high-quality content to attract more content.

  • O'Reilly's own Answers site, which works similarly by sharing
    authors' ideas as well as excerpts from their books and draws
    postings from readers.

  • A site for baby product recommendations, put up by the publisher of
    books on parenting. The publisher has succeeded in drawing large
    groups of avid participants, and has persuaded volunteers to moderate
    the site for free. But it hasn't taken the next steps, such as
    generating useful content for its own books from readers, or finding
    ways to expand into information for parents of older children so the
    publisher can keep them on the site.

  • Offering a site for teachers to share educational materials and
    improve their curricula. In this case, the publisher is not interested
    in monetizing the content, but the participants use the site to
    improve their careers.

In between the premium offerings of Touch Press and the resale of
crowdsourced material lies a wide spectrum of things publishers can
do. At all points on the spectrum, though, traditional business
models are challenged.

The one strategic move that was emphasized in session after session
was to move our digital content to standards. EPUB, HTML 5, and other
standards are evolving and growing (sometimes, it seems, beyond any
one person's ability to grasp the whole). If we use these
formats, we can mix and mingle our content with others, and thus take
advantage of partnerships, crowdsourcing, and new devices and
distribution opportunities.
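
As a small illustration of what standard formats buy us: an EPUB is
just a ZIP archive whose META-INF/container.xml points to a package
file carrying Dublin Core metadata, so a few lines of standard-library
Python can pull the title out of any conforming book. This is only a
sketch against the published EPUB structure, not a description of any
particular vendor's toolchain.

    # Sketch: read the title from an EPUB using only the Python standard
    # library. Relies on the EPUB convention that META-INF/container.xml
    # names the package (.opf) file, which carries Dublin Core metadata.
    import sys
    import zipfile
    import xml.etree.ElementTree as ET

    NS = {
        "c": "urn:oasis:names:tc:opendocument:xmlns:container",
        "dc": "http://purl.org/dc/elements/1.1/",
    }

    def epub_title(path):
        with zipfile.ZipFile(path) as z:
            container = ET.fromstring(z.read("META-INF/container.xml"))
            opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
            package = ET.fromstring(z.read(opf_path))
            title = package.find(".//dc:title", NS)
            return title.text if title is not None else None

    if __name__ == "__main__":
        # Usage: python epub_title.py somebook.epub
        print(epub_title(sys.argv[1]))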

Three gratifying trends

Trends can feel like they're running against us in the publishing
industry. But I heard of three trends that should make us feel good:
reading is on the increase, TV watching is on the decrease (which will
make you happy if you agree with such analysts as Jerry Mander and
Neil Postman), and people want portability--the right to read their
purchases on any device. The significance of the last push is that it
will lead to more openness and more chances for the rich environment
of information exchange that generates new media and ideas. We're in a
fertile era, and the first assets we need to curate are our own
skills.

May 11 2010

Crowdsourcing and the challenge of payment


An unusual Distributed Work Meetup
(http://www.meetup.com/Distributed-Work/calendar/13300733/) was held
last night in four different cities simultaneously, arranged through
many hours of hard work by Lukas Biewald
(http://www.meetup.com/Distributed-Work/members/9584137/) and his
colleagues at distributed work provider CrowdFlower
(http://crowdflower.com/).

With all the sharing of experiences and the on-the-spot analyses
taking place, I didn't find an occasion to ask my most pressing
question, so I'll put it here and ask my readers for comments:

How can you set up crowdsourcing where most people work for free but
some are paid, and present it to participants in a way that makes it
seem fair?


This situation arises all the time, with paid participants such as
application developers and community managers, but there's a lot of
scary literature about "crowding out" and other dangers. One basic
challenge is choosing what work to reward monetarily. I can think of
several dividing lines, each with potential problems:

  • Pay for professional skills and ask for amateur contributions on a
    volunteer basis.

    The problem with that approach is that so-called amateurs are invading
    the turf of professionals all the time, and their deft ability to do
    so has been proven over and over at crowdsourcing sites such as
    InnoCentive (http://www.innocentive.com/) for inventors and
    SpringLeap or 99 Designs (http://99designs.com/) for designers. Still,
    most people can understand the need to pay credentialed professionals
    such as lawyers and accountants.

  • Pay for extraordinary skill and accept more modest contributions on a
    volunteer basis.

    This principle usually reduces to the previous one, because there's no
    bright line dividing the extraordinary from the ordinary. Companies
    adopting this strategy could be embarrassed when a volunteer turns in
    work whose quality matches the professional hires, and MySQL AB in
    particular was known for hiring such volunteers. But if it turns out
    that a large number of volunteers have professional skills, the whole
    principle comes into doubt.

  • Pay for tasks that aren't fun.

    The problem is that it's amazing what some people consider fun. On the
    other hand, at any particular moment when you need some input, you
    might be unable to find people who find it fun enough to do it for
    you. This principle still holds some water; for instance, I heard
    Linus Torvalds say that proprietary software was a reasonable solution
    for programming tasks that nobody would want to do for personal
    satisfaction.

  • Pay for critical tasks that need attention on an ongoing basis.

    This can justify paying people to monitor sites for spam and
    obscenity, keep computer servers from going down, etc. The problem
    with this is that no human being can be on call constantly. If you're
    going to divide a task among multiple people, you'll find that a
    healthy community tends to be more vigilant and responsive than
    designated individuals.

I think there are guidelines for mixing pay with volunteer work, and
I'd like to hear (without payment) ideas from the crowd.

Now I'll talk a bit about the meetup.

Venue and setup

I just have to start with the Boston-area venue. I had come to many
events at the MIT Media Lab and had always entered Building E14 on the
southwest side. The Lab struck me as a musty, slightly undermaintained
space littered with odd jetsam and parts of unfinished projects; a place you
could hardly find your way around but that almost dragged creativity
from you into the open. The Lab took up a new building in 2009 but to
my memory the impact is still similar--it's inherent to the mission
and style of the researchers.

For the first time last night, I came to the building's northeast
entrance, maintained by the MIT School of Architecture. It is Ariel to
the Media Lab's Caliban: an airy set of spacious white-walled forums
sparsely occupied by chairs and occasional art displays. In a very
different way, this space also promotes open thoughts and broad
initiatives.

The ambitious agenda called for the four host cities (Boston, New
York, San Francisco, and Seattle) to share speakers over
videoconferencing equipment. Despite extensive preparation, we all had
audio, video, and connectivity problems at the last minute (in fact,
the Boston organizers crowdsourced the appeal for a laptop and I
surrendered mine for the video feed). Finally in Boston we
disconnected and had an old-fashioned presentation and discussion with
an expert speaker.

In regard to the MIT Media Lab and Architecture School, I think it's
amusing to report that Foursquare didn't recognize either one when I
asked for my current location. Instead, Foursquare offered a variety
of sites across the river, plus the nearby subway, the bike path, and
a few other oddities.

We were lucky to have Jeff Howe
(http://crowdsourcing.typepad.com/about.html), the WIRED contributor
who invented the term Crowdsourcing (http://crowdsourcing.typepad.com/)
and wrote a popular book on it
(http://www.randomhouse.com/catalog/display.pperl/9780307396204.html).
He is currently a Nieman Fellow at Harvard. His talk was wildly
informal (he took an urgent call from a baby sitter in the middle) but
full of interesting observations and audience interactions.

He asked us to promote his current big project with WIRED, One Book,
One Twitter
(http://www.wired.com/epicenter/2010/03/one-book-one-twitter/). His
goal is to reproduce globally the literacy projects carried out in
many cities (one happens every year in my town, Arlington, Mass.)
where a push to get everyone to read a book is accompanied by
community activities and meetups. Through a popular vote on WIRED, the
book American Gods (http://en.wikipedia.org/wiki/American_Gods) by
Neil Gaiman was chosen, and people are tweeting away at #1b1t and
related tags.

Discussion

With sponsorship by CrowdFlower, our evening focused on crowdsourcing
for money. We had a few interesting observations about the differences
between free, Wikipedia-style participation and work-for-pay, but what
was most interesting is that basic human processes like
community-building operate in both settings.

Among Howe's personal revelations was his encounter with the fear of
crowdsourcing. Everyone panics when they first see what crowdsourcing
is doing to his or her personal profession. For instance, when Howe
talked about the graphic design sites mentioned earlier, professional
designers descended on him in a frenzy. He played the sage, lecturing
them that the current system for outsourcing design excludes lots of
creative young talent, etc.

But even Howe, when approached by an outfit that is trying to
outsource professional writing, felt the sting of competition and
refused to help them. But he offered respect for Helium
(http://www.helium.com/), which encourages self-chosen
authors to sign up and compete for freelance assignments.

Howe is covering citizen journalism, though, a subject that Dan
Gillmor wrote about in a book that O'Reilly published, We the Media
(http://oreilly.com/catalog/9780596102272/), and that he continues to
pursue at his Mediactive site (http://mediactive.com/) and in a new book.

Job protection can also play a role in opposition to crowdsourcing,
because it makes it easier for people around the world to work on
local projects. (Over half the workers on Mechanical Turk,
https://www.mturk.com/mturk/welcome, now live in India. Biewald said
one can't trust what workers say on their
profiles; IP addresses reveal the truth.) But this doesn't seem to
have attracted the attention of the xenophobes who oppose any support
for job creation in other countries, perhaps because it's hard to get
riled up about "jobs" that have the granularity of a couple seconds.

Crowdsourcing is known to occur, as Howe put it, in "situations of
high social capital," simply meaning that people care about each other
and want to earn each other's favor. It's often reinforced by explicit
rating systems, but even more powerful is the sense of sharing and
knowing that someone else is working alongside you. In a blog post I
wrote a couple years ago
(http://broadcast.oreilly.com/2008/12/finding-a-sweet-spot-for-crowd.html),
I noted that the competition site TopCoder (http://www.topcoder.com/)
maintained a thriving community among programmers who ultimately were
competing with each other.

Similarly, the successful call center LiveOps (http://liveops.com/)
provides forums for operators
to talk about their experiences and share tips. This has become not
just a source of crowdsourced help, and not even a way to boost morale
by building community, but an impetus for quality. Operators actually
discipline each other and urge each other to greater heights of
productivity. LiveOps pays its workers more per hour than outsourcing
calls to India normally costs to clients, yet LiveOps is successful
because of its reputation for high quality.

We asked why communities of paid workers tended to reinforce quality
rather than go in the other direction and band together to cheat the
system. I think the answer is obvious: workers know that if they are
successful at cheating, clients will stop using the system and it will
go away, along with their source of income.

Biewald also explained that CrowdFlower has found it fairly easy to
catch and chase away cheaters. It seeds its jobs with simple questions
to which it knows the right answers, and warns the worker right away
if the questions are answered incorrectly. After a couple alerts, the
bad worker usually drops out.
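
To make the mechanism concrete, here is a minimal sketch of that kind
of gold-question screening. The thresholds, names, and seeding rate
are invented for illustration; this is not CrowdFlower's actual
implementation.

    # Hypothetical sketch of seeding a job with known-answer ("gold")
    # questions, warning workers who miss them, and dropping repeat offenders.
    import random

    GOLD_ANSWERS = {"q17": "cat", "q42": "dog"}  # question id -> known answer
    MAX_ALERTS = 2          # invented threshold
    GOLD_RATE = 0.1         # invented fraction of gold questions in the stream

    class Worker:
        def __init__(self, worker_id):
            self.worker_id = worker_id
            self.alerts = 0
            self.active = True

    def record_answer(worker, question_id, answer):
        """If the question was gold and answered wrong, warn the worker;
        after too many alerts, stop routing work to them."""
        if question_id not in GOLD_ANSWERS or answer == GOLD_ANSWERS[question_id]:
            return
        worker.alerts += 1
        print(f"warning {worker.worker_id}: missed gold question {question_id}")
        if worker.alerts >= MAX_ALERTS:
            worker.active = False

    def next_question(real_question_ids):
        """Mix gold questions into the stream of real work at a modest rate."""
        if random.random() < GOLD_RATE:
            return random.choice(list(GOLD_ANSWERS))
        return random.choice(real_question_ids)

    # Example: a worker who repeatedly misses gold questions is deactivated.
    w = Worker("worker-1")
    record_answer(w, "q17", "dog")
    record_answer(w, "q42", "cat")
    print(w.active)  # False after two missed gold questions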

We had a brief discussion afterward about the potential dark side of
crowdsourcing, which law professor Jonathan Zittrain covered in a talk
called Minds for Sale
(http://www.scu.edu/ethics/practicing/focusareas/technology/zittrain.html).
One of Zittrain's complaints is that malicious actors
can disguise their evil goals behind seemingly innocuous tasks farmed
out to hundreds of unknowing volunteers. But someone who used to work
for Amazon.com's Mechanical Turk said people are both smarter and more
ethical than they get credit for, and that participants on that
service quickly noted any task that looked unsavory and warned each
other away.

As the name Mechanical Turk (which of course had a completely
unrelated historical origin) suggests, many tasks parceled out by
crowdsourcing firms are fairly mechanical ones that we just haven't
figured out how to fully mechanize yet: transcribing spoken words,
recognizing photos, etc. Biewald said that his firm still has a big
job persuading potential clients that they can trust key parts of the
company supply chain to anonymous, self-chosen workers. I think it may
be easier when the company realizes that a task is truly mechanical
and that they keep full control over the design of the project. But
crowdsourcing is moving up in the world fast; not only production but
control and choice are moving into the crowd.

Howe highlighted Fox News, which runs a UReport site
(http://ureport.foxnews.com/) for
not only written by volunteers but chosen through volunteer ratings,
somewhat in Slashdot style.

Musing on the sociological and economic implications of crowdsourcing,
as we did last night, can be exciting. Even though Mechanical Turk
doesn't seem to be profitable, its clients capture many cost savings,
and other crowdsourcing firms have made headway in particular
fields. Howe hails crowdsourcing as the first form of production that
really reflects the strengths of the Internet, instead of trying to
"force old industrial-age crap" into an online format. But beyond the
philosophical rhetoric, crowdsourcing is an area where a lot of
companies are making money.

December 24 2009

Peer to Patent Australia recruits volunteer prior art searchers

The Peer to Patent project has already earned its place in history. It
was explicitly cited as inspiration for the open government initiative
in the Obama administration, which recently released a comprehensive
directive (available as a PDF) covering federal agencies. The founder
of the project, law professor Beth Noveck, began implementation of the
directive as Deputy CTO in the US government. But I've been wondering,
along with many other people, where Peer to Patent itself is going.

It's encouraging to hear that a new pilot has started in Australia and
has gathered a small community of volunteer prior art seekers. You can
check out the official site and its Wikipedia page. Because Australia
is much smaller in population than the US and sees much less patent
activity, the scope of the pilot is smaller but seems to be chugging
along nicely.

The pilot started on December 9 and plans to run for six months,
offering 40 patents for review in the areas of software and business
methods (the same ones as the US Peer to Patent project). Among
participating patent applicants are IBM, General Electric,
Hewlett-Packard, Yahoo!, CSIRO, and Aristocrat. Right now, 15 patents
are posted, each has at least one volunteer reviewer, and one boasts
two suggestions for potential prior art.

Professor Brian Fitzgerald of the Queensland University of Technology,
the Project Leader of Peer to Patent Australia, says, "Peer to Patent
allows people from anywhere to plug into the patent examination
process and to add what value they can. And from what we have seen in
the US, it works: examiners are relying on the Peer to Patent prior
art notifications. Our aim is to help build an international platform
for the project as well as embed its benefits within the Australian
patent system. We ask you to join the Australian project and help
contribute to the development of Peer to Patent on a worldwide basis."

While the U.S. pilot is undergoing evaluation, Peer to Patent's
executive director Mark Webbink says, "Signs are good for a potential
restart of the program some time in 2010. Dave Kappos, the Under
Secretary of Commerce and Director of the USPTO, has long been a
supporter of Peer to Patent, and the prior art contributions appear to
be proving useful. The worldwide economy produced some drag on program
expansion when the UK Intellectual Property Office delayed its
anticipated pilot. However, the Japan Patent Office, which previously
ran its own peer review pilot, now appears interested in expanding its
program. IP Australia and Queensland University of Technology are to
be commended for moving on the pilot so quickly." Brian Fitzgerald
says that China and other Asian countries are watching Japan and
Australia with interest.

I have followed Peer to Patent since fairly early drafts of the
proposal, have written about it frequently, and believe it is both
viable and necessary. The recent ruling against Microsoft Office shows
that patents in software, at least, are way out of control. Prior art
cannot in itself solve a broken system, but a robust examination
process can at least make applicants think twice about trying to exert
ownership over routine concepts such as separating a document's markup
from its content. (That's the purpose of markup in the first place.)
Incidentally, Australia has its own version of the famous Bilski
patent case, Grant v Commissioner of Patents.

In fact, the progress Peer to Patent has made in many countries proves
my faith in it. Just think about the inertia of government agencies
and the impenetrability of both the individual patent application and
the patent process as a whole. Who would imagine, putting all those
barriers together, that Peer to Patent could have accomplished so much
already?

We're not on Internet time here, but on policy time. Peer to Patent is
still a baby, and with enough care and feeding it can thrive and grow
strong.
