
December 22 2010

Reaching the pinnacle: truly open web services and clouds

Previous section:

Why web services should be released as free software

Free software in the cloud isn't just a nice-sounding ideal or even an efficient way to push innovation forward. Opening the cloud also opens the path to a bountiful environment of computing for all. Here are the steps to a better computing future.

Provide choice

The first layer of benefits when companies release their source code
is incremental: incorporating bug fixes, promoting value-added
resellers, finding new staff among volunteer programmers. But a free
software cloud should go far beyond this.

Remember that web services can be run virtually now. When you log in
to a site to handle mail, CRM, or some other service, you may be
firing up a virtual service within a hardware cloud.

So web and cloud providers can set up a gallery of alternative
services, trading off various features or offering alternative
look-and-feel interfaces. Instead of just logging into a site such as
Salesforce.com and accepting whatever the administrators have put up
that day, users could choose from a menu, and perhaps even upload
their own preferred version of the service. The SaaS site would then
launch the chosen application in the cloud. Published APIs would allow
users on different software versions to work together.

If a developer outside the company creates a new version with
substantial enhancements, the company can offer it as an option. If
new features slow down performance, the company can allow clients to
decide whether the delays are worth it. To keep things simple for
casual clients, there will probably always be a default service, but
those who want alternatives can have them.

Vendors can provide "alpha" or test sites where people can try out new
versions created by the vendor or by outsiders. Like stand-alone
software, cloud software can move through different stages of testing
and verification.

And providing such sandboxes can also be helpful to developers in
general. A developer would no longer have to take the trouble to
download, install, and configure software on a local computer to do
development and testing. Just log into the sandbox and play.
Google offers The Go Playground to encourage students of its Go
language. CiviCRM, which is a free software server (not a cloud or web
service), offers a sandbox for testing new features. A web service
company in electronic health records, Practice Fusion,
which issued an API challenge in September, is now creating a sandbox
for third-party developers to test the API functionality on its
platform. I would encourage web and cloud services to go even
farther: open their own source code and provide sandboxes for people
to rewrite and try out new versions.

Let's take a moment for another possible benefit of running a
service as a virtual instance. Infected computer systems present a
serious danger to users (who can suffer from identity theft if their
personal data is scooped up) and other systems, which can be
victimized by denial-of-service attacks or infections of their own.
One proposed remedy is trusted computing, an awkward tower of
authorizations reaching right down into the firmware or hardware. In
trusted computing, the computer itself checks
to make sure that a recognized and uncompromised operating system is
running at boot time. The operating system then validates each
application before launching it.

Trusted computing is Byzantine and overly controlling. The hardware
manufacturer gets to decide which operating system you use, and
through that which applications you use. Wouldn't users prefer
to run cloud instances that are born anew each time they log in? That
would wipe out any infection and ensure a trusted environment at the
start of each session without cumbersome gatekeeping.

Loosen the bonds on data

As we've seen, one of the biggest fears keeping potential clients away
from web services and cloud computing is the risk entailed in leaving
their data in the hands of another company. Here it can get lost,
stolen, or misused for nefarious purposes.

But data doesn't have to be stored on the computer where the
processing is done, or even at the same vendor. A user could fire up a
web or cloud service, submit a data source and data store, and keep
results in the data store. IaaS-style cloud computing involves
encrypted instances of operating systems, and if web services did the
same, users would automatically be protected from malicious
prying. There is still a potential privacy issue whenever a user runs
software on someone else's server, because it could skim off private
data and give it to a marketing firm or law enforcement.

Alert web service vendors such as Google know they have to assuage
user fears of locked-in data. In Google's case, they created an
engineering team called the Data Liberation Front (see an article by
two Google employees, The Case Against Data Lock-in). Its work allows
users to extract their data in a format that makes it feasible to
reconstitute it in its original format on another system, but it
doesn't actually sever the data from the service as I'm suggesting.

A careful client would store data in several places (to guard against
loss in case one has a disk failure or other catastrophe). The client
would then submit one location to the web service for processing, and
store the data back in all locations or store it in the original
source and then copy it later, after making sure it has not been
corrupted.

A liability issue remains when calculation and data are separated. If
the client experiences loss or corruption, was the web service or the
data storage service responsible? A ping-pong scenario could easily
develop, with the web services provider saying the data storage
service corrupted a disk sector, the data storage service saying the
web service produced incorrect output, and the confused client left
furious with no recourse.

This could perhaps be solved by a hash or digest, a stable and widely
used technique for ensuring that any change to the data, even the flip
of a single bit, produces a different output value. A digest is a
small number that represents a larger batch of data. Algorithms that
create digests are fast but generate output that's reasonably
unguessable. Each time the same input is submitted to the algorithm,
it is guaranteed to generate the same digest, but any change to the
input (through purposeful fiddling or an inadvertent error) will, with
overwhelming probability, produce a different digest.
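To make the idea concrete, here is a minimal sketch in Python. It assumes SHA-256 from the standard hashlib module as the digest algorithm (this article does not name one), and the sample data and the digest helper are invented for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data standing in for a service's output.
original = b"quarterly totals: 1024"

# Flip the lowest bit of the first byte to simulate a one-bit corruption.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

# The same input always yields the same digest...
same_again = digest(original) == digest(original)
# ...while the one-bit change yields a different one.
changed = digest(original) != digest(corrupted)

print(same_again, changed)
```

The digest stays the same length no matter how large the input is, which is what makes it cheap to log alongside each piece of data.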

The web service could log each completed activity along with the
digest of the data it produces. The data service writes the data,
reads it back, and computes a new digest. Any discrepancy signals a
problem on the data service side, which it can fix by repeating the
write. In the future, if data is corrupted but has the original
digest, the client can blame the web service, because the web service
must have written corrupt data in the first place.
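The bookkeeping described above might look roughly like the following sketch. Everything in it is an assumption made for illustration: the WebService and DataService classes, the in-memory dictionaries standing in for a log and a disk, the upper-casing that stands in for a real calculation, and the choice of SHA-256.

```python
import hashlib
from dataclasses import dataclass, field


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class WebService:
    """Hypothetical compute service that logs a digest for each result."""
    log: dict = field(default_factory=dict)

    def run(self, job_id: str, source: bytes) -> bytes:
        result = source.upper()  # stand-in for the real calculation
        self.log[job_id] = digest(result)
        return result


@dataclass
class DataService:
    """Hypothetical storage service that reads back and checks each write."""
    disk: dict = field(default_factory=dict)

    def write(self, key: str, data: bytes, expected: str, retries: int = 3) -> None:
        for _ in range(retries):
            self.disk[key] = data
            # Read the data back and compare digests; repeat the write on a mismatch.
            if digest(self.disk[key]) == expected:
                return
        raise IOError(f"could not store {key} intact")


def responsible_party(web: WebService, store: DataService, job_id: str) -> str:
    """Given a result the client believes is bad, say whose side to examine."""
    if digest(store.disk[job_id]) == web.log[job_id]:
        # The stored bytes are exactly what the web service produced.
        return "web service"
    # The bytes changed after the verified write.
    return "data service"
```

A client who finds a bad result would call responsible_party with the job's identifier. The sketch settles only which side to examine first; it does not prove how the error arose.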


Sascha Meinrath, a wireless networking expert, would like to see
programs run both on local devices and in the cloud. Each
program could exploit the speed and security of the local device but
reach seamlessly back to remote resources when necessary, rather like
a microprocessor uses the local caches as much as possible and faults
back to main memory when needed. Such a dual arrangement would offer
flexibility, making it possible to continue work offline, keep
particularly sensitive data off the network, and let the user trade
off compute power for network usage on a case-by-case basis. (Wireless
use on a mobile device can also run down the battery real fast.)

Before concluding, I should touch on another trend that some
developers hope will free users from proprietary cloud services:
peer-to-peer systems. The concept behind peer-to-peer is appealing and
has been gaining more attention recently: individuals run servers on
their systems at home or work and serve up the data they want. But
these systems are hard to implement, for reasons I laid out in two
articles, From P2P to Web Services: Addressing and Coordination and
From P2P to Web Services: Trust. Running your own
software is somewhat moot anyway, because you're well advised to store
your data somewhere else in addition to your own system. So long as
you're employing a back-up service to keep your data safe in case of
catastrophe, you might as well take advantage of other cloud services
as well.

I also don't believe that individual sites maintained by
individuals will remain the sources for important data, as the
peer-to-peer model postulates. Someone is going to mine that data and
aggregate it--just look at the proliferation of Twitter search
services. So even if users try to live the ideal of keeping control
over their data, and use distributed technologies like the
Diaspora project,
they will end up surrendering at least some control and data to a
service.

A sunny future for clouds and free software together

The architecture I'm suggesting for computing makes free software even
more accessible than the current practice of putting software on the
Internet where individuals have to download and install it. The cloud
can make free software as convenient as Gmail. In fact, for free
software that consumes a lot of resources, the cloud can open it up to
people who can't afford powerful computers to run the software.

Web service offerings would migrate to my vision of a free software
cloud by splitting into several parts, any or all of them free
software. A host would simply provide the hardware and
scheduling for the rest of the parts. A guest or
appliance would contain the creative software implementing
the service. A sandbox with tools for compilation, debugging,
and source control would make it easy for developers to create new
versions of the guest. And data would represent the results
of the service's calculations in a clearly documented
format. Customers would run the default guest, or select another guest
on the vendor's site or from another developer. The guest would output
data in the standardized format, to be stored in a location of the
customer's choice and resubmitted for the next run.
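As a rough illustration of this split, the sketch below models guests as interchangeable functions and the host as the code that runs whichever one the customer picks. The names GALLERY, host_run, default_guest, and verbose_guest are hypothetical, and JSON is only an assumed stand-in for the "clearly documented format."

```python
import json
from typing import Callable, Dict

# A guest takes the customer's stored data and returns the updated data.
Guest = Callable[[dict], dict]


def default_guest(state: dict) -> dict:
    """Hypothetical default version of the service: count the runs."""
    return {"count": state.get("count", 0) + 1}


def verbose_guest(state: dict) -> dict:
    """Hypothetical alternative version from an outside developer."""
    count = state.get("count", 0) + 1
    return {"count": count, "note": f"run number {count}"}


# The vendor's gallery of guests that customers may choose from.
GALLERY: Dict[str, Guest] = {"default": default_guest, "verbose": verbose_guest}


def host_run(stored: str, guest_name: str = "default") -> str:
    """Host: load the customer's data, run the chosen guest, return new data."""
    guest = GALLERY[guest_name]
    return json.dumps(guest(json.loads(stored)), sort_keys=True)


# The customer keeps the output wherever they like and resubmits it next time,
# possibly to a different guest.
first = host_run("{}")
second = host_run(first, "verbose")
print(first)
print(second)
```

Because both guests read and write the same format, the customer can switch between them from one run to the next without converting any data.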

With cloud computing, the platform you're on is no longer
important. The application is everything and the computer is (almost)
nothing. The application itself may also devolve into a variety of
mashed-up components created by different development teams and
communicating over well-defined APIs, a trend I suggested almost a
decade ago in an article titled Applications, User Interfaces, and
Servers in the Soup.

The merger of free software with cloud and web services is a win-win.
The convenience of IaaS and PaaS opens up opportunities for
developers, whereas SaaS simplifies the use of software and extends its
reach. Opening the source code, in turn, makes the cloud more
appealing and more powerful. The transition will take a buy-in from
cloud and SaaS providers, a change in the software development
process, a stronger link between computational and data clouds, and
new conventions to be learned by clients of the services. Let's get
the word out.

(I'd like to thank Don Marti for suggesting additional ideas for this
article, including the fear of creating a two-tier user society, the
chance to shatter the tyranny of IT departments, the poor quality of
source code created for web services, and the value of logging
information on user interaction. I would also like to thank Sascha
Meinrath for the idea of seamless computing for local devices and the
cloud, Anne Gentle for her idea about running test and production
systems in the same cloud, and Karl Fogel for several suggestions,
especially the value of usage statistics for programmers of web
services.)

December 20 2010

Why web services should be released as free software

Previous section:

Why clouds and web services will continue to take over computing

Let's put together a pitch for cloud and web service providers. We have two hurdles to leap: persuading them that they'll benefit by releasing the source code to their software, and addressing their fear of releasing it. I'll handle both tasks in this section, which will then give us the foundation to look at a world of free clouds and web services.


Cloud and web service providers already love free software

Reasons for developing software as open source have been told and
retold many times; popular treatments include Eric S. Raymond's
essays in the collection The Cathedral and the Bazaar (which O'Reilly
puts out in print), and Yochai Benkler's Wealth of Networks (available
online as a PDF as well as the basis for a wiki, and published by Yale
University Press). But cloud and web service
companies don't have to be sold on free software--they use it all the
time.

The cornucopia of tools and libraries produced by open source projects
such as Ruby on Rails makes free software the first stop on many
services' search for software. Lots of them still code pages in
other open source tools and languages such as PHP and jQuery. Cloud
providers universally base their offerings on Linux, and many use open
source tools to create their customers' virtual systems.
(Amazon.com currently bases its cloud offerings on Xen, and
KVM, heavily backed by Red Hat, is also a contender.) The best
monitoring tools are also free software. In general, free software is
sweeping through the cloud. (See also Open Source Projects for Cloud
on Rise, According to Black Duck Software Analysis.)

So cloud and web service providers live the benefits of free software
every day. They know the value of communities who collaborate to
improve and add new layers to software. They groove on the convenience
of loading as much software as they want on any systems without
struggling with a license server. They take advantage of frequent
releases with sparkling new features. And they know that there are
thousands of programmers out in the field familiar with the software,
so hiring is easier.

And they give back to open source communities too: they contribute
money, developer time, and valuable information about performance
issues and other real-life data about the operation of the software.

But what if we ask them to open their own code? We can suggest that
they can have better software by letting their own clients--the
best experts in how their software is used--try it out and look
over the source for problems. Web service developers also realize that
mash-ups and extensions are crucial in bringing them more traffic, so
one can argue that opening their source code will make it easier for
third-party developers to understand it and write to it.

Web and cloud services are always trying to hire top-notch programmers
too, and it's a well-established phenomenon that releasing the
source code to a popular product produces a cadre of experts out in
the field. Many volunteers submit bug fixes and enhancements in order
to prove their fitness for employment--and the vendors can pick
up the best coders.

These arguments might not suffice to assail the ramparts of vendors'
resistance. We really need to present a vision of open cloud computing
and persuade vendors that their clients will be happier with services
based on free software. But first we can dismantle some of the fear
around making source code open.

No reason to fear opening the source code

Some cloud and web providers, even though they make heavy use of free
software internally, may never have considered releasing their own
code because they saw no advantages to it (there are certainly
administrative and maintenance tasks associated with opening source
code). Others are embarrassed about the poor structure and coding
style of their fast-changing source code.

Popular methodologies for creating Web software can also raise a
barrier to change. Companies have chosen over the past decade to
feature small, tight-knit teams who communicate with each other and
stakeholders informally and issue frequent software releases to try
out in the field and then refine. Companies find this process more
"agile" than the distributed, open-source practice of putting
everything in writing online, drawing in as broad a range of
contributors as possible, and encouraging experiments on the side. The
agile process can produce impressive results quickly, but it places an
enormous burden on a small group of people to understand what clients
want and massage it into a working product.

We can't move cloud and SaaS sites to free software, in any case, till
we address the fundamental fear some of these sites share with
traditional proprietary software developers: that someone will take
their code, improve it, and launch a competing service. Let's turn to
that concern.

If a service releases its code under the GNU Affero General Public
License, as mentioned in the
previous section,
anyone who improves it and runs a web site with the improved code is
legally required to release their improvements. So we can chip away at
the resistance with several arguments.

First, web services win over visitors through traits that are
unrelated to the code they run. Traits that win repeat visits include:


  • Staying up (sounds so simple, but needs saying)

  • The network effects that come from people inviting their friends or
    going where the action is--effects that give the innovative
    vendor a first-mover advantage

  • A site's appearance and visual guides to navigation, which
    include aspects that can be trademarked

  • Well-designed APIs that facilitate the third-party applications
    mentioned earlier

So the source code to SaaS software isn't as precious a secret
as vendors might think. Anyway, software is more and more a commodity
nowadays. That's why a mind-boggling variety of JavaScript
frameworks, MVC platforms, and even whole new programming languages
are being developed for the vendors' enjoyment. Scripting
languages, powerful libraries, and other advances speed up the pace of
development. Anyone who likes the look of a web service and wants to
create a competitor can spin it up in record time for low cost.

Maybe we've softened up the vendors some. Now, on to the
pinnacle of cloud computing--and the high point on which this
article will end--a vision of the benefits a free cloud could
offer to vendors, customers, and developers alike.

Next section:
Reaching the pinnacle: truly open web services and clouds.

December 17 2010

Why clouds and web services will continue to take over computing

Series

What are the chances for a free software cloud?

  • Resolving the contradictions between web services, clouds, and open source (12/13)
  • Defining clouds, web services, and other remote computing (12/15)
  • Why clouds and web services will continue to take over computing (12/17)
  • Why web services should be released as free software (12/20)
  • Reaching the pinnacle: truly open web services and clouds (12/22)

Additional posts in this 5-part series are available here.

Previous section:

Definitions: Clouds, web services, and other remote computing

The tech press is intensely occupied and pre-occupied with analyzing the cloud from a business point of view. Should you host your operations with a cloud provider? Should you use web services for office work? The stream of articles and blogs on these subjects shows how indisputably the cloud is poised to take over.

But the actual conclusions these analysts reach are intensely
conservative: watch out, count up your costs carefully, look closely
at regulations and liability issues that hold you back, etc.
The analysts are obsessed with the cloud, but they're not
encouraging companies to actually use it--or at least
they're saying we'd better put lots of thought into it
first.

My long-term view convinces me we all WILL be in the cloud.
No hope in bucking the trend. The advantages are just too compelling.

I won't try to replicate here the hundreds and hundreds of
arguments and statistics produced by the analysts. I'll just run
quickly over the pros and cons of using cloud computing and web
services, and why they add up to a ringing endorsement. That will help
me get to the question that really concerns this article: what can we
do to preserve freedom in the cloud?

The promise of the cloud shines bright in many projections. The
federal government has committed to a "Cloud First" policy in its
recent Information Technology reform plan.
The companies offering IaaS, PaaS, and SaaS promulgate
mouth-watering visions of their benefits. But some of the advantages I
see aren't even in the marketing literature--and some of them, I bet,
could make even a free software advocate come around to appreciating
the cloud.

Advantages of cloud services

The standard litany of reasons for moving to IaaS or PaaS can be
summarized under a few categories:

Low maintenance

No more machine rooms, no more disk failures (that is, disk failures you know about and have to deal with), no more late-night calls to go in and reboot a critical server.

These simplifications, despite the fears of some Information
Technology professionals, don't mean companies can fire their system
administrators. The cloud still calls for plenty of care and
feeding. Virtual systems go down at least as often as physical ones,
and while the right way to deal with system failures is to automate
recovery, that takes sophisticated administrators. So the system
administrators will stay employed and will adapt. The biggest change
will be a shift from physical system management to diddling with
software; for an amusing perspective on the shift see my short story
Hardware Guy.


Fast ramp-up and elasticity

To start up a new operation, you no longer have to wait for hardware to arrive and then lose yourself in snaking cables for hours. Just ask the cloud center to spin up as many virtual systems as you want.

Innovative programmers can also bypass IT management, developing new
products in the cloud. Developers worry constantly whether their
testing adequately reproduces the real-life environment in which
production systems will run, but if both the test systems and the
final production systems run in the cloud, the test systems can match
the production ones much more closely.

The CIO of O'Reilly Media, citing the goal of directing 60 percent of
IT spending into new projects, has made internal and external cloud
computing into pillars of O'Reilly's IT strategy.

Because existing companies have hardware and systems for buying
hardware in place already, current cloud users tend to come from
high-tech start-ups. But any company that wants to launch a new
project can benefit from the cloud. Peaks and troughs in usage can
also be handled by starting and stopping virtual systems--you
just have to watch how many get started up, because a lack of
oversight can incur run-away server launches and high costs.
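One simple guard against run-away launches is a hard ceiling on the number of instances. The sketch below is only an illustration of that idea: the function name, the load units, and the parameters are assumptions, not any cloud provider's API.

```python
import math


def desired_instances(load: float, per_instance: float, max_instances: int) -> int:
    """Return how many virtual systems to run for the current load.

    load          -- current demand, in any unit (for example requests per second)
    per_instance  -- how much of that demand one virtual system can handle
    max_instances -- hard ceiling that keeps costs from running away
    """
    needed = max(1, math.ceil(load / per_instance))  # always keep one system up
    return min(needed, max_instances)                # never exceed the ceiling


# A moderate load scales up as needed; a sudden spike stops at the ceiling.
print(desired_instances(250, 100, 10))
print(desired_instances(5000, 100, 10))
```

The ceiling trades some responsiveness during genuine peaks for a predictable upper bound on the bill, so whoever sets it still has to watch actual usage.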

Cost savings

In theory, clouds provide economies of scale that undercut anything an individual client could do on their own. How can a private site, chugging away on a few computers, be more efficient than thousands of fungible processors in one room under the eye of a highly trained expert, all strategically located in an area with cheap real estate and electricity?

Currently, the cost factor in the equation is not so cut and dried.
Running multiple servers on a single microprocessor certainly brings
savings, although loads have to be balanced carefully to avoid slowing
down performance unacceptably. But running processors constantly
generates heat, and if enough of them are jammed together the costs of
air conditioning could exceed the costs of the computers. Remote
computing also entails networking costs.

It will not take long, however, for the research applied by cloud
vendors to pay off in immense efficiencies that will make it hard for
organizations to justify buying their own computers.


Elasticity and consolidation make IaaS so attractive that large
companies are trying to build "private clouds" and bring all the
organization's server hardware into one department, where the
hardware is allocated as virtual resources to the rest of the company.
These internal virtualization projects don't incur some of the
disadvantages that this article addresses, so I won't consider them
further.

Advantages of web services

SaaS offers some benefits similar to IaaS and PaaS, but also
significant differences.

Low maintenance

No more installation, no more upgrades, no more incompatibilities with other system components or with older versions of the software on other people's systems. Companies licensing data, instead of just buying it on disks, can access it directly from the vendor's site and be sure of always getting the most recent information.


Fast ramp-up and elasticity

As with IaaS, SaaS frees staff from running every innovation past the IT group. They can recreate their jobs and workflows in the manner they want.


Feedback

To see what's popular and to prioritize future work, companies love to know how many people are using a feature and how long they spend in various product functions. SaaS makes this easy to track because it can log every mouse click.

Enough of the conventional assessment. What hidden advantages lie in clouds and web services?

What particularly should entice free and open source software advocates is web services' prospects for making money. Although free software doesn't have to be offered cost-free (as frequently assumed by those who don't know the field), there's no way to prevent people from downloading and installing it, so most of the money in free software is made through consulting and additional services. Web services allow subscriptions instead, a much more stable income. Two popular content management systems exemplify this benefit: WordPress offers hosting at wordpress.com and Drupal at drupalgardens.com, all while offering their software as open source.

But I find another advantage to web services. They're making
applications better than they ever have been in the sixty-year history
of application development.

Compare your own experiences with stand-alone software to web sites. The quality of the visitor's experience on a successful web site is much better. It's reminiscent of the old cliché about restaurant service in capitalist versus socialist economies.

According to this old story, restaurants in capitalist countries
depend on repeat business from you and your friends, driving the
concern for delivering a positive customer experience from management
down to the lowest level of the wait staff. In a socialist economy,
supposedly, the waiters know they will get paid no matter whether you
like their service or not, so they just don't try. Furthermore,
taking pains to make you happy would be degrading to them as heroes of
a workers' society.

I don't know whether this phenomenon is actually true of restaurants,
but an analogous dynamic holds in software. Web sites know that
visitors will vanish in half a second if the experience is not
immediately gripping, gratifying, and productive. Every hour of every
day, the staff concentrate on the performance and usability of the
site. Along with the business pressure on web services to keep users
on the page, the programmers there can benefit from detailed feedback
about which pages are visited, in which order, and for how long.

In contrast, the programmers of stand-alone software measure
their personal satisfaction by the implementation of complex and
sophisticated calculations under the product's surface. Creating
the user interface is a chore relegated to less knowledgeable staff.

Whatever the reason, I find the interfaces of proprietary as well as
free software to be execrable, and while I don't have statistics to
bolster my claim, I think most readers can cite similar experiences.
Games are the main exception, as well as a few outstanding consumer
applications, but these unfortunately do not seem to set a standard
for the vast hordes of other programmers to follow.

Moving one's aching fingers from stand-alone software to a web
service brings a sudden rush of pleasure, affirming what working with
computers can be. A bit of discipline in the web services world would
be a good cold bath for the vendors and coders.


Drawbacks of clouds and web services

So why are the analysts and customers still wary of cloud computing? They have their reasons, but some dangers are exaggerated.

Managers responsible for sensitive data feel a visceral sense of vulnerability when they entrust that data to some other organization. Web services have indeed had breaches, because they are prisoners of the twin invariants that continue to ensure software flaws: programmers are human, and so are administrators. Another risk comes when data is transmitted to a service such as Amazon.com's S3, a process during which it could be seen or even, in theory, altered.

Still, I expect the administrators of web and cloud services to be better trained and more zealous in guarding against security breaches than the average system administrator at a private site. The extra layer added by IaaS also creates new possibilities. An article called "Security in the Cloud" by Gary Anthes, published in the November 2010 Communications of the ACM, points to research projects by Hewlett-Packard and IBM that would let physical machines monitor the virtual machines running on them for viruses and other breaches of security, a bit like a projectionist can interrupt a movie.

A cloud or web service provider creates some risk just because it
provides a tasty target to intruders, who know they can find thousands
of victims in one place. On the other hand, if you put your data in
the cloud, you aren't as likely to lose it to some drive-by
trouble-seeker picking it up off of a wireless network that your
administrator failed to secure adequately, as famously happened to
T.J. Maxx (and they weren't alone).

And considering that security experts suspect most data breaches to be
internal, putting data in the cloud might make it more secure by
reducing its exposure to employees outside of the few programmers or
administrators with access rights. If the Department of Defense had
more systems in the cloud, perhaps it wouldn't have suffered such a
sinister security breach in 2008 through a flash drive with a virus.

In general, the solution to securing data and transactions is to
encrypt everything. Encrypting the operating systems loaded in IaaS,
for instance, gives the client some assurance that no one can figure
out what it's doing in the cloud, even if another client or even the
vendor itself tries to snoop. If some technological earthquake
undermines the integrity of encryption technologies--such as the
development of a viable quantum computer--we'll have to rethink the
foundations of the information age entirely anyway.

The main thing to remember is that most data breaches are caused by
lapses totally unrelated to how servers are provisioned: they happen
because staff stored unencrypted data on laptops or mobile devices,
because intruders slipped into applications by exploiting buffer
overflows or SQL injection, and so on. (See, for instance, a
U.S. Health & Human Services study saying that "Laptop theft is the
most prevalent cause of the breach of health information affecting
more than 500 people.")

Regulations such as HIPAA can rule out storing some data off-site, and
concerns about violating security regulations come up regularly during
cloud discussions. But these regulations affect only a small amount of
the data and computer operations, and the regulations can be changed
once the computer industry shows that clouds are both valuable and
acceptably secure.

Bandwidth is a concern, particularly in less technologically developed
parts of the world (like much of the United States, come to think of
it), where bandwidth is inadequate. But in many of these areas, people
often don't even possess computers. SaaS is playing a major role
in underdeveloped areas because it leverages the one type of computer
in widespread use (the cell phone) and the one digital network
that's widely available (the cellular grid). So in some ways,
SaaS is even more valuable in underdeveloped areas, just in a
different form from regions with high bandwidth and universal access.

Nevertheless, important risks and disadvantages have been identified
in clouds and web services. IaaS and PaaS are still young enough (and
their target customers sophisticated enough) for the debate to keep up
pretty well with trends; in contrast, SaaS has been crying out quite a
while for remedies to be proposed, such as the best practices
recently released by the Consumer Federation of America. This article
will try to raise the questions to a higher level, to find more
lasting solutions to problems such as the following.

Availability

Every system has down time, but no company wants to be at the mercy of a provider that turns off service, perhaps for 24 hours or more, because it failed to catch a bug in its latest version or to provide adequate battery backup during a power failure.

When Wikileaks was forced off of Amazon.com's cloud service, it sparked outrage whose echo reached as far as a Wall Street Journal blog and highlighted the vulnerability of depending on clouds. Similarly, the terms of service on social networks and other SaaS sites alienate some people who feel they have legitimate content that doesn't pass muster on those sites.

Liability

One of the big debates in the legal arena is how to apportion blame when a breach or failure happens in a cascading service, where one company leases virtual systems in the cloud to provide a higher-level service to other companies.


Reliability

How can you tell whether the calculation that a service ran over your corporate data produced the correct result? This is a lasting problem with proprietary software, which the free software developers argue they've solved, but which most customers of proprietary software have learned to live with and which therefore doesn't turn them against web services.

But upgrades can present a problem. When a new version of stand-alone
software comes out, typical consumers just click "Yes" on the upgrade
screen and live with the consequences. Careful system administrators
test the upgrade first, even though the vendor has tested it, in case
it interacts perniciously with some factor on the local site and
reveals a bug. Web services reduce everyone to the level of a passive
consumer by upgrading their software silently. There's no
recourse for clients left in the lurch.


Control

Leaving the software on the web service's site also removes all end-user choice. Some customers of stand-alone software choose to leave old versions in place because the new version removed a feature the customers found crucial, or perhaps just because they didn't want the features in the new version and found its performance worse. Web services offer one size to fit all.

Because SaaS is a black box, and one that can change behavior without
warning to the visitors, it can provoke concerns among people
sensitive about consistency and reliability. See my article
"Results from Wolfram Alpha: All the Questions We Ever Wanted to Ask About Software as a Service."

Privacy

Web services have been known to mine customer data and track customer behavior for marketing purposes, and have given data to law enforcement authorities. It's much easier to monitor millions of BlackBerry messages traveling through a single server maintained by the provider than the messages bouncing in arbitrary fashion among thousands of Sendmail servers. If a customer keeps the data on its own systems, law enforcement can still subpoena it, but at least the customer knows she's being investigated.

In the United States, furthermore, the legal requirements that investigators must meet to get data are higher for customers' systems than for data stored on a third-party site such as a web service. Recent Congressional hearings (discussed on O'Reilly's Radar site) highlighted the need to update US laws to ensure privacy for cloud users.


These are knotty problems, but one practice could tease them apart:
making the software running clouds or web services open source.

A number of proponents for this viewpoint can be found, such as the Total Information Outsourcing group, as well as a few precedents. Besides the WordPress and Drupal services mentioned earlier, StatusNet runs the microblogging site identi.ca and opens up its code so that other people can run sites that interoperate with it. Source code for Google's App Engine, mentioned earlier as a leading form of PaaS, has been offered for download by Google under a free license. Talend offers data integration and business intelligence as both free software and SaaS.

The Free Software Foundation, a leading free software organization that provides a huge amount of valuable software to Linux and other systems through the GNU project, has created a license called the GNU Affero General Public License that encourages open code for web services. When sites such as StatusNet release code under that license, other people are free to build web services on it but must release all their enhancements and bug fixes to the world as well.

What problems can be ameliorated by freeing the cloud and web service software? Can the companies who produced that software be persuaded to loosen their grip on the source code? And what could a world of free cloud and web services look like? That is where we will turn next.

Next section:
Why web services should be released as free software.

December 15 2010

Defining clouds, web services, and other remote computing

Series

What are the chances for a free software cloud?

  • Resolving the contradictions between web services, clouds, and open source (12/13)
  • Defining clouds, web services, and other remote computing (12/15)
  • Why clouds and web services will continue to take over computing (12/17)
  • Why web services should be released as free software (12/20)
  • Reaching the pinnacle: truly open web services and clouds (12/22)

Additional posts in this 5-part series are available here.

Technology commentators are a bit trapped by the term "cloud," which has been kicked and slapped around enough to become truly shapeless. Time for confession: I stuck the term in this article's title because I thought it useful to attract readers' attention. But what else should I do? To run away from "cloud" and substitute any other term ("web services" is hardly more precise, nor is the phrase "remote computing" I use from time to time) just creates new confusions and ambiguities.

So in this section I'll offer a history of services that have
led up to our cloud-obsessed era, hoping to help readers distinguish
the impacts and trade-offs created by all the trends that lie in the
"cloud."

Computing and storage

The basic notion of cloud computing is simply this: one person uses a
computer owned by another in some formal, contractual manner. The
oldest precedent for cloud computing is therefore timesharing, which
was already popular in the 1960s. With timesharing, programmers could
enter their programs on teletype machines and transmit them over
modems and phone lines to central computer facilities that rented out
CPU time in units of one-hundredth of a second.

Some sites also purchased storage space on racks of large magnetic
tapes. The value of storing data remotely was to recover from flood,
fire, or other catastrophe.

The two major, historic cloud services offered by
Amazon.com--Elastic Compute Cloud (EC2) and Simple Storage Service
(S3)--are the descendants of timesharing and remote backup,
respectively.

EC2 provides complete computer systems to clients, who can request any
number of systems and dismiss them again when they are no longer
needed. Pricing is quite flexible (even including an option for an
online auction) but is essentially a combination of hourly rates and
data transfer charges.

S3 is a storage system that lets clients reserve as much or as little
space as needed. Pricing reflects the amount of data stored and the
amount of data transferred in and out of Amazon's storage. EC2 and S3
complement each other well, because EC2 provides processing but no
persistent storage.
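
To make the pricing model concrete, here is a minimal sketch of the arithmetic described above: hourly charges per system plus data transfer for EC2-style computing, and stored data plus transferred data for S3-style storage. Every rate in it is an invented placeholder chosen for easy arithmetic, not one of Amazon's actual prices, and the names `Rates`, `compute_cost`, and `storage_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical prices, assumed only for illustration."""
    per_instance_hour: float = 0.10   # dollars per system per hour (assumed)
    per_gb_month: float = 0.15        # dollars per GB stored per month (assumed)
    per_gb_transfer: float = 0.10     # dollars per GB moved in or out (assumed)


def compute_cost(instances: int, hours: float, gb_transferred: float,
                 rates: Rates) -> float:
    """EC2-style charge: hourly rate for each system plus data transfer."""
    return (instances * hours * rates.per_instance_hour
            + gb_transferred * rates.per_gb_transfer)


def storage_cost(gb_stored: float, gb_transferred: float, rates: Rates) -> float:
    """S3-style monthly charge: data stored plus data transferred."""
    return (gb_stored * rates.per_gb_month
            + gb_transferred * rates.per_gb_transfer)


if __name__ == "__main__":
    rates = Rates()
    # Four systems for three days, 200 GB kept in storage, 20 GB moved each way.
    total = compute_cost(4, 72, 20, rates) + storage_cost(200, 20, rates)
    print(f"Estimated bill: ${total:.2f}")
```

The point of the sketch is only that the bill scales with what the client actually requests and dismisses, which is what distinguishes this model from buying a machine outright.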

Timesharing and EC2-style services work a bit like renting a community
garden. Just as community gardens let apartment dwellers without
personal back yards grow fruits and vegetables, timesharing in the
1960s brought programming within reach of people who couldn't
afford a few hundred thousand dollars to buy a computer. All the
services discussed in this section provide hardware to people who run
their own operations, and therefore are often called
Infrastructure as a Service or IaaS.

We can also trace back cloud computing in another direction as the
commercially viable expression of grid computing, an idea
developed through the first decade of the 2000s but whose
implementations stayed among researchers. The term "grid"
evokes regional systems for delivering electricity, which hide the
origin of electricity so that I don't have to strike a deal with
a particular coal-burning plant, but can simply plug in my computer
and type away. Similarly, grid computing combined computing power from
far-flung systems to carry out large tasks such as weather modeling.
These efforts were an extension of earlier cluster technology
(computers plugged into local area networks), and effectively
scattered the cluster geographically. Such efforts were also inspired
by the well-known
SETI@home program,
an early example of Internet crowdsourcing that millions of people have
downloaded to help process signals collected from telescopes.

Another form of infrastructure became part of modern life in the 1990s
when it seemed like you needed your own Web site to be anybody.
Internet providers greatly expanded their services, which used to
involve bare connectivity and an email account. Now they also offer
individualized Web sites and related services. Today you can find a
wealth of different hosting services at different costs depending on
whether you want a simple Web presence, a database, a full-featured
content management system, and so forth.

These hosting services keep costs low by packing multiple users onto
each computer. A tiny site serving up occasional files, such as my own
praxagora.com, needs nothing that
approaches the power of a whole computer system. Thanks to virtual
hosting, I can use a sliver of a web server that dozens of other sites
share and enjoy my web site for very little cost. But praxagora.com
still looks and behaves like an independent, stand-alone web server.
We'll see more such legerdemain as we explore virtualization and
clouds further.
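
One common way to share a server among many sites is to look at the host name each request carries and serve files from a different directory for each name. The sketch below shows that idea only; the directory layout, the `SITES` table, and the `resolve` function are assumptions made for illustration, not a description of how any particular hosting provider is set up.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping from host names to per-site directories on one shared server.
SITES = {
    "praxagora.com": "/srv/sites/praxagora",
    "example.org": "/srv/sites/example",
}


def resolve(host_header: str, url_path: str) -> Optional[str]:
    """Return the file path for a request, or None if the host is not served here."""
    host = host_header.split(":")[0].lower()   # drop any port, ignore case
    if host.startswith("www."):
        host = host[4:]
    root = SITES.get(host)
    if root is None:
        return None
    # An empty path means the site's front page. A real server would also
    # have to reject ".." segments; this sketch does not.
    relative = url_path.lstrip("/") or "index.html"
    return str(PurePosixPath(root) / relative)


if __name__ == "__main__":
    print(resolve("praxagora.com", "/"))
    print(resolve("www.example.org:8080", "/docs/a.html"))
```

From the visitor's side each name behaves like its own server, even though one process and one machine answer for all of them.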

The glimmer of the cloud in the World Wide Web

The next great breakthrough in remote computing was the concept of an
Application Service Provider. This article started with one
contemporary example, Gmail. Computing services such as payroll
processing had been outsourced for some time, but in the 1990s, the
Web made it easy for a business to reach right into another
organization's day-to-day practice, running programs on central
computers and offering interfaces to clients over the Internet. People
used to filling out forms and proceeding from one screen to the next
on a locally installed program could do the same on a browser with
barely any change in behavior.

Using an Application Service Provider is a little like buying a house
in the suburbs with a yard and garden, but hiring a service to
maintain them. Just as the home-owner using a service doesn't
have to get his hands dirty digging holes for plants, worry about the
composition of the lime, or fix a broken lawnmower, companies who
contract with Application Service Providers don't have to
wrestle with libraries and DLL hell, rush to upgrade software when
there's a security breach, or maintain a license server. All
these logistics are on the site run by the service, hidden away from
the user.

Early examples of Application Service Providers for everyday personal
use include blogging sites such as blogger.com and wordpress.com.
These sites offer web interfaces for everything from customizing the
look of your pages to putting up new content (although advanced users
have access to back doors for more complex configuration).

Interestingly, many companies recognized the potential of web browsers
to deliver services in the early 2000s. But browsers' and
JavaScript's capabilities were too limited for rich interaction.
These companies had to try to coax users into downloading plugins that
provided special functionality. The only plugin that ever caught on
was Flash (which, of course, enables many other applications). True
web services had to wait for the computer field to evolve along
several dimensions.

As broadband penetrated to more and more areas, web services became a
viable business model for delivering software to individual users.
First of all, broadband connections are "always on," in
contrast to dial-up. Second, the XMLHttpRequest extension allows browsers
to fetch and update individual snippets of a web page, a practice that
programmers popularized under the acronym AJAX.

Together, these innovations allow web applications to provide
interfaces almost as fast and flexible as native applications running
on your computer, and a new version of HTML takes the process even
farther. The movement to the web is called Software as a Service
or SaaS.

The pinned web site feature introduced in Internet Explorer 9
encourages users to create menu items or icons representing web sites,
making them as easy to launch as common applications on their
computer. This feature is a sign of the shift of applications from
the desktop to the Web.

Every trend has its logical conclusion, even if it's farther
than people are willing to go in reality. The logical conclusion of
SaaS is a tiny computer with no local storage and no software except
the minimal operating system and networking software to access servers
that host the software to which users have access.

Such thin clients were already prominent in the work world
before Web services became popular; they connected terminals made by
companies such as Wyse with local servers over cables. (Naturally,
Wyse has recently latched on to the cloud hype.)
The Web equivalent of thin clients is mobile devices such as iPhones
with data access, or Google Chrome OS,
which Google is hoping will wean people away from popular software
packages in favor of Web services like Google Docs. Google is planning
to release a netbook running Chrome OS in about six months. Ray
Ozzie, chief software architect of Microsoft, also speaks of an
upcoming reality of continuous cloud services delivered to thin
appliances.
The public hasn't followed the Web services revolution this far,
though; most are still lugging laptops.

Data, data everywhere

Most of the world's data is now in digital form, probably in some
relational database such as Oracle, IBM's DB2, or MySQL. If the
storage of the data is anything more formal than a spreadsheet on some
clerical worker's PC (and a shameful amount of critical data is still
on those PCs), it's probably already in a kind of cloud.

Database administrators know better than to rely on a single disk to
preserve those millions upon millions of bytes, because tripping over
an electric cable can lead to a disk crash and critical information
loss. So they not only back up their data on tape or some other
medium, but duplicate it on a series of servers in a strategy called
replication. They often transmit data second by second over
hundreds of miles of wire so that flood or fire can't lead to
permanent loss.

Replication strategies can get extremely complex (for instance, code
that inserts the "current time" can insert different
values as the database programs on various servers execute it), and
they are supplemented by complex caching strategies. Caches are
necessary because public-facing systems should have the most commonly
requested data--such as current pricing information for company
products--loaded right into memory. An extra round-trip over the
Internet for each item of data can leave users twiddling their thumbs in
annoyance. Loading or "priming" these caches can take
hours, because primary memories on computers are so large.
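
The "current time" problem mentioned above can be shown with a toy model. In the sketch below, each server has its own clock (a simple counter standing in for real time). If every server re-executes the statement, each one stamps the row with its own time and the copies diverge; if the primary evaluates the time once and ships the resulting value, the copies match. The `Server` class and both functions are hypothetical and are not modeled on any particular database product.

```python
import itertools


class Server:
    """A toy database server with its own clock (a counter standing in for time)."""

    def __init__(self, clock_start: int):
        self._clock = itertools.count(clock_start)
        self.rows = []

    def now(self) -> int:
        return next(self._clock)

    def insert_with_local_time(self, item: str) -> None:
        # This server evaluates "current time" itself when it runs the statement.
        self.rows.append((item, self.now()))

    def insert_with_value(self, item: str, timestamp: int) -> None:
        # The timestamp was fixed elsewhere and is stored exactly as given.
        self.rows.append((item, timestamp))


def replicate_statement(primary: Server, replica: Server, item: str) -> None:
    """Ship the statement: every server re-executes it with its own clock."""
    primary.insert_with_local_time(item)
    replica.insert_with_local_time(item)


def replicate_value(primary: Server, replica: Server, item: str) -> None:
    """Evaluate the time once on the primary, then ship the finished row."""
    timestamp = primary.now()
    primary.insert_with_value(item, timestamp)
    replica.insert_with_value(item, timestamp)


if __name__ == "__main__":
    a, b = Server(100), Server(105)
    replicate_statement(a, b, "order")
    print("statement shipping:", a.rows, b.rows)

    c, d = Server(100), Server(105)
    replicate_value(c, d, "order")
    print("value shipping:   ", c.rows, d.rows)
```

Shipping values avoids the divergence at the cost of sending more data, which is one small example of the trade-offs that make replication complex.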

The use of backups and replication can be considered a kind of private
cloud, and if a commercial service becomes competitive in reliability
or cost, we can expect businesses to relax their grip and entrust
their data to such a service.

We've seen how Amazon.com's S3 allowed people to store
data on someone else's servers. But as a primary storage area,
S3 isn't cost-effective. It's probably most valuable when
used in tandem with an IaaS service such as EC2: you feed your data
from the data cloud service into the compute cloud service.

Some people also use S3, or one of many other data storage services,
as a backup to their local systems. Although it may be hard to get
used to trusting some commercial service over a hard drive you can
grasp in your hand, the service has some advantages. Its operators are
actually not as likely as you are to drop the hard drive on the floor and break
it, or have it go up in smoke when a malfunctioning electrical system
starts a fire.

But data in the cloud has a much more powerful potential. Instead of
Software as a Service, a company can offer its data online for others
to use.

Probably the first company to try this radical exposure of data was
Amazon.com, which can also be credited with starting the cloud services
mentioned earlier. Amazon.com released a service that let programmers
retrieve data about its products, so that instead of having to visit
dozens of web pages manually and view the data embedded in the text,
someone could retrieve statistics within seconds.

Programmers loved this. Data is empowering, even if it's just
sales from one vendor, and developers raced to use the application
programming interface (API) to create all kinds of intriguing
applications using data from Amazon. Effectively, they leave it up to
Amazon to collect, verify, maintain, search through, and correctly
serve up data on which their applications depend. Seen as an aspect of
trust, web APIs are an amazing shift in the computer
industry.

Amazon's API was a hack of the Web, which had been designed to
exchange pages of information. Like many other Internet services, the
Web's HTTP protocol offers a few basic commands: GET, PUT, POST,
and DELETE. The API used the same HTTP protocol to get and put
individual items of data. And because it used HTTP, it could easily be
implemented in any language. Soon there were libraries of programming
code in all popular languages to access services such as
Amazon.com's data.
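
To illustrate how the four HTTP commands can act on individual items of data rather than whole pages, here is a small sketch that applies each command to an in-memory store. It runs entirely in one process, with no network, and it is not Amazon's API: the `ResourceStore` class, the `/products` paths, and the sample titles are all invented for illustration. The numeric status codes are the standard HTTP ones.

```python
import itertools
from typing import Dict, Optional, Tuple


class ResourceStore:
    """An in-memory stand-in for a service that exposes data items by path."""

    def __init__(self) -> None:
        self.items: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def handle(self, method: str, path: str,
               body: Optional[str] = None) -> Tuple[int, str]:
        """Apply one HTTP-style request; return (status code, response body)."""
        if method == "GET":
            if path in self.items:
                return 200, self.items[path]
            return 404, ""
        if method == "PUT":
            # PUT stores the body at the exact path the client names.
            created = path not in self.items
            self.items[path] = body or ""
            return (201 if created else 200), self.items[path]
        if method == "POST":
            # POST lets the server choose the path of a new item in a collection.
            base = path.rstrip("/")
            new_path = f"{base}/{next(self._ids)}"
            while new_path in self.items:
                new_path = f"{base}/{next(self._ids)}"
            self.items[new_path] = body or ""
            return 201, new_path
        if method == "DELETE":
            if path not in self.items:
                return 404, ""
            del self.items[path]
            return 204, ""
        return 405, ""   # any other command is not allowed


if __name__ == "__main__":
    store = ResourceStore()
    status, location = store.handle("POST", "/products", "Learning Python")
    print(status, location)
    print(store.handle("GET", location))
    print(store.handle("DELETE", location))
```

Because the whole exchange is just a command, a path, and a body, a client for such a service can be written in any language that can send an HTTP request.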

Another early adopter of Web APIs was Google. Because its
Google Maps service exposed
data in a program-friendly form, programmers started to build useful
services on top of it. One famous example combined Google Maps with a
service that published information on properties available for rent;
users could quickly pull up a map showing where to rent a room in
their chosen location. Such combinations of services were called
mash-ups, with interesting cultural parallels to the
practices of musicians and artists in the digital age who combine
other people's work from many sources to create new works.
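
In code, a mash-up of the kind just described amounts to joining the output of two services on a shared key, here a street address. The sketch below uses two small hand-written tables in place of real service responses; the listings, the coordinates, and the `mash_up` function are all invented for illustration and do not reflect the actual Google Maps interface.

```python
from typing import Dict, List, Tuple

# Stand-in for a rental-listing service's response (invented data).
LISTINGS: List[dict] = [
    {"address": "12 Elm St", "rent": 900},
    {"address": "48 Oak Ave", "rent": 1200},
    {"address": "7 Pine Rd", "rent": 750},
]

# Stand-in for a mapping service's geocoder: address -> (latitude, longitude).
GEOCODER: Dict[str, Tuple[float, float]] = {
    "12 Elm St": (42.37, -71.11),
    "48 Oak Ave": (42.36, -71.06),
}


def mash_up(listings: List[dict], geocoder: Dict[str, Tuple[float, float]],
            max_rent: int) -> List[dict]:
    """Join the two sources: affordable listings that can be placed on a map."""
    markers = []
    for listing in listings:
        coords = geocoder.get(listing["address"])
        if coords is not None and listing["rent"] <= max_rent:
            markers.append({
                "label": f'{listing["address"]} (${listing["rent"]})',
                "lat": coords[0],
                "lon": coords[1],
            })
    return markers


if __name__ == "__main__":
    for marker in mash_up(LISTINGS, GEOCODER, max_rent=1000):
        print(marker)
```

Neither source was built with the other in mind; the combination is possible only because both expose their data in a form a program can read.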

The principles of using the Web for such programs evolved over several
years in the late 1990s, but the most popular technique was codified
in a 2000 PhD thesis by HTTP designer Roy Thomas Fielding, who
invented the now-famous term REST (standing for Representational State
Transfer) to cover the conglomeration of practices for defining URLs
and exchanging messages. Different services adhere to these principles
to a greater or lesser extent. But any online service that wants to
garner serious, sustained use now offers an API.

A new paradigm for programmers

SaaS has proven popular for programmers. In 1999, a company named VA
Linux created a site called
SourceForge
with the classic SaaS goal of centralizing the administration of
computer systems and taking that burden off programmers' hands. A
programmer could upload his program there and, as is typical for free
software and open source, accept code contributions from anyone else
who chose to download the program.

VA Linux at that time made its money selling computers that ran the
GNU/Linux operating system. It set up SourceForge as a donation to the
free software community, to facilitate the creation of more free
software and therefore foster greater use of Linux. Eventually the
hardware business dried up, so SourceForge became the center of the
company's business: corporate history anticipated cloud
computing history.

SourceForge became immensely popular, quickly coming to host hundreds
of thousands of projects, some quite heavily used. It has also
inspired numerous other hosting sites for programmers, such as
GitHub. But these sites don't
completely take the administrative hassle out of being a programmer.
You still need to run development software--such as a compiler
and debugger--on your own computer.

Google leapt up to the next level of programmer support with
Google App Engine,
a kind of programmer equivalent to Gmail or
Google Docs.
App Engine is a cocoon within which you can plant a software larva and
carry it through to maturity. Like SaaS, the programmer does the
coding, compilation, and debugging all on the App Engine site. Also
like SaaS, the completed program runs on the site and offers a web
interface to the public. But in terms of power and flexibility, App
Engine is like IaaS because the programmer can use it to offer any
desired service. This new kind of development paradigm is called
Platform as a Service or PaaS.

Microsoft offers both IaaS and PaaS in its
Windows Azure
project.

Hopefully you now see how various types of remote computing are alike,
as well as different.

December 13 2010

Resolving the contradictions between web services, clouds, and open source


Predicting trends in computer technology is an easy way to get into trouble, but two developments have been hyped so much over the past decade that there's little risk in jumping on their bandwagons: free software and cloud computing. What's odd is that both are so beloved of crystal-gazers, because on the surface they seem incompatible.

The first trend promises freedom, the second convenience. Both freedom and convenience inspire people to adopt new technology, so I believe the two trends will eventually coexist and happily lend power to each other. But first, the proponents of each trend will have to get jazzed up about why the other trend is so compelling.

Freedom is promised by the free and open source software movement. Its
foundation is the principle of radical sharing: the knowledge one
produces should be offered to others. Starting with a few
break-through technologies that surprised outsiders by coming to
dominate their industries--the GNU C compiler, the Linux kernel,
the Apache web server--free software has insinuated itself into
every computing niche.

The trend toward remote computing--web services and the vaguely
defined cloud computing--promises another appealing kind of
freedom: freedom from having to buy server hardware and set up
operations, freedom from installations and patches and upgrades,
freedom in general from administrative tasks. Of course, these
advantages are merely convenience, not the kind of freedom championed
by the free software movement.

Together with the mobile revolution (not just programs on cell phones,
but all kinds of sensors, cameras, robots, and specialized devices for
recording and transmitting information) free software and remote
computing are creating new environments for us to understand
information, ourselves, and each other.

The source of the tension

Remote computing, especially the layer most of us encounter as web
services, is offered on a take-it-or-leave-it basis. Don't like
Facebook's latest change to its privacy settings? (Or even where
it locates its search box?) Live with it or break your Facebook habit
cold turkey.

Free software, as we'll see, was developed in resistance to such
autocratic software practices. And free software developers were among
the first to alert the public about the limitations of clouds and web
services. These developers--whose ideals are regularly challenged
by legal, social, and technological change--fear that remote
computing undermines the premises of free software. To understand the
tension, let's contrast traditional mail delivery with a popular
online service such as
Gmail, a textbook example of a web
service familiar to many readers.

For years, mail was transmitted by free software. The most popular
mail server was Sendmail, which could stand with the examples I listed
at the beginning of this article as one of the earliest examples of free
software in widespread use. Sendmail's source code has been
endlessly examined, all too often for its many security flaws.

Lots of organizations still use free software mail servers, even
though in the commercial world, Microsoft's closed-source
Exchange is the standard. But organizations are flocking now to Gmail,
which many people find the most appealing interface for email.

Not only is Gmail closed, but the service would remain closed even if
Google released all the source code. This is because nobody who uses
Gmail software actually loads it on their systems (except some
JavaScript that handles user interaction). We all simply fire up a
browser to send a message to code running on Google servers. And if
Google hypothetically released the source code and someone set up a
competing Gmail, that would be closed for the same reason. A web
service runs on a privately owned computer and therefore is always
closed.

So the cloud--however you define it--seems to render the notion of
software freedom meaningless. But things seem to get even worse. The
cloud takes the client/server paradigm to its limit. There is forever
an unbreachable control gap between those who provide the service and
those who sign up for it.

And this is apparently a step backward in computing history. Closed,
proprietary software erected a gateway between the all-powerful
software developers and the consumers of the software. Free software
broke the gate down by giving the consumers complete access to source
code and complete freedom to do what they wanted. Amateurs around the
world have grabbed the opportunity to learn programming techniques
from free software and to make it fit their whims and needs. Now, once
again, software hidden behind a server commands the user to relinquish
control--and as the popularity of Gmail and other services shows,
users are all too ready to do it.

Cloud computing is leading to the bifurcation of computing into a
small number of developers with access to the full power and
flexibility that computers can offer, contrasted with a world full of
small devices offering no say in what the vendors choose for us to
run, a situation predicted in Jonathan Zittrain's book
The Future of the Internet.

Tim Berners-Lee, inventor of the World Wide Web, as part of a major
Scientific American article,
criticized social networks like Facebook as silos that commit the
sin of hoarding data entered by visitors instead of exposing it openly
on the Internet. Ho, Sir Berners-Lee, that's exactly why many visitors
use social networks: to share their personal thoughts and activities
with a limited set of friends or special-interest groups. Social
networks and their virtual walls therefore contribute to the potential
of the Internet as a place to form communities.

But Berners-Lee was airing his complaint as part of a larger point
about the value of providing data for new and unanticipated
applications, and his warning does raise the question of scale. If
Facebook-type networks became the default and people "lived" on them
all the time instead of the wider Web, opportunities for
interconnection and learning would diminish.

Complementary trends

But one would be jumping to conclusions to assume that cloud computing
is inimical to free software. Google is one of the world's great
consumers of free software, and a supporter as well. Google runs its
servers on Linux, and has placed it at the core of its fast-growing
Android mobile phone system. Furthermore, Google submits enhancements
to free software projects, releases many of its peripheral
technologies as open source, and runs projects such as
Summer of Code to develop
new free software programs and free software programmers in tandem.

This is the trend throughout computing. Large organizations with banks
of servers tend to run free software on them. The tools with which
they program and administer the servers are also free.

A "free software cloud" may seem to be an oxymoron, like
"non-combat troops." But I believe that free software and
remote computing were made for each other; their future lies together
and the sooner they converge, the faster they will evolve and gain
adoption. In fact, I believe a free software cloud--much more
than the "open cloud" that
many organizations are working on--lies
in our future. This series will explore the traits of each trend and
show why they are meant to join hands.
