
February 08 2013

Distributed resilience with functional programming

Functional programming has a long and distinguished heritage of great work that, for years, was used by only a small group of programmers. In a world dominated by individual computers running single processors, the extra cost of thinking functionally limited its appeal. Lately, as more projects require distributed systems that must always be available, functional programming approaches suddenly look a lot more appealing.

Steve Vinoski, an architect at Basho Technologies, has been working with distributed systems and complex projects for a long time, first as a tentative explorer and then leaping across to Erlang when it seemed right. Seventeen years as a columnist on C, C++, and functional languages have given him a unique viewpoint on how developers and companies are deciding whether and how to take the plunge.

Highlights from our recent interview include:

  • From CORBA/C++ to Erlang — “Every time I looked at it, it seemed to have an answer.” [Discussed at the 3:14 mark]
  • Everything old is new again — “Seeing people accidentally or by having to work through the problems, stumbling upon these old research papers and old ideas.” [7:20]
  • Erlang is not hugely fast — “It’s more for control, not for data streaming.” [16:58]
  • Webmachine — “[It's what you would get] If you took HTTP and made a flowchart of it … and then implement that flowchart.” [23:50]
  • Is Erlang syntax a barrier? [28:39]

Even if functional programming isn’t something you want to do now, keep an eye on it: there’s a lot more coming. There are many options besides Erlang, too!

You can view the entire conversation in the following video:


December 14 2012

LISA mixes the ancient and modern: report from USENIX system administration conference

I came to this year's LISA, the classic USENIX conference, to find out who was using such advanced techniques as cloud computing, continuous integration, non-relational databases, and IPv6. I found lots of evidence of those technologies in action, but also had the bracing experience of getting stuck in a talk with dozens of Solaris fans.

Such is the confluence of old and new at LISA. I also heard of the continued relevance of magnetic tape (its storage costs are orders of magnitude below those of disks) and of new developments in NFS. Think of NFS as a protocol, not a filesystem: it can now connect many different filesystems, including the favorites of modern distributed system users.

LISA, and the USENIX organization that valiantly unveils it each year, are communities at least as resilient as the systems their adherents spend their lives keeping humming. Familiar speakers return each year. Members crowd a conference room in the evening to pepper the staff with questions about organizational issues. Attendees exchange their t-shirts for tuxes to attend a three-hour reception aboard a boat on the San Diego harbor, which this time was experiencing unseasonably brisk weather. (Full disclosure: I skipped the reception and wrote this article instead.) Let no one claim that computer administrators are anti-social.

Again in the spirit of full disclosure, let me admit that I perform several key operations on a Solaris system. When it goes away (which someday it will), I’ll have to alter some workflows.

The continued resilience of LISA

Conferences, like books, have a hard go of it in the age of instant online information. I wasn’t around in the days when people would attend conferences to exchange magnetic tapes with their free software, but I remember the days when companies would plan their releases to occur on the first day of a conference and would make major announcements there. The tradition of using conferences to propel technical innovation is not dead; for instance, OpenStack was announced at an O’Reilly Open Source convention.

But as pointed out by Thomas Limoncelli, an O'Reilly author (Time Management for System Administrators) and a very popular LISA speaker, the Internet has altered the equation for product announcements in two profound ways. First, companies and open source projects can achieve visibility in other ways, without leveraging conferences. Second, and more subtly, the philosophy of "release early, release often" launches new features several times a year and reduces the impact of major versions. Conferences therefore need a different justification.

Limoncelli says that LISA has survived by becoming known as the place to get training you can find nowhere else. "You can learn about a tool from the person who created the tool," he says. Indeed, at the BOFs it was impressive to hear the creator of a major open source tool reveal his plans for a major overhaul that would permit plugin modules. It was sobering, though, to hear him complain about a lack of funds to do the job, and discuss with the audience some options for getting financial support.

LISA is not only a conference for the recognized stars of computing, but also a place to show off students who can create a complete user administration interface in their spare time, or design a generalized extension of common Unix tools (grep, diff, and so forth) that works on structured blocks of text instead of individual lines.

Another long-time attendee told me that companies don't expect anyone here to whip out a checkbook in the exhibition hall, but they still come. They have a valuable chance at LISA to talk to people who don't have direct purchasing authority but possess the technical expertise to explain to their bosses the importance of new products. LISA is also a place where people can delve as deeply as they please into technical discussions of products.

I noticed good attendance at vendor-sponsored Birds-of-a-Feather sessions, even those lacking beer. For instance, two Ceph staff signed up for a BOF at 10 in the evening, and were surprised to see over 30 attendees. It was, to my mind, a perfect BOF. The audience talked more than the speakers, and the speakers asked questions as well as delivering answers.

But many BOFs didn’t fit the casual format I used to know. Often, the leader turned up with a full set of slides and took up a full hour going through a list of new features. There were still audience comments, but no more than at a conference session.

Memorable keynotes

One undeniable highlight of LISA was the keynote by Internet pioneer Vint Cerf. After years in Washington, DC, Cerf took visible pleasure in geeking out with people who could understand the technical implications of the movements he likes to track. His talk ranged from the depth of his wine cellar (which he is gradually outfitting with sensors for quality and security) to interplanetary travel.

The early part of his talk danced over general topics that I think were already adequately understood by his audience, such as the value of DNSSEC. But he often raised useful issues for further consideration, such as who will manage the billions of devices that will be attached to the Internet over the next few years. It can be useful to delegate read access, and even write access (to change device state), to a third party when the device owner is unavailable. In trying to imagine a model for sets of devices, Cerf suggested the familiar Internet concept of an autonomous system, which obviously has scaled well and has allowed us to distinguish routers running different protocols.

The smart grid (for electricity) is another concern of Cerf's. While he acknowledged known issues of security and privacy, he suggested that the biggest problem will be the classic problem of coordinating distributed systems. In an environment where individual homes come onto and go off the grid, adding energy to it as well as drawing energy from it, it will be hard to predict what people need and to produce just the right amount at any time. One strategy involves microgrids: letting neighborhoods manage their own energy needs to avoid letting failures cascade through a large geographic area.

Cerf did not omit to warn us of the current stumbling efforts in the UN to institute more governance for the Internet. He acknowledged that abuse of the Internet is a problem, but said the ITU needs an “excuse to continue” as radio, TV, etc. migrate to the Internet and the ITU’s standards see decreasing relevance.

Cerf also touted the Digital Vellum project for the preservation of data and software. He suggested that we need a legal framework that would require software developers to provide enough information for people to continue getting access to their own documents as old formats and software are replaced. “If we don’t do this,” he warned, “our 22nd-century descendants won’t know much about us.”

Talking about OpenFlow and Software-Defined Networking, he found its most exciting opportunity in letting us use content, in addition to or instead of addresses, to direct network traffic.

Another fine keynote was delivered by Matt Blaze on a project he and colleagues conducted to assess the security of the P25 mobile systems used everywhere by security forces, including local police and fire departments, soldiers in the field, FBI and CIA staff conducting surveillance, and executive bodyguards. Ironically, there are so many problems with these communication systems that the talk was disappointing.

I should in no way diminish the intelligence and care invested by these researchers from the University of Pennsylvania. It's just that the history of P25 makes security lapses seem inevitable. Because it was old, and was designed to accommodate devices that were even older, it failed to implement basic technologies such as asymmetric encryption that we now take for granted. Furthermore, most of the users of these devices are more concerned with getting messages to their intended destinations (so that personnel can respond to an emergency) than with preventing potential enemies from gaining access. Putting all this together, instead of saying "What impressive research," we tend to say, "What else would you expect?"

Random insights

Attendees certainly had their choice of virtualization and cloud solutions at the conference. A very basic introduction to OpenStack was offered, along with another by developers of CloudStack. Although the latter is older and more settled, it is losing the battle for mindshare. One developer explained that CloudStack has a smaller scope than OpenStack, because CloudStack is focused on high-computing environments. However, he claimed, CloudStack works on really huge deployments where he hasn't seen other successful solutions. Yet another open source virtualization platform presented was Google's Ganeti.

I also attended talks and had chats with developers working on the latest generation of data stores: massive distributed file systems like Hadoop's HDFS, and high-performance tools such as HBase and Impala for accessing the data stored there. There seems to be an accordion effect in data stores: developers start with simple flat or key-value structures. Then, over time and depending on their particular applications, they find the need for more hierarchy or delimited data, and either make their data stores more heavyweight or jerry-rig the structure through conventions such as defining fields for certain purposes. Finally we're back at something mimicking the features of a relational database, and someone rebels and starts another bare-bones project.
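
To make that accordion concrete, here is a purely illustrative sketch; the key layout and accessor function below are hypothetical, not drawn from any of the systems mentioned above.

```python
# A hypothetical sketch of the "accordion effect": a flat key-value store
# accretes structure through naming conventions until it starts to look
# relational again.

store = {}

# Stage 1: bare key-value pairs.
store["user:42"] = "Alice"

# Stage 2: conventions creep in; delimited keys and packed fields stand in
# for real hierarchy.
store["user:42:email"] = "alice@example.com"
store["user:42:orders"] = "1001,1007,1042"  # a "column" encoded by convention

# Stage 3: the conventions harden into a schema-like accessor layer, and we
# are most of the way back to rows and columns.
def get_user(store, user_id):
    prefix = f"user:{user_id}"
    return {
        "name": store.get(prefix),
        "email": store.get(f"{prefix}:email"),
        "orders": [int(o) for o in store.get(f"{prefix}:orders", "").split(",") if o],
    }

print(get_user(store, 42))
# {'name': 'Alice', 'email': 'alice@example.com', 'orders': [1001, 1007, 1042]}
```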

One such developer told me he hoped his project never turns into a behemoth like CORBA or (lamentably) what the WS-* specifications seem to have wrought.

CORBA is universally recognized as dead (perhaps stillborn, because I never heard of major systems deployed in production). In fact, I never knew of an implementation that kept up with the constant new layers of complexity thrown on by the standards committee.

In contrast, the WS-* specifications teeter on the edge of acceptability, as a number of organizations swear by them.

I pointed out to my colleague that most modern cloud or PC systems are unlikely to suffer from the weight of CORBA or WS-*, because the latter two systems were created for environments without trust. They were meant to tie together organizations with conflicting goals, and were designed by consortia of large vendors jockeying for market share. For both of these reasons, they have to negotiate all sorts of parameters and add many assurances to every communication.

Recently we've seen an increase of interest in functional programming. It occurred to me this week that many aspects of functional programming go nicely with virtualization and the cloud. When you write code with no side effects and no global state, you can recover more easily when instances of your servers disappear (a tiny sketch of the idea follows below). It's fascinating to see how technologies coming from many different places push each other forward, and sometimes hold each other back.
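
Here is that minimal sketch, assuming nothing beyond plain Python; it compares a handler that hides state in a long-lived object with a pure function that takes everything it needs as input, and it stands in for no particular framework.

```python
# A minimal, hypothetical sketch of why side-effect-free code tolerates
# vanishing server instances (illustrative only).

# Stateful handler: the running total lives inside this object, so if the
# instance hosting it disappears, the total disappears with it.
class StatefulCounter:
    def __init__(self):
        self.total = 0

    def handle(self, amount):
        self.total += amount
        return self.total

# Stateless handler: a pure function. Everything it needs arrives as input,
# so any fresh instance can take over from the last durably stored state.
def handle(state, amount):
    return state + amount

state = 0                      # imagine this loaded from durable storage
for amount in [5, 10, 20]:
    state = handle(state, amount)
print(state)                   # 35 -- resumable from wherever `state` was saved
```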

August 09 2012

Applying markup to complexity

When XML exploded onto the scene, it ignited visions of magical communications, simplified document storage, and a whole new wave of application capabilities. Reality has proved calmer, with competition from JSON and other formats tackling a wide variety of problems, while the biggest of the big data problems have such volume that adding markup seems likely to create new problems.

However, at the in-progress Balisage conference, it’s clear that markup remains really good at solving a middle category of problems, where its richer structures can shine without creating headaches of volume or complication. In the past, Balisage often focused on hard problems most people didn’t yet have, but this year’s program tackles challenges that more developers are encountering as their projects grow in complexity.

XML and JSON

JSON gave programmers much of what they wanted: a simple format for shuttling (and sometimes storing) loosely structured data. Its simpler toolset, freed of a heritage of document formats and schemas, let programmers think less about information formats and more about the content of what they were sending.

Developers using XML, however, have found themselves cut off from that data flow, spending a lot of time creating ad hoc toolsets for consuming and creating JSON in otherwise XML-centric toolchains. That experience is leading toward experiments with more formal JSON integration in XQuery and XSLT — and raising some difficult questions about XML itself.

XML and JSON look at data through different lenses. XML is a tree structure of elements, attributes, and content, while JSON is arrays, objects, and values. Element order matters by default in XML, while JSON is far less ordered and contains many more anonymous structures. A paper by Mary Holstege focused primarily on the possibilities for type introspection in XQuery, but her talk also ventured into how that might help address the challenges presented by the different types in JSON.
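
As a small illustration of that mismatch, assuming nothing beyond the Python standard library, the same record takes on noticeably different shapes in the two models.

```python
# XML is an ordered tree of named elements and attributes, while JSON is
# nested arrays and objects of name/value pairs. The record here is invented
# for the example.
import json
import xml.etree.ElementTree as ET

xml_doc = ET.fromstring(
    '<book id="42"><title>Example</title><author>A. Writer</author></book>'
)
# Child order and the attribute/element distinction are part of the XML model.
print([child.tag for child in xml_doc])   # ['title', 'author']
print(xml_doc.get("id"))                  # '42' -- an attribute, not a child

json_doc = json.loads(
    '{"book": {"id": 42, "title": "Example", "author": "A. Writer"}}'
)
# JSON object members carry no inherent order and no attribute/content
# distinction; the information survives, but the document shape does not.
print(sorted(json_doc["book"]))           # ['author', 'id', 'title']
```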

Eric van der Vlist, while recognizing that XSLT 3.0 is taking some steps to integrate JSON, reported on broader visions of an XML/JSON "chimera", though he hoped to come up with something more elegant than the legendary but impractical creature. After his talk, he also posted some broader reflections on a data model better able to accommodate both XML and JSON expectations.

Jonathan Robie reflected on the growing importance of JSON (and his own initial reluctance to take it seriously) as semi-structured data takes over the many tasks it can solve easily. He described XML as shining at handling complex documents and the XML toolset as excellent support for a “hub format,” but also thought that the XML toolchain needs something like JSON. He compared the proposed XSLT 3.0 features for handling maps with JSONiq, and agreed with Holstege and van der Vlist that different expectations about the importance of order created the largest mismatches.

Hans-Jurgen Rennau had probably the most optimistic take, describing a Unified Document Language – not a markup syntax, but a model that could accommodate varied approaches to representing data. His proposal did include concrete syntax for representing this model in XML documents, as well as a description of alternate markup styles that help represent the model beyond XML.

I don't expect that any of these proposals, even when and if they are widely implemented, will immediately grab the attention of people happily using JSON. In the short term they will serve primarily as bridges for data, helping XML and JSON systems co-exist. In the longer term, however, they may serve as bridges between the cultures of the two formats. Both approaches have their limitations: XML is cumbersome in many cases, while JSON is less graceful at representing ordered document structures.

JSON freed web developers to create much more complex applications with data formats that feel less complicated. As developers grow more and more ambitious, however, they may find themselves moving back into complex situations where the XML toolkit looks more capable of handling information without the overhead of vast quantities of custom code. If that toolkit supports their existing formats, mixing and matching should be easier.

Metadata, content and design

Markup and data types are themselves metadata, providing information about the data they encapsulate, but Balisage and its predecessor conferences have often focused on metadata structures at higher levels — the Semantic Web, RDF, Topic Maps, and OWL. So far, this year’s talks have been cautious about making big metadata promises.

Kurt Cagle gave the only talk on a subject that once seemed to dominate the conference: ontologies and the tools for managing them. His metadata stack was large, and it was still changing near the end of the work to include SPARQL over HTTP. If Semantic Web technologies can settle into the small and focused groove Cagle described, it seems like they might find a comfortable niche in web infrastructure rather than attempting to conquer it.

Diane Kennedy discussed the PRISM Source Vocabulary, a similarly focused effort to apply technology to a set of problems in a particularly difficult context. The technology in the talk was unsurprising, but the social context was hard: she described a missionary effort to bring metadata ideas from a very "content first" crowd to magazines, a very "design first" crowd. Multiple delivery platforms, notably the iPad, have made design-first communities more willing to consider at least a subset of metadata and markup technology.

Markup and programming language boundaries

While designers are historically a difficult crowd to sell on semantic markup, programmers have been a skeptical audience about the value of markup as well — especially when "you got your markup in my programming language." There are, of course, many programmers attending and speaking at Balisage, but the boundaries between people who care primarily about the data and those who care primarily about the processing are a unique and ever-changing combination of blurry and (cutting) sharp.

A number of speakers described themselves as "not programmers" and their work as "not programming" despite the data processing work they were clearly doing. Ari Nordstrom opened his talk on moving as much coding as possible into XML by discussing his differences with the C# culture he works in. In another talk, Yves Marcoux said "I am not a programmer," only to be told immediately by the audience, "Yes, you are!"

XML's document-centric approach to the world seems to drive people toward declarative and functional programming styles. That is partly a side effect of looking at so many documents that it becomes convenient to turn programs into documents, an angle that is very hard to explain to those outside the document circle. However, the strong tendencies toward functional programming emerge, I suspect, from the headaches of processing markup in "traditional" object-oriented or imperative programming. The Document Object Model, long the most-criticized aspect of JavaScript, exemplifies this mismatch (compounded, of course, by a mismatch between Java and JavaScript object models). As jQuery users and many other developers know, navigating a document tree through declarative CSS selectors is much easier.
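
To make the contrast concrete, here is a small sketch using Python's standard ElementTree rather than jQuery: the XPath-like path expression stands in for the CSS selector, and the document is invented for the example.

```python
# Imperative tree-walking versus a declarative query over the same document.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<ul>"
    "<li class='item'>one</li>"
    "<li>two</li>"
    "<li class='item'>three</li>"
    "</ul>"
)

# Imperative: walk the tree by hand and filter as you go.
matches = []
for child in doc:
    if child.tag == "li" and child.get("class") == "item":
        matches.append(child.text)

# Declarative: describe what you want and let the engine find it.
selected = [el.text for el in doc.findall("li[@class='item']")]

assert matches == selected == ["one", "three"]
```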

Steven Pemberton’s talk on serialization and abstraction examined these kinds of questions in the context of form development. Managing user interactions with forms has long been labor-intensive, with developers creating ever-more complex (and often ever-more fragile) tools for forms-based validation and interactivity. Pemberton described how decisions made early in the development of a programming discipline can leave lingering and growing costs as approaches that seemed to work for simple cases grow under the pressure of increasing requirements. The XForms work attempts to leave the growing JavaScript snarl for a more-manageable declarative approach, but has not succeeded in changing web forms culture so far.

Jorge Luis Williams and David Cramer, both of Rackspace, found a different target for a declarative approach, mixing documentation into their code for validating RESTful services. The divide between REST and other web service approaches isn’t quite the same as the declarative / imperative divide, but I still felt a natural complement between the validation approach they were using and the underlying services they were testing.

A series of talks Tuesday afternoon poked and prodded at how markup could provide services to programming languages, exploring the boundaries between them. Economist Matthew McCormick discussed a system that provided documentation and structure to libraries of mathematical functions written in a variety of programming languages. Markup served as glue between the libraries, describing common functionality. David Lee wanted a toolset that would let him extract documentation from code — not just the classic JavaDoc extraction, but compilers reporting much of their internal information about variables in a processable markup format.

Norm Walsh started in different territory, discussing the challenges of creating compact non-markup formats for XML vocabularies, but wound up in a similar place. Attempting to shift a vocabulary from an XML format to a C-like syntax introduces dissonance even as it reduces verbosity. After noting the unusual success of the RELAX NG compact syntax and the simplicity that made it possible, he showed some of his own work on creating a compact syntax for XProc, declared it flawed, and showed a shift toward more programming-like approaches that eased some of the mismatch.

If you’re a programmer reading this, you may be wondering why these boundaries should matter to you. These frontiers tend to get explored from the markup side, and it’s possible that this work doesn’t solve a problem for you now. As conference chair Tommie Usdin put it in her welcome, however, Balisage is a place to “be exposed to some things that matter to other people but not to you — or at least not to you right now.”
