
June 17 2013

Why We Started the Velocity Conference

Back in 2006, Debra Chrapaty, then VP of Operations for Windows Live (later CIO at Zynga, and now CEO of Nirvanix), made a prescient comment to me: “In the future, being a developer on someone’s platform will mean being hosted on their infrastructure.” As it often turns out, things don’t work out quite as planned. A few months later, Amazon announced EC2, and it was Amazon, not Microsoft, that became the platform whose infrastructure startups chose to host their applications on. But Debra certainly nailed the big idea!

I wrote a blog post about that conversation, entitled Operations: The New Secret Sauce, which included the statement “Operations used to be thought of as boring. It’s now ground zero in the computing wars.” Jesse Robbins, then “Master of Disaster” at Amazon and later co-founder and CEO of Opscode, told me that everyone in operations at Amazon printed out that blog post and posted it in their cubicles.  Operations had been a relatively low-status job. Jesse told me that was the first time anyone had made a strong public statement about how important it was becoming.

As a result of that post, Jesse, Steve Souders, and a group of others came to me the following year and said “We need a gathering place for our tribe.”  That gathering place became the Velocity Conference, now in its sixth year.  We chose to include not just web operations, but also web performance and the emerging field of “DevOps” – the development model for applications hosted in the cloud.

This seems to be part of the secret sauce of some of our most successful events: the recognition that it’s not just about technology but about the people who put it into practice. At the heart of conferences like Velocity and Strata are new job descriptions, new skills, and new opportunities to grow careers and companies. That’s also why we increasingly think of these events not as conferences but as gathering places for communities. Technology matters. The people who put it into practice matter more.

The Velocity Conference starts tomorrow in Santa Clara.  There is still time to attend.

August 20 2012

DNA: The perfect backup medium

It wasn’t enough for Dr. George Church to help Gilbert “discover” DNA sequencing 30 years ago, create the foundations for genomics, create the Personal Genome Project, drive down the cost of sequencing,  and start humanity down the road of synthetic biology. No, that wasn’t enough.

He and his team decided to publish an easily understood scientific paper (“Next-generation Information Storage in DNA”) that promises to change the way we store and archive information. While this technology may take years to perfect, it provides a roadmap toward an energy-efficient, archival storage medium with a host of built-in advantages.

The paper demonstrates the feasibility of using DNA as a storage medium with a theoretical capacity of 455 exabytes per gram. (An exabyte is 1 million terabytes.) Now before you throw away your massive RAID 5 cluster and purchase a series of sequencing machines, know that DNA storage appears to be very high latency. Also know that Church, Yuan Gao, and Sriram Kosuri are not yet writing 455 exabytes of data; they’ve started with a more modest goal of writing Church’s recent book on genomics to a 5.27-megabit “bitstream.” Here’s an excerpt from the paper:

We converted an html-coded draft of a book that included 53,426 words, 11 JPG images and 1 JavaScript program into a 5.27 megabit bitstream. We then encoded these bits onto 54,898 159nt oligonucleotides (oligos) each encoding a 96-bit data block (96nt), a 19-bit address specifying the location of the data block in the bit stream (19nt), and flanking 22nt common sequences for amplification and sequencing. The oligo library was synthesized by ink-jet printed, high-fidelity DNA microchips. To read the encoded book, we amplified the library by limited-cycle PCR and then sequenced on a single lane of an Illumina HiSeq.

If you know anything about filesystems, this is an amazing paragraph. They’ve essentially defined a new standard for filesystem inodes on DNA. Each 96-bit block has a 19-bit descriptor. They then read this DNA bitstream by using something called Polymerase Chain Reaction (PCR). This is important because it means that reading this information involves generating millions of copies of the data in a format that has been proven to be durable. This biological “backup system” has replication capabilities “built-in.” Not just that, but this replication process has had billions of years of reliability data available.
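
The addressing scheme in the excerpt can be sketched in a few lines of Python. This is a toy illustration, not the paper's actual pipeline: the one-base-per-bit mapping (A for 0, G for 1, with C and T accepted as synonyms on read) and every function name are assumptions made for the example, and the flanking 22nt sequences are left out. It shows why the 19-bit address matters: a pooled library is read back in no particular order, and the address is what lets you put the stream back together.

```python
import random

ADDRESS_BITS = 19   # per the excerpt: 19-bit block address
DATA_BITS = 96      # per the excerpt: 96-bit data block
FLANK_NT = 22       # per the excerpt: 22nt flanking sequence on each side


def to_bits(data: bytes) -> str:
    """Render bytes as a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)


def encode(data: bytes) -> list[str]:
    """Split data into addressed 96-bit blocks, one base per bit."""
    bits = to_bits(data)
    oligos = []
    for address, start in enumerate(range(0, len(bits), DATA_BITS)):
        if address >= 2 ** ADDRESS_BITS:
            raise ValueError("more blocks than a 19-bit address can index")
        # Pad the final block with zeros so every block is 96 bits long.
        block = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        payload = f"{address:0{ADDRESS_BITS}b}" + block
        # Assumed mapping for this sketch: 0 -> A, 1 -> G.
        oligos.append("".join("A" if bit == "0" else "G" for bit in payload))
    return oligos


def decode(oligos: list[str], n_bytes: int) -> bytes:
    """Reassemble the byte stream from oligos read in any order."""
    blocks = {}
    for oligo in oligos:
        bits = "".join("0" if base in "AC" else "1" for base in oligo)
        blocks[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    stream = "".join(blocks[address] for address in sorted(blocks))
    return bytes(int(stream[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


message = b"Operations is the new secret sauce."
library = encode(message)
random.shuffle(library)            # reads come back in arbitrary order
assert decode(library, len(message)) == message

# Sanity checks against the figures quoted from the paper.
assert FLANK_NT + ADDRESS_BITS + DATA_BITS + FLANK_NT == 159   # nt per oligo
assert 54_898 * DATA_BITS == 5_270_208                          # about 5.27 megabits
```

The two closing assertions check that the quoted numbers are consistent with each other: 22 + 19 + 96 + 22 gives the 159nt oligo length, and 54,898 blocks of 96 bits give the 5.27-megabit stream.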

While this technology may only be practical for long-term storage and high-latency archival purposes, you can already see that this paper makes a strong case for the viability of this approach. Of all biological storage media, this work has demonstrated the longest bit stream and is built atop a set of technologies (DNA sequencing) that have been focused on repeatability and error correction for decades.

In addition to these advantages, DNA storage has other advantages over tape or hard drive — it has a steady-state storage cost of zero, a lifetime that far exceeds that of magnetic storage, and very small space requirements.

If you have a huge amount of data that needs to be archived, the advantages of DNA as a storage medium (once the technology matures) could quickly translate to significant cost savings. Think about the energy requirements of a data center that needs to store and archive an exabyte of data. Compare that to the cost of maintaining a sequencing lab and a few Petri dishes.

For most of us, this reality is still science fiction, but Church’s work makes it less so every day. Google is uniquely positioned to realize this technology. It has already been established that Google’s founders pay close attention to genomics. They invested an unspecified amount in Church’s Personal Genome Project (PGP) in 2008, and they have invested in a company much closer to home: 23andMe. Google also has a large research arm focused on energy savings and efficiency, with scientists like Urs Hölzle looking for new ways to get more out of the energy that Google spends to run data centers.

If this technology points the way to the future of very high latency, archival storage, I predict that Google will lead the way in implementation. It is the perfect convergence of massive data and genomics, and just the kind of dent that company is trying to make in the universe.


June 12 2012

Velocity Profile: Schlomo Schapiro

This is part of the Velocity Profiles series, which highlights the work and knowledge of web ops and performance experts.

Schlomo Schapiro
Systems Architect, Open Source Evangelist

How did you get into web operations and performance?

Previously I was working as a consultant for Linux, open source tools and virtualization. While this is a great job, it has one major drawback: One usually does not stay with one customer long enough to enable the really big changes, especially with regard to how the customer works. When ImmobilienScout24 came along and offered me the job as a Systems Architect, this was my ticket out of consulting and into diving deeply into a single customer scenario. The challenges that ImmobilienScout24 faced were very much along the lines that occupied me as well:

  • How to change from "stable operations" to "stable change."
  • How to fully automate a large data center and stop doing repeating tasks manually.
  • How to drastically increase the velocity of our release cycles.

What are your most memorable projects?

There are a number of them:

  • An internal open source project to manage the IT desktops by the people who use them.
  • An open source project, Lab Manager Light, that turns a standard VMware vSphere environment into a self-service cloud.
  • The biggest and still very much ongoing project is the new deployment and systems automation for our data center. The approach — which is also new — is to unify the management of our Linux servers under the built-in package manager, in our case RPM. That way all files on the servers are already taken care of and we only need to centrally orchestrate the package roll-out waves and service start/stop. The tools we use for this are published here.
  • Helping to nudge us to embrace DevOps last year, after development went agile some three years ago.
  • Most important of all, I feel that ImmobilienScout24 is now on its way to maintain and build upon the technological edge matching our market share as the dominating real-estate listing portal in Germany. This will actually enable us to keep growing and setting the pace in the ever-faster Internet world.

What's the toughest problem you've had to solve?

The real challenge is not to hack up a quick solution but to work as a team to build a sustainable world. Technical debt discussions are now a major part of my daily work. As tedious as they can be, I strongly believe that at our current state sustainability is at least as important as innovation.

What tools and techniques do you rely on most?

Asking questions and trying to understand with everybody together how things really work. Walking a lot through the office with a coffee cup and talking to people. Taking the time to sit down with a colleague at the keyboard and seeing things through. Sometimes it helps to shorten a discussion with a little hacking and "look, it just works," but this should always be a way to start a discussion. The real work is better done together as a team.

What is your web operations and performance super power?

I hope that I manage to help us all to look forward to the next day at work. I also try to simplify things until they are really simple, and I annoy everybody by nagging about separation of concerns.

Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.

Save 20% on registration with the code RADAR20


June 07 2012

What is DevOps?

Adrian Cockcroft's article about NoOps at Netflix ignited a controversy that has been smouldering for some months. John Allspaw's detailed response to Adrian's article makes a key point: What Adrian described as "NoOps" isn't really. Operations doesn't go away. Responsibilities can, and do, shift over time, and as they shift, so do job descriptions. But no matter how you slice it, the same jobs need to be done, and one of those jobs is operations. What Adrian is calling NoOps at Netflix isn't all that different from Operations at Etsy. But that just raises the question: What do we mean by "operations" in the 21st century? If NoOps is a movement for replacing operations with something that looks suspiciously like operations, there's clearly confusion. Now that some of the passion has died down, it's time to get to a better understanding of what we mean by operations and how it's changed over the years.

At a recent lunch, John noted that back in the dawn of the computer age, there was no distinction between dev and ops. If you developed, you operated. You mounted the tapes, you flipped the switches on the front panel, you rebooted when things crashed, and possibly even replaced the burned out vacuum tubes. And you got to wear a geeky white lab coat. Dev and ops started to separate in the '60s, when programmer/analysts dumped boxes of punch cards into readers, and "computer operators" behind a glass wall scurried around mounting tapes in response to IBM JCL. The operators also pulled printouts from line printers and shoved them in labeled cubbyholes, where you got your output filed under your last name.

The arrival of minicomputers in the 1970s and PCs in the '80s broke down the wall between mainframe operators and users, leading to the system and network administrators of the 1980s and '90s. That was the birth of modern "IT operations" culture. Minicomputer users tended to be computing professionals with just enough knowledge to be dangerous. (I remember when a new director was given the root password and told to "create an account for yourself" ... and promptly crashed the VAX, which was shared by about 30 users.) PC users required networks; they required support; they required shared resources, such as file servers and mail servers. And yes, BOFH ("Bastard Operator from Hell") serves as a reminder of those days. I remember being told that "no one" else is having the problem you're having, and not getting beyond it until at a company meeting we found that everyone was having the exact same problem, in slightly different ways. No wonder we wanted ops to disappear. No wonder we wanted a wall between the developers and the sysadmins, particularly since, in theory, the advent of the personal computer and desktop workstation meant that we could all be responsible for our own machines.

But somebody has to keep the infrastructure running, including the increasingly important websites. As companies and computing facilities grew larger, the fire-fighting mentality of many system administrators didn't scale. When the whole company runs on one 386 box (like O'Reilly in 1990), mumbling obscure command-line incantations is an appropriate way to fix problems. But that doesn't work when you're talking hundreds or thousands of nodes at Rackspace or Amazon. From an operations standpoint, the big story of the web isn't the evolution toward full-fledged applications that run in the browser; it's the growth from single servers to tens of servers to hundreds, to thousands, to (in the case of Google or Facebook) millions. When you're running at that scale, fixing problems on the command line just isn't an option. You can't afford to let machines get out of sync through ad-hoc fixes and patches. Being told "We need 125 servers online ASAP, and there's no time to automate it" (as Sascha Bates encountered) is a recipe for disaster.

The response of the operations community to the problem of scale isn't surprising. One of the themes of O'Reilly's Velocity Conference is "Infrastructure as Code." If you're going to do operations reliably, you need to make it reproducible and programmatic. Hence virtual machines to shield software from configuration issues. Hence Puppet and Chef to automate configuration, so you know every machine has an identical software configuration and is running the right services. Hence Vagrant to ensure that all your virtual machines are constructed identically from the start. Hence automated monitoring tools to ensure that your clusters are running properly. It doesn't matter whether the nodes are in your own data center, in a hosting facility, or in a public cloud. If you're not writing software to manage them, you're not surviving.
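
The idea behind configuration tools of this kind can be sketched as follows: describe the desired state as data, compare it with what a machine actually has, and apply only the difference. The sketch uses plain Python dictionaries; it is not the Puppet or Chef API, the names `plan` and `converge` are hypothetical, and the package versions are made up for illustration.

```python
def plan(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """Return the (action, package) steps needed to reach the desired state."""
    steps = []
    for name, version in sorted(desired.items()):
        if name not in actual:
            steps.append(("install", name))
        elif actual[name] != version:
            steps.append(("upgrade", name))
    # Anything present but not declared is drift and gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        steps.append(("remove", name))
    return steps


def converge(desired: dict[str, str], actual: dict[str, str]) -> dict[str, str]:
    """Apply the plan to a copy of the actual state and return the result."""
    state = dict(actual)
    for action, name in plan(desired, state):
        if action == "remove":
            del state[name]
        else:
            state[name] = desired[name]
    return state


desired = {"nginx": "1.2.1", "openssl": "1.0.1c"}   # illustrative versions
web01 = {"nginx": "1.0.15", "telnetd": "0.17"}      # drifted through ad hoc fixes
web02 = {}                                          # freshly provisioned

# Two very different machines end up identical.
assert converge(desired, web01) == converge(desired, web02) == desired
# Running it again changes nothing: the operation is idempotent.
assert plan(desired, desired) == []
```

The property worth noticing is the last one. Because the description is declarative, running it against a machine that is already correct is a no-op, which is what makes it safe to run everywhere, all the time.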

Furthermore, as we move further and further away from traditional hardware servers and networks, and into a world that's virtualized on every level, old-style system administration ceases to work. Physical machines in a physical machine room won't disappear, but they're no longer the only thing a system administrator has to worry about. Where's the root disk drive on a virtual instance running at some colocation facility? Where's a network port on a virtual switch? Sure, system administrators of the '90s managed these resources with software; no sysadmin worth his salt came without a portfolio of Perl scripts. The difference is that now the resources themselves may be physical, or they may just be software; a network port, a disk drive, or a CPU has nothing to do with a physical entity you can point at or unplug. The only effective way to manage this layered reality is through software.

So infrastructure had to become code. All those Perl scripts show that it was already becoming code as early as the late '80s; indeed, Perl was designed as a programming language for automating system administration. It didn't take long for leading-edge sysadmins to realize that handcrafted configurations and non-reproducible incantations were a bad way to run their shops. It's possible that this trend means the end of traditional system administrators, whose jobs are reduced to racking up systems for Amazon or Rackspace. But that's only likely to be the fate of those sysadmins who refuse to grow and adapt as the computing industry evolves. (And I suspect that sysadmins who refuse to adapt swell the ranks of the BOFH fraternity, and most of us would be happy to see them leave.) Good sysadmins have always realized that automation was a significant component of their job and will adapt as automation becomes even more important. The new sysadmin won't power down a machine, replace a failing disk drive, reboot, and restore from backup; he'll write software to detect a misbehaving EC2 instance automatically, destroy the bad instance, spin up a new one, and configure it, all without interrupting service. With automation at this level, the new "ops guy" won't care if he's responsible for a dozen systems or 10,000. And the modern BOFH is, more often than not, an old-school sysadmin who has chosen not to adapt.
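
A minimal sketch of that kind of self-healing loop, written against a hypothetical in-memory `FakeCloud` class rather than the real EC2 API. All class and method names here are assumptions for illustration; a real implementation would call the provider's SDK and a real health check.

```python
import itertools


class FakeCloud:
    """Hypothetical stand-in for a cloud provider; not a real SDK."""

    def __init__(self):
        self._ids = itertools.count(1)
        # instance id -> {"healthy": bool, "configured": bool}
        self.instances = {}

    def launch(self) -> str:
        instance_id = f"i-{next(self._ids):04d}"
        self.instances[instance_id] = {"healthy": True, "configured": False}
        return instance_id

    def terminate(self, instance_id: str) -> None:
        del self.instances[instance_id]

    def configure(self, instance_id: str) -> None:
        self.instances[instance_id]["configured"] = True

    def is_healthy(self, instance_id: str) -> bool:
        return self.instances[instance_id]["healthy"]


def reconcile(cloud: FakeCloud, wanted: int) -> list[str]:
    """Destroy unhealthy instances, then launch and configure replacements."""
    replaced = []
    for instance_id in list(cloud.instances):
        if not cloud.is_healthy(instance_id):
            cloud.terminate(instance_id)
            replaced.append(instance_id)
    while len(cloud.instances) < wanted:
        cloud.configure(cloud.launch())
    return replaced


cloud = FakeCloud()
reconcile(cloud, wanted=3)                      # initial fleet: i-0001..i-0003
cloud.instances["i-0002"]["healthy"] = False    # simulate a misbehaving instance
assert reconcile(cloud, wanted=3) == ["i-0002"]
assert len(cloud.instances) == 3
assert all(i["healthy"] and i["configured"] for i in cloud.instances.values())
```

The same `reconcile` call handles three instances or 10,000; only the `wanted` number changes. A production loop would more likely launch the replacement before terminating the bad instance, so capacity never dips.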

James Urquhart nails it when he describes how modern applications, running in the cloud, still need to be resilient and fault tolerant, still need monitoring, still need to adapt to huge swings in load, etc. But he notes that those features, formerly provided by the IT/operations infrastructures, now need to be part of the application, particularly in "platform as a service" environments. Operations doesn't go away, it becomes part of the development. And rather than envision some sort of uber developer, who understands big data, web performance optimization, application middleware, and fault tolerance in a massively distributed environment, we need operations specialists on the development teams. The infrastructure doesn't go away — it moves into the code; and the people responsible for the infrastructure, the system administrators and corporate IT groups, evolve so that they can write the code that maintains the infrastructure. Rather than being isolated, they need to cooperate and collaborate with the developers who create the applications. This is the movement informally known as "DevOps."

Amazon's EBS outage last year demonstrates how the nature of "operations" has changed. There was a marked distinction between companies that suffered and lost money, and companies that rode through the outage just fine. What was the difference? The companies that didn't suffer, including Netflix, knew how to design for reliability; they understood resilience, spreading data across zones, and a whole lot of reliability engineering. Furthermore, they understood that resilience was a property of the application, and they worked with the development teams to ensure that the applications could survive when parts of the network went down. More important than the flames about Amazon's services are the testimonials of how intelligent and careful design kept applications running while EBS was down. Netflix's ChaosMonkey is an excellent, if extreme, example of a tool to ensure that a complex distributed application can survive outages; ChaosMonkey randomly kills instances and services within the application. The development and operations teams collaborate to ensure that the application is sufficiently robust to withstand constant random (and self-inflicted!) outages without degrading.
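
The principle behind that kind of testing fits in a few lines. This is not Netflix's tool; `chaos_round` and `service_available` are hypothetical names, and the "service" is just a set of instance names spread over two zones.

```python
import random


def chaos_round(instances: set[str], rng: random.Random) -> str:
    """Kill one randomly chosen instance and return its name."""
    victim = rng.choice(sorted(instances))
    instances.remove(victim)
    return victim


def service_available(instances: set[str], min_zones: int = 1) -> bool:
    """The toy service is up while instances remain in at least min_zones zones."""
    zones = {name.split("/")[0] for name in instances}
    return len(zones) >= min_zones


rng = random.Random(42)   # seeded so a failing run can be replayed
fleet = {"zone-a/web1", "zone-a/web2", "zone-b/web1", "zone-b/web2"}

killed = chaos_round(fleet, rng)
assert killed not in fleet
assert service_available(fleet)   # losing any single instance must be survivable
```

The interesting part is the final assertion. If it ever fails, the application has a single point of failure, and it is far better to learn that from a self-inflicted test than from a real outage.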

On the other hand, during the EBS outage, nobody who wasn't an Amazon employee touched a single piece of hardware. At the time, JD Long tweeted that the best thing about the EBS outage was that his guys weren't running around like crazy trying to fix things. That's how it should be. It's important, though, to notice how this differs from operations practices 20, even 10 years ago. It was all over before the outage even occurred: The sites that dealt with it successfully had written software that was robust, and carefully managed their data so that it wasn't reliant on a single zone. And similarly, the sites that scrambled to recover from the outage were those that hadn't built resilience into their applications and hadn't replicated their data across different zones.

In addition to this redistribution of responsibility, from the lower layers of the stack to the application itself, we're also seeing a redistribution of costs. It's a mistake to think that the cost of operations goes away. Capital expense for new servers may be replaced by monthly bills from Amazon, but it's still cost. There may be fewer traditional IT staff, and there will certainly be a higher ratio of servers to staff, but that's because some IT functions have disappeared into the development groups. The boundary is fluid, but that's precisely the point. The task (providing a solid, stable application for customers) is the same. The locations of the servers on which that application runs, and how they're managed, are all that changes.

One important task of operations is understanding the cost trade-offs between public clouds like Amazon's, private clouds, traditional colocation, and building your own infrastructure. It's hard to beat Amazon if you're a startup trying to conserve cash and need to allocate or deallocate hardware to respond to fluctuations in load. You don't want to own a huge cluster to handle your peak capacity but leave it idle most of the time. But Amazon isn't inexpensive, and a larger company can probably get a better deal taking its infrastructure to a colocation facility. A few of the largest companies will build their own datacenters. Cost versus flexibility is an important trade-off; scaling is inherently slow when you own physical hardware, and when you build your data centers to handle peak loads, your facility is underutilized most of the time. Smaller companies will develop hybrid strategies, with part of the infrastructure hosted on public clouds like AWS or Rackspace, part running on private hosting services, and part running in-house. Optimizing how tasks are distributed between these facilities isn't simple; that is the province of operations groups. Developing applications that can run effectively in a hybrid environment: that's the responsibility of developers, with healthy cooperation with an operations team.

The use of metrics to monitor system performance is another respect in which system administration has evolved. In the '80s or early '90s, you knew when a machine crashed because you started getting phone calls. Early system monitoring tools like HP's OpenView provided limited visibility into system and network behavior but didn't give much more information than simple heartbeats or reachability tests. Modern tools like DTrace provide insight into almost every aspect of system behavior; one of the biggest challenges facing modern operations groups is developing analytic tools and metrics that can take advantage of the data that's available to predict problems before they become outages. We now have access to the data we need; we just don't know how to use it. And the more we rely on distributed systems, the more important monitoring becomes. As with so much else, monitoring needs to become part of the application itself. Operations is crucial to success, but operations can only succeed to the extent that it collaborates with developers and participates in the development of applications that can monitor and heal themselves.
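
As one small illustration of predicting a problem before it becomes an outage, the sketch below fits a straight line to recent disk-usage samples and estimates when the disk will fill. The sample data, the function name `hours_until_full`, and the choice of a least-squares line are all assumptions; real metrics are rarely this well behaved.

```python
def hours_until_full(samples: list[tuple[float, float]], capacity: float = 100.0):
    """Estimate hours until usage reaches capacity.

    samples is a list of (hour, percent_used) pairs in time order.
    Returns None when there is too little data or usage is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    covariance = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        return None
    slope = covariance / variance            # percent per hour
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    last_t = samples[-1][0]
    # Solve intercept + slope * t = capacity, measured from the last sample.
    return (capacity - intercept) / slope - last_t


usage = [(0, 70.0), (1, 72.0), (2, 74.0), (3, 76.0)]   # made-up samples
print(hours_until_full(usage))
```

With these samples the disk grows by two percentage points an hour, so the estimate is twelve hours from the last reading: enough warning to act before anyone's phone rings.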

Success isn't based entirely on integrating operations into development. It's naive to think that even the best development groups, aware of the challenges of high-performance, distributed applications, can write software that won't fail. On this two-way street, do developers wear the beepers, or IT staff? As Allspaw points out, it's important not to divorce developers from the consequences of their work since the fires are frequently set by their code. So, both developers and operations carry the beepers. Sharing responsibilities has another benefit. Rather than finger-pointing post-mortems that try to figure out whether an outage was caused by bad code or operational errors, when operations and development teams work together to solve outages, a post-mortem can focus less on assigning blame than on making systems more resilient in the future. Although we used to practice "root cause analysis" after failures, we're recognizing that finding out the single cause is unhelpful. Almost every outage is the result of a "perfect storm" of normal, everyday mishaps. Instead of figuring out what went wrong and building procedures to ensure that something bad can never happen again (a process that almost always introduces inefficiencies and unanticipated vulnerabilities), modern operations designs systems that are resilient in the face of everyday errors, even when they occur in unpredictable combinations.

In the past decade, we've seen major changes in software development practice. We've moved from various versions of the "waterfall" method, with interminable up-front planning, to "minimum viable product," continuous integration, and continuous deployment. It's important to understand that the waterfall methodologies of the '80s aren't "bad ideas" or mistakes. They were perfectly adapted to an age of shrink-wrapped software. When you produce a "gold disk" and manufacture thousands (or millions) of copies, the penalties for getting something wrong are huge. If there's a bug, you can't fix it until the next release. In this environment, a software release is a huge event. But in this age of web and mobile applications, deployment isn't such a big thing. We can release early, and release often; we've moved from continuous integration to continuous deployment. We've developed techniques for quick resolution in case a new release has serious problems; we've mastered A/B testing to test releases on a small subset of the user base.

All of these changes require cooperation and collaboration between developers and operations staff. Operations groups are adopting, and in many cases leading, the effort to implement these changes. They're the specialists in resilience, in monitoring, in deploying changes and rolling them back. And the many attendees, hallway discussions, talks, and keynotes at O'Reilly's Velocity conference show us that they are adapting. They're learning about adopting approaches to resilience that are completely new to software engineering; they're learning about monitoring and diagnosing distributed systems, doing large-scale automation, and debugging under pressure. At a recent meeting, Jesse Robbins described scheduling EMT training sessions for operations staff so that they understood how to handle themselves and communicate with each other in an emergency. It's an interesting and provocative idea, and one of many things that modern operations staff bring to the mix when they work with developers.

What does the future hold for operations? System and network monitoring used to be exotic and bleeding-edge; now, it's expected. But we haven't taken it far enough. We're still learning how to monitor systems, how to analyze the data generated by modern monitoring tools, and how to build dashboards that let us see and use the results effectively. I've joked about "using a Hadoop cluster to monitor the Hadoop cluster," but that may not be far from reality. The amount of information we can capture is tremendous, and far beyond what humans can analyze without techniques like machine learning.

Likewise, operations groups are playing a huge role in the deployment of new, more efficient protocols for the web, like SPDY. Operations is involved, more than ever, in tuning the performance of operating systems and servers (even ones that aren't under our physical control); a lot of our "best practices" for TCP tuning were developed in the days of ISDN and 56 Kbps analog modems, and haven't been adapted to the reality of Gigabit Ethernet, OC48 fiber, and their descendants. Operations groups are responsible for figuring out how to use these technologies (and their successors) effectively. We're only beginning to digest IPv6 and the changes it implies for network infrastructure. And, while I've written a lot about building resilience into applications, so far we've only taken baby steps. There's a lot there that we still don't know. Operations groups have been leaders in taking best practices from older disciplines (control systems theory, manufacturing, medicine) and integrating them into software development.

And what about NoOps? Ultimately, it's a bad name, but the name doesn't really matter. A group practicing "NoOps" successfully hasn't banished operations. It's just moved operations elsewhere and called it something else. Whether a poorly chosen name helps or hinders progress remains to be seen, but operations won't go away; it will evolve to meet the challenges of delivering effective, reliable software to customers. Old-style system administrators may indeed be disappearing. But if so, they are being replaced by more sophisticated operations experts who work closely with development teams to get continuous deployment right; to build highly distributed systems that are resilient; and yes, to answer the pagers in the middle of the night when EBS goes down. DevOps.


Photo: Taken at IBM's headquarters in Armonk, NY. By Mike Loukides.


June 06 2012

A crazy awesome gaming infrastructure

In this Velocity Podcast, I had a conversation with Sarah Novotny (@sarahnovotny), CIO of Meteor Entertainment. This conversation centers mostly on building a high-performance gaming infrastructure and bridging the gap between IT and business. Sarah has some great insights into building an environment for human performance that goes along with your quest for more reliable, scalable, tolerant, and secure web properties.

Our conversation lasted 00:15:44 and if you want to pinpoint any particular topic, you can find the specific timing below. Sarah provides some of her background and experience as well as what she is currently doing at Meteor here. The full conversation is outlined below.

  • As a CIO, how do you bridge the gap between technology and business? 00:02:28

  • How do you educate corporate stakeholders about the importance of DevOps and the impact it can have on IT? 00:03:26

  • How does someone set up best practices in an organization? 00:05:24

  • Are there signals that DevOps is actually happening where development and operations are merging? 00:08:31

  • How do you measure performance and make large changes in an online game without disrupting players? 00:09:59

  • How do you prepare for scaling your crazy awesome infrastructure needed for game play? 00:12:28

  • Have you gathered metrics on public and private clouds and do you know which ones to scale to when needed? 00:14:03

In addition to her work at Meteor, Sarah is co-chair of OSCON 2012 (being held July 16-20 in Portland, Ore.). We hope to see you there. You can also read Sarah's blog for more insights into what she's up to.



IPv6 day and the state of the edge

IPv6 enters into permanent operation today and we'll finally have all the addresses we need. Unfortunately the old system with its baked-in scarcity, operating like a tireless gravitational force, has already had a few decades to deform the architecture of the Internet in important and perhaps irreversible ways.

I got a notice from Apple reminding me that my MobileMe hosting is going away on June 30. I'm lazy when it comes to certain things and at one point or another iWeb and MobileMe seemed like a simple way to get a personal web page out there. I just wanted a bit of publicly searchable state to clarify who I am (as differentiated from that other Jim Stogdill on the web) that wasn't mediated, moderated, monetized, and walled off by Facebook or some other Austro-Hungarian Central Power of the web. A little place I could call my own.

Really, this is a stupid problem to have. In the last month those pages have had fewer than 100 visits and I could have served them all from a low wattage pluggable computer stashed in a closet without it breaking a sweat. But the Internet doesn't work that way, or not as easily as it should. And at least one of the reasons is its history of address scarcity.

I attended the "Internet Everywhere" panel at the World Science Festival over the weekend. Maybe the most interesting bit was when Neil Gershenfeld forcefully reminded us that the Internet was never intended to be just a bitnet. He was thanking Vint Cerf for making stateful edges a core design principle of the original web. Distributed state meant that adding nodes also added capability and that ownership and power stayed distributed as the Net grew. Maybe it's a sign of where we are now that the man he was thanking works for the web's other Central Power these days.

Unfortunately that chronic shortage of addresses contracted the web, shifting the definition of "edge" from the device you are looking at to the ISP it's connected to. That redefinition from Internet host to mere remote client means that I have to go through the minor hassle of re-hosting my four little pages of HTML instead of happily forgetting that it's in my closet.

I've long been vexed by the asymmetry inherent in DHCP-enabled second class citizenship and I remember the first time I tried to build a permanently addressable home on the web. It was a bunch of years ago and I had my eye on a used Cobalt Qube on eBay. I figured I'd use it as a web server and blog host etc. But like I said before, sometimes I'm lazy, and a fixed IP address was too expensive and (at least at the time) Dynamic DNS was enough of a hurdle for me to say "to hell with it."

Any geek will tell you that it can be done, that I'm making a mountain out of a molehill, and it's not even that hard. "Pay extra for a fixed and registered IP address or use Dynamic DNS." But IP address scarcity made it just hard and expensive enough to make sure that edge hosting didn't become the norm. I'm not commenting on whether it's possible (it is), but whether it's the low-energy state for the broad population of netizens.

Address scarcity contributes to a strange attractor that deformed the logic of the Internet at scale and helped guarantee the cloud would become the primary architecture. When Vint and his colleagues chose that 32-bit address space they thought they were just making a simple engineering tradeoff based on a seemingly predictable future. But it turns out they were adding a bit of dark matter to our Internet cosmos, perhaps just enough to shift the whole thing from open and expanding to closed and collapsing. Address scarcity added to the gravitational force of centralization and control.
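For a sense of scale, here is a quick sketch of that tradeoff using Python's standard `ipaddress` module. The `2001:db8::/64` prefix is an illustrative assumption (it comes from the range reserved for documentation), chosen only to show how a single IPv6 subnet compares with the whole 32-bit space.

```python
import ipaddress

# Total addresses in the whole IPv4 and IPv6 spaces.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128

# One /64 subnet (illustrative documentation prefix, not a real allocation).
one_subnet = ipaddress.ip_network("2001:db8::/64").num_addresses  # 2**64

print(f"IPv4 addresses: {ipv4_total:,}")
print(f"IPv6 addresses: {ipv6_total:.3e}")
print(f"One /64 holds {one_subnet // ipv4_total:,} times the entire IPv4 space")
```

In other words, a single home-sized IPv6 subnet has room for billions of copies of the address space that the whole Internet has been rationing.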

On the other hand, if we had had IPv6 from the very beginning, maybe a whole lot more of us would be hosting our blogs, photos, videos, and pretty much everything else right there in a DMZ on our home routers. In that world services like YouTube might need to be no more than curation overlays and CDNs for popular content. Sort of a commercially provided BitTorrent index for the stuff we hosted from our closets.

What else might we have built with such an infrastructure? The cloud gives us a sandbox to build applications in, but it also sandboxes our sense of what is even possible. How many startups don't start from the unexamined assumption of cloud hosting today? Why HealthVault? Why not a device that I keep in my house that is completely under my control for that kind of personal information? I could even put it in my safe deposit box if I didn't have any doctor's appointments.

Maybe security concerns and natural economies of scale would have made centralization and "the cloud" inevitable outcomes without any help from address scarcity. But as our universe continues to collapse into a few very highly capitalized Central Powers I find myself hoping that IPv6 will take away at least some of the gravitational force that is pulling it in on itself.


June 05 2012

The software professional vs the software artist

I hope that James Turner's post on "The overhead of insecure infrastructure" was ironic or satiric. The attitude he expresses is all too common, and frankly, is the reason that system administrators and other operations people can't keep their systems secure.

Why do we have to deal with vulnerabilities in operating systems and applications? It's precisely because of prima donna software developers who think they're "artists" and can't be bothered to take the time to do things right. That, and a long history of management that was more interested in meeting ship dates than shipping secure software; and the never ending and always escalating battle between the good guys and the bad guys, as black hats find new vulnerabilities that no one thought of a week ago, let alone a few years ago.

Yes, that's frustrating, but that's life. If a developer in my organization said that he was too good and creative to care about writing secure code, he would be out on his ear. Software developers are not artistes. They are professionals, and the attitude James describes is completely unprofessional and entirely too common.

One of the long-time puzzles in English literature is Jonathan Swift's "A Modest Proposal for Preventing the Children of Poor People From Being a Burden on Their Parents or Country, and for Making Them Beneficial to the Publick." It suggests solving the problem of famine in Ireland by cannibalism. Although Swift is one of English literature's greatest satirists, the problem here is that he goes too far: the piece is just too coldly rational, and never gives you the sly look that shows something else is going on. Is Turner a latter-day Swift? I hope so.


Velocity Profile: Kate Matsudaira

This is part of the Velocity Profiles series, which highlights the work and knowledge of web ops and performance experts.

Kate Matsudaira
VP Engineering

How did you get into web operations and performance?

I started working as a software engineer, and being at Amazon working on the internals of the retail website it was almost impossible not to have some exposure to pager duty and operations. As my career progressed and I moved into leadership roles on teams working on 24/7 websites, typically spanning hundreds of servers (and now instances), it was necessary to understand operations and performance.

What was your most memorable project?

Memorable can be two things, really good or really bad. Right now I am excited about the work we have been doing to make our website super fast and work well across devices (and all the data mining and machine learning is also really interesting).

As for really bad, though, there was a launch almost a decade ago where we implemented an analytics datastore on top of a relational database instead of something like map/reduce. If only Hadoop and all the other great data technologies were around and prevalent back then!

What's the toughest problem you've had to solve?

Building an index of all the links on the web (a link search engine, basically) in one year with less than $1 million, including the team.

What tools and techniques do you rely on most?

Tools: pick the best one for the job at hand. Techniques: take the time to slow down before making snap judgements.

Who do you follow in the web operations and performance world?

Artur Bergman, Cliff Moon, Ben Black, John Allspaw, Rob Treat, and Theo Schlossnagle.

What is your web operations and performance super power?

Software architecture. You have to design your applications to be operational.


June 01 2012

Developer Week in Review: The overhead of insecure infrastructure

I'm experiencing a slow death by pollen this week, which has prompted me to ponder some of the larger issues of life. In particular, I was struck by the news that an FPGA chip widely used in military applications has an easily exploitable back door.

There is open discussion at the moment about whether this was a deliberate attempt by a certain foreign government (*cough* China *cough*) to gain access to sensitive data and possibly engage in Stuxnet-like mischief, or just normal carelessness on the part of chip designers who left a debugging path open and available. Either way, there's a lot of hardware out there walking around with its fly down, so to speak.

As developers, we put a lot of time and effort into trying to block the acts of people with bad intent. At my day job, we have security "ninjas" on each team that take special training and devote a fair amount of their time to keeping up with the latest exploits and remediations. Web developers constantly have to guard against perils such as cross-site scripting and SQL injection hacks. Mobile developers need to make sure their remote endpoints are secure and provide appropriate authentication.

The thing is, we shouldn't have to. The underlying platforms and infrastructures we develop on top of should take care of all of this, and leave us free to innovate and create the next insanely great thing. The fact that we have to spend so much of our time building fences rather than erecting skyscrapers is a sign of how badly this basic need has been left unmet.

So why is the development biome so underprotected? I think there are several factors. The first is fragmentation. It's easier to guard one big army base than 1,000 small ones. In the same way, the more languages, operating systems and packages that are in the wild, the more times you have to reinvent the wheel. Rather than focus on making a small number of them absolutely bulletproof (and applying constant vigilance to them), we jump on the flavor of the day, regardless of how much or little effort has been put into reducing the exposed security footprint of our new toy.

The fact that we have independent, massive efforts involved in securing the base operating systems for MacOS, Windows, Linux, BSD, etc, is nothing short of a crime against the development community. Pretty it up any way that suits you with a user interface, but there should (at this point in the lifecycle of operating systems) only be a single, rock-solid operating system that the whole world uses. It is only because of greed, pettiness, and bickering that we have multiple, fragile operating systems, all forgetting to lock their car before they go out to dinner.

Languages are a bit more complex, because there is a genuine need for different languages to match different styles of development and application needs. But, again, the language space is polluted with far too many "me-too" wannabes that distract from the goal of making the developer's security workload as low as possible. The next time you hear about a site that gets pwned by a buffer overrun exploit, don't think "stupid developers!", think "stupid industry!" Any language that allows a developer to leave themselves vulnerable to that kind of attack is a bad language, period!

The other major factor in why things are so bad is that we don't care, evidently. If developers refused to develop on operating systems or languages that didn't supply unattackable foundations, companies such as Apple and Microsoft (and communities such as the Linux kernel devs) would get the message in short order. Instead, we head out to conferences like WWDC eager for the latest bells and whistles, but nary a moment will be spent to think about how the security of the OS could be improved.

Personally, I'm tired of wasting time playing mall security guard, rather than Great Artist. In a world where we had made security a must-have in the infrastructure we build on, rather than in the code we develop, think of how much more amazing code could have been written. Instead, we spend endless time in code reviews, following best practices, and otherwise cleaning up after our security-challenged operating systems, languages and platforms. Last weekend, we honored (at least in the U.S.) those who have given their lives to physically secure our country. Maybe it's time to demand that those who secure our network and computing infrastructures do as good a job ...

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR20


May 30 2012

Which is easier to tune, humans or machines?

In this new Velocity Podcast, I had a conversation with Kate Matsudaira (@katemats), Vice President of Engineering at This conversation centers mostly on the human side of engineering and performance. Kate has some great insights into building an environment for human performance that goes along with your quest for more performant, reliable, scalable, tolerant, secure web properties.

Our conversation lasted 00:20:00 and if you want to pinpoint any particular topic, you can find the specific timing below. Kate provides some of her background and experience as well as what she is currently doing at here. The full conversation is outlined below.

  • Which is easier to tune for performance, humans or machines? 00:00:30

  • To achieve better performance from people, how do you teach people to trade-off the variables time, cost, quality and scope? 00:02:32

  • What do you look for when you hire engineers that will work on highly performant web properties? 00:05:06

  • In this talent-surplus economy, do you find it more difficult to hire engineers? 00:07:10

  • How do you demonstrate DevOps and Performance engineering value to an organization? 00:08:36

  • How does one go about monitoring everything and not slow down your web properties with monitoring everything? 00:12:56

  • Does continuous improvement help deliver performant properties? 00:15:14

If you would like to hear Kate speak on "Leveling up - Taking your operations and engineering role to the next level," she is presenting at the 2012 Velocity Conference in Santa Clara, Calif. on Wednesday 6/27/12 at 1:00 pm. We hope to see you there.


May 17 2012

JavaScript and Dart: Can we do better?

JavaScript keeps advancing by leaps and bounds, but is it powerful enough yet? Is the Web ready to take on all the challenges we throw at it?

I talked with Seth Ladd, a web engineer and Chrome Developer Advocate at Google who's working on Dart, but still, I'm happy to say, interested in JavaScript itself. He's been working with larger projects and larger teams figuring out how to build bigger, faster, and more complex applications than most of us care to dream about.

Seth's constant push - "we can do better" - takes a hard look at where we are today with web programming, acknowledging decades of improvement but looking hard for the next best thing.

Highlights from the full video interview include:

  • Speed - is JavaScript fast enough yet? [Discussed at the 2:12 mark]
  • 60 frames per second - can the browser look that smooth? [Discussed at the 3:21 mark]
  • Dart - Structure, tooling, and reaching both JavaScript and C++ programmers [Discussed at the 6:27 mark]
  • "Dart compiles to modern JavaScript today" [Discussed at the 9:16 mark]
  • "JavaScript is becoming the bytecode of the Web" - many languages compile to JavaScript [Discussed at the 11:16 mark]
  • View Source isn't what it used to be - is Github the answer? [Discussed at the 12:07 mark]

You can view the entire conversation in the following video:

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20


May 16 2012

Velocity Profile: Justin Huff

This is part of the Velocity Profiles series, which highlights the work and knowledge of web ops and performance experts.

Justin Huff
Software Engineer

How did you get into web operations and performance?

Picnik's founders Mike Harrington and Darrin Massena needed someone who knew something about Linux. Darrin and I had known each other for a few years, so my name came up. At the time, I was doing embedded systems work, but ended up moonlighting for Picnik. It wasn't long before I came over full time. I always expected to help them get off the ground and then they'd find a "real sysadmin" to take over. Turns out, I ended up enjoying ops! I was lucky enough to straddle the world between ops and back-end dev. Sound familiar?

What is your most memorable project?

Completing a tight database upgrade at a Starbucks mid-way between Seattle and Portland. "Replicate faster, PLEASE!" Also, in the build-up to Picnik's acquisition by Google, Mike asked me what it would take to handle 10 times our current traffic and to do it in 30 days. We doubled Picnik's hardware, including a complete network overhaul. It went flawlessly and continued to serve Picnik until Google shut it down in April of this year.

What's the toughest problem you've had to solve?

When Flickr launched with Picnik as its photo editor, we started to see really weird behavior causing some Flickr API calls to hang. I spent a good chunk of that day on the phone with John Allspaw and finally identified an issue with how our NAT box was munging TCP timestamps that were interacting badly with Flickr's servers. I learned a couple things: First, both John and I were able to gather highly detailed info (tcpdumps) at key points in our networks (and hosts) — sometimes you just have to go deep; second, it's absolutely imperative that you have good technical contacts with your partners.

What tools and techniques do you rely on most?

Graphs and monitoring are critical. Vim, because I can't figure out Emacs. Automation, because I can't even remember what I had for breakfast.

Who do you follow in the web operations and performance world?

Bryan Berry (@bryanwb) is great. Joe Williams (@williamsjoe) is doing great stuff — and his Twitter profile pic is awesome.

What is your web operations and performance super power?

I think I'm good at building, maintaining, and understanding complete systems. Other engineering disciplines are typically concerned about the details of a single part of a larger system. As web engineers, we have to grok the system, the components, and their interactions ... at 2 AM.


May 10 2012

Commerce Weekly: The competitive push toward mobile payment

Here are a few of this week's stories from the commerce space that caught my eye.

Mobile payments are coming, one way or another

The New York Times (NYT) took a look this week at the push toward mobile payments and the various paths toward that end. The push isn't only coming from a consumer desire for a mobile wallet, but also from the payment companies. The NYT's post reports:

"Merchants are facing heavy pressure to upgrade their payment terminals to accept smart cards. Over the last several months, Visa, Discover and MasterCard have said that merchants that cannot accept these cards will be liable for any losses owing to fraud."

This could be the push needed for mobile payment, at least in the U.S., to get over the technology hump that has thus far been hindering it from catching on. Jennifer Miles, executive vice president at payment terminal provider VeriFone, told the NYT, "Everybody is going to be upgrading ... Before the credit card companies made their announcements, almost no merchants were buying terminals with smart card and NFC capabilities." She says VeriFone no longer installs payment terminals without NFC readers.

NFC technology, however, not only requires upgrades from merchants, but also consumers. The post reviews mobile payment solutions from PayPal and Square, noting the directive for these two companies may be more consumer centric:

"Both PayPal and Square say that asking customers to buy NFC-enabled phones and wait for merchants to install new hardware is folly. Neither company says it has plans to incorporate NFC into its wallet."

This consumer-centric approach might be part of what's behind VeriFone's announcement this week that it would jump into the payment processing fray. Bloomberg reports:

"VeriFone Systems Inc. (PAY), the largest maker of credit-card terminals, will offer an attachment that lets mobile devices accept credit and debit cards, making a deeper push into a market pioneered by Square Inc. and EBay Inc. (EBAY)'s PayPal ... VeriFone's version will allow partners such as banks to customize the service to transmit coupons and loyalty points to consumers, said Greg Cohen, a senior vice president at San Jose, California-based VeriFone."

VeriFone's system will work with Apple and Android mobile devices.

X.commerce harnesses the technologies of eBay, PayPal and Magento to create the first end-to-end multi-channel commerce technology platform. Our vision is to enable merchants of every size, service providers and developers to thrive in a marketplace where in-store, online, mobile and social selling are all mission critical to business success. Learn more at

MasterCard releases PayPass

MasterCard announced its new PayPass Wallet Services this week. The company describes the global service in a press release:

PayPass Wallet Services delivers three distinct components — PayPass Acceptance Network (PayPass Online and PayPass Contactless), PayPass Wallet and PayPass API. These services enable a consistent shopping experience no matter where and how consumers shop, as well as a suite of digital wallet services, and developer tools to make it easier to connect other wallets into the PayPass Online acceptance network.

In other words, it's designed to work with any sort of digital wallet used by its partners. According to the release, American Airlines and Barnes & Noble are in the initial group of merchant partners.

One of the big differences between MasterCard's system and those of its competitors is its open nature. PC World reports:

What sets MasterCard's offering apart from digital wallet systems announced by Visa, Google, PayPal and others is how much the company is opening up its platform to third parties, said Gartner wireless analyst Mark Hung. Banks and other partners will be able to adopt PayPass Wallet Services in two different ways: They can use MasterCard's own service under their own brand or just use the company's API (application programming interface) to build their own platform.

Mobile payment readiness, global edition

How ready is the world for mobile payments? MasterCard has that covered this week, too. In a guest post at Forbes, vice president of MasterCard Worldwide Theodore Iacobuzio wrote about the launch of the MasterCard Mobile Payments Readiness Index (MPRI), a data-driven survey of the mobile payments landscape. Iacobuzio says the index "assesses and ranks 34 global economies in terms of how ready (or not) they are for mobile payments of three types":

  • M-commerce, which is e-commerce conducted from a mobile phone or tablet.
  • Point-of-Sale (POS) mobile payments where a smart phone becomes the authentication device to complete a transaction at checkout.
  • Person-to-Person (P2P) mobile payments that involve the direct transfer of funds from one person to another using a mobile device.

Iacobuzio says that "one of the top-level findings is that unless all constituents — banks, merchants, telcos, device makers, governments — collaborate on developing new solutions and services, the mainstream adoption of mobile payments will be slower, more contentious and more expensive." He discusses the needs for mobile payments around the world, including in developed, developing and emerging countries.

But who's ready? The following image is a screenshot of the index summary. Note that no country has yet hit the "inflection point":

A screenshot of the MasterCard Mobile Payments Readiness Index (MPRI). Click here to access the full site.

Dan Rowinski at ReadWriteWeb has a nice analysis of the index. In part, he says much of the finance world, including MasterCard, may be viewing the mobile payment situation through "rose-colored glasses":

"For instance, why do mobile payments skew heavily toward young males in developed countries? The answer, more or less, is because it is cool. The actual need for mobile payments (NFC or otherwise) is not as clear in the U.S. as it is in other countries, like Kenya and Singapore."

Mobile shopping needs faster carts

Michael Darnaud, CEO of i-Cue Design, proposed a solution this week for one of the major problems with mobile shopping: speed, or lack thereof. In a post at Mobile Commerce Daily, he says the steps to a purchase simply take too long because of the number of data transfers involved:

"Just clicking a button to 'add,' 'delete' or 'change quantity' on the mobile Web requires sending transaction data from the shopper's mobile device to the vendor's server — average three to five seconds — via cell towers, not high-speed cables. These interim steps, long before checking out, are the challenge — it is all about time."

"Time is money" is no joke in mobile commerce. Darnaud notes: "A recent Wall Street Journal article declared that sales at Amazon increase by 1 percent for every 100 milliseconds it shaves off download times." To that end, he suggests an improvement to online cart technology that "reduces the time it takes to 'add,' 'delete' or 'change quantity' by virtually 100 percent because it eliminates the need for a server call for each of those commands." He describes his solution:

"This 'instant-add' cart solution requires nothing but familiar HTML and JavaScript. It is an incremental change that can be inserted into virtually any new or existing cart.

And what that means to a customer arriving at your site on the mobile Web is that he or she can see a product, click 'add to cart' and have no forced page change or reload or waiting time at all as a result."

Darnaud also notes the "elegance" of the solution: "... it forms a perfect bridge between desktop and mobile Web. The reason is simply that it works identically on both, via the browser."
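Darnaud's post does not spell out the implementation, so the following is only a rough sketch of the general idea: keep cart state on the client and contact the server once, at checkout. It is written in Python for brevity, whereas the solution he describes runs as HTML and JavaScript in the browser. The `LocalCart` class, its method names, and the `server_calls` counter are all hypothetical, introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LocalCart:
    """Hypothetical client-side cart: edits are local, only checkout hits the server."""
    items: Dict[str, int] = field(default_factory=dict)
    server_calls: int = 0  # counts simulated round trips to the vendor's server

    def add(self, sku: str, qty: int = 1) -> None:
        # Purely local update: no network round trip.
        self.items[sku] = self.items.get(sku, 0) + qty

    def change_quantity(self, sku: str, qty: int) -> None:
        # Setting a quantity of zero or less removes the item.
        if qty <= 0:
            self.items.pop(sku, None)
        else:
            self.items[sku] = qty

    def delete(self, sku: str) -> None:
        self.items.pop(sku, None)

    def checkout(self) -> Dict[str, int]:
        # The only step that would need to send data to the server.
        self.server_calls += 1
        return dict(self.items)


cart = LocalCart()
cart.add("sku-123")
cart.add("sku-456", 2)
cart.change_quantity("sku-123", 3)
cart.delete("sku-456")
order = cart.checkout()
print(order, cart.server_calls)
```

Four cart edits here cost a single simulated server call. Under the three-to-five-second figure quoted above, a cart that called the server on every edit would have spent roughly 12 to 20 seconds waiting on those same four steps.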

Tip us off

News tips and suggestions are always welcome, so please send them along.


Understanding Mojito

Yahoo's Mojito is a different kind of framework: all JavaScript, but running on both the client and the server. Code can run on the server, or on the client, depending on how the framework is tuned. It shook my web architecture assumptions by moving well beyond the convenience of a single language, taking advantage of that approach to process code where it seems most efficient. Programming this way will make it much easier to bridge the gap between developing code and running it efficiently.

I talked with Yahoo architect fellow and VP Bruno Fernandez-Ruiz (@olympum) about the possibilities Node opened and Mojito exploits.

Highlights from the full video interview include:

  • "The browser loses the chrome." Web applications no longer always look like they've come from the Web. [Discussed at the 02:11 mark]
  • Basic "Hello World" in Mojito. How do you get started? [Discussed at the 05:05 mark]
  • Exposing web services through YQL. Yahoo Query Language lets you work with web services without sweating the details. [Discussed at the 07:56 mark]
  • Manhattan, a closed Platform as a Service. If you want a more complete hosting option for your Mojito applications, take a look. [Discussed at the 10:29 mark]
  • Code should flow among devices. All of these devices speak HTML and JavaScript. Can we help them talk with each other? [Discussed at the 11:50 mark]

You can view the entire conversation in the following video:


May 09 2012

Theo Schlossnagle on DevOps as a career

In this new Velocity Podcast, I had a conversation with Theo Schlossnagle (@postwait), the founder and CEO of OmniTI. This conversation centers mostly on DevOps as a discipline and career. Theo, as always, has some interesting insights into DevOps and how to build a successful career in this industry.

Our conversation lasted 00:13:21. If you want to pinpoint any particular topic, you can find the specific timing below. I will apologize now: Theo's image froze a couple minutes into our conversation, but since it was our second attempt at this, and it is a conversation, I feel the content of his answers is what most of us want to hear, not whether or not he is smiling or gesturing.

  • Are we splitting hairs with our terms of WebOps, DevOps, WebDev, etc? 00:00:42
  • What are the important goals developers should have in mind when building Systems that Operate? 00:01:28
  • How do you define, spec and set best practices for your DevOps organization so that your whole team is working well? 00:02:38
  • What does a typical day look like in the DevOps world? 00:03:39
  • What are the key attributes and skills someone should have to become a skilled DevOps? 00:04:50
  • What is the hardest to master for a young DevOps, security, scalability, reliability or performance? 00:06:22
  • Is DevOps more of a craft, discipline, methodology, way of thinking, what is it? 00:07:35
  • If your DevOps is operating well, do you notice it and how do you measure it if all is well? 00:08:47
  • What do you think the most significant thing a sharp DevOps person can contribute to an organization, and how do they know if they have achieved excellence? 00:10:16

If you would like to hear Theo speak on "It's All About Telemetry," he is presenting at the 2012 Velocity Conference in Santa Clara, Calif. on Tuesday 6/26/12 at 1:00pm. We hope to see you there.


Velocity Profile: Nicole Sullivan

This is part of the Velocity Profiles series, which highlights the work and knowledge of web ops & performance experts.

Nicole Sullivan

How did you get into web operations & performance?

Accidentally. Years back, I got hired into a company in France that was building a website for one of the major cell phone providers over there. They had some serious performance issues — the site was crashing in Internet Explorer (IE) pretty much any time you interacted with the page. It was a hunt to figure out what was going on because, at that time, there really wasn't a lot of published performance information out there. So, I ended up finding out that filters in the CSS file were causing IE to crash. That hunt to identify the problem and then the subsequent hunts to simplify the page so that other errors wouldn't have such a big impact was really fun. That's what got me into it.

What is your most memorable project?

Optimizing Facebook's CSS back in 2009 was a memorable project. They had 1.9 MB of CSS, which is just huge. That project is when I realized that most performance issues and most code issues are actually human issues. But you have to solve the human issues or the bad code will just keep popping up — sort of like performance Whac-A-Mole.

Another project that was cool was They had a lot of CSS, but more than the quantity, it was really tangled. They would have to rewrite things over and over again, just because everything was so context-dependent. That one was fun because it was neat to see the team end up being able to build things much faster once their front-end architecture issues were removed.

What's the toughest problem you've had to solve?

One of the toughest problems I have to solve, and I have to solve it all the time, is how to make performance and operations improvements work in a legacy world. We don't work in a world where we can just wipe the slate clean and do it right from the start. We work in a world where the website has to stay up and we have to make these changes while everything is running. The balance between keeping the legacy running and managing to do improvements, until the legacy can be removed, is probably the hardest problem. And it happens on almost every project.

What tools and techniques do you rely on most?

The work from the Chrome team has been making me really happy lately. They're pushing the boundaries in front-end code, JavaScript, CSS, and especially dev tools. I was on Firefox Dev Tools for a long time, but there was too much incompatibility between different versions of Firefox and the tools that I absolutely needed to do my job every day. So I swapped, reluctantly, over to Chrome and have actually found that the Chrome Developer Tools have made some substantial improvements in terms of usability and the kinds of information that you can get out of the tools. It's pretty cool stuff.

Who do you follow in the web operations & performance world?

Chris Coyier is constantly experimenting, throwing stuff out there, trying new techniques, trying out the browser stuff, and finding the rough edges where things don't work very well. Tab Atkins and Alex Russell are both involved in Chrome and standards at Google. They're amazing people to follow. Another person is Lea Verou. She really pushes the edge in tooling around CSS and taking the specs and bending them to do things they maybe weren't intended to do. I also follow people who are doing LESS and SASS because the preprocessing languages are an interesting development and have a whole different set of performance constraints.

What is your web operations & performance super power?

I think I do pretty well with CSS stuff. I've been doing it for more than a decade now. Friends will send me CSS issues that they're struggling with and I can jump in and pretty quickly identify why it isn't working. Somehow, I've internalized all of the different bits of the different browsers and just kind of know what to do or what not to do.



Giving the Velocity website a performance makeover

Zebulon Young and I, web producers at O'Reilly Media, recently spent time focusing on the performance of the Velocity website. We were surprised by the results we achieved with a relatively small amount of effort. In two days we dropped Velocity's page weight by 49% and reduced the total average U.S. load time by 3.5 seconds[1]. This is how we did it.

Velocity is about speed, right?

To set the stage, here's the average load time for Velocity's home page as measured[2] by Keynote before our work:

Chart: 7 Second Load Times

As the averages hovered above seven seconds, these load times definitely needed work. But where to start?

The big picture

If you take a look at the raw numbers for Velocity, you'll see that, while it's a relatively simple page, there's something much bigger behind the scenes. As measured[3] above, the full page weight was 507 kB and there were 87 objects. This meant that the first time someone visited Velocity, their browser had to request and display a total of 87 pieces of HTML, images, CSS, and more, the whole of which totaled nearly half a megabyte:

Chart: Total Bytes 507k, Total Objects 87

Here's a breakdown of the content types by size:

Content Pie Chart

To top it off, a lot of these objects were still being served directly from our Santa Rosa, Calif. data center, instead of our Content Delivery Network (CDN). The problem with expecting every visitor to connect to our servers in California is simple: Not every visitor is near Santa Rosa. Velocity's visitors are all over the globe, so proper use of a CDN means that remote visitors will be served objects much closer to the connection they are currently using. Proximity improves delivery.
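To make the idea of serving static objects from a CDN concrete, here is a minimal sketch of rewriting asset URLs so they point at a CDN host. The host names `origin.example.com` and `cdn.example.com`, the list of extensions, and the function name `toCdnUrl` are assumptions for illustration only, not the actual Velocity setup.

```typescript
// Hypothetical host names; substitute your own origin and CDN.
const ORIGIN_HOST = "origin.example.com";
const CDN_HOST = "cdn.example.com";

// File extensions treated as static, cacheable objects (an assumption).
const STATIC_EXTENSIONS = [".png", ".jpg", ".gif", ".css", ".js"];

// Returns the URL with its host swapped for the CDN host when it points at a
// static object on the origin; any other URL is returned unchanged.
function toCdnUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const path = url.pathname.toLowerCase();
  const isStatic = STATIC_EXTENSIONS.some((ext) => path.endsWith(ext));
  if (url.hostname !== ORIGIN_HOST || !isStatic) {
    return rawUrl;
  }
  url.hostname = CDN_HOST;
  return url.toString();
}

console.log(toCdnUrl("http://origin.example.com/images/ubergizmo.jpg"));
// → http://cdn.example.com/images/ubergizmo.jpg
```

Pages and other dynamic responses keep their original host in this sketch; only objects that can safely be cached close to the visitor are moved.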

Getting started

At this point, we had three simple goals to slim down Velocity:

  1. Move all static objects to the CDN
  2. Cut down total page weight (kilobytes)
  3. Minimize the number of objects

1) CDN relocation and image compression

Our first task was compressing images and relocating static objects to the CDN. Using and the Google Page Speed lossless compression tools, we got to work crushing those image file sizes down.

To get a visual of the gains that we made, here are before and after waterfall charts from tests that we performed using Look at the download times for ubergizmo.jpg:

Before CDN Waterfall

You can see that the total download time for that one image dropped from 2.5 seconds to 0.3 seconds. This is far from a scientific A/B comparison, so you won't always see results this dramatic from CDN usage and compression, but we're definitely on the right track.

2) Lazy loading images

When you're trimming fat from your pages to improve load time, an obvious step is to only load what you need, and only load it when you need it. The Velocity website features a column of sponsor logos down the right-hand side of most pages. At the time of this writing, 48 images appear in that column, weighing in at 233 kB. However, only a fraction of those logos appear in even a large browser window without scrolling down.

Sidebar Sponsor Image Illustration

We addressed the impact these images had on load time in two ways. First, we deferred the load of these images until after the rest of the page had rendered — allowing the core page content to take priority. Second, when we did load these images, we only loaded those that would be visible in the current viewport. Additional logos are then loaded as they are scrolled into view.

These actions were accomplished by replacing the <img> tags in the HTML rendered by the server with text and meta-data that is then acted upon by JavaScript after the page loads. The code, which has room for additional enhancements, can be downloaded from GitHub.

The result of this enhancement was the removal of 48 requests and a full 233 kB from the initial page load, just for the sponsor images[4]. Even when the page has been fully rendered in the most common browser window size of 1366 x 768 pixels, this means cutting up to 44 objects and 217 kB from the page weight. Of course, the final page weight varies by how much of the page a visitor views, but the bottom line is that these resources don't delay the rendering of the primary page content. This comes at the cost of only a slight delay before the targeted images are displayed when the page initially loads and when it is scrolled. This delay might not be acceptable in all cases, but it's a valuable tool to have on your belt.
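The viewport check at the heart of this technique can be sketched as follows. This is not the code published on GitHub; it is a simplified illustration, and the `Placeholder` and `Viewport` shapes, the function names, and the logo dimensions are all assumptions.

```typescript
// A placeholder left in the server-rendered HTML in place of an <img> tag.
// The field names here are illustrative assumptions.
interface Placeholder {
  src: string;      // image URL carried as metadata
  top: number;      // distance from the top of the document, in pixels
  height: number;   // rendered height, in pixels
  loaded: boolean;  // true once the real image has been requested
}

interface Viewport {
  scrollTop: number; // pixels scrolled from the top of the document
  height: number;    // visible height of the window, in pixels
}

// Returns the placeholders that overlap the viewport and are not yet loaded.
function placeholdersToLoad(items: Placeholder[], view: Viewport): Placeholder[] {
  const viewBottom = view.scrollTop + view.height;
  return items.filter(
    (p) => !p.loaded && p.top < viewBottom && p.top + p.height > view.scrollTop
  );
}

// Marks the visible placeholders as loaded and returns their image URLs.
// In a browser, this is where each placeholder would be swapped for an <img>.
function loadVisible(items: Placeholder[], view: Viewport): string[] {
  const due = placeholdersToLoad(items, view);
  due.forEach((p) => {
    p.loaded = true;
  });
  return due.map((p) => p.src);
}

// Example: 48 logos stacked 100 px apart, viewed in a 768 px tall window.
const logos: Placeholder[] = [];
for (let i = 0; i < 48; i++) {
  logos.push({ src: "logo-" + i + ".png", top: i * 100, height: 80, loaded: false });
}

console.log(loadVisible(logos, { scrollTop: 0, height: 768 }).length);   // → 8
console.log(loadVisible(logos, { scrollTop: 400, height: 768 }).length); // → 4
```

In a real page, the same check would run once after the page has loaded and again from a scroll handler, with the viewport read from values such as `window.pageYOffset` and `window.innerHeight`. Throttling the scroll handler is a likely refinement.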

3) Using Sprites

The concept of using sprites for images has always been closely tied to Steve Souders' first rule for faster-loading websites: make fewer HTTP requests. The idea is simple: combine your background images into a single image, then use CSS to display only the important parts.

Historically there's been some reluctance to embrace the use of sprites because it seems as though there's a lot of work for marginal benefits. In the case of Velocity, I found that creation of the sprites only took minutes with the use of Steve Souders' simple SpriteMe tool. The results were surprising:

Sprite Consolidation Illustration

Just by combining some images and (once again) compressing the results, we cut page weight by 47 kB and reduced the total number of objects by 11.
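For readers who have not used sprites, the CSS side of the technique can be sketched like this. The class names, sheet file name, and coordinates below are hypothetical, and a tool like SpriteMe generates the equivalent rules for you.

```typescript
// One image packed into the sprite sheet; x and y are its top-left corner
// within the combined image. All names and sizes are illustrative assumptions.
interface SpriteEntry {
  className: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Builds one CSS rule per entry. The background is shifted by negative offsets
// so that only the entry's region of the sheet shows through the element box.
function spriteCss(sheetUrl: string, entries: SpriteEntry[]): string {
  return entries
    .map((e) => {
      const posX = e.x === 0 ? "0" : "-" + e.x + "px";
      const posY = e.y === 0 ? "0" : "-" + e.y + "px";
      return (
        "." + e.className + " { " +
        "background: url(" + sheetUrl + ") " + posX + " " + posY + " no-repeat; " +
        "width: " + e.width + "px; height: " + e.height + "px; }"
      );
    })
    .join("\n");
}

console.log(
  spriteCss("sprite.png", [
    { className: "icon-rss", x: 0, y: 0, width: 16, height: 16 },
    { className: "icon-mail", x: 0, y: 16, width: 16, height: 16 },
  ])
);
// → .icon-rss { background: url(sprite.png) 0 0 no-repeat; width: 16px; height: 16px; }
// → .icon-mail { background: url(sprite.png) 0 -16px no-repeat; width: 16px; height: 16px; }
```

Both icons now come from a single HTTP request for `sprite.png`, which is where the drop in object count comes from.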

4) Reassessing third-party widgets (Flickr and Twitter)

Third-party widget optimization can be one of the most difficult performance challenges to face. The code often isn't your own, isn't hosted on your servers, and, because of this, there are inherent inflexibilities. In the case of Velocity, we didn't have many widgets to review and optimize. After we spent some time surveying the site, we found two widgets that needed some attention.

The Flickr widget

The Flickr widget on Velocity was using JavaScript to pull three 75x75 pixel images directly from Flickr so they could be displayed on the "2011 PHOTOS" section seen here:

Flickr Widget Screenshot

There were a couple of problems with this. One, the randomization of images isn't essential to the user experience. Two, even though the images from Flickr are only 75x75, they were averaging about 25 kB each, which is huge for a tiny JPEG. With this in mind, we did away with the JavaScript altogether and simply hosted compressed versions of the images on our CDN.

With that simple change, we saved 56 kB (going from 76 kB to 20 kB) in file size alone.

The "Tweet" widget

As luck would have it, there had already been talk of removing the Tweet widget from the Velocity site before we began our performance efforts. After some investigation into how often the widget was used, then some discussion of its usefulness, we decided the Twitter widget was no longer essential. We removed the Twitter widget and the JavaScript that was backing it.

Tweet Widget Screenshot

The results

So without further ado, let's look at the results of our two-day WPO deep dive. As you can see by our "after" Keynote readings, the total downloaded size dropped to 258.6 kB and the object count slimmed down to 34:

After WPO Content Breakdown

After WPO Content Pie Chart

Our starting point of 507 kB with 87 objects was reduced to 258.6 kB with 34 objects: a 49% cut in page weight and, by those counts, about 61% fewer objects on the page.
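If you want to run the same kind of comparison on your own before and after readings, the arithmetic is a one-line percentage reduction. The sketch below applies it to the Keynote figures quoted in this post; the function and variable names are just for illustration.

```typescript
// Percentage reduction from a "before" value to an "after" value.
function percentReduction(before: number, after: number): number {
  return ((before - after) / before) * 100;
}

// Keynote readings quoted in this post.
const weightBeforeKb = 507;
const weightAfterKb = 258.6;
const objectsBefore = 87;
const objectsAfter = 34;

// Page weight reduction.
console.log(percentReduction(weightBeforeKb, weightAfterKb).toFixed(1)); // → 49.0

// Object count reduction, computed the same way.
console.log(percentReduction(objectsBefore, objectsAfter).toFixed(1));
```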

And for the most impressive illustration of the performance gains that were made, here's the long-term graph of Velocity's load times, in which they start around 7 seconds and settle around 2.5 seconds:

Chart Showing Drop to 2.5 Second Average Load Times


The biggest lesson we learned throughout this optimization process was that there isn't one single change that makes your website fast. All of the small performance changes we made added up, and suddenly we were taking seconds off our page's load times. With a little time and consideration, you may find similar performance enhancements in your own site.

And one last thing: Zeb and I will see you at Velocity in June.


[1], [2], [3] Measurements and comparisons were taken with Keynote (Application Perspective - ApP) Emulated Browser monitoring tools.

[4] We also applied this treatment to the sponsor banner in the page footer, for additional savings.


May 08 2012

jQuery took on a common problem and then grew through support

As part of our Velocity Profiles series, we're highlighting interesting conversations we've had with web ops and performance pros.

In the following interview from Velocity 2011, jQuery creator John Resig (@jeresig) discusses the early days of jQuery, the obstacles of cross-platform mobile development, and JavaScript's golden age.

Highlights from the interview include:

  • The initial goals for jQuery and why it caught on — Resig's web app projects kept bumping up against cross-browser issues, so he took a step back and built a JavaScript library that addressed his frustrations. He also notes that good documentation and feedback mechanisms are big reasons why jQuery caught on so quickly. "Put yourself in the shoes of someone who's trying to use your thing," he says. [Discussed 22 seconds in.]

  • The challenges of developing jQuery Mobile — "It's been a rocky adventure," Resig says. The core issue is the same as on the desktop side — cross-browser compatibility — but Resig says there's an extra twist: mobile has "even more browsers, and they're weirder." [Discussed at 2:28.]
  • Is JavaScript in a golden age? — It's in a "prolonged golden age," Resig says. The key shift is that many developers now acknowledge JavaScript's importance. "You can't build a web application without understanding JavaScript. JavaScript is a fundamental aspect of any sort of web development you do today." [Discussed at 4:05.]

The full interview is available in the following video.



May 03 2012

Jason Grigsby and Lyza Danger Gardner on mobile web design

This Velocity podcast with Cloud Four founding members Jason Grigsby (@grigs) and Lyza Danger Gardner (@lyzadanger) centers on mobile web performance. It's a fitting topic since these two wrote "Head First Mobile Web." Jason and Lyza have interesting insights into building high-performance websites that are ready for mobile.

Our conversation lasted nearly 20 minutes, so if you want to pinpoint any particular topic use the specific timing links noted below. The full interview is embedded at the end of this post.

  • The difference between a website and a mobile website 00:00:50
  • What tools are available for determining your performance benchmarks for a mobile website? 00:03:18
  • What considerations need to be taken into account to truly build a site that performs like greased lightning? 00:05:02
  • Has Google improved its Android browser to catch up with the Chrome browser? 00:07:04
  • What are some of the most common mistakes or patterns that developers make when building a mobile website? 00:08:08
  • What do the two terms "mobile-first responsive web design" and "progressive enhancement" mean? 00:12:36
  • How do you make progressive enhancements when one Android phone may have five different browsers? Do you have five forks of a code base? 00:13:30
  • How do developers pick up best practices for mobile web development? 00:15:38
  • The mobile platform keeps growing and bringing lots of change. 00:17:13

If you would like to hear Jason Grigsby speak on "Performance Implications of Responsive Web Design," he is presenting at the 2012 Velocity Conference in Santa Clara, Calif. on Tuesday, June 26 at 1 pm. We hope to see you there.



May 02 2012

Velocity Profile: Sergey Chernyshev

This is part of the Velocity Profiles series, which highlights the work and knowledge of web ops and performance experts.

Sergey Chernyshev
Director of web systems and applications, truTV
Organizer, New York Web Performance Meetup
@sergeyche, @perfplanet

How did you get into web operations and performance?

I've been doing web development and operations since 1996. Before there were different people running websites, one person was responsible for everything. So in addition to adding features, I was making sure websites were running and running fast. In 2007, I heard Steve Souders and Tenni Theurer present their first findings at the Web 2.0 Expo, and after that, I was converted to the church of web performance optimization (WPO).

What is your most memorable project?

The most memorable are the two projects I'm most active on: Show Slow and running the New York Web Performance Meetup.

What's the toughest problem you've had to solve?

The toughest is to make people believe that WPO is important and change perspectives on how to approach performance. It's far from solved, but I hope I helped by kick-starting a local community movement — we now have 16 active groups across the globe with more than 5,000 members.

What tools and techniques do you rely on most?

Show Slow and WebPageTest.

Who do you follow in the web operations and performance world?

I run the @perfplanet account on Twitter where I follow a bunch of people and re-tweet WPO-related tweets. You can see my list here.

In addition, Brad Fitzpatrick of LiveJournal fame isn't doing much of this work these days, but he's behind many great technologies, including Memcached, Gearman and more.


