
December 18 2012

Interoperating the industrial Internet

One of the most interesting points made in GE’s “Unleashing the Industrial Internet” event was GE CEO Jeff Immelt’s statement that only 10% of the value of Internet-enabled products is in the connectivity layer; the remaining 90% is in the applications that are built on top of that layer. These applications enable decision support, the optimization of large scale systems (systems “above the level of a single device,” to use Tim O’Reilly’s phrase), and empower consumers.

Given the jet engine that was sitting on stage, it’s worth seeing how far these ideas can be pushed. Optimizing a jet engine is no small deal; Immelt said that the engine gained an extra 5-10% efficiency through software, and that adds up to real money. The next stage is optimizing the entire aircraft; that’s certainly something GE and its business partners are looking into. But we can push even harder: optimize the entire airport (don’t you hate it when you’re stuck on a jet waiting for one of those trucks to push you back from the gate?). Optimize the entire air traffic system across the worldwide network of airports. This is where we’ll find the real gains in productivity and efficiency.

So it’s worth asking about the preconditions for those kinds of gains. It’s not computational power; when you come right down to it, there aren’t that many airports, aren’t that many flights in the air at one time. There are something like 10,000 flights in the air at one time, worldwide; and in these days of big data, and big distributed systems, that’s not a terribly large number. It’s not our ability to write software; there would certainly be some tough problems to solve, but certainly nothing as difficult as, say, searching the entire web and returning results in under a second.

But there is one important prerequisite for software that runs above the level of a single machine, and that’s interoperability. That’s something the inventors of the Internet understood early on; nothing became a standard unless at least two independent, interoperable implementations existed. The Interop conference didn’t start as a trade show, it started as a technical exercise where everyone brought their experimental hardware and software and worked on it until it played well together.

If we’re going to build useful applications on top of the industrial Internet, we must ensure, from the start, that the components we’re talking about interoperate. It’s not just a matter of putting HTTP everywhere. Devices need common, interoperable data representations. And that problem can’t be solved just by invoking XML: several years of sad experience have proven that it’s certainly possible to be proprietary under the aegis of “open” XML standards.
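To make "common, interoperable data representations" a little more concrete, here is a minimal sketch in TypeScript. Everything in it is hypothetical: the `EngineStatus` shape, the two vendor formats, and all field names are invented for illustration and do not describe any real manufacturer's data.

```typescript
// Hypothetical shared representation that every vendor would agree to emit.
interface EngineStatus {
  engineId: string;
  exhaustTempCelsius: number;
  fuelFlowKgPerHour: number;
}

// Two made-up proprietary formats, each with its own field names and units.
interface VendorAReading {
  serial: string;
  egtFahrenheit: number;
  fuelLbPerHour: number;
}

interface VendorBReading {
  id: string;
  egtKelvin: number;
  fuelKgPerSecond: number;
}

const KG_PER_LB = 0.45359237;

// Adapters translate each proprietary reading into the shared shape.
function fromVendorA(r: VendorAReading): EngineStatus {
  return {
    engineId: r.serial,
    exhaustTempCelsius: ((r.egtFahrenheit - 32) * 5) / 9,
    fuelFlowKgPerHour: r.fuelLbPerHour * KG_PER_LB,
  };
}

function fromVendorB(r: VendorBReading): EngineStatus {
  return {
    engineId: r.id,
    exhaustTempCelsius: r.egtKelvin - 273.15,
    fuelFlowKgPerHour: r.fuelKgPerSecond * 3600,
  };
}

// A vendor-neutral consumer only needs to understand EngineStatus.
function hottestEngine(fleet: EngineStatus[]): EngineStatus | undefined {
  let hottest: EngineStatus | undefined;
  for (const e of fleet) {
    if (hottest === undefined || e.exhaustTempCelsius > hottest.exhaustTempCelsius) {
      hottest = e;
    }
  }
  return hottest;
}
```

The sketch shows only the mechanics. Writing adapters is the easy part; the harder part is getting vendors to agree on the shared shape in the first place.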

It’s a hard problem, in part because it’s not simply technical. It’s also a problem of business culture, and the desire to extract as much monetary value from your particular system as possible. We see the consumer Internet devolving into a set of walled gardens, with interoperable protocols but license agreements that prevent you from moving data from one garden into another. Can the industrial Internet do better? It takes a leap of faith to imagine manufacturers of industrial equipment practicing interoperability, at least in part because so many manufacturers have already developed their own protocols and data representations in isolation. But that’s what our situation demands. Should a GE jet engine interoperate with a jet engine from Pratt and Whitney? What would that mean, what efficiencies in maintenance and operations would that entail? I’m sure that any airline would love a single dashboard that would show the status of all its equipment, regardless of vendor. Should a Boeing aircraft interoperate with Airbus and Bombardier in a system to exchange in-flight data about weather and other conditions? What if their flight computers were in constant communication with each other? What would that enable? Leaving aviation briefly: self-driving cars have the potential to be much safer than human-driven cars; but they become astronomically safer if your Toyota can exchange data directly with the BMW coming in the opposite direction. (“Oh, you intend to turn left here? Your turn signal is out, by the way.”)

Extracting as much value as possible from a walled garden is false optimization. It may lead you to a local maximum in profitability, but it leaves the biggest gains, the 90% that Immelt talked about in his keynote, behind. Tim O’Reilly has talked about the “clothesline paradox”: if you dry your clothes on a clothesline, the money you save doesn’t disappear from the economy, even though it disappears from the electric company’s bottom line. The economics of walled gardens is the clothesline paradox’s evil twin. Building a walled garden may increase local profitability, but prevents larger gains, Immelt’s 90% gains in productivity, from existing. They never reach the economy.

Can the industrial Internet succeed in breaking down walled gardens, whether they arise from business culture, legacy technology, or some other source? That’s a hard problem. But it’s the problem the industrial Internet must solve if it is to succeed.

This is a post in our industrial Internet series, an ongoing exploration of big machines and big data. The series is produced as part of a collaboration between O’Reilly and GE.

October 16 2012

Industrial Internet links

Here’s a broad look at a few recent items of interest related to the industrial Internet — the world of smart, connected, big machines.

Smarter Robots, With No Wage Demands (Bloomberg Businessweek) — By building more intelligence into robots, Rethink Robotics figures it can get them into jobs where work has historically been too irregular or too small-scale for automation. That could mean more manufacturing stays in American factories, though perhaps with fewer workers.

The Great Railway Caper (O’Reilly Strata EU) — Today’s railroads rely heavily on the industrial Internet to optimize locomotive operations and maintain their very valuable physical plant. Some of them were pioneers in big networked machines. Part of Sprint originated as the Southern Pacific Railroad Network of Intelligent Telecommunications, which used the SP’s rights-of-way to transmit microwave and fiber optic signals. But in the 1950s, computing in railways was primitive (as it was just about everywhere else, too). John Graham-Cumming relayed this engaging story of network optimization in 1955 at our Strata Conference in London two weeks ago.

The Quiet Comfort of the Internet of Things (O’Reilly Strata EU) — Alexandra Deschamps-Sonsino presents a quirky counterpoint to the Internet of big things; what you might call the Internet of very small things. She leads Designswarm and founded Good Night Lamp, which produces a family of Internet-connected lamps that let friends and family communicate domestic milestones, like bedtime. Her Strata keynote explains a bit of her work and approach.

Solar panel control systems vulnerable to hacks, feds warn (Ars Technica) — A good reminder that the industrial Internet can be vulnerable to the same sorts of attacks as the rest of the Internet if it’s not built out properly — in this case, run-of-the-mill SQL injection.

The industrial Internet series is produced as part of a collaboration between O’Reilly and GE.


October 09 2012

Six themes from Velocity Europe

By Steve Souders and John Allspaw

More than 700 performance and operations engineers were in London last week for Velocity Europe 2012. Below, Velocity co-chairs Steve Souders and John Allspaw note high-level themes from across the various tracks (especially the hallway track) that are emerging for the WPO and DevOps communities.

Velocity Europe 2012 in London

Performance themes from Steve Souders

I was in awe of the speaker and exhibitor lineup going into Velocity Europe. It was filled with knowledgeable gurus and industry leaders. As Velocity Europe unfolded a few themes kept recurring, and I wanted to share those with you.

Performance matters more: The places and ways that web performance matters keep growing. The talks at Velocity covered desktop, mobile (native, web, and hybrid), tablet, TV, DSL, cable, FiOS, 3G, 4G, LTE, and WiMAX across social, financial, ecommerce, media, games, sports, video, search, analytics, advertising, and enterprise. Although all of the speakers were technical, they talked about how the focus on performance extends to other departments in their companies as well as the impact performance has on their users. Web performance has permeated all aspects of the web and has become a primary focus for web companies.

Organizational challenges are the hardest: Lonely Planet and SoundCloud talked about how the challenges in shifting their organizational culture to focus on performance were more difficult than the technical work required to actually improve performance. During the hallway track, a few other speakers and I were asked about ways to initiate this culture shift. There’s growing interest in figuring out how to change a company’s culture to value and invest in performance. This reminded me of our theme from Velocity 2009, the impact of performance and operations on the bottom line, where we brought in case studies that described the benefits of web performance using the vocabulary of the business. In 2013 I predict we’ll see a heavier emphasis on case studies and best practices for making performance a priority for the organization using a slightly different vocabulary, with terms like “culture,” “buy-in” and “DNA.”

The community is huge: As of today there are 42 web performance meetup groups totaling nearly 15,000 members worldwide. That’s 15,000 members in just over three years! In addition to meetup groups, Aaron Kulick and Stephen Thair organized the inaugural WebPerfDays events in Santa Clara, Calif. and London (respectively). WebPerfDays, modelled after DevOpsDays, is an unconference for the web performance community organized by the web performance community. Although these two events coincided with Velocity, the intent is that anyone in the world can use the resources (templates, website, Twitter handle, etc.) to organize their own WebPerfDays. A growing web performance community means more projects, events, analyses, etc. reaching more people. I encourage you to attend your local web performance meetup group. If there isn’t one, then organize it. And consider organizing your own WebPerfDays as a one-day mini-Velocity in your own backyard.

Operations themes from John Allspaw

As if it were an extension of what we saw at Velocity U.S., there were a number of talks that underscored the importance of the human factor in web operations. I gave a tutorial called “Escalating Scenarios: A Deep Dive Into Outage Pitfalls” that mostly centered around the situations when ops teams find themselves responding to complex failure scenarios. Stephen Nelson-Smith gave a whirlwind tour of patterns and anti-patterns on workflows and getting things done in an engineering and operations context.

Gene Kim, Damon Edwards, John Willis, and Patrick Debois looked at the fundamentals surrounding development and operations cooperation and collaboration, in “DevOps Patterns Distilled.” Mike Rembetsy and Patrick McDonnell followed up with the implementation of those fundamentals at Etsy over a four-year period.

Theo Schlossnagle, ever the “dig deep” engineer, spoke on monitoring and observability. He gave some pretty surgical techniques for peering into production infrastructure in order to get an idea of what’s going on under the hood, with DTrace and tcpdump.

A number of talks covered handling large-scale growth:

These are just a few of the highlights we saw at Velocity Europe in London. As usual, half the fun was the hallway track: engineers trading stories, details, and approaches over food and drink. A fun and educational time was had by all.

May 09 2012

Giving the Velocity website a performance makeover

Zebulon Young and I, web producers at O'Reilly Media, recently spent time focusing on the performance of the Velocity website. We were surprised by the results we achieved with a relatively small amount of effort. In two days we dropped Velocity's page weight by 49% and reduced the total average U.S. load time by 3.5 seconds [1]. This is how we did it.

Velocity is about speed, right?

To set the stage, here's the average load time for Velocity's home page as measured [2] by Keynote before our work:

Chart: 7 Second Load Times

As the averages hovered above seven seconds, these load times definitely needed work. But where to start?

The big picture

If you take a look at the raw numbers for Velocity, you'll see that, while it's a relatively simple page, there's something much bigger behind the scenes. As measured [3] above, the full page weight was 507 kB and there were 87 objects. This meant that the first time someone visited Velocity, their browser had to request and display a total of 87 pieces of HTML, images, CSS, and more, the whole of which totaled nearly half a megabyte:

Chart: Total Bytes 507k, Total Objects 87

Here's a breakdown of the content types by size:

Content Pie Chart

To top it off, a lot of these objects were still being served directly from our Santa Rosa, Calif. data center, instead of our Content Delivery Network (CDN). The problem with expecting every visitor to connect to our servers in California is simple: Not every visitor is near Santa Rosa. Velocity's visitors are all over the globe, so proper use of a CDN means that remote visitors will be served objects much closer to the connection they are currently using. Proximity improves delivery.

Getting started

At this point, we had three simple goals to slim down Velocity:

  1. Move all static objects to the CDN
  2. Cut down total page weight (kilobytes)
  3. Minimize the number of objects

1) CDN relocation and image compression

Our first task was compressing images and relocating static objects to the CDN. Using and the Google Page Speed lossless compression tools, we got to work crushing those image file sizes down.

To get a visual of the gains that we made, here are before and after waterfall charts from tests that we performed using Look at the download times for ubergizmo.jpg:

Before CDN Waterfall

You can see that the total download time for that one image dropped from 2.5 seconds to 0.3 seconds. This is far from a scientific A/B comparison, so you won't always see results this dramatic from CDN usage and compression, but we're definitely on the right track.
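One way to picture the CDN relocation is as a URL rewrite applied to static objects only. The TypeScript sketch below is an assumption about how such a rewrite could look, not a description of how the Velocity site was actually changed; the hostname `cdn.example.com`, the extension list, and the function names are all hypothetical.

```typescript
// Hypothetical CDN origin and extension list; substitute your own.
const CDN_ORIGIN = "https://cdn.example.com";
const STATIC_EXTENSIONS = [".png", ".jpg", ".gif", ".css", ".js"];

// True when the path (ignoring any query string) ends in a static extension.
function isStaticAsset(path: string): boolean {
  const queryStart = path.indexOf("?");
  const bare = (queryStart === -1 ? path : path.slice(0, queryStart)).toLowerCase();
  return STATIC_EXTENSIONS.some((ext) => bare.slice(-ext.length) === ext);
}

// Rewrite site-relative static paths ("/images/a.jpg") to the CDN.
// Dynamic pages and absolute or protocol-relative URLs are returned untouched.
function toCdnUrl(path: string, cdnOrigin: string = CDN_ORIGIN): string {
  const siteRelative = path.charAt(0) === "/" && path.charAt(1) !== "/";
  return siteRelative && isStaticAsset(path) ? cdnOrigin + path : path;
}
```

With these assumptions, `toCdnUrl("/images/ubergizmo.jpg")` would point at the CDN, while a page URL with no static extension would keep hitting the origin server.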

2) Lazy loading images

When you're trimming fat from your pages to improve load time, an obvious step is to only load what you need, and only load it when you need it. The Velocity website features a column of sponsor logos down the right-hand side of most pages. At the time of this writing, 48 images appear in that column, weighing in at 233 kB. However, only a fraction of those logos appear in even a large browser window without scrolling down.

Sidebar Sponsor Image Illustration

We addressed the impact these images had on load time in two ways. First, we deferred the load of these images until after the rest of the page had rendered — allowing the core page content to take priority. Second, when we did load these images, we only loaded those that would be visible in the current viewport. Additional logos are then loaded as they are scrolled into view.

These actions were accomplished by replacing the <img> tags in the HTML rendered by the server with text and meta-data that is then acted upon by JavaScript after the page loads. The code, which has room for additional enhancements, can be downloaded from GitHub.
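The viewport check at the heart of this approach can be sketched in a few lines of TypeScript. This is not the code published on GitHub, only an illustration of the idea; the `ImagePlaceholder` and `Viewport` shapes and both function names are hypothetical, and the DOM work of swapping a placeholder for a real `<img>` is deliberately left out.

```typescript
// Metadata the server renders in place of each <img> tag (hypothetical shape).
interface ImagePlaceholder {
  src: string;     // URL of the real image
  top: number;     // distance from the top of the document, in pixels
  height: number;  // rendered height, in pixels
  loaded: boolean; // set once the real image has been swapped in
}

interface Viewport {
  scrollTop: number; // pixels scrolled from the top of the document
  height: number;    // visible height, in pixels
}

// Return the placeholders that overlap the viewport and are not yet loaded.
function placeholdersToLoad(all: ImagePlaceholder[], view: Viewport): ImagePlaceholder[] {
  const viewBottom = view.scrollTop + view.height;
  return all.filter(
    (p) => !p.loaded && p.top < viewBottom && p.top + p.height > view.scrollTop,
  );
}

// Mark a batch as loaded and return the image URLs to fetch. In a browser this
// is where each placeholder would be replaced with a real <img> element.
function markLoaded(batch: ImagePlaceholder[]): string[] {
  return batch.map((p) => {
    p.loaded = true;
    return p.src;
  });
}
```

In a browser, one would typically call `placeholdersToLoad` once after the page has rendered and again from a scroll handler, building the `Viewport` from `window.pageYOffset` and `window.innerHeight`. Keeping the selection logic free of DOM calls also makes it easy to test.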

The result of this enhancement was the removal of 48 requests and a full 233 kB from the initial page load, just for the sponsor images [4]. Even when the page has been fully rendered in the most common browser window size of 1366 x 768 pixels, this means cutting up to 44 objects and 217 kB from the page weight. Of course, the final page weight varies by how much of the page a visitor views, but the bottom line is that these resources don't delay the rendering of the primary page content. This comes at the cost of only a slight delay before the targeted images are displayed when the page initially loads and when it is scrolled. This delay might not be acceptable in all cases, but it's a valuable tool to have on your belt.

3) Using Sprites

The concept of using sprites for images has always been closely tied to Steve Souders' first rule for faster-loading websites: make fewer HTTP requests. The idea is simple: combine your background images into a single image, then use CSS to display only the important parts.

Historically there's been some reluctance to embrace the use of sprites because it seems as though there's a lot of work for marginal benefits. In the case of Velocity, I found that creation of the sprites only took minutes with the use of Steve Souders' simple SpriteMe tool. The results were surprising:

Sprite Consolidation Illustration

Just by combining some images and (once again) compressing the results, we cut page weight by 47 kB and reduced the total number of objects by 11.
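For readers who have not used sprites, the CSS side comes down to one shared background image plus a different offset per icon. The TypeScript sketch below generates such rules for images stacked vertically in a single file. It illustrates the technique only and is not SpriteMe's output; the `.sprite-` class prefix, the `SpriteImage` shape, and the function name are assumptions.

```typescript
interface SpriteImage {
  name: string;   // used to build the CSS class name
  width: number;  // pixels
  height: number; // pixels
}

// Assume the images are stacked top to bottom in one sprite file, in the order
// given. Each rule shifts the background up so only that image shows through.
function spriteCss(spriteUrl: string, images: SpriteImage[]): string {
  const rules: string[] = [];
  let offsetY = 0;
  for (const img of images) {
    const y = offsetY === 0 ? "0" : "-" + offsetY + "px";
    rules.push(
      ".sprite-" + img.name + " { " +
        "background: url(" + spriteUrl + ") 0 " + y + " no-repeat; " +
        "width: " + img.width + "px; height: " + img.height + "px; }",
    );
    offsetY += img.height;
  }
  return rules.join("\n");
}
```

Every element using one of these classes shares the same image URL, so the browser makes one request for the sprite instead of one per icon.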

4) Reassessing third-party widgets (Flickr and Twitter)

Third-party widget optimization can be one of the most difficult performance challenges to face. The code often isn't your own, isn't hosted on your servers, and, because of this, there are inherent inflexibilities. In the case of Velocity, we didn't have many widgets to review and optimize. After we spent some time surveying the site, we found two widgets that needed some attention.

The Flickr widget

The Flickr widget on Velocity was using JavaScript to pull three 75x75 pixel images directly from Flickr so they could be displayed on the "2011 PHOTOS" section seen here:

Flickr Widget Screenshot

There were a couple of problems with this. One, the randomization of images isn't essential to the user experience. Two, even though the images from Flickr are only 75x75, they were averaging about 25 kB each, which is huge for a tiny JPEG. With this in mind, we did away with the JavaScript altogether and simply hosted compressed versions of the images on our CDN.

With that simple change, we saved 56 kB (going from 76 kB to 20 kB) in file size alone.

The "Tweet" widget

As luck would have it, there had already been talk of removing the Tweet widget from the Velocity site before we began our performance efforts. After some investigation into how often the widget was used, then some discussion of its usefulness, we decided the Twitter widget was no longer essential. We removed the Twitter widget and the JavaScript that was backing it.

Tweet Widget Screenshot

The results

So without further ado, let's look at the results of our two-day WPO deep dive. As you can see by our "after" Keynote readings, the total downloaded size dropped to 258.6 kB and the object count slimmed down to 34:

After WPO Content Breakdown

After WPO Content Pie Chart

Our starting point of 507 kB with 87 objects was reduced by 49% in page weight, with 56% fewer objects on the page.

And for the most impressive illustration of the performance gains that were made, here's the long-term graph of Velocity's load times, in which they start around 7 seconds and settle around 2.5 seconds:

Chart Showing Drop to 2.5 Second Average Load Times


The biggest lesson we learned throughout this optimization process was that there isn't one single change that makes your website fast. All of the small performance changes we made added up, and suddenly we were taking seconds off our page's load times. With a little time and consideration, you may find similar performance enhancements in your own site.

And one last thing: Zeb and I will see you at Velocity in June.

Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.

Save 20% on registration with the code RADAR20

[1], [2], [3] Measurements and comparisons taken with Keynote (Application Perspective - ApP) Emulated Browser monitoring tools.

[4] We also applied this treatment to the sponsor banner in the page footer, for additional savings.


June 17 2011

Velocity 2011 debrief

Women's Meetup

Velocity wrapped up yesterday. This was Velocity's fourth year and every year has seen significant growth, but this year felt like a tremendous step up in all areas. Total attendance grew from 1,200 last year to more than 2,000 people. The workshops were huge, the keynotes were packed, and the sessions in each track were bigger than anyone expected. The exhibit hall was more than twice as big as last year and it was still crowded every time I was there.

Sample some of the tweets to see the reaction of attendees, sponsors, and exhibitors.

Several folks on the #velocityconf Twitter stream have been asking about slides and videos. You can find those on the Velocity Slides and Videos page. There are about 25 slide decks up there right now. The rest of the slides will be posted as we receive them from the speakers. Videos of all the keynotes will be made available for free. Several are already posted there, including "Career Development" by Theo Schlossnagle, "JavaScript & Metaperformance" by Doug Crockford, and "Look at Your Data" by the omni-awesome John Rauser. Videos of every afternoon session are also available via the Velocity Online Access Pass ($495).

Velocity 2011 had a great crowd with a lot of energy. Check out the Velocity photos to get a feel for what was happening. We had more women speakers than ever before and I was psyched when I saw this photo of the Women's Networking Meet Up that took place during the conference (also posted above).

Velocity 2011: Take-aways, Trends, and Highlights — In this webcast following Velocity 2011, program chairs Steve Souders and John Allspaw will identify and discuss key trends and announcements that came out of the event and how they will impact the web industry in the year to come.

Join us on Friday, June 24, 2011, at 10 am PT

Register for this free webcast

Make sure to check out all the announcements that were made at Velocity. There were a couple big announcements about Velocity itself, including:

  • After four years Jesse Robbins is passing the co-chair mantle to John Allspaw. I worked with John at Yahoo! when he was with Flickr. John is VP of Tech Ops at Etsy now. He stepped into many of the co-chair duties at this Velocity in preparation for taking on the role at the next Velocity.
  • Speaking of the next Velocity, we announced there will be a Velocity Europe in November in Berlin.
    The exact venue and dates will be announced soon, followed quickly by a call for proposals.
    We're extremely excited about expanding Velocity to Europe and look forward to connecting with the performance and operations communities there,
    and helping grow the WPO and devops industries in that part of the world.
    In addition, the second Velocity China will be held in Beijing in December 2011.
  • And of course we'll be back next June for our fifth year of Velocity here in the Bay Area.

I covered a lot in this post and didn't even talk about any of the themes, trends, and takeaways. John and I will be doing that at the Velocity Wrap-up Webcast on Friday, June 24 at 10am PT. It's free so invite your friends and colleagues to join in.


June 03 2011

Radar's top stories: May 30-June 3, 2011

Here's a look at the top stories published on Radar this week.

How the Library of Congress is building the Twitter archive
One year after Twitter donated its archives, the Library of Congress is still building the infrastructure to make the data accessible to researchers.
10 ways to botch a mobile app
With the aim of injecting reason and business know-how into the app development process, "App Savvy" author Ken Yarmosh outlines the top 10 reasons why apps often falter or fail.
The story behind Velocity 2011
As we approach the fourth Velocity conference, here's a look at how the web performance and operations communities came together, what they've done to improve the web experience, and the work that lies ahead.
The state of speed and the quirks of mobile optimization
Google performance evangelist and Velocity co-chair Steve Souders discusses browser competition, the differences between mobile and desktop optimization, and his hopes for the HTTP Archive.
Open Question: Would you fund your favorite author?
With the launch of the publishing platform, readers can fund the books they want to read. If given the chance, would you fund the next book from your favorite author?

OSCON Java 2011, being held July 25-27 in Portland, Ore., is focused on open source technologies that make up the Java ecosystem. Save 20% on registration with the code OS11RAD

June 01 2011

The state of speed and the quirks of mobile optimization

Google's performance evangelist and Velocity co-chair Steve Souders (@souders) recently talked with me about speed, browser wars, and desktop performance vs mobile performance. He also discussed a new project he's working on called the HTTP Archive, which documents how web content is constructed, how it changes over time, and how those changes play out.

Our interview follows.

What are the major factors slowing down site performance?

Steve Souders: For years when developers started focusing on the performance of their websites, they would start on the back end, optimizing C++ code or database queries. Then we discovered that about 10% or 20% of the overall page load time was spent on the back end. So if you cut that in half, you only improve things 5%, maybe 10%. In many cases, you can reduce the back end time to zero and most users won't notice.
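The arithmetic behind that claim is simple enough to write down. The following TypeScript sketch uses the 10% and 20% figures quoted above; the function and parameter names are made up for illustration.

```typescript
// Fraction of total page load time saved when the back end gets faster.
// backendShare:     fraction of total load time spent on the back end (0 to 1)
// backendReduction: fraction by which back-end time is cut (0.5 means halved)
function overallSaving(backendShare: number, backendReduction: number): number {
  return backendShare * backendReduction;
}

// Back end is 10% of load time and is cut in half: 5% overall.
const lowCase = overallSaving(0.1, 0.5);

// Back end is 20% of load time and is cut in half: 10% overall.
const highCase = overallSaving(0.2, 0.5);

// Even eliminating back-end time entirely saves at most the back end's share.
const bestCase = overallSaving(0.2, 1);

console.log((lowCase * 100).toFixed(0) + "%", (highCase * 100).toFixed(0) + "%", (bestCase * 100).toFixed(0) + "%");
```

The remaining 80% to 90% of load time is out of reach of any back-end optimization, which is why the front end is the more promising place to look.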

So really, improvement comes from the time spent on the front end, on the network transferring resources and in the browser pulling in those resources. In the case of JavaScript and CSS, it's in parsing them and executing JavaScript. Without any changes in user network connection speeds, websites are able to cut their page load times in half. And that's because even with fast connection speeds or slow connection speeds, there are ways that the browser downloads these resources that developers can control. For example, more parallel downloads or less network overhead in managing connections.

We can work around some of the network problems, but inside the browser there's very little that developers can do in how JavaScript and CSS are handled. And of those two, JavaScript is a much bigger problem. Websites have a lot more JavaScript on them than CSS, and what they do with that JavaScript takes a lot more time. I always tell website owners: "If you care about how fast your website is, the first place to look is JavaScript. And if you can adopt some of the performance best practices we have around JavaScript, you can easily make gains in having your website load faster."

Why do load times vary by browser?

Steve Souders: Even if you're on the same machine, loading a page in one browser vs another can lead to very different timing. Some of the factors that affect this difference are things like the JavaScript engine, caching, network behavior, and rendering.

I don't think it's likely that we ever will see standardization in all of those areas — which I think is a good thing. If we look at the competition in the last few years in JavaScript engines, I think we would all agree that that competition has resulted in tremendous technological growth in that space. We can see that same growth in other areas as the focus on performance continues to grow.

Velocity 2011, being held June 14-16 in Santa Clara, Calif., offers the skills and tools you need to master web performance and operations.

Save 20% on registration with the code VEL11RAD

Are we in the middle of a "speed arms race" between browser developers?

Steve Souders: We're certainly in a phase where there's a lot of competition across the browser teams, and speed is one of the major competitive differentiators. That's music to my ears. I don't know if we're in the middle of it, because it's been going on for two or three years now. Going forward, I think speed is always going to be a critical factor for a browser to be successful. So perhaps we're just at the very beginning of that race.

Starting around 2005 and 2006, we started to see web apps far outpacing the capabilities of the browsers that they ran in, mostly in terms of JavaScript and CSS but also in resource downloads and the size of resources. I'll be honest, I was nervous about the adoption of AJAX and Web 2.0, given the current state of the browsers, but after that explosion, the browsers took notice, and I think that's when this focus on performance really took off. We've seen huge improvements in network behavior, parallel downloads, and JavaScript performance. JavaScript engines have become much faster, and improvements in CSS and layout (and some of the awareness around these performance best practices) have helped as well.

We're just reaching the point where the browsers are catching up with the rich interactive web apps that they're hosting. And all of a sudden, HTML5 came on the scene — the audio tag, video tag, canvas, SVG, web workers, custom font files — and I think as we see these HTML5 features get wider adoption, browsers are going to have to put even more focus on performance. Certainly mobile is another area where browser performance is going to have a lot of growth and is going to have a critical impact on the adoption and success of the web.

What new optimization quirks or obstacles does mobile browsing create?

Steve Souders: As large and multi-dimensional as the browser matrix currently is, it's nothing compared to the size of that matrix for mobile, where we have even more browsers, hardware profiles, connection speeds, types of connections, and proxies.

One of the biggest challenges developers are going to face on the mobile side is getting a handle on the performance of what we're building across the devices we care about. I talked earlier about how on the desktop, without any change in connection speed, developers could work around some of the network obstacles and get significant improvement in their page load times. On mobile, that's going to be more difficult.

The connections on mobile are slower, but they're also constrained in other ways. For example, the number of connections per server and the maximum number of connections across all servers are typically lower on mobile devices than they are on the desktop. And the path that HTTP requests have to take from a mobile device to their origin server can be much more convoluted and slower than they are on the desktop.

So, network performance is going to be a major obstacle, but we can't forget about JavaScript and CSS. Mobile devices have less power than desktops. The same amount of JavaScript and CSS — or even half the amount of JavaScript and CSS — that we have in the desktop could take significantly longer when executed on a mobile platform.

What should companies be doing to optimize mobile browsing?

Steve Souders: Developers are in a great place when it comes to building desktop websites because there's a significant number of performance best practices out there with a lot of research behind them and a lot of tooling and automation. The problem is, we don't have any of that for mobile. That's the goal, but right now, it doesn't exist.

When I started talking a year or so ago about switching the focus to mobile performance, most people would respond with, "Don't the best practices we have for desktop also apply to mobile?" And I always said the same thing, "I don't know, but I'm going to find out." My guess is that half of them are important, a quarter of them don't really matter, and a quarter of them actually hurt mobile performance. Then there's a whole set of performance best practices that are really important for mobile but don't matter so much for the desktop, so no one's really focused on them.

In the first few months that I've been able to focus on mobile, that's played out pretty well. There are some things, like domain sharding, that are really great for desktop performance but actually hurt mobile performance. And there are other things — like "data: URIs" for embedding images, and relying on localStorage for long-term caching — that are great for mobile and don't exist in any of the popular lists of performance best practices. Unfortunately, companies that want to invest in mobile performance don't have a large body of best practices to refer to. And that's where we are now, at least that's where I am now — trying to identify those best practices. Once we have them, we can evangelize, codify, and automate them.
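
The two mobile-friendly techniques mentioned above, "data: URIs" and localStorage caching, could be sketched roughly as follows. This is a minimal illustration, not a vetted best practice: `KeyValueStore`, `svgToDataUri`, and `loadWithCache` are hypothetical names, and the store interface is just the subset of the browser's Web Storage API that the sketch needs (in a browser, `window.localStorage` satisfies it).

```typescript
// Minimal subset of the Web Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Embed a small SVG image directly in markup or CSS as a data: URI,
// so it costs no extra HTTP request.
function svgToDataUri(svg: string): string {
  return "data:image/svg+xml," + encodeURIComponent(svg);
}

// Return a resource from the store if it is already there; otherwise
// load it once and keep it for later visits.
function loadWithCache(store: KeyValueStore, key: string, load: () => string): string {
  const hit = store.getItem(key);
  if (hit !== null) {
    return hit;
  }
  const fresh = load();
  try {
    store.setItem(key, fresh);
  } catch {
    // Storage may be full or disabled; just use the fresh copy.
  }
  return fresh;
}
```

Passing the store in as a parameter keeps the sketch usable outside a browser, for example with an in-memory object during testing.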

What is the HTTP Archive and how can developers use it to improve site speed?

Steve Souders: Over the last five years, we've seen a lot of interest in website optimization, and websites have changed over that time. Unfortunately, we don't have any record of what those changes have been, how significant they've been, or what areas we've seen change and what areas we haven't seen change. The purpose of the HTTP Archive is to give us that history.

It's similar to the Internet Archive started by Brewster Kahle in 1996 — the Internet Archive collects the web's content and the HTTP Archive archives how that content was built and served. Both are important: The Internet Archive provides society with a record of the evolution of digital media on the web, and the HTTP Archive provides a record of how that digital content has been served to users and how it's changing, specifically for people interested in website performance.

This project will highlight areas where we've seen good traction of performance best practices and where we haven't. Another insight that will come from this is that it's important, for example, for browser vendors and JavaScript framework developers to develop tools and features that can be adopted by developers to improve performance. It's also important to provide support for development patterns that are currently popular on the Internet. We can't ignore the way the web is currently built and just author new features and wait for developers to adopt them. The HTTP Archive will provide some perspective on current development practices and patterns so browser developers can focus on performance optimizations that fit with those patterns.

[Chart: image transfer size and image requests over time. More trends data is available from the HTTP Archive.]

Right now, there aren't that many slices of the data, but the ones that are there are pretty powerful. I think the most impactful ones are the trending charts because we can see how the web is changing over time. For example, we noticed that the size of images has grown about 12% over the last five months. That's pretty significant. And there are new technologies that address performance issues like image size — Google has recently released the WebP proposal for a new image format that reduces image size. So, the adoption of that new format by developers and other browsers might be accelerated when they see that image size is growing and will consume even more bandwidth going forward.
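
To put the 12% figure in perspective, here is a small worked calculation. It assumes, purely for illustration, that growth compounds at a steady rate, which the HTTP Archive data may or may not bear out; `compoundRate` is a hypothetical helper.

```typescript
// Convert growth observed over some number of months into the
// equivalent compound rate over a different period.
function compoundRate(observedGrowth: number, observedMonths: number, targetMonths: number): number {
  return Math.pow(1 + observedGrowth, targetMonths / observedMonths) - 1;
}

const monthlyRate = compoundRate(0.12, 5, 1);  // about 0.023, roughly 2.3% per month
const yearlyRate = compoundRate(0.12, 5, 12);  // about 0.31, roughly 31% per year
```

Under that assumption, 12% in five months works out to image size growing by nearly a third in a year.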

Associated photo on index pages: Speedy Gonzales by blmurch, on Flickr


May 24 2011

To the end of bloated code and broken websites

In a recent discussion, Nicole Sullivan (@stubbornella), architect at Stubbornella Consulting Group and a speaker at Velocity 2011, talked about the state of CSS — how it's adapting to mobile, how it's improving performance, and how some CSS best practices have led to "bloated code and broken websites."

Our interview follows.

How are CSS best practices evolving?

Nicole Sullivan: New tools are being added to browsers, and the Chrome team is really pushing the limits of what we can do with CSS, but there is still an uphill battle. Some of the best practices are actually bad for the domain.

I recently wrote an article about the best practices and what's wrong with them. I figured out this year that it wasn't just that the best practices weren't ideal — it's that they were absolutely, every single time, leading to bloated code and broken websites. It was a revelation for me to realize the best practices were often causing issues.

How are architect-level CSS tools improving?

Nicole Sullivan: The preprocessors have gotten much better. They were partially created because people didn't like the syntax of CSS and wanted a new one, but the preprocessors changed a bunch of things that weren't necessarily useful to change. In the last year or so, the preprocessors have embraced CSS and have become a testing ground for what can go into browsers. At the same time, the Chrome team is pushing forward on WebKit — it's a pretty exciting time to be working on this stuff.

Are you encountering browser support issues when building with CSS and HTML5?

Nicole Sullivan: Particularly with CSS3, there's a ton of variation and levels of support. But what CSS3 gives us are ways of doing visual decorations without actually needing images. Stoyan Stefanov and I wrote a tool a few years ago to crush and optimize images because we realized that image weight was one of the big problems on the web. Overall, CSS was sort of the source of the problem because it was bringing in all of this extra media via images.

The cool thing with CSS3 is that now we can eliminate a lot of those images by using the more advanced properties — "border-radius" can give us rounded corners without images; you can get gradients now without images; you can get drop shadows and things like that. The thing is to be flexible enough with design that it's still going to work if, say, it doesn't have that gradient. And to realize that for users on an older browser, it's not worth the weight you'd add to the page to get them that gradient or the rounded corners — they're much more interested in having a snappy, usable experience than they are in having every visual flourish possible.
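
A small sketch of that idea: the helper below builds a CSS rule that gets its gradient, rounded corners, and drop shadow from CSS3 properties alone. The helper name and the specific values are hypothetical. It relies on the fact that browsers skip declarations they do not understand, so an older browser simply keeps the solid background color and square corners.

```typescript
// Build a CSS rule for a button-like look that needs no image files.
function imageFreeButtonRule(selector: string, base: string, highlight: string): string {
  const declarations = [
    // Solid fallback that every browser understands.
    `background-color: ${base}`,
    // Gradient without a background image file.
    `background-image: linear-gradient(${highlight}, ${base})`,
    // Rounded corners without corner images.
    "border-radius: 6px",
    // Drop shadow without a shadow image.
    "box-shadow: 0 1px 3px rgba(0, 0, 0, 0.4)",
  ];
  return `${selector} { ${declarations.join("; ")}; }`;
}
```

Because the fallback color comes first, the design still works when the gradient is missing, which is the flexibility described above.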

How about at the mobile level — what are the major issues you're facing in that space?

Nicole Sullivan: Media queries are the biggest issue for mobile right now. Designers and developers are excited to be able to query, for example, the size of the screen and to offer different layouts for the iPhone or the iPad. But that means you're sending your entire layout for a desktop view and a mobile view down to a mobile phone or down to an iPad, which is way more than you want to be sending over the wire. Designers need to put mobile first and then maybe layer on a desktop experience, but then only send that code to a desktop user. All of this requires more of a server-side solution.
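
One way to picture the server-side approach is the sketch below, where the server decides which stylesheets to reference before any CSS goes over the wire. The file names, function names, and the crude User-Agent check are all hypothetical, and real device detection is considerably messier.

```typescript
// Hypothetical file names; a real site would use its own build output.
const BASE_CSS = "/css/mobile-base.css";
const DESKTOP_CSS = "/css/desktop-layer.css";

// Very rough device check based on the User-Agent request header.
// This only illustrates where the decision is made.
function isLikelyMobile(userAgent: string): boolean {
  return /iPhone|iPad|iPod|Android|Mobile/i.test(userAgent);
}

// Every client gets the mobile-first base. Only desktop clients also
// receive the desktop layer, so phones never download it.
function stylesheetsFor(userAgent: string): string[] {
  return isLikelyMobile(userAgent) ? [BASE_CSS] : [BASE_CSS, DESKTOP_CSS];
}
```

The point of the design is that the desktop layer is an addition on top of the mobile base, not a second complete layout hidden behind a media query.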

Do developers need to build two different sites to accomplish that?

Nicole Sullivan: It depends. On my little iPhone, there's not a lot of screen real estate. If I go to a travel website, I don't want every feature they've got cluttering up my iPhone. I want to know what flight I'm on, what my confirmation number is — that kind of thing. It makes sense on the design side to think about why your users are coming to the mobile site and then designing for those needs.

What happens to desktop design is there's sort of a land grab. Each team tries to grab a little bit of space and add stuff so they'll get traffic to their part of the site. It creates a disjointed user experience. The great thing about mobile is that people aren't doing that — there isn't enough screen real estate to have a land grab yet.

This interview was edited and condensed.


March 29 2011

Process management blurs the line between IT and business

Business process management (BPM), and more specifically business process optimization (BPO), is about fully understanding existing business processes and then applying agreed-upon improved approaches to support market goals. Rather than exploring BPO from the viewpoint of the business, here I'll briefly explore some of the motivations and benefits from an IT perspective.

Almost every business change has a technology impact

There are very few IT systems today that exist in isolation within an organization. Systems interact because they often require data from each other and they are interdependent in terms of sequential steps in a business and technology process. As a result, a change in one system invariably has a downstream impact on one or more other systems or processes. Often, the consequences of these changes are poorly understood by both IT and business stakeholders. Put another way: in interdependent complex systems and processes, there is seldom the notion of a small change.

Once both IT and business stakeholders recognize this, there is an opportunity to turn it into a highly positive outcome.

IT must be perpetual teachers and learners

As is the case in achieving many of the objectives of an IT strategy, it begins with communications. Every contact between IT and the business is an opportunity to teach and to learn. This is a reciprocal interaction. When I hear or read a sentence that begins, "Could you make a small change for me…" I know we're already starting from a bad place. Unless the requester fully understands the internal complexity of all the interdependent systems and the potential impacts (which is rare), it's presumptuous for him or her to estimate the scale of the change. Conversely, any IT person who minimizes the impact of a change without fully understanding the potential impact does a disservice in setting expectations that may not be met.

For IT requests, it's best and safest to assume that a change will have impact, but the scale of that change will not be known until reasonable diligence is performed. That's a much better starting point.

Let's now assume that the change is not inconsequential. Two opportunities present themselves.

IT is an important business facilitator

First, stakeholders that are impacted by the change should be brought together to discuss the impact. I'm always surprised how these meetings reveal gaps in everyone's understanding of business processes between departments. To me, this is where IT can shine as the connective tissue within an organization. More than ever, technology forces organizations to better understand and agree on processes — and that's often well before the subject of supporting technology is even relevant to the conversation.

Use this opportunity to surface the entire process and for everyone to understand the impacts of any change. Improvements to the process very often emerge. IT has suddenly motivated business process optimization.

There is no such thing as too much process documentation

Second, assuming no documentation exists, this is the right time to map the process. If you're like many organizations, your IT systems grew organically with little emphasis placed on business process design. My guess is that comprehensive, high-quality, current process documentation is uncommon. It's never too late to start. If you have business stakeholders in a room discussing and agreeing on the current and future process, this is the time to document it. There is a burgeoning market for tools and support to help enable and simplify this work.

Ultimately, documented processes make it easier to build the right software and to make changes with less overhead in the future.

The essential roles of business analyst and solutions architect

It's this emphasis on understanding and documenting business processes, and the attendant benefits, that supports the expanded roles of both the business analyst and solutions architect. These two roles, and having the right amount of capacity for your organization's demand, will be essential to succeeding with your IT strategy and in growing the business. In many organizations, the business analyst for this work may or may not be in IT, thus further blurring the lines between where IT starts and ends and where business responsibilities start and end.

Perhaps in the not too distant future we'll look at IT as part of the business and not as a separate entity in the manner it is today. It just might be the increased emphasis on business process management that acts as the catalyst.


June 03 2010

How Facebook satisfied a need for speed

Remember how Facebook used to lumber and strain? And have you noticed how it doesn't feel slow anymore? That's because the engineering team pulled off an impressive feat: an in-depth optimization and rewrite project made the site twice as fast.

Robert Johnson, Facebook's director of engineering and a speaker at the upcoming Velocity and OSCON conferences, discusses that project and its accompanying lessons learned below. Johnson's insights have broad application -- you don't need hundreds of millions of users to reap the rewards.

Facebook recently overhauled its platform to improve performance. How long did that process take to complete?

Robert Johnson: Making the site faster isn't something we're ever really done with, but we did make a big push the second half of last year. It took about a month of planning and six months of work to make the site twice as fast.

What big technical changes were made during the rewrite?

Robert Johnson: The two biggest changes were to pipeline the page content to overlap generation, network, and render time, and to move to a very small core JavaScript library for features that are required on the initial page load.

The pipelining project was called BigPipe, and it streams content back to the browser as soon as it's ready. The browser can start downloading static resources and render the most important parts of the page while the server is still generating the rest of the page. The new JavaScript library is called Primer.
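
The general idea of streaming a page in pieces can be sketched as below. This is not Facebook's BigPipe code, only a minimal illustration of the pattern: `Pagelet`, `streamPage`, the `write` callback, and the client-side `fill` function named in the emitted script are all hypothetical.

```typescript
interface Pagelet {
  id: string;
  render: () => Promise<string>; // resolves with the pagelet's HTML
}

// Send the page skeleton immediately, then send each pagelet as soon as
// its HTML is ready, in completion order rather than declaration order.
async function streamPage(
  skeleton: string,
  pagelets: Pagelet[],
  write: (chunk: string) => void
): Promise<void> {
  // The browser can start fetching static resources right away.
  write(skeleton);
  // Start every pagelet at once so generation overlaps with sending.
  await Promise.all(
    pagelets.map(async (pagelet) => {
      const html = await pagelet.render();
      // A real page would also need to escape "</script>" inside html.
      write(`<script>fill(${JSON.stringify(pagelet.id)}, ${JSON.stringify(html)})</script>`);
    })
  );
}
```

In this sketch a slow pagelet never blocks a fast one: whichever finishes first is written first, while the skeleton is already on its way to the browser.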

In addition to these big site-wide projects, we also performed a lot of general cleanup to make everything smaller and lighter, and we incorporated best practices such as image spriting.

Were developers encouraged to work in different ways?

This was one of the trickiest parts of the project. Moving fast is one of our most important values, and we didn't want to do anything to slow down development. So most of our focus was on building tools to make things perform well when developers do the things that are easiest for them. For example, with Primer, making it easy to integrate and hard to misuse was as important to its design as making it fast.

We also built detailed monitoring of everything that could affect performance, and set up systems to check code before release.

It's really important that developers be automatically alerted when there's a problem, instead of developers having to go out of their way for every change. That way, people can continue innovating quickly, and only stop to deal with performance in the relatively unusual case that they've caused a problem.
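
A pre-release check of this kind might look something like the sketch below, which compares a candidate build's measurements against a baseline and reports only the metrics that got noticeably worse. It is an assumption about how such a check could work, not a description of Facebook's tooling; `Metrics`, `findRegressions`, and the metric names in the usage note are hypothetical.

```typescript
// Metric name mapped to a measured value, where lower is better.
type Metrics = Record<string, number>;

// Names of metrics where the candidate is worse than the baseline by
// more than the allowed fraction (for example 0.05 for 5%).
function findRegressions(baseline: Metrics, candidate: Metrics, tolerance: number): string[] {
  return Object.keys(baseline).filter((metric) => {
    const before = baseline[metric];
    const after = candidate[metric];
    // A metric missing from either run is not judged here.
    if (before === undefined || after === undefined) {
      return false;
    }
    return after > before * (1 + tolerance);
  });
}
```

For example, with a 5% tolerance, a page load that goes from 1000 ms to 1100 ms would be flagged, while JavaScript weight that goes from 200 KB to 205 KB would not, so developers are only interrupted in the unusual case.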

How do you address exponential growth? How do you get ahead of it?

You never get ahead of everything, but you have to keep ahead of most things most of the time. So whenever you go in to make a particular system scale better, you can't settle for twice as good, you really need to shoot for 10 or 100 times as good. Making something twice as good only buys a few months, and you're back at it again as soon as you're done.
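
The arithmetic behind that point can be made concrete. The sketch below assumes load doubles at a fixed interval, which is an illustrative simplification, and `headroomMonths` is a hypothetical helper.

```typescript
// Months until load catches up with a system made `improvement` times
// better, if load doubles every `doublingMonths` months.
function headroomMonths(improvement: number, doublingMonths: number): number {
  return doublingMonths * (Math.log(improvement) / Math.log(2));
}
```

If load doubles every six months, a 2x improvement buys six months, a 10x improvement buys about 20 months, and a 100x improvement buys about 40 months. Headroom grows only with the logarithm of the improvement, which is why the target has to be so much larger than 2x.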

In general, this means scaling things by allowing greater federation and parallelism and not just making things more efficient. Efficiency is of course important, too, but it's really a separate issue.

Two other important things: have good data about how things are trending so you catch problems before you're in trouble, and test everything you can before you have to rely on it.

In most cases the easiest way for us to test something new is to put it in production for a small number of users or on a small number of machines. For things that are completely new, we set up "dark launches" that are invisible to the user but mimic the load from the real product as much as possible. For example, before we launched chat we had millions of JavaScript clients connecting to our backend to make sure it could handle the load.
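
Both ideas, a small-percentage rollout and an invisible "dark" code path, can be sketched in a few lines. This is a hypothetical illustration rather than Facebook's mechanism; `bucketOf`, `inRollout`, and `serveWithShadow` are made-up names, and user IDs are assumed to be non-negative integers.

```typescript
// Stable bucket in [0, 100) for a non-negative integer user ID.
// A real system would more likely hash the ID first.
function bucketOf(userId: number): number {
  return userId % 100;
}

// True for roughly `percent` percent of users, and always the same users.
function inRollout(userId: number, percent: number): boolean {
  return bucketOf(userId) < percent;
}

// Serve the current code path to everyone. For users in the rollout,
// also exercise the candidate path invisibly: its result is discarded
// and its failures never reach the user.
function serveWithShadow<T>(
  userId: number,
  percent: number,
  current: () => T,
  candidate: () => void
): T {
  if (inRollout(userId, percent)) {
    try {
      candidate();
    } catch {
      // A real system would record this failure for the engineers.
    }
  }
  return current();
}
```

The candidate path still generates real load on the backend, which is the point of a dark launch, but the user only ever sees the result of the current path.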

Facebook's size and traffic aren't representative of most sites, but are there speed and scaling lessons you've learned that have universal application?

The most important one isn't novel, but it's worth repeating: scale everything horizontally.

For example, if you had a database for users that couldn't handle the load, you might decide to break it into two functions -- say, accounts and profiles -- and put them on different databases. This would get you through the day but it's a lot of work and it only buys you twice the capacity. Instead, you should write the code to handle the case where two users aren't on the same database. This is probably even more work than splitting the application code in half, but it will continue to pay off for a very long time.

The most important thing here isn't to have fancy systems for failover or load balancing. In fact, those things tend to take a lot of time and get you in trouble if you don't get them right. You really just need to be able to split any function to run on multiple machines that operate as independently as possible.
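
As a minimal sketch of what "two users aren't on the same database" means for application code, the example below assigns each user to a shard and groups a batch of users by shard before any queries are made. The modulo scheme and the names `shardFor` and `groupByShard` are assumptions for illustration, not Facebook's design.

```typescript
// Pick the database shard that owns a user (non-negative integer IDs assumed).
function shardFor(userId: number, shardCount: number): number {
  return userId % shardCount;
}

// Group user IDs by owning shard, so code that touches several users
// never assumes they live on the same database.
function groupByShard(userIds: number[], shardCount: number): Map<number, number[]> {
  const groups = new Map<number, number[]>();
  for (const id of userIds) {
    const shard = shardFor(id, shardCount);
    const members = groups.get(shard);
    if (members === undefined) {
      groups.set(shard, [id]);
    } else {
      members.push(id);
    }
  }
  return groups;
}
```

Modulo is the simplest possible assignment. Its drawback is that changing the shard count moves most users, so a lookup table or consistent hashing is a common alternative when shards need to be added over time.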

The second lesson is to measure everything you can. Performance bottlenecks and scaling problems are often in unexpected places. The things you think will be hard are often not the biggest problems, because they're the things you've thought about a lot. It's actually a lot more like debugging than people realize. You can't be sure your product doesn't have bugs just by looking at the code, and similarly you can't be sure your product will scale because you designed it well. You have to actually set it up and pound it with traffic -- real or test -- and measure what happens.

What is Scribe? How is it used within Facebook?

Scribe is a system we wrote to aggregate log data from thousands of servers. It turned out to be generally useful in a lot of places where you need to move large amounts of data asynchronously and you don't need database-level reliability.

Scribe scales extremely large -- I think we do more than 100 billion messages a day now. It has a simple and easy-to-use interface, and it handles temporary network or machine failures nicely.
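
The trade-off described here, moving lots of data asynchronously while tolerating temporary failures and accepting less than database-level reliability, can be illustrated with a small buffered logger. This is not Scribe's interface or implementation; `LogEntry`, `createBufferedLogger`, and the drop-oldest policy are assumptions chosen to keep the sketch short.

```typescript
interface LogEntry {
  category: string;
  message: string;
}

// `send` delivers a batch downstream and returns true on success.
function createBufferedLogger(send: (batch: LogEntry[]) => boolean, maxBuffered: number) {
  let buffer: LogEntry[] = [];
  return {
    // Never blocks the caller: the entry is only queued locally.
    log(category: string, message: string): void {
      if (buffer.length >= maxBuffered) {
        buffer.shift(); // lossy by design: drop the oldest entry when full
      }
      buffer.push({ category, message });
    },
    // Try to deliver everything queued. On failure, keep the entries
    // so the next flush can retry. Returns the number delivered.
    flush(): number {
      if (buffer.length === 0) {
        return 0;
      }
      if (!send(buffer.slice())) {
        return 0;
      }
      const delivered = buffer.length;
      buffer = [];
      return delivered;
    },
  };
}
```

A brief outage costs nothing as long as the buffer does not fill up. A long one loses the oldest entries, which is acceptable for logs and metrics but not for user data.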

We use Scribe for everything from logging performance data, to updating search indexes, to gathering metrics for platform apps and pages. There are more than 100 different logs in use at Facebook today.

I was struck by a phrase in one of your recent blog posts: You said Scribe has a "reasonable level of reliability for a lot of use cases." How did you sell that internally?

For some use cases I didn't. We can't use the system for user data because it's not sufficiently reliable, and keeping user data safe is something we take extremely seriously.

But there are a lot of things that aren't user data, and in practice, data loss in Scribe is extremely rare. For many use cases it's well worth it to be able to collect a massive amount of data.

For example, the statistics we provide to page owners depend on a large amount of data logged from the site. Some of this is from large pages where we could just take a sample of the data, but most of it is from small pages that need detailed reporting and can't be sampled. A rare gap in this data is much better than having to limit the number of things we're able to report to page owners, or only giving approximate numbers that aren't useful for smaller pages.

This interview was condensed and edited.

Robert Johnson will discuss Facebook's optimization techniques at the Velocity Conference (6/22-6/24) and OSCON (7/19-7/23).
