
November 19 2013

Four short links: 19 November 2013

  1. Why The Banner Ad is Heroic — enough to make Dave Eggers cry. Advertising triumphalism rampant.
  2. Udacity/Thrun Profile — A student taking college algebra in person was 52% more likely to pass than one taking a Udacity class, making the $150 price tag (roughly one-third the normal in-state tuition) seem like something less than a bargain. In which Udacity pivots to hiring-sponsored workforce training and the new educational revolution looks remarkably like sponsored content.
  3. Amazon is Building Substations (GigaOm) — the company even has firmware engineers whose job it is to rewrite the archaic code that normally runs on the switchgear designed to control the flow of power to electricity infrastructure. Pretty sure that wasn’t a line item in the pitch deck for “the first Internet bookstore”.
  4. Panoramic Images — throw the camera in the air, get a 360×360 image from 36 2-megapixel lenses. Not sure that throwing was previously a recognised UI gesture.

September 07 2013

On security and pervasive monitoring - IETF

http://www.ietf.org/blog/2013/09/security-and-pervasive-monitoring

In the face of the revelations of mass NSA surveillance, IETF chair Jari Arkko calls on IETF participants to build a more secure internet and to improve the internet’s protocols. “We need to take a critical look” and improve the security of the internet’s protocols, the chair declares, calling on the internet’s engineers to work on this at their next meeting. The internet must move to “security” by default.

#security #protocol #standards #infrastructure


October 09 2012

Tracking Salesforce’s push toward developers

Have you ever seen Salesforce’s “no software” graphic? It’s the word “software” surrounded by a circle with a red line through it. Here’s a picture of the related (and dancing) “no software” mascot.

Now, if you consider yourself a developer, this is a bit threatening, no? Imagine sitting at a Salesforce event in 2008 in Chicago while Salesforce.com’s CEO, Marc Benioff, swiftly works an entire room of business users into an anti-software frenzy. I was there to learn about Force.com, and I’ll summarize the message I understood four years ago as “Not only can companies benefit from Salesforce.com, they also don’t have to hire developers.”

The message resonated with the audience. Salesforce had been using this approach for a decade: Don’t buy software you have to support, maintain, and hire developers to customize. Use our software-as-a-service (SaaS) instead.  The reality behind Salesforce’s trajectory at the time was that it too needed to provide a platform for custom development.

Salesforce’s dilemma: They needed developers

This “no software” message was enough for the vast majority of the small-to-medium-sized business (SMB) market, but to engage with companies at the largest scale, you need APIs and you need to be able to work with developers. At the time, in 2008, Salesforce was making moves toward the developer community. First there was Apex, then there was Force.com.

In 2008, I evaluated Force.com, and while capable, it didn’t strike me as something that would appeal to most developers outside of existing Salesforce customers.  Salesforce was aiming at the corporate developers building software atop competing stacks like Oracle.  While there were several attempts to sell it as such, it wasn’t a stand-alone product or framework.  In my opinion, no developer would assess Force.com and opt to use it as the next development platform.

This 2008 TechCrunch article announcing the arrival of Salesforce’s Developer-as-a-Service (DaaS) platform serves as a reminder of what Salesforce had in mind. They were still moving forward with an anti-software message for the business while continuing to make moves into the developer space. Salesforce built a capable platform. Looking back at Force.com, it felt like an even more constrained version of Google App Engine. In other words, capable and scalable, but at the time a bit constraining for the general developer population. Don’t get me wrong: Force.com wasn’t a business failure by any measure; they have an impressive client list even today, but what they didn’t achieve was traction and awareness among the developer community.

2010: Who bought Heroku? Really?

When Salesforce.com purchased Heroku, I was initially surprised. I didn’t see it coming, but it made perfect sense after the fact. Heroku is very much an analog to Salesforce for developers, and Heroku brought something to Salesforce that Force.com couldn’t achieve: developer authenticity and community.

Heroku, like Salesforce, is the opposite of shrink-wrapped software. There’s no investment in on-premises infrastructure, and once you move to Heroku — or any other capable platform-as-a-service (PaaS), like Joyent — you wonder why you ever did without such a service.  As a developer, once you’ve made the transition to never again worrying about running “yum update” on a CentOS machine or about kernel patches in a production deployment, it is difficult to go back to a world where you have to worry about system administration.

Yes, there are arguments to be made for procuring your own servers and taking on the responsibility for system administration: backups, worrying about disks, and the 100 other things that come along with owning infrastructure. But those arguments really only start to make sense once you achieve the scale of a large multi-national corporation (or after a freak derecho in Reston, Va).

Jokes about derecho-prone data centers aside, this market is growing up, and with it we’re seeing an unbeatable trend of obsoleting yesterday’s system administrator.  With capable options like Joyent and Heroku, there’s very little reason (other than accounting) for any SMB to own hardware infrastructure. It’s a large capital expense when compared to the relatively small operational expense you’ll throw down to run a scalable architecture on a Heroku or a Joyent.

Replace several full-time operations employees with software-as-a-service; shift the cost to developers who create real value; insist on a proper service-level agreement (SLA); and if you’re really serious about risk mitigation, use several platforms at once.  Heroku is exactly the same play Salesforce made for customer relationship management (CRM) over the last decade, except this time they are selling PostgreSQL and a platform to run applications.

Heroku is closer to what Salesforce was aiming for with Force.com.  Here are two things to note about this Heroku acquisition almost two years after the fact:

  1. Force.com was focused on developers — a certain kind of developer in the “enterprise.” While it was clear that Salesforce had designs on using Force.com to expand the market, the existing marketing and product management function at Salesforce ended up creating something that was way too connected to the existing Salesforce brand, along with its anti-developer message.
  2. Salesforce.com’s developer-focused acquisitions are isolated from the Salesforce.com brand (on purpose?) — When Benioff discussed Heroku at Dreamforce, he made it clear that Heroku would remain “separate.”  While it is tough to know how “separate” Heroku remains, the brand has changed very little, and I think this is the important thing to note two years after the fact. Salesforce understands that they need to attract independent developers and they understand that the Salesforce brand is something of a “scarecrow” for this audience. They invested an entire decade in telling businesses that software isn’t necessary, and senior management is too smart to confuse developers with that message.

Investments point toward greater developer outreach: Sauce Labs

This brings me to Sauce Labs — a company that recently raised $3 million from “Triage Ventures as well as [a] new investor Salesforce.com,” according to Jolie O’Dell at VentureBeat.

Sauce Labs provides a hosted web testing platform.  I’ve used it for some one-off testing jobs, and it is impressive.  You can spin up a testing machine in about two minutes from an array of operating systems, mobile devices, and browsers, and then run a test script either in Selenium or WebDriver. The platform can be used for acceptance testing, and Jason Huggins and John Dunham’s emphasis of late has been mobile testing.  Huggins supported the testing grid at Google, and he started Selenium while at ThoughtWorks in 2004. By every measure, Sauce Labs is a developer’s company as much as Heroku.
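
To make that concrete, here is a minimal sketch of what driving one of those remote browsers from Python with Selenium's Remote WebDriver looks like. The credentials, the capability choices, and the login page under test are all placeholders, not anything Sauce Labs ships:

    from selenium import webdriver

    # Placeholder credentials; real jobs would read these from the environment.
    SAUCE_USER = "SAUCE_USERNAME"
    SAUCE_KEY = "SAUCE_ACCESS_KEY"

    capabilities = {
        "browserName": "firefox",      # pick any OS/browser combo from the grid
        "platform": "Windows XP",
        "version": "11",
        "name": "login page smoke test",
    }

    driver = webdriver.Remote(
        command_executor="http://%s:%s@ondemand.saucelabs.com:80/wd/hub"
                         % (SAUCE_USER, SAUCE_KEY),
        desired_capabilities=capabilities,
    )
    try:
        driver.get("http://example.com/login")   # hypothetical page under test
        assert "Login" in driver.title
    finally:
        driver.quit()                            # frees the remote VM for the next job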

Sauce Labs, like Heroku before it, also satisfies the Salesforce.com analogy perfectly. Say I have a company that develops a web site. Well, if I’m deploying this application to a platform like Joyent or Heroku continuously, I also need to be able to support some sort of continuous automated testing system. If I need to test on an array of browsers and devices, would I procure the necessary hardware infrastructure to set up my own testing infrastructure, or … you can see where I’m going.

I also think I can see where Salesforce is going. They didn’t acquire Sauce Labs, but this investment is another data point and another view into what Salesforce.com is paying attention to. I think it has taken them 4-5 years, but they are continuing a push toward developers. Heroku stood out from the list of Salesforce acquisitions: It wasn’t CRM, sales, or marketing focused; it was a pure technology play. Salesforce’s recent investments, from Sauce Labs to Appirio to Urban Airship, suggest that Salesforce is becoming more relevant to the individual developer who is uninterested in Salesforce’s other product offerings.

Some random concluding conjecture

Although I think it would be too expensive, I wonder what would happen if Salesforce acquired GitHub. GitHub just received an unreal investment ($100M), so I just don’t see it happening. But if you were to combine GitHub, Heroku, and Sauce Labs into a single company, you’d have a one-stop shop for the majority of development and production infrastructure that people are paying attention to.  Add an Atlassian to the mix, and it would be tough to avoid that company.

This is nothing more than conjecture, but I do get the sense that there has been an interesting shift happening at Salesforce ever since 2008. I think the next few years are going to see even more activity.

August 23 2012

Four short links: 23 August 2012

  1. Computational Social Science (Nature) — Facebook and Twitter data drives social science analysis. (via Vaughan Bell)
  2. The Single Most Important Object in the Global Economy (Slate) — Companies like Ikea have literally designed products around pallets: Its “Bang” mug, notes Colin White in his book Strategic Management, has had three redesigns, each done not for aesthetics but to ensure that more mugs would fit on a pallet (not to mention in a customer’s cupboard). (via Boing Boing)
  3. Narco Ultralights (Wired) — it’s just a matter of time until there are no humans on the ultralights. Remote-controlled narcodrones can’t be far away.
  4. Shortcut Foo — a typing tutor for editors, Photoshop, and the command line, to build muscle memory of frequently-used keystrokes. Brilliant! (via Irene Ros)

August 01 2012

More than bricks and mortar: how to make the most of your facilities

Good facilities are integral to good universities, so how can HE leaders finance, plan and manage their estates in a way that leads to gains and not losses? Join the live chat, Friday 3 August

Campus development: everyone's at it. From minor refurbishment projects to more sizeable construction jobs, it would seem - in the UK at least - that appetite for new, bigger and better facilities has defied the austerity mantra.

State-of-the-art facilities are not simply a vanity project. They help attract students, provide a tailored space in which academic staff can teach and conduct research, and are part of a university's wider distinctiveness and economic strategy.

The Organisation for Economic Cooperation and Development (OECD) puts it this way: "Research shows the increasing importance of the role of higher education institutions in local and regional economies through knowledge creation and knowledge transfer. Facilities play a crucial role in meeting educational needs and providing places where knowledge exchange can happen. However, they are an expensive commodity to provide and maintain."

And at a time of considerable change in higher education, coupled with a global economy in renewed crisis, many wonder if greater gains could not be achieved by investing elsewhere in the sector - predominantly in teaching and research.

But the question shouldn't be whether buildings are worth more than brains. In an assessment of which is more valuable to the creation of scientific knowledge, scientists or facilities, assistant professor Fabian Waldinger concludes: "It is difficult to evaluate how much high quality scientists and better facilities contribute to the creation of scientific knowledge".

Similarly, a non-targeted injection of funds into capital projects won't guarantee a university's survival. As a recent report into US colleges and universities found, development without a good strategic plan could lead to liquidity issues. The report, The financially sustainable university, explains: "Many institutions have operated on the assumption that the more they build, spend, diversify and expand, the more they will persist and prosper. But instead, the opposite has happened: Institutions have become over-leveraged."

So how can facilities and senior managers finance, plan and manage their estates in a way that leads to gains and not losses? And as technology permeates all areas of HE, what is its role in facilities management? Join our live chat panel to explore what an effective learning environment looks like, what the benchmarks and performance indicators of effective management are, and how to make university facilities financially and environmentally sustainable.

The live chat takes place on Friday 3 August in the comment threads beneath this blog, beginning at 12 BST

If you would like to join the panel, please send me an email.





June 07 2012

What is DevOps?

Adrian Cockcroft's article about NoOps at Netflix ignited a controversy that has been smouldering for some months. John Allspaw's detailed response to Adrian's article makes a key point: What Adrian described as "NoOps" isn't really NoOps. Operations doesn't go away. Responsibilities can, and do, shift over time, and as they shift, so do job descriptions. But no matter how you slice it, the same jobs need to be done, and one of those jobs is operations. What Adrian is calling NoOps at Netflix isn't all that different from Operations at Etsy. But that raises the question: What do we mean by "operations" in the 21st century? If NoOps is a movement for replacing operations with something that looks suspiciously like operations, there's clearly confusion. Now that some of the passion has died down, it's time to get to a better understanding of what we mean by operations and how it's changed over the years.

At a recent lunch, John noted that back in the dawn of the computer age, there was no distinction between dev and ops. If you developed, you operated. You mounted the tapes, you flipped the switches on the front panel, you rebooted when things crashed, and possibly even replaced the burned out vacuum tubes. And you got to wear a geeky white lab coat. Dev and ops started to separate in the '60s, when programmer/analysts dumped boxes of punch cards into readers, and "computer operators" behind a glass wall scurried around mounting tapes in response to IBM JCL. The operators also pulled printouts from line printers and shoved them in labeled cubbyholes, where you got your output filed under your last name.

The arrival of minicomputers in the 1970s and PCs in the '80s broke down the wall between mainframe operators and users, leading to the system and network administrators of the 1980s and '90s. That was the birth of modern "IT operations" culture. Minicomputer users tended to be computing professionals with just enough knowledge to be dangerous. (I remember when a new director was given the root password and told to "create an account for yourself" ... and promptly crashed the VAX, which was shared by about 30 users). PC users required networks; they required support; they required shared resources, such as file servers and mail servers. And yes, BOFH ("Bastard Operator from Hell") serves as a reminder of those days. I remember being told that "no one" else is having the problem you're having — and not getting beyond it until at a company meeting we found that everyone was having the exact same problem, in slightly different ways. No wonder we want ops to disappear. No wonder we wanted a wall between the developers and the sysadmins, particularly since, in theory, the advent of the personal computer and desktop workstation meant that we could all be responsible for our own machines.

But somebody has to keep the infrastructure running, including the increasingly important websites. As companies and computing facilities grew larger, the fire-fighting mentality of many system administrators didn't scale. When the whole company runs on one 386 box (like O'Reilly in 1990), mumbling obscure command-line incantations is an appropriate way to fix problems. But that doesn't work when you're talking about hundreds or thousands of nodes at Rackspace or Amazon. From an operations standpoint, the big story of the web isn't the evolution toward full-fledged applications that run in the browser; it's the growth from single servers to tens of servers to hundreds, to thousands, to (in the case of Google or Facebook) millions. When you're running at that scale, fixing problems on the command line just isn't an option. You can't afford to let machines get out of sync through ad-hoc fixes and patches. Being told "We need 125 servers online ASAP, and there's no time to automate it" (as Sascha Bates encountered) is a recipe for disaster.

The response of the operations community to the problem of scale isn't surprising. One of the themes of O'Reilly's Velocity Conference is "Infrastructure as Code." If you're going to do operations reliably, you need to make it reproducible and programmatic. Hence virtual machines to shield software from configuration issues. Hence Puppet and Chef to automate configuration, so you know every machine has an identical software configuration and is running the right services. Hence Vagrant to ensure that all your virtual machines are constructed identically from the start. Hence automated monitoring tools to ensure that your clusters are running properly. It doesn't matter whether the nodes are in your own data center, in a hosting facility, or in a public cloud. If you're not writing software to manage them, you're not surviving.
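
Puppet and Chef each have their own declarative language, but the pattern underneath (declare a desired state, inspect the actual state, repair only the drift, idempotently) is simple enough to sketch. Here is a toy Python version under Debian-flavored assumptions; the package list and shell commands are illustrative, not a real manifest:

    import subprocess

    DESIRED_PACKAGES = ["nginx", "ntp"]   # the state we declare, not steps to run
    DESIRED_SERVICES = ["nginx"]

    def installed(pkg):
        # dpkg -s exits 0 only when the package is installed.
        return subprocess.call(["dpkg", "-s", pkg],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL) == 0

    def running(svc):
        return subprocess.call(["service", svc, "status"],
                               stdout=subprocess.DEVNULL,
                               stderr=subprocess.DEVNULL) == 0

    def converge():
        # Idempotent: acts only on drift, so every node converges to the same
        # configuration and the script is safe to run as often as you like.
        for pkg in DESIRED_PACKAGES:
            if not installed(pkg):
                subprocess.check_call(["apt-get", "install", "-y", pkg])
        for svc in DESIRED_SERVICES:
            if not running(svc):
                subprocess.check_call(["service", svc, "start"])

    if __name__ == "__main__":
        converge()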

Furthermore, as we move further and further away from traditional hardware servers and networks, and into a world that's virtualized on every level, old-style system administration ceases to work. Physical machines in a physical machine room won't disappear, but they're no longer the only thing a system administrator has to worry about. Where's the root disk drive on a virtual instance running at some colocation facility? Where's a network port on a virtual switch? Sure, system administrators of the '90s managed these resources with software; no sysadmin worth his salt came without a portfolio of Perl scripts. The difference is that now the resources themselves may be physical, or they may just be software; a network port, a disk drive, or a CPU has nothing to do with a physical entity you can point at or unplug. The only effective way to manage this layered reality is through software.

So infrastructure had to become code. All those Perl scripts show that it was already becoming code as early as the late '80s; indeed, Perl was designed as a programming language for automating system administration. It didn't take long for leading-edge sysadmins to realize that handcrafted configurations and non-reproducible incantations were a bad way to run their shops. It's possible that this trend means the end of traditional system administrators, whose jobs are reduced to racking up systems for Amazon or Rackspace. But that's only likely to be the fate of those sysadmins who refuse to grow and adapt as the computing industry evolves. (And I suspect that sysadmins who refuse to adapt swell the ranks of the BOFH fraternity, and most of us would be happy to see them leave.) Good sysadmins have always realized that automation was a significant component of their job and will adapt as automation becomes even more important. The new sysadmin won't power down a machine, replace a failing disk drive, reboot, and restore from backup; he'll write software to detect a misbehaving EC2 instance automatically, destroy the bad instance, spin up a new one, and configure it, all without interrupting service. With automation at this level, the new "ops guy" won't care if he's responsible for a dozen systems or 10,000. And the modern BOFH is, more often than not, an old-school sysadmin who has chosen not to adapt.
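
As a sketch of that detect-and-replace reflex, here is roughly what it looks like in Python against the EC2 API, using today's boto3 client. The AMI ID, instance type, and health check are placeholders; a real system would hook into its own monitoring instead:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def is_healthy(instance_id):
        # Placeholder check; wire this up to real monitoring in practice.
        resp = ec2.describe_instance_status(InstanceIds=[instance_id])
        return any(s["InstanceStatus"]["Status"] == "ok"
                   for s in resp["InstanceStatuses"])

    def replace_if_misbehaving(instance_id):
        if is_healthy(instance_id):
            return instance_id
        # Destroy the bad instance and spin up a fresh one, no human involved.
        ec2.terminate_instances(InstanceIds=[instance_id])
        new = ec2.run_instances(ImageId="ami-12345678",   # placeholder AMI
                                InstanceType="m1.small",
                                MinCount=1, MaxCount=1)
        # Configuration happens via user data or a config-management run.
        return new["Instances"][0]["InstanceId"]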

James Urquhart nails it when he describes how modern applications, running in the cloud, still need to be resilient and fault tolerant, still need monitoring, still need to adapt to huge swings in load, etc. But he notes that those features, formerly provided by the IT/operations infrastructures, now need to be part of the application, particularly in "platform as a service" environments. Operations doesn't go away, it becomes part of the development. And rather than envision some sort of uber developer, who understands big data, web performance optimization, application middleware, and fault tolerance in a massively distributed environment, we need operations specialists on the development teams. The infrastructure doesn't go away — it moves into the code; and the people responsible for the infrastructure, the system administrators and corporate IT groups, evolve so that they can write the code that maintains the infrastructure. Rather than being isolated, they need to cooperate and collaborate with the developers who create the applications. This is the movement informally known as "DevOps."

Amazon's EBS outage last year demonstrates how the nature of "operations" has changed. There was a marked distinction between companies that suffered and lost money, and companies that rode through the outage just fine. What was the difference? The companies that didn't suffer, including Netflix, knew how to design for reliability; they understood resilience, spreading data across zones, and a whole lot of reliability engineering. Furthermore, they understood that resilience was a property of the application, and they worked with the development teams to ensure that the applications could survive when parts of the network went down. More important than the flames about Amazon's services are the testimonials of how intelligent and careful design kept applications running while EBS was down. Netflix's ChaosMonkey is an excellent, if extreme, example of a tool to ensure that a complex distributed application can survive outages; ChaosMonkey randomly kills instances and services within the application. The development and operations teams collaborate to ensure that the application is sufficiently robust to withstand constant random (and self-inflicted!) outages without degrading.
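
The core of the idea fits in a few lines. A toy version in Python; the boto3 client and the opt-in tag convention are stand-ins for Netflix's real implementation, which also confines its mischief to business hours:

    import random
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def candidates(tag_value="chaos-eligible"):
        # Assumed convention: only instances opted in via a tag are fair game.
        resp = ec2.describe_instances(
            Filters=[{"Name": "tag:Group", "Values": [tag_value]},
                     {"Name": "instance-state-name", "Values": ["running"]}])
        return [i["InstanceId"]
                for r in resp["Reservations"] for i in r["Instances"]]

    def unleash_monkey():
        victims = candidates()
        if victims:
            victim = random.choice(victims)
            ec2.terminate_instances(InstanceIds=[victim])  # the app must survive this
            return victim
        return None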

On the other hand, during the EBS outage, nobody who wasn't an Amazon employee touched a single piece of hardware. At the time, JD Long tweeted that the best thing about the EBS outage was that his guys weren't running around like crazy trying to fix things. That's how it should be. It's important, though, to notice how this differs from operations practices 20, even 10 years ago. It was all over before the outage even occurred: The sites that dealt with it successfully had written software that was robust, and carefully managed their data so that it wasn't reliant on a single zone. And similarly, the sites that scrambled to recover from the outage were those that hadn't built resilience into their applications and hadn't replicated their data across different zones.

In addition to this redistribution of responsibility, from the lower layers of the stack to the application itself, we're also seeing a redistribution of costs. It's a mistake to think that the cost of operations goes away. Capital expense for new servers may be replaced by monthly bills from Amazon, but it's still cost. There may be fewer traditional IT staff, and there will certainly be a higher ratio of servers to staff, but that's because some IT functions have disappeared into the development groups. The boundary is fluid, but that's precisely the point. The task — providing a solid, stable application for customers — is the same. The locations of the servers on which that application runs, and how they're managed, are all that changes.

One important task of operations is understanding the cost trade-offs between public clouds like Amazon's, private clouds, traditional colocation, and building their own infrastructure. It's hard to beat Amazon if you're a startup trying to conserve cash and need to allocate or deallocate hardware to respond to fluctuations in load. You don't want to own a huge cluster to handle your peak capacity but leave it idle most of the time. But Amazon isn't inexpensive, and a larger company can probably get a better deal taking its infrastructure to a colocation facility. A few of the largest companies will build their own datacenters. Cost versus flexibility is an important trade-off; scaling is inherently slow when you own physical hardware, and when you build your data centers to handle peak loads, your facility is underutilized most of the time. Smaller companies will develop hybrid strategies, with parts of the infrastructure hosted on public clouds like AWS or Rackspace, part running on private hosting services, and part running in-house. Optimizing how tasks are distributed between these facilities isn't simple; that is the province of operations groups. Developing applications that can run effectively in a hybrid environment: that's the responsibility of developers, with healthy cooperation with an operations team.

The use of metrics to monitor system performance is another respect in which system administration has evolved. In the '80s and early '90s, you knew when a machine crashed because you started getting phone calls. Early system monitoring tools like HP's OpenView provided limited visibility into system and network behavior but didn't give much more information than simple heartbeats or reachability tests. Modern tools like DTrace provide insight into almost every aspect of system behavior; one of the biggest challenges facing modern operations groups is developing analytic tools and metrics that can take advantage of the data that's available to predict problems before they become outages. We now have access to the data we need; we just don't know how to use it. And the more we rely on distributed systems, the more important monitoring becomes. As with so much else, monitoring needs to become part of the application itself. Operations is crucial to success, but operations can only succeed to the extent that it collaborates with developers and participates in the development of applications that can monitor and heal themselves.
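
A first step toward that kind of predictive monitoring can be as simple as comparing each new sample against a rolling baseline. A minimal Python sketch, with the window size and threshold chosen arbitrarily for illustration:

    from collections import deque
    from statistics import mean, stdev

    class MetricMonitor:
        def __init__(self, window=60, threshold=3.0):
            self.samples = deque(maxlen=window)   # rolling baseline
            self.threshold = threshold

        def observe(self, value):
            """Return True if this sample looks anomalous against the baseline."""
            anomalous = False
            if len(self.samples) >= 10:           # wait for a minimal baseline
                mu, sigma = mean(self.samples), stdev(self.samples)
                anomalous = sigma > 0 and abs(value - mu) > self.threshold * sigma
            self.samples.append(value)
            return anomalous

    # Usage: feed it one latency sample per request and page someone on True.
    monitor = MetricMonitor()
    for latency_ms in (12, 11, 13, 12, 14, 12, 11, 13, 12, 14, 250):
        if monitor.observe(latency_ms):
            print("anomaly:", latency_ms)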

Success isn't based entirely on integrating operations into development. It's naive to think that even the best development groups, aware of the challenges of high-performance, distributed applications, can write software that won't fail. On this two-way street, do developers wear the beepers, or IT staff? As Allspaw points out, it's important not to divorce developers from the consequences of their work since the fires are frequently set by their code. So, both developers and operations carry the beepers. Sharing responsibilities has another benefit. Rather than finger-pointing post-mortems that try to figure out whether an outage was caused by bad code or operational errors, when operations and development teams work together to solve outages, a post-mortem can focus less on assigning blame than on making systems more resilient in the future. Although we used to practice "root cause analysis" after failures, we're recognizing that hunting for a single cause is unhelpful. Almost every outage is the result of a "perfect storm" of normal, everyday mishaps. Instead of figuring out what went wrong and building procedures to ensure that something bad can never happen again (a process that almost always introduces inefficiencies and unanticipated vulnerabilities), modern operations designs systems that are resilient in the face of everyday errors, even when they occur in unpredictable combinations.

In the past decade, we've seen major changes in software development practice. We've moved from various versions of the "waterfall" method, with interminable up-front planning, to "minimum viable product," continuous integration, and continuous deployment. It's important to understand that the waterfall methodologies of the '80s weren't "bad ideas" or mistakes. They were perfectly adapted to an age of shrink-wrapped software. When you produce a "gold disk" and manufacture thousands (or millions) of copies, the penalties for getting something wrong are huge. If there's a bug, you can't fix it until the next release. In this environment, a software release is a huge event. But in this age of web and mobile applications, deployment isn't such a big thing. We can release early, and release often; we've moved from continuous integration to continuous deployment. We've developed techniques for quick resolution in case a new release has serious problems; we've mastered A/B testing to test releases on a small subset of the user base.

All of these changes require cooperation and collaboration between developers and operations staff. Operations groups are adopting, and in many cases leading, the effort to implement these changes. They're the specialists in resilience, in monitoring, in deploying changes and rolling them back. And the many attendees, hallway discussions, talks, and keynotes at O'Reilly's Velocity conference show us that they are adapting. They're learning approaches to resilience that are completely new to software engineering; they're learning about monitoring and diagnosing distributed systems, doing large-scale automation, and debugging under pressure. At a recent meeting, Jesse Robbins described scheduling EMT training sessions for operations staff so that they understood how to handle themselves and communicate with each other in an emergency. It's an interesting and provocative idea, and one of many things that modern operations staff bring to the mix when they work with developers.

What does the future hold for operations? System and network monitoring used to be exotic and bleeding-edge; now, it's expected. But we haven't taken it far enough. We're still learning how to monitor systems, how to analyze the data generated by modern monitoring tools, and how to build dashboards that let us see and use the results effectively. I've joked about "using a Hadoop cluster to monitor the Hadoop cluster," but that may not be far from reality. The amount of information we can capture is tremendous, and far beyond what humans can analyze without techniques like machine learning.

Likewise, operations groups are playing a huge role in the deployment of new, more efficient protocols for the web, like SPDY. Operations is involved, more than ever, in tuning the performance of operating systems and servers (even ones that aren't under our physical control); a lot of our "best practices" for TCP tuning were developed in the days of ISDN and 56 Kbps analog modems, and haven't been adapted to the reality of Gigabit Ethernet, OC-48 fiber, and their descendants. Operations groups are responsible for figuring out how to use these technologies (and their successors) effectively. We're only beginning to digest IPv6 and the changes it implies for network infrastructure. And, while I've written a lot about building resilience into applications, so far we've only taken baby steps. There's a lot there that we still don't know. Operations groups have been leaders in taking best practices from older disciplines (control systems theory, manufacturing, medicine) and integrating them into software development.

And what about NoOps? Ultimately, it's a bad name, but the name doesn't really matter. A group practicing "NoOps" successfully hasn't banished operations. It's just moved operations elsewhere and called it something else. Whether a poorly chosen name helps or hinders progress remains to be seen, but operations won't go away; it will evolve to meet the challenges of delivering effective, reliable software to customers. Old-style system administrators may indeed be disappearing. But if so, they are being replaced by more sophisticated operations experts who work closely with development teams to get continuous deployment right; to build highly distributed systems that are resilient; and yes, to answer the pagers in the middle of the night when EBS goes down. DevOps.

Velocity 2012: Web Operations & Performance — The smartest minds in web operations and performance are coming together for the Velocity Conference, being held June 25-27 in Santa Clara, Calif.

Save 20% on registration with the code RADAR20

Photo: Taken at IBM's headquarters in Armonk, NY. By Mike Loukides.


June 05 2012

The software professional vs the software artist

I hope that James Turner's post on "The overhead of insecure infrastructure" was ironic or satiric. The attitude he expresses is all too common, and frankly, is the reason that system administrators and other operations people can't keep their systems secure.

Why do we have to deal with vulnerabilities in operating systems and applications? It's precisely because of prima donna software developers who think they're "artists" and can't be bothered to take the time to do things right. That, and a long history of management that was more interested in meeting ship dates than shipping secure software; and the never-ending and always-escalating battle between the good guys and the bad guys, as black hats find new vulnerabilities that no one thought of a week ago, let alone a few years ago.

Yes, that's frustrating, but that's life. If a developer in my organization said that he was too good and creative to care about writing secure code, he would be out on his ear. Software developers are not artistes. They are professionals, and the attitude James describes is completely unprofessional and entirely too common.

One of the long-time puzzles in English literature is Jonathan Swift's "A Modest Proposal for Preventing the Children of Poor People From Being a Burden on Their Parents or Country, and for Making Them Beneficial to the Publick." It suggests solving the problem of famine in Ireland by cannibalism. Although Swift is one of English literature's greatest satirists, the problem here is that he goes too far: the piece is just too coldly rational, and never gives you the sly look that shows something else is going on. Is Turner a latter-day Swift? I hope so.


June 01 2012

Developer Week in Review: The overhead of insecure infrastructure

I'm experiencing a slow death by pollen this week, which has prompted me to ponder some of the larger issues of life. In particular, I was struck by the news that an FPGA chip widely used in military applications has an easily exploitable back door.

There is open discussion at the moment about whether this was a deliberate attempt by a certain foreign government (*cough* China *cough*) to gain access to sensitive data and possibly engage in Stuxnet-like mischief, or just normal carelessness on the part of chip designers who left a debugging path open and available. Either way, there's a lot of hardware out there walking around with its fly down, so to speak.

As developers, we put a lot of time and effort into trying to block the acts of people with bad intent. At my day job, we have security "ninjas" on each team that take special training and devote a fair amount of their time to keeping up with the latest exploits and remediations. Web developers constantly have to guard against perils such as cross-site scripting and SQL injection hacks. Mobile developers need to make sure their remote endpoints are secure and provide appropriate authentication.
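
For the SQL injection half of that burden, the defense is one discipline: never splice user input into a query string; pass it as a parameter. A standard illustration using Python's built-in sqlite3 module (the users table is hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    name = "x' OR '1'='1"   # hostile input

    # Vulnerable: splicing input into SQL lets the attacker rewrite the query.
    rows = conn.execute(
        "SELECT role FROM users WHERE name = '%s'" % name).fetchall()
    print(rows)   # [('admin',)]: the bogus name matched every row

    # Safe: a parameterized query treats the input as data, never as SQL.
    rows = conn.execute(
        "SELECT role FROM users WHERE name = ?", (name,)).fetchall()
    print(rows)   # []: nobody is literally named "x' OR '1'='1"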

The thing is, we shouldn't have to. The underlying platforms and infrastructures we develop on top of should take care of all of this, and leave us free to innovate and create the next insanely great thing. The fact that we have to spend so much of our time building fences rather than erecting skyscrapers is a sign of how badly this basic need has been left unmet.

So why is the development biome so underprotected? I think there are several factors. The first is fragmentation. It's easier to guard one big army base than 1,000 small ones. In the same way, the more languages, operating systems and packages that are in the wild, the more times you have to reinvent the wheel. Rather than focus on making a small number of them absolutely bulletproof (and applying constant vigilance to them), we jump on the flavor of the day, regardless of how much or little effort has been put into reducing the exposed security footprint of our new toy.

The fact that we have independent, massive efforts involved in securing the base operating systems for MacOS, Windows, Linux, BSD, etc., is nothing short of a crime against the development community. Pretty it up any way that suits you with a user interface, but there should (at this point in the lifecycle of operating systems) only be a single, rock-solid operating system that the whole world uses. It is only because of greed, pettiness, and bickering that we have multiple, fragile operating systems, all forgetting to lock their car before they go out to dinner.

Languages are a bit more complex, because there is a genuine need for different languages to match different styles of development and application needs. But, again, the language space is polluted with far too many "me-too" wannabes that distract from the goal of making the developer's security workload as low as possible. The next time you hear about a site that gets pwned by a buffer overrun exploit, don't think "stupid developers!", think "stupid industry!" Any language that allows a developer to leave themselves vulnerable to that kind of attack is a bad language, period!

The other major factor in why things are so bad is that we don't care, evidently. If developers refused to develop on operating systems or languages that didn't supply unattackable foundations, companies such as Apple and Microsoft (and communities such as the Linux kernel devs) would get the message in short order. Instead, we head out to conferences like WWDC eager for the latest bells and whistles, but nary a moment will be spent to think about how the security of the OS could be improved.

Personally, I'm tired of wasting time playing mall security guard, rather than Great Artist. In a world where we had made security a must-have in the infrastructure we build on, rather than in the code we develop, think of how much more amazing code could have been written. Instead, we spend endless time in code reviews, following best practices, and otherwise cleaning up after our security-challenged operating systems, languages, and platforms. Last weekend, we honored (at least in the U.S.) those who have given their lives to physically secure our country. Maybe it's time to demand that those who secure our network and computing infrastructures do as good a job ...

OSCON 2012 — Join the world's open source pioneers, builders, and innovators July 16-20 in Portland, Oregon. Learn about open development, challenge your assumptions, and fire up your brain.

Save 20% on registration with the code RADAR20




February 06 2012

Business-government ties complicate cyber security

From time to time, we like to check in with "Inside Cyber Warfare" author Jeffrey Carr to get his thoughts on the digital security landscape. These conversations often address specific threats, but with the recent release of the second edition of Carr's book, we decided to explore some of the larger concepts shaping this space.

Are corporate and government interests in the U.S. becoming one and the same? That is, an attack on an American business' network may be regarded as an assault on the country itself?

Jeffrey Carr: Due to the dependence of the U.S. government upon private contractors, the insecurity of one impacts the security of the other. The fact is that there are an unlimited number of ways that an attacker can compromise a person, organization or government agency due to the interdependencies and connectedness that exist between both.

Are national network security and media piracy becoming interrelated and confused?

Jeffrey Carr: It has definitely become confused to the point where the Department of Homeland Security (DHS) is now the enforcement arm of the Recording Industry Association of America (RIAA), which I find utterly disgraceful. It's due entirely to the money and power that entertainment industry lobbyists have to wave in front of members of Congress. It has absolutely nothing to do with improving the security of our critical infrastructure or reducing the attack platform used by bad actors.

Flipping this around, how much of a cyber threat does the U.S. pose to other countries?

Jeffrey Carr: The U.S. is probably as capable as, or more capable than, any of the other nation states that engage in cyber operations. It's not a question of "they do it to us, but we don't do it to them." It's a question of how to defend your critical assets in light of the fact that everyone is doing it.

What recent technologies concern you the most?

Jeffrey Carr: We are racing to adopt cloud computing without regard to security. In fact, many customers wrongly assume that the cloud provider is responsible for their data's security when the reverse is true. Not only is security a major problem, but there's no telling where in the world your data may reside since most large cloud providers have server farms scattered around the world. That, in turn, makes the data susceptible to foreign governments that have cause to request legal access to data sitting on servers inside their borders.

Inside Cyber Warfare, 2nd Edition — Jeffrey Carr's second edition of "Inside Cyber Warfare" goes beyond the headlines of attention-grabbing DDoS attacks and takes a deep look inside recent cyber-conflicts, including the use of Stuxnet.

This interview was edited and condensed.


January 12 2012

Four short links: 12 January 2012

  1. Smart Hacking for Privacy -- can mine smart power meter data (or even snoop it) to learn what's on the TV. Wow. (You can also watch the talk). (via Rob Inskeep)
  2. Conditioning Company Culture (Bryce Roberts) -- a short read but thought-provoking. It's easy to create mindless mantras, but I've seen the technique that Bryce describes and (when done well) it's highly effective.
  3. hydrat (Google Code) -- a declarative framework for text classification tasks.
  4. Dynamic Face Substitution (FlowingData) -- Kyle McDonald and Arturo Castro play around with a face tracker and color interpolation to replace their own faces, in real time, with those of celebrities such as Brad Pitt and Paris Hilton. Awesome. And creepy. Amen.

October 03 2011

USA: Occupy Together

The website Occupy Together offers a wealth of information on the social movements catalyzing in many cities in the United States and in other countries around the world against corporate greed and corruption.

-------------------------

oAnth:

this entry is part of the OccupyWallStreet compilation 2011-09/10, here.

September 07 2011

Four short links: 7 September 2011

  1. Comparing Link Attention (Bitly) -- Twitter, Facebook, and direct (email/IM/etc) have remarkably similar patterns of decay of interest. (via Hilary Mason)
  2. Three Ages of Google -- from batch, to scaling through datacenters, and finally now to techniques for real-time scaling. Of interest to everyone interested in low-latency high-throughput transactions. Datacenters have the diameter of a microsecond, yet we are still using entire stacks designed for WANs. Real-time requires low and bounded latencies and our stacks can't provide low latency at scale. We need to fix this problem and towards this end Luiz sets out a research agenda, targeting problems that need to be solved. (via Tim O'Reilly)
  3. eReaders and eBooks (Luke Wroblewski) -- many eye-opening facts. In 2010 Amazon sold 115 Kindle books for every 100 paperback books. 65% of eReader owners use them in bed, in fact 37% of device usage is in bed.
  4. VT220 on a Mac -- dead sexy look. Impressive how many adapters you need to be able to hook a dingy old serial cable up to your shiny new computer.

May 11 2011

How the cloud helps Netflix

As Internet-based companies outgrow their data centers, they're looking at larger cloud-based infrastructures such as those offered by Microsoft, Google, and Amazon. Last year, Netflix made such a transition when it moved some of its services into Amazon's cloud.

In a recent interview, Adrian Cockcroft (@adrianco), cloud architect at Netflix and a speaker at Velocity 2011, talked about what it took to move Netflix to the cloud, why they chose Amazon's platform, and how the company is accommodating the increasing demands of streaming.

Our interview follows.


Why did Netflix choose to migrate to Amazon's cloud?

Adrian Cockcroft: We couldn't build our own data centers fast enough to track our growth rate and global roll out, so we leveraged Amazon's ability to build and run large-scale infrastructure. In doing that, we got extreme agility. For example, when we decided to test world-wide deployment of services, our developers were immediately able to launch large-scale deployments and tests on another continent, with no planning delay.

What architectural changes were required to move from a conventional data center to a cloud environment?

Adrian Cockcroft: We took the opportunity to re-work our apps to a fine-grain SOA-style architecture, where each developer pushes his own auto-scaled service. We made a clean separation of stateful services and stateless business logic, and designed with the assumption that large numbers of systems would fail and that we should keep running without intervention. This was largely about paying down our technical debt and building a scalable web-based product using current best practices.

Velocity 2011, being held June 14-16 in Santa Clara, Calif., offers the skills and tools you need to master web performance and operations.

Save 20% on registration with the code VEL11RAD

What issues are you facing as streaming demand increases?

Adrian Cockcroft: We work with all three "terabit-scale" content delivery networks — Level 3, Limelight, and Akamai. They stream our movies to the end customer, and if there is a problem with one of them, traffic automatically switches to another. We don't see any limits on how much traffic we can stream. We aren't trying to feed everyone in the world from a single central point — it's widely distributed.

Netflix doesn't ask customers to change much on their side (browsers, speeds, etc.) — how do you achieve this level of inclusivity, and do you see it continuing?

Adrian Cockcroft: We have very wide ranging support for streaming devices and expect this to continue. We are working on the HTML5 video tag standards, which may eventually allow DRM-protected playback of movies on any browser with no plugin. We currently depend on Silverlight for Windows and Mac OS, and we don't have a supported DRM mechanism for playback on Linux browsers.

For hardware devices, we work with the chip manufacturers to build Netflix-ready versions of the chipsets used to build TV sets and Blu-ray players. That way we are included in almost all new Internet-connected TV devices.

This interview was edited and condensed.





April 21 2011

Developing countries and Open Compute

During a panel discussion after the recent Facebook Open Compute announcement, a couple of panelists — Jason Waxman, GM in Intel's server platforms group, and Forrest Norrod, VP and GM of Dell's server platform — indicated the project could be beneficial to developing countries. Waxman said:

The reality is, you walk into data centers in emerging countries and it's a 2-kilowatt rack and there's maybe three servers in that rack, and the whole data center is powered inefficiently — their air is going every which way and it's hot, it's cold. It costs a lot. It's not ecologically conscious. By opening up this platform and by building awareness of what the best practices are in how to build a data center, how to make efficient servers and why you should care about building efficient servers and how to densely populate into a rack, there are a lot of places ... that can benefit from this type of information.

In a similar vein, Norrod said:

I think what you're going to see happen here is an opportunity for those Internet companies in the developing world to take a leap forward, jumping over the last 15 years of learnings, and exploiting the most efficient data center and server designs that we have today.

The developing countries angle intrigued me, so I sent an email to Benetech founder and CEO Jim Fruchterman to get his take. Fruchterman's company has a unique focus: apply the "intellectual capital and resources of Silicon Valley" to create solutions around the world for a variety of social problems. Recent projects have focused on human rights, literacy, and the development of the Miradi nature conservation project software.

His verdict? While efficient data centers are useful, they're secondary to pressing issues like infrastructure, reliable power, and basic literacy.

Fruchterman's reply follows:

While I'm excited about an open initiative coming from Facebook, I'm not so sure that its impact on developing countries will be all that significant in the foreseeable future. Watching the announcement video, I didn't find these words coming out of the Facebook teams' mouths, but instead the Intel and Dell panelists. And, their comments focused mostly on India, China and Brazil — not exactly your typical "developing" countries.

The good news is, of course, that these open plans show how to reduce energy and acquisition costs per compute cycle. So, anyone building a data center can build a cheaper and lower power data center. That's great. But, building data centers is probably not on the top of the wish lists of most developing countries. Telecom and broadband infrastructure, reliable power (at the grid level, not the server power supply level), end-user device cost and reliability, localization, and even basic literacy seem to be more crucial to these communities. And, most of these factors are prerequisites to investing significantly in data centers.

Of course, our biggest concerns around Facebook are around free speech, anonymous speech, and the protection of human rights defenders. Facebook is increasingly a standard part of global user experience, and we think that it's crucial that Facebook get in front of these concerns, rather than being inadvertently a tool of repressive governments. We're glad that groups like the Electronic Frontier Foundation (EFF) have been working with Facebook and seeing progress, but we need more.

Fruchterman's response was edited and condensed.






April 07 2011

What Facebook's Open Compute Project means

Today, Jonathan Heiliger, VP of Operations at Facebook, and his team announced the Open Compute Project, releasing their data center hardware stack as open source. This is a revolutionary project, and I believe it's one of the most important in infrastructure history. Let me explain why.

The way we operate systems and datacenters at web scale is fundamentally different than the world most server vendors seem to design their products to run in.

Web-scale systems focus on the entire system as a whole. In our world, individual servers are not special, and treating them as special can be dangerous. We expect servers to fail and we increasingly rely on the software we write to manage those failures. In many cases, the most valuable thing we can do when hardware fails is to simply provision a new one as quickly as possible. That means having enough capacity to do that, a way of programmatically managing the infrastructure, and an easy way to replace the failed components.

The server vendors have been slow to make this transition because they have been focused on individual servers, rather than systems as a whole. What we want to buy is racks of machines, with power and networking preconfigured, which we can wheel in, bolt down, and plug in. For the most part we don't care about logos, faceplates, and paint jobs. We won't use complex integrated proprietary management interfaces, and we haven't cared about video cards in a long time ... although it is still very hard to buy a server without them.

This gap is what led Google to build their own machines optimized for their own applications in their own datacenters. When Google did this, they gained a significant competitive advantage. Nobody else could deploy as much compute power as quickly and efficiently. To compete with Google's developers you also must compete with their operations and data center teams. As Tim O'Reilly said: "Operations is the new secret sauce."

When Jonathan and his team set out to build Facebook's new datacenter in Oregon, they knew they would have to do something similar to achieve the needed efficiency. Jonathan says that the Prineville, Ore. data center uses 38% less energy to do the same work as Facebook's existing facilities, while costing 24% less.

Facebook then took the revolutionary step of releasing the designs for most of the hardware in the datacenter under the Creative Commons license. They released everything from the power supply and battery backup systems to the rack hardware, motherboards, chassis, battery cabinets, and even their electrical and mechanical construction specifications.

This is a gigantic step for open source hardware, for the evolution of the web and cloud computing, and for infrastructure and operations in general. This is the beginning of a shift that began with open source software, from vendors and consumers to a participatory and collaborative model. Jonathan explains:

"The ultimate goal of the Open Compute Project, however, is to spark a collaborative dialogue. We're already talking with our peers about how we can work together on Open Compute Project technology. We want to recruit others to be part of this collaboration — and we invite you to join us in this mission to collectively develop the most efficient computing infrastructure possible."

At the announcement this morning, Graham Weston of Rackspace announced that they would be participating in Open Compute, which is an ideal complement to the OpenStack cloud computing projects. Representatives from Dell and HP spoke at the announcement and also said that they would participate in this new project. The conversation has already begun.

December 09 2010

Four short links: 9 December 2010

  1. Lowersrc -- simple dynamic image placeholders for wireframing. Open source Javascript. (via Lachlan Hardy on Twitter)
  2. In Praise of the Long Form (Julie Starr) -- It can be time consuming sifting through the daily wall of news stories and blogposts to find the handful of gems that genuinely interest or move you. These services, which recommend only a handful of excellent journalism pieces each day, can help. The act of selection, the human process of filtering, remains a valuable service.
  3. Glu -- LinkedIn's application deployment framework. (via Pete Warden)
  4. The Risky Cloud (Simon Phipps) -- While the Internet itself may have a high immunity to attacks, a monoculture hosted on it does not. We might be able to survive a technical outage, but a political outage or a full-fledged termination of service are likely to put a company that's relied on the cloud for critical infrastructure out of business.

July 16 2010

Four short links: 16 July 2010

  1. GPL WordPress Theme Angst -- a podcaster brought together Matt Mullenweg (creator of WordPress) and Chris Pearson (creator of the Thesis theme). Chris doesn't believe WordPress's GPL should be inherited by themes. Matt does, and the SFLC and others agree. The conversation is interesting because (a) they and the podcaster do a great job of keeping it civil, on-track, and purposeful, and (b) Chris is unswayed. Chris built on GPLed software without realizing it, and is having trouble with the implications. Chris's experience, feelings, and thought processes are replicated all around the world. This is like a usability bug for free software. (via waxpancake on Twitter)
  2. 480G SSD Drive -- for a mere $1,599.99. If you wonder why everyone's madly in love with parallel, it's because of this order-of-magnitude+ difference in price between regular hard drives and the Fast Solution; a quick back-of-the-envelope follows this list. Right now, the only way to rapidly and affordably crunch a ton of data is to go parallel. (via marcoarment on Twitter)
  3. Pandas and Lobsters: Why Google Cannot Build Social Software -- this resonates with me. The primary purpose of a social application is connecting with others, seeing what they're up to, and maybe even having some small, fun interactions that though not utilitarian are entertaining and help us connect with our own humanity. Google apps are for working and getting things done; social apps are for interacting and having fun. Read it for the lobster analogy, which is gold.
  4. Wayfinder -- The majority of all the location and navigation related software developed at Wayfinder Systems, a fully owned Vodafone subsidiary, is made available publicly under a BSD licence. This includes the distributed back-end server, tools to manage the server cluster and map conversion as well as client software for e.g. Android, iPhone and Symbian S60. Technical documentation is available in the wiki and discussions around the software are hosted in the forum. Interesting, and out of the blue. At the very least, there's some learning to be done by reading the server infrastructure. (via monkchips on Twitter)
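
On item 2, the back-of-the-envelope promised above. The SSD figure comes from the linked listing; the spinning-disk price is an assumed ballpark for a commodity 2TB drive in late 2010, not a quoted number:

    # SSD price from the link above; the HDD price is an assumed 2010
    # ballpark for a commodity 2 TB drive, not a quoted figure.
    ssd_price, ssd_gb = 1599.99, 480
    hdd_price, hdd_gb = 140.00, 2000

    ssd_per_gb = ssd_price / ssd_gb   # ~$3.33/GB
    hdd_per_gb = hdd_price / hdd_gb   # ~$0.07/GB
    print(f"SSD ${ssd_per_gb:.2f}/GB vs HDD ${hdd_per_gb:.2f}/GB: "
          f"{ssd_per_gb / hdd_per_gb:.0f}x gap")

Under those assumptions the gap is roughly 48x, comfortably past an order of magnitude, which is why striping data across many cheap spindles and reading them in parallel was the affordable route to throughput.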

March 16 2010

Google Fiber and the FCC National Broadband Plan

I've puzzled over Google's Fiber project ever since they announced it. It seemed too big, too hubristic (even for a company that's already big and has earned the right to hubris)--and also not a business Google would want to be in. Providing the "last mile" of Internet service is a high-cost, low-payoff business that I'm glad I escaped (a friend and I seriously considered starting an ISP back in '92, until we asked ourselves "How would we deal with customers?").


But the FCC's announcement of their plan to widen broadband Internet access in the US (the "National Broadband Plan") puts Google Fiber in a new context. The FCC's plans are cast in terms of upgrading and expanding the network infrastructure. That's a familiar debate, and Google is a familiar participant. This is really just an extension of the "network neutrality" debate that has been going on in fits and starts over the past few years.


Google has been outspoken in their support for the idea that network carriers shouldn't discriminate between different kinds of traffic. The established Internet carriers have largely opposed network neutrality, arguing that they can't afford to build the kind of high-bandwidth networks that are required for delivering video and other media. While the debate over network neutrality has quieted down recently, the issues are still floating out there, and no less important. Will the networks of the next few decades be able to handle whatever kinds of traffic we want to throw at them?


In the context of network neutrality, and in the context of the FCC's still unannounced (and certain to be controversial) plans, Google Fiber is the trump card. It's often been said that the Internet routes around damage. Censorship is one form of damage; non-neutral networks are another. Which network would you choose? One that can't carry the traffic you want, or one that can? Let's get concrete: if you want video, would you choose a network that only delivers real-time video from providers who have paid additional bandwidth charges to your carrier? Google's core business is predicated upon the availability of richer and richer content on the net. If they can ensure that all the traffic that people want can be carried, they win; if they can't, if the carriers mediate what can and can't be carried, they lose. But Google Fiber ensures that our future networks will indeed be able to "route around damage", and makes what the other carriers do irrelevant. Google Fiber essentially tells the carriers: "If you don't build the network we need, we will; you will either move with the times, or you won't survive."


Looked at this way, a non-neutral network requires a weird kind of collusion. Deregulating the carriers by allowing them to charge premium prices for high-bandwidth services only works as long as all the carriers play the same game, and all raise similar barriers against high-bandwidth traffic. As soon as one carrier says "Hey, we have a bigger vision; we're not going to put limits on what you want to do," the game is over. You'd be a fool not to use that carrier. You want live high-definition video conferencing? You got it. You want 3D video, requiring astronomical data rates? You want services we haven't imagined yet? You can get those too. AT&T and Verizon don't like it? Tough; it's a free market, and if you offer a non-competitive product, you lose. The problem with the entrenched carriers' vision is that, if you discriminate against high-bandwidth services, you'll kill those services off before they can even be invented.
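
To put rough numbers on "astronomical data rates": the uncompressed figure falls straight out of frame geometry, while the compressed figures are assumed ballparks for H.264 streams of the era, and the 3D number naively doubles the HD stream:

    def raw_gbps(width, height, bits_per_pixel, fps):
        """Uncompressed video bandwidth in gigabits per second."""
        return width * height * bits_per_pixel * fps / 1e9

    # Raw HD falls straight out of frame geometry: ~3.0 Gbit/s.
    print(f"raw 1080p60: {raw_gbps(1920, 1080, 24, 60):.1f} Gbit/s")

    hd_stream = 8              # assumed H.264 1080p stream, Mbit/s
    stereo_3d = 2 * hd_stream  # naive assumption: two full views per viewer
    print(f"compressed HD ~{hd_stream} Mbit/s; naive stereo 3D ~{stereo_3d} Mbit/s")

Even compressed, two or three simultaneous viewers in one household outgrow the few-megabit connections that are typical today, and anything approaching the raw rate is out of reach entirely.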


The U.S. is facing huge problems with decaying infrastructure. At one time, we had the best highway system, the best phone system, the most reliable power grid; no longer. Public funding hasn't solved the problem; in these tea-party days, nobody's willing to pay the bills, and few people understand why the bills have to be as large as they are. (If you want some insight into the problems of decaying infrastructure, here's an op-ed piece on Pennsylvania's problems repairing its bridges.) Neither has the private sector, where short-term gain almost always wins over the long-term picture.


But decaying network infrastructure is a threat to Google's core business, and they aren't going to stand by idly. Even if they don't intend to become a carrier themselves, as Eric Schmidt has stated, they could easily change their minds if the other carriers don't keep up. There's nothing like competition (or even the threat of competition) to make the markets work.


We're looking at a rare conjunction. It's refreshing to see a large corporation talk about creating the infrastructure they need to prosper--even if that means getting into a new kind of business. To rewrite the FCC Chairman's metaphor, it's as if GM and Ford were making plans to upgrade the highway system so they could sell better cars. It's an approach that's uniquely Googley; it's the infrastructure analog to releasing plugins that "fix" Internet Explorer for HTML5. "If it's broken and you won't fix it, we will." That's a good message for the carriers to hear. Likewise, it's refreshing to see the FCC, which has usually been a dull and lackluster agency, taking the lead in such a critical area. An analyst quoted by the Times says "Once again, the FCC is putting the service providers on the spot." As well they should. A first-class communications network for all citizens is essential if the U.S. is going to be competitive in the coming decades. It's no surprise that Google and the FCC understand this, but I'm excited by their commitment to building it.

