
February 24 2014

Four short links: 24 February 2014

  1. Understanding Understanding Source Code with Functional Magnetic Resonance Imaging (PDF) — we observed 17 participants inside an fMRI scanner while they were comprehending short source-code snippets, which we contrasted with locating syntax errors. We found a clear, distinct activation pattern of five brain regions, which are related to working memory, attention, and language processing. I’m wary of fMRI studies but welcome more studies that try to identify what we do when we code. (Or, in this case, identify syntax errors—if they wanted to observe real programming, they’d watch subjects creating syntax errors) (via Slashdot)
  2. Oobleck Security (O’Reilly Radar) — if you missed or skimmed this, go back and reread it. The future will be defined by the objects that turn on us. 50s scifi was so close but instead of human-shaped positronic robots, it’ll be our cars, HVAC systems, light bulbs, and TVs. Reminds me of the excellent Old Paint by Megan Lindholm.
  3. Google Readying Android Watch — just as Samsung moves away from Android for smart watches and I buy my wife and me a Pebble watch each for our anniversary. Watches are in the same space as Goggles and other wearables: solutions hunting for a problem, a use case, a killer tap. “OK Google, show me offers from brands I love near me” isn’t it (and is a low-lying operating system function anyway, not a userland command).
  4. Most Winning A/B Test Results are Illusory (PDF) — Statisticians have known for almost a hundred years how to ensure that experimenters don’t get misled by their experiments [...] I’ll show how these methods ensure equally robust results when applied to A/B testing.

February 04 2014

Four short links: 4 February 2014

  1. UX Fundamentals, Crash Course — 31 posts introducing the fundamental practices and mindsets of UX.
  2. Why We Love Persona And You Should Too — Mozilla’s identity system is an interesting offering. Fancy that, you might have single-sign on without Single Pwn-On.
  3. Raspberry Pi As Test Harness — Pi accessory maker uses Pis to automate the testing of his … it’s Pis all the way down.
  4. The Holodeck Begins to Take Shape — displays, computation, and interesting input devices, are coming together in various guises.

August 20 2013

Four short links: 22 August 2013

  1. bletchley (Google Code) — Bletchley is currently in the early stages of development and consists of tools which provide: Automated token encoding detection (36 encoding variants); Passive ciphertext block length and repetition analysis; Script generator for efficient automation of HTTP requests; A flexible, multithreaded padding oracle attack library with CBC-R support.
  2. Hackers of the RenaissanceFour centuries ago, information was as tightly guarded by intellectuals and their wealthy patrons as it is today. But a few episodes around 1600 confirm that the Hacker Ethic and its attendant emphasis on open-source information and a “hands-on imperative” was around long before computers hit the scene. (via BoingBoing)
  3. Maker Camp 2013: A Look Back (YouTube) — This summer, over 1 million campers made 30 cool projects, took 6 epic field trips, and met a bunch of awesome makers.
  4. huxley (Github) — Watches you browse, takes screenshots, tells you when they change. Huxley is a test-like system for catching visual regressions in Web applications. (via Alex Dong)

December 10 2012

Four short links: 10 December 2012

  1. RE2: A Principled Approach to Regular Expressions — a regular expression engine without backtracking, so without the potential for exponential pathological runtimes.
  2. Mobile is Entertainment (Luke Wroblewski) — 79% of mobile app time is spent on fun, even as desktop web use is declining.
  3. Five UX Research Pitfalls (Elaine Wherry) — I live this every day: Sometimes someone will propose an idea that doesn’t seem to make sense. While your initial reaction may be to be defensive or to point out the flaws in the proposed A/B study, you should consider that your buddy is responding to something outside your view and that you don’t have all of the data.
  4. Building a Keyboard: Part 1 (Jesse Vincent) — and Part 2 and general musings on the topic of keyboards. Jesse built his own. Yeah, he’s that badass.

October 17 2012

Tools for test-driven development in Scala

Scala, a language designed for well-structured and readable programs, is richly provisioned with testing frameworks. The community has adopted test-driven development (TDD) and behavior-driven development (BDD) with zeal. These represent the baseline for trustworthy code development today.

TDD and BDD expand beyond the traditional model of incorporating a test phase into the development process. Most programmers know that ad hoc debugging is not sufficient and that they need to run tests on isolated functions (unit testing) to make sure that a change doesn’t break anything (regression testing). But testing libraries available for Scala, in supporting TDD and BDD, encourage developers to write tests before they even write the code being tested.

Tests can be expressed in human-readable text reminiscent of natural language (although you can’t stretch the comparison too far) so that you are documenting what you want your code to do while expressing the test that ensures that code ultimately will meet your requirements.

Daniel Hinojosa, author of Testing in Scala, describes the frameworks and their use for testing, TDD, and BDD in this interview.

Highlights from our discussion include:

  • The special advantages of test frameworks for Scala. [Discussed at the 0:10 mark]
  • The two main testing frameworks, ScalaTest and Specs2. It’s worth studying both of these frameworks, but you’ll probably ultimately stick to one based on programming style and how you want to do mocking. [Discussed at the 2:12 mark]
  • Mocking simply means removing operations that will take a long time or require outside support, such as a database. When testing, you want to fool your code into believing that the operation took place while actually simulating it. This is especially critical for TDD, because tests are so extensive and run so regularly. [Discussed at the 04:01 mark]
  • How the new ScalaMock library extends the abilities to mock parts of the system. This is an emerging technology. [Discussed at the 7:36 mark]
  • Generating random input test data. You can actually make your code more robust by throwing garbage values at it rather than by planning what data to input, because a programmer usually fails to anticipate some of the data that will be encountered in production use. For instance, you might not realize how large the input data will be, or might forget to include negative numbers. Scala gives you a full range of control, from specifying precise values to allowing completely random input (a rough cross-language sketch follows this list). [Discussed at the 8:24 mark]
  • Looking toward the future of Scala testing. [Discussed at the 10:38 mark]
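
The random-input approach described above is often called property-based testing. The interview is about Scala's tooling, but as a rough cross-language illustration of the same idea, here is a minimal property-based test in Python using the hypothesis library (an analogy only, not the Scala frameworks discussed in the interview):

```python
# Hypothesis generates many random integer lists, including edge cases a
# programmer might forget (empty lists, negatives, very large values),
# and checks that the stated property holds for every one of them.
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    once = sorted(xs)
    twice = sorted(once)
    assert once == twice  # sorting an already-sorted list changes nothing
```

Run under pytest, this exercises the property against many generated inputs rather than a handful of hand-picked examples.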

You can view the entire conversation in the following video:


September 09 2012

The many sides to shipping a great software project

Chris Vander Mey, CEO of Scaled Recognition, and author of a new O’Reilly book, Shipping Greatness, lays out in this video some of the deep lessons he learned during his years working on some very high-impact and high-priority projects at Google and Amazon.

Chris takes a very expansive view of project management, stressing the crucial decisions and attitudes that leaders need to take at every stage from the team’s initial mission statement through the design, coding, and testing to the ultimate launch. By merging technical, organizational, and cultural issues, he unravels some of the magic that makes projects successful.

Highlights from the full video interview include:

  • Some of the projects Chris has shipped. [Discussed at the 0:30 mark]
  • How to listen to your audience while giving a presentation. [Discussed at the 1:24 mark]
  • Deadlines and launches. [Discussed at the 6:40 mark]
  • Importance of keeping team focused on user experience of launch. [Discussed at the 12:15 mark]
  • Creating an API, and its relationship to requirements and Service Oriented Architectures. [Discussed at the 15:27 mark]
  • What integration testing can accomplish. [Discussed at the 22:36 mark]

You can view the entire conversation in the following video:

July 20 2012

Data Jujitsu: The art of turning data into product

Having worked in academia, government and industry, I’ve had a unique opportunity to build products in each sector. Much of this product development has been around building data products. Just as methods for general product development have steadily improved, so have the ideas for developing data products. Thanks to large investments in the general area of data science, many major innovations (e.g., Hadoop, Voldemort, Cassandra, HBase, Pig, Hive, etc.) have made data products easier to build. Nonetheless, data products are unique in that they are often extremely difficult, and seemingly intractable for small teams with limited funds. Yet, they get solved every day.

How? Are the people who solve them superhuman data scientists who can come up with better ideas in five minutes than most people can in a lifetime? Are they magicians of applied math who can cobble together millions of lines of code for high-performance machine learning in a few hours? No. Many of them are incredibly smart, but meeting big problems head-on usually isn’t the winning approach. There’s a method to solving data problems that avoids the big, heavyweight solution and instead concentrates on building something quickly and iterating. Smart data scientists don’t just solve big, hard problems; they also have an instinct for making big problems small.

We call this Data Jujitsu: the art of using multiple data elements in clever ways to solve iterative problems that, when combined, solve a data problem that might otherwise be intractable. It’s related to Wikipedia’s definition of the ancient martial art of jujitsu: “the art or technique of manipulating the opponent’s force against himself rather than confronting it with one’s own force.”

How do we apply this idea to data? What is a data problem’s “weight,” and how do we use that weight against itself? These are the questions that we’ll work through in the subsequent sections.

To start, for me, a good definition of a data product is a product that facilitates an end goal through the use of data. It’s tempting to think of a data product purely as a data problem. After all, there’s nothing more fun than throwing a lot of technical expertise and fancy algorithmic work at a difficult problem. That’s what we’ve been trained to do; it’s why we got into this game in the first place. But in my experience, meeting the problem head-on is a recipe for disaster. Building a great data product is extremely challenging, and the problem will always become more complex, perhaps intractable, as you try to solve it.

Before investing in a big effort, you need to answer one simple question: Does anyone want or need your product? If no one wants the product, all the analytical work you throw at it will be wasted. So, start with something simple that lets you determine whether there are any customers. To do that, you’ll have to take some clever shortcuts to get your product off the ground. Sometimes, these shortcuts will survive into the finished version because they represent some fundamentally good ideas that you might not have seen otherwise; sometimes, they’ll be replaced by more complex analytic techniques. In any case, the fundamental idea is that you shouldn’t solve the whole problem at once. Solve a simple piece that shows you whether there’s an interest. It doesn’t have to be a great solution; it just has to be good enough to let you know whether it’s worth going further (e.g., a minimum viable product).

Here’s a trivial example. What if you want to collect a user’s address? You might consider a free-form text box, but writing a parser that can identify a name, street number, apartment number, city, zip code, etc., is a challenging problem due to the complexity of the edge cases. Users don’t necessarily put in separators like commas, nor do they necessarily spell states and cities correctly. The problem becomes much simpler if you do what most web applications do: provide separate text areas for each field, and make states drop-down boxes. The problem becomes even simpler if you can populate the city and state from a zip code (or equivalent).

Now for a less trivial example. A LinkedIn profile includes a tremendous amount of information. Can we use a profile like this to build a recommendation system for conferences? The answer is “yes.” But before answering “how,” it’s important to step back and ask some fundamental questions:

A) Does the customer care? Is there a market fit? If there isn’t, there’s no sense in building an application.

B) How long do we have to learn the answer to Question A?

We could start by creating and testing a full-fledged recommendation engine. This would require an information extraction system, an information retrieval system, a model training layer, a front end with a well-designed user interface, and so on. It might take well over 1,000 hours of work before we find out whether the user even cares.

Instead, we could build a much simpler system. Among other things, the LinkedIn profile lists books.

[Figure: Book recommendations from a LinkedIn profile]

Books have ISBN numbers, and ISBN numbers are tagged with keywords. Similarly, there are catalogs of events that are also cataloged with keywords (Lanyrd is one). We can do some quick and dirty matching between keywords, build a simple user interface, and deploy it in an ad slot to a limited group of highly engaged users. The result isn’t the best recommendation system imaginable, but it’s good enough to get a sense of whether the users care. Most importantly, it can be built quickly (e.g., in a few days, if not a few hours). At this point, the product is far from finished. But now you have something you can test to find out whether customers are interested. If so, you can then gear up for the bigger effort. You can build a more interactive user interface, add features, integrate new data in real time, and improve the quality of the recommendation engine. You can use other parts of the profile (skills, groups and associations, even recent tweets) as part of a complex AI or machine learning engine to generate recommendations.
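
The “quick and dirty matching” can be as simple as scoring each event by keyword overlap with the books on a profile. This is a sketch under assumed inputs (keyword sets already extracted from ISBNs and an event catalog), not LinkedIn’s actual system:

```python
def overlap_score(profile_keywords, event_keywords):
    """Jaccard-style overlap between two keyword sets."""
    profile, event = set(profile_keywords), set(event_keywords)
    if not profile or not event:
        return 0.0
    return len(profile & event) / len(profile | event)

def recommend_events(profile_keywords, events, k=3):
    """events: dict mapping event name -> keyword set (assumed input)."""
    ranked = sorted(events.items(),
                    key=lambda item: overlap_score(profile_keywords, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```

Crude, but enough to put in an ad slot and see whether anyone clicks.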

The key is to start simple and stay simple for as long as possible. Ideas for data products tend to start simple and become complex; if they start complex, they become impossible. But starting simple isn’t always easy. How do you solve individual parts of a much larger problem? Over time, you’ll develop a repertoire of tools that work for you. Here are some ideas to get you started.

Use product design

One of the biggest challenges of working with data is getting the data in a useful form. It’s easy to overlook the task of cleaning the data and jump to trying to build the product, but you’ll fail if getting the data into a usable form isn’t the first priority. For example, let’s say you have a simple text field into which the user types a previous employer. How many ways are there to type “IBM”? A few dozen? In fact, thousands: everything from “IBM” and “I.B.M.” to “T.J. Watson Labs” and “Netezza.” Let’s assume that to build our data product it’s necessary to have all these names tied to a common ID. One common approach to disambiguate the results would be to build a relatively complex artificial intelligence engine, but this would take significant time. Another approach would be to have a drop-down list of all the companies, but this would be a horrible user experience due to the length of the list and limited flexibility in choices.

What about Data Jujitsu? Is there a much simpler and more reliable solution? Yes, but not in artificial intelligence. It’s not hard to build a user interface that helps the user arrive at a clean answer. For example, you can:

  • Support type-ahead, encouraging the user to select the most popular term.
  • Prompt the user with “did you mean … ?”
  • If at this point you still don’t have anything usable, ask the user for more help: Ask for a stock ticker symbol or the URL of the company’s home page.

The point is to have a conversation rather than just a form. Engage the user to help you, rather than relying on analysis. You’re not just getting the user more involved (which is good in itself), you’re getting clean data that will simplify the work for your back-end systems. As a matter of practice, I’ve found that trying to solve a problem on the back end is 100-1,000 times more expensive than on the front end.
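
A minimal sketch of the type-ahead idea: match the user’s prefix against a list of canonical company names and rank by popularity, so the front end nudges people toward clean, common identifiers. The names and counts here are purely illustrative:

```python
def typeahead(prefix, canonical_names, popularity, limit=5):
    """Suggest canonical company names matching a typed prefix, most popular first."""
    prefix = prefix.strip().lower()
    matches = [name for name in canonical_names
               if name.lower().startswith(prefix)]
    return sorted(matches, key=lambda n: popularity.get(n, 0), reverse=True)[:limit]

# Illustrative data only.
names = ["IBM", "IBM Research", "Intel", "Intuit"]
counts = {"IBM": 12000, "IBM Research": 900, "Intel": 8000, "Intuit": 3000}
print(typeahead("ib", names, counts))  # ['IBM', 'IBM Research']
```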


When in doubt, use humans

As technologists, we are predisposed to look for scalable technical solutions. We often jump to technical solutions before we know what solutions will work. Instead, see if you can break down the task into bite-size portions that humans can do, then figure out a technical solution that allows the process to scale. Amazon’s Mechanical Turk is a system for posting small problems online and paying people a small amount (typically a couple of cents) for solutions. It’s come to the rescue of many an entrepreneur who needed to get a product off the ground quickly but didn’t have months to spend on developing an analytical solution.

Here’s an example. A camera company wanted to test a product that would tell restaurant owners how many tables were occupied or empty during the day. If you treat this problem as an exercise in computer vision, it’s very complex. It can be solved, but it will take some PhDs, lots of time, and large amounts of computing power. But there’s a simpler solution. Humans can easily look at a picture and tell whether or not a table has anyone seated at it. So the company took images at regular intervals and used humans to count occupied tables. This gave them the opportunity to test their idea and determine whether the product was viable before investing in a solution to a very difficult problem. It also gave them the ability to find out what their customers really wanted to know: just the number of occupied tables? The average number of people at each table? How long customers stayed at the table? That way, when they start to build the real product, using computer vision techniques rather than humans, they know what problem to solve.

Humans are also useful for separating valid input from invalid. Imagine building a system to collect recipes for an online cookbook. You know you’ll get a fair amount of spam; how do you separate out the legitimate recipes? Again, this is a difficult problem for artificial intelligence without substantial investment, but a fairly simple problem for humans. When getting started, we can send each page to three people via Mechanical Turk. If all agree that the recipe is legitimate, we can use it. If all agree that the recipe is spam, we can reject it. And if the vote is split, we can escalate by sending it to another set of reviewers, or by giving those reviewers additional data that helps them make a better assessment. The key thing is to watch for the signals the humans use to make their decisions. When we’ve identified those signals, we can start building more complex automated systems. By using humans to solve the problem initially, we can learn a great deal about the problem at a very low cost.
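
The review logic itself is easy to sketch: unanimous agreement accepts or rejects, anything else escalates for another round. This covers only the voting rule; collecting the judgments from Mechanical Turk is assumed to happen elsewhere:

```python
def judge_recipe(votes):
    """votes: booleans from human reviewers (True = legitimate, False = spam)."""
    if all(votes):
        return "accept"
    if not any(votes):
        return "reject"
    return "escalate"  # split vote: send to more reviewers, with more context

print(judge_recipe([True, True, True]))   # accept
print(judge_recipe([True, False, True]))  # escalate
```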

Aardvark (a promising startup that was acquired by Google) took a similar path. Their goal was to build a question and answer service that routed users’ questions to real people with “inside knowledge.” For example, if a user wanted to know a good restaurant for a first date in Palo Alto, Calif., Aardvark would route the question to people living in the broader Palo Alto area, then compile the answers. They started by building tools that would allow employees to route the questions by hand. They knew this wouldn’t scale, but it let them learn enough about the routing problem to start building a more automated solution. The human solution not only made it clear what they needed to build, it proved that the technical solution was worth the effort and bought them the time they needed to build it.

In both cases, if you were to graph the work expended versus time, it would look something like this:

[Figure: Work vs. time graph]

Ignore the fact that I’ve violated a fundamental law of data science and presented a graph without scales on the axes. The point is that technical solutions will always win in the long run; they’ll always be more efficient, and even a poor technical solution is likely to scale better than using humans to answer questions. But when you’re getting started, you don’t care about the long run. You just want to survive long enough to have a long run, to prove that your product has value. And in the short term, human solutions require much less work. Worry about scaling when you need to.

Be opportunistic for wins

I’ve stressed building the simplest possible thing, even if you need to take shortcuts that appear to be extreme. Once you’ve got something working and you’ve proven that users want it, the next step is to improve the product. Amazon provides a good example. Back when they started, Amazon pages contained product details, reviews, the price, and a button to buy the item. But what if the customer isn’t sure he’s found what he wants and wants to do some comparison shopping? That’s simple enough in the real world, but in the early days of Amazon, the only alternative was to go back to the search engine. This is a “dead end flow”: Once the user has gone back to the search box, or to Google, there’s a good chance that he’s lost. He might find the book he wants at a competitor, even if Amazon sells the same product at a better price.

Amazon needed to build pages that channeled users into other related products; they needed to direct users to similar pages so that they wouldn’t lose the customer who didn’t buy the first thing he saw. They could have built a complex recommendation system, but opted for a far simpler system. They did this by building collaborative filters to add “People who viewed this product also viewed” to their pages. This addition had a profound effect: Users can do product research without leaving the site. If you don’t see what you want at first, Amazon channels you into another page. It was so successful that Amazon has developed many variants, including “People who bought this also bought” (so you can load up on accessories), and so on.

The collaborative filter is a great example of starting with a simple product that becomes a more complex system later, once you know that it works. As you begin to scale the collaborative filter, you have to track the data for all purchases correctly, build the data stores to hold that data, build a processing layer, develop the processes to update the data, and deal with relevancy issues. Relevance can be tricky. When there’s little data, it’s easy for a collaborative filter to give strange results; with a few errant clicks in the database, it’s easy to get from fashion accessories to power tools. At the same time, there are still ways to make the problem simpler. It’s possible to do the data analysis in a batch mode, reducing the time pressure; rather than compute “People who viewed this also viewed” on the fly, you can compute it nightly (or even weekly or monthly). You can make do with the occasional irrelevant answer (“People who bought leather handbags also bought power screwdrivers”), or perhaps even use Mechanical Turk to filter your pre-computed recommendations. Or even better, ask the users for help.
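
The batch version of “People who viewed this also viewed” can start as nothing more than co-occurrence counting over view sessions, recomputed nightly. A minimal sketch, assuming session logs are already available as lists of product IDs:

```python
from collections import Counter, defaultdict
from itertools import combinations

def coview_counts(sessions):
    """sessions: iterable of lists of product IDs viewed in the same session."""
    counts = defaultdict(Counter)
    for viewed in sessions:
        for a, b in combinations(set(viewed), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def also_viewed(counts, product_id, k=5):
    return [other for other, _ in counts[product_id].most_common(k)]

# Illustrative sessions only.
sessions = [["book1", "book2"], ["book1", "book2", "book3"], ["book2", "book3"]]
print(also_viewed(coview_counts(sessions), "book1"))  # ['book2', 'book3']
```

Because it runs in batch, an occasional odd pairing can be filtered out (by rules, by Mechanical Turk, or by the users themselves) before it ever reaches the page.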

Being opportunistic can be done with analysis of general products, too. The Wall Street Journal chronicles a case in which Zynga was able to rapidly build on a success in their game FishVille. You can earn credits to buy fish, but you can also purchase credits. The Zynga Analytics team noticed that a particular set of fish was being purchased at six times the rate of all the other fish. Zynga took the opportunity to design several similar virtual fish, for which they charged $3 to $4 each. The data showed that they had clearly stumbled onto something: the common trait was translucence, and that was what customers wanted. Using this combination of quick observation and lightweight testing, they were able to add significantly to their profits.

Ground your product in the real world

We can learn more from Amazon’s collaborative filters. What happens when you go into a physical store to buy something, say, headphones? You might look for sale prices, you might look for reviews, but you almost certainly don’t just look at one product. You look at a few, most likely something located near whatever first caught your eye. By adding “People who viewed this product also viewed,” Amazon built a similar experience into the web page. In essence, they “grounded” their virtual experience to a similar one in the real world via data.

LinkedIn’s People You May Know embodies both Data Jujitsu and grounding the product in the real world. Think about what happens when you arrive at a conference reception. You walk around the outer edge until you find someone you recognize, then you latch on to that person until you see some more people you know. At that point, your interaction style changes: Once you know there are friendly faces around, you’re free to engage with people you don’t know. (It’s a great exercise to watch this happen the next time you attend a conference.)

The same kind of experience takes place when you join a new social network. The first data scientists at LinkedIn recognized this and realized that their online world had two big challenges. First, because it is a website, you can’t passively walk around the outer edges of the group. It’s like looking for friends in a darkened room. Second, LinkedIn is fighting for every second you stay on its site; it’s not like a conference where you’re likely to have a drink or two while looking for friends. There’s a short window, really only a few seconds, for you to become engaged. If you don’t see any point to the site, you click somewhere else and you’re gone.

Earlier attempts to solve this problem, such as address book importers or search facilities, imposed too much friction. They required too much work for the poor user, who still didn’t understand why the site was valuable. But our LinkedIn team realized that a few simple heuristics could be used to determine a set of “people you may know.” We didn’t have the resources to build a complete solution. But to get something started, we could run a series of simple queries on the database: “what do you do,” “where do you live,” “where did you go to school,” and other questions that you might ask someone you met for the first time. We also used triangle closing (if Jane is connected to Mark, and Mark is connected to Sally, Sally and Jane have a high likelihood of knowing each other). To test the idea, we built a customized ad that showed each user the three people they were most likely to know. Clicking on one of those people took you to the “add connection” page. (Of course, if you saw the ad again, the results would have been the same, but the point was to quickly test with minimal impact to the user.) The results were overwhelming; it was clear that this needed to become a full-blown product, and it was quickly replicated by Facebook and all other social networks. Only after realizing that we had a hit on our hands did we do the work required to build the sophisticated machinery necessary to scale the results.
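
The triangle-closing heuristic amounts to counting shared connections. A minimal sketch, assuming the connection graph is available as a dictionary of sets (illustrative data, not LinkedIn’s implementation):

```python
from collections import Counter

def people_you_may_know(connections, user, k=3):
    """connections: dict mapping each member to the set of members they know."""
    candidates = Counter()
    for friend in connections.get(user, set()):
        for friend_of_friend in connections.get(friend, set()):
            if friend_of_friend != user and friend_of_friend not in connections[user]:
                candidates[friend_of_friend] += 1  # one more shared connection
    return [person for person, _ in candidates.most_common(k)]

graph = {
    "Jane": {"Mark"},
    "Mark": {"Jane", "Sally"},
    "Sally": {"Mark"},
}
print(people_you_may_know(graph, "Jane"))  # ['Sally']
```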

After People You May Know, our LinkedIn team realized that we could use a similar approach to build Groups You May Like. We built it almost as an exercise, when we were familiarizing ourselves with some new database technologies. It took under a week to build the first version and get it on to the home page, again using an ad slot. In the process, we learned a lot about the limitations and power of a recommendation system. On one hand, the numbers showed that people really loved the product. But additional filter rules were needed: Users didn’t like it when the system recommended political or religious groups. In hindsight, this seems obvious, almost funny, but it would have been very hard to anticipate all the rules we needed in advance. This lightweight testing gave us the flexibility to add rules as we discovered we needed them. Since we needed to test our new databases anyway, we essentially got this product “for free.” It’s another great example of a group that did something successful, then immediately took advantage of the opportunities for further wins.

Give data back to the user to create additional value

By giving data back to the user, you can create both engagement and revenue. We’re far enough into the data game that most users have realized that they’re not the customer, they’re the product. Their role in the system is to generate data, either to assist in ad targeting or to be sold to the highest bidder, or both. They may accept that, but I don’t know anyone who’s happy about it. But giving data back to the user is a way of showing that you’re on their side, increasing their engagement with your product.

How do you give data back to the user? LinkedIn has a product called “Who’s Viewed Your Profile.” This product lists the people who have viewed your profile (respecting their privacy settings, of course), and provides statistics about the viewers. There’s a time series view, a list of search terms that have been used to find you, and the geographical areas in which the viewers are located. It’s timely and actionable data, and it’s addictive. It’s visible on everyone’s home page, and it shows the number of profile views, so it’s not static. Every time you look at your LinkedIn page, you’re tempted to click.

[Figure: “Who’s Viewed Your Profile” box from LinkedIn]

And people do click. Engagement is so high that LinkedIn has two versions: one free, and the other part of the subscription package. This product differentiation benefits the casual user, who can see some summary statistics without being overloaded with more sophisticated features, while providing an easy upgrade path for more serious users.

LinkedIn isn’t the only product that provides data back to the user. Xobni analyzes your email to provide better contact management and help you control your inbox. Mint (acquired by Intuit) studies your credit cards to help you understand your expenses and compare them to others in your demographic. Pacific Gas and Electric has a SmartMeter that allows you to analyze your energy usage. We’re even seeing health apps that take data from your phone and other sensors and turn it into a personal dashboard.

In short, everyone reading this has probably spent the last year or more of their professional life immersed in data. But it’s not just us. Everyone, including users, has awakened to the value of data. Don’t hoard it; give it back, and you’ll create an experience that is more engaging and more profitable for both you and your company.

No data vomit

As data scientists, we prefer to interact with the raw data. We know how to import it, transform it, mash it up with other data sources, and visualize it. Most of your customers can’t do that. One of the biggest challenges of developing a data product is figuring out how to give data back to the user. Giving back too much data in a way that’s overwhelming and paralyzing is “data vomit.” It’s natural to build the product that you would want, but it’s very easy to overestimate the abilities of your users. The product you want may not be the product they want.

When we were building the prototype for “Who’s Viewed My Profile,” we created an early version that showed all sorts of amazing data, with a fantastic ability to drill down into the detail. How many clicks did we get when we tested it? Zero. Why? An “inverse interaction law” applies to most users: The more data you present, the less interaction.

[Figure: Cool interactions graph]

The best way to avoid data vomit is to focus on actionability of data. That is, what action do you want the user to take? If you want them to be impressed with the number of things that you can do with the data, then you’re likely producing data vomit. If you’re able to lead them to a clear set of actions, then you’ve built a product with a clear focus.

Expect unforeseen side effects

Of course, it’s impossible to avoid unforeseen side effects completely, right? That’s what “unforeseen” means. However, unforeseen side effects aren’t a joke. One of the best examples of an unforeseen side effect is “My TiVo Thinks I’m Gay.” Most digital video recorders have a recommendation system for other shows you might want to watch; they’ve learned from Amazon. But there are cases wherein a user has watched a particular show (say “Will & Grace”), and then it recommends other shows with similar themes (“The Ellen DeGeneres Show,” “Queer as Folk,” etc.). Along similar lines, An Anglo friend of mine who lives in a neighborhood with many people from Southeast Asia recently told me that his Netflix recommendations are overwhelmed with Bollywood films.

This sounds funny, and it’s even been used as the basis of a sitcom plot. But it’s a real pain point for users. Outsmarting the recommendation engine once it has “decided” what you want is difficult and frustrating, and you stand a good chance of losing the customer. What’s going wrong? In the case of the Bollywood recommendations, the algorithm is probably overemphasizing the movies that have been watched by the surrounding population. With the TiVo, there’s no easy way to tell the system that it’s wrong. Instead, you’re forced to try to outfox it, and users who have tried have discovered that it’s hard to outthink an intelligent agent that has gotten the wrong idea.

Improving precision and recall

What tools do we have to think about bad results — things like unfortunate recommendations and collaborative filtering gone wrong? Two concepts, precision and recall, let us describe the problem more precisely. Here’s what they mean:

Precision — The ability to provide a result that exactly matches what’s desired. If you’re building a recommendation engine, can you give a good recommendation every time? If you’re displaying advertisements, will every ad result in a click? That’s high precision.

Recall — The set of possible good recommendations. Recall is fundamentally about inventory: Good recall means that you have a lot of good recommendations, or a lot of advertisements that you can potentially show the user.

It’s obvious that you’d like to have both high precision and high recall. For example, if you’re showing a user advertisements, you’d be in heaven if you have a lot of ads to show, and every ad has a high probability of resulting in a click. Unfortunately, precision and recall often work against each other: As precision increases, recall drops, and vice versa. The number of ads that have a 95% chance of resulting in a click is likely to be small indeed, and the number of ads with a 1% chance is obviously much larger.

So, an important issue in product design is the tradeoff between precision versus recall. If you’re working on a search engine, precision is the key, and having a large inventory of plausible search results is irrelevant. Results that will satisfy the user need to get to the top of the page. Low-precision search results yield a poor experience.

On the other hand, low-precision ads are almost harmless (perhaps because they’re low precision, but that’s another matter). It’s hard to know what advertisement will elicit a click, and generally it’s better to show a user something than nothing at all. We’ve seen enough irrelevant ads that we’ve learned to tune them out effectively.

The difference between these two cases is how the data is presented to the user. Search data is presented directly: If you search Google for “data science,” you’ll get 1.16 billion results in 0.47 seconds (as of this writing). The results on the first few pages will all have the term “data science” in them. You’re getting results directly related to your search; this makes intuitive sense. But the rationale behind advertising content is obfuscated. You see ads, but you don’t know why you were shown those ads. Nothing says, “We showed you this ad because you searched for data science and we know you live in Virginia, so here’s the nearest warehouse for all your data needs.” Since the relationship between the ad and your interests is obfuscated, it’s hard to judge an ad harshly for being irrelevant, but it’s also not something you’re going to pay attention to.

Generalizing beyond advertising, when building any data product in which the data is obfuscated (where there isn’t a clear relationship between the user and the result), you can compromise on precision, but not on recall. But when the data is exposed, focus on high precision.

Subjectivity

Another issue to contend with is subjectivity: How does the user perceive the results? One product at LinkedIn delivers a set of up to 10 job recommendations. The problem is that users focus on the bad recommendations rather than the good ones. If nine results are spot on and one is off, the user will leave thinking that the entire product is terrible. One bad experience can spoil a consistently good experience. If, over five web sessions, we show you 49 perfect results in a row, but the 50th one doesn’t make sense, the damage is done. It’s not quite as bad as if the bad result had appeared in the first session, but it’s still hard to recover from. The most common guideline is to strive for a distribution in which there are many good results, a few great ones, and no bad ones.

That’s only part of the story. You don’t really know what the user will consider a poor recommendation. Here are two sets of job recommendations:

[Figures: two example sets of “Jobs You May Be Interested In” recommendations from LinkedIn]

What’s important: The job itself? Or the location? Or the title? Will the user consider a recommendation “bad” if it’s a perfect fit, but requires him to move to Minneapolis? What if the job itself is a great fit, but the user really wants “senior” in the title? You really don’t know. It’s very difficult for a recommendation engine to anticipate issues like these.

Enlisting other users

One jujitsu approach to solving this problem is to flip it around and use the social system to our advantage. Instead of sending these recommendations directly to the user, we can send the recommendations to their connections and ask them to pass along the relevant ones. Let’s suppose Mike sends me a job recommendation that, at first glance, I don’t like. One of these two things is likely to happen:

  • I’ll take a look at the job recommendation and realize it is a terrible recommendation and it’s Mike’s fault.
  • I’ll take a look at the job recommendation and try to figure out why Mike sent it. Mike may have seen something in it that I’m missing. Maybe he knows that the company is really great.

At no time is the system being penalized for making a bad recommendation. Furthermore, the product is producing data that now allows us to better train the models and increase overall precision. Thus, a little twist in the product can make a hard relevance problem disappear. This kind of cleverness lets you take a problem that’s extraordinarily challenging and gives you an edge to make the product work.

[Figure: Referral Center example from LinkedIn]

Ask and you shall receive

We often focus on getting a limited set of data from a user. But done correctly, you can engage the user to give you more useful, high-quality data. For example, if you’re building a restaurant recommendation service, you might ask the user for his or her zip code. But if you also ask for the zip code where the user works, you have much more information. Not only can you make recommendations for both locations, but you can predict the user’s typical commute patterns and make recommendations along the way. You increase your value to the user by giving the user a greater diversity of recommendations.

In keeping with Data Jujitsu, predicting commute patterns probably shouldn’t be part of your first release; you want the simplest thing that could possibly work. But asking for the data gives you the potential for a significantly more powerful and valuable product.

Take care not simply to demand data. You need to explain to the user why you’re asking for it; you need to disarm the user’s resistance to providing more information by telling him that you’re going to provide value (in this case, more valuable recommendations), rather than abusing the data. It’s essential to remember that you’re having a conversation with the user, rather than giving him a long form to fill out.

Anticipate failure

As we’ve seen, data products can fail because of relevance problems arising from the tradeoff between precision and recall. Design your product with the assumption that it will fail. And in the process, design it so that you can preserve the user experience even if it fails.

Two data products that demonstrate extremes in user experience are Sony’s AIBO (a robotic pet), and interactive voice response systems (IVR), such as the ones that answer the phone when you call an airline to change a flight.

Let’s consider the AIBO first. It’s a sophisticated data product. It takes in data from different sensors and uses this data to train models so that it can respond to you. What do you do if it falls over or does something similarly silly, like getting stuck walking into a wall? Do you kick it? Curse at it? No. Instead, you’re likely to pick it up and help it along. You are effectively compensating for when it fails. Let’s suppose instead of being a robotic dog, it was a robot that brought hot coffee to you. If it spilled the coffee on you, what would your reaction be? You might both kick it and curse at it. Why the difference? The difference is in the product’s form and execution. By making the robot a dog, Sony limited your expectations; you’re predisposed to cut the robot slack if it doesn’t perform correctly.

Now, let’s consider the IVR system. This is also a sophisticated data product. It tries to understand your speech and route you to the right person, which is no simple task. When you call one these systems, what’s your first response? If it is voice activated, you might say, “operator.” If that doesn’t work, maybe you’ll say “agent” or “representative.” (I suspect you’ll be wanting to scream “human” into the receiver.) Maybe you’ll start pressing the button “0.” Have you ever gone through this process and felt good? More often than not, the result is frustration.

What’s the difference? The IVR product inserts friction into the process (at least from the customer’s perspective), and limits his ability to solve a problem. Furthermore, there isn’t an easy way to override the system. Users think they’re up against a machine that thinks it is smarter than they are, and that is keeping them from doing what they want. Some could argue that this is a design feature, that adding friction is a way of controlling the amount of interaction with customer service agents. But, the net result is frustration for the customer.

You can give your data product a better chance of success by carefully setting the users’ expectations. The AIBO sets expectations relatively low: A user doesn’t expect a robotic dog to be much other than cute. Let’s think back to the job recommendations. By using Data Jujitsu and sending the results to the recipient’s network, rather than directly to him, we create a product that doesn’t act like an overly intelligent machine that the user is going to hate. By enlisting a human to do the filtering, we put a human face behind the recommendation.

One under-appreciated facet of designing data products is how the user feels after using the product. Does he feel good? Empowered? Or disempowered and dejected? A product like the AIBO, or like job recommendations sent via a friend, is structured so that the user is predisposed toward feeling good after he’s finished.

In many applications, a design treatment that gives the user control over the outcome can go far to create interactions that leave the user feeling good. For example, if you’re building a collaborative filter, you will inevitably generate incorrect recommendations. But you can allow the user to tell you about poor recommendations with a button that allows the user to “X” out recommendations he doesn’t like.

Facebook uses this design technique when they show you an ad. They also give you control to hide the ad, as well as an opportunity to tell them why you don’t think the ad is relevant. The choices they give you range from not being relevant to being offensive. This provides an opportunity to engage users as well as give them control. It turns annoyance into empowerment; rather than being a victim of bad ad targeting, users get to feel that they can make their own recommendations about which ads they will see in the future.

[Figure: Facebook ad targeting and customization]

Putting Data Jujitsu into practice

You’ve probably recognized some similarities between Data Jujitsu and some of the thought behind agile startups: Data Jujitsu embraces the notion of the minimum viable product and the simplest thing that could possibly work. While these ideas make intuitive sense, as engineers, many of us have to struggle against the drive to produce a beautiful, fully-featured, massively complex solution. There’s a reason that Rube Goldberg cartoons are so attractive. Data Jujitsu is all about saying “no” to our inner Rube Goldberg.

I talked at the start about getting clean data. It’s impossible to overstress this: 80% of the work in any data project is in cleaning the data. If you can come up with strategies for data entry that are inherently clean (such as populating city and state fields from a zip code), you’re much better off. Work done up front in getting clean data will be amply repaid over the course of the project.

A surprising amount of Data Jujitsu is about product design and user experience. If you can design your product so that users are predisposed to cut it some slack when it’s wrong (like the AIBO or, for that matter, the LinkedIn job recommendation engine), you’re way ahead. If you can enlist your users to help, you’re ahead on several levels: You’ve made the product more engaging, and you’ve frequently taken a shortcut around a huge data problem.

The key aspect of making a data product is putting the “product” first and “data” second. Saying it another way, data is one mechanism by which you make the product user-focused. With all products, you should ask yourself the following three questions:

  1. What do you want the user to take away from this product?
  2. What action do you want the user to take because of the product?
  3. How should the user feel during and after using your product?

If your product is successful, you will have plenty of time to play with complex machine learning algorithms, large computing clusters running in the cloud, and whatever you’d like. Data Jujitsu isn’t the end of the road; it’s really just the beginning. But it’s the beginning that allows you to get to the next step.



June 08 2012

In defense of frivolities and open-ended experiments

My first child was born just about nine months ago. From the hospital window on that memorable day, I could see that it was surprisingly sunny for a Berkeley autumn afternoon. At the time, I'd only slept about three of the last 38 hours. My mind was making up for the missing haze that usually fills the Berkeley sky. Despite my cloudy state, I can easily recall those moments following my first afternoon lying with my newborn son. In those minutes, he cleared my mind better than the sun had cleared the Berkeley skies.

While my wife slept and recovered, I talked to my boy, welcoming him into this strange world and his newfound existence. I told him how excited I was for him to learn about it all: the sky, planets, stars, galaxies, animals, happiness, sadness, laughter. As I talked, I came to realize how many concepts I understand that he lacked. For every new thing I mentioned, I realized there were 10 more that he would need to learn just to understand that one.

Of course, he need not know specific facts to appreciate the sun's warmth, but to understand what the sun is, he must first learn the pyramid of knowledge that encapsulates our understanding of it: He must learn to distinguish self from other; he must learn about time, scale and distance and proportion, light and energy, motion, vision, sensation, and so on.

[Figure: Anatomy of a sunset]

I mentioned time. Ultimately, I regressed to talking about language, mathematics, history, ancient Egypt, and the Pyramids. It was the verbal equivalent of "wiki walking," wherein I go to Wikipedia to look up an innocuous fact, such as the density of gold, and find myself reading about Mesopotamian religious practices an hour later.

It struck me then how incredible human culture, science, and technology truly are. For billions of years, life was restricted to a nearly memoryless existence, at most relying upon brief changes in chemical gradients to move closer to nutrient sources or farther from toxins.

With time, these basic chemo- and photo-sensory apparatuses evolved; creatures with longer memories — perhaps long enough to remember where food sources were richest — possessed an evolutionary advantage. Eventually, the time scales on which memory operates extended longer; short-term memory became long-term memory, and brains evolved the ability to maintain a memory across an entire biological lifetime. (In fact, how the brain coordinates such memories is a core question of my neuroscientific research.)

[Figure: Brain]

However, memory did not stop there. Language permitted interpersonal communication, and primates finally overcame the memory limitations of a single lifespan. Writing and culture imbued an increased permanence to memory, impervious to the requirement for knowledge to pass verbally, thus improving the fidelity of memory and minimizing the costs of the "telephone game effect."

We are now in the digital age, where we are freed from the confines of needing to remember a phone number or other arbitrary facts. While I'd like to think that we're using this "extra storage" for useful purposes, sadly I can tell you more about minutiae of the Marvel Universe and "Star Wars" canon than will ever be useful (short of an alien invasion in which our survival as a species is predicated on my ability to tell you that Nightcrawler doesn't, strictly speaking, teleport, but rather he travels through another dimension, and when he reappears in our dimension the "BAMF" sound results from some sulfuric gasses entering our dimension upon his return).

But I wiki-walk digress.

So what does all of this extra memory gain us?

Accelerated innovation.

As a scientist, I build my (hopefully) novel research upon the unfathomable number of failures and successes of those who came before me. The common refrain is that we scientists stand on the shoulders of giants. It is for this reason that I've previously argued that research funding is so critical, even for apparently "frivolous" projects. I've got a Google Doc noting impressive breakthroughs that emerged from research that, on the surface, has no "practical" value.

Although you can't legislate innovation or democratize a breakthrough, you can encourage a system that maximizes the probability that a breakthrough can occur. This is what science should be doing and this is, to a certain extent, what Silicon Valley is already doing.

The more data, information, software, tools, and knowledge available, the more we as a society can build upon previous work. (That said, even though I'm a huge proponent of more data, the most transformational theory in biology came about from solid critical thinking, logical reasoning, and sparse data collection.)

Of course, I'm biased, but I'm going to talk about two projects in which I'm involved: one business and one scientific. The first is Uber, an on-demand car service that allows users to request a private car via their smartphone or SMS. Uber is built using a variety of open software and tools such as Python, MySQL, node.js, and others. These systems helped make Uber possible.

[Figure: Uber screenshot]

As a non-engineer, I find it staggering to think of the complexity of the systems that make Uber work: GPS, accurate mapping tools, a reliable cellular/SMS system, an automated dispatching system, and so on. But we as a culture become so quickly accustomed to certain advances that, should our system ever experience a service disruption, Louis C.K. would almost certainly be prophetic about the response.

The other project in which I'm involved is brainSCANr. My wife and I recently published a paper on this, but the basic idea is that we mined the text of more than three million peer-reviewed neuroscience research articles to find associations between topics and search for potentially missing links (which we called "semi-automated hypothesis generation").

We built the first version of the site in a week, using nothing but open data and tools. The National Library of Medicine, part of the National Institutes of Health, provides an API to search all of these manuscripts in their massive, 20-million-paper-plus database. We used Python to process the associations, the JavaScript InfoVis Toolkit to plot the data, and Google App Engine to host it all. I'm positive when the NIH funded the creation of PubMed and its API, they didn't have this kind of project in mind.
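
The core co-occurrence idea is easy to sketch against that same public API: ask PubMed how many papers mention each term and the pair of terms, then score the association. This is a minimal illustration using the NCBI E-utilities esearch endpoint, not the actual brainSCANr pipeline:

```python
import time
import requests

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(query):
    """Number of PubMed records matching a query string."""
    params = {"db": "pubmed", "term": query, "retmode": "json"}
    resp = requests.get(ESEARCH, params=params, timeout=30)
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"])

def association(term_a, term_b):
    """Crude co-occurrence score: joint count over the smaller single-term count."""
    counts = []
    for query in (f"{term_a} AND {term_b}", term_a, term_b):
        counts.append(pubmed_count(query))
        time.sleep(0.4)  # stay well under NCBI's request-rate guidance
    joint, count_a, count_b = counts
    smaller = min(count_a, count_b)
    return joint / smaller if smaller else 0.0

print(association("hippocampus", "memory"))
```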

That's the great thing about making more tools available; it's arrogant to think that we can anticipate the best ways to make use of our own creations. My hope is that brainSCANr is the weakest incarnation of this kind of scientific text mining, and that bigger and better things will come of it.

Twenty years ago, these projects would have been practically impossible, meaning that the amount of labor involved to make them would have been impractical. Now they can be built by a handful of people (or a guy and his pregnant wife) in a week.

Just as research into black holes can lead to a breakthrough in wireless communication, so too can seemingly benign software technologies open amazing and unpredictable frontiers. Who would have guessed that what began with a simple online bookstore would grow into Amazon Web Services, a tool that is playing an ever-important role in innovation and scientific computing such as genetic sequencing?

So, before you scoff at the "pointlessness" of social networks or the wastefulness of "another web service," remember that we don't always do the research that will lead to the best immediate applications or build the company that is immediately useful or profitable. Nor can we always anticipate how our products will be used. It's easy to mock Twitter because you don't care to hear about who ate what for lunch, but I guarantee that the people whose lives were saved after the Haiti earthquake or who coordinated the spark of the Arab Spring are happy Twitter exists.

While we might have to justify ourselves to granting agencies, or venture capitalists, or our shareholders in order to do the work we want to do, sometimes the "real" reason we spend so much of our time working is the same reason people climb mountains: because it's awesome that we can. That said, it's nice to know that what we're building now will be improved upon by our children in ways we can't even conceive.

I can't wait to have this conversation with my son when — after learning how to talk, of course — he's had a chance to build on the frivolities of my generation.

Related:

October 20 2011

Jason Huggins' Angry Birds-playing Selenium robot

I've used Selenium on several Java projects, so I assumed that the topic of Selenium would be germane to JavaOne. I sent the co-creator of Selenium, Jason Huggins (@hugs), a quick email to see if he was interested in talking to us on camera about Selenium and Java, and he responded with a quick warning: He wasn't into Java. "Python and JavaScript (and to a lesser extent, CoffeeScript and Hypertalk) are my true passions when it comes to programming," he wrote. I thought this was fair enough — very few people could call Java "a passion" at this point — and I could do my best to steer the conversation toward Java. Selenium can be scripted in just about any language, and I was convinced that we needed to include some content about testing in our interviews.

He also wondered if he could talk about something entirely different: "a Selenium-powered, 'Angry Birds'-playing mobile-phone-testing robot." While I had initially been worried I'd have to sit for several hours of interviews about Component Dependency Ennui 4.2, here was an interesting guy who wanted not only to demonstrate his "Angry Birds"-playing robot but also to relate it to his testing-focused startup, Saucelabs. I welcomed the opportunity, and here's the result:

From what I could gather, Huggins' bot drives two servo motors that control a retractable "dowel" finger covered in some sort of skin-like material that can fool the capacitive touch sensor of a mobile device. He sends keystroke commands to an Arduino-based controller, which in turn drives the two servos. The frame of the device is made of what looks like balsa wood. He's calling it a "BitBeamBot." You can find out all about it here, and you can see it in action in the following video:

Relating BitBeamBot to Saucelabs and Selenium

In the course of the interview it became clear that BitBeamBot grew out of a side project. Here's how Huggins explained it: imagine a wall of these retractable dowels, each representing a single pixel. If you could create a system to control those dowels, you could draw pictures with a controller.

While working on this project, Huggins attended a Maker Faire and found some suitable technology. His creation of a single-arm controller then led to his big "eureka" moment: This same technology could create a robot that can play "Angry Birds," and if a contraption can play "Angry Birds," it's a simple leap to create a system that can test any mobile application in the real world.
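
The interview didn't spell out BitBeamBot's control protocol, so the sketch below is only a guess at the plumbing: it assumes a hypothetical firmware that accepts "x,y" tap commands over USB serial and replies "OK" (the real command set may look nothing like this).

```python
import time
import serial  # pyserial

# Hypothetical protocol: the Arduino firmware reads "x,y" lines over USB serial,
# moves the dowel finger to that screen position, taps, and answers "OK".
PORT = "/dev/ttyUSB0"

rig = serial.Serial(PORT, 9600, timeout=2)
time.sleep(2)  # many Arduinos reset when the serial port is opened

def tap(x, y):
    """Ask the rig to move its capacitive finger to (x, y) and tap once."""
    rig.write(f"{x},{y}\n".encode("ascii"))
    return rig.readline().strip() == b"OK"

tap(120, 480)  # e.g. press the slingshot
rig.close()
```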

Huggins went through a similar discovery process with Selenium. Selenium is a contraption that wraps and drives a browser: you feed it a series of instructions and criteria, and then you measure the output.
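
In Python, Huggins' preferred language, that loop looks roughly like this minimal sketch (the URL and assertion are placeholders, not anything from the interview):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()              # the contraption wrapping the browser
try:
    driver.get("https://example.com/")    # feed it instructions...
    heading = driver.find_element(By.TAG_NAME, "h1").text
    assert "Example" in heading           # ...then measure the output
finally:
    driver.quit()
```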

With BitBeamBot, Huggins has taken the central software idea that he developed at Thoughtworks and applied it to the physical world. He envisions a service from Saucelabs, the company he co-founded, where customers would pay to have mobile applications tested in farms of these mobile testing robots.

Saucelabs

Saucelabs is focused on the idea that testing infrastructure is often more expensive to set up and maintain than most companies realize. The burden of maintaining an infrastructure of browsers and machines can often exceed the effort required to support a production network.

With Saucelabs you can move your testing infrastructure to the cloud. The company offers a service that executes testing scripts on cloud-based hardware. For a few dollars you can run a suite of unit tests against an application without having to worry about physical hardware and ongoing maintenance. Saucelabs is trying to do for testing what Amazon EC2 and other services have done for hosting.
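
The appeal is that the test script barely changes: instead of launching a local browser, you point the same WebDriver code at a remote endpoint. Roughly, with the caveat that the endpoint and capability names below are illustrative rather than lifted from Saucelabs' docs:

```python
from selenium import webdriver

# Illustrative capabilities; the exact keys and endpoint Saucelabs expects
# have changed over time, so check their current documentation.
caps = {"browserName": "firefox"}

driver = webdriver.Remote(
    command_executor="https://USERNAME:ACCESS_KEY@ondemand.saucelabs.com/wd/hub",
    desired_capabilities=caps,
)
try:
    driver.get("https://example.com/")   # same script, cloud hardware
    print(driver.title)
finally:
    driver.quit()
```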

Toward the end of the interview (contained in the first video, above) we also discussed some interesting recent developments at Saucelabs, including a new system that uses SSH port forwarding to allow Saucelabs' testing infrastructure to test internal applications behind a corporate firewall.
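
Sauce's actual tunneling product wasn't demoed, and I'm not reproducing it here; the underlying idea, though, is ordinary SSH port forwarding. As a rough sketch with made-up host names and ports, a reverse tunnel opened from inside the firewall can expose an internal app to an outside test runner:

```python
import subprocess

# From a machine inside the corporate firewall, open a reverse SSH tunnel so a
# publicly reachable bastion exposes the internal app to the outside test runner.
# Host names, user, and ports are made up; the bastion needs GatewayPorts enabled
# for the forwarded port to be reachable from other machines.
tunnel = subprocess.Popen([
    "ssh", "-N",                      # forward only, run no remote command
    "-R", "9000:internal-app:8080",   # bastion:9000 -> internal-app:8080
    "ci@bastion.example.com",
])

# ... point the cloud-hosted tests at http://bastion.example.com:9000 ...

tunnel.terminate()
```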

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Related:

July 06 2011

Don't put all your trust in mobile emulators

The development of mobile best practices and tools to measure mobile performance is just starting to heat up.

In a recent interview, Steve Souders, performance evangelist at Google, said we're going to see an increased ability to debug, profile, and iterate on code remotely in the mobile space. However, Souders warned against putting too much faith in mobile emulators and simulators:

I think we're all pretty much down with being very cautious about using emulators. Using an emulator, or even a user-agent switcher or switching your WebKit to have a user agent cloaking as a mobile phone, makes things easier. And that's totally cool to do once you verify that the thing you're trying to optimize or tweak is unaffected by using an emulator or a user-agent switcher.

If it turns out that you've just spent a couple days or a week optimizing something using an emulator, then when you actually run that code on the mobile device and it doesn't work, you'll feel really bad you wasted that time. I always look at the test I'm trying to do, or the behavior I'm trying to diagnose, and test it on the mobile device first as much as I can. I verify that it behaves in a particular way, then I'll see if it behaves that same way on something on my desktop. If it does, then I'll do a lot of my work there [on the desktop], but every couple hours I'll go back and make sure what I'm finding still works that way on the actual device.
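
For the cheap first pass Souders describes, a user-agent override is a one-line tweak. Here's a hedged example using Selenium and Chrome's standard --user-agent switch; the UA string itself is just a placeholder:

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
# Placeholder mobile UA string; substitute the device you actually want to emulate.
options.add_argument(
    "--user-agent=Mozilla/5.0 (iPhone; CPU iPhone OS 14_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"
)

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/")
    # As Souders advises, spot-check the same page on a real device too.
    print(driver.execute_script("return navigator.userAgent"))
finally:
    driver.quit()
```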

For more of Souders' thoughts on mobile optimization best practices and the evolution of performance tools, check out the entire interview in the following video:

Android Open, being held October 9-11 in San Francisco, is a big-tent meeting ground for app and game developers, carriers, chip manufacturers, content creators, OEMs, researchers, entrepreneurs, VCs, and business leaders.

Save 20% on registration with the code AN11RAD

Photo: Screenshot-Android Emulator (Default:5554).png by BryanKemp, on Flickr



Related:


  • The state of speed and the quirks of mobile optimization
  • 10 ways to botch a mobile app
  • Mobile apps and development platforms get more consumer centric
  • To the end of bloated code and broken websites



    June 30 2011

    How Netflix handles all those devices

    Netflix's shift to streaming delivery has made quite an impression on Internet traffic. According to Sandvine's latest report, Netflix now claims almost 30% of peak downstream traffic in North America.

    That traffic occurs, in no small part, because Netflix can run on so many devices — PCs, tablets, gaming consoles, phones, and so on. In the following interview, Netflix's Matt McCarthy (@dnl2ba) shares a few lessons from building across those varied platforms. McCarthy and co-presenter Kimberly Trott will expand on many of these same topics during their session at next month's OSCON.

    What are some of the user interface (UI) challenges that Netflix faces when working across devices?

    Matt McCarthy: Scaling UI performance to run well on a low-cost Blu-ray player and still take advantage of a PlayStation 3's muscle has required consulting WebKit and hardware experts, rewriting components that looked perfectly good a week before, and patiently tuning cache sizes and animations. There's no silver bullet.

    Since we've standardized on WebKit, we don't have to support multiple disparate rendering engines, DOM API variants, or script engines. However, there are lots of complex rendering scenarios that are difficult to anticipate and test, especially now that we're starting to take advantage of WebKit accelerated compositing. There are WebKit test suites, but none that are both comprehensive and well documented, so we're working on our own test suite that we can use to validate partners' ports of our platform.

    OSCON JavaScript and HTML5 Track — Discover the new power offered by HTML5, and understand JavaScript's imminent colonization of server-side technology.

    Save 20% on registration with the code OS11RAD

    How do the platform lessons Netflix has learned apply to other developers?

    Matt McCarthy: The challenges we face may be familiar to many large-scale AJAX application developers. In addition, mobile developers need to make similar trade-offs between memory usage and performance, other sophisticated user interfaces need to handle UI state, and most large code bases can benefit from good abstraction, encapsulation, and reuse.

    The urgency and difficulty of solving those challenges may differ for different applications, of course. If your application is very simple, it would be silly for you to use the level of abstraction we've implemented to support A/B testing in Netflix device UIs. But if you're innovating heavily on user experience, your performance isn't always what you'd like, and your UI is an endless font of race conditions and application state bugs, then maybe you'd like to learn about our successes and mistakes.

    There were reports last year that some Netflix PS3 users were seeing several different UIs. What are the benefits and challenges with this kind of A/B testing?

    Matt McCarthy: Netflix is a subscriber service, so ultimately what we care about is customer retention. But retention, by definition, takes a long time to measure. We use proxy metrics that correlate well with retention. Some of our most closely watched metrics have to do with how many hours of content customers stream per month. Personally, I find it gratifying to have business interests that are aligned closely with our customers' interests.

    The challenges grow as the A/B test matrix grows, since the number of test cell combinations scales geometrically with the number of tests. Our quality assurance team has been working on automated tests to detect regressions so a fancy new feature doesn't inadvertently break another feature that launched last month. Our engineers adhere to a number of best practices, e.g. defining, documenting, and adhering to interfaces so we don't find nasty surprises when we replace a UI component in a test cell.

    A/B testing user interfaces obviously takes a lot more effort than developing our "best bet" UI and calling it a day, but it's been well worth the cost. We've already been surprised a few times by TV UI test results, and it's changed the direction we've taken in new UI tests for both TV devices and our website. Every surprise validates our approach, and it shows us a new way to delight and retain more customers.

    This interview was edited and condensed.



    Related:


    May 10 2011

    Process kills developer passion

    The other day, at lunch, I had a bit of an epiphany. I also had the pulled pork, but that's another story. In any event, something came into clarity that had been bothering me below the surface for a long time.

    Over the past few years, the software industry has become increasingly focused on process and metrics as a way to ensure "quality" code. If you were to follow all the best practices now, you would be:

    • Doing full TDD, writing your tests before you wrote any implementing code.
    • Requiring some arbitrary percentage of code coverage before check-in.
    • Having full code reviews on all check-ins.
    • Using tools like Coverity to generate code complexity numbers and requiring developers to refactor code that has too high a complexity rating.

    In addition, if your company has drunk the Scrum Kool-Aid, you would also be spending your days:

    • Generating epics, stories and tasks.
    • Grooming stories before each sprint.
    • Sitting through planning sessions.
    • Tracking your time to generate burn-down charts for management.

    In short, you're spending a lot of your time on process, and less and less of it actually coding the applications. I've worked on some projects where the test cases took two or three times as long to write as the actual code, or where shoehorning in shims to make unit tests work reduced the readability of the code. I've also seen developers game the tools to get their line coverage or code complexity numbers to meet targets.

    The underlying feedback loop making this progressively worse is that passionate programmers write great code, but process kills passion. Disaffected programmers write poor code, and poor code makes management add more process in an attempt to "make" their programmers write good code. That just makes morale worse, and so on.

    OSCON 2011 — Join today’s open source innovators, builders, and pioneers July 25-29 as they gather at the Oregon Convention Center in Portland, Ore.

    Save 20% on registration with the code OS11RAD

    Now, I'm certainly not advocating some kind of Wild-West approach where nothing is tested, developers code what they want regardless of schedule, etc. But the blind application of process best practices across all development is turning what should be a creative process into chartered accountancy with a side of prison. While every one of these hoops looks good in isolation (except perhaps Scrum ...), making developers jump through all of them will demoralize even the most passionate geek.

    I don't have a magic bullet here, but companies need to start acknowledging that there is a qualitative difference between developers. Making all of them wear the same weighted yokes to ensure the least among them doesn't screw up is detrimental to the morale and efficiency of the whole team.

    Now, this may sound a little arrogant: "I'm an experienced developer, I don't need any of these new-fangled practices to make my code good." But, for example, maybe junior (or specialized) developers should be writing the unit tests, leaving the more seasoned developers free to concentrate on the actual implementation of the application. Maybe you don't need to micro-manage them with daily updates to VersionOne to make sure they're going to make their sprint commitments. Perhaps an over-the-shoulder code review would be preferable to a formal code review process.

    And as an aside, if you're going to say you're practicing agile development, then practice agile development! A project where, before the product cycle even starts, you decide which features must be in the product, the ship date, and the assigned resources is a waterfall project. Using terms like "stories" and "sprints" just adds a crunchy agile shell, and it's madness to think anything else. And frankly, this is what has led to the entire Scrum/burn-down chart mentality, because development teams aren't given the flexibility to "ship what's ready, when it's ready."

    Unless the problems I'm talking about are addressed, I fear that the process/passion negative feedback loop is going to continue to drag otherwise engaged developers down into a morass of meetings and metrics-gaming.

    January 10 2011

    Four short links: 10 January 2011

    1. Tools and Practices for Working Virtually -- a detailed explanation of how the RedMonk team works virtually.
    2. Twitter Accounts for All Stack Overflow Users by Reputation (Brian Bondy) -- superawesome list of clueful people.
    3. The Wonderful World of Early Computing -- from bones to the ENIAC, some surprising and interesting historical computation devices. (via John D. Cook)
    4. Overlapping Experiment Infrastructure (PDF) -- they can't run just one test at a time, so they have infrastructure to comprehensively test all features against all features and in real time pull out statistical conclusions from the resulting data. (via Greg Linden)


    September 14 2010

    iPod program helps school test scores

    Last month, we had an exceptional panel talking about Mobile in Education at our largest Mobile Portland meeting ever. A report on how iPod Touches are making huge differences in third-grade test scores really stuck with me.

    Joe Morelock, the director of technology and innovation for the Canby School District in Oregon, shared with us how Canby started a pilot program of iPod Touch devices in a single third-grade classroom. The pilot's success led to the district setting a goal of providing every third-grade student with access to an iPod Touch.

    Morelock has documented the program in a presentation you can download from the school district's wiki.

    Below, I've pulled out a few slides from Morelock's presentation that illustrate the remarkable improvements. These charts start to explain why the school district got behind the program so quickly.

    The charts compare the performance of third graders throughout the Canby school district with those whose classroom used iPod Touches throughout the year. As you can see in the chart below, the number of students who meet or nearly meet the math requirements on a standardized test is much higher for the iPod Touch classroom (left circle).



    Pie charts comparing math scores of students with iPod Touches with those throughout the district



    The difference in performance is striking when looking at students with disabilities (below, left column):



    Migrant and ELL students



    The increase in test scores for students with disabilities appears to validate some of the early anecdotal reports that iPhones and iPod Touches were making a difference for children and adults with autism.

    The program also had a positive effect on English language learners (below, right column):

    Students w/ disabilities, minorities


    And it's not just math scores. Here are reading test results from the same classroom:

    Reading test scores

    Reading test scores continued

    Parents whose children have used iPod Touches in the classroom don't like the idea that their children may not have them when they move on to the next school year, so they're organizing fundraisers to purchase additional devices. Because iPod Touches are relatively inexpensive, five can be bought for the price of a single laptop.

    The Canby School District is extending the iPod program by providing iPod Touches for all third graders district-wide during the 2010-2011 school year. In addition, pilot programs using iPads will run at the elementary-, middle- and high-school levels.

    Perhaps most importantly, both students and teachers love using the devices:

    You know that little boy who came up to us this morning? He loves the iPod Touches. They have made an incredible difference in his math work. He has Asperger’s, and before the iPods, he could never sit through a math class. The kid absolutely loves math now and gets As. He sits himself up at the front of the room -- he likes to be by himself -- tucks his foot up, leans on the desk and goes to town on math. It's simply amazing. -- Gale Hipp, sixth-grade math teacher. [Note: Link added.]

    And simply:

    This is the most fun I have had teaching in the last 25 years. -- Deana Calcagno, fifth-grade teacher.


    The full panel discussion is available in the following video. Morelock's segment on the Canby School District and their iPod pilot program starts at 19:20.




    Related:

