
April 13 2012

Developer Week in Review: Everyone can program?

I'm devoting this week's edition of the WIR to a single news item. Sometimes something gets stuck in my craw, and I have to cough it out or choke on it (hopefully none of you are reading this over lunch ...).

Yet another attempt to create programming for dummies ...

Just today, I came across a news article discussing a recent Apple patent application for a technology to allow "non-programmers" to create iOS applications.

This seems to be the holy grail of software design, to get those pesky overpaid software developers out of the loop and let end-users create their own software. I'll return to the particulars of the Apple application in a moment, but first I want to discuss the more general myth, because it is a myth, that there's some magic bullet that could let lay people create applications.

The underlying misunderstanding is that it is something technical that stands between "Joe Sixpack" and the software of his dreams. The line of reasoning goes that because languages are hard to understand and require specialized knowledge, there's a steep learning curve before a newcomer can be productive. In reality, the particulars of a specific platform are largely irrelevant to whether a skilled software engineer can be productive in it, though there are certainly languages and operating systems that are easier to code for than others. The real difference between a productive engineer and a slow one lies in how good the engineer is at thinking about software, not in C or Java or VB.

Almost without exception, any software engineer I talk to thinks it's insane when an employer would rather hire someone with two years' total experience, all of it in a specific language, than someone with 10 years' experience in a variety of languages, all other factors being equal. When I think about a problem, I don't think in Java or Objective-C; I think in algorithms and data structures. Then, once I understand the problem, I implement it in whatever language is appropriate.

I believe that a lot of the attitude one sees toward software engineering — that it's an "easy" profession that "anyone" could do if it weren't for the "obfuscated" technology — comes from the fact that it's a relatively well-paid profession that doesn't require a post-graduate degree. I match or out-earn most people slaving away with doctorates in the sciences, yet I only have a lowly bachelor's, and not even a BS. "Clearly," we must be making things artificially hard so we can preserve our fat incomes.

In a sense, they are right, in that it doesn't take huge amounts of book learning to be a great programmer. What it takes is an innate sense of how to break apart problems and see the issues and pitfalls that might reach out to bite you. It also takes a certain logical bent of mind that allows you to get to the root of the invariable problems that are going to occur.

Really good software engineers are like great musicians. They have practiced their craft, because nothing comes for free, but they also have a spark of something great inside them to begin with that makes them special. And the analogy is especially apt because while there are always tools being created to make it easier for "anyone" to create music, it still takes a special talent to make great music.

Which brings us back to Apple's patent. Like most DWIM (do what I mean) technologies for programming, it handles a very specific and fairly trivial set of applications, mainly designed for things like promoting a restaurant. No one is going to be writing "Angry Birds" using it. Calling it a tool to let anyone program is like saying that Dreamweaver lets anyone create a complex ecommerce site integrated into a back-end inventory management system.

The world is always going to need skilled software engineers, at least until we create artificial intelligences capable of developing their own software based on vague end-user requirements. So, we're good to go for at least the next 10 years.

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20

Got news?

Please send tips and leads here.


January 05 2012

Understanding randomness is a double-edged sword

Leonard Mlodinow's "The Drunkard's Walk: How Randomness Rules Our Lives" is a great book on an important subject. As data scientists know, random phenomena are everywhere, and humans don't understand them well. We're not wired to understand them well. This book is a huge help, and will be a relief to anyone who's heard people say "I don't believe in global warming because last winter we got a lot of snow," or some load of crap like that. The book is well written, there's a lot of storytelling, and the storytelling is fun and interesting. Along the way Mlodinow gives coherent explanations of Bayes' theorem, the Monty Hall problem (offering the simplest correct explanation I've ever seen), the origins of statistics, and more. If you want an excellent non-mathematical introduction to probabilistic thinking, this is the book to get. (This book studiously avoids equations; if you want the deeper mathematics, get William Feller's "An Introduction to Probability Theory and Its Applications.")
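If you'd rather check the Monty Hall result than argue about it, a minimal Monte Carlo sketch settles it in a few lines. (This isn't from the book; it assumes the standard rules, where the host always opens a door hiding a goat.)

```python
import random

def monty_hall(switch, trials=100_000):
    """Estimate the win probability for the stay vs. switch strategies."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)    # door hiding the car
        pick = random.randrange(3)   # contestant's first choice
        # The host opens a goat door, so switching wins exactly when
        # the first pick was wrong; staying wins when it was right.
        wins += (pick != car) if switch else (pick == car)
    return wins / trials

print(f"stay:   {monty_hall(switch=False):.3f}")   # ~0.333
print(f"switch: {monty_hall(switch=True):.3f}")    # ~0.667
```

Switching wins two times out of three, because it converts every initially wrong pick into a win.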

But there's always a but. But, but, but ...

I have two problems with "The Drunkard's Walk." They've been nagging me ever since I finished.

First, Mlodinow spends a lot of time debunking the notion of "hot streaks." He's right, and that's important: most hot streaks in sports and elsewhere can be adequately explained by randomness. Randomness is inherently streaky and clumpy; it's not just a smooth gray. In fact, if you get something that looks smooth and "random," it's almost certainly not random. So far, so good. But — when he moves from Roger Maris' record-breaking season to portfolio managers picking hot stocks, there's a fundamental asymmetry.
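The streaky-and-clumpy claim is easy to verify for yourself: simulate nothing but fair coin flips and look at the longest run. (A quick sketch; the flip and trial counts are arbitrary.)

```python
import random

def longest_run(n_flips):
    """Length of the longest same-side streak in n_flips fair coin flips."""
    best = run = 1
    prev = random.random() < 0.5
    for _ in range(n_flips - 1):
        cur = random.random() < 0.5
        run = run + 1 if cur == prev else 1
        best = max(best, run)
        prev = cur
    return best

# Averaged over many 100-flip "seasons", the longest streak of heads or
# tails is typically about 7 -- pure chance is anything but smooth.
trials = 10_000
avg = sum(longest_run(100) for _ in range(trials)) / trials
print(f"average longest streak in 100 fair flips: {avg:.1f}")
```

A sequence with no streaks of five or six in a hundred flips would itself be evidence of something non-random going on.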

With Maris, the author starts with the long-term batting average. We're not just "flipping coins"; we're flipping a weighted coin, a coin that happens to land with the "home run" side facing up a lot more frequently than it would if I were in the batter's box. That's all well and good. If I faced a season's worth of professional baseball pitching, I daresay I wouldn't get a single hit, let alone any home runs. But — and this is important — he doesn't do the same for the stock pickers, book acquisition editors, or Hollywood movie execs that he talks about. For them, it's just flipping coins. And it's one thing to say that, if you just flipped coins for 10 years, you'd have a 75% chance of duplicating a great financial manager's performance over some five-year period. It's another thing to imply that the manager's performance is just a matter of luck, not skill. Yes, there is a lot of luck involved, but where's the notion of baseline performance, of long term success or failure, that was the starting point for analyzing Maris' hot year? Maris' hot year may have been a random phenomenon, but it was a random phenomenon in the context of five years hitting more than 20 home runs per season, during which his cumulative batting average was somewhere around .271. What's the stock picker's cumulative batting average? Who are the other financial analysts working at the same level? We never find out. And that's a big part of the story to omit.
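The missing-baseline problem is easy to demonstrate with a toy simulation. In the hypothetical sketch below (the 0.6 edge and the pool of 100 managers are made-up numbers), a manager with genuine skill is routinely outshone by the luckiest member of a crowd of pure coin-flippers — which is exactly why a track record means nothing without a baseline and a peer group:

```python
import random

def beat_market_years(p, years=10):
    """Number of years a manager beats the market, modeling each year as
    an independent Bernoulli(p) trial; p=0.5 is pure luck."""
    return sum(random.random() < p for _ in range(years))

trials = 10_000
# One manager with a genuine edge...
skilled = [beat_market_years(0.6) for _ in range(trials)]
# ...versus the single best record among 100 coin-flippers.
lucky_best = [max(beat_market_years(0.5) for _ in range(100))
              for _ in range(trials)]

avg_s = sum(skilled) / trials
avg_l = sum(lucky_best) / trials
print(f"skilled manager (p=0.6):   {avg_s:.1f} good years out of 10")
print(f"best of 100 coin-flippers: {avg_l:.1f} good years out of 10")
```

The luckiest coin-flipper typically posts eight or nine winning years; the genuinely skilled manager averages six. Looking only at the best record tells you nothing about whether skill is present.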

Second, Mlodinow frequently forgets one of the most important aspects of the mathematical study of random processes. When we're talking probability and statistics, we're talking about interchangeable events. It's easy to forget this, but as Mlodinow himself points out, there are many, many ways to make important mistakes when you're talking about probability. The important thing about urns with black and white balls is that the balls are the same. (If you don't know about urns, take a probability course or read the book; they're baked into the history of probability theory.) If some of the balls were ovals and some were star-shaped, these probability experiments wouldn't work.

So, back again to the stock pickers, the acquisitions editors, and the Hollywood execs. We agree at some level that all at-bats in baseball are equivalent. This is, of course, an idealization, but it's one we're fairly comfortable with. But all stocks are not the same, all books are not the same, and all movies are not the same. They may be the same within a certain class (energy stocks, cheap romance novels, spy movies). A stock analyst who's good with financials may have nothing to say about manufacturing. But at the high end of the spectrum (literary novels, fine wines, art movies), everything is unique, precisely in a way that Harlequin romances aren't. Probability and statistics are still powerful tools, but you have to be very careful about how you apply them.

Since I'm in the publishing business, I'm particularly annoyed by the story of an editor who, in an experiment, was given a typewritten chapter of a V. S. Naipaul novel that had won a major award. She rejected it. I'm not a fan of Naipaul, so I'm sympathetic. But is that evidence of her editorial skill (or lack thereof), or of random processes? Since we're now in a world where every event is unique, we have to ask more questions: What publisher was she working for? Grove Press, which publishes top-drawer literary fiction with a tendency toward the avant-garde (for whom Naipaul might have been too stodgy)? Or Bantam, which specializes in lightweight beach-side reading? In either case, a rejection would have been perfectly appropriate. Probability aside, it's a cheap shot to say: "Because this book won a major award, we'd expect editors at a publishing company to accept it. If they don't, that's evidence that publishing is a random process."

Publishing (and movies, and wines, and maybe even stocks) is a different world, and the disagreements are precisely what is important. Modeling disagreement as random fluctuation isn't doing anyone a service. I may dislike Naipaul's fiction, but I hardly see that as a random result. We could ask about the conditional probability that an English major will dislike Naipaul, given that the English major plays piano, has a strong background in electrical engineering and mathematics, and likes Salman Rushdie, and use that to come up with some sort of number. But I'd have no idea what that number means. We're not picking black and white balls out of urns here — or if we are, the balls are of different shapes and sizes.

Am I just going back to the human tendency to build stories where there is nothing but randomness? Am I just refusing to deal with the stark realities of the random phenomena that surround us everywhere? Perhaps. Then again, that's what makes us human. And in the many situations where probability and statistics aren't appropriate tools, such as picking books or movies, all we have to fall back on is our ability to make stories, our ability to make sense. Where "make" is precisely the most important word in that last sentence.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20


There's an important, but subtle, distinction to be made between events that can be modelled by random processes and events that are actually random processes. Mlodinow makes this same distinction in his discussion of Maris. At one point, he grants that a hot or cold streak could be the result of changes in exercise, eating habits, personal stress, or any number of non-random factors; but since we can't account for these, he rolls them up into randomness. So, essentially he's saying that a hot streak can be modelled as a random process, though it may have an underlying cause that isn't random at all. Say Maris signed a contract to appear on the front of Wheaties boxes, and decided that he might as well eat the stuff. And say that eating Wheaties actually did increase his slugging percentage significantly. If so, betting heavily on Maris during a hot streak might not be such a bad idea, since he's not just hitting well because he happens to be lucky. And if so, I would still bet heavily that Maris' record-breaking year could be modelled as a random process. After all, probability and statistics are very blunt instruments.

Rolling up potentially non-random factors that can't be measured into "randomness" is a common trick, and reasonably acceptable. You can't analyze what you don't know. But it's a trick that worries me. Let's take a situation that I think is similar, but with much more profound consequences. A decade or so ago, it was well-known that Tamoxifen was a useful drug against breast cancer, effective in roughly 80% of all cases. That's equivalent to saying that Tamoxifen has an .800 batting average. You could model Tamoxifen's success by flipping a coin that came up heads 80% of the time.

But more recent research has revealed that Tamoxifen's story isn't random; at least, not random in that way. It's successful almost 100% of the time on patients with certain genetics and almost 0% of the time on other patients. In other words, the randomness is in the stream of incoming patients, not the effects of the drug. That discovery has a huge practical effect on breast cancer treatment. You can do tests to figure out whether treating a patient with Tamoxifen is likely to be successful, or a waste of time. You can also look in a more focused way for treatments that will be effective on the remaining 20%. Even more important: It's my belief that the next generation of medicine will be "personalized." Rather than using drugs that have been successful in broad clinical trials involving thousands of patients, we'll be focusing on drugs that are tuned to an individual's genetic makeup. Is it possible that the drug that would be effective on the 20% of women who don't respond to Tamoxifen has already been discovered and discarded, because its success rate wasn't statistically significant? Is it possible that there's a drug that's 100% effective on only 5%? Or 1%? What methods will we use to evaluate the performance of these drugs?
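The shift from a marginal 80% success rate to an all-or-nothing subpopulation story can be sketched in a few lines of code. (Hypothetical numbers; treatment response is idealized as fully determined by the genetics, which is stronger than the "almost 100%/almost 0%" reality.)

```python
import random

def simulate(n=10_000, responder_frac=0.80, screen=False):
    """Success rate of an idealized drug that works only on genetic
    responders. With screen=True, a genetic test enrolls responders only
    (the 'personalized medicine' regime)."""
    patients = [random.random() < responder_frac for _ in range(n)]
    if screen:
        patients = [p for p in patients if p]   # keep responders only
    # Treatment succeeds exactly when the patient is a responder,
    # so the success rate is just the responder fraction of the cohort.
    return sum(patients) / len(patients)

print(f"unscreened trial, 80% responders: {simulate():.2f}")
print(f"screened cohort, same drug:       {simulate(screen=True):.2f}")
print(f"unscreened trial, 5% responders:  {simulate(responder_frac=0.05):.2f}")
```

The same drug looks like an .800 hitter in a broad trial and a sure thing in a screened cohort — and a drug that works only on a 5% subpopulation looks, in a broad trial, like a failure worth discarding.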

Understanding randomness is a double-edged sword. Humans are built to create patterns, even when there's nothing going on but random phenomena. Granted, that's an extremely important story, and Mlodinow does an excellent job of telling it. At the same time, we are wired to create stories, and can't afford to let randomness stop us from doing so, particularly when a story that gives a richer understanding of the data is just beyond our grasp. Understanding what is random and what is not (or, more precisely stated, understanding what parts of any processes are really random) is the key. While humans are all too willing to grasp at the straws of a story when there's no story there (just go to any casino), we can also throw out the stories we haven't yet finished because we're convinced there's nothing there. And that's a tragedy.
