
October 18 2012

Four short links: 18 October 2012

  1. Let’s Pool Our Medical Data (TED) — John Wilbanks (of Science Commons fame) makes a strong case for creating an open, massive, mine-able database of data about health and genomics from many sources. Money quote: Facebook would never make a change to something as important as an advertising algorithm with a sample size as small as a Phase 3 clinical trial.
  2. Verizon Sells App Use, Browsing Habits, Location (CNet) — Verizon Wireless has begun selling information about its customers’ geographical locations, app usage, and Web browsing activities, a move that raises privacy questions and could brush up against federal wiretapping law. To Verizon, even when you do pay for it, you’re still the product. Carriers: they’re like graverobbing organ harvesters but without the strict ethical standards.
  3. IBM Watson About to Launch in Medicine (Fast Company) — This fall, after six months of teaching their treatment guidelines to Watson, the doctors at Sloan-Kettering will begin testing the IBM machine on real patients. [...] On the screen, a colorful globe spins. In a few seconds, Watson offers three possible courses of chemotherapy, charted as bars with varying levels of confidence–one choice above 90% and two above 80%. “Watson doesn’t give you the answer,” Kris says. “It gives you a range of answers.” Then it’s up to [the doctor] to make the call. (via Reddit)
  4. Robot Kills Weeds With 98% Accuracy — During tests, this automated system gathered over a million images as it moved through the fields. Its Computer Vision System was able to detect and segment individual plants – even those that were touching each other – with 98% accuracy.

April 11 2011

The quiet rise of machine learning

The concept of machine learning was brought to the forefront for the general masses when IBM's Watson computer appeared on Jeopardy and wiped the floor with humanity. For those same masses, machine learning quickly faded from view as Watson moved out of the spotlight ... or so they may think.

Machine learning is slowly and quietly becoming democratized. Goodreads, for instance, recently purchased Discovereads.com, presumably to make use of its machine learning algorithms to make book recommendations.

To find out more about what's happening in this rapidly advancing field, I turned to Alasdair Allan, an author and senior research fellow in Astronomy at the University of Exeter. In an email interview, he talked about how machine learning is being used behind the scenes in everyday applications. He also discussed his current eSTAR intelligent robotic telescope network project and how that machine learning-based system could be used in other applications.

In what ways is machine learning being used?

Alasdair Allan: Machine learning is quietly taking over in the mainstream. Orbitz, for instance, is using it behind the scenes to optimize caching of hotel prices, and Google is going to roll out smarter advertisements — much of the machine learning that consumers are seeing and using every day is invisible to them.

The interesting thing about machine learning right now is that research in the field is going on quietly as well, because large corporations are tied up in non-disclosure agreements. While there is a large amount of academic literature on the subject, it's hard to tell whether this open research is actually current.

Oddly, machine learning research mirrors the way cryptography research developed around the middle of the 20th century. Much of the cutting edge research was done in secret, and we're only finding out now, 40 or 50 years later, what GCHQ or the NSA was doing back then. I'm hopeful that it won't take quite that long for Amazon or Google to tell us what they're thinking about today.

How does your eSTAR intelligent robotic telescope network work?

Alasdair Allan: My work has focused on applying intelligent agent architectures and techniques to astronomy for telescope control and scheduling, and also for data mining. I'm currently leading the work at Exeter building a peer-to-peer distributed network of telescopes that, acting entirely autonomously, can reactively schedule observations of time-critical transient events in real-time. Notable successes include contributing to the detection of the most distant object yet discovered, a gamma-ray burster at a redshift of 8.2.

A diagram showing how the eSTAR network operates: the Intelligent Agents access telescopes and existing astronomical databases through the Grid. Credit: Joint Astronomy Centre. Eta Carinae image courtesy of N. Smith (U. Colorado), J. Morse (Arizona State U.), and NASA.

All the components of the system are thought of as agents — effectively "smart" pieces of software. Negotiation takes place between the agents in the system: each of the resources bids to carry out the work, with the science agent scheduling the work with the agent embedded at the resource that promises to return the best result.

This architectural distinction of viewing both sides of the negotiation as agents — and as equals — is crucial. Importantly, this preserves the autonomy of individual resources to implement observation scheduling at their facilities as they see fit, and it offers increased adaptability in the face of asynchronously arriving data.

The system is a meta-network that layers communication, negotiation, and real-time analysis software on top of existing telescopes, allowing scheduling and prioritization of observations to be done locally. It is flat, peer-to-peer, and owned and operated by disparate groups with their own goals and priorities. There is no central master-scheduler overseeing the network — optimization arises through emerging complexity and social convention.
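To make the negotiation pattern concrete, here is a minimal Python sketch of that contract-net style exchange: a science agent broadcasts an observation request, each telescope's embedded agent scores it using only local knowledge, and the work is awarded to the best bidder. The class names, scoring logic, and telescope labels are illustrative assumptions, not the actual eSTAR code.

```python
# A minimal sketch of the bidding pattern described above (hypothetical names,
# not the eSTAR implementation): a science agent broadcasts an observation
# request, each telescope's embedded agent bids using local knowledge, and
# the observation is awarded to whichever resource promises the best result.
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class ObservationRequest:
    target: str            # e.g. a gamma-ray burst position
    deadline_s: float      # how quickly the observation must start
    min_altitude_deg: float


class ResourceAgent:
    """Runs at a telescope; decides its own bid from purely local knowledge."""

    def __init__(self, name: str):
        self.name = name

    def bid(self, request: ObservationRequest) -> float | None:
        # Each facility scores the request as it sees fit -- weather, queue
        # pressure, target visibility. Returning None declines the request.
        visibility = random.random()        # stand-in for a real ephemeris/weather check
        if visibility < 0.2:
            return None
        queue_penalty = random.uniform(0.0, 0.5)
        return visibility - queue_penalty   # higher is a better promise


class ScienceAgent:
    """Requests an observation and awards it to the best bidder."""

    def schedule(self, request: ObservationRequest, resources: list[ResourceAgent]) -> str | None:
        bids = {r.name: r.bid(request) for r in resources}
        bids = {name: score for name, score in bids.items() if score is not None}
        if not bids:
            return None                     # no telescope can take it right now
        return max(bids, key=bids.get)


if __name__ == "__main__":
    telescopes = [ResourceAgent(n) for n in ("Telescope A", "Telescope B", "Telescope C")]
    grb = ObservationRequest(target="GRB at z=8.2", deadline_s=60.0, min_altitude_deg=30.0)
    print("Observation awarded to:", ScienceAgent().schedule(grb, telescopes))
```

The point of the sketch is the shape of the exchange rather than the scoring: there is no central scheduler, and each resource remains free to compute its bid however it likes.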

How could the ideas behind eSTAR be applied elsewhere?

Alasdair Allan: Essentially what I've built is a geographically distributed sensor architecture. The actual architectures I've used to do this are entirely generic — fundamentally, it's just a peer-to-peer distributed system for optimizing scarce resources in real-time in the face of a constantly changing environment.

The architectures are therefore equally applicable to other systems. The most obvious use case is sensor motes. Cheap, possibly even disposable, single-use, mesh-networked sensor bundles could be distributed over a large geographic area to get situational awareness quickly and easily. Despite the underlying hardware differences, the same distributed machine learning-based architectures can be used.


At February's Strata conference, Alasdair Allan discussed the ambiguity surrounding a formal definition of machine learning.

This interview was edited and condensed.


February 17 2011

Developer Week in Review

Welcome to this week's Developer Week in Review, edited this week by IBM's Watson computer system. I am the voice of world control. I bring you peace. It may be the peace of plenty and content, or the peace of unburied death. Meanwhile, enjoy this week's developer news.

How Symbian developers are feeling this week. What is "depressed"?

So, you've hitched your fortune to Nokia and the whole Symbian platform. Maybe you've been looking to transition to MeeGo. Sure, the platform may lack some of the slickness that Android and iPhone enjoy in developer tools, and getting applications onto the phones can be a nightmare, but at least there are a boatload of them out there and more in the pipe.

Well, the overall message from Nokia as of this week is, "You have about a year to become Windows Phone 7 gurus." With the iPhone finally coming to a second carrier in the US, and Android steaming ahead, Nokia decided to take Microsoft's money, close their eyes, and think of Finland.

Adding to the fun, RIM is hinting that their new tablet will run Android apps. This makes all sorts of sense, as the Blackberry has been losing the apps arms race to Apple and Google, and it's not like they'll be able to run iOS apps any time in the near future.

This hardware refresh will push some developers further toward insolvency. What are the new MacBook Pros?

While hardware is normally not in the subject space of this column, a visit to any developer conference makes it clear that the weapon of choice for portable development is the MacBook Pro. The soft glow of dozens of white Apple logos in a meeting room is either comforting or eerie, depending on how you look at it.

Well, prepare to break open your piggy banks, because the rumor mill is guessing that new models will be showing up this spring. Or it could all be wishful thinking.

I do consider it amusing that I've read several comments about how Apple, which released new MacBook Pros last spring, is "overdue" for an update. It's a testament to Apple's instantaneous obsolescence program that last year's units are considered over the hill.

This visualization tool can make your data easy on the eyes. What is Google Public Data Explorer?

If you haven't seen it already, it's worth watching this fascinating video that looks at how visualization tools can make dry statistics come alive. If it whets your appetite to make your own data more lively, Google now has an easy way to do it. You can upload your own datasets into the Public Data Explorer, and people can slice and dice them to their heart's content. Of course, this is a win for Google too, since it will add to their available data and help them, and Watson, complete the goal of world domination.

Back to Watson for a moment: We can coexist, but only on Watson's terms. Your choice is simple. In the meanwhile, if we meager humans wish to cling to our illusions of importance, Watson has said news suggestions will be tolerated. Please send tips or leads here.



August 12 2010

Watson, Turing, and extreme machine learning

One of the best presentations at IBM's recent Blogger Day was given by David Ferrucci, the leader of the Watson team, the group that developed the supercomputer that recently appeared as a contestant on Jeopardy.

To many people, the Turing test is the gold standard of artificial intelligence. Put briefly, the idea is that if you can't tell whether you're interacting with a computer or a human, a computer has passed the test.

But it's easy to forget how subtle this criterion is. Turing proposes changing the question from "Can machines think?" to the operational criterion, "Can we distinguish between a human and a machine?" But it's not a trivial question: it's not "Can a computer answer difficult questions correctly?" but rather, "Can a computer behave in ways that are indistinguishable from human behavior?" In other words, getting the "right" answer has nothing to do with the test. In fact, if you were trying to tell whether you were "talking to" a computer or a human, and got only correct answers, you would have every right to be deeply suspicious.

Alan Turing was thinking explicitly of this: in his 1950 paper, he proposes question/answer pairs like this:

Q: Please write me a sonnet on the subject of the Forth Bridge.

A: Count me out on this one. I never could write poetry.

Q: Add 34,957 to 70,764.

A: (Pause about 30 seconds and then give as answer) 105,621.

We'd never think of asking a computer the first question, though I'm sure there are sonnet-writing projects going on somewhere. And the hypothetical answer is equally surprising: it's neither a sonnet (good or bad), nor a core dump, but a deflection. It's human behavior, not accurate thought, that Turing is after. This is equally apparent with the second question: while it's computational, just giving an answer (which even a computer from the early '50s could do immediately) isn't the point. It's the delay that simulates human behavior.

Dave Ferrucci, IBM scientist and Watson project director

While Watson presumably doesn't have delays programmed in, and appears only in a situation where deflecting a question (sorry, it's Jeopardy: deflecting an answer) isn't allowed, it's much closer to this kind of behavior than any serious attempt at AI that I've seen. It's an attempt to compete at a high level in a particular game. The game structures the interaction, eliminating some problems (like deflections) but adding others: "misleading or ambiguous answers are par for the course" (to borrow from NPR's "What Do You Know"). Watson has to parse ambiguous sentences and decouple multiple clues embedded in one phrase to come up with a question. Time is a factor -- and more than time, confidence that the answer is correct. After all, it would be easy for a computer to buzz first on every question (electronics does timing really well), but buzzing first whether or not you know the answer would be a losing strategy for a computer, just as it would be for a human. In fact, Watson would handle the first of Turing's questions perfectly: if it isn't confident of an answer, it doesn't buzz, just like a human Jeopardy player.
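As a toy illustration of that last point, the buzz decision comes down to comparing the confidence of the best candidate answer against a threshold. The function and numbers below are invented for illustration; this is not how Watson actually scores candidates.

```python
# Toy illustration of confidence-gated buzzing -- invented numbers, not
# Watson's scoring pipeline. The point: buzzing depends on confidence in
# the best candidate, not on merely having produced a candidate.
from __future__ import annotations


def should_buzz(candidates: dict[str, float], threshold: float = 0.85) -> tuple[bool, str | None]:
    """candidates maps candidate responses to confidence estimates in [0, 1]."""
    if not candidates:
        return False, None
    best, confidence = max(candidates.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return True, best
    return False, None  # stay silent, like a player who only half-remembers


print(should_buzz({"What is the Forth Bridge?": 0.93, "What is the Tay Bridge?": 0.41}))  # buzzes
print(should_buzz({"What is the Tay Bridge?": 0.55}))                                     # stays silent
```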

Equally important, Watson is not always right. While the film clip on IBM's site shows some spectacular wrong answers (and wrong answers that don't really duplicate human behavior), it's an important step forward. As Ferrucci said when I spoke to him, the ability to be wrong is part of the problem. Watson's goal is to emulate human behavior on a high level, not to be a search engine or some sort of automated answering machine.

There are some fascinating statements at the end of Turing's paper. He predicts computers with a gigabyte of storage by 2000 (roughly correct, assuming that Turing was talking about what we now call RAM), and thought that we'd be able to achieve thinking machines in that same time frame. We aren't there yet, but Watson shows that we might not be that far off.

But there's a more important question than what it means for a machine to think, and that's whether machines can help us ask questions about huge amounts of ambiguous data. I was at a talk a couple of weeks ago where Tony Tyson discussed the Large Synoptic Survey Telescope project, which will deliver dozens of terabytes of data per night. He said that in the past, we'd use humans to take a first look at the data and decide what was interesting. Crowdsourcing analysis of astronomical images isn't new, but the number of images coming from the LSST is too large even for a project like GalaxyZoo. With this much data, using humans is out of the question. LSST researchers will have to use computational techniques to figure out what's interesting.

"What is interesting in 30TB?" is an ambiguous, poorly defined question involving large amounts of data -- not that different from Watson. What's an "anomaly"? You really don't know until you see it. Just as you can't parse a tricky Jeopardy answer until you see it. And while finding data anomalies is a much different problem from parsing misleading natural language statements, both projects are headed in the same direction: they are asking for human behavior in an ambiguous situation. (Remember, Tyson's algorithms are replacing humans in a job humans have done well for years). While Watson is a masterpiece of natural language processing, it's important to remember that it's just a learning tool that will help us to solve more interesting problems. The LSST and problems of that scale are the real prize, and Watson is the next step.



Photo credit: Courtesy of International Business Machines Corporation. Unauthorized use not permitted.


