
September 19 2011

Four short links: 19 September 2011

  1. 1996 vs 2011 Infographic from Online University (Evolving Newsroom) -- "AOL and Yahoo! may be the butt of jokes for young people, but both are stronger than ever in the Internet's Top 10". Plus ça change, plus c'est la même chose: the more things change, the more they stay the same.
  2. Pandas -- fast, powerful open source Python package for data analysis. (via Joshua Schachter) A minimal sketch of the kind of analysis it enables follows this list.
  3. The Society of Mind -- MIT open courseware for the classic Marvin Minsky theory that explains the mind as a collection of simpler processes. The subject treats such aspects of thinking as vision, language, learning, reasoning, memory, consciousness, ideals, emotions, and personality. Ideas incorporate psychology, artificial intelligence, and computer science to resolve theoretical issues such as whole vs. parts, structural vs. functional descriptions, declarative vs. procedural representations, symbolic vs. connectionist models, and logical vs. common-sense theories of learning. (via Maria Popova)
  4. Gamers Solve Problem in AIDS Research That Puzzled Scientists for Years (Ed Yong) -- researchers put a key protein from an HIV-related virus onto the Foldit game. If we knew where the halves joined together, we could create drugs that prevented them from uniting. But until now, scientists have only been able to discern the structure of the two halves together. They have spent more than ten years trying to solve the structure of a single isolated half, without any success. The Foldit players had no such problems. They came up with several answers, one of which was close to perfect. In a few days, Khatib had refined their solution to deduce the protein's final structure, and he has already spotted features that could make attractive targets for new drugs. Foldit is a game where players compete to find the best shape for a protein, but it can be played by anyone -- barely an eighth of players work in science.
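
As promised in the Pandas item above, here's a minimal sketch of the one-liner style of analysis the package enables. The data frame and its numbers are invented purely for illustration:

    import pandas as pd

    # Hypothetical monthly traffic figures, made up for this example
    df = pd.DataFrame({
        "site":   ["aol", "aol", "yahoo", "yahoo"],
        "month":  ["2011-08", "2011-09", "2011-08", "2011-09"],
        "visits": [112, 118, 240, 251],
    })

    # Split-apply-combine in one line: mean visits per site
    print(df.groupby("site")["visits"].mean())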

June 24 2011

Four short links: 24 June 2011

  1. Eliza pt 3 -- delightful recapitulation of the reaction to Eliza, and of Weizenbaum's reaction to that reaction, including his despair over his students at MIT, an institution that was of course all about science and technology. He wrote that they "have already rejected all ways but the scientific to come to know the world, and [they] seek only a deeper, more dogmatic indoctrination in that faith (although that word is no longer in their vocabulary)."
  2. Computer Vision Models -- textbook written in the open for public review. (via Hacker News)
  3. Echoprint -- open source and open data music fingerprinting service from MusicBrainz and others. I find it interesting that doing something new with music data requires crowdsourcing because nobody has the full set. A toy illustration of the fingerprinting idea follows this list.
  4. Three Arguments Against The Singularity (Charlie Stross) -- We clearly want machines that perform human-like tasks. We want computers that recognize our language and motivations and can take hints, rather than requiring instructions enumerated in mind-numbingly tedious detail. But whether we want them to be conscious and volitional is another question entirely. I don't want my self-driving car to argue with me about where we want to go today. I don't want my robot housekeeper to spend all its time in front of the TV watching contact sports or music videos. And I certainly don't want to be sued for maintenance by an abandoned software development project.
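
Here is the toy illustration promised in the Echoprint item -- emphatically not Echoprint's actual algorithm, just the classic landmark trick of hashing spectral peaks so the same audio yields the same keys wherever the clip starts:

    import numpy as np

    def toy_fingerprint(samples, window=4096):
        # Take the loudest frequency bin in each fixed window...
        peaks = []
        for start in range(0, len(samples) - window, window):
            spectrum = np.abs(np.fft.rfft(samples[start:start + window]))
            peaks.append(int(spectrum.argmax()))
        # ...then hash adjacent peaks into landmark pairs
        return {hash(pair) for pair in zip(peaks, peaks[1:])}

    # Two copies of the same "song", offset by exactly one window,
    # share almost all of their landmarks
    song = np.random.default_rng(0).standard_normal(44100)
    print(len(toy_fingerprint(song) & toy_fingerprint(song[4096:])))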

August 12 2010

Watson, Turing, and extreme machine learning

One of the best presentations at IBM's recent Blogger Day was given by David Ferrucci, the leader of the Watson team, the group that developed the supercomputer that will appear as a contestant on Jeopardy.

To many people, the Turing test is the gold standard of artificial intelligence. Put briefly, the idea is that if you can't tell whether you're interacting with a computer or a human, the computer has passed the test.

But it's easy to forget how subtle this criterion is. Turing proposes changing the question from "Can machines think?" to the operational criterion, "Can we distinguish between a human and a machine?" And it's not a trivial question: it's not "Can a computer answer difficult questions correctly?" but rather, "Can a computer behave in ways that are indistinguishable from human behavior?" In other words, getting the "right" answer has nothing to do with the test. In fact, if you were trying to tell whether you were "talking to" a computer or a human and got only correct answers, you would have every right to be deeply suspicious.

Alan Turing was thinking explicitly of this: in his 1950 paper, he proposes question/answer pairs like these:

Q: Please write me a sonnet on the subject of the Forth Bridge.

A: Count me out on this one. I never could write poetry.

Q: Add 34,957 to 70,764.

A: (Pause about 30 seconds and then give as answer) 105,621.

We'd never think of asking a computer the first question, though I'm sure there are sonnet-writing projects going on somewhere. And the hypothetical answer is equally surprising: it's neither a sonnet (good or bad), nor a core dump, but a deflection. It's human behavior, not accurate thought, that Turing is after. This is equally apparent with the second question: while it's computational, just giving an answer (which even a computer from the early '50s could do immediately) isn't the point. The delay simulates human behavior, and so does the answer itself, which is wrong: 34,957 plus 70,764 is 105,721, not 105,621.
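
Turing's pause and slip are easy to render in code. Here's a minimal sketch of that behavior; the pause range and error rate are invented parameters, not anything from the paper:

    import random
    import time

    def humanlike_add(a, b):
        # Pause "about 30 seconds", as Turing's hypothetical machine does
        time.sleep(random.uniform(25, 35))
        answer = a + b
        # Occasionally slip by a round amount, the way the paper's machine
        # answers 105,621 where the correct sum is 105,721
        if random.random() < 0.2:
            answer += random.choice([-100, 100])
        return answer

    print(humanlike_add(34957, 70764))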

Dave Ferrucci, IBM scientist and Watson project director

While Watson presumably doesn't have delays programmed in, and appears only in a situation where deflecting a question (sorry, it's Jeopardy: deflecting an answer) isn't allowed, it's much closer to this kind of behavior than any serious attempt at AI that I've seen. It's an attempt to compete at a high level in a particular game. The game structures the interaction, eliminating some problems (like deflections) but adding others: "misleading or ambiguous answers are par for the course" (to borrow from NPR's "What Do You Know"). Watson has to parse ambiguous sentences and decouple multiple clues embedded in one phrase in order to come up with a question. Time is a factor -- and more than time, confidence that the answer is correct. After all, it would be easy for a computer to buzz first on every question (electronics does timing really well), but buzzing first whether or not you know the answer would be a losing strategy for a computer, just as it is for a human. In fact, Watson would handle the first of Turing's questions perfectly: if it isn't confident of an answer, it doesn't buzz, just as a human Jeopardy player wouldn't.
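
Watson's actual buzzing strategy is surely far more elaborate, but the expected-value core of "don't buzz unless you're confident" can be sketched in a few lines. The payoff model below is my simplification; the one real rule it encodes is that a wrong Jeopardy response costs the clue's value:

    def should_buzz(p_correct, clue_value):
        # Expected gain of ringing in: win v with probability p,
        # lose v with probability (1 - p). Ignoring game strategy,
        # this is positive only when confidence exceeds 50%.
        expected_gain = p_correct * clue_value - (1 - p_correct) * clue_value
        return expected_gain > 0

    print(should_buzz(0.8, 400))   # True: confident enough to ring in
    print(should_buzz(0.3, 400))   # False: sit this one out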

Equally important, Watson is not always right. The film clip on IBM's site shows some spectacular wrong answers (including wrong answers that don't really duplicate human behavior), but that fallibility is an important step forward. As Ferrucci said when I spoke to him, the ability to be wrong is part of the problem. Watson's goal is to emulate human behavior on a high level, not to be a search engine or some sort of automated answering machine.

There are some fascinating statements at the end of Turing's paper. He predicts computers with a gigabyte of storage by 2000 (roughly correct, assuming that Turing was talking about what we now call RAM), and he thought we'd be able to achieve thinking machines in that same time frame. We aren't there yet, but Watson shows that we might not be that far off.

But there's a more important question than what it means for a machine to think, and that's whether machines can help us ask questions about huge amounts of ambiguous data. I was at a talk a couple of weeks ago where Tony Tyson talked about the Large Synoptic Survey Telescope project, which will deliver dozens of terabytes of data per night. He said that in the past, we'd use humans to take a first look at the data and decide what was interesting. Crowdsourcing analysis of astronomical images isn't new, but the number of images coming from the LSST is too large even for a project like GalaxyZoo. With this much data, using humans is out of the question; LSST researchers will have to use computational techniques to figure out what's interesting.
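
As a crude illustration of what "computational techniques" might mean at the first-pass stage, here is a toy brightness screen. It's a stand-in of my own, not anything the LSST pipeline actually does; real surveys need far subtler definitions of "interesting":

    import numpy as np

    def flag_outliers(brightness, k=5.0):
        # Flag sources more than k standard deviations from the mean
        z = (brightness - brightness.mean()) / brightness.std()
        return np.abs(z) > k

    # Synthetic readings with one planted oddball at index 42
    readings = np.random.default_rng(1).normal(20.0, 0.5, 100_000)
    readings[42] = 35.0
    print(np.flatnonzero(flag_outliers(readings)))  # almost certainly [42]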

"What is interesting in 30TB?" is an ambiguous, poorly defined question involving large amounts of data -- not that different from Watson. What's an "anomaly"? You really don't know until you see it. Just as you can't parse a tricky Jeopardy answer until you see it. And while finding data anomalies is a much different problem from parsing misleading natural language statements, both projects are headed in the same direction: they are asking for human behavior in an ambiguous situation. (Remember, Tyson's algorithms are replacing humans in a job humans have done well for years). While Watson is a masterpiece of natural language processing, it's important to remember that it's just a learning tool that will help us to solve more interesting problems. The LSST and problems of that scale are the real prize, and Watson is the next step.



Photo credit: Courtesy of International Business Machines Corporation. Unauthorized use not permitted.


