June 27 2012

Four short links: 27 June 2012

  1. Turing Centenary Speech (Bruce Sterling) -- so many thoughtbombs, this repays rereading. We’re okay with certain people who “think different” to the extent of buying Apple iPads. We’re rather hostile toward people who “think so very differently” that their work will make no sense for thirty years — if ever. We’ll test them, and see if we can find some way to get them to generate wealth for us, but we’re not considerate of them as unusual, troubled entities wandering sideways through a world they never made. ... Cognition exists, and computation exists, but they’re not the same phenomenon with two different masks on. ... Explain to me, as an engineer, why it’s so important to aspire to build systems with “Artificial Intelligence,” and yet you’d scorn to build “Artificial Femininity.” What is that about? ... Every day I face all these unstable heaps of creative machinery. How do we judge art created with, by, and or through these devices? What is our proper role with them? [...] How do we judge what we’re doing? How do we distribute praise and blame, rewards and demerits, how do we guide it, how do we attribute meaning to it? ... oh just read the whole damn piece, it's the best thing you'll read this month.
  2. Handsontable -- Excel-like grid editing plugin for jQuery (MIT-licensed).
  3. Lumoback (Kickstarter) -- smart posture sensor which provides a gentle vibration when you slouch to remind you to sit or stand straight. It is worn on your lower back and designed to be slim, sleek and so comfortable that you barely feel it when you have it on. (via Tim O'Reilly)
  4. Robot Hand Beats You At Rock-Paper-Scissors (IEEE) -- tl;dr: computer vision and fast robotics mean it chooses after you reveal, but it happens so quickly that you don't realize it's cheating. (via Hacker News)

June 05 2012

Four short links: 5 June 2012

  1. StreetView: A Wolf in Sheep's Clothing (Adrian Holovaty) -- Now, I’m realizing the biggest Street View data coup of all: those vehicles are gathering the ultimate training set for driverless cars.
  2. Racist Culture is a Factory Defect (Anil Dash) -- so true.
  3. From Game Console to TV (Luke Wroblewski) -- Microsoft's Xbox video game console is now used more for watching movies and TV shows and listening to music online than playing video games online.
  4. Internet Everywhere -- video replay from the World Science Festival.

June 04 2012

Four short links: 4 June 2012

  1. How To Be An Explorer of the World (Amazon) -- I want to take this course on design anthropology but this book, the assigned text, looks like an excellent second best.
  2. Stuxnet Was American-Made Cyberwarfare Tool (NY Times) -- not even the air gap worked for Iran: “It turns out there is always an idiot around who doesn’t think much about the thumb drive in their hand.”
  3. So Much For The Paperless Society (Beta Knowledge Tumblr) -- graph of the waxing and waning use of bond paper in North America. Spoiler: we're still using a lot.
  4. Magnifying Temporal Variation in Video -- Our goal is to reveal temporal variations in videos that are difficult or impossible to see with the naked eye and display them in an indicative manner. Our method, which we call Eulerian Video Magnification, takes a standard video sequence as input, and applies spatial decomposition, followed by temporal filtering to the frames. The resulting signal is then amplified to reveal hidden information. Using our method, we are able to visualize the flow of blood as it fills the face and also to amplify and reveal small motions. Our technique can run in real time to show phenomena occurring at temporal frequencies selected by the user. This is amazing: track the pulse in your face from a few frames. (via Hacker News)
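The pipeline in the quote above (spatial decomposition, temporal filtering, amplification) can be sketched in pure Python. This is a toy under loud assumptions, not the authors' code: frames are grayscale images stored as lists of lists of floats, one level of 2x2 block averaging stands in for the full image pyramid the paper uses, and the temporal band-pass filter is approximated by the difference of a fast and a slow exponential moving average. The function names and default parameters are hypothetical.

```python
def downsample(frame):
    """Crude spatial decomposition: average non-overlapping 2x2 blocks.

    Stands in for one level of the image pyramid used in the paper.
    """
    h, w = len(frame) // 2, len(frame[0]) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1]
              + frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]


def magnify(frames, alpha=10.0, fast=0.5, slow=0.05):
    """Amplify small temporal variations in a sequence of grayscale frames.

    For every pixel of the downsampled frames, a fast and a slow exponential
    moving average are tracked; their difference acts as a temporal band-pass
    filter (both averages cancel for a constant signal). The band-passed
    signal is scaled by `alpha` and added back to the downsampled pixel.
    """
    small = [downsample(f) for f in frames]
    h, w = len(small[0]), len(small[0][0])
    fast_avg = [row[:] for row in small[0]]
    slow_avg = [row[:] for row in small[0]]
    output = []
    for frame in small:
        out_frame = []
        for y in range(h):
            out_row = []
            for x in range(w):
                value = frame[y][x]
                fast_avg[y][x] += fast * (value - fast_avg[y][x])
                slow_avg[y][x] += slow * (value - slow_avg[y][x])
                band = fast_avg[y][x] - slow_avg[y][x]
                out_row.append(value + alpha * band)
            out_frame.append(out_row)
        output.append(out_frame)
    return output
```

Fed a patch whose brightness pulses by a fraction of a grey level, the output pulse comes out several times larger, while a perfectly still patch passes through unchanged. The paper lets the user choose which temporal frequencies to amplify; in this sketch that choice is the pair of `fast` and `slow` smoothing factors.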

March 16 2012

Four short links: 16 March 2012

  1. Militarizing Your Backyard With Python and Computer Vision (video) -- using a water cannon, computer vision, Arduino, and Python to keep marauding squirrel hordes under control. See the finished result for Yakety Saxed moist rodent goodness.
  2. Soundbite -- dialogue search for Apple's Final Cut Pro and Adobe Premiere Pro. Boris Soundbite quickly and accurately finds any word or phrase spoken in recorded media. Shoot squirrels with computer vision, search audio with computer hearing. We live in the future, people. (via Andy Baio)
  3. Single Page Apps with Backbone.js -- interesting and detailed dissection of how one site did it. In a single-page app, the server sends back one HTML file, which changes (via JavaScript) in response to the user's activity, possibly with API calls happening in the background; the browser is very definitely not requesting more full HTML pages from the server. The idea is to gain speed (pull less across the wire each time the page changes) and to build the web page in the language you already know (JavaScript).
  4. Why Finish Books? (NY Review of Books) -- the more bad books you finish, the fewer good ones you'll have time to start. Applying this to the rest of life is left as an exercise for the reader.
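The squirrel cannon's trigger can be approximated with frame differencing, one of the simplest computer-vision motion detectors. This is a minimal sketch, not the presenter's code: it assumes grayscale frames as lists of lists of 0-255 integers, and the function name and thresholds are hypothetical. A real setup would grab frames from a camera (for example with OpenCV) and send the fire command to the Arduino.

```python
def motion_detected(prev, curr, pixel_threshold=25, min_fraction=0.01):
    """Return True if enough pixels changed between two grayscale frames.

    A pixel counts as "changed" when its intensity differs by more than
    `pixel_threshold`; motion is reported when the changed pixels make up
    at least `min_fraction` of the frame.
    """
    total = 0
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            total += 1
            if abs(a - b) > pixel_threshold:
                changed += 1
    return total > 0 and changed / total >= min_fraction
```

The per-pixel threshold ignores small sensor noise, while the minimum fraction ignores isolated flickering pixels. Plain differencing still reacts to clouds and shadows as readily as to squirrels, which is why practical systems usually add blurring, background modelling, or blob-size filtering.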

January 12 2012

Four short links: 12 January 2012

  1. Smart Hacking for Privacy -- researchers show that smart power meter data can be mined (or even snooped) to learn what's on the TV. Wow. (You can also watch the talk.) (via Rob Inskeep)
  2. Conditioning Company Culture (Bryce Roberts) -- a short read but thought-provoking. It's easy to create mindless mantras, but I've seen the technique that Bryce describes and (when done well) it's highly effective.
  3. hydrat (Google Code) -- a declarative framework for text classification tasks.
  4. Dynamic Face Substitution (FlowingData) -- Kyle McDonald and Arturo Castro play around with a face tracker and color interpolation to replace their own faces, in real-time, with celebrities such as that of Brad Pitt and Paris Hilton. Awesome. And creepy. Amen.
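The idea behind the smart-meter result is that a TV's power draw tracks the brightness of what it displays, so a meter trace can be compared against the brightness profiles of candidate programmes. A minimal sketch, assuming you already have equal-length, time-aligned sequences (a real attack would also have to search over time offsets); the researchers' exact method may differ, and the candidate titles and function names are made up for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_match(meter_trace, candidate_profiles):
    """Return the candidate whose brightness profile best correlates with the trace."""
    return max(candidate_profiles,
               key=lambda name: pearson(meter_trace, candidate_profiles[name]))
```

Correlation ignores a constant offset and a scale factor, so the steady load of the fridge and the TV's unknown wattage per unit of brightness do not get in the way, which a direct difference between trace and profile would not handle.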

December 23 2011

November 30 2011

November 03 2011

Four short links: 3 November 2011

  1. Feedback Without Frustration (YouTube) -- Scott Berkun at the HIVE conference talks about how feedback fails, and how to get it successfully. He is so good.
  2. Americhrome -- history of the official palette of the United States of America.
  3. Discovering Talented Musicians with Musical Analysis (Google Research blog) -- very clever: they do acoustical analysis and then train up a machine learning engine by asking humans to rate some tracks. Then they set it loose on YouTube and it finds people who are good but not yet popular. My favourite: I'll Follow You Into The Dark by a gentleman with a wonderful voice.
  4. Dark Sky (Kickstarter) -- hyperlocal hyper-realtime weather prediction. Uses radar imagery to figure out what's going on around you, then tells you what the weather will be like for the next 30-60 minutes. Clever use of data plus software.
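The Google Research approach, as summarised above, has two stages: learn what humans rate highly from acoustic features, then look for tracks that score well but have few views. A minimal sketch of that shape, not Google's system: each track is reduced to a small feature vector, and the features, the view-count cutoff, and the k-nearest-neighbour regressor standing in for their learner are all assumptions made for illustration.

```python
import math


def predict_rating(features, rated, k=3):
    """Average the human ratings of the k rated tracks closest in feature space."""
    nearest = sorted(rated, key=lambda item: math.dist(features, item[0]))[:k]
    return sum(rating for _, rating in nearest) / len(nearest)


def undiscovered(unrated, rated, max_views=1000, min_rating=4.0, k=3):
    """Names of tracks predicted to be good that few people have watched yet.

    `unrated` holds (name, features, views) tuples; `rated` holds
    (features, human_rating) pairs used as training data.
    """
    picks = []
    for name, features, views in unrated:
        score = predict_rating(features, rated, k)
        if views <= max_views and score >= min_rating:
            picks.append((score, name))
    return [name for score, name in sorted(picks, reverse=True)]
```

The popularity filter is what turns a quality predictor into a discovery tool: a track that sounds like the highly rated ones but already has millions of views is not news.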

October 21 2011

Developer Week in Review: Talking to your phone

I've spent the last week or so getting up to speed on the ins and outs of Vex Robotics tournaments since I foolishly volunteered to be competition coordinator for an event this Saturday. I've also been helping out my son's team, offering design advice where I could. Vex is similar to Dean Kamen's FIRST Robotics program, but the robots are much less expensive to build. That means many more people can field robots from a given school and more people can be hands-on in the build. If you happen to be in southern New Hampshire this Saturday, drop by Pinkerton Academy and watch two dozen robots duke it out.

In non-robotic news ...

Why Siri matters

It's easy to dismiss Siri, Apple's new voice-driven "assistant" for the iPhone 4S, as just another refinement of the chatbot model that's been entertaining people since the days of ELIZA. No one would claim that Siri could pass the Turing test, for example. But, at least in my opinion, Siri is important for several reasons.

On a pragmatic level, Siri makes a lot of common smartphone tasks much easier. For example, I rarely used reminders on the iPhone and preferred to use a real keyboard when I had to create appointments. But Siri makes adding a reminder or appointment so easy that I have made it pretty much my exclusive method of entering them. It is also going to be a big win for drivers trying to use smartphones in their cars, especially in states that require hands-free operation.

I suspect Siri will also end up being a classic example of crowdsourcing. If I were Apple, I would be capturing every "miss" that Siri couldn't handle and looking for common threads. Since Siri is essentially doing natural language processing and applying rules to your requests, Apple can improve Siri progressively by adding the low-hanging fruit. For example, at the moment, Siri balks at a question like, "How are the Patriots doing?" I'd be shocked if it fails to answer that question in a year since sports scores and standings will be at the heart of commonly asked questions.
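The "capture every miss and look for common threads" idea is easy to sketch. This is purely hypothetical (nothing here reflects how Apple actually processes Siri requests): it treats a log of unanswered queries as plain strings, drops a small hand-picked list of stopwords, and counts how many queries mention each remaining word.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list; a real system would use a proper one.
STOPWORDS = {
    "a", "an", "the", "how", "are", "is", "did", "do", "does", "doing",
    "what", "what's", "who", "when", "to", "of", "in", "on", "for", "me", "my",
}


def common_threads(missed_queries, top_n=3):
    """Count, per content word, how many missed queries mention it."""
    counts = Counter()
    for query in missed_queries:
        words = set(re.findall(r"[a-z']+", query.lower())) - STOPWORDS
        counts.update(words)
    return counts.most_common(top_n)
```

A miss log dominated by a team name would be the cue to add a sports-scores rule, which is exactly the kind of low-hanging fruit described above.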

For developers, the benefits of Siri are obvious. While it's a closed box right now, if Apple follows its standard model, we should expect to see API and SDK support for it in future releases of iOS. At the moment, apps that want voice control (and they are few and far between) have to implement it themselves. Once apps can register with Siri, any app will be able to use voice.

Can OpenOffice survive?

Long-time WIR readers will know that I'm no fan of how Oracle has treated its acquisitions from Sun. A prime example is OpenOffice. In June, OpenOffice was spun off from Oracle, and therefore lost its allowance. Now the OpenOffice team is passing around the hat, looking for funds to keep the project going.

We need to support OpenOffice because it's the only project that really keeps Microsoft honest about providing open-standards access to Microsoft Office products. It's also the only way that Linux users can deal with the near-ubiquitous use of Office document formats in the real world (short of running Office in a VM or under Wine).

The revenge of SQL

The NoSQL crowd has always had Google App Engine as an ally, since the only database available to App Engine apps has been the App Engine Datastore, which (among other things) doesn't support joins. But much as Apple initially rejected multitasking on the iPhone (until it decided to embrace it), Google appears to have thrown in the towel as far as SQL goes: it has announced Google Cloud SQL, a MySQL-based relational database for App Engine apps.

It's always dangerous to hold an absolutist position (with obvious exceptions, such as despising Jar Jar Binks). SQL may have been overused in the past, but it's foolish to reject it altogether; it can be far too useful at times. SQL can be especially handy, for example, when developing pure REST-like web services. It's nice to see that Google has taken a step back from the edge. Or, to put it more pragmatically, that it listens to its customer base on occasion.
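To make the join point concrete: without joins, as in the App Engine Datastore, the application fetches related records and stitches them together itself; with SQL, the database does it in one query. A small sketch using Python's built-in sqlite3 module as a stand-in for a hosted MySQL database; the `authors`/`posts` schema and its data are invented for illustration.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database with a hypothetical authors/posts schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT,
                            author_id INTEGER REFERENCES authors(id));
        INSERT INTO authors VALUES (1, 'alice'), (2, 'bob');
        INSERT INTO posts VALUES (1, 'Post one', 1), (2, 'Post two', 1),
                                 (3, 'Post three', 2);
    """)
    return conn


def titles_with_authors_sql(conn):
    """Let the database do the join."""
    return conn.execute("""
        SELECT posts.title, authors.name
        FROM posts JOIN authors ON posts.author_id = authors.id
        ORDER BY posts.id
    """).fetchall()


def titles_with_authors_manual(conn):
    """Join in application code, the way a join-less datastore forces you to."""
    names = dict(conn.execute("SELECT id, name FROM authors"))
    rows = conn.execute("SELECT title, author_id FROM posts ORDER BY id")
    return [(title, names[author_id]) for title, author_id in rows]
```

Both functions return the same rows; the difference is who does the work. With two tables the manual version is tolerable, but every extra relationship adds another fetch-and-stitch step to application code.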



September 09 2011

Four short links: 9 September 2011

  1. A Simple Test For Whether People Will Pay For News -- an excellent thought experiment, one which sends shivers down the spines of editors.
  2. -- This is as complete a list as possible of links to carrier and other provider network status pages as well as links to network diagnostic tools; user contributions are strongly encouraged. (via Jesse Vincent)
  3. Sudoku Solver Just in CSS -- boggle. (via Paul Irish)
  4. MIL-OSS Conference Writeup -- Alex S. Voultepsis explained how the intelligence community has built up an internal infrastructure with the tools that people want to use; in a vast number of cases, they use OSS to do this. For example, Intellipedia is implemented using MediaWiki, the same software that runs Wikipedia. (via John Scott)

August 02 2011

Four short links: 2 August 2011

  1. DIY UAVs for Cyber-Warfare -- aerial drone that poses as a cell tower, sniffs wifi, cracks passwords, and looks badass. The photo should be captioned "IM IN UR SKIES, SNIFFIN UR GMAIL SESSION COOKIEZ." (via Bryan O'Sullivan)
  2. Wicked Problems (Karl Schroeder) -- a category of problem which, once you read the definition, you recognize everywhere. 5. Every solution to a wicked problem is a "one-shot operation"; because there is no opportunity to learn by trial and error, every attempt counts significantly. I like Karl's take: our biggest challenges are no longer technological. They are issues of communication, coordination, and cooperation. These are, for the most part, well-studied problems that are not wicked. The methodologies that solve them need to be scaled up from the small-group settings where they currently work well, and injected into the DNA of our society--or, at least, built into our default modes of using the internet. They then can be used to tackle the wicked problems.
  3. Stanford AI Class -- Peter Norvig teaching an AI class at Stanford with online open participation. Joins Archaeology of Ancient Egypt in the league of university classes where anyone can join in. The former will let you register with Stanford (presumably for $$$) to join the class. The latter lets you audit for free, as the class will be run in an open and transparent fashion. The former will be supported by the for-sale textbook, the latter by freely-downloadable readings.
  4. Sensory and Chemical Analysis of "Shackleton’s" Mackinlay Scotch Whisky (PDF) -- Three cases of Mackinlay’s Rare Highland Malt whisky were excavated from the ice under Sir Ernest Shackleton’s 1907 expedition base camp hut at Cape Royds in Antarctica in January 2010. The majority of the bottles were in a pristine state of preservation and three were returned to Scotland in January 2011 for the first sensory and organoleptic analysis of a Scotch malt whisky distilled in the late 1890s. I love science where figures have captions like: Principal component analysis (PCA) of peat derived congeners in peated whisky and new-make spirit. I hope the finders got to drink at least some of it, but sentences like this make it seem improbable: The three whisky bottles, minus the whisky sampled via the syringe for this work, will be returned to New Zealand and the Antarctic Heritage Trust will subsequently return the artefacts to Antarctica and place them back under the floor of Shackleton’s hut for posterity. (via Chris Heathcote)

July 18 2011

Four short links: 18 July 2011

  1. Organisational Warfare (Simon Wardley) -- notes on the commoditisation of software, with interesting analyses of the positions of some large players. On closer inspection, Salesforce seems to be doing more than just commoditisation with an ILC pattern, as can be clearly seen from the Radian6 acquisition. They also seem to be operating a tower and moat strategy, i.e. creating a tower of revenue (the service) around which is built a moat devoid of differential value with high barriers to entry. When their competitors finally wake up and realise that the future world of CRM is in this service space, they'll discover a new player dominating this space who has not only removed many of the opportunities to differentiate (e.g. social CRM, mobile CRM) but built a large ecosystem that creates high rates of new innovation. This should be a fairly fatal combination.
  2. Learning to Win by Reading Manuals in a Monte-Carlo Framework (MIT) -- starting with no prior knowledge of the game or its UI, the system learns how to play and to win by experimenting, and from parsed manual text. They used FreeCiv, and assessed the influence of parsing the manual shallowly and deeply. Trust MIT to turn RTFM into a paper. For a human-readable explanation, see the press release.
  3. A Shapefile of the TZ Timezones of the World -- I have nothing but sympathy for the poor gentleman who compiled this. Political boundaries are notoriously arbitrary, and timezones are even worse because they don't need a war to change. (via Matt Biddulph)
  4. Microsoft Adventure -- 1979 Microsoft game for the TRS-80 has fascinating threads into the past and into what would become Microsoft's future.
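The "Monte-Carlo framework" in the MIT paper's title refers to choosing actions by simulating many random continuations of the game and preferring the moves whose simulations tend to win. A minimal flat Monte-Carlo sketch of that idea, on a toy game (Nim where each turn removes 1 to 3 stones and whoever takes the last stone wins) rather than FreeCiv, and without the parsed-manual features the MIT system learns from; all names are hypothetical.

```python
import random


def legal_moves(pile):
    """In this toy Nim variant a player removes 1, 2 or 3 stones."""
    return [k for k in (1, 2, 3) if k <= pile]


def rollout(pile, rng):
    """Finish the game with uniformly random moves, opponent to move first.

    Returns True if the player who just moved ("us") takes the last stone.
    """
    if pile == 0:
        return True  # our move already took the last stone
    our_turn = False
    while True:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return our_turn
        our_turn = not our_turn


def choose_move(pile, playouts=1000, seed=0):
    """Flat Monte-Carlo: pick the move with the best random-rollout win rate."""
    rng = random.Random(seed)
    best_move, best_rate = None, -1.0
    for move in legal_moves(pile):
        wins = sum(rollout(pile - move, rng) for _ in range(playouts))
        rate = wins / playouts
        if rate > best_rate:
            best_move, best_rate = move, rate
    return best_move
```

From a pile of five, taking one stone leaves the opponent with four, a lost position under perfect play, and the random rollouts prefer that move without being told any strategy. The learning in the MIT system goes further, biasing this kind of search with what it has gleaned from the manual.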

June 03 2011

Four short links: 3 June 2011

  1. Silk Road (Gawker) -- Tor-delivered "web" site that is like an eBay for drugs; the currency is Bitcoin. Jeff Garzik, a member of the Bitcoin core development team, says in an email that bitcoin is not as anonymous as the denizens of Silk Road would like to believe. He explains that because all Bitcoin transactions are recorded in a public log, though the identities of all the parties are anonymous, law enforcement could use sophisticated network analysis techniques to parse the transaction flow and track down individual Bitcoin users. "Attempting major illicit transactions with bitcoin, given existing statistical analysis techniques deployed in the field by law enforcement, is pretty damned dumb," he says. The site is viewable here, and here's a discussion of delivering hidden web sites with Tor. (via Nelson Minar)
  2. Dr Waller -- a big game using DC Comics characters where players end up crowdsourcing science on GalaxyZoo. A nice variant on the captcha/ESP-style game that Luis von Ahn is known for. (via BoingBoing)
  3. Machine Learning Demos -- hypnotically beautiful. Code for download.
  4. Esper -- stream event processing engine, GPLv2-licensed Java. (via Stream Event Processing with Esper and Edd Dumbill)
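Garzik's point about the Silk Road item is that the public transaction log is a graph anyone can analyse. One widely discussed heuristic (an assumption here, not necessarily what law enforcement uses) is that addresses spent together as inputs to the same transaction probably belong to the same owner; merging them with a union-find structure clusters pseudonymous addresses into likely owners. The addresses and transactions in this sketch are made up.

```python
def cluster_addresses(transactions):
    """Group addresses that ever appear together as inputs to one transaction.

    `transactions` is a list of input-address lists. Returns a list of sets,
    one per inferred owner, using union-find with path halving.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for inputs in transactions:
        for addr in inputs:
            find(addr)  # register single-input addresses too
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())
```

Clustering is transitive: two addresses that never appear in the same transaction still end up together if a chain of shared transactions links them. Tie any one address in a cluster to a real identity, such as an exchange account or a shipping address, and the whole cluster is exposed.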

April 04 2011

Four short links: 4 April 2011

  1. Find The Future -- New York Public Library big game, by Jane McGonigal. (via Imran Ali)
  2. Enable Certificate Checking on Mac OS X -- how to get your browser to catch attempts to trick you with revoked certificates (more of a worry since security problems at certificate authorities came to light). (via Peter Biddle)
  3. Clever Algorithms -- Nature-Inspired Programming Recipes from AI, examples in Ruby. I hadn't realized there were Artificial Immune Systems. Cool! (via Avi Bryant)
  4. Rethinking Evaluation Metrics in Light of Flickr Commons -- conference paper from Museums and the Web. As you move from "we are publishing, you are reading" to a read-write web, you must change your metrics. Rather than import comments and tags directly into the Library's catalog, we verify the information then use it to expand the records of photos that had little description when acquired by the Library. [...] The symbolic 2,500th record, a photo from the Bain collection of Captain Charles Polack, is illustrative of the updates taking place based on community input. The new information in the record of this photo now includes his full name, death date, employer, and the occasion for taking the photo, the 100th Atlantic crossing as ocean liner captain. An additional note added to the record points the Library visitor to the Flickr conversation and more of the story with references to gold shipments during WWI. Qualitative measurements, like level of engagement, are a challenge to gauge and convey. While resources expended are sometimes viewed as a cost, in this case they indicate benefit. If you don't measure the right thing, you'll view success as a failure. (via Seb Chan)
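For a taste of the Artificial Immune Systems mentioned in the Clever Algorithms item, here is a minimal sketch of negative selection, one of the classic AIS algorithms. The book has its own Ruby versions; this Python sketch is independent of them. Random detectors are kept only if they do not match any "self" sample, and anything a detector later matches is flagged as non-self. The 2-D unit-square feature space, the matching radius, and the detector count are assumptions.

```python
import math
import random


def train_detectors(self_samples, radius=0.1, count=300, seed=0, max_tries=100_000):
    """Generate random detectors that lie farther than `radius` from every self sample."""
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < count and tries < max_tries:
        tries += 1
        candidate = (rng.random(), rng.random())
        if all(math.dist(candidate, s) > radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_non_self(point, detectors, radius=0.1):
    """A point is anomalous if any detector lies within `radius` of it."""
    return any(math.dist(point, d) <= radius for d in detectors)
```

By construction, no training sample can trigger a detector, so the system learns "normal" only from examples of normal, much as an immune system learns to tolerate the body's own cells.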

February 21 2011

September 03 2010

Four short links: 3 Sep 2010

  1. Arranging Things: The Rhetoric of Object Placement (Amazon) -- [...] the underlying principles that govern how Western designers arrange things in three-dimensional compositions. Inspired by Greek and Roman notions of rhetoric [...] Koren elucidates the elements of arranging rhetoric that all designers instinctively use in everything from floral compositions to interior decorating. (via Elaine Wherry)
  2. 2010 Mario AI Championship -- three tracks: Gameplay, Learning, and Level Generation. Found via Ben Weber's account of his Level Generation entry. My submission utilizes a multi-pass approach to level generation in which the system iterates through the level several times, placing different types of objects during each pass. During each pass through the level, a subset of each object type has a specific probability of being added to the level. The result is a computationally efficient approach to generating a large space of randomized levels.
  3. Wave in a Box -- Google to flesh out existing open source Wave client and server into full "Wave in a Box" app status.
  4. 3D Sound in Google Earth (YouTube) -- wow. (via Planet In Action)
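Ben Weber's multi-pass description in the Mario AI item translates naturally into code. A minimal sketch under stated assumptions: a level is a list of columns, each column holds a set of object names, and the object types, their order, and their probabilities are invented for illustration rather than taken from his entry.

```python
import random

# Hypothetical passes: (object type, probability of placing it in a column).
PASSES = [("gap", 0.10), ("pipe", 0.08), ("enemy", 0.12), ("coin", 0.20)]


def generate_level(width=60, seed=None, safe_zone=3):
    """Build a level by sweeping over it once per object type.

    The first and last `safe_zone` columns are left as plain ground so the
    player can start and finish safely, and nothing is placed over a gap.
    """
    rng = random.Random(seed)
    level = [set() for _ in range(width)]
    for obj, probability in PASSES:
        for x in range(safe_zone, width - safe_zone):
            if "gap" not in level[x] and rng.random() < probability:
                level[x].add(obj)
    return level
```

Each pass is one linear sweep with a single random draw per column, so generating a level is cheap, and later passes can respect what earlier ones placed (here, nothing is put over a hole).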

August 12 2010

Watson, Turing, and extreme machine learning

One of the best presentations at IBM's recent Blogger Day was given by David Ferrucci, the leader of the Watson team, the group that developed the supercomputer that recently appeared as a contestant on Jeopardy.

To many people, the Turing test is the gold standard of artificial intelligence. Put briefly, the idea is that if you can't tell whether you're interacting with a computer or a human, a computer has passed the test.

But it's easy to forget how subtle this criterion is. Turing proposes changing the question from "Can machines think?" to the operational criterion, "Can we distinguish between a human and a machine?" But it's not a trivial question: it's not "Can a computer answer difficult questions correctly?" but rather, "Can a computer behave in ways that are indistinguishable from human behavior?" In other words, getting the "right" answer has nothing to do with the test. In fact, if you were trying to tell whether you were "talking to" a computer or a human, and got only correct answers, you would have every right to be deeply suspicious.

Alan Turing was thinking explicitly of this: in his 1950 paper, he proposes question/answer pairs like this:

Q: Please write me a sonnet on the subject of the Forth Bridge.

A: Count me out on this one. I never could write poetry.

Q: Add 34,957 to 70,764.

A: (Pause about 30 seconds and then give as answer) 105,621.

We'd never think of asking a computer the first question, though I'm sure there are sonnet-writing projects going on somewhere. And the hypothetical answer is equally surprising: it's neither a sonnet (good or bad), nor a core dump, but a deflection. It's human behavior, not accurate thought, that Turing is after. This is equally apparent with the second question: while it's computational, just giving an answer (which even a computer from the early '50s could do immediately) isn't the point. It's the delay that simulates human behavior. The answer is also wrong (34,957 + 70,764 is 105,721, not 105,621); deliberate or not, that slip is human too.

Dave Ferrucci, IBM scientist and Watson project director

While Watson presumably doesn't have delays programmed in, and appears only in a situation where deflecting a question (sorry, it's Jeopardy, deflecting an answer) isn't allowed, it's much closer to this kind of behavior than any serious attempt at AI that I've seen. It's an attempt to compete at a high level in a particular game. The game structures the interaction, eliminating some problems (like deflections) but adding others: "misleading or ambiguous answers are par for the course" (to borrow from NPR's "What Do You Know"). Watson has to parse ambiguous sentences and decouple multiple clues embedded in one phrase to come up with a question. Time is a factor -- and, more than time, so is confidence that the answer is correct. After all, it would be easy for a computer to buzz first on every question (electronics does timing really well), but buzzing first whether or not you know the answer would be a losing strategy for a computer as well as for a human. In fact, Watson would handle the first of Turing's questions perfectly: if it isn't confident of an answer, it doesn't buzz, just as a human Jeopardy player wouldn't.
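The "don't buzz unless confident" reasoning can be made quantitative. In Jeopardy!, a correct response adds the clue's value and an incorrect one deducts it, so the expected gain from buzzing with confidence p on a clue worth v is p*v - (1 - p)*v = (2p - 1)*v, which is positive only when p exceeds 50%. A minimal sketch of that decision rule, assuming the candidate answers already come with calibrated confidences (producing those is the hard part of Watson, and this is not IBM's code):

```python
def expected_gain(confidence, clue_value):
    """Expected score change from buzzing: +value if right, -value if wrong."""
    return confidence * clue_value - (1 - confidence) * clue_value


def decide(candidates, clue_value, margin=0.0):
    """Return the answer to give, or None to stay silent.

    `candidates` maps candidate responses to confidences in [0, 1];
    `margin` lets a player demand more than break-even before buzzing.
    """
    if not candidates:
        return None
    answer, confidence = max(candidates.items(), key=lambda item: item[1])
    if expected_gain(confidence, clue_value) > margin:
        return answer
    return None
```

A real player's threshold also depends on the game situation (scores, remaining clues), which is what the `margin` parameter crudely stands in for.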

Equally important, Watson is not always right. While the film clip on IBM's site shows some spectacular wrong answers (and wrong answers that don't really duplicate human behavior), it's an important step forward. As Ferrucci said when I spoke to him, the ability to be wrong is part of the problem. Watson's goal is to emulate human behavior on a high level, not to be a search engine or some sort of automated answering machine.

Some fascinating statements are at the end of Turing's paper. He predicts computers with a storage capacity of about 10^9 binary digits (roughly 125 megabytes) by 2000 (roughly correct, assuming that Turing was talking about what we now call RAM), and thought that we'd be able to achieve thinking machines in that same time frame. We aren't there yet, but Watson shows that we might not be that far off.

But there's a more important question than what it means for a machine to think, and that's whether machines can help us to ask questions about huge amounts of ambiguous data. I was at a talk a couple of weeks ago where Tony Tyson talked about the Large Synoptic Survey Telescope project, which will deliver dozens of terabytes of data per night. He said that in the past, we'd use humans to take a first look at the data and decide what was interesting. Crowdsourcing analysis of astronomical images isn't new, but the number of images coming from the LSST is too large even for a project like GalaxyZoo. With this much data, using humans is out of the question. LSST researchers will have to use computational techniques to figure out what's interesting.

"What is interesting in 30TB?" is an ambiguous, poorly defined question involving large amounts of data -- not that different from Watson. What's an "anomaly"? You really don't know until you see it. Just as you can't parse a tricky Jeopardy answer until you see it. And while finding data anomalies is a much different problem from parsing misleading natural language statements, both projects are headed in the same direction: they are asking for human behavior in an ambiguous situation. (Remember, Tyson's algorithms are replacing humans in a job humans have done well for years). While Watson is a masterpiece of natural language processing, it's important to remember that it's just a learning tool that will help us to solve more interesting problems. The LSST and problems of that scale are the real prize, and Watson is the next step.

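"What's an anomaly?" has no single answer, but the simplest computational stand-in for a human's first look at the data is an outlier test. A minimal sketch, assuming one-dimensional measurements (say, a source's brightness across nights) and using the robust modified z-score built from the median and the median absolute deviation (MAD); the 3.5 cutoff is a common rule of thumb, not anything taken from LSST's pipelines.

```python
from statistics import median


def anomalies(values, cutoff=3.5):
    """Indices of values whose modified z-score exceeds `cutoff`.

    The modified z-score is 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation; unlike mean/stddev it is not dragged around
    by the outliers it is trying to find.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values equal the median: flag anything different.
        return [i for i, v in enumerate(values) if v != center]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - center) / mad) > cutoff]
```

A rule like this only finds what it was told to look for (points far from the typical value); the hard part the text describes, recognising an interesting anomaly nobody anticipated, is exactly what it cannot do.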
Photo credit: Courtesy of International Business Machines Corporation. Unauthorized use not permitted.


June 17 2010

Four short links: 17 June 2010

  1. What is IBM's Watson? (NY Times) -- IBM joining the big data machine learning race, and hatching a Blue Gene system that can answer Jeopardy questions. Does good, not great, and is getting better.
  2. Google Lays Out its Mobile Strategy (InformationWeek) -- notable to me for Rechis said that Google breaks down mobile users into three behavior groups: A. "Repetitive now" B. "Bored now" C. "Urgent now", a useful way to look at it. (via Tim)
  3. BP GIS and the Mysteriously Vanishing Letter -- intrigue in the geodata world. This post makes it sound as though cleanup data is going into a box behind BP's firewall, and the folks who said "um, the government should be the depot, because it needs to know it has a guaranteed-untampered and guaranteed-able-to-access copy of this data" were fired. For more info, including on the data that is available, see the geowanking thread.
  4. Streamhacker -- a blog talking about text mining and other good things, with nltk code you can run. (via heraldxchaos on Delicious)

March 25 2010

Four short links: 25 March 2010

  1. Aren't You Being a Little Hasty in Making This Data Free? -- very nice deconstruction of a letter sent by ESRI and competitors to the British Government, alarmed at the announcement that various small- and mid-sized datasets would no longer be charged for. In short, companies that make money reselling datasets hate the idea of free datasets. The arguments against charging are that the cost of gating access exceeds revenue and that open access maximises economic gain. (via glynmoody on Twitter)
  2. User Assisted Audio Selection -- amazing video of a system that lets you sing or hum along with part of a piece of music to pull that part out of the background. The researcher, Paris Smaragdis, has done a lot of other nifty audio work. (via waxpancake on Twitter)
  3. Cologne-based Libraries Release 5.4M Bibliographic Records to CC0 -- I see resonance here with the Cologne Archives disaster last year, where the building collapsed and 18km of shelves covering over 2000 years of municipal history were lost. When you have digital heritage, embrace the ease of copying and spread those bits as far and wide as you can. Hoarding bits comes with a risk of a digital Cologne disaster, where one calamity deletes your collection. (via glynmoody on Twitter)
  4. ThinkTank -- web app that lets you analyse your tweets, break down responses to queries, and archive your Twitter experience. Built by Expert Labs.

March 03 2010

Four short links: 3 March 2010

  1. Top 25 Most Dangerous Programming Errors (MITRE) -- I could play bingo with this on some of the programs I wrote when I was learning to code. Now, of course, I am perfect. *cough*cough*
  2. RepRap Printing in Clay -- interesting because of the high price of the plastic that fab units typically use. Other groups are working on this--see, for example, recycled glass, sugar, and maltodextrin.
  3. Artificial Flight and Other Myths -- amusing parody of anti-AI arguments.
  4. Snake Oil Supplements -- visualisation of the scientific evidence for various food supplements. What interested me is that it's automatically generated from data in this Google Doc.

