June 07 2013

Four short links: 7 June 2013

  1. Accumulo — NSA’s BigTable implementation, released as an Apache project.
  2. How the Robots Lost (Business Week) — the decline of high-frequency trading profits (basically, markets worked and imbalances in speed and knowledge have been corrected). Notable for the regulators getting access to the technology that the traders had: Last fall the SEC said it would pay Tradeworx, a high-frequency trading firm, $2.5 million to use its data collection system as the basic platform for a new surveillance operation. Code-named Midas (Market Information Data Analytics System), it scours the market for data from all 13 public exchanges. Midas went live in February. The SEC can now detect anomalous situations in the market, such as a trader spamming an exchange with thousands of fake orders, before they show up on blogs like Nanex and ZeroHedge. If Midas sees something odd, Berman’s team can look at trading data on a deeper level, millisecond by millisecond.
  3. PRISM: Surprised? (Danny O’Brien) — I really don’t agree with the people who think “We don’t have the collective will”, as though there’s some magical way things got done in the past when everyone was in accord and surprised all the time. It’s always hard work to change the world. Endless, dull hard work. Ten years later, when you’ve freed the slaves or beat the Nazis everyone is like “WHY CAN’T IT BE AS EASY TO CHANGE THIS AS THAT WAS, BACK IN THE GOOD OLD DAYS. I GUESS WE’RE ALL JUST SHEEPLE THESE DAYS.”
  4. What We Don’t Know About Spying on Citizens is Scarier Than What We Do Know (Bruce Schneier) — The U.S. government is on a secrecy binge. It overclassifies more information than ever. And we learn, again and again, that our government regularly classifies things not because they need to be secret, but because their release would be embarrassing. Open source BigTable implementation: free. Data gathering operation around it: $20M/year. Irony in having the extent of authoritarian Big Brother government secrecy questioned just as a whistleblower’s military trial is held “off the record”: priceless.

May 29 2013

Four short links: 29 May 2013

  1. Quick Reads of Notable New Zealanders — notable for two reasons: (a) CC-NC-BY licensed, and (b) gorgeous gorgeous web design. Not what one normally associates with Government web sites!
  2. svg.js — Javascript library for making and munging SVG images. (via Nelson Minar)
  3. Linkbot: Create with Robots (Kickstarter) — accessible and expandable modular robot. Loaded w/ absolute encoding, accelerometer, rechargeable lithium ion battery and ZigBee. (via IEEE Spectrum)
  4. The Promise and Peril of Real-Time Corrections to Political Misperceptions (PDF) — paper presenting results of an experiment comparing the effects of real-time corrections to corrections that are presented after a short distractor task. Although real-time corrections are modestly more effective than delayed corrections overall, closer inspection reveals that this is only true among individuals predisposed to reject the false claim. In contrast, individuals whose attitudes are supported by the inaccurate information distrust the source more when corrections are presented in real time, yielding beliefs comparable to those never exposed to a correction. We find no evidence of realtime corrections encouraging counterargument. Strategies for reducing these biases are discussed. So much for the Google Glass bullshit detector transforming politics. (via Vaughan Bell)

May 07 2013

Four short links: 7 May 2013

  1. Raspberry Pi Wireless Attack Toolkit — A collection of pre-configured or automatically-configured tools that automate and ease the process of creating robust Man-in-the-middle attacks. The toolkit allows you to easily select between several attack modes and is specifically designed to be easily extendable with custom payloads, tools, and attacks. The cornerstone of this project is the ability to inject Browser Exploitation Framework Hooks into a web browser without any warnings, alarms, or alerts to the user. We accomplish this objective mainly through wireless attacks, but also have a limpet mine mode with ettercap and a few other tricks.
  2. Industrial Robot with SDK For Researchers (IEEE Spectrum) — $22,000 industrial robot with 7 degrees-of-freedom arms, integrated cameras, sonar, and torque sensors on every joint. [...] The Baxter research version is still running a core software system that is proprietary, not open. But on top of that the company built the SDK layer, based on ROS (Robot Operating System), and this layer is open source. In addition, there are also some libraries of low level tasks (such as joint control and positioning) that Rethink made open.
  3. OtherMill (Kickstarter) — An easy to use, affordable, computer controlled mill. Take all your DIY projects further with custom circuits and precision machining. (via Mike Loukides)
  4. go-raft (GitHub) — open source implementation of the Raft distributed consensus protocol, in Go. (via Ian Davis)

May 06 2013

Four short links: 6 May 2013

  1. Nautilus — elegantly-designed science web ‘zine. Includes Artificial Emotions on AI, neuro, and psych efforts to recognise and simulate emotions.
  2. A Short Essay on 3D Printing — This hands-off approach to culpability cannot last long. If you design something to go into someone’s bathroom, it will make its way into their child’s mouth. If someone buys, downloads and prints a case for their OUYA and they suffer an electric shock as a result, who is to blame? If a person replaces their phone case with a 3D printed one, and it doesn’t survive a drop to the floor, what then? We need to create a new chain of responsibility for this emerging, and potentially very profitable business. (via Near Future Laboratory)
  3. Zuckerberg’s PAC (Anil Dash) — One of Mark Zuckerberg’s most famous mottos is “Move fast and break things.” When it comes to policy impacting the lives of millions of people around the world, there couldn’t be a worse slogan. Let’s see if we can get to be as accountable to the technology industry as it purports to be, since they will undoubtedly claim to have the grassroots support of our community regardless of whether that’s true or not.
  4. Pirate Economics — four dimensions of pirate institutions. Not BitTorrent pirates, but Berbers and arr-harr-avast-ye-swabbers nautical pirates. Pirate crews not only elected their captains on the basis of universal pirate suffrage, but they also regularly deposed them by democratic elections if they were not satisfied with their performance. Like the Berbers, or the US constitution, pirates didn’t just rely on democratic elections to keep their leaders under check. Though the captain of the ship was in charge of battle and strategy, pirate crews also used a separate democratic election to elect the ship’s quartermaster who was in charge of allocating booty, adjudicating disputes and administering discipline. Thus they had a nascent form of separation of powers.

April 24 2013

Four short links: 2 May 2013

  1. Metrico — puzzle game for Playstation centered around infographics (charts and graphs). (via Flowing Data)
  2. The Lease They Can Do (Business Week) — excellent Paul Ford piece on money, law, and music streaming services. So this is not about technology. Nor is it really about music. This is about determining the optimal strategy for mass licensing of digital artifacts.
  3. How Effective Is a Humanoid Robot as a Tool for Interviewing Young Children? (PLoS ONE) — The results reveal that the children interacted with KASPAR very similar to how they interacted with a human interviewer. The quantitative behaviour analysis reveal that the most notable difference between the interviews with KASPAR and the human were the duration of the interviews, the eye gaze directed towards the different interviewers, and the response time of the interviewers. These results are discussed in light of future work towards developing KASPAR as an ‘interviewer’ for young children in application areas where a robot may have advantages over a human interviewer, e.g. in police, social services, or healthcare applications.
  4. Funding: Australia’s Grant System Wastes Time (Nature, paywalled) — We found that scientists in Australia spent more than five centuries’ worth of time preparing research-grant proposals for consideration by the largest funding scheme of 2012. Because just 20.5% of these applications were successful, the equivalent of some four centuries of effort returned no immediate benefit to researchers.

April 05 2013

Four short links: 5 April 2013

  1. Millimetre-Accuracy 3D Imaging From 1km Away (The Register) — With further development, Heriot-Watt University Research Fellow Aongus McCarthy says, the system could end up both portable and with a range of up to 10 km. See the paper for the full story.
  2. Robot Ants With Pheromones of Light (PLoS Comp Biol) — see also the video. (via IEEE Spectrum’s AI blog)
  3. tabula — open source tool for liberating data tables trapped inside PDF files. (via Source)
  4. There’s No Economic Imperative to Reconsider an Open Internet (SSRN) — The debate on the neutrality of Internet access isn’t new, and if its intensity varies over time, it has for a long while tainted the relationship between Internet Service Providers (ISPs) and Online Service Providers (OSPs). This paper explores the economic relationship between these two types of players, examines in laymen’s terms how the traffic can be routed efficiently and the associated cost of that routing. The paper then assesses various arguments in support of net discrimination to conclude that there is no threat to the internet economy such that reconsidering something as precious as an open internet would be necessary. (via Hamish MacEwan)

April 04 2013

Four short links: 4 April 2013

  1. geo-bootstrap — Twitter Bootstrap fork that looks like a classic geocities page. Because. (via Narciso Jaramillo)
  2. Digital Public Library of America — public libraries sharing full text and metadata for scans, coordinating digitisation, maximum reuse. See The Verge piece. (via Dan Cohen)
  3. Snake Robots — I don’t think this is a joke. The snake robot’s versatile abilities make it a useful tool for reaching locations or viewpoints that humans or other equipment cannot. The robots are able to climb to a high vantage point, maneuver through a variety of terrains, and fit through tight spaces like fences or pipes. These abilities can be useful for scouting and reconnaissance applications in either urban or natural environments. Watch the video, the nightmares will haunt you. (via Aaron Straup Cope)
  4. The Power of Data in Aboriginal Hands (PDF) — critique of government statistical data gathering of Aboriginal populations. That ABS [Australian Bureau of Statistics] survey is designed to assist governments, commentators or academics who want to construct policies that shape our lives or encourage a one-sided public discourse about us and our position in the Australian nation. The survey does not provide information that Indigenous people can use to advance our position because the data is aggregated at the national or state level or within the broad ABS categories of very remote, remote, regional or urban Australia. These categories are constructed in the imagination of the Australian nation state. They are not geographic, social or cultural spaces that have relevance to Aboriginal people. [...] The Australian nation’s foundation document of 1901 explicitly excluded Indigenous people from being counted in the national census. That provision in the constitution, combined with Section 51, sub section 26, which empowered the Commonwealth to make special laws for ‘the people of any race, other than the Aboriginal race in any State’ was an unambiguous and defining statement about Australian nation building. The Founding Fathers mandated the federated governments of Australia to oversee the disappearance of Aboriginal people in Australia.

January 04 2013

January 01 2013

Four short links: 1 January 2013

  1. Robots Will Take Our Jobs (Wired) — I agree with Kevin Kelly that (in my words) software and hardware are eating wetware, but disagree that This is not a race against the machines. If we race against them, we lose. This is a race with the machines. You’ll be paid in the future based on how well you work with robots. Ninety percent of your coworkers will be unseen machines. Most of what you do will not be possible without them. And there will be a blurry line between what you do and what they do. You might no longer think of it as a job, at least at first, because anything that seems like drudgery will be done by robots. Civilizations which depend on specialization reward work and penalize idleness. We already have more people than work for them, and if we’re not to be creating a vast disconnected former workforce then we (society) need to get a hell of a lot better at creating jobs and not destroying them.
  2. Why Workers are Losing the War Against Machines (The Atlantic) — There is no economic law that says that everyone, or even most people, automatically benefit from technological progress.
  3. Early Quora Design Notes — I love reading post-mortems and learning from what other people did. Picking a starting point is important because it will be the axis the rest of the design revolves around — but it’s tricky and not always the first page in the flow. Ideally, you should start with the page that serves the most significant goals of the product.
  4. Free Data Science Books — I don’t mean free as in some guy paid for a PDF version of an O’Reilly book and then posted it online for others to use/steal, but I mean genuine published books with a free online version sanctioned by the publisher. That is, “the publisher has graciously agreed to allow a full, free version of my book to be available on this site.” (via Stein Debrouwere)

November 28 2012

Four short links: 28 November 2012

  1. Moral Machines — it will no longer be optional for machines to have ethical systems. Your car is speeding along a bridge at fifty miles per hour when an errant school bus carrying forty innocent children crosses its path. Should your car swerve, possibly risking the life of its owner (you), in order to save the children, or keep going, putting all forty kids at risk? If the decision must be made in milliseconds, the computer will have to make the call. (via BoingBoing)
  2. Hystrix — a latency and fault tolerance library designed to isolate points of access to remote systems, services and 3rd party libraries, stop cascading failure and enable resilience in complex distributed systems where failure is inevitable. More information. (via Tom Loosemore)
  3. Offline First: A Better HTML5 Experience — can’t emphasize enough how important it is to have offline functionality for the parts of the world that don’t have blanket 3G/LTE/etc coverage (280 south from SF, for example).
  4. Disaster of Biblical Proportions (Business Insider) — impressive collection of graphs and data showing commodity prices indicate our species is living beyond its means.

October 18 2012

Four short links: 18 October 2012

  1. Let’s Pool Our Medical Data (TED) — John Wilbanks (of Science Commons fame) gives a strong talk for creating an open, massive, mine-able database of data about health and genomics from many sources. Money quote: Facebook would never make a change to something as important as an advertising algorithm with a sample size as small as a Phase 3 clinical trial.
  2. Verizon Sells App Use, Browsing Habits, Location (CNet) — Verizon Wireless has begun selling information about its customers’ geographical locations, app usage, and Web browsing activities, a move that raises privacy questions and could brush up against federal wiretapping law. To Verizon, even when you do pay for it, you’re still the product. Carriers: they’re like graverobbing organ harvesters but without the strict ethical standards.
  3. IBM Watson About to Launch in Medicine (Fast Company) — This fall, after six months of teaching their treatment guidelines to Watson, the doctors at Sloan-Kettering will begin testing the IBM machine on real patients. [...] On the screen, a colorful globe spins. In a few seconds, Watson offers three possible courses of chemotherapy, charted as bars with varying levels of confidence–one choice above 90% and two above 80%. “Watson doesn’t give you the answer,” Kris says. “It gives you a range of answers.” Then it’s up to [the doctor] to make the call. (via Reddit)
  4. Robot Kills Weeds With 98% Accuracy — During tests, this automated system gathered over a million images as it moved through the fields. Its Computer Vision System was able to detect and segment individual plants – even those that were touching each other – with 98% accuracy.

October 11 2012

Four short links: 11 October 2012

  1. ABalytics — dead simple A/B testing with Google Analytics. (via Dan Mazzini)
  2. Fastest Rubik Cube Solver is Made of Lego — it takes less than six seconds to solve the cube. Watch the video, it’s … wow. Also cool is watching it fail. (via Hacker News)
  3. Fairfax Watches BitTorrent (TorrentFreak) — At a government broadband conference in Sydney, Fairfax’s head of video Ricky Sutton admitted that in a country with one of the highest percentage of BitTorrent users worldwide, his company determines what shows to buy based on the popularity of pirated videos online.
  4. Web Performance Tools (Steve Souders) — compilation of popular web performance tools. Reminds me of nmap’s list of top security tools.

September 21 2012

Four short links: 21 September 2012

  1. Business Intelligence on Farms — Machines keep track of all kinds of data about each cow, including the chemical properties of its milk, and flag when a particular cow is having problems or could be sick. The software can compare current data with historical patterns for the entire herd, and relate to weather conditions and other seasonal variations. Now a farmer can track his herd on his iPad without having to get out of bed, or even from another state. (via Slashdot)
  2. USAxGITHUB — monitor activity on all the US Federal Government’s github repositories. (via Sarah Milstein)
  3. Rethinking Robotics — $22k general purpose industrial robot. “‘It feels like a true Macintosh moment for the robot world,’ said Tony Fadell, the former Apple executive who oversaw the development of the iPod and the iPhone. Baxter will come equipped with a library of simple tasks, or behaviors — for example, a “common sense” capability to recognize it must have an object in its hand before it can move and release it.” (via David ten Have)
  4. Shift Labs — Shift Labs makes low-cost medical devices for resource-limited settings. [Crowd]Fund the manufacture and field testing of the Drip Clip [...] a replacement for expensive pumps that dose fluid from IV bags.

August 21 2012

Four short links: 21 August 2012

  1. Recording Revenues for the Typical Artist (Digital Music News) — more than 82 percent of their revenue from paid downloads, with CDs accounting for more than 11 percent. That leaves streaming revenues – including Spotify – with a scant 6.5 percent contribution. (via Simon Grigg)
  2. Chinese SMS Payment Malware — the virus — which lurks in wallpaper apps and ‘activates’ post-download — quietly gains access to users’ SMS functionality before exploiting a vulnerability within China Mobile’s SMS payment gateway to carry out transactions and access data.
  3. Wall Street’s Robots Are Not Out To Get You (Renee DiResta) — injecting some reality into the robotrading “IMMINENT DEATH OF MONEY PREDICTED” hypetastrophe.
  4. Blocker Flash Cards (Gamasutra) — a collection of common ways game developers try to stall progress on something they don’t like. Not unique to the games industry, though: I think I’ve encountered every single one of the tactics in various guises. In other news, many human beings are passive-aggressive meatsacks waiting to be composted for the good of the planet.

August 17 2012

Wall Street’s robots are not out to get you

Technology is critical to today’s financial markets. It’s also surprisingly controversial. In most industries, increasing technological involvement is progress, not a problem. And yet, people who believe that computers should drive cars suddenly become Luddites when they talk about computers in trading.

There’s widespread public sentiment that technology in finance just screws the “little guy.” Some of that sentiment is due to concern about a few extremely high-profile errors. A lot of it is rooted in generalized mistrust of the entire financial industry. Part of the problem is that media coverage on the issue is depressingly simplistic. Hyperbolic articles about the “rogue robots of Wall Street” insinuate that high-frequency trading (HFT) is evil without saying much else. Very few of those articles explain that HFT is a catchall term that describes a host of different strategies, some of which are extremely beneficial to the public market.

I spent about six years as a trader, using automated systems to make markets and execute arbitrage strategies. From 2004 to 2011, as our algorithms and technology became more sophisticated, it was increasingly rare for a trader to have to enter a manual order. Even in 2004, “manual” meant instructing an assistant to type the order into a terminal; it was still routed to the exchange by a computer. Automating orders reduced the frequency of human “fat finger” errors. It meant that we could adjust our bids and offers in a stock immediately if the broader market moved, which enabled us to post tighter markets. It allowed us to manage risk more efficiently. More subtly, algorithms also reduced the impact of human biases — especially useful when liquidating a position that had turned out badly. Technology made trading firms like us more profitable, but it also benefited the people on the other sides of those trades. They got tighter spreads and deeper liquidity.

Many HFT strategies have been around for decades. A common one is exchange arbitrage, which Time magazine recently described in an article entitled “High Frequency Trading: Wall Street’s Doomsday Machine?”:

A high-frequency trader might try to take advantage of minuscule differences in prices between securities offered on different exchanges: ABC stock could be offered for one price in New York and for a slightly higher price in London. With a high-powered computer and an ‘algorithm,’ a trader could buy the cheap stock and sell the expensive one almost simultaneously, making an almost risk-free profit for himself.

It’s a little bit more difficult than that paragraph makes it sound, but the premise is true — computers are great for trades like that. As technology improved, exchange arb went from being largely manual to being run almost entirely via computer, and the market in the same stock across exchanges became substantially more efficient. (And as a result of competition, the strategy is now substantially less profitable for the firms that run it.)
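To make the premise concrete, here is a minimal sketch of the check at the heart of exchange arbitrage. Everything in it is an assumption for illustration: the `Quote` type, the `find_arbitrage` helper, the prices, and the flat `cost_per_share` are hypothetical, and a real system also has to deal with latency, order sizes, currency, and the risk that one leg fails to fill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quote:
    """Best bid and offer for one stock on one exchange (hypothetical type)."""
    exchange: str
    bid: float  # highest price a buyer is currently willing to pay
    ask: float  # lowest price a seller is currently willing to accept


def find_arbitrage(a: Quote, b: Quote, cost_per_share: float = 0.0) -> Optional[dict]:
    """Return a buy/sell plan if one venue's bid exceeds the other's ask
    by more than the assumed per-share cost; otherwise return None."""
    for buy, sell in ((a, b), (b, a)):
        # Buy at the cheaper venue's ask, sell at the dearer venue's bid.
        edge = sell.bid - buy.ask - cost_per_share
        if edge > 0:
            return {
                "buy_on": buy.exchange,
                "sell_on": sell.exchange,
                "edge_per_share": round(edge, 4),
            }
    return None


# Made-up prices for the "ABC stock" example in the quoted paragraph.
new_york = Quote("New York", bid=10.00, ask=10.01)
london = Quote("London", bid=10.03, ask=10.04)
plan = find_arbitrage(new_york, london, cost_per_share=0.005)
print(plan)
```

The edge disappears as soon as enough participants run the same check, which is the competition effect described above.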

Market making — posting both a bid and an offer in a security and profiting from the bid-ask spread — is presumably what Knight Capital was doing when it experienced “technical difficulties.” The strategy dates from the time when exchanges were organized around physical trading pits. Those were the bad old days, when there was little transparency and automation, and specialists and brokers could make money ripping off clients who didn’t have access to technology. Market makers act as liquidity providers, and they are an important part of a well-functioning market. Automated trading enables them to manage their orders efficiently and quickly, and helps to reduce risk.

So how do those high-profile screw-ups happen? They begin with human error (or, at least, poor judgment). Computerized trading systems can amplify these errors; it would be difficult for a person sending manual orders to simultaneously botch their markets in 148 different companies, as Knight did. But it’s nonsense to make the leap from one brokerage experiencing severe technical difficulties to claiming that automated market-making creates some sort of systemic risk. The way the market handled the Knight fiasco is how markets are supposed to function — stupidly priced orders came in, the market absorbed them, the U.S. Securities and Exchange Commission (SEC) and the exchanges adhered to their rules regarding which trades could be busted (ultimately letting most of the trades stand and resulting in a $440 million loss for Knight).

There are some aspects of HFT that are cause for concern. Certain strategies have exacerbated unfortunate feedback loops. The Flash Crash illustrated that an increase in volume doesn’t necessarily mean an increase in real liquidity. Nanex recently put together a graph (or a “horrifying GIF“) showing the sharply increasing number of quotes transmitted via automated systems across various exchanges. What it shows isn’t actual trades, but it does call attention to a problem called “quote spam.” Algorithms that employ this strategy generate a large number of buy and sell orders that are placed in the market and then are canceled almost instantly. They aren’t real liquidity; the machine placing them has no intention of getting a fill — it’s flooding the market with orders that competitor systems have to process. This activity leads to an increase in short-term volatility and higher trading costs.
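One rough way to see quote spam in data is to compare how many orders a participant places with how many actually trade. The sketch below assumes a simplified, hypothetical event feed of `(participant, kind)` pairs and an arbitrary threshold of 100 orders per fill; neither reflects how any exchange or regulator actually measures this.

```python
from collections import Counter


def orders_per_fill(events):
    """Compute orders placed per fill for each participant.

    `events` is an iterable of (participant, kind) pairs, where kind is
    "new", "cancel", or "fill". This event format is an assumption.
    """
    placed = Counter()
    filled = Counter()
    for participant, kind in events:
        if kind == "new":
            placed[participant] += 1
        elif kind == "fill":
            filled[participant] += 1
    # Treat zero fills as one so the ratio stays finite.
    return {p: placed[p] / max(filled[p], 1) for p in placed}


# Made-up activity: a market maker that trades often, and a spammer that
# cancels nearly everything it sends.
events = (
    [("market_maker", "new")] * 100
    + [("market_maker", "fill")] * 40
    + [("spammer", "new")] * 1000
    + [("spammer", "cancel")] * 999
    + [("spammer", "fill")] * 1
)

ratios = orders_per_fill(events)
SPAM_THRESHOLD = 100  # arbitrary cutoff, chosen only for this example
flagged = [p for p, ratio in ratios.items() if ratio > SPAM_THRESHOLD]
print(ratios, flagged)
```

A high ratio alone does not prove bad intent, since genuine market makers also cancel and replace quotes as prices move, so a measure like this is a starting point for a closer look rather than a verdict.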

The New York Times just ran an interesting article on HFT that included data on the average cost of trading one share of stock. From 2000 to 2010, it dropped from $.076 to $.035. Then it appears to have leveled off, and even increased slightly, to $.038 in 2012. If (as that data suggests) we’ve arrived at the point where the “market efficiency” benefit of HFT is outweighed by the risk of increased volatility or occasional instability, then regulators need to step in. The challenge is determining how to disincentivize destabilizing behavior without negatively impacting genuine liquidity providers. One possibility is to impose a financial transaction tax, possibly based on how long the order remains in the market or on the number of orders sent per second.
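For a sense of scale, the per-share cost figures quoted above can be turned into percentage changes. This is plain arithmetic on the three numbers cited from the Times article; the helper name `pct_change` is just for illustration.

```python
# Average cost of trading one share, in dollars, as quoted above.
cost_per_share = {2000: 0.076, 2010: 0.035, 2012: 0.038}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


drop = pct_change(cost_per_share[2000], cost_per_share[2010])
rebound = pct_change(cost_per_share[2010], cost_per_share[2012])

print(f"2000 to 2010: {drop:+.1f}%")     # roughly a 54% decline
print(f"2010 to 2012: {rebound:+.1f}%")  # roughly a 9% increase
```

The decade-long decline is large and the recent uptick is small, which is why the data suggests a plateau rather than a clear reversal.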

Rethinking regulation and market safeguards in light of new technology is absolutely appropriate. But the state of discourse in the mainstream press — consisting mostly of scare articles about “Wall Street’s terrifying robot invasion” — is unfortunate. Maligning computerized strategies because they are computerized is the wrong way to think about the future of our financial markets.

Photo: ABOVE by Lyfetime, on Flickr


August 10 2012

Four short links: 10 August 2012

  1. The Coffee-Ring Effect (YouTube) — beautiful video of what happens in liquids as they evaporate, explaining why coffee stains are rings, and how to create liquids with even evaporative coating.
  2. The Importance of Quantitative Thinking Medicine (PDF) — scaling laws underlie aging, metabolism, drug delivery, BMI, and more. Full of wow moments, like Fractals are a common feature of many complex systems ranging from river networks, earthquakes, and the internet to stock markets and cities. [...] Geometrically, the nested levels of continuous branching and crenulations inherent in fractal-like structures optimise the transport of information, energy, and resources by maximising the surface areas across which these essential features of life flow within any volume. Because of their fractal nature, these effective surface areas are much larger than their apparent physical size. For example, even though the volume of our lungs is about 5–6 L, the total surface area of all the alveoli is almost the size of a tennis court and the total length of airways is about 2500 km. Even more striking is that if all the arteries, veins, and capillaries of an individual’s circulatory system were laid end to end, its total length would be about 100000 km, or nearly two and a half times around the earth.
  3. Autonomous Robotic Plane at MIT (YouTube) — hypnotic to watch it discover the room. A product of the Robust Robotics Group at MIT.
  4. Electric Sheep — hypnotic screensaver, where the sleeping computers collaborate on animations. You can vote up or down the animation on your screen, changing the global gene pool. Popular animations survive and propagate.

August 07 2012

Four short links: 7 August 2012

  1. Why Toys Make Good Medical Devices (YouTube) — Jose Gomez-Marquez profiled by CNN. His group at MIT is Little Devices.
  2. 3D Printed Exoskeletal Arms for Little Girl — researchers at a Delaware hospital 3D printed a durable custom device with the tiny, lightweight custom parts she needed. Good for iterations, replacements, and an astonishingly high number of “awww” moments in the video.
  3. Figshare — allows researchers to publish all of their data in a citable, searchable and sharable manner. All data is persistently stored online under the most liberal Creative Commons licence, waiving copyright where possible. figshare was started by a frustrated Imperial College PhD student as a way to disseminate all research outputs and not just static images through traditional academic publishing. It is now supported by Digital Science, a Macmillan Publishers company.
  4. Zombees — honey bees that have been parasitized by the Zombie Fly Apocephalus borealis. Fly-parasitized honey bees become “ZomBees” showing the “zombie-like behavior” of leaving their hives at night on “a flight of the living dead.” See also NPR interview.

July 27 2012

Four short links: 27 July 2012

  1. Social Media in China (Fast Company) — fascinating interview with Tricia Wang. We often don’t think we have a lot to learn from tech companies outside of the U.S., but Twitter should look to Weibo for inspiration for what can be done. It’s like a mashup of Tumblr, Zynga, Facebook, and Twitter. It’s very picture-based, whereas Twitter is still very text-based. In Weibo, the pictures are right under each post, so you don’t have to make an extra click to view them. And people are using this in subversive ways. Whether you’re using algorithms to search text or actual people–and China has the largest cyber police force in the world—it’s much easier to censor text than images. So people are very subversive in hiding messages in pictures. These pictures are sometimes very different than what people are texting, or will often say a lot more than the actual text itself. (via Tricia Wang)
  2. A Treatise on Font Rasterisation With an Emphasis on Free Software (Freddie Witherden) — far more than you ever thought you wanted to know about how fonts are rendered. (via Thomas Fuchs)
  3. Softwear Automation — robots to make clothes, something which is surprisingly rare. (via Andrew McAfee)
  4. A Guide to Analyzing Python Performance — finding speed and memory problems in your Python code. With pretty pictures! (via Ian Kallen)

April 19 2012

Strata Week: The rise of the robot essay graders

Here are a few of the data stories that caught my attention this week.

Automated essay-scoring software scores as well as humans

Photo: Taking a test at the Real Estate Investing College by Casey Serin, on Flickr

Robot essay graders: They grade the same as humans. That's the conclusion of a study conducted by University of Akron's Dean of the College of Education Mark Shermis and Kaggle data scientist Ben Hamner. The researchers examined some 22,000 essays that were administered to junior high and high school students as part of their states' standardized testing process, comparing the grades given by human graders and those given by automated grading software. They found that "overall, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items with equal performance for both source-based and traditional writing genre" (PDF of the report).

"The demonstration showed conclusively that automated essay scoring systems are fast, accurate, and cost effective," says Tom Vander Ark, managing partner at the investment firm Learn Capital, in a press release touting the study's results.

The study coincides with an active competition hosted on Kaggle and sponsored by the Hewlett Foundation, in which data scientists are challenged with developing the best algorithm to automatically grade student essays. "Better tests support better learning," noted the foundation's Education Program Director Barbara Chow in the press release. "This demonstration of rapid and accurate automated essay scoring will encourage states to include more writing in their state assessments. And, the more we can use essays to assess what students have learned, the greater the likelihood they'll master important academic content, critical thinking, and effective communication."

Personally, I like writing for a human audience. Bots leave really stupid blog comments — but I bet there's an algorithm for that too.

Scaling Instagram

The billion-dollar acquisition of the mobile photo-sharing app Instagram was big news last week. The news coincided with a presentation by co-founder Mike Krieger at an Airbnb Tech Talk about how the startup managed to scale to 30 million users worldwide with a small team of back-end developers (a very small team, in fact). Krieger's presentation is interesting in its own right, of course, but news of the acquisition by Facebook certainly fueled interest, both in the deal and in the tech under the Instagram hood.

Krieger's slides can be found here. The presentation details some of the early and ongoing challenges of handling the app's increasing number of users and their photos (including the recent roll-out of an Android app, which added another million new users in just 12 hours). Although Instagram hasn't suffered any major outages like those seen by Twitter and Tumblr, Krieger does note a number of early problems, including a missing favicon.ico that was causing a lot of 404 errors in Django.


The UK's National Audit Office has just released its look at the government's open data efforts, reports The Guardian. Although the open data initiative gets good marks for the "tsunami of data" it's released — 8,300 datasets — there remain questions about cost and usage.

Governmental departments estimate they spend between £53,000 and £500,000 each year on publishing the data, with the police crime maps, for example, costing £300,000 to set up and £150,000 per year to maintain. And it's not clear that the data is in demand, according to the National Audit Office report: "None of the departments reported significant spontaneous public demand for the standard dataset releases." This doesn't account for the ways in which third-party vendors may be using the data, however.

Big Data Week

April 23-29 is "Big Data Week," an event created by DataSift that will feature meetups and hackathons in several cities around the world. Big Data Week aims to bring together the "core communities" — data scientists, data technologies, data visualization, and data business. A list of events is available on the Big Data Week website.

Got data news?

Feel free to email me.

Photo: Taking a test at the Real Estate Investing College

November 16 2011

Four short links: 16 November 2011

  1. Q&A with Rob O'Callahan (ComputerWorld) -- an excellent insight into how Mozilla sees the world. In particular how proprietary mobile ecosystems are the new proprietary desktop ecosystems, and how the risks for the web are the same (writing for one device, not for all).
  2. Bikes That Charge USB Devices -- German bicycle maker Silverback has recently launched two bikes with built-in USB ports that can charge devices as the rider pedals. (via Julie Starr)
  3. Mobile Farm Robots (Wired) -- The Harvest Automation robots are knee-high, wheeled machines. Each robot has a gripper for grasping pots, a deck for carrying pots, and an array of sensors to keep track of where it is and what’s around it. Teams of robots zip around nursery fields, single-mindedly spacing and grouping plants. Think Wall-E without the doe eyes and cuddly personality, or the little forest-tending ‘bots in the 1972 sci-fi classic Silent Running.
  4. ThinkUp 1.0 -- out of beta, the software to build your own archive of your social network presence is ready for prime time. See Anil's post for a pointed take on why this is desperately important right now.

