
April 04 2012

Four short links: 4 April 2012

  1. Typing Club -- lessons to improve your touch-typing, building you up letter by letter to speed and mastery. Like how I learned, only without the typewriters and the bibs and the roomful of girls. It wasn't easy being the only boy in typing class, but somehow I managed. (via EdTech ideas)
  2. SQL Injection via HTTP Headers -- excellent introduction to how some surprising HTTP headers can be attack vectors.
  3. How Not to Sort by Average Rating (Evan Miller) -- so easy to get it wrong, so eye-wateringly complex a formula to do it right. (via Hacker News)
  4. I Hereby Resign (Reg Braithwaite) -- not an actual resignation letter, but it highlights exactly why asking to see applicants' Facebook pages is a bad idea. "If you are surfing my Facebook, you could reasonably be expected to discover that I am a Lesbian. Since discrimination against me on this basis is illegal in Ontario, I am just preparing myself for the possibility that you might refuse to hire me and instead hire someone who is a heterosexual but less qualified in any way. Likewise, if you do hire me, I might need to have your employment contracts disclosed to ensure you aren't paying me less than any male and/or heterosexual colleagues with equivalent responsibilities and experience." Ditto "spouse is pregnant so I'm about to take maternity leave just after you hire me", etc. Those things you spend days thumping into HR that they aren't supposed to ask about? All on the applicants' Facebook pages.
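For the rating-sort link above, here is a minimal sketch of the lower bound of the Wilson score interval, which is the fix Evan Miller's article argues for. The item names and vote counts are made up for illustration, and z = 1.96 (roughly 95% confidence) is an assumed default, not something the post specifies.

```python
import math

def wilson_lower_bound(positive: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion of positive votes."""
    if total == 0:
        return 0.0  # no votes yet: rank at the bottom rather than divide by zero
    phat = positive / total
    denom = 1 + z * z / total
    centre = phat + z * z / (2 * total)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / denom

# Hypothetical items: (positive votes, total votes)
items = {"A": (1, 1), "B": (90, 100), "C": (450, 600)}

# Sort by the lower bound, not by the raw average (which would put A first).
ranked = sorted(items, key=lambda k: wilson_lower_bound(*items[k]), reverse=True)
print(ranked)
```

Item A has a perfect average from a single vote, so a plain average ranks it first; the lower bound discounts it because one vote is very weak evidence.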

November 25 2011

Four short links: 25 November 2011

  1. Continuous Three-Dimensional Control of a Virtual Helicopter Using a Motor Imagery Based Brain-Computer Interface (PLOSone) -- direct brain control is becoming a reality, tiny step by tiny step. Also: HELICOPTERS!
  2. Forward Secrecy for HTTPS -- Google contributed a better HTTPS cipher suite to OpenSSL, one that doesn't share keys between conversations. Yay the Goog for giving back.
  3. Ratings Systems (Quora) -- very good answer from the VP of Engineering at Netflix about the purposes and effects of different ratings and feedback systems. Full of pithy and true guidelines like: Your users have a certain mental budget they will invest in your rating system. The more work you make each decision, the fewer decisions you will get. This is true in many contexts other than rating systems as well. You can't randomly throw feedback mechanisms into your app, you must design them as deliberately and thoughtfully as the rest of your site.
  4. InstaCSS -- very simple very useful reference site. Grod like simplicity.

November 08 2011

October 28 2011

Four short links: 28 October 2011

  1. Open Access Week -- a global event promoting Open Access as a new norm in scholarship and research.
  2. The Copiale Cipher -- cracking a historical code with computers. Details in the paper: The book describes the initiation of "DER CANDIDAT" into a secret society, some functions of which are encoded with logograms. (via Discover Magazine)
  3. Coordino -- open source Quora-like question-and-answer software. (via Smashing Magazine)
  4. Baroque.me -- visualization of the first prelude from the first Cello Suite by Bach. Music is notoriously difficult to visualize (Disney's Fantasia is the earliest attempt that I know of) as there is so much it's possible to capture. (via Andy Baio)

September 22 2011

Four short links: 22 September 2011

  1. Implicit and Explicit Feedback -- for preferences and recommendations, implicit signals (what people clicked on and actually listened to) turn out to be strongly correlated with what they would say if you asked. (via Greg Linden)
  2. Pivoting to Monetize Mobile Hyperlocal Social Gamification by Going Viral -- Schuyler Erle's stellar talk at the open source geospatial tools conference. Video, may cause your sides to ache.
  3. repl.it -- browser-based environment for exploring different programming languages from FORTH to Python and Javascript by way of Brainfuck and LOLCODE.
  4. Twitter Storm (GitHub) -- distributed realtime computation system, intended to be to realtime processing what Hadoop is to batch processing. Interesting because most reporting and control systems improve when you move them closer to real-time. Eclipse-licensed open source.

February 17 2011

Four short links: 17 February 2011

  1. The True Cost of Publishing on the Kindle -- an article, apparently by a horrified negotiator with Amazon, revealing that magazine and newspaper publishers pay the WhisperNet delivery costs of their editions. That's not Amazon overhead, it comes out of the publisher's royalty slice. (via Hacker News)
  2. Fonts in Use -- examples of sweet typography and the fonts that were used.
  3. Ffffound -- social network for graphic designers (invite only) with a "people who liked also liked" type of recommendation system. Very clever. So as you research "I want to build a cheesy 70s logo", you thumbs up the images you like and soon the system is suggesting designs with elements of cheesy 70s logos to you. I love that it is invitation-only: you're trusting the judgement of the other people, so you had better only let in people whose judgement you trust.
  4. China's Second Wives and Gift Culture -- second wives, status, and brand. But any city that has a middle class is going to have Second Wives. [...] Even Jiang Zemin, the former President, had a very high profile mistress - a singer called Song Zuying who appears on the Chinese New Year programme every year. And it's not a scandal. A reminder that if you think you can export your crappy business built on American status symbols, you're leaping into the Sea of Fail. (via Sciblogs)

December 02 2010

Strata Week: Replaced by robots

Forget very small shell scripts: these days, there are robots looking to replace us.

Stories from statistics

The only thing better than seeing all the stats on your favorite sports team is reading a good article about their most recent game. An article not only tells you a story, but also gives you a sense of connection to another fan (or at least follower) of your team: the writer. These days, however, that writer could be a robot.

A company called StatSheet has been analyzing data on college and pro sports since 2007, but this month launched a network of almost 350 websites dedicated to individual Division I basketball teams that will feature "automated content."

The StatSheet Network provides maximum coverage of every team, regardless of the size of the team's fan base or surrounding population. And because our technology platform generates content automatically in real-time, you will be able to get that coverage without the delays and inefficiencies imposed by traditional media companies.

According to the New York Times, the "story-writing software does not perform linguistic analysis; it just uses template sentences and a database of phrases that numbers about 5,000 for now." Still, even these basics can lead to results that sound somewhat authentic, despite simple sentence structures.
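As a rough illustration of the "template sentences plus a database of phrases" approach the Times describes, here is a small sketch. It is not StatSheet's software: the templates, the phrase table, the field names, and the sample game are all hypothetical.

```python
import random

# Hypothetical phrase database: (minimum winning margin, verbs that fit it).
VERBS = [
    (20, ["routed", "overwhelmed"]),
    (10, ["beat", "handled"]),
    (0, ["edged", "slipped past"]),
]

# Hypothetical template sentences with named slots filled from game data.
TEMPLATES = [
    "{winner} {verb} {loser} {w_pts}-{l_pts} on {day}.",
    "{scorer} led the way with {scorer_pts} points.",
]

def pick_verb(margin, rng):
    """Choose a verb whose intensity matches the margin of victory."""
    for threshold, phrases in VERBS:
        if margin >= threshold:
            return rng.choice(phrases)
    raise ValueError("margin must be non-negative")

def write_recap(game, seed=0):
    """Fill every template from the game record and join the sentences."""
    rng = random.Random(seed)  # seeded so the same game yields the same text
    margin = game["w_pts"] - game["l_pts"]
    fields = dict(game, verb=pick_verb(margin, rng))
    return " ".join(t.format(**fields) for t in TEMPLATES)

# Made-up game record
game = {
    "winner": "Home State", "loser": "Visiting Tech",
    "w_pts": 78, "l_pts": 75, "day": "Saturday",
    "scorer": "J. Example", "scorer_pts": 24,
}
recap = write_recap(game)
print(recap)
```

No linguistic analysis happens here: the only "judgment" is a lookup from the score margin to a phrase list, which is why the output reads plausibly but stays structurally simple.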

Caption: One of the StatSheet Network's Big Ten school sites.

One can easily imagine applications for such robo-writing in financial reporting or advertising, or in other venues that draw heavily on quantifiable data.

Not your mother's targeted advertising

Speaking of advertising, targeting may get a wee bit more personal -- as in, uniquely personal. That's the idea behind BlueCava, Inc., a company working to digitally "fingerprint" each of the world's devices: not just computers, but cell phones, gaming consoles, and potentially even cars. They're currently at about 200 million devices registered and counting; they expect to reach 1 billion by the end of next year, according to the Wall Street Journal.

We're not just talking cookies here. BlueCava looks at all the different information each device provides, such as software and fonts installed, timestamps, user agents, screen size, and browser plugins. It then assigns a device ID token, and can track the online behavior of that device.
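To make the idea concrete, here is a toy sketch of turning a set of device attributes into a stable token. BlueCava's actual method is not described in the reporting, so everything here is an assumption: the attribute names, the normalization, and the choice of a truncated SHA-256 hash.

```python
import hashlib
import json

def normalize(attributes):
    """Sort list-like values so their reporting order does not change the token."""
    normalized = {}
    for key, value in attributes.items():
        if isinstance(value, (list, tuple, set)):
            value = sorted(str(v) for v in value)
        normalized[key] = value
    return normalized

def device_token(attributes):
    """Hash a canonical JSON form of the attributes into a short hex token."""
    canonical = json.dumps(normalize(attributes), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attribute sets for illustration
laptop = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1280x800",
    "fonts": ["Arial", "Helvetica", "Comic Sans MS"],
    "plugins": ["Flash", "QuickTime"],
}
same_laptop = {
    "plugins": ["QuickTime", "Flash"],
    "fonts": ["Comic Sans MS", "Arial", "Helvetica"],
    "screen": "1280x800",
    "user_agent": "ExampleBrowser/1.0",
}
other_device = dict(laptop, screen="1024x768")

print(device_token(laptop) == device_token(same_laptop))
print(device_token(laptop) == device_token(other_device))
```

An exact hash like this breaks as soon as one attribute changes (a new font, a browser update), so a real system would presumably need some form of fuzzy matching. It does show why no cookie is required: the device describes itself on every visit.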

BlueCava deploys that information for two purposes: to combat fraud (e.g., the same device being used to log into many accounts in order to use many credit cards), and to help marketers discover consumer behavior. That kind of uniquely targeted device data is a goldmine to advertisers -- especially when combined with a bit of extra information such as user demographics or estimated income.

Income, you ask? Well, yes. "BlueCava says the information it collects about devices can't be traced back to individuals and that it will offer people a way to opt out of being tracked," the WSJ reported. But imagine that a device's user logs into a website or downloads a phone app with BlueCava's technology embedded. Whatever name or email address that person uses to log in can be matched against offline databases such as property deeds, vehicle registrations, and other public records (including income estimates). The potential marketing power of online and offline data aggregated into a single, evolving user profile is enormous.

As are, clearly, the privacy concerns. The FTC this week released a preliminary staff report proposing "a framework to balance the privacy interests of consumers with innovation that relies on consumer information to develop beneficial new products and services." The full report is also expected to contain recommendations for a "Do Not Track" mechanism.

Public comments on the FTC's report will be accepted until Jan. 31, 2011.

The opportunities and implications of data products will be examined at the Strata Conference (Feb. 1-3, 2011). Save 30% on registration with the code STR11RAD.

You can call it Al


You know we can't mention the federal government in a post about robots without arriving at the military. A British defense contractor called BAE Systems has joined forces with academia (Imperial College London, University of Southampton, University of Bristol, and Oxford University) to develop the Autonomous Learning Agents for Decentralised Data and Information Networks: ALADDIN.

Simply put, ALADDIN is somewhat like the Borg: it's a collective of robot soldiers that collect and share data before arriving at a joint decision about how to proceed. This could turn out to provide better decisions in chaotic situations than those stemming from a single leader, and that's the research question at issue in the project.

While the test scenarios are currently focused on disaster relief, The Economist points out the clear implications for other chaotic situations, such as warfare. ALADDIN's strengths seem to include multi-agent coordination, situational awareness, and resource allocation. And it does so without human emotion.

Whether that's a good thing, and whether the algorithms can be fine-tuned well enough to deploy in pursuit of life-taking rather than life-saving ends, remains to be seen.

A stitch in data saves ... ?

If automated information retrieval floats your boat, be sure to check out Needlebase, a tool for harvesting, merging, and exploring data just made available to the public.

The genius of Needle as a tool is that it provides a platform from which to browse the web, allowing you to "train" it as you go. Build a template for your dataset, show Needle which fields on a web page contain the information you want, and then let it guess a few additional example pages. After you confirm that it's got the hang of things, watch it import all the data you want from a collection of pages. You can also import from local CSV files or other types of data stores.

Then, when it's time to de-dup or merge fields, a simple drag-and-drop interface makes things a snap. Needle can also help visualize or publish your data.

You can read about a recent use-case at ReadWriteWeb. Marshall Kirkpatrick has a longer write-up here.

Send us news

Email us news, tips and interesting tidbits at strataweek@oreilly.com.

June 23 2010

Four short links: 23 June 2010

  1. Ira Glass on Being Wrong (Slate) -- fascinating interview with Ira Glass on the fundamental act of learning: being wrong. I had this experience a couple of years ago where I got to sit in on the editorial meeting at the Onion. Every Monday they have to come up with like 17 or 18 headlines, and to do that, they generate 600 headlines per week. I feel like that's why it's good: because they are willing to be wrong 583 times to be right 17. (via Hacker News)
  2. Real Lives and White Lies in the Funding of Scientific Research (PLoSBiology) -- very clear presentation of the problems with the current funding models of scientific research, where the acknowledged best scientists spend most of their time writing funding proposals. K.'s plight (an authentic one) illustrates how the present funding system in science eats its own seed corn. To expect a young scientist to recruit and train students and postdocs as well as producing and publishing new and original work within two years (in order to fuel the next grant application) is preposterous.
  3. jQTouch Roadmap -- interesting to me is the primary distinction between Sencha and jQTouch, namely that jQT is for small devices (phones) only, while Sencha handles small and large (tablet) touch-screen devices. (via Simon St Laurent)
  4. Travel Itineraries from Flickr Photo Trails (Greg Linden) -- clever idea, to use metadata extracted from Flickr photos (location, time, etc.) to construct itineraries for travellers, saying where to go, how long to spend there, and how long to expect to spend getting from place to place. Another story of the surprise value that can be extracted from overlooked data.
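The itinerary idea in the last link can be sketched in a few lines. This is only an illustration of the general approach, not the method from the linked work: it assumes each photo has already been resolved to a named place, and the `Photo` type and sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Photo:
    place: str        # assumed to be already resolved from the photo's geotag
    taken: datetime   # timestamp from the photo's metadata

def build_itinerary(photos):
    """Group time-ordered photos into visits, then measure the gaps between visits."""
    visits = []
    for photo in sorted(photos, key=lambda p: p.taken):
        if visits and visits[-1]["place"] == photo.place:
            visits[-1]["leave"] = photo.taken  # still at the same place: extend the stay
        else:
            visits.append({"place": photo.place, "arrive": photo.taken, "leave": photo.taken})
    # Travel time is the gap between the last photo at one place and the first at the next.
    legs = [
        (prev["place"], nxt["place"], nxt["arrive"] - prev["leave"])
        for prev, nxt in zip(visits, visits[1:])
    ]
    return visits, legs

# Made-up photo trail for one traveller
photos = [
    Photo("Museum", datetime(2010, 6, 1, 9, 0)),
    Photo("Harbour", datetime(2010, 6, 1, 12, 30)),
    Photo("Museum", datetime(2010, 6, 1, 10, 30)),
    Photo("Harbour", datetime(2010, 6, 1, 13, 15)),
]
visits, legs = build_itinerary(photos)
for v in visits:
    print(v["place"], "stay:", v["leave"] - v["arrive"])
for origin, destination, travel in legs:
    print(origin, "->", destination, "travel:", travel)
```

A single trail only gives a lower bound on how long someone stayed and an upper bound on travel time; aggregating many travellers' trails is what would make the estimates useful.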

May 31 2010

Four short links: 31 May 2010

  1. Transparency is Not Enough (danah boyd) -- we need people to not just have access to the data, but have access to the context surrounding the data. A very thoughtful talk from Gov 2.0 Expo about meaningful data release.
  2. Feed6 -- the latest from Rohit Khare is a sort of a "hot or not" for pictures posted to Twitter. Slightly addictive, while somewhat purposeless. Remarkable for how banal the "most popular" pictures are, it reminds me of the way Digg, Reddit, and other such sites trend towards the uninteresting and dissatisfying. Flickr's interestingness still remains one of the high points of user-curated notability. (via rabble on Twitter)
  3. Potential Policy Recommendations to Support the Reinvention of Journalism (PDF) -- FTC staff discussion document that floats a number of policy proposals around journalism: additional IP rights to defend against aggregators like Google News; protection of "hot news" facts; statutory limits to "fair use"; antitrust exemptions for cartel paywalls; and more. Jeff Jarvis hates it, but Alexander Howard found something to love in the proposal that the government "maximize the easy accessibility of government information" to help journalists find and investigate stories more easily. (via Jose Antonio Vargas)
  4. Smokescreen -- a Flash player in Javascript. See Simon Willison's explanation of how it works. Was created by the fantastic Chris Smoak, who was an early Google Maps hacker and built the BusMonster interface to Seattle public transport. (via Simon Willison)

May 20 2010

Four short links: 20 May 2010

  1. People are Walking Architecture -- presentation by Matt Jones of BERG, taking a new lens to this AR/ubicomp/whatever-it-is-today world. "[Mobile phones are] a whole toy box full of playful, inventive strategies for exploring cities ...."
  2. Lexicalist -- insight into geographic and age distribution of language use, based on Twitter data. (via Language Log)
  3. Advanced Visualization Techniques -- nice overview of some non-standard visualization techniques. Short shameful confession: I love polar dendrograms with a passion. These techniques are to visualizers as algorithms and data structures to programmers: each is used in specific circumstances and compromises some things to gain in others. (via Flowing Data)
  4. iPad Usability Report (Nielsen-Norman Group) -- 93-page report based on user studies. The iPad etched-screen aesthetic does look good. No visual distractions or nerdy buttons. The penalty for this beauty is the re-emergence of a usability problem we haven't seen since the mid-1990s: Users don't know where they can click. For the last 15 years of Web usability research, the main problems have been that users don't know where to go or which option to choose — not that they don't even know which options exist. With iPad UIs, we're back to this square one. (via Andrew Savikas)

April 29 2010

March 25 2010

Four short links: 25 March 2010

  1. Aren't You Being a Little Hasty in Making This Data Free? -- very nice deconstruction of a letter sent by ESRI and competitors to the British Government, alarmed at the announcement that various small- and mid-sized datasets would no longer be charged for. In short, companies that make money reselling datasets hate the idea of free datasets. The arguments against charging are that the cost of gating access exceeds revenue and that open access maximises economic gain. (via glynmoody on Twitter)
  2. User Assisted Audio Selection -- amazing movie of a tool that lets you sing or hum along with a sound in a piece of music to pull it out of the background. The researcher, Paris Smaragdis, has done a lot of other nifty audio work. (via waxpancake on Twitter)
  3. Cologne-based Libraries Release 5.4M Bibliographic Records to CC0 -- I see resonance here with the Cologne Archives disaster last year, where the building collapsed and 18km of shelves covering over 2000 years of municipal history were lost. When you have digital heritage, embrace the ease of copying and spread those bits as far and wide as you can. Hoarding bits comes with a risk of a digital Cologne disaster, where one calamity deletes your collection. (via glynmoody on Twitter)
  4. ThinkTank -- web app that lets you analyse your tweets, break down responses to queries, and archive your Twitter experience. Built by Expert Labs.

March 15 2010

Four short links: 15 March 2010

  1. A German Library for the 21st Century (Der Spiegel) -- But browsing in Europeana is just not very pleasurable. The results are displayed in thumbnail images the size of postage stamps. And if you click through for a closer look, you're taken to the corresponding institute. Soon you're wandering helplessly around a dozen different museum and library Web sites -- and you end up lost somewhere between the "Vlaamse Kunstcollectie" and the "Wielkopolska Biblioteka Cyfrowa." Would it not be preferable to incorporate all the exhibits within the familiar scope of Europeana? "We would have preferred that," says Gradmann. "But then the museums would not have participated." They insist on presenting their own treasures. This is a problem encountered everywhere around the world: users hate silos but institutions hate the thought of letting go of their content. We're going to have to let go to win. (via Penny Carnaby)
  2. StoryGarden -- a web-based tool for gathering and analyzing a large number of stories contributed by the public. The content of the stories, along with some associated survey questions, are processed in an automated semantic computing process for an immediate, interactive display for the lay public, and in a more thorough manual process for expert analysis.
  3. Google Apps Script -- VBA for the 2010s. Currently mainly for spreadsheets, but some hooks into Gmail and Google Calendar.
  4. There's a Rootkit in the Closet -- lovely explanation of finding and isolating a rootkit, reconstructing how it got there and deconstructing the rootkit to figure out what it did. It's a detective story, no less exciting than when Cliff Stoll wrote The Cuckoo's Egg.

February 26 2010

Four short links: 26 February 2010

  1. Who Is Going To Build The New Public Services? -- a thoughtful exploration of the possibilities and challenges of third parties building public software systems. There's a lot of talk of "just put up the data and we'll build the apps" but I think this is a more substantial consideration of which apps can be built by whom.
  2. Quake 3 for Android -- kiss the weekend goodbye, NexusOne owners! My theory is that no platform has "made it" until a first person shooter has been ported to it. (via BoingBoing)
  3. Graph Mining -- slides and reading list from seminar series at UCSB on different aspects of mining graphs. Relevant because, obviously, social networks are one such graph to be mined.
  4. Treadmill Desk -- I want one. Staying fit while working at a sedentary job is important but not easy. I tried to type while using a stepper, but that's just a recipe for incomprehensible typing fail. (via BoingBoing)

February 24 2010

Four short links: 24 February 2010

  1. Maker! Map the World's Data -- web-based tool to make elegant maps from public data.
  2. Flat World Knowledge, a commercial publisher of open textbooks (Jon Udell) -- notable if only for: "Publishers need to be device-agnostic in the broadest sense. The printed book is one of the devices we target."
  3. datapkg -- a data packaging tool, so you can easily find and install datasets.
  4. On Karma -- very detailed look at user reputation, full of great takeaways. As with the FICO score, it is a bad idea to co-opt a reputation system for another purpose, and it dilutes the actual meaning of the score in its original context.

February 16 2010

Four short links: 16 February 2010

  1. Of Tandoori and Epicuration (JP Rangaswami) -- "Curation is the process by which aggregate data is imbued with personalised trust."
  2. Siri -- a personal assistant iPhone app, like IWantSandy but with voice recognition.
  3. Evaluating the Reasons for Non-use of Cornell University's Institutional Repository -- great lessons for all open data projects. The reward structure established by each discipline largely defines the motivation behind faculty behavior. As eloquently stated by the economist, "While we are going through a digital revolution - in the way we teach and communicate with each other - the reputation of being published in the print journals is still the strongest incentive for motivation." This position was largely echoed by the engineer, who stated "what is holding us to the journal is the promotion procedure. This is about a problem of measurement with how Cornell evaluates my work." That said, there are real risks associated with changing one's practices, especially when one assumes the role of an early innovator. As the communication faculty member summarized, "There has to be a better way than the current system, but I'm not willing to be on the leading edge in using that system." (via JHW)
  4. Google Voice Transcriptions Annotated as Poetry -- found art that reminds us that it's hard to wreck a nice beach.
    WHATEVER THIS IS (Caller: My friend Christina)

    Hey mister
    it's Christina
    just left you a message and then
    I got your message and realized
    you're stuck out

    but I'll try you.

    But yeah, just trying to be tomorrow
    (if you get the chance)
    And if you're a few Karen in China the next day
    Council lot more
    eating minnows on the step
    and give me a little

    I'll be hanging around then and I am
    well,
    whatever this is.

December 01 2009

Four short links: 1 December 2009

  1. Apertus -- open source cinema camera. (via joshua on Delicious)
  2. A Survey of Collaborative Filtering Techniques -- "From basic techniques to the state-of-the-art, we attempt to present a comprehensive survey for CF techniques, which can be served as a roadmap for research and practice in this area." (via bos on Delicious)
  3. Drizzle Replication using RabbitMQ as Transport -- we're watching the growing use of message queues in web software, and here's an interesting application. (via sogrady on Delicious)
  4. Facebook Data Team: Distributed Data Analysis at Facebook -- job ad from Facebook gives numbers on company use of their Hive data warehouse tool built on top of Hadoop: Today, Facebook counts 29% of its employees (and growing!) as Hive users. More than half (51%) of those users are outside of Engineering. They come from distinct groups like User Operations, Sales, Human Resources, and Finance. Many of them had never used a database before working here. Thanks to Hive, they are now all data ninjas who are able to move fast and make great decisions with data. (via Simon Willison)

November 13 2009
