May 24 2012

Four short links: 24 May 2012

  1. Last Saturday My Son Found His People at the Maker Faire -- aww to the power of INFINITY.
  2. Dictionaries Linking Words to Concepts (Google Research) -- Wikipedia entries for concepts, text strings from searches and the oppressed workers down the Text Mines, and a count indicating how often the two were related.
  3. Magic Wand (Kickstarter) -- I don't want the game, I want a Bluetooth magic wand. I don't want to click the OK button, I want to wave a wand and make it so! (via Pete Warden)
  4. E-Commerce Performance (Luke Wroblewski) -- If a page load takes more than two seconds, 40% are likely to abandon that site. This is why you should follow Steve Souders like a hawk: if your site is slower than it could be, you're leaving money on the table.

April 19 2012

Four short links: 19 April 2012

  1. Superfastmatch -- open source text comparison tool, used to locate plagiarism/churnalism in online news sites. You can pull out the text engine and use it for your own "find where this text is used elsewhere" applications (e.g., what's being forwarded out in email, how much of this RFP is copy and paste, what's NOT boilerplate in this contract, etc.). (via Pete Warden)
  2. Ten Design Principles for Engaging Math Tasks (Dan Meyer) -- education gold, engagement gold, and some serious ideas you can use in your own apps.
  3. Clustering Related Stories (Jenny Finkel) -- description of how to cluster related stories, talks about some of the tricks. Interesting without being too scary.
  4. Prince of Persia (GitHub) -- I have waited to see if the novelty wore off, but I still find this cool: 1980s source code on GitHub.

March 23 2012

Visualization of the Week: Anachronistic language in "Mad Men"

"Mad Men" returns on Sunday night for its fifth season, and Princeton grad student Ben Schmidt returns with a look at anachronistic language in the series widely acclaimed for its historical accuracy.

Last month, we looked at Schmidt's visualization of anachronistic language in PBS' "Downton Abbey," which is set in the 1910s. Schmidt takes the same approach to examining the dialogue of "Mad Men," running the scripts through the Google Ngram database to see how the show's language stacks up against texts published in the 1960s.

Anachronistic language in Mad Men
A look at the anachronistic language in "Mad Men." Check out a larger version of this image and read related analysis.

How does the show fare? Schmidt has found that there are "noticeably fewer outliers towards the top" in "Mad Men" compared to "Downton Abbey." Moreover, those outliers are actually appropriate. He has an essay in this week's Atlantic looking at some of this language in more detail.

Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We're always looking for leads, so please drop a line if there's a visualization you think we should know about.

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20

February 16 2012

Four short links: 16 February 2012

  1. The Undue Weight of Truth (Chronicle of Higher Education) -- Wikipedia has become fossilized fiction because the mechanism of self-improvement is broken.
  2. Playfic -- Andy Baio's new site that lets you write text adventures in the browser. Great introduction to programming for language-loving kids and adults.
  3. Review of Alone Together (Chris McDowall) -- I loved this review, its sentiments, and its presentation. Work on stuff that matters.
  4. Why ESRI As-Is Can't Be Part of the Open Government Movement -- data formats without broad support in open source tools are an unnecessary barrier to entry. You're effectively letting the vendor charge for your data, which is just stupid.

February 08 2012

Four short links: 8 February 2012

  1. Mavuno -- an open source, modular, scalable text mining toolkit built upon Hadoop. (Apache-licensed)
  2. Cow Clicker -- Wired profile of Cow Clicker creator Ian Bogost. I was impressed by Cow Clickers [...] have turned what was intended to be a vapid experience into a source of camaraderie and creativity. People create communities around social activities, even when they are antisocial. (via BoingBoing)
  3. Unicode Has a Pile of Poo Character (BoingBoing) -- this is perfect.
  4. The Research Works Act and the Breakdown of Mutual Incomprehension (Cameron Neylon) -- an excellent summary of how researchers and publishers view each other and their place in the world.

January 13 2012

Four short links: 13 January 2012

  1. How The Internet Gets Inside Us (The New Yorker) -- at any given moment, our most complicated machine will be taken as a model of human intelligence, and whatever media kids favor will be identified as the cause of our stupidity. When there were automatic looms, the mind was like an automatic loom; and, since young people in the loom period liked novels, it was the cheap novel that was degrading our minds. When there were telephone exchanges, the mind was like a telephone exchange, and, in the same period, since the nickelodeon reigned, moving pictures were making us dumb. When mainframe computers arrived and television was what kids liked, the mind was like a mainframe and television was the engine of our idiocy. Some machine is always showing us Mind; some entertainment derived from the machine is always showing us Non-Mind. (via Tom Armitage)
  2. SWFScan -- Windows-only Flash decompiler to find hardcoded credentials, keys, and URLs. (via Mauricio Freitas)
  3. Paranga -- haptic interface for flipping through an ebook. (via Ben Bashford)
  4. Facebook Gives Politico Deep Access to Users' Political Sentiments (All Things D) -- Facebook will analyse all public and private updates that mention candidates and an exclusive partner will "use" the results. Remember, if you're not paying for it then you're the product and not the customer.

January 12 2012

Four short links: 12 January 2012

  1. Smart Hacking for Privacy -- can mine smart power meter data (or even snoop it) to learn what's on the TV. Wow. (You can also watch the talk). (via Rob Inskeep)
  2. Conditioning Company Culture (Bryce Roberts) -- a short read but thought-provoking. It's easy to create mindless mantras, but I've seen the technique that Bryce describes and (when done well) it's highly effective.
  3. hydrat (Google Code) -- a declarative framework for text classification tasks.
  4. Dynamic Face Substitution (FlowingData) -- Kyle McDonald and Arturo Castro play around with a face tracker and color interpolation to replace their own faces, in real time, with those of celebrities such as Brad Pitt and Paris Hilton. Awesome. And creepy. Amen.

January 09 2012

The hidden language and "wonderful experience" of product reviews

How do reviews, both positive and negative, influence the price of a product on Amazon? What phrases used by reviewers make us more or less likely to complete a purchase? These are some of the questions that computer scientist Panagiotis Ipeirotis, an associate professor at New York University's Stern School of Business, set out to investigate by analyzing the text in thousands of reviews on Amazon. Ipeirotis continues to research this space.

Ipeirotis' findings are surprising: consumers will pay more for the same product if the seller's reviews are good, certain types of negative reviews actually boost sales, and spelling plays an important role.

Our interview follows.

How important are product reviews on Amazon? Can they give sellers more pricing power?

Panagiotis Ipeirotis: The reviews have a significant effect. When buying online, customers are not only purchasing the product, they're also inherently buying the guarantee of a seamless transaction. Customers read the feedback left by other buyers to evaluate the reputation of the seller. Since customers are willing to pay more to buy from merchants with a better reputation (something we call the "reputation premium"), that feedback tends to have an effect on future prices that the merchant can charge.

What are some of the most influential phrases?

Panagiotis Ipeirotis: "Never received" is a killer phrase in terms of reputation. It reduced the price a seller can charge by an average of $7.46 in the products examined. "Wonderful experience" is one of the most positive, increasing the price a seller can charge by $5.86 for the researched products.

How can very positive reviews be bad for sales?

Panagiotis Ipeirotis: Extremely positive reviews that contain no concrete details tend to be perceived as non-objective — written by fanboys or spammers. We observed this mainly in the context of product reviews, where superlative phrases like "Best camera!" with no further details are actually seen negatively.

Can a negative review ever be good for sales?

Panagiotis Ipeirotis: It can when the review is overly negative or criticizes aspects of the product that are not its primary purpose — the video quality in an SLR camera, for example. Or, when customers have unreasonable expectations: "Battery life lasts only for two days of shooting." Readers interpret these types of negative comments as "This is good enough for me," and it decreases their uncertainty about the product.

What is the effect of badly written reviews on sales?

Panagiotis Ipeirotis: Reviews containing spelling and grammatical errors consistently result in suboptimal outcomes, like lower sales or lower response rates. That was a fascinating but, in retrospect, expected finding. This holds true in a wide variety of settings, from reviews of electronics to hotels. It's even the case when examining email correspondence about a decision, such as whether or not to hire a contractor.

We don't know the exact reason yet, but the effect is very systematic. There are several possible explanations:

  • Readers think that the customers who buy this product are uneducated, so they don't buy it.
  • Reviews that are badly written are considered unreliable and therefore increase the uncertainty about the product.
  • Badly written reviews are unsuccessful attempts to spam and are a signal that even the other good reviews may not be authentic.

What's the relationship between the product attributes discussed in reviews and the attributes that lead to sales?

Panagiotis Ipeirotis: We observed that the aspects of a product that drive the online discussion are not necessarily the ones that define consumer decisions to buy it. For example, "zoom" tends to be discussed a lot for small point-and-shoot cameras. However, very few people are influenced by the zoom capabilities when it comes down to deciding which camera to buy.

This interview was edited and condensed.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Four short links: 9 January 2012

  1. Mr Daisey and the Apple Factor (This American Life) -- episode looking at the claims of human rights problems in Apple's Chinese factories.
  2. OpenPilot -- open source UAVs with cameras. Yes, a DIY spy drone on autopilot. (via Jim Stogdill)
  3. mbox -- more technical information than you ever thought you'd need, to be saved for the time when you have to parse mailbox files. It's a nightmare. (via Hacker News)
  4. Maui (Google Code) -- Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles. GPLv3.

December 28 2011

December 26 2011

Four short links: 26 December 2011

  1. Pattern -- a BSD-licensed bundle of Python tools for data retrieval, text analysis, and data visualization. If you were going to get started with accessible data (Twitter, Google), the fundamentals of analysis (entity extraction, clustering), and some basic visualizations of graph relationships, you could do a lot worse than to start here.
  2. Factorie (Google Code) -- Apache-licensed Scala library for a probabilistic modeling technique successfully applied to [...] named entity recognition, entity resolution, relation extraction, parsing, schema matching, ontology alignment, latent-variable generative models, including latent Dirichlet allocation. The state-of-the-art big data analysis tools are increasingly open source, presumably because the value lies in their application not in their existence. This is good news for everyone with a new application.
  3. Playtomic -- analytics as a service for gaming companies to learn what players actually do in their games. There aren't many fields untouched by analytics.
  4. Write or Die -- iPad app for writers where, if you don't keep writing, it begins to delete what you wrote earlier. Good for production to deadlines; reflective editing and deep thought not included.

December 22 2011

Four short links: 22 December 2011

  1. Fuzzy String Matching in Python (Streamhacker) -- useful if you're to have a hope against the swelling dark forces powered by illiteracy and touchscreen keyboards.
  2. The Business of Illegal Data (Strata Conference) -- fascinating presentation on criminal use of big data. "The more data you produce, the happier criminals are to receive and use it. Big data is big business for organized crime, which represents 15% of GDP."
  3. Isarithmic Maps -- an alternative to choropleths for geodata visualization.
  4. Server-Side JavaScript Injection (PDF) -- a Black Hat talk about exploiting backend vulnerabilities with techniques learned from attacking JavaScript frontends. Both this paper and the accompanying talk will discuss security vulnerabilities that can arise when software developers create applications or modules for use with JavaScript-based server applications such as NoSQL database engines or Node.js web servers. In the worst-case scenario, an attacker can exploit these vulnerabilities to upload and execute arbitrary binary files on the server machine, effectively granting him full control over the server.
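
For a feel of what the fuzzy string matching in link 1 involves, here is a minimal sketch using only `difflib` from Python's standard library. It is not taken from the linked Streamhacker post, which may use a different technique; the function names `similarity` and `best_matches` and the sample job titles are made up for illustration.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1], ignoring case.

    1.0 means the lowercased strings are identical.
    """
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_matches(query, candidates, n=3, cutoff=0.6):
    """Return up to n candidates that resemble query, best match first.

    Matching is case-insensitive; the original spelling of each
    candidate is returned.
    """
    # Map lowercased forms back to the original candidate strings.
    lowered = {c.lower(): c for c in candidates}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=n, cutoff=cutoff
    )
    return [lowered[h] for h in hits]


if __name__ == "__main__":
    titles = ["Software Engineer", "Hardware Designer", "Chef"]
    print(best_matches("softwear enginer", titles))
    print(round(similarity("IBM", "ibm"), 2))
```

The `cutoff` parameter trades recall for precision: lower it to tolerate heavier typos, raise it to avoid false matches between short strings.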

December 09 2011

Four short links: 9 December 2011

  1. Critically Making the Internet of Things (Anne Galloway) -- session notes from a conference, see also part two. Good thoughts, hastily captured. For example, this from Bruce Sterling: RFID + Superglue + Object ≠ IoT and the talk I want to see: “A study of how broken, hacked and malfunctioning digital road signs subvert the physical space of roadways.”
  2. Conquering the CHAOS of Online Community at StackExchange -- StackExchange is doing some thoughtful work analysing conversations and channeling dissent into a healthy construction to guide future productive discussion. "We taught the users that it was alright to disagree, and gave them a set of arguments they could reference without every thread degenerating into a fight."
  3. Little Big Details -- one small detail done right, every day.
  4. Ranking Live Streams of Data (LinkedIn) -- behind the "interesting discussions" report.

December 06 2011

Four short links: 6 December 2011

  1. How to Dispel Your Illusions (NY Review of Books) -- Freeman Dyson writing about Daniel Kahneman's latest book. Only by understanding our cognitive illusions can we hope to transcend them.
  2. Appify-UI (GitHub) -- Create the simplest possible Mac OS X apps. Uses HTML5 for the UI. Supports scripting with anything and everything. (via Hacker News)
  3. Translation Memory (Etsy) -- using Lucene/SOLR to help automate the translation of their UI. (via Twitter)
  4. Automatically Tagging Entities with Descriptive Phrases (PDF) -- Microsoft Research paper on automated tagging. Under the hood it uses Map/Reduce and the Microsoft Dryad framework. (via Ben Lorica)

November 18 2011

Four short links: 18 November 2011

  1. Learning With Quantified Self -- this CS grad student broke Jeopardy records using an app he built himself to quantify and improve his ability to answer Jeopardy questions in different categories. This is an impressive short talk and well worth watching.
  2. Evaluating Text Extraction Algorithms -- The gold standard of both datasets was produced by human annotators. 14 different algorithms were evaluated in terms of precision, recall and F1 score. The results have shown that the best open source solution is the boilerpipe library. (via Hacker News)
  3. Parallel Flickr -- tool for backing up your Flickr account. (Compare to one day of Flickr photos printed out)
  4. Quneo Multitouch Open Source MIDI and USB Pad (Kickstarter) -- interesting to see companies using Kickstarter to seed interest in a product. This one looks a doozie: pads, sliders, rotary sensors, with LEDs underneath and open source drivers and SDK. Looks almost sophisticated enough to drive emacs :-)
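
The precision, recall and F1 figures mentioned in link 2 are straightforward to compute once you have an algorithm's output and a human-annotated gold standard. The sketch below assumes both are represented as sets of extracted tokens, which is a simplification chosen for illustration and not necessarily how the linked evaluation measured them; the function name is hypothetical.

```python
def precision_recall_f1(predicted: set, gold: set):
    """Compare an extractor's output against a gold standard.

    precision = share of predicted items that are correct
    recall    = share of gold items that were found
    F1        = harmonic mean of precision and recall
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical extractor output and annotator gold standard.
    extracted = {"the", "quick", "brown", "menu"}
    annotated = {"the", "quick", "fox"}
    print(precision_recall_f1(extracted, annotated))
```

F1 penalises an extractor that scores well on only one of the two measures, which is why it is the usual single number for ranking tools like these.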

November 15 2011

Four short links: 15 November 2011

  1. Cost-Effectiveness of Internet-Based Self-Management Compared with Usual Care in Asthma (PLoSone) -- Internet-based self-management of asthma can be as effective as current asthma care and costs are similar.
  2. Apache Lucy -- full-text search engine library written in C and targeted at dynamic languages. It is a "loose C" port of Apache Lucene™, a search engine library for Java.
  3. The Near Future of Citizen Science (Fiona Romeo) -- near future of science is all about honing the division of labour between professionals, amateurs and bots. See Bryce's bionic software riff. (via Matt Jones)
  4. Microsoft's Patent Claims Against Android (Groklaw) -- behold, citizen, the formidable might of Microsoft's patents and how they justify a royalty from every Android device equal to that which you would owe if you built a Windows Mobile device: These Microsoft patents can be divided into several basic categories: (1) the '372 and '780 patents relate to web browsers; (2) the '551 and '233 patents relate to electronic document annotation and highlighting; (3) the '522 patent relates to resources provided by operating systems; (4) the '517 and '352 patents deal with compatibility with file names once employed by old, unused, and outmoded operating systems; (5) the '536 and '853 patents relate to simulating mouse inputs using non-mouse devices; and (6) the '913 patent relates to storing input/output access factors in a shared data structure. A shabby display of patent menacing.

November 14 2011

Four short links: 14 November 2011

  1. Science Hack Day SF Videos (justin.tv) -- the demos from Science Hack Day SF. The journey of a thousand miles starts with a Hack Day.
  2. A Cross-Sectional Study of Canine Tail-Chasing and Human Responses to It, Using a Free Video-Sharing Website (PLoSone) -- Approximately one third of tail-chasing dogs showed clinical signs, including habitual (daily or "all the time") or perseverative (difficult to distract) performance of the behaviour. These signs were observed across diverse breeds. Clinical signs appeared virtually unrecognised by the video owners and commenting viewers; laughter was recorded in 55% of videos, encouragement in 43%, and the commonest viewer descriptors were that the behaviour was "funny" (46%) or "cute" (42%).
  3. RSS Died For Your Sins (Danny O'Brien) -- if you have seven thousand people following you, a good six thousand of those are going to be people you don’t particularly like. The problem, as ever, is—how do you pick out the other thousand? Especially when they keep changing? I firmly believe that one of the pressing unsolved technological problems of the modern age is getting safely away from people you don't like, without actually throttling them to death beforehand, nor somehow coming to the conclusion that they don't exist, nor ending up turning yourself into a hateful monster.
  4. Generating Text from Functional Brain Images (Frontiers in Human Neuroscience) -- We built a model of the mental semantic representation of concrete concepts from text data and learned to map aspects of such representation to patterns of activation in the corresponding brain image. Turns out that the clustering of concepts in Wikipedia is similar to how they're clustered in the brain. They found clusters in Wikipedia, mapped to the brain activity for known words, and then used that mapping to find words for new images of brain activity. (via The Economist)

October 21 2011

Four short links: 21 October 2011

  1. What Mozilla is Up To (Luke Wroblewski) -- notes from a talk that Brendan Eich gave at Web 2.0 Summit. The new browser war is between the Web and new walled gardens of native networked apps. Interesting to see the effort Mozilla's putting into native-alike Web apps.
  2. YouTube Insult Generator (Adrian Holovaty) -- mines YouTube for insults of a particular form.
  3. Ultrasound for iPhone (Geekwire) -- this personal sensor is $8000 today, but bound to drop. I want personal ultrasound at least once a month. How long until it's in the $200-500 range? (via BERG London)
  4. Web Applications Class at Stanford OpenClassroom -- a Ruby on Rails class taught by John Ousterhout, creator of Tcl/Tk and log-structured filesystems.

October 20 2011

Four short links: 20 October 2011

  1. Earth Turns 6015 -- my plan to celebrate on Saturday the amazing thing that is our universe. Scientists know humility, curiosity, and awe. All the scientists I know speak of their awe at the natural world. I'd like to see data scientists take a moment to soak in the complexity of a problem, appreciating it in all its tangled majesty, separate from attempts to unravel it.
  2. Data Jujitsu -- Luke Wroblewski took notes at DJ Patil's Web 2.0 Expo talk, and this caught my eye: Unstructured data is harder to work with. Open text fields in forms can cause issues. There are between 4 and 8 thousand variations of IBM and "Software Engineer" in LinkedIn's database.
  3. Secret iOS Business -- the dirty innards of iOS apps: phoning home, crap security, and bloated lazy design. My horror grew with every example.
  4. Culinary Reactions: Everyday Chemistry of Cooking -- Simon Quellen Field's new book on the chemistry of cooking. Simon's the man behind scitoys and his passion for understanding is a force of nature.
