
February 09 2012

Strata Week: Your personal automated data scientist

Here are a few of the data stories that caught my attention this week:

Wolfram|Alpha Pro: An on-call data scientist

The computational knowledge engine Wolfram|Alpha unveiled a pro version this week. For $4.99 per month ($2.99 for students), Wolfram|Alpha Pro offers access to more of the computational power "under the hood" of the site, in part by allowing users to upload their own datasets, which Wolfram|Alpha will in turn analyze.

This includes:

  • Text files — Wolfram|Alpha will respond with the character and word count, provide an estimate on how long it would take to read aloud, and reveal the most common word, average sentence length and more.
  • Spreadsheets — It will crunch the numbers and return a variety of statistics and graphs.
  • Image files — It will analyze the image's dimensions, size, and colors, and let you apply several different filters.
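As a rough illustration of the text-file statistics listed above, here is a minimal Python sketch. It is not Wolfram|Alpha's implementation: the function name `text_stats`, the simple regex tokenization, and the assumed speaking rate of 150 words per minute are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Assumed average speaking rate; this figure is illustrative only
# and does not come from Wolfram|Alpha.
WORDS_PER_MINUTE = 150


def text_stats(text):
    """Return simple statistics for a plain-text string (hypothetical helper)."""
    # Treat runs of letters and apostrophes as words, case-insensitively.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    most_common = Counter(words).most_common(1)[0][0] if words else None
    return {
        "characters": len(text),
        "words": len(words),
        "most_common_word": most_common,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "read_aloud_minutes": len(words) / WORDS_PER_MINUTE,
    }


stats = text_stats("The cat sat. The dog ran fast.")
print(stats)
```

A production tool would need more careful tokenization (abbreviations, numbers, non-English text), which this sketch deliberately ignores.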

Wolfram|Alpha Pro subscribers can upload and analyze their own datasets.

There's also a new extended keyboard that contains the Greek alphabet and other special characters for manually entering data. Data and analysis from these entries and any queries can also be downloaded.

"In a sense," writes Wolfram's founder Stephen Wolfram, "the concept is to imagine what a good data scientist would do if confronted with your data, then just immediately and automatically do that — and show you the results."

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Crisis-mapping and data protection standards

Ushahidi's Patrick Meier takes a look at the recently released Data Protection Manual issued by the International Organization for Migration (IOM). According to the IOM, the manual is meant to serve as a guide to help:

" ... protect the personal data of the migrants in its care. It follows concerns about the general increase in data theft and loss and the recognition that hackers are finding ever more sophisticated ways of breaking into personal files. The IOM Data Protection Manual aims to protect the integrity and confidentiality of personal data and to prevent inappropriate disclosure."

Meier describes the manual as "required reading" but notes that there is no mention of social media in the 150-page document. "This is perfectly understandable given IOM's work," he writes, "but there is no denying that disaster-affected communities are becoming more digitally-enabled — and thus, increasingly the source of important, user-generated information."

Meier moves through the Data Protection Manual's principles, highlighting the ones that may be challenged when it comes to user-generated, crowdsourced data and raising important questions about consent, privacy, and security.

Doubting the dating industry's algorithms

Many online dating websites claim that their algorithms are able to help match singles with their perfect mate. But a forthcoming article in "Psychological Science in the Public Interest," a journal of the Association for Psychological Science, casts some doubt on the data science of dating.

According to the article's lead author Eli Finkel, associate professor of social psychology at Northwestern University, "there is no compelling evidence that any online dating matching algorithm actually works." Finkel argues that dating sites' algorithms do not "adhere to the standards of science," and adds that "it is unlikely that their algorithms can work, even in principle, given the limitations of the sorts of matching procedures that these sites use."

It's "relationship science" versus the intake questions that most dating sites ask in order to help users create their profiles and suggest matches. Finkel and his coauthors note that some of the strongest predictors of good relationships, such as how couples interact under pressure, aren't assessed by dating sites.

The paper calls for the creation of a panel to grade the scientific credibility of each online dating site.

Got data news?

Feel free to email me.


June 08 2011

Dating with data

OkCupid is a free dating site with seven million users. The site's blog, OkTrends, mines data from those users to tackle important subjects like "The case for an older woman" and "The REAL 'stuff white people like'."

Beyond clever headlines, OkCupid also uses an unusual pedigree to separate itself from the dating site pack: The business was founded by four Harvard-educated mathematicians.

"It probably scared people when they first heard that four math majors were starting a dating site," said CEO Sam Yagan during a recent interview. But the founders' backgrounds greatly influenced how they approached the problem of dating.

"A lot of other dating sites are based on psychology," Yagan said. "The fundamental premise of a site like eHarmony is that they know the answer. Our approach to dating isn't that there's some psychological theory that will be the answer to all your problems. We think that dating is a problem to be solved using data and analytics. There is no magic formula that can help everyone to find love. Instead, we bring value by building a decent-sized platform that allows people to provide information that helps us to customize a match algorithm to each person's needs."

OkCupid works by having users state basic preferences and answer questions like "Is it wrong to spank a child who's been bad?" Users are matched based on the overlap of their answers and how important each question is to both users.
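To make the idea of importance-weighted answer overlap concrete, here is a small Python sketch. It is not OkCupid's actual algorithm, which the article does not describe in detail. The function name `match_score`, the data layout, and the choice to weight each question by the sum of both users' importance ratings are all assumptions made for illustration.

```python
def match_score(user_a, user_b):
    """Hypothetical weighted-overlap score between 0.0 and 1.0.

    Each user is a dict mapping a question id to a tuple of
    (answer, importance), where importance is a non-negative number.
    """
    # Only questions that both users answered can be compared.
    shared = user_a.keys() & user_b.keys()
    total_weight = 0.0
    agreed_weight = 0.0
    for question in shared:
        answer_a, importance_a = user_a[question]
        answer_b, importance_b = user_b[question]
        # Assumption: a question counts for more when either user cares about it.
        weight = importance_a + importance_b
        total_weight += weight
        if answer_a == answer_b:
            agreed_weight += weight
    return agreed_weight / total_weight if total_weight else 0.0


alice = {"q1": ("yes", 3), "q2": ("no", 1)}
bob = {"q1": ("yes", 1), "q2": ("yes", 3), "q3": ("no", 2)}
print(match_score(alice, bob))  # 0.5
```

In this example the two users agree on one shared question and disagree on another of equal combined weight, so the score is 0.5. Question "q3" is ignored because only one user answered it.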

Yagan said data was built into the business model from the beginning. "We knew from the time we started the company that the data we were generating would have three purposes: helping us match people up, attracting advertisers since that was the core of our revenue model, and that the data would also be interesting socially."

In 2007, the company hired a PR firm to publicize some of its findings, such as the fact that when gas prices rise, users narrow the search radius for matches. "We called dozens of reporters and nobody cared," Yagan said. So OkCupid fired the PR firm and started publishing its findings on the OkTrends blog. The blog has thus far doubled traffic to the site.

"The blog is partly an advice column, but instead of being written by a psychologist, the data writes itself," Yagan said. "For example, we don't tell you that you should or should not use a flash for your profile photo. We just tell you that if you use a flash you'll look seven years older."


OSCON Data 2011, being held July 25-27 in Portland, Ore., is a gathering for developers who are hands-on, doing the systems work and evolving architectures and tools to manage data. (This event is co-located with OSCON.)

Save 20% on registration with the code OS11RAD


I asked Yagan about the data on which OkTrends draws. "We have people's registration data," he said. "Then we have stated preferences; the answers that people give to the questions we ask them. We use that kind of data occasionally, but it's not the core difference that we have. The core difference is in the category of revealed preferences. Imagine if you had a video camera in every bar and you could observe every interaction between two people and see the success rate of that interaction. We essentially have that video camera on our site."

The reason revealed preferences are so important is that they track real-world behavior — what people really want rather than what they say they want. "When you get 12 messages and you only reply to three of them, you are voting with your time," Yagan said. "Or when a guy is shorter than you, you don't reply."

Mobile adds a new revealed preferences dimension for OkCupid. "As our product gets more mobile and location-aware, we are more likely to be on that date with them," Yagan said. "Then we can model the kinds of conversations on the site that lead to an in-person meeting." OkCupid can currently track the five million messages sent every week on the site as well as other revealed preferences, like ratings of profiles.

According to Yagan, OkCupid doesn't use sophisticated data mining or analytics tools: "Most of it can be done by querying the database and crunching numbers in Excel. The fact that we have four math majors and a full-time statistician means that we take that number crunching very seriously."





May 03 2011

Tech Weekly podcast: Looking for art, love ... and Bin Laden

Aleks Krotoski, Charles Arthur and Jemima Kiss are joined in the Tech Weekly studio this week by former Guardian Technology editor and artist Vic Keegan and Artfinder founder Chris Thorpe to discover what the web can do to help art lovers find inspiration.

Aleks speaks with Sam Yagan, chief executive and co-founder of the biggest free online dating site in the US, OkCupid, to learn a little about finding love online.

Plus, Charles breaks down the political implications of the live feed from Pakistan to the White House in Washington DC during this week's US military operation to kill al Qaeda's Osama Bin Laden, and fills the team in on the other technology headlines that have been making waves around the world.


