February 15 2012

What the data can tell us about dating and other social congregation

Valentine's Day turned out to be a good time to discuss data crunching of online dating. Kevin Lewis, a PhD candidate in sociology and Berkman Center Fellow, drew an overflow room today for his talk, "Mate Choice in an Online Dating Site." It's yet another example of how, as people go online, they leave a trail of data that could never be captured before.

Here are some examples of how traditional researchers are restricted:

  • They can get marriage data, but have much less data about dating, cohabitation without marriage, and other non-traditional arrangements that are increasingly common. Dating sites let us in at a much earlier stage in a relationship that may or may not lead to marriage.

  • They can measure certain recorded demographics such as age and race, but miss a huge range of criteria by which people evaluate potential mates. People enter lots of interesting facts about themselves and their hoped-for mates on dating sites.

  • Because researchers miss the initial contacts, they have trouble tracing back from a result (marriage) to the criteria used by the dating couples.

As an example of the last problem, Lewis mentioned the observation that people usually date and marry others with similar levels of formal education. Researchers have long hypothesized that men don't care much about women's educational levels and would be willing to date and marry outside their own. It's the women who care, and since they rule out men with much higher or lower educational levels, we end up with the current results.

Now Lewis can cite concrete data supporting that hypothesis. On a dating site, men initiate and respond to contacts with women of many different educational levels. But the women don't initiate many contacts outside their own level, and don't respond to contacts from men outside that level.

How did Lewis conduct his research? Briefly, he persuaded OkCupid to give him a large data set stripped of free-text fields, but containing information on race, religion, and several other criteria. He chose data in the New York City area for heterosexual couples. Considering that 22% of heterosexual adults have found their current partners through online sites (the figure is even higher for same-sex couples: 61%), this is a lot of valuable data.

Of course, there are risks in extrapolating from this data set. OkCupid users tend to be younger and more Internet-savvy than the overall dating population. It's also hard to tell whether some criterion is truly a determining factor or a consequence of some other factor (for instance, educational level is correlated with age). Still, Lewis controlled for many variables and feels there is a lot of statistical validity to his findings.

As just one other example, he documented a lot of contacts across racial lines, more than one might expect. But there were definite patterns. For instance, black women received far fewer contacts from other races than most groups did. In this way, the data on dating gives us a look at our values and choices in other forms of social interaction, not just romance.

December 15 2011

Where is the OkCupid for elections?

To date, we've generally been more adept at collecting and storing data than making sense of it. The companies, individuals and governments that become the most adept at data analysis are doing more than finding the signal in the noise: They are creating a strategic capability. Sometimes, the data comes from unexpected directions. For instance, OkCupid's approach to dating with data has earned it millions of users. In the process, OkCupid has gained great insight into the dynamics of dating in the 21st century, which it then shared on its blog.

Based upon that success, I wondered aloud at this year's Newsfoo whether a similar data-driven web app could be built to help citizens match themselves up with candidates.

After Tim tweeted the observation, I quickly learned two things:

  1. Albert Sun, Daniel Bachhuber, Ashwin Shandilya and Jay Zalowitz had built exactly that app at the 2011 Times Open Hack Day on the day I posed the question. OkCandidate is a web app that matches up a citizen with a Republican presidential candidate. (There's no comparable matching engine for Barack Obama, perhaps given that Democrats expect that the current incumbent of the White House will be the Democratic Party's nominee in 2012.) OkCandidate presents a straightforward series of questions about a wide range of core foreign and domestic issues with ratings to allow the user to rank the importance of agreeing with a given candidate. The app is open source, so if you want to try to improve the code, click on over to OkCandidate on GitHub.
  2. ElectNext, a Philadelphia-based startup, has focused on solving this problem. The "eHarmony for voters," as TechCrunch describes it, aims to match you to your candidate. I also learned that ElectNext won the Judges' Choice Award at the 2011 Web 2.0 Expo/NY Startup Showcase. In the video below, Joanne Wilson and Mo Koyfman discuss the startup from a venture capitalist's perspective.

The politics of big data

Creating a better issue-matching engine for voters and candidates is a genuinely useful civic function. The not-so-hidden opportunity here, however, may be to gather a rich dataset from those choices in precisely the same way that OkCupid has done for dating. That's clearly part of the mindset here: "The data on individual users we don't share with anyone," ElectNext founder Keya Danenbaum told Fast Company. "But the way we foresee using all this information we're collecting is ... eventually to aggregate that and say something really interesting in a poll type of report."

How news organizations and campaigns alike collect, store and analyze data is going to matter much more. Close watchers of the intersection of politics and technology already think the Obama campaign's data crunching may help the president win re-election. As Personal Democracy Media co-founder Micah Sifry put it back in April, "it's the data, stupid."

Big data is "powering the race for the White House," wrote Patrick Ruffini, president of Engage, an interactive agency in D.C.:

The hottest job in today's Presidential campaigns is the Data Mining Scientist — whose job it is to sort through terabytes of data and billions of behaviors tracked in voter files, consumer databases, and site logs. They'll use the numbers to uncover hidden patterns that predict how you'll vote, if you'll pony up with a donation, and if you'll influence your friends to support a candidate.

Alistair Croll, the co-chair of the Strata Conference, thinks it's a strategic capability. "After Eisenhower, you couldn't win an election without radio," he told me at Strata, Calif., in February. "After JFK, you couldn't win an election without television. After Obama, you couldn't win an election without social networking. I predict that in 2012, you won't be able to win an election without big data."


June 08 2011

Dating with data

OkCupid is a free dating site with seven million users. The site's blog, OkTrends, mines data from those users to tackle important subjects like "The case for an older woman" and "The REAL 'stuff white people like'."

Beyond clever headlines, OkCupid also uses an unusual pedigree to separate itself from the dating site pack: The business was founded by four Harvard-educated mathematicians.

"It probably scared people when they first heard that four math majors were starting a dating site," said CEO Sam Yagan during a recent interview. But the founders' backgrounds greatly influenced how they approached the problem of dating.

"A lot of other dating sites are based on psychology," Yagan said. "The fundamental premise of a site like eHarmony is that they know the answer. Our approach to dating isn't that there's some psychological theory that will be the answer to all your problems. We think that dating is a problem to be solved using data and analytics. There is no magic formula that can help everyone to find love. Instead, we bring value by building a decent-sized platform that allows people to provide information that helps us to customize a match algorithm to each person's needs."

OkCupid works by having users state basic preferences and answer questions like "Is it wrong to spank a child who's been bad?" Users are matched based on the overlap of their answers and how important each question is to both users.
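To make the idea of importance-weighted overlap concrete, here is a minimal sketch in Python. It is not OkCupid's actual formula, which the interview does not spell out. The importance weights, the sets of acceptable answers, the sample users, and the choice to combine the two one-sided scores with a geometric mean are all assumptions made for illustration.

```python
from math import sqrt


def satisfaction(accepts, weights, other_answers):
    """Share of one user's importance-weighted points earned by the other's answers."""
    possible = 0
    earned = 0
    for question, weight in weights.items():
        # Only score questions that both users have addressed.
        if question not in other_answers or question not in accepts:
            continue
        possible += weight
        if other_answers[question] in accepts[question]:
            earned += weight
    return earned / possible if possible else 0.0


def match_score(a, b):
    """Combine both directions so a match must satisfy both users (assumed: geometric mean)."""
    a_likes_b = satisfaction(a["accepts"], a["weights"], b["answers"])
    b_likes_a = satisfaction(b["accepts"], b["weights"], a["answers"])
    return sqrt(a_likes_b * b_likes_a)


# Hypothetical users: the question ids, answers, and weights are made up.
alice = {
    "answers": {"spank": "no", "smoke": "no"},
    "accepts": {"spank": {"no"}, "smoke": {"no", "sometimes"}},
    "weights": {"spank": 10, "smoke": 1},  # the spanking question matters a lot to her
}
bob = {
    "answers": {"spank": "no", "smoke": "yes"},
    "accepts": {"spank": {"no", "yes"}, "smoke": {"yes", "no"}},
    "weights": {"spank": 1, "smoke": 1},
}

print(f"{match_score(alice, bob):.2f}")  # prints 0.95
```

In this example Bob misses only on a question Alice marked as low importance, so the score stays high. If he had given an unacceptable answer on the question she weighted at 10, her side of the score would drop to 1/11 and the match score would fall to about 0.30.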

Yagan said data was built into the business model from the beginning. "We knew from the time we started the company that the data we were generating would have three purposes: helping us match people up, attracting advertisers since that was the core of our revenue model, and that the data would also be interesting socially."

In 2007, the company hired a PR firm to publicize some of its findings, such as the fact that when gas prices rise, users narrow the search radius for matches. "We called dozens of reporters and nobody cared," Yagan said. So OkCupid fired the PR firm and started publishing its findings on the OkTrends blog. The blog has thus far doubled traffic to the site.

"The blog is partly an advice column, but instead of being written by a psychologist, the data writes itself," Yagan said. "For example, we don't tell you that you should or should not use a flash for your profile photo. We just tell you that if you use a flash you'll look seven years older."

I asked Yagan about the data on which OkTrends draws. "We have people's registration data," he said. "Then we have stated preferences; the answers that people give to the questions we ask them. We use that kind of data occasionally, but it's not the core difference that we have. The core difference is in the category of revealed preferences. Imagine if you had a video camera in every bar and you could observe every interaction between two people and see the success rate of that interaction. We essentially have that video camera on our site."

The reason revealed preferences are so important is that they track real-world behavior — what people really want rather than what they say they want. "When you get 12 messages and you only reply to three of them, you are voting with your time," Yagan said. "Or when a guy is shorter than you, you don't reply."
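As a rough illustration of measuring a revealed preference, the sketch below computes reply rates from a message log. The record layout, the field names `sender_taller` and `replied`, and the inbox data are hypothetical; only the 12-messages, three-replies figure comes from Yagan's example.

```python
from collections import defaultdict


def reply_rates(messages, key):
    """Return the fraction of received messages that got a reply, per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [received, replied]
    for message in messages:
        group = key(message)
        counts[group][0] += 1
        counts[group][1] += int(message["replied"])
    return {group: replied / received
            for group, (received, replied) in counts.items()}


# Made-up inbox: 12 messages, 3 replies, none to senders shorter than the recipient.
inbox = (
    [{"sender_taller": True, "replied": i < 3} for i in range(8)]
    + [{"sender_taller": False, "replied": False} for _ in range(4)]
)

overall = reply_rates(inbox, key=lambda m: "all")
by_height = reply_rates(inbox, key=lambda m: m["sender_taller"])

print(overall)    # {'all': 0.25}
print(by_height)  # {True: 0.375, False: 0.0}
```

Grouping the same log by a different attribute only requires a different `key` function, which fits Yagan's description of this kind of analysis as mostly querying and number crunching.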

Mobile adds a new revealed preferences dimension for OkCupid. "As our product gets more mobile and location-aware, we are more likely to be on that date with them," Yagan said. "Then we can model the kinds of conversations on the site that lead to an in-person meeting." OkCupid can currently track the five million messages sent every week on the site as well as other revealed preferences, like ratings of profiles.

According to Yagan, OkCupid doesn't use sophisticated data mining or analytics tools: "Most of it can be done by querying the database and crunching numbers in Excel. The fact that we have four math majors and a full-time statistician means that we take that number crunching very seriously."

