
October 22 2013

Mining the social web, again

When we first published Mining the Social Web, I thought it was one of the most important books I worked on that year. Now that we’re publishing a second edition (which I didn’t work on), I find that I agree with myself. With this new edition, Mining the Social Web is more important than ever.

While we’re seeing more and more cynicism about the value of data, and particularly “big data,” that cynicism isn’t shared by most people who actually work with data. Data has undoubtedly been overhyped and oversold, but the best way to arm yourself against the hype machine is to start working with data yourself, to find out what you can and can’t learn. And there’s no shortage of data around. Everything we do leaves a cloud of data behind it: Twitter, Facebook, Google+ — to say nothing of the thousands of other social sites out there, such as Pinterest, Yelp, Foursquare, you name it. Google is doing a great job of mining your data for value. Why shouldn’t you?

There are few better ways to learn about mining social data than by starting with Twitter; Twitter is really a ready-made laboratory for the new data scientist. And this book is without a doubt the best and most thorough approach to mining Twitter data out there. But that’s only a starting point. We hear a lot in the press about sentiment analysis and mining unstructured text data; this book shows you how to do it. If you need to mine the data in web pages or email archives, this book shows you how. And if you want to understand how people collaborate on projects, Mining the Social Web is the only place I’ve seen that analyzes GitHub data.
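As a small taste of the kind of unstructured-text mining described above, here is a minimal Python sketch (not taken from the book) that counts hashtags across a handful of tweets. It assumes the tweet texts have already been fetched as plain strings; the sample tweets and the `top_hashtags` helper are hypothetical, and a real pipeline would pull tweets from the Twitter API and use its structured entity data rather than a regular expression.

```python
import re
from collections import Counter

# Simple pattern for hashtags: '#' followed by word characters.
# (An assumption for illustration; Twitter's own entity parsing is more thorough.)
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Count hashtags case-insensitively across tweet texts and return the n most common."""
    counts = Counter(
        tag.lower() for text in tweets for tag in HASHTAG_RE.findall(text)
    )
    return counts.most_common(n)


# Hypothetical sample tweets standing in for data fetched from the API.
sample = [
    "Reading about #BigData and #Python today",
    "#python notebooks make #bigdata approachable",
    "Nothing to see here",
    "Strata talk on #DataScience and #BigData",
]

print(top_hashtags(sample))  # [('bigdata', 3), ('python', 2), ('datascience', 1)]
```

Lowercasing before counting folds variants like `#BigData` and `#bigdata` together, which is usually what you want for a frequency summary.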

All of the examples in the book are available on GitHub. In addition to the example code, which is bundled into IPython notebooks, the author, Matthew Russell, has provided a VirtualBox VM that installs Python, all the libraries you need to run the examples, the examples themselves, and an IPython server. Checking out the examples is as simple as installing VirtualBox, installing Vagrant, cloning the second edition’s GitHub repository, and typing “vagrant up.” (This quick start guide summarizes all of that.) You can run the examples in the virtual machine, modify them, and use the virtual machine for your own projects, since it’s a fully functional Linux system with Python, Java, MongoDB, and other necessities pre-installed. You can view this as a book with accompanying examples in a particularly nice package, or you can view the book as “premium support” for an open source project that consists of the examples and the VM.

If you want to engage with the data that’s surrounding you, Mining the Social Web is the best place to start. Use it to learn, to experiment, and to build your own data projects.

September 18 2012

Data is the real business model for social

As social media websites gather ever-growing data stores, they might be better served by finding ways to make profitable use of that data instead of serving ads as their chief means of raising revenue. While the data might give them the information they need to serve more targeted ads (although in my experience they still have a ways to go with that), the real value in the site could be the data itself.

Of course, if social sites start selling data to the highest bidder, that raises questions about data ownership and privacy, and about how to strip out personal identifiers.

Marie Wallace (@marie_wallace) is social analytics strategist for the IBM Collaboration Solutions division. She has spent more than a decade at IBM working on content analytics, and her experience uniquely positions her to address questions regarding big data, social media and analytics. Our interview follows.

The real value of social media might lie not in selling ads but in the data these sites collect. Why do you think that is?

Marie Wallace: The reason ad targeting has worked so well for search is that it’s aligned with, and supportive of, that particular activity; when I am searching for information about products or services, I am happy to get ads that may help direct my search. Ads are somewhat analogous to a value-added service, and social search makes the ads more personalized and relevant, which is why Google has invested so heavily in Google+.

The key is that in most cases ads only work in a search-like context. On most social media sites, however, people are not there to search. They are going to converse with friends and family, which makes ads interruptive and frequently invasive. This is further exacerbated by mobile, where limited screen real estate makes ads even more offensive because they distract and clutter the screen. Social search is one example of a service that sits on top of social data, but there is a whole plethora of other services that social data can drive, from market research to consumer/brand engagement, social recommenders, information filtering, and expertise location.

It’s one thing to recognize the value of data, but how do you extract that value?

Marie Wallace: Extracting value from data requires a well-described set of scenarios with a clear understanding of which facts would be considered valuable for those scenarios. For example, when looking for a job there is a very specific set of questions that people want asked and answered: employee sentiment, corporate success (revenue, customers, products, growth), location, demographics, technologies, industries, skills, competitors, values, and culture.

These are very different to the questions (and hence analysis) that might be pertinent to a different scenario. For example, when deciding where to go on holiday, people are likely more interested in the location, activities, accommodation, weather, cost, demographics, or visitor sentiment. The key here is that analysis has to be not only domain-specific but also scenario-specific, which is why targeted specialist services like LinkedIn or TripAdvisor are always going to be able to deliver greater analytics value for the specific scenarios they support.

There are concerns on social networks about the sites abusing the data users are contributing. Is there a reliable way to anonymize data and deliver it in aggregate form that strips out individual user information?

Marie Wallace: I think privacy is a more complex problem, and while anonymizing user information is part of the solution, I don’t believe it’s at the heart of the problem. I believe the key social media challenges moving forward will be those of permission, trust, and transparency. People need to know exactly how their data is being used so that they can give permission for that use, and that use only. For example, if I have a Tesco loyalty card and I trust Tesco to respect my data, then I might be happy for them to see my Facebook Likes so they can provide me with more relevant special offers. Or if I register on LinkedIn, I know that my data is going to be provided to recruiters and hiring companies, but I most definitely don’t want it used for any other undisclosed purpose.

There is also a likelihood that in the future we will see information brokers emerge that act as mediators on our behalf, providing a level of indirection (perhaps even obfuscation or anonymization). This simplifies the authorization model, but it does assume that we trust the information brokers and the models they use for controlling access to our information.
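The question above asks whether data can be delivered in aggregate form with individual user information stripped out. One common building block is to drop identifiers, aggregate, and suppress groups too small to hide an individual. The minimal Python sketch below illustrates that idea under stated assumptions: the record fields (`user_id`, `region`, `liked_brand`), the `aggregate_likes` helper, and the threshold of 5 are all hypothetical choices for illustration, not anything Wallace or IBM describes.

```python
from collections import Counter

# Minimum group size below which a count is withheld.
# 5 is an illustrative assumption, not a standard.
MIN_GROUP_SIZE = 5


def aggregate_likes(records, min_group_size=MIN_GROUP_SIZE):
    """Count distinct users per (region, liked_brand) pair, drop the user IDs,
    and suppress any group smaller than min_group_size."""
    # Deduplicate so each user counts at most once per group.
    unique = {(r["user_id"], r["region"], r["liked_brand"]) for r in records}
    # Aggregate without the identifier: only the (region, brand) key survives.
    counts = Counter((region, brand) for _, region, brand in unique)
    # Withhold small groups, since a count of 1 or 2 can point straight at a person.
    return {key: n for key, n in counts.items() if n >= min_group_size}


# Hypothetical records: six users in Dublin like BrandA, one user in Galway likes BrandB.
records = [{"user_id": i, "region": "Dublin", "liked_brand": "BrandA"} for i in range(6)]
records.append({"user_id": 99, "region": "Galway", "liked_brand": "BrandB"})
records.append({"user_id": 0, "region": "Dublin", "liked_brand": "BrandA"})  # repeated like

print(aggregate_likes(records))  # {('Dublin', 'BrandA'): 6}
```

Small-group suppression lowers the risk of singling someone out, but it does not make re-identification impossible on its own: comparing overlapping releases, or joining the output with outside data, can still leak information. That limitation fits Wallace's point that anonymization is only part of the answer.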

Have the tools caught up with the amount and variety of data so that services like social networks can begin to manipulate the data they collect?

Marie Wallace: Having spent the last decade working on content analytics and semantic technologies, I can confidently say that many of the required tools have been around for years waiting for demand to catch up with supply. The advent of social media, alongside the growth of a new generation of big data platforms, now gives them the perfect business problem, dataset, and execution platform through which to shine. However, I believe the industry does have one significant gap in this otherwise rich landscape of technologies, and it’s a gap that I believe will impact the value that we can derive from these social networks.

It’s our handling of massive-scale networks that I believe is going to become a technological challenge as we move rapidly toward massive-scale graphs with social, semantic, temporal, and geospatial characteristics, and as we look to apply complex analytics across these networks. There are a number of existing technologies from the linked data world that could morph to fill this gap; alternatively, a new generation of graph databases and analytics algorithms focused on tackling this specialized problem is emerging. Only time will tell which technologies will emerge as the winners.

What kinds of uses could you envision social sites finding for their data?

Marie Wallace: For the medium-term, I suspect that we will continue to see social analysis being driven by the marketing, sales, and support organizations. Social data will be used for market research, to help expand sales channels, and to improve how brands interact with customers.

As we move from marketing to sales to support, the type of analysis becomes more complex, and this will put pressure on the algorithms being used to evaluate the data and derive insights: identity and entity disambiguation, micro-segmentation, influence analysis, sentiment, intent, network information flow, and community dynamics. A growing number of social applications will emerge, each delivering niche value to consumers and generating specialist data for brands. This ecosystem of social networks will drive consumer-brand engagement, everything from consumer feedback systems and customer support to product and service innovation. Brands will move away from a focus on passive listening/monitoring to one of active engagement, and this will require a broader range of analytics in order to optimize and operationalize those interactions.

Further out, I see us expanding the personalization that can be realized. Social data will become increasingly important for personalizing every search and navigation experience, from Google and Amazon to Netflix and Expedia. However, search is only the tip of the iceberg. I anticipate that in the longer term social data will be used to personalize a whole range of experiences that cross the physical/digital divide, transforming how we shop, what we think, how we learn, and ultimately how we live.

Just imagine what will happen when we intersect the social web and the semantic web with the web of data. Then we will really see personalization take on a whole new form!

This interview was edited and condensed. This post was originally published on

