March 22 2012

Strata Week: Machine learning vs domain expertise

Here are a few of the data stories that caught my attention this week:

Debating the future of subject area expertise

The "Data Science Debate" panel at Strata California 2012. Watch the debate.

The Oxford-style debate at Strata continues to be one of the most-talked-about events from the conference. This week, it's O'Reilly's Mike Loukides who weighs in with his thoughts on the debate, which had the motion "In data science, domain expertise is more important than machine learning skill." (For those that weren't there, the machine learning side "won." See Mike Driscoll's summary and full video from the debate.)

Loukides moves from the unreasonable effectiveness of data to examine the "unreasonable necessity of subject experts." He writes that:

"Whether you hire subject experts, grow your own, or outsource the problem through the application, data only becomes 'unreasonably effective' through the conversation that takes place after the numbers have been crunched ... We can only take our inexplicable results at face value if we're just going to use them and put them away. Nobody uses data that way. To push through to the next, even more interesting result, we need to understand what our results mean; our second- and third-order results will only be useful when we understand the foundations on which they're based. And that's the real value of a subject matter expert: not just asking the right questions, but understanding the results and finding the story that the data wants to tell. Results are good, but we can't forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data. And those stories are going to be ever more essential as we use data to build increasingly complex systems."

Microsoft hires former Yahoo chief scientist

Microsoft has hired Raghu Ramakrishnan as a technical fellow for its Server and Tools Business (STB), reports ZDNet's Mary Jo Foley. According to his new company bio, Ramakrishnan's work will involve "big data and integration between STB's cloud offerings and the Online Services Division's platform assets."

Ramakrishnan comes to Microsoft from Yahoo, where he's been the chief scientist for three divisions — Audience, Cloud Platforms and Search. As Foley notes, Ramakrishnan's move is another indication that Microsoft is serious about "playing up its big data assets." Strata chair Edd Dumbill examined Microsoft's big data strategy earlier this year, noting in particular its work on a Hadoop distribution for Windows server and Azure.

Analyzing the value of social media data

How much is your data worth? The Atlantic's Alexis Madrigal does a little napkin math based on figures from the Internet Advertising Bureau to come up with a broad and ambiguous range between half a cent and $1,200 — depending on how you decide to make the calculation, of course.

In an effort to make those measurements easier and more useful, Google unveiled some additional reports as part of its Analytics product this week. It's a move Google says will help marketers:

"... identify the full value of traffic coming from social sites and measure how they lead to direct conversions or assist in future conversions; understand social activities happening both on and off of your site to help you optimize user engagement and increase social key performance indicators (KPIs); and make better, more efficient data-driven decisions in your social media marketing programs."

Engagement and conversion metrics for each social network will now be trackable through Google Analytics. Partners for this new Social Data Hub include Disqus, Echo, Reddit, Diigo, and Digg, among others.

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20

Got data news?

Feel free to email me.


November 09 2011

Social network analysis isn't just for social networks

Social networking has become a pervasive part of our everyday online experience, and by extension, that means the analysis and application of social data is an essential component of business.

In the following interview, "Social Network Analysis for Startups" co-author Maksim Tsvetovat (@maksim2042) offers a primer on social network analysis (SNA) and how it has relevance beyond social-networking services.

What is social network analysis (SNA)?

Maksim Tsvetovat: Social network analysis is an offshoot of the social sciences — sociology, political science, psychology, anthropology and others — that studies human interactions by using graph-theoretic approaches rather than traditional statistics. It's a scientific methodology for data analysis and also a collection of theories about how and why people interact — and how these interaction patterns change and affect our lives as individuals or societies. The theories come from a variety of social sciences, but they are always backed up with mathematical ways of measuring if a specific theory is applicable to a specific set of data.

In the science world, the field is considered interdisciplinary, so gatherings draw mathematicians, physicists, computer scientists, sociologists, political scientists and even an occasional rock musician.

As far as the technology aspect goes, the analysis methods are embodied in a set of software tools, such as the Python-based NetworkX library, which the book uses extensively. These tools can be used for analyzing and visualizing network data in a variety of contexts, from visualizing the spread of disease to business intelligence applications.
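As a rough illustration of what a graph-theoretic measure looks like, the sketch below computes degree centrality (the share of other nodes each node is directly tied to) in plain Python. It is a minimal sketch and not code from the book: the edge list and the names in it are made up, and the standard library is used so the example runs without dependencies. NetworkX offers a comparable `degree_centrality` function.

```python
from collections import defaultdict

# Hypothetical toy data: each pair is an undirected tie between two people.
edges = [
    ("ana", "ben"),
    ("ana", "cat"),
    ("ana", "dev"),
    ("ben", "cat"),
    ("dev", "eli"),
]


def degree_centrality(edge_list):
    """Return each node's share of the other nodes it is directly tied to."""
    neighbors = defaultdict(set)
    for a, b in edge_list:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Normalize by the largest possible number of ties, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


centrality = degree_centrality(edges)
most_connected = max(centrality, key=centrality.get)
print(most_connected, centrality[most_connected])
```

In this toy network "ana" is tied to three of the four other people, so she has the highest score.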

In terms of marketing applications, there's plenty of science behind "why things go viral" — and the book goes briefly into it — but I find that it's best to leave marketing to marketing professionals.

Does SNA refer specifically to the major social-networking services, or does it also apply beyond them?

Maksim Tsvetovat: SNA refers to the study of relationships between people, companies, organizations, websites, etc. If we have a set of relationships that may be forming a meaningful pattern, we can use SNA methods to make sense of it.

Major social-networking services are a great source of data for SNA, and they present some very interesting questions — most recently, how can a social network act as an early warning system for natural disasters? I'm also intrigued by the emergent role of Twitter as a "common carrier" and aggregation technology for data from other media. However, the analysis methodology is applicable to many other data sources. In fact, I purposefully avoided using Twitter as a data source in the book — it's the obvious place to start and also a good place to get tunnel vision about the technology.

Instead, I concentrated on getting and analyzing data from other sources, including campaign finance, startup company funding rounds, international treaties, etc., to demonstrate the potential breadth of applications of this technology.

Social Network Analysis for Startups — Social network analysis (SNA) is a discipline that predates Facebook and Twitter by 30 years. Through expert SNA researchers, you'll learn concepts and techniques for recognizing patterns in social media, political groups, companies, cultural trends, and interpersonal networks.

Today Only Get "Social Network Analysis for Startups" for $9.99 (save 50%).

How does SNA relate to startups?

Maksim Tsvetovat: A lot of startups these days talk about social-this and social-that — and all of their activity can be measured and understood using SNA metrics. Being able to integrate SNA into their internal business intelligence toolkits should make businesses more attuned to their audiences.

I have personally worked with three startups that used SNA to fine-tune their social media targeting strategies by locating individuals and communities, and addressing them directly. Also, my methodologies have been used by a few large firms: the digital marketing agency DIGITAS is using SNA daily for a variety of high-profile clients. (Disclosure: my startup firm, DeepMile Networks, is involved in supplying SNA tools and services to DIGITAS and a number of others.)

What SNA shifts should developers watch for in the near future?

Maksim Tsvetovat: Multi-mode network analysis, which is analyzing networks with many types of "actors" (people, organizations, resources, governments, etc.). I approach the topic briefly in the book — but much remains to be done.
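A small example may help make the multi-mode idea concrete. The sketch below takes a hypothetical two-mode network of people and organizations and "projects" it onto people, linking two people once for every organization they share. The data and the function name are illustrative assumptions, not material from the book, and real multi-mode analysis involves far more than this one step.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: (person, organization) affiliation pairs.
affiliations = [
    ("ana", "acme"),
    ("ben", "acme"),
    ("ben", "globex"),
    ("cat", "globex"),
    ("dev", "initech"),
]


def project_onto_people(pairs):
    """Link two people once for every organization they share."""
    members = defaultdict(set)
    for person, org in pairs:
        members[org].add(person)
    weights = defaultdict(int)
    for people in members.values():
        # Every pair of co-members gains one unit of tie strength.
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)


person_ties = project_onto_people(affiliations)
print(person_ties)
```

The result is an ordinary one-mode network, which can then be analyzed with the usual single-mode metrics.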

Also, watch for more real-time analysis. Most SNA is done on snapshot-style data that is, at best, a few hours out-of-date — some is years out-of-date. The release of Twitter's Storm tool should spur developers to make more SNA tools work on real-time and flowing data.

This interview was edited and condensed.

Associated photo on home and category pages: bulletin board [before there was twitter] by woodleywonderworks, on Flickr.



July 19 2011

Google+ is the social backbone

The launch of Google+ is the beginning of a fundamental change on the web. A change that will tear down silos, empower users and create opportunities to take software and collaboration to new levels.

Social features will become pervasive, and fundamental to our interaction with networked services. Collaboration from within applications will be as natural to us as searching for answers on the web is today.

It's not just about Google vs Facebook

Much attention has focused on Google+ as a Facebook competitor, but to view the system solely within that context is short-sighted. The consequences of the launch of Google+ are wider-reaching, more exciting and undoubtedly more controversial.

Google+ is the rapidly growing seed of a web-wide social backbone, and the catalyst for the ultimate uniting of the social graph. All it will take on Google's part is a step of openness to bring about such a commoditization of the social layer. This would not only be egalitarian, but would also be the most effective competitive measure against Facebook.

As web search connects people to documents across the web, the social backbone connects people to each other directly, across the full span of web-wide activity. (For the avoidance of doubt, I take "web" to include networked phone and tablet applications, even if the web use is invisible to the user.)

Search removed the need to remember domain names and URLs. It's a superior way to locate content. The social backbone will relieve our need to manage email addresses and save us laborious "friending" and permission-granting activity — in addition to providing other common services such as notification and sharing.

Though Google+ is the work of one company, there are good reasons to herald it as the start of a commodity social layer for the Internet. Google decided to make Google+ be part of the web and not a walled garden. There is good reason to think that represents an inclination to openness and interoperation, as I explain below.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science -- from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 20% on registration with the code STN11RAD

It's time for the social layer to become a commodity

We're now several years into the era of social networks. Companies have come and gone, trying to capture the social graph and exploit it. Well-intentioned but doomed grass-roots initiatives have waxed and waned. Facebook has won the platform game, being the dominant owner of our social attention, albeit mostly limited to non-workplace applications.

What does this activity in social software mean? Clearly, social features are important to us as users of computers. We like to identify our friends, share with them, and meet with them. And it's not just friends. We want to identify co-workers, family, sales prospects, interesting celebrities.

Currently, we have all these groups siloed. Because we have many different contexts and levels of intimacy with people in these groups, we're inclined to use different systems to interact with them. Facebook for gaming, friends and family. LinkedIn for customers, recruiters, sales prospects. Twitter for friends and celebrities. And so on into specialist communities: Instagram and Flickr, Yammer or Salesforce Chatter for co-workers.

The situation is reminiscent of electronic mail before it became standardized. Differing semi-interoperable systems, many as walled gardens. Business plans predicated on somehow "owning" the social graph. The social software scene is filled with systems that assume a closed world, making them more easily managed as businesses, but ultimately providing for an uncomfortable interface with the reality of user need.

An interoperable email system created widespread benefit, and permitted many ecosystems to emerge on top of it, both formal and ad-hoc. Email reduced distance and time between people, enabling rapid iteration of ideas, collaboration and community formation. For example, it's hard to imagine the open source revolution without email.

When the social layer becomes a standard facility, available to any application, we'll release ourselves into a world of enhanced diversity, productivity and creative opportunity. Though we don't labor as much under the constraints of distance or time as we did before email, we are confined by boundaries of data silos. Our information is owned by others, we cannot readily share what is ours, and collaboration is still mostly boxed by the confines of an application's ability.

A social backbone would also be a boost for diversity. Communities of interest would be enabled by the ready availability of social networking, without having to do the heavy lifting of creating the community or run the risk of disapproval or censorship from a controlling enterprise.

The effect of email interoperability didn't just stop at enabling communication: it was a catalyst for standards in document formats and richer collaboration. The social backbone won't just make it easier to handle permissions, identity and sharing, but will naturally exert pressure for further interoperation between applications. Once their identity is united across applications, users will expect their data to travel as well.

We see already a leaning toward this interoperability: the use of Twitter, Facebook and Google as sign-on mechanisms across websites and games, attempts to federate and intermingle social software, cloud-based identity and wallet services.

What a social backbone would do

As users, what can we expect a social backbone to do for us? The point is to help computers serve us better. We naturally work in contexts that involve not only documents and information, but groups of people. When working with others, the faster and higher bandwidth the communication, the better.

To give some examples, consider workplace collaboration. Today's groupware solutions are closed worlds. It's impractical for them to encompass either a particularly flexible social model, or a rich enough variety of applications and content, so they support a restricted set of processes. A social backbone could make groupware out of every application. For the future Photoshop, iMovie and Excel, it adds the equivalent power of calling someone over and saying "Hey, what about this?"

Or think about people you interact with. When you're with someone, everything you're currently doing with them is important. Let's say you're working with your friend Jane on the school's PTA fundraiser, and her kids and yours play together. Drag Jane into your PTA and Playdates circles. Drop a letter to parents into the PTA circle, and your calendar's free/busy info into Playdates.

Now you're sharing information both of you need. Next Thursday you see Jane at school. While you're chatting, naturally the topics of playdates and the PTA come up. You bring up Jane on your phone, and there are links right there to the letter you're writing, and some suggested dates for mutually free time.

Teaching computer systems about who we know lets them make better guesses as to what we need to know, and when. My examples are merely simple increases in convenience. The history of computing frequently shows that once a platform is opened up, the creative achievements of others far exceed those dreamed of by the platform's progenitors.

The social backbone democratizes social software: developers are freed from the limitations of walled gardens, and the power to control what you do with your friends and colleagues is returned to you, the user.

Social backbone services

Which services will the social backbone provide? We can extract these from those provided by today's web and social software applications:

  • Identity — authenticating you as a user, and storing information about you
  • Sharing — access rights over content
  • Notification — informing users of changes to content or contacts' content
  • Annotation — commenting on content
  • Communication — direct interaction among members of the system

These facilities are not new requirements. Each of them has been met in differing ways by existing services. Google and Amazon serve as identity brokers with a reasonable degree of assurance, as do Twitter and Facebook, albeit with a lesser degree of trust.

A host of web services address sharing of content, though mostly focused on sharing the read permission, rather than the edit permission. Notification originated with email, graduated through RSS, and is now a major part of Twitter's significance, as well as a fundamental feature of Facebook. Annotation is as old as the web, embodied by the hyperlink, but has been most usefully realized through blogging, Disqus, Twitter and Facebook commenting. Communication between users has been around as long as multi-user operating systems, but is most usefully implemented today in Facebook chat and instant messaging, where ad-hoc groups can easily be formed.

Why not Facebook?

Unfortunately, each of today's answers to providing these social facilities is limited by its implementation. Facebook provides the most rounded complement of social features, so it's a reasonable question to ask why Facebook itself can't provide the social backbone for the Internet.

Facebook's chief flaw is that it is a closed platform. Facebook does not want to be the web. It would like to draw web citizens into itself, so it plays on the web, but in terms that leave no room for doubt where the power lies. Content items in Facebook do not have a URI, so by definition can never be part of the broader web. If you want to use Facebook's social layer, you must be part of and subject to the Facebook platform.

Additionally, there are issues with the symmetry of Facebook's friending model: it just doesn't model real life situations. Even the term "friend" doesn't allow for the nuance that a capable web-wide social backbone needs.

This is not to set up a Facebook vs Google+ discussion, but to highlight that Facebook doesn't meet the needs of a global social backbone.

Why Google+?

Why is Google+ the genesis of a social backbone? The simple answer is that it's the first system to combine a flexible enough social model with a widespread user base, and a company for whom exclusive ownership of the social graph isn't essential to their business.

Google also has the power to bootstrap Google+ as a social backbone: the integration of Google+ into Google's own web applications would be a powerful proving ground and advertisement for the concept.

Yet one company alone should not have the power to manage identity for everyone. A workable and safe social backbone must support competition and choice, while still retaining the benefits of the network. Email interoperability was created not by the domination of one system, but by standards for communication.

To achieve a web-wide effect, Google+ needs more openness and interoperability, which it does not yet have. The features offered by the upcoming Google+ API will give us a strong indication of Google's attitude towards control and interoperability.

There is some substantial evidence that Google would support an open and interoperable social backbone:

  • Google's prominence as a supporter of the open web, which is crucial to its business.
  • The early inclination to interoperation of Google+: public content items have a URI, fallback to email is supported for contacts who are not Google+ members.
  • Google is loudly trumpeting their Data Liberation Front, committed to giving users full access to their own data.
  • Google has been involved in the creation of, or has supported, early stage technologies that address portions of the social backbone, including OAuth, OpenID, OpenSocial, PubSubHubbub.
  • Google displays an openness to federation with interoperating systems, evinced most keenly by Joseph Smarr, the engineer behind the Google+ Circles model. The ill-fated Google Wave incorporated federation.
  • The most open system possible would best benefit Google's mission in organizing the world's information, and their business in targeting relevant advertising.

Toward the social backbone

Computers ought to serve us and provide us with means of expression.

A common, expressive and interoperable social backbone will help users and software developers alike. Liberated from information silos and the repeated labor of curating friends and acquaintances, we will be able to collaborate more freely. Applications will be better able to serve us as individuals, not as an abstract class of "users".

The road to the social backbone must be carefully trodden, with privacy a major issue. There is a tough trade-off between providing usable systems and those with enough nuance to sufficiently meet our models of collaboration and sharing.

Obstacles notwithstanding, Google+ represents the promise of a next generation of social software. Incorporating learnings from previous failures, a smattering of innovation, and a close attention to user need, it is already a success.

It requires only one further step of openness to take the Google+ product into the beginnings of a social backbone. By taking that step, Google will contribute as much to humanity as it has with search.

Edd Dumbill is the chair of O'Reilly's Strata and OSCON conferences. Find him here on Google+.

(Google's Joseph Smarr, a member of the Google+ team, will discuss the future of the social web at OSCON. Save 20% on registration with the code OS11RAD.)



May 12 2011

Parsing a new Pew report: 3 ways the Internet is shaping healthcare

On balance, people report being helped by the health information they find online, not harmed. While social networking sites are not a significant source of health information for online users, they do provide a source of encouragement and offer community for caregivers and patients. One quarter of online users have looked at drug reviews online, with some 38% of caregivers doing so. One quarter of online users have watched a video about health. And a new kind of digital divide is growing between users who have access to mobile broadband and those who do not.

Those are just a few of the insights from a new survey on the social life of health information from the Pew Internet & American Life Project. The results shed new light on how the online world is using the Internet to gather and share health data.

The Internet has disrupted how, where, when and what information we can gather and share about ourselves, one another and the conditions that we suffer from. Following are three key trends that reflect how the Internet is changing healthcare.

Health IT at OSCON 2011 — The conjunction of open source and open data with health technology promises to improve creaking infrastructure and give greater control and engagement for patients. These topics will be explored in the healthcare track at OSCON (July 25-29 in Portland, Ore.)

Save 20% on registration with the code OS11RAD

The quantified self

As Edd Dumbill observed here at Radar last year, network-connected sensors that track your fitness can increasingly be seen on city streets, gyms and wrists. Gary Wolf has likened the growth of the quantified self to the evolution of personal computing in the 1980s.

The trend toward a data-driven life that Wolf describes as the quantified self is no longer the domain of elite athletes or math geeks. Fully one quarter of online users are tracking their health data online, according to Pew's survey. "The Quantified Self and PatientsLikeMe are the cutting-edge of that trend, but our study shows that it may be a broader movement than previously thought," said Susannah Fox, Associate Director of Digital Strategy for the Pew Internet Project.

Carol Torgan, a health science strategist cited in the report, has shared further analysis of self-tracking. "Self-tracking is extremely widespread," writes Torgan. "In addition to all the organized tracking communities, there’s a growing number of organic self-tracking communities. For examples, take a look at the diabetes made visible community on Flickr, or the more than 20,000 videos on YouTube tagged weight loss journey."

Below, Gary Wolf delivers a TED Talk on the quantified self:

Participatory medicine

Another trend that jumps out from this report is the rise of e-patients, where peer-to-peer healthcare complements the traditional doctor-to-patient relationship. While health professionals were the number one source of health information cited in this survey, the Internet is a significant source for 80% of online users.

We're entering an age of participatory medicine, where patients can learn more about their doctors, treatments, drugs and the experiences of others suffering from their conditions than ever before. Twenty-five percent of American adults have read the comments of another patient online. Twenty-three percent of Internet users who are living with at least one of the five chronic conditions named in the survey have searched online for someone who shared their condition.

Online forums where people voluntarily share data about symptoms, environmental conditions, sources of infection, mechanics of injury or other variables continue to grow, and there are now dozens of other social media health websites to explore. As Claire Cain Miller wrote in the New York Times last year, online social networks bridge gaps for the chronically ill. And as Stephanie Clifford wrote in 2009, online communities can provide support for elderly patients who are isolated by geography.

"These networks provide a sense of distributed community, where you can find others who suffer from your condition and support for treatment," said Fox. "PatientsLikeMe is an example of that."

PatientsLikeMe, in fact, recently published the results of a patient-driven clinical trial in Nature, the first such study in a major journal. Fox shared further thoughts on mapping the frontier of healthcare at

The online conversation about health is being driven forward by two forces:  1) the availability of social tools and 2) the motivation, especially among people living with chronic conditions, to connect with each other. Pew Internet has identified two important trends in our data. One is what we call the "mobile difference" — hand someone a smartphone and they become more social online, more likely to share, more likely to contribute, not just consume information.

The other is what we call the "diagnosis difference" — holding all other demographic characteristics constant we find that having a chronic disease significantly increases an Internet user's likelihood to say they both contribute and consume user-generated content related to health. They are learning from each other, not just from institutions.

This trend emphasizes the link between health literacy, media literacy and digital literacy. When citizens search for information about health online, they're presented with a dizzying array of choices, including targeted advertising, sponsored blog posts, advertorials and online forums. One area where this will be particularly challenging is in pharmaceutical information. More open data about pharmaceuticals released by open government projects like Pillbox inject trustworthy information into the Internet ecosystem, as users searching for aspirin will find. However, the United States Food and Drug Administration has still not issued any official guidance for the use of social media by the industry. Given the growing percentage of caregivers and those suffering from chronic disease that are searching for information about drugs, such guidance may be overdue.

As the role of the Internet as a platform for collective action grows, its ability to connect fellow travelers will become increasingly important. As Clay Shirky observed in January, "we have historically overestimated the value of access to information and underestimated the value of access to one another."

A new digital divide

Internet access is information access. Citizens who are not online are by definition on the other side of the digital divide. In the 21st century, however, a data-driven life is also profoundly mobile.

According to the Pew Internet survey, 18% of wireless Internet users are tracking their own healthcare data, twice as many as those who do not have a wireless-enabled device. Open health data can spur better decisions for mobile users if they have access to a smartphone or tablet and the Internet. Without it, not so much.

"The difference that we see is in the mobile space," said Fox. "It's a younger demographic, and connected to that it's more diverse. When you look at who is accessing the Internet on their smartphone and has apps, you're likely to see a more diverse population. That's the promise of mobile health: that it will reach different audiences. And yet, these are not the audiences that are in the most need of health information. If you look at the numbers of people with disability or chronic disease, mobile is not closing that gap."

Fox spoke about the promise of mobile and the new digital divide at Transform 2010:

It's no secret that the ability to pay for data plans and smartphones is correlated with socioeconomic class status. Access to hardware may change as inexpensive Android devices continue to enter the market. According to ComScore, as of January 2011, 65.8 million Americans owned a smartphone, out of a total of 234 million users ages 13 and older. If 20% of those users switch over the course of this year, smartphone penetration will be just shy of 50%. That doesn't address the needs of those without access to broadband Internet. Simply having a smartphone and connection, however, doesn't result in the information literacy and health literacy needed to apply these tools.
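The projection above can be checked with a few lines of arithmetic. This is only a sketch of the calculation: it uses the two ComScore figures quoted in the text and assumes that "20% of those users" refers to 20% of all 234 million mobile users switching to a smartphone during the year.

```python
# Figures quoted from ComScore in the text (January 2011).
smartphone_owners = 65.8e6
mobile_users = 234e6

current_share = smartphone_owners / mobile_users

# Assumption: "20% of those users" means 20% of all 234 million
# mobile users switch to a smartphone over the course of the year.
switchers = 0.20 * mobile_users
projected_share = (smartphone_owners + switchers) / mobile_users

print(f"current: {current_share:.1%}, projected: {projected_share:.1%}")
```

Under that reading, penetration moves from roughly 28% to roughly 48%, which matches the "just shy of 50%" estimate.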

That's a lot to ask of citizens, who will need well-designed healthcare apps to help them make sense of the data deluge. Given spiraling healthcare costs, however, the future of healthcare looks like it's in the palms of our hands.


March 31 2011

With sentiment analysis, context always matters

People are finding new ways to use sentiment analysis tools to conduct business and measure market opinion. But is such analysis really effective, or is it too subjective to be relied upon?

In the following interview, Matthew Russell (@ptwobrussell), O'Reilly author and principal and co-founder of Zaffra, says the quality of sentiment analysis depends on the methodology. Large datasets, transparent methods, and remembering that context matters, he says, are key factors.

What is sentiment analysis?

Matthew Russell: Think of sentiment analysis as "opinion mining," where the objective is to classify an opinion according to a polar spectrum. The extremes on the spectrum usually correspond to positive or negative feelings about something, such as a product, brand, or person. For example, instead of taking a poll, which essentially asks a sample of a population to respond to a question by choosing a discrete option to communicate sentiment, you might write a program that mines relevant tweets or Facebook comments with the objective of scoring them according to the same criteria to try and arrive at the same result.
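As a rough illustration of what such a program might look like, here is a minimal word-list scorer in Python. It is a sketch, not the method Russell uses: the word lists, the `score` and `classify` helpers, and the sample messages are all hypothetical, and a real system would need a far larger lexicon and the context handling discussed below.

```python
import re

# Hypothetical word lists for illustration; real lexicons are far larger.
POSITIVE = {"love", "great", "good", "happy", "awesome"}
NEGATIVE = {"hate", "bad", "awful", "terrible", "broken"}


def score(text):
    """Return (positive word count - negative word count) for one message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def classify(text):
    """Map a score onto the polar spectrum: positive, negative, or neutral."""
    s = score(text)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


# Hypothetical messages standing in for mined tweets or comments.
messages = [
    "I love this phone, the screen is great",
    "Battery life is awful and the charger is broken",
    "It arrived on Tuesday",
]
labels = [classify(m) for m in messages]
print(labels)
```

A counter like this has no notion of negation or sarcasm, which is one reason a single data point should not be trusted on its own.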

What are the flaws with sentiment analysis? How can something like sarcasm be addressed?

Matthew Russell: Like all opinions, sentiment is inherently subjective from person to person, and can even be outright irrational. It's critical to mine a large — and relevant — sample of data when attempting to measure sentiment. No particular data point is necessarily relevant. It's the aggregate that matters.

An individual's sentiment toward a brand or product may be influenced by one or more indirect causes: someone might have a bad day and tweet a negative remark about something they otherwise had a pretty neutral opinion about. With a large enough sample, outliers are diluted in the aggregate. Also, since sentiment very likely changes over time according to a person's mood, world events, and so forth, it's usually important to look at data from the standpoint of time.
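A small sketch of that idea in Python: group already-scored messages by day and average them, so that one bad-day remark is diluted by the rest of the sample. The dates and scores are made up for illustration, with scores assumed to run from -1 (negative) to +1 (positive).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (day, score) pairs; scores run from -1 (negative) to +1 (positive).
scored = [
    ("2011-03-28", 0.4), ("2011-03-28", 0.6), ("2011-03-28", -1.0),  # one bad-day outlier
    ("2011-03-28", 0.5), ("2011-03-28", 0.5),
    ("2011-03-29", 0.6), ("2011-03-29", 0.7), ("2011-03-29", 0.8),
]

# Bucket the scores by day so sentiment can be read as a time series.
by_day = defaultdict(list)
for day, value in scored:
    by_day[day].append(value)

daily_mean = {day: round(mean(values), 2) for day, values in sorted(by_day.items())}
print(daily_mean)
```

With only five messages on the first day the outlier still drags the average down noticeably; the larger the sample, the less any single remark matters.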

As to sarcasm, like any other type of natural language processing (NLP) analysis, context matters. Analyzing natural language data is, in my opinion, the problem of the next 2-3 decades. It's an incredibly difficult issue, and sarcasm and other types of ironic language are inherently problematic for machines to detect when looked at in isolation. It's imperative to have a sufficiently sophisticated and rigorous approach so that relevant context can be taken into account. For example, that would require knowing that a particular user is generally sarcastic, ironic, or hyperbolic, or having a larger sample of the natural language data that provides clues to determine whether or not a phrase is ironic.

Is the phrase "sentiment analysis" being used appropriately?

Matthew Russell: I've never had a problem with the phrase "sentiment analysis" except that it's a little bit imprecise in that it says nothing about how the analysis is being conducted. It only describes what is being analyzed — sentiment. Given the various flaws I've described, it's pretty clear that the analysis techniques can sometimes be as subjective as the sentiment itself. Transparency in how the analysis occurs and additional background data — such as the context of when data samples were gathered, what we know about the population that generated them, and so forth — is important. Of course, this is the case for any test involving non-trivial statistics.

Sentiment analysis recently was in the news, touted as an effective tool for predicting stock market prices. What other non-marketing applications might make use of this sort of analysis?

Matthew Russell: Stock market prices are potentially a problematic example because it's not always the case that a company that creates happy consumers is necessarily profitable. For example, key decision makers could still make poor fiscal decisions or take on bad debt. Like anything else involving sentiment, you have to hold the analysis loosely.

A couple of examples, though, might include:

  • Politicians could examine the sentiment of their constituencies over time to try and gain insight into whether or not they are really representing the interests that they should be. This could possibly involve realtime analysis for a controversial topic, or historical analysis to try and identify trends such as why a "red state" is becoming a "blue state," or vice-versa. (Sentiment analysis is often looked at as a realtime activity, but mining historical samples can be incredibly relevant too.)
  • Conference organizers could use sentiment analysis based on book sales or related types of data to identify topics of interest for the schedule or keynotes.

Of course, keep in mind that just because the collective sentiment of a population might represent what the population wants, it's not necessarily the case that it's in its best interests.


March 29 2011

4 SXSWi themes reveal the story within the story

For those who went to Austin, Texas this month in search of the next big thing in technology, the South by Southwest Interactive (SXSWi) Festival may have proved disappointing. There was no single breakout company, platform or technology to be found, no matter how hard the hundreds of tech journalists and bloggers sized up the wares, apps and presentations of the startups and tech titans vying for attention. For those who went in search of connections to the tech community, it was a goldmine. After the dust from nearly 20,000 attendees rambling around Austin's streets settled, MG Siegler declared advertising and the iPad 2 the "winners" of SXSWi 2011.

Based upon my experiences there, it's hard to call him wrong, given how plastered with advertising the city had become. Big brands moved in for the week, from Apple's popup store to CNN, whose cafe sign was one of the iconic images of the event.

CNN Grill at SXSW

That said, there were technologies to be found that are clearly gathering steam. QR codes were everywhere, from Mashable's party to bar napkins to fundraising for Japan. "Gamification," where entrepreneurs, nonprofits or even governments try to add a game layer to commerce, learning or, really, anything, was also hot.

But those are just the technologies that caught my eye. Below I outline the larger trends that continue to resonate with me as the excitement of SXSWi naturally dissipates.

Offline is online

Angry Birds being played via a projector

While there were literally hundreds of companies vying to grab some of the hyperkinetic attendees' attention, the perspective that matters most is what their behavior can tell us about where our relationship to technology and one another is headed. The social behavior, omnipresent mobile devices and technologies embraced at SXSWi point to something more interesting: the dissolution of the boundaries between the offline and online world.

The Pew Internet and American Life Project estimates Internet penetration in the United States at around 79%. A few countries are more wired. The majority are less so, although that's changing. At SXSWi, however, nearly everyone was connected to the Internet from waking to sleeping. The early adopters at the festival can offer some insight into what's coming down the pike for the rest of society, as more of us are constantly connected by a mobile device with a pervasive Internet connection.

At an impromptu panel on social media and the Middle East held at Twitter's SXSWi retreat, NPR's Andy Carvin reflected that for many of the young people whose struggles he's been chronicling in real-time, the question of "offline vs online" wouldn't register as meaningful: they've already merged in their lives.

The collective behavior of those networked masses was at the crux of Oliver Burkeman's dispatch on SXSWi at the Guardian. He observed that the festival heralded:

... the final disappearance of the boundary between 'life online' and 'real life', between the physical and the virtual. It thus requires only a small (and hopefully permissible) amount of journalistic hyperbole to suggest that the days of 'the internet' as an identifiably separate thing may be behind us.

If that all sounds a bit familiar to Radar readers, it should: Clay Shirky, in discussing the role of the Internet as a platform for collective action, heralded the "death of cyberspace" and the end of geek culture back in January. This isn't a new idea, but the research and observation are finally aligning with reality.

Shirky's SXSWi talk about social media, the so-called "dictator's dilemma" and Egypt, drew hundreds in the audience to think about what the impact of these trends means for millions of people in the Middle East and beyond. The insight that Shirky shared again is that connecting citizens to one another has been historically undervalued. As more connection technologies enter countries where information has been tightly controlled, expect the Internet to continue to act as a disruptor.

In watching the ebb and flow of the hordes experiencing idea overload of SXSWi, Edward Boches noticed the same melding of online and offline lives. If the behavior of these early adopters is a precursor to mainstream adoption, expect this trend to continue.

This also means that civil society will continue to need great teachers and thoughtful guides to information gathering and digital literacy. Based upon those demands, Phoebe Connolly's choice to call SXSW 2011 the "year of the librarian" looks spot on.

Mobile, location, and social

In his analysis of 2011 tech trends at the beginning of the year, my colleague Mike Loukides observed that "you don't get any points for predicting that 'Mobile is going to be big in 2011'." If you're looking for what was big at SXSWi, any reasonable observer had to acknowledge that mobile devices, maps, websites, services, experiences, marketing and data were profoundly relevant.

Stowe Boyd, who had written that he wasn't going to SXSWi, found something of great value in the "thriving petri dish for a social, mobile future." As Boyd observed in his post on SXSWi:

Over 50% of the world's population is now urban, and that is expected to rise to over 60% by 2030. The cities will not only be bigger, but increasingly dense, so what we learn from SxSW today could shape the social, mobile, urban landscape of the near future, since many of the architects of the future were there, taking notes.

With tablets in abundance, Caroline McCarthy noted that SXSWi offered a peek at what a post-PC society might look like. By year's end, more Xooms, Galaxy Tabs and BlackBerry Playbooks will likely join the millions of iPads already in consumers' hands. Augmented reality apps that bridge the gap between online and offline life, like Sensierge, may be on many of them.

Where 2.0: 2011, being held April 19-21 in Santa Clara, Calif., will explore the intersection of location technologies and trends in software development, business strategies, and marketing.

Save 25% on registration with the code WHR11RAD

You can put location and social media in the same category of trend: impossible to omit but obvious to report. Location-based applications like Foursquare and Gowalla were visible everywhere, with a host of other startups looking to pick up some screen real estate and new users.

There was one clear winner in the social space: POPVOX, which took home an award for the best social networking site at the SXSWi Accelerator competition. Making social media in politics meaningful has been a tough nut to crack, but trying to use the Internet to make Congress smarter may be an idea whose time has come. [Disclosure: Tim O'Reilly was an early investor in POPVOX.]

California-based Votizen, which powered a social media campaign for the Startup Visa bill during SXSWi, is focusing on this space as well. Look for an in-depth report on their efforts here on Radar soon.

Mastering social media saturation at SXSWi, however, as Daniel Terdiman pointed out at CNET, required developing better filters and tuners for signals. Jeremiah Owyang found SXSWi great for networking but only if people remembered to detach from their mobile devices.

Last year, Clive Thompson wrote about the death of the phone call. In 2011, SXSWi may have "socially" written its obituary. Making a phone call was superseded by texts, checkins, email, tweets and instant messages. That's not to say that there weren't plenty of people on the phone there, just that the cornucopia of other communication options means synchronous voice communication wasn't always the first or even third option. Social media overload on Twitter, Facebook and location-based networks created an opportunity for group messaging apps like Fast Society or Beluga to connect people without sharing the information with everyone.

Part of the impulse to check and recheck social media is deep-seated in biochemical pathways, as Caterina Fake described in her analysis of FOMO and social media. "FOMO," or the "Fear of Missing Out," has been used by savvy entrepreneurs to drive use of their apps to find out what's happening, where and with whom. As Katherine Rosman reported for the Wall Street Journal, at SXSWi 2011, "looking down was the new looking up."

For some geeks who have been in the industry for a long time, however, being more substantively human in person was a greater attraction. Gina Trapani may have said it best:

The best kind of social networking: one face talking to another face within 3 feet of each other.

Big data, open data and your data

Reid Hoffman's keynote talk was one of the highlights, due in no small part to the LinkedIn founder's focus on data. Hoffman's big idea is the importance of big data, which in many ways is leading us into the next stage of the Internet. In 5 years, he posited, "a product designer may need to have the characteristics of a data scientist." As we look ahead, "the future is sooner and stranger than we think."

Hoffman provided 10 rules of entrepreneurship gleaned from his time as an entrepreneur and venture capitalist. "The way we make human progress is how we collaborate together," he said.

New digital platforms and analytics will offer unexpected opportunities. "Airbnb gives us the market of eBay for [physical] space," said Hoffman, enabling people to shift how things are priced or offered. They also bring new risks. "Trying to make data trails invisible to people is nearly a Sisyphean task," he said, highlighting a key flashpoint that lurks amidst the growing petabytes of data: privacy and security.

"I would like a data dashboard with the information that the government has about me," said Hoffman, perhaps akin to the Google dashboard that provides insight into data on that service. What concerned him is that we might end up in a state akin to Aldous Huxley's "Brave New World." Awash in data, how do we discern truth in vast amounts of data?

While some will balk at calling this "Web 3.0," the coming age of data science looks too big and too fundamental a shift to ignore.

Privacy and digital rights

Nestled amidst the hype and potential of the technologies on display were concerns about what this explosion of location and mobile will mean for unwary consumers. "Tucked modestly into the corner, an American Civil Liberties Union table offers white papers and postcards warning of the privacy dangers of all this data mining," reported Jessica Clark for PBS Mediashift.

I moderated a panel on a "social networking bill of rights," which has continued to receive attention in the days since the festival from MSNBC, Identity Blog, Liminal States, and PC World.

Alistair Fairweather highlighted key questions for the technology industry to consider in the months ahead:

Why is user data always vested within the networks themselves? Why don't we host our own data as independent "nodes," and then allow networks access to it?

On that count, a briefing with a startup emerging from stealth mode at SXSWi suggested how an emerging ecosystem of trust frameworks could offer a trust layer for a new class of personal data stores like the Locker Project. More on trust frameworks and that startup in a future post.

SXSWi grows up

SXSW at night

The SXSWi festival itself may be the biggest winner — or loser, depending on the critic taking stock. McCarthy called it out: SXSWi has changed — "so deal with it."

SXSWi 2011 was less a conference than a much-needed test bed for new promotions like the partnership between Foursquare and American Express, edgy marketing initiatives, and fresh habits of mobile behavior like the much-hyped showdown between a handful of similar "group messaging" applications, which permitted attendees to communicate and travel in packs as Twitter has grown too overwhelmingly popular to use as a fine-tuned way to navigate the festival.

Fair warning: 2011 was my first SXSWi. I harbor no kind, warm memories of when the festival was smaller, gentler or allowed for different kinds of community building in the technology world. In many respects, the expansion of SXSWi into an immense tech trade show / geek spring break / sprawling conference reflects the vastly expanded role for interactive technologies in modern society. Whether we do something meaningful with them beyond finding a better party is in our hands.

Coming soon: A report on SXSWi for the Gov 2.0 crowd. As Patrick Ruffini observed, SXSWi 2011 "was the year that open government made its first huge splash" at the festival.

March 03 2011

Social data is an oracle waiting for a question

We're still in the stage where access to massive amounts of social data has novelty. That's why companies are pumping out APIs and services are popping up to capture and sort all that information. But over time, as the novelty fades and the toolsets improve, we'll move into a new phase that's defined by the application of social data. Access will be implied. It's what you do with the data that will matter.

Matthew Russell (@ptwobrussell), author of "Mining the Social Web" and a speaker at the upcoming Where 2.0 Conference, has already rounded that corner. In the following interview, Russell discusses the tools and the mindset that can unlock social data's real utility.

How do you define the "social web"?

Matthew Russell: The "social web" is admittedly a notional entity with some blurry boundaries. There isn't a Venn diagram that carves the "social web" out of the overall web fabric. The web is inherently a social fabric, and it's getting more social all the time.

The distinction I make is that some parts of the fabric are much easier to access than others. Naturally, the platforms that expose their data with well-defined APIs will be the ones to receive the most attention and capture the mindshare when someone thinks of the "social web."

In that regard, the social web is more of a heatmap where the hot areas are popular social networking hubs like Twitter, Facebook, and LinkedIn. Blogs, mailing lists, and even source code repositories such as SourceForge and GitHub, however, are certainly part of the social web.

What sorts of questions can social data answer?

Matthew Russell: Here are some concrete examples of questions I asked — and answered — in "Mining the Social Web":

  • What's your potential influence when you tweet?
  • What does Justin Bieber have (or not have) in common with the Tea Party?
  • Where does most of your professional network geographically reside, and how might this impact career decisions?
  • How do you summarize the content of blog posts to quickly get the gist?
  • Which of your friends on Twitter, Facebook, or elsewhere know one another, and how well?

It's not hard at all to ask lots of valuable questions against social web data and answer them with high degrees of certainty. The most popular sources of social data are popular because they're generally platforms that expose the data through well-crafted APIs. The effect is that it's fairly easy to amass the data that you need to answer questions.
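To make one of those questions concrete, here is a minimal sketch of "which of your friends know one another, and how well?" It is not code from the book: the `friends_of` dictionary stands in for data already pulled from some API, and both helper functions are hypothetical. Counting shared friends is used here as one crude proxy for "how well."

```python
from itertools import combinations

# Hypothetical friend lists, as if already fetched from a social API.
friends_of = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob"},
    "dave": {"alice"},
}


def friends_who_know_each_other(user):
    """Return pairs of `user`'s friends who list each other as friends."""
    pairs = []
    for a, b in combinations(sorted(friends_of[user]), 2):
        if b in friends_of.get(a, set()) and a in friends_of.get(b, set()):
            pairs.append((a, b))
    return pairs


def shared_friends(a, b):
    """A rough 'how well' signal: the friends two people have in common."""
    return friends_of.get(a, set()) & friends_of.get(b, set())


print(friends_who_know_each_other("alice"))
print(shared_friends("bob", "carol"))
```

Once the data is in a simple structure like this, most of the work is set arithmetic; the harder part in practice is collecting and normalizing the friend lists.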

With the necessary data in hand to answer your questions, the selection of a programming language, toolkit, and/or framework that makes shaking out the answer easy is a critical step that shouldn't be taken lightly. The more efficient it is to test your hypotheses, the more time you can spend analyzing your data. Spending sufficient time in analysis engenders the kind of creative freedom needed to produce truly interesting results. This is why organizations like Infochimps and Gnip are filling a critical void.

What programming skills or development background do you need to effectively analyze social data?

Matthew Russell: A basic programming background definitely helps, because it allows you to automate so many of the mundane tasks that are involved in getting the data and munging it into a normalized form that's easy to work with. That said, the lack of a programming background should be among the last things that stop you from diving head first into social data analysis. If you're sufficiently motivated and analytical enough to ask interesting questions, there's a very good chance you can pick up an easy language, like Python or Ruby, and learn enough to be dangerous over a weekend. The rest will take care of itself.

Why did you opt to use GitHub to share the example code from the book?

Matthew Russell: GitHub is a fantastic source code management tool, but the most interesting thing about it is that it's a social coding repository. What GitHub allows you to do is share code in such a way that people can clone your code repository. They can make improvements or fork the examples into an entirely new form, and then share those changes with the rest of the world in a very transparent way.

If you look at the project I started on GitHub, you can see exactly who did what with the code, whether I incorporated their changes back into my own repository, whether someone else has done something novel by using an example listing as a template, etc. You end up with a community of people that emerge around common causes, and amazing things start to happen as these people share and communicate about important problems and ways to solve them.

While I of course want people to buy the book, all of the source code is out there for the taking. I hope people put it to good use.

