
December 19 2012

Why isn’t social media more like real life?

I finally got around to looking at my personal network graph on LinkedIn Labs the other day. It was a fun exercise and I got at least one interesting insight from it.

Take a look at these two well defined and distinct clusters in my graph. These are my connections with the startup I worked for (blue) and the company that acquired us in 2008 (orange). It is fascinating to me that all these years later the clusters remain so disconnected. There are shared connections within a common customer base, but very few direct connections across the clusters. I would love to see maps from some of my other colleagues who are still there to see if theirs show the same degree of separation. This was an acquisition that never really seemed to click and whether this is a picture of cause or effect, it maps to my experiences living in it.

That’s an aside though. What this graph really puts in stark relief is what every social network out there is learning about us. And this graph doesn’t really tell the whole story because it doesn’t represent edge weights and types, which they also know. Social networks know who we connect with, who we interact with, and the form and strength of those interactions.

But this post isn’t a privacy rant. I know they know this stuff and so do you. What this image got me thinking about again is why social networks aren’t using this information to create for us a social experience that is more like our real world, and frankly more in tune with our human-ness.

Social media properties plumb this data to know which ads to show us, and sometimes they use it to target messages to us more effectively. Remember those LinkedIn messages we got with the pictures of our friends? We all clicked on them. But they just don’t seem to be making that much effort to make use of what they know to innovate on our behalf, to improve our experience.

For example, Facebook knows all of this too and yet they continue to cling to the curious fiction that our social life is one giant flat maximally-connected equi-weighted graph. A single giant room where we all stand shoulder to shoulder wondering who all of these strangers are. A place that refuses to acknowledge the nuance and complexity of our real world relationships. And Twitter, for all its wonderfulness, does the same thing. And Google Plus? Why are you making me curate circles? You know what they are. At least take a guess at making them for me.

They call themselves social networks, but in terms of how they express themselves to us, their users, they seem to be using the word “network” the way broadcast television does. The experience is more analogous to a vast mesh of public access television networks than to the complexity and richness of real world social connections. You say something and it is presented to everyone, no matter which of those clusters they inhabit. So 10% care and the rest of them filter it.

In the natural world of human-to-human conversation, communication travels person to person, modified and attenuated along the way. Or, in some cases, amplified into a cluster-spanning meme. I think it would be fascinating to see social media properties experiment with recreating some of these more complex dynamics. What if I could “talk” to a well-defined cluster in my graph and see the strength of the signal attenuate rapidly as the distance from that core increased? Not to make it invisible, but perhaps make its volume more appropriate to another cluster’s contextual center of gravity.
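To make the attenuation idea concrete, here is a minimal sketch, not anything a real network is known to do. It assumes the graph is a plain adjacency dict, that "distance from the core" means the fewest hops from any member of the chosen cluster, and that volume halves with each hop. The function names, the `decay` parameter, and the toy graph are all hypothetical.

```python
from collections import deque


def hop_distances(graph, sources):
    """Breadth-first search: fewest hops from any source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def message_volume(graph, cluster, decay=0.5):
    """Volume at each reachable node: 1.0 inside the cluster,
    multiplied by `decay` for every hop away from it (an assumed rule)."""
    return {node: decay ** d for node, d in hop_distances(graph, cluster).items()}


# Hypothetical toy graph: one tight cluster and a chain leading away from it.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["cat", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
}

volumes = message_volume(graph, cluster=["ann", "bob", "cat"])
for person, volume in sorted(volumes.items()):
    print(f"{person}: {volume}")
```

Nobody is cut off entirely in this sketch. The message still reaches fay, three hops out, just at a fraction of the volume it has inside the cluster.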

Or, in the inverse, knowing things about my graph Twitter could give me a really nice low-pass filter that gave preference to those in my stream that are “close” to me, or share a common edge type, but who might not be tweeting at high frequency.
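One way to picture that kind of filter is a simple scoring heuristic, sketched below under stated assumptions. It supposes a per-author "closeness" number between 0 and 1 (derived somehow from the graph) and a posts-per-day rate, and ranks the stream by closeness damped by volume. The function, the formula, and the sample data are all made up for illustration; this is not how Twitter ranks anything.

```python
def rank_stream(posts, closeness, posts_per_day):
    """Order posts so that close, low-frequency authors come first."""

    def score(post):
        author = post["author"]
        # Closer authors score higher; prolific authors are damped.
        return closeness.get(author, 0.0) / (1.0 + posts_per_day.get(author, 0.0))

    return sorted(posts, key=score, reverse=True)


# Hypothetical data: closeness in [0, 1], frequency in posts per day.
closeness = {"old_friend": 0.9, "chatty_friend": 0.9, "colleague": 0.6, "news_bot": 0.3}
posts_per_day = {"old_friend": 0.2, "chatty_friend": 40, "colleague": 2, "news_bot": 50}

posts = [
    {"author": "news_bot", "text": "Headline of the hour"},
    {"author": "chatty_friend", "text": "Lunch photo"},
    {"author": "colleague", "text": "Conference notes"},
    {"author": "old_friend", "text": "Moving back to town"},
]

ranked = rank_stream(posts, closeness, posts_per_day)
print([post["author"] for post in ranked])
```

With these numbers the rarely-heard old friend rises to the top, while an equally close but very chatty friend drops below the colleague.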

There are lots of possibilities along these lines. And I know that a big part of what makes these services useful is their simplicity. Fine. But ultimately, I wonder: is all of this network science going to benefit me in any direct way as a user of these services, or is the whole field of data science ultimately about reverse engineering me for the sake of advertisers?

I wrote a post a while back about our paleolithic roots and the way we consume media. The “diet” part aside, what I’ve been thinking about a lot since is a digital design sense that caters to our neurological reality. Instead of designing for the convenience of the machines and demanding that we adapt, design for who we actually are. Buggy. Tribal. Easily distracted. Full of bias. Curious. Whatever. I’m eager to see a more ambitious approach to design that infuses our digital worlds with more of the nuance and subtlety we find in the physical realm, all while preserving the reach that makes our digital world special.

June 08 2012

Four short links: 8 June 2012

  1. HAproxy -- high availability proxy, cf Varnish.
  2. Opera Reviews SPDY -- thoughts on the high-performance HTTP++ from a team with experience implementing their own protocols. Section 2 makes a good intro to the features of SPDY if you've not been keeping up.
  3. Jetpants -- Tumblr's automation toolkit for handling monstrously large MySQL database topologies. (via Hacker News)
  4. LeakedIn -- check if your LinkedIn password was leaked. Chris Shiflett had this site up before LinkedIn had publicly admitted the leak.

May 18 2011

Four short links: 18 May 2011

  1. The Future of the Library (Seth Godin) -- We need librarians more than we ever did. What we don't need are mere clerks who guard dead paper. Librarians are too important to be a dwindling voice in our culture. For the right librarian, this is the chance of a lifetime. Passionate railing against a straw man. The library profession is diverse, but huge numbers of them are grappling with the new identity of the library in a digital age. This kind of facile outside-in "get with the Internet times" message is almost laughably displaying ignorance of actual librarians, as much as "the book is dead!" displays ignorance of books and literacy. Libraries are already much more than book caves, and already see themselves as navigators to a world of knowledge for people who need that navigation help. They disproportionately serve the under-privileged, they are public spaces, they are brave and constant battlers at the front line of freedom to access information. This kind of patronising "wake up and smell the digital roses!" wank is exactly what gives technologists a bad name in other professions. Go back to your tribes of purple cows, Seth, and leave librarians to get on with helping people find, access, and use information.
  2. An Old Word for a New World (PDF) -- paper on how "innovation", which used to be pejorative, came now to be laudable. (via Evgeny Morozov)
  3. AlchemyAPI -- free (as in beer) entity extraction API. (via Andy Baio)
  4. Referrals by LinkedIn -- the thing with social software is that outsiders can have strong visibility into the success of your software, in a way that antisocial software can't.

January 26 2011

Four short links: 26 January 2011

  1. Find Communities -- algorithm for uncovering communities in networks of millions of nodes, for producing identifiable subgroups as in LinkedIn InMaps. (via Matt Biddulph's Delicious links)
  2. Seven Ways to Think Like The Web (Jon Udell) -- seven principles that will head off a lot of mistakes. They should be seared into the minds of anyone working in the web. 2. Pass by reference rather than by value. [pass URLs, not copies of data] [...] Why? Nobody else cares about your data as much as you do. If other people and other systems source your data from a canonical URL that you advertise and control, then they will always get data that’s as timely and accurate as you care to make it.
  3. Wire It -- an open-source javascript library to create web wirable interfaces for dataflow applications, visual programming languages, graphical modeling, or graph editors. (via Pete Warden)
  4. Interview with Marco Arment (Rands in Repose) -- Most people assume that online readers primarily view a small number of big-name sites. Nearly everyone who guesses at Instapaper’s top-saved-domain list and its proportions is wrong. The most-saved site is usually The New York Times, The Guardian, or another major traditional newspaper. But it’s only about 2% of all saved articles. The top 10 saved domains are only about 11% of saved articles. (via Courtney Johnston's Instapaper Feed)

December 02 2010

Four short links: 2 December 2010

  1. Glasgow University to License Its IP For Free -- while a small proportion of high value University of Glasgow IP will still be made available to industry through traditional licensing and spin-out companies alone, offering the bulk of IP to a larger audience for free adds value to the UK economy. (via Hacker News)
  2. Apollo 13 Spacelog -- the Apollo 13 mission transcripts presented as though they were a chat session. Not cheesy, but an effective presentation.
  3. Kafka -- LinkedIn's open source pub/sub message system.
  4. Buy This Satellite -- The owner of the world's most capable communication satellite just went bankrupt. We're fundraising to buy it. So we can move it to connect millions of people who will turn access into opportunity. (via Daniel Spector on Twitter)

November 19 2010

We're entering the talent economy

News that Google has more than 2,000 job openings reinforced a point LinkedIn CEO Jeff Weiner made at Web 2.0 Summit: We're entering the "talent economy."

During our interview, I asked Weiner about near-term drivers of the Internet economy. Here's what he said:

... The economy, generally, is going to be increasingly driven by talent. The world has evolved. If you look back at history, we've moved from an agrarian age to the industrial revolution, followed by an information age -- arguably a "meta" information age -- and I think we're transitioning into a talent economy. Where it's not just about the information you know, but about who you know and the information they possess.

... Knowledge is now evolving so quickly, I think it's equally, if not more important, to have access and be connected to the people who have the knowledge you most need to get your job done ... Talent is going to be driving where value gets created.

Weiner touched on a number of other topics during our discussion, including:

  • How LinkedIn uses data science to create relevance from massive streams of information. Case in point: aggregated statistics reveal how companies stack up against competitors in areas like R&D and employee tenure.
  • How new features, like the Career Explorer beta, are combining personal networks and predictive tools to help job seekers find career paths. Put another way: Data tools are shifting the focus from what did happen to what can happen.

The following video contains the full interview:

The marriage of data science and data products will be discussed in-depth at the upcoming Strata Conference (Feb. 1-3, 2011 in Santa Clara, Calif.). Save 20% on registration with the code "STR11RAD."


September 30 2010

Strata Week: Behind LinkedIn Signal

Professional social networking site LinkedIn yesterday announced a new service, Signal, that applies the filters of the LinkedIn network over status updates, such as those from Twitter. Signal lets you do things such as watch tweets from particular industries, companies or locales, or filter by your professional network. All in real time.

Screenshot of LinkedIn Signal

Overlaying the Twitter nation with LinkedIn's map is a great idea, so what's the technology behind Signal? Like fellow social networks Facebook and Twitter, LinkedIn has a smart big data and analytics team, who often leverage or create open source solutions.

LinkedIn engineer John Wang (@javasoze) gave some clues as to Signal's infrastructure of "Zoie, Bobo, Sensei and Lucene", and I thought it would be fascinating to examine the parts in more detail.

Signal uses a variety of open source technologies, some developed in-house at LinkedIn by their Search, Network and Analytics team.

  • Zoie (source code) is a real-time search and indexing system built on top of the Apache Lucene search platform. As documents are added to the index, they become immediately searchable.
  • Bobo is another extension to Apache Lucene. While Lucene is great for searching free text data, Bobo takes it a step further and provides faceted searching and browsing over data sets (source code).
  • Sensei (source code) is a distributed, scalable, database offering fast searching and indexing. It is particularly tuned to answer the kind of queries LinkedIn excels at: free text search, restricted over various axes in their social network. Sensei uses Bobo and Zoie, adding clustered, elastic database features.
  • Voldemort is an open source fault-tolerant distributed key-value store, similar to Amazon's Dynamo.
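For readers who have not met faceted search, the toy sketch below shows the general idea in plain Python. It is not the Zoie, Bobo, Sensei, or Lucene API, and it says nothing about how those projects are implemented. The class, its methods, and the sample documents are hypothetical. Documents become searchable as soon as they are added, a free-text query can be restricted along facet "axes", and every search returns facet counts for browsing.

```python
from collections import Counter, defaultdict


class TinyFacetedIndex:
    """Toy in-memory index: free-text terms plus facet filters and counts."""

    def __init__(self):
        self.docs = {}                    # doc_id -> {"text": ..., "facets": {...}}
        self.postings = defaultdict(set)  # term -> set of doc_ids containing it

    def add(self, doc_id, text, facets):
        # The document is searchable as soon as this returns.
        # (Re-adding an existing id is not handled in this sketch.)
        self.docs[doc_id] = {"text": text, "facets": facets}
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query, **filters):
        terms = query.lower().split()
        if not terms:
            return [], {}
        # Documents must contain every query term.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Keep only documents matching every requested facet value.
        hits = {
            d for d in hits
            if all(self.docs[d]["facets"].get(k) == v for k, v in filters.items())
        }
        # Count facet values across the remaining hits.
        counts = defaultdict(Counter)
        for d in hits:
            for name, value in self.docs[d]["facets"].items():
                counts[name][value] += 1
        return sorted(hits), {name: dict(c) for name, c in counts.items()}


index = TinyFacetedIndex()
index.add(1, "hiring data scientists", {"industry": "internet", "region": "us"})
index.add(2, "big data meetup tonight", {"industry": "internet", "region": "uk"})
index.add(3, "data center outage", {"industry": "telecom", "region": "us"})

ids, facets = index.search("data")
ids_us, facets_us = index.search("data", region="us")
print(ids, facets)
print(ids_us, facets_us)
```

The second query is the shape Signal's filters suggest: a free-text search narrowed along one axis, with counts for the remaining axes still available to drive the browsing interface.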

LinkedIn also use the Scala and JRuby JVM programming languages, alongside Java.

If you're interested in hearing more about LinkedIn Signal, check out the coverage on TechCrunch, Mashable and The Daily Beast.

Bringing visualization back to the future

Speaking at this week's Web 2.0 Expo in New York, Julia Grace of IBM encouraged attendees to raise their game with data visualization. As long ago as the 1980s movie directors envisioned exciting and dynamic data visualizations, but today most people are still sharing flat two-dimensional charts, which restrict the opportunities for understanding and telling stories with data. Julia decided to make some location-based data very real by projecting it onto a massive globe.

Julia's talk is embedded below, and you can also read an extended interview with her published earlier this month on O'Reilly Radar.

Hadoop goes viral

Software vendor Karmasphere creates developer tools for data intelligence that work with Hadoop-based SMAQ big data systems. They recently commissioned a study into Hadoop usage. One of the most interesting results of the survey suggests that Hadoop systems tend to start as skunkworks projects inside organizations, and move rapidly into production.

Once used inside an organization, Hadoop appears to spread:

Additionally, organizations are finding that the longer Hadoop is used, the more useful it is found to be; 65% of organizations using Hadoop for a year or more indicated more than three reasons for using Hadoop, as compared to 36% for new users.

There are challenges too. Hadoop offers the benefits of affordable big data processing, but it has an immature ecosystem that is only just starting to emerge. Respondents to the Karmasphere survey indicated that pain points included a steep learning curve, hiring qualified people, tool availability and educational materials.

This is good news for vendors such as Karmasphere, Datameer and IBM, all of whom are concentrating on making Hadoop work in ways that are familiar to enterprises, through the medium of IDEs and spreadsheet interfaces.

SciDB source released

The SciDB database is an answer to the data and analytic needs of the scientific world, serving among others the needs of biology, physics, and astronomy. In the words of their website, a database "for the toughest problems on the planet." SciDB Inc., the sponsors of the open source project, say that although science has become steadily more data intensive, scientists have had to use databases intended for commercial, rather than scientific, applications.

One of the most intriguing aspects of SciDB is that it emanates from the work of serial database innovator Michael Stonebraker. Scientific data is inherently multi-dimensional, Stonebraker told The Register earlier this month, and thus ill-suited for use with traditional relational databases.

The SciDB project has now made their source code available. The current release, R0.5, is an early stage product, for the "curious and intrepid". It features a new array query language, known as AQL, an SQL-like language extended for the array data model of SciDB. The release will run on Linux systems, and is expected to be followed up at the end of the year by a more robust and stable version.

SciDB is available under the GPL3 free software license, and may be downloaded on application to the SciDB team. According to the authors, more customary use of open source repositories is likely to follow soon.

Send us news

Email us news, tips and interesting tidbits at
