April 06 2012

Visualization of the Week: Clustering your social graph

In early 2011, LinkedIn released InMaps, a way to visualize your network and see the clusters into which people fall, based on where you shared employment or education, for example. (You can read O'Reilly Radar's interview with LinkedIn data scientist Ali Imam for a look at how the company created InMaps.)

The visualization below comes from the data viz company Meurs. It's a Facebook app that uses a concept similar to InMaps to visualize your Facebook network. With the app, you can see how your friends are clustered based on education, location, occupation, and so on.

Facebook clusters

You can also view the clusters as arranged by movie, TV, books, and music "likes." It's particularly revealing to see which of your Facebook friends are in the Nickelback cluster, or in my case, that none of the people I went to grad school with (for literature, I should add) have "liked" any books on Facebook.

Found a great visualization? Tell us about it

This post is part of an ongoing series exploring visualizations. We're always looking for leads, so please drop a line if there's a visualization you think we should know about.

Fluent Conference: JavaScript & Beyond — Explore the changing worlds of JavaScript & HTML5 at the O'Reilly Fluent Conference (May 29 - 31 in San Francisco, Calif.).

Save 20% on registration with the code RADAR20

March 09 2012

Four short links: 9 March 2012

  1. Why The Symphony Needs A Progress Bar (Elaine Wherry) -- an excellent interaction designer tackles the real world.
  2. Biologic -- view your social network as though looking at cells through a microscope. Gorgeous and different.
  3. The Cost of Cracking -- analysis of used phone listings to see what improves and decreases price yields some really interesting results. Phones described as “decent” are typically priced 23% below the median. Who would describe something they’re selling as "decent" and price it below market value unless something fishy was going on? [...] On average, cracking your phone destroys 30-50% of its value instantly. Particularly interesting to me since Ms 10 just brought home her phone with *cough* a new starburst screensaver.
  4. OpenStreetMap Welcomes Apple -- this is the classy way to deal with the world's richest company quietly and badly using your work without acknowledgement.

February 07 2012

Four short links: 7 February 2012

  1. Integrated Content Editor (GitHub) -- a track changes implementation, built in javascript, for anything that is contenteditable on the web, written by the NY Times team and open sourced.
  2. Data Tables -- featureful jQuery plugin for tables of data. (via Javascript Weekly)
  3. Creating a Developer Community (Slideshare) -- treat the problem like a channel conversion funnel: turn visitors into downloaders, downloaders into users, users into contributors. His screenshots of shitty conversions are great! (via Kohsuke Kawaguchi)
  4. Sex Differences in Intimate Relationships (PDF) -- Albert-Laszlo Barabasi and others use social graph analysis to analyze communications patterns in relationships. Notice that not only does the preference for an opposite-sex “best friend” kick in significantly earlier for females than for males (~18 years vs mid-20s, respectively), but females maintain a higher plateau value for much longer. More reality mining to understand ourselves. (via Sean Gourley)

February 06 2012

Four short links: 6 February 2012

  1. Jirafe -- open source e-commerce analytics for Magento platform.
  2. iModela -- a $1000 3D milling machine. (via BoingBoing)
  3. It's Too Late to Save The Common Web (Robert Scoble) -- paraphrased: "Four years ago, I told you all that Google and Facebook were evil. You did nothing, which is why I must now use Google and Facebook." His list of reasons that Facebook beats the Open Web gives new shallows to the phrase "vanity metrics". Yes, the open web does not go out of its way to give you an inflated sense of popularity and importance. On the other hand, the things you do put there are in your control and will stay as long as you want them to. But that's obviously not a killer feature compared to a bottle of Astroglide and an autorefreshing page showing your Klout score and the number of Google+ circles you're in.
  4. iBooks Author EULA Clarified (MacObserver) -- important to note that it doesn't say you can't use the content you've written, only that you can't sell .ibook files through anyone but Apple. Less obnoxious than the "we own all your stuff, dude" interpretation, but still a bit crap. I wonder how anticompetitive this will be seen as. Apple's vertical integration is ripe for Justice Department investigation.

January 25 2012

Four short links: 25 January 2012

  1. Mobile Overtaking Web -- provocatively packaged extrapolations of ComScore and similar numbers to conclude that Americans spend more time interacting with mobile apps than with web sites. I'm sure you could beat an iPhone developer to death with the error bars.
  2. Best Privacy Policy Ever -- satiric privacy policy from a Firefox plugin.
  3. The Time for Libraries is Now -- forceful presentation on the need for librarians (aka "information professionals") in an age of excess information.
  4. Google 2011 vs Microsoft 1995 (Nelson Minar) -- interesting analysis which prompted Andy Baio's comment Google will be in trouble if their strategy succeeds, or if it doesn't.

January 13 2012

Visualization of the Week: Visualizing your friends' Facebook likes

What do your Facebook friends have in common? (Well, other than all being your Facebook friend, of course.) What "likes" do they share?

Tony Hirst, a lecturer in the Department of Communication and Systems at The Open University, built a visualization that shows this connective data. He's also written a step-by-step guide on how he constructed the visualization with Google Refine and Gephi.

Sketching common likes amongst my facebook friends

The result is a network diagram of what Hirst's friends commonly "like." He notes:

"Rather than returning likes, I could equally have pulled back lists of the movies, music or books they like, their own friends lists (permissions settings allowing), etc, etc, and then generated friends' interest maps on that basis."
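
As a rough illustration of the idea behind such a diagram, the sketch below builds a friend-to-like edge list and keeps only the likes shared by at least two friends. The sample data, the function name `common_like_edges`, and the `min_friends` threshold are all assumptions made for illustration; this is not Hirst's workflow, which used Google Refine and Gephi.

```python
from collections import Counter


def common_like_edges(likes_by_friend, min_friends=2):
    """Return sorted (friend, like) edges for likes shared by at least min_friends friends."""
    # Count how many distinct friends hold each like.
    counts = Counter(
        like for likes in likes_by_friend.values() for like in set(likes)
    )
    # Keep only the edges that point at a sufficiently common like.
    return sorted(
        (friend, like)
        for friend, likes in likes_by_friend.items()
        for like in set(likes)
        if counts[like] >= min_friends
    )


# Hypothetical sample data, not drawn from any real profile.
sample = {
    "alice": {"BBC Radio 4", "xkcd"},
    "bob": {"xkcd", "Gephi"},
    "carol": {"Gephi", "xkcd", "Lego"},
}
edges = common_like_edges(sample)
```

An edge list of this shape could then be loaded into a graph tool for layout; likes held by only one friend drop out, which is what keeps the picture focused on what friends have in common.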

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

November 10 2011

Strata Week: The social graph that isn't

Here are a few of the data stories that caught my attention this week:

Not social. Not a graph.

It's hardly surprising that the founder of a "bookmarking site for introverts" would have something to say about the "social graph." But what Pinboard's Maciej Ceglowski has penned in a blog post titled "The Social Graph Is Neither" is arguably the must-read article of the week.

The social graph is neither a graph, nor is it social, Ceglowski posits. He argues that today's social networks have failed to capture the complexities and intricacies of our social relationships (there's no graph) and have become something that's at best contrived and at worst icky (actually, that's not the "worst," but it's the adjective Ceglowski uses).

From his post:

Imagine the U.S. Census as conducted by direct marketers — that's the social graph. Social networks exist to sell you crap. The icky feeling you get when your friend starts to talk to you about Amway or when you spot someone passing out business cards at a birthday party, is the entire driving force behind a site like Facebook. Because their collection methods are kind of primitive, these sites have to coax you into doing as much of your social interaction as possible while logged in, so they can see it.

But if today's social networks are troublesome, they're also doomed, Ceglowski contends, much as the CompuServes and the Prodigys of an earlier era were undone. It's not so much that they were out-innovated as that they were out-democratized: as the global network spread, mass marketing gave way to grassroots efforts.

"My hope," Ceglowski writes, "is that whatever replaces Facebook and Google+ will look equally inevitable and that our kids will think we were complete rubes for ever having thrown a sheep or clicked a +1 button. It's just a matter of waiting things out and leaving ourselves enough freedom to find some interesting, organic, and human ways to bring our social lives online."

Cloudera raises $40 million

The Hadoop-based startup Cloudera announced this week that it has raised another $40 million in funding, led by Ignition Partners, Greylock, Accel, Meritech Capital Partners, and In-Q-Tel. This brings the total investment in the company to some $76 million, a solid endorsement of not just Cloudera but of the Hadoop big data solution.

Hadoop is a trend that we've covered almost weekly here as part of the Strata Week news roundup. And GigaOm's Derrick Harris has run some estimates on the numbers of the Hadoop ecosystem at large, finding that: "Hadoop-based startups have raised $104.5 million since May. The same set of companies has raised $159.7 million since 2009 when Cloudera closed its first round."

While it's easy to label Hadoop as one of the buzzwords of 2011, the amount of investor interest, as well as the amount of adoption, is an indication that many people see this as a cornerstone of a big data strategy as well as a good source of revenue for the coming years.

Kaggle raises $11 million to crowdsource big data

It's a much smaller round of investment than Cloudera's, to be sure, but Kaggle's $11 million Series A round announced this week is still noteworthy. Kaggle provides a platform for running big data competitions. "We're making data science a sport," as its tagline reads.

But it's more than that. There remains a gulf between data scientists and those who have data problems to solve. Kaggle helps bridge this gap by letting companies outsource their big data problems to third-party data scientists and software developers, with prizes going to the best solutions. Kaggle claims it has a community of more than 17,000 PhD-level data scientists, ready to take on and resolve companies' data problems.

Kaggle has thus far enabled several important breakthroughs, including a competition that helped identify new ways to map dark matter in the universe. That's a project that had been worked on for several decades by traditional methods, but those in the Kaggle community tackled it in a couple of weeks.

The Supreme Court looks at GPS data tracking

The U.S. Supreme Court heard oral arguments this week in United States v. Jones, a case that could have major implications for mobile data, GPS and privacy. At issue is whether police need a warrant in order to attach a tracking device to a car to monitor a suspect's movements.

Surveillance via technology is clearly much easier and more efficient than traditional surveillance methods. Why follow a suspect around all day, for example, when you can attach a device to his or her car and just watch the data transmission? But the data you get from a GPS device is also far more detailed than anything human surveillance yields, which raises all sorts of questions about what constitutes a reasonable search. And while you needn't get a warrant to shadow someone's car, attaching that GPS tracking device might just violate the Fourth Amendment and its protection against unreasonable search and seizure.

But what's at stake is much larger than just sticking a tracking device to the underbelly of a criminal suspect's vehicle. After all, every cell phone owner gives off an incredible amount of mobile location data, something that the government could conceivably tap into and monitor.

During oral arguments, Supreme Court justices seemed skeptical about the government's power to use technology in this way.

Got data news?

Feel free to email me.

Photo: Graph Paper by Calsidyrose, on Flickr

November 09 2011

Four short links: 9 November 2011

  1. The Social Graph is Neither -- Maciej Ceglowski nails it. Imagine the U.S. Census as conducted by direct marketers - that's the social graph. Social networks exist to sell you crap. The icky feeling you get when your friend starts to talk to you about Amway, or when you spot someone passing out business cards at a birthday party, is the entire driving force behind a site like Facebook.
  2. Anonymous 101 (Wired) -- Quinn Norton explains where Anonymous came from, what it is, and why it is.
  3. Antibiotic Resistance (The Atlantic) -- Laxminarayan likens antibiotics resistance to global warming: every country needs to solve its own problems and cooperate—but if it doesn't, we all suffer. This is why we can't have nice things. (via Courtney Johnston)
  4. Deep Idle for Android -- developer saw his handset wasn't going into a deep-enough battery-saving idle mode, saw it wasn't implemented in the kernel, implemented it, and reduced battery consumption by 55%. Very cool to see open source working as it's supposed to. (via Leonard Lin)

July 13 2011

Four short links: 13 July 2011

  1. Freebase in Node.js (github) -- handy library for interacting with Freebase from node code. (via Rob McKinnon)
  2. Formalize -- CSS library to provide a standard style for form elements. (via Emma Jane Hogbin)
  3. Suggesting More Friends Using the Implicit Social Graph (PDF) -- Google paper on the algorithm behind Friend Suggest. Related: Katango. (via Big Data)
  4. Dyslexia -- a typeface for dyslexics. (via Richard Soderberg)

June 23 2011

Strata Week: Data Without Borders

Here are some of the data stories that caught my attention this week:

Data without borders

Data is everywhere. That much we know. But the use of and benefit from data are not evenly distributed, and this week New York Times data scientist Jake Porway issued a call to arms to address this. He's asking developers and data scientists to help build a Data Without Borders-type effort to take data (particularly NGOs' and non-profits' data) and match it with people who know what to do with it.

As Porway observes:

There's a lot of effort in our discipline put toward what I feel are sort of "bourgeois" applications of data science, such as using complex machine learning algorithms and rich datasets not to enhance communication or improve the government, but instead to let people know that there's a 5% deal on an iPad within a 1 mile radius of where they are. In my opinion, these applications bring vanishingly small incremental improvements to lives that are arguably already pretty awesome.

Porway proposes building a program to help match data scientists with non-profits and the like who need data services. The idea is still under development, but drop Porway a line if you're interested.

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 20% on registration with the code STN11RAD

Big data and the future of journalism

The Knight Foundation announced the winners of its Knight News Challenge this week, a competition to find and support the best new ideas in journalism. The Knight Foundation selected 16 projects to fund from among hundreds of applicants.

In announcing the winners, the Knight Foundation pointed out a couple of important trends, including "the rise of the hacker/data journalist." Indeed, several of the projects are data-related, including Swiftriver, a project that aims to make sense of crisis data; ScraperWiki, a tool for users to create their own custom scrapers; and Overview, a project that will create visualization tools to help journalists better understand large data sets.

IBM releases its first Netezza appliance

Last fall, IBM announced its acquisition of the big data analytics company Netezza. The acquisition was aimed at helping IBM build out its analytics offerings.

This week, IBM released its first new Netezza appliance since acquiring the company. The IBM Netezza High Capacity Appliance is designed to analyze up to 10 petabytes in just a few minutes. "With the new appliance, IBM is looking to make analysis of so-called big data sets more affordable," Steve Mills, senior vice president and group executive of software and systems at IBM, told ZDNet.

The new Netezza appliance is part of IBM's larger strategy of handling big data, of which its recent success with Watson on Jeopardy was just one small part.

The superhero social graph

Plenty of attention is paid to the social graph: the ways in which we are connected online through our various social networks. And while there's still lots of work to be done making sense of that data and of those relationships, a new dataset released this week by the data marketplace Infochimps points to other social (fictional) worlds that can be analyzed.

The world, in this case, is that of the Marvel Comics universe. The Marvel dataset was constructed by Cesc Rosselló, Ricardo Alberich, and Joe Miro from the University of the Balearic Islands. Much like a real social graph, the data shows the relationships between characters, and according to the researchers "is closer to a real social graph than one might expect."





January 26 2011

Four short links: 26 January 2011

  1. Find Communities -- algorithm for uncovering communities in networks of millions of nodes, for producing identifiable subgroups as in LinkedIn InMaps. (via Matt Biddulph's Delicious links)
  2. Seven Ways to Think Like The Web (Jon Udell) -- seven principles that will head off a lot of mistakes. They should be seared into the minds of anyone working in the web. 2. Pass by reference rather than by value. [pass URLs, not copies of data] [...] Why? Nobody else cares about your data as much as you do. If other people and other systems source your data from a canonical URL that you advertise and control, then they will always get data that’s as timely and accurate as you care to make it.
  3. Wire It -- an open-source javascript library to create web wirable interfaces for dataflow applications, visual programming languages, graphical modeling, or graph editors. (via Pete Warden)
  4. Interview with Marco Arment (Rands in Repose) -- Most people assume that online readers primarily view a small number of big-name sites. Nearly everyone who guesses at Instapaper’s top-saved-domain list and its proportions is wrong. The most-saved site is usually The New York Times, The Guardian, or another major traditional newspaper. But it’s only about 2% of all saved articles. The top 10 saved domains are only about 11% of saved articles. (via Courtney Johnston's Instapaper Feed)
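
For readers curious what "uncovering communities" means in practice, one widely used score for a candidate grouping is modularity, which rewards partitions with more within-group edges than a random graph with the same node degrees would have. The sketch below computes it for an undirected, unweighted graph with no self-loops or duplicate edges. The function name, the toy graph, and the choice of modularity itself are assumptions for illustration; the roundup does not say which score the linked algorithm uses.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum over communities of (intra / m) - (degree / 2m)^2
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}
```

On this toy graph the two-triangle split scores 5/14 (about 0.357), while lumping every node into one group scores 0, so the split is the better grouping by this measure.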

December 16 2010

Four short links: 16 December 2010

  1. On Compressing Social Networks (PDF) -- paper looking at the theory and practice of compressing social network graphs. Our main innovation here is to come up with a quick and useful method for generating an ordering on the social network nodes so that nodes with lots of common neighbors are near each other in the ordering, a property which is useful for compression (via My Biased Coin, via Matt Biddulph on Delicious)
  2. Requiring Email and Passwords for New Accounts (Instapaper blog) -- a list of reasons why the simple signup method of "pick a username, passwords are optional" turned out to be trouble in the long run. (via Courtney Johnston's Instapaper feed)
  3. Extreme Design -- building the amazing spacelog.org in an equally-amazing fashion. I want a fort.
  4. rgeo -- a new geo library for Rails. (via Daniel Azuma via Glen Barnes on Twitter)
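
To see why a good node ordering helps compression, consider gap encoding, a common trick for storing sorted adjacency lists: keep the first neighbor id and then only the differences between successive ids. When a node's neighbors sit close together in the ordering, the gaps are small numbers that take few bits. The sketch below is a generic illustration with made-up ids and hypothetical function names, not the method from the linked paper.

```python
def gap_encode(neighbors):
    """Return the smallest neighbor id followed by successive differences."""
    ordered = sorted(neighbors)
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def gap_decode(gaps):
    """Invert gap_encode by taking a running sum of the gaps."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


# Hypothetical neighbor ids under two different node orderings.
clustered = [1000, 1001, 1003, 1007]   # neighbors numbered close together
scattered = [12, 4800, 93000, 700000]  # same degree, ids far apart

clustered_gaps = gap_encode(clustered)
scattered_gaps = gap_encode(scattered)
```

After the first id, the clustered list encodes as gaps of 1, 2, and 4, whereas the scattered list yields gaps in the thousands or more, which is the intuition behind ordering nodes so that those with many common neighbors end up adjacent.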

July 27 2010

Four short links: 27 July 2010

  1. Digital Continuity Conference Proceedings -- proceedings from a New Zealand conference on digital archiving, preservation, and access for archives, museums, libraries, etc.
  2. What Are The Scaling Issues to Keep in Mind While Developing a Social Network Feed? (Quora) -- insight into why you see the failwhale. (via kellan on Twitter)
  3. Fan Feeding Frenzy -- Amanda Palmer sells $15k in merch and music in 3m via Bandcamp. Is the record available on iTunes yet? Absolutely not. We have nothing against iTunes, it’ll end up there eventually I’m sure, but it was important for us to do this in as close to a DIY manner as possible. If we were just using iTunes, we couldn’t be doing tie-ins with physical product, monitoring our stats (live), and helping people in real-time when they have a question regarding the service. Being able to do all of those things and having such a transparent format in which to do it has been a dream come true. We all buy stuff on the iTunes store - or AmazonMP3 or whatever - but it’s not THE way artists should be connecting to fans, and it’s certainly not the way someone is going to capture the most revenue on a new release. (via BoingBoing)
  4. Sad State of Open Source in Android Tablets -- With the exception of Barnes & Noble’s Nook e-reader, a device that isn’t even really a tablet, I found one tablet manufacturer who was complying with the minimum of their legal open source requirements under GNU GPL. Let alone supporting community development.

June 14 2010

Four short links: 14 June 2010

  1. Learning from Libraries: the Literacy Challenge of Open Data (David Eaves) -- a powerful continuation of the theme from my Rethinking Open Data post. David observes that dumping data over the fence isn't enough, we must help citizens engage. We have a model for that help, in the form of libraries: We didn’t build libraries for an already literate citizenry. We built libraries to help citizens become literate. Today we build open data portals not because we have a data or public policy literate citizenry, we build them so that citizens may become literate in data, visualization, coding and public policy.
  2. OpenPCR on Kickstarter -- In 1983, Kary Mullis first developed PCR, for which he later received a Nobel Prize. But the tool is still expensive, even though the technology is almost 30 years old. If computing grew at the same pace, we would all still be paying $2,000+ for a 1 MHz Apple II computer. Innovation in biotech needs a kick start!
  3. Wingeing It -- profile of O'Reilly's wonderful Sara Winge by the ever fabulous Quinn Norton.
  4. PEGASUS -- petascale graph mining toolkit from CMU. See their most recent publication. (via univerself on Delicious)

May 12 2010

Four short links: 12 May 2010

  1. The Ten Commandments of Rock and Roll (BoingBoing) -- ten rules that should be posted in every workplace as a guide to how to fail poisonously.
  2. Snapscouts -- rather creepy sousveillance site. It's up to you to keep America safe! If you see something suspicious, Snap it! If you see someone who doesn't belong, Snap it! Not sure if someone or something is suspicious? Snap it anyway! I like the idea of promoting a shared interest in keeping us all safe, but I'm not sure SnapScouts is there yet.
  3. Etherpad Foundation -- Etherpad was open-sourced after Google acquired the company that offered it, and has now acquired a life after death. Compare with the updated Google document editor, which has a word-processing layout engine built in JavaScript and uses the algorithms behind Etherpad to offer simultaneous editing. (via Hacker News)
  4. Diaspora Kickstarter Project -- team looking for seed funding to write an aGPLed "privacy aware, personally controlled, do-it-all distributed open source social network" (no news of dessert topping or floor wax applicability). Received 2.5x their requested funding in a few days.

February 26 2010

Four short links: 26 February 2010

  1. Who Is Going To Build The New Public Services? -- a thoughtful exploration of the possibilities and challenges of third parties building public software systems. There's a lot of talk of "just put up the data and we'll build the apps" but I think this is a more substantial consideration of which apps can be built by whom.
  2. Quake 3 for Android -- kiss the weekend goodbye, NexusOne owners! My theory is that no platform has "made it" until a first person shooter has been ported to it. (via BoingBoing)
  3. Graph Mining -- slides and reading list from seminar series at UCSB on different aspects of mining graphs. Relevant because, obviously, social networks are one such graph to be mined.
  4. Treadmill Desk -- I want one. Staying fit while working at a sedentary job is important but not easy. I tried to type while using a stepper, but that's just a recipe for incomprehensible typing fail. (via BoingBoing)
