
December 07 2010

Dipity taps data for infographics and revenue

Dipity, a service that lets users create and embed interactive timelines, has grown to 8.5 million users since opening to the public in 2008.

The service's timelines range from static, one-off graphics to living infographics that update via data feeds. The Huffington Post and Washington Post are using Dipity, and earlier this year the Seattle Times incorporated Dipity into their Pulitzer Prize-winning coverage of a breaking news story.

In the following interview, Dipity co-founder and CEO Derek Dukes (@ddukes) discusses the company's genesis, its business model, and the opportunities attached to rich datasets.

How did Dipity come about?

Derek Dukes: We started a couple of years ago. We've focused on giving people the easiest possible tools to create timelines from a variety of different content sources. Over the last year or so, we've really started to get most of our traction. That was based on providing two services:

  1. Letting someone like the Washington Post or government agencies take datasets, showcase them, and surface data in interesting ways.
  2. Giving consumers the tools to make their own timelines with data from across the web.

When we first started, we saw the rate at which people were publishing information was increasing, based on YouTube and Twitter and other services. It seemed there was no great way to take that velocity of information and make sense out of it. We thought about what it would mean if you organize information on timelines, using tools to summarize a particular event or a particular happening.

An advantage here is that datasets are largely open. You can pull selected data from a variety of different services and create something that's really interesting. You can then continue to keep that updated on an ongoing basis, going back to those services over time and pulling more relevant information as the story unfolds.
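The update loop Dukes describes amounts to repeatedly merging freshly pulled feed items into an existing timeline. A minimal Python sketch of that merge step (the field names and event format here are hypothetical illustrations, not Dipity's actual API):

```python
from datetime import date

def merge_new_events(timeline, feed_items):
    """Merge freshly pulled feed items into a timeline, skipping
    events already present and keeping the result in date order."""
    seen = {event["id"] for event in timeline}
    for item in feed_items:
        if item["id"] not in seen:
            timeline.append(item)
            seen.add(item["id"])
    timeline.sort(key=lambda event: event["date"])
    return timeline

# A timeline built earlier, plus a fresh pull from a (hypothetical) feed.
timeline = [{"id": "a1", "date": date(2010, 11, 2), "title": "Story breaks"}]
fresh = [
    {"id": "a1", "date": date(2010, 11, 2), "title": "Story breaks"},   # duplicate
    {"id": "b2", "date": date(2010, 11, 5), "title": "Follow-up piece"},
]
merge_new_events(timeline, fresh)
```

Run on a schedule against each source, a loop like this is what keeps a "living" timeline current as a story unfolds.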

What's driving interest in data visualizations?

DD: There are three things that are happening in the consumer market that are growing the market for data visualization. First, there's more data to do interesting things with, which wasn't true three or four years ago.

The second thing is that because we have touch interface tablets and high-resolution monitors, people expect to be able to do something interesting with data. It's not enough just to publish data in real-time anymore. Meaning and understanding have to be extracted from the data in interesting ways.

Third, with the advance of browser technology and increasing adoption of HTML5, things that were technically very difficult are now easier to roll out and scale. It's not like we have to build a one-off custom Flash applet that takes advantage of a limited data set. You now can build a robust platform in HTML5 that can use a variety of different data sources.

The Strata Conference (Feb. 1-3, 2011) will look at how businesses are using data to build products and revenue streams. Save 30% on registration with the code STR11RAD.

Where do you see the market for data visualizations going?

DD: Infographics will become living objects. They won't just be snapshots of data. They will change over time. I also think data visualizations are emergent in the same way that web video was emergent. Prior to 2005, the only people who could put video online were big corporations, because the tools and the cost structure were limited. The ability to create embedded video didn't really exist. Then YouTube made it something that everybody could do. Data visualization is on the precipice of that as well.

How does your business model work?

DD: Our free product is ad supported and our paid product is not. The premium version comes with additional functionality and deeper integration options.

Right now, about a third of our revenue comes from the free users through advertising and about two thirds of our revenue comes through freemium subscriptions. In the premium model, our paid products start at $4.95 per month and they go all the way up to $1,000 per month, plus some incremental integration work.

[Note: Dipity's products and plans are outlined here.]

The fact that people are creating content, and that content is getting all over the web, serves two purposes. First, there's a marketing component. Second, generally speaking, we wind up seeing some percentage of the traffic whenever a timeline gets embedded. We can monetize or convert those users and get them on board.

More importantly, having a large base of consumers creating content really helps you understand the problems people are going to encounter or the features they want. We can see, based upon usage or requests, where the roadmap should be over the next 12 months.

The Internet Memes timeline on Dipity.

How can visualizations help businesses with their own data?

DD: Think about Google. Google sits on a huge volume of data. They choose to visualize and sort that data in a particular way that becomes useful. The consumer simply wants to get something out of the data. It could be getting a link to a relevant website or finding out what happened on the Bay Bridge this morning. The goal is to move from a big dataset into understanding.

For companies that sit on big datasets, the ability to create interfaces that are rich and engaging on top of that data seems important. If you focus on creating an interface that improves the understanding of the dataset, as long as the data is interesting, you should be in a pretty good position.

We're still in a time where consumers' expectations are evolving. Timelines seem to be a good visualization for most people. Geolocation -- data plus maps -- seems to be working. Heatmaps seem to be the next wave of that, in terms of visualizations that make sense. Tag clouds had their day in the sun, but we're moving from static tag clouds to dynamic tag clouds, which makes them more interesting. It's too early to say what the best approach is going to be, but there's definitely an opportunity for companies with rich datasets.

This interview was edited and condensed.


December 04 2010

Strata Gems: Quick starts for charts

We're publishing a new Strata Gem each day all the way through to December 24. Yesterday's Gem: Write your own visualizations.

If you're trying to summarize your data, you'll likely show it in a chart. It's easy to reach for a "standard" option, perhaps even the much-maligned pie chart: few of us leave education with a repertoire of more than a few chart types. Aside from giving your audience visual ennui, the usual suspects can be limited in what they convey.

This probably isn't news to you. You may be a disciple of Tufte, and have read the wealth of advice on creating effective charts, but where do you start? Here are a few ideas, spanning different toolsets and platforms.

Excel: Chart Chooser

Juice Analytics' Chart Chooser is a chart-style recommendation engine. Indicate the motivations behind your chart (one or more of comparison, distribution, composition, trend or relationships) and it'll suggest a chart type to use.
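At its core, that recommendation step is a lookup from stated intent to suitable chart types. A toy version in Python (the mapping below is an illustrative subset I've chosen, not Juice Analytics' actual rules):

```python
# Illustrative intent -> chart-type suggestions, loosely following
# the categories Chart Chooser asks about.
SUGGESTIONS = {
    "comparison":   ["bar chart", "bullet chart"],
    "distribution": ["histogram", "scatter plot"],
    "composition":  ["stacked bar chart", "pie chart"],
    "trend":        ["line chart", "area chart"],
    "relationship": ["scatter plot", "bubble chart"],
}

def suggest_charts(*intents):
    """Return chart types suited to every stated intent."""
    candidates = set(SUGGESTIONS[intents[0]])
    for intent in intents[1:]:
        candidates &= set(SUGGESTIONS[intent])
    return sorted(candidates)

print(suggest_charts("distribution", "relationship"))  # -> ['scatter plot']
```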

Going one step further than just recommendation, the chart chooser offers Excel and PowerPoint template files that you can alter and fill with your own data. Now there's no excuse for not understanding the vagaries of Excel chart controls!

Chart Chooser
Some of the 17 chart types available from Chart Chooser

R: Advanced Charts

If you're using the R statistical computing package, many chart types become available to you. D. Kelly O'Day has compiled many resources while documenting his personal journey into creating effective graphs and charts.

Initially reluctant to leave the familiarity of Excel and VBA, O'Day took the leap to learn R because of the availability of advanced chart types. His web site provides many visual examples of chart types, along with the R code to generate them, and enlightening and detailed blog posts about how to create the charts.

The Web: Tableau Public

Tableau is a leading visualization software package. The release of Tableau Public gives you a way to get started with Tableau and create publicly shareable visualizations that are interactive and render in standard web browsers.

Tableau's public edition is available for free and public use. Once data is published, anyone can see your visualizations or download the data and create their own visualizations from it. Take a look at Tableau's gallery of examples.

Tableau Public Screenshot

A screenshot from The Tale of 100 Entrepreneurs

Learn more at Strata

Naomi Robbins, author of Creating More Effective Graphs, will be presenting an in depth tutorial on Communicating Data Clearly.


December 03 2010

Strata Gems: Write your own visualizations

We're publishing a new Strata Gem each day all the way through to December 24. Yesterday's Gem: Use Wikipedia as training data.

For many of us, collecting a data set is the easy bit, but turning data into a picture that tells a story is the hard part. We're frustrated at using the tired vocabulary of Excel-generated charts, but aren't sure where to go next. Like computer hardware, creating something graphical is a bit of a mystery - but needlessly so.

Take a little bit of time to get started with Processing, and you'll find creating interactive and interesting graphics to be fun, and not at all as hard as it seems. Processing has been around for almost ten years: originally created with the aim of promoting software literacy within the visual arts, it serves just as well to promote visual literacy among those who are comfortable with computing.

To get a feel for the capabilities of Processing, take a look at a couple of examples from the Processing Exhibition. Stephan Thiel's Understanding Shakespeare creates high level overviews of the text of Shakespeare plays, giving a feel for the form of the speeches and characters.

Just Landed, created by Strata speaker Jer Thorp, turns "just landed" tweets from airplane travelers into a 3D visualization of air travel. It's a great showcase for the capabilities of Processing, incorporating external data sources, 3D rendering, and exporting to video.

Just Landed - 36 Hours from blprnt on Vimeo.

There's a pretty straightforward way to get started with Processing, by using its JavaScript-based cousin, Processing.js. Whereas Processing is Java-based, Processing.js uses the features of HTML5 to make it possible to use Processing in modern web browsers. Getting up and running takes seconds: download the Processing.js archive and unpack it on a web server (Mac users can use the "Sites" folder on their computer and enable web sharing).

Take a look at the example.html file in the archive, and you'll see it simply sets up a canvas to draw on, and includes the actual Processing.js code from example.pjs. Replace the code in example.pjs with that shown below (taken from the Getting Started With Processing book).

void setup() {
  size(250, 250);
}

void draw() {
  if (mousePressed) {
    fill(0);
  } else {
    fill(255);
  }
  ellipse(mouseX, mouseY, 80, 80);
}

The code needs little explanation - the draw function controls what appears on the screen. It draws a circle underneath the mouse cursor, in either black or white depending on whether the mouse button is pressed or not. Give it a whirl!

In case you don't want to put files on a web server, there's an even easier way to experiment - the Processing.js IDE web page lets you paste code into the page and run it directly.

Processing.js screenshot
A run of the example code

So how do you connect Processing.js up to your data? A simple way is to directly generate the Processing.js "sketch" (.pjs) files from your data. For more interactivity, you can take advantage of the fact that Processing.js lets you mix Processing and JavaScript code, and fetch data dynamically from the web.
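Generating a sketch file is just string templating. Here's a rough Python sketch that turns a list of values into a bar-chart .pjs file (the layout constants and output filename are arbitrary choices for illustration):

```python
def render_sketch(values, width=400, height=200):
    """Emit Processing source that draws one bar per value."""
    bar_w = width // len(values)
    top = max(values)
    bars = []
    for i, v in enumerate(values):
        h = int(v / top * (height - 20))  # leave a 20px margin at the top
        bars.append(f"rect({i * bar_w + 2}, {height - h}, {bar_w - 4}, {h});")
    body = "\n  ".join(bars)
    return (
        f"void setup() {{\n"
        f"  size({width}, {height});\n"
        f"  background(255);\n"
        f"  fill(120);\n"
        f"  {body}\n"
        f"}}"
    )

sketch = render_sketch([3, 7, 5])
# Write it where your example.html expects the sketch, e.g.:
# open("example.pjs", "w").write(sketch)
print(sketch)
```

Regenerate the file whenever your data changes and the embedded sketch updates on the next page load.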

If you're using the Processing language proper, you can either read data from files directly, or use the Network library to connect to remote servers.

November 25 2010

Four short links: 25 November 2010

  1. A Day in the Life of Twitter (Chris McDowall) -- all geo-tagged tweets from 24h of the Twitter firehose, displayed. Interesting things can be seen, such as Jakarta glowing as brightly as San Francisco. (via Chris's sciblogs post)
  2. British Library Release 3M Open Bibliographic Records (OKFN) -- This dataset consists of the entire British National Bibliography, describing new books published in the UK since 1950; this represents about 20% of the total BL catalogue, and we are working to add further releases.
  3. Gadgets for Babies (NY Times) -- cry decoders, algorithmically enhanced rocking chairs, and (my favourite) "voice-activated crib light with womb sounds". I can't wait until babies can make womb sound playlists and share them on Twitter.
  4. GP2X Caanoo MAME/Console Emulator (ThinkGeek) -- perfect Christmas present for, well, me. Emulates classic arcade machines and microcomputers, including my nostalgia fetish object, the Commodore 64. (via BoingBoing's Gift Guide)

November 22 2010

Four short links: 22 November 2010

  1. Snippet -- JQuery syntax highlighter built on Syntax Highlighting in JavaScript. Snippet is MIT-licensed, SJHS is GPLv3.
  2. Fear of Forking -- (Brian Aker) GitHub has begun to feel like the Sourceforge of the distributed revision control world. It feels like it is littered with half started, never completed, or just never merged trees. If you can easily takes changes from the main tree, the incentive to have your tree merged back into the canonical tree is low.
  3. Product Invention Workshops (BERG London) -- Matt Webb explains what they do with customers. Output takes the form, generally, of these microbriefs. A microbrief is how we encapsulate recommendations: it’s a sketch and short description of a new product or effort that will easily test out some hypothesis or concept arrived at in the workshop. It’s sketched enough that people outside the workshop can understand it. And it’s a hook to communicate the more abstract principles which have emerged in the days. Their process isn't their secret weapon, it's their creativity, empathy, and communication skills that make them so valuable.
  4. OneMicron -- Janet Iwasa's beautiful animations of biological science. (via BoingBoing who linked to this NYTimes piece)

November 18 2010

Strata Week: Keeping it clean

This edition of Strata Week is all about making things easy and tidy. If you're eager to learn more tips and tricks for doing so, come to Santa Clara in February: check out the list of Strata conference speakers and register today.

Languages made easy: R and Clojure

Love "fruitful and fun" data mining with Orange? Wish you had an interface like that for R? Wish no more. Anup Parikh and Kyle Covington have created Red-R to extend the Orange interface.

The goal of this project is to provide access to the massive library of packages in R (and even non-R packages) without any programming expertise. The Red-R framework uses concepts of data-flow programming to make data the center of attention while hiding all the programming complexity.

Similar to Orange, Red-R uses a series of widgets to modify and display data. The beauty of Red-R is that it allows programming novices to leverage R's power and to interact with their data in an analytical way. Such tools are no substitute for actual statistical modeling, of course, but they are a great first step in piquing interest and providing a visual conversation-starter.


Red-R is still in its infancy, but as with all such projects, testing and bug reports are welcome. Check out the forums to get involved.

If R is not your thing, perhaps you've jumped on the Clojure bandwagon (I wouldn't blame you: Clojure is one exciting new language). If that's the case, check out Webmine, a library for mining HTML written by Bradford Cross, Matt Revelle, and Aria Haghighi.

Facts are stubborn things

A team at the Indiana University Center for Complex Networks and Systems Research has built the Truthy system to examine and classify memes on Twitter in an attempt to identify instances of astroturfing, smear campaigns, and other "social pollution."

Truthy looks at streaming Twitter data via the public Twitter API, filters it to extract politically-minded tweets, and then pulls out "memes" like #hashtags, @ replies, phrases, and URLs. Memes that constitute a high volume of tweets, as well as memes that have experienced a significant fluctuation in volume, are flagged and entered into a database for further investigation.

The Truthy system then visualizes a timeline, map, and diffusion network for each meme, and applies sentiment analysis in order to better study and understand "social epidemics." It also relies on crowdsourcing to train its algorithms. Users can visit the project's website and are asked to click the "Truthy" button on a meme's detail page when they suspect a meme contains misinformation masquerading as fact.

Check out the gallery for some fascinating network visuals and the stories behind them.


A clean bill of health

Kudos to Dimagi and CIDRZ for a creative solution to a serious problem. In order to provide standard interventions to reduce maternal and infant mortality rates in rural Zambia for the BHOMA (Better Health Outcomes through Mentoring and Assessments) project, they needed a distributed system for capturing and relaying health data.

As in many other places in Africa, reliable internet is not easy to find in rural Zambian communities. But cell phones are nearly ubiquitous, and they are the best communication devices for relaying patient information between clinics and field workers.

Enter Apache's CouchDB, which saved the day with its continuous replication. A lightweight server in each clinic now replicates filtered data to a national CouchDB database via a modem connection, and two-way replication allows data collected on phones to propagate back to each clinic.
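Continuous filtered replication in CouchDB is configured by POSTing a small JSON document to the server's `_replicate` endpoint. A sketch of what one clinic's push might look like (the database names and filter function are invented for illustration; only the `source`/`target`/`continuous`/`filter` fields are CouchDB's actual replication parameters):

```python
import json

def replication_doc(source, target, filter_name=None):
    """Build the JSON body for CouchDB's POST /_replicate endpoint."""
    doc = {"source": source, "target": target, "continuous": True}
    if filter_name:
        # Named "designdoc/filterfunc"; the filter runs in the source db
        # and decides which documents replicate.
        doc["filter"] = filter_name
    return json.dumps(doc, sort_keys=True)

body = replication_doc(
    "clinic_visits",                         # hypothetical local database
    "https://national.example.org/bhoma",    # hypothetical central database
    filter_name="sync/national_fields",
)
# Sent with e.g.:
# curl -X POST http://localhost:5984/_replicate \
#   -H 'Content-Type: application/json' -d "$body"
print(body)
```

Because the replication is continuous, the modem link only has to be up intermittently: CouchDB resumes from its checkpoint whenever the connection returns.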

Read more details of the case study here.

Refinement rather than fashion

You may recall that among a spate of Google acquisitions over the summer was Metaweb, the company responsible for Freebase. Now, a nifty open source tool formerly called Freebase Gridworks has been renamed Google Refine, and version 2.0 was released just last week.

Refine is a powerful tool for cleaning up data. It allows you to easily sort and transform inconsistent cells to correct typos and merge variants; filter, then remove or change certain rows; apply custom text transformations; examine numerical columns via histograms; and perform many more complex operations to make data more consistent and useful.
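The merge-variants step works much like Refine's "fingerprint" key-collision clustering: normalize each cell down to a key, then group cells whose keys collide. A simplified Python version of that idea:

```python
import re
from collections import defaultdict

def fingerprint(value):
    """Normalize a cell: lowercase, strip punctuation, sort unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(cells):
    """Group cell values whose fingerprints collide."""
    groups = defaultdict(list)
    for cell in cells:
        groups[fingerprint(cell)].append(cell)
    # Only keys shared by more than one spelling are merge candidates.
    return {k: v for k, v in groups.items() if len(v) > 1}

result = cluster(["New York", "new york", "York, New", "Boston"])
print(result)
```

Each cluster is a set of spellings a human can confirm and collapse to one canonical value, which is exactly the workflow Refine presents in its clustering dialog.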

Refine really shines when it is used to combine or transform data from multiple sources, so it's no surprise that it has been popular for open government and data journalism tasks.

Also notable is the fact that Refine is a downloadable desktop app, not a web service. This means you don't have to upload your data anywhere in order to use it. Best of all, Google Refine keeps a running changelog that lets you review and revert changes to your data -- so go ahead: play around. A great set of video tutorials on Google's blog can help you do just that.

October 28 2010

Four short links: 28 October 2010

  1. Exploring Computational Thinking (Google) -- educational materials to help teachers get students thinking about recognizing patterns, decomposing problems, and so on.
  2. TimeMap -- Javascript library to display time series datasets on a map.
  3. Feedly -- RSS feeds + twitter + other sites into a single magazine format.
  4. Attention and Information -- what appears to us as “too much information” could just be the freedom from necessity. The biggest change ebooks have made in my life is that now book reading is as stressful and frenetic as RSS reading, because there's as much of an oversupply of books-I'd-like-to-read as there is of web-pages-I'd-like-to-read. My problem isn't over-supply of material, it's a shortage of urgency that would otherwise force me to make the hard decisions about "no, don't add this to the pile, it's not important enough to waste my time with". Instead, I have 1990s books on management that looked like maybe I might learn something .... (via Clay Shirky on Twitter)

October 21 2010

Strata Week: Statistically speaking

Here's a look at the latest data news and developments that caught my eye.

Never race a penguin

The London Stock Exchange (LSE) has reportedly "doubled" their networking speed with a new Linux-based system, clocking trading times at 126 microseconds as compared to previous times of several hundred microseconds.

ComputerworldUK reports that "BATS Europe and Chi-X, two dedicated electronic rivals to the LSE, are reported to have an average latency of 250 and 175 microseconds respectively."

The Millennium Exchange trading platform is scheduled to roll out on the LSE's main exchange on November 1, replacing a Microsoft .Net system.

Lies, damn lies, and vertical axes

William M. Briggs took issue in his blog with a recent post of Paul Krugman's for playing unfair tricks with the slopes of graphs by messing with the scale on the vertical axis.

Briggs asserts that starting the scale at zero is a wily way to flatten out a slope, and he's right that it has that effect. But worse, I think, is the perception distortion that results from displaying two graphs with different scales side-by-side. Whatever scale is used, consistency is key.
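The distortion is easy to quantify: the visual slope of a series is its data range divided by the axis range. A quick Python check for a series running from 100 to 110:

```python
def visual_rise(data_min, data_max, axis_min, axis_max):
    """Fraction of the plot height the series sweeps through."""
    return (data_max - data_min) / (axis_max - axis_min)

# Same data, two choices of vertical axis.
zero_based = visual_rise(100, 110, 0, 110)   # axis starts at zero
truncated = visual_rise(100, 110, 100, 110)  # axis starts at the data minimum

print(round(truncated / zero_based, 1))
```

Truncating the axis makes the identical change sweep the full plot height, an 11-fold steepening of the line, which is why mismatched axes side-by-side are so misleading.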

Ironically, Krugman's post was meant to call out graphical misrepresentation regarding levels of government spending. It all goes to show how much we need increased data and statistical literacy across the board.

Speaking of statistics

If you don't believe me, ask European Central Bank (ECB) president Jean-Claude Trichet. The fifth ECB conference on statistics, originally scheduled for April but delayed by a certain Icelandic ash cloud, was rescheduled for this week. Fittingly, yesterday's date (twenty-ten-twenty-ten, written in the European style) had been declared the first World Statistics Day by the UN General Assembly. Trichet opened the conference by calling for better, more reliable statistics from all member countries.

Evidence-based decision-making in modern economies is unthinkable without statistics ... The financial crisis has revealed information gaps that we have to close while also preparing ourselves for future challenges. This is best achieved through creating a wide range of economic and financial statistics that are mutually consistent, thereby eliminating contradictory signals due to measurement issues. The main aggregates must be both reliable and timely, and, in a globalised world, they should be comparable across countries and economies.

Not only did Trichet highlight the need for widespread use of accepted statistical methodologies, but he also urged the G20 to think of themselves as examples for the globalized world. Read the transcript of his speech here.

Rest in peace, Prof. Mandelbrot

I don't wish to end on a sad note, but let's say goodbye with much fondness and gratitude for Benoît Mandelbrot, who passed away last Thursday at the age of 85.

Mandelbrot spent most of his career at IBM, eventually becoming an IBM Fellow before moving on to teach at Yale. Mary Miller, Yale College dean, remembered Mandelbrot by saying:

He revolutionized geometry and made it possible to think about measurements and visualization of forms through an entirely new kind of geometry.

Mandelbrot is perhaps best remembered for the work he did with fractals (a term he coined). While not the first to discover them, his emphasis and research brought them into the limelight as a useful tool for understanding the world around us, including things like the movement of planets and the English shoreline.

The image below is a representation of the famous Mandelbrot Set, a mathematical set of points in the complex plane that does not simplify at any level of magnification.

Send us news

Email us news, tips and interesting tidbits at

October 13 2010

Four short links: 13 October 2010

  1. 'Scrapers' Dig Deep for Data on Web (WSJ) -- our users' data comprise a valuable resource to mine and sell, but so do their kidneys. The data world faces serious issues with informed consent, control, and exploitation--it's not just a shiny new business model, it can also leave people feeling very violated. Again, if you're not paying for it then you're the product and not the customer. The majority of humanity is not conscious of the difference between "user" and "customer". (via Mike Brown on Twitter)
  2. Journalism in the Age of Data (Video) -- Stanford video, with annotations and links, on the challenge of using dataviz as a storytelling medium. (via Ben Goldacre on Twitter)
  3. webshell (Github) -- open source (Apache-licensed) console utility, requiring node.js, for debugging and understanding HTTP connections. (via Chris Shiflett on Twitter, who prefers it to yesterday's htty)
  4. Amazon to Launch Kindle Singles (press release) -- shorter-form works (think: novellas) as a format to expand publishing market rather than shrink it. Damn near every business book ever written should have been this size instead of 300 pages of tedium.

October 07 2010

Strata Week: Videos and visualization

Many places across the U.S. are experiencing brisk fall weather this week. If you're living in one of them like I am, then perhaps you too are feeling the urge to swap your walking shoes for a quilt and a remote control to watch some movies. No problem with that if you're learning, though, right? Here, then, for your viewing pleasure and enlightenment, are some great data-related videos.

Data visualization for journalism

Geoffrey McGhee, an online journalist who has worked for outlets such as Le Monde Interactif, has produced a wonderful documentary called Journalism in the Age of Data. McGhee was one of 12 U.S. Knight Journalism Fellows studying at Stanford during the 2009-2010 academic year, and created this video report during that time.

The project's description reads:

Journalists are coping with the rising information flood by borrowing data visualization techniques from computer scientists, researchers and artists. Some newsrooms are already beginning to retool their staffs and systems to prepare for a future in which data becomes a medium. But how do we communicate with data, how can traditional narratives be fused with sophisticated, interactive information displays?

To answer this question, McGhee interviewed many of the researchers and designers currently breaking ground in the field, including Martin Wattenberg, Fernanda Viégas, Ben Fry, Aaron Koblin, Jeffrey Heer, Matthew Ericson, Amanda Cox, Nigel Holmes, Nicholas Felton, Eric Rodenbeck, and many others.

Career planning for college students

Earlier this week, LinkedIn, in partnership with PricewaterhouseCoopers LLP (PwC), launched a new tool for current college students called Career Explorer. For now, the tool is in a limited roll-out to students in 60 universities across the U.S. that tend to feed talent to PwC, but a larger roll-out is on the way.

Career Explorer uses data from LinkedIn's 80 million users to help students map out potential career paths based on paths commonly taken by others in their fields of interest. Students can create and save multiple maps, search available jobs, and look for connections within their own networks who may be able to help. Career Explorer also provides statistics about various fields and jobs.

This short video shows some of Career Explorer's data-based features.

3D: Movies and data

At the Web 2.0 Expo in New York last week, Julia Grace gave a keynote talk about the dimensionality of data. The movies, she said, are the ultimate dream-factory of data visualization and user interfaces because they allow us to design without the imperative to implement. Certain scenes from "futuristic" '80s movies look oddly familiar today.

Julia showed us the seven-foot sphere she purchased for her research lab that allows her to show three-dimensional data on a three-dimensional display. "Jumps and reductions in dimensionality equal distortion and inaccuracy," she said. If you have 3D data, you need a 3D display.

What will the future bring? Maybe it will look like the displays in "Avatar," or maybe like something else. But it is coming, fast.

Watch the keynote for yourself.

Touchable Holograms

Speaking of bringing the future forward, researchers at Tokyo University are doing just that with "touchable holograms," simple holographic images that can "feel" like physical objects. Two Nintendo Wiimotes track the user's hand, and ultrasonic waves create a sensation of pressure on the hand of the user when it interacts with the image.

As its inventors told NTD Television, there are several practical uses for this technology. "For example, it's been shown that in hospitals, there can be contamination between people due to objects that are touched communally. But if you can change the switches and such into a virtual switch, then you no longer have worry about touch contamination. This is one application that's quite easy to see," said Hiroyuki Shinoda, Professor at Tokyo University.

Another possibility is rapid-prototyping or implementation of UI design, since interfaces may be changed without the need to manufacture any physical parts.

So far, this technology has been used to create only simple objects. But it's the first step toward something Picard may one day be proud of. I'll see you in the future, Moriarty.

September 30 2010

Strata Week: Behind LinkedIn Signal

Professional social networking site LinkedIn yesterday announced a new service, Signal, that applies the filters of the LinkedIn network over status updates, such as those from Twitter. Signal lets you do things such as watch tweets from particular industries, companies or locales, or filter by your professional network. All in real time.

Screenshot of LinkedIn Signal

Overlaying the Twitter nation with LinkedIn's map is a great idea, so what's the technology behind Signal? Like fellow social networks Facebook and Twitter, LinkedIn has a smart big data and analytics team, who often leverage or create open source solutions.

LinkedIn engineer John Wang (@javasoze) gave some clues as to Signal's infrastructure of "Zoie, Bobo, Sensei and Lucene", and I thought it would be fascinating to examine the parts in more detail.

Signal uses a variety of open source technologies, some developed in-house at LinkedIn by their Search, Network and Analytics team.

  • Zoie (source code) is a real-time search and indexing system built on top of the Apache Lucene search platform. As documents are added to the index, they become immediately searchable.
  • Bobo is another extension to Apache Lucene. While Lucene is great for searching free text data, Bobo takes it a step further and provides faceted searching and browsing over data sets (source code).
  • Sensei (source code) is a distributed, scalable, database offering fast searching and indexing. It is particularly tuned to answer the kind of queries LinkedIn excels at: free text search, restricted over various axes in their social network. Sensei uses Bobo and Zoie, adding clustered, elastic database features.
  • Voldemort is an open source fault-tolerant distributed key-value store, similar to Amazon's Dynamo.
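The faceted search Bobo layers on top of Lucene can be pictured as free-text matching followed by counting hits along each axis of metadata. A toy Python version (the documents and field names are invented for illustration, not LinkedIn's schema):

```python
from collections import Counter

# A handful of hypothetical status updates with network metadata.
updates = [
    {"text": "hiring data engineers", "company": "LinkedIn", "region": "US"},
    {"text": "big data meetup tonight", "company": "Acme", "region": "US"},
    {"text": "data pipelines talk", "company": "LinkedIn", "region": "EU"},
]

def faceted_search(docs, query, facet):
    """Free-text match, then count the hits along one metadata facet."""
    hits = [d for d in docs if query in d["text"]]
    return Counter(d[facet] for d in hits)

print(faceted_search(updates, "data", "company"))
```

Signal's filters (by company, industry, locale, or your network) correspond to drilling into one of these facet counts, with Zoie keeping the index fresh and Sensei distributing the work.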

LinkedIn also use the Scala and JRuby JVM programming languages, alongside Java.

If you're interested in hearing more about LinkedIn Signal, check out the coverage on TechCrunch, Mashable, and The Daily Beast.

Bringing visualization back to the future

Speaking at this week's Web 2.0 Expo in New York, Julia Grace of IBM encouraged attendees to raise their game with data visualization. As long ago as the 1980s movie directors envisioned exciting and dynamic data visualizations, but today most people are still sharing flat two-dimensional charts, which restrict the opportunities for understanding and telling stories with data. Julia decided to make some location-based data very real by projecting it onto a massive globe.

Julia's talk is embedded below, and you can also read an extended interview with her published earlier this month on O'Reilly Radar.

Hadoop goes viral

Software vendor Karmasphere creates developer tools for data intelligence that work with Hadoop-based SMAQ big data systems. They recently commissioned a study into Hadoop usage. One of the most interesting results of the survey suggests that Hadoop systems tend to start as skunkworks projects inside organizations, and move rapidly into production.

Once used inside an organization, Hadoop appears to spread:

Additionally, organizations are finding that the longer Hadoop is used, the more useful it is found to be; 65% of organizations using Hadoop for a year or more indicated more than three reasons for using Hadoop, as compared to 36% for new users.

There are challenges too. Hadoop offers the benefits of affordable big data processing, but its immature ecosystem is only just starting to emerge. Respondents to the Karmasphere survey indicated that pain points included a steep learning curve, difficulty hiring qualified people, and the availability of tools and educational materials.

This is good news for vendors such as Karmasphere, Datameer and IBM, all of whom are concentrating on making Hadoop work in ways that are familiar to enterprises, through the medium of IDEs and spreadsheet interfaces.
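Part of that learning curve is the MapReduce programming model itself. The sketch below is not Hadoop code; it is a single-process Python illustration of the model Hadoop implements, using the canonical word-count example. A real Hadoop job defines essentially the same map and reduce functions, but the framework runs them distributed across a cluster over data in HDFS.

```python
from itertools import groupby
from operator import itemgetter

# Single-process sketch of the MapReduce model: map each input record to
# key-value pairs, group pairs by key (the "shuffle"), then reduce each
# key's values to a result.

def map_phase(records, mapper):
    return [pair for r in records for pair in mapper(r)]

def reduce_phase(pairs, reducer):
    pairs.sort(key=itemgetter(0))  # shuffle: bring equal keys together
    return {k: reducer(k, [v for _, v in group])
            for k, group in groupby(pairs, key=itemgetter(0))}

# The canonical word-count job: emit (word, 1) per word, sum per word.
mapper = lambda line: [(w, 1) for w in line.split()]
reducer = lambda key, values: sum(values)

lines = ["big data big deal", "data tools"]
print(reduce_phase(map_phase(lines, mapper), reducer))
# {'big': 2, 'data': 2, 'deal': 1, 'tools': 1}
```

Tools from vendors like Karmasphere aim to make writing, debugging and deploying exactly these kinds of jobs feel familiar to enterprise developers.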

SciDB source released

The SciDB database is an answer to the data and analytic needs of the scientific world, serving, among others, the fields of biology, physics, and astronomy. In the words of their website, it is a database "for the toughest problems on the planet." SciDB Inc., the sponsors of the open source project, say that although science has become steadily more data intensive, scientists have had to use databases intended for commercial, rather than scientific, applications.

One of the most intriguing aspects of SciDB is that it emanates from the work of serial database innovator Michael Stonebraker. Scientific data is inherently multi-dimensional, Stonebraker told The Register earlier this month, and thus ill-suited for use with traditional relational databases.

The SciDB project has now made their source code available. The current release, R0.5, is an early stage product, for the "curious and intrepid". It features a new array query language, known as AQL, an SQL-like language extended for the array data model of SciDB. The release will run on Linux systems, and is expected to be followed up at the end of the year by a more robust and stable version.
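Stonebraker's point about the array data model can be made concrete without SciDB or AQL. In the sketch below, plain Python nested lists stand in for a dense two-dimensional array (the grid and function names are illustrative, not SciDB's). Selecting a rectangular window is a single slice over coordinates in an array model, whereas a relational database must store every cell as an (row, col, value) tuple and answer the same question with range predicates and joins.

```python
# A small 4x4 grid of sensor readings, indexed by (row, col) coordinates.
grid = [[1,  2,  3,  4],
        [5,  6,  7,  8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]

def window(grid, r0, r1, c0, c1):
    """Select the sub-array grid[r0:r1, c0:c1].
    One slice in an array model; a range query over an
    (row, col, value) table in a relational one."""
    return [row[c0:c1] for row in grid[r0:r1]]

def mean(sub):
    # Aggregate over every cell in the selected window.
    cells = [v for row in sub for v in row]
    return sum(cells) / len(cells)

print(window(grid, 0, 2, 0, 2))        # [[1, 2], [5, 6]]
print(mean(window(grid, 0, 2, 0, 2)))  # 3.5
```

An array database evaluates such windowed selections and aggregates natively over its storage layout, which is what makes it a better fit for dense scientific data than a row store.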

SciDB is available under the GPL3 free software license, and may be downloaded on application to the SciDB team. According to the authors, more customary use of open source repositories is likely to follow soon.

Send us news

Email us news, tips and interesting tidbits at

September 29 2010

Four short links: 29 September 2010

  1. Digital Mirror Demo (video) -- demo of the Digital Mirror tool that analyses relationships. Some very cute visualizations of social proximity and presentation of the things you can learn from email, calendar, etc. (via kgreene on Twitter)
  2. Free Machine Learning Books -- list of free online books from MetaOptimize readers. (via newsycombinator on Twitter)
  3. Chewie Stats -- sweet chart of blog traffic after something went memetic. Interesting for the different qualities of traffic from each site: As one might expect, Reddit users go straight for the punchline and bail immediately. One might assume the same behavior from Facebook users, but no, among the visitors that hang around, they rank third! Likewise I would have expected MetaFilter readers to hang around and Boing Boing users to quickly move along; but in fact, the opposite is the case. (via chrissmessina on Twitter)
  4. The Document Foundation -- new home of OpenOffice, which has a name change to LibreOffice. I hope this is the start of a Mozilla-like rebirth, as does Matt Asay. (via migueldeicaza on Twitter)
