
October 30 2013

Four short links: 30 October 2013

  1. Offline.js — Javascript library so web app developers can gracefully deal with users going offline.
  2. Android Guides — lots of info on coding for Android.
  3. Statistics Done Wrong — learn from these failure modes. Not medians or means. Modes.
  4. Streaming, Sketching, and Sufficient Statistics (YouTube) — how to process huge data sets as they stream past your CPU (e.g., those produced by sensors). (via Ben Lorica)

January 14 2010

Collecting, Aggregating, and Analyzing Data Exhaust

Next week, O'Reilly's Research Director, Roger Magoulas, will lead a panel discussion on Big Data. The focus will be on the piles of data that companies have been collecting and are just beginning to analyze:

The internet and social media create a mountain of random, unstructured, and at times ephemeral data by-products, which may appear to be trash. Yet, one person’s trash is another’s treasure. From Facebook to Netflix, people are spending more time sharing their thoughts, opinions, plans and perspectives as they socialize and conduct business online. With each of these Internet exchanges, traces of information, or Data Exhaust, are left behind. When correlated or combined, these snippets can provide insight into political views, professional achievements, purchasing behaviors, and demographic information, pinpointing trend setters and leading indicators. Brilliant innovators now re-purpose this data stream, aggregating and analyzing the data to provide new products or services.

Next Tuesday's panel discussion and networking event will be held at the Stanford Business School. Further details are available on the VLAB web site.

(†) Recent Radar posts on Big Data: (1) Counting Unique Users in Real-time with Streaming Databases, (2) Pipelining and Real-time Analytics with MapReduce Online

November 11 2009

Counting Unique Users in Real-time with Streaming Databases

As the web increasingly becomes real-time, marketers and publishers need analytic tools that can produce real-time reports. As an example, the basic task of calculating the number of unique users is typically done in batch mode (e.g. daily) and in many cases using a random sample from the relevant log files. If unique user counts can be accurately computed in real-time, publishers and marketers can mount A/B tests or referral analysis to dynamically adjust their campaigns.

In a previous post I described SQL databases designed to handle data streams. In their latest release, Truviso announced technology that allows companies to track unique users in real-time. Truviso uses the same basic idea I described in my earlier post:

Recognizing that "data is moving until it gets stored", the idea behind many real-time analytic engines is to start applying the same analytic techniques to moving (streams) and static (stored) data.

Truviso uses (compressed) bitmaps and set theory to compute the number of unique customers in real-time. In the process they are able to handle the standard SQL queries associated with these types of problems: counting the number of distinct users for any given set of demographic filters. Bitmaps are built as data streams into the system, and they use the same underlying technology that allows Truviso to handle massive data sets from high-traffic web sites.
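To make the bitmap idea concrete, here is a minimal Python sketch. It is not Truviso's implementation: the class name, the attribute names, and the use of plain (uncompressed) Python integers as bitmaps are all assumptions made for illustration. Each user gets one bit position, each demographic filter gets its own bitmap, and a distinct count under several filters is the number of bits set in the intersection (bitwise AND) of the matching bitmaps.

```python
from collections import defaultdict


class UniqueUserBitmaps:
    """Hypothetical helper: one bitmap (a Python int) per (attribute, value) filter."""

    def __init__(self):
        self.user_index = {}              # user id -> bit position
        self.bitmaps = defaultdict(int)   # (attribute, value) -> bitmap of users
        self.all_users = 0                # bitmap of every user seen so far

    def observe(self, user_id, **attributes):
        """Update the bitmaps as one event streams in."""
        position = self.user_index.setdefault(user_id, len(self.user_index))
        bit = 1 << position
        self.all_users |= bit
        for key, value in attributes.items():
            self.bitmaps[(key, value)] |= bit

    def count_distinct(self, **filters):
        """Distinct users matching every filter: AND the bitmaps, then count set bits."""
        result = self.all_users
        for key, value in filters.items():
            result &= self.bitmaps.get((key, value), 0)
        return bin(result).count("1")


# Made-up events: (user id, country, age band). Repeat visits set the same bit again.
counter = UniqueUserBitmaps()
for user, country, age in [
    ("alice", "US", "18-24"),
    ("bob", "US", "25-34"),
    ("alice", "US", "18-24"),
    ("carol", "DE", "18-24"),
]:
    counter.observe(user, country=country, age=age)

print(counter.count_distinct())                           # all distinct users
print(counter.count_distinct(country="US"))               # distinct users in the US
print(counter.count_distinct(country="US", age="18-24"))  # both filters applied
```

Because the bitmaps are updated on every incoming event, a count is available at any moment without rescanning the log. A production system would compress the bitmaps; this sketch does not.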


Once companies can do simple counts and averages in real-time, the next step is to use real-time information for predictive analytics. Truviso has customers using their system for "on-the-fly predictive modeling".

The other major enhancement in this release is a step towards parallel processing. Truviso's new execution engine processes runs or blocks of data in parallel on multi-core systems or in multi-node environments. Using the parallel execution engine is straightforward on a single multi-core server, but on a multi-node cluster it may require considerable attention to configuration.
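The general pattern of processing blocks in parallel can be sketched as follows. This is only an illustration of the split, process, and merge structure, not Truviso's engine: the function names, the block size, and the choice of a thread pool are assumptions. Each worker computes a partial result for one block (here, the set of users it saw), and the partial results are merged at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(events, block_size):
    """Cut the event list into consecutive blocks of at most block_size events."""
    return [events[i:i + block_size] for i in range(0, len(events), block_size)]


def distinct_users_in_block(block):
    """Partial result for one block: the set of user ids seen in it."""
    return {user_id for user_id, _page in block}


def count_distinct_parallel(events, block_size=2, workers=4):
    """Process blocks concurrently, then merge the per-block sets with a union."""
    blocks = split_into_blocks(events, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(distinct_users_in_block, blocks))
    merged = set()
    for partial in partials:
        merged |= partial
    return len(merged)


# Made-up page-view events: (user id, page).
events = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/pricing"),
    ("carol", "/home"),
    ("bob", "/docs"),
]
total = count_distinct_parallel(events)
print(total)
```

The merge step works because a union of per-block sets gives the same answer no matter how the data was split. Note that threads in CPython mainly illustrate the structure here; real multi-core or multi-node speedups would need separate processes or machines.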

[For my previous posts on real-time analytic tools see here and here.]

