
April 30 2012

Four short links: 30 April 2012

  1. Chanko (Github) -- trivial A/B testing from within Rails.
  2. OpenMeetings -- Apache project for audio/video conferencing, screen sharing, whiteboard, calendar, and other groupware features.
  3. Low Innovation Internet (Wired) -- I disagree, I think this is a Louis CK Nobody's Happy moment. We renormalize after change and become blind to the amazing things we're surrounded by. Hundreds of thousands (millions?) of people work from home, collaborate to develop software that has saved the world billions of dollars in licensing fees, provide services, write and share books, make voice and video calls, create movies, fund creative projects, buy and sell used goods, and you're unhappy because there aren't "huge changes"? Have you spoken to someone in the publishing, music, TV, film, newspaper, retail, telephone, or indeed any industry that exists outside your cave, you obtuse contrarian pillock? There's no room on my Internet for weenie whiners.
  4. Context-Free Patent Art -- endlessly amusing. (via David Kaneda)

December 08 2011

Four short links: 8 December 2011

  1. Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter (PLOSone) -- Tweets involving the ‘fake news’ comedian Stephen Colbert are both happier and of a higher information level than those concerning his senior colleague Jon Stewart. By contrast, tweets mentioning Glenn Beck are lower in happiness than both Colbert and Stewart but comparable to Colbert in information content.
  2. Pricing Experiments You Can Learn From -- revealing the data from experiments which showed how to drive people towards higher prices.
  3. 10 Things I Learned at CrowdConf 2011 (Crowdflower) -- Using his newly released crowdsourcing platform Coffee & Power, Philip [Rosedale] developed his entire company infrastructure and platform through a globally distributed workforce. 288 contributors in 127 locations worked together to get this startup off the ground in a whole new way. The Coffee & Power platform was built in 1,700 commits ranging from $6 quality checks all the way up to full source-code editing. One element of this process was developing the Hudat iPhone app. In less than a month for $2,485, the Coffee & Power community got this mobile app up and running.
  4. Andi -- AGPL3-licensed spaced repetition flashcard system. (via Jack Kinsella)

July 26 2011

November 11 2009

Counting Unique Users in Real-time with Streaming Databases

As the web increasingly becomes real-time, marketers and publishers need analytic tools that can produce real-time reports. Take the basic task of calculating the number of unique users: it is typically done in batch mode (e.g., daily), and in many cases only on a random sample drawn from the relevant log files. If unique user counts can be computed accurately in real time, publishers and marketers can run A/B tests or referral analysis and adjust their campaigns dynamically.


In a previous post I described SQL databases designed to handle data streams. In its latest release, Truviso announced technology that lets companies track unique users in real time. Truviso uses the same basic idea I described in my earlier post:


Recognizing that "data is moving until it gets stored", the idea behind many real-time analytic engines is to start applying the same analytic techniques to moving (streams) and static (stored) data.

Truviso uses (compressed) bitmaps and set theory to compute the number of unique customers in real time. This lets it handle the standard SQL queries associated with these types of problems: counting the number of distinct users for any given set of demographic filters. The bitmaps are built as data streams into the system, using the same underlying technology that allows Truviso to handle massive data sets from high-traffic web sites.
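To make the bitmap idea concrete, here is a minimal sketch of distinct counting with bitmaps and set intersection. It is not Truviso's implementation, whose internals the release does not describe. It assumes each user has already been mapped to a small integer ID, and it uses plain Python integers as uncompressed bitmaps where a production system would use a compressed format. The class name and the `country`/`device` attributes are hypothetical.

```python
from collections import defaultdict


class BitmapDistinctCounter:
    """Hypothetical sketch: one bitmap per (attribute, value) pair.

    Bit i is set when user i has been seen. Python ints stand in for
    bitmaps here (an assumption); real systems typically compress them.
    """

    def __init__(self):
        self.all_users = 0                # every user seen so far
        self.by_attr = defaultdict(int)   # (attribute, value) -> bitmap

    def observe(self, user_id, attrs):
        # Update bitmaps incrementally as each event streams in.
        bit = 1 << user_id
        self.all_users |= bit
        for key, value in attrs.items():
            self.by_attr[(key, value)] |= bit

    def count_distinct(self, **filters):
        # Each filter is a set intersection (bitwise AND); the answer is
        # the population count of the resulting bitmap.
        bitmap = self.all_users
        for key, value in filters.items():
            bitmap &= self.by_attr.get((key, value), 0)
        return bin(bitmap).count("1")


counter = BitmapDistinctCounter()
events = [
    (3, {"country": "US", "device": "mobile"}),
    (7, {"country": "US", "device": "desktop"}),
    (3, {"country": "US", "device": "mobile"}),  # repeat visit, not double-counted
    (12, {"country": "DE", "device": "mobile"}),
]
for uid, attrs in events:
    counter.observe(uid, attrs)

print(counter.count_distinct())                               # 3
print(counter.count_distinct(country="US"))                   # 2
print(counter.count_distinct(country="US", device="mobile"))  # 1
```

Because a repeated visit only re-sets a bit that is already set, duplicates cost nothing, and any combination of filters reduces to a few bitwise ANDs rather than a rescan of the logs.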




Once companies can do simple counts and averages in real time, the next step is to use real-time information for predictive analytics. Truviso has customers using its system for "on-the-fly predictive modeling".


The other key enhancement in this release is a significant step towards parallel processing. Truviso's new execution engine processes runs, or blocks, of data in parallel on multi-core systems or in multi-node environments. Using the parallel execution engine is straightforward on a single multi-core server, but on a multi-node cluster it may require considerable attention to configuration.
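Bitmap-based counting lends itself to this kind of block-parallel execution. The sketch below is a hypothetical illustration, not a description of Truviso's engine. It cuts a stream of integer user IDs into fixed-size blocks, builds one bitmap per block in worker threads, and merges the partial results with a bitwise OR. `block_size` and `workers` are arbitrary illustration values. Threads are used only to keep the example self-contained: in CPython they will not speed up CPU-bound work, and a process pool or separate nodes would be needed for real parallelism.

```python
import operator
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def block_bitmap(user_ids):
    """Build the bitmap for one block (run) of the stream."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def parallel_distinct_count(stream, block_size=4, workers=4):
    # Split the stream into fixed-size blocks (hypothetical block size).
    blocks = [stream[i:i + block_size] for i in range(0, len(stream), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(block_bitmap, blocks))
    # OR is associative and commutative, so partial bitmaps can be merged
    # in any order, and a user seen in several blocks is still counted once.
    merged = reduce(operator.or_, partials, 0)
    return bin(merged).count("1")


stream = [1, 5, 1, 9, 5, 2, 9, 9, 3, 1]
print(parallel_distinct_count(stream))  # 5
```

The merge step is what makes the design attractive: unlike an exact count kept as a single running total, per-block bitmaps can be combined after the fact without double-counting users who appear in more than one block.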


[For my previous posts on real-time analytic tools see here and here.]

