
January 10 2011

Four short links: 10 January 2011

  1. Tools and Practices for Working Virtually -- a detailed explanation of how the RedMonk team works virtually.
  2. Twitter Accounts for All Stack Overflow Users by Reputation (Brian Bondy) -- superawesome list of clueful people.
  3. The Wonderful World of Early Computing -- from bones to the ENIAC, some surprising and interesting historical computation devices. (via John D. Cook)
  4. Overlapping Experiment Infrastructure (PDF) -- Google can't run just one test at a time, so it has infrastructure to comprehensively test all features against all features and pull statistical conclusions out of the resulting data in real time. (via Greg Linden)

December 16 2010

Strata Week: Shop 'til you drop

Need a break from the holiday madness? You're not alone. Check out these items of interest from the land of data and see why even the big consumers face tough choices.

Does this place accept returns?

On Monday, Stack Overflow announced that they have moved the Stack Exchange Data Explorer (SEDE) off of the Windows Azure platform and onto in-house hardware.


SEDE is an open source, web-based tool for querying the monthly dump of Creative Commons data from Stack Exchange's four main Q&A sites (Stack Overflow, Server Fault, Super User, and Meta) as well as other sites in the Stack Exchange family. The primary reason given for the move, in a polite write-up by Jeff Atwood and SEDE lead Sam Saffron, was the desire for fine-tuned control over the platform.

When you are using a [Platform-as-a-Service] you are giving up a lot of control to the service provider. The service provider chooses which applications you can run and imposes a series of restrictions. ... It was disorienting moving to a platform where we had no idea what kind of hardware was running our app. Giving up control of basic tools and processes we use to tune our environment was extremely painful.

While the support that comes with Platform-as-a-Service was acknowledged, it seems that the ability to better automate, adjust, and perpetuate processes and systems with more fine-grained control won out as a bigger convenience.



Where did you get that lovely platform?


Of course, one company's headache is another's dream. Netflix, a company known for playing with big data and crowdsourcing solutions "before it was cool," posted on Tuesday the four reasons they chose Amazon Web Services (AWS) as their platform, which they have been moving onto over the last year.

Laudably, the company states that it viewed its tremendous recent growth (in terms of both members and streaming devices) as a license to question everything in the necessary process of re-architecting. Instead of building out their own data centers, etc., they decided to answer that set of questions by paying someone else to worry about it.

Also to their credit, Netflix has enough self-awareness to know what they are and aren't good at. Building top-notch recommendation systems and providing entertainment? You betcha. Predicting customer growth and device engagement? Not so much.

How many subscribers would you guess used our Wii application the week it launched? How many would you guess will use it next month? We have to ask ourselves these questions for each device we launch because our software systems need to scale to the size of the business, every time.

Self-awareness is in fact the primary lesson in both Netflix's and Stack Exchange's platform decisions. If you feel your attention is better spent elsewhere, write a check. If you've got the time and expertise to hone your hardware, roll your own.

[Of course, Netflix doesn't go for the pre-packaged solutions every time. They also posted recently about why they love open source software, and listed among the projects they make use of and contribute back to: Hadoop, Hive, HBase, Honu, Ant, Tomcat, Hudson, Ivy, Cassandra, etc.]

With what shall we shop?

The New York Times this week released a cool group of interactive maps based on data collected in the Census Bureau's American Community Survey (ACS) from 2005 to 2009. Data is compared against the 2000 census to uncover rates of change.

[While similar to the census, the ACS is conducted every year instead of every 10 years. The ACS includes only a sampling of addresses instead of a comprehensive inventory. It covers much of the same ground on population (age, race, disability status, family relationships), but it also asks for information that is used to help make funding distribution decisions about community services and institutions.]

The Times maps explore education levels; rent, mortgage rates, and home values; household income; and racial distribution. Viewers can select among 22 maps in these four categories, and then pan and zoom to view national, state, or local trends down to the level of individual census tracts.

The national view of one of the maps shows the change in median household income. The ACS website itself provides some maps displaying the numbers from the 2000 census and the 2005-2009 survey, as well as a listing of data tables.

The Times map shows the uneven way in which these numbers have gone up or down in various parts of the country, with some surprising results that are worth exploring. Note that the blue regions are places where income has dropped, and the yellow regions are places where it has increased. (No wonder a lot of us are getting creative with holiday shopping.)

If this kind of research floats your boat, check out Social Explorer, the mapping tool used to create the New York Times maps.

Even markets like to buy things

The emerging landscape of custom data markets is already shifting as Infochimps recently announced the acquisition of Data Marketplace, a start-up incubated at Y Combinator.

While Stewart Brand may be right in thinking information wants to be free, there's also enormous value to be added by aggregating, structuring, and packaging data, as well as in matching up buyers with sellers. That's the main service Data Marketplace aims to provide, particularly in the field of financial data.

At Infochimps, information is offered a la carte, and many of the site's datasets are offered for free. These include sets as diverse as "Word List - 100,000+ official crossword words (Excel readable)", "Measuring Worth: Interest Rates - US & UK 1790-2000", and "Retrosheet: Game Logs (play-by-play) for Major League Baseball Games." Data Marketplace is a bit different, in that it allows users to enter requests for data (with a deadline and budget, if desired) and then matches up would-be buyers with data providers.

Infochimps has said that Data Marketplace, which is less than a year old, will continue to operate as a standalone site, although its founders Steve DeWald and Matt Hodan will depart for new projects.

If you're interested in the burgeoning business of aggregated datasets, be sure to check out the Data Marketplaces panel I'll be moderating at Strata in February.

Not yet signed up for Strata? Register now and save 30% with the code STR11RAD.

August 04 2010

Four short links: 4 August 2010

  1. FuXi -- Python-based, bi-directional logical reasoning system for the semantic web from the folks at the Open Knowledge Foundation. (via About Inferencing)
  2. Harness the Power of Being an Internet -- I learn by trying to build something, there's no other way I can discover the devils-in-the-details. Unfortunately that's an incredibly inefficient way to gain knowledge. I basically wander around stepping on every rake in the grass, while the A Students memorize someone else's route and carefully pick their way across the lawn without incident. My only saving graces are that every now and again I discover a better path, and faced with a completely new lawn I have an instinct for where the rakes are.
  3. Stack Overflow's Curated Folksonomy -- community-driven tag synonym system to reduce the chaos of different names for the same thing. (via Skud)
  4. Image Deblurring using Inertial Measurement Sensors (Microsoft Research) -- using Arduino to correct motion blur. (via Jon Oxer)
