January 12 2012

Strata Week: A .data TLD?

Here are some of the data stories that caught my attention this week.

Should there be a .data TLD?

ICANN is ready to open top-level domains (TLDs) to the highest bidder, and Wolfram Alpha's Stephen Wolfram posits that it's time for a .data TLD. In a blog post on the Wolfram site, he argues that the new top-level domains provide an opportunity to create a .data domain that could serve as a "parallel construct to the ordinary web, but oriented toward structured data intended for computational use. The notion is that alongside a website like, there'd be"

Wolfram continues:

If a human went to, there'd be a structured summary of what data the organization behind it wanted to expose. And if a computational system went there, it'd find just what it needs to ingest the data, and begin computing with it.

So how would a .data TLD change the way humans and computers interact with data? Or would it change anything? If you've got ideas of how .data could be put to use, please share them in the comments.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Cloudera addresses what Apache Hadoop 1.0 means to its customers

Last week, the Apache Software Foundation (ASF) announced that Hadoop had reached version 1.0. This week, Cloudera took to its blog to explain what that milestone means to its customers.

The post, in part, explains how Hadoop has branched from its trunk, noting that all of this has caused some confusion for Cloudera customers:

More than a year after Apache Hadoop 0.20 branched, significant feature development continued on just that branch and not on trunk. Two major features were added to branches off 0.20.2. One feature was authentication, enabling strong security for core Hadoop. The other major feature was append, enabling users to run Apache HBase without risk of data loss. The security branch was later released as 0.20.203. These branches and their subsequent release have been the largest source of confusion for users because since that time, releases off of the 0.20 branches had features that releases off of trunk did not have and vice versa.

Cloudera explains to its customers that it's offered the equivalent for "approximately a year now" and compares the Apache Hadoop efforts to its own offerings. The post is an interesting insight into not just how the ASF operates, but how companies that offer services around those projects have to iterate and adapt.

Disqus says that pseudonymous commenters are best

Debates over blog comments have resurfaced recently, with a back and forth about whether or not they're good, bad, evil, or irrelevant. Adding some fuel to the fire (or data to the discussion, at least) comes Disqus with its own research based on its commenting service.

According to the Disqus research, commenters using pseudonyms actually are "the most valuable contributors to communities," as their comments rank highest in both quantity and quality. Those findings run counter to the idea that people who comment online without using their real names lessen rather than enhance the quality of conversations.

Disqus' data indicates that pseudonymity might engender a more engaged and more engaging community. That notion stands in contrast to arguments that anonymity leads to more trollish and unruly behavior.

Got data news?

Feel free to email me.


January 09 2012

The hidden language and "wonderful experience" of product reviews

How do reviews, both positive and negative, influence the price of a product on Amazon? What phrases used by reviewers make us more or less likely to complete a purchase? These are some of the questions that computer scientist Panagiotis Ipeirotis, an associate professor at New York University's Stern School of Business, set out to investigate by analyzing the text in thousands of reviews on Amazon. Ipeirotis continues to research this space.

Ipeirotis' findings are surprising: consumers will pay more for the same product if the seller's reviews are good, certain types of negative reviews actually boost sales, and spelling plays an important role.

Our interview follows.

How important are product reviews on Amazon? Can they give sellers more pricing power?

Panagiotis Ipeirotis: The reviews have a significant effect. When buying online, customers are not only purchasing the product, they're also inherently buying the guarantee of a seamless transaction. Customers read the feedback left by other buyers to evaluate the reputation of the seller. Since customers are willing to pay more to buy from merchants with a better reputation (something we call the "reputation premium"), that feedback tends to have an effect on the future prices the merchant can charge.

What are some of the most influential phrases?

Panagiotis Ipeirotis: "Never received" is a killer phrase in terms of reputation. It reduced the price a seller can charge by an average of $7.46 in the products examined. "Wonderful experience" is one of the most positive, increasing the price a seller can charge by $5.86 for the researched products.

How can very positive reviews be bad for sales?

Panagiotis Ipeirotis: Extremely positive reviews that contain no concrete details tend to be perceived as non-objective — written by fanboys or spammers. We observed this mainly in the context of product reviews, where superlative phrases like "Best camera!" with no further details are actually seen negatively.

Can a negative review ever be good for sales?

Panagiotis Ipeirotis: It can when the review is overly negative or criticizes aspects of the product that are not its primary purpose — the video quality in an SLR camera, for example. Or, when customers have unreasonable expectations: "Battery life lasts only for two days of shooting." Readers interpret these types of negative comments as "This is good enough for me," and it decreases their uncertainty about the product.

What is the effect of badly written reviews on sales?

Panagiotis Ipeirotis: Reviews containing spelling and grammatical errors consistently result in suboptimal outcomes, like lower sales or lower response rates. That was a fascinating but, in retrospect, expected finding. This holds true in a wide variety of settings, from reviews of electronics to hotels. It's even the case when examining email correspondence about a decision, such as whether or not to hire a contractor.

We don't know the exact reason yet, but the effect is very systematic. There are several possible explanations:

  • Readers think that the customers who buy this product are uneducated, so they don't buy it.
  • Reviews that are badly written are considered unreliable and therefore increase the uncertainty about the product.
  • Badly written reviews are unsuccessful attempts to spam and are a signal that even the other good reviews may not be authentic.

What's the relationship between the product attributes discussed in reviews and the attributes that lead to sales?

Panagiotis Ipeirotis: We observed that the aspects of a product that drive the online discussion are not necessarily the ones that define consumer decisions to buy it. For example, "zoom" tends to be discussed a lot for small point-and-shoot cameras. However, very few people are influenced by the zoom capabilities when it comes down to deciding which camera to buy.

This interview was edited and condensed.


March 15 2011

Facebook comments: Fewer and better, or just fewer?

Alistair Croll and Sean Power recently reviewed how embedded Facebook comments affect the number of comments on posts. They used TechCrunch as a test case, comparing comment totals, Facebook likes, Google Buzz and Twitter activity one week before and one week after TechCrunch implemented the FB comment plugin.


At first blush, the numbers might be surprising, and even a bit disconcerting. Croll and Power's analysis showed:

  • For all posts, implementing FB Comments was followed by a 42% reduction in the total number of comments, and a 38% reduction in comments per post.
  • For the average post, implementing FB Comments was followed by a 58% reduction in the total number of comments and a 56% reduction in the average number of comments per post.

(Note: For the "average" analysis, they discarded the data from the top and bottom 5 percent.)
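
To make the two kinds of figures above concrete, here is a minimal Python sketch of how an "all posts" comparison and a trimmed "average post" comparison could be computed. It is not Croll and Power's method or data: the function names, the per-post comment counts, and the choice to trim by post count are all assumptions made for illustration.

```python
def trimmed(values, percent=5):
    """Return the values sorted, with the lowest and highest
    `percent` percent (by count) dropped. Assumed trimming rule."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # items to drop from each end
    return ordered[k:len(ordered) - k]


def mean(values):
    return sum(values) / len(values)


def percent_change(before, after):
    """Signed percent change from before to after (negative = reduction)."""
    return (after - before) * 100 / before


# Hypothetical per-post comment counts, for illustration only.
before = [2, 10, 12, 14, 15, 16, 18, 20, 20, 22,
          24, 25, 26, 28, 30, 32, 35, 40, 45, 400]
after = [1, 4, 5, 6, 6, 7, 8, 8, 9, 10,
         10, 11, 12, 12, 13, 14, 15, 18, 20, 300]

for label, b, a in [("all posts", before, after),
                    ("trimmed", trimmed(before), trimmed(after))]:
    print(f"{label}: total {percent_change(sum(b), sum(a)):+.0f}%, "
          f"per post {percent_change(mean(b), mean(a)):+.0f}%")
```

Trimming matters here because a single heavily commented post can dominate the totals, so the "all posts" and "average post" figures can diverge, as they do in the report.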

The results also indicated, however, that Google Buzz activity increased 30 percent overall, and that Facebook likes increased in both the total and average analyses. While the reduction in comments may appear to be a bad thing, one TechCrunch reader (not involved in Croll and Power's analysis) noticed the change after the FB comment plugin was introduced and was thrilled with the reduction in spam and troll comments.

Reader engagement requires real readers with real thoughts, and it improves with the quality of the discussion. Perhaps forcing commenters to log in with a real Facebook persona improves interaction in a quality-over-quantity kind of way.

You can read Croll and Power's complete report here and download their data here.

Web 2.0 Expo San Francisco 2011, being held March 28-31, will examine key pieces of the digital economy and the ways you can use important ideas for your own success.

Save 20% on registration with the code WEBSF11RAD
