
February 10 2012

O'Reilly Radar Show 2/10/12: The 5 trends that will shape the data world

Below you'll find the script and associated links from the February 10, 2012 episode of O'Reilly Radar. An archive of past shows is available through O'Reilly Media's YouTube channel and you can subscribe to episodes of O'Reilly Radar via iTunes.


Introduction

There are five major trends that will shape the data world in the months to come. Strata Conference chair Edd Dumbill reveals them in this episode of O'Reilly Radar. [Starts 12 seconds in.]

Also in this episode: We revisit a conversation with Wired's Kevin Kelly in which he discusses freemium models and why digital rights management will likely persist in some form or another. [Interview begins at 11:04.]

Radar posts of note

[This segment begins at the 10:06 mark.]

For now, legislators have backed off of the Stop Online Piracy Act and the Protect IP Act, but the friction between media companies and online piracy persists. In his piece "SOPA and PIPA are bad industrial policy," Tim O'Reilly explains why these efforts — and those sure to emerge down the road — hold back innovative business models that grow the overall market.

It's the hot trend in software right now, but what does big data mean, and how can you exploit it? In "What is big data?," Strata chair Edd Dumbill presents an introduction and orientation to the big data landscape.

Finally, books, publishing processes and readers have all made the jump to digital, and that's creating considerable opportunities for publishing startups. Justo Hidalgo explores the digital shift in his piece, "Three reasons why we're in a golden age of publishing entrepreneurship."

As always, links to these stories and other resources mentioned during this episode are available at radar.oreilly.com/show.

Radar video spotlight

At the 2011 Tools of Change for Publishing conference I had a chance to interview Wired's Kevin Kelly about two topics that continue to play big roles in the content world: the freemium model and digital rights management.

As you'll see in the following video, Kelly has a unique, long-view perspective on both of these issues.

[Interview begins at 11:04.]

Closing

Just a reminder that you can always catch episodes of O'Reilly Radar at youtube.com/oreillymedia and subscribe to episodes through iTunes.

All of the links and resources mentioned during this episode are posted at radar.oreilly.com/show.

That's all we have for this episode. Thanks for joining us and we'll see you again soon.

December 26 2011

The year in big data and data science

Big data and data science have both been with us for a while. According to McKinsey & Company's May 2011 report on big data, back in 2009 "nearly all sectors in the U.S. economy had at least an average of 200 terabytes of stored data ... per company with more than 1,000 employees." And on the data-science front, Amazon's John Rauser used his presentation at Strata New York (below) to trace the profession of data scientist all the way back to 18th-century German astronomer Tobias Mayer.

Of course, novelty and growth are separate things, and in 2011, there were a number of new technologies and companies developed to address big data's issues of storage, transfer, and analysis. Important questions were also raised about how the growing ranks of data scientists should be trained and how data science teams should be constructed.

With that as a backdrop, below I take a look at three evolving data trends that played an important role over the last year.

The ubiquity of Hadoop

It was a big year for investment for Apache Hadoop-based companies. Hortonworks, which was spun out of Yahoo this summer, raised $20 million upon its launch. And when Cloudera announced it had raised $40 million this fall, GigaOm's Derrick Harris calculated that, all told, Hadoop-based startups had raised $104.5 million between May and November of 2011. (Other startups raising investment for their Hadoop software included Platfora, Hadapt, and MapR.)

But it wasn't just startups that got in on the Hadoop action this year: IBM announced this fall that it would offer Hadoop in the cloud; Oracle unveiled its own Hadoop distribution running on its new Big Data appliance; EMC signed a licensing agreement with MapR; and Microsoft opted to put its own big data processing system, Dryad, on hold, signing a deal instead with Hortonworks to handle Hadoop on Azure.

The growing number of Hadoop providers and adopters has spurred more solutions for managing and supporting Hadoop. This will become increasingly important in 2012 as Hadoop moves beyond the purview of data scientists to become a tool more businesses and analysts utilize.

More data, more privacy and security concerns

Despite all the promise that better tools for handling and analyzing data hold, there were numerous concerns this year about the privacy and security implications of big data, stemming in part from a series of high-profile data thefts and scandals.

In April, a security breach at Sony led to the theft of the personal data of 77 million users. The intrusion into the PlayStation Network prompted Sony to pull it offline, but Sony failed to notify its users about the issue for a full week (later admitting that it stored usernames and passwords unencrypted). Estimates of the cost of the security breach to Sony range from $170 million to $24 billion.

That's a wide range of estimates for the damage done to the company, but the point is clear nonetheless: not only do these sorts of data breaches cost companies millions, but the value of consumers' personal data is also increasing — for both legitimate and illegitimate purposes.

Sony was hardly the only company with security and privacy concerns on its hands. In April, Alasdair Allan and Pete Warden uncovered a file in Apple iOS software that noted users' latitude-longitude coordinates along with a timestamp. Apple responded, insisting that the company "is not tracking the location of your iPhone. Apple has never done so and has no plans to ever do so." Apple fixed what it said was a "bug."

Late this year, almost all handset makers and carriers were implicated by another mobile concern when Android developer Trevor Eckhart reported that the mobile intelligence company Carrier IQ's rootkit software could record all sorts of user data — texts, web browsing, keystrokes, and even phone calls.

That the data from mobile technology was at the heart of these two controversies reflects in some ways our changing data usage patterns. But whether it's mobile or not, as we do more online — shop, browse, chat, check in, "like" — it's clear that we're leaving behind an immense trail of data about ourselves. This year saw the arrival of several open-source efforts, such as the Locker Project and ThinkUp, that strive to give users better control over their personal social data.

And while better control and safeguards can offer some level of protection, it's clear that technology can always be cracked and the goals of data aggregators can shift. So, if digital data is and always will be a moving target, how does that shape our expectations for privacy? In Privacy and Big Data, published this year, co-authors Terence Craig and Mary Ludloff argued that we might be paying too much attention to concerns about "intrusions of privacy" and that instead we need to be thinking about better transparency with how governments and companies are using our data.

Open data's inflection point

Screenshot from the Open Knowledge Foundation's Open Government Data Map

When it comes to better transparency, 2011 has been a good year for open data, with strong growth in the number of open data efforts. Canada, the U.K., France, the U.S., and Kenya were a few of the countries unveiling open data initiatives.

There were still plenty of open data challenges: budget cuts, for example, threatened the U.S. Data.gov initiative. And in his "state of open data 2011" talk, open data activist David Eaves pointed to the challenges of having different schemas and few standards, making it difficult for some datasets to be used across systems and jurisdictions.

Even with a number of open data "wins" at the government level, a recent survey of the data science community by EMC named the lack of open data as one of the obstacles that data scientists and business intelligence analysts said they faced. Just 22% of the former and 12% of the latter said that they "strongly believed" that the employees at their companies have the access they need to run experiments on data. Arguably, more open data efforts have spawned more interest and better understanding of what this can mean.

The demand for more open data has also spawned a demand for more tools. Importantly, these tools are beginning to be open to more than just data scientists or programmers. They include things like visualization-creator Visual.ly, the scraping tool ScraperWiki, and data-sharing site BuzzData.

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

