June 07 2012

Strata Week: Data prospecting with Kaggle

Here are a few of the data stories that caught my attention this week:

Prospecting for data

The data science competition site Kaggle is extending its features with a new service called Prospect. Prospect allows companies to submit a data sample to the site without having a pre-ordained plan for a contest. In turn, the data scientists using Kaggle can suggest ways in which machine learning could best uncover new insights and answer less-obvious questions, as well as what sorts of data competitions could be based on the data.

As GigaOm's Derrick Harris describes it: "It's part of a natural evolution of Kaggle from a plucky startup to an IT company with legs, but it's actually more like a prequel to Kaggle's flagship predictive modeling competitions than it is a sequel." It's certainly a good way for companies to get their feet wet with predictive modeling.

Practice Fusion, a web-based electronic health records system for physicians, has launched the inaugural Kaggle Prospect challenge.

HP's big data plans

Last year, Hewlett Packard made a move away from the personal computing business and toward enterprise software and information management. It's a move that was marked in part by the $10 billion it paid to acquire Autonomy. Now we know a bit more about HP's big data plans for its Information Optimization Portfolio, which has been built around Autonomy's Intelligent Data Operating Layer (IDOL).

ReadWriteWeb's Scott M. Fulton takes a closer look at HP's big data plans.

The latest from Cloudera

Cloudera released a number of new products this week: Cloudera Manager 3.7.6; Hue 2.0.1; and of course CDH 4.0, its Hadoop distribution.

CDH 4.0 includes:

"... high availability for the filesystem, ability to support multiple namespaces, HBase table and column level security, improved performance, HBase replication and greatly improved usability and browser support for the Hue web interface. Cloudera Manager 4 includes multi-cluster and multi-version support, automation for high availability and MapReduce2, multi-namespace support, cluster-wide heatmaps, host monitoring and automated client configurations."

Social data platform DataSift also announced this week that it was powering its Hadoop clusters with CDH to perform the "Big Data heavy lifting to help deliver DataSift's Historics, a cloud-computing platform that enables entrepreneurs and enterprises to extract business insights from historical public Tweets."

Have data news to share?

Feel free to email us.

OSCON 2012 Data Track — Today's system architectures embrace many flavors of data: relational, NoSQL, big data and streaming. Learn more in the Data track at OSCON 2012, being held July 16-20 in Portland, Oregon.

Save 20% on registration with the code RADAR


November 10 2011

Looking for KDD contenders

The KDD Cup is the world's foremost data mining competition. It's an annual contest that challenges data scientists to find out what they can learn from a given dataset. Previous competitions have used data in areas ranging from particle physics to customer relations. This year, they're looking for something particularly meaningful: a data problem in an area such as medicine, education, the environment, or anything that leads to a social good.

The deadline is approaching, and they're still looking for good candidates. The competition will be hosted by Kaggle, so if your submission wins, you don't have to worry about logistics; all you need to do is supply the data and one or two well-defined problems that you expect the data to solve. Then sit back and wait for the solutions to roll in.

The KDD Cup website includes directions for submitting a problem. You don't need to provide the dataset at this time, but to apply you need to provide a fairly rigorous description of the data and the problem you want to solve.

The deadline for submissions is November 15, 2011 (that's next Tuesday). If you think you have a good idea and some tough data, you can contact them to ask whether your idea is appropriate, and then possibly ask for an extension. If you've got a tough, practical, real-world data problem that you need to solve, this is your chance!
