
June 07 2012

Strata Week: Data prospecting with Kaggle

Here are a few of the data stories that caught my attention this week:

Prospecting for data

The data science competition site Kaggle is extending its features with a new service called Prospect. Prospect allows companies to submit a data sample to the site without having a pre-ordained plan for a contest. In turn, the data scientists using Kaggle can suggest ways in which machine learning could best uncover new insights and answer less-obvious questions, and what sorts of data competitions could be based on the data.

As GigaOm's Derrick Harris describes it: "It's part of a natural evolution of Kaggle from a plucky startup to an IT company with legs, but it's actually more like a prequel to Kaggle's flagship predictive modeling competitions than it is a sequel." It's certainly a good way for companies to get their feet wet with predictive modeling.

Practice Fusion, a web-based electronic health records system for physicians, has launched the inaugural Kaggle Prospect challenge.

HP's big data plans

Last year, Hewlett Packard made a move away from the personal computing business and toward enterprise software and information management. It's a move that was marked in part by the $10 billion it paid to acquire Autonomy. Now we know a bit more about HP's big data plans for its Information Optimization Portfolio, which has been built around Autonomy's Intelligent Data Operating Layer (IDOL).

ReadWriteWeb's Scott M. Fulton takes a closer look at HP's big data plans.

The latest from Cloudera

Cloudera released a number of new products this week: Cloudera Manager 3.7.6; Hue 2.0.1; and of course CDH 4.0, its Hadoop distribution.

CDH 4.0 includes:

"... high availability for the filesystem, ability to support multiple namespaces, HBase table and column level security, improved performance, HBase replication and greatly improved usability and browser support for the Hue web interface. Cloudera Manager 4 includes multi-cluster and multi-version support, automation for high availability and MapReduce2, multi-namespace support, cluster-wide heatmaps, host monitoring and automated client configurations."

Social data platform DataSift also announced this week that it was powering its Hadoop clusters with CDH to perform the "Big Data heavy lifting to help deliver DataSift's Historics, a cloud-computing platform that enables entrepreneurs and enterprises to extract business insights from historical public Tweets."

Have data news to share?

Feel free to email us.

OSCON 2012 Data Track — Today's system architectures embrace many flavors of data: relational, NoSQL, big data and streaming. Learn more in the Data track at OSCON 2012, being held July 16-20 in Portland, Oregon.

Save 20% on registration with the code RADAR


September 23 2011

Developer Week in Review: webSOS

On the developer front, if the growing tide of rumors is correct, there will be some iOS stuff to report next week.

Meanwhile:

Last one out turn off the lights

HP has flung the axe, and it has taken out a large swath of the ill-fated webOS crew. HP is confirming that development will cease by the end of the year, reducing the number of viable mobile operating systems down to two again (BlackBerry is heading the way of webOS, and Windows Mobile has an uphill battle at this point).

Is hegemony in the mobile space a good thing? Maybe, maybe not. It's good for mobile developers, as it reduces the number of potential platforms you need to consider. It could be bad for consumers, as it reduces the pressure on the remaining players to innovate. However, given that neither HP nor Microsoft nor RIM was pushing the envelope much with their products, that might not be a valid concern. And, frankly, Google and Apple do a pretty good job of stealing ideas from each other — witness the new Android-like notification framework in iOS5.

Android Open, being held October 9-11 in San Francisco, is a big-tent meeting ground for app and game developers, carriers, chip manufacturers, content creators, OEMs, researchers, entrepreneurs, VCs, and business leaders.

Save 20% on registration with the code AN11RAD

An (un)sign of the times

One of the joys of Java development is dealing with signed jars. For the uninitiated, Java Archives (jars) can be signed, "proving" that the contents inside are valid and untampered. Among other things, it is how the Java Web Start framework decides which Java programs can be automatically downloaded and started from a web page. Getting your jar file signed correctly is a delicate dance, and getting it wrong means that the applications will just plain not work.

Seemingly out of the blue, Oracle has started to remove the old Sun signatures from some core Java libraries that many developers depend on. The end result of this is that, going forward, it will become more difficult to deploy applications that use these frameworks. Oracle is saying it was done for security reasons, but as with many moves by Oracle lately, the end result has been to upset the developer community.

Creating the next generation of coders?

One of the paradoxical phenomena that seem to be occurring in society is that, even as technology is becoming more and more a part of people's lives, programming is being marginalized in the public schools. Instead, kids are taught how to use Excel or PowerPoint (God knows, my kid is a PowerPoint wiz!).

In the UK, they've decided to turn things around by making software design a part of the curriculum. You can make a strong argument that software engineering brings in skills from a lot of other disciplines like math and science, so it makes a good integrated teaching experience. On the other hand, my experience has been that public schools are uniquely bad at teaching coding because they try to teach it by rote, when it is at heart a creative process. It's like trying to teach painting by telling the students exactly where to place every brush stroke. Only time will tell if the UK can do it any better.

Got news?

Please send tips and leads here.


September 01 2011

Developer Week in Review: HP fires up the TouchPad production line one more time

Dear Waters Near Africa,

I know that you're very proud of the tropical depressions that you raise, and I'm sure that watching your "little babies" develop must bring you a lot of joy. It pains me to tell you, however, that one of your offspring, I think her name is Irene, went on a bender last week and totally trashed our coast. And if that isn't bad enough, I hear you have another little hellion called Katia eyeing our back yard with malice. If you can't control your children, I'm afraid we're going to have no choice but to call the police, or possibly NOAA, and ask them to do something about the situation. Thanks.

You can help my personal disaster recovery program (hey, propane for the generator doesn't grow on trees, you know ... well, actually, it did a few million years ago ...), by buying my new book, now available in early release. Read the book that helped Oprah lose weight, landed Gwyneth Paltrow her first acting gig, and got Barack Obama elected. While we can't promise the same amazing results for you, it does have a lot of good stuff in it about enterprise iOS development.

In non-flood-related news ...

At Crazy Bill Hewlett's House of Tablets, we're giving them away!

When HP called it quits on its attempted iPad-killer, the TouchPad, most folks chalked it up to another attempt by an industry dinosaur to become one of the hip new kids. And it was no surprise that HP tried to clear its remaining inventory with a fire sale at a bargain-basement price.

What has everyone scratching their heads is that, as TouchPads disappeared off shelves at the low, low price of $99, someone over at HP decided it made sense to restart the production line and make more units to sell at the same discounted price. Given that the best estimates show HP losing around 200 clams per unit at that price, the company seems to be pursuing a somewhat questionable business model.

It may make sense if HP is trying to build interest in WebOS in front of a potential sale to buyers such as Samsung, though at least some buyers of the discounted units seem more interested in hacking them to run Android rather than stay with the native OS. In any event, if you're interested, run out and get one before the last run sells out ... Unless HP decides to do another last run ...

There's a joke about geese leaving the nest here, somewhere

Google has a history of acquiring big names in the industry to enhance its prestige as a leading software research organization. When Google hired James Gosling, who is considered one of the fathers of Java, it was seen as another feather in its cap, adding to a cadre that includes such notables as Mac pioneer Andy Hertzfeld (most recently responsible for designing the Circles feature in Google+) and Vim developer Bram Moolenaar.

It appears that for Gosling, Google wasn't so much a destination as a rest stop, however. After only a few months on the job, he's flown the coop, off to join a new startup designing autonomous ocean-going robots. If I had to guess, I'd say that Gosling decided he'd rather be a big fish at a small company solving a challenging and cool problem, as opposed to being part of a brain trust at a large one. Hey Google, I'm still available!

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science — from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code ORM30

Apple lost another phone?

While you're pondering the wisdom of HP, here's another puzzler to chew on. If you had just gotten over the embarrassment of having one of your top-secret product prototypes left in a bar, and ending up in the hands of Gizmodo, wouldn't you make doubly sure that you kept track of where the next ones went?

Well, evidently, there's a lot of after-hours drinking going on at Apple because, once again, a next-gen iPhone became separated from its owner at a watering hole.

The more cynical among the press have suggested that it's actually all just a publicity stunt, though given that the police were brought in, I tend to doubt it since filing a false report is not a trivial charge. I blame the new iPhone Drinking Game App in iOS5. You know, the one where you have to take a drink whenever you pull out your phone to settle a trivia dispute at a bar.

Got news?

Please send tips and leads here.


August 25 2011

Strata Week: Green pigs and data

Here are a few of the data stories that caught my attention this week:

Predicting Angry Birds

Angry Birds maker Rovio will begin using predictive analytics technology from the Seattle-based company Medio to help improve game play for its popular pig-smashing game.

According to the press release announcing the partnership, Angry Birds has been downloaded more than 300 million times and is on course to reach 1 billion downloads. But it isn't merely downloaded a lot; it's played a lot, too. The game, which sees up to 1.4 billion minutes of game play per week, generates an incredible amount of data: user demographics, location, and device information are just a few of the data points.

Users' data has always been important in gaming, as game developers must refine their games to maximize the amount of time players spend as well as track their willingness to spend money on extras or to click on related ads. As casual gaming becomes a bigger and more competitive industry, game makers like Rovio will rely on analytics to keep their customers engaged.

As GigaOm's Derrick Harris notes, quoting Zynga's recent S-1 filing, this is already a crucial part of that gaming giant's business:

The extensive engagement of our players provides over 15 terabytes of game data per day that we use to enhance our games by designing, testing and releasing new features on an ongoing basis. We believe that combining data analytics with creative game design enables us to create a superior player experience.

By enlisting the help of Medio for predictive analytics, it's clear that Rovio is taking that same tack to improve the Angry Birds experience.

Unstructured data and HP's next chapter

HP made a number of big announcements last week as it revealed plans for an overhaul. These plans include ending production of its tablet and smartphones, putting the development of WebOS on hold, and spending some $10 billion to acquire the British enterprise software company Autonomy.

The New York Times described the shift in HP as a move to "refocus the company on business products and services," and the acquisition of Autonomy could help drive that via its big data analytics. HP's president and CEO Léo Apotheker said in a statement: "Autonomy presents an opportunity to accelerate our strategic vision to decisively and profitably lead a large and growing space ... Together with Autonomy, we plan to reinvent how both unstructured and structured data is processed, analyzed, optimized, automated and protected."

As MIT Technology Review's Tom Simonite puts it, HP wants Autonomy for its "math skills" and the acquisition will position HP to take advantage of the big data trend.

Founded in 1996, Autonomy has a lengthy history of analyzing data, with an emphasis on unstructured data. Citing an earlier Technology Review interview, Simonite quotes Autonomy founder Mike Lynch's estimate that about 85% of the information inside a business is unstructured. "[W]e are human beings, and unstructured information is at the core of everything we do," Lynch said. "Most business is done using this kind of human-friendly information."

Simonite argues that by acquiring Autonomy, HP could "take a much more dominant position in the growing market for what Autonomy's Lynch dubs 'meaning-based computing.'"

Strata Conference New York 2011, being held Sept. 22-23, covers the latest and best tools and technologies for data science -- from gathering, cleaning, analyzing, and storing data to communicating data intelligence effectively.

Save 30% on registration with the code STN11RAD


Using data to uncover stories for the Daily Dot

After several months of invitation-only testing, the web got its own official daily newspaper this week with the launch of The Daily Dot. CEO Nick White and founding editor Owen Thomas said the publication will focus on the news from various online communities and social networks.

GigaOm's Mathew Ingram gave The Daily Dot a mixed review, calling its focus on web communities "an interesting idea," but he questioned if the "home town newspaper" metaphor really makes sense. The number of kitten stories on the Daily Dot's front page aside, ReadWriteWeb's Marshall Kirkpatrick sees The Daily Dot as part of the larger trend toward data journalism, and he highlighted some of the technology that the publication is using to uncover the Web world's news, including Hadoop and assistance from Ravel Data.

"It's one thing to crawl, it's another to understand the community," Daily Dot CEO White told Kirkpatrick. "What we really offer is thinking about how the community ticks. The gestures and modalities on Reddit are very different from Youtube; it's sociological, not just math."

Got data news?

Feel free to email me.





August 18 2011

Four short links: 18 August 2011

  1. Amazon Publishing Signs Tim Ferriss (NY Times) -- Amazon's vertical integration now extends to 15m female orgasms.
  2. Erasing Data from USB Drives (PC World) -- With flash drives things are more complex, thanks to mechanisms built into the drives to prolong their lifespan. Because flash memory cells stop working after they've been overwritten too many times, flash devices use tricks called "wear leveling" to even out how the memory cells are used. A side effect of wear levelling is that it is "almost impossible" to completely erase data from a flash device, McClain said.
  3. HP TouchPad Not Selling Well -- The biggest sale yet from flash sale site Woot, which sold the tablet for $120 off, got HP a meager 612 customers.
  4. HTTP Archive: The First Nine Months (Steve Souders) -- total data transferred up, HTTP requests up, redirects up. Flash down, so it's not all bad news, but in general web sites appear to be binging on high latency corn syrup.
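The wear-leveling point in item 2 can be made concrete with a toy model. This is only a sketch under loud assumptions: `ToyFlashDrive` and everything in it are hypothetical, and real flash controllers are far more elaborate. It shows just one idea: when the drive redirects each write to a less-worn cell, "overwriting" a file leaves the old contents sitting in the cell that was used before.

```python
class ToyFlashDrive:
    """Toy model of a flash drive that writes out of place (not a real controller)."""

    def __init__(self, physical_cells):
        self.cells = [None] * physical_cells   # raw contents of each physical cell
        self.wear = [0] * physical_cells       # how many times each cell was written
        self.mapping = {}                      # logical block -> physical cell

    def write(self, logical_block, data):
        in_use = set(self.mapping.values())
        free = [i for i in range(len(self.cells)) if i not in in_use]
        # Wear leveling: pick the least-worn cell that is not currently mapped.
        target = min(free, key=lambda i: self.wear[i])
        self.cells[target] = data
        self.wear[target] += 1
        # The previously mapped cell is only unmapped; its contents stay put.
        self.mapping[logical_block] = target

    def read(self, logical_block):
        return self.cells[self.mapping[logical_block]]


drive = ToyFlashDrive(physical_cells=4)
drive.write(0, "secret")
drive.write(0, "000000")          # the user "overwrites" the file with zeros
print(drive.read(0))              # what the operating system sees
print("secret" in drive.cells)    # whether the old data is still on the chip
```

In this toy, the operating system reads back only zeros, yet the original string is still present in a physical cell that no logical block points to.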

July 06 2011

May 08 2011

Feeding the community fuels advances at Red Hat and JBoss

I wouldn't dare claim to pinpoint what makes Red Hat the most successful company with a pervasive open source strategy, but one intriguing thing sticks out: their free software development strategy is the precise inverse of most companies based on open source.

Take the way Red Hat put together CloudForms, one of their major announcements at last week's instance of the annual Red Hat Summit and JBoss World. As technology, CloudForms represents one of the many efforts in the computer industry to move up the stack in cloud computing, with tools for managing, migrating, and otherwise dealing with operating system instances along with a promise (welcome in this age of cloud outages) to allow easy switches between vendors and prevent lock-in. But CloudForms is actually a blend of 79 SourceForge projects. Red Hat created it by finding appropriate free software technologies and persuading the developers to work together toward this common vision.

I heard this story from vice president Scott Farrand of Hewlett-Packard. Their own toehold on this crowded platform is the HP edition, a product offering that manages ProLiant server hosts and Flex Fabric networking to provide a platform for CloudForms.

The point of this story is that Red Hat rarely creates products like other open source companies, which tend to grow out of a single project and keep pretty close control over the core. Red Hat makes sure to maintain a healthy, independent community-based project. Furthermore, many open source companies try to keep ahead of the community, running centralized beta programs and sometimes keeping advanced features in proprietary versions of the product. In contrast, the community runs ahead of Red Hat projects. Whether it's the Fedora Linux distribution, the Drools platform underlying JBoss's BPM platform, JBoss Application Server lying behind JBoss's EAP offering, or many other projects forming the foundation of Red Hat and JBoss offerings, the volunteers typically do the experimentation and stabilize new features before the company puts together a stable package to support.

Red Hat Summit and JBoss World was huge and I got to attend only a handful of the keynotes and sessions. I spent five hours manning the booth for Open Source for America, which got a lot of positive attention from conference attendees. Several other worthy causes in reducing poverty attracted a lot of volunteers.

In general, what I heard at the show didn't represent eye-catching innovations or sudden changes in direction, but solid progress along the lines laid out by Red Hat and JBoss in previous years. I'll report here on a few technical advances.

PaaS standardization: OpenShift

Red Hat has seized on the current computing mantra of our time, which is freedom in the cloud. (I wrote a series on this theme, culminating in a proposal for an open architecture for SaaS.) Whereas CloudForms covers the IaaS space, Red Hat's other big product announcement, OpenShift, tries to broaden the reach of PaaS. By standardizing various parts of the programming environment, Red Hat hopes to bring everyone together regardless of programming language, database back-end, or other options. For example, OpenShift is flexible enough to support PostgreSQL from EnterpriseDB, CouchDB from Couchbase, and MongoDB from 10gen, among the many partners Red Hat has lined up.

KVM optimization

The KVM virtualization platform, a direct competitor to VMware (and another project emerging from and remaining a community effort), continues to refine its performance and offer an increasing number of new features.

  • Linux hugepages (2 megabytes instead of 4 kilobytes) can lead to a performance improvement ranging from 24% to 46%, particularly when running databases.

  • Creating a virtual network path for each application can improve performance by reducing network bottlenecks.

  • vhost_net improves performance by bypassing QEMU, the user-space virtualization layer.

  • Single Root I/O Virtualization (SR-IOV) allows direct access from a virtual host to an I/O device, improving performance but precluding migration of the instance to another physical host.
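As a rough back-of-the-envelope illustration of the hugepages item above: covering the same memory with 2 MB pages instead of 4 KB pages takes 512 times fewer page mappings, so there is far less for the processor's translation cache to track. The 4 GiB guest size in this sketch is an assumed example figure, not a number from the conference, and the quoted 24% to 46% gains are measurements this arithmetic does not reproduce.

```python
# Back-of-the-envelope: how many pages it takes to map a guest's memory.
# GUEST_MEMORY is an assumed example size, not a figure from the summit.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

GUEST_MEMORY = 4 * GIB
SMALL_PAGE = 4 * KIB
HUGE_PAGE = 2 * MIB


def pages_needed(memory_bytes, page_bytes):
    """Return the number of pages required to cover memory_bytes, rounded up."""
    return -(-memory_bytes // page_bytes)


small_pages = pages_needed(GUEST_MEMORY, SMALL_PAGE)
huge_pages = pages_needed(GUEST_MEMORY, HUGE_PAGE)
ratio = small_pages // huge_pages
print(small_pages, huge_pages, ratio)
```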

libvirt is much improved and is now the recommended administrative tool.

JBoss AS and EAP

Performance and multi-node management seemed to be the obsessions driving AS 7. Performance improvements, which have led to a ten-fold speedup and almost ten times less memory use between AS 6 and AS 7, include:

  • A standardization of server requirements (ports used, etc.) so that these requirements can be brought up concurrently during system start-up

  • Reorganization of the code to better support multicore systems

  • A cache to overcome the performance hit in Java reflection.

Management enhancements include:

  • Combining nodes into domains where they can be managed as a unit

  • The ability to manage nodes through any scripting language, aided by a standard representation of configuration data types in a dynamic model with a JSON representation

  • Synching the GUI with the XML files so that a change made in either place will show up in the other

  • Offering a choice whether to bring up a server right away at system start-up, or later on an as-needed basis

  • Cycle detection when servers fail and are restarted

February 10 2011

Let the tablet wars begin

Yesterday, Hewlett-Packard announced the launch of its TouchPad tablet, which is scheduled to hit stores sometime this summer. No pricing information has been released, however.

The announcement was timely, as Apple is in a bit of a battle with publishers over subscription and in-app purchasing policies. HP is taking Apple head-on, even hiring one of Apple's senior directors to help draw developers.

Also notable is that HP has signed on Time Inc., allowing the publisher to provide magazine subscriptions under agreeable terms. The European Newspaper Publishers' Association (ENPA) is likely taking note as its concerns over Apple's subscription policies intensify.





October 18 2010

Bookish Techy Week in Review

Another bookish-techy week has come and gone, with plenty of news from the future of publishing. Here are some of the highlights:

Good news for ebooks in general

Ebook sales for January-August 2010 totaled $263 million, compared to $89.8 million for January-August 2009, an increase of 193% for the category over the same period last year.
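For readers who want to check the 193% figure, here is a minimal sketch of the arithmetic using the two sales totals quoted above. The helper name `percent_increase` is just an illustrative choice.

```python
def percent_increase(old, new):
    """Percentage change from old to new, relative to old."""
    return (new - old) / old * 100


# Sales totals in millions of dollars, as quoted in the text.
sales_2009 = 89.8
sales_2010 = 263.0

increase = percent_increase(sales_2009, sales_2010)
print(round(increase))
```

Note that the increase is measured against the 2009 base, which is why sales that nearly tripled show up as a rise of a bit under 200%.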

Great news for Amazon/Kindle

Not such great news for iBookstore

The iBookstore six months after launch: One big failure

HP's POD pilots take flight

This semester, Hewlett Packard (HP) is conducting print-on-demand pilots at three universities.

Libraries checking out new e-acquisitions model

Patron-Driven Ebook Model Simmers as Ebrary Joins Ranks

Craig Mod suggests only you can prevent bad ereaders

The ereader incompetence checklist (for discerning consumers, editors, publishers and designers)

Dear Author's Jane Litte advises would-be Android readers

Here are some things to look for when determining whether a particular Android tablet would be a good reader for you.

Julietta Leonetti offers an excellent analysis of how the ebook industry is (slowly) taking shape in Argentina

In Argentina, E-books Are Sexy! (But You Can't Find Them Anywhere)


May 04 2010

Four short links: 4 May 2010

  1. Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks (PNAS) -- paper comparing structure and evolution of software design (exemplified by the Linux operating system) against biological systems (in the form of the E. coli bacterium). They found software has a lot more "middle manager" functions (functions that are called and then in turn call) as opposed to biology, where "workers" predominate (genes that make something, but which don't trigger other genes). They also quantified how software and biology value different things (as measured by what persists across generations of organisms, or versions of software): Reuse and persistence are negatively correlated in the E. coli regulatory network but positively correlated in the Linux call graph[...]. In other words, specialized nodes are more likely to be preserved in the regulatory network, but generic or reusable functions are persistent in the Linux call graph. (via Hacker News)
  2. Virtual Keyboards in Google Search -- rolling out virtual keyboards across all Google searches. Very nice solution to the problem of "how the heck do I enter that character on this keyboard?". (via glynmoody on Twitter)
  3. Information and Quantum Systems Lab at HP -- working on the mathematical and physical foundations for the technologies that will form a new information ecosystem, the Central Nervous System for the Earth (CeNSE), consisting of a trillion nanoscale sensors and actuators embedded in the environment and connected via an array of networks with computing systems, software and services to exchange their information among analysis engines, storage systems and end users. (via dcarli on Twitter)
  4. Turkit -- Java/JavaScript API for running iterative tasks on Mechanical Turk. (via chrismessina on Twitter)
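The "middle manager" versus "worker" distinction in item 1 is easy to sketch on a tiny call graph. This is a simplified reading of the summary's own definitions (called and in turn calling, versus called but calling nothing), not the paper's actual method. The function names in `toy_graph` are hypothetical, and the "top level" label for nodes nobody calls is an added convenience.

```python
def classify_functions(call_graph):
    """Label each node by whether it is called and whether it calls others."""
    called = {callee for callees in call_graph.values() for callee in callees}
    nodes = set(call_graph) | called
    roles = {}
    for name in sorted(nodes):
        calls_others = bool(call_graph.get(name))
        is_called = name in called
        if is_called and calls_others:
            roles[name] = "middle manager"   # called, and in turn calls
        elif is_called:
            roles[name] = "worker"           # called, but calls nothing
        else:
            roles[name] = "top level"        # never called by anything here
    return roles


# Hypothetical call graph: each key calls the functions in its list.
toy_graph = {
    "main": ["parse_args", "run"],
    "run": ["load", "process"],
    "process": ["transform"],
    "parse_args": [],
}

roles = classify_functions(toy_graph)
for name, role in roles.items():
    print(name, "->", role)
```

Here `run` and `process` come out as middle managers, while `parse_args`, `load`, and `transform` are workers.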
