
January 19 2012

Strata Week: A home for negative and null results

Here are a few of the data stories that caught my attention this week:

Figshare sees the upside of negative results

Science data-sharing site Figshare relaunched its website this week, adding several new features. Figshare lets researchers publish all of their data online, including negative and null results.

Using the site, researchers can now upload and publish all file formats, including videos and datasets that are often deemed "supplemental materials" or excluded from current publishing models. This is part of a larger "open science" effort. According to Figshare:

"... by opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient. Figshare uses creative commons licensing to allow frictionless sharing of research data whilst allowing users to maintain their ownership."

As the startup argues: "Unless we as scientists publish all of our data, we will never achieve access to the sum of all scientific knowledge."

Strata 2012 — The 2012 Strata Conference, being held Feb. 28-March 1 in Santa Clara, Calif., will offer three full days of hands-on data training and information-rich sessions. Strata brings together the people, tools, and technologies you need to make data work.

Save 20% on registration with the code RADAR20

Accel's $100 million data fund makes its first ($52.5 million) investment

Late last year, the investment firm Accel Partners announced a new $100 Million Big Data Fund, with a promise to invest in big data startups. This year, the first investment from that fund was revealed, with a whopping $52.5 million going to Code 42.

Founded in 2001, Code 42 is the creator of the backup software CrashPlan, and the company describes itself as building "high-performance hardware and easy-to-use software solutions that protect the world's data."

Describing the investment, GigaOm's Stacey Higginbotham writes:

"With the growth in mobile devices and the data stored on corporate and consumer networks that is moving not only from device to server, but device to device, [CEO Matthew] Dornquast realized Code 42's software could become more than just a backup and sharing service, but a way for corporations to understand what data and how data was moving between employees and the devices they use."

Higginbotham also cites Accel Partners' Ping Li, who notes that further investments from its Big Data Fund are unlikely to be so sizable.

LinkedIn open sources DataFu

LinkedIn has been a heavy user of Apache Pig for performing analysis with Hadoop on projects such as its People You May Know tool, among other things. For more advanced tasks like these, Pig supports User Defined Functions (UDFs), which allow the integration of custom code into scripts.

This week, LinkedIn announced the release of DataFu, the consolidation of its UDFs into a single, general-purpose library. DataFu enables users to "run PageRank on a large number of independent graphs, perform set operations such as intersect and union, compute the haversine distance between two points on the globe," and more.

LinkedIn is making DataFu available on GitHub under the Apache 2.0 license.
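
To give a sense of what one of those functions computes, here is a minimal Python sketch of the haversine distance between two points on the globe. It is not DataFu's code (DataFu's UDFs are written for Pig), and the function name, the argument order and the Earth radius constant are all assumptions made for illustration.

```python
import math

# Mean Earth radius in kilometers; an assumed constant, not taken from DataFu.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees.

    Hypothetical helper for illustration only.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine formula: a is the squared half-chord length between the points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Calling haversine_km(0, 0, 0, 90) gives a quarter of the equator's length under the assumed radius. The formula treats the Earth as a perfect sphere, so results are approximations.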

Got data news?

Feel free to email me.


May 19 2011

Strata Week: A call for open science data

Here are some of the data stories that caught my attention this week:

Should scientists share their research data more openly?

London's Royal Society has launched a study, "Science as a public enterprise," which will examine "how scientific information should be managed to support innovative and productive research that reflects public values."

That statement points to two key ideas underlying the Royal Society's inquiry. First is the importance of public values and public trust in science. No longer can scientists just assume that people will defer to their authority, as the debates over climate change have demonstrated:

It is therefore important that science is not, and is not seen to be, a private enterprise, conducted behind the closed doors of laboratories, but a public enterprise to understand better the world we live in and our place in it. Effective dialogue about the priorities and insights of science and its relation to public values is vital. Scientists can no longer assume an unquestioning public trust.

The other aspect of the Royal Society inquiry involves reconsidering how science is practiced, particularly vis-à-vis open data. In an article in The Lancet, the members of the committee contend that scientific scholarship needs to do a better job of making data available: "Conventional peer-reviewed publications generally provide summaries of the available data, but not effective access to data in a useable format." There are calls to make data available to others, but the exponential growth in the volume and diversity of data makes accessibility a challenge.

Beyond the question of how scientists can make this data more available, there are questions about who should pay to do it; how scientists will handle the need for confidentiality, data security, intellectual property rights, and anonymization; and whether rules on this sort of scientific data sharing could apply globally.

OSCON Data 2011, being held July 25-27 in Portland, Ore., is a gathering for developers who are hands-on, doing the systems work and evolving architectures and tools to manage data. (This event is co-located with OSCON.)

Save 20% on registration with the code OS11RAD

Why is UK train departure data not open data?

Despite the British government's efforts to open its data, much is still unavailable. The UK-based location data startup Placr has written a blog post explaining why the country's rail data doesn't contain train departure information. The explanation, penned by Placr co-founder Jonathan Raper, demonstrates how complex open data efforts can be in terms of technology, bureaucracy, data ownership and control.

The short answer, says Raper, is that the Association of Train Operating Companies — the only group with a train departure information service and API — is a private organization that doesn't release open data. The API is available under a commercial license, however, and some free licenses are distributed.

The British rail system was privatized in the mid-1990s but it remains heavily subsidized by taxpayers. There's now some confusion, Raper suggests, about when and if Network Rail (the rail infrastructure owner) counts as a public sector organization, and in the case of rail data, who exactly owns it.

Hacking Tyler, Texas

Christopher Groskopf is back in The Atlantic with his second blog post about his Hack Tyler project. Groskopf is relocating to Tyler, Texas and is making the most of the move by focusing his developer efforts on the sizable amount of open data made available by Tyler's local government. Groskopf says he had no idea that the project would spark "an unexpected ruckus" from online readers and from Tyler residents.

Groskopf has already created a list of all the data sources he's been able to identify. "It's been heartening to see how much data actually is available (albeit often in less than ideal formats)," he writes. This data includes a real-time list of where the city's police officers are responding, and almost all of Smith County's financial documentation.

You can follow the adventures of Hack Tyler here.

A little light reading on MapReduce and Hadoop

If you're looking to brush up on MapReduce and Hadoop algorithms in your summer reading, then check out this updated list of academic papers. The list includes 35 papers published this year, as well as two new categories: social networking and astronomy.

Got data news?

Feel free to email me.

Photo: Bewdley Station by Dazzie D, on Flickr

