
February 21 2013

An update on in-memory data management

By Ben Lorica and Roger Magoulas

We wanted to give you a brief update on what we’ve learned so far from our series of interviews with players and practitioners in the in-memory data management space. A few preliminary themes have emerged, some expected, others surprising.

Performance improves as you put data as close to the computation as possible. We talked to people in systems, data management, web applications, and scientific computing who have embraced this concept. Some solutions go to the lowest level of the hardware (L1 and L2 cache). The next generation of SSDs will have latency closer to that of main memory, potentially blurring the distinction between storage and memory. Given performance and power-consumption considerations, we can imagine a future where systems are sized primarily by the amount of non-volatile memory* they deploy.

Putting data in-memory does not negate the importance of distributed computing environments. Data size and the ability to leverage parallel environments are frequently cited reasons. The same characteristics that make distributed environments compelling also apply to in-memory systems: fault tolerance and parallelism for performance. An additional consideration is the ability to gracefully spill over to disk when main memory is full.
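As a toy illustration of that last point (a sketch of the general idea, not any vendor's design), the two-tier store below keeps a bounded number of entries in memory and spills the least-recently-used entry to disk when the cap is exceeded, promoting entries back on access. It assumes pickle-able values and filename-safe keys:

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class SpilloverStore:
    """Toy key-value store: keeps up to `capacity` entries in memory,
    spilling the least-recently-used entry to a file on disk when full."""

    def __init__(self, capacity, spill_dir=None):
        self.capacity = capacity
        self.memory = OrderedDict()          # in-memory tier, in LRU order
        self.spill_dir = spill_dir or tempfile.mkdtemp()

    def _spill_path(self, key):
        return os.path.join(self.spill_dir, f"{key}.spill")

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)         # mark as most recently used
        while len(self.memory) > self.capacity:
            old_key, old_val = self.memory.popitem(last=False)
            with open(self._spill_path(old_key), "wb") as f:
                pickle.dump(old_val, f)      # graceful spillover to disk

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        with open(self._spill_path(key), "rb") as f:  # fall back to disk tier
            value = pickle.load(f)
        self.put(key, value)                 # promote back into memory
        return value

store = SpilloverStore(capacity=2)
store.put("a", 1)
store.put("b", 2)
store.put("c", 3)                            # "a" is spilled to disk
print(store.get("a"))                        # served from disk, then promoted
```

Real systems layer fault tolerance and parallelism on top of this basic tiering, but the spillover policy itself is the same shape: a capacity bound plus an eviction order.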

There is no general-purpose solution that can deliver optimal performance for all workloads. The drive for low latency requires different strategies depending on write or read intensity, fault tolerance, and consistency. Database vendors we talked with have different approaches for transactional and analytic workloads, in some cases integrating in-memory into existing or new products. People who specialize in write-intensive systems identify hot data (i.e., frequently accessed data) and keep it in memory.
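Identifying hot data can be as simple as counting accesses per key, a minimal sketch of which (the key names are hypothetical) might look like:

```python
from collections import Counter

class HotDataTracker:
    """Toy access tracker: counts reads per key and reports the 'hot set'
    of most frequently accessed keys, which a write-intensive system
    might choose to pin in memory."""

    def __init__(self):
        self.counts = Counter()

    def record_access(self, key):
        self.counts[key] += 1

    def hot_set(self, n):
        # The n most frequently accessed keys, hottest first.
        return [key for key, _ in self.counts.most_common(n)]

tracker = HotDataTracker()
for key in ["user:1", "user:2", "user:1", "user:3", "user:1", "user:2"]:
    tracker.record_access(key)

print(tracker.hot_set(2))   # ['user:1', 'user:2']
```

Production systems typically use approximate or decaying counters rather than exact counts, but the principle is the same: spend scarce memory on the keys the workload actually touches.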

Hadoop has emerged as an ingestion layer and the place to store data you might use later. The next layer identifies and extracts high-value data that can be stored in memory for low-latency interactive queries. Given the resource constraints of main memory, columnar stores that compress data become important: they speed I/O and fit more data in a limited space.
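A quick way to see why columnar layouts compress well is to serialize the same synthetic table row-wise and column-wise and compare compressed sizes. This sketch uses JSON and zlib purely for illustration (real columnar stores use binary encodings and per-column codecs):

```python
import json
import random
import zlib

random.seed(0)

# Synthetic table: 5,000 rows with mostly low-cardinality columns.
rows = [
    {"id": i,
     "status": random.choice(["active", "inactive"]),
     "country": random.choice(["US", "BR", "UK", "IT"])}
    for i in range(5000)
]

# Row-oriented layout: one record after another.
row_bytes = json.dumps(rows).encode()

# Column-oriented layout: each column stored contiguously, so similar
# values sit next to each other and compress into long repeated runs.
columns = {name: [row[name] for row in rows] for name in rows[0]}
col_bytes = json.dumps(columns).encode()

row_compressed = zlib.compress(row_bytes)
col_compressed = zlib.compress(col_bytes)

print(f"row layout:    {len(row_bytes)} -> {len(row_compressed)} bytes")
print(f"column layout: {len(col_bytes)} -> {len(col_compressed)} bytes")
```

The column layout wins twice: the raw serialization is smaller (column names are stored once, not per row), and the compressed form is smaller still because runs of identical values are easy for the compressor to exploit.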

While it may be difficult to make in-memory systems completely transparent, the people we talked with emphasized programming interfaces that are as simple as possible.

Our conversations to date have revealed a wide range of solutions and strategies. We remain excited about the topic, and we’re continuing our investigation. If you haven’t yet, feel free to reach out to us on Twitter (Ben is @BigData and Roger is @rogerm) or leave a comment on this post.

* By non-volatile memory we mean the next-generation SSDs. In the rest of the post “memory” refers to traditional volatile main memory.


December 21 2012

Six ways data journalism is making sense of the world, around the world

When I wrote that Radar was investigating data journalism and asked for your favorite examples of good work, we heard back from around the world.

I received emails from Los Angeles, Philadelphia, Canada and Italy that featured data visualization, explored the role of data in government accountability, and shared how open data can revolutionize environmental reporting. One tweet pointed me to a talk about how R is being used in the newsroom; another linked to relevant interviews on social science and the media.

Two of the case studies focused on data visualization, an important practice that my colleague Julie Steele and other editors at O’Reilly Media have been exploring over the past several years.

Several other responses are featured at more length below. After you read through, make sure to also check out this terrific Ignite talk on data journalism recorded at this year’s Newsfoo in Arizona.

Visualizing civic health

Meredith Broussard, a professor at the University of Pennsylvania, sent us a link to a recent data journalism project she did for Hidden City Philadelphia, which won an award from the National Council on Citizenship and the Knight Foundation. The project, measuring Philadelphia’s civic health, won honorable mention in Knight’s civic data challenge. Data visualization was a strong theme among the winners of that challenge.

Data journalism in Philadelphia

Mapping ambulance response times

I profiled the data journalism work of The Los Angeles Times earlier this year, when I interviewed news developer Ben Welsh about the newspaper’s Data Desk, a team of reporters and web developers that specializes in maps, databases, analysis and visualization.

Recently, the Data Desk made an interactive visualization that mapped how fast the Los Angeles Fire Department responds to calls.

LA Times fire response times

Visualizing UK government spending

The Guardian Datablog is one of the best sources of interesting, relevant data journalism work, from sports to popular culture to government accountability. Every post demonstrates an emerging practice: its editors make it possible for readers to download the data themselves. Earlier this month, the Datablog put government spending in the United Kingdom under the microscope and accompanied it with a downloadable graphic (PDF).

The Guardian’s data journalism is particularly important as the British government continues to invest in open data. In June, the United Kingdom’s Cabinet Office relaunched and released a new open data white paper. The British government is doubling down on the notion that open data can be a catalyst for increased government transparency, civic utility and economic prosperity. The role of data journalism in delivering those outcomes is central.

(Note: A separate Radar project is digging into the open data economy.)

An Italian data job

The Italian government, while a bit behind the pace set in the UK, has made more open data available since it launched a national platform in 2011.

Elisabetta Tola, an Italian data journalist, wrote in to share her work on a series of Wired Magazine articles that feature data on seismic risk assessment in Italian schools. The interactive lets parents search for schools, a feature that embodies service journalism and offers more value than a static map.

Italian schools and earthquakes visualization

Tola highlighted a key challenge in Italy that exists in many other places around the world: How can data journalism be practiced in countries that do not have a Freedom of Information Act or a tradition of transparency on government actions and spending? If you have ideas, please share them in the comments or email me.

Putting satellite imagery to work

Brazil, by way of contrast, notably passed a freedom of information law this past year, fulfilling one of its commitments to the Open Government Partnership.

Earlier this year, when I traveled to Brazil to moderate a panel at the historic partnership’s annual meeting, I met Gustavo Faleiros, a journalist working with open data who focuses on the Amazon rainforest. Faleiros is a Knight International Journalism Fellow, working in partnership with the Washington-based organizations International Center for Journalists and Internews. Today, Faleiros continues that work as the project coordinator of a beautiful mashup of open data, maps and storytelling.

Faleiros explained that the partnership is training Brazilian journalists to use satellite imagery and collect data related to forest fires and carbon monoxide. He shared a video showing a data visualization that came out of that work.

As 2012 comes to an end, the rate of Amazon deforestation has dropped to record lows. These tools help the world see what’s happening from high above.

Data-driven broadcast journalism?

I also heard about work in much colder climes when Keith Robinson wrote in from Canada. “As part of large broadcast organizations, one thing that is very satisfying about data journalism is that it often puts our digital staff in the driver’s seat — what starts as an online investigation often becomes the basis for original and exclusive broadcast content,” he wrote in an email.

Robinson, the senior producer for specials and interactive at Global News in Canada, highlighted several examples of their Data Desk’s work.

Robinson expects 2013 will see further investment and expansion in the data journalism practice at Global News.

Robinson also pointed to a practice that media should at least consider adopting: Global News is not only consuming and displaying open data, but also publishing the data they receive from the Canadian government. “As we make access to information requests, we’re trying to make the data received available to the public,” he wrote.

From the big picture to next steps

It was instructive to learn more about the work of two large media organizations, the Los Angeles Times and Canada’s Global News, which have been building their capacity to practice data journalism. The other international perspectives in my inbox and tweet stream, however, were a reminder that big-city newsrooms that can afford teams of programmers and designers aren’t the only players here.

To put it another way, acts of data journalism by small teams or individuals aren’t just plausible, they’re happening — from Italy to Brazil to Africa. That doesn’t mean that the news application teams at NPR, The Guardian, ProPublica or the New York Times aren’t setting the pace for data journalism when it comes to cutting edge work — far from it — but the tools and techniques to make something worthwhile are being democratized.

That’s possible in no small part because of the trend toward open source tools and social coding I’m seeing online, from Open Street Map to more open elections.

It’s a privilege to have a global network to tap into for knowledge and, in the best moments, wisdom. Thank you — and please keep the responses coming, whether you use email, Twitter or the phone. Your input is helping shape a report I’m developing that ties together our coverage of data journalism. Look for that to be published early in the new year.

