February 21 2013

An update on in-memory data management

By Ben Lorica and Roger Magoulas

We wanted to give you a brief update on what we’ve learned so far from our series of interviews with players and practitioners in the in-memory data management space. A few preliminary themes have emerged, some expected, others surprising.

Performance improves as you put data as close to the computation as possible. We talked to people in systems, data management, web applications, and scientific computing who have embraced this concept. Some solutions go down to the lowest level of the hardware (the L1 and L2 caches). The next generation of SSDs will have latency closer to that of main memory, potentially blurring the distinction between storage and memory. Given performance and power-consumption considerations, we can imagine a future where systems are sized primarily by the amount of non-volatile memory* they deploy.

Putting data in memory does not negate the importance of distributed computing environments. Data size and the ability to leverage parallel environments are frequently cited reasons. The same characteristics that make distributed environments compelling also apply to in-memory systems: fault tolerance, and parallelism for performance. An additional consideration is the ability to spill over gracefully to disk when main memory is full.
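
To make "spilling over to disk" concrete, here is a minimal sketch of a hypothetical key-value store (not any product we discussed) that keeps a bounded number of entries in RAM and writes the least recently used ones to files when that bound is exceeded. The class name `SpillingStore`, the `max_in_memory` limit, and the one-pickle-file-per-key layout are all illustrative assumptions; real systems spill in larger blocks and coordinate this with replication.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingStore:
    """Hypothetical store: keeps up to `max_in_memory` entries in RAM and
    spills the least recently used entries to disk when that limit is exceeded."""

    def __init__(self, max_in_memory, spill_dir=None):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="spill-")
        self._memory = OrderedDict()  # ordering tracks recency of use

    def _path(self, key):
        # Assumes keys are strings that are safe to use as file names.
        return os.path.join(self.spill_dir, f"{key}.pkl")

    def put(self, key, value):
        self._memory[key] = value
        self._memory.move_to_end(key)  # mark as most recently used
        # Drop any stale on-disk copy so memory holds the authoritative value.
        if os.path.exists(self._path(key)):
            os.remove(self._path(key))
        self._evict_if_needed()

    def get(self, key):
        if key in self._memory:
            self._memory.move_to_end(key)
            return self._memory[key]
        path = self._path(key)
        if not os.path.exists(path):
            raise KeyError(key)
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self.put(key, value)  # promote the entry back into memory
        return value

    def _evict_if_needed(self):
        # Spill least recently used entries until we are back under the limit.
        while len(self._memory) > self.max_in_memory:
            key, value = self._memory.popitem(last=False)
            with open(self._path(key), "wb") as f:
                pickle.dump(value, f)

    def in_memory_keys(self):
        return list(self._memory)
```

With `max_in_memory=2`, putting `"a"`, `"b"`, and `"c"` leaves `["b", "c"]` in memory and `"a"` on disk; a later `get("a")` still returns the value, at the cost of a disk read and the eviction of `"b"`.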

There is no general-purpose solution that delivers optimal performance for all workloads. The drive for low latency requires different strategies depending on read or write intensity, fault tolerance, and consistency requirements. The database vendors we talked with take different approaches for transactional and analytic workloads, in some cases integrating in-memory capabilities into existing or new products. People who specialize in write-intensive systems identify hot (i.e., frequently accessed) data and keep it in memory.
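
As a rough illustration of the "hot data" strategy, the sketch below counts accesses in a log, keeps the most frequently accessed keys in an in-memory tier, and absorbs writes to those keys in memory until an explicit flush (a write-back policy). The function and class names, the plain dict standing in for slower storage, and the write-back choice are all assumptions for illustration, not a description of any vendor's design.

```python
from collections import Counter


def pick_hot_keys(access_log, capacity):
    """Return the `capacity` most frequently accessed keys (the hot set).

    `access_log` is an iterable of keys, one entry per read or write.
    Counter.most_common orders equal counts by first appearance.
    """
    counts = Counter(access_log)
    return [key for key, _ in counts.most_common(capacity)]


class TieredStore:
    """Hypothetical two-tier store: hot keys live in a dict in RAM, everything
    else stays in `backing` (a dict standing in for disk or SSD)."""

    def __init__(self, backing, hot_keys):
        self.backing = backing
        self.hot = {k: backing[k] for k in hot_keys if k in backing}
        self.dirty = set()  # hot keys written since the last flush

    def get(self, key):
        if key in self.hot:
            return self.hot[key]  # fast path: served from memory
        return self.backing[key]  # slow path: e.g. a disk read

    def put(self, key, value):
        if key in self.hot:
            self.hot[key] = value  # absorb the write in memory
            self.dirty.add(key)    # persist it later, in flush()
        else:
            self.backing[key] = value  # cold keys are written straight through

    def flush(self):
        # Write modified hot entries back to the slower tier.
        for key in self.dirty:
            self.backing[key] = self.hot[key]
        self.dirty.clear()
```

For the access log `["u1", "u2", "u1", "u3", "u1", "u2"]` and a capacity of 2, `pick_hot_keys` selects `["u1", "u2"]`. Deferring writes to hot keys is what makes the memory tier pay off for write-heavy workloads; the tradeoff is that unflushed writes are lost on a crash unless they are also logged or replicated.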

Hadoop has emerged as an ingestion layer and the place to store data you might use. The next layer identifies and extracts high-value data that can be kept in memory for low-latency interactive queries. Because main memory is a constrained resource, using columnar stores to compress data becomes important: compression speeds I/O and fits more data into limited space.
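
Columnar layouts compress well because each column holds values of a single type that are often repetitive, especially when sorted. The sketch below pivots row records into columns and applies run-length encoding, one of the simplest column compression schemes. The helper names are hypothetical, and production column stores combine several encodings (dictionary, bit-packing, and others) rather than relying on this one alone.

```python
def to_columns(rows, fields):
    """Pivot a list of row dicts into one list of values per field."""
    return {f: [row[f] for row in rows] for f in fields}


def run_length_encode(values):
    """Compress a column into [value, run_length] pairs."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1  # extend the current run
        else:
            encoded.append([v, 1])  # start a new run
    return encoded


def run_length_decode(encoded):
    """Expand [value, run_length] pairs back into the original column."""
    return [v for v, n in encoded for _ in range(n)]


rows = [
    {"country": "US", "clicks": 3},
    {"country": "US", "clicks": 5},
    {"country": "US", "clicks": 2},
    {"country": "FR", "clicks": 7},
]
columns = to_columns(rows, ["country", "clicks"])
country_rle = run_length_encode(columns["country"])
```

Here the four-value `country` column shrinks to `[["US", 3], ["FR", 1]]`, while the `clicks` column, with no repeated neighbors, would not benefit; this is why column stores choose an encoding per column.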

While it may be difficult to make in-memory systems completely transparent to developers, the people we talked with emphasized programming interfaces that are as simple as possible.

Our conversations to date have revealed a wide range of solutions and strategies. We remain excited about the topic, and we’re continuing our investigation. If you haven’t yet, feel free to reach out to us on Twitter (Ben is @BigData and Roger is @rogerm) or leave a comment on this post.

* By non-volatile memory we mean the next-generation SSDs. In the rest of the post “memory” refers to traditional volatile main memory.

January 18 2013

Need speed for big data? Think in-memory data management

By Ben Lorica and Roger Magoulas

In a forthcoming report we will highlight technologies and solutions that take advantage of the decline in prices of RAM, the popularity of distributed and cloud computing systems, and the need for faster queries on large, distributed data stores. Established technology companies have had interesting offerings, but what initially caught our attention were open source projects that started gaining traction last year.

An example we frequently hear about is the demand for tools that support interactive query performance. Faster query response times translate into more engaged and productive analysts, and make real-time reports possible. Over the past two years, several in-memory solutions have emerged that deliver 5X-100X faster response times. A recent paper from Microsoft Research noted that even in this era of big data and Hadoop, many MapReduce jobs fit in the memory of a single server. To scale to extremely large datasets, several new systems use a combination of distributed computing (in-memory grids), compression, and (columnar) storage technologies.

Another interesting aspect of in-memory technologies is that they seem to be everywhere these days. We're looking at tools aimed at analysts (Tableau, Qlikview, Tibco Spotfire, Platfora), databases that target specific workloads or data types (VoltDB, SAP HANA, Hekaton, Redis, Druid, and Yarcdata), frameworks for analytics (Spark/Shark, GraphLab, GridGain, Asterix/Hyracks), and the data center (RAMCloud, memory locality).

We’ll be talking to companies and hackers to get a sense of how in-memory solutions fit into their planning. Along these lines, we would love to hear what you think about the rise of these technologies, as well as applications, companies and projects we should look at. Feel free to reach out to us on Twitter (Ben is @BigData and Roger is @rogerm) or leave a comment on this post.

Older posts are this way If this message doesn't go away, click anywhere on the page to continue loading posts.
Could not load more posts
Maybe Soup is currently being updated? I'll try again automatically in a few seconds...
Just a second, loading more posts...
You've reached the end.
(PRO)
No Soup for you

Don't be the product, buy the product!

close
YES, I want to SOUP ●UP for ...