48,398 research outputs found
Muppet: MapReduce-Style Processing of Fast Data
MapReduce has emerged as a popular method to process big data. In the past
few years, however, not just big data, but fast data has also exploded in
volume and availability. Examples of such data include sensor data streams, the
Twitter Firehose, and Facebook updates. Numerous applications must process fast
data. Can we provide a MapReduce-style framework so that developers can quickly
write such applications and execute them over a cluster of machines, to achieve
low latency and high scalability? In this paper we report on our investigation
of this question, as carried out at Kosmix and WalmartLabs. We describe
MapUpdate, a framework like MapReduce, but specifically developed for fast
data. We describe Muppet, our implementation of MapUpdate. Throughout the
description we highlight the key challenges, argue why MapReduce is not well
suited to address them, and briefly describe our current solutions. Finally, we
describe our experience and lessons learned with Muppet, which has been used
extensively at Kosmix and WalmartLabs to power a broad range of applications in
social media and e-commerce.Comment: VLDB201
The End of a Myth: Distributed Transactions Can Scale
The common wisdom is that distributed transactions do not scale. But what if
distributed transactions could be made scalable using the next generation of
networks and a redesign of distributed databases? There would be no need for
developers anymore to worry about co-partitioning schemes to achieve decent
performance. Application development would become easier as data placement
would no longer determine how scalable an application is. Hardware provisioning
would be simplified as the system administrator can expect a linear scale-out
when adding more machines rather than some complex sub-linear function, which
is highly application specific.
In this paper, we present the design of our novel scalable database system
NAM-DB and show that distributed transactions with the very common Snapshot
Isolation guarantee can indeed scale using the next generation of RDMA-enabled
network technology without any inherent bottlenecks. Our experiments with the
TPC-C benchmark show that our system scales linearly to over 6.5 million
new-order (14.5 million total) distributed transactions per second on 56
machines.Comment: 12 page
- …