
DataLab: Transactional Data-Parallel Computing on an Active Storage Cloud

By Brandon Rich and Douglas Thain


Active storage clouds are an attractive platform for executing the large data-intensive workloads found in many fields of science. However, active storage presents new system management challenges. A large system of fault-prone machines with local persistent state can easily degenerate into a mess of unreferenced data and runaway computations. To address this challenge, we advocate adapting the notion of distributed transactions from traditional databases. We demonstrate the use of distributed transactions in the context of DataLab, a software system for executing data-parallel workloads on active storage clouds. We detail the underlying capabilities required from each node, explain how transactions are coordinated, and demonstrate the robust scaling of the system to 250 nodes while running an image processing application.

Year: 2008
OAI identifier: oai:CiteSeerX.psu:
Provided by: CiteSeerX