79,763 research outputs found
Block-Based Distributed File Systems
Distributed file systems have become popular because they allow information to be shared be between computers in a natural way. A distributed file system often forms a central building block in a distributed system.
Currently most distributed file systems are built using a communications interface that transfers messages about files between machines. This thesis proposes a different, lower level, communications interface. This `block-based' interface exchanges information about the blocks that make up the file but not about the files themselves. No other distributed file system is built this way.
By demonstrating that a distributed file system can be implemented in a block-based manner, this thesis opens the way for many advances in distributed file systems. These include a reduction of the processing required at the server, uniformity in managing file blocks and fine-grained placement and replication of data. The simple communications model also lends itself to efficient implementation both at the server and in the communications protocols that support the interface. These advantages come at the cost of a more complex client implementation and the need for a lower level consistency mechanism.
A block-based distributed file system (BB-NFS) has been implemented. BB-NFS provides the Unix file system interface and demonstrates the feasibility and implementability of the block-based approach. Experience with the implementation lead to the development of a lock cache mechanism which gives a large improvement in the performance of the prototype. Although it has not been directly measured it is plausible that the prototype will perform better than the file based approach.
The block-based approach has much to offer future distributed file system developers. This thesis introduces the approach and its advantages, demonstrates its feasibility and shows that it can be implemented in a way that performs well
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
- …