
    Block-Based Distributed File Systems

    Distributed file systems have become popular because they allow information to be shared between computers in a natural way. A distributed file system often forms a central building block in a distributed system. Currently, most distributed file systems are built using a communications interface that transfers messages about files between machines. This thesis proposes a different, lower-level communications interface. This "block-based" interface exchanges information about the blocks that make up the file but not about the files themselves. No other distributed file system is built this way. By demonstrating that a distributed file system can be implemented in a block-based manner, this thesis opens the way for many advances in distributed file systems. These include a reduction of the processing required at the server, uniformity in managing file blocks, and fine-grained placement and replication of data. The simple communications model also lends itself to efficient implementation both at the server and in the communications protocols that support the interface. These advantages come at the cost of a more complex client implementation and the need for a lower-level consistency mechanism. A block-based distributed file system (BB-NFS) has been implemented. BB-NFS provides the Unix file system interface and demonstrates the feasibility and implementability of the block-based approach. Experience with the implementation led to the development of a lock cache mechanism, which gives a large improvement in the performance of the prototype. Although it has not been directly measured, it is plausible that the prototype will perform better than the file-based approach. The block-based approach has much to offer future distributed file system developers. This thesis introduces the approach and its advantages, demonstrates its feasibility, and shows that it can be implemented in a way that performs well.

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations, which many follow-up research efforts have tackled since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author