
    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architectures and large-scale data processing mechanisms. MapReduce is a simple yet powerful programming model that enables easy development of scalable parallel applications that process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up work since its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover a set of systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework but target different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions. Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
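    The model the abstract describes can be sketched in a few lines: the user supplies only a map function and a reduce function, while the framework handles distribution, shuffling, and fault tolerance. The following is a minimal, single-machine illustration using word count; the helper names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical and not from any specific framework, and the shuffle/group step that a real cluster performs across machines is simulated locally.

    ```python
    from collections import defaultdict

    def map_fn(document):
        """Map: emit a (word, 1) pair for every word in one input document."""
        for word in document.split():
            yield (word.lower(), 1)

    def reduce_fn(word, counts):
        """Reduce: combine all partial counts for a single key."""
        return (word, sum(counts))

    def run_mapreduce(documents):
        # Shuffle phase (simulated): group intermediate pairs by key,
        # as the framework would do across the cluster.
        groups = defaultdict(list)
        for doc in documents:
            for key, value in map_fn(doc):
                groups[key].append(value)
        # Reduce phase: one call per distinct key.
        return dict(reduce_fn(k, v) for k, v in groups.items())

    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(run_mapreduce(docs))  # e.g. {'the': 3, 'quick': 1, 'fox': 2, ...}
    ```

    The point of the model is that only `map_fn` and `reduce_fn` are application code; everything else in a real deployment (partitioning input, scheduling tasks, re-running failed workers) is the framework's responsibility.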

    Big Data Management for Cloud-Enabled Geological Information Services


    Open Source Big Data Platforms and Tools: An Analysis

    Big data is attracting an enormous amount of interest in the IT and academic sectors. On a regular basis, computer and digital industries generate more data than they have space to store. At present, five billion people have their own mobile phone, and over two billion people are connected globally to exchange various types of data. By 2020, it was estimated that about fifty billion devices would be connected to the internet, and that data generation, use, and sharing would be forty-four times higher than in previous years. A variety of sectors and organizations are using big data to manage various operations. As a result, a thorough examination of big data's benefits, drawbacks, meaning, and characteristics is needed. The primary goal of this research is to gather information on the various open-source big data tools and platforms that are used by different organizations. In this paper we use a three-perspective methodology to identify the strengths and weaknesses of workflows in the open-source big data arena. This helps to establish a pipeline of workflow events for both researcher and entrepreneur decision making.