2,583 research outputs found
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program such as issues on data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
research efforts in several followup works after its introduction. This article
provides a comprehensive survey for a family of approaches and mechanisms of
large scale data processing mechanisms that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
introduced systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large scale data processing systems that resemble some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
Open Source Big Data Platforms and Tools: An Analysis
Big data is attracting an excessive amount of interest in the IT and academic sectors. On a regular basis, computer and digital industries generate more data than they have space to store. In the current situation, five billion people have their own mobile phone, and over two billion people are linked globally to exchange various types of data. By 2020, it is estimated that about fifty billion people will be connected to the internet. During2020, data generation, use, and sharing would be forty-four times higher than in previous years. A variety of sectors and organizations are using big data to manage various operations. As a result, a thorough examination of big data's benefits, drawbacks, meaning, and characteristics is needed. The primary goal of this research is to gather information on the various open-source big data tools and platforms that are used by various organizations. In this paper we use a three perspective methodology to identify the strength and weaknesses of the workflow in a open source big data arena. This helps to establish a pipeline of workflow events for both researcher and entrepreneur decision making
- …