3 research outputs found

    A Comparative Performance Evaluation of Hive and Map Reduce for Big-Data

    Advances in data storage and mining technologies make it possible to preserve increasing amounts of data generated directly or indirectly by users and to analyze it to yield valuable new insights. Big data can reveal people's hidden behavioral patterns and even shed light on their intentions. More precisely, it can bridge the gap between what people want to do and what they actually do, and how they interact with others and their environment. This information is useful to government agencies as well as private companies to support decision making in areas ranging from law enforcement to social services to homeland security. One of the efficient technologies that deal with Big Data is Hadoop, which is discussed in this paper. Hadoop uses the MapReduce programming model to process jobs over large data volumes. Hadoop makes use of different schedulers for executing jobs in parallel. The default scheduler is the FIFO (First In First Out) Scheduler. Other schedulers with priority, pre-emption and non-pre-emption options have also been developed. Over time, MapReduce has reached some of its limitations. To overcome these limitations, the next generation of MapReduce has been developed, called YARN (Yet Another Resource Negotiator). This paper therefore gives a survey of Hadoop, a few of the scheduling techniques it uses, and a brief introduction to YARN. Keywords: Big-Data, Hive, Map Reduc

    The Impact of Capacity Scheduler Configuration Settings on MapReduce Jobs

    MapReduce is a parallel programming paradigm used for processing huge datasets on certain classes of distributable problems using a cluster. Budgetary constraints and the need for better usage of resources in a MapReduce cluster often influence an organization to rent or share hardware resources for their main data processing and analysis tasks. Thus, there may be many competing jobs from different clients performing simultaneous requests to the MapReduce framework on a particular cluster. Schedulers like Fair Share and Capacity have been specially designed for such purposes. Administrators and users run into performance problems, however, because they do not know the exact meaning of different task scheduler settings and what impact they can have with respect to the application execution time and resource allocation policy decisions. Existing work shows that the performance of MapReduce jobs depends on the cluster configuration, input data type and job configuration settings. However, that work fails to take into account the task scheduler settings. We show, through experimental evaluation, that task scheduler configuration parameters make a significant difference to the performance of the cluster and it is important to understand the influence of such parameters. Based on our findings, we also identified some of the open issues in the existing area of research. Keywords: MapReduce, Task Scheduler, Performance I

    Simulation and Performance Evaluation of Hadoop Capacity Scheduler

    MapReduce is a parallel programming paradigm used for processing huge datasets on certain classes of distributable problems using a cluster. Budgetary constraints and the need for better usage of resources in a MapReduce cluster often make organizations rent or share hardware resources for their main data processing and analysis tasks. Thus, there may be many competing jobs from different clients performing simultaneous requests to the MapReduce framework on a particular cluster. Schedulers like Fair Share and Capacity have been specially designed for such purposes. Administrators and users run into performance problems, however, because they do not know the exact meaning of different task scheduler settings and what impact they can have with respect to the resource allocation scheme across organizations for a shared MapReduce cluster. In this work, Capacity Scheduler is integrated into an existing MRPERF simulator to predict the performance of MapReduce jobs in a shared cluster under different settings for Capacity Scheduler. A few case studies on the behaviour of Capacity Scheduler across different job patterns etc., using the integrated simulator, are also conducted.