4 research outputs found

    On Modeling Dependency between MapReduce Configuration Parameters and Total Execution Time

    In this paper, we propose an analytical method to model the dependency between configuration parameters and the total execution time of MapReduce applications. Our approach has three key phases: profiling, modeling, and prediction. In profiling, an application is run several times with different sets of MapReduce configuration parameters to profile its execution time on a given platform. In modeling, the relation between these parameters and the total execution time is modeled by multivariate linear regression. Of the possible configuration parameters, this study uses two: the number of Mappers and the number of Reducers. Two standard applications (WordCount and Exim Mainlog parsing) are used to evaluate the technique on a 4-node MapReduce platform.
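The profiling, modeling, and prediction phases described in this abstract can be illustrated with a minimal sketch. It assumes a first-order model, `time ≈ b0 + b1*mappers + b2*reducers`, fitted by ordinary least squares; the abstract does not state the exact form of the regression terms, so this form is an assumption. The names `fit_linear`, `predict`, and `profile_runs` are hypothetical, and the profile data is synthetic (generated from a known plane so the fit can be checked), not measurements from the paper.

```python
def fit_linear(samples):
    """Least-squares fit of t ~ b0 + b1*mappers + b2*reducers.

    samples: iterable of (mappers, reducers, execution_time) tuples.
    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    rows = [(1.0, float(m), float(r)) for m, r, _ in samples]
    ys = [float(t) for _, _, t in samples]
    n = 3
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(n)]
        + [sum(row[i] * y for row, y in zip(rows, ys))]
        for i in range(n)
    ]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("profile runs do not determine a unique fit")
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * y for x, y in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coeffs, mappers, reducers):
    """Predicted execution time for an unseen (mappers, reducers) setting."""
    b0, b1, b2 = coeffs
    return b0 + b1 * mappers + b2 * reducers


# Profiling phase: SYNTHETIC runs generated from t = 100 + 2*mappers + 5*reducers.
# These are illustrative values, not results from the paper.
profile_runs = [
    (4, 2, 118), (8, 2, 126), (4, 4, 128),
    (8, 4, 136), (16, 8, 172), (20, 5, 165),
]

coeffs = fit_linear(profile_runs)        # modeling phase
estimate = predict(coeffs, 12, 6)        # prediction phase
print(coeffs, estimate)
```

With real profiling data the fit would not be exact, and the residuals would indicate how well a linear form describes the application on that platform.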

    Thesis Report: Resource Utilization Provisioning in MapReduce

    In this thesis report, we survey state-of-the-art methods for modelling the resource utilization of MapReduce applications with regard to their configuration parameters. After implementing one of the algorithms from the literature, we investigate whether a CPU usage model of one MapReduce application can be used to predict the CPU usage of another MapReduce application.

    Network Load Analysis and Provisioning of MapReduce Applications

    In this paper, we study the dependency between configuration parameters and the network load of fixed-size MapReduce applications in the shuffle phase, and then propose an analytical method to model this dependency. Our approach consists of three key phases: profiling, modeling, and prediction. In the first phase, an application is run several times with different sets of MapReduce configuration parameters (here, the number of mappers and the number of reducers) to profile the network load of the application in the shuffle phase on a given cluster. Then, the relation between these parameters and the network load is modeled by multivariate linear regression. Three applications (WordCount, Exim Mainlog parsing, and TeraSort) are used to evaluate the technique on a 4-node private MapReduce cluster.
    Comment: 6 pages; submitted to The Thirteenth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT-12), Beijing, Chin

    A Study on Using Uncertain Time Series Matching Algorithms in MapReduce Applications

    In this paper, we study the CPU utilization time patterns of several MapReduce applications. After the running patterns of several applications are extracted, the patterns and their statistical information are saved in a reference database, to be used later to tweak system parameters so that unknown applications can be executed efficiently. To achieve this goal, the CPU utilization patterns of new applications, along with their statistical information, are compared with the known ones in the reference database to predict their most probable execution patterns. Because the patterns differ in length, Dynamic Time Warping (DTW) is used for the comparison; a statistical analysis is then applied to the DTW outcomes to select the most suitable candidates. Moreover, under a hypothesis, another algorithm is proposed to classify applications with similar CPU utilization patterns. Three widely used text processing applications (WordCount, Distributed Grep, and TeraSort) and another application (Exim Mainlog parsing) are used to evaluate our hypothesis of tweaking system parameters when executing similar applications. The results were promising and showed the effectiveness of our approach on a 5-node MapReduce platform.
    Comment: 12 pages; a version has been accepted to the journal "Concurrency and Computation: Practice and Experience"; available online from the University of Sydney at http://www.nicta.com.au/pub?doc=474
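The DTW comparison mentioned in this abstract can be sketched as follows. This is the textbook dynamic-programming formulation with an absolute-difference cost, offered as an assumption about the general technique rather than the authors' implementation; the statistical analysis applied to the DTW outcomes is not reproduced. The names `dtw_distance`, `closest_pattern`, `references`, and the application labels are hypothetical, and the utilization values are made up for illustration.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance.

    Uses |x - y| as the local cost and allows match, insertion, and
    deletion steps, so sequences of different lengths can be aligned.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def closest_pattern(query, references):
    """Name of the reference pattern with the smallest DTW distance to query."""
    return min(references, key=lambda name: dtw_distance(query, references[name]))


# Hypothetical reference database of CPU utilization patterns (percent).
references = {
    "app_a": [10, 50, 90, 50, 10],
    "app_b": [80, 80, 20, 20, 80, 80],
}

# A new pattern that is a time-stretched version of app_a.
query = [10, 10, 50, 90, 90, 50, 10]
print(closest_pattern(query, references))
```

Because DTW warps the time axis, the stretched query aligns with `app_a` at zero cost even though the two sequences differ in length, which is the property the abstract relies on.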