4 research outputs found
On Modeling Dependency between MapReduce Configuration Parameters and Total Execution Time
In this paper, we propose an analytical method to model the dependency
between configuration parameters and total execution time of Map-Reduce
applications. Our approach has three key phases: profiling, modeling, and
prediction. In profiling, an application is run several times with different
sets of MapReduce configuration parameters to profile the execution time of the
application on a given platform. Then in modeling, the relation between these
parameters and total execution time is modeled by multivariate linear
regression. Among the possible configuration parameters, two main parameters
have been used in this study: the number of Mappers, and the number of
Reducers. For evaluation, two standard applications (WordCount and Exim
Mainlog parsing) are used to evaluate our technique on a 4-node MapReduce
platform.
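The profiling, modeling, and prediction phases described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a first-order model t = b0 + b1*mappers + b2*reducers (the abstract only says "multivariate linear regression"), the function names `fit_linear` and `predict` are hypothetical, and the profiling data is synthetic rather than measured.

```python
def fit_linear(samples):
    """Least-squares fit of t = b0 + b1*mappers + b2*reducers.

    samples: list of (mappers, reducers, seconds) tuples, one per profiling run.
    Returns the coefficients [b0, b1, b2].
    """
    rows = [(1.0, float(m), float(r)) for m, r, _ in samples]
    ys = [float(t) for _, _, t in samples]
    n = 3
    # Normal equations (X^T X) b = X^T y, stored as an augmented 3x4 matrix.
    a = [[sum(row[i] * row[j] for row in rows) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(rows, ys))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("profiling runs do not vary enough to fit the model")
        a[col], a[pivot] = a[pivot], a[col]
        for k in range(n):
            if k != col:
                factor = a[k][col] / a[col][col]
                a[k] = [x - factor * p for x, p in zip(a[k], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(coeffs, mappers, reducers):
    """Predicted execution time for an unprofiled (mappers, reducers) setting."""
    b0, b1, b2 = coeffs
    return b0 + b1 * mappers + b2 * reducers


# Profiling phase: SYNTHETIC data generated from t = 100 + 2*m + 5*r,
# standing in for measured run times (these are not results from the paper).
profile = [(m, r, 100 + 2 * m + 5 * r) for m in (4, 8, 16, 32) for r in (2, 4, 8)]

# Modeling phase: fit the regression coefficients.
coeffs = fit_linear(profile)

# Prediction phase: estimate the time for a setting that was not profiled.
estimate = predict(coeffs, 24, 6)
```

With real measurements the fit would not be exact, and the residuals would indicate whether a first-order model is adequate for the application and platform.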
Thesis Report: Resource Utilization Provisioning in MapReduce
In this thesis report, we survey state-of-the-art methods for
modelling the resource utilization of MapReduce applications with regard to
their configuration parameters. After implementing one of the algorithms from
the literature, we investigate whether a CPU usage model of one MapReduce
application can be used to predict the CPU usage of another MapReduce application.
Network Load Analysis and Provisioning of MapReduce Applications
In this paper, we study the dependency between configuration parameters and
network load of fixed-size MapReduce applications in the shuffle phase and then
propose an analytical method to model this dependency. Our approach consists of
three key phases: profiling, modeling, and prediction. In the first stage, an
application is run several times with different sets of MapReduce configuration
parameters (here, the number of mappers and the number of reducers) to profile the
network load of the application in the shuffle phase on a given cluster. Then,
the relation between these parameters and the network load is modeled by
multivariate linear regression. For evaluation, three applications (WordCount,
Exim Mainlog parsing, and TeraSort) are utilized to evaluate our technique on a
4-node MapReduce private cluster.
Comment: 6 pages; submitted to The Thirteenth International Conference on
Parallel and Distributed Computing, Applications and Technologies (PDCAT-12),
Beijing, China
A Study on Using Uncertain Time Series Matching Algorithms in MapReduce Applications
In this paper, we study CPU utilization time patterns of several Map-Reduce
applications. After extracting running patterns of several applications, the
patterns with their statistical information are saved in a reference database
to be used later to tweak system parameters so that unknown applications can
be executed efficiently in the future. To achieve this goal, the CPU
utilization patterns of new applications, along with their statistical
information, are compared with the already known ones in the reference
database to find or predict their most probable execution patterns. Because
the patterns differ in length, Dynamic Time Warping (DTW) is used for the
comparison; a statistical analysis is then applied to the DTW outcomes to
select the most suitable candidates.
Moreover, under a hypothesis, another algorithm is proposed to classify
applications with similar CPU utilization patterns. Three widely used text
processing applications (WordCount, Distributed Grep, and TeraSort) and another
application (Exim Mainlog parsing) are used to evaluate our hypothesis about
tweaking system parameters when executing similar applications. The results
were promising and showed the effectiveness of our approach on a 5-node
MapReduce platform.
Comment: 12 pages; a version has been accepted to the journal "Concurrency and
Computation: Practice and Experience"; available online from the University
of Sydney at http://www.nicta.com.au/pub?doc=474
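The pattern comparison described in this abstract can be illustrated with a small sketch of classic Dynamic Time Warping. It covers only the plain DTW distance and a nearest-pattern lookup. The statistical analysis of DTW outcomes and the classification algorithm mentioned in the abstract are not reproduced here. The names `dtw_distance` and `closest_pattern`, the entries `app_a` and `app_b`, and all CPU utilization values are hypothetical and chosen for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences of any lengths."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match with b[j-1]
                                 cost[i][j - 1],      # b[j-1] repeats a match with a[i-1]
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def closest_pattern(query, reference_db):
    """Name of the reference pattern with the smallest DTW distance to query."""
    return min(reference_db, key=lambda name: dtw_distance(query, reference_db[name]))


# SYNTHETIC CPU utilization patterns (percent per sampling interval).
reference_db = {
    "app_a": [10, 50, 90, 50, 10],
    "app_b": [80, 80, 20, 20, 80],
}

# A time-stretched version of app_a: same shape, different length.
query = [10, 10, 50, 90, 90, 50, 10]
best = closest_pattern(query, reference_db)
```

Because DTW allows one sample to align with several samples of the other sequence, the stretched query still matches `app_a` despite the length difference, which is the property the abstract relies on.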