Analytical composite performance models for Big Data applications
In the era of Big Data, in which the digital industry faces massive growth in data size
and the development of data-intensive software, more and more companies are adopting
new frameworks and paradigms capable of handling data at scale. The prominent MapReduce
(MR) paradigm and its implementation framework, Hadoop, are among the most widely
referenced, and form the basis for later and more advanced frameworks such as Tez and Spark.
Accurate prediction of the execution time of a Big Data application helps improve design-time
decisions, reduces over-allocation charges, and assists budget management. In this regard, we
propose analytical models based on Stochastic Activity Networks (SANs) to accurately
model the execution of MR, Tez and Spark applications in Hadoop environments governed
by the YARN Capacity scheduler. We evaluate the accuracy of the proposed models over the
TPC-DS industry benchmark across different configurations. Results obtained by numerically
solving the analytical SAN models show an average error of 6% in estimating the execution
time of an application compared to data gathered from experiments. Moreover, the
model evaluation time is lower than the simulation time of state-of-the-art solutions.