Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs
Declarative large-scale machine learning (ML) aims at the specification of ML
algorithms in a high-level language and automatic generation of hybrid runtime
execution plans ranging from single node, in-memory computations to distributed
computations on MapReduce (MR) or similar frameworks like Spark. The
compilation of large-scale ML programs exhibits many opportunities for
automatic optimization. Advanced cost-based optimization techniques
require, as a fundamental precondition, an accurate cost model for evaluating
the impact of optimization decisions. In this paper, we share insights into a
simple and robust yet accurate technique for costing alternative runtime
execution plans of ML programs. Our cost model relies on generating and costing
runtime plans in order to automatically reflect all successive optimization
phases. Costing runtime plans also captures control flow structures such as
loops and branches, and a variety of cost factors like IO, latency, and
computation costs. Finally, we linearize all these cost factors into a single
measure of expected execution time. Within SystemML, this cost model is
leveraged by several advanced optimizers, such as those for resource optimization and global
data flow optimization. We share our lessons learned in order to provide
foundations for the optimization of ML programs.
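To make the idea of costing a generated runtime plan more concrete, here is a minimal sketch, not SystemML's actual cost model or API. It assumes a toy plan made of instructions, loops, and if/else branches. Each instruction carries three raw cost factors (IO bytes, fixed latency, floating-point operations), and these are linearized into seconds using hypothetical hardware constants. Control flow is handled by recursing over the plan: a loop multiplies its body cost by the iteration count, and a branch combines the costs of its two sides. All names (`Instruction`, `Loop`, `Branch`, `plan_time`), the constants, the default iteration count for unknown loop bounds, and the equal-probability treatment of branches are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical hardware constants (assumptions, not SystemML's values).
READ_BANDWIDTH = 150e6   # bytes per second for reads/writes
PEAK_FLOPS = 1e9         # floating-point operations per second
DEFAULT_ITERATIONS = 10  # assumed iteration count when a loop bound is unknown


@dataclass
class Instruction:
    """A single runtime instruction with its raw cost factors."""
    name: str
    io_bytes: float = 0.0   # bytes read or written
    latency_s: float = 0.0  # fixed overhead, e.g. job or task startup
    flops: float = 0.0      # floating-point operations


@dataclass
class Loop:
    """A loop block; iterations=None means the bound is unknown (e.g. a while loop)."""
    body: List["Node"]
    iterations: Optional[int] = None


@dataclass
class Branch:
    """An if/else block; the else side may be empty."""
    if_body: List["Node"]
    else_body: List["Node"] = field(default_factory=list)


Node = Union[Instruction, Loop, Branch]


def instruction_time(inst: Instruction) -> float:
    """Linearize IO, latency, and compute cost into one time estimate (seconds)."""
    return inst.io_bytes / READ_BANDWIDTH + inst.latency_s + inst.flops / PEAK_FLOPS


def plan_time(nodes: List[Node]) -> float:
    """Recursively estimate the execution time of a plan, including control flow."""
    total = 0.0
    for node in nodes:
        if isinstance(node, Instruction):
            total += instruction_time(node)
        elif isinstance(node, Loop):
            n = node.iterations if node.iterations is not None else DEFAULT_ITERATIONS
            total += n * plan_time(node.body)
        elif isinstance(node, Branch):
            # Assumption: both branches are equally likely, so average their costs.
            total += 0.5 * (plan_time(node.if_body) + plan_time(node.else_body))
        else:
            raise TypeError(f"unknown plan node: {node!r}")
    return total


# Usage: read a 1.5 GB input, run 20 iterations of a matrix-vector multiply that
# sometimes writes an intermediate, then pay a fixed job-startup latency.
plan = [
    Instruction("read X", io_bytes=1.5e9),
    Loop(
        iterations=20,
        body=[
            Instruction("matvec", flops=2e9),
            Branch(if_body=[Instruction("write tmp", io_bytes=3e8)]),
        ],
    ),
    Instruction("job startup", latency_s=20.0),
]
print(plan_time(plan))  # → 90.0
```

The recursion mirrors the point made above: because the plan itself is costed, loops and branches contribute naturally, and the final answer is a single expected-time figure that optimizers can compare across alternative plans. Averaging branches is just one simple choice; a worst-case (maximum) rule would be an equally reasonable assumption for a conservative estimate.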