Interpreted Formalisms for Configurations
Imprecise and incomplete specification of system \textit{configurations}
threatens safety, security, functionality, and other critical system properties
and uselessly enlarges the configuration spaces to be searched by configuration
engineers and auto-tuners. To address these problems, this paper introduces
\textit{interpreted formalisms based on real-world types for configurations}.
Configuration values are lifted to values of real-world types, which we
formalize as \textit{subset types} in Coq. Values of these types are dependent
pairs whose components are values of underlying Coq types and proofs of
additional properties about them. Real-world types both extend and further
constrain \textit{machine-level} configurations, enabling richer, proof-based
checking of their consistency with real-world constraints. Tactic-based proof
scripts are written once to automate the construction of proofs, if proofs
exist, for configuration fields and whole configurations. \textit{Failures to
prove} reveal real-world type errors. Evaluation is based on a case study of
combinatorial optimization of Hadoop performance by meta-heuristic search over
Hadoop configuration spaces.
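The real-world types described above can be illustrated as subset types: dependent pairs of an underlying value and a proof of an additional property. The paper formalizes these in Coq; the sketch below uses Lean 4 as a stand-in, and the type name, parameter, and bounds (`HeapMB`, 512, 32768) are hypothetical examples, not taken from the paper.

```lean
-- A hypothetical real-world type for a Hadoop-style heap setting in MB,
-- expressed as a subset type: a Nat paired with a proof it is in range.
def HeapMB := {n : Nat // 512 ≤ n ∧ n ≤ 32768}

-- Constructing a value requires discharging the proof obligation;
-- `by decide` plays the role of the paper's tactic-based proof scripts.
def goodHeap : HeapMB := ⟨2048, by decide⟩

-- An out-of-range value fails elaboration: this "failure to prove"
-- is what reveals a real-world type error.
-- def badHeap : HeapMB := ⟨64, by decide⟩   -- proof fails
```

The machine-level value (`2048`) is still available as the first component of the pair, so lifting a configuration to its real-world type adds checking without discarding the raw value.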
ConEx: Efficient Exploration of Big-Data System Configurations for Better Performance
Configuration space complexity makes big-data software systems hard to
configure well. Consider Hadoop: with over nine hundred parameters, developers
often just use the default configurations provided with Hadoop distributions.
The opportunity costs in lost performance are significant. Popular
learning-based approaches to auto-tuning software do not scale well for
big-data systems because of the high cost of collecting training data. We
present a new method based on a combination of Evolutionary Markov Chain Monte
Carlo (EMCMC) sampling and cost reduction techniques to cost-effectively find
better-performing configurations for big data systems. For cost reduction, we
developed and experimentally validated two approaches: using scaled-down
big-data jobs as proxies for the objective function on larger jobs, and
using a dynamic job similarity measure to infer that results obtained for
one kind of big data problem will work well for similar problems. Our
experimental results suggest that our approach significantly improves the
performance of big-data systems and that it outperforms competing approaches
based on random sampling, basic genetic algorithms (GA), and predictive model
learning, demonstrating strong potential to cost-effectively tune such
systems.
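The EMCMC sampler described above combines genetic-style recombination with Metropolis-style acceptance. The following is a minimal sketch of that combination, not the paper's implementation: the two-parameter configuration space, the synthetic `runtime` objective, and all tuning constants are hypothetical stand-ins for measured Hadoop job runtimes.

```python
import math
import random

random.seed(0)

# Hypothetical objective: lower "runtime" is better. In the paper this is a
# measured Hadoop job runtime over a real configuration; here it is a
# synthetic function of two numeric parameters with minimum value 1.0.
def runtime(cfg):
    x, y = cfg
    return (x - 3.0) ** 2 + (y + 1.0) ** 2 + 1.0

def mutate(cfg, scale=0.5):
    # Gaussian perturbation of each parameter (the MCMC proposal step).
    return tuple(v + random.gauss(0, scale) for v in cfg)

def crossover(a, b):
    # Uniform crossover: each parameter taken from either parent
    # (the evolutionary step that distinguishes EMCMC from plain MCMC).
    return tuple(random.choice(pair) for pair in zip(a, b))

def accept(old_cost, new_cost, temp=1.0):
    # Metropolis rule: always accept improvements, occasionally accept
    # worse moves to keep exploring the configuration space.
    if new_cost <= old_cost:
        return True
    return random.random() < math.exp((old_cost - new_cost) / temp)

def emcmc(pop_size=8, steps=200):
    pop = [(random.uniform(-5, 5), random.uniform(-5, 5))
           for _ in range(pop_size)]
    for _ in range(steps):
        for i in range(pop_size):
            mate = random.choice(pop)
            proposal = mutate(crossover(pop[i], mate))
            if accept(runtime(pop[i]), runtime(proposal)):
                pop[i] = proposal
    return min(pop, key=runtime)

best = emcmc()
print(runtime(best))
```

The key design point is that each population member runs its own Metropolis chain while crossover shares information between chains, which is what lets the sampler escape poor regions faster than independent random sampling or a basic GA.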