Search CORE

4 research outputs found

Whence to Learn? Transferring Knowledge in Configurable Systems using BEETLE

Author: Jamshidi Pooyan
Krishna Rahul
Menzies Tim
Nair Vivek
Publication venue
Publication date: 25/03/2020
Field of study

As software systems grow in complexity and the space of possible configurations increases exponentially, finding the near-optimal configuration of a software system becomes challenging. Recent approaches address this challenge by learning performance models based on a sample set of configurations. However, collecting enough sample configurations can be very expensive since each such sample requires configuring, compiling, and executing the entire system using a complex test suite. When learning on new data is too expensive, it is possible to use \textit{Transfer Learning} to "transfer" old lessons to the new context. Traditional transfer learning has a number of challenges, specifically, (a) learning from excessive data takes excessive time, and (b) the performance of the models built via transfer can deteriorate as a result of learning from a poor source. To resolve these problems, we propose a novel transfer learning framework called BEETLE, which is a "bellwether"-based transfer learner that focuses on identifying and learning from the most relevant source from amongst the old data. This paper evaluates BEETLE with 57 different software configuration problems based on five software systems (a video encoder, an SAT solver, a SQL database, a high-performance C-compiler, and a streaming data analytics tool). In each of these cases, BEETLE found configurations that are as good as or better than those found by other state-of-the-art transfer learners while requiring only a fraction (

\frac{1}{7}

th) of the measurements needed by those other methods. Based on these results, we say that BEETLE is a new high-water mark in optimally configuring software.Comment: Accepted, to appear in IEEE TSE. arXiv admin note: text overlap with arXiv:1803.0390

arXiv.org e-Print Archive

Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?

Author: Agrawal Amritanshu
Agrawal Rishabh
Menzies Tim
Shen Xipeng
Yang Xueqi
Publication venue
Publication date: 13/08/2020
Field of study

How to make software analytics simpler and faster? One method is to match the complexity of analysis to the intrinsic complexity of the data being explored. For example, hyperparameter optimizers find the control settings for data miners that improve for improving the predictions generated via software analytics. Sometimes, very fast hyperparameter optimization can be achieved by just DODGE-ing away from things tried before. But when is it wise to use DODGE and when must we use more complex (and much slower) optimizers? To answer this, we applied hyperparameter optimization to 120 SE data sets that explored bad smell detection, predicting Github ssue close time, bug report analysis, defect prediction, and dozens of other non-SE problems. We find that DODGE works best for data sets with low "intrinsic dimensionality" (D = 3) and very poorly for higher-dimensional data (D over 8). Nearly all the SE data seen here was intrinsically low-dimensional, indicating that DODGE is applicable for many SE analytics tasks.Comment: arXiv admin note: substantial text overlap with arXiv:1912.0406

arXiv.org e-Print Archive

CADET: Debugging and Fixing Misconfigurations using Counterfactual Reasoning

Author: Iqbal Md Shahriar
Jamshidi Pooyan
Javidian Mohammad Ali
Krishna Rahul
Ray Baishakhi
Publication venue
Publication date: 08/03/2021
Field of study

Modern computing platforms are highly-configurable with thousands of interacting configurations. However, configuring these systems is challenging. Erroneous configurations can cause unexpected non-functional faults. This paper proposes CADET (short for Causal Debugging Toolkit) that enables users to identify, explain, and fix the root cause of non-functional faults early and in a principled fashion. CADET builds a causal model by observing the performance of the system under different configurations. Then, it uses casual path extraction followed by counterfactual reasoning over the causal model to: (a) identify the root causes of non-functional faults, (b) estimate the effects of various configurable parameters on the performance objective(s), and (c) prescribe candidate repairs to the relevant configuration options to fix the non-functional fault. We evaluated CADET on 5 highly-configurable systems deployed on 3 NVIDIA Jetson systems-on-chip. We compare CADET with state-of-the-art configuration optimization and ML-based debugging approaches. The experimental results indicate that CADET can find effective repairs for faults in multiple non-functional properties with (at most) 17% more accuracy, 28% higher gain, and

40\times

speed-up than other ML-based performance debugging methods. Compared to multi-objective optimization approaches, CADET can find fixes (at most)

9\times

faster with comparable or better performance gain. Our case study of non-functional faults reported in NVIDIA's forum show that CADET can find

14%

better repairs than the experts' advice in less than 30 minutes

arXiv.org e-Print Archive

An Evolutionary Study of Configuration Design and Implementation in Cloud Systems

Author: Dong Wei
He Haochen
Legunsen Owolabi
Li Shanshan
Xu Tianyin
Zhang Yuanliang
Publication venue
Publication date: 23/02/2021
Field of study

Many techniques were proposed for detecting software misconfigurations in cloud systems and for diagnosing unintended behavior caused by such misconfigurations. Detection and diagnosis are steps in the right direction: misconfigurations cause many costly failures and severe performance issues. But, we argue that continued focus on detection and diagnosis is symptomatic of a more serious problem: configuration design and implementation are not yet first-class software engineering endeavors in cloud systems. Little is known about how and why developers evolve configuration design and implementation, and the challenges that they face in doing so. This paper presents a source-code level study of the evolution of configuration design and implementation in cloud systems. Our goal is to understand the rationale and developer practices for revising initial configuration design/implementation decisions, especially in response to consequences of misconfigurations. To this end, we studied 1178 configuration-related commits from a 2.5 year version-control history of four large-scale, actively-maintained open-source cloud systems (HDFS, HBase, Spark, and Cassandra). We derive new insights into the software configuration engineering process. Our results motivate new techniques for proactively reducing misconfigurations by improving the configuration design and implementation process in cloud systems. We highlight a number of future research directions

arXiv.org e-Print Archive