4 research outputs found
Whence to Learn? Transferring Knowledge in Configurable Systems using BEETLE
As software systems grow in complexity and the space of possible
configurations increases exponentially, finding the near-optimal configuration
of a software system becomes challenging. Recent approaches address this
challenge by learning performance models based on a sample set of
configurations. However, collecting enough sample configurations can be very
expensive since each such sample requires configuring, compiling, and executing
the entire system using a complex test suite. When learning on new data is too
expensive, it is possible to use transfer learning to "transfer" old
lessons to the new context. Traditional transfer learning has a number of
challenges, specifically, (a) learning from excessive data takes excessive
time, and (b) the performance of the models built via transfer can deteriorate
as a result of learning from a poor source. To resolve these problems, we
propose a novel transfer learning framework called BEETLE, which is a
"bellwether"-based transfer learner that focuses on identifying and learning
from the most relevant source from amongst the old data. This paper evaluates
BEETLE with 57 different software configuration problems based on five software
systems (a video encoder, an SAT solver, a SQL database, a high-performance
C-compiler, and a streaming data analytics tool). In each of these cases,
BEETLE found configurations that are as good as or better than those found by
other state-of-the-art transfer learners while requiring only a fraction
of the measurements needed by those other methods. Based on
these results, we say that BEETLE is a new high-water mark in optimally
configuring software.
Comment: Accepted, to appear in IEEE TSE. arXiv admin note: text overlap with
arXiv:1803.0390
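The abstract describes a bellwether as the most relevant source among the old data but does not give the selection procedure. The following is a minimal sketch of one way such a source could be picked, assuming each environment is a list of (configuration, performance) pairs. The 1-nearest-neighbour predictor, the function names, and the sample measurements are all hypothetical stand-ins, not BEETLE's actual method.

```python
from statistics import mean


def predict_1nn(train, config):
    """Predict performance of `config` from its nearest measured neighbour."""
    def distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], config))
    return min(train, key=distance)[1]


def transfer_error(source, target):
    """Mean absolute error when a model built on `source` predicts `target`."""
    return mean(abs(predict_1nn(source, cfg) - perf) for cfg, perf in target)


def find_bellwether(environments):
    """Return the environment whose data best predicts all the others."""
    scores = {}
    for name, source in environments.items():
        others = [data for other, data in environments.items() if other != name]
        scores[name] = mean(transfer_error(source, target) for target in others)
    # Lowest average transfer error wins.
    return min(scores, key=scores.get), scores


# Hypothetical measurements: (configuration options, runtime in seconds).
environments = {
    "env_a": [((0, 0), 10.0), ((0, 1), 12.0), ((1, 0), 20.0), ((1, 1), 22.0)],
    "env_b": [((0, 0), 11.0), ((0, 1), 13.0), ((1, 0), 21.0), ((1, 1), 24.0)],
    "env_c": [((0, 0), 30.0), ((0, 1), 5.0), ((1, 0), 8.0), ((1, 1), 40.0)],
}
bellwether, scores = find_bellwether(environments)
print(bellwether, scores)
```

In this toy data, env_c behaves unlike the other two, so a model built on it transfers poorly. This is the "poor source" problem the abstract mentions.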
Simpler Hyperparameter Optimization for Software Analytics: Why, How, When?
How to make software analytics simpler and faster? One method is to match the
complexity of analysis to the intrinsic complexity of the data being explored.
For example, hyperparameter optimizers find the control settings for data
miners that improve the predictions generated via software
analytics. Sometimes, very fast hyperparameter optimization can be achieved by
just DODGE-ing away from things tried before. But when is it wise to use DODGE
and when must we use more complex (and much slower) optimizers? To answer this,
we applied hyperparameter optimization to 120 SE data sets that explored bad
smell detection, predicting GitHub issue close time, bug report analysis, defect
prediction, and dozens of other non-SE problems. We find that DODGE works best
for data sets with low "intrinsic dimensionality" (D = 3) and very poorly for
higher-dimensional data (D over 8). Nearly all the SE data seen here was
intrinsically low-dimensional, indicating that DODGE is applicable for many SE
analytics tasks.
Comment: arXiv admin note: substantial text overlap with arXiv:1912.0406
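The abstract only says that DODGE moves away from things tried before. As a rough illustration of that idea (not DODGE itself), the sketch below runs a random search that skips any candidate setting lying within a hypothetical tolerance `epsilon` of a setting already evaluated. The function name, the two-parameter search space, and the toy scoring function are assumptions made for this example.

```python
import random


def avoid_tried_search(evaluate, n_trials=30, epsilon=0.2, seed=1):
    """Random search that skips candidates close to already-tried settings."""
    rng = random.Random(seed)
    tried = []  # list of (setting, score) pairs
    best = None
    for _ in range(n_trials):
        candidate = (rng.random(), rng.random())
        # Largest per-parameter difference to any earlier setting.
        too_close = any(
            max(abs(a - b) for a, b in zip(candidate, old)) < epsilon
            for old, _ in tried
        )
        if too_close:
            continue  # dodge: do not spend an evaluation here
        score = evaluate(candidate)
        tried.append((candidate, score))
        if best is None or score > best[1]:
            best = (candidate, score)
    return best, tried


def toy_score(setting):
    """Hypothetical objective: higher is better, peak at (0.5, 0.5)."""
    return -((setting[0] - 0.5) ** 2 + (setting[1] - 0.5) ** 2)


best, tried = avoid_tried_search(toy_score)
print(len(tried), best)
```

Skipped candidates cost no evaluations, so the number of expensive runs stays well below `n_trials` once the space fills up.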
CADET: Debugging and Fixing Misconfigurations using Counterfactual Reasoning
Modern computing platforms are highly configurable, with thousands of
interacting configurations. However, configuring these systems is challenging.
Erroneous configurations can cause unexpected non-functional faults. This paper
proposes CADET (short for Causal Debugging Toolkit) that enables users to
identify, explain, and fix the root cause of non-functional faults early and in
a principled fashion. CADET builds a causal model by observing the performance
of the system under different configurations. Then, it uses causal path
extraction followed by counterfactual reasoning over the causal model to: (a)
identify the root causes of non-functional faults, (b) estimate the effects of
various configurable parameters on the performance objective(s), and (c)
prescribe candidate repairs to the relevant configuration options to fix the
non-functional fault. We evaluated CADET on 5 highly configurable systems
deployed on 3 NVIDIA Jetson systems-on-chip. We compare CADET with
state-of-the-art configuration optimization and ML-based debugging approaches.
The experimental results indicate that CADET can find effective repairs for
faults in multiple non-functional properties with (at most) 17% more accuracy,
28% higher gain, and a speed-up over other ML-based performance
debugging methods. Compared to multi-objective optimization approaches, CADET
can find fixes faster with comparable or better performance
gain. Our case study of non-functional faults reported in NVIDIA's forum shows
that CADET can find better repairs than the experts' advice in less than
30 minutes.
An Evolutionary Study of Configuration Design and Implementation in Cloud Systems
Many techniques have been proposed for detecting software misconfigurations in
cloud systems and for diagnosing unintended behavior caused by such
misconfigurations. Detection and diagnosis are steps in the right direction:
misconfigurations cause many costly failures and severe performance issues.
However, we argue that continued focus on detection and diagnosis is symptomatic of
a more serious problem: configuration design and implementation are not yet
first-class software engineering endeavors in cloud systems. Little is known
about how and why developers evolve configuration design and implementation,
and the challenges that they face in doing so.
This paper presents a source-code level study of the evolution of
configuration design and implementation in cloud systems. Our goal is to
understand the rationale and developer practices for revising initial
configuration design/implementation decisions, especially in response to
consequences of misconfigurations. To this end, we studied 1178
configuration-related commits from a 2.5-year version-control history of four
large-scale, actively maintained open-source cloud systems (HDFS, HBase, Spark,
and Cassandra). We derive new insights into the software configuration
engineering process. Our results motivate new techniques for proactively
reducing misconfigurations by improving the configuration design and
implementation process in cloud systems. We highlight a number of future
research directions.