Uncertainty-Aware Performance Prediction for Highly Configurable Software Systems via Bayesian Neural Networks
Configurable software systems are employed in many important application
domains. Understanding the performance of the systems under all configurations
is critical to prevent potential performance issues caused by misconfiguration.
However, as the number of configurations can be prohibitively large, it is not
possible to measure the system performance under all configurations. Thus, a
common approach is to build a prediction model from limited measurement data
to predict the performance of all configurations as scalar values. However, it
has been pointed out that there are different sources of uncertainty, arising
from the data collection or the modeling process, which can make such scalar
predictions unreliable. To address this problem, we propose a
Bayesian deep learning based method, namely BDLPerf, that can incorporate
uncertainty into the prediction model. BDLPerf can provide both scalar
predictions for configurations' performance and the corresponding confidence
intervals of these scalar predictions. We also develop a novel uncertainty
calibration technique to ensure the reliability of the confidence intervals
generated by a Bayesian prediction model. Finally, we suggest an efficient
hyperparameter tuning technique so as to train the prediction model within a
reasonable amount of time whilst achieving high accuracy. Our experimental
results on 10 real-world systems show that BDLPerf achieves higher accuracy
than existing approaches, in both scalar performance prediction and confidence
interval estimation.
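The core output described above, a scalar prediction together with a confidence interval derived from a posterior over predictions, can be sketched as follows. The performance function, the noise model, and the sampling ensemble standing in for a Bayesian neural network are all illustrative assumptions, not BDLPerf itself.

```python
import random
import statistics

random.seed(0)

# Hypothetical performance function for a single configuration option;
# purely illustrative, not one of the paper's benchmark systems.
def true_performance(x):
    return 3.0 * x + 5.0

# Stand-in for a Bayesian prediction model: repeated noisy predictions
# whose spread plays the role of predictive uncertainty.
def posterior_samples(x, n=200):
    return [true_performance(x) + random.gauss(0.0, 1.0) for _ in range(n)]

# Scalar prediction (posterior mean) plus a confidence interval taken
# from the posterior quantiles.
def predict_with_interval(x, alpha=0.95):
    draws = sorted(posterior_samples(x))
    mean = statistics.fmean(draws)
    lo = draws[int(len(draws) * (1 - alpha) / 2)]
    hi = draws[int(len(draws) * (1 + alpha) / 2) - 1]
    return mean, lo, hi

mean, lo, hi = predict_with_interval(2.0)
print(f"prediction={mean:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Calibration, in this picture, amounts to checking that the nominal coverage (here 95%) matches the empirical fraction of true values falling inside the intervals.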
Predicting Software Performance with Divide-and-Learn
Predicting the performance of highly configurable software systems is the
foundation for performance testing and quality assurance. To that end, recent
work has been relying on machine/deep learning to model software performance.
However, a crucial yet unaddressed challenge is how to cater for the sparsity
inherited from the configuration landscape: the influence of configuration
options (features) and the distribution of data samples are highly sparse.
In this paper, we propose an approach based on the concept of
'divide-and-learn', dubbed DaL. The basic idea is that, to handle sample
sparsity, we divide the samples from the configuration landscape into distant
divisions, for each of which we build a regularized Deep Neural Network as the
local model to deal with the feature sparsity. A newly given configuration
would then be assigned to the right model of division for the final prediction.
Experimental results from eight real-world systems and five sets of training
data reveal that, compared with state-of-the-art approaches, DaL performs
no worse than the best counterpart in 33 out of 40 cases (26 of which are
significantly better), with improvements in accuracy; it requires fewer
samples to reach the same or better accuracy; and it incurs acceptable
training overhead. Practically, DaL also considerably improves
different global models when using them as the underlying local models, which
further strengthens its flexibility. To promote open science, all the data,
code, and supplementary figures of this work can be accessed at our repository:
https://github.com/ideas-labo/DaL.
Comment: This paper has been accepted by the ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE), 2023
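A minimal sketch of the divide-and-learn idea, under strong simplifying assumptions: a one-dimensional toy landscape, a fixed threshold standing in for the paper's division step, and plain least-squares lines standing in for the regularized deep neural networks.

```python
# Toy configuration landscape with two distinct regimes, mimicking the
# sample sparsity described in the abstract; all values are illustrative.
samples = [(float(x), 2.0 * x + 1.0) for x in range(0, 10)] + \
          [(float(x), 40.0 - x) for x in range(20, 30)]

# Step 1 (divide): split the landscape into divisions; a fixed threshold
# stands in for the clustering used in the paper.
THRESHOLD = 15.0
divisions = {
    "low":  [(x, y) for x, y in samples if x < THRESHOLD],
    "high": [(x, y) for x, y in samples if x >= THRESHOLD],
}

# Step 2 (learn): fit one local model per division; least-squares lines
# stand in for the local regularized deep neural networks.
def fit_linear(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    slope = (sum((x - mx) * (y - my) for x, y in points)
             / sum((x - mx) ** 2 for x, _ in points))
    return lambda x, a=slope, b=my - slope * mx: a * x + b

models = {name: fit_linear(pts) for name, pts in divisions.items()}

# Step 3 (assign): a new configuration is routed to its division's model.
def predict(x):
    return models["low" if x < THRESHOLD else "high"](x)

print(predict(5.0), predict(25.0))  # -> 11.0 15.0
```

A single global line would fit this two-regime landscape poorly, which is the sparsity problem the division step is meant to address.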
Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models
The many configuration options of modern applications make it difficult for users to select a performance-optimal configuration. Performance models help users understand system performance and choose a fast configuration. Existing performance modeling approaches for applications and configurable systems either require a full-factorial experiment design or a sampling design based on heuristics. This results in high costs for achieving accurate models. Furthermore, they require repeated execution of experiments to account for measurement noise. We propose Performance-Detective, a novel code analysis tool that deduces insights into the interactions of program parameters. We use these insights to derive the smallest necessary experiment design and to avoid repetitions of measurements when possible, significantly lowering the cost of performance modeling. We evaluate Performance-Detective in two case studies where we reduce the number of measurements from up to 3125 to only 25, decreasing the cost to only 2.9% of the previously needed core hours, while maintaining the accuracy of the resulting model at 91.5% compared to 93.8% when using all 3125 measurements.
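The 3125-to-25 reduction quoted above is consistent with five parameters of five candidate values each: a full factorial needs 5^5 = 3125 runs, while a one-parameter-at-a-time design needs 5 * 5 = 25. The sketch below assumes a hypothetical application whose parameters are found not to interact; the parameter names and values are made up.

```python
from itertools import product

# Hypothetical application with five parameters, five candidate values
# each (chosen to mirror the 3125 -> 25 reduction in the abstract).
params = {f"p{i}": [1, 2, 3, 4, 5] for i in range(5)}

# Full-factorial design: every combination of every parameter value.
full_factorial = list(product(*params.values()))

# If code analysis deduces that the parameters do not interact, varying
# one parameter at a time (others fixed at a default) suffices.
def one_at_a_time(params, default=1):
    runs = []
    for name, values in params.items():
        for v in values:
            config = {k: default for k in params}
            config[name] = v
            runs.append(tuple(config.values()))
    return runs

reduced = one_at_a_time(params)
print(len(full_factorial), len(reduced))  # -> 3125 25
```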
Improved Quantification of Important Beer Quality Parameters based on Non-linear Calibration Methods applied to FT-MIR Spectra
During the production process of beer, it is of utmost importance to guarantee a high consistency of the beer quality. For instance, the bitterness is an essential quality parameter which has to be controlled within the specifications already at the beginning of the production process in the unfermented beer (wort) as well as in final products such as beer and beer mix beverages. Nowadays, analytical techniques for quality control in beer production are mainly based on manual supervision, i.e. samples are taken from the process and analyzed in the laboratory. This typically requires significant effort from lab technicians for only a small fraction of samples to be analyzed, which leads to significant costs for breweries and companies. Fourier transform mid-infrared (FT-MIR) spectroscopy was used in combination with non-linear multivariate calibration techniques to overcome (i) the time-consuming off-line analyses in beer production and (ii) the known limitations of standard linear chemometric methods, like partial least squares (PLS), for important quality parameters [1][2] such as bitterness, citric acid, total acids, free amino nitrogen, final attenuation or foam stability. The calibration models are established with enhanced non-linear techniques based (i) on a new piecewise-linear version of PLS that employs fuzzy rules for locally partitioning the latent variable space and (ii) on extensions of support vector regression variants (ε-PLSSVR and ν-PLSSVR) that overcome high computation times in high-dimensional problems and the time-intensive, often inappropriate setting of the kernel parameters. Furthermore, we introduce a new model selection scheme based on bagged ensembles in order to improve the robustness and thus the predictive quality of the final models.
The approaches are tested on real-world calibration data sets for wort and beer mix beverages and successfully compared to linear methods, showing a clear outperformance in most cases and meeting the model quality requirements defined by the experts at the beer company.
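A toy sketch of the piecewise-linear idea: fuzzy (Gaussian) memberships partition the input into local regions, one weighted least-squares line is fitted per region, and predictions blend the lines by membership. The calibration curve, centers, and widths here are invented for illustration and are not the paper's fuzzy-rule PLS or its data.

```python
import math

# Toy nonlinear calibration curve (e.g. a spectral feature mapped to
# bitterness); purely illustrative, not real FT-MIR data.
data = [(x / 10.0, math.sin(x / 10.0)) for x in range(0, 63)]

# Fuzzy partitioning: Gaussian membership of each sample in two local
# regions, standing in for fuzzy rules over the PLS latent space.
CENTERS = [1.5, 4.5]
def memberships(x, width=1.5):
    w = [math.exp(-((x - c) / width) ** 2) for c in CENTERS]
    s = sum(w)
    return [wi / s for wi in w]

# One weighted least-squares line per region (a local linear model).
def fit_local(i):
    pts = [(x, y, memberships(x)[i]) for x, y in data]
    sw = sum(w for _, _, w in pts)
    mx = sum(w * x for x, _, w in pts) / sw
    my = sum(w * y for _, y, w in pts) / sw
    slope = (sum(w * (x - mx) * (y - my) for x, y, w in pts)
             / sum(w * (x - mx) ** 2 for x, _, w in pts))
    return slope, my - slope * mx

local_models = [fit_local(i) for i in range(len(CENTERS))]

# A prediction blends the local lines by fuzzy membership, yielding a
# piecewise-linear model that can follow the nonlinear curve.
def predict(x):
    return sum(m * (a * x + b)
               for m, (a, b) in zip(memberships(x), local_models))

print(round(predict(1.5), 2), round(predict(4.7), 2))
```

A single global line cannot track both the peak and the trough of this curve, which is the limitation of standard linear chemometric methods that the local partitioning addresses.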
Machine-Learned Caching of Datasets
Generally, the present disclosure is directed to creating and/or modifying a pre-cache for a client device connected to a remote server containing a dataset. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict the likelihood that a particular piece of data will be used (e.g. opened, edited, saved, etc.) within a time frame based on information about the data, the user’s interaction with the data, and/or the user’s schedule.
Machine Learning for Performance Prediction of Data Distribution Service
Data Distribution Service (DDS) is a specification of networking middleware used in real-time mission-critical systems such as autonomous vehicles, energy management systems, and air traffic control. It follows the publish-subscribe communication patterns and adopts the use of Quality of Service (QoS) parameters, allowing customisation of the data dissemination process in real-time.
When setting up DDS systems, practitioners must ensure the required performance levels are achievable by setting appropriate QoS and non-QoS parameters. The evaluation of performance levels can be done by running experimental performance tests for different QoS configurations to find a suitable or even a near-optimal system configuration. However, evaluation via measurements with real DDS systems can be complex and expensive, potentially requiring substantial time and resources.
This paper introduces, to our knowledge for the first time, the use of machine learning (ML) models to predict the performance of DDS under different system configurations. This is done by testing some system configurations and using the performance measurements to train a model. The trained model can then be used to predict the performance of DDS under other system configurations. Since the prediction is computationally inexpensive, we can predict the performance of many different configurations to find a suitable one for given requirements.
As the ML method, we use random forests in this paper, with a linear regression model as a baseline.
We selected six performance metrics and, for each one, trained a random forest model and tuned its hyperparameters. We tested the final models on unseen system configurations, both interpolating and extrapolating with respect to the system parameter values. The random forest models show strong predictive performance and are significantly better than linear regression. Five of the eleven random forest models have a coefficient of determination greater than 0.8 for unseen system configurations in the extrapolation setting.
With these models, it is possible to explore a much wider range of parameters than could be covered by experimentation alone. We therefore believe that this approach can be beneficial for DDS system design.
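The workflow sketched in the abstract, measuring a few configurations, training a cheap surrogate, and then querying it over many configurations, can be illustrated as follows. The latency function and the two QoS-style parameters are hypothetical, and a nearest-neighbour lookup stands in for the random forest model.

```python
# Hypothetical DDS latency as a function of two QoS-style parameters
# (history depth, batch size); purely illustrative, not real DDS data.
def measured_latency(depth, batch):
    return (depth - 8) ** 2 + 0.5 * (batch - 20) ** 2

# A few expensive "measurements" on a coarse grid serve as training data.
train = [(d, b, measured_latency(d, b))
         for d in range(0, 17, 4) for b in range(0, 41, 10)]

# 1-nearest-neighbour surrogate standing in for the random forest model:
# cheap to query, so many configurations can be explored without tests.
def predict(depth, batch):
    nearest = min(train,
                  key=lambda t: (t[0] - depth) ** 2 + (t[1] - batch) ** 2)
    return nearest[2]

# Explore a dense grid with the surrogate instead of running experiments.
candidates = [(d, b) for d in range(17) for b in range(41)]
best = min(candidates, key=lambda c: predict(*c))
print("predicted-best configuration:", best)
```

The surrogate answers 697 candidate queries here from only 25 measurements; the same trade-off motivates using a trained model to search the DDS configuration space.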
System Volume Compensating for Environmental Noise
Generally, the present disclosure is directed to an audio system for compensating for ambient environmental noise. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict a comfortable volume level based on an intensity of ambient noise.