Boosting Operational DNN Testing Efficiency through Conditioning
With the increasing adoption of Deep Neural Network (DNN) models as integral
parts of software systems, efficient operational testing of DNNs is much in
demand to ensure these models' actual performance in field conditions. A
challenge is that the testing often needs to produce precise results with a
very limited budget for labeling data collected in the field.
Viewing software testing as a practice of reliability estimation through
statistical sampling, we re-interpret the idea behind conventional structural
coverages as conditioning for variance reduction. With this insight we propose
an efficient DNN testing method based on conditioning on the representation
learned by the DNN model under test. The representation is defined by the
probability distribution of the output of neurons in the last hidden layer of
the model. To sample from this high dimensional distribution in which the
operational data are sparsely distributed, we design an algorithm leveraging
cross entropy minimization.
Experiments with various DNN models and datasets were conducted to evaluate
the general efficiency of the approach. The results show that, compared with
simple random sampling, this approach requires only about half of the labeled
inputs to achieve the same level of precision.
Comment: Published in the Proceedings of the 27th ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE 2019).
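The variance-reduction idea behind conditioning parallels stratified sampling in classical statistics: partition the population by some property correlated with the outcome, sample within each stratum, and combine. The sketch below illustrates only that statistical idea, not the paper's cross-entropy-minimization algorithm, and the strata, population, and accuracy figures are invented for illustration:

```python
import random
import statistics

random.seed(0)

# Hypothetical population of test inputs: each item carries a stratum label
# (standing in for the learned representation) and a 0/1 correctness outcome.
# Accuracy differs sharply between strata, which is what conditioning exploits.
population = (
    [("easy", 1)] * 900 + [("easy", 0)] * 100   # 90% accurate stratum
    + [("hard", 1)] * 300 + [("hard", 0)] * 700  # 30% accurate stratum
)
true_acc = sum(y for _, y in population) / len(population)  # 0.6

def srs_estimate(n):
    """Simple random sampling: label n random inputs, average correctness."""
    sample = random.sample(population, n)
    return sum(y for _, y in sample) / n

def conditioned_estimate(n):
    """Sample proportionally within each stratum, then combine the estimates."""
    strata = {}
    for s, y in population:
        strata.setdefault(s, []).append(y)
    est = 0.0
    for ys in strata.values():
        k = round(n * len(ys) / len(population))
        est += (len(ys) / len(population)) * (sum(random.sample(ys, k)) / k)
    return est

# Compare estimator variance at the same labeling budget of 100 inputs.
srs_var = statistics.variance(srs_estimate(100) for _ in range(500))
cond_var = statistics.variance(conditioned_estimate(100) for _ in range(500))
```

With these invented strata the per-item variance drops from 0.24 (overall) to 0.15 (within strata), so the conditioned estimator reaches a given precision with noticeably fewer labels, which is the flavor of saving the abstract reports.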
Iterative Assessment and Improvement of DNN Operational Accuracy
Deep Neural Networks (DNN) are nowadays largely adopted in many application
domains thanks to their human-like, or even superhuman, performance in specific
tasks. However, due to unpredictable or unconsidered operating conditions,
unexpected failures show up in the field, making the performance of a DNN in
operation very different from the one estimated prior to release. In the life
cycle of DNN systems, the assessment of accuracy is typically addressed in two
ways: offline, via sampling of operational inputs, or online, via
pseudo-oracles. The former is considered more expensive due to the need for
manual labeling of the sampled inputs. The latter is automatic but less
accurate. We believe that emerging iterative industrial-strength life cycle
models for Machine Learning systems, like MLOps, offer the possibility to
leverage inputs observed in operation not only to provide faithful estimates of
a DNN accuracy, but also to improve it through remodeling/retraining actions.
We propose DAIC (DNN Assessment and Improvement Cycle), an approach that
combines "low-cost" online pseudo-oracles and "high-cost" offline sampling
techniques to estimate and improve the operational accuracy of a DNN over the
iterations of its life cycle. Preliminary results show the benefits of
combining the two approaches and integrating them in the DNN life cycle.
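The online/offline combination the abstract describes can be illustrated as a toy loop. Every name below (pseudo_oracle, label_fn, retrain) is a hypothetical stand-in, not the DAIC implementation:

```python
def daic_iteration(predict, inputs, pseudo_oracle, label_fn, retrain, budget):
    # Online step: a cheap pseudo-oracle scores every operational input
    # (higher score = more trusted prediction).
    scores = {x: pseudo_oracle(predict, x) for x in inputs}
    # Offline step: spend the scarce labeling budget on the least trusted inputs.
    chosen = sorted(inputs, key=lambda x: scores[x])[:budget]
    labeled = [(x, label_fn(x)) for x in chosen]
    est_acc = sum(predict(x) == y for x, y in labeled) / len(labeled)
    # Improvement step: feed the newly labeled operation data back into training.
    return est_acc, retrain(predict, labeled)

# Toy instantiation: the true concept is x >= 5, the model learned x >= 3,
# and the pseudo-oracle trusts predictions far from the learned boundary.
inputs = list(range(10))
acc, improved = daic_iteration(
    predict=lambda x: x >= 3,
    inputs=inputs,
    pseudo_oracle=lambda m, x: abs(x - 3),
    label_fn=lambda x: x >= 5,
    retrain=lambda m, labeled: (lambda x: x >= 5),  # pretend retraining fixes it
    budget=4,
)
```

The labeling budget is concentrated on inputs near the learned boundary, where the pseudo-oracle is least confident, so the offline labels both sharpen the accuracy estimate and supply the most informative retraining data.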
Building thermal load prediction through shallow machine learning and deep learning
Building thermal load prediction informs the optimization of cooling plants and thermal energy storage. Physics-based prediction models of building thermal load are constrained by model and input complexity. In this study, we developed 12 data-driven models (7 shallow learning, 2 deep learning, and 3 heuristic methods) to predict building thermal load and compared shallow machine learning with deep learning. The 12 prediction models were compared against the measured cooling demand. It was found that XGBoost (Extreme Gradient Boosting) and LSTM (Long Short-Term Memory) provided the most accurate load predictions in the shallow and deep learning categories, respectively, and both outperformed the best baseline model, which uses the previous day's data for prediction. We then discussed how the prediction horizon and input uncertainty influence load prediction accuracy. The major conclusions are twofold: first, LSTM performs well in short-term prediction (1 h ahead) but not in long-term prediction (24 h ahead), because the sequential information becomes less relevant, and accordingly less useful, when the prediction horizon is long. Second, the presence of weather forecast uncertainty deteriorates XGBoost's accuracy and favors LSTM, because the sequential information makes the model more robust to input uncertainty. Training the model with uncertain rather than accurate weather data could enhance the model's robustness. Our findings have two implications for practice. First, LSTM is recommended for short-term load prediction, given that weather forecast uncertainty is unavoidable. Second, XGBoost is recommended for long-term prediction, and the model should be trained in the presence of input uncertainty.
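The previous-day baseline that the learned models must beat is simple enough to state in a few lines. The hourly profile below is synthetic and only illustrative:

```python
def previous_day_baseline(hourly_load, period=24):
    """Predict each hour's load as the load at the same hour one day earlier."""
    return [hourly_load[t - period] for t in range(period, len(hourly_load))]

# Synthetic two-day hourly profile: a flat night load with a daytime plateau;
# day 2 runs uniformly 5 kW higher than day 1 (invented numbers).
day1 = [50 + (30 if 8 <= h <= 18 else 0) for h in range(24)]
series = day1 + [x + 5 for x in day1]

preds = previous_day_baseline(series)
actual = series[24:]
mae = sum(abs(p - a) for p, a in zip(preds, actual)) / len(actual)  # → 5.0
```

Any day-to-day drift in the load (here a uniform 5 kW shift) passes straight through as baseline error, which is exactly the headroom a learned predictor can close.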
Operational Calibration: Debugging Confidence Errors for DNNs in the Field
Trained DNN models are increasingly adopted as integral parts of software
systems, but they often perform deficiently in the field. A particularly
damaging problem is that DNN models often give false predictions with high
confidence, due to the unavoidable slight divergences between operation data
and training data. To minimize the loss caused by inaccurate confidence,
operational calibration, i.e., calibrating the confidence function of a DNN
classifier against its operation domain, becomes a necessary debugging step in
the engineering of the whole system.
Operational calibration is difficult considering the limited budget of
labeling operation data and the weak interpretability of DNN models. We propose
a Bayesian approach to operational calibration that gradually corrects the
confidence given by the model under calibration with a small number of labeled
operation data deliberately selected from a larger set of unlabeled operation
data. The approach is made effective and efficient by leveraging the locality
of the learned representation of the DNN model and modeling the calibration as
Gaussian Process Regression. Comprehensive experiments with various practical
datasets and DNN models show that it significantly outperformed alternative
methods, and in some difficult tasks it eliminated about 71% to 97% of
high-confidence (>0.9) errors with only about 10% of the minimal amount of
labeled operation data needed for practical learning techniques to barely work.
Comment: Published in the Proceedings of the 28th ACM Joint European Software
Engineering Conference and Symposium on the Foundations of Software
Engineering (ESEC/FSE 2020).
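The Gaussian Process Regression step can be sketched minimally: fit a GP to the confidence error (correctness minus raw confidence) observed on a few labeled operation inputs, then predict and add that error elsewhere. The kernel choice, length scale, and data below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of representation vectors."""
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / length_scale ** 2)

def gp_posterior_mean(X, y, X_star, noise=1e-2):
    """Posterior mean of a zero-mean GP at X_star given noisy observations (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(X_star, X) @ np.linalg.solve(K, y)

# A few labeled operation inputs, described by a 1-D stand-in for the learned
# representation; y is the observed confidence error (correctness - confidence).
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([-0.20, 0.00, 0.30])

# Predicted confidence error for an unlabeled input near the first labeled one.
err = gp_posterior_mean(X, y, np.array([[0.0]]))
calibrated = 0.95 + err[0]   # a raw confidence of 0.95 is corrected downward
```

The locality the abstract mentions shows up in the kernel: an unlabeled input is corrected mainly by labeled neighbors that are close in representation space, which is why a small labeled set can go a long way.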
Multiobjective optimization of building energy consumption and thermal comfort based on integrated BIM framework with machine learning-NSGA II