15 research outputs found
Understanding and Comparing Scalable Gaussian Process Regression for Big Data
As a non-parametric Bayesian model which produces informative predictive
distribution, Gaussian process (GP) has been widely used in various fields,
like regression, classification and optimization. The cubic complexity of
standard GP however leads to poor scalability, which poses challenges in the
era of big data. Hence, various scalable GPs have been developed in the
literature in order to improve the scalability while retaining desirable
prediction accuracy. This paper devotes to investigating the methodological
characteristics and performance of representative global and local scalable GPs
including sparse approximations and local aggregations from four main
perspectives: scalability, capability, controllability and robustness. The
numerical experiments on two toy examples and five real-world datasets with up
to 250K points offer the following findings. In terms of scalability, most of
the scalable GPs own a time complexity that is linear to the training size. In
terms of capability, the sparse approximations capture the long-term spatial
correlations, the local aggregations capture the local patterns but suffer from
over-fitting in some scenarios. In terms of controllability, we could improve
the performance of sparse approximations by simply increasing the inducing
size. But this is not the case for local aggregations. In terms of robustness,
local aggregations are robust to various initializations of hyperparameters due
to the local attention mechanism. Finally, we highlight that the proper hybrid
of global and local scalable GPs may be a promising way to improve both the
model capability and scalability for big data.Comment: 25 pages, 15 figures, preprint submitted to KB
Uncertainty Quantification in Machine Learning for Engineering Design and Health Prognostics: A Tutorial
On top of machine learning models, uncertainty quantification (UQ) functions
as an essential layer of safety assurance that could lead to more principled
decision making by enabling sound risk assessment and management. The safety
and reliability improvement of ML models empowered by UQ has the potential to
significantly facilitate the broad adoption of ML solutions in high-stakes
decision settings, such as healthcare, manufacturing, and aviation, to name a
few. In this tutorial, we aim to provide a holistic lens on emerging UQ methods
for ML models with a particular focus on neural networks and the applications
of these UQ methods in tackling engineering design as well as prognostics and
health management problems. Toward this goal, we start with a comprehensive
classification of uncertainty types, sources, and causes pertaining to UQ of ML
models. Next, we provide a tutorial-style description of several
state-of-the-art UQ methods: Gaussian process regression, Bayesian neural
network, neural network ensemble, and deterministic UQ methods focusing on
spectral-normalized neural Gaussian process. Established upon the mathematical
formulations, we subsequently examine the soundness of these UQ methods
quantitatively and qualitatively (by a toy regression example) to examine their
strengths and shortcomings from different dimensions. Then, we review
quantitative metrics commonly used to assess the quality of predictive
uncertainty in classification and regression problems. Afterward, we discuss
the increasingly important role of UQ of ML models in solving challenging
problems in engineering design and health prognostics. Two case studies with
source codes available on GitHub are used to demonstrate these UQ methods and
compare their performance in the life prediction of lithium-ion batteries at
the early stage and the remaining useful life prediction of turbofan engines
Proceedings of the 36th International Workshop Statistical Modelling July 18-22, 2022 - Trieste, Italy
The 36th International Workshop on Statistical Modelling (IWSM) is the first one held in presence after a two year hiatus due to the COVID-19 pandemic.
This edition was quite lively, with 60 oral presentations and 53 posters, covering a vast variety of topics.
As usual, the extended abstracts of the papers are collected in the IWSM proceedings, but unlike the previous workshops, this year the proceedings will be not printed on paper, but it is only online.
The workshop proudly maintains its almost unique feature of scheduling one plenary session for the whole week. This choice has always contributed to the stimulating atmosphere of the conference, combined with its informal character, encouraging the exchange of ideas and cross-fertilization among different areas as a distinguished tradition of the workshop, student participation has been strongly encouraged. This IWSM edition is particularly successful in this respect, as testified by the large number of students included in the program