Bayesian Uncertainty Analysis and Decision Support for Complex Models of Physical Systems with Application to Production Optimisation of Subsurface Energy Resources
Important decision-making problems are increasingly addressed using computer models for complex real-world systems. However, there are major limitations to their direct use, including their complex structure, large numbers of inputs and outputs, and the presence of many sources of uncertainty, all further compounded by their long evaluation times. Bayesian methodology for the analysis of computer models has been extensively developed to perform inference for the underlying physical systems. In this thesis, the Bayesian uncertainty analysis methodology is extended to provide robust decision support under uncertainty.
Bayesian emulators are employed as a fast and efficient statistical approximation for computer models. We establish a hierarchical Bayesian emulation framework that exploits known constrained simulator behaviour in constituents of the decision support utility function. In addition, novel Bayesian emulation methodology is developed for computer models with structured partial discontinuities. We advance the crucial uncertainty quantification methodology to perform a robust decision analysis, developing a technique to assess and remove linear transformations of the utility function induced by sources of uncertainty to which conclusions are invariant, as well as incorporating structural model discrepancy and decision implementation error. These are encompassed within a novel iterative decision support procedure which acknowledges utility function uncertainty resulting from the separation of the analysts and final decision makers to deliver a robust class of decisions, along with any additional information, for further consideration. The complete toolkit is successfully demonstrated via an application to the problem of optimal petroleum field development, including an international and commercially important benchmark challenge.
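At its simplest, a Bayesian emulator is a fast probabilistic surrogate fitted to a modest number of simulator runs. The following is a minimal sketch of that idea, assuming a Gaussian-process surrogate built with scikit-learn and a toy stand-in for the expensive simulator; the thesis's hierarchical emulators, discrepancy terms and decision-support machinery are far richer than this.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    def expensive_simulator(x):
        # Toy stand-in for a slow computer model (hypothetical).
        return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

    rng = np.random.default_rng(0)
    X_design = rng.uniform(-1, 1, size=(30, 2))   # small design of simulator runs
    y_design = expensive_simulator(X_design)

    # Fit the emulator: the posterior mean approximates the simulator and the
    # predictive standard deviation quantifies emulator uncertainty.
    emulator = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=0.5),
                                        normalize_y=True)
    emulator.fit(X_design, y_design)

    X_new = rng.uniform(-1, 1, size=(5, 2))
    mean, std = emulator.predict(X_new, return_std=True)  # fast, uncertainty-aware predictions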
PFNs Are Flexible Models for Real-World Bayesian Optimization
In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible surrogate for Bayesian Optimization (BO). PFNs are neural processes that are trained to approximate the posterior predictive distribution (PPD) for any prior distribution that can be efficiently sampled from. We describe how this flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic a naive Gaussian process (GP), an advanced GP, and a Bayesian Neural Network (BNN). In addition, we show how to incorporate further information into the prior, such as allowing hints about the position of optima (user priors), ignoring irrelevant dimensions, and performing non-myopic BO by learning the acquisition function. The flexibility underlying these extensions opens up vast possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for BO in a large-scale evaluation on artificial GP samples and three different hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish code alongside trained models at http://github.com/automl/PFNs4BO.
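The surrogate role that PFNs play can be seen in a generic BO loop: any model exposing a posterior predictive (here summarised by a mean and standard deviation) can feed the acquisition function. The sketch below uses a Gaussian process and expected improvement purely for illustration; the PFNs4BO code at the repository above replaces the GP with a trained transformer, and the objective and settings here are hypothetical.

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor

    def expected_improvement(mean, std, best_y):
        # Standard expected improvement for minimisation.
        z = (best_y - mean) / np.maximum(std, 1e-9)
        return (best_y - mean) * norm.cdf(z) + std * norm.pdf(z)

    def objective(x):
        # Hypothetical expensive black-box function to be minimised.
        return np.sum((x - 0.3) ** 2)

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(5, 3))                 # initial design
    y = np.array([objective(x) for x in X])

    for _ in range(20):
        surrogate = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        candidates = rng.uniform(0, 1, size=(512, 3))
        mean, std = surrogate.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(expected_improvement(mean, std, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))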
Machine learning for automatic analysis of affective behaviour
The automated analysis of affect has been gaining rapidly increasing attention from researchers over the past two decades, as it constitutes a fundamental step towards achieving next-generation computing technologies and integrating them into everyday life (e.g. via affect-aware, user-adaptive interfaces, medical imaging, health assessment, ambient intelligence etc.). The work presented in this thesis focuses on several fundamental problems manifesting in the course towards the achievement of reliable, accurate and robust affect sensing systems. In more detail, the motivation behind this work lies in recent developments in the field, namely (i) the creation of large, audiovisual databases for affect analysis in the so-called "Big Data" era, along with (ii) the need to deploy systems under demanding, real-world conditions. These developments led to the requirement for the analysis of emotion expressions continuously in time, instead of merely processing static images, thus unveiling the wide range of temporal dynamics related to human behaviour to researchers. The latter entails another deviation from the traditional line of research in the field: instead of focusing on predicting posed, discrete basic emotions (happiness, surprise etc.), it became necessary to focus on spontaneous, naturalistic expressions captured under settings more proximal to real-world conditions, utilising more expressive emotion descriptions than a set of discrete labels. To this end, the main motivation of this thesis is to deal with challenges arising from the adoption of continuous dimensional emotion descriptions under naturalistic scenarios, considered to capture a much wider spectrum of expressive variability than basic emotions, and most importantly model emotional states which are commonly expressed by humans in their everyday life. In the first part of this thesis, we attempt to demystify the largely unexplored problem of predicting continuous emotional dimensions. This work is amongst the first to explore the problem of predicting emotion dimensions via multi-modal fusion, utilising facial expressions, auditory cues and shoulder gestures. A major contribution of the work presented in this thesis lies in proposing the utilisation of various relationships exhibited by emotion dimensions in order to improve the prediction accuracy of machine learning methods - an idea which has been taken on by other researchers in the field since. In order to experimentally evaluate this, we extend methods such as Long Short-Term Memory Neural Networks (LSTM), the Relevance Vector Machine (RVM) and Canonical Correlation Analysis (CCA) in order to exploit output relationships in learning. As is shown, this increases the accuracy of machine learning models applied to this task.
The annotation of continuous dimensional emotions is a tedious task, highly prone to the influence of various types of noise. Performed in real time by several annotators (usually experts), the annotation process can be heavily biased by factors such as subjective interpretations of the emotional states observed, the inherent ambiguity of labels related to human behaviour, and the varying reaction lags exhibited by each annotator, as well as other factors such as input device noise and annotation errors. In effect, the annotations manifest a strong spatio-temporal annotator-specific bias. Failing to properly deal with annotation bias and noise leads to an inaccurate ground truth, and therefore to ill-generalisable machine learning models. This makes the proper fusion of multiple annotations, and the inference of a clean, corrected version of the "ground truth", one of the most significant challenges in the area. A highly important contribution of this thesis lies in the introduction of Dynamic Probabilistic Canonical Correlation Analysis (DPCCA), a method aimed at fusing noisy continuous annotations. By adopting a private-shared space model, we isolate the individual characteristics that are annotator-specific and not shared, while most importantly we model the common, underlying annotation which is shared by annotators (i.e., the derived ground truth). By further learning temporal dynamics and incorporating a time-warping process, we are able to derive a clean version of the ground truth given multiple annotations, eliminating temporal discrepancies and other nuisances.
The integration of the temporal alignment process within the proposed private-shared space model makes DPCCA suitable for the problem of temporally aligning human behaviour; that is, given temporally unsynchronised sequences (e.g., videos of two persons smiling), the goal is to generate the temporally synchronised sequences (e.g., the smile apex should co-occur in the videos). Temporal alignment is an important problem for many applications where multiple datasets need to be aligned in time. Furthermore, it is particularly suitable for the analysis of facial expressions, where the activation of facial muscles (Action Units) typically follows a set of predefined temporal phases. A highly challenging scenario is when the observations are perturbed by gross, non-Gaussian noise (e.g., occlusions), as is often the case when analysing data acquired under real-world conditions. To account for non-Gaussian noise, a robust variant of Canonical Correlation Analysis (RCCA) for robust fusion and temporal alignment is proposed. The model captures the shared, low-rank subspace of the observations, isolating the gross noise in a sparse noise term. RCCA is amongst the first robust variants of CCA proposed in the literature, and, as we show in related experiments, it outperforms other state-of-the-art methods for related tasks such as the fusion of multiple modalities under gross noise.
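The low-rank-plus-sparse structure that RCCA exploits can be illustrated with a generic principal-component-pursuit step: singular-value thresholding recovers the shared low-rank part while soft-thresholding isolates gross noise in a sparse term. The sketch below is that standard decomposition, not the RCCA model itself, and the parameter defaults are common heuristics rather than values from the thesis.

    import numpy as np

    def soft_threshold(X, tau):
        # Elementwise shrinkage used for the sparse (gross noise) term.
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def svd_threshold(X, tau):
        # Singular-value shrinkage used for the low-rank (shared) term.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    def low_rank_plus_sparse(D, n_iter=200):
        m, n = D.shape
        lam = 1.0 / np.sqrt(max(m, n))                   # common heuristic weight
        mu = 0.25 * m * n / (np.abs(D).sum() + 1e-12)
        L, S, Y = np.zeros_like(D), np.zeros_like(D), np.zeros_like(D)
        for _ in range(n_iter):
            L = svd_threshold(D - S + Y / mu, 1.0 / mu)  # low-rank update
            S = soft_threshold(D - L + Y / mu, lam / mu) # sparse update
            Y = Y + mu * (D - L - S)                     # dual ascent on D = L + S
        return L, S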
Beyond private-shared space models, Component Analysis (CA) is an integral component of most computer vision systems, particularly in terms of reducing the usually high-dimensional input spaces in a meaningful manner pertaining to the task at hand (e.g., prediction, clustering). A final, significant contribution of this thesis lies in proposing the first unifying framework for probabilistic component analysis. The proposed framework covers most well-known CA methods, such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Locality Preserving Projections (LPP) and Slow Feature Analysis (SFA), providing further theoretical insights into the workings of CA. Moreover, the proposed framework is highly flexible, enabling novel CA methods to be generated by simply manipulating the connectivity of latent variables (i.e. the latent neighbourhood). As shown experimentally, methods derived via the proposed framework outperform other equivalents in several problems related to affect sensing and facial expression analysis, while providing advantages such as reduced complexity and explicit variance modelling.
Why Do Machine Learning Practitioners Still Use Manual Tuning? A Qualitative Study
Current advanced hyperparameter optimization (HPO) methods, such as Bayesian optimization, have high sampling efficiency and facilitate replicability. Nonetheless, machine learning (ML) practitioners (e.g., engineers, scientists) mostly apply less advanced HPO methods, which can increase resource consumption during HPO or lead to underoptimized ML models. Therefore, we suspect that practitioners choose their HPO method to achieve different goals, such as decreasing practitioner effort or complying with the expectations of their target audience. To develop HPO methods that align with such goals, the reasons why practitioners opt for specific HPO methods must be uncovered and thoroughly understood. Because qualitative research is most suitable to uncover such reasons and find potential explanations for them, we conducted semi-structured interviews to explain why practitioners choose different HPO methods. The interviews revealed six principal practitioner goals (e.g., increasing model comprehension) and eleven key factors that impact decisions for HPO methods (e.g., available computing resources). We deepen the understanding of why practitioners opt for different HPO methods and outline recommendations for improving HPO methods by aligning them with practitioner goals.
HypBO: Expert-Guided Chemist-in-the-Loop Bayesian Search for New Materials
Robotics and automation offer massive accelerations for solving intractable, multivariate scientific problems such as materials discovery, but the available search spaces can be dauntingly large. Bayesian optimization (BO) has emerged as a popular sample-efficient optimization engine, thriving in tasks where no analytic form of the target function/property is known. Here we exploit expert human knowledge in the form of hypotheses to direct Bayesian searches more quickly to promising regions of chemical space. Previous methods have used underlying distributions derived from existing experimental measurements, which is infeasible for new, unexplored scientific tasks. Also, such distributions cannot capture intricate hypotheses. Our proposed method, which we call HypBO, uses expert human hypotheses to generate an improved seed of samples. Unpromising seeds are automatically discounted, while promising seeds are used to augment the surrogate model data, thus achieving better-informed sampling. This process continues in a global versus local search fashion, organized in a bilevel optimization framework. We validate the performance of our method on a range of synthetic functions and demonstrate its practical utility on a real chemical design task, where the use of expert hypotheses accelerates the search performance significantly.
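A rough sketch of the seeding idea: expert hypotheses are encoded as sub-regions of the search space, sampled to seed the surrogate, and discounted when their observations lag the global best. Everything below (the objective, the box-shaped hypotheses, the discounting rule) is illustrative and is not the HypBO implementation.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor

    def objective(x):
        # Hypothetical expensive experiment to be maximised.
        return -np.sum((x - 0.7) ** 2)

    # Each hypothesis is a (lower, upper) box the expert believes contains good samples.
    hypotheses = [(np.array([0.6, 0.6]), np.array([0.9, 0.9])),
                  (np.array([0.0, 0.0]), np.array([0.2, 0.2]))]

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, size=(5, 2))                 # global random design
    y = np.array([objective(x) for x in X])

    for lo, hi in hypotheses:
        seeds = rng.uniform(lo, hi, size=(5, 2))       # samples from the hypothesised region
        vals = np.array([objective(s) for s in seeds])
        # Crude discounting: keep a hypothesis's seeds only if competitive with the global best.
        if vals.max() >= y.max() - 0.1:
            X, y = np.vstack([X, seeds]), np.append(y, vals)

    surrogate = GaussianProcessRegressor(normalize_y=True).fit(X, y)  # better-informed surrogate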
Uncertainty in Neural Networks: Bayesian Ensembles, Priors & Prediction Intervals
The breakout success of deep neural networks (NNs) in the 2010s marked a new era in the quest to build artificial intelligence (AI). With NNs as the building block of these systems, excellent performance has been achieved on narrow, well-defined tasks where large amounts of data are available.
However, these systems lack certain capabilities that are important for broad use in real-world applications. One such capability is the communication of uncertainty in a NN's predictions and decisions. In applications such as healthcare recommendation or heavy machinery prognostics, it is vital that AI systems be aware of and express their uncertainty – this creates safer, more cautious, and ultimately more useful systems.
This thesis explores how to engineer NNs to communicate robust uncertainty estimates on their predictions, whilst minimising the impact on usability. One way to encourage uncertainty estimates to be robust is to adopt the Bayesian framework, which offers a principled approach to handling uncertainty. Two of the major contributions in this thesis relate to Bayesian NNs (BNNs).
Specifying appropriate priors is an important step in any Bayesian model, yet it is not clear how to do this in BNNs. The first contribution shows that the connection between BNNs and Gaussian Processes (GPs) provides an effective lens to study BNN priors. NN architectures are derived which mirror the combining of GP kernels to create priors tailored to a task.
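The kernel-combination structure being mirrored is the usual one: GP kernels are added and multiplied to encode beliefs such as a smooth trend plus a periodic component plus noise. Below is a minimal sketch using scikit-learn kernels, with an arbitrary toy combination rather than any prior from the thesis.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

    # A prior for a signal believed to be a smooth trend plus a periodic component plus noise.
    kernel = RBF(length_scale=2.0) + ExpSineSquared(length_scale=1.0, periodicity=1.0) \
             + WhiteKernel(noise_level=0.1)

    X = np.linspace(0, 5, 50).reshape(-1, 1)
    gp = GaussianProcessRegressor(kernel=kernel)
    prior_draws = gp.sample_y(X, n_samples=3, random_state=0)   # draws from the implied prior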
The second major contribution is a novel way to perform approximate Bayesian inference in BNNs using a modified version of ensembling. Novel analysis improves understanding of a technique known as randomised MAP sampling. It is shown that this is particularly effective when strong correlations exist between parameters, making it well suited to NNs.
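A minimal sketch of the randomised MAP idea for Bayesian linear regression: each ensemble member minimises a loss whose targets and regularisation centre ("anchor") are randomly perturbed. In this linear-Gaussian toy case the members are exact posterior samples, whereas the NN version studied in the thesis is approximate; the particular variant and constants below are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 3
    X = rng.normal(size=(n, d))
    w_true = np.array([1.0, -2.0, 0.5])                 # toy ground-truth weights
    sigma, prior_var = 0.3, 1.0
    y = X @ w_true + sigma * rng.normal(size=n)

    def randomised_map_sample():
        y_tilde = y + sigma * rng.normal(size=n)                 # perturbed targets
        anchor = rng.normal(scale=np.sqrt(prior_var), size=d)    # anchor drawn from the prior
        # Closed-form minimiser of ||y_tilde - X w||^2 / sigma^2 + ||w - anchor||^2 / prior_var
        A = X.T @ X / sigma**2 + np.eye(d) / prior_var
        b = X.T @ y_tilde / sigma**2 + anchor / prior_var
        return np.linalg.solve(A, b)

    ensemble = np.array([randomised_map_sample() for _ in range(200)])
    post_mean, post_std = ensemble.mean(axis=0), ensemble.std(axis=0)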
The third major contribution of the thesis is a non-Bayesian technique that trains a NN to directly output prediction intervals for regression tasks through a tailored objective function. This advances over related works that were incompatible with gradient descent and ignored one source of uncertainty.
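One way such an interval objective can be made compatible with gradient descent is to penalise interval width while softly enforcing a target coverage level via a smooth indicator. The loss below is a sketch in that spirit, with hypothetical parameter choices, and is not the exact objective proposed in the thesis.

    import torch

    def interval_loss(lower, upper, y, target_coverage=0.95, softness=50.0, lam=10.0):
        # Soft (differentiable) indicator that y falls inside [lower, upper].
        inside = torch.sigmoid(softness * (y - lower)) * torch.sigmoid(softness * (upper - y))
        coverage = inside.mean()
        width = (upper - lower).mean()
        # Penalise only a coverage shortfall, not over-coverage.
        shortfall = torch.clamp(target_coverage - coverage, min=0.0)
        return width + lam * shortfall ** 2

    # Usage: a network with two output heads predicting (lower, upper) per input can be
    # trained by backpropagating this loss with any standard optimiser.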