186 research outputs found
Gaussian process surrogates for failure detection: a Bayesian experimental design approach
An important task of uncertainty quantification is to identify the
probability of undesired events, in particular, system failures, caused by
various sources of uncertainties. In this work we consider the construction of
Gaussian process surrogates for failure detection and failure probability
estimation. In particular, we consider the situation that the underlying
computer models are extremely expensive, and in this setting, determining the
sampling points in the state space is of essential importance. We formulate the
problem as an optimal experimental design for Bayesian inferences of the limit
state (i.e., the failure boundary) and propose an efficient numerical scheme to
solve the resulting optimization problem. In particular, the proposed
limit-state inference method is capable of determining multiple sampling points
at a time, and thus it is well suited for problems where multiple computer
simulations can be performed in parallel. The accuracy and performance of the
proposed method are demonstrated by both academic and practical examples.
Efficient Constellation-Based Map-Merging for Semantic SLAM
Data association in SLAM is fundamentally challenging, and handling ambiguity
well is crucial to achieve robust operation in real-world environments. When
ambiguous measurements arise, conservatism often mandates that the measurement
is discarded or a new landmark is initialized rather than risking an incorrect
association. To address the inevitable 'duplicate' landmarks that arise, we
present an efficient map-merging framework to detect duplicate constellations
of landmarks, providing a high-confidence loop-closure mechanism well-suited
for object-level SLAM. This approach uses an incrementally-computable
approximation of landmark uncertainty that only depends on local information in
the SLAM graph, avoiding expensive recovery of the full system covariance
matrix. This enables a search based on geometric consistency (GC) (rather than
full joint compatibility (JC)) that inexpensively reduces the search space to a
handful of 'best' hypotheses. Furthermore, we reformulate the commonly used
interpretation tree to allow for more efficient integration of clique-based
pairwise compatibility, accelerating the branch-and-bound max-cardinality
search. Our method is demonstrated to match the performance of full JC methods
at significantly reduced computational cost, facilitating robust object-based
loop-closure over large SLAM problems.
Comment: Accepted to IEEE International Conference on Robotics and Automation
(ICRA) 201
Predictive Modelling Approach to Data-Driven Computational Preventive Medicine
This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. In its early parts, the thesis proposes a synergy of machine learning methods for detecting patterns and developing inexpensive predictive models from healthcare data to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine-learning methods, data preprocessing techniques, model training, estimation and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus.
Midway through this research, the thesis advances preventive medicine and data mining by proposing several new extensions in data preparation and preprocessing. It offers new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for missing data mitigation, and a novel imbalanced-data resampling approach, minority pattern reconstruction (MPR), led by information theory. This thesis also extends the area of model performance evaluation with a novel classification performance ranking metric called XDistance.
In particular, the experimental results show that building predictive models with the methods guided by our new framework (Octopus) yields reliable models whose performance domain experts approve. Also, performing the data quality checks and applying the MMI process led healthcare practitioners to prioritise predictive reliability over interpretability. The application of MPR and its hybrid resampling strategies led to better performance, in line with experts' success criteria, than traditional imbalanced data resampling techniques. Finally, the XDistance performance ranking metric was found to be more effective in ranking several classifiers' performances while offering an indication of class bias, unlike existing performance metrics.
The overall contributions of this thesis can be summarised as follows. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework to produce new reliable classifiers. In addition, we offer a further understanding of the impact of newly engineered features, the physical activity index (PAI) and biological effective dose (BED). Second, new methods were developed within the new framework. Finally, the newly developed and accepted predictive models help detect adverse health events, namely, visceral fat-associated diseases and advanced breast cancer radiotherapy toxicity side effects. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining.
A generalized risk approach to path inference based on hidden Markov models
Motivated by the unceasing interest in hidden Markov models (HMMs), this
paper re-examines hidden path inference in these models, using primarily a
risk-based framework. While the most common maximum a posteriori (MAP), or
Viterbi, path estimator and the minimum error, or Posterior Decoder (PD), have
long been around, other path estimators, or decoders, have been either only
hinted at or applied more recently and in dedicated applications generally
unfamiliar to the statistical learning community. Over a decade ago, however, a
family of algorithmically defined decoders aiming to hybridize the two standard
ones was proposed (Brushe et al., 1998). The present paper gives a careful
analysis of this hybridization approach, identifies several problems and issues
with it and other previously proposed approaches, and proposes practical
resolutions of those. Furthermore, simple modifications of the classical
criteria for hidden path recognition are shown to lead to a new class of
decoders. Dynamic programming algorithms to compute these decoders in the usual
forward-backward manner are presented. A particularly interesting subclass of
such estimators can be also viewed as hybrids of the MAP and PD estimators.
Similar to previously proposed MAP-PD hybrids, the new class is parameterized
by a small number of tunable parameters. Unlike their algorithmic predecessors,
the new risk-based decoders are more clearly interpretable, and, most
importantly, work "out of the box" in practice, which is demonstrated on some
real bioinformatics tasks and data. Some further generalizations and
applications are discussed in conclusion.
Comment: Section 5: corrected denominators of the scaled beta variables (pp.
27-30), => corrections in claims 1, 3, Prop. 12, bottom of Table 1. Decoder
(49), Corol. 14 are generalized to handle 0 probabilities. Notation is more
closely aligned with (Bishop, 2006). Details are inserted in eqn-s (43); the
positivity assumption in Prop. 11 is explicit. Fixed typing errors in
equation (41), Example
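The two standard decoders named in this abstract can be contrasted with a minimal sketch. It covers only the classical MAP (Viterbi) and Posterior Decoder (PD) estimators, not the hybrid or risk-based decoders the paper proposes. The function names and the toy two-state model are assumptions chosen for illustration, and the recursions are unscaled, so the sketch suits short sequences only.

```python
def viterbi(pi, A, B, obs):
    """MAP path: the single state sequence with the highest joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    back = []  # back[t-1][j] = best predecessor of state j at time t
    for t in range(1, len(obs)):
        prev = delta
        ptr = [max(range(n), key=lambda i: prev[i] * A[i][j]) for j in range(n)]
        delta = [prev[ptr[j]] * A[ptr[j]][j] * B[j][obs[t]] for j in range(n)]
        back.append(ptr)
    state = max(range(n), key=lambda i: delta[i])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def posterior_marginals(pi, A, B, obs):
    """Forward-backward: P(state at time t = i | all observations)."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = B[j][obs[t]] * sum(alpha[t - 1][i] * A[i][j] for i in range(n))
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
    evidence = sum(alpha[T - 1])
    return [[alpha[t][i] * beta[t][i] / evidence for i in range(n)] for t in range(T)]


def posterior_decode(pi, A, B, obs):
    """PD path: pick the most probable state separately at each time step."""
    gamma = posterior_marginals(pi, A, B, obs)
    return [max(range(len(pi)), key=lambda i: g[i]) for g in gamma]


# Hypothetical toy model: 2 hidden states, 3 observation symbols.
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]
map_path = viterbi(pi, A, B, obs)
pd_path = posterior_decode(pi, A, B, obs)
```

The Viterbi path maximizes the probability of the whole sequence, while the PD path minimizes the expected number of pointwise errors and may contain transitions of zero probability. That trade-off is what motivates hybrid decoders.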
Unionization method for changing opinion in sentiment classification using machine learning
Sentiment classification aims to determine whether an opinionated text expresses a positive, negative or neutral opinion. Most existing sentiment classification approaches have focused on supervised text classification techniques. One critical problem of sentiment classification is that a text collection may contain tens or hundreds of thousands of features, i.e. high dimensionality, which can be addressed by dimension reduction. Nonetheless, although feature selection as a dimension reduction method can reduce the feature space to a smaller feature subset, the size of the subset commonly requires further reduction. In this research, a novel dimension reduction approach called feature unionization is proposed to construct a more reduced feature subset. This approach combines several features to create a single, more informative feature. Another challenge of sentiment classification is handling the concept drift problem in the learning step. Users' opinions change as target entities evolve over time. However, the existing sentiment classification approaches do not consider the evolution of users' opinions. They assume that instances are independent, identically distributed and generated from a stationary distribution, even though they are generated from a stream distribution. In this study, a stream sentiment classification method is proposed to deal with changing opinion and imbalanced data distribution using ensemble learning and instance selection methods. In relation to the concept drift problem, another important issue is the handling of feature drift in sentiment classification. To handle feature drift, relevant features need to be detected to update classifiers. Since the proposed feature unionization method is very effective at constructing more relevant features, it is further used to handle feature drift. Thus, a method to deal with concept and feature drifts for stream sentiment classification is proposed.
The effectiveness of the feature unionization method was compared with the feature selection method over fourteen publicly available datasets in the sentiment classification domain using three typical classifiers. The experimental results showed that the proposed approach is more effective than current feature selection approaches. In addition, the experimental results showed the effectiveness of the proposed stream sentiment classification method in comparison to static sentiment classification. The experiments conducted on four datasets showed that the proposed algorithm achieved better results, demonstrating the effectiveness of the proposed method.
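One way to picture combining several features into a single one is to merge binary term-presence features with a logical OR. The sketch below is only an illustrative reading of the abstract, not the authors' algorithm: the function name `unionize`, the choice of which terms to group, and the toy documents are all assumptions.

```python
def unionize(docs, groups):
    """Merge binary term-presence features by logical OR (illustrative assumption).

    docs   : list of token lists, one per document
    groups : dict mapping a new feature name to the set of terms it absorbs
    Returns one dict per document mapping feature name to 0 or 1.
    """
    rows = []
    for tokens in docs:
        present = set(tokens)
        # A merged feature fires if any of its member terms occurs in the document.
        rows.append({name: int(bool(present & terms)) for name, terms in groups.items()})
    return rows


# Hypothetical toy data: four term features collapse into two merged features.
docs = [["great", "plot"], ["awful", "acting"], ["superb", "cast"]]
groups = {"POS": {"great", "superb"}, "NEG": {"awful", "dull"}}
features = unionize(docs, groups)
```

In this toy setting the merged feature space is smaller than the original term space, and each merged feature covers more documents than any single term it absorbs.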