22 research outputs found
Training genetic programming classifiers by vicinal-risk minimization
We propose and motivate the use of vicinal-risk minimization (VRM) for training genetic programming classifiers. We show that VRM has a number of attractive properties and that it correlates better with generalization error than empirical risk minimization (ERM), so it is more likely to lead to better generalization performance. From the results of statistical tests over a range of real and synthetic datasets, we further demonstrate that VRM yields consistently superior generalization errors compared to conventional ERM.
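The abstract does not state the estimator explicitly; in Vapnik's formulation of vicinal risk (also used by the decision-tree paper below), each training point in the empirical risk is replaced by a vicinity distribution, commonly an isotropic Gaussian centred on the point. A sketch of the two risks, with the vicinity width sigma as a smoothing parameter and the specific vicinity functions of the paper left open:

```latex
% Empirical risk (ERM) over n labelled examples (x_i, y_i):
%   R_emp(f) = (1/n) sum_i L(f(x_i), y_i)
% Vicinal risk (VRM): each point is smeared into a vicinity
% distribution P_{x_i}, e.g. an isotropic Gaussian of width sigma:
\[
  R_{\mathrm{vic}}(f) \;=\; \frac{1}{n}\sum_{i=1}^{n}
     \int L\bigl(f(x),\,y_i\bigr)\,\mathrm{d}P_{x_i}(x),
  \qquad P_{x_i}=\mathcal{N}\!\bigl(x_i,\sigma^{2}I\bigr).
\]
```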
Tikhonov Regularization as a Complexity Measure in Multiobjective Genetic Programming
In this paper, we propose the use of Tikhonov regularization in conjunction with node count as a general complexity measure in multiobjective genetic programming. We demonstrate that employing this general complexity measure yields mean squared test errors over a range of regression problems that are typically superior to those from conventional node count (but never statistically worse). We also analyze why our new method outperforms the conventional complexity measure and conclude that it forms a decision mechanism that balances both syntactic and semantic information.
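The abstract does not define the regularization functional. A minimal Python sketch, assuming the Tikhonov term is a first-order smoothness penalty estimated by finite differences at the training inputs, and assuming a hypothetical GP-individual interface (`node_count`, `predict`) for illustration only:

```python
import numpy as np

def tikhonov_complexity(predict, X, eps=1e-3):
    """Approximate a first-order Tikhonov (smoothness) functional,
    sum_i ||grad f(x_i)||^2, by central finite differences.
    `predict` maps an (n, d) array of inputs to an (n,) array of outputs."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    grad_sq = np.zeros(n)
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        # central difference of the evolved model along attribute j
        dj = (predict(X + step) - predict(X - step)) / (2.0 * eps)
        grad_sq += dj ** 2
    return grad_sq.sum()

def complexity(individual, X):
    """Two-part complexity: syntactic (node count) and semantic
    (Tikhonov smoothness of the model's response on the training inputs)."""
    return (individual.node_count,                        # syntactic
            tikhonov_complexity(individual.predict, X))   # semantic
```

In a multiobjective setting the two components could serve as separate complexity objectives alongside training error; the paper's exact combination rule is not given in the abstract.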
The use of vicinal-risk minimization for training decision trees
We propose the use of Vapnik's vicinal risk minimization (VRM) for training decision trees to approximately maximize decision margins. We implement VRM by propagating uncertainties in the input attributes into the labeling decisions, thereby performing a global regularization over the decision tree structure. During training, a decision tree is constructed to minimize the total probability of misclassifying the labeled training examples, a process which approximately maximizes the margins of the resulting classifier. We perform the necessary minimization using an appropriate meta-heuristic (genetic programming) and present results over a range of synthetic and benchmark real datasets. We demonstrate the statistical superiority of VRM training over conventional empirical risk minimization (ERM) and the well-known C4.5 algorithm, and conclude that there is no statistical difference between trees trained by ERM and by C4.5. Training with VRM is also shown to be more stable and repeatable than training by ERM.
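The abstract describes propagating input uncertainty into the labelling decisions but not the exact mechanics. A minimal sketch, assuming independent Gaussian noise per attribute and a hypothetical nested-dict tree representation, of the misclassification probability that the search (genetic programming in the paper) would minimise:

```python
import numpy as np
from math import erf, sqrt

def gauss_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def leaf_probabilities(node, x, sigma, p=1.0):
    """Probability mass of N(x, diag(sigma^2)) reaching each leaf of an
    axis-aligned tree.  Internal nodes are hypothetical dicts
    {'attr', 'thresh', 'left', 'right'}; leaves are {'label'}."""
    if 'label' in node:                                    # leaf node
        return [(node['label'], p)]
    j, t = node['attr'], node['thresh']
    p_left = gauss_cdf((t - x[j]) / sigma[j])              # P(x_j + noise <= t)
    return (leaf_probabilities(node['left'],  x, sigma, p * p_left) +
            leaf_probabilities(node['right'], x, sigma, p * (1.0 - p_left)))

def vicinal_misclassification(tree, X, y, sigma):
    """Total probability of mislabelling the training examples --
    the quantity minimised during the training phase."""
    total = 0.0
    for xi, yi in zip(np.asarray(X, dtype=float), y):
        total += sum(p for label, p in leaf_probabilities(tree, xi, sigma)
                     if label != yi)
    return total
```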
STEM Rebalance: A Novel Approach for Tackling Imbalanced Datasets using SMOTE, Edited Nearest Neighbour, and Mixup
Imbalanced datasets in medical imaging are characterized by skewed class proportions and a scarcity of abnormal cases. When trained on such data, models tend to assign higher probabilities to normal cases, leading to biased performance. Common oversampling techniques such as SMOTE rely on local information and can introduce marginalization issues. This paper investigates the potential of Mixup augmentation, which combines two training examples along with their corresponding labels to generate new data points from a generic vicinal distribution. To this end, we propose STEM, which combines SMOTE-ENN and Mixup at the instance level. This integration enables us to effectively leverage the entire distribution of the minority classes, thereby mitigating both between-class and within-class imbalance. We focus on the breast cancer problem, where imbalanced datasets are prevalent. The results demonstrate the effectiveness of STEM, which achieves AUC values of 0.96 and 0.99 on the Digital Database for Screening Mammography and the Wisconsin Breast Cancer (Diagnostics) datasets, respectively. Moreover, the method shows promising potential when applied with an ensemble of machine learning (ML) classifiers.
Comment: 7 pages, 4 figures, International Conference on Intelligent Computer Communication and Processing
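The abstract names the components (SMOTE-ENN followed by instance-level Mixup) but not their exact wiring or hyperparameters. A minimal Python sketch using imbalanced-learn's SMOTEENN, with the Mixup alpha, the one-hot encoding, and the function names as illustrative assumptions rather than the authors' implementation:

```python
import numpy as np
from imblearn.combine import SMOTEENN   # SMOTE oversampling + Edited Nearest Neighbour cleaning

def mixup(X, y_onehot, alpha=0.2, rng=None):
    """Instance-level Mixup: convex combinations of randomly paired
    examples and of their one-hot labels, with lam ~ Beta(alpha, alpha)."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))
    lam = rng.beta(alpha, alpha, size=(len(X), 1))
    X_mix = lam * X + (1.0 - lam) * X[idx]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[idx]
    return X_mix, y_mix

def stem_rebalance(X, y, alpha=0.2, seed=0):
    """Sketch of the pipeline described above: SMOTE-ENN rebalancing
    followed by instance-level Mixup on the cleaned, rebalanced set."""
    X_res, y_res = SMOTEENN(random_state=seed).fit_resample(X, y)
    y_onehot = np.eye(int(np.max(y_res)) + 1)[y_res]   # assumes integer class labels 0..K-1
    return mixup(np.asarray(X_res, dtype=float), y_onehot, alpha=alpha, rng=seed)
```

The mixed samples would then be passed to the downstream classifier (or ensemble of ML classifiers) in place of, or alongside, the original training set.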
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws on many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.
Comment: 64 pages, 411 references. To appear in Journal of Applied Remote Sensing
From Points to Probability Measures: Statistical Learning on Distributions with Kernel Mean Embedding
The dissertation presents a novel learning framework on probability measures which has abundant real-world applications. In the classical setup, the data are assumed to be points drawn independently and identically (i.i.d.) from some unknown distribution. In many scenarios, however, representing the data as distributions may be preferable. For instance, when the measurement is noisy, we may tackle the uncertainty by treating the data themselves as distributions, which is often the case for microarray and astronomical data, where the measurement process is imprecise and replication is often required. Distributions not only embody individual data points, but also carry information about their interactions, which can be beneficial for structural learning in high-energy physics, cosmology, causality, and so on. Moreover, classical problems in statistics such as statistical estimation, hypothesis testing, and causal inference may be interpreted, in a decision-theoretic sense, as machine learning problems on empirical distributions. Rephrasing these problems in this way leads to a novel approach to statistical inference and estimation. Hence, allowing learning algorithms to operate directly on distributions opens up a wide range of future applications.
To work with distributions, the key methodology adopted in this thesis is the kernel mean embedding of distributions, which represents each distribution as a mean function in a reproducing kernel Hilbert space (RKHS). The kernel mean embedding has been applied successfully in two-sample testing, graphical models, and probabilistic inference. This thesis, on the other hand, focuses mainly on predictive learning on distributions, i.e., when the observations are distributions and the goal is to make predictions about previously unseen distributions. More importantly, the thesis investigates kernel mean estimation, one of the most fundamental problems of kernel methods.
Probability distributions, as opposed to data points, carry information at a higher level, such as the aggregate behavior of data points, how the underlying process evolves over time and across domains, and complex concepts that cannot be described merely by individual points. Intelligent organisms have the ability to recognize and exploit such information naturally. Thus, this work may shed light on the future development of intelligent machines and, most importantly, may provide clues about the true meaning of intelligence.
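As a concrete illustration of the embedding, the empirical kernel mean of a sample x_1, ..., x_n is mu_hat = (1/n) sum_i k(x_i, .). A minimal sketch (RBF kernel, biased V-statistic estimator, and the bandwidth gamma chosen arbitrarily for illustration) of comparing two samples through the RKHS distance between their embeddings, as used in two-sample testing:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Squared maximum mean discrepancy between two samples, i.e. the
    squared RKHS distance ||mu_X - mu_Y||^2 between their empirical
    kernel mean embeddings (biased estimator)."""
    Kxx, Kyy, Kxy = rbf_kernel(X, X, gamma), rbf_kernel(Y, Y, gamma), rbf_kernel(X, Y, gamma)
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()
```

In the distribution-regression setting described above, each observation (itself a sample from one distribution) would similarly be summarised by its empirical kernel mean before a predictor is fitted on those embeddings.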