5,307 research outputs found
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Learning Using Privileged Information: SVM+ and Weighted SVM
Prior knowledge can be used to improve predictive performance of learning
algorithms or reduce the amount of data required for training. The same goal is
pursued within the learning using privileged information paradigm which was
recently introduced by Vapnik et al. and is aimed at utilizing additional
information available only at training time -- a framework implemented by SVM+.
We relate the privileged information to importance weighting and show that the
prior knowledge expressible with privileged features can also be encoded by
weights associated with every training example. We show that a weighted SVM can
always replicate an SVM+ solution, while the converse is not true and we
construct a counterexample highlighting the limitations of SVM+. Finally, we
touch on the problem of choosing weights for weighted SVMs when privileged
features are not available.Comment: 18 pages, 8 figures; integrated reviewer comments, improved
typesettin
Uncertainty-Aware Principal Component Analysis
We present a technique to perform dimensionality reduction on data that is
subject to uncertainty. Our method is a generalization of traditional principal
component analysis (PCA) to multivariate probability distributions. In
comparison to non-linear methods, linear dimensionality reduction techniques
have the advantage that the characteristics of such probability distributions
remain intact after projection. We derive a representation of the PCA sample
covariance matrix that respects potential uncertainty in each of the inputs,
building the mathematical foundation of our new method: uncertainty-aware PCA.
In addition to the accuracy and performance gained by our approach over
sampling-based strategies, our formulation allows us to perform sensitivity
analysis with regard to the uncertainty in the data. For this, we propose
factor traces as a novel visualization that enables to better understand the
influence of uncertainty on the chosen principal components. We provide
multiple examples of our technique using real-world datasets. As a special
case, we show how to propagate multivariate normal distributions through PCA in
closed form. Furthermore, we discuss extensions and limitations of our
approach
Hyperspectral colon tissue cell classification
A novel algorithm to discriminate between normal and malignant tissue cells of the human colon is presented. The microscopic level images of human colon tissue cells were acquired using hyperspectral imaging technology at contiguous wavelength intervals of visible light. While hyperspectral imagery data provides a wealth of information, its large size normally means high computational processing complexity. Several methods exist to avoid the so-called curse of dimensionality and hence reduce the computational complexity. In this study, we experimented with Principal Component Analysis (PCA) and two modifications of Independent Component Analysis (ICA). In the first stage of the algorithm, the extracted components are used to separate four constituent parts of the colon tissue: nuclei, cytoplasm, lamina propria, and lumen. The segmentation is performed in an unsupervised fashion using the nearest centroid clustering algorithm. The segmented image is further used, in the second stage of the classification algorithm, to exploit the spatial relationship between the labeled constituent parts. Experimental results using supervised Support Vector Machines (SVM) classification based on multiscale morphological features reveal the discrimination between normal and malignant tissue cells with a reasonable degree of accuracy
New Fuzzy Support Vector Machine for the Class Imbalance Problem in Medical Datasets Classification
In medical datasets classification, support vector machine (SVM) is considered to be one of the most successful methods. However, most of the real-world medical datasets usually contain some outliers/noise and data often have class imbalance problems. In this paper, a fuzzy support machine (FSVM) for the class imbalance problem (called FSVM-CIP) is presented, which can be seen as a modified class of FSVM by extending manifold regularization and assigning two misclassification costs for two classes. The proposed FSVM-CIP can be used to handle the class imbalance problem in the presence of outliers/noise, and enhance the locality maximum margin. Five real-world medical datasets, breast, heart, hepatitis, BUPA liver, and pima diabetes, from the UCI medical database are employed to illustrate the method presented in this paper. Experimental results on these datasets show the outperformed or comparable effectiveness of FSVM-CIP
Total Variation Regularized Tensor RPCA for Background Subtraction from Compressive Measurements
Background subtraction has been a fundamental and widely studied task in
video analysis, with a wide range of applications in video surveillance,
teleconferencing and 3D modeling. Recently, motivated by compressive imaging,
background subtraction from compressive measurements (BSCM) is becoming an
active research task in video surveillance. In this paper, we propose a novel
tensor-based robust PCA (TenRPCA) approach for BSCM by decomposing video frames
into backgrounds with spatial-temporal correlations and foregrounds with
spatio-temporal continuity in a tensor framework. In this approach, we use 3D
total variation (TV) to enhance the spatio-temporal continuity of foregrounds,
and Tucker decomposition to model the spatio-temporal correlations of video
background. Based on this idea, we design a basic tensor RPCA model over the
video frames, dubbed as the holistic TenRPCA model (H-TenRPCA). To characterize
the correlations among the groups of similar 3D patches of video background, we
further design a patch-group-based tensor RPCA model (PG-TenRPCA) by joint
tensor Tucker decompositions of 3D patch groups for modeling the video
background. Efficient algorithms using alternating direction method of
multipliers (ADMM) are developed to solve the proposed models. Extensive
experiments on simulated and real-world videos demonstrate the superiority of
the proposed approaches over the existing state-of-the-art approaches.Comment: To appear in IEEE TI
Machine learning techniques implementation in power optimization, data processing, and bio-medical applications
The rapid progress and development in machine-learning algorithms becomes a key factor in determining the future of humanity. These algorithms and techniques were utilized to solve a wide spectrum of problems extended from data mining and knowledge discovery to unsupervised learning and optimization. This dissertation consists of two study areas. The first area investigates the use of reinforcement learning and adaptive critic design algorithms in the field of power grid control. The second area in this dissertation, consisting of three papers, focuses on developing and applying clustering algorithms on biomedical data. The first paper presents a novel modelling approach for demand side management of electric water heaters using Q-learning and action-dependent heuristic dynamic programming. The implemented approaches provide an efficient load management mechanism that reduces the overall power cost and smooths grid load profile. The second paper implements an ensemble statistical and subspace-clustering model for analyzing the heterogeneous data of the autism spectrum disorder. The paper implements a novel k-dimensional algorithm that shows efficiency in handling heterogeneous dataset. The third paper provides a unified learning model for clustering neuroimaging data to identify the potential risk factors for suboptimal brain aging. In the last paper, clustering and clustering validation indices are utilized to identify the groups of compounds that are responsible for plant uptake and contaminant transportation from roots to plants edible parts --Abstract, page iv
- …