5,961 research outputs found
Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization
Principal component analysis (PCA) is widely used for dimensionality
reduction, with well-documented merits in various applications involving
high-dimensional data, including computer vision, preference measurement, and
bioinformatics. In this context, the fresh look advocated here permeates
benefits from variable selection and compressive sampling, to robustify PCA
against outliers. A least-trimmed squares estimator of a low-rank bilinear
factor analysis model is shown closely related to that obtained from an
-(pseudo)norm-regularized criterion encouraging sparsity in a matrix
explicitly modeling the outliers. This connection suggests robust PCA schemes
based on convex relaxation, which lead naturally to a family of robust
estimators encompassing Huber's optimal M-class as a special case. Outliers are
identified by tuning a regularization parameter, which amounts to controlling
sparsity of the outlier matrix along the whole robustification path of (group)
least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its
neat ties to robust statistics, the developed outlier-aware PCA framework is
versatile to accommodate novel and scalable algorithms to: i) track the
low-rank signal subspace robustly, as new data are acquired in real time; and
ii) determine principal components robustly in (possibly) infinite-dimensional
feature spaces. Synthetic and real data tests corroborate the effectiveness of
the proposed robust PCA schemes, when used to identify aberrant responses in
personality assessment surveys, as well as unveil communities in social
networks, and intruders from video surveillance data.Comment: 30 pages, submitted to IEEE Transactions on Signal Processin
A novel R-package graphic user interface for the analysis of metabonomic profiles
Background Analysis of the plethora of metabolites found in the NMR spectra of biological fluids or tissues requires data complexity to be simplified. We present a graphical user interface (GUI) for NMR-based metabonomic analysis. The "Metabonomic Package" has been developed for metabonomics research as open-source software and uses the R statistical libraries. /Results The package offers the following options: Raw 1-dimensional spectra processing: phase, baseline correction and normalization. Importing processed spectra. Including/excluding spectral ranges, optional binning and bucketing, detection and alignment of peaks. Sorting of metabolites based on their ability to discriminate, metabolite selection, and outlier identification. Multivariate unsupervised analysis: principal components analysis (PCA). Multivariate supervised analysis: partial least squares (PLS), linear discriminant analysis (LDA), k-nearest neighbor classification. Neural networks. Visualization and overlapping of spectra. Plot values of the chemical shift position for different samples. Furthermore, the "Metabonomic" GUI includes a console to enable other kinds of analyses and to take advantage of all R statistical tools. /Conclusion We made complex multivariate analysis user-friendly for both experienced and novice users, which could help to expand the use of NMR-based metabonomics
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Maximally Divergent Intervals for Anomaly Detection
We present new methods for batch anomaly detection in multivariate time
series. Our methods are based on maximizing the Kullback-Leibler divergence
between the data distribution within and outside an interval of the time
series. An empirical analysis shows the benefits of our algorithms compared to
methods that treat each time step independently from each other without
optimizing with respect to all possible intervals.Comment: ICML Workshop on Anomaly Detectio
- …