5,961 research outputs found

    Robust PCA as Bilinear Decomposition with Outlier-Sparsity Regularization

    Full text link
    Principal component analysis (PCA) is widely used for dimensionality reduction, with well-documented merits in various applications involving high-dimensional data, including computer vision, preference measurement, and bioinformatics. In this context, the fresh look advocated here permeates benefits from variable selection and compressive sampling, to robustify PCA against outliers. A least-trimmed squares estimator of a low-rank bilinear factor analysis model is shown closely related to that obtained from an 0\ell_0-(pseudo)norm-regularized criterion encouraging sparsity in a matrix explicitly modeling the outliers. This connection suggests robust PCA schemes based on convex relaxation, which lead naturally to a family of robust estimators encompassing Huber's optimal M-class as a special case. Outliers are identified by tuning a regularization parameter, which amounts to controlling sparsity of the outlier matrix along the whole robustification path of (group) least-absolute shrinkage and selection operator (Lasso) solutions. Beyond its neat ties to robust statistics, the developed outlier-aware PCA framework is versatile to accommodate novel and scalable algorithms to: i) track the low-rank signal subspace robustly, as new data are acquired in real time; and ii) determine principal components robustly in (possibly) infinite-dimensional feature spaces. Synthetic and real data tests corroborate the effectiveness of the proposed robust PCA schemes, when used to identify aberrant responses in personality assessment surveys, as well as unveil communities in social networks, and intruders from video surveillance data.Comment: 30 pages, submitted to IEEE Transactions on Signal Processin

    A novel R-package graphic user interface for the analysis of metabonomic profiles

    Get PDF
    Background Analysis of the plethora of metabolites found in the NMR spectra of biological fluids or tissues requires data complexity to be simplified. We present a graphical user interface (GUI) for NMR-based metabonomic analysis. The "Metabonomic Package" has been developed for metabonomics research as open-source software and uses the R statistical libraries. /Results The package offers the following options: Raw 1-dimensional spectra processing: phase, baseline correction and normalization. Importing processed spectra. Including/excluding spectral ranges, optional binning and bucketing, detection and alignment of peaks. Sorting of metabolites based on their ability to discriminate, metabolite selection, and outlier identification. Multivariate unsupervised analysis: principal components analysis (PCA). Multivariate supervised analysis: partial least squares (PLS), linear discriminant analysis (LDA), k-nearest neighbor classification. Neural networks. Visualization and overlapping of spectra. Plot values of the chemical shift position for different samples. Furthermore, the "Metabonomic" GUI includes a console to enable other kinds of analyses and to take advantage of all R statistical tools. /Conclusion We made complex multivariate analysis user-friendly for both experienced and novice users, which could help to expand the use of NMR-based metabonomics

    One-Class Classification: Taxonomy of Study and Review of Techniques

    Full text link
    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

    Maximally Divergent Intervals for Anomaly Detection

    Full text link
    We present new methods for batch anomaly detection in multivariate time series. Our methods are based on maximizing the Kullback-Leibler divergence between the data distribution within and outside an interval of the time series. An empirical analysis shows the benefits of our algorithms compared to methods that treat each time step independently from each other without optimizing with respect to all possible intervals.Comment: ICML Workshop on Anomaly Detectio
    corecore