
    Outlier Mining Methods Based on Graph Structure Analysis

    Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines, and it also has practical implications: removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient only for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph whose nodes are the elements of the dataset and whose links carry weights equal to the distances between the nodes. The first method then assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform, or perform on par with, other popular outlier detection methods. A main advantage of the percolation method is that it is parameter-free and therefore requires no training; the IsoMap method, on the other hand, has two integer parameters, and when they are appropriately selected it performs similarly to or better than all the other methods tested.
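
    As a concrete illustration of the percolation idea, the sketch below grows the distance graph one edge at a time (shortest edges first) and scores each point by the edge length at which it first connects to the component containing the data medoid, so points that attach late read as outliers. This is one plausible reading of a percolation-based score, not necessarily the paper's exact rule; the medoid anchoring and all names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def percolation_scores(X):
    """Add edges in increasing order of length and record, for each
    point, the length at which it first joins the component containing
    the data medoid; late joiners get high outlier scores."""
    n = len(X)
    D = squareform(pdist(X))
    medoid = int(np.argmin(D.sum(axis=0)))    # most central point (assumption)
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(D[iu, ju])             # candidate edges, shortest first

    parent = np.arange(n)                     # union-find forest
    members = {i: [i] for i in range(n)}      # root -> nodes in component

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]     # path halving
            a = parent[a]
        return int(a)

    scores = np.full(n, -1.0)
    scores[medoid] = 0.0
    for e in order:
        a, b = find(iu[e]), find(ju[e])
        if a == b:
            continue
        if len(members[a]) < len(members[b]):  # union by size
            a, b = b, a
        parent[b] = a
        members[a] += members.pop(b)
        if find(medoid) == a:                  # medoid's component grew:
            d = D[iu[e], ju[e]]                # score the new arrivals
            for v in members[a]:
                if scores[v] < 0:
                    scores[v] = d
        if np.all(scores >= 0):
            break
    return scores
```

    Being parameter-free, the whole procedure reduces to sorting the pairwise distances once and running a union-find pass, consistent with the abstract's claim that no training is required.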

    Dimensionality Reduction Mappings

    A wealth of powerful dimensionality reduction methods has been established which can be used for data visualization and preprocessing. These are accompanied by formal evaluation schemes, which allow a quantitative evaluation along general principles and which even lead to further visualization schemes based on these objectives. Most methods, however, provide a mapping of only an a priori given finite set of points, requiring additional steps for out-of-sample extensions. We propose a general view on dimensionality reduction based on the concept of cost functions and, building on this general principle, extend dimensionality reduction to explicit mappings of the data manifold. This offers simple out-of-sample extensions. Further, it opens a way towards a theory of data visualization from the perspective of its generalization ability to new data points. We demonstrate the approach with a simple global linear mapping as well as with prototype-based local linear mappings.
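
    The cost-function view can be made concrete with a small sketch: instead of optimizing the low-dimensional coordinates directly, one optimizes the parameters of an explicit map (here a global linear map, one of the two cases the abstract mentions) under an MDS-like stress cost. The choice of stress as the cost and all names below are assumptions for illustration; PyTorch is used only for automatic differentiation.

```python
import torch

def fit_linear_dr_map(X, D_high, dim=2, steps=500, lr=1e-2):
    """Learn an explicit linear map y = x @ W by minimizing an MDS-like
    stress between high-dimensional distances D_high and the distances
    of the mapped points, instead of optimizing the points themselves."""
    X = torch.as_tensor(X, dtype=torch.float32)
    D_high = torch.as_tensor(D_high, dtype=torch.float32)
    n, d = X.shape
    W = torch.randn(d, dim, requires_grad=True)
    opt = torch.optim.Adam([W], lr=lr)
    iu, ju = torch.triu_indices(n, n, offset=1)
    for _ in range(steps):
        Y = X @ W                              # explicit mapping of all points
        d_low = torch.norm(Y[iu] - Y[ju], dim=1)
        stress = ((d_low - D_high[iu, ju]) ** 2).mean()
        opt.zero_grad()
        stress.backward()
        opt.step()
    return W.detach()

# Out-of-sample extension is now a single matrix product:
# Y_new = X_new @ W  -- no re-optimization needed for unseen points.
```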

    Characterization and Reduction of Noise in Manifold Representations of Hyperspectral Imagery

    A new workflow to produce dimensionality-reduced manifold coordinates, based on improvements to landmark Isometric Mapping (ISOMAP) algorithms using local spectral models, is proposed. Manifold space obtained from nonlinear dimensionality reduction better addresses the nonlinearity of hyperspectral data and often performs better than linear methods such as Minimum Noise Fraction (MNF). The dissertation mainly focuses on using adaptive local spectral models to further improve the performance of ISOMAP algorithms by addressing local noise issues and performing guided landmark selection and nearest-neighborhood construction in local spectral subsets. This work can benefit common hyperspectral image analysis tasks, such as classification and target detection, while keeping the computational burden low. It builds on and improves the previous ENH-ISOMAP algorithm in several ways. The workflow is based on a unified local spectral subsetting framework. Embedding spaces in local spectral subsets are first proposed as local noise models and used to perform noise estimation, MNF regression, and guided landmark selection in a local sense. Passive and active methods are proposed and verified to select landmarks deliberately, ensuring local geometric structure coverage and local noise avoidance. Then, a novel local spectral adaptive method is used to construct the k-nearest-neighbor graph. Finally, a global MNF transformation in the manifold space is introduced to further compress the signal dimensions. The workflow is implemented in C++ with multiple implementation optimizations, including the use of heterogeneous computing platforms available in personal computers. The results are presented and evaluated by the Jeffries-Matusita separability metric, as well as by the classification accuracy of supervised classifiers. The proposed workflow shows significant and stable improvements in dimensionality reduction performance over traditional MNF and ENH-ISOMAP on various hyperspectral datasets, and the computational speed of the proposed implementation is also improved.
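
    The local-spectral enhancements are specific to the dissertation, but the landmark-ISOMAP core it builds on can be sketched compactly: build a k-NN graph, compute geodesic (shortest-path) distances from a landmark subset, embed the landmarks with classical MDS, and triangulate the remaining points from their landmark geodesics (the standard Landmark-MDS step). The sketch assumes a connected k-NN graph and plain random landmark selection, i.e., none of the guided selection or local noise modeling described above.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def landmark_isomap(X, n_landmarks=50, k=10, dim=3, seed=0):
    """Plain landmark ISOMAP: k-NN graph -> geodesics to landmarks ->
    classical MDS on landmarks -> Landmark-MDS triangulation of the
    rest. Assumes the k-NN graph is connected."""
    rng = np.random.default_rng(seed)
    land = rng.choice(len(X), n_landmarks, replace=False)
    G = kneighbors_graph(X, k, mode='distance')
    # Geodesic distances from each landmark to every point, shape (L, n).
    D = shortest_path(G, method='D', directed=False, indices=land)
    # Classical MDS on the landmark-landmark squared distances.
    S = D[:, land] ** 2
    L = n_landmarks
    J = np.eye(L) - 1.0 / L                  # double-centering matrix
    B = -0.5 * J @ S @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]
    w = np.clip(w[idx], 1e-12, None)         # guard small negative eigenvalues
    V = V[:, idx]
    # Triangulate all points from their squared geodesics to the landmarks.
    pinv = (V / np.sqrt(w)).T                # (dim, L) pseudo-inverse map
    Y = -0.5 * pinv @ (D ** 2 - S.mean(axis=1, keepdims=True))
    return Y.T                               # (n, dim) manifold coordinates
```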

    Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

    Bilateral filters are in widespread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we generalize the parametrization and, in particular, derive a gradient descent algorithm so that the filter parameters can be learned from data. This derivation makes it possible to learn high-dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate its use in applications where single filter applications are desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields and apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for processing high-dimensional sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the use of general forms of filters.
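
    A full permutohedral-lattice implementation is beyond a short example, but the core idea of learning filter parameters by gradient descent can be shown with a toy differentiable bilateral filter on 1-D signals, where the spatial and range bandwidths are the learnable parameters. This is a deliberately simplified stand-in (dense O(n^2) weights, two scalar parameters) for the paper's general high-dimensional sparse filters; all names are hypothetical.

```python
import torch

class LearnableBilateral1d(torch.nn.Module):
    """Toy differentiable bilateral filter: Gaussian weights over both
    spatial offsets and intensity differences, with learnable bandwidths.
    The permutohedral lattice makes the real thing efficient; this dense
    version is only for illustrating the gradient-based learning."""
    def __init__(self):
        super().__init__()
        self.log_sig_s = torch.nn.Parameter(torch.zeros(()))  # spatial bandwidth
        self.log_sig_r = torch.nn.Parameter(torch.zeros(()))  # range bandwidth

    def forward(self, x):
        n = x.shape[-1]
        pos = torch.arange(n, dtype=x.dtype)
        ds = (pos[:, None] - pos[None, :]) ** 2          # spatial distances
        dr = (x[..., :, None] - x[..., None, :]) ** 2    # intensity distances
        w = torch.exp(-ds / (2 * self.log_sig_s.exp() ** 2)
                      - dr / (2 * self.log_sig_r.exp() ** 2))
        return (w * x[..., None, :]).sum(-1) / w.sum(-1)  # normalized filtering

# Fit the two bandwidths to (noisy, clean) pairs by gradient descent:
f = LearnableBilateral1d()
opt = torch.optim.Adam(f.parameters(), lr=0.05)
clean = torch.linspace(0, 1, 64)
noisy = clean + 0.1 * torch.randn(64)
for _ in range(200):
    loss = ((f(noisy) - clean) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```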

    High Dimensional Data Set Analysis Using a Large-Scale Manifold Learning Approach

    Technological advances have produced a trend toward data sets of increasing size and dimensionality. Processing these large-scale data sets is challenging for conventional computers due to computational limitations. A framework for nonlinear dimensionality reduction on large databases is presented that alleviates the issue of large data sets through sampling, graph construction, manifold learning, and embedding. Neighborhood selection is a key step in this framework and a potential area of improvement. The standard approach to neighborhood selection is to set a fixed neighborhood, either a fixed number of neighbors or a fixed neighborhood size; each has limitations due to variations in data density. A novel adaptive neighbor-selection algorithm is presented that enhances performance by incorporating sparse ℓ1-norm-based optimization. These enhancements are applied to the graph construction and embedding modules of the original framework. As validation of the proposed ℓ1-based enhancement, experiments are conducted on these modules using publicly available benchmark data sets. The two approaches are then applied to a large-scale magnetic resonance imaging (MRI) data set for brain tumor progression prediction. Results showed that the proposed approach outperformed linear methods and other traditional manifold learning algorithms.
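
    One standard way to realize ℓ1-based adaptive neighbor selection (in the spirit of sparse manifold methods; the thesis's exact formulation may differ) is to reconstruct each point as a sparse combination of all other points and keep the points with nonzero coefficients as its graph neighbors. The alpha value and function name below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def l1_neighbors(X, alpha=0.05):
    """Pick each point's graph neighbors by sparse reconstruction: solve
    a Lasso that rebuilds x_i from all other points and keep the points
    with nonzero coefficients. The neighbor count then adapts to local
    data density, unlike a fixed-k or fixed-radius rule."""
    n = len(X)
    neighbors = []
    for i in range(n):
        idx = np.delete(np.arange(n), i)   # dictionary = all other points
        model = Lasso(alpha=alpha, max_iter=5000, fit_intercept=False)
        model.fit(X[idx].T, X[i])          # min ||x_i - Dc||^2 + alpha*||c||_1
        neighbors.append(idx[np.abs(model.coef_) > 1e-8])
    return neighbors

# neighbors = l1_neighbors(X)
# len(neighbors[i]) now varies with the local density around point i.
```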