1,779 research outputs found

    Multiclass Data Segmentation using Diffuse Interface Methods on Graphs

    We present two graph-based algorithms for multiclass segmentation of high-dimensional data. The algorithms use a diffuse interface model based on the Ginzburg-Landau functional, related to total variation compressed sensing and image processing. A multiclass extension is introduced using the Gibbs simplex, with the functional's double-well potential modified to handle the multiclass case. The first algorithm minimizes the functional using a convex splitting numerical scheme. The second algorithm uses a graph adaptation of the classical numerical Merriman-Bence-Osher (MBO) scheme, which alternates between diffusion and thresholding. We demonstrate the performance of both algorithms experimentally on synthetic data, grayscale and color images, and several benchmark data sets such as MNIST, COIL and WebKB. We also make use of fast numerical solvers for finding the eigenvectors and eigenvalues of the graph Laplacian, and take advantage of the sparsity of the matrix. Experiments indicate that the results are competitive with or better than the current state-of-the-art multiclass segmentation algorithms. Comment: 14 page
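The alternation between diffusion and thresholding can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an unnormalized graph Laplacian, explicit Euler diffusion steps instead of the spectral solvers the abstract mentions, and a simplified fidelity step that just re-imposes the known labels. The function name `graph_mbo` and all parameters are hypothetical.

```python
import numpy as np

def graph_mbo(W, labeled, n_classes, dt=0.1, diffusion_steps=10, n_iter=5):
    """Toy multiclass MBO-style iteration on a weighted graph (illustrative only).

    W        : symmetric (n, n) weight matrix
    labeled  : dict {node index: class index} of known labels (assumed fidelity data)
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W            # unnormalized graph Laplacian (assumption)
    one_hot = np.eye(n_classes)               # vertices of the Gibbs simplex

    def clamp(u):
        # Simplified fidelity step: overwrite labeled rows with their known class.
        for node, cls in labeled.items():
            u[node] = one_hot[cls]

    u = np.full((n, n_classes), 1.0 / n_classes)  # start at the simplex center
    clamp(u)
    for _ in range(n_iter):
        # Diffusion: a few explicit Euler steps of du/dt = -L u.
        for _ in range(diffusion_steps):
            u = u - dt * (L @ u)
            clamp(u)
        # Thresholding: snap every row to the nearest simplex vertex.
        u = one_hot[u.argmax(axis=1)]
        clamp(u)
    return u.argmax(axis=1)

# Two triangles joined by one weak edge, with a single labeled node in each.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1), (0, 2, 1), (1, 2, 1),
                (3, 4, 1), (3, 5, 1), (4, 5, 1), (2, 3, 0.1)]:
    W[i, j] = W[j, i] = w
labels = graph_mbo(W, labeled={0: 0, 5: 1}, n_classes=2)
print(labels.tolist())
```

On this toy graph the two labels spread to their own triangles, since the weak bridge edge carries little diffusion between them.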

    Algorithms for feature selection and pattern recognition on Grassmann manifolds

    Includes bibliographical references. 2015 Summer. This dissertation presents three distinct application-driven research projects united by ideas and topics from geometric data analysis, optimization, computational topology, and machine learning. We first consider the hyperspectral band selection problem, solved using sparse support vector machines (SSVMs). A supervised embedded approach is proposed using the property of SSVMs to exhibit a model structure that includes a clearly identifiable gap between zero and non-zero feature vector weights that permits important bands to be definitively selected in conjunction with the classification problem. An SSVM is trained using bootstrap aggregating to obtain a sample of SSVM models to reduce variability in the band selection process. This preliminary sample approach for band selection is followed by a secondary band selection which involves retraining the SSVM to further reduce the set of bands retained. We propose and compare three adaptations of the SSVM band selection algorithm for the multiclass problem. We illustrate the performance of these methods on two benchmark hyperspectral data sets. Second, we propose an approach for capturing the signal variability in data using the framework of the Grassmann manifold (Grassmannian). Labeled points from each class are sampled and used to form abstract points on the Grassmannian. The resulting points have representations as orthonormal matrices and as such do not reside in Euclidean space in the usual sense. There are a variety of metrics which allow us to determine distance matrices that can be used to realize the Grassmannian as an embedding in Euclidean space. Multidimensional scaling (MDS) determines a low dimensional Euclidean embedding of the manifold, preserving or approximating the Grassmannian geometry based on the distance measure. 
We illustrate that we can achieve an isometric embedding of the Grassmann manifold using the chordal metric while this is not the case with other distances. However, non-isometric embeddings generated by using the smallest principal angle pseudometric on the Grassmannian lead to the best classification results: we observe that as the dimension of the Grassmannian grows, the accuracy of the classification grows to 100% in binary classification experiments. To build a classification model, we use SSVMs to perform simultaneous dimension selection. The resulting classifier selects a subset of dimensions of the embedding without loss in classification performance. Lastly, we present an application of persistent homology to the detection of chemical plumes in hyperspectral movies. The pixels of the raw hyperspectral data cubes are mapped to the geometric framework of the Grassmann manifold where they are analyzed, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows the time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmannian. This motivates the search for topological structure, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the manifold. The proposed framework affords the processing of large data sets, such as the hyperspectral movies explored in this investigation, while retaining valuable discriminative information. For a particular choice of a distance metric on the Grassmannian, it is possible to generate topological signals that capture changes in the scene after a chemical release
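The two distances named in this abstract are both functions of the principal angles between subspaces, which can be computed from a singular value decomposition. The sketch below uses the standard definitions and is not taken from the dissertation; the function names are hypothetical, and subspaces are assumed to be given as matrices whose columns span them.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians, ascending) between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)                   # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                   # orthonormal basis for span(B)
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def chordal_distance(A, B):
    """Chordal distance: square root of the sum of squared sines of the angles."""
    return float(np.sqrt(np.sum(np.sin(principal_angles(A, B)) ** 2)))

def smallest_angle(A, B):
    """Smallest principal angle; only a pseudometric on the Grassmannian."""
    return float(principal_angles(A, B).min())

# Two planes in R^3 that share the x-axis: span(e1, e2) and span(e1, e3).
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
print(chordal_distance(A, B), smallest_angle(A, B))
```

The example shows why the smallest principal angle is only a pseudometric: the two planes are different points on the Grassmannian, yet because they share a direction their smallest angle is zero, while the chordal distance between them is nonzero.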

    PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison

    The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists. The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. This work is an important first step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future. Comment: 14 pages, 5 figures, submitted for review to JML

    Fast Label Embeddings via Randomized Linear Algebra

    Many modern multiclass and multilabel problems are characterized by increasingly large output spaces. For these problems, label embeddings have been shown to be a useful primitive that can improve computational and statistical efficiency. In this work we utilize a correspondence between rank constrained estimation and low dimensional label embeddings that uncovers a fast label embedding algorithm which works in both the multiclass and multilabel settings. The result is a randomized algorithm whose running time is exponentially faster than naive algorithms. We demonstrate our techniques on two large-scale public datasets, from the Large Scale Hierarchical Text Challenge and the Open Directory Project, where we obtain state-of-the-art results. Comment: To appear in the proceedings of the ECML/PKDD 2015 conference. Reference implementation available at https://github.com/pmineiro/randembe
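The abstract does not spell out the algorithm, so the following is only a generic illustration of the kind of primitive that randomized linear algebra provides: a randomized range finder followed by a small exact SVD, which yields a low-rank factorization without decomposing the full matrix. It should not be read as the paper's label embedding method; `randomized_svd` and its parameters are hypothetical names.

```python
import numpy as np

def randomized_svd(A, rank, oversample=5, seed=0):
    """Approximate rank-`rank` SVD of A via a Gaussian random projection."""
    rng = np.random.default_rng(seed)
    # Sketch the column space of A with a few random test vectors.
    Omega = rng.standard_normal((A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sketch
    # Project A onto that basis and decompose the much smaller matrix.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

# Exactly rank-3 test matrix: the factorization should recover it closely.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
U, s, Vt = randomized_svd(A, rank=3)
print(np.allclose((U * s) @ Vt, A, atol=1e-8))
```

The oversampling columns are a common safeguard: sketching with slightly more random vectors than the target rank makes it far more likely that the sketch captures the dominant subspace.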

    Automatic Detection and Intensity Estimation of Spontaneous Smiles

    Both the occurrence and intensity of facial expression are critical to what the face reveals. While much progress has been made towards the automatic detection of expression occurrence, controversy exists about how best to estimate expression intensity. Broadly, one approach is to adapt classifiers trained on binary ground truth to estimate expression intensity. An alternative approach is to explicitly train classifiers for the estimation of expression intensity. We investigated this issue by comparing multiple methods for binary smile detection and smile intensity estimation using two large databases of spontaneous expressions. SIFT and Gabor were used for feature extraction; Laplacian Eigenmap and PCA were used for dimensionality reduction; and binary SVM margins, multiclass SVMs, and ε-SVR models were used for prediction. Both multiclass SVMs and ε-SVR classifiers explicitly trained on intensity ground truth outperformed binary SVM margins for smile intensity estimation. A surprising finding was that multiclass SVMs also outperformed binary SVM margins on binary smile detection. This suggests that training on intensity ground truth is worthwhile even for binary expression detection.

    Classification of EMG signals to control a prosthetic hand using time-frequency representations and Support Vector Machines

    Myoelectric signals (MES) are viable control signals for externally-powered prosthetic devices. They may improve both the functionality and the cosmetic appearance of these devices. Conventional controllers, based on the signal's amplitude features in the control strategy, lack a large number of controllable states because signals from independent muscles are required for each degree of freedom (DoF) of the device. Myoelectric pattern recognition systems can overcome this problem by discriminating different residual muscle movements instead of contraction levels of individual muscles. However, the lack of long-term robustness in these systems and the design of counter-intuitive control/command interfaces have resulted in low clinical acceptance levels. As a result, the development of robust, easy to use myoelectric pattern recognition-based control systems is the main challenge in the field of prosthetic control. This dissertation addresses the need to improve the controller's robustness by designing a pattern recognition-based control system that classifies the user's intention to actuate the prosthesis. This system is part of a cost-effective prosthetic hand prototype developed to achieve an acceptable level of functional dexterity using a simple to use interface. A Support Vector Machine (SVM) classifier implemented as a directed acyclic graph (DAG) was created. It used wavelet features from multiple surface EMG channels strategically placed over five forearm muscles. The classifiers were evaluated across seven subjects. They were able to discriminate five wrist motions with an accuracy of 91.5%. Variations of electrode locations were artificially introduced at each recording session as part of the procedure, to obtain data that accounted for the changes in the user's muscle patterns over time. The generalization ability of the SVM was able to capture most of the variability in the data and to maintain an average classification accuracy of 90%. 
Two principal component analysis (PCA) frameworks were also evaluated to study the relationship between EMG recording sites and the need for feature space reduction. The dimension of the new feature set was reduced with the goal of improving the classification accuracy and reducing the computation time. The analysis indicated that the projection of the wavelet features into a reduced feature space did not significantly improve the accuracy and the computation time. However, decreasing the number of wavelet decomposition levels did lower the computational load without compromising the average signal classification accuracy. Based on the results of this work, a myoelectric pattern recognition-based control system that uses an SVM classifier applied to time-frequency features may be used to discriminate muscle contraction patterns for prosthetic applications.
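A multiclass classifier arranged as a directed acyclic graph typically decides among K classes by running K - 1 pairwise comparisons, eliminating one candidate class at each node. The sketch below shows only that decision procedure under stated assumptions: the pairwise decider is a hypothetical nearest-centroid rule on a one-dimensional feature, standing in for the trained binary SVMs, and the class indices and centroid values are made up for illustration.

```python
def dag_predict(x, classes, pairwise):
    """Walk a decision DAG: compare the first and last remaining classes,
    drop the loser, and repeat until one class is left.

    pairwise(a, b, x) must return either a or b (the preferred class for x).
    """
    candidates = list(classes)
    evaluations = 0
    while len(candidates) > 1:
        a, b = candidates[0], candidates[-1]
        winner = pairwise(a, b, x)
        evaluations += 1
        # Eliminate whichever of the two end classes lost this comparison.
        if winner == a:
            candidates.pop()
        else:
            candidates.pop(0)
    return candidates[0], evaluations

# Hypothetical stand-in for trained binary SVMs: pick the nearer 1-D centroid.
centroids = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}

def nearest_centroid(a, b, x):
    return a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b

predicted, n_evals = dag_predict(2.2, classes=[0, 1, 2, 3, 4], pairwise=nearest_centroid)
print(predicted, n_evals)
```

With five classes the walk always uses four pairwise evaluations, which is part of the appeal of this arrangement for a controller that must respond quickly.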