
    Algorithms for feature selection and pattern recognition on Grassmann manifolds

    Includes bibliographical references. 2015 Summer. This dissertation presents three distinct application-driven research projects united by ideas and topics from geometric data analysis, optimization, computational topology, and machine learning. We first consider the hyperspectral band selection problem, solved using sparse support vector machines (SSVMs). A supervised embedded approach is proposed that exploits a property of SSVMs: the model exhibits a clearly identifiable gap between zero and non-zero feature vector weights, which permits important bands to be definitively selected in conjunction with the classification problem. An SSVM is trained using bootstrap aggregating to obtain a sample of SSVM models and so reduce variability in the band selection process. This preliminary sample approach for band selection is followed by a secondary band selection, which involves retraining the SSVM to further reduce the set of bands retained. We propose and compare three adaptations of the SSVM band selection algorithm for the multiclass problem. We illustrate the performance of these methods on two benchmark hyperspectral data sets. Second, we propose an approach for capturing the signal variability in data using the framework of the Grassmann manifold (Grassmannian). Labeled points from each class are sampled and used to form abstract points on the Grassmannian. The resulting points have representations as orthonormal matrices and as such do not reside in Euclidean space in the usual sense. There are a variety of metrics which allow us to determine distance matrices that can be used to realize the Grassmannian as an embedding in Euclidean space. Multidimensional scaling (MDS) determines a low dimensional Euclidean embedding of the manifold, preserving or approximating the Grassmannian geometry based on the distance measure.
    We illustrate that we can achieve an isometric embedding of the Grassmann manifold using the chordal metric, while this is not the case with other distances. However, non-isometric embeddings generated by using the smallest principal angle pseudometric on the Grassmannian lead to the best classification results: we observe that as the dimension of the Grassmannian grows, the accuracy of the classification grows to 100% in binary classification experiments. To build a classification model, we use SSVMs to perform simultaneous dimension selection. The resulting classifier selects a subset of dimensions of the embedding without loss in classification performance. Lastly, we present an application of persistent homology to the detection of chemical plumes in hyperspectral movies. The pixels of the raw hyperspectral data cubes are mapped to the geometric framework of the Grassmann manifold where they are analyzed, contrasting our approach with the more standard framework in Euclidean space. An advantage of this approach is that it allows the time slices in a hyperspectral movie to be collapsed to a sequence of points in such a way that some of the key structure within and between the slices is encoded by the points on the Grassmannian. This motivates the search for topological structure, associated with the evolution of the frames of a hyperspectral movie, within the corresponding points on the manifold. The proposed framework affords the processing of large data sets, such as the hyperspectral movies explored in this investigation, while retaining valuable discriminative information. For a particular choice of a distance metric on the Grassmannian, it is possible to generate topological signals that capture changes in the scene after a chemical release.
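As a rough illustration of the Grassmannian step described in this abstract, the following sketch computes principal angles between subspaces, the chordal distance, the smallest-principal-angle pseudometric, and a classical MDS embedding of a distance matrix. It is not the dissertation's code: the function names, the use of QR to orthonormalize samples, and the choice of classical (eigendecomposition-based) MDS are all assumptions made for the example.

```python
import numpy as np

def orthonormal_basis(samples):
    """Orthonormal basis (as columns) for the span of the sample columns."""
    q, _ = np.linalg.qr(samples)
    return q

def principal_angles(a, b):
    """Principal angles between the spans of orthonormal matrices a and b."""
    cosines = np.linalg.svd(a.T @ b, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

def chordal_distance(a, b):
    """Chordal distance: 2-norm of the sines of the principal angles."""
    return np.sqrt(np.sum(np.sin(principal_angles(a, b)) ** 2))

def smallest_angle_distance(a, b):
    """Smallest principal angle; a pseudometric, since it can vanish
    for distinct subspaces that share a direction."""
    return principal_angles(a, b).min()

def pairwise_distances(bases, metric):
    """Symmetric distance matrix with a zero diagonal."""
    n = len(bases)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = metric(bases[i], bases[j])
    return dist

def classical_mds(dist, dim):
    """Classical MDS: double-center the squared distances and keep the
    top `dim` eigen-directions (negative eigenvalues are clipped to 0)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dim]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

# Hypothetical usage: five random 2-dimensional subspaces of R^6.
rng = np.random.default_rng(0)
bases = [orthonormal_basis(rng.standard_normal((6, 2))) for _ in range(5)]
embedding = classical_mds(pairwise_distances(bases, chordal_distance), dim=4)
```

With the chordal distance the embedded points reproduce the input distances once enough dimensions are kept, which matches the isometry claim above. Swapping in `smallest_angle_distance` generally gives only an approximation.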

    Dimension reduction for linear separation with curvilinear distances

    High-dimensional data in its raw form may contain clearly classifiable clusters that are difficult to identify in the high-dimensional representation. Reducing the dimension may make it possible to apply a simple classification technique to extract this cluster information while retaining the overall topology of the data set. The supervised method presented here takes a high-dimensional data set consisting of multiple clusters and employs curvilinear distance as the relation between points, projecting into a lower dimension according to this relationship. This representation allows linear separation of cluster data that is not linearly separable in high dimensions, and the classification to a cluster of any subsequent unseen data point drawn from the same higher dimension.
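The abstract does not say how the curvilinear distance is computed. One common reading, assumed here, is the shortest-path distance along a k-nearest-neighbour graph built on the data. The sketch below follows that assumption; the function name, the parameter `k`, and the use of Floyd-Warshall are illustrative choices, not the paper's method.

```python
import numpy as np

def curvilinear_distances(points, k):
    """Shortest-path distances along a symmetric k-nearest-neighbour graph.

    Assumption: 'curvilinear distance' is taken to mean graph geodesic
    distance. Pairs in different connected components keep distance inf.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))

    # Start with no edges, then connect each point to its k nearest neighbours.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    neighbours = np.argsort(euclid, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    for i in range(n):
        graph[i, neighbours[i]] = euclid[i, neighbours[i]]
        graph[neighbours[i], i] = euclid[i, neighbours[i]]

    # Floyd-Warshall: allow each point in turn as an intermediate stop.
    for m in range(n):
        graph = np.minimum(graph, graph[:, [m]] + graph[[m], :])
    return graph

# Hypothetical usage: points on a half circle, where the distance along the
# arc between the endpoints is much longer than the straight-line distance.
angles = np.linspace(0.0, np.pi, 20)
arc = np.column_stack([np.cos(angles), np.sin(angles)])
along_arc = curvilinear_distances(arc, k=2)
```

The resulting matrix can then be passed to any distance-preserving projection. The projection step itself is outside this sketch.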

    On a generalization of distance sets

    A subset $X$ in the $d$-dimensional Euclidean space is called a $k$-distance set if there are exactly $k$ distinct distances between two distinct points in $X$, and a subset $X$ is called a locally $k$-distance set if for any point $x$ in $X$, there are at most $k$ distinct distances between $x$ and other points in $X$. Delsarte, Goethals, and Seidel gave the Fisher type upper bound for the cardinalities of $k$-distance sets on a sphere in 1977. In the same way, we are able to give the same bound for locally $k$-distance sets on a sphere. In the first part of this paper, we prove that if $X$ is a locally $k$-distance set attaining the Fisher type upper bound, then determining a weight function $w$, $(X,w)$ is a tight weighted spherical $2k$-design. This result implies that locally $k$-distance sets attaining the Fisher type upper bound are $k$-distance sets. In the second part, we give a new absolute bound for the cardinalities of $k$-distance sets on a sphere. This upper bound is useful for $k$-distance sets for which the linear programming bound is not applicable. In the third part, we discuss locally two-distance sets in Euclidean spaces. We give an upper bound for the cardinalities of locally two-distance sets in Euclidean spaces. Moreover, we prove that the existence of a spherical two-distance set in $(d-1)$-space which attains the Fisher type upper bound is equivalent to the existence of a locally two-distance set but not a two-distance set in $d$-space with more than $d(d+1)/2$ points. We also classify optimal (largest possible) locally two-distance sets for dimensions less than eight. In addition, we determine the maximum cardinalities of locally two-distance sets on a sphere for dimensions less than forty. Comment: 17 pages, 1 figure
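To make the two definitions concrete, here is a small sketch that counts the distinct distances in a finite point set, both globally and per point. The function name and the rounding tolerance `decimals` are assumptions for the example. Squared distances are compared after rounding, so the check is numerical rather than exact.

```python
import numpy as np

def distance_counts(points, decimals=9):
    """Return (global_k, local_k) for a finite point set.

    global_k: number of distinct distances between distinct points,
              so the set is a global_k-distance set.
    local_k:  largest number of distinct distances seen from any single
              point, so the set is a locally local_k-distance set.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    # Squared distances have the same distinctness pattern as distances.
    sq = np.round((diff ** 2).sum(axis=-1), decimals)
    off_diagonal = ~np.eye(n, dtype=bool)
    global_k = len(np.unique(sq[off_diagonal]))
    local_k = max(len(np.unique(sq[i][off_diagonal[i]])) for i in range(n))
    return global_k, local_k

# Hypothetical usage: the points 0, 1, 3 on a line have three distinct
# pairwise distances (1, 2, 3), but each point sees only two of them.
line_counts = distance_counts([[0.0], [1.0], [3.0]])
```

The line example is locally two-distance without being a two-distance set, which is the kind of gap between the two notions that the third part of the abstract studies.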

    Exact Computation of a Manifold Metric, via Lipschitz Embeddings and Shortest Paths on a Graph

    Data-sensitive metrics adapt distances locally based on the density of data points, with the goal of aligning distances and some notion of similarity. In this paper, we give the first exact algorithm for computing a data-sensitive metric called the nearest neighbor metric. In fact, we prove the surprising result that a previously published 3-approximation is an exact algorithm. The nearest neighbor metric can be viewed as a special case of a density-based distance used in machine learning, or it can be seen as an example of a manifold metric. Previous computational research on such metrics despaired of computing exact distances on account of the apparent difficulty of minimizing over all continuous paths between a pair of points. We leverage the exact computation of the nearest neighbor metric to compute sparse spanners and persistent homology. We also explore the behavior of the metric built from point sets drawn from an underlying distribution and consider the more general case of inputs that are finite collections of path-connected compact sets. The main results connect several classical theories such as the conformal change of Riemannian metrics, the theory of positive definite functions of Schoenberg, and screw function theory of Schoenberg and Von Neumann. We develop novel proof techniques based on the combination of screw functions and Lipschitz extensions that may be of independent interest. Comment: 15 pages