27 research outputs found

    Smoothed Separable Nonnegative Matrix Factorization

    Given a set of data points belonging to the convex hull of a set of vertices, a key problem in data analysis and machine learning is to estimate these vertices in the presence of noise. Many algorithms have been developed under the assumption that there is at least one data point near each vertex; two of the most widely used ones are vertex component analysis (VCA) and the successive projection algorithm (SPA). This assumption is known as the pure-pixel assumption in blind hyperspectral unmixing, and as the separability assumption in nonnegative matrix factorization. More recently, Bhattacharyya and Kannan (ACM-SIAM Symposium on Discrete Algorithms, 2020) proposed an algorithm for learning a latent simplex (ALLS) that relies on the assumption that there is more than one data point near each vertex. In that scenario, ALLS is probabilistically more robust to noise than algorithms based on the separability assumption. In this paper, inspired by ALLS, we propose smoothed VCA (SVCA) and smoothed SPA (SSPA), which generalize VCA and SPA by assuming the presence of several data points near each vertex. We illustrate the effectiveness of SVCA and SSPA over VCA, SPA, and ALLS on synthetic data sets and on the unmixing of hyperspectral images. (27 pages, 11 figures)
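    Since SPA is the baseline that SSPA generalizes, a small sketch may help make the "several nearby data points" idea concrete. Below is a minimal NumPy illustration of plain SPA together with a smoothed variant in the spirit of the abstract; the function names, the choice of p, and the simple averaging rule are illustrative assumptions, not the authors' exact SSPA.

```python
import numpy as np

def spa(X, r):
    """Successive projection algorithm: under the separability assumption,
    pick r columns of X (d x n) that approximate the vertices."""
    R = X.astype(float).copy()
    picked = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))  # column farthest from the span of picked columns
        picked.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)                        # project all columns onto the orthogonal complement of u
    return X[:, picked]

def smoothed_spa(X, r, p=5):
    """Hypothetical smoothed variant (illustrative, not the paper's exact SSPA):
    at each step, average the p largest-norm residual columns to estimate a vertex."""
    R = X.astype(float).copy()
    vertices = []
    for _ in range(r):
        top = np.argsort(-np.linalg.norm(R, axis=0))[:p]  # p candidate near-vertex points
        vertices.append(X[:, top].mean(axis=1))           # smoothed vertex estimate
        u = R[:, top].mean(axis=1)
        u /= np.linalg.norm(u)
        R -= np.outer(u, u @ R)                           # remove the direction just accounted for
    return np.column_stack(vertices)
```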

    Random Separating Hyperplane Theorem and Learning Polytopes

    The Separating Hyperplane theorem is a fundamental result in Convex Geometry with myriad applications. Our first result, the Random Separating Hyperplane Theorem (RSH), is a strengthening of this for polytopes. RSH asserts that if the distance between a point $a$ and a polytope $K$ with $k$ vertices and unit diameter in $\Re^d$ is at least $\delta$, where $\delta$ is a fixed constant in $(0,1)$, then a randomly chosen hyperplane separates $a$ and $K$ with probability at least $1/\mathrm{poly}(k)$ and margin at least $\Omega(\delta/\sqrt{d})$. An immediate consequence of our result is the first near-optimal bound on the error increase in the reduction from a separation oracle to an optimization oracle over a polytope. RSH has algorithmic applications in learning polytopes. We consider a fundamental problem, denoted the "Hausdorff problem", of learning a unit-diameter polytope $K$ within Hausdorff distance $\delta$, given an optimization oracle for $K$. Using RSH, we show that with polynomially many random queries to the optimization oracle, $K$ can be approximated within error $O(\delta)$. To our knowledge this is the first provable algorithm for the Hausdorff problem. Building on this result, we show that if the vertices of $K$ are well-separated, then an optimization oracle can be used to generate a list of points, each within Hausdorff distance $O(\delta)$ of $K$, with the property that the list contains a point close to each vertex of $K$. Further, we show how to prune this list to generate a (unique) approximation to each vertex of the polytope. We prove that in many latent variable settings, e.g., topic modeling, LDA, optimization oracles do exist provided we project to a suitable SVD subspace. Thus, our work yields the first efficient algorithm for finding approximations to the vertices of the latent polytope under the well-separatedness assumption.
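    The quantitative statement of RSH is easy to probe numerically. The snippet below is a hypothetical Monte Carlo check (not from the paper): it draws random unit directions and records how often one separates an external point $a$ from a small polytope $K$ of diameter at most one, and with what margin.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 6
V = rng.normal(size=(k, d))
V *= 0.5 / np.linalg.norm(V, axis=1, keepdims=True)  # vertices on a sphere of radius 0.5, so diam(K) <= 1
a = np.zeros(d)
a[0] = 1.0                                           # external point, at distance >= 0.5 from K

trials, hits, margins = 20000, 0, []
for _ in range(trials):
    h = rng.normal(size=d)
    h /= np.linalg.norm(h)                           # random unit normal of a hyperplane
    margin = h @ a - np.max(V @ h)                   # gap between a and K along direction h
    if margin > 0:
        hits += 1
        margins.append(margin)

print("fraction of separating directions:", hits / trials)
print("average margin when separating:", np.mean(margins))
```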

    Semi-supervised Eigenvectors for Large-scale Locally-biased Learning

    In many applications, one has side information, e.g., labels that are provided in a semi-supervised manner, about a specific target region of a large data set, and one wants to perform machine learning and data analysis tasks "nearby" that prespecified target region. For example, one might be interested in the clustering structure of a data graph near a prespecified "seed set" of nodes, or one might be interested in finding partitions in an image that are near a prespecified "ground truth" set of pixels. Locally-biased problems of this sort are particularly challenging for popular eigenvector-based machine learning and data analysis tools. At root, the reason is that eigenvectors are inherently global quantities, thus limiting the applicability of eigenvector-based methods in situations where one is interested in very local properties of the data. In this paper, we address this issue by providing a methodology to construct semi-supervised eigenvectors of a graph Laplacian, and we illustrate how these locally-biased eigenvectors can be used to perform locally-biased machine learning. These semi-supervised eigenvectors capture successively-orthogonalized directions of maximum variance, conditioned on being well-correlated with an input seed set of nodes that is assumed to be provided in a semi-supervised manner. We show that these semi-supervised eigenvectors can be computed quickly as the solution to a system of linear equations; and we also describe several variants of our basic method that have improved scaling properties. We provide several empirical examples demonstrating how these semi-supervised eigenvectors can be used to perform locally-biased learning; and we discuss the relationship between our results and recent machine learning algorithms that use global eigenvectors of the graph Laplacian.
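    The remark that these vectors are "computed quickly as the solution to a system of linear equations" can be made concrete with a small sketch. Assuming the first semi-supervised eigenvector has the saddle-point form x proportional to (L - gamma*D)^+ D s, for a parameter gamma below the second generalized eigenvalue of (L, D), the snippet below computes it; choosing gamma to meet a prescribed seed-correlation level, which the methodology itself handles, is omitted here.

```python
import numpy as np

def semi_supervised_eigenvector(L, D, s, gamma):
    """Hedged sketch: return x proportional to (L - gamma*D)^+ D s, normalized so
    that x' D x = 1.  L is the graph Laplacian, D the diagonal degree matrix, s a
    seed-set indicator, and gamma a parameter below the second generalized
    eigenvalue of (L, D)."""
    ones = np.ones(L.shape[0])
    s = s - ones * (s @ D @ ones) / (ones @ D @ ones)  # make the seed D-orthogonal to the all-ones vector
    x = np.linalg.lstsq(L - gamma * D, D @ s, rcond=None)[0]
    return x / np.sqrt(x @ D @ x)

# toy usage: a path graph with the seed on its first three nodes
n = 20
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
D = np.diag(A.sum(axis=1))
L = D - A
seed = np.zeros(n)
seed[:3] = 1.0
x = semi_supervised_eigenvector(L, D, seed, gamma=-0.05)
```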

    Large-scale Machine Learning in High-dimensional Datasets


    Algorithmic advances in learning from large dimensional matrices and scientific data

    University of Minnesota Ph.D. dissertation. May 2018. Major: Computer Science. Advisor: Yousef Saad. 1 computer file (PDF); xi, 196 pages.

    This thesis is devoted to answering a range of questions in machine learning and data analysis related to large dimensional matrices and scientific data. Two key research objectives connect the different parts of the thesis: (a) development of fast, efficient, and scalable algorithms for machine learning which handle large matrices and high dimensional data; and (b) design of learning algorithms for scientific data applications. The work combines ideas from multiple, often non-traditional, fields, leading to new algorithms, new theory, and new insights in different applications.

    The first of the three parts of this thesis explores numerical linear algebra tools to develop efficient algorithms for machine learning with reduced computation cost and improved scalability. Here, we first develop inexpensive algorithms combining various ideas from linear algebra and approximation theory for matrix spectrum related problems such as numerical rank estimation and matrix function trace estimation, including log-determinants, Schatten norms, and other spectral sums. We also propose a new method which simultaneously estimates the dimension of the dominant subspace of covariance matrices and obtains an approximation to the subspace. Next, we consider matrix approximation problems such as low rank approximation, column subset selection, and graph sparsification. We present a new approach based on multilevel coarsening to compute these approximations for large sparse matrices and graphs. Lastly, on the linear algebra front, we devise a novel algorithm based on rank shrinkage for the dictionary learning problem, learning a small set of dictionary columns which best represent the given data.

    The second part of this thesis focuses on exploring novel non-traditional applications of information theory and codes, particularly in solving problems related to machine learning and high dimensional data analysis. Here, we first propose new matrix sketching methods using codes for obtaining low rank approximations of matrices and solving least squares regression problems. Next, we demonstrate that codewords from certain coding schemes perform exceptionally well for the group testing problem. Lastly, we present a novel machine learning application for coding theory, that of solving large scale multilabel classification problems. We propose a new algorithm for multilabel classification which is based on group testing and codes. The algorithm has a simple, inexpensive prediction method, and the error correction capabilities of codes are exploited for the first time to correct prediction errors.

    The third part of the thesis focuses on devising robust and stable learning algorithms, which yield results that are interpretable from a specific scientific application viewpoint. We present Union of Intersections (UoI), a flexible, modular, and scalable framework for statistical machine learning problems. We then adapt this framework to develop new algorithms for matrix decomposition problems such as nonnegative matrix factorization (NMF) and CUR decomposition. We apply these new methods to data from neuroscience applications in order to obtain insights into the functionality of the brain. Finally, we consider the application of materials informatics, learning from materials data. Here, we deploy regression techniques on materials data to predict physical properties of materials.
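    The spectral sums mentioned in the first part (log-determinants, Schatten norms, and related quantities) are commonly estimated by combining Hutchinson's stochastic trace estimator with a polynomial or Lanczos approximation of the matrix function. The sketch below shows only the Hutchinson half, for log det of a symmetric positive definite matrix; the dense eigendecomposition is a stand-in for the Chebyshev/Lanczos machinery a scalable implementation would use, and the snippet is an illustration rather than the thesis's algorithm.

```python
import numpy as np

def hutchinson_logdet(A, num_probes=30, seed=0):
    """Estimate log det(A) = tr(log A) for symmetric positive definite A using
    Hutchinson's estimator with Rademacher probe vectors."""
    rng = np.random.default_rng(seed)
    w, Q = np.linalg.eigh(A)
    logA = (Q * np.log(w)) @ Q.T                      # dense log(A); a stand-in for Chebyshev/Lanczos
    samples = []
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=A.shape[0])  # Rademacher probe
        samples.append(v @ logA @ v)                  # unbiased estimate of tr(log A)
    return float(np.mean(samples))

# sanity check on a small SPD matrix
rng = np.random.default_rng(1)
B = rng.normal(size=(50, 50))
A = B @ B.T + 50 * np.eye(50)
print(hutchinson_logdet(A), "vs exact", np.linalg.slogdet(A)[1])
```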

    Principal surfaces from unsupervised kernel regression


    Frameworks for High Dimensional Convex Optimization

    We present novel, efficient algorithms for solving extremely large optimization problems. A significant bottleneck today is that, as datasets grow, researchers across disciplines want to solve prohibitively massive optimization problems. In this thesis, we present methods to compress optimization problems. The general goal is to represent a huge problem as a smaller problem or set of smaller problems, while still retaining enough information to ensure provable guarantees on solution quality and run time. We apply this approach to the following three settings.

    First, we propose a framework for accelerating both linear program solvers and convex solvers for problems with linear constraints. Our focus is on a class of problems for which data is either very costly or hard to obtain. In these situations, the number of data points m available is much smaller than the number of variables, n. In a machine learning setting, this regime is increasingly prevalent, since it is often advantageous to consider larger and larger feature spaces while not necessarily obtaining proportionally more data. Analytically, we provide worst-case guarantees on both the runtime and the quality of the solution produced. Empirically, we show that our framework speeds up state-of-the-art commercial solvers by two orders of magnitude, while maintaining a near-optimal solution.

    Second, we propose a novel approach for distributed optimization which uses far fewer messages than existing methods. We consider a setting in which the problem data are distributed over the nodes. We provide worst-case guarantees on the performance with respect to the amount of communication required and the quality of the solution. The algorithm uses O(log(n+m)) messages with high probability. We note that this is an exponential reduction compared to the O(n) communication required during each round of traditional consensus-based approaches. In terms of solution quality, our algorithm produces a feasible, near-optimal solution. Numerical results demonstrate that the approximation error matches that of ADMM in many cases, while using orders of magnitude less communication.

    Lastly, we propose and analyze a provably accurate long-step infeasible interior point method (IPM) for linear programming. The core computational bottleneck in IPMs is the need to solve a linear system of equations at each iteration. We employ sketching techniques to make the linear system computation lighter by handling the well-known ill-conditioning problems that occur when using iterative solvers in IPMs for LPs. In particular, we propose a preconditioned Conjugate Gradient iterative solver for the linear system. Our sketching strategy makes the condition number of the preconditioned system provably small. In practice, we demonstrate that our approach significantly reduces the condition number of the linear system and thus allows for more efficient solving on a range of benchmark datasets.
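    The sketching-and-preconditioning idea in the last paragraph can be illustrated compactly. The snippet below builds ill-conditioned IPM-style normal equations, forms a preconditioner from a Gaussian sketch of the scaled constraint matrix, and runs preconditioned conjugate gradient; the dimensions, sketch size, and diagonal scaling are illustrative assumptions, not the thesis's experimental setup.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
m, n, s = 100, 2000, 400                  # constraints, variables, sketch size (all illustrative)
A = rng.normal(size=(m, n))
d = rng.uniform(1e-2, 1e2, size=n)        # widely varying diagonal, as in late IPM iterations
b = rng.normal(size=m)

AD = A * np.sqrt(d)                       # A D^{1/2}; the normal-equations matrix is AD @ AD.T
S = rng.normal(size=(n, s)) / np.sqrt(s)  # Gaussian sketching matrix
ADS = AD @ S
P = cho_factor(ADS @ ADS.T)               # Cholesky factor of the sketched preconditioner (m x m)

K = LinearOperator((m, m), matvec=lambda y: AD @ (AD.T @ y))
M = LinearOperator((m, m), matvec=lambda y: cho_solve(P, y))

iterations = 0
def count(_xk):
    global iterations
    iterations += 1

y, info = cg(K, b, M=M, callback=count)   # preconditioned conjugate gradient on the normal equations
print("CG iterations with the sketched preconditioner:", iterations, "converged:", info == 0)
```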