72 research outputs found

    Algorithmic advances in learning from large dimensional matrices and scientific data

    Get PDF
    University of Minnesota Ph.D. dissertation.May 2018. Major: Computer Science. Advisor: Yousef Saad. 1 computer file (PDF); xi, 196 pages.This thesis is devoted to answering a range of questions in machine learning and data analysis related to large dimensional matrices and scientific data. Two key research objectives connect the different parts of the thesis: (a) development of fast, efficient, and scalable algorithms for machine learning which handle large matrices and high dimensional data; and (b) design of learning algorithms for scientific data applications. The work combines ideas from multiple, often non-traditional, fields leading to new algorithms, new theory, and new insights in different applications. The first of the three parts of this thesis explores numerical linear algebra tools to develop efficient algorithms for machine learning with reduced computation cost and improved scalability. Here, we first develop inexpensive algorithms combining various ideas from linear algebra and approximation theory for matrix spectrum related problems such as numerical rank estimation, matrix function trace estimation including log-determinants, Schatten norms, and other spectral sums. We also propose a new method which simultaneously estimates the dimension of the dominant subspace of covariance matrices and obtains an approximation to the subspace. Next, we consider matrix approximation problems such as low rank approximation, column subset selection, and graph sparsification. We present a new approach based on multilevel coarsening to compute these approximations for large sparse matrices and graphs. Lastly, on the linear algebra front, we devise a novel algorithm based on rank shrinkage for the dictionary learning problem, learning a small set of dictionary columns which best represent the given data. The second part of this thesis focuses on exploring novel non-traditional applications of information theory and codes, particularly in solving problems related to machine learning and high dimensional data analysis. Here, we first propose new matrix sketching methods using codes for obtaining low rank approximations of matrices and solving least squares regression problems. Next, we demonstrate that codewords from certain coding scheme perform exceptionally well for the group testing problem. Lastly, we present a novel machine learning application for coding theory, that of solving large scale multilabel classification problems. We propose a new algorithm for multilabel classification which is based on group testing and codes. The algorithm has a simple inexpensive prediction method, and the error correction capabilities of codes are exploited for the first time to correct prediction errors. The third part of the thesis focuses on devising robust and stable learning algorithms, which yield results that are interpretable from specific scientific application viewpoint. We present Union of Intersections (UoI), a flexible, modular, and scalable framework for statistical-machine learning problems. We then adapt this framework to develop new algorithms for matrix decomposition problems such as nonnegative matrix factorization (NMF) and CUR decomposition. We apply these new methods to data from Neuroscience applications in order to obtain insights into the functionality of the brain. Finally, we consider the application of material informatics, learning from materials data. Here, we deploy regression techniques on materials data to predict physical properties of materials

    Sparse feature learning for image analysis in segmentation, classification, and disease diagnosis.

    Get PDF
    The success of machine learning algorithms generally depends on intermediate data representation, called features that disentangle the hidden factors of variation in data. Moreover, machine learning models are required to be generalized, in order to reduce the specificity or bias toward the training dataset. Unsupervised feature learning is useful in taking advantage of large amount of unlabeled data, which is available to capture these variations. However, learned features are required to capture variational patterns in data space. In this dissertation, unsupervised feature learning with sparsity is investigated for sparse and local feature extraction with application to lung segmentation, interpretable deep models, and Alzheimer\u27s disease classification. Nonnegative Matrix Factorization, Autoencoder and 3D Convolutional Autoencoder are used as architectures or models for unsupervised feature learning. They are investigated along with nonnegativity, sparsity and part-based representation constraints for generalized and transferable feature extraction

    CONTEXT AWARE PRIVACY PRESERVING CLUSTERING AND CLASSIFICATION

    Get PDF
    Data are valuable assets to any organizations or individuals. Data are sources of useful information which is a big part of decision making. All sectors have potential to benefit from having information. Commerce, health, and research are some of the fields that have benefited from data. On the other hand, the availability of the data makes it easy for anyone to exploit the data, which in many cases are private confidential data. It is necessary to preserve the confidentiality of the data. We study two categories of privacy: Data Value Hiding and Data Pattern Hiding. Privacy is a huge concern but equally important is the concern of data utility. Data should avoid privacy breach yet be usable. Although these two objectives are contradictory and achieving both at the same time is challenging, having knowledge of the purpose and the manner in which it will be utilized helps. In this research, we focus on some particular situations for clustering and classification problems and strive to balance the utility and privacy of the data. In the first part of this dissertation, we propose Nonnegative Matrix Factorization (NMF) based techniques that accommodate constraints defined explicitly into the update rules. These constraints determine how the factorization takes place leading to the favorable results. These methods are designed to make alterations on the matrices such that user-specified cluster properties are introduced. These methods can be used to preserve data value as well as data pattern. As NMF and K-means are proven to be equivalent, NMF is an ideal choice for pattern hiding for clustering problems. In addition to the NMF based methods, we propose methods that take into account the data structures and the attribute properties for the classification problems. We separate the work into two different parts: linear classifiers and nonlinear classifiers. We propose two different solutions based on the classifiers. We study the effect of distortion on the utility of data. We propose three distortion measurement metrics which demonstrate better characteristics than the traditional metrics. The effectiveness of the measures is examined on different benchmark datasets. The result shows that the methods have the desirable properties such as invariance to translation, rotation, and scaling

    Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

    Get PDF
    The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well, sometimes even better than, the original dense networks. Sparsity promises to reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial of sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field

    Efficient interior point algorithms for large scale convex optimization problems

    Get PDF
    Interior point methods (IPMs) are among the most widely used algorithms for convex optimization problems. They are applicable to a wide range of problems, including linear, quadratic, nonlinear, conic and semidefinite programming problems, requiring a polynomial number of iterations to find an accurate approximation of the primal-dual solution. The formidable convergence properties of IPMs come with a fundamental drawback: the numerical linear algebra involved becomes progressively more and more challenging as the IPM converges towards optimality. In particular, solving the linear systems to find the Newton directions requires most of the computational effort of an IPM. Proposed remedies to alleviate this phenomenon include regularization techniques, predictor-corrector schemes, purposely developed preconditioners, low-rank update strategies, to mention a few. For problems of very large scale, this unpleasant characteristic of IPMs becomes a more and more problematic feature, since any technique used must be efficient and scalable in order to maintain acceptable computational requirements. In this Thesis, we deal with convex linear and quadratic problems of large “dimension”: we use this term in a broader sense than just a synonym for “size” of the problem. The instances considered can be either problems with a large number of variables and/or constraints but with a sparse structure, or problems with a moderate number of variables and/or constraints but with a dense structure. Both these type of problems require very efficient strategies to be used during the algorithm, even though the corresponding difficulties arise for different reasons. The first application that we consider deals with a moderate size quadratic problem where the quadratic term is 100% dense; this problem arises from X-ray tomographic imaging reconstruction, in particular with the goal of separating the distribution of two materials present in the observed sample. A novel non-convex regularizer is introduced for this purpose; convexity of the overall problem is maintained by careful choice of the parameters. We derive a specialized interior point method for this problem and an appropriate preconditioner for the normal equations linear system, to be used without ever forming the fully dense matrices involved. The next major contribution is related to the issue of efficiently computing the Newton direction during IPMs. When an iterative method is applied to solve the linear equation system in IPMs, the attention is usually placed on accelerating their convergence by designing appropriate preconditioners, but the linear solver is applied as a black box with a standard termination criterion which asks for a sufficient reduction of the residual in the linear system. Such an approach often leads to an unnecessary “over-solving” of linear equations. We propose new indicators for the early termination of the inner iterations and test them on a set of large scale quadratic optimization problems. Evidence gathered from these computational experiments shows that the new technique delivers significant improvements in terms of inner (linear) iterations and those translate into significant savings of the IPM solution time. The last application considered is discrete optimal transport (OT) problems; these kind of problems give rise to very large linear programs with highly structured matrices. Solutions of such problems are expected to be sparse, that is only a small subset of entries in the optimal solution is expected to be nonzero. We derive an IPM for the standard OT formulation, which exploits a column-generation-like technique to force all intermediate iterates to be as sparse as possible. We prove theoretical results about the sparsity pattern of the optimal solution and we propose to mix iterative and direct linear solvers in an efficient way, to keep computational time and memory requirement as low as possible. We compare the proposed method with two state-of-the-art solvers and show that it can compete with the best network optimization tools in terms of computational time and memory usage. We perform experiments with problems reaching more than four billion variables and demonstrate the robustness of the proposed method. We consider also the optimal transport problem on sparse graphs and present a primal-dual regularized IPM to solve it. We prove that the introduction of the regularization allows us to use sparsified versions of the normal equations system to inexpensively generate inexact IPM directions. The proposed method is shown to have polynomial complexity and to outperform a very efficient network simplex implementation, for problems with up to 50 million variables

    Dynamical low-rank training of neural networks

    Get PDF
    Neural networks have achieved tremendous success in a large variety of applications. However, their space and time computational demand can limit their usage in resource limited devices. At the same time, overparametrization seems to be necessary in order to overcome the highly non-convex nature of the training optimization problem. An optimal trade-off is then to be found in order to reduce networks' dimension while mantaining high performance. Popular approaches in the current literature are based on pruning techniques that look for subnetworks able to mantain approximately the initial performance. Nevertheless, these techniques often are not able to reduce the memory footprint of the training phase. In this thesis we will present DLRT, a training algorithm that looks for "low-rank subnetworks" by using DLRA theory and techniques. These subnetworks and their ranks are determined and adapted already during the training phase, allowing the overall time and memory resources required by both training and evaluation phases to be reduced significantly.Neural networks have achieved tremendous success in a large variety of applications. However, their space and time computational demand can limit their usage in resource limited devices. At the same time, overparametrization seems to be necessary in order to overcome the highly non-convex nature of the training optimization problem. An optimal trade-off is then to be found in order to reduce networks' dimension while mantaining high performance. Popular approaches in the current literature are based on pruning techniques that look for subnetworks able to mantain approximately the initial performance. Nevertheless, these techniques often are not able to reduce the memory footprint of the training phase. In this thesis we will present DLRT, a training algorithm that looks for "low-rank subnetworks" by using DLRA theory and techniques. These subnetworks and their ranks are determined and adapted already during the training phase, allowing the overall time and memory resources required by both training and evaluation phases to be reduced significantly
    • 

    corecore