
    Error bounds for Lanczos-based matrix function approximation

    We analyze the Lanczos method for matrix function approximation (Lanczos-FA), an iterative algorithm for computing $f(\mathbf{A})\mathbf{b}$ when $\mathbf{A}$ is a Hermitian matrix and $\mathbf{b}$ is a given vector. Assuming that $f : \mathbb{C} \rightarrow \mathbb{C}$ is piecewise analytic, we give a framework, based on the Cauchy integral formula, which can be used to derive \emph{a priori} and \emph{a posteriori} error bounds for Lanczos-FA in terms of the error of Lanczos used to solve linear systems. Unlike many error bounds for Lanczos-FA, these bounds account for fine-grained properties of the spectrum of $\mathbf{A}$, such as clustered or isolated eigenvalues. Our results are derived assuming exact arithmetic, but we show that they are easily extended to finite precision computations using existing theory about the Lanczos algorithm in finite precision. We also provide generalized bounds for the Lanczos method used to approximate quadratic forms $\mathbf{b}^{\mathsf{H}} f(\mathbf{A}) \mathbf{b}$, and demonstrate the effectiveness of our bounds with numerical experiments.
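    As a rough illustration of the algorithm being analyzed (not of the paper's error bounds), here is a minimal NumPy sketch of Lanczos-FA: run k Lanczos iterations to build an orthonormal basis Q_k and tridiagonal T_k, then approximate f(A)b by ||b|| Q_k f(T_k) e_1. The function name `lanczos_fa`, the absence of reorthogonalization, and the no-breakdown assumption are illustrative choices, not taken from the paper.

```python
import numpy as np

def lanczos_fa(A, b, f, k):
    """Minimal Lanczos-FA sketch: approximate f(A) @ b for Hermitian A
    using k Lanczos iterations (no reorthogonalization, assumes no breakdown)."""
    n = b.shape[0]
    Q = np.zeros((n, k), dtype=np.result_type(A.dtype, b.dtype))
    alpha = np.zeros(k)
    beta = np.zeros(k)
    q = b / np.linalg.norm(b)
    q_prev = np.zeros_like(q)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = np.real(np.vdot(q, w))               # diagonal of T_k
        w = w - alpha[j] * q - (beta[j - 1] * q_prev if j > 0 else 0)
        beta[j] = np.linalg.norm(w)                     # off-diagonal of T_k
        q_prev, q = q, w / beta[j]
    # Tridiagonal T_k from the three-term recurrence coefficients
    T = np.diag(alpha) + np.diag(beta[:k - 1], 1) + np.diag(beta[:k - 1], -1)
    # Apply f to T_k via its eigendecomposition: f(T_k) e_1 = V f(L) V^T e_1
    evals, V = np.linalg.eigh(T)
    fT_e1 = V @ (f(evals) * V[0, :])
    return np.linalg.norm(b) * (Q @ fT_e1)
```

    For example, `lanczos_fa(A, b, np.exp, 30)` approximates exp(A) b; in finite precision one would typically add reorthogonalization or lean on the finite-precision theory the paper builds upon.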

    Feature Grouping and Sparse Principal Component Analysis

    Sparse Principal Component Analysis (SPCA) is widely used in data processing and dimension reduction; it uses the lasso to produce modified principal components with sparse loadings for better interpretability. However, sparse PCA does not consider an additional grouping structure in which loadings share similar coefficients (i.e., feature grouping), beyond the special group with all coefficients equal to zero (i.e., feature selection). In this paper, we propose a novel method called Feature Grouping and Sparse Principal Component Analysis (FGSPCA), which allows the loadings to belong to disjoint homogeneous groups, with sparsity as a special case. The proposed FGSPCA is a subspace learning method designed to simultaneously perform grouping pursuit and feature selection, by imposing a non-convex regularization with naturally adjustable sparsity and grouping effect. To solve the resulting non-convex optimization problem, we propose an alternating algorithm that incorporates difference-of-convex programming, augmented Lagrangian and coordinate descent methods. Experimental results on real data sets show that the proposed FGSPCA benefits from the grouping effect compared with methods without it.
    Comment: 21 pages, 5 figures, 2 tables
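    For orientation, the sketch below shows the rank-1 sparse-PCA baseline that FGSPCA generalizes: alternating updates with a lasso-style soft-threshold on the loadings. It is not the paper's FGSPCA algorithm; the non-convex grouping penalty and augmented-Lagrangian solver described above would replace the `soft_threshold` step. The names `sparse_pca_rank1` and `lam` are hypothetical.

```python
import numpy as np

def sparse_pca_rank1(X, lam, n_iter=100, tol=1e-8):
    """Rank-1 sparse PCA via alternating soft-thresholding.
    Illustrative baseline only, not the paper's FGSPCA."""
    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    # Initialize loadings with the leading right singular vector
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    v = Vt[0]
    for _ in range(n_iter):
        u = X @ v
        u /= np.linalg.norm(u)                   # unit-norm scores
        v_new = soft_threshold(X.T @ u, lam)     # lasso prox gives sparse loadings
        if np.linalg.norm(v_new) == 0:           # lam too large: all loadings zeroed
            break
        v_new /= np.linalg.norm(v_new)
        if np.linalg.norm(v_new - v) < tol:
            v = v_new
            break
        v = v_new
    return v  # sparse loading vector
```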

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference


    Structured Semidefinite Programming for Recovering Structured Preconditioners

    We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems, including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$, where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
    Comment: Merge of arXiv:1812.06295 and arXiv:2008.0172
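    To make the objective concrete, the sketch below applies the classical Jacobi (diagonal) rescaling K -> D K D with D = diag(K)^{-1/2} and reports the condition number before and after. This simple baseline is not the paper's SDP-based epsilon-optimal preconditioner; it only illustrates the quantity (the condition number of the rescaled matrix, cf. kappa*) that the paper's algorithm optimizes over all diagonal rescalings.

```python
import numpy as np

def jacobi_rescale(K):
    """Diagonal preconditioning baseline (Jacobi scaling): rescale a positive
    definite K to D K D with D = diag(K)^{-1/2} and compare condition numbers.
    Illustrative only; not the paper's epsilon-optimal construction."""
    d = 1.0 / np.sqrt(np.diag(K))
    D = np.diag(d)
    K_scaled = D @ K @ D
    return K_scaled, np.linalg.cond(K), np.linalg.cond(K_scaled)

# Hypothetical usage on a random positive definite matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((200, 50))
K = B.T @ B + 1e-3 * np.eye(50)
K_scaled, kappa_before, kappa_after = jacobi_rescale(K)
```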

    Compositionality, stability and robustness in probabilistic machine learning

    Probability theory plays an integral part in the field of machine learning. Its use has been advocated by many [MacKay, 2002; Jaynes, 2003] as it allows for the quantification of uncertainty and the incorporation of prior knowledge by simply applying the rules of probability [Kolmogorov, 1950]. While probabilistic machine learning was originally restricted to simple models, the advent of new computational technologies, such as automatic differentiation, and advances in approximate inference, such as Variational Inference [Blei et al., 2017], have made it more viable in complex settings. Despite this progress, there remain many challenges to its application to real-world tasks. Among those are questions about the ability of probabilistic models to model complex tasks and their reliability both in training and in the face of unexpected data perturbation. These three issues can be addressed by examining the three properties of compositionality, stability and robustness in these models. Hence, this thesis explores these three key properties and their application to probabilistic models, while validating their importance on a range of applications. The first contribution in this thesis studies compositionality. Compositionality enables the construction of complex and expressive probabilistic models from simple components. This increases the types of phenomena that one can model and provides the modeller with a wide array of modelling options. This thesis examines this property through the lens of Gaussian processes [Rasmussen and Williams, 2006]. It proposes a generic compositional Gaussian process model to address the problem of multi-task learning in the non-linear setting. Additionally, this thesis contributes two methods addressing the issue of stability. Stability determines the reliability of inference algorithms in the presence of noise. More stable training procedures lead to faster, more reliable inferences, especially for complex models. The two proposed methods aim at stabilising stochastic gradient estimation in Variational Inference using the method of control variates [Owen, 2013]. Finally, the last contribution of this thesis considers robustness. Robust machine learning methods are unaffected by unaccounted-for phenomena in the data. This makes such methods essential in deploying machine learning on real-world datasets. This thesis examines the problem of robust inference in sequential probabilistic models by combining the ideas of Generalised Bayesian Inference [Bissiri et al., 2016] and Sequential Monte Carlo sampling [Doucet and Johansen, 2011].
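    As a small illustration of the control-variate idea [Owen, 2013] behind the stability contributions, the sketch below reduces the variance of a Monte Carlo estimator by subtracting a correlated quantity with known mean; the thesis applies the same principle to stochastic gradient estimates in Variational Inference. The toy example and names here are illustrative assumptions, not the thesis's estimators.

```python
import numpy as np

def control_variate_estimate(g, h, h_mean):
    """Generic control-variate estimator: given noisy samples g of a target
    expectation and a correlated quantity h with known mean h_mean, return the
    variance-reduced estimate using c = Cov(g, h) / Var(h)."""
    c = np.cov(g, h)[0, 1] / np.var(h)
    adjusted = g - c * (h - h_mean)
    return adjusted.mean(), adjusted.var()

# Toy illustration: estimate E[exp(x)] for x ~ N(0, 1), using x itself
# (known mean 0) as the control variate.
rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
g = np.exp(x)
estimate, variance = control_variate_estimate(g, x, h_mean=0.0)
```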

    Applications of Stochastic Gradient Descent to Nonnegative Matrix Factorization

    We consider the application of stochastic gradient descent (SGD) to the nonnegative matrix factorization (NMF) problem and the unconstrained low-rank matrix factorization problem. While the literature on the SGD algorithm is rich, the application of this specific algorithm to matrix factorization problems is a largely unexplored area. We develop a series of results for the unconstrained problem, beginning with an analysis of standard gradient descent with a known zero-loss solution, and culminating with results for SGD in the general case where no zero-loss solution is assumed. We show that, with initialization close to a minimizer, linear-rate convergence guarantees exist. We explore these results further with numerical experiments, and examine how the matrix factorization solutions found by SGD can be used as machine learning classifiers in two specific applications. In the first application, handwritten digit recognition, we show that our approach produces classification performance competitive with existing matrix factorization algorithms. In the second application, document topic classification, we examine how well SGD can recover an unknown words-to-topics matrix when the topics-to-document matrix is generated using the Latent Dirichlet Allocation model. This approach allows us to simulate two regimes for SGD: a fixed-sample regime where a large set of data is iterated over to train the model, and a generated-sample regime where a new data point is generated at each training iteration. In both regimes, we show that SGD can be an effective tool for recovering the hidden words-to-topics matrix. We conclude with some suggestions for further expansion of this work.
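    For concreteness, here is a minimal projected-SGD sketch for the NMF objective ||X - WH||_F^2: sample a single entry of X, take a gradient step on its squared residual, and project W and H back onto the nonnegative orthant. The step size, sampling scheme, and function name `sgd_nmf` are illustrative assumptions, not the thesis's exact scheme or analysis.

```python
import numpy as np

def sgd_nmf(X, r, lr=1e-2, n_steps=100_000, seed=0):
    """Minimal sketch of projected SGD for NMF: per step, sample one entry
    X[i, j], take a gradient step on (X[i, j] - W[i, :] @ H[:, j])**2, and
    clip W, H at zero to enforce nonnegativity. Illustrative only."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_steps):
        i = rng.integers(m)
        j = rng.integers(n)
        residual = X[i, j] - W[i, :] @ H[:, j]
        grad_w = -2.0 * residual * H[:, j]
        grad_h = -2.0 * residual * W[i, :]
        W[i, :] = np.maximum(W[i, :] - lr * grad_w, 0.0)  # nonnegative projection
        H[:, j] = np.maximum(H[:, j] - lr * grad_h, 0.0)
    return W, H
```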

    Quantum Machine Learning For Classical Data

    In this dissertation, we study the intersection of quantum computing and supervised machine learning algorithms, which means that we investigate quantum algorithms for supervised machine learning that operate on classical data. This area of research falls under the umbrella of quantum machine learning, a research area of computer science which has recently received wide attention. In particular, we investigate to what extent quantum computers can be used to accelerate supervised machine learning algorithms. The aim of this is to develop a clear understanding of the promises and limitations of the current state of the art of quantum algorithms for supervised machine learning, but also to define directions for future research in this exciting field. We start by looking at supervised quantum machine learning (QML) algorithms through the lens of statistical learning theory. In this framework, we derive novel bounds on the computational complexities of a large set of supervised QML algorithms under the requirement of optimal learning rates. Next, we give a new bound for Hamiltonian simulation of dense Hamiltonians, a major subroutine of most known supervised QML algorithms, and then derive a classical algorithm with nearly the same complexity. We then draw parallels to recent ‘quantum-inspired’ results, and explain the implications of these results for quantum machine learning applications. Looking for areas which might bear larger advantages for QML algorithms, we finally propose a novel algorithm for quantum Boltzmann machines, and argue that quantum algorithms for quantum data are one of the most promising applications for QML, with a potentially exponential advantage over classical approaches.