11 research outputs found
Error bounds for Lanczos-based matrix function approximation
We analyze the Lanczos method for matrix function approximation (Lanczos-FA),
an iterative algorithm for computing $f(A)\mathbf{b}$ when
$A$ is a Hermitian matrix and $\mathbf{b}$ is a given vector.
Assuming that $f$ is piecewise analytic, we
give a framework, based on the Cauchy integral formula, which can be used to
derive \emph{a priori} and \emph{a posteriori} error bounds for Lanczos-FA in
terms of the error of Lanczos used to solve linear systems. Unlike many error
bounds for Lanczos-FA, these bounds account for fine-grained properties of the
spectrum of $A$, such as clustered or isolated eigenvalues. Our
results are derived assuming exact arithmetic, but we show that they are easily
extended to finite precision computations using existing theory about the
Lanczos algorithm in finite precision. We also provide generalized bounds for
the Lanczos method used to approximate quadratic forms
$\mathbf{b}^{\mathsf{H}} f(A) \mathbf{b}$, and demonstrate the effectiveness of
our bounds with numerical experiments.
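For reference, the Lanczos-FA iterate the abstract analyzes can be sketched in a few lines of numpy: build an orthonormal Krylov basis Q and tridiagonal T, then approximate f(A)b by ||b|| Q f(T) e1. This is a minimal illustrative sketch (dense matrix, full reorthogonalization; the choice f(x) = sqrt(x), the test matrix, and the iteration count are assumptions, not details from the paper):

```python
import numpy as np

def lanczos_fa(A, b, k, f):
    """Approximate f(A) @ b with k Lanczos iterations (full reorthogonalization)."""
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        # Reorthogonalize against all previous basis vectors for stability.
        w -= Q[:, : j + 1] @ (Q[:, : j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    # f(T) e_1 via the eigendecomposition of the small tridiagonal T.
    evals, evecs = np.linalg.eigh(T)
    fT_e1 = evecs @ (f(evals) * evecs[0, :])
    return np.linalg.norm(b) * (Q @ fT_e1)

# Usage: approximate A^{1/2} b for a random well-conditioned SPD matrix.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.standard_normal(50)
approx = lanczos_fa(A, b, 20, np.sqrt)
```

With a well-conditioned spectrum as here, 20 iterations already give an accurate approximation of the matrix square root times a vector.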
Feature Grouping and Sparse Principal Component Analysis
Sparse Principal Component Analysis (SPCA) is widely used in data processing
and dimension reduction; it uses the lasso to produce modified principal
components with sparse loadings for better interpretability. However, SPCA
does not consider an additional grouping structure in which the loadings share
similar coefficients (i.e., feature grouping), beyond the special group with all
coefficients equal to zero (i.e., feature selection). In this paper, we propose a
novel method called Feature Grouping and Sparse Principal Component Analysis
(FGSPCA) which allows the loadings to belong to disjoint homogeneous groups,
with sparsity as a special case. The proposed FGSPCA is a subspace learning
method designed to simultaneously perform grouping pursuit and feature
selection, by imposing a non-convex regularization with naturally adjustable
sparsity and grouping effect. To solve the resulting non-convex optimization
problem, we propose an alternating algorithm that incorporates the
difference-of-convex programming, augmented Lagrange and coordinate descent
methods. Additionally, experimental results on real data sets show that the
proposed FGSPCA benefits from the grouping effect compared with methods that
lack it.
Comment: 21 pages, 5 figures, 2 tables
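The distinction between feature selection (loadings driven to zero) and feature grouping (loadings tied to a shared value) can be made concrete with a toy numpy sketch. This is purely illustrative, not the FGSPCA algorithm itself, which solves a non-convex program via difference-of-convex, augmented Lagrange and coordinate descent methods:

```python
import numpy as np

def soft_threshold(v, lam):
    """Lasso proximal operator: feature selection, small loadings become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def group_loadings(v, tol):
    """Toy feature grouping: loadings within `tol` of each other are clustered
    and replaced by the cluster mean, so a group shares a single coefficient
    (the group pinned at exactly zero recovers feature selection)."""
    out = v.copy()
    order = np.argsort(v)
    start = 0
    for i in range(1, len(v) + 1):
        if i == len(v) or v[order[i]] - v[order[i - 1]] > tol:
            idx = order[start:i]
            out[idx] = v[idx].mean()
            start = i
    return out

v = np.array([0.02, 0.51, 0.49, -0.03, 0.50, 0.0])
sparse = soft_threshold(v, 0.05)     # selection only
grouped = group_loadings(sparse, 0.05)  # selection + grouping
```

In the toy example, soft-thresholding zeroes the small loadings, and the grouping step then ties the three nearly equal loadings to one shared value.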
Structured Semidefinite Programming for Recovering Structured Preconditioners
We develop a general framework for finding approximately optimal
preconditioners for solving linear systems. Leveraging this framework we obtain
improved runtimes for fundamental preconditioning and linear system solving
problems, including the following. We give an algorithm which, given positive
definite $K \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(K)$
nonzero entries, computes an $\epsilon$-optimal
diagonal preconditioner in time
$\widetilde{O}(\mathrm{nnz}(K) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$,
where $\kappa^\star$ is the
optimal condition number of the rescaled matrix. We give an algorithm which,
given $M \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse
of a graph Laplacian matrix or a constant spectral approximation of one, solves
linear systems in $M$ in $\widetilde{O}(d^2)$ time. Our diagonal
preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$
attained by general-purpose semidefinite programming, and our solvers improve
state-of-the-art runtimes of $\Omega(d^{\omega})$, where $\omega > 2.3$ is the
current matrix multiplication constant. We attain our results via new
algorithms for a class of semidefinite programs (SDPs) we call
matrix-dictionary approximation SDPs, which we leverage to solve an associated
problem we call matrix-dictionary recovery.
Comment: Merge of arXiv:1812.06295 and arXiv:2008.0172
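For intuition about what a good diagonal preconditioner buys, the sketch below applies classical Jacobi (diagonal) rescaling to a badly scaled positive definite matrix and compares condition numbers. This is the textbook baseline, not the paper's SDP-based algorithm, and the matrix sizes and scale ranges are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Positive definite K whose rows/columns have badly mismatched scales.
B = rng.standard_normal((40, 40))
A = B @ B.T + 40 * np.eye(40)                      # well-conditioned SPD core
S = np.diag(10.0 ** rng.uniform(-3, 3, size=40))   # scale distortion
K = S @ A @ S

# Jacobi rescaling: W = diag(K)^{-1/2}; precondition by forming W K W.
W = np.diag(1.0 / np.sqrt(np.diag(K)))
K_scaled = W @ K @ W

kappa = np.linalg.cond(K)
kappa_scaled = np.linalg.cond(K_scaled)  # orders of magnitude smaller
```

Because the distortion here is itself diagonal, diagonal rescaling removes it almost entirely; the paper's contribution is computing near-optimal rescalings efficiently in the general case.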
Compositionality, stability and robustness in probabilistic machine learning
Probability theory plays an integral part in the field of machine learning. Its use has been advocated by many [MacKay, 2002; Jaynes, 2003], as it allows for the quantification of uncertainty and the incorporation of prior knowledge by simply applying the rules of probability [Kolmogorov, 1950]. While probabilistic machine learning was originally restricted to simple models, the advent of new computational technologies, such as automatic differentiation, and advances in approximate inference, such as Variational Inference [Blei et al., 2017], have made it viable in complex settings. Despite this progress, there remain many challenges to its application to real-world tasks. Among those are questions about the ability of probabilistic models to capture complex tasks and about their reliability, both during training and in the face of unexpected data perturbations. These challenges can be addressed by examining three properties of such models: compositionality, stability and robustness. Hence, this thesis explores these three key properties and their application to probabilistic models, while validating their importance on a range of applications.
The first contribution in this thesis studies compositionality. Compositionality enables the construction of complex and expressive probabilistic models from simple components. This increases the types of phenomena that one can model and provides the modeller with a wide array of modelling options. This thesis examines this property through the lens of Gaussian processes [Rasmussen and Williams, 2006]. It proposes a generic compositional Gaussian process model to address the problem of multi-task learning in the non-linear setting.
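The compositionality of Gaussian process priors ultimately rests on kernel algebra: sums and products of valid kernels are again valid kernels, so complex covariance structures can be assembled from simple parts. A generic numpy sketch (the kernels and hyperparameters below are illustrative, not the thesis's multi-task model):

```python
import numpy as np

def rbf(x1, x2, ell=1.0):
    """Squared-exponential kernel matrix between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def periodic(x1, x2, period=1.0, ell=1.0):
    """Periodic kernel capturing repeating structure with the given period."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell**2)

# Products and sums of kernels are kernels, so valid covariances compose:
# locally periodic structure plus small-scale variation.
x = np.linspace(0, 4, 50)
K = rbf(x, x, ell=2.0) * periodic(x, x, period=1.0) + 0.1 * rbf(x, x, ell=0.3)
# K is a symmetric positive semidefinite matrix, usable as a GP prior covariance.
```

Each composite kernel encodes a different modelling hypothesis, which is what gives the modeller the "wide array of modelling options" described above.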
Additionally, this thesis contributes two methods addressing the issue of stability. Stability determines the reliability of inference algorithms in the presence of noise. More stable training procedures lead to faster, more reliable inferences, especially for complex models. The two proposed methods aim at stabilising stochastic gradient estimation in Variational Inference using the method of control variates [Owen, 2013].
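The control-variate technique referenced here can be illustrated on a generic Monte Carlo mean estimate: subtract a correlated statistic whose expectation is known, scaled by the optimal coefficient, to reduce variance without changing the expectation. A textbook sketch with an assumed integrand and control variate, not the thesis's gradient estimators:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)

# Estimate E[f(x)] for f(x) = x^2 + x, using h(x) = x as control variate,
# since E[h] = 0 is known in closed form under the standard normal.
f = x**2 + x
h = x
c = np.cov(f, h)[0, 1] / np.var(h)   # optimal coefficient: Cov(f, h) / Var(h)

plain = f.mean()                     # plain Monte Carlo estimate
cv = (f - c * h).mean()              # same expectation, lower variance
var_plain = f.var()
var_cv = (f - c * h).var()
```

Here the true value is E[x^2 + x] = 1, and the control variate removes the variance contributed by the linear term, exactly the mechanism used to stabilise stochastic gradients in Variational Inference.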
Finally, the last contribution of this thesis considers robustness. Robust machine learning methods are unaffected by unaccounted-for phenomena in the data. This makes such methods essential when deploying machine learning on real-world datasets. This thesis examines the problem of robust inference in sequential probabilistic models by combining the ideas of Generalised Bayesian Inference [Bissiri et al., 2016] and Sequential Monte Carlo sampling [Doucet and Johansen, 2011].
Applications of Stochastic Gradient Descent to Nonnegative Matrix Factorization
We consider the application of stochastic gradient descent (SGD) to the nonnegative matrix factorization (NMF) problem and the unconstrained low-rank matrix factorization problem. While the literature on the SGD algorithm is rich, the application of this specific algorithm to matrix factorization problems is an unexplored area. We develop a series of results for the unconstrained problem, beginning with an analysis of standard gradient descent with a known zero-loss solution, and culminating with results for SGD in the general case where no zero-loss solution is assumed. We show that, with initialization close to a minimizer, linear-rate convergence guarantees hold.
We explore these results further with numerical experiments, and examine how the matrix factorization solutions found by SGD can be used as machine learning classifiers in two specific applications. In the first application, handwritten digit recognition, we show that our approach produces classification performance competitive with existing matrix factorization algorithms. In the second application, document topic classification, we examine how well SGD can recover an unknown words-to-topics matrix when the topics-to-document matrix is generated using the Latent Dirichlet Allocation model. This approach allows us to simulate two regimes for SGD: a fixed-sample regime where a large set of data is iterated over to train the model, and a generated-sample regime where a new data point is generated at each training iteration. In both regimes, we show that SGD can be an effective tool for recovering the hidden words-to-topics matrix. We conclude with some suggestions for further expansion of this work.
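The basic setup can be sketched as projected SGD on randomly sampled matrix entries, with a projection onto the nonnegative orthant enforcing the NMF constraint. The step count, learning rate, and problem sizes below are illustrative assumptions, and this sketch does not reproduce the thesis's exact algorithm or analysis:

```python
import numpy as np

def sgd_nmf(X, r, steps=300_000, lr=0.01, seed=0):
    """Factor X ≈ W @ H with W, H >= 0 by SGD on single sampled entries,
    projecting each update onto the nonnegative orthant."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.uniform(0.1, 1.0, (m, r))
    H = rng.uniform(0.1, 1.0, (r, n))
    for _ in range(steps):
        i, j = rng.integers(m), rng.integers(n)
        err = W[i] @ H[:, j] - X[i, j]        # residual on the sampled entry
        gW = err * H[:, j]                    # gradient w.r.t. row W[i]
        gH = err * W[i]                       # gradient w.r.t. column H[:, j]
        W[i] = np.maximum(W[i] - lr * gW, 0.0)      # projected step
        H[:, j] = np.maximum(H[:, j] - lr * gH, 0.0)
    return W, H

# Usage: recover a rank-2 nonnegative matrix (a zero-loss solution exists).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (20, 2)) @ rng.uniform(0, 1, (2, 30))
W, H = sgd_nmf(X, 2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

When a zero-loss solution exists, as in this synthetic example, the per-entry gradient noise vanishes at the optimum, which is the regime where the thesis's linear-rate guarantees apply.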
Quantum Machine Learning For Classical Data
In this dissertation, we study the intersection of quantum computing and supervised machine learning algorithms, which means that we investigate quantum algorithms for supervised machine learning that operate on classical data. This area of research falls under the umbrella of quantum machine learning, a field of computer science which has recently received wide attention. In particular, we investigate to what extent quantum computers can be used to accelerate supervised machine learning algorithms. The aim is to develop a clear understanding of the promises and limitations of the current state of the art of quantum algorithms for supervised machine learning, but also to define directions for future research in this exciting field. We start by looking at supervised quantum machine learning (QML) algorithms through the lens of statistical learning theory. In this framework, we derive novel bounds on the computational complexities of a large set of supervised QML algorithms under the requirement of optimal learning rates. Next, we give a new bound for Hamiltonian simulation of dense Hamiltonians, a major subroutine of most known supervised QML algorithms, and then derive a classical algorithm with nearly the same complexity. We then draw parallels to recent ‘quantum-inspired’ results and explain the implications of these results for quantum machine learning applications. Looking for areas which might bear larger advantages for QML algorithms, we finally propose a novel algorithm for Quantum Boltzmann machines, and argue that quantum algorithms for quantum data are one of the most promising applications for QML, with potentially exponential advantage over classical approaches.