Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

Authors: Gower Robert M., Hanzely Filip, Richtárik Peter, Stich Sebastian
Publication date: 12/02/2018

We present the first accelerated randomized algorithm for solving linear systems in Euclidean spaces. One essential problem of this type is the matrix inversion problem. In particular, our algorithm can be specialized to invert positive definite matrices in such a way that all iterates (approximate solutions) generated by the algorithm are positive definite matrices themselves. This opens the way for many applications in the field of optimization and machine learning. As an application of our general theory, we develop the first accelerated (deterministic and stochastic) quasi-Newton updates. Our updates lead to provably more aggressive approximations of the inverse Hessian, and lead to speed-ups over classical non-accelerated rules in numerical experiments. Experiments with empirical risk minimization show that our rules can accelerate training of machine learning models.

Comment: 37 pages, 32 figures, 3 algorithms

arXiv.org e-Print Archive

Data and Computation Efficient Meta-Learning

Author: Bronskill John
Publication venue: University of Cambridge
Publication date: 14/07/2020

In order to make predictions with high accuracy, conventional deep learning systems require large training datasets consisting of thousands or millions of examples and long training times measured in hours or days, consuming high levels of electricity with a negative impact on our environment. It is desirable to have machine learning systems that can emulate human behavior such that they can quickly learn new concepts from only a few examples. This is especially true if we need to quickly customize or personalize machine learning models to specific scenarios where it would be impractical to acquire a large amount of training data and where a mobile device is the means for computation. We define a data efficient machine learning system to be one that can learn a new concept from only a few examples (or shots) and a computation efficient machine learning system to be one that can learn a new concept rapidly without retraining on an everyday computing device such as a smart phone. In this work, we design, develop, analyze, and extend the theory of machine learning systems that are both data efficient and computation efficient. We present systems that are trained using multiple tasks such that they "learn how to learn" to solve new tasks from only a few examples. These systems can efficiently solve new, unseen tasks drawn from a broad range of data distributions, in both the low and high data regimes, without the need for costly retraining. Adapting to a new task requires only a forward pass of the example task data through the trained network, making the learning of new tasks possible on mobile devices. In particular, we focus on few-shot image classification systems, i.e. machine learning systems that can distinguish between numerous classes of objects depicted in digital images given only a few examples of each class of object to learn from.
To accomplish this, we first develop ML-PIP, a general framework for Meta-Learning approximate Probabilistic Inference for Prediction. ML-PIP extends existing probabilistic interpretations of meta-learning to cover a broad class of methods. We then introduce Versa, an instance of the framework employing a fast, flexible and versatile amortization network that takes few-shot learning datasets as inputs, with arbitrary numbers of training examples, and outputs a distribution over task-specific parameters in a single forward pass of the network. We evaluate Versa on benchmark datasets, where, at the time, the method achieved state-of-the-art results when compared to meta-learning approaches using similar training regimes and feature extractor capacity. Next, we build on Versa and add a second amortized network to adapt key parameters in the feature extractor to the current task. To accomplish this, we introduce CNAPs, a conditional neural process based approach to multi-task classification. We demonstrate that, at the time, CNAPs achieved state-of-the-art results on the challenging Meta-Dataset benchmark, indicating high-quality transfer learning. Timing experiments reveal that CNAPs is computationally efficient when adapting to an unseen task as it does not involve gradient backpropagation computations. We show that trained models are immediately deployable to continual learning and active learning where they can outperform existing approaches that do not leverage transfer learning. Finally, we investigate the effects of different methods of batch normalization on meta-learning systems. Batch normalization has become an essential component of deep learning systems as it significantly accelerates the training of neural networks by allowing the use of higher learning rates and decreasing the sensitivity to network initialization.
We show that the hierarchical nature of the meta-learning setting presents several challenges that can render conventional batch normalization ineffective. We evaluate a range of approaches to batch normalization for few-shot learning scenarios, and develop a novel approach that we call TaskNorm. Experiments demonstrate that the choice of batch normalization has a dramatic effect on both classification accuracy and training time for both gradient-based and gradient-free meta-learning approaches, and that TaskNorm consistently improves performance.

Stability of machine learning algorithms

Author: Sun Wei
Publication venue: Purdue University (bepress)
Publication date: 01/01/2015

In the literature, predictive accuracy is often the primary criterion for evaluating a learning algorithm. In this thesis, I will introduce novel concepts of stability into the machine learning community. A learning algorithm is said to be stable if it produces consistent predictions with respect to small perturbations of the training samples. Stability is an important aspect of a learning procedure because unstable predictions can potentially reduce users' trust in the system and also harm the reproducibility of scientific conclusions. As a prototypical example, stability of the classification procedure will be discussed extensively. In particular, I will present two new concepts of classification stability.
The first one is the decision boundary instability (DBI), which measures the variability of linear decision boundaries generated from homogeneous training samples. Combining DBI with the generalization error (GE), we propose a two-stage algorithm for selecting the most accurate and stable classifier. The proposed classifier selection method introduces statistical inference thinking into the machine learning community. Our selection method is shown to be consistent in the sense that the optimal classifier simultaneously achieves the minimal GE and the minimal DBI. Various simulations and real examples further demonstrate the superiority of our method over several alternative approaches.
The second one is the classification instability (CIS). CIS is a general measure of stability and generalizes DBI to nonlinear classifiers. This allows us to establish a sharp convergence rate of CIS for general plug-in classifiers under a low-noise condition. As one of the simplest plug-in classifiers, the nearest neighbor classifier is extensively studied. Motivated by an asymptotic expansion formula of the CIS of the weighted nearest neighbor classifier, we propose a new classifier called the stabilized nearest neighbor (SNN) classifier.
Our theoretical developments further push the frontier of statistical theory in machine learning. In particular, we prove that SNN attains the minimax optimal convergence rate in the risk, and the established sharp convergence rate in CIS. Extensive simulation and real experiments demonstrate that SNN achieves a considerable improvement in stability over existing classifiers with no sacrifice of predictive accuracy.
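
The stability notion in the abstract above (consistent predictions under perturbation of the training sample) can be illustrated with a small simulation. The sketch below is not the thesis's estimator or the SNN classifier: it assumes a synthetic one-dimensional two-class Gaussian problem, a plain k-nearest-neighbor vote, and uses the disagreement rate between classifiers trained on two independent samples as a rough proxy for instability. All function names and parameters are hypothetical.

```python
import random


def sample(n, rng):
    """Draw n labeled points (assumed toy data): class y in {0, 1},
    feature x ~ Normal(mean=-1 or +1, sd=1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(2 * y - 1, 1.0)
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in nearest)
    return 1 if 2 * votes > k else 0


def disagreement(k, n=200, n_test=300, reps=10, seed=0):
    """Average fraction of test points on which two k-NN classifiers,
    trained on independent samples of size n, predict different labels.
    This is an illustrative proxy for instability, not the thesis's CIS."""
    rng = random.Random(seed)
    test = [x for x, _ in sample(n_test, rng)]
    total = 0.0
    for _ in range(reps):
        a, b = sample(n, rng), sample(n, rng)
        diff = sum(knn_predict(a, x, k) != knn_predict(b, x, k) for x in test)
        total += diff / n_test
    return total / reps


if __name__ == "__main__":
    for k in (1, 5, 25):
        print(k, round(disagreement(k), 3))
```

In this toy setting, larger k averages over more neighbors, so the two trained classifiers tend to disagree on fewer test points; the numbers depend on the assumed data and say nothing about the results reported in the thesis.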