907 research outputs found

Semistochastic Quadratic Bound Methods

Authors: Aravkin Aleksandr Y.; Choromanska Anna; Jebara Tony; Kanevsky Dimitri
Publication venue
Publication date: 17/02/2014
Field of study

Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent Gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood inference based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and a linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets. Comment: 11 pages, 1 figure

arXiv.org e-Print Archive

CiteSeerX

Optimization Methods for Inverse Problems

Authors: Cui Tiangang; Roosta-Khorasani Farbod; Ye Nan
Publication venue
Publication date: 30/11/2017
Field of study

Optimization plays an important role in solving many inverse problems. Indeed, the task of inversion often either involves or is fully cast as a solution of an optimization problem. In this light, the non-linear, non-convex, and large-scale nature of many of these inversions gives rise to some very challenging optimization problems. The inverse problem community has long been developing various techniques for solving such optimization tasks. However, other, seemingly disjoint communities, such as that of machine learning, have developed, almost in parallel, interesting alternative methods which might have stayed under the radar of the inverse problem community. In this survey, we aim to change that. In doing so, we first discuss current state-of-the-art optimization methods widely used in inverse problems. We then survey recent related advances in addressing similar challenges in problems faced by the machine learning community, and discuss their potential advantages for solving inverse problems. By highlighting the similarities among the optimization challenges faced by the inverse problem and the machine learning communities, we hope that this survey can serve as a bridge in bringing together these two communities and encourage cross-fertilization of ideas. Comment: 13 pages

arXiv.org e-Print Archive

University of Queensland eSpace

New quasi-Newton optimization methods for machine learning

Author: Yu Jin
Publication venue
Publication date: 21/11/2018
Field of study

The Australian National University

Optimisation Methods For Training Deep Neural Networks in Speech Recognition

Author: Haider Mustafa Adnan
Publication venue: University of Cambridge
Publication date: 13/03/2019
Field of study

Automatic Speech Recognition (ASR) is an example of a sequence-to-sequence level classification task where, given an acoustic waveform, the goal is to produce the correct word-level hypotheses. In machine learning, a classification problem such as ASR is solved in two stages: an inference stage that models the uncertainty associated with the choice of hypothesis given the acoustic waveform using a mathematical model, and a decision stage which employs the inference model in conjunction with decision theory to make optimal class assignments. With the advent of careful network initialisation and GPU computing, hybrid Hidden Markov Models (HMMs) augmented with Deep Neural Networks (DNNs) have been shown to outperform traditional HMMs using Gaussian Mixture Models (GMMs) in solving the inference problem for ASR. In comparison to GMMs, DNNs possess a better capability to model the underlying non-linear data manifold due to their deep and complex structure. While the structure of such models gives rich modelling capability, it also creates complex dependencies between the parameters which can make learning difficult via first-order stochastic gradient descent (SGD). The task of finding the best procedure to train DNNs continues to be an active area of research and has been made even more challenging by the availability of ever more training data. This thesis focuses on designing better optimisation approaches to train hybrid HMM-DNN models using a sequence-level discriminative criterion, which is a natural loss function that preserves the sequential ordering of frames within a spoken utterance. The thesis presents an implementation of the second-order Hessian Free (HF) optimisation method, and shows how the method can be made efficient through appropriate modifications to the Conjugate Gradient algorithm. To achieve better convergence than SGD, this work explores the Natural Gradient (NG) method to train DNNs with discriminative sequence training.
In the DNN literature, the method has been applied to train models for the Maximum Likelihood objective criterion. A novel contribution of this thesis is to extend this approach to the domain of Minimum Bayes Risk objective functions for discriminative sequence training. With sigmoid models trained on a 50hr and 200hr training set from the Multi-Genre Broadcast 1 (MGB1) transcription task, the NG method applied in an HF-style optimisation framework is shown to achieve better Word Error Rate (WER) reductions on the MGB1 development set than SGD from sequence training. This thesis also addresses the particular issue of overfitting between the training criterion and WER, which primarily arises during sequence training of DNN models that use Rectified Linear Units (ReLUs) as activation functions. It is shown how, by scaling with the Gauss-Newton matrix, the HF method, unlike other approaches, can overcome this issue. Seeing that different optimisers work best with different models, it is attractive to have a consistent optimisation framework that is agnostic to the choice of activation function. To address the issue, this thesis develops the geometry of the underlying function space captured by different realisations of DNN model parameters, and presents the design considerations for an optimisation algorithm to be well defined on this space. Building on this analysis, a novel optimisation technique called NGHF is presented that uses both the direction of steepest descent on a probabilistic manifold and local curvature information to effectively probe the error surface. The basis of the method relies on an alternative derivation of Taylor’s theorem using the concepts of manifolds, tangent vectors and directional derivatives from the perspective of Information Geometry.
Apart from being well defined on the function space, when framed within an HF-style optimisation framework, the method of NGHF is shown to achieve the greatest WER reductions from sequence training on the MGB1 development set with both sigmoid and ReLU based models trained on the 200hr MGB1 training set. The evaluation of the above optimisation methods in training different DNN model architectures is also presented. IDB Cambridge International Scholarship

Apollo (Cambridge)

Information metrics for localization and mapping

Author: Vallvé Navarro Joan
Publication venue
Publication date: 27/02/2019
Field of study

Decades of research have made possible the existence of several autonomous systems that successfully and efficiently navigate within a variety of environments under certain conditions. One core technology that has allowed this is simultaneous localization and mapping (SLAM), the process of building a representation of the environment while localizing the robot in it. State-of-the-art solutions to the SLAM problem still rely, however, on heuristic decisions and options set by the user. In this thesis we search for principled solutions to various aspects of the localization and mapping problem with the help of information metrics. One such aspect is the issue of scalability. In SLAM, the problem size grows indefinitely as the experiment goes on, increasing computational resource demands. To keep the problem tractable, we develop methods to build an approximation to the original network of constraints of the SLAM problem by reducing its size while maintaining its sparsity. In this thesis we propose three methods to build the topology of such an approximated network, and two methods to perform the approximation itself. In addition, SLAM is a passive application; that is, it does not drive the robot. The problem of driving the robot with the aim of both accurately localizing the robot and mapping the environment is called active SLAM. In this problem two normally opposing forces drive the robot, one towards new places to discover unknown regions and another to revisit previous configurations to improve localization. As opposed to heuristics, in this thesis we pose the problem as the joint minimization of both map and trajectory estimation uncertainties, and present four different active SLAM approaches based on an entropy-reduction formulation.
All methods presented in this thesis have been rigorously validated on both synthetic and real datasets.

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa