Geometric approach to Fletcher's ideal penalty function
Original article can be found at: www.springerlink.com. Copyright Springer. [Originally produced as UH Technical Report 280, 1993]

In this note, we derive a geometric formulation of an ideal penalty function for equality-constrained problems. This differentiable penalty function requires no parameter estimation or adjustment, has numerical conditioning similar to that of the target function from which it is constructed, and has the desirable property that the strict second-order constrained minima of the target function are precisely those strict second-order unconstrained minima of the penalty function which satisfy the constraints. Such a penalty function can be used to establish termination properties for algorithms which avoid ill-conditioned steps. Numerical values for the penalty function and its derivatives can be calculated efficiently using automatic differentiation techniques.

Peer reviewed
Training feedforward neural networks using orthogonal iteration of the Hessian eigenvectors
Introduction
Training algorithms for Multilayer Perceptrons optimize the set of W weights and biases, w, so as to minimize an
error function, E, applied to a set of N training patterns. The well-known back-propagation algorithm combines an
efficient method of estimating the gradient of the error function in weight space, ∇E = g, with a simple gradient
descent procedure to adjust the weights, Δw = −ηg. More efficient algorithms keep the gradient estimation
procedure, but replace the update step with a faster non-linear optimization strategy [1].
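The plain gradient-descent update above can be sketched on a toy problem. The data, the sum-of-squares error, and all names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "network": X holds N training patterns, t the targets, and w the
# flattened weight vector (illustrative stand-ins for the text's quantities).
X = rng.normal(size=(8, 3))
t = rng.normal(size=(8, 1))
w = np.zeros((3, 1))

def error_and_gradient(w):
    """Sum-of-squares error E over the training set and its gradient g = dE/dw."""
    r = X @ w - t
    return 0.5 * float(r.T @ r), X.T @ r

eta = 0.05                     # learning rate (the eta in  delta-w = -eta * g)
E0, _ = error_and_gradient(w)
for _ in range(200):
    _, g = error_and_gradient(w)
    w -= eta * g               # the simple gradient-descent update
E_final, _ = error_and_gradient(w)
```

Repeated small steps along −g steadily reduce E; the more efficient algorithms discussed next keep the gradient computation but replace this update rule.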
Efficient non-linear optimization algorithms are based upon second-order approximation [2]. When sufficiently
close to a minimum, the error surface is approximately quadratic, its shape determined by the Hessian matrix, H.
Bishop [1] presents a detailed discussion of the properties and significance of the Hessian matrix. In principle, if
sufficiently close to a minimum, it is possible to move directly to the minimum using the Newton step, −H⁻¹g.
In practice, the Newton step is not used, as H⁻¹ is very expensive to evaluate; in addition, when not sufficiently close
to a minimum, the Newton step may be disastrously poor. Second-order algorithms either build
up an approximation to H⁻¹, or construct a search strategy that implicitly exploits its structure without evaluating it;
they also either take precautions to prevent steps that lead to a deterioration in error, or explicitly reject such steps.
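The guarded Newton step described above can be sketched generically: a damped Newton iteration on a standard two-dimensional test surface (Rosenbrock, standing in for the error function E), which halves any step that would worsen the error rather than accept it. This is a minimal illustration, not any specific published algorithm:

```python
import numpy as np

def E(w):
    """Rosenbrock test surface, a stand-in for a network error function."""
    return (1 - w[0])**2 + 100 * (w[1] - w[0]**2)**2

def grad(w):
    return np.array([
        -2 * (1 - w[0]) - 400 * w[0] * (w[1] - w[0]**2),
        200 * (w[1] - w[0]**2),
    ])

def hess(w):
    return np.array([
        [2 - 400 * w[1] + 1200 * w[0]**2, -400 * w[0]],
        [-400 * w[0], 200.0],
    ])

w = np.array([-1.2, 1.0])
for _ in range(50):
    g = grad(w)
    p = -np.linalg.solve(hess(w), g)   # Newton direction -H^{-1} g
    alpha = 1.0
    while E(w + alpha * p) >= E(w) and alpha > 1e-8:
        alpha *= 0.5                    # reject (shrink) steps that worsen E
    w = w + alpha * p
```

The step-halving loop is the "precaution" mentioned above: far from the minimum the raw Newton step is often rejected, while near the minimum full steps are accepted and convergence is rapid.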
In applying non-linear optimization algorithms to neural networks, a key consideration is the high-dimensional
nature of the search space. Neural networks with thousands of weights are not uncommon. Some algorithms have
O(W²) or O(W³) memory or execution-time requirements, and are hence impracticable in such cases. It is desirable to identify
algorithms that have limited memory requirements, particularly algorithms where one may trade memory usage
against convergence speed.
This paper describes a new training algorithm that has scalable memory requirements, which may range from O(W)
to O(W²), although in practice the useful range is limited to the lower complexity levels. The algorithm is based upon a
novel iterative estimation of the principal eigen-subspace of the Hessian, together with a quadratic step estimation
procedure.
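Generic orthogonal (subspace) iteration conveys the flavour of such a principal eigen-subspace estimate. This sketch is not the paper's exact procedure: a synthetic symmetric matrix H with a known spectrum stands in for the Hessian, and in a real network hv() would be a Hessian-vector product (e.g. Pearlmutter's R-operator), so H itself would never be stored:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic W x W "Hessian" with known eigenvalues 1..20 (illustrative only).
W = 20
d = np.arange(1.0, W + 1.0)
Q, _ = np.linalg.qr(rng.normal(size=(W, W)))
H = Q @ np.diag(d) @ Q.T

def hv(V):
    """k Hessian-vector products; in a network this avoids forming H."""
    return H @ V

def orthogonal_iteration(hv, dim, k, iters=300):
    """Estimate the k principal eigenpairs of H: multiply, then re-orthogonalise."""
    V, _ = np.linalg.qr(rng.normal(size=(dim, k)))
    for _ in range(iters):
        V, _ = np.linalg.qr(hv(V))        # QR keeps the basis orthonormal
    lam = np.diag(V.T @ hv(V))            # Rayleigh quotients on the basis
    return V, lam

V, lam = orthogonal_iteration(hv, W, k=3)
```

Storage is O(Wk): choosing the subspace dimension k trades memory against how much of the Hessian's curvature the quadratic step can exploit.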
It is shown that the new algorithm has convergence time comparable to conjugate gradient descent, and may be
preferable if early stopping is used as it converges more quickly during the initial phases.
Section 2 overviews the principles of second-order training algorithms. Section 3 introduces the new algorithm.
Section 4 discusses experiments confirming the algorithm's performance; Section 5 concludes the paper.
Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines
This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for
training Boltzmann Machines. Similar in spirit to the Hessian-Free method of
Martens [8], our algorithm belongs to the family of truncated Newton methods
and exploits an efficient matrix-vector product to avoid explicitly storing
the natural gradient metric. This metric is shown to be the expected second
derivative of the log-partition function (under the model distribution), or
equivalently, the variance of the vector of partial derivatives of the energy
function. We evaluate our method on the task of joint-training a 3-layer Deep
Boltzmann Machine and show that MFNG does indeed have faster per-epoch
convergence compared to Stochastic Maximum Likelihood with centering, though
wall-clock performance is currently not competitive.
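The key trick, a metric-vector product without materialising the metric, can be sketched as follows. With the metric equal to the covariance of sampled per-parameter gradient vectors, its product with a vector v needs only two matrix-vector products against the S × D sample matrix, never the D × D metric itself; this is exactly what a truncated-Newton / conjugate-gradient inner loop consumes. The data and names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for S sampled energy-gradient vectors over D parameters.
D, S = 5, 400
G = rng.normal(size=(S, D))

def metric_vector_product(G, v):
    """Return C @ v, where C = Cov(G), without forming the D x D matrix C."""
    Gc = G - G.mean(axis=0)        # centre the samples
    return Gc.T @ (Gc @ v) / (G.shape[0] - 1)

v = rng.normal(size=D)
mv = metric_vector_product(G, v)
```

The cost per product is O(SD) rather than the O(D²) needed to store and apply the metric explicitly.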
A comparison of linear and non-linear calibrations for speaker recognition
In recent work on both generative and discriminative score to
log-likelihood-ratio calibration, it was shown that linear transforms give good
accuracy only for a limited range of operating points. Moreover, these methods
required tailoring of the calibration training objective functions in order to
target the desired region of best accuracy. Here, we generalize the linear
recipes to non-linear ones. We experiment with a non-linear, non-parametric,
discriminative PAV solution, as well as parametric, generative,
maximum-likelihood solutions that use Gaussian, Student's T and
normal-inverse-Gaussian score distributions. Experiments on NIST SRE'12 scores
suggest that the non-linear methods provide wider ranges of optimal accuracy
and can be trained without having to resort to objective function tailoring.

Comment: accepted for Odyssey 2014: The Speaker and Language Recognition Workshop.
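The non-parametric discriminative solution mentioned above rests on the pool-adjacent-violators (PAV) algorithm for isotonic regression. The following is a minimal generic sketch of PAV itself, not the paper's full calibration recipe:

```python
# Pool-adjacent-violators: fit the closest non-decreasing sequence to y
# in the least-squares sense, by merging adjacent violating blocks.
def pav(y):
    """Return the non-decreasing sequence closest to y in least squares."""
    blocks = []                      # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

Applied to 0/1 labels sorted by score, e.g. pav([0, 1, 0, 1, 1]) gives [0.0, 0.5, 0.5, 1.0, 1.0]: a monotone, non-parametric score-to-probability map of the kind used for calibration.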
Theano: new features and speed improvements
Theano is a linear algebra compiler that optimizes a user's
symbolically-specified mathematical computations to produce efficient low-level
implementations. In this paper, we present new features and efficiency
improvements to Theano, and benchmarks demonstrating Theano's performance
relative to Torch7, a recently introduced machine learning library, and to
RNNLM, a C++ library targeted at recurrent neural networks.

Comment: Presented at the Deep Learning Workshop, NIPS 2012.
Symmetric complex-valued RBF receiver for multiple-antenna aided wireless systems
A nonlinear beamforming assisted detector is proposed for multiple-antenna-aided wireless systems employing complex-valued quadrature phase-shift-keying modulation. By exploiting the inherent symmetry of the optimal Bayesian detection solution, a novel complex-valued symmetric radial basis function (SRBF)-network-based detector is developed, which is capable of approaching the optimal Bayesian performance using channel-impaired training data. In the uplink case, adaptive nonlinear beamforming can be efficiently implemented by estimating the system's channel matrix based on the least squares channel estimate. Adaptive implementation of nonlinear beamforming in the downlink case, by contrast, is much more challenging, and we adopt a cluster-variation enhanced clustering algorithm to directly identify the SRBF center vectors required for realizing the optimal Bayesian detector. A simulation example is included to demonstrate the achievable performance improvement of the proposed adaptive nonlinear beamforming solution over the theoretical linear minimum bit error rate beamforming benchmark.