Geometric approach to Fletcher's ideal penalty function
Original article can be found at: www.springerlink.com. Copyright Springer. [Originally produced as UH Technical Report 280, 1993]

In this note, we derive a geometric formulation of an ideal penalty function for equality-constrained problems. This differentiable penalty function requires no parameter estimation or adjustment, has numerical conditioning similar to that of the target function from which it is constructed, and has the desirable property that the strict second-order constrained minima of the target function are precisely those strict second-order unconstrained minima of the penalty function which satisfy the constraints. Such a penalty function can be used to establish termination properties for algorithms which avoid ill-conditioned steps. Numerical values for the penalty function and its derivatives can be calculated efficiently using automatic differentiation techniques.

Peer reviewed
Training feedforward neural networks using orthogonal iteration of the Hessian eigenvectors
Introduction
Training algorithms for Multilayer Perceptrons optimize the set of W weights and biases, w, so as to minimize an
error function, E, applied to a set of N training patterns. The well-known back-propagation algorithm combines an
efficient method of estimating the gradient of the error function in weight space, ∇E = g, with a simple gradient
descent procedure to adjust the weights, Δw = −ηg. More efficient algorithms keep the gradient estimation
procedure, but replace the update step with a faster non-linear optimization strategy [1].
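The plain gradient-descent update above can be sketched on a toy problem. The data, the sum-of-squares error, and all names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "network": X holds N training patterns, t the targets, and w the
# flattened weight vector (illustrative stand-ins for the text's quantities).
X = rng.normal(size=(8, 3))
t = rng.normal(size=(8, 1))
w = np.zeros((3, 1))

def error_and_gradient(w):
    """Sum-of-squares error E over the training set and its gradient g = dE/dw."""
    r = X @ w - t
    return 0.5 * float(r.T @ r), X.T @ r

eta = 0.05                     # learning rate (the eta in  delta-w = -eta * g)
E0, _ = error_and_gradient(w)
for _ in range(200):
    _, g = error_and_gradient(w)
    w -= eta * g               # the simple gradient-descent update
E_final, _ = error_and_gradient(w)
```

Repeated small steps along −g steadily reduce E; the more efficient algorithms discussed next keep the gradient computation but replace this update rule.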
Efficient non-linear optimization algorithms are based upon second-order approximation [2]. When sufficiently
close to a minimum, the error surface is approximately quadratic, its shape determined by the Hessian matrix, H.
Bishop [1] presents a detailed discussion of the properties and significance of the Hessian matrix. In principle, if
sufficiently close to a minimum, it is possible to move directly to the minimum using the Newton step, −H⁻¹g.
In practice, the Newton step is not used, as H⁻¹ is very expensive to evaluate; in addition, when not sufficiently close
to a minimum, the Newton step may be disastrously poor. Second-order algorithms either build
up an approximation to H⁻¹, or construct a search strategy that implicitly exploits its structure without evaluating it;
they also either take precautions to prevent steps that lead to a deterioration in error, or explicitly reject such steps.
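The guarded Newton step described above can be sketched generically: a damped Newton iteration on a standard two-dimensional test surface (Rosenbrock, standing in for the error function E), which halves any step that would worsen the error rather than accept it. This is a minimal illustration, not any specific published algorithm:

```python
import numpy as np

def E(w):
    """Rosenbrock test surface, a stand-in for a network error function."""
    return (1 - w[0])**2 + 100 * (w[1] - w[0]**2)**2

def grad(w):
    return np.array([
        -2 * (1 - w[0]) - 400 * w[0] * (w[1] - w[0]**2),
        200 * (w[1] - w[0]**2),
    ])

def hess(w):
    return np.array([
        [2 - 400 * w[1] + 1200 * w[0]**2, -400 * w[0]],
        [-400 * w[0], 200.0],
    ])

w = np.array([-1.2, 1.0])
for _ in range(50):
    g = grad(w)
    p = -np.linalg.solve(hess(w), g)   # Newton direction -H^{-1} g
    alpha = 1.0
    while E(w + alpha * p) >= E(w) and alpha > 1e-8:
        alpha *= 0.5                    # reject (shrink) steps that worsen E
    w = w + alpha * p
```

The step-halving loop is the "precaution" mentioned above: far from the minimum the raw Newton step is often rejected, while near the minimum full steps are accepted and convergence is rapid.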
In applying non-linear optimization algorithms to neural networks, a key consideration is the high-dimensional
nature of the search space. Neural networks with thousands of weights are not uncommon. Some algorithms have
O(W²) or O(W³) memory or execution-time requirements, and are hence impracticable in such cases. It is desirable to identify
algorithms that have limited memory requirements, particularly algorithms where one may trade memory usage
against convergence speed.
This paper describes a new training algorithm that has scalable memory requirements, which may range from O(W)
to O(W²), although in practice the useful range is limited to the lower complexity levels. The algorithm is based upon a
novel iterative estimation of the principal eigen-subspace of the Hessian, together with a quadratic step estimation
procedure.
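Generic orthogonal (subspace) iteration conveys the flavour of such a principal eigen-subspace estimate. This sketch is not the paper's exact procedure: a synthetic symmetric matrix H with a known spectrum stands in for the Hessian, and in a real network hv() would be a Hessian-vector product (e.g. Pearlmutter's R-operator), so H itself would never be stored:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic W x W "Hessian" with known eigenvalues 1..20 (illustrative only).
W = 20
d = np.arange(1.0, W + 1.0)
Q, _ = np.linalg.qr(rng.normal(size=(W, W)))
H = Q @ np.diag(d) @ Q.T

def hv(V):
    """k Hessian-vector products; in a network this avoids forming H."""
    return H @ V

def orthogonal_iteration(hv, dim, k, iters=300):
    """Estimate the k principal eigenpairs of H: multiply, then re-orthogonalise."""
    V, _ = np.linalg.qr(rng.normal(size=(dim, k)))
    for _ in range(iters):
        V, _ = np.linalg.qr(hv(V))        # QR keeps the basis orthonormal
    lam = np.diag(V.T @ hv(V))            # Rayleigh quotients on the basis
    return V, lam

V, lam = orthogonal_iteration(hv, W, k=3)
```

Storage is O(Wk): choosing the subspace dimension k trades memory against how much of the Hessian's curvature the quadratic step can exploit.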
It is shown that the new algorithm has convergence time comparable to conjugate gradient descent, and may be
preferable if early stopping is used as it converges more quickly during the initial phases.
Section 2 overviews the principles of second-order training algorithms. Section 3 introduces the new algorithm.
Section 4 discusses experiments confirming the algorithm's performance; Section 5 concludes the paper.
Metric-Free Natural Gradient for Joint-Training of Boltzmann Machines
This paper introduces the Metric-Free Natural Gradient (MFNG) algorithm for
training Boltzmann Machines. Similar in spirit to the Hessian-Free method of
Martens [8], our algorithm belongs to the family of truncated Newton methods
and exploits an efficient matrix-vector product to avoid explicitly storing
the natural gradient metric. This metric is shown to be the expected second
derivative of the log-partition function (under the model distribution), or
equivalently, the variance of the vector of partial derivatives of the energy
function. We evaluate our method on the task of joint-training a 3-layer Deep
Boltzmann Machine and show that MFNG does indeed have faster per-epoch
convergence compared to Stochastic Maximum Likelihood with centering, though
wall-clock performance is currently not competitive.
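The key trick, a metric-vector product without materialising the metric, can be sketched as follows. With the metric equal to the covariance of sampled per-parameter gradient vectors, its product with a vector v needs only two matrix-vector products against the S × D sample matrix, never the D × D metric itself; this is exactly what a truncated-Newton / conjugate-gradient inner loop consumes. The data and names below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for S sampled energy-gradient vectors over D parameters.
D, S = 5, 400
G = rng.normal(size=(S, D))

def metric_vector_product(G, v):
    """Return C @ v, where C = Cov(G), without forming the D x D matrix C."""
    Gc = G - G.mean(axis=0)        # centre the samples
    return Gc.T @ (Gc @ v) / (G.shape[0] - 1)

v = rng.normal(size=D)
mv = metric_vector_product(G, v)
```

The cost per product is O(SD) rather than the O(D²) needed to store and apply the metric explicitly.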
A comparison of linear and non-linear calibrations for speaker recognition
In recent work on both generative and discriminative score to
log-likelihood-ratio calibration, it was shown that linear transforms give good
accuracy only for a limited range of operating points. Moreover, these methods
required tailoring of the calibration training objective functions in order to
target the desired region of best accuracy. Here, we generalize the linear
recipes to non-linear ones. We experiment with a non-linear, non-parametric,
discriminative PAV solution, as well as parametric, generative,
maximum-likelihood solutions that use Gaussian, Student's T and
normal-inverse-Gaussian score distributions. Experiments on NIST SRE'12 scores
suggest that the non-linear methods provide wider ranges of optimal accuracy
and can be trained without having to resort to objective function tailoring.

Comment: accepted for Odyssey 2014: The Speaker and Language Recognition Workshop.
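The non-parametric discriminative solution mentioned above rests on the pool-adjacent-violators (PAV) algorithm for isotonic regression. The following is a minimal generic sketch of PAV itself, not the paper's full calibration recipe:

```python
# Pool-adjacent-violators: fit the closest non-decreasing sequence to y
# in the least-squares sense, by merging adjacent violating blocks.
def pav(y):
    """Return the non-decreasing sequence closest to y in least squares."""
    blocks = []                      # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out
```

Applied to 0/1 labels sorted by score, e.g. pav([0, 1, 0, 1, 1]) gives [0.0, 0.5, 0.5, 1.0, 1.0]: a monotone, non-parametric score-to-probability map of the kind used for calibration.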
Theano: new features and speed improvements
Theano is a linear algebra compiler that optimizes a user's
symbolically-specified mathematical computations to produce efficient low-level
implementations. In this paper, we present new features and efficiency
improvements to Theano, and benchmarks demonstrating Theano's performance
relative to Torch7, a recently introduced machine learning library, and to
RNNLM, a C++ library targeted at recurrent neural networks.

Comment: Presented at the Deep Learning Workshop, NIPS 2012.
Symmetric complex-valued RBF receiver for multiple-antenna aided wireless systems
A nonlinear beamforming assisted detector is proposed for multiple-antenna-aided wireless systems employing complex-valued quadrature phase-shift-keying modulation. By exploiting the inherent symmetry of the optimal Bayesian detection solution, a novel complex-valued symmetric radial basis function (SRBF)-network-based detector is developed, which is capable of approaching the optimal Bayesian performance using channel-impaired training data. In the uplink case, adaptive nonlinear beamforming can be efficiently implemented by estimating the system's channel matrix based on the least squares channel estimate. Adaptive implementation of nonlinear beamforming in the downlink case, by contrast, is much more challenging, and we adopt a cluster-variation enhanced clustering algorithm to directly identify the SRBF center vectors required for realizing the optimal Bayesian detector. A simulation example is included to demonstrate the achievable performance improvement of the proposed adaptive nonlinear beamforming solution over the theoretical linear minimum bit error rate beamforming benchmark.