A feed forward neural network approach for matrix computations
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A new neural network approach for performing matrix computations is presented. The idea of this approach is to construct a feed-forward neural network (FNN) and then train it by matching a desired set of patterns; the solution of the problem is the converged weight of the FNN. Accordingly, unlike conventional FNN research, which concentrates on the external properties (mappings) of the network, this study concentrates on its internal properties (weights). The present network is linear and its weights are usually strongly constrained; hence, a complicated overlapped network needs to be constructed. It should be noted, however, that the present approach depends highly on the training algorithm of the FNN. Unfortunately, the available training methods, such as the original back-propagation (BP) algorithm, encounter many deficiencies when applied to matrix algebra problems, e.g., slow convergence due to an improper choice of learning rate (LR). Thus, this study focuses on the development of new, efficient, and accurate FNN training methods. One improvement suggested to alleviate the problem of LR choice is the use of a line search with the steepest descent method, namely bracketing with the golden section method; this provides an optimal LR as training progresses. Another improvement proposed in this study is the use of conjugate gradient (CG) methods to speed up the training process of the neural network. The computational feasibility of these methods is assessed on two matrix problems: the LU-decomposition of both band and square ill-conditioned unsymmetric matrices, and the inversion of square ill-conditioned unsymmetric matrices. Two performance indexes have been considered: learning speed and convergence accuracy.
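The steepest-descent-with-line-search idea can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the loss, the test matrix, and the bracket [0, 1] are assumptions. A linear network's weight matrix W is trained so that A @ W approaches the identity (i.e. W converges to A's inverse), with the learning rate chosen at each step by golden-section search along the steepest-descent direction.

```python
import numpy as np

PHI = (np.sqrt(5.0) - 1.0) / 2.0  # golden-ratio shrink factor ~0.618

def loss(W, A):
    # Squared-error loss: train W so that A @ W matches the identity.
    R = A @ W - np.eye(A.shape[0])
    return 0.5 * np.sum(R * R)

def golden_section_lr(W, A, D, lo=0.0, hi=1.0, tol=1e-6):
    """Golden-section search for the LR minimising loss(W + lr*D, A)."""
    a, b = lo, hi
    c = b - PHI * (b - a)
    d = a + PHI * (b - a)
    while b - a > tol:
        if loss(W + c * D, A) < loss(W + d * D, A):
            b = d
        else:
            a = c
        c = b - PHI * (b - a)
        d = a + PHI * (b - a)
    return 0.5 * (a + b)

def train_sdls(A, iters=200):
    n = A.shape[0]
    W = np.zeros((n, n))
    for _ in range(iters):
        G = A.T @ (A @ W - np.eye(n))  # gradient of the loss w.r.t. W
        D = -G                         # steepest-descent direction
        lr = golden_section_lr(W, A, D)
        W = W + lr * D
    return W

A = np.array([[4.0, 1.0], [2.0, 3.0]])  # arbitrary example matrix
W = train_sdls(A)
```

After training, `A @ W` is close to the identity, so `W` approximates `A`'s inverse.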
Extensive computer simulations have been carried out using the following training methods: the steepest descent with line search (SDLS) method, the conventional back-propagation (BP) algorithm, and conjugate gradient (CG) methods, specifically the Fletcher-Reeves (CGFR) and Polak-Ribière (CGPR) variants. Performance comparisons between these minimization methods have demonstrated that the CG training methods give better convergence accuracy and are by far superior with respect to learning time; they offer speed-ups of between 3 and 4 over SDLS, depending on the severity of the error goal chosen and the size of the problem. Furthermore, when Powell's restart criterion is used with the CG methods, the problem of wrong convergence directions usually encountered in pure CG learning is alleviated. In general, CG methods with restarts have shown the best performance of all the methods in training the FNN for LU-decomposition and matrix inversion. Consequently, it is concluded that CG methods are good candidates for training FNNs for matrix computations, in particular the Polak-Ribière conjugate gradient method with Powell's restart criterion.
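A hedged sketch of the CGPR-with-restarts scheme on the same kind of matrix-inversion training task: the loss, the test matrix, the 0.2 restart threshold, and the exact quadratic step (standing in for a general line search) are conventional assumptions, not details taken from the thesis.

```python
import numpy as np

def grad(W, A):
    # Gradient of the loss ||A @ W - I||^2 / 2 with respect to W.
    return A.T @ (A @ W - np.eye(A.shape[0]))

def exact_step(A, D, G):
    # Optimal step along D for this quadratic loss (in place of a line search).
    AD = A @ D
    return -np.sum(G * D) / np.sum(AD * AD)

def train_cgpr(A, iters=100):
    n = A.shape[0]
    W = np.zeros((n, n))
    G = grad(W, A)
    D = -G
    for _ in range(iters):
        if np.linalg.norm(G) < 1e-10:
            break
        alpha = exact_step(A, D, G)
        W = W + alpha * D
        G_new = grad(W, A)
        # Polak-Ribiere coefficient.
        beta = np.sum(G_new * (G_new - G)) / np.sum(G * G)
        # Powell's restart test: drop the old direction when successive
        # gradients are far from orthogonal.
        if abs(np.sum(G_new * G)) >= 0.2 * np.sum(G_new * G_new):
            beta = 0.0
        D = -G_new + beta * D
        G = G_new
    return W

A = np.array([[4.0, 1.0], [2.0, 3.0]])  # arbitrary example matrix
W = train_cgpr(A)
```

On a quadratic loss like this one, CG needs only as many steps as there are distinct Hessian eigenvalues, which is why it converges far faster than plain steepest descent.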
Efficacy of modified backpropagation and optimisation methods on a real world medical problem
A wide range of modifications to the backpropagation (BP) algorithm, motivated by heuristic arguments and optimisation theory, has been examined on a real-world medical signal classification problem. The method of choice depends both upon the nature of the learning task and whether one wants to optimise learning for speed or generalisation. It was found that, comparatively, standard BP was sufficiently fast and provided good generalisation when the task was to learn the training set within a given error tolerance. However, if the task was to find the global minimum, standard BP failed to do so within 100,000 iterations, whereas first-order methods which adapt the stepsize were as fast as, if not faster than, conjugate gradient and quasi-Newton methods. Second-order methods required the same amount of fine-tuning of their line search and restart parameters as the first-order methods did of theirs in order to achieve optimum performance.
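As one concrete instance of a "first-order method which adapts the stepsize", here is a sketch of the bold-driver heuristic on a toy least-squares problem. The growth/shrink factors (1.1 and 0.5) and the toy problem are illustrative assumptions, not details from the paper.

```python
import numpy as np

def bold_driver(grad_fn, loss_fn, w0, lr=0.01, iters=500):
    """Gradient descent with the bold-driver stepsize heuristic:
    grow the LR after a successful step; shrink it and reject the
    step when the loss would increase."""
    w = w0.copy()
    prev_loss = loss_fn(w)
    for _ in range(iters):
        cand = w - lr * grad_fn(w)
        cand_loss = loss_fn(cand)
        if cand_loss < prev_loss:
            w, prev_loss = cand, cand_loss
            lr *= 1.1   # reward: accelerate
        else:
            lr *= 0.5   # penalise: retry with a smaller step
    return w

# Toy quadratic: minimise ||X w - y||^2 / 2 with a known solution.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
loss_fn = lambda w: 0.5 * np.sum((X @ w - y) ** 2)
grad_fn = lambda w: X.T @ (X @ w - y)
w = bold_driver(grad_fn, loss_fn, np.zeros(3))
```

Because rejected steps leave the weights unchanged, the accepted loss sequence is monotonically decreasing, which is what makes this simple scheme robust to a badly chosen initial stepsize.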
On the induction of temporal structure by recurrent neural networks
Language acquisition is one of the core problems in artificial intelligence (AI) and it is generally accepted that any successful AI account of the mind will stand or fall depending on its ability to model human language. Simple Recurrent Networks (SRNs) are a class of so-called artificial neural networks that have a long history in language modelling via learning to predict the next word in a sentence. However, SRNs have also been shown to suffer from catastrophic forgetting, lack of syntactic systematicity and an inability to represent more than three levels of centre-embedding, due to the so-called 'vanishing gradients' problem. This problem is caused by the decay of past input information encoded within the error-gradients which vanish exponentially as additional input information is encountered and passed through the recurrent connections. That said, a number of architectural variations have been applied which may compensate for this issue, such as the Nonlinear Autoregressive Network with exogenous inputs (NARX) network and the multi-recurrent network (MRN). In addition to this, Echo State Networks (ESNs) are a relatively new class of recurrent neural network that do not suffer from the vanishing gradients problem and have been shown to exhibit state-of-the-art performance in tasks such as motor control, dynamic time series prediction, and more recently language processing. This research re-explores the class of SRNs and evaluates them against the state-of-the-art ESN to identify which model class is best able to induce the underlying finite-state automaton of the target grammar implicitly through the next word prediction task. In order to meet its aim, the research analyses the internal representations formed by each of the different models and explores the conditions under which they are able to carry information about long term sequential dependencies beyond what is found in the training data. The findings of the research are significant. 
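The echo-state idea mentioned above can be illustrated with a minimal reservoir on an assumed next-step prediction task (a sine wave); the reservoir size, the spectral radius of 0.9, the washout length, and the ridge penalty are illustrative choices, not parameters from the research.

```python
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in = 50, 1

# Fixed random recurrent reservoir, rescaled so its spectral radius is
# below 1 (the "echo state" condition); only the readout is trained.
W = rng.uniform(-1, 1, size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-1, 1, size=(n_res, n_in))

def run_reservoir(inputs):
    h = np.zeros(n_res)
    states = []
    for x in inputs:
        h = np.tanh(W @ h + W_in @ np.atleast_1d(x))
        states.append(h.copy())
    return np.array(states)

# Train the linear readout to predict the next value of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 1000))
states = run_reservoir(u[:-1])
targets = u[1:]
washout = 100                      # discard the initial transient
S = states[washout:]
ridge = 1e-6                       # ridge-regression penalty
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res),
                        S.T @ targets[washout:])
pred = S @ W_out
err = np.sqrt(np.mean((pred - targets[washout:]) ** 2))
```

Since the recurrent weights are never trained, no error gradient is propagated back through time, which is why ESNs sidestep the vanishing-gradients problem entirely.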
It reveals that the traditional class of SRNs, trained with backpropagation through time, is superior to ESNs for the grammar prediction task. More specifically, the MRN, with its state-based memory of varying rigidity, is better able to learn the underlying grammar than any other model. An analysis of the MRN's internal state reveals that this is due to its ability to maintain a constant variance within its state-based representation of the embedded aspects (or finite state machines) of the target grammar. The investigations show that in order to successfully induce complex context-free grammars directly from sentence examples, not only are hidden-layer and output-layer recurrency required, but so is self-recurrency on the context layer, to enable varying degrees of current and past state information to be integrated over time.
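The vanishing-gradients mechanism discussed in this abstract can be made concrete with a short numerical sketch. The network size, weight scales, and sequence length are arbitrary illustrative choices: in an SRN with state update h_t = tanh(W h_{t-1} + U x_t), the error gradient reaching a step k positions back is scaled by a product of k Jacobians, whose norm typically shrinks exponentially with k.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
W = rng.normal(scale=0.08, size=(n, n))  # small recurrent weights
U = rng.normal(scale=0.3, size=(n, n))   # input weights

# Run the SRN forward, recording the Jacobian of each state transition.
h = np.zeros(n)
jacobians = []
for _ in range(30):
    x = rng.normal(size=n)
    h = np.tanh(W @ h + U @ x)
    # Jacobian of h_t w.r.t. h_{t-1}: diag(1 - h_t^2) @ W
    jacobians.append(np.diag(1.0 - h ** 2) @ W)

# The gradient k steps back is scaled by the product of the last k
# Jacobians; track its spectral norm as k grows.
J = np.eye(n)
norms = []
for Jt in reversed(jacobians):
    J = J @ Jt
    norms.append(np.linalg.norm(J, 2))

print(norms[0], norms[-1])
```

The spectral norm of the accumulated Jacobian product collapses toward zero, so errors encountered many steps later contribute almost nothing to the weight updates for early inputs, exactly the decay of past input information the abstract describes.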