1,109 research outputs found

    Constructive neural networks : generalisation, convergence and architectures

    Full text link
    Feedforward neural networks trained via supervised learning have proven to be successful in the field of pattern recognition. The most important feature of a pattern recognition technique is its ability to successfully classify future data. This is known as generalisation. A more practical aspect of pattern recognition methods is how quickly they can be trained and how reliably a good solution is found. Feedforward neural networks have been shown to provide good generali- sation on a variety of problems. A number of training techniques also exist that provide fast convergence. Two problems often addressed within the field of feedforward neural networks are how to improve thegeneralisation and convergence of these pattern recognition techniques. These two problems are addressed in this thesis through the frame- work of constructive neural network algorithms. Constructive neural networks are a type of feedforward neural network in which the network architecture is built during the training process. The type of architecture built can affect both generalisation and convergence speed. Convergence speed and reliability areimportant properties of feedforward neu- ral networks. These properties are studied by examining different training al- gorithms and the effect of using a constructive process. A new gradient based training algorithm, SARPROP, is introduced. This algorithm addresses the problems of poor convergence speed and reliability when using a gradient based training method. SARPROP is shown to increase both convergence speed and the chance of convergence to a good solution. This is achieved through the combination of gradient based and Simulated Annealing methods. The convergence properties of various constructive algorithms are examined through a series of empirical studies. The results of these studies demonstrate that the cascade architecture allows for faster, more reliable convergence using a gradient based method than a single layer architecture with a comparable num- ber of weights. It is shown that constructive algorithms that bias the search direction of the gradient based training algorithm for the newly added hidden neurons, produce smaller networks and more rapid convergence. A constructive algorithm using search direction biasing is shown to converge to solutions with networks that are unreliable and ineƆcient to train using a non-constructive gradient based algorithm. The technique of weight freezing is shown to result in larger architectures than those obtained from training the whole network. Improving the generalisation ability of constructive neural networks is an im- portant area of investigation. A series of empirical studies are performed to examine the effect of regularisation on generalisation in constructive cascade al- gorithms. It is found that the combination of early stopping and regularisation results in better generalisation than the use of early stopping alone. A cubic regularisation term that greatly penalises large weights is shown to be benefi- cial for generalisation in cascade networks. An adaptive method of setting the regularisation magnitude in constructive networks is introduced and is shown to produce generalisation results similar to those obtained with a fixed, user- optimised regularisation setting. This adaptive method also oftenresults in the construction of smaller networks for more complex problems. The insights obtained from the SARPROP algorithm and from the convergence and generalisation empirical studies are used to create a new constructive cascade algorithm, acasper. This algorithm is extensively benchmarked and is shown to obtain good generalisation results in comparison to a number of well-respected and successful neural network algorithms. A technique of incorporating the validation data into the training set after network construction is introduced and is shown to generally result in similar or improved generalisation. The diƆculties of implementing a cascade architecture in VLSI are described and results are given on the effect of the cascade architecture on such attributes as weight growth, fan-in, network depth, and propagation delay. Two variants of the cascade architecture are proposed. These new architectures are shown to produce similar generalisation results to the cascade architecture, while also addressing the problems of VLSI implementation of cascade networks

    Nature of the learning algorithms for feedforward neural networks

    Get PDF
    The neural network model (NN) comprised of relatively simple computing elements, operating in parallel, offers an attractive and versatile framework for exploring a variety of learning structures and processes for intelligent systems. Due to the amount of research developed in the area many types of networks have been defined. The one of interest here is the multi-layer perceptron as it is one of the simplest and it is considered a powerful representation tool whose complete potential has not been adequately exploited and whose limitations need yet to be specified in a formal and coherent framework. This dissertation addresses the theory of generalisation performance and architecture selection for the multi-layer perceptron; a subsidiary aim is to compare and integrate this model with existing data analysis techniques and exploit its potential by combining it with certain constructs from computational geometry creating a reliable, coherent network design process which conforms to the characteristics of a generative learning algorithm, ie. one including mechanisms for manipulating the connections and/or units that comprise the architecture in addition to the procedure for updating the weights of the connections. This means that it is unnecessary to provide an initial network as input to the complete training process.After discussing in general terms the motivation for this study, the multi-layer perceptron model is introduced and reviewed, along with the relevant supervised training algorithm, ie. backpropagation. More particularly, it is argued that a network developed employing this model can in general be trained and designed in a much better way by extracting more information about the domains of interest through the application of certain geometric constructs in a preprocessing stage, specifically by generating the Voronoi Diagram and Delaunav Triangulation [Okabe et al. 92] of the set of points comprising the training set and once a final architecture which performs appropriately on it has been obtained, Principal Component Analysis [Jolliffe 86] is applied to the outputs produced by the units in the network's hidden layer to eliminate the redundant dimensions of this space

    Towards Deep Learning with Competing Generalisation Objectives

    Get PDF
    The unreasonable effectiveness of Deep Learning continues to deliver unprecedented Artificial Intelligence capabilities to billions of people. Growing datasets and technological advances keep extending the reach of expressive model architectures trained through efficient optimisations. Thus, deep learning approaches continue to provide increasingly proficient subroutines for, among others, computer vision and natural interaction through speech and text. Due to their scalable learning and inference priors, higher performance is often gained cost-effectively through largely automatic training. As a result, new and improved capabilities empower more people while the costs of access drop. The arising opportunities and challenges have profoundly influenced research. Quality attributes of scalable software became central desiderata of deep learning paradigms, including reusability, efficiency, robustness and safety. Ongoing research into continual, meta- and robust learning aims to maximise such scalability metrics in addition to multiple generalisation criteria, despite possible conflicts. A significant challenge is to satisfy competing criteria automatically and cost-effectively. In this thesis, we introduce a unifying perspective on learning with competing generalisation objectives and make three additional contributions. When autonomous learning through multi-criteria optimisation is impractical, it is reasonable to ask whether knowledge of appropriate trade-offs could make it simultaneously effective and efficient. Informed by explicit trade-offs of interest to particular applications, we developed and evaluated bespoke model architecture priors. We introduced a novel architecture for sim-to-real transfer of robotic control policies by learning progressively to generalise anew. Competing desiderata of continual learning were balanced through disjoint capacity and hierarchical reuse of previously learnt representations. A new state-of-the-art meta-learning approach is then proposed. We showed that meta-trained hypernetworks efficiently store and flexibly reuse knowledge for new generalisation criteria through few-shot gradient-based optimisation. Finally, we characterised empirical trade-offs between the many desiderata of adversarial robustness and demonstrated a novel defensive capability of implicit neural networks to hinder many attacks simultaneously

    Using constraints to improve generalisation and training of feedforward neural networks : constraint based decomposition and complex backpropagation

    Get PDF
    Neural networks can be analysed from two points of view: training and generalisation. The training is characterised by a trade-off between the 'goodness' of the training algorithm itself (speed, reliability, guaranteed convergence) and the 'goodness' of the architecture (the difficulty of the problems the network can potentially solve). Good training algorithms are available for simple architectures which cannot solve complicated problems. More complex architectures, which have been shown to be able to solve potentially any problem do not have in general simple and fast algorithms with guaranteed convergence and high reliability. A good training technique should be simple, fast and reliable, and yet also be applicable to produce a network able to solve complicated problems. The thesis presents Constraint Based Decomposition (CBD) as a technique which satisfies the above requirements well. CBD is shown to build a network able to solve complicated problems in a simple, fast and reliable manner. Furthermore, the user is given a better control over the generalisation properties of the trained network with respect to the control offered by other techniques. The generalisation issue is addressed, as well. An analysis of the meaning of the term "good generalisation" is presented and a framework for assessing generalisation is given: the generalisation can be assessed only with respect to a known or desired underlying function. The known properties of the underlying function can be embedded into the network thus ensuring a better generalisation for the given problem. This is the fundamental idea of the complex backpropagation network. This network can associate signals through associating some of their parameters using complex weights. It is shown that such a network can yield better generalisation results than a standard backpropagation network associating instantaneous values

    VLSI neural networks for computer vision

    Get PDF

    Deep learning for fast and robust medical image reconstruction and analysis

    Get PDF
    Medical imaging is an indispensable component of modern medical research as well as clinical practice. Nevertheless, imaging techniques such as magnetic resonance imaging (MRI) and computational tomography (CT) are costly and are less accessible to the majority of the world. To make medical devices more accessible, affordable and efficient, it is crucial to re-calibrate our current imaging paradigm for smarter imaging. In particular, as medical imaging techniques have highly structured forms in the way they acquire data, they provide us with an opportunity to optimise the imaging techniques holistically by leveraging data. The central theme of this thesis is to explore different opportunities where we can exploit data and deep learning to improve the way we extract information for better, faster and smarter imaging. This thesis explores three distinct problems. The first problem is the time-consuming nature of dynamic MR data acquisition and reconstruction. We propose deep learning methods for accelerated dynamic MR image reconstruction, resulting in up to 10-fold reduction in imaging time. The second problem is the redundancy in our current imaging pipeline. Traditionally, imaging pipeline treated acquisition, reconstruction and analysis as separate steps. However, we argue that one can approach them holistically and optimise the entire pipeline jointly for a specific target goal. To this end, we propose deep learning approaches for obtaining high fidelity cardiac MR segmentation directly from significantly undersampled data, greatly exceeding the undersampling limit for image reconstruction. The final part of this thesis tackles the problem of interpretability of the deep learning algorithms. We propose attention-models that can implicitly focus on salient regions in an image to improve accuracy for ultrasound scan plane detection and CT segmentation. More crucially, these models can provide explainability, which is a crucial stepping stone for the harmonisation of smart imaging and current clinical practice.Open Acces

    A review of differentiable digital signal processing for music and speech synthesis

    Get PDF
    The term ā€œdifferentiable digital signal processingā€ describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music and speech synthesis. We catalogue applications to tasks including music performance rendering, sound matching, and voice transformation, discussing the motivations for and implications of the use of this methodology. This is accompanied by an overview of digital signal processing operations that have been implemented differentiably, which is further supported by a web book containing practical advice on differentiable synthesiser programming (https://intro2ddsp.github.io/). Finally, we highlight open challenges, including optimisation pathologies, robustness to real-world conditions, and design trade-offs, and discuss directions for future research

    A generalised feedforward neural network architecture and its applications to classification and regression

    Get PDF
    Shunting inhibition is a powerful computational mechanism that plays an important role in sensory neural information processing systems. It has been extensively used to model some important visual and cognitive functions. It equips neurons with a gain control mechanism that allows them to operate as adaptive non-linear filters. Shunting Inhibitory Artificial Neural Networks (SIANNs) are biologically inspired networks where the basic synaptic computations are based on shunting inhibition. SIANNs were designed to solve difficult machine learning problems by exploiting the inherent non-linearity mediated by shunting inhibition. The aim was to develop powerful, trainable networks, with non-linear decision surfaces, for classification and non-linear regression tasks. This work enhances and extends the original SIANN architecture to a more general form called the Generalised Feedforward Neural Network (GFNN) architecture, which contains as subsets both SIANN and the conventional Multilayer Perceptron (MLP) architectures. The original SIANN structure has the number of shunting neurons in the hidden layers equal to the number of inputs, due to the neuron model that is used having a single direct excitatory input. This was found to be too restrictive, often resulting in inadequately small or inordinately large network structures
    • ā€¦
    corecore