36,218 research outputs found

    The role of weight normalization in competitive learning

    The effect of different kinds of weight normalization on the outcome of a simple competitive learning rule is analyzed. It is shown that there are important differences in the representation formed depending on whether the constraint is enforced by dividing each weight by the same amount ("divisive enforcement") or by subtracting a fixed amount from each weight ("subtractive enforcement"). In the divisive cases, the weight vectors spread out over the space so as to evenly represent "typical" inputs, whereas in the subtractive cases the weight vectors tend to the axes of the space, so as to represent "extreme" inputs. The consequences of these differences are examined.
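
The two enforcement schemes named in this abstract can be sketched as follows. This is a minimal illustration, assuming the constraint is a fixed sum of weights; the function names and the `target_sum` parameter are hypothetical and are not taken from the paper.

```python
def divisive_normalize(weights, target_sum=1.0):
    """Divide every weight by the same amount so the weights sum to target_sum.

    Assumes the current sum is nonzero. Ratios between weights are preserved.
    """
    total = sum(weights)
    return [w * target_sum / total for w in weights]


def subtractive_normalize(weights, target_sum=1.0):
    """Subtract the same fixed amount from every weight so they sum to target_sum.

    Differences between weights are preserved. Weights may become negative
    unless they are additionally clipped, which this sketch does not do.
    """
    excess = (sum(weights) - target_sum) / len(weights)
    return [w - excess for w in weights]


# Weights after a hypothetical Hebbian update, before renormalization.
raw = [0.2, 0.6, 0.4]
divisive = divisive_normalize(raw)
subtractive = subtractive_normalize(raw)
```

Both results sum to the target, but divisive enforcement rescales the weights proportionally while subtractive enforcement shifts them by a constant. That difference in how the constraint acts is what the abstract credits for the different representations.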

    Long-term learning behavior in a recurrent neural network for sound recognition

    In this paper, the long-term learning properties of an artificial neural network model, designed for sound recognition and computational auditory scene analysis in general, are investigated. The model is designed to run for long periods of time (weeks to months) on low-cost hardware, used in a noise monitoring network, and builds upon previous work by the same authors. It consists of three neural layers, connected to each other by feedforward and feedback excitatory connections. It is shown that the different mechanisms that drive auditory attention emerge naturally from the way in which neural activation and intra-layer inhibitory connections are implemented in the model. The artificial neural network is trained following the Hebb principle, which dictates that "cells that fire together, wire together", with some important modifications compared to standard Hebbian learning. Because the model is designed to be online for extended periods of time, the learning mechanisms also need to be adapted to this. Learning needs to be strongly attention- and saliency-driven, so as not to waste available memory on sounds that are of no interest to the human listener. The model also implements plasticity in order to deal with new or changing input over time without catastrophically forgetting what it has already learned. In addition, it is shown that the implementation of short-term memory also plays an important role in the long-term learning properties of the model. These properties are investigated and demonstrated by training on real urban sound recordings.

    Pre-integration lateral inhibition enhances unsupervised learning

    A large and influential class of neural network architectures uses post-integration lateral inhibition as a mechanism for competition. We argue that these algorithms are computationally deficient in that they fail to generate, or learn, appropriate perceptual representations under certain circumstances. An alternative neural network architecture is presented in which nodes compete for the right to receive inputs rather than for the right to generate outputs. This form of competition, implemented through pre-integration lateral inhibition, does provide appropriate coding properties and can be used to efficiently learn such representations. Furthermore, this architecture is consistent with both neuro-anatomical and neuro-physiological data. We thus argue that pre-integration lateral inhibition has computational advantages over conventional neural network architectures while remaining equally biologically plausible.

    Deep Complex Networks

    At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks and convolutional LSTMs. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization and complex weight initialization strategies for complex-valued neural nets, and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset and on Speech Spectrum Prediction using the TIMIT dataset. We achieve state-of-the-art performance on these audio-related tasks.
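
A complex convolution of the kind this abstract relies on can be built from real-valued operations by expanding the complex product (A + iB)(x + iy) = (Ax - By) + i(Bx + Ay). The sketch below is a one-dimensional illustration under stated assumptions: it uses the sliding dot product (cross-correlation) that deep learning libraries usually call convolution, and the helper names `corr1d` and `complex_corr1d` are hypothetical rather than taken from the paper.

```python
def corr1d(signal, kernel):
    """Valid-mode sliding dot product of a signal with a kernel."""
    steps = len(signal) - len(kernel) + 1
    return [
        sum(signal[i + k] * kernel[k] for k in range(len(kernel)))
        for i in range(steps)
    ]


def complex_corr1d(x_re, x_im, w_re, w_im):
    """Complex sliding product computed from four real-valued ones.

    With input x = x_re + i*x_im and kernel w = w_re + i*w_im:
      real part      = w_re * x_re - w_im * x_im
      imaginary part = w_re * x_im + w_im * x_re
    """
    out_re = [a - b for a, b in zip(corr1d(x_re, w_re), corr1d(x_im, w_im))]
    out_im = [a + b for a, b in zip(corr1d(x_im, w_re), corr1d(x_re, w_im))]
    return out_re, out_im


# Cross-check against Python's built-in complex arithmetic.
x = [1 + 2j, 3 - 1j, 0.5 + 0.5j]
w = [2 - 1j, -1 + 0.5j]
re_part, im_part = complex_corr1d(
    [v.real for v in x], [v.imag for v in x],
    [v.real for v in w], [v.imag for v in w],
)
reference = corr1d(x, w)
```

Keeping the real and imaginary parts as separate real arrays is what lets such a layer reuse existing real-valued convolution routines.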

    Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization

    Global covariance pooling in convolutional neural networks has achieved impressive improvement over classical first-order pooling. Recent works have shown that matrix square root normalization plays a central role in achieving state-of-the-art performance. However, existing methods depend heavily on eigendecomposition (EIG) or singular value decomposition (SVD), and suffer from inefficient training due to limited support for EIG and SVD on GPUs. To address this problem, we propose an iterative matrix square root normalization method for fast end-to-end training of global covariance pooling networks. At the core of our method is a meta-layer designed with a loop-embedded directed graph structure. The meta-layer consists of three consecutive nonlinear structured layers, which perform pre-normalization, coupled matrix iteration and post-compensation, respectively. Our method is much faster than EIG- or SVD-based ones, since it involves only matrix multiplications, which are suitable for parallel implementation on GPUs. Moreover, the proposed network with ResNet architecture can converge in far fewer epochs, further accelerating network training. On large-scale ImageNet, we achieve competitive performance superior to existing counterparts. By fine-tuning our models pre-trained on ImageNet, we establish state-of-the-art results on three challenging fine-grained benchmarks. The source code and network models will be available at http://www.peihuali.org/iSQRT-COV
    Comment: Accepted to CVPR 201
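
The three stages named in this abstract (pre-normalization, coupled matrix iteration, post-compensation) can be sketched with plain matrix multiplications. The abstract does not give the update equations, so this sketch assumes the standard coupled Newton-Schulz iteration with trace-based normalization; the function names and the default iteration count are illustrative, not the authors' implementation.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def iterative_matrix_sqrt(cov, num_iters=5):
    """Approximate the square root of a symmetric positive definite matrix."""
    n = len(cov)
    trace = sum(cov[i][i] for i in range(n))
    # Pre-normalization: divide by the trace so the iteration converges.
    y = [[v / trace for v in row] for row in cov]
    z = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Coupled iteration: y tends to sqrt(A) and z to its inverse, A = cov / trace.
    for _ in range(num_iters):
        zy = matmul(z, y)
        t = [
            [0.5 * ((3.0 if i == j else 0.0) - zy[i][j]) for j in range(n)]
            for i in range(n)
        ]
        y, z = matmul(y, t), matmul(t, z)
    # Post-compensation: undo the scaling introduced by pre-normalization.
    scale = math.sqrt(trace)
    return [[scale * v for v in row] for row in y]


sigma = [[2.0, 1.0], [1.0, 2.0]]
root = iterative_matrix_sqrt(sigma, num_iters=15)
check = matmul(root, root)  # should be close to sigma
```

The loop body uses only matrix products, which matches the abstract's argument for why this approach suits GPUs better than EIG or SVD.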

    Fuzzy ART: Fast Stable Learning and Categorization of Analog Patterns by an Adaptive Resonance System

    A Fuzzy ART model capable of rapid stable learning of recognition categories in response to arbitrary sequences of analog or binary input patterns is described. Fuzzy ART incorporates computations from fuzzy set theory into the ART 1 neural network, which learns to categorize only binary input patterns. The generalization to learning both analog and binary input patterns is achieved by replacing appearances of the intersection operator (∩) in ART 1 by the MIN operator (∧) of fuzzy set theory. The MIN operator reduces to the intersection operator in the binary case. Category proliferation is prevented by normalizing input vectors at a preprocessing stage. A normalization procedure called complement coding leads to a symmetric theory in which the MIN operator (∧) and the MAX operator (∨) of fuzzy set theory play complementary roles. Complement coding uses on-cells and off-cells to represent the input pattern, and preserves individual feature amplitudes while normalizing the total on-cell/off-cell vector. Learning is stable because all adaptive weights can only decrease in time. Decreasing weights correspond to increasing sizes of category "boxes". Smaller vigilance values lead to larger category boxes. Learning stops when the input space is covered by boxes. With fast learning and a finite input set of arbitrary size and composition, learning stabilizes after just one presentation of each input pattern. A fast-commit slow-recode option combines fast learning with a forgetting rule that buffers system memory against noise. Using this option, rare events can be rapidly learned, yet previously learned memories are not rapidly erased in response to statistically unreliable input fluctuations.
    British Petroleum (89-A-1204); Defense Advanced Research Projects Agency (90-0083); National Science Foundation (IRI-90-00530); Air Force Office of Scientific Research (90-0175
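
Complement coding, the MIN operator and the vigilance test described in this abstract can be sketched as follows. The abstract does not state the equations, so the choice function, match criterion and learning rule below are assumed from the commonly cited Fuzzy ART formulation; the function names and the parameter names `rho` (vigilance), `alpha` (choice parameter) and `beta` (learning rate) are illustrative.

```python
def fuzzy_and(x, y):
    """Component-wise MIN operator of fuzzy set theory."""
    return [min(a, b) for a, b in zip(x, y)]


def complement_code(a):
    """Append off-cells (1 - a) to on-cells (a); the sum equals len(a)."""
    return list(a) + [1.0 - v for v in a]


def present(categories, a, rho=0.75, alpha=0.001, beta=1.0):
    """Present one input with features in [0, 1]; return the chosen category.

    `categories` is a list of weight vectors and is modified in place.
    beta=1.0 corresponds to fast learning.
    """
    inp = complement_code(a)
    # Rank committed categories by the choice function |I ^ w| / (alpha + |w|).
    order = sorted(
        range(len(categories)),
        key=lambda j: sum(fuzzy_and(inp, categories[j])) / (alpha + sum(categories[j])),
        reverse=True,
    )
    for j in order:
        overlap = fuzzy_and(inp, categories[j])
        # Vigilance test: accept the category only if the match is good enough.
        if sum(overlap) / sum(inp) >= rho:
            # Weights move toward the overlap, so they can only decrease.
            categories[j] = [
                beta * o + (1.0 - beta) * w for o, w in zip(overlap, categories[j])
            ]
            return j
    # No category passed the vigilance test: commit a new one.
    categories.append(inp)
    return len(categories) - 1


cats = []
first = present(cats, [0.2, 0.7])
before = list(cats[0])
second = present(cats, [0.25, 0.7])  # close to the first input
third = present(cats, [0.9, 0.1])    # far from the first input
```

With the vigilance of 0.75 used here, the second input joins the first category and shrinks its weights, while the third input fails the vigilance test and commits a new category.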