Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions
Quantification of the stationary points and the associated basins of
attraction of neural network loss surfaces is an important step towards a
better understanding of neural network loss surfaces at large. This work
proposes a novel method to visualise basins of attraction together with the
associated stationary points via gradient-based random sampling. The proposed
technique is used to perform an empirical study of the loss surfaces generated
by two different error metrics: quadratic loss and entropic loss. The empirical
observations confirm the theoretical hypothesis regarding the nature of neural
network attraction basins. Entropic loss is shown to exhibit stronger gradients
and fewer stationary points than quadratic loss, indicating that entropic loss
has a more searchable landscape. Quadratic loss is shown to be more resilient
to overfitting than entropic loss. Both losses are shown to exhibit local
minima, but the number of local minima is shown to decrease with an increase in
dimensionality. Thus, the proposed visualisation technique successfully
captures the local minima properties exhibited by the neural network loss
surfaces, and can be used for the purpose of fitness landscape analysis of
neural networks.
Comment: Preprint submitted to the Neural Networks journal
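The abstract's core idea of gradient-based random sampling can be illustrated with a toy sketch: sample random starting points, descend the gradient from each, and group the samples by the stationary point they converge to. This is an assumed reading of the method, not the paper's code; the toy one-parameter loss with two minima merely stands in for a neural network loss surface.

```python
import numpy as np

def loss(w):
    # toy surface with two basins: minima at w = -1 and w = +1
    return (w**2 - 1.0)**2

def grad(w):
    # analytic gradient of the toy loss
    return 4.0 * w * (w**2 - 1.0)

def descend(w, lr=0.01, steps=2000):
    # plain gradient descent until (approximate) stationarity
    for _ in range(steps):
        w -= lr * grad(w)
    return w

rng = np.random.default_rng(0)
samples = rng.uniform(-2.0, 2.0, size=200)        # random sampling step
endpoints = np.array([descend(w) for w in samples])

# group converged endpoints to estimate the distinct stationary points;
# the share of samples per group estimates each basin's relative size
basins = np.unique(np.round(endpoints, 3))
```

Counting how many samples land in each basin gives the relative basin sizes, which is the kind of quantity the visualisation technique is aimed at.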
Large-scale Multi-label Text Classification - Revisiting Neural Networks
Neural networks have recently been proposed for multi-label classification
because they are able to capture and model label dependencies in the output
layer. In this work, we investigate limitations of BP-MLL, a neural network
(NN) architecture that aims at minimizing pairwise ranking error. Instead, we
propose to use a comparably simple NN approach with recently proposed learning
techniques for large-scale multi-label text classification tasks. In
particular, we show that BP-MLL's ranking loss minimization can be efficiently
and effectively replaced with the commonly used cross entropy error function,
and demonstrate that several advances in neural network training that have been
developed in the realm of deep learning can be effectively employed in this
setting. Our experimental results show that simple NN models equipped with
advanced techniques such as rectified linear units, dropout, and AdaGrad
perform as well as or even outperform state-of-the-art approaches on six
large-scale textual datasets with diverse characteristics.
Comment: 16 pages, 4 figures, submitted to ECML 201
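The replacement the abstract describes can be sketched by contrasting the two loss computations: per-label binary cross-entropy versus a BP-MLL-style pairwise ranking penalty. This is a hedged illustration, not the paper's implementation; `logits` and `targets` are made-up example values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean per-label cross-entropy over all (example, label) pairs."""
    p = sigmoid(logits)
    return float(-np.mean(targets * np.log(p + eps)
                          + (1 - targets) * np.log(1 - p + eps)))

def pairwise_ranking_loss(logits, targets):
    """BP-MLL-style loss: penalise relevant labels scored below irrelevant ones."""
    losses = []
    for s, y in zip(logits, targets):
        pos, neg = s[y == 1], s[y == 0]
        if len(pos) and len(neg):
            # exponential penalty for every (relevant, irrelevant) pair
            losses.append(np.mean(np.exp(-(pos[:, None] - neg[None, :]))))
    return float(np.mean(losses))

# two examples, three labels (illustrative values only)
logits = np.array([[2.0, -1.0, 0.5], [-0.5, 1.5, -2.0]])
targets = np.array([[1, 0, 1], [0, 1, 0]])
bce = binary_cross_entropy(logits, targets)
rank = pairwise_ranking_loss(logits, targets)
```

The cross-entropy scales linearly with the number of labels, whereas the pairwise loss enumerates relevant/irrelevant label pairs, which is one reason the simpler loss is attractive at large label counts.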
An optimised deep spiking neural network architecture without gradients
We present an end-to-end trainable modular event-driven neural architecture
that uses local synaptic and threshold adaptation rules to perform
transformations between arbitrary spatio-temporal spike patterns. The
architecture represents a highly abstracted model of existing Spiking Neural
Network (SNN) architectures. The proposed Optimized Deep Event-driven Spiking
neural network Architecture (ODESA) can simultaneously learn hierarchical
spatio-temporal features at multiple arbitrary time scales. ODESA performs
online learning without the use of error back-propagation or the calculation of
gradients. Through the use of simple local adaptive selection thresholds at
each node, the network rapidly learns to appropriately allocate its neuronal
resources at each layer for any given problem without using a real-valued error
measure. These adaptive selection thresholds are the central feature of ODESA,
ensuring network stability and remarkable robustness to noise as well as to the
selection of initial system parameters. Network activations are inherently
sparse due to a hard Winner-Take-All (WTA) constraint at each layer. We
evaluate the architecture on existing spatio-temporal datasets, including the
spike-encoded IRIS and TIDIGITS datasets, as well as a novel set of tasks based
on International Morse Code that we created. These tests demonstrate the
hierarchical spatio-temporal learning capabilities of ODESA. Through these
tests, we demonstrate ODESA can optimally solve practical and highly
challenging hierarchical spatio-temporal learning tasks with the minimum
possible number of computing nodes.
Comment: 18 pages, 6 figures
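The mechanism the abstract describes, local adaptive selection thresholds combined with a hard Winner-Take-All constraint, can be sketched in highly simplified form. This is an assumed illustration of that general idea, not the actual ODESA rule set; the adaptation constants and update rules below are invented for the sketch.

```python
import numpy as np

class WTALayer:
    """One layer with per-neuron adaptive thresholds and hard WTA output."""

    def __init__(self, n_in, n_units, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 1.0, size=(n_units, n_in))
        self.thresh = np.full(n_units, 1.0)   # local adaptive thresholds

    def forward(self, x, train=True):
        act = self.w @ x
        winner = int(np.argmax(act))          # hard WTA: single candidate
        spiked = act[winner] >= self.thresh[winner]
        if train:
            if spiked:
                # winner fired: raise its threshold and pull its weights
                # toward the input (purely local updates, no gradients)
                self.thresh[winner] += 0.1
                self.w[winner] += 0.05 * (x - self.w[winner])
            else:
                # nothing fired: ease the winner's threshold down so the
                # layer eventually allocates a neuron to this pattern
                self.thresh[winner] -= 0.05
        out = np.zeros(len(self.thresh))
        if spiked:
            out[winner] = 1.0                 # at most one active unit
        return out

layer = WTALayer(n_in=4, n_units=3)
out = layer.forward(np.array([1.0, 0.0, 0.5, 0.0]))
```

The point of the sketch is that every update uses only quantities local to one neuron, which is how such architectures avoid back-propagation and real-valued error signals.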