UDC: Unified DNAS for Compressible TinyML Models
Deploying TinyML models on low-cost IoT hardware is very challenging due to
limited device memory capacity. Neural processing unit (NPU) hardware addresses
the memory challenge by using model compression, exploiting weight quantization
and sparsity to fit more parameters in the same footprint. However, designing
compressible neural networks (NNs) is challenging, as it expands the design
space across which we must make balanced trade-offs. This paper demonstrates
Unified DNAS for Compressible (UDC) NNs, which explores a large search space to
generate state-of-the-art compressible NNs for NPUs. ImageNet results show UDC
networks are up to smaller (iso-accuracy) or 6.25% more accurate
(iso-model size) than previous work.
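As a rough illustration of why compressibility matters on NPU-class hardware, the sketch below (not the UDC method itself; the layer size, sparsity level, and storage format are arbitrary assumptions) prunes and quantizes a single weight matrix and compares the resulting footprint against the dense fp32 baseline.

# Illustrative sketch, not the UDC algorithm: how weight sparsity plus
# int8 quantization shrink the on-device footprint of a single layer.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)   # dense fp32 weights

# Magnitude pruning to 75% sparsity: keep the largest-magnitude 25% of weights.
k = int(0.25 * w.size)
threshold = np.sort(np.abs(w), axis=None)[-k]
w_sparse = np.where(np.abs(w) >= threshold, w, 0.0)

# Symmetric int8 quantization of the surviving weights.
scale = np.abs(w_sparse).max() / 127.0
w_q = np.clip(np.round(w_sparse / scale), -127, 127).astype(np.int8)

dense_bytes = w.size * 4                              # fp32, uncompressed
nonzero = np.count_nonzero(w_q)
# Assumed storage format: one int8 per non-zero value plus a 1-bit occupancy mask.
compressed_bytes = nonzero * 1 + w.size // 8
print(f"dense: {dense_bytes} B, compressed: {compressed_bytes} B, "
      f"ratio: {dense_bytes / compressed_bytes:.1f}x")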
Optimal modularity and memory capacity of neural reservoirs
The neural network is a powerful computing framework that has been exploited
by biological evolution and by humans for solving diverse problems. Although
the computational capabilities of neural networks are determined by their
structure, our current understanding of the relationship between a neural
network's architecture and its function is still primitive. Here we reveal that
a neural network's modular architecture plays a vital role in determining the
neural dynamics and memory performance of a network of threshold neurons. In
particular, we demonstrate that there exists an optimal modularity for memory
performance, at which a balance between local cohesion and global connectivity
is established, allowing optimally modular networks to remember longer. Our
results suggest that insights from dynamical analysis of neural networks and
information-spreading processes can be leveraged to better design neural
networks, and they may shed light on the brain's modular organization.
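The abstract does not give the model details, so the following sketch is an assumed parameterization for illustration only: a modular network of binary threshold neurons in which a mixing parameter mu trades intra-module cohesion against inter-module connectivity, with synchronous dynamics run from a random initial pattern.

# Illustrative sketch (assumed model, not the paper's exact construction).
import numpy as np

def modular_threshold_net(n_modules=4, size=50, mu=0.2, p=0.2, seed=0):
    """Random weighted network; a fraction `mu` of expected links cross
    module boundaries, the rest stay within a neuron's own module."""
    rng = np.random.default_rng(seed)
    n = n_modules * size
    module = np.repeat(np.arange(n_modules), size)
    same = module[:, None] == module[None, :]
    p_in = p * (1 - mu) * n_modules              # rescaled so mean degree ~ p*n
    p_out = p * mu * n_modules / (n_modules - 1)
    prob = np.where(same, p_in, p_out)
    adj = rng.random((n, n)) < prob
    return adj * rng.normal(size=(n, n)), module

def run_dynamics(w, x0, theta=0.0, steps=50):
    """Synchronous update of binary threshold neurons: x <- 1[W x > theta]."""
    x = x0.copy()
    history = [x.copy()]
    for _ in range(steps):
        x = (w @ x > theta).astype(float)
        history.append(x.copy())
    return np.array(history)

w, module = modular_threshold_net(mu=0.2)
x0 = (np.random.default_rng(1).random(w.shape[0]) < 0.5).astype(float)
traj = run_dynamics(w, x0)
print("activity per step:", traj.sum(axis=1)[:10])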
End-to-end Incremental Learning
Although deep learning approaches have stood out in recent years due to their state-of-the-art results, they continue to suffer from catastrophic forgetting, a dramatic decrease in overall performance when training with new classes added incrementally. This is due to current neural network architectures requiring the entire dataset, consisting of all the samples from the old as well as the new classes, to update the model---a requirement that becomes easily unsustainable as the number of classes grows. We address this issue with our approach to learning deep neural networks incrementally, using new data and only a small exemplar set corresponding to samples from the old classes. This is based on a loss composed of a distillation measure to retain the knowledge acquired from the old classes, and a cross-entropy loss to learn the new classes. Our incremental training is achieved while keeping the entire framework end-to-end, i.e., learning the data representation and the classifier jointly, unlike recent methods with no such guarantees. This work has been funded by project TIC-1692 (Junta de Andalucía), TIN2016-80920R (Spanish Ministry of Science and Technology) and Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech.
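A minimal sketch of the kind of combined objective described above: a cross-entropy term on the ground-truth labels plus a distillation term that keeps the updated model's old-class outputs close to those of the frozen old model. The function name, the temperature, and the use of a KL-based distillation term are assumptions for illustration; the paper's exact formulation may differ.

# Combined incremental-learning objective: classification + distillation (sketch).
import torch
import torch.nn.functional as F

def incremental_loss(new_logits, old_logits, labels, n_old_classes, T=2.0, lam=1.0):
    # Standard classification loss over all (old + new) classes.
    ce = F.cross_entropy(new_logits, labels)
    # Distillation: match softened probabilities on the old-class outputs only.
    p_old = F.softmax(old_logits[:, :n_old_classes] / T, dim=1)
    log_p_new = F.log_softmax(new_logits[:, :n_old_classes] / T, dim=1)
    distill = F.kl_div(log_p_new, p_old, reduction="batchmean") * (T * T)
    return ce + lam * distill

# Example with random tensors standing in for model outputs.
new_logits = torch.randn(8, 12)          # 10 old + 2 new classes
old_logits = torch.randn(8, 10)          # frozen old model covers old classes only
labels = torch.randint(0, 12, (8,))
print(incremental_loss(new_logits, old_logits, labels, n_old_classes=10))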
State-Regularized Recurrent Neural Networks to Extract Automata and Explain Predictions
Recurrent neural networks are a widely used class of neural architectures.
They have, however, two shortcomings. First, they are often treated as
black-box models and as such it is difficult to understand what exactly they
learn as well as how they arrive at a particular prediction. Second, they tend
to work poorly on sequences requiring long-term memorization, despite having
this capacity in principle. We aim to address both shortcomings with a class of
recurrent networks that use a stochastic state transition mechanism between
cell applications. This mechanism, which we term state-regularization, makes
RNNs transition between a finite set of learnable states. We evaluate
state-regularized RNNs on (1) regular languages for the purpose of automata
extraction; (2) non-regular languages such as balanced parentheses and
palindromes where external memory is required; and (3) real-world sequence
learning tasks for sentiment analysis, visual object recognition and text
categorisation. We show that state-regularization (a) simplifies the extraction
of finite state automata that display an RNN's state transition dynamics; (b)
forces RNNs to operate more like automata with external memory and less like
finite state machines, which potentially leads to a more structured memory; and
(c) leads to better interpretability and explainability of RNNs by leveraging
the probabilistic finite state transition mechanism over time steps.
Comment: To appear in IEEE Transactions on Pattern Analysis and Machine
Intelligence. The extended version of State-Regularized Recurrent Neural
Networks [arXiv:1901.08817].
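A minimal sketch of the state-regularization idea as described above, assuming a soft assignment of each candidate hidden state to k learnable centroid states; the class name, the GRU cell choice, and the mixture update are illustrative assumptions rather than the authors' exact implementation.

# Sketch of a state-regularized recurrent layer: after every recurrent step the
# hidden state is re-expressed as a probabilistic mixture of k learnable states.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StateRegularizedRNN(nn.Module):
    def __init__(self, input_size, hidden_size, n_states=10, temperature=1.0):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        # k learnable states the hidden vector is pulled towards.
        self.centroids = nn.Parameter(torch.randn(n_states, hidden_size))
        self.temperature = temperature

    def forward(self, x):                       # x: (batch, time, input_size)
        h = x.new_zeros(x.size(0), self.centroids.size(1))
        transitions = []
        for t in range(x.size(1)):
            u = self.cell(x[:, t], h)           # unconstrained candidate state
            scores = u @ self.centroids.t() / self.temperature
            alpha = F.softmax(scores, dim=1)    # soft assignment over k states
            h = alpha @ self.centroids          # mixture of learnable states
            transitions.append(alpha)
        return h, torch.stack(transitions, dim=1)

rnn = StateRegularizedRNN(input_size=8, hidden_size=16, n_states=5)
h, alphas = rnn(torch.randn(4, 20, 8))
print(h.shape, alphas.shape)                    # (4, 16) and (4, 20, 5)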