One-Shot Learning using Mixture of Variational Autoencoders: a Generalization Learning approach
Deep learning, although very successful nowadays, traditionally needs
very large amounts of labeled data to perform well on classification
tasks. In an attempt to solve this problem, the one-shot learning paradigm,
which makes use of just one labeled sample per class and prior knowledge,
becomes increasingly important. In this paper, we propose a new one-shot
learning method, dubbed MoVAE (Mixture of Variational AutoEncoders), to perform
classification. Complementary to prior studies, MoVAE represents a paradigm
shift compared with the usual one-shot learning methods, as it does not
use any prior knowledge. Instead, it starts from zero knowledge and one labeled
sample per class. Afterward, by using unlabeled data and the generalization
learning concept (in a way, more as humans do), it can gradually improve its
performance by itself. Moreover, even if no unlabeled data are available,
MoVAE can still perform well in one-shot learning classification. We
demonstrate empirically the efficiency of our proposed approach on three
datasets, i.e. handwritten digits (MNIST), fashion products
(Fashion-MNIST), and handwritten characters (Omniglot), showing that MoVAE
outperforms state-of-the-art one-shot learning algorithms.
On the synergy of network science and artificial intelligence
Traditionally, science is done using the reductionism paradigm. Artificial intelligence is no exception and follows the same strategy. At the same time, network science tries to study complex systems as a whole. This Ph.D. research takes an alternative approach to the reductionism strategy and tries to advance both fields, i.e. artificial intelligence and network science, by searching for the synergy between them, while not ignoring any other source of inspiration, e.g. neuroscience.
Truly Sparse Neural Networks at Scale
Recently, sparse training methods have started to be established as a de
facto approach for training and inference efficiency in artificial neural
networks. Yet, this efficiency is only theoretical. In practice, everyone uses a
binary mask to simulate sparsity since the typical deep learning software and
hardware are optimized for dense matrix operations. In this paper, we take an
orthogonal approach, and we show that we can train truly sparse neural networks
to harvest their full potential. To achieve this goal, we introduce three novel
contributions, specially designed for sparse neural networks: (1) a parallel
training algorithm and its corresponding sparse implementation from scratch,
(2) an activation function with non-trainable parameters to favour the gradient
flow, and (3) a hidden neurons importance metric to eliminate redundancies. All
in all, we are able to break the record and train the largest neural network
ever trained in terms of representational power -- reaching the bat brain size.
The results show that our approach has state-of-the-art performance while
opening the path for an environmentally friendly artificial intelligence era.
Comment: 30 pages, 17 figures
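The contrast drawn in this abstract, between simulating sparsity with a binary mask and storing only the existing connections, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the coordinate-list storage (`rows`, `cols`, `values`), and the layer sizes are assumptions chosen for illustration.

```python
import numpy as np

def masked_dense_forward(x, weights, mask):
    """Simulated sparsity: a full dense matrix is stored and multiplied,
    with a 0/1 mask zeroing out the pruned connections."""
    return x @ (weights * mask)

def sparse_forward(x, rows, cols, values, n_out):
    """Truly sparse layer: only the existing connections are stored,
    as (input index, output index, weight) triplets."""
    out_t = np.zeros((n_out, x.shape[0]))
    # Accumulate x[:, rows[k]] * values[k] into output unit cols[k].
    np.add.at(out_t, cols, (x[:, rows] * values).T)
    return out_t.T

# Hypothetical toy layer: 6 inputs, 4 outputs, roughly 30% density.
rng = np.random.default_rng(0)
n_in, n_out = 6, 4
weights = rng.normal(size=(n_in, n_out))
mask = rng.random((n_in, n_out)) < 0.3
rows, cols = np.nonzero(mask)
values = weights[rows, cols]
x = rng.normal(size=(2, n_in))

dense_result = masked_dense_forward(x, weights, mask)
sparse_result = sparse_forward(x, rows, cols, values, n_out)
```

Both functions compute the same output, but the masked version still stores and multiplies all `n_in * n_out` entries, whereas the sparse version touches only the stored connections.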
SpaceNet: Make Free Space For Continual Learning
The continual learning (CL) paradigm aims to enable neural networks to learn
tasks continually in a sequential fashion. The fundamental challenge in this
learning paradigm is catastrophic forgetting of previously learned tasks when the
model is optimized for a new task, especially when their data is not
accessible. Current architectural-based methods aim at alleviating the
catastrophic forgetting problem but at the expense of expanding the capacity of
the model. Regularization-based methods maintain a fixed model capacity;
however, previous studies showed the huge performance degradation of these
methods when the task identity is not available during inference (e.g. class
incremental learning scenario). In this work, we propose a novel
architectural-based method, referred to as SpaceNet, for the class incremental
learning scenario, where we utilize the available fixed capacity of the model
intelligently. SpaceNet trains sparse deep neural networks from scratch in an
adaptive way that compresses the sparse connections of each task in a compact
number of neurons. The adaptive training of the sparse connections results in
sparse representations that reduce the interference between the tasks.
Experimental results show the robustness of our proposed method against
catastrophic forgetting of old tasks and the efficiency of SpaceNet in utilizing
the available capacity of the model, leaving space for more tasks to be
learned. In particular, when SpaceNet is tested on the well-known benchmarks
for CL: split MNIST, split Fashion-MNIST, and CIFAR-10/100, it outperforms
regularization-based methods by a large margin. Moreover, it achieves
better performance than architectural-based methods without model expansion and
achieves comparable results with rehearsal-based methods, while offering a huge
memory reduction.
Comment: Accepted in Neurocomputing Journal
Learning with Delayed Synaptic Plasticity
The plasticity property of biological neural networks allows them to perform
learning and optimize their behavior by changing their configuration. Inspired
by biology, plasticity can be modeled in artificial neural networks by using
Hebbian learning rules, i.e. rules that update synapses based on the neuron
activations and reinforcement signals. However, the distal reward problem
arises when the reinforcement signals are not available immediately after each
network output to associate the neuron activations that contributed to
receiving the reinforcement signal. In this work, we extend Hebbian plasticity
rules to allow learning in distal reward cases. We propose the use of neuron
activation traces (NATs) to provide additional data storage in each synapse to
keep track of the activation of the neurons. Delayed reinforcement signals are
provided after each episode relative to the networks' performance during the
previous episode. We employ genetic algorithms to evolve delayed synaptic
plasticity (DSP) rules and perform synaptic updates based on NATs and delayed
reinforcement signals. We compare DSP with an analogous hill climbing (HC)
algorithm that does not incorporate the domain knowledge introduced with the
NATs, and show that the synaptic updates performed by the DSP rules yield more
effective training than the HC algorithm.
Comment: GECCO201
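The idea of storing activation traces in each synapse and applying a reinforcement signal only after the episode ends can be sketched as follows. This is a simplified illustration, not the evolved DSP rules from the paper: the plain Hebbian product used for the trace, the `tanh` activation, the learning rate, and all function names are assumptions.

```python
import numpy as np

def run_episode(weights, inputs):
    """Run one episode and accumulate, per synapse, a trace of
    pre-synaptic and post-synaptic co-activation (an assumed stand-in
    for the neuron activation traces described in the abstract)."""
    trace = np.zeros_like(weights)
    for x in inputs:
        y = np.tanh(x @ weights)      # post-synaptic activations
        trace += np.outer(x, y)       # Hebbian co-activation term
    return trace / len(inputs)

def delayed_update(weights, trace, reward, lr=0.1):
    """Apply the synaptic update once the delayed reward is known."""
    return weights + lr * reward * trace

# Hypothetical toy network: 3 inputs, 2 outputs, 5 steps per episode.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.1, size=(3, 2))
episode_inputs = rng.normal(size=(5, 3))

trace = run_episode(weights, episode_inputs)
rewarded = delayed_update(weights, trace, reward=1.0)
punished = delayed_update(weights, trace, reward=-1.0)
```

No weight changes during the episode; the trace keeps enough information for the update to be applied afterwards, with the sign of the reward deciding whether the co-activated synapses are strengthened or weakened.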
Limited Evaluation Cooperative Co-evolutionary Differential Evolution for Large-scale Neuroevolution
Many real-world control and classification tasks involve a large number of
features. When artificial neural networks (ANNs) are used for modeling these
tasks, the network architectures tend to be large. Neuroevolution is an
effective approach for optimizing ANNs; however, there are two bottlenecks that
make its application challenging in the case of high-dimensional networks using
direct encoding. First, classic evolutionary algorithms tend not to scale well
for searching large parameter spaces; second, the network evaluation over a
large number of training instances is in general time-consuming. In this work,
we propose an approach called the Limited Evaluation Cooperative
Co-evolutionary Differential Evolution algorithm (LECCDE) to optimize
high-dimensional ANNs.
The proposed method aims to optimize the pre-synaptic weights of each
post-synaptic neuron in different subpopulations using a Cooperative
Co-evolutionary Differential Evolution algorithm, and employs a limited
evaluation scheme where fitness evaluation is performed on a relatively small
number of training instances based on fitness inheritance. We test LECCDE on
three datasets of various sizes, and our results show that cooperative
co-evolution significantly improves the test error compared to standard
Differential Evolution, while the limited evaluation scheme facilitates a
significant reduction in computing time.
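A limited evaluation scheme with fitness inheritance can be sketched roughly as below. The abstract does not give the exact formula, so the weighted average of the parent's fitness and a mini-batch error, the `decay` parameter, the linear classifier, and the function names are all assumptions made for illustration.

```python
import numpy as np

def batch_error(weights, X, y):
    """Classification error of a linear model on a small batch."""
    predictions = (X @ weights > 0).astype(int)
    return float(np.mean(predictions != y))

def inherited_fitness(parent_fitness, weights, X, y, batch_idx, decay=0.5):
    """Estimate an offspring's fitness (lower is better) from its
    parent's fitness plus the error on a small batch, instead of
    evaluating it on the full training set."""
    err = batch_error(weights, X[batch_idx], y[batch_idx])
    return decay * parent_fitness + (1.0 - decay) * err

# Hypothetical toy data: labels generated by a known linear rule.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_weights = np.array([1.0, -2.0, 0.5, 0.0])
y = (X @ true_weights > 0).astype(int)

batch_idx = rng.choice(len(X), size=16, replace=False)
fitness = inherited_fitness(0.4, true_weights, X, y, batch_idx)
```

Each evaluation here costs 16 instances rather than 200, and the inherited term smooths out the noise that comes from scoring on such a small batch.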
Self-Attention Meta-Learner for Continual Learning
Continual learning aims to provide intelligent agents capable of learning
multiple tasks sequentially with neural networks. One of its main challenges,
catastrophic forgetting, is caused by the neural networks' non-optimal ability
to learn in non-stationary distributions. In most settings of the current
approaches, the agent starts from randomly initialized parameters and is
optimized to master the current task regardless of the usefulness of the
learned representation for future tasks. Moreover, each of the future tasks
uses all the previously learned knowledge although parts of this knowledge
might not be helpful for its learning. These cause interference among tasks,
especially when the data of previous tasks is not accessible. In this paper, we
propose a new method, named Self-Attention Meta-Learner (SAM), which learns
prior knowledge for continual learning that permits learning a sequence of
tasks, while avoiding catastrophic forgetting. SAM incorporates an attention
mechanism that learns to select the particular relevant representation for each
future task. Each task builds a specific representation branch on top of the
selected knowledge, avoiding the interference between tasks. We evaluate the
proposed method on the Split CIFAR-10/100 and Split MNIST benchmarks in
task-agnostic inference. We empirically show that we can achieve better
performance than several state-of-the-art methods for continual learning by
building on top of the selected representation learned by SAM. We also show the
role of the meta-attention mechanism in boosting informative features
corresponding to the input data and identifying the correct target in
task-agnostic inference. Finally, we demonstrate that popular existing continual
learning methods gain a performance boost when they adopt SAM as a starting
point.