1,554 research outputs found
Expert Gate: Lifelong Learning with a Network of Experts
In this paper we introduce a model of lifelong learning, based on a Network
of Experts. New tasks / experts are learned and added to the model
sequentially, building on what was learned before. To ensure scalability of
this process,data from previous tasks cannot be stored and hence is not
available when learning a new task. A critical issue in such context, not
addressed in the literature so far, relates to the decision which expert to
deploy at test time. We introduce a set of gating autoencoders that learn a
representation for the task at hand, and, at test time, automatically forward
the test sample to the relevant expert. This also brings memory efficiency as
only one expert network has to be loaded into memory at any given time.
Further, the autoencoders inherently capture the relatedness of one task to
another, based on which the most relevant prior model to be used for training a
new expert, with finetuning or learning without-forgetting, can be selected. We
evaluate our method on image classification and video prediction problems.Comment: CVPR 2017 pape
Deep Divergence-Based Approach to Clustering
A promising direction in deep learning research consists in learning
representations and simultaneously discovering cluster structure in unlabeled
data by optimizing a discriminative loss function. As opposed to supervised
deep learning, this line of research is in its infancy, and how to design and
optimize suitable loss functions to train deep neural networks for clustering
is still an open question. Our contribution to this emerging field is a new
deep clustering network that leverages the discriminative power of
information-theoretic divergence measures, which have been shown to be
effective in traditional clustering. We propose a novel loss function that
incorporates geometric regularization constraints, thus avoiding degenerate
structures of the resulting clustering partition. Experiments on synthetic
benchmarks and real datasets show that the proposed network achieves
competitive performance with respect to other state-of-the-art methods, scales
well to large datasets, and does not require pre-training steps
- …