Diversified Texture Synthesis with Feed-forward Networks
Recent progress in deep discriminative and generative modeling has shown
promising results on texture synthesis. However, existing feed-forward
methods trade off generality for efficiency and suffer from many issues,
such as shortage of generality (i.e., build one network per texture), lack of
diversity (i.e., always produce visually identical output) and suboptimality
(i.e., generate less satisfying visual effects). In this work, we focus on
solving these issues for improved texture synthesis. We propose a deep
generative feed-forward network which enables efficient synthesis of multiple
textures within one single network and meaningful interpolation between them.
Meanwhile, a suite of important techniques is introduced to achieve better
convergence and diversity. With extensive experiments, we demonstrate the
effectiveness of the proposed model and techniques for synthesizing a large
number of textures and show its application to stylization.
Comment: accepted by CVPR201
Incremental Learning Using a Grow-and-Prune Paradigm with Efficient Neural Networks
Deep neural networks (DNNs) have become a widely deployed model for numerous
machine learning applications. However, their fixed architecture, substantial
training cost, and significant model redundancy make it difficult to
efficiently update them to accommodate previously unseen data. To solve these
problems, we propose an incremental learning framework based on a
grow-and-prune neural network synthesis paradigm. When new data arrive, the
neural network first grows new connections based on the gradients to increase
the network capacity to accommodate new data. Then, the framework iteratively
prunes away connections based on the magnitude of weights to enhance network
compactness, and hence recover efficiency. Finally, the model rests at a
lightweight DNN that is both ready for inference and suitable for future
grow-and-prune updates. The proposed framework improves accuracy, shrinks
network size, and significantly reduces the additional training cost for
incoming data compared to conventional approaches, such as training from
scratch and network fine-tuning. For the LeNet-300-100 and LeNet-5 neural
network architectures derived for the MNIST dataset, the framework reduces
training cost by up to 64% (63%) and 67% (63%) compared to training from
scratch (network fine-tuning), respectively. For the ResNet-18 architecture
derived for the ImageNet dataset and DeepSpeech2 for the AN4 dataset, the
corresponding training cost reductions against training from scratch (network
fine-tuning) are 64% (60%) and 67% (62%), respectively. Our derived models
contain fewer network parameters but achieve higher accuracy relative to
conventional baselines.
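The grow-then-prune cycle described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the flat weight list, the fixed `n_grow` and `n_keep` budgets, and the simulated training update are all assumptions made for illustration. Growth activates the inactive connections with the largest gradient magnitude, and pruning keeps only the active connections with the largest weight magnitude.

```python
def grow_by_gradient(grads, mask, n_grow):
    """Activate the n_grow inactive connections with the largest |gradient|."""
    inactive = [i for i, m in enumerate(mask) if not m]
    ranked = sorted(inactive, key=lambda i: abs(grads[i]), reverse=True)
    grow = set(ranked[:n_grow])
    return [1 if (m or i in grow) else 0 for i, m in enumerate(mask)]


def prune_by_magnitude(weights, mask, n_keep):
    """Keep the n_keep active connections with the largest |weight|."""
    active = [i for i, m in enumerate(mask) if m]
    ranked = sorted(active, key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(mask))]


# Hypothetical toy layer with six candidate connections.
weights = [0.9, -0.05, 0.0, 0.4, 0.0, -0.7]
grads = [0.1, 0.2, 0.8, 0.05, -0.3, 0.0]
mask = [1, 1, 0, 1, 0, 1]  # 1 = connection currently exists

# Growth phase: connection 2 has the largest gradient among inactive ones.
grown = grow_by_gradient(grads, mask, n_grow=1)

# Stand-in for training on the new data (assumed value, not a real update).
weights[2] = 0.5

# Pruning phase: keep the three strongest connections.
pruned = prune_by_magnitude(weights, grown, n_keep=3)
print(grown, pruned)
```

In a real network the growth and pruning steps would be interleaved with gradient-descent training and applied per layer; the sketch only shows the selection rules.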
Adding New Tasks to a Single Network with Weight Transformations using Binary Masks
Visual recognition algorithms are required today to exhibit adaptive
abilities. Given a deep model trained on a specific, given task, it would be
highly desirable to be able to adapt incrementally to new tasks, preserving
scalability as the number of new tasks increases, while at the same time
avoiding catastrophic forgetting issues. Recent work has shown that masking the
internal weights of a given original conv-net through learned binary variables
is a promising strategy. We build upon this intuition and take into account
more elaborate affine transformations of the convolutional weights that
include learned binary masks. We show that with our generalization it is
possible to achieve significantly higher levels of adaptation to new tasks,
enabling the approach to compete with fine-tuning strategies by requiring
slightly more than 1 bit per network parameter per additional task. Experiments
on two popular benchmarks showcase the power of our approach, which achieves the
new state of the art on the Visual Decathlon Challenge.
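A rough sketch of what an affine weight transformation with a binary mask could look like follows. The abstract does not give the exact parametrization, so the form used here (scale the original weights, add a constant offset, and add a scaled copy of the masked weights) is an assumption, as are the names `binarize`, `transform_weights`, `k0`, `k1`, and `k2`. The pretrained weights stay fixed; only the real-valued mask scores and the three scalars would be learned per task, and only the binarized mask needs to be stored.

```python
def binarize(real_mask, threshold=0.0):
    """Turn real-valued mask scores into a 0/1 mask by hard thresholding."""
    return [1.0 if r > threshold else 0.0 for r in real_mask]


def transform_weights(w, real_mask, k0, k1, k2):
    """Assumed affine form: k0 * w + k1 + k2 * (mask * w), element-wise."""
    m = binarize(real_mask)
    return [k0 * wi + k1 + k2 * mi * wi for wi, mi in zip(w, m)]


# Hypothetical pretrained weights and learned per-task mask scores.
base_weights = [2.0, -1.0, 4.0]
mask_scores = [0.3, -0.2, 0.1]

# Pure masking is the special case k0 = 0, k1 = 0, k2 = 1.
masked_only = transform_weights(base_weights, mask_scores, 0.0, 0.0, 1.0)

# A more general setting: keep the base weights and cancel the masked ones.
general = transform_weights(base_weights, mask_scores, 1.0, 0.0, -1.0)
print(masked_only, general)
```

Because the per-task storage is one bit per weight plus a handful of scalars, a scheme of this kind is consistent with the "slightly more than 1 bit per network parameter" cost the abstract mentions.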
Incremental multi-domain learning with network latent tensor factorization
The prominence of deep learning, large amounts of annotated data, and
increasingly powerful hardware have made it possible to reach remarkable performance
on supervised classification tasks, in many cases saturating the training
sets. However, the resulting models are specialized to a single very specific
task and domain. Adapting the learned classification to new domains is a hard
problem for at least three reasons: (1) the new domains and the tasks might
be drastically different; (2) there might be a very limited amount of annotated
data on the new domain; and (3) full training of a new model for each new task
is prohibitive in terms of computation and memory, due to the sheer number of
parameters of deep CNNs. In this paper, we present a method to learn
new domains and tasks incrementally, building on prior knowledge from already
learned tasks and without catastrophic forgetting. We do so by jointly
parametrizing weights across layers using a low-rank Tucker structure. The core
is task agnostic while a set of task-specific factors is learnt on each new
domain. We show that leveraging tensor structure enables better performance
than simply using matrix operations. Joint tensor modelling also naturally
leverages correlations across different layers. Compared with previous methods
which have focused on adapting each layer separately, our approach results in
more compact representations for each new task/domain. We apply the proposed
method to the 10 datasets of the Visual Decathlon Challenge and show that our
method offers on average about a 7.5x reduction in the number of parameters and
competitive performance in terms of both classification accuracy and Decathlon
score.
Comment: AAAI2
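The idea of a shared core with task-specific factors can be sketched with a tiny Tucker reconstruction. This is an illustrative sketch only: it uses a 3-way tensor, nested Python lists, and hypothetical names (`tucker_reconstruct`, `shared_core`, `task_a`, `task_b`), whereas the paper parametrizes weights jointly across layers in a form the abstract does not specify. Each entry of the full tensor is the sum over core entries multiplied by the matching entries of the three factor matrices.

```python
from itertools import product


def tucker_reconstruct(core, factors):
    """Rebuild a full 3-way tensor from a core and three factor matrices."""
    r1, r2, r3 = len(core), len(core[0]), len(core[0][0])
    u1, u2, u3 = factors
    full = [[[0.0] * len(u3) for _ in u2] for _ in u1]
    for i, j, k in product(range(len(u1)), range(len(u2)), range(len(u3))):
        full[i][j][k] = sum(
            core[a][b][c] * u1[i][a] * u2[j][b] * u3[k][c]
            for a, b, c in product(range(r1), range(r2), range(r3))
        )
    return full


# Task-agnostic core (rank 1 in every mode, for brevity).
shared_core = [[[2.0]]]

# Hypothetical task-specific factors: only these change per task or domain.
task_a = ([[1.0], [2.0]], [[1.0], [3.0]], [[1.0]])
task_b = ([[0.5], [1.0]], [[1.0], [3.0]], [[1.0]])

weights_a = tucker_reconstruct(shared_core, task_a)
weights_b = tucker_reconstruct(shared_core, task_b)
print(weights_a[1][1][0], weights_b[1][1][0])
```

Only the small factor matrices are stored per task, which is where the parameter savings in a scheme of this kind would come from.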
DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model
The goal of this paper is to advance the state-of-the-art of articulated pose
estimation in scenes with multiple people. To that end we contribute on three
fronts. We propose (1) improved body part detectors that generate effective
bottom-up proposals for body parts; (2) novel image-conditioned pairwise terms
that allow the proposals to be assembled into a variable number of consistent body
part configurations; and (3) an incremental optimization strategy that explores
the search space more efficiently, thus leading both to better performance and
significant speed-up factors. Evaluation is done on two single-person and two
multi-person pose estimation benchmarks. The proposed approach significantly
outperforms the best known multi-person pose estimation results while demonstrating
competitive performance on the task of single-person pose estimation. Models
and code are available at http://pose.mpi-inf.mpg.de
Comment: ECCV'16. High-res version at
https://www.d2.mpi-inf.mpg.de/sites/default/files/insafutdinov16arxiv.pd