Cooperative initialization based deep neural network training
Researchers have proposed various activation functions. These activation
functions help the deep network to learn non-linear behavior with a significant
effect on training dynamics and task performance. The performance of these
activations also depends on the initial state of the weight parameters, i.e.,
different initial states lead to different network performance.
In this paper, we have proposed a cooperative initialization for training the
deep network using ReLU activation function to improve the network performance.
Our approach uses multiple activation functions in the initial few epochs for
the update of all sets of weight parameters while training the network. These
activation functions cooperate to overcome their drawbacks in the update of
weight parameters, which in effect learn better "feature representation" and
boost the network performance later. Cooperative initialization based training
also helps in reducing overfitting and does not increase the number of
parameters or the inference (test) time of the final model while improving the
performance. Experiments show that our approach outperforms various baselines
and, at the same time, performs well over various tasks such as classification
and detection. The Top-1 classification accuracy of the model trained using our
approach improves by 2.8% for VGG-16 and 2.1% for ResNet-56 on CIFAR-100
dataset.
Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 202
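The warm-up idea described above can be sketched as follows. This is a minimal illustration, not the authors' exact method: the particular activation set, the plain averaging of their outputs, and the `warmup_epochs` value are all illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x):
    return np.where(x > 0, x, np.expm1(x))

# Pool of activations that "cooperate" during the first few epochs.
ACTIVATIONS = [relu, leaky_relu, elu]

def cooperative_activation(x, epoch, warmup_epochs=5):
    """During the initial epochs, combine several activations so that each
    compensates for the others' drawbacks (e.g. ReLU's dead units receive
    gradient via leaky_relu/elu); afterwards fall back to plain ReLU, so the
    final model's parameter count and inference cost are unchanged."""
    if epoch < warmup_epochs:
        return np.mean([f(x) for f in ACTIVATIONS], axis=0)
    return relu(x)
```

The key property is that after warm-up the network is an ordinary ReLU network, which matches the abstract's claim that inference time and parameter count are unaffected.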
On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models
This study investigates the effects of Markov chain Monte Carlo (MCMC)
sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is
restricted to the family of unnormalized probability densities for which the
negative log density (or energy function) is a ConvNet. We find that many of
the techniques used to stabilize training in previous studies are not
necessary. ML learning with a ConvNet potential requires only a few
hyper-parameters and no regularization. Using this minimal framework, we
identify a variety of ML learning outcomes that depend solely on the
implementation of MCMC sampling.
On one hand, we show that it is easy to train an energy-based model which can
sample realistic images with short-run Langevin. ML can be effective and stable
even when MCMC samples have much higher energy than true steady-state samples
throughout training. Based on this insight, we introduce an ML method with
purely noise-initialized MCMC, high-quality short-run synthesis, and the same
budget as ML with informative MCMC initialization such as CD or PCD. Unlike
previous models, our energy model can obtain realistic high-diversity samples
from a noise signal after training.
On the other hand, ConvNet potentials learned with non-convergent MCMC do not
have a valid steady-state and cannot be considered approximate unnormalized
densities of the training data because long-run MCMC samples differ greatly
from observed images. We show that it is much harder to train a ConvNet
potential to learn a steady-state over realistic images. To our knowledge,
long-run MCMC samples of all previous models lose the realism of short-run
samples. With correct tuning of Langevin noise, we train the first ConvNet
potentials for which long-run and steady-state MCMC samples are realistic
images.
Comment: Code available at: https://github.com/point0bar1/ebm-anatom
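Noise-initialized short-run Langevin sampling, as described above, can be sketched in a few lines. The step size, noise scale, and the decoupling of noise from step size are illustrative assumptions (a toy quadratic energy is used below; the paper's energy is a ConvNet):

```python
import numpy as np

def langevin_sample(grad_energy, x0, n_steps=100, step_size=1.0,
                    noise_scale=0.01, rng=None):
    """Short-run Langevin dynamics:
        x_{k+1} = x_k - (eps^2 / 2) * grad U(x_k) + sigma * N(0, I)
    where U is the energy (negative log density). In exact Langevin
    dynamics sigma is tied to eps; EBM training often decouples them."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = x0.copy()
    for _ in range(n_steps):
        x = (x
             - 0.5 * step_size ** 2 * grad_energy(x)
             + noise_scale * rng.standard_normal(x.shape))
    return x
```

For the toy energy U(x) = ||x||^2 / 2 (so grad U(x) = x), chains started far from the origin contract toward the mode, mimicking "realistic samples from a noise signal" in miniature.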
Learning Deep Similarity Metric for 3D MR-TRUS Registration
Purpose: The fusion of transrectal ultrasound (TRUS) and magnetic resonance
(MR) images for guiding targeted prostate biopsy has significantly improved the
biopsy yield of aggressive cancers. A key component of MR-TRUS fusion is image
registration. However, it is very challenging to obtain a robust automatic
MR-TRUS registration due to the large appearance difference between the two
imaging modalities. The work presented in this paper aims to tackle this
problem by addressing two challenges: (i) the definition of a suitable
similarity metric and (ii) the determination of a suitable optimization
strategy.
Methods: This work proposes the use of a deep convolutional neural network to
learn a similarity metric for MR-TRUS registration. We also use a composite
optimization strategy that explores the solution space in order to search for a
suitable initialization for the second-order optimization of the learned
metric. Further, a multi-pass approach is used in order to smooth the metric
for optimization.
Results: The learned similarity metric outperforms the classical mutual
information and also the state-of-the-art MIND feature based methods. The
results indicate that the overall registration framework has a large capture
range. The proposed deep similarity metric based approach obtained a mean TRE
of 3.86mm (with an initial TRE of 16mm) for this challenging problem.
Conclusion: A similarity metric that is learned using a deep neural network
can be used to assess the quality of any given image registration and can be
used in conjunction with the aforementioned optimization framework to perform
automatic registration that is robust to poor initialization.
Comment: To appear in IJCAR
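The composite optimization strategy above (global exploration for a good initialization, then local refinement of the learned metric) can be illustrated in one dimension. `register_1d`, its coordinate-descent refinement, and all parameters are hypothetical simplifications, not the paper's implementation:

```python
def register_1d(similarity, coarse_grid, fine_step=0.1, n_fine=50):
    """Composite optimization sketch: first explore the solution space with
    a coarse grid search (to land inside the metric's capture range), then
    refine the transformation parameter locally."""
    # Exploration phase: pick the best coarse candidate as initialization.
    t = max(coarse_grid, key=similarity)
    # Refinement phase: greedy local search around the initialization.
    for _ in range(n_fine):
        candidates = [t - fine_step, t, t + fine_step]
        t = max(candidates, key=similarity)
    return t
```

With a well-behaved similarity peak, the coarse stage only needs to land within the capture range; the fine stage then converges to the optimum.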
Unsupervised Domain Adaptation using Graph Transduction Games
Unsupervised domain adaptation (UDA) amounts to assigning class labels to the
unlabeled instances of a dataset from a target domain, using labeled instances
of a dataset from a related source domain. In this paper, we propose to cast
this problem in a game-theoretic setting as a non-cooperative game and
introduce a fully automatic iterative algorithm for UDA based on graph
transduction games (GTG). The main advantages of this approach are its
principled foundation, guaranteed convergence of the iterative algorithm to a
Nash equilibrium (which corresponds to a consistent labeling condition), and
soft labels quantifying the uncertainty of the label assignment process. We
also investigate the beneficial effect of using pseudo-labels from linear
classifiers to initialize the iterative process. The performance of the
resulting methods is assessed on publicly available object recognition
benchmark datasets involving both shallow and deep features. Results of
experiments demonstrate the suitability of the proposed game-theoretic approach
for solving UDA tasks.
Comment: Oral IJCNN 201
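Graph transduction games are commonly run with a replicator-dynamics update; a minimal sketch, assuming a pairwise-similarity matrix `W` and soft-label priors `P0` (the names and this exact multiplicative update are illustrative, not necessarily the paper's formulation):

```python
import numpy as np

def graph_transduction(W, P0, labeled_mask, n_iter=100):
    """Replicator-dynamics iteration for graph transduction games.
    W: (n, n) nonnegative similarity matrix, zero diagonal.
    P0: (n, c) soft-label priors (one-hot for labeled points).
    labeled_mask: boolean (n,) marking labeled points, kept fixed.
    Returns soft labels quantifying assignment uncertainty."""
    P = P0.copy()
    for _ in range(n_iter):
        Q = W @ P                                  # payoff of each label at each point
        P_new = P * Q                              # multiplicative (replicator) update
        P_new /= P_new.sum(axis=1, keepdims=True)  # renormalize to distributions
        P_new[labeled_mask] = P0[labeled_mask]     # labeled points stay clamped
        P = P_new
    return P
```

The fixed points of this dynamics are Nash equilibria of the underlying game, which is what gives the consistent-labeling guarantee mentioned in the abstract.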
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing usage of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest potentially impactful topics for future work.
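The count-based category can be sketched as follows; the rounding-based discretization below stands in for the SimHash used in the actual line of work, and the class name and parameters are hypothetical:

```python
import math
from collections import Counter

class HashingExplorationBonus:
    """Count-based exploration via hashing (sketch): map each state to a
    discrete code, count visits per code, and grant an intrinsic reward
    beta / sqrt(count), so rarely visited states earn larger bonuses."""

    def __init__(self, beta=1.0, precision=1):
        self.beta = beta
        self.precision = precision  # coarseness of the discretization
        self.counts = Counter()

    def bonus(self, state):
        # Stand-in for SimHash: round each coordinate so nearby states
        # collide into the same code.
        code = tuple(round(s, self.precision) for s in state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])
```

An agent would add this bonus to the environment reward at each step, pushing it toward under-visited regions of the state space.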