Shakeout: A New Approach to Regularized Deep Neural Network Training
Recent years have witnessed the success of deep neural networks in a wide
range of practical problems. Dropout has played an essential role in
many successful deep neural networks, by inducing regularization in the model
training. In this paper, we present a new regularized training approach:
Shakeout. Instead of randomly discarding units as Dropout does at the training
stage, Shakeout randomly chooses to enhance or reverse each unit's contribution
to the next layer. This minor modification of Dropout has the statistical
trait: the regularizer induced by Shakeout adaptively combines L₀, L₁ and L₂
regularization terms. Our classification experiments with representative
deep architectures on image datasets MNIST, CIFAR-10 and ImageNet show that
Shakeout deals with over-fitting effectively and outperforms Dropout. We
empirically demonstrate that Shakeout leads to sparser weights under both
unsupervised and supervised settings. Shakeout also leads to the grouping
effect of the input units in a layer. Considering that the weights reflect the
importance of connections, Shakeout is superior to Dropout, which is valuable
for deep model compression. Moreover, we demonstrate that Shakeout can
effectively reduce the instability of the training process of the deep
architecture. Comment: Appears in T-PAMI 2018
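As a rough illustration of the operation described above, here is a minimal
NumPy sketch of one Shakeout draw. It assumes the parameterization in which a
unit's contribution is reversed to -c·sign(w) with probability tau and
otherwise enhanced to (w + c·tau·sign(w)) / (1 - tau); the names tau and c are
illustrative, and the paper should be consulted for the exact form. Setting
c = 0 recovers inverted Dropout.

```python
import numpy as np

def shakeout_weights(W, tau=0.5, c=0.1, rng=np.random.default_rng()):
    """One stochastic draw of Shakeout-perturbed weights (training time).

    Sketch only: with probability tau an input unit's contribution is
    reversed (-c * sign(w)); otherwise it is enhanced
    ((w + c * tau * sign(w)) / (1 - tau)). c = 0 gives inverted dropout.
    """
    s = np.sign(W)
    # One Bernoulli draw per input unit (column), shared by its weights,
    # mirroring how dropout masks whole units rather than single weights.
    keep = rng.random(W.shape[1]) > tau
    enhanced = (W + c * tau * s) / (1.0 - tau)
    reversed_ = -c * s
    return np.where(keep[None, :], enhanced, reversed_)

# The perturbation is unbiased: averaging many draws recovers W.
W = np.array([[0.5, -1.0], [2.0, 0.3]])
print(np.mean([shakeout_weights(W) for _ in range(20000)], axis=0).round(2))
```

The averaging check shows the sense in which the random perturbation leaves
the expected contribution unchanged while still penalizing the weights.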
Regularization in deep neural networks
University of Technology Sydney, Faculty of Engineering and Information Technology. Recent years have witnessed the great success of deep learning. As deep architectures become larger and deeper, they easily overfit to relatively small amounts of data. Regularization has proved to be an effective way to reduce overfitting in traditional statistical learning. In the context of deep learning, special designs are required to regularize the training process. We first proposed a new regularization technique named "Shakeout" to improve the generalization ability of deep neural networks beyond Dropout, by introducing a combination of L₀, L₁, and L₂ regularization effects into network training. We then considered the unsupervised domain adaptation setting, where the source-domain data is labelled and the target-domain data is unlabelled. We proposed "deep adversarial attention alignment" to regularize the behavior of the convolutional layers. Such regularization reduces the domain shift that exists in the convolutional layers from the start, which previous works have ignored, and leads to superior adaptation results.
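As a minimal sketch of the combined-penalty idea (my illustration, not thesis
code; the L₀ component induced by Shakeout is implicit in its stochastic
training and has no closed-form weight penalty, so only the L₁/L₂ part is
written out, with illustrative coefficients l1 and l2):

```python
import torch
from torch import nn

def penalized_loss(model, data_loss, l1=1e-5, l2=1e-4):
    """Data loss plus an explicit combined L1 + L2 weight penalty."""
    reg = sum(l1 * p.abs().sum() + l2 * p.pow(2).sum()
              for p in model.parameters())
    return data_loss + reg

# Toy usage: a linear model on random data.
model = nn.Linear(10, 2)
x, y = torch.randn(4, 10), torch.randint(0, 2, (4,))
penalized_loss(model, nn.functional.cross_entropy(model(x), y)).backward()
```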
Prediction of Supernova Rates in Known Galaxy-galaxy Strong-lens Systems
We propose a new strategy of finding strongly-lensed supernovae (SNe) by
monitoring known galaxy-scale strong-lens systems. Strongly lensed SNe are
potentially powerful tools for the study of cosmology, galaxy evolution, and
stellar populations, but they are extremely rare. By targeting known strongly
lensed star-forming galaxies, our strategy significantly boosts the detection
efficiency for lensed SNe compared to a blind search. As a reference sample, we
compile 128 galaxy-galaxy strong-lens systems from the Sloan Lens ACS
Survey (SLACS), the SLACS for the Masses Survey, and the Baryon Oscillation
Spectroscopic Survey Emission-Line Lens Survey. Within this sample, we estimate
the rates of strongly-lensed Type Ia SN (SNIa) and core-collapse SN (CCSN) to
be and events per year, respectively. The lensed
SN images are expected to be widely separated with a median separation of 2
arcsec. Assuming a conservative fiducial lensing magnification factor of 5 for
the most highly magnified SN image, we forecast that a monitoring program with
a single-visit depth of 24.7 mag (5σ point source, r band) and a
cadence of 5 days can detect 0.49 strongly-lensed SNIa event and 2.1
strongly-lensed CCSN events per year within this sample. Our proposed
targeted-search strategy is particularly useful for prompt and efficient
identifications and follow-up observations of strongly-lensed SN candidates. It
also allows telescopes with small fields of view and limited time to
efficiently discover strongly-lensed SNe with a pencil-beam scanning strategy. Comment: 14 pages, 5 figures, ApJ in press
The Correspondence between Convergence Peaks from Weak Lensing and Massive Dark Matter Haloes
Convergence peaks, constructed from galaxy shape measurements in weak
lensing, are a powerful probe of cosmology, as the peaks can be connected with
the underlying dark matter haloes. However, the power of the convergence peak
statistic is limited by the noise in galaxy shape measurements, the
signal-to-noise ratio of the peaks, and the contribution of the projected mass
from large-scale structures along the line of sight (LOS). In this paper we use
a ray-tracing simulation on a curved sky to investigate the correspondence
between convergence peaks and the dark matter haloes along the LOS. We find
that, in the case of no noise and for source galaxies at a fixed redshift, a
large fraction of the peaks above the adopted signal-to-noise (SNR) threshold
are related to more than one massive halo above the adopted mass limit. Those
massive haloes dominate the contribution to high peaks, with the remaining
contribution coming from large-scale structures. On the
other hand, the peak distribution is skewed by the noise in galaxy shape
measurements, especially for lower-SNR peaks. In the noisy field, where the
shape noise is modelled as a Gaussian distribution, a large fraction of the
high peaks are true peaks, and that fraction decreases for lower peaks.
Furthermore, we find that the highest peaks are dominated by very massive
haloes. Comment: 13 pages, 11 figures, 4 tables, accepted for publication in MNRAS.
Our mock galaxy catalog is available upon request by email to the author.
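As a toy illustration of the peak statistic (not the paper's curved-sky
pipeline), the sketch below smooths a synthetic convergence map, adds Gaussian
shape noise as modelled in the abstract, and counts local maxima above a
signal-to-noise threshold nu; the grid size, smoothing scales, injected halo,
and noise level are all illustrative choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

rng = np.random.default_rng(42)

# Synthetic convergence field: smoothed Gaussian "large-scale structure"
# plus one injected massive-halo peak at the map centre.
kappa = gaussian_filter(rng.normal(0.0, 0.02, (512, 512)), sigma=8)
kappa[256, 256] += 0.15

# Gaussian shape noise, then the smoothing applied before peak finding.
sigma_noise = 0.01
noisy = kappa + rng.normal(0.0, sigma_noise, kappa.shape)
smoothed = gaussian_filter(noisy, sigma=2)

def find_peaks(snr_map, nu=3.0):
    """Return (row, col) positions of local maxima with S/N >= nu."""
    local_max = maximum_filter(snr_map, size=5) == snr_map
    return np.argwhere(local_max & (snr_map >= nu))

snr = smoothed / smoothed.std()  # crude normalisation by the field's rms
peaks = find_peaks(snr, nu=3.0)
print(f"{len(peaks)} peaks with nu >= 3 in the mock map")
```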
SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model
The goal of continual learning is to improve the performance of recognition
models in learning from sequentially arriving data. Although most existing works are
established on the premise of learning from scratch, growing efforts have been
devoted to incorporating the benefits of pre-training. However, how to
adaptively exploit the pre-trained knowledge for each incremental task while
maintaining its generalizability remains an open question. In this work, we
present an extensive analysis of continual learning on a pre-trained model
(CLPM), and attribute the key challenge to a progressive overfitting problem.
Observing that selectively reducing the learning rate can almost resolve this
issue in the representation layer, we propose a simple but extremely effective
approach named Slow Learner with Classifier Alignment (SLCA), which further
improves the classification layer by modeling the class-wise distributions and
aligning the classification layers in a post-hoc fashion. Across a variety of
scenarios, our proposal provides substantial improvements for CLPM (e.g., up to
49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split
CUB-200 and Split Cars-196, respectively), and thus outperforms
state-of-the-art approaches by a large margin. Based on such a strong baseline,
critical factors and promising directions are analyzed in-depth to facilitate
subsequent research. Comment: 11 pages, 8 figures, accepted by ICCV 2023
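To make the two named ingredients concrete, below is a hedged PyTorch sketch
under my reading of the abstract (not the authors' released code): a much
smaller learning rate for the pre-trained representation than for the
classifier (the "slow learner"), and a post-hoc alignment step that models
each class's features as a Gaussian and refits the classification layer on
features sampled from those distributions. The module shapes, learning rates,
and sample counts are placeholders.

```python
import torch
from torch import nn

backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU())  # stand-in for a pre-trained model
classifier = nn.Linear(128, 10)

# (1) Slow learner: a much smaller learning rate on the representation
# than on the classification layer.
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": classifier.parameters(), "lr": 1e-2},
], momentum=0.9)

@torch.no_grad()
def class_stats(features, labels, num_classes):
    """Per-class feature mean and diagonal variance (Gaussian model)."""
    return [(features[labels == c].mean(0), features[labels == c].var(0) + 1e-6)
            for c in range(num_classes)]

def align_classifier(stats, steps=100, n_per_class=64):
    """(2) Classifier alignment: refit on features sampled class-wise."""
    opt = torch.optim.SGD(classifier.parameters(), lr=1e-2)
    for _ in range(steps):
        xs = [mu + var.sqrt() * torch.randn(n_per_class, mu.numel())
              for mu, var in stats]
        ys = torch.arange(len(stats)).repeat_interleave(n_per_class)
        loss = nn.functional.cross_entropy(classifier(torch.cat(xs)), ys)
        opt.zero_grad(); loss.backward(); opt.step()

# Usage after each task: collect features, then re-align the classifier.
feats = backbone(torch.randn(640, 128)).detach()
labels = torch.randint(0, 10, (640,))
align_classifier(class_stats(feats, labels, 10))
```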