
    Shakeout: A New Approach to Regularized Deep Neural Network Training

    Recent years have witnessed the success of deep neural networks in dealing with a variety of practical problems. Dropout has played an essential role in many successful deep neural networks by inducing regularization during model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has a distinct statistical trait: the regularizer induced by Shakeout adaptively combines $L_0$, $L_1$ and $L_2$ regularization terms. Our classification experiments with representative deep architectures on the image datasets MNIST, CIFAR-10 and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings. Shakeout also leads to a grouping effect among the input units of a layer. Since the weights reflect the importance of connections, these properties make Shakeout superior to Dropout for deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of the training process of deep architectures.
    Comment: Appears at T-PAMI 201
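
    A minimal PyTorch sketch of the mechanism described above: instead of zeroing units as Dropout does, each unit's contribution is randomly enhanced or reversed during training. The probability tau, the factor c, and the exact scaling rule below are illustrative assumptions; the paper derives the precise operator and the induced $L_0$/$L_1$/$L_2$ regularizer.

        import torch

        class ShakeoutLike(torch.nn.Module):
            """Toy Shakeout-style layer (illustrative, not the paper's exact operator).

            With probability tau a unit's contribution is reversed (scaled by -c);
            otherwise it is enhanced (scaled by 1 + c). At test time the layer is
            the identity, mirroring Dropout's train/test behaviour.
            """

            def __init__(self, tau=0.5, c=0.1):
                super().__init__()
                self.tau = tau  # probability of reversing a unit's contribution
                self.c = c      # strength of the enhancement / reversal

            def forward(self, x):
                if not self.training:
                    return x
                reverse = (torch.rand_like(x) < self.tau).float()
                scale = (1.0 - reverse) * (1.0 + self.c) - reverse * self.c
                return x * scale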

    Regularization in deep neural networks

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    Recent years have witnessed the great success of deep learning. As deep architectures become larger and deeper, they easily overfit to relatively small amounts of data. Regularization has proved to be an effective way to reduce overfitting in traditional statistical learning. In the context of deep learning, special design is required to regularize the training process. We first propose a new regularization technique named "Shakeout" to improve the generalization ability of deep neural networks beyond Dropout, by introducing a combination of L₀, L₁, and L₂ regularization effects into network training. We then consider the unsupervised domain adaptation setting, where the source domain data are labelled and the target domain data are unlabelled. We propose "deep adversarial attention alignment" to regularize the behaviour of the convolutional layers. Such regularization reduces the domain shift that exists from the start in the convolutional layers, which has been ignored by previous works, and leads to superior adaptation results.

    Prediction of Supernova Rates in Known Galaxy-galaxy Strong-lens Systems

    We propose a new strategy for finding strongly-lensed supernovae (SNe) by monitoring known galaxy-scale strong-lens systems. Strongly lensed SNe are potentially powerful tools for the study of cosmology, galaxy evolution, and stellar populations, but they are extremely rare. By targeting known strongly lensed star-forming galaxies, our strategy significantly boosts the detection efficiency for lensed SNe compared to a blind search. As a reference sample, we compile 128 galaxy-galaxy strong-lens systems from the Sloan Lens ACS Survey (SLACS), the SLACS for the Masses Survey, and the Baryon Oscillation Spectroscopic Survey Emission-Line Lens Survey. Within this sample, we estimate the rates of strongly-lensed Type Ia SNe (SNIa) and core-collapse SNe (CCSNe) to be $1.23 \pm 0.12$ and $10.4 \pm 1.1$ events per year, respectively. The lensed SN images are expected to be widely separated, with a median separation of 2 arcsec. Assuming a conservative fiducial lensing magnification factor of 5 for the most highly magnified SN image, we forecast that a monitoring program with a single-visit depth of 24.7 mag ($5\sigma$ point source, $r$ band) and a cadence of 5 days can detect 0.49 strongly-lensed SNIa events and 2.1 strongly-lensed CCSN events per year within this sample. Our proposed targeted-search strategy is particularly useful for prompt and efficient identification and follow-up observation of strongly-lensed SN candidates. It also allows telescopes with small fields of view and limited time to efficiently discover strongly-lensed SNe with a pencil-beam scanning strategy.
    Comment: 14 pages, 5 figures, ApJ in press
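
    The forecast above hinges on how a fixed magnification translates into detectability against the single-visit depth. A minimal sketch of that arithmetic follows; the example peak magnitude is an assumed value, not a number from the paper.

        import math

        def magnified_magnitude(m_unlensed, mu):
            """Apparent magnitude of a lensed image with magnification mu.

            Lensing boosts the flux by a factor mu, so the image brightens by
            2.5 * log10(mu) magnitudes.
            """
            return m_unlensed - 2.5 * math.log10(mu)

        def detectable(m_unlensed, mu, single_visit_depth=24.7):
            """True if the magnified image is brighter than the survey depth."""
            return magnified_magnitude(m_unlensed, mu) <= single_visit_depth

        # Example: an SN peaking at r ~ 26.2 mag without lensing, magnified by
        # the fiducial factor of 5, brightens to ~24.5 mag and falls within a
        # 24.7 mag single-visit depth.
        print(magnified_magnitude(26.2, 5.0))  # ~24.45
        print(detectable(26.2, 5.0))           # True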

    The Correspondence between Convergence Peaks from Weak Lensing and Massive Dark Matter Haloes

    Convergence peaks, constructed from galaxy shape measurements in weak lensing, are a powerful probe of cosmology, as the peaks can be connected with the underlying dark matter haloes. However, the capability of convergence peak statistics is affected by the noise in galaxy shape measurement, by the signal-to-noise ratio, and by the contribution of the projected mass distribution from large-scale structures along the line of sight (LOS). In this paper we use a ray-tracing simulation on a curved sky to investigate the correspondence between convergence peaks and the dark matter haloes along the LOS. We find that, in the absence of noise and for source galaxies at $z_{\rm s}=1$, more than $65\%$ of peaks with $\text{SNR} \geq 3$ (signal-to-noise ratio) are related to more than one massive halo with mass larger than $10^{13}\,{\rm M}_{\odot}$. Those massive haloes contribute $87.2\%$ to high peaks ($\text{SNR} \geq 5$), with the remaining contributions coming from large-scale structures. On the other hand, the peak distribution is skewed by the noise in galaxy shape measurement, especially for lower-SNR peaks. In the noisy field, where the shape noise is modelled as a Gaussian distribution, about $60\%$ of high peaks ($\text{SNR} \geq 5$) are true peaks, and the fraction decreases to $20\%$ for lower peaks ($3 \leq \text{SNR} < 5$). Furthermore, we find that high peaks ($\text{SNR} \geq 5$) are dominated by very massive haloes larger than $10^{14}\,{\rm M}_{\odot}$.
    Comment: 13 pages, 11 figures, 4 tables, accepted for publication in MNRAS. Our mock galaxy catalog is available upon request by email to the author.
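
    A minimal numpy sketch of the peak-counting step described above: locate local maxima of a toy flat-sky convergence map, add Gaussian shape noise, and bin the peaks by SNR. The map size, noise level, and halo peak height are illustrative assumptions; the paper works with full curved-sky ray-tracing maps.

        import numpy as np

        def peak_snr(kappa, sigma_noise):
            """Return the SNR of local maxima of a convergence map.

            A pixel is a peak if it is the unique maximum of its 3x3 neighbourhood;
            SNR is the peak height divided by the Gaussian shape-noise level.
            """
            ny, nx = kappa.shape
            snr = []
            for i in range(1, ny - 1):
                for j in range(1, nx - 1):
                    patch = kappa[i - 1:i + 2, j - 1:j + 2]
                    if kappa[i, j] == patch.max() and (patch == patch.max()).sum() == 1:
                        snr.append(kappa[i, j] / sigma_noise)
            return np.array(snr)

        rng = np.random.default_rng(0)
        sigma_noise = 0.02                      # assumed smoothed shape-noise rms
        kappa = np.zeros((128, 128))
        kappa[64, 64] = 0.12                    # toy massive-halo peak (SNR ~ 6)
        kappa += rng.normal(0.0, sigma_noise, kappa.shape)

        snr = peak_snr(kappa, sigma_noise)
        print("high peaks (SNR >= 5):", int((snr >= 5).sum()))
        print("lower peaks (3 <= SNR < 5):", int(((snr >= 3) & (snr < 5)).sum()))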

    SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model

    The goal of continual learning is to improve the performance of recognition models in learning sequentially arriving data. Although most existing works are established on the premise of learning from scratch, growing efforts have been devoted to incorporating the benefits of pre-training. However, how to adaptively exploit the pre-trained knowledge for each incremental task while maintaining its generalizability remains an open question. In this work, we present an extensive analysis of continual learning on a pre-trained model (CLPM), and attribute the key challenge to a progressive overfitting problem. Observing that selectively reducing the learning rate can almost resolve this issue in the representation layer, we propose a simple but extremely effective approach named Slow Learner with Classifier Alignment (SLCA), which further improves the classification layer by modeling the class-wise distributions and aligning the classification layers in a post-hoc fashion. Across a variety of scenarios, our proposal provides substantial improvements for CLPM (e.g., up to 49.76%, 50.05%, 44.69% and 40.16% on Split CIFAR-100, Split ImageNet-R, Split CUB-200 and Split Cars-196, respectively), and thus outperforms state-of-the-art approaches by a large margin. Based on such a strong baseline, critical factors and promising directions are analyzed in depth to facilitate subsequent research.
    Comment: 11 pages, 8 figures, accepted by ICCV 202
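
    A minimal PyTorch sketch of the "slow learner" half of the idea: the pre-trained representation layers receive a much smaller learning rate than the freshly initialised classification head, which is what limits the progressive overfitting noted above. The backbone choice and the learning-rate values are illustrative assumptions, not the configuration used in the paper, and the post-hoc classifier alignment step is omitted.

        import torch
        import torchvision

        # Pre-trained backbone with a new head for a 100-class incremental task.
        model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
        model.fc = torch.nn.Linear(model.fc.in_features, 100)

        backbone_params = [p for name, p in model.named_parameters()
                           if not name.startswith("fc.")]
        head_params = list(model.fc.parameters())

        optimizer = torch.optim.SGD(
            [
                {"params": backbone_params, "lr": 1e-4},  # slow representation layers
                {"params": head_params, "lr": 1e-2},      # faster classifier head
            ],
            momentum=0.9,
        )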