182 research outputs found

    Trade liberalization and the gender employment gap in China

    This paper investigates the impact of import-liberalization-induced labor demand shocks on male and female employment in China. Combining data from population and firm census waves over the period 1990–2005, we relate prefecture-level employment by gender to the exposure to tariff reductions on locally imported products. Our empirical results show that increasing import competition has kept more women in the workforce, reducing an otherwise growing gender employment gap. These dynamics were present both in the local economies as a whole and among formal private industrial firms. Examining the channels through which tariff reductions differentially affected men and women, we find that trade-induced competitive pressures contributed to a general expansion of female-intensive industries, shifts in sectoral gender segregation, reductions in gender discrimination in the labor market, technological upgrading through computerization, and general income growth.
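    The exposure measure described above is a shift-share construction: each prefecture's exposure is the weighted average of industry-level tariff reductions, weighted by the prefecture's baseline employment shares. A minimal sketch of that idea, with purely illustrative shares and tariff cuts (none of the numbers or names come from the paper):

```python
import numpy as np

# Hypothetical shift-share exposure: weight industry tariff cuts by each
# prefecture's baseline employment shares. All values are illustrative.

def tariff_exposure(emp_shares, tariff_cuts):
    """emp_shares: (prefectures x industries), rows summing to 1;
    tariff_cuts: per-industry tariff reduction in percentage points."""
    emp_shares = np.asarray(emp_shares, dtype=float)
    tariff_cuts = np.asarray(tariff_cuts, dtype=float)
    return emp_shares @ tariff_cuts

shares = np.array([[0.6, 0.3, 0.1],   # prefecture A
                   [0.2, 0.5, 0.3]])  # prefecture B
cuts = np.array([10.0, 4.0, 1.0])     # tariff reductions by industry
exposure = tariff_exposure(shares, cuts)
print(exposure)  # → [7.3 4.3]
```

    A prefecture concentrated in heavily liberalized industries (A) records a larger exposure than one concentrated in lightly liberalized ones (B), which is what lets the regression relate local employment outcomes to trade shocks.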

    The Yellow Sea Surface Cold Patches in Warm Seasons

    An important hydrographic phenomenon in the Yellow Sea is the appearance of surface cold patches (SCPs) in warm seasons, the most conspicuous of which are the Shandong SCP, Subei SCP, and Mokpo SCP. Previous studies based on monthly mean fields propose that these patches result from the combined action of tidal mixing and tidally induced upwelling. While this holds for patches like the Shandong SCP, monthly mean tidal mixing and upwelling alone cannot fully explain their formation. In this study, through a detailed analysis of their patterns over a spring-neap tidal cycle, we find that the Subei and Mokpo SCPs show distinct spring-neap variations. During the neap tide phase, strong stratification is established, and hence the cold patches in these two areas may be greatly weakened or even suppressed, while during the spring tide phase the surface temperature reaches its minimum. That is, for these two SCPs, the effect of spring-neap tidal variation must be taken into account alongside the well-accepted mechanisms.
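    The spring-neap modulation of mixing can be illustrated with the classic Simpson-Hunter scaling, in which tidal mixing power varies as the cube of the current amplitude. The toy sketch below modulates a made-up current amplitude over the ~14.77-day spring-neap cycle; the amplitudes are illustrative, not observed Yellow Sea values:

```python
import numpy as np

# Toy spring-neap modulation of tidal mixing, assuming the Simpson-Hunter
# u^3 scaling for mixing power. Amplitudes below are made up.

SPRING_NEAP_DAYS = 14.77  # approximate spring-neap period

def tidal_current(t_days, u_mean=0.8, mod=0.4):
    """Current amplitude (m/s), peaking at spring tide (t = 0)."""
    return u_mean * (1 + mod * np.cos(2 * np.pi * t_days / SPRING_NEAP_DAYS))

def mixing_power(t_days):
    return tidal_current(t_days) ** 3  # u^3 scaling

t = np.linspace(0, SPRING_NEAP_DAYS, 200)
p = mixing_power(t)
contrast = p.max() / p.min()
print(contrast)  # spring-vs-neap mixing contrast, roughly (1.4/0.6)^3
```

    Even a modest 40% modulation of the current amplitude yields an order-of-magnitude swing in mixing power over the cycle, which is why stratification can re-establish at neap tide and suppress a cold patch.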

    Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors

    As data become increasingly vital for deep learning, a company may be very cautious about releasing them, because competitors could use the released data to train high-performance models, posing a serious threat to the company's commercial competitiveness. To prevent good models from being trained on the data, imperceptible perturbations can be added to it. Since such perturbations aim to hurt the entire training process, they should reflect the vulnerability of DNN training rather than that of a single model. Based on this idea, we seek adversarial examples that are never correctly classified during training. In this paper, we uncover them by modeling checkpoints' gradients, forming the proposed self-ensemble protection (SEP), which is very effective because (1) learning on examples ignored during normal training tends to yield DNNs that ignore normal examples; (2) checkpoints' cross-model gradients are close to orthogonal, meaning they are as diverse as DNNs with different architectures in a conventional ensemble. That is, the ensemble's performance comes at only the computational cost of training a single model. Through extensive experiments with 9 baselines on 3 datasets and 5 architectures, SEP is verified to be a new state of the art; e.g., our small ℓ∞ = 2/255 perturbations reduce the accuracy of a CIFAR-10 ResNet18 from 94.56% to 14.68%, compared to 41.35% for the best previously known method. Code is available at https://github.com/Sizhe-Chen/SEP.
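    The core idea of treating saved checkpoints as a free ensemble can be sketched in miniature: perturb an input to increase the loss averaged over several checkpoints, rather than the loss of one final model. The sketch below uses hypothetical linear classifiers with logistic loss as stand-ins for DNN checkpoints; it illustrates the averaging idea only, not the paper's actual implementation:

```python
import numpy as np

# Miniature self-ensemble sketch: sign-gradient ascent on the loss averaged
# over several "checkpoints" (here, hypothetical linear models w).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ensemble_grad(x, y, checkpoints):
    """Input-gradient of the mean logistic loss over all checkpoints."""
    grads = [(sigmoid(w @ x) - y) * w for w in checkpoints]
    return np.mean(grads, axis=0)

def sep_perturb(x, y, checkpoints, eps=0.1, steps=10):
    """FGSM-style ascent on the checkpoint-averaged loss, clipped to eps."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        g = ensemble_grad(x + delta, y, checkpoints)
        delta = np.clip(delta + (eps / steps) * np.sign(g), -eps, eps)
    return x + delta

rng = np.random.default_rng(0)
checkpoints = [rng.normal(size=4) for _ in range(5)]  # stand-ins for saved weights
x, y = rng.normal(size=4), 1.0
x_adv = sep_perturb(x, y, checkpoints)
print(np.max(np.abs(x_adv - x)))  # perturbation stays within the eps budget
```

    The key design point from the abstract is that the "ensemble" is obtained for free from checkpoints saved along one training run, since their gradients are nearly orthogonal.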

    The Lottery Ticket Hypothesis for Vision Transformers

    The conventional lottery ticket hypothesis (LTH) claims that within a dense neural network there exists a sparse subnetwork, together with a proper initialization, called the winning ticket, that can be trained from scratch to perform almost as well as its dense counterpart. However, the LTH has scarcely been explored for vision transformers (ViTs). In this paper, we first show that conventional winning tickets are hard to find at the weight level of ViTs using existing methods. We then generalize the LTH for ViTs to the input images, inspired by the input dependence of ViTs: there exists a subset of input image patches such that a ViT can be trained from scratch using only this subset and achieve accuracy similar to a ViT trained on all image patches. We call this subset of input patches the winning tickets, as it represents a significant amount of the information in the input. Furthermore, we present a simple yet effective method to find the winning tickets in input patches for various types of ViT, including DeiT, LV-ViT, and Swin Transformers. More specifically, we use a ticket selector to generate the winning tickets based on the informativeness of patches. For comparison, we also build randomly selected subsets of patches, and our experiments show a clear performance gap between models trained with winning tickets and those trained with randomly selected subsets.
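    The selection step can be sketched minimally: split an image into patches, score each patch's informativeness, and keep the top-k. The paper uses a learned ticket selector; the sketch below substitutes pixel variance as a crude hand-made proxy, so everything here is illustrative:

```python
import numpy as np

# Minimal "winning patches" sketch: score patches by variance (a stand-in
# for the paper's learned ticket selector) and keep the top-k.

def patchify(img, patch):
    """Split an (H, W) image into non-overlapping (patch x patch) blocks."""
    h, w = img.shape
    blocks = img.reshape(h // patch, patch, w // patch, patch)
    return blocks.transpose(0, 2, 1, 3).reshape(-1, patch, patch)

def select_winning_patches(img, patch=4, keep=0.5):
    patches = patchify(img, patch)
    scores = patches.reshape(len(patches), -1).var(axis=1)
    k = max(1, int(keep * len(patches)))
    idx = np.sort(np.argsort(scores)[::-1][:k])  # highest-scoring patches
    return idx, patches[idx]

img = np.zeros((8, 8))
img[:4, :4] = np.arange(16).reshape(4, 4)  # only one patch has structure
idx, kept = select_winning_patches(img, patch=4, keep=0.25)
print(idx)  # → [0]: the single high-variance patch survives
```

    Training then proceeds on the kept patches only; the comparison to a random subset of the same size is what isolates the value of informativeness-based selection.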

    All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

    When deploying deep neural networks (DNNs) on edge devices, many research efforts address the limited hardware resources, but little attention is paid to the influence of dynamic power management. Because edge devices typically run on a limited battery energy budget (rather than the nearly unlimited power available to servers and workstations), their dynamic power management often changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which can harm user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative pruning framework that works with DVFS-based dynamic power management. The framework uses only one set of model weights and soft masks (together with other auxiliary parameters of negligible storage) to represent multiple models of various pruning ratios. By re-configuring the model to the pruning ratio corresponding to a specific execution frequency (and voltage), we achieve stable inference speed, i.e., we keep the difference in speed across execution frequencies as small as possible. Our experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces the variance of their inference latency across frequencies, with the minimal memory consumption of only one model and one soft mask.
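    The "one set of weights, many pruning ratios" idea can be sketched as follows. A single shared weight tensor is masked at a ratio chosen to match the current DVFS frequency level; the magnitude-threshold masking below is a simple stand-in for the paper's learned soft masks, and the frequency-to-ratio mapping is hypothetical:

```python
import numpy as np

# Hypothetical sketch: one shared weight tensor, re-masked per DVFS level.
# Magnitude thresholding stands in for the paper's learned soft masks.

FREQ_TO_RATIO = {"high": 0.0, "mid": 0.5, "low": 0.8}  # illustrative mapping

def mask_for_ratio(weights, ratio):
    """Zero out the smallest-magnitude fraction `ratio` of the weights."""
    if ratio <= 0.0:
        return np.ones_like(weights)
    thresh = np.quantile(np.abs(weights), ratio)
    return (np.abs(weights) > thresh).astype(weights.dtype)

def configure_model(weights, freq_level):
    """Re-configure the shared weights for the current execution frequency."""
    return weights * mask_for_ratio(weights, FREQ_TO_RATIO[freq_level])

w = np.linspace(-1.0, 1.0, 20)            # one shared weight vector
sparse_w = configure_model(w, "low")
print((sparse_w == 0).mean())             # fraction pruned at the "low" level
```

    At a low frequency the device runs the most aggressively pruned configuration, so per-inference latency stays roughly constant across frequency levels while only one copy of the weights is stored.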

    Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

    Vision transformers (ViTs) have recently achieved success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their wider adoption. Previous compression algorithms usually start from pre-trained dense models and focus only on efficient inference, so the time-consuming training remains unavoidable. In contrast, this paper points out that the million-scale training data is redundant, which is the fundamental reason for the tedious training. To address this, we introduce sparsity into the data and propose an end-to-end efficient training framework, dubbed Tri-Level E-ViT, built on three sparse perspectives. Specifically, we leverage a hierarchical data redundancy reduction scheme that explores sparsity at three levels: the number of training examples in the dataset, the number of patches (tokens) in each example, and the number of connections between tokens in the attention weights. Through extensive experiments, we demonstrate that our technique noticeably accelerates training for various ViT architectures while maintaining accuracy. Remarkably, under certain ratios we even improve ViT accuracy rather than compromising it. For example, we achieve a 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This proves the existence of data redundancy in ViTs.
    Comment: AAAI 202
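    The third level of the hierarchy, sparsity in the attention connections, can be sketched compactly: keep only the k largest logits in each row of an attention score matrix and renormalize. The per-row top-k rule and the value of k below are illustrative, not the paper's exact mechanism:

```python
import numpy as np

# Toy attention-connection sparsity: mask all but the top-k logits per row
# before the softmax, so each token attends to only k others.

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sparse_attention(scores, k):
    """Keep the k largest logits in each row; mask the rest to -inf."""
    kth = np.sort(scores, axis=-1)[..., -k][..., None]  # per-row k-th largest
    masked = np.where(scores >= kth, scores, -np.inf)
    return softmax(masked)

rng = np.random.default_rng(1)
scores = rng.normal(size=(4, 4))   # one head, 4 tokens
attn = sparse_attention(scores, k=2)
print((attn > 0).sum(axis=-1))     # each row keeps exactly 2 connections
```

    Combined with dropping uninformative examples (level one) and uninformative tokens (level two), this is what makes the training loop cheaper end to end.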