182 research outputs found
Trade liberalization and the gender employment gap in China
This paper investigates the impact of import-liberalization-induced labor demand shocks on male and female employment in China. Combining data from population and firm census waves over the period 1990 to 2005, we relate prefecture-level employment by gender to the exposure to tariff reductions on locally imported products. Our empirical results show that increasing import competition has kept more females in the workforce, reducing an otherwise growing gender employment gap. These dynamics were present both in the local economies as a whole and among formal private industrial firms. Examining the channels through which tariff reductions differentially affected males and females, we find that trade-induced competitive pressures contributed to a general expansion of female-intensive industries, shifts in sectoral gender segregation, reductions in gender discrimination in the labor market, technological upgrading through computerization, and general income growth.
The Yellow Sea Surface Cold Patches in Warm Seasons
An important hydrographic phenomenon in the Yellow Sea is the surface cold patches (SCPs) in warm seasons, among which the most conspicuous are the Shandong SCP, Subei SCP, and Mokpo SCP. Previous studies based on monthly mean fields propose that these patches result from the combined action of tidal mixing and tide-induced upwelling. While this holds for patches like the Shandong SCP, monthly mean tidal mixing and upwelling alone cannot fully explain their formation. In this study, through a detailed analysis of their patterns over a spring-neap tidal cycle, it is found that the Subei and Mokpo SCPs show distinct spring-neap variations. During the neap tide phase, strong stratification is established, and hence the cold patches in these two areas may be greatly weakened or even suppressed, while during the spring tide phase, the surface temperature reaches its minimum. That is to say, for these two SCPs, besides the well-accepted mechanisms, the effect of spring-neap tidal variation must be taken into account.
Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors
As data become increasingly vital for deep learning, a company would be very
cautious about releasing data, because the competitors could use the released
data to train high-performance models, thereby posing a tremendous threat to
the company's commercial competitiveness. To prevent good models from being trained on the
data, imperceptible perturbations could be added to it. Since such
perturbations aim at hurting the entire training process, they should reflect
the vulnerability of DNN training, rather than that of a single model. Based on
this new idea, we seek adversarial examples that are always unrecognized (never
correctly classified) in training. In this paper, we uncover them by modeling
checkpoints' gradients, forming the proposed self-ensemble protection (SEP),
which is very effective because (1) learning on examples ignored during normal
training tends to yield DNNs ignoring normal examples; (2) checkpoints'
cross-model gradients are close to orthogonal, meaning that they are as diverse
as DNNs with different architectures in a conventional ensemble. That is, our ensemble achieves this performance at only the computational cost of training one model. Extensive experiments with 9 baselines on 3 datasets and 5 architectures verify that SEP sets a new state of the art; e.g., our small
perturbations reduce the accuracy of a CIFAR-10 ResNet18
from 94.56% to 14.68%, compared to 41.35% by the best-known method. Code is
available at https://github.com/Sizhe-Chen/SEP
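The checkpoint-gradient ensembling at the heart of SEP can be sketched on a toy linear model. The logistic-loss setup, the `input_grad` helper, and the single-step sign update below are illustrative assumptions, not the paper's pipeline (which attacks checkpoints of a deep network iteratively):

```python
import numpy as np

def input_grad(w, x, y):
    # Gradient of the logistic loss w.r.t. the *input* x, for a linear model w.
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (p - y) * w

def sep_perturb(checkpoints, x, y, eps=0.1):
    """Craft a protective perturbation by averaging input gradients
    across training checkpoints (toy sketch of SEP's core idea)."""
    g = np.mean([input_grad(w, x, y) for w in checkpoints], axis=0)
    # Ascend the ensembled loss so the example resists being learned.
    return x + eps * np.sign(g)

rng = np.random.default_rng(0)
checkpoints = [rng.normal(size=4) for _ in range(3)]  # stand-ins for saved weights
x, y = rng.normal(size=4), 1.0
x_adv = sep_perturb(checkpoints, x, y)  # perturbation bounded by eps
```

Averaging over checkpoints, rather than attacking one final model, is what exploits the near-orthogonality of cross-checkpoint gradients the abstract describes.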
The Lottery Ticket Hypothesis for Vision Transformers
The conventional lottery ticket hypothesis (LTH) claims that there exists a sparse subnetwork within a dense neural network, together with a proper random initialization called the winning ticket, such that the subnetwork can be trained from scratch to perform almost as well as its dense counterpart. Meanwhile, the LTH has scarcely been evaluated for vision transformers (ViTs). In this paper, we first show that a conventional winning ticket is hard to find at the weight level of ViTs with existing methods. Then, we generalize the LTH for ViTs
to input images consisting of image patches inspired by the input dependence of
ViTs. That is, there exists a subset of input image patches such that a ViT can
be trained from scratch by using only this subset of patches and achieve
similar accuracy to the ViTs trained by using all image patches. We call this
subset of input patches the winning tickets, which represent a significant
amount of information in the input. Furthermore, we present a simple yet
effective method to find the winning tickets in input patches for various types
of ViT, including DeiT, LV-ViT, and Swin Transformers. More specifically, we
use a ticket selector to generate the winning tickets based on the
informativeness of patches. Meanwhile, we build another randomly selected
subset of patches for comparison; the experiments show a clear difference between the performance of models trained with winning tickets and models trained with randomly selected subsets.
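A minimal sketch of selecting a winning ticket of input patches versus a random baseline. The per-patch variance score is an assumed stand-in for informativeness; the paper instead learns a dedicated ticket selector:

```python
import numpy as np

def select_winning_patches(patches, k):
    """Keep the k patches with the highest informativeness score.
    Per-patch pixel variance is an assumed proxy; the paper trains
    a ticket selector to rank patches instead."""
    scores = patches.reshape(len(patches), -1).var(axis=1)
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 16, 16, 3))  # 14x14 grid of 16x16 RGB patches
winners = select_winning_patches(patches, k=98)          # "winning ticket"
random_ticket = rng.choice(196, size=98, replace=False)  # random baseline
```

Training one ViT on `patches[winners]` and another on `patches[random_ticket]` mirrors the comparison the abstract describes.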
All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management
During the deployment of deep neural networks (DNNs) on edge devices, many research efforts are devoted to coping with limited hardware resources. However, little attention is paid to the influence of dynamic power management. Because edge devices typically run on a limited battery energy budget (rather than the nearly unlimited energy available to servers or workstations), their dynamic power management often changes the execution frequency, as in the widely used dynamic voltage and frequency scaling (DVFS) technique. This leads to highly unstable inference speed, especially for computation-intensive DNN models, which can harm the user experience and waste hardware resources. We first identify this problem and then propose All-in-One, a highly representative
pruning framework to work with dynamic power management using DVFS. The
framework can use only one set of model weights and soft masks (together with
other auxiliary parameters of negligible storage) to represent multiple models
of various pruning ratios. By re-configuring the model to the corresponding
pruning ratio for a specific execution frequency (and voltage), we are able to
achieve stable inference speed, i.e., keeping the difference in speed
performance under various execution frequencies as small as possible. Our
experiments demonstrate that our method not only achieves high accuracy for multiple models of different pruning ratios, but also reduces the variance of their inference latency across frequencies, with the minimal memory consumption of only one model and one soft mask.
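The "one set of weights plus one soft mask" idea can be sketched as thresholding the mask at a frequency-dependent pruning ratio. The frequency-to-ratio mapping and tensor shapes below are illustrative assumptions:

```python
import numpy as np

# Hypothetical DVFS operating points -> pruning ratios (illustrative values).
FREQ_TO_RATIO = {1.5e9: 0.0, 1.0e9: 0.5, 0.6e9: 0.75}

def configure(weights, soft_mask, freq):
    """Re-configure one stored model to the pruning ratio matching
    the current execution frequency by thresholding the soft mask."""
    ratio = FREQ_TO_RATIO[freq]
    k = int(ratio * soft_mask.size)
    if k == 0:
        return weights  # full model at the highest frequency
    thresh = np.partition(soft_mask.ravel(), k - 1)[k - 1]
    return np.where(soft_mask > thresh, weights, 0.0)

rng = np.random.default_rng(0)
w, m = rng.normal(size=(64, 64)), rng.random((64, 64))
sparse_w = configure(w, m, 0.6e9)  # ~75% of weights zeroed at the low frequency
```

One weight tensor and one mask thus stand in for several pruned models, which is the memory saving the abstract claims.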
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Vision transformers (ViTs) have recently achieved success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization. Previous compression algorithms usually start from pre-trained dense models and focus only on efficient inference, while the time-consuming training remains unavoidable. In contrast, this paper points out that million-scale training data are redundant, which is the fundamental reason for the tedious training. To address
the issue, this paper aims to introduce sparsity into data and proposes an
end-to-end efficient training framework from three sparse perspectives, dubbed
Tri-Level E-ViT. Specifically, we leverage a hierarchical data redundancy
reduction scheme, by exploring the sparsity under three levels: number of
training examples in the dataset, number of patches (tokens) in each example,
and number of connections between tokens that lie in attention weights. With
extensive experiments, we demonstrate that our proposed technique can
noticeably accelerate training for various ViT architectures while maintaining
accuracy. Remarkably, under certain ratios, we are able to improve the ViT
accuracy rather than compromising it. For example, we achieve a 15.2% speedup with 72.6% (+0.4) Top-1 accuracy on DeiT-T, and a 15.7% speedup with 79.9% (+0.1) Top-1 accuracy on DeiT-S. This demonstrates the existence of data redundancy in ViTs.
Comment: AAAI 202
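The three sparsity levels can be sketched with random keep-sets at each level; the keep ratios, shapes, and top-k attention rule below are illustrative assumptions, not the paper's learned selection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 1: keep a subset of training examples.
dataset = rng.normal(size=(1000, 196, 64))           # (examples, tokens, dim)
ex_keep = rng.choice(1000, size=800, replace=False)
data = dataset[ex_keep]

# Level 2: keep a subset of patch tokens per example.
tok_keep = rng.choice(196, size=147, replace=False)
data = data[:, tok_keep]                             # -> (800, 147, 64)

# Level 3: sparsify attention -- keep only the top-k scores per query row.
def sparsify_attention(scores, k):
    thresh = np.sort(scores, axis=-1)[..., -k][..., None]
    return np.where(scores >= thresh, scores, -np.inf)

attn = rng.normal(size=(147, 147))                   # one attention map
sparse_attn = sparsify_attention(attn, k=32)
```

Each level removes redundancy independently, so the savings compose across the dataset, the per-image tokens, and the attention connections.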
- …