SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification
Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have
shown promising performance in various visual tasks. However, these methods are
primarily designed for single-label images and ignore the considerable
discrepancies between single- and multi-label images: a multi-label image
involves multiple co-occurring categories and highly variable object scales. On
the other hand, previous multi-label image classification (MLIC) methods tend
to design elaborate models, which brings expensive computation. In this paper,
we introduce a simple but effective augmentation strategy for multi-label image
classification, namely SpliceMix. The "splice" in our method is two-fold: 1)
each mixed image is a splice of several downsampled images arranged in a grid,
so that the semantics of the images involved in mixing are blended without
losing objects, which alleviates co-occurrence bias; 2) we splice the mixed
images and the original mini-batch to form a new SpliceMixed mini-batch, which
allows images at different scales to contribute to training jointly.
Furthermore, such splicing in the SpliceMixed mini-batch enables interactions
between mixed images and the original regular images. We also offer a simple
and non-parametric extension based on consistency learning (SpliceMix-CL) to
show the flexible extensibility of SpliceMix. Extensive experiments on various
tasks demonstrate that merely using SpliceMix with a baseline model (e.g.,
ResNet) achieves better performance than state-of-the-art methods. Moreover,
the generalizability of SpliceMix is further validated by the improvements that
current MLIC methods gain when combined with it. The code is available at
https://github.com/zuiran/SpliceMix.
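To make the grid-splicing idea concrete, here is a minimal PyTorch sketch; the function name, batch layout, and multi-hot label format are our assumptions rather than details taken from the paper:

```python
import torch
import torch.nn.functional as F

def splice_mix(images, labels, grid=(2, 2), n_mixed=4):
    """Sketch of grid-splice augmentation: each mixed image tiles
    gh * gw downsampled images; its label is the union (element-wise
    max) of the tiles' multi-hot labels. Assumes H, W are divisible
    by the grid and batch size >= gh * gw."""
    B, C, H, W = images.shape
    gh, gw = grid
    k = gh * gw
    mixed_imgs, mixed_lbls = [], []
    for _ in range(n_mixed):
        idx = torch.randperm(B)[:k]  # pick k images to splice together
        small = F.interpolate(images[idx], size=(H // gh, W // gw),
                              mode='bilinear', align_corners=False)
        rows = [torch.cat(list(small[r * gw:(r + 1) * gw]), dim=-1)
                for r in range(gh)]            # concatenate tiles along width
        mixed_imgs.append(torch.cat(rows, dim=-2))  # stack rows along height
        mixed_lbls.append(labels[idx].amax(dim=0))  # union of multi-hot labels
    # The SpliceMixed mini-batch: original images plus the grid-spliced ones.
    return (torch.cat([images, torch.stack(mixed_imgs)]),
            torch.cat([labels.float(), torch.stack(mixed_lbls).float()]))
```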
Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Zero-shot translation (ZST), which is generally based on a multilingual
neural machine translation model, aims to translate between language pairs
unseen during training. The common practice for guiding the zero-shot language
mapping during inference is to deliberately insert source and target language
ID tokens, e.g., one for English and one for German. Recent studies have
shown that language IDs sometimes fail to navigate the ZST task, so that
translations suffer from the off-target problem (words in non-target languages
appear in the generated translation), which makes it difficult to apply current
multilingual translation models to a broad range of zero-shot language
scenarios. To understand when and why the navigation capabilities of language
IDs are weakened, we compare two extreme decoder input cases in the ZST
directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively
visualizing the contextual word representations (CWRs) of these cases with
teacher forcing, we show that 1) the CWRs of different languages are
effectively distributed in separate regions when the sentence and ID are
matched (ON setting), and 2) if the sentence and ID are unmatched (OFF
setting), the CWRs of different languages are chaotically distributed. Our
analyses suggest that although language IDs work well in ideal ON settings,
they become fragile and lose their navigation ability when faced with
off-target tokens, which commonly appear during inference but are rare during
training. In response, we employ unlikelihood tuning on the negative (OFF)
samples to minimize their probability, such that the language IDs can
discriminate between on- and off-target tokens during training. Experiments
spanning 40 ZST directions show that our method reduces the off-target ratio by
48.0% on average, leading to a +9.1 BLEU improvement with only an extra 0.3%
tuning cost.
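The unlikelihood objective itself is standard: for a negative token, minimize its probability by maximizing log(1 - p). A minimal sketch follows; the tensor shapes and the function name are our assumptions, and padding masks are omitted for brevity:

```python
import torch
import torch.nn.functional as F

def unlikelihood_loss(logits, negative_targets, eps=1e-6):
    """Unlikelihood term on negative (off-target) samples:
    maximize log(1 - p(y_neg | x)), i.e. push the model away from
    off-target tokens. logits: (batch, seq, vocab);
    negative_targets: (batch, seq) token ids from the OFF case."""
    probs = F.softmax(logits, dim=-1)
    p_neg = probs.gather(-1, negative_targets.unsqueeze(-1)).squeeze(-1)
    return -torch.log((1.0 - p_neg).clamp_min(eps)).mean()
```

In training, a term like this would presumably be added with a small weight to the usual likelihood loss on positive samples, consistent with the small reported tuning overhead.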
Free-Form Composition Networks for Egocentric Action Recognition
Egocentric action recognition is gaining significant attention in the field
of human action recognition. In this paper, we address the data scarcity issue
in egocentric action recognition from a compositional generalization
perspective.
To tackle this problem, we propose a free-form composition network (FFCN) that
can simultaneously learn disentangled verb, preposition, and noun
representations, and then use them to compose new samples in the feature space
for rare classes of action videos. First, we use a graph to capture the
spatial-temporal relations among different hand/object instances in each action
video. We thus decompose each action into a set of verb and preposition
spatial-temporal representations using the edge features in the graph. The
temporal decomposition extracts verb and preposition representations from
different video frames, while the spatial decomposition adaptively learns verb
and preposition representations from action-related instances in each frame.
With these spatial-temporal representations of verbs and prepositions, we can
compose new samples for rare classes in a free-form manner that is not
restricted to the rigid form of a verb plus a noun. The proposed FFCN can
directly generate new training samples for rare classes and hence significantly
improve action recognition performance. We evaluated our method on three
popular egocentric action recognition datasets, Something-Something V2, H2O,
and EPIC-KITCHENS-100; the experimental results demonstrate the effectiveness
of the proposed method for handling data scarcity problems, including
long-tailed and few-shot egocentric action recognition.
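As a rough illustration of composing rare-class samples in feature space (the actual composition operator in FFCN is learned; the names, shapes, and the simple concatenation below are our stand-ins):

```python
import torch

def compose_rare_samples(verb_feats, prep_feats, noun_feats, n=8):
    """Free-form composition sketch: draw disentangled verb, preposition,
    and noun features from (possibly different) videos and combine them
    into synthetic feature-space samples for a rare action class.
    Each bank is (N_i, D_i); concatenation stands in for the learned
    composition, and allowing a variable number of preposition parts is
    what would make this 'free-form' rather than a rigid verb + noun pair."""
    iv = torch.randint(verb_feats.size(0), (n,))
    ip = torch.randint(prep_feats.size(0), (n,))
    io = torch.randint(noun_feats.size(0), (n,))
    return torch.cat([verb_feats[iv], prep_feats[ip], noun_feats[io]], dim=-1)
```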
Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks
Graph Neural Networks (GNNs) tend to suffer from high computation costs due
to the rapidly growing scale of graph data and number of model parameters,
which restricts their utility in practical applications. To this
end, some recent works focus on sparsifying GNNs with the lottery ticket
hypothesis (LTH) to reduce inference costs while maintaining performance
levels. However, the LTH-based methods suffer from two major drawbacks: 1) they
require exhaustive and iterative training of dense models, resulting in an
extremely large training computation cost, and 2) they only trim graph
structures and model parameters but ignore the node feature dimension, where
significant redundancy exists. To overcome the above limitations, we propose a
comprehensive graph gradual pruning framework termed CGP. Specifically, we
design a during-training graph pruning paradigm that dynamically prunes GNNs
within a single training process. Unlike LTH-based methods, the proposed CGP
approach requires no re-training, which significantly reduces the computation
costs. Furthermore, we design a co-sparsifying strategy to comprehensively trim
all three core elements of GNNs: graph structures, node features, and model
parameters. Meanwhile, to refine the pruning operation, we introduce a regrowth
process into our CGP framework to re-establish pruned but important
connections. The proposed CGP is evaluated on a node classification task across
6 GNN architectures, including shallow models (GCN and GAT),
shallow-but-deep-propagation models (SGC and APPNP), and deep models (GCNII and
ResGCN), on a total of 14 real-world graph datasets, including large-scale
graph datasets from the challenging Open Graph Benchmark. Experiments reveal
that our proposed strategy greatly improves both training and inference
efficiency while matching or even exceeding the accuracy of existing methods.
Iron(III) Chloride-catalyzed Nucleophilic Substitution of Propargylic Alcohols: A General and Efficient Approach for the Synthesis of 1,4-Diynes
A wide variety of 1,4-diynes have been constructed via a novel FeCl3-catalyzed coupling reaction of propargylic alcohols with alkynylsilanes. This synthetic approach provides a general, efficient, and economical route to 1,4-diynes.
Anesthetic management for cytoreductive surgery of pseudomyxoma peritonei with high intra-abdominal pressure: A case report
Anesthetic management of patients with pseudomyxoma peritonei (PMP) is challenging. This case report describes a patient with PMP and high intra-abdominal pressure. Intubation was performed in the lateral position, and the intra-abdominal pressure was relieved slowly to prevent significant hemodynamic changes. Additionally, positive pressure ventilation was applied to reduce the risk of re-expansion pulmonary edema. During the operation, transfusion and infusion were managed with goal-directed fluid therapy guided by stroke volume variation (SVV), cardiac index (CI), and blood gas analysis.
EvoPass: Evolvable graphical password against shoulder-surfing attacks
The fast light of CsI(Na) crystals
The responses of several common alkali halide crystals to alpha rays and
gamma rays were tested in our research. It is found that only CsI(Na) crystals
show significantly different waveforms between alpha and gamma scintillations,
while the others do not exhibit this phenomenon. It is suggested that the fast
light of CsI(Na) crystals arises from the recombination of free electrons with
self-trapped holes of the host CsI crystal. Self-absorption limits the emission
of fast light in CsI(Tl) and NaI(Tl) crystals.
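Waveform differences of this kind are commonly quantified with a charge-comparison pulse-shape discrimination ratio; the sketch below is a generic illustration (the window length and names are our choices, not the authors' analysis):

```python
import numpy as np

def psd_ratio(waveform, dt, fast_window=100e-9):
    """Charge-comparison pulse-shape discrimination: fraction of the
    scintillation charge arriving in an early 'fast' window. Pulses
    with a stronger fast component yield a larger ratio, separating
    alpha from gamma events when their waveforms differ."""
    n_fast = max(int(fast_window / dt), 1)
    total = waveform.sum()
    return float(waveform[:n_fast].sum() / total) if total > 0 else 0.0
```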