Exploiting Prompt Caption for Video Grounding
Video grounding aims to locate a moment of interest matching the given query
sentence from an untrimmed video. Previous works ignore the sparsity
dilemma in video annotations, which fail to provide the context information
between potential events and query sentences in the dataset. In this paper, we
contend that exploiting easily available captions that describe general
actions, i.e., the prompt captions (PC) defined in our paper, will significantly
boost performance. To this end, we propose a Prompt Caption Network (PCNet)
for video grounding. Specifically, we first introduce dense video captioning to
generate dense captions and then obtain prompt captions by Non-Prompt Caption
Suppression (NPCS). To capture the potential information in prompt captions, we
propose Caption Guided Attention (CGA) to project the semantic relations between
prompt captions and query sentences into temporal space and fuse them into
visual representations. Considering the gap between prompt captions and ground
truth, we propose Asymmetric Cross-modal Contrastive Learning (ACCL) for
constructing more negative pairs to maximize cross-modal mutual information.
Without bells and whistles, extensive experiments on three public datasets
(i.e., ActivityNet Captions, TACoS and ActivityNet-CG) demonstrate that our
method significantly outperforms state-of-the-art methods.
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
Recent video grounding works attempt to introduce vanilla contrastive
learning into video grounding. However, we claim that this naive solution is
suboptimal. Contrastive learning requires two key properties: (1)
alignment of features of similar samples, and (2) uniformity of
the induced distribution of the normalized features on the hypersphere. Video
grounding raises two issues: (1) the co-existence of some visual
entities in both the ground truth and other moments, i.e., semantic overlapping; (2)
the annotation of only a few moments in the video, i.e., the sparse annotation dilemma.
As a result, vanilla contrastive learning is unable to model the correlations between
temporally distant moments and learns inconsistent video representations. Both
issues make vanilla contrastive learning unsuitable for video
grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a
semantically aligned and uniform video grounding framework via geodesic and
game theory. We quantify the correlations among moments leveraging the geodesic
distance that guides the model to learn the correct cross-modal
representations. Furthermore, from the novel perspective of game theory, we
propose semantic Shapley interaction based on geodesic distance sampling to
learn fine-grained semantic alignment in similar moments. Experiments on three
benchmarks demonstrate the effectiveness of our method.
Comment: ICCV202
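The two properties named in this abstract, alignment and uniformity, are often measured with simple statistics over L2-normalized features. The sketch below is a generic illustration of those two quantities and is not the G2L method; the function names and the defaults `alpha=2` and `t=2.0` are assumptions chosen for the example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so it lies on the hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def alignment(pos_pairs, alpha=2):
    """Mean distance**alpha between the features of positive pairs.

    Lower values mean similar samples are mapped closer together.
    """
    total = 0.0
    for a, b in pos_pairs:
        total += math.dist(l2_normalize(a), l2_normalize(b)) ** alpha
    return total / len(pos_pairs)


def uniformity(features, t=2.0):
    """Log of the mean Gaussian potential over all distinct feature pairs.

    Lower (more negative) values mean the features spread more evenly.
    """
    feats = [l2_normalize(f) for f in features]
    potentials = []
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            sq_dist = math.dist(feats[i], feats[j]) ** 2
            potentials.append(math.exp(-t * sq_dist))
    return math.log(sum(potentials) / len(potentials))
```

Collapsed features (all points identical) give a uniformity of 0, the worst case, while two antipodal points on the unit circle give -8 with `t=2.0`.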
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding
Spoken language understanding (SLU) is a fundamental task in
task-oriented dialogue systems. However, the inevitable errors from automatic
speech recognition (ASR) usually impair the understanding performance and lead
to error propagation. Although there are some attempts to address this problem
through contrastive learning, they (1) treat clean manual transcripts and ASR
transcripts equally without discrimination in fine-tuning; (2) neglect the fact
that the semantically similar pairs are still pushed away when applying
contrastive learning; (3) suffer from the problem of Kullback-Leibler (KL)
vanishing. In this paper, we propose Mutual Learning and Large-Margin
Contrastive Learning (ML-LMCL), a novel framework for improving ASR robustness
in SLU. Specifically, in fine-tuning, we apply mutual learning and train two
SLU models on the manual transcripts and the ASR transcripts, respectively,
aiming to iteratively share knowledge between these two models. We also
introduce a distance polarization regularizer to avoid pushing away the
intra-cluster pairs as much as possible. Moreover, we use a cyclical annealing
schedule to mitigate the KL vanishing issue. Experiments on three datasets show
that ML-LMCL outperforms existing models and achieves new state-of-the-art
performance.
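A cyclical annealing schedule, as mentioned in this abstract, typically ramps the weight on the KL term from 0 to 1 within each cycle and then holds it at 1 until the cycle ends. The sketch below shows one common linear variant; the function name, the number of cycles, and the ramp ratio are assumptions for illustration and are not taken from ML-LMCL.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """Return the KL weight in [0, 1] for a given training step.

    Each cycle spans total_steps / n_cycles steps. The weight rises
    linearly during the first `ratio` fraction of the cycle, then
    stays at 1.0 for the remainder of that cycle.
    """
    cycle_len = total_steps / n_cycles
    position = (step % cycle_len) / cycle_len  # progress within the cycle
    return min(1.0, position / ratio)


# Example: weights over the first cycle of a 100-step run (25 steps per cycle).
first_cycle = [round(cyclical_beta(s, 100), 2) for s in range(0, 25, 5)]
```

Restarting the weight at 0 in every cycle gives the model repeated phases in which the KL term is weak, which is the usual motivation for this schedule over a single monotonic ramp.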
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Automatic radiology report generation has attracted enormous research
interest due to its practical value in reducing the workload of radiologists.
However, simultaneously establishing global correspondences between the image
(e.g., Chest X-ray) and its related report and local alignments between image
patches and keywords remains challenging. To this end, we propose an Unify,
Align and then Refine (UAR) approach to learn multi-level cross-modal
alignments and introduce three novel modules: Latent Space Unifier (LSU),
Cross-modal Representation Aligner (CRA) and Text-to-Image Refiner (TIR).
Specifically, LSU unifies multimodal data into discrete tokens, making it
flexible to learn common knowledge among modalities with a shared network. The
modality-agnostic CRA first learns discriminative features via an orthonormal
basis and a dual-gate mechanism and then globally aligns visual and
textual representations under a triplet contrastive loss. TIR boosts
token-level local alignment via calibrating text-to-image attention with a
learnable mask. Additionally, we design a two-stage training procedure to make
UAR gradually grasp cross-modal alignments at different levels, which imitates
radiologists' workflow: writing sentence by sentence first and then checking
word by word. Extensive experiments and analyses on IU-Xray and MIMIC-CXR
benchmark datasets demonstrate the superiority of our UAR against varied
state-of-the-art methods.
Comment: 8 pages, 6 figures, 4 tables
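The global alignment step in this abstract relies on a triplet contrastive loss. As a rough illustration of the general idea only, the sketch below implements a standard margin-based triplet loss on Euclidean distances; the exact loss used by UAR is not specified in the abstract, and the function name and `margin=1.0` are assumptions.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that pulls the positive closer to the anchor than the negative.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)
```

In a cross-modal setting the anchor could be an image feature, the positive its paired report feature, and the negative a report feature from a different study.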
Effect on Treatment of the Landfill Leachate with the Furrow Irrigation in Onland Planting Reed (Phragmites)
Low-order mixed finite element analysis of progressive failure in pressure-dependent materials within the framework of the Cosserat continuum
Transition-Layer Implantation for Improving Magnetoelectric Response in Co-fired Laminated Composite
Magnetoelectric (ME) laminated composites with strong ME coupling are becoming increasingly prevalent in the electronic device field. In this paper, an enhancement of the ME coupling effect via transition-layer implantation for co-fired lead-free laminated composite (80Bi0.5Na0.5TiO3-20Bi0.5K0.5TiO3)/(Ni0.8Zn0.2)Fe2O4 (BNKT/NZFO) was demonstrated. A transition layer composed of particulate ME composite 0.5BNKT-0.5NZFO was introduced between the BNKT piezoelectric layer and the NZFO magnetostrictive layer, effectively connecting the two-phase interface and strengthening interface stress transfer. In particular, optimal ME voltage coefficients (αME) of 144 mV/(cm·Oe) at 1 kHz and 1.05 V/(cm·Oe) at the resonant frequency were achieved in the composite with a layer thickness ratio (BNKT:0.5BNKT-0.5NZFO:NZFO) of 3:1:6. The static elastic model was used to determine strong interface coupling. A large magnetodielectric (MD) response of 3.95% was found under a magnetic field excitation of 4 kOe. These results demonstrate that transition-layer implantation provides a new path to enhance the ME response in co-fired laminated composites, which can play an important role in developing magnetic field-tuned electronic devices.
Self-biased magnetoelectric composite for energy harvesting
The wireless sensor network energy supply technology for the Internet of things has progressed substantially, but attempts to provide sustainable and environmentally friendly energy for sensor networks remain limited and considerably cumbersome for practical application. Energy harvesting devices based on the magnetoelectric (ME) coupling effect have promising prospects in the field of self-powered devices due to their advantages of small size, fast response, and low power consumption. Driven by application requirements, the development of composites with a self-biased magnetoelectric (SME) coupling effect provides effective strategies for the miniaturized and high-precision design of energy harvesting devices. This review summarizes the work mechanism, research status, characteristics, and structures of SME composites, with emphasis on the application and development of SME devices for vibration and magnetic energy harvesting. The main challenges and future development directions for the design and implementation of energy harvesting devices based on the SME effect are presented.