66 research outputs found
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps
When describing an image, reading text in the visual scene is crucial to
understand the key information. Recent work explores the TextCaps task, i.e.
image captioning with reading Optical Character Recognition (OCR) tokens, which
requires models to read text and cover it in generated captions. Existing
approaches fail to generate accurate descriptions because of their (1) poor
reading ability; (2) inability to choose the crucial words among all extracted
OCR tokens; (3) repetition of words in predicted captions. To this end, we
propose Confidence-aware Non-repetitive Multimodal Transformers (CNMT) to
tackle the above challenges. Our CNMT consists of reading, reasoning, and
generation modules, in which the Reading Module employs better OCR systems to
enhance text reading ability and a confidence embedding to select the most
noteworthy tokens. To address the issue of word redundancy in captions, our
Generation Module includes a repetition mask to avoid predicting repeated words
in captions. Our model outperforms state-of-the-art models on the TextCaps dataset,
improving from 81.0 to 93.0 in CIDEr. Our source code is publicly available. Comment: 9 pages; Accepted by AAAI 202
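The repetition mask mentioned in the abstract above can be illustrated with a minimal sketch. It assumes greedy decoding over per-step score lists and sets the score of every already-generated token to negative infinity before the next token is picked. The function names, the toy vocabulary, and the `exempt_ids` parameter (tokens that are allowed to repeat) are hypothetical and are not taken from the CNMT source code.

```python
import math

def apply_repetition_mask(scores, generated_ids, exempt_ids=frozenset()):
    """Return a copy of `scores` with already-generated token ids set to -inf."""
    masked = list(scores)
    for tok in set(generated_ids):
        if tok not in exempt_ids:  # exempt tokens may repeat (assumption)
            masked[tok] = -math.inf
    return masked

def greedy_decode(step_scores, exempt_ids=frozenset()):
    """Pick the best-scoring unmasked token at every decoding step."""
    out = []
    for scores in step_scores:
        masked = apply_repetition_mask(scores, out, exempt_ids)
        out.append(max(range(len(masked)), key=masked.__getitem__))
    return out

# Toy scores over a 3-token vocabulary; token 1 has the highest raw score at every step.
steps = [
    [0.1, 0.9, 0.3],
    [0.2, 0.8, 0.5],
    [0.6, 0.7, 0.1],
]
print(greedy_decode(steps))                  # [1, 2, 0]
print(greedy_decode(steps, exempt_ids={1}))  # [1, 1, 1]
```

Without the mask (or with token 1 exempted) the decoder emits the same token three times; with the mask each token appears at most once.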
Resonant Quantum Principal Component Analysis
Principal component analysis has been widely adopted to reduce the dimension
of data while preserving the information. The quantum version of PCA (qPCA) can
be used to analyze an unknown low-rank density matrix by rapidly revealing the
principal components of it, i.e. the eigenvectors of the density matrix with
largest eigenvalues. However, due to the substantial resource requirement, its
experimental implementation remains challenging. Here, we develop a resonant
analysis algorithm with minimal ancillary-qubit resources, in which
only one frequency-scanning probe qubit is required to extract the principal
components. In the experiment, we demonstrate the distillation of the first
principal component of a 4×4 density matrix, with the efficiency of
86.0% and fidelity of 0.90. This work shows the speed-up ability of quantum
algorithms in dimension reduction of data and thus could be used as part of
quantum artificial intelligence algorithms in the future. Comment: 10 pages, 7 figures, have been waiting for the reviewers' responses
for over 3 month
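For reference, the quantity that qPCA targets, the eigenvector of a density matrix with the largest eigenvalue, can be computed classically by power iteration. The sketch below is a classical analogue only, not the resonant quantum algorithm of the paper. It assumes a real symmetric 4×4 density matrix, and the example matrix and function names are made up for illustration.

```python
import math

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_component(rho, iters=200):
    """Power iteration: return (largest eigenvalue, unit eigenvector) of rho."""
    n = len(rho)
    v = [1.0 / math.sqrt(n)] * n  # uniform start vector
    for _ in range(iters):
        w = matvec(rho, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(rho, v)))  # Rayleigh quotient
    return eigval, v

# Hypothetical density matrix: 0.6 |psi><psi| + 0.1 I with psi = (1, 1, 0, 0)/sqrt(2).
# It has trace 1 and eigenvalues 0.7 (eigenvector psi) and 0.1 (three times).
rho = [
    [0.4, 0.3, 0.0, 0.0],
    [0.3, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]
eigval, vec = principal_component(rho)
print(round(eigval, 6), [round(x, 6) for x in vec])
```

Power iteration converges at a rate set by the ratio of the two largest eigenvalues, so it works best when the first principal component clearly dominates.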
Video Background Music Generation: Dataset, Method and Evaluation
Music is essential when editing videos, but selecting music manually is
difficult and time-consuming. Thus, we seek to automatically generate
background music tracks given video input. This is a challenging task since it
requires plenty of paired videos and music to learn their correspondence.
Unfortunately, there exist no such datasets. To close this gap, we introduce a
dataset, benchmark model, and evaluation metric for video background music
generation. We introduce SymMV, a video and symbolic music dataset, along with
chord, rhythm, melody, and accompaniment annotations. To the best of our
knowledge, it is the first video-music dataset with high-quality symbolic music
and detailed annotations. We also propose a benchmark video background music
generation framework named V-MusProd, which utilizes music priors of chords,
melody, and accompaniment along with video-music relations of semantic, color,
and motion features. To address the lack of objective metrics for video-music
correspondence, we propose a retrieval-based metric VMCP built upon a powerful
video-music representation learning model. Experiments show that with our
dataset, V-MusProd outperforms the state-of-the-art method in both music
quality and correspondence with videos. We believe our dataset, benchmark
model, and evaluation metric will boost the development of video background
music generation
EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection
In recent years, great progress has been made in the Lift-Splat-Shot-based
(LSS-based) 3D object detection method, which converts features of 2D camera
view and 3D lidar view to Bird's-Eye-View (BEV) for feature fusion. However,
inaccurate depth estimation (e.g. the 'depth jump' problem) is an obstacle to
developing LSS-based methods. To alleviate the 'depth jump' problem, we propose
the Edge-Aware Bird's-Eye-View (EA-BEV) projector. By coupling the proposed edge-aware
depth fusion module with the depth estimation module, the EA-BEV projector
solves the problem and enforces refined supervision on depth. In addition, we
propose sparse depth supervision and gradient edge depth supervision to
constrain learning on global depth and local marginal depth information. Our
EA-BEV projector is a plug-and-play module for any LSS-based 3D object
detection model and effectively improves baseline performance. We
demonstrate its effectiveness on the nuScenes benchmark. On the nuScenes 3D
object detection validation dataset, our proposed EA-BEV projector boosts
several state-of-the-art LSS-based baselines on the nuScenes 3D object detection
benchmark and the nuScenes BEV map segmentation benchmark with a negligible increase
in inference time
Enhanced volcanic activity and long-term warmth in the middle Eocene revealed by mercury and osmium isotopes from IODP Expedition 369 Site U1514
Rapid plate reorganization may have influenced global climate during the Eocene; however, its linkage remains poorly constrained, particularly during the middle Eocene. To elucidate this tectonic–climatic relationship, here, we conducted a comprehensive analysis based on high-resolution mercury (Hg) and osmium (Os) abundance and isotope data obtained from the complete Eocene sedimentary sequence of Site U1514, drilled in the Mentelle Basin off southwest Australia. The Hg signals in this sedimentary sequence, which are characterized by significantly high enrichment and insignificant mass-independent fractionation (Δ199Hg) signal, confirm that the middle Eocene (∼45–38 Ma) was a period of persistent, increased volcanism, accompanied by intense tectonic activity. In particular, a remarkable seafloor volcanic eruption persisted for approximately 1.5 million years (∼42.0–40.5 Ma), immediately preceding the Middle Eocene Climate Optimum (MECO). Contemporaneously, the trends toward a slightly more radiogenic seawater 187Os/188Os (Osi) composition denote the prevalence of intensified continental weathering under a warm, humid climate during the middle Eocene, a phenomenon particularly evident during the MECO. Importantly, the Hg and Os records from Site U1514 reveal the occurrence of a multi-million-year warming reversal amid the long-term Eocene cooling trend, which likely contributed to significant CO2 reduction during the late Eocene. These findings significantly enhance our understanding of Eocene climate dynamics, which are fundamentally linked to intensive tectonic-driven volcanic activity and associated continental chemical weathering
Association between triglyceride glucose-body mass index and heart failure in subjects with diabetes mellitus or prediabetes mellitus: a cross-sectional study
Background: The triglyceride glucose-body mass index (TyG-BMI) is a surrogate indicator of insulin resistance. However, the association of TyG-BMI with heart failure (HF) in individuals with diabetes mellitus or prediabetes mellitus is unknown. Methods: This study included 7,472 participants aged 20–80 years old with prediabetes or diabetes from the National Health and Nutrition Examination Survey (2007–2018). The TyG-BMI was calculated as Ln [triglyceride (mg/dL) × fasting blood glucose (mg/dL)/2] × BMI, and individuals were categorized into tertiles based on TyG-BMI levels. The relationship of TyG-BMI with HF was analyzed using multiple logistic regression models. Subgroup analyses were stratified by gender, age, hypertension, and diabetes mellitus status. Results: This cross-sectional study had 7,472 participants (weighted n = 111,808,357), including 329 HF participants. Participants with a high TyG-BMI were prone to HF. The highest tertile group with a fully adjusted model was more likely to have HF compared to the lowest tertile group (odds ratio [OR], 2.645; 95% CI, 1.529–4.576). Restricted cubic spline analysis showed a significant dose-response relationship between TyG-BMI and HF (P < 0.001). In subgroup analyses, similar results were seen in terms of age (≥50 years old), gender, hypertension, and diabetes mellitus status. Conclusion: A high TyG-BMI is significantly associated with HF risk in participants with diabetes mellitus or prediabetes mellitus
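The TyG-BMI formula quoted in the Methods above, Ln [triglyceride (mg/dL) × fasting blood glucose (mg/dL)/2] × BMI, can be written out as a short sketch. The participant values below are invented for illustration, and the rank-based tertile split is a simplifying assumption: the study's own tertile cut points, which may account for survey weights, are not given in the abstract.

```python
import math

def tyg_bmi(triglyceride_mg_dl, fasting_glucose_mg_dl, bmi):
    """TyG-BMI = ln(triglyceride * fasting glucose / 2) * BMI."""
    return math.log(triglyceride_mg_dl * fasting_glucose_mg_dl / 2) * bmi

def assign_tertiles(values):
    """Label each value 1, 2, or 3 by rank (unweighted split; an assumption)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * 3 // len(values) + 1
    return labels

# Hypothetical participants: (triglyceride mg/dL, fasting glucose mg/dL, BMI kg/m^2)
participants = [
    (150, 100, 30.0),
    (90, 105, 22.0),
    (220, 140, 35.0),
    (110, 110, 26.0),
    (300, 160, 41.0),
    (130, 101, 24.0),
]
scores = [tyg_bmi(tg, glu, bmi) for tg, glu, bmi in participants]
tertiles = assign_tertiles(scores)
for score, tertile in zip(scores, tertiles):
    print(f"TyG-BMI = {score:.1f}, tertile {tertile}")
```

For the first hypothetical participant, ln(150 × 100 / 2) = ln(7500) ≈ 8.92, so TyG-BMI ≈ 8.92 × 30 ≈ 267.7.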