
    Confidence-aware Non-repetitive Multimodal Transformers for TextCaps

    When describing an image, reading the text in the visual scene is crucial for understanding its key information. Recent work explores the TextCaps task, i.e. image captioning with reading Optical Character Recognition (OCR) tokens, which requires models to read scene text and include it in the generated captions. Existing approaches fail to generate accurate descriptions because of their (1) poor reading ability; (2) inability to choose the crucial words among all extracted OCR tokens; and (3) repetition of words in predicted captions. To this end, we propose Confidence-aware Non-repetitive Multimodal Transformers (CNMT) to tackle these challenges. CNMT consists of reading, reasoning, and generation modules: the Reading Module employs better OCR systems to enhance text reading ability and a confidence embedding to select the most noteworthy tokens. To address word redundancy in captions, the Generation Module includes a repetition mask that prevents the model from predicting repeated words. Our model outperforms state-of-the-art models on the TextCaps dataset, improving CIDEr from 81.0 to 93.0. Our source code is publicly available. Comment: 9 pages; Accepted by AAAI 202
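    A repetition mask of the kind described above can be sketched as a filter on the decoder's logits at each step. This is a minimal sketch, not CNMT's implementation: the function names, the greedy decoding step, and the `exempt` set of token ids that are allowed to repeat (for example, function words) are hypothetical assumptions for illustration.

```python
import math
from typing import AbstractSet, Iterable, List, Sequence


def apply_repetition_mask(
    logits: Sequence[float],
    generated: Iterable[int],
    exempt: AbstractSet[int] = frozenset(),
) -> List[float]:
    """Return a copy of `logits` in which every token id already generated
    (and not in the hypothetical `exempt` set) is set to -inf, so it can
    never be chosen again."""
    blocked = set(generated) - set(exempt)
    return [-math.inf if i in blocked else x for i, x in enumerate(logits)]


def greedy_step(
    logits: Sequence[float],
    generated: Iterable[int],
    exempt: AbstractSet[int] = frozenset(),
) -> int:
    """Pick the highest-scoring token id after masking repeats."""
    masked = apply_repetition_mask(logits, generated, exempt)
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    step_logits = [0.1, 2.0, 1.5, 0.3]
    print(greedy_step(step_logits, generated=[]))               # 1
    print(greedy_step(step_logits, generated=[1]))              # 2: token 1 is masked
    print(greedy_step(step_logits, generated=[1], exempt={1}))  # 1: token 1 may repeat
```

    Using -inf rather than a large negative constant guarantees that a masked token is never selected, even if sampling from a softmax instead of taking the argmax, because exp(-inf) is exactly 0.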

    Resonant Quantum Principal Component Analysis

    Principal component analysis (PCA) has been widely adopted to reduce the dimension of data while preserving information. The quantum version of PCA (qPCA) can be used to analyze an unknown low-rank density matrix by rapidly revealing its principal components, i.e. the eigenvectors of the density matrix with the largest eigenvalues. However, because of its substantial resource requirements, its experimental implementation remains challenging. Here, we develop a resonant analysis algorithm that minimizes the ancillary-qubit resources: only one frequency-scanning probe qubit is required to extract the principal components. In the experiment, we demonstrate the distillation of the first principal component of a 4×4 density matrix, with an efficiency of 86.0% and a fidelity of 0.90. This work shows the speed-up ability of quantum algorithms in data dimension reduction and thus could be used as part of quantum artificial intelligence algorithms in the future. Comment: 10 pages, 7 figures, have been waiting for the reviewers' responses for over 3 months
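    For intuition about what qPCA extracts, the first principal component of a small density matrix can be computed classically by power iteration. This is a classical reference computation, not the resonant quantum algorithm described above. It assumes a real symmetric density matrix with a non-degenerate largest eigenvalue, and the 4×4 example state (a pure state mixed with white noise) is hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def principal_component(rho: Matrix, iters: int = 200) -> Tuple[float, Vector]:
    """Largest eigenvalue and its eigenvector of a real symmetric matrix via
    power iteration. Assumes the start vector overlaps the leading
    eigenvector and that the top eigenvalue is non-degenerate."""
    v = normalize([float(i + 1) for i in range(len(rho))])
    for _ in range(iters):
        v = normalize(mat_vec(rho, v))
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(rho, v)))  # Rayleigh quotient
    return eigenvalue, v


if __name__ == "__main__":
    # Hypothetical example: rho = 0.7 |psi><psi| + 0.3 * I/4, psi = (1, 1, 0, 0)/sqrt(2).
    psi = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0]
    rho = [[0.7 * psi[i] * psi[j] + (0.075 if i == j else 0.0) for j in range(4)]
           for i in range(4)]
    lam, v = principal_component(rho)
    fidelity = sum(a * b for a, b in zip(psi, v)) ** 2  # |<psi|v>|^2 for pure states
    print(round(lam, 6), round(fidelity, 6))  # 0.775 1.0
```

    The leading eigenvalue here is 0.7 + 0.3/4 = 0.775 and the other three are 0.075, so the iteration converges quickly; the classical cost, however, grows with the full matrix dimension, which is the scaling qPCA aims to avoid.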

    Video Background Music Generation: Dataset, Method and Evaluation

    Music is essential when editing videos, but selecting music manually is difficult and time-consuming. Thus, we seek to automatically generate background music tracks given video input. This is a challenging task because it requires a large amount of paired video and music data to learn their correspondence. Unfortunately, no such dataset exists. To close this gap, we introduce a dataset, a benchmark model, and an evaluation metric for video background music generation. We introduce SymMV, a video and symbolic music dataset with chord, rhythm, melody, and accompaniment annotations. To the best of our knowledge, it is the first video-music dataset with high-quality symbolic music and detailed annotations. We also propose a benchmark video background music generation framework named V-MusProd, which utilizes music priors of chords, melody, and accompaniment along with video-music relations of semantic, color, and motion features. To address the lack of objective metrics for video-music correspondence, we propose a retrieval-based metric, VMCP, built upon a powerful video-music representation learning model. Experiments show that with our dataset, V-MusProd outperforms the state-of-the-art method in both music quality and correspondence with videos. We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation.
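    The abstract does not define VMCP beyond calling it retrieval-based, so the following is only a generic sketch of a metric from that family, not VMCP itself. It assumes paired video and music embeddings produced by some representation model and reports recall@k: the fraction of videos whose own music track ranks among the k most similar tracks. The function names and embedding values are hypothetical.

```python
import math
from typing import List, Sequence

Embedding = Sequence[float]


def cosine(a: Embedding, b: Embedding) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def recall_at_k(videos: List[Embedding], music: List[Embedding], k: int) -> float:
    """videos[i] and music[i] are assumed to be a ground-truth pair.
    A video counts as a hit if fewer than k tracks score strictly higher
    than its own track (ties are resolved in the video's favour)."""
    hits = 0
    for i, v in enumerate(videos):
        scores = [cosine(v, m) for m in music]
        rank = sum(1 for s in scores if s > scores[i])  # tracks beating the true match
        if rank < k:
            hits += 1
    return hits / len(videos)


if __name__ == "__main__":
    videos = [[1.0, 0.0], [0.0, 1.0]]
    print(recall_at_k(videos, [[1.0, 0.1], [0.1, 1.0]], k=1))  # 1.0: pairs match
    print(recall_at_k(videos, [[0.0, 1.0], [1.0, 0.0]], k=1))  # 0.0: pairs swapped
```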

    EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection

    In recent years, great progress has been made in Lift-Splat-Shoot-based (LSS-based) 3D object detection methods, which convert features from the 2D camera view and the 3D lidar view to Bird's-Eye-View (BEV) for feature fusion. However, inaccurate depth estimation (e.g. the 'depth jump' problem) is an obstacle to developing LSS-based methods. To alleviate the 'depth jump' problem, we propose the Edge-Aware Bird's-Eye-View (EA-BEV) projector. By coupling the proposed edge-aware depth fusion module with a depth estimation module, the EA-BEV projector addresses this problem and enforces refined supervision on depth. In addition, we propose sparse depth supervision and gradient edge depth supervision to constrain learning of global depth and local marginal depth information. Our EA-BEV projector is a plug-and-play module for any LSS-based 3D object detection model and effectively improves baseline performance. We demonstrate its effectiveness on the nuScenes benchmark: EA-BEV boosts several state-of-the-art LSS-based baselines on the nuScenes 3D object detection validation set and on the nuScenes BEV map segmentation benchmark, with a negligible increase in inference time.
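    Sparse depth supervision is commonly implemented by projecting lidar points into the image, discretizing their depths into the same bins the network predicts, and applying a classification loss only at pixels that received a lidar point. The sketch below follows that common pattern; EA-BEV's exact loss, bin layout, and weighting are not given in the abstract, so the names and the `d_min` and `bin_size` parameters are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def depth_to_bin(depth: float, d_min: float, bin_size: float, num_bins: int) -> Optional[int]:
    """Map a metric depth to a depth-bin index, or None if out of range."""
    idx = math.floor((depth - d_min) / bin_size)
    return idx if 0 <= idx < num_bins else None


def sparse_depth_loss(
    probs: List[Sequence[float]],        # per-pixel predicted distribution over depth bins
    lidar_depth: List[Optional[float]],  # per-pixel lidar depth, None where no point projects
    d_min: float = 1.0,                  # hypothetical nearest bin edge (metres)
    bin_size: float = 1.0,               # hypothetical bin width (metres)
) -> float:
    """Mean negative log-likelihood of the lidar depth bin, computed only at
    pixels that have a valid lidar depth."""
    eps = 1e-12  # guards against log(0)
    losses = []
    for p, d in zip(probs, lidar_depth):
        if d is None:
            continue  # no supervision where lidar is missing
        b = depth_to_bin(d, d_min, bin_size, len(p))
        if b is not None:
            losses.append(-math.log(p[b] + eps))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    probs = [[0.8, 0.2], [0.5, 0.5], [0.5, 0.5]]
    depths = [1.5, None, 2.2]  # pixel 1 has no lidar return
    print(sparse_depth_loss(probs, depths))  # mean of -ln 0.8 and -ln 0.5
```

    Skipping pixels without lidar returns is what makes the supervision sparse: the loss never penalizes depth predictions where no ground truth exists.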

    Enhanced volcanic activity and long-term warmth in the middle Eocene revealed by mercury and osmium isotopes from IODP Expedition 369 Site U1514

    Rapid plate reorganization may have influenced global climate during the Eocene; however, this linkage remains poorly constrained, particularly for the middle Eocene. To elucidate this tectonic–climatic relationship, we conducted a comprehensive analysis of high-resolution mercury (Hg) and osmium (Os) abundance and isotope data from the complete Eocene sedimentary sequence of Site U1514, drilled in the Mentelle Basin off southwest Australia. The Hg signals in this sequence, characterized by significantly high Hg enrichment and an insignificant mass-independent fractionation (Δ¹⁹⁹Hg) signal, confirm that the middle Eocene (∼45–38 Ma) was a period of persistent, increased volcanism accompanied by intense tectonic activity. In particular, a remarkable seafloor volcanic eruption persisted for approximately 1.5 million years (∼42.0–40.5 Ma), immediately preceding the Middle Eocene Climate Optimum (MECO). Contemporaneously, trends toward slightly more radiogenic seawater ¹⁸⁷Os/¹⁸⁸Os (Osi) compositions denote intensified continental weathering under a warm, humid climate during the middle Eocene, particularly during the MECO. Importantly, the Hg and Os records from Site U1514 reveal a multi-million-year warming reversal amid the long-term Eocene cooling trend, which likely contributed to significant CO₂ reduction during the late Eocene. These findings enhance our understanding of Eocene climate dynamics, which are fundamentally linked to intensive tectonically driven volcanic activity and associated continental chemical weathering.
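    Initial (Osi) ratios are conventionally obtained by correcting the measured ¹⁸⁷Os/¹⁸⁸Os for ingrowth from ¹⁸⁷Re decay since deposition: Osi = (¹⁸⁷Os/¹⁸⁸Os)measured − (¹⁸⁷Re/¹⁸⁸Os) × (e^(λt) − 1). The sketch below applies that correction, assuming the widely used ¹⁸⁷Re decay constant λ = 1.666 × 10⁻¹¹ yr⁻¹. The abstract does not state how Osi was calculated, and the input values are hypothetical, not data from Site U1514.

```python
import math

RE187_DECAY_CONSTANT = 1.666e-11  # per year; commonly used 187Re decay constant (assumption)


def initial_os_ratio(os187_os188_measured: float, re187_os188: float, age_ma: float) -> float:
    """Correct a measured 187Os/188Os ratio for in-situ 187Re decay over
    `age_ma` million years, returning the initial (Osi) ratio."""
    t_years = age_ma * 1e6
    ingrowth = re187_os188 * math.expm1(RE187_DECAY_CONSTANT * t_years)  # e^(λt) - 1
    return os187_os188_measured - ingrowth


if __name__ == "__main__":
    # Hypothetical sample: measured ratio 0.50, 187Re/188Os = 200, age 40 Ma.
    print(round(initial_os_ratio(0.50, 200.0, 40.0), 4))  # 0.3667
```

    `math.expm1` is used because λt is small (about 6.7 × 10⁻⁴ at 40 Ma), where it is more accurate than computing `exp(x) - 1` directly.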

    Association between triglyceride glucose-body mass index and heart failure in subjects with diabetes mellitus or prediabetes mellitus: a cross-sectional study

    Background: The triglyceride glucose-body mass index (TyG-BMI) is a surrogate indicator of insulin resistance. However, the association of TyG-BMI with heart failure (HF) in individuals with diabetes mellitus or prediabetes mellitus is unknown.
    Methods: This study included 7,472 participants aged 20–80 years with prediabetes or diabetes from the National Health and Nutrition Examination Survey (2007–2018). TyG-BMI was calculated as ln[triglyceride (mg/dL) × fasting blood glucose (mg/dL) / 2] × BMI, and individuals were categorized into tertiles by TyG-BMI. The relationship between TyG-BMI and HF was analyzed using multiple logistic regression models. Subgroup analyses were stratified by gender, age, hypertension, and diabetes mellitus status.
    Results: The 7,472 participants (weighted n = 111,808,357) included 329 with HF. Participants with a higher TyG-BMI were more prone to HF. In the fully adjusted model, the highest tertile was more likely to have HF than the lowest tertile (odds ratio [OR], 2.645; 95% CI, 1.529–4.576). Restricted cubic spline analysis showed a significant dose-response relationship between TyG-BMI and HF (P < 0.001). Subgroup analyses by age (≥50 years), gender, hypertension, and diabetes mellitus status showed similar results.
    Conclusion: A high TyG-BMI is significantly associated with HF risk in participants with diabetes mellitus or prediabetes mellitus.
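    The TyG-BMI formula and tertile grouping described in the Methods can be sketched as follows. This is a minimal illustration with made-up values: it does not apply the survey weights or the logistic regression models, and splitting at sample tertiles with Python's `statistics.quantiles` is an assumption about how the cut points were chosen.

```python
import math
import statistics
from typing import List


def tyg_bmi(triglyceride_mg_dl: float, fasting_glucose_mg_dl: float, bmi: float) -> float:
    """TyG-BMI = ln[TG (mg/dL) x FPG (mg/dL) / 2] x BMI."""
    tyg = math.log(triglyceride_mg_dl * fasting_glucose_mg_dl / 2)
    return tyg * bmi


def assign_tertiles(values: List[float]) -> List[int]:
    """Label each value 1, 2 or 3 by the sample tertile it falls in."""
    low_cut, high_cut = statistics.quantiles(values, n=3)
    return [1 if v <= low_cut else 2 if v <= high_cut else 3 for v in values]


if __name__ == "__main__":
    print(round(tyg_bmi(150, 100, 25), 2))  # ln(7500) * 25, about 223.07
    print(assign_tertiles([float(x) for x in range(1, 10)]))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```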