
    Improving Audio-Visual Segmentation with Bidirectional Generation

    The aim of audio-visual segmentation (AVS) is to precisely differentiate audible objects within videos down to the pixel level. Traditional approaches often tackle this challenge by combining information from various modalities, where the contribution of each modality is modeled implicitly or explicitly. Nevertheless, the interconnections between different modalities tend to be overlooked in audio-visual modeling. In this paper, inspired by the human ability to mentally simulate the sound of an object and its visual appearance, we introduce a bidirectional generation framework. This framework establishes robust correlations between an object's visual characteristics and its associated sound, thereby enhancing the performance of AVS. To achieve this, we employ a visual-to-audio projection component that reconstructs audio features from object segmentation masks and minimizes the reconstruction error. Moreover, recognizing that many sounds are linked to object movements, we introduce an implicit volumetric motion estimation module to handle temporal dynamics that may be challenging to capture with conventional optical flow methods. To showcase the effectiveness of our approach, we conduct comprehensive experiments and analyses on the widely recognized AVSBench benchmark. As a result, we establish a new state-of-the-art performance level on the AVS benchmark, particularly excelling on the challenging MS3 subset, which involves segmenting multiple sound sources. To facilitate reproducibility, we plan to release both the source code and the pre-trained model. Comment: Dawei Hao and Yuxin Mao contributed equally to this paper. Yiran Zhong is the corresponding author. The code will be released at https://github.com/OpenNLPLab/AVS-bidirectiona
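    The visual-to-audio projection described above can be sketched in a few lines: pool the segmentation-mask features, map them linearly into the audio-feature space, and penalize the reconstruction error. All names and dimensions below are hypothetical illustrations; the paper's actual module is learned end-to-end inside the segmentation network.

```python
import numpy as np

def v2a_reconstruction_loss(mask_feats, audio_feats, W, b):
    """Toy visual-to-audio projection: pool the segmentation-mask features,
    map them linearly into the audio-feature space, and return the
    mean-squared reconstruction error against the true audio features."""
    pooled = mask_feats.mean(axis=0)   # global pooling over mask positions
    audio_hat = W @ pooled + b         # reconstructed audio feature
    return float(np.mean((audio_hat - audio_feats) ** 2))

rng = np.random.default_rng(0)
mask_feats = rng.normal(size=(16, 8))   # 16 mask tokens, visual dim 8
audio_feats = rng.normal(size=4)        # audio feature dim 4
W, b = rng.normal(size=(4, 8)), np.zeros(4)
loss = v2a_reconstruction_loss(mask_feats, audio_feats, W, b)
```

    In training, minimizing this loss would push the mask features to carry enough information to predict the object's sound, which is the correlation the bidirectional framework exploits.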

    Nedd4-2-dependent ubiquitination potentiates the inhibition of human NHE3 by cholera toxin and enteropathogenic Escherichia coli

    BACKGROUND & AIMS: Diarrhea is one of the most common illnesses and is often caused by bacterial infection. Recently, we have shown that the human Na+/H+ exchanger NHE3 (hNHE3), but not non-human NHE3s, interacts with the E3 ubiquitin ligase Nedd4-2. We hypothesize that this property of hNHE3 contributes to the increased severity of diarrhea in humans. METHODS: We used humanized mice expressing hNHE3 in the intestine (hNHE3int) to compare the contributions of hNHE3 and mouse NHE3 to diarrhea induced by cholera toxin (CTX) and enteropathogenic Escherichia coli (EPEC). We measured Na+/H+ exchange activity and fluid absorption. The role of Nedd4-2 in hNHE3 activity and ubiquitination was determined by knockdown in Caco-2bbe cells. The effects of protein kinase A (PKA), the primary mediator of CTX-induced diarrhea, on Nedd4-2 and hNHE3 phosphorylation and their interaction were determined. RESULTS: The effects of CTX and EPEC were greater in hNHE3int mice than in control wild-type (WT) mice, resulting in greater inhibition of NHE3 activity and increased fluid accumulation in the intestine, the hallmark of diarrhea. Activation of PKA increased ubiquitination of hNHE3 and enhanced the interaction of Nedd4-2 with hNHE3 via phosphorylation of Nedd4-2 at S342. The S342A mutation mitigated the Nedd4-2–hNHE3 interaction and blocked PKA-induced inhibition of hNHE3. Unlike that of non-human NHE3s, inhibition of hNHE3 by PKA is independent of NHE3 phosphorylation, suggesting a distinct mechanism of hNHE3 regulation. CONCLUSIONS: The effects of CTX and EPEC on hNHE3 are amplified, and the unique properties of hNHE3 may contribute to the diarrheal symptoms occurring in humans.

    A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation

    Due to the development of pre-trained language models, automated code generation techniques have shown great promise in recent years. However, the generated code often fails to meet the syntactic constraints of the target language, especially in the case of Turducken-style code, where declarative code snippets are embedded within imperative programs. In this study, we summarize the lack of syntactic constraints into three significant challenges: (1) the efficient representation of syntactic constraints, (2) the effective integration of syntactic information, and (3) the scalable syntax-first decoding algorithm. To address these challenges, we propose TurduckenGen, a syntax-guided multi-task learning approach. Specifically, we first explicitly append type information to the code tokens to capture the representation of syntactic constraints. Then we formalize code generation with syntactic constraint representation as an auxiliary task to enable the model to learn the syntactic constraints of the code. Finally, the syntactically correct code is accurately selected from multiple candidates with the help of compiler feedback. Extensive experiments and comprehensive analysis against six state-of-the-art baselines on two Turducken-style code datasets demonstrate the effectiveness and general applicability of our approach. Finally, we conducted a human study and found that the code generated by our approach is better than that of the baselines in terms of code readability and semantic similarity. Comment: Accepted in Empirical Software Engineering
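    The final selection step can be illustrated with a toy stand-in: generate several candidates and keep the first one the compiler accepts. Here Python's `ast` module plays the role of the compiler purely for illustration; TurduckenGen targets Turducken-style languages and uses real compiler feedback.

```python
import ast

def select_candidate(candidates):
    """Keep the first candidate that parses; a hypothetical stand-in for
    TurduckenGen's compiler-feedback filtering of generated code."""
    for code in candidates:
        try:
            ast.parse(code)        # "compiler" check: syntactic validity
            return code
        except SyntaxError:
            continue
    return None

# The first candidate is malformed, so the second is returned.
good = select_candidate(["def f(:", "def f():\n    return 1"])
```
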

    DISQ: Dynamic Iteration Skipping for Variational Quantum Algorithms

    This paper proposes DISQ to craft a stable landscape for VQA training and tackle the noise-drift challenge. DISQ adopts a "drift detector" with a reference circuit to identify and skip iterations that are severely affected by noise-drift errors. Specifically, the circuits from the previous training iteration are re-executed as a reference circuit in the current iteration to estimate the impact of noise drift. The iteration is deemed compromised by noise-drift errors, and thus skipped, if noise drift flips the direction of the ideal optimization gradient. To enhance the reliability of noise-drift detection, we further propose leveraging multiple reference circuits from previous iterations to provide a well-founded judgment of the current noise drift. Nevertheless, multiple reference circuits also introduce considerable execution overhead. To mitigate the extra overhead, we propose Pauli-term subsetting (prime and minor subsets), executing only the observable circuits with large coefficient magnitudes (the prime subset) during drift detection; only the minor subset is executed when the current iteration is drift-free. Evaluations across various applications and QPUs demonstrate that DISQ can mitigate a significant portion of the noise-drift impact on VQAs and achieve a 1.51-2.24x fidelity improvement over the traditional baseline. DISQ's benefit is 1.1-1.9x over the best alternative approach while boosting average noise detection speed by 2.07
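    The skipping rule can be sketched in a few lines: re-measure the previous iteration's reference circuit and skip the update when drift flips the sign of its gradient estimate. This is a one-parameter toy with made-up values, not the paper's multi-reference, Pauli-subsetted detector.

```python
import numpy as np

def drift_compromised(ref_grad_prev, ref_grad_now):
    """Flag the iteration when noise drift flips the sign of the
    reference circuit's gradient estimate."""
    return np.sign(ref_grad_prev) != np.sign(ref_grad_now)

def train(grads, drifted, lr=0.1):
    """Toy VQA loop on one parameter: apply a gradient step only when
    the drift detector does not flag the iteration."""
    theta = 0.0
    for g, d in zip(grads, drifted):
        ref_now = -g if d else g       # drift flips the measured sign
        if drift_compromised(g, ref_now):
            continue                   # skip the drift-corrupted iteration
        theta -= lr * ref_now
    return theta
```

    With gradients `[1.0, 1.0, 1.0]` and drift on the second iteration, only two steps are applied and the corrupted (sign-flipped) update never moves the parameter the wrong way.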

    "Searching for Happiness" or "Full of Joy"? Source Domain Activation Matters


    Linearized Relative Positional Encoding

    Relative positional encoding is widely used in vanilla and linear transformers to represent positional information. However, existing encoding methods of a vanilla transformer are not always directly applicable to a linear transformer, because the latter requires a decomposition of the query and key representations into separate kernel functions. Nevertheless, principles for designing encoding methods suitable for linear transformers remain understudied. In this work, we bring together a variety of existing linear relative positional encoding approaches under a canonical form and further propose a family of linear relative positional encoding algorithms via unitary transformation. Our formulation leads to a principled framework that can be used to develop new relative positional encoding methods that preserve linear space-time complexity. Equipped with different models, the proposed linearized relative positional encoding (LRPE) family derives effective encodings for various applications. Experiments show that, compared with existing methods, LRPE achieves state-of-the-art performance in language modeling, text classification, and image classification. It also points to a general paradigm for designing broadly applicable relative positional encoding methods for linear transformers. The code is available at https://github.com/OpenNLPLab/Lrpe. Comment: Reviewed by TMLR, decision pending. Yiran Zhong is the corresponding author. Code is available at https://github.com/OpenNLPLab/Lrp
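    One concrete member of such a family is a position-dependent rotation (a unitary map) applied to queries and keys, so that attention scores depend only on the relative offset while the kernelized, linear-complexity form is preserved. The sketch below uses a RoPE-style rotation as an assumed instance; LRPE's canonical form covers a broader set of unitary transformations.

```python
import numpy as np

def lrpe_rotate(x, pos, thetas):
    """Apply a position-dependent 2x2 rotation (a unitary map) to each
    feature pair of x. Inner products between rotated queries and keys
    then depend only on the relative position."""
    pairs = x.reshape(-1, 2)
    c, s = np.cos(pos * thetas), np.sin(pos * thetas)
    return np.stack([c * pairs[:, 0] - s * pairs[:, 1],
                     s * pairs[:, 0] + c * pairs[:, 1]], axis=1).ravel()

d = 8
thetas = 1.0 / (10000 ** (np.arange(d // 2) / (d // 2)))
rng = np.random.default_rng(1)
q, k = rng.normal(size=d), rng.normal(size=d)

# Equal relative offsets (5-3 and 7-5) give identical scores.
s1 = lrpe_rotate(q, 5, thetas) @ lrpe_rotate(k, 3, thetas)
s2 = lrpe_rotate(q, 7, thetas) @ lrpe_rotate(k, 5, thetas)
```

    Because the rotation is applied to queries and keys separately, it composes with any kernel feature map and never materializes a quadratic attention matrix.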

    TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

    We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear attention architecture TransNormer by making advanced modifications that include positional embedding, linear attention acceleration, gating mechanisms, tensor normalization, and inference acceleration and stabilization. Specifically, we use LRPE together with an exponential decay to avoid attention dilution issues while allowing the model to retain global interactions between tokens. Additionally, we propose Lightning Attention, a cutting-edge technique that more than doubles the runtime speed of linear attention and reduces memory usage by a remarkable four times. To further enhance the performance of TransNormer, we leverage a gating mechanism for smooth training and a new tensor normalization scheme to accelerate the model, resulting in an impressive acceleration of over 20%. Furthermore, we develop a robust inference algorithm that ensures numerical stability and consistent inference speed regardless of the sequence length, showcasing superior efficiency during both training and inference stages. We also implement an efficient model-parallel scheme for TransNormerLLM, enabling seamless deployment on large-scale clusters and facilitating expansion to even more extensive models, i.e., LLMs with 175B parameters. We validate our model design through a series of ablations and train models with sizes of 385M, 1B, and 7B on our self-collected corpus. Benchmark results demonstrate that our models not only match the performance of state-of-the-art Transformer-based LLMs but are also significantly faster. Code is released at: https://github.com/OpenNLPLab/TransnormerLLM. Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen contributed equally to this paper. Code is released at: https://github.com/OpenNLPLab/TransnormerLL
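    The LRPE-plus-decay idea pairs naturally with the recurrent form of linear attention: a running key-value state is decayed by a factor lam per step, so token n contributes to position m with weight lam**(m-n). Below is a minimal sketch of that recurrence only, under assumed shapes; it does not attempt Lightning Attention's tiled IO-aware kernels.

```python
import numpy as np

def linear_attn_decay(q, k, v, lam):
    """Causal linear attention with exponential decay: token n contributes
    to position m with weight lam**(m-n), via an O(n) recurrent state."""
    out = np.zeros_like(v)
    kv = np.zeros((q.shape[1], v.shape[1]))   # decayed running sum of k v^T
    for t in range(q.shape[0]):
        kv = lam * kv + np.outer(k[t], v[t])
        out[t] = q[t] @ kv
    return out

rng = np.random.default_rng(2)
n, d = 6, 4
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
y = linear_attn_decay(q, k, v, lam=0.9)
```

    The recurrence keeps memory constant in sequence length, which is what makes inference speed independent of context size.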

    Left Bundle Branch Ablation Guided by a Three-Dimensional Mapping System: A Novel Method for Establishing a Heart Failure Animal Model

    Objective: Few studies have established animal models of left bundle branch block by using three-dimensional mapping systems. This research aimed to create a canine left bundle branch block model by using a three-dimensional mapping system. Materials and Methods: We used a three-dimensional mapping system to map and ablate the left bundle branch in beagles. Results: Ten canines underwent radiofrequency ablation: left bundle branch block was successfully established in eight, one experienced ventricular fibrillation, and one developed third-degree atrioventricular block. The maximum HV interval measured within the left ventricle was 29.00 ± 2.93 ms, and the LBP-V interval at the ablation site was 20.63 ± 2.77 ms. The LBP-V interval at the ablation target was 71.08% of the maximum HV interval. Conclusion: This three-dimensional mapping system is a reliable and effective guide for ablation of the left bundle branch in dogs.

    A multi-tissue transcriptomic landscape of female mice in estrus and diestrus provides clues for precision medicine

    The female reproductive cycle, known as the menstrual cycle in primates and the estrous cycle in other mammals, dominates reproductive processes in the non-pregnant state. Beyond reproductive tissues, however, the cycle also exerts global regulation, because the receptors for the two major female hormones that fluctuate throughout the cycle, estrogen and progesterone, are widely distributed. A multi-tissue gene expression landscape is therefore in continuous demand for better understanding the systemic changes during the reproductive cycle, but remains largely undefined. Here we delineated by RNA-sequencing a transcriptomic landscape covering 15 tissues of C57BL/6J female mice in two phases of the estrous cycle, estrus and diestrus. A number of genes, pathways, and transcription factors involved in the estrous cycle were thereby revealed. We found that the estrous cycle broadly regulates neuro-functions, immuno-functions, blood coagulation, and other processes, and that 13 transcription factors may play important roles in the transcriptomic alteration between estrus and diestrus. Next, bioinformatic modeling with 1,263 manually curated gene signatures of various physiological and pathophysiological states systematically characterized the beneficial/deleterious effects of estrus/diestrus on individual tissues. We revealed that the estrous cycle has a significant effect on the cardiovascular system (aorta, heart, vein); the anti-hypertensive pattern induced in the aorta by estrus is one of the most striking findings. Motivated by this observation, we validated in mouse and rat experiments that two hypotensive drugs, felodipine and acebutolol, exhibit significantly greater efficacy in estrus than in diestrus. Together, this study provides a valuable data resource for investigating the reproductive cycle from a transcriptomic perspective and presents models and clues for precision medicine associated with the reproductive cycle.

    Fine-grained Audible Video Description

    We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD). It aims to provide detailed textual descriptions for given audible videos, including the appearance and spatial locations of each object, the actions of moving objects, and the sounds in the videos. Existing visual-language modeling tasks often concentrate on visual cues in videos while undervaluing the language and audio modalities. FAVD, in contrast, requires not only audio-visual-language modeling skills but also paragraph-level language generation abilities. We construct the first fine-grained audible video description benchmark (FAVDBench) to facilitate this research. For each video clip, we first provide a one-sentence summary of the video, i.e., the caption, followed by 4-6 sentences describing the visual details and 1-2 audio-related descriptions at the end. The descriptions are provided in both English and Chinese. We create two new metrics for this task: an EntityScore to gauge the completeness of entities in the visual descriptions, and an AudioScore to assess the audio descriptions. As a preliminary approach to this task, we propose an audio-visual-language transformer that extends an existing video captioning model with an additional audio branch. We combine the masked language modeling and auto-regressive language modeling losses to optimize our model so that it can produce paragraph-level descriptions. We illustrate the effectiveness of our model in audio-visual-language modeling by evaluating it against the proposed benchmark using both conventional captioning metrics and our proposed metrics. We further put our benchmark to the test in video generation models, demonstrating that employing fine-grained video descriptions can create more intricate videos than using captions. Comment: accepted to CVPR 2023, Xuyang Shen, Dong Li and Jinxing Zhou contribute equally, code link: github.com/OpenNLPLab/FAVDBench, dataset link: www.avlbench.opennlplab.c
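    As a rough illustration of what an entity-completeness metric can look like, the sketch below scores the fraction of reference entities recovered in a predicted description. The exact-string matching over pre-extracted entity lists is a simplifying assumption for illustration; the paper's EntityScore is defined over the FAVDBench annotations.

```python
def entity_score(pred_entities, ref_entities):
    """Toy entity-completeness score: the fraction of reference entities
    that also appear among the entities extracted from the prediction."""
    ref = set(ref_entities)
    if not ref:
        return 1.0                 # nothing to recover
    return len(ref & set(pred_entities)) / len(ref)
```

    For example, a prediction mentioning "dog" and "park" against the reference entities "dog", "ball", and "park" recovers two of three reference entities.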