Improving Audio-Visual Segmentation with Bidirectional Generation
The aim of audio-visual segmentation (AVS) is to precisely differentiate
audible objects within videos down to the pixel level. Traditional approaches
often tackle this challenge by combining information from various modalities,
where the contribution of each modality is implicitly or explicitly modeled.
Nevertheless, the interconnections between different modalities tend to be
overlooked in audio-visual modeling. In this paper, inspired by the human
ability to mentally simulate the sound of an object and its visual appearance,
we introduce a bidirectional generation framework. This framework establishes
robust correlations between an object's visual characteristics and its
associated sound, thereby enhancing the performance of AVS. To achieve this, we
employ a visual-to-audio projection component that reconstructs audio features
from object segmentation masks and minimizes reconstruction errors. Moreover,
recognizing that many sounds are linked to object movements, we introduce an
implicit volumetric motion estimation module to handle temporal dynamics that
may be challenging to capture using conventional optical flow methods. To
showcase the effectiveness of our approach, we conduct comprehensive
experiments and analyses on the widely recognized AVSBench benchmark. As a
result, we establish a new state-of-the-art performance level in the AVS
benchmark, particularly excelling in the challenging MS3 subset which involves
segmenting multiple sound sources. To facilitate reproducibility, we plan to
release both the source code and the pre-trained model.
Comment: Dawei Hao and Yuxin Mao contribute equally to this paper. Yiran
Zhong is the corresponding author. The code will be released at
https://github.com/OpenNLPLab/AVS-bidirectiona
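The visual-to-audio projection idea can be illustrated with a minimal sketch: visual features pooled from a segmentation mask are projected into the audio feature space, and the reconstruction error is the training signal. All names, shapes, and the linear projection here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a pooled mask-feature vector per object and a
# target audio-feature vector (e.g. an embedding from an audio encoder).
mask_feat = rng.standard_normal(16)     # pooled from the segmentation mask
audio_feat = rng.standard_normal(8)     # target audio embedding
W = rng.standard_normal((8, 16)) * 0.1  # visual-to-audio projection (toy)

def reconstruction_loss(W, mask_feat, audio_feat):
    # Project the visual (mask) features into the audio feature space
    # and penalize the mismatch; minimizing this ties an object's
    # appearance to its associated sound.
    recon = W @ mask_feat
    return float(np.mean((recon - audio_feat) ** 2))

loss = reconstruction_loss(W, mask_feat, audio_feat)
assert loss >= 0.0
```

In the full framework this loss would be minimized jointly with the segmentation objective, so gradients from the audio reconstruction shape the mask features.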
Nedd4-2-dependent ubiquitination potentiates the inhibition of human NHE3 by cholera toxin and enteropathogenic Escherichia coli
BACKGROUND & AIMS: Diarrhea is one of the most common illnesses and is often caused by bacterial infection. Recently, we have shown that human Na+/H+ exchanger NHE3 (hNHE3), but not non-human NHE3s, interacts with the E3 ubiquitin ligase Nedd4-2. We hypothesize that this property of hNHE3 contributes to the increased severity of diarrhea in humans. METHODS: We used humanized mice expressing hNHE3 in the intestine (hNHE3int) to compare the contribution of hNHE3 and mouse NHE3 to diarrhea induced by cholera toxin (CTX) and enteropathogenic Escherichia coli (EPEC). We measured Na+/H+ exchange activity and fluid absorption. The role of Nedd4-2 on hNHE3 activity and ubiquitination was determined by knockdown in Caco-2bbe cells. The effects of protein kinase A (PKA), the primary mediator of CTX-induced diarrhea, on Nedd4-2 and hNHE3 phosphorylation and their interaction were determined. RESULTS: The effects of CTX and EPEC were greater in hNHE3int mice than in control wild-type (WT) mice, resulting in greater inhibition of NHE3 activity and increased fluid accumulation in the intestine, the hallmark of diarrhea. Activation of PKA increased ubiquitination of hNHE3 and enhanced the interaction of Nedd4-2 with hNHE3 via phosphorylation of Nedd4-2 at S342. The S342A mutation mitigated the Nedd4-2–hNHE3 interaction and blocked PKA-induced inhibition of hNHE3. Unlike non-human NHE3s, inhibition of hNHE3 by PKA is independent of NHE3 phosphorylation, suggesting a distinct mechanism of hNHE3 regulation. CONCLUSIONS: The effects of CTX and EPEC on hNHE3 are amplified, and the unique properties of hNHE3 may contribute to the diarrheal symptoms occurring in humans.
A Syntax-Guided Multi-Task Learning Approach for Turducken-Style Code Generation
Due to the development of pre-trained language models, automated code
generation techniques have shown great promise in recent years. However, the
generated code often fails to meet the syntactic constraints of the target
language, especially in the case of Turducken-style code, where declarative
code snippets are embedded within imperative programs. In this study, we
distill the problem of missing syntactic constraints into three significant challenges:
(1) the efficient representation of syntactic constraints, (2) the effective
integration of syntactic information, and (3) the scalable syntax-first
decoding algorithm. To address these challenges, we propose a syntax-guided
multi-task learning approach TurduckenGen. Specifically, we first explicitly
append the type information to the code tokens to capture the representation of
syntactic constraints. Then we formalize code generation with syntactic
constraint representation as an auxiliary task to enable the model to learn the
syntactic constraints of the code. Finally, the syntactically correct code is
selected accurately from the multiple candidates with the help of the compiler
feedback. Extensive experiments and comprehensive analysis demonstrate the
effectiveness and general applicability of our approach after being compared
with six state-of-the-art baselines on two Turducken-style code datasets.
Finally, we conducted a human study and found that the code generated by our
approach is better than that of the baselines in terms of code readability and
semantic similarity.
Comment: Accepted in Empirical Software Engineering
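The first step, explicitly appending type information to code tokens, can be sketched as follows. The tag vocabulary and tagging scheme here are invented for illustration; the paper's actual representation of syntactic constraints may differ.

```python
# A minimal sketch of attaching grammar-type tags to code tokens so the
# model sees each syntactic constraint alongside the token it governs.
def tag_tokens(tokens, types):
    """Pair each code token with its syntactic type tag."""
    assert len(tokens) == len(types)
    return [f"<{t}>{tok}" for tok, t in zip(tokens, types)]

# Turducken-style example: declarative SQL embedded in an imperative host.
tokens = ["SELECT", "name", "FROM", "users"]
types = ["KW", "ID", "KW", "ID"]  # hypothetical tag set: keyword / identifier
tagged = tag_tokens(tokens, types)
assert tagged == ["<KW>SELECT", "<ID>name", "<KW>FROM", "<ID>users"]
```

In a multi-task setup, predicting the tag sequence alongside the token sequence would serve as the auxiliary task that teaches the model the target language's syntactic constraints.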
DISQ: Dynamic Iteration Skipping for Variational Quantum Algorithms
This paper proposes DISQ to craft a stable landscape for VQA training and
tackle the noise drift challenge. DISQ adopts a "drift detector" with a
reference circuit to identify and skip iterations that are severely affected by
noise drift errors. Specifically, the circuits from the previous training
iteration are re-executed as a reference circuit in the current iteration to
estimate noise drift impacts. The iteration is deemed compromised by noise
drift errors and thus skipped if noise drift flips the direction of the ideal
optimization gradient. To enhance noise drift detection reliability, we further
propose to leverage multiple reference circuits from previous iterations to
provide a well-founded judgment of the current noise drift. Nevertheless, multiple
reference circuits also introduce considerable execution overhead. To mitigate
extra overhead, we propose Pauli-term subsetting (prime and minor subsets) to
execute only observable circuits with large coefficient magnitudes (prime
subset) during drift detection. Only this minor subset is executed when the
current iteration is drift-free. Evaluations across various applications and
QPUs demonstrate that DISQ can mitigate a significant portion of the noise
drift impact on VQAs and achieve 1.51-2.24x fidelity improvement over the
traditional baseline. DISQ's benefit is 1.1-1.9x over the best alternative
approach while boosting average noise detection speed by 2.07x.
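The core drift-detection rule, skipping an iteration when noise drift flips the sign of the optimization step, can be sketched with scalar energies. The function name, the sign-flip criterion as written, and the example values are illustrative assumptions, not DISQ's exact formulation.

```python
import numpy as np

def drift_compromised(prev_energy, ref_energy, step_estimate):
    """Hypothetical drift check: the previous iteration's circuit is
    re-executed as a reference in the current iteration; if the energy
    offset attributable to noise drift flips the sign of the measured
    optimization step, the iteration is flagged as compromised."""
    drift = ref_energy - prev_energy     # shift seen on the same circuit
    corrected = step_estimate - drift    # drift-adjusted step
    return bool(np.sign(corrected) != np.sign(step_estimate))

# A measured improvement of -0.05 that is entirely explained by a -0.08
# drift is really a worsening step, so the iteration should be skipped.
assert drift_compromised(prev_energy=-1.00, ref_energy=-1.08, step_estimate=-0.05)
# A small -0.01 drift does not flip the step's direction: keep it.
assert not drift_compromised(prev_energy=-1.00, ref_energy=-1.01, step_estimate=-0.05)
```

Pauli-term subsetting would then restrict which observable circuits are re-executed for this check, trading detection fidelity for overhead.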
Linearized Relative Positional Encoding
Relative positional encoding is widely used in vanilla and linear
transformers to represent positional information. However, existing encoding
methods of a vanilla transformer are not always directly applicable to a linear
transformer, because the latter requires a decomposition of the query and key
representations into separate kernel functions. Nevertheless, principles for
designing encoding methods suitable for linear transformers remain
understudied. In this work, we put together a variety of existing linear
relative positional encoding approaches under a canonical form and further
propose a family of linear relative positional encoding algorithms via unitary
transformation. Our formulation leads to a principled framework that can be
used to develop new relative positional encoding methods that preserve linear
space-time complexity. Equipped with different models, the proposed linearized
relative positional encoding (LRPE) family derives effective encoding for
various applications. Experiments show that compared with existing methods,
LRPE achieves state-of-the-art performance in language modeling, text
classification, and image classification. Meanwhile, it establishes a general
paradigm for designing a broad range of relative positional encoding methods
applicable to linear transformers. The code is available at
https://github.com/OpenNLPLab/Lrpe.
Comment: Reviewed by TMLR, decision pending. Yiran Zhong is the corresponding
author. Code is available at https://github.com/OpenNLPLab/Lrp
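One familiar member of this unitary-transformation family is a rotary-style encoding: applying a position-dependent 2-D rotation (a unitary map) to queries and keys makes their inner product depend only on the relative offset, which is exactly the property that keeps linear space-time complexity intact. The sketch below is a generic rotary illustration, not LRPE's specific parameterization.

```python
import numpy as np

def rotate(x, pos, theta):
    # Apply a 2-D rotation (unitary transform) to each feature pair,
    # with the angle proportional to the token position.
    out = x.copy()
    for i in range(0, x.shape[-1], 2):
        ang = pos * theta[i // 2]
        c, s = np.cos(ang), np.sin(ang)
        out[i] = c * x[i] - s * x[i + 1]
        out[i + 1] = s * x[i] + c * x[i + 1]
    return out

rng = np.random.default_rng(0)
d = 8
theta = 1.0 / (10000 ** (np.arange(d // 2) / (d // 2)))
q, k = rng.standard_normal(d), rng.standard_normal(d)

# The inner product of a rotated query at position m and a rotated key at
# position n depends only on the relative offset m - n: positions (5, 3)
# and (12, 10) both have offset 2 and give the same score.
s1 = rotate(q, 5, theta) @ rotate(k, 3, theta)
s2 = rotate(q, 12, theta) @ rotate(k, 10, theta)
assert np.allclose(s1, s2)
```

Because the rotation factors through the query and key separately, the encoding decomposes into per-token kernel functions, which is the requirement for linear attention stated above.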
TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer
We present TransNormerLLM, the first linear attention-based Large Language
Model (LLM) that outperforms conventional softmax attention-based models in
terms of both accuracy and efficiency. TransNormerLLM evolves from the previous
linear attention architecture TransNormer by making advanced modifications that
include positional embedding, linear attention acceleration, gating mechanisms,
tensor normalization, and inference acceleration and stabilization.
Specifically, we use LRPE together with an exponential decay to avoid attention
dilution issues while allowing the model to retain global interactions between
tokens. Additionally, we propose Lightning Attention, a technique that more
than doubles the runtime speed of linear attention and reduces memory usage by
a factor of four. To further enhance the performance of TransNormer, we
leverage a gating mechanism for smooth training and a new tensor normalization
scheme that further accelerates the model. Furthermore, we develop a robust inference
algorithm that ensures numerical stability and consistent inference speed,
regardless of the sequence length, showcasing superior efficiency during both
training and inference stages. We also implement an efficient model parallel
schema for TransNormerLLM, enabling seamless deployment on large-scale clusters
and facilitating expansion to even more extensive models, i.e., LLMs with 175B
parameters. We validate our model design through a series of ablations and
train models with sizes of 385M, 1B, and 7B on our self-collected corpus.
Benchmark results demonstrate that our models not only match the performance of
state-of-the-art LLMs with Transformer but are also significantly faster. Code
is released at: https://github.com/OpenNLPLab/TransnormerLLM.
Comment: Technical Report. Yiran Zhong is the corresponding author. Zhen Qin,
Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen contribute equally to this
paper. Code is released at: https://github.com/OpenNLPLab/TransnormerLL
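The exponential-decay idea, down-weighting distant tokens so attention does not dilute over long contexts, can be sketched with a toy causal attention loop. The decay rate, shapes, and the omission of kernel feature maps are illustrative simplifications; this is not the paper's Lightning Attention kernel.

```python
import numpy as np

def decayed_causal_attention(q, k, v, lam=0.9):
    """Toy causal attention with exponential decay: token m attends to
    token n <= m with its score scaled by lam**(m - n), so distant
    tokens contribute less. Feature maps and normalization used by real
    linear attention are omitted to keep the decay mechanism visible."""
    T = q.shape[0]
    out = np.zeros_like(v)
    for m in range(T):
        w = np.array([lam ** (m - n) for n in range(m + 1)])  # decay weights
        scores = (q[m] @ k[: m + 1].T) * w
        out[m] = scores @ v[: m + 1]
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 3)) for _ in range(3))
out = decayed_causal_attention(q, k, v)
assert out.shape == (4, 3)
# The first token attends only to itself, with decay weight lam**0 = 1.
assert np.allclose(out[0], (q[0] @ k[0]) * v[0])
```

In an efficient implementation the decay folds into a recurrent state update rather than an explicit weight vector, which is what preserves linear time in sequence length.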
Left Bundle Branch Ablation Guided by a Three-Dimensional Mapping System: A Novel Method for Establishing a Heart Failure Animal Model
Objective: Few studies have established animal models of left bundle branch block using three-dimensional mapping systems. This research aimed to create a canine left bundle branch block model using a three-dimensional mapping system. Materials and Methods: We used a three-dimensional mapping system to map and ablate the left bundle branch in beagles. Results: Ten canines underwent radiofrequency ablation; left bundle branch block was successfully established in eight, one experienced ventricular fibrillation, and one developed third-degree atrioventricular block. The maximum HV interval measured within the left ventricle was 29.00 ± 2.93 ms, and the LBP-V interval at the ablation site was 20.63 ± 2.77 ms, i.e., 71.08% of the maximum HV interval. Conclusion: A three-dimensional mapping system is a reliable and effective guide for ablation of the left bundle branch in dogs.
A multi-tissue transcriptomic landscape of female mice in estrus and diestrus provides clues for precision medicine
The female reproductive cycle, known as the menstrual cycle in primates and the estrous cycle in other mammals, dominates reproductive processes in the non-pregnant state. However, beyond reproductive tissues, the cycle may also exert global regulation, because receptors for the two major female hormones that fluctuate throughout the cycle, estrogen and progesterone, are widely distributed. A multi-tissue gene expression landscape is therefore in continuous demand for better understanding the systemic changes during the reproductive cycle, but remains largely undefined. Here we delineated a transcriptomic landscape covering 15 tissues of C57BL/6J female mice in two phases of the estrous cycle, estrus and diestrus, by RNA sequencing, revealing a number of genes, pathways, and transcription factors involved in the estrous cycle. We found that the estrous cycle broadly regulates neural functions, immune functions, blood coagulation, and more, and that 13 transcription factors may play important roles behind the transcriptomic alterations between estrus and diestrus. Next, bioinformatics modeling with 1,263 manually curated gene signatures of various physiological and pathophysiological states systematically characterized the beneficial/deleterious effects of estrus/diestrus on individual tissues. We revealed that the estrous cycle has a significant effect on the cardiovascular system (aorta, heart, vein), in which the anti-hypertensive pattern induced in the aorta by estrus is one of the most striking findings. Inspired by this finding, we validated in mouse and rat experiments that two antihypertensive drugs, felodipine and acebutolol, exhibit significantly enhanced efficacy in estrus compared with diestrus. Together, this study provides a valuable data resource for investigating the reproductive cycle from a transcriptomic perspective, and presents models and clues for precision medicine associated with the reproductive cycle.
Fine-grained Audible Video Description
We explore a new task for audio-visual-language modeling called fine-grained
audible video description (FAVD). It aims to provide detailed textual
descriptions for the given audible videos, including the appearance and spatial
locations of each object, the actions of moving objects, and the sounds in
videos. Existing visual-language modeling tasks often concentrate on visual
cues in videos while undervaluing the language and audio modalities. On the
other hand, FAVD requires not only audio-visual-language modeling skills but
also paragraph-level language generation abilities. We construct the first
fine-grained audible video description benchmark (FAVDBench) to facilitate this
research. For each video clip, we first provide a one-sentence summary of the
video, i.e., the caption, followed by 4-6 sentences describing the visual details
and 1-2 audio-related descriptions at the end. The descriptions are provided in
both English and Chinese. We create two new metrics for this task: an
EntityScore to gauge the completeness of entities in the visual descriptions,
and an AudioScore to assess the audio descriptions. As a preliminary approach
to this task, we propose an audio-visual-language transformer that extends
existing video captioning model with an additional audio branch. We combine the
masked language modeling and auto-regressive language modeling losses to
optimize our model so that it can produce paragraph-level descriptions. We
demonstrate the effectiveness of our model in audio-visual-language modeling by
evaluating it against the proposed benchmark using both conventional captioning
metrics and our proposed metrics. We further put our benchmark to the test in
video generation models, demonstrating that employing fine-grained video
descriptions can create more intricate videos than using captions.
Comment: accepted to CVPR 2023, Xuyang Shen, Dong Li and Jinxing Zhou
contribute equally, code link: github.com/OpenNLPLab/FAVDBench, dataset link:
www.avlbench.opennlplab.c
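A completeness metric like EntityScore can be illustrated as the fraction of reference entities that the generated description covers. This set-overlap sketch is a simplified stand-in; the paper's actual EntityScore formulation (entity extraction, matching, and aggregation) may differ.

```python
def entity_score(pred_entities, ref_entities):
    """Illustrative completeness metric: the fraction of reference
    entities that appear in the predicted description's entities."""
    ref = set(ref_entities)
    if not ref:
        return 1.0  # nothing required, trivially complete
    return len(ref & set(pred_entities)) / len(ref)

# A description mentioning the dog and the ball but missing the park
# covers 2 of the 3 reference entities.
assert entity_score(["dog", "ball"], ["dog", "ball", "park"]) == 2 / 3
assert entity_score(["dog", "ball", "park"], ["dog", "ball", "park"]) == 1.0
```

An AudioScore analogue would apply a similar coverage idea to the audio-related sentences, scored against the audio track rather than the visual entities.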