Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech with
the help of extra visual information such as lip videos, and has been shown to be more
effective than audio-only speech enhancement. This paper proposes further
incorporating ultrasound tongue images to improve lip-based AV-SE systems'
performance. Knowledge distillation is employed at the training stage to
address the challenge of acquiring ultrasound tongue images during inference,
enabling an audio-lip speech enhancement student model to learn from a
pre-trained audio-lip-tongue speech enhancement teacher model. Experimental
results demonstrate significant improvements in the quality and intelligibility
of the speech enhanced by the proposed method compared to the traditional
audio-lip speech enhancement baselines. Further analysis using phone error
rates (PER) of automatic speech recognition (ASR) shows that palatal and velar
consonants benefit most from the introduction of ultrasound tongue images.
Comment: To be published in InterSpeech 202
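The teacher-student training described in this abstract can be illustrated with a minimal sketch. It assumes a mean-squared-error objective and a mixing weight `alpha`; both are illustrative assumptions, since the abstract does not state the actual loss. The names `mse` and `distillation_loss` and the toy numbers are hypothetical.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_out, teacher_out, clean_target, alpha=0.5):
    """Weighted sum of a task loss and a teacher-matching loss.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    task_loss = mse(student_out, clean_target)    # student vs. clean speech
    distill_loss = mse(student_out, teacher_out)  # student vs. frozen teacher
    return alpha * task_loss + (1.0 - alpha) * distill_loss


# Toy feature frames (made-up numbers, for illustration only).
clean = [1.0, 2.0, 3.0]
teacher = [1.0, 2.0, 2.0]  # stands in for the audio-lip-tongue teacher output
student = [1.0, 1.0, 2.0]  # stands in for the audio-lip student output
loss = distillation_loss(student, teacher, clean, alpha=0.5)
```

At inference time only the student would run, so no ultrasound input is needed; the teacher term exists only during training.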
SUMOhydro: A Novel Method for the Prediction of Sumoylation Sites Based on Hydrophobic Properties
Sumoylation is one of the most essential mechanisms of reversible protein post-translational modifications and is a crucial biochemical process in the regulation of a variety of important biological functions. Sumoylation is also closely involved in various human diseases. The accurate computational identification of sumoylation sites in protein sequences aids in experimental design and mechanistic research in cellular biology. In this study, we introduced amino acid hydrophobicity as a parameter into a traditional binary encoding scheme and developed a novel sumoylation site prediction tool termed SUMOhydro. With the assistance of a support vector machine, the proposed method was trained and tested using a stringent non-redundant sumoylation dataset. In a leave-one-out cross-validation, the proposed method yielded an excellent performance with a correlation coefficient, specificity, sensitivity and accuracy equal to 0.690, 98.6%, 71.1% and 97.5%, respectively. In addition, SUMOhydro has been benchmarked against previously described predictors based on an independent dataset, thereby suggesting that the introduction of hydrophobicity as an additional parameter could assist in the prediction of sumoylation sites. Currently, SUMOhydro is freely accessible at http://protein.cau.edu.cn/others/SUMOhydro/
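One way to picture the idea of adding hydrophobicity to a binary encoding is the sketch below. It is not the SUMOhydro implementation: the Kyte-Doolittle hydropathy scale, the window string, and the function name `encode_window` are all assumptions chosen for illustration, and the abstract does not say which scale or window size the authors used.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index (assumed scale; the paper's may differ).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(window):
    """Encode each residue as a 20-bit one-hot vector plus one hydropathy value.

    Unknown symbols (e.g. padding) get an all-zero vector and 0.0 hydropathy.
    """
    features = []
    for residue in window:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
        features.append(HYDROPATHY.get(residue, 0.0))
    return features


# Hypothetical 4-residue window around a candidate lysine (K).
vector = encode_window("LKSE")
```

A vector like this, 21 values per residue, could then be passed to a support vector machine as the feature representation of a candidate site.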
Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech with
the help of extra visual information such as lip videos, and has been shown to be more
effective than audio-only speech enhancement. This paper proposes the
incorporation of ultrasound tongue images to improve the performance of
lip-based AV-SE systems further. To address the challenge of acquiring
ultrasound tongue images during inference, we first propose to employ knowledge
distillation during training to investigate the feasibility of leveraging
tongue-related information without directly inputting ultrasound tongue images.
Specifically, we guide an audio-lip speech enhancement student model to learn
from a pre-trained audio-lip-tongue speech enhancement teacher model, thus
transferring tongue-related knowledge. To better model the alignment between
the lip and tongue modalities, we further propose the introduction of a
lip-tongue key-value memory network into the AV-SE model. This network enables
the retrieval of tongue features based on readily available lip features,
thereby assisting the subsequent speech enhancement task. Experimental results
demonstrate that both methods significantly improve the quality and
intelligibility of the enhanced speech compared to traditional lip-based AV-SE
baselines. Moreover, both proposed methods exhibit strong generalization
performance on unseen speakers and in the presence of unseen noises.
Furthermore, phone error rate (PER) analysis of automatic speech recognition
(ASR) reveals that while all phonemes benefit from introducing ultrasound
tongue images, palatal and velar consonants benefit most.
Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language
Processing. arXiv admin note: text overlap with arXiv:2305.1493
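The lip-tongue key-value memory retrieval mentioned in this abstract can be sketched as a soft lookup: a lip feature is compared against stored lip keys, and the matching weights are used to average the paired tongue values. The dot-product scoring with softmax weighting is an assumed addressing scheme, and `retrieve_tongue` and the toy memory contents are hypothetical; the abstract does not specify the network's details.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve_tongue(lip_query, lip_keys, tongue_values):
    """Return a weighted average of tongue values, addressed by lip similarity."""
    scores = [sum(q * k for q, k in zip(lip_query, key)) for key in lip_keys]
    weights = softmax(scores)
    dim = len(tongue_values[0])
    return [sum(w * v[i] for w, v in zip(weights, tongue_values))
            for i in range(dim)]


# Toy memory with two slots (made-up numbers, for illustration only).
lip_keys = [[1.0, 0.0], [0.0, 1.0]]
tongue_values = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
retrieved = retrieve_tongue([5.0, 0.0], lip_keys, tongue_values)
```

Because the query matches the first key far more strongly, the retrieved vector is dominated by the first tongue value; no ultrasound input is involved in the lookup itself.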
A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest
Large Language Models (LLMs), despite their great power in language
generation, often encounter challenges when dealing with intricate and
knowledge-demanding queries in specific domains. This paper introduces a novel
approach to enhance LLMs by effectively extracting the relevant knowledge from
domain-specific textual sources, and the adaptive training of a chatbot with
domain-specific inquiries. Our two-step approach starts from training a
knowledge miner, namely LLMiner, which autonomously extracts Question-Answer
pairs from relevant documents through a chain-of-thought reasoning process.
Subsequently, we blend the mined QA pairs with a conversational dataset to
fine-tune the LLM as a chatbot, thereby enriching its domain-specific expertise
and conversational capabilities. We also developed a new evaluation benchmark
which comprises four domain-specific text corpora and associated human-crafted
QA pairs for testing. Our model shows remarkable performance improvement over
a generally aligned LLM and surpasses domain-adapted models directly fine-tuned
on the domain corpus. In particular, LLMiner achieves this with minimal human
intervention, requiring only 600 seed instances, thereby providing a pathway
towards self-improvement of LLMs through model-synthesized training data.
Comment: Work in progress
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Understanding how the surrounding environment changes is crucial for
performing downstream tasks safely and reliably in autonomous driving
applications. Recent occupancy estimation techniques using only camera images
as input can provide dense occupancy representations of large-scale scenes
based on the current observation. However, they are mostly limited to
representing the current 3D space and do not consider the future state of
surrounding objects along the time axis. To extend camera-only occupancy
estimation into spatiotemporal prediction, we propose Cam4DOcc, a new benchmark
for camera-only 4D occupancy forecasting, evaluating the surrounding scene
changes in the near future. We build our benchmark based on multiple publicly
available datasets, including nuScenes, nuScenes-Occupancy, and Lyft-Level5,
which provide sequential occupancy states of general movable and static
objects, as well as their 3D backward centripetal flow. To establish this
benchmark for future research with comprehensive comparisons, we introduce four
baseline types from diverse camera-based perception and prediction
implementations, including a static-world occupancy model, voxelization of
point cloud prediction, 2D-3D instance-based prediction, and our proposed novel
end-to-end 4D occupancy forecasting network. Furthermore, a standardized
evaluation protocol for multiple preset tasks is also provided to compare the
performance of all the proposed baselines on present and future occupancy
estimation with respect to objects of interest in autonomous driving scenarios.
The dataset and our implementation of all four baselines in the proposed
Cam4DOcc benchmark will be released here: https://github.com/haomo-ai/Cam4DOcc
1,4-Bis(5-methyl-1H-1,2,4-triazol-3-yl)benzene tetrahydrate
In the title compound, C12H12N6·4H2O, the two triazole rings adopt a cis configuration with a crystallographic twofold axis passing through the central benzene group. The benzene and triazole rings are almost coplanar with a dihedral angle of 5.5 (1)°. In the crystal, water molecules are joined together by OW—H⋯OW hydrogen bonds to form a one-dimensional zigzag chain. These water chains are further connected to the organic molecule, forming a three-dimensional network by intermolecular OW—H⋯N and N—H⋯OW hydrogen bonds. Moreover, π–π stacking interactions between triazole rings [centroid–centroid distances = 3.667 (1)–3.731 (1) Å] are observed. In one of the water molecules, one of the H atoms is disordered over two positions.
GABA, progesterone and zona pellucida activation of PLA2 and regulation by MEK-ERK1/2 during acrosomal exocytosis in guinea pig spermatozoa
We investigated whether GABA activates phospholipase A2 (PLA2) during acrosomal exocytosis, and if the MEK-ERK1/2 pathway modulates PLA2 activation initiated by GABA, progesterone or zona pellucida (ZP). In guinea pig spermatozoa prelabelled with [14C]arachidonic acid or [14C]choline chloride, GABA stimulated a decrease in phosphatidylcholine (PC), and release of arachidonic acid and lysoPC, during exocytosis. These lipid changes are indicative of PLA2 activation and appear essential for exocytosis since inclusion of aristolochic acid (a PLA2 inhibitor) abrogated them, along with exocytosis. GABA activation of PLA2 seems to be mediated, at least in part, by diacylglycerol (DAG) and protein kinase C since inclusion of the DAG kinase inhibitor R59022 enhanced PLA2 activity and exocytosis stimulated by GABA, whereas exposure to staurosporine decreased both. GABA-, progesterone- and ZP-induced release of arachidonic acid and exocytosis were prevented by U0126 and PD98059 (MEK inhibitors). Taken together, our results suggest that PLA2 plays a fundamental role in agonist-stimulated exocytosis and that MEK-ERK1/2 are involved in PLA2 regulation during this process.
Clinical values of multiple Epstein-Barr virus (EBV) serological biomarkers detected by xMAP technology
Background: Serological examination of Epstein-Barr virus (EBV) antibodies has been performed for screening nasopharyngeal carcinoma (NPC) and other EBV-associated diseases. Methods: Using xMAP technology, we examined immunoglobulin (Ig) A antibodies against EBV VCA-gp125 and p18, and IgA/IgG against EA-D, EBNA1 and gp78, in populations with distinct diseases or with different genetic or geographic backgrounds. Sera from Cantonese NPC patients (n = 547) and healthy controls (n = 542), 90 members of high-risk NPC families and 52 non-endemic healthy individuals were tested. Thirty-five NPC patients were recruited to observe the kinetics of EBV antibody levels during and after treatment. Patients with other EBV-associated diseases were also included: 16 with infectious mononucleosis, 28 with nasal NK/T cell lymphoma and 14 with Hodgkin's disease. Results: The sensitivity and specificity of each individual marker for NPC diagnosis ranged from 61% to 84%; when the markers were combined, they reached 84.5% and 92.4%, respectively. Almost half of the NPC patients displayed decreased EBV immunoactivities shortly after therapy, and tumor recurrence was accompanied by high EBV antibody reactivities. Neither the unaffected members of high-risk NPC families nor the non-endemic healthy population showed statistically different EBV antibody levels compared with endemic controls. Moreover, elevated levels of specific antibodies were observed in other EBV-associated diseases, but all were lower than those in NPC. Conclusion: Combined EBV serological biomarkers could improve the diagnostic value for NPC. Populations with different EBV-associated diseases presented diverse EBV serological spectra, but NPC patients had the highest EBV activity.
RRM1 single nucleotide polymorphism -37C→A correlates with progression-free survival in NSCLC patients after gemcitabine-based chemotherapy
Background: The ribonucleotide reductase M1 (RRM1) gene encodes the regulatory subunit of ribonucleotide reductase, the molecular target of gemcitabine. The overexpression of RRM1 mRNA in tumor tissues is reported to be associated with gemcitabine resistance. Thus, single nucleotide polymorphisms (SNPs) of the RRM1 gene are potential biomarkers of the response to gemcitabine chemotherapy. We investigated whether RRM1 expression in peripheral blood mononuclear cells (PBMCs) or SNPs were associated with clinical outcome after gemcitabine-based chemotherapy in advanced non-small cell lung cancer (NSCLC) patients. Methods: PBMC samples were obtained from 62 stage IIIB and IV patients treated with gemcitabine-based chemotherapy. RRM1 mRNA expression levels were assessed by real-time PCR. Three RRM1 SNPs, -37C→A, 2455A→G and 2464G→A, were assessed by direct sequencing. Results: RRM1 expression was detectable in 57 PBMC samples, and SNPs were sequenced in 56 samples. The overall response rate to gemcitabine was 18%; there was no significant association between RRM1 mRNA expression and response rate (P = 0.560). The median progression-free survival (PFS) was 23.3 weeks in the lower expression group and 26.9 weeks in the higher expression group (P = 0.659). For the -37C→A polymorphism, the median PFS was 30.7 weeks in the C(-)37A group, 24.7 weeks in the A(-)37A group, and 23.3 weeks in the C(-)37C group (P = 0.043). No significant difference in PFS was observed for the SNP 2455A→G or 2464G→A. Conclusions: The RRM1 polymorphism -37C→A correlated with PFS in NSCLC patients treated with gemcitabine-based chemotherapy. No significant correlation was found between PBMC RRM1 mRNA expression and the efficacy of gemcitabine.