574 research outputs found

    Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation

    Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper proposes further incorporating ultrasound tongue images to improve lip-based AV-SE systems' performance. Knowledge distillation is employed at the training stage to address the challenge of acquiring ultrasound tongue images during inference, enabling an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model. Experimental results demonstrate significant improvements in the quality and intelligibility of the speech enhanced by the proposed method compared to the traditional audio-lip speech enhancement baselines. Further analysis using phone error rates (PER) of automatic speech recognition (ASR) shows that palatal and velar consonants benefit most from the introduction of ultrasound tongue images. Comment: To be published in InterSpeech 202

    SUMOhydro: A Novel Method for the Prediction of Sumoylation Sites Based on Hydrophobic Properties

    Sumoylation is one of the most essential mechanisms of reversible protein post-translational modifications and is a crucial biochemical process in the regulation of a variety of important biological functions. Sumoylation is also closely involved in various human diseases. The accurate computational identification of sumoylation sites in protein sequences aids in experimental design and mechanistic research in cellular biology. In this study, we introduced amino acid hydrophobicity as a parameter into a traditional binary encoding scheme and developed a novel sumoylation site prediction tool termed SUMOhydro. With the assistance of a support vector machine, the proposed method was trained and tested using a stringent non-redundant sumoylation dataset. In a leave-one-out cross-validation, the proposed method yielded an excellent performance with a correlation coefficient, specificity, sensitivity and accuracy equal to 0.690, 98.6%, 71.1% and 97.5%, respectively. In addition, SUMOhydro has been benchmarked against previously described predictors based on an independent dataset, thereby suggesting that the introduction of hydrophobicity as an additional parameter could assist in the prediction of sumoylation sites. Currently, SUMOhydro is freely accessible at http://protein.cau.edu.cn/others/SUMOhydro/

    Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

    Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper proposes the incorporation of ultrasound tongue images to improve the performance of lip-based AV-SE systems further. To address the challenge of acquiring ultrasound tongue images during inference, we first propose to employ knowledge distillation during training to investigate the feasibility of leveraging tongue-related information without directly inputting ultrasound tongue images. Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge. To better model the alignment between the lip and tongue modalities, we further propose the introduction of a lip-tongue key-value memory network into the AV-SE model. This network enables the retrieval of tongue features based on readily available lip features, thereby assisting the subsequent speech enhancement task. Experimental results demonstrate that both methods significantly improve the quality and intelligibility of the enhanced speech compared to traditional lip-based AV-SE baselines. Moreover, both proposed methods exhibit strong generalization performance on unseen speakers and in the presence of unseen noises. Furthermore, phone error rate (PER) analysis of automatic speech recognition (ASR) reveals that while all phonemes benefit from introducing ultrasound tongue images, palatal and velar consonants benefit most. Comment: Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing. arXiv admin note: text overlap with arXiv:2305.1493

    A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest

    Large Language Models (LLMs), despite their great power in language generation, often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains. This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources, and the adaptive training of a chatbot with domain-specific inquiries. Our two-step approach starts from training a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents through a chain-of-thought reasoning process. Subsequently, we blend the mined QA pairs with a conversational dataset to fine-tune the LLM as a chatbot, thereby enriching its domain-specific expertise and conversational capabilities. We also developed a new evaluation benchmark which comprises four domain-specific text corpora and associated human-crafted QA pairs for testing. Our model shows remarkable performance improvement over generally aligned LLM and surpasses domain-adapted models directly fine-tuned on domain corpus. In particular, LLMiner achieves this with minimal human intervention, requiring only 600 seed instances, thereby providing a pathway towards self-improvement of LLMs through model-synthesized training data. Comment: Work in progress

    Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

    Understanding how the surrounding environment changes is crucial for performing downstream tasks safely and reliably in autonomous driving applications. Recent occupancy estimation techniques using only camera images as input can provide dense occupancy representations of large-scale scenes based on the current observation. However, they are mostly limited to representing the current 3D space and do not consider the future state of surrounding objects along the time axis. To extend camera-only occupancy estimation into spatiotemporal prediction, we propose Cam4DOcc, a new benchmark for camera-only 4D occupancy forecasting, evaluating the surrounding scene changes in a near future. We build our benchmark based on multiple publicly available datasets, including nuScenes, nuScenes-Occupancy, and Lyft-Level5, which provides sequential occupancy states of general movable and static objects, as well as their 3D backward centripetal flow. To establish this benchmark for future research with comprehensive comparisons, we introduce four baseline types from diverse camera-based perception and prediction implementations, including a static-world occupancy model, voxelization of point cloud prediction, 2D-3D instance-based prediction, and our proposed novel end-to-end 4D occupancy forecasting network. Furthermore, the standardized evaluation protocol for preset multiple tasks is also provided to compare the performance of all the proposed baselines on present and future occupancy estimation with respect to objects of interest in autonomous driving scenarios. The dataset and our implementation of all four baselines in the proposed Cam4DOcc benchmark will be released here: https://github.com/haomo-ai/Cam4DOcc

    1,4-Bis(5-methyl-1H-1,2,4-triazol-3-yl)benzene tetrahydrate

    In the title compound, C12H12N6·4H2O, the two triazole rings adopt a cis configuration with a crystallographic twofold axis passing through the central benzene group. The benzene and triazole rings are almost coplanar with a dihedral angle of 5.5 (1)°. In the crystal, water molecules are joined together by OW—H⋯OW hydrogen bonds to form a one-dimensional zigzag chain. These water chains are further connected to the organic molecule, forming a three-dimensional network by intermolecular OW—H⋯N and N—H⋯OW hydrogen bonds. Moreover, π–π stacking interactions between triazole rings [centroid–centroid distances = 3.667 (1)–3.731 (1) Å] are observed. One of the water molecules shows one of the H atoms to be disordered over two positions.

    GABA, progesterone and zona pellucida activation of PLA2 and regulation by MEK-ERK1/2 during acrosomal exocytosis in guinea pig spermatozoa

    We investigated whether GABA activates phospholipase A2 (PLA2) during acrosomal exocytosis, and if the MEK-ERK1/2 pathway modulates PLA2 activation initiated by GABA, progesterone or zona pellucida (ZP). In guinea pig spermatozoa prelabelled with [14C]arachidonic acid or [14C]choline chloride, GABA stimulated a decrease in phosphatidylcholine (PC), and release of arachidonic acid and lysoPC, during exocytosis. These lipid changes are indicative of PLA2 activation and appear essential for exocytosis since inclusion of aristolochic acid (a PLA2 inhibitor) abrogated them, along with exocytosis. GABA activation of PLA2 seems to be mediated, at least in part, by diacylglycerol (DAG) and protein kinase C since inclusion of the DAG kinase inhibitor R59022 enhanced PLA2 activity and exocytosis stimulated by GABA, whereas exposure to staurosporine decreased both. GABA-, progesterone- and ZP-induced release of arachidonic acid and exocytosis were prevented by U0126 and PD98059 (MEK inhibitors). Taken together, our results suggest that PLA2 plays a fundamental role in agonist-stimulated exocytosis and that MEK-ERK1/2 are involved in PLA2 regulation during this process.

    Clinical values of multiple Epstein-Barr virus (EBV) serological biomarkers detected by xMAP technology

    Background: Serological examination of Epstein-Barr virus (EBV) antibodies has been performed for screening nasopharyngeal carcinoma (NPC) and other EBV-associated diseases. Methods: By using xMAP technology, we examined immunoglobulin (Ig) A antibodies against Epstein-Barr virus (EBV) VCA-gp125, p18 and IgA/IgG against EA-D, EBNA1 and gp78 in populations with distinct diseases, or with different genetic or geographic background. Sera from Cantonese NPC patients (n = 547) and healthy controls (n = 542), 90 members of high-risk NPC families and 52 non-endemic healthy individuals were tested. Thirty-five of NPC patients were recruited to observe the kinetics of EBV antibody levels during and after treatment. Patients with other EBV-associated diseases were collected, including 16 with infectious mononucleosis, 28 with nasal NK/T cell lymphoma and 14 with Hodgkin's disease. Results: Both the sensitivity and specificity of each marker for NPC diagnosis ranged 61–84%, but if combined, they could reach to 84.5% and 92.4%, respectively. Almost half of NPC patients displayed decreased EBV immunoactivities shortly after therapy and tumor recurrence was accompanied with high EBV antibody reactivates. Neither the unaffected members from high-risk NPC families nor non-endemic healthy population showed statistically different EBV antibody levels compared with endemic controls. Moreover, elevated levels of specific antibodies were observed in other EBV-associated diseases, but all were lower than those in NPC. Conclusion: Combined EBV serological biomarkers could improve the diagnostic values for NPC. Diverse EBV serological spectrums presented in populations with different EBV-associated diseases, but NPC patients have the highest EBV activity.

    RRM1 single nucleotide polymorphism -37C→A correlates with progression-free survival in NSCLC patients after gemcitabine-based chemotherapy

    Background: The ribonucleotide reductase M1 (RRM1) gene encodes the regulatory subunit of ribonucleotide reductase, the molecular target of gemcitabine. The overexpression of RRM1 mRNA in tumor tissues is reported to be associated with gemcitabine resistance. Thus, single nucleotide polymorphisms (SNPs) of the RRM1 gene are potential biomarkers of the response to gemcitabine chemotherapy. We investigated whether RRM1 expression in peripheral blood mononuclear cells (PBMCs) or SNPs were associated with clinical outcome after gemcitabine-based chemotherapy in advanced non-small cell lung cancer (NSCLC) patients. Methods: PBMC samples were obtained from 62 stage IIIB and IV patients treated with gemcitabine-based chemotherapy. RRM1 mRNA expression levels were assessed by real-time PCR. Three RRM1 SNPs, -37C→A, 2455A→G and 2464G→A, were assessed by direct sequencing. Results: RRM1 expression was detectable in 57 PBMC samples, and SNPs were sequenced in 56 samples. The overall response rate to gemcitabine was 18%; there was no significant association between RRM1 mRNA expression and response rate (P = 0.560). The median progression-free survival (PFS) was 23.3 weeks in the lower expression group and 26.9 weeks in the higher expression group (P = 0.659). For the -37C→A polymorphism, the median PFS was 30.7 weeks in the C(-)37A group, 24.7 weeks in the A(-)37A group, and 23.3 weeks in the C(-)37C group (P = 0.043). No significant difference in PFS was observed for the SNP 2455A→G or 2464G→A. Conclusions: The RRM1 polymorphism -37C→A correlated with PFS in NSCLC patients treated with gemcitabine-based chemotherapy. No significant correlation was found between PBMC RRM1 mRNA expression and the efficacy of gemcitabine.