10 research outputs found

    Knowledge Transfer from Pre-trained Language Models to CIF-based Speech Recognizers via Hierarchical Distillation

    Full text link
    Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks. Leveraging the capabilities of PLMs to enhance automatic speech recognition (ASR) systems has also emerged as a promising research direction. However, previous works may be limited by the inflexible structures and the insufficient utilization of PLMs. To alleviate these problems, we propose hierarchical knowledge distillation (HKD) for continuous integrate-and-fire (CIF) based ASR models. To transfer knowledge from PLMs to ASR models, HKD employs cross-modal knowledge distillation with a contrastive loss at the acoustic level and knowledge distillation with a regression loss at the linguistic level. Compared with the original CIF-based model, our method achieves 15% and 9% relative error rate reductions on the AISHELL-1 and LibriSpeech datasets, respectively. (Comment: Accepted by INTERSPEECH 2023)
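    As a rough illustration of the two objectives named in the abstract, the sketch below pairs an InfoNCE-style contrastive loss over aligned acoustic/PLM embeddings with an MSE regression loss on linguistic hidden states. It is a minimal PyTorch sketch; the function names, dimensions, and temperature are our assumptions for illustration, not the paper's code.

        import torch
        import torch.nn.functional as F

        def contrastive_distill_loss(acoustic, plm, temperature=0.07):
            # InfoNCE over time-aligned (acoustic, PLM) embedding pairs:
            # matched pairs sit on the diagonal of the similarity matrix.
            a = F.normalize(acoustic, dim=-1)
            p = F.normalize(plm, dim=-1)
            logits = a @ p.t() / temperature         # (T, T) similarities
            targets = torch.arange(a.size(0))        # diagonal = positives
            return F.cross_entropy(logits, targets)

        def regression_distill_loss(student_hidden, plm_hidden):
            # L2 regression of student linguistic states onto the PLM's.
            return F.mse_loss(student_hidden, plm_hidden)

        # Toy usage: 8 CIF-segmented tokens, 256-dim embeddings.
        T, D = 8, 256
        acoustic = torch.randn(T, D, requires_grad=True)    # stands in for student acoustic outputs
        linguistic = torch.randn(T, D, requires_grad=True)  # stands in for student linguistic states
        loss = (contrastive_distill_loss(acoustic, torch.randn(T, D))
                + regression_distill_loss(linguistic, torch.randn(T, D)))
        loss.backward()  # gradients would flow into the student in a real training loop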

    Calculation and experimental verification of force-magnetic coupling model of magnetised rail based on density functional theory

    Get PDF
    Metal magnetic memory (MMM) is a widely used non-destructive electromagnetic detection technology, but the analysis of its underlying principle is still insufficient. A force-magnetic coupling model is a sound starting point for studying the principle of MMM. In this paper, a force-magnetic coupling model of steel is established based on density functional theory (DFT) using the CASTEP first-principles analysis software. To simulate the practical working environment, the residual magnetism in the rail is assumed to change with the stress on the rail. By applying different stresses to the model, the relationships between atomic magnetic moment, lattice constant, and stress are explored, as well as the causes of magnetic signals in the stress concentration zone. It is revealed that the atomic magnetic moment and the crystal volume decrease as compressive stress increases. The magnetic signal on the surface of the magnetised metal component likewise decreases with increasing compressive stress, while tensile stress shows the opposite tendency. Generally speaking, the change in atomic magnetic moment and crystal volume caused by lattice distortion under stress can be seen as the fundamental reason for the change in magnetic signal on the surface of the magnetised metal. A rail bending experiment verifies this conclusion, showing that the normal magnetic field in the stress concentration zone decreases with increasing compressive stress.
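    For a flavour of the lattice-compression side of such a calculation, here is a minimal sketch using the open-source ASE library (an assumption for illustration; the paper uses CASTEP). It builds a bcc iron cell, applies compressive strain by shrinking the lattice, and leaves a slot for a spin-polarized DFT calculator that would yield the relaxed atomic magnetic moments.

        from ase.build import bulk

        atoms = bulk('Fe', 'bcc', a=2.87, cubic=True)            # 2-atom bcc Fe cell
        atoms.set_initial_magnetic_moments([2.2] * len(atoms))   # ferromagnetic initial guess

        for strain in (0.0, -0.01, -0.02):                       # 0%, 1%, 2% compression
            cell = atoms.copy()
            cell.set_cell(cell.cell[:] * (1.0 + strain), scale_atoms=True)
            # cell.calc = ...  # attach a spin-polarized DFT calculator (e.g. CASTEP) here
            print(strain, cell.get_volume())                     # crystal volume shrinks under compression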

    Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition

    No full text
    The spiking neural network (SNN) using leaky integrate-and-fire (LIF) neurons has been commonly used in automatic speech recognition (ASR) tasks. However, the LIF neuron is still simple compared to neurons in the biological brain, and further research on neuron types with different scales of neuronal dynamics is necessary. Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated by a spiking transformer, yielding the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). We found that the DyTr-SNN handles a non-toy automatic speech recognition task well, achieving a lower phoneme error rate, lower computational cost, and higher robustness. These results indicate that closer cooperation between SNNs and neural dynamics at the neuron and network scales may hold much promise, especially for ASR tasks.
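    For reference, the baseline LIF dynamics the abstract builds on can be written in a few lines. The discrete-time update below is a generic textbook form with illustrative parameter values, not the DyTr-SNN configuration.

        import numpy as np

        def lif_step(v, x, tau=2.0, v_th=1.0, v_reset=0.0):
            # Leak the membrane potential toward the input, spike on
            # threshold crossing, then hard-reset the spiking neurons.
            v = v + (x - v) / tau
            spike = (v >= v_th).astype(v.dtype)
            v = np.where(spike > 0, v_reset, v)
            return v, spike

        v = np.zeros(4)                        # four neurons
        for x in np.random.rand(10, 4):        # ten time steps of input
            v, spikes = lif_step(v, x)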

    A Behavior-Driven Forum Spammer Recognition Method with Its Application in Automobile Forums

    No full text
    Mathematical Problems in Engineering, 2021, Article ID 7682579. DOI: 10.1155/2021/7682579

    VLP: A Survey on Vision-Language Pre-training

    Full text link
    In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) into a new era. Substantial work has shown that such models benefit downstream uni-modal tasks and avoid training a new model from scratch. Can such pre-trained models also be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. We then summarize specific VLP models in detail. Finally, we discuss new frontiers in VLP. To the best of our knowledge, this is the first survey focused on VLP. We hope that this survey can shed light on future research in the VLP field.

    X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    Full text link
    Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4, based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond those of previous visual language models. We attribute this to the use of more advanced LLMs than in previous multimodal models. Unfortunately, the model architecture and training strategies of GPT-4 are unknown. To endow LLMs with multimodal capabilities, we propose X-LLM, which converts multi-modalities (images, speech, videos) into foreign languages using X2L interfaces and feeds them into a large language model (ChatGLM). Specifically, X-LLM aligns multiple frozen single-modal encoders and a frozen LLM using X2L interfaces, where "X" denotes a modality such as image, speech, or video, and "L" denotes languages. X-LLM's training consists of three stages: (1) Converting multimodal information: the first stage trains each X2L interface to align with its respective single-modal encoder separately, converting multimodal information into language. (2) Aligning X2L representations with the LLM: single-modal encoders are aligned with the LLM through X2L interfaces independently. (3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces jointly, integrating multimodal capabilities into the LLM. Our experiments show that X-LLM demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images and instructions, and yields an 84.5% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. We also conduct quantitative tests on using LLMs for ASR and multimodal ASR, hoping to promote the era of LLM-based speech recognition.
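    To make the interface idea concrete, here is a minimal PyTorch sketch of an X2L-style adapter: learnable queries attend over frozen single-modal encoder features and are projected into the LLM's token-embedding space, so the modality reads to the LLM like a "foreign language". The module name, the query-attention design, and all sizes are our assumptions for illustration, not the X-LLM implementation.

        import torch
        import torch.nn as nn

        class X2LInterface(nn.Module):
            def __init__(self, enc_dim=1024, llm_dim=4096, n_query=32):
                super().__init__()
                # Learnable queries summarize variable-length encoder output;
                # a linear layer then maps them into LLM embedding space.
                self.query = nn.Parameter(torch.randn(n_query, enc_dim) * 0.02)
                self.attn = nn.MultiheadAttention(enc_dim, num_heads=8, batch_first=True)
                self.proj = nn.Linear(enc_dim, llm_dim)

            def forward(self, feats):                  # feats: (B, T, enc_dim), from a frozen encoder
                q = self.query.unsqueeze(0).expand(feats.size(0), -1, -1)
                out, _ = self.attn(q, feats, feats)    # (B, n_query, enc_dim)
                return self.proj(out)                  # (B, n_query, llm_dim)

        # Only the interface trains; the encoder and the LLM stay frozen.
        x2l = X2LInterface()
        pseudo_tokens = x2l(torch.randn(2, 50, 1024))  # fed to the LLM as soft "language" tokens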

    LPCAT1 enhances castration resistant prostate cancer progression via increased mRNA synthesis and PAF production.

    Get PDF
    Our previous study showed that lysophosphatidylcholine acyltransferase 1 (LPCAT1) is overexpressed in castration-resistant prostate cancer (CRPC) relative to primary prostate cancer (PCa), and that androgen controls its expression via the Wnt signaling pathway. Although highly expressed in CRPC, the role of LPCAT1 remains unclear. In vitro experiments included cell transfection, mutagenesis, assays of proliferation, migration, invasion, cell cycle progression and apoptosis, Western blotting, and pulse-chase RNA labeling; BALB/c nude mice were used for in vivo experiments. We found that LPCAT1 overexpression enhanced the proliferation, migration, and invasion of CRPC cells both in vitro and in vivo, while silencing of LPCAT1 reduced the proliferation and the invasive capabilities of CRPC cells. Providing exogenous PAF to LPCAT1 knockdown cells increased their invasive capabilities; however, platelet-activating factor acetylhydrolase (PAF-AH) and the PAFR antagonist ABT-491 both reversed this phenotype, and proliferation of CRPC cells was not affected in either model. LPCAT1 was found to mediate CRPC growth via nuclear re-localization and histone H4 palmitoylation in an androgen-dependent fashion, increasing mRNA synthesis rates. We also found that LPCAT1 overexpression rendered CRPC cells resistant to paclitaxel treatment. In summary, LPCAT1 overexpression in CRPC cells drives tumor progression via increased mRNA synthesis and PAF production. Our results highlight LPCAT1 as a viable therapeutic target in the context of CRPC.