10 research outputs found

    Knowledge Transfer from Pre-trained Language Models to CIF-based Speech Recognizers via Hierarchical Distillation

    Full text link
    Large-scale pre-trained language models (PLMs) have shown great potential in natural language processing tasks. Leveraging the capabilities of PLMs to enhance automatic speech recognition (ASR) systems has also emerged as a promising research direction. However, previous works may be limited by the inflexible structures and the insufficient utilization of PLMs. To alleviate these problems, we propose hierarchical knowledge distillation (HKD) for continuous integrate-and-fire (CIF) based ASR models. To transfer knowledge from PLMs to ASR models, HKD employs cross-modal knowledge distillation with a contrastive loss at the acoustic level and knowledge distillation with a regression loss at the linguistic level. Compared with the original CIF-based model, our method achieves 15% and 9% relative error rate reductions on the AISHELL-1 and LibriSpeech datasets, respectively. (Comment: Accepted by INTERSPEECH 2023)
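    As a rough illustration of the two objectives named in the abstract, the sketch below pairs an InfoNCE-style contrastive loss over aligned acoustic/PLM embeddings with an MSE regression loss on linguistic hidden states. It is a minimal PyTorch sketch; the function names, dimensions, and temperature are our assumptions for illustration, not the paper's code.

        import torch
        import torch.nn.functional as F

        def contrastive_distill_loss(acoustic, plm, temperature=0.07):
            # InfoNCE over time-aligned (acoustic, PLM) embedding pairs:
            # matched pairs sit on the diagonal of the similarity matrix.
            a = F.normalize(acoustic, dim=-1)
            p = F.normalize(plm, dim=-1)
            logits = a @ p.t() / temperature         # (T, T) similarities
            targets = torch.arange(a.size(0))        # diagonal = positives
            return F.cross_entropy(logits, targets)

        def regression_distill_loss(student_hidden, plm_hidden):
            # L2 regression of student linguistic states onto the PLM's.
            return F.mse_loss(student_hidden, plm_hidden)

        # Toy usage: 8 CIF-segmented tokens, 256-dim embeddings.
        T, D = 8, 256
        acoustic = torch.randn(T, D, requires_grad=True)    # stands in for student acoustic outputs
        linguistic = torch.randn(T, D, requires_grad=True)  # stands in for student linguistic states
        loss = (contrastive_distill_loss(acoustic, torch.randn(T, D))
                + regression_distill_loss(linguistic, torch.randn(T, D)))
        loss.backward()  # gradients would flow into the student in a real training loop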

    Calculation and experimental verification of force-magnetic coupling model of magnetised rail based on density functional theory

    Get PDF
    Metal magnetic memory (MMM) is a widely used non-destructive electromagnetic detection technology, but the analysis of its underlying principle is still insufficient. A force-magnetic coupling model is a sound starting point for studying the principle of MMM. In this paper, a force-magnetic coupling model of steel is established based on density functional theory (DFT) using the CASTEP first-principles analysis software. To simulate the practical working environment, the residual magnetism in the rail is assumed to change with the stress on the rail. By applying different stresses to the model, the relationships between atomic magnetic moment, lattice constant, and stress are explored, as well as the causes of magnetic signals in the stress concentration zone. It is revealed that the atomic magnetic moment and the crystal volume decrease as compressive stress increases. The magnetic signal on the surface of the magnetised metal component likewise decreases with increasing compressive stress, while tensile stress shows the opposite tendency. Generally speaking, the change in atomic magnetic moment and crystal volume caused by lattice distortion under stress can be seen as the fundamental reason for the change in magnetic signal on the surface of the magnetised metal. A rail bending experiment verifies this conclusion, showing that the normal magnetic field in the stress concentration zone decreases with increasing compressive stress.
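    For a flavour of the lattice-compression side of such a calculation, here is a minimal sketch using the open-source ASE library (an assumption for illustration; the paper uses CASTEP). It builds a bcc iron cell, applies compressive strain by shrinking the lattice, and leaves a slot for a spin-polarized DFT calculator that would yield the relaxed atomic magnetic moments.

        from ase.build import bulk

        atoms = bulk('Fe', 'bcc', a=2.87, cubic=True)            # 2-atom bcc Fe cell
        atoms.set_initial_magnetic_moments([2.2] * len(atoms))   # ferromagnetic initial guess

        for strain in (0.0, -0.01, -0.02):                       # 0%, 1%, 2% compression
            cell = atoms.copy()
            cell.set_cell(cell.cell[:] * (1.0 + strain), scale_atoms=True)
            # cell.calc = ...  # attach a spin-polarized DFT calculator (e.g. CASTEP) here
            print(strain, cell.get_volume())                     # crystal volume shrinks under compression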

    Complex Dynamic Neurons Improved Spiking Transformer Network for Efficient Automatic Speech Recognition

    No full text
    The spiking neural network (SNN) using leaky integrate-and-fire (LIF) neurons has been commonly used in automatic speech recognition (ASR) tasks. However, the LIF neuron is still simple compared to neurons in the biological brain, and further research on neuron types with different scales of neuronal dynamics is necessary. Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated by a spiking transformer, yielding the complex dynamic neuron improved spiking transformer neural network (DyTr-SNN). We found that the DyTr-SNN handles a non-toy automatic speech recognition task well, achieving a lower phoneme error rate, lower computational cost, and higher robustness. These results indicate that closer cooperation between SNNs and neural dynamics at the neuron and network scales may hold much promise, especially for ASR tasks.
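    For reference, the baseline LIF dynamics the abstract builds on can be written in a few lines. The discrete-time update below is a generic textbook form with illustrative parameter values, not the DyTr-SNN configuration.

        import numpy as np

        def lif_step(v, x, tau=2.0, v_th=1.0, v_reset=0.0):
            # Leak the membrane potential toward the input, spike on
            # threshold crossing, then hard-reset the spiking neurons.
            v = v + (x - v) / tau
            spike = (v >= v_th).astype(v.dtype)
            v = np.where(spike > 0, v_reset, v)
            return v, spike

        v = np.zeros(4)                        # four neurons
        for x in np.random.rand(10, 4):        # ten time steps of input
            v, spikes = lif_step(v, x)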

    A Behavior-Driven Forum Spammer Recognition Method with Its Application in Automobile Forums

    No full text
    Mathematical Problems in Engineering, 2021, Article ID 7682579. DOI: 10.1155/2021/7682579

    VLP: A Survey on Vision-Language Pre-training

    Full text link
    In the past few years, the emergence of pre-training models has brought uni-modal fields such as computer vision (CV) and natural language processing (NLP) into a new era. Substantial work has shown that such models benefit downstream uni-modal tasks and avoid training a new model from scratch. Can such pre-trained models also be applied to multi-modal tasks? Researchers have explored this problem and made significant progress. This paper surveys recent advances and new frontiers in vision-language pre-training (VLP), including image-text and video-text pre-training. To give readers a better overall grasp of VLP, we first review its recent advances from five aspects: feature extraction, model architecture, pre-training objectives, pre-training datasets, and downstream tasks. We then summarize specific VLP models in detail. Finally, we discuss new frontiers in VLP. To the best of our knowledge, this is the first survey focused on VLP. We hope that this survey can shed light on future research in the VLP field.

    X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

    Full text link
    Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4, based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond those of previous visual language models. We attribute this to the use of more advanced LLMs than in previous multimodal models. Unfortunately, the model architecture and training strategies of GPT-4 are unknown. To endow LLMs with multimodal capabilities, we propose X-LLM, which converts multi-modalities (images, speech, videos) into foreign languages using X2L interfaces and feeds them into a large language model (ChatGLM). Specifically, X-LLM aligns multiple frozen single-modal encoders and a frozen LLM using X2L interfaces, where "X" denotes a modality such as image, speech, or video, and "L" denotes languages. X-LLM's training consists of three stages: (1) Converting multimodal information: the first stage trains each X2L interface to align with its respective single-modal encoder separately, converting multimodal information into language. (2) Aligning X2L representations with the LLM: single-modal encoders are aligned with the LLM through X2L interfaces independently. (3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces jointly, integrating multimodal capabilities into the LLM. Our experiments show that X-LLM demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images and instructions, and yields an 84.5% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. We also conduct quantitative tests on using LLMs for ASR and multimodal ASR, hoping to promote the era of LLM-based speech recognition.
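    To make the interface idea concrete, here is a minimal PyTorch sketch of an X2L-style adapter: learnable queries attend over frozen single-modal encoder features and are projected into the LLM's token-embedding space, so the modality reads to the LLM like a "foreign language". The module name, the query-attention design, and all sizes are our assumptions for illustration, not the X-LLM implementation.

        import torch
        import torch.nn as nn

        class X2LInterface(nn.Module):
            def __init__(self, enc_dim=1024, llm_dim=4096, n_query=32):
                super().__init__()
                # Learnable queries summarize variable-length encoder output;
                # a linear layer then maps them into LLM embedding space.
                self.query = nn.Parameter(torch.randn(n_query, enc_dim) * 0.02)
                self.attn = nn.MultiheadAttention(enc_dim, num_heads=8, batch_first=True)
                self.proj = nn.Linear(enc_dim, llm_dim)

            def forward(self, feats):                  # feats: (B, T, enc_dim), from a frozen encoder
                q = self.query.unsqueeze(0).expand(feats.size(0), -1, -1)
                out, _ = self.attn(q, feats, feats)    # (B, n_query, enc_dim)
                return self.proj(out)                  # (B, n_query, llm_dim)

        # Only the interface trains; the encoder and the LLM stay frozen.
        x2l = X2LInterface()
        pseudo_tokens = x2l(torch.randn(2, 50, 1024))  # fed to the LLM as soft "language" tokens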

    LPCAT1 enhances castration resistant prostate cancer progression via increased mRNA synthesis and PAF production.

    Get PDF
    Our previous study showed that lysophosphatidylcholine acyltransferase 1 (LPCAT1) is overexpressed in castration-resistant prostate cancer (CRPC) relative to primary prostate cancer (PCa), and that androgen controls its expression via the Wnt signaling pathway. Although highly expressed in CRPC, the role of LPCAT1 remains unclear. In vitro experiments included cell transfection, mutagenesis, assays of proliferation, migration, invasion, cell cycle progression and apoptosis, Western blotting, and pulse-chase RNA labeling; BALB/c nude mice were used for in vivo experiments. We found that LPCAT1 overexpression enhanced the proliferation, migration, and invasion of CRPC cells both in vitro and in vivo, while silencing of LPCAT1 reduced the proliferation and the invasive capabilities of CRPC cells. Providing exogenous PAF to LPCAT1 knockdown cells increased their invasive capabilities; however, platelet-activating factor acetylhydrolase (PAF-AH) and the PAFR antagonist ABT-491 both reversed this phenotype, and proliferation of CRPC cells was not affected in either model. LPCAT1 was found to mediate CRPC growth via nuclear re-localization and histone H4 palmitoylation in an androgen-dependent fashion, increasing mRNA synthesis rates. We also found that LPCAT1 overexpression rendered CRPC cells resistant to paclitaxel treatment. In summary, LPCAT1 overexpression in CRPC cells drives tumor progression via increased mRNA synthesis and PAF production. Our results highlight LPCAT1 as a viable therapeutic target in the context of CRPC.