End-to-End Lyrics Recognition with Self-supervised Learning
Lyrics recognition is an important task in music processing. Despite
traditional algorithms such as the hybrid HMM-TDNN model achieving good
performance, studies on applying end-to-end models and self-supervised learning
(SSL) are limited. In this paper, we first establish an end-to-end baseline for
lyrics recognition and then explore the performance of SSL models on the
lyrics recognition task. We evaluate a variety of upstream SSL models with different
training methods (masked reconstruction, masked prediction, autoregressive
reconstruction, and contrastive learning). Our end-to-end self-supervised
models, evaluated on the DAMP music dataset, outperform the previous
state-of-the-art (SOTA) system by 5.23% on the dev set and 2.4% on the test
set, even without a language model trained on a large corpus. Moreover, we
investigate the effect of background music on the performance of the SSL
models and conclude that they cannot extract features effectively in the
presence of background music. Finally, we study the
out-of-domain generalization ability of the SSL features considering that those
models were not trained on music datasets.
Comment: 4 pages, 2 figures, 3 tables
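
To make the upstream-downstream setup concrete, the sketch below uses a pre-trained SSL model as a frozen feature extractor with a lightweight CTC head on top, a common recipe for end-to-end lyrics recognition. The checkpoint, input file, and vocabulary size are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: frozen SSL upstream + lightweight CTC downstream head.
# WAV2VEC2_BASE, "vocals.wav", and the 32-symbol vocabulary are assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE             # contrastive SSL upstream
ssl_model = bundle.get_model().eval()                   # frozen feature extractor

waveform, sr = torchaudio.load("vocals.wav")            # hypothetical sung clip
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    feats, _ = ssl_model.extract_features(waveform)     # per-layer feature list

vocab_size = 32                                         # assumed character set
ctc_head = torch.nn.Linear(feats[-1].shape[-1], vocab_size)
log_probs = ctc_head(feats[-1]).log_softmax(dim=-1)     # (batch, frames, vocab)
```

Swapping the bundle for a masked-prediction or autoregressive upstream changes only the first two lines, which is what makes this setup convenient for comparing SSL training objectives.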
A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters
The detection of abnormal fetal heartbeats during pregnancy is important for
monitoring the health condition of the fetus. While adult ECG analysis has
advanced considerably in modern medicine, noninvasive fetal electrocardiography (FECG)
remains a great challenge. In this paper, we introduce a new method based on
affine combinations of adaptive filters to extract FECG signals. The affine
combination of multiple filters is able to precisely fit the reference signal,
and thus obtain more accurate FECGs. We propose a method that combines the
Least Mean Square (LMS) and Recursive Least Squares (RLS) filters, and find
that the Combined Recursive Least Squares (CRLS) filter achieves the best
performance among all proposed combinations. In addition, we find that CRLS is
more advantageous in extracting FECG from abdominal electrocardiograms (AECG)
with a small signal-to-noise ratio (SNR). Compared with the state-of-the-art
MSF-ANC method, CRLS improves sensitivity, accuracy, and F1 score by 3.58%,
2.39%, and 1.36%, respectively.
Comment: 5 pages, 4 figures, 3 tables
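
To illustrate the core idea, here is a minimal NumPy sketch of an affine combination of an LMS and an RLS filter used for adaptive noise cancellation: both filters predict the maternal component from a reference channel, an adaptive affine coefficient mixes their outputs, and the residual serves as the FECG estimate. The filter order, step sizes, and the gradient update for the mixing coefficient are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch: affine combination of LMS and RLS adaptive filters for FECG
# extraction by noise cancellation. Hyperparameters are illustrative.
import numpy as np

def affine_lms_rls(d, x, order=16, mu=0.01, lam=0.999, delta=1e-2, mu_a=0.1):
    """d: abdominal ECG (desired), x: maternal reference. Returns residual."""
    w_lms, w_rls = np.zeros(order), np.zeros(order)
    P = np.eye(order) / delta                  # RLS inverse correlation matrix
    a = 0.5                                    # affine mixing coefficient
    e = np.zeros(len(d))
    for k in range(order, len(d)):
        u = x[k - order:k][::-1]               # regressor, most recent first
        y1, y2 = w_lms @ u, w_rls @ u          # per-filter maternal estimates
        e[k] = d[k] - (a * y1 + (1 - a) * y2)  # residual ~ fetal ECG
        w_lms += mu * (d[k] - y1) * u          # LMS stochastic-gradient update
        Pu = P @ u                             # RLS gain and covariance update
        g = Pu / (lam + u @ Pu)
        w_rls += g * (d[k] - y2)
        P = (P - np.outer(g, Pu)) / lam
        a = np.clip(a + mu_a * e[k] * (y1 - y2), 0.0, 1.0)  # adapt the mix
    return e
```

The combination inherits the fast convergence of RLS and the robustness of LMS: whichever filter currently tracks the maternal component better pulls the mixing coefficient toward itself.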
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
In this work, we study the features extracted by English self-supervised
learning (SSL) models in cross-lingual contexts and propose a new metric to
predict the quality of feature representations. Using automatic speech
recognition (ASR) as a downstream task, we analyze the effect of model size,
training objectives, and model architecture on the models' performance as a
feature extractor for a set of typologically diverse corpora. We develop a
novel metric, the Phonetic-Syntax Ratio (PSR), to measure the phonetic and
syntactic information in the extracted representations using deep generalized
canonical correlation analysis. Results show that the contrastive loss in the
wav2vec2.0 objective facilitates more effective cross-lingual feature
extraction. There is a positive correlation between PSR scores and ASR
performance, suggesting that phonetic information extracted by monolingual SSL
models can be used for downstream tasks in cross-lingual settings. The proposed
metric is an effective indicator of the quality of the representations and can
be useful for model selection.
Comment: 12 pages, 5 figures, 4 tables
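
As a rough illustration of how such a ratio could be computed, the sketch below scores the phonetic and syntactic content of frame-level representations with plain linear CCA from scikit-learn and takes their ratio. The paper's metric is built on deep generalized CCA; the random arrays, label sets, and component count here are placeholder assumptions.

```python
# Illustrative phonetic-to-syntactic ratio using linear CCA as a stand-in
# for the deep generalized CCA the paper uses. All data here is synthetic.
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_score(reps, labels, k=4):
    """Mean correlation of the top-k canonical component pairs."""
    Xc, Yc = CCA(n_components=k).fit_transform(reps, labels)
    return np.mean([np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(k)])

rng = np.random.default_rng(0)
reps = rng.standard_normal((1000, 768))            # frame-level SSL features
phones = np.eye(40)[rng.integers(0, 40, 1000)]     # phonetic labels (one-hot)
pos_tags = np.eye(17)[rng.integers(0, 17, 1000)]   # syntactic labels (one-hot)

ratio = cca_score(reps, phones) / cca_score(reps, pos_tags)
print(f"phonetic-to-syntactic ratio: {ratio:.3f}")
```

A ratio above 1 under this stand-in would indicate that the representations align more strongly with phonetic structure than with syntactic structure, mirroring how the PSR is read.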
PQLM -- Multilingual Decentralized Portable Quantum Language Model for Privacy Protection
With careful manipulation, malicious agents can reverse engineer private
information encoded in pre-trained language models. Security concerns motivate
the development of quantum pre-training. In this work, we propose a highly
portable quantum language model (PQLM) that can easily transmit information to
downstream tasks on classical machines. The framework consists of a cloud PQLM
built with random Variational Quantum Classifiers (VQC) and local models for
downstream applications. We demonstrate the ad hoc portability of the quantum
model by extracting only the word embeddings and effectively applying them to
downstream tasks on classical machines. Our PQLM exhibits comparable
performance to its classical counterpart on both intrinsic evaluation (loss,
perplexity) and extrinsic evaluation (multilingual sentiment analysis accuracy)
metrics. We also perform ablation studies on the factors affecting PQLM
performance to analyze model stability. Our work establishes a theoretical
foundation for a portable quantum pre-trained language model that could be
trained on private data and made available for public use with privacy
protection guarantees.
Comment: 5 pages, 3 figures, 3 tables
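
For intuition, here is a minimal PennyLane sketch of a variational quantum classifier whose measured expectation values form a classical word-embedding vector, the kind of quantity that can leave the cloud model and feed downstream tasks on classical machines. The qubit count, circuit layout, and angle encoding are illustrative assumptions, not the paper's architecture.

```python
# Sketch: a VQC whose Pauli-Z expectations serve as a portable word embedding.
# Circuit shape and the random parameters are illustrative assumptions.
import numpy as np
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(word_angles, weights):
    qml.AngleEmbedding(word_angles, wires=range(n_qubits))        # encode token
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # trainable VQC
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = np.random.uniform(0, 2 * np.pi, size=shape)
embedding = np.array(vqc(np.random.uniform(0, np.pi, n_qubits), weights))
print(embedding)   # a 4-dim classical vector: only measurements leave the QPU
```

Because only the measured embedding crosses the quantum-classical boundary, the downstream side needs no quantum hardware, which is the portability the abstract emphasizes.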
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules
Incorporating language-specific (LS) modules is a proven method to boost
performance in multilingual machine translation. This approach bears similarity
to Mixture-of-Experts (MoE) because it does not inflate FLOPs. However, the
scalability of this approach to hundreds of languages (experts) becomes
unmanageable due to the prohibitive number of parameters introduced by
full-rank matrices in fully-connected layers. In this work, we introduce the
Language-Specific Matrix Synthesis (LMS) method. This approach constructs LS
modules by generating low-rank matrices from two significantly smaller matrices
to approximate the full-rank matrix. Furthermore, we condense multilingual
knowledge from multiple LS modules into a single shared module with the Fuse
Distillation (FD) technique to improve the efficiency of inference and model
serialization. We show that our LMS method significantly outperforms previous
LS methods and MoE methods with the same number of extra parameters, e.g., 1.73
BLEU points over the Switch Transformer on many-to-many multilingual machine
translation. Importantly, LMS achieves comparable translation performance
with far fewer parameters.
Comment: Accepted at the main conference of EMNLP 2023
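
A short PyTorch sketch of the low-rank construction described above: each language's module is synthesized as the product of two small matrices and added to a shared projection, so per-language parameters grow as 2*d*r rather than d^2. The dimensions and the additive composition are assumptions for illustration.

```python
# Sketch: Language-Specific Matrix Synthesis, where each language's module
# is a low-rank product A_l @ B_l added to a shared layer. Sizes are assumed.
import torch
import torch.nn as nn

class LowRankLSLayer(nn.Module):
    def __init__(self, d_model=512, rank=16, n_langs=100):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)        # shared full-rank path
        # Two small matrices per language synthesize a d_model x d_model matrix
        # with 2 * d_model * rank parameters instead of d_model ** 2.
        self.A = nn.Parameter(torch.randn(n_langs, d_model, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(n_langs, rank, d_model) * 0.02)

    def forward(self, x, lang_id):
        ls_weight = self.A[lang_id] @ self.B[lang_id]    # synthesized LS matrix
        return self.shared(x) + x @ ls_weight            # shared + LS output

layer = LowRankLSLayer()
out = layer(torch.randn(8, 32, 512), lang_id=3)          # (batch, seq, d_model)
print(out.shape)
```

With rank 16 and d_model 512, each language adds roughly 16K parameters instead of the 262K a full-rank module would need, which is what makes scaling to hundreds of languages tractable.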