End-to-End Lyrics Recognition with Self-supervised Learning
Lyrics recognition is an important task in music processing. Despite
traditional algorithms such as the hybrid HMM-TDNN model achieving good
performance, studies on applying end-to-end models and self-supervised learning
(SSL) are limited. In this paper, we first establish an end-to-end baseline for
lyrics recognition and then explore the performance of SSL models on the
lyrics recognition task. We evaluate a variety of upstream SSL models with different
training methods (masked reconstruction, masked prediction, autoregressive
reconstruction, and contrastive learning). Our end-to-end self-supervised
models, evaluated on the DAMP music dataset, outperform the previous
state-of-the-art (SOTA) system by 5.23% on the dev set and 2.4% on the test
set, even without a language model trained on a large corpus. Moreover, we
investigate the effect of background music on the performance of the SSL
models and conclude that they cannot extract features effectively in the
presence of background music. Finally, we study the
out-of-domain generalization ability of the SSL features considering that those
models were not trained on music datasets.
Comment: 4 pages, 2 figures, 3 tables
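
To make the upstream-downstream setup concrete, the sketch below uses a pre-trained SSL model as a frozen feature extractor with a lightweight CTC head on top, a common recipe for end-to-end lyrics recognition. The checkpoint, input file, and vocabulary size are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: frozen SSL upstream + lightweight CTC downstream head.
# WAV2VEC2_BASE, "vocals.wav", and the 32-symbol vocabulary are assumptions.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE             # contrastive SSL upstream
ssl_model = bundle.get_model().eval()                   # frozen feature extractor

waveform, sr = torchaudio.load("vocals.wav")            # hypothetical sung clip
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.no_grad():
    feats, _ = ssl_model.extract_features(waveform)     # per-layer feature list

vocab_size = 32                                         # assumed character set
ctc_head = torch.nn.Linear(feats[-1].shape[-1], vocab_size)
log_probs = ctc_head(feats[-1]).log_softmax(dim=-1)     # (batch, frames, vocab)
```

Swapping the bundle for a masked-prediction or autoregressive upstream changes only the first two lines, which is what makes this setup convenient for comparing SSL training objectives.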
A New Approach to Extract Fetal Electrocardiogram Using Affine Combination of Adaptive Filters
The detection of abnormal fetal heartbeats during pregnancy is important for
monitoring the health condition of the fetus. While adult ECG analysis has
advanced considerably in modern medicine, noninvasive fetal electrocardiography (FECG)
remains a great challenge. In this paper, we introduce a new method based on
affine combinations of adaptive filters to extract FECG signals. The affine
combination of multiple filters is able to precisely fit the reference signal,
and thus obtain more accurate FECGs. We propose a method that combines the
Least Mean Square (LMS) and Recursive Least Squares (RLS) filters, and find
that the Combined Recursive Least Squares (CRLS) filter achieves the best
performance among all proposed combinations. In addition, we find that CRLS is
more advantageous in extracting FECG from abdominal electrocardiograms (AECG)
with a small signal-to-noise ratio (SNR). Compared with the state-of-the-art
MSF-ANC method, CRLS improves sensitivity, accuracy, and F1 score by 3.58%,
2.39%, and 1.36%, respectively.
Comment: 5 pages, 4 figures, 3 tables
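
To illustrate the core idea, here is a minimal NumPy sketch of an affine combination of an LMS and an RLS filter used for adaptive noise cancellation: both filters predict the maternal component from a reference channel, an adaptive affine coefficient mixes their outputs, and the residual serves as the FECG estimate. The filter order, step sizes, and the gradient update for the mixing coefficient are assumptions for illustration, not the paper's exact algorithm.

```python
# Sketch: affine combination of LMS and RLS adaptive filters for FECG
# extraction by noise cancellation. Hyperparameters are illustrative.
import numpy as np

def affine_lms_rls(d, x, order=16, mu=0.01, lam=0.999, delta=1e-2, mu_a=0.1):
    """d: abdominal ECG (desired), x: maternal reference. Returns residual."""
    w_lms, w_rls = np.zeros(order), np.zeros(order)
    P = np.eye(order) / delta                  # RLS inverse correlation matrix
    a = 0.5                                    # affine mixing coefficient
    e = np.zeros(len(d))
    for k in range(order, len(d)):
        u = x[k - order:k][::-1]               # regressor, most recent first
        y1, y2 = w_lms @ u, w_rls @ u          # per-filter maternal estimates
        e[k] = d[k] - (a * y1 + (1 - a) * y2)  # residual ~ fetal ECG
        w_lms += mu * (d[k] - y1) * u          # LMS stochastic-gradient update
        Pu = P @ u                             # RLS gain and covariance update
        g = Pu / (lam + u @ Pu)
        w_rls += g * (d[k] - y2)
        P = (P - np.outer(g, Pu)) / lam
        a = np.clip(a + mu_a * e[k] * (y1 - y2), 0.0, 1.0)  # adapt the mix
    return e
```

The combination inherits the fast convergence of RLS and the robustness of LMS: whichever filter currently tracks the maternal component better pulls the mixing coefficient toward itself.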
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
In this work, we study the features extracted by English self-supervised
learning (SSL) models in cross-lingual contexts and propose a new metric to
predict the quality of feature representations. Using automatic speech
recognition (ASR) as a downstream task, we analyze the effect of model size,
training objectives, and model architecture on the models' performance as a
feature extractor for a set of typologically diverse corpora. We develop a
novel metric, the Phonetic-Syntax Ratio (PSR), to measure the phonetic and
syntactic information in the extracted representations using deep generalized
canonical correlation analysis. Results show that the contrastive loss in the
wav2vec2.0 objective facilitates more effective cross-lingual feature
extraction. There is a positive correlation between PSR scores and ASR
performance, suggesting that phonetic information extracted by monolingual SSL
models can be used for downstream tasks in cross-lingual settings. The proposed
metric is an effective indicator of the quality of the representations and can
be useful for model selection.
Comment: 12 pages, 5 figures, 4 tables
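
As a rough illustration of how such a ratio could be computed, the sketch below scores the phonetic and syntactic content of frame-level representations with plain linear CCA from scikit-learn and takes their ratio. The paper's metric is built on deep generalized CCA; the random arrays, label sets, and component count here are placeholder assumptions.

```python
# Illustrative phonetic-to-syntactic ratio using linear CCA as a stand-in
# for the deep generalized CCA the paper uses. All data here is synthetic.
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_score(reps, labels, k=4):
    """Mean correlation of the top-k canonical component pairs."""
    Xc, Yc = CCA(n_components=k).fit_transform(reps, labels)
    return np.mean([np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(k)])

rng = np.random.default_rng(0)
reps = rng.standard_normal((1000, 768))            # frame-level SSL features
phones = np.eye(40)[rng.integers(0, 40, 1000)]     # phonetic labels (one-hot)
pos_tags = np.eye(17)[rng.integers(0, 17, 1000)]   # syntactic labels (one-hot)

ratio = cca_score(reps, phones) / cca_score(reps, pos_tags)
print(f"phonetic-to-syntactic ratio: {ratio:.3f}")
```

A ratio above 1 under this stand-in would indicate that the representations align more strongly with phonetic structure than with syntactic structure, mirroring how the PSR is read.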
PQLM -- Multilingual Decentralized Portable Quantum Language Model for Privacy Protection
With careful manipulation, malicious agents can reverse engineer private
information encoded in pre-trained language models. Security concerns motivate
the development of quantum pre-training. In this work, we propose a highly
portable quantum language model (PQLM) that can easily transmit information to
downstream tasks on classical machines. The framework consists of a cloud PQLM
built with random Variational Quantum Classifiers (VQC) and local models for
downstream applications. We demonstrate the ad hoc portability of the quantum
model by extracting only the word embeddings and effectively applying them to
downstream tasks on classical machines. Our PQLM exhibits comparable
performance to its classical counterpart on both intrinsic evaluation (loss,
perplexity) and extrinsic evaluation (multilingual sentiment analysis accuracy)
metrics. We also perform ablation studies on the factors affecting PQLM
performance to analyze model stability. Our work establishes a theoretical
foundation for a portable quantum pre-trained language model that could be
trained on private data and made available for public use with privacy
protection guarantees.
Comment: 5 pages, 3 figures, 3 tables
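
For intuition, here is a minimal PennyLane sketch of a variational quantum classifier whose measured expectation values form a classical word-embedding vector, the kind of quantity that can leave the cloud model and feed downstream tasks on classical machines. The qubit count, circuit layout, and angle encoding are illustrative assumptions, not the paper's architecture.

```python
# Sketch: a VQC whose Pauli-Z expectations serve as a portable word embedding.
# Circuit shape and the random parameters are illustrative assumptions.
import numpy as np
import pennylane as qml

n_qubits, n_layers = 4, 2
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def vqc(word_angles, weights):
    qml.AngleEmbedding(word_angles, wires=range(n_qubits))        # encode token
    qml.StronglyEntanglingLayers(weights, wires=range(n_qubits))  # trainable VQC
    return [qml.expval(qml.PauliZ(w)) for w in range(n_qubits)]

shape = qml.StronglyEntanglingLayers.shape(n_layers=n_layers, n_wires=n_qubits)
weights = np.random.uniform(0, 2 * np.pi, size=shape)
embedding = np.array(vqc(np.random.uniform(0, np.pi, n_qubits), weights))
print(embedding)   # a 4-dim classical vector: only measurements leave the QPU
```

Because only the measured embedding crosses the quantum-classical boundary, the downstream side needs no quantum hardware, which is the portability the abstract emphasizes.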
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules
Incorporating language-specific (LS) modules is a proven method to boost
performance in multilingual machine translation. This approach bears similarity
to Mixture-of-Experts (MoE) because it does not inflate FLOPs. However, the
scalability of this approach to hundreds of languages (experts) becomes
unmanageable due to the prohibitive number of parameters introduced by
full-rank matrices in fully-connected layers. In this work, we introduce the
Language-Specific Matrix Synthesis (LMS) method. This approach constructs LS
modules by generating low-rank matrices from two significantly smaller matrices
to approximate the full-rank matrix. Furthermore, we condense multilingual
knowledge from multiple LS modules into a single shared module with the Fuse
Distillation (FD) technique to improve the efficiency of inference and model
serialization. We show that our LMS method significantly outperforms previous
LS methods and MoE methods with the same number of extra parameters, e.g., 1.73
BLEU points over the Switch Transformer on many-to-many multilingual machine
translation. Importantly, LMS achieves comparable translation performance
with far fewer parameters.
Comment: Accepted at the main conference of EMNLP 2023
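
A short PyTorch sketch of the low-rank construction described above: each language's module is synthesized as the product of two small matrices and added to a shared projection, so per-language parameters grow as 2*d*r rather than d^2. The dimensions and the additive composition are assumptions for illustration.

```python
# Sketch: Language-Specific Matrix Synthesis, where each language's module
# is a low-rank product A_l @ B_l added to a shared layer. Sizes are assumed.
import torch
import torch.nn as nn

class LowRankLSLayer(nn.Module):
    def __init__(self, d_model=512, rank=16, n_langs=100):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)        # shared full-rank path
        # Two small matrices per language synthesize a d_model x d_model matrix
        # with 2 * d_model * rank parameters instead of d_model ** 2.
        self.A = nn.Parameter(torch.randn(n_langs, d_model, rank) * 0.02)
        self.B = nn.Parameter(torch.randn(n_langs, rank, d_model) * 0.02)

    def forward(self, x, lang_id):
        ls_weight = self.A[lang_id] @ self.B[lang_id]    # synthesized LS matrix
        return self.shared(x) + x @ ls_weight            # shared + LS output

layer = LowRankLSLayer()
out = layer(torch.randn(8, 32, 512), lang_id=3)          # (batch, seq, d_model)
print(out.shape)
```

With rank 16 and d_model 512, each language adds roughly 16K parameters instead of the 262K a full-rank module would need, which is what makes scaling to hundreds of languages tractable.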