57,478 research outputs found
Statistical Approaches for Signal Processing with Application to Automatic Singer Identification
In the music world, the oldest instrument is known as the singing voice that plays an important role in musical recordings. The singer\u27s identity serves as a primary aid for people to organize, browse, and retrieve music recordings. In this thesis, we focus on the problem of singer identification based on the acoustic features of singing voice. An automatic singer identification system is constructed and has achieved a very high identification accuracy. This system consists of three crucial parts: singing voice detection, background music removal and pattern recognition. These parts are introduced and explored in great details in this thesis. To be specific, in terms of the singing voice detection, we firstly study a traditional method, double GMM. Then an improved method, namely single GMM, is proposed. The experimental result shows that the detection accuracy of single GMM can be achieved as high as 96.42%. In terms of the background music removal, Non-negative Matrix Factorization (NMF) and Robust Principal Component Analysis (RPCA) are demonstrated. The evaluation result shows that RPCA outperforms NMF. In terms of pattern recognition, we explore the algorithms of Support Vector Machine (SVM) and Gaussian Mixture Model (GMM). Based on the experimental results, it turns out that the prediction accuracy of GMM classifier is about 16% higher than SVM
Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies
Automatic singing voice understanding tasks, such as singer identification,
singing voice transcription, and singing technique classification, benefit from
data-driven approaches that utilize deep learning techniques. These approaches
work well even under the rich diversity of vocal and noisy samples owing to
their representation ability. However, the limited availability of labeled data
remains a significant obstacle to achieving satisfactory performance. In recent
years, self-supervised learning models (SSL models) have been trained using
large amounts of unlabeled data in the field of speech processing and music
classification. By fine-tuning these models for the target tasks, comparable
performance to conventional supervised learning can be achieved with limited
training data. Therefore, in this paper, we investigate the effectiveness of
SSL models for various singing voice recognition tasks. We report the results
of experiments comparing SSL models for three different tasks (i.e., singer
identification, singing voice transcription, and singing technique
classification) as initial exploration and aim to discuss these findings.
Experimental results show that each SSL model achieves comparable performance
and sometimes outperforms compared to state-of-the-art methods on each task. We
also conducted a layer-wise analysis to further understand the behavior of the
SSL models.Comment: Submitted to APSIPA 202
Speaker Identification for Swiss German with Spectral and Rhythm Features
We present results of speech rhythm analysis for automatic speaker identification. We expand previous experiments using similar methods for language identification. Features describing the rhythmic properties of salient changes in signal components are extracted and used in an speaker identification task to determine to which extent they are descriptive of speaker variability. We also test the performance of state-of-the-art but simple-to-extract frame-based features. The paper focus is the evaluation on one corpus (swiss german, TEVOID) using support vector machines. Results suggest that the general spectral features can provide very good performance on this dataset, whereas the rhythm features are not as successful in the task, indicating either the lack of suitability for this task or the dataset specificity
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Using the beat histogram for speech rhythm description and language identification
In this paper we present a novel approach for the description of speech rhythm and the extraction of rhythm-related features for automatic language identification (LID). Previous methods have extracted speech rhythm through the calculation of features based on salient elements of speech such as consonants, vowels and syllables. We present how an automatic rhythm extraction method borrowed from music information retrieval, the beat histogram, can be adapted for the analysis of speech rhythm by defining the most relevant novelty functions in the speech signal and extracting features describing their periodicities. We have evaluated those features in a rhythm-based LID task for two multilingual speech corpora using support vector machines, including feature selection methods to identify the most informative descriptors. Results suggest that the method is successful in describing speech rhythm and provides LID classification accuracy comparable to or better than that of other approaches, without the need for a preceding segmentation or annotation of the speech signal. Concerning rhythm typology, the rhythm class hypothesis in its original form seems to be only partly confirmed by our results
Connections Between Adaptive Control and Optimization in Machine Learning
This paper demonstrates many immediate connections between adaptive control
and optimization methods commonly employed in machine learning. Starting from
common output error formulations, similarities in update law modifications are
examined. Concepts in stability, performance, and learning, common to both
fields are then discussed. Building on the similarities in update laws and
common concepts, new intersections and opportunities for improved algorithm
analysis are provided. In particular, a specific problem related to higher
order learning is solved through insights obtained from these intersections.Comment: 18 page
- …