2,595 research outputs found
Physiologically-Motivated Feature Extraction Methods for Speaker Recognition
Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, and thus the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and complexity of the models required to cover the phonetic space, especially in tasks such as identification where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation energy of a speaker. These features, including RPCC, GLFCC and TPCC, represent the unique characteristics of speech production not represented in current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms including cross-lingual speaker identification, cross song-type avian speaker identification and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically-focused information present in traditional spectral features. The incorporation of the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks
Voice pathologies : the most comum features and classification tools
Speech pathologies are quite common in society, however the exams that exist are invasive, making them uncomfortable for patients and depending on the experience of the clinician who performs the assessment. Hence the need to develop non-invasive methods, which allow objective and efficient analysis. Taking this need into account in this work, the most promising list of features and classifiers was identified. As features, jitter, shimmer, HNR, LPC, PLP, and MFCC were identified and as classifiers CNN, RNN and LSTM. This study intends to develop a device to support medical decision, however this article already presents the system interface.info:eu-repo/semantics/publishedVersio
Analysis and Detection of Pathological Voice using Glottal Source Features
Automatic detection of voice pathology enables objective assessment and
earlier intervention for the diagnosis. This study provides a systematic
analysis of glottal source features and investigates their effectiveness in
voice pathology detection. Glottal source features are extracted using glottal
flows estimated with the quasi-closed phase (QCP) glottal inverse filtering
method, using approximate glottal source signals computed with the zero
frequency filtering (ZFF) method, and using acoustic voice signals directly. In
addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from
the glottal source waveforms computed by QCP and ZFF to effectively capture the
variations in glottal source spectra of pathological voice. Experiments were
carried out using two databases, the Hospital Universitario Principe de
Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database.
Analysis of features revealed that the glottal source contains information that
discriminates normal and pathological voice. Pathology detection experiments
were carried out using support vector machine (SVM). From the detection
experiments it was observed that the performance achieved with the studied
glottal source features is comparable or better than that of conventional MFCCs
and perceptual linear prediction (PLP) features. The best detection performance
was achieved when the glottal source features were combined with the
conventional MFCCs and PLP features, which indicates the complementary nature
of the features
Cross-modal correspondences in non-human mammal communication
For both humans and other animals, the ability to combine information obtained through different senses is fundamental to the perception of the environment. It is well established that humans form systematic cross-modal correspondences between stimulus features that can facilitate the accurate combination of sensory percepts. However, the evolutionary origins of the perceptual and cognitive mechanisms involved in these cross-modal associations remain surprisingly underexplored. In this review we outline recent comparative studies investigating how non-human mammals naturally combine information encoded in different sensory modalities during communication. The results of these behavioural studies demonstrate that various mammalian species are able to combine signals from different sensory channels when they are perceived to share the same basic features, either be-
cause they can be redundantly sensed and/or because they are processed in the same way. Moreover, evidence that a wide range of mammals form complex cognitive representations about signallers, both within and across species, suggests that animals also learn to associate different sensory features which regularly co-occur. Further research is now necessary to determine how multisensory representations are formed in individual animals, including the relative importance of low level feature-related correspondences. Such investigations will generate important insights into how animals perceive and
categorise their environment, as well as provide an essential basis for understanding the evolution of
multisensory perception in humans
- …