222 research outputs found
Text-independent speaker recognition
This research presents a new text-independent speaker recognition system in which multivariate tools, namely Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are embedded into the recognition pipeline after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments; additive white Gaussian noise and convolutive noise are considered. Experiments were carried out to investigate the robustness of PCA and ICA within the designed approach. Applying ICA improved the performance of the speaker recognition model compared to PCA. Experimental results show that ICA enabled the extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system, and that ICA has better de-correlation and dimension-reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with a different type of noise and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing the recognition accuracy rates obtained when the designed system was tested under different train and test SNR conditions with additive white Gaussian noise and under test delay conditions with an echo effect.
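The PCA-versus-ICA comparison this abstract describes can be sketched in a few lines with scikit-learn. This is a minimal illustration on synthetic features, not the paper's actual system: the feature dimensionality, component count, and data below are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Hypothetical stand-in for post-feature-extraction speaker data:
# 200 frames of 20 correlated cepstral-like dimensions.
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

# Both tools reduce 20 dimensions to 5, as in a post-extraction step.
pca = PCA(n_components=5).fit(X)
ica = FastICA(n_components=5, random_state=0).fit(X)

X_pca = pca.transform(X)
X_ica = ica.transform(X)

# PCA yields mutually uncorrelated components (second-order statistics);
# ICA additionally drives components toward statistical independence,
# which involves the higher-order statistics the abstract refers to.
print(np.round(np.corrcoef(X_pca.T), 2))
```

In a full system, a classifier would then be trained on `X_pca` or `X_ica` in place of the raw features.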
EEG-based biometrics: Effects of template ageing
This chapter discusses the effects of template ageing in EEG-based biometrics. It also serves as an introduction to general biometrics and its main tasks: identification and verification. To this end, we investigate different characterisations of EEG signals and examine the difference in subject-identification performance between single-session and cross-session experiments. EEG signals are characterised with common state-of-the-art features, i.e. Mel Frequency Cepstral Coefficients (MFCC), autoregression coefficients, and Power Spectral Density-derived features. The samples are then classified using various classifiers, including Support Vector Machines and k-Nearest Neighbours with different parametrisations. Results show that performance tends to be worse for cross-session identification than for single-session identification. This finding suggests that the temporal permanence of EEG signals is limited, and thus that more sophisticated methods are needed to characterise EEG signals for the task of subject identification.
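One of the characterisations mentioned above, autoregression coefficients fed to a k-Nearest Neighbours classifier, can be sketched as follows. This is a toy illustration, not the chapter's pipeline: the two "subjects" are simulated AR(2) processes rather than real EEG, and the model order and epoch counts are arbitrary assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

def ar_coeffs(x, order=4):
    """Least-squares autoregression coefficients of a 1-D signal.

    Fits x[t] ~ a1*x[t-1] + ... + a_order*x[t-order].
    """
    X = np.column_stack([x[order - k : len(x) - k] for k in range(1, order + 1)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def simulate_epoch(a1, a2, n=512):
    """Synthetic stand-in for one EEG epoch: an AR(2) process."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    return x

# Two hypothetical subjects with different spectral signatures.
subjects = {0: (0.8, -0.3), 1: (0.2, 0.5)}
feats, labels = [], []
for label, (a1, a2) in subjects.items():
    for _ in range(20):
        feats.append(ar_coeffs(simulate_epoch(a1, a2)))
        labels.append(label)

clf = KNeighborsClassifier(n_neighbors=3).fit(feats, labels)
acc = clf.score(feats, labels)  # accuracy on the training epochs only
```

Template ageing would show up here as a drop in accuracy when the test epochs come from a later recording session than the training epochs.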
Pushing the envelope: Evaluating speech rhythm with different envelope extraction techniques
The amplitude of the speech signal varies over time, and the speech envelope is an attempt to characterise this variation in the form of an acoustic feature. Although tacitly assumed, the similarity between the speech envelope-derived time series and that of phonetic objects (e.g., vowels) remains empirically unestablished. The current paper, therefore, evaluates several speech envelope extraction techniques, such as the Hilbert transform, by comparing different acoustic landmarks (e.g., peaks in the speech envelope) with manual phonetic annotation in a naturalistic and diverse dataset. Joint speech tasks are also introduced to determine which acoustic landmarks are most closely coordinated when voices are aligned. Finally, the acoustic landmarks are evaluated as predictors for the temporal characterisation of speaking style using classification tasks. The landmark that corresponded most closely to annotated vowel onsets was peaks in the first derivative of a human audition-informed envelope, consistent with converging evidence from neural and behavioural data. However, differences also emerged based on language and speaking style. Overall, the results show that both the choice of speech envelope extraction technique and the form of speech under study affect how sensitive an engineered feature is at capturing aspects of speech rhythm, such as the timing of vowels.
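The Hilbert-transform envelope and the "peaks in the first derivative" landmark mentioned above can be illustrated on a synthetic signal. This is a minimal sketch, not the paper's method: the signal is an amplitude-modulated tone standing in for speech, and the moving-average smoother is a crude substitute for an audition-informed filter.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Syllable-like toy signal: a 200 Hz carrier modulated at ~4 Hz,
# roughly the syllable rate of speech.
carrier = np.sin(2 * np.pi * 200 * t)
modulation = 0.5 * (1 - np.cos(2 * np.pi * 4 * t))
x = modulation * carrier

# Hilbert envelope: magnitude of the analytic signal.
envelope = np.abs(hilbert(x))

# Smooth with a 50 ms moving average (crude stand-in for an
# audition-informed low-pass stage).
win = np.ones(800) / 800
smooth = np.convolve(envelope, win, mode="same")

# Peaks in the first derivative mark rapid amplitude rises, the
# landmark the paper links to vowel onsets.
d = np.diff(smooth)
peaks, _ = find_peaks(d, height=0.5 * d.max(), distance=fs // 8)
print(len(peaks))  # roughly one rise per 4 Hz modulation cycle
```

On real speech, these peak times would then be compared against manually annotated vowel onsets.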
Introduction to this Special Issue: Intelligent Data Analysis on Electromyography and Electroneurography
Computer-aided electromyography (EMG) and electroneurography (ENG) have become indispensable tools in the daily activities of neurophysiology laboratories, facilitating quantitative analysis and decision making in clinical neurophysiology, rehabilitation, sports medicine, and studies of human physiology. These tools form the basis of a new era in the practice of neurophysiology, facilitating: (i) Standardization. Diagnoses obtained with similar criteria in different laboratories can be verified. (ii) Sensitivity. Neurophysiological findings in a particular subject under investigation may be compared with a database of normal values to determine whether abnormality exists. (iii) Specificity. Findings may be compared with databases derived from patients with known diseases, to evaluate whether they fit a specific diagnosis. (iv) Equivalence. Results from serial examinations on the same patient may be compared to decide whether there is evidence of disease progression or of response to treatment. Also, findings obtained from different quantitative methods may be contrasted to determine which are most sensitive and specific.
Different methodologies have been developed in computer-aided EMG and ENG analysis, ranging from simple quantitative measures of the recorded potentials to more complex knowledge-based and neural network systems that enable the automated assessment of neuromuscular disorders. However, the need still exists for the further advancement and standardization of these methodologies, especially nowadays with the emerging health telematics technologies, which will enable their wider application in the neurophysiological laboratory. The main objective of this Special Issue of Medical Engineering & Physics is to provide a snapshot of current activities and methodologies in intelligent data analysis in peripheral neurophysiology.
A total of 12 papers are published in this Special Issue under the following topics: Motor Unit Action Potential (MUAP) Analysis, Surface EMG (SEMG) Analysis, Electroneurography, and Decision Systems. In this introduction, the papers are briefly introduced, following a brief review of the major achievements in quantitative electromyography and electroneurography.
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Human speech can be characterized by different components, including semantic content, speaker identity, and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks, respectively. However, extracting prosodic information remains an open and challenging research question because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for supervised training schemes to achieve robust, large-scale, speaker-independent ASR. The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. Specifically, we identify, design, implement, and integrate three crucial components in our proposed speech reconstruction model, Prosody2Vec: (1) a unit encoder that transforms speech signals into discrete units for semantic content, (2) a pretrained speaker verification model to generate speaker identity embeddings, and (3) a trainable prosody encoder to learn prosody representations. We first pretrain the Prosody2Vec representations on unlabelled emotional speech corpora, then fine-tune the model on specific datasets to perform Speech Emotion Recognition (SER) and Emotional Voice Conversion (EVC) tasks. Both objective (weighted and unweighted accuracies) and subjective (mean opinion score) evaluations on the EVC task suggest that Prosody2Vec effectively captures general prosodic features that can be smoothly transferred to other emotional speech. In addition, our SER experiments on the IEMOCAP dataset reveal that the prosody features learned by Prosody2Vec are complementary and beneficial to the performance of widely used speech pretraining models, and surpass the state-of-the-art methods when Prosody2Vec is combined with HuBERT representations.
Comment: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing.
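The objective SER metrics mentioned above, weighted accuracy (WA, plain accuracy over all samples) and unweighted accuracy (UA, the mean of per-class recalls), can be computed with scikit-learn. The label counts below are hypothetical, chosen only to show how class imbalance drives the two scores apart:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical predictions over four emotion classes with imbalanced
# counts, as is typical for IEMOCAP-style data (these are NOT results
# from the paper).
y_true = np.array([0] * 50 + [1] * 20 + [2] * 20 + [3] * 10)
y_pred = y_true.copy()
y_pred[:10] = 1      # misclassify 10 samples of the majority class
y_pred[90:95] = 0    # and half of the smallest class

wa = accuracy_score(y_true, y_pred)           # weighted accuracy (WA)
ua = balanced_accuracy_score(y_true, y_pred)  # unweighted accuracy (UA)
print(round(wa, 3), round(ua, 3))  # → 0.85 0.825
```

UA penalises errors on the small class more heavily than WA does, which is why SER work on imbalanced corpora usually reports both.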