222 research outputs found
Text-independent speaker recognition
This research presents a new text-independent speaker recognition system in which multivariate tools, namely Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are embedded into the recognition pipeline after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments; additive white Gaussian noise and convolutive noise are considered. Experiments were carried out to investigate the robustness of PCA and ICA within the designed approach. Applying ICA improved the performance of the speaker recognition model compared to PCA. Experimental results show that ICA enabled the extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system, and that ICA has better de-correlation and dimension-reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with a different type of noise and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing the recognition accuracy rates obtained when the designed system was tested under different train and test SNR conditions with additive white Gaussian noise and under test delay conditions with an echo effect.
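The PCA-versus-ICA comparison this abstract describes can be sketched in a few lines with scikit-learn. This is a minimal illustration on synthetic features, not the paper's actual system: the feature dimensionality, component count, and data below are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
# Hypothetical stand-in for post-feature-extraction speaker data:
# 200 frames of 20 correlated cepstral-like dimensions.
latent = rng.normal(size=(200, 5))
mixing = rng.normal(size=(5, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

# Both tools reduce 20 dimensions to 5, as in a post-extraction step.
pca = PCA(n_components=5).fit(X)
ica = FastICA(n_components=5, random_state=0).fit(X)

X_pca = pca.transform(X)
X_ica = ica.transform(X)

# PCA yields mutually uncorrelated components (second-order statistics);
# ICA additionally drives components toward statistical independence,
# which involves the higher-order statistics the abstract refers to.
print(np.round(np.corrcoef(X_pca.T), 2))
```

In a full system, a classifier would then be trained on `X_pca` or `X_ica` in place of the raw features.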
EEG-based biometrics: Effects of template ageing
This chapter discusses the effects of template ageing in EEG-based biometrics. It also serves as an introduction to general biometrics and its main tasks: identification and verification. To this end, we investigate different characterisations of EEG signals and examine the difference in subject-identification performance between single-session and cross-session experiments. EEG signals are characterised with common state-of-the-art features, i.e. Mel Frequency Cepstral Coefficients (MFCC), autoregression coefficients, and Power Spectral Density-derived features. The samples are then classified using various classifiers, including Support Vector Machines and k-Nearest Neighbours with different parametrisations. Results show that performance tends to be worse for cross-session identification than for single-session identification. This finding suggests that the temporal permanence of EEG signals is limited, and thus that more sophisticated methods are needed to characterise EEG signals for the task of subject identification.
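One of the characterisations mentioned above, autoregression coefficients fed to a k-Nearest Neighbours classifier, can be sketched as follows. This is a toy illustration, not the chapter's pipeline: the two "subjects" are simulated AR(2) processes rather than real EEG, and the model order and epoch counts are arbitrary assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)

def ar_coeffs(x, order=4):
    """Least-squares autoregression coefficients of a 1-D signal.

    Fits x[t] ~ a1*x[t-1] + ... + a_order*x[t-order].
    """
    X = np.column_stack([x[order - k : len(x) - k] for k in range(1, order + 1)])
    y = x[order:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

def simulate_epoch(a1, a2, n=512):
    """Synthetic stand-in for one EEG epoch: an AR(2) process."""
    x = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(2, n):
        x[t] = a1 * x[t - 1] + a2 * x[t - 2] + e[t]
    return x

# Two hypothetical subjects with different spectral signatures.
subjects = {0: (0.8, -0.3), 1: (0.2, 0.5)}
feats, labels = [], []
for label, (a1, a2) in subjects.items():
    for _ in range(20):
        feats.append(ar_coeffs(simulate_epoch(a1, a2)))
        labels.append(label)

clf = KNeighborsClassifier(n_neighbors=3).fit(feats, labels)
acc = clf.score(feats, labels)  # accuracy on the training epochs only
```

Template ageing would show up here as a drop in accuracy when the test epochs come from a later recording session than the training epochs.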
Pushing the envelope: Evaluating speech rhythm with different envelope extraction techniques
The amplitude of the speech signal varies over time, and the speech envelope is an attempt to characterise this variation in the form of an acoustic feature. Although tacitly assumed, the similarity between the speech envelope-derived time series and that of phonetic objects (e.g., vowels) remains empirically unestablished. The current paper, therefore, evaluates several speech envelope extraction techniques, such as the Hilbert transform, by comparing different acoustic landmarks (e.g., peaks in the speech envelope) with manual phonetic annotation in a naturalistic and diverse dataset. Joint speech tasks are also introduced to determine which acoustic landmarks are most closely coordinated when voices are aligned. Finally, the acoustic landmarks are evaluated as predictors for the temporal characterisation of speaking style using classification tasks. The landmark that corresponded most closely to annotated vowel onsets was peaks in the first derivative of a human audition-informed envelope, consistent with converging evidence from neural and behavioural data. However, differences also emerged based on language and speaking style. Overall, the results show that both the choice of speech envelope extraction technique and the form of speech under study affect how sensitive an engineered feature is at capturing aspects of speech rhythm, such as the timing of vowels.
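The Hilbert-transform envelope and the "peaks in the first derivative" landmark mentioned above can be illustrated on a synthetic signal. This is a minimal sketch, not the paper's method: the signal is an amplitude-modulated tone standing in for speech, and the moving-average smoother is a crude substitute for an audition-informed filter.

```python
import numpy as np
from scipy.signal import hilbert, find_peaks

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# Syllable-like toy signal: a 200 Hz carrier modulated at ~4 Hz,
# roughly the syllable rate of speech.
carrier = np.sin(2 * np.pi * 200 * t)
modulation = 0.5 * (1 - np.cos(2 * np.pi * 4 * t))
x = modulation * carrier

# Hilbert envelope: magnitude of the analytic signal.
envelope = np.abs(hilbert(x))

# Smooth with a 50 ms moving average (crude stand-in for an
# audition-informed low-pass stage).
win = np.ones(800) / 800
smooth = np.convolve(envelope, win, mode="same")

# Peaks in the first derivative mark rapid amplitude rises, the
# landmark the paper links to vowel onsets.
d = np.diff(smooth)
peaks, _ = find_peaks(d, height=0.5 * d.max(), distance=fs // 8)
print(len(peaks))  # roughly one rise per 4 Hz modulation cycle
```

On real speech, these peak times would then be compared against manually annotated vowel onsets.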
Introduction to this Special Issue: Intelligent Data Analysis on Electromyography and Electroneurography
Computer-aided electromyography (EMG) and electroneurography (ENG) have become indispensable tools in the daily activities of neurophysiology laboratories, facilitating quantitative analysis and decision making in clinical neurophysiology, rehabilitation, sports medicine, and studies of human physiology. These tools form the basis of a new era in the practice of neurophysiology, facilitating: (i) Standardization. Diagnoses obtained with similar criteria in different laboratories can be verified. (ii) Sensitivity. Neurophysiological findings in a particular subject under investigation may be compared with a database of normal values to determine whether abnormality exists. (iii) Specificity. Findings may be compared with databases derived from patients with known diseases, to evaluate whether they fit a specific diagnosis. (iv) Equivalence. Results from serial examinations on the same patient may be compared to decide whether there is evidence of disease progression or of response to treatment. Also, findings obtained from different quantitative methods may be contrasted to determine which are most sensitive and specific.
Different methodologies have been developed in computer-aided EMG and ENG analysis, ranging from simple quantitative measures of the recorded potentials to more complex knowledge-based and neural network systems that enable the automated assessment of neuromuscular disorders. However, the need still exists for the further advancement and standardization of these methodologies, especially nowadays with the emerging health telematics technologies, which will enable their wider application in the neurophysiological laboratory. The main objective of this Special Issue of Medical Engineering & Physics is to provide a snapshot of current activities and methodologies in intelligent data analysis in peripheral neurophysiology.
A total of 12 papers are published in this Special Issue under the following topics: Motor Unit Action Potential (MUAP) Analysis, Surface EMG (SEMG) Analysis, Electroneurography, and Decision Systems. In this introduction, the papers are briefly introduced, following a brief review of the major achievements in quantitative electromyography and electroneurography.
Disentangling Prosody Representations with Unsupervised Speech Reconstruction
Human speech can be characterized by different components, including semantic content, speaker identity, and prosodic information. Significant progress has been made in disentangling representations for semantic content and speaker identity in Automatic Speech Recognition (ASR) and speaker verification tasks, respectively. However, extracting prosodic information remains an open and challenging research question because of the intrinsic association of different attributes, such as timbre and rhythm, and because of the need for supervised training schemes to achieve robust, large-scale, speaker-independent ASR. The aim of this paper is to address the disentanglement of emotional prosody from speech based on unsupervised reconstruction. Specifically, we identify, design, implement, and integrate three crucial components in our proposed speech reconstruction model, Prosody2Vec: (1) a unit encoder that transforms speech signals into discrete units for semantic content, (2) a pretrained speaker verification model to generate speaker identity embeddings, and (3) a trainable prosody encoder to learn prosody representations. We first pretrain the Prosody2Vec representations on unlabelled emotional speech corpora, then fine-tune the model on specific datasets to perform Speech Emotion Recognition (SER) and Emotional Voice Conversion (EVC) tasks. Both objective (weighted and unweighted accuracies) and subjective (mean opinion score) evaluations on the EVC task suggest that Prosody2Vec effectively captures general prosodic features that can be smoothly transferred to other emotional speech. In addition, our SER experiments on the IEMOCAP dataset reveal that the prosody features learned by Prosody2Vec are complementary and beneficial to the performance of widely used speech pretraining models, and surpass the state-of-the-art methods when Prosody2Vec is combined with HuBERT representations.
Comment: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing.
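The objective SER metrics mentioned above, weighted accuracy (WA, plain accuracy over all samples) and unweighted accuracy (UA, the mean of per-class recalls), can be computed with scikit-learn. The label counts below are hypothetical, chosen only to show how class imbalance drives the two scores apart:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical predictions over four emotion classes with imbalanced
# counts, as is typical for IEMOCAP-style data (these are NOT results
# from the paper).
y_true = np.array([0] * 50 + [1] * 20 + [2] * 20 + [3] * 10)
y_pred = y_true.copy()
y_pred[:10] = 1      # misclassify 10 samples of the majority class
y_pred[90:95] = 0    # and half of the smallest class

wa = accuracy_score(y_true, y_pred)           # weighted accuracy (WA)
ua = balanced_accuracy_score(y_true, y_pred)  # unweighted accuracy (UA)
print(round(wa, 3), round(ua, 3))  # → 0.85 0.825
```

UA penalises errors on the small class more heavily than WA does, which is why SER work on imbalanced corpora usually reports both.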