Recognizing Emotions in a Foreign Language
Expressions of basic emotions (joy, sadness, anger, fear, disgust) can be recognized pan-culturally from the face, and it is assumed that these emotions can likewise be recognized from a speaker's voice, regardless of an individual's culture or linguistic ability. Here, we compared how monolingual speakers of Argentine Spanish recognize basic emotions from pseudo-utterances ("nonsense speech") produced in their native language and in three foreign languages (English, German, Arabic). Results indicated that vocal expressions of basic emotions could be decoded in each language condition at accuracy levels exceeding chance, although Spanish listeners performed significantly better overall in their native language (an "in-group advantage"). Our findings argue that the ability to understand vocally expressed emotions in speech is partly independent of linguistic ability and involves universal principles, although it is also shaped by linguistic and cultural variables.
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Disentangling uncorrelated information in speech utterances is a crucial research topic within the speech community. Different speech-related tasks focus on extracting distinct speech representations while minimizing the effects of other, uncorrelated information. We present a large-scale speech corpus to facilitate research on speech representation disentanglement. 3D-Speaker contains over 10,000 speakers, each of whom is recorded simultaneously by multiple Devices located at different Distances, and some speakers speak multiple Dialects. These controlled combinations of multi-dimensional audio data yield a matrix of diverse speech representation entanglements, motivating methods to untangle them. The multi-domain nature of 3D-Speaker also makes it a suitable resource for evaluating large universal speech models and for experimenting with methods for out-of-domain learning and self-supervised learning. https://3dspeaker.github.io
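As an illustration of how a corpus with these controlled dimensions might be put to work, the sketch below indexes utterances by speaker, Device, Distance, and Dialect and draws same-speaker, cross-device trial pairs of the kind that isolate channel variation from identity. The metadata fields and helper names are hypothetical and do not reflect the corpus's actual schema.

    # Hypothetical sketch: indexing a 3D-Speaker-style corpus by its
    # controlled factors so that matched/mismatched trial pairs can be
    # drawn for disentanglement experiments. All values are illustrative.
    from collections import defaultdict
    from dataclasses import dataclass
    from itertools import combinations

    @dataclass(frozen=True)
    class Utterance:
        speaker: str
        device: str    # e.g. "headset", "far-field array"
        distance: str  # e.g. "0.5m", "3.0m"
        dialect: str   # e.g. "standard", "regional"
        path: str      # path to the audio file

    def cross_device_trials(utts):
        """Yield same-speaker pairs recorded on different devices.

        Such pairs hold identity fixed while channel conditions vary,
        the kind of controlled contrast the corpus is designed to offer.
        """
        by_speaker = defaultdict(list)
        for u in utts:
            by_speaker[u.speaker].append(u)
        for group in by_speaker.values():
            for a, b in combinations(group, 2):
                if a.device != b.device:
                    yield a, b

    # Toy usage:
    utts = [
        Utterance("spk001", "headset", "0.5m", "standard", "a.wav"),
        Utterance("spk001", "far-field", "3.0m", "standard", "b.wav"),
    ]
    for a, b in cross_device_trials(utts):
        print(a.path, "<->", b.path)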
Homogeneous Ensemble Phonotactic Language Recognition Based on SVM Supervector Reconstruction
Currently, acoustic spoken language recognition (SLR) systems and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems, with results often much better than those of a single SLR system. Phonotactic SLR subsystems may vary in their acoustic feature vectors or include multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance but usually come at high computational cost. In this paper, a new diversification for phonotactic language recognition systems is proposed using vector space models built by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but model the data in different vector spaces by using the SSR algorithm, without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms: relative, functional, and perturbing SVM supervector reconstruction. All of the algorithms are combined using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system yields equal error rates (EERs) of 1.39%, 3.63%, and 14.79% for the 30-, 10-, and 3-s test conditions, representing relative improvements of 6.06%, 10.15%, and 10.53% over the baseline system, respectively.
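For readers unfamiliar with the phonotactic pipeline the paper builds on, a minimal sketch follows: decode speech into phone sequences, form phone N-gram count supervectors, and classify the language with a linear SVM. The toy phone strings and labels are invented for illustration; the paper's lattice decoding and SSR remappings are not reproduced, only the shared baseline vector space.

    # Minimal phonotactic language-recognition sketch (baseline vector
    # space only; the paper's SSR algorithms remap this same space).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Toy 1-best phone decodings (the paper works with phone lattices).
    phone_seqs = [
        "ay m ax s p iy k ax",   # English-like
        "ah b l ah m o s oy",    # Spanish-like
        "dh ax k ae t s ae t",   # English-like
        "k e t a l e s t a s",   # Spanish-like
    ]
    labels = ["eng", "spa", "eng", "spa"]

    # Supervector = counts of phone bigrams and trigrams, scored by a
    # linear SVM, mirroring the PR-VSM baseline at a very small scale.
    model = make_pipeline(
        CountVectorizer(tokenizer=str.split, token_pattern=None,
                        ngram_range=(2, 3)),
        LinearSVC(),
    )
    model.fit(phone_seqs, labels)
    print(model.predict(["dh ax d ao g s ae t"]))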
Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems
We present Malafide, a universal adversarial attack against automatic speaker
verification (ASV) spoofing countermeasures (CMs). By introducing convolutional
noise using an optimised linear time-invariant filter, Malafide attacks can be
used to compromise CM reliability while preserving other speech attributes such
as quality and the speaker's voice. In contrast to other adversarial attacks
proposed recently, Malafide filters are optimised independently of the input
utterance and duration, are tuned instead to the underlying spoofing attack,
and require the optimisation of only a small number of filter coefficients.
Even so, they degrade CM performance estimates by an order of magnitude, even
in black-box settings, and can also be configured to overcome integrated CM and
ASV subsystems. Integrated solutions that use self-supervised learning CMs,
however, are more robust under both black-box and white-box settings.
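The core mechanism can be sketched in a few lines: a single short FIR filter is learned by gradient descent so that, convolved with any spoofed utterance, it pushes the countermeasure's score toward the bona fide class. The sketch below uses an untrained stand-in CM and random audio, so it only illustrates the optimisation structure, not the attacks evaluated in the paper.

    # Illustrative sketch of a Malafide-style convolutive attack: optimise
    # a small set of FIR filter taps against a frozen countermeasure (CM).
    # The CM here is an untrained stand-in, not a real spoofing detector.
    import torch
    import torch.nn.functional as F

    cm = torch.nn.Sequential(            # placeholder CM: waveform -> score
        torch.nn.Conv1d(1, 8, 64, stride=16), torch.nn.ReLU(),
        torch.nn.AdaptiveAvgPool1d(1), torch.nn.Flatten(),
        torch.nn.Linear(8, 1),
    )
    for p in cm.parameters():
        p.requires_grad_(False)          # the attacker cannot alter the CM

    taps = 65                            # only a few coefficients are learned
    filt = torch.zeros(1, 1, taps, requires_grad=True)
    with torch.no_grad():
        filt[0, 0, taps // 2] = 1.0      # initialise as an identity filter

    opt = torch.optim.Adam([filt], lr=1e-3)
    for step in range(100):
        spoofed = torch.randn(4, 1, 16000)            # stand-in spoofed audio
        filtered = F.conv1d(spoofed, filt, padding=taps // 2)
        loss = -cm(filtered).mean()      # push CM scores toward "bona fide"
        opt.zero_grad()
        loss.backward()
        opt.step()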
Open-set Speaker Identification
This study is motivated by the growing need for effective extraction of intelligence and evidence from audio recordings in the fight against crime, a need made ever more apparent by the recent expansion of criminal and terrorist organisations. The main focus is to enhance the open-set speaker identification process within speaker identification systems, which are affected by noisy audio data obtained in uncontrolled environments such as streets, restaurants, or other places of business. Two investigations are therefore carried out initially: one into the effects of environmental noise on the accuracy of open-set speaker recognition, thoroughly covering conditions relevant to the considered application areas, such as variable training-data length, background noise, and real-world noise; and one into the effects of short and varied-duration reference data in open-set speaker recognition.
The investigations led to a novel method, termed “vowel boosting”, that enhances the reliability of speaker identification when operating on speech data of varied duration under uncontrolled conditions. Vowels naturally carry more speaker-specific information, so emphasising this natural phenomenon in the speech data enables better identification performance. Traditional state-of-the-art GMM-UBM and i-vector systems are used to evaluate vowel boosting. The proposed approach boosts the impact of the vowels on the speaker scores, which improves recognition accuracy for the specific case of open-set identification with short and varied-duration speech material.
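A hedged sketch of the idea: when per-frame speaker scores are aggregated into an utterance-level score, frames labelled as vowels receive a larger weight. The weighting scheme below is one plausible formulation invented for illustration; the thesis does not prescribe this exact formula, and the vowel mask is assumed to come from an external phone or vowel detector.

    # Illustrative "vowel boosting" at score level: up-weight vowel frames
    # when averaging per-frame scores (e.g. GMM-UBM log-likelihood ratios).
    import numpy as np

    def boosted_score(frame_scores, is_vowel, boost=2.0):
        """Weighted mean of per-frame scores; boost > 1 emphasises vowels."""
        w = np.where(is_vowel, boost, 1.0)
        return float(np.sum(w * frame_scores) / np.sum(w))

    scores = np.array([0.2, 1.1, 0.9, -0.3])          # toy frame scores
    vowels = np.array([False, True, True, False])     # toy vowel mask
    print(boosted_score(scores, vowels))              # vowel frames dominate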
About Voice: A Longitudinal Study of Speaker Recognition Dataset Dynamics
Like face recognition, speaker recognition is widely used for voice-based
biometric identification in a broad range of industries, including banking,
education, recruitment, immigration, law enforcement, healthcare, and
well-being. However, while dataset evaluations and audits have improved data
practices in computer vision and face recognition, the data practices in
speaker recognition have gone largely unquestioned. Our research aims to
address this gap by exploring how dataset usage has evolved over time and what
implications this has for bias and fairness in speaker recognition systems.
Previous studies have demonstrated the presence of historical, representation,
and measurement biases in popular speaker recognition benchmarks. In this
paper, we present a longitudinal study of speaker recognition datasets used for
training and evaluation from 2012 to 2021. We survey close to 700 papers to
investigate community adoption of datasets and changes in usage over a crucial
period during which speaker recognition approaches transitioned to the widespread
adoption of deep neural networks. Our study identifies the most commonly used
datasets in the field, examines their usage patterns, and assesses their
attributes that affect bias, fairness, and other ethical concerns. Our findings
suggest areas for further research on the ethics and fairness of speaker
recognition technology.
Voice Biometrics Fusion for Enhanced Security and Speaker Recognition: A Comprehensive Review
The scope of this paper is purposefully limited to the 15 biometric modalities discussed by Jain et al. (2004). The place of Voice within their classification scheme is reexamined in light of important developments that have taken place since 2010. Additionally, elements are added to Mayhew’s (2018) overview of the history of biometrics in an attempt to fill in gaps concerning Voice. All of this leads to a reassessment of voice biometrics and how it relates to other biometric modalities. Speech segments that carry extremely high identity-vector loads are discussed. The main assertion of this paper is that increased computing power, advanced algorithms, and the deployment of Artificial Intelligence have made voice biometrics optimal for use. Furthermore, the analysis of compatibility among modalities, the estimation of the inconvenience penalty, and the calculation of arithmetic distances between the various modalities indicate that the fusions of {Voice + Face}, {Voice + Fingerprint}, {Voice + Iris}, and {Voice + Signature} on the one hand, and of {Voice + Face + Fingerprint} and {Voice + Fingerprint + Signature} on the other, offer the best liveness assurance against hacking, spoofing, and other malicious activities.
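As a small illustration of what score-level fusion of two modalities such as {Voice + Face} can look like, the sketch below min-max normalises each modality's match score and combines them with a weighted sum. The weights, score ranges, and threshold are invented for illustration; the review does not specify a particular fusion rule.

    # Generic score-level fusion sketch for two biometric modalities.
    def minmax(score, lo, hi):
        """Map a raw match score into [0, 1] given its expected range."""
        return (score - lo) / (hi - lo)

    def fuse(voice, face, w_voice=0.6, w_face=0.4):
        """Weighted-sum fusion of two normalised scores in [0, 1]."""
        return w_voice * voice + w_face * face

    v = minmax(12.3, lo=-5.0, hi=20.0)    # raw voice score -> [0, 1]
    f = minmax(0.82, lo=0.0, hi=1.0)      # face score already in [0, 1]
    fused = fuse(v, f)
    print(fused, fused > 0.5)             # hypothetical accept threshold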