
    Physiologically-Motivated Feature Extraction Methods for Speaker Recognition

    Speaker recognition has received a great deal of attention from the speech community, and significant gains in robustness and accuracy have been obtained over the past decade. However, the features used for identification are still primarily representations of overall spectral characteristics, so the models are primarily phonetic in nature, differentiating speakers based on overall pronunciation patterns. This creates difficulties in terms of the amount of enrollment data and the complexity of the models required to cover the phonetic space, especially in identification tasks where enrollment and testing data may not have similar phonetic coverage. This dissertation introduces new features based on vocal source characteristics intended to capture physiological information related to the laryngeal excitation of a speaker. These features, including RPCC, GLFCC and TPCC, represent characteristics of speech production not captured by current state-of-the-art speaker identification systems. The proposed features are evaluated through three experimental paradigms: cross-lingual speaker identification, cross song-type avian speaker identification, and mono-lingual speaker identification. The experimental results show that the proposed features provide information about speaker characteristics that is significantly different in nature from the phonetically focused information present in traditional spectral features. Incorporating the proposed glottal source features offers significant overall improvement to the robustness and accuracy of speaker identification tasks.
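    The abstract names the excitation-based features RPCC, GLFCC and TPCC without giving their extraction procedures. The following is only a minimal sketch of the general "excitation cepstrum" idea, using a linear-prediction residual as a common proxy for the glottal source; the function name, file name, and parameter choices are illustrative assumptions, not the dissertation's method.

    ```python
    # Hypothetical sketch: cepstral coefficients of an LP residual, a stand-in
    # for the glottal-source features (RPCC/GLFCC/TPCC) named in the abstract.
    import numpy as np
    import librosa
    from scipy.signal import lfilter
    from scipy.fft import dct

    def residual_cepstral_coeffs(y, sr, order=16, n_coeffs=13,
                                 frame_len=0.025, hop=0.010):
        frame = int(frame_len * sr)
        step = int(hop * sr)
        feats = []
        for start in range(0, len(y) - frame, step):
            x = y[start:start + frame] * np.hamming(frame)
            a = librosa.lpc(x, order=order)        # vocal-tract (spectral) model
            residual = lfilter(a, [1.0], x)        # approximate excitation signal
            spectrum = np.abs(np.fft.rfft(residual)) + 1e-10
            feats.append(dct(np.log(spectrum), norm='ortho')[:n_coeffs])
        return np.array(feats)

    y, sr = librosa.load("speaker_utterance.wav", sr=16000)  # hypothetical file
    features = residual_cepstral_coeffs(y, sr)
    ```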

    Voice pathologies: the most common features and classification tools

    Speech pathologies are quite common, yet existing examinations are invasive, making them uncomfortable for patients, and their outcome depends on the experience of the clinician who performs the assessment. Hence the need for non-invasive methods that allow objective and efficient analysis. With this need in mind, this work identifies the most promising set of features and classifiers. The features identified are jitter, shimmer, HNR, LPC, PLP, and MFCC; the classifiers are CNN, RNN and LSTM. The study aims to develop a device to support medical decision-making; this article presents the system interface.
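    As an illustration of two of the listed features, the sketch below computes frame-level MFCCs and a rough jitter estimate with librosa. The file name is hypothetical, and deriving jitter from frame-wise f0 is only an approximation of the standard cycle-to-cycle definition (mean absolute difference of consecutive pitch periods divided by the mean period).

    ```python
    # Minimal sketch (not the article's implementation): MFCCs plus an
    # approximate jitter value, assuming librosa is available.
    import numpy as np
    import librosa

    y, sr = librosa.load("patient_sample.wav", sr=None)   # hypothetical file

    # 13 MFCCs per frame, one of the spectral features listed in the article
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    # Approximate local jitter: mean absolute difference between consecutive
    # pitch periods over the mean period. True jitter uses cycle marks;
    # frame-wise f0 from pyin is only a proxy.
    f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    periods = 1.0 / f0[voiced & np.isfinite(f0)]
    jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    print(f"jitter (approx.): {jitter:.4f}")
    ```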

    Analysis and Detection of Pathological Voice using Glottal Source Features

    Automatic detection of voice pathology enables objective assessment and earlier intervention in diagnosis. This study provides a systematic analysis of glottal source features and investigates their effectiveness in voice pathology detection. Glottal source features are extracted using glottal flows estimated with the quasi-closed phase (QCP) glottal inverse filtering method, using approximate glottal source signals computed with the zero frequency filtering (ZFF) method, and using acoustic voice signals directly. In addition, we propose to derive mel-frequency cepstral coefficients (MFCCs) from the glottal source waveforms computed by QCP and ZFF to effectively capture the variations in the glottal source spectra of pathological voice. Experiments were carried out using two databases, the Hospital Universitario Principe de Asturias (HUPA) database and the Saarbrucken Voice Disorders (SVD) database. Analysis of the features revealed that the glottal source contains information that discriminates between normal and pathological voice. Pathology detection experiments were carried out using a support vector machine (SVM). The detection experiments showed that the performance achieved with the studied glottal source features is comparable to or better than that of conventional MFCC and perceptual linear prediction (PLP) features. The best detection performance was achieved when the glottal source features were combined with the conventional MFCC and PLP features, which indicates the complementary nature of the features.
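    The abstract's pipeline of ZFF-derived glottal source signals, MFCCs computed from them, and an SVM detector can be sketched as below. This is not the paper's implementation: the trend-removal window, the assumed mean pitch period, the MFCC settings, and the placeholder variables train_paths and train_labels are all illustrative assumptions.

    ```python
    # Sketch of zero-frequency filtering (ZFF) followed by MFCCs of the ZFF
    # signal and an SVM classifier; parameters and data handling are assumed.
    import numpy as np
    import librosa
    from scipy.signal import lfilter
    from sklearn.svm import SVC

    def zff(s, sr, mean_f0=150.0):
        x = np.diff(s, prepend=s[0])                 # difference the signal
        # Two cascaded zero-frequency resonators (double pole at z = 1)
        y = lfilter([1.0], [1.0, -2.0, 1.0], x)
        y = lfilter([1.0], [1.0, -2.0, 1.0], y)
        # Trend removal: subtract a local mean over ~1.5 pitch periods, repeated
        win = int(1.5 * sr / mean_f0) | 1            # force odd window length
        kernel = np.ones(win) / win
        for _ in range(3):
            y = y - np.convolve(y, kernel, mode='same')
        return y

    def glottal_mfcc(path):
        s, sr = librosa.load(path, sr=16000)
        z = zff(s, sr)
        m = librosa.feature.mfcc(y=z, sr=sr, n_mfcc=13)
        return m.mean(axis=1)                        # utterance-level average

    # Hypothetical training step: one feature vector per recording.
    X = np.array([glottal_mfcc(p) for p in train_paths])    # train_paths assumed
    clf = SVC(kernel='rbf', C=1.0).fit(X, train_labels)     # train_labels assumed
    ```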

    Articulatory features for conversational speech recognition


    Cross-modal correspondences in non-human mammal communication

    For both humans and other animals, the ability to combine information obtained through different senses is fundamental to the perception of the environment. It is well established that humans form systematic cross-modal correspondences between stimulus features that can facilitate the accurate combination of sensory percepts. However, the evolutionary origins of the perceptual and cognitive mechanisms involved in these cross-modal associations remain surprisingly underexplored. In this review we outline recent comparative studies investigating how non-human mammals naturally combine information encoded in different sensory modalities during communication. The results of these behavioural studies demonstrate that various mammalian species are able to combine signals from different sensory channels when they are perceived to share the same basic features, either because they can be redundantly sensed and/or because they are processed in the same way. Moreover, evidence that a wide range of mammals form complex cognitive representations about signallers, both within and across species, suggests that animals also learn to associate different sensory features which regularly co-occur. Further research is now necessary to determine how multisensory representations are formed in individual animals, including the relative importance of low-level feature-related correspondences. Such investigations will generate important insights into how animals perceive and categorise their environment, as well as provide an essential basis for understanding the evolution of multisensory perception in humans.