    Exploring the Time-efficient Evolutionary-based Feature Selection Algorithms for Speech Data under Stressful Work Condition

    A long-standing goal of Machine Learning (ML) research is faster computation with lower resource demands, yet the curse of dimensionality burdens both computation time and resources. This paper describes the benefits of Feature Selection Algorithms (FSAs) for speech data recorded under workload stress. FSAs reduce both data dimensionality and computation time while retaining the information in the speech signal. We chose a set of robust evolutionary and related algorithms: Harmony Search, Principal Component Analysis, Genetic Algorithm, Particle Swarm Optimization, Ant Colony Optimization, and Bee Colony Optimization, which we then evaluated using hierarchical machine learning models. These FSAs are explored on conversational workload-stress data from a Customer Service hotline, where daily complaints trigger stress in speaking. We employed 223 acoustic-based features. With Random Forest, our evaluation showed computation time improved to 3.6 times faster than with the original 223 features; evaluation using a Support Vector Machine was faster still, at 0.001 seconds of computation time.
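    Since the abstract names a Genetic Algorithm among the FSAs, here is a minimal sketch of GA-based feature selection wrapped around a Random Forest fitness function. This is an illustrative reconstruction, not the paper's implementation: the population size, mutation rate, CV folds, and the `load_stress_speech_features` loader are all assumptions.

```python
# Sketch of genetic-algorithm feature selection over binary feature masks.
# NOT the paper's implementation; hyperparameters are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    """Cross-validated accuracy of a Random Forest on the selected features."""
    if not mask.any():
        return 0.0
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def ga_select(X, y, pop_size=20, generations=10, mutation_rate=0.02):
    n_features = X.shape[1]
    pop = rng.random((pop_size, n_features)) < 0.5   # random binary masks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # fitter half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = int(rng.integers(1, n_features))    # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(n_features) < mutation_rate  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()]  # best feature mask found

# Usage with a hypothetical loader standing in for the 223 acoustic features:
# X, y = load_stress_speech_features()   # X: (n_samples, 223), y: stress labels
# mask = ga_select(X, y)
# print(f"kept {mask.sum()} of {mask.size} features")
```

    With 223 features the search space holds 2^223 possible masks, which is why population-based heuristics such as GA, PSO, or ACO are used here rather than exhaustive search.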

    An acoustic investigation of the developmental trajectory of lexical stress contrastivity in Italian

    We examined whether typically developing Italian children exhibit adult-like stress contrastivity for word productions elicited via a picture naming task (n = 25 children aged 3–5 years and 27 adults). Stimuli were 10 trisyllabic Italian words; half began with a weak–strong (WS) pattern of lexical stress across the initial 2 syllables, as in patata, while the other half began with a strong–weak (SW) pattern, as in gomito. Word productions that were identified as correct via perceptual judgement were analysed acoustically. The initial 2 syllables of each correct word production were analysed in terms of the duration, peak intensity, and peak fundamental frequency of the vowels using a relative measure of contrast, the normalised pairwise variability index (PVI). Results across the majority of measures showed that children's stress contrastivity was adult-like. However, the data revealed that children's contrastivity for trisyllabic words beginning with a WS pattern was not adult-like regarding the PVI for vowel duration: children showed less contrastivity than adults. This effect appeared to be driven by differences in word-medial gemination between children and adults. Results are compared with data from a recent acoustic study of stress contrastivity in English-speaking children and adults and discussed in relation to language-specific and physiological motor-speech constraints on production.
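    For reference, the normalised PVI is a standard relative contrast measure; for the two-syllable window analysed here it reduces to the contrast within a single vowel pair. A small sketch, with made-up durations:

```python
# Normalised pairwise variability index (nPVI) over a sequence of values.
# The example durations below are invented, not the study's data.

def npvi(values):
    """nPVI = 100 * mean over adjacent pairs of |d_k - d_(k+1)| / ((d_k + d_(k+1)) / 2)."""
    pairs = list(zip(values[:-1], values[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)

# Vowel durations (ms) of the first two syllables of a WS word like "patata":
weak, strong = 60.0, 110.0
print(npvi([weak, strong]))  # ~58.8; larger values = more durational contrast
```

    Because the measure normalises each pairwise difference by the pair's mean, it compares contrastivity across speakers with different speech rates, which is what makes child-adult comparisons meaningful.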

    I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance

    We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech, which is made publicly available for research purposes. We start by demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We also propose automatic classification both by brute-forcing low-level acoustic features and by using higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier employed in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. The early fusion of features related to intelligibility with the brute-forced acoustic feature set improves the performance on read speech, reaching a 66.4% average recall for the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a determination coefficient of up to 56.2%.
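    The leave-one-speaker-out protocol with an SVM and average recall can be sketched as follows. The features, labels, and speaker IDs below are random stand-ins, not the iHEARu-EAT data, and the linear kernel and C value are assumptions:

```python
# Sketch of a leave-one-speaker-out (LOSO) SVM evaluation scored by
# unweighted average recall (UAR). All data here are random placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import recall_score

def loso_uar(X, y, speakers):
    """Train on all speakers but one, test on the held-out speaker, pool predictions."""
    logo = LeaveOneGroupOut()
    y_true, y_pred = [], []
    for train, test in logo.split(X, y, groups=speakers):
        clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
        clf.fit(X[train], y[train])
        y_true.extend(y[test])
        y_pred.extend(clf.predict(X[test]))
    # macro-averaged recall == unweighted average recall over the classes
    return recall_score(y_true, y_pred, average="macro")

# Stand-in data: 30 speakers x 10 utterances, binary eating / not-eating labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 2, size=300)
speakers = np.repeat(np.arange(30), 10)
print(f"UAR: {loso_uar(X, y, speakers):.3f}")  # ~0.5 on random data
```

    Holding out whole speakers, rather than random utterances, prevents the classifier from exploiting speaker identity, which matters when every speaker contributes many utterances.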

    Stress and Emotion Classification Using Jitter and Shimmer Features

    In this paper, we evaluate the use of appended jitter and shimmer speech features for the classification of human speaking styles and of animal vocalization arousal levels. Jitter and shimmer features are extracted from the fundamental frequency contour and added to baseline spectral features, specifically Mel-frequency cepstral coefficients (MFCCs) for human speech and Greenwood function cepstral coefficients (GFCCs) for animal vocalizations. Hidden Markov models (HMMs) with Gaussian mixture model (GMM) state distributions are used for classification. The appended jitter and shimmer features result in an increase in classification accuracy for several illustrative datasets, including the SUSAS dataset for human speaking styles as well as vocalizations labeled by arousal level for African elephant and Rhesus monkey species.
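    Jitter and shimmer are cycle-to-cycle perturbation measures derived from the fundamental frequency contour. A minimal sketch of their local variants follows; real feature extraction would first obtain the per-cycle period and amplitude sequences with a pitch tracker (e.g., Praat), and the numbers below are invented:

```python
# Local jitter and shimmer from per-cycle measurements of a voiced segment.
# Inputs are assumed to be already extracted by a pitch tracker.
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive pitch periods, normalised by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes, normalised by the mean amplitude."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

# Hypothetical per-cycle measurements for one voiced segment:
periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0103]   # seconds
amps    = [0.80, 0.78, 0.82, 0.79, 0.81]             # linear amplitude
print(f"jitter:  {local_jitter(periods):.4f}")       # frequency perturbation
print(f"shimmer: {local_shimmer(amps):.4f}")         # amplitude perturbation
```

    Because these values capture voice-source irregularity rather than spectral shape, appending them to MFCC or GFCC vectors adds complementary information, which is consistent with the accuracy gains the paper reports.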

    Saliency or template? ERP evidence for long-term representation of word stress

    The present study investigated the event-related brain potential (ERP) correlates of word stress processing. Previous results showed that the violation of a legal stress pattern elicited two consecutive Mismatch Negativity (MMN) components, synchronized to the changes on the first and second syllables. The aim of the present study was to test whether ERPs reflect only the detection of salient features present on the syllables, or whether they reflect the activation of long-term, stress-related representations. We examined ERPs elicited by pseudowords with no lexical representation in two conditions: in one, the standard had a legal stress pattern and the deviant an illegal one; in the other, the standard had an illegal stress pattern and the deviant a legal one. We found that the deviant with an illegal stress pattern elicited two consecutive MMN components, whereas the deviant with a legal stress pattern did not elicit an MMN. Moreover, pseudowords with a legal stress pattern elicited the same ERP responses irrespective of their role in the oddball sequence, i.e., whether they were standards or deviants. The results suggest that stress pattern changes are processed relying on a long-term representation of word stress. To account for these results, we propose that the processing of stress cues is based on language-specific, pre-lexical stress templates.
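    The oddball design underlying both conditions can be made concrete with a short sketch. The 10% deviant probability and the no-repeat constraint are illustrative assumptions, not the study's reported parameters:

```python
# Sketch of an oddball stimulus sequence: a frequent "standard" pseudoword
# and a rare "deviant" with the opposite stress pattern.
import random

def oddball_sequence(n_trials, standard, deviant, p_deviant=0.1, seed=0):
    rng = random.Random(seed)
    seq = []
    for _ in range(n_trials):
        # avoid two deviants in a row, a common constraint in MMN designs
        if seq and seq[-1] == deviant:
            seq.append(standard)
        elif rng.random() < p_deviant:
            seq.append(deviant)
        else:
            seq.append(standard)
    return seq

# Condition 1: legal-stress standard, illegal-stress deviant; condition 2
# swaps the roles by swapping the arguments.
seq = oddball_sequence(500, standard="LEGAL", deviant="ILLEGAL")
print(seq[:20], seq.count("ILLEGAL") / len(seq))
```

    Swapping which pattern serves as standard and which as deviant is what lets the study separate stimulus-driven saliency from long-term template effects: a pure saliency account would predict symmetric MMNs in both conditions.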