Search CORE

389 research outputs found

Detection of shouted speech in noise:Human and machine

Author: Alku Paavo
Pohjalainen Jouni
Raitio Tuomo
Yrttiaho Santeri
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2013
Field of study

Crossref

Edinburgh Research Explorer

Methods for speaking style conversion from normal speech to high vocal effort speech

Author: Ramírez López Ana
Publication venue: Aalto University, School of Arts, Design and Architecture, Department of Arts
Publication date: 01/01/2020
Field of study

This thesis deals with vocal-effort-focused speaking style conversion (SSC). Specifically, we studied two topics on conversion of normal speech to high vocal effort. The first topic involves the conversion of normal speech to shouted speech. We employed this conversion in a speaker recognition system with vocal effort mismatch between test and enrollment utterances (shouted speech vs. normal speech). The mismatch causes a degradation of the system's speaker identification performance. As solution, we proposed a SSC system that included a novel spectral mapping, used along a statistical mapping technique, to transform the mel-frequency spectral energies of normal speech enrollment utterances towards their counterparts in shouted speech. We evaluated the proposed solution by comparing speaker identification rates for a state-of-the-art i-vector-based speaker recognition system, with and without applying SSC to the enrollment utterances. Our results showed that applying the proposed SSC pre-processing to the enrollment data improves considerably the speaker identification rates. The second topic involves a normal-to-Lombard speech conversion. We proposed a vocoder-based parametric SSC system to perform the conversion. This system first extracts speech features using the vocoder. Next, a mapping technique, robust to data scarcity, maps the features. Finally, the vocoder synthesizes the mapped features into speech. We used two vocoders in the conversion system, for comparison: a glottal vocoder and the widely used STRAIGHT. We assessed the converted speech from the two vocoder cases with two subjective listening tests that measured similarity to Lombard speech and naturalness. The similarity subjective test showed that, for both vocoder cases, our proposed SSC system was able to convert normal speech to Lombard speech. The naturalness subjective test showed that the converted samples using the glottal vocoder were clearly more natural than those obtained with STRAIGHT

Aaltodoc Publication Archive

Shouting affects temporal properties of the speech amplitude envelope

Author: Dellwo Volker
Dimos Kostis
He Lei
Publication venue: Acoustical Society of America
Publication date: 01/01/2024
Field of study

Distinguishing shouted from non-shouted speech is crucial in communication. We examined how shouting affects temporal properties of the amplitude envelope (ENV) in a total of 720 sentences read by 18 Swiss German speakers in normal and shouted modes; shouting was characterised by maintaining sound pressure levels of ≥80 dB sound pressure level (dB-SPL) (C-weighted) at a 1-meter distance from the mouth. Generalized additive models revealed significant temporal alterations of ENV in shouted speech, marked by steeper ascent, delayed peak, and extended high levels. These findings offer potential cues for identifying shouting, particularly useful when fine-structure and dynamic range cues are absent, for example, in cochlear implant users

ZORA

Analysis and synthesis of shouted speech

Author: Airaksinen Manu
Alku Paavo
Pohjalainen Jouni
Raitio Tuomo
Suni Antti
Vainio Martti
Publication venue
Publication date: 01/01/2013
Field of study

Edinburgh Research Explorer

Automatic Detection of High Vocal Effort in Telephone Speech

Author: Alku Paavo
Pohjalainen Jouni
Pulakka Hannu
Raitio Tuomo
Publication venue
Publication date: 01/01/2012
Field of study

Edinburgh Research Explorer

Real-time noise-robust speech detection

Author: Luu Kevin Y
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2010
Field of study

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010.Cataloged from PDF version of thesis.Includes bibliographical references (p. 87-89).As part of the development of an autonomous forklift of the Agile Robotics Lab at MIT's Computer Science and Artificial Intelligence Lab (CSAIL), this thesis explores the effectiveness and application of various noise-robust techniques towards real-time speech detection in real environments. Dynamic noises in the environment (including motor noise, babble noise, and other noises in a warehouse setting) can dramatically alter the speech signal, making speech detection much more difficult. In addition to the noise environments, another issue is the urgent nature of the situation, leading to the production of shouted speech. Given these constraints, the forklift must be highly accurate in detecting speech at all times, since safety is a major concern in our application. This thesis analyzes different speech properties that would be useful in distinguishing speech from noise in various noise environments. We look at various features in an effort to optimize the overall shout detection system. In addition to identifying speech features, this thesis also uses common signal processing techniques to enhance the speech signals in audio waveforms. In addition to the optimal speech features and speech enhancement techniques, we present a shout detection algorithm that is optimized towards the application of the autonomous forklift. We measure the performance of the resulting system by comparing it to other baseline systems and show 38% improvement over a baseline task.by Kevin Y. Luu.M.Eng

DSpace@MIT

Quantitative assessment of spatial sound distortion by the semi-ideal recording point of a hear-through device

Author: Christensen Flemming
Hammershøi Dorte
Hoffmann Pablo F.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2013
Field of study

Crossref

VBN

Models and Analysis of Vocal Emissions for Biomedical Applications

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022
Field of study

The Models and Analysis of Vocal Emissions with Biomedical Applications (MAVEBA) workshop came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also in other aspects of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years always in Firenze, Italy

Directory of Open Access Books (DOAB)