9,766 research outputs found

    Affective encoding in the speech signal and in event-related brain potentials

    A number of perceptual features have been used to characterize the emotional state of a speaker. For automatic recognition, however, suitable objective features are needed. We have examined several features of the speech signal in relation to accentuation, as well as traces of event-related brain potentials (ERPs) during affective speech perception. Concerning the features of the speech signal, we focus on measures related to breathiness and roughness. The objective measures used were an estimate of the harmonics-to-noise ratio, the glottal-to-noise excitation ratio, a measure of spectral flatness, the maximum prediction gain for a speech production model computed by means of the mutual information function, and the ERPs. Results indicate that the maximum prediction gain in particular differentiates well between neutral and non-neutral emotional speaker states. This differentiation is partly comparable to the ERP results, which show a differentiation of neutral, positive and negative affect. The other objective measures are more closely related to accentuation than to the emotional state of the speaker.
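    As an informal illustration of one of the measures named above, the sketch below computes spectral flatness (the ratio of the geometric to the arithmetic mean of the power spectrum) for a single speech frame. The function name, windowing and example signals are illustrative choices, not details taken from the study.

        # Minimal sketch, not the authors' implementation: spectral flatness of a
        # speech frame. Values near 1 indicate a noise-like (breathy/rough)
        # spectrum, values near 0 a strongly harmonic one.
        import numpy as np

        def spectral_flatness(frame, eps=1e-12):
            """Geometric mean / arithmetic mean of the power spectrum."""
            spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
            spectrum = spectrum + eps                     # avoid log(0)
            geometric_mean = np.exp(np.mean(np.log(spectrum)))
            return geometric_mean / np.mean(spectrum)

        # White noise is much flatter than a pure tone (hypothetical signals).
        rng = np.random.default_rng(0)
        noise = rng.standard_normal(1024)
        tone = np.sin(2 * np.pi * 200 * np.arange(1024) / 16000)
        print(spectral_flatness(noise), spectral_flatness(tone))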

    Modelling the effects of speech rate variation for automatic speech recognition

    Wrede B. Modelling the effects of speech rate variation for automatic speech recognition. Bielefeld (Germany): Bielefeld University; 2002.
    In automatic speech recognition it is a widely observed phenomenon that variations in speech rate cause severe degradations of speech recognition performance. This is due to the fact that standard stochastic speech recognition systems specialise in average speech rate. Although many approaches to modelling speech rate variation have been made, an integrated approach in a substantial system has yet to be developed. General approaches to rate modelling are based on rate-dependent models which are trained on rate-specific subsets of the training data. During decoding, a signal-based rate estimation is performed, according to which the set of rate-dependent models is selected. While such approaches are able to reduce the word error rate significantly, they suffer from shortcomings such as the reduction of training data and the expensive training and decoding procedure. However, phonetic investigations show that there is a systematic relationship between speech rate and the acoustic characteristics of speech. In fast speech a tendency towards reduction can be observed, which can be described in more detail as a centralisation effect and an increase in coarticulation. Centralisation means that the formant frequencies of vowels tend to shift towards the vowel space centre, while increased coarticulation denotes the tendency of the spectral features of a vowel to shift towards those of its phonemic neighbour. The goal of this work is to investigate the possibility of incorporating knowledge of the systematic nature of the influence of speech rate variation on the acoustic features into speech rate modelling. In an acoustic-phonetic analysis of a large corpus of spontaneous speech it was shown that an increased degree of the two effects of centralisation and coarticulation can be found in fast speech. Several measures for these effects were developed and used in speech recognition experiments with rate-dependent models. A thorough investigation of rate-dependent models showed that significant increases in performance could be achieved with duration- and coarticulation-based measures. It was shown that, by the use of different measures, the models were adapted either to centralisation or to coarticulation. Further experiments showed that a more detailed modelling with more rate classes yields a further improvement. It was also observed that a general basis for the models is needed before rate adaptation can be performed. In a comparison with other sources of acoustic variation it was shown that the effects of speech rate are as severe as those of speaker variation and environmental noise. All these results show that, for a more substantial system that models rate variations accurately, it is necessary to focus on both durational and spectral effects. The systematic nature of the effects indicates that continuous modelling is possible.
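    As an informal sketch of the centralisation effect described above, the code below computes a simple centralisation index: the mean Euclidean distance of (F1, F2) vowel tokens from the centroid of the vowel space, which shrinks when vowels are reduced towards the centre. The index, function name and formant values are illustrative assumptions, not the measures developed in the thesis.

        # Illustrative sketch, not the thesis implementation: a smaller mean
        # distance from the vowel space centroid indicates a more centralised
        # (reduced) vowel space, the effect reported for fast speech.
        import numpy as np

        def centralisation_index(formants):
            """formants: array-like of shape (n_tokens, 2) holding (F1, F2) in Hz."""
            formants = np.asarray(formants, dtype=float)
            centroid = formants.mean(axis=0)
            return np.linalg.norm(formants - centroid, axis=1).mean()

        # Hypothetical tokens: the 'fast' set lies closer to the centre of the space.
        normal = [(300, 2300), (700, 1200), (350, 800), (600, 1900)]
        fast = [(400, 2000), (620, 1350), (420, 1000), (560, 1750)]
        print(centralisation_index(normal), centralisation_index(fast))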

    Sound symbolism, speech expressivity and crossmodality

    The direct links existing between sound and meaning which characterize sound symbolism can be thought of as mainly related to two kinds of phenomena: sound iconicity and sound metaphors. The former refers to the mirror relations established between sound and meaning effects (Nobile, 2011), while the latter, as coined by Fonagy (1983), refers to relationships based on analogies between meaning and the characteristics of speech sound production. Four codes relevant to the study of sound symbolism phenomena have been mentioned in the phonetic literature: the frequency code (Ohala, 1994), the respiratory code, the effort code (Gussenhoven, 2002) and the sirenic code (Gussenhoven, 2016). In the present work sound symbolism is taken to be the basis of speech expressivity, because the meaning effects attributed to the spoken mode by listeners are thought to be based on the acoustic features of sounds deriving from the various articulatory maneuvers yielding breath, voice, noise, resonance and silence. Based on the impression caused by these acoustic features, listeners attribute physiological, physical, psychological and social characteristics to speakers. In this way, speech can be considered both expressive and impressive, because it is used to convey meaning effects but also impresses listeners. Both segmental and prosodic elements are used to express meaning effects in speech. Among the prosodic elements, voice quality settings have received less attention regarding their expressive uses. We argue that the investigation of the expressive uses of voice quality settings can be better approached if these settings are grouped according to their shared acoustic output properties and vocal tract configurations. Results of experiments relating symbolic uses of vocal qualities to semantic, acoustic and visual features by means of multidimensional analysis are reported, and the expressive and impressive roles of voice quality settings in spoken communication are discussed in relation to motivated links between sound forms and meaning effects.
    KEY WORDS: sound and meaning; sound symbolism; speech expressivity; voice quality; acoustic analysis; perceptual analysis
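    The multidimensional analysis mentioned above could, for instance, take the form of multidimensional scaling over a dissimilarity matrix of voice quality settings. The sketch below shows such a generic analysis; the settings, dissimilarity values and use of scikit-learn are assumptions for illustration, not the authors' data or procedure.

        # Generic sketch only: classical multidimensional scaling applied to a
        # small, invented dissimilarity matrix of voice quality settings.
        import numpy as np
        from sklearn.manifold import MDS

        settings = ["breathy", "creaky", "harsh", "modal"]
        dissimilarity = np.array([
            [0.0, 0.7, 0.8, 0.4],
            [0.7, 0.0, 0.5, 0.6],
            [0.8, 0.5, 0.0, 0.9],
            [0.4, 0.6, 0.9, 0.0],
        ])

        mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
        coords = mds.fit_transform(dissimilarity)       # 2-D map of the settings
        for name, (x, y) in zip(settings, coords):
            print(f"{name}: ({x:.2f}, {y:.2f})")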

    Lingual articulation in children with developmental speech disorders

    This thesis presents thirteen research papers published between 1987 and 1997, together with a summary and discussion of their contribution to the field of developmental speech disorders. The publications collectively constitute a body of work with two overarching themes. The first is methodological: all the publications report articulatory data relating to tongue movements recorded using the instrumental technique of electropalatography (EPG). The second is the clinical orientation of the research: the EPG data are interpreted throughout with the purpose of informing the theory and practice of speech pathology. The majority of the publications are original, experimental studies of lingual articulation in children with developmental speech disorders. At the same time, the publications cover a broad range of theoretical and clinical issues relating to lingual articulation, including articulation in normal speakers, the clinical applications of EPG, data analysis procedures, articulation in second language learners, and the effect of oral surgery on articulation. The contribution of the publications to the field of developmental speech disorders of unknown origin, also known as phonological impairment or functional articulation disorder, is summarised and discussed. In total, EPG data from fourteen children are reported. The collective results from the publications do not support the cognitive/linguistic explanation of developmental speech disorders. Instead, the EPG findings are marshalled to build the case that specific deficits in speech motor control can account for many of the diverse speech error characteristics identified by perceptual analysis in previous studies. Some of the children studied had speech motor deficits that were relatively discrete, involving, for example, an apparently isolated difficulty with tongue tip/blade groove formation for sibilant targets. Articulatory difficulties of this 'discrete' or specific type are consistent with traditional views of functional articulation disorder. EPG studies of tongue control in normal adults provided insights into a different type of speech motor control deficit observed in the speech of many of the children studied. Unlike the children with discrete articulatory difficulties, others produced abnormal EPG patterns for a wide range of lingual targets. These abnormal gestures were characterised by broad, undifferentiated tongue-palate contact, accompanied by variable approach and release phases. These 'widespread', undifferentiated gestures are interpreted as constituting a previously undescribed form of speech motor deficit, resulting from a difficulty in controlling the tongue tip/blade system independently of the tongue body. Undifferentiated gestures were found to result in variable percepts depending on the target and the timing of the particular gesture, and may manifest as perceptually acceptable productions, phonological substitutions or phonetic distortions. It is suggested that discrete and widespread speech motor deficits reflect different stages along a developmental or severity continuum, rather than distinct subgroups with different underlying deficits. The children studied all manifested speech motor control deficits of varying degrees along this continuum.
    It is argued that it is the unique anatomical properties of the tongue, combined with the high level of spatial and temporal accuracy required for tongue tip/blade and tongue body co-ordination, that put lingual control specifically at risk in young children. The EPG findings question the validity of assumptions made about the presence or absence of speech motor control deficits when such assumptions are based entirely on non-instrumental assessment procedures. A novel account of the sequence of acquisition of alveolar stop articulation in children with normal speech development is proposed, based on the EPG data from the children with developmental speech disorders. It is suggested that broad, undifferentiated gestures may occur in young normal children, and that adult-like lingual control develops gradually through the processes of differentiation and integration. Finally, the EPG findings are discussed in relation to two recent theoretical frameworks: psycholinguistic models and a dynamic systems approach to speech acquisition.
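    As a toy illustration of the kind of contact pattern the EPG analysis distinguishes, the sketch below summarises a single frame, assumed here to be an 8x8 binary tongue-palate contact grid (front rows towards the teeth), by its overall, anterior and posterior contact proportions. The grid layout, index and example frames are illustrative assumptions, not the analysis procedures used in the thesis.

        # Toy sketch, not the thesis analysis: broad contact in both the anterior
        # and the posterior half of the palate for an alveolar target is one
        # simple way to flag an "undifferentiated" gesture.
        import numpy as np

        def contact_profile(frame):
            frame = np.asarray(frame)
            return frame.mean(), frame[:4].mean(), frame[4:].mean()  # total, front, back

        # Hypothetical frames: a typical /t/ shows anterior and lateral contact only,
        # while an undifferentiated gesture shows contact across the whole palate.
        typical_t = np.zeros((8, 8), dtype=int)
        typical_t[0, :] = 1      # complete contact in the alveolar row
        typical_t[:, 0] = 1      # lateral seal, left margin
        typical_t[:, 7] = 1      # lateral seal, right margin
        undifferentiated = np.ones((8, 8), dtype=int)

        print(contact_profile(typical_t))
        print(contact_profile(undifferentiated))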

    A Vowel Analysis of the Northwestern University-Children's Perception of Speech Evaluation Tool

    In an analysis of the Northwestern University – Children's Perception of Speech (NU-CHIPS) evaluation tool, the goal was to determine whether the foil words and the target word were phonemically balanced across each page of test Book A as it corresponds to the target words presented in Test Form 1 and Test Form 2 independently. Based on vowel sounds alone, the vowels appearing on a test page vary on the majority of pages. The corresponding formant frequencies, at all three resonance levels for both the average adult male and the average adult female speaker, revealed that the target word could easily be distinguished from the foil words on the basis of percent differences calculated between the formants of the target vowel and those of the foil vowels. For children with hearing impairments, especially those with limited or no access to the high frequencies, the NU-CHIPS evaluation tool may therefore not be the best indicator of a child's speech perception ability, owing to these significant vowel variations.
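    As a rough illustration of the comparison described above, the sketch below computes percent differences between the formants of a target vowel and a foil vowel across F1-F3. The exact formula and formant values used in the study are not given here, so the relative percent difference and the example values are assumptions.

        # Illustrative sketch, not the study's computation: relative percent
        # difference between target and foil formants, for F1, F2 and F3.
        def percent_differences(target_formants, foil_formants):
            """Each argument: (F1, F2, F3) in Hz for an average adult speaker."""
            return [abs(t - f) / ((t + f) / 2) * 100
                    for t, f in zip(target_formants, foil_formants)]

        # Hypothetical adult male values for a target /i/ and a foil /a/.
        target_i = (270, 2290, 3010)
        foil_a = (730, 1090, 2440)
        print([round(d, 1) for d in percent_differences(target_i, foil_a)])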

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The proceedings of the MAVEBA Workshop, held on a biannual basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images as a support to the clinical diagnosis and classification of vocal pathologies.

    Phonological Priming In Young Children Who Stutter: Holistic Versus Incremental Processing

    Purpose: To investigate the holistic versus incremental phonological encoding processes of young children who stutter (CWS; N = 26) and age- and gender-matched children who do not stutter (CWNS; N = 26) via a picture-naming auditory priming paradigm. Method: Children named pictures during 3 auditory priming conditions: neutral, holistic, and incremental. Speech reaction time (SRT) was measured from the onset of picture presentation to the onset of the participant's response. Results: CWNS shifted from being significantly faster in the holistic priming condition to being significantly faster in the incremental priming condition from 3 to 5 years of age. In contrast, the majority of 3- and 5-year-old CWS continued to exhibit faster SRT in the holistic than in the incremental condition. Conclusion: CWS are delayed in making the developmental shift in phonological encoding from holistic to incremental processing, a delay that may contribute to their difficulties establishing fluent speech.