Stress and rhythm in modern Greek
SIGLE: Available from British Library Document Supply Centre (BLDSC), DSC:D35991/81, United Kingdom
Acoustic Intensity and Speech Breathing Kinematics in a Patient with Parkinson’s Disease
Parkinson’s disease (PD) is a neurodegenerative disease which affects the basal ganglia control circuit (Duffy, 2013). The motor speech disorder most strongly associated with PD is hypokinetic dysarthria, which presents with distinctive speech characteristics including reduced loudness and an inability to adequately maintain loud speech (Darley, Aronson, & Brown, 1969; Duffy, 2013). This is due to the variable speech breathing kinematics associated with PD, which may result in abnormal muscular excursions, reduced vital capacity, and irregular breathing cycles (Duffy, 2013). The impaired ventilatory control can be attributed to rigidity of the muscles of inhalation and exhalation, as well as bradykinesia and hypokinesia.
The study aimed to evaluate whether a patient with PD was able to manipulate their acoustic intensity, and if such intensity changes were accompanied by changes in speech breathing kinematics in a novel intraoperative environment.
The study’s data were collected intra-operatively during deep brain stimulation surgery, with simultaneous recordings from the subthalamic nucleus and cortex. The patient was instructed to modulate acoustic intensity while repeating three-syllable CV triplets. Speech breathing kinematics of the rib cage were obtained throughout speech production using a piezo crystal effort sensor with a double buckle band. The speech breathing kinematics of interest were duration, displacement, and peak velocity of inhalation; peak velocity of exhalation; and duration from onset of exhalation to onset of speech, as well as a descriptive comparison between tidal breathing and speech breathing.
Spearman rho correlations indicated weak to no relationships between speech breathing kinematics and intensity in this participant. However, a medium effect size (Hedges' g) was observed between tidal and speech breathing for inhalation duration, and small-to-medium effect sizes for inhalation displacement and peak velocity.
While previous literature suggests that people with PD can manipulate intensity when cued, as a result of kinematic modulations for speech breathing, the current study does not support these findings for this one patient. However, previously reported differences between tidal and speech breathing were supported. Potential explanations for the lack of intensity modulation are explored, including constraints induced by the intra-operative environment.
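The two statistics named in this abstract, Spearman's rho and Hedges' g, are easy to sketch directly. The snippet below is illustrative only: the kinematic and intensity values are invented, and only the formulas reflect the kind of analysis the study describes.

```python
from statistics import mean, stdev

def hedges_g(a, b):
    # Cohen's d with a small-sample bias correction (Hedges' g)
    na, nb = len(a), len(b)
    pooled = (((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
              / (na + nb - 2)) ** 0.5
    d = (mean(a) - mean(b)) / pooled
    return d * (1 - 3 / (4 * (na + nb) - 9))

def spearman_rho(x, y):
    # Spearman rank correlation (no ties assumed in this toy example)
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical inhalation durations (s): speech inhalations here are shorter,
# so g comes out negative
tidal  = [1.8, 2.0, 1.9, 2.1, 1.7, 2.2]
speech = [1.2, 1.4, 1.1, 1.5, 1.3, 1.2]
g = hedges_g(speech, tidal)

# Hypothetical intensity (dB SPL) vs. inhalation displacement (cm) pairs
intensity    = [62, 65, 63, 68, 70, 66]
displacement = [1.1, 0.9, 1.3, 1.0, 1.2, 0.8]
rho = spearman_rho(intensity, displacement)
print(round(g, 2), round(rho, 2))
```

With data like these, rho lands near zero (a "weak to no" relationship) while g is clearly non-zero, mirroring the pattern the abstract reports.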
The Ability of Persons with Parkinson's Disease to Manipulate Vocal Intensity and Articulatory Precision in an Intra-Operative Setting
Parkinson’s disease is a degenerative neurological disease associated with decreased basal ganglia control circuit output, leading to decreased facilitation of cortical motor areas and subsequent motor impairments (Wichmann & DeLong, 1996). Motor impairments, including rigidity, bradykinesia, reduced range of motion and difficulty initiating movement, impact both respiratory function and speech in persons with Parkinson’s disease (PWPD), often leading to hypophonia and hypokinetic dysarthria (Darling & Huber, 2011). Hypokinetic dysarthria includes, among other characteristics, reduced loudness and imprecise articulation, and therefore reduced speech clarity.
The purpose of this study was to determine if PWPD were able to manipulate speech intensity and articulatory precision in soft versus loud stimulus presentation conditions in an intra-operative environment. Articulatory precision was measured using the F2 ratio, based on the second formant values of the vowels /i/ and /u/ (Sapir, 2007). As /i/ is produced anteriorly in the oral cavity and /u/ is produced posteriorly, an increase in this ratio is anticipated to accompany greater articulatory precision. It was hypothesized that PWPD would be able to increase vocal intensity, which would result in larger F2 ratios.
Participants consisted of 16 PWPD undergoing surgery for deep brain stimulation and simultaneous recording in the subthalamic nucleus and cortex. Participants repeated CVCVCV utterances presented auditorily at soft and loud levels. Acoustic signals were recorded and average vowel intensities and second formant values for /i/ and /u/ productions within each utterance were extracted. Second formant values were then used to calculate the F2 ratio for each utterance.
Wilcoxon Signed-Rank Tests revealed that, while intensity significantly increased in the loud compared to the soft condition, the F2 ratio did not demonstrate this increase. Of particular interest, examination of individual participants revealed that 3 patients did not increase intensity in the loud stimulus condition. When only participants who increased intensity were included in subsequent analyses, the F2 ratio did demonstrate a significant increase in the loud stimulus condition.
The current study demonstrates that, even with methodological differences imposed by the intra-operative environment, when patients are able to increase speech intensity, they also increase articulatory precision.
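The F2 ratio measure (Sapir, 2007) described above can be shown concretely. The formant values below are invented for illustration; only the ratio itself follows the study's definition.

```python
def f2_ratio(f2_i_hz, f2_u_hz):
    # F2 ratio: second formant of /i/ over second formant of /u/.
    # A larger ratio indicates a more distinct front/back vowel contrast,
    # i.e. greater articulatory precision.
    return f2_i_hz / f2_u_hz

# Hypothetical formant values (Hz) for one speaker
soft = f2_ratio(1900, 1100)  # reduced vowel space in the soft condition
loud = f2_ratio(2200, 950)   # expanded vowel space in the loud condition
print(round(soft, 2), round(loud, 2))
```

Because /i/ raises F2 and /u/ lowers it, louder, more precise speech pushes the two formants apart and the ratio up, as in this toy comparison.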
An acoustic-phonetic approach in automatic Arabic speech recognition
In a large-vocabulary speech recognition system, broad phonetic classification is used instead of detailed phonetic analysis to overcome variability in the acoustic realisation of utterances. The broad phonetic description of a word serves as a means of lexical access, with the lexicon structured into sets of words sharing the same broad phonetic labelling.
This approach has been applied to a large-vocabulary isolated-word Arabic speech recognition system. Statistical studies were carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes, exploiting some particular features of the Arabic language. The results show that vowels represent about 43% of the total number of phonemes, and that about 38% of the words can be uniquely represented at this level using eight broad phonetic classes. When detailed vowel identification is introduced, the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary.
In the adopted word recognition model, consonants are classified into four broad phonetic classes, while vowels are described by their phonemic form. A set of 100 words uttered by several speakers was used to test the performance of the implemented approach.
Three procedures have been developed for the recognition model: voiced-unvoiced-silence (V-UV-S) segmentation, vowel detection and identification, and automatic detection of spectral transitions between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure exploiting information from these three procedures has been implemented, with simple phonological constraints used to improve the accuracy of segmentation. The resultant sequence of labels is used for lexical access to retrieve the word, or a small set of words sharing the same broad phonetic labelling. When more than one word candidate is retrieved, a verification procedure chooses the most likely one.
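The lexical-access scheme described here, keeping vowels phonemic while collapsing consonants into broad classes, can be sketched with a toy lexicon. The class mapping and the phonemic forms below are simplified, hypothetical stand-ins, not the thesis's actual classes or word list.

```python
# Hypothetical broad classes for a handful of consonants (the thesis
# classifies consonants into four broad classes; this mapping is a toy)
BROAD_CLASS = {
    "b": "stop", "t": "stop", "d": "stop", "k": "stop", "q": "stop",
    "s": "fricative", "z": "fricative", "f": "fricative",
    "m": "nasal", "n": "nasal",
    "l": "liquid", "r": "liquid",
}

def broad_label(phonemes):
    # Vowels keep their phonemic identity, consonants collapse to broad
    # classes, as in the adopted recognition model
    return tuple(p if p in "aiu" else BROAD_CLASS.get(p, "other")
                 for p in phonemes)

def build_lexicon(words):
    # Structure the lexicon into sets of words sharing a broad labelling
    lexicon = {}
    for word, phonemes in words.items():
        lexicon.setdefault(broad_label(phonemes), []).append(word)
    return lexicon

# Invented phonemic forms for illustration
words = {
    "kitab": ["k", "i", "t", "a", "b"],
    "qidam": ["q", "i", "d", "a", "m"],
    "salam": ["s", "a", "l", "a", "m"],
}
lexicon = build_lexicon(words)
candidates = lexicon[broad_label(["k", "i", "t", "a", "b"])]
print(candidates)
```

When the candidate set has more than one member, a verification pass over finer acoustic detail would pick the most likely word; here the broad label already isolates a single candidate.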
Hearing the Moment: Measures and Models of the Perceptual Centre
The perceptual centre (P-centre) is the hypothetical specific moment at which a brief event is perceived to occur. Several P-centre models are described in the literature and the first collective implementation and rigorous evaluation of these models using a common corpus is described in this thesis, thus addressing a significant open question: which model should one use? The results indicate that none of the models reliably handles all sound types. Possibly this is because the data for model development are too sparse, because inconsistent measurement methods have been used, or because the assumptions underlying the measurement methods are untested. To address this, measurement methods are reviewed and two of them, rhythm adjustment and tap asynchrony, are evaluated alongside a new method based on the phase correction response (PCR) in a synchronized tapping task. Rhythm adjustment and the PCR method yielded consistent P-centre estimates and showed no evidence of P-centre context dependence. Moreover, the PCR method appears most time efficient for generating accurate P-centre estimates. Additionally, the magnitude of the PCR is shown to vary systematically with the onset complexity of speech sounds, which presumably reflects the perceived clarity of a sound’s P-centre.
The ideal outcome of any P-centre measurement technique is to detect the true moment of perceived event occurrence. To this end a novel P-centre measurement method, based on auditory evoked potentials, is explored as a possible objective alternative to the conventional approaches examined earlier. The results are encouraging and suggest that a neuroelectric correlate of the P-centre does exist, thus opening up a new avenue of P-centre research.
Finally, an up-to-date and comprehensive review of the P-centre is included, integrating recent findings and reappraising previous research. The main open questions are identified, particularly those most relevant to P-centre modelling.
English lexical stress, prominence and rhythm
English speech rhythm is closely associated with the patterns of lexical stress and prominence in a stream of speech. Older varieties of English (OVEs), such as British and American English, which usually operate as the model in English language teaching, are often described as ‘stress-timed’, meaning the time between stressed syllables is more or less equal, in comparison with ‘syllable-timed’ languages (e.g., French or Cantonese), for which the time between successive syllable onsets is more or less equal. The usefulness of this distinction, however, has been disputed; e.g., Cauldwell (2002) talks about ‘functional irrhythmicality’ in English speech.
Cutler (1984) explains that native speakers of English focus on stressed syllables when listening to a stream of speech as part of the decoding process; i.e., for native speakers, lexical stress and the rhythm of the incoming signal play an important part in perception. Couper-Kuhlen and colleagues (e.g., Auer, Couper-Kuhlen, & Müller, 1999) have shown that speech rhythm plays an important part in the coordination of turn-taking in conversation. Anderson-Hsieh and Venkatagiri (1994) argue that speakers’ intelligibility will be affected if they do not sufficiently weaken English unstressed syllables. Such research indicates that the differences in the lexical stress and/or speech rhythm patterns of learners of English, or speakers of New Varieties of English (NVEs) which are not ‘stress-timed’, could create difficulties in comprehension and cooperative interaction for native speakers of OVEs and also, plausibly, for other speakers of English if they are using similar strategies. However, whether the majority of speakers of English in the world have a speaker of an OVE as their target interlocutor is coming increasingly under question.
This chapter gives an overview of English lexical stress, prominence and speech rhythm in OVEs, including theoretical approaches to their description, and includes suggestions for pedagogical approaches for the English language classroom.
Wavelet Based Feature Extraction for The Indonesian CV Syllables Sound
This paper proposes a combination of the Wavelet Transform (WT) and Euclidean Distance (ED) to estimate the expected feature vector of Indonesian syllables. The research aims to find the most effective and efficient properties for feature extraction of each syllable sound, for application in speech recognition systems. The proposed approach, which builds on a previous study, consists of three main phases. In the first phase, the speech signal is segmented and normalized. In the second phase, the signal is transformed into the frequency domain using the WT. In the third phase, the ED algorithm is used to estimate the expected feature vector. The results provide a list of features for each syllable that can be used in future research, along with recommendations on the most effective and efficient WT for syllable sound recognition.
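The three phases above can be sketched in miniature. The paper compares several wavelet families; the sketch below uses the simplest (Haar, one decomposition level) as a stand-in, selects the token closest in Euclidean distance to the mean of its syllable's tokens as the "expected" feature vector, and runs on invented waveform samples.

```python
import math

def haar_dwt(signal):
    # One level of the Haar wavelet transform: approximation and detail
    # coefficients from pairwise sums and differences
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def expected_feature_vector(tokens):
    # Pick the token whose approximation coefficients lie closest (ED)
    # to the mean over all tokens of the same syllable
    feats = [haar_dwt(t)[0] for t in tokens]
    centroid = [sum(col) / len(feats) for col in zip(*feats)]
    return min(feats, key=lambda f: euclidean(f, centroid))

# Hypothetical tokens of one Indonesian CV syllable (e.g. /ba/), 8 samples each
tokens = [
    [0.1, 0.5, 0.9, 0.4, -0.2, -0.6, -0.3, 0.0],
    [0.2, 0.6, 0.8, 0.3, -0.1, -0.5, -0.4, 0.1],
    [0.0, 0.4, 1.0, 0.5, -0.3, -0.7, -0.2, 0.0],
]
fv = expected_feature_vector(tokens)
print(len(fv))  # 4 approximation coefficients from 8 samples
```

A real system would segment and normalize the signal first and use longer frames and deeper decompositions; this only shows how WT coefficients and ED combine into one representative vector per syllable.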
Fine phonetic detail and intonational meaning
The development of theories about form-function relations in intonation should be informed by a better understanding of the dependencies that hold among different phonetic parameters. Fine phonetic detail encodes both linguistically structured meaning and paralinguistic meaning.
Producing phrasal prominence in German
This study examines the relative change in a number of acoustic parameters usually associated with the production of prominences. The production of six German sentences under different question-answer conditions provides de-accented and accented versions of the same words in broad and narrow focus. Normalised energy, F0, duration and spectral measures were found to form a stable hierarchy in their exponency of the three degrees of accentuation.
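The core measurement here, relative change of an acoustic parameter between accented and de-accented productions of the same word, is simple to sketch. All values below are invented for illustration; only the ordering logic reflects the kind of hierarchy the study reports.

```python
def relative_change(accented, deaccented):
    # Relative change of one acoustic parameter between accented and
    # de-accented productions of the same word
    return (accented - deaccented) / deaccented

# Hypothetical measurements for one word under the two conditions
params = {
    "energy_norm": relative_change(0.82, 0.55),
    "f0_hz":       relative_change(145.0, 118.0),
    "duration_s":  relative_change(0.31, 0.26),
}

# Rank parameters by how strongly they mark the accent
hierarchy = sorted(params, key=params.get, reverse=True)
print(hierarchy)
```

Repeating this over many words and speakers, and checking whether the ranking stays stable, is what turns per-token relative changes into an exponency hierarchy.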
Louisiana State University nasalance protocol standardization
It was the purpose of this study to obtain nasalance values using the Nasometer and a resonance evaluation created at the Louisiana State University (LSU) Speech and Hearing Clinic. The Nasometer was used to measure the amount of nasal acoustic energy in the speech of 40 normal young adults during sustained vowel production, consonant-vowel reduplications, and connected speech using the Rainbow Passage. Means and standard deviations are presented for the individual speech tasks and according to gender. Nasalance values for the sustained vowels were significantly higher for the high front vowel /i/ than for any other vowel, and the lowest nasalance value was obtained for the high back vowel /u/. The vowels in order of highest to lowest nasalance values were as follows: /i, æ, a, u/. No significant gender differences were found for sustained vowel production or the Rainbow Passage. Correlation values indicated that three phonemes /u, k, g/ from the resonance protocol were the best predictors of nasalance for the reading passage. The results are discussed with regard to potential reasons why minimal gender differences were found, why these phonemes were the best predictors of nasalance, and how the LSU protocol can be modified to provide a more effective and efficient resonance evaluation.
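Nasalance, the quantity the Nasometer reports, is nasal acoustic energy expressed as a percentage of combined nasal and oral energy. The sketch below computes it from hypothetical channel energies; the numbers are invented, but the direction of the /i/ versus /u/ contrast matches the study's finding.

```python
def nasalance(nasal_energy, oral_energy):
    # Nasalance score: nasal acoustic energy as a percentage of
    # total (nasal + oral) acoustic energy
    return 100 * nasal_energy / (nasal_energy + oral_energy)

# Hypothetical mean channel energies for two sustained vowels
print(round(nasalance(22, 78), 1))  # high front /i/ -> higher nasalance
print(round(nasalance(8, 92), 1))   # high back /u/ -> lower nasalance
```

In practice the nasal and oral signals come from microphones separated by the Nasometer's headset plate, and scores are averaged over each speech task.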