    Advances in the neurocognition of music and language

    Psycholinguistics in Fluency Disorders: Prearticulatory Speech Planning In Stuttering and Cluttering

    The Covert Repair Hypothesis (CRH) is an account of speech errors in normally fluent speakers, and also hypothesizes errors in the phonological encoding stage in people who stutter (PWS). Previous research has shown that PWS exhibit poorer performance compared to typically fluent adults (TFA) on linguistic tasks designed to tap into the level of phonological encoding, such as phoneme monitoring. Stuttering and cluttering often co-occur; thus the field can benefit from extending this methodology to study people who clutter (PWC). Experiment 1 in Chapter 2 used phoneme monitoring to study phonological encoding in PWS and PWC, with three conclusions: (1) slower performance by PWS; (2) increased errors by PWS compared to TFA; and (3) similar performance by PWC compared to TFA, suggesting that PWC do not exhibit difficulty with phonological encoding at the single word level. One criticism of the CRH is that the cause of errors in the speech plan has not been accounted for. Chapter 3 proposed the Near Neighbor Interference Hypothesis (NNIH) as an account of errors in the speech plan in PWS, which hypothesizes that due to a lifetime of word-substitution behavior to avoid stuttering, semantic neighborhoods of PWS may be organized differently than those of TFA, with more neighbors and/or stronger connections between neighbors. Chapter 3 tested the NNIH by investigating the effects of the number of associates (NoA) and the degree of relatedness on performance during lexical decision. Previous research shows that TFA respond faster to words with a high vs. low NoA, and to words preceded by a picture with a high vs. a low degree of relatedness. Following from the NNIH, it was hypothesized that the magnitude of these effects would be greater in PWS. In Experiment 2, both groups responded faster to words with higher NoA, but PWS were slower to respond than TFA overall, regardless of NoA. In Experiment 3, PWS were not overall slower than TFA, and the effect of degree of relatedness was actually stronger for TFA than PWS. Together, these results suggest that rather than experiencing a benefit from more semantic neighbors, PWS may experience interference from these additional neighbors. Overall, results suggest that PWS may have errors in their speech plan that originate prearticulatorily, potentially at the lexical-semantic level, and are passed down to the phonological encoding level.

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm, as there are many other factors that can influence values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in descriptions of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on descriptions of rhythm.
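
    The four duration-based metrics named in this abstract (∆C, %V, rPVI, nPVI) can be illustrated with a minimal Python sketch. It follows the commonly cited definitions from Ramus et al. (1999) and Grabe & Low (2002) and is not the authors' analysis code. The function names, the use of a population standard deviation for ∆C, and the interval durations are all assumptions made for illustration; the durations are invented, not data from the study.

```python
from statistics import mean, pstdev


def percent_v(vocalic, consonantal):
    """%V: percentage of total duration taken up by vocalic intervals."""
    total = sum(vocalic) + sum(consonantal)
    return 100.0 * sum(vocalic) / total


def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations.

    Assumption: population standard deviation is used here.
    """
    return pstdev(consonantal)


def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    return mean(abs(a - b) for a, b in zip(durations, durations[1:]))


def npvi(durations):
    """Normalised PVI: each difference is divided by the mean of the pair,
    then the average is scaled by 100."""
    return 100.0 * mean(
        abs(a - b) / ((a + b) / 2.0) for a, b in zip(durations, durations[1:])
    )


# Hypothetical interval durations in milliseconds for one utterance.
vocalic = [80, 120, 80, 120]
consonantal = [60, 100, 60, 100]

print("%V   =", percent_v(vocalic, consonantal))
print("dC   =", delta_c(consonantal))
print("rPVI =", rpvi(consonantal))   # raw index, usually applied to consonantal intervals
print("nPVI =", npvi(vocalic))       # normalised index, usually applied to vocalic intervals
```

    Because rPVI is not normalised, it varies with speech rate, whereas nPVI divides each difference by the local mean duration. This is one reason the two indices are typically applied to different interval types.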

    Factors affecting the perception of noise-vocoded speech: stimulus properties and listener variability.

    This thesis presents an investigation of two general factors affecting speech perception in normal-hearing adults. Two sets of experiments are described, in which speakers of English are presented with degraded (noise-vocoded) speech. The first set of studies investigates the importance of linguistic rhythm as a cue for perceptual adaptation to noise-vocoded sentences. Results indicate that the presence of native English rhythmic patterns benefits speech recognition and adaptation, but not when higher-level linguistic information is absent (i.e. when the sentences are in a foreign language). It is proposed that rhythm may help in the perceptual encoding of degraded speech in phonological working memory. Experiments in this strand also present evidence against a critical role for indexical characteristics of the speaker in the adaptation process. The second set of studies concerns the issue of individual differences in speech perception. A psychometric curve-fitting approach is selected as the preferred method of quantifying variability in noise-vocoded sentence recognition. Measures of working memory and verbal IQ are identified as candidate correlates of performance with noise-vocoded sentences. When the listener is exposed to noise-vocoded stimuli from different linguistic categories (consonants and vowels, isolated words, sentences), there is evidence for the interplay of two initial listening 'modes' in response to the degraded speech signal, representing 'top-down' cognitive-linguistic processing and 'bottom-up' acoustic-phonetic analysis. Detailed analysis of segment recognition presents a perceptual role for temporal information across all the linguistic categories, and suggests that performance could be improved through training regimes that direct attention to the most informative acoustic properties of the stimulus. Across several experiments, the results also demonstrate long-term aspects of perceptual learning. 
In sum, this thesis demonstrates that consideration of both stimulus-based and listener-based factors forms a promising approach to the characterization of speech perception processes in the healthy adult listener.

    Detecting speech disorders in early Parkinson’s disease by acoustic analysis

    This interdisciplinary habilitation thesis focuses on the design of feasible algorithms and analytical methods, based on digital signal processing and advanced statistical analysis, that are sensitive enough to capture pathological speech changes from the very early stages of Parkinson’s disease. Using objective acoustic analysis, we revealed distinctive speech impairment in patients with prodromal Parkinson’s disease, newly diagnosed Parkinson’s disease and atypical parkinsonian syndromes. Our findings suggest that automated vocal analysis may contribute to screening and diagnostic procedures to identify subjects at high risk of developing Parkinson’s disease and related neurodegenerative disorders.

    Tagungsband der 12. Tagung Phonetik und Phonologie im deutschsprachigen Raum

    Acoustic correlates of encoded prosody in written conversation

    This thesis presents an analysis of certain punctuation devices such as parenthesis, italics and emphatic spellings with respect to their acoustic correlates in read speech. The class of punctuation devices under investigation is referred to as prosodic markers. The thesis therefore presents an analysis of features of the spoken language which are represented symbolically in text. Hence it is a characterization of aspects of the spoken language which have been transcribed or symbolized in the written medium and then translated back into a spoken form by a reader. The thesis focuses in particular on the analysis of parenthesis, the examination of encoded prominence and emphasis, and also addresses the use of paralinguistic markers which signal attitude or emotion. In an effort to avoid the use of self-constructed or artificial material containing arbitrary symbolic or prosodic encodings, all material used for empirical analysis was taken from examples of electronic written exchanges on the Internet, such as electronic mail messages and articles posted on electronic newsgroups and news bulletins. This medium of language, which is referred to here as written conversation, provides a rich source of material containing encoded prosodic markers. These occur in the form of 'smiley faces' expressing attitudes or feelings, words highlighted by a number of means such as capitalization, italics, underscore characters, or asterisks, and in the form of dashes or parentheses, which provide suggestions on how the information in a text or sentence may be structured with regard to its informational content. Chapter 2 investigates in detail the genre of written conversation with respect to its place in an emerging continuum between written and spoken language, concentrating on transcriptional devices and their function as indicators of prosody. The implications these symbolic representations bear on the task of reading, by humans as well as machines, are then examined. Chapters 3 and 4 turn to the acoustic analysis of parentheticals and emphasis markers respectively. The experimental work in this thesis is based on readings of a corpus of selected materials from written conversation, with the acoustic analysis concentrating on the differences between readings of texts with prosodic markers and readings of the same texts from which prosodic markers have been removed. Finally, the effect of prosodic markers is tested in perception experiments involving both human and resynthesized utterances.

    Characterizing and recognizing spoken corrections in human-computer dialog

    Thesis (Ph.D.) by Gina-Anne Levow, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 103-106).
    Miscommunication in human-computer spoken language systems is unavoidable. Recognition failures on the part of the system necessitate frequent correction attempts by the user. Unfortunately and counterintuitively, users' attempts to speak more clearly in the face of recognition errors actually lead to decreased recognition accuracy. The difficulty of correcting these errors, in turn, leads to user frustration and poor assessments of system quality. Most current approaches to identifying corrections rely on detecting violations of task or belief models, an approach that is ineffective where such constraints are weak and recognition results are inaccurate or unavailable. In contrast, the approach pursued in this thesis uses the acoustic contrasts between original inputs and repeat corrections to identify corrections in a more content- and context-independent fashion. This thesis quantifies and builds upon the observation that suprasegmental features, such as duration, pause, and pitch, play a crucial role in distinguishing corrections from other forms of input to spoken language systems. These features can also be used to identify spoken corrections and explain reductions in recognition accuracy for these utterances. By providing a detailed characterization of acoustic-prosodic changes in corrections relative to original inputs in a voice-only system, this thesis contributes to natural language processing and spoken language understanding. We present a treatment of systematic acoustic variability in speech recognizer input as a source of new information, to interpret the speaker's corrective intent, rather than simply as noise or user error. We demonstrate the application of a machine-learning technique, decision trees, for identifying spoken corrections and achieve accuracy rates close to human levels of performance for corrections of misrecognition errors, using acoustic-prosodic information. This process is simple and local and depends neither on perfect transcription of the recognition string nor on complex reasoning based on the full conversation. We further extend the conventional analysis of speaking styles beyond a 'read' versus 'conversational' contrast to extreme clear speech, describing divergence from phonological and durational models for words in this style.