
    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns in response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. It thus provides a roadmap for future work on improving the robustness of speech output.
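    As a concrete illustration of the kind of algorithmic speech modification the review surveys, the sketch below applies a first-order pre-emphasis filter, y[n] = x[n] - alpha * x[n-1]. The filter boosts high frequencies relative to low ones and so flattens spectral tilt, a change commonly reported in speech produced in noise. The function name `flatten_tilt` and the default alpha = 0.95 are illustrative assumptions. They are not taken from the review's table of 46 modifications.

```python
def flatten_tilt(samples, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    Boosting high frequencies relative to low ones reduces spectral tilt.
    `alpha` is an illustrative default (an assumption), not a value from the review.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    out = []
    prev = 0.0  # assume silence before the first sample
    for x in samples:
        out.append(x - alpha * prev)
        prev = x
    return out


# A unit impulse exposes the filter's two taps directly.
print(flatten_tilt([1.0, 0.0, 0.0]))  # [1.0, -0.95, 0.0]
```

    Intelligibility comparisons are typically made at equal overall level. A practical version would therefore also rescale the output to the input's RMS, a step omitted here to keep the sketch minimal.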

    Intonation and Reading Skills in Fourth-Grade Students

    The purpose of the current study was to examine the role of intonation skills in the reading comprehension of fourth-grade students. The National Reading Panel's (2000) definition of reading fluency as "...the ability to read a text quickly, accurately, and with proper expression..." (p. 3-5) suggests a role for prosody and intonation in reading. Although this is the case, these features have not figured prominently in reading research, and studies that have examined the relationship between intonation and reading have reported varying results. The current study adopted the view that intonation is one of the many linguistic skills that support children's reading skills. From this perspective, the study examined the relationship between intonation and reading comprehension within the framework of the Simple View of Reading (SVR), a model that describes reading comprehension as the product of decoding and linguistic comprehension. Based on previous work by Miller and Schwanenflugel (2006, 2008), the study examined whether children who produced wider or more adult-like final rising intonation contours demonstrated greater reading comprehension than children who produced narrower or less adult-like final rising contours. The current study did not find support for a relationship between children's productions of wider or more adult-like final rising intonation contours and their reading comprehension. The current study also examined whether including measures of intonation in the SVR accounted for additional variance in reading comprehension. The results supported the inclusion of two intonation variables: 1) accuracy in producing the appropriate final intonation contour direction to mark questions when reading; and 2) performance on the receptive subtests of the Profiling Elements of Prosodic Systems-Child assessment procedure (PEPS-C; Peppé & McCann, 2003), a computerized assessment of intonation. 
Additional statistical analyses indicated that the Chunking Reception and Contrastive Stress Reception subtests of the PEPS-C showed the strongest relationship with reading comprehension. Finally, including these intonation variables in an SVR framework reduced the significance of the relationship between the decoding and reading comprehension variables.
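    The Simple View of Reading mentioned above is conventionally written (Gough & Tunmer, 1986) as a product of its two components. Here R is reading comprehension, D is decoding and C is linguistic comprehension, each scaled from 0 to 1. Because the model is multiplicative, a deficit in either component caps comprehension. The values in the worked line are illustrative only and are not data from this study.

$$
R = D \times C, \qquad \text{e.g. } D = 0.9,\; C = 0.5 \;\Rightarrow\; R = 0.45
$$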

    Intonation Production and Perception in Children with Developmental Language Impairment

    Studies on intonation production and perception in children with developmental language impairment (LI) have reported mixed outcomes. Some suggest that intonation processing is impaired in this population, but others fail to find any evidence of such a deficit. The issue is further complicated by findings that these children perform poorly on some intonation tasks but not on others. The source of the discrepant findings is unclear. However, one shortcoming is that most previous studies do not report the severity of LI of their participants. Thus, the mixed findings on intonation processing in children with developmental language impairment may be attributable to the severity of the disorder. The present study sought to investigate this possibility. Participants were 33 children with LI and 36 age-matched typically developing controls. Of the children in the experimental group, 13 had mild, 10 had moderate and 10 had severe language impairment. In two experiments, these children's ability to produce (Experiment 1) and perceive (Experiment 2) intonation was assessed. In Experiment 1, participants were asked questions that required them to respond using broad or narrow focus constructions. Fundamental frequency, tonal alignment, word duration and intensity of the intonation contours produced were measured. In Experiment 2, participants were presented with sentences produced in broad and narrow focus and asked to discriminate between the two types of constructions. The results showed that children with mild LI performed comparably with typically developing peers on all production measures. However, the moderate and severe groups had difficulty with word duration and intensity. In the perceptual experiment, all children with LI had difficulty discriminating between broad and narrow focus. The severe group performed the poorest, followed by the moderate and mild groups. 
The findings of the present study suggest that severity of language impairment plays a role in the discrepant findings on intonation processing in children with LI. They also suggest that these children may have more difficulty producing some acoustic correlates of intonation than others. The implications of these findings are discussed.

    Effects of Phrase-Reading Ability, Syntactic Awareness, and Reading Rate on Reading Comprehension of Adolescent Readers in an Alternative Setting

    Many adolescent readers do not acquire adequate reading skills, and over the past 40 years reading scores for adolescent students have not improved (Edmonds, Vaughn, Wexler, Reutebuch, & Cable, 2009; Lee, Grigg, & Donahue, 2007). The purposes of this study were (a) to explore the relationships among phrase-reading ability, passage reading rate, syntactic awareness and reading comprehension of students attending an alternative school, and (b) to investigate whether phrase-reading ability serves as a mediator (i.e., the mechanism that accounts for the relationship between the predictor and the criterion) between reading rate and comprehension, and between syntactic awareness and reading comprehension. Theories of automaticity (LaBerge & Samuels, 1974; Perfetti, 1985) and the structural precedence hypothesis (Koriat, Greenberg, & Kreiner, 2002) provide the theoretical basis for this investigation. To investigate the relations among reading rate, syntactic awareness, phrase-reading ability, and comprehension, a series of assessments was conducted with 70 students attending an alternative school. The resulting data were analyzed using correlation analysis, hierarchical regression (Pedhazur, 1997), and mediation regression (Baron & Kenny, 1984). The hypotheses for adolescent readers in an alternative setting were: (a) Phrase-reading ability, syntactic awareness, passage reading rate, and reading comprehension will have a positive, significant correlation; (b) Language-related variables (i.e., phrasing ability, syntactic awareness) will account for more of the variance in reading comprehension than passage reading rate; (c) Phrase-reading ability, as measured by phrase-level prosody, provides a mechanism for, or at least partially mediates, how passage reading rate affects reading comprehension; (d) Phrase-reading ability, as measured by phrase-level prosody, provides a mechanism for, or at least partially mediates, how syntactic awareness affects reading comprehension. 
Findings confirmed all hypotheses. Based on these findings, researchers should further investigate the contributions that language-related skills such as phrase-reading ability and syntactic awareness make to reading comprehension for adolescent readers. They should also investigate whether these findings hold when data are disaggregated for students with disabilities and struggling adolescent readers. This investigation also brought attention to the need for standardized terminology concerning reading fluency.
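    The mediation analysis named above (Baron & Kenny, 1984) rests on a small set of regressions: Y on X (total effect c), M on X (path a), and Y on both X and M (direct effect c' and path b). The sketch below computes these path coefficients with ordinary least squares in pure Python. It assumes a single mediator, and it omits the significance tests (e.g. Sobel test or bootstrapping) that a real analysis would need. The function name `baron_kenny` and the scores in the usage line are hypothetical and are not the study's data.

```python
from statistics import fmean


def _slope(x, y):
    """OLS slope of y regressed on x (intercept included)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def baron_kenny(x, m, y):
    """Path coefficients for a single-mediator model X -> M -> Y.

    c       : total effect (Y on X)
    a       : X -> M (M on X)
    b       : M -> Y, controlling for X
    c_prime : direct effect of X on Y, controlling for M
    """
    c = _slope(x, y)
    a = _slope(x, m)
    # Two-predictor OLS (Y on X and M) via the centred normal equations.
    mx, mm, my = fmean(x), fmean(m), fmean(y)
    dx = [v - mx for v in x]
    dm = [v - mm for v in m]
    dy = [v - my for v in y]
    sxx = sum(v * v for v in dx)
    smm = sum(v * v for v in dm)
    sxm = sum(p * q for p, q in zip(dx, dm))
    sxy = sum(p * q for p, q in zip(dx, dy))
    smy = sum(p * q for p, q in zip(dm, dy))
    det = sxx * smm - sxm ** 2
    if det == 0:
        raise ValueError("X and M are perfectly collinear")
    c_prime = (sxy * smm - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    return {"c": c, "a": a, "b": b, "c_prime": c_prime, "indirect": a * b}


# Hypothetical scores: X = syntactic awareness, M = phrase reading, Y = comprehension.
r = baron_kenny([1, 2, 3, 4, 5], [2, 5, 5, 9, 9], [3, 6, 7, 11, 12])
print(round(r["a"], 3), round(r["indirect"], 3), round(r["c_prime"], 3))
```

    With OLS, the total effect decomposes exactly as c = c' + a*b. Partial mediation in the sense of hypotheses (c) and (d) corresponds to a non-zero indirect effect a*b alongside a reduced but non-zero c'.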

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and of rhythmic differences across languages. Various metrics have been proposed for measuring rhythm at the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006), but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages, and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which relatively little work has been carried out to date on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using the rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages, such as French and Spanish, on all metrics. However, these overall findings masked a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. 
Further analysis was carried out in light of two arguments. Fletcher (in press) argues that duration-based measurements do not wholly reflect speech rhythm, since many other factors can influence the values of consonantal and vocalic intervals. Arvaniti (2009) suggests that other features of speech should also be considered in describing rhythm, in order to discover what contributes to listeners' perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features that seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of the current debate on the description of rhythm.
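    The four metrics named above are simple functions of interval durations. The sketch below follows their commonly cited definitions: %V and ∆C from Ramus et al. (1999), and rPVI and nPVI from Grabe & Low (2002). In practice, rPVI is usually applied to consonantal intervals and nPVI to vocalic intervals. Segmentation into vocalic and consonantal intervals is assumed to have been done already. Using the population standard deviation for ∆C is an assumption, since some studies use the sample SD. The durations in the usage line are made up.

```python
from statistics import pstdev


def _pairs(intervals):
    """Successive (d_k, d_k+1) pairs; PVIs need at least two intervals."""
    if len(intervals) < 2:
        raise ValueError("need at least two intervals")
    return list(zip(intervals, intervals[1:]))


def percent_v(vocalic, consonantal):
    """%V: percentage of total duration that is vocalic."""
    total_v = sum(vocalic)
    return 100.0 * total_v / (total_v + sum(consonantal))


def delta(intervals):
    """∆C (or ∆V): standard deviation of interval durations (population SD assumed)."""
    return pstdev(intervals)


def rpvi(intervals):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = _pairs(intervals)
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def npvi(intervals):
    """Normalised PVI: successive differences divided by the pair mean, times 100."""
    pairs = _pairs(intervals)
    return 100.0 * sum(abs(a - b) / ((a + b) / 2) for a, b in pairs) / len(pairs)


# Hypothetical interval durations in seconds.
vowels = [0.10, 0.20]
consonants = [0.05, 0.10, 0.05]
print(percent_v(vowels, consonants), delta(consonants), rpvi(consonants), npvi(vowels))
```

    Because nPVI divides each difference by the local mean, it is insensitive to overall speech rate. This is why it is preferred for vocalic intervals, while rPVI keeps raw differences for consonants.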

    Analysis of atypical prosodic patterns in the speech of people with Down syndrome

    The speech of people with Down syndrome (DS) shows prosodic features that are distinct from those observed in the oral productions of typically developing (TD) speakers. Although a different prosodic realization does not necessarily imply incorrect expression of prosodic functions, atypical expression may hinder communication skills. The focus of this work is to ascertain whether this is the case in individuals with DS. To do so, we analyze the acoustic features that best characterize the utterances of speakers with DS when expressing prosodic functions related to emotion, turn-end and phrasal chunking, comparing them with those used by TD speakers. An oral corpus of speech utterances was recorded using the PEPS-C prosodic competence evaluation tool. We use automatic classifiers to show that the prosodic features that best predict prosodic functions in TD speakers are less informative in speakers with DS. Although atypical features are observed in speakers with DS when producing prosodic functions, the intended prosodic function can be identified by listeners and, in most cases, the features correctly discriminate the function with analytical methods. However, a greater difference between the minimal pairs presented in the PEPS-C test is found for TD speakers than for DS speakers. The proposed methodological approach provides, on the one hand, an identification of the set of features that distinguish the prosodic productions of DS and TD speakers and, on the other, a set of target features for therapy with speakers with DS. Funding: Ministerio de Economía, Industria y Competitividad - Fondo Europeo de Desarrollo Regional (grant TIN2017-88858-C2-1-R); Junta de Castilla y León (grant VA050G18).

    Abnormal processing of emotional prosody in Williams syndrome: an event-related potentials study

    Williams syndrome (WS), a neurodevelopmental genetic disorder caused by a microdeletion on chromosome 7, is described as displaying an intriguing socio-cognitive phenotype. Deficits in prosody production and comprehension have been consistently reported in behavioral studies. However, the neurobiological processes underlying prosody processing in WS remain to be clarified. This study aimed to characterize the electrophysiological response to neutral, happy, and angry prosody in WS, and to examine whether this response depended on the semantic content of the utterance. A group of 12 participants (5 female and 7 male) diagnosed with WS, aged between 9 and 31 years, was compared with a group of typically developing participants individually matched for chronological age, gender and laterality. After inspection of EEG artifacts, data from 9 participants with WS and 10 controls were included in ERP analyses. Participants were presented with neutral, positive and negative sentences in two conditions: (1) with intelligible semantic and syntactic information; (2) with unintelligible semantic and syntactic information (‘pure prosody’ condition). They were asked to decide which emotion underlay the auditory sentence. Atypical event-related potential (ERP) components related to prosodic processing (N100, P200, N300) were observed in WS. In particular, a reduced N100 was observed for prosody sentences with semantic content. A more positive P200 was found for sentences with semantic content, especially for happy and angry intonations. A reduced N300 appeared in both sentence conditions. These findings suggest abnormalities in early auditory processing, indicating a bottom-up contribution to the impairment in emotional prosody processing and comprehension. Also, at least for N100 and P200, they suggest top-down contributions of semantic processes to the sensory processing of speech. 
This study showed, for the first time, that abnormalities in ERP measures of early auditory processing in WS are also present during the processing of emotional vocal information. This may represent a physiological signature of underlying impaired on-line language and socio-emotional processing. This work was supported by a Doctoral Grant (SFRH/BD/35882/2007) awarded to APP, as well as by the grant PIC/IC/83290/2007, Fundação para a Ciência e a Tecnologia (FCT).

    Reading prosody in Spanish dyslexics

    Reading becomes expressive when word and text reading are quick, accurate and automatic. Recent studies have reported that skilled readers use greater pitch changes and fewer irrelevant pauses than poor readers. Developmental dyslexics have difficulty acquiring and automating the alphabetic code and developing orthographic representations of words. Given this, it is possible that their use of prosody when reading differs from that of typical readers. The goal of this study was to investigate whether the reading prosody of Spanish-speaking dyslexics differs from that of typical Spanish readers. Two experiments were performed. The first experiment involved 36 children (18 with dyslexia), and the second involved 46 adults (23 with dyslexia). Participants were asked to read aloud a text that included declarative, exclamatory and interrogative sentences. Data on pausing and reading rate (number of pauses, duration of pauses and utterances), pitch changes, intensity changes and syllable lengthening were extracted from the recordings. We found that dyslexic people read more slowly than typical readers and made more, and longer, inappropriate pauses, even as adults with considerable reading experience. We also observed that dyslexics differed from skilled readers in their use of some prosodic features, particularly pitch changes at the end of sentences. This is probably because they have trouble anticipating some structural features of prose, such as sentence ends. Funding: research project PSI2012-31913 of the Ministerio de Economía y Competitividad; Programa Severo Ochoa (FICYT): BP14-03.
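    The abstract does not say how pauses were extracted. One common approach, assumed here, is to threshold a frame-level intensity contour (for example, one exported from a tool such as Praat) and keep low-energy stretches that last at least a minimum duration. Pause counts and durations then follow directly. The function `find_pauses` and its threshold and minimum-duration defaults are hypothetical and are not the study's settings.

```python
def find_pauses(frame_db, frame_step, threshold_db=-40.0, min_pause=0.15):
    """Return (start, end) times in seconds of low-energy stretches.

    frame_db   : per-frame intensity in dB (e.g. relative to the file's peak)
    frame_step : time between frames in seconds
    A run of frames below `threshold_db` counts as a pause only if it spans
    at least `min_pause` seconds. Both defaults are illustrative assumptions.
    """
    min_frames = max(1, round(min_pause / frame_step))
    pauses = []
    start = None
    # An infinite sentinel frame closes a run that reaches the end of the signal.
    for i, level in enumerate(list(frame_db) + [float("inf")]):
        if level < threshold_db:
            if start is None:
                start = i  # a low-energy run begins
        elif start is not None:
            if i - start >= min_frames:
                pauses.append((start * frame_step, i * frame_step))
            start = None
    return pauses


# Hypothetical 50 ms frames: one 150 ms pause and one 50 ms dip that is too short.
pauses = find_pauses([-10, -10, -50, -50, -50, -10, -50, -10], 0.05)
print(len(pauses), sum(end - start for start, end in pauses))
```

    A minimum duration filters out brief closures such as stop consonants, which would otherwise be counted as pauses. Deciding whether a pause is *inappropriate* still requires aligning it with the text's syntactic boundaries.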

    Automatic Detection and Assessment of Dysarthric Speech Using Prosodic Information

    Master's thesis, Seoul National University Graduate School, College of Humanities, Department of Linguistics, August 2020. Advisor: Minhwa Chung. Speech impairments are among the earliest cues for neurological or degenerative disorders. Individuals with Parkinson's disease, cerebral palsy, amyotrophic lateral sclerosis and multiple sclerosis, among others, are often diagnosed with dysarthria. Dysarthria is a group of speech disorders mainly affecting the articulatory muscles, which eventually leads to severe misarticulation. However, impairments in the suprasegmental domain are also present, and previous studies have shown that the prosodic patterns of speakers with dysarthria differ from the prosody of healthy speakers. In a clinical setting, a prosody-based analysis of dysarthric speech can be helpful for diagnosing the presence of dysarthria. 
Therefore, there is a need to determine not only how the prosody of speech is affected by dysarthria, but also which aspects of prosody are most affected and how prosodic impairments change with the severity of dysarthria. In the current study, several prosodic features related to pitch, voice quality, rhythm and speech rate are used as features for detecting dysarthria in a given speech signal. A variety of feature selection methods are used to determine which set of features is optimal for accurate detection. After selecting an optimal set of prosodic features, we use them as input to machine learning-based classifiers and assess performance using accuracy, precision, recall and F1-score. Furthermore, we examine the usefulness of prosodic measures for assessing different levels of severity (mild, moderate, severe). Collecting impaired speech data can be difficult. We therefore also implement cross-language classifiers, where both Korean and English data are used for training but only one language is used for testing. Results suggest that, compared with using Mel-frequency cepstral coefficients (MFCCs) alone, including prosodic measurements can improve classifier accuracy for both the Korean and English datasets. In particular, large improvements were seen when assessing different severity levels. For English, relative accuracy improvements of 1.82% for detection and 20.6% for assessment were seen. The Korean dataset saw no improvement for detection but a relative improvement of 13.6% for assessment. The cross-language experiments showed a relative improvement of up to 4.12% compared with using a single language during training. Certain prosodic impairments, such as those in pitch and duration, may be language-independent. Therefore, when training sets for individual languages are limited, they may be supplemented with data from other languages.
Contents: 1. Introduction (1.1 Dysarthria; 1.2 Impaired Speech Detection; 1.3 Research Goals & Outline). 2. Background Research (2.1 Prosodic Impairments: 2.1.1 English, 2.1.2 Korean; 2.2 Machine Learning Approaches). 3. Database (3.1 English-TORGO; 3.2 Korean-QoLT). 4. Methods (4.1 Prosodic Features: 4.1.1 Pitch, 4.1.2 Voice Quality, 4.1.3 Speech Rate, 4.1.4 Rhythm; 4.2 Feature Selection; 4.3 Classification Models: 4.3.1 Random Forest, 4.3.2 Support Vector Machine, 4.3.3 Feed-Forward Neural Network; 4.4 Mel-Frequency Cepstral Coefficients). 5. Experiment (5.1 Model Parameters; 5.2 Training Procedure: 5.2.1 Dysarthria Detection, 5.2.2 Severity Assessment, 5.2.3 Cross-Language). 6. Results (6.1 TORGO: 6.1.1 Dysarthria Detection, 6.1.2 Severity Assessment; 6.2 QoLT: 6.2.1 Dysarthria Detection, 6.2.2 Severity Assessment; 6.3 Cross-Language). 7. Discussion (7.1 Linguistic Implications; 7.2 Clinical Applications). 8. Conclusion. References. Appendix. Abstract in Korean.
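    The evaluation metrics and the "relative improvement" figures above are easy to misread. A 20.6% relative improvement is not 20.6 percentage points. The sketch below computes accuracy, precision, recall and F1 for binary detection (dysarthric vs. healthy) and the relative change between two accuracies. It covers the binary case only. The abstract does not say how the multi-class severity metrics were averaged. The helper names and the accuracies in the usage line are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def relative_improvement(baseline, new):
    """Relative change in percent, e.g. MFCC+prosody accuracy vs. MFCC-only accuracy."""
    return 100.0 * (new - baseline) / baseline


# 1 = dysarthric, 0 = healthy (hypothetical labels and predictions).
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]))
# Hypothetical accuracies: going from 0.50 to 0.60 is a 20% relative gain (10 points).
print(relative_improvement(0.50, 0.60))
```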

    Use of Prosody and Information Structure in High-Functioning Adults with Autism in Relation to Language Ability

    Abnormal prosody is a striking feature of the speech of people with autism spectrum disorder (ASD), but previous reports suggest large variability among individuals with ASD. Here we show that part of this heterogeneity can be explained by level of language functioning. We recorded semi-spontaneous but controlled conversations in adults with and without ASD. We measured features related to pitch and duration to determine three things: (1) general use of prosodic features; (2) prosodic use in relation to marking information structure, specifically, the emphasis of new information in a sentence (focus) as opposed to information already given in the conversational context (topic); and (3) the relation between prosodic use and level of language functioning. Compared with typical adults, adults with ASD and high language functioning generally used a larger pitch range but did not mark information structure. In contrast, those with moderate language functioning generally used a smaller pitch range but marked information structure appropriately to a large extent. Both impaired general prosodic use and impaired marking of information structure would be expected to seriously impact social communication. This would in turn increase difficulty in personal domains, such as making and keeping friendships, and in professional domains, such as competing for employment opportunities.
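    The abstract compares pitch range across groups without saying how range was computed. A common choice, assumed here, is the distance in semitones between a low and a high percentile of voiced F0. Percentiles are less sensitive to octave errors in pitch tracking than the raw minimum and maximum. The functions `nearest_rank` and `pitch_range_st` and the 10th/90th percentile defaults are hypothetical, not the study's procedure. Because semitones are a log-ratio scale, they make ranges comparable across speakers with different baseline pitch.

```python
import math


def nearest_rank(values, pct):
    """Nearest-rank percentile of `values` for pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def pitch_range_st(f0_hz, low_pct=10, high_pct=90):
    """Pitch range in semitones between two percentiles of voiced F0.

    Frames with F0 <= 0 are treated as unvoiced and dropped (an assumption
    about how the pitch track marks unvoiced frames).
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    lo = nearest_rank(voiced, low_pct)
    hi = nearest_rank(voiced, high_pct)
    return 12 * math.log2(hi / lo)  # 12 semitones per doubling of frequency


# Hypothetical F0 track in Hz, with 0 marking unvoiced frames.
track = [0, 110, 120, 135, 150, 0, 160, 180, 200, 220, 140, 125]
print(round(pitch_range_st(track), 2))
```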