12 research outputs found
Speech evaluation tasks : normative data in 19-24 year old women
Includes bibliographical references. The purpose of this study was to augment dated normative studies of typical voice evaluation tasks. One hundred women aged 19-24 years completed several speech tasks. The following were calculated using the Speech Filing System software: s/z ratio, maximum phonation time (MPT), sustained pitch (Fo), and diadochokinetic (DDK) rates. Results indicated that the one hundred female participants were comparable to previously published studies for DDK and Fo; however, they exhibited shorter phoneme durations. B.S. (Bachelor of Science
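The measures named in this abstract reduce to simple arithmetic once the durations have been measured. The sketch below is illustrative only: it assumes the common clinical convention of taking the longest trial for each sustained sound, and the function names and sample values are hypothetical, not taken from the study or from the Speech Filing System software.

```python
def s_z_ratio(s_durations, z_durations):
    """Longest sustained /s/ divided by longest sustained /z/ (seconds).

    Assumption: the longest trial of each sound is used.
    """
    return max(s_durations) / max(z_durations)


def maximum_phonation_time(vowel_durations):
    """Longest sustained vowel across trials, in seconds."""
    return max(vowel_durations)


def ddk_rate(syllable_count, elapsed_seconds):
    """Diadochokinetic rate in syllables per second."""
    return syllable_count / elapsed_seconds


if __name__ == "__main__":
    # Made-up durations, for illustration only.
    print(s_z_ratio([18.0, 20.0], [16.0, 15.0]))
    print(maximum_phonation_time([12.5, 14.0, 13.2]))
    print(ddk_rate(30, 5.0))
```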
Using Web Audio To Deliver Interactive Speech Tools In The Browser
In 2014, the number of web pages delivered to tablets and smartphones overtook the number delivered to laptop and desktop computers, with a majority of users saying they prefer these new portable platforms over conventional computers for many tasks. This shift in device use provides both opportunities and challenges for providers of speech analysis tools, phonetic demonstrations and language teaching aids. It is an opportunity because web standards mean we can make our applications available to a wide audience through a single consistent programming architecture rather than writing for one particular computing platform. It is a challenge because tablets and smartphones are less powerful, require different programming skills and have different limitations in terms of user interface. In this article, I will show how interactive applications in Phonetics and Speech Science can be written to run in web browsers on any computing platform. These are native web applications, written in HTML, CSS and JavaScript that can capture, replay, display, process, and analyze audio using the Web Audio API without needing any plugins. I will describe - and give the URLs of - some demonstration applications. I will discuss some future opportunities in the area of collaborative research and some remaining challenges that arise from incompatibilities across browsers. My audience is teachers and students with intermediate web programming skills wanting to build custom speech displays, perform custom speech analysis or run speech audio experiments over the web
A quantitative assessment of group delay methods for identifying glottal closures in voiced speech
Published version
Variações entoacionais internas às unidades de suporte prosódico em relação ao Tom Médio em narrativas (Tonal direction variation within prosodic pitch support units in relation to mean pitch in narratives)
This article examined whether long and rising basic intonation units are components of intonation. We selected 52 narratives in Portuguese and extracted the longest units, which averaged 34% of the total units. Rising intonation averaged 0.6%. The goodness-of-fit test gave χ2(3.841) < 14.9 with P < 0.001, showing that frequency variation depends on duration. The results showed that point variation in frequency, given by the mean frequency of the units, is conditioned in Brazilian Portuguese (BP) by the predictability of the time series, and that the internal variation of frequencies acts at the level of the message.
Phonological and articulation treatment approaches in Portuguese children with speech and language impairments: a randomized controlled intervention study
Background
In Portugal, the routine clinical practice of speech and language therapists (SLTs) in treating children with all types of speech sound disorder (SSD) continues to be articulation therapy (AT). There is limited use of phonological therapy (PT) or phonological awareness training in Portugal. Additionally, at an international level there is a focus on collecting information on and differentiating between the effectiveness of PT and AT for children with different types of phonologically based SSD, as well as on the role of phonological awareness in remediating SSD. It is important to collect more evidence for the most effective and efficient type of intervention approach for different SSDs and for these data to be collected from diverse linguistic and cultural perspectives.
Aims
To evaluate the effectiveness of a PT and AT approach for treatment of 14 Portuguese children, aged 4.0–6.7 years, with a phonologically based SSD.
Methods & Procedures
The children were randomly assigned to one of the two treatment approaches (seven children in each group). All children were treated by the same SLT, blind to the aims of the study, over three blocks of a total of 25 weekly sessions of intervention. Outcome measures of phonological ability (percentage of consonants correct (PCC), percentage occurrence of different phonological processes and phonetic inventory) were taken before and after intervention. A qualitative assessment of intervention effectiveness from the perspective of the parents of participants was included.
Outcomes & Results
Both treatments were effective in improving the participants’ speech, with the children receiving PT showing a more significant improvement in PCC score than those receiving AT. Children in the PT group also showed greater generalization to untreated words than those receiving AT. Parents reported both intervention approaches to be equally effective in improving their children's speech.
Conclusions & Implications
The PT (combination of expressive phonological tasks, phonological awareness, listening and discrimination activities) proved to be an effective integrated method of improving phonological SSD in children. These findings provide some evidence for Portuguese SLTs to employ PT with children with phonologically based SSD.
A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech
Abstract: Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the preemphasized speech instead of to the LPC residual, we find that the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and we present new recursive algorithms that give a substantial reduction in computation in all cases.
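The abstract does not define the measures, so the following is only a rough sketch of the general idea, not the paper's algorithms. It relies on a standard identity: the energy-weighted average group delay of a windowed signal equals the time centroid of its energy. Sliding a fixed-length window along an impulse-like excitation, the centroid offset falls through zero where the window is centred on an impulse. The synthetic impulse train stands in for a real LPC residual, and the function names and window length are assumptions made for illustration.

```python
def energy_weighted_delay(x, half_window):
    """Energy centroid of each window, relative to the window centre.

    Equivalent to the energy-weighted average group delay of the
    windowed signal. Windows with zero energy return 0.0.
    """
    delays = []
    for n in range(len(x)):
        lo = max(0, n - half_window)
        hi = min(len(x), n + half_window + 1)
        energy = sum(x[k] * x[k] for k in range(lo, hi))
        if energy == 0.0:
            delays.append(0.0)
            continue
        moment = sum((k - n) * x[k] * x[k] for k in range(lo, hi))
        delays.append(moment / energy)
    return delays


def negative_going_zero_crossings(d):
    """Indices where d moves from strictly positive to zero or below."""
    return [n for n in range(1, len(d)) if d[n - 1] > 0.0 >= d[n]]


if __name__ == "__main__":
    # Synthetic stand-in for an LPC residual: two isolated impulses.
    residual = [0.0] * 200
    residual[50] = 1.0
    residual[130] = 1.0
    d = energy_weighted_delay(residual, half_window=20)
    print(negative_going_zero_crossings(d))
```

On real speech the residual is noisy and the crossings would need further checking, which is where the window-length and accuracy trade-offs discussed in the abstract come in.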
Updating the study protocol: Insight 46 - a longitudinal neuroscience sub-study of the MRC National Survey of Health and Development - phases 2 and 3
BACKGROUND: Although age is the biggest known risk factor for dementia, there remains uncertainty about other factors over the life course that contribute to a person's risk for cognitive decline later in life. Furthermore, the pathological processes leading to dementia are not fully understood. The main goals of Insight 46, a multi-phase longitudinal observational study, are to collect detailed cognitive, neurological, physical, cardiovascular, and sensory data; to combine those data with genetic and life-course information collected from the MRC National Survey of Health and Development (NSHD; 1946 British birth cohort); and thereby contribute to a better understanding of healthy ageing and dementia. METHODS/DESIGN: Phase 1 of Insight 46 (2015-2018) involved the recruitment of 502 members of the NSHD (median age = 70.7 years; 49% female) and has been described in detail by Lane and Parker et al. (2017). The present paper describes phase 2 (2018-2021) and phase 3 (2021-ongoing). Of the 502 phase 1 study members who were invited to a phase 2 research visit, 413 were willing to return for a clinic visit in London and 29 participated in a remote research assessment due to COVID-19 restrictions. Phase 3 aims to recruit 250 study members who previously participated in both phases 1 and 2 of Insight 46 (providing a third data time point) and 500 additional members of the NSHD who have not previously participated in Insight 46. DISCUSSION: The NSHD is the oldest and longest continuously running British birth cohort. Members of the NSHD are now at a critical point in their lives for us to investigate successful ageing and key age-related brain morbidities. Data collected from Insight 46 have the potential to greatly contribute to and impact the field of healthy ageing and dementia by combining unique life course data with longitudinal multiparametric clinical, imaging, and biomarker measurements. Further protocol enhancements are planned, including in-home sleep measurements and the engagement of participants through remote online cognitive testing. Data collected are and will continue to be made available to the scientific community.
Continuous Emotion Prediction from Speech: Modelling Ambiguity in Emotion
There is growing interest in emotion research to model perceived emotion labelled as intensities along affect dimensions such as arousal and valence. These labels are typically obtained from multiple annotators, each with an individual perception of emotional speech. Consequently, emotion prediction models that incorporate variation in individual perceptions as ambiguity in the emotional state would be more realistic. This thesis develops the modelling framework necessary to achieve continuous prediction of ambiguous emotional states from speech. Besides emotion labels, feature space distribution and encoding are an integral part of the prediction system. The first part of this thesis examines the limitations of current low-level feature distributions and their minimalistic statistical descriptions. Specifically, front-end paralinguistic acoustic features are reflective of speech production mechanisms. Discriminatively learnt features, however, have frequently outperformed acoustic features in emotion prediction tasks while providing no insight into their physical significance. One of the contributions of this thesis is the development of a framework that can modify the acoustic feature representation based on emotion label information. Another investigation in this thesis indicated that emotion perception is language-dependent and, in turn, helped develop a framework for cross-language emotion prediction. Furthermore, this investigation supported the hypothesis that emotion perception is highly individualistic and is better modelled as a distribution rather than a point estimate, so as to encode information about the ambiguity in the perceived emotion. Following this observation, the thesis proposes measures to quantify the appropriateness of distribution types in modelling ambiguity in dimensional emotion labels, which are then employed to compare well-known bounded parametric distributions.
These analyses led to the conclusion that the beta distribution was the most appropriate parametric model of ambiguity in emotion labels. Finally, the thesis focuses on developing a deep learning framework for continuous emotion prediction as a temporal series of beta distributions, examining various parameterizations of the beta distributions as well as loss functions. Furthermore, the distribution over the parameter space is examined and priors from kernel density estimation are employed to shape the posteriors over the parameter space, which significantly improved valence ambiguity predictions. The proposed frameworks and methods have been extensively evaluated on multiple state-of-the-art databases and the results demonstrate both the viability of predicting ambiguous emotion states and the validity of the proposed systems.
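To make the idea of representing annotator disagreement as a beta distribution concrete, here is a minimal sketch that fits a beta distribution to one frame's ratings by the method of moments. This is a swapped-in, textbook estimator chosen for brevity, not the deep learning framework the thesis describes. The rating range of [-1, 1], the function names, and the sample ratings are all assumptions made for illustration.

```python
from statistics import fmean, pvariance


def rescale_to_unit(ratings, low=-1.0, high=1.0):
    """Map ratings from [low, high] onto [0, 1], the support of a beta."""
    return [(r - low) / (high - low) for r in ratings]


def fit_beta_moments(samples):
    """Method-of-moments estimates (alpha, beta) for samples in (0, 1).

    Uses the population variance; requires 0 < var < mean * (1 - mean).
    """
    m = fmean(samples)
    v = pvariance(samples, mu=m)
    if not 0.0 < v < m * (1.0 - m):
        raise ValueError("variance outside the range a beta can match")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


if __name__ == "__main__":
    # Hypothetical arousal ratings from four annotators for one frame.
    frame_ratings = [-0.6, -0.2, 0.2, 0.6]
    alpha, beta = fit_beta_moments(rescale_to_unit(frame_ratings))
    print(alpha, beta)
```

A wider spread of ratings gives smaller alpha and beta (a flatter distribution), while close agreement gives larger values, which is one way ambiguity can be encoded per frame.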