12 research outputs found

    Estimation of glottal closure instants in voiced speech using the DYPSA algorithm

    Published version

    Speech evaluation tasks : normative data in 19-24 year old women

    Includes bibliographical references. The purpose of this study was to augment dated normative studies of typical voice evaluation tasks. One hundred women aged 19-24 years completed several speech tasks. The following were calculated using the Speech Filing System software: s/z ratio, maximum phonation time (MPT), sustained pitch (Fo), and diadochokinetic rates (DDK). Results indicated that the one hundred female participants were comparable to those in previously published studies for DDK and Fo; however, they exhibited shorter phoneme durations. B.S. (Bachelor of Science)

    Using Web Audio To Deliver Interactive Speech Tools In The Browser

    In 2014, the number of web pages delivered to tablets and smartphones overtook the number delivered to laptop and desktop computers, with a majority of users saying they prefer these new portable platforms over conventional computers for many tasks. This shift in device use provides both opportunities and challenges for providers of speech analysis tools, phonetic demonstrations and language teaching aids. It is an opportunity because web standards mean we can make our applications available to a wide audience through a single consistent programming architecture rather than writing for one particular computing platform. It is a challenge because tablets and smartphones are less powerful, require different programming skills and have different limitations in terms of user interface. In this article, I will show how interactive applications in Phonetics and Speech Science can be written to run in web browsers on any computing platform. These are native web applications, written in HTML, CSS and JavaScript, that can capture, replay, display, process, and analyze audio using the Web Audio API without needing any plugins. I will describe some demonstration applications and give their URLs. I will discuss some future opportunities in the area of collaborative research and some remaining challenges that arise from incompatibilities across browsers. My audience is teachers and students with intermediate web programming skills who want to build custom speech displays, perform custom speech analysis or run speech audio experiments over the web.

    A quantitative assessment of group delay methods for identifying glottal closures in voiced speech

    Published version

    Variações entoacionais internas às unidades de suporte prosódico em relação ao Tom Médio em narrativas (Tonal direction variation within prosodic pitch support units in relation to mean pitch in narratives)

    In this article, we tested whether long and ascending basic intonation units are components of intonation. We selected 52 narratives in Portuguese and extracted the longest units, which made up an average of 34% of all units. Ascending intonation averaged 0.6%. The goodness-of-fit test gave χ2 (3.841) < 14.9 with P < 0.001, showing that the frequency variation depends on duration. The results showed that point variation in frequency, given by the mean frequency of the units, is conditioned in Brazilian Portuguese (PB) by the predictability of the time series, and that the internal variation of the frequencies acts at the level of the message.
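The reported statistic can be sanity-checked with a few lines of arithmetic. The sketch below assumes that 3.841 is the 5% critical value of a chi-square distribution with one degree of freedom and that 14.9 is the observed statistic; the abstract does not state the degrees of freedom, so this reading is an assumption, and the function name `chi2_sf_df1` is purely illustrative.

```python
import math


def chi2_sf_df1(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # With df = 1 the variable is the square of a standard normal Z, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


critical_value = 3.841  # quoted in the abstract (assumed: df = 1, alpha = 0.05)
statistic = 14.9        # quoted in the abstract (assumed: observed statistic)

print(f"P(X > {critical_value}) = {chi2_sf_df1(critical_value):.4f}")
print(f"P(X > {statistic}) = {chi2_sf_df1(statistic):.6f}")
```

Under that assumption, the first probability comes out close to 0.05 and the second falls below 0.001, which is consistent with the reported P < 0.001.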

    A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech

    Measures based on the group delay of the LPC residual have been used by a number of authors to identify the time instants of glottal closure in voiced speech. In this paper, we discuss the theoretical properties of three such measures and we also present a new measure having useful properties. We give a quantitative assessment of each measure's ability to detect glottal closure instants evaluated using a speech database that includes a direct measurement of glottal activity from a Laryngograph/EGG signal. We find that when using a fixed-length analysis window, the best measures can detect the instant of glottal closure in 97% of larynx cycles with a standard deviation of 0.6 ms and that in 9% of these cycles an additional excitation instant is found that normally corresponds to glottal opening. We show that some improvement in detection rate may be obtained if the analysis window length is adapted to the speech pitch. If the measures are applied to the preemphasized speech instead of to the LPC residual, we find that the timing accuracy worsens but the detection rate improves slightly. We assess the computational cost of evaluating the measures and we present new recursive algorithms that give a substantial reduction in computation in all cases.
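To make the idea of a group-delay measure concrete, here is a minimal sketch, not the paper's implementation. It relies on the fact that the energy-weighted average group delay of a windowed signal equals the centre of energy of that window, so the measure can be computed in the time domain. The abstract does not name its three measures, so treating this as one of them is an assumption; the function names, the fixed odd window length and the synthetic impulse train standing in for an LPC residual are all hypothetical.

```python
def energy_weighted_delay(residual, win_len):
    """Centre of energy of each sliding window, relative to the window midpoint."""
    half = (win_len - 1) / 2.0
    delays = []
    for start in range(len(residual) - win_len + 1):
        frame = residual[start:start + win_len]
        energy = sum(v * v for v in frame)
        if energy == 0.0:
            delays.append(0.0)  # illustrative choice: no excitation in this window
            continue
        centroid = sum(i * v * v for i, v in enumerate(frame)) / energy
        delays.append(centroid - half)
    return delays


def negative_zero_crossings(delays, win_len):
    """Sample indices where the delay falls from positive to zero or below."""
    half = (win_len - 1) // 2  # assumes an odd window length
    hits = []
    for k in range(1, len(delays)):
        if delays[k - 1] > 0.0 >= delays[k]:
            hits.append(k + half)  # window start index -> window centre index
    return hits


# Synthetic stand-in for an LPC residual: two isolated excitation impulses.
residual = [0.0] * 200
residual[50] = 1.0
residual[130] = 1.0

delays = energy_weighted_delay(residual, win_len=21)
print(negative_zero_crossings(delays, win_len=21))
```

As a window slides past an isolated impulse, the measure falls linearly through zero at the moment the window is centred on the impulse, so negative-going zero crossings are the candidate excitation instants. Real residuals are noisy, which is why the choice of window length matters in the assessment described above.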

    Updating the study protocol: Insight 46 - a longitudinal neuroscience sub-study of the MRC National Survey of Health and Development - phases 2 and 3

    BACKGROUND: Although age is the biggest known risk factor for dementia, there remains uncertainty about other factors over the life course that contribute to a person's risk for cognitive decline later in life. Furthermore, the pathological processes leading to dementia are not fully understood. The main goals of Insight 46, a multi-phase longitudinal observational study, are to collect detailed cognitive, neurological, physical, cardiovascular, and sensory data; to combine those data with genetic and life-course information collected from the MRC National Survey of Health and Development (NSHD; 1946 British birth cohort); and thereby contribute to a better understanding of healthy ageing and dementia. METHODS/DESIGN: Phase 1 of Insight 46 (2015-2018) involved the recruitment of 502 members of the NSHD (median age = 70.7 years; 49% female) and has been described in detail by Lane and Parker et al. (2017). The present paper describes phase 2 (2018-2021) and phase 3 (2021-ongoing). Of the 502 phase 1 study members who were invited to a phase 2 research visit, 413 were willing to return for a clinic visit in London and 29 participated in a remote research assessment due to COVID-19 restrictions. Phase 3 aims to recruit 250 study members who previously participated in both phases 1 and 2 of Insight 46 (providing a third data time point) and 500 additional members of the NSHD who have not previously participated in Insight 46. DISCUSSION: The NSHD is the oldest and longest continuously running British birth cohort. Members of the NSHD are now at a critical point in their lives for us to investigate successful ageing and key age-related brain morbidities. Data collected from Insight 46 have the potential to greatly contribute to and impact the field of healthy ageing and dementia by combining unique life course data with longitudinal multiparametric clinical, imaging, and biomarker measurements. Further protocol enhancements are planned, including in-home sleep measurements and the engagement of participants through remote online cognitive testing. Data collected are and will continue to be made available to the scientific community.

    Continuous Emotion Prediction from Speech: Modelling Ambiguity in Emotion

    There is growing interest in emotion research in modelling perceived emotion labelled as intensities along affect dimensions such as arousal and valence. These labels are typically obtained from multiple annotators, each with an individual perception of emotional speech. Consequently, emotion prediction models that incorporate variation in individual perceptions as ambiguity in the emotional state would be more realistic. This thesis develops the modelling framework necessary to achieve continuous prediction of ambiguous emotional states from speech. Besides emotion labels, feature space distribution and encoding are an integral part of the prediction system. The first part of this thesis examines the limitations of current low-level feature distributions and their minimalistic statistical descriptions. Specifically, front-end paralinguistic acoustic features are reflective of speech production mechanisms. However, discriminatively learnt features have frequently outperformed acoustic features in emotion prediction tasks, but provide no insights into the physical significance of these features. One of the contributions of this thesis is the development of a framework that can modify the acoustic feature representation based on emotion label information. Another investigation in this thesis indicates that emotion perception is language-dependent, which in turn helped develop a framework for cross-language emotion prediction. Furthermore, this investigation supported the hypothesis that emotion perception is highly individualistic and is better modelled as a distribution rather than a point estimate, so as to encode information about the ambiguity in the perceived emotion. Following this observation, the thesis proposes measures to quantify the appropriateness of distribution types in modelling ambiguity in dimensional emotion labels, which are then employed to compare well-known bounded parametric distributions. These analyses led to the conclusion that the beta distribution was the most appropriate parametric model of ambiguity in emotion labels. Finally, the thesis focuses on developing a deep learning framework for continuous emotion prediction as a temporal series of beta distributions, examining various parameterizations of the beta distributions as well as loss functions. Furthermore, the distribution over the parameter space is examined, and priors from kernel density estimation are employed to shape the posteriors over the parameter space, which significantly improved valence ambiguity predictions. The proposed frameworks and methods have been extensively evaluated on multiple state-of-the-art databases, and the results demonstrate both the viability of predicting ambiguous emotion states and the validity of the proposed systems.
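As a rough illustration of what it means to describe annotator disagreement with a beta distribution, the sketch below fits beta parameters to one frame's ratings by moment matching. This is not the thesis's deep learning framework. The [-1, 1] rating range, the rescaling step, the function names and the example ratings are all assumptions made for the illustration.

```python
from statistics import mean, pvariance


def rescale_to_unit(ratings, low=-1.0, high=1.0):
    """Map ratings from [low, high] onto [0, 1], the support of the beta distribution."""
    return [(r - low) / (high - low) for r in ratings]


def fit_beta_moments(values):
    """Method-of-moments estimate of (alpha, beta) from values in (0, 1)."""
    m = mean(values)
    v = pvariance(values)
    if not 0.0 < m < 1.0 or v <= 0.0 or v >= m * (1.0 - m):
        raise ValueError("need 0 < mean < 1 and 0 < variance < mean * (1 - mean)")
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Hypothetical arousal ratings from six annotators for a single frame.
frame_ratings = [-0.8, -0.4, -0.6, -0.2, -0.5, -0.3]
alpha, beta = fit_beta_moments(rescale_to_unit(frame_ratings))
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Tightly clustered ratings give large parameters (a narrow distribution, low ambiguity), while widely spread ratings give small ones. Repeating the fit frame by frame yields a time series of beta distributions of the kind a continuous prediction system could be trained against.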