Acoustic and videoendoscopic techniques to improve voice assessment via relative fundamental frequency

Author: Vojtech Jennifer Michele
Publication date: 29/09/2020

Quantitative measures of laryngeal muscle tension are needed to improve assessment and track clinical progress. Although relative fundamental frequency (RFF) shows promise as an acoustic estimate of laryngeal muscle tension, it is not yet transferable to the clinic. The purpose of this work was to refine algorithmic estimation of RFF, as well as to enhance the knowledge surrounding the physiological underpinnings of RFF. The first study used a large database of voice samples collected from 227 speakers with voice disorders and 256 typical speakers to evaluate the effects of fundamental frequency estimation techniques and voice sample characteristics on algorithmic RFF estimation. By refining fundamental frequency estimation using the Auditory Sawtooth Waveform Inspired Pitch Estimator—Prime (Auditory-SWIPE′) algorithm and accounting for sample characteristics via the acoustic measure, pitch strength, algorithmic errors related to the accuracy and precision of RFF were reduced by 88.4% and 17.3%, respectively. The second study sought to characterize the physiological factors influencing acoustic outputs of RFF estimation. A group of 53 speakers with voice disorders and 69 typical speakers each produced the utterance, /ifi/, while simultaneous recordings were collected using a microphone and flexible nasendoscope. Acoustic features calculated via the microphone signal were examined in reference to the physiological initiation and termination of vocal fold vibration. The features that corresponded with these transitions were then implemented into the RFF algorithm, leading to significant improvements in the precision of the RFF algorithm to reflect the underlying physiological mechanisms for voicing offsets (p < .001, V = .60) and onsets (p < .001, V = .54) when compared to manual RFF estimation. The third study further elucidated the physiological underpinnings of RFF by examining the contribution of vocal fold abduction to RFF during intervocalic voicing offsets. 
Vocal fold abductory patterns were compared to RFF values in a subset of speakers from the second study, comprising young adults, older adults, and older adults with Parkinson’s disease. Abductory patterns were not significantly different among the three groups; however, vocal fold abduction was observed to play a significant role in measures of RFF at voicing offset. By improving algorithmic estimation and elucidating aspects of the underlying physiology affecting RFF, this work adds to the utility of RFF for use in conjunction with current clinical techniques to assess laryngeal muscle tension.

Boston University Institutional Repository (OpenBU)

Comparison Among Phonation of the Sustained Vowel /ε/, Lip Trills, and Tongue Trills: The Amplitude of Vocal Fold Vibration and the Closed Quotient

Author: Arlindo Neto Montagnoli and Domingos Hiroshi Tsuji
Gislaine Ferro Cordeiro
Publication venue: 'IntechOpen'
Publication date: 23/05/2012

A framework to measure human behaviour whilst reading

Author: Greyling Jean
Salehzadeh Seyed Amirsaleh
Publication venue: 'University of Zagreb, Faculty of Science, Department of Mathematics'
Publication date: 01/01/2019

The brain is the most complex object in the known universe; it gives humans a sense of being and characterises human behaviour. Building models of brain functions is perhaps the most fascinating scientific challenge of the 21st century. Reading is a significant cognitive process in the human brain that plays a critical role in the vital process of learning and in performing some daily activities. The study of human behaviour during reading has been an area of interest for researchers in different fields of science. This thesis provides a novel framework, called ARSAT (Assisting Researchers in the Selection of Appropriate Technologies), that measures the behaviour of humans when reading text. The ARSAT framework aims at assisting researchers in the selection and application of appropriate technologies to measure the behaviour of a person who is reading text. The ARSAT framework will assist researchers who investigate the reading process and find it difficult to select appropriate theories, metrics, data collection methods and data analytics techniques. The ARSAT framework enhances the ability of its users to select appropriate metrics indicating the factors that affect the characterisation of different aspects of human behaviour during the reading process. As will be shown in this research study, human behaviour is characterised by a complicated interplay of action, cognition and emotion. The ARSAT framework also facilitates selecting appropriate sensory technologies that can be used to monitor and collect data for the metrics. Moreover, this research study will introduce BehaveNet, a novel Deep Learning modelling approach, which can be used for training Deep Learning models of human behaviour from the sensory data collected. In this thesis, a comprehensive literature study is presented that was conducted to acquire adequate knowledge for designing the ARSAT framework.
In order to identify the contributing factors that affect the reading process, an overview of some existing theories of the reading process is provided. Furthermore, a number of sensory technologies and techniques that can be applied to monitoring the changes in the metrics indicating the factors are also demonstrated. Only technologies that are commercially available are recommended by the ARSAT framework. A variety of Machine Learning techniques were also investigated when designing the BehaveNet. The BehaveNet takes advantage of the complementarity of Convolutional Neural Networks, Long Short-Term Memory networks and Deep Neural Networks. The design of a Human Behaviour Monitoring System (HBMS), by utilising the ARSAT framework for recognising three attention-seeking activities of humans, is also presented in this research study. Reading printed text, as well as speaking out loud and watching a programme on TV, were proposed as the activities, since a person may unintentionally shift his/her attention from reading to such distractions. Among the sensory devices recommended by the ARSAT framework, the Muse headband, which is an Electroencephalography (EEG) and head motion-sensing wearable device, was selected to track forehead EEG and a person’s head movements. The EEG and 3-axis accelerometer data were recorded from eight participants while they read printed text, as well as while they performed the two other activities. An imbalanced dataset consisting of over 1.2 million rows of noisy data was created and used to build a model of the activities (60% training and 20% validating data) and to evaluate the model (20% of the data). The efficiency of the framework is demonstrated by comparing the performance of the models built by utilising the BehaveNet with the models built by utilising a number of competing Deep Learning models for raw EEG and accelerometer data that have attained state-of-the-art performance.
The classification results are evaluated using metrics including the classification accuracy, F1 score, confusion matrix, Receiver Operating Characteristic curve, and Area under Curve (AUC) score. Based on the results, the BehaveNet contributes to the body of knowledge as an approach for measuring human behaviour by using sensory devices. In comparison with the performance of the other models, the models built by utilising the BehaveNet attained better performance when classifying data of two EEG channels (Accuracy = 95%; AUC = 0.99; F1 = 0.95), data of a single EEG channel (Accuracy = 85%; AUC = 0.96; F1 = 0.83), accelerometer data (Accuracy = 81%; AUC = 0.9; F1 = 0.76) and all of the data in the dataset (Accuracy = 97%; AUC = 0.99; F1 = 0.96). The dataset and the source code of this project are also published on the Internet to help the science community. The Muse headband is also shown to be an economical and standard wearable device that can be successfully used in behavioural research.

High Fidelity Computational Modeling and Analysis of Voice Production

Author: Jiang Weili
Publication venue: DigitalCommons@UMaine
Publication date: 30/12/2020

This research aims to improve the fundamental understanding of the multiphysics nature of voice production, particularly the dynamic couplings among glottal flow, vocal fold vibration and airway acoustics, through high-fidelity computational modeling and simulations. Built upon in-house numerical solvers, including an immersed-boundary-method based incompressible flow solver, a finite element method based solid mechanics solver and a hydrodynamic/aerodynamic splitting method based acoustics solver, a fully coupled, continuum mechanics based fluid-structure-acoustics interaction model was developed to simulate the flow-induced vocal fold vibrations and sound production in birds and mammals. Extensive validations of the model were conducted by comparing to excised syringeal and laryngeal experiments. The results showed that, driven by realistic representations of physiology and experimental conditions, including the geometries, material properties and boundary conditions, the model was in excellent agreement with the experiments on the vocal fold vibration patterns, acoustics and intraglottal flow dynamics, demonstrating that the model is able to reproduce realistic phonatory dynamics during voice production. The model was then utilized to investigate the effect of vocal fold inner structures on voice production. Assuming the human vocal fold to be a three-layer structure, this research focused on the effect of longitudinal variation of layer thickness as well as the cover-body thickness ratio on vocal fold vibrations. The results showed that the longitudinal variation of the cover and ligament layer thicknesses had little effect on the flow rate, vocal fold vibration amplitude and pattern but affected the glottal angle in different coronal planes, which also influenced the energy transfer between glottal flow and the vocal fold. The cover-body thickness ratio had a complex nonlinear effect on the vocal fold vibration and voice production.
Increasing the cover-body thickness ratio promoted the excitation of the wave-type modes of the vocal fold, which were also higher-eigenfrequency modes, driving the vibrations to higher frequencies. This created complex nonlinear bifurcations. The results of the research have important clinical implications for voice disorder diagnosis and treatment, as voice disorders are often associated with changes in the mechanical status of the vocal fold tissues and their treatment often focuses on restoring the mechanical status of the vocal folds.

University of Maine

Comparisons of auditorium acoustics measurements as a function of location in halls (A)

Author: Bradley J. S.
Gade Anders Christian
Siebein G W
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/1993