6 research outputs found

    Influence of Smartphones and Software on Acoustic Voice Measures.

    This study assessed the within-subject variability of voice measures captured using different recording devices (i.e., smartphones and a head-mounted microphone) and software programs (i.e., Analysis of Dysphonia in Speech and Voice (ADSV), Multi-dimensional Voice Program (MDVP), and Praat). Correlations between the software programs that calculated the voice measures were also analyzed. Results demonstrated no significant within-subject variability across devices and software, and some of the measures were highly correlated across software programs. The study suggests that certain smartphones may be appropriate for recording daily voice measures representing the effects of vocal loading within individuals. In addition, even though the software programs use different algorithms to compute voice measures, some of the programs and measures share a similar relationship.

    Accelerometer-based real-time voice activity detection using neck surface vibration measurement

    University of Minnesota M.S. thesis. June 2019. Major: Kinesiology. Advisor: Jürgen Konczak. 1 computer file (PDF); ii, 44 pages. Speech analysis has a growing number of clinical and industry applications, all of which rely on Voice Activity Detection (VAD). Common VAD applications use microphones, which can be problematic in the presence of background noise and additional voices. Recent studies have utilized accelerometers instead of microphones as voice transducers. As part of a larger research project on impaired speech in the voice disorder spasmodic dysphonia (SD), this study aimed to explore the use of wearable accelerometers to detect speech. These accelerometers would be part of a real-time VAD system embedded in a wearable neck collar for patients with SD. This collar would deliver vibro-tactile stimulation (VTS) to the laryngeal muscles during speech as a therapy for these patients. The aims of this research concerned a) finding the ideal location on the neck to place the accelerometers and b) developing a VAD algorithm that reliably detects the onset and offset of speech based on these accelerometer signals. Methods: 6 healthy adult participants (M/F = 3/3, 26 ± 5.1 years (SD)) vocalized 20 sample sentences under 12 conditions formed by combining 3 variables: 1) normal or slow speed of speech, 2) three accelerometer attachment locations (thyroid cartilage, sternocleidomastoid, and superior to the C7 vertebra), and 3) application of VTS during speech in two locations. Time-synchronized acceleration and audio were recorded in each condition. Results: The number of onsets of voice activity and the total time voiced, as calculated by applying the VAD algorithm to the acceleration data, were measured. The thyroid cartilage attachment location detected speech with over 90% accuracy on average in both measures. The average accuracy was below 75% for the sternocleidomastoid location and below 15% for C7.
Discussion: Placing an accelerometer at the thyroid cartilage for real-time detection of speech was shown to be feasible. The obtained usability data document that accelerometer signals at this anatomical landmark provide the most reliable data to detect speech. The other two locations tested were too variable in accuracy for implementing VAD. With respect to using the established VAD algorithm in the wearable collar device to treat voice symptoms in spasmodic dysphonia, the algorithm's robustness can be improved so that it filters out the noise caused by vibration. The use of advanced processing methods such as adaptive filtering will likely deliver the desired result.
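
The abstract above does not describe the thesis's VAD algorithm, so the following is only a minimal sketch of one common approach: thresholding the short-time RMS of the accelerometer signal and bridging brief gaps with a hangover. The function name `detect_voice_activity`, the 20 ms frame length, the RMS threshold, and the hangover length are all illustrative assumptions, not values from the thesis. The sketch returns the two quantities the abstract reports, voiced segments (onsets and offsets) and total time voiced.

```python
import math


def detect_voice_activity(signal, fs, frame_ms=20.0, threshold=0.01,
                          hangover_frames=3):
    """Energy-threshold VAD sketch (hypothetical parameters).

    signal: sequence of accelerometer samples; fs: sampling rate in Hz.
    Returns (segments, total_voiced_s), where segments is a list of
    (onset_s, offset_s) tuples.
    """
    frame_len = max(1, int(fs * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len

    # Mark each frame as active when its RMS exceeds the threshold.
    active = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        active.append(rms > threshold)

    # Merge active frames into segments; silent runs no longer than
    # hangover_frames are bridged rather than ending the segment.
    segments = []
    start = None
    silence = 0
    for i, is_active in enumerate(active):
        if is_active:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence > hangover_frames:
                end = i - silence + 1  # index of the first silent frame
                segments.append((start * frame_len / fs, end * frame_len / fs))
                start = None
                silence = 0
    if start is not None:
        end = n_frames - silence  # drop trailing silent frames
        segments.append((start * frame_len / fs, end * frame_len / fs))

    total_voiced_s = sum(off - on for on, off in segments)
    return segments, total_voiced_s
```

The number of detected onsets is `len(segments)`. A fixed threshold like this would be sensitive to the vibration noise the discussion mentions, which is where the adaptive filtering it suggests would come in.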

    Using Ambulatory Voice Monitoring to Investigate Common Voice Disorders: Research Update

    Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating between hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders.

    Smartphone-based detection of voice disorders by long-term monitoring of neck acceleration features

    Abstract: Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, termed vocal hyperfunction. Thus an ongoing goal in clinical voice assessment is the long-term monitoring of noninvasively derived measures to track hyperfunction. This paper reports on a smartphone-based voice health monitor that records the high-bandwidth accelerometer signal from the neck skin above the collarbone. Data collection is under way from patients with vocal hyperfunction and matched-control subjects to create a dataset designed to identify the best set of diagnostic measures for hyperfunctional patterns of vocal behavior. Vocal status is tracked from neck acceleration using previously developed vocal dose measures and novel model-based features of glottal airflow estimates. Clinically, the treatment of hyperfunctional disorders would be greatly enhanced by the ability to unobtrusively monitor and quantify detrimental behaviors and, ultimately, to provide real-time biofeedback that could facilitate healthier voice use. Index Terms: voice use, vocal hyperfunction, voice production model, accelerometer sensor, wearable voice sensor.

    Towards vocal-behaviour and vocal-health assessment using distributions of acoustic parameters

    Voice disorders at different levels affect those professional categories that use the voice in a sustained way and for prolonged periods of time, the so-called occupational voice users. In-field voice monitoring is needed to investigate voice behaviour and vocal health status during everyday activities and to highlight work-related risk factors. The overall aim of this thesis is to contribute to the identification of tools, procedures and requirements for voice acoustic analysis as an objective measure to prevent voice disorders, and also to assess them and furnish proof of outcomes during voice therapy. The first part of this thesis includes studies on vocal-load related parameters. Experiments were performed both in-field and in the laboratory. A one-school-year longitudinal study of teachers’ voice use during working hours was performed in high school classrooms using a voice analyzer equipped with a contact sensor; further measurements took place in the semi-anechoic and reverberant rooms of the National Institute of Metrological Research (I.N.Ri.M.) in Torino (Italy) to investigate the effects of very low and excessive reverberation on speech intensity, using both microphones in air and contact sensors. Within this framework, the contributions to the uncertainty of sound pressure level (SPL) estimates obtained with different devices were also assessed with dedicated experiments. Teachers adjusted their voice significantly in response to noise and reverberation, both at the beginning and at the end of the school year. Moreover, teachers who worked in the worst acoustic conditions showed higher SPLs and a worse vocal health status at the end of the school year. The minimum value of speech SPL was found for teachers in classrooms with a reverberation time of about 0.8 s.
Participants in the laboratory experiments significantly increased their speech intensity, by about 2.0 dB, in the semi-anechoic room compared with the reverberant room when describing a map. These results refer to speech monitoring performed with the vocal analyzer, whose estimated uncertainty for SPL differences was about 1 dB. The second part of this thesis addresses vocal health and voice quality assessment using different speech materials and devices. Experiments were performed in clinics, in collaboration with the Department of Surgical Sciences of Università di Torino (Italy) and the Department of Clinical Science, Intervention and Technology of Karolinska Institutet in Stockholm (Sweden). Individual distributions of Cepstral Peak Prominence Smoothed (CPPS) from voluntary patients and control subjects were investigated in sustained vowels, reading, free speech and vowels excerpted from continuous speech, acquired with microphones in air and contact sensors. The main influence quantities of the estimated cepstral parameters were also identified: the fundamental frequency of the vocalization and the broadband noise superimposed on the signal. In addition, the reliability of CPPS estimation with respect to the frequency content of the vocal spectrum was evaluated; this content depends mainly on the bandwidth of the measuring chain used to acquire the vocal signal. Regarding the speech materials acquired with the microphone in air, the 5th percentile was the best statistic of the CPPS distributions for discriminating healthy from unhealthy voices in sustained vowels, while the 95th percentile was the best in both reading and free speech tasks. The discrimination thresholds were 15 dB (95% confidence interval, CI, of 0.7 dB) and 18 dB (95% CI of 0.6 dB), respectively, where lower values indicate a high probability of an unhealthy voice.
Preliminary outcomes on vowels excerpted from continuous speech indicated that a mean CPPS value lower than 14 dB designates pathological voices. CPPS distributions were also effective as proof of outcomes after interventions, e.g., voice therapy and phonosurgery. Concerning the speech materials acquired with the electret contact sensor, reasonable discrimination power was obtained only for sustained vowels, where a standard deviation of the CPPS distribution higher than 1.1 dB (95% CI of 0.2 dB) indicates a high probability of an unhealthy voice. Further results indicated that CPPS parameters are estimated reliably provided that the frequency content of the spectrum extends up to at least 5 kHz; this outcome provides a guideline on the bandwidth of the measuring chain used to acquire the vocal signal.
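
The microphone-in-air thresholds reported in this abstract can be written as a simple screening rule. The sketch below assumes the per-frame CPPS values (in dB) have already been computed elsewhere; the names `percentile`, `THRESHOLDS`, and `screen_cpps` are hypothetical, and the linear-interpolation percentile is an illustrative choice, since the abstract does not say how the percentiles were calculated. It illustrates the reported rule only and is not a diagnostic tool.

```python
def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("values must not be empty")
    rank = (len(xs) - 1) * p / 100.0
    lo = int(rank)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (rank - lo)


# (percentile, threshold in dB) per speech task, as stated in the abstract
# for recordings made with a microphone in air.
THRESHOLDS = {
    "sustained_vowel": (5, 15.0),
    "reading": (95, 18.0),
    "free_speech": (95, 18.0),
}


def screen_cpps(cpps_db, task):
    """Return (statistic_db, below_threshold) for a CPPS distribution.

    below_threshold is True when the statistic falls under the reported
    threshold, i.e. a high probability of an unhealthy voice.
    """
    p, threshold_db = THRESHOLDS[task]
    stat = percentile(cpps_db, p)
    return stat, stat < threshold_db
```

For example, `screen_cpps(frame_cpps, "reading")` compares the 95th percentile of `frame_cpps` against 18 dB. The reported confidence intervals (0.7 dB and 0.6 dB) are not modeled here, so values close to a threshold should be read with caution.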

    Assessments of Voice Use, Voice Quality, and Perceived Singing Voice Function Among College/University Singing Students Ages 18-24 Through Simultaneous Ambulatory Monitoring With Accelerometer and Acoustic Transducers

    Previous vocal dose studies have analyzed the duration, intensity and frequency (in Hz) of voice use among college/university singing students through ambulatory monitoring. However, no ambulatory studies of this population have acquired these vocal dose data simultaneously with acoustic measures of voice quality in order to facilitate direct comparisons of voice use with voice quality during the same voicing period. The purpose of this study was to assess the voice use, voice quality, and perceived singing voice function of college/university singing students (N = 19), ages 18-24 years, enrolled in both voice lessons and choir, through (a) measurements of vocal dose and voice quality collected over 3 full days of ambulatory monitoring with an unfiltered neck accelerometer signal acquired with the Sonovox AB VoxLog portable voice analyzer collar; (b) measurements of voice quality during singing and speaking vocal tasks acquired at 3 different times of day by the VoxLog collar's acoustic and accelerometer transducers; and (c) multiple applications of the Evaluation of the Ability to Sing Easily (EASE) questionnaire about perceived singing voice function. Vocal dose metrics included phonation percentage, dose time, cycle dose, and distance dose. Voice quality measures included fundamental frequency (F0), perceived pitch (P0), dB SPL, LTAS slope, alpha ratio, dB SPL 1-3 kHz, pitch strength, shimmer, jitter, and harmonic-to-noise ratio. 
Major findings indicated that among these students (a) higher vocal doses correlated significantly with greater voice amplitude, more vocal clarity, and less perturbation; (b) there were significant differences in vocal dose and voice quality among non-singing, solo singing, and choral singing time periods; (c) analysis of repeated vocal tasks with the acoustic transducer showed that F0, P0, SPL, and resonance measures displayed increases from morning to afternoon to evening; (d) less perceived ability to sing easily correlated positively with higher frequency and lower amplitude when analyzing repeated vocal tasks with the acoustic transducer; and (e) the two transducers exhibited significant and irregular differences in data simultaneously obtained for 8 of the 10 measures of voice quality.
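
Three of the vocal dose metrics named in this abstract can be computed from a frame-wise F0 track, as in the minimal sketch below. It uses the conventional definitions (dose time as accumulated voiced time, cycle dose as accumulated vocal fold oscillations), which the abstract itself does not spell out. The function name `vocal_doses` and the convention that unvoiced frames carry an F0 of 0 or `None` are assumptions. Distance dose is left out because it also needs a vibration amplitude estimate, which the abstract does not describe.

```python
def vocal_doses(f0_hz, frame_s):
    """Compute basic vocal dose metrics from a frame-wise F0 track.

    f0_hz: per-frame fundamental frequency in Hz; 0 or None marks an
           unvoiced frame (assumed convention).
    frame_s: duration of one analysis frame in seconds.
    """
    voiced = [f for f in f0_hz if f]
    monitored_s = len(f0_hz) * frame_s

    # Dose time: total time spent phonating.
    dose_time_s = len(voiced) * frame_s
    # Cycle dose: number of oscillations, i.e. F0 integrated over voiced time.
    cycle_dose = sum(f * frame_s for f in voiced)
    # Phonation percentage: voiced share of the monitored period.
    phonation_pct = 100.0 * dose_time_s / monitored_s if monitored_s else 0.0

    return {
        "phonation_pct": phonation_pct,
        "dose_time_s": dose_time_s,
        "cycle_dose": cycle_dose,
    }
```

With a 50 ms frame, a call would look like `vocal_doses(f0_track, 0.05)`, where `f0_track` is a hypothetical list of per-frame F0 values from a pitch tracker.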