300 research outputs found

    Objective dysphonia quantification in vocal fold paralysis: comparing nonlinear with classical measures

    Clinical acoustic voice recording analysis is usually performed using classical perturbation measures including jitter, shimmer and noise-to-harmonic ratios. However, restrictive mathematical limitations of these measures prevent analysis for severely dysphonic voices. Previous studies of alternative nonlinear random measures addressed wide varieties of vocal pathologies. Here, we analyze a single vocal pathology cohort, testing the performance of these alternative measures alongside classical measures.

We present pre- and post-operative voice analysis in healthy controls and in unilateral vocal fold paralysis (UVFP) patients undergoing standard medialisation thyroplasty surgery, using jitter, shimmer and noise-to-harmonic ratio (NHR) alongside the nonlinear measures recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA) and correlation dimension. After systematizing the preparative editing of the recordings, we found that the novel measures were more stable, and hence more reliable, than the classical measures on healthy controls.

RPDE and jitter are sensitive to improvements pre- to post-operation. Shimmer, NHR and DFA showed no significant change (p > 0.05). All measures detect statistically significant and clinically important differences between controls and patients, both treated and untreated (p < 0.001, AUC > 0.7). Pre- to post-operation, GRBAS ratings show statistically significant and clinically important improvement in overall dysphonia grade (G) (AUC = 0.946, p < 0.001).

Re-calculating AUCs from other study data, we compare these results in terms of clinical importance. We conclude that, when preparative editing is systematized, nonlinear random measures may be useful UVFP treatment effectiveness monitoring tools, and there may be applications for other forms of dysphonia.
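The AUC values quoted in this abstract can be read as the probability that a randomly chosen patient scores higher on a measure than a randomly chosen control. The sketch below computes that rank-based quantity directly; the function name `auc` and the sample scores are hypothetical illustrations, not the study's data or the authors' code.

```python
def auc(positives, negatives):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties count as half a win. Both arguments are plain lists of scores.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for illustration only (not taken from the study).
patient_scores = [0.9, 0.8, 0.7, 0.4]
control_scores = [0.5, 0.3, 0.2, 0.1]
print(auc(patient_scores, control_scores))  # 15 of 16 pairs favour patients: 0.9375
```

An AUC of 0.5 would mean the measure cannot separate the two groups, which is why the abstract treats AUC > 0.7 as clinically meaningful.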

    Extraction and Classification of Self-consumable Sport Video Highlights

    This paper aims to automatically extract and classify self-consumable sport video highlights. To this end, we emphasize the benefits of using play-break sequences as effective inputs to an HMM-based classifier. The HMM models the stochastic pattern of high-level states during specific sport highlights, which correspond to sequences of generic audio-visual measurements extracted from raw video data. This paper uses soccer as the study domain, focusing on the extraction and classification of goal, shot and foul highlights. Experimental work using 183 play-break sequences from 6 soccer matches is presented to demonstrate the performance of the proposed scheme.

    SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music

    Available at: http://www.cise.ufl.edu/~acamacho/publications/dissertation.pdf
    A Sawtooth Waveform Inspired Pitch Estimator (SWIPE) has been developed for processing speech and music. SWIPE is shown to outperform existing algorithms on several publicly available speech/musical-instruments databases and a disordered speech database. SWIPE estimates the pitch as the fundamental frequency of the sawtooth waveform whose spectrum best matches the spectrum of the input signal. A decaying cosine kernel extends older frequency-based, sieve-type estimation algorithms by providing smooth peaks with decaying amplitudes to correlate with the harmonics of the signal. The algorithm is further improved by using only the first and prime harmonics, which significantly reduces the subharmonic errors commonly found in other pitch estimation algorithms.
    UCR::Vicerrectoría de Investigación::Unidades de Investigación::Ingeniería::Centro de Investigaciones en Tecnologías de Información y Comunicación (CITIC)
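The effect of keeping only the first and prime harmonics can be illustrated with a toy scorer. This is a heavily simplified sketch and not the SWIPE algorithm itself: it assumes the spectrum is already reduced to a list of (frequency, magnitude) peaks, uses only the positive lobes of a cosine kernel, and applies a 1/sqrt(k) weight chosen for illustration. All function names are hypothetical.

```python
import math


def first_and_prime(max_harmonic):
    """Return harmonic numbers 1 and the primes up to max_harmonic."""
    keep = [1]
    for k in range(2, max_harmonic + 1):
        if all(k % d for d in range(2, math.isqrt(k) + 1)):
            keep.append(k)
    return keep


def score(candidate_hz, peaks, harmonics):
    """Sum peak magnitudes near the allowed harmonics of candidate_hz.

    Each peak contributes through a cosine lobe centred on the nearest
    harmonic k, scaled by 1/sqrt(k) (an assumed decaying weight).
    """
    allowed = set(harmonics)
    total = 0.0
    for freq_hz, magnitude in peaks:
        ratio = freq_hz / candidate_hz
        k = round(ratio)
        offset = ratio - k
        if k in allowed and abs(offset) < 0.25:
            total += magnitude * math.cos(2 * math.pi * offset) / math.sqrt(k)
    return total


# Synthetic sawtooth-like spectrum at 200 Hz: harmonic k has magnitude 1/k.
peaks = [(200.0 * k, 1.0 / k) for k in range(1, 11)]
primes_only = first_and_prime(10)      # [1, 2, 3, 5, 7]
all_harmonics = list(range(1, 11))

best = max([100.0, 200.0, 400.0], key=lambda f: score(f, peaks, primes_only))
print(best)  # 200.0
```

In this toy setup the subharmonic candidate at 100 Hz matches the signal only through its second harmonic when the prime-only set is used, whereas the full harmonic set lets it also collect energy at its 4th, 6th, 8th and 10th harmonics. That is the intuition behind the reduction in subharmonic errors described above.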

    HARMONIC INTONATION AND IMPLICATION (ANALYSES AND COMPOSITIONS): Harmonic perception and intonation in the reception and performance of alternative tuning systems in contemporary composition

    Most composers and theorists will acknowledge that some compromise is necessary when dealing with the limitations of human performance, perception, and the realities of acoustic theory. Identifying the thresholds for pitch discrimination and execution is an important point of departure for defining workable tuning schemes, and for training musicians to realise compositions in just intonation and other alternative tuning systems. The submitted paper 'HARMONIC INTONATION AND IMPLICATION (ANALYSES AND COMPOSITIONS): Harmonic perception and intonation in the reception and performance of alternative tuning systems in contemporary composition' is a phenomenological study of harmonic perception and intonation through the analysis of recordings, scores, theoretical papers, and discussion with practicing musicians. The examined repertoire covers western 'art' music of the late nineteenth to early twenty-first centuries. I approach my research from the composer's point of view, though filtered through the ears and eyes of the performer, who is here considered the 'expert listener'. It is considered that intonation is a dynamic experience subject to influences beyond just intonation or equal temperament (the two poles for intonational reference); the performance is assumed 'correct', rather than the idealised version of the composer. My goal is to relate the performance to the intentions of the composer and raise questions regarding the choice of notation, resolution of the tuning systems, the complexity of the harmonic concept, etc., and perhaps to suggest how to extend a general theory of harmony that embraces both musical practice and psychoacoustics. It is with the understanding that harmonic implication affects intonation, but that intonation is subject to several other forces, making intonation a complex system (and therefore not fully predictable).

    Analyses of Sustained Vowels in Down Syndrome (DS): A Case Study Using Spectrograms and Perturbation Data to Investigate Voice Quality in Four Adults With DS

    OBJECTIVES: Automatic acoustic measures of voice quality in people with Down syndrome (DS) do not reliably reflect perceived voice qualities. This study used acoustic data and visual spectral data to investigate the relationship between perceived voice qualities and acoustic measures. STUDY DESIGN: Participants were four young adults (two males, two females; mean age 23.8 years) with DS and severe learning disabilities, at least one of whom had a hearing impairment. METHODS: Participants imitated sustained /i/, /u/, and /a/ vowels at predetermined target pitches within their vocal range. Medial portions of vowels were analyzed, using Praat, for fundamental frequency, harmonics-to-noise ratio, jitter, and shimmer. Spectrograms were used to identify the presence and the duration of subharmonics at onset and offset, and mid-vowel. The presence of diplophonia was assessed by auditory evaluation. RESULTS: Perturbation data were highest for /a/ vowels and lowest for /u/ vowels. Intermittent productions of subharmonics were evident in spectrograms, some of which coincided with perceived diplophonia. The incidence, location, duration, and intensity of subharmonics differed between the four participants. CONCLUSIONS: Although the acoustic data do not clearly indicate atypical phonation, diplophonia and subharmonics reflect nonmodal phonation. The findings suggest that these may contribute to different perceived voice qualities in the study group and that these qualities may result from intermittent involvement of supraglottal structures. Further research is required to confirm the findings in the wider DS population, and to assess the relationships between voice quality, vowel type, and physiological measures.
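For readers unfamiliar with the perturbation measures named in this abstract, the following is a minimal sketch of the common "local" form of jitter and shimmer: the mean absolute difference between consecutive cycles divided by the mean value. It assumes the glottal cycle periods and peak amplitudes have already been extracted; it is not Praat's implementation, and the function name and sample values are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference, relative to the mean value.

    Applied to cycle periods this gives local jitter; applied to cycle
    peak amplitudes it gives local shimmer. Needs at least two cycles.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical cycle data for illustration only.
periods_ms = [5.0, 5.1, 5.0, 5.1]        # alternating long/short cycles
amplitudes = [0.80, 0.80, 0.80, 0.80]    # perfectly steady amplitude

jitter = local_perturbation(periods_ms)   # roughly 2 % of the mean period
shimmer = local_perturbation(amplitudes)  # 0.0 for a steady amplitude
```

Both measures presuppose that individual cycles can be identified, which is one reason they become unreliable when subharmonics or diplophonia are present.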

    An objective test tool for pitch extractors' response attributes

    We propose an objective measurement method for pitch extractors' responses to frequency-modulated signals. It enables us to evaluate different pitch extractors with unified criteria. The method uses extended time-stretched pulses combined by binary orthogonal sequences. It provides simultaneous measurement results consisting of the linear and the non-linear time-invariant responses and random and time-varying responses. We tested representative pitch extractors using fundamental frequencies spanning 80 Hz to 400 Hz in 1/48-octave steps and produced more than 1000 modulation frequency response plots. We found that animating these plots as a scientific visualization lets us understand the behavior of different pitch extractors at a glance. Such efficient and effortless inspection is impossible when examining every plot individually. The proposed measurement method with visualization led to a further performance improvement in one of the extractors mentioned above. In other words, our procedure turns that specific pitch extractor into highly reliable measuring equipment, which is crucial for scientific research. We have open-sourced the MATLAB code for the proposed objective measurement method and visualization procedure.
    Comment: 5 pages, 9 figures, submitted to Interspeech2022. arXiv admin note: text overlap with arXiv:2111.0362
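The test grid described in this abstract (80 Hz to 400 Hz in 1/48-octave steps) can be sketched as a geometric sequence. This is an illustration in Python rather than the authors' MATLAB code; the function name `octave_grid` is hypothetical, and whether the authors' grid stops below 400 Hz or includes an endpoint is an assumption here.

```python
import math


def octave_grid(low_hz, high_hz, steps_per_octave):
    """Frequencies from low_hz upward in equal fractions of an octave.

    The grid stops at the last step that does not exceed high_hz.
    """
    count = math.floor(steps_per_octave * math.log2(high_hz / low_hz)) + 1
    return [low_hz * 2 ** (i / steps_per_octave) for i in range(count)]


grid = octave_grid(80.0, 400.0, 48)
print(len(grid))           # 112 test frequencies
print(round(grid[-1], 1))  # highest grid point, just under 400 Hz
```

Each step raises the frequency by a factor of 2^(1/48), that is a quarter of an equal-tempered semitone, so the grid is uniform on a logarithmic frequency axis.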