25 research outputs found

    Pan European Voice Conference - PEVOC 11

    The Pan European VOice Conference (PEVOC) was founded in 1995, so in 2015 it celebrates the 20th anniversary of its establishment: an important milestone that reflects the strength of the scientific community's interest in the topics of this conference. The most significant themes of PEVOC are singing pedagogy and art, but also occupational voice disorders, neurology, rehabilitation, and image and video analysis. PEVOC takes place in a different European city every two years (www.pevoc.org). The PEVOC 11 conference includes a symposium of the Collegium Medicorum Theatri (www.cometcollegium.com).

    Evaluating the translational potential of relative fundamental frequency

    Relative fundamental frequency (RFF) is an acoustic measure that quantifies short-term changes in fundamental frequency during voicing transitions surrounding a voiceless consonant. RFF is hypothesized to be decreased by increased laryngeal tension during voice production and has been considered a potential objective measure of vocal hyperfunction. Previous studies have supported claims that decreased RFF values may indicate the severity of vocal hyperfunction and have attempted to improve the methods used to obtain RFF. In order to make progress towards developing RFF into a clinical measure, this dissertation aimed to further investigate the validity and reliability of RFF. Specifically, we examined the underlying physiological mechanisms, the auditory-perceptual relationship with strained voice quality, and test-retest reliability. The first study evaluated one of the previously hypothesized physiological mechanisms for RFF, vocal fold abduction. Vocal fold kinematics and RFF were obtained from both younger and older typical speakers producing RFF stimuli with voiceless fricatives and stops during high-speed videoendoscopy. We did not find any statistical differences between younger and older speakers, but we found that the vocal folds were less adducted and RFF was lower at voicing onset after voiceless stops than after fricatives. This finding is in accordance with the hypothesized positive association between vocal fold contact area during voicing transitions and RFF. The second study examined the relationship between RFF and strain, a major auditory-perceptual feature of vocal hyperfunction. RFF values were synthetically modified by exchanging the RFF contours between voice samples produced with a comfortable voice and with maximum vocal effort, while other acoustic features were held constant. We observed that comfortable voice samples with the RFF values of maximum vocal effort samples had increased strain ratings, whereas maximum vocal effort samples with the RFF values of comfortable voice samples had decreased strain ratings. These findings support the contribution of RFF to perceived strain. The third study compared the test-retest reliability of RFF with that of conventional voice measures. We recorded individuals with healthy voices on five consecutive days and obtained acoustic, aerodynamic, and auditory-perceptual measures from the recordings. RFF was as reliable as the acoustic and aerodynamic measures and more reliable than the auditory-perceptual measures. This dissertation supports the translational potential of RFF by providing empirical evidence of the physiological mechanisms of RFF, the relationship between RFF and perceived strain, and the test-retest reliability of RFF. Clinical applications of RFF are expected to improve objective diagnosis and assessment of vocal hyperfunction, and thus to lead to better voice care for individuals with vocal hyperfunction.
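
    For readers unfamiliar with the measure, a minimal sketch of the standard RFF computation may help. It assumes per-cycle fundamental frequencies have already been extracted for the ten voicing cycles before the offset and after the onset of the voiceless consonant (the values below are hypothetical), and it expresses each cycle in semitones relative to a steady-state reference cycle, following the convention reported in the RFF literature.

        import numpy as np

        def rff_semitones(cycle_f0_hz, ref_f0_hz):
            # Express each cycle's f0 in semitones relative to a reference cycle.
            return 12.0 * np.log2(np.asarray(cycle_f0_hz, dtype=float) / float(ref_f0_hz))

        # Hypothetical per-cycle f0 values (Hz) around the voiceless consonant in /ifi/:
        # the last 10 voicing cycles before the offset and the first 10 after the onset.
        offset_f0 = [220, 219, 219, 218, 217, 216, 214, 211, 207, 202]  # cycle 10 is nearest the consonant
        onset_f0 = [205, 209, 213, 215, 216, 217, 218, 219, 219, 220]   # cycle 1 is nearest the consonant

        # The reference is conventionally the steady-state cycle furthest from the consonant:
        # offset cycle 1 and onset cycle 10.
        offset_rff = rff_semitones(offset_f0, offset_f0[0])
        onset_rff = rff_semitones(onset_f0, onset_f0[-1])
        print("Offset RFF (ST):", np.round(offset_rff, 2))
        print("Onset RFF (ST):", np.round(onset_rff, 2))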

    Scaling artificial intelligence in endoscopy: from model development to machine learning operations frameworks

    This thesis explores the integration of artificial intelligence (AI) in Otolaryngology – Head and Neck Surgery, focusing on advances in computer vision for endoscopy and surgical procedures. It begins with a comprehensive review of AI and computer vision in this field, identifying areas for further exploration. The primary aim was to develop a computer vision system for endoscopy analysis. The research involved designing tools for detecting and segmenting neoplasms in the upper aerodigestive tract (UADT) and for assessing vocal fold motility, which is crucial in laryngeal cancer staging. Further, the study examines the potential of vision foundation models, such as vision transformers trained via self-supervision, to reduce the need for expert annotations, an approach particularly beneficial in fields with limited cases. Additionally, the research includes the development of a web application for enhancing and speeding up the annotation process in UADT endoscopy, under the umbrella of Machine Learning Operations (MLOps). The thesis covers the various phases of the research, starting with the definition of the conceptual framework and methodology, termed "Videomics". It includes a literature review on AI in clinical endoscopy, focusing on Narrow Band Imaging (NBI) and convolutional neural networks (CNNs). The research progresses through different stages, from quality assessment of endoscopic images to in-depth characterization of neoplastic lesions. It also addresses the need for standards in medical computer vision study reporting and evaluates the application of AI in dynamic vision scenarios such as vocal fold motility. A significant part of the research investigates the use of "general purpose" vision algorithms ("foundation models") and the commoditization of machine learning algorithms, using nasal polyps and oropharyngeal cancer as case studies. Finally, the thesis discusses the development of ENDO-CLOUD, a cloud-based system for videolaryngoscopy analysis, highlighting the challenges and solutions in data management and the large-scale deployment of AI models in medical imaging.
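
    The thesis does not publish its pipeline here, but as an illustration of the foundation-model idea it describes (a self-supervised vision transformer reused as a frozen feature extractor so that only a small task head needs labelled endoscopic frames), the sketch below uses the publicly available DINOv2 ViT-S/14 backbone from PyTorch Hub with a linear classification head. The model choice, two-class task, and input size are assumptions for illustration, not the thesis's actual system.

        import torch
        import torch.nn as nn

        # Frozen self-supervised ViT backbone (DINOv2 ViT-S/14) used only as a feature extractor,
        # so the small head is the only part that needs annotated endoscopic frames.
        backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        backbone.eval()
        for p in backbone.parameters():
            p.requires_grad = False

        num_classes = 2                     # e.g. lesion vs. no lesion (hypothetical task)
        head = nn.Linear(384, num_classes)  # 384 = ViT-S/14 embedding dimension

        def classify(frames: torch.Tensor) -> torch.Tensor:
            # frames: (B, 3, 224, 224) normalized endoscopic frames -> class logits
            with torch.no_grad():
                feats = backbone(frames)    # (B, 384) class-token embeddings
            return head(feats)

        logits = classify(torch.randn(4, 3, 224, 224))
        print(logits.shape)                 # torch.Size([4, 2])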

    Acoustic and videoendoscopic techniques to improve voice assessment via relative fundamental frequency

    Quantitative measures of laryngeal muscle tension are needed to improve assessment and to track clinical progress. Although relative fundamental frequency (RFF) shows promise as an acoustic estimate of laryngeal muscle tension, it is not yet transferable to the clinic. The purpose of this work was to refine the algorithmic estimation of RFF and to enhance knowledge of the physiological underpinnings of RFF. The first study used a large database of voice samples collected from 227 speakers with voice disorders and 256 typical speakers to evaluate the effects of fundamental frequency estimation techniques and voice sample characteristics on algorithmic RFF estimation. By refining fundamental frequency estimation using the Auditory Sawtooth Waveform Inspired Pitch Estimator—Prime (Auditory-SWIPE′) algorithm and accounting for sample characteristics via the acoustic measure of pitch strength, algorithmic errors related to the accuracy and precision of RFF were reduced by 88.4% and 17.3%, respectively. The second study sought to characterize the physiological factors influencing the acoustic outputs of RFF estimation. A group of 53 speakers with voice disorders and 69 typical speakers each produced the utterance /ifi/ while simultaneous recordings were collected using a microphone and a flexible nasendoscope. Acoustic features calculated from the microphone signal were examined in reference to the physiological initiation and termination of vocal fold vibration. The features that corresponded to these transitions were then implemented in the RFF algorithm, leading to significant improvements in the precision with which the RFF algorithm reflects the underlying physiological mechanisms for voicing offsets (p < .001, V = .60) and onsets (p < .001, V = .54) when compared to manual RFF estimation. The third study further elucidated the physiological underpinnings of RFF by examining the contribution of vocal fold abduction to RFF during intervocalic voicing offsets. Vocal fold abductory patterns were compared to RFF values in a subset of speakers from the second study, comprising young adults, older adults, and older adults with Parkinson’s disease. Abductory patterns were not significantly different among the three groups; however, vocal fold abduction was observed to play a significant role in measures of RFF at voicing offset. By improving algorithmic estimation and elucidating aspects of the underlying physiology affecting RFF, this work adds to the utility of RFF for use in conjunction with current clinical techniques to assess laryngeal muscle tension.
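
    The abstract above mentions accounting for sample characteristics via pitch strength, a voicing-salience value returned by SWIPE-family pitch estimators. A minimal sketch of how such a gate might look is shown below; the threshold and data are purely hypothetical, since the dissertation's actual Auditory-SWIPE′ settings and criteria are not reproduced here.

        import numpy as np

        def screen_cycles_by_pitch_strength(f0_hz, pitch_strength, threshold=0.2):
            # Keep only voicing cycles whose pitch strength exceeds a (hypothetical) threshold,
            # discarding low-salience f0 estimates before they can distort RFF.
            f0_hz = np.asarray(f0_hz, dtype=float)
            strength = np.asarray(pitch_strength, dtype=float)
            keep = strength >= threshold
            return f0_hz[keep], keep

        f0 = [210.0, 208.5, 120.0, 207.8]        # a spurious low-strength estimate at index 2
        strength = [0.45, 0.41, 0.05, 0.38]
        clean_f0, mask = screen_cycles_by_pitch_strength(f0, strength)
        print(clean_f0, mask)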

    Unveiling healthcare data archiving: Exploring the role of artificial intelligence in medical image analysis

    Digital health archives can be considered modern databases designed to store and manage large amounts of medical information, from patient records and clinical studies to medical images and genomic data. The structured and unstructured data that make up health archives undergo scrupulous and rigorous validation procedures to ensure accuracy, reliability, and standardization for clinical and research purposes. In the context of a continuously and rapidly evolving healthcare sector, artificial intelligence (AI) presents itself as a transformative force capable of reshaping digital health archives by improving the management, analysis, and retrieval of vast clinical datasets, in order to obtain more informed and repeatable clinical decisions, timely interventions, and improved patient outcomes. Among the various archived data, the management and analysis of medical images in digital archives present numerous challenges due to data heterogeneity, variability in image quality, and the lack of annotations. The use of AI-based solutions can help address these problems effectively, improving the accuracy of image analysis, standardizing data quality, and facilitating the generation of detailed annotations. This thesis aims to use AI algorithms for the analysis of medical images stored in digital health archives. The present work investigates various medical imaging techniques, each characterized by a specific application domain and therefore presenting a unique set of challenges, requirements, and potential outcomes. In particular, this thesis focuses on the diagnostic assistance provided by AI algorithms for three different imaging techniques in specific clinical scenarios: i) endoscopic images obtained during laryngoscopy examinations, including an in-depth exploration of techniques such as keypoint detection for estimating vocal fold motility and segmentation of tumours of the upper aerodigestive tract; ii) magnetic resonance images for the segmentation of intervertebral discs, for the diagnosis and treatment of spinal diseases, as well as for image-guided surgery; iii) ultrasound images in rheumatology, for the assessment of carpal tunnel syndrome through segmentation of the median nerve. The methodologies presented in this work highlight the effectiveness of AI algorithms in analysing archived medical images. The methodological advances achieved underline the remarkable potential of AI in revealing information implicitly present in digital health archives.
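
    The segmentation tasks listed above (UADT tumours, intervertebral discs, the median nerve) are typically evaluated with overlap metrics such as the Dice coefficient; the thesis's own evaluation code is not reproduced here, so the following is a generic sketch with toy masks.

        import numpy as np

        def dice_coefficient(pred, target, eps=1e-7):
            # Dice overlap between two binary masks (1 = structure, 0 = background).
            pred = np.asarray(pred, dtype=bool)
            target = np.asarray(target, dtype=bool)
            intersection = np.logical_and(pred, target).sum()
            return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

        # Toy 4x4 masks standing in for, e.g., a median-nerve segmentation and its ground truth.
        pred = [[0, 1, 1, 0],
                [0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0]]
        gt = [[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
        print(round(dice_coefficient(pred, gt), 3))  # ~0.909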

    Laryngeal reinnervation: feasibility studies and development of trial outcome measures

    The unifying theme of this thesis is a series of research studies that collectively amount to a feasibility study for clinical trials of laryngeal reinnervation for the treatment of vocal fold paralysis. The question 'Does laryngeal reinnervation or thyroplasty give better voice results for patients with unilateral vocal fold paralysis (UVFP)?' remains outstanding; a question that ideally requires a randomised controlled trial. However, randomised controlled trials in surgery face inherent surgeons' equipoise and recruitment issues that may lead to their failure. I performed a national survey of UK ENT consultants exploring their perceptions and obtaining crude numbers of eligible UVFP patients under their care for such a trial; this revealed that the majority of ENT surgeons are receptive to the trial and that the size of the potential patient pool is promising. I interviewed eligible UVFP patients to explore issues around the recruitment process, and this suggested that the proposed trial is feasible. Some phraseology used during recruitment was identified as needing change, which may optimise the recruitment process for a trial. In voice surgery trials, outcome measures should be multidimensional and standardised. Acoustic analysis has been proposed but has limitations. OperaVOX is a new acoustic analysis software package developed to address some of these limitations. I demonstrated that OperaVOX is statistically comparable to the 'gold standard', the Multidimensional Voice Programme, for most principal phonatory outcome measures. Another outcome measure, video-laryngostroboscopy, allows visual evaluation of the characteristics and vibratory pattern of the vocal folds. It is inherently subjective and therefore requires inter- and intra-rater reliability studies. Here, I demonstrated that certain parameters showed substantial inter- and intra-rater reliability; however, rater training is required to improve the reliability of other parameters. I investigated MRI as a potential non-invasive method to evaluate denervation and reinnervation of the vocal muscles. I found that signal changes on T2-weighted MRI images of the larynx correlated with electrophysiological results, with good repeatability. Other MRI sequences, dynamic contrast-enhanced and diffusion-weighted MRI, suggested reduced perfusion in paralysed muscles, whilst cine-MRI for vocal fold mobility assessment demonstrated considerable potential as a method to grade vocal fold mobility. Finally, I present a small prospective case series of non-selective and selective laryngeal reinnervation in UVFP and in unilateral vagal paralysis following vagal tumour excision, respectively, concomitant with injection laryngoplasty. Voice improvement was demonstrated by the voice handicap index-10 and other multidimensional outcome measures, and these findings were supported by laryngeal electromyography and T2-weighted MRI outcomes. To my knowledge, this is the first multidimensional prospective study of laryngeal reinnervation and also the first to suggest that 3T MRI may be a promising outcome measure for future reinnervation trials. In summary, I have shown that a randomised trial of laryngeal reinnervation versus thyroplasty is feasible in the UK, and I have validated patient- and observer-rated outcome measures. I have also shown that MRI may offer an alternative to electromyography in the assessment of laryngeal neuromuscular function in future trials and in the clinic.
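
    Agreement terms such as "substantial" in the stroboscopy reliability work usually refer to Cohen's kappa benchmarks. A minimal sketch of how two raters' categorical scores might be compared is shown below, with hypothetical ratings rather than the thesis's actual data.

        from sklearn.metrics import cohen_kappa_score

        # Hypothetical ordinal ratings (e.g. mucosal wave: 0 = absent, 1 = reduced, 2 = normal)
        # given by two raters to the same 12 stroboscopy recordings.
        rater_a = [2, 2, 1, 0, 2, 1, 1, 2, 0, 2, 1, 2]
        rater_b = [2, 2, 1, 1, 2, 1, 0, 2, 0, 2, 1, 2]

        kappa = cohen_kappa_score(rater_a, rater_b)
        print(f"Cohen's kappa: {kappa:.2f}")  # 0.61-0.80 is often labelled 'substantial' agreement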

    Velopharyngeal incompetence in cleft palate patients - flexible video pharyngoscopy & perceptual speech assessment.

    Velopharyngeal incompetence (VPI) is a common abnormality in cleft palate patients and causes hypernasal speech, a major communicative disorder in these patients. Assessment of VPI is a complex process because the velopharyngeal apparatus is a combination of soft palate and pharyngeal structures that direct airflow from the lungs and larynx through the mouth for oral sounds and through the nose for nasal sounds. The present study of perceptual speech assessment and flexible video endoscopy in patients with cleft palate pathology indicates a correlation between the speech defect and the type of VPI. In the management of patients with cleft palate, it is important that surgical correction of the defect achieves velopharyngeal competence for speech without creating nasal airway obstruction. Velopharyngeal endoscopy with speech assessment defines the anatomic and functional basis for velopharyngeal correction and also helps to plan and tailor pharyngeal flaps. This approach also appears to be a useful and necessary tool for 'surgical feedback'. Hence a multidisciplinary approach involving otolaryngologists, plastic surgeons, and speech pathologists for preoperative evaluation of the defect with perceptual speech analysis and velopharyngeal endoscopy is mandatory.

    The biomechanical properties of the human vocal fold.


    A voice production model based on the biophysics of phonation.

    The search for new models that represent the biophysics of voice phonation is important for applications that include voice signal processing, because such models are a tool for characterizing speakers. This doctoral thesis presents a new approach to the source-filter theory of voice production, more precisely for voiced sounds, that models the voice using three independent subsystems: the excitation source, the vocal tract, and the lip and nostril radiation system. It is a model in which voice generation is carried out by linear, time-invariant filters and which takes into account the physics of phonation through the cyclostationary character of the voice signal, arising from the vibratory behaviour of the vocal folds. The model suggests that the oscillation frequency of the vocal folds is a function of their mass and length and is altered mainly by the longitudinal tension applied to them. In the proposed voice generation model, the vibratory movement of the vocal folds is modelled by a cyclostationary impulse-train generator controlled by a tension signal obtained from the voice signal waveform. A complete mathematical analysis of the new model for the glottal excitation is presented, including an expression for the power spectral density of the signal that excites the glottis, as well as of the voice signal, whose parameters can be adjusted to emulate glottal pathologies. In addition, a frequency-domain analysis of the glottal pulse used is presented. To assess the performance of the proposed model, tests with recorded utterances were carried out, and the results indicate that the proposed model fits voice generation well.
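
    The abstract above describes a classical source-filter decomposition: a periodic excitation at the vocal fold oscillation frequency, a linear time-invariant vocal tract filter, and a radiation stage at the lips. The sketch below implements a generic, simplified version of that pipeline; the f0, formant resonators, and first-difference radiation model are textbook assumptions for illustration, not the thesis's cyclostationary model.

        import numpy as np
        from scipy.signal import lfilter

        fs = 16000        # sample rate (Hz)
        f0 = 120.0        # vocal fold oscillation frequency (Hz), hypothetical
        duration = 0.5    # seconds

        # 1) Source: periodic impulse train at f0, a crude stand-in for the glottal excitation.
        n = int(fs * duration)
        period = int(round(fs / f0))
        source = np.zeros(n)
        source[::period] = 1.0

        # 2) Vocal tract: cascade of second-order resonators at textbook /a/-like formants.
        def resonator(freq_hz, bandwidth_hz, fs):
            r = np.exp(-np.pi * bandwidth_hz / fs)
            theta = 2.0 * np.pi * freq_hz / fs
            return [1.0], [1.0, -2.0 * r * np.cos(theta), r * r]  # (b, a) filter coefficients

        signal = source
        for freq, bw in [(730, 90), (1090, 110), (2440, 170)]:
            b, a = resonator(freq, bw, fs)
            signal = lfilter(b, a, signal)

        # 3) Radiation at the lips: approximated by a first difference (high-pass).
        voice = np.diff(signal, prepend=0.0)
        print(voice.shape, float(np.max(np.abs(voice))))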