
    Fully automatic segmentation of glottis and vocal folds in endoscopic laryngeal high-speed videos using a deep Convolutional LSTM Network

    The objective investigation of the dynamic properties of vocal fold vibrations demands the recording and quantitative analysis of laryngeal high-speed video (HSV). Quantifying the vocal fold vibration patterns first requires segmentation of the glottal area within each video frame, from which the vibrating edges of the vocal folds are usually derived. Consequently, the outcome of any further vibration analysis depends on the quality of this initial segmentation. In this work we propose, for the first time, a procedure to fully automatically segment not only the time-varying glottal area but also the vocal fold tissue directly from laryngeal HSV using a deep Convolutional Neural Network (CNN) approach. Eighteen different CNN configurations were trained and evaluated on a total of 13,000 HSV frames obtained from 56 healthy and 74 pathologic subjects. The segmentation quality of the best-performing CNN model, which uses Long Short-Term Memory (LSTM) cells to also take the temporal context into account, was investigated in depth on 15 test video sequences comprising 100 consecutive images each. The Dice Coefficient (DC) and the precision of four anatomical landmark positions were used as performance measures. Over all test data, a mean DC of 0.85 was obtained for the glottis, and 0.91 and 0.90 for the right and left vocal fold (VF), respectively. The grand average precision of the identified landmarks amounts to 2.2 pixels and is in the same range as comparable manual expert segmentations, which can be regarded as the gold standard. The method proposed here requires no user interaction and overcomes the limitations of current semiautomatic or computationally expensive approaches. It therefore also allows the analysis of long HSV sequences and holds the promise of facilitating the objective analysis of vocal fold vibrations in clinical routine. The dataset used here, including the ground truth, will be provided freely to all scientific groups to allow quantitative benchmarking of segmentation approaches in the future.
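    For reference, the Dice Coefficient (DC) reported above is the standard overlap measure between a predicted and an annotated binary mask. A minimal Python sketch; the function and mask names are illustrative assumptions, not the authors' code:

        import numpy as np

        def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
            # DC = 2 * |P ∩ T| / (|P| + |T|); defined as 1.0 when both masks are empty.
            pred = pred.astype(bool)
            truth = truth.astype(bool)
            total = pred.sum() + truth.sum()
            if total == 0:
                return 1.0
            return 2.0 * np.logical_and(pred, truth).sum() / total

        # Illustrative use on a single HSV frame (hypothetical mask arrays):
        # dc_glottis = dice_coefficient(predicted_glottis_mask, annotated_glottis_mask)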

    Validated respiratory drug deposition predictions from 2D and 3D medical images with statistical shape models and convolutional neural networks

    For the one billion sufferers of respiratory disease, managing their disease with inhalers crucially influences their quality of life. Generic treatment plans could be improved with the aid of computational models that account for patient-specific features such as breathing pattern, lung pathology and morphology. Therefore, we aim to develop and validate an automated computational framework for patient-specific deposition modelling. To that end, an image processing approach is proposed that could produce 3D patient respiratory geometries from 2D chest X-rays and 3D CT images. We evaluated the airway and lung morphology produced by our image processing framework, and assessed deposition compared to in vivo data. The 2D-to-3D image processing reproduces airway diameter to 9% median error compared to ground truth segmentations, but is sensitive to outliers of up to 33% due to lung outline noise. Predicted regional deposition gave 5% median error compared to in vivo measurements. The proposed framework is capable of providing patient-specific deposition measurements for varying treatments, to determine which treatment would best satisfy the needs imposed by each patient (such as disease and lung/airway morphology). Integration of patient-specific modelling into clinical practice as an additional decision-making tool could optimise treatment plans and lower the burden of respiratory diseases.
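    To make the reported figures concrete, the airway evaluation above amounts to a per-branch relative error between reconstructed and ground-truth diameters, summarized by its median and worst case. A minimal Python sketch with made-up numbers (not the study's data):

        import numpy as np

        def diameter_errors(predicted_mm: np.ndarray, ground_truth_mm: np.ndarray):
            # Percentage error per airway branch, plus the median and the worst outlier.
            rel_err = 100.0 * np.abs(predicted_mm - ground_truth_mm) / ground_truth_mm
            return float(np.median(rel_err)), float(rel_err.max())

        # Hypothetical branch diameters in millimetres:
        median_err, worst_err = diameter_errors(
            np.array([12.1, 8.4, 5.9, 4.2]),   # from the 2D-to-3D reconstruction
            np.array([11.5, 8.0, 6.3, 4.0]),   # from ground-truth CT segmentations
        )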

    Vocal Fold Analysis From High Speed Videoendoscopic Data

    High-speed videoendoscopy (HSV) of the larynx far surpasses the limits of videostroboscopy in evaluating vocal fold vibratory behavior by providing a much higher frame rate. HSV enables the visualization of the vocal fold vibratory pattern within an actual glottic cycle. This detailed information on vocal fold vibratory characteristics could be valuable for assessing vocal fold vibratory function in disordered voices and the effects of behavioral, medical and surgical treatment procedures. In this work, we address the problem of classifying voice disorders of varying etiology by following four steps, described briefly below. Our methodology starts with glottis segmentation: given HSV data, the contour of the glottal opening area in each frame is acquired; these contours record the vibration track of the vocal folds. After this, we obtain a reliable glottal axis, which is necessary for deriving certain vibratory features. The third step is feature extraction on the HSV data. In the last step, we perform the classification based on the features obtained in step three. In this study, we first propose a novel glottis segmentation method based on simplified dynamic programming, which proves to be efficient and accurate. In addition, we introduce a new approach for calculating the glottal axis. By comparing the proposed glottal axis determination method (modified linear regression) against state-of-the-art techniques, we demonstrate that our technique is more reliable. After that, the focus shifts to feature extraction and classification schemes. Eighteen different features are extracted and their discriminative power is evaluated based on principal component analysis. A support vector machine and a neural network are implemented to classify three different vocal fold conditions (normal vocal fold, unilateral vocal fold polyp, and unilateral vocal fold paralysis). The results demonstrate that the classification rates for four different tasks are all above 80%.
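    As a rough illustration of the final classification stage described above (features, principal component analysis, then a support vector machine), a minimal scikit-learn sketch; the variable names, the 95% explained-variance threshold and the RBF kernel are assumptions rather than details taken from the study:

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        def evaluate_classifier(X: np.ndarray, y: np.ndarray) -> float:
            # X: one row of 18 vibratory features per recording;
            # y: labels for the three conditions (normal, polyp, paralysis).
            clf = make_pipeline(
                StandardScaler(),
                PCA(n_components=0.95),   # keep components explaining 95% of the variance
                SVC(kernel="rbf"),
            )
            return cross_val_score(clf, X, y, cv=5).mean()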

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The MAVEBA Workshop proceedings, published on a biennial basis, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that had until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy.

    SCALING ARTIFICIAL INTELLIGENCE IN ENDOSCOPY: FROM MODEL DEVELOPMENT TO MACHINE LEARNING OPERATIONS FRAMEWORKS

    This thesis explores the integration of artificial intelligence (AI) in Otolaryngology – Head and Neck Surgery, focusing on advancements in computer vision for endoscopy and surgical procedures. It begins with a comprehensive review of the state of the art of AI and computer vision in this field, identifying areas for further exploration. The primary aim was to develop a computer vision system for the analysis of endoscopic images and videos. The research involved designing tools for detecting and segmenting neoplasms in the upper aerodigestive tract (UADT) and assessing vocal fold motility, which is crucial in laryngeal cancer staging. Further, the study delves into the potential of vision foundation models, such as vision transformers trained via self-supervision, to reduce the need for expert annotations, an approach particularly beneficial in fields with limited data. Additionally, the research includes the development of a web application for enhancing and speeding up the annotation process in UADT endoscopy, under the umbrella of Machine Learning Operations (MLOps). The thesis covers various phases of research, starting with defining the conceptual framework and methodology, termed "Videomics". It includes a literature review on AI in clinical endoscopy, focusing on Narrow Band Imaging (NBI) and convolutional neural networks (CNNs). The research progresses through different stages, from quality assessment of endoscopic images to in-depth characterization of neoplastic lesions. It also addresses the need for standards in the reporting of medical computer vision studies and evaluates the application of AI in dynamic settings such as vocal fold motility. A significant part of the research investigates the use of general-purpose vision algorithms ("foundation models") and the commoditization of machine learning algorithms, using nasal polyps and oropharyngeal cancer as case studies. Finally, the thesis discusses the development of ENDO-CLOUD, a cloud-based system for videolaryngoscopy analysis, highlighting the challenges and solutions in data management and the large-scale deployment of AI models in medical imaging.

    A Hybrid Machine-Learning-Based Method for Analytic Representation of the Vocal Fold Edges during Connected Speech

    Investigating the phonatory processes in connected speech from high-speed videoendoscopy (HSV) demands the accurate detection of the vocal fold edges during vibration. The present paper proposes a new spatio-temporal technique to automatically segment vocal fold edges in HSV data during running speech. The HSV data were recorded from a vocally normal adult during a reading of the "Rainbow Passage." The introduced technique was based on an unsupervised machine-learning (ML) approach combined with an active contour modeling (ACM) technique (also known as a hybrid approach). The hybrid method was implemented to capture the edges of the vocal folds on different HSV kymograms, extracted at various cross-sections of the vocal folds during vibration. The k-means clustering method, an ML approach, was first applied to cluster the kymograms, identifying the clustered glottal area and thereby providing an initial contour for the ACM. The ACM algorithm was then used to precisely detect the glottal edges of the vibrating vocal folds. The developed algorithm was able to accurately track the vocal fold edges across frames with low computational cost and high robustness against image noise. It offers a fully automated tool for analyzing the vibratory features of the vocal folds in connected speech.
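    The hybrid idea (k-means initialization followed by active contour refinement) can be sketched in a few lines of Python for a single grayscale kymogram; the function name, the two-cluster assumption and the snake parameters are illustrative, not the authors' implementation:

        import numpy as np
        from sklearn.cluster import KMeans
        from skimage.filters import gaussian
        from skimage.measure import find_contours
        from skimage.segmentation import active_contour

        def segment_kymogram_edges(kymogram: np.ndarray) -> np.ndarray:
            # 1) k-means on pixel intensities: the darker cluster approximates the glottal area.
            labels = KMeans(n_clusters=2, n_init=10).fit_predict(kymogram.reshape(-1, 1))
            labels = labels.reshape(kymogram.shape)
            darker = int(np.argmin([kymogram[labels == k].mean() for k in range(2)]))
            glottal_mask = (labels == darker).astype(float)
            # 2) The cluster boundary initializes the active contour model (ACM).
            init_snake = find_contours(glottal_mask, 0.5)[0]
            # 3) Refine the contour against the smoothed kymogram to obtain the glottal edges.
            return active_contour(gaussian(kymogram, sigma=2), init_snake,
                                  alpha=0.01, beta=1.0)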

    Unveiling healthcare data archiving: Exploring the role of artificial intelligence in medical image analysis

    Digital health archives can be regarded as modern databases designed to store and manage large amounts of medical information, from patient health records and clinical studies to medical images and genomic data. The structured and unstructured data that make up health archives undergo scrupulous and rigorous validation procedures to guarantee accuracy, reliability and standardization for clinical and research purposes. In the context of a continuously and rapidly evolving healthcare sector, artificial intelligence (AI) presents itself as a transformative force, capable of reshaping digital health archives by improving the management, analysis and retrieval of vast sets of clinical data, in order to achieve more informed and repeatable clinical decisions, timely interventions and better patient outcomes. Among the various archived data, the management and analysis of medical images in digital archives pose numerous challenges due to data heterogeneity, variability in image quality, and the lack of annotations. The use of AI-based solutions can help address these problems effectively, improving the accuracy of image analysis, standardizing data quality and facilitating the generation of detailed annotations. This thesis aims to employ AI algorithms for the analysis of medical images stored in digital health archives. The present work investigates various medical imaging techniques, each characterized by a specific application domain and therefore presenting a unique set of challenges, requirements and potential outcomes. In particular, this thesis focuses on the diagnostic assistance provided by AI algorithms for three different imaging techniques, in specific clinical scenarios: i) endoscopic images obtained during laryngoscopy examinations, including an in-depth exploration of techniques such as keypoint detection for estimating vocal fold motility and the segmentation of upper aerodigestive tract tumors; ii) magnetic resonance images for the segmentation of intervertebral discs, for the diagnosis and treatment of spinal diseases as well as for image-guided surgery; iii) ultrasound images in rheumatology, for the assessment of carpal tunnel syndrome through segmentation of the median nerve. The methodologies presented in this work highlight the effectiveness of AI algorithms in analyzing archived medical images. The methodological advances achieved underscore the considerable potential of AI in revealing information implicitly present in digital health archives.
