368 research outputs found

    An Efficient Approach for Polyps Detection in Endoscopic Videos Based on Faster R-CNN

    Full text link
    Polyp has long been considered as one of the major etiologies to colorectal cancer which is a fatal disease around the world, thus early detection and recognition of polyps plays a crucial role in clinical routines. Accurate diagnoses of polyps through endoscopes operated by physicians becomes a challenging task not only due to the varying expertise of physicians, but also the inherent nature of endoscopic inspections. To facilitate this process, computer-aid techniques that emphasize fully-conventional image processing and novel machine learning enhanced approaches have been dedicatedly designed for polyp detection in endoscopic videos or images. Among all proposed algorithms, deep learning based methods take the lead in terms of multiple metrics in evolutions for algorithmic performance. In this work, a highly effective model, namely the faster region-based convolutional neural network (Faster R-CNN) is implemented for polyp detection. In comparison with the reported results of the state-of-the-art approaches on polyps detection, extensive experiments demonstrate that the Faster R-CNN achieves very competing results, and it is an efficient approach for clinical practice.Comment: 6 pages, 10 figures,2018 International Conference on Pattern Recognitio

    Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy

    Get PDF
    Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy. More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches that are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy

    SCALING ARTIFICIAL INTELLIGENCE IN ENDOSCOPY: FROM MODEL DEVELOPMENT TO MACHINE LEARNING OPERATIONS FRAMEWORKS

    Get PDF
    Questa tesi esplora l'integrazione dell'intelligenza artificiale (IA) in Otorinolaringoiatria – Chirurgia di Testa e Collo, concentrandosi sui progressi della computer vision per l’endoscopia e le procedure chirurgiche. La ricerca inizia con una revisione completa dello stato dell’arte dell'IA e della computer vision in questo campo, identificando aree per ulteriori sviluppi. L'obiettivo principale è stato quello di sviluppare un sistema di computer vision per l'analisi di immagini e video endoscopici. La ricerca ha coinvolto la progettazione di strumenti per la rilevazione e segmentazione di neoplasie nelle vie aerodigestive superiori (VADS) e la valutazione della motilità delle corde vocali, cruciale nella stadiazione del carcinoma laringeo. Inoltre, lo studio si è focalizzato sul potenziale dei foundation vision models, vision transformers basati su self-supervised learning, per ridurre la necessità di annotazione da parte di esperti, approccio particolarmente vantaggioso in campi con dati limitati. Inoltre, la ricerca ha incluso lo sviluppo di un'applicazione web per migliorare e velocizzare il processo di annotazione in endoscopia delle VADS, nell’ambito generale delle tecniche di MLOps. La tesi copre varie fasi della ricerca, a partire dalla definizione del quadro concettuale e della metodologia, denominata "Videomics". Include una revisione della letteratura sull'IA in endoscopia clinica, focalizzata sulla Narrow Band Imaging (NBI) e sulle reti neurali convoluzionali (CNN). Lo studio progredisce attraverso diverse fasi, dalla valutazione della qualità delle immagini endoscopiche alla caratterizzazione approfondita delle lesioni neoplastiche. Si affronta anche la necessità di standard nel reporting degli studi di computer vision in ambito medico e si valuta l'applicazione dell'IA in setting dinamici come la motilità delle corde vocali. Una parte significativa della ricerca indaga l'uso di algoritmi di computer vision generalizzati (“foundation models”) e la “commoditization” degli algoritmi di machine learning, utilizzando polipi nasali e il carcinoma orofaringeo come casi studio. Infine, la tesi discute lo sviluppo di ENDO-CLOUD, un sistema basato su cloud per l’analisi della videolaringoscopia, evidenziando le sfide e le soluzioni nella gestione dei dati e l’utilizzo su larga scala di modelli di IA nell'imaging medico.This thesis explores the integration of artificial intelligence (AI) in Otolaryngology – Head and Neck Surgery, focusing on advancements in computer vision for endoscopy and surgical procedures. It begins with a comprehensive review of AI and computer vision advancements in this field, identifying areas for further exploration. The primary aim was to develop a computer vision system for endoscopy analysis. The research involved designing tools for detecting and segmenting neoplasms in the upper aerodigestive tract (UADT) and assessing vocal fold motility, crucial in laryngeal cancer staging. Further, the study delves into the potential of vision foundation models, like vision transformers trained via self-supervision, to reduce the need for expert annotations, particularly beneficial in fields with limited cases. Additionally, the research includes the development of a web application for enhancing and speeding up the annotation process in UADT endoscopy, under the umbrella of Machine Learning Operations (MLOps). The thesis covers various phases of research, starting with defining the conceptual framework and methodology, termed "Videomics". It includes a literature review on AI in clinical endoscopy, focusing on Narrow Band Imaging (NBI) and convolutional neural networks (CNNs). The research progresses through different stages, from quality assessment of endoscopic images to in-depth characterization of neoplastic lesions. It also addresses the need for standards in medical computer vision study reporting and evaluates the application of AI in dynamic vision scenarios like vocal fold motility. A significant part of the research investigates the use of "general purpose" vision algorithms and the commoditization of machine learning algorithms, using nasal polyps and oropharyngeal cancer as case studies. Finally, the thesis discusses the development of ENDO-CLOUD, a cloud-based system for videolaryngoscopy, highlighting the challenges and solutions in data management and the large-scale deployment of AI models in medical imaging

    Vision-based retargeting for endoscopic navigation

    Get PDF
    Endoscopy is a standard procedure for visualising the human gastrointestinal tract. With the advances in biophotonics, imaging techniques such as narrow band imaging, confocal laser endomicroscopy, and optical coherence tomography can be combined with normal endoscopy for assisting the early diagnosis of diseases, such as cancer. In the past decade, optical biopsy has emerged to be an effective tool for tissue analysis, allowing in vivo and in situ assessment of pathological sites with real-time feature-enhanced microscopic images. However, the non-invasive nature of optical biopsy leads to an intra-examination retargeting problem, which is associated with the difficulty of re-localising a biopsied site consistently throughout the whole examination. In addition to intra-examination retargeting, retargeting of a pathological site is even more challenging across examinations, due to tissue deformation and changing tissue morphologies and appearances. The purpose of this thesis is to address both the intra- and inter-examination retargeting problems associated with optical biopsy. We propose a novel vision-based framework for intra-examination retargeting. The proposed framework is based on combining visual tracking and detection with online learning of the appearance of the biopsied site. Furthermore, a novel cascaded detection approach based on random forests and structured support vector machines is developed to achieve efficient retargeting. To cater for reliable inter-examination retargeting, the solution provided in this thesis is achieved by solving an image retrieval problem, for which an online scene association approach is proposed to summarise an endoscopic video collected in the first examination into distinctive scenes. A hashing-based approach is then used to learn the intrinsic representations of these scenes, such that retargeting can be achieved in subsequent examinations by retrieving the relevant images using the learnt representations. For performance evaluation of the proposed frameworks, extensive phantom, ex vivo and in vivo experiments have been conducted, with results demonstrating the robustness and potential clinical values of the methods proposed.Open Acces

    Towards real-world clinical colonoscopy deep learning models for video-based bowel preparation and generalisable polyp segmentation

    Get PDF
    Colorectal cancer is the most prevalence type of cancers within the digestive system. Early screening and removal of precancerous growths in the colon decrease mortality rate. The golden standard screening type for colon is colonoscopy which is conducted by a medical expert (i.e., colonoscopist). Nevertheless, due to human biases, fatigue, and experience level of the colonoscopist, colorectal cancer missing rate is negatively affected. Artificial intelligence (AI) methods hold immense promise not just in automating colonoscopy tasks but also enhancing the performance of colonoscopy screening in general. The recent development of intense computational GPUs enabled a computational-demanding AI method (i.e., deep learning) to be utilised in various medical applications. However, given the gap between the clinical-practice and the proposed deep learning models in the literature, the actual effectiveness of such methods is questionable. Hence, this thesis highlights such gaps that arises from the separation between the theoretical and practical aspect of deep learning methods applied to colonoscopy. The aim is to evaluate the current state of deep learning models applied in colonoscopy from a clinical angle, and accordingly propose better evaluation strategies and deep learning models. The aim is translated into three distinct objectives. The first objective is to develop a systematic evaluation method to assess deep learning models from a clinical perspective. The second objective is to develop a novel deep learning architecture that leverages spatial information within colonoscopy videos to enhance the effectiveness of deep learning models on real-clinical environments. The third objective is to enhance the generalisability of deep learning models on unseen test images by developing a novel deep learning framework. To translate these objectives into practice, two critical colonoscopy tasks, namely, automatic bowel preparation and polyp segmentation are attacked. In both tasks, subtle overestimations are found in the literature and discussed in the thesis theoretically and demonstrated empirically. These overestimations are induced by improper validation sets that would not appear or represent the real-world clinical environment. Arbitrary dividing colonoscopy datasets to do deep learning evaluation can result in producing similar distributions, hence, achieving unrealistic results. Accordingly, these factors are considered in the thesis to avoid such subtle overestimation. For the automatic bowel preparation task, colonoscopy videos that closely resemble clinical settings are considered as input and accordingly it necessitates the design of the proposed model as well as evaluation experiments. The proposed model’s architecture is designed to utilise both temporal and spatial information within colonoscopy videos using Gated Recurrent Unit (GRU) and a proposed Multiplexer unit, respectively. Meanwhile for the polyp segmentation task, the efficiency of current deep learning models is tested in terms of their generalisation capabilities using unseen test sets from different medical centres. The proposed framework consists of two connected models. The first model is responsible for gradually transforming textures of input images and arbitrary change their colours. Meanwhile the second model is a segmentation model that outlines polyp regions. Exposing the segmentation model to such transformed images acquires the segmentation model texture/colour invariant properties, hence, enhances the generalisability of the segmentation model. In this thesis, rigorous experiments are conducted to evaluate the proposed models against the state-of-the-art models. The yielded results indicate that the proposed models outperformed the state-of-the-art models under different settings
    • 

    corecore