46 research outputs found

    Segment Anything Model for Medical Images?

    Full text link
    The Segment Anything Model (SAM) is the first foundation model for general image segmentation. It introduces a novel promptable segmentation task, enabling zero-shot image segmentation with the pre-trained model via two main modes: automatic "everything" mode and manual prompt mode. SAM has achieved impressive results on various natural image segmentation tasks. However, medical image segmentation (MIS) is more challenging due to complex modalities, fine anatomical structures, uncertain and complex object boundaries, and wide-ranging object scales. Meanwhile, zero-shot and efficient MIS could greatly reduce annotation time and boost the development of medical image analysis. Hence, SAM appears to be a potential tool whose performance on large medical datasets should be further validated. We collected and sorted 52 open-source datasets and built a large medical segmentation dataset with 16 modalities, 68 objects, and 553K slices. We conducted a comprehensive analysis of different SAM testing strategies on this dataset, which we call COSMOS 553K. Extensive experiments validate that SAM performs better with manual hints such as points and boxes for object perception in medical images, leading to better performance in prompt mode than in everything mode. Additionally, SAM shows remarkable performance on some specific objects and modalities, but is imperfect or even fails entirely in other situations. Finally, we analyze the influence of different factors (e.g., the Fourier-based boundary complexity and the size of the segmented objects) on SAM's segmentation performance. Extensive experiments confirm that SAM's zero-shot segmentation capability is not sufficient for direct application to MIS. Comment: 23 pages, 14 figures, 12 tables
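    As a concrete illustration of the two testing strategies compared above, the sketch below drives SAM in both everything mode and prompt mode via the open-source segment_anything package. The checkpoint filename, the stand-in image, and the point/box prompts are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of SAM's two testing modes; prompts and image are stand-ins.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")  # assumed local path

image = np.zeros((512, 512, 3), dtype=np.uint8)  # stand-in for an RGB-converted medical slice

# "Everything" mode: automatic mask proposals over the whole image.
mask_generator = SamAutomaticMaskGenerator(sam)
auto_masks = mask_generator.generate(image)

# Prompt mode: guide the model with a foreground point and/or a box on the target object.
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),  # hypothetical foreground click
    point_labels=np.array([1]),           # 1 = foreground point
    box=np.array([200, 200, 312, 312]),   # hypothetical XYXY bounding box
    multimask_output=False,
)
```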

    Machine learning strategies for diagnostic imaging support on histopathology and optical coherence tomography

    Full text link
    Thesis by compendium. This thesis presents cutting-edge solutions based on computer vision (CV) and machine learning (ML) algorithms to assist experts in clinical diagnosis. It focuses on two relevant areas at the forefront of medical imaging: digital pathology and ophthalmology. This work proposes different machine learning and deep learning paradigms to address various supervisory scenarios in the study of prostate cancer, bladder cancer and glaucoma.
In particular, conventional supervised methods are considered for segmenting and classifying prostate-specific structures in digitised histological images. For bladder-specific pattern recognition, fully unsupervised approaches based on deep-clustering techniques are carried out. Regarding glaucoma detection, long short-term memory networks (LSTMs) are applied to perform recurrent learning from spectral-domain optical coherence tomography (SD-OCT) volumes. Finally, the use of prototypical neural networks (PNNs) in a few-shot learning framework is proposed to determine the severity level of glaucoma from circumpapillary OCT images. The artificial intelligence (AI) methods detailed in this thesis provide a valuable tool to aid diagnostic imaging, whether for the histological diagnosis of prostate and bladder cancer or for glaucoma assessment from OCT data.
García Pardo, JG. (2022). Machine learning strategies for diagnostic imaging support on histopathology and optical coherence tomography [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/182400
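    As a rough illustration of the few-shot severity grading mentioned above, the sketch below implements the core of a prototypical network: class prototypes are the mean support embeddings, and queries are classified by distance to them. The embedding dimension, shot counts, and random tensors are stand-ins, not the thesis's actual OCT pipeline.

```python
# Minimal prototypical-network classification step; shapes are illustrative.
import torch

def prototypical_logits(support, support_labels, query, n_classes):
    """Classify query embeddings by negative squared distance to class prototypes."""
    # Prototype = mean embedding of each class's support examples.
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_classes)]
    )
    # Negative squared Euclidean distance serves as the logit.
    return -torch.cdist(query, prototypes) ** 2

# Toy usage: 3 hypothetical severity levels, 5 support shots each, 64-d embeddings.
support = torch.randn(15, 64)
labels = torch.arange(3).repeat_interleave(5)
queries = torch.randn(4, 64)
probs = prototypical_logits(support, labels, queries, n_classes=3).softmax(dim=-1)
```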

    Multi-modal imaging in Ophthalmology: image processing methods for improving intra-ocular tumor treatment via MRI and Fundus image photography

    Get PDF
    The most common ocular tumors in the eye are retinoblastoma and uveal melanoma, affecting children and adults respectively, and spreading throughout the body if left untreated. To date, detection and treatment of such tumors rely mainly on two imaging modalities: Fundus Image Photography (Fundus) and Ultrasound (US); however, other imaging modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) are key to confirming a possible tumor spread outside the eye cavity. Current procedures to select the best treatment and follow-up are based on manual multimodal measures taken by clinicians. These tasks often require the manual annotation and delineation of eye structures and tumors, a rather tedious and time-consuming endeavour, to be performed in multiple medical sequences simultaneously (e.g. T1-weighted and T2-weighted MRI), which increases the difficulty of assessing the disease.
This work presents a new set of image processing methods for improving the multimodal evaluation of intra-ocular tumors in 3D MRI and 2D Fundus. We first introduce a novel technique for the automatic delineation of ocular structures and tumors in 3D MRI. To this end, we present an Active Shape Model (ASM) built from a dataset of healthy patients to demonstrate that the segmentation of ocular structures (e.g. the lens, the vitreous humor, the cornea and the sclera) can be performed in an accurate and robust manner. To validate these findings, we introduce a set of experiments to test the model's performance on eyes with endophytic retinoblastoma, and find that the segmentation of healthy eye structures is possible regardless of the presence of a tumor inside the eye. Moreover, we propose a specific set of patient-specific eye features that can be extracted for eye segmentation in 3D MRI, providing rich shape and appearance information about the pathological tissue embedded in the healthy ocular structure. This information is subsequently used to train a set of classifiers (Convolutional Neural Network (CNN), Random Forest, ...) that perform the automatic segmentation of ocular tumors inside the eye. In addition, we explore a new method for evaluating multiple image sequences simultaneously, providing clinicians with a tool to observe the extent of the tumor in both Fundus and MRI. To do so, we combine the automatic eye segmentation in MRI described above with a manual delineation of ocular tumors in Fundus; we then register the two imaging modalities using a new landmark-based approach and fuse them. We use this new method to (i) improve the quality of the delineation in MRI and (ii) back-project the tumor to transport rich volumetric measures from MRI to Fundus, creating a new 3D shape representing the 2D Fundus in a method we call Topographic Fundus Mapping. For all tests and contributions, we validate the results with an MRI database and a database of pathological Fundus images of retinoblastoma.
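    To make the landmark-based registration step more concrete, the following sketch estimates a closed-form least-squares similarity transform (Umeyama/Kabsch style) between matched 2D landmarks. The landmark coordinates are made up, and the thesis's actual landmark scheme and Fundus-MRI projection are not reproduced here.

```python
# Minimal landmark-based similarity registration; data and setup are illustrative.
import numpy as np

def similarity_transform(src, dst):
    """Return scale s, rotation R, translation t minimising ||s*R*src + t - dst||^2."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(dst_c.T @ src_c)   # cross-covariance (unnormalised)
    d = np.sign(np.linalg.det(U @ Vt))          # guard against reflections
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (src_c ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

# Toy usage: 4 hypothetical matched landmark pairs related by scale 1.5, 90-degree
# rotation, and a shift; the recovered (s, R, t) reproduces that transform.
fundus_pts = np.array([[10, 12], [40, 15], [30, 44], [12, 38]], float)
mri_pts = 1.5 * fundus_pts @ np.array([[0, -1], [1, 0]]).T + np.array([5.0, -3.0])
s, R, t = similarity_transform(fundus_pts, mri_pts)
```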

    Towards markerless orthopaedic navigation with intuitive Optical See-through Head-mounted displays

    Get PDF
    The potential of image-guided orthopaedic navigation to improve surgical outcomes has been well recognised during the last two decades. Based on the tracked pose of the target bone, anatomical information and preoperative plans are updated and displayed to surgeons, so that they can follow the guidance to reach the goal with higher accuracy, efficiency and reproducibility. Despite their success, current orthopaedic navigation systems have two main limitations: for target tracking, artificial markers have to be drilled into the bone and manually calibrated to it, which introduces the risk of additional harm to patients and increases operating complexity; for guidance visualisation, surgeons have to shift their attention from the patient to an external 2D monitor, which is disruptive and can be mentally stressful. Motivated by these limitations, this thesis explores the development of an intuitive, compact and reliable navigation system for orthopaedic surgery. To this end, conventional marker-based tracking is replaced by a novel markerless tracking algorithm, and the 2D display is replaced by a 3D holographic Optical See-through (OST) Head-mounted display (HMD) precisely calibrated to the user's perspective. Our markerless tracking, facilitated by a commercial RGBD camera, is achieved through deep learning-based bone segmentation followed by real-time pose registration. For robust segmentation, a new network is designed and efficiently augmented with a synthetic dataset. Our segmentation network outperforms the state of the art in occlusion robustness, device-agnostic behaviour, and target generalisability. For reliable pose registration, a novel Bounded Iterative Closest Point (BICP) workflow is proposed. The improved markerless tracking achieves a clinically acceptable error of 0.95 deg and 2.17 mm in a phantom test. OST displays allow ubiquitous enrichment of the perceived real world with contextually blended virtual aids through semi-transparent glasses. They have been recognised as a suitable visual tool for surgical assistance, since they do not hinder the surgeon's natural eyesight and require no attention shift or perspective conversion. OST calibration is crucial to ensure locationally coherent surgical guidance. Current calibration methods are either prone to human error or hardly applicable to commercial devices. To this end, we propose an offline camera-based calibration method that is highly accurate yet easy to implement in commercial products, and an online alignment-based refinement that is user-centric and robust against user error. The proposed methods are shown to be superior to similar state-of-the-art (SOTA) methods in calibration convenience and display accuracy. Motivated by the ambition to develop the world's first markerless OST navigation system, we integrated the developed markerless tracking and calibration scheme into a complete navigation workflow designed for femur drilling tasks during knee replacement surgery. We verify the usability of our OST system in a cadaver study with an experienced orthopaedic surgeon. Our test validates the potential of the proposed markerless navigation system for surgical assistance, although further improvement is required for clinical acceptance.
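    The BICP workflow itself is defined in the thesis; as a rough sketch of the general idea, the code below performs one ICP-style rigid update in which correspondences are bounded by a maximum search distance. The bound value, the 3D point clouds, and the Kabsch solver are illustrative assumptions, not the thesis implementation.

```python
# One bounded-correspondence ICP step; parameters and data are illustrative.
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst, max_dist=5.0):
    """One rigid update: match each src point to its nearest dst point within max_dist."""
    dists, idx = cKDTree(dst).query(src, distance_upper_bound=max_dist)
    keep = np.isfinite(dists)                    # discard matches beyond the bound
    p, q = src[keep], dst[idx[keep]]
    mu_p, mu_q = p.mean(axis=0), q.mean(axis=0)
    U, _, Vt = np.linalg.svd((q - mu_q).T @ (p - mu_p))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    R = U @ D @ Vt                               # Kabsch rotation, reflection-safe
    t = mu_q - R @ mu_p
    return src @ R.T + t, R, t

# Toy usage: recover a small known shift of a 3D point cloud.
rng = np.random.default_rng(0)
dst = rng.normal(size=(200, 3))
src = dst + np.array([0.5, -0.2, 0.1])
moved, R, t = icp_step(src, dst)
```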

    Augmentation Of Human Skill In Microsurgery

    Get PDF
    Surgeons performing highly skilled microsurgery tasks can benefit from information and manual assistance to overcome technological and physiological limitations, making surgery safer, more efficient, and more successful. Vitreoretinal surgery is particularly difficult due to the inherent micro-scale and fragility of human eye anatomy. Additionally, surgeons are challenged by physiological hand tremor, poor visualization, lack of force sensing, and significant cognitive load while executing high-risk procedures inside the eye, such as epiretinal membrane peeling. This dissertation presents the architecture and design principles for a surgical augmentation environment used to develop innovative functionality that addresses the fundamental limitations of vitreoretinal surgery. It is an inherently information-driven modular system incorporating robotics, sensors, and multimedia components. The integrated nature of the system is leveraged to create intuitive and relevant human-machine interfaces and to generate system behaviours that provide active physical assistance and present relevant sensory information to the surgeon. These include basic manipulation assistance, audio-visual and haptic feedback, intraoperative imaging and force sensing. The resulting functionality, and the proposed architecture and design methods, generalize to other microsurgical procedures. The system's performance is demonstrated and evaluated using phantoms and in vivo experiments.

    A Modular and Open-Source Framework for Virtual Reality Visualisation and Interaction in Bioimaging

    Get PDF
    Life science today involves computational analysis of a large amount and variety of data, such as volumetric data acquired by state-of-the-art microscopes, or mesh data derived from the analysis of such data or from simulations. The advent of new imaging technologies, such as lightsheet microscopy, confronts users with an ever-growing amount of data, with terabytes of imaging data created within a day. With the possibility of gentler and higher-performance imaging, the spatiotemporal complexity of the model systems or processes of interest is increasing as well. Visualisation is often the first step in making sense of this data, and a crucial part of building and debugging analysis pipelines. It is therefore important that visualisations can be quickly prototyped, as well as developed into or embedded within full applications. To better judge spatiotemporal relationships, immersive hardware, such as Virtual or Augmented Reality (VR/AR) headsets and associated controllers, is becoming an invaluable tool. In this work we present scenery, a modular and extensible visualisation framework for the Java VM that can handle mesh and large volumetric data containing multiple views, timepoints, and color channels. scenery is free and open-source software, works on all major platforms, and uses the Vulkan or OpenGL rendering APIs. We introduce scenery's main features, and discuss its use with VR/AR hardware and in distributed rendering. In addition to the visualisation framework, we present a series of case studies where scenery provides tangible benefit in developmental and systems biology: with Bionic Tracking, we demonstrate a new technique for tracking cells in 4D volumetric datasets via eye gaze in a virtual reality headset, with the potential to speed up manual tracking tasks by an order of magnitude. We further introduce ideas for moving towards virtual reality-based laser ablation, and perform a user study to gain insight into performance, acceptance and issues when performing ablation tasks with virtual reality hardware in fast-developing specimens. To tame the amount of data originating from state-of-the-art volumetric microscopes, we present ideas for rendering the highly efficient Adaptive Particle Representation, and finally we present sciview, an ImageJ2/Fiji plugin making the features of scenery available to a wider audience.

    BIO-INSPIRED MOTION PERCEPTION: FROM GANGLION CELLS TO AUTONOMOUS VEHICLES

    Get PDF
    Animals are remarkable navigators, even in extreme situations. Through motion perception, animals compute their own movements (egomotion) and detect other objects (prey, predators, obstacles) and their motions in the environment. Analogous to animals, artificial systems such as robots also need to know where they are relative to the scene structure and to segment obstacles to avoid collisions. Even though substantial progress has been made in the development of artificial visual systems, they still struggle to achieve robust and generalizable solutions. To this end, I propose a bio-inspired framework that narrows the gap between natural and artificial systems. The standard approaches in robot motion perception seek to reconstruct a three-dimensional model of the scene and then use this model to estimate egomotion and object segmentation. However, the scene reconstruction process is data-heavy and computationally expensive, and fails to deal with high-speed and dynamic scenarios. By contrast, biological visual systems excel in these difficult situations by extracting only the minimal information sufficient for motion perception tasks. I derive minimalist/purposive ideas from biological processes throughout this thesis and develop mathematical solutions for robot motion perception problems. In this thesis, I develop a full range of solutions that utilize bio-inspired motion representation and learning approaches for motion perception tasks, focusing in particular on egomotion estimation and motion segmentation. I have four main contributions: 1. I introduce NFlowNet, a neural network to estimate normal flow (bio-inspired motion filters); normal flow estimation presents a new avenue for solving egomotion in a robust and qualitative framework. 2. Utilizing normal flow, I propose the DiffPoseNet framework to estimate egomotion by formulating the qualitative constraint in a differentiable optimization layer, which allows for end-to-end learning. 3. Further, utilizing a neuromorphic event camera, a retina-inspired vision sensor, I develop 0-MMS, a model-based optimization approach that employs event spikes to segment the scene into multiple moving parts in high-speed dynamic lighting scenarios. 4. To improve the precision of event-based motion perception across time, I develop SpikeMS, a novel bio-inspired learning approach that fully capitalizes on the rich temporal information in event spikes.
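    For intuition on the normal flow that NFlowNet estimates, the sketch below computes it classically from the brightness-constancy constraint I_x u + I_y v + I_t = 0: only the flow component along the image gradient is recoverable pointwise. The synthetic image pair is a stand-in; the learned network and its training are not reproduced.

```python
# Classical normal-flow computation from two frames; the data is synthetic.
import numpy as np

def normal_flow(img0, img1, eps=1e-6):
    """Per-pixel normal flow from brightness constancy: Ix*u + Iy*v + It = 0."""
    Iy, Ix = np.gradient(img0.astype(float))      # spatial gradients (rows, cols)
    It = img1.astype(float) - img0.astype(float)  # temporal derivative
    mag2 = Ix ** 2 + Iy ** 2 + eps
    # Flow component along the gradient direction (the only recoverable part).
    return -It * Ix / mag2, -It * Iy / mag2

# Toy usage: a bright blob shifted one pixel to the right between frames.
yy, xx = np.mgrid[0:64, 0:64]
img0 = np.exp(-((xx - 30) ** 2 + (yy - 32) ** 2) / 50.0)
img1 = np.exp(-((xx - 31) ** 2 + (yy - 32) ** 2) / 50.0)
un, vn = normal_flow(img0, img1)
```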

    Towards Unsupervised Domain Adaptation for Diabetic Retinopathy Detection in the Tromsø Eye Study

    Get PDF
    Diabetic retinopathy (DR) is an eye disease which affects a third of the diabetic population. It is a preventable disease, but requires early detection for efficient treatment. While there has been increasing interest in applying deep learning techniques for DR detection to help practitioners make more accurate diagnoses, these efforts have mainly focused on datasets that were collected or created with ML in mind. In this thesis, however, we examine two particular datasets collected at the University Hospital of North Norway (UNN). These datasets have inherent problems, such as a variable number of input images and domain shift, which motivate the methodological choices in this work. As a remedy, we contribute a multi-stream deep learning architecture for DR classification that is uniquely tailored to these datasets: it can model dependencies across different images, can take a variable number of inputs of any size, processes every image identically no matter which stream it enters, and is compatible with the domain adaptation method ADDA (and, we argue, with many other methods), illustrating how domain adaptation can be utilized within the framework to learn efficiently in the presence of domain shift. Our experiments demonstrate the model's properties empirically and show that it can deal with each of the presented problems. The model this thesis contributes is a first step towards DR detection from these local datasets and, in the bigger picture, from similar datasets worldwide.
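    As a minimal sketch of the ADDA-style adaptation referenced above (under simplifying assumptions: a single-stream linear encoder and random stand-in batches rather than the thesis's multi-stream classifier), the code below alternates between training a domain discriminator to separate source from target features and updating the target encoder to fool it.

```python
# ADDA-style adversarial adaptation loop; architectures and data are stand-ins.
import torch
import torch.nn as nn

feat_dim = 128
src_encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, feat_dim))  # frozen after source training
tgt_encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, feat_dim))
discriminator = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

bce = nn.BCEWithLogitsLoss()
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
opt_t = torch.optim.Adam(tgt_encoder.parameters(), lr=1e-4)

src_batch, tgt_batch = torch.randn(8, 32, 32), torch.randn(8, 32, 32)  # stand-in images

# 1) Discriminator learns to tell source features (label 1) from target (label 0).
d_loss = bce(discriminator(src_encoder(src_batch)), torch.ones(8, 1)) + \
         bce(discriminator(tgt_encoder(tgt_batch).detach()), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Target encoder is updated so its features look like source (label 1) to the discriminator.
t_loss = bce(discriminator(tgt_encoder(tgt_batch)), torch.ones(8, 1))
opt_t.zero_grad(); t_loss.backward(); opt_t.step()
```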

    Handbook of Vascular Biometrics

    Get PDF