
    DA-TransUNet: Integrating Spatial and Channel Dual Attention with Transformer U-Net for Medical Image Segmentation

    Accurate medical image segmentation is critical for disease quantification and treatment evaluation. While traditional U-Net architectures and their transformer-integrated variants excel in automated segmentation tasks, they lack the ability to harness the intrinsic positional and channel features of an image. Existing models also struggle with parameter efficiency and computational complexity, often due to the extensive use of Transformers. To address these issues, this study proposes a novel deep medical image segmentation framework, called DA-TransUNet, which integrates a Transformer and a dual attention block (DA-Block) into the traditional U-shaped architecture. Unlike earlier transformer-based U-Net models, DA-TransUNet uses Transformers and DA-Blocks to integrate not only global and local features but also image-specific positional and channel features, improving the performance of medical image segmentation. By incorporating a DA-Block at the embedding layer and within each skip connection, we substantially enhance feature extraction capabilities and improve the efficiency of the encoder-decoder structure. DA-TransUNet demonstrates superior performance in medical image segmentation tasks, consistently outperforming state-of-the-art techniques across multiple datasets. In summary, DA-TransUNet offers a significant advance in medical image segmentation, providing an effective and powerful alternative to existing techniques. Our architecture stands out for its ability to improve segmentation accuracy, thereby advancing the field of automated medical image diagnostics. The code and parameters of our model will be publicly available at https://github.com/SUN-1024/DA-TransUnet
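The channel half of a dual-attention block can be illustrated with a minimal NumPy sketch. This is not the DA-TransUNet code (which is linked above); it follows the generic channel-attention pattern: a C x C affinity matrix from channel-wise inner products re-weights the input channels, with a scaled residual connection. The `gamma` scale is a learnable parameter in a real network; here it is a plain float.

```python
import numpy as np

def channel_attention(x, gamma=1.0):
    """Channel attention in the style of a dual-attention (DA) block.

    x: feature map of shape (C, H, W). Builds a (C, C) channel affinity
    map via softmax over channel-wise inner products, attends across
    channels, and adds a gamma-scaled residual.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                      # (C, N)
    energy = flat @ flat.T                          # (C, C) channel affinities
    energy = energy - energy.max(axis=1, keepdims=True)
    attn = np.exp(energy) / np.exp(energy).sum(axis=1, keepdims=True)
    out = attn @ flat                               # attend across channels
    return (gamma * out + flat).reshape(c, h, w)    # residual connection

feat = np.random.default_rng(0).standard_normal((4, 8, 8))
out = channel_attention(feat)
print(out.shape)  # (4, 8, 8)
```

The positional (spatial) half of a DA-Block is the transpose of this idea: an (H*W, H*W) affinity map over pixel locations instead of channels.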

    Text Promptable Surgical Instrument Segmentation with Vision-Language Models

    In this paper, we propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments in minimally invasive surgeries. We redefine the task as text promptable, thereby enabling a more nuanced comprehension of surgical instruments and adaptability to new instrument types. Inspired by recent advancements in vision-language models, we leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder consisting of attention- and convolution-based prompting schemes for surgical instrument segmentation prediction. Our model leverages multiple text prompts for each surgical instrument through a new mixture of prompts mechanism, resulting in enhanced segmentation performance. Additionally, we introduce a hard instrument area reinforcement module to improve image feature comprehension and segmentation precision. Extensive experiments on several surgical instrument segmentation datasets demonstrate our model's superior performance and promising generalization capability. To our knowledge, this is the first implementation of a promptable approach to surgical instrument segmentation, offering significant potential for practical application in the field of robotic-assisted surgery. Code is available at https://github.com/franciszzj/TP-SIS
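The "mixture of prompts" idea can be sketched abstractly: several textual descriptions of one instrument are embedded, fused by softmax-weighted averaging, and the fused prompt is compared against per-pixel image features to obtain coarse mask logits. All names and shapes here are illustrative assumptions, not the TP-SIS implementation; the paper's actual decoder uses attention- and convolution-based prompting schemes.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

def mix_prompts(prompt_embs, scores):
    """Fuse K text embeddings for one instrument into a single prompt
    by softmax-weighted averaging (a stand-in for a learned mixture).
    prompt_embs: (K, D); scores: (K,) relevance logits."""
    w = softmax(scores)              # (K,) mixture weights
    return w @ prompt_embs           # (D,) fused prompt embedding

def prompt_mask_logits(image_feats, prompt):
    """Dot-product similarity between the fused prompt and per-pixel
    image features gives coarse mask logits.
    image_feats: (H, W, D); prompt: (D,)."""
    return image_feats @ prompt      # (H, W)

rng = np.random.default_rng(1)
prompts = rng.standard_normal((3, 16))   # 3 phrasings of one instrument
fused = mix_prompts(prompts, np.array([0.2, 1.5, -0.3]))
logits = prompt_mask_logits(rng.standard_normal((8, 8, 16)), fused)
print(logits.shape)  # (8, 8)
```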

    From Fully-Supervised Single-Task to Semi-Supervised Multi-Task Deep Learning Architectures for Segmentation in Medical Imaging Applications

    Medical imaging is routinely performed in clinics worldwide for the diagnosis and treatment of numerous medical conditions in children and adults. With the advent of these medical imaging modalities, radiologists can visualize both the structure of the body as well as the tissues within the body. However, analyzing these high-dimensional (2D/3D/4D) images demands a significant amount of time and effort from radiologists. Hence, there is an ever-growing need for medical image computing tools to extract relevant information from the image data to help radiologists perform efficiently. Image analysis based on machine learning has pivotal potential to improve the entire medical imaging pipeline, providing support for clinical decision-making and computer-aided diagnosis. To be effective in addressing challenging image analysis tasks such as classification, detection, registration, and segmentation, specifically for medical imaging applications, deep learning approaches have shown significant improvement in performance. While deep learning has shown its potential in a variety of medical image analysis problems including segmentation, motion estimation, etc., generalizability is still an unsolved problem and many of these successes are achieved at the cost of a large pool of datasets. For most practical applications, getting access to a copious dataset can be very difficult, often impossible. Annotation is tedious and time-consuming. This cost is further amplified when annotation must be done by a clinical expert in medical imaging applications. Additionally, the applications of deep learning in the real-world clinical setting are still limited due to the lack of reliability caused by the limited prediction capabilities of some deep learning models. Moreover, while using a CNN in an automated image analysis pipeline, it’s critical to understand which segmentation results are problematic and require further manual examination. 
To this end, the estimation of uncertainty calibration in a semi-supervised setting for medical image segmentation is still rarely reported. This thesis focuses on developing and evaluating optimized machine learning models for a variety of medical imaging applications, ranging from fully-supervised, single-task learning to semi-supervised, multi-task learning that makes efficient use of annotated training data. The contributions of this dissertation are as follows: (1) developing fully-supervised, single-task transfer learning for surgical instrument segmentation from laparoscopic images; (2) utilizing supervised, single-task transfer learning for segmenting and digitally removing surgical instruments from endoscopic/laparoscopic videos to allow visualization of the anatomy obscured by the tool (the tool removal algorithms use a tool segmentation mask and either instrument-free reference frames or previous instrument-containing frames to fill in, i.e. inpaint, the instrument segmentation mask); (3) developing fully-supervised, single-task learning via efficient weight pruning and learned group convolution for accurate localization and segmentation of the left ventricle (LV), right ventricle (RV) blood pool, and myocardium from 4D cine cardiac MR images; (4) demonstrating the use of our fully-supervised, memory-efficient model to generate dynamic patient-specific right ventricle (RV) models from a cine cardiac MRI dataset via an unsupervised, learning-based deformable registration field; (5) integrating Monte Carlo dropout into our fully-supervised, memory-efficient model for inherent uncertainty estimation, with the overall goal of estimating the uncertainty and error associated with the obtained segmentation as a means to flag regions that feature less-than-optimal segmentation results; (6) developing semi-supervised, single-task learning via self-training (through meta pseudo-labeling) in concert with a Teacher network that instructs the Student network by generating pseudo-labels given unlabeled input data; (7) proposing largely-unsupervised, multi-task learning to demonstrate the power of a simple combination of a disentanglement block, a variational autoencoder (VAE), a generative adversarial network (GAN), and a conditioning layer-based reconstructor for performing two of the most critical tasks in medical imaging: segmentation of cardiac structures and reconstruction of cine cardiac MR images; and (8) demonstrating the use of 3D semi-supervised, multi-task learning for jointly learning multiple tasks in a single backbone module: uncertainty estimation, geometric shape generation, and segmentation of the left atrial cavity from 3D gadolinium-enhanced magnetic resonance (GE-MR) images. This dissertation summarizes the impact of these contributions by demonstrating the adaptation and use of deep learning architectures featuring different levels of supervision to build a variety of image segmentation tools and techniques that can be used across a wide spectrum of medical image computing applications, centered on facilitating and promoting widespread computer-integrated diagnosis and therapy data science
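The Monte Carlo dropout idea in contribution (5) can be sketched generically: run T stochastic forward passes with dropout active at inference time, then summarize the softmax outputs by their mean and predictive entropy; high entropy flags regions needing manual review. The toy model and all values below are illustrative, not the dissertation's architecture.

```python
import numpy as np

def mc_dropout_uncertainty(forward, x, T=20, seed=0):
    """Run T stochastic forward passes of a dropout-bearing model and
    return the mean class probabilities plus predictive entropy.
    `forward(x, rng)` is any model whose dropout mask depends on rng."""
    rng = np.random.default_rng(seed)
    probs = np.stack([forward(x, rng) for _ in range(T)])   # (T, C)
    mean = probs.mean(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum()          # uncertainty score
    return mean, entropy

def toy_model(x, rng, p_drop=0.5):
    # a linear layer with dropout on the input, followed by softmax
    w = np.array([[2.0, -1.0], [-1.0, 2.0]])
    mask = rng.random(x.shape) > p_drop
    z = w @ (x * mask)
    z = z - z.max()
    return np.exp(z) / np.exp(z).sum()

mean, h = mc_dropout_uncertainty(toy_model, np.array([1.0, 0.2]))
print(mean.sum(), h)   # mean sums to 1; h >= 0 is the uncertainty score
```

For segmentation, the same computation is applied per pixel, yielding an uncertainty map alongside the predicted mask.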

    A Comparative Study of Spatio-Temporal U-Nets for Tissue Segmentation in Surgical Robotics

    In surgical robotics, the ability to achieve high levels of autonomy is often limited by the complexity of the surgical scene. Autonomous interaction with soft tissues requires machines able to examine and understand the endoscopic video streams in real time and identify the features of interest. In this work, we show the first example of spatio-temporal neural networks, based on the U-Net, aimed at segmenting soft tissues in endoscopic images. The networks, equipped with Long Short-Term Memory and Attention Gate cells, can extract the correlation between consecutive frames in an endoscopic video stream, thus enhancing the segmentation’s accuracy with respect to the standard U-Net. Initially, three configurations of the spatio-temporal layers are compared to select the best architecture. Afterwards, the parameters of the network are optimised and finally the results are compared with the standard U-Net. An accuracy of 83.77% ± 2.18% and a precision of 78.42% ± 7.38% are achieved by implementing both Long Short-Term Memory (LSTM) convolutional layers and Attention Gate blocks. The results, although originated in the context of surgical tissue retraction, could benefit many autonomous tasks such as ablation, suturing and debridement
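The Attention Gate mentioned above follows the additive-attention pattern of Attention U-Net: a coarse gating signal g decides which locations of the skip feature x to pass to the decoder. The NumPy sketch below shows that computation with given (not learned) 1x1-style weights; shapes and names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, wx, wg, psi):
    """Additive attention gate: project skip feature x and gating signal g
    to F intermediate channels, ReLU, collapse to a scalar attention map
    alpha in (0, 1), and gate the skip connection.
    x, g: (C, H, W); wx, wg: (F, C); psi: (F,)."""
    q = np.tensordot(wx, x, axes=1) + np.tensordot(wg, g, axes=1)   # (F, H, W)
    alpha = sigmoid(np.tensordot(psi, np.maximum(q, 0.0), axes=1))  # (H, W)
    return x * alpha                                                # gated skip

rng = np.random.default_rng(2)
x = rng.standard_normal((4, 6, 6))
g = rng.standard_normal((4, 6, 6))
gated = attention_gate(x, g, rng.standard_normal((3, 4)),
                       rng.standard_normal((3, 4)), rng.standard_normal(3))
print(gated.shape)  # (4, 6, 6)
```

Because alpha lies in (0, 1), the gate can only suppress skip features, never amplify them; the LSTM layers in the paper add the temporal correlation across frames that this purely spatial sketch omits.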

    MSDESIS: Multi-task stereo disparity estimation and surgical instrument segmentation

    Reconstructing the 3D geometry of the surgical site and detecting instruments within it are important tasks for surgical navigation systems and robotic surgery automation. Traditional approaches treat each problem in isolation and do not account for the intrinsic relationship between segmentation and stereo matching. In this paper, we present a learning-based framework that jointly estimates disparity and binary tool segmentation masks. The core component of our architecture is a shared feature encoder which allows strong interaction between the aforementioned tasks. Experimentally, we train two variants of our network with different capacities and explore different training schemes, including both multi-task and single-task learning. Our results show that supervising the segmentation task improves our network's disparity estimation accuracy. We demonstrate a domain adaptation scheme in which we supervise the segmentation task with monocular data and achieve domain adaptation of the adjacent disparity task, reducing disparity end-point error and depth mean absolute error by 77.73% and 61.73% respectively, compared to the pre-trained baseline model. Our best overall multi-task model, trained with both disparity and segmentation data in subsequent phases, achieves 89.15% mean Intersection-over-Union on the RIS test set and 3.18 millimetre depth mean absolute error on the SCARED test set. Our proposed multi-task architecture runs in real time, able to process 1280x1024 stereo input and simultaneously estimate disparity maps and segmentation masks at 22 frames per second. The model code and pre-trained models are available at https://github.com/dimitrisPs/msdesis
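The depth errors quoted above come from the standard pinhole-stereo relation depth = f * B / d, which converts a predicted disparity map (in pixels) to metric depth given the focal length and baseline. The sketch below uses illustrative calibration values, not SCARED's actual camera parameters.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a stereo disparity map (pixels) to metric depth (metres)
    via depth = f * B / d; zero/invalid disparities map to inf."""
    d = np.asarray(disparity, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(d > 0,
                        focal_px * baseline_m / np.maximum(d, 1e-9),
                        np.inf)

disp = np.array([[20.0, 40.0],
                 [0.0, 10.0]])                       # predicted disparities (px)
depth = disparity_to_depth(disp, focal_px=1000.0, baseline_m=0.005)
print(depth)  # [[0.25, 0.125], [inf, 0.5]]
```

This is also why disparity errors and depth errors scale differently: depth error grows quadratically with distance for a fixed disparity error, so small-disparity (far) regions dominate the depth mean absolute error.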

    From Manual to Automated Design of Biomedical Semantic Segmentation Methods

    Digital imaging plays an increasingly important role in clinical practice. With the number of images that are routinely acquired on the rise, the number of experts devoted to analyzing them is by far not increasing as rapidly. This alarming disparity calls for automated image analysis methods to ease the burden on the experts and prevent a degradation of the quality of care. Semantic segmentation plays a central role in extracting clinically relevant information from images, either on its own or as part of more elaborate pipelines, and constitutes one of the most active fields of research in medical image analysis. The diversity of datasets is mirrored by an equally diverse number of segmentation methods, each being optimized for the datasets it addresses. This diversity of methods does not come without downsides: the specialized nature of these segmentation methods causes a dataset dependency which makes them unable to be transferred to other segmentation problems. Not only does this result in issues with out-of-the-box applicability, but it also adversely affects future method development: improvements over baselines that are demonstrated on one dataset rarely transfer to another, attesting to a lack of reproducibility and causing a frustrating literature landscape in which it is difficult to discern veritable and long-lasting methodological advances from noise. We study three different segmentation tasks in depth with the goal of understanding what makes a good segmentation model and which of the recently proposed methods are truly required to obtain competitive segmentation performance. To this end, we design state-of-the-art segmentation models for brain tumor segmentation, cardiac substructure segmentation, and kidney and kidney tumor segmentation. Each of our methods is evaluated in the context of international competitions, ensuring objective performance comparison with other methods. 
We obtained third place in BraTS 2017, second place in BraTS 2018, first place in ACDC, and first place in the highly competitive KiTS challenge. Our analysis of these four segmentation methods reveals that competitive segmentation performance for all of these tasks can be achieved with a standard but well-tuned U-Net architecture, which is surprising given the recent focus in the literature on finding better network architectures. Furthermore, we identify certain similarities between our segmentation pipelines and notice that their dissimilarities merely reflect well-structured adaptations in response to certain dataset properties. This leads to the hypothesis that we can identify a direct relation between the properties of a dataset and the design choices that lead to a good segmentation model for it. Based on this hypothesis, we develop nnU-Net, the first method that breaks the dataset dependency of traditional segmentation methods. Traditional segmentation methods must be developed by experts, going through an iterative trial-and-error process until they have identified a good segmentation pipeline for a given dataset. This process ultimately results in a fixed pipeline configuration which may be incompatible with other datasets, requiring extensive re-optimization. In contrast, nnU-Net makes use of a generalizing method template that is dynamically and automatically adapted to each dataset it is applied to. This is achieved by condensing domain knowledge about the design of segmentation methods into inductive biases. Specifically, we identify certain pipeline hyperparameters that do not need to be adapted and for which a good default value can be set for all datasets (called blueprint parameters). They are complemented with a comprehensible set of heuristic rules, which explicitly encode how the segmentation pipeline and the network architecture used along with it must be adapted for each dataset (inferred parameters). 
Finally, a limited number of design choices is determined through empirical evaluation (empirical parameters). Following the analysis of our previously designed specialized pipelines, the basic network architecture type used is the standard U-Net, coining the name of our method: nnU-Net ("No New Net"). We apply nnU-Net to 19 diverse datasets originating from segmentation competitions in the biomedical domain. Despite being applied without manual intervention, nnU-Net sets a new state of the art in 29 of the 49 different segmentation tasks encountered in these datasets. This is remarkable considering that nnU-Net competed against specialized, manually tuned algorithms on each of them. nnU-Net is the first out-of-the-box tool that makes state-of-the-art semantic segmentation methods accessible to non-experts. As a framework, it catalyzes future method development: new design concepts can be implemented into nnU-Net and leverage its dynamic nature to be evaluated across a wide variety of datasets without the need for manual re-tuning. In conclusion, this thesis exposes critical weaknesses in the current way of segmentation method development. The dataset dependency of segmentation methods impedes scientific progress by confining researchers to a subset of the datasets available in the domain, causing noisy evaluation and, in turn, a literature landscape in which results are difficult to reproduce and true methodological advances are difficult to discern. Additionally, non-experts have been barred from state-of-the-art segmentation for their custom datasets, because method development is a time-consuming trial-and-error process that requires expertise to be done correctly. 
We propose to address this situation with nnU-Net, a segmentation method that automatically and dynamically adapts itself to arbitrary datasets, not only making out-of-the-box segmentation available to everyone but also enabling more robust decision making in the development of segmentation methods through easy and convenient evaluation across multiple datasets
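An "inferred parameter" rule of the kind described above can be sketched in a few lines: from the dataset's median image shape, decide how many 2x downsampling steps each axis can take before feature maps shrink below a minimum size, so anisotropic volumes get fewer pools along their thin axis. The thresholds below are illustrative assumptions, not nnU-Net's exact heuristics.

```python
import math

def plan_pooling(median_shape, min_feature_size=4, max_pools=5):
    """Toy nnU-Net-style rule: per axis, allow as many 2x downsampling
    steps as possible while keeping the feature map at least
    min_feature_size voxels along that axis, capped at max_pools."""
    pools = []
    for size in median_shape:
        n = int(math.floor(math.log2(max(size, 1) / min_feature_size)))
        pools.append(max(0, min(n, max_pools)))
    return pools

# e.g. a highly anisotropic CT volume: few pools along the thin axis
print(plan_pooling((20, 320, 320)))  # [2, 5, 5]
```

The appeal of encoding configuration this way is that the rule, not a fixed pipeline, is what transfers between datasets: changing the median shape changes the architecture automatically, with no manual re-tuning.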