108 research outputs found

    GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos

    Automated surgical step recognition is an important task that can significantly improve patient safety and decision-making during surgeries. Existing state-of-the-art methods for surgical step recognition either rely on separate, multi-stage modeling of spatial and temporal information or operate on short-range temporal resolution when learned jointly. However, the benefits of joint modeling of spatio-temporal features and long-range information are not taken into account. In this paper, we propose a vision transformer-based approach to jointly learn spatio-temporal features directly from a sequence of frame-level patches. Our method incorporates a gated-temporal attention mechanism that intelligently combines short-term and long-term spatio-temporal feature representations. We extensively evaluate our approach on two cataract surgery video datasets, namely Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods. These results validate the suitability of our proposed approach for automated surgical step recognition. Our code is released at: https://github.com/nisargshah1999/GLSFormer Comment: Accepted to MICCAI 2023 (Early Accept
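As a rough illustration of what "gating" short-term and long-term features can mean, the sketch below blends two feature vectors with a single sigmoid gate. This is an assumed, simplified form, not the GLSFormer architecture itself; the function name `gated_fusion` and the weight vectors `w_short` and `w_long` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(short_feat, long_feat, w_short, w_long, bias=0.0):
    """Blend short-term and long-term feature vectors with a scalar gate.

    Assumption for illustration: the gate is a sigmoid of a linear function
    of both inputs. Returns (gate, fused_vector).
    """
    if not (len(short_feat) == len(long_feat) == len(w_short) == len(w_long)):
        raise ValueError("all vectors must have the same length")
    logit = bias
    logit += sum(w * s for w, s in zip(w_short, short_feat))
    logit += sum(w * l for w, l in zip(w_long, long_feat))
    gate = sigmoid(logit)
    # gate close to 1 favours the short-term stream, close to 0 the long-term one
    fused = [gate * s + (1.0 - gate) * l for s, l in zip(short_feat, long_feat)]
    return gate, fused


if __name__ == "__main__":
    gate, fused = gated_fusion([1.0, 3.0], [3.0, 5.0], [0.0, 0.0], [0.0, 0.0])
    print(gate, fused)
```

With all-zero weights the gate sits at 0.5 and the output is the plain average of the two streams; learned weights would shift that balance per input.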

    Co-Learning Semantic-aware Unsupervised Segmentation for Pathological Image Registration

    The registration of pathological images plays an important role in medical applications. Despite its significance, most researchers in this field primarily focus on the registration of normal tissue into normal tissue. The negative impact of focal tissue, such as the loss of spatial correspondence information and the abnormal distortion of tissue, is rarely considered. In this paper, we propose GIRNet, a novel unsupervised approach for pathological image registration by incorporating segmentation and inpainting through the principles of Generation, Inpainting, and Registration (GIR). The registration, segmentation, and inpainting modules are trained simultaneously in a co-learning manner so that the segmentation of the focal area and the registration of inpainted pairs can improve collaboratively. Overall, the registration of pathological images is achieved in a completely unsupervised learning framework. Experimental results on multiple datasets, including Magnetic Resonance Imaging (MRI) of T1 sequences, demonstrate the efficacy of our proposed method. Our results show that our method can accurately achieve the registration of pathological images and identify lesions even in challenging imaging modalities. Our unsupervised approach offers a promising solution for the efficient and cost-effective registration of pathological images. Our code is available at https://github.com/brain-intelligence-lab/GIRNet. Comment: 13 pages, 7 figures, published in Medical Image Computing and Computer Assisted Intervention (MICCAI) 202

    3D Kidney Segmentation from Abdominal Images Using Spatial-Appearance Models

    Kidney segmentation is an essential step in developing any noninvasive computer-assisted diagnostic system for renal function assessment. This paper introduces an automated framework for 3D kidney segmentation from dynamic computed tomography (CT) images that integrates discriminative features from the current and prior CT appearances into a random forest classification approach. To account for CT images’ inhomogeneities, we employ discriminative features that are extracted from a higher-order spatial model and an adaptive shape model in addition to the first-order CT appearance. To model the interactions between CT data voxels, we employ a higher-order spatial model, which adds the triple and quad clique families to the traditional pairwise clique family. The kidney shape prior model is built using a set of training CT data and is updated during segmentation using not only region labels but also voxels’ appearances in neighboring spatial voxel locations. Our framework’s performance has been evaluated on in vivo dynamic CT data collected from 20 subjects and comprising multiple 3D scans acquired before and after contrast medium administration. Quantitative evaluation between manually and automatically segmented kidney contours using Dice similarity, percentage volume differences, and 95th-percentile bidirectional Hausdorff distances confirms the high accuracy of our approach
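Two of the evaluation metrics named above can be sketched in a few lines. The code assumes the standard definitions (Dice = 2|A∩B| / (|A| + |B|); percentage volume difference taken relative to the manual segmentation), which may differ in detail from the paper's own. Masks are flattened lists of 0/1 voxel labels, and the function names are illustrative.

```python
def dice_similarity(manual, automatic):
    """Dice coefficient between two binary masks given as flat 0/1 lists."""
    if len(manual) != len(automatic):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for m, a in zip(manual, automatic) if m and a)
    total = sum(1 for m in manual if m) + sum(1 for a in automatic if a)
    if total == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * overlap / total


def percentage_volume_difference(manual, automatic):
    """Absolute volume difference as a percentage of the manual volume."""
    manual_volume = sum(1 for m in manual if m)
    automatic_volume = sum(1 for a in automatic if a)
    if manual_volume == 0:
        raise ValueError("manual mask is empty")
    return 100.0 * abs(automatic_volume - manual_volume) / manual_volume


if __name__ == "__main__":
    manual = [1, 1, 1, 0, 0, 0]
    automatic = [1, 1, 0, 0, 0, 0]
    print(dice_similarity(manual, automatic))
    print(percentage_volume_difference(manual, automatic))
```

Dice rewards overlap while the volume difference ignores position entirely, which is why the two are usually reported together with a boundary metric such as the Hausdorff distance.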

    A non-invasive diagnostic system for early assessment of acute renal transplant rejection.

    Early diagnosis of acute renal transplant rejection (ARTR) is of immense importance for appropriate therapeutic treatment administration. Although the current diagnostic technique is based on renal biopsy, it is not preferred due to its invasiveness, recovery time (1-2 weeks), and potential for complications, e.g., bleeding and/or infection. In this thesis, a computer-aided diagnostic (CAD) system for early detection of ARTR from 4D (3D + b-value) diffusion-weighted (DW) MRI data is developed. The CAD process starts from a 3D B-spline-based data alignment (to handle local deviations due to breathing and heartbeat) and kidney tissue segmentation with an evolving geometric (level-set-based) deformable model. The latter is guided by a voxel-wise stochastic speed function, which follows from a joint kidney-background Markov-Gibbs random field model accounting for an adaptive kidney shape prior and for on-going visual kidney-background appearances. A cumulative empirical distribution of apparent diffusion coefficient (ADC) at different b-values of the segmented DW-MRI is considered a discriminatory transplant status feature. Finally, a classifier based on deep learning of a non-negative constrained stacked auto-encoder is employed to distinguish between rejected and non-rejected renal transplants. In the “leave-one-subject-out” experiments on 53 subjects, 98% of the subjects were correctly classified (namely, 36 out of 37 rejected transplants and 16 out of 16 non-rejected ones). Additionally, a four-fold cross-validation experiment was performed, and an average accuracy of 96% was obtained. These experimental results show the promise of the proposed CAD system as a reliable non-invasive diagnostic tool
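The sketch below works through two quantities from this abstract. The ADC helper assumes the common mono-exponential signal model S_b = S_0 · exp(−b · ADC), which the abstract does not spell out. The accuracy helper simply re-derives the reported 98% from the per-class counts. Function names and the sample signal values are illustrative.

```python
import math


def apparent_diffusion_coefficient(s0, sb, b_value):
    """ADC under the assumed mono-exponential model S_b = S_0 * exp(-b * ADC).

    s0: signal at b = 0; sb: signal at the given b-value (in s/mm^2).
    Returns the ADC in mm^2/s.
    """
    if s0 <= 0 or sb <= 0 or b_value <= 0:
        raise ValueError("signals and b-value must be positive")
    return math.log(s0 / sb) / b_value


def overall_accuracy(correct_per_class, total_per_class):
    """Fraction of all subjects classified correctly across classes."""
    return sum(correct_per_class) / sum(total_per_class)


if __name__ == "__main__":
    # Synthetic signal generated with ADC = 0.002 mm^2/s at b = 500 s/mm^2
    signal_b500 = 1000.0 * math.exp(-0.002 * 500)
    print(apparent_diffusion_coefficient(1000.0, signal_b500, 500))

    # Counts from the abstract: 36 of 37 rejected, 16 of 16 non-rejected
    print(overall_accuracy([36, 16], [37, 16]))
```

The counts give 52 of 53 subjects, about 98.1%, consistent with the figure quoted in the abstract.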

    Diffusion-weighted magnetic resonance imaging in diagnosing graft dysfunction : a non-invasive alternative to renal biopsy.

    The thesis is divided into three parts. The first part focuses on background information, including kidney function, kidney diseases, and available kidney disease treatment strategies. In addition, the thesis provides information on imaging instruments and how they can be used to diagnose renal graft dysfunction. The second part focuses on elucidating the parameters linked with highly accurate diagnosis of rejection. Four parameter categories were tested: clinical biomarkers alone, individual mean apparent diffusion coefficient (ADC) at 11 different b-values, mean ADCs of certain groups of b-values, and fusion of clinical biomarkers and all b-values. The most accurate model was obtained when the b-values b=100 s/mm² and b=700 s/mm² were fused. The third part of this thesis focuses on a study that uses diffusion-weighted MRI to diagnose and differentiate two types of renal rejection. The system was found to correctly differentiate the two types of rejection with 98% accuracy. The last part of this thesis concludes the work that has been done and states the possible trends and future avenues

    Scalable joint segmentation and registration framework for infant brain images

    The first year of life is the most dynamic and perhaps the most critical phase of postnatal brain development. The ability to accurately measure structural changes is critical in early brain development studies, which rely heavily on the performance of image segmentation and registration techniques. However, either infant image segmentation or registration, if deployed independently, encounters many more challenges than segmentation/registration of adult brains due to dynamic appearance change with rapid brain development. In fact, image segmentation and registration of infant images can assist each other to overcome the above challenges by using the growth trajectories (i.e., temporal correspondences) learned from a large set of training subjects with complete longitudinal data. Specifically, a one-year-old image with ground-truth tissue segmentation can first be set as the reference domain. Then, to register the infant image of a new subject at an earlier age, we can estimate its tissue probability maps, i.e., with a sparse patch-based multi-atlas label fusion technique, where only the training images at the respective age are considered as atlases since they have similar image appearance. Next, these probability maps can be fused as a good initialization to guide the level set segmentation. Thus, image registration between the new infant image and the reference image is free of the difficulty of appearance changes, by establishing correspondences upon the reasonably segmented images. Importantly, the segmentation of the new infant image can be further enhanced by propagating the much more reliable label fusion heuristics at the reference domain to the corresponding location of the new infant image via the learned growth trajectories, which brings image segmentation and registration to assist each other. 
It is worth noting that our joint segmentation and registration framework is also flexible enough to handle the registration of any two infant images, even with a significant age gap in the first year of life, by linking their joint segmentation and registration through the reference domain. Thus, our proposed joint segmentation and registration method is scalable to various registration tasks in early brain development studies. Promising segmentation and registration results have been achieved for infant brain MR images of subjects aged from 2 weeks to 1 year, indicating the applicability of our method in early brain development studies

    Computational methods for the analysis of functional 4D-CT chest images.

    Medical imaging is an important emerging technology that has been intensively used in the last few decades for disease diagnosis and monitoring as well as for the assessment of treatment effectiveness. Medical images provide a very large amount of valuable information, too much to be fully exploited by radiologists and physicians. Therefore, the design of a computer-aided diagnostic (CAD) system, which can be used as an assistive tool for the medical community, is of great importance. This dissertation deals with the development of a complete CAD system for patients with lung cancer, which remains the leading cause of cancer-related death in the USA. In 2014, there were approximately 224,210 new cases of lung cancer and 159,260 related deaths. The process begins with the detection of lung cancer through the diagnosis of lung nodules (a manifestation of lung cancer). These nodules are approximately spherical regions of primarily high-density tissue that are visible in computed tomography (CT) images of the lung. The treatment of these lung cancer nodules is complex: nearly 70% of lung cancer patients require radiation therapy as part of their treatment. Radiation-induced lung injury is a limiting toxicity that may decrease cure rates and increase treatment-related morbidity and mortality. Finding ways to accurately detect lung injury at an early stage, and hence prevent it, would have significant positive consequences for lung cancer patients. The ultimate goal of this dissertation is to develop a clinically usable CAD system that can improve the sensitivity and specificity of early detection of radiation-induced lung injury, based on the hypotheses that radiated lung tissues may be affected and suffer a decrease in their functionality as a side effect of radiation therapy treatment. 
These hypotheses have been validated by demonstrating that automatic segmentation of the lung regions and registration of consecutive respiratory phases, used to estimate their elasticity, ventilation, and texture features, provide discriminatory descriptors that can be used for early detection of radiation-induced lung injury. The proposed methodologies will lead to novel indexes for distinguishing normal/healthy and injured lung tissues in clinical decision-making. To achieve this goal, a CAD system for accurate detection of radiation-induced lung injury that requires three basic components has been developed. These components are lung field segmentation, lung registration, and feature extraction and tissue classification. This dissertation starts with an exploration of the available medical imaging modalities to present the importance of medical imaging in today’s clinical applications. Secondly, the methodologies, challenges, and limitations of recent CAD systems for lung cancer detection are covered. This is followed by introducing an accurate segmentation methodology for the lung parenchyma, with a focus on pathological lungs, to extract the volume of interest (VOI) to be analyzed for the potential existence of lung injuries stemming from the radiation therapy. After the segmentation of the VOI, a lung registration framework is introduced to perform a crucial step that ensures the co-alignment of the intra-patient scans. This step eliminates the effects of orientation differences, motion, breathing, heartbeats, and differences in scanning parameters so that the functionality features of the lung fields can be extracted accurately. The developed registration framework also helps in the evaluation and gated control of the radiotherapy through motion estimation analysis before and after the therapy dose. 
Finally, the radiation-induced lung injury detection framework is introduced, which combines the previous two medical image processing and analysis steps with the feature estimation and classification step. This framework estimates and combines both texture and functional features. The texture features are modeled using the novel 7th-order Markov-Gibbs random field (MGRF) model, which can accurately model the texture of healthy and injured lung tissues by simultaneously accounting for both vertical and horizontal relative dependencies between voxel-wise signals. The functionality features are calculated from the deformation fields, obtained from the 4D-CT lung registration, that map lung voxels between successive CT scans in the respiratory cycle. These functionality features describe the ventilation (the air flow rate) of the lung tissues using the Jacobian of the deformation field, and the tissues’ elasticity using the strain components calculated from the gradient of the deformation field. Finally, these features are combined in the classification model to detect the injured parts of the lung at an early stage and enable an earlier intervention
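The functional features described above derive from the gradient of the deformation field. The following minimal sketch shows that computation on a 2D grid (the dissertation works with 3D volumes), using central differences, the Jacobian determinant det(I + ∇u) as a local volume-change measure, and the small-strain tensor ½(∇u + ∇uᵀ). All function names and the grid layout are assumptions for illustration.

```python
def displacement_gradient(ux, uy, i, j, spacing=1.0):
    """Central-difference gradient of a 2D displacement field at interior point (i, j).

    ux[i][j] and uy[i][j] hold the x and y displacement components;
    the first index runs along x and the second along y.
    """
    dux_dx = (ux[i + 1][j] - ux[i - 1][j]) / (2.0 * spacing)
    dux_dy = (ux[i][j + 1] - ux[i][j - 1]) / (2.0 * spacing)
    duy_dx = (uy[i + 1][j] - uy[i - 1][j]) / (2.0 * spacing)
    duy_dy = (uy[i][j + 1] - uy[i][j - 1]) / (2.0 * spacing)
    return [[dux_dx, dux_dy], [duy_dx, duy_dy]]


def jacobian_determinant(grad):
    """det(I + grad u): above 1 means local expansion, below 1 local contraction."""
    return (1.0 + grad[0][0]) * (1.0 + grad[1][1]) - grad[0][1] * grad[1][0]


def small_strain(grad):
    """Symmetric small-strain tensor 0.5 * (grad u + grad u transposed)."""
    shear = 0.5 * (grad[0][1] + grad[1][0])
    return [[grad[0][0], shear], [shear, grad[1][1]]]


if __name__ == "__main__":
    # Uniform 10% expansion: u(x, y) = (0.1 * x, 0.1 * y) on a 3x3 grid
    ux = [[0.1 * i for _ in range(3)] for i in range(3)]
    uy = [[0.1 * j for j in range(3)] for _ in range(3)]
    grad = displacement_gradient(ux, uy, 1, 1)
    print(jacobian_determinant(grad))
    print(small_strain(grad))
```

For the uniform 10% expansion in the example, the determinant comes out near 1.21 (area grows by 21%) and the strain tensor has 0.1 on the diagonal with no shear.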

    Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures

    Recent advancements in surgical computer vision applications have been driven by fully-supervised methods, primarily using only visual data. These methods rely on manually annotated surgical videos to predict a fixed set of object categories, limiting their generalizability to unseen surgical procedures and downstream tasks. In this work, we put forward the idea that the surgical video lectures available through open surgical e-learning platforms can provide effective supervisory signals for multi-modal representation learning without relying on manual annotations. We address the surgery-specific linguistic challenges present in surgical video lectures by employing multiple complementary automatic speech recognition systems to generate text transcriptions. We then present a novel method, SurgVLP - Surgical Vision Language Pre-training, for multi-modal representation learning. SurgVLP constructs a new contrastive learning objective to align video clip embeddings with the corresponding multiple text embeddings by bringing them together within a joint latent space. To effectively show the representation capability of the learned joint latent space, we introduce several vision-and-language tasks for surgery, such as text-based video retrieval, temporal activity grounding, and video captioning, as benchmarks for evaluation. We further demonstrate that without using any labeled ground truth, our approach can be employed for traditional vision-only surgical downstream tasks, such as surgical tool, phase, and triplet recognition. The code will be made available at https://github.com/CAMMA-public/SurgVL
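To make the idea of aligning clip and text embeddings in a joint space concrete, here is a minimal symmetric InfoNCE-style contrastive loss of the kind popularized by CLIP. It pairs each clip with a single text embedding and is an assumed stand-in, not SurgVLP's actual objective, which aligns each clip with multiple texts. The function names and the temperature value are illustrative.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(video_embeddings, text_embeddings, temperature=0.1):
    """Symmetric InfoNCE loss; pair i of video and text is the positive pair."""
    videos = [normalize(v) for v in video_embeddings]
    texts = [normalize(t) for t in text_embeddings]
    logits = [
        [sum(a * b for a, b in zip(v, t)) / temperature for t in texts]
        for v in videos
    ]
    transposed = [list(column) for column in zip(*logits)]
    video_to_text = diagonal_cross_entropy(logits)
    text_to_video = diagonal_cross_entropy(transposed)
    return 0.5 * (video_to_text + text_to_video)


if __name__ == "__main__":
    clips = [[1.0, 0.0], [0.0, 1.0]]
    print(contrastive_loss(clips, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs
    print(contrastive_loss(clips, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs
```

Matched pairs yield a loss near zero while swapped pairs are penalized heavily, which is the pressure that pulls corresponding clips and transcripts together during pre-training.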