
    Perception and Orientation in Minimally Invasive Surgery

    During the last two decades, we have seen a revolution in the way that we perform abdominal surgery, with increased reliance on minimally invasive techniques. This paradigm shift has come at a rapid pace: laparoscopic surgery now represents the gold standard for many surgical procedures, and invasiveness is being reduced further with the recent clinical introduction of novel techniques such as single-incision laparoscopic surgery and natural orifice translumenal endoscopic surgery. Despite the obvious benefits to the patient in terms of morbidity, length of hospital stay and post-operative pain, this shift places significantly higher demands on the surgeon in terms of both perception and manual dexterity. Compared with conventional open surgery, sensory input to the operator is degraded: three-dimensional vision is lost through the use of a two-dimensional operative interface, and haptic feedback from the instruments is reduced. These changes have led to a much higher cognitive load on the surgeon and a greater risk of operator disorientation, potentially leading to surgical errors. This thesis presents a detailed investigation of disorientation in minimally invasive surgery, in which eye tracking is identified as the method of choice for evaluating behavioural patterns during orientation. An analysis framework is proposed to profile orientation behaviour using eye tracking data, and the framework is validated in a laboratory model. It is then used to characterise and quantify successful orientation strategies at critical stages of laparoscopic cholecystectomy, and these strategies are in turn used to demonstrate that focused teaching of this behaviour can significantly improve novices' performance in this task.
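As an illustration of the kind of eye-tracking profiling described above, the following minimal Python sketch computes two common gaze metrics: the share of fixation time spent on each area of interest (AOI) and the number of gaze transitions between AOIs. The fixation format and the AOI names are hypothetical and are not taken from the thesis, whose framework may rely on different metrics.

```python
from collections import defaultdict

# Hypothetical fixation record: (area-of-interest label, duration in ms).
Fixation = tuple[str, float]

def dwell_profile(fixations: list[Fixation]) -> dict[str, float]:
    """Fraction of total fixation time spent on each AOI."""
    totals: dict[str, float] = defaultdict(float)
    for aoi, duration in fixations:
        totals[aoi] += duration
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {aoi: t / grand_total for aoi, t in totals.items()}

def transition_counts(fixations: list[Fixation]) -> dict[tuple[str, str], int]:
    """Number of gaze shifts between each ordered pair of distinct AOIs."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    for (a, _), (b, _) in zip(fixations, fixations[1:]):
        if a != b:
            counts[(a, b)] += 1
    return dict(counts)

fixations = [("liver", 300.0), ("gallbladder", 500.0), ("liver", 200.0)]
print(dwell_profile(fixations))      # {'liver': 0.5, 'gallbladder': 0.5}
print(transition_counts(fixations))  # {('liver', 'gallbladder'): 1, ('gallbladder', 'liver'): 1}
```

Profiles like these, computed per participant, could then be compared between high and low performers.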
Orientation strategies are then characterised for common clinical scenarios in natural orifice translumenal endoscopic surgery, and the concept of image saliency is introduced to investigate further the importance of specific visual cues associated with effective orientation. Behavioural profiles are related to orientation performance, and implications for education and for the construction of smart surgical robots are drawn. Finally, endoscopic horizon stabilisation is investigated as a potential means of reducing operator disorientation, using a simulated operative model for transgastric surgery. The major original contributions of this thesis include: validation of a profiling methodology/framework to characterise orientation behaviour; identification of high-performance orientation strategies in specific clinical scenarios, including laparoscopic cholecystectomy and natural orifice translumenal endoscopic surgery; evaluation of the efficacy of teaching orientation strategies; and evaluation of automatic endoscopic horizon stabilisation in natural orifice translumenal endoscopic surgery. The impact of these results, and the potential for further high-impact research, are discussed in the context of both eye tracking as an evaluation tool in minimally invasive surgery and the implementation of means to combat operator disorientation in a surgical platform. The work also provides further insight into the practical implementation of computer assistance and technological innovation in future flexible access surgical platforms.
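A minimal sketch of the geometry behind horizon stabilisation, assuming (hypothetically) that the gravity direction is available in image coordinates, for example from an inertial sensor at the endoscope tip; the abstract does not say how the horizon is estimated. The correction angle rotates the measured gravity direction onto the image's downward axis, and applying the same rotation about the image centre yields an upright view.

```python
import math

# Image coordinates: x to the right, y downwards, so "down" is (0, 1).

def roll_correction(gx: float, gy: float) -> float:
    """Angle (radians) that rotates the measured gravity direction (gx, gy)
    onto the image's downward axis (0, 1)."""
    return math.atan2(gx, gy)

def rotate_about(point: tuple[float, float], centre: tuple[float, float],
                 angle: float) -> tuple[float, float]:
    """Rotate a 2-D point about `centre` with the standard rotation matrix
    [[cos, -sin], [sin, cos]]."""
    x, y = point[0] - centre[0], point[1] - centre[1]
    c, s = math.cos(angle), math.sin(angle)
    return (centre[0] + c * x - s * y, centre[1] + s * x + c * y)

# Camera rolled by 90 degrees: gravity appears to point right in the image.
angle = roll_correction(1.0, 0.0)
print(rotate_about((1.0, 0.0), (0.0, 0.0), angle))  # approximately (0.0, 1.0)
```

In a full system each output pixel would typically be filled by inverse mapping with interpolation; that step is omitted here.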

    COMPASS: A Formal Framework and Aggregate Dataset for Generalized Surgical Procedure Modeling

    Purpose: We propose a formal framework for the modeling and segmentation of minimally invasive surgical tasks using a unified set of motion primitives (MPs) to enable more objective labeling and the aggregation of different datasets. Methods: We model dry-lab surgical tasks as finite state machines, representing how the execution of MPs, as the basic surgical actions, results in changes of surgical context, which characterizes the physical interactions among tools and objects in the surgical environment. We develop methods for labeling surgical context based on video data and for automatically translating context into MP labels. We then use our framework to create the COntext and Motion Primitive Aggregate Surgical Set (COMPASS), including six dry-lab surgical tasks from three publicly available datasets (JIGSAWS, DESK, and ROSMA), with kinematic and video data and context and MP labels. Results: Our context labeling method achieves near-perfect agreement between consensus labels from crowd-sourcing and expert surgeons. Segmentation of tasks into MPs results in the COMPASS dataset, which nearly triples the amount of data available for modeling and analysis and enables the generation of separate transcripts for the left and right tools. Conclusion: The proposed framework results in high-quality labeling of surgical data based on context and fine-grained MPs. Modeling surgical tasks with MPs enables the aggregation of different datasets and the separate analysis of left and right hands for bimanual coordination assessment. Our formal framework and aggregate dataset can support the development of explainable and multi-granularity models for improved surgical process analysis, skill assessment, error detection, and autonomy.
    Comment: 22 pages, 6 figures, 12 tables
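To make the idea of translating context changes into MP labels concrete, here is a simplified Python sketch for a single tool. The two context variables and the four transition rules are assumptions chosen for illustration; they are not the COMPASS context definition or its actual translation rules.

```python
from dataclasses import dataclass

NOTHING = "nothing"

@dataclass(frozen=True)
class ToolContext:
    """Hypothetical per-frame context for one tool."""
    touching: str = NOTHING  # object in contact with the tool
    holding: str = NOTHING   # object grasped by the tool

def context_to_mps(frames: list[ToolContext]) -> list[tuple[int, str]]:
    """Emit (frame index, MP label) for each context change between
    consecutive frames. Simplified, assumed rules; losses are emitted
    before gains so that switching objects reads naturally."""
    mps: list[tuple[int, str]] = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        if cur.holding != prev.holding and prev.holding != NOTHING:
            mps.append((i, f"Release({prev.holding})"))
        if cur.touching != prev.touching and prev.touching != NOTHING:
            mps.append((i, f"Untouch({prev.touching})"))
        if cur.touching != prev.touching and cur.touching != NOTHING:
            mps.append((i, f"Touch({cur.touching})"))
        if cur.holding != prev.holding and cur.holding != NOTHING:
            mps.append((i, f"Grasp({cur.holding})"))
    return mps

frames = [
    ToolContext(),
    ToolContext(touching="needle"),
    ToolContext(touching="needle", holding="needle"),
    ToolContext(),
]
print(context_to_mps(frames))
# [(1, 'Touch(needle)'), (2, 'Grasp(needle)'), (3, 'Release(needle)'), (3, 'Untouch(needle)')]
```

Running the same function separately on the left-tool and right-tool context streams gives one transcript per tool, in the spirit of the separate left and right transcripts mentioned in the abstract.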

    Kvasir-Capsule, a video capsule endoscopy dataset

    Artificial intelligence (AI) is predicted to have profound effects on the future of video capsule endoscopy (VCE) technology. The potential lies in improving anomaly detection while reducing manual labour. Existing work demonstrates the promising benefits of AI-based computer-assisted diagnosis systems for VCE and also shows considerable room for further improvement. At the same time, medical data is often sparse and unavailable to the research community, and qualified medical personnel rarely have time for the tedious labelling work. We present Kvasir-Capsule, a large VCE dataset collected from examinations at a Norwegian hospital. Kvasir-Capsule consists of 117 videos, from which a total of 4,741,504 image frames can be extracted. We have labelled and medically verified 47,238 frames with a bounding box around findings from 14 different classes. In addition to these labelled images, the dataset includes 4,694,266 unlabelled frames. The Kvasir-Capsule dataset can play a valuable role in developing better algorithms to realise the full potential of VCE technology.
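As a quick consistency check on the figures above, the labelled and unlabelled frames add up to the stated total, and the labelled frames make up roughly 1% of the data:

$$47{,}238 + 4{,}694{,}266 = 4{,}741{,}504, \qquad \frac{47{,}238}{4{,}741{,}504} \approx 0.996\%$$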

    Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery

    Understanding and anticipating intraoperative events and actions is critical for intraoperative assistance and decision-making during minimally invasive surgery. Automated prediction of events, actions, and their consequences has been addressed through various computational approaches, with the objective of augmenting surgeons' perception and decision-making capabilities. We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video, while flexibly leveraging surgical knowledge graphs. The approach incorporates a hypergraph-transformer (HGT) structure that encodes expert knowledge into the network design and predicts the hidden embedding of the graph. We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets and the achievement of the Critical View of Safety (CVS). Moreover, we address specific safety-related tasks, such as predicting the clipping of the cystic duct or artery without prior achievement of the CVS. Our results demonstrate the superiority of our approach compared with unstructured alternatives.
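For readers unfamiliar with hypergraphs, the Python sketch below shows one round of generic mean-based message passing, in which a single hyperedge can join more than two nodes, for instance an (instrument, verb, target) triplet. This is a deliberately simple aggregation scheme assumed for illustration, not the HGT architecture; node names and feature values are hypothetical.

```python
def hypergraph_mean_pass(features: dict[str, list[float]],
                         hyperedges: list[set[str]]) -> dict[str, list[float]]:
    """One round of node -> hyperedge -> node mean aggregation.

    Each hyperedge feature is the mean of its member nodes' features; each
    node's new feature is the mean of its incident hyperedges' features.
    Nodes in no hyperedge keep their current feature.
    """
    dim = len(next(iter(features.values())))

    def mean(vectors: list[list[float]]) -> list[float]:
        return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

    edge_feats = [mean([features[n] for n in edge]) for edge in hyperedges]
    updated = {}
    for node, feat in features.items():
        incident = [ef for edge, ef in zip(hyperedges, edge_feats) if node in edge]
        updated[node] = mean(incident) if incident else list(feat)
    return updated

# Hypothetical 2-D features; one hyperedge links an (instrument, verb, target) triplet.
features = {
    "grasper": [1.0, 0.0],
    "retract": [0.0, 1.0],
    "gallbladder": [1.0, 1.0],
    "clipper": [0.0, 0.0],
}
hyperedges = [{"grasper", "retract", "gallbladder"}]
print(hypergraph_mean_pass(features, hyperedges))
```

Learned architectures replace the plain means with trainable, often attention-weighted, aggregations, but the node-to-hyperedge-to-node flow is the same idea.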

    Spatiotemporal Event Graphs for Dynamic Scene Understanding

    Dynamic scene understanding is the ability of a computer system to interpret and make sense of the visual information present in a video of a real-world scene. In this thesis, we present a series of frameworks for dynamic scene understanding, starting from road event detection from an autonomous driving perspective, through complex video activity detection, to continual learning approaches for the life-long learning of the models. First, we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. Given the lack of datasets equipped with formally specified logical requirements, we also introduce the ROad event Awareness Dataset with logical Requirements (ROAD-R), the first publicly available dataset for autonomous driving with requirements expressed as logical constraints, as a tool for driving neurosymbolic research in the area. Next, we extend event detection to holistic scene understanding by proposing two complex activity detection methods. The first is a deformable, spatiotemporal scene graph approach consisting of three main building blocks: action tube detection; a 3D deformable RoI pooling layer designed to learn the flexible, deformable geometry of the constituent action tubes; and a scene graph constructed by treating all parts as nodes and connecting them based on different semantics. The second approach, which evolved from the first, is a hybrid graph neural network that combines attention applied to a graph encoding of the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity. Finally, the thesis presents a new continual semi-supervised learning (CSSL) paradigm.
    Comment: PhD thesis, Oxford Brookes University, Examiners: Prof. Dima Damen and Dr. Matthias Rolf, 183 pages
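ROAD-R expresses requirements as logical constraints over labels. Assuming, for illustration, that each requirement is a disjunction of possibly negated labels, the Python sketch below checks which requirements a set of thresholded predictions violates. The label names, the 0.5 threshold, and the two example constraints are hypothetical, not taken from the dataset.

```python
# A literal is (label, positive); a requirement is a disjunction of literals.
Literal = tuple[str, bool]
Requirement = list[Literal]

def satisfied(req: Requirement, active: set[str]) -> bool:
    """True if at least one literal in the disjunction holds."""
    return any((label in active) == positive for label, positive in req)

def violations(reqs: list[Requirement], probs: dict[str, float],
               threshold: float = 0.5) -> list[int]:
    """Indices of requirements violated by the thresholded predictions."""
    active = {label for label, p in probs.items() if p >= threshold}
    return [i for i, req in enumerate(reqs) if not satisfied(req, active)]

requirements = [
    # Not both traffic-light colours at once.
    [("RedLight", False), ("GreenLight", False)],
    # A pedestrian is either crossing or walking (hypothetical rule).
    [("Pedestrian", False), ("Crossing", True), ("Walking", True)],
]
probs = {"RedLight": 0.9, "GreenLight": 0.7, "Pedestrian": 0.8, "Walking": 0.6}
print(violations(requirements, probs))  # [0]
```

A check like this can be used to measure how often a model's outputs break the stated requirements.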

    Iterative multi-path tracking for video and volume segmentation with sparse point supervision

    Recent machine learning strategies for segmentation tasks have shown great ability when trained on large pixel-wise annotated image datasets. However, assembling such datasets remains a major challenge, as the time and monetary cost of collecting extensive annotations is extremely high. This is particularly true for precise pixel-wise annotations of video and volumetric image data. To this end, this work presents a novel framework that produces pixel-wise segmentations using minimal supervision. Our method relies on 2D point supervision, whereby a single 2D location within an object of interest is provided on each image of the data. The method then estimates the object's appearance by learning object-image-specific features and using these in a semi-supervised learning framework. Our object model is then used in a graph-based optimization problem that takes into account all provided locations and the image data in order to infer the complete pixel-wise segmentation. In practice, we solve this optimally as a tracking problem using a K-shortest paths approach. Both the object model and the segmentation are then refined iteratively to further improve the final segmentation. We show that, by collecting 2D locations using a gaze tracker, our approach can provide state-of-the-art segmentations on a range of objects and image modalities (video and 3D volumes), and that these segmentations can then be used to train supervised machine learning classifiers.
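The tracking step can be pictured as finding a low-cost path through a layered graph with one layer of candidate regions per frame. The Python sketch below finds the single best such path by dynamic programming, which covers only the K = 1 case; the method described above solves a K-shortest paths problem, and the costs used here (a hypothetical per-candidate cost plus the distance between candidate positions) are assumptions.

```python
from typing import Callable

def best_path(node_costs: list[list[float]],
              transition_cost: Callable[[int, int, int], float]) -> tuple[float, list[int]]:
    """Minimum-cost path choosing one candidate per frame (Viterbi-style DP).

    node_costs[t][j]: cost of candidate j in frame t.
    transition_cost(t, i, j): cost of linking candidate i in frame t to j in frame t + 1.
    """
    cost = list(node_costs[0])
    back: list[list[int]] = []
    for t in range(1, len(node_costs)):
        new_cost, ptrs = [], []
        for j, c in enumerate(node_costs[t]):
            # Best predecessor for candidate j in frame t.
            i_best = min(range(len(cost)),
                         key=lambda i: cost[i] + transition_cost(t - 1, i, j))
            new_cost.append(cost[i_best] + transition_cost(t - 1, i_best, j) + c)
            ptrs.append(i_best)
        cost = new_cost
        back.append(ptrs)
    j = min(range(len(cost)), key=cost.__getitem__)
    total, path = cost[j], [j]
    for ptrs in reversed(back):  # follow back-pointers to frame 0
        j = ptrs[j]
        path.append(j)
    return total, path[::-1]

positions = [[0.0, 5.0], [1.0, 6.0], [2.0, 9.0]]  # candidate x-positions per frame
node_costs = [[0.1, 0.8], [0.2, 0.3], [0.7, 0.6]]  # e.g. 1 - object probability
total, path = best_path(node_costs,
                        lambda t, i, j: abs(positions[t][i] - positions[t + 1][j]))
print(path)  # [0, 0, 0]
```

Extending this to K paths typically means finding several node-disjoint paths jointly, for example with a min-cost flow formulation, rather than repeating the single-path search.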

    Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy

    Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies with the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates, which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from traditional feature-based methods towards a data-driven 3D reconstruction pipeline for colonoscopy. More specifically, it explores methods that improve subtasks of learning-based 3D reconstruction, chiefly depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches, which are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy.
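Predicted depth and camera pose are commonly combined into a 3D reconstruction by back-projecting each pixel through a pinhole camera model. The Python sketch below shows that step, assuming known intrinsics (fx, fy, cx, cy) and a camera-to-world pose given as a rotation matrix R and translation t; these conventions are assumptions for illustration, not details from the thesis.

```python
Vec3 = tuple[float, float, float]

def backproject(u: float, v: float, z: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pinhole back-projection of pixel (u, v) with depth z into camera coordinates."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

def to_world(p: Vec3, R: list[list[float]], t: Vec3) -> Vec3:
    """Camera-to-world transform: p_world = R @ p + t."""
    x, y, z = (sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
    return (x, y, z)

def depth_map_to_points(depth: list[list[float]], fx: float, fy: float,
                        cx: float, cy: float,
                        R: list[list[float]], t: Vec3) -> list[Vec3]:
    """Back-project every pixel with positive depth into world coordinates."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:  # treat non-positive depth as invalid
                points.append(to_world(backproject(u, v, z, fx, fy, cx, cy), R, t))
    return points

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
depth = [[2.0, 0.0], [1.0, 1.0]]
print(depth_map_to_points(depth, 1.0, 1.0, 0.0, 0.0, identity, (0.0, 0.0, 1.0)))
# [(0.0, 0.0, 3.0), (0.0, 1.0, 2.0), (1.0, 1.0, 2.0)]
```

Accumulating the points from successive frames, each with its own estimated pose, gives a simple point-cloud reconstruction; surface fusion and meshing are beyond this sketch.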
