Perception and Orientation in Minimally Invasive Surgery
During the last two decades, we have seen a revolution in the way that we perform abdominal surgery with increased reliance on minimally invasive techniques. This paradigm shift has come at a rapid pace, with laparoscopic surgery now representing the gold standard for many surgical procedures and further minimisation of invasiveness being seen with the recent clinical introduction of novel techniques such as single-incision laparoscopic surgery and natural orifice translumenal endoscopic surgery. Despite the obvious benefits conferred on the patient in terms of morbidity, length of hospital stay and post-operative pain, this paradigm shift comes at a significantly higher demand on the surgeon, in terms of both perception and manual dexterity. The issues involved include degradation of sensory input to the operator compared to conventional open surgery owing to a loss of three-dimensional vision through the use of the two-dimensional operative interface, and decreased haptic feedback from the instruments. These changes have led to a much higher cognitive load on the surgeon and a greater risk of operator disorientation leading to potential surgical errors.
This thesis represents a detailed investigation of disorientation in minimally invasive surgery. Eye tracking is identified as the method of choice for evaluating behavioural patterns during orientation. An analysis framework is proposed to profile orientation behaviour using eye tracking data and is validated in a laboratory model. This framework is used to characterise and quantify successful orientation strategies at critical stages of laparoscopic cholecystectomy, and these strategies are then used to show that focused teaching of this behaviour in novices can significantly increase performance in this task. Orientation strategies are then characterised for common clinical scenarios in natural orifice translumenal endoscopic surgery, and the concept of image saliency is introduced to further investigate the importance of specific visual cues associated with effective orientation. Profiling of behavioural patterns is related to performance in orientation, and implications for education and the construction of smart surgical robots are drawn. Finally, a method for potentially decreasing operator disorientation is investigated in the form of endoscopic horizon stabilization in a simulated operative model for transgastric surgery.
The major original contributions of this thesis include:
Validation of a profiling methodology/framework to characterise orientation behaviour
Identification of high performance orientation strategies in specific clinical scenarios including laparoscopic cholecystectomy and natural orifice translumenal endoscopic surgery
Evaluation of the efficacy of teaching orientation strategies
Evaluation of automatic endoscopic horizon stabilization in natural orifice translumenal endoscopic surgery
The impact of the results presented in this thesis, as well as the potential for further high-impact research, is discussed in the context of both eye tracking as an evaluation tool in minimally invasive surgery and the implementation of means to combat operator disorientation in a surgical platform. The work also provides further insight into the practical implementation of computer assistance and technological innovation in future flexible access surgical platforms.
COMPASS: A Formal Framework and Aggregate Dataset for Generalized Surgical Procedure Modeling
Purpose: We propose a formal framework for the modeling and segmentation of
minimally-invasive surgical tasks using a unified set of motion primitives
(MPs) to enable more objective labeling and the aggregation of different
datasets.
Methods: We model dry-lab surgical tasks as finite state machines,
representing how the execution of MPs as the basic surgical actions results in
the change of surgical context, which characterizes the physical interactions
among tools and objects in the surgical environment. We develop methods for
labeling surgical context based on video data and for automatic translation of
context to MP labels. We then use our framework to create the COntext and
Motion Primitive Aggregate Surgical Set (COMPASS), including six dry-lab
surgical tasks from three publicly-available datasets (JIGSAWS, DESK, and
ROSMA), with kinematic and video data and context and MP labels.
Results: Our context labeling method achieves near-perfect agreement between
consensus labels from crowd-sourcing and expert surgeons. Segmentation of tasks
to MPs results in the creation of the COMPASS dataset that nearly triples the
amount of data for modeling and analysis and enables the generation of separate
transcripts for the left and right tools.
Conclusion: The proposed framework results in high quality labeling of
surgical data based on context and fine-grained MPs. Modeling surgical tasks
with MPs enables the aggregation of different datasets and the separate
analysis of left and right hands for bimanual coordination assessment. Our
formal framework and aggregate dataset can support the development of
explainable and multi-granularity models for improved surgical process
analysis, skill assessment, error detection, and autonomy.
Comment: 22 pages, 6 figures, 12 tables
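The finite-state-machine view described in the Methods above can be illustrated with a minimal sketch. It assumes a deliberately simplified context (only what each tool is holding) and two hypothetical motion primitive verbs, "grasp" and "release"; the class names, fields, and verbs are illustrative assumptions and are not taken from the COMPASS label set or code.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical, simplified surgical context: the object held by each tool."""
    left_holding: Optional[str] = None
    right_holding: Optional[str] = None


@dataclass(frozen=True)
class MotionPrimitive:
    """Hypothetical motion primitive: a verb applied by a tool to an object."""
    verb: str  # assumed verbs: "grasp" or "release"
    tool: str  # "left" or "right"
    obj: str


def apply_mp(ctx: Context, mp: MotionPrimitive) -> Context:
    """State transition: executing an MP changes the context."""
    field = f"{mp.tool}_holding"
    held = getattr(ctx, field)
    if mp.verb == "grasp" and held is None:
        return replace(ctx, **{field: mp.obj})
    if mp.verb == "release" and held == mp.obj:
        return replace(ctx, **{field: None})
    raise ValueError(f"{mp} is not a valid transition from {ctx}")


def context_to_mps(prev: Context, curr: Context) -> List[MotionPrimitive]:
    """Translate a change of context into MP labels, one tool at a time."""
    mps: List[MotionPrimitive] = []
    for tool in ("left", "right"):
        before = getattr(prev, f"{tool}_holding")
        after = getattr(curr, f"{tool}_holding")
        if before == after:
            continue  # this tool's state did not change
        if before is not None:
            mps.append(MotionPrimitive("release", tool, before))
        if after is not None:
            mps.append(MotionPrimitive("grasp", tool, after))
    return mps
```

Because the translation loops over each tool independently, filtering the resulting list by `tool` gives separate left and right transcripts, which is the property the abstract highlights for bimanual analysis.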
Kvasir-Capsule, a video capsule endoscopy dataset
Artificial intelligence (AI) is predicted to have profound effects on the future of video capsule endoscopy (VCE) technology. The potential lies in improving anomaly detection while reducing manual labour. Existing work demonstrates the promising benefits of AI-based computer-assisted diagnosis systems for VCE, but also shows considerable room for improvement. In addition, medical data is often sparse and unavailable to the research community, and qualified medical personnel rarely have time for the tedious labelling work. We present Kvasir-Capsule, a large VCE dataset collected from examinations at a Norwegian hospital. Kvasir-Capsule consists of 117 videos which can be used to extract a total of 4,741,504 image frames. We have labelled and medically verified 47,238 frames with a bounding box around findings from 14 different classes. In addition to these labelled images, there are 4,694,266 unlabelled frames included in the dataset. The Kvasir-Capsule dataset can play a valuable role in developing better algorithms in order to reach the true potential of VCE technology.
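As a quick consistency check on the frame counts quoted in the abstract, the short snippet below adds the labelled and unlabelled counts and computes the labelled share. The variable names are illustrative; the only inputs are the three numbers stated above.

```python
# Frame counts as quoted in the abstract.
labelled_frames = 47_238
unlabelled_frames = 4_694_266
reported_total = 4_741_504

# The two subsets should add up to the reported total.
total_frames = labelled_frames + unlabelled_frames
counts_agree = total_frames == reported_total

# Share of frames that carry a verified bounding-box label.
labelled_fraction = labelled_frames / total_frames

print(f"total frames: {total_frames:,} (matches reported: {counts_agree})")
print(f"labelled share: {labelled_fraction:.2%}")
```

The counts agree, and roughly 1% of the frames are labelled, which is consistent with the abstract's point about the cost of manual labelling.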
Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery
Understanding and anticipating intraoperative events and actions is critical
for intraoperative assistance and decision-making during minimally invasive
surgery. Automated prediction of events, actions, and the following
consequences is addressed through various computational approaches with the
objective of augmenting surgeons' perception and decision-making capabilities.
We propose a predictive neural network that is capable of understanding and
predicting critical interactive aspects of surgical workflow from
intra-abdominal video, while flexibly leveraging surgical knowledge graphs. The
approach incorporates a hypergraph-transformer (HGT) structure that encodes
expert knowledge into the network design and predicts the hidden embedding of
the graph. We verify our approach on established surgical datasets and
applications, including the detection and prediction of action triplets, and
the achievement of the Critical View of Safety (CVS). Moreover, we address
specific, safety-related tasks, such as predicting the clipping of cystic duct
or artery without prior achievement of the CVS. Our results demonstrate the
superiority of our approach compared to unstructured alternatives.
Spatiotemporal Event Graphs for Dynamic Scene Understanding
Dynamic scene understanding is the ability of a computer system to interpret
and make sense of the visual information present in a video of a real-world
scene. In this thesis, we present a series of frameworks for dynamic scene
understanding starting from road event detection from an autonomous driving
perspective to complex video activity detection, followed by continual learning
approaches for the life-long learning of the models. Firstly, we introduce the
ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge
the first of its kind. Due to the lack of datasets equipped with formally
specified logical requirements, we also introduce the ROad event Awareness
Dataset with logical Requirements (ROAD-R), the first publicly available
dataset for autonomous driving with requirements expressed as logical
constraints, as a tool for driving neurosymbolic research in the area. Next, we
extend event detection to holistic scene understanding by proposing two complex
activity detection methods. In the first method, we present a deformable,
spatiotemporal scene graph approach, consisting of three main building blocks:
action tube detection, a 3D deformable RoI pooling layer designed for learning
the flexible, deformable geometry of the constituent action tubes, and a scene
graph constructed by considering all parts as nodes and connecting them based
on different semantics. In a second approach evolving from the first, we
propose a hybrid graph neural network that combines attention applied to a
graph encoding of the local (short-term) dynamic scene with a temporal graph
modelling the overall long-duration activity. Finally, the last part of the
thesis is about presenting a new continual semi-supervised learning (CSSL)
paradigm.
Comment: PhD thesis, Oxford Brookes University, Examiners: Prof. Dima Damen
and Dr. Matthias Rolf, 183 pages
Iterative multi-path tracking for video and volume segmentation with sparse point supervision
Recent machine learning strategies for segmentation tasks have shown great
ability when trained on large pixel-wise annotated image datasets. It remains a
major challenge however to aggregate such datasets, as the time and monetary
cost associated with collecting extensive annotations is extremely high. This
is particularly the case for generating precise pixel-wise annotations in video
and volumetric image data. To this end, this work presents a novel framework to
produce pixel-wise segmentations using minimal supervision. Our method relies
on 2D point supervision, whereby a single 2D location within an object of
interest is provided on each image of the data. Our method then estimates the
object appearance in a semi-supervised fashion by learning
object-image-specific features and by using these in a semi-supervised learning
framework. Our object model is then used in a graph-based optimization problem
that takes into account all provided locations and the image data in order to
infer the complete pixel-wise segmentation. In practice, we solve this
optimally as a tracking problem using a K-shortest path approach. Both the
object model and segmentation are then refined iteratively to further improve
the final segmentation. We show that by collecting 2D locations using a gaze
tracker, our approach can provide state-of-the-art segmentations on a range of
objects and image modalities (video and 3D volumes), and that these can then be
used to train supervised machine learning classifiers.
Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy
Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy.
More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches that are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy.