17 research outputs found

    Towards real-time multiple surgical tool tracking

    Surgical tool tracking is an essential building block for computer-assisted interventions (CAI) and applications like video summarisation, workflow analysis and surgical navigation. Vision-based instrument tracking in laparoscopic surgical data faces significant challenges such as fast instrument motion, multiple simultaneous instruments and re-initialisation due to out-of-view conditions or instrument occlusions. In this paper, we propose a real-time multiple object tracking framework for whole laparoscopic tools, which extends an existing single object tracker. We introduce a geometric object descriptor, which helps with overlapping bounding box disambiguation, fast motion and optimal assignment between existing trajectories and new hypotheses. We achieve 99.51% and 75.64% average accuracy on ex-vivo robotic data and in-vivo laparoscopic sequences respectively from the EndoVis’15 Instrument Tracking Dataset. The proposed geometric descriptor increased the performance on laparoscopic data by 32%, significantly reducing identity switches, false negatives and false positives. Overall, the proposed pipeline can successfully recover trajectories over long sequences and runs in real time at approximately 25–29 fps.
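
    The optimal assignment step mentioned above, matching existing trajectories to new detection hypotheses, is typically cast as a bipartite matching over a pairwise cost matrix. The sketch below illustrates that idea with the Hungarian algorithm and a bounding-box IoU cost; the paper's geometric descriptor is not reproduced here, so the cost function and the min_iou gating threshold are illustrative assumptions.

        # Minimal sketch of frame-to-frame track/detection assignment. Tracks and
        # detections are matched by minimising a cost built from bounding-box
        # overlap (IoU); this stands in for the paper's geometric descriptor.
        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def iou(box_a, box_b):
            """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
            x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
            x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
            inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
            area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
            area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
            return inter / (area_a + area_b - inter + 1e-9)

        def assign(track_boxes, detection_boxes, min_iou=0.3):
            """Optimally match existing trajectories to new hypotheses."""
            cost = np.array([[1.0 - iou(t, d) for d in detection_boxes]
                             for t in track_boxes])
            rows, cols = linear_sum_assignment(cost)
            # Reject matches whose overlap is too small (fast motion, occlusion,
            # out-of-view): these spawn new tracks or end existing ones instead.
            return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1.0 - min_iou]

        matches = assign([[10, 10, 60, 60], [200, 50, 260, 120]],
                         [[12, 14, 63, 66], [400, 80, 460, 150]])
        print(matches)  # [(0, 0)]: the second track finds no plausible detection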

    EasyLabels: weak labels for scene segmentation in laparoscopic videos

    PURPOSE: We present a different approach for annotating laparoscopic images for segmentation in a weak fashion and experimentally prove that its accuracy when trained with partial cross-entropy is close to that obtained with fully supervised approaches. METHODS: We propose an approach that relies on weak annotations provided as stripes over the different objects in the image and partial cross-entropy as the loss function of a fully convolutional neural network to obtain a dense pixel-level prediction map. RESULTS: We validate our method on three different datasets, providing qualitative results for all of them and quantitative results for two of them. The experiments show that our approach is able to obtain at least [Formula: see text] of the accuracy obtained with fully supervised methods for all the tested datasets, while requiring [Formula: see text][Formula: see text] less time to create the annotations compared to full supervision. CONCLUSIONS: With this work, we demonstrate that laparoscopic data can be segmented using very little annotated data while maintaining levels of accuracy comparable to those obtained with full supervision.
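
    Partial cross-entropy of the kind described above can be realised by letting unlabelled pixels carry a sentinel label that is excluded from the loss. The sketch below is a minimal PyTorch illustration under that assumption; the sentinel value, tensor shapes and stripe placement are illustrative, not the authors' exact setup.

        # Minimal sketch of partial cross-entropy: pixels outside the weak
        # annotation stripes carry a sentinel label and contribute no gradient.
        import torch
        import torch.nn.functional as F

        UNLABELLED = 255  # assumed sentinel for pixels not covered by any stripe

        def partial_cross_entropy(logits, weak_labels):
            """logits: (B, C, H, W) network output; weak_labels: (B, H, W) int64
            with class ids on annotated stripes and UNLABELLED elsewhere."""
            return F.cross_entropy(logits, weak_labels, ignore_index=UNLABELLED)

        logits = torch.randn(2, 5, 64, 64, requires_grad=True)    # 5 classes
        labels = torch.full((2, 64, 64), UNLABELLED, dtype=torch.long)
        labels[:, 30:34, :] = 1        # a horizontal stripe labelled as class 1
        loss = partial_cross_entropy(logits, labels)
        loss.backward()                # gradients flow only from labelled pixels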

    Feature Aggregation Decoder for Segmenting Laparoscopic Scenes

    Laparoscopic scene segmentation is one of the key building blocks required for developing advanced computer-assisted interventions and robotic automation. Scene segmentation approaches often rely on encoder-decoder architectures that encode a representation of the input to be decoded to semantic pixel labels. In this paper, we propose to use the deep Xception model for the encoder and a simple yet effective decoder that relies on a feature aggregation module. Our feature aggregation module constructs a mapping function that reuses and transfers encoder features and combines information across all feature scales to build a richer representation that keeps both high-level context and low-level boundary information. We argue that this aggregation module enables us to simplify the decoder and reduce its number of parameters. We have evaluated our approach on two datasets and our experimental results show that our model outperforms state-of-the-art models on the same experimental setup and significantly improves the previous results, 98.44% vs 89.00%, on the EndoVis’15 dataset.
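
    A feature aggregation module of this kind can be pictured as projecting the encoder features from every scale to a common width, upsampling them to the finest resolution and fusing them before a lightweight classifier. The PyTorch sketch below is an assumption-laden illustration: the channel sizes, the 1x1 projections and the concatenation-based fusion are placeholders, not the published module.

        # Hedged sketch of a feature-aggregation decoder head: multi-scale encoder
        # features are projected to a common width, upsampled and fused so that
        # high-level context and low-level boundary detail are combined.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class FeatureAggregation(nn.Module):
            def __init__(self, in_channels=(64, 128, 256, 728), mid=64, num_classes=8):
                super().__init__()
                self.project = nn.ModuleList(
                    nn.Conv2d(c, mid, kernel_size=1) for c in in_channels)
                self.classify = nn.Conv2d(mid * len(in_channels), num_classes, 1)

            def forward(self, features):          # list of encoder maps, finest first
                target = features[0].shape[-2:]   # finest spatial resolution
                fused = [F.interpolate(p(f), size=target, mode='bilinear',
                                       align_corners=False)
                         for p, f in zip(self.project, features)]
                return self.classify(torch.cat(fused, dim=1))

        # Toy multi-scale feature maps (e.g. taps from an Xception-style encoder).
        feats = [torch.randn(1, 64, 128, 128), torch.randn(1, 128, 64, 64),
                 torch.randn(1, 256, 32, 32), torch.randn(1, 728, 16, 16)]
        print(FeatureAggregation()(feats).shape)  # torch.Size([1, 8, 128, 128])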

    CaDIS: Cataract dataset for surgical RGB-image segmentation

    Video feedback provides a wealth of information about surgical procedures and is the main sensory cue for surgeons. Scene understanding is crucial to computer-assisted interventions (CAI) and to post-operative analysis of the surgical procedure. A fundamental building block of such capabilities is the identification and localization of surgical instruments and anatomical structures through semantic segmentation. Deep learning has advanced semantic segmentation techniques in recent years but is inherently reliant on the availability of labelled datasets for model training. This paper introduces a dataset for semantic segmentation of cataract surgery videos complementing the publicly available CATARACTS challenge dataset. In addition, we benchmark the performance of several state-of-the-art deep learning models for semantic segmentation on the presented dataset. The dataset is publicly available at https://cataracts-semantic-segmentation2020.grand-challenge.org/
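
    Benchmarks of this kind are usually reported with per-class intersection-over-union averaged into a mean IoU. The snippet below is a generic illustration of that metric, not necessarily the exact evaluation protocol used for CaDIS.

        # Illustrative mean-IoU computation over integer label maps; classes
        # absent from both prediction and ground truth are skipped.
        import numpy as np

        def mean_iou(pred, gt, num_classes):
            ious = []
            for c in range(num_classes):
                inter = np.logical_and(pred == c, gt == c).sum()
                union = np.logical_or(pred == c, gt == c).sum()
                if union > 0:
                    ious.append(inter / union)
            return float(np.mean(ious))

        pred = np.random.randint(0, 4, size=(270, 480))   # toy prediction map
        gt = np.random.randint(0, 4, size=(270, 480))     # toy ground truth
        print(f"mIoU: {mean_iou(pred, gt, num_classes=4):.3f}")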

    Towards video-based surgical workflow understanding in open orthopaedic surgery

    Safe and efficient surgical training and workflow management play a critical role in clinical competency and, ultimately, patient outcomes. Video data in minimally invasive surgery (MIS) have enabled opportunities for vision-based artificial intelligence (AI) systems to improve surgical skills training and assurance through post-operative video analysis and the development of real-time computer-assisted interventions (CAI). Despite the availability of mounted cameras for the operating room (OR), similar capabilities are much more complex to develop for recording open surgery procedures, which has resulted in a shortage of exemplar video-based training materials. In this paper, we present a potential solution to record open surgical procedures using head-mounted cameras. Recorded videos were anonymised to remove patient and staff identifiable information using a machine learning algorithm that achieves state-of-the-art results on the OR Face dataset. We then propose a CNN-LSTM-based model to automatically segment videos into different surgical phases, which has never previously been demonstrated in open procedures. The redacted videos, along with the automatically predicted phases, are then available for surgeons and their teams for post-operative review and analysis. To our knowledge, this is the first demonstration of the feasibility of deploying camera recording systems and developing machine learning-based workflow analysis solutions for open surgery, particularly in orthopaedics.
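
    A CNN-LSTM phase model of the kind described can be sketched as a per-frame CNN feature extractor followed by an LSTM that accumulates temporal context and a linear head that predicts a phase per frame. In the PyTorch sketch below, the ResNet-18 backbone, feature size, hidden size and number of phases are assumptions for illustration, not the published architecture.

        # Hedged sketch of a CNN-LSTM surgical phase classifier.
        import torch
        import torch.nn as nn
        from torchvision.models import resnet18

        class PhaseClassifier(nn.Module):
            def __init__(self, num_phases=7, hidden=256):
                super().__init__()
                backbone = resnet18(weights=None)
                backbone.fc = nn.Identity()           # 512-d feature per frame
                self.cnn = backbone
                self.lstm = nn.LSTM(512, hidden, batch_first=True)
                self.head = nn.Linear(hidden, num_phases)

            def forward(self, clips):                 # clips: (B, T, 3, H, W)
                b, t = clips.shape[:2]
                feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)
                temporal, _ = self.lstm(feats)        # temporal context per frame
                return self.head(temporal)            # per-frame phase logits

        logits = PhaseClassifier()(torch.randn(2, 8, 3, 224, 224))
        print(logits.shape)                           # torch.Size([2, 8, 7])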

    Human Pose Estimation on Privacy-Preserving Low-Resolution Depth Images

    Human pose estimation (HPE) is a key building block for developing AI-based context-aware systems inside the operating room (OR). The 24/7 use of images coming from cameras mounted on the OR ceiling can, however, raise concerns for privacy, even in the case of depth images captured by RGB-D sensors. Being able to solely use low-resolution privacy-preserving images would address these concerns and help scale up the computer-assisted approaches that rely on such data to a larger number of ORs. In this paper, we introduce the problem of HPE on low-resolution depth images and propose an end-to-end solution that integrates a multi-scale super-resolution network with a 2D human pose estimation network. By exploiting intermediate feature maps generated at different super-resolution scales, our approach achieves body pose results on low-resolution images (of size 64x48) that are on par with those of an approach trained and tested on full-resolution images (of size 640x480).
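
    The integration described above can be pictured as a super-resolution stage that progressively upsamples the low-resolution depth image while its intermediate feature maps feed a pose head that predicts per-joint heatmaps. The sketch below is a schematic of that coupling only; the layer widths, number of stages and number of joints are assumptions, not the published network.

        # Schematic sketch: staged super-resolution features feeding a heatmap head.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class SuperResStage(nn.Module):
            def __init__(self, ch=32):
                super().__init__()
                self.up = nn.Sequential(
                    nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False))

            def forward(self, x):
                return self.up(x)

        class LowResPoseNet(nn.Module):
            def __init__(self, num_joints=13, ch=32, stages=3):
                super().__init__()
                self.stem = nn.Conv2d(1, ch, 3, padding=1)     # single-channel depth
                self.stages = nn.ModuleList(SuperResStage(ch) for _ in range(stages))
                self.heatmaps = nn.Conv2d(ch * stages, num_joints, 1)

            def forward(self, depth):                          # (B, 1, 48, 64)
                x, feats = self.stem(depth), []
                for stage in self.stages:
                    x = stage(x)
                    # collect intermediate features from each super-resolution scale
                    feats.append(F.interpolate(x, size=depth.shape[-2:],
                                               mode='bilinear', align_corners=False))
                return self.heatmaps(torch.cat(feats, dim=1))  # per-joint heatmaps

        out = LowResPoseNet()(torch.randn(2, 1, 48, 64))
        print(out.shape)                                       # torch.Size([2, 13, 48, 64])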

    2018 Robotic Scene Segmentation Challenge

    In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative in learning about which techniques would be suitable for segmentation in real surgery. In 2017, at the same workshop in Quebec, we introduced the robotic instrument segmentation dataset with 10 teams participating in the challenge to perform binary, articulating parts and type segmentation of da Vinci instruments. This challenge included realistic instrument motion and more complex porcine tissue as background and was widely addressed with modifications on U-Nets and other popular CNN architectures. In 2018 we added to the complexity by introducing a set of anatomical objects and medical devices to the segmented classes. To avoid over-complicating the challenge, we continued with porcine data, which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs.

    Why is the Winner the Best?

    International benchmarking competitions have become fundamental for the comparative performance assessment of image analysis methods. However, little attention has been given to investigating what can be learnt from these competitions. Do they really generate scientific progress? What are common and successful participation strategies? What makes a solution superior to a competing method? To address this gap in the literature, we performed a multicenter study with all 80 competitions that were conducted in the scope of IEEE ISBI 2021 and MICCAI 2021. Statistical analyses performed based on comprehensive descriptions of the submitted algorithms linked to their rank as well as the underlying participation strategies revealed common characteristics of winning solutions. These typically include the use of multi-task learning (63%) and/or multi-stage pipelines (61%), and a focus on augmentation (100%), image preprocessing (97%), data curation (79%), and post-processing (66%). The “typical” lead of a winning team is a computer scientist with a doctoral degree, five years of experience in biomedical image analysis, and four years of experience in deep learning. Two core general development strategies stood out for highly-ranked teams: the reflection of the metrics in the method design and the focus on analyzing and handling failure cases. According to the organizers, 43% of the winning algorithms exceeded the state of the art but only 11% completely solved the respective domain problem. The insights of our study could help researchers (1) improve algorithm development strategies when approaching new problems, and (2) focus on open research questions revealed by this work.
