
    Large-scale surgical workflow segmentation for laparoscopic sacrocolpopexy

    Purpose: Laparoscopic sacrocolpopexy is the gold standard procedure for the management of vaginal vault prolapse. Studying surgical skills and different approaches to this procedure requires an analysis at the level of each of its individual phases, motivating the investigation of automated surgical workflow analysis to expedite this research. Phase durations in this procedure are significantly longer and more variable than in commonly used benchmarks such as Cholec80, and we assess these differences.
    Methodology: We introduce sequence-to-sequence (seq2seq) models for coarse-level phase segmentation in order to deal with the highly variable phase durations in sacrocolpopexy. Multiple architectures (LSTM and Transformer), configurations (time-shifted, time-synchronous), and training strategies are tested with this novel framework to explore its flexibility.
    Results: We perform 7-fold cross-validation on a dataset with 14 complete videos of sacrocolpopexy. We perform both a frame-based (accuracy, F1-score) and an event-based (Ward metric) evaluation of our algorithms and show that different architectures present a trade-off between a higher number of accurate frames (LSTM, Mode average) and a more consistent ordering of phase transitions (Transformer). We compare the implementations on the widely used Cholec80 dataset and verify that the relative performances differ from those in sacrocolpopexy.
    Conclusions: We show that workflow segmentation of sacrocolpopexy videos has specific challenges that differ from the widely used benchmark Cholec80 and require dedicated approaches to deal with the significantly longer phase durations. We demonstrate the feasibility of seq2seq models in sacrocolpopexy, a broad framework that can be further explored with new configurations. We show that an event-based evaluation metric is useful for evaluating workflow segmentation algorithms and provides complementary insight to the more commonly used metrics such as accuracy or F1-score.
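    The abstract's frame-to-phase segmentation setup can be illustrated with a minimal sketch: per-frame visual features are consumed by an LSTM and decoded into coarse phase labels, roughly corresponding to a time-synchronous configuration. The feature dimension, hidden size, and number of phases below are assumed values for illustration, not taken from the paper.

```python
# Illustrative sketch only (not the paper's implementation): a frame-synchronous
# LSTM that maps per-frame visual features to coarse phase labels.
# Feature dimension, hidden size, and number of phases are assumed values.
import torch
import torch.nn as nn

class PhaseSegmenter(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256, num_phases=8):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_phases)

    def forward(self, frame_feats):            # (batch, time, feat_dim)
        out, _ = self.lstm(frame_feats)        # (batch, time, hidden)
        return self.head(out)                  # per-frame phase logits

# Toy usage: one video, 500 frames of 2048-d features.
model = PhaseSegmenter()
logits = model(torch.randn(1, 500, 2048))
phases = logits.argmax(dim=-1)                 # predicted phase per frame
```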

    Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning

    The Visual Dialogue task requires an agent to engage in a conversation about an image with a human. It extends the Visual Question Answering task in that the agent needs to answer a question about an image, but must do so in light of the previous dialogue that has taken place. The key challenge in Visual Dialogue is thus maintaining a consistent and natural dialogue while continuing to answer questions correctly. We present a novel approach that combines Reinforcement Learning and Generative Adversarial Networks (GANs) to generate more human-like responses to questions. The GAN helps overcome the relative paucity of training data and the tendency of the typical MLE-based approach to generate overly terse answers. Critically, the GAN is tightly integrated into the attention mechanism that generates human-interpretable reasons for each answer. This means that the discriminative model of the GAN has the task of assessing whether a candidate answer is generated by a human or not, given the provided reason. This is significant because it drives the generative model to produce high-quality answers that are well supported by the associated reasoning. The method also achieves state-of-the-art results on the primary benchmark.
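    As a rough illustration of the reason-conditioned discriminator described above, the sketch below scores an answer embedding together with a "reason" (attention summary) embedding. The architecture and embedding dimensions are assumptions for illustration only, not the paper's model.

```python
# Illustrative sketch only (assumed architecture, not the paper's model): a
# discriminator that scores whether an answer looks human-generated, conditioned
# on the "reason" (attention summary) embedding that accompanies it.
import torch
import torch.nn as nn

class AnswerDiscriminator(nn.Module):
    def __init__(self, answer_dim=512, reason_dim=512, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(answer_dim + reason_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),                      # probability the answer is human-written
        )

    def forward(self, answer_emb, reason_emb):
        return self.net(torch.cat([answer_emb, reason_emb], dim=-1))

# Toy usage: a batch of 4 (answer, reason) embedding pairs.
disc = AnswerDiscriminator()
p_human = disc(torch.randn(4, 512), torch.randn(4, 512))
```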