Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability

Author: Karim Rezaul
Wildes Richard P.
Publication date: 18/10/2023

Video segmentation encompasses a wide range of categories of problem formulation, e.g., object, scene, actor-action and multimodal video segmentation, for delineating task-specific scene components with pixel-level masks. Recently, approaches in this research area shifted from concentrating on ConvNet-based to transformer-based models. In addition, various interpretability approaches have appeared for transformer models and video temporal dynamics, motivated by the growing interest in basic scientific understanding, model diagnostics and societal implications of real-world deployment. Previous surveys mainly focused on ConvNet models on a subset of video segmentation tasks or transformers for classification tasks. Moreover, component-wise discussion of transformer-based video segmentation models has not yet received due focus. In addition, previous reviews of interpretability methods focused on transformers for classification, while analysis of video temporal dynamics modelling capabilities of video models received less attention. In this survey, we address the above with a thorough discussion of various categories of video segmentation, a component-wise discussion of the state-of-the-art transformer-based models, and a review of related interpretability methods. We first present an introduction to the different video segmentation task categories, their objectives, specific challenges and benchmark datasets. Next, we provide a component-wise review of recent transformer-based models and document the state of the art on different video segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc interpretability methods for transformer models and interpretability methods for understanding the role of the temporal dimension in video models. Finally, we conclude our discussion with future research directions.

arXiv.org e-Print Archive

Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery

Author: Agustinos Anthony
Allan Max
Bodenstedt Sebastian
Du Xiaofei
Garcia-Peraza-Herrera Luis
Kenngott Hannes
Kurmann Thomas
Maier-Hein Lena
Müller-Stich Beat
Ourselin Sebastien
Pakhomov Daniil
Speidel Stefanie
Stoyanov Danail
Sznitman Raphael
Teichmann Marvin
Thoma Martin
Vercauteren Tom
Voros Sandrine
Wagner Martin
Wochner Pamela
Publication date: 07/05/2018

Intraoperative segmentation and tracking of minimally invasive instruments is a prerequisite for computer- and robotic-assisted surgery. Since additional hardware such as tracking systems or robot encoders is cumbersome and lacks accuracy, surgical vision is emerging as a promising technique to segment and track the instruments using only the endoscopic images. However, what is missing so far are common image data sets for consistent evaluation and benchmarking of algorithms against each other. The paper presents a comparative validation study of different vision-based methods for instrument segmentation and tracking in the context of robotic as well as conventional laparoscopic surgery. The contribution of the paper is twofold: we introduce a comprehensive validation data set that was provided to the study participants and present the results of the comparative validation study. Based on the results of the validation study, we arrive at the conclusion that modern deep learning approaches outperform other methods in instrument segmentation tasks, but the results are still not perfect. Furthermore, we show that merging results from different methods significantly increases accuracy in comparison to the best stand-alone method. On the other hand, the results of the instrument tracking task show that this is still an open challenge, especially during challenging scenarios in conventional laparoscopic surgery.

arXiv.org e-Print Archive

Advanced methods and deep learning for video and satellite data compression

Author: PRETTE NICOLA
Publication venue: country:Italy
Publication date: 19/09/2022

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)