3,750 research outputs found
Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability
Video segmentation encompasses a wide range of categories of problem
formulation, e.g., object, scene, actor-action and multimodal video
segmentation, for delineating task-specific scene components with pixel-level
masks. Recently, approaches in this research area shifted from concentrating on
ConvNet-based to transformer-based models. In addition, various
interpretability approaches have appeared for transformer models and video
temporal dynamics, motivated by the growing interest in basic scientific
understanding, model diagnostics and societal implications of real-world
deployment. Previous surveys mainly focused on ConvNet models on a subset of
video segmentation tasks or transformers for classification tasks. Moreover,
component-wise discussion of transformer-based video segmentation models has
not yet received due focus. In addition, previous reviews of interpretability
methods focused on transformers for classification, while analysis of video
temporal dynamics modelling capabilities of video models received less
attention. In this survey, we address the above with a thorough discussion of
various categories of video segmentation, a component-wise discussion of the
state-of-the-art transformer-based models, and a review of related
interpretability methods. We first present an introduction to the different
video segmentation task categories, their objectives, specific challenges and
benchmark datasets. Next, we provide a component-wise review of recent
transformer-based models and document the state of the art on different video
segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc
interpretability methods for transformer models and interpretability methods
for understanding the role of the temporal dimension in video models. Finally,
we conclude our discussion with future research directions
Comparative evaluation of instrument segmentation and tracking methods in minimally invasive surgery
Intraoperative segmentation and tracking of minimally invasive instruments is
a prerequisite for computer- and robotic-assisted surgery. Since additional
hardware like tracking systems or the robot encoders are cumbersome and lack
accuracy, surgical vision is evolving as promising techniques to segment and
track the instruments using only the endoscopic images. However, what is
missing so far are common image data sets for consistent evaluation and
benchmarking of algorithms against each other. The paper presents a comparative
validation study of different vision-based methods for instrument segmentation
and tracking in the context of robotic as well as conventional laparoscopic
surgery. The contribution of the paper is twofold: we introduce a comprehensive
validation data set that was provided to the study participants and present the
results of the comparative validation study. Based on the results of the
validation study, we arrive at the conclusion that modern deep learning
approaches outperform other methods in instrument segmentation tasks, but the
results are still not perfect. Furthermore, we show that merging results from
different methods actually significantly increases accuracy in comparison to
the best stand-alone method. On the other hand, the results of the instrument
tracking task show that this is still an open challenge, especially during
challenging scenarios in conventional laparoscopic surgery
Advanced methods and deep learning for video and satellite data compression
L'abstract è presente nell'allegato / the abstract is in the attachmen
- …