Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features

Author: Loukas Constantinos
Publication date: 07/12/2018

Recognizing the phases of a laparoscopic surgery (LS) operation from its video constitutes a fundamental step for efficient content representation, indexing and retrieval in surgical video databases. In the literature, most techniques focus on phase segmentation of the entire LS video using hand-crafted visual features, instrument usage signals, and, more recently, convolutional neural networks (CNNs). In this paper we address the problem of phase recognition of short video shots (10 s) of the operation, without utilizing information about the preceding or forthcoming video frames, their phase labels, or the instruments used. We investigate four state-of-the-art CNN architectures (AlexNet, VGG19, GoogleNet, and ResNet101) for feature extraction via transfer learning. Visual saliency was employed for selecting the most informative region of the image as input to the CNN. Video shot representation was based on two temporal pooling mechanisms. Most importantly, we investigate the role of 'elapsed time' (from the beginning of the operation), and we show that including this feature increases mean accuracy from 69% to 75%. Finally, a long short-term memory (LSTM) network was trained for video shot classification based on the fusion of CNN features with 'elapsed time', increasing the accuracy to 86%. Our results highlight the prominent role of visual saliency, long-range temporal recursion and 'elapsed time' (a feature so far ignored) for surgical phase recognition.

Comment: 6 pages, 4 figures, 6 tables
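
The shot representation described in this abstract (per-frame CNN features pooled over time, then fused with 'elapsed time') can be illustrated with a minimal sketch. The abstract does not name the two pooling mechanisms or the fusion scheme, so mean and max pooling, simple concatenation, and the names `pool_shot`, `fuse_elapsed_time` and `max_duration_s` are all assumptions made for illustration, not the author's implementation.

```python
from typing import List, Sequence


def pool_shot(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Collapse per-frame feature vectors into one shot descriptor.

    Assumption: mean and max pooling stand in for the two (unnamed)
    temporal pooling mechanisms mentioned in the abstract.
    """
    if not frame_features:
        raise ValueError("a shot needs at least one frame")
    # Transpose so each tuple holds one feature dimension across all frames.
    columns = list(zip(*frame_features))
    mean_part = [sum(col) / len(col) for col in columns]
    max_part = [max(col) for col in columns]
    return mean_part + max_part


def fuse_elapsed_time(descriptor: Sequence[float],
                      elapsed_s: float,
                      max_duration_s: float) -> List[float]:
    """Append elapsed time, scaled to [0, 1], to the shot descriptor.

    Assumption: fusion by concatenation, with a hypothetical constant
    max_duration_s used only to normalize the time value.
    """
    if max_duration_s <= 0:
        raise ValueError("max_duration_s must be positive")
    scaled = min(max(elapsed_s / max_duration_s, 0.0), 1.0)
    return list(descriptor) + [scaled]


# Three frames with two feature dimensions each (toy values).
frames = [[0.0, 2.0], [1.0, 4.0], [2.0, 0.0]]
shot = pool_shot(frames)
fused = fuse_elapsed_time(shot, elapsed_s=900.0, max_duration_s=3600.0)
print(fused)
```

In a real pipeline the frame vectors would come from a pretrained CNN and the fused vectors would feed a classifier such as the LSTM mentioned in the abstract; neither is modelled here.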

arXiv.org e-Print Archive

Crossref

Pergamos : Unified Institutional Repository / Digital Library Platform of the National and Kapodistrian University of Athens

Compound Effects of Top-down and Bottom-up Influences on Visual Attention During Action Recognition

Author: Demiris Y
Kaelbling LP
Khadhouri B
Saffiotti A
Publication venue: IJCAI-INT JOINT CONF ARTIF INTELL
Publication date: 01/01/2005

Spiral - Imperial College Digital Repository

DR(eye)VE: a Dataset for Attention-Based Tasks with Applications to Autonomous and Assisted Driving

Author: ALLETTO STEFANO
CALDERARA Simone
CUCCHIARA Rita
PALAZZI ANDREA
SOLERA FRANCESCO
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Autonomous and assisted driving are undoubtedly hot topics in computer vision. However, the driving task is extremely complex and a deep understanding of drivers' behavior is still lacking. Several researchers are now investigating the attention mechanism in order to define computational models for detecting salient and interesting objects in the scene. Nevertheless, most of these models refer only to bottom-up visual saliency and focus on still images. During driving, by contrast, the temporal nature and peculiarity of the task influence the attention mechanisms, leading to the conclusion that real-life driving data is mandatory. In this paper we propose a novel and publicly available dataset acquired during actual driving. Our dataset, composed of more than 500,000 frames, contains drivers' gaze fixations and their temporal integration, providing task-specific saliency maps. Geo-referenced locations, driving speed and course complete the set of released data. To the best of our knowledge, this is the first publicly available dataset of this kind and can foster new discussions on better understanding, exploiting and reproducing the driver's attention process in the autonomous and assisted cars of future generations.
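
The idea of integrating gaze fixations over time into a saliency map can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the integration window or the spreading kernel, so the square neighbourhood, the peak normalization, and the names `fixation_map` and `radius` are hypothetical and do not describe the dataset's actual procedure.

```python
from typing import Iterable, List, Tuple


def fixation_map(fixations: Iterable[Tuple[int, int]],
                 width: int,
                 height: int,
                 radius: int = 1) -> List[List[float]]:
    """Accumulate (x, y) gaze fixations from a window of frames into one map.

    Assumption: each fixation adds 1.0 to a square neighbourhood of the
    given radius; the result is scaled so the peak value equals 1.0.
    """
    grid = [[0.0] * width for _ in range(height)]
    for x, y in fixations:
        # Clamp the neighbourhood to the image bounds.
        for yy in range(max(0, y - radius), min(height, y + radius + 1)):
            for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                grid[yy][xx] += 1.0
    peak = max(max(row) for row in grid)
    if peak > 0:
        grid = [[value / peak for value in row] for row in grid]
    return grid


# Fixations pooled from several consecutive frames (toy coordinates).
window = [(1, 1), (1, 1), (2, 1)]
saliency = fixation_map(window, width=4, height=3, radius=0)
for row in saliency:
    print(row)
```

Pixels fixated more often within the window receive higher values, which is the sense in which temporal integration turns sparse gaze points into a dense, task-specific map.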

Crossref

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia