Surgical Phase Recognition of Short Video Shots Based on Temporal Modeling of Deep Features
Recognizing the phases of a laparoscopic surgery (LS) operation from its
video constitutes a fundamental step for efficient content representation,
indexing and retrieval in surgical video databases. In the literature, most
techniques focus on phase segmentation of the entire LS video using
hand-crafted visual features, instrument usage signals, and recently
convolutional neural networks (CNNs). In this paper we address the problem of
phase recognition of short video shots (10s) of the operation, without
utilizing information about the preceding/forthcoming video frames, their phase
labels or the instruments used. We investigate four state-of-the-art CNN
architectures (AlexNet, VGG19, GoogLeNet, and ResNet101), for feature
extraction via transfer learning. Visual saliency was employed for selecting
the most informative region of the image as input to the CNN. Video shot
representation was based on two temporal pooling mechanisms. Most importantly,
we investigate the role of 'elapsed time' (from the beginning of the
operation), and we show that inclusion of this feature can increase performance
markedly (from 69% to 75% mean accuracy). Finally, a long short-term memory
(LSTM) network was trained for video shot classification based on the fusion of
CNN features with 'elapsed time', increasing the accuracy to 86%. Our results
highlight the prominent role of visual saliency, long-range temporal recursion
and 'elapsed time' (a feature so far ignored) for surgical phase recognition.
Comment: 6 pages, 4 figures, 6 tables
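The shot representation the abstract describes — per-frame CNN features pooled over the shot, then fused with a normalized 'elapsed time' scalar — can be sketched as follows. This is a minimal illustration, not the authors' code; the function names and the choice of concatenating mean- and max-pooled features are assumptions.

```python
def mean_pool(frame_feats):
    """Average per-frame CNN features into one shot-level descriptor."""
    n = len(frame_feats)
    dim = len(frame_feats[0])
    return [sum(f[d] for f in frame_feats) / n for d in range(dim)]

def max_pool(frame_feats):
    """Keep the strongest activation per dimension across the shot."""
    dim = len(frame_feats[0])
    return [max(f[d] for f in frame_feats) for d in range(dim)]

def shot_descriptor(frame_feats, shot_start_s, op_duration_s):
    """Fuse pooled CNN features with 'elapsed time', normalized to [0, 1]
    by the total operation duration."""
    elapsed = shot_start_s / op_duration_s
    return mean_pool(frame_feats) + max_pool(frame_feats) + [elapsed]
```

The fused vector could then feed a classifier (an LSTM in the paper) for per-shot phase prediction.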
Surgical video retrieval using deep neural networks
Although the amount of raw surgical videos, namely videos
captured during surgical interventions, is growing fast, automatic retrieval
and search remains a challenge. This is mainly due to the nature
of the content, i.e. visually non-consistent tissue, diversity of internal organs,
abrupt viewpoint changes and illumination variation. We propose
a framework for retrieving surgical videos and a protocol for evaluating
the results. The method is composed of temporal shot segmentation and
representation based on deep features, and the protocol introduces novel
criteria to the field. The experimental results demonstrate the superiority of
the proposed method and highlight the path towards a more effective
protocol for evaluating surgical videos.
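The retrieval step implied above — ranking shots by similarity of their deep-feature descriptors to a query — can be sketched with cosine similarity, assuming each shot is already represented by a fixed-length vector (function names and the similarity choice are illustrative assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query, database, k=3):
    """Return the names of the k database shots most similar to the query."""
    scored = sorted(database.items(),
                    key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]
```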
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Recent advancements in surgical computer vision applications have been driven
by fully-supervised methods, primarily using only visual data. These methods
rely on manually annotated surgical videos to predict a fixed set of object
categories, limiting their generalizability to unseen surgical procedures and
downstream tasks. In this work, we put forward the idea that the surgical video
lectures available through open surgical e-learning platforms can provide
effective supervisory signals for multi-modal representation learning without
relying on manual annotations. We address the surgery-specific linguistic
challenges present in surgical video lectures by employing multiple
complementary automatic speech recognition systems to generate text
transcriptions. We then present a novel method, SurgVLP - Surgical Vision
Language Pre-training, for multi-modal representation learning. SurgVLP
constructs a new contrastive learning objective to align video clip embeddings
with the corresponding multiple text embeddings by bringing them together
within a joint latent space. To effectively show the representation capability
of the learned joint latent space, we introduce several vision-and-language
tasks for surgery, such as text-based video retrieval, temporal activity
grounding, and video captioning, as benchmarks for evaluation. We further
demonstrate that without using any labeled ground truth, our approach can be
employed for traditional vision-only surgical downstream tasks, such as
surgical tool, phase, and triplet recognition. The code will be made available
at https://github.com/CAMMA-public/SurgVL
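The contrastive objective described above can be illustrated with a simplified, single-positive InfoNCE-style loss. SurgVLP's actual objective aligns a clip with multiple text embeddings jointly, so this pure-Python version is only a sketch; the function name and temperature value are assumptions.

```python
import math

def info_nce(video_emb, text_embs, pos_idx, temperature=0.07):
    """Contrastive loss: pull the matching text embedding toward the video
    clip embedding, push the others away (single-positive InfoNCE)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = [dot(video_emb, t) / temperature for t in text_embs]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[pos_idx] - m) + math.log(denom)
```

Minimizing this loss over many (clip, transcription) pairs brings matched pairs together in the joint latent space while separating mismatched ones.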
TELMA: technology enhanced learning environment for Minimally Invasive Surgery
One of the most important revolutions in the past few decades in clinical practice has been motivated by the introduction of Minimally Invasive Surgery (MIS) techniques, which have spread amongst almost all surgical specialities. MIS training is a principal component of the education of new surgical residents, with an increasing demand for knowledge and skills for medical students and surgeons. Technology enhanced learning (TEL) solutions can deal with the growing need for MIS learning. This research work aims to develop a MIS learning environment based on web technologies, named TELMA, which will respond to the growing amount of information and multimedia surgical contents available (mainly intervention’s video recording libraries), in compliance with specific learning needs of surgical students and professionals, enhancing their competence on MIS cognitive skills. Furthermore, TELMA will support knowledge capturing, sharing and reuse, and effective management of didactic contents through personalised and collaborative services
Event Recognition in Laparoscopic Gynecology Videos with Hybrid Transformers
Analyzing laparoscopic surgery videos presents a complex and multifaceted
challenge, with applications including surgical training, intra-operative
surgical complication prediction, and post-operative surgical assessment.
Identifying crucial events within these videos is a significant prerequisite in
a majority of these applications. In this paper, we introduce a comprehensive
dataset tailored for relevant event recognition in laparoscopic gynecology
videos. Our dataset includes annotations for critical events associated with
major intra-operative challenges and post-operative complications. To validate
the precision of our annotations, we assess event recognition performance using
several CNN-RNN architectures. Furthermore, we introduce and evaluate a hybrid
transformer architecture coupled with a customized training-inference framework
to recognize four specific events in laparoscopic surgery videos. Leveraging
the Transformer networks, our proposed architecture harnesses inter-frame
dependencies to counteract the adverse effects of relevant content occlusion,
motion blur, and surgical scene variation, thus significantly enhancing event
recognition accuracy. Moreover, we present a frame sampling strategy designed
to manage variations in surgical scenes and the surgeons' skill level,
resulting in event recognition with high temporal resolution. We empirically
demonstrate the superiority of our proposed methodology in event recognition
compared to conventional CNN-RNN architectures through a series of extensive
experiments
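The frame sampling idea above — mapping variable-length event windows to a fixed number of frames for the transformer input — can be sketched as below; the mid-bin index choice is an assumption, not the paper's exact strategy:

```python
def sample_frames(n_total, n_sample):
    """Pick n_sample frame indices spread uniformly over a clip of
    n_total frames, so clips of any length map to a fixed-size input."""
    if n_total <= n_sample:
        return list(range(n_total))
    step = n_total / n_sample
    # take the middle frame of each of n_sample equal-length bins
    return [int(i * step + step / 2) for i in range(n_sample)]
```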
Can Image Enhancement be Beneficial to Find Smoke Images in Laparoscopic Surgery?
Laparoscopic surgery has a limited field of view. Laser ablation during
laparoscopic surgery causes smoke, which inevitably impairs the surgeon's
visibility, so it is of vital importance to remove the smoke such that
clear visualization is possible. To employ a desmoking technique, one needs
to know beforehand whether an image contains smoke; to date, no method can
classify smoke/non-smoke images with complete accuracy. In this work, we
propose a new enhancement method that enhances the informative details in
RGB images for the discrimination of smoke/non-smoke images. Our proposed
method utilizes a weighted least squares (WLS) optimization framework. For
feature extraction, we use statistical features based on the bivariate
histogram distribution of gradient magnitude (GM) and Laplacian of
Gaussian (LoG) responses. We then train an SVM classifier on the binary
smoke/non-smoke classification task. We demonstrate the effectiveness of our method on the Cholec80
dataset. Experiments using our proposed enhancement method show promising
results with improvements of 4% in accuracy and 4% in F1-Score over the
baseline performance of RGB images. In addition, our approach improves over the
saturation-histogram-based classification methodologies Saturation
Analysis (SAN) and Saturation Peak Analysis (SPA) by 1%/5% and 1%/6% in
accuracy/F1-Score metrics.
Comment: In proceedings of the IS&T Color and Imaging Conference (CIC 26).
Congcong Wang and Vivek Sharma contributed equally to this work and are
listed in alphabetical order.
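A toy version of the GM/LoG bivariate-histogram feature, assuming the gradient-magnitude and LoG responses have already been computed per pixel; the bin count and normalization ranges here are illustrative assumptions, not the paper's settings.

```python
def bivariate_histogram(gm, log_vals, bins=4, gm_max=1.0, log_max=1.0):
    """Joint histogram of gradient-magnitude (GM) and Laplacian-of-Gaussian
    (LoG) responses; the flattened counts serve as the feature vector
    fed to a classifier such as an SVM."""
    hist = [[0] * bins for _ in range(bins)]
    for g, l in zip(gm, log_vals):
        i = min(int(g / gm_max * bins), bins - 1)
        j = min(int(abs(l) / log_max * bins), bins - 1)
        hist[i][j] += 1
    return [c for row in hist for c in row]
```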
AMELIE: Authoring Multimedia-Enhanced Learning Interactive Environment for e-Health Contents
This paper presents the AMELIE Authoring Tool for e-health applications. AMELIE provides the means for creating video-based contents with a focus on e-learning and telerehabilitation processes. The core of AMELIE lies in the efficient exploitation of raw multimedia resources, which may already be available at clinical centers or recorded ad hoc for learning purposes by health professionals. Three real use case scenarios involving different target users are presented: (1) cognitive skills training of surgeons in minimally invasive surgery (medical professionals), (2) training of informal carers for elderly home assistance, and (3) cognitive rehabilitation of patients with acquired brain injury. Preliminary validation in the field of surgery hints at the potential of AMELIE, and its versatility in different medical applications is evident from the use cases described. Nevertheless, new validation studies are planned in the three main application areas identified in this work
Real-time 3D tracking of laparoscopy training instruments for assessment and feedback
Assessment of minimally invasive surgical skills is a non-trivial task, usually requiring the presence and time of expert observers, involving subjectivity, and requiring special and expensive equipment and software. Although there are virtual simulators that provide self-assessment features, they are limited in that the trainee loses the immediate feedback of realistic physical interaction. Physical training boxes, on the other hand, preserve the immediate physical feedback, but lack automated self-assessment facilities. This study develops an algorithm for real-time tracking of laparoscopy instruments in the video feed of a standard physical laparoscopy training box with a single fisheye camera. The developed visual tracking algorithm recovers the 3D positions of the laparoscopic instrument tips, to which simple colored tapes (markers) are attached. With such a system, the extracted instrument trajectories can be digitally processed, and automated self-assessment feedback can be provided. In this way, the physical interaction feedback is preserved while the need for an expert observer is removed. Real-time instrument tracking with a suitable assessment criterion would constitute a significant step towards provision of real-time (immediate) feedback to correct trainee actions and show them how the action should be performed. This study is a step towards achieving this with a low-cost, automated, and widely applicable laparoscopy training and assessment system using a standard physical training box equipped with a fisheye camera
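The 2D observation step of such a marker-based tracker — locating the colored-marker pixels once they have been segmented into a binary mask — might look like the sketch below. Lifting the centroid to a 3D position would additionally require the fisheye camera's calibration model, which is omitted here; the function name is hypothetical.

```python
def marker_centroid(mask):
    """Centroid (row, col) of 'on' pixels in a binary marker mask --
    the per-frame 2D observation a tracker would feed into 3D recovery.
    Returns None if the marker is not visible in this frame."""
    pts = [(r, c)
           for r, row in enumerate(mask)
           for c, v in enumerate(row) if v]
    n = len(pts)
    if n == 0:
        return None
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)
```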
Temporal coherence-based self-supervised learning for laparoscopic workflow analysis
In order to provide the right type of assistance at the right time,
computer-assisted surgery systems need context awareness. To achieve this,
methods for surgical workflow analysis are crucial. Currently, convolutional
neural networks provide the best performance for video-based workflow analysis
tasks. For training such networks, large amounts of annotated data are
necessary. However, collecting a sufficient amount of data is often costly,
time-consuming, and not always feasible. In this paper, we address this problem
by presenting and comparing different approaches for self-supervised
pretraining of neural networks on unlabeled laparoscopic videos using temporal
coherence. We evaluate our pretrained networks on Cholec80, a publicly
available dataset for surgical phase segmentation, on which a maximum F1 score
of 84.6 was reached. Furthermore, we were able to achieve an increase of the F1
score of up to 10 points compared to a non-pretrained neural network.
Comment: Accepted at the Workshop on Context-Aware Operating Theaters (OR
2.0), a MICCAI satellite event
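Temporal coherence pretraining of this kind is often formulated as a contrastive (hinge) loss on frame embeddings: frames close in time are pulled together, frames far apart are pushed at least a margin apart. The sketch below is a generic formulation, not necessarily the exact variant compared in the paper.

```python
def temporal_coherence_loss(emb_a, emb_b, close, margin=1.0):
    """Contrastive hinge loss on a pair of frame embeddings.
    close=True  -> minimize squared distance (temporally adjacent frames)
    close=False -> push the pair at least `margin` apart."""
    d = sum((x - y) ** 2 for x, y in zip(emb_a, emb_b)) ** 0.5
    if close:
        return d * d
    return max(0.0, margin - d) ** 2
```

Pretraining on unlabeled video with this self-supervised signal, then fine-tuning on labeled phases, is the setup that yielded the F1 gains reported above.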
Education in laparoscopic surgery: All eyes towards in vivo training
Nowadays, more and more abdominal operations are performed by means of laparoscopic (keyhole) surgery. Because this form of surgery differs so much from conventional surgery, the way surgical trainees are selected, trained, and assessed is now under scientific debate. This thesis shows that neuropsychological tests of spatial insight and psychomotor skills have predictive value in laparoscopic surgery. Assessment of applications for surgical training programmes would therefore benefit from a neuropsychological test of these skills. The training of surgeons may be improved by applying the Pareto principle, a principle widely used in business economics which holds that 20% of the causes are responsible for 80% of the effects. In the operating room, too, 20% of laparoscopic skills turn out to account for 80% of the verbal corrections given by supervisors. For training efficiency it therefore seems sensible to develop training instruments (VR simulator tasks, books, courses, etc.) that target precisely this 20%. In the current training system a generic assessment form, the OSATS, is used for giving feedback. Although this is a clear improvement over the more subjective assessments of the past, the form cannot be used for procedure-specific feedback. This thesis shows that assessing the amount of physical and verbal support a surgical trainee needs from the supervisor gives a good picture of his or her level during a laparoscopic operation, and can also be used to provide procedure-specific feedback