Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination
We present a method for assessing skill from video, applicable to a variety
of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate
the problem as pairwise (who's better?) and overall (who's best?) ranking of
video collections, using supervised deep ranking. We propose a novel loss
function that learns discriminative features when a pair of videos exhibit
variance in skill, and learns shared features when a pair of videos exhibit
comparable skill levels. Results demonstrate our method is applicable across
tasks, with the percentage of correctly ordered pairs of videos ranging from
70% to 83% for four datasets. We demonstrate the robustness of our approach via
sensitivity analysis of its parameters. We see this work as an effort toward the
automated organization of how-to video collections and overall, generic skill
determination in video.
Comment: CVPR 201
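The abstract above does not give the form of its loss, so the following is only a minimal sketch of the general idea, assuming a standard hinge-style ranking term for pairs that differ in skill and a similarity term for pairs of comparable skill. The function names, the `label` encoding, and the `margin` value are illustrative assumptions, not the authors' formulation. The second function computes the reported metric, the fraction of correctly ordered pairs.

```python
from typing import Dict, List, Tuple


def pairwise_ranking_loss(score_a: float, score_b: float,
                          label: int, margin: float = 1.0) -> float:
    """Hypothetical pairwise loss on two predicted skill scores.

    label =  1: video A shows more skill than video B
    label = -1: video B shows more skill than video A
    label =  0: the two videos show comparable skill
    """
    if label == 0:
        # Comparable pair: penalise only score gaps wider than the margin.
        return max(0.0, abs(score_a - score_b) - margin)
    # Ordered pair: hinge loss that wants the better video to win by `margin`.
    return max(0.0, margin - label * (score_a - score_b))


def pairwise_accuracy(scores: Dict[str, float],
                      pairs: List[Tuple[str, str]]) -> float:
    """Fraction of (better, worse) pairs whose scores are correctly ordered."""
    correct = sum(1 for better, worse in pairs if scores[better] > scores[worse])
    return correct / len(pairs)
```

For example, with scores `{"a": 3, "b": 2, "c": 1}` and annotated pairs `("a", "b")`, `("b", "c")`, `("c", "a")`, two of the three pairs are ordered correctly.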
Gesture Recognition in Robotic Surgery: a Review
OBJECTIVE: Surgical activity recognition is a fundamental step in computer-assisted interventions. This paper reviews the state-of-the-art in methods for automatic recognition of fine-grained gestures in robotic surgery focusing on recent data-driven approaches and outlines the open questions and future research directions.
METHODS: An article search was performed on 5 bibliographic databases with combinations of the following search terms: robotic, robot-assisted, JIGSAWS, surgery, surgical, gesture, fine-grained, surgeme, action, trajectory, segmentation, recognition, parsing. Selected articles were classified based on the level of supervision required for training and divided into different groups representing major frameworks for time series analysis and data modelling.
RESULTS: A total of 52 articles were reviewed. The research field is showing rapid expansion, with the majority of articles published in the last 4 years. Deep-learning-based temporal models with discriminative feature extraction and multi-modal data integration have demonstrated promising results on small surgical datasets. Currently, unsupervised methods perform significantly less well than the supervised approaches.
CONCLUSION: The development of large and diverse open-source datasets of annotated demonstrations is essential for development and validation of robust solutions for surgical gesture recognition. While new strategies for discriminative feature extraction and knowledge transfer, or unsupervised and semi-supervised approaches, can mitigate the need for data and labels, they have not yet been demonstrated to achieve comparable performance. Important future research directions include detection and forecast of gesture-specific errors and anomalies.
SIGNIFICANCE: This paper is a comprehensive and structured analysis of surgical gesture recognition methods aiming to summarize the status of this rapidly evolving field.
Encoding Surgical Videos as Latent Spatiotemporal Graphs for Object and Anatomy-Driven Reasoning
Recently, spatiotemporal graphs have emerged as a concise and elegant manner
of representing video clips in an object-centric fashion, and have been shown to be
useful for downstream tasks such as action recognition. In this work, we
investigate the use of latent spatiotemporal graphs to represent a surgical
video in terms of the constituent anatomical structures and tools and their
evolving properties over time. To build the graphs, we first predict frame-wise
graphs using a pre-trained model, then add temporal edges between nodes based
on spatial coherence and visual and semantic similarity. Unlike previous
approaches, we incorporate long-term temporal edges in our graphs to better
model the evolution of the surgical scene and increase robustness to temporary
occlusions. We also introduce a novel graph-editing module that incorporates
prior knowledge and temporal coherence to correct errors in the graph, enabling
improved downstream task performance. Using our graph representations, we
evaluate two downstream tasks, critical view of safety prediction and surgical
phase recognition, obtaining strong results that demonstrate the quality and
flexibility of the learned representations. Code is available at
github.com/CAMMA-public/SurgLatentGraph.
Comment: 13 pages, 2 figures, MICCAI 202
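The edge-building step described above (linking nodes across frames by spatial coherence and by visual and semantic similarity, including long-term links) can be sketched as follows. This is an illustration under stated assumptions, not the released SurgLatentGraph code: box IoU stands in for spatial coherence, cosine similarity of feature vectors for visual similarity, and matching class labels for semantic similarity. The `Node` type and the `max_gap`, `iou_thr`, and `sim_thr` parameters are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    frame: int                               # index of the frame the node came from
    box: Tuple[float, float, float, float]   # x1, y1, x2, y2
    label: str                               # predicted class (tool or anatomy)
    feat: List[float]                        # visual feature vector


def iou(a, b) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def temporal_edges(nodes: List[Node], max_gap: int = 5,
                   iou_thr: float = 0.5, sim_thr: float = 0.8):
    """Return (i, j) index pairs linking an earlier node to a later one.

    A gap larger than one frame yields a long-term edge, which lets an
    object be re-linked after a temporary occlusion.
    """
    edges = []
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            gap = b.frame - a.frame
            if gap <= 0 or gap > max_gap:
                continue                      # only forward edges within the window
            if a.label != b.label:
                continue                      # semantic similarity: same class
            if iou(a.box, b.box) >= iou_thr or cosine(a.feat, b.feat) >= sim_thr:
                edges.append((i, j))
    return edges
```

Accepting either spatial overlap or feature similarity is a design choice made here for illustration: the overlap test handles slow motion between adjacent frames, while the feature test can reconnect an object that reappears elsewhere in the image.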
Latent Graph Representations for Critical View of Safety Assessment
Assessing the critical view of safety in laparoscopic cholecystectomy
requires accurate identification and localization of key anatomical structures,
reasoning about their geometric relationships to one another, and determining
the quality of their exposure. Prior works have approached this task by
including semantic segmentation as an intermediate step, using predicted
segmentation masks to then predict the CVS. While these methods are effective,
they rely on extremely expensive ground-truth segmentation annotations and tend
to fail when the predicted segmentation is incorrect, limiting generalization.
In this work, we propose a method for CVS prediction wherein we first represent
a surgical image using a disentangled latent scene graph, then process this
representation using a graph neural network. Our graph representations
explicitly encode semantic information - object location, class information,
geometric relations - to improve anatomy-driven reasoning, as well as visual
features to retain differentiability and thereby provide robustness to semantic
errors. Finally, to address annotation cost, we propose to train our method
using only bounding box annotations, incorporating an auxiliary image
reconstruction objective to learn fine-grained object boundaries. We show that
our method not only outperforms several baseline methods when trained with
bounding box annotations, but also scales effectively when trained with
segmentation masks, maintaining state-of-the-art performance.
Comment: 12 pages, 4 figure
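As a rough illustration of what "process this representation using a graph neural network" involves, the sketch below runs one round of neighbour aggregation over per-object feature vectors and pools the result into a single image-level vector. It is a generic example under simplifying assumptions, not the authors' architecture: a real GNN would use learned weight matrices, whereas this sketch uses a fixed 0.5/0.5 mix, and the function names are hypothetical.

```python
from typing import List, Tuple


def message_passing_step(node_feats: List[List[float]],
                         edges: List[Tuple[int, int]]) -> List[List[float]]:
    """One round of mean aggregation over incoming (src, dst) edges."""
    incoming = [[] for _ in node_feats]
    for src, dst in edges:
        incoming[dst].append(src)

    updated = []
    for i, feat in enumerate(node_feats):
        if incoming[i]:
            # Average the features of all nodes that point at node i.
            agg = [sum(node_feats[j][k] for j in incoming[i]) / len(incoming[i])
                   for k in range(len(feat))]
        else:
            agg = [0.0] * len(feat)           # isolated node: no message
        # Fixed mixing weights stand in for learned parameters (assumption).
        updated.append([0.5 * f + 0.5 * a for f, a in zip(feat, agg)])
    return updated


def readout(node_feats: List[List[float]]) -> List[float]:
    """Mean-pool node features into one graph-level vector."""
    n = len(node_feats)
    return [sum(f[k] for f in node_feats) / n for k in range(len(node_feats[0]))]
```

The pooled vector would then feed a small classifier head that predicts the CVS criteria; that head is omitted here.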
Artificial intelligence and automation in endoscopy and surgery
Modern endoscopy relies on digital technology, from high-resolution imaging sensors and displays to electronics connecting configurable illumination and actuation systems for robotic articulation. In addition to enabling more effective diagnostic and therapeutic interventions, the digitization of the procedural toolset enables video data capture of the internal human anatomy at unprecedented levels. Interventional video data encapsulate functional and structural information about a patient’s anatomy as well as events, activity and action logs about the surgical process. This detailed but difficult-to-interpret record from endoscopic procedures can be linked to preoperative and postoperative records or patient imaging information. Rapid advances in artificial intelligence, especially in supervised deep learning, can utilize data from endoscopic procedures to develop systems for assisting procedures, leading to computer-assisted interventions that can enable better navigation during procedures, automation of image interpretation and robotically assisted tool manipulation. In this Perspective, we summarize state-of-the-art artificial intelligence for computer-assisted interventions in gastroenterology and surgery.
GRACE: Online Gesture Recognition for Autonomous Camera-Motion Enhancement in Robot-Assisted Surgery
Camera navigation in minimally invasive surgery has changed significantly since the introduction of robotic assistance. Robotic surgeons are subjected to an increased cognitive workload due to the asynchronous control over tools and camera, which also leads to interruptions in the workflow. Camera motion automation has been addressed as a possible solution, but still lacks situation awareness. We propose an online surgical Gesture Recognition for Autonomous Camera-motion Enhancement (GRACE) system to introduce situation awareness in autonomous camera navigation. A recurrent neural network is used in combination with a tool tracking system to offer gesture-specific camera motion during a robotic-assisted suturing task. GRACE was integrated with a research version of the da Vinci surgical system and a user study (involving 10 participants) was performed to evaluate the benefits introduced by situation awareness in camera motion, both with respect to a state-of-the-art autonomous system (S) and the current clinical approach (P). Results show GRACE improving completion time by a median reduction of 18.9 s (8.1%) with respect to S and 65.1 s (21.1%) with respect to P. Also, workload reduction was confirmed by a statistically significant difference in the NASA Task Load Index with respect to S (p < 0.05). Reduction of motion sickness, a common issue related to continuous camera motion of autonomous systems, was assessed by a post-experiment survey (p < 0.01).