Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources
Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or events that have occurred as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources, including Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci® Xi surgical system. Our model achieves superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
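The FSM representation described above can be sketched in a few lines. The state and event names below are illustrative assumptions for a suturing-like sub-task, not the actual JIGSAWS or RIOUS annotation scheme; a state estimator such as Fusion-KVE would output one such state label per video frame.

```python
class SubtaskFSM:
    """Minimal sketch of a surgical sub-task as a finite-state machine."""

    def __init__(self, initial, transitions):
        # transitions maps (current_state, event) -> next_state
        self.state = initial
        self.transitions = transitions

    def step(self, event):
        # Advance if the (state, event) pair is defined; otherwise stay put.
        self.state = self.transitions.get((self.state, event), self.state)
        return self.state

# Hypothetical suturing states and events (illustrative only).
fsm = SubtaskFSM("idle", {
    ("idle", "grasp"): "picking_up_needle",
    ("picking_up_needle", "insert"): "inserting_needle",
    ("inserting_needle", "pull"): "pulling_thread",
    ("pulling_thread", "release"): "idle",
})

for event in ["grasp", "insert", "pull"]:
    fsm.step(event)
print(fsm.state)  # -> pulling_thread
```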
daVinciNet: Joint Prediction of Motion and Surgical State in Robot-Assisted Surgery
This paper presents a technique to concurrently and jointly predict the future trajectories of surgical instruments and the future state(s) of surgical subtasks in robot-assisted surgeries (RAS) using multiple input sources. Such predictions are a necessary first step towards shared control and supervised autonomy of surgical subtasks. Minute-long surgical subtasks, such as suturing or ultrasound scanning, often have distinguishable tool kinematics and visual features, and can be described as a series of fine-grained states with transition schematics. We propose daVinciNet, an end-to-end dual-task model for robot motion and surgical state predictions. daVinciNet performs concurrent end-effector trajectory and surgical state predictions using features extracted from multiple data streams, including robot kinematics, endoscopic vision, and system events. We evaluate our proposed model on an extended Robotic Intra-Operative Ultrasound (RIOUS+) imaging dataset collected on a da Vinci Xi surgical system and the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our model achieves up to 93.85% short-term (0.5s) and 82.11% long-term (2s) state prediction accuracy, as well as 1.07mm short-term and 5.62mm long-term trajectory prediction error.
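The two evaluation metrics quoted above can be sketched as follows: frame-wise state prediction accuracy and mean Euclidean end-effector trajectory error. This is our own minimal reading of the metrics, with toy data rather than RIOUS+ or JIGSAWS recordings.

```python
import numpy as np

def state_accuracy(pred_states, true_states):
    # Fraction of frames whose predicted state label matches ground truth.
    return float(np.mean(np.asarray(pred_states) == np.asarray(true_states)))

def trajectory_error_mm(pred_xyz, true_xyz):
    # Mean Euclidean distance between predicted and true 3-D positions (mm).
    d = np.linalg.norm(np.asarray(pred_xyz, float) - np.asarray(true_xyz, float), axis=-1)
    return float(d.mean())

# Toy example: 2 of 3 frames labeled correctly.
acc = state_accuracy(["suture", "suture", "tie"], ["suture", "tie", "tie"])

# Toy example: one predicted position off by 1 mm, one exact -> mean 0.5 mm.
err = trajectory_error_mm([[0, 0, 0], [1, 0, 0]], [[0, 0, 1], [1, 0, 0]])
print(round(acc, 3), err)  # -> 0.667 0.5
```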
Protocol for a scoping review to examine the usability, acceptance and implementation of Artificial Intelligence (AI) in surgical coaching and training.
The current surgical training paradigm in the United States is shifting to a competency-based system. Artificial Intelligence (AI) can also assist in competency-based learning by eliminating biases that may exist in surgical coaching and training. This scoping review aims to examine studies that (1) evaluate the implementation and/or bias of AI applied to surgical videos, (2) examine user acceptance of this technology for surgical training or coaching, and (3) examine the impact on surgical outcomes when AI is applied to surgical videos. Our approach will follow the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) guidelines and checklist. Results will be charted and synthesized to understand these outcomes.
Concept Graph Neural Networks for Surgical Video Understanding
We constantly integrate our knowledge and understanding of the world to
enhance our interpretation of what we see.
This ability is crucial in application domains which entail reasoning about
multiple entities and concepts, such as AI-augmented surgery. In this paper, we
propose a novel way of integrating conceptual knowledge into temporal analysis
tasks via temporal concept graph networks. In the proposed networks, a global
knowledge graph is incorporated into the temporal analysis of surgical
instances, learning the meaning of concepts and relations as they apply to the
data. We demonstrate our results on surgical video data for tasks such as
verification of the critical view of safety and estimation of the Parkland
grading scale. The results show that our method improves recognition and
detection on complex benchmarks and enables other analytic applications of
interest.
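The idea of letting a global concept graph refine per-concept predictions can be sketched with one round of message passing over a small graph. The concepts, edges, weighting scheme, and scores below are illustrative assumptions, not the paper's actual knowledge graph or network architecture.

```python
import numpy as np

# Illustrative concepts relevant to critical-view-of-safety assessment.
concepts = ["cystic_duct", "cystic_artery", "critical_view_of_safety"]

# Adjacency matrix of the concept graph: related concepts exchange evidence.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)

# Toy per-frame detector scores before graph refinement.
scores = np.array([0.9, 0.8, 0.1])

# One message-passing round: average each concept's own score with the
# degree-normalized sum of its neighbors' scores.
deg = A.sum(axis=1)
refined = 0.5 * scores + 0.5 * (A @ scores) / deg
print(refined)  # -> [0.675 0.65  0.475]
```

Here strong evidence for the duct and artery raises the score of the related critical-view-of-safety concept, which is the intuition behind conditioning temporal analysis on a knowledge graph.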