1,429 research outputs found

    Automatic alignment of surgical videos using kinematic data

    Over the past one hundred years, the classic teaching methodology of "see one, do one, teach one" has governed surgical education systems worldwide. With the advent of Operation Room 2.0, recording video, kinematic and many other types of data during surgery became an easy task, thus allowing artificial intelligence systems to be deployed and used in surgical and medical practice. Recently, surgical videos have been shown to provide a structure for peer coaching, enabling novice trainees to learn from experienced surgeons by replaying those videos. However, the high inter-operator variability in surgical gesture duration and execution makes learning by comparing novice and expert surgical videos very difficult. In this paper, we propose a novel technique to align multiple videos based on the alignment of their corresponding kinematic multivariate time series data. By leveraging the Dynamic Time Warping measure, our algorithm synchronizes a set of videos in order to show the same gesture being performed at different speeds. We believe that the proposed approach is a valuable addition to the existing learning tools for surgery. Comment: Accepted at AIME 201
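    The alignment idea in this abstract can be illustrated with textbook Dynamic Time Warping. The block below is a minimal sketch, not the authors' implementation: the function name `dtw_path`, the Euclidean frame distance, and the toy data are all assumptions made for illustration. It returns the index pairs that would tell a player which frame of one recording to show next to each frame of another.

```python
import math


def dtw_path(a, b):
    """Align two multivariate series (lists of equal-length vectors).

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching frame i of `a` with frame j of `b`.
    Hypothetical helper for illustration only.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance

    # Backtrack from the end to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy example: the same one-dimensional "gesture" at two speeds.
slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
fast = [[0.0], [1.0], [2.0]]
total, path = dtw_path(slow, fast)
```

    In this toy case each frame of the fast series is paired with two frames of the slow one, which is the kind of mapping a synchronized replay would need.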

    Continuous Action Recognition Based on Sequence Alignment

    Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be carried out simultaneously. We build on the well-known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on a per-frame representation of videos and on aligning a test sequence with a model sequence. Moreover, we propose two extensions that perform recognition concurrently with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous speech recognition and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performance of the proposed isolated and continuous recognition algorithms with several recently published methods.

    Goal Set Inverse Optimal Control and Iterative Re-planning for Predicting Human Reaching Motions in Shared Workspaces

    To enable safe and efficient human-robot collaboration in shared workspaces, it is important for the robot to predict how a human will move when performing a task. While predicting human motion for tasks not known a priori is very challenging, we argue that single-arm reaching motions for known tasks in collaborative settings (which are especially relevant for manufacturing) are indeed predictable. Two hypotheses underlie our approach for predicting such motions: first, that the trajectory the human performs is optimal with respect to an unknown cost function, and second, that human adaptation to their partner's motion can be captured well through iterative re-planning with the above cost function. The key to our approach is thus to learn a cost function which "explains" the motion of the human. To do this, we gather example trajectories from pairs of participants performing a collaborative assembly task using motion capture. We then use Inverse Optimal Control to learn a cost function from these trajectories. Finally, we predict reaching motions from the human's current configuration to a task-space goal region by iteratively re-planning a trajectory using the learned cost function. Our planning algorithm is based on the trajectory optimizer STOMP; it plans for a 23 DoF human kinematic model and accounts for the presence of a moving collaborator and obstacles in the environment. Our results suggest that in most cases, our method outperforms baseline methods when predicting motions. We also show that our method outperforms baselines for predicting human motion when a human and a robot share the workspace. Comment: 12 pages, Accepted for publication IEEE Transactions on Robotics 201

    Computational Modeling Approaches For Task Analysis In Robotic-Assisted Surgery

    Surgery is continuously subject to technological innovations, including the introduction of robotic surgical devices. The ultimate goal is to program the surgical robot to perform certain difficult or complex surgical tasks in an autonomous manner. The ability of current robotic surgery systems to record quantitative motion and video data motivates developing descriptive mathematical models to recognize, classify and analyze surgical tasks. Recent advances in machine learning research for uncovering concealed patterns in huge data sets, like kinematic and video data, offer a possibility to better understand surgical procedures from a system point of view. This dissertation focuses on bridging the gap between these two lines of research by developing computational models for task analysis in robotic-assisted surgery. The key step for advanced study in robotic-assisted surgery and autonomous skill assessment is to develop techniques that are capable of recognizing fundamental surgical tasks intelligently. Surgical tasks and, at a more granular level, surgical gestures need to be quantified to make them amenable to further study. To address this need, we introduce a new framework, namely DTW-kNN, to recognize and classify three important surgical tasks (suturing, needle passing and knot tying) based on kinematic data captured using the da Vinci robotic surgery system. Our proposed method needs minimal preprocessing, which results in a simple, straightforward and accurate framework that can be applied to any autonomous control system. We also propose an unsupervised gesture segmentation and recognition (UGSR) method which has the ability to automatically segment and recognize temporal sequences of gestures in robot-assisted minimally invasive surgery (RMIS) tasks. We also extend our model by applying soft boundary segmentation (Soft-UGSR) to address some of the challenges that exist in surgical motion segmentation.
The proposed algorithm can effectively model gradual transitions between surgical activities. Additionally, surgical training is undergoing a paradigm shift with more emphasis on the development of technical skills earlier in training. Thus metrics for the skills, especially objective metrics, become crucial. One field of surgery where such techniques can be developed is robotic surgery, as here all movements are already digitized and therefore readily amenable to analysis. Robotic surgery requires surgeons to undergo a much longer and more difficult training process, which creates numerous new challenges for surgical training. Hence, a new method of surgical skill assessment is required to ensure that surgeons have an adequate skill level to be allowed to operate freely on patients. Among many possible approaches, those that provide noninvasive monitoring of expert surgeons and have the ability to automatically evaluate a surgeon's skill are of increased interest. Therefore, in this dissertation we develop a predictive framework for surgical skill assessment to automatically evaluate the performance of surgeons in RMIS. Our classification framework is based on Global Movement Features (GMFs), which are extracted from kinematic movement data. The proposed method addresses some of the limitations in previous work and gives more insight into the underlying patterns of surgical skill levels.
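    The abstract names a DTW-kNN framework without giving details. A generic way to combine the two ideas is nearest-neighbour classification with DTW as the distance between time series. The block below is a sketch under that assumption only: the function names, the Euclidean frame distance, the majority vote, and the toy labels are all hypothetical and are not taken from the dissertation.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Classic DTW cost between two series of equal-dimension vectors."""
    m = len(b)
    # prev holds the previous row of the cumulative-cost table.
    prev = [0.0] + [math.inf] * m
    for i in range(1, len(a) + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cur[j] = d + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def knn_classify(query, train, k=1):
    """Label `query` by majority vote among its k DTW-nearest neighbours.

    `train` is a list of (series, label) pairs (hypothetical layout).
    """
    ranked = sorted(train, key=lambda item: dtw_distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data standing in for labelled kinematic recordings.
train = [
    ([[0.0], [1.0], [2.0], [3.0]], "ramp_up"),
    ([[3.0], [2.0], [1.0], [0.0]], "ramp_down"),
]
query = [[0.0], [0.0], [1.0], [2.0], [2.0], [3.0]]  # slower ramp up
predicted = knn_classify(query, train, k=1)
```

    Because DTW absorbs differences in execution speed, the slower query still matches the faster training example with the same shape.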


    Current Automatic Speech Recognition (ASR) systems perform far worse than human speech recognition because they lack robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation occurs due to a combination of factors including speech style, speaking rate, etc.; a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task from the speech signal. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs.
Two natural speech databases, X-ray microbeam and Aurora-2, were annotated, where the former was used to train a TV-estimator and the latter was used to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs (estimated from the acoustic speech signal). In this setup the articulatory gestures were modeled as hidden random variables, hence eliminating the necessity for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only can help to account for coarticulatory variations but can also significantly improve the noise robustness of ASR systems.