1,429 research outputs found

    Automatic alignment of surgical videos using kinematic data

    Over the past one hundred years, the classic teaching methodology of "see one, do one, teach one" has governed surgical education systems worldwide. With the advent of Operating Room 2.0, recording video, kinematic, and many other types of data during surgery has become an easy task, allowing artificial intelligence systems to be deployed and used in surgical and medical practice. Recently, surgical videos have been shown to provide a structure for peer coaching, enabling novice trainees to learn from experienced surgeons by replaying those videos. However, the high inter-operator variability in surgical gesture duration and execution makes learning by comparing novice and expert surgical videos very difficult. In this paper, we propose a novel technique to align multiple videos based on the alignment of their corresponding kinematic multivariate time series data. By leveraging the Dynamic Time Warping measure, our algorithm synchronizes a set of videos in order to show the same gesture being performed at different speeds. We believe that the proposed approach is a valuable addition to the existing learning tools for surgery. Comment: Accepted at AIME 201
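
    For illustration, the following minimal NumPy sketch (not the authors' implementation) aligns two hypothetical multivariate kinematic recordings with plain dynamic time warping and uses the warping path to map frames of one video onto the other; the array shapes and variable names are assumptions.

```python
import numpy as np

def dtw_path(a, b):
    """Classic O(n*m) DTW between two multivariate series (frames x dims).
    Returns the accumulated cost and the optimal warping path."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # Euclidean frame distance
            D[i, j] = d + min(D[i - 1, j - 1],        # match
                              D[i - 1, j],            # insertion
                              D[i, j - 1])            # deletion
    # Backtrack to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# Hypothetical kinematic streams: 120 and 90 frames of 6-channel tool motion.
novice = np.random.rand(120, 6)
expert = np.random.rand(90, 6)
_, path = dtw_path(novice, expert)

# The warping path pairs novice frames with expert frames, so the two videos
# can be replayed side by side showing the same gesture at different speeds.
novice_to_expert = {i: j for i, j in path}
```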

    Continuous Action Recognition Based on Sequence Alignment

    Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be carried out simultaneously. We build on the well-known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on a per-frame representation of videos and on aligning a test sequence with a model sequence. Moreover, we propose two extensions that enable recognition to be performed concomitantly with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous speech recognition and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performance of the proposed isolated and continuous recognition algorithms with several recently published methods.
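
    As a rough, simplified illustration of the one-pass idea (not the authors' DFW algorithm), the sketch below runs a single dynamic-programming pass over assumed per-frame features, letting the first frame of any gesture model follow the last frame of any other so that labels and segment boundaries are recovered together.

```python
import numpy as np

def one_pass_decode(test, models):
    """Sketch of one-pass continuous decoding: align the test stream against
    per-class model sequences with DTW-style moves, allowing a jump from the
    end of any model to the start of any other, so labelling and segmentation
    fall out of a single dynamic-programming pass."""
    classes = list(models)
    T = len(test)
    # D[c][t, j]: best cost of explaining test[:t+1] ending in frame j of model c.
    D = {c: np.full((T, len(models[c])), np.inf) for c in classes}
    P = {c: [[None] * len(models[c]) for _ in range(T)] for c in classes}

    def local(t, c, j):
        return float(np.linalg.norm(test[t] - models[c][j]))

    for c in classes:
        D[c][0, 0] = local(0, c, 0)
    for t in range(1, T):
        end_c = min(classes, key=lambda c: D[c][t - 1, -1])   # cheapest finished model
        end_cost = D[end_c][t - 1, -1]
        for c in classes:
            for j in range(len(models[c])):
                cands = [(D[c][t - 1, j], (c, j))]                        # stay (warp)
                if j > 0:
                    cands.append((D[c][t - 1, j - 1], (c, j - 1)))        # advance
                else:
                    cands.append((end_cost, (end_c, len(models[end_c]) - 1)))  # new segment
                best, prev = min(cands, key=lambda x: x[0])
                D[c][t, j] = best + local(t, c, j)
                P[c][t][j] = prev
    # Backtrack from the cheapest final state to a per-frame label sequence.
    c, j = min(((c, len(models[c]) - 1) for c in classes),
               key=lambda s: D[s[0]][T - 1, s[1]])
    labels = []
    for t in range(T - 1, -1, -1):
        labels.append(c)
        if t > 0:
            c, j = P[c][t][j]
    return labels[::-1]

# Hypothetical per-frame descriptors for two gesture models and a test stream.
models = {"wave": np.random.rand(20, 16), "clap": np.random.rand(25, 16)}
print(one_pass_decode(np.random.rand(60, 16), models))
```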

    Goal Set Inverse Optimal Control and Iterative Re-planning for Predicting Human Reaching Motions in Shared Workspaces

    To enable safe and efficient human-robot collaboration in shared workspaces, it is important for the robot to predict how a human will move when performing a task. While predicting human motion for tasks not known a priori is very challenging, we argue that single-arm reaching motions for known tasks in collaborative settings (which are especially relevant for manufacturing) are indeed predictable. Two hypotheses underlie our approach for predicting such motions: first, that the trajectory the human performs is optimal with respect to an unknown cost function, and second, that human adaptation to their partner's motion can be captured well through iterative re-planning with the above cost function. The key to our approach is thus to learn a cost function which "explains" the motion of the human. To do this, we gather example trajectories from pairs of participants performing a collaborative assembly task using motion capture. We then use Inverse Optimal Control to learn a cost function from these trajectories. Finally, we predict reaching motions from the human's current configuration to a task-space goal region by iteratively re-planning a trajectory using the learned cost function. Our planning algorithm is based on the trajectory optimizer STOMP; it plans for a 23-DoF human kinematic model and accounts for the presence of a moving collaborator and obstacles in the environment. Our results suggest that in most cases, our method outperforms baseline methods when predicting motions. We also show that our method outperforms baselines for predicting human motion when a human and a robot share the workspace. Comment: 12 pages, accepted for publication in IEEE Transactions on Robotics 201
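
    The iterative re-planning step can be sketched as a STOMP-style stochastic trajectory update. In the sketch below, the learned cost is replaced by a hand-written weighted sum of smoothness and obstacle-proximity features standing in for weights recovered by Inverse Optimal Control; the dimensions, weights, and obstacle position are illustrative assumptions rather than the paper's model.

```python
import numpy as np

def stomp_like_update(xi, cost_fn, n_samples=20, sigma=0.05, h=10.0):
    """One iteration of a STOMP-style stochastic trajectory update:
    sample noisy variations of the current trajectory, score them with the
    (learned) cost, and take a probability-weighted step toward cheaper ones."""
    noise = np.random.randn(n_samples, *xi.shape) * sigma
    noise[:, 0, :] = 0.0            # keep the start configuration fixed
    noise[:, -1, :] = 0.0           # keep the goal configuration fixed
    costs = np.array([cost_fn(xi + eps) for eps in noise])
    w = np.exp(-h * (costs - costs.min()) / (np.ptp(costs) + 1e-9))
    w /= w.sum()
    return xi + np.tensordot(w, noise, axes=1)   # expected perturbation

# Hypothetical learned cost: weighted smoothness and obstacle-proximity terms,
# with w_smooth and w_obst standing in for weights recovered by IOC.
w_smooth, w_obst = 1.0, 5.0
obstacle = np.array([0.5, 0.3, 0.4])             # moving collaborator / obstacle

def cost_fn(xi):
    smoothness = np.sum(np.diff(xi, axis=0) ** 2)
    proximity = np.sum(np.exp(-10.0 * np.linalg.norm(xi - obstacle, axis=1)))
    return w_smooth * smoothness + w_obst * proximity

# Straight-line seed from the current configuration to the goal region
# (3-D task-space points here; the paper plans for a 23-DoF kinematic model).
xi = np.linspace([0.0, 0.0, 0.0], [1.0, 1.0, 0.5], 50)
for _ in range(100):
    xi = stomp_like_update(xi, cost_fn)
```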

    Computational Modeling Approaches For Task Analysis In Robotic-Assisted Surgery

    Surgery is continuously subject to technological innovations, including the introduction of robotic surgical devices. The ultimate goal is to program the surgical robot to perform certain difficult or complex surgical tasks in an autonomous manner. The ability of current robotic surgery systems to record quantitative motion and video data motivates the development of descriptive mathematical models to recognize, classify, and analyze surgical tasks. Recent advances in machine learning research for uncovering concealed patterns in huge data sets, such as kinematic and video data, offer the possibility of better understanding surgical procedures from a systems point of view. This dissertation focuses on bridging the gap between these two lines of research by developing computational models for task analysis in robotic-assisted surgery.

    The key step for advancing the study of robotic-assisted surgery and autonomous skill assessment is to develop techniques that are capable of recognizing fundamental surgical tasks intelligently. Surgical tasks, and at a more granular level surgical gestures, need to be quantified to make them amenable to further study. To this end, we introduce a new framework, namely DTW-kNN, to recognize and classify three important surgical tasks (suturing, needle passing, and knot tying) based on kinematic data captured using the da Vinci robotic surgery system. Our proposed method needs minimal preprocessing, which results in a simple, straightforward, and accurate framework that can be applied to any autonomous control system. We also propose an unsupervised gesture segmentation and recognition (UGSR) method, which has the ability to automatically segment and recognize a temporal sequence of gestures in an RMIS task. We then extend this model by applying soft boundary segmentation (Soft-UGSR) to address some of the challenges that exist in surgical motion segmentation; the proposed algorithm can effectively model gradual transitions between surgical activities.

    Additionally, surgical training is undergoing a paradigm shift with more emphasis on the development of technical skills earlier in training. Thus, metrics for these skills, especially objective metrics, become crucial. One field of surgery where such techniques can be developed is robotic surgery, as here all movements are already digitized and therefore readily amenable to analysis. Robotic surgery requires surgeons to complete a much longer and more difficult training process, which creates numerous new challenges for surgical training. Hence, a new method of surgical skill assessment is required to ensure that surgeons have an adequate skill level before being allowed to operate freely on patients. Among many possible approaches, those that provide noninvasive monitoring of expert surgeons and can automatically evaluate a surgeon's skill are of particular interest. Therefore, in this dissertation we develop a predictive framework for surgical skill assessment that automatically evaluates the performance of surgeons in RMIS. Our classification framework is based on Global Movement Features (GMFs) extracted from kinematic movement data. The proposed method addresses some of the limitations of previous work and gives more insight into the underlying patterns of surgical skill levels.
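
    A minimal sketch of the DTW-kNN idea described above: each kinematic trial is labelled by a majority vote over its k nearest training trials under DTW distance. Trial lengths, the 76-channel layout, and the task names are assumptions for illustration, not the dissertation's implementation.

```python
import numpy as np
from collections import Counter

def dtw_dist(a, b):
    """Accumulated DTW cost between two kinematic sequences (frames x channels)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

def dtw_knn_predict(query, train_seqs, train_labels, k=3):
    """Label a trial by majority vote over its k nearest training trials
    under DTW distance."""
    dists = [dtw_dist(query, s) for s in train_seqs]
    nearest = np.argsort(dists)[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical da Vinci kinematic trials: (frames, 76 channels), variable length.
train_seqs = [np.random.rand(np.random.randint(200, 400), 76) for _ in range(9)]
train_labels = ["suturing", "needle_passing", "knot_tying"] * 3
query = np.random.rand(250, 76)
print(dtw_knn_predict(query, train_seqs, train_labels, k=3))
```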

    ARTICULATORY INFORMATION FOR ROBUST SPEECH RECOGNITION

    Current Automatic Speech Recognition (ASR) systems fall well short of human speech recognition performance because they lack robustness to speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads on a string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and this variation arises from a combination of factors including speech style and speaking rate; the phenomenon is commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures; in such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space.

    To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated gesture recognition from the speech signal. Presently, no natural speech database contains articulatory gesture annotations; hence, an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural speech databases, X-ray Microbeam and Aurora-2, were annotated; the former was used to train a TV estimator and the latter to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs estimated from the acoustic speech signal. In this setup, the articulatory gestures were modeled as hidden random variables, eliminating the need for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help to account for coarticulatory variations but also significantly improve the noise robustness of the ASR system.
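
    The annotation-transfer step can be sketched, in simplified non-iterative form, as a single time-warping pass: features of a synthetic utterance whose gestural score is known are aligned to a natural utterance of the same sentence, and the frame-level gesture labels are carried across the warping path. Feature dimensions and gesture names below are assumptions; the architecture proposed in the dissertation is iterative and more elaborate.

```python
import numpy as np

def dtw_path(a, b):
    """DTW warping path between two feature sequences (frames x dims)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = d + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        i, j = (i - 1, j - 1) if k == 0 else (i - 1, j) if k == 1 else (i, j - 1)
    return path[::-1]

def transfer_annotation(synth_feats, synth_gestures, natural_feats):
    """Map frame-level gesture labels from a synthetic utterance (where the
    gestural score is known) onto a natural utterance of the same sentence
    through the DTW warping path."""
    labels = [None] * len(natural_feats)
    for i, j in dtw_path(synth_feats, natural_feats):
        if labels[j] is None:
            labels[j] = synth_gestures[i]
    return labels

# Hypothetical MFCC-like features and a frame-level gestural annotation.
synth = np.random.rand(80, 13)
natural = np.random.rand(100, 13)
synth_gestures = ["LIP_CLO"] * 20 + ["TT_CRIT"] * 30 + ["GLO_WIDE"] * 30
print(transfer_annotation(synth, synth_gestures, natural))
```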