58 research outputs found

    Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination

    We present a method for assessing skill from video, applicable to a variety of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate the problem as pairwise (who's better?) and overall (who's best?) ranking of video collections, using supervised deep ranking. We propose a novel loss function that learns discriminative features when a pair of videos exhibit variance in skill, and learns shared features when a pair of videos exhibit comparable skill levels. Results demonstrate our method is applicable across tasks, with the percentage of correctly ordered pairs of videos ranging from 70% to 83% for four datasets. We demonstrate the robustness of our approach via sensitivity analysis of its parameters. We see this work as effort toward the automated organization of how-to video collections and, overall, generic skill determination in video. Comment: CVPR 2018
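
    The loss described above can be read as a margin ranking term for pairs with a clear winner plus a similarity term for comparable-skill pairs. Below is a minimal sketch in PyTorch, not the authors' released code; the function name, margin value, and pair encoding are assumptions for illustration.

    ```python
    import torch

    def skill_ranking_loss(score_hi, score_lo, comparable, margin=1.0):
        """Hypothetical pairwise skill-ranking loss.

        score_hi, score_lo: model scores for the two videos in each pair,
        ordered so score_hi belongs to the higher-skill video when a winner
        exists. comparable: boolean tensor, True where the pair shows
        similar skill.
        """
        # Ranking term: hinge pushing the better video's score above the
        # worse one's by at least the margin (discriminative features).
        rank = torch.clamp(margin - (score_hi - score_lo), min=0.0)
        # Similarity term: penalize any score gap for comparable-skill
        # pairs (shared features).
        sim = (score_hi - score_lo) ** 2
        return torch.where(comparable, sim, rank).mean()

    # Toy usage with made-up scores for two pairs:
    hi = torch.tensor([2.0, 0.1])
    lo = torch.tensor([0.5, 0.0])
    flags = torch.tensor([False, True])  # second pair judged comparable
    print(skill_ranking_loss(hi, lo, flags))
    ```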

    Automated Virtual Coach for Surgical Training

    Surgical educators have recommended individualized coaching for acquisition, retention and improvement of expertise in technical skills. Such one-on-one coaching is limited to institutions that can afford surgical coaches and is certainly not feasible at national and global scales. We hypothesize that automated methods that model intraoperative video, surgeon's hand and instrument motion, and sensor data can provide effective and efficient individualized coaching. With the advent of instrumented operating rooms and training laboratories, access to such large-scale intraoperative data has become feasible. Previous methods for automated skill assessment present an overall evaluation at the task/global level to the surgeons without any directed feedback and error analysis. Demonstration, if present at all, takes the form of fixed instructional videos, while deliberate practice is completely absent from automated training platforms. We believe that an effective coach should: demonstrate expert behavior (how do I do it correctly), evaluate trainee performance (how did I do) at task and segment level, critique errors and deficits (where and why was I wrong), recommend deliberate practice (what do I do to improve), and monitor skill progress (when do I become proficient). In this thesis, we present new methods and solutions towards these coaching interventions in different training settings, viz. virtual reality simulation, bench-top simulation and the operating room. First, we outline a summarization-based approach for surgical phase modeling using various sources of intraoperative procedural data, such as system events (sensors) as well as crowdsourced surgical activity context. We validate a crowdsourced approach to obtain context summarizations of intraoperative surgical activity. Second, we develop a new scoring method to evaluate task segments using rankings derived from pairwise comparisons of performances obtained via crowdsourcing. We show that reliable and valid crowdsourced pairwise comparisons can be obtained across multiple training task settings. Additionally, we present preliminary results comparing inter-rater agreement in relative ratings and absolute ratings for crowdsourced assessments of an endoscopic sinus surgery training task data set. Third, we implement a real-time feedback and teaching framework using virtual reality simulation to present teaching cues and deficit metrics that are targeted at critical learning elements of a task. We compare the effectiveness of this real-time coach to independent self-driven learning on a needle-passing task in a pilot randomized controlled trial. Finally, we present an integration of the above components of task progress detection, segment-level evaluation and real-time feedback towards the first end-to-end automated virtual coach for surgical training.
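
    The second contribution, deriving scores from crowdsourced pairwise comparisons, belongs to the same family as classical paired-comparison models. Below is a minimal sketch, not the thesis implementation, of a Bradley-Terry model fit with the standard minorization-maximization updates; the (winner, loser) input format is an assumption for illustration.

    ```python
    def bradley_terry(pairs, n_iter=200):
        """pairs: list of (winner, loser) ids from crowd comparisons.
        Returns a strength per item; higher means higher-ranked skill."""
        items = {x for pair in pairs for x in pair}
        wins = {i: 0 for i in items}
        n = {}  # n[(a, b)] = number of comparisons between items a and b
        for w, l in pairs:
            wins[w] += 1
            key = tuple(sorted((w, l)))
            n[key] = n.get(key, 0) + 1
        strength = {i: 1.0 for i in items}
        for _ in range(n_iter):
            new = {}
            for i in items:
                # MM update: wins_i / sum_j n_ij / (p_i + p_j)
                denom = sum(cnt / (strength[a] + strength[b])
                            for (a, b), cnt in n.items() if i in (a, b))
                new[i] = wins[i] / denom if denom else strength[i]
            total = sum(new.values())
            strength = {i: s / total for i, s in new.items()}
        return strength

    # Toy usage: A beats B twice, A beats C, B beats C.
    comparisons = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B")]
    print(bradley_terry(comparisons))  # ranks A > B > C
    ```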

    Evaluation of objective tools and artificial intelligence in robotic surgery technical skills assessment: a systematic review

    BACKGROUND: There is a need to standardize training in robotic surgery, including objective assessment for accreditation. This systematic review aimed to identify objective tools for technical skills assessment, providing evaluation statuses to guide research and inform implementation into training curricula. METHODS: A systematic literature search was conducted in accordance with the PRISMA guidelines. Ovid Embase/Medline, PubMed and Web of Science were searched. Inclusion criterion: robotic surgery technical skills tools. Exclusion criteria: non-technical skills, laparoscopy or open skills only. Manual tools and automated performance metrics (APMs) were analysed using Messick's concept of validity and the Oxford Centre for Evidence-Based Medicine (OCEBM) Levels of Evidence and Recommendation (LoR). A bespoke tool was used to analyse artificial intelligence (AI) studies. The Modified Downs-Black checklist was used to assess risk of bias. RESULTS: Two hundred and forty-seven studies were analysed, identifying: 8 global rating scales, 26 procedure-/task-specific tools, 3 main error-based methods, 10 simulators, 28 studies analysing APMs and 53 AI studies. The Global Evaluative Assessment of Robotic Skills and the da Vinci Skills Simulator were the most evaluated tools, at LoR 1 (OCEBM). Three procedure-specific tools, 3 error-based methods and 1 non-simulator APM reached LoR 2. AI models estimated outcomes (skill or clinical), demonstrating superior accuracy rates in the laboratory (60 per cent of methods reported accuracies over 90 per cent) compared to real surgery (accuracies ranging from 67 to 100 per cent). CONCLUSIONS: Manual and automated assessment tools for robotic surgery are not well validated and require further evaluation before use in accreditation processes. PROSPERO registration ID: CRD42022304901

    Skill Determination from Long Videos


    Assessing emphysema in CT scans of the lungs:Using machine learning, crowdsourcing and visual similarity


    A Survey of Crowdsourcing in Medical Image Analysis

    Rapid advances in image processing capabilities have been seen across many domains, fostered by the application of machine learning algorithms to "big data". However, within the realm of medical image analysis, advances have been curtailed, in part, due to the limited availability of large-scale, well-annotated datasets. One of the main reasons for this is the high cost often associated with producing large amounts of high-quality meta-data. Recently, there has been growing interest in the application of crowdsourcing for this purpose; a technique that is well established in a number of disciplines, including astronomy, ecology and meteorology, and that has been used to create large-scale datasets in fields ranging from computer vision to astrophysics. Despite the growing popularity of this approach, there has not yet been a comprehensive literature review to provide guidance to researchers considering crowdsourcing methodologies in their own medical imaging analysis. In this survey, we review studies applying crowdsourcing to the analysis of medical images, published prior to July 2018. We identify common approaches and challenges and provide recommendations to researchers implementing crowdsourcing for medical imaging tasks. Finally, we discuss future opportunities for development within this emerging domain.
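
    When crowd workers disagree, their labels must be aggregated into a single annotation per image. Below is a minimal sketch of the simplest aggregation strategy, majority voting; the data layout and label names are hypothetical, and the literature surveyed here also covers weighted and model-based aggregation, which this toy example omits.

    ```python
    from collections import Counter

    def majority_vote(annotations):
        """annotations: dict mapping image id -> list of crowd labels.
        Returns the most common label per image."""
        return {img: Counter(labels).most_common(1)[0][0]
                for img, labels in annotations.items()}

    # Hypothetical crowd labels for two images:
    crowd = {"scan_01": ["emphysema", "normal", "emphysema"],
             "scan_02": ["normal", "normal", "emphysema"]}
    print(majority_vote(crowd))  # {'scan_01': 'emphysema', 'scan_02': 'normal'}
    ```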

    Multi-Modal Models for Fine-grained Action Segmentation in Situated Environments

    Automated methods for analyzing human activities from video or sensor data are critical for enabling new applications in human-robot interaction, surgical data modeling, video summarization, and beyond. Despite decades of research in the fields of robotics and computer vision, current approaches are inadequate for modeling complex activities outside of constrained environments or without using heavily instrumented sensor suites. In this dissertation, I address the problem of fine-grained action segmentation by developing solutions that range from domain-specific to general-purpose, with applications in surgical workflow, surveillance, and cooking. A key technical challenge, which is central to this dissertation, is how to capture complex temporal patterns from sensor data. For a given task, users may perform the same action at different speeds or in different styles, and each user may carry out actions in a different order. I present a series of temporal models that address these modes of variability. First, I define the notion of a convolutional action primitive, which captures how low-level sensor signals change as a function of the action a user is performing. Second, I generalize this idea to video with a Spatiotemporal Convolutional Neural Network, which captures relationships between objects in an image and how they change temporally. Lastly, I discuss a hierarchical variant that applies to video or sensor data, called a Temporal Convolutional Network (TCN), which models actions at multiple temporal scales. In certain domains (e.g., surgical training), TCNs can be used to successfully bridge the gap in performance between domain-specific and general-purpose solutions. A key scientific challenge concerns the evaluation of predicted action segmentations. In many applications, action labels may be ill-defined, and if one asks two different annotators when a given action starts and stops, they may give answers that are seconds apart. I argue that the standard action segmentation metrics are insufficient for evaluating real-world segmentation performance and propose two alternatives. Qualitatively, these metrics are better at capturing the efficacy of models in the described applications. I conclude with a case study on surgical workflow analysis, which has the potential to improve surgical education and operating room efficiency. Current work almost exclusively relies on extensive instrumentation, which is difficult and costly to acquire. I show that our spatiotemporal video models are capable of capturing important surgical attributes (e.g., organs, tools) and achieve state-of-the-art performance on two challenging datasets. The models and methodology described here have demonstrably improved the ability to temporally segment complex human activities, in many cases without sophisticated instrumentation.
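
    One commonly used segmental alternative to frame-wise accuracy scores predicted segments by overlap, so that boundary disagreements of a few frames or seconds are not penalized. The sketch below illustrates that idea under stated assumptions (per-frame label sequences as input, a 50 per cent IoU threshold); it is not the exact pair of metrics proposed in the dissertation.

    ```python
    def segments(frames):
        """Collapse a per-frame label sequence into (label, start, end) runs."""
        segs, start = [], 0
        for t in range(1, len(frames) + 1):
            if t == len(frames) or frames[t] != frames[start]:
                segs.append((frames[start], start, t))
                start = t
        return segs

    def f1_at_k(pred, gt, k=0.5):
        """Segmental F1: a predicted segment is a true positive when it
        matches an unused ground-truth segment of the same label with
        intersection-over-union of at least k."""
        p, g = segments(pred), segments(gt)
        used, tp = set(), 0
        for label, s, e in p:
            for idx, (gl, gs, ge) in enumerate(g):
                inter = max(0, min(e, ge) - max(s, gs))
                union = max(e, ge) - min(s, gs)
                if gl == label and idx not in used and inter / union >= k:
                    used.add(idx)
                    tp += 1
                    break
        fp, fn = len(p) - tp, len(g) - tp
        return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0

    # Same segment structure despite a one-frame boundary shift:
    print(f1_at_k(list("AAABBBCC"), list("AABBBBCC")))  # 1.0
    ```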