
    Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination

    We present a method for assessing skill from video, applicable to a variety of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate the problem as pairwise (who's better?) and overall (who's best?) ranking of video collections, using supervised deep ranking. We propose a novel loss function that learns discriminative features when a pair of videos exhibits variance in skill, and learns shared features when a pair of videos exhibits comparable skill levels. Results demonstrate our method is applicable across tasks, with the percentage of correctly ordered pairs of videos ranging from 70% to 83% for four datasets. We demonstrate the robustness of our approach via sensitivity analysis of its parameters. We see this work as an effort toward the automated organization of how-to video collections and, overall, generic skill determination in video.
    Comment: CVPR 201
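
    As a rough illustration of the loss idea described above, the PyTorch sketch below combines a hinge ranking term for pairs with a clear skill gap and a similarity term for pairs of comparable skill; skill_pair_loss and all variable names are assumptions for exposition, not the authors' released code.

    import torch
    import torch.nn.functional as F

    def skill_pair_loss(score_a, score_b, comparable, margin=1.0):
        # Hinge ranking term: when video A is labelled more skilled than
        # video B, push A's predicted score above B's by at least the margin.
        rank_loss = F.relu(margin - (score_a - score_b))
        # Similarity term: when the pair shows comparable skill, pull the
        # two scores together instead of separating them.
        sim_loss = (score_a - score_b).pow(2)
        # 'comparable' is a boolean tensor selecting which term applies per pair.
        return torch.where(comparable, sim_loss, rank_loss).mean()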

    Dance training shapes action perception and its neural implementation within the young and older adult brain

    How we perceive others in action is shaped by our prior experience. Many factors influence brain responses when observing others in action, including training in a particular physical skill, such as sport or dance, and also general development and aging processes. Here, we investigate how learning a complex motor skill shapes neural and behavioural responses among a dance-naïve sample of 20 young and 19 older adults. Across four days, participants physically rehearsed one set of dance sequences, observed a second set, and a third set remained untrained. Functional MRI was obtained prior to and immediately following training. Participants' behavioural performance on motor and visual tasks improved across the training period, with younger adults showing steeper performance gains than older adults. At the brain level, both age groups demonstrated decreased sensorimotor cortical engagement after physical training, with younger adults showing more pronounced decreases in inferior parietal activity compared to older adults. Neural decoding results demonstrate that among both age groups, visual and motor regions contain experience-specific representations of new motor learning. By combining behavioural measures of performance with univariate and multivariate measures of brain activity, we can start to build a more complete picture of age-related changes in experience-dependent plasticity.

    Automatic alignment of surgical videos using kinematic data

    Over the past one hundred years, the classic teaching methodology of "see one, do one, teach one" has governed surgical education systems worldwide. With the advent of Operating Room 2.0, recording video, kinematic, and many other types of data during surgery has become an easy task, allowing artificial intelligence systems to be deployed and used in surgical and medical practice. Recently, surgical videos have been shown to provide a structure for peer coaching, enabling novice trainees to learn from experienced surgeons by replaying those videos. However, the high inter-operator variability in surgical gesture duration and execution renders learning by comparing novice and expert surgical videos a very difficult task. In this paper, we propose a novel technique to align multiple videos based on the alignment of their corresponding kinematic multivariate time series data. By leveraging the Dynamic Time Warping measure, our algorithm synchronizes a set of videos so that the same gesture is shown being performed at different speeds. We believe that the proposed approach is a valuable addition to the existing learning tools for surgery.
    Comment: Accepted at AIME 201
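
    To make the alignment step concrete, here is a minimal NumPy sketch of classic Dynamic Time Warping over two multivariate kinematic series; dtw_align is a hypothetical helper written for illustration, not the paper's implementation, and a real pipeline would use an optimized DTW library.

    import numpy as np

    def dtw_align(x, y):
        # x: [T1, D] and y: [T2, D] kinematic series for two recordings.
        t1, t2 = len(x), len(y)
        cost = np.full((t1 + 1, t2 + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, t1 + 1):
            for j in range(1, t2 + 1):
                d = np.linalg.norm(x[i - 1] - y[j - 1])  # Euclidean local cost
                cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
        # Backtrack to recover the warping path of matched frame indices.
        path, i, j = [], t1, t2
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]  # each (i, j) pairs frame i of video A with frame j of B

    The recovered path maps each frame of one recording to its best-matching frame in the other, which is what allows the corresponding videos to be resampled so the same gesture appears synchronized.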

    Surgical Skill Assessment via Video Semantic Aggregation

    Automated video-based assessment of surgical skills is a promising task for assisting young surgical trainees, especially in resource-poor areas. Existing works often resort to a CNN-LSTM joint framework that models long-term relationships with LSTMs on spatially pooled short-term CNN features. However, this practice inevitably neglects the differences among semantic concepts such as tools, tissues, and background in the spatial dimension, impeding subsequent temporal relationship modeling. In this paper, we propose a novel skill assessment framework, Video Semantic Aggregation (ViSA), which discovers different semantic parts and aggregates them across spatiotemporal dimensions. The explicit discovery of semantic parts provides an explanatory visualization that helps understand the neural network's decisions. It also enables us to further incorporate auxiliary information, such as kinematic data, to improve representation learning and performance. Experiments on two datasets show the competitiveness of ViSA compared to state-of-the-art methods. Source code is available at: bit.ly/MICCAI2022ViSA.
    Comment: To appear in MICCAI 202
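
    As a loose sketch of the semantic-grouping idea, the module below softly assigns each spatial CNN feature to a small number of semantic parts (e.g. tools, tissue, background) and aggregates features per part; SemanticGrouping, the 1x1-convolution assignment, and the choice of three groups are illustrative assumptions, not the released ViSA code.

    import torch
    import torch.nn as nn

    class SemanticGrouping(nn.Module):
        def __init__(self, channels, num_groups=3):  # e.g. tools / tissue / background
            super().__init__()
            self.assign = nn.Conv2d(channels, num_groups, kernel_size=1)

        def forward(self, feats):  # feats: [B, C, H, W] CNN feature maps
            # Soft assignment of each spatial location to a semantic group.
            a = self.assign(feats).flatten(2).softmax(dim=1)  # [B, K, H*W]
            f = feats.flatten(2)                              # [B, C, H*W]
            # Weighted average of spatial features per group -> [B, K, C],
            # giving one descriptor per semantic part for temporal modeling.
            return torch.einsum('bkn,bcn->bkc', a, f) / (a.sum(-1, keepdim=True) + 1e-6)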

    Toward future 'mixed reality' learning spaces for STEAM education

    Digital technology is becoming an increasingly integrated part of modern society. As this happens, technologies including augmented reality, virtual reality, 3D printing and user-supplied mobile devices (collectively referred to as mixed reality) are often touted as likely to become a greater part of the classroom and learning environment. In the discipline areas of STEAM education, experts are expected to be at the forefront of technology and of how it might fit into their classrooms. This is especially important because educators increasingly find themselves surrounded by new learners who expect to be engaged with participatory, interactive, sensory-rich, experimental activities offering greater opportunities for student input and creativity. This paper explores learner and academic perspectives on mixed reality case studies in 3D spatial design (multimedia and architecture), paramedic science and information technology, through the use of existing data as well as additional one-on-one interviews about the use of mixed reality in the classroom. Results show that mixed reality can provide engagement, critical thinking and problem-solving benefits for students in line with this new generation of learners, but also demonstrate that more work needs to be done to refine mixed reality solutions for the classroom.

    Action Quality Assessment with Temporal Parsing Transformer

    Action Quality Assessment (AQA) is important for action understanding, and resolving the task poses unique challenges due to subtle visual differences. Existing state-of-the-art methods typically rely on holistic video representations for score regression or ranking, which limits their ability to capture fine-grained intra-class variation. To overcome this limitation, we propose a temporal parsing transformer that decomposes the holistic feature into temporal part-level representations. Specifically, we utilize a set of learnable queries to represent the atomic temporal patterns of a specific action. Our decoding process converts the frame representations into a fixed number of temporally ordered part representations. To obtain the quality score, we adopt state-of-the-art contrastive regression based on the part representations. Since existing AQA datasets do not provide temporal part-level labels or partitions, we propose two novel loss functions on the cross-attention responses of the decoder: a ranking loss to ensure that the learnable queries satisfy the temporal order in cross attention, and a sparsity loss to encourage the part representations to be more discriminative. Extensive experiments show that our proposed method outperforms prior work on three public AQA benchmarks by a considerable margin.
    Comment: accepted by ECCV 202
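
    The two attention-based losses could look roughly like the PyTorch sketch below, which treats the decoder's cross-attention weights over frames as soft temporal parts; query_order_loss and sparsity_loss are assumed names and formulations, not the paper's code.

    import torch
    import torch.nn.functional as F

    def query_order_loss(attn, margin=0.0):
        # attn: [B, Q, T] cross-attention weights of Q ordered queries over T frames.
        t = torch.arange(attn.size(-1), dtype=attn.dtype, device=attn.device)
        centers = (attn * t).sum(-1) / (attn.sum(-1) + 1e-6)  # [B, Q] temporal centers
        # Each query's attention center should precede the next query's center.
        return F.relu(margin + centers[:, :-1] - centers[:, 1:]).mean()

    def sparsity_loss(attn):
        # Low entropy encourages each query to attend to a compact, distinctive
        # temporal segment rather than spreading over the whole video.
        p = attn / (attn.sum(-1, keepdim=True) + 1e-6)
        return -(p * (p + 1e-6).log()).sum(-1).mean()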