Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination
We present a method for assessing skill from video, applicable to a variety
of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate
the problem as pairwise (who's better?) and overall (who's best?) ranking of
video collections, using supervised deep ranking. We propose a novel loss
function that learns discriminative features when a pair of videos exhibit
variance in skill, and learns shared features when a pair of videos exhibit
comparable skill levels. Results demonstrate our method is applicable across
tasks, with the percentage of correctly ordered pairs of videos ranging from
70% to 83% for four datasets. We demonstrate the robustness of our approach via
sensitivity analysis of its parameters. We see this work as an effort toward the
automated organization of how-to video collections and, overall, generic skill
determination in video. Comment: CVPR 201
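The abstract reports results as the percentage of correctly ordered video pairs and describes a loss that separates pairs with a skill gap while pulling comparable pairs together. The sketch below illustrates both ideas in a simplified, generic form: it works on scalar skill scores rather than learned features, uses a standard hinge ranking loss plus a squared-gap similarity term, and the names `pair_loss`, `margin` and the label convention (+1, -1, 0) are assumptions for illustration, not the authors' loss.

```python
from typing import List, Sequence, Tuple


def hinge_rank_loss(s_better: float, s_worse: float, margin: float = 1.0) -> float:
    """Zero once the better video outscores the worse one by at least `margin`."""
    return max(0.0, margin - (s_better - s_worse))


def similarity_loss(s_a: float, s_b: float) -> float:
    """Squared gap: pulls the scores of comparable-skill videos together."""
    return (s_a - s_b) ** 2


def pair_loss(s_a: float, s_b: float, label: int, margin: float = 1.0) -> float:
    """Hypothetical label convention: +1 if A is better, -1 if B is better, 0 if comparable."""
    if label == 0:
        return similarity_loss(s_a, s_b)
    if label > 0:
        return hinge_rank_loss(s_a, s_b, margin)
    return hinge_rank_loss(s_b, s_a, margin)


def pairwise_accuracy(scores: Sequence[float], pairs: List[Tuple[int, int]]) -> float:
    """Fraction of (better, worse) index pairs whose scores are correctly ordered."""
    correct = sum(1 for better, worse in pairs if scores[better] > scores[worse])
    return correct / len(pairs)
```

For example, `pairwise_accuracy([0.9, 0.2, 0.5], [(0, 1), (0, 2), (2, 1)])` returns `1.0`, since every annotated "better" video receives the higher score.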
Dance training shapes action perception and its neural implementation within the young and older adult brain
How we perceive others in action is shaped by our prior experience. Many factors influence brain responses when observing others in action, including training in a particular physical skill, such as sport or dance, and also general development and aging processes. Here, we investigate how learning a complex motor skill shapes neural and behavioural responses among a dance-naïve sample of 20 young and 19 older adults. Across four days, participants physically rehearsed one set of dance sequences, observed a second set, and a third set remained untrained. Functional MRI was obtained prior to and immediately following training. Participants’ behavioural performance on motor and visual tasks improved across the training period, with younger adults showing steeper performance gains than older adults. At the brain level, both age groups demonstrated decreased sensorimotor cortical engagement after physical training, with younger adults showing more pronounced decreases in inferior parietal activity compared to older adults. Neural decoding results demonstrate that among both age groups, visual and motor regions contain experience-specific representations of new motor learning. By combining behavioural measures of performance with univariate and multivariate measures of brain activity, we can start to build a more complete picture of age-related changes in experience-dependent plasticity
Automatic alignment of surgical videos using kinematic data
Over the past one hundred years, the classic teaching methodology of "see
one, do one, teach one" has governed the surgical education systems worldwide.
With the advent of Operation Room 2.0, recording video, kinematic and many
other types of data during the surgery became an easy task, thus allowing
artificial intelligence systems to be deployed and used in surgical and medical
practice. Recently, surgical videos have been shown to provide a structure for
peer coaching enabling novice trainees to learn from experienced surgeons by
replaying those videos. However, the high inter-operator variability in
surgical gesture duration and execution renders learning from comparing novice
to expert surgical videos a very difficult task. In this paper, we propose a
novel technique to align multiple videos based on the alignment of their
corresponding kinematic multivariate time series data. By leveraging the
Dynamic Time Warping measure, our algorithm synchronizes a set of videos in
order to show the same gesture being performed at different speeds. We believe
that the proposed approach is a valuable addition to the existing learning
tools for surgery. Comment: Accepted at AIME 201
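The alignment step rests on Dynamic Time Warping over multivariate kinematic series. A minimal sketch, assuming two recordings, Euclidean distance between per-frame kinematic vectors, and the classic three-step recurrence (the paper aligns a whole set of videos and may use a different cost or step pattern):

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance between two kinematic samples (e.g. tool positions at one frame)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def dtw(x: Sequence[Vector], y: Sequence[Vector]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the DTW distance and the warping path as (frame in x, frame in y) pairs."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

Each `(i, j)` pair on the path says that frame `i` of one video should be displayed together with frame `j` of the other, which is what lets a slower and a faster execution of the same gesture play in sync.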
Surgical Skill Assessment via Video Semantic Aggregation
Automated video-based assessment of surgical skills is a promising task in
assisting young surgical trainees, especially in low-resource areas. Existing
works often resort to a CNN-LSTM joint framework that models long-term
relationships by LSTMs on spatially pooled short-term CNN features. However,
this practice inevitably neglects the differences among semantic concepts
such as tools, tissues, and background in the spatial dimension, impeding the
subsequent temporal relationship modeling. In this paper, we propose a novel
skill assessment framework, Video Semantic Aggregation (ViSA), which discovers
different semantic parts and aggregates them across spatiotemporal dimensions.
The explicit discovery of semantic parts provides an explanatory visualization
that helps understand the neural network's decisions. It also enables us to
further incorporate auxiliary information such as the kinematic data to improve
representation learning and performance. Experiments on two datasets show that
ViSA is competitive with state-of-the-art methods. Source code
is available at: bit.ly/MICCAI2022ViSA. Comment: To appear in MICCAI 202
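To make "discovering semantic parts and aggregating them" concrete, here is a rough sketch under stated assumptions: each spatial feature is softly assigned to one of K prototype vectors (standing in for parts such as tools, tissue and background), features are averaged per part within each frame, and the result is one time series per part that a temporal model such as an LSTM could then consume. The prototypes, the temperature `tau`, and the function names are hypothetical; ViSA's actual grouping and temporal modules may differ.

```python
import math
from typing import List, Sequence

Vec = List[float]


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def aggregate_frame(frame: Sequence[Sequence[float]],
                    prototypes: Sequence[Sequence[float]],
                    tau: float = 1.0) -> List[Vec]:
    """Softly assign each spatial feature to one of K parts, then average per part.

    frame: list of spatial feature vectors (one per location, C dims each).
    Returns K part descriptors of C dims each.
    """
    k, c = len(prototypes), len(frame[0])
    sums = [[0.0] * c for _ in range(k)]
    mass = [0.0] * k
    for f in frame:
        assign = softmax([dot(f, p) / tau for p in prototypes])
        for j in range(k):
            mass[j] += assign[j]
            for d in range(c):
                sums[j][d] += assign[j] * f[d]
    # Small epsilon guards against a part that receives almost no mass.
    return [[s / (mass[j] + 1e-8) for s in sums[j]] for j in range(k)]


def part_trajectories(frames: Sequence[Sequence[Sequence[float]]],
                      prototypes: Sequence[Sequence[float]],
                      tau: float = 1.0) -> List[List[Vec]]:
    """For each part j, return its descriptor at every time step."""
    per_frame = [aggregate_frame(fr, prototypes, tau) for fr in frames]
    return [[pf[j] for pf in per_frame] for j in range(len(prototypes))]
```

Keeping parts separate until the temporal stage is the point of the design: a spatial average over the whole frame would mix tool motion with background, whereas per-part trajectories can be visualized and modeled individually.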
Toward future 'mixed reality' learning spaces for STEAM education
Digital technology is becoming more integrated into modern society. As this happens, technologies including augmented reality, virtual reality, 3D printing and user-supplied mobile devices (collectively referred to as mixed reality) are often touted as likely to become a larger part of the classroom and learning environment. In the discipline areas of STEAM education, experts are expected to be at the forefront of technology and how it might fit into their classroom. This is especially important because, increasingly, educators find themselves surrounded by new learners who expect to be engaged with participatory, interactive, sensory-rich, experimental activities with greater opportunities for student input and creativity. This paper explores learner and academic perspectives on mixed reality case studies in 3D spatial design (multimedia and architecture), paramedic science and information technology, using existing data as well as additional one-on-one interviews about the use of mixed reality in the classroom. Results show that mixed reality can provide engagement, critical thinking and problem-solving benefits for students in line with this new generation of learners, but also demonstrate that more work is needed to refine mixed reality solutions for the classroom.
Action Quality Assessment with Temporal Parsing Transformer
Action Quality Assessment (AQA) is important for action understanding, and the
task poses unique challenges due to subtle visual differences.
Existing state-of-the-art methods typically rely on holistic video
representations for score regression or ranking, which limits their ability
to capture fine-grained intra-class variation. To overcome the
above limitation, we propose a temporal parsing transformer to decompose the
holistic feature into temporal part-level representations. Specifically, we
utilize a set of learnable queries to represent the atomic temporal patterns
for a specific action. Our decoding process converts the frame representations
to a fixed number of temporally ordered part representations. To obtain the
quality score, we adopt the state-of-the-art contrastive regression based on
the part representations. Since existing AQA datasets do not provide temporal
part-level labels or partitions, we propose two novel loss functions on the
cross attention responses of the decoder: a ranking loss that ensures the
learnable queries satisfy the temporal order in cross attention, and a
sparsity loss to encourage the part representations to be more discriminative.
Extensive experiments show that our proposed method outperforms prior work on
three public AQA benchmarks by a considerable margin. Comment: accepted by ECCV 202
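The decoding step and the two auxiliary losses can be sketched as follows, assuming single-head scaled dot-product cross attention from K learnable queries to T frame features. The attention "center" of each query is its expected frame index; the ordering loss is a hinge penalty whenever query k's center is not at least `margin` frames before query k+1's, and sparsity is measured here by attention entropy. The margin, the entropy formulation and all names are assumptions; the paper's exact loss definitions may differ.

```python
import math
from typing import List, Sequence

Vec = List[float]


def softmax(xs: Sequence[float]) -> Vec:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries: Sequence[Vec], frames: Sequence[Vec]) -> List[Vec]:
    """Row k holds query k's attention weights over the T frames."""
    scale = 1.0 / math.sqrt(len(frames[0]))
    return [softmax([dot(q, f) * scale for f in frames]) for q in queries]


def decode_parts(attn: Sequence[Vec], frames: Sequence[Vec]) -> List[Vec]:
    """Part representation k is the attention-weighted sum of frame features."""
    c = len(frames[0])
    return [[sum(a[t] * frames[t][d] for t in range(len(frames))) for d in range(c)]
            for a in attn]


def attention_centers(attn: Sequence[Vec]) -> Vec:
    """Expected frame index under each query's attention distribution."""
    return [sum(t * a_t for t, a_t in enumerate(a)) for a in attn]


def temporal_order_loss(attn: Sequence[Vec], margin: float = 1.0) -> float:
    """Hinge penalty when consecutive queries are not ordered in time."""
    c = attention_centers(attn)
    return sum(max(0.0, c[k] - c[k + 1] + margin) for k in range(len(c) - 1))


def sparsity_loss(attn: Sequence[Vec]) -> float:
    """Mean attention entropy; it is lowest when each query focuses on few frames."""
    return sum(-sum(p * math.log(p + 1e-12) for p in a) for a in attn) / len(attn)
```

Because the decoder always emits K parts in query order, a downstream regressor (the abstract mentions contrastive regression) sees a fixed-length, temporally ordered input regardless of video length.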
- …