Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for
well-segmented sequences which have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot easily be applied to the noisy sequences
expected in real-world applications. In this paper, we present the Temporal
Attention-Gated Model (TAGM) which integrates ideas from attention models and
gated recurrent networks to better deal with noisy or unsegmented sequences.
Specifically, we extend the concept of the attention model to measure the relevance
of each observation (time step) of a sequence. We then use a novel gated
recurrent network to learn the hidden representation for the final prediction.
An important advantage of our approach is interpretability, since the temporal
attention weights provide a meaningful value for the salience of each time step
in the sequence. We demonstrate the merits of our TAGM approach, both for
prediction accuracy and interpretability, on three different tasks: spoken
digit recognition, text-based sentiment analysis and visual event recognition.
Comment: Accepted by CVPR 201
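The abstract describes attention weights that gate a recurrent update. The sketch below shows one way such a gate can work. It assumes that each time step has a scalar weight a_t in [0, 1], produced elsewhere (for example by a separate attention module). That weight blends the previous hidden state with a tanh candidate state: h_t = (1 - a_t) h_{t-1} + a_t tanh(W h_{t-1} + U x_t + b). The function name, the tanh nonlinearity, and the parameter shapes are illustrative assumptions. This is not the authors' released implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def attention_gated_rnn(xs: Sequence[Vector], attn: Sequence[float],
                        W: Matrix, U: Matrix, b: Vector) -> Vector:
    """Run an attention-gated recurrence and return the final hidden state.

    xs   : input vectors x_1..x_T, each of length D
    attn : assumed per-step attention weights a_t in [0, 1]
    W    : hidden-to-hidden weights (H x H)
    U    : input-to-hidden weights (H x D)
    b    : bias vector (length H)
    """
    H = len(b)
    h = [0.0] * H  # initial hidden state h_0 = 0
    for x, a in zip(xs, attn):
        # Candidate state: tanh(W h_{t-1} + U x_t + b), computed element-wise.
        cand = [
            math.tanh(sum(W[i][j] * h[j] for j in range(H))
                      + sum(U[i][k] * x[k] for k in range(len(x)))
                      + b[i])
            for i in range(H)
        ]
        # Attention gate: a near 0 keeps the previous state (step ignored),
        # a near 1 replaces it with the candidate state.
        h = [(1.0 - a) * h[i] + a * cand[i] for i in range(H)]
    return h
```

When a_t = 0, a step leaves the state untouched, which is how a low-salience (noisy) step would be skipped. When a_t = 1, the step overwrites the state with the candidate. The final state would then feed a classifier.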
Recommended from our members
Automatic facial expression analysis
Humans spend a large amount of their time interacting with computers of one type or another. However, computers are emotionally blind and indifferent to the affective states of their users. Human-computer interaction that does not consider emotions ignores a whole channel of available information.
Faces contain a large portion of our emotionally expressive behaviour. We use facial expressions to display our emotional states and to manage our interactions. Furthermore, we express and read emotions in faces effortlessly. However, automatic understanding of facial expressions is a very difficult task computationally, especially in the presence of highly variable pose, expression and illumination. My work furthers the field of automatic facial expression tracking by tackling these issues, bringing emotionally aware computing closer to reality.
Firstly, I present an in-depth analysis of the Constrained Local Model (CLM) for facial expression and head pose tracking. I propose a number of extensions that make the localization of facial features more accurate.
Secondly, I introduce a 3D Constrained Local Model (CLM-Z) which takes full advantage of depth information available from various range scanners. CLM-Z is robust to changes in illumination and shows better facial tracking performance.
Thirdly, I present the Constrained Local Neural Field (CLNF), a novel instance of CLM that deals with the issues of facial tracking in complex scenes. It achieves this through the use of a novel landmark detector and a novel CLM fitting algorithm. CLNF outperforms state-of-the-art models for facial tracking in the presence of difficult illumination and varying pose.
Lastly, I demonstrate how tracked facial expressions can be used for emotion inference from videos. I also show how the tools developed for facial tracking can be applied to emotion inference in music.
Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition
Aerial scene recognition is a fundamental task in remote sensing and has
recently received increased interest. While visual information from overhead
images, processed with powerful models and efficient algorithms, yields
considerable performance on scene recognition, it still suffers from
variations in ground objects, lighting conditions, etc. Inspired by the
multi-channel perception theory in cognitive science, and to improve aerial
scene recognition performance, in this paper we explore a novel audiovisual
aerial scene recognition task that uses both images and sounds as
input. Based on an observation that some specific sound events are more likely
to be heard at a given geographic location, we propose to exploit the knowledge
from the sound events to improve aerial scene
recognition. For this purpose, we have constructed a new dataset named AuDio
Visual Aerial sceNe reCognition datasEt (ADVANCE). With the help of this
dataset, we evaluate three proposed approaches for transferring the sound event
knowledge to the aerial scene recognition task in a multimodal learning
framework, and show the benefit of exploiting audio information for
aerial scene recognition. The source code is publicly available for
reproducibility purposes.
Comment: ECCV 202
SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras
Our work addresses the problem of egocentric human pose estimation from
downward-facing cameras on head-mounted devices (HMDs). This presents a
challenging scenario, as parts of the body often fall outside of the image or
are occluded. Previous solutions mitigate this problem by using fish-eye camera
lenses to capture a wider view, but these can present hardware design issues.
They also predict 2D heat-maps per joint and lift them to 3D space to deal with
self-occlusions, but this requires large network architectures which are
impractical to deploy on resource-constrained HMDs. We predict pose from images
captured with conventional rectilinear camera lenses. This resolves hardware
design issues, but means body parts are often out of frame. As such, we
directly regress probabilistic joint rotations represented as matrix Fisher
distributions for a parameterized body model. This allows us to quantify pose
uncertainties and explain out-of-frame or occluded joints. This also removes
the need to compute 2D heat-maps and allows for simplified DNN architectures
which require less compute. Given the lack of egocentric datasets using
rectilinear camera lenses, we introduce the SynthEgo dataset, a synthetic
dataset with 60K stereo images containing high diversity of pose, shape,
clothing and skin tone. Our approach achieves state-of-the-art results for this
challenging configuration, reducing mean per-joint position error by 23%
overall and 58% for the lower body. Our architecture also has eight times fewer
parameters and runs twice as fast as the current state of the art. Experiments
show that training on our synthetic dataset leads to good generalization to
real-world images without fine-tuning.
Comment: Accepted in 3DV 202
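Mean per-joint position error (MPJPE) is the average Euclidean distance between predicted and ground-truth 3D joint positions. The sketch below computes it for a single pose and expresses an improvement as a relative reduction. The helper names are hypothetical. Evaluation protocols that first align the root joint, or that average over many frames, are not shown.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]


def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Mean per-joint position error for one pose, in the input's units."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and the same length")
    # Euclidean distance per joint, averaged over all joints.
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional error reduction, e.g. 0.23 corresponds to a 23% reduction."""
    return (baseline - ours) / baseline
```

A per-region figure such as a lower-body error would apply the same function to the subset of joints in that region.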
Experimental studies of tribological properties of hard indexable inserts with vacuum-plasma coatings at cylindrical milling of woodchip boards
The article presents the methodology and results of experimental studies of the friction coefficients that characterize the machining of chipboard by tail cutters whose knives are equipped with vacuum-plasma coatings. The developed technique determines the coefficient of friction on the back face of the blade and the adjacent part of the cutting edge during milling. It is based on simultaneously recording the tangential and normal cutting forces acting on the back face of the blade while cutting with a zero-height allowance. The research results provide a scientific basis for the tribological characteristics of vacuum-plasma coatings deposited on the cutting elements and for the optimization of coating parameters.
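Because the allowance is zero, the recorded forces can be attributed to contact on the back face of the blade. The friction coefficient can then be estimated from the ratio of tangential to normal force, mu = F_t / F_n. The sketch below assumes the coefficient is taken as the ratio of total tangential to total normal force over simultaneously sampled readings. The article's actual data reduction is not specified, and the function name is hypothetical.

```python
from typing import Sequence


def friction_coefficient(tangential: Sequence[float],
                         normal: Sequence[float]) -> float:
    """Estimate mu = F_t / F_n from simultaneously recorded force samples.

    Assumes a zero-height allowance, so both forces are attributed to
    contact on the back face of the blade. Dividing the summed forces
    (rather than averaging per-sample ratios) damps the effect of samples
    where the normal force is close to zero.
    """
    if len(tangential) != len(normal) or not tangential:
        raise ValueError("need equal-length, non-empty force series")
    total_normal = sum(normal)
    if total_normal <= 0:
        raise ValueError("total normal force must be positive")
    return sum(tangential) / total_normal
```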
The Cambridge Face Tracker: Accurate, Low Cost Measurement of Head Posture Using Computer Vision and Face Recognition Software.
PURPOSE: We validate a video-based method of head posture measurement. METHODS: The Cambridge Face Tracker uses neural networks (constrained local neural fields) to recognize facial features in video. The relative position of these facial features is used to calculate head posture. First, we assess the accuracy of this approach against videos in three research databases where each frame is tagged with a precisely measured head posture. Second, we compare our method to a commercially available mechanical device, the Cervical Range of Motion device: four subjects each adopted 43 distinct head postures that were measured using both methods. RESULTS: The Cambridge Face Tracker achieved confident facial recognition in 92% of the approximately 38,000 frames of video from the three databases. For the three databases, the mean errors in absolute head posture were 3.34°, 3.86°, and 2.81°, and the median errors were 1.97°, 2.16°, and 1.96°, respectively. Accuracy decreased for more extreme head postures. Comparing the Cambridge Face Tracker with the Cervical Range of Motion device gave correlation coefficients of 0.99 (P < 0.0001), 0.96 (P < 0.0001), and 0.99 (P < 0.0001) for yaw, pitch, and roll, respectively. CONCLUSIONS: The Cambridge Face Tracker performs well under real-world conditions and within the range of normally encountered head postures. It allows useful quantification of head posture in real time or from precaptured video. Its performance is similar to that of a clinically validated mechanical device. It has significant advantages over other approaches in that subjects do not need to wear any apparatus, and it requires only low-cost, easy-to-set-up consumer electronics. TRANSLATIONAL RELEVANCE: Noncontact assessment of head posture allows more complete clinical assessment of patients and could benefit surgical planning in the future.
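The results above report mean and median absolute errors against reference head postures, and correlation coefficients against a mechanical device. The sketch below computes these summary statistics. It assumes per-angle errors in degrees and uses Pearson's r as the correlation coefficient. P-values are not computed, and the function names are hypothetical.

```python
import math
from statistics import mean, median
from typing import Sequence, Tuple


def error_summary(pred_deg: Sequence[float],
                  true_deg: Sequence[float]) -> Tuple[float, float]:
    """Mean and median absolute error (degrees) for one head-posture angle."""
    errors = [abs(p - t) for p, t in zip(pred_deg, true_deg)]
    return mean(errors), median(errors)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two paired series (e.g. tracker vs device)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Reporting the median alongside the mean, as the abstract does, helps show whether a few large errors (such as those at extreme postures) inflate the mean.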
- …