Automatic facial expression analysis
Humans spend a large amount of their time interacting with computers of one type or another. However, computers are emotionally blind and indifferent to the affective states of their users. Human-computer interaction that does not consider emotion ignores a whole channel of available information.
Faces contain a large portion of our emotionally expressive behaviour. We use facial expressions to display our emotional states and to manage our interactions. Furthermore, we express and read emotions in faces effortlessly. However, automatic understanding of facial expressions is a very difficult task computationally, especially in the presence of highly variable pose, expression and illumination. My work furthers the field of automatic facial expression tracking by tackling these issues, bringing emotionally aware computing closer to reality.
Firstly, I present an in-depth analysis of the Constrained Local Model (CLM) for facial expression and head pose tracking. I propose a number of extensions that make the location of facial features more accurate.
Secondly, I introduce a 3D Constrained Local Model (CLM-Z) which takes full advantage of depth information available from various range scanners. CLM-Z is robust to changes in illumination and shows better facial tracking performance.
Thirdly, I present the Constrained Local Neural Field (CLNF), a novel instance of CLM that deals with the issues of facial tracking in complex scenes. It achieves this through the use of a novel landmark detector and a novel CLM fitting algorithm. CLNF outperforms state-of-the-art models for facial tracking in the presence of difficult illumination and varying pose.
Lastly, I demonstrate how tracked facial expressions can be used for emotion inference from videos. I also show how the tools developed for facial tracking can be applied to emotion inference in music.
Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for well-segmented sequences that have been edited to remove noisy or irrelevant parts. Such methods therefore cannot easily be applied to the noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM), which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences. Specifically, we extend the concept of attention models to measure the relevance of each observation (time step) of a sequence. We then use a novel gated recurrent network to learn the hidden representation for the final prediction. An important advantage of our approach is interpretability, since the temporal attention weights provide a meaningful measure of the salience of each time step in the sequence. We demonstrate the merits of our TAGM approach, both for prediction accuracy and interpretability, on three different tasks: spoken digit recognition, text-based sentiment analysis and visual event recognition.
Comment: Accepted by CVPR 201
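The core idea above — an attention weight gating how much each observation updates the recurrent hidden state — can be sketched as follows. This is a minimal NumPy illustration of the mechanism, not the paper's implementation; the tanh candidate update and the weight matrices are illustrative assumptions.

```python
import numpy as np

def tagm_step(h_prev, x_t, a_t, W_h, W_x, b):
    """One attention-gated recurrent step (sketch of the TAGM idea).

    a_t in [0, 1] is the attention weight for this time step: the hidden
    state is updated in proportion to how relevant the observation is,
    and is simply carried over when a_t is near zero (a noisy step).
    """
    candidate = np.tanh(W_h @ h_prev + W_x @ x_t + b)   # recurrent candidate
    return (1.0 - a_t) * h_prev + a_t * candidate        # gated interpolation

# Toy usage: with a_t = 0 the observation is ignored entirely;
# with a_t = 1 the state is fully replaced by the candidate.
W_h, W_x, b = np.zeros((2, 2)), np.eye(2), np.zeros(2)
h0 = np.array([0.5, -0.5])
x = np.array([1.0, 2.0])
h_ignored = tagm_step(h0, x, 0.0, W_h, W_x, b)   # equals h0
h_salient = tagm_step(h0, x, 1.0, W_h, W_x, b)   # equals tanh(x)
```

The interpretability claim follows directly from this form: the scalar `a_t` per time step is the model's estimate of that observation's salience.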
SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras
Our work addresses the problem of egocentric human pose estimation from downwards-facing cameras on head-mounted devices (HMDs). This is a challenging scenario, as parts of the body often fall outside the image or are occluded. Previous solutions minimize this problem by using fish-eye camera lenses to capture a wider view, but these can present hardware design issues. They also predict 2D heat-maps per joint and lift them to 3D space to deal with self-occlusions, but this requires large network architectures that are impractical to deploy on resource-constrained HMDs. We predict pose from images captured with conventional rectilinear camera lenses. This resolves the hardware design issues, but means body parts are often out of frame. As such, we directly regress probabilistic joint rotations, represented as matrix Fisher distributions, for a parameterized body model. This allows us to quantify pose uncertainty and to explain out-of-frame or occluded joints. It also removes the need to compute 2D heat-maps, allowing for simplified DNN architectures that require less compute. Given the lack of egocentric datasets using rectilinear camera lenses, we introduce SynthEgo, a synthetic dataset of 60K stereo images with a high diversity of pose, shape, clothing and skin tone. Our approach achieves state-of-the-art results for this challenging configuration, reducing mean per-joint position error by 23% overall and by 58% for the lower body. Our architecture also has eight times fewer parameters and runs twice as fast as the current state of the art. Experiments show that training on our synthetic dataset leads to good generalization to real-world images without fine-tuning.
Comment: Accepted in 3DV 202
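The matrix Fisher distribution mentioned above places a density p(R) ∝ exp(tr(FᵀR)) over rotation matrices, so a single parameter matrix F encodes both a most-likely rotation and its concentration. Recovering that mode via an SVD is a standard property of the distribution; the sketch below illustrates it and is not the paper's code (the example rotation is an arbitrary assumption).

```python
import numpy as np

def matrix_fisher_mode(F):
    """Mode of a matrix Fisher distribution over rotations, p(R) ∝ exp(tr(Fᵀ R)).

    The mode is the rotation maximizing tr(Fᵀ R), recovered from the SVD
    of F with a determinant correction so the result lies in SO(3).
    Larger singular values of F mean a more concentrated (less uncertain)
    distribution about this mode.
    """
    U, s, Vt = np.linalg.svd(F)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # keep det(R) = +1
    return U @ D @ Vt

# Usage: a parameter matrix proportional to a rotation has that rotation
# as its mode; the scale factor controls concentration, not the mode.
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
mode = matrix_fisher_mode(5.0 * Rz)
```

Regressing F directly is what lets a small network output both a pose estimate and a per-joint uncertainty in one shot, without heat-maps.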
The Cambridge Face Tracker: Accurate, Low Cost Measurement of Head Posture Using Computer Vision and Face Recognition Software.
PURPOSE: We validate a video-based method of head posture measurement. METHODS: The Cambridge Face Tracker uses neural networks (constrained local neural fields) to recognize facial features in video. The relative position of these facial features is used to calculate head posture. First, we assess the accuracy of this approach against videos in three research databases where each frame is tagged with a precisely measured head posture. Second, we compare our method to a commercially available mechanical device, the Cervical Range of Motion device: four subjects each adopted 43 distinct head postures that were measured using both methods. RESULTS: The Cambridge Face Tracker achieved confident facial recognition in 92% of the approximately 38,000 frames of video from the three databases. The respective mean errors in absolute head posture were 3.34°, 3.86°, and 2.81°, with median errors of 1.97°, 2.16°, and 1.96°. Accuracy decreased with more extreme head posture. Comparing the Cambridge Face Tracker to the Cervical Range of Motion device gave correlation coefficients of 0.99 (P < 0.0001), 0.96 (P < 0.0001), and 0.99 (P < 0.0001) for yaw, pitch, and roll, respectively. CONCLUSIONS: The Cambridge Face Tracker performs well under real-world conditions and within the range of normally encountered head posture. It allows useful quantification of head posture in real time or from precaptured video. Its performance is similar to that of a clinically validated mechanical device. It has significant advantages over other approaches: subjects do not need to wear any apparatus, and it requires only low-cost, easy-to-set-up consumer electronics. TRANSLATIONAL RELEVANCE: Noncontact assessment of head posture allows more complete clinical assessment of patients, and could benefit surgical planning in the future.
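The yaw, pitch and roll values reported above are recovered from a tracked head rotation by a standard Euler-angle decomposition; a minimal sketch follows. The Z-Y-X (yaw–pitch–roll) convention used here is an assumption for illustration — the paper does not state which convention the tracker uses.

```python
import numpy as np

def rotation_to_ypr(R):
    """Yaw, pitch, roll in degrees from a 3x3 head rotation matrix.

    Assumes R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (Z-Y-X convention),
    so R[2,0] = -sin(pitch), R[1,0]/R[0,0] = tan(yaw),
    R[2,1]/R[2,2] = tan(roll).
    """
    pitch = np.degrees(np.arcsin(-R[2, 0]))
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    return yaw, pitch, roll

# Usage: a pure 30° rotation about the vertical axis is pure yaw.
th = np.radians(30.0)
head_R = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
yaw, pitch, roll = rotation_to_ypr(head_R)
```

Per-axis errors like those reported (mean vs. median in degrees) would then be computed on these angle triples against the ground-truth postures.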
Estimation of the accuracy of tree and log volume tables
The objects of this study are the stems of pine, spruce, birch and alder, and the round wood produced from them. The goal is to analyse the wood volume differences that arise when wood accounting uses tables of stem-with-bark volume, tree volume structure and log volume, with reference stem volumes computed by the compound Huber formula. The methods are statistical and empirical. The results show that stem volume tables overstate the stem-with-bark volume of all tree species taken together by 2.2% on average: pine stem-with-bark volumes are overstated by 4.4% on average, while black alder stem-with-bark volumes are understated by 2.5%. Data from the checked felling-site samples show that the merchantable wood volume of all species taken together, determined from tree volume structure tables, is overstated by 3.5%. The average volume of non-merchantable wood in the plots is 15%. The accuracy of the log volume tables is sufficient for estimating round wood volume; the volume difference error is not significant. Keywords: stems, wood, production, volume differences, bark. (Žemės ūkio akademija, Vytauto Didžiojo universitetas)
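The compound Huber formula used for the reference stem volumes computes each section's volume from its mid-length diameter and sums over the sections. A minimal sketch, assuming SI units (function names and the example dimensions are illustrative, not taken from the study):

```python
import math

def huber_volume(mid_diameter_m, length_m):
    """Huber's formula: log volume from the mid-length diameter.

    V = (pi * d_m^2 / 4) * L, i.e. the cross-sectional area at the
    middle of the log multiplied by its length, in cubic metres.
    """
    return math.pi * mid_diameter_m ** 2 / 4.0 * length_m

def compound_huber_volume(mid_diameters_m, section_length_m):
    """Compound (sectional) Huber formula: the stem is cut into equal
    sections and the Huber volumes of the sections are summed."""
    return sum(huber_volume(d, section_length_m) for d in mid_diameters_m)

# Usage: a 5 m log with a 0.30 m mid-diameter.
v_log = huber_volume(0.30, 5.0)           # about 0.353 m^3
v_stem = compound_huber_volume([0.32, 0.26, 0.18], 2.0)
```

Volumes computed this way serve as the baseline against which the table-based volumes are compared (e.g. the +2.2% average overstatement reported above).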
Crowdsourcing in emotion studies across time and culture
Crowdsourcing is becoming increasingly popular as a cheap and effective tool for multimedia annotation. However, the idea is not new, and can be traced back to Charles Darwin. He was interested in studying the universality of facial expressions in conveying emotions, and thus had to consider a global population. Access to different cultures allowed him to reach more general conclusions. In this paper, we highlight a few milestones in the history of the study of emotion that share the concepts of crowdsourcing. We first consider the study of posed photographs and then move to videos of natural expressions. We present our use of crowdsourcing to label a video corpus of natural expressions, and also to recreate one of Darwin's original emotion judgment experiments. This allows us to compare people's perception of emotional expressions in the 19th and 21st centuries, showing that it remains stable through both culture and time.