Automatic facial expression analysis
Humans spend a large amount of their time interacting with computers of one type or another. However, computers are emotionally blind and indifferent to the affective states of their users. Human-computer interaction that does not consider emotion ignores a whole channel of available information.
Faces contain a large portion of our emotionally expressive behaviour. We use facial expressions to display our emotional states and to manage our interactions. Furthermore, we express and read emotions in faces effortlessly. However, automatic understanding of facial expressions is a very difficult task computationally, especially in the presence of highly variable pose, expression and illumination. My work furthers the field of automatic facial expression tracking by tackling these issues, bringing emotionally aware computing closer to reality.
Firstly, I present an in-depth analysis of the Constrained Local Model (CLM) for facial expression and head pose tracking. I propose a number of extensions that make the location of facial features more accurate.
Secondly, I introduce a 3D Constrained Local Model (CLM-Z) which takes full advantage of depth information available from various range scanners. CLM-Z is robust to changes in illumination and shows better facial tracking performance.
Thirdly, I present the Constrained Local Neural Field (CLNF), a novel instance of CLM that deals with the issues of facial tracking in complex scenes. It achieves this through the use of a novel landmark detector and a novel CLM fitting algorithm. CLNF outperforms state-of-the-art models for facial tracking in the presence of difficult illumination and varying pose.
Lastly, I demonstrate how tracked facial expressions can be used for emotion inference from videos. I also show how the tools developed for facial tracking can be applied to emotion inference in music.
Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for well-segmented sequences that have been edited to remove noisy or irrelevant parts. Such methods therefore cannot easily be applied to the noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM), which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences. Specifically, we extend the concept of attention models to measure the relevance of each observation (time step) of a sequence. We then use a novel gated recurrent network to learn the hidden representation for the final prediction. An important advantage of our approach is interpretability, since the temporal attention weights provide a meaningful measure of the salience of each time step in the sequence. We demonstrate the merits of our TAGM approach, both for prediction accuracy and interpretability, on three different tasks: spoken digit recognition, text-based sentiment analysis and visual event recognition.
Comment: Accepted by CVPR 201
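The core idea above — an attention weight gating how much each observation updates the recurrent hidden state — can be sketched as follows. This is a minimal NumPy illustration of the mechanism, not the paper's implementation; the tanh candidate update and the weight matrices are illustrative assumptions.

```python
import numpy as np

def tagm_step(h_prev, x_t, a_t, W_h, W_x, b):
    """One attention-gated recurrent step (sketch of the TAGM idea).

    a_t in [0, 1] is the attention weight for this time step: the hidden
    state is updated in proportion to how relevant the observation is,
    and is simply carried over when a_t is near zero (a noisy step).
    """
    candidate = np.tanh(W_h @ h_prev + W_x @ x_t + b)   # recurrent candidate
    return (1.0 - a_t) * h_prev + a_t * candidate        # gated interpolation

# Toy usage: with a_t = 0 the observation is ignored entirely;
# with a_t = 1 the state is fully replaced by the candidate.
W_h, W_x, b = np.zeros((2, 2)), np.eye(2), np.zeros(2)
h0 = np.array([0.5, -0.5])
x = np.array([1.0, 2.0])
h_ignored = tagm_step(h0, x, 0.0, W_h, W_x, b)   # equals h0
h_salient = tagm_step(h0, x, 1.0, W_h, W_x, b)   # equals tanh(x)
```

The interpretability claim follows directly from this form: the scalar `a_t` per time step is the model's estimate of that observation's salience.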
SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras
Our work addresses the problem of egocentric human pose estimation from downwards-facing cameras on head-mounted devices (HMDs). This is a challenging scenario, as parts of the body often fall outside the image or are occluded. Previous solutions minimize this problem by using fish-eye camera lenses to capture a wider view, but these can present hardware design issues. They also predict 2D heat-maps per joint and lift them to 3D space to deal with self-occlusions, but this requires large network architectures that are impractical to deploy on resource-constrained HMDs. We predict pose from images captured with conventional rectilinear camera lenses. This resolves the hardware design issues, but means body parts are often out of frame. As such, we directly regress probabilistic joint rotations, represented as matrix Fisher distributions, for a parameterized body model. This allows us to quantify pose uncertainty and to explain out-of-frame or occluded joints. It also removes the need to compute 2D heat-maps, allowing for simplified DNN architectures that require less compute. Given the lack of egocentric datasets using rectilinear camera lenses, we introduce SynthEgo, a synthetic dataset of 60K stereo images with a high diversity of pose, shape, clothing and skin tone. Our approach achieves state-of-the-art results for this challenging configuration, reducing mean per-joint position error by 23% overall and by 58% for the lower body. Our architecture also has eight times fewer parameters and runs twice as fast as the current state of the art. Experiments show that training on our synthetic dataset leads to good generalization to real-world images without fine-tuning.
Comment: Accepted in 3DV 202
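The matrix Fisher distribution mentioned above places a density p(R) ∝ exp(tr(FᵀR)) over rotation matrices, so a single parameter matrix F encodes both a most-likely rotation and its concentration. Recovering that mode via an SVD is a standard property of the distribution; the sketch below illustrates it and is not the paper's code (the example rotation is an arbitrary assumption).

```python
import numpy as np

def matrix_fisher_mode(F):
    """Mode of a matrix Fisher distribution over rotations, p(R) ∝ exp(tr(Fᵀ R)).

    The mode is the rotation maximizing tr(Fᵀ R), recovered from the SVD
    of F with a determinant correction so the result lies in SO(3).
    Larger singular values of F mean a more concentrated (less uncertain)
    distribution about this mode.
    """
    U, s, Vt = np.linalg.svd(F)
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # keep det(R) = +1
    return U @ D @ Vt

# Usage: a parameter matrix proportional to a rotation has that rotation
# as its mode; the scale factor controls concentration, not the mode.
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
mode = matrix_fisher_mode(5.0 * Rz)
```

Regressing F directly is what lets a small network output both a pose estimate and a per-joint uncertainty in one shot, without heat-maps.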
The Cambridge Face Tracker: Accurate, Low Cost Measurement of Head Posture Using Computer Vision and Face Recognition Software.
PURPOSE: We validate a video-based method of head posture measurement. METHODS: The Cambridge Face Tracker uses neural networks (constrained local neural fields) to recognize facial features in video. The relative position of these facial features is used to calculate head posture. First, we assess the accuracy of this approach against videos in three research databases where each frame is tagged with a precisely measured head posture. Second, we compare our method to a commercially available mechanical device, the Cervical Range of Motion device: four subjects each adopted 43 distinct head postures that were measured using both methods. RESULTS: The Cambridge Face Tracker achieved confident facial recognition in 92% of the approximately 38,000 frames of video from the three databases. The respective mean errors in absolute head posture were 3.34°, 3.86°, and 2.81°, with median errors of 1.97°, 2.16°, and 1.96°. Accuracy decreased with more extreme head posture. Comparing the Cambridge Face Tracker to the Cervical Range of Motion device gave correlation coefficients of 0.99 (P < 0.0001), 0.96 (P < 0.0001), and 0.99 (P < 0.0001) for yaw, pitch, and roll, respectively. CONCLUSIONS: The Cambridge Face Tracker performs well under real-world conditions and within the range of normally encountered head posture. It allows useful quantification of head posture in real time or from precaptured video. Its performance is similar to that of a clinically validated mechanical device. It has significant advantages over other approaches: subjects do not need to wear any apparatus, and it requires only low-cost, easy-to-set-up consumer electronics. TRANSLATIONAL RELEVANCE: Noncontact assessment of head posture allows more complete clinical assessment of patients, and could benefit surgical planning in the future.
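The yaw, pitch and roll values reported above are recovered from a tracked head rotation by a standard Euler-angle decomposition; a minimal sketch follows. The Z-Y-X (yaw–pitch–roll) convention used here is an assumption for illustration — the paper does not state which convention the tracker uses.

```python
import numpy as np

def rotation_to_ypr(R):
    """Yaw, pitch, roll in degrees from a 3x3 head rotation matrix.

    Assumes R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (Z-Y-X convention),
    so R[2,0] = -sin(pitch), R[1,0]/R[0,0] = tan(yaw),
    R[2,1]/R[2,2] = tan(roll).
    """
    pitch = np.degrees(np.arcsin(-R[2, 0]))
    yaw = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    roll = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    return yaw, pitch, roll

# Usage: a pure 30° rotation about the vertical axis is pure yaw.
th = np.radians(30.0)
head_R = np.array([[np.cos(th), -np.sin(th), 0.0],
                   [np.sin(th),  np.cos(th), 0.0],
                   [0.0,         0.0,        1.0]])
yaw, pitch, roll = rotation_to_ypr(head_R)
```

Per-axis errors like those reported (mean vs. median in degrees) would then be computed on these angle triples against the ground-truth postures.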
Estimation of the accuracy of tree and log volume tables
The objects of this study are the stems of pine, spruce, birch and alder, and the round wood produced from them. The goal is to analyse the wood volume differences that arise when wood accounting uses tables of stem-with-bark volume, tree volume structure and log volume, with reference stem volumes computed by the compound Huber formula. The methods are statistical and empirical. The results show that stem volume tables overstate the stem-with-bark volume of all tree species taken together by 2.2% on average: pine stem-with-bark volumes are overstated by 4.4% on average, while black alder stem-with-bark volumes are understated by 2.5%. Data from the checked felling-site samples show that the merchantable wood volume of all species taken together, determined from tree volume structure tables, is overstated by 3.5%. The average volume of non-merchantable wood in the plots is 15%. The accuracy of the log volume tables is sufficient for estimating round wood volume; the volume difference error is not significant. Keywords: stems, wood, production, volume differences, bark. (Žemės ūkio akademija, Vytauto Didžiojo universitetas)
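The compound Huber formula used for the reference stem volumes computes each section's volume from its mid-length diameter and sums over the sections. A minimal sketch, assuming SI units (function names and the example dimensions are illustrative, not taken from the study):

```python
import math

def huber_volume(mid_diameter_m, length_m):
    """Huber's formula: log volume from the mid-length diameter.

    V = (pi * d_m^2 / 4) * L, i.e. the cross-sectional area at the
    middle of the log multiplied by its length, in cubic metres.
    """
    return math.pi * mid_diameter_m ** 2 / 4.0 * length_m

def compound_huber_volume(mid_diameters_m, section_length_m):
    """Compound (sectional) Huber formula: the stem is cut into equal
    sections and the Huber volumes of the sections are summed."""
    return sum(huber_volume(d, section_length_m) for d in mid_diameters_m)

# Usage: a 5 m log with a 0.30 m mid-diameter.
v_log = huber_volume(0.30, 5.0)           # about 0.353 m^3
v_stem = compound_huber_volume([0.32, 0.26, 0.18], 2.0)
```

Volumes computed this way serve as the baseline against which the table-based volumes are compared (e.g. the +2.2% average overstatement reported above).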
Crowdsourcing in emotion studies across time and culture
Crowdsourcing is becoming increasingly popular as a cheap and effective tool for multimedia annotation. However, the idea is not new, and can be traced back to Charles Darwin. He was interested in studying the universality of facial expressions in conveying emotions, and thus had to consider a global population. Access to different cultures allowed him to reach more general conclusions. In this paper, we highlight a few milestones in the history of the study of emotion that share the concepts of crowdsourcing. We first consider the study of posed photographs and then move to videos of natural expressions. We present our use of crowdsourcing to label a video corpus of natural expressions, and also to recreate one of Darwin's original emotion judgment experiments. This allows us to compare people's perception of emotional expressions in the 19th and 21st centuries, showing that it remains stable through both culture and time.