16 research outputs found

    Temporal Attention-Gated Model for Robust Sequence Classification

    Typical techniques for sequence classification are designed for well-segmented sequences which have been edited to remove noisy or irrelevant parts. Therefore, such methods cannot be easily applied to the noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM) which integrates ideas from attention models and gated recurrent networks to better deal with noisy or unsegmented sequences. Specifically, we extend the concept of attention model to measure the relevance of each observation (time step) of a sequence. We then use a novel gated recurrent network to learn the hidden representation for the final prediction. An important advantage of our approach is interpretability since the temporal attention weights provide a meaningful value for the salience of each time step in the sequence. We demonstrate the merits of our TAGM approach, both for prediction accuracy and interpretability, on three different tasks: spoken digit recognition, text-based sentiment analysis and visual event recognition.
    Comment: Accepted by CVPR 201

    SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras

    Our work addresses the problem of egocentric human pose estimation from downwards-facing cameras on head-mounted devices (HMD). This presents a challenging scenario, as parts of the body often fall outside of the image or are occluded. Previous solutions minimize this problem by using fish-eye camera lenses to capture a wider view, but these can present hardware design issues. They also predict 2D heat-maps per joint and lift them to 3D space to deal with self-occlusions, but this requires large network architectures which are impractical to deploy on resource-constrained HMDs. We predict pose from images captured with conventional rectilinear camera lenses. This resolves hardware design issues, but means body parts are often out of frame. As such, we directly regress probabilistic joint rotations represented as matrix Fisher distributions for a parameterized body model. This allows us to quantify pose uncertainties and explain out-of-frame or occluded joints. This also removes the need to compute 2D heat-maps and allows for simplified DNN architectures which require less compute. Given the lack of egocentric datasets using rectilinear camera lenses, we introduce the SynthEgo dataset, a synthetic dataset with 60K stereo images containing high diversity of pose, shape, clothing and skin tone. Our approach achieves state-of-the-art results for this challenging configuration, reducing mean per-joint position error by 23% overall and 58% for the lower body. Our architecture also has eight times fewer parameters and runs twice as fast as the current state-of-the-art. Experiments show that training on our synthetic dataset leads to good generalization to real world images without fine-tuning.
    Comment: Accepted in 3DV 202
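The abstract above reports results as mean per-joint position error. As a rough illustration only, the sketch below computes that metric under its usual definition: the average Euclidean distance between predicted and ground-truth joint positions. The function names `mpjpe` and `relative_reduction` and the sample numbers are hypothetical. They are not taken from the paper, and the paper's exact evaluation protocol (alignment, units, joint set) is not stated in the abstract.

```python
import math


def mpjpe(predicted, target):
    """Mean per-joint position error.

    Both arguments are equal-length sequences of 3D joint positions.
    Returns the average Euclidean distance between matching joints.
    """
    if len(predicted) != len(target) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    total = sum(math.dist(p, t) for p, t in zip(predicted, target))
    return total / len(predicted)


def relative_reduction(baseline_error, new_error):
    """Fractional error reduction relative to a baseline (0.23 means 23%)."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical two-joint pose: the first joint is exact, the second is off by 2 units.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(mpjpe(pred, truth))
```

A claim such as "reducing error by 23%" corresponds to `relative_reduction(baseline, new)` being about 0.23 for the two methods' errors on the same test set.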

    The Cambridge Face Tracker: Accurate, Low Cost Measurement of Head Posture Using Computer Vision and Face Recognition Software.

    PURPOSE: We validate a video-based method of head posture measurement. METHODS: The Cambridge Face Tracker uses neural networks (constrained local neural fields) to recognize facial features in video. The relative position of these facial features is used to calculate head posture. First, we assess the accuracy of this approach against videos in three research databases where each frame is tagged with a precisely measured head posture. Second, we compare our method to a commercially available mechanical device, the Cervical Range of Motion device: four subjects each adopted 43 distinct head postures that were measured using both methods. RESULTS: The Cambridge Face Tracker achieved confident facial recognition in 92% of the approximately 38,000 frames of video from the three databases. The respective mean error in absolute head posture was 3.34°, 3.86°, and 2.81°, with a median error of 1.97°, 2.16°, and 1.96°. The accuracy decreased with more extreme head posture. Comparing the Cambridge Face Tracker to the Cervical Range of Motion device gave correlation coefficients of 0.99 (P < 0.0001), 0.96 (P < 0.0001), and 0.99 (P < 0.0001) for yaw, pitch, and roll, respectively. CONCLUSIONS: The Cambridge Face Tracker performs well under real-world conditions and within the range of normally encountered head posture. It allows useful quantification of head posture in real time or from precaptured video. Its performance is similar to that of a clinically validated mechanical device. It has significant advantages over other approaches in that subjects do not need to wear any apparatus, and it requires only low-cost, easy-to-set-up consumer electronics. TRANSLATIONAL RELEVANCE: Noncontact assessment of head posture allows more complete clinical assessment of patients, and could benefit surgical planning in future.

    Estimation of accuracy of the trees and logs volume tables

    Object of study: stems of pine, spruce, birch and alder trees and the roundwood products made from them. Aim: to analyze the differences in wood volume that arise when timber is accounted for using stem-volume-over-bark tables, tree volume structure tables and log volume tables, comparing these with stem volumes computed by the compound Huber formula. Methods: statistical and empirical. Results: the stem volume tables overestimate the over-bark stem volume of all tree species combined by 2.2% on average. They overestimate the over-bark stem volume of pine by 4.4% on average and underestimate that of black alder by 2.5%. Data from the checked cutting-area samples show that the merchantable wood volume of all species combined, as determined from the tree volume structure tables, is overestimated by 3.5%. Non-merchantable wood makes up 15% of the volume in the sample plots on average. The log volume tables are accurate enough for estimating roundwood volume; the error in the volume difference is not significant. Keywords: stems, wood, production, volume differences, bark.
    Žemės ūkio akademija, Vytauto Didžiojo universiteta

    Crowdsourcing in emotion studies across time and culture

    Crowdsourcing is becoming increasingly popular as a cheap and effective tool for multimedia annotation. However, the idea is not new, and can be traced back to Charles Darwin. He was interested in studying the universality of facial expressions in conveying emotions, thus he had to consider a global population. Access to different cultures allowed him to reach more general conclusions. In this paper, we highlight a few milestones in the history of the study of emotion that share the concepts of crowdsourcing. We first consider the study of posed photographs and then move to videos of natural expressions. We present our use of crowdsourcing to label a video corpus of natural expressions, and also to recreate one of Darwin’s original emotion judgment experiments. This allows us to compare people’s perception of emotional expressions in the 19th and 21st centuries, showing that it remains stable through both culture and time.