Temporal Attention-Gated Model for Robust Sequence Classification
Typical techniques for sequence classification are designed for
well-segmented sequences which have been edited to remove noisy or irrelevant
parts. Therefore, such methods cannot be easily applied to noisy sequences
expected in real-world applications. In this paper, we present the Temporal
Attention-Gated Model (TAGM) which integrates ideas from attention models and
gated recurrent networks to better deal with noisy or unsegmented sequences.
Specifically, we extend the concept of attention model to measure the relevance
of each observation (time step) of a sequence. We then use a novel gated
recurrent network to learn the hidden representation for the final prediction.
An important advantage of our approach is interpretability since the temporal
attention weights provide a meaningful value for the salience of each time step
in the sequence. We demonstrate the merits of our TAGM approach, both for
prediction accuracy and interpretability, on three different tasks: spoken
digit recognition, text-based sentiment analysis and visual event recognition.
Comment: Accepted by CVPR 201
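The abstract does not state the update rule, so the following is only a minimal sketch of how a per-time-step attention weight could gate a recurrent state. It assumes the hidden state is a convex blend of the previous state and a tanh candidate; the scalar weights `w_h` and `w_x`, the function names, and the attention values supplied by the caller are all hypothetical and are not taken from the paper.

```python
import math


def attention_gated_step(h_prev, candidate, attention):
    """Blend the previous hidden state with a candidate state.

    attention is a scalar in [0, 1]: 0 keeps the old state (the time step
    is treated as irrelevant), 1 replaces it with the candidate.
    """
    return [(1.0 - attention) * hp + attention * c
            for hp, c in zip(h_prev, candidate)]


def run_sequence(inputs, attentions, w_h=0.5, w_x=1.0):
    """Run a one-dimensional attention-gated recurrence over a sequence.

    w_h and w_x are illustrative fixed weights (assumptions); a real model
    would learn them, and would also learn the attention weights.
    """
    h = [0.0]
    for x, a in zip(inputs, attentions):
        candidate = [math.tanh(w_h * h[0] + w_x * x)]
        h = attention_gated_step(h, candidate, a)
    return h
```

With attention weights of zero on noisy time steps, those observations leave the hidden state untouched, which is the sense in which the weights can be read as per-step salience.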
The Cambridge Face Tracker: Accurate, Low Cost Measurement of Head Posture Using Computer Vision and Face Recognition Software.
PURPOSE: We validate a video-based method of head posture measurement. METHODS: The Cambridge Face Tracker uses neural networks (constrained local neural fields) to recognize facial features in video. The relative position of these facial features is used to calculate head posture. First, we assess the accuracy of this approach against videos in three research databases where each frame is tagged with a precisely measured head posture. Second, we compare our method to a commercially available mechanical device, the Cervical Range of Motion device: four subjects each adopted 43 distinct head postures that were measured using both methods. RESULTS: The Cambridge Face Tracker achieved confident facial recognition in 92% of the approximately 38,000 frames of video from the three databases. The respective mean error in absolute head posture was 3.34°, 3.86°, and 2.81°, with a median error of 1.97°, 2.16°, and 1.96°. The accuracy decreased with more extreme head posture. Comparing The Cambridge Face Tracker to the Cervical Range of Motion Device gave correlation coefficients of 0.99 (P < 0.0001), 0.96 (P < 0.0001), and 0.99 (P < 0.0001) for yaw, pitch, and roll, respectively. CONCLUSIONS: The Cambridge Face Tracker performs well under real-world conditions and within the range of normally-encountered head posture. It allows useful quantification of head posture in real time or from precaptured video. Its performance is similar to that of a clinically validated mechanical device. It has significant advantages over other approaches in that subjects do not need to wear any apparatus, and it requires only low cost, easy-to-setup consumer electronics. TRANSLATIONAL RELEVANCE: Noncontact assessment of head posture allows more complete clinical assessment of patients, and could benefit surgical planning in future
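As a rough illustration of how the relative position of facial features can yield a head angle, the sketch below estimates roll alone from two eye landmarks. It is not the Cambridge Face Tracker's method: the function name, the landmark format, and the image-coordinate convention (x to the right, y downward) are assumptions, and yaw and pitch would need a 3D face model that is not shown here.

```python
import math


def roll_from_eyes(eye_a, eye_b):
    """Estimate head roll in degrees from two (x, y) eye landmarks.

    eye_a is the eye appearing on the left of the image and eye_b the one
    on the right. Pixel coordinates are assumed, with y growing downward,
    so a positive result means the right-hand eye sits lower in the image.
    """
    dx = eye_b[0] - eye_a[0]
    dy = eye_b[1] - eye_a[1]
    return math.degrees(math.atan2(dy, dx))
```

A level face gives an angle near zero, and the angle grows as the line joining the eyes tilts.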
Geometries of Light and Shadows, from Piero della Francesca to James Turrell
This chapter addresses the problem of representing light and shadow in artistic culture, from its uncertain beginnings, related to the studies on conical linear perspective in the fifteenth century, to the applications of light projection in the installations of contemporary art.
In particular, it examines two works by two artists, representing two different conceptual approaches to the perception and symbolism of light and shadow. The first is the so-called Brera Madonna by Piero della Francesca, where the image projected from a luminous radiation is employed with a narrative purpose, supporting the apparently hidden script of the painting and according to the artist’s own speculations about perspective as a means to clarify the phenomenal world.
The second is one of James Turrell’s Dark Spaces installations, where the quantum electrodynamics interpretation of light is taken into account: for Turrell, light is physical and thus can shape spaces where the visitors, or viewers, can “see themselves seeing.” In his body of work, perceptual deceptions are carefully produced by the interaction of the senses with his phenomenal staging of light and darkness, but a strong symbolic component is always present, often related to his own speculative interests.
In both cases, light and shadow, through their geometries, emphasize both phenomenal and spiritual contents of the work of art, intended as a device to expand the perception and the knowledge of the viewer.
Learning Tversky Similarity
In this paper, we advocate Tversky's ratio model as an appropriate basis for
computational approaches to semantic similarity, that is, the comparison of
objects such as images in a semantically meaningful way. We consider the
problem of learning Tversky similarity measures from suitable training data
indicating whether two objects tend to be similar or dissimilar.
Experimentally, we evaluate our approach to similarity learning on two image
datasets, showing that it performs very well compared to existing methods.
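For reference, Tversky's ratio model scores two objects by their common features relative to their distinctive features: S(A, B) = f(A ∩ B) / (f(A ∩ B) + α f(A − B) + β f(B − A)). The sketch below is a minimal version that assumes objects are given as feature sets and that the salience function f is plain set size. The parameters are fixed by hand here, whereas the paper is concerned with learning the measure from training data, and the function name is illustrative.

```python
def tversky_similarity(a, b, alpha=0.5, beta=0.5):
    """Tversky ratio model on two feature sets, using set size as salience.

    alpha weights features found only in a, beta weights features found
    only in b. alpha = beta = 1 gives the Jaccard index and
    alpha = beta = 0.5 gives the Dice coefficient.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denom = common + alpha * only_a + beta * only_b
    # Convention chosen for this sketch: a zero denominator (for example,
    # two empty feature sets) counts as identical.
    return common / denom if denom else 1.0
```

Choosing alpha different from beta makes the measure asymmetric, so the similarity of a to b can differ from that of b to a.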
Detecting human Activities Based on a multimodal sensor data set using a bidirectional long short-term memory model: a case study
Human falls are one of the leading causes of fatal unintentional injuries
worldwide. Falls result in a direct financial cost to health systems, and indirectly,
to society’s productivity. Unsurprisingly, human fall detection and prevention is
a major focus of health research. In this chapter, we present and evaluate several
bidirectional long short-term memory (Bi-LSTM) models using a data set provided
by the Challenge UP competition. The main goal of this study is to detect 12 human
daily activities (six daily human activities, five falls, and one post-fall activity)
derived from multi-modal data sources - wearable sensors, ambient sensors, and
vision devices. Our proposed Bi-LSTM model leverages data from accelerometer
and gyroscope sensors located at the ankle, right pocket, belt, and neck of the subject.
We utilize a grid search technique to evaluate variations of the Bi-LSTM model and
identify a configuration that presents the best results. The best Bi-LSTM model
achieved good results for precision and F1-score, 43.30% and 38.50%, respectively.
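The grid search mentioned above can be sketched generically as an exhaustive loop over hyperparameter combinations. The abstract names neither the hyperparameters nor the selection metric, so any grid passed in (for example `hidden_units` or `dropout`) is hypothetical, and `evaluate` is a stand-in for training a Bi-LSTM with the given settings and returning a validation score.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and keep the best-scoring one.

    param_grid maps a hyperparameter name to a list of candidate values.
    evaluate takes a dict of settings and returns a score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The number of evaluations is the product of the list lengths, so each added hyperparameter multiplies the cost of the search.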
Learning-Based confidence estimation for Multi-modal classifier fusion
We propose a novel confidence estimation method for predictions from a multi-class classifier. Unlike existing methods, we learn a confidence estimator on the basis of a held-out set from the training data. The confidence values predicted by the proposed system are used to improve the accuracy of multi-modal emotion and sentiment classification. The scores of different classes from the individual modalities are superposed on the basis of confidence values. Experimental results demonstrate that the accuracy of the proposed confidence-based fusion method is significantly superior to that of the classifier trained on any modality separately, and achieves superior performance compared to other fusion methods.
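One simple way to superpose class scores on the basis of confidence values is a normalized weighted sum, sketched below. This is an assumption about the form of the fusion, not the paper's exact rule: the confidences are taken as given inputs rather than produced by a learned estimator, and the function name and modality keys are hypothetical.

```python
def fuse_scores(scores_by_modality, confidence_by_modality):
    """Fuse per-class scores from several modalities by confidence weighting.

    scores_by_modality maps a modality name to a list of class scores
    (same length for every modality). confidence_by_modality maps the same
    names to non-negative confidence values. Returns the fused score list
    and the index of the highest-scoring class.
    """
    total = sum(confidence_by_modality[name] for name in scores_by_modality)
    if total <= 0:
        raise ValueError("at least one modality needs a positive confidence")
    n_classes = len(next(iter(scores_by_modality.values())))
    fused = [0.0] * n_classes
    for name, scores in scores_by_modality.items():
        weight = confidence_by_modality[name] / total
        for k, score in enumerate(scores):
            fused[k] += weight * score
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted
```

With this weighting, a modality that is confident about its prediction dominates the fused scores, while equal confidences reduce the rule to a plain average.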