10,724 research outputs found
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.Comment: E. Antonakos and P. Snape contributed equally and have joint second
authorshi
A fast and robust hand-driven 3D mouse
The development of new interaction paradigms requires a natural interaction. This means that people should be able to interact with technology with the same models used to interact with everyday real life, that is through gestures, expressions, voice. Following this idea, in this paper we propose a non intrusive vision based tracking system able to capture hand motion and simple hand gestures. The proposed device allows to use the hand as a "natural" 3D mouse, where the forefinger tip or the palm centre are used to identify a 3D marker and the hand gesture can be used to simulate the mouse buttons. The approach is based on a monoscopic tracking algorithm which is computationally fast and robust against noise and cluttered backgrounds. Two image streams are processed in parallel exploiting multi-core architectures, and their results are combined to obtain a constrained stereoscopic problem. The system has been implemented and thoroughly tested in an experimental environment where the 3D hand mouse has been used to interact with objects in a virtual reality application. We also provide results about the performances of the tracker, which demonstrate precision and robustness of the proposed syste
The Verbal and Non Verbal Signals of Depression -- Combining Acoustics, Text and Visuals for Estimating Depression Level
Depression is a serious medical condition that is suffered by a large number
of people around the world. It significantly affects the way one feels, causing
a persistent lowering of mood. In this paper, we propose a novel
attention-based deep neural network which facilitates the fusion of various
modalities. We use this network to regress the depression level. Acoustic, text
and visual modalities have been used to train our proposed network. Various
experiments have been carried out on the benchmark dataset, namely, Distress
Analysis Interview Corpus - a Wizard of Oz (DAIC-WOZ). From the results, we
empirically justify that the fusion of all three modalities helps in giving the
most accurate estimation of depression level. Our proposed approach outperforms
the state-of-the-art by 7.17% on root mean squared error (RMSE) and 8.08% on
mean absolute error (MAE).Comment: 10 pages including references, 2 figure
On Using Gait Biometrics to Enhance Face Pose Estimation
Many face biometrics systems use controlled environments where subjects are viewed directly facing the camera. This is less likely to occur in surveillance environments, so a process is required to handle the pose variation of the human head, change in illumination, and low frame rate of input image sequences. This has been achieved using scale invariant features and 3D models to determine the pose of the human subject. Then, a gait trajectory model is generated to obtain the correct the face region whilst handing the looming effect. In this way, we describe a new approach aimed to estimate accurate face pose. The contributions of this research include the construction of a 3D model for pose estimation from planar imagery and the first use of gait information to enhance the face pose estimation process
- âŠ