FEELS: a full-spectrum enhanced emotion learning system for assisting individuals with autism spectrum disorder
Autism Spectrum Disorder (ASD) is a developmental disorder that can lead to a variety of social and communication challenges, and individuals with ASD are at a higher risk of loneliness and depression as a result of the disconnect and isolation they may feel from the rest of society. Interventions targeting improved emotion detection have been clinically shown to be quite promising; however, considerable barriers make it challenging to incorporate emotion detection within daily-life scenarios. Motivated by the need to fill this gap, we introduce the concept of FEELS, a full-spectrum enhanced emotion learning system that could serve as a tool to assist individuals with ASD. FEELS facilitates enhanced emotion detection by capturing a live video stream of individuals in real time, leveraging deep convolutional neural networks to detect facial landmarks, and then applying a custom hybrid neural network, consisting of a time-distributed feed-forward neural network and an LSTM neural network, to determine the emotional state of the individuals based on a sequence of facial landmarks over time. The feasibility of this approach was explored through the construction of a proof-of-concept FEELS system that can distinguish between five different basic emotional states: neutral, sad, happy, surprise, and anger. Future work includes extending the proof-of-concept FEELS system to detect more emotional states and evaluating the system in more natural settings.
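The hybrid architecture the abstract describes (a time-distributed feed-forward layer applied per frame, followed by an LSTM over the landmark sequence) can be sketched in plain NumPy. The window length, layer sizes, and random weights below are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

SEQ_LEN, N_IN, HIDDEN, N_EMOTIONS = 30, 68 * 2, 16, 5  # 68 (x, y) landmarks
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h, c, W, U, b):
    # Standard LSTM cell: input, forget, output gates plus candidate state.
    z = x_t @ W + h @ U + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

# Fake landmark sequence: 30 frames of flattened (x, y) landmark coordinates.
x = rng.standard_normal((SEQ_LEN, N_IN))

# Time-distributed feed-forward layer: the same dense mapping is applied
# to every frame's landmark vector independently.
W_td = rng.standard_normal((N_IN, HIDDEN)) * 0.1
frames = np.tanh(x @ W_td)                      # (SEQ_LEN, HIDDEN)

# LSTM aggregates the per-frame embeddings over time.
W = rng.standard_normal((HIDDEN, 4 * HIDDEN)) * 0.1
U = rng.standard_normal((HIDDEN, 4 * HIDDEN)) * 0.1
b = np.zeros(4 * HIDDEN)
h = c = np.zeros(HIDDEN)
for f_t in frames:
    h, c = lstm_step(f_t, h, c, W, U, b)

# Softmax head over the five basic emotional states.
W_out = rng.standard_normal((HIDDEN, N_EMOTIONS)) * 0.1
logits = h @ W_out
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.shape)  # (5,)
```

The final `probs` vector would be the system's per-frame-window emotion estimate; in a trained system the weights would of course be learned rather than random.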
A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively by visually assessing the result of a deformable face tracking
technology on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic.
Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
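The three strategies compared in the evaluation can be sketched as pipeline skeletons. Here `detect_face`, `track_box`, `localise_landmarks`, and `confident` are hypothetical placeholders for whatever face detector, model-free tracker, landmark localiser, and failure check a given pipeline plugs in; the hybrid re-detection heuristic in (c) is one plausible variant, not a specific method from the paper.

```python
def pipeline_a(frames, detect_face, localise_landmarks):
    # (a) generic face detection + landmark localisation, run per frame
    return [localise_landmarks(f, detect_face(f)) for f in frames]

def pipeline_b(frames, detect_face, track_box, localise_landmarks):
    # (b) detect once, then model-free tracking of the face box
    box = detect_face(frames[0])
    out = []
    for f in frames:
        box = track_box(f, box)
        out.append(localise_landmarks(f, box))
    return out

def pipeline_c(frames, detect_face, track_box, localise_landmarks, confident):
    # (c) hybrid: track, but fall back to re-detection whenever the
    # landmark fit looks unreliable
    box = detect_face(frames[0])
    out = []
    for f in frames:
        box = track_box(f, box)
        shape = localise_landmarks(f, box)
        if not confident(shape):
            box = detect_face(f)
            shape = localise_landmarks(f, box)
        out.append(shape)
    return out
```

The trade-off the evaluation probes is visible in the structure: (a) pays detection cost every frame but cannot drift, (b) is fast but can drift once the tracker fails, and (c) buys robustness with occasional re-detections.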
Data Fusion for Real-time Multimodal Emotion Recognition through Webcams and Microphones in E-Learning
The original article is available on the Taylor & Francis Online website at the following link: http://www.tandfonline.com/doi/abs/10.1080/10447318.2016.1159799?journalCode=hihc20
This paper describes the validation study of our software that uses combined webcam and microphone data for real-time, continuous, unobtrusive emotion recognition as part of our FILTWAM framework. FILTWAM aims at deploying a real-time multimodal emotion recognition method for providing more adequate feedback to learners through an online communication skills training. Here, timely feedback is needed that reflects learners' shown intended emotions and that is also useful for increasing their awareness of their own behaviour. At the least, a reliable and valid software interpretation of performed facial and vocal emotions is needed to warrant such adequate feedback. This validation study therefore calibrates our software. The study uses a multimodal fusion method. Twelve test persons performed computer-based tasks in which they were asked to mimic specific facial and vocal emotions. All test persons' behaviour was recorded on video, and two raters independently scored the shown emotions, which were contrasted with the software recognition outcomes. A hybrid method for multimodal fusion in our multimodal software shows accuracy between 96.1% and 98.6% for the best-chosen WEKA classifiers over predicted emotions. The software fulfils its requirements of real-time data interpretation and reliable results.
The Netherlands Laboratory for Lifelong Learning (NELLL) of the Open University Netherlands
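One simple form of multimodal fusion consistent with the description above is decision-level fusion: a weighted average of the per-modality class probabilities followed by an argmax. The labels, weight, and probability vectors below are illustrative, and this generic combiner stands in for, rather than reproduces, the FILTWAM hybrid method and its WEKA classifiers.

```python
# Decision-level (late) fusion of face and voice emotion classifiers.
EMOTIONS = ["happy", "sad", "angry", "surprised", "neutral"]

def fuse(face_probs, voice_probs, w_face=0.6):
    # Weighted average of the two modalities' class probabilities,
    # then pick the highest-scoring emotion label.
    fused = [w_face * f + (1 - w_face) * v
             for f, v in zip(face_probs, voice_probs)]
    return EMOTIONS[max(range(len(fused)), key=fused.__getitem__)]

face = [0.70, 0.05, 0.05, 0.10, 0.10]   # webcam model output
voice = [0.30, 0.10, 0.05, 0.45, 0.10]  # microphone model output
print(fuse(face, voice))  # -> happy
```

With `w_face=0.0` the same call would defer entirely to the voice model; tuning such weights against rater-scored ground truth is exactly the kind of calibration the validation study performs.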
Speaker-following Video Subtitles
We propose a new method for improving the presentation of subtitles in video
(e.g. TV and movies). With conventional subtitles, the viewer has to constantly
look away from the main viewing area to read the subtitles at the bottom of the
screen, which disrupts the viewing experience and causes unnecessary eyestrain.
Our method places on-screen subtitles next to the respective speakers to allow
the viewer to follow the visual content while simultaneously reading the
subtitles. We use novel identification algorithms to detect the speakers based
on audio and visual information. Then the placement of the subtitles is
determined using global optimization. A comprehensive usability study indicated
that our subtitle placement method outperformed both conventional
fixed-position subtitling and another previous dynamic subtitling method in
terms of enhancing the overall viewing experience and reducing eyestrain.
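The placement step can be illustrated as a small global optimisation: each subtitle is assigned a candidate screen position that stays close to its speaker while avoiding collisions with other subtitles. The brute-force search and the particular cost terms below are illustrative stand-ins for the paper's optimisation, not its actual objective.

```python
from itertools import product

def cost(positions, speakers, overlap_penalty=100.0):
    # Distance of each subtitle to its speaker, plus a large penalty
    # whenever two subtitles land on the same slot.
    c = sum(abs(px - sx) + abs(py - sy)
            for (px, py), (sx, sy) in zip(positions, speakers))
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if positions[i] == positions[j]:
                c += overlap_penalty
    return c

def place(speakers, candidates):
    # Exhaustively try every assignment of candidate slots to subtitles
    # and keep the globally cheapest one.
    return min(product(candidates, repeat=len(speakers)),
               key=lambda pos: cost(pos, speakers))

speakers = [(2, 3), (8, 3)]            # on-screen speaker locations
candidates = [(2, 5), (8, 5), (5, 9)]  # allowed subtitle slots
print(place(speakers, candidates))     # -> ((2, 5), (8, 5))
```

Exhaustive search is fine for a handful of subtitles and slots; a production system would swap in a scalable solver, but the objective (proximity to the speaker, no overlaps) stays the same.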