2,610 research outputs found

    FEELS: a full-spectrum enhanced emotion learning system for assisting individuals with autism spectrum disorder

    Autism Spectrum Disorder (ASD) is a developmental disorder that can lead to a variety of social and communication challenges, and individuals with ASD are at a higher risk of loneliness and depression because of the disconnect and isolation they may feel from the rest of society. Interventions targeting improved emotion detection have been clinically shown to be quite promising; however, there are considerable barriers that make it challenging to incorporate emotion detection within daily life scenarios. Motivated by the need to fill this gap, we introduce the concept of FEELS, a full-spectrum enhanced emotion learning system which could be useful as a tool to assist individuals with ASD. FEELS facilitates enhanced emotion detection by capturing a live video stream of individuals in real time, then leveraging deep convolutional neural networks to detect facial landmarks and a custom hybrid neural network, consisting of a time-distributed feed-forward neural network and an LSTM neural network, to determine the emotional state of the individuals based on a sequence of facial landmarks over time. The feasibility of such an approach was explored through the construction of a proof-of-concept FEELS system that can distinguish between five basic emotional states: neutral, sad, happy, surprise, and anger. Future work will include extending the proof-of-concept FEELS system to detect more emotional states and evaluating the system in more natural settings.
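The hybrid network described in this abstract (a feed-forward layer applied to every frame, followed by an LSTM over the resulting sequence) can be sketched in plain Python as below. This is only an illustrative forward pass under stated assumptions, not the FEELS implementation: the layer sizes, the 68-landmark input (136 coordinates per frame), the random untrained weights, the omission of bias terms, and all function names are hypothetical choices that do not come from the paper.

```python
import math
import random

# The five emotional states named in the abstract.
EMOTIONS = ["neutral", "sad", "happy", "surprise", "anger"]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real system would learn these from data."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense_relu(weights, frame):
    """Feed-forward layer (biases omitted for brevity)."""
    return [max(0.0, a) for a in matvec(weights, frame)]


def lstm_step(params, x, h, c):
    """One standard LSTM cell update on the concatenated [x, h] vector."""
    z = x + h  # list concatenation
    i = [sigmoid(a) for a in matvec(params["i"], z)]    # input gate
    f = [sigmoid(a) for a in matvec(params["f"], z)]    # forget gate
    o = [sigmoid(a) for a in matvec(params["o"], z)]    # output gate
    g = [math.tanh(a) for a in matvec(params["g"], z)]  # candidate state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def make_model(n_inputs, n_features=16, n_hidden=8, seed=0):
    """Hypothetical sizes; the abstract does not specify any of them."""
    rng = random.Random(seed)
    return {
        "dense": rand_matrix(n_features, n_inputs, rng),
        "lstm": {k: rand_matrix(n_hidden, n_features + n_hidden, rng) for k in "ifog"},
        "out": rand_matrix(len(EMOTIONS), n_hidden, rng),
    }


def classify_sequence(frames, model):
    """Return one probability per emotion for a sequence of landmark frames."""
    n_hidden = len(model["out"][0])
    h, c = [0.0] * n_hidden, [0.0] * n_hidden
    for frame in frames:
        # "Time-distributed": the same dense weights are reused at every step.
        features = dense_relu(model["dense"], frame)
        h, c = lstm_step(model["lstm"], features, h, c)
    return softmax(matvec(model["out"], h))


# Assumed input: 30 frames, each with 68 (x, y) landmarks flattened to 136 values.
rng = random.Random(1)
frames = [[rng.random() for _ in range(136)] for _ in range(30)]
model = make_model(n_inputs=136)
probs = classify_sequence(frames, model)
prediction = EMOTIONS[probs.index(max(probs))]
```

Because the weights are random, the predicted label here is meaningless; the sketch only shows how a sequence of per-frame landmark vectors is reduced to a single distribution over the five states.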

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, the performance has mainly been assessed qualitatively by visually assessing the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model-free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model-free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.

    Data Fusion for Real-time Multimodal Emotion Recognition through Webcams and Microphones in E-Learning

    The original article is available on the Taylor & Francis Online website at the following link: http://www.tandfonline.com/doi/abs/10.1080/10447318.2016.1159799?journalCode=hihc20 This paper describes the validation study of our software that uses combined webcam and microphone data for real-time, continuous, unobtrusive emotion recognition as part of our FILTWAM framework. FILTWAM aims at deploying a real-time multimodal emotion recognition method for providing more adequate feedback to learners during online communication skills training. Herein, timely feedback is needed that reflects the emotions learners intended to show and that also increases learners’ awareness of their own behaviour. At a minimum, a reliable and valid software interpretation of performed face and voice emotions is needed to warrant such adequate feedback. This validation study therefore calibrates our software. The study uses a multimodal fusion method. Twelve test persons performed computer-based tasks in which they were asked to mimic specific facial and vocal emotions. All test persons’ behaviour was recorded on video and two raters independently scored the displayed emotions, which were contrasted with the software recognition outcomes. A hybrid method for multimodal fusion of our multimodal software shows accuracy between 96.1% and 98.6% for the best-chosen WEKA classifiers over predicted emotions. The software fulfils its requirements of real-time data interpretation and reliable results. The Netherlands Laboratory for Lifelong Learning (NELLL) of the Open University Netherlands.
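To make the idea of combining face and voice outputs concrete, the sketch below shows a simple decision-level (late) fusion: each modality produces per-emotion scores, and a weighted average picks the final label. This is a deliberately simplified stand-in, not the hybrid fusion method or the WEKA classifiers used in the study; the function names, the weighting scheme, and the example scores are all hypothetical.

```python
def fuse_scores(face_scores, voice_scores, face_weight=0.5):
    """Weighted average of two per-emotion score dictionaries.

    face_weight is a hypothetical tuning parameter in [0, 1]; the voice
    modality receives the remaining weight. Labels missing from one
    modality are treated as having a score of 0 there.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must be between 0 and 1")
    labels = set(face_scores) | set(voice_scores)
    fused = {
        label: face_weight * face_scores.get(label, 0.0)
        + (1.0 - face_weight) * voice_scores.get(label, 0.0)
        for label in labels
    }
    total = sum(fused.values())
    if total == 0:
        raise ValueError("no evidence from either modality")
    # Renormalise so the fused scores sum to 1.
    return {label: score / total for label, score in fused.items()}


def predict_emotion(face_scores, voice_scores, face_weight=0.5):
    """Return the label with the highest fused score."""
    fused = fuse_scores(face_scores, voice_scores, face_weight)
    return max(fused, key=fused.get)


# Made-up scores for one moment in a recording: the face classifier leans
# towards "happy", the voice classifier towards "neutral".
face = {"happy": 0.6, "neutral": 0.3, "sad": 0.1}
voice = {"happy": 0.2, "neutral": 0.7, "sad": 0.1}

equal_weight_label = predict_emotion(face, voice)                  # "neutral"
face_heavy_label = predict_emotion(face, voice, face_weight=0.8)   # "happy"
```

The two calls illustrate why the weighting matters: when the modalities disagree, the fused label follows whichever channel is trusted more.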

    Speaker-following Video Subtitles

    We propose a new method for improving the presentation of subtitles in video (e.g. TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain.