Search CORE

26,423 research outputs found

Lip Detection and Adaptive Tracking

Author: Wang Benjamin
Publication venue: DigitalCommons@CalPoly
Publication date: 01/01/2017
Field of study

Performance of automatic speech recognition (ASR) systems utilizing only acoustic information degrades significantly in noisy environments such as a car cabins. Incorporating audio and visual information together can improve performance in these situations. This work proposes a lip detection and tracking algorithm to serve as a visual front end to an audio-visual automatic speech recognition (AVASR) system. Several color spaces are examined that are effective for segmenting lips from skin pixels. These color components and several features are used to characterize lips and to train cascaded lip detectors. Pre- and post-processing techniques are employed to maximize detector accuracy. The trained lip detector is incorporated into an adaptive mean-shift tracking algorithm for tracking lips in a car cabin environment. The resulting detector achieves 96.8% accuracy, and the tracker is shown to recover and adapt in scenarios where mean-shift alone fails

DigitalCommons@CalPoly

Automatic Detection of Pain from Spontaneous Facial Expressions

Author: Loy Fong Ling
Meawad Fatma
Yang Su-Yin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017
Field of study

This paper presents a new approach for detecting pain in sequences of spontaneous facial expressions. The motivation for this work is to accompany mobile-based self-management of chronic pain as a virtual sensor for tracking patients' expressions in real-world settings. Operating under such constraints requires a resource efficient approach for processing non-posed facial expressions from unprocessed temporal data. In this work, the facial action units of pain are modeled as sets of distances among related facial landmarks. Using standardized measurements of pain versus no-pain that are specific to each user, changes in the extracted features in relation to pain are detected. The activated features in each frame are combined using an adapted form of the Prkachin and Solomon Pain Intensity scale (PSPI) to detect the presence of pain per frame. Painful features must be activated in N consequent frames (time window) to indicate the presence of pain in a session. The discussed method was tested on 171 video sessions for 19 subjects from the McMaster painful dataset for spontaneous facial expressions. The results show higher precision than coverage in detecting sequences of pain. Our algorithm achieves 94% precision (F-score=0.82) against human observed labels, 74% precision (F-score=0.62) against automatically generated pain intensities and 100% precision (F-score=0.67) against self-reported pain intensities

Crossref

Enlighten

Lip segmentation using adaptive color space training

Author: Erdogan Hakan
Erdoğan Hakan
Karabalkan Harun
Ozgur Erol
Unel Mustafa
Yılmaz Mustafa Berkay
Yilmaz Mustafa Berkay
Özgür Erol
Ünel Mustafa
Publication venue: 'The International Fiscal Association of Korea'
Publication date: 01/09/2008
Field of study

In audio-visual speech recognition (AVSR), it is beneficial to use lip boundary information in addition to texture-dependent features. In this paper, we propose an automatic lip segmentation method that can be used in AVSR systems. The algorithm consists of the following steps: face detection, lip corners extraction, adaptive color space training for lip and non-lip regions using Gaussian mixture models (GMMs), and curve evolution using level-set formulation based on region and image gradients fields. Region-based fields are obtained using adapted GMM likelihoods. We have tested the proposed algorithm on a database (SU-TAV) of 100 facial images and obtained objective performance results by comparing automatic lip segmentations with hand-marked ground truth segmentations. Experimental results are promising and much work has to be done to improve the robustness of the proposed method

Sabanci University Research Database

Speaker-following Video Subtitles

Author: Hu Yongtao
Kautz Jan
Wang Wenping
Yu Yizhou
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

We propose a new method for improving the presentation of subtitles in video (e.g. TV and movies). With conventional subtitles, the viewer has to constantly look away from the main viewing area to read the subtitles at the bottom of the screen, which disrupts the viewing experience and causes unnecessary eyestrain. Our method places on-screen subtitles next to the respective speakers to allow the viewer to follow the visual content while simultaneously reading the subtitles. We use novel identification algorithms to detect the speakers based on audio and visual information. Then the placement of the subtitles is determined using global optimization. A comprehensive usability study indicated that our subtitle placement method outperformed both conventional fixed-position subtitling and another previous dynamic subtitling method in terms of enhancing the overall viewing experience and reducing eyestrain

arXiv.org e-Print Archive

HKU Scholars Hub

Face analysis using curve edge maps

Author: D. Cristinacce
F. Deboeverie
F. Deboeverie
F. Deboeverie
J. Canny
K.J. Karande
L. Ding
L. Liang
O. Jesorsky
P. Veelaert
S. Belongie
S.-H. Cha
T.F. Cootes
X. Liu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

This paper proposes an automatic and real-time system for face analysis, usable in visual communication applications. In this approach, faces are represented with Curve Edge Maps, which are collections of polynomial segments with a convex region. The segments are extracted from edge pixels using an adaptive incremental linear-time fitting algorithm, which is based on constructive polynomial fitting. The face analysis system considers face tracking, face recognition and facial feature detection, using Curve Edge Maps driven by histograms of intensities and histograms of relative positions. When applied to different face databases and video sequences, the average face recognition rate is 95.51%, the average facial feature detection rate is 91.92% and the accuracy in location of the facial features is 2.18% in terms of the size of the face, which is comparable with or better than the results in literature. However, our method has the advantages of simplicity, real-time performance and extensibility to the different aspects of face analysis, such as recognition of facial expressions and talking

Crossref

Ghent University Academic Bibliography

Recommended from our members

MUSCLE movie-database: a multimodal corpus with rich annotation for dialogue and saliency detection

Author: Antonopoulos P.
Benetos E.
Kotropoulos C.
Kotti M.
Maragos P.
Moschou V.
Nikolaidis N.
Pitas I.
Spachos D.
Tzimouli K.
Zlantintsi A.
Publication venue
Publication date: 01/01/2008
Field of study

City Research Online

Spiral - Imperial College Digital Repository

Dictionary-based lip reading classification

Author: Ghita Ovidiu
Sutherland Alistair
Whelan Paul F.
Yu Dahai
Publication venue
Publication date: 01/01/2006
Field of study

Visual lip reading recognition is an essential stage in many multimedia systems such as “Audio Visual Speech Recognition” [6], “Mobile Phone Visual System for deaf people”, “Sign Language Recognition System”, etc. The use of lip visual features to help audio or hand recognition is appropriate because this information is robust to acoustic noise. In this paper, we describe our work towards developing a robust technique for lip reading classification that extracts the lips in a colour image by using EMPCA feature extraction and k-nearest-neighbor classification. In order to reduce the dimensionality of the feature space the lip motion is characterized by three templates that are modelled based on different mouth shapes: closed template, semi-closed template, and wideopen template. Our goal is to classify each image sequence based on the distribution of the three templates and group the words into different clusters. The words that form the database were grouped into three different clusters as follows: group1(‘I’, ‘high’, ‘lie’, ‘hard’, ‘card’, ‘bye’), group2(‘you, ‘owe’, ‘word’), group3(‘bird’)

CiteSeerX

Irish Universities

DCU Online Research Access Service

Robust Modeling of Epistemic Mental States

Author: Anam ASM Iftekhar
Rahman AKMMahbubur
Yeasin Mohammed
Publication venue
Publication date: 28/05/2020
Field of study

This work identifies and advances some research challenges in the analysis of facial features and their temporal dynamics with epistemic mental states in dyadic conversations. Epistemic states are: Agreement, Concentration, Thoughtful, Certain, and Interest. In this paper, we perform a number of statistical analyses and simulations to identify the relationship between facial features and epistemic states. Non-linear relations are found to be more prevalent, while temporal features derived from original facial features have demonstrated a strong correlation with intensity changes. Then, we propose a novel prediction framework that takes facial features and their nonlinear relation scores as input and predict different epistemic states in videos. The prediction of epistemic states is boosted when the classification of emotion changing regions such as rising, falling, or steady-state are incorporated with the temporal features. The proposed predictive models can predict the epistemic states with significantly improved accuracy: correlation coefficient (CoERR) for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for Certain 0.854, and for Interest 0.913.Comment: Accepted for Publication in Multimedia Tools and Application, Special Issue: Socio-Affective Technologie

arXiv.org e-Print Archive

University of Memphis Digital Commons