3,550 research outputs found

    Dictionary-based lip reading classification

    Visual lip reading recognition is an essential stage in many multimedia systems such as “Audio Visual Speech Recognition” [6], “Mobile Phone Visual System for deaf people”, “Sign Language Recognition System”, etc. The use of lip visual features to help audio or hand recognition is appropriate because this information is robust to acoustic noise. In this paper, we describe our work towards developing a robust technique for lip reading classification that extracts the lips in a colour image by using EMPCA feature extraction and k-nearest-neighbor classification. In order to reduce the dimensionality of the feature space, the lip motion is characterized by three templates that are modelled based on different mouth shapes: closed template, semi-closed template, and wide-open template. Our goal is to classify each image sequence based on the distribution of the three templates and group the words into different clusters. The words that form the database were grouped into three different clusters as follows: group1 (‘I’, ‘high’, ‘lie’, ‘hard’, ‘card’, ‘bye’), group2 (‘you’, ‘owe’, ‘word’), group3 (‘bird’).
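
The classification step described above (each image sequence summarised by its distribution over the three mouth templates, then assigned to a word group by k-nearest-neighbor voting) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the EMPCA feature extraction is omitted, the per-frame template labels are assumed to be already available, and the function names, training sequences, and their group labels are all hypothetical.

```python
from collections import Counter
import math

TEMPLATES = ("closed", "semi-closed", "wide-open")

def template_histogram(frame_labels):
    """Return the fraction of frames assigned to each of the three templates."""
    counts = Counter(frame_labels)
    total = len(frame_labels)
    return [counts[t] / total for t in TEMPLATES]

def knn_classify(query, training, k=3):
    """Majority vote among the k training histograms nearest to `query`."""
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical per-frame template labels and group labels (illustration only).
training = [
    (template_histogram(["closed", "wide-open", "wide-open", "closed"]), "group1"),
    (template_histogram(["closed", "wide-open", "wide-open", "wide-open"]), "group1"),
    (template_histogram(["closed", "semi-closed", "semi-closed", "closed"]), "group2"),
    (template_histogram(["semi-closed", "semi-closed", "semi-closed", "closed"]), "group2"),
]

query = template_histogram(["closed", "wide-open", "wide-open", "semi-closed"])
predicted = knn_classify(query, training, k=3)
print(predicted)
```

Representing a sequence by a three-bin histogram keeps the feature space small, which is the dimensionality reduction the abstract attributes to the template model; the Euclidean distance used here is an assumption.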

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, the performance has mainly been assessed qualitatively by visually assessing the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.

    Key-Pose Prediction in Cyclic Human Motion

    In this paper we study the problem of estimating inner-cyclic time intervals within repetitive motion sequences of top-class swimmers in a swimming channel. Interval limits are given by temporal occurrences of key-poses, i.e. distinctive postures of the body. A key-pose is defined by means of only one or two specific features of the complete posture. It is often difficult to detect such subtle features directly. We therefore propose the following method: Given that we observe the swimmer from the side, we build a pictorial structure of poselets to robustly identify random support poses within the regular motion of a swimmer. We formulate a maximum likelihood model which predicts a key-pose given the occurrences of multiple support poses within one stroke. The maximum likelihood can be extended with prior knowledge about the temporal location of a key-pose in order to improve the prediction recall. We experimentally show that our models reliably and robustly detect key-poses with a high precision and that their performance can be improved by extending the framework with additional camera views. Comment: Accepted at WACV 2015, 8 pages, 3 figures.
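
One way to picture the maximum likelihood prediction and its extension with a temporal prior is the sketch below. It is not the model from the paper: it assumes, purely for illustration, that each detected support pose predicts the key-pose time through a known mean offset with independent Gaussian noise, in which case the maximum likelihood estimate is a precision-weighted mean and a Gaussian prior acts as one extra pseudo-observation. All names and numbers are hypothetical.

```python
def ml_key_pose_time(observations):
    """Maximum likelihood key-pose time under independent Gaussian offsets.

    observations: list of (support_time, mean_offset, offset_std) triples.
    Each support pose predicts the key pose at support_time + mean_offset;
    the estimate is the mean of those predictions weighted by 1 / std^2.
    """
    weights = [1.0 / (std ** 2) for _, _, std in observations]
    predictions = [time + offset for time, offset, _ in observations]
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

def map_key_pose_time(observations, prior_mean, prior_std):
    """Same estimate with a Gaussian prior on the key-pose time.

    The prior enters as an additional observation with zero offset.
    """
    return ml_key_pose_time(observations + [(prior_mean, 0.0, prior_std)])

# Hypothetical support-pose detections within one stroke (times in frames).
detections = [(10.0, 5.0, 1.0), (12.0, 4.0, 2.0)]
ml_estimate = ml_key_pose_time(detections)
map_estimate = map_key_pose_time(detections, prior_mean=14.0, prior_std=1.0)
print(ml_estimate, map_estimate)
```

Under these assumptions the prior pulls the estimate towards the expected temporal location of the key-pose, which matters most when few or noisy support poses are detected.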

    Facial Feature Tracking and Occlusion Recovery in American Sign Language

    Facial features play an important role in expressing grammatical information in signed languages, including American Sign Language (ASL). Gestures such as raising or furrowing the eyebrows are key indicators of constructions such as yes-no questions. Periodic head movements (nods and shakes) are also an essential part of the expression of syntactic information, such as negation (associated with a side-to-side headshake). Therefore, identification of these facial gestures is essential to sign language recognition. One problem with detection of such grammatical indicators is occlusion recovery. If the signer's hand blocks his/her eyebrows during production of a sign, it becomes difficult to track the eyebrows. We have developed a system to detect such grammatical markers in ASL that recovers promptly from occlusion. Our system detects and tracks evolving templates of facial features, which are based on an anthropometric face model, and interprets the geometric relationships of these templates to identify grammatical markers. It was tested on a variety of ASL sentences signed by various Deaf native signers and detected facial gestures used to express grammatical information, such as raised and furrowed eyebrows as well as headshakes. National Science Foundation (IIS-0329009, IIS-0093367, IIS-9912573, EIA-0202067, EIA-9809340)

    Silhouette-based gait recognition using Procrustes shape analysis and elliptic Fourier descriptors

    This paper presents a gait recognition method which combines spatio-temporal motion characteristics, statistical and physical parameters (referred to as STM-SPP) of a human subject for its classification by analysing the shape of the subject's silhouette contours using Procrustes shape analysis (PSA) and elliptic Fourier descriptors (EFDs). STM-SPP uses spatio-temporal gait characteristics and physical parameters of the human body to resolve similar dissimilarity scores between probe and gallery sequences obtained by PSA. A part-based shape analysis using EFDs is also introduced to achieve robustness against carrying conditions. The classification results by PSA and EFDs are combined, resolving ties in ranking using contour matching based on Hu moments. Experimental results show that STM-SPP outperforms several silhouette-based gait recognition methods.
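
The dissimilarity score that Procrustes shape analysis yields between two contours can be illustrated with the standard full Procrustes distance for planar shapes. The sketch below is an assumption about the general technique rather than the paper's exact formulation: contours are taken to be equal-length lists of corresponding (x, y) points, and the elliptic Fourier descriptors and Hu-moment tie-breaking are not covered. The function names are hypothetical.

```python
import math

def preshape(points):
    """Centre a contour and scale it to unit norm, using complex coordinates."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    centred = [p - mean for p in z]
    norm = math.sqrt(sum(abs(p) ** 2 for p in centred))
    return [p / norm for p in centred]

def procrustes_distance(a, b):
    """Full Procrustes distance between two contours with corresponding points.

    The value is invariant to translation, scale, and rotation:
    0 means identical shape, values near 1 mean very different shapes.
    """
    za, zb = preshape(a), preshape(b)
    inner = sum(p.conjugate() * q for p, q in zip(za, zb))
    return math.sqrt(max(0.0, 1.0 - abs(inner) ** 2))

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
# The same square rotated by 90 degrees, scaled by 3, and shifted by (5, 2).
moved = [(5 - 3 * y, 2 + 3 * x) for x, y in square]
rectangle = [(0, 0), (2, 0), (2, 1), (0, 1)]

same_shape = procrustes_distance(square, moved)
different_shape = procrustes_distance(square, rectangle)
print(same_shape, different_shape)
```

Because the distance ignores position, size, and in-plane rotation, two probe and gallery silhouettes of the same shape score close to zero even if the subject appears at a different location or scale in the frame.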

    Recognition of nonmanual markers in American Sign Language (ASL) using non-parametric adaptive 2D-3D face tracking

    This paper addresses the problem of automatically recognizing linguistically significant nonmanual expressions in American Sign Language from video. We develop a fully automatic system that is able to track facial expressions and head movements, and detect and recognize facial events continuously from video. The main contributions of the proposed framework are the following: (1) We have built a stochastic and adaptive ensemble of face trackers to address factors resulting in lost face track; (2) We combine 2D and 3D deformable face models to warp input frames, thus correcting for any variation in facial appearance resulting from changes in 3D head pose; (3) We use a combination of geometric features and texture features extracted from a canonical frontal representation. The proposed framework makes it possible to detect grammatically significant nonmanual expressions from continuous signing and to differentiate successfully among linguistically significant expressions that involve subtle differences in appearance. We present results that are based on the use of a dataset containing 330 sentences from videos that were collected and linguistically annotated at Boston University.

    Computerized Analysis of Magnetic Resonance Images to Study Cerebral Anatomy in Developing Neonates

    The study of cerebral anatomy in developing neonates is of great importance for the understanding of brain development during the early period of life. This dissertation therefore focuses on three challenges in the modelling of cerebral anatomy in neonates during brain development. The methods that have been developed all use Magnetic Resonance Images (MRI) as source data. To facilitate study of vascular development in the neonatal period, a set of image analysis algorithms is developed to automatically extract and model cerebral vessel trees. The whole process consists of cerebral vessel tracking from automatically placed seed points, vessel tree generation, and vasculature registration and matching. These algorithms have been tested on clinical Time-of-Flight (TOF) MR angiographic datasets. To facilitate study of the neonatal cortex, a complete cerebral cortex segmentation and reconstruction pipeline has been developed. Segmentation of the neonatal cortex is not effectively done by existing algorithms designed for the adult brain because the contrast between grey and white matter is reversed. This causes pixels containing tissue mixtures to be incorrectly labelled by conventional methods. The neonatal cortical segmentation method that has been developed is based on a novel expectation-maximization (EM) method with explicit correction for mislabelled partial volume voxels. Based on the resulting cortical segmentation, an implicit surface evolution technique is adopted for the reconstruction of the cortex in neonates. The performance of the method is investigated by performing a detailed landmark study. To facilitate study of cortical development, a cortical surface registration algorithm for aligning the cortical surface is developed. The method first inflates extracted cortical surfaces and then performs a non-rigid surface registration using free-form deformations (FFDs) to remove residual misalignment. Validation experiments using data labelled by an expert observer demonstrate that the method can capture local changes and follow the growth of specific sulci.

    Linguistically-driven framework for computationally efficient and scalable sign recognition

    Full text link
    We introduce a new general framework for sign recognition from monocular video using limited quantities of annotated data. The novelty of the hybrid framework we describe here is that we exploit state-of-the-art learning methods while also incorporating features based on what we know about the linguistic composition of lexical signs. In particular, we analyze hand shape, orientation, location, and motion trajectories, and then use CRFs to combine this linguistically significant information for purposes of sign recognition. Our robust modeling and recognition of these sub-components of sign production allow an efficient parameterization of the sign recognition problem as compared with purely data-driven methods. This parameterization enables a scalable and extendable time-series learning approach that advances the state of the art in sign recognition, as shown by the results reported here for recognition of isolated, citation-form, lexical signs from American Sign Language (ASL).