6 research outputs found

    3D face tracking and multi-scale, spatio-temporal analysis of linguistically significant facial expressions and head positions in ASL

    Essential grammatical information is conveyed in signed languages by clusters of events involving facial expressions and movements of the head and upper body. This poses a significant challenge for computer-based sign language recognition. Here, we present new methods for the recognition of nonmanual grammatical markers in American Sign Language (ASL) based on: (1) new 3D tracking methods for the estimation of 3D head pose and facial expressions to determine the relevant low-level features; (2) methods for higher-level analysis of component events (raised/lowered eyebrows, periodic head nods and head shakes) used in grammatical markings, with differentiation of temporal phases (onset, core, offset, where appropriate), analysis of their characteristic properties, and extraction of corresponding features; and (3) a 2-level learning framework to combine low- and high-level features of differing spatio-temporal scales. This new approach achieves significantly better tracking and recognition results than our previous methods.
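    The abstract does not include implementation details, but the 2-level idea, per-frame (low-level) features feeding a first sequence labeller whose component-event outputs then become features for a second labeller, can be sketched with an off-the-shelf CRF package. The sketch below uses the third-party sklearn-crfsuite library rather than the authors' code; the feature names (head_pitch, brow_height, etc.), label sets, and toy data are illustrative assumptions.

# Minimal sketch of a 2-level sequence-labelling pipeline in the spirit of the
# framework described above (not the authors' code). Feature names, labels, and
# data are placeholders.
import sklearn_crfsuite

def frame_features(track):
    """Turn one video's per-frame tracking output into CRF feature dicts."""
    return [
        {"head_pitch": f["head_pitch"],
         "head_yaw": f["head_yaw"],
         "brow_height": f["brow_height"]}
        for f in track
    ]

# Toy stand-in data (two short "videos"); real inputs would come from the tracker.
train_tracks = [
    [{"head_pitch": 0.0, "head_yaw": 0.1, "brow_height": 0.8},
     {"head_pitch": 0.1, "head_yaw": 0.0, "brow_height": 0.9}],
    [{"head_pitch": -0.2, "head_yaw": 0.0, "brow_height": 0.2},
     {"head_pitch": -0.1, "head_yaw": 0.1, "brow_height": 0.1}],
]
train_component_labels = [["brow_raise_onset", "brow_raise_core"],
                          ["brow_lower_core", "brow_lower_offset"]]
train_marker_labels = [["yes-no-question", "yes-no-question"],
                       ["negation", "negation"]]

# Level 1: label component events (raised/lowered brows, head nod phases, ...).
level1 = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
X1 = [frame_features(t) for t in train_tracks]
level1.fit(X1, train_component_labels)

# Level 2: combine the low-level features with level-1 predictions to label
# nonmanual grammatical markers over a longer temporal scale.
def stacked_features(track, component_seq):
    feats = frame_features(track)
    for f, comp in zip(feats, component_seq):
        f["component"] = comp   # high-level cue injected as an extra feature
    return feats

level2 = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
X2 = [stacked_features(t, level1.predict_single(frame_features(t))) for t in train_tracks]
level2.fit(X2, train_marker_labels)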

    Computer-based tracking, analysis, and visualization of linguistically significant nonmanual events in American Sign Language (ASL)

    Our linguistically annotated American Sign Language (ASL) corpora have formed a basis for research to automate detection by computer of essential linguistic information conveyed through facial expressions and head movements. We have tracked head position and facial deformations, and used computational learning to discern specific grammatical markings. Our ability to detect, identify, and temporally localize the occurrence of such markings in ASL videos has recently been improved by incorporation of (1) new techniques for deformable model-based 3D tracking of head position and facial expressions, which provide significantly better tracking accuracy and recover quickly from temporary loss of track due to occlusion; and (2) a computational learning approach incorporating 2-level Conditional Random Fields (CRFs), suited to the multi-scale spatio-temporal characteristics of the data, which analyzes not only low-level appearance characteristics but also the patterns that enable identification of significant gestural components, such as periodic head movements and raised or lowered eyebrows. Here we summarize our linguistically motivated computational approach and the results for detection and recognition of nonmanual grammatical markings; demonstrate our data visualizations and discuss their relevance for linguistic research; and describe work underway to enable such visualizations to be produced over large corpora and shared publicly on the Web.
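    As a simplified illustration of the kind of low-level output such tracking provides, the sketch below recovers 3D head pose (pitch/yaw/roll) from six 2D facial landmarks with OpenCV's standard rigid solvePnP recipe. This is a generic stand-in, not the deformable model-based tracker described above; the 3D model points, camera approximation, and landmark coordinates are illustrative.

# Simplified head-pose illustration using a rigid face model and OpenCV's solvePnP
# (a generic stand-in, not the deformable model-based tracker in the paper).
import numpy as np
import cv2

# Generic 3D model points: nose tip, chin, eye outer corners, mouth corners.
MODEL_POINTS = np.array([
    [0.0,     0.0,    0.0],
    [0.0,  -330.0,  -65.0],
    [-225.0, 170.0, -135.0],
    [225.0,  170.0, -135.0],
    [-150.0, -150.0, -125.0],
    [150.0,  -150.0, -125.0],
], dtype=np.float64)

def head_pose(image_points, frame_width, frame_height):
    """Estimate head rotation (degrees) from six 2D landmarks in pixel coordinates."""
    focal = frame_width  # crude focal-length approximation
    camera_matrix = np.array([[focal, 0, frame_width / 2],
                              [0, focal, frame_height / 2],
                              [0, 0, 1]], dtype=np.float64)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS, image_points, camera_matrix,
                                  dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
    if not ok:
        return None
    rot_mat, _ = cv2.Rodrigues(rvec)          # rotation vector -> rotation matrix
    angles = cv2.RQDecomp3x3(rot_mat)[0]      # Euler angles (pitch, yaw, roll)
    return {"pitch": angles[0], "yaw": angles[1], "roll": angles[2]}

# Example call with made-up landmark positions in a 640x480 frame.
landmarks = np.array([[320, 240], [325, 350], [250, 200],
                      [390, 200], [280, 300], [360, 300]], dtype=np.float64)
print(head_pose(landmarks, 640, 480))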

    Automatic segmentation of grammatical facial expressions in sign language: towards an inclusive communication experience

    Nowadays, natural language processing techniques enable the development of applications that promote communication between humans and between humans and machines. Although the technology for automated oral communication is mature and affordable, there are currently no appropriate solutions for visual-spatial languages. Among the scarce efforts to automatically process sign languages, studies on non-manual gestures are rare, which makes it difficult to properly interpret the utterances produced in those languages. In this paper, we present a solution for the automatic segmentation of grammatical facial expressions in sign language. It is a low-cost computational solution designed to integrate into a sign language processing framework that supports the development of simple but high value-added applications for the context of universal communication. Moreover, we discuss the difficulties faced by this solution to guide future research in this area.
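    The abstract does not describe the segmentation algorithm itself; as a minimal, hypothetical sketch of the task, the snippet below classifies each frame from landmark-derived features and groups consecutive positive frames into expression segments. The classifier choice, feature vectors, and training data are placeholders, not the paper's method.

# Minimal sketch (not the paper's method) of segmenting grammatical facial
# expressions: classify each frame, then group consecutive positive frames into
# (start, end) spans. Features and training data are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def group_segments(frame_labels, min_len=3):
    """Turn a per-frame 0/1 prediction sequence into (start, end) frame spans."""
    segments, start = [], None
    for i, lab in enumerate(frame_labels):
        if lab == 1 and start is None:
            start = i
        elif lab == 0 and start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    if start is not None and len(frame_labels) - start >= min_len:
        segments.append((start, len(frame_labels) - 1))
    return segments

# Toy training data: rows are per-frame feature vectors (e.g. normalized landmark
# distances); labels mark frames inside a grammatical facial expression.
X_train = np.random.rand(200, 10)
y_train = np.random.randint(0, 2, 200)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

X_video = np.random.rand(120, 10)   # one video's per-frame features
frame_preds = clf.predict(X_video)
print(group_segments(frame_preds.tolist()))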

    Eye and mouth openness estimation in sign language and news broadcast videos

    There is currently an increasing need for automatic video analysis tools to support sign language studies and the evaluation of facial activity in sign language and other videos. Consequently, research focusing on automatic estimation and annotation of videos and facial gestures is continuously developing. In this work, techniques for the estimation of eye and mouth openness and eyebrow position are studied. Such estimation could prove beneficial for automatic annotation and quantitative evaluation of sign language videos, as well as for more prolific production of sign language material. The proposed method for estimating eyebrow position, eye openness, and mouth state is based on the construction of a set of facial landmarks, using different detection techniques designed for each facial element. Furthermore, we compare the presented landmark detection algorithm with a recently published third-party face alignment algorithm. The landmarks are used to compute features that describe the geometric information of the elements of the face. These features constitute the input to classifiers that produce quantized openness estimates for the studied facial elements. Finally, the performance of the estimations is evaluated in quantitative and qualitative experiments with sign language and news broadcast videos.
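    One common geometric feature for this kind of openness estimation is an aspect ratio computed over a few landmarks, e.g. the eye aspect ratio over six eye landmarks. The sketch below is an illustrative stand-in rather than the paper's exact features or classifiers; the landmark ordering and quantization thresholds are assumptions, whereas the paper trains classifiers to produce the quantized estimates.

# Illustrative geometric openness features from facial landmarks (not the paper's
# exact features). The eye aspect ratio (EAR) uses six landmarks per eye; a
# similar ratio works for the mouth. Thresholds below are placeholders.
import numpy as np

def aspect_ratio(pts):
    """pts: six (x, y) landmarks ordered corner, top1, top2, corner, bottom2, bottom1."""
    pts = np.asarray(pts, dtype=float)
    vertical = np.linalg.norm(pts[1] - pts[5]) + np.linalg.norm(pts[2] - pts[4])
    horizontal = 2.0 * np.linalg.norm(pts[0] - pts[3])
    return vertical / horizontal

def quantize_openness(ratio, closed_thr=0.15, open_thr=0.30):
    """Map a continuous ratio to a coarse openness class (placeholder thresholds)."""
    if ratio < closed_thr:
        return "closed"
    if ratio < open_thr:
        return "half-open"
    return "open"

eye_landmarks = [(100, 50), (108, 44), (116, 44), (124, 50), (116, 55), (108, 55)]
ear = aspect_ratio(eye_landmarks)
print(ear, quantize_openness(ear))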

    Recognizing Eyebrow and Periodic Head Gestures Using CRFs for Non-Manual Grammatical Marker Detection in ASL

    Changes in eyebrow configuration, in combination with head gestures and other facial expressions, are used to signal essential grammatical information in signed languages. Motivated by the goal of improving the detection of nonmanual grammatical markings in American Sign Language (ASL), we introduce a 2-level CRF method for recognition of the components of eyebrow and periodic head gestures, differentiating the linguistically significant domain (core) from transitional movements (which we refer to as the onset and offset). We use a robust face tracker and 3D warping to extract and combine geometric and appearance features, as well as a feature selection method to further improve recognition accuracy. For the second level of the CRFs, linguistic annotations were used as training for partitioning of the gestures, to separate the onset and offset. This partitioning is essential to recognition of the linguistically significant domains (in between). We then use the recognized onset, core, and offset of these gestures, together with the lower-level features, to detect non-manual grammatical markers in ASL.
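    Assuming the second-level output is a per-frame sequence of phase labels (onset/core/offset per gesture type, a labelling scheme assumed here rather than taken from the paper), extracting the linguistically significant core domains reduces to grouping consecutive frames, as in the short sketch below.

# Minimal sketch (assumed label scheme, not the authors' code) of extracting the
# "core" domains from a per-frame sequence of phase labels, e.g.
# "other", "brow_raise_onset", "brow_raise_core", "brow_raise_offset", ...
from itertools import groupby

def core_domains(phase_labels):
    """Return (gesture, start_frame, end_frame) for each maximal run of *_core labels."""
    spans, idx = [], 0
    for label, run in groupby(phase_labels):
        length = len(list(run))
        if label.endswith("_core"):
            spans.append((label[:-len("_core")], idx, idx + length - 1))
        idx += length
    return spans

frames = ["other", "brow_raise_onset", "brow_raise_core", "brow_raise_core",
          "brow_raise_offset", "other", "headshake_core", "headshake_core"]
print(core_domains(frames))
# -> [('brow_raise', 2, 3), ('headshake', 6, 7)]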