
    Robust Registration of Dynamic Facial Sequences.

    Accurate face registration is a key step for several image analysis applications. However, existing registration methods are prone to temporal drift errors or jitter among consecutive frames. In this paper, we propose an iterative rigid registration framework that estimates the misalignment with trained regressors. The input to the regressors is a robust motion representation that encodes the motion between a misaligned frame and the reference frame(s), and enables reliable performance under non-uniform illumination variations. Drift errors are reduced when the motion representation is computed from multiple reference frames. Furthermore, we use the L2 norm of the representation as a cue for performing coarse-to-fine registration efficiently. Importantly, the framework can identify registration failures and correct them. Experiments show that the proposed approach achieves significantly higher registration accuracy than state-of-the-art techniques in challenging sequences. The research work of Evangelos Sariyanidi and Hatice Gunes has been partially supported by the EPSRC under its IDEAS Factory Sandpits call on Digital Personhood (Grant Ref.: EP/L00416X/1).
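    The iterative loop the abstract describes can be sketched in miniature. Everything below is a hypothetical 1-D toy, not the authors' implementation: `motion_representation` stands in for their robust illumination-invariant encoding, and the `regressors` mapping stands in for their trained coarse/fine regressors. It only illustrates the control flow: average the representation over multiple references (to reduce drift), pick a coarse or fine regressor from the representation's L2 norm, and iterate to convergence.

    ```python
    def motion_representation(frame, reference):
        # Illustrative stub: per-element difference; the paper's robust,
        # illumination-invariant encoding would replace this.
        return [f - r for f, r in zip(frame, reference)]

    def l2_norm(v):
        return sum(x * x for x in v) ** 0.5

    def register(frame, references, regressors, coarse_threshold=1.0, max_iters=10):
        """Estimate a 1-D rigid shift aligning `frame` to the references.

        `regressors` maps 'coarse'/'fine' to callables that predict a shift
        update from the motion representation; the L2 norm of the
        representation selects which regressor to apply, giving a cheap
        coarse-to-fine schedule.
        """
        shift = 0.0
        for _ in range(max_iters):
            # Averaging over multiple reference frames reduces temporal drift.
            reps = [motion_representation([f - shift for f in frame], ref)
                    for ref in references]
            rep = [sum(col) / len(reps) for col in zip(*reps)]
            scale = "coarse" if l2_norm(rep) > coarse_threshold else "fine"
            update = regressors[scale](rep)
            shift += update
            if abs(update) < 1e-6:  # converged; a large residual here would
                break               # instead flag a registration failure
        return shift
    ```

    With a toy regressor that returns the mean of the representation, a frame offset by 2.0 from its reference is registered in two iterations.
    
    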

    Time Expressions Recognition with Word Vectors and Neural Networks

    This work re-examines the widely addressed problem of the recognition and interpretation of time expressions, and suggests an approach based on distributed representations and artificial neural networks. Artificial neural networks allow us to build highly generic models, but the large variety of hyperparameters makes it difficult to determine the best configuration. In this work we study the behavior of different models by varying the number and size of layers and the normalization techniques. We also analyze the behavior of distributed representations in the temporal domain, where we find interesting properties regarding order and granularity. The experiments were conducted mainly for Spanish, although this does not affect the approach, given its generic nature. This work aims to be a starting point towards processing temporality in texts via word vectors and neural networks, without the need for any kind of feature engineering.
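    The "order" property mentioned above can be illustrated with a toy example. The embeddings below are idealized 2-D points placed along a line, not real trained word vectors (which only approximate this structure), and the month names are just sample Spanish vocabulary: if consecutive months sit at a constant offset in embedding space, simple vector arithmetic recovers their ordering.

    ```python
    MONTHS = ["enero", "febrero", "marzo", "abril", "mayo", "junio"]

    # Idealized embeddings for illustration; real distributed representations
    # are noisy and only approximately exhibit this linear structure.
    EMB = {m: (float(i), 0.5 * float(i)) for i, m in enumerate(MONTHS)}

    def add(u, v):
        return tuple(a + b for a, b in zip(u, v))

    def sub(u, v):
        return tuple(a - b for a, b in zip(u, v))

    def nearest(vec):
        # Nearest neighbour by squared Euclidean distance.
        return min(MONTHS, key=lambda m: sum((a - b) ** 2 for a, b in zip(EMB[m], vec)))

    # "febrero is to enero as X is to marzo" -> the following month, abril.
    offset = sub(EMB["febrero"], EMB["enero"])
    analogy = nearest(add(EMB["marzo"], offset))
    ```

    The same analogy query against real word vectors is what reveals how well (or poorly) a corpus encodes temporal order and granularity.
    
    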

    Ontologies and Information Extraction

    This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. This report shows that, depending on the nature and the depth of the interpretation required for extracting the information, more or less knowledge must be involved. The discussion is mainly illustrated with examples from biology, a domain in which there is a critical need for content-based exploration of the scientific literature and which is becoming a major application domain for IE.

    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not been thoroughly evaluated yet is deformable face tracking "in-the-wild". Until now, the performance has mainly been assessed qualitatively by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
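    The three pipeline strategies (a)-(c) can be sketched as a single per-frame dispatch. The `detect`, `track` and `localise` callables below are stubs standing in for whatever detector, model-free tracker and landmark localiser a pipeline plugs in; this is a schematic of the control flow being compared, not any specific system from the paper.

    ```python
    def track_sequence(frames, detect, track, localise, strategy="hybrid"):
        """Return per-frame landmark estimates under one of three strategies.

        strategy:
          'detect' - (a) re-run face detection on every frame, then localise
          'track'  - (b) detect once, then model-free tracking, then localise
          'hybrid' - (c) track, but re-initialise via detection when
                     tracking fails (tracker returns None)
        """
        results, box = [], None
        for frame in frames:
            if strategy == "detect" or box is None:
                box = detect(frame)
            else:
                new_box = track(frame, box)
                if new_box is None and strategy == "hybrid":
                    box = detect(frame)   # hybrid fallback on tracking failure
                elif new_box is not None:
                    box = new_box         # pure tracking keeps the stale box
            results.append(localise(frame, box))
        return results
    ```

    The hybrid variant differs from pure tracking only in the fallback branch, which is exactly where the paper's comparison of strategies (b) and (c) lives.
    
    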

    Real-Time Purchase Prediction Using Retail Video Analytics

    The proliferation of video data in retail marketing brings opportunities for researchers to study customer behavior using rich video information. Our study demonstrates how to understand multiple dimensions of customer behavior using video analytics on a scalable basis. We obtained unique video footage from in-store cameras, covering approximately 20,000 customers and over 6,000 recorded payments. We extracted features on the demographic, appearance, emotion, and contextual dimensions of customer behavior from the video with state-of-the-art computer vision techniques and proposed a novel framework using machine learning and deep learning models to predict consumer purchase decisions. Results show that our framework makes accurate predictions and indicate the importance of incorporating emotional response into the prediction. Our findings reveal multi-dimensional drivers of purchase decisions and provide an implementable video analytics tool for marketers. The framework also opens the possibility of personalized recommendations, potentially integrating it into the omnichannel landscape.
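    The fusion of the four feature dimensions into a purchase prediction can be sketched with a minimal linear model. All feature names and weights below are invented for illustration (the abstract does not publish them), and a plain logistic score stands in for the paper's trained machine learning and deep learning models.

    ```python
    import math

    # Hypothetical weights over one example feature per dimension; the sign
    # and magnitude are illustrative only.
    WEIGHTS = {
        "age_norm": 0.2,          # demographics (normalized to [0, 1])
        "carries_basket": 0.9,    # appearance
        "dwell_time_norm": 0.7,   # context (normalized dwell time)
        "emotion_positive": 1.1,  # emotional response, a key driver per the abstract
    }
    BIAS = -1.5

    def purchase_probability(features):
        """Logistic score over whatever subset of features was observed."""
        score = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
        return 1.0 / (1.0 + math.exp(-score))
    ```

    Under these toy weights, observing a positive emotional response raises the predicted probability, mirroring the abstract's finding that emotion is an important predictor.
    
    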

    A Study on Techniques and Challenges in Sign Language Translation

    Sign Language Translation (SLT) plays a pivotal role in enabling effective communication for the Deaf and Hard of Hearing (DHH) community. This review delves into the state-of-the-art techniques and methodologies in SLT, focusing on its significance, challenges, and recent advancements. The review provides a comprehensive analysis of various SLT approaches, ranging from rule-based systems to deep learning models, highlighting their strengths and limitations. Datasets specifically tailored for SLT research are explored, shedding light on the diversity and complexity of Sign Languages across the globe. The review also addresses critical issues in SLT, such as the expressiveness of generated signs, facial expressions, and non-manual signals. Furthermore, it discusses the integration of SLT into assistive technologies and educational tools, emphasizing the transformative potential in enhancing accessibility and inclusivity. Finally, the review outlines future directions, including the incorporation of multimodal inputs and the imperative need for co-creation with the Deaf community, paving the way for more accurate, expressive, and culturally sensitive Sign Language Generation systems.

    Static and Dynamic Facial Emotion Recognition Using Neural Network Models

    Emotion recognition is the process of identifying human emotions. It is made possible by processing various modalities including facial expressions, speech signals, biometric signals, etc. With the advancements in computing technologies, Facial Emotion Recognition (FER) became important for several applications in which the user's emotional state is required, such as emotional training for autistic children. Recent years witnessed a major leap in Artificial Intelligence (AI), especially neural networks for computer vision applications. In this thesis, we investigate the application of AI algorithms for FER from static and dynamic data. Our experiments address the limitations and challenges of previous works, such as limited generalizability due to the datasets. We compare the performance of machine learning classifiers and convolutional neural networks (CNNs) for FER from static data (images). Moreover, we study the performance of the proposed CNN for dynamic FER (videos), in addition to Long Short-Term Memory (LSTM) in a CNN-LSTM hybrid approach to utilize the temporal information in the videos. The proposed CNN architecture outperformed the other classifiers with an accuracy of 86.5%. It also outperformed the hybrid approach for dynamic FER, which achieved an accuracy of 74.6%.
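    The structural difference between the two pipelines compared in the thesis, per-image classification versus a CNN-LSTM hybrid over a frame sequence, can be sketched as follows. The `cnn`, `cnn_features`, `lstm_step` and `classify` callables below are toy stand-ins, not the thesis's trained networks; the sketch only shows how the hybrid folds per-frame CNN features through a recurrent state so that temporal order is used.

    ```python
    def static_fer(frame, cnn):
        """Static FER: classify one image from its per-class scores."""
        scores = cnn(frame)           # e.g. {"happy": 0.9, "neutral": 0.1}
        return max(scores, key=scores.get)

    def cnn_lstm_fer(frames, cnn_features, lstm_step, classify):
        """CNN-LSTM hybrid for dynamic FER: each frame's CNN features are
        folded through a recurrent state, then the final state is classified."""
        state = None
        for frame in frames:
            state = lstm_step(state, cnn_features(frame))
        return classify(state)
    ```

    Swapping real networks into these slots changes only the callables, not the control flow, which is why the static and dynamic results in the thesis are directly comparable.
    
    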