
    Recognizing Emotions Conveyed through Facial Expressions

    Emotional communication is a key element of habilitation care for persons with dementia. It is therefore highly preferable for assistive robots that supplement human care of persons with dementia to possess the ability to recognize and respond to the emotions expressed by those being cared for. Facial expressions are one of the key modalities through which emotions are conveyed. This work focuses on computer-vision-based recognition of facial expressions of emotions conveyed by the elderly. Although there has been much work on automatic facial expression recognition, the algorithms have been experimentally validated primarily on young faces; facial expressions on older faces have been almost entirely excluded. This is because the facial expression databases available to, and used in, facial expression recognition research so far do not contain images of facial expressions of people above the age of 65 years. To overcome this problem, we adopt a recently published database, the FACES database, which was developed to address exactly this problem in the area of human behavioural research. The FACES database contains 2052 images of six different facial expressions, with almost identical and systematic representation of the young, middle-aged and older age groups. In this work, we evaluate and compare the performance of two existing image-based approaches for facial expression recognition across a broad age spectrum ranging from 19 to 80 years. The evaluated systems use Gabor filters and uniform local binary patterns (LBP) for feature extraction, and AdaBoost.MH with a multi-threshold stump learner for expression classification. We have experimentally validated the hypotheses that facial expression recognition systems trained only on young faces perform poorly on middle-aged and older faces, and that such systems confuse ageing-related facial features on neutral faces with other expressions of emotions. We also identified that, among the three age groups, the middle-aged group provides the best generalization performance across the entire age spectrum. The performance of the systems was also compared to the performance of humans in recognizing facial expressions of emotions. Some similarities were observed, such as difficulty in recognizing the expressions on older faces and difficulty in recognizing the expression of sadness. The findings of our work establish the need for developing approaches to facial expression recognition that are robust to the effects of ageing on the face. The scientific results of our work can serve as a basis to guide future research in this direction.
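
    As a concrete illustration, the following is a minimal sketch of the uniform-LBP-plus-boosting pipeline described above, assuming pre-cropped grayscale face images. Since AdaBoost.MH with a multi-threshold stump learner has no off-the-shelf scikit-learn implementation, multi-class AdaBoost over depth-1 decision trees (plain stumps) stands in for it; the grid size and all other parameter values are illustrative assumptions, not the thesis settings.

        import numpy as np
        from skimage.feature import local_binary_pattern
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.tree import DecisionTreeClassifier

        def lbp_histogram(face, P=8, R=1, grid=(7, 6)):
            """Concatenate uniform-LBP histograms over a grid of face regions."""
            codes = local_binary_pattern(face, P, R, method="uniform")
            n_bins = P + 2  # P+1 uniform patterns plus one "non-uniform" bin
            feats = []
            for band in np.array_split(codes, grid[0], axis=0):
                for cell in np.array_split(band, grid[1], axis=1):
                    h, _ = np.histogram(cell, bins=n_bins, range=(0, n_bins))
                    feats.append(h / max(h.sum(), 1))  # normalised cell histogram
            return np.concatenate(feats)

        def train_expression_classifier(faces, labels):
            """faces: grayscale face arrays; labels: expression classes."""
            feats = np.stack([lbp_histogram(f) for f in faces])
            clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                     n_estimators=200)  # stump-based boosting
            return clf.fit(feats, labels)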

    Advanced Biometrics with Deep Learning

    Biometrics, such as fingerprint, iris, face, hand print, hand vein, speech and gait recognition, have become commonplace as a means of identity management for various applications. Biometric systems follow a typical pipeline composed of separate preprocessing, feature extraction and classification stages. Deep learning, as a data-driven representation learning approach, has been shown to be a promising alternative to conventional data-agnostic, handcrafted preprocessing and feature extraction for biometric systems. Furthermore, deep learning offers an end-to-end learning paradigm that unifies preprocessing, feature extraction, and recognition based solely on biometric data. This Special Issue has collected 12 high-quality, state-of-the-art research papers that deal with challenging issues in advanced biometric systems based on deep learning. The 12 papers can be divided into four categories according to biometric modality: face biometrics, medical electronic signals (EEG and ECG), voice print, and others.
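
    To make the end-to-end paradigm concrete, below is a minimal sketch of a single network that maps raw biometric samples directly to identity scores, with no separate preprocessing or handcrafted feature-extraction stage. All layer sizes and the input resolution are illustrative assumptions, not values taken from the collected papers.

        import torch
        import torch.nn as nn

        class EndToEndBiometricNet(nn.Module):
            def __init__(self, n_identities, in_channels=1):
                super().__init__()
                self.features = nn.Sequential(   # learned feature extraction
                    nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(4),
                )
                self.classifier = nn.Linear(64 * 4 * 4, n_identities)

            def forward(self, x):                # x: batch of raw samples
                return self.classifier(self.features(x).flatten(1))

        net = EndToEndBiometricNet(n_identities=100)
        scores = net(torch.randn(8, 1, 64, 64))  # eight raw 64x64 samples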

    Real-time scalable video coding for surveillance applications on embedded architectures

    Enhanced Deep Learning Architectures for Face Liveness Detection for Static and Video Sequences

    The major contribution of this research is the development of deep architectures for face liveness detection on static images as well as video sequences, which use a combination of texture analysis and a deep Convolutional Neural Network (CNN) to classify the captured image or video as real or fake. Face recognition is a popular and efficient form of biometric authentication used in many software applications. One drawback of this technique is that it is prone to face spoofing attacks, where an impostor can gain access to the system by presenting a photograph or recorded video of a valid user to the sensor. Thus, face liveness detection is a critical preprocessing step in face recognition authentication systems. The first part of our research was on face liveness detection on a static image, where we applied nonlinear diffusion, based on an additive operator splitting scheme and a tri-diagonal matrix block-solver algorithm, to the image, which enhances the edges and surface texture in the real image. The diffused image was then fed to a deep CNN to identify the complex and deep features for classification. We obtained high accuracy on the NUAA Photograph Impostor dataset using one of our enhanced architectures. In the second part of our research, we developed an end-to-end real-time solution for face liveness detection on static images, where instead of using a separate preprocessing step for diffusing the images, we used a combined architecture in which the diffusion process and the CNN were implemented in a single step. This integrated approach gave promising results with two different architectures on the Replay-Attack and Replay-Mobile datasets. We also developed a novel deep architecture for face liveness detection on video frames that uses the diffusion of images followed by a deep CNN and a Long Short-Term Memory (LSTM) network to classify the video sequence as real or fake. Performance evaluation of our architecture on the Replay-Attack and Replay-Mobile datasets gave very competitive results. Finally, we performed liveness detection on video sequences using diffusion and the Two-Stream Inflated 3D ConvNet (I3D) architecture, and our experiments on the Replay-Attack and Replay-Mobile datasets gave very good results.
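
    The diffusion preprocessing step can be sketched as follows. The thesis applies nonlinear diffusion via an additive operator splitting (AOS) scheme with a tri-diagonal matrix block solver; as a simpler stand-in, this sketch uses the explicit Perona-Malik update, which has the same qualitative effect of enhancing edges and surface texture before the image is fed to the CNN. The iteration count and parameters are illustrative assumptions.

        import numpy as np

        def nonlinear_diffusion(img, n_iter=15, kappa=30.0, step=0.2):
            """Explicit Perona-Malik diffusion (stand-in for the AOS scheme)."""
            u = img.astype(np.float64)
            g = lambda d: np.exp(-(d / kappa) ** 2)  # conductivity: small across edges
            for _ in range(n_iter):
                # finite differences toward the four neighbours
                dN = np.roll(u, -1, axis=0) - u
                dS = np.roll(u, 1, axis=0) - u
                dE = np.roll(u, -1, axis=1) - u
                dW = np.roll(u, 1, axis=1) - u
                u += step * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
            return u

        # diffused = nonlinear_diffusion(gray_face)  # then feed to the CNN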

    Lip print based authentication in physical access control Environments

    In modern society, there is an ever-growing need to determine the identity of a person in many applications, including computer security, financial transactions, borders, and forensics. Early automated methods of authentication relied mostly on possessions and knowledge. Notably, such authentication methods, passwords and access cards among them, are based on properties that can be lost, stolen, forgotten, or disclosed. Fortunately, biometric recognition provides an elegant solution to these shortcomings by identifying a person based on their physiological or behavioural characteristics. However, due to the diverse nature of biometric applications (e.g., from unlocking a mobile phone to crossing an international border), no biometric trait is likely to be ideal and satisfy the criteria for all applications. Therefore, it is necessary to investigate novel biometric modalities to establish the identity of individuals on occasions where techniques such as fingerprint or face recognition are unavailable. One such modality that has gained much attention in recent years, and which originates from forensic practice, is the lip. This research study considers the use of computer vision methods to recognise different lip prints for achieving the task of identification. To determine whether the research problem of the study is valid, a literature review is conducted, which helps identify the problem areas and the different computer vision methods that can be used for achieving lip print recognition. Accordingly, the study builds on these areas and proposes lip print identification experiments with varying models, which identify individuals solely based on their lip prints, and provides guidelines for the implementation of the proposed system. Ultimately, the experiments encapsulate the broad categories of methods for achieving lip print identification. The implemented computer vision pipelines contain different stages, including data augmentation, lip detection, pre-processing, feature extraction, feature representation and classification. Three pipelines were implemented from the proposed model: a traditional machine learning pipeline, a deep learning-based pipeline and a deep hybrid-learning-based pipeline. Different metrics reported in the literature are used to assess the performance of the prototype, such as IoU, mAP, accuracy, precision, recall, F1 score, EER, ROC and PR curves, and accuracy and loss curves. The first pipeline of the current study is a classical pipeline which employs a facial landmark detector (the One Millisecond Face Alignment algorithm) to detect the lip, SURF for feature extraction, BoVW for feature representation and an SVM or K-NN classifier. The second pipeline makes use of the facial landmark detector and a VGG16 or ResNet50 architecture. The findings reveal that ResNet50 is the best performing method for lip print identification in the current study. The third pipeline also employs the facial landmark detector, with the ResNet50 architecture for feature extraction and an SVM classifier. The development of the experiments is validated and benchmarked to determine the extent to which it can achieve lip print identification. The results of the benchmark for the prototype indicate that the study accomplishes the objective of identifying individuals based on their lip prints using computer vision methods. The results also show that the use of deep learning architectures such as ResNet50 yields promising results.
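
    A minimal sketch of the first (classical machine learning) pipeline follows: local descriptors computed on the detected lip region, a bag-of-visual-words (BoVW) encoding, and an SVM classifier. SURF, which the study uses, requires an opencv-contrib build, so ORB stands in for it here; the lip crops and labels are assumed to come from the facial landmark detection stage, and the vocabulary size is an illustrative assumption.

        import cv2
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import SVC

        def bovw_features(lip_crops, k=200):
            """lip_crops: grayscale lip images. Returns BoVW histograms and vocab."""
            orb = cv2.ORB_create()  # stand-in for SURF
            descs = [orb.detectAndCompute(img, None)[1] for img in lip_crops]
            # assumes every crop yields descriptors; filter labels likewise if not
            descs = [d.astype(np.float32) for d in descs if d is not None]
            vocab = KMeans(n_clusters=k).fit(np.vstack(descs))  # visual words
            hists = []
            for d in descs:
                words = vocab.predict(d)
                h, _ = np.histogram(words, bins=k, range=(0, k))
                hists.append(h / max(h.sum(), 1))  # normalised word histogram
            return np.stack(hists), vocab

        # X, vocab = bovw_features(lip_crops)
        # clf = SVC(kernel="rbf").fit(X, labels)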

    Novel image descriptors and learning methods for image classification applications

    Image classification is an active and rapidly expanding research area in computer vision and machine learning due to its broad applications. With the advent of big data, the need for robust image descriptors and learning methods to process large numbers of images for different kinds of visual applications has greatly increased. Towards that end, this dissertation focuses on exploring new image descriptors and learning methods, by incorporating important visual aspects and enhancing the feature representation in the discriminative space, for advancing image classification. First, an innovative sparse representation model using the complete marginal Fisher analysis (CMFA-SR) framework is proposed for improving image classification performance. In particular, the complete marginal Fisher analysis method extracts the discriminatory features in both the column space of the local samples-based within-class scatter matrix and the null space of its transformed matrix. To further improve the classification capability, a discriminative sparse representation model is proposed by integrating a representation criterion, such as the sparse representation, and a discriminative criterion. Second, the discriminative dictionary distribution based sparse coding (DDSC) method is presented, which utilizes both the discriminative and generative information to enhance the feature representation. Specifically, the dictionary distribution criterion reveals the class conditional probability of each dictionary item by using the dictionary distribution coefficients, and the discriminative criterion applies new within-class and between-class scatter matrices for discriminant analysis. Third, a fused color Fisher vector (FCFV) feature is developed by integrating the most expressive features of the DAISY Fisher vector (D-FV), WLD-SIFT Fisher vector (WS-FV), and SIFT-FV features in different color spaces, to capture local, color, spatial, relative intensity, and gradient orientation information. Furthermore, a sparse kernel manifold learner (SKML) method is applied to the FCFV features to learn a discriminative sparse representation by considering the local manifold structure and the label information, based on the marginal Fisher criterion. Finally, a novel multiple anthropological Fisher kernel framework (M-AFK) is presented to extract and enhance facial genetic features for kinship verification. The proposed method is derived by applying a novel similarity enhancement approach based on SIFT flow, and by learning an inheritable transformation on the multiple Fisher vector features using the criterion of minimizing the distance among kinship samples while maximizing the distance among non-kinship samples. The effectiveness of the proposed methods is assessed on numerous image classification tasks, such as face recognition, kinship verification, scene classification, object classification, and computational fine art painting categorization. The experimental results on popular image datasets show the feasibility of the proposed methods.
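
    Since the proposed models build on the sparse representation framework, the following sketch shows the baseline they extend: the classic sparse representation classifier, which codes a test sample over a dictionary of training samples and assigns the class with the smallest class-wise reconstruction residual. It is only the underlying baseline, not the CMFA-SR or DDSC model itself, and the regularisation weight is an illustrative assumption.

        import numpy as np
        from sklearn.linear_model import Lasso

        def src_predict(D, labels, x, alpha=0.01):
            """D: (d, n) training samples as columns; x: (d,) test sample."""
            coder = Lasso(alpha=alpha, fit_intercept=False, max_iter=5000)
            a = coder.fit(D, x).coef_              # sparse code of x over D
            residuals = {}
            for c in np.unique(labels):
                mask = labels == c                 # keep only class-c coefficients
                residuals[c] = np.linalg.norm(x - D[:, mask] @ a[mask])
            return min(residuals, key=residuals.get)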

    Models and analysis of vocal emissions for biomedical applications: 5th International Workshop: December 13-15, 2007, Firenze, Italy

    The proceedings of the MAVEBA Workshop, held every two years, collect the scientific papers presented as oral and poster contributions during the conference. The main subjects are the development of theoretical and mechanical models as an aid to the study of the main phonatory dysfunctions, as well as biomedical engineering methods for the analysis of voice signals and images, as a support to clinical diagnosis and the classification of vocal pathologies. The Workshop has the sponsorship of Ente Cassa Risparmio di Firenze, COST Action 2103, the Biomedical Signal Processing and Control journal (Elsevier) and the IEEE Biomedical Engineering Society. Special issues of international journals have been, and will be, published, collecting selected papers from the conference.

    Investigation of new learning methods for visual recognition

    Visual recognition is one of the most difficult and prevailing problems in computer vision and pattern recognition due to the challenges in understanding the semantics and contents of digital images. Two major components of a visual recognition system are discriminatory feature representation and efficient, accurate pattern classification. This dissertation therefore focuses on developing new learning methods for visual recognition. Based on the conventional sparse representation, which shows its robustness for visual recognition problems, a series of new methods is proposed. Specifically, first, a new locally linear K nearest neighbor method, or LLK method, is presented. The LLK method derives a new representation, which is an approximation to the ideal representation, by optimizing an objective function based on a host of criteria for sparsity, locality, and reconstruction. The novel representation is further processed by two new classifiers, namely, an LLK-based classifier (LLKc) and a locally linear nearest-mean-based classifier (LLNc), for visual recognition. The proposed classifiers are shown to connect to the Bayes decision rule for minimum error. Second, a new generative and discriminative sparse representation (GDSR) method is proposed by taking advantage of both a coarse modeling of the generative information and a modeling of the discriminative information. The proposed GDSR method integrates two new criteria, namely, a discriminative criterion and a generative criterion, into the conventional sparse representation criterion. A new generative and discriminative sparse representation based classification (GDSRc) method is then presented based on the derived new representation. Finally, a new score space based multiple metric learning (SML) method is presented for a challenging visual recognition application, namely, recognizing kinship relations or kinship verification. The proposed SML method, which goes beyond conventional Mahalanobis distance metric learning, not only learns the distance metric but also models the generative process of features by taking advantage of the score space. The SML method is optimized by solving a constrained, non-negative, and weighted variant of the sparse representation problem. To assess the feasibility of the proposed new learning methods, they are applied to several visual recognition tasks, such as face recognition, scene recognition, object recognition, computational fine art analysis, action recognition, fine-grained recognition, and kinship verification. The experimental results show that the proposed new learning methods achieve better performance than other popular methods.
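
    The core of the LLK representation can be sketched as follows: approximate a test sample as a linear combination of its K nearest training samples only, then classify by per-class reconstruction residual. This is a rough sketch under the stated assumptions; the actual LLK objective also includes the sparsity and locality criteria, which are omitted here.

        import numpy as np

        def llk_predict(X, y, x, k=10):
            """X: (n, d) training samples; y: labels; x: (d,) test sample."""
            idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]  # K nearest
            N = X[idx].T                                         # (d, k) neighbours
            w, *_ = np.linalg.lstsq(N, x, rcond=None)            # reconstruction weights
            residuals = {}
            for c in np.unique(y[idx]):
                mask = y[idx] == c
                residuals[c] = np.linalg.norm(x - N[:, mask] @ w[mask])
            return min(residuals, key=residuals.get)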

    Generation of realistic human behaviour

    As the use of computers and robots in our everyday lives increases, so does the need for better interaction with these devices. Human-computer interaction relies on the ability to understand and generate human behavioural signals such as speech, facial expressions and motion. This thesis deals with the synthesis and evaluation of such signals, focusing not only on their intelligibility but also on their realism. Since these signals are often correlated, it is common for methods to drive the generation of one signal using another. The thesis begins by tackling the problem of speech-driven facial animation and proposing models capable of producing realistic animations from a single image and an audio clip. The goal of these models is to produce a video of a target person whose lips move in accordance with the driving audio. Particular focus is also placed on a) generating spontaneous expressions such as blinks, b) achieving audio-visual synchrony and c) transferring or producing natural head motion. The second problem addressed in this thesis is that of video-driven speech reconstruction, which aims at converting a silent video into waveforms containing speech. The method proposed for solving this problem is capable of generating intelligible and accurate speech for both seen and unseen speakers. The spoken content is correctly captured thanks to a perceptual loss, which uses features from pre-trained speech-driven animation models. The ability of the video-to-speech model to run in real time allows its use in hearing assistive devices and telecommunications. The final work proposed in this thesis is a generic domain translation system that can be used for any translation problem, including those mapping across different modalities. The framework is made up of two networks performing translations in opposite directions and can be successfully applied to solve diverse sets of translation problems, including speech-driven animation and video-driven speech reconstruction.
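
    The core idea of the final framework, two networks translating in opposite directions, can be sketched with a cycle-consistency objective: a round trip through both networks should reproduce the input. The real system handles cross-modal pairs such as audio-to-video; plain MLPs on fixed-size vectors, with assumed dimensions, stand in for them here.

        import torch
        import torch.nn as nn

        # forward and backward translators between domains A (128-d) and B (64-d)
        G_ab = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
        G_ba = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 128))
        opt = torch.optim.Adam(list(G_ab.parameters()) + list(G_ba.parameters()), 1e-4)

        def cycle_step(a, b):
            """One optimisation step on a batch from each domain."""
            loss = (nn.functional.l1_loss(G_ba(G_ab(a)), a)     # A -> B -> A
                    + nn.functional.l1_loss(G_ab(G_ba(b)), b))  # B -> A -> B
            opt.zero_grad(); loss.backward(); opt.step()
            return loss.item()

        loss = cycle_step(torch.randn(8, 128), torch.randn(8, 64))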

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.