
    MODIFIED LOCAL TERNARY PATTERN WITH CONVOLUTIONAL NEURAL NETWORK FOR FACE EXPRESSION RECOGNITION

    Facial expression recognition (FER) on images with illumination variation and noise is a challenging problem in computer vision. We address it with deep learning approaches that have been applied successfully in many fields, especially under uncontrolled input conditions. We apply a sequence of processes, including face detection, normalization, augmentation, and texture representation, to develop FER based on a Convolutional Neural Network (CNN). A combination of the TanTriggs normalization technique and the Adaptive Gaussian Transformation Method is used to reduce lighting variation. The number of images is increased with geometric augmentation to prevent overfitting due to a lack of training data. We propose a Modified Local Ternary Pattern (Modified LTP) texture representation that is more discriminative and less sensitive to noise, obtained by combining the upper and lower parts of the original LTP with a logical AND operation followed by an average calculation. The Modified LTP texture images are then used to train a CNN-based classification model. Experiments on the KDEF dataset show that the proposed approach yields a promising result, with an accuracy of 81.15%.
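
    As a rough illustration of the texture step, the sketch below computes the standard upper and lower binary maps of the Local Ternary Pattern in Python/numpy. The abstract describes combining the two maps with a logical AND followed by an average; since the exact rule is not fully specified there, the final combination below is a plain average, used only as a stand-in.

    import numpy as np

    def ltp_maps(img, t=5):
        """Upper/lower binary maps of the Local Ternary Pattern over the
        8-neighbourhood of each interior pixel of a grayscale image."""
        img = img.astype(np.int16)
        h, w = img.shape
        center = img[1:h-1, 1:w-1]
        # 8 neighbours, clockwise from the top-left corner
        offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
        upper = np.zeros_like(center, dtype=np.uint8)
        lower = np.zeros_like(center, dtype=np.uint8)
        for k, (dy, dx) in enumerate(offsets):
            nb = img[1+dy:h-1+dy, 1+dx:w-1+dx]
            upper |= (nb > center + t).astype(np.uint8) << k  # neighbour clearly brighter
            lower |= (nb < center - t).astype(np.uint8) << k  # neighbour clearly darker
        return upper, lower

    def modified_ltp_texture(img, t=5):
        """Single texture image from the two LTP maps, to be fed to the CNN.
        The paper's exact AND-then-average rule is not pinned down by the
        abstract, so a plain average of the two codes is used here."""
        upper, lower = ltp_maps(img, t)
        return ((upper.astype(np.uint16) + lower) // 2).astype(np.uint8)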

    Face Recognition Under Varying Illumination

    This study is the result of a successful joint venture with my adviser, Prof. Dr. Muhittin Gökmen. I am thankful to him for his continuous assistance in preparing this project. Special thanks to the assistants of the Computer Vision Laboratory for their steady support and help with many topics related to the project.

    Ensemble of texture descriptors and classifiers for face recognition

    Presented in this paper is a novel system for face recognition that works well in the wild and is based on ensembles of descriptors that use different preprocessing techniques. The power of the proposed approach is demonstrated on two datasets: the FERET dataset and the Labeled Faces in the Wild (LFW) dataset. On the FERET dataset, where the aim is identification, we use the angle distance. On the LFW dataset, where the aim is to verify a given match, we use the Support Vector Machine and Similarity Metric Learning. Our proposed system performs well on both datasets, obtaining, to the best of our knowledge, one of the highest performance rates published in the literature on the FERET dataset. Particularly noteworthy is the fact that these good results on both datasets are obtained without using additional training patterns. The MATLAB source of our best ensemble approach will be freely available at https://www.dei.unipd.it/node/2357
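
    For the FERET identification setting, the abstract names the angle distance as the matching rule. Below is a minimal, hypothetical Python sketch of nearest-neighbour identification using the angle (arc-cosine of cosine similarity) distance between descriptor vectors; the function names and the descriptor extraction are assumptions for illustration, not the authors' code (their MATLAB source is at the URL above).

    import numpy as np

    def angle_distance(a, b):
        """Angle between two descriptor vectors (arccos of cosine similarity)."""
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        return np.arccos(np.clip(cos, -1.0, 1.0))

    def identify(probe, gallery):
        """Index of the gallery descriptor with the smallest angle distance to probe."""
        return int(np.argmin([angle_distance(probe, g) for g in gallery]))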

    Model-driven and Data-driven Approaches for some Object Recognition Problems

    Recognizing objects in images and videos has been a long-standing problem in computer vision. The recent surge in the prevalence of visual cameras has given rise to two main challenges: (i) it is important to understand the different sources of object variation in more unconstrained scenarios, and (ii) rather than describing an object in isolation, efficient learning methods are required for modeling object-scene 'contextual' relations in order to resolve visual ambiguities. This dissertation addresses some aspects of these challenges and consists of two parts.

    The first part of the work focuses on obtaining object descriptors that are largely preserved across certain sources of variation, by utilizing models of image formation and local image features. Given a single instance of an object, we investigate three problems. (i) Representing a 2D projection of a 3D non-planar shape invariantly to articulations, when there are no self-occlusions: we propose an articulation-invariant distance that is preserved across piecewise-affine transformations of a non-rigid object's 'parts' under a weak-perspective imaging model, and then obtain a shape-context-like descriptor to perform recognition. (ii) Understanding the space of 'arbitrary' blurred images of an object: we represent an unknown blur kernel of a known maximum size using a complete set of orthonormal basis functions spanning that space, and show that the subspaces obtained by convolving a clean object image and its blurred versions with these basis functions are equal under some assumptions. We then view these invariant subspaces as points on a Grassmann manifold and use statistical tools that account for the underlying non-Euclidean nature of this space to perform recognition across blur. (iii) Analyzing the robustness of local feature descriptors to different illumination conditions: we perform an empirical study of these descriptors for the problem of face recognition under lighting change, and show that the direction of the image gradient largely preserves object properties across varying lighting conditions.

    The second part of the dissertation utilizes the information conveyed by large quantities of data to learn the contextual information an object (or an entity) shares with its surroundings. (i) We first consider the supervised two-class problem of detecting lane markings in road video sequences, where we learn relevant feature-level contextual information through a machine learning algorithm based on boosting. We then focus on unsupervised object classification scenarios, where (ii) we perform clustering using maximum-margin principles, by deriving some basic properties of the affinity of a pair of points belonging to the same cluster from the information conveyed by all points in the system, and (iii) we consider correspondence-free adaptation of statistical classifiers across domain-shifting transformations, by generating meaningful 'intermediate domains' that incrementally convey potential information about the domain change.
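
    The illumination-robustness finding in item (iii) above lends itself to a small illustration. Below is a minimal numpy sketch of a per-pixel gradient-direction map: taking gradients of the log-intensities makes the orientation roughly insensitive to smooth multiplicative lighting changes. The function name and the use of the log-image are assumptions made for illustration; the dissertation's actual descriptors and study protocol may differ.

    import numpy as np

    def gradient_direction(img):
        """Per-pixel gradient orientation of a grayscale image. The log of
        the intensities is used so that a smooth multiplicative lighting
        change largely cancels out of the gradient direction (assumption
        for illustration; not necessarily the dissertation's exact setup)."""
        log_img = np.log1p(img.astype(np.float64))
        gy, gx = np.gradient(log_img)   # finite-difference derivatives
        return np.arctan2(gy, gx)       # orientation in (-pi, pi]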