
    Nonparametric Facial Feature Localization Using Segment-Based Eigenfeatures

    We present a nonparametric facial feature localization method that uses relative directional information between regularly sampled image segments and facial feature points. Instead of applying an iterative parameter optimization technique or a search algorithm, our method finds the locations of facial feature points through a weighted concentration of the directional vectors originating from the image segments and pointing to the expected facial feature positions. Each directional vector is computed as a linear combination of eigendirectional vectors, which are obtained by a principal component analysis of training facial segments in the histogram of oriented gradients (HOG) feature space. Our method finds facial feature points quickly and accurately, since it exploits statistical reasoning over all of the training data without needing to extract local patterns at the estimated facial feature positions, run any iterative parameter optimization, or perform any search. In addition, the storage size of the trained model can be reduced by controlling the energy-preservation level of the HOG pattern space.
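    The voting scheme described in the abstract can be sketched as follows. This is a toy illustration under assumed conventions (all array shapes, names, and the uniform weighting are ours, not the authors'): each segment's directional vector is a linear combination of eigendirectional vectors, and the feature location is the weighted mean of the positions those vectors point to.

    ```python
    import numpy as np

    def localize_feature(segment_centers, eig_dirs, coeffs, weights):
        """Estimate a facial feature point as the weighted concentration of
        directional votes cast by image segments (a sketch, not the paper's code).

        segment_centers : (N, 2) positions of the sampled image segments
        eig_dirs        : (K, 2) eigendirectional vectors (PCA of training HOG segments)
        coeffs          : (N, K) linear-combination coefficients per segment
        weights         : (N,) confidence weight of each segment's vote
        """
        directions = coeffs @ eig_dirs        # (N, 2): each segment's directional vector
        votes = segment_centers + directions  # where each segment expects the feature
        w = weights / weights.sum()           # normalize the vote weights
        return w @ votes                      # weighted mean of the votes
    ```

    With directions that point exactly at a common target, the estimate recovers that target; in practice the weights would come from how well each segment's HOG pattern matches the training data.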

    Nearest Neighbor Discriminant Analysis Based Face Recognition Using Ensembled Gabor Features

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Informatics, 2009. In recent decades, Gabor-feature-based face representation has produced very promising results in face recognition, as it is robust to variations caused by illumination and facial expression changes. The properties that make Gabor features effective are that they capture the local structure corresponding to spatial frequency (scale), spatial localization, and orientation selectivity, and that they require no manual annotation. The contribution of this thesis is the Ensemble-based Gabor Nearest Neighbor Classifier (EGNNC), which extends the Gabor Nearest Neighbor Classifier (GNNC); GNNC extracts important discriminant features by combining the power of Gabor filters and Nearest Neighbor Discriminant Analysis (NNDA). EGNNC is an ensemble classifier that combines multiple NNDA-based component classifiers, each designed on a different segment of the reduced Gabor feature. Whereas a single NNDA component extracts a reduced dimension from the entire Gabor feature, EGNNC makes better use of the discriminability in the reduced Gabor features by avoiding the 3S (small sample size) problem while minimizing the loss of discriminative information. The accuracy of EGNNC is demonstrated in a comparative performance study. On a 200-class subset of the FERET database covering illumination and expression variations, EGNNC achieved a 100% recognition rate with 65 features, outperforming its ancestor GNNC (98%) as well as standard methods such as the Gabor Fisher Classifier (GFC) and GPC. On the YALE database, EGNNC outperformed GNNC for all (k, alpha) pairs and reached 96% accuracy with 14 features using step size = 5, k = 5, alpha = 3.
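    The ensemble idea can be sketched in a few lines. This is a toy illustration only: the feature vector is split into segments, each segment gets its own component classifier, and the component decisions are combined by majority vote. For brevity we substitute a plain 1-NN rule for the NNDA projection each component would actually use.

    ```python
    import numpy as np

    def ensemble_predict(train_X, train_y, test_x, n_segments):
        """Toy sketch of an EGNNC-style ensemble: classify each feature
        segment independently and combine the decisions by majority vote.
        (The NNDA component classifiers are replaced by 1-NN for brevity.)
        """
        segments = np.array_split(np.arange(train_X.shape[1]), n_segments)
        votes = []
        for idx in segments:
            # 1-NN decision on this segment of the feature vector
            d = np.linalg.norm(train_X[:, idx] - test_x[idx], axis=1)
            votes.append(train_y[np.argmin(d)])
        labels, counts = np.unique(votes, return_counts=True)
        return labels[np.argmax(counts)]      # majority vote across components
    ```

    Designing each component on a lower-dimensional segment is what lets the ensemble sidestep the small-sample-size problem that a single classifier over the full Gabor feature would face.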

    Human detection and face recognition in indoor environment to improve human-robot interaction in assistive and collaborative robots

    Human detection in indoor environments is essential for robots working alongside humans in collaborative manufacturing, and likewise for service robots helping with household chores or assisting the elderly with daily activities. Human detection can be achieved through head detection, since the head is the most discriminative part of the human body. Head detection methods can be divided into three types: i) methods based on color models; ii) methods based on template matching; and iii) methods based on contour detection. Color-model methods are simple but error prone. Template-based methods detect a head by searching the image for a region similar to a head template. Contour-based methods, on the other hand, describe the head, or the head and shoulders, using contour information. A single criterion may not be sufficient, and the accuracy of head detection can be increased by combining shape and color information. In this thesis, a human detection method is proposed that combines head shape and skin color (i.e., a combination of the color-model and contour-detection approaches). A curvature criterion is used to segment out curves whose curvature resembles that of a human head, and skin color is then detected to localize the face in the image plane. A curve is accepted as a human head curve only if it has sufficiently many skin-colored pixels in its close proximity. Using color and head curvature together, promising human detection results were obtained in indoor environments. After detecting the humans in its surroundings, the next step for the robot is to identify and recognize them. In this thesis, the use of Gabor filter responses at nine points was investigated to identify eight different individuals. This suggests that the Gabor filter on nine points could be applied to identify people in small areas, for example a home or a small office with few individuals. Master of Applied Science (M.A.Sc.) in Natural Resource Engineering
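    The acceptance rule for head candidates can be sketched as follows. This is a minimal illustration under our own assumptions (the bounding-box proximity test, the threshold value, and all names are hypothetical, not taken from the thesis): a curvature-segmented curve is kept only if enough skin-colored pixels lie near it.

    ```python
    import numpy as np

    def is_head_curve(curve_points, skin_mask, min_skin_ratio=0.3):
        """Keep a head-candidate curve only if the region it encloses contains
        a sufficient fraction of skin-colored pixels (a sketch of the thesis'
        shape-plus-color rule; the 0.3 threshold is illustrative).

        curve_points : (N, 2) integer (row, col) points on the curve
        skin_mask    : 2-D binary array, 1 where the pixel is skin-colored
        """
        ys, xs = curve_points[:, 0], curve_points[:, 1]
        y0, y1 = ys.min(), ys.max() + 1       # bounding box of the curve
        x0, x1 = xs.min(), xs.max() + 1
        region = skin_mask[y0:y1, x0:x1]
        return region.mean() >= min_skin_ratio  # fraction of skin pixels inside
    ```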

    State of the Art in Face Recognition

    Notwithstanding the tremendous effort devoted to solving the face recognition problem, it is not yet possible to design a face recognition system with performance close to that of humans. New computer vision and pattern recognition approaches need to be investigated, and knowledge and perspectives from other fields, such as psychology and neuroscience, must be incorporated into face recognition research to design a robust system. Indeed, many more efforts are required to arrive at a human-like face recognition system. This book tries to reduce the gap between the current state of face recognition research and that future state.

    Discriminative learning with application to interactive facial image retrieval

    The amount of digital images is growing drastically, and advanced tools for searching large image collections are therefore urgently needed. Content-based image retrieval is advantageous for such a task because it offers automatic feature extraction and indexing without human labor or subjectivity in image annotations. The semantic gap between high-level semantics and low-level visual features can be reduced by the relevance feedback technique. However, most existing interactive content-based image retrieval (ICBIR) systems require a substantial amount of human evaluation labor, which leads to the evaluation fatigue problem that heavily restricts the application of ICBIR. In this thesis a solution based on discriminative learning is presented. It extends an existing ICBIR system, PicSOM, towards practical applications. The enhanced ICBIR system allows users to input partial relevance, which includes not only the extent of relevance but also the reason for it. Multi-phase retrieval with partial relevance can adapt to the user's search intention in a coarse-to-fine manner. Retrieval performance can be improved by employing supervised learning as a preprocessing step before unsupervised content-based indexing. In this work, Parzen Discriminant Analysis (PDA) is proposed to extract discriminative components from images. PDA regularizes the Informative Discriminant Analysis (IDA) objective with a greatly accelerated optimization algorithm. Moreover, discriminative Self-Organizing Maps trained with the resulting features can easily handle fuzzy categorizations. The proposed techniques have been applied to interactive facial image retrieval. Both a query example and a benchmark simulation study are presented, which indicate that the first image depicting the target subject can be retrieved in a small number of rounds.

    LEARNING FROM MULTIPLE VIEWS OF DATA

    This dissertation takes inspiration from the ability of our brain to extract information and learn from multiple sources of data, and tries to mimic this ability for some practical problems. It explores the hypothesis that the human brain can extract and store information from raw data in a form, termed a common representation, suitable for cross-modal content matching. Human-level performance on this task requires: a) the ability to extract sufficient information from raw data, and b) algorithms to obtain a task-specific common representation from multiple sources of extracted information. This dissertation addresses these requirements and develops novel content extraction and cross-modal content matching architectures. The first part of the dissertation proposes a learning-based visual information extraction approach, the Recursive Context Propagation Network (RCPN), for semantic segmentation of images. It is a deep neural network that utilizes the contextual information of the entire image for semantic segmentation, through bottom-up followed by top-down context propagation. This improves the feature representation of every super-pixel in an image for better classification into semantic categories. Analysis of RCPN reveals that bypass-error paths can hinder effective context propagation; it is shown that bypass errors can be tackled by also including the classification loss of the internal nodes. Secondly, a novel tree-MRF structure is developed using the parse trees to model the hierarchical dependency present in the output. The second part of this dissertation develops algorithms to obtain and match common representations across different modalities. A novel Partial Least Squares (PLS) based framework is proposed to learn a common subspace from multiple modalities of data. It is used for multi-modal face biometric problems such as pose-invariant face recognition and sketch-face recognition. The issue of sensitivity to noise in pose variation is analyzed, and a two-stage discriminative model is developed to tackle it. A generalized framework is proposed to extend various popular feature extraction techniques that can be solved as a generalized eigenvalue problem to their multi-modal counterparts. It is termed Generalized Multiview Analysis (GMA) and is used for pose- and lighting-invariant face recognition and text-image retrieval.
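    The common thread GMA exploits is that many feature extractors (LDA, CCA, and relatives) reduce to a generalized eigenvalue problem A v = λ B v; the multiview extension assembles A and B as block matrices over all modalities. The helper below only shows the shared computational core, solving the eigenproblem and keeping the leading directions; how A and B are built from the data is the method-specific part and is not shown.

    ```python
    import numpy as np

    def top_generalized_eigvecs(A, B, k):
        """Return the k eigenvectors of the generalized problem A v = lambda B v
        with the largest eigenvalues (B assumed invertible). This is the common
        core of eigenvalue-based feature extractors, not GMA itself.
        """
        # Reduce A v = lambda B v to the standard problem (B^-1 A) v = lambda v
        vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
        order = np.argsort(vals.real)[::-1]   # sort eigenvalues descending
        return vecs[:, order[:k]].real
    ```

    For symmetric positive-definite B a symmetric solver would be preferable numerically; the plain reduction above keeps the sketch short.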

    From rule-based to learning-based image-conditional image generation

    Visual contents, such as movies, animations, computer games, videos and photos, are massively produced and consumed nowadays. Most of these contents combine materials captured from the real world with contents synthesized by computers. In particular, computer-generated visual contents are increasingly indispensable in modern entertainment and production. The generation of visual contents by computers is typically conditioned on real-world materials, driven by the imagination of designers and artists, or a combination of both. However, creating visual contents manually is both challenging and labor intensive. Therefore, enabling computers to automatically or semi-automatically synthesize the needed visual contents becomes essential. Among these efforts, one stream of research generates novel images based on given image priors, e.g., photos and sketches. This research direction is known as image-conditional image generation, which covers a wide range of topics such as image stylization, image completion, image fusion, sketch-to-image generation, and extraction of image label maps. In this thesis, a set of novel approaches for image-conditional image generation is presented. The thesis starts with an exemplar-based method for facial image stylization in Chapter 2. This method involves a unified framework for facial image stylization based on a single style exemplar. A two-phase procedure is employed: the first phase searches for a dense and semantic-aware correspondence between the input and exemplar images, and the second phase conducts edge-preserving texture transfer. While this algorithm has the merit of requiring only a single exemplar, it is constrained to face photos. To perform generalized image-to-image translation, Chapter 3 presents a data-driven, learning-based method. Inspired by the dual learning paradigm designed for natural language translation [115], a novel dual Generative Adversarial Network (DualGAN) mechanism is developed, which enables image translators to be trained from two sets of unlabeled images from two domains. This is followed by another data-driven method in Chapter 4, which learns multiscale manifolds from a set of images and then enables synthesizing novel images that mimic the appearance of the target image dataset. The method, named Branched Generative Adversarial Network (BranchGAN), employs a novel training scheme that enables unconditioned generative adversarial networks (GANs) to learn image manifolds at multiple scales. As a result, we can directly manipulate, and even combine, latent manifold codes associated with specific feature scales. Finally, to give users more control over image generation results, Chapter 5 discusses an upgraded version of iGAN [126] (iGANHD) that significantly improves the manipulation of high-resolution images by utilizing the multi-scale manifold learned with BranchGAN.

    Hierarchical age estimation using enhanced facial features.

    Doctor of Philosophy in Computer Science, University of KwaZulu-Natal, Westville, 2018. Ageing is a stochastic, inevitable and uncontrollable process that constantly affects the shape, texture and general appearance of the human face. Humans can determine one's gender, identity and ethnicity far more accurately than one's age. This makes the development of automatic age estimation techniques that surpass human performance an attractive yet challenging task. Automatic age estimation requires the extraction of robust and reliable age-discriminative features. The sensitivity of local binary patterns (LBP) to noise makes them insufficiently reliable for capturing age-discriminative features. Although local ternary patterns (LTP) are insensitive to noise, they use a single static threshold for all images regardless of varied image conditions. Local directional patterns (LDP) use the k strongest directional responses to encode the image gradient, disregarding not only the central pixel of the local neighborhood but also the remaining 8−k directional responses. Every pixel in an image carries subtle information, and discarding the 8−k weaker directional responses leads to a loss of discriminative texture features. This study proposes two variations of the LDP operator for texture extraction. Significant orientation response LDP (SOR-LDP) encodes the image gradient by grouping the eight directional responses into four pairs. Each pair represents the orientation of an edge with respect to the central reference pixel. The values in each pair are compared, and the bit corresponding to the maximum value in the pair is set to 1 while the other is set to 0. The resultant binary code is converted to decimal and assigned to the central pixel as its SOR-LDP code. Texture features are contained in the histogram of the SOR-LDP-encoded image. Local ternary directional patterns (LTDP) first compute the differences between the neighboring pixels and the central pixel in a 3×3 image region. These differential values are convolved with Kirsch edge detectors to obtain directional responses. These responses are normalized and used as the probability of an edge occurring in the respective direction. An adaptive threshold is applied to derive the LTDP code, which is split into its positive and negative LTDP codes. The histograms of the negative and positive LTDP-encoded images are concatenated to obtain the texture feature. Although there is evidence of spatial frequency processing in the primary visual cortex, biologically inspired features (BIF) that model the visual cortex use only scale and orientation selectivity in feature extraction. Furthermore, these BIF are extracted using holistic (global) pooling across scales and orientations, leading to a loss of substantive information. This study proposes multi-frequency BIF (MF-BIF), in which frequency selectivity is introduced into BIF modelling. Local statistical BIF (LS-BIF) uses local pooling within scale, orientation and frequency in an n×n region for BIF extraction. Using a leave-one-person-out (LOPO) validation protocol, this study investigated the performance of the proposed feature extractors in hierarchical age estimation: age-group classification using a multi-layer perceptron (MLP) followed by within-age-group exact age regression using support vector regression (SVR). Mean absolute error (MAE) and cumulative score (CS) were used to evaluate the performance of the proposed face descriptors. Experimental results on the FG-NET ageing dataset show that SOR-LDP, LTDP, MF-BIF and LS-BIF outperform state-of-the-art feature descriptors in age estimation, and that performing gender discrimination before age-group and age estimation further improves accuracy. Shape, appearance, wrinkle and texture features are extracted simultaneously by the primate visual system for the brain to process and understand an image or a scene. However, age estimation systems in the literature use a single feature for age estimation. A single feature is not sufficient to capture subtle age-discriminative traits, owing to the stochastic and personalized nature of ageing. This study therefore proposes the fusion of different facial features to enhance their discriminative power. Experimental results show that fusing shape, texture, wrinkle and appearance features yields robust age-discriminative features that achieve a lower MAE than any single feature.
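    The SOR-LDP encoding step described in the abstract can be sketched directly. One assumption is ours: the pairing of the eight directional responses into four opposite-orientation pairs (here response i with response i+4); within each pair the bit of the larger response is set to 1, the other to 0, and the 8-bit pattern is read as a decimal code.

    ```python
    def sor_ldp_code(responses):
        """Sketch of SOR-LDP encoding for one pixel: group the eight Kirsch
        directional responses into four pairs (pairing i with i+4 is our
        assumption), set the bit of the larger response in each pair to 1,
        and read the 8-bit pattern as a decimal code.
        """
        bits = [0] * 8
        for i in range(4):                     # four orientation pairs
            j = i + 4
            if responses[i] >= responses[j]:
                bits[i] = 1                    # bit of the stronger response wins
            else:
                bits[j] = 1
        return int("".join(map(str, bits)), 2)  # binary pattern -> decimal
    ```

    Exactly four bits are set in every code, so the codes occupy a restricted subset of 0-255; the histogram of these codes over the image is the texture descriptor.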

    A Study on Human Motion Acquisition and Recognition Employing Structured Motion Database

    Doctoral dissertation, Kyushu Institute of Technology (degree number 工博甲第332号; conferred March 23, 2012). Contents: 1 Introduction | 2 Human Motion Representation | 3 Human Motion Recognition | 4 Automatic Human Motion Acquisition | 5 Human Motion Recognition Employing Structured Motion Database | 6 Analysis on the Constraints in Human Motion Recognition | 7 Multiple Persons' Action Recognition | 8 Discussion and Conclusions. Human motion analysis is an emerging research field for video-based applications capable of acquiring and recognizing human motions or actions. The automaticity of a system with these capabilities is of vital importance in real-life scenarios, and with the increasing number of applications, the demand for human motion acquisition systems grows day by day. We develop such an acquisition system based on a body-parts modeling strategy; it acquires motion by positioning the body joints and interpreting those joints through the inclinations between parts. Beyond the acquisition system, there is an increasing need for reliable human motion recognition. A number of studies on motion recognition have been carried out in the last two decades, and at the same time enormous motion datasets are becoming available. It has therefore become indispensable to develop a motion database that can handle a large variability of motions efficiently. We have developed such a system based on the structured motion database concept. To gain perspective on this issue, we have analyzed various aspects of the motion database with a view to establishing a standard recognition scheme. The conventional structured database is improved by considering three aspects: directional organization, resolution of the nearest neighbor searching problem, and prior direction estimation. To investigate and analyze the effect of these aspects on motion recognition comprehensively, we have adopted two forms of motion representation: eigenspace-based motion compression and a B-Tree structured database. Moreover, we have analyzed two important constraints in motion recognition: missing information and cluttered outdoor motion. Two separate systems addressing these constraints are also developed, demonstrating how the constraints can be handled. In practical cases, however, several people occupy a scene. We have proposed an integrated detection-tracking-recognition system to deal with the multiple-person case. The system shows decent performance in outdoor scenarios, and the experimental results empirically illustrate the suitability and compatibility of the various factors of motion recognition.