196 research outputs found

    Object detection and segmentation using discriminative learning

    Object detection and segmentation algorithms need to use prior knowledge of objects' shape and appearance to guide solutions toward correct ones. A promising way of obtaining prior knowledge is to learn it directly from expert annotations using machine learning techniques. Previous approaches commonly rely on generative learning to achieve this goal. In this dissertation, I propose a series of discriminative learning algorithms based on boosting principles to learn prior knowledge from image databases with expert annotations. The learned knowledge improves the performance of detection and segmentation, leading to fast and accurate solutions.

    For object detection, I present a learning procedure called a Probabilistic Boosting Network (PBN) suitable for real-time object detection and pose estimation. Based on the law of total probability, PBN integrates evidence from two building blocks: a multiclass classifier for pose estimation and a detection cascade for object detection. Both the classifier and the detection cascade employ boosting. By inferring the pose parameter, I avoid the exhaustive scan over pose parameters that hampers real-time detection. I implement PBN using a graph-structured network that alternates the two tasks of object detection and pose estimation in an effort to reject negative cases as quickly as possible. Compared with previous approaches, PBN achieves higher accuracy in object localization and pose estimation with noticeably reduced computation.

    For object segmentation, I cast deformable object segmentation as optimizing the conditional probability density function p(C|I), where I is an image and C is a vector of model parameters describing the object shape. I propose a regression approach to learn the density p(C|I) discriminatively based on boosting principles. The learned density p(C|I) possesses the desired unimodal, smooth shape that optimization algorithms can exploit to efficiently estimate a solution. To handle the high-dimensional learning challenges, I propose a multi-level approach and a gradient-based sampling strategy to learn regression functions efficiently. I show that the regression approach consistently outperforms state-of-the-art methods on a variety of testing datasets.

    Finally, I present a comparative study on how to apply three discriminative learning approaches - classification, regression, and ranking - to deformable shape segmentation. I discuss how to extend the idea of the regression approach to build discriminative models using classification and ranking. I propose sampling strategies to collect training examples from a high-dimensional model space for the classification and ranking approaches. I also propose a ranking algorithm based on RankBoost to learn a discriminative model for segmentation. Experimental results on left ventricle and left atrium segmentation from ultrasound images and on facial feature localization demonstrate that the discriminative models outperform generative models and energy minimization methods by a large margin.
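    At its core, PBN combines the two boosted building blocks through the law of total probability: the object posterior is the sum, over discrete poses, of the pose-conditional detector response weighted by the estimated pose probability. The following is a minimal sketch of that combination step, assuming hypothetical `pose_probs` and `detect_given_pose` callables standing in for the boosted multiclass classifier and detection cascade; it illustrates the marginalization only, not the dissertation's implementation.

```python
import numpy as np

def pbn_posterior(window, pose_probs, detect_given_pose, top_k=3):
    """Law of total probability:
        p(object | I) = sum over theta of p(object | theta, I) * p(theta | I).
    Evaluating only the top_k most probable poses avoids the exhaustive
    scan over pose parameters that hampers real-time detection.
    """
    p_theta = pose_probs(window)                 # p(theta | I), one entry per pose
    likely = np.argsort(p_theta)[::-1][:top_k]   # most probable poses first
    return sum(detect_given_pose(window, t) * p_theta[t] for t in likely)
```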

    Detection of Locations of Key Points on Facial Images

    Face recognition is one of the most important branches of computer vision research. It aims at finding the size and location of human faces in digital images by identifying and separating faces from surrounding objects such as buildings and plants. For developing an advanced face recognition algorithm, the detection of facial key points is a basic and very important task: it consists of finding the locations of specific key points on facial images, such as the mouth, the nose, and the left and right eyes. For the implementation of the solution, I have used an Amazon EC2 GPU instance and convolutional networks consisting of multiple levels. The outputs of multiple networks are fused at every level for accurate and robust estimation. At the initial stage, high-level features are extracted over the whole face region, which helps in locating key points with high accuracy. This method avoids local minima caused by data corruption, as well as ambiguity in difficult image samples caused by occlusions, extreme lighting, and large variations in pose. At later levels, the networks are trained to locally refine the initial predictions, and the input supplied to them is limited to smaller regions around the predictions obtained in the initial stage.
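    The coarse-to-fine cascade described above can be sketched as follows: a first-level network regresses all keypoints from the whole face region, and a later-level network refines each prediction from a small crop around it. This is a minimal PyTorch sketch under assumed conventions (single-channel input, a second-level network trained to output local offsets relative to the patch centre); the layer sizes are illustrative, not the architecture used in the work.

```python
import torch
import torch.nn as nn

class KeypointNet(nn.Module):
    """Small CNN that regresses (x, y) coordinates for n_points keypoints."""
    def __init__(self, n_points):
        super().__init__()
        self.n_points = n_points
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),          # makes the net input-size agnostic
        )
        self.regress = nn.Linear(32 * 4 * 4, 2 * n_points)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.regress(h).view(-1, self.n_points, 2)

@torch.no_grad()
def cascade_predict(level1, level2, face, crop=15):
    """Level 1 predicts all keypoints from the whole face; level 2 refines
    each one from a small patch around the level-1 prediction."""
    coarse = level1(face)                     # shape (1, n_points, 2)
    refined = coarse.clone()
    half = crop // 2
    for i, (x, y) in enumerate(coarse[0]):
        x0, y0 = int(x) - half, int(y) - half # boundary clamping omitted here
        patch = face[:, :, y0:y0 + crop, x0:x0 + crop]
        refined[0, i] += level2(patch)[0, 0]  # local (dx, dy) correction
    return refined
```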

    Robust facial expression recognition in the presence of rotation and partial occlusion

    Magister Scientiae - MSc. This research proposes an approach to recognizing facial expressions in the presence of rotations and partial occlusions of the face. The research is in the context of automatic machine translation of South African Sign Language (SASL) to English. The proposed method is able to accurately recognize frontal facial images at an average accuracy of 75%. It also achieves a high recognition accuracy of 70% for faces rotated to 60°. It was further shown that the method is able to continue to recognize facial expressions even in the presence of full occlusions of the eyes, mouth, and left/right sides of the face; the accuracy was as high as 70% for occlusion of some areas. An additional finding was that both the left and the right sides of the face are required for recognition. In addition, the foundation was laid for a fully automatic facial expression recognition system that can accurately segment frontal or rotated faces in a video sequence.

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may dispense directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets captured in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or "in the wild", is more complex and is subject to considerable variability from one person to the next. Uncontrolled factors such as lighting, resolution, noise, occlusion, pose, and temporal variation complicate the matter further.

    This thesis advances the field of face and gesture analysis by introducing a new machine learning framework, based upon dimensionality reduction and sparse representations, that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower-dimensional representations embedded in the higher-dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models.

    The parsimonious nature of sparse representations (SR) has been successfully exploited to develop highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high-dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone of a new face and gesture framework called Manifold-based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, it is expanded to include temporal dynamics.

    The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem and introduces a new jointly optimized framework, termed LGE-KSVD, which utilizes variants of the Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imposes on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework delivers robust gesture recognition from Kinect or similar depth cameras.
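    The sparse representation classification at the heart of this line of work can be sketched compactly: project samples into a lower-dimensional space, sparse-code a test sample over a dictionary built from the training samples, and assign the class whose atoms best reconstruct it. The sketch below uses PCA and orthogonal matching pursuit as generic stand-ins; the thesis's LGE projections and K-SVD dictionary learning are not reproduced here, and the parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(X_train, y_train, x_test, n_components=50, n_nonzero=10):
    """Sparse representation classification (SRC) sketch.

    1) Project to a lower-dimensional space (PCA stands in for LGE).
    2) Sparse-code the test sample over the matrix of training samples.
    3) Classify by minimum class-wise reconstruction residual.
    """
    pca = PCA(n_components=n_components).fit(X_train)
    D = pca.transform(X_train).T                    # dictionary: one atom per sample
    D /= np.linalg.norm(D, axis=0, keepdims=True)   # unit-norm atoms
    z = pca.transform(x_test.reshape(1, -1)).ravel()

    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False).fit(D, z)
    alpha = omp.coef_                               # sparse coefficient vector

    best_class, best_res = None, np.inf
    for c in np.unique(y_train):
        mask = (y_train == c)                       # atoms belonging to class c
        residual = np.linalg.norm(z - D[:, mask] @ alpha[mask])
        if residual < best_res:
            best_class, best_res = c, residual
    return best_class
```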

    Automatic Segmentation of Mandible from Conventional Methods to Deep Learning: A Review

    Medical imaging techniques, such as (cone beam) computed tomography and magnetic resonance imaging, have proven to be a valuable component of oral and maxillofacial surgery (OMFS). Accurate segmentation of the mandible from head and neck (H&N) scans is an important step in building a personalized 3D digital mandible model for 3D printing and for treatment planning in OMFS. Segmented mandible structures are used to effectively visualize mandible volumes and to quantitatively evaluate particular mandible properties. However, mandible segmentation remains challenging for both clinicians and researchers, owing to complex structures and high-attenuation materials, such as tooth fillings or metal implants, that easily lead to high noise and strong artifacts during scanning. Moreover, the size and shape of the mandible vary to a large extent between individuals. Mandible segmentation is therefore a tedious and time-consuming task that requires adequate training to be performed properly. With the advancement of computer vision approaches, researchers have developed several algorithms to automatically segment the mandible over the last two decades. The objective of this review is to present the fully automatic and semi-automatic segmentation methods of the mandible published in the scientific literature. The review offers clinicians and researchers a clear account of the scientific advancements in this field, to help develop novel automatic methods for clinical applications.

    Gender classification using facial components.

    Master's degree. University of KwaZulu-Natal, Durban. Gender classification is very important in facial analysis, as it can be used as input into a number of systems such as face recognition. Humans are able to classify gender with great accuracy, but passing this ability to machines is a complex task because of many variables, lighting to mention just one. For the purpose of this research, gender classification is approached as a binary problem involving the two classes male and female. Two datasets are used: the FG-NET dataset and the Pilots Parliament Benchmark (PPB) dataset. Two appearance-based feature extractors are used, the Local Binary Pattern (LBP) and the Local Directional Pattern (LDP), with the Active Shape Model (ASM) included through fusion. The classifiers used are a Support Vector Machine with a Radial Basis Function kernel and an Artificial Neural Network with backpropagation. On FG-NET, an average accuracy of 90.6% is recorded, against 87.5% on PPB. Gender is then detected from individual facial components such as the nose and eyes. The forehead recorded the highest accuracy with 92%, followed by the nose with 90%, the cheeks with 89.2%, and the eyes with 87%, while the mouth recorded the lowest accuracy of 75%. Feature fusion is then carried out to improve classification accuracies, especially those of the mouth and eyes, which have the lowest accuracies. The eyes, at 87%, are fused with the forehead, at 92%, and the resulting accuracy rises to 93%. The mouth, with the lowest accuracy of 75%, is fused with the nose, at 90%, and the resulting accuracy is 87%. These fusions, carried out through addition, show improved results. Fusion is then carried out between appearance-based and shape-based features. On the FG-NET dataset, the LBP and LDP achieve accuracies of 85.33% and 89.53%, with PPB recording 83.13% and 89.3% for LBP and LDP respectively. As expected, and as shown by previous researchers, the LDP clearly obtains higher classification accuracies than the LBP, as it uses gradient information rather than pixel intensity. The vectors of the LDP and LBP are then fused with those of the ASM, dimensionality reduction is applied, and fusion is performed by addition. On the PPB dataset, fusion of LDP and ASM records 81.56% and 94.53%, with FG-NET recording 89.53%.
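    An LBP-plus-SVM pipeline of the kind used here can be sketched briefly: compute a local binary pattern histogram per face image and feed it to an RBF-kernel SVM. A minimal sketch using scikit-image and scikit-learn follows; the grid size and LBP parameters are illustrative assumptions, not the dissertation's settings.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_face, P=8, R=1, grid=4):
    """Uniform LBP histogram, concatenated over a grid x grid block layout
    so the descriptor keeps coarse spatial information."""
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    n_bins = P + 2                     # uniform patterns plus the non-uniform bin
    h, w = lbp.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = lbp[i * h // grid:(i + 1) * h // grid,
                        j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
            feats.append(hist / max(hist.sum(), 1))   # per-block normalisation
    return np.concatenate(feats)

# Hypothetical usage: faces is a list of grayscale arrays, labels in {0, 1}.
# X = np.stack([lbp_histogram(f) for f in faces])
# clf = SVC(kernel="rbf", gamma="scale").fit(X, labels)
```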

    Facial expression recognition in the wild: from individual to group

    The progress in computing technology has increased the demand for smart systems capable of understanding human affect and emotional manifestations. One of the crucial factors in designing systems equipped with such intelligence is having accurate automatic Facial Expression Recognition (FER) methods. In computer vision, automatic facial expression analysis has been an active field of research for over two decades, yet many questions remain unanswered. The research presented in this thesis addresses some of the key issues of FER in challenging conditions: 1) creating a facial expressions database representing real-world conditions; 2) devising Head Pose Normalisation (HPN) methods that are independent of facial part locations; 3) creating automatic methods for analysing the mood of a group of people.

    The central hypothesis of the thesis is that extracting close-to-real-world data from movies, and performing facial expression analysis on it, is a stepping stone towards moving the analysis of faces to real-world, unconstrained conditions. A temporal facial expressions database, Acted Facial Expressions in the Wild (AFEW), is proposed. The database is constructed and labelled using a semi-automatic process based on a closed-caption subtitle keyword search. Currently, AFEW is the largest facial expressions database representing challenging conditions that is available to the research community. To provide a common platform on which researchers can evaluate and extend their state-of-the-art FER methods, the first Emotion Recognition in the Wild (EmotiW) challenge, based on AFEW, is proposed. An image-only facial expressions database, Static Facial Expressions In The Wild (SFEW), extracted from AFEW, is also proposed.

    Furthermore, the thesis focuses on HPN for real-world images. Earlier methods were based on fiducial points; however, as fiducial point detection is an open problem for real-world images, HPN can be error-prone. An HPN method based on response maps generated from part detectors is proposed. The proposed shape-constrained method requires neither fiducial points nor head pose information, which makes it suitable for real-world images.

    Data from movies and the internet, representing real-world conditions, poses another major challenge to the research community: the presence of multiple subjects. This defines another focus of this thesis, in which a novel approach for modelling the perceived mood of a group of people in an image is presented. A new database is constructed from Flickr based on keywords related to social events. Three models are proposed: an averaging-based Group Expression Model (GEM), a Weighted Group Expression Model (GEM_w), and an Augmented Group Expression Model (GEM_LDA). GEM_w is based on social contextual attributes, which are used as weights on each person's contribution towards the overall group mood. GEM_LDA is based on a topic model and feature augmentation. The proposed framework is applied to group candid shot selection and event summarisation. Finally, the Structural SIMilarity (SSIM) index metric is explored for finding similar facial expressions, and is applied to the problems of creating image albums based on facial expressions and finding corresponding expressions for training facial performance transfer algorithms.
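    At their core, the group expression models combine per-face expression scores into one group-level mood estimate: GEM averages them, while GEM_w weights each face by social contextual attributes. A minimal sketch of that combination step follows; the attribute names and the weighting rule are illustrative assumptions, not the thesis's learned weights.

```python
from dataclasses import dataclass

@dataclass
class Face:
    happiness: float   # per-face expression intensity in [0, 1]
    size: float        # relative face size, a proxy for closeness to the camera
    centrality: float  # 1.0 at the image centre, falling off towards 0.0

def gem_average(faces):
    """Averaging-based GEM: every person contributes equally."""
    return sum(f.happiness for f in faces) / len(faces)

def gem_weighted(faces):
    """GEM_w-style estimate: weight each person's contribution by social
    context attributes (face size and centrality here, as stand-ins)."""
    weights = [f.size * f.centrality for f in faces]
    total = sum(weights)
    return sum(w * f.happiness for w, f in zip(weights, faces)) / total

group = [Face(0.9, 0.3, 0.8), Face(0.4, 0.1, 0.2), Face(0.7, 0.25, 0.9)]
print(gem_average(group), gem_weighted(group))
```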

    Face Image Retrieval with Landmark Detection and Semantic Concepts Extraction

    This thesis proposes several novel approaches for improving the performance of an automatic facial landmark detection system based on the concept of the pictorial tree structure model. Furthermore, a robust glasses landmark detection system is proposed, since glasses are a commonly worn accessory. These approaches are then employed to develop an automatic semantic-based face image retrieval system. The experimental results demonstrate significant improvements in both accuracy and efficiency for all the proposed approaches.
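    In a pictorial (tree-structured) model, a face is a tree of parts with unary appearance scores and pairwise spring costs between connected parts; because the graph is a tree, the best joint configuration can be found exactly by max-sum message passing from the leaves to the root. The following is a minimal sketch over a discrete set of candidate locations per part, with the score functions and tree layout as illustrative assumptions rather than the thesis's model.

```python
import numpy as np

def tree_inference(appearance, pairs, spring):
    """Exact MAP inference on a tree-structured pictorial model.

    appearance: list of arrays, appearance[i][k] = unary score of part i
                at its k-th candidate location.
    pairs:      list of (child, parent) edges ordered leaves-to-root,
                with part 0 as the root.
    spring:     spring(i, j, k, l) = deformation cost between location k
                of part i and location l of part j (lower is better).
    Returns the best total score and one location index per part.
    """
    msg = [a.astype(float) for a in appearance]  # scores accumulated upward
    argbest = {}
    for child, parent in pairs:                  # pass messages leaves-to-root
        table = np.array([[msg[child][k] - spring(child, parent, k, l)
                           for k in range(len(msg[child]))]
                          for l in range(len(msg[parent]))])
        argbest[(child, parent)] = table.argmax(axis=1)
        msg[parent] += table.max(axis=1)
    best = {0: int(msg[0].argmax())}             # decode root, then backtrack
    for child, parent in reversed(pairs):
        best[child] = int(argbest[(child, parent)][best[parent]])
    return float(msg[0].max()), best
```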