6 research outputs found

    Facial landmark detection via attention-adaptive deep network

    Facial landmark detection is a key component of the face recognition pipeline, as well as of facial attribute analysis and face verification. Convolutional neural network-based face alignment methods have recently achieved significant improvements, but occlusion remains a major obstacle to good accuracy. In this paper, we introduce an attentioned distillation module into our previous Occlusion-adaptive Deep Network (ODN) model to improve performance. In this model, the occlusion probability of each position in the high-level features is inferred by a distillation module. It can be learnt automatically in the process of estimating the relationship between facial appearance and facial shape. The occlusion probability serves as an adaptive weight on the high-level features to reduce the impact of occlusion and obtain a clean feature representation. Nevertheless, the clean feature representation cannot represent the holistic face because of the missing semantic features. To obtain an exhaustive and complete feature representation, we leverage a low-rank learning module to recover the lost features. Because facial geometric characteristics help the low-rank module recover lost features, the role of the geometry-aware module is to excavate the geometric relationships between different facial components. The role of the attentioned distillation module is to obtain a rich feature representation and to model occlusion. To improve the feature representation, we use channel-wise attention and spatial attention. Experimental results show that our method performs better than existing methods.
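
    The occlusion-adaptive weighting described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the `(1 - p)` weight, the function name `apply_occlusion_weights`, and the toy values below are all assumptions for illustration, not the authors' implementation.

```python
def apply_occlusion_weights(features, occlusion_prob):
    """Down-weight each spatial position by its estimated occlusion probability.

    features: list of channels, each an H x W nested list of floats.
    occlusion_prob: H x W nested list of probabilities in [0, 1].
    Assumption: the adaptive weight is (1 - p); the abstract only says the
    probability "serves as the adaptive weight" without giving the formula.
    """
    weighted = []
    for channel in features:
        weighted.append([
            [value * (1.0 - p) for value, p in zip(row, prob_row)]
            for row, prob_row in zip(channel, occlusion_prob)
        ])
    return weighted


# Toy input: 2 channels on a 2 x 2 grid (hypothetical values).
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
# Top-right position fully occluded, bottom-left half occluded.
occlusion_prob = [[0.0, 1.0], [0.5, 0.0]]

clean = apply_occlusion_weights(features, occlusion_prob)
```

    In this sketch a fully occluded position contributes nothing to the "clean" features, which is why the abstract then calls for a separate module to recover the lost information.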

    Facial Landmark Point Localization using Coarse-to-Fine Deep Recurrent Neural Network

    The accurate localization of facial landmarks is at the core of face analysis tasks such as face recognition and facial expression analysis. In this work we propose a novel localization approach based on a Deep Learning architecture that uses dual cascaded CNN subnetworks of the same length, where each subnetwork in a cascade refines the accuracy of its predecessor. The first set of cascaded subnetworks estimates heatmaps that encode the landmarks' locations, while the second set refines the heatmap-based localization using regression and also receives as input the output of the corresponding heatmap estimation subnetwork. The proposed scheme is experimentally shown to compare favorably with contemporary state-of-the-art schemes.
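
    The two stages named in this abstract, heatmap estimation followed by regression-based refinement, can be sketched as follows. The abstract does not specify how the regression output is combined with the heatmap estimate, so the additive offset, the function names `heatmap_peak` and `refine`, and the toy values are assumptions for illustration only.

```python
def heatmap_peak(heatmap):
    """Return (x, y) of the highest-scoring cell in an H x W heatmap."""
    best_x, best_y, best_score = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y


def refine(coarse_xy, offset_xy):
    """Add a regressed offset to the coarse heatmap estimate.

    Assumption: refinement is modelled as an additive (dx, dy) offset.
    """
    return (coarse_xy[0] + offset_xy[0], coarse_xy[1] + offset_xy[1])


# Toy 3 x 3 heatmap for one landmark (hypothetical values).
heatmap = [
    [0.0, 0.1, 0.0],
    [0.1, 0.2, 0.9],
    [0.0, 0.1, 0.3],
]

coarse = heatmap_peak(heatmap)           # integer grid location of the peak
refined = refine(coarse, (-0.25, 0.5))   # hypothetical regression output
```

    The heatmap peak is limited to grid resolution, which is one reason a regression stage that can output fractional coordinates is a natural follow-up.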

    Markerless facial motion capture: deep learning approaches on RGBD data

    Facial expressions are a series of fast, complex and interconnected movements that cause an array of deformations, such as stretching, compressing and folding of the skin. Identifying expressions is a natural process in human vision, but the diversity of faces makes it challenging for computer vision. Research in markerless facial motion capture using a single Red Green Blue (RGB) camera has gained popularity due to the wide availability of such data, for example from mobile phones. The motivation behind this work is that much of the existing work attempts to infer 3-Dimensional (3D) data from 2-Dimensional (2D) images; in motion capture, for example, multiple 2D cameras are calibrated to allow some depth prediction. In contrast, Red Green Blue Depth (RGBD) sensors that give ground truth depth data could provide a better understanding of the human face and of how expressions are visualised. The aim of this thesis is to investigate and develop novel methods of markerless facial motion capture, with a focus on the inclusion of RGBD data to provide 3D data. The contributions are: a tool to aid in the annotation of 3D facial landmarks; a novel neural network that demonstrates the ability to predict 2D and 3D landmarks by merging RGBD data; a working application that demonstrates a complex deep learning network on portable handheld devices; a review of existing methods of denoising fine detail in depth maps using neural networks; and a network for the complete analysis of facial landmarks and expressions in 3D. The 3D annotator was developed to overcome the issues of relying on existing 3D modelling software, which made feature identification difficult. The technique of predicting 2D and 3D landmarks with auxiliary information allowed high-accuracy 3D landmarking without the need for full model generation. It also outperformed other recent landmarking techniques.
    The networks running on the handheld devices show, as a proof of concept, that even without much optimisation a complex task can be performed in near real-time. Denoising Time of Flight (ToF) depth maps proved much more complex than traditional RGB denoising; we reviewed and applied an array of techniques to the task. The full facial analysis showed that having neural networks perform a wide range of related tasks for auxiliary information allows for a deep understanding of the overall task. Research in facial processing is vast, but there are still many new problems and challenges to face and improve upon. While RGB cameras are widely used, high-accuracy and cost-effective depth sensing devices are becoming available. These new devices allow a better understanding of facial features and expressions. By using and merging RGB data, facial landmarking and expression intensity recognition can be improved.