1,463 research outputs found

    Image processing for plastic surgery planning

    Get PDF
    This thesis presents some image processing tools for plastic surgery planning. In particular, it presents a novel method that combines local and global context in a probabilistic relaxation framework to identify cephalometric landmarks used in Maxillofacial plastic surgery. It also uses a method that utilises global and local symmetry to identify abnormalities in CT frontal images of the human body. The proposed methodologies are evaluated with the help of several clinical data supplied by collaborating plastic surgeons

    Deep face tracking and parsing in the wild

    Get PDF
    Face analysis has been a long-standing research direction in the field of computer vision and pattern recognition. A complete face analysis system involves solving several tasks including face detection, face tracking, face parsing, and face recognition. Recently, the performance of methods in all tasks has significantly improved thanks to the employment of Deep Convolutional Neural Networks (DCNNs). However, existing face analysis algorithms mainly focus on solving facial images captured in the constrained laboratory environment, and their performance on real-world images has remained less explored. Compared with the lab environment, the in-the-wild settings involve greater diversity in face sizes, poses, facial expressions, background clutters, lighting conditions and imaging quality. This thesis investigates two fundamental tasks in face analysis under in-the-wild settings: face tracking and face parsing. Both tasks serve as important prerequisites for downstream face analysis applications. However, in-the-wild datasets remain scarce in both fields and models have not been rigorously evaluated in such settings. In this thesis, we aim to bridge that gap of lacking in-the-wild data, evaluate existing methods in these settings, and develop accurate, robust and efficient deep learning-based methods for the two tasks. For face tracking in the wild, we introduce the first in-the-wild face tracking dataset, MobiFace, that consists of 80 videos captured by mobile phones during mobile live-streaming. The environment of the live-streaming performance is fully unconstrained and the interactions between users and mobile phones are natural and spontaneous. Next, we evaluate existing tracking methods, including generic object trackers and dedicated face trackers. The results show that MobiFace represent unique challenges in face tracking in the wild and cannot be readily solved by existing methods. Finally, we present a DCNN-based framework, FT-RCNN, that significantly outperforms other methods in face tracking in the wild. For face parsing in the wild, we introduce the first large-scale in-the-wild face dataset, iBugMask, that contains 21, 866 training images and 1, 000 testing images. Unlike existing datasets, the images in iBugMask are captured in the fully unconstrained environment and are not cropped or preprocessed of any kind. Manually annotated per-pixel labels for eleven facial regions are provided for each target face. Next, we benchmark existing parsing methods and the results show that iBugMask is extremely challenging for all methods. By rigorous benchmarking, we observe that the pre-processing of facial images with bounding boxes in face parsing in the wild introduces bias. When cropping the face with a bounding box, a cropping margin has to be hand-picked. If face alignment is used, fiducial landmarks are required and a predefined alignment template has to be selected. These additional hyper-parameters have to be carefully considered and can have a significant impact on the face parsing performance. To solve this, we propose Region-of-Interest (RoI) Tanh-polar transform that warps the whole image to a fixed-sized representation. Moreover, the RoI Tanh-polar transform is differentiable and allows for rotation equivariance in 1 DCNNs. We show that when coupled with a simple Fully Convolutional Network, our RoI Tanh-polar transformer Network has achieved state-of-the-art results on face parsing in the wild. This thesis contributes towards in-the-wild face tracking and face parsing by providing novel datasets and proposing effective frameworks. Both tasks can benefit real-world downstream applications such as facial age estimation, facial expression recognition and lip-reading. The proposed RoI Tanh-polar transform also provides a new perspective in how to preprocess the face images and make the DCNNs truly end-to-end for real-world face analysis applications.Open Acces

    Deep Learning Architectures for Heterogeneous Face Recognition

    Get PDF
    Face recognition has been one of the most challenging areas of research in biometrics and computer vision. Many face recognition algorithms are designed to address illumination and pose problems for visible face images. In recent years, there has been significant amount of research in Heterogeneous Face Recognition (HFR). The large modality gap between faces captured in different spectrum as well as lack of training data makes heterogeneous face recognition (HFR) quite a challenging problem. In this work, we present different deep learning frameworks to address the problem of matching non-visible face photos against a gallery of visible faces. Algorithms for thermal-to-visible face recognition can be categorized as cross-spectrum feature-based methods, or cross-spectrum image synthesis methods. In cross-spectrum feature-based face recognition a thermal probe is matched against a gallery of visible faces corresponding to the real-world scenario, in a feature subspace. The second category synthesizes a visible-like image from a thermal image which can then be used by any commercial visible spectrum face recognition system. These methods also beneficial in the sense that the synthesized visible face image can be directly utilized by existing face recognition systems which operate only on the visible face imagery. Therefore, using this approach one can leverage the existing commercial-off-the-shelf (COTS) and government-off-the-shelf (GOTS) solutions. In addition, the synthesized images can be used by human examiners for different purposes. There are some informative traits, such as age, gender, ethnicity, race, and hair color, which are not distinctive enough for the sake of recognition, but still can act as complementary information to other primary information, such as face and fingerprint. These traits, which are known as soft biometrics, can improve recognition algorithms while they are much cheaper and faster to acquire. They can be directly used in a unimodal system for some applications. Usually, soft biometric traits have been utilized jointly with hard biometrics (face photo) for different tasks in the sense that they are considered to be available both during the training and testing phases. In our approaches we look at this problem in a different way. We consider the case when soft biometric information does not exist during the testing phase, and our method can predict them directly in a multi-tasking paradigm. There are situations in which training data might come equipped with additional information that can be modeled as an auxiliary view of the data, and that unfortunately is not available during testing. This is the LUPI scenario. We introduce a novel framework based on deep learning techniques that leverages the auxiliary view to improve the performance of recognition system. We do so by introducing a formulation that is general, in the sense that can be used with any visual classifier. Every use of auxiliary information has been validated extensively using publicly available benchmark datasets, and several new state-of-the-art accuracy performance values have been set. Examples of application domains include visual object recognition from RGB images and from depth data, handwritten digit recognition, and gesture recognition from video. We also design a novel aggregation framework which optimizes the landmark locations directly using only one image without requiring any extra prior which leads to robust alignment given arbitrary face deformations. Three different approaches are employed to generate the manipulated faces and two of them perform the manipulation via the adversarial attacks to fool a face recognizer. This step can decouple from our framework and potentially used to enhance other landmark detectors. Aggregation of the manipulated faces in different branches of proposed method leads to robust landmark detection. Finally we focus on the generative adversarial networks which is a very powerful tool in synthesizing a visible-like images from the non-visible images. The main goal of a generative model is to approximate the true data distribution which is not known. In general, the choice for modeling the density function is challenging. Explicit models have the advantage of explicitly calculating the probability densities. There are two well-known implicit approaches, namely the Generative Adversarial Network (GAN) and Variational AutoEncoder (VAE) which try to model the data distribution implicitly. The VAEs try to maximize the data likelihood lower bound, while a GAN performs a minimax game between two players during its optimization. GANs overlook the explicit data density characteristics which leads to undesirable quantitative evaluations and mode collapse. This causes the generator to create similar looking images with poor diversity of samples. In the last chapter of thesis, we focus to address this issue in GANs framework

    From 3D Point Clouds to Pose-Normalised Depth Maps

    Get PDF
    We consider the problem of generating either pairwise-aligned or pose-normalised depth maps from noisy 3D point clouds in a relatively unrestricted poses. Our system is deployed in a 3D face alignment application and consists of the following four stages: (i) data filtering, (ii) nose tip identification and sub-vertex localisation, (iii) computation of the (relative) face orientation, (iv) generation of either a pose aligned or a pose normalised depth map. We generate an implicit radial basis function (RBF) model of the facial surface and this is employed within all four stages of the process. For example, in stage (ii), construction of novel invariant features is based on sampling this RBF over a set of concentric spheres to give a spherically-sampled RBF (SSR) shape histogram. In stage (iii), a second novel descriptor, called an isoradius contour curvature signal, is defined, which allows rotational alignment to be determined using a simple process of 1D correlation. We test our system on both the University of York (UoY) 3D face dataset and the Face Recognition Grand Challenge (FRGC) 3D data. For the more challenging UoY data, our SSR descriptors significantly outperform three variants of spin images, successfully identifying nose vertices at a rate of 99.6%. Nose localisation performance on the higher quality FRGC data, which has only small pose variations, is 99.9%. Our best system successfully normalises the pose of 3D faces at rates of 99.1% (UoY data) and 99.6% (FRGC data)

    3D Face Recognition

    Get PDF

    Mouth features extraction for emotion classification

    Full text link

    Deep Learning Based Face Image Synthesis

    Get PDF
    Face image synthesis is an important problem in the biometrics and computer vision communities due to its applications in law enforcement and entertainment. In this thesis, we develop novel deep neural network models and associated loss functions for two face image synthesis problems, namely thermal to visible face synthesis and visual attribute to face synthesis. In particular, for thermal to visible face synthesis, we propose a model which makes use of facial attributes to obtain better synthesis. We use attributes extracted from visible images to synthesize attribute-preserved visible images from thermal imagery. A pre-trained attribute predictor network is used to extract attributes from the visible image. Then, a novel multi-scale generator is proposed to synthesize the visible image from the thermal image guided by the extracted attributes. Finally, a pre-trained VGG-Face network is leveraged to extract features from the synthesized image and the input visible image for verification. In addition, we propose another thermal to visible face synthesis method based on selfattention generative adversarial network (SAGAN) which allows efficient attention-guided image synthesis. Rather than focusing only on synthesizing visible faces from thermal faces, we also propose to synthesize thermal faces from visible faces. Our intuition is based on the fact that thermal images also contain some discriminative information about the person for verification. Deep features from a pre-trained Convolutional Neural Network (CNN) are extracted from the original as well as the synthesized images. These features are then fused to generate a template which is then used for cross-modal face verification. Regarding attribute to face image synthesis, we propose the Att2SK2Face model for face image synthesis from visual attributes via sketch. In this approach, we first synthesize facial sketch corresponding to the visual attributes and then generate the face image based on the synthesized sketch. The proposed framework is based on a combination of two different Generative Adversarial Networks (GANs) – (1) a sketch generator network which synthesizes realistic sketch from the input attributes, and (2) a face generator network which synthesizes facial images from the synthesized sketch images with the help of facial attributes. Finally, we propose another synthesis model, called Att2MFace, which can simultaneously synthesize multimodal faces from visual attributes without requiring paired data in different domains for training the network. We introduce a novel generator with multimodal stretch-out modules to simultaneously synthesize multimodal face images. Additionally, multimodal stretch-in modules are introduced in the discriminator which discriminate between real and fake images
    corecore