Synthesization and reconstruction of 3D faces by deep neural networks
The past few decades have witnessed substantial progress in 3D facial modelling and reconstruction, as it is of high importance for many computer vision and graphics applications, including Augmented/Virtual Reality (AR/VR), computer games, movie post-production, image/video editing, and medical applications. In the traditional approaches, facial texture and shape are represented as a triangle mesh that can cover identity and expression variation with non-rigid deformation. A dataset of 3D face scans is then densely registered into a common topology in order to construct a linear statistical model. Such models are called 3D Morphable Models (3DMMs) and can be used to synthesize 3D faces or to reconstruct them from a single 2D face image or a few images. The works presented in this thesis focus on modernizing these traditional techniques in light of recent advances in deep learning and the availability of large-scale datasets.
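The linear-model construction described above can be sketched as follows. This is a minimal toy illustration, not the thesis's pipeline: random vectors stand in for densely registered scans, and all dimensions and names are hypothetical.

```python
import numpy as np

# Toy sketch: build a linear statistical model (a "3DMM") from registered
# scans. Each scan is flattened into one row of the data matrix.
rng = np.random.default_rng(0)
n_scans, n_vertices = 100, 500
scans = rng.normal(size=(n_scans, 3 * n_vertices))   # x, y, z per vertex

mean_shape = scans.mean(axis=0)
centered = scans - mean_shape

# PCA via SVD: the principal components span identity/expression variation.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 40                                   # keep the first k modes
components = Vt[:k]                      # (k, 3 * n_vertices)
stddevs = S[:k] / np.sqrt(n_scans - 1)   # per-mode standard deviations

def synthesize(alpha):
    """Generate a new face from k latent coefficients alpha."""
    return mean_shape + (alpha * stddevs) @ components

face = synthesize(rng.normal(size=k))
print(face.shape)  # (1500,)
```

Because only the first k modes are kept, anything outside their span is discarded, which is exactly why such models lose high-frequency texture detail.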
Since the introduction of 3DMMs over two decades ago, there has been a great deal of progress, and they are still considered one of the best methodologies for modelling 3D faces. Nevertheless, several aspects of them still need to be brought into the "deep era". Firstly, conventional 3DMMs are built by linear statistical approaches such as Principal Component Analysis (PCA), which by its nature omits high-frequency information. While this does not greatly affect shape, which is often smooth in the original data, texture models are heavily afflicted, losing high-frequency details and photorealism. Secondly, the existing 3DMM fitting approaches rely on very primitive (e.g. RGB values, sparse landmarks) or hand-crafted (e.g. HOG, SIFT) features as supervision, which are either sensitive to "in-the-wild" conditions (e.g. lighting, pose, occlusion) or fail to preserve identity/expression resemblance with the target image. Finally, the shape, texture, and expression modalities are modelled separately, ignoring the correlation among them and placing a fundamental limit on the synthesis of semantically meaningful 3D faces. Moreover, photorealistic 3D face synthesis has not been studied thoroughly in the literature.
This thesis addresses the above-mentioned issues by harnessing the power of deep neural networks and generative adversarial networks, as explained below:
Because of their linear texture models, many state-of-the-art methods are still not capable of reconstructing facial textures with high-frequency details. We therefore take a radically different approach and build a high-quality, detail-preserving texture model with Generative Adversarial Networks (GANs). That is, we use GANs to train a very powerful generator of facial texture in UV space, and then show that this generator network can be employed as a statistical texture prior in 3DMM fitting. The resulting texture reconstructions are plausible and photorealistic, as GANs are faithful to the real-data distribution in both the low- and high-frequency domains.
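The fitting-by-inversion idea, i.e. searching the generator's latent space for the code that best reproduces a target texture, can be illustrated with a toy sketch. A frozen linear-plus-tanh map stands in here for the trained GAN generator; all sizes and names are hypothetical.

```python
import numpy as np

# Toy sketch of using a frozen generator as a texture prior: recover the
# latent code of a target "texture" by gradient descent on the latent.
rng = np.random.default_rng(1)
latent_dim, tex_dim = 16, 256
W = rng.normal(scale=0.1, size=(tex_dim, latent_dim))
b = rng.normal(scale=0.1, size=tex_dim)

def generate(z):
    return np.tanh(W @ z + b)            # stand-in for the GAN generator

target = generate(rng.normal(size=latent_dim))   # texture to reconstruct

z = np.zeros(latent_dim)                 # start from the mean latent code
lr = 0.05
for _ in range(500):
    g = generate(z)
    grad_u = 2.0 * (g - target) * (1.0 - g**2)   # chain rule through tanh
    z -= lr * (W.T @ grad_u)             # gradient step on the latent code

print(np.mean((generate(z) - target) ** 2))  # near zero: latent recovered
```

Because every output of `generate` lies on the generator's learned manifold, the reconstruction stays plausible no matter where the optimization lands, which is the sense in which the generator acts as a prior.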
We then revisit, from a new perspective, the conventional 3DMM fitting approaches that use non-linear optimization to find the latent parameters that best reconstruct the test image. We propose to optimize the parameters with the supervision of pretrained deep identity features through our end-to-end differentiable framework. To be robust to initialization and to expedite the fitting process, we also propose a novel self-supervised regression-based approach. We demonstrate excellent 3D face reconstructions that are photorealistic and identity-preserving, and achieve, to the best of our knowledge, the first facial texture reconstruction with high-frequency details.
To extend the non-linear texture model to photorealistic 3D face synthesis, we present a methodology that generates high-quality texture, shape, and normals jointly. To do so, we propose a novel GAN that can generate data in different modalities while exploiting their correlations. Furthermore, we demonstrate how generation can be conditioned on expression to create faces with various facial expressions. Additionally, we study another approach to photorealistic face synthesis by 3D guidance: generating 3D faces with a linear 3DMM and then translating their 2D renderings into the photorealistic face domain with an image-to-image translation network. Both works demonstrate excellent photorealistic face synthesis and show that the generated faces improve face recognition benchmarks when used as synthetic training data.
Finally, we study expression reconstruction for personalized 3D face models, improving the generalization and robustness of expression encoding. First, we propose a 3D augmentation approach on 2D head-mounted camera images to increase robustness to perspective changes. Second, we propose to train a generic expression encoder network by increasing the number of identities with a novel multi-id personalized model training architecture in a self-supervised manner. Both approaches show promising results in qualitative and quantitative experiments.
Machine learning techniques in pain recognition.
No abstract available. The original print copy of this thesis may be available here: http://wizard.unbc.ca/record=b131711
Proceedings. 22. Workshop Computational Intelligence, Dortmund, 6. - 7. Dezember 2012
These proceedings contain the contributions to the 22nd Workshop "Computational Intelligence" of Technical Committee 5.14 of the VDI/VDE Society for Measurement and Automation Technology (GMA), held in Dortmund on 6-7 December 2012.
The focal points are methods, applications, and tools for
- fuzzy systems,
- artificial neural networks,
- evolutionary algorithms, and
- data mining methods,
as well as the comparison of methods on industrial applications and benchmark problems.
The Future of Humanoid Robots
This book provides state-of-the-art scientific and engineering research findings and developments in the field of humanoid robotics and its applications. Humanoids are expected to change the way we interact with machines and to blend seamlessly into an environment already designed for humans. The book contains chapters that explore the future abilities of humanoid robots by presenting integrated research from various scientific and engineering fields, such as locomotion, perception, adaptive behavior, human-robot interaction, neuroscience, and machine learning. The book is designed to be accessible and practical, with an emphasis on information useful to those working in robotics, cognitive science, artificial intelligence, computational methods, and other fields directly or indirectly related to the development and use of future humanoid robots. The editor has extensive R&D experience, patents, and publications in the area of humanoid robotics, and this experience is reflected in the content of the book.
Visual Cortex
The neurosciences have experienced tremendous and wonderful progress in many areas, and the spectrum encompassing the neurosciences is expansive. Suffice it to mention a few classical fields: electrophysiology, genetics, physics, computer science, and more recently, social and marketing neuroscience. Of course, this large growth has resulted in the production of many books. Perhaps the visual system and the visual cortex were in the vanguard because most animals do not produce their own light and thus offer the invaluable advantage of allowing investigators to conduct experiments in full control of the stimulus. In addition, the fascinating evolution of scientific techniques, the immense productivity of recent research, and the ensuing literature make it virtually impossible to publish in a single volume all the worthwhile work accomplished throughout the scientific world. The days when a single individual, like Diderot, could undertake the production of an encyclopedia are gone forever. Indeed, most approaches to studying the nervous system are valid, and neuroscientists produce an almost astronomical amount of interesting data accompanied by extremely worthy hypotheses, which in turn generate new ventures in the search for brain functions. Yet it is fully justified to make an encore and publish a book dedicated to the visual cortex and beyond. Many reasons validate a book assembling chapters written by active researchers. Each has the opportunity to bind together data and explore original ideas whose fate will not fall into the hands of uncompromising reviewers of traditional journals. This book focuses on the cerebral cortex with a large emphasis on vision. Yet it offers the reader diverse approaches employed to investigate the brain, for instance, computer simulation, cellular responses, or rivalry between various targets and goal-directed actions.
This volume thus covers a large spectrum of research, even though it is impossible to include all topics in the extremely diverse field of the neurosciences.
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010
Face Detection Using Single Cascade of Customized Features Discriminators
Face detection has become an important and helpful tool for camera and video processing. Useful human-computer interaction (HCI) applications, such as a driver-assistance system that prevents accidents and saves pedestrian lives when the driver's attention lapses, need a head pose estimator, and a head pose estimator cannot function without a face detector.
There is a considerable amount of literature addressing the problem. The most significant results have been obtained on upright frontal face detection, which is a sub-problem of the larger face detection problem. Other sub-problems have been studied with far less significant advancement than upright frontal face detection has achieved. The problem of multi-pose detection is still under study, and it remains hard.
A solution to this larger-scale problem (multi-pose face detection) is critical for head pose estimation accuracy. This thesis proposes a multi-pose face detection algorithm for uncontrolled environments. The detector is designed to be used in building a head pose estimator for a human-computer interaction application. The detector is structured as a cascade of classifiers, each addressing at least one specific part of the problem, while maintaining speed and an acceptable detection rate.
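The cascade idea can be sketched minimally as follows: cheap tests run first and reject most non-face windows early, so expensive stages only see the survivors. The stage functions below are hypothetical stand-ins for trained classifiers, not the thesis's actual stages.

```python
# Minimal cascade sketch: a window is accepted only if every stage
# accepts it, and rejection at any stage ends the evaluation early.
def cheap_filter(window):
    return sum(window) > 10          # stand-in: enough overall energy

def mid_stage(window):
    return max(window) - min(window) > 3   # stand-in: enough contrast

def expensive_stage(window):
    return window == sorted(window)  # stand-in for a trained model

CASCADE = [cheap_filter, mid_stage, expensive_stage]

def detect(window):
    """Short-circuits: later (costlier) stages run only on survivors."""
    return all(stage(window) for stage in CASCADE)

windows = [[0, 1, 2, 3], [5, 1, 9, 2], [2, 4, 6, 8]]
print([detect(w) for w in windows])  # [False, False, True]
```

The speed of a cascade comes from `all()` short-circuiting: the vast majority of windows in an image are non-faces and are discarded by the first, cheapest stage.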
These requirements are satisfied by placing fast and simple classifiers at the first stages of the cascade. A novel use of the integral image as a fast filter is introduced at the start of the detection process. The cascade also includes classifiers trained on specially designed features aimed at solving parts of the problem. One unique classifier is a data-mining-based classifier that uses a modified version of the Maximal Frequent Itemset Algorithm (MAFIA) [2] for feature extraction.
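The integral image that makes such a fast filter possible works like this: one pass builds a summed-area table, after which the sum over any rectangle costs four lookups, so a per-window brightness or contrast test becomes O(1). A minimal sketch:

```python
# Integral image (summed-area table): ii[y][x] holds the sum of all
# pixels above and to the left of (x, y), with a zero top row/column.
def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0
        for x in range(w):
            row += img[y][x]                 # running row sum
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0:y1][x0:x1] in four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

This is the same structure that makes Viola-Jones-style rectangular features cheap: once the table is built, the cost of a box sum is independent of the box size.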
Special-feature classifiers use facial feature information extracted by a new knowledge-based classifier/filter that locates the eyes, mouth, and nose to acceptable accuracy using a suite of approaches, including discrete local minima and geometric measures. The extracted facial features are used to estimate head pose and to extract classifier features accordingly, enhancing detection rates.
A cascade of classifiers based on fast and simple contrast features is used to refine and speed up the detection process. To further improve speed, some components were parallelized. To overcome some of the fundamental challenges of face detection, lighting correction and noise reduction were implemented based on information extracted from the images.
Results reported on the FDDB [12] benchmark showed a 5.22% detection rate with 2000 false positives, while the OpenCV implementation of the Viola-Jones [19] face detector showed a 65.92% detection rate with 2010 false positives. This comparison is flawed, because Viola-Jones is an upright face detector, and even though FDDB [12] includes a number of non-frontal faces and profiles, the majority of the faces are frontal. The two solutions address two different problems of very different difficulty.
At the time of writing, a standard benchmark test set and evaluation system such as the FDDB [12] benchmark, with comparable results for the same class of problem, was not available. The key points in building a good face detector in general are: (1) resolving speed issues through fast techniques (e.g. the integral image) at the start of the cascade and a powerful design; (2) using a huge number of different strong and weak features; and (3) eliminating variations (i.e. pose, noise, and lighting variations). The algorithm was also tested on the MIT+CMU upright faces test set and achieved a 43.56% detection rate with 504 false positives.