A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"
Recently, technologies such as face detection, facial landmark localisation
and face recognition and verification have matured enough to provide effective
and efficient solutions for imagery captured under arbitrary conditions
(referred to as "in-the-wild"). This is partially attributed to the fact that
comprehensive "in-the-wild" benchmarks have been developed for face detection,
landmark localisation and recognition/verification. A very important technology
that has not been thoroughly evaluated yet is deformable face tracking
"in-the-wild". Until now, the performance has mainly been assessed
qualitatively, by visually inspecting the output of a deformable face tracker
on short videos. In this paper, we perform the first, to the best of
our knowledge, thorough evaluation of state-of-the-art deformable face tracking
pipelines using the recently introduced 300VW benchmark. We evaluate many
different architectures focusing mainly on the task of on-line deformable face
tracking. In particular, we compare the following general strategies: (a)
generic face detection plus generic facial landmark localisation, (b) generic
model free tracking plus generic facial landmark localisation, as well as (c)
hybrid approaches using state-of-the-art face detection, model free tracking
and facial landmark localisation technologies. Our evaluation reveals future
avenues for further research on the topic. Comment: E. Antonakos and P. Snape
contributed equally and have joint second authorship.
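The three strategies compared in the abstract above can be sketched as follows. This is only an illustrative outline, not the evaluated pipelines: the `detect`, `track` and `localise` callables are hypothetical placeholders for a face detector, a model-free tracker and a landmark localiser, and the re-detection rule in the hybrid loop is an assumption about how such components might be combined.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]   # (x, y, width, height)
Shape = List[Tuple[float, float]]         # landmark coordinates

# Hypothetical component interfaces (assumptions, not taken from the paper).
Detector = Callable[[object], Optional[Box]]       # frame -> box or None
Tracker = Callable[[object, Box], Optional[Box]]   # frame, previous box -> box or None
Landmarker = Callable[[object, Box], Shape]        # frame, box -> landmarks


def track_by_detection(frames: Sequence, detect: Detector,
                       localise: Landmarker) -> List[Optional[Shape]]:
    """Strategy (a): run a generic detector independently on every frame."""
    results = []
    for frame in frames:
        box = detect(frame)
        results.append(localise(frame, box) if box is not None else None)
    return results


def track_model_free(frames: Sequence, first_box: Box, track: Tracker,
                     localise: Landmarker) -> List[Optional[Shape]]:
    """Strategy (b): propagate an initial box with a model-free tracker."""
    results = []
    box: Optional[Box] = first_box
    for frame in frames:
        if box is not None:
            box = track(frame, box)
        results.append(localise(frame, box) if box is not None else None)
    return results


def track_hybrid(frames: Sequence, detect: Detector, track: Tracker,
                 localise: Landmarker) -> List[Optional[Shape]]:
    """Strategy (c): track while possible, fall back to detection on failure."""
    results = []
    box: Optional[Box] = None
    for frame in frames:
        if box is not None:
            box = track(frame, box)
        if box is None:          # no box yet, or the tracker lost the face
            box = detect(frame)
        results.append(localise(frame, box) if box is not None else None)
    return results
```

Each function returns one entry per frame, with `None` marking frames where no face box was available, so the outputs of the three strategies can be compared frame by frame.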
LiveCap: Real-time Human Performance Capture from Monocular Video
We present the first real-time human performance capture approach that
reconstructs dense, space-time coherent deforming geometry of entire humans in
general everyday clothing from just a single RGB video. We propose a novel
two-stage analysis-by-synthesis optimization whose formulation and
implementation are designed for high performance. In the first stage, a skinned
template model is jointly fitted to background subtracted input video, 2D and
3D skeleton joint positions found using a deep neural network, and a set of
sparse facial landmark detections. In the second stage, dense non-rigid 3D
deformations of skin and even loose apparel are captured based on a novel
real-time capable algorithm for non-rigid tracking using dense photometric and
silhouette constraints. Our novel energy formulation leverages automatically
identified material regions on the template to model the differing non-rigid
deformation behavior of skin and apparel. The two resulting non-linear
optimization problems per-frame are solved with specially-tailored
data-parallel Gauss-Newton solvers. In order to achieve real-time performance
of over 25Hz, we design a pipelined parallel architecture using the CPU and two
commodity GPUs. Our method is the first real-time monocular approach for
full-body performance capture. It achieves accuracy comparable to that of
off-line performance capture techniques, while being orders of magnitude
faster.
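The abstract above relies on Gauss-Newton solvers for its non-linear least-squares problems. The sketch below shows the basic Gauss-Newton iteration on a deliberately tiny stand-in problem, estimating a single 2D rotation angle that aligns template points with observed points. It is not the paper's energy, nor its data-parallel GPU implementation; the function names and the toy problem are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def rotate(point: Point, theta: float) -> Point:
    """Rotate a 2D point about the origin by theta radians."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y, s * x + c * y)


def gauss_newton_rotation(template: Sequence[Point], observed: Sequence[Point],
                          theta: float = 0.0, iterations: int = 10) -> float:
    """Minimise sum ||R(theta) p - q||^2 over theta with Gauss-Newton steps."""
    for _ in range(iterations):
        jtj = 0.0  # J^T J, a scalar because there is one unknown
        jtr = 0.0  # J^T r
        for p, q in zip(template, observed):
            rx, ry = rotate(p, theta)
            res = (rx - q[0], ry - q[1])   # residual for this point pair
            jac = (-ry, rx)                # derivative of R(theta) p w.r.t. theta
            jtj += jac[0] * jac[0] + jac[1] * jac[1]
            jtr += jac[0] * res[0] + jac[1] * res[1]
        if jtj == 0.0:                     # degenerate input: all points at the origin
            break
        theta -= jtr / jtj                 # solve (J^T J) delta = -J^T r
    return theta


# Usage: recover a known rotation from noiseless correspondences.
template_points: List[Point] = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5)]
observed_points = [rotate(p, 0.7) for p in template_points]
estimated_theta = gauss_newton_rotation(template_points, observed_points)
```

With more unknowns the scalar division becomes a linear solve against the matrix J^T J, which is the step a real-time system would parallelise.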
Computationally efficient deformable 3D object tracking with a monocular RGB camera
182 p. Monocular RGB cameras are present in a wide range of settings and devices, including embedded environments such as robots, cars and home automation. Most of these environments share a significant presence of human operators with whom the system has to interact. This motivates using the captured monocular images to improve the understanding of the operator and the surrounding scene, enabling more accurate results and applications.
However, monocular images carry no depth information, which is crucial for understanding the 3D scene correctly. Estimating the three-dimensional information of an object from a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and its movements and interactions in the scene must be tracked.
Several methods attempt to solve this task, including modern regression methods based on Deep Neural Networks. However, despite their strong results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is critical for constrained setups such as the embedded or onboard systems found in robotics and automotive applications, among others.
This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices such as smartphones and the onboard systems used in the automotive industry.
The information extracted from such devices gives valuable insight into human behaviour, a crucial element in improving human-machine interaction. We tested the proposed approaches in different challenging application fields, such as onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices.
Fast facial landmark detection and applications: a literature survey
Dense facial landmark detection is one of the key elements of the face processing pipeline. It is used in virtual face reenactment, emotion recognition, driver status tracking, etc. Early approaches were suitable for facial landmark detection in controlled environments only, which is clearly insufficient. Neural networks have shown an astonishing qualitative improvement on the in-the-wild facial landmark detection problem, and are now being studied by many researchers in the field. Numerous bright ideas are proposed, often complementary to each other. However, exploring the whole volume of novel approaches is quite challenging. Therefore, we present this survey, in which we group state-of-the-art algorithms into categories and compare recently introduced in-the-wild datasets (e.g., 300W, AFLW, COFW, WFLW), which contain images with large pose variation and face occlusion taken in unconstrained conditions. In addition to quality, applications require fast inference, preferably on mobile devices. Hence, we include information about algorithm inference speed on both desktop and mobile hardware, which is rarely studied. Importantly, we highlight the problems of the algorithms, their applications and their vulnerabilities, and briefly touch on established methods. We hope that the reader will find many novel ideas and will see how the algorithms are used in applications, which will enable further research.
Facultad de Informática
Edge AI on a Deep-Learning based Real-Time Face Identification and Attributes Recognition System
There is another way of understanding how a customer service office works, and Everis is developing it in its new generation of spaces designed to offer easy and personalized attention to its customers. The technologies implemented in this space to offer a better experience range from voice recognition and facial identification to the detection of hand gestures. The purpose of the project is to incorporate a new computer vision-based system into the Everis customer e-Motion HUB to extend its abilities and improve the user experience. Face recognition systems are nowadays used in a variety of settings, including surveillance systems and human-computer interaction. Different approaches have been used for face recognition throughout the years, but recent research has shown that Deep Learning models based on Convolutional Neural Networks (CNNs) provide better results than other methods. However, these more complex CNN models have several limitations, including the need for extensive training data and, in some cases, high computational requirements. Fields such as robotics and embedded systems that deploy face recognition have significantly less power on board and limited heat dissipation capacity, so it can be difficult to deploy deep learning models on them. To counter these issues, the classical approach in some industries has been to rely on cloud computing or other paid third-party services. Edge computing devices, such as the NVIDIA Jetson Nano proposed in this approach, can bridge this gap by providing certain advantages in many different areas. In this thesis, we explore Edge Artificial Intelligence (Edge AI) capabilities by developing and implementing a real-time face recognition system that also extracts multiple attributes, namely age, gender, emotions, and attention paid.
Additionally, we provide an approach for storing the data in a relational database so that all the gathered information can be further exploited. Although this work has certain areas that can be improved, mainly with regard to its efficiency, it has served as a proof of concept for the ideas behind it. Consequently, research in this direction will surely be continued.
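The relational storage mentioned in the abstract above could look roughly like the sketch below. The abstract does not name a database engine or a schema, so the use of Python's built-in `sqlite3` module, the `observations` table and all column names are assumptions made purely for illustration.

```python
import sqlite3
from typing import Dict

# Hypothetical schema: one row per recognised face observation.
SCHEMA = """
CREATE TABLE IF NOT EXISTS observations (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    seen_at   TEXT NOT NULL,   -- ISO 8601 timestamp
    person_id TEXT,            -- identity label, NULL if unknown
    age       INTEGER,
    gender    TEXT,
    emotion   TEXT,
    attention REAL             -- assumed score in [0, 1]
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_observation(conn: sqlite3.Connection, seen_at: str, person_id: str,
                       age: int, gender: str, emotion: str,
                       attention: float) -> None:
    """Insert one observation; the 'with' block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO observations "
            "(seen_at, person_id, age, gender, emotion, attention) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (seen_at, person_id, age, gender, emotion, attention),
        )


def emotion_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Example of later exploitation: count observations per emotion."""
    rows = conn.execute(
        "SELECT emotion, COUNT(*) FROM observations GROUP BY emotion"
    ).fetchall()
    return dict(rows)
```

Parameterised queries (the `?` placeholders) keep recognised labels from being interpolated into SQL text, and an on-disk path can replace `":memory:"` when the data needs to persist across runs.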