
    A Comprehensive Performance Evaluation of Deformable Face Tracking "In-the-Wild"

    Recently, technologies such as face detection, facial landmark localisation and face recognition and verification have matured enough to provide effective and efficient solutions for imagery captured under arbitrary conditions (referred to as "in-the-wild"). This is partially attributed to the fact that comprehensive "in-the-wild" benchmarks have been developed for face detection, landmark localisation and recognition/verification. A very important technology that has not yet been thoroughly evaluated is deformable face tracking "in-the-wild". Until now, performance has mainly been assessed qualitatively, by visually inspecting the result of a deformable face tracking technology on short videos. In this paper, we perform the first, to the best of our knowledge, thorough evaluation of state-of-the-art deformable face tracking pipelines using the recently introduced 300VW benchmark. We evaluate many different architectures, focusing mainly on the task of on-line deformable face tracking. In particular, we compare the following general strategies: (a) generic face detection plus generic facial landmark localisation, (b) generic model-free tracking plus generic facial landmark localisation, as well as (c) hybrid approaches using state-of-the-art face detection, model-free tracking and facial landmark localisation technologies. Our evaluation reveals future avenues for further research on the topic. Comment: E. Antonakos and P. Snape contributed equally and have joint second authorship.
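    Strategy (a) above, generic face detection followed by generic facial landmark localisation on every frame, can be illustrated with a minimal sketch. The snippet below is not part of the paper's evaluation code: it assumes OpenCV, dlib and a local copy of dlib's pretrained 68-point shape predictor (shape_predictor_68_face_landmarks.dat), and the video path is a placeholder.

        # Minimal sketch of strategy (a): generic face detection + generic landmark
        # localisation applied independently to every video frame.
        # Assumes: opencv-python, dlib, and a local copy of the pretrained
        # 68-point model "shape_predictor_68_face_landmarks.dat".
        import cv2
        import dlib

        detector = dlib.get_frontal_face_detector()
        predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

        cap = cv2.VideoCapture("input_video.mp4")  # placeholder input clip
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector(gray, 1)              # re-detect on every frame
            for face in faces:
                shape = predictor(gray, face)      # fit 68 landmarks inside the box
                for i in range(shape.num_parts):
                    p = shape.part(i)
                    cv2.circle(frame, (p.x, p.y), 1, (0, 255, 0), -1)
            cv2.imshow("tracking", frame)
            if cv2.waitKey(1) == 27:               # Esc to quit
                break
        cap.release()
        cv2.destroyAllWindows()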

    LiveCap: Real-time Human Performance Capture from Monocular Video

    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to the background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting per-frame non-linear optimization problems are solved with specially tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. It yields accuracy comparable to off-line performance capture techniques, while being orders of magnitude faster.
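    The per-frame non-linear optimization solved with Gauss-Newton that the abstract refers to can be illustrated on a toy problem. The sketch below is a minimal damped Gauss-Newton loop in NumPy with a finite-difference Jacobian; it is not LiveCap's data-parallel GPU solver, and the residual function and parameter names are purely illustrative.

        # Minimal damped Gauss-Newton sketch (illustrative; not the LiveCap solver).
        import numpy as np

        def residuals(theta):
            # Toy stand-in for a per-frame fitting energy: residuals between a
            # parametric 2D point set and fixed "detections" (made-up data).
            targets = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 3.0]])
            pred = np.outer(np.arange(1, 4), theta[:2]) + theta[2:]
            return (pred - targets).ravel()

        def numeric_jacobian(f, theta, eps=1e-6):
            # Finite-difference Jacobian of the residual vector w.r.t. theta.
            r0 = f(theta)
            J = np.zeros((r0.size, theta.size))
            for j in range(theta.size):
                d = np.zeros_like(theta)
                d[j] = eps
                J[:, j] = (f(theta + d) - r0) / eps
            return J

        theta = np.zeros(4)                      # initial parameters
        for it in range(20):
            r = residuals(theta)
            J = numeric_jacobian(residuals, theta)
            # Damped normal equations: (J^T J + lambda I) delta = -J^T r
            delta = np.linalg.solve(J.T @ J + 1e-6 * np.eye(theta.size), -J.T @ r)
            theta += delta
            if np.linalg.norm(delta) < 1e-9:
                break
        print("fitted parameters:", theta)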

    Computationally efficient deformable 3D object tracking with a monocular RGB camera

    Monocular RGB cameras are present in most environments and devices, including embedded settings such as robots, cars and home automation. Most of these environments have in common a significant presence of human operators with whom the system has to interact. This context provides the motivation to use the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications. However, monocular images do not carry depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene from a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene. Several methods attempt to solve this task, including modern regression methods based on Deep Neural Networks. However, despite their great results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like embedded or onboard systems present in robotics and automotive applications, among others. This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and onboard systems used in the automotive industry. The information extracted from such devices gives valuable insight into human behaviour, a crucial element in improving human-machine interaction. We tested the proposed approaches in different challenging application fields like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices.
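    The "rigid multi-body deformations" used above for body pose tracking treat the body as a kinematic chain of rigid segments whose pose is given by joint angles. The sketch below is a hypothetical planar two-segment chain in NumPy that only illustrates composing rigid transforms along such a chain; it is not the tracking method proposed in the thesis.

        # Illustrative forward kinematics for a rigid multi-body (kinematic chain)
        # model: each body segment is rigid, and the pose is given by joint angles.
        import numpy as np

        def rot2d(angle):
            c, s = np.cos(angle), np.sin(angle)
            return np.array([[c, -s], [s, c]])

        def forward_kinematics(joint_angles, segment_lengths):
            """Return the 2D positions of each joint of a planar chain."""
            position = np.zeros(2)
            orientation = 0.0
            joints = [position.copy()]
            for angle, length in zip(joint_angles, segment_lengths):
                orientation += angle                  # accumulate rigid rotation
                position = position + rot2d(orientation) @ np.array([length, 0.0])
                joints.append(position.copy())
            return np.array(joints)

        # Example: a two-segment "arm" (made-up lengths, angles in radians).
        print(forward_kinematics([np.pi / 4, -np.pi / 6], [0.3, 0.25]))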


    Fast Facial Landmark Detection and Applications: A Survey of the Literature

    Dense facial landmark detection is one of the key elements of the face processing pipeline. It is used in virtual face reenactment, emotion recognition, driver status tracking, etc. Early approaches were suitable for facial landmark detection in controlled environments only, which is clearly insufficient. Neural networks have shown an astonishing qualitative improvement on the in-the-wild facial landmark detection problem, and are now being studied by many researchers in the field. Numerous bright ideas have been proposed, often complementary to each other. However, exploring the whole volume of novel approaches is quite challenging. Therefore, we present this survey, in which we summarize state-of-the-art algorithms into categories and provide a comparison of recently introduced in-the-wild datasets (e.g., 300W, AFLW, COFW, WFLW) that contain images with large pose variation and face occlusion, taken in unconstrained conditions. In addition to quality, applications require fast inference, preferably on mobile devices. Hence, we include information about algorithm inference speed on both desktop and mobile hardware, which is rarely studied. Importantly, we highlight the problems of the algorithms, their applications and vulnerabilities, and briefly touch on established methods. We hope that the reader will find many novel ideas, and will see how the algorithms are used in applications, which will enable further research.
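    Inference-speed figures of the kind this survey reports are typically obtained with a simple wall-clock benchmark around the detector. The sketch below shows one common measurement procedure; detect_landmarks is a placeholder for whichever algorithm is under test, so the script runs stand-alone.

        # Generic latency benchmark sketch for a landmark detector (placeholder model).
        import time
        import numpy as np

        def detect_landmarks(image):
            # Placeholder for any landmark algorithm under test; it only simulates
            # an output so the script is self-contained.
            return np.zeros((68, 2))

        image = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

        # Warm-up runs exclude one-off initialisation costs from the measurement.
        for _ in range(10):
            detect_landmarks(image)

        timings = []
        for _ in range(200):
            start = time.perf_counter()
            detect_landmarks(image)
            timings.append((time.perf_counter() - start) * 1000.0)  # milliseconds

        timings = np.array(timings)
        print(f"mean {timings.mean():.3f} ms, median {np.median(timings):.3f} ms, "
              f"p95 {np.percentile(timings, 95):.3f} ms")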

    Edge AI on a Deep-Learning based Real-Time Face Identification and Attributes Recognition System

    There is another way of understanding how a customer service office works, and Everis is developing it in its new generation of spaces designed to offer easy and personalized attention to its customers. Some of the technologies implemented in this space to offer a better experience range from voice recognition and facial identification to the detection of hand gestures. The purpose of the project is to incorporate into the Everis customer e-Motion HUB a new computer vision-based system to extend its abilities and to improve the user experience. Face recognition systems are nowadays used in a variety of settings, including surveillance systems and human-computer interaction. Different approaches have been used for face recognition throughout the years, but recent research has shown that Deep Learning models based on Convolutional Neural Networks (CNNs) provide better results than any other method. However, these more complex CNN models have several limitations, including the need for extensive training data and, in some cases, high computational requirements. Fields such as robotics and embedded systems that deploy face recognition have significantly less power on board and limited heat dissipation capacity, so it can be difficult to deploy deep learning models on them. To counter these issues, the classical approach in some industries has been to rely on cloud computing or other paid third-party services. Edge computing devices, such as the NVIDIA Jetson Nano proposed in this approach, can bridge this gap by providing certain advantages in many different areas. In this thesis, we explore Edge Artificial Intelligence (Edge AI) capabilities by developing and implementing a real-time face recognition system together with the extraction of multiple attributes, namely age, gender, emotion, and attention paid. Additionally, we store the gathered information in a relational database so that it can be further exploited. Although this work has certain areas that can be improved, mainly with regard to its efficiency, it has served as a proof of concept for the ideas behind it. Consequently, research in this direction will surely be continued.
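    The per-frame loop described above (detect a face, match it against enrolled identities, estimate attributes, and log the result to a relational database) could look roughly like the sketch below. The face detector uses OpenCV's bundled Haar cascade; the embedding and attribute models are placeholders, and the SQLite schema is hypothetical rather than the system's actual implementation.

        # Sketch of a detect -> identify -> attributes -> store loop (illustrative).
        import sqlite3
        import time
        import cv2
        import numpy as np

        conn = sqlite3.connect("observations.db")            # hypothetical schema
        conn.execute("""CREATE TABLE IF NOT EXISTS observations (
                            ts REAL, identity TEXT, age INTEGER,
                            gender TEXT, emotion TEXT, attentive INTEGER)""")

        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        known = {"alice": np.random.rand(128)}               # placeholder enrolled embeddings

        def embed(face_img):                                 # placeholder embedding model
            return np.random.rand(128)

        def attributes(face_img):                            # placeholder attribute models
            return {"age": 30, "gender": "unknown", "emotion": "neutral", "attentive": 1}

        cap = cv2.VideoCapture(0)                            # on-device camera, index 0
        for _ in range(300):                                 # bounded loop for the sketch
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, 1.1, 5):
                face = frame[y:y + h, x:x + w]
                emb = embed(face)
                # Nearest enrolled identity by Euclidean distance (simplistic matching).
                name = min(known, key=lambda k: np.linalg.norm(known[k] - emb))
                att = attributes(face)
                conn.execute("INSERT INTO observations VALUES (?, ?, ?, ?, ?, ?)",
                             (time.time(), name, att["age"], att["gender"],
                              att["emotion"], att["attentive"]))
            conn.commit()
        cap.release()
        conn.close()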