1,605 research outputs found

    Computationally efficient deformable 3D object tracking with a monocular RGB camera

    Get PDF
    182 p.Monocular RGB cameras are present in most scopes and devices, including embedded environments like robots, cars and home automation. Most of these environments have in common a significant presence of human operators with whom the system has to interact. This context provides the motivation to use the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications.However, monocular images do not have depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene using a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene.Several methods attempt to solve this task, including modern regression methods based on Deep NeuralNetworks. However, despite the great results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like embedded or onboard systems present in robotics and automotive applications, among others.This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and onboard systems used in the automotive industry. The information extracted from such devices gives valuable insight into human behaviour a crucial element in improving human-machine interaction.We tested the proposed approaches in different challenging application fields like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices

    Computationally efficient deformable 3D object tracking with a monocular RGB camera

    Get PDF
    182 p.Monocular RGB cameras are present in most scopes and devices, including embedded environments like robots, cars and home automation. Most of these environments have in common a significant presence of human operators with whom the system has to interact. This context provides the motivation to use the captured monocular images to improve the understanding of the operator and the surrounding scene for more accurate results and applications.However, monocular images do not have depth information, which is a crucial element in understanding the 3D scene correctly. Estimating the three-dimensional information of an object in the scene using a single two-dimensional image is already a challenge. The challenge grows if the object is deformable (e.g., a human body or a human face) and there is a need to track its movements and interactions in the scene.Several methods attempt to solve this task, including modern regression methods based on Deep NeuralNetworks. However, despite the great results, most are computationally demanding and therefore unsuitable for several environments. Computational efficiency is a critical feature for computationally constrained setups like embedded or onboard systems present in robotics and automotive applications, among others.This study proposes computationally efficient methodologies to reconstruct and track three-dimensional deformable objects, such as human faces and human bodies, using a single monocular RGB camera. To model the deformability of faces and bodies, it considers two types of deformations: non-rigid deformations for face tracking, and rigid multi-body deformations for body pose tracking. Furthermore, it studies their performance on computationally restricted devices like smartphones and onboard systems used in the automotive industry. The information extracted from such devices gives valuable insight into human behaviour a crucial element in improving human-machine interaction.We tested the proposed approaches in different challenging application fields like onboard driver monitoring systems, human behaviour analysis from monocular videos, and human face tracking on embedded devices

    HUMAN FACE RECOGNITION BASED ON FRACTAL IMAGE CODING

    Get PDF
    Human face recognition is an important area in the field of biometrics. It has been an active area of research for several decades, but still remains a challenging problem because of the complexity of the human face. In this thesis we describe fully automatic solutions that can locate faces and then perform identification and verification. We present a solution for face localisation using eye locations. We derive an efficient representation for the decision hyperplane of linear and nonlinear Support Vector Machines (SVMs). For this we introduce the novel concept of ρ\rho and η\eta prototypes. The standard formulation for the decision hyperplane is reformulated and expressed in terms of the two prototypes. Different kernels are treated separately to achieve further classification efficiency and to facilitate its adaptation to operate with the fast Fourier transform to achieve fast eye detection. Using the eye locations, we extract and normalise the face for size and in-plane rotations. Our method produces a more efficient representation of the SVM decision hyperplane than the well-known reduced set methods. As a result, our eye detection subsystem is faster and more accurate. The use of fractals and fractal image coding for object recognition has been proposed and used by others. Fractal codes have been used as features for recognition, but we need to take into account the distance between codes, and to ensure the continuity of the parameters of the code. We use a method based on fractal image coding for recognition, which we call the Fractal Neighbour Distance (FND). The FND relies on the Euclidean metric and the uniqueness of the attractor of a fractal code. An advantage of using the FND over fractal codes as features is that we do not have to worry about the uniqueness of, and distance between, codes. We only require the uniqueness of the attractor, which is already an implied property of a properly generated fractal code. Similar methods to the FND have been proposed by others, but what distinguishes our work from the rest is that we investigate the FND in greater detail and use our findings to improve the recognition rate. Our investigations reveal that the FND has some inherent invariance to translation, scale, rotation and changes to illumination. These invariances are image dependent and are affected by fractal encoding parameters. The parameters that have the greatest effect on recognition accuracy are the contrast scaling factor, luminance shift factor and the type of range block partitioning. The contrast scaling factor affect the convergence and eventual convergence rate of a fractal decoding process. We propose a novel method of controlling the convergence rate by altering the contrast scaling factor in a controlled manner, which has not been possible before. This helped us improve the recognition rate because under certain conditions better results are achievable from using a slower rate of convergence. We also investigate the effects of varying the luminance shift factor, and examine three different types of range block partitioning schemes. They are Quad-tree, HV and uniform partitioning. We performed experiments using various face datasets, and the results show that our method indeed performs better than many accepted methods such as eigenfaces. The experiments also show that the FND based classifier increases the separation between classes. The standard FND is further improved by incorporating the use of localised weights. A local search algorithm is introduced to find a best matching local feature using this locally weighted FND. The scores from a set of these locally weighted FND operations are then combined to obtain a global score, which is used as a measure of the similarity between two face images. Each local FND operation possesses the distortion invariant properties described above. Combined with the search procedure, the method has the potential to be invariant to a larger class of non-linear distortions. We also present a set of locally weighted FNDs that concentrate around the upper part of the face encompassing the eyes and nose. This design was motivated by the fact that the region around the eyes has more information for discrimination. Better performance is achieved by using different sets of weights for identification and verification. For facial verification, performance is further improved by using normalised scores and client specific thresholding. In this case, our results are competitive with current state-of-the-art methods, and in some cases outperform all those to which they were compared. For facial identification, under some conditions the weighted FND performs better than the standard FND. However, the weighted FND still has its short comings when some datasets are used, where its performance is not much better than the standard FND. To alleviate this problem we introduce a voting scheme that operates with normalised versions of the weighted FND. Although there are no improvements at lower matching ranks using this method, there are significant improvements for larger matching ranks. Our methods offer advantages over some well-accepted approaches such as eigenfaces, neural networks and those that use statistical learning theory. Some of the advantages are: new faces can be enrolled without re-training involving the whole database; faces can be removed from the database without the need for re-training; there are inherent invariances to face distortions; it is relatively simple to implement; and it is not model-based so there are no model parameters that need to be tweaked
    corecore