6 research outputs found

    Image based human body rendering via regression & MRF energy minimization

    A machine learning method for synthesising human images is explored to create new images without relying on 3D modelling. Machine learning allows the creation of new images through prediction from existing data, based on a set of training images. In the present study, image synthesis is performed at two levels: contour and pixel. A class of learning-based methods is formulated to create object contours from the training images for the synthetic image, which in turn allow pixel synthesis within the contours at the second level. The methods rely on robust object descriptions, dynamic learning models after appropriate motion segmentation, and machine-learning-based frameworks. Image-based human image synthesis using machine learning is a research focus that has recently gained considerable attention in computer graphics, and it draws on techniques from image/motion analysis in computer vision. The problem lies in estimating image-based object configuration (i.e. segmentation, contour outline). Using the results of these analysis methods as a basis, the research adopts a machine learning approach in which human images are synthesised by performing contour and pixel synthesis learned from the training images.

    Firstly, the thesis shows how an accurate silhouette is distilled using a background subtraction method developed for accuracy and efficiency. The traditional support vector machine approach is used to avoid ambiguities within the regression process; images can be represented as a class of accurate and efficient vectors, both for single images and for sequences. Secondly, the framework is explored using a particular machine learning method, support vector regression (SVR), to obtain the convergence result of vectors for contour allocation. The changing relationship between the synthetic image and the training images is expressed as a vector and represented as functions. Finally, pixel synthesis is performed based on belief propagation.

    This thesis proposes a novel image-based rendering method for colour image synthesis, using SVR and belief propagation for generalisation, to enable the prediction of contour and colour information from input colour images. The methods rely on appropriately defined and robust input colour images, optimising the input contour images within a sparse SVR framework. Firstly, the thesis shows how a contour can be predicted effectively and efficiently from a small number of input contour images. In addition, the thesis exploits the sparseness of SVR for efficiency and uses SVR to estimate the regression function. The image-based rendering method employed in this study enables contour synthesis from a small number of input source images, avoiding the use of complex models and geometry information. Secondly, the method used for colouring the human body contour is extended to define eight-connected pixels and to construct a link distance field via belief propagation. The link distance, which acts as the message in propagation, is transformed by improving the lower-envelope method of the fast distance transform. Finally, the methodology is tested on human facial and human body clothing information. The accuracy of the test results for the human body model confirms the efficiency of the proposed method.
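    As a rough illustration of the contour-regression stage, the sketch below fits one epsilon-SVR per contour coordinate, mapping an image descriptor to silhouette points. All data here are synthetic placeholders; the descriptor size, number of contour samples, and kernel settings are assumptions for illustration, not the thesis's actual configuration.

```python
# Minimal sketch: support vector regression (SVR) for contour prediction.
# Hypothetical data: each training image is summarised by a feature vector,
# and its silhouette contour is a fixed-length list of 2D points. One SVR
# is fitted per output coordinate, since scikit-learn's SVR is single-output.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_train, n_points = 50, 32                      # training images, contour samples
X = rng.uniform(0, 1, (n_train, 3))             # stand-in image descriptors
Y = rng.uniform(0, 1, (n_train, n_points * 2))  # flattened (x, y) contours

# Fit one epsilon-SVR per contour coordinate; the epsilon-insensitive
# loss keeps the solution sparse (few support vectors).
models = [SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X, Y[:, j])
          for j in range(Y.shape[1])]

def predict_contour(x):
    """Predict a contour (n_points x 2) for a new descriptor x."""
    flat = np.array([m.predict(x[None, :])[0] for m in models])
    return flat.reshape(n_points, 2)

contour = predict_contour(rng.uniform(0, 1, 3))
print(contour.shape)  # (32, 2)
```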

    Contributions to Robust Multi-view 3D Action Recognition

    This thesis focuses on human action recognition using volumetric reconstructions obtained from multiple monocular cameras. The problem of action recognition has been addressed using different approaches, both in the 2D and 3D domains, and using one or multiple views. However, the development of robust recognition methods, independent of the view employed, remains an open problem. Multi-view approaches allow 3D information to be exploited to improve recognition performance. Nevertheless, manipulating the large amount of information in 3D representations poses a major problem. As a consequence, standard dimensionality reduction techniques must be applied prior to the use of machine learning approaches.

    The first contribution of this work is a new descriptor of volumetric information that can be further reduced using standard dimensionality reduction techniques in both holistic and sequential recognition approaches. The descriptor itself reduces the amount of data by up to an order of magnitude (compared to previous descriptors) without affecting classification performance. The descriptor represents the volumetric information obtained by Shape-from-Silhouette (SfS) techniques. However, this family of techniques is highly influenced by errors in the segmentation process (e.g., undersegmentation causes false negatives in the reconstructed volumes), so the recognition performance is strongly affected by this first step.

    The second contribution of this work is a new SfS technique (named SfSDS) that employs Dempster-Shafer theory to fuse evidence provided by multiple cameras. The central idea is to consider the relative position between cameras so as to deal with inconsistent silhouettes and obtain robust volumetric reconstructions.

    The basic SfS technique still has a main drawback: it requires the whole volume to be analysed in order to obtain the reconstruction. Octree-based representations, on the other hand, save memory and time by employing a dynamic tree structure in which only occupied nodes are stored. Nevertheless, applying the SfS method to octree-based representations is not straightforward. The final contribution of this work is a method for generating octrees using the proposed SfSDS technique, so as to obtain robust and compact volumetric representations.
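    To make the evidence-fusion idea concrete, here is a minimal sketch of Dempster's rule of combination for a single voxel, with a frame of discernment {occupied, empty}. The mass assignments and the handling of unreliable cameras below are illustrative assumptions; the exact formulation used by SfSDS may differ.

```python
# Minimal sketch: Dempster-Shafer fusion of per-camera evidence for one voxel.
# Masses are assigned to "O" (occupied), "E" (empty), and the ignorance
# set "OE" = {O, E}.

def combine(m1, m2):
    """Dempster's rule for masses over {O, E, OE}. Assumes conflict k < 1."""
    # Conflict: one source says occupied, the other empty.
    k = m1["O"] * m2["E"] + m1["E"] * m2["O"]
    norm = 1.0 - k
    m = {
        "O": (m1["O"] * m2["O"] + m1["O"] * m2["OE"] + m1["OE"] * m2["O"]) / norm,
        "E": (m1["E"] * m2["E"] + m1["E"] * m2["OE"] + m1["OE"] * m2["E"]) / norm,
    }
    m["OE"] = 1.0 - m["O"] - m["E"]  # remaining mass stays on ignorance
    return m

# Camera 1 sees the voxel inside its silhouette (strong evidence for O);
# camera 2's silhouette is unreliable here, so most mass goes to ignorance.
cam1 = {"O": 0.8, "E": 0.1, "OE": 0.1}
cam2 = {"O": 0.2, "E": 0.3, "OE": 0.5}
print(combine(cam1, cam2))  # fused belief; keep the voxel if m["O"] dominates
```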

    Multi-signal gesture recognition using body and hand poses

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from PDF version of thesis. Includes bibliographical references (p. 147-154).

    We present a vision-based multi-signal gesture recognition system that integrates information from body and hand poses. Unlike previous approaches to gesture recognition, which concentrated mainly on a single signal, our system allows a richer gesture vocabulary and more natural human-computer interaction. The system consists of three parts: 3D body pose estimation, hand pose classification, and gesture recognition. 3D body pose estimation is performed following a generative model-based approach, using a particle filtering estimation framework. Hand pose classification is performed by extracting Histogram of Oriented Gradients (HOG) features and using a multi-class Support Vector Machine classifier. Finally, gesture recognition is performed using a novel statistical inference framework that we developed for multi-signal pattern recognition, extending previous work on a discriminative hidden-state graphical model (HCRF) to consider multi-signal input data, which we refer to as Multi Information-Channel Hidden Conditional Random Fields (MIC-HCRFs). One advantage of MIC-HCRFs is that they allow us to capture complex dependencies among multiple information channels more precisely than conventional approaches. Our system was evaluated in the scenario of an aircraft carrier flight deck environment, where humans interact with unmanned vehicles using an existing body and hand gesture vocabulary. When tested on 10 gestures recorded from 20 participants, the average recognition accuracy of our system was 88.41%.

    by Yale Song. S.M.
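    A minimal sketch of the hand-pose stage as described (HOG features fed to a multi-class SVM), using scikit-image and scikit-learn. The image size, number of classes, HOG parameters, and training data are placeholders, not the thesis's setup.

```python
# Minimal sketch: hand-pose classification with HOG features + multi-class SVM.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, size = 40, (64, 64)
images = rng.uniform(0, 1, (n, *size))  # stand-in grayscale hand crops
labels = rng.integers(0, 4, n)          # e.g. 4 hypothetical hand-pose classes

def hog_features(img):
    """Histogram of Oriented Gradients descriptor for one grayscale image."""
    return hog(img, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

X = np.array([hog_features(im) for im in images])

# One-vs-rest linear SVM over the HOG descriptors.
clf = SVC(kernel="linear", decision_function_shape="ovr").fit(X, labels)
print(clf.predict(X[:5]))
```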

    Automatic acquisition and initialization of articulated models

    Tracking, classification and visual analysis of articulated motion is challenging because of the difficulties involved in separating noise and variabilities caused by appearance, size and viewpoint fluctuations from task-relevant variations. By incorporating powerful domain knowledge, model-based approaches are able to overcome these problems to a great extent and are actively explored by many researchers. However, model acquisition, initialization and adaptation are still relatively under-investigated problems, especially in the case of single-camera systems. In this paper, we address the problem of automatic acquisition and initialization of articulated models from monocular video without any prior knowledge of shape and kinematic structure. The framework is applied in a human-computer interaction context where articulated shape models have to be acquired from unknown users for subsequent limb tracking. Bayesian motion segmentation is used to extract and initialize articulated models from visual data. Image sequences are decomposed into rigid components that can undergo parametric motion. The relative motion of these components is used to obtain joint information. The resulting components are assembled into an articulated kinematic model, which is then used for visual tracking, eliminating the need for manual initialization or adaptation. The efficacy of the method is demonstrated on synthetic as well as natural image sequences. The accuracy of the joint estimation stage is verified on ground-truth data. © Springer-Verlag 2003
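    The step from relative motion to joint information can be illustrated with a small least-squares sketch: a joint between two rigid components is a point that both components' frame-to-frame motions map to the same location. The 2D transforms below are synthetic stand-ins; in the paper, component motions come from Bayesian motion segmentation, and the actual joint-estimation procedure may differ.

```python
# Minimal sketch: estimating a joint between two rigidly moving components.
# If component i moves a point p to R_i @ p + t_i, a shared joint p satisfies
# R1 @ p + t1 = R2 @ p + t2, i.e. (R1 - R2) @ p = t2 - t1. Stacking this
# constraint over several frames gives an overdetermined linear system.
import numpy as np

def rot(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s], [s, c]])

true_joint = np.array([2.0, 1.0])  # ground truth for this synthetic example
rng = np.random.default_rng(0)
A_rows, b_rows = [], []
for _ in range(10):  # ten frame-to-frame motions
    R1, R2 = rot(rng.uniform(-0.5, 0.5)), rot(rng.uniform(-0.5, 0.5))
    t1 = rng.uniform(-1, 1, 2)
    # Choose t2 so both transforms agree at the true joint (plus noise).
    t2 = R1 @ true_joint + t1 - R2 @ true_joint + rng.normal(0, 1e-3, 2)
    A_rows.append(R1 - R2)
    b_rows.append(t2 - t1)

A, b = np.vstack(A_rows), np.concatenate(b_rows)
p, *_ = np.linalg.lstsq(A, b, rcond=None)
print(p)  # approximately [2.0, 1.0]
```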
