
    2.5D multi-view gait recognition based on point cloud registration

    This paper presents a method for modeling a 2.5-dimensional (2.5D) human body and extracting gait features for identifying the human subject. To achieve view-invariant gait recognition, a multi-view synthesizing method based on point cloud registration (MVSM) is proposed to generate multi-view training galleries. The concept of a density- and curvature-based Color Gait Curvature Image is introduced to map 2.5D data onto a 2D space, enabling data dimension reduction by the discrete cosine transform and 2D principal component analysis. Gait recognition is achieved via a 2.5D view-invariant gait recognition method based on point cloud registration. Experimental results on an in-house database captured by a Microsoft Kinect camera show a significant performance gain when using MVSM.
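
    The abstract does not give implementation details; the following is a minimal sketch of the dimension-reduction step it names (a 2D discrete cosine transform followed by 2DPCA), assuming the Color Gait Curvature Image is already available as a 2D array. All names, sizes, and the retained coefficient block are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy.fftpack import dct

def dct2(img):
    """Orthonormal 2D discrete cosine transform."""
    return dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")

def fit_2dpca(images, k):
    """2DPCA: top-k eigenvectors of the image covariance G = E[(A - mean)^T (A - mean)]."""
    mean = np.mean(images, axis=0)
    G = sum((a - mean).T @ (a - mean) for a in images) / len(images)
    _, eigvecs = np.linalg.eigh(G)          # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]          # top-k projection axes (W x k)

images = [np.random.rand(64, 64) for _ in range(10)]  # stand-ins for curvature images
coeffs = [dct2(a)[:32, :32] for a in images]          # keep a low-frequency DCT block
W = fit_2dpca(coeffs, k=8)
features = [a @ W for a in coeffs]                    # each gait feature is 32 x 8
```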

    Interpretable Transformations with Encoder-Decoder Networks

    Deep feature spaces have the capacity to encode complex transformations of their input data. However, understanding the relative feature-space relationship between two transformed encoded images is difficult. For instance, what is the relative feature-space relationship between two rotated images? What is decoded when we interpolate in feature space? Ideally, we want to disentangle confounding factors, such as pose, appearance, and illumination, from object identity. Disentangling these is difficult because they interact in very nonlinear ways. We propose a simple method to construct a deep feature space with explicitly disentangled representations of several known transformations. A person or algorithm can then manipulate the disentangled representation, for example, to re-render an image with explicit control over parameterized degrees of freedom. The feature space is constructed using a transforming encoder-decoder network with a custom feature transform layer acting on the hidden representations. We demonstrate the advantages of explicit disentangling on a variety of datasets and transformations, and as an aid for traditional tasks such as classification. Comment: Accepted at ICCV 201
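
    As a hedged illustration of the custom feature transform layer, here is one common way such a layer is realized for in-plane rotation: the latent code is split into channel pairs and each pair is rotated by the same angle as the desired image rotation. The dimensions and pairing scheme below are assumptions for the sketch, not necessarily the paper's exact design.

```python
import numpy as np

def feature_transform(z, theta):
    """Rotate each consecutive channel pair of the latent code z by angle theta."""
    pairs = z.reshape(-1, 2)                 # (d/2, 2) channel pairs
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return (pairs @ R.T).reshape(-1)

z = np.random.randn(128)                     # stand-in for an encoder output
z_rot = feature_transform(z, np.pi / 4)      # decoding z_rot should yield a 45-degree-rotated image
```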

    Discriminative Appearance Models for Face Alignment

    The proposed face alignment algorithm uses local gradient features as the appearance representation. These features are obtained by pixel value comparison, which provides robustness against changes in illumination, as well as against partial occlusion and local deformation owing to the features' locality. The adopted features are modeled in three discriminative methods, which correspond to different alignment cost functions. The discriminative appearance modeling alleviates the generalization problem to some extent.
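
    A minimal sketch of a pixel-value-comparison feature in the spirit described above: each test compares two pixels near a landmark and yields one bit, so any monotonic illumination change leaves the descriptor unchanged. The offsets, window size, and test count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def binary_descriptor(img, x, y, tests):
    """One bit per test: compare two pixels at offsets around landmark (x, y)."""
    bits = [1 if img[y + dy1, x + dx1] > img[y + dy2, x + dx2] else 0
            for (dx1, dy1, dx2, dy2) in tests]
    return np.array(bits, dtype=np.uint8)

rng = np.random.default_rng(0)
tests = rng.integers(-8, 9, size=(64, 4))    # 64 random pixel pairs in a 17x17 window
img = rng.random((100, 100))                 # stand-in for a grayscale face image
desc = binary_descriptor(img, 50, 50, tests) # robust to monotonic illumination changes
```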

    SEGMENTATION, RECOGNITION, AND ALIGNMENT OF COLLABORATIVE GROUP MOTION

    Modeling and recognition of human motion in videos has broad applications in behavioral biometrics, content-based visual data analysis, security and surveillance, as well as the design of interactive environments. Significant progress has been made in the past two decades by way of new models, methods, and implementations. In this dissertation, we focus our attention on a relatively less investigated sub-area called collaborative group motion analysis. Collaborative group motions are those that typically involve multiple objects, wherein the motion patterns of individual objects may vary significantly in both space and time, but the collective motion pattern of the ensemble allows characterization in terms of geometry and statistics. The motions or activities of an individual object therefore constitute local information. A framework to synthesize all local information into a holistic view, and to explicitly characterize interactions among objects, involves large-scale global reasoning and is of significant complexity. In this dissertation, we first review relevant previous contributions on human motion/activity modeling and recognition, and then propose several approaches to answer a sequence of traditional vision questions: 1) which of the motion elements are relevant to a group motion pattern of interest (Segmentation); 2) what the underlying motion pattern is (Recognition); and 3) how similar two motion ensembles are and how we can 'optimally' transform one to match the other (Alignment). Our primary practical scenario is American football plays, where the corresponding problems are 1) who the offensive players are; 2) what offensive strategy they are using; and 3) whether two plays use the same strategy and how we can remove the spatio-temporal misalignment between them due to internal or external factors. The proposed approaches discard the traditional modeling paradigm and instead explore concise descriptors, hierarchies, stochastic mechanisms, or compact generative models to achieve both effectiveness and efficiency. In particular, the intrinsic geometry of the spaces of the involved features/descriptors/quantities is exploited, and statistical tools are established on these nonlinear manifolds. These initial attempts have identified new challenging problems in complex motion analysis, as well as in more general tasks in video dynamics. The insights gained from nonlinear geometric modeling and analysis in this dissertation may prove useful for a broader class of computer vision applications.
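
    The abstract does not spell out the alignment machinery, so as a hedged illustration here is a standard baseline for removing temporal misalignment between two motion trajectories, dynamic time warping (DTW). The trajectories below are synthetic stand-ins, and the dissertation's own approach works with statistics on nonlinear manifolds rather than raw Euclidean coordinates.

```python
import numpy as np

def dtw(X, Y):
    """DTW alignment cost between trajectories X (n x d) and Y (m x d)."""
    n, m = len(X), len(Y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(X[i - 1] - Y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

X = np.cumsum(np.random.randn(50, 2), axis=0)   # stand-in 2D player trajectory
Y = X[::2]                                       # temporally compressed version of the same play
print(dtw(X, Y))                                 # small cost despite the temporal misalignment
```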

    A comprehensive survey on Pose-Invariant Face Recognition

    © 2016 ACM. The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, Pose-Invariant Face Recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this article, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories, that is, pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared. Moreover, promising directions for future research are discussed

    3D face structure extraction from images at arbitrary poses and under arbitrary illumination conditions

    In the aftermath of 9/11, face detection and recognition is becoming an important tool for securing homeland safety against potential terrorist attacks by tracking and identifying suspects who might be trying to engage in such activities. It is also a technology that has proven its usefulness to law enforcement agencies, helping to identify or narrow down a possible suspect from surveillance tape at a crime scene, or to quickly find a suspect based on descriptions from witnesses.

    In this thesis we introduce several improvements to morphable-model-based algorithms and make use of the 3D face structures extracted from multiple images to conduct illumination analysis and face recognition experiments. We present an enhanced Active Appearance Model (AAM), which possesses several sub-models that are independently updated to introduce more model flexibility and achieve better feature localization. Most appearance-based models suffer from the unpredictability of the facial background, which can result in poor boundary extraction. To overcome this problem we propose local projection models that accurately locate face boundary landmarks. We also introduce a novel and unbiased cost function that casts face alignment as an optimization problem, where shape constraints obtained from direct motion estimation are incorporated to achieve a much higher convergence rate and more accurate alignment. Viewing angles are roughly categorized into four poses, and customized view-based AAMs align face images in each specific pose category. We also attempt to obtain individual 3D face structures by morphing a 3D generic face model to fit the individual faces. The face contour is dynamically generated so that the morphed face looks realistic. To overcome the correspondence problem between facial feature points on the generic and individual faces, we use an approach based on distance maps. With the extracted 3D face structure we study illumination effects on appearance based on spherical harmonic illumination analysis. By normalizing the illumination conditions on different facial images, we extract a global illumination-invariant texture map, which, jointly with the extracted 3D face structure in the form of cubic morphing parameters, completely encodes an individual face and allows for the generation of images at arbitrary pose and under arbitrary illumination.

    Face recognition is conducted based on the face shape matching error, the texture error, and the illumination-normalized texture error. Experiments show that a higher face recognition rate is achieved by compensating for illumination effects. Furthermore, it is observed that the fusion of shape and texture information results in better performance than using either shape or texture information individually.

    Ph.D., Electrical Engineering -- Drexel University, 200
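
    A minimal sketch of the spherical harmonic illumination analysis step, assuming per-pixel surface normals and albedo recovered from the fitted 3D face: the image is approximated by nine harmonic basis images (order two and below, constant factors omitted), and dividing out the estimated shading gives an illumination-normalized texture. Array names and sizes are assumptions for illustration.

```python
import numpy as np

def harmonic_images(normals, albedo):
    """First nine spherical-harmonic basis images (up to order 2, unnormalized)."""
    nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
    basis = np.stack([
        np.ones_like(nx),              # constant term
        nx, ny, nz,                    # order-1 harmonics
        nx * ny, nx * nz, ny * nz,     # order-2 cross terms
        nx**2 - ny**2,                 # order-2
        3 * nz**2 - 1,                 # order-2
    ], axis=-1)
    return albedo[..., None] * basis   # H x W x 9

n = np.random.randn(8, 8, 3)                              # stand-in normals and textures
normals = n / np.linalg.norm(n, axis=-1, keepdims=True)
albedo, image = np.random.rand(8, 8), np.random.rand(8, 8)

B = harmonic_images(normals, albedo).reshape(-1, 9)
coeffs, *_ = np.linalg.lstsq(B, image.reshape(-1), rcond=None)  # lighting coefficients
normalized = image.reshape(-1) / (B @ coeffs + 1e-8)            # illumination-normalized texture
```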

    Visual Data Augmentation through Learning

    The rapid progress in machine learning methods has been empowered by i) huge datasets that have been collected and annotated, and ii) improved engineering (e.g. data pre-processing/normalization). Existing datasets typically include several million samples, which makes extending them a colossal task. In addition, state-of-the-art data-driven methods demand vast amounts of data, so a standard engineering trick is artificial data augmentation, for instance adding cropped and (affinely) transformed images to the data. However, this approach does not correspond to any change in the natural 3D scene. We propose instead to perform data augmentation through learning realistic local transformations. We learn a forward and an inverse transformation that map an image from the high-dimensional space of pixel intensities to a latent space which varies (approximately) linearly with the latent space of a realistically transformed version of the image. Such transformed images can be considered two successive frames in a video. Next, we utilize these transformations to learn a linear model that modifies the latent space, and then use the inverse transformation to synthesize a new image. We argue that this procedure produces powerful invariant representations. We perform both qualitative and quantitative experiments that demonstrate that our proposed method creates new realistic images.
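
    A minimal sketch of the latent linear model described above, assuming the learned forward/inverse transformations are available as encode/decode functions (placeholders here, not the paper's API): fit a linear map between latent codes of successive frames, apply it to a new code, and decode to synthesize an augmented image.

```python
import numpy as np

def fit_linear_model(Z1, Z2):
    """Least-squares fit of Z2 ~ Z1 @ A.T + b, where rows are latent codes."""
    X = np.hstack([Z1, np.ones((len(Z1), 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(X, Z2, rcond=None)
    return W[:-1].T, W[-1]                       # A (d x d), b (d,)

d, n = 16, 200
Z1 = np.random.randn(n, d)                       # stand-in codes of frames t
Z2 = Z1 + 0.1 * Z1 @ np.random.randn(d, d).T     # stand-in codes of frames t+1
A, b = fit_linear_model(Z1, Z2)
z_new = A @ Z1[0] + b                            # augmented latent code
# image_new = decode(z_new)                      # decode with the learned inverse transformation
```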

    Feature Extraction Methods for Character Recognition

    No abstract available.