269 research outputs found

    Concise and Effective Network for 3D Human Modeling from Orthogonal Silhouettes

    In this paper, we revisit the problem of 3D human modeling from two orthogonal silhouettes of individuals (i.e., front and side views). Different from our prior work \cite{wang2003virtual}, a supervised learning approach based on a convolutional neural network (CNN) is investigated to solve the problem by establishing a mapping function that can effectively extract features from the two silhouettes and fuse them into coefficients in the shape space of human bodies. A new CNN structure is proposed in our work to extract not only the discriminative features of the front and side views but also their mixed features for the mapping function. 3D human models with high accuracy are synthesized from the coefficients generated by the mapping function. Existing CNN approaches for 3D human modeling usually learn a large number of parameters (from 8.5M to 355.4M) from two binary images. In contrast, we investigate a new network architecture and take samples on the silhouettes as input. As a consequence, more accurate models can be generated by our network with only 2.4M coefficients. The training of our network is conducted on samples obtained by augmenting a publicly accessible dataset. Transfer learning using datasets with a smaller number of scanned models is applied to our network so that it can generate results with gender-oriented (or geographical) patterns.

    Eigenvector-based Dimensionality Reduction for Human Activity Recognition and Data Classification

    In the context of appearance-based human motion compression, representation, and recognition, we have proposed a robust framework based on the eigenspace technique. First, we introduce a new appearance-based template matching approach, which we name the Motion Intensity Image, for compressing a human motion video into a simple, concise, yet very expressive representation. Second, a learning strategy based on the eigenspace technique is employed for dimensionality reduction using PCA and FDA, which provide maximum data variance and maximum class separability, respectively. Third, a new compound eigenspace is introduced for multiple directed motion recognition that also accounts for possible changes in scale. This method extracts two more features that are used to control the recognition process. A similarity measure based on Euclidean distance has been employed for matching dimensionally reduced testing templates against a projected set of known motion templates. In the stream of nonlinear classification, we have introduced a new eigenvector-based recognition model built upon the idea of the kernel technique. First, a practical study on the use of the kernel technique with 18 different functions has been carried out. We have shown in this study how crucial the choice of kernel function is for the success of the subsequent linear discrimination in the feature space for a particular problem. Second, building upon the theory of reproducing kernels, we have proposed a new robust nonparametric discriminant analysis approach with kernels. Our proposed technique can efficiently find a nonparametric kernel representation in which linear discriminants perform better. Data classification is achieved by integrating the linear version of the NDA with the kernel mapping. Based on the kernel trick, we have provided a new formulation for Fisher's criterion, defined in terms of the Gram matrix only.
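
For orientation, the widely used two-class kernel Fisher discriminant already expresses Fisher's criterion through the Gram matrix alone. The block below states that standard form; it is an assumption that it resembles the thesis's own formulation, which the abstract does not spell out. Here the discriminant direction is written as $\mathbf{w} = \sum_i \alpha_i \phi(\mathbf{x}_i)$, $K$ is the $n \times n$ Gram matrix, $K_c$ is its $n \times n_c$ sub-matrix whose columns belong to class $c$, $I$ is the $n_c \times n_c$ identity, and $\mathbf{1}$ is the all-ones vector of length $n_c$.

```latex
\[
J(\boldsymbol{\alpha}) =
  \frac{\boldsymbol{\alpha}^{\top} M \boldsymbol{\alpha}}
       {\boldsymbol{\alpha}^{\top} N \boldsymbol{\alpha}},
\qquad
M = (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^{\top},
\qquad
N = \sum_{c=1}^{2} K_c \left( I - \tfrac{1}{n_c}\,\mathbf{1}\mathbf{1}^{\top} \right) K_c^{\top}
\]
\[
(\mathbf{m}_c)_i = \frac{1}{n_c} \sum_{j \in \mathcal{C}_c} K_{ij},
\qquad
K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)
\]
```

Maximizing $J(\boldsymbol{\alpha})$ requires only kernel evaluations $k(\mathbf{x}_i, \mathbf{x}_j)$, never the feature map $\phi$ itself.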

    Feature regularization and learning for human activity recognition.

    Doctoral Degree. University of KwaZulu-Natal, Durban. Feature extraction is an essential component in the design of human activity recognition models. However, relying on extracted features alone for learning often makes the model suboptimal. This research work therefore seeks to address this problem by investigating feature regularization. Feature regularization is used to encapsulate the discriminative patterns that are needed for better and more efficient model learning. Firstly, a within-class subspace regularization approach is proposed for eigenfeature extraction and regularization in human activity recognition. In this approach, the within-class subspace is modelled using more eigenvalues from the reliable subspace to obtain a four-parameter modelling scheme. This model enables a better estimation of the eigenvalues that are distorted by the small sample size effect. This regularization is done in one piece, thereby avoiding the undue complexity of modelling the eigenspectrum differently. The whole eigenspace is used for performance evaluation because feature extraction and dimensionality reduction are done at a later stage of the evaluation process. Results show that the proposed approach has better discriminative capacity than several other subspace approaches for human activity recognition. Secondly, with the use of a likelihood prior probability, a new regularization scheme that improves the loss function of a deep convolutional neural network is proposed. The results obtained from this work demonstrate that well-regularized features yield better class discrimination in human activity recognition. The major contribution of the thesis is the development of feature extraction strategies for determining the discriminative patterns needed for efficient model learning.

    Multi-modal Machine Learning in Engineering Design: A Review and Future Directions

    In the rapidly advancing field of multi-modal machine learning (MMML), the convergence of multiple data modalities has the potential to reshape various applications. This paper presents a comprehensive overview of the current state, advancements, and challenges of MMML within the sphere of engineering design. The review begins with a deep dive into five fundamental concepts of MMML: multi-modal information representation, fusion, alignment, translation, and co-learning. Following this, we explore the cutting-edge applications of MMML, placing particular emphasis on tasks pertinent to engineering design, such as cross-modal synthesis, multi-modal prediction, and cross-modal information retrieval. Through this overview, we highlight the inherent challenges in adopting MMML in engineering design and propose potential directions for future research. To spur the continued evolution of MMML in engineering design, we advocate for concentrated efforts to construct extensive multi-modal design datasets, develop effective data-driven MMML techniques tailored to design applications, and enhance the scalability and interpretability of MMML models. MMML models, as the next generation of intelligent design tools, hold a promising future to impact how products are designed.

    Shape analysis and description based on the isometric invariances of topological skeletonization

    In this dissertation, we explore the problem of how to describe the shape of an object in 2D and 3D with a set of features that are invariant to isometric transformations. We base our approach on the well-known Medial Axis Transform and its topological properties. We aim to study two problems. The first is how to find a shape representation of a segmented object that exhibits rotation, translation, and reflection invariance. The second is how to build a machine learning pipeline that uses the isometric invariance of the shape representation to do both classification and retrieval. Our proposed solution demonstrates competitive results compared to state-of-the-art approaches. We base our shape representation on the medial axis transform (MAT), sometimes called the topological skeleton. Accepted and well-studied properties of the medial axis include: homotopy preservation, rotation invariance, mediality, one-pixel thickness, and the ability to fully reconstruct the object. These properties make the MAT a suitable input for creating shape features; however, several problems arise because not all skeletonization methods satisfy all of the above-mentioned properties at the same time. In general, skeletons based on thinning approaches preserve topology but are noise sensitive and do not allow a proper reconstruction. They are also not invariant to rotations. Voronoi skeletons also preserve topology and are rotation invariant, but do not carry information about the thickness of the object, making reconstruction impossible. The Voronoi skeleton is an approximation of the real skeleton. The denser the sampling of the boundary, the better the approximation; however, a denser sampling makes the Voronoi diagram more computationally expensive. In contrast, distance transform methods allow the reconstruction of the original object by providing the distance from every pixel in the skeleton to the boundary. 
Moreover, they exhibit an acceptable degree of the properties listed above, but noise sensitivity remains an issue. Therefore, we selected distance transform medial axis methods as our skeletonization strategy, and focused on creating a new noise-free approach to solve the contour noise problem. To effectively classify an object, or perform any other task with features based on its shape, the descriptor needs to have a normalized, compact form: $\Phi$ should map every shape $\Omega$ to the same vector space $\mathrm{R}^{n}$. This is not possible with skeletonization methods alone, because the skeletons of different objects have different numbers of branches and different numbers of points, even when they belong to the same category. Consequently, we developed a strategy to extract features from the skeleton through the map $\Phi$, which we used as input to a machine learning approach. After developing our method for robust skeletonization, the next step is to use this skeleton in the machine learning pipeline to classify objects into previously defined categories. We developed a set of skeletal features that were used as input data to the machine learning architectures. We ran experiments on the MPEG7 and ModelNet40 datasets to test our approach in both 2D and 3D. Our experiments show results comparable with the state of the art in shape classification and retrieval. Our experiments also show that our pipeline and our skeletal features exhibit some degree of invariance to isometric transformations. In this study, we sought to design an isometric invariant shape descriptor through robust skeletonization, reinforced by a feature extraction pipeline that exploits this invariance through a machine learning methodology. We conducted a set of classification and retrieval experiments over well-known benchmarks to validate our proposed method. 
(Taken from the source) This dissertation explores the problem of how to describe the shape of an object in 2D and 3D with a set of features that are invariant to isometric transformations. The methodology proposed in this document focuses on the Medial Axis Transform and its topological properties. Our goal is to study two problems. The first is to find a mathematical representation of the shape of an object that exhibits invariance to rotation, translation, and reflection. The second is how to build a machine learning model that uses these invariances for the tasks of classifying and retrieving objects by their shape. The method proposed in this thesis shows competitive results compared with other state-of-the-art methods. In this work we base our shape representation on the medial axis transform, sometimes called the topological skeleton. Some well-known and well-studied properties of the medial axis transform are: homotopy preservation, rotation invariance, a thickness of a single pixel (1D), and the ability to reconstruct the original object from it. These properties make the medial axis transform a suitable starting point for creating shape features. However, several problems arise at this point because not all skeletonization methods satisfy all of the above properties at the same time. In general, skeletons based on morphological erosion approaches preserve the topology of the object, but they are sensitive to noise and do not allow a proper reconstruction. They are also not invariant to rotations. Another skeletonization method is the Voronoi skeleton. Voronoi skeletons also preserve topology and are rotation invariant, but they carry no information about the thickness of the object, which makes reconstruction impossible. 
The denser the sampling of the object's contour, the better the approximation. However, denser sampling makes the Voronoi diagram more computationally expensive. In contrast, methods based on the distance transform allow the original object to be reconstructed, since they provide the distance from every skeleton pixel to its nearest point on the contour. They also exhibit an acceptable degree of the properties listed above, although noise sensitivity remains a problem. Therefore, in this document we select distance-transform-based methods as our skeletonization strategy and focus on creating a new approach that solves the contour noise problem. To classify an object effectively, or to perform any other task with features based on its shape, the descriptor must be compact and normalized: $\Phi$ must map every shape $\Omega$ to the same vector space $\mathrm{R}^{n}$. This is not possible with state-of-the-art skeletonization methods, because the skeletons of different objects have different numbers of branches and different numbers of points, even when they belong to the same category. Consequently, in our proposal we develop a strategy to extract features from the skeleton through the function $\Phi$, which we use as input to a machine learning approach. After developing our robust skeletonization method, the next step is to use this skeleton in a machine learning model to classify the object into previously defined categories. To this end, we developed a set of features based on the medial axis that were used as input data for the machine learning architecture. 
We ran experiments on the MPEG7 and ModelNet40 datasets to test our approach in both 2D and 3D. Our experiments show results comparable with the state of the art in shape classification and retrieval. Our experiments also show that the developed model, together with our medial-axis-based features, is invariant to isometric transformations. (Taken from the source) National Doctorates Scholarship of Colciencias, call 725 of 2015. Doctorate. Doctor of Engineering. Computer vision and machine learning
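
The reconstruction property that motivates distance-transform skeletons in the abstract above can be illustrated with a small sketch. It is not the dissertation's method: it assumes the chessboard (8-neighbour) metric instead of the Euclidean one, because with that metric the local maxima of the distance transform are known to reconstruct the shape exactly. The grid `SHAPE` and all function names are hypothetical.

```python
# Toy binary shape ("1" = object pixel); an L-shaped region chosen for illustration.
SHAPE = [
    "111111",
    "111111",
    "111000",
    "111000",
]


def chessboard_distance_transform(rows):
    """Map each object pixel to its chessboard distance to the nearest background pixel.

    A one-pixel ring around the grid is treated as background.
    """
    h, w = len(rows), len(rows[0])
    inside = {(r, c) for r in range(h) for c in range(w) if rows[r][c] == "1"}
    background = [
        (r, c)
        for r in range(-1, h + 1)
        for c in range(-1, w + 1)
        if (r, c) not in inside
    ]
    return {
        (r, c): min(max(abs(r - br), abs(c - bc)) for br, bc in background)
        for (r, c) in inside
    }


def medial_axis(dist):
    """Keep pixels whose distance value is not exceeded by any of their 8 neighbours."""
    skeleton = {}
    for (r, c), d in dist.items():
        neighbours = [
            dist.get((r + dr, c + dc), 0)
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        ]
        if d >= max(neighbours):
            skeleton[(r, c)] = d
    return skeleton


def reconstruct(skeleton):
    """Union of the chessboard discs (squares) centred on the skeleton pixels."""
    pixels = set()
    for (r, c), d in skeleton.items():
        for dr in range(-(d - 1), d):
            for dc in range(-(d - 1), d):
                pixels.add((r + dr, c + dc))
    return pixels


dist = chessboard_distance_transform(SHAPE)
skeleton = medial_axis(dist)
restored = reconstruct(skeleton)
```

Here `restored` equals the set of object pixels even though `skeleton` stores fewer pixels than the shape, which is the reason the stored distances matter. The skeleton size still varies from shape to shape, so a fixed-length descriptor such as the map $\Phi$ has to be computed on top of it.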

    Sparse representation frameworks for inference problems in visual sensor networks

    Visual sensor networks (VSNs) form a new research area that merges computer vision and sensor networks. VSNs consist of small visual sensor nodes called camera nodes, which integrate an image sensor, an embedded processor, and a wireless transceiver. Having multiple cameras in a wireless network poses unique and challenging problems that do not exist either in computer vision or in sensor networks. Due to the resource constraints of the camera nodes, such as battery power and bandwidth, it is crucial to perform data processing and collaboration efficiently. This thesis presents a number of sparse-representation-based methods to be used in the context of surveillance tasks in VSNs. Performing surveillance tasks such as tracking and recognition in a communication-constrained VSN environment is extremely challenging. Compressed sensing is a technique for acquiring and reconstructing a signal from a small number of measurements, utilizing the prior knowledge that the signal has a sparse representation in a proper space. The ability of sparse representation tools to reconstruct signals from a small number of observations fits well with the limitations in VSNs for processing, communication, and collaboration. Hence, this thesis presents novel sparsity-driven methods that can be used in action recognition and human tracking applications in VSNs. A sparsity-driven action recognition method is proposed by casting the classification problem as an optimization problem. We solve the optimization problem by enforcing sparsity through ℓ1 regularization and perform action recognition. We have demonstrated the superiority of our method when observations are low-resolution, occluded, and noisy. To the best of our knowledge, this is the first action recognition method that uses sparse representation. In addition, we have proposed an adaptation of this method for VSN resource constraints. 
We have also performed an analysis of the role of sparsity in classification for two different action recognition problems. We have proposed a feature compression framework for human tracking applications in visual sensor networks. In this framework, we perform decentralized tracking: each camera extracts useful features from the images it has observed and sends them to a fusion node, which collects the multi-view image features and performs tracking. In tracking, extracting features usually results in a likelihood function. To reduce communication in the network, we compress the likelihoods by first splitting them into blocks, then transforming each block to a proper domain and keeping only the most significant coefficients in this representation. To the best of our knowledge, compression of features computed in the context of tracking in a VSN has not been proposed in previous works. We have applied our method to indoor and outdoor tracking scenarios. Experimental results show that our approach can save up to 99.6% of the bandwidth compared to centralized approaches that compress raw images to decrease the communication. We have also shown that our approach outperforms existing decentralized approaches. Furthermore, we have extended this tracking framework and proposed a sparsity-driven approach for human tracking in VSNs. We have designed special overcomplete dictionaries that exploit the specific known geometry of the measurement scenario and used these dictionaries for sparse representation of likelihoods. By obtaining dictionaries that match the structure of the likelihood functions, we can represent likelihoods with few coefficients, and thereby decrease the communication in the network. This is the first method in the literature that uses sparse representation to compress likelihood functions and applies this idea to VSNs. 
We have tested our approach in indoor and outdoor tracking scenarios and demonstrated that it achieves a greater bandwidth reduction than our feature compression framework. We have also shown that our approach outperforms existing decentralized and distributed approaches.
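
The block-wise likelihood compression described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract says each block is transformed "to a proper domain" without naming it, so an orthonormal DCT is assumed here, the likelihood is reduced to one dimension, and the names `compress`, `decompress`, `block_size`, and `keep` are hypothetical.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a list of floats (assumed transform domain)."""
    n = len(block)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        coeffs.append(
            scale
            * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(block))
        )
    return coeffs


def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [
        sum(
            math.sqrt((1 if k == 0 else 2) / n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def compress(signal, block_size, keep):
    """Split into blocks, transform each, and keep the `keep` largest-magnitude coefficients."""
    packets = []
    for start in range(0, len(signal), block_size):
        coeffs = dct(signal[start:start + block_size])
        top = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
        # Only (index, value) pairs of the kept coefficients would be transmitted.
        packets.append((len(coeffs), {k: coeffs[k] for k in top}))
    return packets


def decompress(packets):
    """Rebuild the signal at the fusion node, treating dropped coefficients as zero."""
    signal = []
    for n, kept in packets:
        signal.extend(idct([kept.get(k, 0.0) for k in range(n)]))
    return signal


# Hypothetical 1D likelihood: a Gaussian bump over 32 positions.
likelihood = [math.exp(-((i - 12) ** 2) / 18.0) for i in range(32)]
lossy = decompress(compress(likelihood, block_size=8, keep=3))
exact = decompress(compress(likelihood, block_size=8, keep=8))
```

With `keep` equal to `block_size` nothing is discarded and `exact` matches the input up to rounding; lowering `keep` trades reconstruction error for fewer transmitted values per block.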

    Multi-Modality Human Action Recognition

    Human action recognition is useful in many applications across various areas, e.g., video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security. Recently, human action recognition has become an active research topic in computer vision and pattern recognition, and a number of action recognition approaches have been proposed. However, most of these approaches are designed for RGB image sequences, where the action data are collected by an RGB/intensity camera. The recognition performance is therefore usually affected by the occlusion, background, and lighting conditions of the image sequences. If more information can be provided along with the image sequences, so that data sources other than RGB video can be utilized, human actions could be better represented and recognized by the designed computer vision system. In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g., infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed. Cross-spectral action recognition is then investigated, and novel approaches are proposed in our work. On the other hand, depth imaging technology has made significant progress recently, and depth information can now be captured simultaneously with RGB video, so depth-based human action recognition is also investigated. I first propose a method that combines different types of depth data to recognize human actions. Then a thorough evaluation is conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamic model.

    Simple and Complex Human Action Recognition in Constrained and Unconstrained Videos

    Human action recognition plays a crucial role in visual learning applications such as video understanding and surveillance, video retrieval, human-computer interaction, and autonomous driving systems. A variety of methodologies have been proposed for human action recognition by developing low-level features together with bag-of-visual-words models. However, much less research has been performed on the combination of the pre-processing, encoding, and classification stages. This dissertation focuses on enhancing action recognition performance via ensemble learning, a hybrid classifier, hierarchical feature representation, and key action perception methodologies. Action variation is one of the crucial challenges in video analysis and action recognition. We address this problem by proposing the hybrid classifier (HC) to discriminate actions that contain similar forms of motion features, such as walking, running, and jogging. In addition, we show and prove that the fusion of various appearance-based and motion features can boost simple and complex action recognition performance. The next part of the dissertation introduces the pooled-feature representation (PFR), which is derived from a double phase encoding (DPE) framework. Considering that a given unconstrained video is composed of a sequence of simple frames, the first phase of DPE generates temporal sub-volumes from the video and represents them individually by employing the proposed improved rank pooling (IRP) method. The second phase constructs the pool of features by fusing the represented vectors from the first phase. The pool is compressed and then encoded to provide the video-parts vector (VPV). The DPE framework allows distilling the video representation and hierarchically extracting new information. Compared with recent video encoding approaches, VPV can preserve the higher-level information through standard encoding of low-level features in two phases. 
Furthermore, the encoded vectors from both phases of DPE are fused along with a compression stage to develop PFR