1,134 research outputs found

    Sparse Coding on Symmetric Positive Definite Manifolds using Bregman Divergences

    Full text link
    This paper introduces sparse coding and dictionary learning for Symmetric Positive Definite (SPD) matrices, which are often used in machine learning, computer vision and related areas. Unlike traditional sparse coding schemes that work in vector spaces, in this paper we discuss how SPD matrices can be described by sparse combination of dictionary atoms, where the atoms are also SPD matrices. We propose to seek sparse coding by embedding the space of SPD matrices into Hilbert spaces through two types of Bregman matrix divergences. This not only leads to an efficient way of performing sparse coding, but also an online and iterative scheme for dictionary learning. We apply the proposed methods to several computer vision tasks where images are represented by region covariance matrices. Our proposed algorithms outperform state-of-the-art methods on a wide range of classification tasks, including face recognition, action recognition, material classification and texture categorization

    Indexing and Retrieval of 3D Articulated Geometry Models

    Get PDF
    In this PhD research study, we focus on building a content-based search engine for 3D articulated geometry models. 3D models are essential components in nowadays graphic applications, and are widely used in the game, animation and movies production industry. With the increasing number of these models, a search engine not only provides an entrance to explore such a huge dataset, it also facilitates sharing and reusing among different users. In general, it reduces production costs and time to develop these 3D models. Though a lot of retrieval systems have been proposed in recent years, search engines for 3D articulated geometry models are still in their infancies. Among all the works that we have surveyed, reliability and efficiency are the two main issues that hinder the popularity of such systems. In this research, we have focused our attention mainly to address these two issues. We have discovered that most existing works design features and matching algorithms in order to reflect the intrinsic properties of these 3D models. For instance, to handle 3D articulated geometry models, it is common to extract skeletons and use graph matching algorithms to compute the similarity. However, since this kind of feature representation is complex, it leads to high complexity of the matching algorithms. As an example, sub-graph isomorphism can be NP-hard for model graph matching. Our solution is based on the understanding that skeletal matching seeks correspondences between the two comparing models. If we can define descriptive features, the correspondence problem can be solved by bag-based matching where fast algorithms are available. In the first part of the research, we propose a feature extraction algorithm to extract such descriptive features. We then convert the skeletal matching problems into bag-based matching. We further define metric similarity measure so as to support fast search. We demonstrate the advantages of this idea in our experiments. The improvement on precision is 12\% better at high recall. The indexing search of 3D model is 24 times faster than the state of the art if only the first relevant result is returned. However, improving the quality of descriptive features pays the price of high dimensionality. Curse of dimensionality is a notorious problem on large multimedia databases. The computation time scales exponentially as the dimension increases, and indexing techniques may not be useful in such situation. In the second part of the research, we focus ourselves on developing an embedding retrieval framework to solve the high dimensionality problem. We first argue that our proposed matching method projects 3D models on manifolds. We then use manifold learning technique to reduce dimensionality and maximize intra-class distances. We further propose a numerical method to sub-sample and fast search databases. To preserve retrieval accuracy using fewer landmark objects, we propose an alignment method which is also beneficial to existing works for fast search. The advantages of the retrieval framework are demonstrated in our experiments that it alleviates the problem of curse of dimensionality. It also improves the efficiency (3.4 times faster) and accuracy (30\% more accurate) of our matching algorithm proposed above. In the third part of the research, we also study a closely related area, 3D motions. 3D motions are captured by sticking sensor on human beings. These captured data are real human motions that are used to animate 3D articulated geometry models. Creating realistic 3D motions is an expensive and tedious task. Although 3D motions are very different from 3D articulated geometry models, we observe that existing works also suffer from the problem of temporal structure matching. This also leads to low efficiency in the matching algorithms. We apply the same idea of bag-based matching into the work of 3D motions. From our experiments, the proposed method has a 13\% improvement on precision at high recall and is 12 times faster than existing works. As a summary, we have developed algorithms for 3D articulated geometry models and 3D motions, covering feature extraction, feature matching, indexing and fast search methods. Through various experiments, our idea of converting restricted matching to bag-based matching improves matching efficiency and reliability. These have been shown in both 3D articulated geometry models and 3D motions. We have also connected 3D matching to the area of manifold learning. The embedding retrieval framework not only improves efficiency and accuracy, but has also opened a new area of research

    Fast Adaptive Reparametrization (FAR) with Application to Human Action Recognition

    Get PDF
    In this paper, a fast approach for curve reparametrization, called Fast Adaptive Reparamterization (FAR), is introduced. Instead of computing an optimal matching between two curves such as Dynamic Time Warping (DTW) and elastic distance-based approaches, our method is applied to each curve independently, leading to linear computational complexity. It is based on a simple replacement of the curve parameter by a variable invariant under specific variations of reparametrization. The choice of this variable is heuristically made according to the application of interest. In addition to being fast, the proposed reparametrization can be applied not only to curves observed in Euclidean spaces but also to feature curves living in Riemannian spaces. To validate our approach, we apply it to the scenario of human action recognition using curves living in the Riemannian product Special Euclidean space SE(3) n. The obtained results on three benchmarks for human action recognition (MSRAction3D, Florence3D, and UTKinect) show that our approach competes with state-of-the-art methods in terms of accuracy and computational cost

    Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis

    Get PDF
    abstract: The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomenon -- gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e. the metric is the standard Euclidean distance governed by the L-2 norm. However in many cases this assumption is violated, when the data lies on non Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to \emph{represent, compare, and manipulate} such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the \emph{true} space of such trajectories. In applications such as stroke rehabilitation where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization to the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    An original framework for understanding human actions and body language by using deep neural networks

    Get PDF
    The evolution of both fields of Computer Vision (CV) and Artificial Neural Networks (ANNs) has allowed the development of efficient automatic systems for the analysis of people's behaviour. By studying hand movements it is possible to recognize gestures, often used by people to communicate information in a non-verbal way. These gestures can also be used to control or interact with devices without physically touching them. In particular, sign language and semaphoric hand gestures are the two foremost areas of interest due to their importance in Human-Human Communication (HHC) and Human-Computer Interaction (HCI), respectively. While the processing of body movements play a key role in the action recognition and affective computing fields. The former is essential to understand how people act in an environment, while the latter tries to interpret people's emotions based on their poses and movements; both are essential tasks in many computer vision applications, including event recognition, and video surveillance. In this Ph.D. thesis, an original framework for understanding Actions and body language is presented. The framework is composed of three main modules: in the first one, a Long Short Term Memory Recurrent Neural Networks (LSTM-RNNs) based method for the Recognition of Sign Language and Semaphoric Hand Gestures is proposed; the second module presents a solution based on 2D skeleton and two-branch stacked LSTM-RNNs for action recognition in video sequences; finally, in the last module, a solution for basic non-acted emotion recognition by using 3D skeleton and Deep Neural Networks (DNNs) is provided. The performances of RNN-LSTMs are explored in depth, due to their ability to model the long term contextual information of temporal sequences, making them suitable for analysing body movements. All the modules were tested by using challenging datasets, well known in the state of the art, showing remarkable results compared to the current literature methods

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Get PDF
    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may disperse directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance based facial attributes are studied leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding. Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where for example variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone for a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produce a relation that is neither convex, nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imparts on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, human activity analysis, and with the addition of a concept called active difference signatures, delivers robust gesture recognition from Kinect or similar depth cameras

    Machine learning approaches to video activity recognition: from computer vision to signal processing

    Get PDF
    244 p.La investigación presentada se centra en técnicas de clasificación para dos tareas diferentes, aunque relacionadas, de tal forma que la segunda puede ser considerada parte de la primera: el reconocimiento de acciones humanas en vídeos y el reconocimiento de lengua de signos.En la primera parte, la hipótesis de partida es que la transformación de las señales de un vídeo mediante el algoritmo de Patrones Espaciales Comunes (CSP por sus siglas en inglés, comúnmente utilizado en sistemas de Electroencefalografía) puede dar lugar a nuevas características que serán útiles para la posterior clasificación de los vídeos mediante clasificadores supervisados. Se han realizado diferentes experimentos en varias bases de datos, incluyendo una creada durante esta investigación desde el punto de vista de un robot humanoide, con la intención de implementar el sistema de reconocimiento desarrollado para mejorar la interacción humano-robot.En la segunda parte, las técnicas desarrolladas anteriormente se han aplicado al reconocimiento de lengua de signos, pero además de ello se propone un método basado en la descomposición de los signos para realizar el reconocimiento de los mismos, añadiendo la posibilidad de una mejor explicabilidad. El objetivo final es desarrollar un tutor de lengua de signos capaz de guiar a los usuarios en el proceso de aprendizaje, dándoles a conocer los errores que cometen y el motivo de dichos errores
    corecore