599 research outputs found

    Head Pose Estimation via Manifold Learning

    Get PDF
    For the last decades, manifold learning has shown its advantage of efficient non-linear dimensionality reduction in data analysis. Based on the assumption that informative and discriminative representation of the data lies on a low-dimensional smooth manifold which implicitly embedded in the original high-dimensional space, manifold learning aims to learn the low-dimensional representation following some geometrical protocols, such as preserving piecewise local structure of the original data. Manifold learning also plays an important role in the applications of computer vision, i.e., face image analysis. According to the observations that many face-related research is benefitted by the head pose estimation, and the continuous variation of head pose can be modelled and interpreted as a low-dimensional smooth manifold, we will focus on the head pose estimation via manifold learning in this chapter. Generally, head pose is hard to directly explore from the high-dimensional space interpreted as face images, which is, however, can be efficiently represented in low-dimensional manifold. Therefore, in this chapter, classical manifold learning algorithms are introduced and the corresponding application on head pose estimation are elaborated. Several extensions of manifold learning algorithms which are developed especially for head pose estimation are also discussed and compared

    Learning to Transform Time Series with a Few Examples

    Get PDF
    We describe a semi-supervised regression algorithm that learns to transform one time series into another time series given examples of the transformation. This algorithm is applied to tracking, where a time series of observations from sensors is transformed to a time series describing the pose of a target. Instead of defining and implementing such transformations for each tracking task separately, our algorithm learns a memoryless transformation of time series from a few example input-output mappings. The algorithm searches for a smooth function that fits the training examples and, when applied to the input time series, produces a time series that evolves according to assumed dynamics. The learning procedure is fast and lends itself to a closed-form solution. It is closely related to nonlinear system identification and manifold learning techniques. We demonstrate our algorithm on the tasks of tracking RFID tags from signal strength measurements, recovering the pose of rigid objects, deformable bodies, and articulated bodies from video sequences. For these tasks, this algorithm requires significantly fewer examples compared to fully-supervised regression algorithms or semi-supervised learning algorithms that do not take the dynamics of the output time series into account

    Multi-view Learning as a Nonparametric Nonlinear Inter-Battery Factor Analysis

    Get PDF
    Factor analysis aims to determine latent factors, or traits, which summarize a given data set. Inter-battery factor analysis extends this notion to multiple views of the data. In this paper we show how a nonlinear, nonparametric version of these models can be recovered through the Gaussian process latent variable model. This gives us a flexible formalism for multi-view learning where the latent variables can be used both for exploratory purposes and for learning representations that enable efficient inference for ambiguous estimation tasks. Learning is performed in a Bayesian manner through the formulation of a variational compression scheme which gives a rigorous lower bound on the log likelihood. Our Bayesian framework provides strong regularization during training, allowing the structure of the latent space to be determined efficiently and automatically. We demonstrate this by producing the first (to our knowledge) published results of learning from dozens of views, even when data is scarce. We further show experimental results on several different types of multi-view data sets and for different kinds of tasks, including exploratory data analysis, generation, ambiguity modelling through latent priors and classification.Comment: 49 pages including appendi

    A framework of face recognition with set of testing images

    Get PDF
    We propose a novel framework to solve the face recognition problem base on set of testing images. Our framework can handle the case that no pose overlap between training set and query set. The main techniques used in this framework are manifold alignment, face normalization and discriminant learning. Experiments on different databases show our system outperforms some state of the art methods

    Deep Learning for Head Pose Estimation: A Survey

    Get PDF
    Head pose estimation (HPE) is an active and popular area of research. Over the years, many approaches have constantly been developed, leading to a progressive improvement in accuracy; nevertheless, head pose estimation remains an open research topic, especially in unconstrained environments. In this paper, we will review the increasing amount of available datasets and the modern methodologies used to estimate orientation, with a special attention to deep learning techniques. We will discuss the evolution of the feld by proposing a classifcation of head pose estimation methods, explaining their advantages and disadvantages, and highlighting the diferent ways deep learning techniques have been used in the context of HPE. An in-depth performance comparison and discussion is presented at the end of the work. We also highlight the most promising research directions for future investigations on the topic

    Studies on dimension reduction and feature spaces :

    Get PDF
    Today's world produces and stores huge amounts of data, which calls for methods that can tackle both growing sizes and growing dimensionalities of data sets. Dimension reduction aims at answering the challenges posed by the latter. Many dimension reduction methods consist of a metric transformation part followed by optimization of a cost function. Several classes of cost functions have been developed and studied, while metrics have received less attention. We promote the view that metrics should be lifted to a more independent role in dimension reduction research. The subject of this work is the interaction of metrics with dimension reduction. The work is built on a series of studies on current topics in dimension reduction and neural network research. Neural networks are used both as a tool and as a target for dimension reduction. When the results of modeling or clustering are represented as a metric, they can be studied using dimension reduction, or they can be used to introduce new properties into a dimension reduction method. We give two examples of such use: visualizing results of hierarchical clustering, and creating supervised variants of existing dimension reduction methods by using a metric that is built on the feature space of a neural network. Combining clustering with dimension reduction results in a novel way for creating space-efficient visualizations, that tell both about hierarchical structure and about distances of clusters. We study feature spaces used in a recently developed neural network architecture called extreme learning machine. We give a novel interpretation for such neural networks, and recognize the need to parameterize extreme learning machines with the variance of network weights. This has practical implications for use of extreme learning machines, since the current practice emphasizes the role of hidden units and ignores the variance. A current trend in the research of deep neural networks is to use cost functions from dimension reduction methods to train the network for supervised dimension reduction. We show that equally good results can be obtained by training a bottlenecked neural network for classification or regression, which is faster than using a dimension reduction cost. We demonstrate that, contrary to the current belief, using sparse distance matrices for creating fast dimension reduction methods is feasible, if a proper balance between short-distance and long-distance entries in the sparse matrix is maintained. This observation opens up a promising research direction, with possibility to use modern dimension reduction methods on much larger data sets than which are manageable today

    Gaussian processes for modeling of facial expressions

    Get PDF
    Automated analysis of facial expressions has been gaining significant attention over the past years. This stems from the fact that it constitutes the primal step toward developing some of the next-generation computer technologies that can make an impact in many domains, ranging from medical imaging and health assessment to marketing and education. No matter the target application, the need to deploy systems under demanding, real-world conditions that can generalize well across the population is urgent. Hence, careful consideration of numerous factors has to be taken prior to designing such a system. The work presented in this thesis focuses on tackling two important problems in automated analysis of facial expressions: (i) view-invariant facial expression analysis; (ii) modeling of the structural patterns in the face, in terms of well coordinated facial muscle movements. Driven by the necessity for efficient and accurate inference mechanisms we explore machine learning techniques based on the probabilistic framework of Gaussian processes (GPs). Our ultimate goal is to design powerful models that can efficiently handle imagery with spontaneously displayed facial expressions, and explain in detail the complex configurations behind the human face in real-world situations. To effectively decouple the head pose and expression in the presence of large out-of-plane head rotations we introduce a manifold learning approach based on multi-view learning strategies. Contrary to the majority of existing methods that typically treat the numerous poses as individual problems, in this model we first learn a discriminative manifold shared by multiple views of a facial expression. Subsequently, we perform facial expression classification in the expression manifold. Hence, the pose normalization problem is solved by aligning the facial expressions from different poses in a common latent space. We demonstrate that the recovered manifold can efficiently generalize to various poses and expressions even from a small amount of training data, while also being largely robust to corrupted image features due to illumination variations. State-of-the-art performance is achieved in the task of facial expression classification of basic emotions. The methods that we propose for learning the structure in the configuration of the muscle movements represent some of the first attempts in the field of analysis and intensity estimation of facial expressions. In these models, we extend our multi-view approach to exploit relationships not only in the input features but also in the multi-output labels. The structure of the outputs is imposed into the recovered manifold either from heuristically defined hard constraints, or in an auto-encoded manner, where the structure is learned automatically from the input data. The resulting models are proven to be robust to data with imbalanced expression categories, due to our proposed Bayesian learning of the target manifold. We also propose a novel regression approach based on product of GP experts where we take into account people's individual expressiveness in order to adapt the learned models on each subject. We demonstrate the superior performance of our proposed models on the task of facial expression recognition and intensity estimation.Open Acces

    Landmark Localization, Feature Matching and Biomarker Discovery from Magnetic Resonance Images

    Get PDF
    The work presented in this thesis proposes several methods that can be roughly divided into three different categories: I) landmark localization in medical images, II) feature matching for image registration, and III) biomarker discovery in neuroimaging. The first part deals with the identification of anatomical landmarks. The motivation stems from the fact that the manual identification and labeling of these landmarks is very time consuming and prone to observer errors, especially when large datasets must be analyzed. In this thesis we present three methods to tackle this challenge: A landmark descriptor based on local self-similarities (SS), a subspace building framework based on manifold learning and a sparse coding landmark descriptor based on data-specific learned dictionary basis. The second part of this thesis deals with finding matching features between a pair of images. These matches can be used to perform a registration between them. Registration is a powerful tool that allows mapping images in a common space in order to aid in their analysis. Accurate registration can be challenging to achieve using intensity based registration algorithms. Here, a framework is proposed for learning correspondences in pairs of images by matching SS features and random sample and consensus (RANSAC) is employed as a robust model estimator to learn a deformation model based on feature matches. Finally, the third part of the thesis deals with biomarker discovery using machine learning. In this section a framework for feature extraction from learned low-dimensional subspaces that represent inter-subject variability is proposed. The manifold subspace is built using data-driven regions of interest (ROI). These regions are learned via sparse regression, with stability selection. Also, probabilistic distribution models for different stages in the disease trajectory are estimated for different class populations in the low-dimensional manifold and used to construct a probabilistic scoring function.Open Acces

    Dimensionality reduction and sparse representations in computer vision

    Get PDF
    The proliferation of camera equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between the users and their surroundings. However, the massive amounts of data and the high computational cost associated with them, encumbers the transfer of sophisticated vision algorithms to real life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach for tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space where image data reside in order to allow resource constrained systems to handle them and, ideally, provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images, such as faces under different illumination conditions and objects from different viewpoints, exhibit. We explore the description of natural images by low dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low dimensional models. In addition to dimensionality reduction, we study a novel approach in representing images as a sparse linear combination of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks including low level image modeling and higher level semantic information extraction. Using tools from dimensionality reduction and sparse representation, we propose the application of these methods in three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, according to the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, produced by abstracting information from low-level features and offer compact modeling of high dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low level features and mid level structures. This innovative paradigm offers revolutionary possibilities including recognizing the category of an object from purely textual information without providing any explicit visual example
    • …
    corecore