32 research outputs found

    Extrinsic Methods for Coding and Dictionary Learning on Grassmann Manifolds

    Get PDF
    Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning over the space of linear subspaces, which form Riemannian structures known as Grassmann manifolds. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose closed-form solutions for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis.Comment: Appearing in International Journal of Computer Visio

    Extrinsic methods for coding and dictionary learning on grassmann manifolds

    Get PDF
    Sparsity-based representations have recently led to notable results in various visual recognition tasks. In a separate line of research, Riemannian manifolds have been shown useful for dealing with features and models that do not lie in Euclidean spaces. With the aim of building a bridge between the two realms, we address the problem of sparse coding and dictionary learning in Grassmann manifolds, i.e., the space of linear subspaces. To this end, we propose to embed Grassmann manifolds into the space of symmetric matrices by an isometric mapping. This in turn enables us to extend two sparse coding schemes to Grassmann manifolds. Furthermore, we propose an algorithm for learning a Grassmann dictionary, atom by atom. Lastly, to handle non-linearity in data, we extend the proposed Grassmann sparse coding and dictionary learning algorithms through embedding into higher dimensional Hilbert spaces. Experiments on several classification tasks (gender recognition, gesture classification, scene analysis, face recognition, action recognition and dynamic texture classification) show that the proposed approaches achieve considerable improvements in discrimination accuracy, in comparison to state-of-the-art methods such as kernelized Affine Hull Method and graph-embedding Grassmann discriminant analysis

    Robust subspace learning for static and dynamic affect and behaviour modelling

    Get PDF
    Machine analysis of human affect and behavior in naturalistic contexts has witnessed a growing attention in the last decade from various disciplines ranging from social and cognitive sciences to machine learning and computer vision. Endowing machines with the ability to seamlessly detect, analyze, model, predict as well as simulate and synthesize manifestations of internal emotional and behavioral states in real-world data is deemed essential for the deployment of next-generation, emotionally- and socially-competent human-centered interfaces. In this thesis, we are primarily motivated by the problem of modeling, recognizing and predicting spontaneous expressions of non-verbal human affect and behavior manifested through either low-level facial attributes in static images or high-level semantic events in image sequences. Both visual data and annotations of naturalistic affect and behavior naturally contain noisy measurements of unbounded magnitude at random locations, commonly referred to as ‘outliers’. We present here machine learning methods that are robust to such gross, sparse noise. First, we deal with static analysis of face images, viewing the latter as a superposition of mutually-incoherent, low-complexity components corresponding to facial attributes, such as facial identity, expressions and activation of atomic facial muscle actions. We develop a robust, discriminant dictionary learning framework to extract these components from grossly corrupted training data and combine it with sparse representation to recognize the associated attributes. We demonstrate that our framework can jointly address interrelated classification tasks such as face and facial expression recognition. Inspired by the well-documented importance of the temporal aspect in perceiving affect and behavior, we direct the bulk of our research efforts into continuous-time modeling of dimensional affect and social behavior. Having identified a gap in the literature which is the lack of data containing annotations of social attitudes in continuous time and scale, we first curate a new audio-visual database of multi-party conversations from political debates annotated frame-by-frame in terms of real-valued conflict intensity and use it to conduct the first study on continuous-time conflict intensity estimation. Our experimental findings corroborate previous evidence indicating the inability of existing classifiers in capturing the hidden temporal structures of affective and behavioral displays. We present here a novel dynamic behavior analysis framework which models temporal dynamics in an explicit way, based on the natural assumption that continuous- time annotations of smoothly-varying affect or behavior can be viewed as outputs of a low-complexity linear dynamical system when behavioral cues (features) act as system inputs. A novel robust structured rank minimization framework is proposed to estimate the system parameters in the presence of gross corruptions and partially missing data. Experiments on prediction of dimensional conflict and affect as well as multi-object tracking from detection validate the effectiveness of our predictive framework and demonstrate that for the first time that complex human behavior and affect can be learned and predicted based on small training sets of person(s)-specific observations.Open Acces

    Application of Multivariate Statistical Analysis for the Detection of Structural Changes in the Series of Monitoring Data

    Get PDF
    A new approach to the study of time series by the projection pursuit methods is described. The ideas are illustrated on the time series of the monitoring of the environment and climate: (a) on time series of anomalies of global mean annual temperature -- the main climatological parameter; (b) on time series of atmospheric CO2 concentrations -- the main greenhouse gases; (c) on time series of vegetation index (NDVI) -- the main global characteristic of biota activity on the satellite data. With the aid of the shift operator for time signal, we construct a curve in n-dimensional Euclidian space (shift operator and integer n are the parameters of method). So an analysis of a time series is reduced to the analysis of the most informative projections [for example, by the criterion of factor analysis or spectral analysis (discrete Fourie analysis)] of the corresponding n-dimensional curve. We show that the comparison of such projections for model-test time series with the projection of the time series under investigation gives an effective way of finding the structural changes of the monitoring time series. For example, the case of the Hansen-Lebedeff time series of anomalies of the global mean annual temperature (see Rends 'go), shows that the structure of the series in the interval from 1920 until 1950 essentially differs from the structure on the intervals 1880-1920 and 1950-1987. For the series of CO2 on the Mauna Loa and Barrow monitoring stations, we obtained dynamics of the amplitudes of the year and semi-year cycles. We give the construction of a nonparametric estimation of a model of the initial time series using k-dimensional projection of n-dimensional curve. As a consequence, for example, we found the main components of the CO2 time series and obtained the models of the yearly behaviors of NDVI time series which permit one to carry out statistically stable classification of ecosystems by ecotypes and to describe dynamics of the separate ecosystems. Thus it is proposed a tool for the creation of the statistical description of the current state of the given monitoring series in the form of geometrical image. These geometrical images permit us to analyze the anomalies in the monitoring series in the terms of deviation of these images. As it follows from the examples given below such a method of the analysis of monitoring data is an effective method. Between the theoretical let us stress the following: we show how the methods of the analysis of the time series widely used in statistical treatment of monitoring data could also be used in our approach as the tools of the projections pursuit for comparing the images of the curves of the signal under investigation with the curves of the corresponding signals; it is shown that the proposed approach permits us to join in a united method the achievement of the theory of operators of a generalized shift and exploratory analysis on the basis of the projection pursuit

    マシンビジョンを用いた農業圃場における操縦者安全のための機械学習システムの開発

    Get PDF
    この博士論文は内容の要約のみの公開(または一部非公開)になっています筑波大学 (University of Tsukuba)201

    Learning to process with spikes and to localise pulses

    Get PDF
    In the last few decades, deep learning with artificial neural networks (ANNs) has emerged as one of the most widely used techniques in tasks such as classification and regression, achieving competitive results and in some cases even surpassing human-level performance. Nonetheless, as ANN architectures are optimised towards empirical results and departed from their biological precursors, how exactly human brains process information using these short electrical pulses called spikes remains a mystery. Hence, in this thesis, we explore the problem of learning to process with spikes and to localise pulses. We first consider spiking neural networks (SNNs), a type of ANN that more closely mimic biological neural networks in that neurons communicate with one another using spikes. This unique architecture allows us to look into the role of heterogeneity in learning. Since it is conjectured that the information is encoded by the timing of spikes, we are particularly interested in the heterogeneity of time constants of neurons. We then trained SNNs for classification tasks on a range of visual and auditory neuromorphic datasets, which contain streams of events (spike times) instead of the conventional frame-based data, and show that the overall performance is improved by allowing the neurons to have different time constants, especially on tasks with richer temporal structure. We also find that the learned time constants are distributed similarly to those experimentally observed in some mammalian cells. Besides, we demonstrate that learning with heterogeneity improves robustness against hyperparameter mistuning. These results suggest that heterogeneity may be more than the byproduct of noisy processes and perhaps serves a key role in learning in changing environments, yet heterogeneity has been overlooked in basic artificial models. While neuromorphic datasets, which are often captured by neuromorphic devices that closely model the corresponding biological systems, have enabled us to explore the more biologically plausible SNNs, there still exists a gap in understanding how spike times encode information in actual biological neural networks like human brains, as such data is difficult to acquire due to the trade-off between the timing precision and the number of cells simultaneously recorded electrically. Instead, what we usually obtain is the low-rate discrete samples of trains of filtered spikes. Hence, in the second part of the thesis, we focus on a different type of problem involving pulses, that is to retrieve the precise pulse locations from these low-rate samples. We make use of the finite rate of innovation (FRI) sampling theory, which states that perfect reconstruction is possible for classes of continuous non-bandlimited signals that have a small number of free parameters. However, existing FRI methods break down under very noisy conditions due to the so-called subspace swap event. Thus, we present two novel model-based learning architectures: Deep Unfolded Projected Wirtinger Gradient Descent (Deep Unfolded PWGD) and FRI Encoder-Decoder Network (FRIED-Net). The former is based on the existing iterative denoising algorithm for subspace-based methods, while the latter models directly the relationship between the samples and the locations of the pulses using an autoencoder-like network. Using a stream of K Diracs as an example, we show that both algorithms are able to overcome the breakdown inherent in the existing subspace-based methods. Moreover, we extend our FRIED-Net framework beyond conventional FRI methods by considering when the shape is unknown. We show that the pulse shape can be learned using backpropagation. This coincides with the application of spike detection from real-world calcium imaging data, where we achieve competitive results. Finally, we explore beyond canonical FRI signals and demonstrate that FRIED-Net is able to reconstruct streams of pulses with different shapes.Open Acces

    Overcomplete Dictionary and Deep Learning Approaches to Image and Video Analysis

    Get PDF
    Extracting useful information while ignoring others (e.g. noise, occlusion, lighting) is an essential and challenging data analyzing step for many computer vision tasks such as facial recognition, scene reconstruction, event detection, image restoration, etc. Data analyzing of those tasks can be formulated as a form of matrix decomposition or factorization to separate useful and/or fill in missing information based on sparsity and/or low-rankness of the data. There has been an increasing number of non-convex approaches including conventional matrix norm optimizing and emerging deep learning models. However, it is hard to optimize the ideal l0-norm or learn the deep models directly and efficiently. Motivated from this challenging process, this thesis proposes two sets of approaches: conventional and deep learning based. For conventional approaches, this thesis proposes a novel online non-convex lp-norm based Robust PCA (OLP-RPCA) approach for matrix decomposition, where 0 < p < 1. OLP-RPCA is developed from the offline version LP-RPCA. A robust face recognition framework is also developed from Robust PCA and sparse coding approaches. More importantly, OLP-RPCA method can achieve real-time performance on large-scale data without parallelizing or implementing on a graphics processing unit. We mathematically and empirically show that our OLP-RPCA algorithm is linear in both the sample dimension and the number of samples. The proposed OLP-RPCA and LP-RPCA approaches are evaluated in various applications including Gaussian/non-Gaussian image denoising, face modeling, real-time background subtraction and video inpainting and compared against numerous state-of-the-art methods to demonstrate the robustness of the algorithms. In addition, this thesis proposes a novel Robust lp-norm Singular Value Decomposition (RP-SVD) method for analyzing two-way functional data. The proposed RP-SVD is formulated as an lp-norm based penalized loss minimization problem. The proposed RP-SVD method is evaluated in four applications, i.e. noise and outlier removal, estimation of missing values, structure from motion reconstruction and facial image reconstruction. For deep learning based approaches, this thesis explores the idea of matrix decomposition via Robust Deep Boltzmann Machines (RDBM), an alternative form of Robust Boltzmann Machines, which aiming at dealing with noise and occlusion for face-related applications, particularly. This thesis proposes an extension to texture modeling in the Deep Appearance Models (DAMs) by using RDBM to enhance its robustness against noise and occlusion. The extended model can cope with occlusion and extreme poses when modeling human faces in 2D image reconstruction. This thesis also introduces new fitting algorithms with occlusion awareness through the mask obtained from the RDBM reconstruction. The proposed approach is evaluated in various applications by using challenging face datasets, i.e. Labeled Face Parts in the Wild (LFPW), Helen, EURECOM and AR databases, to demonstrate its robustness and capabilities
    corecore