
    Latent Dirichlet Allocation Uncovers Spectral Characteristics of Drought Stressed Plants

    Understanding how plants adapt to drought stress is essential for improving management practices and breeding strategies, as well as for engineering viable crops for sustainable agriculture in the coming decades. Hyper-spectral imaging provides a particularly promising approach to gaining such understanding, since it allows the non-destructive discovery of spectral characteristics of plants governed primarily by the scattering and absorption characteristics of the leaf-internal structure and biochemical constituents. Several drought stress indices have been derived using hyper-spectral imaging. However, they are typically based on only a few hyper-spectral images, rely on expert interpretation, and consider only a few wavelengths. In this study, we present the first data-driven approach to discovering spectral drought stress indices, treating it as an unsupervised labeling problem at massive scale. To exploit the short-range dependencies of spectral wavelengths, we develop an online variational Bayes algorithm for latent Dirichlet allocation with a convolved Dirichlet regularizer. This approach scales to massive datasets and hence provides a more objective complement to plant physiological practices. The spectral topics found conform to plant physiological knowledge and can be computed in a fraction of the time required by existing LDA approaches.

    Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2012).
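    The abstract treats each spectrum as a document over wavelength "words" and fits topics with online variational Bayes. Below is a minimal, illustrative sketch using scikit-learn's plain online LDA on synthetic binned reflectance counts; it does not implement the convolved Dirichlet regularizer described above, and all array names and sizes are hypothetical.

```python
# Minimal sketch: treat each pixel spectrum as a "document" whose "words" are
# discretized wavelength bins, then fit online variational LDA.  This shows a
# plain LDA baseline only, NOT the convolved-Dirichlet variant from the paper;
# the data are synthetic and all names are hypothetical.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)

# Hypothetical hyperspectral cube flattened to (n_pixels, n_wavelengths),
# reflectance values in [0, 1].
n_pixels, n_wavelengths = 1000, 200
reflectance = rng.random((n_pixels, n_wavelengths))

# Convert reflectance to pseudo word counts so each wavelength bin acts as a
# vocabulary term and each pixel as a document.
counts = np.round(reflectance * 100).astype(int)

lda = LatentDirichletAllocation(
    n_components=8,            # number of spectral "topics"
    learning_method="online",  # online variational Bayes, scales to large data
    batch_size=256,
    random_state=0,
)
pixel_topic_mix = lda.fit_transform(counts)  # (n_pixels, 8) topic proportions
spectral_topics = lda.components_            # (8, n_wavelengths) topic-word weights
print(pixel_topic_mix.shape, spectral_topics.shape)
```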

    Bivariate Functional Archetypoid Analysis: An Application to Financial Time Series

    Master's thesis (Treball de Fi de Màster) in Computational Mathematics (2013 syllabus). Code: SIQ027. Academic year 2016-2017.

    Archetype Analysis (AA) is a statistical technique that describes the individuals of a sample as convex combinations of a certain number of elements called archetypes, which are in turn convex combinations of the individuals in the sample. For its part, Archetypoid Analysis (ADA) represents each individual as a convex combination of a certain number of extreme subjects called archetypoids. These techniques can be applied to functional data by using a basis expansion and performing AA or ADA on the weighting coefficients in that basis. This document presents an application of Functional Archetypoid Analysis (FADA) to financial time series. The starting time series consists of daily equity prices of the S&P 500 stocks. From it, measures of volatility and profitability are generated in order to characterize the listed companies. These variables are converted into functional data through a Fourier basis expansion, and bivariate FADA is applied. By representing subjects through extreme cases, this analysis facilitates the understanding of both the composition of and the relationships between listed companies. Finally, a clustering methodology based on a similarity parameter is presented. The suitability of this technique for this kind of time series is thus shown, as well as the robustness of the conclusions drawn.
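    A minimal sketch of the functional preprocessing described above, assuming synthetic volatility and profitability curves: each series is projected onto a Fourier basis by least squares, and the coefficients of the two variables are stacked per stock, which is the representation a bivariate FADA solver would then consume. The archetypoid optimisation itself is not reproduced here; all names and dimensions are hypothetical.

```python
# Sketch of the preprocessing step: expand each daily series in a Fourier
# basis and stack the two sets of coefficients per stock.  The archetypoid
# analysis that would follow is not implemented here.
import numpy as np

rng = np.random.default_rng(1)
n_stocks, n_days, n_basis = 50, 252, 11   # hypothetical sizes

t = np.linspace(0.0, 1.0, n_days)

def fourier_design(t, n_basis):
    """Design matrix of a Fourier basis: 1, sin(2*pi*k*t), cos(2*pi*k*t)."""
    cols = [np.ones_like(t)]
    for k in range(1, (n_basis - 1) // 2 + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols[:n_basis])

Phi = fourier_design(t, n_basis)                 # (n_days, n_basis)
volatility = rng.random((n_stocks, n_days))      # hypothetical curves
profitability = rng.random((n_stocks, n_days))

# Least-squares projection of each curve onto the basis.
coef_vol, *_ = np.linalg.lstsq(Phi, volatility.T, rcond=None)
coef_pro, *_ = np.linalg.lstsq(Phi, profitability.T, rcond=None)

# Bivariate functional representation: stacked coefficients per stock.
X = np.hstack([coef_vol.T, coef_pro.T])          # (n_stocks, 2 * n_basis)
print(X.shape)  # archetypoid analysis would be run on X
```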

    Archetypal analysis for ordinal data

    Archetypoid analysis (ADA) is an exploratory approach that explains a set of continuous observations as mixtures of pure (extreme) patterns. Those patterns (archetypoids) are actual observations of the sample, which makes the results of this technique easily interpretable, even for non-experts. Note that the observations are approximated as convex combinations of the archetypoids. Archetypoid analysis, in its current form, cannot be applied directly to ordinal data. We propose and describe a two-step method for applying ADA to ordinal responses, based on the ordered stereotype model. One of the main advantages of this model is that it allows us to convert the ordinal data to numerical values using a new data-driven spacing that better reflects the ordinal patterns of the data; this numerical conversion then enables us to apply ADA straightforwardly. The results of the novel method are presented for two behavioural science applications. Finally, the proposed method is also compared with other unsupervised statistical learning methods.
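    A rough illustration of the two-step idea, with an important caveat: the method above obtains the category spacing from a fitted ordered stereotype model, whereas this sketch substitutes a simple frequency-based spacing purely to show the shape of the pipeline (ordinal labels, then data-driven numeric scores, then a matrix ready for ADA). All data and helper names are hypothetical.

```python
# Illustrative two-step pipeline: (1) replace ordinal category labels by a
# data-driven numeric spacing, (2) hand the recoded matrix to archetypoid
# analysis.  The spacing used here (midpoints of empirical cumulative
# proportions) is a stand-in, NOT the ordered-stereotype-model estimator.
import numpy as np

rng = np.random.default_rng(2)
n_respondents, n_items, n_levels = 200, 6, 5
ordinal = rng.integers(1, n_levels + 1, size=(n_respondents, n_items))

def data_driven_scores(column, n_levels):
    """Monotone scores in [0, 1] from the column's category frequencies."""
    freqs = np.bincount(column, minlength=n_levels + 1)[1:] / len(column)
    cum = np.cumsum(freqs)
    return cum - freqs / 2.0          # midpoint of each category's mass

numeric = np.empty_like(ordinal, dtype=float)
for j in range(n_items):
    scores = data_driven_scores(ordinal[:, j], n_levels)
    numeric[:, j] = scores[ordinal[:, j] - 1]

print(numeric[:3])   # this numeric matrix would then be passed to ADA
```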

    Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization

    In this paper, we study the nonnegative matrix factorization problem under the separability assumption (that is, there exists a cone spanned by a small subset of the columns of the input nonnegative data matrix that contains all of its columns), which is equivalent to the hyperspectral unmixing problem under the linear mixing model and the pure-pixel assumption. We present a family of fast recursive algorithms and prove that they are robust under any small perturbation of the input data matrix. This family generalizes several existing hyperspectral unmixing algorithms and hence provides, for the first time, a theoretical justification of their better practical performance.

    Comment: 30 pages, 2 figures, 7 tables. Main change: improvement of the bound of the main theorem (Th. 3), replacing r with sqrt(r).
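    A well-known instance of this style of recursion is the successive projection scheme: repeatedly pick the most "extreme" remaining column and project it out. The sketch below shows a generic, unpreconditioned version on a synthetic separable matrix; it is not necessarily the exact robust variant analyzed in the paper, and the data are hypothetical.

```python
# Generic successive-projection recursion for separable NMF / pure-pixel
# unmixing: pick the column with the largest residual norm, then project the
# data onto the orthogonal complement of that column, and repeat.
import numpy as np

def successive_projection(M, r):
    """Return indices of r columns of M that (approximately) span the cone."""
    R = M.astype(float).copy()
    selected = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))   # most "extreme" column
        selected.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)                      # project out direction u
    return selected

# Synthetic separable example: M = W @ H where H contains the identity, so
# every column of M is a convex combination of the columns of W.
rng = np.random.default_rng(3)
W = rng.random((30, 4))
H = np.hstack([np.eye(4), rng.dirichlet(np.ones(4), size=20).T])
M = W @ H
print(sorted(successive_projection(M, 4)))   # expected: [0, 1, 2, 3]
```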

    SAGA: Sparse And Geometry-Aware non-negative matrix factorization through non-linear local embedding

    This paper presents a new non-negative matrix factorization technique which (1) allows the decomposition of the original data on multiple latent factors that account for the geometrical structure of the manifold embedding the data; (2) provides an optimal representation with a controllable level of sparsity; and (3) has an overall linear complexity, allowing it to handle large, high-dimensional datasets in tractable time. It operates by coding the data with respect to local neighbors with non-linear weights. This locality is obtained as a consequence of the simultaneous sparsity and convexity constraints. Our method is demonstrated over several experiments, including a feature extraction and classification task, where it achieves better performance than state-of-the-art factorization methods with a shorter computational time.
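    A rough sketch of the local-coding idea described above, under strong simplifications: each sample is reconstructed from its k nearest neighbours only (giving sparsity), with non-negative weights renormalised to sum to one as a crude stand-in for the convexity constraint. This is an illustration of the concept, not the SAGA algorithm itself; the neighbourhood size and data are hypothetical.

```python
# Sketch: code each point over its k nearest neighbours with non-negative,
# renormalised weights.  Sparsity comes from the restricted neighbourhood;
# the sum-to-one renormalisation only approximates a convexity constraint.
import numpy as np
from scipy.optimize import nnls

def local_convex_codes(X, k=5):
    """Return an (n, n) coding matrix: row i reconstructs X[i] from neighbours."""
    n = X.shape[0]
    codes = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    for i in range(n):
        order = np.argsort(d2[i])
        neigh = order[1:k + 1]                 # skip the point itself
        w, _ = nnls(X[neigh].T, X[i])          # non-negative local weights
        if w.sum() > 0:
            w = w / w.sum()                    # renormalise towards convexity
        codes[i, neigh] = w
    return codes

rng = np.random.default_rng(4)
X = rng.random((40, 3))
C = local_convex_codes(X, k=5)
print(np.abs(C @ X - X).mean())               # mean local reconstruction error
```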

    Game analytics - maximizing the value of player data

    During the Information Age, technological advances in computers, satellites, data transfer, optics, and digital storage have led to the collection of an immense mass of data on everything from business to astronomy, relying on the power of digital computing to sort through the amalgam of information and generate meaning from the data. Initially, in the 1970s and 1980s, data were stored in disparate structures and very rapidly became overwhelming. This initial chaos led to the creation of structured databases and database management systems to assist with the management of large corpora of data and, notably, with the effective and efficient retrieval of information from databases. The rise of the database management system further increased the already rapid pace of information gathering.

    Analysis of Trajectories by Preserving Structural Information

    The analysis of trajectories from traffic data is an established and yet fast-growing area of research in the related fields of geo-analytics and Geographic Information Systems (GIS). It has a broad range of applications that impact the lives of millions of people, e.g., in urban planning, transportation and navigation systems, and localized search methods. Most of these applications share some underlying basic tasks related to the matching, clustering and classification of trajectories. These tasks, in turn, share some underlying problems, i.e., dealing with noisy, variable-length spatio-temporal sequences in the wild. In our view, these problems can be handled better by exploiting the spatio-temporal relationships (or structural information) in sampled trajectory points, which remain largely unharmed during the measurement process. Although the usage of such structural information has enabled breakthroughs in other fields related to the analysis of complex data sets [18], surprisingly, there is no existing approach in trajectory analysis that looks at this structural information in a unified way across multiple tasks. In this thesis, we build upon these observations and give a unified treatment of structural information in order to improve trajectory analysis tasks. This treatment explores for the first time that sequences, graphs, and kernels are common to machine learning and geo-analytics. This common language allows us to pool the corresponding methods and knowledge to help address the challenges raised by the ever-growing amount of movement data by developing new analysis models and methods. This is illustrated in several ways. For example, we introduce new problem settings, distance functions and a visualization scheme in the area of trajectory analysis. We also connect the broad field of kernel methods to the analysis of trajectories, and we strengthen and revisit the link between biological sequence methods and the analysis of trajectories. Finally, the results of our experiments show that, by incorporating the structural information, our methods improve over the state of the art on the focused tasks, i.e., map matching, clustering and traffic event detection.
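    As one concrete example of structure-aware comparison of variable-length sequences, the sketch below computes a plain dynamic-time-warping distance between two trajectories. It is illustrative only and is not one of the specific distance functions or kernels introduced in the thesis; the coordinates are hypothetical.

```python
# Illustrative structure-aware sequence comparison: plain dynamic time
# warping between two variable-length trajectories (sequences of 2D points).
import numpy as np

def dtw_distance(A, B):
    """DTW distance between trajectories A (n, 2) and B (m, 2)."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])   # point-to-point cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

traj_a = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2], [3.0, 0.1]])
traj_b = np.array([[0.0, 0.1], [1.5, 0.2], [3.0, 0.0]])   # different length
print(round(dtw_distance(traj_a, traj_b), 3))
```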