Latent Dirichlet Allocation Uncovers Spectral Characteristics of Drought Stressed Plants
Understanding the adaptation process of plants to drought stress is essential
for improving management practices and breeding strategies, as well as for
engineering viable crops for sustainable agriculture in the coming decades.
Hyper-spectral imaging provides a particularly promising approach to gaining
such understanding, since it allows the non-destructive discovery of spectral
characteristics of plants governed primarily by the scattering and absorption
characteristics of the leaf's internal structure and biochemical constituents.
Several drought stress indices have been derived using hyper-spectral imaging.
However, they are typically based on only a few hyper-spectral images, rely on
expert interpretation, and consider only a few wavelengths. In this study,
we present the first data-driven approach to discovering spectral drought
stress indices, treating it as an unsupervised labeling problem at massive
scale. To exploit the short-range dependencies between spectral wavelengths, we
develop an online variational Bayes algorithm for latent Dirichlet allocation
with convolved Dirichlet regularizer. This approach scales to massive datasets
and, hence, provides a more objective complement to plant physiological
practices. The spectral topics found conform to plant physiological knowledge
and can be computed in a fraction of the time compared to existing LDA
approaches.
Comment: Appears in Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2012).
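The core idea of coupling neighbouring wavelengths can be sketched in a few lines. The function below is an illustrative simplification, not the paper's actual variational update: it smooths each topic's expected wavelength counts with a small convolution kernel before normalizing, so adjacent wavelengths share statistical strength. The function name, the kernel, and `eta` are all assumptions made for this sketch.

```python
import numpy as np

def smoothed_topic_update(topic_word_counts, eta=0.01, kernel=(0.25, 0.5, 0.25)):
    """Illustrative M-step-style update for topic-wavelength distributions:
    expected counts are convolved with a small kernel so neighbouring
    wavelengths share mass, mimicking the effect of a convolved Dirichlet
    regularizer (hypothetical simplification of the paper's update)."""
    k = np.asarray(kernel, dtype=float)
    k = k / k.sum()
    lam = np.empty_like(topic_word_counts, dtype=float)
    for t in range(topic_word_counts.shape[0]):
        # 'same'-mode convolution couples each wavelength with its neighbours
        lam[t] = eta + np.convolve(topic_word_counts[t], k, mode="same")
    # normalize each row to obtain topic distributions over wavelengths
    return lam / lam.sum(axis=1, keepdims=True)

counts = np.zeros((2, 8))
counts[0, 3] = 10.0      # a single spiky wavelength in topic 0
counts[1, [0, 7]] = 5.0  # two isolated peaks in topic 1
topics = smoothed_topic_update(counts)
```

After the update, the spike at wavelength 3 in topic 0 has leaked mass to wavelengths 2 and 4, which is exactly the short-range smoothing the regularizer is meant to encode.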
Bivariate Functional Archetypoid Analysis: An Application to Financial Time Series
Master's thesis, University Master's Degree in Computational Mathematics (2013 syllabus). Code: SIQ027. Academic year 2016-2017.
Archetype Analysis (AA) is a statistical technique that describes the individuals of a sample as a
convex combination of a certain number of elements called archetypes, which, in turn, are convex
combinations of the individuals in the sample. For its part, Archetypoid Analysis (ADA) tries
to represent each individual as a convex combination of a certain number of extreme subjects
called archetypoids. These techniques can be applied to functional data by applying a basis
expansion and performing AA or ADA on the weighted coefficients in the basis.
This document presents an application of Functional Archetypoid Analysis (FADA) to
financial time series. The starting time series consists of the daily equity prices of the S&P 500
stocks. From these, measures of volatility and profitability are generated in order to characterize
listed companies. These variables are converted into functional data through a Fourier basis
expansion function and bivariate FADA is applied. By representing subjects through extreme
cases, this analysis facilitates the understanding of both the composition and the relationships
between listed companies. Finally, a cluster methodology based on a similarity parameter is
presented. Therefore, the suitability of this technique for this kind of time series is shown, as
well as the robustness of the conclusions drawn.
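The recurring operation in AA/ADA is representing a point as a convex combination of a few extreme elements. A minimal sketch of that step, assuming the archetypoids are already chosen (the names and the projected-gradient solver are illustrative, not the thesis's actual algorithm):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex
    (sort-based rule, Duchi et al. style)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def archetypoid_weights(Z, x, steps=500, lr=0.1):
    """Projected-gradient sketch: find convex weights alpha so that
    alpha @ Z approximates x, with alpha >= 0 and sum(alpha) = 1."""
    k = Z.shape[0]
    alpha = np.full(k, 1.0 / k)
    for _ in range(steps):
        grad = 2.0 * Z @ (alpha @ Z - x)  # gradient of ||alpha @ Z - x||^2
        alpha = project_simplex(alpha - lr * grad)
    return alpha

Z = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # three extreme points
x = np.array([0.25, 0.25])                           # a point inside their hull
alpha = archetypoid_weights(Z, x)
# alpha approaches [0.5, 0.25, 0.25]: x as a mixture of the three archetypoids
```

Because the weights are non-negative and sum to one, each individual is read off directly as "so much of this extreme case, so much of that one", which is what makes the technique interpretable.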
Archetypal analysis for ordinal data
Archetypoid analysis (ADA) is an exploratory approach that explains a set of continuous observations as mixtures of pure (extreme) patterns. Those patterns (archetypoids) are actual observations of the sample, which makes the results of this technique easily interpretable, even for non-experts. Note that the observations are approximated as a convex combination of the archetypoids. Archetypoid analysis, in its current form, cannot be applied directly to ordinal data. We propose and describe a two-step method for applying ADA to ordinal responses based on the ordered stereotype model. One of the main advantages of this model is that it allows us to convert the ordinal data to numerical values, using a new data-driven spacing that better reflects the ordinal patterns of the data, and this numerical conversion then enables us to apply ADA straightforwardly. The results of the novel method are presented for two behavioural science applications. Finally, the proposed method is also compared with other unsupervised statistical learning methods.
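The conversion step can be sketched as follows. In the actual method the monotone category scores come from fitting the ordered stereotype model; here they are simply assumed, to show how a data-driven spacing differs from treating a Likert scale as equally spaced:

```python
import numpy as np

def ordinal_to_numeric(responses, phi):
    """Map ordinal categories 1..q to numeric values using monotone
    scores phi (phi_1 = 0 <= ... <= phi_q = 1).  In the proposed method
    the phi are estimated from the ordered stereotype model; here they
    are fixed by hand purely for illustration."""
    phi = np.asarray(phi, dtype=float)
    assert np.all(np.diff(phi) >= 0) and phi[0] == 0.0 and phi[-1] == 1.0
    return phi[np.asarray(responses) - 1]

# hypothetical fitted scores for a 5-point item: categories 2 and 3 sit
# close together, so equal spacing (0, .25, .5, .75, 1) would distort them
phi = [0.0, 0.15, 0.2, 0.7, 1.0]
numeric = ordinal_to_numeric([1, 3, 5, 2], phi)
```

Once every ordinal column is mapped through its scores, standard ADA for continuous data applies unchanged.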
Fast and Robust Recursive Algorithms for Separable Nonnegative Matrix Factorization
In this paper, we study the nonnegative matrix factorization problem under
the separability assumption (that is, there exists a cone spanned by a small
subset of the columns of the input nonnegative data matrix containing all
columns), which is equivalent to the hyperspectral unmixing problem under the
linear mixing model and the pure-pixel assumption. We present a family of fast
recursive algorithms, and prove they are robust under any small perturbations
of the input data matrix. This family generalizes several existing
hyperspectral unmixing algorithms and hence provides for the first time a
theoretical justification of their better practical performance.
Comment: 30 pages, 2 figures, 7 tables. Main change: improvement of the bound of the main theorem (Th. 3), replacing r with sqrt(r).
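A representative member of this family of recursive algorithms is the successive projection scheme: pick the column with the largest residual norm, project all columns onto the orthogonal complement of the chosen one, and repeat. The sketch below illustrates that recursion on a small separable matrix; it is a generic illustration of the scheme, not a verbatim transcription of the paper's algorithms.

```python
import numpy as np

def spa(M, r):
    """Successive-projection sketch for separable NMF: r rounds of
    (1) select the column with maximum l2 norm of the residual,
    (2) project the residual onto the orthogonal complement of it."""
    R = M.astype(float).copy()
    idx = []
    for _ in range(r):
        j = int(np.argmax((R * R).sum(axis=0)))  # column with max residual norm
        idx.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)               # project out direction u
    return idx

# separable example: columns 0 and 3 of M are the 'pure' columns and every
# other column is a convex combination of them (pure-pixel assumption)
W = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
H = np.array([[1.0, 0.5, 0.2, 0.0], [0.0, 0.5, 0.8, 1.0]])
M = W @ H
picked = spa(M, 2)   # recovers the indices of the two pure columns
```

On this example the recursion selects exactly the two generating columns, which is the behaviour the robustness analysis certifies under small perturbations.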
Searching for archetypes in the didactic relationship between proportional reasoning and other mathematical content
INNTED International Congress on Educational Innovation and Trends 202
SAGA: Sparse And Geometry-Aware non-negative matrix factorization through non-linear local embedding
This paper presents a new non-negative matrix factorization technique which (1) allows the decomposition of the original data into multiple latent factors accounting for the geometrical structure of the manifold embedding the data; (2) provides an optimal representation with a controllable level of sparsity; (3) has overall linear complexity, allowing large and high-dimensional datasets to be handled in tractable time. It operates by coding the data with respect to local neighbors with non-linear weights. This locality is obtained as a consequence of the simultaneous sparsity and convexity constraints. Our method is demonstrated over several experiments, including a feature extraction and classification task, where it achieves better performance than state-of-the-art factorization methods, with a shorter computational time.
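The "sparse, convex, local" coding idea can be sketched independently of the full factorization. The inverse-distance weighting below is a simple stand-in for SAGA's non-linear local weights (the function name and weighting rule are assumptions of this sketch, not the paper's formulation); it shows how restricting the code to the k nearest atoms yields sparsity while the normalized non-negative weights yield convexity.

```python
import numpy as np

def local_convex_code(x, D, k=2):
    """Sparse, geometry-aware coding sketch: represent x using only its
    k nearest dictionary atoms (rows of D), with non-negative weights
    that sum to one.  Inverse-distance weighting is an illustrative
    stand-in for SAGA's non-linear local weights; k controls sparsity."""
    d = np.linalg.norm(D - x, axis=1)
    nn = np.argsort(d)[:k]          # indices of the k nearest atoms
    w = 1.0 / (d[nn] + 1e-12)       # non-linear (inverse-distance) weights
    code = np.zeros(D.shape[0])
    code[nn] = w / w.sum()          # convex: non-negative, sums to one
    return code

D = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
code = local_convex_code(np.array([0.6, 0.1]), D, k=2)
# only the two nearby atoms get non-zero weight; the distant atom gets none
```

The code vector has exactly k non-zeros by construction, so the sparsity level is directly controllable, as in point (2) of the abstract.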
Game analytics - maximizing the value of player data
During the Information Age, technological advances in computers,
satellites, data transfer, optics, and digital storage have led to the collection of an
immense mass of data on everything from business to astronomy, relying on the
power of digital computing to sort through the amalgam of information and generate meaning from the data. Initially, in the 1970s and 1980s,
data were stored in disparate structures and very rapidly became overwhelming. The
initial chaos led to the creation of structured databases and database management
systems to assist with the management of large corpora of data and, notably, the
effective and efficient retrieval of information from databases. The rise of the database management system increased the already rapid pace of information
gathering.
Analysis of Trajectories by Preserving Structural Information
The analysis of trajectories from traffic data is an established and yet fast-growing area of research in the related fields of geo-analytics and Geographic Information Systems (GIS). It has a broad range of applications that impact the lives of millions of people, e.g., in urban planning, transportation and navigation systems, and localized search methods. Most of these applications share some underlying basic tasks related to matching, clustering, and classification of trajectories. These tasks in turn share some underlying problems, i.e., dealing with noisy, variable-length spatio-temporal sequences in the wild. In our view, these problems can be handled better by exploiting the spatio-temporal relationships (or structural information) in sampled trajectory points, which remain largely unharmed during the measurement process. Although the usage of such structural information has allowed breakthroughs in other fields related to the analysis of complex data sets [18], surprisingly, there is no existing approach in trajectory analysis that looks at this structural information in a unified way across multiple tasks. In this thesis, we build upon these observations and give a unified treatment of structural information in order to improve trajectory analysis tasks. This treatment shows for the first time that sequences, graphs, and kernels are common to machine learning and geo-analytics. This common language allows us to pool the corresponding methods and knowledge to help solve the challenges raised by the ever-growing amount of movement data by developing new analysis models and methods. This is illustrated in several ways. For example, we introduce new problem settings, distance functions, and a visualization scheme in the area of trajectory analysis.
We also connect the broad field of kernel methods to the analysis of trajectories, and we strengthen and revisit the link between biological sequence methods and the analysis of trajectories. Finally, the results of our experiments show that, by incorporating the structural information, our methods improve over the state of the art in the focused tasks, i.e., map matching, clustering, and traffic event detection.
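The link to biological sequence methods can be illustrated with a spectrum-style kernel. The sketch below assumes trajectories have already been discretized into symbol sequences (e.g., grid-cell IDs) and counts shared contiguous k-grams; it is a generic example of the kernel family the thesis draws on, not its specific kernels.

```python
from collections import Counter

def kgram_kernel(traj_a, traj_b, k=2):
    """Spectrum-style kernel sketch for discretized trajectories:
    counts shared contiguous k-grams, so it preserves local order
    (structural) information that pointwise set distances discard."""
    def grams(s):
        return Counter(tuple(s[i:i + k]) for i in range(len(s) - k + 1))
    ga, gb = grams(traj_a), grams(traj_b)
    return sum(ga[g] * gb[g] for g in ga)

a = ["A", "B", "C", "D"]
b = ["A", "B", "C", "E"]   # shares the prefix structure of a
c = ["D", "C", "B", "A"]   # same cells as a, but reversed order
# kgram_kernel(a, b) > kgram_kernel(a, c): order matters, unlike a bag of cells
```

A bag-of-cells comparison would consider `a` and `c` identical; the k-gram kernel distinguishes them because it looks at the sequence structure, which is the point of the structural-information treatment.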