6,277 research outputs found
Ranking and significance of variable-length similarity-based time series motifs
The detection of very similar patterns in a time series, commonly called
motifs, has received continuous and increasing attention from diverse
scientific communities. In particular, recent approaches for discovering
similar motifs of different lengths have been proposed. In this work, we show
that such variable-length similarity-based motifs cannot be directly compared,
and hence ranked, by their normalized dissimilarities. Specifically, we find
that length-normalized motif dissimilarities still have intrinsic dependencies
on the motif length, and that lowest dissimilarities are particularly affected
by this dependency. Moreover, we find that such dependencies are generally
non-linear and change with the considered data set and dissimilarity measure.
Based on these findings, we propose a solution to rank those motifs and measure
their significance. This solution relies on a compact but accurate model of the
dissimilarity space, using a beta distribution with three parameters that
depend on the motif length in a non-linear way. We believe the incomparability
of variable-length dissimilarities could go beyond the field of time series,
and that similar modeling strategies as the one used here could be of help in a
more broad context.Comment: 20 pages, 10 figure
K2-ABC: Approximate Bayesian Computation with Kernel Embeddings
Complicated generative models often result in a situation where computing the
likelihood of observed data is intractable, while simulating from the
conditional density given a parameter value is relatively easy. Approximate
Bayesian Computation (ABC) is a paradigm that enables simulation-based
posterior inference in such cases by measuring the similarity between simulated
and observed data in terms of a chosen set of summary statistics. However,
there is no general rule to construct sufficient summary statistics for complex
models. Insufficient summary statistics will "leak" information, which leads to
ABC algorithms yielding samples from an incorrect (partial) posterior. In this
paper, we propose a fully nonparametric ABC paradigm which circumvents the need
for manually selecting summary statistics. Our approach, K2-ABC, uses maximum
mean discrepancy (MMD) as a dissimilarity measure between the distributions
over observed and simulated data. MMD is easily estimated as the squared
difference between their empirical kernel embeddings. Experiments on a
simulated scenario and a real-world biological problem illustrate the
effectiveness of the proposed algorithm
Convergence and Divergence among Technology Clubs
The paper investigates cross-country differences in technology in a large sample of developed and developing economies over the 1990s. The empirical analysis indicates the existence of three technology clubs with markedly different levels of technological development: advanced, followers and marginalized countries. The technology clubs also differ with respect to their dynamics over the 1990s. While the club of followers is characterized by a process of gradual convergence towards the technological frontier, the group of marginalized has experienced an increase in its gap in terms of innovative capabilities.Growth and development; technological change; convergence clubs; polarization
Neural activity classification with machine learning models trained on interspike interval series data
The flow of information through the brain is reflected by the activity
patterns of neural cells. Indeed, these firing patterns are widely used as
input data to predictive models that relate stimuli and animal behavior to the
activity of a population of neurons. However, relatively little attention was
paid to single neuron spike trains as predictors of cell or network properties
in the brain. In this work, we introduce an approach to neuronal spike train
data mining which enables effective classification and clustering of neuron
types and network activity states based on single-cell spiking patterns. This
approach is centered around applying state-of-the-art time series
classification/clustering methods to sequences of interspike intervals recorded
from single neurons. We demonstrate good performance of these methods in tasks
involving classification of neuron type (e.g. excitatory vs. inhibitory cells)
and/or neural circuit activity state (e.g. awake vs. REM sleep vs. nonREM sleep
states) on an open-access cortical spiking activity dataset
Multivariate dynamic kernels for financial time series forecasting
The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44781-0_40We propose a forecasting procedure based on multivariate dynamic kernels, with the capability of integrating information measured at different frequencies and at irregular time intervals in financial markets. A data compression process redefines the original financial time series into temporal data blocks, analyzing the temporal information of multiple time intervals. The analysis is done through multivariate dynamic kernels within support vector regression. We also propose two kernels for financial time series that are computationally efficient without a sacrifice on accuracy. The efficacy of the methodology is demonstrated by empirical experiments on forecasting the challenging S&P500 market.Peer ReviewedPostprint (author's final draft
- âŠ