
    Ranking and significance of variable-length similarity-based time series motifs

    The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their normalized dissimilarities. Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that the lowest dissimilarities are particularly affected by this dependency. Moreover, we find that such dependencies are generally non-linear and change with the considered data set and dissimilarity measure. Based on these findings, we propose a solution to rank those motifs and measure their significance. This solution relies on a compact but accurate model of the dissimilarity space, using a beta distribution with three parameters that depend on the motif length in a non-linear way. We believe the incomparability of variable-length dissimilarities could go beyond the field of time series, and that modeling strategies similar to the one used here could be of help in a broader context. Comment: 20 pages, 10 figures

    K2-ABC: Approximate Bayesian Computation with Kernel Embeddings

    Complicated generative models often result in a situation where computing the likelihood of observed data is intractable, while simulating from the conditional density given a parameter value is relatively easy. Approximate Bayesian Computation (ABC) is a paradigm that enables simulation-based posterior inference in such cases by measuring the similarity between simulated and observed data in terms of a chosen set of summary statistics. However, there is no general rule for constructing sufficient summary statistics for complex models. Insufficient summary statistics will "leak" information, which leads to ABC algorithms yielding samples from an incorrect (partial) posterior. In this paper, we propose a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics. Our approach, K2-ABC, uses maximum mean discrepancy (MMD) as a dissimilarity measure between the distributions over observed and simulated data. MMD is easily estimated as the squared difference between their empirical kernel embeddings. Experiments on a simulated scenario and a real-world biological problem illustrate the effectiveness of the proposed algorithm.
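
The MMD estimate mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the Gaussian kernel, the scalar data, the `bandwidth` and `epsilon` values, and the function names are all assumptions made for illustration. The final weighting step (an exponential of the negative squared MMD) is likewise an assumed way of turning the dissimilarity into a soft ABC weight; the abstract itself does not specify it.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars (kernel choice is an assumption)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples.

    Equals the squared RKHS distance between the two empirical kernel
    mean embeddings: mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    """
    n, m = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / (n * n)
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / (m * m)
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2.0 * k_xy


def soft_abc_weight(observed, simulated, epsilon=0.1, bandwidth=1.0):
    """Hypothetical soft weight for one simulated data set (assumed form)."""
    return math.exp(-mmd_squared(observed, simulated, bandwidth) / epsilon)


if __name__ == "__main__":
    rng = random.Random(0)
    observed = [rng.gauss(0.0, 1.0) for _ in range(100)]
    close = [rng.gauss(0.0, 1.0) for _ in range(100)]  # same distribution
    far = [rng.gauss(3.0, 1.0) for _ in range(100)]    # shifted mean
    print("weight (same distribution):", soft_abc_weight(observed, close))
    print("weight (shifted mean):     ", soft_abc_weight(observed, far))
```

Because the comparison is made on whole samples through the kernel, no hand-picked summary statistics enter the computation; the kernel and its bandwidth take over that role.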

    Convergence and Divergence among Technology Clubs

    The paper investigates cross-country differences in technology in a large sample of developed and developing economies over the 1990s. The empirical analysis indicates the existence of three technology clubs with markedly different levels of technological development: advanced, followers and marginalized countries. The technology clubs also differ with respect to their dynamics over the 1990s. While the club of followers is characterized by a process of gradual convergence towards the technological frontier, the group of marginalized countries has experienced an increase in its gap in terms of innovative capabilities.
    Keywords: Growth and development; technological change; convergence clubs; polarization

    Neural activity classification with machine learning models trained on interspike interval series data

    The flow of information through the brain is reflected by the activity patterns of neural cells. Indeed, these firing patterns are widely used as input data to predictive models that relate stimuli and animal behavior to the activity of a population of neurons. However, relatively little attention has been paid to single-neuron spike trains as predictors of cell or network properties in the brain. In this work, we introduce an approach to neuronal spike train data mining which enables effective classification and clustering of neuron types and network activity states based on single-cell spiking patterns. This approach is centered around applying state-of-the-art time series classification/clustering methods to sequences of interspike intervals recorded from single neurons. We demonstrate good performance of these methods in tasks involving classification of neuron type (e.g. excitatory vs. inhibitory cells) and/or neural circuit activity state (e.g. awake vs. REM sleep vs. non-REM sleep states) on an open-access cortical spiking activity dataset.
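
As a rough illustration of the input representation described above, the sketch below converts a list of spike times into an interspike interval (ISI) series and computes one simple summary of it. It covers only the preprocessing step, not the classification or clustering methods the abstract refers to. The function names, the assumption that spike times are given in seconds, and the choice of the coefficient of variation as a feature are all hypothetical.

```python
import statistics


def interspike_intervals(spike_times):
    """Return the ISI series: differences between consecutive spike times.

    Spike times are assumed to be in seconds; they are sorted first so that
    unordered input still yields non-negative intervals.
    """
    times = sorted(spike_times)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def isi_coefficient_of_variation(intervals):
    """Population standard deviation of the ISIs divided by their mean.

    An illustrative regularity feature only; it is not taken from the paper.
    """
    return statistics.pstdev(intervals) / statistics.fmean(intervals)


if __name__ == "__main__":
    spikes = [0.0, 0.5, 1.5, 1.75, 3.0]  # made-up spike times in seconds
    isis = interspike_intervals(spikes)
    print("ISI series:", isis)
    print("CV of ISIs:", isi_coefficient_of_variation(isis))
```

The resulting ISI series is an ordinary univariate time series, which is what makes general-purpose time series classifiers applicable to single-neuron recordings.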

    Multivariate dynamic kernels for financial time series forecasting

    The final publication is available at http://link.springer.com/chapter/10.1007/978-3-319-44781-0_40
    We propose a forecasting procedure based on multivariate dynamic kernels, with the capability of integrating information measured at different frequencies and at irregular time intervals in financial markets. A data compression process redefines the original financial time series into temporal data blocks, analyzing the temporal information of multiple time intervals. The analysis is done through multivariate dynamic kernels within support vector regression. We also propose two kernels for financial time series that are computationally efficient without sacrificing accuracy. The efficacy of the methodology is demonstrated by empirical experiments on forecasting the challenging S&P500 market.
    Peer Reviewed. Postprint (author's final draft)