Search CORE

17,494 research outputs found

Foundational principles for large scale inference: Illustrations through correlation mining

Author: Alfred O. Hero
Alfred O. Hero
Alfred O. Hero
Bala Rajaratnam
Bala Rajaratnam
Bala Rajaratnam
Publication venue
Publication date: 18/05/2015
Field of study

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number

n

of acquired samples (statistical replicates) is far fewer than the number

p

of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity however has received relatively less attention, especially in the setting when the sample size

n

is fixed, and the dimension

p

grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks

arXiv.org e-Print Archive

CiteSeerX

PubMed Central

eScholarship - University of California

Ranking and significance of variable-length similarity-based time series motifs

Author: Arcos Josep Lluis
Corral Álvaro
Serra Isabel
Serrà Joan
Publication venue: 'Elsevier BV'
Publication date: 06/03/2015
Field of study

The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their normalized dissimilarities. Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that lowest dissimilarities are particularly affected by this dependency. Moreover, we find that such dependencies are generally non-linear and change with the considered data set and dissimilarity measure. Based on these findings, we propose a solution to rank those motifs and measure their significance. This solution relies on a compact but accurate model of the dissimilarity space, using a beta distribution with three parameters that depend on the motif length in a non-linear way. We believe the incomparability of variable-length dissimilarities could go beyond the field of time series, and that similar modeling strategies as the one used here could be of help in a more broad context.Comment: 20 pages, 10 figure

arXiv.org e-Print Archive

Digital.CSIC

Segregating Event Streams and Noise with a Markov Renewal Process Model

Author: Plumbley MD
Stowell D
Publication venue
Publication date: 01/08/2013
Field of study

DS and MP are supported by EPSRC Leadership Fellowship EP/G007144/1

Queen Mary Research Online

Surrey Research Insight