103 research outputs found

    Uniform convergence of Vapnik--Chervonenkis classes under ergodic sampling

    We show that if $\mathcal{X}$ is a complete separable metric space and $\mathcal{C}$ is a countable family of Borel subsets of $\mathcal{X}$ with finite VC dimension, then, for every stationary ergodic process with values in $\mathcal{X}$, the relative frequencies of sets $C\in\mathcal{C}$ converge uniformly to their limiting probabilities. Beyond ergodicity, no assumptions are imposed on the sampling process, and no regularity conditions are imposed on the elements of $\mathcal{C}$. The result extends existing work of Vapnik and Chervonenkis, among others, who have studied uniform convergence for i.i.d. and strongly mixing processes. Our method of proof is new and direct: it does not rely on symmetrization techniques, probability inequalities or mixing conditions. The uniform convergence of relative frequencies for VC-major and VC-graph classes of functions under ergodic sampling is established as a corollary of the basic result for sets. Comment: Published at http://dx.doi.org/10.1214/09-AOP511 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org
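
    As a reading aid, the uniform convergence claim in the abstract above can be written out as a formula. This is a sketch under assumed notation that does not appear in the listing: $X_1, X_2, \dots$ denotes the stationary ergodic process, $\mu$ the distribution of $X_1$, and the limit is taken with probability one.

    ```latex
    % Assumed notation (not from the listing): X_i is the process, \mu the law of X_1.
    \[
      \sup_{C \in \mathcal{C}}
      \left| \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \in C\} - \mu(C) \right|
      \;\longrightarrow\; 0
      \quad \text{as } n \to \infty .
    \]
    ```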

    Denoising Deterministic Time Series

    This paper is concerned with the problem of recovering a finite, deterministic time series from observations that are corrupted by additive, independent noise. A distinctive feature of this problem is that the available data exhibit long-range dependence and, as a consequence, existing statistical theory and methods are not readily applicable. This paper gives an analysis of the denoising problem that extends recent work of Lalley, but begins from first principles. Both positive and negative results are established. The positive results show that denoising is possible under somewhat restrictive conditions on the additive noise. The negative results show that, under more general conditions on the noise, no procedure can recover the underlying deterministic series.

    Significance-based community detection in weighted networks

    Community detection is the process of grouping strongly connected nodes in a network. Many community detection methods for unweighted networks have a theoretical basis in a null model. Communities discovered by these methods therefore have interpretations in terms of statistical significance. In this paper, we introduce a null model for weighted networks called the continuous configuration model. We use the model both as a tool for community detection and for simulating weighted networks with null nodes. First, we propose a community extraction algorithm for weighted networks which incorporates iterative hypothesis testing under the null. We prove a central limit theorem for edge-weight sums and asymptotic consistency of the algorithm under a weighted stochastic block model. We then incorporate the algorithm in a community detection method called CCME. To benchmark the method, we provide a simulation framework incorporating the null to plant "background" nodes in weighted networks with communities. We show that the empirical performance of CCME on these simulations is competitive with existing methods, particularly when overlapping communities and background nodes are present. To further validate the method, we present two real-world networks with potential background nodes and analyze them with CCME, yielding results that reveal macro-features of the corresponding systems. Comment: Code and supplemental info available at http://stats.johnpalowitch.com/ccme. V3 changes: based on lengthy referee revision process, new theoretical sections added, plus major organizational changes. V2 changes: grant info added, 1 reference added, bibliography section moved to end, condensed bib line spacing, corrected typo