Uniform convergence of Vapnik--Chervonenkis classes under ergodic sampling
We show that if $\mathcal{X}$ is a complete separable metric space and
$\mathcal{C}$ is a countable family of Borel subsets of $\mathcal{X}$ with
finite VC dimension, then, for every stationary ergodic process with values in
$\mathcal{X}$, the relative frequencies of sets $C \in \mathcal{C}$ converge
uniformly to their limiting probabilities. Beyond ergodicity, no assumptions
are imposed on the sampling process, and no regularity conditions are imposed
on the elements of $\mathcal{C}$. The result extends existing work of Vapnik
and Chervonenkis, among others, who have studied uniform convergence for i.i.d.
and strongly mixing processes. Our method of proof is new and direct: it does
not rely on symmetrization techniques, probability inequalities or mixing
conditions. The uniform convergence of relative frequencies for VC-major and
VC-graph classes of functions under ergodic sampling is established as a
corollary of the basic result for sets.
Comment: Published at http://dx.doi.org/10.1214/09-AOP511 in the Annals of
Probability (http://www.imstat.org/aop/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
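For orientation, the main claim of the abstract above can be written as a single display. This is a sketch under assumed notation, none of which appears in the listing: $X_1, X_2, \ldots$ denotes the stationary ergodic process, $\mathcal{C}$ the countable family of sets with finite VC dimension, and $1_C$ the indicator of $C$. The abstract does not name the mode of convergence; almost sure convergence is assumed here.

```latex
% Uniform convergence of relative frequencies over the class \mathcal{C}
% (assumed notation; almost sure convergence is an assumption of this sketch)
\[
  \sup_{C \in \mathcal{C}}
  \left| \frac{1}{n} \sum_{i=1}^{n} 1_C(X_i) - \mathbb{P}(X_1 \in C) \right|
  \longrightarrow 0
  \quad \text{almost surely as } n \to \infty .
\]
```

For a single fixed set $C$ this is the ergodic theorem; the content of the result is that the supremum over the whole family also vanishes.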
Denoising Deterministic Time Series
This paper is concerned with the problem of recovering a finite,
deterministic time series from observations that are corrupted by additive,
independent noise. A distinctive feature of this problem is that the available
data exhibit long-range dependence and, as a consequence, existing statistical
theory and methods are not readily applicable. This paper gives an analysis of
the denoising problem that extends recent work of Lalley, but begins from first
principles. Both positive and negative results are established. The positive
results show that denoising is possible under somewhat restrictive conditions
on the additive noise. The negative results show that, under more general
conditions on the noise, no procedure can recover the underlying deterministic
series.
Significance-based community detection in weighted networks
Community detection is the process of grouping strongly connected nodes in a
network. Many community detection methods for unweighted networks have a
theoretical basis in a null model. Communities discovered by these methods
therefore have interpretations in terms of statistical significance. In this
paper, we introduce a null model for weighted networks called the continuous
configuration model. We use the model both as a tool for community detection
and for simulating weighted networks with null nodes. First, we propose a
community extraction algorithm for weighted networks which incorporates
iterative hypothesis testing under the null. We prove a central limit theorem
for edge-weight sums and asymptotic consistency of the algorithm under a
weighted stochastic block model. We then incorporate the algorithm in a
community detection method called CCME. To benchmark the method, we provide a
simulation framework incorporating the null to plant "background" nodes in
weighted networks with communities. We show that the empirical performance of
CCME on these simulations is competitive with existing methods, particularly
when overlapping communities and background nodes are present. To further
validate the method, we present two real-world networks with potential
background nodes and analyze them with CCME, yielding results that reveal
macro-features of the corresponding systems.
Comment: Code and supplemental info available at
http://stats.johnpalowitch.com/ccme. V3 changes: based on a lengthy referee
revision process, new theoretical sections added and major organizational
changes made. V2 changes: grant info added, 1 reference added, bibliography
section moved to end, condensed bib line spacing, corrected typo