Adaptive Threshold Sampling and Estimation
Sampling is a fundamental problem in both computer science and statistics. A
number of issues arise when designing a method based on sampling. These include
statistical considerations such as constructing a good sampling design and
ensuring there are good, tractable estimators for the quantities of interest as
well as computational considerations such as designing fast algorithms for
streaming data and ensuring the sample fits within memory constraints.
Unfortunately, existing sampling methods are only able to address all of these
issues in limited scenarios.
We develop a framework that can be used to address these issues in a broad
range of scenarios. In particular, it addresses the problem of drawing and
using samples under some memory budget constraint. This problem can be
challenging since the memory budget forces samples to be drawn
non-independently and consequently, makes computation of resulting estimators
difficult.
At the core of the framework is the notion of a data adaptive thresholding
scheme where the threshold effectively allows one to treat the non-independent
sample as if it were drawn independently. We provide sufficient conditions for
a thresholding scheme to allow this and provide ways to build and compose such
schemes.
Furthermore, we provide fast algorithms to sample efficiently under these
thresholding schemes.
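A classical instance of such a threshold-based sampling scheme is priority sampling (Duffield, Lund and Thorup), sketched below as a minimal illustration of the idea; the framework above is more general, and the function names here are our own, not the paper's:

```python
import random

def priority_sample(weights, k, seed=0):
    """Priority sampling: keep the k items with the largest priorities.

    Each item i receives priority q_i = w_i / u_i with u_i ~ Uniform(0, 1);
    the threshold tau is the (k+1)-th largest priority, and max(w_i, tau)
    is an unbiased estimator of each retained item's weight.
    """
    rng = random.Random(seed)
    prio = sorted(((w / rng.random(), i, w) for i, w in enumerate(weights)),
                  reverse=True)
    tau = prio[k][0] if len(prio) > k else 0.0
    return {i: max(w, tau) for _, i, w in prio[:k]}, tau
```

Items whose weight exceeds the threshold are kept with their exact weight, while smaller items are inflated to tau to compensate for their inclusion probability; the threshold is what lets the non-independently drawn sample be treated as if its items were retained independently.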
Multiscale likelihood analysis and complexity penalized estimation
We describe here a framework for a certain class of multiscale likelihood
factorizations wherein, in analogy to a wavelet decomposition of an L^2
function, a given likelihood function has an alternative representation as a
product of conditional densities reflecting information in both the data and
the parameter vector localized in position and scale. The framework is
developed as a set of sufficient conditions for the existence of such
factorizations, formulated in analogy to those underlying a standard
multiresolution analysis for wavelets, and hence can be viewed as a
multiresolution analysis for likelihoods. We then consider the use of these
factorizations in the task of nonparametric, complexity penalized likelihood
estimation. We study the risk properties of certain thresholding and
partitioning estimators, and demonstrate their adaptivity and near-optimality,
in a minimax sense over a broad range of function spaces, based on squared
Hellinger distance as a loss function. In particular, our results provide an
illustration of how properties of classical wavelet-based estimators can be
obtained in a single, unified framework that includes models for continuous,
count, and categorical data types.
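The canonical example of such a factorization is the Poisson case, where the joint likelihood of independent counts factorizes exactly into a Poisson term for the grand total and binomial terms for recursive dyadic splits, each localized in position and scale. A minimal numerical check of this identity (helper names are our own, not the paper's):

```python
import math

def pois_logpmf(x, lam):
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def binom_logpmf(k, n, p):
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def direct_loglik(xs, lams):
    # ordinary product of independent Poisson likelihoods
    return sum(pois_logpmf(x, l) for x, l in zip(xs, lams))

def multiscale_loglik(xs, lams):
    # coarsest scale: the grand total is Poisson with the total intensity;
    # finer scales: each parent count splits binomially between its children
    def splits(xs, lams):
        if len(xs) == 1:
            return 0.0
        h = len(xs) // 2
        xl, laml = xs[:h], lams[:h]
        xr, lamr = xs[h:], lams[h:]
        d = binom_logpmf(sum(xl), sum(xs), sum(laml) / sum(lams))
        return d + splits(xl, laml) + splits(xr, lamr)
    return pois_logpmf(sum(xs), sum(lams)) + splits(xs, lams)
```

The two log-likelihoods agree exactly, which is the discrete analogue of the wavelet-style decomposition described above: the binomial factors carry the fine-scale information while the Poisson total carries the coarse-scale information.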
A model of Poissonian interactions and detection of dependence
This paper proposes a model of interactions between two point processes,
ruled by a reproduction function h, which is considered as the intensity of a
Poisson process. In particular, we focus on the context of neurosciences to
detect possible interactions in the cerebral activity associated with two
neurons. To provide a mathematical answer to this specific problem of
neurobiologists, we thus address the question of testing the nullity of the
intensity h. We construct a multiple testing procedure obtained by the
aggregation of single tests based on a wavelet thresholding method. This test
has good theoretical properties: it is possible to guarantee the level but also
the power under some assumptions and its uniform separation rate over weak
Besov bodies is adaptive minimax. Simulations are then provided, showing the
good practical behavior of our testing procedure.
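As a loose illustration of aggregating single tests based on thresholded wavelet coefficients (this is not the paper's calibrated procedure; the Gaussian-approximation threshold below is a crude stand-in for its critical values, and all names are our own):

```python
import math

def haar_details(counts):
    """All Haar detail coefficients of a count vector (length a power of two)."""
    details, level = [], list(counts)
    while len(level) > 1:
        details += [(level[2 * i] - level[2 * i + 1]) / math.sqrt(2)
                    for i in range(len(level) // 2)]
        level = [(level[2 * i] + level[2 * i + 1]) / math.sqrt(2)
                 for i in range(len(level) // 2)]
    return details

def aggregated_nullity_test(counts, rate, alpha=0.05):
    """Reject h == 0 if any Haar detail coefficient of the binned delay counts
    exceeds a Bonferroni-style threshold; under H0 each bin is roughly
    Poisson(rate), so the coefficients have variance about `rate`."""
    d = haar_details(counts)
    z = math.sqrt(2 * math.log(2 * len(d) / alpha))
    return any(abs(c) > z * math.sqrt(rate) for c in d)
```

A flat histogram of interaction delays yields vanishing detail coefficients and no rejection, while a sharp peak (a genuine interaction between the two spike trains) produces a large coefficient at some scale and triggers the aggregated test.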
On adaptive wavelet estimation of a class of weighted densities
We investigate the estimation of a weighted density taking the form
g = w(F)f, where f denotes an unknown density, F the associated
distribution function and w is a known (non-negative) weight. Such a class
encompasses many examples, including those arising in order statistics or when
w is related to the maximum or the minimum of a (random or fixed) number of
independent and identically distributed (i.i.d.) random variables. We here
construct a new adaptive non-parametric estimator for g based on a plug-in
approach and the wavelets methodology. For a wide class of models, we prove
that it attains fast rates of convergence under the L^p risk with p ≥ 1
(not only for p = 2, corresponding to the mean integrated squared
error) over Besov balls. The theoretical findings are illustrated through
several simulations.
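The plug-in idea can be sketched schematically, substituting a simple histogram for the wavelet density estimator (all names below are illustrative, not from the paper): estimate f with any density estimator, F with the empirical distribution function, and combine them as w(F_hat) * f_hat.

```python
import bisect

def empirical_cdf(data):
    s = sorted(data)
    # fraction of observations <= x
    return lambda x: bisect.bisect_right(s, x) / len(s)

def histogram_density(data, bins=10, lo=0.0, hi=1.0):
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return lambda x: counts[min(int((x - lo) / width), bins - 1)] / (len(data) * width)

def plug_in_weighted_density(data, w, **kw):
    """Plug-in estimate of g = w(F)f from a sample of f."""
    F = empirical_cdf(data)
    f = histogram_density(data, **kw)
    return lambda x: w(F(x)) * f(x)
```

For example, with w(u) = 2u (the weight giving the density of the maximum of two i.i.d. variables), the estimate integrates to roughly one and is increasing for data drawn from a uniform density, as the exact g(x) = 2x would be.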
Super-resolution community detection for layer-aggregated multilayer networks
Applied network science often involves preprocessing network data before
applying a network-analysis method, and there is typically a theoretical
disconnect between these steps. For example, it is common to aggregate
time-varying network data into windows prior to analysis, and the tradeoffs of
this preprocessing are not well understood. Focusing on the problem of
detecting small communities in multilayer networks, we study the effects of
layer aggregation by developing random-matrix theory for modularity matrices
associated with layer-aggregated networks with N nodes and L layers, which
are drawn from an ensemble of Erd\H{o}s-R\'enyi networks. We study phase
transitions in which eigenvectors localize onto communities (allowing their
detection) and which occur for a given community provided its size surpasses a
detectability limit K^*. When layers are aggregated via a summation, we
obtain K^* = O(\sqrt{NL}/T), where T is the number of layers across which
the community persists. Interestingly, if T is allowed to vary with L, then
summation-based layer aggregation enhances small-community detection even if
the community persists across a vanishing fraction of layers, provided that
T/L decays more slowly than O(L^{-1/2}). Moreover, we find that thresholding
the summation can in some cases cause K^* to decay exponentially, decreasing
by orders of magnitude in a phenomenon we call super-resolution community
detection. That is, layer aggregation with thresholding is a nonlinear data
filter enabling detection of communities that are otherwise too small to
detect. Importantly, different thresholds generally enhance the detectability
of communities having different properties, illustrating that community
detection can be obscured if one analyzes network data using a single
threshold.
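A toy illustration of the two aggregation schemes discussed above, summation versus summation followed by thresholding (helper names are our own; real analyses operate on large Erd\H{o}s-R\'enyi ensembles rather than hand-built examples):

```python
def aggregate_sum(layers):
    """Summation aggregation: entry (i, j) counts the layers containing edge (i, j)."""
    n = len(layers[0])
    return [[sum(A[i][j] for A in layers) for j in range(n)] for i in range(n)]

def threshold_edges(agg, t):
    """Nonlinear filter: keep edge (i, j) iff it occurs in more than t layers."""
    return [[1 if v > t else 0 for v in row] for row in agg]
```

With three layers in which edge (0, 1) persists across all layers while edge (1, 2) appears only once, thresholding at t = 1 retains the persistent edge and suppresses the transient one, which is the filtering effect that makes otherwise-undetectable small communities visible.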
Automatic Kalman-filter-based wavelet shrinkage denoising of 1D stellar spectra
We propose a non-parametric method to denoise 1D stellar spectra based on wavelet shrinkage followed by adaptive Kalman thresholding. Wavelet shrinkage denoising involves applying the discrete wavelet transform (DWT) to the input signal, 'shrinking' certain frequency components in the transform domain, and then applying the inverse DWT to the reduced components. The performance of this procedure is influenced by the choice of base wavelet, the number of decomposition levels, and the thresholding function. Typically, these parameters are chosen by 'trial and error', which can be strongly dependent on the properties of the data being denoised. We here introduce an adaptive Kalman-filter-based thresholding method that eliminates the need for choosing the number of decomposition levels. We use the 'Haar' wavelet basis, which we found to provide excellent filtering for 1D stellar spectra at a low computational cost. We introduce various levels of Poisson noise into synthetic PHOENIX spectra, and test the performance of several common denoising methods against our own. It proves superior in terms of noise suppression and peak shape preservation. We expect it may also be of use in automatically and accurately filtering low signal-to-noise galaxy and quasar spectra obtained from surveys such as SDSS, Gaia, LSST, PESSTO, VANDELS, LEGA-C, and DESI.
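The generic wavelet-shrinkage pipeline described above (DWT, shrink, inverse DWT) can be sketched with a Haar basis as follows; the fixed soft threshold here is only a placeholder for the paper's adaptive Kalman-filter-based thresholding, and the function names are our own:

```python
import math

def haar_dwt(x):
    """Full multilevel Haar DWT (len(x) must be a power of two)."""
    coeffs, approx = [], list(x)
    while len(approx) > 1:
        coeffs.append([(approx[2*i] - approx[2*i + 1]) / math.sqrt(2)
                       for i in range(len(approx) // 2)])
        approx = [(approx[2*i] + approx[2*i + 1]) / math.sqrt(2)
                  for i in range(len(approx) // 2)]
    return approx[0], coeffs  # coeffs[0] holds the finest-scale details

def haar_idwt(a0, coeffs):
    approx = [a0]
    for details in reversed(coeffs):  # coarsest scale first
        approx = [v for a, d in zip(approx, details)
                  for v in ((a + d) / math.sqrt(2), (a - d) / math.sqrt(2))]
    return approx

def soft(d, t):
    """Soft thresholding: shrink toward zero by t."""
    return math.copysign(max(abs(d) - t, 0.0), d)

def denoise(x, t):
    a0, coeffs = haar_dwt(x)
    return haar_idwt(a0, [[soft(d, t) for d in lvl] for lvl in coeffs])
```

With a zero threshold the transform is perfectly inverted, and with a very large threshold every detail coefficient is shrunk away, leaving only the coarse approximation (the signal mean); a well-chosen threshold sits between these extremes, removing noise-dominated coefficients while preserving spectral line shapes.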