    Adaptive Threshold Sampling and Estimation

    Sampling is a fundamental problem in both computer science and statistics. A number of issues arise when designing a method based on sampling. These include statistical considerations, such as constructing a good sampling design and ensuring there are good, tractable estimators for the quantities of interest, as well as computational considerations, such as designing fast algorithms for streaming data and ensuring the sample fits within memory constraints. Unfortunately, existing sampling methods are only able to address all of these issues in limited scenarios. We develop a framework that can be used to address these issues in a broad range of scenarios. In particular, it addresses the problem of drawing and using samples under some memory budget constraint. This problem can be challenging since the memory budget forces samples to be drawn non-independently and, consequently, makes computation of the resulting estimators difficult. At the core of the framework is the notion of a data-adaptive thresholding scheme where the threshold effectively allows one to treat the non-independent sample as if it were drawn independently. We provide sufficient conditions for a thresholding scheme to allow this and provide ways to build and compose such schemes. Furthermore, we provide fast algorithms to efficiently sample under these thresholding schemes.
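
    The idea of a data-adaptive threshold can be illustrated with priority sampling, a well-known scheme that is used here as a stand-in and is not claimed to be the algorithm of the paper. The sketch below assumes strictly positive item weights and a memory budget of `k` items; the function names `priority_sample` and `estimate_total` are hypothetical. Each item gets the priority `w / u` with `u` uniform on (0, 1], the `k` largest priorities are kept, and the (k+1)-th largest priority serves as the threshold. Given that threshold, each kept item can be treated as if it had been included independently with probability `min(1, w / threshold)`, so its Horvitz-Thompson contribution is `max(w, threshold)`.

```python
import heapq
import random


def priority_sample(weights, k, rng):
    """Keep the k items with the largest priority w/u.

    Returns (sample_indices, threshold). The threshold is the (k+1)-th
    largest priority, or 0.0 when all items fit within the budget.
    Assumes every weight is strictly positive.
    """
    heap = []  # min-heap of (priority, index), never more than k + 1 entries
    for i, w in enumerate(weights):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids division by zero
        heapq.heappush(heap, (w / u, i))
        if len(heap) > k + 1:
            heapq.heappop(heap)  # drop the smallest priority seen so far
    if len(heap) <= k:
        # The whole stream fits in memory, so no thresholding is needed.
        return [i for _, i in heap], 0.0
    threshold, _ = heapq.heappop(heap)  # (k+1)-th largest priority
    return [i for _, i in heap], threshold


def estimate_total(weights, sample, threshold):
    """Horvitz-Thompson style estimate of the sum of all weights."""
    return sum(max(weights[i], threshold) for i in sample)
```

    A typical call is `sample, tau = priority_sample(weights, 5, random.Random(0))` followed by `estimate_total(weights, sample, tau)`. The heap keeps only `k + 1` entries at any time, which is what makes the scheme suitable for a single pass over streaming data.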

    Multiscale likelihood analysis and complexity penalized estimation

    We describe here a framework for a certain class of multiscale likelihood factorizations wherein, in analogy to a wavelet decomposition of an L^2 function, a given likelihood function has an alternative representation as a product of conditional densities reflecting information in both the data and the parameter vector localized in position and scale. The framework is developed as a set of sufficient conditions for the existence of such factorizations, formulated in analogy to those underlying a standard multiresolution analysis for wavelets, and hence can be viewed as a multiresolution analysis for likelihoods. We then consider the use of these factorizations in the task of nonparametric, complexity penalized likelihood estimation. We study the risk properties of certain thresholding and partitioning estimators, and demonstrate their adaptivity and near-optimality, in a minimax sense over a broad range of function spaces, based on squared Hellinger distance as a loss function. In particular, our results provide an illustration of how properties of classical wavelet-based estimators can be obtained in a single, unified framework that includes models for continuous, count and categorical data types.
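
    For count data, a factorization of this kind can be illustrated with a standard identity for independent Poisson variables. It is offered here as an assumed example, not as the construction given in the paper, and the symbols $X_1, X_2, \lambda_1, \lambda_2$ are introduced only for this illustration. The joint likelihood of two neighbouring counts splits into a coarse-scale term for their sum and a conditional term carrying the fine-scale detail:

```latex
\begin{align*}
& X_1 \sim \mathrm{Poisson}(\lambda_1), \qquad
  X_2 \sim \mathrm{Poisson}(\lambda_2), \qquad
  X_1 \text{ and } X_2 \text{ independent}, \\
& p(x_1, x_2 \mid \lambda_1, \lambda_2)
  = \underbrace{\mathrm{Poisson}\bigl(x_1 + x_2 \bigm| \lambda_1 + \lambda_2\bigr)}_{\text{coarse scale}}
    \times
    \underbrace{\mathrm{Binomial}\Bigl(x_1 \Bigm| x_1 + x_2,\ \tfrac{\lambda_1}{\lambda_1 + \lambda_2}\Bigr)}_{\text{fine scale}} .
\end{align*}
```

    Applying the same split recursively to the sums would give a product of conditional terms indexed by position and scale, in the spirit of the factorizations described above.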

    A model of Poissonian interactions and detection of dependence

    This paper proposes a model of interactions between two point processes, ruled by a reproduction function h, which is considered as the intensity of a Poisson process. In particular, we focus on the context of neuroscience, where the aim is to detect possible interactions in the cerebral activity associated with two neurons. To provide a mathematical answer to this specific problem of neurobiologists, we address the question of testing the nullity of the intensity h. We construct a multiple testing procedure obtained by the aggregation of single tests based on a wavelet thresholding method. This test has good theoretical properties: it is possible to guarantee the level and, under some assumptions, the power, and its uniform separation rate over weak Besov bodies is adaptive minimax. Then, some simulations are provided, showing the good practical behavior of our testing procedure.

    On adaptive wavelet estimation of a class of weighted densities

    We investigate the estimation of a weighted density taking the form $g = w(F)f$, where $f$ denotes an unknown density, $F$ the associated distribution function and $w$ is a known (non-negative) weight. Such a class encompasses many examples, including those arising in order statistics or when $g$ is related to the maximum or the minimum of $N$ (random or fixed) independent and identically distributed (i.i.d.) random variables. We here construct a new adaptive non-parametric estimator for $g$ based on a plug-in approach and the wavelets methodology. For a wide class of models, we prove that it attains fast rates of convergence under the $\mathbb{L}_p$ risk with $p \ge 1$ (not only for $p = 2$ corresponding to the mean integrated squared error) over Besov balls. The theoretical findings are illustrated through several simulations.
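
    Two textbook instances of the form $g = w(F)f$ make the role of the weight concrete. They assume a fixed (non-random) $N$ and use $u$ as a dummy argument of $w$; both choices are made only for this illustration. Differentiating the distribution functions $F^N$ and $1 - (1 - F)^N$ of the maximum and the minimum gives:

```latex
\begin{align*}
\text{maximum of } N \text{ i.i.d. variables:} \quad
  g(x) &= N F(x)^{N-1} f(x), & w(u) &= N u^{N-1}, \\
\text{minimum of } N \text{ i.i.d. variables:} \quad
  g(x) &= N \bigl(1 - F(x)\bigr)^{N-1} f(x), & w(u) &= N (1 - u)^{N-1}.
\end{align*}
```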

    Super-resolution community detection for layer-aggregated multilayer networks

    Applied network science often involves preprocessing network data before applying a network-analysis method, and there is typically a theoretical disconnect between these steps. For example, it is common to aggregate time-varying network data into windows prior to analysis, and the tradeoffs of this preprocessing are not well understood. Focusing on the problem of detecting small communities in multilayer networks, we study the effects of layer aggregation by developing random-matrix theory for modularity matrices associated with layer-aggregated networks with $N$ nodes and $L$ layers, which are drawn from an ensemble of Erdős-Rényi networks. We study phase transitions in which eigenvectors localize onto communities (allowing their detection) and which occur for a given community provided its size surpasses a detectability limit $K^*$. When layers are aggregated via a summation, we obtain $K^* \varpropto \mathcal{O}(\sqrt{NL}/T)$, where $T$ is the number of layers across which the community persists. Interestingly, if $T$ is allowed to vary with $L$ then summation-based layer aggregation enhances small-community detection even if the community persists across a vanishing fraction of layers, provided that $T/L$ decays more slowly than $\mathcal{O}(L^{-1/2})$. Moreover, we find that thresholding the summation can in some cases cause $K^*$ to decay exponentially, decreasing by orders of magnitude in a phenomenon we call super-resolution community detection. That is, layer aggregation with thresholding is a nonlinear data filter enabling detection of communities that are otherwise too small to detect. Importantly, different thresholds generally enhance the detectability of communities having different properties, illustrating that community detection can be obscured if one analyzes network data using a single threshold.
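
    The two aggregation steps named above, summation and thresholding of the summation, can be sketched as follows. This is a minimal illustration under stated assumptions: layers are given as equally sized 0/1 adjacency matrices (nested lists), an edge survives thresholding when its summed count is greater than or equal to the threshold, and the function name `aggregate_layers` is hypothetical. The exact thresholding convention of the paper is not stated in the abstract.

```python
def aggregate_layers(layers, threshold=None):
    """Aggregate adjacency matrices of a multilayer network.

    layers: list of n-by-n adjacency matrices (nested lists of 0/1).
    threshold: if None, return the entrywise sum over layers; otherwise
        return a 0/1 matrix keeping edges whose summed count is at least
        `threshold` (an assumed convention).
    """
    n = len(layers[0])
    # Entry (i, j) counts the layers in which edge (i, j) is present.
    summed = [
        [sum(layer[i][j] for layer in layers) for j in range(n)]
        for i in range(n)
    ]
    if threshold is None:
        return summed
    return [
        [1 if summed[i][j] >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]
```

    Calling `aggregate_layers(layers)` gives the weighted, summed network, while `aggregate_layers(layers, threshold=2)` keeps only edges that appear in at least two layers. Varying the threshold produces different filtered networks from the same data, which is the knob the abstract refers to.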

    Automatic Kalman-filter-based wavelet shrinkage denoising of 1D stellar spectra

    We propose a non-parametric method to denoise 1D stellar spectra based on wavelet shrinkage followed by adaptive Kalman thresholding. Wavelet shrinkage denoising involves applying the discrete wavelet transform (DWT) to the input signal, 'shrinking' certain frequency components in the transform domain, and then applying the inverse DWT to the reduced components. The performance of this procedure is influenced by the choice of base wavelet, the number of decomposition levels, and the thresholding function. Typically, these parameters are chosen by 'trial and error', a process that can depend strongly on the properties of the data being denoised. We here introduce an adaptive Kalman-filter-based thresholding method that eliminates the need for choosing the number of decomposition levels. We use the 'Haar' wavelet basis, which we found to provide excellent filtering for 1D stellar spectra, at a low computational cost. We introduce various levels of Poisson noise into synthetic PHOENIX spectra, and test the performance of several common denoising methods against our own. Our method proves superior in terms of noise suppression and peak shape preservation. We expect it may also be of use in automatically and accurately filtering low signal-to-noise galaxy and quasar spectra obtained from surveys such as SDSS, Gaia, LSST, PESSTO, VANDELS, LEGA-C, and DESI.
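
    The generic wavelet shrinkage pipeline described above (forward DWT, shrink the detail coefficients, inverse DWT) can be sketched with the Haar basis in plain Python. This is not the Kalman-filter-based method of the paper: the sketch assumes a fixed soft threshold and a user-chosen number of levels, which is exactly the manual tuning the paper aims to remove. It also assumes the signal length is divisible by `2 ** levels`, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal, levels):
    """Multi-level orthonormal Haar DWT.

    Returns (approximation, details), with details ordered fine to coarse.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding from the coarsest level upwards."""
    approx = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(approx, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = rebuilt
    return approx


def soft_threshold(x, t):
    """Shrink x towards zero by t, setting it to zero when |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def denoise(signal, levels, threshold):
    """Haar wavelet shrinkage with one fixed soft threshold (an assumption)."""
    approx, details = haar_forward(signal, levels)
    shrunk = [[soft_threshold(d, threshold) for d in level] for level in details]
    return haar_inverse(approx, shrunk)
```

    With `threshold=0` the function reconstructs its input up to rounding error, which is a quick sanity check on the transform pair. Larger thresholds remove small detail coefficients, which mostly carry noise, while leaving the coarse approximation untouched.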
