31,076 research outputs found

    Kernel density estimation via diffusion

    We present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability.
    Comment: Published at http://dx.doi.org/10.1214/10-AOS799 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Numerical performance of Penalized Comparison to Overfitting for multivariate kernel density estimation

    Kernel density estimation is a well-known method involving a smoothing parameter (the bandwidth) that needs to be tuned by the user. Although this method has been widely used, bandwidth selection remains a challenging issue in terms of balancing algorithmic performance and statistical relevance. The purpose of this paper is to compare a recently developed bandwidth selection method for kernel density estimation to those in common use today (at least those which are implemented in the R-package). This new method is called Penalized Comparison to Overfitting (PCO). It was proposed by some of the authors of this paper in a previous work devoted to its statistical relevance from a purely theoretical perspective. It is compared here to other usual bandwidth selection methods for univariate and multivariate kernel density estimation on the basis of intensive simulation studies. In particular, cross-validation and plug-in criteria are numerically investigated and compared to PCO. The take-home message is that PCO can outperform the classical methods without additional algorithmic cost.
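
    To make the role of the bandwidth concrete, here is a minimal sketch of a univariate Gaussian kernel density estimator in plain Python. It is not PCO or any of the selection rules compared in the paper; the bandwidth is simply passed in by hand, and the names `gaussian_kde`, `integrate`, and the sample values are illustrative assumptions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> density estimate built from Gaussian kernels.

    The bandwidth is supplied by the caller; no automatic selection is done.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def integrate(f, lo, hi, steps=4000):
    """Midpoint-rule integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


# Hypothetical bimodal sample: one cluster near -2, one near +2.
samples = [-2.1, -1.9, -2.0, 1.8, 2.0, 2.3, 2.1]

narrow = gaussian_kde(samples, bandwidth=0.2)  # follows the data closely
wide = gaussian_kde(samples, bandwidth=2.0)    # smooths the two modes together

print(narrow(2.0), wide(2.0))  # the narrow estimate is higher at the cluster
print(narrow(0.0), wide(0.0))  # the wide estimate fills in the gap between modes
```

    Both estimates integrate to one; what changes with the bandwidth is how much mass is spread away from the observations, which is the trade-off that a selection rule has to resolve.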

    Bandwidth selection in kernel empirical risk minimization via the gradient

    In this paper, we deal with the data-driven selection of multidimensional and possibly anisotropic bandwidths in the general framework of kernel empirical risk minimization. We propose a universal selection rule, which leads to optimal adaptive results in a large variety of statistical models such as nonparametric robust regression and statistical learning with errors in variables. These results are stated in the context of smooth loss functions, where the gradient of the risk appears as a good criterion to measure the performance of our estimators. The selection rule consists of a comparison of gradient empirical risks. It can be viewed as a nontrivial extension of the so-called Goldenshluger-Lepski method to nonlinear estimators. Furthermore, one main advantage of our selection rule is that it does not depend on the Hessian matrix of the risk, which is usually involved in standard adaptive procedures.
    Comment: Published at http://dx.doi.org/10.1214/15-AOS1318 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Automatic bandwidth selection for circular density estimation

    Given angular data θ1,…,θn ∈ [0,2π), a common objective is to estimate the density. When a kernel estimator is used, bandwidth selection is crucial to the performance. A “plug-in rule” for the bandwidth is obtained, based on the concentration of a reference density, namely the von Mises distribution. It is seen that this is equivalent to the usual Euclidean plug-in rule when the concentration becomes large. When the concentration parameter is unknown, alternative methods are explored which are intended to be robust to departures from the reference density. Simulations indicate that “wrapped estimators” can perform well in this context. The methods are applied to a real bivariate dataset concerning protein structure.
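
    As an illustration of kernel estimation on the circle, the following sketch uses a von Mises kernel, whose concentration parameter plays the role of an inverse bandwidth. The plug-in rule from the paper is not reproduced here: `kappa` is chosen by hand, and the names `bessel_i0`, `von_mises_kde`, and the sample angles are assumptions made for the example.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function I0 via its power series (fine for moderate kappa)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (kappa / (2.0 * k)) ** 2
        total += term
    return total


def von_mises_kde(angles, kappa):
    """Return theta -> density estimate on the circle with a von Mises kernel.

    A larger kappa means a more concentrated kernel, i.e. a smaller bandwidth.
    """
    norm = 1.0 / (len(angles) * 2.0 * math.pi * bessel_i0(kappa))

    def density(theta):
        return norm * sum(math.exp(kappa * math.cos(theta - a)) for a in angles)

    return density


def integrate(f, lo, hi, steps=2000):
    """Midpoint-rule integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


# Hypothetical angles in [0, 2*pi): most lie near 0, on both sides of the wrap.
angles = [0.1, 6.2, 0.3, 6.0, 3.0]
density = von_mises_kde(angles, kappa=8.0)

print(density(0.0), density(math.pi / 2))
```

    Because the kernel depends on the angles only through `cos(theta - a)`, observations at 6.2 and 0.1 are treated as neighbours, which a Euclidean kernel applied to the raw values would miss.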

    The geometry of kernelized spectral clustering

    Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and how easily a mixture component is divided into two nonoverlapping components. When the overlap is small compared to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite sample setting, and under the same assumption, embedded samples from different components are approximately orthogonal with high probability when the sample size is large. As a corollary we control the fraction of samples mislabeled by spectral clustering under finite mixtures with nonparametric components.
    Comment: Published at http://dx.doi.org/10.1214/14-AOS1283 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Optical properties of small polarons from dynamical mean-field theory

    The optical properties of polarons are studied in the framework of the Holstein model by applying dynamical mean-field theory. This approach reveals important quantitative and qualitative deviations from the limiting treatments of small polaron theory, which should be considered when interpreting experimental data. In the antiadiabatic regime, treating a finite phonon frequency and a finite electron bandwidth on the same footing makes it possible to follow the evolution of the optical absorption away from the well-understood molecular limit. It is shown that the width of the multiphonon peaks in the optical spectra depends on the temperature and on the frequency in a way that contradicts the commonly accepted results, most notably in the strong coupling case. In the adiabatic regime, on the other hand, the present method identifies a wide range of parameters of experimental interest where the electron bandwidth is comparable to or larger than the broadening of the Franck-Condon line, leading to a strong modification of both the position and the shape of the polaronic absorption. An analytical expression is derived in the limit of vanishing broadening, which improves on the existing formulas and whose validity extends to any finite-dimensional lattice. In the same adiabatic regime, at intermediate values of the interaction strength, the optical absorption exhibits a characteristic reentrant behavior, with sharp features (polaron interband transitions) emerging as the temperature increases; these are peculiar to the polaron crossover, and analytical expressions are provided for them.
    Comment: 16 pages, 6 figures

    Bandwidth selection for kernel estimation in mixed multi-dimensional spaces

    Kernel estimation techniques, such as mean shift, suffer from one major drawback: the kernel bandwidth selection. The bandwidth can be fixed for the whole data set or can vary at each point. Automatic bandwidth selection becomes a real challenge in the case of multidimensional heterogeneous features. This paper presents a solution to this problem. It is an extension of [Comaniciu03a], which was based on the fundamental property of normal distributions regarding the bias of the normalized density gradient. The selection is done iteratively for each type of feature, by looking for the stability of local bandwidth estimates across a predefined range of bandwidths. A pseudo balloon mean shift filtering and partitioning are introduced. The validity of the method is demonstrated in the context of color image segmentation based on a 5-dimensional space.
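
    The dependence of mean shift on its bandwidth can be seen in a small one-dimensional sketch. This is plain fixed-bandwidth mean shift with a Gaussian kernel, not the iterative selection or the pseudo balloon variant described in the abstract; `mean_shift_mode`, its parameters, and the sample points are hypothetical names and values chosen for illustration.

```python
import math


def mean_shift_mode(points, start, bandwidth, tol=1e-8, max_iter=500):
    """Follow mean shift updates from `start` until the position stops moving.

    Each step moves x to the kernel-weighted mean of the points, which climbs
    the Gaussian kernel density estimate towards a local mode.
    """
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical data: two tight clusters centred near 1.05 and 5.05.
points = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]

# Small bandwidth: each starting point converges to its own cluster.
left_small = mean_shift_mode(points, start=0.5, bandwidth=0.5)
right_small = mean_shift_mode(points, start=5.5, bandwidth=0.5)

# Large bandwidth: the density estimate has a single mode between the clusters.
left_large = mean_shift_mode(points, start=0.5, bandwidth=5.0)
right_large = mean_shift_mode(points, start=5.5, bandwidth=5.0)

print(left_small, right_small)
print(left_large, right_large)
```

    With the small bandwidth the procedure reports two modes, and with the large one it merges them, so the resulting partition is decided by the bandwidth as much as by the data.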

    Population Synthesis via k-Nearest Neighbor Crossover Kernel

    The recent development of multi-agent simulations brings about a need for population synthesis. This is the task of reconstructing the entire population from a sampling survey of limited size (1% or so), supplying the initial conditions from which simulations begin. This paper presents a new kernel density estimator for this task. Our method is an analogue of the classical Breiman-Meisel-Purcell estimator, but employs novel techniques that harness the many degrees of freedom required to model high-dimensional, nonlinearly correlated datasets: the crossover kernel, the k-nearest neighbor restriction of the kernel construction set, and the bagging of kernels. Its performance as a statistical estimator is examined on real and synthetic datasets. We provide an "optimization-free" parameter selection rule for our method, a theory of how our method works, and a computational cost analysis. To demonstrate its usefulness as a population synthesizer, our method is applied to a household synthesis task for an urban micro-simulator.
    Comment: 10 pages, 4 figures, IEEE International Conference on Data Mining (ICDM) 201

    Dynamical properties of ultracold bosons in an optical lattice

    We study the excitation spectrum of strongly correlated lattice bosons for the Mott-insulating phase and for the superfluid phase close to localization. Within a Schwinger-boson mean-field approach we find two gapped modes in the Mott insulator and the combination of a sound mode (Goldstone) and a gapped (Higgs) mode in the superfluid. To make our findings comparable with experimental results, we calculate the dynamic structure factor as well as the linear response to the optical lattice modulation introduced by Stoeferle et al. [Phys. Rev. Lett. 92, 130403 (2004)]. We find that the puzzling finite frequency absorption observed in the superfluid phase could be explained via the excitation of the gapped (Higgs) mode. We check the consistency of our results with an adapted f-sum-rule and propose an extension of the experimental technique by Stoeferle et al. to further verify our findings.
    Comment: 13 pages, 5 figures