7,097 research outputs found

    Nonparametric Nearest Neighbor Random Process Clustering

    We consider the problem of clustering noisy finite-length observations of stationary ergodic random processes according to their nonparametric generative models without prior knowledge of the model statistics and the number of generative models. Two algorithms, both using the L1-distance between estimated power spectral densities (PSDs) as a measure of dissimilarity, are analyzed. The first algorithm, termed nearest neighbor process clustering (NNPC), is new to the best of our knowledge and relies on partitioning the nearest neighbor graph of the observations via spectral clustering. The second algorithm, simply referred to as k-means (KM), consists of a single k-means iteration with farthest point initialization and was considered before in the literature, albeit with a different measure of dissimilarity and with asymptotic performance results only. We show that both NNPC and KM succeed with high probability under noise and even when the generative process PSDs overlap significantly, provided that the observation length is sufficiently large. Our results quantify the tradeoff between the overlap of the generative process PSDs, the noise variance, and the observation length. Finally, we present numerical performance results for synthetic and real data. Comment: IEEE International Symposium on Information Theory (ISIT), June 2015, to appear
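    The KM procedure described in this abstract (farthest point initialization followed by a single assignment step, with the L1-distance between estimated PSDs as the dissimilarity) might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Bartlett averaged-periodogram PSD estimator, the segment length of 32, the choice of observation 0 as the first center, the number of clusters being given as input, and all function names (`bartlett_psd`, `l1_distance`, `farthest_point_km`, `toy_ma1`) are hypothetical choices made for the example.

```python
import cmath
import random


def bartlett_psd(x, seg_len=32):
    """Estimate the PSD by averaging periodograms of non-overlapping segments."""
    n_seg = len(x) // seg_len
    n_freq = seg_len // 2 + 1
    psd = [0.0] * n_freq
    for s in range(n_seg):
        seg = x[s * seg_len:(s + 1) * seg_len]
        for f in range(n_freq):
            # Naive DFT coefficient at angular frequency 2*pi*f/seg_len.
            coef = sum(v * cmath.exp(-2j * cmath.pi * f * t / seg_len)
                       for t, v in enumerate(seg))
            psd[f] += abs(coef) ** 2 / (seg_len * n_seg)
    return psd


def l1_distance(p, q):
    """Mean absolute difference between two PSD estimates on the same grid."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def farthest_point_km(observations, n_clusters, seg_len=32):
    """Farthest-point initialization, then one nearest-center assignment."""
    psds = [bartlett_psd(x, seg_len) for x in observations]
    centers = [0]  # assumption: the first center is simply observation 0
    while len(centers) < n_clusters:
        # Next center: the observation farthest from its closest chosen center.
        nxt = max(
            range(len(psds)),
            key=lambda i: min(l1_distance(psds[i], psds[c]) for c in centers),
        )
        centers.append(nxt)
    # Assign every observation to the center with the closest PSD estimate.
    return [
        min(range(n_clusters),
            key=lambda c: l1_distance(psds[i], psds[centers[c]]))
        for i in range(len(psds))
    ]


def toy_ma1(n, sign, rng):
    """Hypothetical toy data: MA(1) process x_t = e_t + sign * e_{t-1}."""
    e = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [e[t + 1] + sign * e[t] for t in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    # Five low-pass and five high-pass observations of length 512.
    obs = [toy_ma1(512, +1.0, rng) for _ in range(5)]
    obs += [toy_ma1(512, -1.0, rng) for _ in range(5)]
    print(farthest_point_km(obs, n_clusters=2))
```

    Averaging periodograms over segments trades frequency resolution for lower variance, which keeps within-cluster L1-distances small relative to between-cluster ones in this toy setting; the abstract does not state which PSD estimator the paper itself uses.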

    Information Theoretical Estimators Toolbox

    We present ITE (information theoretical estimators), a free and open source, multi-platform, Matlab/Octave toolbox that is capable of estimating many different variants of entropy, mutual information, divergence, association measures, cross quantities, and kernels on distributions. Thanks to its highly modular design, ITE additionally supports (i) the combination of estimation techniques, (ii) the easy construction and embedding of novel information theoretical estimators, and (iii) their immediate application in information theoretical optimization problems. ITE also includes a prototype application in a central problem class of signal processing, independent subspace analysis and its extensions. Comment: 5 pages; ITE toolbox: https://bitbucket.org/szzoli/ite

    Mapping Topographic Structure in White Matter Pathways with Level Set Trees

    Fiber tractography on diffusion imaging data offers rich potential for describing white matter pathways in the human brain, but characterizing the spatial organization in these large and complex data sets remains a challenge. We show that level set trees, which provide a concise representation of the hierarchical mode structure of probability density functions, offer a statistically principled framework for visualizing and analyzing topography in fiber streamlines. Using diffusion spectrum imaging data collected on neurologically healthy controls (N=30), we mapped white matter pathways from the cortex into the striatum using a deterministic tractography algorithm that estimates fiber bundles as dimensionless streamlines. Level set trees were used for interactive exploration of patterns in the endpoint distributions of the mapped fiber tracks and for an efficient segmentation of the tracks that has empirical accuracy comparable to standard nonparametric clustering methods. We show that level set trees can also be generalized to model pseudo-density functions in order to analyze a broader array of data types, including entire fiber streamlines. Finally, resampling methods show the reliability of the level set tree as a descriptive measure of topographic structure, illustrating its potential as a statistical descriptor in brain imaging analysis. These results highlight the broad applicability of level set trees for visualizing and analyzing high-dimensional data like fiber tractography output.

    Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators

    We provide finite-sample analysis of a general framework for using k-nearest neighbor statistics to estimate functionals of a nonparametric continuous probability density, including entropies and divergences. Rather than plugging a consistent density estimate (which requires k → ∞ as the sample size n → ∞) into the functional of interest, the estimators we consider fix k and perform a bias correction. This is more efficient computationally, and, as we show in certain cases, statistically, leading to faster convergence rates. Our framework unifies several previous estimators, for most of which ours are the first finite sample guarantees. Comment: 16 pages, 0 figures
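    To illustrate what "fix k and perform a bias correction" can look like, here is a sketch of the classical Kozachenko-Leonenko entropy estimator, a well-known estimator of this fixed-k type, restricted to one-dimensional data. It is offered as an example of the general idea rather than as the framework analyzed in the paper; the function names (`digamma_int`, `knn_entropy_1d`), the default k = 3, and the restriction to 1-D samples without ties are assumptions of the sketch.

```python
import math
import random


def digamma_int(m):
    """Digamma at a positive integer: psi(m) = -gamma + sum_{j=1}^{m-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, m))


def knn_entropy_1d(samples, k=3):
    """Fixed-k nearest neighbor estimate of differential entropy, in nats.

    Assumes 1-D samples from a continuous density (no repeated values,
    since a zero neighbor distance would make the logarithm undefined).
    """
    xs = sorted(samples)
    n = len(xs)
    if n <= k:
        raise ValueError("need more than k samples")
    log_dist_sum = 0.0
    for i, x in enumerate(xs):
        # In sorted 1-D data the k nearest neighbors of xs[i] always lie
        # within k positions of index i, so a small window suffices.
        lo, hi = max(0, i - k), min(n, i + k + 1)
        dists = sorted(abs(x - xs[j]) for j in range(lo, hi) if j != i)
        log_dist_sum += math.log(dists[k - 1])
    # log(2) is the log-volume of the 1-D unit ball [-1, 1]; the term
    # psi(n) - psi(k) is the bias correction that replaces letting k grow.
    return digamma_int(n) - digamma_int(k) + math.log(2.0) + log_dist_sum / n


if __name__ == "__main__":
    random.seed(0)
    data = [random.gauss(0.0, 1.0) for _ in range(2000)]
    # For a standard normal the true value is 0.5 * log(2 * pi * e).
    print(knn_entropy_1d(data, k=3), 0.5 * math.log(2 * math.pi * math.e))
```

    Because k stays fixed, each sample needs only its k-th neighbor distance, which is the computational saving the abstract refers to; the digamma terms compensate for the bias that a naive plug-in with fixed k would incur.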