Nonparametric Nearest Neighbor Random Process Clustering
We consider the problem of clustering noisy finite-length observations of
stationary ergodic random processes according to their nonparametric generative
models without prior knowledge of the model statistics and the number of
generative models. Two algorithms, both using the L1-distance between estimated
power spectral densities (PSDs) as a measure of dissimilarity, are analyzed.
The first algorithm, termed nearest neighbor process clustering (NNPC), is, to
the best of our knowledge, new; it relies on partitioning the nearest neighbor
graph of the observations via spectral clustering. The second algorithm, simply
referred to as k-means (KM), consists of a single k-means iteration with
farthest point initialization and was considered before in the literature,
albeit with a different measure of dissimilarity and with asymptotic
performance results only. We show that both NNPC and KM succeed with high
probability in the presence of noise, even when the generative process PSDs
overlap significantly, provided that the observation length is sufficiently large.
Our results quantify the tradeoff between the overlap of the generative process
PSDs, the noise variance, and the observation length. Finally, we present
numerical performance results for synthetic and real data.

Comment: IEEE International Symposium on Information Theory (ISIT), June 2015,
to appear
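The core pipeline described above (estimate PSDs, measure dissimilarity by the L1 distance, partition a nearest neighbor graph spectrally) can be sketched in Python. This is a minimal two-cluster illustration, not the paper's algorithm: it uses Welch averaging as a stand-in PSD estimator and a Fiedler-vector sign bipartition as a stand-in for the general spectral clustering step; all function names and parameter choices here are mine.

```python
import numpy as np
from scipy.signal import welch

def psd_l1_distance(x, y, nperseg=256):
    # Estimate normalized PSDs (Welch averaging; the paper's PSD estimator may
    # differ) and return the L1 distance between them.
    _, px = welch(x, nperseg=nperseg)
    _, py = welch(y, nperseg=nperseg)
    px, py = px / px.sum(), py / py.sum()
    return np.abs(px - py).sum()

def nnpc_two_clusters(observations, n_neighbors=4, nperseg=256):
    """Minimal two-cluster sketch of the NNPC idea: build a k-nearest-neighbor
    graph on pairwise PSD L1 distances, then bipartition it by the sign of the
    Fiedler vector of the graph Laplacian."""
    n = len(observations)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = psd_l1_distance(observations[i],
                                                observations[j], nperseg)
    w = np.zeros((n, n))
    for i in range(n):
        # Connect each observation to its n_neighbors nearest peers
        # (index 0 of the argsort is the observation itself, so skip it).
        w[i, np.argsort(d[i])[1:n_neighbors + 1]] = 1.0
    w = np.maximum(w, w.T)             # symmetrize the k-NN adjacency
    w += 1e-8 * (1.0 - np.eye(n))      # tiny coupling keeps the graph connected
    lap = np.diag(w.sum(axis=1)) - w   # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(lap)
    return (vecs[:, 1] > 0).astype(int)  # sign of the Fiedler vector
```

On synthetic data such as two sets of AR(1) realizations with well-separated spectra, this sketch recovers the generative grouping; the paper's analysis covers the much harder regime of noisy observations with significantly overlapping PSDs.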
Information Theoretical Estimators Toolbox
We present ITE (information theoretical estimators), a free and open-source,
multi-platform Matlab/Octave toolbox that is capable of estimating many
different variants of entropy, mutual information, divergence, association
measures, cross quantities, and kernels on distributions. Thanks to its highly
modular design, ITE supports additionally (i) the combinations of the
estimation techniques, (ii) the easy construction and embedding of novel
information theoretical estimators, and (iii) their immediate application in
information theoretical optimization problems. ITE also includes a prototype
application in a central problem class of signal processing, independent
subspace analysis and its extensions.

Comment: 5 pages; ITE toolbox: https://bitbucket.org/szzoli/ite
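ITE itself is a Matlab/Octave toolbox, so as a language-neutral illustration of one quantity in its scope, here is a sketch of a k-nearest-neighbor Kullback-Leibler divergence estimator in the style of Wang, Kulkarni, and Verdú. The function name and signature are mine, not ITE's API.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_kl_divergence(x, y, k=5):
    """k-NN estimate of the KL divergence D(P || Q) from samples x ~ P (n x d)
    and y ~ Q (m x d). Illustrative sketch only; not ITE's interface."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, d = x.shape
    m = y.shape[0]
    # k-th nearest neighbor distance within x (query k+1 to exclude the point itself).
    rho = cKDTree(x).query(x, k=k + 1)[0][:, k]
    # k-th nearest neighbor distance from each point of x into the sample y.
    nu = cKDTree(y).query(x, k=k)[0][:, k - 1]
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

For two unit-variance Gaussians with means 0 and 1, the true divergence is 0.5 nats, and the estimate concentrates around that value as the sample sizes grow.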
Mapping Topographic Structure in White Matter Pathways with Level Set Trees
Fiber tractography on diffusion imaging data offers rich potential for
describing white matter pathways in the human brain, but characterizing the
spatial organization in these large and complex data sets remains a challenge.
We show that level set trees---which provide a concise representation of the
hierarchical mode structure of probability density functions---offer a
statistically principled framework for visualizing and analyzing topography in
fiber streamlines. Using diffusion spectrum imaging data collected on
neurologically healthy controls (N=30), we mapped white matter pathways from
the cortex into the striatum using a deterministic tractography algorithm that
estimates fiber bundles as dimensionless streamlines. Level set trees were used
for interactive exploration of patterns in the endpoint distributions of the
mapped fiber tracks, and for an efficient segmentation of the tracks with
empirical accuracy comparable to standard nonparametric clustering methods. We
show that level set trees can also be generalized to model pseudo-density
functions in order to analyze a broader array of data types, including entire
fiber streamlines. Finally, resampling methods show the reliability of the
level set tree as a descriptive measure of topographic structure, illustrating
its potential as a statistical descriptor in brain imaging analysis. These
results highlight the broad applicability of level set trees for visualizing
and analyzing high-dimensional data like fiber tractography output.
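The hierarchical mode structure that a level set tree captures can be shown in a minimal form: sweep levels of a density from high to low and track how connected components of the upper level sets are born (leaves) and merge (internal nodes). This is an illustrative one-dimensional sketch of the underlying idea, not the paper's method, which operates on tractography data.

```python
import numpy as np

def level_set_tree_modes(density):
    """Count the leaves (modes) of the level set tree of a 1-D density sampled
    on a grid, by visiting grid cells in decreasing density order and tracking
    connected components of the upper level sets with union-find."""
    order = np.argsort(density)[::-1]  # highest-density cells first
    parent = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    leaves = 0
    for i in order:
        active = [j for j in (i - 1, i + 1) if j in parent]
        if not active:
            parent[i] = i          # a new component is born: a leaf of the tree
            leaves += 1
        else:
            parent[i] = i
            for r in {find(j) for j in active}:
                parent[r] = i      # components merge here: an internal node
    return leaves
```

A grid with two local maxima yields two leaves, a unimodal grid yields one; the same sweep, with the merge heights recorded, gives the full tree.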
Finite-Sample Analysis of Fixed-k Nearest Neighbor Density Functional Estimators
We provide finite-sample analysis of a general framework for using k-nearest
neighbor statistics to estimate functionals of a nonparametric continuous
probability density, including entropies and divergences. Rather than plugging
a consistent density estimate (which requires k → ∞ as the sample size
n → ∞) into the functional of interest, the estimators we consider fix
k and perform a bias correction. This is more efficient computationally, and,
as we show in certain cases, statistically, leading to faster convergence
rates. Our framework unifies several previous estimators, for most of which
ours are the first finite-sample guarantees.

Comment: 16 pages, 0 figures
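One classical instance of the fixed-k idea is the Kozachenko-Leonenko entropy estimator, where digamma terms provide the bias correction that lets k stay fixed as the sample grows. A minimal sketch (illustrative; the paper's framework covers a broader family of functionals):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko fixed-k nearest neighbor entropy estimate (in nats)
    from a sample x of shape (n, d). The digamma(n) - digamma(k) terms are the
    bias correction relative to a naive plug-in."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    tree = cKDTree(x)
    # Distance to the k-th nearest neighbor of each point (self excluded).
    eps = tree.query(x, k=k + 1)[0][:, k]
    # Log volume of the unit d-ball: pi^(d/2) / Gamma(d/2 + 1).
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(eps))
```

For a standard normal sample in one dimension, the estimate concentrates around the true differential entropy 0.5·log(2πe) ≈ 1.419 nats even with k as small as 3.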