Nonparametric Nearest Neighbor Random Process Clustering
We consider the problem of clustering noisy finite-length observations of
stationary ergodic random processes according to their nonparametric generative
models without prior knowledge of the model statistics and the number of
generative models. Two algorithms, both using the L1-distance between estimated
power spectral densities (PSDs) as a measure of dissimilarity, are analyzed.
The first algorithm, termed nearest neighbor process clustering (NNPC), to the
best of our knowledge, is new and relies on partitioning the nearest neighbor
graph of the observations via spectral clustering. The second algorithm, simply
referred to as k-means (KM), consists of a single k-means iteration with
farthest point initialization and was considered before in the literature,
albeit with a different measure of dissimilarity and with asymptotic
performance results only. We show that both NNPC and KM succeed with high
probability under noise and even when the generative process PSDs overlap
significantly, all provided that the observation length is sufficiently large.
Our results quantify the tradeoff between the overlap of the generative process
PSDs, the noise variance, and the observation length. Finally, we present
numerical performance results for synthetic and real data.
Comment: IEEE International Symposium on Information Theory (ISIT), June 2015, to appear
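As a rough illustration of the KM idea summarized above, the following Python sketch clusters synthetic observations by the L1-distance between estimated PSDs, using farthest point initialization followed by one assignment step. It is a minimal sketch under stated assumptions, not the authors' implementation: the averaged-periodogram estimator, the segment length, the MA(1) test processes, the assumption that the number of clusters `k` is given, and all function names are hypothetical choices made for this example.

```python
import numpy as np

def estimate_psd(x, seg_len=64):
    """Averaged periodogram over non-overlapping segments (assumed estimator)."""
    n_seg = len(x) // seg_len
    segs = np.reshape(x[:n_seg * seg_len], (n_seg, seg_len))
    return np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0) / seg_len

def l1_distance_matrix(psds):
    """Pairwise L1 distances between PSD estimates, averaged over frequency bins."""
    return np.mean(np.abs(psds[:, None, :] - psds[None, :, :]), axis=2)

def farthest_point_centers(dist, k):
    """Start from observation 0, then repeatedly add the observation that is
    farthest from all centers chosen so far."""
    centers = [0]
    while len(centers) < k:
        min_dist = np.min(dist[:, centers], axis=1)
        centers.append(int(np.argmax(min_dist)))
    return centers

def single_step_kmeans(dist, k):
    """Assign every observation to its nearest farthest-point center."""
    centers = farthest_point_centers(dist, k)
    return np.argmin(dist[:, centers], axis=1)

rng = np.random.default_rng(0)

def ma1(sign, n=4096):
    """MA(1) process w[t] + sign * w[t-1]: low-pass for +1, high-pass for -1."""
    w = rng.standard_normal(n + 1)
    return w[1:] + sign * w[:-1]

# Five observations from each of two generative models (hypothetical test data).
observations = [ma1(+1.0) for _ in range(5)] + [ma1(-1.0) for _ in range(5)]
psds = np.array([estimate_psd(x) for x in observations])
labels = single_step_kmeans(l1_distance_matrix(psds), k=2)
print(labels)
```

The two test processes have nearly complementary spectra, so this example is far easier than the overlapping-PSD regime the abstract analyzes; it only shows the mechanics of the dissimilarity measure and the initialization.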
Subspace clustering of dimensionality-reduced data
Subspace clustering refers to the problem of clustering unlabeled
high-dimensional data points into a union of low-dimensional linear subspaces,
assumed unknown. In practice one may have access to dimensionality-reduced
observations of the data only, resulting, e.g., from "undersampling" due to
complexity and speed constraints on the acquisition device. More pertinently,
even if one has access to the high-dimensional data set it is often desirable
to first project the data points into a lower-dimensional space and to perform
the clustering task there; this reduces storage requirements and computational
cost. The purpose of this paper is to quantify the impact of
dimensionality reduction through random projection on the performance of the
sparse subspace clustering (SSC) and the thresholding-based subspace clustering
(TSC) algorithms. We find that for both algorithms dimensionality reduction
down to the order of the subspace dimensions is possible without incurring
significant performance degradation. The mathematical engine behind our
theorems is a result quantifying how the affinities between subspaces change
under random dimensionality-reducing projections.
Comment: ISIT 201
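The effect described in this abstract can be probed numerically with a small Python sketch. It draws points from two random low-dimensional subspaces, applies a Gaussian random projection, and checks how often each point's nearest neighbors (by absolute inner product, the quantity a thresholding-style method would rank) lie in the same subspace. Everything here is an assumption made for illustration: the dimensions, the Gaussian projection matrix, the neighbor count `q`, and the variable names. It covers only a neighbor-selection step, not the full SSC or TSC algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)
ambient_dim, reduced_dim, sub_dim, n_per, q = 200, 20, 3, 30, 3

def sample_subspace_points():
    """Draw n_per points from a random sub_dim-dimensional subspace."""
    basis, _ = np.linalg.qr(rng.standard_normal((ambient_dim, sub_dim)))
    return basis @ rng.standard_normal((sub_dim, n_per))

# Columns of X are data points; the first n_per belong to subspace 0.
X = np.hstack([sample_subspace_points(), sample_subspace_points()])
truth = np.repeat([0, 1], n_per)

# Gaussian random projection down to reduced_dim dimensions (assumed choice).
P = rng.standard_normal((reduced_dim, ambient_dim)) / np.sqrt(reduced_dim)
Y = P @ X
Y = Y / np.linalg.norm(Y, axis=0)  # normalize columns to unit length

# Rank neighbors by absolute inner product, excluding each point itself.
affinity = np.abs(Y.T @ Y)
np.fill_diagonal(affinity, 0.0)
neighbors = np.argsort(-affinity, axis=1)[:, :q]

# Fraction of selected neighbors that share the query point's subspace.
same_subspace_fraction = float(np.mean(truth[neighbors] == truth[:, None]))
print(same_subspace_fraction)
```

Lowering `reduced_dim` toward `sub_dim` in this sketch is one way to see where neighbor selection starts to degrade.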
The AutoProof Verifier: Usability by Non-Experts and on Standard Code
Formal verification tools are often developed by experts for experts; as a
result, their usability by programmers with little formal methods experience
may be severely limited. In this paper, we discuss this general phenomenon with
reference to AutoProof: a tool that can verify the full functional correctness
of object-oriented software. In particular, we present our experiences of using
AutoProof in two contrasting contexts representative of non-expert usage.
First, we discuss its usability by students in a graduate course on software
verification, who were tasked with verifying implementations of various sorting
algorithms. Second, we evaluate its usability in verifying code developed for
programming assignments of an undergraduate course. The first scenario
represents usability by serious non-experts; the second represents usability on
"standard code", developed without full functional verification in mind. We
report our experiences and lessons learnt, from which we derive some general
suggestions for furthering the development of verification tools with respect
to improving their usability.
Comment: In Proceedings F-IDE 2015, arXiv:1508.0338
Greedy Algorithms for Cone Constrained Optimization with Convergence Guarantees
Greedy optimization methods such as Matching Pursuit (MP) and Frank-Wolfe
(FW) algorithms have regained popularity in recent years due to their simplicity,
effectiveness and theoretical guarantees. MP and FW address optimization over
the linear span and the convex hull of a set of atoms, respectively. In this
paper, we consider the intermediate case of optimization over the convex cone,
parametrized as the conic hull of a generic atom set, leading to the first
principled definitions of non-negative MP algorithms for which we give explicit
convergence rates and demonstrate excellent empirical performance. In
particular, we derive sublinear ($\mathcal{O}(1/t)$) convergence on general
smooth and convex objectives, and linear convergence ($\mathcal{O}(e^{-t})$) on
strongly convex objectives, in both cases for general sets of atoms.
Furthermore, we establish a clear correspondence of our algorithms to known
algorithms from the MP and FW literature. Our novel algorithms and analyses
target general atom sets and general objective functions, and hence are
directly applicable to a large variety of learning settings.
Comment: NIPS 201
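To make the idea of greedy optimization over a conic hull concrete, here is a simplified Python sketch for a least-squares objective. At each step it picks the atom best aligned with the negative gradient and takes an exact line-search step with a non-negative coefficient, so the iterate stays in the cone. This is an illustrative sketch only: the quadratic objective, the stopping tolerance, and the function name `greedy_cone_least_squares` are assumptions, and the simplified update is not claimed to be the paper's algorithm or to carry its convergence guarantees.

```python
import numpy as np

def greedy_cone_least_squares(atoms, target, n_iter=100, tol=1e-12):
    """Greedily minimize 0.5 * ||x - target||^2 over the conic hull of the
    rows of `atoms`, using only non-negative steps along single atoms."""
    weights = np.zeros(len(atoms))
    x = np.zeros_like(target, dtype=float)
    for _ in range(n_iter):
        residual = target - x              # negative gradient of the objective
        scores = atoms @ residual          # alignment of each atom with it
        best = int(np.argmax(scores))
        if scores[best] <= tol:            # no atom gives a descent direction
            break
        # Exact line search along the chosen atom; the step is positive.
        step = scores[best] / np.dot(atoms[best], atoms[best])
        weights[best] += step
        x = x + step * atoms[best]
    return x, weights

# Atoms spanning the non-negative orthant in R^3 (hypothetical example).
atoms = np.eye(3)
target = np.array([1.0, 2.0, -1.0])
x, weights = greedy_cone_least_squares(atoms, target)
print(x, weights)
```

With orthogonal atoms the sketch reaches the projection of the target onto the cone in two steps. For correlated atoms, a scheme that can only increase weights may stall, which is one reason the choice of update directions matters in this setting.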