Anytime Hierarchical Clustering
We propose a new anytime hierarchical clustering method that iteratively
transforms an arbitrary initial hierarchy on the configuration of measurements
along a sequence of trees that, as we prove, must terminate for a fixed data set
in a chain of nested partitions satisfying a natural homogeneity requirement.
Each recursive step re-edits the tree so as to improve a local measure of
cluster homogeneity that is compatible with a number of commonly used (e.g.,
single, average, complete) linkage functions. As an alternative to the standard
batch algorithms, we present numerical evidence to suggest that appropriate
adaptations of this method can yield decentralized, scalable algorithms
suitable for distributed/parallel computation of clustering hierarchies and
online tracking of clustering trees applicable to large, dynamically changing
databases and anomaly detection.
Comment: 13 pages, 6 figures, 5 tables, in preparation for submission to a
conference.
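
For reference, the three linkage functions named in the abstract above can be sketched in a few lines. This is only an illustration of the standard definitions on Euclidean points, not the anytime method the paper proposes; the function names and the sample clusters are assumptions made for the example.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Smallest pairwise distance between clusters a and b."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_linkage(a, b):
    """Largest pairwise distance between clusters a and b."""
    return max(dist(p, q) for p, q in product(a, b))


def average_linkage(a, b):
    """Mean of all pairwise distances between clusters a and b."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


# Hypothetical sample clusters, chosen only to exercise the functions.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

single = single_linkage(cluster_a, cluster_b)
average = average_linkage(cluster_a, cluster_b)
complete = complete_linkage(cluster_a, cluster_b)
```

For any pair of clusters, the single-linkage value is never larger than the average, and the average is never larger than the complete-linkage value.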
Probabilistic Sparse Subspace Clustering Using Delayed Association
Discovering and clustering subspaces in high-dimensional data is a
fundamental problem of machine learning with a wide range of applications in
data mining, computer vision, and pattern recognition. Earlier methods divided
the problem into two separate stages of finding the similarity matrix and
finding clusters. Similar to some recent works, we integrate these two steps
using a joint optimization approach. We make the following contributions: (i)
we estimate the reliability of the cluster assignment for each point before
assigning a point to a subspace. We divide the data points into two groups,
"certain" and "uncertain", with the assignment of the latter group delayed until
their subspace association certainty improves. (ii) We demonstrate that delayed
association is better suited for clustering subspaces that have ambiguities,
i.e. when subspaces intersect or data are contaminated with outliers/noise.
(iii) We demonstrate experimentally that such delayed probabilistic association
leads to a more accurate self-representation and final clusters. The proposed
method has higher accuracy both for points that exclusively lie in one
subspace, and those that are on the intersection of subspaces. (iv) We show
that delayed association leads to a large reduction in computational cost, since
it allows for incremental spectral clustering.
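
The "certain" versus "uncertain" split described in the abstract above can be illustrated with a small sketch. The abstract does not say how reliability is estimated, so everything here is an assumption: the input is a hypothetical table of per-point membership scores, and a point counts as "certain" when its best score beats the runner-up by a hypothetical `margin`.

```python
def split_by_certainty(scores, margin=0.3):
    """Split points into certain assignments and a deferred, uncertain set.

    scores: one row per point, one membership score per subspace
            (hypothetical input; at least two subspaces per row).
    margin: assumed gap between the two best scores needed to commit.
    Returns (certain, uncertain), where certain maps point index to
    subspace index and uncertain lists the deferred point indices.
    """
    certain, uncertain = {}, []
    for i, row in enumerate(scores):
        # Rank subspace indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        best, second = ranked[0], ranked[1]
        if row[best] - row[second] >= margin:
            certain[i] = best        # commit this point now
        else:
            uncertain.append(i)      # revisit once scores are updated
    return certain, uncertain


# Hypothetical scores for three points and two subspaces.
example_scores = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
certain, uncertain = split_by_certainty(example_scores)
```

In an iterative scheme, the uncertain points would be scored again after the model is refit on the certain ones, and the split repeated until no point remains deferred.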
Face Identification and Clustering
In this thesis, we study two problems based on clustering algorithms. In the
first problem, we study the role of visual attributes, using an agglomerative
clustering algorithm to narrow the search area when the number of
classes is high and thereby improve clustering performance. We observe that as we
add more attributes, the clustering performance increases overall. In the
second problem, we study the role of clustering in aggregating templates in a
1:N open set protocol using multi-shot video as a probe. We observe that by
increasing the number of clusters, the performance increases with respect to
the baseline and reaches a peak, after which increasing the number of clusters
causes the performance to degrade. Experiments are conducted using the recently
introduced unconstrained IARPA Janus IJB-A, CS2, and CS3 face recognition
datasets.