21,382 research outputs found
The Metric Nearness Problem
Metric nearness refers to the problem of optimally restoring metric properties to
distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric
data can be important in various settings, for example, in clustering, classification, metric-based
indexing, query processing, and graph theoretic approximation algorithms. This paper formulates
and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a ānearestā set
of distances that satisfy the properties of a metricāprincipally the triangle inequality. For solving
this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative
projection method. An intriguing aspect of the metric nearness problem is that a special case turns
out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and
develops a new algorithm for the latter problem using a primal-dual method. Applications to graph
clustering are provided as an illustration. We include experiments that demonstrate the computational
superiority of triangle fixing over general purpose convex programming software. Finally, we
conclude by suggesting various useful extensions and generalizations to metric nearness
Beyond pairwise clustering
We consider the problem of clustering in domains where the affinity relations are not dyadic (pairwise), but rather triadic, tetradic or higher. The problem is an instance of the hypergraph partitioning problem. We propose a two-step algorithm for solving this problem. In the first step we use a novel scheme to approximate the hypergraph using a weighted graph. In the second step a spectral partitioning algorithm is used to partition the vertices of this graph. The algorithm is capable of handling hyperedges of all orders including order two, thus incorporating information of all orders simultaneously. We present a theoretical analysis that relates our algorithm to an existing hypergraph partitioning algorithm and explain the reasons for its superior performance. We report the performance of our algorithm on a variety of computer vision problems and compare it to several existing hypergraph partitioning algorithms
Correlation Clustering with Low-Rank Matrices
Correlation clustering is a technique for aggregating data based on
qualitative information about which pairs of objects are labeled 'similar' or
'dissimilar.' Because the optimization problem is NP-hard, much of the previous
literature focuses on finding approximation algorithms. In this paper we
explore how to solve the correlation clustering objective exactly when the data
to be clustered can be represented by a low-rank matrix. We prove in particular
that correlation clustering can be solved in polynomial time when the
underlying matrix is positive semidefinite with small constant rank, but that
the task remains NP-hard in the presence of even one negative eigenvalue. Based
on our theoretical results, we develop an algorithm for efficiently "solving"
low-rank positive semidefinite correlation clustering by employing a procedure
for zonotope vertex enumeration. We demonstrate the effectiveness and speed of
our algorithm by using it to solve several clustering problems on both
synthetic and real-world data
- ā¦