Greedy MAXCUT Algorithms and their Information Content
MAXCUT defines a classical NP-hard problem for graph partitioning and it
serves as a typical case of the symmetric non-monotone Unconstrained Submodular
Maximization (USM) problem. Applications of MAXCUT are abundant in machine
learning, computer vision and statistical physics. Greedy algorithms to
approximately solve MAXCUT rely on greedy vertex labelling or on an edge
contraction strategy. These algorithms have been studied by measuring their
approximation ratios in the worst-case setting, but very little is known about
their robustness to noise contamination of the input data in the
average case. Adapting the framework of Approximation Set Coding, we present a
method to exactly measure the cardinality of the algorithmic approximation sets
of five greedy MAXCUT algorithms. Their information contents are explored for
graph instances generated by two different noise models: the edge reversal
model and the Gaussian edge weights model. The results provide insights into the
robustness of different greedy heuristics and techniques for MAXCUT, which can
be used for algorithm design of general USM problems.
Comment: This is a longer version of the paper published in 2015 IEEE
Information Theory Workshop (ITW).
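To make "greedy vertex labelling" concrete, here is a minimal sketch of one simple heuristic of that kind. It is an illustration only and is not claimed to be any of the five algorithms studied in the paper; the function name `greedy_maxcut` and the edge-dictionary input format are assumptions made for this example.

```python
def greedy_maxcut(n, edges):
    """Greedy vertex labelling for MAXCUT (illustrative sketch).

    n     -- number of vertices, labelled 0..n-1 (assumed input format)
    edges -- dict mapping (u, v) pairs to nonnegative edge weights
    Returns (side, cut_weight), where side[v] is 0 or 1.
    """
    # Build an adjacency list so each vertex can see its weighted neighbours.
    adj = {v: [] for v in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))

    side = {}
    for v in range(n):
        # Total weight from v to already-labelled neighbours on each side.
        weight_to = [0.0, 0.0]
        for u, w in adj[v]:
            if u in side:
                weight_to[side[u]] += w
        # Placing v on side 0 cuts the edges to side 1, and vice versa,
        # so v goes opposite the side it is more heavily connected to.
        side[v] = 0 if weight_to[1] >= weight_to[0] else 1

    cut_weight = sum(w for (u, v), w in edges.items() if side[u] != side[v])
    return side, cut_weight


if __name__ == "__main__":
    triangle = {(0, 1): 1, (1, 2): 1, (0, 2): 1}
    print(greedy_maxcut(3, triangle))
```

With nonnegative weights, each vertex cuts at least half of the weight to its already-labelled neighbours, so the returned cut is at least half of the total edge weight.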
Exponential Time Complexity of the Permanent and the Tutte Polynomial
We show conditional lower bounds for well-studied #P-hard problems:
(a) The number of satisfying assignments of a 2-CNF formula with n variables
cannot be counted in time exp(o(n)), and the same is true for computing the
number of all independent sets in an n-vertex graph.
(b) The permanent of an n x n matrix with entries 0 and 1 cannot be computed
in time exp(o(n)).
(c) The Tutte polynomial of an n-vertex graph cannot be computed in time
exp(o(n)) at most evaluation points (x,y) in the case of multigraphs, and it
cannot be computed in time exp(o(n/polylog n)) in the case of simple graphs.
Our lower bounds are relative to (variants of) the Exponential Time
Hypothesis (ETH), which says that the satisfiability of n-variable 3-CNF
formulas cannot be decided in time exp(o(n)). We relax this hypothesis by
introducing its counting version #ETH, namely that the satisfying assignments
cannot be counted in time exp(o(n)). In order to use #ETH for our lower bounds,
we transfer the sparsification lemma for d-CNF formulas to the counting
setting.
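For contrast with the lower bound in (a), the following sketch counts independent sets by plain enumeration of all 2^n vertex subsets. It is not taken from the paper; it only illustrates the trivial exponential-time upper bound that a bound of the form exp(o(n)) rules out improving on substantially. The function name and the edge-list input are assumptions made for this example.

```python
def count_independent_sets(n, edges):
    """Count independent sets of an n-vertex graph by brute force.

    n     -- number of vertices, labelled 0..n-1 (assumed input format)
    edges -- iterable of (u, v) pairs
    Runs in time O(2^n * n); the empty set is counted as independent.
    """
    # neighbours[v] is a bitmask of the vertices adjacent to v.
    neighbours = [0] * n
    for u, v in edges:
        neighbours[u] |= 1 << v
        neighbours[v] |= 1 << u

    count = 0
    for mask in range(1 << n):
        independent = True
        for v in range(n):
            # v is chosen and one of its neighbours is chosen as well.
            if ((mask >> v) & 1) and (neighbours[v] & mask):
                independent = False
                break
        if independent:
            count += 1
    return count


if __name__ == "__main__":
    path = [(0, 1), (1, 2)]
    print(count_independent_sets(3, path))
```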
The Metric Nearness Problem
Metric nearness refers to the problem of optimally restoring metric properties to
distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric
data can be important in various settings, for example, in clustering, classification, metric-based
indexing, query processing, and graph theoretic approximation algorithms. This paper formulates
and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set
of distances that satisfy the properties of a metric—principally the triangle inequality. For solving
this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative
projection method. An intriguing aspect of the metric nearness problem is that a special case turns
out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and
develops a new algorithm for the latter problem using a primal-dual method. Applications to graph
clustering are provided as an illustration. We include experiments that demonstrate the computational
superiority of triangle fixing over general purpose convex programming software. Finally, we
conclude by suggesting various useful extensions and generalizations to metric nearness.
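The link to all pairs shortest paths can be sketched as follows, assuming the special case in question is the "decrease-only" variant, in which distances may only be lowered; the abstract does not name the variant, so this is an assumption. Under that assumption, replacing each dissimilarity by the shortest-path length between its endpoints yields a matrix that satisfies the triangle inequality. The code uses the textbook Floyd-Warshall recurrence, not the paper's primal-dual algorithm, and the function names are hypothetical.

```python
def decrease_only_metric(D):
    """Shortest-path closure of a symmetric dissimilarity matrix.

    D -- square list of lists with nonnegative entries and zero diagonal
    Returns a new matrix M with M[i][j] <= D[i][j] that satisfies
    the triangle inequality (Floyd-Warshall recurrence).
    """
    n = len(D)
    M = [row[:] for row in D]  # copy so the input is left untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = M[i][k] + M[k][j]
                if via_k < M[i][j]:
                    M[i][j] = via_k
    return M


def satisfies_triangle_inequality(M, tol=1e-12):
    """Check M[i][j] <= M[i][k] + M[k][j] for all index triples."""
    n = len(M)
    return all(
        M[i][j] <= M[i][k] + M[k][j] + tol
        for i in range(n)
        for j in range(n)
        for k in range(n)
    )


if __name__ == "__main__":
    D = [[0, 1, 5], [1, 0, 1], [5, 1, 0]]  # 5 > 1 + 1 violates the inequality
    print(decrease_only_metric(D))
```

In this example only the entry 5 is changed, to 2, since the route through the middle point has length 1 + 1.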
Clustering is difficult only when it does not matter
Numerous papers ask how difficult it is to cluster data. We suggest that the
more relevant and interesting question is how difficult it is to cluster data
sets that can be clustered well. More generally, despite the ubiquity and
the great importance of clustering, we still do not have a satisfactory
mathematical theory of clustering. In order to properly understand clustering,
it is clearly necessary to develop a solid theoretical basis for the area. For
example, from the perspective of computational complexity theory the clustering
problem seems very hard. Numerous papers introduce various criteria and
numerical measures to quantify the quality of a given clustering. The resulting
conclusions are pessimistic, since it is computationally difficult to find an
optimal clustering of a given data set, if we go by any of these popular
criteria. In contrast, the practitioners' perspective is much more optimistic.
Our explanation for this disparity of opinions is that complexity theory
concentrates on the worst case, whereas in reality we only care for data sets
that can be clustered well.
We introduce a theoretical framework of clustering in metric spaces that
revolves around a notion of "good clustering". We show that if a good
clustering exists, then in many cases it can be efficiently found. Our
conclusion is that contrary to popular belief, clustering should not be
considered a hard task.
Phase transitions of extremal cuts for the configuration model
The k-section width and the Max-Cut for the configuration model are shown
to exhibit phase transitions according to the values of certain parameters of
the asymptotic degree distribution. These transitions mirror those observed on
Erdős-Rényi random graphs, established by Luczak and McDiarmid (2001),
and Coppersmith et al. (2004), respectively.