
    Greedy MAXCUT Algorithms and their Information Content

    MAXCUT is a classical NP-hard graph partitioning problem and serves as a typical case of the symmetric non-monotone Unconstrained Submodular Maximization (USM) problem. Applications of MAXCUT are abundant in machine learning, computer vision and statistical physics. Greedy algorithms to approximately solve MAXCUT rely on greedy vertex labelling or on an edge contraction strategy. These algorithms have been studied through their worst-case approximation ratios, but very little is known about their robustness to noise contamination of the input data in the average case. Adapting the framework of Approximation Set Coding, we present a method to exactly measure the cardinality of the algorithmic approximation sets of five greedy MAXCUT algorithms. Their information contents are explored for graph instances generated by two different noise models: the edge reversal model and the Gaussian edge weights model. The results provide insights into the robustness of different greedy heuristics and techniques for MAXCUT, which can be used for algorithm design of general USM problems. Comment: This is a longer version of the paper published in the 2015 IEEE Information Theory Workshop (ITW).
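
    A minimal Python sketch of the greedy vertex-labelling strategy mentioned above: each vertex is scanned in a fixed order and placed on the side opposite the heavier (already-placed) half of its neighbourhood, so those edges are cut. The representation and scan order are illustrative assumptions, not the paper's exact algorithms.

        from collections import defaultdict

        def greedy_maxcut(n, edges):
            """Greedy vertex labelling for MAXCUT.

            n: number of vertices, labelled 0..n-1
            edges: list of (u, v, weight) tuples
            """
            adj = defaultdict(list)
            for u, v, w in edges:
                adj[u].append((v, w))
                adj[v].append((u, w))

            side = {}  # vertex -> 0 or 1
            for v in range(n):  # fixed scan order (one of several conceivable)
                # Total edge weight from v to already-placed vertices on each side.
                gain = [0.0, 0.0]
                for u, w in adj[v]:
                    if u in side:
                        gain[side[u]] += w
                # Place v opposite the heavier side, cutting those edges.
                side[v] = 1 if gain[0] > gain[1] else 0

            cut = sum(w for u, v, w in edges if side[u] != side[v])
            return side, cut

        # Example: on a 4-cycle the greedy labelling recovers the optimal cut of 4.
        print(greedy_maxcut(4, [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]))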

    Exponential Time Complexity of the Permanent and the Tutte Polynomial

    We show conditional lower bounds for well-studied #P-hard problems: (a) The number of satisfying assignments of a 2-CNF formula with n variables cannot be counted in time exp(o(n)), and the same is true for computing the number of all independent sets in an n-vertex graph. (b) The permanent of an n x n matrix with entries 0 and 1 cannot be computed in time exp(o(n)). (c) The Tutte polynomial of an n-vertex graph cannot be computed in time exp(o(n)) at most evaluation points (x,y) in the case of multigraphs, and it cannot be computed in time exp(o(n/polylog n)) in the case of simple graphs. Our lower bounds are relative to (variants of) the Exponential Time Hypothesis (ETH), which says that the satisfiability of n-variable 3-CNF formulas cannot be decided in time exp(o(n)). We relax this hypothesis by introducing its counting version #ETH, namely that the satisfying assignments cannot be counted in time exp(o(n)). In order to use #ETH for our lower bounds, we transfer the sparsification lemma for d-CNF formulas to the counting setting.
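
    For context, the 0-1 permanent in (b) has a classical exp(O(n))-time algorithm, Ryser's inclusion-exclusion formula, so the exp(o(n)) lower bound says this exponential baseline is essentially the best possible. A small Python sketch of that standard formula (not part of the paper itself):

        from itertools import combinations

        def permanent(A):
            """Permanent via Ryser's formula:
            per(A) = (-1)^n * sum over nonempty column subsets S of
                     (-1)^|S| * prod_i sum_{j in S} A[i][j],
            which runs in O(2^n * n^2) time."""
            n = len(A)
            total = 0
            for k in range(1, n + 1):
                for cols in combinations(range(n), k):
                    prod = 1
                    for row in A:
                        prod *= sum(row[j] for j in cols)
                    total += (-1) ** k * prod
            return (-1) ** n * total

        # Example: the all-ones 3x3 matrix has permanent 3! = 6.
        print(permanent([[1, 1, 1], [1, 1, 1], [1, 1, 1]]))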

    The Metric Nearness Problem

    Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
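
    A stripped-down Python sketch of the triangle-fixing idea: cyclically project the dissimilarity matrix onto each violated triangle-inequality halfspace, spreading the violation equally over the three involved edges (the l2 projection). The paper's actual algorithm adds Dykstra-style correction terms to guarantee convergence to the exact nearest metric; this simplified loop only illustrates the projection step.

        import itertools
        import numpy as np

        def triangle_fix(D, max_iters=100, tol=1e-8):
            """Naive l2 triangle fixing on a symmetric dissimilarity matrix D."""
            D = D.astype(float).copy()
            n = D.shape[0]
            for _ in range(max_iters):
                worst = 0.0
                for i, j, k in itertools.permutations(range(n), 3):
                    v = D[i, j] - D[i, k] - D[k, j]  # violation of d_ij <= d_ik + d_kj
                    if v > tol:
                        worst = max(worst, v)
                        # l2 projection onto the halfspace {d_ij - d_ik - d_kj <= 0}.
                        D[i, j] -= v / 3
                        D[i, k] += v / 3
                        D[k, j] += v / 3
                        D[j, i], D[k, i], D[j, k] = D[i, j], D[i, k], D[k, j]
                if worst <= tol:
                    break
            return D

        # Example: d(0,2) = 10 badly violates the triangle inequality through vertex 1.
        D = np.array([[0, 1, 10], [1, 0, 2], [10, 2, 0]])
        print(triangle_fix(D))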

    Clustering is difficult only when it does not matter

    Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets that can be clustered well. More generally, despite the ubiquity and the great importance of clustering, we still do not have a satisfactory mathematical theory of clustering. In order to properly understand clustering, it is clearly necessary to develop a solid theoretical basis for the area. For example, from the perspective of computational complexity theory the clustering problem seems very hard. Numerous papers introduce various criteria and numerical measures to quantify the quality of a given clustering. The resulting conclusions are pessimistic, since it is computationally difficult to find an optimal clustering of a given data set, if we go by any of these popular criteria. In contrast, the practitioners' perspective is much more optimistic. Our explanation for this disparity of opinions is that complexity theory concentrates on the worst case, whereas in reality we only care for data sets that can be clustered well. We introduce a theoretical framework of clustering in metric spaces that revolves around a notion of "good clustering". We show that if a good clustering exists, then in many cases it can be efficiently found. Our conclusion is that contrary to popular belief, clustering should not be considered a hard task.

    Co-Degeneracy and Co-Treewidth: Using the Complement to Solve Dense Instances


    Phase transitions of extremal cuts for the configuration model

    The k-section width and the Max-Cut for the configuration model are shown to exhibit phase transitions according to the values of certain parameters of the asymptotic degree distribution. These transitions mirror those observed on Erdős–Rényi random graphs, established by Luczak and McDiarmid (2001), and Coppersmith et al. (2004), respectively.
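
    A small empirical companion in Python, assuming networkx is available: sample a configuration-model graph for a chosen degree sequence and estimate its Max-Cut with a simple local-search heuristic. This is only an illustration of the objects studied above, not the paper's analytical methods.

        import random
        import networkx as nx

        def local_search_cut(G, sweeps=100, seed=0):
            """Estimate Max-Cut by local search: flip any vertex whose move
            strictly increases the number of cut edges."""
            rng = random.Random(seed)
            side = {v: rng.randint(0, 1) for v in G}
            for _ in range(sweeps):
                improved = False
                for v in G:
                    same = sum(1 for u in G[v] if side[u] == side[v])
                    if 2 * same > G.degree(v):  # flipping v gains 2*same - deg(v)
                        side[v] ^= 1
                        improved = True
                if not improved:
                    break
            return sum(1 for u, v in G.edges() if side[u] != side[v])

        # 3-regular degree sequence on 100 vertices; collapse multi-edges and loops.
        G = nx.Graph(nx.configuration_model([3] * 100, seed=42))
        G.remove_edges_from(nx.selfloop_edges(G))
        print(local_search_cut(G), "of", G.number_of_edges(), "edges cut")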