
    Robust Correlation Clustering

    In this paper, we introduce and study the Robust-Correlation-Clustering problem: given a graph G = (V,E) where every edge is labeled either + or - (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices and partition the remaining vertices V \ D into clusters so as to minimize the cost of the clustering, which is the number of + edges with end-points in different clusters plus the number of - edges with end-points in the same cluster. This generalizes the classical Correlation-Clustering problem, which is the special case m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and Robust-Correlation-Clustering equips this model with the capability to handle noise in datasets. In this work, we present a constant-factor bi-criteria algorithm for Robust-Correlation-Clustering on complete graphs (our solution is O(1)-approximate with respect to the cost while discarding O(m) points as outliers), and complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is very simple: it first performs an LP-based pre-processing step to delete O(m) vertices, and then runs a particular Correlation-Clustering algorithm, ACNAlg [Ailon et al., 2005], on the residual instance. We then consider general graphs, and show (O(log n), O(log^2 n)) bi-criteria algorithms while also showing a hardness of alpha_MC on both the cost and the outlier violation, where alpha_MC is the lower bound for the Minimum-Multicut problem.
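
The cost defined above can be made concrete with a small sketch. The function name `robust_cc_cost`, the dictionary encoding of edge labels, and the toy graph are illustrative assumptions, not taken from the paper. The code only evaluates the objective for a given deletion set and clustering; it does not implement the paper's algorithm.

```python
def robust_cc_cost(labels, clusters, deleted=frozenset()):
    """Cost of a clustering after deleting the vertex set `deleted`.

    labels:   dict mapping an edge (u, v) to '+' or '-'   (hypothetical encoding)
    clusters: dict mapping each surviving vertex to a cluster id
    deleted:  set D of removed vertices; edges touching D are ignored
    """
    cost = 0
    for (u, v), sign in labels.items():
        if u in deleted or v in deleted:
            continue
        same = clusters[u] == clusters[v]
        if sign == '+' and not same:
            cost += 1  # similar pair split across clusters
        elif sign == '-' and same:
            cost += 1  # dissimilar pair placed together
    return cost


# Toy complete graph on {a, b, c, d}: d behaves like a noisy point.
labels = {
    ('a', 'b'): '+', ('a', 'c'): '+', ('b', 'c'): '+',
    ('a', 'd'): '+', ('b', 'd'): '-', ('c', 'd'): '-',
}
one_cluster = {'a': 0, 'b': 0, 'c': 0, 'd': 0}
d_alone = {'a': 0, 'b': 0, 'c': 0, 'd': 1}
without_d = {'a': 0, 'b': 0, 'c': 0}
```

On this toy instance, putting everything in one cluster costs 2, isolating d costs 1, and deleting d (m = 1) brings the cost to 0, which is the kind of gain the outlier budget is meant to capture.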

    Overlapping and Robust Edge-Colored Clustering in Hypergraphs

    A recent trend in data mining has explored (hyper)graph clustering algorithms for data with categorical relationship types. Such algorithms have applications in the analysis of social, co-authorship, and protein interaction networks, to name a few. Many such applications naturally have some overlap between clusters, a nuance which is missing from current combinatorial models. Additionally, existing models lack a mechanism for handling noise in datasets. We address these concerns by generalizing Edge-Colored Clustering, a recent framework for categorical clustering of hypergraphs. Our generalizations allow for a budgeted number of either (a) overlapping cluster assignments or (b) node deletions. For each new model we present a greedy algorithm which approximately minimizes an edge mistake objective, as well as bicriteria approximations where the second approximation factor is on the budget. Additionally, we address the parameterized complexity of each problem, providing FPT algorithms and hardness results.
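
The abstract does not spell out the edge mistake objective. The sketch below assumes the usual Edge-Colored Clustering convention: every hyperedge carries a color, every node is assigned one color, and a hyperedge counts as a mistake unless all of its nodes received that hyperedge's color. The function name `ecc_mistakes` and the data layout are hypothetical, and the overlapping and node-deletion variants are not modeled.

```python
def ecc_mistakes(hyperedges, node_color):
    """Count hyperedges whose nodes are not all assigned the edge's color.

    hyperedges: list of (color, nodes) pairs   (hypothetical encoding)
    node_color: dict mapping each node to its single assigned color
    """
    return sum(
        1
        for color, nodes in hyperedges
        if any(node_color[v] != color for v in nodes)
    )


hyperedges = [
    ("red", {1, 2, 3}),
    ("blue", {3, 4}),
    ("red", {1, 2}),
]
node_color = {1: "red", 2: "red", 3: "red", 4: "blue"}
```

Here only the blue hyperedge {3, 4} is a mistake, because node 3 is assigned red.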

    Local Guarantees in Graph Cuts and Clustering

    Correlation Clustering is an elegant model that captures fundamental graph cut problems such as Min s-t Cut, Multiway Cut, and Multicut, extensively studied in combinatorial optimization. Here, we are given a graph with edges labeled + or - and the goal is to produce a clustering that agrees with the labels as much as possible: + edges within clusters and - edges across clusters. The classical approach towards Correlation Clustering (and other graph cut problems) is to optimize a global objective. We depart from this and study local objectives: minimizing the maximum number of disagreements for edges incident on a single node, and the analogous max min agreements objective. This naturally gives rise to a family of basic min-max graph cut problems. A prototypical representative is Min Max s-t Cut: find an s-t cut minimizing the largest number of cut edges incident on any node. We present the following results: (1) an O(√n)-approximation for the problem of minimizing the maximum total weight of disagreement edges incident on any node (thus providing the first known approximation for the above family of min-max graph cut problems), (2) a remarkably simple 7-approximation for minimizing local disagreements in complete graphs (improving upon the previous best known approximation of 48), and (3) a 1/(2+ε)-approximation for maximizing the minimum total weight of agreement edges incident on any node, hence improving upon the 1/(4+ε)-approximation that follows from the study of approximate pure Nash equilibria in cut and party affiliation games.
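
To illustrate the local disagreement objective described above, here is a minimal sketch that evaluates it for a fixed clustering in the unweighted case. The function name `max_local_disagreements` and the edge encoding are assumptions made for illustration; none of the paper's approximation algorithms is implemented.

```python
from collections import defaultdict


def max_local_disagreements(labels, clusters):
    """Largest number of disagreeing edges incident on any single node.

    labels:   dict mapping an edge (u, v) to '+' or '-'   (hypothetical encoding)
    clusters: dict mapping each vertex to a cluster id
    """
    per_node = defaultdict(int)
    for (u, v), sign in labels.items():
        same = clusters[u] == clusters[v]
        # A '+' edge disagrees when cut; a '-' edge disagrees when uncut.
        if (sign == '+') != same:
            per_node[u] += 1
            per_node[v] += 1
    return max(per_node.values(), default=0)


labels = {
    ('a', 'b'): '+', ('a', 'c'): '+', ('b', 'c'): '+',
    ('a', 'd'): '+', ('b', 'd'): '-', ('c', 'd'): '-',
}
one_cluster = {'a': 0, 'b': 0, 'c': 0, 'd': 0}
d_alone = {'a': 0, 'b': 0, 'c': 0, 'd': 1}
```

With a single cluster, node d carries both disagreements, so the local objective is 2. Isolating d leaves one disagreement shared by a and d, so the local objective drops to 1.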

    Centrality of Trees for Capacitated k-Center

    There is a large discrepancy in our understanding of uncapacitated and capacitated versions of network location problems. This is perhaps best illustrated by the classical k-center problem: there is a simple tight 2-approximation algorithm for the uncapacitated version whereas the first constant factor approximation algorithm for the general version with capacities was only recently obtained by using an intricate rounding algorithm that achieves an approximation guarantee in the hundreds. Our paper aims to bridge this discrepancy. For the capacitated k-center problem, we give a simple algorithm with a clean analysis that allows us to prove an approximation guarantee of 9. It uses the standard LP relaxation and comes close to settling the integrality gap (after necessary preprocessing), which is narrowed down to either 7, 8 or 9. The algorithm proceeds by first reducing to special tree instances, and then solves such instances optimally. Our concept of tree instances is quite versatile, and applies to natural variants of the capacitated k-center problem for which we also obtain improved algorithms. Finally, we give evidence to show that more powerful preprocessing could lead to better algorithms, by giving an approximation algorithm that beats the integrality gap for instances where all non-zero capacities are uniform. Comment: 21 pages, 2 figures
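
The abstract does not say which 2-approximation it has in mind for the uncapacitated problem. One standard choice is farthest-first traversal (usually attributed to Gonzalez), sketched below as an assumption. The function name `farthest_first_centers` and the one-dimensional example are illustrative only, and the sketch does not touch the capacitated algorithm of the paper.

```python
def farthest_first_centers(points, k, dist):
    """Greedy farthest-first traversal for uncapacitated k-center.

    Assumes `points` is non-empty, k >= 1, and `dist` is a metric.
    Returns the chosen centers and the resulting covering radius.
    """
    centers = [points[0]]  # arbitrary first center
    # d[j] = distance from points[j] to its nearest chosen center
    d = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Next center: the point currently farthest from all centers.
        i = max(range(len(points)), key=d.__getitem__)
        centers.append(points[i])
        d = [min(d[j], dist(points[j], points[i])) for j in range(len(points))]
    return centers, max(d)


points = [0, 1, 10, 11]
centers, radius = farthest_first_centers(points, 2, lambda a, b: abs(a - b))
```

On the four points above with k = 2, the traversal picks 0 and then 11, giving radius 1, which matches the optimum for this instance.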