12 research outputs found

    Online Correlation Clustering

    We study the online clustering problem where data items arrive in an online fashion. The algorithm maintains a clustering of data items into similarity classes. Upon arrival of v, the relation between v and previously arrived items is revealed, so that for each earlier item u we are told whether v is similar to u. The algorithm can create a new cluster for v and merge existing clusters. When the objective is to minimize disagreements between the clustering and the input, we prove that a natural greedy algorithm is O(n)-competitive, and this is optimal. When the objective is to maximize agreements between the clustering and the input, we prove that the greedy algorithm is 0.5-competitive and that no online algorithm can be better than 0.834-competitive. We also prove that it is possible to do better than 1/2, by exhibiting a randomized algorithm with competitive ratio 0.5+c for a small positive fixed constant c. Comment: 12 pages, 1 figure
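
The abstract does not spell out the greedy rule, so the following Python sketch is only one plausible reading, not the authors' algorithm: each arriving item starts as a singleton, and two clusters are merged whenever the similar pairs between them outnumber the dissimilar ones. The names `gain`, `greedy_online`, and `disagreements` are hypothetical.

```python
from itertools import combinations


def gain(c, d, similar):
    """Similar pairs minus dissimilar pairs between clusters c and d."""
    plus = sum(1 for u in c for w in d if frozenset((u, w)) in similar)
    return plus - (len(c) * len(d) - plus)


def greedy_online(arrivals):
    """arrivals: iterable of (v, set of earlier items similar to v).

    Assumed rule (not taken from the paper): put v in a new singleton,
    then merge any two clusters whose merge has strictly positive gain.
    """
    clusters, similar = [], set()
    for v, neighbours in arrivals:
        similar.update(frozenset((v, u)) for u in neighbours)
        clusters.append({v})
        merged = True
        while merged:  # each merge removes one cluster, so this terminates
            merged = False
            for c, d in combinations(clusters, 2):
                if gain(c, d, similar) > 0:
                    clusters.remove(c)
                    clusters.remove(d)
                    clusters.append(c | d)
                    merged = True
                    break
    return clusters, similar


def disagreements(clusters, similar):
    """Pairs on which the clustering and the similarity input disagree."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(
        (label[u] == label[w]) != (frozenset((u, w)) in similar)
        for u, w in combinations(label, 2)
    )
```

For instance, with arrivals `a`, then `b` similar to `a`, then `c` similar to both, then `d` similar to none, the sketch ends with the clusters `{a, b, c}` and `{d}` and zero disagreements.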

    Learning and Testing Variable Partitions

    Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1)+\dots+F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$-partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular we show that: 1. Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly}(1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$ learning a partition of cost $\delta + \epsilon$ is NP-hard. 2. When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$. 3. When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$. This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context. Comment: Innovations in Theoretical Computer Science (ITCS) 202
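
To make the definition concrete, here is a small Python sketch of the exact case only (cost 0, two parts, integer-valued function), which is not one of the paper's algorithms. It relies on the elementary fact that $F(x, y) = F_1(x) + F_2(y)$ holds for some $F_1, F_2$ exactly when $F(x, y) + F(x_0, y_0) = F(x, y_0) + F(x_0, y)$ for every input and a fixed base point $(x_0, y_0)$. The function name and its arguments are hypothetical, and the brute-force loop is only meant for tiny domains.

```python
from itertools import product


def is_exact_2_partition(F, sigma, n, X):
    """Check whether F splits exactly as F1(vars in X) + F2(other vars).

    F     : callable on n-tuples over sigma, returning integers
    sigma : finite alphabet (sequence); sigma[0] serves as the base value
    X     : set of variable indices forming the first part
    """
    base = tuple(sigma[0] for _ in range(n))
    for x in product(sigma, repeat=n):
        # Keep the X-coordinates of x and reset the rest to the base point.
        x_only = tuple(x[i] if i in X else base[i] for i in range(n))
        # Keep the other coordinates of x and reset the X-coordinates.
        y_only = tuple(base[i] if i in X else x[i] for i in range(n))
        if F(x) + F(base) != F(x_only) + F(y_only):
            return False
    return True
```

For example, with `F = lambda x: x[0] * x[1] + x[2]` over `sigma = (0, 1, 2)`, the split `X = {0, 1}` passes the check, while `X = {0}` fails because the first two variables interact.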

    Faster graph algorithms via switching classes

    2012 Summer. Includes bibliographical references. The runtime of an algorithm is intimately related to how an instance is represented. Recall that the runtimes of the first generation of graph algorithms were expressed as functions of n := |V|. This analysis was natural since at this time graphs were represented in n^2 space via their adjacency matrix. It was soon noticed that if m := |E| = o(n^2), then a variety of graph algorithms could be sped up by computing the adjacency list from the adjacency matrix, then running the algorithm on the more efficient adjacency-list representation. This motivated the introduction of m to the runtime of graph algorithms, and it is now customary in algorithm design to assume that a graph instance is given in the form of its adjacency list. For instance, a graph algorithm is not considered to run in linear time unless it runs in O(n + m) time. An O(n^2) bound is not considered linear, even though the two bounds are the same in the worst case. Let m͂ be the size of the minimum representative of a graph G's switching class (w.r.t. some switching operation). It is shown that better bounds for several classical graph algorithms can be obtained by modifying them so that their running time is a function of n+m͂ rather than of n+m. This is significant because m͂ is O(m) but m is not O(m͂). This is accomplished by first computing the so-called partially complemented adjacency list (pc-list) from an adjacency list, then designing an algorithm that is amenable to the more efficient pc-list representation. The pc-list data structure is a generalization of the adjacency list that has a natural correspondence to switching classes. Using this approach, better bounds are obtained for bipartite maximum matching, graph diameter, and vertex-weighted all-pairs shortest path
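
The matrix-to-list conversion mentioned above can be sketched in Python as follows. This illustrates only the classical adjacency-list step, not the pc-list construction, which the abstract does not define; the function names are hypothetical.

```python
def matrix_to_adjacency_list(matrix):
    """Convert an n x n 0/1 adjacency matrix into an adjacency list.

    The conversion itself takes Theta(n^2) time, but the result occupies
    only O(n + m) space, so later passes over the graph can run in O(n + m).
    """
    n = len(matrix)
    return [[j for j in range(n) if matrix[i][j]] for i in range(n)]


def edge_count(adjacency):
    """Number of undirected edges: each edge appears in two lists."""
    return sum(len(neighbours) for neighbours in adjacency) // 2
```

On the path graph with matrix `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`, the conversion yields `[[1], [0, 2], [1]]`, which stores 2 edges instead of 9 matrix entries.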

    Multi Layer Peeling for Linear Arrangement and Hierarchical Clustering

    We present a new multi-layer peeling technique to cluster points in a metric space. A well-known non-parametric objective is to embed the metric space into a simpler structured metric space such as a line (i.e., Linear Arrangement) or a binary tree (i.e., Hierarchical Clustering). Points which are close in the metric space should be mapped to close points/leaves in the line/tree; similarly, points which are far in the metric space should be far in the line or on the tree. In particular we consider the Maximum Linear Arrangement problem [Refael Hassin and Shlomi Rubinstein, 2001] and the Maximum Hierarchical Clustering problem [Vincent Cohen-Addad et al., 2018] applied to metrics. We design approximation schemes (a 1-ε approximation for any constant ε > 0) for these objectives. In particular this shows that by considering metrics one may significantly improve former approximations (0.5 for Max Linear Arrangement and 0.74 for Max Hierarchical Clustering). Our main technique, which is called multi-layer peeling, consists of recursively peeling off points which are far from the "core" of the metric space. The recursion ends once the core becomes a sufficiently densely weighted metric space (i.e. the average distance is at least a constant times the diameter) or once it becomes negligible with respect to its inner contribution to the objective. Interestingly, the algorithm in the Linear Arrangement case is much more involved than that in the Hierarchical Clustering case, and uses a significantly more delicate peeling

    Exploiting Dense Structures in Parameterized Complexity

    Over the past few decades, the study of dense structures from the perspective of approximation algorithms has become a wide area of research. However, from the viewpoint of parameterized algorithms, this area is largely unexplored. In particular, properties of random samples have been successfully deployed to design approximation schemes for a number of fundamental problems on dense structures [Arora et al. FOCS 1995, Goldreich et al. FOCS 1996, Giotis and Guruswami SODA 2006, Karpinski and Schudy STOC 2009]. In this paper, we fill this gap, and harness the power of random samples as well as structure theory to design kernelization as well as parameterized algorithms on dense structures. In particular, we obtain linear vertex kernels for Edge-Disjoint Paths, Edge Odd Cycle Transversal, Minimum Bisection, d-Way Cut, Multiway Cut and Multicut on everywhere dense graphs. In fact, these kernels are obtained by designing a polynomial-time algorithm when the corresponding parameter is at most Ω(n). Additionally, we obtain a cubic kernel for Vertex-Disjoint Paths on everywhere dense graphs. In addition to kernelization results, we obtain randomized subexponential-time parameterized algorithms for Edge Odd Cycle Transversal, Minimum Bisection, and d-Way Cut. Finally, we show how all of our results (as well as EPASes for these problems) can be de-randomized

    Improved Theoretical and Practical Guarantees for Chromatic Correlation Clustering

    We study a natural generalization of the correlation clustering problem to graphs in which the pairwise relations between objects are categorical instead of binary. This problem was recently introduced by Bonchi et al. under the name of chromatic correlation clustering, and is motivated by many real-world applications in data-mining and social networks, including community detection, link classification, and entity de-duplication. Our main contribution is a fast and easy-to-implement constant approximation framework for the problem, which builds on a novel reduction of the problem to that of correlation clustering. This result significantly progresses the current state of knowledge for the problem, improving on a previous result that only guaranteed linear approximation in the input size. We complement the above result by developing a linear programming-based algorithm that achieves an improved approximation ratio of 4. Although this algorithm cannot be considered to be practical, it further extends our theoretical understanding of chromatic correlation clustering. We also present a fast heuristic algorithm that is motivated by real-life scenarios in which there is a ground-truth clustering that is obscured by noisy observations. We test our algorithms on both synthetic and real datasets, like social network data. Our experiments reinforce the theoretical findings by demonstrating that our algorithms generally outperform previous approaches, both in terms of solution cost and reconstruction of an underlying ground-truth clustering