
12 research outputs found

Online Correlation Clustering

Author: Mathieu Claire
Sankur Ocan
Schudy Warren
Publication date: 01/01/2010

We study the online clustering problem where data items arrive in an online fashion. The algorithm maintains a clustering of data items into similarity classes. Upon arrival of v, the relation between v and previously arrived items is revealed, so that for each u we are told whether v is similar to u. The algorithm can create a new cluster for v and merge existing clusters. When the objective is to minimize disagreements between the clustering and the input, we prove that a natural greedy algorithm is O(n)-competitive, and this is optimal. When the objective is to maximize agreements between the clustering and the input, we prove that the greedy algorithm is 0.5-competitive and that no online algorithm can be better than 0.834-competitive; we also prove that it is possible to do better than 1/2 by exhibiting a randomized algorithm with competitive ratio 0.5+c for a small positive fixed constant c. Comment: 12 pages, 1 figure
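To make the two objectives in this abstract concrete, the following minimal sketch counts agreements and disagreements between a clustering and a similarity relation. It is an illustration only, not the greedy algorithm from the paper; the function name `count_agreements_and_disagreements` and the input encoding (a list of sets plus a set of similar pairs) are assumptions made for this example.

```python
from itertools import combinations


def count_agreements_and_disagreements(clusters, similar):
    """Score a clustering against a pairwise similarity relation.

    clusters: list of disjoint sets of items (hypothetical encoding).
    similar:  set of frozenset({u, v}) pairs that the input marks as similar.
    A pair agrees if it is similar and co-clustered, or dissimilar and separated.
    """
    label = {item: idx for idx, cluster in enumerate(clusters) for item in cluster}
    agreements = disagreements = 0
    for u, v in combinations(sorted(label), 2):
        together = label[u] == label[v]
        is_similar = frozenset((u, v)) in similar
        if together == is_similar:
            agreements += 1
        else:
            disagreements += 1
    return agreements, disagreements


similar_pairs = {frozenset(p) for p in [("a", "b"), ("b", "c")]}
clustering = [{"a", "b", "c"}, {"d"}]
print(count_agreements_and_disagreements(clustering, similar_pairs))
```

In this toy instance the only disagreement is the pair (a, c), which is placed in one cluster although the input does not mark it as similar.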

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Learning and Testing Variable Partitions

Author: Bogdanov Andrej
Wang Baoxiang
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 01/01/2020

Let $F$ be a multivariate function from a product set $\Sigma^n$ to an Abelian group $G$. A $k$-partition of $F$ with cost $\delta$ is a partition of the set of variables $\mathbf{V}$ into $k$ non-empty subsets $(\mathbf{X}_1, \dots, \mathbf{X}_k)$ such that $F(\mathbf{V})$ is $\delta$-close to $F_1(\mathbf{X}_1)+\dots+F_k(\mathbf{X}_k)$ for some $F_1, \dots, F_k$ with respect to a given error metric. We study algorithms for agnostically learning $k$-partitions and testing $k$-partitionability over various groups and error metrics given query access to $F$. In particular we show that

1. Given a function that has a $k$-partition of cost $\delta$, a partition of cost $\mathcal{O}(k n^2)(\delta + \epsilon)$ can be learned in time $\tilde{\mathcal{O}}(n^2 \mathrm{poly} (1/\epsilon))$ for any $\epsilon > 0$. In contrast, for $k = 2$ and $n = 3$ learning a partition of cost $\delta + \epsilon$ is NP-hard.

2. When $F$ is real-valued and the error metric is the 2-norm, a 2-partition of cost $\sqrt{\delta^2 + \epsilon}$ can be learned in time $\tilde{\mathcal{O}}(n^5/\epsilon^2)$.

3. When $F$ is $\mathbb{Z}_q$-valued and the error metric is Hamming weight, $k$-partitionability is testable with one-sided error and $\mathcal{O}(kn^3/\epsilon)$ non-adaptive queries. We also show that even two-sided testers require $\Omega(n)$ queries when $k = 2$.

This work was motivated by reinforcement learning control tasks in which the set of control variables can be partitioned. The partitioning reduces the task into multiple lower-dimensional ones that are relatively easier to learn. Our second algorithm empirically increases the scores attained over previous heuristic partitioning methods applied in this context. Comment: Innovations in Theoretical Computer Science (ITCS) 2020
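As an illustration of what an exact (cost 0) 2-partition means, the sketch below uses the standard fact that F(X, Y) = F_1(X) + F_2(Y) holds exactly when F(x, y) + F(x', y') = F(x, y') + F(x', y) for all assignments. It samples random assignments and looks for a violation of this identity. This is a simple randomized check written for this listing, not the tester or learner from the paper; the name `finds_violation`, the integer-valued toy function, and the trial count are all assumptions.

```python
import random


def finds_violation(F, n, part_x, alphabet, trials=200, rng=None):
    """Search for a witness that F is NOT of the form F1(X) + F2(Y).

    F:       function taking a list of n symbols and returning an integer.
    part_x:  indices of the variables in block X (the rest form block Y).
    Returns True if some sampled quadruple violates the exchange identity.
    A truly additive partition is never rejected (one-sided check).
    """
    rng = rng or random.Random(0)
    part_x = set(part_x)
    for _ in range(trials):
        a = [rng.choice(alphabet) for _ in range(n)]
        b = [rng.choice(alphabet) for _ in range(n)]
        # Swap the Y-coordinates between the two assignments.
        ab = [a[i] if i in part_x else b[i] for i in range(n)]
        ba = [b[i] if i in part_x else a[i] for i in range(n)]
        if F(a) + F(b) != F(ab) + F(ba):
            return True
    return False


def toy_function(v):
    # Hypothetical example: splits additively as ({0, 1}, {2}) but not as ({0}, {1, 2}).
    return v[0] * v[1] + v[2]


print(finds_violation(toy_function, 3, {0, 1}, [0, 1, 2]))
print(finds_violation(toy_function, 3, {0}, [0, 1, 2]))
```

The check only handles the exact case over the integers; the agnostic setting in the abstract, where F is merely $\delta$-close to a sum, needs the error metric and is not covered here.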

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Faster graph algorithms via switching classes

Author: Lindzey Nathan
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2012

2012 Summer. Includes bibliographical references. The runtime of an algorithm is intimately related to how an instance is represented. Recall that the runtimes of the first generation of graph algorithms were expressed as functions of n := |V|. This analysis was natural since at that time graphs were represented in n^2 space via their adjacency matrix. It was soon noticed that if m := |E| = o(n^2), then a variety of graph algorithms could be sped up by computing the adjacency list from the adjacency matrix, then running the algorithm on the more efficient adjacency-list representation. This motivated the introduction of m to the runtime of graph algorithms, and it is now customary in algorithm design to assume that a graph instance is given in the form of its adjacency list. For instance, a graph algorithm is not considered to run in linear time unless it runs in O(n + m) time. An O(n^2) bound is not considered linear, even though the two bounds are the same in the worst case. Let m̃ be the size of the minimum representative of a graph G's switching class (w.r.t. some switching operation). It is shown that better bounds for several classical graph algorithms can be obtained by modifying them so that their running time is a function of n + m̃ rather than of n + m. This is significant because m̃ is O(m) but m is not O(m̃). This is accomplished by first computing the so-called partially complemented adjacency list (pc-list) from an adjacency list, then designing an algorithm that is amenable to the more efficient pc-list representation. The pc-list data structure is a generalization of the adjacency list that has a natural correspondence to switching classes. Using this approach, better bounds are obtained for bipartite maximum matching, graph diameter, and vertex-weighted all-pairs shortest path
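The first step described in this abstract, converting an adjacency matrix into an adjacency list and then running an O(n + m) algorithm on the list, can be sketched as follows. This covers only the classical matrix-to-list conversion with breadth-first search as the example algorithm; it does not implement the pc-list structure from the thesis, and the function names are chosen for this example.

```python
from collections import deque


def matrix_to_adjacency_list(matrix):
    """Build adjacency lists from an n x n 0/1 matrix in O(n^2) time."""
    n = len(matrix)
    return [[v for v in range(n) if matrix[u][v]] for u in range(n)]


def bfs_distances(adj, source):
    """Breadth-first search on adjacency lists; runs in O(n + m) time."""
    dist = [None] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# A path on four vertices: 0 - 1 - 2 - 3.
path_matrix = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
adj = matrix_to_adjacency_list(path_matrix)
print(adj)
print(bfs_distances(adj, 0))
```

The conversion itself costs O(n^2), so the saving appears when the list is reused by algorithms whose cost depends on m rather than n^2.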

Mountain Scholar (Digital Collections of Colorado and Wyoming)

Multi Layer Peeling for Linear Arrangement and Hierarchical Clustering

Author: Azar Yossi
Vainstein Danny
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 50th International Colloquium on Automata, Languages, and Programming (ICALP 2023)
Publication date: 01/01/2023

We present a new multi-layer peeling technique to cluster points in a metric space. A well-known non-parametric objective is to embed the metric space into a simpler structured metric space such as a line (i.e., Linear Arrangement) or a binary tree (i.e., Hierarchical Clustering). Points which are close in the metric space should be mapped to close points/leaves in the line/tree; similarly, points which are far in the metric space should be far in the line or on the tree. In particular we consider the Maximum Linear Arrangement problem [Refael Hassin and Shlomi Rubinstein, 2001] and the Maximum Hierarchical Clustering problem [Vincent Cohen-Addad et al., 2018] applied to metrics. We design approximation schemes (1-ε approximation for any constant ε > 0) for these objectives. In particular this shows that by considering metrics one may significantly improve former approximations (0.5 for Max Linear Arrangement and 0.74 for Max Hierarchical Clustering). Our main technique, which is called multi-layer peeling, consists of recursively peeling off points which are far from the "core" of the metric space. The recursion ends once the core becomes a sufficiently densely weighted metric space (i.e. the average distance is at least a constant times the diameter) or once it becomes negligible with respect to its inner contribution to the objective. Interestingly, the algorithm in the Linear Arrangement case is much more involved than that in the Hierarchical Clustering case, and uses a significantly more delicate peeling

Dagstuhl Research Online Publication Server

Exploiting Dense Structures in Parameterized Complexity

Author: Lochet William
Lokshtanov Daniel
Saurabh Saket
Zehavi Meirav
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 38th International Symposium on Theoretical Aspects of Computer Science (STACS 2021)
Publication date: 01/01/2021

Over the past few decades, the study of dense structures from the perspective of approximation algorithms has become a wide area of research. However, from the viewpoint of parameterized algorithms, this area is largely unexplored. In particular, properties of random samples have been successfully deployed to design approximation schemes for a number of fundamental problems on dense structures [Arora et al. FOCS 1995, Goldreich et al. FOCS 1996, Giotis and Guruswami SODA 2006, Karpinski and Schudy STOC 2009]. In this paper, we fill this gap, and harness the power of random samples as well as structure theory to design kernelization as well as parameterized algorithms on dense structures. In particular, we obtain linear vertex kernels for Edge-Disjoint Paths, Edge Odd Cycle Transversal, Minimum Bisection, d-Way Cut, Multiway Cut and Multicut on everywhere dense graphs. In fact, these kernels are obtained by designing a polynomial-time algorithm when the corresponding parameter is at most Ω(n). Additionally, we obtain a cubic kernel for Vertex-Disjoint Paths on everywhere dense graphs. In addition to kernelization results, we obtain randomized subexponential-time parameterized algorithms for Edge Odd Cycle Transversal, Minimum Bisection, and d-Way Cut. Finally, we show how all of our results (as well as EPASes for these problems) can be de-randomized

Dagstuhl Research Online Publication Server

Improved Approximation Algorithms for Bipartite Correlation Clustering

Author: A. Zuylen van
H. Zha
I. Giotis
J. Guo
M. Charikar
N. Ailon
N. Bansal
S.C. Madeira
X.Z. Fern
Y. Cheng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Crossref

MPG.PuRe

Improved Theoretical and Practical Guarantees for Chromatic Correlation Clustering

Author: Cesa-Bianchi N.
Knuth D. E.
Mathieu C.
Swamy C.
Tan P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 21/11/2015

We study a natural generalization of the correlation clustering problem to graphs in which the pairwise relations between objects are categorical instead of binary. This problem was recently introduced by Bonchi et al. under the name of chromatic correlation clustering, and is motivated by many real-world applications in data-mining and social networks, including community detection, link classification, and entity de-duplication. Our main contribution is a fast and easy-to-implement constant approximation framework for the problem, which builds on a novel reduction of the problem to that of correlation clustering. This result significantly progresses the current state of knowledge for the problem, improving on a previous result that only guaranteed linear approximation in the input size. We complement the above result by developing a linear programming-based algorithm that achieves an improved approximation ratio of 4. Although this algorithm cannot be considered to be practical, it further extends our theoretical understanding of chromatic correlation clustering. We also present a fast heuristic algorithm that is motivated by real-life scenarios in which there is a ground-truth clustering that is obscured by noisy observations. We test our algorithms on both synthetic and real datasets, like social networks data. Our experiments reinforce the theoretical findings by demonstrating that our algorithms generally outperform previous approaches, both in terms of solution cost and reconstruction of an underlying ground-truth clustering

CiteSeerX

Crossref