A Tight Approximation Algorithm for the Cluster Vertex Deletion Problem
We give the first 2-approximation algorithm for the cluster vertex deletion
problem. This is tight, since approximating the problem within any constant
factor smaller than 2 is UGC-hard. Our algorithm combines the previous
approaches, based on the local ratio technique and the management of true
twins, with a novel construction of a 'good' cost function on the vertices at
distance at most 2 from any vertex of the input graph.
As an additional contribution, we also study cluster vertex deletion from the
polyhedral perspective, where we prove almost matching upper and lower bounds
on how well linear programming relaxations can approximate the problem.
Comment: 23 pages, 3 figures
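To make the problem concrete: a graph is a cluster graph (a disjoint union of cliques) exactly when it contains no induced path on three vertices. The sketch below is not the algorithm from the abstract above; it is the folklore 3-approximation for the unweighted case, which repeatedly finds an induced three-vertex path and deletes all of its vertices. The function names and the adjacency-dictionary input format are assumptions made for illustration.

```python
from itertools import combinations


def find_induced_p3(adj, alive):
    """Return (u, v, w) with edges u-v and v-w but no edge u-w, or None."""
    for v in alive:
        nbrs = [u for u in adj[v] if u in alive]
        for u, w in combinations(nbrs, 2):
            if w not in adj[u]:
                return (u, v, w)
    return None


def cvd_3_approx(adj):
    """Folklore 3-approximation for unweighted cluster vertex deletion.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Any solution must delete at least one vertex of every induced P3, and
    the P3s removed here are vertex-disjoint, so |deleted| <= 3 * OPT.
    """
    alive = set(adj)
    deleted = set()
    while True:
        p3 = find_induced_p3(adj, alive)
        if p3 is None:
            return deleted
        deleted.update(p3)
        alive.difference_update(p3)


# Path 0-1-2-3: the optimum deletes one vertex, the sketch deletes three.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
removed = cvd_3_approx(path)
```

On a graph that is already a union of cliques, such as a single triangle, the function returns the empty set.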
Structural Rounding: Approximation Algorithms for Graphs Near an Algorithmically Tractable Class
We develop a framework for generalizing approximation algorithms from the structural graph algorithm literature so that they apply to graphs somewhat close to that class (a scenario we expect is common when working with real-world networks) while still guaranteeing approximation ratios. The idea is to edit a given graph via vertex- or edge-deletions to put the graph into an algorithmically tractable class, apply known approximation algorithms for that class, and then lift the solution to apply to the original graph. We give a general characterization of when an optimization problem is amenable to this approach, and show that it includes many well-studied graph problems, such as Independent Set, Vertex Cover, Feedback Vertex Set, Minimum Maximal Matching, Chromatic Number, (l-)Dominating Set, Edge (l-)Dominating Set, and Connected Dominating Set.
To enable this framework, we develop new editing algorithms that find the approximately-fewest edits required to bring a given graph into one of a few important graph classes (in some cases these are bicriteria algorithms which simultaneously approximate both the number of editing operations and the target parameter of the family). For bounded degeneracy, we obtain an O(r log n)-approximation and a bicriteria (4,4)-approximation which also extends to a smoother bicriteria trade-off. For bounded treewidth, we obtain a bicriteria (O(log^{1.5} n), O(sqrt{log w}))-approximation, and for bounded pathwidth, we obtain a bicriteria (O(log^{1.5} n), O(sqrt{log w} * log n))-approximation. For treedepth 2 (related to bounded expansion), we obtain a 4-approximation. We also prove complementary hardness-of-approximation results assuming P != NP: in particular, these problems are all log-factor inapproximable, except the last, which is not approximable below some constant factor (2 assuming UGC).
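The edit, solve, lift pipeline described above can be sketched for Vertex Cover with forests as the tractable class. This is a minimal illustration, not the paper's algorithms: the edit set is assumed to be given (computing a small one is the hard part the abstract addresses), the forest is solved exactly by a leaf-based greedy, and lifting adds every deleted vertex to the cover. All function names are hypothetical.

```python
def forest_vertex_cover(adj):
    """Exact minimum vertex cover of a forest: always take a leaf's neighbour."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    cover = set()
    leaves = [v for v in adj if len(adj[v]) == 1]
    while leaves:
        leaf = leaves.pop()
        if len(adj[leaf]) != 1:  # stale entry: vertex is now isolated or removed
            continue
        (parent,) = adj[leaf]
        cover.add(parent)
        # Remove parent from the forest; neighbours may become new leaves.
        for u in adj[parent]:
            adj[u].discard(parent)
            if len(adj[u]) == 1:
                leaves.append(u)
        adj[parent] = set()
    return cover


def vertex_cover_by_structural_rounding(adj, edit_set):
    """Edit (delete edit_set), solve on the forest, lift (add edit_set back).

    Assumes, without checking, that deleting edit_set leaves a forest.
    """
    kept = {
        v: {u for u in ns if u not in edit_set}
        for v, ns in adj.items()
        if v not in edit_set
    }
    return forest_vertex_cover(kept) | set(edit_set)


# Triangle 0-1-2 with a pendant vertex 3 attached to 2; deleting 0 leaves a path.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cover = vertex_cover_by_structural_rounding(graph, {0})
```

The lifted solution is feasible because every edge either touches a deleted vertex or survives in the forest, and its size exceeds the forest optimum by at most the size of the edit set.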
Online Steiner Tree with Deletions
In the online Steiner tree problem, the input is a set of vertices that
appear one-by-one, and we have to maintain a Steiner tree on the current set of
vertices. The cost of the tree is the total length of edges in the tree, and we
want this cost to be close to the cost of the optimal Steiner tree at all
points in time. If we are allowed to only add edges, a tight bound of
Θ(log n) on the competitiveness is known. Recently it was shown that if
we can add one new edge and make one edge swap upon every vertex arrival, we
can maintain a constant-competitive tree online.
But what if the set of vertices sees both additions and deletions? Again, we
would like to obtain a low-cost Steiner tree with as few edge changes as
possible. The original paper of Imase and Waxman had also considered this
model, and it gave a greedy algorithm that maintained a constant-competitive
tree online, and made at most O(n^{3/2}) edge changes for the first n
requests. In this paper we give the following two results.
Our first result is an online algorithm that maintains a Steiner tree only
under deletions: we start off with a set of vertices, and at each time one of
the vertices is removed from this set: our Steiner tree no longer has to span
this vertex. We give an algorithm that changes only a constant number of edges
upon each request, and maintains a constant-competitive tree at all times. Our
algorithm uses the primal-dual framework and a global charging argument to
carefully make this constant number of changes.
We then study the natural greedy algorithm proposed by Imase and Waxman that
maintains a constant-competitive Steiner tree in the fully-dynamic model (where
each request either adds or deletes a vertex). Our second result shows that
this algorithm makes only a constant number of changes per request in an
amortized sense.
Comment: An extended abstract appears in the SODA 2014 conference
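For orientation, the add-only setting mentioned at the start of this abstract can be sketched with the classic greedy rule: connect each arriving vertex to the nearest vertex that has already arrived. This is not the deletion algorithm or the swap-based algorithm from the abstract. The points-in-the-plane input and the Euclidean metric are assumptions chosen to keep the example self-contained.

```python
import math


def greedy_online_steiner(points):
    """Add-only greedy: join each new terminal to its nearest earlier terminal.

    points is a list of coordinate tuples in arrival order. Returns the list
    of tree edges as index pairs (earlier, later) and the total tree cost.
    """
    edges = []
    cost = 0.0
    for i in range(1, len(points)):
        # Nearest terminal among those that arrived before point i.
        j = min(range(i), key=lambda k: math.dist(points[i], points[k]))
        edges.append((j, i))
        cost += math.dist(points[i], points[j])
    return edges, cost


arrivals = [(0, 0), (1, 0), (3, 0)]
tree_edges, tree_cost = greedy_online_steiner(arrivals)
```

Edges are never removed or swapped here, which is exactly the restriction the abstract relaxes.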
Malware Classification based on Call Graph Clustering
Each day, anti-virus companies receive tens of thousands of samples of
potentially harmful executables. Many of the malicious samples are variations
of previously encountered malware, created by their authors to evade
pattern-based detection. Dealing with these large amounts of data requires
robust, automatic detection approaches. This paper studies malware
classification based on call graph clustering. By representing malware samples
as call graphs, it is possible to abstract certain variations away, and enable
the detection of structural similarities between samples. The ability to
cluster similar samples together will make more generic detection techniques
possible, thereby targeting the commonalities of the samples within a cluster.
To compare call graphs mutually, we compute pairwise graph similarity scores
via graph matchings which approximately minimize the graph edit distance. Next,
to facilitate the discovery of similar malware samples, we employ several
clustering algorithms, including k-medoids and DBSCAN. Clustering experiments
are conducted on a collection of real malware samples, and the results are
evaluated against manual classifications provided by human malware analysts.
Experiments show that it is indeed possible to accurately detect malware
families via call graph clustering. We anticipate that in the future, call
graphs can be used to analyse the emergence of new malware families, and
ultimately to automate implementation of generic detection schemes.
Comment: This research has been supported by TEKES - the Finnish Funding
Agency for Technology and Innovation as part of its ICT SHOK Future Internet
research programme, grant 40212/0
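The clustering step described in this abstract operates on pairwise scores, so it can be illustrated with a k-medoids routine that takes only a precomputed distance matrix. This is a minimal sketch under stated assumptions: the matrix below is made-up toy data standing in for approximate graph edit distances between call graphs, the initialisation simply takes the first k samples, and the function name is hypothetical.

```python
def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a symmetric distance matrix.

    dist[i][j] is the distance between samples i and j. Returns a dict
    mapping each medoid index to the list of sample indices in its cluster.
    """
    n = len(dist)
    medoids = list(range(k))  # naive initialisation: the first k samples
    clusters = {}
    for _ in range(max_iter):
        # Assignment step: each sample joins its closest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            closest = min(medoids, key=lambda m: dist[i][m])
            clusters[closest].append(i)
        # Update step: the new medoid minimises total distance within its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            if members else m
            for m, members in clusters.items()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return clusters


# Toy distances: samples 0 and 1 are close, samples 2 and 3 are close.
toy = [
    [0, 1, 10, 11],
    [1, 0, 9, 10],
    [10, 9, 0, 1],
    [11, 10, 1, 0],
]
groups = k_medoids(toy, 2)
```

Medoids are always actual samples, which suits graph data where an "average" call graph is not well defined.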
Lossy Kernelization for (Implicit) Hitting Set Problems
We revisit the complexity of polynomial time pre-processing (kernelization) for the d-Hitting Set problem. This is one of the most classic problems in Parameterized Complexity by itself, and, furthermore, it encompasses several other of the most well-studied problems in this field, such as Vertex Cover, Feedback Vertex Set in Tournaments (FVST) and Cluster Vertex Deletion (CVD). In fact, d-Hitting Set encompasses any deletion problem to a hereditary property that can be characterized by a finite set of forbidden induced subgraphs. With respect to bit size, the kernelization complexity of d-Hitting Set is essentially settled: there exists a kernel with O(k^d) bits (O(k^d) sets and O(k^{d-1}) elements) and this is tight by the result of Dell and van Melkebeek [STOC 2010, JACM 2014]. Still, the question of whether there exists a kernel for d-Hitting Set with fewer elements has remained one of the major open problems in Kernelization.
In this paper, we first show that if we allow the kernelization to be lossy with a qualitatively better loss than the best possible approximation ratio of polynomial time approximation algorithms, then one can obtain kernels where the number of elements is linear for every fixed d. Further, based on this, we present our main result: we show that there exist approximate Turing kernelizations for d-Hitting Set that even beat the established bit-size lower bounds for exact kernelizations - in fact, we use a constant number of oracle calls, each with "near linear" (O(k^{1+ε})) bit size, that is, almost the best one could hope for. Lastly, for two special cases of implicit 3-Hitting Set, namely, FVST and CVD, we obtain the "best of both worlds" type of results - (1+ε)-approximate kernelizations with a linear number of vertices. In terms of size, this substantially improves the exact kernels of Fomin et al. [SODA 2018, TALG 2019], with simpler arguments.
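As background for the problem itself, d-Hitting Set asks for at most k elements that intersect every set in a family whose sets have size at most d. The sketch below is the textbook bounded search tree, which branches on the elements of an unhit set and explores at most d^k branches; it is not a kernelization and not the paper's method. The function name and input format are assumptions for illustration.

```python
def hitting_set(sets, k, chosen=frozenset()):
    """Return a hitting set of size at most k for the family, or None.

    sets is a list of Python sets; chosen holds the elements picked so far.
    Some element of any unhit set must be in the solution, so it is enough
    to branch over the elements of one such set.
    """
    unhit = next((s for s in sets if not (s & chosen)), None)
    if unhit is None:
        return set(chosen)  # every set is already hit
    if k == 0:
        return None  # budget exhausted while a set remains unhit
    for x in sorted(unhit):
        result = hitting_set(sets, k - 1, chosen | {x})
        if result is not None:
            return result
    return None


# Cluster Vertex Deletion on the path 0-1-2-3 as implicit 3-Hitting Set:
# the sets are the vertex sets of its induced three-vertex paths.
induced_p3s = [{0, 1, 2}, {1, 2, 3}]
solution = hitting_set(induced_p3s, 1)
```

The example shows why CVD counts as an implicit 3-Hitting Set instance: the family of sets is not given explicitly but is determined by the graph.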
Distributed Edge Connectivity in Sublinear Time
We present the first sublinear-time algorithm for a distributed
message-passing network to compute its edge connectivity exactly in
the CONGEST model, as long as there are no parallel edges. Our algorithm takes
Õ(n^{1-1/353} D^{1/353} + n^{1-1/706}) time to compute λ and a
cut of cardinality λ with high probability, where n and D are the
number of nodes and the diameter of the network, respectively, and Õ
hides polylogarithmic factors. This running time is sublinear in n (i.e.
Õ(n^{1-ε})) whenever D is. Previous sublinear-time
distributed algorithms can solve this problem either (i) exactly only when
λ = O(n^{1/8-ε}) [Thurimella PODC'95; Pritchard, Thurimella, ACM
Trans. Algorithms'11; Nanongkai, Su, DISC'14] or (ii) approximately [Ghaffari,
Kuhn, DISC'13; Nanongkai, Su, DISC'14].
To achieve this we develop and combine several new techniques. First, we
design the first distributed algorithm that can compute a k-edge connectivity
certificate for any k = O(n^{1-ε}) in time Õ(sqrt{nk} + D).
Second, we show that by combining the recent distributed expander decomposition
technique of [Chang, Pettie, Zhang, SODA'19] with techniques from the
sequential deterministic edge connectivity algorithm of [Kawarabayashi, Thorup,
STOC'15], we can decompose the network into a sublinear number of clusters with
small average diameter and without any mincut separating a cluster (except the
`trivial' ones). Finally, by extending the tree packing technique from [Karger
STOC'96], we can find the minimum cut in time proportional to the number of
components. As a byproduct of this technique, we obtain an Õ(n)-time
algorithm for computing exact minimum cut for weighted graphs.
Comment: Accepted at 51st ACM Symposium on Theory of Computing (STOC 2019)
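The quantity computed in this abstract, the edge connectivity, is the size of a global minimum cut. As a point of reference only, the sketch below computes it sequentially with the Stoer-Wagner algorithm; it is a centralised routine and has nothing to do with the CONGEST-model techniques described above. The function name and the edge-list input format are assumptions.

```python
def edge_connectivity(n, edges):
    """Stoer-Wagner global minimum cut for an undirected graph.

    Vertices are 0..n-1 and edges is a list of (u, v) pairs without
    self-loops. Returns the minimum number of edges whose removal
    disconnects the graph (0 if it is already disconnected).
    """
    w = [[0] * n for _ in range(n)]
    for u, v in edges:
        w[u][v] += 1
        w[v][u] += 1
    active = list(range(n))
    best = float("inf")
    while len(active) > 1:
        # One phase: repeatedly add the vertex most tightly connected
        # to the set of vertices added so far.
        weights = {v: 0 for v in active}
        remaining = set(active)
        order = []
        while remaining:
            v = max(remaining, key=lambda x: weights[x])
            remaining.remove(v)
            order.append(v)
            for u in remaining:
                weights[u] += w[v][u]
        s, t = order[-2], order[-1]
        # Cut of the phase: t alone against everything else.
        best = min(best, weights[t])
        # Merge t into s and drop t.
        for u in active:
            w[s][u] += w[t][u]
            w[u][s] += w[u][t]
        w[s][s] = 0
        active.remove(t)
    return best


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
```

Calling `edge_connectivity(4, cycle)` and `edge_connectivity(6, two_triangles)` exercises a 2-edge-connected graph and a graph with a bridge, respectively.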