Collapsing Superstring Conjecture
In the Shortest Common Superstring (SCS) problem, one is given a collection of strings and needs to find a shortest string containing each of them as a substring. SCS admits a 2 11/23-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30-year-old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS.
We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, which we call the Collapsing Superstring Conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph G. This conjecture would give an elementary 2-approximation algorithm for SCS. The second conjecture claims that not only is the resulting graph G the same for all solutions, but G can also be computed by an elementary greedy procedure called the Greedy Hierarchical Algorithm.
While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3 (which until recently had been the only case where the Greedy Conjecture was proven). We also tested our conjectures on millions of instances of SCS.
We prove that the standard Greedy Conjecture implies the Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximation of SCS. Besides its (conjectured) good approximation ratio, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation, and finds exact solutions for the special cases where polynomial-time (non-greedy) exact algorithms are known: (1) when the input strings form a spectrum of a string, and (2) when all input strings have length at most 2.
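The plain Greedy Algorithm that the Greedy Conjecture refers to can be sketched as follows. This is a minimal quadratic-time illustration, not the Greedy Hierarchical Algorithm of the abstract above; the function names `overlap` and `greedy_superstring` are assumptions made for this example, and ties between equal overlaps are broken by input order.

```python
def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the two strings with the largest overlap."""
    # Drop duplicates and strings already contained in another input string.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]
    while len(items) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = items[i] + items[j][k:]
        items = [s for idx, s in enumerate(items) if idx not in (i, j)]
        items.append(merged)
    return items[0] if items else ""


result = greedy_superstring(["abc", "bcd", "cde"])
```

For the three inputs above, every input is a substring of `result`, and the result has length 5, which is optimal for this instance.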
Computational Molecular Biology
Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography
Superstrings with multiplicities
A superstring of a set of words P = {s_1, ..., s_p} is a string that contains each word of P as a substring. Given P, the well-known Shortest Linear Superstring problem (SLS) asks for a shortest superstring of P. In a variant of SLS, called Multi-SLS, each word s_i comes with an integer m(i), its multiplicity, that sets a constraint on its number of occurrences, and the goal is to find a shortest superstring that contains at least m(i) occurrences of s_i. Multi-SLS generalizes SLS and is obviously as hard to solve, but it has been studied only in special cases (with words of length 2 or with a fixed number of words). The approximability of Multi-SLS in the general case remains open. Here, we study the approximability of Multi-SLS and that of the companion problem Multi-SCCS, which asks for a shortest cyclic cover instead of a shortest superstring. First, we investigate the approximation ratio of a greedy algorithm for maximizing the compression offered by a superstring or by a cyclic cover: the approximation ratio is 1/2 for Multi-SLS and 1 for Multi-SCCS. Then, we exhibit a linear-time approximation algorithm, Concat-Greedy, and show that it achieves a ratio of 4 regarding the superstring length. This demonstrates that for both measures Multi-SLS belongs to the class of APX problems. © 2018 Yoshifumi Sakai; licensed under Creative Commons License CC-BY. Peer reviewed
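The Multi-SLS feasibility condition can be made concrete with a small checker. This is only a sketch: the names `count_occurrences` and `is_multi_superstring` are hypothetical, and it assumes that occurrences of a word are allowed to overlap, which the abstract does not state explicitly.

```python
def count_occurrences(text, word):
    """Count occurrences of word in text, allowing overlaps (an assumption)."""
    count = 0
    start = text.find(word)
    while start != -1:
        count += 1
        start = text.find(word, start + 1)
    return count


def is_multi_superstring(text, multiplicities):
    """True if text contains each word w at least multiplicities[w] times."""
    return all(
        count_occurrences(text, word) >= m for word, m in multiplicities.items()
    )


feasible = is_multi_superstring("ababa", {"aba": 2, "b": 2})
infeasible = is_multi_superstring("ababa", {"aba": 3})
```

Here `"ababa"` satisfies the first constraint set because `"aba"` occurs at positions 0 and 2, while the second set asks for a third occurrence that does not exist.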
Cosmic String Loop Microlensing
Cosmic superstring loops within the galaxy microlens background point sources
lying close to the observer-string line of sight. For suitable alignments,
multiple paths coexist and the (achromatic) flux enhancement is a factor of
two. We explore this unique type of lensing by numerically solving for
geodesics that extend from source to observer as they pass near an oscillating
string. We characterize the duration of the flux doubling and the scale of the
image splitting. We probe and confirm the existence of a variety of fundamental
effects predicted from previous analyses of the static infinite straight
string: the deficit angle, the Kaiser-Stebbins effect, and the scale of the
impact parameter required to produce microlensing. Our quantitative results for
dynamical loops vary by O(1) factors with respect to estimates based on
infinite straight strings for a given impact parameter. A number of new
features are identified in the computed microlensing solutions. Our results
suggest that optical microlensing can offer a new and potentially powerful
methodology for searches for superstring loop relics of the inflationary era. Comment: 20 pages, 19 figures
On Approximability of Bounded Degree Instances of Selected Optimization Problems
In order to cope with the approximation hardness of an underlying optimization problem, it is advantageous to consider specific families of instances with properties that can be exploited to obtain efficient approximation algorithms for the restricted version of the problem with improved performance guarantees. In this thesis, we investigate the approximation complexity of selected NP-hard optimization problems restricted to instances with bounded degree, occurrence or weight parameter. Specifically, we consider the family of dense instances, where typically the average degree is bounded from below by some function of the size of the instance. Complementarily, we examine the family of sparse instances, in which the average degree is bounded from above by some fixed constant. We focus on developing new methods for proving explicit approximation hardness results for general as well as for restricted instances. The first part of the thesis contributes to the systematic investigation of the VERTEX COVER problem in k-hypergraphs and k-partite k-hypergraphs with density and regularity constraints. We design efficient approximation algorithms for the problems with improved performance guarantees as compared to the general case. On the other hand, we prove the optimality of our approximation upper bounds under the Unique Games Conjecture or a variant. In the second part of the thesis, we study mainly the approximation hardness of restricted instances of selected global optimization problems. We establish improved or in some cases the first inapproximability thresholds for the problems considered in this thesis such as the METRIC DIMENSION problem restricted to graphs with maximum degree 3 and the (1,2)-STEINER TREE problem. We introduce a new reduction method for proving explicit approximation lower bounds for problems that are related to the TRAVELING SALESPERSON (TSP) problem.
In particular, we prove the best inapproximability thresholds known to date for the general METRIC TSP problem, the ASYMMETRIC TSP problem, the SHORTEST SUPERSTRING problem, the MAXIMUM TSP problem and TSP problems with bounded metrics.
A fast implementation of shortest common superstring approximation and its application to Relative Lempel-Ziv compression
The objective of the shortest common superstring problem is to find a string of minimum length that contains all keywords in the given input as substrings. Shortest common superstrings have many applications in the fields of data compression and bioinformatics. For example, a common superstring can be seen as a compressed form of the keywords it is generated from.
Since the shortest common superstring problem is NP-hard, we focus on approximation algorithms that implement a so-called greedy heuristic. It turns out that the actual shortest common superstring is not always needed. Instead, it is often enough to find an approximate solution of sufficient quality.
We provide an implementation of Ukkonen's linear-time algorithm for the greedy heuristic. The practical performance of this implementation is measured by comparing it to another implementation of the same heuristic. We also hypothesize that shortest common superstrings can potentially be used to improve the compression ratio of the Relative Lempel-Ziv data compression algorithm. This hypothesis is examined and shown to be valid.
Approximating $(k,\ell)$-center clustering for curves
The Euclidean $k$-center problem is a classical problem that has been
extensively studied in computer science. Given a set $\mathcal{G}$ of $n$
points in Euclidean space, the problem is to determine a set $\mathcal{C}$ of
$k$ centers (not necessarily part of $\mathcal{G}$) such that the maximum
distance between a point in $\mathcal{G}$ and its nearest neighbor in
$\mathcal{C}$ is minimized. In this paper we study the corresponding
$(k,\ell)$-center problem for polygonal curves under the Fréchet distance,
that is, given a set $\mathcal{G}$ of $n$ polygonal curves in $\mathbb{R}^d$,
each of complexity $m$, determine a set $\mathcal{C}$ of $k$ polygonal curves
in $\mathbb{R}^d$, each of complexity $\ell$, such that the maximum Fréchet
distance of a curve in $\mathcal{G}$ to its closest curve in $\mathcal{C}$ is
minimized. In this paper, we substantially extend and improve the known
approximation bounds for curves in dimension $2$ and higher. We show that, if
$\ell$ is part of the input, then there is no polynomial-time approximation
scheme unless $\mathsf{P}=\mathsf{NP}$. Our constructions yield different
bounds for one and two-dimensional curves and the discrete and continuous
Fréchet distance. In the case of the discrete Fréchet distance on
two-dimensional curves, we show hardness of approximation within a factor close
to $2.598$. This result also holds when $k=1$, and the $\mathsf{NP}$-hardness
extends to the case that $\ell=\infty$, i.e., for the problem of computing the
minimum-enclosing ball under the Fréchet distance. Finally, we observe that a
careful adaptation of Gonzalez' algorithm in combination with a curve
simplification yields a $3$-approximation in any dimension, provided that an
optimal simplification can be computed exactly. We conclude that our
approximation bounds are close to being tight. Comment: 24 pages; results on
minimum-enclosing ball added, additional author added, general revision
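For orientation, Gonzalez' algorithm in its classical form for points is the farthest-first traversal sketched below. This is not the adaptation to curves described in the abstract: it assumes points in Euclidean space, uses the ordinary Euclidean distance instead of the Fréchet distance, and involves no simplification step. The function name `gonzalez_centers` is an assumption of this example.

```python
import math


def gonzalez_centers(points, k):
    """Farthest-first traversal: pick k centers from the input points.

    Returns the chosen centers and the resulting clustering radius, i.e. the
    largest distance from any point to its nearest chosen center.
    """
    centers = [points[0]]  # the first center is arbitrary
    # dist[i] is the distance from points[i] to its nearest chosen center.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[idx])
        dist = [min(d, math.dist(p, points[idx])) for d, p in zip(dist, points)]
    return centers, max(dist)


centers, radius = gonzalez_centers([(0, 0), (1, 0), (10, 0), (11, 0)], 2)
```

On the four collinear points above, the traversal starts at `(0, 0)`, then picks `(11, 0)` as the farthest point, giving a radius of 1.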
Minimum-weight Cycle Covers and Their Approximability
A cycle cover of a graph is a set of cycles such that every vertex is part of
exactly one cycle. An L-cycle cover is a cycle cover in which the length of
every cycle is in the set L.
We investigate how well L-cycle covers of minimum weight can be approximated.
For undirected graphs, we devise a polynomial-time approximation algorithm that
achieves a constant approximation ratio for all sets L. On the other hand, we
prove that the problem cannot be approximated within a factor of 2-eps for
certain sets L.
For directed graphs, we present a polynomial-time approximation algorithm
that achieves an approximation ratio of O(n), where n is the number of
vertices. This is asymptotically optimal: We show that the problem cannot be
approximated within a factor of o(n).
To contrast the results for cycle covers of minimum weight, we show that the
problem of computing L-cycle covers of maximum weight can, at least in
principle, be approximated arbitrarily well. Comment: To appear in the
Proceedings of the 33rd Workshop on Graph-Theoretic
Concepts in Computer Science (WG 2007). Minor changes
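The definition of an L-cycle cover can be illustrated with a small checker for the directed case. This is a sketch under stated assumptions: the cover is given as a hypothetical `successor` dictionary mapping each vertex to the next vertex on its cycle, and the check does not verify that the corresponding edges exist in a particular graph.

```python
def cycle_lengths(successor):
    """Lengths of the cycles of a permutation given as a successor mapping."""
    seen, lengths = set(), []
    for start in successor:
        if start in seen:
            continue
        length, v = 0, start
        while v not in seen:
            seen.add(v)
            v = successor[v]
            length += 1
        lengths.append(length)
    return lengths


def is_L_cycle_cover(successor, allowed_lengths):
    """True if every vertex lies on exactly one cycle of an allowed length."""
    # Every vertex must have exactly one successor and one predecessor.
    if set(successor.values()) != set(successor):
        return False
    return all(n in allowed_lengths for n in cycle_lengths(successor))


cover = {1: 2, 2: 3, 3: 1, 4: 5, 5: 4}  # one 3-cycle and one 2-cycle
ok = is_L_cycle_cover(cover, {2, 3})
not_ok = is_L_cycle_cover(cover, {3})
```

The example cover is valid for L = {2, 3} but not for L = {3}, since it contains a cycle of length 2.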