
    Collapsing Superstring Conjecture

    In the Shortest Common Superstring (SCS) problem, one is given a collection of strings and needs to find a shortest string containing each of them as a substring. SCS admits a 2 11/23-approximation in polynomial time (Mucha, SODA'13). While this algorithm and its analysis are technically involved, the 30-year-old Greedy Conjecture claims that the trivial and efficient Greedy Algorithm gives a 2-approximation for SCS. We develop a graph-theoretic framework for studying approximation algorithms for SCS. The framework is reminiscent of the classical 2-approximation for Traveling Salesman: take two copies of an optimal solution, apply a trivial edge-collapsing procedure, and get an approximate solution. In this framework, we observe two surprising properties of SCS solutions, and we conjecture that they hold for all input instances. The first conjecture, which we call the Collapsing Superstring Conjecture, claims that there is an elementary way to transform any solution repeated twice into the same graph G. This conjecture would give an elementary 2-approximation algorithm for SCS. The second conjecture claims not only that the resulting graph G is the same for all solutions, but that G can be computed by an elementary greedy procedure, called the Greedy Hierarchical Algorithm. While the second conjecture clearly implies the first one, perhaps surprisingly we prove their equivalence. We support these equivalent conjectures by giving a proof for the special case where all input strings have length at most 3 (which until recently was the only case where the Greedy Conjecture had been proven). We have also tested our conjectures on millions of SCS instances. We prove that the standard Greedy Conjecture implies the Greedy Hierarchical Conjecture, while the latter is sufficient for an efficient greedy 2-approximation of SCS. Beyond its (conjectured) approximation ratio of 2, the Greedy Hierarchical Algorithm provably finds a 3.5-approximation, and it finds exact solutions in the special cases where polynomial-time (non-greedy) exact algorithms are known: (1) when the input strings form a spectrum of a string, and (2) when all input strings have length at most 2.
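    As a point of reference for the Greedy Conjecture discussed above, here is a minimal sketch of the classical Greedy Algorithm for SCS: repeatedly merge the pair of strings with the longest overlap until one string remains. This is illustrative only; the quadratic overlap scan is naive, the function names are our own, and this is the textbook greedy merge, not the paper's Greedy Hierarchical Algorithm.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_scs(strings: list[str]) -> str:
    """Greedily merge the pair with maximum overlap until one string remains."""
    # Drop strings contained in other input strings; they are already covered.
    pool = [s for s in strings if not any(s != t and s in t for t in strings)]
    while len(pool) > 1:
        i, j = max(
            ((i, j) for i in range(len(pool)) for j in range(len(pool)) if i != j),
            key=lambda ij: overlap(pool[ij[0]], pool[ij[1]]),
        )
        k = overlap(pool[i], pool[j])
        merged = pool[i] + pool[j][k:]  # glue j onto i, skipping the overlap
        pool = [s for idx, s in enumerate(pool) if idx not in (i, j)] + [merged]
    return pool[0] if pool else ""

# Example: greedy_scs(["tea", "eat", "ate"]) returns "teate",
# which contains all three input strings as substrings.
```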

    Computational Molecular Biology

    Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and processing of biomolecular sequence and structure data. The field was initiated in the late 1960s and early 1970s, largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 1970s and 1980s, while Computer Science became involved with the new biological problems in the late 1980s. Computational problems have gained further importance in molecular biology through the various genome projects, which produce enormous amounts of data. In this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, such as most of the literature on the protein folding problem, databases for molecular and genetic data, and genetic mapping algorithms, partly due to the availability of review papers and bibliographies devoted to them.

    Quantitative analyses in basic, translational and clinical biomedical research: metabolism, vaccine design and preterm delivery prediction

    There is nothing more important than preserving life, and the thesis presented here is framed in the field of quantitative biomedicine (or systems biomedicine), whose objective is the application of physico-mathematical techniques to biomedical research in order to enhance our understanding of the basis of life and its pathologies and, ultimately, to defend human health. In this thesis, we have applied physico-mathematical methods at the three fundamental levels of biomedical research: basic, translational and clinical. At the basic level, since all pathologies have their basis in the cell, we have performed two studies to deepen our understanding of cellular metabolic functionality. In the first work, we quantitatively analyzed, for the first time, calcium-dependent chloride currents inside the cell, revealing the existence of a dynamical structure characterized by highly organized data sequences, non-trivial long-term correlations lasting 7.66 seconds on average, and a "crossover" effect with transitions between persistent and anti-persistent behaviors. In the second investigation, using delay differential equations, we modeled the adenylate energy system, which is the principal source of cellular energy. This study showed that the cellular energy charge is determined by an oscillatory non-stationary invariant function, bounded between 0.7 and 0.95. At the translational level, we developed a new method for vaccine design that, besides obtaining high coverage, is capable of conferring protection against viruses with high mutation rates, such as HIV, HCV or Influenza. Finally, at the clinical level, we first proved that the classic quantitative measure of uterine contractions (Montevideo Units) is incapable of predicting the immediacy of preterm labor. Then, applying autoregressive techniques, we designed a novel tool for forecasting premature delivery based on only 30 minutes of uterine dynamics. Altogether, these investigations have given rise to four scientific publications and, as far as we know, this is the first European thesis that integrates within the same framework the application of mathematical knowledge to biomedical problems at the three main stages of biomedical research: basic, translational and clinical.
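    For reference, the cellular energy charge discussed above is commonly quantified by Atkinson's adenylate energy charge. The sketch below assumes that standard formulation (the abstract itself does not spell it out), and the concentration values are purely illustrative, not data from the thesis.

```python
def energy_charge(atp: float, adp: float, amp: float) -> float:
    """Atkinson's adenylate energy charge: (ATP + ADP/2) / (ATP + ADP + AMP)."""
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("adenylate pool must be positive")
    return (atp + 0.5 * adp) / total

# Illustrative concentrations (mM); a healthy cell typically sits in the
# 0.7-0.95 band that the abstract reports:
print(energy_charge(atp=2.5, adp=0.25, amp=0.05))  # ~0.94
```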

    On Approximability of Bounded Degree Instances of Selected Optimization Problems

    In order to cope with the approximation hardness of an underlying optimization problem, it is advantageous to consider specific families of instances with properties that can be exploited to obtain efficient approximation algorithms, with improved performance guarantees, for the restricted version of the problem. In this thesis, we investigate the approximation complexity of selected NP-hard optimization problems restricted to instances with a bounded degree, occurrence or weight parameter. Specifically, we consider the family of dense instances, where typically the average degree is bounded from below by some function of the size of the instance. Complementarily, we examine the family of sparse instances, in which the average degree is bounded from above by some fixed constant. We focus on developing new methods for proving explicit approximation hardness results for general as well as restricted instances. The first part of the thesis contributes to the systematic investigation of the VERTEX COVER problem in k-hypergraphs and k-partite k-hypergraphs with density and regularity constraints. We design efficient approximation algorithms for these problems with improved performance guarantees compared to the general case. On the other hand, we prove the optimality of our approximation upper bounds under the Unique Games Conjecture or a variant thereof. In the second part of the thesis, we study mainly the approximation hardness of restricted instances of selected global optimization problems. We establish improved, or in some cases the first, inapproximability thresholds for the problems considered in this thesis, such as the METRIC DIMENSION problem restricted to graphs with maximum degree 3 and the (1,2)-STEINER TREE problem. We introduce a new reduction method for proving explicit approximation lower bounds for problems related to the TRAVELING SALESPERSON (TSP) problem. In particular, we prove the best inapproximability thresholds to date for the general METRIC TSP problem, the ASYMMETRIC TSP problem, the SHORTEST SUPERSTRING problem, the MAXIMUM TSP problem, and TSP problems with bounded metrics.

    Reverse-Safe Data Structures for Text Indexing

    We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method into data analysis applications incurs insignificant or no loss of data utility. Finally, we show how our technique can be extended to support applications under a realistic adversary model.
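    To make the z-reverse-safety definition concrete, here is a toy sketch of our own devising, not the paper's construction: an index that answers occurrence-count queries for patterns of length at most d, together with a brute-force count of how many texts over a given alphabet yield exactly the same answers. The paper's algorithm achieves this guarantee without enumeration; the brute-force check below is feasible only for tiny inputs.

```python
from itertools import product

def build_index(text: str, d: int) -> dict[str, int]:
    """Map every substring of length <= d to its number of occurrences."""
    counts: dict[str, int] = {}
    for i in range(len(text)):
        for j in range(i + 1, min(i + d, len(text)) + 1):
            sub = text[i:j]
            counts[sub] = counts.get(sub, 0) + 1
    return counts

def reverse_safety(index: dict[str, int], n: int, d: int, alphabet: str) -> int:
    """Count texts of length n over the alphabet whose index equals the given
    one, i.e., the z for which this index is z-reverse-safe (brute force)."""
    return sum(1 for t in map("".join, product(alphabet, repeat=n))
               if build_index(t, d) == index)

idx = build_index("abab", d=2)
print(reverse_safety(idx, n=4, d=2, alphabet="ab"))  # 1: only "abab" matches
```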