151 research outputs found

    Improving Table Compression with Combinatorial Optimization

    We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, off-line training algorithm, based on a link to the asymmetric traveling salesman problem, which improves on prior work by rearranging columns prior to partitioning. We demonstrate these results experimentally. On various test files, the on-line algorithms provide 35-55% improvement over gzip with negligible slowdown; the off-line reordering provides up to 20% further improvement over partitioning alone. We also show that a variation of the table compression problem is MAX-SNP hard. Comment: 22 pages, 2 figures, 5 tables, 23 references. Extended abstract appears in Proc. 13th ACM-SIAM SODA, pp. 213-222, 200
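
    The partitioning idea described in this abstract can be illustrated with a minimal sketch. It assumes the table is a list of equal-length byte rows, uses Python's zlib as a stand-in for gzip, and takes the column intervals as given; the function names and the interval format are hypothetical, and the training step that selects the intervals is not shown. This is not the authors' implementation.

```python
import zlib


def compress_partitioned(rows, intervals):
    """Compress each column interval of the table as a separate stream.

    rows: list of equal-length bytes objects, one per table row.
    intervals: list of (start, end) column ranges, assumed disjoint,
    listed in column order, and covering every column.
    """
    return [
        zlib.compress(b"".join(row[start:end] for row in rows))
        for start, end in intervals
    ]


def decompress_partitioned(blobs, intervals, n_rows):
    """Invert compress_partitioned and reassemble the original rows."""
    rows = [bytearray() for _ in range(n_rows)]
    for blob, (start, end) in zip(blobs, intervals):
        width = end - start
        data = zlib.decompress(blob)
        # Each stream holds one interval's bytes, row after row.
        for i in range(n_rows):
            rows[i] += data[i * width:(i + 1) * width]
    return [bytes(r) for r in rows]


# Hypothetical 3-row table with two 4-byte column groups.
table = [b"AAAA0001", b"AAAA0002", b"BBBB0003"]
parts = [(0, 4), (4, 8)]
streams = compress_partitioned(table, parts)
restored = decompress_partitioned(streams, parts, len(table))
```

    Whether separate streams beat a single stream depends on the data and on the chosen intervals, which is the question the training procedure in the abstract addresses.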

    Computational Molecular Biology

    Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

    On the shortest common parameterized supersequence problem

    In this paper, we consider an approach to solve the problem of the shortest common parameterized supersequence. This approach is based on an explicit reduction from the shortest common parameterized supersequence problem to the 3-satisfiability problem and the maximum 2-satisfiability problem. © 2013 Anna Gorbenko and Vladimir Popov

    Quantitative analyses in basic, translational and clinical biomedical research: metabolism, vaccine design and preterm delivery prediction

    There is nothing more important than preserving life, and the thesis presented here is framed in the field of quantitative biomedicine (or systems biomedicine), whose objective is to apply physico-mathematical techniques in biomedical research in order to enhance the understanding of the basis of life and its pathologies and, ultimately, to defend human health. In this thesis, we have applied physico-mathematical methods at the three fundamental levels of Biomedical Research: basic, translational and clinical. At the basic level, since all pathologies have their basis in the cell, we have performed two studies to deepen the understanding of cellular metabolic functionality. In the first work, we have quantitatively analyzed, for the first time, calcium-dependent chloride currents inside the cell, which has revealed the existence of a dynamical structure characterized by highly organized data sequences, non-trivial long-term correlations that last on average 7.66 seconds, and a "crossover" effect with transitions between persistent and anti-persistent behaviors. In the second investigation, using delay differential equations, we have modeled the adenylate energy system, which is the principal source of cellular energy. This study has shown that the cellular energy charge is determined by an oscillatory non-stationary invariant function, bounded between 0.7 and 0.95. At the translational level, we have developed a new method for vaccine design that, besides obtaining high coverage, is capable of giving protection against viruses with high mutability rates such as HIV, HCV or influenza. Finally, at the clinical level, we have first proven that the classic quantitative measure of uterine contractions (Montevideo Units) is incapable of predicting the immediacy of preterm labor. Then, by applying autoregressive techniques, we have designed a novel tool for forecasting premature delivery, based on only 30 minutes of uterine dynamics.
    Altogether, these investigations have led to four scientific publications and, as far as we know, our work is the first European thesis that integrates, in a single framework, the application of mathematical knowledge to biomedical fields across the three main stages of Biomedical Research: basic, translational and clinical.

    A system of intelligent algorithms for a module of onboard equipment of mobile vehicles

    The area of intelligent robotics is moving from the single robot control problem to that of controlling multiple robots operating together and even collaborating in dynamic and unstructured intelligent environments. In such conditions, an intelligent robot control system is only part of a general intelligent system. In this paper, we consider a model of such a system. © 2013 Anna Gorbenko

    Encoding Hard String Problems with Answer Set Programming

    Despite the simple, one-dimensional nature of strings, several computationally hard problems on strings are known. Tackling hard problems beyond the size of toy instances with straightforward solutions is infeasible. To solve these problems on datasets of even small sizes, effort has to be put into designing algorithms that exploit deep characteristics of the input. Finding these characteristics can be eased by rapidly creating and evaluating prototypes of new concepts for tackling hard problems. One such rapid-prototyping method for hard problems is answer set programming (ASP). In this light, we study the application of ASP to five NP-hard optimization problems in the field of strings. We provide MAX-SAT and ASP encodings, and empirically reason about the merits and flaws of working with ASP solvers.

    DNA Fragment Assembly Algorithms: Toward a Solution for Long Repeats

    In this work, we describe our efforts to seek optimal solutions for the DNA Fragment Assembly Problem in terms of assembly accuracy and runtime efficiency. The main obstacles for the DNA Fragment Assembly are analyzed. After reviewing various advanced algorithms adopted by some assemblers in the bioinformatics industry, this work explores the feasibility of assembling fragments for a target sequence containing perfect long repeats, which is deemed theoretically impossible without tedious finishing reaction experiments. Innovative algorithms incorporating statistical analysis proposed in this work make the restoration of DNA sequences containing long perfect repeats an attainable goal

    The swap common superstring problem

    In this paper, we consider an approach to solve the swap common superstring problem. This approach is based on an explicit reduction from the problem to the satisfiability problem.