    Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms.

    The Fine-Grained Complexity of Median and Center String Problems Under Edit Distance

    We present the first fine-grained complexity results on two classic problems on strings. The first one is the k-Median-Edit-Distance problem, where the input is a collection of k strings, each of length at most n, and the task is to find a new string that minimizes the sum of the edit distances from itself to all other strings in the input. Arising frequently in computational biology, this problem provides an important generalization of edit distance to multiple strings and is similar to the multiple sequence alignment problem in bioinformatics. We demonstrate that for any ε > 0 and k ≥ 2, an O(n^{k-ε}) time solution for the k-Median-Edit-Distance problem over an alphabet of size O(k) refutes the Strong Exponential Time Hypothesis (SETH). This provides the first matching conditional lower bound for the O(n^k) time algorithm established in 1975 by Sankoff. The second problem we study is the k-Center-Edit-Distance problem. Here also, the input is a collection of k strings, each of length at most n. The task is to find a new string that minimizes the maximum edit distance from itself to any other string in the input. We prove that the same conditional lower bound as before holds. Our results also imply new conditional lower bounds for the k-Tree-Alignment and the k-Bottleneck-Tree-Alignment problems studied in phylogenetics.
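
    To make the two objectives concrete, the following minimal Python sketch evaluates the median (sum) and center (maximum) costs of a given candidate string, using the textbook quadratic-time dynamic program for edit distance. It only scores a candidate; it does not search for an optimal string and is not Sankoff's O(n^k) algorithm. The function names are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) via the standard DP."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitute))
        prev = curr
    return prev[-1]


def median_cost(candidate: str, strings: Sequence[str]) -> int:
    """Objective of k-Median-Edit-Distance: sum of distances to all inputs."""
    return sum(edit_distance(candidate, s) for s in strings)


def center_cost(candidate: str, strings: Sequence[str]) -> int:
    """Objective of k-Center-Edit-Distance: maximum distance to any input."""
    return max(edit_distance(candidate, s) for s in strings)
```

    A median string is a candidate minimizing median_cost over all strings, and a center string is one minimizing center_cost; the lower bounds in the abstract concern the cost of that minimization, not of evaluating a single candidate.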

    Packing multiway cuts in capacitated graphs

    We consider the following "multiway cut packing" problem in undirected graphs: we are given a graph G=(V,E) and k commodities, each corresponding to a set of terminals located at different vertices in the graph; our goal is to produce a collection of cuts {E_1,...,E_k} such that E_i is a multiway cut for commodity i and the maximum load on any edge is minimized. The load on an edge is defined to be the number of cuts in the solution crossing the edge. In the capacitated version of the problem the goal is to minimize the maximum relative load on any edge, that is, the ratio of the edge's load to its capacity. Multiway cut packing arises in the context of graph labeling problems where we are given a partial labeling of a set of items and a neighborhood structure over them, and, informally, the goal is to complete the labeling in the most consistent way. This problem was introduced by Rabani, Schulman, and Swamy (SODA'08), who developed an O(log n/log log n) approximation for it in general graphs, as well as an improved O(log^2 k) approximation in trees. Here n is the number of nodes in the graph. We present the first constant factor approximation for this problem in arbitrary undirected graphs. Our approach is based on the observation that every instance of the problem admits a near-optimal laminar solution (that is, one in which no pair of cuts cross each other). Comment: The conference version of this paper is to appear at SODA 2009. This is the full version.
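
    The objective defined above can be illustrated with a short Python sketch that computes the maximum relative load of a given collection of cuts. It assumes each cut is supplied as a collection of undirected edges and that capacities are given per edge; it does not check that each cut actually separates its terminals, and it is not the approximation algorithm of the paper. The function name and data layout are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, FrozenSet, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def max_relative_load(
    cuts: Iterable[Iterable[Edge]],
    capacity: Dict[FrozenSet[Hashable], int],
) -> Fraction:
    """Return the largest load/capacity ratio over all edges used by the cuts."""
    load: Counter = Counter()
    for cut in cuts:
        # Normalize (u, v) and (v, u) to one undirected edge, counted once per cut.
        for edge in {frozenset(e) for e in cut}:
            load[edge] += 1
    return max(
        (Fraction(load[e], capacity[e]) for e in load),
        default=Fraction(0),
    )
```

    In the uncapacitated version every capacity is 1, so the returned value is simply the maximum number of cuts crossing any single edge.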

    Approximation algorithms for the fixed-topology phylogenetic number problem

    In the l-phylogeny problem, one wishes to construct an evolutionary tree for a set of species represented by characters, in which each state of each character induces no more than l connected components. We consider the fixed-topology version of this problem for fixed topologies of arbitrary degree. This version of the problem is known to be NP-complete for l greater than or equal to 3 even for degree-3 trees in which no state labels more than l + 1 leaves (and therefore there is a trivial (l + 1)-phylogeny). We give a 2-approximation algorithm for all l greater than or equal to 3 for arbitrary input topologies and we give an optimal approximation algorithm that constructs a 4-phylogeny when a 3-phylogeny exists. Dynamic programming techniques, which are typically used in fixed-topology problems, cannot be applied to l-phylogeny problems. Our 2-approximation algorithm is the first application of linear programming to approximation algorithms for phylogeny problems. We extend our results to a related problem in which characters are polymorphic.
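
    As a small illustration of the l-phylogeny condition, the Python sketch below checks whether a fully labeled tree is an l-phylogeny. It assumes the topology is given as an edge list and that every node (leaf and internal) already carries a state for each character; it only verifies a labeling and is not the linear-programming-based approximation algorithm of the paper. It relies on the fact that the nodes sharing a state induce a forest, so the number of components equals the number of such nodes minus the number of tree edges joining two of them. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]
Labeling = Dict[Hashable, Hashable]  # node -> state of one character


def components_per_state(edges: Iterable[Edge], state: Labeling) -> Dict[Hashable, int]:
    """Count connected components induced by each state in a labeled tree."""
    nodes = Counter(state.values())
    # Edges whose two endpoints carry the same state stay inside that state's forest.
    same_state_edges = Counter(state[u] for u, v in edges if state[u] == state[v])
    return {s: nodes[s] - same_state_edges[s] for s in nodes}


def is_l_phylogeny(edges: List[Edge], characters: Iterable[Labeling], l: int) -> bool:
    """True if every state of every character induces at most l components."""
    return all(
        max(components_per_state(edges, labeling).values()) <= l
        for labeling in characters
    )
```

    For example, on the path 1-2-3-4-5 with states A, B, A, A, B, state A induces two components ({1} and {3, 4}) and state B induces two ({2} and {5}), so the labeling is a 2-phylogeny but not a 1-phylogeny.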