    Sorting by Strip Moves and Strip Swaps

    Genome rearrangement problems in computational biology [19, 29, 27] and zoning algorithms in optical character recognition [14, 4] have been modeled as combinatorial optimization problems related to the familiar problem of sorting, namely transforming arbitrary permutations to the identity permutation. The term permutation is used for an arbitrary arrangement of the integers 1, 2, ···, n, and the term identity permutation for the arrangement of 1, 2, ···, n in increasing order. When a permutation is viewed as a string of the integers from 1 through n, any substring of it that is also a substring of the identity permutation is called a strip. The objective in the combinatorial optimization problems arising from these applications is to obtain the identity permutation from an arbitrary permutation using the fewest applications of a chosen strip operation. The strip operations investigated thus far in the literature include strip moves, transpositions, reversals, and block interchanges [16, 2, 25, 11, 34]. However, most of the existing research on sorting by strip operations has focused on obtaining hardness results or designing approximation algorithms, with little work carried out thus far on implementing the proposed approximation algorithms. This research starts by implementing two existing algorithms [5, 34] and, as its main contributions, provides two new algorithms for sorting by strip swaps: 1) a greedy algorithm in which each strip swap reduces the number of strips the most and places the maximum number of strips in their correct positions; 2) an algorithm, called the closest consecutive pair (CCP) algorithm, that uses the strategy of bringing the closest consecutive pairs together. The approximation ratios of the implemented algorithms are also estimated experimentally.
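
    The strip definition above can be illustrated with a minimal sketch. It assumes maximal strips (runs that cannot be extended on either side) and a strip swap that exchanges two whole strips. The function names `maximal_strips` and `swap_strips` are hypothetical, and the code does not reproduce the greedy or CCP algorithms from the abstract.

```python
def maximal_strips(perm):
    """Split perm into maximal runs of consecutive increasing integers."""
    strips = []
    for value in perm:
        if strips and value == strips[-1][-1] + 1:
            strips[-1].append(value)   # extends the current strip
        else:
            strips.append([value])     # starts a new strip
    return strips


def swap_strips(strips, i, j):
    """Exchange strips i and j, then re-merge strips that became adjacent."""
    swapped = list(strips)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    flat = [v for strip in swapped for v in strip]
    return maximal_strips(flat)


perm = [3, 4, 1, 2, 5]
strips = maximal_strips(perm)          # three strips: [3, 4], [1, 2], [5]
after = swap_strips(strips, 0, 1)      # one strip: the identity permutation
print(strips, after)
```

    In this example a single strip swap reduces the strip count from three to one, which is the kind of progress a strip-reducing heuristic would look for.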

    Transposition Rearrangement: Linear Algorithm for Length-Cost Model

    Contemporary computational biology motivates the study of dependencies between finite sequences. Primary structures of DNA or proteins are represented by such sequences (also called words or strings). This paper presents a linear algorithm that computes the distance between two words. The model operates with transpositions of single letters. The cost of a single transposition is equal to the distance that the transposed letter has to cover. The best algorithms known from other papers on this model have time complexity O(n log n). The complexity of our algorithm is O(nk), where k is the size of the alphabet, and O(n) when the size is fixed.
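
    The length-cost model above can be illustrated with a small sketch. It assumes 0-indexed positions and takes the cost of moving one letter to be the absolute difference between its old and new positions. The names `move_letter` and `scenario_cost` are hypothetical, and the code only evaluates a given sequence of moves; it is not the linear distance algorithm of the paper.

```python
def move_letter(word, src, dst):
    """Move the letter at index src so that it ends up at index dst.

    Returns the new word and the cost of the move, assumed here to be
    the number of positions the letter travels.
    """
    letters = list(word)
    letter = letters.pop(src)
    letters.insert(dst, letter)
    return "".join(letters), abs(dst - src)


def scenario_cost(word, moves):
    """Apply (src, dst) moves in order and sum their costs."""
    total = 0
    for src, dst in moves:
        word, cost = move_letter(word, src, dst)
        total += cost
    return word, total


print(move_letter("abcd", 0, 2))                 # ('bcad', 2)
print(scenario_cost("abcd", [(0, 2), (3, 0)]))   # ('dbca', 5)
```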

    A barrier for further approximating Sorting By Transpositions

    The Transposition Distance Problem (TDP) is a classical problem in genome rearrangements which seeks to determine the minimum number of transpositions needed to transform a linear chromosome into another, represented by the permutations π and σ, respectively. This paper focuses on the equivalent problem of Sorting By Transpositions (SBT), where σ is the identity permutation ι. Specifically, we investigate palisades, a family of permutations that are "hard" to sort, as they require numerous transpositions above the celebrated lower bound devised by Bafna and Pevzner. By determining the transposition distance of palisades, we were able to provide the exact transposition diameter for 3-permutations (TD3), a special subset of the Symmetric Group S_n, essential for the study of approximate solutions for SBT using the simplification technique. The exact value for TD3 has remained unknown since Elias and Hartman showed an upper bound for it. Another consequence of determining the transposition distance of palisades is that, using the lower bound by Bafna and Pevzner, it is impossible to guarantee approximation ratios lower than 1.375 when approximating SBT. This finding has significant implications for the study of SBT, as this problem has been the subject of intense research efforts for the past 25 years.
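
    For very small permutations, the transposition distance defined above can be checked by exhaustive search. The sketch below assumes that a transposition exchanges two adjacent, non-empty segments, and it uses plain breadth-first search. The function names are hypothetical, the running time grows factorially with n, and the code involves neither palisades nor the Bafna and Pevzner bound.

```python
from collections import deque


def transpositions(perm):
    """Yield every tuple obtained by exchanging two adjacent segments."""
    n = len(perm)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                # perm[i:j] and perm[j:k] trade places
                yield perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def transposition_distance(perm):
    """Breadth-first search from perm to the identity (small n only)."""
    start = tuple(perm)
    target = tuple(sorted(start))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        for nxt in transpositions(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)


print(transposition_distance([2, 3, 1]))     # 1
print(transposition_distance([4, 3, 2, 1]))  # 3
```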

    On Weighting Schemes for Gene Order Analysis

    Gene order analysis aims at extracting phylogenetic information from the comparison of the order and orientation of the genes on the genomes of different species. This can be achieved by computing parsimonious rearrangement scenarios, i.e., by determining a sequence of rearrangement events that transforms one given gene order into another such that the sum of the weights of the included rearrangement events is minimal. In this sequence only certain types of rearrangements, given by the rearrangement model, are admissible, and weights are assigned with respect to the rearrangement type. The choice of a suitable rearrangement model and of corresponding weights for the included rearrangement types is important for a meaningful reconstruction. So far, weighting schemes for gene order analysis have not been analyzed sufficiently. In this paper weighting schemes for gene order analysis are considered for two rearrangement models: 1) inversions, transpositions, and inverse transpositions; 2) inversions, block interchanges, and inverse transpositions. For both rearrangement models we determined properties of the weighting functions that exclude certain types of rearrangements from parsimonious rearrangement scenarios.

    Sorting by Block Moves

    The research in this thesis is focused on the problem of Block Sorting, which has applications in Computational Biology and in Optical Character Recognition (OCR). A block in a permutation is a maximal sequence of consecutive elements that are also consecutive in the identity permutation. Block Sorting is the process of transforming an arbitrary permutation to the identity permutation through a sequence of block moves. Given an arbitrary permutation π and an integer m, the Block Sorting Problem, or the problem of deciding whether the transformation can be accomplished in at most m block moves, has been shown to be NP-hard. After being known to be 3-approximable for over a decade, block sorting has been researched extensively and now there are several 2-approximation algorithms for its solution. This work introduces new structures on a permutation, which are called runs and ordered pairs, and are used to develop two new approximation algorithms. Both new algorithms are 2-approximation algorithms, yielding an approximation ratio equal to the current best. This work also includes an analysis of both new algorithms showing that they are 2-approximation algorithms.
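
    The block and block-move definitions above can be sketched as follows. The code assumes that a block move picks up one whole block and reinserts it at another position in the sequence of blocks. The names `blocks` and `block_move` are hypothetical, and the sketch does not implement the runs, the ordered pairs, or either 2-approximation algorithm.

```python
def blocks(perm):
    """Return the maximal blocks of perm as a list of lists."""
    result = []
    for value in perm:
        if result and value == result[-1][-1] + 1:
            result[-1].append(value)   # still consecutive: same block
        else:
            result.append([value])     # consecutive run broken: new block
    return result


def block_move(perm, index, new_index):
    """Move the block at position index to position new_index."""
    parts = blocks(perm)
    moved = parts.pop(index)
    parts.insert(new_index, moved)
    return [v for part in parts for v in part]


perm = [2, 3, 1, 4]
print(blocks(perm))                       # [[2, 3], [1], [4]]
result = block_move(perm, 1, 0)           # move block [1] to the front
print(result, len(blocks(result)))        # [1, 2, 3, 4] 1
```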

    A fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions

    Background: Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes become available. However, this phylogenetic reconstruction is a very challenging task. For the most simple distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes. Results: In this paper, we present a new heuristic algorithm that directly constructs a phylogenetic tree w.r.t. the weighted reversal and transposition distance. Experimental results on previously published datasets show that constructing phylogenetic trees in this way results in better trees than constructing the trees w.r.t. the reversal distance, and recalculating the weight of the trees with the weighted reversal and transposition distance. An implementation of the algorithm can be obtained from the authors. Conclusion: The possibility of creating phylogenetic trees directly w.r.t. the weighted reversal and transposition distance results in biologically more realistic scenarios. Our algorithm can solve today's most challenging biological datasets in a reasonable amount of time.

    Sorting signed permutations by reversals, revisited

    The problem of sorting signed permutations by reversals (SBR) is a fundamental problem in computational molecular biology. The goal is, given a signed permutation, to find a shortest sequence of reversals that transforms it into the positive identity permutation, where a reversal is the operation of taking a segment of the permutation, reversing it, and flipping the signs of its elements. In this paper we describe a randomized algorithm for SBR. The algorithm tries to sort the permutation by repeatedly performing a random oriented reversal. This process is in fact a random walk on the graph where permutations are the nodes and an arc from π to π′ corresponds to an oriented reversal that transforms π to π′. We show that if this random walk stops at the identity permutation, then we have found a shortest sequence. We give empirical evidence that this process indeed succeeds with high probability on a random permutation. To implement our algorithm we describe a data structure for maintaining a permutation that allows drawing an oriented reversal uniformly at random and performing it in sub-linear time. With this data structure we can implement the random walk in O(n^{3/2} log n) time, thus obtaining an algorithm for SBR that almost always runs in sub-quadratic time. The data structures we present may also be of independent interest for developing other algorithms for SBR, and for other problems. Finally, we present the first efficient parallel algorithm for SBR. We obtain this result by developing a fast implementation of the recent algorithm of Bergeron (Proceedings of CPM, 2001, pp. 106–117) for sorting signed permutations by reversals that is parallelizable. Our implementation runs in O(n^2 log n) time on a regular RAM, and in O(n log n) time on a PRAM using n processors.
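
    The reversal operation described above can be written in a few lines. The sketch assumes a signed permutation stored as a Python list of nonzero integers and an inclusive, 0-indexed segment. The name `reversal` is hypothetical, and the code shows only the operation itself, not the random walk over oriented reversals or the sub-linear data structure.

```python
def reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of each element."""
    segment = [-x for x in reversed(perm[i:j + 1])]
    return perm[:i] + segment + perm[j + 1:]


perm = [1, -3, -2, 4]
print(reversal(perm, 1, 2))   # [1, 2, 3, 4]
```

    Here a single reversal of the middle segment yields the positive identity permutation, so this input has reversal distance one.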