926 research outputs found
On the Complexity of the Median and Closest Permutation Problems
Genome rearrangements are events where large blocks of DNA exchange places
during evolution. The analysis of these events is a promising tool for
understanding evolutionary genomics, providing data for phylogenetic
reconstruction based on genome rearrangement measures. Many pairwise
rearrangement distances have been proposed, based on finding the minimum number
of rearrangement events to transform one genome into the other, using some
predefined operation. When more than two genomes are considered, we have the
more challenging problem of rearrangement-based phylogeny reconstruction. Given
a set of genomes and a distance notion, there are at least two natural ways to
define the "target" genome. On the one hand, finding a genome that minimizes
the sum of the distances from this to any other, called the median genome.
Finding a genome that minimizes the maximum distance to any other, called the
closest genome. Considering genomes as permutations, some distance metrics have
been extensively studied. We investigate median and closest problems on
permutations over the metrics: breakpoint, swap, block-interchange,
short-block-move, and transposition. In biological matters some values are
usually small, such as the solution value d or the number k of input
permutations. For each of these metrics and parameters d or k, we analyze the
closest and the median problems from the viewpoint of parameterized complexity.
We obtain the following results: NP-hardness for finding the median/closest
permutation for some metrics, even for k = 3; Polynomial kernels for the
problems of finding the median permutation of all studied metrics, considering
the target distance d as parameter; NP-hardness result for finding the closest
permutation by short-block-moves; FPT algorithms and infeasibility of
polynomial kernels for finding the closest permutation for some metrics
parameterized by the target distance d
Approximating solution structure of the Weighted Sentence Alignment problem
We study the complexity of approximating solution structure of the bijective
weighted sentence alignment problem of DeNero and Klein (2008). In particular,
we consider the complexity of finding an alignment that has a significant
overlap with an optimal alignment. We discuss ways of representing the solution
for the general weighted sentence alignment as well as phrases-to-words
alignment problem, and show that computing a string which agrees with the
optimal sentence partition on more than half (plus an arbitrarily small
polynomial fraction) positions for the phrases-to-words alignment is NP-hard.
For the general weighted sentence alignment we obtain such bound from the
agreement on a little over 2/3 of the bits. Additionally, we generalize the
Hamming distance approximation of a solution structure to approximating it with
respect to the edit distance metric, obtaining similar lower bounds
Models and Algorithms for Sorting Permutations with Tandem Duplication and Random Loss
A central topic of evolutionary biology is the inference of phylogeny, i. e., the evolutionary history of species. A powerful tool for the inference of such phylogenetic relationships is the arrangement of the genes in mitochondrial genomes. The rationale is that these gene arrangements are subject to different types of mutations in the course of evolution. Hence, a high similarity in the gene arrangement between two species indicates a close evolutionary relation. Metazoan mitochondrial gene arrangements are particularly well suited for such phylogenetic studies as they are available for a wide range of species, their gene content is almost invariant, and usually free of duplicates. With these properties gene arrangements of mitochondrial genomes are modeled by permutations in which each element represents a gene, i. e., a specific genetic sequence. The mutations that shape the gene arrangement of genomes are then represented by operations that rearrange elements in permutations, so-called genome rearrangements, and thereby bridge the gap between evolutionary biology and optimization. Many problems of phylogeny inference can be formulated as challenging combinatorial optimization problems which makes this research area especially interesting for computer scientists. The most prominent examples of such optimization problems are the sorting problem and the distance problem. While the sorting problem requires a minimum length sequence of rearrangements that transforms one given permutation into another given permutation, i. e., it aims for a hypothetical scenario of gene order evolution, the distance problem intends to determine only the length of such a sequence. This minimum length is called distance and used as a (dis)similarity measure quantifying the evolutionary relatedness.
Most evolutionary changes occurring in gene arrangements of mitochondrial genomes can be explained by the tandem duplication random loss (TDRL) genome rearrangement model. A TDRL consists of a duplication of a consecutive set of genes in tandem followed by a random loss of one copy of each duplicated gene. In spite of the importance of the TDRL genome rearrangement in mitochondrial evolution, its combinatorial properties have rarely been studied. In addition, models of genome rearrangements which include all types of rearrangement that are relevant for mitochondrial genomes, i. e., inversions, transpositions, inverse transpositions, and TDRLs, while admitting computational tractability are rare. Nevertheless, especially for metazoan gene arrangements the TDRL rearrangement should be considered for the reconstruction of phylogeny. Realizing that a better understanding of the TDRL model is indispensable for the study of mitochondrial gene arrangements, the central theme of this thesis is to broaden the horizon of TDRL genome rearrangements with respect to mitochondrial genome evolution. For this purpose, this thesis provides combinatorial properties of the TDRL model and its variants as well as
efficient methods for a plausible reconstruction of rearrangement scenarios between gene arrangements. The methods that are proposed consider all types of genome rearrangements that predominately occur during mitochondrial evolution. More precisely, the main points contained in this thesis are as follows:
The distance problem and the sorting problem for the TDRL model are further examined in respect to circular permutations, a formal concept that reflects the circular structure of mitochondrial genomes. As a result, a closed formula for the distance is provided.
Recently, evidence for a variant of the TDRL rearrangement model in which the
duplicated set of genes is additionally inverted have been found. Initiating the algorithmic study of this new rearrangement model on a certain type of permutations, a closed formula solving the distance problem is proposed as well as a quasilinear time algorithm that solves the corresponding sorting problem.
The assumption that only one type of genome rearrangement has occurred during the evolution of certain gene arrangements is most likely unrealistic, e. g., at least three types of rearrangements on top of the TDRL rearrangement have to be considered for the evolution metazoan mitochondrial genomes. Therefore, three different biologically motivated constraints are taken into account in this thesis in order to produce plausible evolutionary rearrangement scenarios. The first constraint is extending the considered set of genome rearrangements to the model that covers all four common types of mitochondrial genome rearrangements. For this 4-type model a sharp lower bound and several close additive upper bounds on the distance are developed. As a byproduct, a polynomial-time approximation algorithm for the corresponding sorting problem is provided that guarantees the computation of pairwise rearrangement scenarios that deviate from a minimum length scenario by at most two rearrangement operations. The second biologically motivated constraint is the relative frequency of the different types of rearrangements occurring during the evolution. The frequency is modeled by employing a weighting scheme on the 4-type model in which every rearrangement is weighted with respect to its type. The resulting NP-hard sorting problem is then solved by means of a polynomial size integer linear program. The third biologically motivated constraint that has been taken into account is that certain subsets of genes are often found in close proximity in the gene arrangements of many different species. This observation is reflected by demanding rearrangement scenarios to preserve certain groups of genes which are modeled by common intervals of permutations. In order to solve the sorting problem that considers all three types of biologically motivated constraints, the exact dynamic programming algorithm CREx2 is proposed. CREx2 has a linear runtime for a large class of problem instances. Otherwise, two versions of the CREx2 are provided: The first version provides exact solutions but has an exponential runtime in the worst case and the second version provides approximated solutions efficiently. CREx2 is evaluated by an empirical study for simulated artificial and real biological mitochondrial gene arrangements
Lower bounding edit distances between permutations
International audienceA number of fields, including the study of genome rearrangements and the design of interconnection networks, deal with the connected problems of sorting permutations in "as few moves as possible", using a given set of allowed operations, or computing the number of moves the sorting process requires, often referred to as the distance of the permutation. These operations often act on just one or two segments of the permutation, e.g. by reversing one segment or exchanging two segments. The cycle graph of the permutation to sort is a fundamental tool in the theory of genome rearrangements, and has proved useful in settling the complexity of many variants of the above problems. In this paper, we present an algebraic reinterpretation of the cycle graph of a permutation π as an even permutation π, and show how to reformulate our sorting problems in terms of particular factorisations of the latter permutation. Using our framework, we recover known results in a simple and unified way, and obtain a new lower bound on the prefix transposition distance (where a prefix transposition displaces the initial segment of a permutation), which is shown to outperform previous results. Moreover, we use our approach to improve the best known lower bound on the prefix transposition diameter from 2n/3 to ⌊3n/4⌋, and investigate a few relations between some statistics on π and π
On SAT representations of XOR constraints
We study the representation of systems S of linear equations over the
two-element field (aka xor- or parity-constraints) via conjunctive normal forms
F (boolean clause-sets). First we consider the problem of finding an
"arc-consistent" representation ("AC"), meaning that unit-clause propagation
will fix all forced assignments for all possible instantiations of the
xor-variables. Our main negative result is that there is no polysize
AC-representation in general. On the positive side we show that finding such an
AC-representation is fixed-parameter tractable (fpt) in the number of
equations. Then we turn to a stronger criterion of representation, namely
propagation completeness ("PC") --- while AC only covers the variables of S,
now all the variables in F (the variables in S plus auxiliary variables) are
considered for PC. We show that the standard translation actually yields a PC
representation for one equation, but fails so for two equations (in fact
arbitrarily badly). We show that with a more intelligent translation we can
also easily compute a translation to PC for two equations. We conjecture that
computing a representation in PC is fpt in the number of equations.Comment: 39 pages; 2nd v. improved handling of acyclic systems, free-standing
proof of the transformation from AC-representations to monotone circuits,
improved wording and literature review; 3rd v. updated literature,
strengthened treatment of monotonisation, improved discussions; 4th v. update
of literature, discussions and formulations, more details and examples;
conference v. to appear LATA 201
Taming Numbers and Durations in the Model Checking Integrated Planning System
The Model Checking Integrated Planning System (MIPS) is a temporal least
commitment heuristic search planner based on a flexible object-oriented
workbench architecture. Its design clearly separates explicit and symbolic
directed exploration algorithms from the set of on-line and off-line computed
estimates and associated data structures. MIPS has shown distinguished
performance in the last two international planning competitions. In the last
event the description language was extended from pure propositional planning to
include numerical state variables, action durations, and plan quality objective
functions. Plans were no longer sequences of actions but time-stamped
schedules. As a participant of the fully automated track of the competition,
MIPS has proven to be a general system; in each track and every benchmark
domain it efficiently computed plans of remarkable quality. This article
introduces and analyzes the most important algorithmic novelties that were
necessary to tackle the new layers of expressiveness in the benchmark problems
and to achieve a high level of performance. The extensions include critical
path analysis of sequentially generated plans to generate corresponding optimal
parallel plans. The linear time algorithm to compute the parallel plan bypasses
known NP hardness results for partial ordering by scheduling plans with respect
to the set of actions and the imposed precedence relations. The efficiency of
this algorithm also allows us to improve the exploration guidance: for each
encountered planning state the corresponding approximate sequential plan is
scheduled. One major strength of MIPS is its static analysis phase that grounds
and simplifies parameterized predicates, functions and operators, that infers
knowledge to minimize the state description length, and that detects domain
object symmetries. The latter aspect is analyzed in detail. MIPS has been
developed to serve as a complete and optimal state space planner, with
admissible estimates, exploration engines and branching cuts. In the
competition version, however, certain performance compromises had to be made,
including floating point arithmetic, weighted heuristic search exploration
according to an inadmissible estimate and parameterized optimization
- …