90 research outputs found
The complexity of multiple sequence alignment with SP-score that is a metric
AbstractThis paper analyzes the computational complexity of computing the optimal alignment of a set of sequences under the sum of all pairs (SP) score scheme. We solve an open question by showing that the problem is NP-complete in the very restricted case in which the sequences are over a binary alphabet and the score is a metric. This result establishes the intractability of multiple sequence alignment under a score function of mathematical interest, which has indeed received much attention in biological sequence comparison
PIntron: a Fast Method for Gene Structure Prediction via Maximal Pairings of a Pattern and a Text
Current computational methods for exon-intron structure prediction from a
cluster of transcript (EST, mRNA) data do not exhibit the time and space
efficiency necessary to process large clusters of over than 20,000 ESTs and
genes longer than 1Mb. Guaranteeing both accuracy and efficiency seems to be a
computational goal quite far to be achieved, since accuracy is strictly related
to exploiting the inherent redundancy of information present in a large
cluster. We propose a fast method for the problem that combines two ideas: a
novel algorithm of proved small time complexity for computing spliced
alignments of a transcript against a genome, and an efficient algorithm that
exploits the inherent redundancy of information in a cluster of transcripts to
select, among all possible factorizations of EST sequences, those allowing to
infer splice site junctions that are highly confirmed by the input data. The
EST alignment procedure is based on the construction of maximal embeddings that
are sequences obtained from paths of a graph structure, called Embedding Graph,
whose vertices are the maximal pairings of a genomic sequence T and an EST P.
The procedure runs in time linear in the size of P, T and of the output.
PIntron, the software tool implementing our methodology, is able to process in
a few seconds some critical genes that are not manageable by other gene
structure prediction tools. At the same time, PIntron exhibits high accuracy
(sensitivity and specificity) when compared with ENCODE data. Detailed
experimental data, additional results and PIntron software are available at
http://www.algolab.eu/PIntron
Pure Parsimony Xor Haplotyping
The haplotype resolution from xor-genotype data has been recently formulated
as a new model for genetic studies. The xor-genotype data is a cheaply
obtainable type of data distinguishing heterozygous from homozygous sites
without identifying the homozygous alleles. In this paper we propose a
formulation based on a well-known model used in haplotype inference: pure
parsimony. We exhibit exact solutions of the problem by providing polynomial
time algorithms for some restricted cases and a fixed-parameter algorithm for
the general case. These results are based on some interesting combinatorial
properties of a graph representation of the solutions. Furthermore, we show
that the problem has a polynomial time k-approximation, where k is the maximum
number of xor-genotypes containing a given SNP. Finally, we propose a heuristic
and produce an experimental analysis showing that it scales to real-world large
instances taken from the HapMap project
- …