Search CORE

17 research outputs found

The complexity of multiple sequence alignment with SP-score that is a metric

Author: Bonizzoni Paola
Della Vedova Gianluca
Publication venue: Elsevier Science B.V.
Publication date: 28/05/2001

This paper analyzes the computational complexity of computing the optimal alignment of a set of sequences under the sum-of-all-pairs (SP) score scheme. We solve an open question by showing that the problem is NP-complete in the very restricted case in which the sequences are over a binary alphabet and the score is a metric. This result establishes the intractability of multiple sequence alignment under a score function of mathematical interest, which has indeed received much attention in biological sequence comparison.
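To make the objective concrete, here is a minimal Python sketch of the SP score over a binary alphabet plus a gap symbol. The cost table `UNIT` (0 for identical symbols, 1 otherwise) and the helpers `is_metric` and `sp_score` are illustrative assumptions, not taken from the paper. The sketch only evaluates a given alignment; finding an alignment that minimizes this score is the NP-complete problem.

```python
from itertools import combinations, product

GAP = "-"
SYMBOLS = "01" + GAP  # binary alphabet plus the gap symbol

# Hypothetical cost table: the discrete metric (0 for equal symbols, 1 otherwise).
UNIT = {(a, b): 0 if a == b else 1 for a in SYMBOLS for b in SYMBOLS}


def is_metric(delta, symbols=SYMBOLS):
    """Check zero-iff-equal, non-negativity, symmetry and the triangle inequality."""
    for a, b in product(symbols, repeat=2):
        if delta[(a, b)] < 0 or (delta[(a, b)] == 0) != (a == b):
            return False
        if delta[(a, b)] != delta[(b, a)]:
            return False
    return all(
        delta[(a, c)] <= delta[(a, b)] + delta[(b, c)]
        for a, b, c in product(symbols, repeat=3)
    )


def sp_score(alignment, delta):
    """Sum-of-pairs score: add delta over every column of every pair of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all rows must have the same length")
    return sum(
        delta[(x[k], y[k])]
        for x, y in combinations(alignment, 2)
        for k in range(len(x))
    )


print(sp_score(["01-1", "0-11", "0111"], UNIT))  # 4
print(is_metric(UNIT))  # True
```

Any other cost table can be passed in, as long as it covers every pair of symbols occurring in the alignment.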

Elsevier - Publisher Connector

Progressive multiple sequence alignment with the Poisson Indel Process

Author: Anisimova Maria
Gil Manuel
Maiolo Massimo
Zhang Xiaolei
Publication venue: Self-published (Selbstverlag)
Publication date: 01/01/2017

Sequence alignment lies at the heart of many evolutionary and comparative genomics studies. However, the optimal alignment of multiple sequences is NP-hard, so exact algorithms become impractical for more than a few sequences. Thus, state-of-the-art alignment methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogenetic tree. Changes between homologous characters are typically modelled by a continuous-time Markov substitution model. In contrast, the dynamics of insertions and deletions (indels) are not modelled explicitly, because the computation of the marginal likelihood under such models has exponential time complexity in the number of taxa. Recently, Bouchard-Côté and Jordan [PNAS (2012) 110(4):1160-1166] have introduced a modification to a classical indel model, describing indel evolution on a phylogenetic tree as a Poisson process. The resulting model, termed PIP, allows the joint marginal probability of a multiple sequence alignment and a tree to be computed in linear time. Here, we present a new dynamic programming algorithm to align two multiple sequence alignments by maximum likelihood in polynomial time under PIP, and apply it in a progressive algorithm. To our knowledge, this is the first progressive alignment method using a rigorous mathematical formulation of an evolutionary indel process and with polynomial time complexity.
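The progressive strategy mentioned above (pairwise merges of alignments guided by a tree) can be sketched as follows. This is not the PIP maximum-likelihood algorithm: it swaps in a plain unit-cost sum-of-pairs score, assumes the guide tree is given as nested pairs of sequence names, and `align_profiles` and `progressive` are hypothetical helpers written for illustration.

```python
GAP = "-"


def column_cost(col_x, col_y):
    """Unit-cost SP contribution between two columns: 1 per differing pair,
    0 for identical symbols (including gap against gap)."""
    return sum(a != b for a in col_x for b in col_y)


def align_profiles(X, Y):
    """Align two alignments (lists of equal-length strings) by dynamic programming.
    Existing columns are kept intact; only all-gap columns are inserted."""
    n, m = len(X[0]), len(Y[0])
    cx = [tuple(row[i] for row in X) for i in range(n)]
    cy = [tuple(row[j] for row in Y) for j in range(m)]
    gx, gy = (GAP,) * len(X), (GAP,) * len(Y)

    def cost_diag(i, j):   # X column i-1 against Y column j-1
        return column_cost(cx[i - 1], cy[j - 1])

    def cost_gap_y(i):     # X column i-1 against gaps in Y
        return column_cost(cx[i - 1], gy)

    def cost_gap_x(j):     # gaps in X against Y column j-1
        return column_cost(gx, cy[j - 1])

    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            options = []
            if i and j:
                options.append(D[i - 1][j - 1] + cost_diag(i, j))
            if i:
                options.append(D[i - 1][j] + cost_gap_y(i))
            if j:
                options.append(D[i][j - 1] + cost_gap_x(j))
            D[i][j] = min(options) if options else 0

    # Traceback from (n, m), preferring the diagonal move on ties.
    cols, i, j = [], n, m
    while i or j:
        if i and j and D[i][j] == D[i - 1][j - 1] + cost_diag(i, j):
            cols.append(cx[i - 1] + cy[j - 1])
            i, j = i - 1, j - 1
        elif i and D[i][j] == D[i - 1][j] + cost_gap_y(i):
            cols.append(cx[i - 1] + gy)
            i -= 1
        else:
            cols.append(gx + cy[j - 1])
            j -= 1
    cols.reverse()
    rows = ["".join(col[r] for col in cols) for r in range(len(X) + len(Y))]
    return rows, D[n][m]


def progressive(tree, seqs):
    """Align sequences bottom-up along a guide tree of nested (left, right) pairs."""
    if isinstance(tree, str):
        return [seqs[tree]]
    left, right = tree
    rows, _ = align_profiles(progressive(left, seqs), progressive(right, seqs))
    return rows


for row in progressive((("a", "b"), "c"), {"a": "ACGT", "b": "AGT", "c": "ACT"}):
    print(row)
```

Because each merge only inserts all-gap columns into fixed alignments, earlier decisions are never revisited; that is the usual weakness of progressive methods, independent of the scoring model.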

Crossref

ZHAW digitalcollection

Multiple sequence alignment based on set covers

Author: A. Bahr
B. Manthey
B. Morgenstern
C. Notredame
D. Gusfield
G. Vogt
J.D. Thompson
K. Katoh
O. Gotoh
P. Zhao
R.E. Green
R.F. Smith
S. Henikoff
T. Müller
T.P. Li
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

We introduce a new heuristic for the multiple alignment of a set of sequences. The heuristic is based on a set cover of the residue alphabet of the sequences and on the determination of a significant set of blocks comprising subsequences of the sequences to be aligned. These blocks are obtained with the aid of a new data structure, called a suffix-set tree, which is constructed from the input sequences with the guidance of the residue-alphabet set cover and generalizes the well-known suffix tree of the sequence set. We provide performance results on selected BAliBASE amino-acid sequences and compare them with those yielded by some prominent approaches.
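The abstract does not spell out how blocks are extracted from the suffix-set tree, so the following Python sketch is only a brute-force stand-in that illustrates the role of a residue-alphabet set cover. It assumes two residues are compatible when some set of the cover contains both, and treats a block as a length-`k` window of the first sequence that is compatible, position by position, with some window in every other sequence. The cover `COVER` and the functions `compatible`, `window_compatible` and `find_blocks` are hypothetical.

```python
def compatible(a, b, cover):
    """Assume two residues match when some set of the cover contains both."""
    return any(a in s and b in s for s in cover)


def window_compatible(u, v, cover):
    """Position-wise compatibility of two equal-length windows."""
    return all(compatible(a, b, cover) for a, b in zip(u, v))


def find_blocks(seqs, k, cover):
    """Brute-force search for length-k blocks anchored on seqs[0].

    For every window of the first sequence, record the first compatible window
    start in each other sequence; keep the window only if all sequences have one.
    """
    blocks = []
    first = seqs[0]
    for i in range(len(first) - k + 1):
        window = first[i:i + k]
        starts = []
        for s in seqs[1:]:
            hits = [j for j in range(len(s) - k + 1)
                    if window_compatible(window, s[j:j + k], cover)]
            if not hits:
                break
            starts.append(hits[0])
        else:
            blocks.append((i, *starts))
    return blocks


# Hypothetical cover of a toy residue alphabet (sets may overlap in general).
COVER = [{"I", "L", "V"}, {"K", "R"}, {"A"}, {"G"}]
print(find_blocks(["AILKG", "GVLRA", "ALIKA"], 3, COVER))  # [(1, 1, 1)]
```

Because a cover need not be a partition, this compatibility relation is not transitive, which is one reason a dedicated structure such as the suffix-set tree is needed in place of an ordinary suffix tree over a reduced alphabet.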

arXiv.org e-Print Archive

CiteSeerX

Crossref

Lower bounds on multiple sequence alignment using exact 3-way alignment

Author: A Davidson
Charles J Colbourn
CJ Colbourn
D Gusfield
DF Feng
DR Stinson
EW Myers
I Holyer
IM Wallace
JD Thompson
JL Spouge
JR Stevens
L Wang
MS Rosenberg
O Gotoh
P Bonizzoni
PB Gibbons
R Durbin
S Kumar
SK Gupta
Sudhir Kumar
W Just
W Miller
X Huang
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Multiple sequence alignment is fundamental. Exponential growth in computation time appears to be inevitable when an optimal alignment is required for many sequences, so exact costs of optimum alignments are rarely computed. Consequently, much effort has been invested in alignment algorithms that are heuristic or explore a restricted class of solutions. These give an upper bound on the alignment cost, but it is equally important to determine the quality of the solution obtained. In the absence of an optimal alignment with which to compare, lower bounds may be calculated to assess the quality of the alignment. As more effort is invested in improving upper bounds (alignment algorithms), it is therefore important to improve lower bounds as well. Although numerous cost metrics can be used to determine the quality of an alignment, many are based on sum-of-pairs (SP) measures and their generalizations.
Results: Two standard and two new methods are considered for using exact 2-way and 3-way alignments to compute lower bounds on total SP alignment cost; one new method fares well with respect to accuracy, while the other reduces the computation time. The first employs exhaustive computation of exact 3-way alignments, while the second employs an efficient heuristic to compute a much smaller number of exact 3-way alignments. Calculating all 3-way alignments exactly and computing their average improves lower bounds on the SP cost of v-way alignments. However, judicious selection of a subset of all 3-way alignments can yield a further improvement with minimal additional effort. On the other hand, a simple heuristic that selects a random subset of 3-way alignments (a random packing) yields accuracy comparable to averaging all 3-way alignments with substantially less computational effort.
Conclusion: Calculation of lower bounds on SP cost (and thus of the quality of an alignment) can be improved by employing a mixture of 3-way and 2-way alignments.
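The basic averaging argument behind 3-way lower bounds can be sketched in Python. Assuming unit costs (mismatch and residue-against-gap both cost 1, gap against gap costs 0), any v-way alignment induces, for each triple of sequences, a 3-way alignment whose SP cost is at least the exact optimum for that triple. Since every pair of sequences occurs in v - 2 triples, the sum of the exact 3-way optima divided by v - 2 is a lower bound on the total SP cost. The helper names are hypothetical, and the selection and random-packing heuristics described above are not reproduced.

```python
from itertools import combinations, product

GAP_COST = 1       # assumed cost of a residue against a gap
MISMATCH_COST = 1  # assumed cost of two different residues


def sub(a, b):
    """Pair cost of two aligned symbols; None stands for a gap."""
    if a is None and b is None:
        return 0
    if a is None or b is None:
        return GAP_COST
    return 0 if a == b else MISMATCH_COST


def pair_opt(x, y):
    """Exact optimal 2-way alignment cost (edit-distance style DP)."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        D[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),
                          D[i - 1][j] + GAP_COST,
                          D[i][j - 1] + GAP_COST)
    return D[n][m]


def triple_opt(x, y, z):
    """Exact optimal 3-way SP alignment cost by 3-D dynamic programming."""
    D = {(0, 0, 0): 0}
    for i in range(len(x) + 1):
        for j in range(len(y) + 1):
            for k in range(len(z) + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = float("inf")
                # Each move consumes a residue (1) or places a gap (0) in each row.
                for di, dj, dk in product((0, 1), repeat=3):
                    if not (di or dj or dk) or di > i or dj > j or dk > k:
                        continue
                    a = x[i - 1] if di else None
                    b = y[j - 1] if dj else None
                    c = z[k - 1] if dk else None
                    cost = sub(a, b) + sub(a, c) + sub(b, c)
                    best = min(best, D[(i - di, j - dj, k - dk)] + cost)
                D[(i, j, k)] = best
    return D[(len(x), len(y), len(z))]


def lower_bounds(seqs):
    """Return (2-way bound, averaged 3-way bound) on the optimal SP cost; needs v >= 3."""
    v = len(seqs)
    lb2 = sum(pair_opt(a, b) for a, b in combinations(seqs, 2))
    lb3 = sum(triple_opt(a, b, c) for a, b, c in combinations(seqs, 3)) / (v - 2)
    return lb2, lb3


print(lower_bounds(["ACGT", "AGT", "ACT", "AGCT"]))
```

The averaged 3-way bound is never weaker than the 2-way bound, because each 3-way optimum is at least the sum of the three pairwise optima it contains; the price is the cubic-size DP table per triple.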

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Exact Mean Computation in Dynamic Time Warping Spaces

Author: Brill Markus
Fluschnik Till
Froese Vincent
Jain Brijnesh
Niedermeier Rolf
Schultz David
Publication date: 31/05/2018

Dynamic time warping constitutes a major tool for analyzing time series. In particular, computing a mean series of a given sample of series in dynamic time warping spaces (by minimizing the Fréchet function) is a challenging computational problem, so far tackled by several heuristic and inexact strategies. We spot some inaccuracies in the literature on exact mean computation in dynamic time warping spaces. Our contributions comprise an exact dynamic program computing a mean (useful for benchmarking and evaluating known heuristics). Based on this dynamic program, we empirically study properties like uniqueness and length of a mean. Moreover, experimental evaluations reveal substantial deficits of state-of-the-art heuristics in terms of their output quality. We also give an exact polynomial-time algorithm for the special case of binary time series.
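For intuition, the Python sketch below computes the squared DTW distance with the standard O(nm) dynamic program, the Fréchet function as the sum of squared DTW distances to the samples (omitting any 1/N normalization, which does not change the minimizer), and an exact mean of short binary series by exhaustive search over candidates up to a given length. The exhaustive search is exponential and serves only as a check for tiny cases; it is not the paper's polynomial-time algorithm for binary series, and all function names are hypothetical.

```python
from itertools import product


def dtw_cost(x, y):
    """Squared DTW distance: minimum over warping paths of the sum of squared differences."""
    n, m = len(x), len(y)
    inf = float("inf")
    C = [[inf] * (m + 1) for _ in range(n + 1)]
    C[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            C[i][j] = d + min(C[i - 1][j - 1], C[i - 1][j], C[i][j - 1])
    return C[n][m]


def frechet(z, samples):
    """Fréchet function of candidate mean z (unnormalized sum)."""
    return sum(dtw_cost(z, x) for x in samples)


def brute_force_binary_mean(samples, max_len):
    """Exhaustively try every binary series of length 1..max_len; return (F, z)."""
    best = None
    for length in range(1, max_len + 1):
        for z in product((0, 1), repeat=length):
            f = frechet(z, samples)
            if best is None or f < best[0]:
                best = (f, z)
    return best


print(brute_force_binary_mean([(0, 1), (0, 0, 1), (0, 1, 1)], 4))  # (0.0, (0, 1))
```

Ties are broken in favour of the shortest candidate found first, which is one reason questions such as uniqueness and length of a mean need separate study.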

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Rebooting the human mitochondrial phylogeny: an automated and scalable methodology with expert knowledge

Author: A Stamatakis
A Torroni
AW Briggs
CM Zmasek
D Gusfield
D Posada
D Reich
DA Benson
DC Wallace
DM Behar
DM Hillis
E Ruiz-Pesini
ED Gunnarsdóttir
Eduardo Ruiz-Pesini
Elvira Mayordomo
G Gasparre
J Felsenstein
J Krause
Julio Montoya
KK Abu-Amero
L Pereira
L Wang
LL Cavalli-Sforza
LR Foulds
M Attimonelli
M van Oven
MV Han
P Bonizzoni
P Soares
R Bi
R Blanco
R Rajkumar
RC Edgar
RE Green
RL Cann
RM Andrews
Roberto Blanco
RS Malhi
S Anderson
S Fornarino
SB Needleman
TF Smith
U Arnason
WHE Day
WP Maddison
YG Yao
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Mitochondrial DNA is an ideal source of information for evolutionary and phylogenetic studies because of its extraordinary properties and abundance. Many insights can be gained from such studies, including, but not limited to, screening genetic variation to identify potentially deleterious mutations. However, such advances require efficient solutions to very difficult computational problems, and meeting that need is hampered by the very abundance of data that gives the analysis its strength.
Results: We develop a systematic, automated methodology to overcome these difficulties, building from readily available, public sequence databases to high-quality alignments and phylogenetic trees. Within each stage of an autonomous workflow, outputs are carefully evaluated and outlier detection rules are defined to integrate expert knowledge and automated curation, thereby avoiding the manual bottleneck found in past approaches to the problem. Using these techniques, we have performed exhaustive updates to the human mitochondrial phylogeny, illustrating the power and computational scalability of our approach, and we have conducted some initial analyses on the resulting phylogenies.
Conclusions: The problem at hand demands careful definition of inputs and adequate algorithmic treatment for its solutions to be realistic and useful. Formal rules can be defined to address the former requirement by refining inputs both directly and through their combination as outputs, and the resulting outputs also help to assess the performance of the chosen algorithms. Rules can exploit known or inferred properties of datasets to simplify inputs through partitioning, thereby cutting computational costs and making it feasible to work on rapidly growing, otherwise intractable datasets. Although expert guidance may be necessary to assist the learning process, low-risk results can be fully automated and have proved convenient and valuable.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Protein multiple sequence alignment by hybrid bio-inspired algorithms

Author: Cutello Vincenzo
Nicosia Giuseppe
Pavone Mario
Prizzi Igor
Publication venue: Oxford University Press
Publication date: 01/01/2010

This article presents an immune-inspired algorithm to tackle the Multiple Sequence Alignment (MSA) problem. MSA is one of the most important tasks in biological sequence analysis. Although this paper focuses on protein alignments, most of the discussion and methodology may also be applied to DNA alignments. The problem of finding the optimal multiple alignment was investigated in the studies by Bonizzoni and Della Vedova and by Wang and Jiang, and was proved to be NP-hard (non-deterministic polynomial-time hard). The presented algorithm, called Immunological Multiple Sequence Alignment Algorithm (IMSA), incorporates two new strategies to create the initial population and specific ad hoc mutation operators. It uses the 'weighted sum of pairs' as the objective function to evaluate a given candidate alignment. IMSA was tested on the classical BAliBASE benchmarks (versions 1.0, 2.0 and 3.0), and experimental results indicate that it is comparable with state-of-the-art multiple alignment algorithms in terms of alignment quality, weighted Sum-of-Pairs (SP) and Column Score (CS) values. The main novelty of IMSA is its ability to generate more than a single suboptimal alignment for every MSA instance; this behaviour is due to the stochastic nature of the algorithm and of the populations evolved during the convergence process. This feature will help the decision maker to assess and select a biologically relevant multiple sequence alignment. Finally, the designed algorithm can be used as a local search procedure to properly explore promising alignments of the search space.
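As a rough illustration of the two quality measures named above, the sketch below computes a weighted sum-of-pairs score and a simplified column score. The abstract does not give IMSA's weighting scheme or substitution matrix, so the per-pair weight matrix, the caller-supplied `score` function, the linear gap penalty, and the function names are all assumptions. Columns are compared by the residue indices they contain, so two alignments share a column only if they align exactly the same residues there.

```python
from itertools import combinations

GAP = "-"


def weighted_sp(alignment, weights, score, gap_penalty=-1):
    """Weighted SP score: weights[i][j] times the pairwise score of rows i < j.
    Gap-gap positions are ignored; residue-gap positions add gap_penalty."""
    total = 0
    for (i, x), (j, y) in combinations(enumerate(alignment), 2):
        s = 0
        for a, b in zip(x, y):
            if a == GAP and b == GAP:
                continue
            s += gap_penalty if GAP in (a, b) else score(a, b)
        total += weights[i][j] * s
    return total


def columns(alignment):
    """Identify each column by the residue index each row contributes (None for gaps)."""
    counters = [0] * len(alignment)
    cols = []
    for k in range(len(alignment[0])):
        col = []
        for r, row in enumerate(alignment):
            if row[k] == GAP:
                col.append(None)
            else:
                col.append(counters[r])
                counters[r] += 1
        cols.append(tuple(col))
    return cols


def column_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    ref = columns(reference)
    return len(set(columns(test)) & set(ref)) / len(ref)


identity = lambda a, b: 1 if a == b else 0  # hypothetical substitution score
print(weighted_sp(["AC-", "A-C"], [[0, 1], [1, 0]], identity))  # -1
print(column_score(["AC-", "A-C"], ["AC", "AC"]))  # 0.5
```

Benchmark tools may restrict the column score to selected regions of the reference alignment; this sketch simply uses every reference column.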

CiteSeerX

PubMed Central