Multiple sequence alignment (MSA) is a ubiquitous problem in computational
biology. Although it is NP-hard to find an optimal solution for an arbitrary
number of sequences, due to the importance of this problem, researchers are
trying to push the limits of exact algorithms further. Since MSA can be cast as
a classical path-finding problem, it is attracting a growing number of AI
researchers interested in heuristic search algorithms as a challenge with
actual practical relevance. In this paper, we first review two previous,
complementary lines of research. Based on Hirschberg's algorithm, Dynamic
Programming needs O(kN^(k-1)) space to store both the search frontier and the
nodes needed to reconstruct the solution path, for k sequences of length N.
Best-first search, on the other hand, has the advantage of bounding the search
space that has to be explored using a heuristic. However, it is necessary to
maintain all explored nodes up to the final solution in order to prevent the
search from re-expanding them at higher cost. Earlier approaches to reducing the
Closed list are either incompatible with pruning methods for the Open list, or
must retain at least the boundary of the Closed list. In this article, we
present an algorithm that attempts to combine their respective advantages; like
A* it uses a heuristic for pruning the search space, but reduces both the
maximum Open and Closed size to O(kN^(k-1)), as in Dynamic Programming. The
underlying idea is to conduct a series of searches with successively increasing
upper bounds, but using the DP ordering as the key for the Open priority queue.
With a suitable choice of thresholds, in practice, a running time below four
times that of A* can be expected. In our experiments we show that our algorithm
outperforms one of the currently most successful algorithms for optimal
multiple sequence alignments, Partial Expansion A*, both in time and memory.
Moreover, we apply a refined heuristic based on optimal alignments not only of
pairs of sequences, but of larger subsets. This idea is not new; however, to
make it practically relevant we show that it is equally important to bound the
heuristic computation appropriately, or the overhead can obliterate any
possible gain. Furthermore, we discuss a number of improvements in time and
space efficiency with regard to practical implementations. Our algorithm, used
in conjunction with higher-dimensional heuristics, is able to calculate for the
first time the optimal alignment for almost all of the problems in Reference 1
of the benchmark database BAliBASE.
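
The underlying idea of repeated, bounded sweeps in DP order can be illustrated with a minimal sketch for the simplest case of two sequences. This is not the algorithm of the paper: it assumes unit mismatch and gap costs, returns only the alignment cost rather than the alignment itself, and uses a deliberately simple admissible heuristic (the difference in remaining lengths). The function names `bounded_dp` and `iterative_bounded_alignment` and the `step` parameter are hypothetical.

```python
def bounded_dp(a, b, bound):
    """One row-by-row (DP-ordered) sweep over the alignment lattice.

    Nodes whose estimate g + h exceeds `bound` are pruned.
    Returns the optimal cost if it is <= bound, otherwise None.
    Assumed cost model: mismatch = 1, gap = 1, match = 0.
    """
    n, m = len(a), len(b)
    INF = float("inf")

    def h(i, j):
        # Admissible lower bound (an assumption of this sketch): at least
        # this many gaps are needed to balance the remaining lengths.
        return abs((n - i) - (m - j))

    prev = None  # only the previous row is kept, as in linear-space DP
    for i in range(n + 1):
        cur = [INF] * (m + 1)
        for j in range(m + 1):
            if i == 0 and j == 0:
                g = 0
            else:
                g = INF
                if i > 0:
                    g = min(g, prev[j] + 1)          # gap in b
                if j > 0:
                    g = min(g, cur[j - 1] + 1)       # gap in a
                if i > 0 and j > 0:
                    g = min(g, prev[j - 1] + (a[i - 1] != b[j - 1]))
            if g + h(i, j) <= bound:                 # prune with the heuristic
                cur[j] = g
        prev = cur
    return prev[m] if prev[m] < INF else None


def iterative_bounded_alignment(a, b, step=1):
    """Repeat the bounded sweep with successively increasing upper bounds."""
    bound = abs(len(a) - len(b))  # h at the start node
    while True:
        cost = bounded_dp(a, b, bound)
        if cost is not None:
            return cost
        bound += step


print(iterative_bounded_alignment("kitten", "sitting"))
```

Because the heuristic never overestimates, every node on an optimal path survives pruning as soon as the bound reaches the optimal cost, so the first sweep that reaches the goal returns the optimal value. A larger `step` means fewer sweeps but less pruning in the final one, which is the kind of threshold trade-off the text refers to.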