4 research outputs found
An Improved Search Algorithm for Optimal Multiple-Sequence Alignment
Multiple sequence alignment (MSA) is a ubiquitous problem in computational
biology. Although it is NP-hard to find an optimal solution for an arbitrary
number of sequences, due to the importance of this problem researchers are
trying to push the limits of exact algorithms further. Since MSA can be cast as
a classical path finding problem, it is attracting a growing number of AI
researchers interested in heuristic search algorithms as a challenge with
actual practical relevance. In this paper, we first review two previous,
complementary lines of research. Based on Hirschberg's algorithm, Dynamic
Programming needs O(kN^(k-1)) space to store both the search frontier and the
nodes needed to reconstruct the solution path, for k sequences of length N.
Best first search, on the other hand, has the advantage of bounding the search
space that has to be explored using a heuristic. However, it is necessary to
maintain all explored nodes up to the final solution in order to prevent the
search from re-expanding them at higher cost. Earlier approaches to reduce the
Closed list are either incompatible with pruning methods for the Open list, or
must retain at least the boundary of the Closed list. In this article, we
present an algorithm that attempts to combine their respective advantages; like
A* it uses a heuristic for pruning the search space, but reduces both the
maximum Open and Closed size to O(kN^(k-1)), as in Dynamic Programming. The
underlying idea is to conduct a series of searches with successively increasing
upper bounds, but using the DP ordering as the key for the Open priority queue.
With a suitable choice of thresholds, in practice, a running time below four
times that of A* can be expected. In our experiments we show that our algorithm
outperforms one of the currently most successful algorithms for optimal
multiple sequence alignments, Partial Expansion A*, both in time and memory.
Moreover, we apply a refined heuristic based on optimal alignments not only of
pairs of sequences, but of larger subsets. This idea is not new; however, to
make it practically relevant we show that it is equally important to bound the
heuristic computation appropriately, or the overhead can obliterate any
possible gain. Furthermore, we discuss a number of improvements in time and
space efficiency with regard to practical implementations. Our algorithm, used
in conjunction with higher-dimensional heuristics, is able to calculate for the
first time the optimal alignment for almost all of the problems in Reference 1
of the benchmark database BAliBASE.
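To make the path-finding formulation in the abstract above concrete, here is a
minimal sketch of plain A* over the k-dimensional alignment lattice, using the
sum of optimal pairwise alignment costs as the heuristic. It is not the
algorithm proposed in the paper, which additionally bounds the sizes of the Open
and Closed lists. The cost values `GAP` and `MISMATCH` and the names
`pair_cost`, `suffix_table`, and `astar_msa` are assumptions chosen for
illustration only.

```python
import heapq
from itertools import product

GAP = 2       # assumed gap penalty (not taken from the paper)
MISMATCH = 1  # assumed substitution cost (not taken from the paper)


def pair_cost(a, b):
    """Cost of one alignment column restricted to two sequences ('-' is a gap)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return 0 if a == b else MISMATCH


def suffix_table(s, t):
    """D[i][j] = optimal pairwise cost of aligning s[i:] with t[j:]."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i == n and j == m:
                continue
            best = float("inf")
            if i < n and j < m:
                best = min(best, pair_cost(s[i], t[j]) + D[i + 1][j + 1])
            if i < n:
                best = min(best, GAP + D[i + 1][j])
            if j < m:
                best = min(best, GAP + D[i][j + 1])
            D[i][j] = best
    return D


def astar_msa(seqs):
    """Return the optimal sum-of-pairs alignment cost of `seqs` using plain A*."""
    k = len(seqs)
    goal = tuple(len(s) for s in seqs)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    tables = {(i, j): suffix_table(seqs[i], seqs[j]) for i, j in pairs}

    def h(node):
        # Sum of optimal pairwise costs: a lower bound on the remaining cost.
        return sum(tables[i, j][node[i]][node[j]] for i, j in pairs)

    start = (0,) * k
    best_g = {start: 0}
    open_heap = [(h(start), 0, start)]
    # Each move consumes one character from a non-empty subset of sequences.
    moves = [m for m in product((0, 1), repeat=k) if any(m)]
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            return g
        if g > best_g[node]:
            continue  # stale queue entry, a cheaper path was found later
        for move in moves:
            nxt = tuple(p + d for p, d in zip(node, move))
            if any(p > n for p, n in zip(nxt, goal)):
                continue
            column = [seqs[i][node[i]] if move[i] else "-" for i in range(k)]
            step = sum(pair_cost(column[i], column[j]) for i, j in pairs)
            new_g = g + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None
```

Under these assumed costs, `astar_msa(["AC", "AC", "A"])` returns 4: the third
sequence needs one gap, and that gap is paired with a character in each of the
other two sequences. Note that `best_g` keeps every generated node, which is
exactly the memory growth that the abstract identifies as the weakness of
best-first search.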
Multiple-Goal Heuristic Search
This paper presents a new framework for anytime heuristic search where the
task is to achieve as many goals as possible within the allocated resources. We
show the inadequacy of traditional distance-estimation heuristics for tasks of
this type and present alternative heuristics that are more appropriate for
multiple-goal search. In particular, we introduce the marginal-utility
heuristic, which estimates the cost and the benefit of exploring a subtree
below a search node. We developed two methods for online learning of the
marginal-utility heuristic. One is based on local similarity of the partial
marginal utility of sibling nodes, and the other generalizes marginal utility
over the state feature space. We apply our adaptive and non-adaptive
multiple-goal search algorithms to several problems, including focused
crawling, and show their superiority over existing methods.
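As a rough illustration of the sibling-based idea described above, the sketch
below runs a budgeted search that always expands the frontier node whose
parent's subtree has shown the highest partial marginal utility so far (goals
found divided by nodes expanded). This estimator is a hypothetical stand-in,
not the heuristic or the learning method from the paper; the function name
`budgeted_goal_search` and its parameters are assumptions.

```python
def budgeted_goal_search(root, children, is_goal, budget):
    """Expand at most `budget` nodes and return the goals found, in order.

    Frontier nodes are ranked by the partial marginal utility observed so far
    in their parent's subtree, so siblings of productive nodes are preferred.
    """
    parent = {root: None}
    goals_below = {root: 0}      # goals found so far in each node's subtree
    expanded_below = {root: 0}   # nodes expanded so far in each node's subtree
    frontier = [root]
    found = []

    def estimate(node):
        p = parent[node]
        if p is None or expanded_below[p] == 0:
            return 0.0
        return goals_below[p] / expanded_below[p]

    while frontier and budget > 0:
        # Linear scan keeps the sketch simple; ties go to the oldest node.
        node = max(frontier, key=estimate)
        frontier.remove(node)
        budget -= 1
        gain = 1 if is_goal(node) else 0
        if gain:
            found.append(node)
        # Credit the observation to the node and all of its ancestors.
        anc = node
        while anc is not None:
            goals_below[anc] += gain
            expanded_below[anc] += 1
            anc = parent[anc]
        for child in children(node):
            if child not in parent:
                parent[child] = node
                goals_below[child] = 0
                expanded_below[child] = 0
                frontier.append(child)
    return found


# Toy tree (hypothetical data): goals cluster below "A".
tree = {"root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1", "b2"], "a1": ["g1", "g2"]}
goals = {"a1", "a2", "g1", "g2"}
result = budgeted_goal_search("root", lambda n: tree.get(n, []), goals.__contains__, 6)
```

On this toy tree with a budget of six expansions, the search spends its last
three expansions on `a1`, `g1`, and `g2` and never expands the unproductive
children of `B`.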
Alinhamento múltiplo de sequências com A-Star paralelo em cluster MPI [Multiple sequence alignment with parallel A-Star on an MPI cluster]
Undergraduate final project (Trabalho de conclusão de curso), Universidade de Brasília, Instituto de Ciências Exatas, Departamento de Ciência da Computação, 2016.
The purpose of multiple sequence alignment is to highlight similarities and differences
within a set of biological sequences. Multiple alignment with the sum-of-pairs score is
NP-hard, so heuristic methods are used to solve it; however, they do not guarantee an
optimal result. Some exact techniques that do produce an optimal result are based on
the A-Star graph search algorithm, one of them being Parallel A-Star (PA-Star).
PA-Star divides the search space among multiple threads, which speeds up the search,
but its execution is limited to a single machine. The objective of this undergraduate
work is to propose, implement and evaluate MPI-PAStar, a strategy that reduces search
time by executing PA-Star on multiple machines, using the MPI environment to exchange
messages and distribute the workload across machines. MPI-PAStar adds to PA-Star a
pool of message-processing threads and two threads responsible for sending and
receiving messages. Several strategies are used to reduce network traffic and latency,
such as serializing workload blocks and compressing them before sending, which reduces
the negative effects of the network on the alignment computation. Compared with
PA-Star, MPI-PAStar reduced the time to find the optimal alignment by up to 36.8% and
the total execution time by up to 29.7%, depending on the number and similarity of the
sequences being aligned and the length of the longest sequence.
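The serialize-then-compress strategy mentioned in the abstract can be sketched
as follows. This is an illustrative assumption about what a workload block
might look like (a list of search nodes, each a cost plus k lattice
coordinates), not the format MPI-PAStar actually uses; `pack_block` and
`unpack_block` are hypothetical names, and the message transfer itself (for
example over MPI) is left out.

```python
import struct
import zlib


def pack_block(nodes, k):
    """Serialize (cost, coordinates) nodes into bytes and compress them.

    Assumed layout: a header with the node count and k, followed by one
    little-endian 32-bit cost and k 32-bit coordinates per node.
    """
    fmt = "<" + "i" * (k + 1)
    body = b"".join(struct.pack(fmt, g, *coords) for g, coords in nodes)
    raw = struct.pack("<ii", len(nodes), k) + body
    return zlib.compress(raw)


def unpack_block(payload):
    """Inverse of pack_block: decompress and rebuild the list of nodes."""
    raw = zlib.decompress(payload)
    count, k = struct.unpack_from("<ii", raw, 0)
    fmt = "<" + "i" * (k + 1)
    size = struct.calcsize(fmt)
    nodes = []
    for idx in range(count):
        g, *coords = struct.unpack_from(fmt, raw, 8 + idx * size)
        nodes.append((g, tuple(coords)))
    return nodes


# Round trip on hypothetical data: 1000 nodes of a 3-sequence search.
block = [(i, (i, i + 1, i + 2)) for i in range(1000)]
payload = pack_block(block, 3)
restored = unpack_block(payload)
```

Batching many nodes into one block amortizes per-message latency, and
compression trades some CPU time for less data on the wire; whether that trade
pays off depends on the network and on how regular the node data is.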
An improved search algorithm for optimal multiple-sequence alignment
Multiple sequence alignment (MSA) is a ubiquitous problem in computational biology. Although it is NP-hard to find an optimal solution for an arbitrary number of sequences, due to the importance of this problem researchers are trying to push the limits of exact algorithms further. Since MSA can be cast as a classical path finding problem, it is attracting a growing number of AI researchers interested in heuristic search algorithms as a challenge with actual practical relevance. In this paper, we first review two previous, complementary lines of research. Based on Hirschberg's algorithm, Dynamic Programming needs O(kN^(k-1)) space to store both the search frontier and the nodes needed to reconstruct the solution path, for k sequences of length N. Best first search, on the other hand, has the advantage of bounding the search space that has to be explored using a heuristic. However, it is necessary to maintain all explored nodes up to the final solution in order to prevent the search from re-expanding them at higher cost. Earlier approaches to reduce the Closed list are either incompatible with pruning methods for the Open list, or must retain at least the boundary of the Close