Bootstrapping Lexical Choice via Multiple-Sequence Alignment
An important component of any generation system is the mapping dictionary, a
lexicon of elementary semantic expressions and corresponding natural language
realizations. Typically, labor-intensive knowledge-based methods are used to
construct the dictionary. We instead propose to acquire it automatically via a
novel multiple-pass algorithm employing multiple-sequence alignment, a
technique commonly used in bioinformatics. Crucially, our method leverages
latent information contained in multi-parallel corpora -- datasets that supply
several verbalizations of the corresponding semantics rather than just one.
We used our techniques to generate natural language versions of
computer-generated mathematical proofs, with good results on both a
per-component and overall-output basis. For example, in evaluations involving a
dozen human judges, our system produced output whose readability and
faithfulness to the semantic input rivaled that of a traditional generation
system.
Comment: 8 pages; to appear in the proceedings of EMNLP-200
Integrated multiple sequence alignment
Sammeth M. Integrated multiple sequence alignment. Bielefeld (Germany): Bielefeld University; 2005.
The thesis presents enhancements for automated and manual multiple sequence alignment: existing alignment algorithms are made more easily accessible and new algorithms are designed for difficult cases.
First, we introduce the QAlign framework, a graphical user interface for multiple sequence alignment. It comprises several state-of-the-art algorithms and exposes their parameters through convenient dialogs. An alignment viewer with guided editing functionality can also highlight or print regions of the alignment. Phylogenetic features are also provided, e.g., distance-based tree reconstruction methods, corrections for multiple substitutions, and a tree viewer. The modular design and the platform-independent implementation make the framework easy to extend.
Further, we develop a constrained version of divide-and-conquer alignment that can be restricted by anchors found earlier with local alignments. This method can be shown to share attributes of both local and global aligners, in the quality of results as well as in computation time. We further modify the local alignment step to work on bipartite (or even multipartite) sets for sequences where repeats overshadow valuable sequence information. The result is a technique that can accurately align sequences containing possibly repeated motifs.
Finally, another algorithm is presented that compares tandem repeat sequences by aligning them with respect to their possible repeat histories. We describe an evolutionary model including tandem duplications and excisions, and give an exact algorithm to compare two sequences under this model.
Contextual Multiple Sequence Alignment
In a recently proposed contextual alignment model, efficient algorithms exist for global and local pairwise alignment of protein sequences. Preliminary results obtained for biological data are very promising. Our main motivation was to adapt the idea of context dependency to the multiple alignment setting. To this end, a relaxation of the model was developed (we call this new model averaged contextual alignment) and a new family of amino acid substitution matrices was constructed. In this paper we present a contextual multiple alignment algorithm and report the outcomes of experiments performed on the BAliBASE test set. The contextual approach gave much better results for the set of sequences containing orphan genes.
Higher accuracy protein Multiple Sequence Alignment by Stochastic Algorithm
Multiple Sequence Alignment gives insight into evolutionary, structural and functional relationships among proteins. Here, a novel Protein Alignment by Stochastic Algorithm (PASA) is developed. The evolutionary operators of a genetic algorithm, namely mutation and selection, are used to combine the output of two of the most important sequence alignment programs into an optimized new algorithm. Alignment quality is evaluated in terms of the Total Column score, which equals the number of correctly aligned columns between a test alignment and the reference alignment divided by the total number of columns in the reference alignment. The PASA optimizer achieves, on average, significantly better alignments than the well-known individual bioinformatics tools. PASA is statistically the most accurate protein alignment method to date. It has potential applications in drug discovery processes in the biotechnology industry.
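The Total Column score defined above can be sketched in a few lines. This is a minimal illustration, not the PASA implementation: it assumes both alignments contain the same sequences in the same row order, that gaps are written as "-", and that a reference column counts as correct only if the test alignment places exactly the same residues (identified by their position in each sequence) in one column. The function names are hypothetical.

```python
def column_signatures(alignment):
    """Describe each column by the residue index it holds in every row (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):  # assumes all rows have equal length
        signature = []
        for row, symbol in enumerate(col):
            if symbol == "-":
                signature.append(None)
            else:
                signature.append(counters[row])
                counters[row] += 1
        columns.append(tuple(signature))
    return columns


def total_column_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    reference_columns = column_signatures(reference)
    if not reference_columns:
        raise ValueError("reference alignment has no columns")
    test_columns = set(column_signatures(test))
    correct = sum(1 for col in reference_columns if col in test_columns)
    return correct / len(reference_columns)


if __name__ == "__main__":
    reference = ["AC-GT", "ACTGT"]
    test = ["ACG-T", "ACTGT"]
    print(total_column_score(test, reference))
```

Because a single misplaced residue invalidates the whole column, this score is stricter than pair-based measures, and it drops quickly as the number of sequences grows.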
1. Types of Alignment: Presentations & Demos Assignment
Pairwise Alignment: DNA
Pairwise Alignment: Protein
Multiple Sequence Alignment: DNA
Multiple Sequence Alignment: Protein
Accelerated large-scale multiple sequence alignment
Background: Multiple sequence alignment (MSA) is a fundamental analysis method used in bioinformatics and many comparative genomic applications. Prior MSA acceleration attempts with reconfigurable computing have only addressed the first stage of progressive alignment and consequently exhibit performance limitations according to Amdahl's Law. This work is the first known to accelerate the third stage of progressive alignment on reconfigurable hardware.
Results: We reduce subgroups of aligned sequences into discrete profiles before they are pairwise aligned on the accelerator. Using an FPGA accelerator, an overall speedup of up to 150 has been demonstrated on a large data set when compared to a 2.4 GHz Core2 processor.
Conclusions: Our parallel algorithm and architecture accelerate large-scale MSA with reconfigurable computing and allow researchers to solve the larger problems that confront biologists today. Program source is available from http://dna.cs.byu.edu/msa/.
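The Amdahl's Law limitation mentioned above can be made concrete with a small calculation. The sketch below is illustrative only: the function name is hypothetical, and the runtime fractions in the example are assumed values, not measurements from this work.

```python
def amdahl_speedup(accelerated_fraction, stage_speedup):
    """Overall speedup when only part of the runtime is accelerated.

    accelerated_fraction: share of the original runtime spent in the accelerated stage (0..1).
    stage_speedup: factor by which that stage alone becomes faster (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / stage_speedup)


if __name__ == "__main__":
    # Assumed fractions for illustration: even a near-infinite stage speedup
    # cannot push the overall speedup beyond 1 / (1 - accelerated_fraction).
    for fraction in (0.5, 0.9, 0.99):
        print(fraction, amdahl_speedup(fraction, 1e9))
```

The loop shows why accelerating a single stage is not enough: the stages left on the host processor bound the overall gain, however fast the accelerated stage becomes.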
Multiple sequence alignment correction using constraints
Dissertation presented within the European Master in Computational Logics, in partial fulfilment of the requirements for the degree of Master in Computational Logics.
One of the most important fields in bioinformatics has been the study of protein sequence
alignments. The study of homologous proteins, related by evolution, shows
the conservation of many amino acids because of their functional and structural importance.
One particular relationship between amino acid sites, in the same sequence
or between different sequences, is protein coevolution, interest in which has increased
as a consequence of mathematical and computational methods used to understand the
spatial, functional and evolutionary dependencies between amino acid sites. The principle
of coevolution means that some amino acids are related through evolution because
mutations in one site can create evolutionary pressures to select compensatory
mutations in other sites that are functionally or structurally related.
Using current methods for detecting coevolution, specifically mutual information techniques
from the information theory field, we show in this work that much of the information
between coevolved sites is lost because of mistakes in the multiple sequence
alignment of variable regions. Moreover, we show that using these statistical methods
to detect coevolved sites in multiple sequence alignments results in a high rate of false
positives.
Given the number of errors in the detection of coevolved sites from multiple sequence
alignments, we propose in this work a method to improve the detection
of coevolved sites, and we implement an algorithm that fixes such sites by correcting the
misalignment produced at those specific locations.
The detection part of our work is based on the mutual information between sites
that are presumed to have coevolved because of their high statistical correlation score.
With this information we search those regions for possible misalignments caused by
the incorrect matching of amino acids during the alignment.
The re-alignment part is based on constraint programming techniques, to avoid the combinatorial complexity
that arises when one amino acid can be aligned with many others and to avoid inconsistencies in
the alignments.
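As a rough illustration of the detection step, the mutual information between two alignment columns can be estimated from observed symbol frequencies. This is a minimal sketch and not the dissertation's implementation: it assumes the alignment is a list of equal-length strings, treats the gap character as an ordinary symbol, and applies no correction for small sample sizes. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, index):
    """Return the symbols found at one column of the alignment."""
    return [sequence[index] for sequence in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two columns, from empirical frequencies."""
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p(a, b) / (p(a) * p(b)) written with raw counts
        ratio = n_ab * n / (count_a[a] * count_b[b])
        mi += (n_ab / n) * log2(ratio)
    return mi


if __name__ == "__main__":
    alignment = ["AKL", "AKL", "DEL", "DEL"]
    print(mutual_information(column(alignment, 0), column(alignment, 1)))
    print(mutual_information(column(alignment, 0), column(alignment, 2)))
```

In the toy alignment, the first two columns vary together and score high, while the fully conserved third column carries no information about the first. A misaligned residue changes the symbols in a column, so it can lower the score of truly coevolved pairs or raise the score of unrelated ones.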
In this work, we present a framework to impose constraints over the sequences, and
we show how it is possible to compute alignments based on different criteria just by
setting constraints between the amino acids. This framework can be used not only to
improve the alignment and detection of coevolved regions, but also to impose any desired
constraints that express functional or structural relations among the
amino acids in multiple sequences. We also show that after we fix these misalignments
using constraint-based techniques, the correlation between coevolved sites increases
and, in general, the new alignment is closer to the correct alignment than the original
multiple sequence alignment.
Finally, we outline possible lines of future research aimed at overcoming
some drawbacks identified during this work.