Search CORE

77,388 research outputs found

Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

Author: Essam Daryl
Naznin Farhana
Sarker Ruhul
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships.In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0.The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UNSWorks

Self-Organizing Genetic Algorithm for Multiple Sequence Alignment

Author: Dr. Amouda Nizam
Kuppuswami Subburaya
Kuppuswami Subburaya
Publication venue: Global Journals Inc. (US)
Publication date: 04/04/2011
Field of study

Genetic algorithm (GA) used to solve the optimization problem is self-organized and applied to Multiple Sequence Alignment (MSA), an essential process in molecular sequence analysis. This paper presents the first attempt in applying Self-Organizing Genetic Algorithm for MSA. Self-organizing genetic algorithm (SOGA) can be developed with the complete knowledge about the problem and its parameters. In SOGA, values of various parameters are decided based on the problem and fitness value obtained in each generation. The proposed algorithm undergoes a self-organizing crossover operation by selecting an appropriate rate or a point and a self-organizing cyclic mutation for the required number of generations. The advantages of the proposed algorithm are (i) reduce the time requirement for optimizing the parameter values (ii) prevent execution with default values (iii) avoid premature convergence by the cyclic mutation operation. To validate the efficiency, SOGA is applied to MSA, and the resulting alignment is evaluated using the column score (CS). The comparison result shows that the alignment produced by SOGA is better than the widely used tools like Dialign and Multalin. It is also evident that the proposed algorithm can produce optimal or closer-to-optimal alignment compared to tools like ClustalW, Mafft, Dialign and Multalin

Global Journal of Computer Science and Technology (GJCST)

Multiple Sequence Alignment While Assessing Saturation Across Sequence Data

Author: Pearse William D.
Weedop K. Bodie
Publication venue: Hosted by Utah State University Libraries
Publication date: 30/03/2017
Field of study

Constructing and analyzing phylogenetic trees is central to biological disciplines such as evolutionary and systematic biology. Accurate phylogenetic inference improves the estimation of evolutionary relationships, rates of molecular evolution, and Operational Taxonomic Units (OTUs). Careful alignment of sequence data is critical prior to any phylogenetic reconstruction, and there are many different multiple sequence alignment programs that are currently used (reviewed in Edgar & Batzoglou 2006). However, difficulty persists when using alignments to accurately determine actual genetic divergences. A major, yet under-explored, problem is saturation: the repetition of base substitutions at a single site within a sequence. Saturation causes issues because numerous substitutions in sequences within an alignment can erroneously underestimate divergence. Here, we present an algorithm, Splinter, that identifies and accounts for saturation during DNA sequence alignment

DigitalCommons@USU

Aligning Multiple Sequences with Genetic Algorithm

Author: Adebiyi E. F.
Akinyemi I. O.
Fatumo S.
Publication venue
Publication date: 01/06/2009
Field of study

The alignment of biological sequences is a crucial tool in molecular biology and genome analysis. It helps to build a phylogenetic tree of related DNA sequences and also to predict the function and structure of unknown protein sequences by aligning with other sequences whose function and structure is already known. However, finding an optimal multiple sequence alignment takes time and space exponential with the length or number of sequences increases. Genetic Algorithms (GAs) are strategies of random searching that optimize an objective function which is a measure of alignment quality (distance) and has the ability for exploratory search through the solution space and exploitation of current results

Covenant University Repository

A methodology for determining amino-acid substitution matrices from set covers

Author: A. Bahr
A.D. McLachlan
D.F. Feng
G. Vogt
G.H. Gonnet
J. Setubal
J.D. Blake
J.K.M. Rao
M. Gribskov
M.F. Sagot
R.B. Russell
R.E. Green
R.F. Smith
S. Henikoff
S.A. Benner
T. Müller
T.P. Li
W.S.J. Valdar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/04/2005
Field of study

We introduce a new methodology for the determination of amino-acid substitution matrices for use in the alignment of proteins. The new methodology is based on a pre-existing set cover on the set of residues and on the undirected graph that describes residue exchangeability given the set cover. For fixed functional forms indicating how to obtain edge weights from the set cover and, after that, substitution-matrix elements from weighted distances on the graph, the resulting substitution matrix can be checked for performance against some known set of reference alignments and for given gap costs. Finding the appropriate functional forms and gap costs can then be formulated as an optimization problem that seeks to maximize the performance of the substitution matrix on the reference alignment set. We give computational results on the BAliBASE suite using a genetic algorithm for optimization. Our results indicate that it is possible to obtain substitution matrices whose performance is either comparable to or surpasses that of several others, depending on the particular scenario under consideration

arXiv.org e-Print Archive

Crossref

Recommended from our members

Soft topographic map for clustering and classification of bacteria

Author: G.M. Garrity
H. Klock
I.T. Joliffe
J.D. Thompson
J.E. Clarridge III
K. Rose
M. Drancourt
M. Drancourt
M. Drancourt
M. Remm
P. Rice
S. Altschul
S. Dubnov
S. Kumar
S.B. Needleman
S.P. Luttrell
T. Graepel
T. Graepel
T. Hofmann
T. Hofmann
T. Kohonen
T. Kohonen
T.H. Jukes
W.S. Torgerson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database

Central Archive at the University of Reading

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Testing robustness of relative complexity measure method constructing robust phylogenetic trees for Galanthus L. Using the relative complexity measure

Author: Bakis Yasin
Bakış Yasin
Bilgin Nese
Bilgin Neşe
Meydan Cem
Otu Hasan H.
Sezerman Ugur
Sezerman Uğur
Tasci Nivart
Taşçı Nivart
Yuzbasioglu Sirri
Yüzbaşıoğlu Sırrı
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Background: Most phylogeny analysis methods based on molecular sequences use multiple alignment where the quality of the alignment, which is dependent on the alignment parameters, determines the accuracy of the resulting trees. Different parameter combinations chosen for the multiple alignment may result in different phylogenies. A new non-alignment based approach, Relative Complexity Measure (RCM), has been introduced to tackle this problem and proven to work in fungi and mitochondrial DNA. Result: In this work, we present an application of the RCM method to reconstruct robust phylogenetic trees using sequence data for genus Galanthus obtained from different regions in Turkey. Phylogenies have been analyzed using nuclear and chloroplast DNA sequences. Results showed that, the tree obtained from nuclear ribosomal RNA gene sequences was more robust, while the tree obtained from the chloroplast DNA showed a higher degree of variation. Conclusions: Phylogenies generated by Relative Complexity Measure were found to be robust and results of RCM were more reliable than the compared techniques. Particularly, to overcome MSA-based problems, RCM seems to be a reasonable way and a good alternative to MSA-based phylogenetic analysis. We believe our method will become a mainstream phylogeny construction method especially for the highly variable sequence families where the accuracy of the MSA heavily depends on the alignment parameters

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Sabanci University Research Database

The EM Algorithm and the Rise of Computational Biology

Author: Citable Link
Jun S. Liu
Xiaodan Fan
Yuan Yuan
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010
Field of study

In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma"; of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref