Search CORE

60,429 research outputs found

Aligning Multiple Sequences with Genetic Algorithm

Author: Adebiyi E. F.
Akinyemi I. O.
Fatumo S.
Publication venue
Publication date: 01/06/2009
Field of study

The alignment of biological sequences is a crucial tool in molecular biology and genome analysis. It helps to build a phylogenetic tree of related DNA sequences and also to predict the function and structure of unknown protein sequences by aligning with other sequences whose function and structure is already known. However, finding an optimal multiple sequence alignment takes time and space exponential with the length or number of sequences increases. Genetic Algorithms (GAs) are strategies of random searching that optimize an objective function which is a measure of alignment quality (distance) and has the ability for exploratory search through the solution space and exploitation of current results

Covenant University Repository

Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

Author: Essam Daryl
Naznin Farhana
Sarker Ruhul
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships.In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0.The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UNSWorks

MSG: A Gap-Oriented Genetic Algorithm for Multiple Sequence Alignment

Author: Heller Philip
Publication venue: SJSU ScholarWorks
Publication date: 01/01/2009
Field of study

Traditional Multiple Sequence Alignment (MSA) Algorithms are deterministic. Genetic algorithms for protein MSA have been documented. However, these are not able to exceed in all cases the scores obtained by ClustalW, the freely available defacto standard. My solution, called “MSG”, places gaps rather than amino acids. The algorithm is multitribal, uses only a few very simple operators with adaptive frequencies, and jumpstarts one population from the ClustalW solution. Results are reported for 14 data sets, on all of which MSG exceeds the ClustalW score

SJSU ScholarWorks

REDHORSE-REcombination and Double crossover detection in Haploid Organisms using next-geneRation SEquencing data

Author: Beverley Stephen M
Khan Asis
Shaik Jahangheer S
Sibley L. David
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015
Field of study

BACKGROUND: Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents challenges as the alignments of next generation sequence data to a reference genome cannot be directly used as input to existing detection algorithms, which instead typically use multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that makes use of sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable and platform independent suite that provides efficient analysis of genetic crosses based on Next-generation sequencing data. RESULTS: We demonstrated the utility of REDHORSE using simulated data and real Next-generation sequencing data. The simulated dataset mimicked recombination between two known haploid parental strains and allowed comparison of detected break points against known true break points to assess performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate our pipeline. REDHORSE successfully extracted the relevant genetic markers and was able to transform the read alignments from NGS to the genome to generate multiple sequence alignments. Recombination detection algorithm in REDHORSE was able to detect conventional crossovers and double crossovers typically associated with gene conversions whilst filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm that was able to detect double crossovers. CONCLUSION: REDHORSE is an efficient analytical pipeline that serves as a bridge between genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm specifically designed for Next-generation sequencing data. REDHORSE is portable, platform independent Java based utility that provides efficient analysis of genetic crosses based on Next-generation sequencing data. REDHORSE is available at http://redhorse.sourceforge.net/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

A Genetic Algorithm For Multiple Sequence Alignment

Author: J.T. Horng
Publication venue: Asia University
Publication date
Field of study

[[abstract]]Multiple sequence alignment is an important tool in molecular sequence analysis. This paper presents genetic algorithms to solve multiple sequence alignments. Several data sets are tested and the experimental results are compared with other methods. We find our approach could obtain good performance in the data sets with high similarity and long sequences.The software can be found in http://rsdb.csie.ncu.edu.tw/tools/msa.htm

Asia University Repository

Alignment of Multiple DNA Sequences by Using Improved GA Operators

Author: Manish Kumar
Publication venue
Publication date: 11/04/2020
Field of study

ABSTRACT One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). It is a critical tool for biologists to identify the relationships between species and also possibly predict the structure and functionality of biological sequences. The general multiple sequence alignment problem is known to be NP-hard, and hence the problem of finding the best possible multiple sequence alignment is intractable. Therefore, a genetic algorithm based approach has been designed to solve the multiple DNA sequence alignment problem by using different genetic operators. Experimental results with different lengths of DNA sequences has been detailed in this paper . It has also shown that how the increase in length will affect the overall quality of the alignment. The extensive experiment on wide range of datasets and the obtained results has shown the effectiveness of the proposed approach in solving multiple DNA sequences. KEYWORDS: Multiple Sequence Alignment, Genetic Algorithms (GAs), DNA Sequences. INTRODUCTION The main components of the biochemical processes of life are proteins and nucleic acids. There are two types of nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA sequences are long biomolecular strands composed of four types of nucleotide bases: adenine (A), guanine (G), cytosine (C), and thymine (T). DNA actually occurs as a double strand of such bases. The stands are held together by hydrogen bonds between complementary bases: A-T and G-C. DNA sequences, which consist of hundreds of millions of nucleotides, define the genome of a particular species. Recent advances in bioinformatics have generated volumes of genome data for biomedical research. For example, many immunity genes in the fruit fly genome have nucleotide sequences that are reminiscent of TCGGGGATTTC

CiteSeerX

Soft topographic map for clustering and classification of bacteria

Author: G.M. Garrity
H. Klock
I.T. Joliffe
J.D. Thompson
J.E. Clarridge III
K. Rose
M. Drancourt
M. Drancourt
M. Drancourt
M. Remm
P. Rice
S. Altschul
S. Dubnov
S. Kumar
S.B. Needleman
S.P. Luttrell
T. Graepel
T. Graepel
T. Hofmann
T. Hofmann
T. Kohonen
T. Kohonen
T.H. Jukes
W.S. Torgerson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database

Central Archive at the University of Reading

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Palermo