
    MSG: A Gap-Oriented Genetic Algorithm for Multiple Sequence Alignment

    Traditional Multiple Sequence Alignment (MSA) algorithms are deterministic. Genetic algorithms for protein MSA have been documented. However, these do not in all cases exceed the scores obtained by ClustalW, the freely available de facto standard. My solution, called “MSG”, places gaps rather than amino acids. The algorithm is multi-tribal, uses only a few very simple operators with adaptive frequencies, and jumpstarts one population from the ClustalW solution. Results are reported for 14 data sets, on all of which MSG exceeds the ClustalW score.
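
To illustrate what placing gaps rather than amino acids can mean in practice, here is a minimal Python sketch. It assumes that each aligned row is encoded as the set of column indices that hold gaps. The abstract does not describe MSG's actual chromosome layout, so this encoding and the helper names `decode_row` and `decode_alignment` are hypothetical.

```python
def decode_row(seq, gap_cols, width):
    """Expand a gap-position encoding into an aligned row of `width` columns.

    Hypothetical encoding (not taken from the MSG paper): `gap_cols` lists
    the column indices that hold a gap; every other column takes the next
    residue of `seq`, in order.
    """
    gaps = set(gap_cols)
    if any(c < 0 or c >= width for c in gaps):
        raise ValueError("gap column outside the alignment")
    if len(seq) + len(gaps) != width:
        raise ValueError("residues plus gaps must fill the row exactly")
    residues = iter(seq)
    return "".join("-" if col in gaps else next(residues) for col in range(width))


def decode_alignment(seqs, gap_sets, width):
    """Decode one candidate solution: one gap set per input sequence."""
    return [decode_row(s, g, width) for s, g in zip(seqs, gap_sets)]
```

Under this assumed encoding, a mutation operator only needs to move gap indices around. The residues are never edited, so the input sequences cannot be corrupted by the search.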

    A Genetic Algorithm For Multiple Sequence Alignment

    Multiple sequence alignment is an important tool in molecular sequence analysis. This paper presents genetic algorithms for solving multiple sequence alignment problems. Several data sets are tested and the experimental results are compared with other methods. We find that our approach performs well on data sets with high similarity and long sequences. The software can be found at http://rsdb.csie.ncu.edu.tw/tools/msa.htm

    Genomic multiple sequence alignments: refinement using a genetic algorithm

    BACKGROUND: Genomic sequence data cannot be fully appreciated in isolation. Comparative genomics – the practice of comparing genomic sequences from different species – plays an increasingly important role in understanding the genotypic differences between species that result in phenotypic differences as well as in revealing patterns of evolutionary relationships. One of the major challenges in comparative genomics is producing a high-quality alignment between two or more related genomic sequences. In recent years, a number of tools have been developed for aligning large genomic sequences. Most utilize heuristic strategies to identify a series of strong sequence similarities, which are then used as anchors to align the regions between the anchor points. The resulting alignment is globally correct, but in many cases is suboptimal locally. We describe a new program, GenAlignRefine, which improves the overall quality of global multiple alignments by using a genetic algorithm to improve local regions of alignment. Regions of low quality are identified, realigned using the program T-Coffee, and then refined using a genetic algorithm. Because a better COFFEE (Consistency based Objective Function For alignmEnt Evaluation) score generally reflects greater alignment quality, the algorithm searches for an alignment that yields a better COFFEE score. To offset the intrinsic slowness of the genetic algorithm, GenAlignRefine was implemented as a parallel, cluster-based program. RESULTS: We tested the GenAlignRefine algorithm by running it on a Linux cluster to refine sequences from a simulation, as well as to refine a multiple alignment of 15 Orthopoxvirus genomic sequences approximately 260,000 nucleotides in length that initially had been aligned by Multi-LAGAN. It took approximately 150 minutes for a 40-processor Linux cluster to optimize some 200 fuzzy (poorly aligned) regions of the orthopoxvirus alignment. Overall sequence identity increased only slightly; but significantly, this occurred at the same time that the overall alignment length decreased – through the removal of gaps – by approximately 200 gapped regions representing roughly 1,300 gaps. CONCLUSION: We have implemented a genetic algorithm in parallel mode to optimize multiple genomic sequence alignments initially generated by various alignment tools. Benchmarking experiments showed that the refinement algorithm improved genomic sequence alignments within a reasonable period of time.

    Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

    Many bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. In this paper, we propose a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied to the solutions of the initial generation and of each child generation within VDGA. We used two mechanisms to generate an initial population in this research: the first generates guide trees with randomly selected sequences, and the second shuffles the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also with other methods based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBASE 2.0. The experimental results showed that VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to the other divisions considered. The experimental results also confirmed that VDGA outperformed the other methods considered in this research.
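
The divide, solve and recombine steps of a vertical decomposition can be sketched as follows in Python. This is only an illustration under stated assumptions: the cut points (near-equal contiguous chunks) are a guess, and `pad_align` is a trivial placeholder that right-pads with gaps, standing in for the guide tree approach the abstract mentions. All function names are hypothetical.

```python
def split_vertically(seqs, parts):
    """Cut every sequence into `parts` contiguous chunks of near-equal length."""
    blocks = []
    for k in range(parts):
        blocks.append(
            [s[len(s) * k // parts: len(s) * (k + 1) // parts] for s in seqs]
        )
    return blocks


def pad_align(block):
    """Placeholder aligner (assumption): right-pad each chunk with gaps."""
    width = max(len(s) for s in block)
    return [s.ljust(width, "-") for s in block]


def vertical_align(seqs, parts, align=pad_align):
    """Align each vertical block separately, then concatenate the blocks row by row."""
    aligned_blocks = [align(block) for block in split_vertically(seqs, parts)]
    return ["".join(rows) for rows in zip(*aligned_blocks)]
```

Any real aligner can be passed as `align`, provided it returns rows of equal length in the same order as its input.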

    Self-Organizing Genetic Algorithm for Multiple Sequence Alignment

    A self-organizing genetic algorithm (GA) for solving optimization problems is applied to Multiple Sequence Alignment (MSA), an essential process in molecular sequence analysis. This paper presents the first attempt at applying a Self-Organizing Genetic Algorithm to MSA. A self-organizing genetic algorithm (SOGA) can be developed with complete knowledge of the problem and its parameters. In SOGA, the values of various parameters are decided based on the problem and the fitness value obtained in each generation. The proposed algorithm performs a self-organizing crossover operation, selecting an appropriate rate or point, and a self-organizing cyclic mutation for the required number of generations. The advantages of the proposed algorithm are that it (i) reduces the time required to optimize the parameter values, (ii) prevents execution with default values, and (iii) avoids premature convergence through the cyclic mutation operation. To validate its efficiency, SOGA is applied to MSA, and the resulting alignment is evaluated using the column score (CS). The comparison shows that the alignment produced by SOGA is better than those of widely used tools such as Dialign and Multalin. It is also evident that the proposed algorithm can produce optimal or closer-to-optimal alignments compared to tools such as ClustalW, Mafft, Dialign and Multalin.
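
For readers unfamiliar with the column score, the following Python sketch shows one common way to compute it: the fraction of reference columns that reappear, residue for residue, in the test alignment. The abstract does not state which variant the authors used, so the choice of denominator (non-empty reference columns) is an assumption, and the function names are illustrative.

```python
def columns(alignment):
    """Describe each column by the residue index it holds in every sequence.

    A gap is recorded as None, so two columns compare equal only when they
    align exactly the same residues of the same sequences.
    """
    counters = [0] * len(alignment)
    cols = []
    for col in zip(*alignment):
        key = []
        for i, ch in enumerate(col):
            if ch == "-":
                key.append(None)
            else:
                key.append(counters[i])
                counters[i] += 1
        cols.append(tuple(key))
    return cols


def column_score(test, reference):
    """Fraction of non-empty reference columns reproduced in the test alignment."""
    ref_cols = [c for c in columns(reference) if any(x is not None for x in c)]
    test_cols = set(columns(test))
    matched = sum(1 for c in ref_cols if c in test_cols)
    return matched / len(ref_cols)
```

Both alignments must list the same sequences in the same order; the score is 1.0 when every reference column is recovered.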

    Multiple Sequence Alignment While Assessing Saturation Across Sequence Data

    Constructing and analyzing phylogenetic trees is central to biological disciplines such as evolutionary and systematic biology. Accurate phylogenetic inference improves the estimation of evolutionary relationships, rates of molecular evolution, and Operational Taxonomic Units (OTUs). Careful alignment of sequence data is critical prior to any phylogenetic reconstruction, and many different multiple sequence alignment programs are currently in use (reviewed in Edgar & Batzoglou 2006). However, difficulty persists when using alignments to accurately determine actual genetic divergences. A major, yet under-explored, problem is saturation: the repetition of base substitutions at a single site within a sequence. Saturation causes issues because numerous substitutions in sequences within an alignment can lead to divergence being underestimated. Here, we present an algorithm, Splinter, that identifies and accounts for saturation during DNA sequence alignment.

    Multiple Sequence Alignment with Profile Hidden Markov Models

    The human genome consists of various patterns and sequences that are of biological significance. Capturing these patterns can help us in resolving various mysteries related to the genome, like how genomes evolve, how diseases occur due to genetic mutation, how viruses mutate to cause new disease and what is the cure for these diseases. All these applications are covered in the study of bioinformatics. One of the very common tasks in bioinformatics involves simultaneous alignment of a number of biological sequences. In bioinformatics, this is widely known as Multiple Sequence Alignment. Multiple sequence alignments help in grouping together organisms with the same evolutionary history. They also help in learning properties of a new sequence by aligning it with previously studied homologous sequences. This project covers a probabilistic modeling method to perform multiple sequence alignment (MSA). Use of hidden Markov models in MSA significantly improves computational speed, especially for sequences that contain overlapping regions. We used the Baum-Welch expectation maximization algorithm to train hidden Markov models and the Viterbi algorithm to align the sequences. Our results are comparable to the ones obtained by publicly available packages like ClustalW and Clustal Omega.
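
As a reminder of what the Viterbi step does, here is a minimal Python sketch of the algorithm in log space for a plain hidden Markov model. It is not the project's code and not a full profile HMM (there is no match/insert/delete topology); the two states and every probability in the toy model are invented for illustration.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for `obs` and its log-probability."""
    if not obs:
        raise ValueError("observation sequence is empty")
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor of state s at time t.
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = best[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):  # follow back-pointers
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Toy model (assumed values): "M" favours A, "I" emits uniformly.
STATES = ["M", "I"]
LOG_START = logs({"M": 0.9, "I": 0.1})
LOG_TRANS = {"M": logs({"M": 0.8, "I": 0.2}), "I": logs({"M": 0.5, "I": 0.5})}
LOG_EMIT = {
    "M": logs({"A": 0.91, "C": 0.03, "G": 0.03, "T": 0.03}),
    "I": logs({"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}),
}

path, log_prob = viterbi("AAC", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
```

Working in log space avoids numerical underflow on long sequences, which matters once the observations are hundreds of residues long.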

    FFAS server: novel features and applications.

    The Fold and Function Assignment System (FFAS) server [Jaroszewski et al. (2005) FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Research, 33, W284-W288] implements the algorithm for protein profile-profile alignment introduced originally in [Rychlewski et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science: a Publication of the Protein Society, 9, 232-241]. Here, we present updates, changes and novel functionality added to the server since 2005 and discuss its new applications. The sequence database used to calculate sequence profiles was enriched by adding sets of publicly available metagenomic sequences. The profile of a user's protein can now be compared with ∼20 additional profile databases, including several complete proteomes, human proteins involved in genetic diseases and a database of microbial virulence factors. A newly developed interface uses a system of tabs, allowing the user to navigate multiple results pages, and also includes novel functionality, such as a dotplot graph viewer, modeling tools, an improved 3D alignment viewer and links to the database of structural similarities. The FFAS server was also optimized for speed: running times were reduced by an order of magnitude. The FFAS server, http://ffas.godziklab.org, has no log-in requirement, albeit there is an option to register and store results in individual, password-protected directories. Source code and Linux executables for the FFAS program are available for download from the FFAS server

    Solving multiple sequence alignment problems by using a swarm intelligent optimization based approach

    In this article, the alignment of multiple sequences is examined through an improved particle swarm optimization (PSO), a swarm-intelligence-based approach. PSO is a nature-inspired random heuristic technique, based on intelligence and swarm movement, for solving discrete optimization problems and realistic estimation. Each solution is encoded as a “chromosome”, as in the genetic algorithm (GA). Based on the optimization of the objective function, the fitness function is designed to maximize the suitable components of the sequence and reduce the unsuitable components. The public benchmark data set BAliBASE is used to assess the performance of the proposed system, with the potential for PSO to reveal problems in adapting to better performance. The proposed system is compared with a few existing approaches, such as DIALIGN, PILEUP8, hidden Markov model training (HMMT), rubber band technique-genetic algorithm (RBT-GA) and ML-PIMA. In many cases, the proposed system gives better experimental results than the other existing approaches.

    Alignment of Multiple DNA Sequences by Using Improved GA Operators

    ABSTRACT One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). It is a critical tool for biologists to identify the relationships between species and also possibly predict the structure and functionality of biological sequences. The general multiple sequence alignment problem is known to be NP-hard, and hence the problem of finding the best possible multiple sequence alignment is intractable. Therefore, a genetic algorithm based approach has been designed to solve the multiple DNA sequence alignment problem by using different genetic operators. Experimental results with DNA sequences of different lengths are detailed in this paper. It is also shown how an increase in length affects the overall quality of the alignment. Extensive experiments on a wide range of datasets show the effectiveness of the proposed approach in solving the multiple DNA sequence alignment problem. KEYWORDS: Multiple Sequence Alignment, Genetic Algorithms (GAs), DNA Sequences. INTRODUCTION The main components of the biochemical processes of life are proteins and nucleic acids. There are two types of nucleic acids, deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). DNA sequences are long biomolecular strands composed of four types of nucleotide bases: adenine (A), guanine (G), cytosine (C), and thymine (T). DNA actually occurs as a double strand of such bases. The strands are held together by hydrogen bonds between complementary bases: A-T and G-C. DNA sequences, which consist of hundreds of millions of nucleotides, define the genome of a particular species. Recent advances in bioinformatics have generated volumes of genome data for biomedical research. For example, many immunity genes in the fruit fly genome have nucleotide sequences that are reminiscent of TCGGGGATTTC