33,333 research outputs found

    Higher accuracy protein Multiple Sequence Alignment by Stochastic Algorithm

    Get PDF
    Multiple Sequence Alignment gives insight into evolutionary, structural and functional relationships among the proteins. Here, a novel Protein Alignment by Stochastic Algorithm (PASA) is developed. Evolutionary operators of a genetic algorithm, namely, mutation and selection are utilized in combining the output of two most important sequence alignment programs and then developing an optimized new algorithm. Efficiency of protein alignments is evaluated in terms of Total Column score which is equal to the number of correctly aligned columns between a test alignment and the reference alignment divided by the total number of columns in the reference alignment. The PASA optimizer achieves, on an average, significant better alignment over the well known individual bioinformatics tools. This PASA is statistically the most accurate protein alignment method today. It can have potential applications in drug discovery processes in the biotechnology industry

    MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

    Get PDF
    Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods

    The EM Algorithm and the Rise of Computational Biology

    Get PDF
    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma"; of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Detection of recombination in DNA multiple alignments with hidden markov models

    Get PDF
    CConventional phylogenetic tree estimation methods assume that all sites in a DNA multiple alignment have the same evolutionary history. This assumption is violated in data sets from certain bacteria and viruses due to recombination, a process that leads to the creation of mosaic sequences from different strains and, if undetected, causes systematic errors in phylogenetic tree estimation. In the current work, a hidden Markov model (HMM) is employed to detect recombination events in multiple alignments of DNA sequences. The emission probabilities in a given state are determined by the branching order (topology) and the branch lengths of the respective phylogenetic tree, while the transition probabilities depend on the global recombination probability. The present study improves on an earlier heuristic parameter optimization scheme and shows how the branch lengths and the recombination probability can be optimized in a maximum likelihood sense by applying the expectation maximization (EM) algorithm. The novel algorithm is tested on a synthetic benchmark problem and is found to clearly outperform the earlier heuristic approach. The paper concludes with an application of this scheme to a DNA sequence alignment of the argF gene from four Neisseria strains, where a likely recombination event is clearly detected
    • …
    corecore