
    An Algorithm to Find All Identical Motifs in Multiple Biological Sequences

    Sequence motifs are of great biological importance in nucleotide and protein sequences. The conserved occurrence of identical motifs indicates functional significance and helps to classify biological sequences. In this paper, a new algorithm is proposed to find all identical motifs in multiple nucleotide or protein sequences. The proposed algorithm uses the concept of dynamic programming. Applications of this algorithm include the identification of (a) conserved identical sequence motifs and (b) identical or direct repeat sequence motifs across multiple biological sequences (nucleotide or protein sequences). Further, the proposed algorithm facilitates the comparative analysis of internal sequence repeats for evolutionary studies, which helps to derive phylogenetic relationships from the distribution of repeats. © 2010 Springer-Verlag
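    The abstract above does not spell out the algorithm itself, so the following is only a minimal sketch of one way dynamic programming can locate identical motifs, not the authors' method. It assumes a motif is any exact substring of at least `min_len` characters that occurs in every input sequence; the function names (`shared_substrings`, `positions`, `identical_motifs`) and the `min_len` parameter are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Set


def shared_substrings(a: str, b: str, min_len: int) -> Set[str]:
    """Return every substring of length >= min_len present in both a and b.

    Classic longest-common-suffix DP: after processing a[:i] and b[:j],
    curr[j] is the length of the longest suffix they have in common.
    """
    found: Set[str] = set()
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # Every suffix of this match that is long enough is shared.
                for k in range(min_len, curr[j] + 1):
                    found.add(a[i - k:i])
        prev = curr
    return found


def positions(seq: str, motif: str) -> List[int]:
    """Return all (possibly overlapping) 0-based start positions of motif."""
    hits: List[int] = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def identical_motifs(sequences: List[str], min_len: int = 3) -> Dict[str, List[List[int]]]:
    """Map each motif shared by all sequences to its positions per sequence."""
    if len(sequences) < 2:
        return {}
    # Seed the candidates from the first pair, then filter by the rest.
    candidates = shared_substrings(sequences[0], sequences[1], min_len)
    for seq in sequences[2:]:
        candidates = {m for m in candidates if m in seq}
    return {m: [positions(seq, m) for seq in sequences] for m in sorted(candidates)}
```

    For example, `identical_motifs(["ACGTACGT", "TTACGTT", "GACGTA"], min_len=4)` reports `ACGT` as the only motif shared by all three sequences, with two start positions in the first sequence, which also makes it a direct repeat there. The table costs time proportional to the product of the two sequence lengths, so this sketch suits short sequences only.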

    The EM Algorithm and the Rise of Computational Biology

    In the past decade computational biology has grown from a cottage industry with a handful of researchers to an attractive interdisciplinary field, catching the attention and imagination of many quantitatively-minded scientists. Of interest to us is the key role played by the EM algorithm during this transformation. We survey the use of the EM algorithm in a few important computational biology problems surrounding the "central dogma" of molecular biology: from DNA to RNA and then to proteins. Topics of this article include sequence motif discovery, protein sequence alignment, population genetics, evolutionary models and mRNA expression microarray data analysis.
    Comment: Published at http://dx.doi.org/10.1214/09-STS312 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    A Bioinformatics Approach for Detecting Repetitive Nested Motifs using Pattern Matching

    The identification of nested motifs in genomic sequences is a complex computational problem. Detecting these patterns is important because it allows the discovery of transposable element (TE) insertions, incomplete reverse transcripts, deletions, and/or mutations. Here, we designed a de novo strategy for detecting patterns that represent nested motifs, based on exhaustive searches for pairs of motifs and combinatorial pattern analysis. These patterns can be grouped into three categories: motifs within other motifs, motifs flanked by other motifs, and motifs of large size. Our methodology, applied to genomic sequences from the plant species Aegilops tauschii and Oryza sativa, revealed that it is possible to find putative nested TEs by detecting these three types of patterns. The results were validated through BLAST alignments, which revealed the efficacy and usefulness of the new method, which we call Mamushka.
    Affiliation: Romero, José Rodolfo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Affiliation: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Cs. E Ingeniería de la Computacion; Argentina
    Affiliation: Garbus, Ingrid. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Affiliation: Echenique, Carmen Viviana. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Cs. E Ingeniería de la Computacion; Argentin

    MISSEL: a method to identify a large number of small species-specific genomic subsequences and its application to viruses classification

    Continuous improvements in next generation sequencing technologies led to ever-increasing collections of genomic sequences, which have not been easily characterized by biologists, and whose analysis requires huge computational effort. The classification of species emerged as one of the main applications of DNA analysis and has been addressed with several approaches, e.g., multiple alignments-, phylogenetic trees-, statistical- and character-based methods