32,277 research outputs found
An overview of the role of context-sensitive HMMs in the prediction of ncRNA genes
Non-coding RNAs (ncRNA) are RNA molecules that function in the cells without being translated into proteins. In recent years, much evidence has been found that ncRNAs play a crucial role in various biological processes. As a result, there has been an increasing interest in the prediction of ncRNA genes. Due to the conserved secondary structure in ncRNAs, there exist pairwise dependencies between distant bases. These dependencies cannot be effectively modeled using traditional HMMs, and we need a more complex model such as the context-sensitive HMM (csHMM). In this paper, we overview the role of csHMMs in the RNA secondary structure analysis and the prediction of ncRNA genes. It is demonstrated that the context-sensitive HMMs can serve as an efficient framework for these purposes
Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome
The central dogma of molecular biology states that the genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in the cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. In the meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare.
However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17].
The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few. These days, it is even claimed that ncRNAs dominate the genomic output of the higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key for the best kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47]
The EM Algorithm and the Rise of Computational Biology
In the past decade computational biology has grown from a cottage industry
with a handful of researchers to an attractive interdisciplinary field,
catching the attention and imagination of many quantitatively-minded
scientists. Of interest to us is the key role played by the EM algorithm during
this transformation. We survey the use of the EM algorithm in a few important
computational biology problems surrounding the "central dogma"; of molecular
biology: from DNA to RNA and then to proteins. Topics of this article include
sequence motif discovery, protein sequence alignment, population genetics,
evolutionary models and mRNA expression microarray data analysis.Comment: Published in at http://dx.doi.org/10.1214/09-STS312 the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Regulatory motif discovery using a population clustering evolutionary algorithm
This paper describes a novel evolutionary algorithm for regulatory motif discovery in DNA promoter sequences. The algorithm uses data clustering to logically distribute the evolving population across the search space. Mating then takes place within local regions of the population, promoting overall solution diversity and encouraging discovery of multiple solutions. Experiments using synthetic data sets have demonstrated the algorithm's capacity to find position frequency matrix models of known regulatory motifs in relatively long promoter sequences. These experiments have also shown the algorithm's ability to maintain diversity during search and discover multiple motifs within a single population. The utility of the algorithm for discovering motifs in real biological data is demonstrated by its ability to find meaningful motifs within muscle-specific regulatory sequences
Structural Alignment of RNAs Using Profile-csHMMs and Its Application to RNA Homology Search: Overview and New Results
Systematic research on noncoding RNAs (ncRNAs) has revealed that many ncRNAs are actively involved in various biological networks. Therefore, in order to fully understand the mechanisms of these networks, it is crucial to understand the roles of ncRNAs. Unfortunately, the annotation of ncRNA genes that give rise to functional RNA molecules has begun only recently, and it is far from being complete. Considering the huge amount of genome sequence data, we need efficient computational methods for finding ncRNA genes. One effective way of finding ncRNA genes is to look for regions that are similar to known ncRNA genes. As many ncRNAs have well-conserved secondary structures, we need statistical models that can represent such structures for this purpose. In this paper, we propose a new method for representing RNA sequence profiles and finding structural alignment of RNAs based on profile context-sensitive hidden Markov models (profile-csHMMs). Unlike existing models, the proposed approach can handle any kind of RNA secondary structures, including pseudoknots. We show that profile-csHMMs can provide an effective framework for the computational analysis of RNAs and the identification of ncRNA genes
Computational Identification of Four Spliceosomal snRNAs from the Deep-Branching Eukaryote Giardia intestinalis
Funding: Marsden Fund New Zealand Allan Wilson Centre The funders had no role in study design, data collection and analysis, decision to publish, or
preparation of the manuscript.RNAs processing other RNAs is very general in eukaryotes, but is not clear to what extent it is ancestral to eukaryotes. Here
we focus on pre-mRNA splicing, one of the most important RNA-processing mechanisms in eukaryotes. In most eukaryotes
splicing is predominantly catalysed by the major spliceosome complex, which consists of five uridine-rich small nuclear
RNAs (U-snRNAs) and over 200 proteins in humans. Three major spliceosomal introns have been found experimentally in
Giardia; one Giardia U-snRNA (U5) and a number of spliceosomal proteins have also been identified. However, because of
the low sequence similarity between the Giardia ncRNAs and those of other eukaryotes, the other U-snRNAs of Giardia had
not been found. Using two computational methods, candidates for Giardia U1, U2, U4 and U6 snRNAs were identified in this
study and shown by RT-PCR to be expressed. We found that identifying a U2 candidate helped identify U6 and U4 based on
interactions between them. Secondary structural modelling of the Giardia U-snRNA candidates revealed typical features of
eukaryotic U-snRNAs. We demonstrate a successful approach to combine computational and experimental methods to
identify expected ncRNAs in a highly divergent protist genome. Our findings reinforce the conclusion that spliceosomal
small-nuclear RNAs existed in the last common ancestor of eukaryotes
The Echinococcus canadensis (G7) genome: A key knowledge of parasitic platyhelminth human diseases
Background: The parasite Echinococcus canadensis (G7) (phylum Platyhelminthes, class Cestoda) is one of the causative agents of echinococcosis. Echinococcosis is a worldwide chronic zoonosis affecting humans as well as domestic and wild mammals, which has been reported as a prioritized neglected disease by the World Health Organisation. No genomic data, comparative genomic analyses or efficient therapeutic and diagnostic tools are available for this severe disease. The information presented in this study will help to understand the peculiar biological characters and to design species-specific control tools. Results: We sequenced, assembled and annotated the 115-Mb genome of E. canadensis (G7). Comparative genomic analyses using whole genome data of three Echinococcus species not only confirmed the status of E. canadensis (G7) as a separate species but also demonstrated a high nucleotide sequences divergence in relation to E. granulosus (G1). The E. canadensis (G7) genome contains 11,449 genes with a core set of 881 orthologs shared among five cestode species. Comparative genomics revealed that there are more single nucleotide polymorphisms (SNPs) between E. canadensis (G7) and E. granulosus (G1) than between E. canadensis (G7) and E. multilocularis. This result was unexpected since E. canadensis (G7) and E. granulosus (G1) were considered to belong to the species complex E. granulosus sensu lato. We described SNPs in known drug targets and metabolism genes in the E. canadensis (G7) genome. Regarding gene regulation, we analysed three particular features: CpG island distribution along the three Echinococcus genomes, DNA methylation system and small RNA pathway. The results suggest the occurrence of yet unknown gene regulation mechanisms in Echinococcus. Conclusions: This is the first work that addresses Echinococcus comparative genomics. The resources presented here will promote the study of mechanisms of parasite development as well as new tools for drug discovery. The availability of a high-quality genome assembly is critical for fully exploring the biology of a pathogenic organism. The E. canadensis (G7) genome presented in this study provides a unique opportunity to address the genetic diversity among the genus Echinococcus and its particular developmental features. At present, there is no unequivocal taxonomic classification of Echinococcus species; however, the genome-wide SNPs analysis performed here revealed the phylogenetic distance among these three Echinococcus species. Additional cestode genomes need to be sequenced to be able to resolve their phylogeny.Fil: Maldonado, Lucas Luciano. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Oficina de CoordinaciĂłn Administrativa Houssay. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica; ArgentinaFil: Assis, Juliana. FundaciĂłn Oswaldo Cruz; BrasilFil: Gomes AraĂşjo, Flávio M.. FundaciĂłn Oswaldo Cruz; BrasilFil: Salim, Anna C. M.. FundaciĂłn Oswaldo Cruz; BrasilFil: Macchiaroli, Natalia. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Oficina de CoordinaciĂłn Administrativa Houssay. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica; ArgentinaFil: Cucher, Marcela Alejandra. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Oficina de CoordinaciĂłn Administrativa Houssay. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica; ArgentinaFil: Camicia, Federico. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Oficina de CoordinaciĂłn Administrativa Houssay. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica; ArgentinaFil: Fox, Adolfo. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Oficina de CoordinaciĂłn Administrativa Houssay. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica; ArgentinaFil: Rosenzvit, Mara Cecilia. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Oficina de CoordinaciĂłn Administrativa Houssay. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica; ArgentinaFil: Oliveira, Guilherme. Instituto TecnolĂłgico Vale; Brasil. FundaciĂłn Oswaldo Cruz; BrasilFil: Kamenetzky, Laura. Consejo Nacional de Investigaciones CientĂficas y TĂ©cnicas. Oficina de CoordinaciĂłn Administrativa Houssay. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Investigaciones en MicrobiologĂa y ParasitologĂa MĂ©dica; Argentin
- …