12 research outputs found

    RDMAS: a web server for RNA deleterious mutation analysis

    BACKGROUND: The diverse functions of ncRNAs critically depend on their structures. Mutations in ncRNAs that disrupt the structures of functional sites are expected to be deleterious. Deleterious RNA mutations have attracted wide attention because some of them cause serious disease in cells, and others influence fitness in microbes. RESULTS: The RDMAS web server we describe here is an online tool for evaluating the structural deleteriousness of single-nucleotide mutations in RNA genes. Several structure comparison methods have been integrated; predicted sub-optimal structures can optionally be included to mitigate the uncertainty of secondary structure prediction. With a user-friendly interface, the web application is easy to use. Intuitive illustrations are provided along with the original computational results to facilitate quick analysis. CONCLUSION: RDMAS can be used to explore the structural alterations that make mutations pathogenic, and to predict deleterious mutations, which may help to determine functionally critical regions. RDMAS is freely accessed via
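As a rough illustration of what comparing a wild-type and a mutant secondary structure can look like, the sketch below computes a base-pair distance between two dot-bracket strings. This is an assumption made for illustration: the abstract does not say which comparison methods RDMAS integrates, and the function names and example structures are hypothetical.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)                 # remember the opening position
        elif ch == ")":
            pairs.add((stack.pop(), i))     # close the most recent open base
    return pairs


def base_pair_distance(wild_type, mutant):
    """Count base pairs present in exactly one of the two structures."""
    return len(base_pairs(wild_type) ^ base_pairs(mutant))


# Hypothetical structures: the mutant loses the innermost pair of the stem.
wild_type = "((((...))))"
mutant = "(((.....)))"
distance = base_pair_distance(wild_type, mutant)
```

A larger distance suggests a stronger structural change. A real analysis would obtain both structures from a folding program and could repeat the comparison over sub-optimal structures.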

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future

    Graph kernels based on tree patterns for molecules

    Motivated by chemical applications, we revisit and extend a family of positive definite kernels for graphs based on the detection of common subtrees, initially proposed by Ramon et al. (2003). We propose new kernels with a parameter to control the complexity of the subtrees used as features to represent the graphs. This parameter allows smooth interpolation between classical graph kernels based on the count of common walks, on the one hand, and kernels that emphasize the detection of large common subtrees, on the other hand. We also propose two modular extensions to this formulation. The first extension increases the number of subtrees that define the feature space, and the second one removes noisy features from the graph representations. We validate these new kernels experimentally on binary classification tasks that consist of discriminating toxic from non-toxic molecules with support vector machines
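To make the "count of common walks" end of that interpolation concrete, here is a minimal sketch of a walk-count kernel between two vertex-labeled graphs. It is not the tree-pattern kernel from the paper. The function name, the adjacency-list input format, and the toy molecules are assumptions made for illustration.

```python
from itertools import product


def common_walk_kernel(labels1, adj1, labels2, adj2, length):
    """Count pairs of walks with `length` edges whose vertex labels match.

    labels*: list of vertex labels; adj*: list of neighbor-index lists.
    """
    # Vertices of the direct product graph: vertex pairs with equal labels.
    pairs = [
        (u, v)
        for u, v in product(range(len(labels1)), range(len(labels2)))
        if labels1[u] == labels2[v]
    ]
    # counts[(u, v)] = number of matching walk pairs that start at (u, v).
    counts = {p: 1 for p in pairs}
    for _ in range(length):
        counts = {
            (u, v): sum(counts.get((a, b), 0) for a in adj1[u] for b in adj2[v])
            for (u, v) in pairs
        }
    return sum(counts.values())


# Toy molecular graphs (hypothetical): a C-C-O chain and a C-O fragment.
chain = (["C", "C", "O"], [[1], [0, 2], [1]])
fragment = (["C", "O"], [[1], [0]])
k1 = common_walk_kernel(*chain, *fragment, length=1)
```

Here `k1` counts the matched walks C-O and O-C. Replacing walks with branching tree patterns would move toward the subtree-based kernels the abstract describes.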

    Computational prediction, experiment design and statistical validations of non-coding regulatory RNA

    Non-coding regulatory RNAs (ncRNAs) regulate a host of gene functions in prokaryotes, e.g., transcription and translation regulation, RNA processing and modification, and mRNA stability. Some ncRNAs have been identified experimentally, but many are yet to be found. ncRNAs can be classified as either cis- or trans-acting. cis-ncRNAs perfectly complement their target genes and are usually encoded on the anti-sense strands of the targets. In contrast, trans-ncRNAs regulate their target genes through short and often imperfect base-pairings with the targets, and are usually encoded elsewhere on the genome. A whole-genome thermodynamic analysis can be performed to identify all imperfect but stable base-pairings between all annotated genes and some genomic regions encoding ncRNAs from the same species. However, these base-pairing regions are short and variable in size, and their melting temperatures vary greatly between perfectly and imperfectly matched targets. It is difficult to predict trans-acting ncRNAs based solely on the thermodynamic analysis. Therefore, we also have to consider known ncRNA structures to improve our predictions. We find that Hfq-binding ncRNAs, which require Hfq protein to function, share three common structural properties. We predict these special ncRNAs in E. coli and Agrobacterium tumefaciens according to a systematic, novel 5-step approach based on thermodynamic analyses as well as known structural properties of this class of ncRNAs. Whole-genome tiling microarrays are chosen to validate our predictions. We describe how the microarrays have been designed, created, and validated for E. coli MG1655 and Agrobacterium tumefaciens C58. We match our new ncRNA prediction results with known ncRNAs, calculate correlation coefficient values between each ncRNA candidate and its predicted targets as measured by the whole-genome tiling microarrays, and confirm the results with 3 other ncRNA identification software tools. We also perform a gene ontology network analysis to reveal the associations of ncRNA candidates and their predicted targets. Our novel 5-step prediction method is generally applicable to other prokaryote species and may help advance ncRNA research in prokaryotes

    A Predictive Model Which Uses Descriptors of RNA Secondary Structures Derived from Graph Theory.

    The secondary structures of ribonucleic acid (RNA) have been successfully modeled with graph-theoretic structures. Often, simple graphs are used to represent secondary RNA structures; however, in this research, a multigraph representation of RNA is used, in which vertices represent stems and edges represent the internal motifs. Any type of RNA secondary structure may be represented by a graph in this manner. We define novel graphical invariants to quantify the multigraphs and obtain characteristic descriptors of the secondary structures. These descriptors are used to train an artificial neural network (ANN) to recognize the characteristics of secondary RNA structure. Using the ANN, we classify the multigraphs as either RNA-like or not RNA-like. This classification method produced results similar to other classification methods. Given the expanding library of secondary RNA motifs, this method may provide a tool to help identify new structures and to guide the rational design of RNA molecules
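As a small illustration of the vertex side of such a representation, the sketch below extracts stems (runs of stacked base pairs) from a dot-bracket string; each stem would become one vertex. It does not build the edges or compute the paper's invariants, and the function name and input format are assumptions made for illustration.

```python
def stems(dot_bracket):
    """Group the base pairs of a dot-bracket string into stems.

    A stem is a maximal run of stacked pairs (i, j), (i+1, j-1), ...
    """
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    pairs.sort()  # order by opening position so stacked pairs are adjacent
    result = []
    for i, j in pairs:
        if result and result[-1][-1] == (i - 1, j + 1):
            result[-1].append((i, j))   # extends the current stem
        else:
            result.append([(i, j)])     # starts a new stem (new vertex)
    return result


# Hypothetical structure with two hairpins, giving two stem vertices.
two_hairpins = stems("((..))..((...))")
```

A bulge or interior loop interrupts stacking, so `stems("(((...)).)")` yields two stems rather than one.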

    Graphical methods in RNA structure matching

    Eukaryotic genomes are pervasively transcribed; almost every base can be found in an RNA transcript. This is a surprising observation since most of the genome does not encode proteins. This RNA must serve an important regulatory function – important because producing non-coding RNA is an energy intensive process, and in the absence of strong selection one would expect it to disappear. RNA families with common functions have specifically conserved structural motifs, which are directly related to the functional roles of RNA in catalysis and regulation. Because the conserved structures depend on base-pairing, similar RNA structures may have little or no detectable sequence similarity, making the identification of conserved RNAs difficult. This is a particularly serious problem when studying regulatory structures in RNA. In many cases, such as that of cellular internal ribosome entry sites, although we can identify RNAs that have similar regulatory responses, it is difficult to tell whether the RNAs have common structural features using current methods. Available tools for identifying common structures based on RNA sequence suffer from one or more of the following problems: they do not consider pseudoknots, which are important in many catalytic and regulatory structures; they do not consider near minimum free energy structures, which is important as many RNAs exist as an ensemble of structures of nearly equal energy; they require many examples of known structures in order to train a computational model; they require impractical amounts of computational time, precluding their use on long sequences or genomic scale; or they use a similarity function that cannot identify RNAs as having similar structure, even when they are from one of the well characterized known classes. The approach presented here has the potential to address all of these issues, allowing novel RNA structures that are shared between RNAs with little or no sequence similarity to be discovered. This provides a powerful tool to investigate and explain the pervasive transcription observed in eukaryotic genomes

    DNA Chemical Reaction Network Design Synthesis and Compilation

    The advantages of biomolecular computing include 1) the ability to interface with, monitor, and intelligently protect and maintain the functionality of living systems, 2) the ability to create computational devices with minimal energy needs and hazardous waste production during manufacture and lifecycle, 3) the ability to store large amounts of information for extremely long time periods, and 4) the ability to create computation analogous to human brain function. Biomolecular computing is at a watershed moment in its evolution toward realizing these advantages over electronics. Computing with entire molecules presents different challenges and requirements than computing just with electric charge. These challenges have led to ad-hoc design and programming methods with high development costs and limited device performance. At present, building a device requires complete immersion in low-level detail. We address these shortcomings by creating a systems engineering process for building and programming DNA-based computing devices. Contributions of this thesis include numeric abstractions for nucleic acid sequence and secondary structure, and a set of algorithms which employ these abstractions. The abstractions and algorithms have been implemented in three artifacts: DNADL, a design description language; Pyxis, a molecular compiler and design toolset; and KCA, a simulation of DNA kinetics using a cellular automaton discretization. Our methods are applicable to other DNA nanotechnology constructions and may serve in the development of a full DNA computing model