74 research outputs found

    A Review on Recent Computational Methods for Predicting Noncoding RNAs

    Noncoding RNAs (ncRNAs) play important roles in various cellular activities and diseases. In this paper, we present a comprehensive review of computational methods for ncRNA prediction, which are generally grouped into four categories: (1) homology-based methods, that is, comparative methods involving evolutionarily conserved RNA sequences and structures; (2) de novo methods using RNA sequence and structure features; (3) transcriptional sequencing and assembly based methods, that is, methods designed for single- and paired-end reads generated from next-generation RNA sequencing; and (4) RNA family specific methods, for example, methods specific to microRNAs and long noncoding RNAs. We then summarize the advantages and limitations of these methods and point out a few possible future directions for ncRNA prediction. In conclusion, many computational methods have been demonstrated to be effective in predicting ncRNAs for further experimental validation. They are critical in reducing the huge number of potential ncRNAs and pointing the community to high-confidence candidates. In the future, highly efficient mapping technology, more intrinsic sequence features (e.g., motif and k-mer frequencies), and structure features (e.g., minimum free energy, conserved stem-loops, or graph structures) are suggested to be combined with next- and third-generation sequencing platforms to improve ncRNA prediction.
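
    The sequence features mentioned in this abstract can be illustrated with a minimal sketch of k-mer frequency extraction. The function name `kmer_frequencies`, the default `k=3`, and the choice to skip k-mers containing ambiguous bases are assumptions made for illustration; the review does not prescribe a particular implementation.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the relative frequency of every possible k-mer in an RNA sequence.

    Hypothetical helper: DNA input is converted to RNA (T -> U), and k-mers
    containing characters outside the alphabet (e.g. N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Normalise only over k-mers made of valid bases.
    total = sum(counts[km] for km in all_kmers)
    return {km: (counts[km] / total if total else 0.0) for km in all_kmers}
```

    The resulting fixed-length vector (4^k entries) is the kind of input a de novo classifier could consume alongside structure features such as minimum free energy.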

    RNA syntax and semantics: investigating the transcriptome complexity

    The basic idea of this thesis is to reconstruct a heterogeneous network depicting lncRNA-protein interactions that summarizes what is currently known, allows the prediction of missing features, and thus gives a mechanistic understanding of lncRNA functions through topological analysis of the network. This approach raised several problems. Firstly, even though recent studies show that a growing number of lncRNAs play critical roles in complex cellular processes and are implicated in a wide range of human diseases, the fraction of annotated lncRNAs is still small. Secondly, most databases are currently highly inhomogeneous in the type of information they provide, and analytical and experimental approaches to investigating them have been hampered by the lack of comprehensive annotation. Thirdly, the standard bioinformatics solution for filling the gaps left by missing information relies on machine learning techniques, which usually raise many problems related to data preprocessing and the input dataset format, both of which are often handled by trial and error. Finally, data visualization is a challenging problem in this domain. A common strategy is to construct interaction networks, whose analytical and visual inspection can offer important biological insights. The primary difficulty with this strategy is developing an efficient and scalable algorithm that produces easily interpretable layouts for sparse graphs when the number of nodes is very large. The thesis takes a multidisciplinary approach to unravel the complexity of lncRNA regulatory networks and investigate their functions. The objective is to demonstrate the feasibility of using machine learning techniques as well as network analysis to find hidden patterns in the data and to predict new features.

    Predicting Parkinson's Disease Genes Based on Node2vec and Autoencoder

    Identifying genes associated with Parkinson's disease plays an extremely important role in the diagnosis and treatment of Parkinson's disease. In recent years, based on the guilt-by-association hypothesis, many methods have been proposed to predict disease-related genes, but few of these methods are designed or used for Parkinson's disease gene prediction. In this paper, we propose a novel prediction method for Parkinson's disease gene prediction, named N2A-SVM. N2A-SVM includes three parts: extracting features of genes based on a network, reducing the dimension using a deep neural network, and predicting Parkinson's disease genes using a machine learning method. The evaluation test shows that N2A-SVM performs better than existing methods. Furthermore, we evaluate the significance of each step in the N2A-SVM algorithm and the influence of the hyper-parameters on the result. In addition, we train N2A-SVM on a recent dataset and use it to predict Parkinson's disease genes. The top-ranked predicted genes can be verified through literature study.
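
    The first of the three parts, network-based feature extraction, is named in the title as Node2vec. The sketch below shows only the biased second-order random walk that node2vec is generally built on, under the assumption of an unweighted graph stored as a dict of neighbour sets. The function name `node2vec_walk` and the defaults for the return parameter `p` and in-out parameter `q` are illustrative; the skip-gram embedding, the autoencoder, and the SVM stages of N2A-SVM are not shown, and this is not the authors' code.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Generate one biased random walk of up to `length` nodes.

    adj: dict mapping each node to a set of neighbours (unweighted, undirected).
    p:   return parameter (small p makes stepping back to the previous node likely).
    q:   in-out parameter (small q favours moving away from the previous node).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # dead end: isolated node
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif x in adj[prev]:
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move to distance 2 from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

    Many such walks per gene would then be fed to a skip-gram model to obtain the gene feature vectors that the later stages reduce and classify.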

    From RNA folding to inverse folding: a computational study: Folding and design of RNA molecules

    Since the discovery of the structure of DNA in the early 1950s and its double-chained complement of information hinting at its means of replication, biologists have recognized the strong connection between molecular structure and function. In the past two decades, there has been a surge of research on an ever-growing class of RNA molecules that are non-coding but whose various folded structures allow a diverse array of vital functions. From the well-known splicing and modification of ribosomal RNA, non-coding RNAs (ncRNAs) are now known to be intimately involved in possibly every stage of DNA transcription and protein translation, as well as RNA signalling and gene regulation processes. Despite the rapid development and declining cost of modern molecular methods, they typically can only describe ncRNAs' structural conformations in vitro, which differ from their in vivo counterparts. Moreover, it is estimated that only a tiny fraction of known ncRNAs has been documented experimentally, often at a high cost. There is thus a growing realization that computational methods must play a central role in the analysis of ncRNAs. Not only do computational approaches hold the promise of rapidly characterizing many ncRNAs yet to be described, but there is also the hope that by understanding the rules that determine their structure, we will gain better insight into their function and design. Many studies have revealed that ncRNA functions are performed by high-level structures that often depend on their low-level structures, such as the secondary structure. This thesis studies the computational folding mechanism and inverse folding of ncRNAs at the secondary structure level. We describe the development of two bioinformatic tools that have the potential to improve our understanding of RNA secondary structure. 
These tools are as follows: (1) RAFFT, for efficient prediction of pseudoknot-free RNA folding pathways using the fast Fourier transform (FFT); and (2) aRNAque, an evolutionary algorithm inspired by Lévy flights for RNA inverse folding with or without pseudoknots (a secondary structure motif that often poses difficulties for bio-computational detection). The first tool, RAFFT, implements a novel heuristic to predict RNA secondary structure formation pathways that has two components: (i) a folding algorithm and (ii) a kinetic ansatz. When considering the best prediction in the ensemble of 50 secondary structures predicted by RAFFT, its performance matches that of recent deep-learning-based structure prediction methods. RAFFT also acts as a folding kinetic ansatz, which we tested on two RNAs: the CFSE and a classic bi-stable sequence. In both test cases, fewer structures were required to reproduce the full kinetics, whereas known methods (such as Treekin) required a sample of 20,000 structures or more. The second tool, aRNAque, implements an evolutionary algorithm (EA) inspired by the Lévy flight, which allows both local and global search and supports pseudoknotted target structures. The number of point mutations at every step of aRNAque's EA is drawn from a Zipf distribution. As a result, our proposed method increases the diversity of designed RNA sequences and reduces the average number of evaluations of the evolutionary algorithm. The overall performance showed improved empirical results compared to existing tools in intensive benchmarks on both pseudoknotted and pseudoknot-free datasets. In conclusion, we highlight some promising extensions of the versatile RAFFT method to RNA-RNA interaction studies. We also provide an outlook on both tools' implications for studying evolutionary dynamics.
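
The Lévy-flight mutation step described above can be sketched as follows. This is a simplified illustration, not aRNAque's implementation: it assumes a Zipf distribution truncated at the sequence length, an illustrative `exponent` of 1.5, and independent point mutations that ignore base-pair constraints. The names `sample_mutation_count` and `mutate` are hypothetical.

```python
import random


def sample_mutation_count(seq_len, exponent=1.5, rng=random):
    """Draw the number of point mutations from a Zipf law truncated at seq_len.

    P(n) is proportional to n ** -exponent for n = 1..seq_len, so most steps
    change one or two bases (local search) while occasional steps change many
    (long jumps, i.e. global search).
    """
    ks = range(1, seq_len + 1)
    weights = [k ** -exponent for k in ks]
    return rng.choices(ks, weights=weights, k=1)[0]


def mutate(seq, exponent=1.5, alphabet="ACGU", rng=random):
    """Return a copy of seq with a Zipf-distributed number of bases changed."""
    n = sample_mutation_count(len(seq), exponent, rng)
    positions = rng.sample(range(len(seq)), n)  # n distinct positions
    out = list(seq)
    for pos in positions:
        # Always substitute a different base so each position really changes.
        out[pos] = rng.choice([b for b in alphabet if b != out[pos]])
    return "".join(out)
```

Lowering `exponent` makes the tail heavier and long jumps more frequent, while a large `exponent` approaches a conventional one-mutation-per-step scheme.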

    NetCore: a network propagation approach using node coreness

    We present NetCore, a novel network propagation approach based on node coreness, for phenotype–genotype associations and module identification. NetCore addresses the node degree bias in PPI networks by using node coreness in the random walk with restart procedure, and achieves improved re-ranking of genes after propagation. Furthermore, NetCore implements a semi-supervised approach to identify phenotype-associated network modules, which anchors the identification of novel candidate genes at known genes associated with the phenotype. We evaluated NetCore on gene sets from 11 different GWAS traits and, using cross-validation, showed improved performance compared to standard degree-based network propagation. Furthermore, we applied NetCore to identify disease genes and modules for schizophrenia GWAS data and pan-cancer mutation data. We compared the novel approach to existing network propagation approaches and showed the benefits of using NetCore. We provide an easy-to-use implementation, together with a high-confidence PPI network extracted from ConsensusPathDB, which can be applied to various types of genomics data in order to obtain a re-ranking of genes and functionally relevant network modules.
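
    To make the idea of coreness in a random walk with restart concrete, here is a minimal sketch on a toy graph. It rests on an assumption that the abstract does not spell out: that the probability of stepping from a node to a neighbour is proportional to the neighbour's k-core number. The function names, the `restart` value, and the fixed iteration count are all illustrative, and this is not the NetCore implementation.

```python
def core_numbers(adj):
    """k-core number of each node, by repeatedly peeling the minimum-degree node."""
    degree = {v: len(nb) for v, nb in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.get)
        k = max(k, degree[v])          # core number never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def rwr_coreness(adj, seeds, restart=0.8, iters=100):
    """Random walk with restart using coreness-weighted transitions (assumed form)."""
    core = core_numbers(adj)
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(p0)
    for _ in range(iters):
        nxt = {v: restart * p0[v] for v in adj}
        for j, nbrs in adj.items():
            z = sum(core[i] for i in nbrs)
            if z == 0:
                nxt[j] += (1 - restart) * p[j]   # isolated node keeps its mass
                continue
            for i in nbrs:
                # Step from j to neighbour i with probability core[i] / z.
                nxt[i] += (1 - restart) * p[j] * core[i] / z
        p = nxt
    return p
```

    Sorting genes by the returned scores gives a re-ranking in which neighbours sitting in densely connected cores receive more weight than peripheral nodes, which is the intuition behind moving away from purely degree-based normalisation.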

    RNA, the Epicenter of Genetic Information

    The origin story and emergence of molecular biology are muddled. The early triumphs in bacterial genetics and the complexity of animal and plant genomes complicate an intricate history. This book documents the many advances, as well as the prejudices and founder fallacies. It highlights the premature relegation of RNA to simply an intermediate between gene and protein, the underestimation of the amount of information required to program the development of multicellular organisms, and the dawning realization that RNA is the cornerstone of cell biology, development, brain function and probably evolution itself. Key personalities, their hubris as well as prescient predictions, are richly illustrated with quotes, archival material, photographs, diagrams and references to bring the people, ideas and discoveries to life, from the conceptual cradles of molecular biology to the current revolution in the understanding of genetic information. Key Features: Documents the confused early history of DNA, RNA and proteins, a transformative history of molecular biology like no other. Integrates the influences of biochemistry and genetics on the landscape of molecular biology. Chronicles the important discoveries, preconceptions and misconceptions that retarded or misdirected progress. Highlights major pioneers and contributors to molecular biology, with a focus on RNA and noncoding DNA. Summarizes the mounting evidence for the central roles of non-protein-coding RNA in cell and developmental biology. Provides a thought-provoking retrospective and forward-looking perspective for advanced students and professional researchers.