167 research outputs found

    Structural alignment of RNA with FOLDALIGN

    Predicting human microRNA precursors based on an optimized feature subset generated by GA–SVM

    MicroRNAs (miRNAs) are non-coding RNAs that play important roles in post-transcriptional regulation. Identifying miRNAs is crucial to understanding their biological mechanisms. Machine-learning approaches have recently been employed to predict miRNA precursors (pre-miRNAs). However, the features used vary widely between methods and consequently yield different performance, so feature selection is critical for pre-miRNA prediction. Using a hybrid of a genetic algorithm and a support vector machine (GA–SVM), we generated an optimized subset of 13 features. With an SVM classifier and five-fold cross-validation, the optimized feature subset performed much better than the two feature sets used in microPred and miPred. Finally, we constructed the classifier miR-SF to predict the most recently identified human pre-miRNAs in miRBase (version 16). miR-SF achieved much higher classification performance than microPred and miPred, with accuracies of 93.97%, 86.21% and 64.66%, respectively. Thus, miR-SF is effective for identifying pre-miRNAs.
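The abstract does not spell out how the GA–SVM hybrid works internally. A common way to build such a wrapper is to encode each candidate feature subset as a bit mask, score it by cross-validated classifier accuracy, and evolve a population of masks with selection, crossover and mutation. The Python sketch that follows illustrates that general idea under stated assumptions: a nearest-centroid classifier is swapped in for the SVM so the code runs with the standard library only, and the function names, population size, generation count and mutation rate are hypothetical choices, not the authors' settings.

```python
import random
from statistics import mean

# A row is (feature_vector, label) with label in {0, 1}; a mask is a list of bools
# saying which features are kept.


def nearest_centroid_accuracy(train, test, mask):
    """Train a nearest-centroid classifier on the masked features and return test accuracy.

    Stand-in for the SVM used in the paper, chosen only to keep this sketch dependency-free.
    """
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [mean(r[i] for r in rows) for i in idx]
    correct = 0
    for x, y in test:
        dist = {
            label: sum((x[i] - c) ** 2 for i, c in zip(idx, cen))
            for label, cen in centroids.items()
        }
        correct += min(dist, key=dist.get) == y
    return correct / len(test)


def cv_accuracy(data, mask, k=5):
    """Mean accuracy over k folds (five-fold by default, matching the abstract's evaluation)."""
    folds = [data[f::k] for f in range(k)]
    scores = []
    for f in range(k):
        train = [row for g in range(k) if g != f for row in folds[g]]
        scores.append(nearest_centroid_accuracy(train, folds[f], mask))
    return mean(scores)


def ga_select(data, n_features, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Genetic-algorithm wrapper: evolve feature masks whose fitness is CV accuracy."""
    rng = random.Random(seed)
    rows = data[:]
    rng.shuffle(rows)  # shuffle once so every fold mixes both classes

    def fitness(mask):
        return cv_accuracy(rows, mask)

    population = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation toggles each gene with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)
```

Because fitness is measured by cross-validation rather than training accuracy, masks that add uninformative features gain nothing, which is what lets the search converge on a compact subset such as the 13 features reported.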

    Integrative methods for reconstruction of dynamic networks in chondrogenesis

    Human mesenchymal stem cells are a promising tool in regenerative medicine: specific stimulation can make them differentiate into chondrocytes, osteocytes or adipocytes. Investigating the biological processes that induce this differentiation is essential for efficiently generating specific tissues for therapeutic purposes. Gene expression levels of cultivated human mesenchymal stem cells treated with diverse stimuli were monitored in time-series microarray experiments for all three lineages. Gene network inference is a common approach for identifying the regulatory dependencies among a set of investigated genes. This thesis applies the NetGenerator V2.0 tool, which can handle multiple time-series datasets probing the effects of multiple external stimuli. The underlying model is a system of linear ordinary differential equations whose parameters are optimised to reproduce the given time-series datasets. Several steps of the inference procedure were adapted in this new version to allow the integration of multiple datasets. Network inference was applied both to in silico example networks and to multi-experiment microarray data from mesenchymal stem cells. The resulting chondrogenesis model was evaluated on several criteria, including the fit of the model to the data, the total number of connections, the proportion of connections supported by prior knowledge, and the model's stability under resampling. Altogether, NetGenerator V2.0 provided an automatic and efficient way to integrate experimental datasets and enhanced the interpretability and reliability of the resulting network. In a second chondrogenesis model, miRNA and mRNA time-series data were integrated for network inference. One hypothesis from this model was verified experimentally, demonstrating the negative effect of miR-524-5p on downstream genes.
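The abstract states that the model is a system of linear ordinary differential equations fitted to several time-series datasets at once. A minimal Python sketch of that idea, assuming the form dx/dt = A x + B u (gene expression x, constant external stimuli u) and a plain sum-of-squared-errors objective shared across experiments, could look like this. The function names, the forward-Euler integrator and the dictionary layout of an experiment are illustrative assumptions, not NetGenerator's actual implementation.

```python
# Assumed model form (not taken from the thesis): dx/dt = A x + B u, where x holds the
# expression of n genes and u is a constant stimulus vector for one experiment.


def simulate(A, B, x0, u, dt, steps):
    """Forward-Euler integration of dx/dt = A x + B u; returns the state at every step."""
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for _ in range(steps):
        dx = [
            sum(A[i][j] * x[j] for j in range(n)) + sum(B[i][k] * u[k] for k in range(len(u)))
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(x)
    return trajectory


def multi_experiment_sse(A, B, experiments, dt, steps):
    """Sum of squared errors over all experiments.

    Every experiment shares the same A and B; only the initial state and stimulus differ.
    Each experiment is a dict with keys "x0", "u" and "observations" (step index -> values).
    """
    total = 0.0
    for exp in experiments:
        trajectory = simulate(A, B, exp["x0"], exp["u"], dt, steps)
        for step, observed in exp["observations"].items():
            total += sum((p - o) ** 2 for p, o in zip(trajectory[step], observed))
    return total
```

An optimizer would then minimize this objective over the entries of A and B. Because every experiment contributes to one sum, a single network has to explain the responses to all stimuli, which is the sense in which multiple datasets are integrated.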

    From Structure Prediction to Genomic Screens for Novel Non-Coding RNAs

    Non-coding RNAs (ncRNAs) are receiving increasing attention, not only as an abundant class of genes but also as regulatory structural elements (some located in mRNAs). A key feature of RNA function is its structure. Computational methods for folding and predicting RNA structure were developed early on to assist functional analysis. With the discovery of more and more ncRNAs, it has become clear that a large fraction of them are highly structured. Interestingly, much of this structure consists of regular Watson-Crick and GU wobble base pairs. This, together with the growing number of available genomes, has made it possible to use structure-based methods for genomic screens. The field has moved from folding prediction of single sequences to computational screens for ncRNAs in genomic sequence that use RNA structure as the main characteristic feature. Whereas early methods focused on energy-directed folding of single sequences, comparative analysis based on structure-preserving base-pair changes has proved efficient in improving accuracy, and today it is a key component of genomic screens. Here, we cover the basic principles of RNA folding and touch upon some of the concepts in current methods that have been applied in genomic screens for de novo RNA structures, both in searches for novel ncRNA genes and for regulatory RNA structures in mRNAs. We discuss the strengths and weaknesses of the different strategies and how they can complement each other.
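Production folding tools are energy-directed, but the dynamic-programming idea behind single-sequence folding is easiest to see in the simpler Nussinov algorithm, which maximizes the number of nested base pairs instead of minimizing free energy. The Python sketch that follows allows Watson-Crick and GU wobble pairs, as described above. The minimum hairpin-loop length of 3 and the function names are assumptions for illustration; this is not the method of any specific tool covered in the review.

```python
# Watson-Crick pairs plus the GU wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_table(seq, min_loop=3):
    """dp[i][j] = maximum number of nested base pairs in seq[i..j] (inclusive)."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp


def fold(seq, min_loop=3):
    """Return (dot-bracket structure, number of pairs) for one optimal Nussinov structure."""
    n = len(seq)
    if n == 0:
        return "", 0
    dp = nussinov_table(seq, min_loop)
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to close a hairpin loop
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure), dp[0][n - 1]
```

For example, `fold("GGGAAAUCC")` closes a three-pair stem (two GC pairs and one GU wobble) around an AAA hairpin loop. Energy-based methods replace the pair count with stacking and loop energies, and comparative methods add evidence from compensatory base-pair changes across aligned sequences.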

    Nucleotide Complementarity Features in the Design of Effective Artificial miRNAs

    The importance of miRNAs in gene regulation is well established; however, the precise mechanism of their target recognition is still not completely understood. Among the known factors, nucleotide complementarity, target-site accessibility, the concentration of the RNA species, and site cooperativity have been deemed important. Using these known rules, we previously designed artificial miRNAs that inhibit cancer cell growth by repressing the expression of multiple genes. Such guide sequences were delivered into cells in the form of shRNAs. In my first project, because HIV is an RNA virus, we designed and tested guide RNAs that inhibit its replication by directly targeting the viral genome and the cellular factors the virus requires. An updated version of the design program, mirBooking, enabled us to predict the concentration effect of RNA species more accurately. The designed guide sequences gave cells effective resistance against viral infection, equal to or better than that provided by guides targeting the viral genome directly through near-perfect complementarity. However, the repression levels of the viral and cellular factors could not be predicted precisely. To gain further insight into the rules of miRNA target recognition, my second project investigated base pairing beyond the seed. By designing guide sequences that partially match the target and analysing the resulting repression pattern, we established a unifying model of miRNA target recognition by the Ago2 protein. It shows that once the seed is base-paired with the target RNA, formation of the RNA duplex is interrupted at the central portion of the guide strand but resumes further downstream of that portion in a distinct order. Implementing the discovered rules in a computer program, MicroAlign, improved the design of efficient artificial miRNAs. In this study, we not only confirmed that non-seed nucleotides contribute to miRNA efficiency but also quantitatively defined how they act. The currently popular view that miRNAs can target all genes equally effectively with seed matches alone may require careful re-examination.
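Seed matching, the baseline whose sufficiency this work questions, can be made concrete with a short Python sketch. It assumes the common convention that the seed spans miRNA nucleotides 2-8, searches a target for the reverse complement of the seed, and then reports Watson-Crick pairing at every guide position under an ungapped alignment. The function names are hypothetical, and the ungapped alignment is a simplification: it does not reproduce MicroAlign's rules, which are not detailed here. The example guide in the note after the code is the human let-7a-5p sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}  # Watson-Crick only


def reverse_complement(rna):
    """Reverse complement of an RNA string (5'->3' in, 5'->3' out)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites(mirna, target, start=2, end=8):
    """0-based target positions matching the reverse complement of miRNA nt start..end (1-based)."""
    site = reverse_complement(mirna[start - 1:end])
    return [i for i in range(len(target) - len(site) + 1) if target[i:i + len(site)] == site]


def pairing_profile(mirna, target, site_index, seed_end=8):
    """Map each guide position (1-based) to True/False for Watson-Crick pairing.

    Assumes an ungapped duplex anchored at a seed site: guide position p faces
    target index site_index + seed_end - p, so positions past the seed run upstream.
    """
    profile = {}
    for p in range(1, len(mirna) + 1):
        t = site_index + seed_end - p
        if 0 <= t < len(target):
            profile[p] = COMPLEMENT[mirna[p - 1]] == target[t]
    return profile
```

With `let7a = "UGAGGUAGUAGGUUGUAUAGUU"` and a target built as `"AAAA" + reverse_complement(let7a) + "AAAA"`, `seed_sites` finds the seed match `CUACCUC` and every guide position pairs; changing a single central target nucleotide flips only that guide position to False, which is the kind of per-position pattern a model of non-seed pairing has to score.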

    Integrative approaches for systematic reconstruction of regulatory circuits in mammals

    Thesis (Ph.D.), Massachusetts Institute of Technology, Computational and Systems Biology Program, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 141-149). The reconstruction of regulatory networks is one of the most challenging tasks in systems biology. Although some models for inferring regulatory networks can make useful predictions about the wiring and mechanisms of molecular interactions, these approaches are still limited, and there is a strong need for more universal and accurate approaches to network reconstruction. The problem is particularly challenging in mammals, owing to the higher complexity of mammalian regulatory networks and the limits of experimental manipulation. In this thesis, I present three systematic approaches to reconstruct, analyse and refine models of gene regulation. In Chapter 1, I devise a method for deriving an observational model from temporal genomic profiles and use it to choose targets for perturbation experiments, in order to determine a network controlling the responses of mouse primary dendritic cells to stimulation with pathogen components. In Chapter 2, I introduce Exigo, an algorithm that uses a network reduction strategy to identify essential interactions in regulatory networks reconstructed from experiments in which regulators have been silenced. Exigo outperforms previous approaches on simulated data, uncovers the core network structure when applied to real networks derived from perturbation studies in mammals, and improves the performance of network inference methods. Lastly, in Chapter 3, I introduce an approach to learn a module network from multiple high-throughput assays. Analysis of a diffuse large B-cell lymphoma dataset identifies candidate regulator genes, microRNAs and copy number aberrations of biological, and possibly therapeutic, importance. By Ana Paula Santos Botelho Oliveira Leite. Ph.D.
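The abstract does not describe Exigo's reduction strategy. As a generic illustration of network reduction on knockdown data, consider transitive reduction: if silencing regulator A changes both B and C, and silencing B also changes C, the A-to-C edge may reflect an indirect effect and can be dropped. The Python sketch that follows assumes an acyclic graph with hypothetical gene names; it is not Exigo's algorithm.

```python
def transitive_reduction(graph):
    """Remove edge u -> v whenever v is still reachable from u through another child.

    graph maps each node to the set of nodes it affects; it is assumed to be acyclic.
    """

    def reachable_avoiding_direct_edge(u, v):
        # Depth-first search from u's other children, looking for an indirect path to v.
        stack = [w for w in graph.get(u, set()) if w != v]
        seen = set(stack)
        while stack:
            node = stack.pop()
            if node == v:
                return True
            for nxt in graph.get(node, set()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return {
        u: {v for v in targets if not reachable_avoiding_direct_edge(u, v)}
        for u, targets in graph.items()
    }
```

For instance, with hypothetical regulators `TF1` and `TF2` where both affect `G1` and `TF1` also affects `TF2`, the direct `TF1 -> G1` edge is removed while `TF1 -> TF2` and `TF2 -> G1` remain. The acyclicity assumption matters: real regulatory networks contain feedback loops, and pruning edges this way on a cyclic graph can discard genuine direct interactions.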