4,481 research outputs found

RNA Accessibility in cubic time

Author: A Bompfünewerer
A Busch
A Serganov
DH Turner
H Tafer
IL Hofacker
Ivo L Hofacker
J Hackermüller
JS McCaskill
M Kertesz
P Wikström
Stephan H Bernhart
U Mückstein
Ullrike Mückstein
Y Ding
Publication venue: BioMed Central
Publication date: 01/03/2011

Abstract. Background: The accessibility of RNA binding motifs controls the efficacy of many biological processes. Examples are the binding of miRNA, siRNA or bacterial sRNA to their respective targets. Similarly, the accessibility of the Shine-Dalgarno sequence is essential for translation to start in prokaryotes. Furthermore, many classes of RNA binding proteins require the binding site to be single-stranded. Results: We introduce a way to compute the accessibility of all intervals within an RNA sequence in O(n^3) time. This improves on previous implementations, where only intervals of one defined length were computed in the same time. While the algorithm is in the same efficiency class as sampling approaches, its results are much more exact, especially when the probabilities are small. Conclusions: Our algorithm significantly speeds up methods for the prediction of RNA-RNA interactions and other applications that require the accessibility of RNA molecules. The algorithm is already available in the program RNAplfold of the ViennaRNA package.
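
As an illustration of the quantity discussed in this abstract, the following is a minimal sketch of what "accessibility of an interval" means: the Boltzmann-weighted probability that every position in the interval is unpaired. It is not the cubic-time algorithm of the paper and not the RNAplfold implementation. It enumerates all structures by brute force (exponential time) under a toy energy model that is purely an assumption here: each base pair contributes -1 energy unit, kT = 1, and hairpin loops need at least 3 unpaired bases. The names `structures` and `accessibility` are hypothetical.

```python
import math

# Assumption: canonical and wobble pairs only; toy model, not the Turner energy model.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin


def structures(seq, i, j):
    """Yield each non-crossing set of base pairs on seq[i..j] (inclusive) exactly once."""
    if j - i < MIN_LOOP + 1:
        # Segment too short (or empty) to hold any pair: only the open structure.
        yield frozenset()
        return
    # Case 1: position i is unpaired.
    yield from structures(seq, i + 1, j)
    # Case 2: position i pairs with some k; inside and outside fold independently.
    for k in range(i + MIN_LOOP + 1, j + 1):
        if (seq[i], seq[k]) in PAIRS:
            for inner in structures(seq, i + 1, k - 1):
                for outer in structures(seq, k + 1, j):
                    yield inner | outer | {(i, k)}


def accessibility(seq, kT=1.0, pair_energy=-1.0):
    """Return {(i, j): probability that positions i..j are all unpaired}."""
    n = len(seq)
    all_structs = list(structures(seq, 0, n - 1))
    # Boltzmann weight exp(-E/kT) with E = pair_energy * (number of pairs).
    weights = [math.exp(-pair_energy * len(s) / kT) for s in all_structs]
    z = sum(weights)  # partition function of the toy ensemble
    paired = [{pos for pair in s for pos in pair} for s in all_structs]
    acc = {}
    for i in range(n):
        for j in range(i, n):
            window = range(i, j + 1)
            unpaired_weight = sum(
                w for w, p in zip(weights, paired)
                if not any(pos in p for pos in window)
            )
            acc[(i, j)] = unpaired_weight / z
    return acc


if __name__ == "__main__":
    acc = accessibility("GGGAAACCC")
    print(f"P(positions 3..5 unpaired) = {acc[(3, 5)]:.3f}")
    print(f"P(positions 0..2 unpaired) = {acc[(0, 2)]:.3f}")
```

The brute-force version is only practical for very short sequences; it is meant to make the definition concrete, whereas the abstract's contribution is obtaining the same table for all intervals in cubic time.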

Crossref

Directory of Open Access Journals

PubMed Central

Permanent Hosting, Archiving and Indexing of Digital Resources and Assets

Structurally constrained protein evolution: results from a lattice simulation

Author: Bastolla Ugo
Roman H. Eduardo
Vendruscolo Michele
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/02/2000

We simulate the evolution of a protein-like sequence subject to point mutations, imposing conservation of the ground state, thermodynamic stability and fast folding. Our model is aimed at describing neutral evolution of natural proteins. We use a cubic lattice model of the protein structure and test the neutrality conditions by extensive Monte Carlo simulations. We observe that sequence space is traversed by neutral networks, i.e. sets of sequences with the same fold connected by point mutations. Typical pairs of sequences on a neutral network are nearly as different as randomly chosen sequences. The fraction of neutral neighbors varies strongly from sequence to sequence, which influences the rate of neutral evolution. In this paper we study the thermodynamic stability of different protein sequences. We relate the high variability of the fraction of neutral mutations to the complex energy landscape within a neutral network, arguing that valleys in this landscape are associated with high values of the neutral mutation rate. We find that when a point mutation produces a sequence with a new ground state, this sequence is likely to have low stability. Thus we tentatively conjecture that neutral networks of different structures are typically well separated in sequence space. This result indicates that significantly changing a protein structure through a biologically acceptable chain of point mutations is a rare, although possible, event. Comment: added reference, to appear in European Physical Journal

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Trajectory-based differential expression analysis for single-cell sequencing data

Author: Cannoodt Robrecht
Clement Lieven
Dudoit Sandrine
Roux de Bézieux Hector
Saelens Wouter
Saeys Yvan
Street Kelly
Van den Berge Koen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Trajectory inference has radically enhanced single-cell RNA-seq research by enabling the study of dynamic changes in gene expression. Downstream of trajectory inference, it is vital to discover genes that are (i) associated with the lineages in the trajectory, or (ii) differentially expressed between lineages, to illuminate the underlying biological processes. Current data analysis procedures, however, either fail to exploit the continuous resolution provided by trajectory inference, or fail to pinpoint the exact types of differential expression. We introduce tradeSeq, a powerful generalized additive model framework based on the negative binomial distribution that allows flexible inference of both within-lineage and between-lineage differential expression. By incorporating observation-level weights, the model can additionally account for zero inflation. We evaluate the method on simulated datasets and on real datasets from droplet-based and full-length protocols, and show that it yields biological insights through a clear interpretation of the data. Downstream of trajectory inference for cell lineages based on scRNA-seq data, differential expression analysis yields insight into biological processes. Here, Van den Berge et al. develop tradeSeq, a framework for the inference of within-lineage and between-lineage differential expression, based on negative binomial generalized additive models.

Ghent University Academic Bibliography

LinearCoFold and LinearCoPartition: Linear-Time Algorithms for Secondary Structure Prediction of Interacting RNA molecules

Author: Huang Liang
Li Sizhen
Mathews David H.
Zhang He
Zhang Liang
Publication venue
Publication date: 26/10/2022

Many ncRNAs function through RNA-RNA interactions. Fast and reliable RNA structure prediction that takes RNA-RNA interaction into account is therefore useful. Some existing tools are less accurate because they ignore the competition between intermolecular and intramolecular base pairs, or because they focus on predicting the binding region rather than the complete secondary structure of the two interacting strands. Vienna RNAcofold, which reduces the problem to classical single-sequence folding by concatenating the two strands, scales cubically with the combined sequence length and is slow for long sequences. To address these issues, we present LinearCoFold, which predicts the complete minimum free energy structure of two strands in linear runtime, and LinearCoPartition, which calculates the cofolding partition function and base pairing probabilities in linear runtime. LinearCoFold and LinearCoPartition follow the concatenation strategy of RNAcofold, but are orders of magnitude faster than RNAcofold. For example, on a sequence pair with a combined length of 26,190 nt, LinearCoFold is 86.8x faster than RNAcofold MFE mode (0.6 minutes vs. 52.1 minutes), and LinearCoPartition is 642.3x faster than RNAcofold partition function mode (1.8 minutes vs. 1156.2 minutes). Unlike local algorithms, LinearCoFold and LinearCoPartition are global cofolding algorithms without restriction on base pair length. Surprisingly, the predictions of LinearCoFold and LinearCoPartition have higher PPV and sensitivity for intermolecular base pairs. Furthermore, we apply LinearCoFold to predict the RNA-RNA interaction between SARS-CoV-2 gRNA and human U4 snRNA, which has been experimentally studied, and observe that LinearCoFold's prediction correlates better with the wet lab results.
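
The speedup factors quoted in this abstract follow directly from the reported runtimes. The short check below recomputes them; the helper name `speedup` is a hypothetical convenience, and the only inputs are the four runtimes stated in the abstract.

```python
def speedup(baseline_minutes, new_minutes):
    """Return how many times faster the new runtime is than the baseline."""
    return baseline_minutes / new_minutes


# Runtimes in minutes, as reported for the 26,190 nt sequence pair.
mfe_speedup = speedup(52.1, 0.6)     # RNAcofold MFE mode vs. LinearCoFold
pf_speedup = speedup(1156.2, 1.8)    # RNAcofold partition function mode vs. LinearCoPartition

print(f"MFE speedup: {mfe_speedup:.1f}x")                 # MFE speedup: 86.8x
print(f"Partition function speedup: {pf_speedup:.1f}x")   # Partition function speedup: 642.3x
```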

arXiv.org e-Print Archive

scAI: an unsupervised approach for the integrative analysis of parallel single-cell transcriptomic and epigenomic profiles.

Author: Jin Suoqin
Nie Qing
Zhang Lihua
Publication venue: eScholarship, University of California
Publication date: 01/02/2020

Simultaneous measurements of transcriptomic and epigenomic profiles in the same individual cells provide an unprecedented opportunity to understand cell fates. However, effective approaches for the integrative analysis of such data are lacking. Here, we present a single-cell aggregation and integration (scAI) method to deconvolute cellular heterogeneity from parallel transcriptomic and epigenomic profiles. Through iterative learning, scAI aggregates sparse epigenomic signals in similar cells learned in an unsupervised manner, allowing coherent fusion with transcriptomic measurements. Simulation studies and applications to three real datasets demonstrate its ability to dissect cellular heterogeneity within both transcriptomic and epigenomic layers and to elucidate transcriptional regulatory mechanisms.

eScholarship - University of California

Flexible RNA design under structure and sequence constraints using formal languages

Author: Denise Alain
Ponty Yann
Vialette Stéphane
Waldispühl Jérôme
Zhang Yi
Zhou Yu
Publication venue
Publication date: 01/08/2013

The problem of RNA secondary structure design (also called inverse folding) is the following: given a target secondary structure, one aims to create a sequence that folds into, or is compatible with, that structure. In several practical applications in biology, additional constraints must be taken into account, such as the presence or absence of regulatory motifs, either at a specific location or anywhere in the sequence. In this study, we investigate the design of RNA sequences from their targeted secondary structure, given these additional sequence constraints. For this purpose, we develop a general framework based on concepts of language theory, namely context-free grammars and finite automata. We efficiently combine a comprehensive set of constraints into a unifying context-free grammar of moderate size. From there, we use generic algorithms to perform a (weighted) random generation, or an exhaustive enumeration, of candidate sequences. The resulting method, whose complexity scales linearly with the length of the RNA, was implemented as a standalone program. The resulting software was embedded into a publicly available dedicated web server. The applicability of the method was demonstrated on a concrete case study dedicated to Exon Splicing Enhancers, in which our approach was successfully used in the design of in vitro experiments. Comment: ACM BCB 2013 - ACM Conference on Bioinformatics, Computational Biology and Biomedical Informatics (2013)
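
To make the notion of a sequence "compatible with" a target structure concrete, here is a minimal sketch that draws a random sequence in which every paired position of a dot-bracket structure can form a base pair. It is not the grammar-and-automaton framework of the paper and handles no motif constraints. The set of allowed pairs (Watson-Crick plus G-U wobble) and the names `pair_table`, `random_compatible_sequence` and `is_compatible` are assumptions made for illustration. Like the method described in the abstract, the sketch runs in time linear in the length of the RNA.

```python
import random

# Assumption: Watson-Crick and G-U wobble pairs count as compatible.
CANONICAL_PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]
BASES = "ACGU"


def pair_table(structure):
    """Map each position of a dot-bracket string to its partner index (None if unpaired)."""
    stack = []
    partner = [None] * len(structure)
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            open_pos = stack.pop()
            partner[open_pos] = pos
            partner[pos] = open_pos
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def random_compatible_sequence(structure, rng=None):
    """Draw a sequence uniformly at random among those compatible with the structure."""
    rng = rng or random.Random()
    partner = pair_table(structure)
    seq = [None] * len(structure)
    for pos, mate in enumerate(partner):
        if mate is None:
            seq[pos] = rng.choice(BASES)           # unpaired: any base
        elif pos < mate:
            five, three = rng.choice(CANONICAL_PAIRS)
            seq[pos], seq[mate] = five, three      # fill both ends of the pair at once
    return "".join(seq)


def is_compatible(seq, structure):
    """Check that every base pair of the structure is formed by an allowed pair."""
    partner = pair_table(structure)
    return len(seq) == len(structure) and all(
        mate is None or pos > mate or seq[pos] + seq[mate] in CANONICAL_PAIRS
        for pos, mate in enumerate(partner)
    )


if __name__ == "__main__":
    target = "((..((...))..))"
    candidate = random_compatible_sequence(target, random.Random(0))
    print(target)
    print(candidate, is_compatible(candidate, target))
```

Compatibility is a weaker condition than folding into the target: a compatible sequence may still prefer a different minimum free energy structure, which is why design methods add further criteria on top of this step.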

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL-Polytechnique

HAL - UPEC / UPEM

Improving the value of public RNA-seq expression data by phenotype prediction.

Author: Andrew Jaffe
Aryee
Beery
Bernstein
Collado-Torres
Consortium
Denk
Eswaran
Frazee
Goodspeed
Houseman
Iorio
Irizarry
Jeffrey T Leek
Kalari
Kim
Leek
Leinonen
Leonardo Collado-Torres
Lister
Liu
Lonsdale
Mazure
Mortazavi
Nagalakshmi
Nellore
Pohl
Ritchie
Robinson
Seqc/Maqc-Iii Consortium.
Shannon E Ellis
Smallridge
Toker
Publication venue: eScholarship, University of California
Publication date: 01/05/2018

Publicly available genomic data are a valuable resource for studying normal human variation and disease, but these data are often not well labeled or annotated. The lack of phenotype information for public genomic data severely limits their utility for addressing targeted biological questions. We develop an in silico phenotyping approach for predicting critical missing annotation directly from genomic measurements, using well-annotated genomic and phenotypic data produced by consortia like TCGA and GTEx as training data. We apply in silico phenotyping to a set of 70,000 RNA-seq samples we recently processed on a common pipeline as part of the recount2 project. We use gene expression data to build and evaluate predictors for both biological phenotypes (sex, tissue, sample source) and experimental conditions (sequencing strategy). We demonstrate how these predictions can be used to study cross-sample properties of public genomic data, select genomic projects with specific characteristics, and perform downstream analyses using predicted phenotypes. The methods to perform phenotype prediction are available in the phenopredict R package and the predictions for recount2 are available from the recount R package. With data and phenotype information available for 70,000 human samples, expression data are available for use on a scale that was not previously feasible.

Crossref

eScholarship - University of California