Small RNA targets : advances in prediction tools and high-throughput profiling
MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs that regulate gene expression at the post-transcriptional level. They are thought to be involved in most biological processes of the cell, acting primarily by targeting messenger RNAs (mRNAs) for cleavage or translational repression. Their binding to target sites is mediated by the Argonaute (AGO) family of proteins. Thus, miRNA target prediction is pivotal for research and clinical applications. Moreover, transfer-RNA-derived fragments (tRFs) and other types of small RNAs have been found to be potent regulators of AGO-mediated gene expression. Their role in mRNA regulation has yet to be fully elucidated, and the computational prediction of their targets is still in its infancy. To shed light on these complex RNA–RNA interactions, the availability of good-quality high-throughput data and reliable computational methods is of utmost importance. Even though the arsenal of computational approaches in the field has been enriched in the last decade, there is still a degree of discrepancy between the results they yield. This review offers an overview of the relevant advancements in bioinformatics and machine learning and summarizes the key strategies used for small RNA target prediction. Furthermore, we report recent developments in high-throughput sequencing technologies and explore the role of non-miRNA AGO driver sequences.
Peer-reviewed
Insights into RNA design from novel molecular tools
RNA, previously recognized merely as a messenger of genetic information, has recently been rediscovered as a versatile molecule with a central role in cellular regulation. These regulatory functions are enabled by its specific chemical makeup, which allows it to fold into intricate and flexible structures. In stark contrast with DNA, RNA forms a variety of structural motifs that serve as efficient points of contact in molecular recognition. It is therefore clear that dynamic RNA structures dictate the binding availability of interfaces that play important roles in molecular regulation inside living cells. As such, tools that can accurately capture and predict RNA structure in vivo remain essential for understanding RNA function. To this end, my dissertation focuses on the development of molecular tools to predict and characterize accessible RNA interfaces in their native environment. First, I established the usefulness of a fluorescence-based in vivo oligonucleotide hybridization approach for identifying accessible interfaces by characterizing numerous RNA regions in several biologically relevant molecules in E. coli. I then described these RNA interactions using a biophysical model that is based on thermodynamic principles and incorporates large sets of data collected with this fluorescence-based system. This approach predicted RNA accessibility better than unoptimized versions that did not incorporate in vivo data. Finally, I detailed the development and application of a high-throughput tool for the large-scale characterization of accessible interfaces within native RNAs in a single experiment. In this approach, in vivo oligonucleotide hybridization was coupled to transcriptional elongation control to allow analysis via next-generation sequencing. This tool was used to obtain complete landscapes of functional structure for 72 regulatory molecules (>1000 regions) in a single experiment.
Altogether, the results of this high-throughput approach revealed a pattern indicating that RNA-RNA interaction sites are either highly accessible or highly protected, suggesting their binding status (e.g. actively bound or unbound). In addition, within bacterial small RNAs, our approach revealed the role of the global regulator Hfq as a universal structural relaxer. Together, these tools provide a unique and fundamental perspective on the study of functional RNA structure, namely the identification of dynamic structures. Furthermore, the information provided by these approaches significantly aids the design of synthetic RNAs for a variety of purposes, including gene expression control.
Chemical Engineering
Investigating the concept of accessibility for predicting novel RNA-RNA interactions
State-of-the-art methods for predicting novel trans RNA-RNA interactions use so-called accessibility as a key concept. Accessibility estimates whether a region in a given RNA sequence is available for forming trans interactions, using a thermodynamic model that quantifies its secondary structure features. RNA-RNA interactions are then predicted by finding the minimum free energy base pairing between the two transcripts, with accessibility taken into account as an energy penalty. We investigated the underlying assumptions of this approach using two methods, RNAPLEX and INTARNA, on two datasets containing sRNA-mRNA and snoRNA-rRNA interactions, respectively. We find that (1) known trans RNA-RNA interactions frequently overlap regions containing RNA structure features, (2) the estimated accessibility reflects sRNA structures fairly well but often disagrees with the structures of longer transcripts, (3) the prediction performance of RNA-RNA interaction prediction methods is independent of the quality of the estimated accessibility profiles, and (4) one important overall effect of accessibility profiles is to prevent the thermodynamic model from predicting overly long interactions. Based on our findings, we conclude that the accessibility concept, as used in the minimum free energy approach to predicting novel RNA-RNA interactions, has conceptual limitations, and we discuss potential ways of improving the field in the future.
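The accessibility-as-energy-penalty idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the energy model of RNAPLEX or INTARNA: it assumes the probability that each binding site is unpaired (P_u) is already known (in practice such values come from a partition-function folding model; here they are supplied by hand), converts it into a penalty ED = -RT ln P_u, and adds both penalties to a hybridization energy. The function names, the 37 °C temperature and the example energies are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K) and 37 °C in kelvin (assumed folding temperature).
R_KCAL = 0.0019872
T_KELVIN = 310.15


def accessibility_penalty(p_unpaired: float, temp_k: float = T_KELVIN) -> float:
    """Free energy (kcal/mol) needed to make a site single-stranded: ED = -RT ln P_u."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("unpaired probability must lie in (0, 1]")
    return -R_KCAL * temp_k * math.log(p_unpaired)


def interaction_energy(e_hybrid: float, pu_query: float, pu_target: float) -> float:
    """Hybridization energy plus the accessibility penalties of both binding sites."""
    return e_hybrid + accessibility_penalty(pu_query) + accessibility_penalty(pu_target)


def best_site(candidates):
    """Pick the (label, e_hybrid, pu_query, pu_target) candidate with the lowest total energy."""
    scored = [(label, interaction_energy(e, pq, pt)) for label, e, pq, pt in candidates]
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical candidates: site_A pairs more stably, but its target region is buried.
    candidates = [
        ("site_A", -18.0, 0.9, 0.0001),
        ("site_B", -14.0, 0.8, 0.7),
    ]
    label, energy = best_site(candidates)
    print(label, round(energy, 2))  # prints: site_B -13.64
```

With these made-up numbers, the site with the stronger hybridization energy loses because one partner region is rarely unpaired. This is the trade-off the accessibility term is meant to capture, and it also shows how the penalty discourages long interactions that would have to open stable structure.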
Investigating Hfq-mRNA Interactions in Bacteria
Small regulatory RNAs (sRNAs) are essential for bacteria to thrive in diverse environments, and they also play a key role in virulence [11]. Trans-acting sRNAs affect the stability and/or translation of their target mRNAs through complementary base pairing. This base pairing is imperfect and requires the action of an RNA-binding protein, Hfq. Hfq facilitates these RNA-RNA interactions by stabilizing duplex formation, aiding structural rearrangements, increasing the rate of structural opening, and/or increasing the rate of annealing [18-21]. Hfq has two well-characterized binding surfaces: the proximal surface, which binds AU-rich stretches typical of sRNAs, and the distal surface, which binds (ARN)x motifs typically found in target mRNAs [30, 33, 36]. Studies of Hfq-RNA interactions focused largely on sRNAs until the more recent discovery of an (ARN)x motif within the 5′ UTR of target mRNAs [36, 37]. The importance of this motif in facilitating Hfq-mRNA binding, and its requirement for the regulation of a couple of well-known target mRNAs, led us to further characterize the motif in the work described in this thesis. We performed bioinformatic and in vitro analyses to investigate the prevalence, location, structural contexts, and Hfq binding of (ARN)x motifs in known target mRNAs. We found that the known targets contain single-stranded (ARN)x sequences in their 5′ UTRs that bind to Hfq. Two predominant structural contexts of the single-stranded (ARN)x motifs became clear: they were either flanked by stem-loop structures or located within the loop of an internal bulge, multi-branch junction or hairpin. The key features of the motifs were then used as a bioinformatic tool on a genome-wide scale to identify mRNAs that might bind to Hfq. We found that 21% of mRNAs have a suitable (ARN)x motif and therefore likely bind to Hfq.
Messages that bind to Hfq may be novel sRNA targets, so we investigated this possibility using an in vivo reporter assay and found that 63% of the mRNAs tested are regulated by a specific sRNA. The novel targets are involved in pathways including iron salvage, biofilm formation, and amino acid metabolism. Overall, we defined key features of (ARN)x motifs and were able to use them to predict novel target mRNAs in E. coli. This approach is efficient and effective, and it is adaptable to other bacterial species.
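The sequence part of a genome-wide (ARN)x scan can be illustrated with a regular expression, where A is adenine, R is a purine (A or G) and N is any nucleotide. This is a hypothetical sketch, not the thesis's tool. The minimum number of repeats (`MIN_REPEATS`) is an assumed illustrative threshold rather than a value from the study. The single-stranded structural context that the study also required is not checked here, because that would need a separate secondary-structure prediction step.

```python
import re

# Hypothetical threshold: minimum number of consecutive ARN triplets to report.
MIN_REPEATS = 3


def find_arn_motifs(utr: str, min_repeats: int = MIN_REPEATS):
    """Return (start, end, motif) for non-overlapping runs of >= min_repeats ARN triplets.

    A = adenine, R = purine (A/G), N = any nucleotide. DNA input (T) is converted
    to RNA (U). Coordinates are 0-based and half-open on the input string.
    """
    seq = utr.upper().replace("T", "U")
    pattern = re.compile(r"(?:A[AG][ACGU]){%d,}" % min_repeats)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    print(find_arn_motifs("CCAAUAGCAACUU"))  # [(2, 11, 'AAUAGCAAC')]
```

Because `re.finditer` scans left to right and returns non-overlapping matches, a run found in one triplet frame hides overlapping runs in other frames. A fuller scan might test all three frames and then filter hits by predicted single-strandedness.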
In silico modelling of RNA-RNA dimer and its application for rational siRNA design and ncRNA target search
Non-protein-coding regions, which constitute 98.5% of the human genome, were long dismissed as evolutionary relics. Only recently has the biological relevance of the non-coding RNAs associated with these regions been recognized. The development of experimental and bioinformatic methods for detecting these non-coding RNAs (ncRNAs) led to the discovery of more than 29,000,000 sequences, grouped into more than 1300 families.
More often than not, these ncRNAs function by binding to other RNAs, either protein-coding or non-protein-coding. Compared with the number of tools for detecting and classifying ncRNAs, the number of tools for searching for putative RNA binding partners is negligible. As a result, the function of the majority of annotated ncRNA genes is still completely unknown.
The aim of this work is to assess the function of different families of ncRNAs by developing new algorithms and methods to study RNA-RNA interactions. These new methods are extensions of RNA-folding algorithms applied to the problem of RNA-RNA interactions. Depending on the class of ncRNA studied, different methods were developed and tested.
This work shows that developing RNA-folding algorithms to study RNA-RNA interactions is a promising way to functionally annotate ncRNAs. Still, other factors, such as RNA-protein interactions, RNA concentration and RNA expression, play an important role in RNA hybridization and will have to be taken into account in future work to achieve reliable prediction of RNA binding partners.
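One simple way to carry folding-style dynamic programming over to two strands is to score only intermolecular base pairs, aligning one RNA against the other read 3'→5'. The sketch below does this as a Smith-Waterman-style local alignment over complementarity. It is a toy illustration, not the method developed in this work. The pair scores and penalties are arbitrary assumed values rather than nearest-neighbour free energies, intramolecular structure is ignored, and `best_duplex` is a hypothetical name.

```python
# Assumed toy scores: G-C > A-U > G-U wobble; every other combination is a mismatch.
PAIR_SCORE = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}
MISMATCH = -2
GAP = -3  # cost of leaving one base unpaired inside the duplex (a bulge)


def best_duplex(query: str, target: str):
    """Best-scoring local intermolecular duplex between two RNAs given 5'->3'.

    Returns (score, q_start, q_end, t_start, t_end) as 0-based half-open
    coordinates on the original query and target strings.
    """
    q = query.upper().replace("T", "U")
    t = target.upper().replace("T", "U")[::-1]  # read the target 3'->5' (antiparallel)
    n, m = len(q), len(t)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    # start[i][j] records where the local duplex ending at cell (i, j) begins.
    start = [[(i, j) for j in range(m + 1)] for i in range(n + 1)]
    best = (0, 0, 0)  # (score, i, j) of the best cell seen so far
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + PAIR_SCORE.get((q[i - 1], t[j - 1]), MISMATCH)
            up = score[i - 1][j] + GAP    # query base left unpaired
            left = score[i][j - 1] + GAP  # target base left unpaired
            cell = max(0, diag, up, left)
            score[i][j] = cell
            if cell == 0:
                start[i][j] = (i, j)  # the local duplex restarts here
            elif cell == diag:
                start[i][j] = start[i - 1][j - 1]
            elif cell == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
            if cell > best[0]:
                best = (cell, i, j)
    best_score, i_end, j_end = best
    i_start, j_start = start[i_end][j_end]
    # Map positions on the reversed target back to the original 5'->3' string.
    return best_score, i_start, i_end, m - j_end, m - j_start


if __name__ == "__main__":
    print(best_duplex("AAGCAUCAA", "CCGAUGCCC"))  # (13, 2, 7, 2, 7)
```

In the example, query segment GCAUC (positions 2 to 7) pairs with target segment GAUGC, its reverse complement. Energy-based tools replace these toy scores with nearest-neighbour stacking energies, and some also add accessibility terms for each partner.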
Investigating prokaryotic transcriptomes and the impact of crosstalk between noncoding RNA and messenger RNA interactions
Prokaryotes have a complex noncoding RNA (ncRNA)-based regulatory system resembling that of eukaryotes. Recent transcriptomics studies also point to an abundance of highly expressed uncharacterized RNAs in archaea and bacteria. However, despite recent advances indicating the prevalence of ncRNAs in prokaryotes, it is still unknown to what extent these uncharacterized transcripts are functional. We have therefore proposed a phylogeny-informed approach to designing new RNA sequencing (RNA-seq) experiments, which increases the information harnessed from transcriptome data for ncRNA detection.
Many regulatory ncRNAs engage in RNA-RNA interactions, in which two RNA molecules bind to form a duplex. Predicting the true targets of an RNA enables its functional characterization, and such targets can be estimated by bioinformatic methods. However, the algorithms developed to date are imperfect, and it is an open question which of them perform well and whether they can be improved upon. Towards this goal, we performed a computational benchmark study to find reliable algorithms for RNA-RNA interaction prediction. We found that energy-based methods that include the accessibility of interaction regions are currently the most accurate.
Many ncRNAs, including housekeeping ncRNA genes, are highly expressed, and the abundance of interacting RNA molecules enables RNA-RNA duplex formation. In Chapter IV we explore the impact of high-abundance RNAs on protein expression through crosstalk RNA-RNA interactions between mRNAs and ncRNAs. With extensive RNA-RNA interaction predictions, we reveal that RNA avoidance is an evolutionarily conserved phenomenon among prokaryotes, meaning that core mRNAs have evolved to avoid crosstalk interactions with abundant ncRNAs. Our predictions also suggest that RNA avoidance may influence protein expression. To test this, we investigated the stability of interactions between mRNAs and core ncRNAs. These predictions show that RNA avoidance influences final protein abundances.
In conclusion, the primary aims of this study are to investigate the prokaryotic transcriptome for novel ncRNA genes and to examine the effects of crosstalk RNA interactions. We present a method to increase the information gained from prokaryotic transcriptomes for ncRNA identification. We also present the most comprehensive benchmark of RNA-RNA interaction prediction algorithms to date. Lastly, we introduce and test an ‘RNA avoidance hypothesis’ that shows the influence of crosstalk RNA interactions on protein expression in bacteria.
Principles of RNA-based gene expression control in Vibrio cholerae
Post-transcriptional control of gene expression by small regulatory RNAs (sRNAs) is a widespread regulatory principle among bacteria. The sRNAs typically act in concert with RNA binding proteins such as the RNA chaperone Hfq to bind mRNA targets via imperfect base pairing. They affect translation initiation and/or transcript stability. Additionally, sRNAs can influence transcription termination of their targets or function indirectly as so-called sponges for other sRNAs. Regulation often involves the major endoribonuclease RNase E, which contributes to both sRNA biosynthesis and function.
In the first part of this thesis, we globally identified RNase E cleavage sites in the major human pathogen Vibrio cholerae by employing TIER-seq (transiently inactivating an endoribonuclease followed by RNA-seq). We validated the involvement of RNase E in the synthesis and maturation of several previously uncharacterized sRNAs. Two examples, OppZ and CarZ, were chosen for further study due to their unique regulatory mechanism. They are processed from the 3’ untranslated regions (3’ UTRs) of the oppABCDF and carAB operons, respectively, and subsequently target mRNAs transcribed from the very same operons by binding to base-pairing sites upstream of the second (oppB) or first (carA) cistron. This leads to translational inhibition and triggers premature transcription termination by the termination factor Rho, thereby establishing an autoregulatory feedback loop involving both the protein-coding genes and the processed sRNAs. In the case of OppZ, the regulation is limited to the oppBCDF part of the operon in a discoordinate fashion due to the position of the OppZ base-pairing site. This mechanism of target regulation by OppZ and CarZ represents the first report of RNA-based feedback regulation that does not rely on additional transcription factors.
The second study included in the thesis characterizes two sRNAs involved in the envelope stress response (ESR) of V. cholerae. Misfolded outer membrane proteins (OMPs) induce the sigmaE-dependent transcriptional activation of the sRNAs MicV and VrrA, which reduce membrane stress by repressing the mRNAs of several OMPs and other abundant membrane proteins. MicV and VrrA share a conserved seed region with their functionally analogous counterpart from Escherichia coli, RybB, indicating that this seed sequence might represent a universally functional RNA domain. To study the involvement of this seed domain in the ESR in an unbiased fashion, we constructed a complex library of artificial sRNAs and performed laboratory selection experiments under membrane-damaging conditions. We isolated the most highly enriched sRNA variants and indeed discovered a strong enrichment of the conserved seed-pairing domain. We were able to pinpoint the repression of ompA as the key factor responsible for the sRNA-mediated resistance to ethanol-induced membrane damage.
Taken together, this thesis expanded knowledge of the mechanisms of sRNA-dependent gene regulation by reporting a novel autoregulatory feedback loop. Additionally, it introduced a synthetic sRNA library as a tool to study complex microbial phenotypes and their underlying sRNA-target interactions.
Non-coding RNA networks regulating leaf vegetative desiccation tolerance in the resurrection plant Xerophyta humilis.
Although common in orthodox seeds, desiccation tolerance (DT) is exceedingly rare in the vegetative tissues of modern angiosperms, being limited to a small number of "resurrection plants". While the molecular mechanisms of DT, as well as the transcription factors regulating the seed and vegetative DT programmes, have been identified, very little is known about the role of regulatory noncoding RNAs (ncRNAs). To investigate the presence and roles of possible ncRNA players, RNA-Seq was performed on desiccating Xerophyta humilis leaves, and a bioinformatic pipeline was assembled to identify the potential decoy lncRNAs and miRNAs present. Interaction mapping identified a number of small regulatory networks, each regulating a small subset of the desiccation transcriptome. Predicted networks were screened for functions related to DT and for expression consistent with functional regulatory interactions. Of the predicted networks, two appear highly promising as potential regulators of key DT response genes. The results indicate that differentially expressed (DE) desiccation-response ncRNAs are present in the vegetative tissues of X. humilis and likely play a key role in the regulation of DT. This suggests that ncRNAs play a more important role in DT than previously thought and may have facilitated the evolution of vegetative DT through the reprogramming of seed DT programmes in vegetative tissues.
A hybrid approach to assess the structural impact of long noncoding RNA mutations uncovers key NEAT1 interactions in colorectal cancer
Long noncoding RNAs (lncRNAs) are emerging players in cancer, with potential as prognostic biomarkers or therapeutic targets. Earlier studies have identified somatic mutations in lncRNAs that are associated with tumor relapse after therapy, but the mechanisms behind these associations remain unknown. Given the relevance of secondary structure for the function of some lncRNAs, some of these mutations may have a functional impact through structural disturbance. Here, we examined the potential structural and functional impact of a novel A > G point mutation in NEAT1 that has been recurrently observed in tumors of colorectal cancer patients experiencing relapse after treatment. We used the nextPARS structural probing approach to provide the first empirical evidence that this mutation alters NEAT1 structure. We further evaluated the potential effects of this structural alteration using computational tools and found that this mutation likely alters the binding propensities of several NEAT1-interacting miRNAs. Differential expression analysis of these miRNA networks shows upregulation of Vimentin, consistent with previous findings. We propose a hybrid pipeline that can be used to explore the potential functional effects of lncRNA somatic mutations.
Funding information: H2020 European Research Council, Grant/Award Number: 724173
Peer reviewed. Postprint (published version)
- …