Computational methods for RNA integrative biology
Ribonucleic acid (RNA) is an essential molecule that carries out a wide variety
of functions within the cell, from its crucial involvement in protein synthesis to
catalysing biochemical reactions and regulating gene expression. This diverse functional
repertoire stems from the complex structures that RNA can adopt and from its flexibility
as an interacting molecule.
Technological advances such as next-generation sequencing (NGS) have made it possible
to measure these two crucial aspects of RNA's regulatory role experimentally.
NGS methods can rapidly obtain the nucleotide sequence of many molecules
in parallel. Designing experiments in which only the desired parts of a molecule (or
specific parts of the transcriptome) are sequenced makes it possible to study various
aspects of RNA biology. Analysing NGS data is infeasible without computational
methods.
One such experimental method is RNA structure probing, which aims to infer RNA
structure from sequencing chemically altered transcripts. RNA structure probing data
is inherently noisy, affected both by technological biases and the stochasticity of the
underlying process. Most existing methods do not adequately address the issue of
noise, resorting to heuristics and limiting the informativeness of their output. In this
thesis, a statistical pipeline was developed for modelling RNA structure probing data,
which explicitly captures biological variability, provides automated bias-correcting
strategies, and generates a probabilistic output based on experimental measurements.
The output of our method agrees with known RNA structures, can be used to constrain
structure prediction algorithms, and remains robust to reduced sequence coverage,
thereby increasing the sensitivity of the technology.
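As a rough illustration of how noisy probing counts can be turned into a statistically interpretable score, the sketch below compares drop-off rates in a treated sample against a control and calibrates them with an empirical null built from control-versus-control comparisons. It is a minimal, generic example and not the pipeline developed in the thesis; the function names, the pseudocount, and the toy counts are all assumptions made for illustration.

```python
import math
from bisect import bisect_left


def drop_off_rate(stops, coverage, pseudo=1.0):
    """Smoothed fraction of reads that terminate at a nucleotide (pseudocount is an assumption)."""
    return (stops + pseudo) / (coverage + 2 * pseudo)


def rates(pairs):
    """Convert (stops, coverage) pairs into smoothed drop-off rates."""
    return [drop_off_rate(s, c) for s, c in pairs]


def log_ratio(rate_a, rate_b):
    """Log-ratio of two drop-off rates; positive when rate_a is larger."""
    return math.log(rate_a / rate_b)


def empirical_p_values(null_lrs, observed_lrs):
    """For each observed log-ratio, the (add-one corrected) fraction of null values at least as large."""
    null_sorted = sorted(null_lrs)
    n = len(null_sorted)
    return [(n - bisect_left(null_sorted, x) + 1) / (n + 1) for x in observed_lrs]


# Hypothetical toy data: (stops, coverage) per nucleotide.
control_1 = [(2, 100), (3, 120), (1, 90), (4, 110), (2, 95)]
control_2 = [(3, 105), (2, 115), (2, 100), (3, 100), (1, 90)]
treated = [(3, 100), (30, 120), (2, 95), (25, 105), (2, 100)]

c1, c2, t = rates(control_1), rates(control_2), rates(treated)

# Null distribution: variability between control replicates, in both directions.
null = [log_ratio(a, b) for a, b in zip(c1, c2)]
null += [log_ratio(b, a) for a, b in zip(c1, c2)]

# Observed: treated versus control at each nucleotide.
observed = [log_ratio(a, b) for a, b in zip(t, c1)]
p_values = empirical_p_values(null, observed)
```

In this toy data set the second and fourth nucleotides receive the smallest empirical p-values, because their treated drop-off rates exceed anything seen between the two control replicates.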
Another recent experimental innovation maps RNA-protein interactions at very
high temporal resolution, making it possible to study rapid binding events happening
on a minute time scale. In this thesis, a non-parametric algorithm was developed for
identifying significant changes in RNA-protein binding time-series between different
conditions. The method was applied to novel yeast RNA-protein binding time-course
data to study the role of RNA degradation in stress response. It revealed pervasive
changes in the binding to the transcriptome of the yeast transcription termination
factor Nab3 and the cytoplasmic exoribonuclease Xrn1 under nutrient stress. This
challenged the common assumption that transcriptional changes are the major
driver of changes in RNA expression during stress and highlighted the importance of
degradation. These findings inspired a dynamical model for RNA expression, in which
transcription and degradation rates are modelled using RNA-protein binding time-series
data.
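To make the idea of a non-parametric comparison of binding time-series concrete, the sketch below uses a generic permutation test: replicate curves from two conditions are pooled and reshuffled, and the observed distance between the mean curves is compared with the distances obtained under shuffling. This is a stand-in technique chosen for illustration, not the algorithm developed in the thesis; the distance measure, function names, and toy data are assumptions.

```python
import random


def mean_curve(replicates):
    """Pointwise mean over replicate time-series of equal length."""
    n = len(replicates)
    return [sum(values) / n for values in zip(*replicates)]


def curve_distance(group_a, group_b):
    """Sum of squared differences between the two groups' mean curves."""
    mean_a, mean_b = mean_curve(group_a), mean_curve(group_b)
    return sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Permutation p-value for a difference between two sets of replicate curves."""
    rng = random.Random(seed)
    observed = curve_distance(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign condition labels at random
        # Small tolerance guards against floating-point summation order effects.
        if curve_distance(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical binding signal at five time points, four replicates per condition.
stress = [[1, 2, 4, 7, 9], [1, 3, 5, 8, 10], [2, 3, 5, 7, 9], [1, 2, 5, 8, 10]]
control = [[1, 1, 2, 1, 2], [2, 1, 1, 2, 1], [1, 2, 1, 1, 2], [1, 1, 2, 2, 1]]

p_diff = permutation_p_value(stress, control)
p_same = permutation_p_value(control, control)
```

With only four replicates per condition there are 70 distinct label assignments, so the smallest attainable p-value is limited by the design; this is a general property of permutation tests rather than of any particular method.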
Trends and challenges in Computational RNA biology
A report on the Wellcome Trust Conference on Computational RNA Biology, held in Hinxton, UK, on 17–19 October 2016
Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments
Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in the regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome-wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structure probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments.