Large-scale analysis of microRNA expression, epi-transcriptomic features and biogenesis.
MicroRNAs are important genetic regulators in both animals and plants. They have a range of functions spanning development, differentiation, growth, metabolism and disease. The advent of next-generation sequencing technologies has made it a relatively straightforward task to detect these molecules and their relative expression via sequencing. There are a large number of published studies with deposited datasets. However, there are currently few resources that capitalize on these data to better understand the features, distribution and biogenesis of miRNAs. Herein, we focus on Human and Mouse, for which the majority of data are available. We reanalyse sequencing data from 461 samples into a coordinated catalog of microRNA expression. We use this to perform large-scale analyses of miRNA function and biogenesis. These analyses include global expression comparison, co-expression of miRNA clusters and the prediction of miRNA strand-specificity and underlying constraints. Additionally, we report for the first time a global analysis of miRNA epi-transcriptomic modifications and assess their prevalence across tissues, samples and families. Finally, we report a list of potentially mis-annotated miRNAs in miRBase based on their aggregated modification profiles. The results have been collated into a comprehensive online repository of miRNA expression and features such as modifications and RNA editing events, which is available at: http://wwwdev.ebi.ac.uk/enright-dev/miratlas. We believe these findings will further contribute to our understanding of miRNA function in animals and benefit the miRNA community in general.
Mirnovo: genome-free prediction of microRNAs from small RNA sequencing data and single-cells using decision forests.
The discovery of microRNAs (miRNAs) remains an important problem, particularly given the growth of high-throughput sequencing, cell sorting and single cell biology. While a large number of miRNAs have already been annotated, there may well be large numbers of miRNAs that are expressed in very particular cell types and remain elusive. Sequencing allows us to quickly and accurately identify the expression of known miRNAs from small RNA-Seq data. The biogenesis of miRNAs leads to very specific characteristics observed in their sequences. In brief, miRNAs usually have a well-defined 5' end and a more flexible 3' end with the possibility of 3' tailing events, such as uridylation. Previous approaches to the prediction of novel miRNAs usually involve the analysis of structural features of miRNA precursor hairpin sequences obtained from genome sequence. We surmised that it may be possible to identify miRNAs by using these biogenesis features observed directly from sequenced reads, solely or in addition to structural analysis from genome data. To this end, we have developed mirnovo, a machine learning based algorithm, which is able to identify known and novel miRNAs in animals and plants directly from small RNA-Seq data, with or without a reference genome. This method performs comparably to existing tools but is simpler to use and has a reduced run time. Its performance and accuracy have been tested on multiple datasets, including species with poorly assembled genomes, RNaseIII (Drosha and/or Dicer) deficient samples and single cells (at both embryonic and adult stages).
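The read-level signal described in this abstract (a well-defined 5' end and a more variable 3' end) can be illustrated with a minimal sketch. This is not mirnovo's implementation: the function name `end_consistency`, the `(start, end)` input format and the toy coordinates are all assumptions made for illustration.

```python
from collections import Counter


def end_consistency(reads):
    """Return the fraction of reads sharing the most common 5' start
    and the fraction sharing the most common 3' end.

    `reads` is a list of (start, end) positions for reads grouped into
    one candidate locus (hypothetical input format, not mirnovo's own).
    """
    if not reads:
        raise ValueError("no reads supplied")
    starts = Counter(start for start, _ in reads)
    ends = Counter(end for _, end in reads)
    n = len(reads)
    five_prime = starts.most_common(1)[0][1] / n
    three_prime = ends.most_common(1)[0][1] / n
    return five_prime, three_prime


# Toy locus: mostly identical 5' starts, more variable 3' ends
# (as might arise from trimming or tailing).
toy_reads = [(10, 31), (10, 32), (10, 32), (10, 33), (11, 32)]
five, three = end_consistency(toy_reads)
```

In this toy locus the 5' fraction is higher than the 3' fraction, which is the kind of asymmetry a classifier could use as one feature among many.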
Rare variant contribution to human disease in 281,104 UK Biobank exomes.
Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits [1,2]. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).
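The idea behind the gene-based collapsing analysis mentioned in this abstract can be sketched in a few lines. The sketch assumes a one-sided Fisher's exact test on a 2x2 carrier table, which is a common choice but is not stated in the abstract; the function names, sample identifiers and variant labels are hypothetical.

```python
from math import comb


def collapse_gene(carriers_by_variant, cases, controls):
    """Collapse all qualifying variants in one gene into a 2x2 table.

    Returns (case carriers, case non-carriers,
             control carriers, control non-carriers).
    """
    carriers = set().union(*carriers_by_variant.values())
    a = len(carriers & cases)
    c = len(carriers & controls)
    return a, len(cases) - a, c, len(controls) - c


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Toy cohort: 5 cases, 10 controls, three rare variants in one gene.
cases = {"s1", "s2", "s3", "s4", "s5"}
controls = {f"s{i}" for i in range(6, 16)}
variants = {"v1": {"s1"}, "v2": {"s2", "s3"}, "v3": {"s4", "s6"}}

table = collapse_gene(variants, cases, controls)
p_value = fisher_exact_greater(*table)
```

No single variant here is carried by more than two cases, yet the collapsed table counts four of the five cases as carriers. This is the intuition for why collapsing can detect associations that single-variant tests miss when allelic heterogeneity is high.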
In situ functional dissection of RNA cis-regulatory elements by multiplex CRISPR-Cas9 genome engineering.
RNA regulatory elements (RREs) are an important yet relatively under-explored facet of gene regulation. Deciphering the prevalence and functional impact of this post-transcriptional control layer requires technologies for disrupting RREs without perturbing cellular homeostasis. Here we describe genome-engineering based evaluation of RNA regulatory element activity (GenERA), a clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 platform for in situ high-content functional analysis of RREs. We use GenERA to survey the entire regulatory landscape of a 3'UTR, and apply it in a multiplex fashion to analyse combinatorial interactions between sets of miRNA response elements (MREs), providing strong evidence for cooperative activity. We also employ this technology to probe the functionality of an entire MRE network under cellular homeostasis, and show that high-resolution analysis of the GenERA dataset can be used to extract functional features of MREs. This study provides a genome editing-based multiplex strategy for direct functional interrogation of RNA cis-regulatory elements in a native cellular environment.
MicroRNA degradation by a conserved target RNA regulates animal behavior
MicroRNAs (miRNAs) repress target transcripts through partial complementarity. By contrast, highly complementary miRNA-binding sites within viral and artificially engineered transcripts induce miRNA degradation in vitro and in cell lines. Here, we show that a genome-encoded transcript harboring a near-perfect and deeply conserved miRNA-binding site for miR-29 controls zebrafish and mouse behavior. This transcript originated in basal vertebrates as a long noncoding RNA (lncRNA) and evolved to the protein-coding gene NREP in mammals, where the miR-29-binding site is located within the 3′ UTR. We show that the near-perfect miRNA site selectively triggers miR-29b destabilization through 3′ trimming and restricts its spatial expression in the cerebellum. Genetic disruption of the miR-29 site within mouse Nrep results in ectopic expression of cerebellar miR-29b and impaired coordination and motor learning. Thus, we demonstrate an endogenous target-RNA-directed miRNA degradation event and its requirement for animal behavior.
A MILI-independent piRNA biogenesis pathway empowers partial germline reprogramming.
In mice, the pathway involving PIWI and PIWI-interacting RNA (PIWI-piRNA) is essential to re-establish transposon silencing during male-germline reprogramming. The cytoplasmic PIWI protein MILI mediates piRNA-guided transposon RNA cleavage as well as piRNA amplification. MIWI2's binding to piRNA and its nuclear localization are proposed to be dependent upon MILI function. Here, we demonstrate the existence of a piRNA biogenesis pathway that sustains partial MIWI2 function and reprogramming activity in the absence of MILI.
Re-annotation of 191 developmental and epileptic encephalopathy-associated genes unmasks de novo variants in SCN1A
The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60–65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for a lack of current diagnoses. To this end, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional ‘footprint’ of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that, through improved gene annotation, are now found to reside within our newly annotated exons. These two molecular diagnoses (from 122 screened people, 1.6%) carry significant clinical implications. Furthermore, we identified a Dravet syndrome-associated SCN1A variant, previously classified as intronic, that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.
Funders: Agency for Innovation by Science and Technology (IWT); U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI); BOF-University of Antwerp (FFB180053) and FWO (1861419N).
Mantis-ml: Disease-Agnostic Gene Prioritization from High-Throughput Genomic Screens by Stochastic Semi-supervised Learning.
Access to large-scale genomics datasets has increased the utility of hypothesis-free genome-wide analyses. However, gene signals are often insufficiently powered to reach experiment-wide significance, triggering a process of laborious triaging of genomic-association-study results. We introduce mantis-ml, a multi-dimensional, multi-step machine-learning framework that allows objective assessment of the biological relevance of genes to disease studies. Mantis-ml is an automated machine-learning framework that follows a multi-model approach of stochastic semi-supervised learning to rank disease-associated genes through iterative learning sessions on random balanced datasets across the protein-coding exome. When applied to a range of human diseases, including chronic kidney disease (CKD), epilepsy, and amyotrophic lateral sclerosis (ALS), mantis-ml achieved an average area under curve (AUC) prediction performance of 0.81-0.89. Critically, to prove its value as a tool that can be used to interpret exome-wide association studies, we overlapped mantis-ml predictions with data from published cohort-level association studies. We found a statistically significant enrichment of high mantis-ml predictions among the highest-ranked genes from hypothesis-free cohort-level statistics, indicating a substantial improvement over the performance of current state-of-the-art methods and pointing to the capture of true prioritization signals for disease-associated genes. Finally, we introduce a generic mantis-ml score (GMS) trained with over 1,200 features as a generic-disease-likelihood estimator, outperforming published gene-level scores. In addition to our tool, we provide a gene prioritization atlas that includes mantis-ml's predictions across ten disease areas and empowers researchers to interactively navigate through the gene-triaging framework. 
Mantis-ml is an intuitive tool that supports the objective triaging of large-scale genomic discovery studies and enhances our understanding of complex genotype-phenotype associations.
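The phrase "iterative learning sessions on random balanced datasets" in this abstract can be made concrete with a toy sketch. It is not the mantis-ml code or its API: a nearest-centroid rule stands in for the actual classifiers, and the function `rank_genes`, the gene names and the two-dimensional features are all hypothetical.

```python
import random
from statistics import mean


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def rank_genes(features, positives, n_iter=200, seed=0):
    """Score unlabeled genes by averaging held-out predictions over many
    random balanced training sets (known positives vs. an equally sized
    random draw of unlabeled genes treated as negatives)."""
    rng = random.Random(seed)
    unlabeled = sorted(g for g in features if g not in positives)
    pos_centroid = centroid([features[g] for g in positives])
    votes = {g: [] for g in unlabeled}
    for _ in range(n_iter):
        pseudo_neg = set(rng.sample(unlabeled, len(positives)))
        neg_centroid = centroid([features[g] for g in pseudo_neg])
        for g in unlabeled:
            if g in pseudo_neg:
                continue  # only score genes held out of this training set
            closer_to_pos = (sq_dist(features[g], pos_centroid)
                             < sq_dist(features[g], neg_centroid))
            votes[g].append(1.0 if closer_to_pos else 0.0)
    return {g: mean(v) for g, v in votes.items() if v}


# Toy data: two known disease genes and five unlabeled genes.
features = {
    "P1": (0.9, 0.8), "P2": (0.8, 0.9),
    "U1": (0.85, 0.85), "U2": (0.1, 0.2), "U3": (0.2, 0.1),
    "U4": (0.15, 0.15), "U5": (0.1, 0.1),
}
scores = rank_genes(features, positives={"P1", "P2"})
```

Because each unlabeled gene is scored only in iterations where it was not drawn as a pseudo-negative, an unlabeled gene that resembles the known positives (here `U1`) can still receive a high averaged score.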
Elucidating the function and biogenesis of small non-coding RNAs using novel computational methods & machine learning.
The discovery of nucleic acids in 1868 by Friedrich Miescher was meant to be the prologue to an
exciting new era in Biology full of scientific breakthroughs and accomplishments. Since
then, RNAs have been proven to play an indispensable role in biological processes such as
coding, decoding, regulation and expression of genes. In particular, the discovery of small
non-coding RNAs, and especially miRNAs, first in C. elegans and thereafter in almost all
animals and plants, started to fill in the puzzle of a complex gene regulatory network
present within cells. The aim of this thesis is to shed more light on the features and
functionality of small RNAs. In particular, we will focus on the function and biogenesis of
miRNAs and piRNAs, across multiple species, by employing advanced computational
methods and machine learning.
We first introduce a novel method (Chimira) for the identification of miRNAs from
sets of animal and plant hairpin precursors along with post-transcriptional terminal
modifications that are not encoded by the genome. This method allows the
characterisation of the prevalence of miRNA isoforms within different cell types and/or
conditions. We have applied Chimira within a larger study that examines the effect of
terminal uridylation on RNA degradation in oocytes and in cells at either the embryonic or adult
stage. This study showed that uridylation is the predominant transcriptional regulation
mechanism in oocytes while it does not retain the same functionality on mRNAs and
miRNAs, both in embryonic and adult cells.
We then move on to a large-scale analysis of small RNA-Seq datasets in order to
identify potential modification signatures across specific conditions and cell types or
tissues in Human and Mouse. We extracted the full modification profiles across 461
samples, unveiling the high prevalence of modification signatures of mainly 1 to 4
nucleotides. Additionally, samples of the same cell type and/or condition tend to cluster
together based on their miRNA modification profiles while miRNA gene precursors with
close genomic proximity showed a significant degree of co-expression. Finally, we
elucidate the determinant factors in strand selection during miRNA biogenesis as well as
update the miRBase annotation with corrected miRNA isoform sequences.
Next, we introduce a novel computational method (mirnovo) for miRNA prediction
from RNA-Seq data with or without a reference genome using machine learning. We
demonstrate its efficiency by applying it to multiple datasets, including single cells and
RNaseIII deficient samples, supporting previous studies for the existence of non-canonical
miRNA biogenesis pathways. Following this, we explore and provide evidence for a novel piRNA
biogenesis pathway in Mouse that is independent of the MILI protein. Finally, we explore the
efficiency of CRISPR/Cas9-induced editing of miRNA targets based on the computationally
predicted accessibility of the targeted regions in the genome.
We have publicly released two novel web-based computational methods and one
on-line resource with results regarding miRNA biogenesis and function. All findings
presented in this study comprise another step forward in the journey towards elucidating
RNA functionality and we believe they will be of benefit to the scientific community.