21 research outputs found
Large-scale analysis of microRNA expression, epi-transcriptomic features and biogenesis.
MicroRNAs are important genetic regulators in both animals and plants. They have a range of functions spanning development, differentiation, growth, metabolism and disease. The advent of next-generation sequencing technologies has made it a relatively straightforward task to detect these molecules and their relative expression via sequencing. There are a large number of published studies with deposited datasets. However, there are currently few resources that capitalize on these data to better understand the features, distribution and biogenesis of miRNAs. Herein, we focus on Human and Mouse for which the majority of data are available. We reanalyse sequencing data from 461 samples into a coordinated catalog of microRNA expression. We use this to perform large-scale analyses of miRNA function and biogenesis. These analyses include global expression comparison, co-expression of miRNA clusters and the prediction of miRNA strand-specificity and underlying constraints. Additionally, we report for the first time a global analysis of miRNA epi-transcriptomic modifications and assess their prevalence across tissues, samples and families. Finally, we report a list of potentially mis-annotated miRNAs in miRBase based on their aggregated modification profiles. The results have been collated into a comprehensive online repository of miRNA expression and features such as modifications and RNA editing events, which is available at: http://wwwdev.ebi.ac.uk/enright-dev/miratlas. We believe these findings will further contribute to our understanding of miRNA function in animals and benefit the miRNA community in general
Mirnovo: genome-free prediction of microRNAs from small RNA sequencing data and single-cells using decision forests.
The discovery of microRNAs (miRNAs) remains an important problem, particularly given the growth of high-throughput sequencing, cell sorting and single cell biology. While a large number of miRNAs have already been annotated, there may well be large numbers of miRNAs that are expressed in very particular cell types and remain elusive. Sequencing allows us to quickly and accurately identify the expression of known miRNAs from small RNA-Seq data. The biogenesis of miRNAs leads to very specific characteristics observed in their sequences. In brief, miRNAs usually have a well-defined 5' end and a more flexible 3' end with the possibility of 3' tailing events, such as uridylation. Previous approaches to the prediction of novel miRNAs usually involve the analysis of structural features of miRNA precursor hairpin sequences obtained from genome sequence. We surmised that it may be possible to identify miRNAs by using these biogenesis features observed directly from sequenced reads, solely or in addition to structural analysis from genome data. To this end, we have developed mirnovo, a machine learning based algorithm, which is able to identify known and novel miRNAs in animals and plants directly from small RNA-Seq data, with or without a reference genome. This method performs comparably to existing tools, but is simpler to use and has a shorter run time. Its performance and accuracy have been tested on multiple datasets, including species with poorly assembled genomes, RNaseIII (Drosha and/or Dicer) deficient samples and single cells (at both embryonic and adult stages).
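As a rough illustration of the biogenesis signal described in this abstract (a well-defined 5' end and a more variable 3' end), the Python sketch below computes, for one cluster of reads, the fraction of reads that share the most common start and end position. The function name, the (start, end) input format and the example coordinates are assumptions made for illustration; they are not mirnovo's actual features or code.

```python
from collections import Counter


def end_consistency(alignments):
    """Return (5' consistency, 3' consistency) for the reads of one cluster.

    alignments: list of (start, end) positions, one tuple per read
    (hypothetical input format). Consistency is the fraction of reads
    that share the most common position at that end.
    """
    if not alignments:
        raise ValueError("need at least one alignment")
    starts = Counter(start for start, _ in alignments)
    ends = Counter(end for _, end in alignments)
    n = len(alignments)
    return starts.most_common(1)[0][1] / n, ends.most_common(1)[0][1] / n


# Toy cluster: four of five reads start at position 10, but the 3' ends vary.
reads = [(10, 31), (10, 32), (10, 31), (10, 33), (11, 32)]
five_prime, three_prime = end_consistency(reads)
print(five_prime, three_prime)
```

A cluster with high 5' consistency and lower 3' consistency matches the pattern the abstract describes; a classifier such as a decision forest could take values like these as two of its input features.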
Rare variant contribution to human disease in 281,104 UK Biobank exomes.
Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rare variants to common disease remains relatively unexplored. The UK Biobank contains detailed phenotypic data linked to medical records for approximately 500,000 participants, offering an unprecedented opportunity to evaluate the effect of rare variation on a broad collection of traits [1,2]. Here we study the relationships between rare protein-coding variants and 17,361 binary and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UK Biobank participants of European ancestry. Gene-based collapsing analyses revealed 1,703 statistically significant gene-phenotype associations for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single-variant association tests, emphasizing the power of gene-based collapsing analysis in the setting of high allelic heterogeneity. Gene-phenotype associations were also significantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performed ancestry-specific and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UK Biobank participants of African, East Asian or South Asian ancestry. Our results highlight a significant contribution of rare variants to common disease. Summary statistics are publicly available through an interactive portal (http://azphewas.com/).
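The idea behind a gene-based collapsing analysis for a binary trait can be sketched in a few lines of Python: each participant is reduced to carrier or non-carrier of at least one qualifying variant in the gene, and carrier counts in cases and controls are compared in a 2x2 table. The abstract does not name the statistical test, so the two-sided Fisher's exact test used here is an assumption, as are the function names and the toy counts.

```python
from math import comb


def count_carriers(qualifying_variant_counts):
    """Collapse per gene: a sample is a carrier if it has >= 1 qualifying variant."""
    return sum(1 for n in qualifying_variant_counts if n > 0)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    a, b: carriers / non-carriers among cases
    c, d: carriers / non-carriers among controls
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Unnormalised hypergeometric probability of x carriers among cases.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    extreme = sum(w for w in (weight(x) for x in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(row1 + row2, col1)


# Toy data: number of qualifying variants in one gene, per participant.
cases = [1, 2, 1, 0]
controls = [0, 0, 1, 0]
a = count_carriers(cases)
b = len(cases) - a
c = count_carriers(controls)
d = len(controls) - c

p_value = fisher_exact_two_sided(a, b, c, d)
odds_ratio = (a * d) / (b * c) if b * c else float("inf")
print(p_value, odds_ratio)
```

Collapsing pools many individually rare variants into one carrier count, which is why a gene-level signal can appear even when no single variant would reach significance on its own.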
In situ functional dissection of RNA cis-regulatory elements by multiplex CRISPR-Cas9 genome engineering.
RNA regulatory elements (RREs) are an important yet relatively under-explored facet of gene regulation. Deciphering the prevalence and functional impact of this post-transcriptional control layer requires technologies for disrupting RREs without perturbing cellular homeostasis. Here we describe genome-engineering based evaluation of RNA regulatory element activity (GenERA), a clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 platform for in situ high-content functional analysis of RREs. We use GenERA to survey the entire regulatory landscape of a 3'UTR, and apply it in a multiplex fashion to analyse combinatorial interactions between sets of miRNA response elements (MREs), providing strong evidence for cooperative activity. We also employ this technology to probe the functionality of an entire MRE network under cellular homeostasis, and show that high-resolution analysis of the GenERA dataset can be used to extract functional features of MREs. This study provides a genome editing-based multiplex strategy for direct functional interrogation of RNA cis-regulatory elements in a native cellular environment
A MILI-independent piRNA biogenesis pathway empowers partial germline reprogramming.
In mice, the pathway involving PIWI and PIWI-interacting RNA (PIWI-piRNA) is essential to re-establish transposon silencing during male-germline reprogramming. The cytoplasmic PIWI protein MILI mediates piRNA-guided transposon RNA cleavage as well as piRNA amplification. MIWI2's binding to piRNA and its nuclear localization are proposed to be dependent upon MILI function. Here, we demonstrate the existence of a piRNA biogenesis pathway that sustains partial MIWI2 function and reprogramming activity in the absence of MILI
Re-annotation of 191 developmental and epileptic encephalopathy-associated genes unmasks de novo variants in SCN1A
Funder: Agency for Innovation by Science and Technology, IWT. Funder: U.S. Department of Health & Human Services | NIH | National Human Genome Research Institute (NHGRI). Funder: BOF-University of Antwerp (FFB180053) and FWO (1861419N). Abstract: The developmental and epileptic encephalopathies (DEE) are a group of rare, severe neurodevelopmental disorders, where even the most thorough sequencing studies leave 60–65% of patients without a molecular diagnosis. Here, we explore the incompleteness of transcript models used for exome and genome analysis as one potential explanation for the current lack of diagnoses. To this end, we have updated the GENCODE gene annotation for 191 epilepsy-associated genes, using human brain-derived transcriptomic libraries and other data to build 3,550 putative transcript models. Our annotations increase the transcriptional ‘footprint’ of these genes by over 674 kb. Using SCN1A as a case study, due to its close phenotype/genotype correlation with Dravet syndrome, we screened 122 people with Dravet syndrome or a similar phenotype with a panel of exon sequences representing eight established genes and identified two de novo SCN1A variants that now, through improved gene annotation, are found to reside within our newly annotated exons. These two molecular diagnoses (from 122 screened people, 1.6%) carry significant clinical implications. Furthermore, we identified an SCN1A variant associated with Dravet syndrome, previously classified as intronic, that now lies within a deeply conserved exon. Our findings illustrate the potential gains of thorough gene annotation in improving diagnostic yields for genetic disorders.
Elucidating the function and biogenesis of small non-coding RNAs using novel computational methods & machine learning.
The discovery of RNA in 1868 by Friedrich Miescher was meant to be the prologue to an
exciting new era in Biology full of scientific breakthroughs and accomplishments. Since
then, RNAs have been proven to play an indispensable role in biological processes such as
coding, decoding, regulation and expression of genes. In particular, the discovery of small
non-coding RNAs and especially miRNAs, first in C. elegans and thereafter in almost all
animals and plants, started to fill in the puzzle of a complex gene regulatory network
present within cells. The aim of this thesis is to shed more light on the features and
functionality of small RNAs. In particular, we will focus on the function and biogenesis of
miRNAs and piRNAs, across multiple species, by employing advanced computational
methods and machine learning.
We first introduce a novel method (Chimira) for the identification of miRNAs from
sets of animal and plant hairpin precursors along with post-transcriptional terminal
modifications that are not encoded by the genome. This method allows the
characterisation of the prevalence of miRNA isoforms within different cell types and/or
conditions. We have applied Chimira within a larger study that examines the effect of
terminal uridylation on RNA degradation in oocytes and cells in either embryonic or adult
stage. This study showed that uridylation is the predominant transcriptional regulation
mechanism in oocytes while it does not retain the same functionality on mRNAs and
miRNAs, both in embryonic and adult cells.
We then move on to a large-scale analysis of small RNA-Seq datasets in order to
identify potential modification signatures across specific conditions and cell types or
tissues in Human and Mouse. We extracted the full modification profiles across 461
samples, unveiling the high prevalence of modification signatures of mainly 1 to 4
nucleotides. Additionally, samples of the same cell type and/or condition tend to cluster
together based on their miRNA modification profiles while miRNA gene precursors with
close genomic proximity showed a significant degree of co-expression. Finally, we
elucidate the determinant factors in strand selection during miRNA biogenesis as well as
update the miRBase annotation with corrected miRNA isoform sequences.
Next, we introduce a novel computational method (mirnovo) for miRNA prediction
from RNA-Seq data with or without a reference genome using machine learning. We
demonstrate its efficiency by applying it to multiple datasets, including single cells and
RNaseIII deficient samples, supporting previous studies for the existence of non-canonical
miRNA biogenesis pathways. Following this, we explore and justify a novel piRNA biogenesis
pathway in Mouse which is independent of the MILI enzyme. Finally, we explore the efficiency of
CRISPR/Cas9 induced editing of miRNA targets based on the
computationally predicted accessibility of the targeted regions in the genome.
We have publicly released two web-based novel computational methods and one
on-line resource with results regarding miRNA biogenesis and function. All findings
presented in this study comprise another step forward in the elucidation of
RNA functionality and we believe they will be of benefit to the scientific community.
Chimira: analysis of small RNA sequencing data and microRNA modifications.
Chimira is a web-based system for microRNA (miRNA) analysis from small RNA-Seq data. Sequences are automatically cleaned, trimmed, size selected and mapped directly to miRNA hairpin sequences. This generates count-based miRNA expression data for subsequent statistical analysis. Moreover, it is capable of identifying epi-transcriptomic modifications in the input sequences. Supported modification types include multiple types of 3'-modifications (e.g. uridylation, adenylation), 5'-modifications and also internal modifications or variation (ADAR editing or single nucleotide polymorphisms). Besides cleaning and mapping of input sequences to miRNAs, Chimira provides a simple and intuitive set of tools for the analysis and interpretation of the results (see also Supplementary Material). These allow the visual study of the differential expression between two specific samples or sets of samples, the identification of the most highly expressed miRNAs within sample pairs (or sets of samples) and also the projection of the modification profile for specific miRNAs across all samples. Other tools have already been published for various types of small RNA-Seq analysis, such as UEA workbench, seqBuster, MAGI, OASIS and CAP-miRSeq, as well as CPSS for the identification of modifications. A comprehensive comparison of Chimira with each of these tools is provided in the Supplementary Material. Chimira outperforms all of these tools in total execution speed and aims to facilitate simple, fast and reliable analysis of small RNA-Seq data, also allowing, for the first time, identification of global microRNA modification profiles in a simple, intuitive interface. AVAILABILITY AND IMPLEMENTATION: Chimira has been developed as a web application and it is accessible here: http://www.ebi.ac.uk/research/enright/software/chimira. CONTACT: [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
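To make the notion of a 3'-modification concrete, the Python sketch below shows one simple way a non-templated 3' tail could be separated from the templated part of a read once it has been mapped to a hairpin. It illustrates the concept only, under stated assumptions: the function names, the `max_tail` parameter and the toy hairpin are hypothetical, and this is not Chimira's implementation.

```python
def three_prime_tail(read, hairpin, max_tail=4):
    """Return the shortest 3' suffix of `read` that is not templated by `hairpin`.

    Tries tails of length 0..max_tail and returns the first one whose
    remaining prefix occurs in the hairpin; returns None if none does.
    """
    for k in range(0, min(max_tail, len(read) - 1) + 1):
        templated = read[:len(read) - k]
        if templated in hairpin:
            return read[len(read) - k:]
    return None


def classify_tail(tail):
    """Name a 3' tail by its composition (reads are DNA, so U appears as T)."""
    if tail is None:
        return "unmapped"
    if tail == "":
        return "templated"
    if set(tail) == {"T"}:
        return "uridylation"
    if set(tail) == {"A"}:
        return "adenylation"
    return "other"


# Toy example: a 22-nt mature sequence embedded in a made-up hairpin fragment.
mature = "TGAGGTAGTAGGTTGTATAGTT"
hairpin = "ACG" + mature + "CCGA"

for read in (mature, mature + "TT", mature + "A"):
    print(read, classify_tail(three_prime_tail(read, hairpin)))
```

Counting these labels per miRNA and per sample would give the kind of modification profile the abstract refers to; a real pipeline would also need to handle mismatches, 5' variation and reads that match several hairpins.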