230 research outputs found
NORMSEQ: a tool for evaluation, selection and visualization of RNA-Seq normalization methods
Stichting Cancer Center Amsterdam [CCA2021-9-77 to C.G., CCA2021-5-26]; TKI-Health Holland [‘AQrate’ project]; Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) Talent Programme Vidi [VI.Vidi.193.107]. Open access charge: Stichting Cancer Center Amsterdam [CCA2021-9-77]
isomiRdb: microRNA expression at isoform resolution
A significant fraction of mature miRNA transcripts
carries sequence and/or length variations, termed
isomiRs. IsomiRs are differentially abundant in cell
types, tissues, body fluids or patients’ samples.
Not surprisingly, multiple studies describe a physiological
and pathophysiological role. Despite their
importance, systematically collected and annotated
isomiR information available in databases remains
limited. We thus developed isomiRdb, a comprehensive
resource that compiles miRNA expression data
at isomiR resolution from various sources. We processed
42 499 human miRNA-seq datasets (5.9 × 10^11
sequencing reads) and consistently analyzed them
using miRMaster and sRNAbench. Our database provides
online access to the 90 483 most abundant
isomiRs (>1 RPM in at least 1% of the samples)
from 52 tissues and 188 cell types. Additionally, the
full set of over 3 million detected isomiRs is available
for download. Our resource can be queried
at the sample, miRNA or isomiR level so users
can quickly answer common questions about the
presence/absence of a particular miRNA/isomiR in
tissues of interest. Further, the database helps
identify whether a potentially interesting new isoform
has been detected before and, if so, at what frequency. In
addition to expression tables, isomiRdb can generate
multiple interactive visualisations including violin
plots and heatmaps. isomiRdb is free to use and
publicly available at: https://www.ccb.uni-saarland.de/isomirdb.
Saarland University
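The abundance criterion quoted in the abstract (>1 RPM in at least 1% of the samples) can be illustrated with a minimal sketch. This is not the isomiRdb pipeline: the function names are hypothetical, and the sketch assumes that RPM is computed against the total of the counted reads in each sample, which may differ from the normalization the database actually uses.

```python
from typing import Dict, List


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale the raw read counts of one sample to reads per million (RPM).

    Assumption: the denominator is the sum of all counts in the sample.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: c * 1_000_000 / total for name, c in counts.items()}


def abundant_isomirs(
    samples: List[Dict[str, int]],
    min_rpm: float = 1.0,
    min_fraction: float = 0.01,
) -> List[str]:
    """Return isomiRs with RPM > min_rpm in at least min_fraction of samples."""
    passing: Dict[str, int] = {}
    for counts in samples:
        for name, rpm in reads_per_million(counts).items():
            if rpm > min_rpm:
                passing[name] = passing.get(name, 0) + 1
    needed = min_fraction * len(samples)
    return sorted(name for name, n in passing.items() if n >= needed)
```

With the default thresholds, an isomiR is kept as soon as it exceeds 1 RPM in one sample out of every hundred; raising `min_fraction` makes the filter stricter.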
Bioinformatic Analysis of Ixodes ricinus Long Non-Coding RNAs Predicts Their Binding Ability of Host miRNAs
Ixodes ricinus ticks are distributed across Europe and are a vector of tick-borne diseases.
Although I. ricinus transcriptome studies have focused exclusively on protein coding genes, the last
decade witnessed a strong increase in long non-coding RNA (lncRNA) research and characterization.
Here, we report for the first time an exhaustive analysis of these non-coding molecules in I. ricinus
based on 131 RNA-seq datasets from three different BioProjects. Using this data, we obtained a
consensus set of lncRNAs and showed that lncRNA expression is stable among different studies.
While the length distribution of lncRNAs from the individual data sets is biased toward short
length values, implying the existence of technical artefacts, the consensus lncRNAs show a more
homogeneous distribution, underscoring the importance of incorporating data from different sources to
generate a solid reference set of lncRNAs. KEGG enrichment analysis of host miRNAs putatively
targeting lncRNAs upregulated upon feeding showed that these miRNAs are involved in several
relevant functions for the tick-host interaction. The possibility that at least some tick lncRNAs act as
host miRNA sponges was further explored by identifying lncRNAs with many target regions for a
given host miRNA or sets of host miRNAs that consistently target lncRNAs together. Overall, our
findings suggest that lncRNAs that may act as sponges have diverse biological roles related to the
tick–host interaction in different tissues.
European Commission CZ.02.2.69/0.0/0.0/20_079/0017809; FEDER (Fondo Europeo De Desarrollo Regional-European Regional Development Fund) A-BIO-481-UGR18; Grant Agency of the Czech Republic 19-382 07247S; ERD Funds project CePaVip OPVVV 384 CZ.02.1.01/0.0/0.0/16_019/000075
MirGeneDB 2.1: toward a complete sampling of all major animal phyla
B.F. is supported by the Tromso forskningsstiftelse (TFS) [20 SG BF 'MIRevolution']; Strategic Research Area (SFO) program of the Swedish Research Council (to V.R.) through Stockholm University (to B.F., W.K., E.M.-S. and M.R.F.); M.R.F. is additionally supported by ERC [758397 'miRCell']; South-Eastern Norway Regional Health Authority support is acknowledged [2018014 to E.H.]; P.J. Chabot is supported by the Junior Scholars Program (Dartmouth College); V.O.'s research funding was awarded to Dr Mary J. O'Connell (Associate Professor) from the School of Life Sciences University of Nottingham; M.H. is supported by the Spanish Government [AGL2017-88702-C2-2-R]; University of Granada [A-BIO-481-UGR18, FEDER 18]; K.J.P. has been supported by the National Science Foundation; NASA Ames; Dartmouth College.
We describe an update of MirGeneDB, the manually
curated microRNA gene database. Adhering to
uniform and consistent criteria for microRNA
annotation and nomenclature, we substantially
expanded MirGeneDB with 30 additional species
representing previously missing metazoan phyla
such as sponges, jellyfish, rotifers and flatworms.
MirGeneDB 2.1 now consists of 75 species spanning
∼800 million years of animal evolution, and
contains a total number of 16 670 microRNAs from
1549 families. Over 6000 microRNAs were added in
this update using ∼550 datasets with ∼7.5 billion
sequencing reads. By adding new phylogenetically
important species, especially those relevant for
the study of whole genome duplication events,
and through updating evolutionary nodes of origin
for many families and genes, we were able to
substantially refine our nomenclature system. All changes are traceable in the specifically developed
MirGeneDB version tracker. The performance of
read-pages is improved and microRNA expression
matrices for all tissues and species are now also
downloadable. Altogether, this update represents
a significant step toward a complete sampling of
all major metazoan phyla, and a widely needed
foundation for comparative microRNA genomics and
transcriptomics studies. MirGeneDB 2.1 is part of
RNAcentral and Elixir Norway, publicly and freely
available at http://www.mirgenedb.org/.
Tromso forskningsstiftelse (TFS) 20_SG_BF; Strategic Research Area (SFO) program of the Swedish Research Council through Stockholm University; European Research Council (ERC); European Commission 758397; South-Eastern Norway Regional Health Authority 2018014; Junior Scholars Program (Dartmouth College); School of Life Sciences University of Nottingham; Spanish Government; European Commission AGL2017-88702-C2-2-R; University of Granada A-BIO-481-UGR18; FEDER 18; National Science Foundation (NSF); National Aeronautics & Space Administration (NASA); Dartmouth College
DNA Methylation Profiling from High-Throughput Sequencing Data
In this chapter we will review the common steps in the analysis of whole-genome, single-base-pair resolution methylation data, including the pre-processing of the reads, the alignment and the readout of the methylation information of individual cytosines. We will focus especially on the possible error sources that need to be taken into account in order to generate high-quality methylation maps. Several tools have already been developed to convert the sequencing data into knowledge about the methylation levels. We will review
the most widely used tools, discussing both technical aspects such as user-friendliness and speed and biologically relevant questions such as quality control. For one of these tools, NGSmethPipe, we will give a step-by-step tutorial including installation and methylation profiling for different data types and species. We will conclude the chapter with a brief discussion of NGSmethDB, a database for the storage of single-base resolution methylation
maps that can be used to further analyze the obtained methylation maps.
This work was supported by the Ministry of Innovation and Science of the Spanish
Government [BIO2010-20219 (M.H.), BIO2008-01353 (J.L.O.)]; ‘Juan de la Cierva’ grant (to M.H.) and Basque Country ‘Programa de formación de investigadores’ grant (to G.B.).
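As a rough illustration of the "readout of the methylation information of individual cytosines" mentioned above, the sketch below computes the commonly used methylation ratio for bisulfite sequencing data: reads showing an unconverted C divided by all reads showing C or T at that position. It is not taken from NGSmethPipe; the function name and the `min_coverage` parameter are hypothetical.

```python
from typing import Optional


def methylation_level(
    c_reads: int, t_reads: int, min_coverage: int = 1
) -> Optional[float]:
    """Methylation ratio at one cytosine from bisulfite read counts.

    c_reads: reads with an unconverted C (interpreted as methylated).
    t_reads: reads with a T (converted, interpreted as unmethylated).
    Returns None when coverage is too low to report a ratio.
    """
    if c_reads < 0 or t_reads < 0:
        raise ValueError("read counts must be non-negative")
    coverage = c_reads + t_reads
    if coverage == 0 or coverage < min_coverage:
        return None
    return c_reads / coverage
```

Returning `None` below a coverage threshold, rather than a ratio, keeps poorly covered positions from being mistaken for confidently unmethylated ones.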
Selective isolation of extracellular vesicles from minimally processed human plasma as a translational strategy for liquid biopsies
Background: Intercellular communication is mediated by extracellular vesicles (EVs), as they enclose selectively
packaged biomolecules that can be horizontally transferred from donor to recipient cells. Because all cells constantly
generate and recycle EVs, they provide accurate timed snapshots of individual pathophysiological status. Since blood
plasma circulates through the whole body, it is often the biofluid of choice for biomarker detection in EVs. Blood
collection is easy and minimally invasive, yet reproducible procedures to obtain pure EV samples from circulating
biofluids are still lacking. Here, we addressed central aspects of EV immunoaffinity isolation from simple and complex
matrices, such as plasma.
Methods: Cell-generated EV spike-in models were isolated and purified by size-exclusion chromatography, stained
with cellular dyes and characterized by nano flow cytometry. Fluorescently-labelled spike-in EVs emerged as reliable,
high-throughput and easily measurable readouts, which were employed to optimize our EV immunoprecipitation
strategy and evaluate its performance. Plasma-derived EVs were captured and detected using this straightforward
protocol, sequentially combining isolation and staining of specific surface markers, such as CD9 or CD41. Multiplexed
digital transcript detection data was generated using the Nanostring nCounter platform and evaluated through a
dedicated bioinformatics pipeline.
Results: Beads with covalently-conjugated antibodies on their surface outperformed streptavidin-conjugated beads,
coated with biotinylated antibodies, in EV immunoprecipitation. Fluorescent EV spike recovery evidenced that target
EV subpopulations can be efficiently retrieved from plasma, and that their enrichment is dependent not only on
complex matrix composition, but also on the EV surface phenotype. Finally, mRNA profiling experiments proved that
distinct EV subpopulations can be captured by directly targeting different surface markers. Furthermore, EVs isolated
with anti-CD61 beads enclosed mRNA expression patterns that might be associated with early-stage lung cancer, in
contrast with EVs captured through CD9, CD63 or CD81. The differential clinical value carried within each distinct EV
subset highlights the advantages of selective isolation.
Conclusions: This EV isolation protocol facilitated the extraction of clinically useful information from plasma. Compatible
with common downstream analytics, it is a readily implementable research tool, tailored to provide a truly
translational solution in routine clinical workflows, fostering the inclusion of EVs in novel liquid biopsy settings.
European Commission 765492
95218
Applying Feature Selection to Improve Predictive Performance and Explainability in Lung Cancer Detection with Soft Computing
The field of biomedicine is focused on the detection and subsequent treatment of various complex diseases. Among these, cancer stands out as one of the most studied, due to the high mortality it entails. The appearance of cancer depends directly on the correct functionality and balance of the genome. Therefore, it is essential to determine which of the approximately 25,000 human genes are linked with this undesirable condition. In this work, we focus on a case study of a population affected by lung cancer. Patient information has been obtained using liquid biopsy technology, i.e. capturing cell information from the bloodstream and applying an RNA-seq procedure to get the frequency of representation for each gene. The ultimate goal of this study is to find a good trade-off between predictive capacity and interpretability for the discernment of this type of cancer. To this end, we apply a large number of feature selection techniques, using different thresholds for the number of selected discriminant genes. Our experimental results, using Soft Computing techniques, show that model-based feature selection via Random Forest is essential both for improving the predictive capacity of the models and for improving their explainability over a small subset of genes.
TargetSpy: a supervised machine learning approach for microRNA target prediction
[Background]
Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites.
[Results]
We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences.
In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs, our method shows superior performance in all classes. In Drosophila melanogaster, not only are our class II and III predictions on par with other algorithms, but notably the class I (no-seed) predictions are only marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms.
[Conclusion]
Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and Drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool (http://www.targetspy.org).
The work of MH was supported by the Spanish Government (Grant number: BIO2008.01353) and by the Junta de Andalucia (Grant number P07-FQM-03613).
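The distinction between class I (no seed requirement) and class II (seed match required) predictions can be made concrete with a small sketch of a seed-match postfilter. This is not the TargetSpy implementation: the abstract does not state which seed definition is used, so the sketch assumes the common 6-mer definition (perfect Watson-Crick pairing to microRNA positions 2 to 7), and all function names are hypothetical.

```python
from typing import Iterable, List

# Watson-Crick complement for RNA bases.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def _as_rna(seq: str) -> str:
    """Upper-case a sequence and convert DNA notation (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str, start: int = 2, end: int = 7) -> str:
    """Return the target motif (5'->3') that pairs perfectly with
    microRNA positions start..end (1-based, inclusive).

    Assumption: positions 2-7 define the seed (6-mer seed match).
    """
    seed = _as_rna(mirna)[start - 1:end]
    return seed.translate(_COMPLEMENT)[::-1]


def has_seed_match(mirna: str, target: str) -> bool:
    """True if the target sequence contains a perfect seed match."""
    return seed_site(mirna) in _as_rna(target)


def postfilter_seed(mirna: str, sites: Iterable[str]) -> List[str]:
    """Keep only predicted target sites that contain a seed match."""
    return [s for s in sites if has_seed_match(mirna, s)]
```

For the let-7a sequence `UGAGGUAGUAGGUUGUAUAGUU`, `seed_site` yields `UACCUC`, so a predicted site such as `AAACUACCUCAAA` would survive the filter while a site lacking that motif would remain a class I prediction only.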
Reassessment of miRNA variant (isomiRs) composition by small RNA sequencing
IsomiRs, sequence variants of mature microRNAs, are usually detected and quantified using high-throughput
sequencing. Many examples of their biological relevance have been reported, but sequencing artifacts identified
as artificial variants might bias biological inference and should therefore ideally be avoided. We conducted
a comprehensive evaluation of 10 different small RNA sequencing protocols, exploring both a theoretically
isomiR-free pool of synthetic miRNAs and HEK293T cells. We calculated that, with the exception of
two protocols, less than 5% of miRNA reads can be attributed to library preparation artifacts. Randomized-end
adapter protocols showed superior accuracy, with 40% of true biological isomiRs. Nevertheless, we
demonstrate concordance across protocols for selected miRNAs in non-templated uridyl additions. Notably,
NTA-U calling and isomiR target prediction can be inaccurate when using protocols with poor single-nucleotide
resolution. Our results highlight the relevance of protocol choice for biological isomiR detection and
annotation, which has key potential implications for biomedical applications.
Genome-Wide Analysis of microRNA Expression Profile in Roots and Leaves of Three Wheat Cultivars under Water and Drought Conditions
The following are available online at https://www.mdpi.com/article/10.3390/biom13030440/s1.
Figure S1: Fraction of different RNA species. Figure S2: Read length
distribution of all genome mapped reads (a) from total reads (redundant reads) and (b) from unique
reads (non-redundant reads). Figure S3: Library normalized RPM values distribution per sample of
novel miRNAs. Figure S4: qRT-PCR analysis of the expression of novel miRNA Tae-mir-novel54-5p
and known miRNA Tae-miR827c in 10 samples. Figure S5: Network analysis of (a) target genes
by drought downregulated miRNAs and (b) drought upregulated miRNAs in leaves. Table S1:
Quality and read mapping report. Table S2: Fraction of different RNA species. Table S3: Read
length distribution of all genome mapped reads from total reads (redundant reads). Table S4:
Read length distribution of all genome mapped reads from unique reads (non-redundant reads).
Table S5: All miRNAs expression matrix. Table S6: Expression matrix of all the miRNAs in the
SRA datasets. Table S7: miRNA expression matrix of all miRNAs in the Zea mays SRA datasets. Table S8: Degradome based target-gene predicted interactions. Table S9: qRT-PCR assay information.
Table S10: Enrichment of functional annotations in miRNA target genes. Table S11: Mature and
hairpin sequences of predicted miRNAs. Table S12: Degradome miRNA-target interaction predictions
using CleaveLand4.
Wheat is one of the most important food sources on Earth. MicroRNAs (miRNAs) play
important roles in wheat productivity. To identify wheat miRNAs as well as their expression profiles
under drought conditions, we constructed and sequenced small RNA (sRNA) libraries from the leaves
and roots of three wheat cultivars (Kukri, RAC875 and Excalibur) under water and drought conditions.
A total of 636 known miRNAs and 294 novel miRNAs were identified, of which 34 miRNAs were
tissue- or cultivar-specific. Among these, 314 were significantly regulated under drought conditions.
miRNAs that were drought-regulated in all cultivars displayed notably higher expression than those
that responded in a cultivar-specific manner. Cultivar-specific drought response miRNAs were
mainly detected in roots and showed significantly different drought regulations between cultivars.
Using a wheat degradome library, 6619 target genes were identified. Many target genes were
strongly enriched for protein domains, such as MEKHLA, that play roles in drought response.
Targeting analysis showed that drought-downregulated miRNAs targeted more genes than drought-
upregulated miRNAs. Furthermore, such genes had more important functions. Additionally, the
genes targeted by drought-downregulated miRNAs had multiple interactions with each other, while
the genes targeted by drought-upregulated miRNAs had no interactions. Our data provide valuable
information on wheat miRNA expression profiles and potential functions in different tissues, cultivars
and drought conditions.