220 research outputs found

    NORMSEQ: a tool for evaluation, selection and visualization of RNA-Seq normalization methods

    Funding: Stichting Cancer Center Amsterdam [CCA2021-9-77 to C.G., CCA2021-5-26]; TKI-Health Holland [‘AQrate’ project]; Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO) Talent Programme Vidi [VI.Vidi.193.107]. Open access charge: Stichting Cancer Center Amsterdam [CCA2021-9-77].

    isomiRdb: microRNA expression at isoform resolution

    A significant fraction of mature miRNA transcripts carries sequence and/or length variations, termed isomiRs. IsomiRs are differentially abundant in cell types, tissues, body fluids or patients’ samples. Not surprisingly, multiple studies describe a physiological and pathophysiological role. Despite their importance, systematically collected and annotated isomiR information available in databases remains limited. We thus developed isomiRdb, a comprehensive resource that compiles miRNA expression data at isomiR resolution from various sources. We processed 42 499 human miRNA-seq datasets (5.9 × 10^11 sequencing reads) and consistently analyzed them using miRMaster and sRNAbench. Our database provides online access to the 90 483 most abundant isomiRs (>1 RPM in at least 1% of the samples) from 52 tissues and 188 cell types. Additionally, the full set of over 3 million detected isomiRs is available for download. Our resource can be queried at the sample, miRNA or isomiR level so users can quickly answer common questions about the presence/absence of a particular miRNA/isomiR in tissues of interest. Further, the database helps identify whether a potentially interesting new isoform has been detected before and how frequently. In addition to expression tables, isomiRdb can generate multiple interactive visualisations including violin plots and heatmaps. isomiRdb is free to use and publicly available at: https://www.ccb.uni-saarland.de/isomirdb. Funding: Saarland University.
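The abundance criterion quoted in the abstract above (>1 RPM in at least 1% of the samples) can be illustrated with a minimal sketch. This is not the isomiRdb pipeline: the function names `to_rpm` and `abundant_isomirs`, the input layout (one dictionary of raw read counts per sample), and the choice to normalize by the total reads in each sample are all assumptions made for illustration.

```python
from math import ceil


def to_rpm(counts):
    """Convert raw read counts for one sample to reads per million (RPM).

    Assumption: RPM is computed against the total reads in this dictionary.
    """
    total = sum(counts.values())
    if total == 0:
        return {seq: 0.0 for seq in counts}
    return {seq: n * 1_000_000 / total for seq, n in counts.items()}


def abundant_isomirs(samples, min_rpm=1.0, min_fraction=0.01):
    """Return isomiRs with RPM > min_rpm in at least min_fraction of samples."""
    # Number of samples an isomiR must pass in (at least one).
    needed = max(1, ceil(min_fraction * len(samples)))
    hits = {}
    for counts in samples:
        for seq, value in to_rpm(counts).items():
            if value > min_rpm:
                hits[seq] = hits.get(seq, 0) + 1
    return {seq for seq, n in hits.items() if n >= needed}


if __name__ == "__main__":
    # Hypothetical toy data: two samples, two isomiR labels.
    toy = [{"A": 999_999, "B": 1}, {"A": 500_000, "B": 500_000}]
    print(sorted(abundant_isomirs(toy)))
```

With the default thresholds an isomiR only has to pass in one of the two toy samples; raising `min_fraction` to 1.0 would require it to exceed 1 RPM in every sample.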

    Bioinformatic Analysis of Ixodes ricinus Long Non-Coding RNAs Predicts Their Binding Ability of Host miRNAs

    Ixodes ricinus ticks are distributed across Europe and are a vector of tick-borne diseases. Although I. ricinus transcriptome studies have focused exclusively on protein-coding genes, the last decade witnessed a strong increase in long non-coding RNA (lncRNA) research and characterization. Here, we report for the first time an exhaustive analysis of these non-coding molecules in I. ricinus based on 131 RNA-seq datasets from three different BioProjects. Using these data, we obtained a consensus set of lncRNAs and showed that lncRNA expression is stable among different studies. While the length distribution of lncRNAs from the individual data sets is biased toward short length values, implying the existence of technical artefacts, the consensus lncRNAs show a more homogeneous distribution, emphasizing the importance of incorporating data from different sources to generate a solid reference set of lncRNAs. KEGG enrichment analysis of host miRNAs putatively targeting lncRNAs upregulated upon feeding showed that these miRNAs are involved in several relevant functions for the tick-host interaction. The possibility that at least some tick lncRNAs act as host miRNA sponges was further explored by identifying lncRNAs with many target regions for a given host miRNA or sets of host miRNAs that consistently target lncRNAs together. Overall, our findings suggest that lncRNAs that may act as sponges have diverse biological roles related to the tick–host interaction in different tissues. Funding: European Commission CZ.02.2.69/0.0/0.0/20_079/0017809; FEDER (Fondo Europeo De Desarrollo Regional-European Regional Development Fund) A-BIO-481-UGR18; Grant Agency of the Czech Republic 19-382 07247S; ERD Funds project CePaVip OPVVV 384 CZ.02.1.01/0.0/0.0/16_019/000075

    MirGeneDB 2.1: toward a complete sampling of all major animal phyla

    B.F. is supported by the Tromso forskningsstiftelse (TFS) [20_SG_BF 'MIRevolution']; Strategic Research Area (SFO) program of the Swedish Research Council (to V.R.) through Stockholm University (to B.F., W.K., E.M.-S. and M.R.F.); M.R.F. is additionally supported by ERC [758397 'miRCell']; South-Eastern Norway Regional Health Authority support is acknowledged [2018014 to E.H.]; P.J. Chabot is supported by the Junior Scholars Program (Dartmouth College); V.O.'s research funding was awarded to Dr Mary J. O'Connell (Associate Professor) from the School of Life Sciences University of Nottingham; M.H. is supported by the Spanish Government [AGL2017-88702-C2-2-R]; University of Granada [A-BIO-481-UGR18, FEDER 18]; K.J.P. has been supported by the National Science Foundation; NASA Ames; Dartmouth College.
    We describe an update of MirGeneDB, the manually curated microRNA gene database. Adhering to uniform and consistent criteria for microRNA annotation and nomenclature, we substantially expanded MirGeneDB with 30 additional species representing previously missing metazoan phyla such as sponges, jellyfish, rotifers and flatworms. MirGeneDB 2.1 now consists of 75 species spanning over ∼800 million years of animal evolution, and contains a total number of 16 670 microRNAs from 1549 families. Over 6000 microRNAs were added in this update using ∼550 datasets with ∼7.5 billion sequencing reads. By adding new phylogenetically important species, especially those relevant for the study of whole genome duplication events, and through updating evolutionary nodes of origin for many families and genes, we were able to substantially refine our nomenclature system. All changes are traceable in the specifically developed MirGeneDB version tracker. The performance of read-pages is improved and microRNA expression matrices for all tissues and species are now also downloadable. Altogether, this update represents a significant step toward a complete sampling of all major metazoan phyla, and a widely needed foundation for comparative microRNA genomics and transcriptomics studies. MirGeneDB 2.1 is part of RNAcentral and Elixir Norway, publicly and freely available at http://www.mirgenedb.org/.

    DNA Methylation Profiling from High-Throughput Sequencing Data

    In this chapter we will review the common steps in the analysis of whole-genome, single-base-pair resolution methylation data, including the pre-processing of the reads, the alignment and the readout of the methylation information of individual cytosines. We will focus especially on the possible error sources which need to be taken into account in order to generate high-quality methylation maps. Several tools have already been developed to convert the sequencing data into knowledge about the methylation levels. We will review the most used tools, discussing both technical aspects like user-friendliness and speed, and biologically relevant questions such as quality control. For one of these tools, NGSmethPipe, we will give a step-by-step tutorial including installation and methylation profiling for different data types and species. We will conclude the chapter with a brief discussion of NGSmethDB, a database for the storage of single-base resolution methylation maps that can be used to further analyze the obtained methylation maps. This work was supported by the Ministry of Innovation and Science of the Spanish Government [BIO2010-20219 (M.H.), BIO2008-01353 (J.L.O.)]; ‘Juan de la Cierva’ grant (to M.H.) and Basque Country ‘Programa de formación de investigadores’ grant (to G.B.).

    Selective isolation of extracellular vesicles from minimally processed human plasma as a translational strategy for liquid biopsies

    Background: Intercellular communication is mediated by extracellular vesicles (EVs), as they enclose selectively packaged biomolecules that can be horizontally transferred from donor to recipient cells. Because all cells constantly generate and recycle EVs, they provide accurate timed snapshots of individual pathophysiological status. Since blood plasma circulates through the whole body, it is often the biofluid of choice for biomarker detection in EVs. Blood collection is easy and minimally invasive, yet reproducible procedures to obtain pure EV samples from circulating biofluids are still lacking. Here, we addressed central aspects of EV immunoaffinity isolation from simple and complex matrices, such as plasma. Methods: Cell-generated EV spike-in models were isolated and purified by size-exclusion chromatography, stained with cellular dyes and characterized by nano flow cytometry. Fluorescently-labelled spike-in EVs emerged as reliable, high-throughput and easily measurable readouts, which were employed to optimize our EV immunoprecipitation strategy and evaluate its performance. Plasma-derived EVs were captured and detected using this straightforward protocol, sequentially combining isolation and staining of specific surface markers, such as CD9 or CD41. Multiplexed digital transcript detection data was generated using the Nanostring nCounter platform and evaluated through a dedicated bioinformatics pipeline. Results: Beads with covalently-conjugated antibodies on their surface outperformed streptavidin-conjugated beads, coated with biotinylated antibodies, in EV immunoprecipitation. Fluorescent EV spike recovery evidenced that target EV subpopulations can be efficiently retrieved from plasma, and that their enrichment is dependent not only on complex matrix composition, but also on the EV surface phenotype. Finally, mRNA profiling experiments proved that distinct EV subpopulations can be captured by directly targeting different surface markers. 
Furthermore, EVs isolated with anti-CD61 beads enclosed mRNA expression patterns that might be associated with early-stage lung cancer, in contrast with EVs captured through CD9, CD63 or CD81. The differential clinical value carried within each distinct EV subset highlights the advantages of selective isolation. Conclusions: This EV isolation protocol facilitated the extraction of clinically useful information from plasma. Compatible with common downstream analytics, it is a readily implementable research tool, tailored to provide a truly translational solution in routine clinical workflows, fostering the inclusion of EVs in novel liquid biopsy settings. Funding: European Commission 765492 95218

    Applying Feature Selection to Improve Predictive Performance and Explainability in Lung Cancer Detection with Soft Computing

    The field of biomedicine is focused on the detection and subsequent treatment of various complex diseases. Among these, cancer stands out as one of the most studied, due to the high mortality it entails. The appearance of cancer depends directly on the correct functionality and balance of the genome. Therefore, it is essential to determine which of the approximately 25,000 human genes are linked with this undesirable condition. In this work, we focus on a case study of a population affected by lung cancer. Patient information has been obtained using liquid biopsy technology, i.e. capturing cell information from the bloodstream and applying an RNA-seq procedure to get the frequency of representation for each gene. The ultimate goal of this study is to find a good trade-off between predictive capacity and interpretability for the discernment of this type of cancer. To this end, we will apply a large number of techniques for feature selection, using different thresholds for the number of selected discriminant genes. Our experimental results, using Soft Computing techniques, show that model-based feature selection via Random Forest is essential for improving both the predictive capacity of the models and their explainability over a small subset of genes.

    TargetSpy: a supervised machine learning approach for microRNA target prediction

    [Background] Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. [Results] We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs our method shows superior performance in all classes. In Drosophila melanogaster not only our class II and III predictions are on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. [Conclusion] Only a few algorithms can predict target sites without demanding a seed match and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. 
TargetSpy was trained on mouse and performs well in human and Drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool, available at http://www.targetspy.org. The work of MH was supported by the Spanish Government (Grant number: BIO2008.01353) and by the Junta de Andalucia (Grant number P07-FQM-03613).

    Reassessment of miRNA variant (isomiRs) composition by small RNA sequencing

    IsomiRs, sequence variants of mature microRNAs, are usually detected and quantified using high-throughput sequencing. Many examples of their biological relevance have been reported, but sequencing artifacts identified as artificial variants might bias biological inference and should therefore ideally be avoided. We conducted a comprehensive evaluation of 10 different small RNA sequencing protocols, exploring both a theoretically isomiR-free pool of synthetic miRNAs and HEK293T cells. We calculated that, with the exception of two protocols, less than 5% of miRNA reads can be attributed to library preparation artifacts. Randomized-end adapter protocols showed superior accuracy, with 40% of true biological isomiRs. Nevertheless, we demonstrate concordance across protocols for selected miRNAs in non-templated uridyl additions. Notably, NTA-U calling and isomiR target prediction can be inaccurate when using protocols with poor single-nucleotide resolution. Our results highlight the relevance of protocol choice for biological isomiR detection and annotation, which has key potential implications for biomedical applications.

    Genome-Wide Analysis of microRNA Expression Profile in Roots and Leaves of Three Wheat Cultivars under Water and Drought Conditions

    The following are available online at https://www.mdpi.com/article/10.3390/biom13030440/s1. Figure S1: Fraction of different RNA species. Figure S2: Read length distribution of all genome mapped reads (a) from total reads (redundant reads) and (b) from unique reads (non-redundant reads). Figure S3: Library normalized RPM values distribution per sample of novel miRNAs. Figure S4: qRT-PCR analysis of the expression of novel miRNA Tae-mir-novel54-5p and known miRNA Tae-miR827c in 10 samples. Figure S5: Network analysis of (a) target genes by drought downregulated miRNAs and (b) drought upregulated miRNAs in leaves. Table S1: Quality and read mapping report. Table S2: Fraction of different RNA species. Table S3: Read length distribution of all genome mapped reads from total reads (redundant reads). Table S4: Read length distribution of all genome mapped reads from unique reads (non-redundant reads). Table S5: All miRNAs expression matrix. Table S6: Expression matrix of all the miRNAs in the SRA datasets. Table S7: miRNA expression matrix of all miRNAs in the Zea mays SRA datasets. Table S8: Degradome based target-gene predicted interactions. Table S9: qRT-PCR assay information. Table S10: Enrichment of functional annotations in miRNA target genes. Table S11: Mature and hairpin sequences of predicted miRNAs. Table S12: Degradome miRNA-target interaction predictions using CleaveLand4.
    Wheat is one of the most important food sources on Earth. MicroRNAs (miRNAs) play important roles in wheat productivity. To identify wheat miRNAs as well as their expression profiles under drought condition, we constructed and sequenced small RNA (sRNA) libraries from the leaves and roots of three wheat cultivars (Kukri, RAC875 and Excalibur) under water and drought conditions. A total of 636 known miRNAs and 294 novel miRNAs were identified, of which 34 miRNAs were tissue- or cultivar-specific. Among these, 314 were significantly regulated under drought conditions. 
miRNAs that were drought-regulated in all cultivars displayed notably higher expression than those that responded in a cultivar-specific manner. Cultivar-specific drought-response miRNAs were mainly detected in roots and showed significantly different drought regulation between cultivars. Using a wheat degradome library, 6619 target genes were identified. Many target genes were strongly enriched for protein domains, such as MEKHLA, that play roles in drought response. Targeting analysis showed that drought-downregulated miRNAs targeted more genes than drought-upregulated miRNAs. Furthermore, such genes had more important functions. Additionally, the genes targeted by drought-downregulated miRNAs had multiple interactions with each other, while the genes targeted by drought-upregulated miRNAs had no interactions. Our data provide valuable information on wheat miRNA expression profiles and potential functions in different tissues, cultivars and drought conditions.