112 research outputs found
Closing the circle : current state and perspectives of circular RNA databases
Circular RNAs (circRNAs) are covalently closed RNA molecules that have been linked to various diseases, including cancer. However, a precise function and working mechanism are lacking for the large majority. Following many different experimental and computational approaches to identify circRNAs, multiple circRNA databases were developed as well. Unfortunately, there are several major issues with the current circRNA databases, which substantially hamper progress in the field. First, as the overlap in content is limited, a true reference set of circRNAs is lacking. This results from the low abundance and highly specific expression of circRNAs, and varying sequencing methods, data-analysis pipelines, and circRNA detection tools. A second major issue is the use of ambiguous nomenclature. Thus, redundant or even conflicting names for circRNAs across different databases contribute to the reproducibility crisis. Third, circRNA databases, in essence, rely on the position of the circRNA back-splice junction, whereas alternative splicing could result in circRNAs with different lengths and sequences. To uniquely identify a circRNA molecule, the full circular sequence is required. Fourth, circRNA databases annotate circRNAs' microRNA binding and protein-coding potential, but these annotations are generally based on presumed circRNA sequences. Finally, several databases are not regularly updated, contain incomplete data or suffer from connectivity issues. In this review, we present a comprehensive overview of the current circRNA databases and their content, features, and usability. In addition to discussing the current issues regarding circRNA databases, we offer suggestions to streamline further research in this growing field.
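The third point, that a back-splice junction alone does not uniquely identify a circRNA, can be illustrated with a small sketch. It is not taken from the review: the function names, the toy exon sequences and the idea of hashing a canonical rotation are all assumptions made for illustration. Because a circular molecule has no defined start, the sketch picks the lexicographically smallest rotation of the full sequence before deriving an identifier.

```python
import hashlib

def canonical_rotation(seq: str) -> str:
    """Return the lexicographically smallest rotation of a circular sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return min(seq[i:] + seq[:i] for i in range(len(seq)))

def circ_id(seq: str) -> str:
    """Hypothetical identifier: short hash of the canonical rotation."""
    return hashlib.sha256(canonical_rotation(seq).encode()).hexdigest()[:12]

# Toy exons (made-up sequences, for illustration only).
exon1, exon2, exon3 = "ACGTAC", "GGATCC", "TTAGCA"

# Two isoforms with the same back-splice junction (exon3 joined back to exon1)
# but different internal exon content.
isoform_a = exon1 + exon2 + exon3   # middle exon retained
isoform_b = exon1 + exon3           # middle exon skipped

# The same molecule as isoform_a, written from a different starting position.
rotated_a = isoform_a[4:] + isoform_a[:4]

print(circ_id(isoform_a) == circ_id(rotated_a))  # same circle, same identifier
print(circ_id(isoform_a) == circ_id(isoform_b))  # same junction, different identifier
```

An identifier built from junction coordinates alone would treat both isoforms as one entry, whereas one built from the full sequence separates them.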
SPECS: a non-parametric method to identify tissue-specific molecular features for unbalanced sample groups
Background
To understand biology and differences among various tissues or cell types, one typically searches for molecular features that display characteristic abundance patterns. Several specificity metrics have been introduced to identify tissue-specific molecular features, but these either require an equal number of replicates per tissue or they can’t handle replicates at all.
Results
We describe a non-parametric specificity score that is compatible with unequal sample group sizes. To demonstrate its usefulness, the specificity score was calculated on all GTEx samples, detecting known and novel tissue-specific genes. A webtool was developed to browse these results for genes or tissues of interest. An example Python implementation of SPECS is available at https://github.com/celineeveraert/SPECS. The precalculated SPECS results on the GTEx data are available through a user-friendly browser at specs.cmgg.be.
Conclusions
SPECS is a non-parametric method that identifies known and novel specifically expressed genes. In addition, SPECS could be adopted for other features and applications.
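To give a feel for how a non-parametric score can cope with unequal group sizes, here is a minimal sketch. It is not the published SPECS formula (the repository linked above is the reference for that); the function names and the toy expression values are assumptions. For a target tissue, the sketch estimates the probability that one of its samples exceeds a sample from each other tissue, then averages those probabilities so that every tissue counts equally regardless of how many samples it has.

```python
from itertools import product
from statistics import mean

def pairwise_superiority(a, b):
    """Fraction of (x, y) pairs with x > y; ties count as one half."""
    wins = sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(a, b))
    return wins / (len(a) * len(b))

def specificity_score(groups, target):
    """Mean pairwise superiority of the target group over each other group.

    Averaging per group (not per sample) keeps large groups from
    dominating the score.
    """
    others = [vals for name, vals in groups.items() if name != target]
    return mean(pairwise_superiority(groups[target], vals) for vals in others)

# Made-up expression values for one gene, with unequal group sizes.
expression = {
    "brain": [9.1, 8.7, 9.5],
    "liver": [1.2, 0.8, 1.0, 1.1, 0.9, 1.3],
    "heart": [2.0, 2.4],
}

scores = {tissue: specificity_score(expression, tissue) for tissue in expression}
print(scores)
```

In this toy example the gene scores highest in brain, lowest in liver and intermediate in heart, and the six liver samples carry no more weight than the two heart samples.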
Long non-coding RNA expression profiling in the NCI60 cancer cell line panel using high-throughput RT-qPCR
Long non-coding RNAs (lncRNAs) form a new class of RNA molecules implicated in various aspects of protein-coding gene expression regulation. To study lncRNAs in cancer, we generated expression profiles for 1707 human lncRNAs in the NCI60 cancer cell line panel using a high-throughput nanowell RT-qPCR platform. We describe how qPCR assays were designed and validated and provide processed and normalized expression data for further analysis. Data quality is demonstrated by matching the lncRNA expression profiles with phenotypic and genomic characteristics of the cancer cell lines. This data set can be integrated with publicly available omics and pharmacological data sets to uncover novel associations between lncRNA expression and mRNA expression, miRNA expression, DNA copy number, protein-coding gene mutation status or drug response.
Long non-coding RNAs in cutaneous melanoma : clinical perspectives
Metastatic melanoma of the skin has a high mortality despite the recent introduction of targeted therapy and immunotherapy. Long non-coding RNAs (lncRNAs) are defined as transcripts of more than 200 nucleotides in length that lack protein-coding potential. There is growing evidence that lncRNAs play an important role in gene regulation, including oncogenesis. We present 13 lncRNA genes involved in the pathogenesis of cutaneous melanoma through a variety of pathways and molecular interactions. Some of these lncRNAs are possible biomarkers or therapeutic targets for malignant melanoma.
decodeRNA : predicting non-coding RNA functions using guilt-by-association
Although the long non-coding RNA (lncRNA) landscape is expanding rapidly, only a small number of lncRNAs have been functionally annotated. Here, we present decodeRNA (http://www.decoderna.org), a database providing functional contexts for both human lncRNAs and microRNAs in 29 cancer and 12 normal tissue types. With state-of-the-art data mining and visualization options, easy access to results and a straightforward user interface, decodeRNA aims to be a powerful tool for researchers in the ncRNA field.
Identification of long non-coding RNAs involved in neuronal development and intellectual disability
Recently, exome sequencing led to the identification of causal mutations in 16–31% of patients with intellectual disability (ID), leaving the underlying cause for many patients unidentified. In this context, the noncoding part of the human genome remains largely unexplored. For many long non-coding RNAs (lncRNAs) a crucial role in neurodevelopment and hence the human brain is anticipated. Here we aimed at identifying lncRNAs associated with neuronal development and ID. To this end, we applied an integrated genomics approach, harnessing several public epigenetic datasets. We found that the presence of neuron-specific H3K4me3 confers the highest specificity for genes involved in neurodevelopment and ID. Based on the presence of this feature and GWAS hits for CNS disorders, we identified 53 candidate lncRNA genes. Extensive expression profiling on human brain samples and other tissues, followed by Gene Set Enrichment Analysis, indicates that at least 24 of these lncRNAs are indeed implicated in processes such as synaptic transmission, nervous system development and neurogenesis. The bidirectional or antisense overlapping orientation relative to multiple coding genes involved in neuronal processes supports these results. In conclusion, we identified several lncRNA genes putatively involved in neurodevelopment and CNS disorders, providing a resource for functional studies.
Zipper plot : visualizing transcriptional activity of genomic regions
Background: Reconstructing transcript models from RNA-sequencing (RNA-seq) data and establishing these as independent transcriptional units can be a challenging task. Current state-of-the-art tools for long non-coding RNA (lncRNA) annotation are mainly based on evolutionary constraints, which may result in false negatives due to the overall limited conservation of lncRNAs.
Results: To tackle this problem we have developed the Zipper plot, a novel visualization and analysis method that enables users to simultaneously interrogate thousands of human putative transcription start sites (TSSs) in relation to various features that are indicative of transcriptional activity. These include publicly available CAGE-sequencing, ChIP-sequencing and DNase-sequencing datasets. Our method only requires three tab-separated fields (chromosome, genomic coordinate of the TSS and strand) as input and generates a report that includes a detailed summary table, a Zipper plot and several statistics derived from this plot.
Conclusion: Using the Zipper plot, we found evidence of transcription for a set of well-characterized lncRNAs and observed that fewer mono-exonic lncRNAs have CAGE peaks overlapping with their TSSs compared to multi-exonic lncRNAs. Using publicly available RNA-seq data, we found more than one hundred cases where junction reads connected protein-coding gene exons with a downstream mono-exonic lncRNA, revealing the need for a careful evaluation of lncRNA 5′-boundaries. Our method is implemented using the statistical programming language R and is freely available as a webtool.
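The three-field input described in the Results can be sketched as follows. This is an illustrative Python sketch only, not the authors' R implementation: the function names, the toy TSS records and the peak positions are assumptions, and strand is parsed but not used. It reads tab-separated chromosome, coordinate and strand, then reports the distance from each TSS to the nearest peak on the same chromosome.

```python
import csv
import io
from bisect import bisect_left

def parse_tss(text):
    """Parse tab-separated lines: chromosome, TSS coordinate, strand."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(row[0], int(row[1]), row[2]) for row in reader if row]

def nearest_peak_distance(pos, sorted_peaks):
    """Absolute distance from pos to the closest peak, or None if no peaks."""
    if not sorted_peaks:
        return None
    i = bisect_left(sorted_peaks, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_peaks[max(i - 1, 0):i + 1]
    return min(abs(pos - p) for p in candidates)

# Made-up input in the three-column format.
tss_text = "chr1\t1000\t+\nchr1\t5000\t-\nchr2\t300\t+\n"

# Made-up peak summits per chromosome, sorted ascending.
peaks = {"chr1": [950, 4000, 5200], "chr2": [10000]}

distances = [
    (chrom, pos, strand, nearest_peak_distance(pos, peaks.get(chrom, [])))
    for chrom, pos, strand in parse_tss(tss_text)
]
for record in distances:
    print(record)
```

Sorting the resulting records by distance would give one plausible ordering for a plot of this kind; whether the webtool orders or weights TSSs this way is not stated in the abstract.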
An update on LNCipedia : a database for annotated human lncRNA sequences
The human genome is pervasively transcribed, producing thousands of non-coding RNA transcripts. The majority of these transcripts are long non-coding RNAs (lncRNAs) and novel lncRNA genes are being identified at a rapid pace. To streamline these efforts, we created LNCipedia, an online repository of lncRNA transcripts and annotation. Here, we present LNCipedia 3.0 (http://www.lncipedia.org), the latest version of the publicly available human lncRNA database. Compared to the previous version of LNCipedia, the database grew over five times in size, gaining over 90 000 new lncRNA transcripts. Assessment of the protein-coding potential of LNCipedia entries is improved with state-of-the-art methods that include large-scale reprocessing of publicly available proteomics data. As a result, a high-confidence set of lncRNA transcripts with low coding potential is defined and made available for download. In addition, a tool to assess lncRNA gene conservation between human, mouse and zebrafish has been implemented.
- …