Sebnif: An Integrated Bioinformatics Pipeline for the Identification of Novel Large Intergenic Noncoding RNAs (lincRNAs) - Application in Human Skeletal Muscle Cells
<div><p><i>Ab initio</i> assembly of transcriptome sequencing data has been widely used to identify large intergenic non-coding RNAs (lincRNAs), a novel class of gene regulators involved in many biological processes. To differentiate real lincRNA transcripts from thousands of assembly artifacts, a series of filters on transcript length, expression level and coding potential needs to be applied. However, an easy-to-use and publicly available bioinformatics pipeline that integrates these filters is not yet available. Hence, we implemented sebnif, an integrative bioinformatics pipeline to facilitate the discovery of <i>bona fide</i> novel lincRNAs that are suitable for further functional characterization. Specifically, sebnif is the only pipeline that implements an algorithm for identifying high-quality single-exonic lincRNAs, which are often omitted in other studies. To demonstrate the usage of sebnif, we applied it to a real biological RNA-seq dataset from Human Skeletal Muscle Cells (HSkMC) and built a novel lincRNA catalog containing 917 highly reliable lincRNAs. Sebnif is available at <a href="http://sunlab.lihs.cuhk.edu.hk/sebnif/" target="_blank">http://sunlab.lihs.cuhk.edu.hk/sebnif/</a>.</p></div>
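The kind of filtering the abstract describes (transcript length, expression level and coding potential) can be illustrated with a minimal Python sketch. This is not sebnif's implementation: the `Transcript` record, the `filter_candidates` function and all three thresholds are hypothetical, chosen only to show how such filters chain together.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record for one assembled transcript."""
    transcript_id: str
    length: int          # transcript length in nucleotides
    fpkm: float          # expression level estimate
    coding_score: float  # assumed coding-potential score in [0, 1]


# Illustrative thresholds only; these are NOT sebnif's defaults.
MIN_LENGTH = 200
MIN_FPKM = 1.0
MAX_CODING_SCORE = 0.5


def filter_candidates(transcripts,
                      min_length=MIN_LENGTH,
                      min_fpkm=MIN_FPKM,
                      max_coding_score=MAX_CODING_SCORE):
    """Keep transcripts that pass the length, expression and coding filters."""
    kept = []
    for t in transcripts:
        if t.length < min_length:
            continue  # too short to be a large noncoding RNA
        if t.fpkm < min_fpkm:
            continue  # expression too low, likely an assembly artifact
        if t.coding_score > max_coding_score:
            continue  # looks protein-coding
        kept.append(t)
    return kept
```

Applying the filters in sequence, as here, also makes it easy to report how many transcripts survive each step.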
Snapshots of sebnif web server.
<p>(A) The data upload page. All sebnif parameters can be specified by the user on this page. (B) The result page showing the report of novel lincRNAs identified in Human Skeletal Muscle Cells. The final list of novel lincRNAs in standard GFF format and the iSeeRNA noncoding score for each transcript can be downloaded directly; summary statistics for each filtering step and the FRFE and STGE profiles generated by the FRFE and STGE algorithms are also provided so that users can evaluate the quality of the data.</p>
Identification of novel lincRNA catalog in HSkMC.
<p>(A) The raw RNA-seq data were pre-processed, aligned with TopHat and assembled using Cufflinks in <i>ab initio</i> mode. (B) Sebnif filtering of the assembled transcripts. The numbers in parentheses represent the number of transcripts remaining after each filtering step. (C) Annotation and further filtering of the novel lincRNAs with H3K4me3 and CAGE data.</p>
Analysis of the novel lincRNAs in HSkMC.
<p>(A) Cumulative curve of the average PhastCons score of the novel lincRNAs (green) compared to randomly selected genome background (blue) and known mRNAs (red). These novel lincRNAs are more conserved than the genome background but less conserved than the mRNAs. (B) Comparison of expression profiles of novel lincRNAs (green), known ncRNAs (blue) and known mRNAs (red). Both novel lincRNAs and the known ncRNAs are expressed at a lower level than known mRNAs. (C) 57% (523 out of 917) of the novel lincRNAs are divergent transcripts generated within 2 kbp upstream of known protein-coding genes. (D) Gene Ontology annotation of the above protein-coding genes. The y-axis shows the top 10 enriched GO terms and the x-axis shows the enrichment significance P-values.</p>
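One way to read the "divergent transcript" criterion in panel (C) is sketched below in Python. The caption only states the 2 kbp upstream window; the opposite-strand requirement, the use of transcription start sites (TSS), the tuple layout and the function names are assumptions made for illustration, not the authors' stated method.

```python
def tss(start, end, strand):
    """Return the transcription start site of a feature (1-based coordinates)."""
    return start if strand == "+" else end


def is_divergent(linc, gene, window=2000):
    """Check whether a lincRNA is divergent with respect to a gene.

    Both arguments are (chrom, start, end, strand) tuples. Assumed
    definition: same chromosome, opposite strands, and the lincRNA TSS
    lies within `window` bp upstream of the gene TSS.
    """
    l_chrom, l_start, l_end, l_strand = linc
    g_chrom, g_start, g_end, g_strand = gene
    if l_chrom != g_chrom or l_strand == g_strand:
        return False
    linc_tss = tss(l_start, l_end, l_strand)
    gene_tss = tss(g_start, g_end, g_strand)
    # Measure the distance upstream relative to the gene's own orientation.
    if g_strand == "+":
        distance = gene_tss - linc_tss
    else:
        distance = linc_tss - gene_tss
    return 0 < distance <= window
```

For example, under these assumptions a minus-strand lincRNA ending 500 bp before the start of a plus-strand gene would count as divergent, while one ending 3 kbp away would not.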
Validation of the novel lincRNAs.
<p>(A) 26 randomly selected novel lincRNAs from the final lincRNA list were subjected to RT-PCR validation, among which 8 were divergent lincRNAs (the transcript ID is marked in red and ends with a ‘*’ suffix). The PCR products were visualized on an agarose gel and the sizes of the DNA markers (M) are shown on the right. (B) Comparison of the identified novel lincRNAs with NONCODE v3.0. 299 transcripts (32.6%) were found in common.</p>
Number of patients with different follow-up periods in the derivation and validation cohorts.
Probabilities of SC occurring within 1 year (□) and 3 years (○) among PMVSD patients in the derivation cohort, plotted against the score.
<p>The LOESS fit lines (solid for 1 year, dashed for 3 years), fitted using 50% of the points, show the trend of SC probability against the score. (SC, spontaneous closure).</p>
- …