40 research outputs found

    Discerning Novel Splice Junctions Derived from RNA-Seq Alignment: A Deep Learning Approach

    Background: Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advancements and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution to identify gene structures and resolve the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments due to random sequence matches and sample-reference genome discordance. As a consequence, a significant set of false positive exon junction predictions is introduced, which further confuses downstream analyses of splice variant discovery and abundance estimation. Results: In this work, we present a deep learning-based splice junction sequence classifier, named DeepSplice, which employs convolutional neural networks to classify candidate splice junctions. We show that (I) DeepSplice outperforms state-of-the-art methods for splice site classification when applied to the popular benchmark dataset HS3D, (II) DeepSplice shows high accuracy for splice junction classification with GENCODE annotation, and (III) the application of DeepSplice to classify putative splice junctions generated by Rail-RNA alignment of 21,504 human RNA-seq datasets significantly reduces 43 million candidates to around 3 million highly confident novel splice junctions. Conclusions: We implemented a model, inferred from the sequences of annotated exon junctions, that classifies splice junctions derived from primary RNA-seq data. The performance of the model was evaluated and compared through comprehensive benchmarking and testing, indicating reliable performance and overall usability for classifying novel splice junctions derived from RNA-seq alignment.

    SeqOthello: Querying RNA-Seq Experiments at Scale

    We present SeqOthello, an ultra-fast and memory-efficient indexing structure that supports arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello takes only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of tier-1 fusions curated by the TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.

    Network Modeling Identifies Molecular Functions Targeted by miR-204 to Suppress Head and Neck Tumor Metastasis

    Because sequence-alignment databases predict a large number of putative microRNA gene targets with relatively low accuracy, and because such predictions are by design made independently of biological context, systematic experimental identification and validation of every functional microRNA target is currently challenging. Consequently, biological studies have yet to identify, on a genome scale, key regulatory networks perturbed by altered microRNA functions in the context of cancer. In this report, we demonstrate for the first time how phenotypic knowledge of inheritable cancer traits and of risk factor loci can be used jointly with gene expression analysis to efficiently prioritize deregulated microRNAs for biological characterization. Using this approach, we characterize miR-204 as a tumor suppressor microRNA and uncover previously unknown connections between microRNA regulation, network topology, and expression dynamics. Specifically, we validate 18 gene targets of miR-204 that show elevated mRNA expression and are enriched in biological processes associated with tumor progression in squamous cell carcinoma of the head and neck (HNSCC). We further demonstrate the enrichment of bottleneckness, a key molecular network topology, among miR-204 gene targets. Restoration of miR-204 function in HNSCC cell lines inhibits the expression of its functionally related gene targets, reduces adhesion, migration and invasion in vitro, and attenuates experimental lung metastasis in vivo. Equally important, our investigation provides experimental evidence linking the function of microRNAs located in cancer-associated genomic regions (CAGRs) to the observed predisposition to human cancers. Specifically, we show that miR-204 may serve as a tumor suppressor gene at the 9q21.1–22.3 CAGR locus, a well-established risk factor locus in head and neck cancers for which tumor suppressor genes have not been identified. This new strategy, which integrates expression profiling, genetics and novel computational biology approaches, characterizes and models microRNA functions in cancer more efficiently than the state of the art and is applicable to the investigation of microRNA functions in other biological processes and diseases.

    Impact of individual level uncertainty of lung cancer polygenic risk score (PRS) on risk stratification

    Background: Although the polygenic risk score (PRS) has emerged as a promising tool for predicting cancer risk from genome-wide association studies (GWAS), the individual-level accuracy of lung cancer PRS and the extent of its impact on subsequent clinical applications remain largely unexplored. Methods: Lung cancer PRSs and confidence/credible intervals (CIs) were constructed for each individual using two statistical approaches: (1) the weighted sum of 16 GWAS-derived significant SNP loci, with the CI obtained through bootstrapping (PRS-16-CV), and (2) LDpred2, with the CI obtained through posterior sampling (PRS-Bayes), among 17,166 lung cancer cases and 12,894 controls with European ancestry from the International Lung Cancer Consortium. Individuals were classified into different genetic risk subgroups based on the relationship between their own PRS mean/PRS CI and the population-level threshold. Results: Considerable variance in PRS point estimates at the individual level was observed for both methods, with an average standard deviation (s.d.) of 0.12 for PRS-16-CV and a much larger s.d. of 0.88 for PRS-Bayes. Using PRS-16-CV, only 25.0% of individuals with PRS point estimates in the lowest decile of PRS and 16.8% in the highest decile have their entire 95% CI fully contained in the lowest and highest decile, respectively, while PRS-Bayes was unable to find any eligible individuals. Only 19% of the individuals were concordantly identified as having high genetic risk (> 90th percentile) using the two PRS estimators. An increased relative risk of lung cancer comparing the highest PRS percentile to the lowest was observed when taking the CI into account (OR = 2.73, 95% CI: 2.12–3.50, P-value = 4.13 × 10^-15) compared to using the PRS-16-CV mean (OR = 2.23, 95% CI: 1.99–2.49, P-value = 5.70 × 10^-46). Improved risk prediction performance with higher AUC was consistently observed in individuals identified by PRS-16-CV CI, and the best performance was achieved by incorporating age, gender, and detailed smoking pack-years (AUC: 0.73, 95% CI = 0.72–0.74). Conclusions: Lung cancer PRS estimates using different methods have modest correlations at the individual level, highlighting the importance of considering individual-level uncertainty when evaluating the practical utility of PRS.
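The abstract above describes a weighted-sum PRS with a per-individual interval and a classification against a population threshold. The following is a minimal Python sketch of that idea, not the authors' pipeline. It assumes that a set of replicate weight vectors is already available (for example from re-estimating SNP effect sizes on bootstrap resamples); the abstract does not say how the bootstrap was done. All function names, the weights, and the dosages are hypothetical, and the demo fakes the replicates with Gaussian jitter purely as a stand-in.

```python
import random


def weighted_prs(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) and per-SNP weights."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def percentile_interval(values, level=0.95):
    """Symmetric percentile interval read off the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (n - 1))  # floor, since the product is non-negative
    hi_idx = (n - 1) - lo_idx     # mirror position at the upper end
    return ordered[lo_idx], ordered[hi_idx]


def prs_with_interval(dosages, replicate_weights, level=0.95):
    """Score one individual under every replicate weight vector.

    Returns (mean score, lower bound, upper bound).
    """
    scores = [weighted_prs(dosages, w) for w in replicate_weights]
    mean = sum(scores) / len(scores)
    lower, upper = percentile_interval(scores, level)
    return mean, lower, upper


def classify(lower, upper, threshold):
    """Label an individual only when the whole interval clears the threshold."""
    if lower > threshold:
        return "above"
    if upper < threshold:
        return "below"
    return "uncertain"


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical values: 16 SNPs with made-up log-odds weights.
    base_weights = [0.10] * 16
    # Stand-in for bootstrap replicates: Gaussian jitter around the base
    # weights. A real analysis would re-estimate weights on resampled data.
    replicates = [[rng.gauss(w, 0.02) for w in base_weights]
                  for _ in range(1000)]
    dosages = [rng.choice([0, 1, 2]) for _ in range(16)]
    mean, lower, upper = prs_with_interval(dosages, replicates)
    print(f"PRS mean={mean:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
    print("vs. threshold 1.6:", classify(lower, upper, threshold=1.6))
```

Requiring the whole interval to clear the threshold is what shrinks the confidently classified group: an individual whose point estimate lies in the top decile is still labeled "uncertain" if the lower bound falls below the cutoff.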

    A Multicomponent Thermal Fluid Numerical Simulation Method considering Formation Damage

    Multicomponent thermal fluid huff and puff is an innovative development technology for heavy oil reservoirs, which has been widely used in offshore oilfields in China and has proved to be a promising method for enhancing oil recovery. Multicomponent thermal fluids contain many components, including carbon dioxide, nitrogen, and steam. Under high-temperature and high-pressure conditions, complex physical and chemical reactions occur between multicomponent thermal fluids and reservoir rocks, which damage the pore structure and permeability of the core. In this paper, the authors set up a reservoir damage experimental device, tested the formation permeability before and after the injection of multicomponent thermal fluids, and obtained a formation damage model. The multicomponent thermal fluid formation damage model is embedded in the component control equation, the finite difference method is used to discretize the control equation, and a new multicomponent thermal fluid numerical simulator is established. A physical simulation experiment of multicomponent thermal fluid huff and puff is carried out using an actual sand-packed model. Comparison of the experimental results with the numerical simulation results shows that the new numerical simulation model considering formation damage proposed in this paper is accurate and reliable.

    Optimizing Livers for Transplantation Using Machine Perfusion versus Cold Storage in Large Animal Studies and Human Studies: A Systematic Review and Meta-Analysis

    Background. Liver allograft preservation frequently involves static cold storage (CS) and machine perfusion (MP). Given the increasing popularity of MP, we investigated whether MP was superior to CS in terms of beneficial outcomes. Methods. Human studies and large animal studies that optimized livers for transplantation using MP versus CS were assessed (PubMed/Medline/EMBASE). Meta-analyses were conducted for comparisons. Study quality was assessed according to the Newcastle-Ottawa quality assessment scale and SYRCLE’s risk of bias tool. Results. Nineteen studies were included. Among the large animal studies, lower levels of lactate dehydrogenase (SMD -3.16, 95% CI -5.14 to -1.18), alanine transferase (SMD -2.46, 95% CI -4.03 to -0.90), and hyaluronic acid (SMD -2.48, 95% CI -4.21 to -0.74) were observed in SNMP-preserved compared to CS-preserved livers. NMP-preserved livers showed lower levels of hyaluronic acid (SMD -3.97, 95% CI -5.46 to -2.47) compared to CS-preserved livers. Biliary complications (RR 0.45, 95% CI 0.28 to 0.73) and early graft dysfunction (RR 0.56, 95% CI 0.34 to 0.92) were also significantly reduced with HMP preservation in human studies. No evidence of publication bias was found. Conclusions. MP preservation could improve short-term outcomes after transplantation compared to CS preservation. Additional randomized controlled trials (RCTs) are needed to develop clinical applications of MP preservation.