
    GSAE: an autoencoder with embedded gene-set nodes for genomics functional characterization

    Bioinformatics tools have been developed to interpret gene expression data at the gene-set level, and these gene-set-based analyses improve biologists' ability to discover the functional relevance of their experimental design. However, while individual gene sets are elucidated, associations between gene sets are rarely taken into consideration. Deep learning, an emerging machine learning technique in computational biology, can be used to generate unbiased combinations of gene sets and to determine the biological relevance and analytical consistency of these combinations by leveraging large genomic data sets. In this study, we proposed the gene superset autoencoder (GSAE), a multi-layer autoencoder model that incorporates a priori defined gene sets to retain the crucial biological features in its latent layer. We introduced the concept of the gene superset, an unbiased combination of gene sets with weights trained by the autoencoder, where each node in the latent layer is a superset. Trained on genomic data from TCGA and evaluated with the accompanying clinical parameters, gene supersets were shown to discriminate tumor subtypes and to carry prognostic value. We further demonstrated the biological relevance of the top component gene sets in the significant supersets. Using the autoencoder model with gene supersets at its latent layer, we demonstrated that supersets retain sufficient biological information with respect to tumor subtypes and clinical prognostic significance. Supersets also provide high reproducibility in survival analysis and accurate prediction of cancer subtypes.
    Comment: Presented at the International Conference on Intelligent Biology and Medicine (ICIBM 2018) in Los Angeles, CA, USA, and published in BMC Systems Biology 2018, 12(Suppl 8):14.
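    The architectural idea, a latent layer whose nodes are built from a priori defined gene sets, can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation; the membership matrix `mask`, the layer sizes, and the activation choices are assumptions made for the example.

```python
# Minimal sketch of a gene-set-constrained autoencoder (not the authors' code).
# Assumes a binary membership matrix `mask` of shape (n_genes, n_gene_sets):
# mask[g, s] = 1 if gene g belongs to gene set s.
import torch
import torch.nn as nn

class GeneSetLayer(nn.Module):
    """Linear layer whose weights are masked by gene-set membership,
    so each gene-set node only sees its member genes."""
    def __init__(self, mask: torch.Tensor):
        super().__init__()
        n_genes, n_sets = mask.shape
        self.register_buffer("mask", mask.float())
        self.weight = nn.Parameter(torch.randn(n_genes, n_sets) * 0.01)
        self.bias = nn.Parameter(torch.zeros(n_sets))

    def forward(self, x):                               # x: (batch, n_genes)
        return torch.tanh(x @ (self.weight * self.mask) + self.bias)

class SupersetAutoencoder(nn.Module):
    """Gene-set layer -> superset (latent) layer -> reconstruction."""
    def __init__(self, mask: torch.Tensor, n_supersets: int = 200):
        super().__init__()
        n_genes, n_sets = mask.shape
        self.gene_sets = GeneSetLayer(mask)
        self.to_superset = nn.Linear(n_sets, n_supersets)   # latent "supersets"
        self.decoder = nn.Sequential(
            nn.Linear(n_supersets, n_sets), nn.Tanh(),
            nn.Linear(n_sets, n_genes),
        )

    def forward(self, x):
        supersets = torch.tanh(self.to_superset(self.gene_sets(x)))
        return self.decoder(supersets), supersets

# Usage idea: train with a reconstruction loss, then use the latent `supersets`
# activations as features for subtype classification or survival analysis.
```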

    Predicting drug response of tumors from integrated genomic profiles by deep neural networks

    The study of high-throughput genomic profiles from a pharmacogenomics viewpoint has provided unprecedented insights into the oncogenic features that modulate drug response. A recent screening of ~1,000 cancer cell lines against a collection of anti-cancer drugs illuminated the link between genotype and vulnerability. However, because of essential differences between cell lines and tumors, translating these findings into predictions of drug response in tumors remains challenging. Here we proposed a DNN model to predict drug response based on the mutation and expression profiles of a cancer cell line or a tumor. The model consists of a mutation encoder and an expression encoder, pre-trained on a large pan-cancer dataset to abstract core representations of the high-dimensional data, followed by a drug response predictor network. Given a pair of mutation and expression profiles, the model predicts IC50 values for 265 drugs. We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall prediction performance of 1.96 mean squared error (on log-scale IC50 values). The model outperformed two classical methods and four analogous DNN variants of our model in both prediction error and stability. We then applied the model to predict the drug response of 9,059 tumors across 33 cancer types. The model predicted both known drug targets, including EGFR inhibitors in non-small cell lung cancer and tamoxifen in ER+ breast cancer, and novel ones. The comprehensive analysis further revealed the molecular mechanisms underlying resistance to the chemotherapeutic drug docetaxel in a pan-cancer setting and the anti-cancer potential of a novel agent, CX-5461, in treating gliomas and hematopoietic malignancies. Overall, our model and findings improve the prediction of drug response and the identification of novel therapeutic options.
    Comment: Accepted for presentation at the International Conference on Intelligent Biology and Medicine (ICIBM 2018) in Los Angeles, CA, USA. Currently under consideration for publication in a supplement issue of BMC Genomics.
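    The two-encoder design described above can be sketched as follows (PyTorch). This is not the authors' code; the layer widths, latent dimension, and concatenation scheme are illustrative assumptions, while the 265-drug output and the MSE training objective follow the abstract.

```python
# Minimal sketch: a mutation encoder and an expression encoder (pre-trained as
# autoencoders on pan-cancer data in the paper) feed a predictor network that
# outputs log-scale IC50 values for 265 drugs. Sizes here are placeholders.
import torch
import torch.nn as nn

def encoder(in_dim, latent_dim):
    return nn.Sequential(
        nn.Linear(in_dim, 1024), nn.ReLU(),
        nn.Linear(1024, latent_dim), nn.ReLU(),
    )

class DrugResponsePredictor(nn.Module):
    def __init__(self, n_mut, n_expr, latent_dim=128, n_drugs=265):
        super().__init__()
        self.mut_encoder = encoder(n_mut, latent_dim)    # pre-trained in practice
        self.expr_encoder = encoder(n_expr, latent_dim)  # pre-trained in practice
        self.predictor = nn.Sequential(
            nn.Linear(2 * latent_dim, 256), nn.ReLU(),
            nn.Linear(256, n_drugs),                     # one IC50 per drug
        )

    def forward(self, mut, expr):
        z = torch.cat([self.mut_encoder(mut), self.expr_encoder(expr)], dim=1)
        return self.predictor(z)

# Training would minimize nn.MSELoss() between predicted and measured log IC50
# values on the cell-line dataset, then apply the trained model to tumor profiles.
```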

    A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics

    Mesoscopic simulations of hydrocarbon flow in source shales are challenging, in part because of the heterogeneity of shale pores, whose sizes range from a few nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid and fluid-solid interactions in nano- to micro-scale shale pores, which are physically and chemically complex, must be captured. To address these challenges, we present a GPU-accelerated package for simulating flow in nano- to micro-pore networks with a many-body dissipative particle dynamics (mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the code offloads all intensive workloads onto GPUs. Other advancements, such as smart particle packing and a no-slip boundary condition for complex pore geometries, are also implemented to construct and simulate realistic shale pores from 3D nanometer-resolution stack images. Our code is validated for accuracy and compared against its CPU counterpart for speedup. In our benchmark tests, the code delivers nearly perfect strong and weak scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU benchmark on ORNL's SummitDev and IBM's AC922 suggests that host-to-device NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we demonstrate through a flow simulation in realistic shale pores that the CPU counterpart requires 840 Power9 cores to rival the performance delivered by our package on four V100 GPUs on ORNL's Summit architecture. This simulation package enables quick-turnaround and high-throughput mesoscopic numerical simulations for investigating complex flow phenomena in nano- to micro-porous rocks with realistic pore geometries.
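    For readers unfamiliar with mDPD, the sketch below shows the many-body conservative interaction that such a package must evaluate for every particle pair. It is a plain O(N^2) NumPy reference under assumed coefficients and cutoffs, not the GPU-accelerated implementation described above.

```python
# Illustrative sketch of the many-body DPD (mDPD) conservative force. The
# coefficients A, B and cutoffs rc, rd are placeholders to be parameterized
# for a given fluid; neighbor lists and domain decomposition are omitted.
import numpy as np

def mdpd_conservative_forces(pos, A=-40.0, B=25.0, rc=1.0, rd=0.75):
    """pos: (N, 3) particle positions. Returns (N, 3) conservative forces."""
    rij = pos[:, None, :] - pos[None, :, :]          # pairwise displacements
    r = np.linalg.norm(rij, axis=-1)
    np.fill_diagonal(r, np.inf)                      # ignore self-interaction
    eij = rij / r[..., None]                         # unit vectors

    # Many-body local density from a normalized weight w_rho(r) ~ (1 - r/rd)^2
    w_rho = np.where(r < rd, 15.0 / (2 * np.pi * rd**3) * (1 - r / rd) ** 2, 0.0)
    rho = w_rho.sum(axis=1)                          # (N,) densities

    # F_ij = A*(1 - r/rc)*e_ij + B*(rho_i + rho_j)*(1 - r/rd)*e_ij
    w_c = np.where(r < rc, 1 - r / rc, 0.0)          # attractive term (A < 0)
    w_d = np.where(r < rd, 1 - r / rd, 0.0)          # density-dependent repulsion
    mag = A * w_c + B * (rho[:, None] + rho[None, :]) * w_d
    return (mag[..., None] * eij).sum(axis=1)

# The full mDPD model adds pairwise dissipative and random forces; the GPU
# package evaluates these interactions with cell/neighbor lists on each rank.
```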