2,780 research outputs found

    A guided network propagation approach to identify disease genes that combines prior and new information

    A major challenge in biomedical data science is to identify the causal genes underlying complex genetic diseases. Despite the massive influx of genome sequencing data, identifying disease-relevant genes remains difficult as individuals with the same disease may share very few, if any, genetic variants. Protein-protein interaction networks provide a means to tackle this heterogeneity, as genes causing the same disease tend to be proximal within networks. Previously, network propagation approaches have spread signal across the network from either known disease genes or genes that are newly putatively implicated in the disease (e.g., found to be mutated in exome studies or linked via genome-wide association studies). Here we introduce a general framework that considers both sources of data within a network context. Specifically, we use prior knowledge of disease-associated genes to guide random walks initiated from genes that are newly identified as perhaps disease-relevant. In large-scale testing across 24 cancer types, we demonstrate that our approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. To demonstrate the versatility of our approach, we also apply it to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. Comment: RECOMB202
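The idea of guiding a random walk with prior knowledge can be illustrated with a minimal sketch. This is not the authors' implementation: the function `propagate`, the toy graph, the restart probability of 0.3, and the choice to bias each step toward neighbours with higher prior scores are all assumptions made for illustration. The sketch also assumes every node has at least one neighbour and that all weights are positive.

```python
def propagate(adj, start, restart=0.3, weight=None, tol=1e-12, max_iter=1000):
    """Random walk with restart on an undirected graph (illustrative only).

    adj    : dict mapping node -> list of neighbours (each node needs >= 1).
    start  : dict mapping node -> non-negative restart weight.
    weight : optional dict mapping node -> positive score; when given, a step
             from u moves to neighbour v with probability proportional to
             weight[v] (the hypothetical "guidance"), otherwise uniformly.
    """
    nodes = sorted(adj)
    total = float(sum(start.values()))
    p0 = {n: start.get(n, 0.0) / total for n in nodes}  # restart distribution
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            nbrs = adj[u]
            w = [1.0 if weight is None else weight[v] for v in nbrs]
            norm = sum(w)
            for v, wv in zip(nbrs, w):
                nxt[v] += (1.0 - restart) * p[u] * wv / norm
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy network: new gene N touches X and Y; only X is adjacent to prior gene P.
adj = {"N": ["X", "Y"], "X": ["N", "P"], "Y": ["N", "Z"], "P": ["X"], "Z": ["Y"]}

prior = propagate(adj, {"P": 1.0})                  # signal from known genes
unguided = propagate(adj, {"N": 1.0})               # walk from new gene only
guided = propagate(adj, {"N": 1.0}, weight=prior)   # walk biased by prior
```

In this toy graph the unguided walk cannot distinguish X from Y because the two are symmetric around N, whereas the guided walk ranks X above Y because X lies closer to the prior gene P.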

    Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer.
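One simple way to quantify mutual exclusivity between two genes is a one-sided hypergeometric test on how many samples carry mutations in both. The sketch below is only an illustration of that general idea, not a method taken from the review; the function name `exclusivity_p_value` and the example counts are hypothetical.

```python
from math import comb


def exclusivity_p_value(n_samples, n_a, n_b, n_both):
    """Probability of seeing n_both or fewer co-mutated samples by chance.

    Assumes gene A is mutated in n_a of n_samples samples, gene B in n_b,
    and that mutations in B fall on samples uniformly at random
    (hypergeometric null). Small values suggest mutual exclusivity.
    """
    denom = comb(n_samples, n_b)
    tail = sum(
        comb(n_a, i) * comb(n_samples - n_a, n_b - i)
        for i in range(n_both + 1)
    )
    return tail / denom


# Hypothetical cohort: 10 samples, each gene mutated in 5, never together.
p_exclusive = exclusivity_p_value(10, 5, 5, 0)
```

This null model ignores per-sample mutation rates, so it should be read as a first-pass screen rather than a calibrated test.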

    Integrating multi-type aberrations from DNA and RNA through dynamic mapping gene space for subtype-specific breast cancer driver discovery

    Driver event discovery is crucial for breast cancer diagnosis and therapy. In particular, discovering the subtype specificity of drivers can advance personalized biomarker discovery and precision treatment of cancer patients. Still, most existing computational driver discovery studies mainly exploit information from DNA aberrations and gene interactions. Notably, cancer driver events can arise not only from DNA aberrations but also from RNA alterations, yet integrating multi-type aberrations from both DNA and RNA remains a challenging task in breast cancer driver discovery. On the one hand, the data formats of different aberration types differ from each other, known as data format incompatibility. On the other hand, different types of aberrations demonstrate distinct patterns across samples, known as aberration type heterogeneity. To promote the integrated analysis of subtype-specific breast cancer drivers, we design a "splicing-and-fusing" framework to address the issues of data format incompatibility and aberration type heterogeneity respectively. To overcome the data format incompatibility, the "splicing-step" employs a knowledge graph structure to connect multi-type aberrations from the DNA and RNA data into a unified form. To tackle the aberration type heterogeneity, the "fusing-step" adopts a dynamic mapping gene space integration approach to represent the multi-type information by vectorized profiles. The experiments demonstrate the advantages of our approach in both the integration of multi-type aberrations from DNA and RNA and the discovery of subtype-specific breast cancer drivers. In summary, our "splicing-and-fusing" framework, with knowledge graph connection and dynamic mapping gene space fusion of multi-type aberration data from DNA and RNA, can successfully discover potential breast cancer drivers with subtype-specificity indication. Comment: 14 pages, 5 figures, 1 tabl

    Frailness and resilience of gene networks predicted by detection of co-occurring mutations via a stochastic perturbative approach

    In recent years, complex networks have been identified as powerful mathematical frameworks for modeling many applied problems in disparate research fields. Assuming a Master Equation (ME) that models the exchange of information within the network, we set up a perturbative approach to investigate how node alterations affect the network information flow. The main assumption of the perturbed ME (pME) model is that the simultaneous presence of multiple node alterations causes more or less intense network frailties depending on the specific features of the perturbation. From this perspective, the collective behavior of a set of molecular alterations on a gene network is a particularly apt scenario for a first application of the proposed method, since most diseases are related neither to a single mutation nor to an established set of molecular alterations. Therefore, after characterizing the method numerically, we applied the pME approach, as a proof of principle, to breast cancer (BC) somatic mutation data downloaded from The Cancer Genome Atlas (TCGA) database. For each patient we measured the network frailness of over 90 significant subnetworks of the protein-protein interaction network, where each perturbation was defined by patient-specific somatic mutations. Interestingly, the frailness measures depend on the position of the alterations on the gene network more than on their number, unlike most traditional enrichment scores. In particular, low-degree mutations play an important role in causing high frailness measures. The potential applicability of the proposed method is wide and suggests future development in the control theory context.

    Network-based analysis of eQTL data to prioritize driver mutations

    In clonal systems, interpreting driver genes in terms of molecular networks helps in understanding how these drivers elicit an adaptive phenotype. Obtaining such a network-based understanding depends on the correct identification of driver genes. In clonal systems, independently evolved lines can acquire a similar adaptive phenotype by affecting the same molecular pathways, a phenomenon referred to as parallelism at the molecular pathway level. This implies that successful driver identification depends on interpreting mutated genes in terms of molecular networks. Driver identification and obtaining a network-based understanding of the adaptive phenotype are thus confounded problems that ideally should be solved simultaneously. In this study, a network-based eQTL method is presented that solves both the driver identification and the network-based interpretation problem. As input, the method uses coupled genotype-expression phenotype data (eQTL data) of independently evolved lines with similar adaptive phenotypes and an organism-specific genome-wide interaction network. The search for mutational consistency at the pathway level is defined as a subnetwork inference problem, which consists of inferring a subnetwork from the genome-wide interaction network that best connects the genes containing mutations to differentially expressed genes. Based on their connectivity with the differentially expressed genes, mutated genes are prioritized as driver genes. Based on semisynthetic data and two publicly available data sets, we illustrate the potential of the network-based eQTL method to prioritize driver genes and to gain insights into the molecular mechanisms underlying an adaptive phenotype. The method is available at http://bioinformatics.intec.ugent.be/phenetic_eqtl/index.htm

    Characterization and comparison of gene-centered human interactomes

    Funding: Ministero dell'Istruzione, dell'Università e della Ricerca (PON ELIXIR CNRBiOmics, INTEROMICS PB05); Ministero della Salute (GR-2016-02363997); Fondazione Regionale per la Ricerca Biomedica (Regione Lombardia) (LYRA 2015-0010, ERAPERMED2018-233 FindingMS GA 779282); European Commission GEMMA (n. 825033), IMI-2 ‘HARMONY’ (n. 116026), VEO ‘Versatile Emerging infectious disease Observatory’ (n. 874735), H2020 IMforFUTURE (n. 721815); Istituto Nazionale di Fisica Nucleare ‘AIM’ Group V initiative. The complex web of macromolecular interactions occurring within cells, the interactome, is the backbone of an increasing number of studies, but a clear consensus on the exact structure of this network is still lacking. Different genome-scale maps of the human interactome have been obtained through several experimental techniques and functional analyses. Moreover, these maps can be enriched through literature-mining approaches, and different combinations of various 'source' databases have been used in the literature. It is therefore unclear to what extent the various interactomes yield similar results when used in the context of interactome-based approaches in network biology. We compared a comprehensive list of human interactomes on the basis of topology, protein complexes, molecular pathways, pathway cross-talk and disease gene prediction. In a general context of relevant heterogeneity, our study provides a series of qualitative and quantitative parameters that describe the state of the art of human interactomes and guidelines for selecting interactomes in future applications. Authors: Mosca E.; Bersanelli M.; Matteuzzi T.; Di Nanni N.; Castellani G.; Milanesi L.; Remondini D.

    FAKE NEWS DETECTION ON THE WEB: A DEEP LEARNING BASED APPROACH

    The acceptance and popularity of social media platforms for the dispersion and proliferation of news articles have led to the spread of questionable and untrusted information, in part due to the ease with which misleading content can be created and shared among communities. Prior research has attempted to automatically classify news articles and tweets as credible or non-credible. This work complements such research by proposing an approach that combines Natural Language Processing (NLP) and Deep Learning techniques such as Long Short-Term Memory (LSTM). Moreover, in the Information Systems paradigm, design science research methodology (DSRM) has become the major stream that focuses on building and evaluating an artifact to solve emerging problems. Hence, DSRM can accommodate deep learning-based models given the availability of adequate datasets. Two publicly available datasets that contain labeled news articles and tweets have been used to validate the proposed model’s effectiveness. This work presents two distinct experiments, and the results demonstrate that the proposed model works well for both long-sequence news articles and short-sequence texts such as tweets. Finally, the findings suggest that sentiment, tagging, linguistic, syntactic, and text-embedding features have the potential to foster fake news detection by training the proposed model at various dimensionalities to learn the contextual meaning of the news content.