A guided network propagation approach to identify disease genes that combines prior and new information
A major challenge in biomedical data science is to identify the causal genes
underlying complex genetic diseases. Despite the massive influx of genome
sequencing data, identifying disease-relevant genes remains difficult as
individuals with the same disease may share very few, if any, genetic variants.
Protein-protein interaction networks provide a means to tackle this
heterogeneity, as genes causing the same disease tend to be proximal within
networks. Previously, network propagation approaches have spread signal across
the network from either known disease genes or genes that are newly putatively
implicated in the disease (e.g., found to be mutated in exome studies or linked
via genome-wide association studies). Here we introduce a general framework
that considers both sources of data within a network context. Specifically, we
use prior knowledge of disease-associated genes to guide random walks initiated
from genes that are newly identified as perhaps disease-relevant. In
large-scale testing across 24 cancer types, we demonstrate that our approach
for integrating both prior and new information not only better identifies
cancer driver genes than using either source of information alone but also
readily outperforms other state-of-the-art network-based approaches. To
demonstrate the versatility of our approach, we also apply it to genome-wide
association data to identify genes functionally relevant for several complex
diseases. Overall, our work suggests that guided network propagation approaches
that utilize both prior and new data are a powerful means to identify disease
genes. Comment: RECOMB202
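The abstract above builds on network propagation. As a rough illustration only, the following sketch shows a plain random walk with restart on a toy undirected graph. It does not implement the guided variant the authors describe, whose details are not given here. The function name, the toy graph, and the restart probability of 0.5 are all assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=10000):
    """Propagate signal from seed nodes over an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours (hypothetical input format).
    seeds: dict mapping seed nodes to non-negative weights (e.g. newly implicated genes).
    Returns a dict of stationary visiting probabilities.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    # Restart distribution: normalised seed weights, zero elsewhere.
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: the walker has nowhere to go and stays put.
                nxt[n] += (1 - restart) * p[n]
                continue
            # Otherwise it moves to a uniformly chosen neighbour.
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with a single seed at A (illustrative data only).
toy_graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_graph, {"A": 1.0})
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, round(scores[gene], 4))
```

On this toy graph the scores fall off with distance from the seed, which is the behaviour that lets propagation rank genes near known or newly implicated ones.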
Identifying driver mutations in sequenced cancer genomes: computational approaches to enable precision medicine
High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, noise, and random mutations. Here, we review computational approaches to identify somatic mutations in cancer genome sequences and to distinguish the driver mutations that are responsible for cancer from random, passenger mutations. First, we describe approaches to detect somatic mutations from high-throughput DNA sequencing data, particularly for tumor samples that comprise heterogeneous populations of cells. Next, we review computational approaches that aim to predict driver mutations according to their frequency of occurrence in a cohort of samples, or according to their predicted functional impact on protein sequence or structure. Finally, we review techniques to identify recurrent combinations of somatic mutations, including approaches that examine mutations in known pathways or protein-interaction networks, as well as de novo approaches that identify combinations of mutations according to statistical patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer
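The review above mentions statistical patterns of mutual exclusivity. One simple pairwise statistic, shown here purely as an illustration and not as any specific method from the review, is a one-sided hypergeometric (Fisher-type) test: given how many samples carry a mutation in each of two genes, how likely is it to see so few samples mutated in both? The function name and the counts are hypothetical.

```python
from math import comb


def mutual_exclusivity_pvalue(n_samples, n_a, n_b, n_both):
    """Probability of observing at most `n_both` co-mutated samples by chance.

    n_samples: total number of samples in the cohort.
    n_a, n_b: number of samples with gene A / gene B mutated.
    n_both: number of samples with both genes mutated (must not exceed n_a or n_b).
    Small values suggest the two genes are mutated in a mutually exclusive pattern.
    """
    denominator = comb(n_samples, n_b)
    # Lower tail of the hypergeometric distribution over the overlap size k.
    numerator = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(n_both + 1)
    )
    return numerator / denominator


# Illustrative counts: 20 samples, each gene mutated in 10, never together.
p_value = mutual_exclusivity_pvalue(20, 10, 10, 0)
print(p_value)
```

A pairwise test like this ignores sample-specific mutation rates and gene length, so it is only a starting point for the kind of de novo analysis the review surveys.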
Integrating multi-type aberrations from DNA and RNA through dynamic mapping gene space for subtype-specific breast cancer driver discovery
Driver event discovery is crucial for breast cancer diagnosis and
therapy. In particular, discovering the subtype specificity of drivers can advance
personalized biomarker discovery and precision treatment of cancer patients.
Still, most existing computational driver discovery studies mainly
exploit information from DNA aberrations and gene interactions. Notably,
cancer driver events can arise not only from DNA aberrations but also from RNA
alterations, yet integrating multi-type aberrations from both DNA and RNA
remains challenging for breast cancer driver discovery. On the one hand, the data
formats of different aberration types differ from each other, known as
data format incompatibility. On the other hand, different types of aberrations
demonstrate distinct patterns across samples, known as aberration type
heterogeneity. To promote the integrated analysis of subtype-specific breast
cancer drivers, we design a "splicing-and-fusing" framework to address the
issues of data format incompatibility and aberration type heterogeneity
respectively. To overcome the data format incompatibility, the "splicing-step"
employs a knowledge graph structure to connect multi-type aberrations from the
DNA and RNA data into a unified formation. To tackle the aberration type
heterogeneity, the "fusing-step" adopts a dynamic mapping gene space
integration approach to represent the multi-type information by vectorized
profiles. The experiments also demonstrate the advantages of our approach in
both the integration of multi-type aberrations from DNA and RNA and the
discovery of subtype-specific breast cancer drivers. In summary, our
"splicing-and-fusing" framework with knowledge graph connection and dynamic
mapping gene space fusion of multi-type aberrations data from DNA and RNA can
successfully discover potential breast cancer drivers with subtype-specificity
indication. Comment: 14 pages, 5 figures, 1 table
Frailness and resilience of gene networks predicted by detection of co-occurring mutations via a stochastic perturbative approach
In recent years complex networks have been identified as powerful mathematical frameworks for the adequate modeling of many applied problems in disparate research fields. Assuming a Master Equation (ME) modeling the exchange of information within the network, we set up a perturbative approach in order to investigate how node alterations impact the network information flow. The main assumption of the perturbed ME (pME) model is that the simultaneous presence of multiple node alterations causes more or less intense network frailties depending on the specific features of the perturbation. In this perspective the collective behavior of a set of molecular alterations on a gene network is a particularly apt scenario for a first application of the proposed method, since most diseases are neither related to a single mutation nor to an established set of molecular alterations. Therefore, after characterizing the method numerically, we applied the pME approach, as a proof of principle, to breast cancer (BC) somatic mutation data downloaded from The Cancer Genome Atlas (TCGA) database. For each patient we measured the network frailness of over 90 significant subnetworks of the protein-protein interaction network, where each perturbation was defined by patient-specific somatic mutations. Interestingly, the frailness measures depend on the position of the alterations on the gene network more than on their number, unlike most traditional enrichment scores. In particular, low-degree mutations play an important role in causing high frailness measures. The potential applicability of the proposed method is wide and suggests future development in the control theory context.
Network-based analysis of eQTL data to prioritize driver mutations
In clonal systems, interpreting driver genes in terms of molecular networks helps in understanding how these drivers elicit an adaptive phenotype. Obtaining such a network-based understanding depends on the correct identification of driver genes. In clonal systems, independently evolved lines can acquire a similar adaptive phenotype by affecting the same molecular pathways, a phenomenon referred to as parallelism at the molecular pathway level. This implies that successful driver identification depends on interpreting mutated genes in terms of molecular networks. Driver identification and obtaining a network-based understanding of the adaptive phenotype are thus confounded problems that ideally should be solved simultaneously. In this study, a network-based eQTL method is presented that solves both the driver identification and the network-based interpretation problem. As input the method uses coupled genotype-expression phenotype data (eQTL data) of independently evolved lines with similar adaptive phenotypes and an organism-specific genome-wide interaction network. The search for mutational consistency at pathway level is defined as a subnetwork inference problem, which consists of inferring a subnetwork from the genome-wide interaction network that best connects the genes containing mutations to differentially expressed genes. Based on their connectivity with the differentially expressed genes, mutated genes are prioritized as driver genes. Based on semisynthetic data and two publicly available data sets, we illustrate the potential of the network-based eQTL method to prioritize driver genes and to gain insights into the molecular mechanisms underlying an adaptive phenotype. The method is available at http://bioinformatics.intec.ugent.be/phenetic_eqtl/index.htm
Characterization and comparison of gene-centered human interactomes
Funding: Ministero dell'Istruzione, dell'Università e della Ricerca (PON ELIXIR CNRBiOmics, INTEROMICS PB05); Ministero della Salute (GR-2016-02363997); Fondazione Regionale per la Ricerca Biomedica (Regione Lombardia) (LYRA 2015-0010, ERAPERMED2018-233 FindingMS GA 779282); European Commission GEMMA (n. 825033), IMI-2 ‘HARMONY’ (n. 116026), VEO ‘Versatile Emerging infectious disease Observatory’ (n. 874735), H2020 IMforFUTURE (n. 721815); Istituto Nazionale di Fisica Nucleare ‘AIM’ Group V initiative. The complex web of macromolecular interactions occurring within cells (the interactome) is the backbone of an increasing number of studies, but a clear consensus on the exact structure of this network is still lacking. Different genome-scale maps of the human interactome have been obtained through several experimental techniques and functional analyses. Moreover, these maps can be enriched through literature-mining approaches, and different combinations of various 'source' databases have been used in the literature. It is therefore unclear to what extent the various interactomes yield similar results when used in the context of interactome-based approaches in network biology. We compared a comprehensive list of human interactomes on the basis of topology, protein complexes, molecular pathways, pathway cross-talk and disease gene prediction. In a general context of relevant heterogeneity, our study provides a series of qualitative and quantitative parameters that describe the state of the art of human interactomes and guidelines for selecting interactomes in future applications. Mosca E.; Bersanelli M.; Matteuzzi T.; Di Nanni N.; Castellani G.; Milanesi L.; Remondini D.
FAKE NEWS DETECTION ON THE WEB: A DEEP LEARNING BASED APPROACH
The acceptance and popularity of social media platforms for the dispersion and proliferation of news articles have led to the spread of questionable and untrusted information, in part due to the ease with which misleading content can be created and shared among communities. Prior research has attempted to automatically classify news articles and tweets as credible or non-credible. This work complements such research by proposing an approach that combines Natural Language Processing (NLP) and Deep Learning techniques such as Long Short-Term Memory (LSTM).
Moreover, in the Information Systems paradigm, design science research methodology (DSRM) has become the major stream that focuses on building and evaluating an artifact to solve emerging problems. Hence, DSRM can accommodate deep learning-based models given the availability of adequate datasets. Two publicly available datasets that contain labeled news articles and tweets have been used to validate the proposed model’s effectiveness. This work presents two distinct experiments, and the results demonstrate that the proposed model works well for both long-sequence news articles and short-sequence texts such as tweets. Finally, the findings suggest that sentiment, tagging, linguistic, syntactic, and text-embedding features have the potential to foster fake news detection by training the proposed model on inputs of various dimensionality to learn the contextual meaning of the news content.