21,287 research outputs found

    Grouped graphical Granger modeling for gene expression regulatory networks discovery

    Get PDF
    We consider the problem of discovering gene regulatory networks from time-series microarray data. Recently, graphical Granger modeling has gained considerable attention as a promising direction for addressing this problem. These methods apply graphical modeling methods on time-series data and invoke the notion of ā€˜Granger causalityā€™ to make assertions on causality through inference on time-lagged effects. Existing algorithms, however, have neglected an important aspect of the problemā€”the group structure among the lagged temporal variables naturally imposed by the time series they belong to. Specifically, existing methods in computational biology share this shortcoming, as well as additional computational limitations, prohibiting their effective applications to the large datasets including a large number of genes and many data points. In the present article, we propose a novel methodology which we term ā€˜grouped graphical Granger modeling methodā€™, which overcomes the limitations mentioned above by applying a regression method suited for high-dimensional and large data, and by leveraging the group structure among the lagged temporal variables according to the time series they belong to. We demonstrate the effectiveness of the proposed methodology on both simulated and actual gene expression data, specifically the human cancer cell (HeLa S3) cycle data. The simulation results show that the proposed methodology generally exhibits higher accuracy in recovering the underlying causal structure. Those on the gene expression data demonstrate that it leads to improved accuracy with respect to prediction of known links, and also uncovers additional causal relationships uncaptured by earlier works

    Machine learning and network embedding methods for gene co-expression networks

    Get PDF
    High-throughput technologies such as DNA microarrays and RNA-seq are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). GCNs are analyzed to discover gene modules. GCN construction and analysis is a well-studied topic, for nearly two decades. While new types of sequencing and the corresponding data are now available, the software package WGCNA and its most recent variants are still widely used, contributing to biological discovery. The discovery of biologically significant modules of genes from raw expression data is a non-typical unsupervised problem; while there are no training data to drive the computational discovery of modules, the biological significance of the discovered modules can be evaluated with the widely used module enrichment metric, measuring the statistical significance of the occurrence of Gene Ontology terms within the computed modules. WGCNA and other related methods are entirely heuristic and they do not leverage the aforementioned non-typical nature of the underlying unsupervised problem. The main contribution of this thesis is SGCP, a novel Self-Training Gene Clustering Pipeline for discovering modules of genes from raw expression data. SGCP almost entirely replaces the steps followed by existing methods, based on recent progress in mathematically justified unsupervised clustering algorithms. It also introduces a conceptually novel self-training step that leverages Gene Ontology information to modify and improve the set of modules computed by the unsupervised algorithm. SGCP is tested on a rich set of DNA microarrays and RNA-seq benchmarks, coming from various organisms. These tests show that SGCP greatly outperforms all previous methods, resulting in highly enriched modules. Furthermore, these modules are often quite dissimilar from those computed by previous methods, suggesting the possibility that SGCP can indeed become an auxiliary tool for extracting biological knowledge. To this end, SGCP is implemented as an easy-to-use R package that is made available on Bioconductor

    A Posterior Probability Approach for Gene Regulatory Network Inference in Genetic Perturbation Data

    Full text link
    Inferring gene regulatory networks is an important problem in systems biology. However, these networks can be hard to infer from experimental data because of the inherent variability in biological data as well as the large number of genes involved. We propose a fast, simple method for inferring regulatory relationships between genes from knockdown experiments in the NIH LINCS dataset by calculating posterior probabilities, incorporating prior information. We show that the method is able to find previously identified edges from TRANSFAC and JASPAR and discuss the merits and limitations of this approach
    • ā€¦
    corecore