
    SEGCECO: Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication

    Graph-structured data has become increasingly prevalent in a variety of fields, from biological networks to social networks, and link prediction is one of the key problems in graph analysis. Cell-cell communication regulates individual cell activities and is a crucial part of tissue structure and function, and recent advances in single-cell RNA sequencing technologies have eased routine analyses of intercellular signaling networks. Previous studies have proposed various link prediction approaches; each makes particular assumptions about when nodes are likely to interact and therefore performs well only on specific networks. Subgraph-based methods address this limitation and outperform other approaches by extracting local subgraphs from a given network. In this work, we present a novel method, Subgraph Embedding of Gene expression matrix for prediction of CEll-cell COmmunication (SEGCECO), which uses an attributed graph convolutional neural network to predict cell-cell communication from single-cell RNA-seq data. SEGCECO captures both the latent and the explicit attributes of undirected, attributed graphs constructed from the gene expression profiles of individual cells. Because single-cell RNA-seq data are high-dimensional and sparse, converting them to a graphical format is a daunting task. We overcome this limitation by applying SoptSC, a similarity-based optimization method in which the cell-cell similarity matrix is learned from single-cell gene expression data; the cell-cell communication network is then built from this similarity matrix. To evaluate the proposed method, we performed experiments on six scRNA-seq datasets extracted from human and mouse pancreas tissue. Our comparative analysis shows that SEGCECO outperforms latent feature-based approaches as well as the state-of-the-art method for link prediction, WLNM, achieving an area under the ROC curve of 0.99 and 99% prediction accuracy.
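
    The construction step described above can be pictured with a small, hedged sketch in Python (numpy and networkx). The similarity matrix below is a random stand-in for the one SoptSC would learn from real expression data, and the threshold, hop count, and helper name are illustrative choices rather than part of SEGCECO itself; the sketch only shows how a thresholded similarity matrix becomes an undirected attributed graph from which a local enclosing subgraph around a candidate cell pair can be extracted.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)
n_cells, n_genes = 50, 200

# Stand-ins: a gene expression matrix and a symmetric cell-cell similarity
# matrix (in SEGCECO the latter would come from SoptSC, not random numbers).
expr = rng.poisson(1.0, size=(n_cells, n_genes)).astype(float)
sim = rng.random((n_cells, n_cells))
sim = (sim + sim.T) / 2.0
np.fill_diagonal(sim, 0.0)

# Threshold the similarity matrix into an undirected graph; each node keeps
# its expression profile as a node attribute (the "attributed" part).
threshold = np.quantile(sim, 0.95)          # arbitrary toy cutoff
G = nx.Graph()
for i in range(n_cells):
    G.add_node(i, expression=expr[i])
for i in range(n_cells):
    for j in range(i + 1, n_cells):
        if sim[i, j] >= threshold:
            G.add_edge(i, j, weight=float(sim[i, j]))

def enclosing_subgraph(graph, u, v, hops=1):
    """Local h-hop enclosing subgraph around a candidate link (u, v)."""
    nodes = set(nx.single_source_shortest_path_length(graph, u, cutoff=hops))
    nodes |= set(nx.single_source_shortest_path_length(graph, v, cutoff=hops))
    return graph.subgraph(nodes).copy()

u, v = 0, 1
sub = enclosing_subgraph(G, u, v)
print(f"enclosing subgraph of ({u}, {v}): "
      f"{sub.number_of_nodes()} nodes, {sub.number_of_edges()} edges")
```

    A subgraph-based predictor such as SEGCECO would then embed such enclosing subgraphs and score the candidate link with its graph convolutional network.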

    TEAK: A Novel Computational and GUI Software Pipeline for Reconstructing Biological Networks, Detecting Activated Biological Subnetworks, and Querying Biological Networks

    As high-throughput gene expression data becomes ever cheaper, researchers face a deluge of data from which biological insights must be extracted and mined, since the rate of data accumulation far exceeds the rate of data analysis. Computational frameworks are needed to bridge this gap and assist researchers in their tasks. The Topology Enrichment Analysis frameworK (TEAK) is an open-source GUI and software pipeline that helps fill this gap and consists of three major modules. The first module, the Gene Set Cultural Algorithm, infers biological networks de novo from gene sets using the KEGG pathways as prior knowledge. The second and third modules query against the KEGG pathways using molecular profiling data and query graphs, respectively. In particular, the second module, also called TEAK, is a network partitioning module that partitions the KEGG pathways into both linear and nonlinear subpathways; in conjunction with molecular profiling data, the subpathways are ranked and displayed to the user within the TEAK GUI. Using a public microarray yeast dataset, TEAK uncovered previously unreported fitness defects for dpl1 delta and lag1 delta mutants under conditions of nitrogen limitation. Finally, the third module, the Query Structure Enrichment Analysis framework, is a network query module that allows researchers to query their biological hypotheses, expressed as Directed Acyclic Graphs, against the KEGG pathways.
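
    As a rough illustration of what partitioning into linear subpathways and ranking them can look like, the sketch below is a toy in Python, not TEAK's implementation: the pathway DAG, gene labels, and scoring numbers are invented. It enumerates root-to-leaf chains of a small directed pathway graph and ranks them with stand-in expression changes.

```python
import random
import networkx as nx

# A small invented pathway DAG standing in for a KEGG pathway.
pathway = nx.DiGraph([
    ("A", "B"), ("B", "C"), ("C", "E"),
    ("B", "D"), ("D", "E"), ("E", "F"),
])

roots = [n for n in pathway if pathway.in_degree(n) == 0]
leaves = [n for n in pathway if pathway.out_degree(n) == 0]

# Enumerate every root-to-leaf chain as a candidate linear subpathway.
linear_subpathways = [
    path
    for root in roots
    for leaf in leaves
    for path in nx.all_simple_paths(pathway, root, leaf)
]

# Toy ranking: score each chain by the mean absolute expression change of its
# genes (random numbers stand in for real molecular profiling data).
random.seed(0)
fold_change = {gene: random.uniform(-2.0, 2.0) for gene in pathway}
ranked = sorted(
    linear_subpathways,
    key=lambda chain: sum(abs(fold_change[g]) for g in chain) / len(chain),
    reverse=True,
)
for chain in ranked:
    print(" -> ".join(chain))
```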

    ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context

    Elucidating gene regulatory networks is crucial for understanding normal cell physiology and complex pathologic phenotypes. Existing computational methods for the genome-wide "reverse engineering" of such networks have been successful only for lower eukaryotes with simple genomes. Here we present ARACNE, a novel algorithm that uses microarray expression profiles and is specifically designed to scale up to the complexity of regulatory networks in mammalian cells, yet is general enough to address a wider range of network deconvolution problems. This method uses an information theoretic approach to eliminate the majority of indirect interactions inferred by co-expression methods. We prove that ARACNE reconstructs the network exactly (asymptotically) if the effect of loops in the network topology is negligible, and we show that the algorithm works well in practice, even in the presence of numerous loops and complex topologies. We assess ARACNE's ability to reconstruct transcriptional regulatory networks using both a realistic synthetic dataset and a microarray dataset from human B cells. On synthetic datasets ARACNE achieves very low error rates and outperforms established methods, such as Relevance Networks and Bayesian Networks. Application to the deconvolution of genetic networks in human B cells demonstrates ARACNE's ability to infer validated transcriptional targets of the cMYC proto-oncogene. We also study the effects of misestimation of mutual information on network reconstruction, and show that algorithms based on mutual information ranking are more resilient to estimation errors. ARACNE shows promise in identifying direct transcriptional interactions in mammalian cellular networks, a problem that has challenged existing reverse engineering algorithms. This approach should enhance our ability to use microarray data to elucidate functional mechanisms that underlie cellular processes and to identify molecular targets of pharmacological compounds in mammalian cellular networks.
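
    The information-theoretic pruning at the heart of ARACNE, the data processing inequality (DPI), can be sketched compactly. The following Python snippet is a simplified toy rather than the reference implementation: the mutual information estimator is a crude histogram plug-in, the expression matrix is synthetic, and the edge threshold is arbitrary; ARACNE's DPI tolerance parameter and significance testing of MI values are omitted.

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y, bins=8):
    """Crude histogram plug-in estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# Synthetic expression profiles: gene 1 depends on gene 0, gene 2 on gene 1,
# so the 0-2 dependence is indirect and should be pruned by the DPI.
rng = np.random.default_rng(1)
n_genes, n_samples = 6, 300
expr = rng.normal(size=(n_genes, n_samples))
expr[1] = expr[0] + 0.3 * rng.normal(size=n_samples)
expr[2] = expr[1] + 0.3 * rng.normal(size=n_samples)

# Pairwise MI, keeping edges above an arbitrary toy threshold.
mi = np.zeros((n_genes, n_genes))
for i, j in combinations(range(n_genes), 2):
    mi[i, j] = mi[j, i] = mutual_information(expr[i], expr[j])
edges = {(i, j) for i, j in combinations(range(n_genes), 2) if mi[i, j] > 0.2}

# DPI pruning: in every fully connected triplet, drop the weakest edge,
# treating it as a likely indirect interaction.
removed = set()
for i, j, k in combinations(range(n_genes), 3):
    triangle = [(i, j), (i, k), (j, k)]
    if all(e in edges for e in triangle):
        removed.add(min(triangle, key=lambda e: mi[e]))
print("edges after DPI:", sorted(edges - removed))
```

    The full algorithm additionally calibrates the MI threshold statistically and applies a small tolerance when comparing edges in a triplet, so that nearly tied interactions are not discarded.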

    Towards Explainable Deep Models for Images, Texts, and Graphs

    Deep neural networks have been widely studied and applied across many applications in recent years because of their strong performance. Although deep models are powerful and promising, most of them are developed as black boxes: without meaningful explanations of how and why predictions are made, we do not fully understand their inner working mechanisms. Such models therefore cannot be fully trusted, which prevents their use in critical applications pertaining to fairness, privacy, and safety. This raises the need to explain deep learning models and to investigate several questions: which input factors are important to the predictions, how decisions are made inside deep networks, and what hidden neurons mean. In this dissertation, we investigate explanation techniques for different types of deep models. In particular, we explore both instance-level and model-level explanations for image models, text models, and graph models.

    Understanding deep image models is the most straightforward starting point, since images are naturally well presented and can be easily visualized. Hence, we begin by proposing a novel discrete masking method for explaining deep image classifiers. Our method follows the generative adversarial network formalism: the deep model to be explained is regarded as the discriminator, while we train a generator to explain it. The generator is trained to capture discriminative image regions that convey the same or similar semantic meaning as the original image from the model's perspective. It produces a probability map from which a discrete mask can be sampled; the discriminator then measures the quality of the sampled mask and provides feedback for updating the generator. Because of the sampling operations, the generator cannot be trained directly by back-propagation, so we propose to update it using the policy gradient. Furthermore, we incorporate gradients as auxiliary information to reduce the search space and facilitate training. We conduct both quantitative and qualitative experiments on the ILSVRC dataset to demonstrate the effectiveness of the proposed method. Experimental results indicate that our method provides reasonable explanations for both correct and incorrect predictions and outperforms existing approaches. In addition, our method passes the model randomization test, indicating that its attributions genuinely reflect the network's predictions.

    Unlike image models, text models are more difficult to explain because texts are represented as discrete variables and cannot be directly visualized. In addition, most explanation methods focus only on the input space of the models and ignore the hidden space. Hence, we propose to explain deep models for text analysis by exploring the meaning of the hidden space. Specifically, we propose an approach to investigate the meaning of hidden neurons in convolutional neural network models for sentence classification tasks. We first employ the saliency map technique to identify important spatial locations in the hidden layers. Then we use optimization techniques to approximate, from input sentences, the information detected at these hidden locations. Furthermore, we develop regularization terms and explore words in the vocabulary to explain the detected information. Experimental results demonstrate that our approach identifies meaningful and reasonable explanations for hidden spatial locations and can describe the decision procedure of deep text models.

    These findings further motivate us to study explanation techniques for graph neural networks (GNNs). Unlike images and texts, graph data are usually represented as continuous feature matrices together with discrete adjacency matrices. The structural information in the adjacency matrices is important and should be considered when providing explanations, so methods designed for images and texts cannot be applied directly. Hence, we investigate both instance-level and model-level explanations of GNNs to provide a comprehensive understanding. First, existing methods invariably focus on explaining the importance of graph nodes or edges but ignore the substructures of graphs, which are more intuitive and human-intelligible. To provide instance-level explanations for GNNs, we propose a novel method, known as SubgraphX, to explain GNNs by identifying important subgraphs. Given a trained GNN model and an input graph, SubgraphX explains its predictions by efficiently exploring different subgraphs with Monte Carlo tree search. To make the tree search more effective, we propose to use Shapley values as a measure of subgraph importance, which can also capture the interactions among different subgraphs. To expedite computations, we propose efficient approximation schemes for computing Shapley values on graph data. Our work represents the first attempt to explain GNNs by identifying subgraphs explicitly. Experimental results show that SubgraphX achieves significantly improved explanations while keeping computations at a reasonable level. Second, most existing explanation methods provide only instance-level explanations and offer no high-level understanding. We propose a novel approach, known as XGNN, to explain GNNs at the model level. Our approach provides high-level insights and a generic understanding of how GNNs work. In particular, we propose to explain GNNs by training a graph generator so that the generated graph patterns maximize a certain prediction of the model. We formulate graph generation as a reinforcement learning task in which, at each step, the graph generator predicts how to add an edge to the current graph. The graph generator is trained via a policy gradient method based on information from the trained GNNs, and we incorporate several graph rules to encourage the generated graphs to be valid. Experimental results on both synthetic and real-world datasets show that our proposed methods help understand and verify trained GNNs.
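
    One building block mentioned above, using a Shapley value to score a candidate subgraph, can be sketched with Monte Carlo permutation sampling. The Python snippet below is a hedged toy, not SubgraphX: the node set, the candidate subgraph, and the score function (a stand-in for a trained GNN's prediction restricted to a node coalition) are all invented, and the subgraph is treated as a single player whose marginal contribution is averaged over random player orderings.

```python
import random

nodes = list(range(8))
candidate_subgraph = {0, 1, 2}        # hypothetical subgraph to be explained
other_players = [n for n in nodes if n not in candidate_subgraph]

def score(active_nodes):
    """Toy stand-in for a GNN prediction on the graph restricted to a coalition:
    the candidate subgraph matters, and matters more when node 5 is present."""
    if not candidate_subgraph <= active_nodes:
        return 0.0
    return 1.0 + (2.0 if 5 in active_nodes else 0.0)

def shapley_of_subgraph(num_samples=2000, seed=0):
    """Monte Carlo Shapley estimate, treating the whole subgraph as one player."""
    rng = random.Random(seed)
    players = other_players + ["SUBGRAPH"]
    total = 0.0
    for _ in range(num_samples):
        order = players[:]
        rng.shuffle(order)
        coalition = set()
        for player in order:
            if player == "SUBGRAPH":
                # Marginal contribution of adding the subgraph to the coalition.
                total += score(coalition | candidate_subgraph) - score(coalition)
                break
            coalition.add(player)
    return total / num_samples

print(f"estimated Shapley value of the subgraph: {shapley_of_subgraph():.3f}")
```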

    Chart recognition and interpretation in document images

    Ph.D. (Doctor of Philosophy)