    Sparse graphical models for cancer signalling

    Protein signalling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. Recent advances in biochemical technology have begun to allow high-throughput, data-driven studies of signalling. In this thesis, we investigate multivariate statistical methods, rooted in sparse graphical models, aimed at probing questions in cancer signalling. First, we propose a Bayesian variable selection method for identifying subsets of proteins that jointly influence an output of interest, such as drug response. Ancillary biological information is incorporated into inference using informative prior distributions. Prior information is selected and weighted in an automated manner using an empirical Bayes formulation. We present examples of informative pathway-based and network-based priors, and illustrate the proposed method on both synthetic and drug response data. Second, we use dynamic Bayesian networks to learn the structure of context-specific signalling network topology from proteomic time-course data. We exploit a connection between variable selection and network structure learning to carry out exact inference efficiently. Existing biological knowledge is incorporated using informative network priors, weighted automatically by an empirical Bayes approach. The overall approach is computationally efficient and essentially free of user-set parameters. We show results from an empirical investigation comparing the approach to several existing methods, and from an application to breast cancer cell line data. Hypotheses are generated regarding novel signalling links, some of which are validated by independent experiments. Third, we describe a network-based clustering approach for discovering cancer subtypes that differ in their subtype-specific signalling network structure. 
Model-based clustering is combined with penalised likelihood estimation of undirected graphical models to allow simultaneous learning of cluster assignments and cluster-specific network structure. Results are shown from an empirical investigation comparing several penalisation regimes, and from an application to breast cancer proteomic data.
EThOS - Electronic Theses Online Service. Engineering and Physical Sciences Research Council (EPSRC). GB, United Kingdom.
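The exact-inference idea described above (enumerate candidate predictor or parent sets, and weight each by its likelihood times an informative prior) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: a BIC approximation stands in for the marginal likelihood, the prior simply rewards predictors listed in a hypothetical `prior_set` with a fixed `strength` (the thesis instead weights prior information by empirical Bayes), and subsets are capped at `max_size` predictors. Regressing a protein at time t+1 on all proteins at time t turns the same computation into parent scoring for one node of a dynamic Bayesian network.

```python
import itertools

import numpy as np


def log_evidence_bic(y, X, subset):
    """BIC approximation to the log marginal likelihood of a linear model
    regressing y on an intercept plus the columns of X listed in `subset`."""
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return -0.5 * n * np.log(rss / n) - 0.5 * design.shape[1] * np.log(n)


def inclusion_probabilities(y, X, prior_set, strength=1.0, max_size=2):
    """Posterior inclusion probability of each predictor, computed exactly by
    enumerating every subset of at most `max_size` predictors.
    Assumed (hypothetical) prior: log p(S) = strength * |S & prior_set| + const."""
    p = X.shape[1]
    subsets = [s for d in range(max_size + 1)
               for s in itertools.combinations(range(p), d)]
    log_post = np.array([
        log_evidence_bic(y, X, s) + strength * sum(j in prior_set for j in s)
        for s in subsets
    ])
    weights = np.exp(log_post - log_post.max())  # subtract max for stability
    weights /= weights.sum()
    probs = np.zeros(p)
    for s, w in zip(subsets, weights):
        for j in s:
            probs[j] += w
    return probs


# Synthetic time course: protein 3 at time t+1 is driven by protein 0 at time t.
rng = np.random.default_rng(0)
T, p = 60, 5
data = rng.normal(size=(T, p))
for t in range(T - 1):
    data[t + 1, 3] = 0.9 * data[t, 0] + 0.1 * rng.normal()

# DBN-style parent scoring for protein 3: response at t+1, predictors at t.
probs = inclusion_probabilities(data[1:, 3], data[:-1], prior_set={0, 2})
print(np.round(probs, 3))
```

Because the number of subsets grows combinatorially with `max_size`, a small cap on the parent-set size is what keeps exact enumeration tractable in this sketch.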

    Single Cell Proteomics in Biomedicine: High-dimensional Data Acquisition, Visualization and Analysis

    New insights into cellular heterogeneity over the last decade have spurred the rapid development of a variety of single-cell omics tools. The high-dimensional single-cell data these tools generate require new theoretical approaches and analytical algorithms for effective visualization and interpretation. In this review, we briefly survey state-of-the-art single-cell proteomic tools, with a particular focus on data acquisition and quantification, and then describe a number of statistical and computational approaches developed to date for dissecting high-dimensional single-cell data. We discuss the underlying assumptions, unique features, and limitations of these analytical methods, together with the biological questions each is designed to answer. Particular attention is given to information-theoretic approaches that are anchored in a set of first principles of physics and can yield detailed (and often surprising) predictions.

    Identifying interactions in the time and frequency domains in local and global networks: a Granger causality approach

    Background: Several well-established reverse-engineering approaches, including ordinary differential equations (ODEs), Bayesian network inference, information theory and Granger causality, are widely used to derive causal relationships among elements such as genes, proteins, metabolites, neurons and brain areas from multi-dimensional spatial and temporal data. Results: Here we focused on Granger causality in both the time and frequency domains, in local and global networks, and applied our approach to experimental gene and protein data. For a small gene network, Granger causality outperformed the other three approaches mentioned above. A global network of 812 proteins was reconstructed using a novel approach. The results agreed well with known experimental findings and yielded many experimentally testable predictions. In addition to interactions in the time domain, interactions in the frequency domain were also recovered. Conclusions: The results on the proteomic and gene data confirm that Granger causality is a simple and accurate approach for recovering network structure. Our approach is general and can easily be applied to other types of temporal data.
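The time-domain idea behind Granger causality fits in a few lines: x is said to Granger-cause y if adding the past of x to an autoregression of y reduces the residual variance. The sketch below is an illustration under assumptions, not the authors' implementation: it is bivariate (the conditional, multivariate form needed for global networks and the frequency-domain decomposition are not shown), the lag order `lag` is a hypothetical fixed choice rather than one selected by an information criterion, and it reports Geweke's log variance ratio without a significance test.

```python
import numpy as np


def lagged(series, lag):
    """Matrix whose columns are series[t-1], ..., series[t-lag] for t = lag..T-1."""
    T = len(series)
    return np.column_stack([series[lag - k:T - k] for k in range(1, lag + 1)])


def residual_variance(design, target):
    """Variance of the least-squares residuals of target regressed on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return float(np.var(target - design @ coef))


def granger(x, y, lag=2):
    """Time-domain Granger causality from x to y (log ratio of residual variances)."""
    target = y[lag:]
    ones = np.ones(len(target))
    restricted = np.column_stack([ones, lagged(y, lag)])            # y's own past
    full = np.column_stack([ones, lagged(y, lag), lagged(x, lag)])  # plus x's past
    return float(np.log(residual_variance(restricted, target)
                        / residual_variance(full, target)))


# Toy pair: y follows x with a one-step delay; x is white noise.
rng = np.random.default_rng(1)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + 0.2 * rng.normal(size=T - 1)

gc_xy = granger(x, y)  # large: the past of x predicts y
gc_yx = granger(y, x)  # near zero: the past of y does not predict x
print(round(gc_xy, 2), round(gc_yx, 3))
```

Because the restricted model is nested in the full one, the measure is non-negative; the asymmetry between the two directions is what gives the inferred link its orientation.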

    Computational Labeling, Partitioning, and Balancing of Molecular Networks

    Recent advances in high-throughput techniques enable accurate, large-scale quantification of molecules including mRNAs, proteins and metabolites. Differential expression of these molecules between case and control samples provides a way to select phenotype-associated molecules with statistically significant changes. However, given a ranked list of significant molecular changes, how those molecules work together to drive phenotype formation remains unclear. In particular, changes in molecular quantities alone are insufficient to explain changes in functional behavior. My study addresses this question by integrating molecular network data to systematically model and estimate changes in molecular functional behavior. We build three computational models that label, partition, and balance molecular networks using modern machine learning techniques. (1) Because protein functional annotation is incomplete, we develop AptRank, an adaptive PageRank model for protein function prediction on bilayer networks. By integrating the Gene Ontology (GO) hierarchy with a protein-protein interaction network, AptRank outperforms four state-of-the-art methods in a comprehensive evaluation on benchmark datasets. (2) We next extend AptRank into a network partitioning method, BioSweeper, which identifies functional network modules whose molecules share similar functions and are densely connected to each other. Compared with traditional network partitioning methods that use only network connections, BioSweeper, by integrating the GO hierarchy, can automatically identify functionally enriched network modules. (3) Finally, we conduct a differential interaction analysis, difFBA, on protein-protein interaction networks by simulating protein fluxes using flux balance analysis (FBA). 
We test difFBA on quantitative proteomic data from colon cancer and demonstrate that it offers more insight into functional changes in molecular behavior than protein quantity changes alone. We conclude that our integrative network models increase the observational dimensions of complex biological systems and enable a deeper understanding of the causal relationships between genotypes and phenotypes.
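AptRank builds on PageRank-style diffusion. As a simpler point of reference (explicitly not AptRank, which learns adaptive diffusion coefficients on a bilayer GO plus protein-protein interaction network), the sketch below runs plain personalized PageRank on a single hypothetical interaction network: random walks restart at seed proteins already carrying an annotation, and the stationary scores rank the remaining proteins as candidates for that annotation. The damping factor `alpha=0.85` and the toy graph are assumptions.

```python
import numpy as np


def personalized_pagerank(adj, seeds, alpha=0.85, tol=1e-10, max_iter=1000):
    """Random walk with restart: with probability alpha follow a random edge,
    otherwise jump back to a uniformly chosen seed node."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    # Column-stochastic transitions: P[i, j] = probability of stepping j -> i.
    P = np.divide(adj.T, out_deg, out=np.zeros_like(adj), where=out_deg > 0)
    restart = np.zeros(n)
    restart[list(seeds)] = 1.0 / len(seeds)
    r = restart.copy()
    for _ in range(max_iter):
        walked = P @ r
        lost = 1.0 - walked.sum()  # mass stranded on nodes without edges
        r_new = alpha * walked + (1.0 - alpha + alpha * lost) * restart
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r


# Toy network: triangle {0, 1, 2} joined to triangle {3, 4, 5} by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

scores = personalized_pagerank(adj, seeds={0})  # protein 0 is the annotated seed
print(np.round(scores, 3))
```

Proteins in the seed's own triangle receive higher scores than those across the bridge, which is the locality that diffusion-based function prediction relies on.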

    Graphlet-adjacencies provide complementary views on the functional organisation of the cell and cancer mechanisms

    Recent biotechnological advances have led to a wealth of biological network data. Topological analysis of these networks (i.e., the analysis of their structure) has led to breakthroughs in biology and medicine. The state-of-the-art topological node and network descriptors are based on graphlets, which are small induced connected subgraphs of different shapes (e.g., paths, triangles). However, current graphlet-based methods ignore neighbourhood information (i.e., which nodes are connected to which). Therefore, to capture topology and connectivity information simultaneously, I introduce graphlet adjacency, which considers two nodes adjacent based on their frequency of co-occurrence on a given graphlet. I use graphlet adjacency to generalise spectral methods and apply these to molecular networks. I show that, depending on the chosen graphlet, graphlet spectral clustering uncovers clusters enriched in different biological functions, and graphlet diffusion of gene mutation scores predicts different sets of cancer driver genes. This demonstrates that graphlet adjacency captures topology-function and topology-disease relationships in molecular networks. To detail these relationships further, I take a pathway-focused approach. To enable this investigation, I introduce graphlet eigencentrality, which computes the importance of a gene in a pathway from either the local pathway perspective or the global network perspective. I show that pathways are best described by the graphlet adjacencies that capture the importance of their functionally critical genes. I also show that cancer driver genes characteristically perform hub roles between pathways. Given the latter finding, I hypothesise that cancer pathways should be identifiable by changes in their pathway-pathway relationships. 
Within this context, I propose pathway-driven non-negative matrix tri-factorisation (PNMTF), which fuses molecular network data and pathway annotations to learn an embedding space that captures the organisation of a network as a composition of subnetworks. In this space, I measure the functional importance of a pathway or gene in the cell and its functional disruption in cancer. I apply this method to predict the genes and pathways involved in four major cancers. By using graphlet adjacency, I can exploit the tendency of cancer-related genes to perform hub roles to improve prediction accuracy.
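For the simplest non-trivial graphlet, the triangle, graphlet adjacency has a closed form: for adjacent nodes u and v, the number of triangles containing both equals their number of common neighbours, so the triangle adjacency is the element-wise product of A squared with A itself. The sketch below computes it, together with a graphlet eigencentrality taken as the leading eigenvector of that matrix. It is a minimal illustration on a hypothetical toy graph; the thesis covers many more graphlets, and its exact normalisation may differ from the sum-to-one scaling assumed here.

```python
import numpy as np


def triangle_adjacency(adj):
    """Entry (u, v) counts the triangles that contain both u and v: for an edge
    u-v this is the number of common neighbours; non-edges get zero."""
    A = np.asarray(adj, dtype=float)
    return (A @ A) * A


def eigencentrality(weights):
    """Leading eigenvector of a symmetric non-negative matrix, scaled to sum to 1."""
    _, vecs = np.linalg.eigh(weights)  # eigenvalues come back in ascending order
    lead = np.abs(vecs[:, -1])         # the Perron vector is defined up to sign
    return lead / lead.sum()


# Toy graph: triangles {0, 1, 2} and {0, 3, 4} share hub 0; node 5 hangs off node 4.
edges = [(0, 1), (0, 2), (1, 2), (0, 3), (0, 4), (3, 4), (4, 5)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0

A_tri = triangle_adjacency(adj)
edge_view = eigencentrality(adj)        # ordinary eigenvector centrality
triangle_view = eigencentrality(A_tri)  # triangle-graphlet eigencentrality
print(np.round(edge_view, 3))
print(np.round(triangle_view, 3))
```

Node 5 lies on no triangle, so its triangle-based centrality is zero even though its ordinary eigenvector centrality is not: the choice of graphlet changes which nodes look important.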

    Biological Networks: Modeling and Structural Analysis

    Biological networks are receiving increased attention because of their importance for understanding life at the cellular level. There are many different kinds of biological networks, and different models have been proposed for them. In this dissertation we focus on suitable network models for representing experimental data on protein interaction networks and protein complex networks (protein complexes are groups of proteins that associate to accomplish some function in the cell), and on designing algorithms for exploring such networks. Our goal is to enable biologists to identify the general principles that govern the organization of protein-protein interaction networks and protein complex networks. For protein complex networks, we propose a hypergraph model that represents the data more accurately than earlier models. We define the concept of k-cores in hypergraphs, which are highly connected subhypergraphs, and design an algorithm for computing k-cores in hypergraphs. A major challenge in computational systems biology is to understand the modular structure of biological networks. We construct computational models for predicting functional modules using graph clustering techniques. Earlier graph clustering techniques do not yield good results on proteomic networks because of the high error rates in the data and the small-world and power-law properties of these networks. We discuss the requirements that clusterings of biological networks must satisfy, design an algorithm for computing such a clustering, and show that our clustering approach is robust and scalable. Moreover, we design a new algorithm that computes overlapping rather than exclusive clusterings. Our approach identifies a set of clusters and a set of bridge proteins that form the overlap among the clusters. Finally, we assess the quality of our proposed clusterings using different reference sets.
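A peeling procedure for k-cores in a hypergraph, with protein complexes as hyperedges, can be sketched as follows. The dissertation's exact definition is not reproduced here; the sketch assumes one plausible rule: repeatedly remove proteins that belong to fewer than k hyperedges, delete them from every hyperedge they belong to, and discard hyperedges left with fewer than `min_edge_size` proteins (a hypothetical parameter). The complex memberships below are hypothetical.

```python
def hypergraph_k_core(hyperedges, k, min_edge_size=2):
    """Peel a hypergraph down to its k-core under the assumed rule above.
    Returns the surviving hyperedges (as sets) and the surviving vertices."""
    edges = [set(e) for e in hyperedges]  # copy so the input is not mutated
    while True:
        # Drop hyperedges that have shrunk below the minimum size.
        edges = [e for e in edges if len(e) >= min_edge_size]
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        low = {v for v, d in degree.items() if d < k}
        if not low:
            return edges, set().union(*edges)
        # Remove under-connected vertices from every hyperedge and repeat.
        edges = [e - low for e in edges]


# Hypothetical complexes: four overlapping complexes on proteins A-D,
# plus a loosely attached chain through D-E and E-F-G.
complexes = [{"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "D"},
             {"A", "C", "D"}, {"D", "E"}, {"E", "F", "G"}]
core_edges, core_vertices = hypergraph_k_core(complexes, k=2)
print(sorted(core_vertices), len(core_edges))
```

Each pass removes at least one vertex or returns, so the loop terminates; peeling can cascade, as when removing F and G leaves E in only one complex.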