24 research outputs found

    Compact Integration of Multi-Network Topology for Functional Analysis of Genes

    The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing and as yet unsolved challenge is how to combine multiple heterogeneous networks, each with different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.
    Keywords: interactome analysis; network integration; heterogeneous networks; dimensionality reduction; network diffusion; gene function prediction; genetic interaction prediction; gene ontology reconstruction; drug response prediction
    National Institutes of Health (U.S.) (Grant R01GM081871)
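The abstract describes two steps: analyzing diffusion in each network, then representing each gene with one compact low-dimensional vector. The following is a minimal sketch of that kind of pipeline under stated assumptions, not Mashup's published implementation. It assumes random walk with restart (RWR) as the diffusion process and a truncated SVD of log-transformed diffusion states, concatenated across networks, as the dimensionality-reduction step. The restart probability of 0.5, the 1/n smoothing constant, and the helper names `rwr_diffusion_states` and `integrate_networks` are hypothetical choices made for illustration.

```python
import numpy as np


def rwr_diffusion_states(adj: np.ndarray, restart: float = 0.5) -> np.ndarray:
    """Random walk with restart (RWR) started from every node.

    Row i is the stationary distribution of a walker that, at each step,
    jumps back to node i with probability `restart`.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)
    P = adj / safe[:, None]  # row-stochastic transition matrix
    iso = np.flatnonzero(deg == 0)
    P[iso, iso] = 1.0  # isolated nodes get a self-loop so every row sums to 1
    # Fixed point of s_i = (1 - restart) * s_i @ P + restart * e_i, stacked
    # over all start nodes i: S = restart * (I - (1 - restart) P)^{-1}.
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * P)


def integrate_networks(networks: list, dim: int, restart: float = 0.5) -> np.ndarray:
    """Hypothetical helper: one joint low-dimensional vector per node (row)."""
    n = networks[0].shape[0]
    eps = 1.0 / n  # smoothing constant before the log (an assumption)
    blocks = []
    for adj in networks:
        S = rwr_diffusion_states(adj, restart)
        blocks.append(np.log(S + eps) - np.log(eps))
    X = np.hstack(blocks)  # n x (n * number_of_networks)
    U, sigma, _ = np.linalg.svd(X, full_matrices=False)
    # Keep the top `dim` singular directions, scaled by sqrt(singular value).
    return U[:, :dim] * np.sqrt(sigma[:dim])


# Toy example: a 6-node ring and a two-triangle network over the same genes.
ring = np.zeros((6, 6))
for i in range(6):
    ring[i, (i + 1) % 6] = ring[(i + 1) % 6, i] = 1.0
triangles = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    triangles[a, b] = triangles[b, a] = 1.0
vectors = integrate_networks([ring, triangles], dim=2)
print(vectors.shape)  # (6, 2)
```

The resulting rows could then be passed to any standard classifier, which is the "off-the-shelf machine learning" step the abstract mentions. The SVD shortcut is chosen here only because it is short and deterministic.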

    Inferring gene ontologies from pairwise similarity data.

    Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Because ontologies are directed acyclic graphs (DAGs) of terms and hierarchical relations, algorithms are needed that analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher-level terms. Methods addressing these requirements are just beginning to emerge, and none has been evaluated for GO inference.
    Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements and compare them with other methods, including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast.
    Results: For task (a), using semantic similarity, CliXO accurately reconstructs GO (>99% precision and recall) and outperforms other approaches (<20% precision, <20% recall). For task (b), using -omics data, CliXO outperforms other methods on two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision and recall).
    Conclusion: This study provides an algorithmic foundation for building gene ontologies by capturing the hierarchical and pleiotropic structure embedded in biomolecular data.
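The core idea named in the Methods, maximal cliques in graphs induced by progressively lowering a similarity threshold, can be illustrated with a small sketch. This is a simplified stand-in and not CliXO itself: it assumes exact cliques (CliXO tolerates noisy data), a user-supplied list of thresholds, and hypothetical helper names (`threshold_terms`, `direct_parents`). Cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting, and a term's parents are its smallest strict supersets, so a term can have several parents, which is the pleiotropy requirement from the Motivation.

```python
from itertools import combinations


def maximal_cliques(adj: dict) -> list:
    """Enumerate maximal cliques with Bron-Kerbosch (with pivoting)."""
    cliques = []

    def expand(r: set, p: set, x: set) -> None:
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex covering the most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def threshold_terms(sim: list, thresholds: list) -> list:
    """Hypothetical helper: new maximal cliques (size >= 2) as the threshold drops."""
    n = len(sim)
    terms, seen = [], set()
    for t in sorted(thresholds, reverse=True):
        # Graph induced by keeping only gene pairs with similarity >= t.
        adj = {i: set() for i in range(n)}
        for i, j in combinations(range(n), 2):
            if sim[i][j] >= t:
                adj[i].add(j)
                adj[j].add(i)
        for clique in maximal_cliques(adj):
            if len(clique) >= 2 and clique not in seen:
                seen.add(clique)
                terms.append((t, clique))
    return terms


def direct_parents(terms: list) -> list:
    """Hypothetical helper: link each term to its smallest strict supersets."""
    sets = [s for _, s in terms]
    edges = []
    for child in sets:
        supersets = [p for p in sets if child < p]
        for parent in supersets:
            if not any(child < between < parent for between in supersets):
                edges.append((child, parent))
    return edges


# Toy similarity matrix for four genes.
sim = [
    [1.0, 0.9, 0.6, 0.6],
    [0.9, 1.0, 0.6, 0.6],
    [0.6, 0.6, 1.0, 0.1],
    [0.6, 0.6, 0.1, 1.0],
]
terms = threshold_terms(sim, [0.9, 0.6])
for child, parent in direct_parents(terms):
    print(sorted(child), "->", sorted(parent))
```

In this toy matrix, genes 0 and 1 form a term at threshold 0.9; at 0.6 that term is contained in two different maximal cliques, {0, 1, 2} and {0, 1, 3}, so it receives two parents rather than being forced into a single tree branch.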

    Computational methods to explore hierarchical and modular structure of biological networks

    Networks have been widely used to understand the structure of complex systems. From studying biological networks of protein-protein, genetic, and other types of interactions, we gain insights into the functional organization of static biological systems that can hardly be measured experimentally with current state-of-the-art technology. Biological networks also serve as a principled framework for integrating multiple sources of genome-wide data, such as gene expression arrays and sequencing. Yet a large-scale network is often intractable for intuitive visualization and computation. We developed novel network clustering algorithms to harness the power of genome-scale biological networks covering all genes/proteins. In particular, our algorithms are capable of finding hidden modular structures in a hierarchical stochastic block model. Since the modules are organized hierarchically, our algorithms facilitate downstream analysis and the design of in-depth validation experiments in a "divide-and-conquer" strategy. Moreover, we present empirical evidence that hierarchical and modular structure best explains observed biological networks. We used the static clustering methods in two ways. First, we sought to extend the static methods to dynamic clustering problems and observed general patterns in the dynamics of network modules; as examples, we demonstrate the dynamics of the yeast metabolic cycle and of the Arabidopsis root developmental process. Second, we propose a prioritization scheme that sorts the identified network modules in order of discriminative power. In the course of this research, we conclude that biological networks are best understood as hierarchically organized modules, and that the modules remain stable in unperturbed biological processes but can respond differently to abnormal or external perturbations, such as the knock-down of key enzymes.
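The abstract does not specify its clustering algorithms, so the sketch below swaps in a standard, well-known technique to illustrate the "divide-and-conquer" idea: recursive spectral bisection, which splits a graph by the sign of the Fiedler vector (the Laplacian eigenvector of the second-smallest eigenvalue) and recurses on each half to build a binary module tree. The helper names `spectral_bisection` and `hierarchical_modules` and the `max_depth` stopping rule are hypothetical; this is not the author's method.

```python
from typing import Optional

import numpy as np


def spectral_bisection(adj: np.ndarray) -> tuple:
    """Split nodes by the sign of the Fiedler vector of the graph Laplacian."""
    lap = np.diag(adj.sum(axis=1)) - adj
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    fiedler = vecs[:, 1]           # eigenvector of the second-smallest eigenvalue
    return np.flatnonzero(fiedler < 0), np.flatnonzero(fiedler >= 0)


def hierarchical_modules(adj: np.ndarray, nodes: Optional[np.ndarray] = None,
                         max_depth: int = 2):
    """Hypothetical helper: recursively bisect, returning a nested list (module tree)."""
    if nodes is None:
        nodes = np.arange(adj.shape[0])
    if max_depth == 0 or len(nodes) < 2:
        return sorted(nodes.tolist())
    left, right = spectral_bisection(adj[np.ix_(nodes, nodes)])
    if len(left) == 0 or len(right) == 0:
        return sorted(nodes.tolist())
    return [hierarchical_modules(adj, nodes[left], max_depth - 1),
            hierarchical_modules(adj, nodes[right], max_depth - 1)]


# Two 5-node cliques joined by a single edge (4-5).
A = np.zeros((10, 10))
for block in (range(0, 5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[4, 5] = A[5, 4] = 1.0
print(sorted(hierarchical_modules(A, max_depth=1)))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

The example stops at depth 1 because a clique has no further internal structure; splitting it would be arbitrary. On a graph generated from a hierarchical block model, deeper recursion would be expected to recover nested modules, though spectral bisection carries no guarantee of doing so.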

    Identification of Aberrant Pathway and Network Activity from High-Throughput Data


    Using the hierarchy of biological ontologies to identify mechanisms in flat networks

    Systems biology has provided new resources for discovering and reasoning about mechanisms. In addition to generating databases of large bodies of data, systems biologists have introduced platforms such as Cytoscape to represent protein–protein interactions, gene interactions, and other data as networks. Networks are inherently flat structures. One can identify clusters of highly connected nodes, but network representations do not depict these clusters as being at a higher level than their constituents. Mechanisms, however, are hierarchically organized: they can be decomposed into their parts, and their activities can be decomposed into component operations. Biological ontologies provide a potent bridge between flat networks and hierarchical mechanisms, both those curated by hand, such as the Gene Ontology (GO), and those extracted directly from databases, such as the Network Extracted Ontology (NeXO). I examine several examples in which, by applying ontologies to networks, systems biologists generate new hypotheses about mechanisms, and I characterize these novel strategies for developing mechanistic explanations.

    Bayesian analytical approaches for metabolomics : a novel method for molecular structure-informed metabolite interaction modeling, a novel diagnostic model for differentiating myocardial infarction type, and approaches for compound identification given mass spectrometry data.

    Metabolomics, the study of small molecules in biological systems, has enjoyed great success in enabling researchers to examine disease-associated metabolic dysregulation and has been used to discover biomarkers of disease and phenotypic states. Despite recent technological advances in the analytical platforms used in metabolomics and the proliferation of tools for analyzing metabolomics data, significant challenges in metabolomics data analysis remain. In this dissertation, we present three of these challenges and a Bayesian methodological solution for each. In the first part, we develop a new methodology to serve as a basis for making higher-order inferences in metabolomics, which we define as testing hypotheses that are more complex than single-metabolite hypothesis tests. This methodology uses informative priors, generated by analyzing molecular structure similarity, to estimate metabolite interactomes (probabilistic models) that are organism-, sample-medium-, and condition-specific as well as comprehensive, and that can serve as reference models for studying perturbations in metabolic systems. After describing the development of our methodology, we evaluate its performance in simulation studies and use it to estimate a plasma metabolite interactome for stable heart disease. This interactome may serve as a reference model for evaluating systems-level changes that occur with acute disease events such as myocardial infarction (MI) or unstable angina. In the second part of this work, we address the challenge of developing diagnostic classification models that use metabolite abundances without overfitting relatively small sample sizes, a particular risk given the high dimensionality of metabolite data acquired on platforms such as liquid chromatography-mass spectrometry. 
We use a Bayesian methodology to estimate a multinomial logistic regression classifier that detects acute myocardial infarction and discriminates its subtype, using metabolite abundance data quantified from blood plasma. Because heart disease is the leading cause of death worldwide, a blood-based, non-invasive diagnostic test that could differentiate between MI types at the time of the event would have great utility. In the final part of this dissertation, we review Bayesian approaches to compound identification in metabolomics experiments that use liquid chromatography-mass spectrometry, a problem that remains challenging.
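As a rough illustration of why a prior helps with small, high-dimensional samples, the sketch below fits a multinomial logistic regression by maximum a posteriori (MAP) estimation under an isotropic Gaussian prior on the weights, using plain gradient descent. This is a simplified stand-in: the dissertation's actual priors and posterior inference are not described in the abstract, a MAP point estimate is not a full Bayesian posterior, and the toy data below are invented for illustration, not metabolite measurements. The helper names and hyperparameters (`prior_var`, `lr`, `n_iter`) are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_map_softmax(X, y, n_classes, prior_var=10.0, lr=0.5, n_iter=2000):
    """Hypothetical helper: MAP weights with an N(0, prior_var) prior on each weight."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # append an intercept column (also under the prior)
    Y = np.eye(n_classes)[y]              # one-hot labels
    W = np.zeros((d + 1, n_classes))
    for _ in range(n_iter):
        P = softmax(Xb @ W)
        # Gradient of the negative log posterior, divided by n:
        # likelihood term Xb^T (P - Y) plus Gaussian prior term W / prior_var.
        grad = (Xb.T @ (P - Y) + W / prior_var) / n
        W -= lr * grad
    return W


def predict_proba(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return softmax(Xb @ W)


# Invented toy data: three well-separated classes in two features.
X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0], [2.0, 1.9], [0.0, 2.0], [0.1, 2.0]])
y = np.array([0, 0, 1, 1, 2, 2])
W = fit_map_softmax(X, y, n_classes=3)
print(predict_proba(W, X).argmax(axis=1))
```

The prior term shrinks weights toward zero; a smaller `prior_var` means stronger shrinkage, which is the usual lever against overfitting when there are many features and few samples.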

    Detección de comunidades en redes complejas (Community detection in complex networks)

    xiv, 142 p., figures and supplementary material.
    Networks have become a widely used tool for modeling complex systems in many different fields. This approach is extremely useful for representing interactions among genes, social relationships, Internet communications, or correlations of prices within a stock market, to name just a few examples. By analyzing the structure of these networks and understanding how their different elements interact, we can improve our knowledge of the whole system. The nodes that compose these networks often form tightly knit groups. This property, of high interest in many scientific fields, is called community structure, and improving its detection and characterization is the focus of this thesis. The first objective of this work is to develop efficient methods able to characterize the communities of a network and to understand its structure. Second, we aim to create a set of tests with which such methods can be studied. Finally, we propose a statistical measure to properly assess the quality of the community structure of a network. To accomplish these objectives, we first generate a set of algorithms that transform a network into a hierarchical tree and, from there, determine its most relevant communities. Furthermore, we have developed a new type of benchmark for effectively testing these and other community detection algorithms. Finally, and as the most important contribution of this work, we show that the community structure of a network can be accurately evaluated using an index based on the hypergeometric distribution. Maximizing this measure, called Surprise, therefore appears to be the best strategy proposed for detecting the optimal partition of a network into communities. Surprise performs excellently on all networks analyzed, qualitatively outperforming all previous methods. 
It therefore appears to be the best measure proposed for this purpose, and the data suggest that it could be an optimal strategy for determining the quality of the community structure of complex networks.
    Peer reviewed
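The abstract names Surprise as a hypergeometric-based index without giving the formula. A minimal sketch follows, assuming the definition commonly attributed to Surprise: with F possible links among all nodes, M possible intra-community links, n observed links and p observed intra-community links, S = -log10 of the hypergeometric upper tail, sum over j from p to min(M, n) of C(M, j) C(F - M, n - j) / C(F, n). The function name, the assumption of a simple undirected graph, and the requirement that the partition cover every node are all illustrative choices.

```python
from collections import Counter
from math import comb, log10


def surprise(n_nodes: int, edges: list, partition: dict) -> float:
    """Hypothetical helper: Surprise of a partition of a simple undirected graph.

    Assumes `partition` maps every node to a community id.
    """
    F = n_nodes * (n_nodes - 1) // 2                    # all possible links
    sizes = Counter(partition.values())
    M = sum(s * (s - 1) // 2 for s in sizes.values())   # possible intra-community links
    n = len(edges)                                      # observed links
    p = sum(1 for u, v in edges if partition[u] == partition[v])  # observed intra links
    # Hypergeometric upper tail P(X >= p), accumulated as exact integers.
    tail = sum(comb(M, j) * comb(F - M, n - j) for j in range(p, min(M, n) + 1))
    # log10 of big integers avoids float underflow in the ratio.
    return log10(comb(F, n)) - log10(tail)


# Two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
merged = {v: 0 for v in range(6)}
print(round(surprise(6, edges, split), 3))   # 2.854
print(surprise(6, edges, merged))            # 0.0
```

Here the two-triangle partition scores log10(715), about 2.854, because six of the seven links fall inside communities that could hold only six links out of fifteen possible, an unlikely outcome under random placement. Lumping all nodes into one community scores 0, since every link is then trivially intra-community. Maximizing S over partitions, as the abstract describes, would favor the first split.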