
    Data integration for biological network databases: MetNetDB labeled graph model and graph matching algorithm

    Understanding the cellular functions of genes requires investigating a variety of biological data, including experimental data, annotation from online databases and the literature, information about cellular interactions, and domain knowledge from biologists. These requirements demand a flexible and powerful biological data management system. MetNetDB is the biological database component of the MetNet platform (http://metnetdb.org/), a software platform for Arabidopsis systems biology. This work describes a labeled graph model that addresses the challenges associated with biological network databases, and discusses the implementation of this model in MetNetDB. MetNetDB integrates the most recent data from various sources, including biological networks, gene annotation, metabolite information, and protein localization data. The integration comprises four steps: data model transformation and integration; semantic mapping; data conversion and integration; and conflict resolution. MetNetDB is built on a labeled graph model. The graph structure supports network data storage and the application of graph analysis algorithms. The node and edge labels have the same extension capability as an object data model. In addition, rules are used to guarantee the integrity of the biological network data, and operations are defined for graph editing and comparison. To facilitate the integration of network data, which is often inaccurate or incomplete, a subgraph extraction algorithm is designed for MetNetDB. This algorithm allows subgraph querying based on user-specified biomolecules. Both exact matching and approximate matching with biomolecules in networks are supported. The similarity among biomolecules is inferred from expression patterns, gene ontology, chemical ontology, and protein-gene relationships. Combined with the implementation of Messmer's approximate subgraph isomorphism algorithm, MetNetDB supports exact and approximate graph matching.
Based on the MetNetDB labeled graph model and the graph matching algorithms, the MetNetDB curator tool is built with several innovative features, including active biological rule checking during network curation, tracking of data change history, and a biologist-friendly visual graph query system.
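
The labeled graph and the seed-based subgraph query described in this abstract can be illustrated with a minimal Python sketch. It is not MetNetDB's implementation: the class `LabeledGraph`, the method `extract_subgraph`, the `similar` predicate, the endpoint-existence rule, and the toy node names are all assumptions made for illustration. Exact matching keeps only the requested biomolecules; approximate matching also keeps nodes that a user-supplied similarity predicate accepts.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """Toy labeled graph: nodes and edges each carry a dict of labels."""
    nodes: dict = field(default_factory=dict)  # node id -> label dict
    edges: dict = field(default_factory=dict)  # (src, dst) -> label dict

    def add_node(self, node_id, **labels):
        self.nodes[node_id] = labels

    def add_edge(self, src, dst, **labels):
        # Hypothetical integrity rule: both endpoints must already exist.
        if src not in self.nodes or dst not in self.nodes:
            raise ValueError(f"edge ({src}, {dst}) references an unknown node")
        self.edges[(src, dst)] = labels

    def extract_subgraph(self, seeds, similar=None):
        """Return the subgraph induced by the seed nodes and, if `similar`
        is given, by nodes that `similar(seed, node)` accepts for some seed."""
        keep = {n for n in self.nodes if n in seeds}
        if similar is not None:
            keep |= {n for n in self.nodes
                     if any(similar(s, n) for s in seeds)}
        sub = LabeledGraph()
        for n in keep:
            sub.add_node(n, **self.nodes[n])
        for (src, dst), labels in self.edges.items():
            if src in keep and dst in keep:
                sub.add_edge(src, dst, **labels)
        return sub


# Toy network with made-up identifiers.
g = LabeledGraph()
g.add_node("geneA", kind="gene")
g.add_node("protA", kind="protein")
g.add_node("metX", kind="metabolite")
g.add_edge("geneA", "protA", interaction="encodes")
g.add_edge("protA", "metX", interaction="catalyzes")

# Exact matching: only the listed biomolecules are kept.
exact = g.extract_subgraph({"geneA", "protA"})
# Approximate matching: a stand-in similarity predicate pulls in protA.
approx = g.extract_subgraph({"geneA"}, similar=lambda s, n: n == "protA")
```

In a real system the `similar` predicate would be backed by the similarity sources the abstract lists (expression patterns, ontologies, protein-gene relationships); here it is a placeholder lambda.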

    Structure induction by lossless graph compression

    This work is motivated by the need to automate the discovery of structure in vast and ever-growing collections of relational data commonly represented as graphs, for example genomic networks. A novel algorithm, dubbed Graphitour, for structure induction by lossless graph compression is presented and illustrated by a clear and broadly known case of nested structure in a DNA molecule. This work extends to graphs some well-established approaches to grammatical inference previously applied only to strings. The bottom-up graph compression problem is related to the (non-bipartite) maximum cardinality matching problem. The algorithm accepts a variety of graph types, including directed graphs and graphs with labeled nodes and arcs. The resulting structure could be used for representation and classification of graphs. Comment: 10 pages, 7 figures, 2 tables; published in Proceedings of the Data Compression Conference, 200
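
To show why bottom-up compression leads to a matching problem, here is a hedged Python sketch of a single compression step. It is not Graphitour itself: the function `compress_once` and the toy graph are assumptions, the graph is undirected with no self-loops, and a greedy maximal matching stands in for a true maximum cardinality matching. The sketch also omits the bookkeeping a lossless scheme needs to reconstruct the original graph.

```python
from collections import Counter


def compress_once(labels, edges, new_label):
    """One bottom-up compression step on an undirected labeled graph.

    labels: dict node -> label; edges: set of frozenset({u, v}).
    Picks the most frequent pair of endpoint labels, selects
    non-overlapping edges of that type with a greedy maximal matching
    (a stand-in for maximum cardinality matching), and merges each
    selected pair into one node labeled `new_label`.
    """
    def edge_type(e):
        return tuple(sorted(labels[n] for n in e))

    counts = Counter(edge_type(e) for e in edges)
    if not counts:
        return labels, edges, None
    best_type, _ = counts.most_common(1)[0]

    matched, merged_into = set(), {}
    for e in sorted(edges, key=sorted):  # deterministic visiting order
        if edge_type(e) == best_type and not (e & matched):
            u, v = sorted(e)
            matched |= e
            # Both endpoints map to the same fresh node name.
            merged_into[u] = merged_into[v] = f"{new_label}{len(merged_into) // 2}"

    new_labels = {merged_into.get(n, n): labels[n] for n in labels}
    for new_node in set(merged_into.values()):
        new_labels[new_node] = new_label

    new_edges = set()
    for e in edges:
        u, v = (merged_into.get(n, n) for n in e)
        if u != v:  # drop the contracted edge itself
            new_edges.add(frozenset({u, v}))
    return new_labels, new_edges, best_type


# Toy chain a1 - b1 - a2 - b2 - c with labels A, B, A, B, C.
labels = {"a1": "A", "b1": "B", "a2": "A", "b2": "B", "c": "C"}
edges = {frozenset(p) for p in
         [("a1", "b1"), ("b1", "a2"), ("a2", "b2"), ("b2", "c")]}
new_labels, new_edges, rule = compress_once(labels, edges, "P")
```

The edges of the chosen type must not share endpoints, otherwise one node would be merged twice; selecting as many such disjoint edges as possible is exactly a maximum cardinality matching, which the greedy pass only approximates.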

    Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction

    Recent advances in artificial intelligence (AI), including deep and graph learning models, have established their usefulness in biomedical applications, especially in predicting drug-drug interactions (DDIs). A DDI is a change in the effect of one drug due to the presence of another drug in the human body, and DDIs play an essential role in drug discovery and clinical research. Predicting DDIs through traditional clinical trials and experiments is an expensive and time-consuming process. To apply advanced AI and deep learning correctly, developers and users face various challenges, such as the availability and encoding of data resources and the design of computational methods. This review summarizes chemical structure based, network based, NLP based, and hybrid methods, providing an updated and accessible guide for the broad research and development community with different domain knowledge. We introduce widely used molecular representations and describe the theoretical frameworks of graph neural network models for representing molecular structures. We present the advantages and disadvantages of deep and graph learning methods by performing comparative experiments. We discuss the potential technical challenges and highlight future directions of deep and graph learning models for accelerating DDI prediction. Comment: Accepted by Briefings in Bioinformatics

    Biochemical complex data generation and integration in genome-scale metabolic models

    Master's dissertation in Bioinformatics. The (re-)construction of Genome-Scale Metabolic (GSM) models is highly dependent on biochemical databases. In fact, the biochemical data within these databases is limited and, most of the time, lacks structurally defined representations of compounds. To circumvent this limitation, compounds are frequently represented by their generic version. Lipids are paradigmatic cases: given that a multitude of lipid species can occur in nature, not only is their storage in databases hampered, but also their integration into GSM models. Accordingly, converting one lipid version into another in GSM models can be tricky, as these compounds possess side chains that are likely to be transferred all across their biosynthetic network. Hence, converting a lipid implies that all its precursors have to be converted as well, requiring information on lipid specificity and biosynthetic context. The present work presents a strategy to tackle this issue. The pipeline of Biochemical cOmplex data Integration in Metabolic Models at Genome scale (BOIMMG) encompasses the integration and processing of biochemical data from different sources, aiming at expanding the current knowledge of lipid biosynthesis and its integration in GSM models. Generic reactions retrieved from MetaCyc were handled and transformed into reactions with structurally defined lipid species. More than 30 generic reactions were fully (and 27 partially) characterized, allowing the prediction of over 30000 new lipid structures and their biosynthetic context. The integration of BOIMMG's data into GSM models was conducted for the metabolism of electron-transfer quinones, glycerolipids, and phospholipids. Validation relied on the comparison of models with different versions of these metabolites.
BOIMMG's conversion modules were applied to the Escherichia coli iJR904 model [1], generating 53 more matching lipids and 38 more matching reactions with iAF1260b [2, 3], an iteration of the iJR904 model in which the conversion was performed and curated manually. To the best of our knowledge, BOIMMG's database is the only one with biosynthetic information regarding structurally defined lipids. Moreover, there is no other state-of-the-art tool capable of automatically generating complex lipid-specific networks.
The reconstruction of genome-scale metabolic (GSM) models depends heavily on the biochemical information available in databases. In fact, this information is often limited and may lack representations of structurally defined compounds. To circumvent this limitation, chemical compounds are frequently represented by their generic form. Lipids are paradigmatic cases, since a multitude of different lipid species occur in nature, which hampers their storage in databases as well as their integration into GSM models. Consequently, converting lipids from a generic version to a structurally defined version is not trivial, because these compounds have side chains that are transferred along their biosynthetic pathways. This conversion therefore implies that all precursors of these lipids must also be converted, which requires information on specific lipids and their biosynthetic relationships. The present work presents a strategy to solve this problem. The pipeline of the software developed in this work, Biochemical cOmplex data Integration in Metabolic Models at Genome scale (BOIMMG), encompasses the integration and processing of biochemical data from different sources, aiming to expand current knowledge of lipid biosynthesis and to integrate it into GSM models.
Regarding the second phase, generic reactions extracted from the MetaCyc database were processed and transformed into reactions with structurally defined lipids. More than 30 generic reactions were fully (and 27 partially) characterized, making it possible to predict more than 30000 new lipid structures and their biosynthetic contexts. The data were integrated into GSM models for the metabolism of electron-transfer quinones, glycerolipids, and phospholipids. Validation was based on comparing models with different versions of these metabolites. BOIMMG's conversion modules were applied to the Escherichia coli iJR904 model [1], generating 53 more lipids and 38 more reactions that are found in the iAF1260b model [2, 3], an iteration of iJR904 in which lipid conversion was carried out manually. The database generated by the BOIMMG method is the only one containing biosynthetic information on structurally defined lipids. In addition, BOIMMG is a unique tool that can generate complex lipid networks automatically.
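
The statement that converting a lipid forces the conversion of all its precursors amounts to a reachability computation over the biosynthetic network. The following Python sketch illustrates that idea only; it is not BOIMMG's code. The function `precursors_to_convert` and the small `precursor_of` map are hypothetical, and the pathway shown is a simplified illustration rather than data taken from the dissertation.

```python
def precursors_to_convert(target, precursor_of):
    """Collect the target lipid and every upstream precursor reachable
    from it, i.e. everything that must be converted together."""
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(precursor_of.get(node, ()))
    return seen


# Hypothetical generic lipid network: compound -> its direct precursors.
precursor_of = {
    "phosphatidylethanolamine": ["phosphatidylserine"],
    "phosphatidylserine": ["CDP-diacylglycerol"],
    "CDP-diacylglycerol": ["phosphatidate"],
    "cardiolipin": ["phosphatidylglycerol"],
}

to_convert = precursors_to_convert("phosphatidylethanolamine", precursor_of)
```

Because side chains are carried along the pathway, every compound in `to_convert` would need the same structurally defined side chains as the target; compounds on unrelated branches, such as the cardiolipin entry here, are left untouched.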

    HeTriNet: Heterogeneous Graph Triplet Attention Network for Drug-Target-Disease Interaction

    Modeling the interactions between drugs, targets, and diseases is paramount in drug discovery and has significant implications for precision medicine and personalized treatments. Current approaches frequently consider drug-target or drug-disease interactions individually, ignoring the interdependencies among all three entities. Within human metabolic systems, drugs interact with protein targets in cells, influencing target activities and subsequently impacting biological pathways to promote healthy functions and treat diseases. Moving beyond binary relationships and exploring tighter triple relationships is essential to understanding drugs' mechanisms of action (MoAs). Moreover, identifying the heterogeneity of drugs, targets, and diseases, along with their distinct characteristics, is critical to modeling these complex interactions appropriately. To address these challenges, we model the interconnectedness of all entities in a heterogeneous graph and develop a novel Heterogeneous Graph Triplet Attention Network (HeTriNet). HeTriNet introduces a novel triplet attention mechanism within this heterogeneous graph structure. Beyond pairwise attention, which captures the importance of one entity for another, we define triplet attention to model the importance of pairs for entities in the drug-target-disease triplet prediction problem. Experimental results on real-world datasets show that HeTriNet outperforms several baselines, demonstrating its proficiency in uncovering novel drug-target-disease relationships. Comment: 13 pages, 3 figures, 6 tables

    Batch kernel SOM and related Laplacian methods for social network analysis

    Large graphs are natural mathematical models for describing the structure of data in a wide variety of fields, such as web mining, social networks, information retrieval, and biological networks. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self-Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts.
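
As a rough illustration of what a kernel derived from the graph Laplacian can look like, the Python sketch below builds the Laplacian L = D - W of a small weighted graph and approximates the heat (diffusion) kernel exp(-beta L) with a truncated Taylor series. The heat kernel is one common choice and is an assumption here, since the abstract does not say which kernels the paper uses; the function names, the toy weights, and the value of `beta` are also illustrative.

```python
def laplacian(weights, nodes):
    """Return L = D - W as a nested list for an undirected weighted graph.
    weights: dict (u, v) -> positive weight, one entry per undirected edge."""
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    L = [[0.0] * size for _ in range(size)]
    for (u, v), w in weights.items():
        i, j = index[u], index[v]
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L


def heat_kernel(L, beta, terms=30):
    """Approximate K = exp(-beta * L) by summing the first `terms`
    terms of its Taylor series (adequate for small graphs and small beta)."""
    size = len(L)
    K = [[float(i == j) for j in range(size)] for i in range(size)]  # identity
    term = [row[:] for row in K]
    for k in range(1, terms):
        # term <- term @ (-beta * L) / k
        term = [[sum(term[i][m] * -beta * L[m][j] for m in range(size)) / k
                 for j in range(size)] for i in range(size)]
        K = [[K[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return K


# Toy weighted path a - b - c (weights are made up).
nodes = ["a", "b", "c"]
L = laplacian({("a", "b"): 2.0, ("b", "c"): 1.0}, nodes)
K = heat_kernel(L, beta=0.5)
```

The entry `K[i][j]` can be read as a similarity between vertices i and j, so tightly connected vertices get larger kernel values; a kernel Self-Organizing Map would then operate on `K` instead of on raw coordinates.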

    Transforming Graph Representations for Statistical Relational Learning

    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.