Data integration for biological network databases: MetNetDB labeled graph model and graph matching algorithm
Understanding the cellular functions of genes requires investigating a variety of biological data, including experimental data, annotations from online databases and the literature, information about cellular interactions, and domain knowledge from biologists. These requirements demand a flexible and powerful biological data management system. MetNetDB is the biological database component of the MetNet platform (http://metnetdb.org/), a software platform for Arabidopsis systems biology. This work describes a labeled graph model that addresses the challenges associated with biological network databases, and discusses the implementation of this model in MetNetDB.
MetNetDB integrates the most recent data from various sources, including biological networks, gene annotation, metabolite information, and protein localization data. The integration comprises four steps: data model transformation and integration; semantic mapping; data conversion and integration; and conflict resolution. MetNetDB is built on a labeled graph model. The graph structure supports network data storage and the application of graph analysis algorithms. The node and edge labels have the same extension capability as an object data model. In addition, rules are used to guarantee the integrity of the biological network data, and operations are defined for graph editing and comparison.
To facilitate the integration of network data, which is often inaccurate or incomplete, a subgraph extraction algorithm is designed for MetNetDB. This algorithm allows subgraph querying based on user-specified biomolecules. Both exact matching and approximate matching with biomolecules in networks are supported. The similarity among biomolecules is inferred from expression patterns, gene ontology, chemical ontology, and protein-gene relationships. Combined with an implementation of Messmer's approximate subgraph isomorphism algorithm, MetNetDB supports exact and approximate graph matching.
Based on the MetNetDB labeled graph model and the graph matching algorithms, the MetNetDB curator tool is built with several innovative features, including active biological rule checking during network curation, tracking of data change history, and a biologist-friendly visual graph query system.
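The idea of a labeled graph with exact subgraph matching can be illustrated with a minimal Python sketch. It is not MetNetDB's implementation and not Messmer's algorithm: the class `LabeledGraph`, the function `find_subgraph_matches`, and the labels "gene", "protein", "encodes", and "catalyzes" are all hypothetical names chosen for illustration. The search is a plain backtracking over injective node mappings that preserve node labels and labeled edges, and it handles exact matching only.

```python
from typing import Dict, List, Set, Tuple


class LabeledGraph:
    """Directed graph whose nodes and edges carry string labels (illustrative only)."""

    def __init__(self) -> None:
        self.node_labels: Dict[str, str] = {}
        self.edge_labels: Dict[Tuple[str, str], str] = {}

    def add_node(self, node: str, label: str) -> None:
        self.node_labels[node] = label

    def add_edge(self, src: str, dst: str, label: str) -> None:
        self.edge_labels[(src, dst)] = label


def find_subgraph_matches(pattern: LabeledGraph, target: LabeledGraph) -> List[Dict[str, str]]:
    """Return every injective pattern-node -> target-node mapping that keeps
    node labels equal and maps each pattern edge onto a target edge with the same label."""
    pattern_nodes = list(pattern.node_labels)
    matches: List[Dict[str, str]] = []

    def consistent(mapping: Dict[str, str]) -> bool:
        # Check only pattern edges whose two endpoints are already mapped.
        for (src, dst), label in pattern.edge_labels.items():
            if src in mapping and dst in mapping:
                if target.edge_labels.get((mapping[src], mapping[dst])) != label:
                    return False
        return True

    def extend(index: int, mapping: Dict[str, str], used: Set[str]) -> None:
        if index == len(pattern_nodes):
            matches.append(dict(mapping))
            return
        p = pattern_nodes[index]
        for t, t_label in target.node_labels.items():
            if t in used or t_label != pattern.node_labels[p]:
                continue
            mapping[p] = t
            if consistent(mapping):
                used.add(t)
                extend(index + 1, mapping, used)
                used.discard(t)
            del mapping[p]

    extend(0, {}, set())
    return matches


# Toy network: two genes encoding two proteins, one of which catalyzes a reaction.
network = LabeledGraph()
for name, label in [("g1", "gene"), ("p1", "protein"), ("m1", "metabolite"),
                    ("g2", "gene"), ("p2", "protein")]:
    network.add_node(name, label)
network.add_edge("g1", "p1", "encodes")
network.add_edge("p1", "m1", "catalyzes")
network.add_edge("g2", "p2", "encodes")

# Query: any gene that encodes any protein.
query = LabeledGraph()
query.add_node("x", "gene")
query.add_node("y", "protein")
query.add_edge("x", "y", "encodes")

matches = find_subgraph_matches(query, network)
```

An approximate variant would replace the label equality tests with a similarity score and a threshold; how MetNetDB computes that similarity is described only at the level of the abstract above.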
Structure induction by lossless graph compression
This work is motivated by the necessity to automate the discovery of
structure in vast and ever-growing collections of relational data commonly
represented as graphs, for example genomic networks. A novel algorithm, dubbed
Graphitour, for structure induction by lossless graph compression is presented
and illustrated by a clear and broadly known case of nested structure in a DNA
molecule. This work extends to graphs some well established approaches to
grammatical inference previously applied only to strings. The bottom-up graph
compression problem is related to the maximum cardinality (non-bipartite)
matching problem. The algorithm accepts a variety of graph
types including directed graphs and graphs with labeled nodes and arcs. The
resulting structure could be used for representation and classification of
graphs.
Comment: 10 pages, 7 figures, 2 tables; published in Proceedings of the Data
Compression Conference, 200
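The link between bottom-up compression and matching can be made concrete with a small Python sketch of a single compression step. This is not the Graphitour algorithm: it swaps the maximum cardinality matching for a simple greedy maximal matching, assumes an undirected graph with integer node ids and no self-loops, and omits the bookkeeping that a lossless scheme would need to reconstruct the original graph. The function name `compress_once` and the rule label "S" are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


def compress_once(labels: Dict[int, str], edges: Set[Edge], new_label: str):
    """Merge non-overlapping edges of the most frequent label pair into new nodes.

    Returns the new node labels, the new edge set, and the induced rule
    (new_label, (label_a, label_b)).
    """
    def pair_of(edge: Edge) -> Tuple[str, str]:
        u, v = edge
        a, b = sorted((labels[u], labels[v]))
        return (a, b)

    # Pick the most frequent unordered label pair (ties broken lexicographically).
    counts = Counter(pair_of(e) for e in edges)
    best_pair, _ = max(sorted(counts.items()), key=lambda kv: kv[1])

    # Greedy maximal matching over edges of that type: a stand-in for the
    # maximum cardinality matching mentioned in the abstract.
    matched: Set[int] = set()
    merged_into: Dict[int, int] = {}
    new_labels = dict(labels)
    next_id = max(labels) + 1
    for u, v in sorted(edges):
        if pair_of((u, v)) != best_pair or u in matched or v in matched:
            continue
        matched.update((u, v))
        merged_into[u] = merged_into[v] = next_id
        new_labels[next_id] = new_label
        del new_labels[u], new_labels[v]
        next_id += 1

    # Re-point the remaining edges at the merged nodes and drop contracted edges.
    new_edges: Set[Edge] = set()
    for u, v in edges:
        a, b = merged_into.get(u, u), merged_into.get(v, v)
        if a != b:
            new_edges.add((min(a, b), max(a, b)))
    return new_labels, new_edges, (new_label, best_pair)


# A chain A-B-A-B-A-B: every edge joins an A node to a B node.
chain_labels = {0: "A", 1: "B", 2: "A", 3: "B", 4: "A", 5: "B"}
chain_edges = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)}
out_labels, out_edges, rule = compress_once(chain_labels, chain_edges, "S")
```

On the chain, three disjoint A-B edges are merged into three "S" nodes, leaving a shorter chain that a further step could compress again.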
Comprehensive evaluation of deep and graph learning on drug-drug interactions prediction
Recent advances and achievements of artificial intelligence (AI) as well as
deep and graph learning models have established their usefulness in biomedical
applications, especially in drug-drug interactions (DDIs). DDIs refer to a
change in the effect of one drug due to the presence of another drug in the
human body, and they play an essential role in drug discovery and clinical
research. Predicting DDIs through traditional clinical trials and experiments
is an expensive and time-consuming process. To correctly apply advanced AI and
deep learning, developers and users face various challenges, such as the
availability and encoding of data resources and the design of computational
methods. This review summarizes chemical structure based, network based, NLP
based and hybrid methods, providing an updated and accessible guide for the
broad research and development community with different domain knowledge. We
introduce widely used molecular representations and describe the theoretical
frameworks of graph neural network models for representing molecular
structures. We present the advantages and disadvantages of deep and graph
learning methods by performing comparative experiments. We discuss the
potential technical challenges and highlight future directions of deep and
graph learning models for accelerating DDIs prediction.
Comment: Accepted by Briefings in Bioinformatics
Biochemical complex data generation and integration in genome-scale metabolic models
Master's dissertation in Bioinformatics
The (re-)construction of Genome-Scale Metabolic (GSM) models is highly dependent on
biochemical databases. In fact, the biochemical data within these databases is limited and
often lacks structurally defined representations of compounds. In order to circumvent
this limitation, compounds are frequently represented by their generic version. Lipids are
paradigmatic cases: given that a multitude of lipid species can occur in nature, not only is
their storage in databases hampered, but also their integration into GSM models. Accordingly,
converting one lipid version, in GSM models, into another can be tricky, as these compounds
possess side chains that are likely to be transferred all across their biosynthetic network.
Hence, converting a lipid implies that all its precursors have to be converted as well, requiring
information on lipid specificity and biosynthetic context.
The present work presents a strategy to tackle this issue. The pipeline of Biochemical cOmplex data
Integration in Metabolic Models at Genome scale (BOIMMG) encompasses the
integration and processing of biochemical data from different sources, aiming at expanding the
current knowledge in lipid biosynthesis, and its integration in GSM models.
Generic reactions retrieved from MetaCyc were handled and transformed into reactions with
structurally defined lipid species. More than 30 generic reactions were fully (and 27 partially)
characterized, allowing the prediction of over 30,000 new lipid structures and their biosynthetic context.
The integration of BOIMMG’s data into GSM models was conducted for the metabolism of electron-transfer
quinones, glycerolipids, and phospholipids. Validation was based on the
comparison of models with different versions of these metabolites. BOIMMG’s conversion
modules were applied to Escherichia coli’s iJR904 model [1], generating 53 more matching lipids
and 38 more matching reactions with iJR904 model’s iteration iAF1260b [2, 3], in which the
conversion was performed and curated manually.
To the best of our knowledge, BOIMMG’s database is the only one with biosynthetic information
regarding structurally defined lipids. Moreover, there is no other state-of-the-art tool capable
of automatically generating complex lipid-specific networks.
The reconstruction of genome-scale metabolic (GSM) models depends heavily on the biochemical
information available in databases. In fact, this information is often limited and may not
contain representations of structurally defined compounds. To circumvent this limitation,
chemical compounds are frequently represented by their generic form. Lipids are paradigmatic
cases: because a multitude of different lipid species occur in nature, their storage in
databases is hampered, as is their integration into GSM models. Thus, converting lipids from a
generic version into a structurally defined version is not trivial, since these compounds have
side chains that are transferred along their biosynthetic pathways. Consequently, this
conversion implies that all precursors of these lipids must also be converted, which requires
information on specific lipids and their biosynthetic relationships.
The present work presents a strategy to solve this problem. The pipeline of the software
developed in this work, Biochemical cOmplex data Integration in Metabolic Models at Genome
scale (BOIMMG), encompasses the integration and processing of biochemical data from different
sources, aiming to expand current knowledge of lipid biosynthesis and to integrate it into GSM
models.
Regarding the second phase, generic reactions extracted from the MetaCyc database were
processed and transformed into reactions with structurally defined lipids. More than 30 generic
reactions were fully (and 27 partially) characterized, allowing the prediction of more than
30,000 new lipid structures and their biosynthetic contexts.
The integration of the data into GSM models was carried out for the metabolism of
electron-transfer quinones, glycerolipids, and phospholipids. Validation took into account the
comparison between models with different versions of these metabolites. BOIMMG's conversion
modules were applied to the Escherichia coli iJR904 model [1], generating 53 more lipids and 38
more reactions that are found in the iAF1260b model [2, 3], an iteration of the iJR904 model in
which lipid conversion was carried out manually.
The database generated by the BOIMMG method is the only one that contains biosynthetic
information on structurally defined lipids. In addition, BOIMMG is a unique tool that allows
complex lipid networks to be generated automatically.
HeTriNet: Heterogeneous Graph Triplet Attention Network for Drug-Target-Disease Interaction
Modeling the interactions between drugs, targets, and diseases is paramount
in drug discovery and has significant implications for precision medicine and
personalized treatments. Current approaches frequently consider drug-target or
drug-disease interactions individually, ignoring the interdependencies among
all three entities. Within human metabolic systems, drugs interact with protein
targets in cells, influencing target activities and subsequently impacting
biological pathways to promote healthy functions and treat diseases. Moving
beyond binary relationships and exploring tighter triple relationships is
essential to understanding drugs' mechanisms of action (MoAs). Moreover,
identifying the heterogeneity of drugs, targets, and diseases, along with their
distinct characteristics, is critical to model these complex interactions
appropriately. To address these challenges, we effectively model the
interconnectedness of all entities in a heterogeneous graph and develop a novel
Heterogeneous Graph Triplet Attention Network (HeTriNet).
HeTriNet introduces a novel triplet attention mechanism within this
heterogeneous graph structure. Beyond pairwise attention, which captures the
importance of one entity to another, we define triplet attention to model the
importance of pairs for entities in the drug-target-disease triplet prediction
problem. Experimental results on real-world datasets show that
HeTriNet outperforms several baselines, demonstrating its remarkable
proficiency in uncovering novel drug-target-disease relationships.
Comment: 13 pages, 3 figures, 6 tables
Batch kernel SOM and related Laplacian methods for social network analysis
Large graphs are natural mathematical models for describing the structure of
the data in a wide variety of fields, such as web mining, social networks,
information retrieval, biological networks, etc. For all these applications,
automatic tools are required to get a synthetic view of the graph and to reach
a good understanding of the underlying problem. In particular, discovering
groups of tightly connected vertices and understanding the relations between
those groups is very important in practice. This paper shows how a kernel
version of the batch Self Organizing Map can be used to achieve these goals via
kernels derived from the Laplacian matrix of the graph, especially when it is
used in conjunction with more classical methods based on the spectral analysis
of the graph. The proposed method is used to explore the structure of a
medieval social network modeled through a weighted graph that has been directly
built from a large corpus of agrarian contracts.
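For readers unfamiliar with kernels derived from the graph Laplacian, the standard definitions can be written out as follows. The abstract does not say which kernels the paper uses, so the heat (diffusion) kernel shown here is only one common example, and the symbols $w_{ij}$, $d_i$, $L$, $\beta$, and $K^{\beta}$ are notation assumed for this sketch.

```latex
% Weighted undirected graph with symmetric weights w_{ij} \ge 0 on n vertices.
\[
  d_i = \sum_{j=1}^{n} w_{ij}, \qquad
  L_{ij} =
  \begin{cases}
    d_i      & \text{if } i = j, \\
    -w_{ij}  & \text{if } i \neq j,
  \end{cases}
\]
% Heat (diffusion) kernel with diffusion parameter \beta > 0.
\[
  K^{\beta} = e^{-\beta L} = \sum_{k=0}^{\infty} \frac{(-\beta L)^{k}}{k!}.
\]
```

Because $L$ is symmetric and positive semidefinite, $K^{\beta}$ is symmetric and positive definite, so it can serve as the similarity measure between vertices in any kernel method, including a kernel Self Organizing Map.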
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.