120 research outputs found
Exposing Provenance Metadata Using Different RDF Models
A standard model for exposing structured provenance metadata of scientific
assertions on the Semantic Web would increase interoperability,
discoverability, reliability, as well as reproducibility for scientific
discourse and evidence-based knowledge discovery. Several Resource Description
Framework (RDF) models have been proposed to track provenance. However,
provenance metadata may not only be verbose, but also significantly redundant.
Therefore, an appropriate RDF provenance model should be efficient for
publishing, querying, and reasoning over Linked Data. In the present work, we
have collected millions of pairwise relations between chemicals, genes, and
diseases from multiple data sources, and demonstrated the extent of redundancy
of provenance information in the life science domain. We also evaluated the
suitability of several RDF provenance models for this crowdsourced data set,
including the N-ary model, the Singleton Property model, and the
Nanopublication model. We examined query performance against three commonly
used large RDF stores, including Virtuoso, Stardog, and Blazegraph. Our
experiments demonstrate that query performance depends on both the RDF store and
the RDF provenance model
Pathway databases and tools for their exploitation: benefits, current limitations and challenges
In past years, comprehensive representations of cell signalling pathways have been developed by manual curation from literature, which requires huge effort and would benefit from information stored in databases and from automatic retrieval and integration methods. Once a reconstruction of the network of interactions is achieved, analysis of its structural features and its dynamic behaviour can take place. Mathematical modelling techniques are used to simulate the complex behaviour of cell signalling networks, which ultimately sheds light on the mechanisms leading to complex diseases or helps in the identification of drug targets. A variety of databases containing information on cell signalling pathways have been developed in conjunction with methodologies to access and analyse the data. In principle, the scenario is prepared to make the most of this information for the analysis of the dynamics of signalling pathways. However, are the knowledge repositories of signalling pathways ready to realize the systems biology promise? In this article we aim to initiate this discussion and to provide some insights on this issue
On Reasoning with RDF Statements about Statements using Singleton Property Triples
The Singleton Property (SP) approach has been proposed for representing and
querying metadata about RDF triples such as provenance, time, location, and
evidence. In this approach, one singleton property is created to uniquely
represent a relationship in a particular context, and in general, generates a
large property hierarchy in the schema. It has become the subject of important
questions from Semantic Web practitioners. Can an existing reasoner recognize
the singleton property triples? And how? If the singleton property triples
describe a data triple, then how can a reasoner infer this data triple from the
singleton property triples? Or would the large property hierarchy affect the
reasoners in some way? We address these questions in this paper and present our
study about the reasoning aspects of the singleton properties. We propose a
simple mechanism to enable existing reasoners to recognize the singleton
property triples, as well as to infer the data triples described by the
singleton property triples. We evaluate the effect of the singleton property
triples in the reasoning processes by comparing the performance on RDF datasets
with and without singleton properties. Our evaluation uses as benchmark the
LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal
information added through singleton properties
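The inference step described in this abstract, recovering a data triple from the singleton property triples that describe it, can be sketched in a few lines. This is a minimal illustration, not the mechanism proposed in the paper: it assumes the link between a singleton property and its generic property is named `rdf:singletonPropertyOf`, represents triples as plain Python tuples, and uses hypothetical `ex:` identifiers.

```python
# Assumed name of the predicate linking a singleton property to its generic property.
SINGLETON_PROPERTY_OF = "rdf:singletonPropertyOf"


def infer_data_triples(triples):
    """Return the data triples (s, p, o) implied by singleton property triples."""
    # Map each singleton property to the generic property it specializes.
    generic_of = {s: o for s, p, o in triples if p == SINGLETON_PROPERTY_OF}
    # Any triple whose predicate is a singleton property implies the same
    # statement expressed with the generic property.
    return {(s, generic_of[p], o) for s, p, o in triples if p in generic_of}


# Hypothetical example: one statement with temporal metadata attached.
triples = {
    ("ex:student1", "ex:takesCourse#1", "ex:course7"),
    ("ex:takesCourse#1", SINGLETON_PROPERTY_OF, "ex:takesCourse"),
    ("ex:takesCourse#1", "ex:validFrom", "2015"),
}

inferred = infer_data_triples(triples)
```

Here the metadata triple (`ex:validFrom`) stays attached to the singleton property, while the inferred triple restores the plain statement that a query without provenance would expect.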
From SNPs to pathways: integration of functional effect of sequence variations on models of cell signalling pathways
This article is available from
GUILDify v2.0: A Tool to Identify Molecular Networks Underlying Human Diseases, Their Comorbidities and Their Druggable Targets
The genetic basis of complex diseases involves alterations in multiple genes. Unraveling the interplay between these genetic factors is key to the discovery of new biomarkers and treatments. In 2014, we introduced GUILDify, a web server that searches for genes associated with diseases, finds novel disease genes by applying various network-based prioritization algorithms and proposes candidate drugs. Here, we present GUILDify v2.0, a major update and improvement of the original method, where we have included protein interaction data for seven species and 22 human tissues and incorporated the disease-gene associations from DisGeNET. To infer potential disease relationships associated with multi-morbidities, we introduced a novel feature for estimating the genetic and functional overlap of two diseases using the top-ranking genes and the associated enrichment of biological functions and pathways (as defined by GO and Reactome). The analysis of this overlap helps to identify the mechanistic role of genes and protein-protein interactions in comorbidities. Finally, we provided an R package, guildifyR, to facilitate programmatic access to GUILDify v2.0 (http://sbi.upf.edu/guildify2). The authors received support from: ISCIII-FEDER (PI13/00082, CP10/00524, CPII16/00026); IMI-JU
under grant agreements no. 116030 (TransQST) and no. 777365 (eTRANSAFE), resources of which
are composed of financial contribution from the EU-FP7 (FP7/2007-2013) and EFPIA companies in
kind contribution; the EU H2020 Programme 2014-2020 under grant agreements no. 634143
(MedBioinformatics) and no. 676559 (Elixir-Excelerate); the Spanish Ministry of Economy (MINECO)
[BIO2017-85329-R] [RYC-2015-17519]; "Unidad de Excelencia María de Maeztu", funded by the
Spanish Ministry of Economy [ref: MDM-2014-0370]. The Research Programme on Biomedical
Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII
and is supported by grant PT13/0001/0023, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER
PsyGeNET: a knowledge platform on psychiatric disorders and their genes
Other funding: Innovative Medicines Initiative Joint Undertaking (no. 115372, EMIF and no. 115191, Open PHACTS), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in kind contribution. Summary: PsyGeNET (Psychiatric disorders and Genes association NETwork) is a knowledge platform for the exploratory analysis of psychiatric diseases and their associated genes. PsyGeNET is composed of a database and a web interface supporting data search, visualization, filtering and sharing. PsyGeNET integrates information from DisGeNET and data extracted from the literature by text mining, which has been curated by domain experts. It currently contains 2642 associations between 1271 genes and 37 psychiatric disease concepts. In its first release, PsyGeNET is focused on three psychiatric disorders: major depression, alcohol and cocaine use disorders. PsyGeNET represents a comprehensive, open access resource for the analysis of the molecular mechanisms underpinning psychiatric disorders and their comorbidities
Genetic and functional characterization of disease associations explains comorbidity
Understanding relationships between diseases, such as
comorbidities, has important socio-economic implications,
ranging from clinical study design to health care planning. Most
studies characterize disease comorbidity using shared genetic
origins, ignoring pathway-based commonalities between diseases.
In this study, we define the disease pathways using an
interactome-based extension of known disease-genes and introduce
several measures of functional overlap. The analysis reveals 206
significant links among 94 diseases, giving rise to a highly
clustered disease association network. We observe that around
95% of the links in the disease network, though not identified
by genetic overlap, are discovered by functional overlap. This
disease network portrays rheumatoid arthritis, asthma,
atherosclerosis, pulmonary diseases and Crohn's disease as hubs,
thus pointing to common inflammatory processes underlying
disease pathophysiology. We identify several described
associations such as the inverse comorbidity relationship
between Alzheimer's disease and neoplasms. Furthermore, we
investigate the disruptions in protein interactions by mapping
mutations onto the domains involved in the interaction,
suggesting hypotheses on the causal link between diseases.
Finally, we provide several proof-of-principle examples in which
we model the effect of the mutation and the change of the
association strength, which could explain the observed
comorbidity between diseases caused by the same genetic
alterations
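The contrast drawn in this abstract between genetic overlap and functional overlap can be illustrated with a simple set-overlap measure. The abstract does not specify which measures were used, so the Jaccard index below is an assumption chosen for illustration, and the gene and pathway identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two collections: |A ∩ B| / |A ∪ B| (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical disease annotations: no shared genes, but shared pathways.
genes_a, genes_b = {"G1", "G2"}, {"G3", "G4"}
pathways_a, pathways_b = {"P1", "P2", "P3"}, {"P2", "P3", "P4"}

genetic_overlap = jaccard(genes_a, genes_b)            # no genes in common
functional_overlap = jaccard(pathways_a, pathways_b)   # two of four pathways shared
```

In this toy case the two diseases share no genes, yet half of their combined pathways are common, which is the kind of link that a purely genetic comparison would miss.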
An ensemble learning approach for modeling the systems biology of drug-induced injury
Background: Drug-induced liver injury (DILI) is an adverse reaction caused by the intake of drugs of common use that produces liver damage. DILI is estimated to affect around 20 in 100,000 inhabitants worldwide each year. Despite being one of the main causes of liver failure, the pathophysiology and mechanisms of DILI are poorly understood. In the present study, we developed an ensemble learning approach based on different features (CMap gene expression, chemical structures, drug targets) to predict drugs that might cause DILI and gain a better understanding of the mechanisms linked to the adverse reaction. Results: We searched for gene signatures in CMap gene expression data by using two approaches: phenotype-gene associations data from DisGeNET, and a non-parametric test comparing gene expression of DILI-Concern and No-DILI-Concern drugs (as per DILIrank definitions). The average accuracy of the classifiers in both approaches was 69%. We used chemical structures as features, obtaining an accuracy of 65%. The combination of both types of features produced an accuracy around 63%, but improved the independent hold-out test up to 67%. The use of drug-target associations as a feature obtained the best accuracy (70%) in the independent hold-out test. Conclusions: When using CMap gene expression data, searching for a specific gene signature among the landmark genes improves the quality of the classifiers, but it is still limited by the intrinsic noise of the dataset. When using chemical structures as a feature, the structural diversity of the known DILI-causing drugs hampers the prediction, a problem similar to that encountered with gene expression information. The combination of both features did not improve the quality of the classifiers but increased the robustness as shown on independent hold-out tests.
The use of drug-target associations as a feature improved the prediction, especially the specificity, and the results were comparable to previous research studies. The authors received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreements TransQST and eTRANSAFE (refs: 116030, 777365). This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA companies in kind contribution. The authors also received support from Spanish Ministry of Economy (MINECO, refs: BIO2017–85329-R (FEDER, EU), RYC-2015-17519) as well as EU H2020 Programme 2014–2020 under grant agreement No. 676559 (Elixir-Excelerate) and from Agència de Gestió D’ajuts Universitaris i de Recerca Generalitat de Catalunya (AGAUR, ref.: 2017SGR01020). L.I.F. received support from ISCIII-FEDER (ref: CPII16/00026). The Research Programme on Biomedical Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII and is supported by grant PT13/0001/0023, of the PE I + D + i 2013–2016, funded by ISCIII and FEDER. The DCEXS is a “Unidad de Excelencia María de Maeztu”, funded by the MINECO (ref: MDM-2014-0370). J.A.P. received support from the CAMDA Travel Fellowship
Personalized Respiratory Medicine: Exploring the Horizon, Addressing the Issues. Summary of a BRN-AJRCCM Workshop Held in Barcelona on June 12, 2014.
This Pulmonary Perspective summarizes the content and main conclusions of an international workshop on personalized respiratory medicine co-organized by the Barcelona Respiratory Network (www.brn.cat) and the AJRCCM in June 2014. It discusses (1) its definition and historical, social, legal, and ethical aspects; (2) the view from different disciplines, including basic science, epidemiology, bioinformatics, and network/systems medicine; (3) the bottlenecks and opportunities identified by some currently ongoing projects; and (4) the implications for the individual, the healthcare system and the pharmaceutical industry. The authors hope that, although it is not a systematic review on the subject, this document can be a useful reference for researchers, clinicians, healthcare managers, policy-makers, and industry parties interested in personalized respiratory medicine
Assessment of NER solutions against the first and second CALBC Silver Standard Corpus
Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists of at most a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, which asked the participants to annotate the corpus with their text processing solutions. Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE in comparison to the identification of entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), provided that the participant had not used the annotated data set from the SSC-I for training purposes. The performances of the participants’ solutions were again measured against the SSC-II.
The performances of the annotation solutions again showed better results for DISO and SPE in comparison to CHED and PRGE. Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs’ annotation solutions in comparison to the SSC-I
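The F-measure reported in this abstract is the harmonic mean of precision and recall. A minimal sketch of that computation follows, assuming exact-match comparison of annotations represented as hypothetical (document, start, end, semantic group) tuples; the matching criteria actually used in the CALBC evaluation are not stated here and may differ.

```python
def f_measure(predicted, reference):
    """Balanced F-measure (F1) of predicted annotations against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (document id, start offset, end offset, semantic group).
reference = {
    ("doc1", 0, 7, "DISO"),
    ("doc1", 12, 19, "CHED"),
    ("doc2", 3, 9, "PRGE"),
    ("doc2", 20, 26, "SPE"),
}
predicted = {
    ("doc1", 0, 7, "DISO"),
    ("doc1", 12, 19, "CHED"),
    ("doc2", 3, 9, "PRGE"),
    ("doc2", 30, 35, "SPE"),  # boundary mismatch, counted as an error
}

score = f_measure(predicted, reference)
```

With three of four annotations matching on each side, precision and recall are both 0.75, and so is the resulting F-measure.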