120 research outputs found

    Exposing Provenance Metadata Using Different RDF Models

    A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase interoperability, discoverability, reliability, as well as reproducibility for scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata may not only be verbose, but also significantly redundant. Therefore, an appropriate RDF provenance model should be efficient for publishing, querying, and reasoning over Linked Data. In the present work, we have collected millions of pairwise relations between chemicals, genes, and diseases from multiple data sources, and demonstrated the extent of redundancy of provenance information in the life science domain. We also evaluated the suitability of several RDF provenance models for this crowdsourced data set, including the N-ary model, the Singleton Property model, and the Nanopublication model. We examined query performance against three commonly used large RDF stores: Virtuoso, Stardog, and Blazegraph. Our experiments demonstrate that query performance depends on both the RDF store and the RDF provenance model.

    Pathway databases and tools for their exploitation: benefits, current limitations and challenges

    In past years, comprehensive representations of cell signalling pathways have been developed by manual curation from literature, which requires huge effort and would benefit from information stored in databases and from automatic retrieval and integration methods. Once a reconstruction of the network of interactions is achieved, analysis of its structural features and its dynamic behaviour can take place. Mathematical modelling techniques are used to simulate the complex behaviour of cell signalling networks, which ultimately sheds light on the mechanisms leading to complex diseases or helps in the identification of drug targets. A variety of databases containing information on cell signalling pathways have been developed in conjunction with methodologies to access and analyse the data. In principle, the scenario is prepared to make the most of this information for the analysis of the dynamics of signalling pathways. However, are the knowledge repositories of signalling pathways ready to realize the systems biology promise? In this article we aim to initiate this discussion and to provide some insights on this issue

    On Reasoning with RDF Statements about Statements using Singleton Property Triples

    The Singleton Property (SP) approach has been proposed for representing and querying metadata about RDF triples such as provenance, time, location, and evidence. In this approach, one singleton property is created to uniquely represent a relationship in a particular context, and in general, generates a large property hierarchy in the schema. It has become the subject of important questions from Semantic Web practitioners. Can an existing reasoner recognize the singleton property triples? And how? If the singleton property triples describe a data triple, then how can a reasoner infer this data triple from the singleton property triples? Or would the large property hierarchy affect the reasoners in some way? We address these questions in this paper and present our study about the reasoning aspects of the singleton properties. We propose a simple mechanism to enable existing reasoners to recognize the singleton property triples, as well as to infer the data triples described by the singleton property triples. We evaluate the effect of the singleton property triples in the reasoning processes by comparing the performance on RDF datasets with and without singleton properties. Our evaluation uses as benchmark the LUBM datasets and the LUBM-SP datasets derived from LUBM with temporal information added through singleton properties
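    To make the idea in the abstract above concrete, here is a minimal sketch of a singleton property representation and of recovering the plain data triple from it. It uses ordinary Python tuples rather than an RDF library, and the identifiers (`ex:aspirin`, `ex:treats`, `ex:sourceA`), the `#1` naming scheme and both helper functions are hypothetical illustrations. The inference rule shown is a simplified stand-in, not necessarily the mechanism the paper proposes.

```python
# Illustrative only: triples are plain (subject, predicate, object) tuples.
SINGLETON_OF = "rdf:singletonPropertyOf"  # link from a singleton property to its generic property


def make_singleton_triples(s, p, o, metadata, uid):
    """Represent (s, p, o) plus metadata using one unique (singleton) property."""
    sp = f"{p}#{uid}"  # hypothetical naming scheme for the singleton property
    triples = [(s, sp, o), (sp, SINGLETON_OF, p)]
    # Metadata such as provenance is attached to the singleton property itself.
    triples += [(sp, key, value) for key, value in metadata.items()]
    return triples


def infer_data_triples(triples):
    """Recover the generic data triples described by singleton property triples."""
    generic = {s: o for s, p, o in triples if p == SINGLETON_OF}
    return {(s, generic[p], o) for s, p, o in triples if p in generic}


triples = make_singleton_triples(
    "ex:aspirin", "ex:treats", "ex:headache",
    {"prov:wasDerivedFrom": "ex:sourceA"}, uid=1,
)
inferred = infer_data_triples(triples)
```

    Because every contextualised statement gets its own property, the number of distinct properties grows with the data, which is the large property hierarchy the abstract mentions.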

    GUILDify v2.0: A Tool to Identify Molecular Networks Underlying Human Diseases, Their Comorbidities and Their Druggable Targets

    The genetic basis of complex diseases involves alterations in multiple genes. Unraveling the interplay between these genetic factors is key to the discovery of new biomarkers and treatments. In 2014, we introduced GUILDify, a web server that searches for genes associated with diseases, finds novel disease genes by applying various network-based prioritization algorithms, and proposes candidate drugs. Here, we present GUILDify v2.0, a major update and improvement of the original method, where we have included protein interaction data for seven species and 22 human tissues and incorporated the disease-gene associations from DisGeNET. To infer potential disease relationships associated with multi-morbidities, we introduced a novel feature for estimating the genetic and functional overlap of two diseases using the top-ranking genes and the associated enrichment of biological functions and pathways (as defined by GO and Reactome). The analysis of this overlap helps to identify the mechanistic role of genes and protein-protein interactions in comorbidities. Finally, we provided an R package, guildifyR, to facilitate programmatic access to GUILDify v2.0 (http://sbi.upf.edu/guildify2). The authors received support from: ISCIII-FEDER (PI13/00082, CP10/00524, CPII16/00026); IMI-JU under grant agreements no. 116030 (TransQST) and no. 777365 (eTRANSAFE), resources of which are composed of financial contribution from the EU-FP7 (FP7/2007-2013) and EFPIA companies in kind contribution; the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate); the Spanish Ministry of Economy (MINECO) [BIO2017-85329-R] [RYC-2015-17519]; "Unidad de Excelencia María de Maeztu", funded by the Spanish Ministry of Economy [ref: MDM-2014-0370]. The Research Programme on Biomedical Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII and is supported by grant PT13/0001/0023, of the PE I+D+i 2013-2016, funded by ISCIII and FEDER.

    PsyGeNET : a knowledge platform on psychiatric disorders and their genes

    Other funding: Innovative Medicines Initiative Joint Undertaking (no. 115372, EMIF and no. 115191, Open PHACTS), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in kind contribution. Summary: PsyGeNET (Psychiatric disorders and Genes association NETwork) is a knowledge platform for the exploratory analysis of psychiatric diseases and their associated genes. PsyGeNET is composed of a database and a web interface supporting data search, visualization, filtering and sharing. PsyGeNET integrates information from DisGeNET and data extracted from the literature by text mining, which has been curated by domain experts. It currently contains 2642 associations between 1271 genes and 37 psychiatric disease concepts. In its first release, PsyGeNET is focused on three psychiatric disorders: major depression, alcohol use disorder and cocaine use disorder. PsyGeNET represents a comprehensive, open access resource for the analysis of the molecular mechanisms underpinning psychiatric disorders and their comorbidities.

    Genetic and functional characterization of disease associations explains comorbidity

    Understanding relationships between diseases, such as comorbidities, has important socio-economic implications, ranging from clinical study design to health care planning. Most studies characterize disease comorbidity using shared genetic origins, ignoring pathway-based commonalities between diseases. In this study, we define the disease pathways using an interactome-based extension of known disease-genes and introduce several measures of functional overlap. The analysis reveals 206 significant links among 94 diseases, giving rise to a highly clustered disease association network. We observe that around 95% of the links in the disease network, though not identified by genetic overlap, are discovered by functional overlap. This disease network portrays rheumatoid arthritis, asthma, atherosclerosis, pulmonary diseases and Crohn's disease as hubs, thus pointing to common inflammatory processes underlying disease pathophysiology. We identify several described associations such as the inverse comorbidity relationship between Alzheimer's disease and neoplasms. Furthermore, we investigate the disruptions in protein interactions by mapping mutations onto the domains involved in the interaction, suggesting hypotheses on the causal link between diseases. Finally, we provide several proof-of-principle examples in which we model the effect of the mutation and the change of the association strength, which could explain the observed comorbidity between diseases caused by the same genetic alterations.

    An ensemble learning approach for modeling the systems biology of drug-induced injury

    Background: Drug-induced liver injury (DILI) is an adverse reaction caused by the intake of commonly used drugs that produces liver damage. DILI is estimated to affect around 20 in 100,000 inhabitants worldwide each year. Despite being one of the main causes of liver failure, the pathophysiology and mechanisms of DILI are poorly understood. In the present study, we developed an ensemble learning approach based on different features (CMap gene expression, chemical structures, drug targets) to predict drugs that might cause DILI and gain a better understanding of the mechanisms linked to the adverse reaction. Results: We searched for gene signatures in CMap gene expression data by using two approaches: phenotype-gene association data from DisGeNET, and a non-parametric test comparing gene expression of DILI-Concern and No-DILI-Concern drugs (as per DILIrank definitions). The average accuracy of the classifiers in both approaches was 69%. We used chemical structures as features, obtaining an accuracy of 65%. The combination of both types of features produced an accuracy around 63%, but improved the independent hold-out test up to 67%. The use of drug-target associations as a feature obtained the best accuracy (70%) in the independent hold-out test. Conclusions: When using CMap gene expression data, searching for a specific gene signature among the landmark genes improves the quality of the classifiers, but it is still limited by the intrinsic noise of the dataset. When using chemical structures as a feature, the structural diversity of the known DILI-causing drugs hampers the prediction, a problem similar to that seen with gene expression information. The combination of both features did not improve the quality of the classifiers but increased the robustness as shown on independent hold-out tests. The use of drug-target associations as a feature improved the prediction, especially the specificity, and the results were comparable to previous research studies. The authors received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreements TransQST and eTRANSAFE (refs: 116030, 777365). This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA companies in kind contribution. The authors also received support from Spanish Ministry of Economy (MINECO, refs: BIO2017–85329-R (FEDER, EU), RYC-2015-17519) as well as EU H2020 Programme 2014–2020 under grant agreement No. 676559 (Elixir-Excelerate) and from Agència de Gestió D’ajuts Universitaris i de Recerca Generalitat de Catalunya (AGAUR, ref.: 2017SGR01020). L.I.F. received support from ISCIII-FEDER (ref: CPII16/00026). The Research Programme on Biomedical Informatics (GRIB) is a member of the Spanish National Bioinformatics Institute (INB), PRB2-ISCIII and is supported by grant PT13/0001/0023, of the PE I + D + i 2013–2016, funded by ISCIII and FEDER. The DCEXS is a “Unidad de Excelencia María de Maeztu”, funded by the MINECO (ref: MDM-2014-0370). J.A.P. received support from the CAMDA Travel Fellowship.

    Personalized Respiratory Medicine: Exploring the Horizon, Addressing the Issues. Summary of a BRN-AJRCCM Workshop Held in Barcelona on June 12, 2014.

    This Pulmonary Perspective summarizes the content and main conclusions of an international workshop on personalized respiratory medicine coorganized by the Barcelona Respiratory Network (www.brn.cat) and the AJRCCM in June 2014. It discusses (1) its definition and historical, social, legal, and ethical aspects; (2) the view from different disciplines, including basic science, epidemiology, bioinformatics, and network/systems medicine; (3) the bottlenecks and opportunities identified by some currently ongoing projects; and (4) the implications for the individual, the healthcare system and the pharmaceutical industry. The authors hope that, although it is not a systematic review on the subject, this document can be a useful reference for researchers, clinicians, healthcare managers, policy-makers, and industry parties interested in personalized respiratory medicine.

    Assessment of NER solutions against the first and second CALBC Silver Standard Corpus

    Background: Competitions in text mining have been used to measure the performance of automatic text processing solutions against a manually annotated gold standard corpus (GSC). The preparation of the GSC is time-consuming and costly, and the final corpus consists at most of a few thousand documents annotated with a limited set of semantic groups. To overcome these shortcomings, the CALBC project partners (PPs) have produced a large-scale annotated biomedical corpus with four different semantic groups through the harmonisation of annotations from automatic text mining solutions, the first version of the Silver Standard Corpus (SSC-I). The four semantic groups are chemical entities and drugs (CHED), genes and proteins (PRGE), diseases and disorders (DISO) and species (SPE). This corpus has been used for the First CALBC Challenge, which asked the participants to annotate the corpus with their text processing solutions. Results: All four PPs from the CALBC project and, in addition, 12 challenge participants (CPs) contributed annotated data sets for an evaluation against the SSC-I. CPs could ignore the training data and deliver the annotations from their genuine annotation system, or could train a machine-learning approach on the provided pre-annotated data. In general, the performances of the annotation solutions were lower for entities from the categories CHED and PRGE in comparison to the identification of entities categorized as DISO and SPE. The best performance over all semantic groups was achieved by two annotation solutions that had been trained on the SSC-I. The data sets from participants were used to generate the harmonised Silver Standard Corpus II (SSC-II), provided the participant did not make use of the annotated data set from the SSC-I for training purposes. The performances of the participants’ solutions were again measured against the SSC-II. The performances of the annotation solutions again showed better results for DISO and SPE in comparison to CHED and PRGE. Conclusions: The SSC-I delivers a large set of annotations (1,121,705) for a large number of documents (100,000 Medline abstracts). The annotations cover four different semantic groups and are sufficiently homogeneous to be reproduced with a trained classifier leading to an average F-measure of 85%. Benchmarking the annotation solutions against the SSC-II leads to better performance for the CPs’ annotation solutions in comparison to the SSC-I.