9 research outputs found

    Using Machine Learning to Collect and Facilitate Remote Access to Biomedical Databases: Development of the Biomedical Database Inventory

    Background: Existing biomedical literature repositories do not commonly provide users with specific means to locate and remotely access biomedical databases. Objective: To address this issue, we developed the Biomedical Database Inventory (BiDI), a repository linking to biomedical databases automatically extracted from the scientific literature. BiDI provides an index of data resources and a path to access them seamlessly. Methods: We designed an ensemble of deep learning methods to extract database mentions. To train the system, we annotated a set of 1242 articles that included mentions of database publications. This data set was used along with transfer learning techniques to train an ensemble of deep learning natural language processing models targeted at database publication detection. Results: The system obtained an F1 score of 0.929 on database detection, showing high precision and recall values. When applying this model to the PubMed and PubMed Central databases, we identified over 10,000 unique databases. The ensemble model also extracted the weblinks to the reported databases and discarded irrelevant links. For the extraction of weblinks, the model achieved a cross-validated F1 score of 0.908. We show two use cases: one related to “omics” and the other related to the COVID-19 pandemic. Conclusions: BiDI enables access to biomedical resources over the internet and facilitates data-driven research and other scientific initiatives. The repository is openly available online and will be regularly updated with an automatic text processing pipeline. The approach can be reused to create repositories of different types (ie, biomedical and others). Funding: Proyecto colaborativo de integración de datos genómicos (collaborative genomic data integration project); PI17/0156
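    For readers less familiar with the metric reported above, the F1 score is the harmonic mean of precision and recall. The sketch below shows the arithmetic only; the counts of true positives, false positives and false negatives are hypothetical and are not taken from the BiDI evaluation.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (precision, recall) from raw counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not from the paper).
p, r = precision_recall(tp=90, fp=10, fn=5)
score = f1_score(p, r)
print(f"precision={p:.3f} recall={r:.3f} F1={score:.3f}")
```

    Because the harmonic mean is pulled toward the smaller of its two inputs, a high F1 score is only possible when precision and recall are both high.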

    Querying clinical data in HL7 RIM based relational model with morph-RDB

    Background: Semantic interoperability is essential when carrying out post-genomic clinical trials where several institutions collaborate, since researchers and developers need to have an integrated view and access to heterogeneous data sources. One possible approach to accommodate this need is to use RDB2RDF systems that provide RDF datasets as the unified view. These RDF datasets may be materialized and stored in a triple store, or transformed into RDF in real time, as virtual RDF data sources. Our previous efforts involved materialized RDF datasets, hence losing data freshness. Results: In this paper we present a solution that uses an ontology based on the HL7 v3 Reference Information Model and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, and where morph-RDB is used to expose a virtual, non-materialized SPARQL endpoint over the data. Conclusions: By applying a set of optimization techniques on the SPARQL-to-SQL query translation algorithm, we can now issue SPARQL queries to the underlying relational data with generally acceptable performance.
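    The idea of a virtual (non-materialized) RDF view can be illustrated with a toy sketch: a single triple pattern is rewritten into SQL at query time, so the answer always reflects the current rows. This is not morph-RDB's algorithm and does not use R2RML syntax; the `MAPPING` dictionary, the `ex:` predicates, the table layout and the function `triple_pattern_to_sql` are all hypothetical names introduced for illustration.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
# A real system would derive this from R2RML mapping documents.
MAPPING = {
    "ex:birthYear": ("patient", "id", "birth_year"),
    "ex:diagnosis": ("observation", "patient_id", "code"),
}


def triple_pattern_to_sql(predicate: str, obj=None):
    """Translate `?s <predicate> ?o` (optionally with a fixed object) into SQL."""
    table, subj_col, obj_col = MAPPING[predicate]
    sql = f"SELECT {subj_col} AS s, {obj_col} AS o FROM {table}"
    params = ()
    if obj is not None:
        # Bind the object as a parameter rather than interpolating it.
        sql += f" WHERE {obj_col} = ?"
        params = (obj,)
    return sql, params


# Toy relational data standing in for a live clinical database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (id INTEGER PRIMARY KEY, birth_year INTEGER);
CREATE TABLE observation (patient_id INTEGER, code TEXT);
INSERT INTO patient VALUES (1, 1960), (2, 1975);
INSERT INTO observation VALUES (1, 'C50'), (2, 'I10');
""")

# Equivalent of the pattern: ?s ex:diagnosis "C50"
sql, params = triple_pattern_to_sql("ex:diagnosis", "C50")
rows = conn.execute(sql, params).fetchall()
print(sql, rows)
```

    Since no RDF is stored, a row inserted into `observation` after the mapping is defined is visible to the next query, which is the data-freshness property that materialized datasets lose.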

    R2RML-based access and querying to relational clinical data with morph-RDB

    Semantic interoperability is essential when carrying out post-genomic clinical trials where several institutions collaborate, since researchers and developers need to have an integrated view and access to heterogeneous data sources. In this paper we present a solution that uses an ontology based on the HL7 v3 Reference Information Model and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, and where morph-RDB is used to expose a virtual SPARQL endpoint over the data. In previous efforts with other existing RDB2RDF systems we had not been able to work with live databases. Now we can issue SPARQL queries to the underlying relational data with generally acceptable performance.

    Enabling semantic interoperability in multi-centric clinical trials on breast cancer

    Background and objectives: Post-genomic clinical trials require the participation of multiple institutions and the collection of data from several hospitals, laboratories and research facilities. This paper presents a standard-based solution to provide a uniform access endpoint to patient data involved in current clinical research. Methods: The proposed approach exploits well-established standards such as HL7 v3 or SPARQL and medical vocabularies such as SNOMED CT, LOINC and HGNC. A novel mechanism to exploit semantic normalization among HL7-based data models and biomedical ontologies has been created by using Semantic Web technologies. Results: Different types of queries have been used for testing the semantic interoperability solution described in this paper. The execution times obtained in the tests enable the development of end user tools within a framework that requires efficient retrieval of integrated data. Conclusions: The proposed approach has been successfully tested by applications within the INTEGRATE and EURECA EU projects. These applications have been deployed and tested for: (i) patient screening, (ii) trial recruitment, and (iii) retrospective analysis; exploiting semantically interoperable access to clinical patient data from heterogeneous data sources.

    genoDraw: A web tool for developing pedigree diagrams using the standardized human pedigree nomenclature integrated with biomedical vocabularies

    The integration of genetic information in current clinical routine has raised a need for tools to exploit family genetic knowledge. On the clinical side, an application for managing and visualizing pedigree diagrams could provide genetics specialists with an integrated environment with a potential positive impact on their current practice. This article presents a web tool (genoDraw) that provides clinical practitioners with the ability to create, maintain and visualize patients’ and their families’ information in the form of pedigree diagrams. genoDraw implements a graph-based three-step process for generating diagrams according to a de facto standard in the area and clinical terminologies. It also complies with five characteristics identified as indispensable for the next generation of pedigree drawing software: comprehensiveness, data-drivenness, automation, interactivity and compatibility with biomedical vocabularies. The platform was implemented and tested, confirming its potential interest for clinical routine.

    GenoDraw: a tool to create pedigree diagrams based on biomedical terminologies and standards

    The need for integrating genomic data into daily clinical practice raises the demand for tools capable of representing individuals’ data and also their biological relationships with other individuals. In this work, we introduce genoDraw, a new platform for creating and managing pedigree diagrams following biomedical standards. The proposed work focuses on five critical aspects for the adoption of this platform in clinical practice, namely: data-drivenness, automation, interactivity, comprehensiveness and compatibility with widely adopted biomedical vocabularies for the annotation of traits and characteristics. We present a novel process for generating pedigree diagrams from individual data. This process generates pedigree diagrams that comply with the pedigree nomenclature used as a de facto standard in the area. We implemented the system as a web platform for ensuring complete compatibility. We also performed an evaluation process, which included usability tests, and the results show that the platform is promisingly suited to use in clinical practice.

    A data model based on semantically enhanced HL7 RIM for sharing patient data of breast cancer clinical trials

    Breast cancer clinical trial researchers have to handle heterogeneous data coming from different data sources, which overloads biomedical researchers when they need to query data for retrospective analysis. This paper presents the Common Data Model (CDM) proposed within the INTEGRATE EU project to homogenize data coming from different clinical partners. This CDM is based on the Reference Information Model (RIM) from Health Level 7 (HL7) version 3. Semantic capabilities through a SPARQL endpoint were also required to ensure the sustainability of the platform. For the SPARQL endpoint implementation, a comparison has been carried out between a relational SQL database + D2R and an RDF database. The results show that the first option can store all clinical data received from institutions participating in the project with better performance. It has also been evaluated by the EU Commission within a patient recruitment demonstrator.