904 research outputs found

    Reproducible Domain-Specific Knowledge Graphs in the Life Sciences: a Systematic Literature Review

    Knowledge graphs (KGs) are widely used for representing and organizing structured knowledge in diverse domains. However, the creation and upkeep of KGs pose substantial challenges. Developing a KG demands extensive expertise in data modeling, ontology design, and data curation. Furthermore, KGs are dynamic, requiring continuous updates and quality control to ensure accuracy and relevance. These intricacies contribute to the considerable effort required for their development and maintenance. One critical dimension of KGs that warrants attention is reproducibility. The ability to replicate and validate KGs is fundamental for ensuring the trustworthiness and sustainability of the knowledge they represent. Reproducible KGs not only support open science by allowing others to build upon existing knowledge but also enhance transparency and reliability in disseminating information. Despite the growing number of domain-specific KGs, a comprehensive analysis concerning their reproducibility has been lacking. This paper addresses this gap by offering a general overview of domain-specific KGs and comparing them based on various reproducibility criteria. Our study across 19 different domains shows that only eight out of 250 domain-specific KGs (3.2%) provide publicly available source code. Among these, only one system successfully passed our reproducibility assessment (14.3%). These findings highlight the challenges and gaps in achieving reproducibility across domain-specific KGs. Our finding that only 0.4% of published domain-specific KGs are reproducible shows a clear need for further research and a shift in cultural practices.

    KG-Hub: building and exchanging biological knowledge graphs

    MOTIVATION: Knowledge graphs (KGs) are a powerful approach for integrating heterogeneous data and making inferences in biology and many other domains, but a coherent solution for constructing, exchanging, and facilitating the downstream use of KGs is lacking. RESULTS: Here we present KG-Hub, a platform that enables standardized construction, exchange, and reuse of KGs. Features include a simple, modular extract-transform-load pattern for producing graphs compliant with Biolink Model (a high-level data model for standardizing biological data), easy integration of any OBO (Open Biological and Biomedical Ontologies) ontology, cached downloads of upstream data sources, versioned and automatically updated builds with stable URLs, web-browsable storage of KG artifacts on cloud infrastructure, and easy reuse of transformed subgraphs across projects. Current KG-Hub projects span use cases including COVID-19 research, drug repurposing, microbial-environmental interactions, and rare disease research. KG-Hub is equipped with tooling to easily analyze and manipulate KGs. KG-Hub is also tightly integrated with graph machine learning (ML) tools which allow automated graph ML, including node embeddings and training of models for link prediction and node classification. AVAILABILITY AND IMPLEMENTATION: https://kghub.org
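    To picture the extract-transform-load pattern the abstract describes, here is a minimal illustrative sketch. It is not KG-Hub's actual tooling or API: the URL, file names, and column names are hypothetical, and the output only approximates a Biolink-style subject/predicate/object edge table.

```python
# Minimal sketch of an extract-transform-load step in the style KG-Hub
# describes; names, columns, and the edge schema are illustrative
# assumptions, not KG-Hub's actual API.
import csv
import urllib.request
from pathlib import Path

RAW = Path("data/raw/interactions.tsv")    # hypothetical upstream source
OUT = Path("data/transformed/edges.tsv")   # simplified KGX-style edge file

def extract(url: str) -> None:
    """Download (and cache) an upstream source file."""
    if not RAW.exists():
        RAW.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, RAW)

def transform() -> None:
    """Rewrite raw rows as subject/predicate/object edges using
    Biolink-style CURIEs and predicates."""
    OUT.parent.mkdir(parents=True, exist_ok=True)
    with RAW.open() as src, OUT.open("w", newline="") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t")
        writer.writerow(["subject", "predicate", "object"])
        for row in reader:
            writer.writerow([
                f"UniProtKB:{row['protein_a']}",   # assumed column names
                "biolink:interacts_with",
                f"UniProtKB:{row['protein_b']}",
            ])

if __name__ == "__main__":
    extract("https://example.org/interactions.tsv")  # placeholder URL
    transform()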

    Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences

    Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.
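    As a concrete illustration of querying the provenance Galaxy tracks, the sketch below uses BioBlend, the community Python client for the Galaxy REST API. The server URL and API key are placeholders, and the exact fields returned may vary between Galaxy releases.

```python
# Sketch of retrieving Galaxy's tracked provenance through its REST API
# using BioBlend; URL and API key are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Pick the most recently updated history and list its datasets.
history = gi.histories.get_histories()[0]
datasets = gi.histories.show_history(history["id"], contents=True)

# For each dataset, ask Galaxy which tool and parameters produced it.
for ds in datasets:
    prov = gi.histories.show_dataset_provenance(history["id"], ds["id"])
    print(ds["name"], "<-", prov.get("tool_id"), prov.get("parameters"))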

    Templates as a method for implementing data provenance in decision support systems

    Decision support systems are used as a method of promoting consistent, guideline-based diagnosis and supporting clinical reasoning at the point of care. However, despite the availability of numerous commercial products, the wider acceptance of these systems has been hampered by concerns about diagnostic performance and a perceived lack of transparency in the process of generating clinical recommendations. This resonates with the Learning Health System paradigm, which promotes data-driven medicine relying on routine data capture and transformation and also stresses the need for trust in an evidence-based system. Data provenance is a way of automatically capturing the trace of a research task and its resulting data, thereby facilitating trust and the principles of reproducible research. While computational domains have started to embrace this technology through provenance-enabled execution middlewares, traditionally non-computational disciplines, such as medical research, that do not rely on a single software platform are still struggling with its adoption. To address these issues, we introduce provenance templates – abstract provenance fragments representing meaningful domain actions. Templates can be used to generate a model-driven service interface for domain software tools to routinely capture the provenance of their data and tasks. This paper specifies the requirements for a decision support tool based on the Learning Health System, introduces the theoretical model for provenance templates, and demonstrates the resulting architecture. Our methods were tested and validated on the provenance infrastructure for a Diagnostic Decision Support System developed as part of the EU FP7 TRANSFoRm project.
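    To make the template idea concrete, here is a minimal sketch of instantiating an abstract provenance fragment with concrete bindings, written against the W3C PROV data model using the `prov` Python package. The template structure, slot names, and identifiers are illustrative assumptions, not the paper's actual serialization.

```python
# Illustrative sketch of a provenance template: an abstract fragment with
# variable slots, instantiated with concrete bindings at capture time.
# Uses the W3C PROV data model via the `prov` package; identifiers are
# hypothetical.
from prov.model import ProvDocument

def instantiate_diagnosis_template(patient_id: str, rule_id: str,
                                   recommendation_id: str) -> ProvDocument:
    """Fill the slots of a 'diagnostic recommendation' template."""
    doc = ProvDocument()
    doc.add_namespace("ex", "http://example.org/dss#")

    # Template slots, bound to concrete identifiers for this invocation.
    record = doc.entity(f"ex:record-{patient_id}")
    rule = doc.entity(f"ex:rule-{rule_id}")
    rec = doc.entity(f"ex:recommendation-{recommendation_id}")
    run = doc.activity("ex:apply-guideline")

    # Fixed template structure: the recommendation is generated by
    # applying the guideline rule to the patient record.
    doc.used(run, record)
    doc.used(run, rule)
    doc.wasGeneratedBy(rec, run)
    doc.wasDerivedFrom(rec, record)
    return doc

print(instantiate_diagnosis_template("042", "chest-pain-v2", "7").get_provn())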

    Combining machine learning and semantic web: A systematic mapping study

    In line with the general trend in artificial intelligence research to create intelligent systems that combine learning and symbolic components, a new sub-area has emerged that focuses on combining Machine Learning components with techniques developed by the Semantic Web community: Semantic Web Machine Learning (SWeML). Due to its rapid growth and impact on several communities in the past two decades, there is a need to better understand the space of these SWeML systems, their characteristics, and trends. Yet surveys that adopt principled and unbiased approaches are missing. To fill this gap, we performed a systematic study and analyzed nearly 500 papers published in the past decade in this area, focusing on architectural and application-specific features. Our analysis identified a rapidly growing interest in SWeML systems, with a high impact on several application domains and tasks. Catalysts for this rapid growth are the increased application of deep learning and knowledge graph technologies. By leveraging the in-depth understanding of this area acquired through this study, a further key contribution of this article is a classification system for SWeML systems that we publish as an ontology.
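    A toy example helps to picture what a SWeML system couples together: a Semantic Web component (an RDF knowledge graph) feeding a Machine Learning component (node embeddings). The sketch below is a deliberately minimal, assumption-laden illustration using rdflib and scikit-learn, not an architecture drawn from the surveyed papers.

```python
# Minimal sketch of a SWeML-style pipeline: parse an RDF knowledge graph
# with rdflib (Semantic Web side), then learn simple node embeddings from
# its adjacency structure with scikit-learn (Machine Learning side).
# The Turtle snippet and embedding dimension are illustrative assumptions.
import numpy as np
from rdflib import Graph
from sklearn.decomposition import TruncatedSVD

TTL = """
@prefix ex: <http://example.org/> .
ex:aspirin ex:treats ex:headache .
ex:aspirin ex:interactsWith ex:warfarin .
ex:ibuprofen ex:treats ex:headache .
"""

g = Graph().parse(data=TTL, format="turtle")

# Index every subject/object node, then build a binary adjacency matrix.
nodes = sorted({t for s, _, o in g for t in (s, o)})
idx = {n: i for i, n in enumerate(nodes)}
adj = np.zeros((len(nodes), len(nodes)))
for s, _, o in g:
    adj[idx[s], idx[o]] = adj[idx[o], idx[s]] = 1.0

# Low-dimensional node embeddings usable by downstream ML tasks
# (e.g., link prediction or node classification).
emb = TruncatedSVD(n_components=2).fit_transform(adj)
for n in nodes:
    print(n, emb[idx[n]].round(3))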

    The FAIR Cookbook - the essential resource for and by FAIR doers

    The notion that data should be Findable, Accessible, Interoperable and Reusable, according to the FAIR Principles, has become a global norm for good data stewardship and a prerequisite for reproducibility. Nowadays, FAIR guides data policy actions and professional practices in the public and private sectors. Despite such global endorsements, however, the FAIR Principles are aspirational, remaining elusive at best, and intimidating at worst. To address the lack of practical guidance, and help with capability gaps, we developed the FAIR Cookbook, an open, online resource of hands-on recipes for "FAIR doers" in the Life Sciences. Created by researchers and data management professionals in academia, (bio)pharmaceutical companies, and information service industries, the FAIR Cookbook covers the key steps in a FAIRification journey, the levels and indicators of FAIRness, the maturity model, the technologies, tools, and standards available, the skills required, and the challenges of achieving and improving data FAIRness. Part of the ELIXIR ecosystem, and recommended by funders, the FAIR Cookbook is open to contributions of new recipes.

    We thank all book dash participants and recipe authors, as well as the FAIRplus fellows, all partners, the members of the FAIRplus Scientific Advisory Board, and the management team. In particular, we acknowledge a number of colleagues for their role in the FAIRplus project: Ebitsam Alharbi (0000-0002-3887-3857), Oya Deniz Beyan (0000-0001-7611-3501), Ola Engkvist (0000-0003-4970-6461), Laura Furlong (0000-0002-9383-528X), Carole Goble (0000-0003-1219-2137), Mark Ibberson (0000-0003-3152-5670), Manfred Kohler, Nick Lynch (0000-0002-8997-5298), Scott Lusher (0000-0003-2401-4223), Jean-Marc Neefs, George Papadotas, Manuela Pruess (0000-0002-6857-5543), Ratnesh Sahay, Rudi Verbeeck (0000-0001-5445-6095), Bryn Williams-Jones, and Gesa Witt (0000-0003-2344-706X). This work and the authors were primarily funded by FAIRplus (IMI 802750). PRS and SAS also acknowledge contributions from the following grants, in which the FAIR Cookbook is embedded or to which it is connected: ELIXIR Interoperability Platform, EOSC-Life (H2020-EU 824087), FAIRsharing (Wellcome 212930/Z/18/Z), NIH CFDE Coordinating Center (NIH Common Fund OT3OD025459-01), Precision Toxicology (H2020-EU 965406), UKRI DASH grant (MR/V038966/1), BY-COVID (Horizon-EU 101046203), and AgroServ (Horizon-EU 101058020).

    brainlife.io: A decentralized and open source cloud platform to support neuroscience research

    Neuroscience research has expanded dramatically over the past 30 years by advancing standardization and tool development to support rigor and transparency. Consequently, the complexity of the data pipeline has also increased, hindering access to FAIR data analysis for portions of the worldwide research community. brainlife.io was developed to reduce these burdens and democratize modern neuroscience research across institutions and career levels. Using community software and hardware infrastructure, the platform provides open-source data standardization, management, visualization, and processing, and simplifies the data pipeline. brainlife.io automatically tracks the provenance history of thousands of data objects, supporting simplicity, efficiency, and transparency in neuroscience research. Here brainlife.io's technology and data services are described and evaluated for validity, reliability, reproducibility, replicability, and scientific utility. Using data from 4 modalities and 3,200 participants, we demonstrate that brainlife.io's services produce outputs that adhere to best practices in modern neuroscience research.