12 research outputs found

    Using shape expressions (ShEx) to share RDF data models and to guide curation with rigorous validation

    International Conference, European Semantic Web Conference, ESWC (16th, 2019, Portorož, Slovenia)

    Development of Bioinformatics Resources for Glycan-related Pathway Information using Semantic Web Technologies

    Doctoral thesis (Doctor of Engineering, Soka University). Integration of distributed data has been attempted in many biological disciplines, as it enables the development of new knowledge bases and provides insight into underlying biological processes. However, the diversity of biological data types and the complexity of the concepts involved have been an obstacle to data integration. Semantic Web technologies, created to provide a standard for data sharing on the web, have been used to integrate biological information derived from various data types. I applied fundamental Semantic Web methods to standardize glycan-related data gathered from public databases and from co-researchers in a machine-readable form, and to build a repository of pathway information described in terms of different kinds of resources and concepts, including catalytic activation, translocation, and modification. Given the importance of glycans in pathway information, sharing these data together with information from existing databases, or with more specific details provided by users, will support data integration.
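
    As a rough illustration of the kind of representation described above, the following sketch uses Python's rdflib to express one hypothetical pathway step (an enzyme catalysing a glycan modification) as RDF triples. The example.org namespace, class names, and property names are illustrative assumptions only, not the thesis's actual vocabulary.

        # Minimal sketch (illustrative vocabulary, not the thesis's actual schema):
        # one glycan-related pathway step expressed as RDF triples with rdflib.
        from rdflib import Graph, Namespace, Literal, RDF, RDFS

        EX = Namespace("http://example.org/glycopathway/")

        g = Graph()
        g.bind("ex", EX)

        reaction = EX["reaction/R001"]   # hypothetical modification step
        enzyme = EX["protein/GalT1"]     # hypothetical glycosyltransferase
        substrate = EX["glycan/G00001"]  # hypothetical glycan structure

        g.add((reaction, RDF.type, EX.ModificationStep))
        g.add((reaction, EX.catalyzedBy, enzyme))
        g.add((reaction, EX.hasSubstrate, substrate))
        g.add((enzyme, RDFS.label, Literal("galactosyltransferase (illustrative)")))

        # Serialise as Turtle so the data could be shared or loaded into a triple store.
        print(g.serialize(format="turtle"))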

    Semantic modelling of common data elements for rare disease registries, and a prototype workflow for their deployment over registry data

    BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. RESULTS: Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, the Human Phenotype Ontology, and the National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will be deploying over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. CONCLUSIONS: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.
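
    A minimal sketch, assuming Python with rdflib, of the general "entity has-attribute attribute, attribute has-value value" pattern that SIO-based CDE models typically follow. The patient IRI, the example.org namespace, and the specific SIO and NCIt identifiers are assumptions for illustration, not the EJP RD project's actual templates.

        # Illustrative SIO-style attribute pattern for a single CDE (sex), not the
        # project's actual model templates; the ontology IDs below are assumed.
        from rdflib import Graph, Namespace, Literal, RDF
        from rdflib.namespace import XSD

        SIO = Namespace("http://semanticscience.org/resource/")
        NCIT = Namespace("http://purl.obolibrary.org/obo/NCIT_")
        EX = Namespace("http://example.org/registry/")

        g = Graph()
        g.bind("sio", SIO)

        patient = EX["patient/123"]       # hypothetical registry subject
        sex_attr = EX["patient/123/sex"]  # hypothetical attribute node

        g.add((patient, RDF.type, SIO["SIO_000498"]))   # sio:person (assumed ID)
        g.add((patient, SIO["SIO_000008"], sex_attr))   # sio:has-attribute (assumed ID)
        g.add((sex_attr, RDF.type, NCIT["C16576"]))     # NCIt "Female" (assumed ID)
        g.add((sex_attr, SIO["SIO_000300"],             # sio:has-value (assumed ID)
               Literal("female", datatype=XSD.string)))

        print(g.serialize(format="turtle"))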

    Evaluating FAIR Digital Object and Linked Data as distributed object systems

    FAIR Digital Object (FDO) is an emerging concept that is highlighted by the European Open Science Cloud (EOSC) as a potential candidate for building an ecosystem of machine-actionable research outputs. In this work we systematically evaluate FDO and its implementations as a global distributed object system, using five different conceptual frameworks that cover interoperability, middleware, FAIR principles, EOSC requirements, and the FDO guidelines themselves. We compare the FDO approach with established Linked Data practices and the existing Web architecture, and provide a brief history of the Semantic Web while discussing why these technologies may have been difficult to adopt for FDO purposes. We conclude with recommendations for both the Linked Data and FDO communities to further their adaptation and alignment.

    Transforming the plenary session minutes of the Parliament of Finland into semantic data and publishing them as a web service

    The digitization and structuring of parliamentary materials for research use is an emerging field of study, with several national projects currently under way in Europe, among other places. This thesis is part of the Semantic Parliament (Semanttinen parlamentti) project, in which the plenary-session speeches of the Parliament of Finland are, for the first time, brought together into a unified, harmonized, machine-readable dataset covering the entire history of the parliament, from its beginning in 1907 to the present day. The speeches and their rich descriptive metadata have been published in two versions: in the Parla-CLARIN XML format used for representing parliamentary materials, and as a linked open data knowledge graph that connects the material to the broader national data infrastructure. The unified speech corpus offers unprecedented opportunities to study Finnish parliamentarism over more than a hundred years in a multifaceted and automated way. The dataset contains nearly a million individual speeches and is closely linked to biographical information about parliamentary actors. This thesis describes the data models developed for representing the speeches and the process for collecting and transforming the speech data, and examines the challenges and opportunities of both the process and the resulting dataset. To assess the usefulness of the published dataset, the Parla-CLARIN data has already been used in digital humanities research on political culture. Based on the linked data, a semantic portal, Parlamenttisampo, has been developed for publishing and exploring the materials on the web.

    Assessing the quality of Wikidata referencing

    Wikidata is a versatile and broad-based Knowledge Graph (KG) that leverages the power of collaborative contributions via an open wiki, augmented by bot accounts, to curate its content. Wikidata represents over 102 million interlinked data entities, accompanied by over 1.4 billion statements about the items, accessible to the public via a SPARQL endpoint and diverse dump formats. The Wikidata data model enables assigning references to every single statement. While the quality of Wikidata statements has been assessed, the quality of references in this knowledge graph is not well covered in the literature. To cover this gap, we develop and implement a comprehensive referencing quality assessment framework based on Linked Data quality dimensions and criteria. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System (RQSS). RQSS provides quantified scores by which the referencing quality can be analyzed and compared. Due to the scale of Wikidata, we developed a subsetting approach to create a comparison platform that systematically samples Wikidata. We have used both well-defined subsets and random samples to evaluate the quality of references in Wikidata using RQSS. Based on RQSS, the overall referencing quality in Wikidata subsets is 0.58 out of 1. Random subsets (representative of Wikidata) have higher overall scores than topical subsets by 0.05, with Gene Wiki having the highest scores amongst topical subsets. Regarding referencing quality dimensions, all subsets have high scores in accuracy, availability, security, and understandability, but weaker scores in completeness, verifiability, objectivity, and versatility. RQSS scripts can be reused to monitor referencing quality over time. The evaluation shows that RQSS is practical and provides valuable information, which can be used by Wikidata contributors and WikiProject owners to identify referencing quality gaps. Although RQSS was developed based on the Wikidata RDF model, its referencing quality assessment framework can be generalized to any RDF KG. James Watt Scholarship funding.
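
    As a rough sketch of one signal in the spirit of RQSS (not the actual RQSS implementation), the Python snippet below asks the public Wikidata SPARQL endpoint what share of a single item's statements carry at least one reference. The choice of example item (Q42) is arbitrary, and a real assessment would aggregate such metrics over whole subsets rather than a single item.

        # Share of one item's statements that have at least one reference, queried
        # from the public Wikidata SPARQL endpoint. Illustrative only; not RQSS.
        import requests

        ENDPOINT = "https://query.wikidata.org/sparql"

        QUERY = """
        SELECT (COUNT(DISTINCT ?st) AS ?statements)
               (COUNT(DISTINCT ?referenced) AS ?referencedStatements)
        WHERE {
          VALUES ?item { wd:Q42 }
          ?item ?p ?st .
          FILTER(STRSTARTS(STR(?p), "http://www.wikidata.org/prop/P"))
          OPTIONAL { ?st prov:wasDerivedFrom ?ref . BIND(?st AS ?referenced) }
        }
        """

        resp = requests.get(
            ENDPOINT,
            params={"query": QUERY, "format": "json"},
            headers={"User-Agent": "referencing-quality-sketch/0.1 (example)"},
            timeout=60,
        )
        resp.raise_for_status()
        row = resp.json()["results"]["bindings"][0]

        total = int(row["statements"]["value"])
        referenced = int(row["referencedStatements"]["value"])
        if total:
            print(f"{referenced}/{total} statements have a reference ({referenced / total:.2%})")
        else:
            print("no statements found")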

    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.