
    SciKGTeX - A LATEX Package to Semantically Annotate Contributions in Scientific Publications

    The continuously increasing output of published research makes the work of researchers harder, as it becomes impossible to keep track of and compare the most recent advances in a field. Scientific knowledge graphs have been proposed as a solution to structure the content of research publications in a machine-readable way and enable more efficient, computer-assisted workflows for many research activities. Crowdsourcing approaches are frequently used to build and maintain such scientific knowledge graphs. Researchers are motivated to contribute to these crowdsourcing efforts as they want their work to be included in the knowledge graphs and to benefit from applications built on top of them. To contribute to scientific knowledge graphs, researchers need simple and easy-to-use solutions to generate new knowledge graph elements and establish the practice of semantic representations in scientific communication. In this thesis, I present SciKGTeX, a LaTeX package to semantically annotate scientific contributions at the time of document creation. The LaTeX package allows authors of scientific publications to mark the main contributions of their work, such as the background, research problem, method, results and conclusion, directly in LaTeX source files. The package then automatically embeds them as metadata into the generated PDF document. In addition to the package, I document a user evaluation with 26 participants which I conducted to assess the usability and feasibility of the solution. The analysis of the evaluation results shows that SciKGTeX is highly usable, with a score of 79 out of 100 on the System Usability Scale. Furthermore, the study showed that the functionalities of the package can be picked up very quickly by the study participants, who needed only 7 minutes on average to annotate the main contributions in a sample abstract of a published paper. SciKGTeX demonstrates a new way to generate structured metadata for the key contributions of research publications and embed it into PDF files at the time of document creation.
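
    Since the abstract does not specify how SciKGTeX stores the embedded metadata, the following Python sketch only illustrates the general idea of reading contribution annotations back out of a generated PDF; the "/SciKGTeX_*" key names are hypothetical placeholders, not the package's documented format.

        # Minimal sketch: reading contribution metadata back out of a PDF.
        # The SciKGTeX storage format is not specified in the abstract above;
        # the "/SciKGTeX_*" keys are hypothetical placeholders.
        from pypdf import PdfReader

        reader = PdfReader("annotated-paper.pdf")
        metadata = reader.metadata or {}

        # Contribution classes mentioned in the abstract.
        fields = ["Background", "ResearchProblem", "Method", "Results", "Conclusion"]
        for field in fields:
            value = metadata.get(f"/SciKGTeX_{field}")  # hypothetical key naming
            if value is not None:
                print(f"{field}: {value}")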

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.
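
    To make the idea of semantic annotation of data mining algorithms and datasets concrete, here is a minimal rdflib sketch; all namespaces, class names and properties are illustrative placeholders, not OntoDM-core's actual terms.

        # Minimal sketch of the kind of semantic annotation OntoDM-core enables:
        # describing a data mining algorithm and the dataset it was applied to.
        # All IRIs below are illustrative placeholders, not OntoDM-core's real terms.
        from rdflib import Graph, Literal, Namespace, RDF, RDFS

        ONTODM = Namespace("http://example.org/ontodm-core#")  # placeholder namespace
        EX = Namespace("http://example.org/experiments#")

        g = Graph()
        g.add((EX.run42, RDF.type, ONTODM.DataMiningAlgorithmApplication))
        g.add((EX.run42, ONTODM.executesAlgorithm, EX.randomForest))
        g.add((EX.randomForest, RDF.type, ONTODM.DataMiningAlgorithm))
        g.add((EX.run42, ONTODM.hasInputDataset, EX.qsarDataset))
        g.add((EX.qsarDataset, RDFS.label, Literal("QSAR drug discovery dataset")))

        print(g.serialize(format="turtle"))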

    Doctor of Philosophy

    Manual annotation of clinical texts is often used as a method of generating reference standards that provide data for training and evaluation of Natural Language Processing (NLP) systems. Manually annotating clinical texts is time-consuming, expensive, and requires considerable cognitive effort on the part of human reviewers. Furthermore, reference standards must be generated in ways that produce consistent and reliable data, but must also be valid in order to adequately evaluate the performance of those systems. The amount of labeled data necessary varies depending on the level of analysis, the complexity of the clinical use case, and the methods that will be used to develop automated machine systems for information extraction and classification. Evaluating methods that potentially reduce cost and manual human workload, introduce task efficiencies, and reduce the amount of labeled data necessary to train NLP tools for specific clinical use cases is an active area of research inquiry in the clinical NLP domain. This dissertation integrates a mixed methods approach using methodologies from cognitive science and artificial intelligence with manual annotation of clinical texts. Aim 1 of this dissertation identifies factors that affect manual annotation of clinical texts. These factors are further explored by evaluating approaches that may introduce efficiencies into manual review tasks applied to two different NLP development areas: semantic annotation of clinical concepts and identification of information representing Protected Health Information (PHI) as defined by HIPAA. Both experiments integrate different priming mechanisms using noninteractive and machine-assisted methods. The main hypothesis for this research is that integrating pre-annotation or other machine-assisted methods within manual annotation workflows will improve the efficiency of manual annotation tasks without diminishing the quality of the generated reference standards.
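
    A minimal sketch of the pre-annotation idea described above: a cheap machine pass proposes candidate PHI spans that human reviewers then confirm or reject. The regular expressions and the sample note are illustrative only, not the dissertation's actual method, and far from the coverage a real de-identification task requires.

        # Pre-annotation: a machine pass proposes candidate PHI spans that
        # human reviewers then adjudicate. Patterns are toy examples.
        import re

        PHI_PATTERNS = {
            "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
            "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
            "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b"),  # hypothetical record-number format
        }

        def pre_annotate(text: str) -> list[dict]:
            """Return candidate PHI spans for a human reviewer to adjudicate."""
            candidates = []
            for label, pattern in PHI_PATTERNS.items():
                for match in pattern.finditer(text):
                    candidates.append({"label": label, "start": match.start(),
                                       "end": match.end(), "text": match.group()})
            return sorted(candidates, key=lambda span: span["start"])

        note = "Patient seen on 03/14/2021, callback 555-123-4567, MRN: 00123456."
        for span in pre_annotate(note):
            print(span)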

    A Quadruple-Based Text Analysis System for History and Philosophy of Science

    Computational tools in the digital humanities often either work on the macro-scale, enabling researchers to analyze huge amounts of data, or on the micro-scale, supporting scholars in the interpretation and analysis of individual documents. The research system developed in the context of this dissertation (the "Quadriga System") works to bridge these two extremes by offering tools to support close reading and interpretation of texts, while at the same time providing a means for collaboration and data collection that could lead to analyses based on big datasets. In the field of history of science, researchers usually work with unstructured data such as texts or images. To computationally analyze such data, it first has to be transformed into a machine-understandable format. The Quadriga System is based on the idea of representing texts as graphs of contextualized triples (or quadruples). Those graphs (or networks) can then be mathematically analyzed and visualized. This dissertation describes two projects that use the Quadriga System for the analysis and exploration of texts and the creation of social networks. Furthermore, a model for digital humanities education is proposed that brings together students from the humanities and computer science in order to develop user-oriented, innovative tools, methods, and infrastructures.
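
    To make the notion of contextualized triples concrete, here is a minimal Python sketch of a quadruple data structure and a simple grouping step for network analysis; the class name, fields, and toy data are illustrative, not the Quadriga System's actual data model.

        # A contextualized triple ("quadruple"): a subject-predicate-object
        # statement plus the context (e.g. the passage) it was extracted from.
        # Names and toy data are illustrative only.
        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Quadruple:
            subject: str
            predicate: str
            obj: str
            context: str  # e.g. a document or passage identifier

        quads = [
            Quadruple("Darwin", "corresponded_with", "Hooker", "letter-001"),
            Quadruple("Darwin", "corresponded_with", "Gray", "letter-002"),
        ]

        # A simple network view: group statements by subject for later analysis.
        network: dict[str, list[Quadruple]] = {}
        for quad in quads:
            network.setdefault(quad.subject, []).append(quad)
        print(network["Darwin"])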

    Deploying mutation impact text-mining software with the SADI Semantic Web Services framework

    Background: Mutation impact extraction is an important task designed to harvest relevant annotations from scientific documents for reuse in multiple contexts. Our previous work on text mining for mutation impacts resulted in (i) the development of a GATE-based pipeline that mines texts for information about impacts of mutations on proteins, (ii) the population of this information into our OWL DL mutation impact ontology, and (iii) establishing an experimental semantic database for storing the results of text mining. Results: This article explores the possibility of using the SADI framework as a medium for publishing our mutation impact software and data. SADI is a set of conventions for creating web services with semantic descriptions that facilitate automatic discovery and orchestration. We describe a case study exploring and demonstrating the utility of the SADI approach in our context. We describe several SADI services we created based on our text mining API and data, and demonstrate how they can be used in a number of biologically meaningful scenarios through a SPARQL interface (SHARE) to SADI services. In all cases we pay special attention to the integration of mutation impact services with external SADI services providing information about related biological entities, such as proteins, pathways, and drugs. Conclusion: We have identified that SADI provides an effective way of exposing our mutation impact data.
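
    The abstract mentions querying SADI services through the SHARE SPARQL interface. Below is a minimal Python sketch of what such a query might look like using SPARQLWrapper; the endpoint URL and the mutation impact predicate IRIs are placeholders, not the real SHARE deployment or ontology terms.

        # Querying SADI-published data through a SPARQL interface such as SHARE.
        # Endpoint URL and predicate IRIs are placeholders.
        from SPARQLWrapper import SPARQLWrapper, JSON

        sparql = SPARQLWrapper("http://example.org/share/sparql")  # placeholder endpoint
        sparql.setQuery("""
            PREFIX mi: <http://example.org/mutation-impact#>
            SELECT ?mutation ?protein ?impact
            WHERE {
                ?mutation mi:affectsProtein ?protein .
                ?mutation mi:hasImpact ?impact .
            }
            LIMIT 10
        """)
        sparql.setReturnFormat(JSON)

        results = sparql.query().convert()
        for row in results["results"]["bindings"]:
            print(row["mutation"]["value"], row["impact"]["value"])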

    A novel gluten knowledge base of potential biomedical and health-related interactions extracted from the literature: using machine learning and graph analysis methodologies to reconstruct the bibliome

    Background: For all their nutritional properties and broad availability, cereal crops have been associated with different alimentary disorders and symptoms, with the majority of the responsibility being attributed to gluten. As a result, gluten-related literature continues to be produced at an ever-growing rate, driven in part by recent exploratory studies that link gluten to non-traditional diseases and by the popularity of gluten-free diets, making it increasingly difficult to access and analyse practical and structured information. In this sense, the accelerated discovery of novel advances in diagnosis and treatment, as well as exploratory studies, produce a favourable scenario for disinformation and misinformation. Objectives: Aligned with the European Union strategy "Delivering on EU Food Safety and Nutrition in 2050", which emphasizes the inextricable links between imbalanced diets, the increased exposure to unreliable sources of information and misleading information, and the increased dependency on reliable sources of information, this paper presents GlutKNOIS, a public and interactive literature-based database that reconstructs and represents the experimental biomedical knowledge extracted from the gluten-related literature. The developed platform integrates knowledge from external databases, bibliometric statistics and social media discussion to propose a novel and enhanced way to search, visualise and analyse potential biomedical and health-related interactions in relation to the gluten domain. Methods: For this purpose, the presented study applies a semi-supervised curation workflow that combines natural language processing techniques, machine learning algorithms, ontology-based normalization and integration approaches, named entity recognition methods, and graph knowledge reconstruction methodologies to process, classify, represent and analyse the experimental findings contained in the literature, which is also complemented by data from the social discussion. Results and conclusions: In this sense, 5814 documents were manually annotated and 7424 were fully automatically processed to reconstruct the first online gluten-related knowledge database of evidenced health-related interactions that produce health or metabolic changes based on the literature. In addition, the automatic processing of the literature combined with the proposed knowledge representation methodologies has the potential to assist in the revision and analysis of years of gluten research. The reconstructed knowledge base is public and accessible at https://sing-group.org/glutknois/
    Funding: Fundação para a Ciência e a Tecnologia | Ref. UIDB/50006/2020; Xunta de Galicia | Ref. ED481B-2019-032; Xunta de Galicia | Ref. ED431G2019/06; Xunta de Galicia | Ref. ED431C 2022/03; Universidade de Vigo/CISU
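
    As one illustration of the pipeline steps named above (named entity recognition followed by graph reconstruction), the sketch below shows a gazetteer-based NER pass and co-occurrence edge extraction in Python; the vocabulary, labels and sample sentence are toy stand-ins, not GlutKNOIS data or methods.

        # Gazetteer-based NER over a sentence, then co-occurrence edges for a
        # literature-derived interaction graph. Vocabulary and sentence are toys.
        import itertools
        import re

        GAZETTEER = {
            "gluten": "FOOD_COMPONENT",
            "gliadin": "FOOD_COMPONENT",
            "celiac disease": "DISEASE",
            "dermatitis herpetiformis": "DISEASE",
        }

        def find_entities(sentence: str) -> list[tuple[str, str]]:
            """Return (term, label) pairs found in the sentence."""
            found = []
            for term, label in GAZETTEER.items():
                if re.search(rf"\b{re.escape(term)}\b", sentence, re.IGNORECASE):
                    found.append((term, label))
            return found

        sentence = "Gliadin peptides trigger an immune response in celiac disease."
        entities = find_entities(sentence)

        # Every pair of co-occurring entities becomes a candidate interaction edge.
        edges = list(itertools.combinations(entities, 2))
        print(edges)  # [(('gliadin', 'FOOD_COMPONENT'), ('celiac disease', 'DISEASE'))]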

    Collaborative ontology engineering: a survey

    Building ontologies in a collaborative and increasingly community-driven fashion has become a central paradigm of modern ontology engineering. This understanding of ontologies and ontology engineering processes is the result of intensive theoretical and empirical research within the Semantic Web community, supported by technology developments such as Web 2.0. More than six years after the publication of the first methodology for collaborative ontology engineering, it is generally acknowledged that, in order to be useful, but also economically feasible, ontologies should be developed and maintained in a community-driven manner, with the help of fully-fledged environments providing dedicated support for collaboration and user participation. Wikis and similar communication and collaboration platforms, which enable ontology stakeholders to exchange ideas and discuss modeling decisions, are probably the most important technological components of such environments. In addition, process-driven methodologies assist the ontology engineering team throughout the ontology life cycle, and provide empirically grounded best practices and guidelines for optimizing ontology development results in real-world projects. The goal of this article is to analyze the state of the art in the field of collaborative ontology engineering. We survey several of the most outstanding methodologies, methods and techniques that have emerged in recent years, and present the most popular development environments, which can be utilized to carry out, or facilitate, specific activities within the methodologies. A discussion of the open issues identified concludes the survey and provides a roadmap for future research and development in this lively and promising field.

    Improvement of a Document Retrieval System for the Supply Chain domain

    This project is framed in the context of the European project "Semantically Connected Semiconductor Supply Chains (SC3)". It applies Natural Language Processing and Semantic Web techniques, adapting them to reference production-chain documents for a pilot in semiconductor supply chains. The project combines Semantic Web technologies (using ontologies) with Deep Learning (BERT/SBERT in PyTorch).
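
    A minimal sketch of the SBERT retrieval component mentioned above: embed supply-chain documents and a query with sentence-transformers, then rank by cosine similarity. The model name is a common default chosen here as an assumption rather than the project's actual choice, and the documents are toy examples.

        # SBERT-based document retrieval: embed documents and a query, then
        # rank by cosine similarity. Model name and documents are assumptions.
        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("all-MiniLM-L6-v2")

        documents = [
            "Wafer fabrication process specification for 300mm production line.",
            "Supplier quality agreement for semiconductor packaging materials.",
            "Logistics plan for wafer transport between fab and assembly sites.",
        ]
        doc_embeddings = model.encode(documents, convert_to_tensor=True)

        query = "quality requirements for packaging suppliers"
        query_embedding = model.encode(query, convert_to_tensor=True)

        # Rank documents by cosine similarity to the query.
        scores = util.cos_sim(query_embedding, doc_embeddings)[0]
        for score, doc in sorted(zip(scores.tolist(), documents), reverse=True):
            print(f"{score:.3f}  {doc}")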
