81 research outputs found

    Completing and Debugging Ontologies: state of the art and challenges

    As semantically-enabled applications require high-quality ontologies, developing and maintaining ontologies that are as correct and complete as possible is an important, although difficult, task in ontology engineering. A key step is ontology debugging and completion, which in general involves two steps: detecting defects and repairing them. In this paper we discuss the state of the art regarding the repairing step. We do this by formalizing the repairing step as an abduction problem and situating the state of the art with respect to this framework. We show that many research problems remain open and identify opportunities for further work to advance the field.
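
    The abduction framing can be sketched roughly as follows (a minimal reconstruction for illustration; the paper's formalization is more general). Given an ontology O and a set M of missing entailments detected in the debugging step, a repair is a set of axioms A, drawn from some hypothesis space H, such that

        \forall \alpha \in M:\; O \cup A \models \alpha
        \qquad\text{and}\qquad O \cup A \not\models \bot

    that is, A restores the desired entailments without introducing inconsistency; preference criteria, such as subset-minimality of A, then select among the candidate repairs.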

    View-based user interfaces for the Semantic Web

    This thesis explores the possibilities of using the view-based search paradigm to create intelligent user interfaces on the Semantic Web. After surveying several semantic search techniques, the view-based search paradigm is explained and argued to fill a valuable niche in the field. To test the argument, numerous portals with different user interfaces and data were built using the paradigm. Based on the results of these experiments, this thesis argues that the paradigm provides a strong, extensible, and flexible base on which to build semantic user interfaces. Designing the systems themselves to be as adaptable as possible is also discussed.
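
    The core mechanic of view-based (faceted) search can be illustrated with a small sketch: each "view" partitions the items into categories, a query selects categories across views, and the result set is their intersection. The views, data, and function names below are invented for illustration, not taken from the thesis.

        # Minimal sketch of the view-based search paradigm: views are
        # category -> item-set mappings, and a query intersects them.
        from functools import reduce

        views = {
            "topic":  {"ontology": {1, 2, 5}, "search": {2, 3, 4}},
            "period": {"2000s": {1, 2, 3}, "2010s": {4, 5}},
        }

        def select(selection):
            """Intersect the item sets of the chosen category in each view."""
            sets = [views[v][c] for v, c in selection.items()]
            return reduce(set.intersection, sets) if sets else set()

        def counts(selection, view):
            """Hit counts shown next to each category of an unselected view."""
            hits = select(selection)
            return {c: len(items & hits) for c, items in views[view].items()}

        print(select({"topic": "search", "period": "2000s"}))  # {2, 3}
        print(counts({"topic": "search"}, "period"))  # {'2000s': 2, '2010s': 1}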

    Exploiting general-purpose background knowledge for automated schema matching

    The schema matching task is an integral part of the data integration process, and usually its first step. Schema matching is typically very complex and time-consuming, and it is therefore largely carried out by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems. Among the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems which exploit general-purpose background knowledge are presented, and exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.
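
    One strategy of the kind explored in Part III can be sketched as follows: compare concept labels from two ontologies in a shared embedding space and align the most similar pairs. The embedding lookup is a stand-in for assumed pretrained vectors (from a knowledge graph embedding model or word vectors); the values, threshold, and greedy alignment are illustrative, not the dissertation's concrete matchers.

        # Hedged sketch: embedding-based ontology matching via cosine similarity.
        import numpy as np

        emb = {  # assumed pretrained label embeddings; toy values
            "Author": np.array([0.9, 0.1, 0.0]),
            "Writer": np.array([0.8, 0.2, 0.1]),
            "Journal": np.array([0.0, 0.9, 0.4]),
            "Periodical": np.array([0.1, 0.8, 0.5]),
        }

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        def match(source, target, threshold=0.95):
            """Greedy one-to-one alignment between two concept lists."""
            pairs = sorted(((cosine(emb[s], emb[t]), s, t)
                            for s in source for t in target), reverse=True)
            used_s, used_t, alignment = set(), set(), []
            for sim, s, t in pairs:
                if sim >= threshold and s not in used_s and t not in used_t:
                    alignment.append((s, t, round(sim, 3)))
                    used_s.add(s)
                    used_t.add(t)
            return alignment

        print(match(["Author", "Journal"], ["Writer", "Periodical"]))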

    User Interfaces to the Web of Data based on Natural Language Generation

    We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in various stages of corpus-based analysis. We analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, and apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs. Finally, we present a method to automatically induce RDF graph verbalization templates via distant supervision.
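
    A minimal sketch of template-based SPARQL verbalization, in the spirit described above: derive a human-readable label for each variable from the triple patterns it occurs in, then fill an English template. The query, label heuristic, and template are invented for this sketch, not taken from the thesis.

        # Toy SPARQL verbalization: label variables from triple patterns.
        import re

        query = """
        SELECT ?author WHERE {
          ?paper  dct:creator   ?author .
          ?paper  dct:subject   dbc:Semantic_Web .
        }
        """

        def label(var, patterns):
            """Label ?var from the property of a triple where it is the
            object (e.g. dct:creator -> 'creator')."""
            for s, p, o in patterns:
                if o == var:
                    return p.split(":")[-1].replace("_", " ")
            return var.lstrip("?")

        patterns = [tuple(line.split()[:3])
                    for line in query.splitlines() if line.strip().endswith(".")]

        projected = re.search(r"SELECT\s+(\?\w+)", query).group(1)
        print(f"This query retrieves the {label(projected, patterns)} of every "
              f"paper whose subject is Semantic Web.")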

    Development of an Analysis Process to Assess the Quality of Research Knowledge Graphs

    This thesis proposes a novel approach for objectively assessing the quality of knowledge graphs, with a particular focus on the Open Research Knowledge Graph (ORKG). The ORKG is a community-driven open platform that aims to make research contributions more discoverable, accessible, and reusable. As a critical component of modern information systems, knowledge graphs enable effective data integration, discovery, and retrieval. However, assessing the quality of these graphs is challenging, given their complexity and heterogeneity. The main problem addressed in this thesis is to develop an approach to assess the quality of knowledge graphs, with particular emphasis on completeness and accuracy, in the context of the ORKG. The proposed approach is based on a set of quality measures that evaluate different aspects of completeness and accuracy, and it leverages the Knowledge Graph Maturity Model (KGMM) as a framework for assessing the maturity level of the ORKG. The solution is evaluated empirically using a set of ORKG curation grants, and the results demonstrate that the proposed approach can effectively identify gaps in completeness and accuracy and provide a comprehensive assessment of the quality of the ORKG. This assessment can help the ORKG community prioritize curation efforts and improve the quality of the ORKG. Overall, this thesis contributes to the field of knowledge graph quality assessment by proposing a comprehensive approach and demonstrating its effectiveness in the context of the ORKG. The proposed approach has the potential to be applied to other knowledge graphs, enabling better data integration, discovery, and retrieval in various domains.
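
    The kind of measure such an analysis process computes can be sketched as follows. The field list, data, and sampled counts below are hypothetical; the thesis derives its measures from the KGMM rather than this simplification.

        # Sketch: property completeness and sampled accuracy over paper records.
        REQUIRED = ["title", "research_field", "contribution"]  # assumed schema

        papers = [
            {"title": "A", "research_field": "CS", "contribution": "method"},
            {"title": "B", "research_field": None, "contribution": "dataset"},
            {"title": "C", "research_field": "IS", "contribution": None},
        ]

        def completeness(items, required):
            """Share of required property slots that are actually filled."""
            filled = sum(1 for it in items for f in required if it.get(f))
            return filled / (len(items) * len(required))

        def accuracy(sampled_correct, sampled_total):
            """Share of manually verified statements found to be correct."""
            return sampled_correct / sampled_total

        print(f"completeness = {completeness(papers, REQUIRED):.2f}")  # 0.78
        print(f"accuracy     = {accuracy(47, 50):.2f}")                # 0.94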

    Automatically selecting patients for clinical trials with justifications

    Clinical trials are human research studies used to evaluate the effectiveness of a surgical, medical, or behavioral intervention. They have been widely used by researchers to determine whether a new treatment, such as a new medication, is safe and effective in humans. A clinical trial is frequently performed to determine whether a new treatment is more successful than the current treatment or has less harmful side effects. However, clinical trials have a high failure rate. One common approach is to find candidate patients based on their medical records. Unfortunately, this is a difficult process: it is typically performed manually, making it time-consuming and error-prone. Consequently, clinical trial deadlines are often missed, and studies do not move forward. Since time can be a determining factor for success, automatic support for this process would be advantageous. Because it is also important to validate whether patients were selected correctly for a trial, avoiding potential health problems, a mechanism is needed that presents justifications for the selected patients. In this dissertation, we present one possible solution to the problem of patient selection for clinical trials. We developed the necessary algorithms and created a simple and intuitive web application that selects patients for clinical trials automatically. This was achieved by combining knowledge expressed in different formalisms: we integrated medical knowledge, represented using ontologies, with trial criteria expressed as nonmonotonic rules. To support automatic validation, we developed a mechanism that generates a justification for each selection together with the results of the selected patients. In the end, a user can enter a set of trial criteria, and the application generates the selected patients and their respective justifications, based on the criteria entered, medical information, and a database of patient information.
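
    The selection-with-justification idea can be sketched as follows: eligibility criteria are encoded as rules over patient facts (which, in the actual system, combine ontology-derived medical knowledge with nonmonotonic rules), and every decision records the criteria it satisfied or violated. The criteria, fields, and patients below are invented for illustration.

        # Sketch: rule-based patient selection with per-patient justifications.
        patients = [
            {"id": "p1", "age": 54, "diagnoses": {"type_2_diabetes"}, "on_insulin": False},
            {"id": "p2", "age": 71, "diagnoses": {"type_2_diabetes"}, "on_insulin": True},
        ]

        criteria = [
            ("age 18-65",       lambda p: 18 <= p["age"] <= 65),
            ("has T2 diabetes", lambda p: "type_2_diabetes" in p["diagnoses"]),
            ("not on insulin",  lambda p: not p["on_insulin"]),  # exclusion
        ]

        def select(patients, criteria):
            """Return (patient id, eligible?, justification) for each patient."""
            results = []
            for p in patients:
                passed = [name for name, rule in criteria if rule(p)]
                failed = [name for name, rule in criteria if not rule(p)]
                results.append((p["id"], not failed,
                                {"satisfied": passed, "violated": failed}))
            return results

        for pid, ok, why in select(patients, criteria):
            print(pid, "eligible" if ok else "excluded", why)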

    On construction, performance, and diversification for structured queries on the semantic desktop

    [no abstract]

    Enabling entity retrieval by exploiting Wikipedia as a semantic knowledge source

    This dissertation research, PanAnthropon FilmWorld, aims to demonstrate direct retrieval of entities and related facts by exploiting Wikipedia as a semantic knowledge source, with the film domain as its proof-of-concept domain of application. To this end, a semantic knowledge base concerning the film domain has been constructed with data extracted and derived from 10,640 Wikipedia pages on films and additional pages on film awards. The knowledge base currently contains 209,266 entities and 2,345,931 entity-centric facts. Both the knowledge base and the corresponding semantic search interface are based on a coherent classification of entities, and entity-centric facts are consistently represented as tuples. The semantic search interface (http://dlib.ischool.drexel.edu:8080/sofia/PA/) supports multiple types of semantic search functions that go beyond traditional keyword-based search, including the main General Entity Retrieval Query (GERQ) function, which retrieves all entities matching a specified entity type, subtype, and semantic conditions, and thus corresponds to the main research problem. Two types of evaluation have been performed, assessing (1) the quality of information extraction and (2) the effectiveness of information retrieval using the semantic interface. The first was performed by inspecting 11,495 film-centric facts concerning 100 films; the results confirmed high data quality, with 99.96% average precision and 99.84% average recall. The second was performed in an experiment in which human subjects carried out a retrieval task with both the PanAnthropon interface and the Internet Movie Database (IMDb) interface, and their task performance was compared between the two. The results confirmed the higher effectiveness of the PanAnthropon interface over the IMDb interface (83.11% vs. 40.78% average precision; 83.55% vs. 40.26% average recall). Moreover, responses to the post-task questionnaire indicate that subjects found the PanAnthropon interface highly usable, easily understandable, and highly effective. The main contribution of this research is thus to demonstrate the utility and feasibility of semantics-based direct entity retrieval.
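
    The GERQ idea can be illustrated with a small sketch: entities are typed, facts are stored as tuples, and a query supplies an entity type plus semantic conditions to match. The schema and data here are invented; the actual knowledge base holds roughly 2.3 million film-domain facts.

        # Sketch: GERQ-style retrieval over (subject, predicate, object) tuples.
        facts = [
            ("Alien", "type", "film"), ("Alien", "genre", "science fiction"),
            ("Alien", "directed_by", "Ridley Scott"),
            ("Seven", "type", "film"), ("Seven", "genre", "thriller"),
            ("Seven", "directed_by", "David Fincher"),
        ]

        def gerq(entity_type, conditions):
            """Entities of entity_type whose facts satisfy all (pred, obj) pairs."""
            entities = {s for s, p, o in facts if p == "type" and o == entity_type}
            for pred, obj in conditions:
                entities &= {s for s, p, o in facts if p == pred and o == obj}
            return entities

        # "Retrieve all science-fiction films directed by Ridley Scott."
        print(gerq("film", [("genre", "science fiction"),
                            ("directed_by", "Ridley Scott")]))  # {'Alien'}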