
    Is question answering fit for the Semantic Web? A survey

    With the recent rapid growth of the Semantic Web (SW), searching and querying content that is both massive in scale and heterogeneous has become increasingly challenging. User-friendly interfaces that support end users in querying and exploring this novel and diverse structured information space are needed to make the vision of the SW a reality. We present a survey of ontology-based Question Answering (QA), which has emerged in recent years to exploit the opportunities offered by structured semantic information on the Web. First, we provide a comprehensive perspective by analyzing the general background and history of the QA research field. This history runs from influential works in the artificial intelligence and database communities developed in the 1970s and later decades, through open-domain QA stimulated by the QA track in TREC since 1999, to the latest commercial semantic QA solutions. We then tackle the current state of the art in open, user-friendly interfaces for the SW. Second, we examine the potential of this technology to go beyond the current state of the art and support end users in reusing and querying SW content. We conclude our review with an outlook for this novel research area, focusing in particular on the R&D directions that need to be pursued to achieve efficient and competent retrieval and integration of answers from large-scale, heterogeneous, and continuously evolving semantic sources.

    Instance-based Hierarchical Schema Alignment in Linked Data

    Doctoral dissertation, Graduate School of Seoul National University: Department of Dental Science, Major in Healthcare Management and Informatics, August 2015. Advisor: Hong-Gee Kim.
    Along with the development of the Web of documents, there is a natural need for sharing, exchanging, and merging heterogeneous data to provide more comprehensive information and to answer more complex user questions. However, the data published on the Web are raw dumps that sacrifice much of the semantics that could be used for exchanging and integrating data. The Resource Description Framework (RDF) and Linked Data are designed to expose the semantics of data by interlinking data represented with well-defined relations. With the profusion of RDF resources and Linked Data, ontology alignment has gained significance in providing highly comprehensive knowledge embedded in disparate sources. Ontology alignment in Linking Open Data (LOD), however, has traditionally focused on the instance level rather than the schema level. Linked Data supports schema-level matching, provided that instance-level matching is already established. Linked Data is a hotbed for instance-based schema matching, which is considered a better solution for matching classes with ambiguous or obscure names. In this dissertation, the author focuses on three issues in instance-based schema alignment for Linked Data: (1) how to align schemas based on instances, (2) how to scale schema alignment, and (3) how to generate a hierarchical schema structure. Targeting the first issue, the author has proposed an instance-based schema alignment algorithm called IUT. The IUT builds a unified taxonomy for the classes of two ontologies based on an instance-class matrix and obtains the relation between two classes from their common instances. The author tested the IUT with DBpedia and YAGO2 and compared it with two state-of-the-art methods on four alignment tasks. The experiments show that the IUT outperforms both methods in terms of efficiency and effectiveness (e.g., taking 968 ms to obtain an F-score of 0.810 on intra-subsumption alignment in DBpedia).
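As a rough illustration of obtaining class relations from common instances, the following minimal Python sketch stores each class's column of the instance-class matrix as a set of instance identifiers and compares classes pairwise. The containment measure, the two thresholds, and all function names are assumptions made for illustration only. The abstract does not give the IUT's actual formulas, and whether the dissertation's χ_s and χ_e parameters correspond to these thresholds is likewise an assumption. The sketch emits only pairwise subsumption and equivalence relations and does not build the unified taxonomy.

```python
from itertools import combinations


def containment(sub, sup):
    """Fraction of the instances of `sub` that are also instances of `sup`."""
    return len(sub & sup) / len(sub) if sub else 0.0


def align_by_instances(class_instances, sub_threshold=0.9, eq_threshold=0.9):
    """Infer (hypothetical) subsumption and equivalence pairs from shared instances.

    class_instances maps a class name to its set of instances, i.e. one column
    of an instance-class matrix stored sparsely. Threshold values are
    illustrative assumptions, not values taken from the dissertation.
    """
    subsumptions, equivalences = set(), set()
    for a, b in combinations(sorted(class_instances), 2):
        a_in_b = containment(class_instances[a], class_instances[b])
        b_in_a = containment(class_instances[b], class_instances[a])
        if a_in_b >= eq_threshold and b_in_a >= eq_threshold:
            equivalences.add((a, b))      # each class (almost) covers the other
        elif a_in_b >= sub_threshold and a_in_b >= b_in_a:
            subsumptions.add((a, b))      # (child, parent): a is subsumed by b
        elif b_in_a >= sub_threshold:
            subsumptions.add((b, a))      # (child, parent): b is subsumed by a
    return subsumptions, equivalences


# Toy input: instance identifiers are assumed to be matched already across the
# two ontologies (the instance-level alignment the abstract presupposes).
classes = {
    "dbo:Person": {"e1", "e2", "e3", "e4"},
    "yago:Person": {"e1", "e2", "e3", "e4"},
    "dbo:Scientist": {"e1", "e2"},
    "dbo:Place": {"p1", "p2"},
}
subs, eqs = align_by_instances(classes)
print(sorted(subs))  # [('dbo:Scientist', 'dbo:Person'), ('dbo:Scientist', 'yago:Person')]
print(sorted(eqs))   # [('dbo:Person', 'yago:Person')]
```

Comparing every pair of classes is quadratic in the number of classes, which is the cost that a scaled variant has to address.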
Targeting the second issue, the author has proposed a scaled version of the IUT called IUT(M). The IUT(M) reduces the computational cost of the IUT in two ways based on Locality Sensitive Hashing (LSH): (1) it makes the similarity computation for each pair of classes cheaper with MinHash functions, and (2) it reduces the number of similarity computations with banding. The author tested the IUT(M) on the YAGO2-YAGO2 intra-subsumption alignment task and showed that the running time of the IUT can be reduced by 94% with a 5% loss in F-score. Targeting the third issue, the author has proposed a method to generate a faceted taxonomy based on object properties in Linked Data. In the proposed framework, a sub-taxonomy is built for each facet from the sub-data extracted with one object property, using an Instance-based Concept Taxonomy generation algorithm called ICT. Two experiments demonstrate the following. (1) The ICT efficiently and effectively generates a sub-taxonomy with rdf:type in DBpedia and YAGO2 (e.g., taking 49 and 11,790 ms to build concept taxonomies that achieve Taxonomic F-scores of 0.917 and 0.780, respectively).
(2) The faceted taxonomies for Diseasome and DrugBank, efficiently generated from multiple object properties (e.g., taking 2,032 and 2,525 ms to build faceted taxonomies based on 6 and 16 properties, respectively), can effectively reduce the search space in faceted search (e.g., achieving a Maximum Resolution of 1.65 and 1.03 with 2 facets).

    Table of contents:
    1 Introduction
        1.1 Background and Motivations
            1.1.1 Data Integration and Schema Alignment
            1.1.2 From RDF to Linked Data
            1.1.3 Schema Alignment in Linked Data
        1.2 Instance-based Schema Alignment
        1.3 Contributions of this Dissertation
        1.4 Organization of this Dissertation
    2 Preliminaries and Related Works
        2.1 Preliminaries
            2.1.1 RDF and Linked Data
            2.1.2 Ontology and Schema Alignment in Linked Data
        2.2 Related Works
            2.2.1 Instance-based Schema Alignment
            2.2.2 Scaling Pairwise Similarity Computations
            2.2.3 Automatic Taxonomy Generation
    3 Aligning Schemas with Subsumption and Equivalence Relations
        3.1 Introduction
        3.2 Problem Definition
        3.3 Methods
            3.3.1 Workflow of Instance-based Schema Alignment
            3.3.2 Instance-class Matrix Generation
            3.3.3 Subsumption and Equivalence Relations Discovering
        3.4 Experiments
            3.4.1 Schema Alignment Algorithms in Comparison
            3.4.2 Data and Experiment Design
        3.5 Results
            3.5.1 Intra-subsumption Relations for YAGO2-YAGO2
            3.5.2 Intra-subsumption Relations for DBpedia-DBpedia
            3.5.3 Inter-Subsumption and Equivalence Relations for YAGO2-DBpedia
            3.5.4 Effects of χ_s and χ_e for the IUT
        3.6 Discussions
        3.7 Conclusion
    4 Scaling Pair-wise Computations Using the Locality Sensitive Hashing
        4.1 Introduction
        4.2 Methods
            4.2.1 MinHash and Signatures
            4.2.2 Banding Technique
            4.2.3 Scaling the IUT with MinHash and Banding
        4.3 Experiment
        4.4 Discussions
        4.5 Conclusion
    5 Unsupervised Hierarchical Schema Structure Generation in Linked Data
        5.1 Introduction
        5.2 Faceted Taxonomy for Linked Data
        5.3 Framework
            5.3.1 Facets Extraction
            5.3.2 Instance Restriction and Redundancy Removal
            5.3.3 Redundant Object Removal
            5.3.4 Instance-object Matrix Generation
        5.4 Generating Faceted Taxonomy
            5.4.1 The Problem of Generating a Sub-taxonomy for a Facet
            5.4.2 Concept Definition and Naming
            5.4.3 Taxonomy Generation Algorithm
            5.4.4 Instantiation and Taxonomy Refinement
        5.5 Experiments
            5.5.1 Task 1: Construction of Taxonomy with rdf:type
            5.5.2 Task 2: Construction of Multiple Faceted Taxonomies
        5.6 Results
            5.6.1 Results of Task 1
            5.6.2 Results of Task 2
        5.7 Discussion
        5.8 Conclusion
    6 Future Works and Conclusion
        6.1 Future Works
            6.1.1 Similarity Measures for Instance-based Schema Alignment
            6.1.2 Ontology Evolution for Instance-based Schema Alignment
            6.1.3 Combining the IUT with Structure- and Lexical-based Methods
            6.1.4 Scaling the IUT with Parallel Computations
            6.1.5 Faceted Navigation and Search for Linked Data
        6.2 Conclusion
    Bibliography
    Abstract (in Korean)
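The two LSH ingredients named for the IUT(M), MinHash signatures and banding, can be sketched in Python as follows. This is the textbook technique, not the author's code. The hash family ((a * h(x) + b) mod a Mersenne prime, with CRC32 as h), the signature length, the number of bands, and all function names are illustrative assumptions. MinHash estimates Jaccard similarity. How the IUT(M) combines these estimates with the IUT's subsumption and equivalence tests is not described in the abstract.

```python
import random
import zlib
from collections import defaultdict
from itertools import combinations

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash family


def minhash_signature(items, hash_params):
    """One MinHash value per (a, b): min over the set of (a * h(x) + b) mod PRIME."""
    keys = [zlib.crc32(x.encode("utf-8")) for x in items]
    return [min((a * k + b) % PRIME for k in keys) for a, b in hash_params]


def estimated_jaccard(sig1, sig2):
    """The share of agreeing signature positions estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def lsh_candidate_pairs(class_instances, num_hashes=100, bands=20, seed=42):
    """Return MinHash signatures and the class pairs that collide in at least one band."""
    if num_hashes % bands:
        raise ValueError("num_hashes must be divisible by bands")
    rows = num_hashes // bands
    rng = random.Random(seed)
    params = [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(num_hashes)]
    sigs = {c: minhash_signature(inst, params)
            for c, inst in class_instances.items() if inst}
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for c, sig in sigs.items():
            # Classes whose signatures agree on every row of this band share a bucket.
            buckets[tuple(sig[band * rows:(band + 1) * rows])].append(c)
        for members in buckets.values():
            candidates.update(combinations(sorted(members), 2))
    return sigs, candidates


# Hypothetical toy classes: A and B share all instances, C shares none.
classes = {"A": {"e1", "e2", "e3"}, "B": {"e1", "e2", "e3"}, "C": {"x1", "x2"}}
sigs, pairs = lsh_candidate_pairs(classes)
print(pairs)                                   # {('A', 'B')}
print(estimated_jaccard(sigs["A"], sigs["B"]))  # 1.0
```

Only the candidate pairs are then scored exactly, which is where the savings come from. With b bands of r rows, a pair with Jaccard similarity s becomes a candidate with probability 1 - (1 - s^r)^b. Tuning r and b therefore trades missed pairs (the F-score loss) against running time.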
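To make the idea of an instance-based concept taxonomy per facet concrete, here is a heavily simplified Python sketch. It is not the ICT algorithm, whose concept definition, naming, and refinement steps the abstract does not spell out. The sketch assumes that each object value of one object property (the facet) defines a concept whose extent is the set of subjects linked to it. It further assumes that objects with identical extents merge into one concept, and that a concept's direct parents are the most specific concepts whose extents strictly contain its own. All names are hypothetical.

```python
from collections import defaultdict


def build_facet_taxonomy(triples, prop):
    """Build a concept hierarchy for one facet (object property) of an RDF graph.

    triples: iterable of (subject, predicate, object) strings.
    Returns a dict mapping each concept name to the sorted list of its direct
    parent concepts (an empty list marks a top concept).
    """
    # Instance-object view of the facet: object -> set of subjects linked via prop.
    extents = defaultdict(set)
    for s, p, o in triples:
        if p == prop:
            extents[o].add(s)

    # Objects with identical instance sets become one concept (hypothetical naming rule).
    by_extent = defaultdict(list)
    for obj, insts in extents.items():
        by_extent[frozenset(insts)].append(obj)
    concepts = {"|".join(sorted(objs)): ext for ext, objs in by_extent.items()}

    parents = {}
    for name, ext in concepts.items():
        # Candidate parents: concepts whose instance set strictly contains this one.
        supers = [n for n, e in concepts.items() if ext < e]
        # Keep only the most specific candidates (no other candidate lies in between).
        parents[name] = sorted(
            n for n in supers
            if not any(concepts[m] < concepts[n] for m in supers)
        )
    return parents


triples = [
    ("ex:a", "rdf:type", "ex:Agent"), ("ex:b", "rdf:type", "ex:Agent"),
    ("ex:c", "rdf:type", "ex:Agent"), ("ex:a", "rdf:type", "ex:Person"),
    ("ex:b", "rdf:type", "ex:Person"), ("ex:a", "rdf:type", "ex:Scientist"),
    ("ex:a", "ex:field", "ex:Physics"),  # other facet, ignored for rdf:type
]
print(build_facet_taxonomy(triples, "rdf:type"))
```

Calling the same function with a different property (for example, a drug or target property in a dataset such as Diseasome) would yield one sub-taxonomy per facet. The pairwise subset tests are cubic in the number of concepts, so this sketch suits only small inputs.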

    Linked Data Supported Information Retrieval

    Search engines have become indispensable for finding content on the World Wide Web. Semantic Web and Linked Data technologies allow content to be structured in a more detailed and unambiguous way and enable entirely new approaches to information retrieval problems. This thesis examines how information retrieval applications can benefit from incorporating Linked Data. New methods for computer-aided semantic text analysis, semantic search, information prioritization, and visualization are presented and comprehensively evaluated. In these methods, Linked Data resources and their relationships are integrated to increase either the methods' effectiveness or their usability. First, an introduction to the fundamentals of information retrieval and Linked Data is given. Next, new manual and automated methods for semantically annotating documents by linking them to Linked Data resources (entity linking) are presented. The methods are evaluated comprehensively, and the underlying evaluation system is substantially improved. Building on the annotation methods, two new retrieval models for semantic search are presented and evaluated. They are based on the generalized vector space model and incorporate semantic similarity, derived from taxonomy-based relationships between the Linked Data resources in documents and queries, into the computation of the search result ranking. To further refine the computation of semantic similarity, a method for prioritizing Linked Data resources is presented and evaluated. Building on this, visualization techniques are presented that aim to improve explorability and navigability within a semantically annotated document corpus.
Two applications are presented for this purpose: a Linked Data-based exploratory extension that complements a traditional keyword-based search engine, and a Linked Data-based recommender system.
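The generalized vector space model (GVSM) replaces the identity matrix of the classic vector space model with a term-term similarity matrix G. As a result, a document can match a query that mentions a related but different resource. The following minimal Python sketch uses one common formulation, score = d^T G q / sqrt((d^T G d)(q^T G q)), over sparse {resource: weight} vectors. The abstract does not name the taxonomy-based similarity used in the thesis. Wu-Palmer similarity over a single-parent, acyclic taxonomy is swapped in here as an assumption, and the toy hierarchy mixing resources with their types is purely illustrative.

```python
import math


def ancestors(node, parent):
    """Path from `node` up to its root in an acyclic taxonomy given as child -> parent."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def taxonomy_similarity(a, b, parent):
    """Wu-Palmer-style similarity: 2*depth(lcs) / (depth(a) + depth(b)); 0 if unrelated."""
    pa, pb = ancestors(a, parent), ancestors(b, parent)
    pb_set = set(pb)
    lcs = next((n for n in pa if n in pb_set), None)  # lowest common subsumer
    if lcs is None:
        return 0.0
    return 2 * len(ancestors(lcs, parent)) / (len(pa) + len(pb))  # root has depth 1


def gvsm_score(doc, query, parent):
    """Generalized VSM score for sparse vectors {resource: weight}."""
    def inner(u, v):
        # u^T G v, where G[i][j] = taxonomy_similarity(i, j)
        return sum(wu * wv * taxonomy_similarity(i, j, parent)
                   for i, wu in u.items() for j, wv in v.items())
    denom = math.sqrt(inner(doc, doc) * inner(query, query))
    return inner(doc, query) / denom if denom else 0.0


parent = {  # hypothetical toy hierarchy
    "dbr:Berlin": "dbo:City", "dbr:Hamburg": "dbo:City",
    "dbo:City": "dbo:Place", "dbr:Aspirin": "dbo:Drug",
}
doc = {"dbr:Hamburg": 1.0}
print(gvsm_score(doc, {"dbr:Berlin": 1.0}, parent))  # about 0.667; the classic VSM gives 0
print(gvsm_score(doc, {"dbr:Aspirin": 1.0}, parent))  # 0.0
```

If G is not positive semidefinite, which a taxonomy similarity does not guarantee, the normalized score is not bounded by 1 in general. Still, it ranks related documents above unrelated ones as intended.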