8 research outputs found

    Completing and Debugging Ontologies: state of the art and challenges

    As semantically-enabled applications require high-quality ontologies, developing and maintaining ontologies that are as correct and complete as possible is an important, although difficult, task in ontology engineering. A key step is ontology debugging and completion. In general, there are two steps: detecting defects and repairing defects. In this paper we discuss the state of the art regarding the repairing step. We do this by formalizing the repairing step as an abduction problem and situating the state of the art with respect to this framework. We show that there are still many open research problems and identify opportunities for further work and for advancing the field. Comment: 56 pages

    Entities with quantities : extraction, search, and ranking

    Quantities are more than numeric values. They denote measures of the world’s entities such as heights of buildings, running times of athletes, energy efficiency of car models or energy production of power plants, all expressed in numbers with associated units. Entity-centric search and question answering (QA) are well supported by modern search engines. However, they do not work well when the queries involve quantity filters, such as searching for athletes who ran 200m under 20 seconds or companies with quarterly revenue above $2 billion. State-of-the-art systems fail to understand the quantities, including the condition (less than, above, etc.), the unit of interest (seconds, dollars, etc.), and the context of the quantity (200m race, quarterly revenue, etc.). QA systems based on structured knowledge bases (KBs) also fail, as quantities are poorly covered by state-of-the-art KBs. In this dissertation, we developed new methods to advance the state of the art on quantity knowledge extraction and search. Our main contributions are the following:
    • First, we present Qsearch [Ho et al., 2019, Ho et al., 2020], a system that can handle advanced questions with quantity filters by using cues present in both the question and the text sources. Qsearch comprises two main contributions. The first is a deep neural network model designed to extract quantity-centric tuples from text sources. The second is a novel query-matching model for finding and ranking matching tuples.
    • Second, to incorporate heterogeneous tables into the process, we present QuTE [Ho et al., 2021a, Ho et al., 2021b], a system for extracting quantity information from web sources, in particular ad-hoc web tables in HTML pages. QuTE's contribution includes a method for linking quantity and entity columns that draws on external text sources. For question answering, we contextualize the extracted entity-quantity pairs with informative cues from the table and present a new method for consolidating and better ranking answer candidates through inter-fact consistency.
    • Third, we present QL [Ho et al., 2022], a recall-oriented method for enriching knowledge bases (KBs) with quantity facts. Modern KBs such as Wikidata or YAGO cover many entities and their relevant information, but often miss important quantity properties. QL is query-driven and based on iterative learning, with two main contributions for improving KB coverage. The first is a method for query expansion that captures a larger pool of fact candidates. The second is a self-consistency technique that takes the value distributions of quantities into account.
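
    To make the notion of a quantity filter concrete, the following minimal Python sketch shows how a query such as "athletes who ran 200m under 20 seconds" breaks down into a condition, a value, a unit, and a context, and how such a filter could be checked against tuples that have already been extracted. The `QuantityFact` and `QuantityFilter` types, the sample facts, and the substring-based context match are illustrative assumptions; this is not a description of how Qsearch, QuTE, or QL work.

```python
from dataclasses import dataclass
import operator

# Comparison conditions a quantity filter may express (illustrative subset).
CONDITIONS = {"under": operator.lt, "above": operator.gt}


@dataclass(frozen=True)
class QuantityFact:
    """An entity-quantity tuple, assumed to be extracted beforehand."""
    entity: str
    value: float
    unit: str
    context: str


@dataclass(frozen=True)
class QuantityFilter:
    """The quantity part of a query: condition, value, unit, and context."""
    condition: str
    value: float
    unit: str
    context: str

    def matches(self, fact: QuantityFact) -> bool:
        compare = CONDITIONS[self.condition]
        return (
            fact.unit == self.unit                # same unit of interest
            and self.context in fact.context      # naive context match (assumption)
            and compare(fact.value, self.value)   # condition on the numeric value
        )


def search(facts, query):
    """Return the entities whose facts satisfy the quantity filter."""
    return [fact.entity for fact in facts if query.matches(fact)]


# Hypothetical facts; entity names and values are made up for illustration.
FACTS = [
    QuantityFact("athlete_a", 19.5, "seconds", "200m race"),
    QuantityFact("athlete_b", 20.4, "seconds", "200m race"),
    QuantityFact("athlete_c", 9.9, "seconds", "100m race"),
]
QUERY = QuantityFilter("under", 20.0, "seconds", "200m")
RESULT = search(FACTS, QUERY)
```

    With these sample facts, only `athlete_a` satisfies all three parts of the filter: `athlete_b` fails the condition and `athlete_c` fails the context, which is the kind of distinction the abstract says current systems miss.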

    Breaking rules: taking Complex Ontology Alignment beyond rule-based approaches

    Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2021. As ontologies are developed in an uncoordinated manner, differences in scope and design compromise interoperability. Ontology matching is critical to address this semantic heterogeneity problem, as it finds correspondences that enable integrating data across the Semantic Web. One of the biggest challenges in this field is that ontology schemas often differ conceptually, and therefore reconciling many real-world ontology pairs (e.g., in geography or biomedicine) involves establishing complex mappings that contain multiple entities from each ontology. Yet, for the most part, ontology matching algorithms are restricted to finding simple equivalence mappings between ontology entities. This work presents novel algorithms for Complex Ontology Alignment based on Association Rule Mining over a set of shared instances between two ontologies. Its strategy relies on a targeted search for known complex patterns in instance and schema data, reducing the search space. This allows the application of semantic-based filtering algorithms tailored to each kind of pattern, to select and refine the most relevant mappings. The algorithms were evaluated on the OAEI Complex track datasets under two automated approaches: OAEI's entity-based approach and a novel element-overlap-based approach developed in the context of this work. The algorithms were able to find mappings spanning eight distinct complex patterns, as well as combinations of patterns through disjunction and conjunction. They efficiently reduced the search space and showed competitive performance compared with state-of-the-art complex alignment systems. As for the comparative analysis of evaluation methodologies, the proposed element-overlap-based evaluation strategy was shown to be more accurate and interpretable than the reference-based automatic alternative, although none of the existing strategies fully addresses the challenges discussed in the literature. For future work, it would be interesting to extend the algorithms to cover more complex patterns and to combine them with lexical approaches.
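
    The idea of mining association rules over shared instances can be illustrated with a minimal Python sketch. It computes the standard support and confidence of a rule whose antecedent comes from one ontology and whose consequent comes from the other. The item encoding (strings such as `o1:AcceptedPaper`), the sample instances, and the function name `rule_metrics` are assumptions made for illustration; the thesis's actual algorithms, patterns, and filters are not reproduced here.

```python
def rule_metrics(instances, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    `instances` maps each shared instance to the set of items (classes or
    property-value pairs from either ontology) that describe it.
    """
    total = len(instances)
    with_antecedent = [items for items in instances.values() if antecedent <= items]
    with_both = [items for items in with_antecedent if consequent <= items]
    support = len(with_both) / total if total else 0.0
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical shared instances described in two ontologies, o1 and o2.
SHARED = {
    "p1": {"o1:AcceptedPaper", "o2:Paper", "o2:hasDecision=Accept"},
    "p2": {"o1:AcceptedPaper", "o2:Paper", "o2:hasDecision=Accept"},
    "p3": {"o1:Paper", "o2:Paper", "o2:hasDecision=Reject"},
    "p4": {"o1:Paper", "o2:Paper"},
}

# Candidate complex mapping: o1:AcceptedPaper <-> o2:Paper with an Accept decision.
COMPLEX_RIGHT = {"o2:Paper", "o2:hasDecision=Accept"}
FORWARD = rule_metrics(SHARED, {"o1:AcceptedPaper"}, COMPLEX_RIGHT)
BACKWARD = rule_metrics(SHARED, COMPLEX_RIGHT, {"o1:AcceptedPaper"})

# A simple one-to-one candidate for comparison.
SIMPLE = rule_metrics(SHARED, {"o2:Paper"}, {"o1:AcceptedPaper"})
```

    In this toy data the complex rule holds with confidence 1.0 in both directions, while the simple rule `o2:Paper -> o1:AcceptedPaper` only reaches 0.5, which is the kind of signal that would favour the complex mapping over a simple equivalence.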

    Génération automatique d'alignements complexes d'ontologies

    The Linked Open Data (LOD) cloud is composed of many data repositories. The data in these repositories are described by vocabularies, also called ontologies. Each ontology has its own terminology and model, which leads to heterogeneity between them. To make the ontologies and the data they describe interoperable, ontology alignments establish correspondences, or links, between their entities. There are many ontology matching systems which generate simple alignments, i.e., they link one entity to another. However, to overcome ontology heterogeneity, more expressive correspondences are sometimes needed. Finding this kind of correspondence is a tedious task that can be automated. In this thesis, an automatic complex matching approach based on a user's knowledge needs and common instances is proposed. The complex alignment field is still growing, and little work addresses the evaluation of such alignments. To fill this gap, we propose an automatic complex alignment evaluation system based on instance comparison. The system is complemented by an artificial dataset on the conference domain. A well-known alignment evaluation dataset has been extended for this evaluation.

    Instance-based Hierarchical Schema Alignment in Linked Data

    Thesis (Ph.D.), Seoul National University Graduate School, Department of Dental Science, Healthcare Management and Informatics major, August 2015. 김홍기. Along with the development of the Web of documents, there is a natural need for sharing, exchanging, and merging heterogeneous data to provide more comprehensive information and to answer users' more complex questions. However, the data published on the Web are raw dumps that sacrifice much of the semantics that can be used for exchanging and integrating data. Resource Description Framework (RDF) and Linked Data are designed to expose the semantics of data by interlinking data represented with well-defined relations. With the profusion of RDF resources and Linked Data, ontology alignment has gained significance in providing highly comprehensive knowledge embedded in disparate sources. Ontology alignment in Linking Open Data (LOD), however, has traditionally focused more on the instance level than on the schema level. Linked Data supports schema-level matching, provided that instance-level matching is already established. Linked Data is a hotbed for instance-based schema matching, which is considered a better solution for matching classes with ambiguous or obscure names. In this dissertation, the author focuses on three issues in instance-based schema alignment for Linked Data: (1) how to align schemas based on instances, (2) how to scale the schema alignment, (3) how to generate a hierarchical schema structure. Targeting the first issue, the author has proposed an instance-based schema alignment algorithm called IUT. The IUT builds a unified taxonomy for the classes from two ontologies based on an instance-class matrix and obtains the relations between two classes from their common instances. The author tested the IUT with DBpedia and YAGO2, and compared the IUT with two state-of-the-art methods in four alignment tasks. The experiments show that the IUT outperforms these methods in terms of efficiency and effectiveness (e.g., it takes 968 ms to obtain an F-score of 0.810 on intra-subsumption alignment in DBpedia). Targeting the second issue, the author has proposed a scaled version of the IUT called IUT(M). The IUT(M) reduces the computations of the IUT in two ways based on Locality Sensitive Hashing (LSH): (1) decreasing the similarity computations for each pair of classes with MinHash functions, and (2) decreasing the number of similarity computations with banding. The author tested the IUT(M) on the YAGO2-YAGO2 intra-subsumption alignment task to demonstrate that the running time of the IUT can be reduced by 94% with a 5% loss in F-score. Targeting the third issue, the author has proposed a method to generate a faceted taxonomy based on object properties on Linked Data. A framework is proposed to build a sub-taxonomy in each facet with sub-data, extracted with an object property, with an Instance-based Concept Taxonomy generation algorithm called ICT. Two experiments demonstrate: (1) The ICT efficiently and effectively generates a sub-taxonomy with rdf:type in DBpedia and YAGO2 (e.g., it takes 49 and 11,790 ms to build concept taxonomies that achieve 0.917 and 0.780 on Taxonomic F-score).
(2) The faceted taxonomies for Diseasome and DrugBank, efficiently generated based on multiple object properties (e.g., costs 2,032 and 2,525 ms to build the faceted taxonomies based on 6 and 16 properties), can effectively reduce the search spaces in faceted searches (e.g., obtains 1.65 and 1.03 on Maximum Resolution with 2 facets).

    Contents:
    1 Introduction
    1.1 Background and Motivations
    1.1.1 Data Integration and Schema Alignment
    1.1.2 From RDF to Linked Data
    1.1.3 Schema Alignment in Linked Data
    1.2 Instance-based Schema Alignment
    1.3 Contributions of this Dissertation
    1.4 Organization of this Dissertation
    2 Preliminaries and Related Works
    2.1 Preliminaries
    2.1.1 RDF and Linked Data
    2.1.2 Ontology and Schema Alignment in Linked Data
    2.2 Related Works
    2.2.1 Instance-based Schema Alignment
    2.2.2 Scaling Pairwise Similarity Computations
    2.2.3 Automatic Taxonomy Generation
    3 Aligning Schemas with Subsumption and Equivalence Relations
    3.1 Introduction
    3.2 Problem Definition
    3.3 Methods
    3.3.1 Workflow of Instance-based Schema Alignment
    3.3.2 Instance-class Matrix Generation
    3.3.3 Subsumption and Equivalence Relations Discovering
    3.4 Experiments
    3.4.1 Schema Alignment Algorithms in Comparison
    3.4.2 Data and Experiment Design
    3.5 Results
    3.5.1 Intra-subsumption Relations for YAGO2-YAGO2
    3.5.2 Intra-subsumption Relations for DBpedia-DBpedia
    3.5.3 Inter-Subsumption and Equivalence Relations for YAGO2-DBpedia
    3.5.4 Effects of χ_s and χ_e for the IUT
    3.6 Discussions
    3.7 Conclusion
    4 Scaling Pair-wise Computations Using the Locality Sensitive Hashing
    4.1 Introduction
    4.2 Methods
    4.2.1 MinHash and Signatures
    4.2.2 Banding Technique
    4.2.3 Scaling the IUT with MinHash and Banding
    4.3 Experiment
    4.4 Discussions
    4.5 Conclusion
    5 Unsupervised Hierarchical Schema Structure Generation in Linked Data
    5.1 Introduction
    5.2 Faceted Taxonomy for Linked Data
    5.3 Framework
    5.3.1 Facets Extraction
    5.3.2 Instance Restriction and Redundancy Removal
    5.3.3 Redundant Object Removal
    5.3.4 Instance-object Matrix Generation
    5.4 Generating Faceted Taxonomy
    5.4.1 The Problem of Generating a Sub-taxonomy for a Facet
    5.4.2 Concept Definition and Naming
    5.4.3 Taxonomy Generation Algorithm
    5.4.4 Instantiation and Taxonomy Refinement
    5.5 Experiments
    5.5.1 Task 1: Construction of Taxonomy with rdf:type
    5.5.2 Task 2: Construction of Multiple Faceted Taxonomies
    5.6 Results
    5.6.1 Results of Task 1
    5.6.2 Results of Task 2
    5.7 Discussion
    5.8 Conclusion
    6 Future Works and Conclusion
    6.1 Future Works
    6.1.1 Similarity Measures for Instance-based Schema Alignment
    6.1.2 Ontology Evolution for Instance-based Schema Alignment
    6.1.3 Combining the IUT with Structure- and Lexical-based Methods
    6.1.4 Scaling the IUT with Parallel Computations
    6.1.5 Faceted Navigation and Search for Linked Data
    6.2 Conclusion
    Bibliography
    Abstract (in Korean)
    Docto
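
    The two LSH ingredients named in the abstract above, MinHash signatures and banding, can be sketched generically in Python as follows. This is a textbook-style illustration and not the IUT(M) implementation: the seeded SHA-1 hashing, the parameter values (64 hash functions, 16 bands), the function names, and the sample class-to-instance sets are all assumptions. Each class is assumed to have a non-empty instance set.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=64):
    """One minimum per seeded hash function; `items` must be non-empty."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """Banding: classes sharing an identical band become a candidate pair."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = defaultdict(set)
    for name, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for members in buckets.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical classes with their instance sets (illustrative data only).
CLASSES = {
    "dbo:Scientist": {"i1", "i2", "i3", "i4"},
    "yago:Scientist": {"i1", "i2", "i3", "i4"},
    "dbo:River": {"r1", "r2", "r3"},
}
SIGNATURES = {name: minhash_signature(members) for name, members in CLASSES.items()}
PAIRS = candidate_pairs(SIGNATURES)
```

    Signatures replace full instance-set comparisons with fixed-length comparisons, and banding restricts the comparisons to pairs that collide in at least one band. These correspond to the two reductions listed in the abstract, although the thesis may realise them differently.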

    A Communications-Oriented Perspective on Traffic Management Systems for Smart Cities: Challenges and Innovative Approaches

    The growing size of cities and increasing population mobility have led to a rapid increase in the number of vehicles on the roads, which has resulted in many challenges for road traffic management authorities in relation to traffic congestion, accidents, and air pollution. In recent years, researchers from both industry and academia have been focusing their efforts on exploiting the advances in sensing, communication, and dynamic adaptive technologies to make the existing road traffic management systems (TMSs) more efficient in coping with these issues in future smart cities. However, these efforts are still insufficient to build a reliable and secure TMS that can handle the foreseeable rise of population and vehicles in smart cities. In this survey, we present an up-to-date review of the different technologies used in the different phases involved in a TMS and discuss the potential use of smart cars and social media to enable faster and more accurate traffic congestion detection and mitigation. We also provide a thorough study of the security threats that may jeopardize the efficiency of the TMS and endanger drivers' lives. Furthermore, the most significant and recent European and worldwide projects dealing with traffic congestion issues are briefly discussed to highlight their contribution to the advancement of smart transportation. Finally, we discuss some open challenges and present our own vision for developing robust TMSs for future smart cities.

    Enhancing Recommendations in Specialist Search Through Semantic-based Techniques and Multiple Resources

    Information resources abound on the Internet, but mining these resources is a non-trivial task. Such abundance has raised the need to enhance services provided to users, such as recommendations. The purpose of this work is to explore how better recommendations can be provided to specialists in specific domains such as bioinformatics by introducing semantic techniques that reason through different resources and by using specialist search techniques. Such techniques exploit semantic relations and hidden associations that arise from the information overlap among various concepts in multiple bioinformatics resources such as ontologies, websites and corpora. Thus, this work introduces a new method that reasons over different bioinformatics resources and then discovers and exploits relations and information that may not exist in the original resources. Such relations, for example sibling and semantic similarity relations, may be discovered as a consequence of the information overlap and are used to enhance the accuracy of the recommendations provided on bioinformatics content (e.g. articles). In addition, this research introduces a set of semantic rules that are able to extract different semantic information and relations inferred among various bioinformatics resources. This project introduces these semantic-based methods as part of a recommendation service within a content-based system. Moreover, it uses specialists' interests to enhance the provided recommendations by employing a method that collects user data implicitly. It then represents the data as adaptive ontological user profiles for each user based on his/her preferences, which contributes to more accurate recommendations for each specialist in the field of bioinformatics.