603 research outputs found

    Vermeidung von Repräsentationsheterogenitäten in realweltlichen Wissensgraphen

    Get PDF
    Knowledge graphs are repositories providing factual knowledge about entities. They are a great source of knowledge to support modern AI applications for Web search, question answering, digital assistants, and online shopping. The advantages of machine learning techniques and the Web's growth have led to colossal knowledge graphs with billions of facts about hundreds of millions of entities collected from a large variety of sources. While integrating independent knowledge sources promises rich information, it inherently leads to heterogeneities in representation due to a large variety of different conceptualizations. Thus, real-world knowledge graphs are threatened in their overall utility. Due to their sheer size, they are hardly manually curatable anymore. Automatic and semi-automatic methods are needed to cope with these vast knowledge repositories. We first address the general topic of representation heterogeneity by surveying the problem throughout various data-intensive fields: databases, ontologies, and knowledge graphs. Different techniques for automatically resolving heterogeneity issues are presented and discussed, while several open problems are identified. Next, we focus on entity heterogeneity. We show that automatic matching techniques may run into quality problems when working in a multi-knowledge graph scenario due to incorrect transitive identity links. We present four techniques that can be used to improve the quality of arbitrary entity matching tools significantly. Concerning relation heterogeneity, we show that synonymous relations in knowledge graphs pose several difficulties in querying. Therefore, we resolve these heterogeneities with knowledge graph embeddings and by Horn rule mining. All methods detect synonymous relations in knowledge graphs with high quality. Furthermore, we present a novel technique for avoiding heterogeneity issues at query time using implicit knowledge storage. We show that large neural language models are a valuable source of knowledge that is queried similarly to knowledge graphs already solving several heterogeneity issues internally.Wissensgraphen sind eine wichtige Datenquelle von Entitätswissen. Sie unterstützen viele moderne KI-Anwendungen. Dazu gehören unter anderem Websuche, die automatische Beantwortung von Fragen, digitale Assistenten und Online-Shopping. Neue Errungenschaften im maschinellen Lernen und das außerordentliche Wachstum des Internets haben zu riesigen Wissensgraphen geführt. Diese umfassen häufig Milliarden von Fakten über Hunderte von Millionen von Entitäten; häufig aus vielen verschiedenen Quellen. Während die Integration unabhängiger Wissensquellen zu einer großen Informationsvielfalt führen kann, führt sie inhärent zu Heterogenitäten in der Wissensrepräsentation. Diese Heterogenität in den Daten gefährdet den praktischen Nutzen der Wissensgraphen. Durch ihre Größe lassen sich die Wissensgraphen allerdings nicht mehr manuell bereinigen. Dafür werden heutzutage häufig automatische und halbautomatische Methoden benötigt. In dieser Arbeit befassen wir uns mit dem Thema Repräsentationsheterogenität. Wir klassifizieren Heterogenität entlang verschiedener Dimensionen und erläutern Heterogenitätsprobleme in Datenbanken, Ontologien und Wissensgraphen. Weiterhin geben wir einen knappen Überblick über verschiedene Techniken zur automatischen Lösung von Heterogenitätsproblemen. Im nächsten Kapitel beschäftigen wir uns mit Entitätsheterogenität. Wir zeigen Probleme auf, die in einem Multi-Wissensgraphen-Szenario aufgrund von fehlerhaften transitiven Links entstehen. Um diese Probleme zu lösen stellen wir vier Techniken vor, mit denen sich die Qualität beliebiger Entity-Alignment-Tools deutlich verbessern lässt. Wir zeigen, dass Relationsheterogenität in Wissensgraphen zu Problemen bei der Anfragenbeantwortung führen kann. Daher entwickeln wir verschiedene Methoden um synonyme Relationen zu finden. Eine der Methoden arbeitet mit hochdimensionalen Wissensgrapheinbettungen, die andere mit einem Rule Mining Ansatz. Beide Methoden können synonyme Relationen in Wissensgraphen mit hoher Qualität erkennen. Darüber hinaus stellen wir eine neuartige Technik zur Vermeidung von Heterogenitätsproblemen vor, bei der wir eine implizite Wissensrepräsentation verwenden. Wir zeigen, dass große neuronale Sprachmodelle eine wertvolle Wissensquelle sind, die ähnlich wie Wissensgraphen angefragt werden können. Im Sprachmodell selbst werden bereits viele der Heterogenitätsprobleme aufgelöst, so dass eine Anfrage heterogener Wissensgraphen möglich wird

    Tracing thick and thin concepts through corpora

    Get PDF
    Philosophers and linguists currently lack the means to reliably identify evaluative concepts and measure their evaluative intensity. Using a corpus-based approach, we present a new method to distinguish evaluatively thick and thin adjectives like ‘courageous’ and ‘awful’ from descriptive adjectives like ‘narrow,’ and from value-associated adjectives like ‘sunny.’ Our study suggests that the modifiers ‘truly’ and ‘really’ frequently highlight the evaluative dimension of thick and thin adjectives, allowing for them to be uniquely classified. Based on these results, we believe our operationalization may pave the way for a more quantitative approach to the study of thick and thin concepts

    Identifying light verb constructions in Indonesian: a direct translation approach

    Get PDF

    Entity-Oriented Search

    Get PDF
    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms

    Concepts as Correlates of Lexical Labels. A Cognitivist Perspective, 274 s.

    Get PDF
    This is a submitted manuscript version. The publisher should be contacted for permission to re-use or reprint the material in any form. Final published version, copyright Peter Lang: https://doi.org/10.3726/978-3-653-05287-9The study of language becomes particularly attractive when it is not practised as an isolated descriptive enterprise, but when it has wide-ranging implications for the study of the human mind. Such is the spirit of this book. While categorisation may be the single most basic cognitive process in organisms, and as an area of inquiry, it is fundamental to Cognitive Science as a whole, at the other end of the spectrum, high-level cognition is organised and permeated by language, giving rise to categories that count and function as concepts. Working from considering the philosophical assumptions of the cognitivist perspective, this study offers an argument for a very productive understanding of the relation between concepts, categories, and their theoretical models

    Learning Ontology Relations by Combining Corpus-Based Techniques and Reasoning on Data from Semantic Web Sources

    Get PDF
    The manual construction of formal domain conceptualizations (ontologies) is labor-intensive. Ontology learning, by contrast, provides (semi-)automatic ontology generation from input data such as domain text. This thesis proposes a novel approach for learning labels of non-taxonomic ontology relations. It combines corpus-based techniques with reasoning on Semantic Web data. Corpus-based methods apply vector space similarity of verbs co-occurring with labeled and unlabeled relations to calculate relation label suggestions from a set of candidates. A meta ontology in combination with Semantic Web sources such as DBpedia and OpenCyc allows reasoning to improve the suggested labels. An extensive formal evaluation demonstrates the superior accuracy of the presented hybrid approach

    Eesti keele üldvaldkonna tekstide laia kattuvusega automaatne sündmusanalüüs

    Get PDF
    Seoses tekstide suuremahulise digitaliseerimisega ning digitaalse tekstiloome järjest laiema levikuga on tohutul hulgal loomuliku keele tekste muutunud ja muutumas masinloetavaks. Masinloetavus omab potentsiaali muuta tekstimassiivid inimeste jaoks lihtsamini hallatavaks, nt lubada rakendusi nagu automaatne sisukokkuvõtete tegemine ja tekstide põhjal küsimustele vastamine, ent paraku ei ulatu praegused automaatanalüüsi võimalused tekstide sisu tegeliku mõistmiseni. Oletatakse, tekstide sisu mõistvale automaatanalüüsile viib meid lähemale sündmusanalüüs – kuna paljud tekstid on narratiivse ülesehitusega, tõlgendatavad kui „sündmuste kirjeldused”, peaks tekstidest sündmuste eraldamine ja formaalsel kujul esitamine pakkuma alust mitmete „teksti mõistmist” nõudvate keeletehnoloogia rakenduste loomisel. Käesolevas väitekirjas uuritakse, kuivõrd saab eestikeelsete tekstide sündmusanalüüsi käsitleda kui avatud sündmuste hulka ja üldvaldkonna tekste hõlmavat automaatse lingvistilise analüüsi ülesannet. Probleemile lähenetakse eesti keele automaatanalüüsi kontekstis uudsest, sündmuste ajasemantikale keskenduvast perspektiivist. Töös kohandatakse eesti keelele TimeML märgendusraamistik ja luuakse raamistikule toetuv automaatne ajaväljendite tuvastaja ning ajasemantilise märgendusega (sündmusviidete, ajaväljendite ning ajaseoste märgendusega) tekstikorpus; analüüsitakse korpuse põhjal inimmärgendajate kooskõla sündmusviidete ja ajaseoste määramisel ning lõpuks uuritakse võimalusi ajasemantika-keskse sündmusanalüüsi laiendamiseks geneeriliseks sündmusanalüüsiks sündmust väljendavate keelendite samaviitelisuse lahendamise näitel. Töö pakub suuniseid tekstide ajasemantika ja sündmusstruktuuri märgenduse edasiarendamiseks tulevikus ning töös loodud keeleressurssid võimaldavad nii konkreetsete lõpp-rakenduste (nt automaatne ajaküsimustele vastamine) katsetamist kui ka automaatsete märgendustööriistade edasiarendamist.  Due to massive scale digitalisation processes and a switch from traditional means of written communication to digital written communication, vast amounts of human language texts are becoming machine-readable. Machine-readability holds a potential for easing human effort on searching and organising large text collections, allowing applications such as automatic text summarisation and question answering. However, current tools for automatic text analysis do not reach for text understanding required for making these applications generic. It is hypothesised that automatic analysis of events in texts leads us closer to the goal, as many texts can be interpreted as stories/narratives that are decomposable into events. This thesis explores event analysis as broad-coverage and general domain automatic language analysis problem in Estonian, and provides an investigation starting from time-oriented event analysis and tending towards generic event analysis. We adapt TimeML framework to Estonian, and create an automatic temporal expression tagger and a news corpus manually annotated for temporal semantics (event mentions, temporal expressions, and temporal relations) for the language; we analyse consistency of human annotation of event mentions and temporal relations, and, finally, provide a preliminary study on event coreference resolution in Estonian news. The current work also makes suggestions on how future research can improve Estonian event and temporal semantic annotation, and the language resources developed in this work will allow future experimentation with end-user applications (such as automatic answering of temporal questions) as well as provide a basis for developing automatic semantic analysis tools

    A Framework for Personalized Content Recommendations to Support Informal Learning in Massively Diverse Information WIKIS

    Get PDF
    Personalization has proved to achieve better learning outcomes by adapting to specific learners’ needs, interests, and/or preferences. Traditionally, most personalized learning software systems focused on formal learning. However, learning personalization is not only desirable for formal learning, it is also required for informal learning, which is self-directed, does not follow a specified curriculum, and does not lead to formal qualifications. Wikis among other informal learning platforms are found to attract an increasing attention for informal learning, especially Wikipedia. The nature of wikis enables learners to freely navigate the learning environment and independently construct knowledge without being forced to follow a predefined learning path in accordance with the constructivist learning theory. Nevertheless, navigation on information wikis suffer from several limitations. To support informal learning on Wikipedia and similar environments, it is important to provide easy and fast access to relevant content. Recommendation systems (RSs) have long been used to effectively provide useful recommendations in different technology enhanced learning (TEL) contexts. However, the massive diversity of unstructured content as well as user base on such information oriented websites poses major challenges when designing recommendation models for similar environments. In addition to these challenges, evaluation of TEL recommender systems for informal learning is rather a challenging activity due to the inherent difficulty in measuring the impact of recommendations on informal learning with the absence of formal assessment and commonly used learning analytics. In this research, a personalized content recommendation framework (PCRF) for information wikis as well as an evaluation framework that can be used to evaluate the impact of personalized content recommendations on informal learning from wikis are proposed. The presented recommendation framework models learners’ interests by continuously extrapolating topical navigation graphs from learners’ free navigation and applying graph structural analysis algorithms to extract interesting topics for individual users. Then, it integrates learners’ interest models with fuzzy thesauri for personalized content recommendations. Our evaluation approach encompasses two main activities. First, the impact of personalized recommendations on informal learning is evaluated by assessing conceptual knowledge in users’ feedback. Second, web analytics data is analyzed to get an insight into users’ progress and focus throughout the test session. Our evaluation revealed that PCRF generates highly relevant recommendations that are adaptive to changes in user’s interest using the HARD model with rank-based mean average precision (MAP@k) scores ranging between 100% and 86.4%. In addition, evaluation of informal learning revealed that users who used Wikipedia with personalized support could achieve higher scores on conceptual knowledge assessment with average score of 14.9 compared to 10.0 for the students who used the encyclopedia without any recommendations. The analysis of web analytics data show that users who used Wikipedia with personalized recommendations visited larger number of relevant pages compared to the control group, 644 vs 226 respectively. In addition, they were also able to make use of a larger number of concepts and were able to make comparisons and state relations between concepts

    Design For Change: Ontology-Driven Knowledgebase Applications For Dynamic Biological Domains

    Get PDF
    Post-genomic biology is producing a plethora of rapidly changing complex data. Yet extracting useful information from this data is limited by current knowledge management methodologies. Biological knowledge management is complicated by ambiguous nomenclature, cultural differences between biologists and computer scientists, and conventional database technology that was not designed to support rapidly changing complex domains. A recent trend in ontology-driven database design has emerged to address this challenge. While ontologies provide effective knowledge models, attempts to transform ontologies into knowledgebases have revealed an impedance mismatch or ontology transformation gap. A unique methodology called Ultra-Structure Theory (UT) may provide an ontology transform solution that supports large scale, dynamic biological domains by expressing the complexity in data rather than programming code. This thesis aims to survey ontology and database theory and methodologies, and describe how UT integrates and extends them to provide a flexible, semantically expressive knowledgebase solution using standard relational database technology
    corecore