3 research outputs found

    Liage de données RDF : évaluation d'approches interlingues (RDF data interlinking: evaluation of cross-lingual approaches)

    The Semantic Web extends the Web by publishing structured and interlinked data using RDF. An RDF data set is a graph whose nodes are resources labelled in natural languages. One of the key challenges of linked data is discovering links across RDF data sets: given two data sets, equivalent resources should be identified and connected by owl:sameAs links. This problem is particularly difficult when the resources are described in different natural languages. This thesis investigates the effectiveness of linguistic resources for interlinking RDF data sets. For this purpose, we introduce a general framework in which each RDF resource is represented as a virtual document containing the text of its neighboring nodes; the labels of those neighboring nodes constitute the context of the resource. Once virtual documents are created, they are projected into the same space in order to be compared, which can be achieved using machine translation or multilingual lexical resources. Similarity measures are then applied to find identical resources: similarity between documents in this space is taken as similarity between the corresponding RDF resources. We experimentally evaluated different cross-lingual methods for linking RDF data within the proposed framework. In particular, two strategies are explored: applying machine translation and using references to multilingual resources. Overall, the evaluation shows the effectiveness of cross-lingual string-based approaches for linking RDF resources expressed in different languages. The methods were evaluated on resources in English, Chinese, French and German. The best performance (over 0.90 F-measure) was obtained by the machine translation approach, which shows that the similarity-based method can be applied successfully to RDF resources independently of their type (named entities or thesaurus concepts).
The best experimental results involving just a pair of languages demonstrated the usefulness of such techniques for interlinking RDF resources cross-lingually.

    Exploiting general-purpose background knowledge for automated schema matching

    The schema matching task is an integral part of the data integration process; it is usually the first step in integrating data. Schema matching is typically very complex and time-consuming, and it is therefore, for the most part, carried out by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple, modularized matcher development (with background knowledge sources) and for extensive evaluation of matching systems. Among the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems that exploit general-purpose background knowledge are presented, and exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.
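The role of external background knowledge in schema matching can be illustrated with a minimal matcher sketch. This is not the dissertation's system: the synonym groups below are a toy stand-in for a large general-purpose resource (such as a knowledge graph or lexical database), and all schema labels are invented for the example.

```python
import re

def normalize(label):
    """Lower-case and split camelCase / snake_case schema labels into tokens."""
    tokens = re.findall(r"[A-Za-z][a-z]*", label)
    return tuple(t.lower() for t in tokens)

# Toy background resource: groups of synonymous tokens, standing in for
# a general-purpose knowledge resource the schemas themselves lack.
SYNONYMS = [{"zip", "postcode", "postal"}, {"cost", "price", "amount"}]

def related(a, b):
    """Tokens match if identical, or if the background resource links them."""
    if a == b:
        return True
    return any(a in group and b in group for group in SYNONYMS)

def match(schema_a, schema_b):
    """Return element pairs whose token sequences align via background knowledge."""
    pairs = []
    for ea in schema_a:
        for eb in schema_b:
            ta, tb = normalize(ea), normalize(eb)
            if len(ta) == len(tb) and all(related(x, y) for x, y in zip(ta, tb)):
                pairs.append((ea, eb))
    return pairs

print(match(["zipCode", "itemPrice"], ["postal_code", "item_cost"]))
```

A purely string-based matcher would miss both pairs here, since "zipCode" and "postal_code" share no tokens; the background resource is what bridges the gap, which is the core idea the dissertation pursues at scale with knowledge graphs and their embeddings.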

    Intertextual Readings of the Nyāyabhūṣaṇa on Buddhist Anti-Realism

    This two-part dissertation has two goals: 1) a close philological reading of a 50-page section of a 10th-century Sanskrit philosophical work (Bhāsarvajña's Nyāyabhūṣaṇa), and 2) the creation and assessment of a novel intertextuality research system (Vātāyana) centered on the same work. The first half of the dissertation encompasses the philology project in four chapters: 1) background on the author, work, and key philosophical ideas in the passage; 2) descriptions of all known manuscript witnesses of this work and a new critical edition that substantially improves upon the editio princeps; 3) a word-for-word English translation richly annotated with both traditional explanatory material and novel digital links to not one but two interactive online research systems; and 4) a discussion of the Sanskrit author's dialectical strategy in the studied passage. The second half of the dissertation details the intertextuality research system in a further four chapters: 5) why it is needed and what can be learned from existing projects; 6) the creation of the system consisting of curated textual corpus, composite algorithm in natural language processing and information retrieval, and live web-app interface; 7) an evaluation of system performance measured against a small gold-standard dataset derived from traditional philological research; and 8) a discussion of the impact such new technology could have on humanistic research more broadly. System performance was assessed to be quite good, with a 'recall@5' of 80%, meaning that most previously known cases of mid-length quotation and even paraphrase could be automatically found and returned within the system's top five hits. Moreover, the system was also found to return a 34% surplus of additional significant parallels not found in the small benchmark. 
This assessment confirms that Vātāyana can be useful to researchers by aiding them in their collection and organization of intertextual observations, leaving them more time to focus on interpretation. Seventeen appendices illustrate both these efforts and a number of side projects, the latter of which span translation alignment, network visualization of an important database of South Asian prosopography (PANDiT), and a multi-functional Sanskrit text-processing web application (Skrutable).
Table of Contents:
    Preface; Abbreviations; Terms and Symbols; Nyāyabhūṣaṇa Witnesses; Main Sanskrit Editions
    Introduction: A Multi-Disciplinary Project in Intertextual Reading; Main Object of Study: Nyāyabhūṣaṇa 104–154; Project Outline
    Part I: Close Reading
    1 Background
        1.1 Bhāsarvajña
        1.2 The Nyāyabhūṣaṇa
            1.2.1 As One of Several Commentaries on Bhāsarvajña's Nyāyasāra
            1.2.2 In Modern Scholarship, with Focus on NBhū 104–154
        1.3 Philosophical Context
            1.3.1 Key Philosophical Concepts
            1.3.2 Intra-Textual Context within the Nyāyabhūṣaṇa
            1.3.3 Inter-Textual Context
    2 Edition of NBhū 104–154
        2.1 Source Materials
            2.1.1 Edition of Yogīndrānanda 1968 (E)
            2.1.2 Manuscripts (P1, P2, V)
            2.1.3 Diplomatic Transcripts
        2.2 Notes on Using the Edition
        2.3 Critical Edition of NBhū 104–154 with Apparatuses
    3 Translation of NBhū 104–154
        3.1 Notes on Translation Method
        3.2 Notes on Outline Headings
        3.3 Annotated Translation of NBhū 104–154
    4 Discussion
        4.1 Internal Structure of NBhū 104–154
        4.2 Critical Assessment of Bhāsarvajña's Argumentation
    Part II: Distant Reading with Digital Humanities
    5 Background in Intertextuality Detection
        5.1 Sanskrit Projects
        5.2 Non-Sanskrit Projects
        5.3 Operationalizing Intertextuality
    6 Building an Intertextuality Machine
        6.1 Corpus (Pramāṇa NLP)
        6.2 Algorithm (Vātāyana)
        6.3 User Interface (Vātāyana)
    7 Evaluating System Performance
        7.1 Previous Scholarship on NBhū 104–154 as Philological Benchmark
        7.2 System Performance Relative to Benchmark
    8 Discussion
    Conclusion
    Works Cited: Main Sanskrit Editions; Works Cited in Part I; Works Cited in Part II
    Appendices:
        Appendix 1: Correspondence of Joshi 1986 to Yogīndrānanda 1968
        Appendix 1D: Full-Text Alignment of Joshi 1986 to Yogīndrānanda 1968
        Appendix 2: Prosopographical Relations Important for NBhū 104–154
        Appendix 2D: Command-Line Tool “Pandit Grapher”
        Appendix 3: Previous Suggestions to Improve Text of NBhū 104–154
        Appendix 4D: Transcript and Collation Data for NBhū 104–154
        Appendix 5D: Command-Line Tool “cte2cex” for Transcript Data Conversion
        Appendix 6D: Deployment of Brucheion for Interactive Transcript Data
        Appendix 7: Highlighted Improvements to Text of NBhū 104–154
        Appendix 7D: Alternate Version of Edition with Highlighted Improvements
        Appendix 8D: Digital Forms of Translation of NBhū 104–154
        Appendix 9: Analytic Outline of NBhū 104–154 by Shodo Yamakami
        Appendix 10.1: New Analytic Outline of NBhū 104–154 (Overall)
        Appendix 10.2: New Analytic Outline of NBhū 104–154 (Detailed)
        Appendix 11D: Skrutable Text Processing Library and Web Application
        Appendix 12D: Pramāṇa NLP Corpus, Metadata, and LDA Modeling Info
        Appendix 13D: Vātāyana Intertextuality Research Web Application
        Appendix 14: Sample of Yamakami Citation Benchmark for NBhū 104–154
        Appendix 14D: Full Yamakami Citation Benchmark for NBhū 104–154
        Appendix 15: Vātāyana Recall@5 Scores for NBhū 104–154
        Appendix 16: PVA, PVin, and PVSV Vātāyana Search Hits for Entire NBhū
        Appendix 17: Sample Listing of Vātāyana Search Hits for Entire NBhū
        Appendix 17D: Full Listing of Vātāyana Search Hits for Entire NBhū
    Overview of Digital Appendices
    Zusammenfassung (Thesen zur Dissertation)
    Summary of Results
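The recall@5 figure reported in the abstract can be made concrete with a small sketch of the metric itself. The passage and parallel identifiers below are purely illustrative, not the dissertation's actual benchmark data.

```python
def recall_at_k(gold, ranked_hits, k=5):
    """Fraction of known gold parallels returned among the system's top-k hits,
    averaged over query passages (a simple macro-averaged recall@k)."""
    scores = []
    for passage, gold_parallels in gold.items():
        top_k = set(ranked_hits.get(passage, [])[:k])
        scores.append(len(gold_parallels & top_k) / len(gold_parallels))
    return sum(scores) / len(scores)

# Illustrative benchmark: passages with previously known parallels,
# and the ranked hit lists a retrieval system might return for them.
gold = {"NBhu 110": {"PVSV 24"},
        "NBhu 131": {"PVin 2.5", "PV 3.194"}}
hits = {"NBhu 110": ["PVSV 24", "PV 1.1"],
        "NBhu 131": ["PV 3.194", "TS 1204", "PVin 2.5"]}

print(recall_at_k(gold, hits, k=5))
```

A recall@5 of 80% then means that, averaged over the benchmark passages, four fifths of the previously known parallels appeared within the top five returned hits; hits beyond the gold set (like "TS 1204" above) are not penalized, which is why the system can also surface new parallels absent from the benchmark.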