119 research outputs found

    Methods for Efficient Ontology Lexicalization for Non-Indo-European Languages: The Case of Japanese

    Lanser B. Methods for Efficient Ontology Lexicalization for Non-Indo-European Languages: The Case of Japanese. Bielefeld: Universität Bielefeld; 2017. In order to make the growing amount of conceptual knowledge available through ontologies and datasets accessible to humans, NLP applications need access to information on how this knowledge can be verbalized in natural language. One way to provide this kind of information is through ontology lexicons, which, apart from the actual verbalizations in a given target language, can provide further, rich linguistic information about them. Compiling such lexicons manually is a very time-consuming task and requires expertise both in Semantic Web technologies and lexicon engineering, as well as a very good knowledge of the target language at hand. In this thesis we present two alternative approaches to generating ontology lexicons: crowdsourcing on the one hand and the framework M-ATOLL on the other. So far, M-ATOLL has been used with a number of Indo-European languages that share a large set of common characteristics. Therefore, another focus of this work is the generation of ontology lexicons specifically for Non-Indo-European languages. In order to explore these two topics, we use both approaches to generate Japanese ontology lexicons for the DBpedia ontology. First, we use CrowdFlower to generate a small Japanese ontology lexicon for ten exemplary ontology elements according to a two-stage workflow, the main underlying idea of which is to turn the task of generating lexicon entries into a translation task; the starting point of this translation task is a manually created English lexicon for DBpedia. Next, we adapt M-ATOLL's corpus-based approach for use with Japanese, and use the adapted system to generate two lexicons for five example properties, respectively. Aspects of the M-ATOLL system that require modification for use with Japanese include the dependency patterns employed by M-ATOLL to extract candidate verbalizations from corpus data, and the templates used to generate the actual lexicon entries. Comparison of the lexicons generated by both approaches to manually created gold standards shows that both approaches are viable options for the generation of ontology lexicons for Non-Indo-European languages as well.

    Machine Reading the Primeros Libros

    Early modern printed books pose particular challenges for automatic transcription: uneven inking, irregular orthographies, radically multilingual texts. As a result, modern efforts to transcribe these documents tend to produce the textual gibberish commonly known as "dirty OCR" (Optical Character Recognition). This noisy output is most frequently seen as a barrier to access for scholars interested in the computational analysis or digital display of transcribed documents. This article, however, proposes that a closer analysis of dirty OCR can reveal both historical and cultural factors at play in the practice of automatic transcription. To make this argument, it focuses on tools developed for the automatic transcription of the Primeros Libros collection of sixteenth-century Mexican printed books. By bringing together the history of the collection with that of the OCR tool, it illustrates how the colonial history of these documents is embedded in, and transformed by, the statistical models used for automatic transcription. It argues that automatic transcription, itself a mechanical and practical tool, also has an interpretive effect on transcribed texts that can have practical consequences for scholarly work.

    Arabic Sign Language Adaptation In Teaching Fardhu Ain To The Disabled Hearing

    The purpose of this study is to review the consensus of experts on adapting sign language for teaching fardhu ain to the hearing impaired. The study used the fuzzy Delphi technique to seek a consensus among experts skilled in sign language in answer to the review questions. The distributed instrument included 14 items for obtaining expert consensus. The findings show that five items need to be adopted in Arabic sign language, namely the signs related to the pronouncement of the Shahadah, the beliefs and tenets of Islam, body cleanliness, the aurat, and prayer, with a defuzzification value of 0.767.

    A robust methodology for automated essay grading

    None of the available automated essay grading systems can be used to grade essays according to the National Assessment Program – Literacy and Numeracy (NAPLAN) analytic scoring rubric used in Australia. This thesis is a humble effort to address this limitation. The objective of this thesis is to develop a robust methodology for automatically grading essays based on the NAPLAN rubric by using heuristics and rules based on the English language and neural network modelling.

    Parsing impoverished syntax

    Thesis digitized by the Direction des bibliothèques of the Université de Montréal

    Headedness and/or grammatical anarchy?

    In most grammatical models, hierarchical structuring and dependencies are considered central features of grammatical structures, an idea which is usually captured by the notion of “head” or “headedness”. While in most models this notion is more or less taken for granted, there is still much disagreement as to the precise properties of grammatical heads and the theoretical implications that arise from these properties. Moreover, there are quite a few linguistic structures that pose considerable challenges to the notion of “headedness”. Linking to the seminal discussions led in Zwicky (1985) and Corbett, Fraser, & McGlashan (1993), this volume intends to look more closely at phenomena that are considered problematic for an analysis in terms of grammatical heads. The aim of this book is to approach the concept of “headedness” from its margins. Thus, central questions of the volume relate to the nature of heads and the distinction between headed and non-headed structures, to the process of gaining and losing head status, and to the thought-provoking question as to whether grammar theory could do without heads at all. The contributions in this volume provide new empirical findings bearing on phenomena that challenge the conception of grammatical heads and/or discuss the notion of head/headedness and its consequences for grammatical theory in a more abstract way. The collected papers view the topic from diverse theoretical perspectives (among others HPSG, Generative Syntax, Optimality Theory) and different empirical angles, covering typological and corpus-linguistic accounts, with a focus on data from German.