14 research outputs found

    A Study of Chinese Named Entity and Relation Identification in a Specific Domain

    This thesis investigates the automatic identification of Chinese named entities (NEs) and their relations (NERs) in a specific domain. We propose a three-stage pipeline computational model covering error correction for word segmentation and POS tagging, NE recognition, and NER identification. In the first stage, an error repair module based on machine learning techniques is developed. In the second stage, a new algorithm is designed that automatically constructs Finite State Cascades (FSC) from given sets of rules. As a supplement, a recognition strategy that does not rely on NE trigger words identifies special linguistic phenomena. In the third stage, a novel approach, positive and negative case-based learning and identification (PNCBL&I), is implemented. It aims to improve NER identification performance by simultaneously learning two opposite cases and automatically selecting effective multi-level linguistic features for NERs and non-NERs. Two further strategies, resolving relation conflicts and inferring missing relations, are also integrated into the identification procedure.

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research being carried out in the NLP field and discusses related open issues, with a particular focus both on emerging approaches for language learning, understanding, production, and grounding, interactively or autonomously from data in cognitive and neural systems, and on their potential or real applications in different domains.

    Frame semantics for the field of climate change: discovering frames based on Chinese and English terms

    Most of the existing Mandarin Chinese specialised dictionaries of environmental terms are paper dictionaries, compiled and revised more than ten years ago, and contain mainly noun terms. Terminological information is restricted to knowledge conveyed by the term and its English equivalent(s). For readers who want to learn about the semantic or syntactic properties of terms, or to see how terms are used in real contexts of specialised texts, the information provided in existing dictionaries is insufficient. In this research, we compiled an online Mandarin Chinese terminological resource describing Chinese verb terms in the field of climate change. This resource makes up for some of the deficiencies of existing Chinese environmental dictionaries by revealing the meaning(s) of a term through its actantial structure(s) and by showing, through annotated contexts, the semantic and syntactic properties of the term as well as its practical usage in specialised texts. This resource better meets the needs of the audience. The theoretical basis underpinning this research is Frame Semantics (Fillmore, 1976, 1977, 1982, 1985; Fillmore & Atkins, 1992) and the FrameNet built from it. The main objective of this research is to discover and define Chinese semantic frames in the field of climate change, and to establish relations between the frames defined. The Chinese semantic frames are discovered with the help of the methodology of the multilingual environmental dictionary DiCoEnviro (and its accompanying resource Framed DiCoEnviro) (L’Homme, 2018; L’Homme et al., 2020). To make this methodology applicable to Chinese, a Sino-Tibetan language, we modified and adapted it to suit the description of Chinese terms and the definition of Chinese semantic frames. Some of the changes and adaptations are based on the Chinese FrameNet (CFN) (Liu & You, 2015). To discover Chinese semantic frames, a monolingual Mandarin (Chinese) Climate Change Corpus (MCCC) was first compiled. This corpus contains 224 authentic Chinese specialised texts in the field of climate change, totalling 1,228,333 Chinese characters (547,592 Chinese words). Candidate terms were then automatically extracted from the MCCC using the corpus management and analysis software Sketch Engine. Manual analysis and validation determined which candidate terms are true terms. Subsequently, the actantial structure of each term was written by analysing the contexts in which the term occurs. Next, each sense of a polysemous term was placed in a separate entry, and 16-20 contexts were selected for each entry. Each context was then annotated on three layers: semantic structure, syntactic function and syntactic group. After this, the terms were classified according to the scenarios they evoke. Terms that depict the same scene or situation in the field of climate change, have a similar actantial structure, and share the majority of circumstants are categorised into one semantic frame (criteria based on the DiCoEnviro project (L’Homme, 2018; L’Homme et al., 2020)). After the Chinese semantic frames were identified, each frame was defined. Finally, the discovered Chinese frames were linked according to the eight types of frame relations proposed by Ruppenhofer et al. (2016). To be displayed online, term entries and semantic frames were encoded in XML files. Guided by this research methodology, we eventually discovered and defined 23 Chinese semantic frames. The end result of this research is a frame-based Mandarin Chinese terminological resource specialised in the field of climate change. This terminological resource consists of two parts. The first part is the description of a total of 39 Chinese verb terms. With each meaning of a polysemous verb term placed in a separate entry, there are 59 entries in total (each entry contains the actantial structure and annotated contexts). A total of 1,027 contexts were annotated. The second part of this resource presents the 23 Chinese semantic frames identified as well as the relations between frames.

    A robust methodology for automated essay grading

    None of the available automated essay grading systems can be used to grade essays according to the National Assessment Program – Literacy and Numeracy (NAPLAN) analytic scoring rubric used in Australia. This thesis is a humble effort to address this limitation. The objective of this thesis is to develop a robust methodology for automatically grading essays based on the NAPLAN rubric by using heuristics and rules based on English language and neural network modelling

    An Energy-Efficient and Reliable Data Transmission Scheme for Transmitter-based Energy Harvesting Networks

    Energy harvesting technology has been studied as a way to overcome the limited power resources of sensor networks. This paper proposes a new data transmission period control and reliable data transmission algorithm for energy harvesting based sensor networks. Although previous studies have proposed communication protocols for energy harvesting based sensor networks, the problem still needs additional discussion. The proposed algorithm dynamically controls the data transmission period and the number of data transmissions based on environment information. Through this, energy consumption is reduced and transmission reliability is improved. The simulation results show that the proposed algorithm is more efficient than the previous energy harvesting based communication standard, EnOcean, in terms of transmission success rate and residual energy. This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2012R1A1A3012227).

    Proceedings of the First NLPL Workshop on Deep Learning for Natural Language Processing

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

    Nearly Perfect: Notes on the Failures of Salvage Linguistics

    This dissertation examines the salvage era of American linguistics (c. 1910-1940) and its focus on the extraction of knowledges and cultural artifacts from Indigenous groups whose civilizations were believed in peril. Through close readings of historical archives and published materials, I imbricate the history of these scientific collection practices through the interpretive frames of Science & Technology Studies (STS), deconstructive criticism, and postcolonial theory. I centre the project on the career of linguist-anthropologist Edward Sapir, seizing upon his belief that linguistics was more nearly perfect than other human sciences, that linguistic methods were more akin to those of the natural sciences or formal mathematics. I employ Sapir as the chief focalizer of my work to map the changing topography of the language sciences in North America over these pivotal decades of disciplinary formation. Failure, here, offers a heuristic device to interrogate the linear logics of science and success which buttress that desire for perfection. Both conceptually and historically, the dialectics of failure and success throw into relief the vicissitudes of fieldwork, the uncertainty of patronage relationships, and the untenable promise of salvage that characterized these years. Through this approach, I present linguistics instead as a kairotic science (from the Greek kairos, suggesting opportunity): not perfect, but situated vividly in the world, bound by space, identity, and time. I examine how linguists conducted their collection work through the extension of a scientific network (Chapter 1), their construction of a scientific identity to the gradual exclusion of amateurs and the reduction of informant contributions (Chapter 2), and the development of an experimental system within the temporalities of fieldwork (Chapter 3). My dissertation hence invites a critical intervention within the history of linguistics to re-encounter the science's disregarded past and re-think its shared responsibility toward Indigenous communities in the present.