
    Semi-automatic acquisition of domain-specific semantic structures.

    Siu, Kai-Chung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 99-106). Abstracts in English and Chinese.
    Contents:
    1 Introduction
      1.1 Thesis Outline
    2 Background
      2.1 Natural Language Understanding
        2.1.1 Rule-based Approaches
        2.1.2 Stochastic Approaches
        2.1.3 Phrase-Spotting Approaches
      2.2 Grammar Induction
        2.2.1 Semantic Classification Trees
        2.2.2 Simulated Annealing
        2.2.3 Bayesian Grammar Induction
        2.2.4 Statistical Grammar Induction
      2.3 Machine Translation
        2.3.1 Rule-based Approach
        2.3.2 Statistical Approach
        2.3.3 Example-based Approach
        2.3.4 Knowledge-based Approach
        2.3.5 Evaluation Method
    3 Semi-Automatic Grammar Induction
      3.1 Agglomerative Clustering
        3.1.1 Spatial Clustering
        3.1.2 Temporal Clustering
        3.1.3 Free Parameters
      3.2 Post-processing
      3.3 Chapter Summary
    4 Application to the ATIS Domain
      4.1 The ATIS Domain
      4.2 Parameters Selection
      4.3 Unsupervised Grammar Induction
      4.4 Prior Knowledge Injection
      4.5 Evaluation
        4.5.1 Parse Coverage in Understanding
        4.5.2 Parse Errors
        4.5.3 Analysis
      4.6 Chapter Summary
    5 Portability to Chinese
      5.1 Corpus Preparation
        5.1.1 Tokenization
      5.2 Experiments
        5.2.1 Unsupervised Grammar Induction
        5.2.2 Prior Knowledge Injection
      5.3 Evaluation
        5.3.1 Parse Coverage in Understanding
        5.3.2 Parse Errors
      5.4 Grammar Comparison Across Languages
      5.5 Chapter Summary
    6 Bi-directional Machine Translation
      6.1 Bilingual Dictionary
      6.2 Concept Alignments
      6.3 Translation Procedures
        6.3.1 The Matching Process
        6.3.2 The Searching Process
        6.3.3 Heuristics to Aid Translation
      6.4 Evaluation
        6.4.1 Coverage
        6.4.2 Performance
      6.5 Chapter Summary
    7 Conclusions
      7.1 Summary
      7.2 Future Work
        7.2.1 Suggested Improvements on Grammar Induction Process
        7.2.2 Suggested Improvements on Bi-directional Machine Translation
        7.2.3 Domain Portability
      7.3 Contributions
    Bibliography
    A Original SQL Queries
    B Induced Grammar
    C Seeded Categories

    A Novel and Robust Approach for Pro-Drop Language Translation

    A significant challenge for machine translation (MT) is the phenomenon of dropped pronouns (DPs): certain classes of pronouns are frequently dropped in the source language but must be retained in the target language. In response to this common problem, we propose a semi-supervised approach with a universal framework to recall missing pronouns in translation. Firstly, we build training data for DP generation in which the DPs are automatically labelled according to the alignment information from a parallel corpus. Secondly, we build a deep learning-based DP generator for input sentences in decoding when no corresponding references exist. More specifically, the generation has two phases: (1) DP position detection, which is modeled as a sequential labelling task with recurrent neural networks; and (2) DP prediction, which employs a multilayer perceptron with rich features. Finally, we integrate the above outputs into our statistical MT (SMT) system to recall missing pronouns, both by extracting rules from the DP-labelled training data and by translating the DP-generated input sentences. To validate the robustness of our approach, we investigate it on both Chinese–English and Japanese–English corpora extracted from movie subtitles. Compared with an SMT baseline system, experimental results show that our approach achieves a significant improvement of +1.58 BLEU points in translation performance, with a 66% F-score for DP generation accuracy, for Chinese–English, and nearly +1 BLEU point, with a 58% F-score, for Japanese–English. We believe that this work could help both MT researchers and industry to boost the performance of MT systems between pro-drop and non-pro-drop languages.
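    The alignment-based labelling in the first step can be sketched as follows. The function name, the `<DP:...>` placeholder format, and the insertion heuristic (place the placeholder after the source word aligned to the nearest preceding target word) are illustrative assumptions, not the paper's implementation:

```python
def label_dropped_pronouns(src_tokens, tgt_tokens, alignments, pronouns):
    """Insert a placeholder into the pro-drop source sentence for each
    target-side pronoun that no source word aligns to.

    alignments: set of (src_index, tgt_index) pairs from a word aligner.
    pronouns:   set of lowercased target-language pronouns.
    """
    aligned_tgt = {t for _, t in alignments}
    insertions = []
    for j, tgt_word in enumerate(tgt_tokens):
        if tgt_word.lower() in pronouns and j not in aligned_tgt:
            # Guess the insertion point from the last source word that
            # aligns to some target word preceding the dropped pronoun.
            prev = [s for s, t in alignments if t < j]
            pos = max(prev) + 1 if prev else 0
            insertions.append((pos, f"<DP:{tgt_word}>"))
    labelled = list(src_tokens)
    # Apply insertions right-to-left so earlier positions stay valid.
    for pos, tag in sorted(insertions, reverse=True):
        labelled.insert(pos, tag)
    return labelled
```

    For example, a Chinese subtitle line with a dropped subject, aligned to "I like this movie", would be labelled with a `<DP:I>` placeholder at the sentence start, yielding a DP-annotated source sentence usable as training data for the generator.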

    PersoNER: Persian named-entity recognition

    Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequent problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people world-wide. We first present and provide ArmanPersoNERCorpus, the first manually annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach is capable of achieving interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.
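    A minimal stand-in for the embedding-plus-max-margin idea is the toy per-token tagger below, a margin-enforcing multiclass perceptron over word vectors. The real PersoNER pipeline is sequential and uses richer features; every name here is invented for illustration:

```python
import numpy as np

def train_max_margin_tagger(sents, embeddings, labels, epochs=10, margin=1.0):
    """Toy per-token max-margin tagger: one weight vector per label,
    updated whenever the gold label fails to beat its closest rival
    by the required margin."""
    dim = len(next(iter(embeddings.values())))
    W = {lab: np.zeros(dim) for lab in labels}
    for _ in range(epochs):
        for tokens, golds in sents:
            for tok, gold in zip(tokens, golds):
                x = embeddings.get(tok, np.zeros(dim))
                # Most-violating incorrect label for this token.
                rival = max((l for l in labels if l != gold),
                            key=lambda l: W[l] @ x)
                if W[gold] @ x - W[rival] @ x < margin:
                    W[gold] += x
                    W[rival] -= x
    return W

def tag(W, embeddings, tokens):
    """Label each token with its highest-scoring class."""
    dim = len(next(iter(W.values())))
    return [max(W, key=lambda l: W[l] @ embeddings.get(t, np.zeros(dim)))
            for t in tokens]
```

    The margin condition is what distinguishes this from a plain perceptron: updates continue until the gold label wins by at least `margin`, not merely wins.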

    Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar

    A usage-based Construction Grammar (CxG) posits that slot-constraints generalize from common exemplar constructions. But what is the best model of constraint generalization? This paper evaluates competing frequency-based and association-based models across eight languages, using a metric derived from the Minimum Description Length paradigm. The experiments show that association-based models produce better generalizations across all languages by a significant margin.
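    The Minimum Description Length idea behind such a metric, charging each candidate model for its own encoding size plus the cost of encoding the data under it, can be sketched as below. The encoding scheme (a flat per-symbol model cost and a probability floor for unseen items) is a deliberately crude assumption, not the paper's actual metric:

```python
from math import log2

def mdl_score(model, data, bits_per_symbol=8):
    """Toy two-part MDL score: L(model) + L(data | model), in bits.

    model: dict mapping each item to its probability under the model.
    data:  sequence of observed items; items absent from the model are
           charged at a small floor probability.
    Lower scores indicate a better trade-off between model complexity
    and fit to the data.
    """
    model_cost = bits_per_symbol * len(model)  # bits to state the model
    floor = 1e-6
    data_cost = sum(-log2(model.get(item, floor)) for item in data)
    return model_cost + data_cost
```

    Comparing two constraint models then reduces to comparing their scores on the same corpus: a model with sharper constraints pays more to state itself but can compress the data better, and MDL picks whichever trade-off wins.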

    Semi-automatic grammar induction for bidirectional machine translation.

    Wong, Chin Chung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 137-143). Abstracts in English and Chinese.
    Contents:
    1 Introduction
      1.1 Objectives
      1.2 Thesis Outline
    2 Background in Natural Language Understanding
      2.1 Rule-based Approaches
      2.2 Corpus-based Approaches
        2.2.1 Stochastic Approaches
        2.2.2 Phrase-spotting Approaches
      2.3 The ATIS Domain
        2.3.1 Chinese Corpus Preparation
    3 Semi-automatic Grammar Induction - Baseline Approach
      3.1 Background in Grammar Induction
        3.1.1 Simulated Annealing
        3.1.2 Bayesian Grammar Induction
        3.1.3 Probabilistic Grammar Acquisition
      3.2 Semi-automatic Grammar Induction - Baseline Approach
        3.2.1 Spatial Clustering
        3.2.2 Temporal Clustering
        3.2.3 Post-processing
        3.2.4 Four Aspects for Enhancements
      3.3 Chapter Summary
    4 Semi-automatic Grammar Induction - Enhanced Approach
      4.1 Evaluating Induced Grammars
      4.2 Stopping Criterion
        4.2.1 Cross-checking with Recall Values
      4.3 Improvements on Temporal Clustering
        4.3.1 Evaluation
      4.4 Improvements on Spatial Clustering
        4.4.1 Distance Measures
        4.4.2 Evaluation
      4.5 Enhancements based on Intelligent Selection
        4.5.1 Informed Selection between Spatial Clustering and Temporal Clustering
        4.5.2 Selecting the Number of Clusters Per Iteration
        4.5.3 An Example for Intelligent Selection
        4.5.4 Evaluation
      4.6 Chapter Summary
    5 Bidirectional Machine Translation using Induced Grammars - Baseline Approach
      5.1 Background in Machine Translation
        5.1.1 Rule-based Machine Translation
        5.1.2 Statistical Machine Translation
        5.1.3 Knowledge-based Machine Translation
        5.1.4 Example-based Machine Translation
        5.1.5 Evaluation
      5.2 Baseline Configuration on Bidirectional Machine Translation System
        5.2.1 Bilingual Dictionary
        5.2.2 Concept Alignments
        5.2.3 Translation Process
        5.2.4 Two Aspects for Enhancements
      5.3 Chapter Summary
    6 Bidirectional Machine Translation - Enhanced Approach
      6.1 Concept Alignments
        6.1.1 Enhanced Alignment Scheme
        6.1.2 Experiment
      6.2 Grammar Checker
        6.2.1 Components for Grammar Checking
      6.3 Evaluation
        6.3.1 Bleu Score Performance
        6.3.2 Modified Bleu Score
      6.4 Chapter Summary
    7 Conclusions
      7.1 Summary
      7.2 Contributions
      7.3 Future work
    Bibliography
    A Original SQL Queries
    B Seeded Categories
    C Alignment Categories
    D Labels of Syntactic Structures in Grammar Checker

    Treebank-based acquisition of Chinese LFG resources for parsing and generation

    This thesis describes a treebank-based approach to automatically acquire robust, wide-coverage Lexical-Functional Grammar (LFG) resources for Chinese parsing and generation, which is part of a larger project on the rapid construction of deep, large-scale, constraint-based, multilingual grammatical resources. I present an application-oriented LFG analysis for Chinese core linguistic phenomena and (in cooperation with PARC) develop a gold-standard dependency-bank of Chinese f-structures for evaluation. Based on the Penn Chinese Treebank, I design and implement two architectures for inducing Chinese LFG resources, one annotation-based and the other dependency conversion-based. I then apply the f-structure acquisition algorithm together with external, state-of-the-art parsers to parse new text into "proto" f-structures. In order to convert "proto" f-structures into "proper" f-structures or deep dependencies, I present a novel Non-Local Dependency (NLD) recovery algorithm using subcategorisation frames and f-structure paths linking antecedents and traces in NLDs extracted from the automatically-built LFG f-structure treebank. Based on the grammars extracted from the f-structure annotated treebank, I develop a PCFG-based chart generator and a new n-gram based pure dependency generator to realise Chinese sentences from LFG f-structures. The work reported in this thesis is the first effort to scale treebank-based, probabilistic Chinese LFG resources from proof-of-concept research to unrestricted, real text. Although this thesis concentrates on Chinese and LFG, many of the methodologies, e.g. the acquisition of predicate-argument structures, NLD resolution and the PCFG- and dependency n-gram-based generation models, are largely language and formalism independent and should generalise to diverse languages as well as to labelled bilexical dependency representations other than LFG.
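    The core move of annotation-based acquisition, collapsing function-annotated constituent trees into flat predicate-argument dependencies, can be illustrated on a toy head-annotated tree. The tree format and the function below are invented for illustration and only loosely analogous to LFG f-structure construction:

```python
def extract_deps(node):
    """Return (head_word, triples) for a toy head-annotated tree.

    A node is either (relation, word) for a leaf, or
    (relation, [child, ...]) where exactly one child bears the
    relation 'head'. Triples are (head, relation, dependent): a flat
    dependency view of the tree, built bottom-up by percolating each
    subtree's lexical head to its parent.
    """
    rel, body = node
    if isinstance(body, str):          # leaf: the word is its own head
        return body, []
    triples, heads, deps = [], [], []
    for child in body:
        h, t = extract_deps(child)
        triples.extend(t)
        (heads if child[0] == "head" else deps).append(h)
    head = heads[0]
    for d_head, child in zip(deps, [c for c in body if c[0] != "head"]):
        triples.append((head, child[0], d_head))
    return head, triples
```

    Run on a tree for "the cat sleeps", with "cat" heading the subject phrase and "sleeps" heading the clause, this yields the triples ("cat", "spec", "the") and ("sleeps", "subj", "cat").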

    Construction Grammar and Language Models

    Recent progress in deep learning and natural language processing has given rise to powerful models that are primarily trained on a cloze-like task and show some evidence of having access to substantial linguistic information, including some constructional knowledge. This groundbreaking discovery presents an exciting opportunity for a synergistic relationship between computational methods and Construction Grammar research. In this chapter, we explore three distinct approaches to the interplay between computational methods and Construction Grammar: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models. We touch upon the first two approaches as a contextual foundation for the use of computational methods before providing an accessible, yet comprehensive overview of deep learning models, which also addresses reservations construction grammarians may have. Additionally, we delve into experiments that explore the emergence of constructionally relevant information within these models while also examining the aspects of Construction Grammar that may pose challenges for these models. This chapter aims to foster collaboration between researchers in the fields of natural language processing and Construction Grammar. By doing so, we hope to pave the way for new insights and advancements in both these fields.
    Comment: Accepted for publication in The Cambridge Handbook of Construction Grammar, edited by Mirjam Fried and Kiki Nikiforidou.