5,633 research outputs found

    Chinese noun phrase parsing with a hybrid approach.

    by Angel Suet Yi Tse.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 1996.
    Includes bibliographical references (leaves 126-130).

    Contents:
    Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Plagiarism Declaration
    Chapter 1 --- Introduction
        1.1 --- Overview
        1.2 --- Motivation
        1.3 --- Applications of NP parsing
        1.4 --- The Hybrid Approach of NP Partial Parsing with Rule Set Derived from de NPs
        1.5 --- Organization of the Thesis
    Chapter 2 --- Related Work
        2.1 --- Overview
        2.2 --- Chinese Versus English Languages
        2.3 --- Traditional Versus Contemporary Parsing Approaches
            2.3.1 --- Linguistics-based and Corpus-based Knowledge Acquisition
            2.3.2 --- Basic Processing Unit
            2.3.3 --- Related Literature
        2.4 --- Sentence / Free Text Parsing
            2.4.1 --- Linguistics-based
            2.4.2 --- Corpus-based
        2.5 --- NP Processing
            2.5.1 --- NP Detection
            2.5.2 --- NP Partial Parsing
        2.6 --- Summary
    Chapter 3 --- Knowledge Elicitation for General NP Partial Parsing from De NPs
        3.1 --- Overview
        3.2 --- Background
        3.3 --- Research in De Phrases
            3.3.1 --- Research of de Phrases in Pure Linguistics
            3.3.2 --- Research in de Phrases in Computational Linguistics
        3.4 --- Significance of De Phrases
            3.4.1 --- Implication to General NP Parsing
            3.4.2 --- Embedded Knowledge for General NP Parsing
        3.5 --- Summary
    Chapter 4 --- Knowledge Acquisition Approaches for General NP Partial Parsing
        4.1 --- Overview
        4.2 --- Linguistic-based Approach
        4.3 --- Corpus-based Approach
            4.3.1 --- Generalization of NP Grammatical Patterns
            4.3.2 --- Pitfall of Generalization
        4.4 --- The Hybrid Approach
            4.4.1 --- Combining Strategies
            4.4.2 --- Merging Techniques
        4.5 --- CNP3 - The Chinese NP Partial Parser
            4.5.1 --- The NP Detection and Extraction Unit (DEU)
            4.5.2 --- The Knowledge Acquisition Unit (KAU)
            4.5.3 --- The Parsing Unit (PU)
            4.5.4 --- Internal Representation of Chinese NPs and Grammar Rules
        4.6 --- Summary
    Chapter 5 --- Experiments on Linguistics-, Corpus-based and the Hybrid Approaches
        5.1 --- Overview
        5.2 --- Objective of Experiments
        5.3 --- Experimental Setup
            5.3.1 --- The Corpora
            5.3.2 --- The Standard and Extended Tag Sets
        5.4 --- Overview of Experiments
        5.5 --- Evaluation of Linguistic De NP Rules (Experiment 1A)
            5.5.1 --- Method
            5.5.2 --- Results
            5.5.3 --- Analysis
        5.6 --- Evaluation of Corpus-based Approach (Experiment 1B)
            5.6.1 --- Method
            5.6.2 --- Results
            5.6.3 --- Analysis
            5.6.4 --- Generalization of NP Grammatical Patterns (Experiment 1B')
            5.6.5 --- Results after Merging of Rule Sets (Experiment 1C)
            5.6.6 --- Error Analysis
        5.7 --- Phase II Evaluation: Test on General NP Parsing (Experiment 2)
            5.7.1 --- Method
            5.7.2 --- Results
            5.7.3 --- Error Analysis
        5.8 --- Summary
    Chapter 6 --- Reliability Evaluation of the Hybrid Approach
        6.1 --- Overview
        6.2 --- Objective
        6.3 --- The Training and Test Corpora
        6.4 --- The Knowledge Base
        6.5 --- Convergence Sequence Tests
            6.5.1 --- Results of Close Convergence Tests
            6.5.2 --- Results of Open Convergence Tests
            6.5.3 --- Conclusions with Convergence Tests
        6.6 --- Cross Evaluation Tests
            6.6.1 --- Results
            6.6.2 --- Conclusions with Cross Evaluation Tests
        6.7 --- Summary
    Chapter 7 --- Discussion and Conclusions
        7.1 --- Overview
        7.2 --- Difficulties Encountered
            7.2.1 --- Lack of Standard in Part-of-speech Categorization in Chinese Language
            7.2.2 --- Under or Over-specification of Tag Class in Tag Set
            7.2.3 --- Difficulty in Nominal Compound NP Analysis
        7.3 --- Conclusions
        7.4 --- Future Work
            7.4.1 --- Full Automation of NP Pattern Generalization
            7.4.2 --- Incorporation of Semantic Constraints
            7.4.3 --- Computational Structural Analysis of Nominal Compound NP
    References
    Appendix A --- The Extended Tag Set
    Appendix B --- Linguistic Grammar Rules
    Appendix C --- Generalized Grammar Rules

    Treebank-based acquisition of a Chinese lexical-functional grammar

    Scaling wide-coverage, constraint-based grammars such as Lexical-Functional Grammars (LFG) (Kaplan and Bresnan, 1982; Bresnan, 2001) or Head-Driven Phrase Structure Grammars (HPSG) (Pollard and Sag, 1994) from fragments to naturally occurring unrestricted text is knowledge-intensive, time-consuming and (often prohibitively) expensive. A number of researchers have recently presented methods to automatically acquire wide-coverage, probabilistic constraint-based grammatical resources from treebanks (Cahill et al., 2002; Cahill et al., 2003; Cahill et al., 2004; Miyao et al., 2003; Miyao et al., 2004; Hockenmaier and Steedman, 2002; Hockenmaier, 2003), addressing the knowledge acquisition bottleneck in constraint-based grammar development. Research to date has concentrated on English and German. In this paper we report on an experiment to induce wide-coverage, probabilistic LFG grammatical and lexical resources for Chinese from the Penn Chinese Treebank (CTB) (Xue et al., 2002) based on an automatic f-structure annotation algorithm. Currently 96.751% of the CTB trees receive a single, covering and connected f-structure, 0.112% do not receive an f-structure due to feature clashes, while 3.137% are associated with multiple f-structure fragments. From the f-structure-annotated CTB we extract a total of 12975 lexical entries with 20 distinct subcategorisation frame types. Of these, 3436 are verbal entries with a total of 11 different frame types. We extract a number of PCFG-based LFG approximations. Currently our best automatically induced grammars achieve an f-score of 81.57% against the trees in unseen articles 301-325; 86.06% f-score (all grammatical functions) and 73.98% (preds-only) against the dependencies derived from the f-structures automatically generated for the original trees in 301-325; and 82.79% (all grammatical functions) and 67.74% (preds-only) against the dependencies derived from the manually annotated gold-standard f-structures for 50 trees randomly selected from articles 301-325.
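
The f-scores quoted in the abstract above are harmonic means of precision and recall. As a rough illustration only, the sketch below computes such a score over sets of dependency triples. The triple encoding `(relation, head, dependent)` and the function name `f_score` are assumptions made for this example; they are not taken from the paper, and the authors' actual evaluation software may differ.

```python
from typing import Iterable, Tuple

# Hypothetical encoding of one dependency: (relation, head, dependent).
Triple = Tuple[str, str, str]


def f_score(gold: Iterable[Triple], predicted: Iterable[Triple]) -> float:
    """Return the balanced f-score (F1) of predicted triples against gold triples."""
    gold_set, pred_set = set(gold), set(predicted)
    if not gold_set or not pred_set:
        return 0.0
    matched = len(gold_set & pred_set)      # triples present in both sets
    if matched == 0:
        return 0.0
    precision = matched / len(pred_set)     # share of predictions that are correct
    recall = matched / len(gold_set)        # share of gold triples that were found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    gold = [("subj", "see", "I"), ("obj", "see", "book"), ("adjunct", "book", "red")]
    pred = [("subj", "see", "I"), ("obj", "see", "book"), ("adjunct", "see", "red")]
    print(f"f-score: {f_score(gold, pred):.4f}")
```

Under this toy encoding, a "preds-only" style evaluation could be approximated by filtering both sets down to the triples of interest before calling `f_score`; which triples belong in that subset is defined by the paper, not by this sketch.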

    Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling

    In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, considers syntax and document structures, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA in the settings of noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted from hrLDA are very competitive with the ontologies created by domain experts.

    A Survey of Paraphrasing and Textual Entailment Methods

    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
    Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

    Natural language understanding: instructions for (Present and Future) use

    In this paper I look at Natural Language Understanding, an area of Natural Language Processing aimed at making sense of text, through the lens of a visionary future: what do we expect a machine should be able to understand, and what are the key dimensions that require the attention of researchers to make this dream come true?