thesis

A corpus-based induction learning approach to natural language processing.

Abstract

by Leung Chi Hong.
Thesis (Ph.D.)--Chinese University of Hong Kong, 1996.
Includes bibliographical references (leaves 163-171).

Chapter 1. --- Introduction
Chapter 2. --- Background Study of Natural Language Processing
Chapter 2.1. --- Knowledge-based approach
Chapter 2.1.1. --- Morphological analysis
Chapter 2.1.2. --- Syntactic parsing
Chapter 2.1.3. --- Semantic parsing
Chapter 2.1.3.1. --- Semantic grammar
Chapter 2.1.3.2. --- Case grammar
Chapter 2.1.4. --- Problems of knowledge acquisition in knowledge-based approach
Chapter 2.2. --- Corpus-based approach
Chapter 2.2.1. --- Beginning of corpus-based approach
Chapter 2.2.2. --- An example of corpus-based application: word tagging
Chapter 2.2.3. --- Annotated corpus
Chapter 2.2.4. --- State of the art in the corpus-based approach
Chapter 2.3. --- Knowledge-based approach versus corpus-based approach
Chapter 2.4. --- Co-operation between two different approaches
Chapter 3. --- Induction Learning applied to Corpus-based Approach
Chapter 3.1. --- General model of traditional corpus-based approach
Chapter 3.1.1. --- Division of a problem into a number of sub-problems
Chapter 3.1.2. --- Solution selected from a set of predefined choices
Chapter 3.1.3. --- Solution selection based on a particular kind of linguistic entity
Chapter 3.1.4. --- Statistical correlations between solutions and linguistic entities
Chapter 3.1.5. --- Prediction of the best solution based on statistical correlations
Chapter 3.2. --- First problem in the corpus-based approach: Irrelevance in the corpus
Chapter 3.3. --- Induction learning
Chapter 3.3.1. --- General issues about induction learning
Chapter 3.3.2. --- Reasons of using induction learning in the corpus-based approach
Chapter 3.3.3. --- General model of corpus-based induction learning approach
Chapter 3.3.3.1. --- Preparation of positive corpus and negative corpus
Chapter 3.3.3.2. --- Statistical correlations between solutions and linguistic entities
Chapter 3.3.3.3. --- Combination of the statistical correlations obtained from the positive and negative corpora
Chapter 3.4. --- Second problem in the corpus-based approach: Modification of initial probabilistic approximations
Chapter 3.5. --- Learning feedback modification
Chapter 3.5.1. --- Determination of which correlation scores to be modified
Chapter 3.5.2. --- Determination of the magnitude of modification
Chapter 3.5.3. --- A general algorithm of learning feedback modification
Chapter 4. --- Identification of Phrases and Templates in Domain-specific Chinese Texts
Chapter 4.1. --- Analysis of the problem solved by the traditional corpus-based approach
Chapter 4.2. --- Phrase identification based on positive and negative corpora
Chapter 4.3. --- Phrase identification procedure
Chapter 4.3.1. --- Step 1: Phrase seed identification
Chapter 4.3.2. --- Step 2: Phrase construction from phrase seeds
Chapter 4.4. --- Template identification procedure
Chapter 4.5. --- Experiment and result
Chapter 4.5.1. --- Testing data
Chapter 4.5.2. --- Details of experiments
Chapter 4.5.3. --- Experimental results
Chapter 4.5.3.1. --- Phrases and templates identified in financial news articles
Chapter 4.5.3.2. --- Phrases and templates identified in political news articles
Chapter 4.6. --- Conclusion
Chapter 5. --- A Corpus-based Induction Learning Approach to Improving the Accuracy of Chinese Word Segmentation
Chapter 5.1. --- Background of Chinese word segmentation
Chapter 5.2. --- Typical methods of Chinese word segmentation
Chapter 5.2.1. --- Syntactic and semantic approach
Chapter 5.2.2. --- Statistical approach
Chapter 5.2.3. --- Heuristic approach
Chapter 5.3. --- Problems in word segmentation
Chapter 5.3.1. --- Chinese word definition
Chapter 5.3.2. --- Word dictionary
Chapter 5.3.3. --- Word segmentation ambiguity
Chapter 5.4. --- Corpus-based induction learning approach to improving word segmentation accuracy
Chapter 5.4.1. --- Rationale of approach
Chapter 5.4.2. --- Method of constructing modification rules
Chapter 5.5. --- Experiment and results
Chapter 5.6. --- Characteristics of modification rules constructed in experiment
Chapter 5.7. --- Experiment constructing rules for compound words with suffixes
Chapter 5.8. --- Relationship between modification frequency and Zipf's first law
Chapter 5.9. --- Problems in the approach
Chapter 5.10. --- Conclusion
Chapter 6. --- Corpus-based Induction Learning Approach to Automatic Indexing of Controlled Index Terms
Chapter 6.1. --- Background of automatic indexing
Chapter 6.1.1. --- Definition of index term and indexing
Chapter 6.1.2. --- Manual indexing versus automatic indexing
Chapter 6.1.3. --- Different approaches to automatic indexing
Chapter 6.2. --- Corpus-based induction learning approach to automatic indexing
Chapter 6.2.1. --- Fundamental concept about corpus-based automatic indexing
Chapter 6.2.2. --- Procedure of automatic indexing
Chapter 6.2.2.1. --- Learning process
Chapter 6.2.2.2. --- Indexing process
Chapter 6.3. --- Experiments of corpus-based induction learning approach to automatic indexing
Chapter 6.3.1. --- An experiment evaluating the complete procedures
Chapter 6.3.1.1. --- Testing data used in the experiment
Chapter 6.3.1.2. --- Details of the experiment
Chapter 6.3.1.3. --- Experimental result
Chapter 6.3.2. --- An experiment comparing with the traditional approach
Chapter 6.3.3. --- An experiment determining the optimal indexing score threshold
Chapter 6.3.4. --- An experiment measuring the precision and recall of indexing performance
Chapter 6.4. --- Learning feedback modification
Chapter 6.4.1. --- Positive feedback
Chapter 6.4.2. --- Negative feedback
Chapter 6.4.3. --- Change of indexed proportions of positive/negative training corpus in feedback iterations
Chapter 6.4.4. --- An experiment evaluating the learning feedback modification
Chapter 6.4.5. --- An experiment testing the significance factor in merging process
Chapter 6.5. --- Conclusion
Chapter 7. --- Conclusion
Appendix A: Some examples of identified phrases in financial news articles
Appendix B: Some examples of identified templates in financial news articles
Appendix C: Some examples of texts containing the templates in financial news articles
Appendix D: Some examples of identified phrases in political news articles
Appendix E: Some examples of identified templates in political news articles
Appendix F: Some examples of texts containing the templates in political news articles
Appendix G: Syntactic tags used in word segmentation modification rule experiment
Appendix H: An example of semantic approach to automatic indexing
Appendix I: An example of syntactic approach to automatic indexing
Appendix J: Samples of INSPEC and MEDLINE Records
Appendix K: Examples of Promoting and Demoting Words
References
