12 research outputs found

    Sentence Rephrasing for Parsing Sentences with OOV Words

    Abstract This paper addresses the problem of out-of-vocabulary (OOV) words, named entities in particular, in dependency parsing. OOV words in a sentence, whose word forms are unknown to the learning-based parser, may decrease parsing performance. To deal with this problem, we propose a sentence rephrasing approach that replaces each OOV word in a sentence with a popular word of the same named entity type from the training set, so that knowledge of the word forms can be used for parsing. A highest-frequency-based rephrasing strategy and an information-retrieval-based rephrasing strategy are explored to select the replacement word, and the Chinese Treebank 6.0 (CTB6) corpus is adopted to evaluate the feasibility of the proposed sentence rephrasing strategies. Experimental results show that rephrasing some specific types of OOV words, such as Corporation, Organization, and Competition, increases parsing performance. The methodology can also be applied to domain adaptation to deal with OOV problems.
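
    As a rough illustration of the highest-frequency-based rephrasing strategy described in the abstract, the sketch below replaces OOV named entities with the most frequent training-set word of the same entity type. The input format and function names are assumptions for illustration, not details taken from the paper.

        from collections import Counter, defaultdict

        def build_rephrasing_table(training_tokens):
            """Pick, for each named entity type, the most frequent ("popular") word form.

            training_tokens: iterable of (word, ne_type) pairs, with ne_type None for
            ordinary words.  (Hypothetical input format.)
            """
            counts = defaultdict(Counter)
            for word, ne_type in training_tokens:
                if ne_type is not None:
                    counts[ne_type][word] += 1
            return {ne_type: c.most_common(1)[0][0] for ne_type, c in counts.items()}

        def rephrase(sentence, vocabulary, table):
            """Replace OOV named entities with the popular word of the same type."""
            rephrased = []
            for word, ne_type in sentence:
                if word not in vocabulary and ne_type in table:
                    rephrased.append((table[ne_type], ne_type))  # substitute a known word form
                else:
                    rephrased.append((word, ne_type))
            return rephrased

    The rephrased sentence would then be parsed as usual and the resulting dependency tree read back onto the original words, since rephrasing only swaps word forms, not sentence structure.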

    A history and theory of textual event detection and recognition


    The feasibility of applying Latent Semantic Analysis to analyze Item similarity

    The purpose of this study is to apply latent semantic analysis (LSA) to analyze item similarity and to discuss the results of using different score functions. A key feature of the LSA model is its detection of lexical co-occurrence: the model can analyze many documents and find synonyms. Because synonyms rarely appear within the same item, the LSA model needs to be trained on documents related to that item. This study revealed that results using the Dice measure or the inner product measure correlate more closely with experts' scores. For the items on which the experts' scores agree more closely than on others, the maximum correlation reaches 0.9 and the mean correlation reaches 0.7, so applying latent semantic analysis to analyze item similarity is feasible.
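
    A minimal sketch of the kind of comparison the abstract describes: build an LSA space from related documents, then score item similarity with an inner product and a generalized Dice measure. The use of scikit-learn's TfidfVectorizer and TruncatedSVD, and the exact form of the Dice measure over LSA vectors, are illustrative assumptions rather than the study's actual setup.

        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD

        def lsa_vectors(documents, n_dims=100):
            """Project documents into a latent semantic space via truncated SVD of TF-IDF weights."""
            tfidf = TfidfVectorizer()
            X = tfidf.fit_transform(documents)                      # term-document matrix
            svd = TruncatedSVD(n_components=min(n_dims, X.shape[1] - 1))
            return svd.fit_transform(X)                             # one dense row per document

        def inner_product(x, y):
            return float(np.dot(x, y))

        def dice(x, y):
            """Generalized Dice coefficient between two LSA vectors."""
            return float(2.0 * np.dot(x, y) / (np.dot(x, x) + np.dot(y, y)))

        # Hypothetical usage: the last two "documents" stand in for the two items being compared.
        docs = ["training document about fractions", "training document about decimals",
                "item one asks about adding fractions", "item two asks about adding decimals"]
        vecs = lsa_vectors(docs, n_dims=2)
        print("inner product:", inner_product(vecs[2], vecs[3]))
        print("dice measure :", dice(vecs[2], vecs[3]))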

    Transformational tagging for topic tracking in natural language.

    Ip Chun Wah Timmy. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 113-120). Abstracts in English and Chinese.
    Contents:
    Chapter 1  Introduction
        1.1  Topic Detection and Tracking
            1.1.1  What is a Topic?
            1.1.2  What is Topic Tracking?
        1.2  Research Contributions
            1.2.1  Named Entity Tagging
            1.2.2  Handling Unknown Words
            1.2.3  Named-Entity Approach in Topic Tracking
        1.3  Organization of Thesis
    Chapter 2  Background
        2.1  Previous Developments in Topic Tracking
            2.1.1  BBN's Tracking System
            2.1.2  CMU's Tracking System
            2.1.3  Dragon's Tracking System
            2.1.4  UPenn's Tracking System
        2.2  Topic Tracking in Chinese
        2.3  Part-of-Speech Tagging
            2.3.1  A Brief Overview of POS Tagging
            2.3.2  Transformation-based Error-Driven Learning
        2.4  Unknown Word Identification
            2.4.1  Rule-based approaches
            2.4.2  Statistical approaches
            2.4.3  Hybrid approaches
        2.5  Information Retrieval Models
            2.5.1  Vector-Space Model
            2.5.2  Probabilistic Model
        2.6  Chapter Summary
    Chapter 3  System Overview
        3.1  Segmenter
        3.2  TEL Tagger
        3.3  Unknown Words Identifier
        3.4  Topic Tracker
        3.5  Chapter Summary
    Chapter 4  Named Entity Tagging
        4.1  Experimental Data
        4.2  Transformational Tagging
            4.2.1  Notations
            4.2.2  Corpus Utilization
            4.2.3  Lexical Rules
            4.2.4  Contextual Rules
        4.3  Experiment and Result
            4.3.1  Lexical Tag Initialization
            4.3.2  Contribution of Lexical and Contextual Rules
            4.3.3  Performance on Unknown Words
            4.3.4  A Possible Benchmark
            4.3.5  Comparison between TEL Approach and the Stochastic Approach
        4.4  Chapter Summary
    Chapter 5  Handling Unknown Words in Topic Tracking
        5.1  Overview
        5.2  Person Names
            5.2.1  Forming possible named entities from OOV by grouping n-grams
            5.2.2  Overlapping
        5.3  Organization Names
        5.4  Location Names
        5.5  Dates and Times
        5.6  Chapter Summary
    Chapter 6  Topic Tracking in Chinese
        6.1  Introduction of Topic Tracking
        6.2  Experimental Data
        6.3  Evaluation Methodology
            6.3.1  Cost Function
            6.3.2  DET Curve
        6.4  The Named Entity Approach
            6.4.1  Designing the Named Entities Set for Topic Tracking
            6.4.2  Feature Selection
            6.4.3  Integrated with Vector-Space Model
        6.5  Experimental Results and Analysis
            6.5.1  Notations
            6.5.2  Stopword Elimination
            6.5.3  TEL Tagging
            6.5.4  Unknown Word Identifier
            6.5.5  Error Analysis
        6.6  Chapter Summary
    Chapter 7  Conclusions and Future Work
        7.1  Conclusions
        7.2  Future Work
    Bibliography
    Appendix A  The POS Tags
    Appendix B  Surnames and transliterated characters
    Appendix C  Stopword List for Person Name
    Appendix D  Organization suffixes
    Appendix E  Location suffixes
    Appendix F  Examples of Feature Table (Train set with condition D410)
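
    The contents above mention integrating a named-entity feature set with a vector-space model for topic tracking (Sections 6.4.1-6.4.3) and eliminating stopwords (Section 6.5.2). A generic sketch of that kind of tracker is given below; the named-entity weight, the decision threshold, and the centroid representation are illustrative assumptions, not values or design details taken from the thesis.

        import math
        from collections import Counter

        def tf_vector(tokens, stopwords=frozenset(), ne_tokens=frozenset(), ne_weight=2.0):
            """Term-frequency vector with stopword elimination and a boost for named-entity terms."""
            vec = Counter()
            for tok in tokens:
                if tok not in stopwords:
                    vec[tok] += ne_weight if tok in ne_tokens else 1.0
            return vec

        def cosine(u, v):
            dot = sum(w * v[t] for t, w in u.items() if t in v)
            norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
            return dot / norm if norm else 0.0

        def on_topic(topic_stories, story, threshold=0.2, **kwargs):
            """Flag a story as on-topic if it is close enough to the topic's centroid vector."""
            centroid = Counter()
            for s in topic_stories:
                centroid.update(tf_vector(s, **kwargs))
            return cosine(centroid, tf_vector(story, **kwargs)) >= threshold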