3 research outputs found

    Automatic topic detection from news stories.

    Hui Kin. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 115-120). Abstracts in English and Chinese.

    Chapter 1 --- Introduction --- p.1
    Chapter 1.1 --- Topic Detection Problem --- p.2
    Chapter 1.1.1 --- What is a Topic? --- p.2
    Chapter 1.1.2 --- Topic Detection --- p.3
    Chapter 1.2 --- Our Contributions --- p.5
    Chapter 1.2.1 --- Thesis Organization --- p.6
    Chapter 2 --- Literature Review --- p.7
    Chapter 2.1 --- Dragon Systems --- p.7
    Chapter 2.2 --- University of Massachusetts (UMass) --- p.9
    Chapter 2.3 --- Carnegie Mellon University (CMU) --- p.10
    Chapter 2.4 --- BBN Technologies --- p.11
    Chapter 2.5 --- IBM T. J. Watson Research Center --- p.12
    Chapter 2.6 --- National Taiwan University (NTU) --- p.13
    Chapter 2.7 --- Drawbacks of Existing Approaches --- p.14
    Chapter 3 --- System Overview --- p.16
    Chapter 3.1 --- News Sources --- p.17
    Chapter 3.2 --- Story Preprocessing --- p.21
    Chapter 3.3 --- Named Entity Extraction --- p.22
    Chapter 3.4 --- Gross Translation --- p.22
    Chapter 3.5 --- Unsupervised Learning Module --- p.24
    Chapter 4 --- Term Extraction and Story Representation --- p.27
    Chapter 4.1 --- IBM Intelligent Miner For Text --- p.28
    Chapter 4.2 --- Transformation-based Error-driven Learning --- p.31
    Chapter 4.2.1 --- Learning Stage --- p.32
    Chapter 4.2.2 --- Design of New Tags --- p.33
    Chapter 4.2.3 --- Lexical Rules Learning --- p.35
    Chapter 4.2.4 --- Contextual Rules Learning --- p.39
    Chapter 4.3 --- Extracting Named Entities Using Learned Rules --- p.42
    Chapter 4.4 --- Story Representation --- p.46
    Chapter 4.4.1 --- Basic Representation --- p.46
    Chapter 4.4.2 --- Enhanced Representation --- p.47
    Chapter 5 --- Gross Translation --- p.52
    Chapter 5.1 --- Basic Translation --- p.52
    Chapter 5.2 --- Enhanced Translation --- p.60
    Chapter 5.2.1 --- Parallel Corpus Alignment Approach --- p.60
    Chapter 5.2.2 --- Enhanced Translation Approach --- p.62
    Chapter 6 --- Unsupervised Learning Module --- p.68
    Chapter 6.1 --- Overview of the Discovery Algorithm --- p.68
    Chapter 6.2 --- Topic Representation --- p.70
    Chapter 6.3 --- Similarity Calculation --- p.72
    Chapter 6.3.1 --- Similarity Score Calculation --- p.72
    Chapter 6.3.2 --- Time Adjustment Scheme --- p.74
    Chapter 6.3.3 --- Language Normalization Scheme --- p.75
    Chapter 6.4 --- Related Elements Combination --- p.78
    Chapter 7 --- Experimental Results and Analysis --- p.84
    Chapter 7.1 --- TDT corpora --- p.84
    Chapter 7.2 --- Evaluation Methodology --- p.85
    Chapter 7.3 --- Experimental Results on Various Parameter Settings --- p.88
    Chapter 7.4 --- Experiments Results on Various Named Entity Extraction Approaches --- p.89
    Chapter 7.5 --- Experiments Results on Various Story Representation Approaches --- p.100
    Chapter 7.6 --- Experiments Results on Various Translation Approaches --- p.104
    Chapter 7.7 --- Experiments Results on the Effect of Language Normalization Scheme on Detection Approaches --- p.106
    Chapter 7.8 --- TDT2000 Topic Detection Result --- p.110
    Chapter 8 --- Conclusions and Future Works --- p.112
    Chapter 8.1 --- Conclusions --- p.112
    Chapter 8.2 --- Future Work --- p.114
    Bibliography --- p.115
    Chapter A --- List of Topics annotated for TDT2 Corpus --- p.121
    Chapter B --- Significant Test Results --- p.12
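    The chapters above name the core of this thesis's unsupervised detection module: a similarity score between an incoming story and each existing topic, adjusted by a time scheme (Chapter 6.3) before deciding whether the story starts a new topic. The record gives no formulas, so the sketch below is only a minimal illustration of that kind of pipeline under assumed choices (cosine similarity over sparse term-weight vectors, an exponential decay on the day gap, and a fixed new-topic threshold); the thesis's actual scoring, normalization, and combination steps may differ.

import math

def cosine_similarity(story, topic):
    """Cosine similarity between two sparse term-weight vectors (dicts: term -> weight)."""
    dot = sum(w * topic.get(t, 0.0) for t, w in story.items())
    norm_s = math.sqrt(sum(w * w for w in story.values()))
    norm_t = math.sqrt(sum(w * w for w in topic.values()))
    return dot / (norm_s * norm_t) if norm_s and norm_t else 0.0

def time_adjusted_score(story, topic, story_day, topic_last_day, half_life_days=7.0):
    """Assumed time adjustment: halve the raw similarity for every half_life_days of gap."""
    decay = 0.5 ** (abs(story_day - topic_last_day) / half_life_days)
    return cosine_similarity(story, topic) * decay

def assign_story(story, story_day, topics, threshold=0.2):
    """Attach the story to the best-scoring topic, or start a new topic below the threshold.
    topics is a list of dicts holding a term-weight 'centroid' and the 'last_day' seen."""
    best, best_score = None, 0.0
    for topic in topics:
        score = time_adjusted_score(story, topic["centroid"], story_day, topic["last_day"])
        if score > best_score:
            best, best_score = topic, score
    if best is None or best_score < threshold:
        topics.append({"centroid": dict(story), "last_day": story_day})
        return len(topics) - 1
    for term, w in story.items():                      # simple additive centroid update
        best["centroid"][term] = best["centroid"].get(term, 0.0) + w
    best["last_day"] = max(best["last_day"], story_day)
    return topics.index(best)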

    Transformational tagging for topic tracking in natural language.

    Ip Chun Wah Timmy. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 113-120). Abstracts in English and Chinese.

    Chapter 1 --- Introduction --- p.1
    Chapter 1.1 --- Topic Detection and Tracking --- p.2
    Chapter 1.1.1 --- What is a Topic? --- p.3
    Chapter 1.1.2 --- What is Topic Tracking? --- p.4
    Chapter 1.2 --- Research Contributions --- p.4
    Chapter 1.2.1 --- Named Entity Tagging --- p.5
    Chapter 1.2.2 --- Handling Unknown Words --- p.6
    Chapter 1.2.3 --- Named-Entity Approach in Topic Tracking --- p.7
    Chapter 1.3 --- Organization of Thesis --- p.7
    Chapter 2 --- Background --- p.9
    Chapter 2.1 --- Previous Developments in Topic Tracking --- p.10
    Chapter 2.1.1 --- BBN's Tracking System --- p.10
    Chapter 2.1.2 --- CMU's Tracking System --- p.11
    Chapter 2.1.3 --- Dragon's Tracking System --- p.12
    Chapter 2.1.4 --- UPenn's Tracking System --- p.13
    Chapter 2.2 --- Topic Tracking in Chinese --- p.13
    Chapter 2.3 --- Part-of-Speech Tagging --- p.15
    Chapter 2.3.1 --- A Brief Overview of POS Tagging --- p.15
    Chapter 2.3.2 --- Transformation-based Error-Driven Learning --- p.18
    Chapter 2.4 --- Unknown Word Identification --- p.20
    Chapter 2.4.1 --- Rule-based approaches --- p.21
    Chapter 2.4.2 --- Statistical approaches --- p.23
    Chapter 2.4.3 --- Hybrid approaches --- p.24
    Chapter 2.5 --- Information Retrieval Models --- p.25
    Chapter 2.5.1 --- Vector-Space Model --- p.26
    Chapter 2.5.2 --- Probabilistic Model --- p.27
    Chapter 2.6 --- Chapter Summary --- p.28
    Chapter 3 --- System Overview --- p.29
    Chapter 3.1 --- Segmenter --- p.30
    Chapter 3.2 --- TEL Tagger --- p.31
    Chapter 3.3 --- Unknown Words Identifier --- p.32
    Chapter 3.4 --- Topic Tracker --- p.33
    Chapter 3.5 --- Chapter Summary --- p.34
    Chapter 4 --- Named Entity Tagging --- p.36
    Chapter 4.1 --- Experimental Data --- p.37
    Chapter 4.2 --- Transformational Tagging --- p.41
    Chapter 4.2.1 --- Notations --- p.41
    Chapter 4.2.2 --- Corpus Utilization --- p.42
    Chapter 4.2.3 --- Lexical Rules --- p.42
    Chapter 4.2.4 --- Contextual Rules --- p.47
    Chapter 4.3 --- Experiment and Result --- p.49
    Chapter 4.3.1 --- Lexical Tag Initialization --- p.50
    Chapter 4.3.2 --- Contribution of Lexical and Contextual Rules --- p.52
    Chapter 4.3.3 --- Performance on Unknown Words --- p.56
    Chapter 4.3.4 --- A Possible Benchmark --- p.57
    Chapter 4.3.5 --- Comparison between TEL Approach and the Stochastic Approach --- p.58
    Chapter 4.4 --- Chapter Summary --- p.59
    Chapter 5 --- Handling Unknown Words in Topic Tracking --- p.62
    Chapter 5.1 --- Overview --- p.63
    Chapter 5.2 --- Person Names --- p.64
    Chapter 5.2.1 --- Forming possible named entities from OOV by grouping n-grams --- p.66
    Chapter 5.2.2 --- Overlapping --- p.69
    Chapter 5.3 --- Organization Names --- p.71
    Chapter 5.4 --- Location Names --- p.73
    Chapter 5.5 --- Dates and Times --- p.74
    Chapter 5.6 --- Chapter Summary --- p.75
    Chapter 6 --- Topic Tracking in Chinese --- p.77
    Chapter 6.1 --- Introduction of Topic Tracking --- p.78
    Chapter 6.2 --- Experimental Data --- p.79
    Chapter 6.3 --- Evaluation Methodology --- p.81
    Chapter 6.3.1 --- Cost Function --- p.82
    Chapter 6.3.2 --- DET Curve --- p.83
    Chapter 6.4 --- The Named Entity Approach --- p.85
    Chapter 6.4.1 --- Designing the Named Entities Set for Topic Tracking --- p.85
    Chapter 6.4.2 --- Feature Selection --- p.86
    Chapter 6.4.3 --- Integrated with Vector-Space Model --- p.87
    Chapter 6.5 --- Experimental Results and Analysis --- p.91
    Chapter 6.5.1 --- Notations --- p.92
    Chapter 6.5.2 --- Stopword Elimination --- p.92
    Chapter 6.5.3 --- TEL Tagging --- p.95
    Chapter 6.5.4 --- Unknown Word Identifier --- p.100
    Chapter 6.5.5 --- Error Analysis --- p.106
    Chapter 6.6 --- Chapter Summary --- p.108
    Chapter 7 --- Conclusions and Future Work --- p.110
    Chapter 7.1 --- Conclusions --- p.110
    Chapter 7.2 --- Future Work --- p.111
    Bibliography --- p.113
    Chapter A --- The POS Tags --- p.121
    Chapter B --- Surnames and transliterated characters --- p.123
    Chapter C --- Stopword List for Person Name --- p.126
    Chapter D --- Organization suffixes --- p.127
    Chapter E --- Location suffixes --- p.128
    Chapter F --- Examples of Feature Table (Train set with condition D410) --- p.12
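    This thesis evaluates tracking with the TDT cost function and DET curves (Chapter 6.3) and folds named-entity features into a vector-space model (Chapter 6.4.3). The record contains no formulas, so the snippet below only restates, as a reference point, the standard TDT detection cost and a threshold-based vector-space tracking decision; the constants shown (C_miss = 1.0, C_fa = 0.1, P_target = 0.02) are the usual TDT evaluation settings, the threshold is illustrative, and the thesis's own parameterization may differ.

import math

def detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=0.1, p_target=0.02):
    """Standard TDT tracking cost:
    C_det = C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target),
    normalized by the cheaper of the two trivial systems (accept all / reject all)."""
    c_det = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return c_det / min(c_miss * p_target, c_fa * (1.0 - p_target))

def on_topic(story_vec, topic_vec, threshold=0.15):
    """Vector-space tracking decision: declare the story on-topic when its cosine
    similarity to the (named-entity weighted) topic vector clears a tuned threshold."""
    dot = sum(w * topic_vec.get(t, 0.0) for t, w in story_vec.items())
    norm = (math.sqrt(sum(w * w for w in story_vec.values()))
            * math.sqrt(sum(w * w for w in topic_vec.values())))
    return norm > 0 and dot / norm >= threshold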

    Temporal information in newswire articles: an annotation scheme and corpus study.

    Many natural language processing applications, such as information extraction, question answering, and topic detection and tracking, would benefit significantly from the ability to accurately position reported events in time, either relatively with respect to other events or absolutely with respect to calendrical time. However, relatively little work has been done to date on the automatic extraction of temporal information from text. Before we can progress to automatically positioning reported events in time, we must gain an understanding of the mechanisms used to do this in language. This understanding can be promoted through the development of an annotation scheme, which allows us to identify the textual expressions conveying events, times and temporal relations in a corpus of 'real' text. This thesis describes a fine-grained annotation scheme with which we can capture all events, times and temporal relations reported in a text. To aid the application of the scheme to text, a graphical annotation tool has been developed. This tool not only allows easy markup of sophisticated temporal annotations, it also contains an interactive, inference-based component supporting the gathering of temporal relations. The annotation scheme and the tool have been evaluated through the construction of a trial corpus during a pilot study. In this study, a group of annotators was supplied with a description of the annotation scheme and asked to apply it to a trial corpus. The pilot study showed that the annotation scheme was difficult to apply, but would be feasible with improvements to the definition of the scheme and to the tool. Analysis of the resulting trial corpus also provides preliminary results on the relative extent to which different linguistic mechanisms, explicit and implicit, are used to convey temporal relational information in text.
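    The scheme itself is not reproduced in this record, so the sketch below only illustrates the kind of structure the abstract describes: textual event and time expressions plus temporal relations between them, with an inference step that derives further relations from the annotated ones (here, transitivity of a hypothetical BEFORE relation). All class and relation names are invented for the example and are not the thesis's actual tag set.

from dataclasses import dataclass, field

@dataclass
class Event:
    eid: str
    text: str               # the textual expression conveying the event

@dataclass
class TimeExpression:
    tid: str
    text: str               # e.g. "last Tuesday"
    value: str = ""         # calendrical anchor when it can be resolved, e.g. "1998-02-20"

@dataclass
class TemporalAnnotation:
    events: dict = field(default_factory=dict)
    times: dict = field(default_factory=dict)
    before: set = field(default_factory=set)    # pairs (x, y): x happens before y

    def add_before(self, x, y):
        self.before.add((x, y))

    def close(self):
        """Inference step: BEFORE is transitive, so derive (x, z) from (x, y) and (y, z).
        Repeat until no new pair appears (a simple fixed point)."""
        while True:
            derived = {(x, z)
                       for (x, y1) in self.before
                       for (y2, z) in self.before
                       if y1 == y2 and (x, z) not in self.before}
            if not derived:
                break
            self.before |= derived

# Usage: annotate two events and one time, then let the inference component fill in e1 < t1.
ann = TemporalAnnotation()
ann.events["e1"] = Event("e1", "exploded")
ann.events["e2"] = Event("e2", "arrested")
ann.times["t1"] = TimeExpression("t1", "Friday", "1998-02-20")
ann.add_before("e1", "e2")
ann.add_before("e2", "t1")
ann.close()
assert ("e1", "t1") in ann.before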