5 research outputs found

    Chinese named entity recognition using lexicalized HMMs

    This paper presents a lexicalized HMM-based approach to Chinese named entity recognition (NER). To tackle the problem of unknown words, we unify unknown word identification and NER as a single tagging task on a sequence of known words. To do this, we first employ a known-word bigram-based model to segment a sentence into a sequence of known words, and then apply uniformly lexicalized HMMs to assign each known word a proper hybrid tag that indicates both its pattern in forming an entity and the category of the formed entity. Our system integrates both the internal formation patterns and the surrounding contextual clues for NER within the HMM framework. As a result, performance improves without sacrificing efficiency in training and tagging. We have tested our system on several public corpora. The results show that lexicalized HMMs can substantially improve NER performance over standard HMMs. They also indicate that character-based tagging (viz. tagging based on pure single-character words) is comparable to, and can even outperform, the corresponding known-word-based tagging when a lexicalization technique is applied.
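
The tagging step described above can be illustrated with a small decoder. The following is a minimal Python sketch, not the paper's implementation. It covers only the second step and assumes the sentence has already been segmented into known words. It also assumes a simple first-order lexicalization in which both the transition P(t_i | t_{i-1}, w_{i-1}) and the emission P(w_i | t_i, w_{i-1}) are conditioned on the previous word, which may differ from the paper's exact "uniformly lexicalized" formulation. The class name `LexicalizedHMM`, the add-alpha smoothing, the hybrid tags (`B-PER`, `E-PER`, `S-LOC`, `O`) and the toy corpus are all hypothetical.

```python
import math
from collections import defaultdict


class LexicalizedHMM:
    """Toy lexicalized HMM tagger (hypothetical formulation, for illustration).

    Transition: P(t_i | t_{i-1}, w_{i-1}); emission: P(w_i | t_i, w_{i-1}).
    Both are relative frequencies with add-alpha smoothing.
    """

    START = "<s>"

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.tags = set()
        self.vocab = set()
        # (prev_tag, prev_word) -> tag -> count
        self.trans = defaultdict(lambda: defaultdict(int))
        # (tag, prev_word) -> word -> count
        self.emit = defaultdict(lambda: defaultdict(int))

    def train(self, sentences):
        """sentences: lists of (known_word, hybrid_tag) pairs."""
        for sent in sentences:
            prev_t, prev_w = self.START, self.START
            for w, t in sent:
                self.tags.add(t)
                self.vocab.add(w)
                self.trans[(prev_t, prev_w)][t] += 1
                self.emit[(t, prev_w)][w] += 1
                prev_t, prev_w = t, w

    def _logp(self, table, context, outcome, n_outcomes):
        # Smoothed log-probability; unseen contexts fall back to uniform.
        counts = table.get(context, {})
        total = sum(counts.values())
        num = counts.get(outcome, 0) + self.alpha
        return math.log(num / (total + self.alpha * n_outcomes))

    def tag(self, words):
        """Viterbi decoding of the best hybrid-tag sequence for known words."""
        if not words:
            return []
        tags = sorted(self.tags)
        n_tags = len(tags)
        n_words = len(self.vocab) + 1  # +1 reserves mass for unseen words
        # best[i][t] = (log score of best path ending in t at i, previous tag)
        best = [{}]
        for t in tags:
            score = (self._logp(self.trans, (self.START, self.START), t, n_tags)
                     + self._logp(self.emit, (t, self.START), words[0], n_words))
            best[0][t] = (score, None)
        for i in range(1, len(words)):
            prev_w = words[i - 1]
            best.append({})
            for t in tags:
                score, back = max(
                    (best[i - 1][p][0]
                     + self._logp(self.trans, (p, prev_w), t, n_tags), p)
                    for p in tags)
                emit = self._logp(self.emit, (t, prev_w), words[i], n_words)
                best[i][t] = (score + emit, back)
        # Backtrack from the highest-scoring final tag.
        t = max(tags, key=lambda x: best[-1][x][0])
        path = [t]
        for i in range(len(words) - 1, 0, -1):
            t = best[i][t][1]
            path.append(t)
        return path[::-1]


# Hypothetical toy corpus of known-word sequences with hybrid tags.
train = [
    [("王", "B-PER"), ("小明", "E-PER"), ("在", "O"), ("北京", "S-LOC"), ("工作", "O")],
    [("李", "B-PER"), ("小明", "E-PER"), ("去", "O"), ("上海", "S-LOC")],
]
hmm = LexicalizedHMM(alpha=0.1)
hmm.train(train)
print(hmm.tag(["王", "小明", "去", "北京"]))  # ['B-PER', 'E-PER', 'O', 'S-LOC']
```

Conditioning on the previous word sharpens the contextual evidence but makes the counts sparser, which is why every distribution in this sketch is smoothed. Here "北京" never follows "去" in the toy corpus, yet the lexicalized transition out of (`O`, "去") still favors `S-LOC`.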

    A history and theory of textual event detection and recognition


    Conditional random fields with dynamic potentials for Chinese named entity recognition

    Wu, Yiu Kei. Thesis (M.Phil.), Chinese University of Hong Kong, 2008. Includes bibliographical references (p. 69-75). Abstracts in English and Chinese.

    Chapter 1. Introduction
        1.1 Chinese NER Problem
        1.2 Contribution of Our Proposed Framework
    Chapter 2. Related Work
        2.1 Hidden Markov Models
        2.2 Maximum Entropy Models
        2.3 Conditional Random Fields
    Chapter 3. Our Proposed Model
        3.1 Background
            3.1.1 Problem Formulation
            3.1.2 Conditional Random Fields
            3.1.3 Semi-Markov Conditional Random Fields
        3.2 The Formulation of Our Proposed Model
            3.2.1 The Main Principle
            3.2.2 The Detailed Formulation
            3.2.3 Adapting Features from Original CRF to CRFDP
    Chapter 4. Experiments
        4.1 Datasets
        4.2 Features
        4.3 Evaluation Metrics
        4.4 Results and Discussion
    Chapter 5. Conclusions and Future Work
    Bibliography
    Appendix A
    Appendix B
    Appendix C

    Chinese named entity identification using class-based language model


    Automatic Extraction and Assessment of Entities from the Web

    The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it more time-consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities, and information about these entities, from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The thesis finds that a large knowledge base can be created automatically using a manually crafted ontology. After applying assessment algorithms, the precision of the extracted information was between 75% (facts) and 90% (entities). The algorithms from this thesis can be used to create such a knowledge base, which can in turn serve various research fields, such as question answering, named entity recognition, and information retrieval.