
    Contrastive Approach towards Text Source Classification based on Top-Bag-Word Similarity

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Design of CKIP Chinese Word Segmentation System

    In this paper, we describe the design of the CKIP Chinese word segmentation system and analyse its performance. The system takes a modular approach: independent modules were designed to resolve segmentation ambiguities and to identify unknown words. Segmentation ambiguities are resolved by a hybrid method that combines heuristic and statistical rules. Regular-type unknown words are identified by regular expressions, while irregular types of unknown words are first detected by their occurrence and then extracted by morphological rules with statistical and morphological constraints. At the first international Chinese Word Segmentation Bakeoff, the CKIP system was tested on the open and closed tracks of Beijing University (PK) and Hong Kong CityU (HK). The evaluation results show that our system performed very well on both the HK open and closed tracks and acceptably on the PK tracks.