20 research outputs found

    A Feature-Based Lexicalized Tree Adjoining Grammar for Korean

    This document describes an ongoing project to develop a grammar of Korean, the Korean XTAG grammar, written in the TAG formalism and implemented for use with the XTAG system enriched with a Korean morphological analyzer. The Korean XTAG grammar described in this report is based on the TAG formalism (Joshi et al. (1975)), which has been extended to include lexicalization (Schabes et al. (1988)) and unification-based feature structures (Vijay-Shanker and Joshi (1991)). The document first describes the modifications that we have made to the XTAG system (The XTAG-Group (1998)) to handle the rich inflectional morphology of Korean. It then describes the syntactic phenomena that can currently be handled, including adverb modification, relative clauses, complex noun phrases, auxiliary verb constructions, gerunds and adjunct clauses. The work reported here is a first step towards the development of an implemented TAG grammar for Korean, which is continuously updated with the addition of new analyses and the modification of old ones.

    Compound noun segmentation based on lexical data extracted from corpus

    No full text

    Dual Triggered Correspondence Topic (DTCT) model for MeSH annotation

    No full text

    Word Segmentation Based on Estimation of Words from Examples

    No full text
    From a cognitive point of view, words can be recognized based on learned data obtained from linguistic materials. That is, people learn words from the many examples they encounter. We propose a word segmentation algorithm based on estimated knowledge of words acquired from both the local text being processed and a POS-tagged corpus. To show the feasibility of our model, we apply it to guessing unknown words caused by morphological analysis failure.
    1 Introduction
    We continuously learn words by seeing and hearing examples, and acquire new ones based on learned knowledge and new examples. We can think of the recognition and segmentation of words as a cognitive process. Consider the following example.
    Figure 1: Words can be generalized from many samples
    학교에 (hag-gyo-e, to school)
    (a) 학 (hag) + 교에 (gyo-e)
    (b) 학교 (hag-gyo) + 에 (e)
    (c) 학교에 (hag-gyo-e)
    It is possible to divide the eojeol `학교에' (hag-gyo-e) in three ways. Humans know that the..
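The three divisions listed in the excerpt above can be enumerated mechanically. The following is a minimal sketch, not the authors' algorithm: the function name `candidate_splits` is hypothetical, and it assumes a candidate is either the whole eojeol or a cut into exactly two non-empty parts at a syllable boundary. Choosing among the candidates, which is the actual subject of the paper, is not attempted here.

```python
import unicodedata


def candidate_splits(eojeol: str) -> list[tuple[str, ...]]:
    """List every two-part cut of an eojeol, followed by the uncut form.

    Hypothetical helper for illustration only; it does not score or
    rank the candidates.
    """
    # Normalize to NFC so each Hangul syllable is a single code point.
    text = unicodedata.normalize("NFC", eojeol)
    # Cut after each syllable except the last one.
    splits: list[tuple[str, ...]] = [
        (text[:i], text[i:]) for i in range(1, len(text))
    ]
    # The eojeol left undivided is also a candidate.
    splits.append((text,))
    return splits


candidates = candidate_splits("학교에")
for parts in candidates:
    print(" + ".join(parts))
```

For a three-syllable eojeol this yields the two cuts (a) and (b) and the undivided form (c) from the excerpt.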