
    A Multi-phase Semi-supersense Tagging of Korean Unknown Nouns

    Abstract. Supersense tagging is the problem of finding the corresponding semantic super tag (e.g., Phenomenon, Act) for a word based on syntactic information and annotated corpora. However, we employ semantic information rather than syntactic information and annotated corpora, because Korean has a relatively flexible syntactic structure and lacks annotated corpora. To construct an automatic sense tagging system for Korean, we use the semi-supersenses of the first and second levels of Sejong’s Noun Semantic Class System. We employ a hybrid approach consisting of three phases: one morphological matching phase and two semantic matching phases. The morphological phase is based on suffix pattern matching, which assigns a compound word to the class that includes its suffix word. One of the two semantic matching phases is based on concept similarity in WordNet, and the other is based on term similarity in a term matrix reduced by singular value decomposition (SVD). Both semantic phases use a weighted k-Nearest Neighbor classifier, but each uses a different similarity metric. In experiments, 79,103 unknown words were extracted from 225,779 nouns in a syntactically tagged corpus, and 98% of the unknown words were addressed by our hybrid method.
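The hybrid approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the phases run as a cascade (suffix matching first, then a semantic fallback), which the abstract does not state. It also uses plain cosine similarity over small hand-made vectors in place of the WordNet concept similarity and the SVD-reduced term matrix. All function names, class labels, and data below are hypothetical.

```python
import math
from collections import defaultdict


def suffix_match(word, suffix_classes):
    """Morphological phase: return the class of the longest known
    suffix word that ends `word`, or None if no suffix matches."""
    best = None
    for suffix, label in suffix_classes.items():
        if word != suffix and word.endswith(suffix):
            if best is None or len(suffix) > len(best[0]):
                best = (suffix, label)
    return best[1] if best else None


def cosine(u, v):
    """Cosine similarity (an assumed stand-in for the paper's metrics)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_knn(query, labeled, k=3, sim=cosine):
    """Semantic phase: the k most similar labeled vectors vote for
    their class, each vote weighted by its similarity to the query."""
    scored = sorted(
        ((sim(query, vec), label) for vec, label in labeled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for score, label in scored:
        if score > 0:
            votes[label] += score
    return max(votes, key=votes.get) if votes else None


def tag(word, suffix_classes, vectors, labeled, k=3):
    """Assumed cascade: try suffix matching, then fall back to kNN."""
    label = suffix_match(word, suffix_classes)
    if label is not None:
        return label, "morphological"
    vec = vectors.get(word)
    if vec is not None:
        label = weighted_knn(vec, labeled, k)
        if label is not None:
            return label, "semantic"
    return None, "unresolved"


# Toy data (hypothetical; not taken from the Sejong class system).
suffix_classes = {"학교": "Institution"}
vectors = {"폭풍": [0.9, 0.1]}
labeled = [
    ([1.0, 0.0], "Phenomenon"),
    ([0.8, 0.2], "Phenomenon"),
    ([0.0, 1.0], "Act"),
]

print(tag("고등학교", suffix_classes, vectors, labeled))
print(tag("폭풍", suffix_classes, vectors, labeled))
```

In this sketch, a compound ending in a known suffix word is resolved by the morphological phase alone. Any other word is resolved only if a vector is available for it, so the achievable coverage depends on the vector resource.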