Search CORE

3 research outputs found

Morphological annotation of Korean with Directly Maintainable Resources

Authors: Berlocher Ivan, Huh Hyun-Gue, Laporte Eric, Nam Jee-Sun
Publication date: 01/01/2006

This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The tagset is 3 to 5 times finer-grained than usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes and transducers for the generation of allomorphs. All can be easily updated, which allows users to control how the performance of the system evolves. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words and without applying morphological rules at annotation time, which speeds up annotation to 1,210 words/s. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process.
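
As a rough illustration of the lexicon-of-words idea in this abstract, the sketch below compiles a small table of stems and suffixes into a full-form word lexicon ahead of time, so that annotation becomes a plain lookup with no morphological rules applied at annotation time. It is not the authors' system: the names `STEMS`, `SUFFIXES`, `compile_word_lexicon` and `annotate`, the tag labels, and the toy entries are all assumptions made for the example, and the transducers and allomorph generation mentioned in the abstract are replaced by simple concatenation.

```python
# Hypothetical toy resources; entries and tag names are illustrative only.
STEMS = {"먹": "V", "학교": "N"}
SUFFIXES = {
    "V": {"는다": "PRES+DECL", "었다": "PAST+DECL"},
    "N": {"에": "LOC", "를": "ACC"},
}


def compile_word_lexicon(stems, suffixes):
    """Precompute every stem+suffix word form with its morpheme analysis.

    Assumption: plain concatenation stands in for the suffix and
    allomorph transducers described in the abstract.
    """
    lexicon = {}
    for stem, pos in stems.items():
        for suffix, tag in suffixes.get(pos, {}).items():
            analysis = [(stem, pos), (suffix, tag)]
            lexicon.setdefault(stem + suffix, []).append(analysis)
    return lexicon


def annotate(text, lexicon):
    """Annotate whitespace-separated words by direct lexicon lookup.

    Unknown words receive an empty list of analyses.
    """
    return [(word, lexicon.get(word, [])) for word in text.split()]


if __name__ == "__main__":
    word_lexicon = compile_word_lexicon(STEMS, SUFFIXES)
    for word, analyses in annotate("학교에 먹었다", word_lexicon):
        print(word, analyses)
```

Updating a stem or suffix entry and recompiling changes the word lexicon without touching the lookup code, which is one way to read the abstract's point about directly maintainable resources.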

arXiv.org e-Print Archive

CiteSeerX

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Machine Aided Error-Correction Environment for Korean Morphological Analysis and Part-of-Speech Tagging

Authors: Jung-goo Kang, Junsik Park, Key-sun Choi, Wook Hur
Publication date: 01/01/1998

Statistical methods require a very large, high-quality corpus. But building a large, fault-free annotated corpus is very difficult. This paper proposes an efficient method to construct a part-of-speech tagged corpus. A rule-based error correction method is proposed to find and correct errors semi-automatically using user-defined rules. We also make use of the user's correction log to reflect feedback. Experiments were carried out to show the efficiency of the error correction process of this workbench. The results show that about 63.2% of tagging errors can be corrected.
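
The abstract's combination of user-defined correction rules and a correction log could look roughly like the sketch below. It is a minimal illustration under assumptions, not the workbench described in the paper: the `Rule` class, the functions `apply_rules` and `rules_from_log`, the rule shape (word, wrong tag, right tag) and the English toy tags are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical user-defined rule: retag `word` from `wrong_tag` to `right_tag`."""
    word: str
    wrong_tag: str
    right_tag: str


def apply_rules(tagged, rules):
    """Apply rules to a list of (word, tag) pairs.

    Returns the corrected list and the list of changes as
    (position, word, old_tag, new_tag), so a user can review them.
    """
    lookup = {(r.word, r.wrong_tag): r.right_tag for r in rules}
    corrected, changes = [], []
    for i, (word, tag) in enumerate(tagged):
        new_tag = lookup.get((word, tag), tag)
        if new_tag != tag:
            changes.append((i, word, tag, new_tag))
        corrected.append((word, new_tag))
    return corrected, changes


def rules_from_log(log):
    """Turn a log of manual corrections (word, old_tag, new_tag) into rules.

    Duplicate log entries collapse into one rule; order of first
    occurrence is preserved.
    """
    return [Rule(w, old, new) for (w, old, new) in dict.fromkeys(log)]


if __name__ == "__main__":
    log = [("run", "NN", "VB"), ("run", "NN", "VB")]
    rules = rules_from_log(log)
    tagged = [("they", "PRP"), ("run", "NN"), ("fast", "RB")]
    print(apply_rules(tagged, rules))
```

Returning the list of changes alongside the corrected corpus keeps the process semi-automatic: a user can accept or reject each proposed change before it is committed.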

CiteSeerX

Crossref

Robust Part of Speech Tagging

Author: Martínez Garcia Eva
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013

Generally, NLP tools use well-formed, annotated data to learn patterns with machine learning techniques. In this work, however, we focus on the language used in an online platform for machine translation. In this area the usual setting is a web page that offers a translation service between pairs of languages. The problem is that casual users use the service to translate any type of text (cut and paste, single words, bad formatting, snippets, informal language, pre-translations, etc.). Hence, in this situation we very often find words with mistakes that make the system provide a bad translation because it is not able to understand the input. The main goal of our work, once we have identified the problem of dealing with non-standard input, is to develop a robust PoS tagger from the SVMTagger

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC