A Sub-Character Architecture for Korean Language Processing
We introduce a novel sub-character architecture that exploits a unique
compositional structure of the Korean language. Our method decomposes each
character into a small set of primitive phonetic units called jamo letters from
which character- and word-level representations are induced. The jamo letters
divulge syntactic and semantic information that is difficult to access with
conventional character-level units. They greatly alleviate the data sparsity
problem, reducing the observation space to 1.6% of the original while
increasing accuracy in our experiments. We apply our architecture to dependency
parsing and achieve dramatic improvement over strong lexical baselines.
Comment: EMNLP 201
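The decomposition step described in this abstract can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. This is a minimal sketch of syllable-to-jamo decomposition only, not the authors' architecture; the function names `decompose_syllable` and `decompose` are hypothetical. It relies on the fact that the 11,172 precomposed syllables (19 leading consonants x 21 vowels x 28 tail slots, one of which means "no tail") are laid out contiguously from U+AC00.

```python
# Unicode constants for Hangul syllable decomposition.
S_BASE = 0xAC00   # first precomposed syllable
L_BASE = 0x1100   # first leading-consonant jamo
V_BASE = 0x1161   # first vowel jamo
T_BASE = 0x11A7   # one before the first trailing-consonant jamo
V_COUNT, T_COUNT = 21, 28
S_COUNT = 19 * V_COUNT * T_COUNT  # 11,172 precomposed syllables


def decompose_syllable(ch):
    """Return the conjoining jamo of one syllable, or [ch] if it is not one."""
    index = ord(ch) - S_BASE
    if not 0 <= index < S_COUNT:
        return [ch]  # non-Hangul characters pass through unchanged
    lead, rest = divmod(index, V_COUNT * T_COUNT)
    vowel, tail = divmod(rest, T_COUNT)
    jamo = [chr(L_BASE + lead), chr(V_BASE + vowel)]
    if tail:  # tail index 0 means the syllable has no final consonant
        jamo.append(chr(T_BASE + tail))
    return jamo


def decompose(text):
    """Flatten a string into its sequence of jamo letters."""
    return [j for ch in text for j in decompose_syllable(ch)]
```

For precomposed Hangul syllables this should produce the same sequence as Python's `unicodedata.normalize("NFD", text)`, which may be the simpler choice in practice.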
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based continuous, possibly large-vocabulary, speech
recognition system for Korean. Unlike the popular n-best techniques developed
mainly for integrating HMM-based speech recognition and natural language
processing at the word level, which is obviously inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on morpheme-level speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experimental results show that speaker-dependent continuous
eojeol (Korean word) recognition and integrated morphological analysis
can be achieved with a success rate of over 80.6% directly from speech inputs
for middle-level vocabularies.
Comment: latex source with a4 style, 15 pages, to be published in computer
processing of oriental language journa
Morphological annotation of Korean with Directly Maintainable Resources
This article describes an exclusively resource-based method of morphological
annotation of written Korean text. Korean is an agglutinative language. Our
annotator is designed to process text before the operation of a syntactic
parser. In its present state, it annotates one-stem words only. The output is a
graph of morphemes annotated with accurate linguistic information. The
granularity of the tagset is 3 to 5 times higher than that of usual tagsets. A
comparison with a reference annotated corpus showed that it achieves 89% recall
without any corpus training. The language resources used by the system are
lexicons of stems, transducers of suffixes, and transducers that generate
allomorphs. All can be easily updated, which allows users to control the
evolution of the system's performance. It has been claimed that
morphological annotation of Korean text could only be performed by a
morphological analysis module accessing a lexicon of morphemes. We show that it
can also be performed directly with a lexicon of words and without applying
morphological rules at annotation time, which speeds up annotation to 1,210
words/s. The lexicon of words is obtained from the maintainable language
resources through a fully automated compilation process.
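The idea of compiling a word lexicon ahead of time, so that annotation becomes a plain lookup with no morphological rules applied, can be sketched as follows. This is a toy illustration under loud assumptions: the stem, the suffix sequences, the tag names and the functions `compile_word_lexicon` and `annotate` are all hypothetical, and simple concatenation stands in for the suffix and allomorph transducers that the abstract mentions.

```python
from collections import defaultdict

# Hypothetical toy resources; the tags are illustrative, not the paper's tagset.
STEMS = {"먹": "V"}  # verb stem meaning 'eat'
SUFFIX_SEQUENCES = [
    [("다", "DECL")],
    [("었", "PAST"), ("다", "DECL")],
    [("고", "CONJ")],
]


def compile_word_lexicon(stems, suffix_sequences):
    """Expand every stem with every suffix sequence into full word forms."""
    lexicon = defaultdict(list)
    for stem, stem_tag in stems.items():
        for seq in suffix_sequences:
            surface = stem + "".join(form for form, _ in seq)
            # Each analysis is the stem followed by its tagged suffixes.
            lexicon[surface].append([(stem, stem_tag)] + list(seq))
    return dict(lexicon)


def annotate(word, lexicon):
    """Annotation is a dictionary lookup; unknown words get no analysis."""
    return lexicon.get(word, [])


WORD_LEXICON = compile_word_lexicon(STEMS, SUFFIX_SEQUENCES)
```

With these toy resources, `annotate("먹었다", WORD_LEXICON)` returns a single analysis made of the stem and its two suffixes. The trade-off is a larger lexicon in exchange for constant-time lookup at annotation time.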
Chart-driven Connectionist Categorial Parsing of Spoken Korean
While most speech and natural language systems developed
for English and other Indo-European languages neglect morphological
processing and integrate speech and natural language at the word level, for
agglutinative languages such as Korean and Japanese, morphological
processing plays a major role in language processing, since these languages
have very complex morphological phenomena and relatively simple syntactic
functionality. Obviously, degenerate morphological processing limits the usable
vocabulary size of the system, and a word-level dictionary results in an
exponential explosion in the number of dictionary entries. For agglutinative
languages, we need sub-word-level integration, which leaves room for general
morphological processing. In this paper, we develop a phoneme-level integration
model of speech and linguistic processing through general morphological
analysis for agglutinative languages, and an efficient parsing scheme for that
integration. Korean is modeled lexically based on the categorial grammar
formalism with unordered argument and suppressed category extensions, and a
chart-driven connectionist parsing method is introduced.
Comment: 6 pages, Postscript file, Proceedings of ICCPOL'9
The Phonological Process with Two Patterns of Simplified Chinese Characters
This paper analyzed word recognition in two patterns of Chinese characters, cross-referenced with word frequency. The patterns were defined as uni-part (semantic radical/component only) and bi-part (including the phonetic radical/component and the semantic radical/component) characters. The interactions of semantic and phonological access in both patterns were inspected. In the naming task and the pronunciation-matching task, subjects showed longer reaction times (RT) for uni-part characters than for bi-part characters. In the lexical decision and meaning-matching tasks, however, uni-part characters showed shorter RT than bi-part characters. Frequency, which is regarded as a lexical variable, also displayed a strong influence. This suggests that Chinese characters require lexical access in all tasks. The study also suggested that the phonological process is primary in visual word recognition, as there is a significant phonological effect in processing the Chinese bi-part characters, resulting in either the facilitation or inhibition of phonology due to the differing demands of the two task
Loanword adaptation as first-language phonological perception
We show that loanword adaptation can be understood entirely in terms of phonological and phonetic comprehension and production mechanisms in the first language. We provide explicit accounts of several loanword adaptation phenomena (in Korean) in terms of an Optimality-Theoretic grammar model with the same three levels of representation that are needed to describe L1 phonology: the underlying form, the phonological surface form, and the auditory-phonetic form. The model is bidirectional, i.e., the same constraints and rankings are used by the listener and by the speaker. These constraints and rankings are the same for L1 processing and loanword adaptation.