4,288 research outputs found

A Sub-Character Architecture for Korean Language Processing

Author: Stratos Karl
Publication date: 01/01/2017

We introduce a novel sub-character architecture that exploits a unique compositional structure of the Korean language. Our method decomposes each character into a small set of primitive phonetic units called jamo letters from which character- and word-level representations are induced. The jamo letters divulge syntactic and semantic information that is difficult to access with conventional character-level units. They greatly alleviate the data sparsity problem, reducing the observation space to 1.6% of the original while increasing accuracy in our experiments. We apply our architecture to dependency parsing and achieve dramatic improvement over strong lexical baselines.
Comment: EMNLP 201
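The decomposition step described in this abstract can be illustrated with a minimal sketch. It assumes the standard Unicode arithmetic for precomposed Hangul syllables (19 leading consonants, 21 vowels, 27 optional trailing consonants); the function name `to_jamo` is hypothetical, and the paper's own jamo inventory and handling of empty final consonants may differ.

```python
# Sketch only: split precomposed Hangul syllables into conjoining jamo
# using the Unicode Hangul syllable layout. Not the paper's implementation.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28          # 28 = 27 trailing consonants + "none"
S_COUNT = 19 * V_COUNT * T_COUNT   # 11,172 precomposed syllables


def to_jamo(text):
    """Return text with each Hangul syllable replaced by its jamo letters."""
    out = []
    for ch in text:
        idx = ord(ch) - S_BASE
        if 0 <= idx < S_COUNT:
            lead, rest = divmod(idx, V_COUNT * T_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            out.append(chr(L_BASE + lead))
            out.append(chr(V_BASE + vowel))
            if tail:  # tail index 0 means no final consonant
                out.append(chr(T_BASE + tail))
        else:
            out.append(ch)  # non-Hangul characters pass through unchanged
    return "".join(out)
```

For example, `to_jamo("한")` yields the three jamo U+1112, U+1161, U+11AB. A vocabulary built over such units is far smaller than one built over the 11,172 possible syllable blocks, which is the kind of reduction in observation space the abstract refers to.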

arXiv.org e-Print Archive

Crossref

Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

Author: Lee Geunbae
Lee Jong-Hyeok
Publication date: 01/01/1996

A new tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary speech recognition system for Korean. Unlike popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the word level, which is obviously inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on morpheme-level speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous eojeol (Korean word) recognition and integrated morphological analysis can be achieved with over 80.6% success rate directly from speech inputs for middle-level vocabularies.
Comment: latex source with a4 style, 15 pages, to be published in computer processing of oriental language journal

arXiv.org e-Print Archive

포항공과대학교 (Pohang University of Science and Technology)

"The Interaction of Structural and Semantic Biases in Coherence and Coreference"

Author: Kehler Andrew
Rohde Hannah
Publication date: 01/01/2009

Edinburgh Research Explorer

"Coherence-Driven Expectations in Discourse and Dialog"

Author: Kehler Andrew
Rohde Hannah
Publication date: 01/01/2008

Edinburgh Research Explorer

"The Bidirectional Influence between Coherence Establishment and Pronoun Interpretation"

Author: Kehler Andrew
Rohde Hannah
Publication date: 01/01/2008

Edinburgh Research Explorer

Morphological annotation of Korean with Directly Maintainable Resources

Author: Berlocher Ivan
Huh Hyun-Gue
Laporte Eric
Nam Jee-Sun
Publication date: 01/01/2006

This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The granularity of the tagset is 3 to 5 times higher than that of usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes and transducers for the generation of allomorphs. All can be easily updated, which allows users to control the evolution of the system's performance. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words and without applying morphological rules at annotation time, which speeds up annotation to 1,210 words/s. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process.

arXiv.org e-Print Archive

CiteSeerX

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Chart-driven Connectionist Categorial Parsing of Spoken Korean

Author: Lee Geunbae
Lee Jong-Hyeok
Lee WonIl
Publication date: 29/11/1995

Most speech and natural language systems developed for English and other Indo-European languages neglect morphological processing and integrate speech and natural language at the word level. For agglutinative languages such as Korean and Japanese, however, morphological processing plays a major role in language processing, since these languages have very complex morphological phenomena and relatively simple syntactic functionality. Degenerate morphological processing obviously limits the usable vocabulary size of the system, and a word-level dictionary results in an exponential explosion in the number of dictionary entries. For agglutinative languages, we need sub-word-level integration that leaves room for general morphological processing. In this paper, we develop a phoneme-level integration model of speech and linguistic processing through general morphological analysis for agglutinative languages, and an efficient parsing scheme for that integration. Korean is modeled lexically based on the categorial grammar formalism with unordered argument and suppressed category extensions, and a chart-driven connectionist parsing method is introduced.
Comment: 6 pages, Postscript file, Proceedings of ICCPOL'9

arXiv.org e-Print Archive

포항공과대학교 (Pohang University of Science and Technology)

The Phonological Process with Two Patterns of Simplified Chinese Characters

Author: Junehee Lee
Yang Lee
Zheng Jin
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/12/2013

This paper analyzed word recognition in two patterns of Chinese characters, cross-referenced with word frequency. The patterns were defined as uni-part (semantic radical/component only) and bi-part (including both the phonetic radical/component and the semantic radical/component) characters. The interactions of semantic and phonological access in both patterns were inspected. In the naming task and the pronunciation-matching task, subjects showed longer RTs for the uni-part characters than for the bi-part characters. In the lexical decision and meaning-matching tasks, however, the uni-part characters showed shorter RTs than the bi-part characters. Frequency, which is regarded as a lexical variable, also displayed a strong influence. This suggests that Chinese characters require lexical access in all tasks. The study also suggests that the phonological process is primary in visual word recognition, as there is a significant phonological effect in processing the Chinese bi-part characters, resulting in either facilitation or inhibition of phonology depending on the differing demands of the two tasks.

Crossref

Biblioteka Nauki - repozytorium artykułów

Repozytorium Uniwersytetu Łódzkiego (University of Lodz Repository)

Loanword adaptation as first-language phonological perception

Author: Boersma Paul
Hamann Silke
Publication date: 01/01/2009

We show that loanword adaptation can be understood entirely in terms of phonological and phonetic comprehension and production mechanisms in the first language. We provide explicit accounts of several loanword adaptation phenomena (in Korean) in terms of an Optimality-Theoretic grammar model with the same three levels of representation that are needed to describe L1 phonology: the underlying form, the phonological surface form, and the auditory-phonetic form. The model is bidirectional, i.e., the same constraints and rankings are used by the listener and by the speaker. These constraints and rankings are the same for L1 processing and loanword adaptation.

CiteSeerX

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Hochschulschriftenserver - Universität Frankfurt am Main