727 research outputs found
SKOPE: A connectionist/symbolic architecture of spoken Korean processing
Spoken language processing requires speech and natural language integration.
Moreover, spoken Korean calls for unique processing methodology due to its
linguistic characteristics. This paper presents SKOPE, a connectionist/symbolic
spoken Korean processing engine, which emphasizes that: 1) connectionist and
symbolic techniques must be selectively applied according to their relative
strength and weakness, and 2) the linguistic characteristics of Korean must be
fully considered for phoneme recognition, speech and language integration, and
morphological/syntactic processing. The design and implementation of SKOPE
demonstrates how connectionist/symbolic hybrid architectures can be constructed
for spoken agglutinative language processing. Also SKOPE presents many novel
ideas for speech and language processing. The phoneme recognition,
morphological analysis, and syntactic analysis experiments show that SKOPE is a
viable approach for the spoken Korean processing.Comment: 8 pages, latex, use aaai.sty & aaai.bst, bibfile: nlpsp.bib, to be
presented at IJCAI95 workshops on new approaches to learning for natural
language processin
Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean
A new tightly coupled speech and natural language integration model is
presented for a TDNN-based continuous possibly large vocabulary speech
recognition system for Korean. Unlike popular n-best techniques developed for
integrating mainly HMM-based speech recognition and natural language processing
in a {\em word level}, which is obviously inadequate for morphologically
complex agglutinative languages, our model constructs a spoken language system
based on a {\em morpheme-level} speech and language integration. With this
integration scheme, the spoken Korean processing engine (SKOPE) is designed and
implemented using a TDNN-based diphone recognition module integrated with a
Viterbi-based lexical decoding and symbolic phonological/morphological
co-analysis. Our experiment results show that the speaker-dependent continuous
{\em eojeol} (Korean word) recognition and integrated morphological analysis
can be achieved with over 80.6% success rate directly from speech inputs for
the middle-level vocabularies.Comment: latex source with a4 style, 15 pages, to be published in computer
processing of oriental language journa
Chart-driven Connectionist Categorial Parsing of Spoken Korean
While most of the speech and natural language systems which were developed
for English and other Indo-European languages neglect the morphological
processing and integrate speech and natural language at the word level, for the
agglutinative languages such as Korean and Japanese, the morphological
processing plays a major role in the language processing since these languages
have very complex morphological phenomena and relatively simple syntactic
functionality. Obviously degenerated morphological processing limits the usable
vocabulary size for the system and word-level dictionary results in exponential
explosion in the number of dictionary entries. For the agglutinative languages,
we need sub-word level integration which leaves rooms for general morphological
processing. In this paper, we developed a phoneme-level integration model of
speech and linguistic processings through general morphological analysis for
agglutinative languages and a efficient parsing scheme for that integration.
Korean is modeled lexically based on the categorial grammar formalism with
unordered argument and suppressed category extensions, and chart-driven
connectionist parsing method is introduced.Comment: 6 pages, Postscript file, Proceedings of ICCPOL'9
Improving Term Frequency Normalization for Multi-topical Documents, and Application to Language Modeling Approaches
Term frequency normalization is a serious issue since lengths of documents
are various. Generally, documents become long due to two different reasons -
verbosity and multi-topicality. First, verbosity means that the same topic is
repeatedly mentioned by terms related to the topic, so that term frequency is
more increased than the well-summarized one. Second, multi-topicality indicates
that a document has a broad discussion of multi-topics, rather than single
topic. Although these document characteristics should be differently handled,
all previous methods of term frequency normalization have ignored these
differences and have used a simplified length-driven approach which decreases
the term frequency by only the length of a document, causing an unreasonable
penalization. To attack this problem, we propose a novel TF normalization
method which is a type of partially-axiomatic approach. We first formulate two
formal constraints that the retrieval model should satisfy for documents having
verbose and multi-topicality characteristic, respectively. Then, we modify
language modeling approaches to better satisfy these two constraints, and
derive novel smoothing methods. Experimental results show that the proposed
method increases significantly the precision for keyword queries, and
substantially improves MAP (Mean Average Precision) for verbose queries.Comment: 8 pages, conference paper, published in ECIR '0
- …