JAPANESE-MALAY E-DICTIONARY WITH VOICE RECOGNITION
This report gives a brief introduction to a project to be carried out
in the field of Artificial Intelligence. The project is based on natural language
processing: an electronic dictionary will be built for Japanese-Malay
translation. The intelligent part of this e-dictionary is that it will be integrated with a voice
recognition program that allows the user to interact with it.
The e-dictionary will be built to resemble the e-dictionaries currently available on the
market. The end product may not be a fully functioning device by itself, but rather
a prototype to be tested on a computer. The e-dictionary program will be written in
Microsoft Visual Basic 6.0. Because a voice recognition system will be added to
the system, the speech recognition engine and text-to-speech engine available for
Microsoft Visual Basic 6.0 will be used in order to take advantage of its
library. Microsoft Office Access 2003 will be used for the database of this system.
Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences
Given the lack of word delimiters in written Japanese, word segmentation is
generally considered a crucial first step in processing Japanese texts. Typical
Japanese segmentation algorithms rely either on a lexicon and syntactic
analysis or on pre-segmented data; but these are labor-intensive, and the
lexico-syntactic techniques are vulnerable to the unknown word problem. In
contrast, we introduce a novel, more robust statistical method utilizing
unsegmented training data. Despite its simplicity, the algorithm yields
performance on long kanji sequences comparable to and sometimes surpassing that
of state-of-the-art morphological analyzers over a variety of error metrics.
The algorithm also outperforms another mostly-unsupervised statistical
algorithm previously proposed for Chinese.
Additionally, we present a two-level annotation scheme for Japanese to
incorporate multiple segmentation granularities, and introduce two novel
evaluation metrics, both based on the notion of a compatible bracket, that can
account for multiple granularities simultaneously.
Comment: 22 pages. To appear in Natural Language Engineering.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Efficient deep processing of Japanese
We present a broad-coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real-world applications, so robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. The grammar is being developed in a multilingual context, which requires MRS structures that are easily comparable across languages.