Source side pre-ordering using recurrent neural networks for English-Myanmar machine translation
Word reordering remains one of the challenging problems in machine translation between language pairs with different word orders, e.g. English and Myanmar. Without reordering between these languages, a source sentence may be translated directly with similar word order, and the translation may not be meaningful. Myanmar is a subject-object-verb (SOV) language, and effective reordering is essential for translation. In this paper, we applied a pre-ordering approach using recurrent neural networks to pre-order words of the source Myanmar sentence into target English’s word order. This neural pre-ordering model is automatically derived from parallel word-aligned data with syntactic and lexical features based on dependency parse trees of the source sentences. It can generate arbitrary permutations, which may be non-local within the sentence, and can be integrated into English-Myanmar machine translation. We used the model to reorder English sentences into Myanmar-like word order as a preprocessing stage for machine translation, obtaining quality improvements comparable to the baseline rule-based pre-ordering approach on the Asian Language Treebank (ALT) corpus.
Statistical Function Tagging and Grammatical Relations of Myanmar Sentences
This paper describes context-free grammar (CFG) based grammatical relations
for Myanmar sentences, combined with a corpus-based function tagging system. Part
of the challenge of statistical function tagging for Myanmar sentences comes
from the fact that Myanmar has free-phrase-order and a complex morphological
system. Function tagging is a pre-processing step to show grammatical relations
of Myanmar sentences. In the task of function tagging, which tags the function
of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging
and chunking information, we use Naive Bayesian theory to disambiguate the
possible function tags of a word. We apply context free grammar (CFG) to find
out the grammatical relations of the function tags. We also create a functional
annotated tagged corpus for Myanmar and propose the grammar rules for Myanmar
sentences. Experiments show that our analysis achieves a good result with
simple sentences and complex sentences.
Comment: 16 pages, 7 figures, 8 tables, AIAA-2011 (India). arXiv admin note:
text overlap with arXiv:0912.1820 by other author
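The disambiguation step described in this abstract can be illustrated with a minimal Naive Bayes sketch. It is not the authors' system: the feature names (`pos`, `marker`), the tag labels (`Subj`, `Obj`, `Verb`), and the toy training examples are all hypothetical placeholders, and add-one smoothing is an assumption made here to keep the example well defined.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Collect counts from (features, tag) pairs; features is a dict."""
    tag_counts = Counter()
    feat_counts = defaultdict(Counter)  # (tag, feature name) -> value counts
    values = defaultdict(set)           # feature name -> values seen in training
    for feats, tag in examples:
        tag_counts[tag] += 1
        for name, value in feats.items():
            feat_counts[(tag, name)][value] += 1
            values[name].add(value)
    return tag_counts, feat_counts, values

def predict(model, feats):
    """Return the tag maximizing log P(tag) + sum of log P(feature | tag)."""
    tag_counts, feat_counts, values = model
    n = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, c in tag_counts.items():
        score = math.log(c / n)
        for name, value in feats.items():
            # Add-one smoothing (an assumption of this sketch).
            num = feat_counts[(tag, name)][value] + 1
            den = c + len(values[name])
            score += math.log(num / den)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag

# Hypothetical toy data: placeholder features and tags, not real Myanmar annotations.
examples = [
    ({"pos": "noun", "marker": "A"}, "Subj"),
    ({"pos": "noun", "marker": "A"}, "Subj"),
    ({"pos": "noun", "marker": "B"}, "Obj"),
    ({"pos": "noun", "marker": "B"}, "Obj"),
    ({"pos": "verb", "marker": "none"}, "Verb"),
]
model = train(examples)
predicted = predict(model, {"pos": "noun", "marker": "B"})
```

In a pipeline like the one the abstract outlines, the predicted function tags would then be passed to the CFG rules that derive grammatical relations; that stage is not sketched here.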
Improving Lexical Choice in Neural Machine Translation
We explore two solutions to the problem of mistranslating rare words in
neural machine translation. First, we argue that the standard output layer,
which computes the inner product of a vector representing the context with all
possible output word embeddings, rewards frequent words disproportionately, and
we propose to fix the norms of both vectors to a constant value. Second, we
integrate a simple lexical module which is jointly trained with the rest of the
model. We evaluate our approaches on eight language pairs with data sizes
ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU,
surpassing phrase-based translation in nearly all settings.
Comment: Accepted at NAACL HLT 201
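The first idea in this abstract, fixing the norms of the context vector and the output word embeddings, can be illustrated with a small sketch. It is a toy illustration under stated assumptions, not the authors' implementation: the function names, the `radius` parameter, and the two-dimensional vectors are invented for the example, and the lexical module is not covered.

```python
import math

def l2_normalize(vec, radius=1.0):
    """Rescale vec so that its Euclidean norm equals radius."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [radius * x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def output_distribution(context, embeddings, radius=None):
    """Score each output word by its inner product with the context vector.

    If radius is given, the context and every embedding are first rescaled
    to that norm, so only the direction of each vector affects the score.
    """
    if radius is not None:
        context = l2_normalize(context, radius)
        embeddings = [l2_normalize(e, radius) for e in embeddings]
    return softmax([dot(context, e) for e in embeddings])

# Toy vectors (hypothetical): word 0 has a large-norm embedding that is
# 45 degrees off the context; word 1 has a small-norm embedding that is
# perfectly aligned with the context.
context = [1.0, 1.0]
embeddings = [[3.0, 0.0], [0.5, 0.5]]

standard = output_distribution(context, embeddings)
fixed = output_distribution(context, embeddings, radius=2.0)
```

With the plain inner product, the large-norm embedding wins despite being less aligned with the context; once both norms are fixed, the better-aligned word receives the higher probability.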
Developing Word-aligned Myanmar-English Parallel Corpus based on the IBM Models
Word alignment in bilingual corpora has been an active research
topic in the machine translation community. A corpus is a
collection of texts that is useful for Natural Language
Processing (NLP). Parallel text alignment is the identification of
the corresponding sentences in a parallel text. Large
collections of parallel text are a prerequisite for many areas of
linguistic research. A parallel corpus helps in building statistical
bilingual dictionaries, in supporting statistical machine translation,
and in providing training data for word sense disambiguation
and translation disambiguation. Nowadays, the world is a global
network and people increasingly learn more than one language,
so multilingual corpora are increasingly needed. Thus, the main
purpose of this system is to construct a word-aligned parallel
corpus for use in Myanmar-English machine translation. One
useful concept is to identify correspondences between words in
one language and words in the other language. The proposed approach is
based on the first three IBM models and the EM algorithm. The paper also
shows that the approach can be improved by using a list of
cognates and morphological analysis.
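As a rough illustration of the EM training this abstract refers to, the following is a minimal sketch of IBM Model 1 only (Models 2 and 3, the cognate list, and morphological analysis are not covered). The function names are hypothetical, the NULL source word is omitted for brevity, and the German-English toy sentences are placeholders rather than Myanmar-English data.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target word | source word) with EM.

    pairs: list of (source_tokens, target_tokens).
    Returns a dict-like mapping (target_word, source_word) -> probability.
    """
    tgt_vocab = {tw for _, tgt in pairs for tw in tgt}
    uniform = 1.0 / len(tgt_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: distribute each target word over the source words.
        for src, tgt in pairs:
            for tw in tgt:
                z = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / z
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts.
        t = defaultdict(float, {(tw, sw): c / total[sw]
                                for (tw, sw), c in count.items()})
    return t

def align(src, tgt, t):
    """Link each target word to its most probable source word."""
    return [(tw, max(src, key=lambda sw: t[(tw, sw)])) for tw in tgt]

# Placeholder toy corpus (not Myanmar-English data).
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(pairs)
links = align(["das", "Haus"], ["the", "house"], t)
```

Running `align` over every sentence pair in a corpus gives one word-level link per target word, which is the kind of output a word-aligned parallel corpus records.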
Statistical Machine Translation between Myanmar Sign Language and Myanmar Written Text
This paper contributes the first evaluation of the quality of automatic translation between Myanmar sign language (MSL) and Myanmar written text, in both directions. Our MSL-Myanmar parallel corpus, which is still under development, was used for translation, and the experiments were carried out using three different statistical machine translation (SMT) approaches: phrase-based, hierarchical phrase-based, and the operation sequence model. In addition, three different segmentation schemes were studied: syllable segmentation, word segmentation, and sign-unit-based word segmentation. The results show that the highest-quality machine translation was attained with syllable segmentation for both MSL and Myanmar written text.
Development of Natural Language Processing based Communication and Educational Assisted Systems for the People with Hearing Disability in Myanmar
Information and communication technologies (ICTs) enable people with disabilities to better integrate socially and economically into their communities by supporting access to information and knowledge, learning and teaching situations, personal communication and interaction. Our research aims to develop systems that provide communication and educational assistance to persons with hearing disabilities using Natural Language Processing (NLP). In this paper, we present corpus building for Myanmar sign language (MSL), Machine Translation (MT) between MSL, Myanmar written text (MWT) and Myanmar SignWriting (MSW), and two fingerspelling keyboard layouts for Myanmar SignWriting. We believe that the outcome of this research is useful for educational content and for communication between people with hearing disabilities and the general public.