UCSY-SC1: A Myanmar speech corpus for automatic speech recognition
This paper introduces a speech corpus developed for Myanmar Automatic Speech Recognition (ASR) research. ASR research is conducted around the world to advance language technologies, and speech corpora are essential for building ASR systems, especially for low-resourced languages. Myanmar can be regarded as a low-resourced language because few pre-existing resources are available for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus1) is created for Myanmar ASR research. The corpus covers two domains, news and daily conversations, and totals over 42 hours of speech: 25 hours of web news and 17 hours of recorded conversations. It was collected from 177 female and 84 male speakers for the news domain and 42 female and 4 male speakers for the conversational domain. The corpus was used as training data for developing Myanmar ASR. Three types of acoustic models were built and compared: Gaussian Mixture Model - Hidden Markov Model (GMM-HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models. Experiments were conducted on different data sizes, and evaluation was done on two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). Myanmar ASR systems trained on this corpus achieved satisfactory results on both test sets, with word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.
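The word error rates reported above come from a standard edit-distance computation between reference and hypothesis transcripts. As a minimal sketch (not the paper's scoring tool, which is not specified), WER can be computed like this:

```python
def wer(ref, hyp):
    """Word error rate: minimum edit distance (substitutions,
    insertions, deletions) between reference and hypothesis word
    sequences, divided by the reference length."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))  # DP row for the empty reference prefix
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            # d[j]: old cell above; d[j-1]: new cell left; prev: old diagonal
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
    return d[len(h)] / len(r)

print(wer("the cat sat", "the cat sat"))    # 0.0
print(wer("the cat sat", "a cat sat down")) # one substitution + one insertion
```

A WER of 15.61% thus means roughly one word edit for every six or seven reference words.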
Syllable Segmentation, Normalization, and Lexicographic Ordering of Myanmar Text by Formal Methods
Nagaoka University of Technology
Improving Lexical Choice in Neural Machine Translation
We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a simple lexical module which is jointly trained with the rest of the model. We evaluate our approaches on eight language pairs with data sizes ranging from 100k to 8M words, and achieve improvements of up to +4.3 BLEU, surpassing phrase-based translation in nearly all settings. Comment: Accepted at NAACL HLT 201
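The fixed-norm output layer described above can be sketched as follows, assuming both the context vector and every embedding row are rescaled to a shared constant norm r before the inner products are taken (the function name and NumPy framing are illustrative, not from the paper):

```python
import numpy as np

def fixed_norm_logits(context, embeddings, r=5.0):
    """Score output words after rescaling the context vector and
    each word embedding to the same constant norm r, so frequent
    words cannot win simply by growing larger embeddings."""
    c = r * context / np.linalg.norm(context)
    E = r * embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return E @ c  # one logit per vocabulary word

rng = np.random.default_rng(0)
logits = fixed_norm_logits(rng.normal(size=8), rng.normal(size=(100, 8)))
print(logits.shape)  # (100,)
```

With both norms fixed, the logits depend only on the angle between context and embedding, which is the mechanism the abstract credits for helping rare words.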
Demonstrating Positive Obligations: Children's Rights and Peaceful Protest in International Law
Recently there has been a significant increase in the involvement of children and young people in protests across the globe. As a result of this increase, children have directly influenced political change but have also faced threats to their safety. This raises distinct children’s rights issues, and the trends identified necessitate both conceptualizing protest involvement from a children’s rights perspective, and critically examining the manner in which the law — at both a national and international level — has approached the involvement of children in such activities. This Article examines the positive obligations of States and argues that children should be recognized as a distinct, valid, and sometimes vulnerable group that has the right to protest and the right to be facilitated in doing so
Source side pre-ordering using recurrent neural networks for English-Myanmar machine translation
Word reordering remains one of the challenging problems in machine translation when translating between language pairs with different word orders, e.g. English and Myanmar. Without reordering between these languages, a source sentence may be translated directly in its original word order, and the translation may not be meaningful. Myanmar is a subject-object-verb (SOV) language, so effective reordering is essential for translation. In this paper, we apply a pre-ordering approach using recurrent neural networks to reorder the words of the source sentence into the target language's word order. The neural pre-ordering model is derived automatically from parallel word-aligned data, using syntactic and lexical features based on dependency parse trees of the source sentences. It can generate arbitrary, possibly non-local permutations of a sentence and can be integrated into English-Myanmar machine translation. We use the model to reorder English sentences into Myanmar-like word order as a preprocessing step for machine translation, obtaining quality improvements comparable to a baseline rule-based pre-ordering approach on the Asian Language Treebank (ALT) corpus.
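Once the pre-ordering model has predicted a permutation for a source sentence, applying it before translation is a simple token shuffle. The sketch below supplies a hand-written permutation in place of the RNN's prediction; the helper name is hypothetical:

```python
def apply_permutation(tokens, perm):
    """Reorder source tokens by a predicted permutation before
    they are fed to the MT system. perm[i] gives the index of the
    original token placed at output position i."""
    assert sorted(perm) == list(range(len(tokens))), "not a permutation"
    return [tokens[i] for i in perm]

# English SVO -> Myanmar-like SOV: verb moved after its object.
print(apply_permutation(["I", "ate", "rice"], [0, 2, 1]))  # ['I', 'rice', 'ate']
```

Because pre-ordering happens entirely on the source side, the downstream translation system needs no modification.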
Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification
This paper deals with the identification of Multiword Expressions (MWEs) in Manipuri, a highly agglutinative Indian language listed in the Eighth Schedule of the Indian Constitution. MWEs play an important role in Natural Language Processing (NLP) applications such as Machine Translation, Part-of-Speech tagging, Information Retrieval, and Question Answering. Feature selection is an important factor in recognizing Manipuri MWEs using Conditional Random Fields (CRF). The disadvantage of manually selecting the appropriate features for running the CRF motivates the use of a Genetic Algorithm (GA). Using the GA, we are able to find the optimal features to run the CRF. We ran fifty generations of feature selection with three-fold cross validation as the fitness function. This model demonstrated a Recall (R) of 64.08%, Precision (P) of 86.84%, and F-measure (F) of 73.74%, an improvement over CRF-based Manipuri MWE identification without the GA. Comment: 14 pages, 6 figures, see http://airccse.org/journal/jcsit/1011csit05.pd
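The GA loop described in this abstract can be sketched with binary feature masks as chromosomes, one-point crossover, and point mutation. In the paper the fitness is the three-fold cross-validated F-measure of a CRF trained on the selected features; here it is stubbed with a toy scoring function, and all names are illustrative:

```python
import random

def ga_feature_selection(n_features, fitness, generations=50, pop_size=20, seed=0):
    """Evolve binary feature masks. fitness(mask) would be, e.g.,
    the three-fold cross-validated F-measure of a CRF trained
    with the features selected by mask (stubbed out here)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]        # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_features)] ^= 1  # point mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy fitness: reward agreement with a hypothetical "useful" feature subset.
target = [1, 0, 1, 1, 0, 0, 1, 0]
best = ga_feature_selection(8, lambda m: sum(x == t for x, t in zip(m, target)))
print(best)
```

The returned mask is then used to switch CRF feature templates on or off, replacing the manual trial-and-error selection the abstract describes.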