
    Exploration of Corpus Augmentation Approach for English-Hindi Bidirectional Statistical Machine Translation System

    Although a great deal of Statistical Machine Translation (SMT) research is being done for the English-Hindi language pair, no effort has been made to standardize the dataset. Each study uses a different dataset, different parameters, and a different number of sentences during the various phases of translation, which results in varied translation output. Comparing these models, understanding their results, gaining insight into corpus behavior, and reproducing the results of this research therefore become tedious. This creates the need for a standardized dataset and a common set of parameters for model development. The main contribution of this paper is an approach to standardize the dataset and to identify the combination of parameters that gives the best performance. It also investigates a novel corpus augmentation approach to improve the translation quality of an English-Hindi bidirectional SMT system. This model works well in scarce-resource settings without incorporating an external parallel corpus for the underlying languages. The experiments are carried out with Moses, an open-source phrase-based toolkit, on the Indian Languages Corpora Initiative (ILCI) Hindi-English tourism corpus. With a limited dataset, considerable improvement is achieved using the corpus augmentation approach for the English-Hindi bidirectional SMT system.
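    One way to keep the training, tuning, and test phases comparable across studies is a fixed, seeded split of the sentence-aligned corpus, so every experiment sees the same number of sentences in each phase. The following is a minimal Python sketch under that assumption; `standard_split`, the seed, and the split sizes are hypothetical and are not the paper's standardization procedure or the ILCI split. With the same seed, the split is identical on every run of a given Python version.

```python
import random

def standard_split(pairs, dev_size, test_size, seed=1234):
    """Deterministically split sentence pairs into train/dev/test.

    Hypothetical helper: the sizes and seed are illustrative only.
    """
    if dev_size + test_size >= len(pairs):
        raise ValueError("dev and test sets must leave some training data")
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    test = [pairs[i] for i in indices[:test_size]]
    dev = [pairs[i] for i in indices[test_size:test_size + dev_size]]
    train = [pairs[i] for i in indices[test_size + dev_size:]]
    return train, dev, test

# Toy stand-in for an English-Hindi parallel corpus.
corpus = [(f"en sentence {i}", f"hi sentence {i}") for i in range(10)]
train, dev, test = standard_split(corpus, dev_size=2, test_size=2)
print(len(train), len(dev), len(test))  # 6 2 2
```

    Publishing the seed and sizes alongside the corpus would let other work regenerate exactly the same phases.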

    Example based English to Bengali machine translation

    This thesis report is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science and Engineering, 2008. Cataloged from PDF version of thesis report. Includes bibliographical references (page 31). In this thesis we propose a new architecture for example-based English to Bengali machine translation. The proposed Example Based Machine Translation (EBMT) system has five steps: 1) tagging; 2) parsing; 3) preparing the chunks of the sentence using sub-sentential EBMT; 4) matching the sentence rule using an efficient adapting scheme; 5) translating each chunk from English to Bengali and generating the output with morphological analysis. We prepared our own tag set for tagging English sentences, and we propose an optimal adapting scheme for choosing a sentence rule from the knowledge base of the EBMT system. Our current system can translate simple sentences. We also define a way to translate complex sentences using sub-sentential EBMT. Because more rules can be added to the knowledge base, the system can eventually be used for general-purpose English to Bengali machine translation.
    Khan Md. Anwarus Salam
    B. Computer Science and Engineering
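    Steps 1, 4, and 5 of the pipeline can be illustrated with a toy Python sketch: tag each word, match the tag sequence against a sentence rule, and emit the Bengali words in the order the rule prescribes. Everything here is a hypothetical assumption (the tag set, the three-word lexicon, and the single rule that moves the object before the verb, since English is SVO and Bengali is typically SOV); parsing, sub-sentential chunking, and morphological generation are omitted, and this is not the thesis's adapting scheme.

```python
# Hypothetical toy lexicon: English word -> (tag, Bengali translation)
LEXICON = {
    "i": ("PRON", "আমি"),
    "eat": ("VERB", "খাই"),
    "rice": ("NOUN", "ভাত"),
}

# Hypothetical sentence rules: English tag pattern -> Bengali word order.
# (0, 2, 1) means subject, then object, then verb (SVO -> SOV).
RULES = {
    ("PRON", "VERB", "NOUN"): (0, 2, 1),
}

def tag(sentence):
    """Step 1 (toy): look up each word's tag in the lexicon."""
    words = sentence.lower().rstrip(".").split()
    return [(w, LEXICON[w][0]) for w in words]

def translate(sentence):
    """Steps 4-5 (toy): match a sentence rule, then reorder and translate."""
    tagged = tag(sentence)
    pattern = tuple(t for _, t in tagged)
    order = RULES.get(pattern)
    if order is None:
        raise ValueError(f"no sentence rule matches {pattern}")
    return " ".join(LEXICON[tagged[i][0]][1] for i in order)

print(translate("I eat rice."))  # আমি ভাত খাই
```

    Extending coverage in this sketch means adding entries to `RULES`, which mirrors the abstract's point that the knowledge base can grow with new rules.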

    Analysis of Topic Modeling on Phrase-Based SMT system for English-Hindi Translation

    With the availability of cheaper large memory and high-performance processors, Statistical Machine Translation (SMT) methods have drawn the attention of NLP researchers. Phrase-based SMT has shown better results than word-based SMT. To improve machine translation performance further, various systems have been developed that use phrase-based SMT as a baseline; domain adaptation is one of the most popular examples. In this paper, a phrase-based SMT system is likewise used as the baseline to which a topic model is applied for English-Hindi translation, and the same baseline is used for comparison with the topic model system. Both systems are tuned using Minimum Error Rate Training (MERT). The analysis shows that the topic modeling system improves the results.
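    Phrase-based SMT systems such as those built with Moses score each candidate translation with a log-linear model, a weighted sum of feature functions, and MERT chooses the weights that maximize a metric such as BLEU on a development set. The Python sketch below shows only the scoring and reranking step over a tiny hypothetical n-best list; it does not implement MERT's line search, and the `topic` feature is an illustrative assumption about how a topic model might enter the system, not the paper's method.

```python
def loglinear_score(features, weights):
    """Weighted sum of log-domain feature values: sum_i lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(nbest, weights):
    """Return the candidate with the highest log-linear score."""
    return max(nbest, key=lambda cand: loglinear_score(cand["features"], weights))

# Hypothetical n-best list; feature values are made up for illustration.
# "tm" = translation model, "lm" = language model,
# "topic" = an assumed topic-similarity feature (not taken from the paper).
nbest = [
    {"text": "candidate A", "features": {"tm": -2.0, "lm": -5.0, "topic": -1.5}},
    {"text": "candidate B", "features": {"tm": -2.5, "lm": -4.0, "topic": -0.5}},
]

weights = {"tm": 1.0, "lm": 0.5, "topic": 0.3}
lm_light = {"tm": 1.0, "lm": 0.2, "topic": 0.0}

print(best_candidate(nbest, weights)["text"])   # candidate B
print(best_candidate(nbest, lm_light)["text"])  # candidate A
```

    Changing only the weights flips which candidate is chosen; these weights are the quantities MERT tunes for both the baseline and the topic model system.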