10 research outputs found

    A retrospective view on the promise on machine translation for Bahasa Melayu-English

    Research and development on machine translation systems from English into other languages has progressed further than work in the opposite direction. More than 30 years after machine translation was introduced, a Malay language or Bahasa Melayu (BM) to English machine translation engine is still not available. Many translation systems have been developed for the world's top 10 languages in terms of native speakers, but none for BM, although the language is used by more than 200 million speakers around the world. This paper seeks possible reasons why this situation has arisen. A summative overview of progress, challenges, and future work on MT is presented. Issues faced by researchers and system developers in modeling and developing a machine translation engine are also discussed. A study of previous translation systems (from other languages to English) reveals that accuracy can reach up to 85%. The figure suggests that such a translation system is not reliable enough for serious translation work. The most prominent difficulties are the complexity of grammar rules and ambiguity problems in the source language. Thus, we hypothesize that including a 'semantic' property in the translation rules may produce a better-quality BM-English MT engine.

    Design and implementation of a verb lexicon and verb sense disambiguator for Turkish

    Ankara: Department of Computer Engineering and Information Science and Institute of Engineering and Science, Bilkent University, 1994. Thesis (Master's) -- Bilkent University, 1994. Includes bibliographical references.
    The lexicon has a crucial role in all natural language processing systems and has special importance in machine translation systems. This thesis presents the design and implementation of a verb lexicon and a verb sense disambiguator for Turkish. The lexicon contains only verbs because verbs encode events in sentences and play the most important role in natural language processing systems, especially in parsing (syntactic analysis) and machine translation. The verb sense disambiguator uses the information stored in the verb lexicon that we developed. The main purpose of this tool is to disambiguate senses of verbs having several meanings, some of which are idiomatic. We also present a tool implemented in Lucid Common Lisp under X-Windows for adding, accessing, modifying, and removing entries of the lexicon, and a semantic concept ontology containing semantic features of commonly used Turkish nouns.
    Yılmaz, Okan. M.S

    Semi-automatic acquisition of domain-specific semantic structures.

    Siu, Kai-Chung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2000. Includes bibliographical references (leaves 99-106). Abstracts in English and Chinese.
    Contents:
    Chapter 1: Introduction
        1.1 Thesis Outline
    Chapter 2: Background
        2.1 Natural Language Understanding
            2.1.1 Rule-based Approaches
            2.1.2 Stochastic Approaches
            2.1.3 Phrase-Spotting Approaches
        2.2 Grammar Induction
            2.2.1 Semantic Classification Trees
            2.2.2 Simulated Annealing
            2.2.3 Bayesian Grammar Induction
            2.2.4 Statistical Grammar Induction
        2.3 Machine Translation
            2.3.1 Rule-based Approach
            2.3.2 Statistical Approach
            2.3.3 Example-based Approach
            2.3.4 Knowledge-based Approach
            2.3.5 Evaluation Method
    Chapter 3: Semi-Automatic Grammar Induction
        3.1 Agglomerative Clustering
            3.1.1 Spatial Clustering
            3.1.2 Temporal Clustering
            3.1.3 Free Parameters
        3.2 Post-processing
        3.3 Chapter Summary
    Chapter 4: Application to the ATIS Domain
        4.1 The ATIS Domain
        4.2 Parameters Selection
        4.3 Unsupervised Grammar Induction
        4.4 Prior Knowledge Injection
        4.5 Evaluation
            4.5.1 Parse Coverage in Understanding
            4.5.2 Parse Errors
            4.5.3 Analysis
        4.6 Chapter Summary
    Chapter 5: Portability to Chinese
        5.1 Corpus Preparation
            5.1.1 Tokenization
        5.2 Experiments
            5.2.1 Unsupervised Grammar Induction
            5.2.2 Prior Knowledge Injection
        5.3 Evaluation
            5.3.1 Parse Coverage in Understanding
            5.3.2 Parse Errors
        5.4 Grammar Comparison Across Languages
        5.5 Chapter Summary
    Chapter 6: Bi-directional Machine Translation
        6.1 Bilingual Dictionary
        6.2 Concept Alignments
        6.3 Translation Procedures
            6.3.1 The Matching Process
            6.3.2 The Searching Process
            6.3.3 Heuristics to Aid Translation
        6.4 Evaluation
            6.4.1 Coverage
            6.4.2 Performance
        6.5 Chapter Summary
    Chapter 7: Conclusions
        7.1 Summary
        7.2 Future Work
            7.2.1 Suggested Improvements on Grammar Induction Process
            7.2.2 Suggested Improvements on Bi-directional Machine Translation
            7.2.3 Domain Portability
        7.3 Contributions
    Bibliography
    Appendix A: Original SQL Queries
    Appendix B: Induced Grammar
    Appendix C: Seeded Categories

    User Interfaces to the Web of Data based on Natural Language Generation

    We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in various stages of corpus-based analysis, analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs, and present a method to automatically induce RDF graph verbalization templates via distant supervision

    JTEC panel report on machine translation in Japan

    The goal of this report is to provide an overview of the state of the art of machine translation (MT) in Japan and to provide a comparison between Japanese and Western technology in this area. The term 'machine translation' as used here, includes both the science and technology required for automating the translation of text from one human language to another. Machine translation is viewed in Japan as an important strategic technology that is expected to play a key role in Japan's increasing participation in the world economy. MT is seen in Japan as important both for assimilating information into Japanese as well as for disseminating Japanese information throughout the world. Most of the MT systems now available in Japan are transfer-based systems. The majority of them exploit a case-frame representation of the source text as the basis of the transfer process. There is a gradual movement toward the use of deeper semantic representations, and some groups are beginning to look at interlingua-based systems

    Semi-automatic grammar induction for bidirectional machine translation.

    Wong, Chin Chung. Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. Includes bibliographical references (leaves 137-143). Abstracts in English and Chinese.
    Contents:
    Chapter 1: Introduction
        1.1 Objectives
        1.2 Thesis Outline
    Chapter 2: Background in Natural Language Understanding
        2.1 Rule-based Approaches
        2.2 Corpus-based Approaches
            2.2.1 Stochastic Approaches
            2.2.2 Phrase-spotting Approaches
        2.3 The ATIS Domain
            2.3.1 Chinese Corpus Preparation
    Chapter 3: Semi-automatic Grammar Induction - Baseline Approach
        3.1 Background in Grammar Induction
            3.1.1 Simulated Annealing
            3.1.2 Bayesian Grammar Induction
            3.1.3 Probabilistic Grammar Acquisition
        3.2 Semi-automatic Grammar Induction - Baseline Approach
            3.2.1 Spatial Clustering
            3.2.2 Temporal Clustering
            3.2.3 Post-processing
            3.2.4 Four Aspects for Enhancements
        3.3 Chapter Summary
    Chapter 4: Semi-automatic Grammar Induction - Enhanced Approach
        4.1 Evaluating Induced Grammars
        4.2 Stopping Criterion
            4.2.1 Cross-checking with Recall Values
        4.3 Improvements on Temporal Clustering
            4.3.1 Evaluation
        4.4 Improvements on Spatial Clustering
            4.4.1 Distance Measures
            4.4.2 Evaluation
        4.5 Enhancements based on Intelligent Selection
            4.5.1 Informed Selection between Spatial Clustering and Temporal Clustering
            4.5.2 Selecting the Number of Clusters Per Iteration
            4.5.3 An Example for Intelligent Selection
            4.5.4 Evaluation
        4.6 Chapter Summary
    Chapter 5: Bidirectional Machine Translation using Induced Grammars - Baseline Approach
        5.1 Background in Machine Translation
            5.1.1 Rule-based Machine Translation
            5.1.2 Statistical Machine Translation
            5.1.3 Knowledge-based Machine Translation
            5.1.4 Example-based Machine Translation
            5.1.5 Evaluation
        5.2 Baseline Configuration on Bidirectional Machine Translation System
            5.2.1 Bilingual Dictionary
            5.2.2 Concept Alignments
            5.2.3 Translation Process
            5.2.4 Two Aspects for Enhancements
        5.3 Chapter Summary
    Chapter 6: Bidirectional Machine Translation - Enhanced Approach
        6.1 Concept Alignments
            6.1.1 Enhanced Alignment Scheme
            6.1.2 Experiment
        6.2 Grammar Checker
            6.2.1 Components for Grammar Checking
        6.3 Evaluation
            6.3.1 Bleu Score Performance
            6.3.2 Modified Bleu Score
        6.4 Chapter Summary
    Chapter 7: Conclusions
        7.1 Summary
        7.2 Contributions
        7.3 Future work
    Bibliography
    Appendix A: Original SQL Queries
    Appendix B: Seeded Categories
    Appendix C: 3 Alignment Categories
    Appendix D: Labels of Syntactic Structures in Grammar Checker

    Caught in the middle – language use and translation : a festschrift for Erich Steiner on the occasion of his 60th birthday

    This book celebrates Erich Steiner’s scholarly work. In 25 contributions, colleagues and friends take up issues closely related to his research interests in linguistics and translation studies. The result is a colourful kaleidoscope reflecting the many strands of research questions that Erich Steiner helped advance in the past decades and the cheerful, inspiring atmosphere he continues to create

    Traduction automatique statistique et adaptation à un domaine spécialisé

    Recent years have seen the development of statistical approaches to machine translation. Nevertheless, the intrinsic variability of natural language affects the quality of statistical models. Studies have shown that in-domain corpora contain words that can also occur in out-of-domain corpora (common words), as well as domain-specific words. This particularity can be handled by terminological resources such as bilingual lexicons. However, if the vocabulary differs between out-of-domain and in-domain data, the syntactic and semantic content may also vary. In our work, we consider the task of domain adaptation for statistical machine translation along two major axes: bilingual lexicon acquisition and post-editing of machine translation outputs. We evaluate our approaches on the medical domain. The quality of automatic translations in the medical domain is improved, and the results are compared to other work in this field. Oracle evaluations tend to show that further gains are still possible.
    AVIGNON-Bib. numérique (840079901) / Sudoc. France.