54,626 research outputs found

    Neural Machine Translation into Language Varieties

    Both research and commercial machine translation have so far neglected the importance of properly handling the spelling, lexical, and grammar divergences occurring among language varieties. Notable cases are standard national varieties such as Brazilian and European Portuguese, and Canadian and European French, which popular online machine translation services do not keep distinct. We show that an evident side effect of modeling such varieties as unique classes is the generation of inconsistent translations. In this work, we investigate the problem of training neural machine translation from English to specific pairs of language varieties, assuming both labeled and unlabeled parallel texts, and low-resource conditions. We report experiments from English to two pairs of dialects, European-Brazilian Portuguese and European-Canadian French, and two pairs of standardized varieties, Croatian-Serbian and Indonesian-Malay. We show significant BLEU score improvements over baseline systems when translation into similar languages is learned as a multilingual task with shared representations. Comment: Published at EMNLP 2018: third conference on machine translation (WMT 2018)

    Genre-based Course Book for Hospitality Department in Surakarta

    This research aims to design an ESP course book at SMK Sahid Surakarta, with two main objectives: to investigate the quality of the existing course book used in English teaching and learning at SMK, especially in the hospitality department, and to describe the design of a genre-based ESP course book for the hospitality department of SMK. This research and development study was carried out at SMK Sahid Surakarta in the academic year 2015/2016. The population was three classes of the eighth grade (APH1, APH2, and APH3). The sample was 30 students of APH1. The product of this study is a genre-based course book for the hospitality department with integrated skills, a syllabus, and a course grid as models for lesson plans. The course book consists of standard competence, topics, basic competence (core material), general aims or indicators, teaching and learning activities, methods and media, assessment, the allotted time, and sources of the materials. The role and design of instructional materials are key to helping teachers and students use language in specific contexts. The proposed course book consists of 2 units, and each unit has a topic that is developed into 19 activities. The teaching activities included in the course book are starting point, modeling, joint construction, and independent construction. Features such as vocabulary notes, grammar points, useful expressions, and "for your information" are added to support the four stages of activities.

    Modeling Global Syntactic Variation in English Using Dialect Classification

    This paper evaluates global-scale dialect identification for 14 national varieties of English as a means for studying syntactic variation. The paper makes three main contributions: (i) introducing data-driven language mapping as a method for selecting the inventory of national varieties to include in the task; (ii) producing a large and dynamic set of syntactic features using grammar induction rather than focusing on a few hand-selected features such as function words; and (iii) comparing models across both web corpora and social media corpora in order to measure the robustness of syntactic variation across registers.

    An implementation of the behavior annex in the AADL toolset OSATE2

    AADL is a modeling language for designing and analyzing high-integrity distributed and real-time systems. Embedded sub-languages published as AADL annexes extend an AADL model to enhance analysis. The behavior annex specifies the behavior of an AADL application model. An implementation of this annex makes it possible to perform behavior analysis. In addition, as there are several AADL annexes, implementing generic mechanisms to support each of them is challenging. The behavior annex is a valid candidate to illustrate these challenges because it combines several sub-languages. In this paper we present our experiment in supporting the behavior annex in the reference AADL toolset OSATE2. OSATE2 supports AADL version 2 by providing a front-end and a set of analysis plug-ins to analyze an AADL model.

    Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar

    A usage-based Construction Grammar (CxG) posits that slot-constraints generalize from common exemplar constructions. But what is the best model of constraint generalization? This paper evaluates competing frequency-based and association-based models across eight languages using a metric derived from the Minimum Description Length paradigm. The experiments show that association-based models produce better generalizations across all languages by a significant margin.

    Developing and applying heterogeneous phylogenetic models with XRate

    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology