Neural Machine Translation into Language Varieties
Both research and commercial machine translation have so far neglected the
importance of properly handling the spelling, lexical and grammar divergences
occurring among language varieties. Notable cases are standard national
varieties such as Brazilian and European Portuguese, and Canadian and European
French, which popular online machine translation services are not keeping
distinct. We show that an evident side effect of modeling such varieties as
unique classes is the generation of inconsistent translations. In this work, we
investigate the problem of training neural machine translation from English to
specific pairs of language varieties, assuming both labeled and unlabeled
parallel texts, and low-resource conditions. We report experiments from English
to two pairs of dialects, European-Brazilian Portuguese and European-Canadian
French, and two pairs of standardized varieties, Croatian-Serbian and
Indonesian-Malay. We show significant BLEU score improvements over baseline
systems when translation into similar languages is learned as a multilingual
task with shared representations. Comment: Published at EMNLP 2018: Third Conference on Machine Translation (WMT 2018)
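A common way to learn translation into several varieties as one multilingual task with shared representations is to prepend a target-variety token to each source sentence, so that a single model learns all varieties and is told at inference time which one to produce. The sketch below shows only that data-preparation step. It assumes this tagging scheme rather than reproducing the paper's pipeline, and the tag strings such as `<2pt_BR>` and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tag strings: the token prepended to an English source sentence
# tells one shared model which target variety to generate.
VARIETY_TAGS: Dict[str, str] = {
    "pt-BR": "<2pt_BR>",
    "pt-PT": "<2pt_PT>",
    "fr-CA": "<2fr_CA>",
    "fr-FR": "<2fr_FR>",
}


def tag_source(source: str, target_variety: str) -> str:
    """Prepend the target-variety token to a source sentence."""
    return f"{VARIETY_TAGS[target_variety]} {source}"


def build_multilingual_corpus(
    triples: List[Tuple[str, str, str]],
) -> List[Tuple[str, str]]:
    """Merge (source, target, variety) triples into one tagged training set."""
    return [(tag_source(src, var), tgt) for src, tgt, var in triples]


# Usage: the same English sentence paired with two Portuguese varieties.
corpus = build_multilingual_corpus([
    ("I take the bus.", "Eu pego o ônibus.", "pt-BR"),
    ("I take the bus.", "Eu apanho o autocarro.", "pt-PT"),
])
for src, tgt in corpus:
    print(src, "->", tgt)
```

Because both varieties share one model and one vocabulary, the low-resource variety can benefit from the parameters learned on the other. The tag alone decides which lexical choice (ônibus vs. autocarro) the decoder should prefer.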
Genre-based Course Book for Hospitality Department in Surakarta
This research aims to design an ESP course book for SMK Sahid Surakarta and focuses on two goals: to investigate the quality of the existing learning book used in English teaching and learning at the SMK, especially in the hospitality department, and to describe the design of a genre-based ESP course book for the hospitality department of the SMK. This research and development study was carried out at SMK Sahid Surakarta in the 2015/2016 academic year. The population comprised three classes of the eighth grade (APH1, APH2, and APH3), and the sample was 30 students of APH1. The product of this study is a genre-based course book for the hospitality department with integrated skills, together with a syllabus and a course grid as models for lesson plans. The course book consists of standard competence, topics, basic competence (core material), general aims or indicators, teaching and learning activities, methods and media, assessment, the allotted time, and sources of the materials. The role and design of instructional materials are key to helping teachers and students use language in specific contexts. The proposed course book consists of 2 units, and each unit has a topic that is developed into 19 activities. The teaching activities in the course book follow four stages: starting point, modeling, joint construction, and independent construction. Features such as vocabulary notes, grammar points, useful expressions, and "For Your Information" are added to support these four stages.
Modeling Global Syntactic Variation in English Using Dialect Classification
This paper evaluates global-scale dialect identification for 14 national
varieties of English as a means for studying syntactic variation. The paper
makes three main contributions: (i) introducing data-driven language mapping as
a method for selecting the inventory of national varieties to include in the
task; (ii) producing a large and dynamic set of syntactic features using
grammar induction rather than focusing on a few hand-selected features such as
function words; and (iii) comparing models across both web corpora and social
media corpora in order to measure the robustness of syntactic variation across
registers.
An implementation of the behavior annex in the AADL-toolset Osate2
AADL is a modeling language for designing and analyzing high-integrity distributed and real-time systems. Embedded sub-languages published as AADL annexes extend an AADL model to enhance analysis. The behavior annex specifies the behavior of an AADL application model, and an implementation of this annex enables behavior analysis. In addition, because there are several AADL annexes, implementing generic mechanisms to support each of them is challenging. The behavior annex is a good candidate for illustrating these challenges because it combines several sub-languages. In this paper we report on our experience supporting the behavior annex in OSATE2, the reference AADL toolset, which supports AADL version 2 by providing a front end and a set of analysis plug-ins for AADL models.
Frequency vs. Association for Constraint Selection in Usage-Based Construction Grammar
A usage-based Construction Grammar (CxG) posits that slot-constraints
generalize from common exemplar constructions. But what is the best model of
constraint generalization? This paper evaluates competing frequency-based and
association-based models across eight languages using a metric derived from the
Minimum Description Length paradigm. The experiments show that
association-based models produce better generalizations across all languages by
a significant margin.
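The abstract does not define the two model families. As a hedged illustration only, the sketch below contrasts a frequency-based choice (keep the fillers seen most often in a slot) with an association-based choice (rank fillers by ΔP, the probability of the filler given the slot minus its probability outside the slot). ΔP is one common association measure in usage-based work; the paper's exact measure may differ, and the counts below are invented toy data, not results from the paper.

```python
from typing import Dict, List


def frequency_select(slot_counts: Dict[str, int], k: int) -> List[str]:
    """Frequency-based: keep the k fillers seen most often in the slot."""
    return sorted(slot_counts, key=slot_counts.get, reverse=True)[:k]


def delta_p(a: int, b: int, c: int, d: int) -> float:
    """ΔP = P(filler | slot) - P(filler | not slot) from a 2x2 table.

    a: this filler in the slot      b: other fillers in the slot
    c: this filler outside the slot d: other fillers outside the slot
    """
    return a / (a + b) - c / (c + d)


def association_select(
    slot_counts: Dict[str, int], corpus_counts: Dict[str, int], k: int
) -> List[str]:
    """Association-based: keep the k fillers with the highest ΔP.

    corpus_counts holds total counts per filler, including slot occurrences.
    """
    slot_total = sum(slot_counts.values())
    outside_total = sum(corpus_counts.values()) - slot_total
    if outside_total <= 0:
        raise ValueError("need occurrences outside the slot to compute ΔP")
    scores = {}
    for filler, a in slot_counts.items():
        b = slot_total - a
        c = corpus_counts[filler] - a  # occurrences outside the slot
        d = outside_total - c
        scores[filler] = delta_p(a, b, c, d)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy counts (invented): "the" is frequent everywhere, "happy" is rarer
# overall but concentrated in this slot.
slot_counts = {"the": 50, "happy": 10, "big": 5}
corpus_counts = {"the": 8000, "happy": 20, "big": 200, "of": 4000}

print(frequency_select(slot_counts, 2))                    # ['the', 'happy']
print(association_select(slot_counts, corpus_counts, 2))  # ['happy', 'the']
```

The frequency model favors a filler that is common everywhere. The association model instead favors the filler that is most characteristic of the slot, which is the intuition behind association-based constraint selection.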
Developing and applying heterogeneous phylogenetic models with XRate
Modeling sequence evolution on phylogenetic trees is a useful technique in
computational biology. Especially powerful are models which take account of the
heterogeneous nature of sequence evolution according to the "grammar" of the
encoded gene features. However, beyond a modest level of model complexity,
manual coding of models becomes prohibitively labor-intensive. We demonstrate,
via a set of case studies, the new built-in model-prototyping capabilities of
XRate (macros and Scheme extensions). These features allow rapid implementation
of phylogenetic models which would have previously been far more
labor-intensive. XRate's new capabilities for lineage-specific models,
ancestral sequence reconstruction, and improved annotation output are also
discussed. XRate's flexible model-specification capabilities and computational
efficiency make it well-suited to developing and prototyping phylogenetic
grammar models. XRate is available as part of the DART software package:
http://biowiki.org/DART. Comment: 34 pages, 3 figures, glossary of XRate model terminology