
    Low-resource machine translation using MATREX: The DCU machine translation system for IWSLT 2009

    In this paper, we give a description of the Machine Translation (MT) system developed at DCU that was used for our fourth participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2009). Two techniques are deployed in our system in order to improve the translation quality in a low-resource scenario. The first technique is to use multiple segmentations in MT training and to utilise word lattices in the decoding stage. The second technique is used to select the optimal training data that can be used to build MT systems. In this year’s participation, we use three different prototype SMT systems, and the outputs from the systems are combined using a standard system combination method. Our system is the top system for the Chinese–English CHALLENGE task in terms of BLEU score.

    Building on a terminology resource – the Irish experience

    www.focal.ie is the national database of Irish language terminology. In this paper, we examine: (i) the impact achieved by this resource in the five-year period since work commenced; (ii) the possibilities which have arisen from one project over a short time span, to develop sub-projects and related initiatives; and (iii) the advantages and opportunities arising from the creation of one high-quality electronic language resource. The Irish case shows that the development of high-quality resources for a lesser-used language can have interesting and unexpected knock-on effects. We present eight stages and aspects of term planning: preparation/planning; research; standardisation; dissemination; implantation; evaluation; modernisation/maintenance; and training. Fiontar, in its work, has moved from its initial involvement in the dissemination of terminology, to take an active part in other aspects of term planning for Irish: research, standardisation, evaluation, modernisation and training. This has been achieved through editorial and technological development, in partnership with key stakeholders and always from a socioterminological point of view – that is, with an emphasis on terminology as an aspect of language planning and from the point of view of users in particular. Particular projects described include Focal as a term management system and as a user resource; tools for translators; user links to a corpus; the development of a new sports dictionary; and research into subject field headings. Two related projects are the LEX legal terms project for term extraction and standardisation, and the development of terminology for the European Union.

    Beyond English text: Multilingual and multimedia information retrieval.


    In no uncertain terms : a dataset for monolingual and multilingual automatic term extraction from comparable corpora

    Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a great need for well-documented, manually validated datasets, especially in the rising field of multilingual term extraction from comparable corpora, which presents a unique new set of challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology and the large volumes annotated in three different languages and four different domains make this a rich resource. The resulting datasets are not just suited for evaluation purposes but can also serve as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation.

    An Investigation into the Pedagogical Features of Documents

    Characterizing the content of a technical document in terms of its learning utility can be useful for applications related to education, such as generating reading lists from large collections of documents. We refer to this learning utility as the "pedagogical value" of the document to the learner. While pedagogical value is an important concept that has been studied extensively within the education domain, there has been little work exploring it from a computational, i.e., natural language processing (NLP), perspective. To allow a computational exploration of this concept, we introduce the notion of "pedagogical roles" of documents (e.g., Tutorial and Survey) as an intermediary component for the study of pedagogical value. Given the lack of available corpora for our exploration, we create the first annotated corpus of pedagogical roles and use it to test baseline techniques for automatic prediction of such roles. Comment: 12th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at EMNLP 2017; 12 pages.

    Contextual bitext-derived paraphrases in automatic MT evaluation

    In this paper we present a novel method for deriving paraphrases during automatic MT evaluation using only the source and reference texts, which are necessary for the evaluation, and word and phrase alignment software. Using target language paraphrases produced through word and phrase alignment, a number of alternative reference sentences are constructed automatically for each candidate translation. The method produces lexical and low-level syntactic paraphrases that are relevant to the domain at hand, does not use external knowledge resources, and can be combined with a variety of automatic MT evaluation systems.

    The Question of Competence: Reconsidering Medical Education in the Twenty-First Century

    [Excerpt] The real challenge for those involved in designing competency-based educational programs is to recognize the complexity of competence as a concept. Only then can they effectively delineate the knowledge, skills, and attitudes that learners must acquire to be able to perform within each domain at a predetermined level and to recognize that the expected level of performance within each domain will vary depending on the learner's stage of education and the specialty he or she is learning. The authors of this book help us do just that. They examine the challenges facing medical education and introduce the concept of discourse as a mechanism both for examining the idea of competence and considering how to implement competency-based education. In so doing, they provide us with a new way to ask the questions that are at the heart of every report advocating change, every criticism of medical education, and every conversation that questions why health care is the way it is today.