    One of the fundamental characteristics of language is that it changes over time. One way to monitor this change is to observe corpora, that is, structured documentation of a language. Recent developments in technology, especially in Natural Language Processing, allow robust linguistic processing, which supports the description of diverse historical changes in corpora. The involvement of human linguists is unavoidable, since they determine the gold standard, but computer assistance provides considerable support by bringing computational approaches to the exploration of corpora, especially historical corpora. This paper proposes a model for corpus development in which corpora are annotated to support further computational operations such as lexicogrammatical pattern matching, automatic retrieval, and extraction. The corpus processing operations are performed by local-grammar-based corpus processing software on a contemporary Indonesian corpus. The paper concludes that data collection and data processing are equally crucial for monitoring language change, and neither can be set aside.

    Joining hands: developing a sign language machine translation system with and for the deaf community

    This paper discusses the development of an automatic machine translation (MT) system for translating spoken-language text into signed languages (SLs). The motivation for our work is to improve the accessibility of airport information announcements for D/deaf and hard of hearing people. The paper demonstrates the involvement of Deaf colleagues and members of the D/deaf community in Ireland in three areas of our research: the choice of a domain for automatic translation that has a practical use for the D/deaf community; the human translation of English text into Irish Sign Language (ISL), together with advice on ISL grammar and linguistics; and the importance of native ISL signers as manual evaluators of our translated output.

    ClinkNotes: Towards a Corpus-Based, Machine-Aided Programme of Translation Teaching

    This article reports on a pilot project to build a platform for large-scale teaching of translation or bilingual training at tertiary level. The programme, ClinkNotes, can accommodate parallel corpora of any language pair, although the primary data used in this project are in English and Chinese. The report begins with a brief overview of the development of the corpus-based approach to translation studies in relation to the teaching of translation as a profession. It then describes the design of the programme (its theoretical framework, annotation methodology, and operation) and how it helps to meet the pressing needs of the profession. The prospects for further development of the programme are also discussed.

    Elaboration of a RST Chinese Treebank

    As a subfield of Artificial Intelligence (AI), Natural Language Processing (NLP) aims to process human languages automatically. Studies across many areas of NLP have produced fruitful results. Among these areas, discourse analysis is becoming increasingly popular, and discourse information is crucial for NLP studies. As the language with the most speakers in the world, Chinese occupies a very important position in NLP research. This work therefore presents a discourse treebank for Chinese whose theoretical framework is Rhetorical Structure Theory (RST) (Mann and Thompson, 1988). The research corpus consists of 50 Chinese texts, and the treebank can be consulted at three annotation levels: segmentation, central unit (CU), and discourse structure. Finally, we create an open online interface for consulting the Chinese treebank.
