
    On the Similarities Between Native, Non-native and Translated Texts

    We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than each of them is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, reflecting, perhaps, unified principles that similarly affect translations and non-native language. Comment: ACL2016, 12 pages

    Towards a corpus-based, statistical approach of translation quality : measuring and visualizing linguistic deviance in student translations

    In this article we present a corpus-based statistical approach to measuring translation quality, more particularly translation acceptability, by comparing the features of translated and original texts. We discuss initial findings that aim to support and objectify formative quality assessment. To that end, we extract a multitude of linguistic and textual features from both student and professional translation corpora that consist of many different translations by several translators in two different genres (fiction, news) and in two translation directions (English to French and French to Dutch). The numerical information gathered from these corpora is exploratively analysed with Principal Component Analysis, which enables us to identify stable, language-independent linguistic and textual indicators of student translations compared to translations produced by professionals. The differences between these types of translation are subsequently tested by means of ANOVA. The results clearly indicate that the proposed methodology is indeed capable of distinguishing between student and professional translations. It is claimed that this deviant behaviour indicates an overall lower translation quality in student translations: student translations tend to score lower at the acceptability level, that is, they deviate significantly from target-language norms and conventions. In addition, the proposed methodology is capable of assessing the acceptability of an individual student’s translation: a smaller linguistic distance between a given student translation and the norm set by the professional translations correlates with higher quality. The methodology is also able to provide objective and concrete feedback about the divergent linguistic dimensions in their text.
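    The idea of a "linguistic distance" between one student translation and the professional norm can be illustrated with a minimal sketch. This is not the article's method (which uses Principal Component Analysis and ANOVA); it is a simplified stand-in that standardizes each feature against the professional texts and takes a Euclidean distance. The function name, the feature vectors, and their values are all hypothetical.

```python
import math
from statistics import mean, stdev


def distance_to_norm(candidate, reference):
    """Standardized Euclidean distance from one feature vector to a reference set.

    candidate: list of feature values for one text (hypothetical features).
    reference: list of feature vectors, one per professional translation.
    Simplified illustration only; the source abstract uses PCA, not this measure.
    """
    columns = list(zip(*reference))          # one tuple per feature
    means = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    total = 0.0
    for value, mu, sd in zip(candidate, means, spreads):
        if sd > 0:                           # skip features with no variation
            total += ((value - mu) / sd) ** 2
    return math.sqrt(total)


# Hypothetical feature vectors: [mean sentence length, type-token ratio x 100]
professional = [[18.0, 52.0], [20.0, 55.0], [22.0, 58.0]]
student_a = [20.5, 55.5]   # close to the professional mean
student_b = [30.0, 40.0]   # far from the professional mean

print(distance_to_norm(student_a, professional))
print(distance_to_norm(student_b, professional))
```

    Under these assumptions, a smaller value means the text sits closer to the professional norm on the chosen features; which features to extract is left open here.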

    Linguistic complexity: English vs. Polish, text vs. corpus

    We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks after about two decades, while it might be consistent for the whole range of ranks for the inflected word forms. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant regime is broken more strongly than in the case of a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scale-invariant regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, we find that if the words are tagged with their proper part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.
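    A rank-frequency distribution of the kind analyzed above can be computed in a few lines. The sketch below is an illustration, not the authors' pipeline: it assumes whitespace tokenization (no lemmatization or part-of-speech tagging), and the helper names are hypothetical. A scale-invariant (power-law) regime shows up as a roughly straight line in log-log coordinates, so the fitted slope is a simple summary.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    xs = [math.log10(rank) for rank, _ in pairs]
    ys = [math.log10(freq) for _, freq in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy input; a real analysis would use a full text or corpus.
text = "the cat saw the dog and the dog saw the cat"
pairs = rank_frequency(text.split())
print(pairs)
print(loglog_slope(pairs))
```

    A slope near -1 over a range of ranks corresponds to the classic Zipf-like behaviour; a change of slope at higher ranks is one way the breaking of the scale-invariant regime would appear.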

    Analyzing Use of Thanks to You: Insights for Language Teaching and Assessment in Second and Foreign Language Contexts

    This investigation of "thanks to you" in British and American usage was precipitated by a situation at an American university, in which a native Arabic speaker said "thanks to you" in isolation, making his intended meaning unclear. The study analyzes use of "thanks to you" in the Corpus of Contemporary American English and the British National Corpus to gain insights for English language instruction/assessment in the American context, as well as English-as-a-lingua-franca contexts where the majority of speakers are not native speakers of English or are speakers of different varieties of English but where American or British English are for educational purposes the standard varieties. Analysis of the two corpora revealed three functions for "thanks to you" common to British and American usage: expressing gratitude, communicating "because of you" positively, and communicating "because of you" negatively (as in sarcasm). A fourth use of "thanks to you", thanking journalists/guests for being on news programs/talk shows, occurred in the American corpus only. Analysis indicates that felicitous use of "thanks to you" for each of these meanings depends on the presence of a range of factors, both linguistic and material, in the context of utterance.

    Corpus planning for Irish – dictionaries and terminology

    A description of the evolution and current situation of corpus planning for Irish, which includes dictionaries, terminology and corpora.

    Theoretical issues in the interpretation of Cappadocian, a not-so-dead Greek contact language

    Cappadocian is a mixed Greek-Turkish dialect continuum spoken in the Turkish Central Anatolia Region until the population exchange between Greece and Turkey in the 1920s. Only a few Cappadocian dialects are still spoken in present-day Greece. Since the publication of Thomason and Kaufman’s Language Contact, Creolization, and Genetic Linguistics in 1988, Cappadocian has attracted the attention of historical and contact linguists because of its unique mixed character. In this paper, I will discuss a number of theoretical issues in the interpretation of the linguistic structure of Cappadocian, focusing on the following topics: (1) the status of loan phonemes and loan morphemes in contact languages, (2) the distinction between code switching and code mixing in relation to Poplack’s Free Morpheme Constraint, (3) the schizoid typology of contact languages.