16 research outputs found

    On the Similarities Between Native, Non-native and Translated Texts

    We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than each of them is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, reflecting, perhaps, unified principles that similarly affect translations and non-native language. Comment: ACL 2016, 12 pages

    Data Mining with Shallow vs. Linguistic Features to Study Diversification of Scientific Registers

    We present a methodology to analyze the linguistic evolution of scientific registers with data mining techniques, comparing the insights gained from shallow vs. linguistic features. The focus is on selected scientific disciplines at the boundaries to computer science (computational linguistics, bioinformatics, digital construction, microelectronics). The data basis is the English Scientific Text Corpus (SCITEX) which covers a time range of roughly thirty years (1970/80s to early 2000s) (Degaetano-Ortlieb et al., 2013; Teich and Fankhauser, 2010). In particular, we investigate the diversification of scientific registers over time. Our theoretical basis is Systemic Functional Linguistics (SFL) and its specific incarnation of register theory (Halliday and Hasan, 1985). In terms of methods, we combine corpus-based methods of feature extraction and data mining techniques

    S.: Hebrew WordNet: a test case of aligning lexical databases across languages

    We report on the creation of a medium-scale WordNet for Hebrew. We address this task as an instance of building a lexical resource for a new language (Hebrew) in a setting where similar resources exist for other languages, and multilingual requirements call for an alignment of the new resource with the existing ones. We compare the two main paradigms, MultiWordNet and EuroWordNet, with an eye to other minority languages, which, like Hebrew, might lack basic resources for carrying out such a task. As we show, the scales are tipped toward the MultiWordNet paradigm for this very reason. Cast in this paradigm, the Hebrew WordNet is strictly aligned to the English lexicon. Consequently, the discrepancy between the languages has to be dealt with: on the one hand, the new resource has to be faithful to the linguistic data of the language for which it is created; on the other, it has to be aligned with existing resources for unrelated languages. We distinguish between contingent and systematic cases of non-equivalence. For the former, we offer a corpus-based methodology that can be easily applied to any new language for which such a resource is planned. For the latter, we propose systematic solutions, focusing on the cases of gender, passive verbs, and antonyms. Where L2 is more specific in its semantic distinctions (as in the case of gender), we devise a solution which facilitates a full semantic inheritance. Where L2's distinctions are more general (as in passive verbs), our solution is partial and calls for further research. The case of antonyms is fully solved for most parts of speech, but it raises crucial questions regarding the typological bias of WordNet towards English (and other Indo-European languages), which may touch on both psycholinguistics and the feasibility of WordNet for such tasks as machine translation.

    An Argument for the Global Suicide of Humanity

    The animal rights movement, both as an activist social movement and as a philosophical-moral movement, has introduced a Copernican revolution into Western moral discourse. More specifically, it has removed humanity from the centre of moral discourse and has placed alongside humans other, non-human, sentient beings. The environmental movement has further widened this moral discourse by emphasising a moral responsibility of care for the natural environment as a whole. Each of these movements has developed in response to humanity's violent treatment of other sentient beings and humanity's pollution and destruction of the earth's ecology and stratosphere. Whether the environmental destruction set in place by humans can be halted or reversed remains a pressing and open question. This paper argues that the efforts of governments and environmental bodies to prevent environmental catastrophe will not succeed if such actors continue to be guided by a general modern idea of technological and social progress and an attitude of speciesism. From the standpoint of a dialectical, utopian anti-humanism, this paper sets out, as a thought experiment, the possibility of humanity's willing extinction as a solution to a growing ecological problem.

    Translationese and Its Dialects

    While it has often been observed that the product of translation is somehow different from non-translated text, scholars have emphasized two distinct bases for such differences. Some have noted interference from the source language spilling over into translation in a source-language-specific way, while others have noted general effects of the process of translation that are independent of source language. Using a series of text categorization experiments, we show that both these effects exist and that, moreover, there is a continuum between them. There are many effects of translation that are consistent among texts translated from a given source language, some of which are consistent even among texts translated from families of source languages. Significantly, we find that even for widely unrelated source languages and multiple genres, differences between translated texts and non-translated texts are sufficient for a learned classifier to accurately determine if a given text is translated or original.

    Representing natural gender in multilingual lexical databases

    Natural languages encode gender distinctions in various ways. We investigate the differences between English and Hebrew in this respect, our departure point being the relations that are defined between the feminine and the masculine realizations of nouns in the English WordNet. We define a number of distinct classes of English nouns which differ in the way they realize gender distinctions. We then define similar classes of Hebrew nouns and show how to map the Hebrew nouns (and relations defined over them) to the English structure. This establishes a systematic assignment of Hebrew nouns to WordNet synsets, which is consistent with the ideas underlying multilingual extensions of WordNet. The main result is a consistent Hebrew WordNet which is aligned with the English one, but an additional contribution is a set of desiderata for the correct encoding of (systematic) semantic differences among languages.