293 research outputs found

    Annotation guidelines for labeling English-Dutch cognate pairs (version 1.0)

    Automatic Identification of False Friends in Parallel Corpora: Statistical and Semantic Approach

    False friends are pairs of words in two languages that are perceived as similar but have different meanings. We present an improved algorithm for acquiring false friends from a sentence-level aligned parallel corpus, based on statistical observations of word occurrences and co-occurrences in the parallel sentences. The results are compared with an entirely semantic measure of cross-lingual similarity between words, which uses the Web as a corpus by analyzing the words' local contexts extracted from the text snippets returned by Google searches. The statistical and semantic measures are further combined into an improved algorithm for identifying false friends that achieves results almost twice as good as those of previously known algorithms. The evaluation is performed on identifying cognates between Bulgarian and Russian, but the proposed methods could be adopted for other language pairs for which parallel corpora and bilingual glossaries are available.
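
As a rough illustration of the statistical side of this abstract, the sketch below counts how often a candidate word pair occurs and co-occurs in aligned sentence pairs and turns the counts into a Dice-style score. The Dice formula, the function name `cooccurrence_score`, the whitespace tokenization, and the toy English-Spanish corpus are all assumptions made for illustration; the abstract does not state the authors' actual formula.

```python
from typing import Iterable, Tuple


def cooccurrence_score(
    pairs: Iterable[Tuple[str, str]], w_src: str, w_tgt: str
) -> float:
    """Dice-style co-occurrence score over sentence-aligned pairs.

    Assumption: a score near 0 for two look-alike words hints that they
    are false friends; a score near 1 hints that they translate each other.
    """
    n_src = n_tgt = n_both = 0
    for src, tgt in pairs:
        in_src = w_src in src.lower().split()  # naive whitespace tokens
        in_tgt = w_tgt in tgt.lower().split()
        n_src += in_src
        n_tgt += in_tgt
        n_both += in_src and in_tgt
    if n_src + n_tgt == 0:
        return 0.0  # neither word occurs, so there is no evidence
    return 2 * n_both / (n_src + n_tgt)


# Hypothetical toy corpus, not data from the paper.
corpus = [
    ("she is embarrassed", "ella está avergonzada"),
    ("she is pregnant", "ella está embarazada"),
    ("he was embarrassed", "él estaba avergonzado"),
]

false_friend = cooccurrence_score(corpus, "embarrassed", "embarazada")
translation = cooccurrence_score(corpus, "pregnant", "embarazada")
```

On this toy corpus the look-alike pair never co-occurs, while the true translation pair always does, which is the kind of contrast a statistical measure of this sort relies on.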

    The Use of Cognate Words and Interlingual Homographs to Investigate the Cross-Linguistics in Second Language Processing in Iran

    Various investigations have shown that the native language affects foreign word recognition and that this influence is modulated by proficiency in the nonnative language. Cognates, words that are similar across two or more languages in some respects, represent an interesting, illuminating, and crucial aspect of foreign or second language learning and research. Forty-five participants (male and female) were randomly chosen and took part in the experiment at Islamic Azad University, Zanjan, Iran, in the 2014-2015 school year. The participants' ages ranged from 18 to 28, with a mean age of 21.5 years. The materials were divided into two groups, 30 true cognates and 30 false cognates, selected from 300 words using CVR and CVI (Lawshe's table, with indices of 88% and 82% respectively) to ensure reliability and validity. The words were taught to the participants, and a week later a test on those words was administered. A t-test comparing the mean learning scores of the two groups showed a significant difference between them. The results show that the students learned true cognates better than false cognates. The results of this study also confirm the expectation that cognate-based instruction can positively influence second language acquisition. Keywords: false and true cognates; L2 structural relationship; second language vocabulary acquisition; teaching through cognate

    The Use of Cognate Words and Interlingual Homographs to Investigate the Cross-Linguistics in Second Language Processing in Iran

    Various investigations have shown that the native language affects foreign word recognition and that this influence is modulated by proficiency in the nonnative language. Cognates, words that are alike across two or more languages in some respects, represent an interesting, illuminating, and crucial facet of foreign or second language learning and research. Forty-five participants (male and female) were randomly chosen and took part in the experiment at Islamic Azad University, Zanjan, Iran, in the 2014-2015 school year. The participants' ages ranged from 18 to 28, with a mean age of 21.5 years. The materials were divided into two groups, 30 true cognates and 30 false cognates, selected from 300 words using CVR (content validity ratio) and CVI (content validity index) (Lawshe's table, with indices of 88% and 82% respectively) to ensure reliability and validity. The words were taught to the participants, and a week later a test on those words was administered. A t-test comparing the mean learning scores of the two groups showed a significant difference between them. The results show that the students learned true cognates better than false cognates. The results of this investigation also confirm the expectation that cognate-based instruction can positively influence second language acquisition.
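
For readers unfamiliar with the content validity ratio mentioned in this abstract, the sketch below computes Lawshe's CVR for a single item from the number of panelists who rate it essential. The function name and the example panel of 10 raters are hypothetical; the abstract does not report the panel size or per-item ratings.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to 1."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must lie between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical example: 9 of 10 panelists rate a word "essential".
example_cvr = content_validity_ratio(9, 10)
```

A CVR of 0 means exactly half the panel rated the item essential; items are usually retained only when the ratio exceeds a critical value that depends on the panel size.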

    The use of a false cognates/friends corpus in A2 learners’ accurate oral production through the development of cross cultural awareness training and speaking Tasks.

    134 pages. This research study aimed to provide evidence of the influence that the use of a corpus of false cognates and false friends has on the oral production of a group of A2-level English learners in a bilingual environment. Additionally, it was based on the need to develop cross-cultural awareness in the EFL classroom, since the cultural component is usually neglected yet is vital in supporting the language learning process.

    Computational approaches to semantic change (Volume 6)

    Semantic change — how the meanings of words change over time — has preoccupied scholars since well before modern linguistics emerged in the late 19th and early 20th century, ushering in a new methodological turn in the study of language change. Compared to changes in sound and grammar, semantic change is the least understood. Ever since, the study of semantic change has progressed steadily, accumulating a vast store of knowledge for over a century, encompassing many languages and language families. Historical linguists also early on realized the potential of computers as research tools, with papers at the very first international conferences in computational linguistics in the 1960s. Such computational studies still tended to be small-scale, method-oriented, and qualitative. However, recent years have witnessed a sea-change in this regard. Big-data empirical quantitative investigations are now coming to the forefront, enabled by enormous advances in storage capability and processing power. Diachronic corpora have grown beyond imagination, defying exploration by traditional manual qualitative methods, and language technology has become increasingly data-driven and semantics-oriented. These developments present a golden opportunity for the empirical study of semantic change over both long and short time spans

    Translation Alignment and Extraction Within a Lexica-Centered Iterative Workflow

    This thesis addresses two closely related problems. The first, translation alignment, consists of identifying bilingual document pairs that are translations of each other within multilingual document collections (document alignment); identifying sentences, titles, etc, that are translations of each other within bilingual document pairs (sentence alignment); and identifying corresponding word and phrase translations within bilingual sentence pairs (phrase alignment). The second is extraction of bilingual pairs of equivalent word and multi-word expressions, which we call translation equivalents (TEs), from sentence- and phrase-aligned parallel corpora. While these same problems have been investigated by other authors, their focus has been on fully unsupervised methods based mostly or exclusively on parallel corpora. Bilingual lexica, which are basically lists of TEs, have not been considered or given enough importance as resources in the treatment of these problems. Human validation of TEs, which consists of manually classifying TEs as correct or incorrect translations, has also not been considered in the context of alignment and extraction. Validation strengthens the importance of infrequent TEs (most of the entries of a validated lexicon) that otherwise would be statistically unimportant. The main goal of this thesis is to revisit the alignment and extraction problems in the context of a lexica-centered iterative workflow that includes human validation. Therefore, the methods proposed in this thesis were designed to take advantage of knowledge accumulated in human-validated bilingual lexica and translation tables obtained by unsupervised methods. Phrase-level alignment is a stepping stone for several applications, including the extraction of new TEs, the creation of statistical machine translation systems, and the creation of bilingual concordances. 
    Therefore, for phrase-level alignment, the higher accuracy of human-validated bilingual lexica is crucial for achieving higher quality results in these downstream applications. There are two main conceptual contributions. The first is the coverage maximization approach to alignment, which makes direct use of the information contained in a lexicon, or in translation tables when the lexicon is small or does not exist. The second is the introduction of translation patterns, which combine novel and old ideas and enable precise and productive extraction of TEs. As material contributions, the alignment and extraction methods proposed in this thesis have produced source materials for three lines of research, in the context of three PhD theses (two of them already defended), all supervised by my advisor. The topics of these lines of research are statistical machine translation, algorithms and data structures for indexing and querying phrase-aligned parallel corpora, and bilingual lexica classification and generation. Four publications have resulted directly from the work presented in this thesis and twelve from the collaborative lines of research.

    Lexical access and lexical diversity in first language attrition

    This paper presents an investigation of lexical first language (L1) attrition, asking how a decrease in lexical accessibility manifests itself in long-term residents in a second language (L2) environment. We question the measures typically used in attrition studies (formal tasks and type-token ratios) and argue for an in-depth analysis of free spoken data, including factors such as lexical frequency and distributional measures. The study is based on controlled, elicited and free data from two populations of attriters of L1 German (L2 Dutch and English) and a control population (n = 53 in each group). Group comparisons and a Discriminant Analysis show that lexical diversity, sophistication and the distribution of items across the text in free speech are better predictors of group membership than formal tasks or elicited narratives. Extralinguistic factors, such as frequency of exposure and use or length of residence, have no predictive power for our results.
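
For context, the type-token ratio that this abstract questions is the number of distinct word forms divided by the total number of words. The sketch below is a minimal version assuming lowercase whitespace tokenization; the function name and the sample sentence are illustrative and do not come from the study.

```python
def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    tokens = text.lower().split()  # naive tokenization, an assumption
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample: 4 types ("the", "cat", "saw", "dog") over 5 tokens.
sample_ttr = type_token_ratio("the cat saw the dog")
```

Because the ratio tends to fall as a text gets longer, it is sensitive to sample size, which is one reason a study might prefer frequency-based and distributional measures of free speech.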