42 research outputs found

    Self-protecting motivation, indexed by self-threat, modifies retrieval-induced-forgetting and confidence in employment decision bias against out-group targets

    Human memory is malleable by both social and motivational factors and holds information relevant to workplace decisions. Retrieval-induced forgetting (RIF) describes a phenomenon where retrieval practice impairs subsequent memory for related (unpracticed) information. We report two RIF experiments. Chinese participants received a mild self-threat manipulation (Experiment 2) or not (Experiment 1) before an ethnicity-RIF task that involved practicing negative traits of either an in-group (Chinese) or an out-group (Japanese) target. After a subsequent memory test, participants selected their preferred applicant for employment. RIF scores correspond to forgetting of unpracticed positive traits of one target (Rp−) relative to the recall of practiced negative traits of the other target (Rp+). Enhanced forgetting of positive traits was found in both experiments for both targets. Across experiments, a significant target by threat interaction showed that target ethnicity modified RIF (an ethnicity-RIF effect). Inducing a self-protecting motivation enhanced RIF effects for the out-group (Japanese) target. In a subsequent employment decision, there was a strong bias to select the in-group target, with the confidence in these decisions being associated with RIF scores. This study suggests that rehearsing negative traits of minority applicants can affect metacognitive aspects of employment decisions, possibly by shaping the schemas available to the majority (in-group) employer. To disrupt systemic racism, recruitment practices should aim to offset the human motivation to protect oneself when exposed to a relatively mild threat to self-esteem. Discussing the negative traits of minority applicants is a critical, and sensitive, aspect of decision-making that warrants careful practice.
These data suggest that recruiting individuals should be reminded of their personal strengths in this context, not their vulnerabilities, to secure their decision-making for fairer recruitment practice.

    Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs

    In comparative linguistics, colexification refers to the phenomenon of a lexical form conveying two or more distinct meanings. Existing work on colexification patterns relies on annotated word lists, limiting scalability and usefulness in NLP. In contrast, we identify colexification patterns of more than 2,000 concepts across 1,335 languages directly from an unannotated parallel corpus. We then propose simple and effective methods to build multilingual graphs from the colexification patterns: ColexNet and ColexNet+. ColexNet's nodes are concepts and its edges are colexifications. In ColexNet+, concept nodes are additionally linked through intermediate nodes, each representing an ngram in one of 1,334 languages. We use ColexNet+ to train $\overrightarrow{\mbox{ColexNet+}}$, high-quality multilingual embeddings that are well-suited for transfer learning. In our experiments, we first show that ColexNet achieves high recall on CLICS, a dataset of crosslingual colexifications. We then evaluate $\overrightarrow{\mbox{ColexNet+}}$ on roundtrip translation, sentence retrieval and sentence classification and show that our embeddings surpass several transfer learning baselines. This demonstrates the benefits of using colexification as a source of information in multilingual NLP. Comment: EMNLP 2023 Findings

    Building an Interactive Multilingual Tool for Information Retrieval

    Growing demand on the Internet has led users to access information expressed in languages other than their own, which gave rise to cross-lingual information retrieval (CLIR). CLIR is established as a major topic in information retrieval (IR). One approach to CLIR uses different translation methods to translate queries, documents, and indexes into other languages. Queries submitted to search engines suffer from untranslatable query keys (i.e., words missing from the dictionary) and from translation ambiguity, meaning difficulty in choosing between translation alternatives. Our approach in this thesis is to build and develop a software tool (MORTAJA-IR-TOOL), a new tool for information retrieval written in the Java programming language with JDK 1.6. The tool has several features: it provides a systematic multilingual system to serve as a basis for translation in CLIR, and it stems the words entered in the query as a stage preceding translation. The proposed systematic query translation was evaluated in a user-centered experiment against a baseline translation that uses a machine-readable dictionary; the improvement was 8.96%. The impact of stemming the query words on the quality of retrieval of matching data in the other language was also evaluated; the improvement was 4.14%. Finally, the combination of the proposed stemming and systematic translation (MORTAJA-IR-TOOL) was evaluated, yielding an improvement of 15.86% in the rate of retrieved data. Keywords: Cross lingual information retrieval, CLIR, Information Retrieval, IR, Translation, stemming.

    Altaic and Chagatay lectures : studies in honour of Éva Kincses-Nagy


    Lexical access in bilingual spoken word production: effects of lexical interference

    When bilinguals decide to speak in one of their languages, parallel activation from both of their languages occurs. Selecting to speak in one language is therefore highly demanding since bilinguals have to constantly deal with the co-activation of translation equivalents in their other language, as well as interference from semantically related lexical representations in each language. Switching from one language to another poses the extra demand for a control mechanism to deal with cross-language activation. The aim of this thesis was to investigate the effects of semantic and cross-language interference on bilinguals’ lexical retrieval. In addition, the effects of language similarity and bilingual language profile on cognitive control abilities in language selection and switching were examined. Two groups of highly proficient bilinguals completed a detailed bilingual profile questionnaire: a group of Arabic-English bilinguals with unrelated languages and a group of German-English bilinguals with closely related languages. Language switching performance in both groups of bilinguals was investigated in a picture naming paradigm. Lexical selection demands were manipulated by integrating a semantic blocking paradigm so that pictures were named in semantically heterogeneous versus homogeneous lexical selection contexts. Both groups of bilinguals were slower in the homogeneous as compared to the heterogeneous contexts. Importantly, a significant interaction of semantic blocking and language switching was observed such that latencies were slowest in the homogeneous context when switching into L1, but only for the Arabic-English bilinguals. This finding suggests that switching into L1 as compared to L2 is demanding in terms of lexical selection. In addition, the performance of the Arabic-English bilinguals when switching into L1 under high lexical selection demands correlated with their response inhibition/selection ability, as measured by a Flanker task.
This suggests that bilinguals’ ability to resolve lexical competition is related to their domain-general response selection ability. This correlation was not observed for the German-English bilinguals, suggesting that similar languages may interfere less with each other. However, the analysis of bilingual language profile highlighted a number of subtle differences between the two groups of bilinguals, which might have contributed to the difference in their results. Taken together, the findings from this thesis have theoretical consequences for accounts of bilingual lexical processing and for the relationship of bilingualism to non-linguistic cognition.

    Understanding the structure and meaning of Finnish texts: From corpus creation to deep language modelling

    Natural Language Processing (NLP) is a cross-disciplinary field combining elements of computer science, artificial intelligence, and linguistics, with the objective of developing means for computational analysis, understanding or generation of human language. The primary aim of this thesis is to advance natural language processing in Finnish by providing more resources and investigating the most effective machine learning based practices for their use. The thesis focuses on NLP topics related to understanding the structure and meaning of written language, mainly concentrating on structural analysis (syntactic parsing) as well as exploring the semantic equivalence of statements that vary in their surface realization (paraphrase modelling). While the new resources presented in the thesis are developed for Finnish, most of the methodological contributions are language-agnostic, and the accompanying papers demonstrate the application and evaluation of these methods across multiple languages. The first set of contributions of this thesis revolves around the development of a state-of-the-art Finnish dependency parsing pipeline. Firstly, the necessary Finnish training data was converted to the Universal Dependencies scheme, integrating Finnish into this important treebank collection and establishing the foundations for Finnish UD parsing. Secondly, a novel word lemmatization method based on deep neural networks is introduced and assessed across a diverse set of over 50 languages. And finally, the overall dependency parsing pipeline is evaluated on a large number of languages, securing top ranks in two competitive shared tasks focused on multilingual dependency parsing. The overall outcome of this line of research is a parsing pipeline reaching state-of-the-art accuracy in Finnish dependency parsing, with the parsing numbers obtained with the latest pre-trained language models approaching, or at least coming close to, human-level performance.
The achievement of large language models in the area of dependency parsing, as well as in many other structured prediction tasks, raises the hope that large pre-trained language models genuinely comprehend language rather than merely relying on simple surface cues. However, datasets designed to measure semantic comprehension in Finnish have been non-existent, or very scarce at best. To address this limitation, and to reflect the general change of emphasis in the field towards tasks more semantic in nature, the second part of the thesis shifts its focus to language understanding through an exploration of paraphrase modelling. The second contribution of the thesis is the creation of a novel, large-scale, manually annotated corpus of Finnish paraphrases. A unique aspect of this corpus is that its examples have been manually extracted from two related text documents, with the objective of obtaining non-trivial paraphrase pairs valuable for training and evaluating various language understanding models on paraphrasing. We show that manual paraphrase extraction can yield a corpus featuring pairs that are both notably longer and less lexically overlapping than those produced through automated candidate selection, the current prevailing practice in paraphrase corpus construction. Another distinctive feature of the corpus is that the paraphrases are identified and distributed within their document context, allowing for richer modelling and novel tasks to be defined.