    Representing the bilingual's two lexicons

    A review of empirical work suggests that the lexical representations of a bilingual’s two languages are independent (Smith, 1991), but may also be sensitive to between-language similarity patterns (e.g. Cristoffanini, Kirsner, and Milech, 1986). Some researchers hold that infant bilinguals do not initially differentiate between their two languages (e.g. Redlinger & Park, 1980). Yet by the age of two they appear to have acquired separate linguistic systems for each language (Lanza, 1992). This paper explores the hypothesis that the separation of lexical representations in bilinguals is functional rather than architectural. It suggests that the separation may be driven by differences in the structure of the input to a common architectural system. Connectionist simulations are presented modelling the representation of two sets of lexical information. These simulations explore the conditions required to create functionally independent lexical representations in a single neural network. It is shown that a single network may acquire a second language after learning a first (avoiding the traditional problem of catastrophic interference in these networks). Further, it is shown that in a single network, the functional independence of representations depends on inter-language similarity patterns. The latter finding is difficult to account for in a model that postulates architecturally separate lexical representations.

    An iterative approach for lexicon characterization in juridical context

    In the juridical context, knowledge management applications have a central role. To improve the effectiveness of document management procedures, techniques for automatic comprehension of textual content are required. In this work, a methodology for semi-automatic derivation of knowledge from document collections is proposed. To extract relevant information from document text, a process integrating both statistical and lexical approaches is applied. Moreover, we propose a system for evaluating the quality of the extracted characteristic lexicon. The system is used to process a corpus of heterogeneous documents from the Italian juridical domain.

    Bilingualism and the single route/dual route debate

    The debate between single and dual route accounts of cognitive processes has been generated predominantly by the application of connectionist modeling techniques to two areas of psycholinguistics. This paper draws an analogy between this debate and bilingual language processing. A prominent question within bilingual word recognition is whether the bilingual has functionally separate lexicons for each language, or a single system able to recognize the words in both languages. Empirical evidence has been taken to support a model which includes two separate lexicons working in parallel (Smith, 1991; Gerard and Scarborough, 1989). However, a range of interference effects has been found between the bilingual’s two sets of lexical knowledge (Thomas, 1997a). Connectionist models have been put forward which suggest that a single representational resource may deal with these data, so long as words are coded according to language membership (Thomas, 1997a, 1997b; Dijkstra and van Heuven, 1998). This paper discusses the criteria which might be used to differentiate single route and dual route models. An empirical study is introduced to address one of these criteria, parallel access, with regard to bilingual word recognition. The study fails to find support for the dual route model.

    Mitigating Gender Bias in Machine Learning Data Sets

    Artificial Intelligence has the capacity to amplify and perpetuate societal biases and presents profound ethical implications for society. Gender bias has been identified in the context of employment advertising and recruitment tools, due to their reliance on underlying language processing and recommendation algorithms. Attempts to address such issues have involved testing learned associations, integrating concepts of fairness into machine learning and performing more rigorous analysis of training data. Mitigating bias when algorithms are trained on textual data is particularly challenging given the complex way gender ideology is embedded in language. This paper proposes a framework for the identification of gender bias in training data for machine learning. The work draws upon gender theory and sociolinguistics to systematically indicate levels of bias in textual training data and associated neural word embedding models, thus highlighting pathways for both removing bias from training data and critically assessing its impact. Comment: 10 pages, 5 figures, 5 tables. Presented at the Bias2020 workshop (as part of the ECIR Conference) - http://bias.disim.univaq.i

    An XML-based Tool for Tracking English Inclusions in German Text

    The use of lexicons and corpora advances both linguistic research and the performance of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web, to recognise English inclusions in German newspaper articles. The output of the tool can assist lexical resource developers in monitoring changing patterns of English inclusion usage. The corpus used for the classification covers three different domains. We report the classification results and illustrate their value to linguistic and NLP research.

    Bilingual lexical organization and access: a literature overview

    Automatic ontology mapping for agent communication

    Agent communication languages such as ACL and KQML provide a standard for agent communication. These languages enable an agent to specify the intention and the content of a message as well as the protocol, the language, and the ontology that are used. For the protocol and the language some standards are available and should be known by the communicating agents. The ontology used in a communication depends on the subject of the communication. Since the number of subjects is almost infinite and since the concepts used for a subject can be described by different ontologies, the development of generally accepted standards will take a long time. This lack of standardization, which hampers communication and collaboration between agents, is known as the interoperability problem. To overcome the interoperability problem, agents must be able to establish a mapping between their ontologies. This paper investigates a new approach to the interoperability problem. The proposed approach requires neither a correspondence between concepts used in the ontologies nor a correspondence between the structure of the ontologies. It only requires that some instances of the subject about which the agents try to communicate are known by both agents.

    Approaches towards a Lexical Web: the role of Interoperability

    After highlighting some of the major dimensions that are relevant for Language Resources (LR) and contribute to their infrastructural role, I underline some priority areas of concern today with respect to implementing an open Language Infrastructure, and specifically what we could call a “Lexical Web”. My objective is to show that it is imperative to define an underlying global strategy behind the set of initiatives which are or can be launched in Europe and worldwide, and that an all-embracing vision and cooperation among different communities are necessary to achieve more coherent and useful results. I end by mentioning two new European initiatives that go in this direction and promise to be influential in shaping the future of the LR area.

    The Lexical Grid: Lexical Resources in Language Infrastructures

    Language Resources are recognized as central and strategic for the development of any Human Language Technology system and application product. They play a critical role as a horizontal technology and have been recognized on many occasions as a priority, also by national and supra-national funding bodies, with a number of initiatives (such as EAGLES, ISLE, ELRA) to establish some sort of coordination of LR activities, and a number of large LR creation projects, both in the written and in the speech areas.

    Masked semantic/associative and translation priming across languages

    The present study was an attempt to investigate the bilingual mental lexicon. The main question addressed in the study was whether semantic/associative and translation priming effects could be achieved with Persian-English bilinguals. The masked priming paradigm, a technique that reflects automatic cognitive processes during semantic processing rather than strategic uses of the prime, was deployed to answer the question. Four types of prime-target pairs (translation equivalent, semantically similar, associatively related, and semantically associated pairs) were formed for the purpose of the lexical decision task. A total of 85 Persian-English bilinguals participated in the study. Though no priming effect was found for the first three types of pairs, the targets in semantically associated pairs (the most strongly related words) were responded to about 29 ms faster. The results suggested that bilinguals share mental representations for words that are both associatively and semantically related; consequently, teaching new words of the second language by linking them to associatively related words of the first language may lead to better results.